[jira] [Created] (HADOOP-11691) X86 build of libwinutils is broken

2015-03-09 Thread Remus Rusanu (JIRA)
Remus Rusanu created HADOOP-11691:
-

 Summary: X86 build of libwinutils is broken
 Key: HADOOP-11691
 URL: https://issues.apache.org/jira/browse/HADOOP-11691
 Project: Hadoop Common
  Issue Type: Bug
  Components: build, native
Affects Versions: 3.0.0
Reporter: Remus Rusanu


HADOOP-9922 recently fixed the x86 build. After YARN-2190, compiling for x86 fails 
with this error:
{code}
(Link target) -
  
E:\HW\project\hadoop-common\hadoop-common-project\hadoop-common\target/winutils/hadoopwinutilsvc_s.obj
 : fatal error LNK1112: module machine type 'x64' conflicts with target machine 
type 'X86' 
[E:\HW\project\hadoop-common\hadoop-common-project\hadoop-common\src\main\winutils\winutils.vcxproj]
{code}




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Reopened] (HADOOP-6228) Configuration should allow storage of null values.

2015-03-09 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-6228?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer reopened HADOOP-6228:
--

argh wrong jira.

 Configuration should allow storage of null values.
 --

 Key: HADOOP-6228
 URL: https://issues.apache.org/jira/browse/HADOOP-6228
 Project: Hadoop Common
  Issue Type: Bug
  Components: conf
Reporter: Hemanth Yamijala

 Currently the Configuration class does not allow null keys or values. Null 
 keys don't make sense, but null values may have semantic meaning for some 
 features. Not storing these values in the configuration causes some arguable side 
 effects. For instance, if a value is defined in the defaults but needs to be 
 disabled in the site configuration by setting it to null, there's no way to do 
 this currently. Also, keys with null values are not tracked, so 
 tools like the configuration dump (HADOOP-6184) would not display these 
 properties.
 Does this seem like a sensible use case?
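
As an illustration of the use case, a minimal sketch against the Configuration API (the property name is arbitrary, and the exact exception thrown for a null value varies by Hadoop version):

{code:java}
import org.apache.hadoop.conf.Configuration;

public class NullValueUseCase {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    conf.set("my.feature.flag", "enabled");   // imagine this came from a *-default.xml

    // Desired behaviour: "null out" the key in a site file or in code so that
    // readers and tools such as the configuration dump see it as unset/disabled.
    try {
      conf.set("my.feature.flag", null);      // currently rejected by Configuration
    } catch (RuntimeException e) {
      System.out.println("null values are not accepted: " + e);
    }
  }
}
{code}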



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (HADOOP-5606) Final parameters in Configuration doesn't get serialized

2015-03-09 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-5606?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer resolved HADOOP-5606.
--
Resolution: Duplicate

Closing as a dupe of HADOOP-6317.

 Final parameters in Configuration doesn't get serialized
 ---

 Key: HADOOP-5606
 URL: https://issues.apache.org/jira/browse/HADOOP-5606
 Project: Hadoop Common
  Issue Type: Bug
  Components: conf
Reporter: Amar Kamat
 Attachments: final.patch


 Here are the steps to reproduce the bug
 # Mark a parameter as _final_ in hadoop-site.xml
 # Load the conf in some job
 # Change the final parameter
 # Write the conf to a file
 The final parameter gets overridden upon serialization.
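
A minimal sketch of those steps (the property name and output file are made up; a site resource marking the property final is assumed to be on the classpath):

{code:java}
import java.io.FileOutputStream;
import java.io.OutputStream;
import org.apache.hadoop.conf.Configuration;

public class FinalParamRepro {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();    // step 2: loads default + site resources
    conf.set("some.final.param", "overridden");  // step 3: change the final parameter
    try (OutputStream out = new FileOutputStream("conf-out.xml")) {
      conf.writeXml(out);                        // step 4: write the conf to a file
    }
    // Per the report, conf-out.xml now carries the overridden value and loses the
    // <final> marker, so the serialized copy no longer protects the setting.
  }
}
{code}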



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (HADOOP-9529) It looks like hadoop.tmp.dir is being used both for local and hdfs directories

2015-03-09 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-9529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer resolved HADOOP-9529.
--
   Resolution: Duplicate
Fix Version/s: HADOOP-8970

Closing this as a dupe of HADOOP-8970

 It looks like hadoop.tmp.dir is being used both for local and hdfs directories
 --

 Key: HADOOP-9529
 URL: https://issues.apache.org/jira/browse/HADOOP-9529
 Project: Hadoop Common
  Issue Type: Bug
  Components: conf
Affects Versions: 0.20.205.0
 Environment: Ubuntu Server 12.04
Reporter: Ronald Kevin Burton
Assignee: Arpit Gupta
 Fix For: HADOOP-8970

   Original Estimate: 48h
  Remaining Estimate: 48h

 I would like to separate out the files that are written to /tmp, so I added a 
 definition for hadoop.tmp.dir, whose value I understood to be a local folder. It 
 apparently also specifies an HDFS path.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (HADOOP-6228) Configuration should allow storage of null values.

2015-03-09 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-6228?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer resolved HADOOP-6228.
--
Resolution: Duplicate

Closing as a dupe since the other jira has code attached.

 Configuration should allow storage of null values.
 --

 Key: HADOOP-6228
 URL: https://issues.apache.org/jira/browse/HADOOP-6228
 Project: Hadoop Common
  Issue Type: Bug
  Components: conf
Reporter: Hemanth Yamijala

 Currently the Configuration class does not allow null keys or values. Null 
 keys don't make sense, but null values may have semantic meaning for some 
 features. Not storing these values in the configuration causes some arguable side 
 effects. For instance, if a value is defined in the defaults but needs to be 
 disabled in the site configuration by setting it to null, there's no way to do 
 this currently. Also, keys with null values are not tracked, so 
 tools like the configuration dump (HADOOP-6184) would not display these 
 properties.
 Does this seem like a sensible use case?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (HADOOP-8056) Configuration doesn't pass empty string values to tasks

2015-03-09 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-8056?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer resolved HADOOP-8056.
--
Resolution: Duplicate

Duping this to HADOOP-6228 since there are more people on that JIRA.

 Configuration doesn't pass empty string values to tasks
 ---

 Key: HADOOP-8056
 URL: https://issues.apache.org/jira/browse/HADOOP-8056
 Project: Hadoop Common
  Issue Type: Bug
  Components: conf
Affects Versions: 0.20.2, 1.0.0
Reporter: Luca Pireddu

 If I assign an *empty string* as a value to a property in a JobConf 'job' 
 while I'm preparing it to run, the Configuration does store that value.  I 
 can retrieve it later in the same process and the value is maintained.
 However, if I then call JobClient.runJob(job), the Configuration that is 
 received by the Map and Reduce tasks doesn't contain the property, and 
 calling JobConf.get with that property name returns null (instead of an empty 
 string).  Further, if I inspect the job's configuration via Hadoop's web 
 interface, the property isn't present.
 It seems as if whatever serialization mechanism is used to transmit the 
 Configuration from the job client to the tasks discards properties with empty 
 values.
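
A small sketch of the client-side half of this (the property name is made up); the missing value only shows up on the task side after job submission:

{code:java}
import org.apache.hadoop.mapred.JobConf;

public class EmptyValueExample {
  public static void main(String[] args) {
    JobConf job = new JobConf();
    job.set("my.app.separator", "");   // empty string is stored in the client JVM

    // In the same process the value round-trips as expected:
    System.out.println("client sees: [" + job.get("my.app.separator") + "]");

    // Per the report, after JobClient.runJob(job) the task-side Configuration
    // no longer contains the property at all, so get() returns null there.
  }
}
{code}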



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (HADOOP-11685) StorageException complaining no lease ID during HBase distributed log splitting

2015-03-09 Thread Duo Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-11685?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Duo Xu resolved HADOOP-11685.
-
Resolution: Cannot Reproduce

This exception occurred before HADOOP-11523 was patched; we lack the storage 
logs needed for further debugging and cannot reproduce it. I will close this for 
now. If this error appears again in production after the HADOOP-11523 fix, I will 
reopen this JIRA.

 StorageException complaining  no lease ID during HBase distributed log 
 splitting
 --

 Key: HADOOP-11685
 URL: https://issues.apache.org/jira/browse/HADOOP-11685
 Project: Hadoop Common
  Issue Type: Bug
  Components: tools
Reporter: Duo Xu
Assignee: Duo Xu

 This is similar to HADOOP-11523, but in a different place. During HBase 
 distributed log splitting, multiple threads access the same folder, 
 recovered.edits. However, many places in our WASB code did not 
 acquire a lease and simply passed a null lease ID to Azure storage, which caused 
 this issue.
 {code}
 2015-02-26 03:21:28,871 WARN 
 org.apache.hadoop.hbase.regionserver.SplitLogWorker: log splitting of 
 WALs/workernode4.hbaseproddm2001.g6.internal.cloudapp.net,60020,1422071058425-splitting/workernode4.hbaseproddm2001.g6.internal.cloudapp.net%2C60020%2C1422071058425.1424914216773
  failed, returning error
 java.io.IOException: org.apache.hadoop.fs.azure.AzureException: 
 java.io.IOException
   at 
 org.apache.hadoop.hbase.regionserver.wal.HLogSplitter.checkForErrors(HLogSplitter.java:633)
   at 
 org.apache.hadoop.hbase.regionserver.wal.HLogSplitter.access$000(HLogSplitter.java:121)
   at 
 org.apache.hadoop.hbase.regionserver.wal.HLogSplitter$OutputSink.finishWriting(HLogSplitter.java:964)
   at 
 org.apache.hadoop.hbase.regionserver.wal.HLogSplitter$LogRecoveredEditsOutputSink.finishWritingAndClose(HLogSplitter.java:1019)
   at 
 org.apache.hadoop.hbase.regionserver.wal.HLogSplitter.splitLogFile(HLogSplitter.java:359)
   at 
 org.apache.hadoop.hbase.regionserver.wal.HLogSplitter.splitLogFile(HLogSplitter.java:223)
   at 
 org.apache.hadoop.hbase.regionserver.SplitLogWorker$1.exec(SplitLogWorker.java:142)
   at 
 org.apache.hadoop.hbase.regionserver.handler.HLogSplitterHandler.process(HLogSplitterHandler.java:79)
   at 
 org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:128)
   at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
   at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
   at java.lang.Thread.run(Thread.java:745)
 Caused by: org.apache.hadoop.fs.azure.AzureException: java.io.IOException
   at 
 org.apache.hadoop.fs.azurenative.AzureNativeFileSystemStore.storeEmptyFolder(AzureNativeFileSystemStore.java:1477)
   at 
 org.apache.hadoop.fs.azurenative.NativeAzureFileSystem.mkdirs(NativeAzureFileSystem.java:1862)
   at 
 org.apache.hadoop.fs.azurenative.NativeAzureFileSystem.mkdirs(NativeAzureFileSystem.java:1812)
   at org.apache.hadoop.fs.FileSystem.mkdirs(FileSystem.java:1815)
   at 
 org.apache.hadoop.hbase.regionserver.wal.HLogSplitter.getRegionSplitEditsPath(HLogSplitter.java:502)
   at 
 org.apache.hadoop.hbase.regionserver.wal.HLogSplitter$LogRecoveredEditsOutputSink.createWAP(HLogSplitter.java:1211)
   at 
 org.apache.hadoop.hbase.regionserver.wal.HLogSplitter$LogRecoveredEditsOutputSink.getWriterAndPath(HLogSplitter.java:1200)
   at 
 org.apache.hadoop.hbase.regionserver.wal.HLogSplitter$LogRecoveredEditsOutputSink.append(HLogSplitter.java:1243)
   at 
 org.apache.hadoop.hbase.regionserver.wal.HLogSplitter$WriterThread.writeBuffer(HLogSplitter.java:851)
   at 
 org.apache.hadoop.hbase.regionserver.wal.HLogSplitter$WriterThread.doRun(HLogSplitter.java:843)
   at 
 org.apache.hadoop.hbase.regionserver.wal.HLogSplitter$WriterThread.run(HLogSplitter.java:813)
 Caused by: java.io.IOException
   at 
 com.microsoft.windowsazure.storage.core.Utility.initIOException(Utility.java:493)
   at 
 com.microsoft.windowsazure.storage.blob.BlobOutputStream.close(BlobOutputStream.java:282)
   at 
 org.apache.hadoop.fs.azurenative.AzureNativeFileSystemStore.storeEmptyFolder(AzureNativeFileSystemStore.java:1472)
   ... 10 more
 Caused by: com.microsoft.windowsazure.storage.StorageException: There is 
 currently a lease on the blob and no lease ID was specified in the request.
   at 
 com.microsoft.windowsazure.storage.StorageException.translateException(StorageException.java:163)
   at 
 com.microsoft.windowsazure.storage.core.StorageRequest.materializeException(StorageRequest.java:306)
   at 
 com.microsoft.windowsazure.storage.core.ExecutionEngine.executeWithRetry(ExecutionEngine.java:229)
   at 
 

Hadoop 3.x: what about shipping trunk as a 2.x release in 2015?

2015-03-09 Thread Steve Loughran

If 3.x is going to be Java 8 and not backwards compatible, I don't expect anyone 
wanting to use this in production until some time deep into 2016.

Issue: JDK 8 vs 7

It will require Hadoop clusters to move up to Java 8. While there's dev pull 
for this, there's ops pull against this: people are still in the moving-off-Java-6 
phase, thanks to the "it's working, don't update it" philosophy. Java 8 is 
compelling to us coders, but that doesn't mean ops want it.

You can run JDK 8 code in a YARN cluster running on Hadoop 2.7 *today*; the 
main thing is setting up JAVA_HOME. That's something we could make easier 
somehow (maybe some minimum-Java-version field in resource requests that would let 
apps say Java 8, Java 9, ...). YARN could not only set up JVM paths, it could 
fail fast if a Java version wasn't available.
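
For example, an AM can already point a container at a specific JDK by setting JAVA_HOME in its launch environment; a rough sketch with the public YARN API (the class name and JDK path are made up and obviously site-specific):

import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import org.apache.hadoop.yarn.api.records.ContainerLaunchContext;

public class Java8ContainerEnv {
  public static ContainerLaunchContext launchContextForJava8() {
    Map<String, String> env = new HashMap<String, String>();
    env.put("JAVA_HOME", "/usr/lib/jvm/java-8");   // hypothetical path to a JDK 8 install

    // localResources, serviceData, tokens and ACLs left null to keep the sketch short
    return ContainerLaunchContext.newInstance(
        null,
        env,                                        // environment, including JAVA_HOME
        Collections.singletonList("$JAVA_HOME/bin/java -version"),
        null, null, null);
  }
}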

What we can't do in hadoop core today is set javac.version=1.8 and use Java 8 
code. Downstream code can do that (Hive, etc.); they just need to accept that 
they don't get to play on JDK7 clusters if they embrace lambda expressions.

So...we need to stay on java 7 for some time due to ops pull; downstream apps 
get to choose what they want. We can/could enhance YARN to make JVM choice more 
declarative.

Issue: Incompatible changes

Without knowing what is proposed for an incompatible classpath change, I 
can't say whether this is something that could be made optional. If it isn't, 
then it is a python-3-class "rewrite your code" event, which is going 
to be particularly traumatic to things like Hive that already do complex CP 
games. I'm currently against any mandatory change here, though I would love to 
see an optional one. And if it's optional, it ceases to be an incompatible 
change...

Issue: Getting trunk out the door

The main diff between branch-2 and trunk is currently the bash script changes. 
These don't break client apps. They may or may not break bigtop and other downstream 
hadoop stacks, but developers don't need to worry about this: no recompilation 
necessary.

Proposed: ship trunk as a 2.x release, compatible with JDK7 and existing Java code.

It seems to me that I could go

git checkout trunk
mvn versions:set -DnewVersion=2.8.0-SNAPSHOT

We'd then have a version of Hadoop-trunk we could ship later this year, 
compatible at the JDK and API level with existing Java code and JDK7+ 
clusters.

A classpath fix that is optional/compatible can then go out on the 2.x line, 
saving the 3.x tag for something that really breaks things, forces all 
downstream apps to set up new hadoop profiles, have separate modules, and 
generally hate the hadoop dev team.

This lets us tick off the "recent trunk release" and "fixed shell scripts" 
items, pushing out those benefits to people sooner rather than later, and puts 
off the "Hello, we've just broken your code" event for another 12+ months.

Comments?

-Steve





[jira] [Created] (HADOOP-11694) Über-jira: S3a stabilisation phase II

2015-03-09 Thread Steve Loughran (JIRA)
Steve Loughran created HADOOP-11694:
---

 Summary: Über-jira: S3a stabilisation phase II
 Key: HADOOP-11694
 URL: https://issues.apache.org/jira/browse/HADOOP-11694
 Project: Hadoop Common
  Issue Type: Improvement
  Components: fs/s3
Affects Versions: 2.7.0
Reporter: Steve Loughran
 Fix For: 2.8.0


HADOOP-11571 covered the core s3a bugs surfacing in Hadoop 2.6 and other 
enhancements to improve S3 support (performance, proxies, custom endpoints).

This JIRA covers post-2.7 issues and enhancements.




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HADOOP-11696) update compatibility documentation to reflect only API changes matter

2015-03-09 Thread Allen Wittenauer (JIRA)
Allen Wittenauer created HADOOP-11696:
-

 Summary: update compatibility documentation to reflect only API 
changes matter
 Key: HADOOP-11696
 URL: https://issues.apache.org/jira/browse/HADOOP-11696
 Project: Hadoop Common
  Issue Type: Bug
Reporter: Allen Wittenauer


Given the changes file generated by processing JIRA and current discussion in 
common-dev, we should update the compatibility documents to reflect reality.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: Hadoop 3.x: what about shipping trunk as a 2.x release in 2015?

2015-03-09 Thread Colin P. McCabe
Java 7 will be end-of-lifed in April 2015.  I think it would be unwise
to plan a new Hadoop release against a version of Java that is almost
obsolete and (soon) no longer receiving security updates.  I think
people will be willing to roll out a new version of Java for Hadoop
3.x.

Similarly, the whole point of bumping the major version number is the
ability to make incompatible changes.  There are already a bunch of
incompatible changes in the trunk branch.  Are you proposing to revert
those?  Or push them into newly created feature branches?  This
doesn't seem like a good idea to me.

I would be in favor of backporting targeted incompatible changes from
trunk to branch-2.  For example, we could consider pulling in Allen's
shell script rewrite.  But pulling in all of trunk seems like a bad
idea at this point, if we want a 2.x release.

best,
Colin

On Mon, Mar 9, 2015 at 2:15 PM, Steve Loughran ste...@hortonworks.com wrote:

 If 3.x is going to be Java 8 and not backwards compatible, I don't expect 
 anyone wanting to use this in production until some time deep into 2016.

 Issue: JDK 8 vs 7

 It will require Hadoop clusters to move up to Java 8. While there's dev pull 
 for this, there's ops pull against this: people are still in the moving-off-Java-6 
 phase, thanks to the "it's working, don't update it" philosophy. Java 8 
 is compelling to us coders, but that doesn't mean ops want it.

 You can run JDK 8 code in a YARN cluster running on Hadoop 2.7 *today*; the 
 main thing is setting up JAVA_HOME. That's something we could make easier 
 somehow (maybe some minimum-Java-version field in resource requests that would let 
 apps say Java 8, Java 9, ...). YARN could not only set up JVM paths, it could 
 fail fast if a Java version wasn't available.

 What we can't do in hadoop core today is set javac.version=1.8 and use Java 8 
 code. Downstream code can do that (Hive, etc.); they just need to accept that 
 they don't get to play on JDK7 clusters if they embrace lambda expressions.

 So...we need to stay on java 7 for some time due to ops pull; downstream apps 
 get to choose what they want. We can/could enhance YARN to make JVM choice 
 more declarative.

 Issue: Incompatible changes

 Without knowing what is proposed for an incompatible classpath change, I 
 can't say whether this is something that could be made optional. If it isn't, 
 then it is a python-3-class "rewrite your code" event, which is going 
 to be particularly traumatic to things like Hive that already do complex CP 
 games. I'm currently against any mandatory change here, though I would love to 
 see an optional one. And if it's optional, it ceases to be an incompatible 
 change...

 Issue: Getting trunk out the door

 The main diff between branch-2 and trunk is currently the bash script changes. 
 These don't break client apps. They may or may not break bigtop and other downstream 
 hadoop stacks, but developers don't need to worry about this: no 
 recompilation necessary.

 Proposed: ship trunk as a 2.x release, compatible with JDK7 and existing Java code.

 It seems to me that I could go

 git checkout trunk
 mvn versions:set -DnewVersion=2.8.0-SNAPSHOT

 We'd then have a version of Hadoop-trunk we could ship later this year, 
 compatible at the JDK and API level with existing Java code and JDK7+ 
 clusters.

 A classpath fix that is optional/compatible can then go out on the 2.x line, 
 saving the 3.x tag for something that really breaks things, forces all 
 downstream apps to set up new hadoop profiles, have separate modules, and 
 generally hate the hadoop dev team.

 This lets us tick off the "recent trunk release" and "fixed shell scripts" 
 items, pushing out those benefits to people sooner rather than later, and 
 puts off the "Hello, we've just broken your code" event for another 12+ 
 months.

 Comments?

 -Steve





Re: Hadoop - Major releases

2015-03-09 Thread Mayank Bansal
Hi Guys,

From my perspective at eBay, we are not going to upgrade to JDK 8 any time
soon; we just upgraded to 7 and don't want to move further at least this year,
so I ask that you not drop support for JDK 7, as that support is
crucial for us to move forward.

We also just completed our Hadoop 2 migration for all clusters this year,
which we started early last year, so I don't think we can do another major
upgrade this year. Stabilizing a major release takes a lot of effort and
time; I think Hadoop 3.x makes sense for us next year at the earliest.

Thanks,

Mayank

On Mon, Mar 9, 2015 at 12:29 AM, Arun Murthy a...@hortonworks.com wrote:

 Over the last few days, we have had lots of discussions that have
 intertwined several major themes:



 # When/why do we make major Hadoop releases?

 # When/how do we move to major JDK versions?

 # To a lesser extent, we have debated another theme: what do we do about
 trunk?



 For now, let's park JDK and trunk to treat them in separate thread(s).



 For a while now, I've had a couple of lampposts in my head which I used
 for guidance - apologize for not sharing this broadly prior to this
 discussion, maybe putting it out here will help - certainly hope so.





 Major Releases



 Hadoop continues to benefit tremendously by the investment in stability,
 validation etc. put in by its *anchor* users: Yahoo, Facebook, Twitter,
 eBay, LinkedIn etc.



 A historical perspective...



 In its lifetime, Apache Hadoop went from monthly to quarterly releases
 because, as Hadoop became more and more of a production system (starting
 with hadoop-0.16 and more so with hadoop-0.18), users could not absorb the
 torrid pace of change.



 IMHO, we didn't go far enough in addressing the competing pressures of
 stability v/s rapid innovation.  We paid for it by losing one of our anchor
 users - Facebook - around the time of hadoop-0.19 - they just forked.



 Around the same time, Yahoo hit the same problem (I know, I lived through
 it painfully) and got stuck with hadoop-0.20 for a *very* long time and
 forked to add Security rather than deal with the next major release
 (hadoop-0.21). Later on, Facebook did the same, and, unfortunately for the
 community, is stuck - probably forever - on their fork of hadoop-0.20.



 Overall, these were dark days for the community: every anchor user was on
 their own fork, and it took a toll on the project.



 Recently, thankfully for Hadoop, we have had a period of relative
 stability with hadoop-1.x and hadoop-2.x. Even so, there were close shaves:
 Yahoo was on hadoop-0.23 for a *very* long time - in fact, they are only
 just now finishing their migration to hadoop-2.x.



 I think the major lessons here are the obvious ones:



 # Compatibility matters

 # Maintaining multiple major releases, in parallel, is a big problem - it
 leads to an unproductive, and risky, split in community investment along
 different lines.





 Looking Ahead



 Given the above, here are some thoughts for looking ahead:



 # Be very conservative about major releases - a major benefit is required
 (features) for the cost. Let's not compel our anchor users like Yahoo,
 Twitter, eBay, and LinkedIn to invest in previous releases rather than the
 latest one. Let's hear more from them - and let's be very accommodating to
 them - for they play a key role in keeping Hadoop healthy and stable.



 # Be conservative about dropping support for JDKs. In particular, let's
 hear from our anchor users on their plans for adopting jdk-1.8. LinkedIn
 has already moved to jdk-1.8, which is great for validation, but let's
 wait for the rest of our anchor users to move before we drop jdk-1.7. We
 did the same thing with jdk-1.6 - we waited for them to move to jdk-1.7 before
 we dropped support.



 Overall, I'd love to hear more from Twitter, Yahoo, eBay and other anchor
 users on their plans for jdk-1.8 specifically, and on their overall
 appetite for hadoop-3.  Let's not finalize our plans for moving forward
 until this input has been considered.



 Thoughts?


 thanks,
 Arun



 Disclaimers (unfortunate that they're necessary):

 # Before people point out vendor affiliations to lend unnecessary color to
 my opinions, let me state that hadoop-2 v/s hadoop-3 is a non-issue for us.
 For major HDP versions the key is, just, compatibility... e.g. we ship
 major, but compatible, community releases such as hive-0.13/hive-0.14 in
 HDP-2.x/HDP-2.x+1 etc.

 # Also, release management is a similar non-issue - we have already had
 several individuals step up in hadoop-2.x line. Expect more of the same
 from folks like Andrew, Karthik, Vinod, Steve etc.




-- 
Thanks and Regards,
Mayank
Cell: 408-718-9370


Re: Hadoop 3.x: what about shipping trunk as a 2.x release in 2015?

2015-03-09 Thread Arun Murthy
Steve,


From: Steve Loughran ste...@hortonworks.com
Sent: Monday, March 09, 2015 2:15 PM
To: mapreduce-...@hadoop.apache.org; hdfs-...@hadoop.apache.org; 
common-dev@hadoop.apache.org; yarn-...@hadoop.apache.org
Subject: Hadoop 3.x: what about shipping trunk as a 2.x release in 2015?

Issue: Getting trunk out the door

The main diff between branch-2 and trunk is currently the bash script changes. 
These don't break client apps. They may or may not break bigtop and other downstream 
hadoop stacks, but developers don't need to worry about this: no recompilation 
necessary.

Proposed: ship trunk as a 2.x release, compatible with JDK7 and existing Java code.

It seems to me that I could go

git checkout trunk
mvn versions:set -DnewVersion=2.8.0-SNAPSHOT

We'd then have a version of Hadoop-trunk we could ship later this year, 
compatible at the JDK and API level with existing Java code and JDK7+ 
clusters.

A classpath fix that is optional/compatible can then go out on the 2.x line, 
saving the 3.x tag for something that really breaks things, forces all 
downstream apps to set up new hadoop profiles, have separate modules, and 
generally hate the hadoop dev team.

This seems like a great idea, something I hadn't considered before since most 
patches were flowing into branch-2 anyway - makes a lot of sense.

We could just drop branch-2 while we are at it too. It's just a pain to 
maintain an extra branch. Also, we should formalize that major features should 
always come via feature branches - allows for some oversight on compatibility 
etc. as a whole (not piecemeal) when the feature branch is merged.

In particular, let's also make sure we ship the script changes in a compatible 
manner. Happy to help.

Given that Vinod has stepped up for 2.7, would you like to drive 2.8? 

Practically, this is reality already, but something to formalize: having RMs 
per dot release (Karthik for 2.5, Vinod for 2.7, and Steve for 2.8, etc.).

thanks,
Arun


Re: Looking to a Hadoop 3 release

2015-03-09 Thread Vinod Kumar Vavilapalli

On Mar 6, 2015, at 5:20 PM, Chris Douglas cdoug...@apache.org wrote:

 On Fri, Mar 6, 2015 at 4:32 PM, Vinod Kumar Vavilapalli
 vino...@hortonworks.com wrote:
 I'd encourage everyone to post their wish list on the Roadmap wiki that 
 *warrants* making incompatible changes forcing us to go 3.x.
 
 This is a useful exercise, but not a prerequisite to releasing 3.0.0
 as an alpha off of trunk, right? Andrew summarized the operating
 assumptions for anyone working on it: rolling upgrades still work,
 wire compat is preserved, breaking changes may get rolled back when
 branch-3 is in beta (so be very conservative, notify others loudly).
 This applies to branches merged to trunk, also.


Not a prerequisite for alpha releases, yes. But it will be for a 'GA' release, 
because after that we will be back to restricting incompatible changes on the 3.x 
line and will have to say no to features that need API breakage. If 
others feel there are features that warrant incompatibility, we should hear 
about them for inclusion in such a 3.x release. Until now, the operating 
assumption was to not break anything as far as possible. If we are opening the 
window on incompatibilities in 3.x, we might as well get everyone to think about 
the stuff that they want.



 +1 to Jason's comments on general. We can keep rolling alphas that 
 downstream can pick up, but I'd also like us to clarify the exit criterion 
 for a GA release of 3.0 and its relation to the life of 2.x if we are going 
 this route. This brings us back to the roadmap discussion, and a collective 
 agreement about a logical step at a future point in time where we say we 
 have enough incompatible features in 3.x that we can stop putting more of 
 them and start stabilizing it.
 
 We'll have this discussion again. We don't need to reach consensus on
 the roadmap, just that each artifact reflects the output of the
 project.


Agreed. I wasn't requesting us to reach a consensus on the roadmap. Just 
requesting others to put their wish list up.



 Irrespective of that, here is my proposal in the interim:
 - Run JDK7 + JDK8 first in a compatible manner, like I mentioned before, for 
 at least two releases in branch-2 (say 2.8 and 2.9) before we consider taking 
 up the gauntlet on 3.0.
 - Continue working on the classpath isolation effort and try making it as 
 compatible as is possible for users to opt in and migrate easily.
 
 +1 for 2.x, but again I don't understand the sequencing. -C

There isn't. I was saying "Irrespective of that...".

Thanks,
+Vinod


[jira] [Resolved] (HADOOP-8604) conf/* files overwritten at Hadoop compilation

2015-03-09 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-8604?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer resolved HADOOP-8604.
--
Resolution: Won't Fix

Closing as won't fix.

 conf/* files overwritten at Hadoop compilation
 --

 Key: HADOOP-8604
 URL: https://issues.apache.org/jira/browse/HADOOP-8604
 Project: Hadoop Common
  Issue Type: Bug
  Components: conf
Affects Versions: 1.0.3
Reporter: Robert Grandl
Priority: Minor

 Whenever I compile hadoop from the terminal as:
 ant compile jar run
 all the conf/* files are overwritten. I am not sure if some of them are supposed 
 to be overwritten, but at least hadoop-env.sh, mapred-site.xml, core-site.xml, 
 hdfs-site.xml, masters, and slaves should remain. Otherwise I am forced to 
 back up and restore the content after each compilation.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: Looking to a Hadoop 3 release

2015-03-09 Thread sanjay Radia

 On Mar 5, 2015, at 3:21 PM, Siddharth Seth ss...@apache.org wrote:
 
 2) Simplification of configs - potentially separating client side configs
 and those used by daemons. This is another source of perpetual confusion
 for users.
+ 1 on this.

sanjay

Re: Hadoop 3.x: what about shipping trunk as a 2.x release in 2015?

2015-03-09 Thread Steve Loughran


On 09/03/2015 15:56, Andrew Wang andrew.w...@cloudera.com wrote:

I find this proposal very surprising. We've intentionally deferred
incompatible changes to trunk, because they are incompatible and do not
belong in a minor release. Now we are supposed to blur our eyes and
release
these changes anyway? I don't see this ending well.

I'm staring at CHANGES.TXT and thinking 'how can we ship something off trunk
that has as many of these as we can get out (especially those shell script
bits) in a way that doesn't break everything?'. Because there's a lot of
improvements and bug fixes there which aren't going to be in anyone's hands
for a long time otherwise, not just due to any proposed 3.x release
schedule, but because of the Java 8 requirements as well as the classloader
stuff.




One higher-level goal we should be working towards is tightening our
compatibility guarantees, not loosening them. This is why I've been
highlighting classpath isolation as a 3.0 feature, since this is one of
the
biggest issues faced by our users and downstreams. I think a 3.0 with an
improved compatibility story will make operators and downstreams much
happier than releasing trunk as 2.8.

Best,
Andrew


I still want to see what's being proposed here. Having classpath isolation
will make the JAR upgrade story in 3.x a lot cleaner, but we can't go to
every app that imports hadoop-hdfs-client and say "your code just broke",
not if they want their apps to continue to run on Hadoop 2 and/or Java 7.
And given that Java 7 is still something cluster ops teams are coming
to terms with, that is going to be a while.






[jira] [Resolved] (HADOOP-11571) Über-jira: S3a stabilisation phase I

2015-03-09 Thread Steve Loughran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-11571?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran resolved HADOOP-11571.
-
Resolution: Fixed

 Über-jira: S3a stabilisation phase I
 

 Key: HADOOP-11571
 URL: https://issues.apache.org/jira/browse/HADOOP-11571
 Project: Hadoop Common
  Issue Type: Bug
  Components: fs/s3
Affects Versions: 2.6.0
Reporter: Steve Loughran
Assignee: Steve Loughran
Priority: Blocker
 Fix For: 2.7.0


 s3a shipped in 2.6; now that it's out, various corner cases, scale and 
 error-handling issues are surfacing. 
 Fix them before 2.7 ships.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (HADOOP-7220) documentation lists options in wrong order

2015-03-09 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-7220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer resolved HADOOP-7220.
--
Resolution: Won't Fix

stale

 documentation lists options in wrong order
 --

 Key: HADOOP-7220
 URL: https://issues.apache.org/jira/browse/HADOOP-7220
 Project: Hadoop Common
  Issue Type: Bug
Reporter: Dieter Plaetinck
Priority: Minor
   Original Estimate: 1h
  Remaining Estimate: 1h

 On http://hadoop.apache.org/common/docs/r0.20.2/streaming.html various 
 examples use -D flags.
 I noticed that if you invoke hadoop this way, it won't work:
 
 dplaetin@n-0:/usr/local/hadoop/bin$ ./hadoop jar 
 /usr/local/hadoop/contrib/streaming/hadoop-0.20.2-streaming.jar -file 
 /proj/Search/wall/experiment/  -mapper './build-models.py --mapper'   
 -reducer './build-models.py --reducer'   -input sim-input -output sim-output 
 -D 
 mapred.output.key.comparator.class=org.apache.hadoop.mapred.lib.KeyFieldBasedComparator
  -D mapred.text.key.comparator.options=-k1,2n 
 11/04/12 10:39:28 ERROR streaming.StreamJob: Unrecognized option: -D
 Usage: $HADOOP_HOME/bin/hadoop jar \
   $HADOOP_HOME/hadoop-streaming.jar [options]
 Options:
   -input    <path>     DFS input file(s) for the Map step
   -output   <path>     DFS output directory for the Reduce step
   -mapper   <cmd|JavaClassName>  The streaming command to run
   -combiner <JavaClassName>  Combiner has to be a Java class
   -reducer  <cmd|JavaClassName>  The streaming command to run
   -file     <file>     File/dir to be shipped in the Job jar file
   -inputformat 
 TextInputFormat(default)|SequenceFileAsTextInputFormat|JavaClassName Optional.
   -outputformat TextOutputFormat(default)|JavaClassName  Optional.
   -partitioner JavaClassName  Optional.
   -numReduceTasks <num>  Optional.
   -inputreader <spec>  Optional.
   -cmdenv   <n>=<v>    Optional. Pass env.var to streaming commands
   -mapdebug <path>  Optional. To run this script when a map task fails 
   -reducedebug <path>  Optional. To run this script when a reduce task fails 
   -verbose
 Generic options supported are
 -conf <configuration file>     specify an application configuration file
 -D <property=value>            use value for given property
 -fs <local|namenode:port>      specify a namenode
 -jt <local|jobtracker:port>    specify a job tracker
 -files <comma separated list of files>    specify comma separated files to be 
 copied to the map reduce cluster
 -libjars <comma separated list of jars>    specify comma separated jar files 
 to include in the classpath.
 -archives <comma separated list of archives>    specify comma separated 
 archives to be unarchived on the compute machines.
 The general command line syntax is
 bin/hadoop command [genericOptions] [commandOptions]
 For more details about these options:
 Use $HADOOP_HOME/bin/hadoop jar build/hadoop-streaming.jar -info
 Streaming Job Failed!
 
 I could only make it work by moving the -D flags to the front (right after 
 the streaming.jar part). Maybe because it's a generic option, it needs to come 
 first.
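
That is the expected behaviour: generic options are parsed off the front of the argument list by GenericOptionsParser before the tool-specific options are handled. A sketch using the standard Tool/ToolRunner API (the class name is made up; this is not the streaming code itself):

{code:java}
import java.util.Arrays;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class GenericOptionsDemo extends Configured implements Tool {
  @Override
  public int run(String[] args) {
    // Generic options (-D, -conf, -fs, -jt, -files, -libjars, -archives) have
    // already been stripped from argv and applied to getConf() at this point.
    System.out.println("tool-specific args: " + Arrays.toString(args));
    System.out.println("comparator options: "
        + getConf().get("mapred.text.key.comparator.options"));
    return 0;
  }

  public static void main(String[] args) throws Exception {
    System.exit(ToolRunner.run(new Configuration(), new GenericOptionsDemo(), args));
  }
}
{code}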



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HADOOP-11698) remove distcpv1 from hadoop-extras

2015-03-09 Thread Allen Wittenauer (JIRA)
Allen Wittenauer created HADOOP-11698:
-

 Summary: remove distcpv1 from hadoop-extras
 Key: HADOOP-11698
 URL: https://issues.apache.org/jira/browse/HADOOP-11698
 Project: Hadoop Common
  Issue Type: Bug
  Components: tools/distcp
Affects Versions: 3.0.0
Reporter: Allen Wittenauer


distcpv1 is pretty much unsupported. We should just remove it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: Hadoop - Major releases

2015-03-09 Thread Andrew Wang
Hi Mayank,

Note that Hadoop 3 does not mean the end of updates for Hadoop 2.x, which
will keep supporting JDK7 for a while yet. Someone on the original thread
also proposed keeping Hadoop 3 JDK7-source compatible to make backports to
2.x easier. I support this.

Note also that the jump from Hadoop 1 to Hadoop 2 (which is what I assume
was your previous migration) is a far, far more impactful change than what
is being proposed for Hadoop 3. Hadoop 3 will look basically like a 2.x
release except for the JDK8 bump and classpath isolation. The intent is to
otherwise maintain wire and API compatibility.

Overall your timeline sounds like it fits the schedule I proposed. If we
release a 3.0 GA this year, it means you can upgrade to a baked 3.1 or 3.2
next year. Seems like a sound upgrade procedure for a large cluster.

Best,
Andrew

On Mon, Mar 9, 2015 at 2:24 PM, Mayank Bansal maban...@gmail.com wrote:

 Hi Guys,

 From my prospective @ ebay we are not going to upgrade to JDK 8 any time
 soon we just upgraded to 7 and not want to move further at least this year
 so I will request you guys not to drop the support for JDK 7 as that would
 be very crucial for us to move forward.

 We also just completed our Hadoop 2 migration for all clusters this year
 which we started earlier last year, so I don't think we can do again major
 upgrades this year. Stabilizing the major releases takes lots of effort and
 time, I think Hadoop 3.x makes sense at least for us next year.

 Thanks,

 Mayank

 On Mon, Mar 9, 2015 at 12:29 AM, Arun Murthy a...@hortonworks.com wrote:

  Over the last few days, we have had lots of discussions that have
  intertwined several major themes:
 
 
 
  # When/why do we make major Hadoop releases?
 
  # When/how do we move to major JDK versions?
 
  # To a lesser extent, we have debated another theme: what do we do about
  trunk?
 
 
 
  For now, let's park JDK  trunk to treat them in a separate thread(s).
 
 
 
  For a while now, I've had a couple of lampposts in my head which I used
  for guidance - apologize for not sharing this broadly prior to this
  discussion, maybe putting it out here will help - certainly hope so.
 
 
 
 
 
  Major Releases
 
 
 
  Hadoop continues to benefit tremendously by the investment in stability,
  validation etc. put in by its *anchor* users: Yahoo, Facebook, Twitter,
  eBay, LinkedIn etc.
 
 
 
  A historical perspective...
 
 
 
  In it's lifetime, Apache Hadoop went from monthly to quarterly releases
  because, as Hadoop became more and more of a production system (starting
  with hadoop-0.16 and more so with hadoop 0.18), users could not absorb
 the
  torrid pace of change.
 
 
 
  IMHO, we didn't go far enough in addressing the competing pressures of
  stability v/s rapid innovation.  We paid for it by losing one of our
 anchor
  users - Facebook - around the time of hadoop-0.19 - they just forked.
 
 
 
  Around the same time, Yahoo hit the same problem (I know, I lived through
  it painfully) and got stuck with hadoop-0.20 for a *very* long time and
  forked to add Security rather than deal with the next major release
  (hadoop-0.21). Later on, Facebook did the same, and, unfortunately for
 the
  community, is stuck - probably forever - on their fork of hadoop-0.20.
 
 
 
  Overall, these were dark days for the community: every anchor user was on
  their own fork, and it took a toll on the project.
 
 
 
  Recently, thankfully for Hadoop, we have had a period of relative
  stability with hadoop-1.x and hadoop-2.x. Even so, there were close
 shaves:
  Yahoo was on hadoop-0.23 for a *very* long time - in fact, they are only
  just now finishing their migration to hadoop-2.x.
 
 
 
  I think the major lessons here are the obvious ones:
 
 
 
  # Compatibility matters
 
  # Maintaining ?multiple major releases, in parallel, is a big problem -
 it
  leads to an unproductive, and risky, split in community investment along
  different lines.
 
 
 
 
 
  Looking Ahead
 
 
 
  Given the above, here are some thoughts for looking ahead:
 
 
 
  # Be very conservative about major releases - a major benefit is required
  (features) for the cost. Let's not compel our anchor users like Yahoo,
  Twitter, eBay, and LinkedIn to invest in previous releases rather than
 the
  latest one. Let's hear more from them - and let's be very accommodating
 to
  them - for they play a key role in keeping Hadoop healthy  stable.
 
 
 
  # Be conservative about dropping support for JDKs. In particular, let's
  hear from our anchor users on their plans for adoption jdk-1.8. LinkedIn
  has already moved to jdk-1.8, which is great for the validation , but
 let's
  wait for the rest of our anchor users to move before we drop jdk-1.7. We
  did the same thing with jdk-1.6 - waited for them to move before we drop
  support for jdk-1.7.
 
 
 
  Overall, I'd love to hear more from Twitter, Yahoo, eBay and other anchor
  users on their plans for jdk-1.8 specifically, and on their overall
  

Re: Hadoop - Major releases

2015-03-09 Thread Mayank Bansal
Hi Andrew,

I wish things were as simple as you are pointing out. At least they are not
for us so far.

A couple of things:

1. We would move to Hadoop 3 (not this year, though); however, I don't
see how we can do another JDK upgrade so soon. So the point I am trying to make
is that we should support JDK 7 as well for Hadoop 3.

2. For the sake of JDK 8 and classpath isolation we shouldn't be making
another major release, as those can be supported in Hadoop 2 as well. So what is
the motivation for making Hadoop 3 so soon?

Thanks,

Mayank

On Mon, Mar 9, 2015 at 3:34 PM, Andrew Wang andrew.w...@cloudera.com
wrote:

 Hi Mayank,

 Note that Hadoop 3 does not mean the end of updates for Hadoop 2.x, which
 will keep supporting JDK7 for a while yet. Someone on the original thread
 also proposed keeping Hadoop 3 JDK7-source compatible to make backports to
 2.x easier. I support this.

 Note also that the jump from Hadoop 1 to Hadoop 2 (which is what I assume
 was your previous migration) is a far, far more impactful change than what
 is being proposed for Hadoop 3. Hadoop 3 will look basically like a 2.x
 release except for the JDK8 bump and classpath isolation. The intent is to
 otherwise maintain wire and API compatibility.

 Overall your timeline sounds like it fits the schedule I proposed. If we
 release a 3.0 GA this year, it means you can upgrade to a baked 3.1 or 3.2
 next year. Seems like a sound upgrade procedure for a large cluster.

 Best,
 Andrew

 On Mon, Mar 9, 2015 at 2:24 PM, Mayank Bansal maban...@gmail.com wrote:

  Hi Guys,
 
  From my prospective @ ebay we are not going to upgrade to JDK 8 any time
  soon we just upgraded to 7 and not want to move further at least this
 year
  so I will request you guys not to drop the support for JDK 7 as that
 would
  be very crucial for us to move forward.
 
  We also just completed our Hadoop 2 migration for all clusters this year
  which we started earlier last year, so I don't think we can do again
 major
  upgrades this year. Stabilizing the major releases takes lots of effort
 and
  time, I think Hadoop 3.x makes sense at least for us next year.
 
  Thanks,
 
  Mayank
 
  On Mon, Mar 9, 2015 at 12:29 AM, Arun Murthy a...@hortonworks.com
 wrote:
 
   Over the last few days, we have had lots of discussions that have
   intertwined several major themes:
  
  
  
   # When/why do we make major Hadoop releases?
  
   # When/how do we move to major JDK versions?
  
   # To a lesser extent, we have debated another theme: what do we do
 about
   trunk?
  
  
  
   For now, let's park JDK  trunk to treat them in a separate thread(s).
  
  
  
   For a while now, I've had a couple of lampposts in my head which I used
   for guidance - apologize for not sharing this broadly prior to this
   discussion, maybe putting it out here will help - certainly hope so.
  
  
  
  
  
   Major Releases
  
  
  
   Hadoop continues to benefit tremendously by the investment in
 stability,
   validation etc. put in by its *anchor* users: Yahoo, Facebook, Twitter,
   eBay, LinkedIn etc.
  
  
  
   A historical perspective...
  
  
  
   In it's lifetime, Apache Hadoop went from monthly to quarterly releases
   because, as Hadoop became more and more of a production system
 (starting
   with hadoop-0.16 and more so with hadoop 0.18), users could not absorb
  the
   torrid pace of change.
  
  
  
   IMHO, we didn't go far enough in addressing the competing pressures of
   stability v/s rapid innovation.  We paid for it by losing one of our
  anchor
   users - Facebook - around the time of hadoop-0.19 - they just forked.
  
  
  
   Around the same time, Yahoo hit the same problem (I know, I lived
 through
   it painfully) and got stuck with hadoop-0.20 for a *very* long time and
   forked to add Security rather than deal with the next major release
   (hadoop-0.21). Later on, Facebook did the same, and, unfortunately for
  the
   community, is stuck - probably forever - on their fork of hadoop-0.20.
  
  
  
   Overall, these were dark days for the community: every anchor user was
 on
   their own fork, and it took a toll on the project.
  
  
  
   Recently, thankfully for Hadoop, we have had a period of relative
   stability with hadoop-1.x and hadoop-2.x. Even so, there were close
  shaves:
   Yahoo was on hadoop-0.23 for a *very* long time - in fact, they are
 only
   just now finishing their migration to hadoop-2.x.
  
  
  
   I think the major lessons here are the obvious ones:
  
  
  
   # Compatibility matters
  
   # Maintaining ?multiple major releases, in parallel, is a big problem -
  it
   leads to an unproductive, and risky, split in community investment
 along
   different lines.
  
  
  
  
  
   Looking Ahead
  
  
  
   Given the above, here are some thoughts for looking ahead:
  
  
  
   # Be very conservative about major releases - a major benefit is
 required
   (features) for the cost. Let's not compel our anchor users like Yahoo,
   Twitter, eBay, and LinkedIn to 

[jira] [Created] (HADOOP-11699) _HOST not consistently resolving to lowercase fully qualified hostname

2015-03-09 Thread Kevin Minder (JIRA)
Kevin Minder created HADOOP-11699:
-

 Summary: _HOST not consistently resolving to lowercase fully 
qualified hostname
 Key: HADOOP-11699
 URL: https://issues.apache.org/jira/browse/HADOOP-11699
 Project: Hadoop Common
  Issue Type: Bug
  Components: security
Affects Versions: 2.6.0
Reporter: Kevin Minder


The _HOST marker used for Kerberos principals in various configuration files 
does not always resolve to a lowercase fully qualified hostname.  For example, this 
setting in hdfs-site.xml:
{code}
<property>
  <name>dfs.namenode.kerberos.principal</name>
  <value>hdfs/_HOST@YOURREALM.COM</value>
</property>
{code}

In particular, this is impeding our work to have Hadoop work with equivalent 
security on Windows as on Linux.

In the Windows environment in which I'm having the issue, I was able to get a fully 
qualified host name using this version of the getLocalHostName() method in 
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/SecurityUtil.java:

{code:java}
  public static String getLocalHostName() throws UnknownHostException {
    String hostname = InetAddress.getLocalHost().getCanonicalHostName();
    if ( !hostname.contains( "." ) ) {
      final String os = System.getProperties().getProperty( "os.name", "?" ).toLowerCase();
      if ( os.startsWith( "windows" ) ) {
        String domain = System.getenv( "USERDNSDOMAIN" );
        if ( domain != null ) {
          hostname += "." + domain.trim();
        }
      }
    }
    return hostname == null ? "localhost" : hostname.toLowerCase();
  }
{code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: Hadoop - Major releases

2015-03-09 Thread Jagane Sundar
Thichphuoctien
On Mar 9, 2015 3:35 PM, Andrew Wang andrew.w...@cloudera.com wrote:

 Hi Mayank,

 Note that Hadoop 3 does not mean the end of updates for Hadoop 2.x, which
 will keep supporting JDK7 for a while yet. Someone on the original thread
 also proposed keeping Hadoop 3 JDK7-source compatible to make backports to
 2.x easier. I support this.

 Note also that the jump from Hadoop 1 to Hadoop 2 (which is what I assume
 was your previous migration) is a far, far more impactful change than what
 is being proposed for Hadoop 3. Hadoop 3 will look basically like a 2.x
 release except for the JDK8 bump and classpath isolation. The intent is to
 otherwise maintain wire and API compatibility.

 Overall your timeline sounds like it fits the schedule I proposed. If we
 release a 3.0 GA this year, it means you can upgrade to a baked 3.1 or 3.2
 next year. Seems like a sound upgrade procedure for a large cluster.

 Best,
 Andrew

 On Mon, Mar 9, 2015 at 2:24 PM, Mayank Bansal maban...@gmail.com wrote:

  Hi Guys,
 
  From my prospective @ ebay we are not going to upgrade to JDK 8 any time
  soon we just upgraded to 7 and not want to move further at least this
 year
  so I will request you guys not to drop the support for JDK 7 as that
 would
  be very crucial for us to move forward.
 
  We also just completed our Hadoop 2 migration for all clusters this year
  which we started earlier last year, so I don't think we can do again
 major
  upgrades this year. Stabilizing the major releases takes lots of effort
 and
  time, I think Hadoop 3.x makes sense at least for us next year.
 
  Thanks,
 
  Mayank
 
  On Mon, Mar 9, 2015 at 12:29 AM, Arun Murthy a...@hortonworks.com
 wrote:
 
   Over the last few days, we have had lots of discussions that have
   intertwined several major themes:
  
  
  
   # When/why do we make major Hadoop releases?
  
   # When/how do we move to major JDK versions?
  
   # To a lesser extent, we have debated another theme: what do we do
 about
   trunk?
  
  
  
   For now, let's park JDK  trunk to treat them in a separate thread(s).
  
  
  
   For a while now, I've had a couple of lampposts in my head which I used
   for guidance - apologize for not sharing this broadly prior to this
   discussion, maybe putting it out here will help - certainly hope so.
  
  
  
  
  
   Major Releases
  
  
  
   Hadoop continues to benefit tremendously by the investment in
 stability,
   validation etc. put in by its *anchor* users: Yahoo, Facebook, Twitter,
   eBay, LinkedIn etc.
  
  
  
   A historical perspective...
  
  
  
   In it's lifetime, Apache Hadoop went from monthly to quarterly releases
   because, as Hadoop became more and more of a production system
 (starting
   with hadoop-0.16 and more so with hadoop 0.18), users could not absorb
  the
   torrid pace of change.
  
  
  
   IMHO, we didn't go far enough in addressing the competing pressures of
   stability v/s rapid innovation.  We paid for it by losing one of our
  anchor
   users - Facebook - around the time of hadoop-0.19 - they just forked.
  
  
  
   Around the same time, Yahoo hit the same problem (I know, I lived
 through
   it painfully) and got stuck with hadoop-0.20 for a *very* long time and
   forked to add Security rather than deal with the next major release
   (hadoop-0.21). Later on, Facebook did the same, and, unfortunately for
  the
   community, is stuck - probably forever - on their fork of hadoop-0.20.
  
  
  
   Overall, these were dark days for the community: every anchor user was
 on
   their own fork, and it took a toll on the project.
  
  
  
   Recently, thankfully for Hadoop, we have had a period of relative
   stability with hadoop-1.x and hadoop-2.x. Even so, there were close
  shaves:
   Yahoo was on hadoop-0.23 for a *very* long time - in fact, they are
 only
   just now finishing their migration to hadoop-2.x.
  
  
  
   I think the major lessons here are the obvious ones:
  
  
  
   # Compatibility matters
  
   # Maintaining ?multiple major releases, in parallel, is a big problem -
  it
   leads to an unproductive, and risky, split in community investment
 along
   different lines.
  
  
  
  
  
   Looking Ahead
  
  
  
   Given the above, here are some thoughts for looking ahead:
  
  
  
   # Be very conservative about major releases - a major benefit is
 required
   (features) for the cost. Let's not compel our anchor users like Yahoo,
   Twitter, eBay, and LinkedIn to invest in previous releases rather than
  the
   latest one. Let's hear more from them - and let's be very accommodating
  to
   them - for they play a key role in keeping Hadoop healthy  stable.
  
  
  
   # Be conservative about dropping support for JDKs. In particular, let's
   hear from our anchor users on their plans for adoption jdk-1.8.
 LinkedIn
   has already moved to jdk-1.8, which is great for the validation , but
  let's
   wait for the rest of our anchor users to move before we drop jdk-1.7.
 We
   did the 

[jira] [Resolved] (HADOOP-9086) Enforce process singleton rules through an exclusive write lock on a file, not a pid file +kill -0,

2015-03-09 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-9086?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer resolved HADOOP-9086.
--
Resolution: Won't Fix

I'm going to set this as won't fix.  Introducing more dependencies at this 
level sounds like a bad thing, especially given that every ops person has their own 
preferences as to what to use here.

 Enforce process singleton rules through an exclusive write lock on a file, 
 not a pid file +kill -0,
 ---

 Key: HADOOP-9086
 URL: https://issues.apache.org/jira/browse/HADOOP-9086
 Project: Hadoop Common
  Issue Type: Improvement
  Components: scripts, util
Affects Versions: 1.1.1, 2.0.3-alpha
 Environment: Unix/Linux. 
Reporter: Steve Loughran

 the {{hadoop-daemon.sh}} script (and other liveness monitors) probes for the 
 existence of a daemon service by doing a {{kill -0}} on a process id picked up from 
 a pid file. 
 This is flawed:
 # pid file locations may change between installations.
 # Linux and Unix recycle pids, leading to false positives - the scripts think 
 the process is running when in fact another process is.
 # it doesn't work on Windows.
 Having the processes acquire an exclusive write lock on a known file would 
 delegate lock management, and implicitly liveness, to the OS itself: when the 
 process dies, the lock is released (on Unixes).
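
As an illustration, a minimal sketch of the proposed approach using java.nio file locks (the lock-file path and class name are made up):

{code:java}
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

public class SingletonLock {
  public static void main(String[] args) throws IOException {
    // Hypothetical lock file location; one well-known file per daemon type.
    FileChannel channel = FileChannel.open(Paths.get("/var/run/hadoop/namenode.lock"),
        StandardOpenOption.CREATE, StandardOpenOption.WRITE);
    FileLock lock = channel.tryLock();   // exclusive; null if another process holds it
    if (lock == null) {
      System.err.println("another instance already holds the lock; exiting");
      System.exit(1);
    }
    // The OS releases the lock when this process exits for any reason, so there
    // is no stale pid file and no pid-recycling false positive.
    System.out.println("lock acquired; running as the singleton instance");
  }
}
{code}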



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HADOOP-11700) JDiff package is not found while building the project of hadoop-common from hadoop source code.

2015-03-09 Thread Radhanpura Aashish (JIRA)
Radhanpura Aashish created HADOOP-11700:
---

 Summary: JDiff package is not found while building the project of 
hadoop-common from hadoop source code.
 Key: HADOOP-11700
 URL: https://issues.apache.org/jira/browse/HADOOP-11700
 Project: Hadoop Common
  Issue Type: Bug
Reporter: Radhanpura Aashish


hadoop-trunk/hadoop-common-project/hadoop-annotations/src/main/java/org/apache/hadoop/classification/tools/ExcludePrivateAnnotationsJDiffDoclet.java:[24,13]
 package jdiff does not exist
[ERROR] 
/home/thelionheart/Downloads/hadoop-trunk/hadoop-common-project/hadoop-annotations/src/main/java/org/apache/hadoop/classification/tools/ExcludePrivateAnnotationsJDiffDoclet.java:[42,12]
 cannot find symbol
  symbol:   variable JDiff
  location: class 
org.apache.hadoop.classification.tools.ExcludePrivateAnnotationsJDiffDoclet



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (HADOOP-11680) Deduplicate jars in convenience binary distribution

2015-03-09 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-11680?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer resolved HADOOP-11680.
---
Resolution: Duplicate

I'm going to close this as a dupe of HADOOP-10115, especially since that was 
just committed.

 Deduplicate jars in convenience binary distribution
 ---

 Key: HADOOP-11680
 URL: https://issues.apache.org/jira/browse/HADOOP-11680
 Project: Hadoop Common
  Issue Type: Improvement
  Components: build
Reporter: Sean Busbey
Assignee: Sean Busbey

 Pulled from the discussion on HADOOP-11656, where Colin wrote:
 {quote}
 bq. Andrew wrote: One additional note related to this, we can spend a lot of 
 time right now distributing 100s of MBs of jar dependencies when launching a 
 YARN job. Maybe this is ameliorated by the new shared distributed cache, but 
 I've heard this come up quite a bit as a complaint. If we could meaningfully 
 slim down our client, it could lead to a nice win.
 I'm frustrated that nobody responded to my earlier suggestion that we 
 de-duplicate jars. This would drastically reduce the size of our install, and 
 without rearchitecting anything.
 In fact I was so frustrated that I decided to write a program to do it myself 
 and measure the delta. Here it is:
 Before:
 {code}
 du -h /h
  249M    /h
 {code}
 After:
 {code}
 du -h /h
  140M    /h
 {code}
 Seems like deduplicating jars would be a much better project than splitting 
 into a client jar, if we really cared about this.
 snip
 {quote}
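
The measurement program Colin mentions isn't reproduced in the quote. Purely as 
a sketch of the approach being discussed (my own illustration, not the actual 
tool): hash every jar under the distribution and replace byte-identical copies 
with relative symlinks.

{code}
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.security.MessageDigest;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class DedupJars {
  public static void main(String[] args) throws Exception {
    Path root = Paths.get(args[0]);                 // e.g. the extracted tarball
    Map<String, Path> firstSeen = new HashMap<>();
    List<Path> jars;
    try (Stream<Path> walk = Files.walk(root)) {
      jars = walk.filter(p -> p.toString().endsWith(".jar"))
                 .sorted()
                 .collect(Collectors.toList());
    }
    for (Path jar : jars) {
      String digest = sha256(jar);
      Path original = firstSeen.putIfAbsent(digest, jar);
      if (original != null) {
        // Same bytes already exist elsewhere: replace this copy with a symlink.
        Files.delete(jar);
        Files.createSymbolicLink(jar, jar.getParent().relativize(original));
      }
    }
  }

  private static String sha256(Path p) throws Exception {
    MessageDigest md = MessageDigest.getInstance("SHA-256");
    md.update(Files.readAllBytes(p));
    StringBuilder sb = new StringBuilder();
    for (byte b : md.digest()) {
      sb.append(String.format("%02x", b));
    }
    return sb.toString();
  }
}
{code}

Symlinks keep every existing classpath entry valid, which is why this kind of 
de-duplication doesn't require re-architecting the layout.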



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (HADOOP-7332) Deadlock in IPC

2015-03-09 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-7332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer resolved HADOOP-7332.
--
Resolution: Won't Fix

stale

 Deadlock in IPC
 ---

 Key: HADOOP-7332
 URL: https://issues.apache.org/jira/browse/HADOOP-7332
 Project: Hadoop Common
  Issue Type: Bug
  Components: ipc
Affects Versions: 0.22.0
Reporter: Todd Lipcon

 Saw this during a run of TestIPC on 0.22 branch:
 [junit] Java stack information for the threads listed above:
 [junit] ===
 [junit] IPC Client (47) connection to /0:0:0:0:0:0:0:0:48853 from an 
 unknown user:
 [junit] at 
 org.apache.hadoop.ipc.Client$ParallelResults.callComplete(Client.java:879)
 [junit] - waiting to lock 0xf599ef88 (a 
 org.apache.hadoop.ipc.Client$ParallelResults)
 [junit] at 
 org.apache.hadoop.ipc.Client$ParallelCall.callComplete(Client.java:862)
 [junit] at 
 org.apache.hadoop.ipc.Client$Call.setException(Client.java:185)
 [junit] - locked 0xf59e2818 (a 
 org.apache.hadoop.ipc.Client$ParallelCall)
 [junit] at 
 org.apache.hadoop.ipc.Client$Connection.cleanupCalls(Client.java:843)
 [junit] at 
 org.apache.hadoop.ipc.Client$Connection.close(Client.java:832)
 [junit] - locked 0xf59d8a90 (a 
 org.apache.hadoop.ipc.Client$Connection)
 [junit] at 
 org.apache.hadoop.ipc.Client$Connection.run(Client.java:708)
 [junit] Thread-242:
 [junit] at 
 org.apache.hadoop.ipc.Client$Connection.markClosed(Client.java:788)
 [junit] - waiting to lock 0xf59d8a90 (a 
 org.apache.hadoop.ipc.Client$Connection)
 [junit] at 
 org.apache.hadoop.ipc.Client$Connection.sendParam(Client.java:742)
 [junit] at org.apache.hadoop.ipc.Client.call(Client.java:1109)
 [junit] - locked 0xf599ef88 (a 
 org.apache.hadoop.ipc.Client$ParallelResults)
 [junit] at 
 org.apache.hadoop.ipc.TestIPC$ParallelCaller.run(TestIPC.java:135)
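
 The trace above is a lock-ordering inversion: the connection thread holds the 
 Connection monitor and waits for ParallelResults, while the calling thread 
 holds ParallelResults and waits for the Connection monitor. A stripped-down, 
 generic illustration of that pattern (not the actual IPC code) is:

{code}
public class LockOrderDeadlock {
  private static final Object connection = new Object();  // stands in for Client$Connection
  private static final Object results = new Object();     // stands in for Client$ParallelResults

  public static void main(String[] args) {
    // Thread A: mirrors Connection.close() -> cleanupCalls() -> callComplete()
    new Thread(() -> {
      synchronized (connection) {
        pause();
        synchronized (results) { }     // waits for results while holding connection
      }
    }).start();

    // Thread B: mirrors Client.call() -> sendParam() -> markClosed()
    new Thread(() -> {
      synchronized (results) {
        pause();
        synchronized (connection) { }  // waits for connection while holding results
      }
    }).start();
  }

  private static void pause() {
    try {
      Thread.sleep(100);               // widen the window so the deadlock reproduces
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
  }
}
{code}

 The usual fix for this pattern is to establish a single lock acquisition order 
 (or release one monitor before taking the other), rather than widening 
 synchronization.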



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: Hadoop - Major releases

2015-03-09 Thread Andrew Wang
Hi Mayank,


 1. We would be moving to Hadoop-3 (not this year though); however, I don't
 see how we can do another JDK upgrade so soon. So the point I am trying to make
 is we should be supporting jdk 7 as well for Hadoop-3.

 We'll still be releasing 2.x releases for a while, with similar
feature sets as 3.x. You can keep using 2.x until you feel ready to jump to
JDK8.


 2. For the sake of JDK 8 and classpath isolation we shouldn't be making
 another release as those can be supported in Hadoop 2 as well, so what is
 the motivation for making Hadoop 3 so soon?

 So already you can run 2.x with JDK8 and some degree of classpath
isolation, but I've discussed the motivation for a 3.0 on the previous
thread. We had issues in the JDK6 days with our dependencies not supporting
JDK6 and thus not releasing security or bug fixes, which in turn put us in
a bad spot. Classpath isolation we are still discussing, but right now it's
opt-in and somewhat incomplete, which makes it hard for downstream projects
to effectively make use of it. The goal for 3.0 is to clean this up and
have it on by default (or always).

Best,
Andrew


Re: Looking to a Hadoop 3 release

2015-03-09 Thread Raymie Stata
Avoiding the use of JDK8 language features (and, presumably, APIs)
means you've abandoned #1, i.e., you haven't (really) bumped the JDK
source version to JDK8.

Also, note that releasing from trunk is a way of achieving #3; it's
not a way of abandoning it.



On Mon, Mar 9, 2015 at 7:10 PM, Andrew Wang andrew.w...@cloudera.com wrote:
 Hi Raymie,

 Konst proposed just releasing off of trunk rather than cutting a branch-2,
 and there was general agreement there. So, consider #3 abandoned. #1 and #2 can
 be achieved at the same time; we just need to avoid using JDK8 language
 features in trunk so things can be backported.

 Best,
 Andrew

 On Mon, Mar 9, 2015 at 7:01 PM, Raymie Stata rst...@altiscale.com wrote:

 In this (and the related threads), I see the following three requirements:

 1. Bump the source JDK version to JDK8 (ie, drop JDK7 support).

 2. We'll still be releasing 2.x releases for a while, with similar
 feature sets as 3.x.

 3. Avoid the risk of split-brain behavior by minimizing backporting
 headaches. Pulling trunk > branch-2 > branch-2.x is already tedious.
 Adding a branch-3, branch-3.x would be obnoxious.

 These three cannot be achieved at the same time.  Which do we abandon?


 On Mon, Mar 9, 2015 at 12:45 PM, sanjay Radia sanjayo...@gmail.com
 wrote:
 
  On Mar 5, 2015, at 3:21 PM, Siddharth Seth ss...@apache.org wrote:
 
  2) Simplification of configs - potentially separating client side
 configs
  and those used by daemons. This is another source of perpetual confusion
  for users.
  + 1 on this.
 
  sanjay



[jira] [Resolved] (HADOOP-7648) Update date from 2009 to 2011

2015-03-09 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-7648?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer resolved HADOOP-7648.
--
Resolution: Incomplete

closing as stale

 Update date from 2009 to 2011
 -

 Key: HADOOP-7648
 URL: https://issues.apache.org/jira/browse/HADOOP-7648
 Project: Hadoop Common
  Issue Type: Bug
  Components: build, documentation
Affects Versions: 0.22.0
Reporter: Joep Rottinghuis

 Build files contain a year parameter that shows up in the UI.
 Some of the documentation has 2009 copyrights on it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: Hadoop 3.x: what about shipping trunk as a 2.x release in 2015?

2015-03-09 Thread Allen Wittenauer


Between this and the other thread, I’m seeing:

* companies that were forced to make internal forks because their 
patches were ignored are now considered the deciders for whether we move forward
* 5 years since the last branch off of trunk is considered ‘soon’
* More good reasons to kill hadoop 2.7 and release hadoop 3.0 as the 
JDK7 release
* We are now OPENLY hostile to operations teams
* No one seems to really care that we’re about to create an absolute 
nightmare for anyone that uses maven repos, as they’ll need to keep track of 
which jars have been compiled with which JVM with zero hints from our build 
artifacts



On Mar 9, 2015, at 4:18 PM, Steve Loughran ste...@hortonworks.com wrote:

 
 
 On 09/03/2015 15:56, Andrew Wang andrew.w...@cloudera.com wrote:
 
 I find this proposal very surprising. We've intentionally deferred
 incompatible changes to trunk, because they are incompatible and do not
 belong in a minor release. Now we are supposed to blur our eyes and
 release
 these changes anyway? I don't see this ending well.
 
 I'm staring at CHANGES.TXT & thinking 'how can we ship something off trunk
 that has as many of these as we can get out - especially those shell script
 bits - in a way that doesn't break everything?' Because there's a lot of
 improvements and bug fixes there which aren't going to be in anyone's hands
 for a long time otherwise, not just due to any proposed 3.x release
 schedule, but because of the Java 8 requirements as well as classloader
 stuff.
 
 
 
 
 One higher-level goal we should be working towards is tightening our
 compatibility guarantees, not loosening them. This is why I've been
 highlighting classpath isolation as a 3.0 feature, since this is one of
 the
 biggest issues faced by our users and downstreams. I think a 3.0 with an
 improved compatibility story will make operators and downstreams much
 happier than releasing trunk as 2.8.
 
 Best,
 Andrew
 
 
 I still want to see what's being proposed here. Having classpath isolation
 will make the JAR upgrade story in 3.x a lot cleaner, but we can't go to
 every app that imports hadoop-hdfs-client and say "your code just broke",
 not if they want their apps to continue to run on Hadoop 2 and/or Java 7.
 Which, given that Java 7 is still something cluster ops teams are coming
 to terms with, is going to be a while.
 
 
 
 



Re: Looking to a Hadoop 3 release

2015-03-09 Thread Raymie Stata
In this (and the related threads), I see the following three requirements:

1. Bump the source JDK version to JDK8 (ie, drop JDK7 support).

2. We'll still be releasing 2.x releases for a while, with similar
feature sets as 3.x.

3. Avoid the risk of split-brain behavior by minimizing backporting
headaches. Pulling trunk > branch-2 > branch-2.x is already tedious.
Adding a branch-3, branch-3.x would be obnoxious.

These three cannot be achieved at the same time.  Which do we abandon?


On Mon, Mar 9, 2015 at 12:45 PM, sanjay Radia sanjayo...@gmail.com wrote:

 On Mar 5, 2015, at 3:21 PM, Siddharth Seth ss...@apache.org wrote:

 2) Simplification of configs - potentially separating client side configs
 and those used by daemons. This is another source of perpetual confusion
 for users.
 + 1 on this.

 sanjay


[jira] [Resolved] (HADOOP-7027) Temporary Fix to handle problem in org/apache/hadoop/security/UserGroupInformation.java when using Apache Harmony

2015-03-09 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-7027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer resolved HADOOP-7027.
--
Resolution: Won't Fix

 Temporary Fix to handle problem in 
 org/apache/hadoop/security/UserGroupInformation.java when using Apache Harmony
 -

 Key: HADOOP-7027
 URL: https://issues.apache.org/jira/browse/HADOOP-7027
 Project: Hadoop Common
  Issue Type: New Feature
Affects Versions: 0.21.0
 Environment: SLE v. 11, Apache Harmony 6
Reporter: Guillermo Cabrera
Priority: Trivial
 Attachments: HADOOP-7027.patch


 Building and running Hadoop Common is not possible with the error outlined in 
 HADOOP-6941. To address the problem for someone using Apache Harmony, we have 
 created a temporary fix specific to Harmony (it will fail on other JREs) that 
 works around the issue.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: Looking to a Hadoop 3 release

2015-03-09 Thread Andrew Wang
Hi Raymie,

Konst proposed just releasing off of trunk rather than cutting a branch-2,
and there was general agreement there. So, consider #3 abandoned. #1 and #2 can
be achieved at the same time; we just need to avoid using JDK8 language
features in trunk so things can be backported.

Best,
Andrew

On Mon, Mar 9, 2015 at 7:01 PM, Raymie Stata rst...@altiscale.com wrote:

 In this (and the related threads), I see the following three requirements:

 1. Bump the source JDK version to JDK8 (ie, drop JDK7 support).

 2. We'll still be releasing 2.x releases for a while, with similar
 feature sets as 3.x.

 3. Avoid the risk of split-brain behavior by minimizing backporting
 headaches. Pulling trunk > branch-2 > branch-2.x is already tedious.
 Adding a branch-3, branch-3.x would be obnoxious.

 These three cannot be achieved at the same time.  Which do we abandon?


 On Mon, Mar 9, 2015 at 12:45 PM, sanjay Radia sanjayo...@gmail.com
 wrote:
 
  On Mar 5, 2015, at 3:21 PM, Siddharth Seth ss...@apache.org wrote:
 
  2) Simplification of configs - potentially separating client side
 configs
  and those used by daemons. This is another source of perpetual confusion
  for users.
  + 1 on this.
 
  sanjay



[jira] [Resolved] (HADOOP-7026) Adding new target to build.xml to run tests without compiling

2015-03-09 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-7026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer resolved HADOOP-7026.
--
Resolution: Fixed

fixed in more recent trees

 Adding new target to build.xml to run tests without compiling
 -

 Key: HADOOP-7026
 URL: https://issues.apache.org/jira/browse/HADOOP-7026
 Project: Hadoop Common
  Issue Type: New Feature
  Components: build, test
Affects Versions: 0.21.0
 Environment: SLE v. 11, Apache Harmony 6
Reporter: Guillermo Cabrera
Priority: Trivial
 Attachments: HADOOP-7026.patch


 While testing Apache Harmony Select (a lightweight version of Harmony) with 
 Hadoop Common, we had to first build with Harmony and then test with Harmony 
 Select using the test-core target. This was done in an effort to investigate 
 any issues with Harmony Select in running Common. However, the test-core 
 target also compiles the classes, which we are unable to do with Harmony 
 Select. A new target is proposed that only runs the tests without compiling 
 them.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Hadoop - Major releases

2015-03-09 Thread Arun Murthy
Over the last few days, we have had lots of discussions that have intertwined 
several major themes:



# When/why do we make major Hadoop releases?

# When/how do we move to major JDK versions?

# To a lesser extent, we have debated another theme: what do we do about trunk?



For now, let's park JDK & trunk to treat them in separate thread(s).



For a while now, I've had a couple of lampposts in my head which I used for 
guidance - apologize for not sharing this broadly prior to this discussion, 
maybe putting it out here will help - certainly hope so.





Major Releases



Hadoop continues to benefit tremendously by the investment in stability, 
validation etc. put in by its *anchor* users: Yahoo, Facebook, Twitter, eBay, 
LinkedIn etc.



A historical perspective...



In its lifetime, Apache Hadoop went from monthly to quarterly releases 
because, as Hadoop became more and more of a production system (starting with 
hadoop-0.16 and more so with hadoop-0.18), users could not absorb the torrid 
pace of change.



IMHO, we didn't go far enough in addressing the competing pressures of 
stability v/s rapid innovation.  We paid for it by losing one of our anchor 
users - Facebook - around the time of hadoop-0.19 - they just forked.



Around the same time, Yahoo hit the same problem (I know, I lived through it 
painfully) and got stuck with hadoop-0.20 for a *very* long time and forked to 
add Security rather than deal with the next major release (hadoop-0.21). Later 
on, Facebook did the same, and, unfortunately for the community, is stuck - 
probably forever - on their fork of hadoop-0.20.



Overall, these were dark days for the community: every anchor user was on their 
own fork, and it took a toll on the project.



Recently, thankfully for Hadoop, we have had a period of relative stability 
with hadoop-1.x and hadoop-2.x. Even so, there were close shaves: Yahoo was on 
hadoop-0.23 for a *very* long time - in fact, they are only just now finishing 
their migration to hadoop-2.x.



I think the major lessons here are the obvious ones:



# Compatibility matters

# Maintaining multiple major releases, in parallel, is a big problem - it 
leads to an unproductive, and risky, split in community investment along 
different lines.





Looking Ahead



Given the above, here are some thoughts for looking ahead:



# Be very conservative about major releases - a major benefit (features) is 
required for the cost. Let's not compel our anchor users like Yahoo, Twitter, 
eBay, and LinkedIn to invest in previous releases rather than the latest one. 
Let's hear more from them - and let's be very accommodating to them - for they 
play a key role in keeping Hadoop healthy & stable.



# Be conservative about dropping support for JDKs. In particular, let's hear 
from our anchor users on their plans for adopting jdk-1.8. LinkedIn has already 
moved to jdk-1.8, which is great for the validation, but let's wait for the 
rest of our anchor users to move before we drop jdk-1.7. We did the same thing 
with jdk-1.6: we waited for them to move before dropping support.



Overall, I'd love to hear more from Twitter, Yahoo, eBay and other anchor users 
on their plans for jdk-1.8 specifically, and on their overall appetite for 
hadoop-3.  Let's not finalize our plans for moving forward until this input has 
been considered.



Thoughts?


thanks,
Arun



Unfortunate that these disclaimers are necessary:

# Before people point out vendor affiliations to lend unnecessary color to my 
opinions, let me state that hadoop-2 v/s hadoop-3 is a non-issue for us. For 
major HDP versions the key is, just, compatibility... e.g. we ship major, but 
compatible, community releases such as hive-0.13/hive-0.14 in HDP-2.x/HDP-2.x+1 
etc.

# Also, release management is a similar non-issue - we have already had several 
individuals step up in the hadoop-2.x line. Expect more of the same from folks like 
Andrew, Karthik, Vinod, Steve etc.


[jira] [Resolved] (HADOOP-11646) Erasure Coder API for encoding and decoding of block group

2015-03-09 Thread Vinayakumar B (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-11646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinayakumar B resolved HADOOP-11646.

  Resolution: Fixed
Hadoop Flags: Reviewed

Committed to HDFS-7285 branch.

 Erasure Coder API for encoding and decoding of block group
 --

 Key: HADOOP-11646
 URL: https://issues.apache.org/jira/browse/HADOOP-11646
 Project: Hadoop Common
  Issue Type: Sub-task
Reporter: Kai Zheng
Assignee: Kai Zheng
 Fix For: HDFS-7285

 Attachments: HADOOP-11646-v4.patch, HADOOP-11646-v5.patch, 
 HDFS-7662-v1.patch, HDFS-7662-v2.patch, HDFS-7662-v3.patch


 This is to define the ErasureCoder API for encoding and decoding of a BlockGroup. 
 Given a BlockGroup, ErasureCoder extracts data chunks from the blocks and 
 leverages the RawErasureCoder defined in HADOOP-11514 to perform the concrete 
 encoding or decoding. Note this mainly focuses on the basic fundamental 
 aspects and covers encoding, data block recovery, etc. Parity block recovery 
 involves multiple steps, so HADOOP-11550 will handle it.
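
 The committed patch isn't attached to this mail. Purely to illustrate the shape 
 described above (hypothetical names, not the actual HADOOP-11646 interfaces), 
 an ErasureCoder that extracts chunks from a block group and delegates to a raw 
 coder might be sketched as:

{code}
// Illustrative only; the real HADOOP-11646 / HADOOP-11514 APIs may differ.
interface ECChunk { /* wraps a buffer of erasure-coded bytes */ }

interface BlockGroup {
  ECChunk[] getDataChunks();
  ECChunk[] getParityChunks();
}

interface RawErasureEncoder {
  // HADOOP-11514-style raw coder: computes parity chunks from data chunks.
  void encode(ECChunk[] inputs, ECChunk[] outputs);
}

interface ErasureCoder {
  // Works at the BlockGroup level and delegates the math to the raw coder.
  void encode(BlockGroup group);
}

class SimpleErasureEncoder implements ErasureCoder {
  private final RawErasureEncoder rawEncoder;

  SimpleErasureEncoder(RawErasureEncoder rawEncoder) {
    this.rawEncoder = rawEncoder;
  }

  @Override
  public void encode(BlockGroup group) {
    rawEncoder.encode(group.getDataChunks(), group.getParityChunks());
  }
}
{code}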



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)