Towards Hadoop 1.0: Stronger API Compatibility from 0.21 onwards

2009-08-28 Thread Sanjay Radia


Hadoop 1.0's goal is compatibility on several fronts.
(See https://issues.apache.org/jira/browse/HADOOP-5071 for details.)

Due to the amount of work involved, it has been necessary to split
this work across several releases prior to 1.0.


It turns out that release 0.21 has a number of Jiras targeted at API
and config stability.
Further, in 0.21 we are tagging interfaces with a classification of
their intended audience (scope) and their stability
(see HADOOP-5073 for the classification).
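To make the tagging concrete, here is a sketch of what an annotated
interface could look like. The annotation and package names follow the
HADOOP-5073 proposal (org.apache.hadoop.classification); FooService
itself is a made-up example, not a real Hadoop class:

  import org.apache.hadoop.classification.InterfaceAudience;
  import org.apache.hadoop.classification.InterfaceStability;

  /**
   * A hypothetical interface, used only to illustrate the tagging.
   * Public: intended for use by any Hadoop client.
   * Stable: syntax and semantics may not change incompatibly,
   * which under this proposal would hold from 0.21 onwards.
   */
  @InterfaceAudience.Public
  @InterfaceStability.Stable
  public interface FooService {
    void start();
    void stop();
  }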
Post-1.0, stable interfaces will remain stable (both syntax and
semantics) according to the proposed 1.0 rules.
Hadoop's pre-1.0 rules allow interfaces to be changed regardless of
stability, as long as one allows two releases of deprecation.
(See http://wiki.apache.org/hadoop/Roadmap for the current, i.e.
pre-1.0, rules.)


So how do we ensure that stable interfaces remain stable (both syntax
and semantics) between 0.21 and 1.0?
I propose that we honor the compatibility of stable interfaces from
release 0.21 onwards;

i.e., apply the same post-1.0 rules to pre-1.0 releases.

The actual discussion of what needs to be stable or not belongs in
Jira HADOOP-5073, not in this email thread;
I would like to use this thread to discuss the proposal of honoring
the compatibility of stable interfaces prior to 1.0.


Feedback?

sanjay




Re: Who are the major contributors to Hive and/or Hbase?

2009-08-28 Thread Gaurav Sharma
Hope this helps:
  http://hadoop.apache.org/hive/credits.html
  http://hadoop.apache.org/hbase/credits.html


On Fri, Aug 28, 2009 at 1:26 PM, Gopal Gandhi gopal.gandhi2...@yahoo.com wrote:

 Maybe I should change the title?

 --- On Fri, 8/28/09, Gopal Gandhi gopal.gandhi2...@yahoo.com wrote:


 From: Gopal Gandhi gopal.gandhi2...@yahoo.com
 Subject: Who are the gurus in Hive and/or Hbase?
 To: common-u...@hadoop.apache.org, common-dev@hadoop.apache.org
 Cc: common-dev@hadoop.apache.org
 Date: Friday, August 28, 2009, 6:25 PM







 We are inviting gurus or major contributors to Hive and/or HBase (or
 anything related to Hadoop) to give us presentations about the products.
 Could you name a few? The gurus must be in the Bay Area.
 Thanks.






Re: Towards Hadoop 1.0: Stronger API Compatibility from 0.21 onwards

2009-08-28 Thread Doug Cutting

Sanjay Radia wrote:

No. The 1.0 proposal was that it included both API and wire compatibility.


The proposal includes a lot of things, but it's so far just a proposal. 
 There's been no vote to formally define what 1.0 will mean.  In every 
discussion I've heard, from the very beginning of the project, it 
primarily meant API stability.  You've added wire compatibility, data 
stability, security, restart recovery, etc.  These are all very nice 
features to have, essential perhaps in some contexts, but they may or 
may not be required for 1.0.  I worry that if we keep piling more things 
on, we'll never get to 1.0.


What would be wrong with calling it 1.0 when we have end-user API 
stability?  Why would that be a bad thing?


Doug


[jira] Created: (HADOOP-6222) Core doesn't have TestCommonCLI facility

2009-08-28 Thread Boris Shkolnik (JIRA)
Core doesn't have TestCommonCLI facility


                 Key: HADOOP-6222
                 URL: https://issues.apache.org/jira/browse/HADOOP-6222
             Project: Hadoop Common
          Issue Type: Test
            Reporter: Boris Shkolnik


TestCLI is a base class that cannot run FS-type commands.
We need a counterpart to TestHDFSCLI, a TestCommonCLI, to be able to test
CLI functionality in Common.

I suggest we create TestCommonCLI.java in hadoop-common.
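
A rough sketch of the shape this could take (the base-class hook shown
here is modeled on TestHDFSCLI and is illustrative; the actual TestCLI
hooks may differ):

  package org.apache.hadoop.cli;

  /**
   * Hypothetical driver that runs the XML-defined CLI test cases
   * against the generic FsShell commands in Common, the way
   * TestHDFSCLI runs them against HDFS.
   */
  public class TestCommonCLI extends TestCLI {

    // Point the framework at a Common-specific set of test cases.
    @Override
    protected String getTestFile() {
      return "testConfCommon.xml";
    }
  }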




Re: [VOTE] Should we release Common 0.20.1-rc0?

2009-08-28 Thread Todd Lipcon
Hey Owen,

Looks like it might need a chmod:

 You don't have permission to access
/~omalley/hadoop-0.20.1-rc0/hadoop-0.20.1.tar.gz on this server.

-Todd

On Fri, Aug 28, 2009 at 5:06 PM, Owen O'Malley omal...@apache.org wrote:

 I've rolled a release candidate for 0.20.1. Please try the release and vote
 on whether we should release it.

 http://people.apache.org/~omalley/hadoop-0.20.1-rc0

 -- Owen



Re: Weird 'java.io.IOException: Task process exit with nonzero status of 134' problem

2009-08-28 Thread indoos

Hi,
Todd, right on target!!

Tony, heap usage would be at least 30% higher on 64-bit as compared to
32-bit.
Increasing the swap size might help in bypassing the out-of-memory error,
but it does impact processing speed.

Todd is referring to the -XX:+UseCompressedOops option. You might have to
check whether your JVM version supports it; otherwise you might have to
upgrade. There is some more information about it at
http://blog.juma.me.uk/2008/10/14/32-bit-or-64-bit-jvm-how-about-a-hybrid/
Another useful option may be garbage collection tuning along with the
compressed-oops option.
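
For Hadoop jobs specifically, the option has to reach the spawned task
JVMs, not just the daemons. A minimal sketch of one way to do that,
assuming the 0.20-era mapred.child.java.opts property (MyJob is a
placeholder class; pick a heap size that fits your mappers):

  import org.apache.hadoop.mapred.JobConf;

  // Per-job override: these JVM options are passed to every spawned
  // map/reduce task. Cluster-wide, the same property can be set in
  // mapred-site.xml instead. MyJob is a hypothetical job class.
  JobConf conf = new JobConf(MyJob.class);
  conf.set("mapred.child.java.opts", "-Xmx1024m -XX:+UseCompressedOops");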

-Sanjay


Todd Lipcon-4 wrote:
 
 Hi Tony,
 Exit status 134 (128 + SIGABRT, i.e. signal 6) usually means the JVM
 crashed hard. So, you're looking at either a JVM bug or simply an
 OutOfMemory situation. There are two possibilities
 that might explain why you see the issue on the 64-bit JVM and not the
 32-bit:
 
 1) There could be a bug present in the 64-bit JVM but not in the 32-bit. Are
 you running the exact same Java release, or is your 32-bit possibly newer?
 
 2) The 64-bit JVM will use more heap for the same program than the
 32-bit. This is due to the extra overhead of object references in a 64-bit
 heap. There's a Compressed Object Pointers option, recently introduced,
 that can reduce this overhead, but it's not enabled by default as of yet.
 
 You should be able to look at the stderr output of the tasks that fail to
 deduce what's going on.
 
 -Todd
 
 On Sun, Aug 23, 2009 at 2:20 PM, tony_l hlu...@aol.com wrote:
 

 we are running a Hadoop map-only job where each mapper takes 1-2 GB of
 memory at configuration time. When running on an EC2 large instance, we
 can run two mappers per node in parallel. The problem is that each mapper
 only works well the first time it is configured/initialized. If a job
 attempts to run a mapper a second time, due to multiple input files or
 one large input file being split into multiple chunks, we run into this 134
 error from time to time. The problem only occurs on the 64-bit JVM. When we
 move to 32-bit platforms, the issue is gone. In both cases, we use the most
 recent Sun Java 1.6 on CentOS.

 Does anybody know what's wrong?


 
 
