Re: recommendation on HDDs

2011-02-14 Thread Steve Loughran
On 10/02/11 22:25, Michael Segel wrote: Shrinivas, Assuming you're in the US, I'd recommend the following: Go with 2TB 7200 RPM SATA hard drives. (Not sure what type of hardware you have.) What we've found is that in the data nodes there's an optimal configuration that balances price versus

Hbase documentations

2011-02-14 Thread Matthew John
Hi guys, can someone send me good documentation on HBase (other than the Hadoop wiki)? I am also looking for a good HBase tutorial. Regards, Matthew

Re: recommendation on HDDs

2011-02-14 Thread Steve Loughran
On 12/02/11 16:26, Michael Segel wrote: All, I'd like to clarify some things... First, the concept is to build out a cluster of commodity hardware, so when you do your shopping you want to get the most bang for your buck. That is the 'sweet spot' I'm talking about. When you look at your

RE: recommendation on HDDs

2011-02-14 Thread Michael Segel
Steve is right, and to try and add more clarification... Interesting choice; the 7-core, single-CPU option is something else to consider. Remember also that this is a moving target: what anyone says is valid now (Feb 2011) will seem quaint in two years' time. Even a few months from

Re: Hbase documentations

2011-02-14 Thread Bibek Paudel
On Mon, Feb 14, 2011 at 11:55 AM, Matthew John tmatthewjohn1...@gmail.com wrote: Hi guys, can someone send me a good documentation on Hbase (other than the hadoop wiki). I am also looking for a good Hbase tutorial. Have you checked this: http://hbase.apache.org/book.html ? -b Regards,

RE: hadoop 0.20 append - some clarifications

2011-02-14 Thread Gokulakannan M
I think that in general, the behavior of any program reading data from an HDFS file before hsync or close is called is pretty much undefined. In Unix, users can read a file in parallel while another user is writing it, and I suppose the sync feature design is based on that. So at any
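The local-filesystem behavior Gokulakannan describes can be shown with a small plain-JDK sketch (local filesystem only, not HDFS — on pre-append HDFS a reader would not reliably see unflushed or unsynced bytes; the class and file names here are illustrative):

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// On a POSIX local filesystem, a second reader can see bytes that a
// still-open writer has written and flushed -- the behavior the HDFS
// sync/append design is loosely inspired by.
public class ReadWhileWriting {
    public static String readBeforeClose() {
        try {
            Path p = Files.createTempFile("rww", ".txt");
            try (OutputStream out = Files.newOutputStream(p)) {
                out.write("hello".getBytes(StandardCharsets.UTF_8));
                out.flush(); // bytes reach the local filesystem
                // Independent reader opened while the writer is still open.
                return new String(Files.readAllBytes(p), StandardCharsets.UTF_8);
            } finally {
                Files.deleteIfExists(p);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(readBeforeClose()); // prints "hello" on a POSIX filesystem
    }
}
```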

Re: hadoop 0.20 append - some clarifications

2011-02-14 Thread Ted Dunning
HDFS definitely doesn't follow anything like POSIX file semantics. They may be a vague inspiration for what HDFS does, but generally the behavior of HDFS is not tightly specified. Even the unit tests have some really surprising behavior. On Mon, Feb 14, 2011 at 7:21 AM, Gokulakannan M

Is this a fair summary of HDFS failover?

2011-02-14 Thread Mark Kerzner
Hi, is it accurate to say that - In 0.20 the Secondary NameNode acts as a cold spare; it can be used to recreate the HDFS if the Primary NameNode fails, but with a delay of minutes if not hours, and there is also some data loss; - in 0.21 there are streaming edits to a Backup Node

Re: Reduce Failed at Tasktrackers

2011-02-14 Thread Jairam Chandar
Did you get a response/solution/workaround to this problem? I am getting the same error. -Jairam

HBase crashes when one server goes down

2011-02-14 Thread Rodrigo Barreto
Hi, We are new to Hadoop. We have just configured a cluster with 3 servers and everything is working OK, except when one server goes down: Hadoop/HDFS continues working but HBase stops, and queries do not return results until we restart HBase. The HBase configuration is copied

Re: Is this a fair summary of HDFS failover?

2011-02-14 Thread M. C. Srivas
The summary is quite inaccurate. On Mon, Feb 14, 2011 at 8:48 AM, Mark Kerzner markkerz...@gmail.com wrote: Hi, is it accurate to say that - In 0.20 the Secondary NameNode acts as a cold spare; it can be used to recreate the HDFS if the Primary NameNode fails, but with the delay of

Re: HBase crashes when one server goes down

2011-02-14 Thread Jean-Daniel Cryans
Please use the hbase mailing list for HBase-related questions. Regarding your issue, we'll need more information to help you out. Have you checked the logs? If you see exceptions in there, did you google them to try to figure out what's going on? Finally, does your setup meet all the

something doubts about map/reduce

2011-02-14 Thread Wang LiChuan [王立传]
Hi my friends: I have researched Hadoop and map/reduce, but before I can go on, I have one question, and I can't find it in the FAQ. Please consider this situation: 1. I created 100 files, each file of course bigger than the default 64MB (such as 1G), so definitely will be

Re: something doubts about map/reduce

2011-02-14 Thread Harsh J
Hello, 2011/2/14 Wang LiChuan[王立传] lcw...@arcsoft.com: So my question is this: if I return false in the function isSplitable to tell the framework it's a non-splittable file, when doing map/reduce, how many map tasks will I have? Do I have 100 maps, each one handling a file? Or do
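The split arithmetic behind the question can be sketched without any Hadoop dependency (the file count, 1G size, and 64MB block size are the numbers from the post; with isSplitable() returning false, each file yields exactly one split and therefore one map task):

```java
// Plain-JDK sketch of input-split counting: a splittable file produces
// roughly ceil(fileSize / blockSize) splits; a non-splittable file
// always produces exactly one split, hence one map task per file.
public class SplitMath {
    public static long mapTasks(int files, long fileSize, long blockSize, boolean splitable) {
        long splitsPerFile = splitable ? (fileSize + blockSize - 1) / blockSize : 1;
        return (long) files * splitsPerFile;
    }

    public static void main(String[] args) {
        long gb = 1024L * 1024 * 1024, block = 64L * 1024 * 1024;
        System.out.println(mapTasks(100, gb, block, true));  // splittable: 1600 maps
        System.out.println(mapTasks(100, gb, block, false)); // non-splittable: 100 maps
    }
}
```

So for 100 non-splittable 1G files, the framework runs 100 map tasks, one per file.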

Re: Problem with running the job, no default queue

2011-02-14 Thread Koji Noguchi
Hi Shivani, You probably don't want to ask m45-specific questions on the hadoop.apache mailing list. Try % hadoop queue -showacls It should show which queues you're allowed to submit to. If it doesn't give you any queues, you need to request one. Koji On 2/9/11 9:10 PM, Shivani Rao

Check/Compare mappers output

2011-02-14 Thread maha
Hi all, I want to know how I can check/compare mapper keys and values. Example: My mappers have the following to filter documents before output: String doc1; if(!doc1.equals(d1)) output.collect(new Text('#'+doc1+'#'), new Text('#'+word1.substring(word1.indexOf(',')+1,
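The key/value construction in the truncated snippet can be sketched with plain strings instead of Hadoop Text, so it can be checked locally. The names doc1 and word1 follow the original; d1 is assumed to be the document id being filtered out, and word1 is assumed to look like "docId,term":

```java
// Local sketch of the mapper's key/value construction, using plain
// String in place of org.apache.hadoop.io.Text for easy testing.
public class KeyCheck {
    // Key is the document id wrapped in '#' markers, as in the snippet.
    static String makeKey(String doc1) {
        return "#" + doc1 + "#";
    }

    // Value keeps the part of word1 after the comma (assumed "docId,term").
    static String makeValue(String word1) {
        return "#" + word1.substring(word1.indexOf(',') + 1) + "#";
    }

    public static void main(String[] args) {
        String doc1 = "doc42", d1 = "doc07";
        if (!doc1.equals(d1)) { // the filter from the mapper
            System.out.println(makeKey(doc1) + " -> " + makeValue("doc42,hadoop"));
        }
    }
}
```

Printing the keys and values this way (or via a local unit test around the mapper) is usually easier than inspecting Text objects inside a running job.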

Re: Debugging and fixing Safemode

2011-02-14 Thread Matthew Foley
Hi Sandhya, the threshold for leaving safemode automatically is configurable; it defaults to 0.999, but you can change the parameter dfs.namenode.safemode.threshold-pct to a different floating-point number in your config. It is set to almost 100% by default, on the theory that (a) if you didn't
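For reference, the setting Matthew describes would look like this in hdfs-site.xml (the property name follows his post; the 0.95 value is just an example, the default being 0.999):

```xml
<property>
  <name>dfs.namenode.safemode.threshold-pct</name>
  <value>0.95</value>
  <!-- Fraction of blocks that must satisfy minimal replication before
       the NameNode leaves safemode automatically. -->
</property>
```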

Re: Is this a fair summary of HDFS failover?

2011-02-14 Thread Mark Kerzner
Thank you, M. C. Srivas, that was enormously useful. I understand it now, but just to be complete, I have re-formulated my points according to your comments: - In 0.20 the Secondary NameNode performs snapshotting. Its data can be used to recreate the HDFS if the Primary NameNode fails. The

Re: Is this a fair summary of HDFS failover?

2011-02-14 Thread Ted Dunning
Note that the document purports to be from 2008 and, at best, was uploaded just about a year ago. That it is still pretty accurate is a tribute either to the stability of hbase or to its stagnation, depending on how you read it. On Mon, Feb 14, 2011 at 12:31 PM, Mark Kerzner

Re: Is this a fair summary of HDFS failover?

2011-02-14 Thread M. C. Srivas
I understand you are writing a book, Hadoop in Practice. If so, it's important that what's recommended in the book be verified in practice. (I mean, beyond simply posting in this newsgroup; for instance, the recommendations on NN fail-over should be tried out first before writing about how

Re: Is this a fair summary of HDFS failover?

2011-02-14 Thread Mark Kerzner
I completely agree, and I am using your and the group's postings to define the direction and approaches, but I am also trying every solution - and I am beginning to do just that, with the AvatarNode, now. Thank you, Mark On Mon, Feb 14, 2011 at 4:43 PM, M. C. Srivas mcsri...@gmail.com wrote: I

RE: hadoop 0.20 append - some clarifications

2011-02-14 Thread Gokulakannan M
I agree that HDFS doesn't strictly follow POSIX semantics, but it would have been better if this issue were fixed. _ From: Ted Dunning [mailto:tdunn...@maprtech.com] Sent: Monday, February 14, 2011 10:18 PM To: gok...@huawei.com Cc: common-user@hadoop.apache.org;