RE: How to best decide mapper output/reducer input for a huge string?

2013-09-21 Thread John Lilley
Pavan, How large are the rows in HBase? 22 million rows is not very much but you mentioned "huge strings". Can you tell which part of the processing is the limiting factor (read from HBase, mapper output, reducers)? John From: Pavan Sudheendra [mailto:pavan0...@gmail.com] Sent: Saturday, Sept

RE: How to get number of data nodes as a hadoop client

2013-09-21 Thread John Lilley
Never mind, it's in the ApplicationClientProtocol class. John From: John Lilley [mailto:john.lil...@redpoint.net] Sent: Thursday, September 19, 2013 6:12 PM To: user@hadoop.apache.org Subject: How to get number of data nodes as a hadoop client How does a Hadoop client query the number of datanode

Re: Can you help me to install HDFS Federation and test?

2013-09-21 Thread Visioner Sadak
Sorry for the late reply, I just checked my mail today. Are you using a client-side mount table, as mentioned in the doc you referred to? If you are using client-side mount table configurations in your core-site.xml, you won't be able to create a directory. In that case, first create the folder without client side-mounta

connection overload strategies

2013-09-21 Thread John Lilley
If my YARN application tasks are all reading/writing HDFS simultaneously and some node is unable to honor a connection request because it is overloaded, what happens? I've seen HDFS attempt to retry connections. For that matter, how does MR under YARN deal with connection overload during the sh

RE: Task status query

2013-09-21 Thread John Lilley
Thanks Harsh! The data-transport format is pretty easy, but how is the RPC typically set up? Does the AM open a listen port to accept the RPC from the tasks, and then pass the port/URI to the tasks when they are spawned, as a command-line argument or environment variable? John -Original Message- From: Har

Re: Semantics of ApplicationResourceUsageReport

2013-09-21 Thread Sandy Ryza
Hi Albert, You're correct about used. Reserved is a little bit more arcane - it refers to a mechanism that schedulers use to prevent applications with larger container sizes from starving. Applications place container "reservations" on nodes, and no other containers can be placed on the node unt

Re: How to best decide mapper output/reducer input for a huge string?

2013-09-21 Thread Pavan Sudheendra
No, I don't have a combiner in place. Is it necessary? How do I make my map output compressed? Yes, the tables in HBase are compressed. Although there's no real bottleneck, the time it takes to process the entire table is huge. I have to constantly check whether I can optimize it somehow. Oh okay..

Re: How to best decide mapper output/reducer input for a huge string?

2013-09-21 Thread Pradeep Gollakota
One thing that comes to mind is that your keys are Strings, which are highly inefficient. You might get a lot better performance if you write a custom Writable for your Key object using the appropriate data types. For example, use a long (LongWritable) for timestamps. This should make (de)serializat
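
The suggestion above can be sketched roughly as follows. This is a minimal illustration, not code from the thread: the class name EventKey and its two fields (id, timestamp) are hypothetical, and the class only mirrors the write(DataOutput)/readFields(DataInput) shape of Hadoop's Writable interface using plain java.io so that it runs without Hadoop on the classpath. In a real job the class would presumably implement org.apache.hadoop.io.WritableComparable instead.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

/** Hypothetical composite key: a numeric id plus a timestamp, both stored as longs. */
final class EventKey implements Comparable<EventKey> {
    private long id;
    private long timestamp;

    EventKey() {}                        // no-arg constructor, needed for deserialization

    EventKey(long id, long timestamp) {
        this.id = id;
        this.timestamp = timestamp;
    }

    long getId() { return id; }
    long getTimestamp() { return timestamp; }

    /** Same shape as Writable.write: two fixed-width longs, 16 bytes in total. */
    void write(DataOutput out) throws IOException {
        out.writeLong(id);
        out.writeLong(timestamp);
    }

    /** Same shape as Writable.readFields: reads the fields in the order they were written. */
    void readFields(DataInput in) throws IOException {
        id = in.readLong();
        timestamp = in.readLong();
    }

    /** Numeric ordering: by id first, then by timestamp. */
    @Override
    public int compareTo(EventKey other) {
        int byId = Long.compare(id, other.id);
        return byId != 0 ? byId : Long.compare(timestamp, other.timestamp);
    }

    /** Helper for the demo: serialize a key to a byte array. */
    static byte[] toBytes(EventKey key) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            key.write(out);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    /** Helper for the demo: rebuild a key from a byte array. */
    static EventKey fromBytes(byte[] bytes) {
        EventKey key = new EventKey();
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes))) {
            key.readFields(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return key;
    }

    public static void main(String[] args) {
        EventKey original = new EventKey(42L, 1379721600000L);
        EventKey copy = fromBytes(toBytes(original));
        System.out.println("round trip ok: " + (original.compareTo(copy) == 0));

        // Numeric keys sort numerically; the same values as strings sort lexicographically.
        System.out.println("9 before 10 as longs: "
                + (new EventKey(9L, 0L).compareTo(new EventKey(10L, 0L)) < 0));
        System.out.println("\"9\" before \"10\" as strings: " + ("9".compareTo("10") < 0));
    }
}
```

Besides avoiding string parsing on every (de)serialization, typed fields give fixed-width records and a numeric sort order, which a string key does not provide for values such as "9" and "10".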

Re: How to best decide mapper output/reducer input for a huge string?

2013-09-21 Thread Pavan Sudheendra
Hi Pradeep, Yes. Basically I'm only writing the key part as the map output. The V of <K,V> is not of much use to me, but I'm hoping to change that if it leads to faster execution. I'm kind of a newbie, so I'm looking to make the map/reduce job run a lot faster. Also, yes. It gets sorted by the HouseHol