Pavan,
How large are the rows in HBase? 22 million rows is not very much but you
mentioned "huge strings". Can you tell which part of the processing is the
limiting factor (read from HBase, mapper output, reducers)?
John
From: Pavan Sudheendra [mailto:pavan0...@gmail.com]
Sent: Saturday, Sept
Never mind, it's in the ApplicationClientProtocol class.
John
From: John Lilley [mailto:john.lil...@redpoint.net]
Sent: Thursday, September 19, 2013 6:12 PM
To: user@hadoop.apache.org
Subject: How to get number of data nodes as a hadoop client
How does a Hadoop client query the number of datanodes?
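For the YARN side of this, the usual entry point is the YarnClient wrapper rather than the raw ApplicationClientProtocol RPC. A minimal sketch, assuming a yarn-site.xml on the classpath pointing at a running ResourceManager (this counts NodeManagers; for HDFS datanodes specifically, `hdfs dfsadmin -report` or DistributedFileSystem.getDataNodeStats() is the analogous route):

```java
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.api.records.YarnClusterMetrics;
import org.apache.hadoop.yarn.client.api.YarnClient;
import org.apache.hadoop.yarn.exceptions.YarnException;

public class ClusterSize {
    // Pure helper so the formatting is testable without a cluster.
    static String summarize(int numNodeManagers) {
        return "active NodeManagers: " + numNodeManagers;
    }

    public static void main(String[] args) throws IOException, YarnException {
        // YarnClient wraps the ApplicationClientProtocol RPC mentioned above.
        YarnClient yarn = YarnClient.createYarnClient();
        yarn.init(new Configuration()); // picks up yarn-site.xml from the classpath
        yarn.start();
        YarnClusterMetrics metrics = yarn.getYarnClusterMetrics();
        System.out.println(summarize(metrics.getNumNodeManagers()));
        yarn.stop();
    }
}
```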
Sorry for the late reply; I just checked my mail today. Are you using a client-side mount table, as mentioned in the doc you referred to? If you have client-side mount table configurations in your core-site.xml, you won't be able to create the directory. In that case, first create the folder without the client-side mount table.
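For reference, a client-side mount table in core-site.xml looks roughly like this (the cluster name "my-cluster" and the namenode address are placeholders; the linked directory has to already exist on the underlying hdfs:// namespace, which is why you create it against the hdfs:// URI first):

```xml
<property>
  <name>fs.defaultFS</name>
  <value>viewfs://my-cluster/</value>
</property>
<property>
  <name>fs.viewfs.mounttable.my-cluster.link./user</name>
  <value>hdfs://namenode1:8020/user</value>
</property>
```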
If my YARN application tasks are all reading/writing HDFS simultaneously and
some node is unable to honor a connection request because it is overloaded,
what happens? I've seen HDFS attempt to retry connections.
For that matter, how does MR under YARN deal with connection overload during
the sh
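The client-side retry behavior on refused/overloaded connections is tunable in core-site.xml; these two IPC knobs (shown with what I believe are the defaults, worth double-checking against core-default.xml for your version) govern how many times and how often the client re-attempts the connection before failing:

```xml
<property>
  <name>ipc.client.connect.max.retries</name>
  <value>10</value>
</property>
<property>
  <name>ipc.client.connect.retry.interval</name>
  <value>1000</value> <!-- milliseconds between attempts -->
</property>
```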
Thanks Harsh! The data-transport format is pretty easy, but how is the RPC typically set up? Does the AM open a listen port to accept the RPC from the tasks, and then pass the port/URI to the tasks when they are spawned, via command-line arguments or environment variables?
john
-----Original Message-----
From: Har
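One common pattern for the question above, sketched with plain JDK sockets: the AM binds an ephemeral port and advertises it to each task through the container environment. The environment variable names here are hypothetical, not a Hadoop convention; in a real AM the map would be handed to ContainerLaunchContext.setEnvironment() when launching tasks.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.util.HashMap;
import java.util.Map;

public class AmUmbilical {
    // Bind an ephemeral port for the AM's task-facing RPC endpoint.
    static ServerSocket openListener() throws IOException {
        return new ServerSocket(0); // port 0 = let the OS pick a free port
    }

    // Environment entries to place in each task's launch context
    // (names AM_RPC_HOST / AM_RPC_PORT are illustrative only).
    static Map<String, String> buildTaskEnv(String amHost, int amPort) {
        Map<String, String> env = new HashMap<>();
        env.put("AM_RPC_HOST", amHost);
        env.put("AM_RPC_PORT", Integer.toString(amPort));
        return env;
    }

    public static void main(String[] args) throws IOException {
        try (ServerSocket listener = openListener()) {
            Map<String, String> env = buildTaskEnv(
                InetAddress.getLocalHost().getHostName(),
                listener.getLocalPort());
            System.out.println(env);
            // A real AM would pass `env` via ContainerLaunchContext.setEnvironment(env).
        }
    }
}
```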
Hi Albert,
You're correct about used.
Reserved is a little bit more arcane - it refers to a mechanism that
schedulers use to prevent applications with larger container sizes from
starving. Applications place container "reservations" on nodes, and no
other containers can be placed on the node unt
No, I don't have a combiner in place. Is it necessary? How do I make my map output compressed? Yes, the tables in HBase are compressed.
Although there's no single bottleneck, the time it takes to process the entire table is huge. I have to constantly check whether I can optimize it somehow..
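On the compression question: map output compression is switched on with two job properties (these are the Hadoop 2.x mapreduce.* names; older releases used the mapred.* equivalents, and Snappy requires the native codec to be available on the nodes). A combiner isn't required, but it helps whenever the reduce-side aggregation is associative, since it shrinks the data before the shuffle:

```xml
<property>
  <name>mapreduce.map.output.compress</name>
  <value>true</value>
</property>
<property>
  <name>mapreduce.map.output.compress.codec</name>
  <value>org.apache.hadoop.io.compress.SnappyCodec</value>
</property>
```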
Oh okay..
One thing that comes to mind is that your keys are Strings, which are highly inefficient. You might get much better performance if you write a custom Writable for your key object using the appropriate data types. For example, use a long (LongWritable) for timestamps. This should make
(de)serializat
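A sketch of what such a key could look like, using a hypothetical id-plus-timestamp layout (the class and field names are made up for illustration). It serializes to 16 fixed bytes instead of a variable-length String; in a real job it would implement org.apache.hadoop.io.WritableComparable, whose write/readFields/compareTo signatures are exactly the ones below:

```java
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

// Hypothetical composite key: a numeric id plus a timestamp, stored as two
// longs rather than a formatted String that gets re-parsed per record.
public class CompositeKey implements Comparable<CompositeKey> {
    long id;        // e.g. a numeric record id
    long timestamp; // millis since epoch, instead of a date String

    public void write(DataOutput out) throws IOException {
        out.writeLong(id);
        out.writeLong(timestamp);
    }

    public void readFields(DataInput in) throws IOException {
        id = in.readLong();
        timestamp = in.readLong();
    }

    @Override
    public int compareTo(CompositeKey o) {
        int c = Long.compare(id, o.id);
        return c != 0 ? c : Long.compare(timestamp, o.timestamp);
    }
}
```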
Hi Pradeep,
Yes.. Basically I'm only writing the key part as the map output.. The V part is not of much use to me.. But I'm hoping to change that if it leads to faster execution.. I'm kind of a newbie, so I'm looking to make the map/reduce job run a lot faster..
Also, yes. It gets sorted by the HouseHol