Rita,
Are you doing a push from the source cluster or a pull from the target
cluster?
Doing a pull with distcp using hftp (to accommodate version differences)
has the advantage of slightly fewer transfers of blocks over the ToR
switches. Each block is read from exactly the datanode where it is stored.
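A minimal sketch of the pull-side approach, run from the target cluster. The hostnames, ports, and paths below are placeholders (not from this thread); port 50070 is the usual HFTP/namenode HTTP port on Hadoop 1.x. The `echo` makes this a dry run — drop it to execute on a real cluster.

```shell
# Hedged sketch: pull-style distcp over HFTP, so the source and target
# clusters do not need to run the same Hadoop version.
# Hostnames and paths are illustrative placeholders.
SRC="hftp://source-nn.example.com:50070/data/logs"
DST="hdfs://target-nn.example.com:8020/data/logs"

# Run this on the *target* cluster so its datanodes pull blocks directly.
# echo prints the command as a dry run; remove it to actually copy.
echo hadoop distcp "$SRC" "$DST"
```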
Of course it all depends...
But something like this could work:
Leave 1-2 GB for the kernel, page cache, tools, overhead, etc.
Plan 3-4 GB each for the DataNode and the TaskTracker.
Plan 2.5-3 GB per slot; depending on the kinds of jobs, you may need more
or less memory per slot.
Have 2-3 times as many
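As a worked example of the sizing guidelines above — the 48 GB node size is an assumption for illustration, not a figure from this thread:

```shell
# Budget a worker node's RAM per the guidelines above.
# 48 GB total is an assumed node size, chosen only for illustration.
TOTAL_GB=48
OS_GB=2          # kernel, page cache, tools, overhead
DAEMONS_GB=8     # ~4 GB each for the DataNode and TaskTracker
PER_SLOT_GB=3    # top of the 2.5-3 GB per-slot range

SLOTS=$(( (TOTAL_GB - OS_GB - DAEMONS_GB) / PER_SLOT_GB ))
echo "$SLOTS"    # task slots this node can support
```

So a 48 GB node would end up with about 12 task slots; heavier jobs shrink that number, lighter ones grow it.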
moved common-user@hadoop.apache.org to bcc and added u...@pig.apache.org
Best asked on the Pig users list.
Cheers,
Joep
On Wed, Oct 3, 2012 at 7:04 AM, Abhishek abhishek.dod...@gmail.com wrote:
Hi all,
How can the Hive query below be written in Pig Latin?
select t2.col1, t3.col2
from table2 t2
Agreed that different locations is not a good idea.
However, the question was, can it be done? Yes, with some hacking I suppose.
Do I recommend hacking? No.
But if you cannot help yourself: to give datanodes different storage
locations per slave, create a separate hdfs-site.xml per node (enjoy).
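A minimal sketch of such a per-node override, assuming Hadoop 1.x where the property is `dfs.data.dir` (the directory paths are placeholders, not from this thread):

```xml
<!-- Hedged sketch of a per-node hdfs-site.xml overriding the datanode
     storage directories. The paths are illustrative placeholders. -->
<configuration>
  <property>
    <name>dfs.data.dir</name>
    <value>/data1/dfs/dn,/data2/dfs/dn</value>
  </property>
</configuration>
```

Each slave would carry its own copy of this file with its own paths — which is exactly the per-node configuration drift that makes this a hack rather than a recommendation.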
For
Pierre,
As discussed in recent other threads, it depends.
The most sensible thing for Hadoop nodes is to find a sweet spot for
price/performance.
In general that will mean keeping a balance between compute power, disks,
and network bandwidth, and factor in racks, space, operating costs etc.
How
The error is that you cannot open /tmp/MatrixMultiply/out/_logs
Does the directory exist?
Do you have proper access rights set?
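The two checks above can be scripted along these lines. The path is the one from the error; whether it lives on the local filesystem or on HDFS depends on how the job was configured — this sketch assumes local, and the comment notes the HDFS equivalents.

```shell
# Check the two things asked above: does the directory exist, and what
# are its permissions? Assumes a local path; if the job wrote to HDFS,
# use `hadoop fs -ls` and `hadoop fs -chmod` instead.
DIR=/tmp/MatrixMultiply/out/_logs
if [ -d "$DIR" ]; then
    ls -ld "$DIR"          # shows owner and access mode
else
    echo "missing: $DIR"
fi
```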
Joep
On Wed, Nov 30, 2011 at 3:23 AM, ChWaqas waqas...@gmail.com wrote:
Hi, I am trying to run the matrix multiplication example mentioned (with
source code) on the