I use hadoop-2.0.5; there are MutableCounterLong and MutableCounterLong
classes in metrics v2.
I am studying the metrics v2 code.
What is the difference between the MutableCounterLong and MutableCounterLong
classes? I found that MutableCounterLong is used to calculate throughput; is
that right? How does the metrics
John,
I assume you do not wish to use the DistributedCache (or an HDFS
location for the DistributedCache), which is the ideal way to ship
jars.
You can place your jars on the TT classpaths by placing them at an
arbitrary location such as /opt/jars and editing the TT's
hadoop-env.sh to ext
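If the jars live under /opt/jars on every TaskTracker host, the hadoop-env.sh edit is typically a one-line classpath extension (a sketch; HADOOP_CLASSPATH is the variable the Hadoop scripts consult, and /opt/jars is the example location from above):

```sh
# hadoop-env.sh on each TaskTracker host (assumes jars live in /opt/jars)
export HADOOP_CLASSPATH=/opt/jars/*:$HADOOP_CLASSPATH
```

A TT restart is needed for the new classpath to take effect.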
This is appropriate. You are making use of both disk mounts and have
the directories for each service isolated as well.
On Fri, Aug 9, 2013 at 7:37 AM, ch huang wrote:
> hi,all:
>i plan to put DN together with NM, i want to use 2*1TB disk , one
> disk mount on /data/1 and another mount on
You are partially incorrect - NameNode is not an SPOF any longer in
2.x releases. Please look at the docs that cover HA in the release you
use, such as
http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/HDFSHighAvailabilityWithQJM.html.
RM HA is not yet present, but is incoming. S
hi, all:
I plan to put the DN together with the NM. I want to use 2*1TB disks, one
disk mounted on /data/1 and the other mounted on /data/2,
and I set dfs.datanode.data.dir to
"/data/1/hadoopdataspace,/data/2/hadoopdataspace"
and yarn.nodemanager.local-dirs to "/data/1/yarn/local,/data/2/yarn/local",
s
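In *-site.xml form, that layout would look roughly like this (a sketch assembled from the values above; note the lists are comma-separated, and it is safest to avoid spaces around the commas):

```xml
<!-- hdfs-site.xml: DataNode storage on both mounts -->
<property>
  <name>dfs.datanode.data.dir</name>
  <value>/data/1/hadoopdataspace,/data/2/hadoopdataspace</value>
</property>

<!-- yarn-site.xml: NodeManager local dirs on both mounts -->
<property>
  <name>yarn.nodemanager.local-dirs</name>
  <value>/data/1/yarn/local,/data/2/yarn/local</value>
</property>
```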
I suppose I should have been clearer. There's no problem out of the box if
people stick to the libraries we offer :)
Yes, the LRW was marked synchronized at some point over 8 years ago [1]
in support of multi-threaded maps, but the framework has changed much
since then. The MultithreadedMapper/etc. AP
On 08/08/2013 09:23 PM, John Hancock wrote:
Where else might one put .jar files that a map/reduce job will need?
Why do you need an alternative location? Is there a constraint on being
able to place your library jars under $HADOOP_HOME/lib?
Given that your questions are very broad and at a high level, I would suggest
that you pick up a book or similar to go through them. Hadoop: The Definitive
Guide by Tom White is a great book to start with.
Meanwhile, some links to start with:
http://hadoop.apache.org/docs/stable/mapred_tutoria
Where else might one put .jar files that a map/reduce job will need?
hi, all:
Like the NameNode, the ResourceManager is also an SPOF; is there any solution
for RM HA?
So, from a performance aspect, I need to separate the RM from the NN, because
each of them is memory hungry.
On Fri, Aug 9, 2013 at 8:00 AM, Marcos Luis Ortiz Valmaseda <
marcosluis2...@gmail.com> wrote:
> Remember that in YARN, the two main responsibilities of the JobTracker is
> divided in two different compon
Hi,
I am starting to learn about MapReduce with Hadoop via the WordCount example.
I am a bit confused about the boundary between the map and the reduce
program. Is there a standard format for the map output and the reduce input?
Is there a full explanation of the Java classes used somewhere? I would also
appreciate to lea
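On the map/reduce boundary: the map output and the reduce input share one contract, a key type and a value type (for WordCount, word and count); the framework groups the map outputs by key and hands each key with its list of values to the reducer. A plain-Java sketch of that flow (a hypothetical simulation for illustration; the real classes are org.apache.hadoop.mapreduce.Mapper and Reducer, and the shuffle is done by the framework):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Simulates the WordCount map -> shuffle -> reduce flow without the framework.
public class WordCountSketch {
    static Map<String, Integer> run(List<String> lines) {
        // "shuffle" phase: collect map outputs grouped by key
        Map<String, List<Integer>> grouped = new HashMap<>();
        for (String line : lines) {                         // map phase:
            for (String word : line.split("\\s+")) {        // emit (word, 1)
                grouped.computeIfAbsent(word, k -> new ArrayList<>()).add(1);
            }
        }
        Map<String, Integer> result = new HashMap<>();      // reduce phase:
        for (Map.Entry<String, List<Integer>> e : grouped.entrySet()) {
            int sum = 0;                                    // sum the 1s per word
            for (int v : e.getValue()) sum += v;
            result.put(e.getKey(), sum);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(run(Arrays.asList("hello world", "hello hadoop")));
    }
}
```

In real Hadoop code the same contract is declared via generics: a Mapper<LongWritable, Text, Text, IntWritable> must pair with a Reducer<Text, IntWritable, ...> whose input types match the mapper's output types.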
Remember that in YARN, the two main responsibilities of the JobTracker are
divided into two different components:
- Resource management, by the ResourceManager (this is a global component)
- Job scheduling and monitoring, by the ApplicationMaster (this is a
per-application component)
- Resource negotiation and task exec
Is there a reliable way of converting an HDFS Path object into a String?
Invoking path.toUri().toString() does not work with special characters
(e.g., if there are spaces in the original path string). So, for instance,
in the following example
String address = ...; // Path string without the hdfs
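One workaround that is often suggested is to read the decoded path component rather than the encoded string form, i.e. path.toUri().getPath() instead of path.toUri().toString(). A sketch with java.net.URI from the standard library (which is what Path.toUri() returns; the hdfs URI here is a made-up example):

```java
import java.net.URI;
import java.net.URISyntaxException;

// URI.toString() percent-encodes special characters, while URI.getPath()
// returns the decoded path component. Hadoop's path.toUri().getPath()
// follows the same rule, which is why it round-trips spaces correctly.
public class PathStringSketch {
    public static void main(String[] args) throws URISyntaxException {
        // The multi-argument constructor quotes illegal characters (spaces)
        URI uri = new URI("hdfs", "namenode:8020", "/user/me/my file.txt",
                          null, null);
        System.out.println(uri.toString()); // hdfs://namenode:8020/user/me/my%20file.txt
        System.out.println(uri.getPath());  // /user/me/my file.txt
    }
}
```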
Are there any standard or recommended scripts for the mapred.healthChecker
options in the mapred-site.xml configuration file for a Linux box?
-Jeff
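I'm not aware of a standard script, but here is a minimal sketch (the checks and thresholds are hypothetical; the documented contract is that the health checker marks the node unhealthy when the script prints a line beginning with ERROR):

```sh
#!/bin/sh
# Hypothetical node health check: prints "ERROR ..." when unhealthy, "OK" otherwise.
health_check() {
    scratch=${1:-/tmp}
    # Unhealthy if the scratch directory is not writable
    if ! touch "$scratch/.health_probe" 2>/dev/null; then
        echo "ERROR scratch dir $scratch is not writable"
        return 0
    fi
    rm -f "$scratch/.health_probe"
    # Unhealthy if less than 1 GB free on the root filesystem
    free_kb=$(df -Pk / | awk 'NR==2 {print $4}')
    if [ "$free_kb" -lt 1048576 ]; then
        echo "ERROR less than 1 GB free on /"
        return 0
    fi
    echo "OK"
}
health_check "$@"
```

Point mapred.healthChecker.script.path at the script; anything beyond the ERROR-prefix convention (which disks or services to probe) is site-specific.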
Thanks for the list Marcos. I will go through the slides/links. I think that's
helpful
Regards,
Dhaval
From: Marcos Luis Ortiz Valmaseda
To: Dhaval Shah
Cc: user@hadoop.apache.org
Sent: Thursday, 8 August 2013 4:50 PM
Subject: Re: Hosting Hadoop
Well, a
Well, it all depends, because many companies use cloud computing
platforms like Amazon EMR, VMware, and Rackspace Cloud for Hadoop
hosting:
http://aws.amazon.com/elasticmapreduce
http://www.vmware.com/company/news/releases/vmw-mapr-hadoop-062013.html
http://bitrefinery.com/services/hadoop-hosting
http://
We are exploring the possibility of hosting Hadoop outside of our data centers.
I am aware that Hadoop in general isn't exactly designed to run on virtual
hardware. So a few questions:
1. Are there any providers out there who would host Hadoop on dedicated
physical hardware?
2. Has anyone had s
Hi devdoer,
What version are you using?
-Sandy
On Thu, Aug 8, 2013 at 4:25 AM, devdoer bird wrote:
> HI:
>
> I configure the FairScheduler with default settings and my job has 19
> reduce tasks. I found that all the reduce tasks are schedule to run in one
> node.
>
> While with default FIFO
The sequence writer is also synchronized; I don't think this is bad.
If you call the HDFS API to write concurrently, then it's necessary.
On Aug 8, 2013 7:53 PM, "Jay Vyas" wrote:
> Then is this a bug? Synchronization in absence of any race condition is
> normally considered "bad".
>
> In any case id lik
I would say yes, make this a Jira.
The actual change can go (as proposed by Jay) in two directions: put
synchronization into all implementations OR take it out of all
implementations.
I think the first thing to determine is why the synchronization was put
into the LineRecordWriter in the first
Hello Ch Huang,
Do you know this book?
"Hadoop Operations" http://shop.oreilly.com/product/0636920025085.do
I think it answers most of the questions in detail.
For a production cluster you should consider MRv1.
And I suggest you go with more hard drives per slave node to have
higher
IO b
Hi Harsh,
Do you want me to raise a Jira on this?
regards,
sathwik
On Thu, Aug 8, 2013 at 5:23 PM, Jay Vyas wrote:
> Then is this a bug? Synchronization in absence of any race condition is
> normally considered "bad".
>
> In any case id like to know why this writer is synchronized whereas the
Thanks for the hints Shekhar.
My cluster is running well.
Felipe
On Thu, Aug 8, 2013 at 8:56 AM, Shekhar Sharma wrote:
> keep the configuration same in the datanodes as well for the time
> being..Only thing that data node or slave machine should know is Masters
> files ( that means who is the m
Keep the configuration the same on the datanodes as well for the time
being. The only thing the data node (slave) machine should know is the
masters file (that is, who is the master),
and you need to tell the slave machine where your namenode is running,
which you need to specify in the property fs.defaul
Then is this a bug? Synchronization in the absence of any race condition is
normally considered "bad".
In any case, I'd like to know why this writer is synchronized whereas the
other ones are not. That is, I think, the point at issue: either the other
writers should be synchronized, or else this one sho
If you have removed this property from the slave machines, then your DN
information will be created under the /tmp folder, and once you reboot your
data node machines the information will be lost.
Sorry, I had not seen the logs, but you don't have to play around with the
properties.
...see, the datanode will not c
Thanks,
in all files I changed to master (cloud6) and I removed the property
hadoop.tmp.dir.
Felipe
On Wed, Aug 7, 2013 at 3:20 PM, Shekhar Sharma wrote:
> Disable the firewall on data node and namenode machines..
> Regards,
> Som Shekhar Sharma
> +91-8197243810
>
>
> On Wed, Aug 7, 2013 at 1
HI:
I configured the FairScheduler with default settings and my job has 19
reduce tasks. I found that all the reduce tasks are scheduled to run on one
node.
While with the default FIFO scheduler, the 19 reduce tasks are scheduled
onto different nodes.
How can I configure the FairScheduler to load more balan
I may be nitpicking here, but if "perhaps the answer is no" then I conclude:
perhaps the other implementations of RecordWriter are a race condition/file
corruption waiting to occur.
On Thu, Aug 8, 2013 at 12:50 PM, Harsh J wrote:
> While we don't fork by default, we do provide a MultithreadedMappe
While we don't fork by default, we do provide a MultithreadedMapper
implementation that would require such synchronization. But if you are
asking is it necessary, then perhaps the answer is no.
On Aug 8, 2013 3:43 PM, "Azuryy Yu" wrote:
> its not hadoop forked threads, we may create a line record
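The multi-threaded case can be illustrated without Hadoop at all; this plain-Java sketch (a hypothetical stand-in for MultithreadedMapper threads sharing one LineRecordWriter, not the real class) shows why the shared write must be synchronized so records don't interleave:

```java
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.List;

// Stand-in for a shared record writer: several "mapper" threads append
// key/value lines to one underlying writer, as MultithreadedMapper would.
public class SharedWriterSketch {
    private final StringWriter out = new StringWriter();

    // Like LineRecordWriter.write(..): synchronized, so one whole record
    // is written atomically even when threads share the writer.
    public synchronized void write(String key, String value) {
        out.write(key + "\t" + value + "\n");
    }

    public String contents() { return out.toString(); }

    public static void main(String[] args) throws InterruptedException {
        final SharedWriterSketch writer = new SharedWriterSketch();
        List<Thread> threads = new ArrayList<>();
        for (int t = 0; t < 4; t++) {                    // 4 "mapper" threads
            final int id = t;
            Thread th = new Thread(() -> {
                for (int i = 0; i < 1000; i++) writer.write("k" + id, "v" + i);
            });
            threads.add(th);
            th.start();
        }
        for (Thread th : threads) th.join();
        // Every record stays an intact line: 4 * 1000 lines total.
        System.out.println(writer.contents().split("\n").length); // 4000
    }
}
```

Without the synchronized keyword on write(..), the key, tab, value, and newline of concurrent records could interleave in the output, which is the file-corruption scenario discussed above.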
logs:
2013-08-08 06:03:51,627 INFO
org.apache.oozie.command.wf.ActionStartXCommand: USER[root] GROUP[-]
TOKEN[] APP[clickstream-wf] JOB[044-130719141217337-oozie-oozi-W]
ACTION[044-130719141217337-oozie-oozi-W@:start:] Start action
[044-130719141217337-oozie-oozi-W@:start:] with
Hey Jitendra,
I ensured those two things you mentioned; still, I'm facing the same issue.
Regards,
Kasa
On Wed, Aug 7, 2013 at 7:32 PM, Jitendra Yadav
wrote:
> Hi,
>
> I hope below points might help you.
>
> *Approach 1#*
>
> You need to change the sshd_config file in the remote server (probab
If you want HA, then do you want to deploy a JournalNode on the DN?
On Aug 8, 2013 5:09 PM, "ch huang" wrote:
> hi,all:
> My company need build a 10 node hadoop cluster (2 namenode and
> 8 datanode & node manager ,for both data storage and data analysis ) ,we
> have hbase ,hive on the
It's not Hadoop that forks the threads; we may create a line record writer
ourselves and then call this writer concurrently.
On Aug 8, 2013 4:00 PM, "Sathwik B P" wrote:
> Hi,
> Thanks for your reply.
> May I know where does hadoop fork multiple threads to use a single
> RecordWriter.
>
> regards,
> sathwik
>
> On Thu
hi, all:
My company needs to build a 10-node Hadoop cluster (2 namenodes and
8 datanodes & node managers, for both data storage and data analysis). We
have HBase and Hive on the Hadoop cluster, with 10 GB of data increment per
day. We use CDH4.3 (for dual-namenode HA); my plan is
Hi,
In my company we sometimes use KVM to launch test or small demo clusters.
Every developer also has a KVM pseudo-distributed cluster on their computer.
Nonetheless, I would not recommend using KVM for production clusters.
Check this link about all the theory of Hadoop with virtualizat
Hi,
Thanks for your reply.
May I know where Hadoop forks multiple threads to use a single
RecordWriter?
regards,
sathwik
On Thu, Aug 8, 2013 at 7:06 AM, Azuryy Yu wrote:
> because we may use multi-threads to write a single file.
> On Aug 8, 2013 2:54 PM, "Sathwik B P" wrote:
>
>> Hi,
>>
>>
Because we may use multiple threads to write a single file.
On Aug 8, 2013 2:54 PM, "Sathwik B P" wrote:
> Hi,
>
> LineRecordWriter.write(..) is synchronized. I did not find any other
> RecordWriter implementations define the write as synchronized.
> Any specific reason for this.
>
> regards,
> sath