Re: WELCOME to user@hadoop.apache.org

2015-06-08 Thread
It seems the parameter "mapreduce.map.memory.mb" is read from the client-side
job configuration, not from the slave nodes' configuration.
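
Since that value comes from the job configuration on the submitting side, a
minimal sketch (illustrative values, not a drop-in fix) of setting it per job
from the client would be:

// Sketch: set the map container size and heap in the client-side job configuration,
// because (as noted above) these per-job values are read on the client, not from
// the slaves' mapred-site.xml.
Configuration conf = new Configuration();
conf.setInt("mapreduce.map.memory.mb", 1800);      // requested container size per map task
conf.set("mapreduce.map.java.opts", "-Xmx1500m");  // heap a bit smaller than the container
Job job = Job.getInstance(conf, "memory-example");

The same values can also be passed on the command line, e.g.
-D mapreduce.map.memory.mb=1800, when the driver uses ToolRunner.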

2015-06-07 15:05 GMT+08:00 J. Rottinghuis :

> On each node you can configure how much memory is available for containers
> to run.
> On the other hand, for each application you can configure how large
> containers should be. For MR apps, you can separately set mappers,
> reducers, and the app master itself.
>
> Yarn will determine through scheduling rules and depending on locality
> where tasks are run. One app has one container size (per respective
> category map, reduce, AM) that is not driven by nodes. Available node
> memory divided by task size will determine how many tasks run on each node.
> There are minimum and maximum container sizes, so you can avoid running
> crazy things such as 1K 1MB containers for example.
>
> Hope that helps,
>
> Joep
>
> On Thu, Jun 4, 2015 at 6:48 AM, paco  wrote:
>
>>
>> Hello,
>>
>> Recently I have increased my physical cluster. I have two kind of nodes:
>>
>> Type 1:
>> RAM: 24 GB
>> 12 cores
>>
>> Type 2:
>> RAM: 64 GB
>> 12 cores
>>
>> These nodes are in the same physical rack. I would like to configure them
>> to use 12 containers per node: on nodes of type 1 each mapper gets 1.8GB
>> (22GB / 12 cores = 1.8GB), and on nodes of type 2 each mapper gets 5.3GB
>> (60/12). Is this possible?
>>
>> I have configured it like this:
>>
>> nodes type 1 (slaves):
>>
>> <property>
>>   <name>yarn.nodemanager.resource.memory-mb</name>
>>   <value>22000</value>
>> </property>
>>
>> <property>
>>   <name>mapreduce.map.memory.mb</name>
>>   <value>1800</value>
>> </property>
>> <property>
>>   <name>mapred.map.child.java.opts</name>
>>   <value>-Xmx1800m</value>
>> </property>
>>
>>
>>
>> nodes type 2 (slaves):
>>
>> <property>
>>   <name>yarn.nodemanager.resource.memory-mb</name>
>>   <value>6</value>
>> </property>
>>
>> <property>
>>   <name>mapreduce.map.memory.mb</name>
>>   <value>5260</value>
>> </property>
>> <property>
>>   <name>mapred.map.child.java.opts</name>
>>   <value>-Xmx5260m</value>
>> </property>
>>
>>
>>
>> Hadoop is instead creating mappers with 1 GB of memory, like this:
>>
>> Nodes of type 1:
>> 20GB/1GB = 20 containers, each executing with -Xmx1800
>>
>> Nodes of type 2:
>> 60GB/1GB = 60 containers, each executing with -Xmx5260
>>
>>
>> Thanks!
>>
>>
>>
>


Re: How to implement a servlet to submit job to Hadoop 2.6.0 cluster

2015-05-27 Thread
>
> config.addResource(new Path("/usr/local/hadoop-2.6.0/etc/hadoop/core-site.xml"));
>
> the path is org.apache.hadoop.fs.Path, so the resource should be in HDFS.
Do you have the resource in HDFS?

Can you try this API instead:

> config.addResource(InputStream in)
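
A minimal sketch of that overload, reading the same local files as streams
(error handling omitted; the paths are the ones from the original mail):

// Sketch: add the local XML files as InputStreams instead of Path objects.
// requires: java.io.FileInputStream, org.apache.hadoop.conf.Configuration
Configuration config = new Configuration();
config.addResource(new FileInputStream("/usr/local/hadoop-2.6.0/etc/hadoop/core-site.xml"));
config.addResource(new FileInputStream("/usr/local/hadoop-2.6.0/etc/hadoop/hdfs-site.xml"));
config.addResource(new FileInputStream("/usr/local/hadoop-2.6.0/etc/hadoop/yarn-site.xml"));
config.addResource(new FileInputStream("/usr/local/hadoop-2.6.0/etc/hadoop/mapred-site.xml"));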


2015-05-25 18:36 GMT+08:00 Carmen Manzulli :

>
> Hi,
>
> I'm trying to run a servlet (QueryServlet), using Tomcat 8, which submits
> a job called "ArqOnHadoop2" to Hadoop 2.6.0... the latter is configured as a
> single-node setup in the /usr/local folder. This job works if I
> start it from the command line but, when I try to execute the following code
> from NetBeans, I receive "HTTP Status 500 - Cannot initialize Cluster. Please
> check your configuration for mapreduce.framework.name and the correspond
> server addresses."
>
> Configuration config = new Configuration();
> config.addResource(new Path("/usr/local/hadoop-2.6.0/etc/hadoop/core-site.xml"));
> config.addResource(new Path("/usr/local/hadoop-2.6.0/etc/hadoop/hdfs-site.xml"));
> config.addResource(new Path("/usr/local/hadoop-2.6.0/etc/hadoop/yarn-site.xml"));
> config.addResource(new Path("/usr/local/hadoop-2.6.0/etc/hadoop/mapred-site.xml"));
>
> config.set("fs.hdfs.impl", org.apache.hadoop.hdfs.DistributedFileSystem.class.getName());
> config.set("yarn.resourcemanager.address", "master:8032");
> config.set("mapreduce.framework.name", "yarn");
> config.set("fs.defaultFS", "hdfs://master:9000");
>
> // input query parameter as a string
> config.set(ArqReducerConstants.MY_QUERY, args[0]);
>
> Job job = Job.getInstance(config);
> job.setJarByClass(QueryServlet.class);
>
> // number of lines per mapper, used to reduce the number of splits and
> // run the map() method quickly
> String N = args[4];
> int n = Integer.parseInt(N);
> job.getConfiguration().setInt(NLineInputFormat.LINES_PER_MAP, n);
>
> job.setMapperClass(MapperDoesNothing.class);
> job.setMapOutputKeyClass(NullWritable.class);
> job.setMapOutputValueClass(TripleWritable.class);
>
> job.setReducerClass(QueryReducer.class);
>
> job.setInputFormatClass(BlockedNTriplesInputFormat.class);
> job.setOutputFormatClass(TextOutputFormat.class);
>
> // input and output path parameters
> String in = "hdfs://master:9000" + args[1];
> String out = "hdfs://master:9000" + args[2];
> FileInputFormat.setInputPaths(job, new Path(in));
> FileOutputFormat.setOutputPath(job, new Path(out));
>
> job.waitForCompletion(true);
>
> // where args is just a String array with some input parameters...
>
> There are no problems with Hadoop or with the .xml files; there are no
> problems with permissions, because I've disabled them with
> dfs.permission.enables=false in hdfs-site.xml; and there is no problem with
> permissions on my hadoop folder either, because I've used chmod -R 777.
>
> So... what is my project missing to reach the goal? I need help...
>
> I think there is something missing when I set the Configuration object, or
> the problem could be due to something about the jars in the classpath, but in
> that case I don't know how to include all the Hadoop jars using Maven...
>
> Thanks in advance, Carmen.
>
>


Re: several jobs in one MapReduce runtime

2015-04-20 Thread
If the cluster has enough resources, then more than one job will run at the
same time.
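
A minimal sketch (job setup elided) of submitting several jobs from one client
without blocking, so the scheduler can run them concurrently when resources
allow:

// Sketch: use submit() (non-blocking) instead of waitForCompletion() per job,
// then wait for all of them at the end.
// requires: java.util.*, org.apache.hadoop.conf.Configuration, org.apache.hadoop.mapreduce.Job
List<Job> jobs = new ArrayList<Job>();
for (int i = 0; i < 3; i++) {
  Job job = Job.getInstance(new Configuration(), "job-" + i);
  // ... set jar, mapper, reducer, input and output paths for each job here ...
  job.submit();                    // returns as soon as the job is handed to the cluster
  jobs.add(job);
}
for (Job job : jobs) {
  job.waitForCompletion(true);     // blocks until that job finishes
}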

2015-04-18 2:27 GMT+08:00 xeonmailinglist-gmail :

> Hi,
>
> I have a MapReduce runtime where I have put several jobs to run
> concurrently. How do I manage the job scheduler so that it doesn't run just
> one job at a time?
>
> Thanks,
>
> --
> --
>
>


Re: Unable to load file from local to HDFS cluster

2015-04-11 Thread
Oh, I see. Is it that you had configured a conflicting port before?

2015-04-09 18:36 GMT+08:00 sandeep vura :

> Hi Yanghaogn,
>
> Sure. We couldn't load the file from local to HDFS. It was getting a
> DFSOutputStream connection-refused exception, which means packets were not
> being received properly from the namenode to the datanodes. Also, when we
> started the cluster, our datanodes were not starting properly and were getting
> connection-closed exceptions.
>
> Our Hadoop web UI was also opening very slowly, and ssh connections were very
> slow. Finally we changed our network ports, checked the performance of the
> cluster, and it works well now.
>
> The issue was fixed in the namenode server's network port.
>
> Regards,
> Sandeep.v
>
>
> On Thu, Apr 9, 2015 at 12:30 PM, 杨浩  wrote:
>
>> Root cause: a network-related issue?
>> Can you tell us about it in more detail? Thank you
>>
>> 2015-04-09 13:51 GMT+08:00 sandeep vura :
>>
>>> Our issue has been resolved.
>>>
>>> Root cause: Network related issue.
>>>
>>> Thanks to everyone who spent some time on this and replied to my questions.
>>>
>>> Regards,
>>> Sandeep.v
>>>
>>> On Thu, Apr 9, 2015 at 10:45 AM, sandeep vura 
>>> wrote:
>>>
>>>> Can anyone give solution for my issue?
>>>>
>>>> On Thu, Apr 9, 2015 at 12:48 AM, sandeep vura 
>>>> wrote:
>>>>
>>>>> Exactly but every time it picks randomly. Our datanodes are
>>>>> 192.168.2.81,192.168.2.82,192.168.2.83,192.168.2.84,192.168.2.85
>>>>>
>>>>> Namenode  : 192.168.2.80
>>>>>
>>>>> If i restarts the cluster next time it will show 192.168.2.81:50010
>>>>> connection closed
>>>>>
>>>>> On Thu, Apr 9, 2015 at 12:28 AM, Liaw, Huat (MTO) <
>>>>> huat.l...@ontario.ca> wrote:
>>>>>
>>>>>>  You can not start 192.168.2.84:50010…. closed by ((192.168.2.x
>>>>>> -datanode))
>>>>>>
>>>>>>
>>>>>>
>>>>>> *From:* sandeep vura [mailto:sandeepv...@gmail.com]
>>>>>> *Sent:* April 8, 2015 2:39 PM
>>>>>>
>>>>>> *To:* user@hadoop.apache.org
>>>>>> *Subject:* Re: Unable to load file from local to HDFS cluster
>>>>>>
>>>>>>
>>>>>>
>>>>>> We are using this setup from a very long time.We are able to run all
>>>>>> the jobs successfully but suddenly went wrong with namenode.
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Thu, Apr 9, 2015 at 12:06 AM, sandeep vura 
>>>>>> wrote:
>>>>>>
>>>>>> I have also noticed another issue when starting hadoop cluster
>>>>>> start-all.sh command
>>>>>>
>>>>>>
>>>>>>
>>>>>> namenode and datanode daemons are starting.But sometimes one of the
>>>>>> datanode would drop the connection and it shows the message connection
>>>>>> closed by ((192.168.2.x -datanode)) everytime when it restart the hadoop
>>>>>> cluster datanode will keeps changing .
>>>>>>
>>>>>>
>>>>>>
>>>>>> for example 1st time when i starts hadoop cluster - 192.168.2.1 -
>>>>>> connection closed
>>>>>>
>>>>>> 2nd time when i starts hadoop cluster - 192.168.2.2-connection closed
>>>>>> .This point again 192.168.2.1 will starts successfuly without any errors.
>>>>>>
>>>>>>
>>>>>>
>>>>>> I couldn't able to figure out the issue exactly.Is issue relates to
>>>>>> network or Hadoop configuration.
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Wed, Apr 8, 2015 at 11:54 PM, Liaw, Huat (MTO) <
>>>>>> huat.l...@ontario.ca> wrote:
>>>>>>
>>>>>> hadoop fs -put   Copy from remote location to
>>>>>> HDFS
>>>>>>
>>>>>>
>>>>>>
>>>>>> *From:* sandeep vura [mailto:sandeepv...@gmail.com]
>>>>>> *Sent:* April 8, 2015 2:24 PM
>>>>>> *To:* user@hadoop.apache.org
>>>>>> *Subject:* Re: Unable to load file from local to HDFS cluster
>>>>>>
>>>>>>
>>>>>>
>>>>>> Sorry Liaw,I tried same command but its didn't resolve.
>>>>>>
>>>>>>
>>>>>>
>>>>>> Regards,
>>>>>>
>>>>>> Sandeep.V
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Wed, Apr 8, 2015 at 11:37 PM, Liaw, Huat (MTO) <
>>>>>> huat.l...@ontario.ca> wrote:
>>>>>>
>>>>>> Should be hadoop dfs -put
>>>>>>
>>>>>>
>>>>>>
>>>>>> *From:* sandeep vura [mailto:sandeepv...@gmail.com]
>>>>>> *Sent:* April 8, 2015 1:53 PM
>>>>>> *To:* user@hadoop.apache.org
>>>>>> *Subject:* Unable to load file from local to HDFS cluster
>>>>>>
>>>>>>
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>>
>>>>>>
>>>>>> When loading a file from local to HDFS cluster using the below
>>>>>> command
>>>>>>
>>>>>>
>>>>>>
>>>>>> hadoop fs -put sales.txt /sales_dept.
>>>>>>
>>>>>>
>>>>>>
>>>>>> Getting the following exception.Please let me know how to resolve
>>>>>> this issue asap.Please find the attached is the logs that is displaying 
>>>>>> on
>>>>>> namenode.
>>>>>>
>>>>>>
>>>>>>
>>>>>> Regards,
>>>>>>
>>>>>> Sandeep.v
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>
>>>
>>
>


Re: Unsubscribe

2015-04-11 Thread
You can send a mail to user-unsubscr...@hadoop.apache.org

2015-04-09 23:35 GMT+08:00 Ram :

>
>


Re: Unsubscribe

2015-04-11 Thread
You can send a mail to user-unsubscr...@hadoop.apache.org

2015-04-10 1:16 GMT+08:00 Liaw, Huat (MTO) :

>  What do you want to unsubscribe from?
>
>
>
> *From:* Rajeev Yadav [mailto:rajeya...@gmail.com]
> *Sent:* April 9, 2015 1:02 PM
> *To:* user@hadoop.apache.org
> *Subject:* Unsubscribe
>
>
>
> Unsubscribe
>
>
>
> --
>
> *Warm Regards,*
>
> Rajeev Yadav
>
>
>


Re: Can we run mapreduce job from eclipse IDE on fully distributed mode hadoop cluster?

2015-04-11 Thread
I think you can have a try at http://hdt.incubator.apache.org/
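
One thing worth checking, as a hedged guess rather than a diagnosis: if the
driver is launched from Eclipse without the cluster's *-site.xml files on the
classpath, it usually falls back to the local job runner, which would explain
both the shorter run time and the jobs not showing up in the history server or
the port 8088 UI. A minimal sketch (host names and paths are placeholders) of
pointing an IDE-launched driver at the real cluster:

// Sketch: make an IDE-launched driver submit to the YARN cluster instead of
// running in the local job runner.
Configuration conf = new Configuration();
conf.set("fs.defaultFS", "hdfs://master:9000");        // placeholder namenode address
conf.set("mapreduce.framework.name", "yarn");
conf.set("yarn.resourcemanager.hostname", "master");   // placeholder RM host
Job job = Job.getInstance(conf, "from-eclipse");
job.setJar("/path/to/job.jar");   // ship the built jar so the tasks can load your classes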

2015-04-12 2:04 GMT+08:00 Answer Agrawal :

> Thanks Jonathan
>
> I have installed and configured my own Hadoop cluster with one master node
> and 7 slave nodes. Now I just want to make sure that a job run through
> Eclipse is internally the same as one run through a jar file. Also, the job
> history server and the web UI on port 8088 list only those jobs which were
> submitted as a jar from the terminal.
>
>
>
>
> On Sat, Apr 11, 2015 at 12:25 PM, Jonathan Aquilina <
> jaquil...@eagleeyet.net> wrote:
>
>>  I could be wrong here, but the way I understand things I do not think
>> it is even possible to run the JAR file from your PC. There are two
>> things that you need to realize.
>>
>> 1) How is the JAR file going to connect to the cluster
>>
>> 2) How is the JAR file going to be distributed to the cluster.
>>
>> Again I could be wrong here in my response, so anyone else on the list
>> feel free to correct me. I am still a novice to Hadoop and have only worked
>> with it on amazon EMR.
>>
>> ---
>> Regards,
>> Jonathan Aquilina
>> Founder Eagle Eye T
>>
>>  On 2015-04-11 08:23, Answer Agrawal wrote:
>>
>>  A MapReduce job can be run as a jar file from the terminal or directly from
>> the Eclipse IDE. When a job runs as a jar file from the terminal it uses multiple
>> JVMs and all the resources of the cluster. Does the same thing happen when we run
>> it from the IDE? I have run a job both ways and it takes less time from the IDE
>> than as a jar file on the terminal.
>>
>> Thanks
>>
>>
>>
>


Re: Unable to load file from local to HDFS cluster

2015-04-09 Thread
Root cause: a network-related issue?
Can you tell us about it in more detail? Thank you

2015-04-09 13:51 GMT+08:00 sandeep vura :

> Our issue has been resolved.
>
> Root cause: Network related issue.
>
> Thanks to everyone who spent some time on this and replied to my questions.
>
> Regards,
> Sandeep.v
>
> On Thu, Apr 9, 2015 at 10:45 AM, sandeep vura 
> wrote:
>
>> Can anyone give solution for my issue?
>>
>> On Thu, Apr 9, 2015 at 12:48 AM, sandeep vura 
>> wrote:
>>
>>> Exactly but every time it picks randomly. Our datanodes are
>>> 192.168.2.81,192.168.2.82,192.168.2.83,192.168.2.84,192.168.2.85
>>>
>>> Namenode  : 192.168.2.80
>>>
>>> If i restarts the cluster next time it will show 192.168.2.81:50010
>>> connection closed
>>>
>>> On Thu, Apr 9, 2015 at 12:28 AM, Liaw, Huat (MTO) 
>>> wrote:
>>>
  You can not start 192.168.2.84:50010…. closed by ((192.168.2.x
 -datanode))



 *From:* sandeep vura [mailto:sandeepv...@gmail.com]
 *Sent:* April 8, 2015 2:39 PM

 *To:* user@hadoop.apache.org
 *Subject:* Re: Unable to load file from local to HDFS cluster



 We are using this setup from a very long time.We are able to run all
 the jobs successfully but suddenly went wrong with namenode.



 On Thu, Apr 9, 2015 at 12:06 AM, sandeep vura 
 wrote:

 I have also noticed another issue when starting hadoop cluster
 start-all.sh command



 namenode and datanode daemons are starting.But sometimes one of the
 datanode would drop the connection and it shows the message connection
 closed by ((192.168.2.x -datanode)) everytime when it restart the hadoop
 cluster datanode will keeps changing .



 for example 1st time when i starts hadoop cluster - 192.168.2.1 -
 connection closed

 2nd time when i starts hadoop cluster - 192.168.2.2-connection closed
 .This point again 192.168.2.1 will starts successfuly without any errors.



 I couldn't able to figure out the issue exactly.Is issue relates to
 network or Hadoop configuration.







 On Wed, Apr 8, 2015 at 11:54 PM, Liaw, Huat (MTO) 
 wrote:

 hadoop fs -put   Copy from remote location to HDFS



 *From:* sandeep vura [mailto:sandeepv...@gmail.com]
 *Sent:* April 8, 2015 2:24 PM
 *To:* user@hadoop.apache.org
 *Subject:* Re: Unable to load file from local to HDFS cluster



 Sorry Liaw,I tried same command but its didn't resolve.



 Regards,

 Sandeep.V



 On Wed, Apr 8, 2015 at 11:37 PM, Liaw, Huat (MTO) 
 wrote:

 Should be hadoop dfs -put



 *From:* sandeep vura [mailto:sandeepv...@gmail.com]
 *Sent:* April 8, 2015 1:53 PM
 *To:* user@hadoop.apache.org
 *Subject:* Unable to load file from local to HDFS cluster



 Hi,



 When loading a file from local to HDFS cluster using the below command



 hadoop fs -put sales.txt /sales_dept.



 Getting the following exception.Please let me know how to resolve this
 issue asap.Please find the attached is the logs that is displaying on
 namenode.



 Regards,

 Sandeep.v







>>>
>>>
>>
>


Re: Question about log files

2015-04-06 Thread
I think the log information has been lost.

Hadoop is not designed to recover from its log files being deleted like that.

2015-04-02 11:45 GMT+08:00 煜 韦 :

> Hi there,
> If log files are deleted without restarting the service, it seems that the
> logs of later operations are lost, for example on the namenode and datanode.
> Why can't log files be re-created when they are deleted, by mistake or on
> purpose, while the cluster is running?
>
> Thanks,
> Jared
>


Re: How to know when datanode are marked dead by namenode

2015-03-30 Thread
Hi Ted,
I have read the JIRA issue, and it says, "The patch appears to be a
documentation patch that doesn't require tests."

Can you tell me which patches should add unit tests, and which do not need to?
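
For the heartbeat-expiry formula quoted further down in this thread, a quick
worked sketch of the arithmetic (default values as in hdfs-default.xml; the
5-minute figure answers Himawan's original question):

// Sketch of the expiry arithmetic from the formula quoted below (milliseconds).
long heartbeatIntervalMs = 3 * 1000;        // dfs.heartbeat.interval, default 3 s
long recheckIntervalMs   = 5 * 60 * 1000;   // dfs.namenode.heartbeat.recheck-interval, default 5 min
long expiryMs = 2 * recheckIntervalMs + 10 * heartbeatIntervalMs;  // 630000 ms = 10.5 min

// To mark a datanode dead after roughly 5 minutes instead, solve 2*x + 10*3000 = 300000,
// giving x = 135000, i.e. set dfs.namenode.heartbeat.recheck-interval = 135000 in hdfs-site.xml.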

2015-03-29 9:44 GMT+08:00 Ted Yu :

> Himawan:
> You don't need to recompile the code.
> Please see this thread http://search-hadoop.com/m/LgpTk2FkzEk
>
> The last comment in that thread led to:
> https://issues.apache.org/jira/browse/HDFS-7685
>
> Cheers
>
> On Sat, Mar 28, 2015 at 6:36 PM, Himawan Mahardianto <
> mahardia...@ugm.ac.id> wrote:
>
>> So, if I want to change the default dead time, I have to compile from
>> source first?
>> Is there any other way that I use to change dead time from native
>> hadoop-2.6 (not from source)?
>> Or maybe the native doesn't have a feature to change the dead time?
>>
>> Thank's all for the responses :)
>>
>> On Sun, Mar 29, 2015 at 8:13 AM, Ted Yu  wrote:
>>
>>> Himawan:
>>> Please see the following constants
>>> in 
>>> hadoop-hdfs-project//hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSConfigKeys.java
>>> :
>>>
>>>   public static final String
>>> DFS_NAMENODE_HEARTBEAT_RECHECK_INTERVAL_KEY =
>>> "dfs.namenode.heartbeat.recheck-interval";
>>>
>>>   public static final int
>>> DFS_NAMENODE_HEARTBEAT_RECHECK_INTERVAL_DEFAULT = 5*60*1000;
>>>
>>> Cheers
>>>
>>> On Sat, Mar 28, 2015 at 5:48 PM, Himawan Mahardianto <
>>> mahardia...@ugm.ac.id> wrote:
>>>
 Thank you for your response and explanation, but I couldn't find "
 dfs.namenode.heartbeat.recheck-interval" parameter on
 http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/hdfs-default.xml
 that you told before, maybe any other formula with this version?

 On Sat, Mar 28, 2015 at 6:42 PM, Brahma Reddy Battula <
 brahmareddy.batt...@huawei.com> wrote:

>  HI
>
> The NameNode updates this detail after 10.30 minutes by default. You
> can see the dead and live datanodes at that time.
>
> It computes this heartbeatExpireInterval time by the following formula
>
>   heartbeatExpireInterval = 2 * heartbeatRecheckInterval +
>   10 * 1000* heartbeatInterval
>
> where heartbeatRecheckInterval is defined by the configuration
> dfs.namenode.heartbeat.recheck-interval which is 5 minutes by default and
> heartbeatInterval by dfs.heartbeat.interval which is 3 seconds by default.
>
> Hence
> heartbeatExpireInterval = 10.30 minutes
>
>
> SO If you want keep more time, you can configure
> dfs.namenode.heartbeat.recheck-interval based one your requirement..
>
>
>  Thanks & Regards
>
> Brahma Reddy Battula
>
>
>
>
>   --
> *From:* Himawan Mahardianto [mahardia...@ugm.ac.id]
> *Sent:* Saturday, March 28, 2015 4:42 PM
> *To:* user@hadoop.apache.org
> *Subject:* How to know when datanode are marked dead by namenode
>
>   Hi guys I'm newbie here, do you know how to time calculation when
> datanode are marked dead by namenode, what parameters on HDFS-SITE.xml
> should I look for to calculate it, and how can I reduce dead time from
> default 10 minutes to 5 minutes or increase it to 20 minutes?
> Thank's before
>
>  best regards
> Himawan Mahardianto
>


>>>
>>
>


Re: trying to understand HashPartitioner

2015-03-26 Thread
It's not the number of reduce tasks, but the ID of the reduce task. A given
key will only be handled by one reduce task.

In MRv2, each reduce task has an ID, like 0, 1, 2, 3, 4. The return value is the
reduce ID, and the <key, value> pair will be processed by that reduce task.
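
For reference, the partitioning logic itself is essentially a one-liner; this is
a sketch of what HashPartitioner does in Hadoop 2:

// Essentially HashPartitioner.getPartition(): the returned value is the ID
// (0 .. numReduceTasks-1) of the reduce task that will process this key.
public int getPartition(K2 key, V2 value, int numReduceTasks) {
  return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
}

So all <key, value> pairs whose keys have the same hash modulo the number of
reducers end up in the same partition and therefore on the same reduce task.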

2015-03-19 7:27 GMT+08:00 Jianfeng (Jeff) Zhang :

>
>  You can take it similar as the HashMap of java. Use the hashCode of one
> object to distribute it into different bucket.
>
>
>
>  Best Regard,
> Jeff Zhang
>
>
>   From: xeonmailinglist-gmail 
> Reply-To: "user@hadoop.apache.org" 
> Date: Wednesday, March 18, 2015 at 7:08 PM
> To: "user@hadoop.apache.org" 
> Subject: Re: trying to understand HashPartitioner
>
>  What tells which partition will run on which reduce task?
>
> On 18-03-2015 09:30, xeonmailinglist-gmail wrote:
>
>  Hi,
>
> I am trying to understand how HashPartitioner.java works. Thus, I ran a
> mapreduce job with 5 reducers and 5 input files. I thought that the output
> of getPartition(K2 key, V2 value, int numReduceTasks) was the number of the
> reduce task on which K2 and V2 will be processed. Is this correct?
>
> --
> --
>
>
> --
> --
>
>


Re: The Activities of Apache Hadoop Community

2015-03-10 Thread
Hi, it would be helpful if you could explain "the assignees' mail addresses".


2015-03-05 10:21 GMT+08:00 Azuryy Yu :

> That's good to know,
>
> On Tue, Mar 3, 2015 at 8:12 PM, Akira AJISAKA 
> wrote:
>
>> Hi all,
>>
>> One year after the previous post, we collected and analyzed
>> JIRA tickets again to investigate the activities of Apache Hadoop
>> community in 2014.
>>
>> http://ajisakaa.blogspot.com/2015/02/the-activities-of-apache-hadoop.html
>>
>> As we expected in the previous post, the activities of
>> Apache Hadoop community was continued to expand also in 2014.
>> We hope it will be the same in 2015.
>>
>> Thanks,
>> Akira
>>
>> On 2/13/14 11:20, Akira AJISAKA wrote:
>>
>>> Hi all,
>>>
>>> We collected and analyzed JIRA tickets to investigate
>>> the activities of Apache Hadoop Community in 2013.
>>>
>>> http://ajisakaa.blogspot.com/2014/02/the-activities-of-
>>> apache-hadoop.html
>>>
>>> We counted the number of the organizations, the lines
>>> of code, and the number of the issues. As a result, we
>>> confirmed all of them are increasing and Hadoop community
>>> is getting more active.
>>> We appreciate continuous contributions of developers
>>> and we hope the activities will expand also in 2014.
>>>
>>> Thanks,
>>> Akira
>>>
>>>
>>
>


Re: How to find bottlenecks of the cluster ?

2015-03-03 Thread
I think benchmarks will help here, since they can be used to find out the
execution speed of I/O-bound and CPU-bound jobs.

2015-03-02 19:01 GMT+08:00 Adrien Mogenet 
:

> This question makes no sense on its own; you have to tell us under which
> conditions you want to find a bottleneck.
>
> Regardless of the workload, we mostly use OpenTSDB to check CPU times (iowait
> / user / sys / idle), disk usage (await, I/Os in progress...) and memory
> (NUMA allocations, buffers, cache, dirty pages...)
>
> On 2 March 2015 at 08:20, Krish Donald  wrote:
>
>> Basically we have 4 points to consider: CPU, memory, I/O, and network.
>>
>> So how do we see which one is causing the bottleneck?
>> What parameters should we consider, etc.?
>>
>> On Sun, Mar 1, 2015 at 10:57 PM, Nishanth S 
>> wrote:
>>
>>> This is a  vast  topic.Can you tell what components are there in your
>>> data pipe line and how data flows in to system and the way its
>>> processed.There are several  inbuilt tests like testDFSIO and terasort that
>>> you can run.
>>>
>>> -Nishan
>>>
>>> On Sun, Mar 1, 2015 at 9:45 PM, Krish Donald 
>>> wrote:
>>>
 Hi,

 I wanted to understand, how should we find out the bottleneck of the
 cluster?

 Thanks
 Krish

>>>
>>>
>>
>
>
> --
>
> *Adrien Mogenet*
> Head of Backend/Infrastructure
> adrien.moge...@contentsquare.com
> (+33)6.59.16.64.22
> http://www.contentsquare.com
> 4, avenue Franklin D. Roosevelt - 75008 Paris
>


Re: how to check hdfs

2015-03-03 Thread
I don't think it is necessary to have a daemon running on that client to use
the command; hdfs itself is not a daemon of Hadoop.
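
For what it's worth, the same "No FileSystem for scheme: hdfs" error also shows
up from Java client code when the hadoop-hdfs jar (or its
META-INF/services/org.apache.hadoop.fs.FileSystem entry) is missing from the
classpath. A commonly used workaround, assuming hadoop-hdfs is actually
installed, is to name the implementation explicitly (a sketch, using the
fs.defaultFS from the mail below):

// Sketch: register the HDFS implementation explicitly if service discovery fails.
// requires: org.apache.hadoop.conf.Configuration, org.apache.hadoop.fs.*
Configuration conf = new Configuration();
conf.set("fs.defaultFS", "hdfs://master:9000");
conf.set("fs.hdfs.impl", org.apache.hadoop.hdfs.DistributedFileSystem.class.getName());
FileSystem fs = FileSystem.get(conf);
for (FileStatus status : fs.listStatus(new Path("/home/cluster"))) {
  System.out.println(status.getPath());
}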

2015-03-03 20:57 GMT+08:00 Somnath Pandeya :

>  Is your hdfs daemon running on cluster. ? ?
>
>
>
> *From:* Vikas Parashar [mailto:para.vi...@gmail.com]
> *Sent:* Tuesday, March 03, 2015 10:33 AM
> *To:* user@hadoop.apache.org
> *Subject:* Re: how to check hdfs
>
>
>
> Hi,
>
>
>
> Kindly install hadoop-hdfs rpm in your machine..
>
>
>
> Rg:
>
> Vicky
>
>
>
> On Mon, Mar 2, 2015 at 11:19 PM, Shengdi Jin  wrote:
>
>  Hi all,
>
> I have just started to learn Hadoop, and I have a naive question.
>
> I used
>
> hdfs dfs -ls /home/cluster
>
> to check the content inside.
>
> But I get error
> ls: No FileSystem for scheme: hdfs
>
> My configuration file core-site.xml is like
> <configuration>
>   <property>
>     <name>fs.defaultFS</name>
>     <value>hdfs://master:9000</value>
>   </property>
> </configuration>
>
>
> hdfs-site.xml is like
> <configuration>
>   <property>
>     <name>dfs.replication</name>
>     <value>2</value>
>   </property>
>   <property>
>     <name>dfs.name.dir</name>
>     <value>file:/home/cluster/mydata/hdfs/namenode</value>
>   </property>
>   <property>
>     <name>dfs.data.dir</name>
>     <value>file:/home/cluster/mydata/hdfs/datanode</value>
>   </property>
> </configuration>
>
> Is there anything wrong?
>
> Thanks a lot.
>
>
>
>
>


Re: Submit mapreduce job in remote YARN

2015-02-20 Thread
Yes, you can do this in Java, if these conditions are satisfied:

   1. your client is in the same network as the Hadoop cluster
   2. the Hadoop configuration is added to your Java classpath, so the JVM
   will load the Hadoop configuration

but the suggested way is

> hadoop jar
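
If you do want to submit from Java code instead of 'hadoop jar', a minimal
sketch under the same two conditions (cluster reachable, cluster configuration
on the client classpath); MyDriver and the paths are placeholders:

// Sketch: with the remote cluster's core-site.xml / yarn-site.xml / mapred-site.xml
// on the client classpath, a plain Configuration picks them up and the job is
// submitted to that cluster.
// requires: org.apache.hadoop.conf.Configuration, org.apache.hadoop.fs.Path,
//           org.apache.hadoop.mapreduce.Job, FileInputFormat, FileOutputFormat
Configuration conf = new Configuration();   // loads the *-site.xml files found on the classpath
Job job = Job.getInstance(conf, "remote-submit");
job.setJarByClass(MyDriver.class);          // placeholder driver class
FileInputFormat.setInputPaths(job, new Path("/input"));    // placeholder HDFS paths
FileOutputFormat.setOutputPath(job, new Path("/output"));
job.waitForCompletion(true);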


2015-02-20 19:18 GMT+08:00 xeonmailinglist :

> Hi,
>
> I would like to submit a mapreduce job to a remote YARN cluster. Can I do
> this in Java, or by using a REST API?
>
> Thanks,
>


Re: How to get Hadoop's Generic Options value

2015-02-19 Thread
Why not try:
>
>   -D files=/home/MapReduce/testFile.json
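
If the goal is to use the shipped file rather than the option string itself,
another common pattern is to let ToolRunner parse the generic options and then
open the file by its base name inside the task, since -files places it in the
task's working directory. A sketch (class name and job setup are placeholders):

// Sketch: ToolRunner applies generic options (-files, -D, ...) to the Configuration,
// and a file shipped with -files is available to each task under its base name.
// requires: org.apache.hadoop.conf.*, org.apache.hadoop.mapreduce.Job, org.apache.hadoop.util.*
public class MyJob extends Configured implements Tool {
  public int run(String[] args) throws Exception {
    Job job = Job.getInstance(getConf(), "myjob");
    // ... normal job setup here ...
    return job.waitForCompletion(true) ? 0 : 1;
  }
  public static void main(String[] args) throws Exception {
    System.exit(ToolRunner.run(new Configuration(), new MyJob(), args));
  }
}

// and inside the mapper:
// protected void setup(Context context) throws IOException {
//   BufferedReader reader = new BufferedReader(new FileReader("testFile.json")); // base name only
// }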


2015-02-20 5:03 GMT+08:00 Haoming Zhang :

> Hi,
>
>
> As you know, Hadoop support the Generic Options
> .
> For example you can use "-files" to specify files to be copied to the map
> reduce cluster.
>
>
> But how to get the value of Generic Options? For example, I have a command
> that is :
>
> hadoop jar job.jar JobClass -files /home/MapReduce/testFile.json input
> output
>
> I'm using the generic option "-files", and the value of this option is
> "/home/MapReduce/testFile.json", so how can I get this value?
>
>
> I tried GenericOptionsParser
> 
> class like this:
>
> String option = new GenericOptionsParser(conf, 
> args).getCommandLine().getOptionValue("files");
>
> But the result was NULL. I also changed the parameter of .getOptionValue()
> to "-files", got NULL as well.
>
>
> And I also tried the code like this:
>
> String[] toolArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
>
> But the above code only get all the remaining command options other than
> the generic options, that is the contrary of my expectation.
>
>
> Any suggestion will be great appreciated!
>
>
> Thanks,
> Haoming
>
>


Re: Question about mapp Task and reducer Task

2015-02-15 Thread
Hi Ulul,
Thank you for the explanation. I have googled the feature, and Hortonworks
says:

"This feature is a technical preview and considered under development. Do
not use this feature in your production systems."

Can we use it in a production environment?
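
For reference, a minimal sketch of the mapreduce.job.ubertask.* settings Ulul
mentions below (values shown are the usual defaults; whether to rely on uber
mode in production is a separate judgement):

// Sketch: enable uber mode so that sufficiently small jobs run inside the AM JVM.
Configuration conf = new Configuration();
conf.setBoolean("mapreduce.job.ubertask.enable", true);
conf.setInt("mapreduce.job.ubertask.maxmaps", 9);      // only jobs with at most this many maps qualify
conf.setInt("mapreduce.job.ubertask.maxreduces", 1);   // and at most this many reduces
Job job = Job.getInstance(conf, "small-job");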


2015-02-15 20:15 GMT+08:00 Ulul :

>  Hi
>
>  Actually it depends: in MR1 each mapper or reducer will be executed in
> its own JVM; in MR2 you can activate uber jobs, which let the framework run
> small jobs' mappers and reducers serially inside the ApplicationMaster JVM.
>
> Look for mapreduce.job.ubertask.* properties
>
> Ulul
>
> Le 15/02/2015 11:11, bit1...@163.com a écrit :
>
> Hi, Hadoopers,
>
>  I am pretty newbie to Hadoop, I got a question:  when a job runs, Will
> each mapper or reducer task take up a JVM process or only a thread?
> I hear that the answer is the Process. That is, say, one job contains 5
> mappers and 2 reducers , then there will be 7 JVM processes?
> Thanks.
>
>  --
>  bit1...@163.com
>
>
>


Re: Question about mapp Task and reducer Task

2015-02-15 Thread
I think so

2015-02-15 18:11 GMT+08:00 bit1...@163.com :

> Hi, Hadoopers,
>
> I am pretty newbie to Hadoop, I got a question:  when a job runs, Will
> each mapper or reducer task take up a JVM process or only a thread?
> I hear that the answer is the Process. That is, say, one job contains 5
> mappers and 2 reducers , then there will be 7 JVM processes?
> Thanks.
>
> --
> bit1...@163.com
>


Re: execute job in a remote jobtracker in YARN?

2015-02-15 Thread
Do you mean you want to execute a job on a remote cluster which doesn't
contain your node?
If you copy the RM's configuration to your own computer, this computer can
act as the Hadoop client. Then you can submit the job through
'hadoop jar', and the job will be executed on the remote cluster.

2015-02-14 6:48 GMT+08:00 Ravi Prakash :

> Hi!
>
> There is no "JobTracker" in YARN. There is an ApplicationMaster. And there
> is a ResourceManager. Which do you mean?
>
> You can use the ResourceManager REST API to submit new applications
>
> http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/ResourceManagerRest.html#Cluster_Applications_APISubmit_Application
>
> Another option (and a rather convoluted one at that) is to create an MR
> job which retrieves the job jar and conf and parameters from a common
> source (FTP server e.g.) and launches a new job. This is something similar
> to what Oozie does. However its unlikely that you need to do the same thing.
>
> HTH
> Ravi
>
>
>   On Friday, February 13, 2015 10:20 AM, xeonmailinglist <
> xeonmailingl...@gmail.com> wrote:
>
>
> Hi,
>
> I want to execute a job remotely. So, I was thinking of serializing the
> org.apache.hadoop.mapreduce.Job class and sending it to a remote component
> that I create, which launches the job there, or of finding a way to transform
> the Job class into a configuration file from which my remote component can
> execute the job.
>
>
> Is it possible to execute a Job in a remote jobtracker in YARN? If so,
> what is the best way to do it?
>
> Thanks,
>
>
>