Re: Changing the maximum tasks per node on a per job basis

2013-05-23 Thread Harsh J
Your problem seems to surround available memory and over-subscription. If you're using a 0.20.x or 1.x version of Apache Hadoop, you probably want to use the CapacityScheduler to address this for you. I once detailed the how-to on a similar question here: http://search-hadoop.com/m/gnFs91yIg1e On
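For context, the MR1 CapacityScheduler setup Harsh points at is configured roughly like this. This is a sketch only: the queue name "highmem" and the 30% figures are assumptions, not from the thread, and memory-aware scheduling additionally relies on the cluster- and job-level high-RAM settings.

```xml
<!-- capacity-scheduler.xml (sketch): a dedicated queue for memory-heavy
     jobs; queue name and percentages are assumed for illustration. -->
<property>
  <name>mapred.capacity-scheduler.queue.highmem.capacity</name>
  <value>30</value>
</property>
<property>
  <name>mapred.capacity-scheduler.queue.highmem.maximum-capacity</name>
  <value>30</value>
</property>
```

Jobs submitted to such a queue can then declare a larger per-task memory need (e.g. via `mapred.job.map.memory.mb`), so the scheduler counts each task against multiple slots instead of over-subscribing the node.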

dncp_block_verification log

2013-05-23 Thread Brahma Reddy Battula
Hi All, On some systems, I noticed that when the scanner runs, the dncp_block_verification.log.curr file under the block pool gets quite large. Please let me know: i) why is it growing on only some machines? ii) what's the solution? The following links also describe the

pauses during startup (maybe network related?)

2013-05-23 Thread Ted
Hi, I'm running Hadoop on my local laptop for development and everything works, but there are some annoying pauses during startup which cause the entire Hadoop startup process to take up to 4 minutes, and I'm wondering what it is and whether I can do anything about it. I'm running everything on 1

Hadoop Rack awareness on virtual system

2013-05-23 Thread Jitendra Yadav
Hi, Can we create and test Hadoop rack awareness functionality on a VirtualBox system (e.g., on a laptop)? Thanks~

Re: dncp_block_verification log

2013-05-23 Thread Harsh J
Hi, What is your HDFS version? I vaguely remember this to be a problem in the 2.0.0 version or so where there was also a block scanner excessive work bug, but I'm not sure what fixed it. I've not seen it appear in the later releases. On Thu, May 23, 2013 at 12:08 PM, Brahma Reddy Battula

RE: dncp_block_verification log

2013-05-23 Thread Brahma Reddy Battula
Hi Harsh, Thanks for the reply... I am using hadoop-2.0.1. From: Harsh J [ha...@cloudera.com] Sent: Thursday, May 23, 2013 8:24 PM To: user@hadoop.apache.org Subject: Re: dncp_block_verification log Hi, What is your HDFS version? I vaguely remember this to

Hadoop Installation Mappers setting

2013-05-23 Thread Jitendra Yadav
Hi, While installing a Hadoop cluster, how can we calculate the right number of mapper slots? Thanks~

Out of memory error by Node Manager, and shut down

2013-05-23 Thread Krishna Kishore Bonagiri
Hi, I have got the following error in the node manager's log, and it got shut down after about 1 application were run after it was started. Any clue why this occurs... or is this a bug? 2013-05-22 11:53:34,456 FATAL org.apache.hadoop.yarn.YarnUncaughtExceptionHandler: Thread Thread[process

Re: Hadoop Installation Mappers setting

2013-05-23 Thread bejoy . hadoop
Hi, I assume the question is about how many slots. It depends on: the child/task JVM size and the available memory, and the available number of cores. Your available memory for tasks is total memory minus memory used for the OS and other services running on your box. Other services include non-Hadoop

Re: Hadoop Rack awareness on virtual system

2013-05-23 Thread Leonid Fedotov
You definitely can. Just set rack script on your VMs. Leonid On Thu, May 23, 2013 at 2:50 AM, Jitendra Yadav jeetuyadav200...@gmail.comwrote: Hi, Can we create and test hadoop rack awareness functionality in virtual box system(like on laptop .etc)?. Thanks~

Hadoop Classpath issue.

2013-05-23 Thread Dhanasekaran Anbalagan
Hi Guys, When I try to execute the hadoop fs -ls / command, it returns two extra lines: 226:~# hadoop fs -ls / common ./ lib lib Found 9 items drwxrwxrwx - hdfs supergroup 0 2013-03-07 04:46 /benchmarks drwxr-xr-x - hbase hbase 0 2013-05-23 08:59 /hbase

Re: Hadoop Rack awareness on virtual system

2013-05-23 Thread Jitendra Yadav
Hi Leonid, Thanks for your reply. Could you please give me an example of how to make the topology.sh file? Let's say I have the below slave servers (data nodes): 192.168.45.1 dnode1 192.168.45.2 dnode2 192.168.45.3 dnode3 192.168.45.4 dnode4 192.168.45.5 dnode5 Thanks On Thu, May 23, 2013 at 8:02

Re: Hadoop Rack awareness on virtual system

2013-05-23 Thread Harsh J
An example topology file and script is available on the Wiki at http://wiki.apache.org/hadoop/topology_rack_awareness_scripts On Thu, May 23, 2013 at 8:38 PM, Jitendra Yadav jeetuyadav200...@gmail.comwrote: Hi Leonid, Thanks for you reply. please you please give me an example how to make
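In the spirit of the Wiki examples Harsh links, a topology script for the five datanodes listed in the question might look like this. The rack split (dnode1-3 on one rack, dnode4-5 on another) is an assumption purely for illustration:

```shell
#!/bin/bash
# Sketch of a rack-topology script. Hadoop invokes it with one or more
# node addresses as arguments and reads one rack path per line on stdout.
rack_of() {
  case "$1" in
    192.168.45.[123]|dnode[123]) echo "/rack1" ;;
    192.168.45.[45]|dnode[45])   echo "/rack2" ;;
    *)                           echo "/default-rack" ;;  # unknown nodes
  esac
}
for node in "$@"; do
  rack_of "$node"
done
```

Point the NameNode at it via `topology.script.file.name` (Hadoop 1.x) or `net.topology.script.file.name` (2.x) and make the file executable; the VMs themselves need no changes beyond that.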

Re: Out of memory error by Node Manager, and shut down

2013-05-23 Thread Pramod N
Looks like the problem is with jvm heap size. Its trying to create a new thread and threads require native memory for internal JVM related things. One of the possible solution is to reduce java heap size(to increase free native memory). Is there any other information about the memory status
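Pramod's suggestion, shrinking the Java heap so more native memory is left for thread stacks, can be applied to the NodeManager in yarn-env.sh. A minimal sketch; the 1000 MB figure is an assumption to size against your own box:

```shell
# yarn-env.sh (sketch): cap the NodeManager JVM heap at an assumed
# 1000 MB so the remaining RAM stays free for native thread stacks.
export YARN_NODEMANAGER_HEAPSIZE=1000
echo "NodeManager heap: ${YARN_NODEMANAGER_HEAPSIZE} MB"
```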

Re: R for Hadoop

2013-05-23 Thread Amal G Jose
Try Rhipe, it is good. http://amalgjose.wordpress.com/2013/05/05/rhipe-installation/ http://www.datadr.org/ http://amalgjose.wordpress.com/2013/05/05/r-installation-in-linux-platforms/ On Mon, May 20, 2013 at 2:23 PM, sudhakara st sudhakara...@gmail.comwrote: Hi You find good start up

RE: Shuffle phase replication factor

2013-05-23 Thread John Lilley
Ling, Thanks for the response! I could use more clarification on item 1. Specifically * mapred.reduce.parallel.copies limits the number of outbound connections for a reducer, but not the inbound connections for a mapper. Does tasktracker.http.threads limit the number of

Re: Shuffle phase replication factor

2013-05-23 Thread Sandy Ryza
In MR1, the tasktracker serves the mapper files (so that tasks don't have to stick around taking up resources). In MR2, the shuffle service, which lives inside the nodemanager, serves them. -Sandy On Thu, May 23, 2013 at 10:22 AM, John Lilley john.lil...@redpoint.netwrote: Ling,

Re: Is there a way to limit # of hadoop tasks per user at runtime?

2013-05-23 Thread Amal G Jose
You can use capacity scheduler also. In that you can create some queues, each of specific capacity. Then you can submit jobs to that specific queue at runtime or you can configure it as direct submission. On Wed, May 22, 2013 at 3:27 AM, Sandy Ryza sandy.r...@cloudera.com wrote: Hi Mehmet,

Re: Hadoop Installation Mappers setting

2013-05-23 Thread Amal G Jose
I am explaining it more. If your machine has 8 GB of memory, then after reserving memory for the operating system and all other processes except the tasktracker, you have 4 GB remaining (assume). The remaining process running is the tasktracker. If the child JVM size is 200 MB, then you can define a maximum slots of
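The arithmetic in the example above can be sketched directly (all figures come from the thread: 8 GB total, 4 GB assumed left after the OS and other daemons, 200 MB child JVM per task slot):

```shell
# Slot count = memory available for tasks / per-task child JVM size.
total_mb=8192       # total RAM (8 GB)
reserved_mb=4096    # OS + non-tasktracker processes (assumed)
child_jvm_mb=200    # mapred.child.java.opts heap per task slot
slots=$(( (total_mb - reserved_mb) / child_jvm_mb ))
echo "$slots"       # prints 20
```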

HDFS data and non-aligned splits

2013-05-23 Thread John Lilley
What happens when MR produces data splits, and those splits don't align on block boundaries? I've read that MR will attempt to make data splits near block boundaries to improve data locality, but isn't there always some slop where records straddle the block boundaries, resulting in an extra

SequenceFile sync marker uniqueness

2013-05-23 Thread John Lilley
How does SequenceFile guarantee that the sync marker does not appear in the data? John

Re: pauses during startup (maybe network related?)

2013-05-23 Thread Chris Nauroth
Hi Ted, 2013-05-23 19:28:19,937 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: Caching file names occuring more than 10 times ... 2013-05-23 19:28:26,801 INFO org.apache.hadoop.ipc.Server: IPC Server handler 28 on 9000: starting There are a couple of relevant activities that happen during

Re: Is there a way to limit # of hadoop tasks per user at runtime?

2013-05-23 Thread Harsh J
The only pain point I'd find with CS in a multi-user environment is its limitation of using queue configs. It's non-trivial to configure a queue per user as CS doesn't provide any user-level settings (it wasn't designed for that initially), while in FS you get user-level limiting settings for free,

Re: Hadoop Installation Mappers setting

2013-05-23 Thread Jitendra Yadav
Hi, Thanks for your clarification. I have one more question: how does the number of cores factor into the slots calculation? Thanks~ On 5/23/13, Amal G Jose amalg...@gmail.com wrote: I am explaining it more. If your machine have 8 GB of memory. After reserving to Operating system and all other

Re: HDFS data and non-aligned splits

2013-05-23 Thread Harsh J
What happens when MR produces data splits, and those splits don’t align on block boundaries? Answer depends on the file format used here. With any of the formats we ship, nothing happens. but isn’t there always some slop where records straddle the block boundaries, resulting in an extra HDFS

Re: Hadoop Installation Mappers setting

2013-05-23 Thread bejoy . hadoop
When you run MapReduce tasks, you need CPU cycles to do the processing, not just memory. So ideally, based on the processor type (hyperthreaded or not), compute the available cores. Then maybe compute as one core for each task slot. Regards Bejoy KS Sent from remote device, Please excuse
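Combining both constraints from this thread, the slot count is bounded by whichever limit is tighter. A sketch with assumed numbers (20 memory-based slots from the earlier example, 8 cores):

```shell
# Take the tighter of the memory-derived and core-derived slot limits.
mem_slots=20   # from the memory calculation (assumed)
cores=8        # cores available for tasks (assumed)
if [ "$mem_slots" -lt "$cores" ]; then
  slots=$mem_slots
else
  slots=$cores
fi
echo "$slots"  # prints 8
```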

Re: SequenceFile sync marker uniqueness

2013-05-23 Thread Harsh J
SequenceFiles use a 16 digit MD5 (computed based on a UID and writer ~init time, so pretty random). For the rest of my answer, I'll prefer not to repeat what Martin's already said very well here: http://search-hadoop.com/m/VYVra2krg5t1 (point #2) over the Avro lists for the Avro DataFile format
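To illustrate the idea (this is not Hadoop's exact code): deriving the 16-byte marker from a unique ID plus the writer's creation time makes a collision with record bytes astronomically unlikely rather than impossible. The UID and timestamp below are made-up examples:

```shell
# Sketch: 16-byte (32 hex char) sync marker from an MD5 of "<uid>@<time>".
uid="3c8e1f2a-demo-uid"   # assumed example UID
ts="1369296000000"        # assumed writer-init timestamp (ms)
marker=$(printf '%s@%s' "$uid" "$ts" | md5sum | cut -c1-32)
echo "$marker"            # 32 hex chars = 16 bytes
```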

HTTP file server, map output, and other files

2013-05-23 Thread John Lilley
Thanks to previous kind answers and more reading in the elephant book, I now understand that mapper tasks place partitioned results into local files that are served up to reducers via HTTP: The output file's partitions are made available to the reducers over HTTP. The maximum number of worker

Re: Hive tmp logs

2013-05-23 Thread Sanjay Subramanian
Clarification: this property defines a directory on HDFS: <property> <name>hive.exec.scratchdir</name> <value>/data01/workspace/hive/scratch/dir/on/local/linux/disk</value> </property> From: Sanjay Subramanian sanjay.subraman...@wizecommerce.com Date: Wednesday,

hive.log

2013-05-23 Thread Sanjay Subramanian
How do I set the property in hive-site.xml that defines the local Linux directory for hive.log? Thanks sanjay

MiniDFS Cluster log dir

2013-05-23 Thread siddhi mehta
Hey guys, For testing purposes I am starting up a minicluster using http://hadoop.apache.org/docs/r1.2.0/cli_minicluster.html and I was wondering what is a good way to configure the log directory for the same. I tried setting hadoop.log.dir or yarn.log.dir but that seems to have no effect. I am

Re: hive.log

2013-05-23 Thread Sanjay Subramanian
OK, figured it out: vi /etc/hive/conf/hive-log4j.properties and modify this line: #hive.log.dir=/tmp/${user.name} becomes hive.log.dir=/data01/workspace/hive/log/${user.name} From: Sanjay Subramanian sanjay.subraman...@wizecommerce.com Reply-To:

Child Error

2013-05-23 Thread Jim Twensky
Hello, I have a 20 node Hadoop cluster where each node has 8GB memory and an 8-core processor. I sometimes get the following error on a random basis: --- Exception in thread main

Re: pauses during startup (maybe network related?)

2013-05-23 Thread Ted
thanks, I'm almost 100% sure it's network related now. What I tested was unplugging my network :), and the entire system starts in just a few seconds. I decided to search on reverse DNS in Google and I see other people have complained about very slow reverse DNS lookups (some related to hadoop /

Where to begin from??

2013-05-23 Thread Lokesh Basu
Hi all, I'm a computer science undergraduate and have recently started to explore Hadoop. I find it very interesting and want to get involved both as a contributor and developer for this open source project. I have been going through many textbooks related to Hadoop and HDFS but still I find

Re: splittable vs seekable compressed formats

2013-05-23 Thread Rahul Bhattacharjee
I think seeking is a property of the fs, so any file stored in HDFS is seekable. The input stream is seekable and the output stream isn't; FileSystem supports Seekable. Thanks, Rahul On Thu, May 23, 2013 at 11:01 PM, John Lilley john.lil...@redpoint.net wrote: I've read about splittable compressed

Re: Where to begin from??

2013-05-23 Thread Chris Embree
I'll be chastised and have mean things said about me for this. Get some experience in IT before you start looking at Hadoop. My reasoning is this: If you don't know how to develop real applications in a Non-Hadoop world, you'll struggle a lot to develop with Hadoop. Asking what things you need

Re: Where to begin from??

2013-05-23 Thread Sanjay Subramanian
I agree with Chris…don't worry about what the technology is called Hadoop , Big table, Lucene, Hive….Model the problem and see what the solution could be….that’s very important And Lokesh please don't mind…we are writing to u perhaps stuff that u don't want to hear but its an important real

Task attempt failed after TaskAttemptListenerImpl ping

2013-05-23 Thread YouPeng Yang
Hi hadoop users, I find that one application failed; when I look at the container log, it shows that it always pings [2]. How does this come about? I'm using YARN and MRv2 (CDH-4.1.2). [1] resourcemanager.log 2013-05-24 09:45:07,192 INFO org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher:

Re: Where to begin from??

2013-05-23 Thread Raj Hadoop
Hi, With all due respect to the senior members of this site, I wanted to first congratulate Lokesh for his interest in Hadoop. I want to know how many fresh graduates are interested in this technology. I guess not many. So we have to welcome Lokesh to the Hadoop world. I agree with the

Re: Hadoop Classpath issue.

2013-05-23 Thread YouPeng Yang
Hi You should check your /usr/bin/hadoop script. 2013/5/23 Dhanasekaran Anbalagan bugcy...@gmail.com Hi Guys, When i trying to execute hadoop fs -ls / command It's return extra two lines. 226:~# hadoop fs -ls / *common ./* *lib lib* Found 9 items drwxrwxrwx - hdfs supergroup
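One common way the extra lines arise is a leftover debug `echo` in the launcher script. This stand-in reproduces the symptom on a copy (the real file to inspect is /usr/bin/hadoop, as noted in the reply):

```shell
# Stand-in for a launcher script containing the kind of stray debug
# echoes that would print "common ./" and "lib lib" before every
# command's real output.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
#!/bin/sh
echo common ./
echo lib lib
exec "$@"
EOF
# Locate the offending lines the same way you would in the real script:
grep -c '^echo' "$tmp"    # prints 2
```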

Re: Where to begin from??

2013-05-23 Thread Lokesh Basu
First of all, thank you all. I accept that I don't know much about real-world problems and have to begin from scratch to get some insight into what is actually driving these technologies. To Chris: I will start working on finding and implementing some real-world problem and see how these

Re: Hadoop Classpath issue.

2013-05-23 Thread shashwat shriparv
Check your HDFS at namenode:50070 to see if these files are there... Thanks & Regards, Shashwat Shriparv On Fri, May 24, 2013 at 9:45 AM, YouPeng Yang yypvsxf19870...@gmail.com wrote: Hi You should check your /usr/bin/hadoop script. 2013/5/23 Dhanasekaran Anbalagan bugcy...@gmail.com

Re: Task attempt failed after TaskAttemptListenerImpl ping

2013-05-23 Thread Harsh J
Assuming you mean failed there instead of filed. In MR, a ping message is sent over the TaskUmbilicalProtocol from the Task container to the MR AM. A ping is only sent as an alternative, to check self, if there's no progress to report from the task. No progress to report for a long time generally

Re: pauses during startup (maybe network related?)

2013-05-23 Thread Harsh J
You are spot on about the DNS lookup slowing things down. I've faced the same issue (before I had a local network DNS set up for the WiFi network I use). but I'm still more just miffed at how it's knowing I'm a 192 address when I told it to use localhost. There's a few configs you need to
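A common local mitigation for the slow-lookup half of this (assumption: the laptop's hostname is "mylaptop") is to pin the hostname to the loopback address in /etc/hosts, so daemons resolve it instantly instead of waiting on reverse DNS. Shown here against a copy rather than the live file:

```shell
# Sketch: an /etc/hosts entry pinning the (assumed) hostname to loopback.
tmp=$(mktemp)
printf '127.0.0.1\tlocalhost mylaptop\n' > "$tmp"
# Verify which name the loopback line maps to (last field on the line):
awk '$1 == "127.0.0.1" {print $NF}' "$tmp"    # prints mylaptop
```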

Hadoop 2.0.4: Unable to load native-hadoop library for your platform

2013-05-23 Thread Ben Kim
Hi, I downloaded Hadoop 2.0.4 and keep getting these errors from the hadoop CLI and MapReduce task logs: 13/05/24 14:34:17 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable. I tried adding $HADOOP_HOME/lib/native/* to
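A common workaround for this warning is to point the JVM at the native library directory explicitly; the paths below are assumptions for a tarball install, not from the thread:

```shell
# Sketch (e.g. in hadoop-env.sh): tell the JVM where libhadoop.so lives.
HADOOP_HOME=${HADOOP_HOME:-/opt/hadoop-2.0.4}   # assumed install path
export HADOOP_COMMON_LIB_NATIVE_DIR="$HADOOP_HOME/lib/native"
export HADOOP_OPTS="$HADOOP_OPTS -Djava.library.path=$HADOOP_HOME/lib/native"
echo "$HADOOP_OPTS"
```

Note that if the bundled libhadoop.so doesn't match your platform (e.g. a 32-bit binary on a 64-bit host, a known issue with some 2.0.x tarballs), rebuilding from source with the native profile (`mvn package -Pdist,native -DskipTests`) is the usual cure.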

Hint on EOFException's on datanodes

2013-05-23 Thread Stephen Boesch
On a smallish (10 node) cluster with only 2 mappers per node after a few minutes EOFExceptions are cropping up on the datanodes: an example is shown below. Any hint on what to tweak/change in hadoop / cluster settings to make this more happy? 2013-05-24 05:03:57,460 INFO