Your problem seems to be one of available memory and over-subscription. If
you're using a 0.20.x or 1.x version of Apache Hadoop, you probably want to
use the CapacityScheduler to address this for you.
I once detailed how to do this on a similar question here:
http://search-hadoop.com/m/gnFs91yIg1e
Hi All,
On some systems, I noticed that when the scanner runs, the
dncp_block_verification.log.curr file under the block pool gets quite large.
Please let me know:
i) Why is it growing on only some machines?
ii) What's the solution?
The following links also describe the
Hi, I'm running Hadoop on my local laptop for development and
everything works, but there are some annoying pauses during startup
which cause the entire Hadoop startup process to take up to 4 minutes,
and I'm wondering what causes them and whether I can do anything about it.
I'm running everything on 1
Hi,
Can we create and test Hadoop rack awareness functionality in a VirtualBox
system (e.g., on a laptop)?
Thanks~
Hi,
What is your HDFS version? I vaguely remember this being a problem in the
2.0.0 release or so, where there was also a bug causing excessive block
scanner work, but I'm not sure what fixed it. I haven't seen it appear in
later releases.
On Thu, May 23, 2013 at 12:08 PM, Brahma Reddy Battula
HI Harsh
Thanks for the reply...
I am using hadoop-2.0.1
From: Harsh J [ha...@cloudera.com]
Sent: Thursday, May 23, 2013 8:24 PM
To: user@hadoop.apache.org
Subject: Re: dncp_block_verification log
Hi,
While installing a Hadoop cluster, how can we calculate the right number
of mapper slots?
Thanks~
Hi,
I got the following error in the node manager's log, and it shut down
after about one application had run since it was started. Any clue why
this occurs, or is this a bug?
2013-05-22 11:53:34,456 FATAL
org.apache.hadoop.yarn.YarnUncaughtExceptionHandler: Thread Thread[process
Hi
I assume the question is about how many slots.
It depends on:
- the child/task JVM size and the available memory
- the available number of cores
Your available memory for tasks is total memory minus the memory used for the
OS and other services running on your box.
Other services include non-Hadoop
You definitely can.
Just set a rack script on your VMs.
Leonid
On Thu, May 23, 2013 at 2:50 AM, Jitendra Yadav
jeetuyadav200...@gmail.com wrote:
Hi Guys,
When I try to execute the hadoop fs -ls / command,
it returns two extra lines:
226:~# hadoop fs -ls /
*common ./*
*lib lib*
Found 9 items
drwxrwxrwx - hdfs supergroup 0 2013-03-07 04:46 /benchmarks
drwxr-xr-x - hbase hbase 0 2013-05-23 08:59 /hbase
Hi Leonid,
Thanks for your reply.
Could you please give me an example of how to make a topology.sh file?
Let's say I have the below slave servers (data nodes):
192.168.45.1 dnode1
192.168.45.2 dnode2
192.168.45.3 dnode3
192.168.45.4 dnode4
192.168.45.5 dnode5
Thanks
On Thu, May 23, 2013 at 8:02
An example topology file and script is available on the Wiki at
http://wiki.apache.org/hadoop/topology_rack_awareness_scripts
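Not something from the wiki page itself, but a minimal sketch of what such a script could look like. Hadoop accepts any executable via the topology script property, so Python works too; the rack assignments below are made-up assumptions for the five dnode addresses:

```python
#!/usr/bin/env python
# Hypothetical topology script: Hadoop invokes it with one or more
# IPs/hostnames as arguments and expects one rack path per argument
# on stdout. The rack layout below is an assumed example.
import sys

RACKS = {
    "192.168.45.1": "/rack1",
    "192.168.45.2": "/rack1",
    "192.168.45.3": "/rack2",
    "192.168.45.4": "/rack2",
    "192.168.45.5": "/rack3",
}

def rack_of(host):
    # Hosts not in the map fall back to the default rack.
    return RACKS.get(host, "/default-rack")

if __name__ == "__main__":
    for host in sys.argv[1:]:
        print(rack_of(host))
```

Point the topology script property in core-site.xml at this file (made executable) and restart the namenode so it re-resolves datanode racks.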
On Thu, May 23, 2013 at 8:38 PM, Jitendra Yadav
jeetuyadav200...@gmail.com wrote:
Looks like the problem is with the JVM heap size. It's trying to create a new
thread, and threads require native memory for internal JVM bookkeeping.
One possible solution is to reduce the Java heap size (to increase free
native memory). Is there any other information about the memory status
Try Rhipe; it is good.
http://amalgjose.wordpress.com/2013/05/05/rhipe-installation/
http://www.datadr.org/
http://amalgjose.wordpress.com/2013/05/05/r-installation-in-linux-platforms/
On Mon, May 20, 2013 at 2:23 PM, sudhakara st sudhakara...@gmail.com wrote:
Hi
You find good start up
Ling,
Thanks for the response! I could use more clarification on item 1.
Specifically
* mapred.reduce.parallel.copies limits the number of outbound
connections for a reducer, but not the inbound connections for a mapper. Does
tasktracker.http.threads limit the number of
In MR1, the tasktracker serves the mapper files (so that tasks don't have
to stick around taking up resources). In MR2, the shuffle service, which
lives inside the nodemanager, serves them.
-Sandy
On Thu, May 23, 2013 at 10:22 AM, John Lilley john.lil...@redpoint.net wrote:
Ling,
You can also use the Capacity Scheduler. With it you can create queues,
each with a specific capacity, and then submit jobs to a specific queue at
runtime, or configure direct submission.
On Wed, May 22, 2013 at 3:27 AM, Sandy Ryza sandy.r...@cloudera.com wrote:
Hi Mehmet,
Let me explain it more.
Say your machine has 8 GB of memory.
After reserving memory for the operating system and all other processes
except the tasktracker, assume you have 4 GB remaining.
The only remaining process running is the tasktracker.
If the child JVM size is 200 MB,
then you can define a maximum number of slots of
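The arithmetic in this thread can be sketched as follows; this is a rough illustration of the reasoning, not an official formula, using the example's assumed 8 GB / 4 GB / 200 MB figures:

```python
def max_task_slots(total_mb, reserved_mb, child_jvm_mb):
    # Memory left after the OS and other services, divided by the
    # per-task child JVM size, ignoring any per-core cap.
    available_mb = total_mb - reserved_mb
    return available_mb // child_jvm_mb

# The example's numbers: 8 GB total, 4 GB reserved, 200 MB child JVM.
print(max_task_slots(8192, 4096, 200))  # 20
```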
What happens when MR produces data splits, and those splits don't align on
block boundaries? I've read that MR will attempt to make data splits near
block boundaries to improve data locality, but isn't there always some slop
where records straddle the block boundaries, resulting in an extra
How does SequenceFile guarantee that the sync marker does not appear in the
data?
John
Hi Ted,
2013-05-23 19:28:19,937 INFO
org.apache.hadoop.hdfs.server.namenode.NameNode: Caching file names
occuring more than 10 times
...
2013-05-23 19:28:26,801 INFO org.apache.hadoop.ipc.Server: IPC Server
handler 28 on 9000: starting
There are a couple of relevant activities that happen during
The only pain point I'd find with the CS in a multi-user environment is its
limitation around queue configs. It's non-trivial to configure a queue per
user, as the CS doesn't provide any user-level settings (it wasn't designed
for that initially), while in the FS you get user-level limiting settings
for free,
Hi,
Thanks for your clarification.
I have one more question.
How does the number of cores influence the slot calculation?
Thanks~
On 5/23/13, Amal G Jose amalg...@gmail.com wrote:
What happens when MR produces data splits, and those splits don’t align
on block boundaries?
Answer depends on the file format used here. With any of the formats we
ship, nothing happens.
but isn’t there always some slop where records straddle the block
boundaries, resulting in an extra HDFS
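To make "nothing happens" concrete: with plain text input, readers follow a simple convention so every record is read exactly once even when it straddles a split boundary. A rough Python sketch of that convention (an illustration, not Hadoop's actual LineRecordReader):

```python
def read_split(data: bytes, start: int, length: int):
    """Return the newline-delimited records 'owned' by one split.

    Convention: a reader whose split starts mid-file discards everything
    up to and including the first newline (the previous reader owns that
    record); a reader keeps consuming whole lines while its position is
    still at or before its split's end, so it finishes any record that
    straddles the boundary.
    """
    end = start + length
    pos = start
    if start != 0:
        nl = data.find(b"\n", start)
        if nl == -1:
            return []
        pos = nl + 1
    records = []
    while pos <= end and pos < len(data):
        nl = data.find(b"\n", pos)
        if nl == -1:
            records.append(data[pos:])
            break
        records.append(data[pos:nl])
        pos = nl + 1
    return records

data = b"alpha\nbravo\ncharlie\ndelta\n"
splits = [(0, 10), (10, 10), (20, 10)]
recs = [r for s in splits for r in read_split(data, *s)]
print(recs)  # [b'alpha', b'bravo', b'charlie', b'delta']
```

Each split discards its leading partial line (the previous reader finished it) and reads past its own end to complete its last record, so no record is lost or duplicated.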
When you run MapReduce tasks, you need CPU cycles to do the processing, not
just memory.
So ideally, based on the processor type (hyper-threaded or not), compute the
available cores. Then maybe compute it as one core for each task slot.
Regards
Bejoy KS
Sent from remote device, Please excuse
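Combining Bejoy's point with the memory-based arithmetic from earlier in the thread, the slot count would be capped by whichever resource runs out first; again an illustrative sketch, not an official formula:

```python
def task_slots(available_mb, child_jvm_mb, cores):
    # Cap the memory-derived slot count at one slot per core.
    memory_slots = available_mb // child_jvm_mb
    return min(memory_slots, cores)

# 4 GB free at 200 MB per child JVM allows 20 slots by memory,
# but an 8-core box caps this at 8.
print(task_slots(4096, 200, 8))  # 8
```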
SequenceFiles use a 16-byte MD5 hash (computed from a UID and the writer's
~init time, so pretty random). For the rest of my answer, I'll prefer not to
repeat what Martin's already said very well here:
http://search-hadoop.com/m/VYVra2krg5t1 (point #2) over the Avro lists for
the Avro DataFile format
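Not from the thread: a small sketch of the idea; hashing a per-writer unique ID together with the creation time yields a 16-byte marker that is vanishingly unlikely to occur in user data. The exact inputs Hadoop hashes may differ, so treat this as an illustration:

```python
import hashlib
import time
import uuid

def make_sync_marker():
    # 16-byte marker derived from a per-writer UID and creation time,
    # mirroring the SequenceFile approach described above.
    seed = f"{uuid.uuid4()}@{time.time()}".encode()
    return hashlib.md5(seed).digest()

marker = make_sync_marker()
print(len(marker))  # 16
```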
Thanks to previous kind answers and more reading in the elephant book, I now
understand that mapper tasks place partitioned results into local files,
which are served up to reducers via HTTP:
The output file's partitions are made available to the reducers over HTTP. The
maximum number of worker
Clarification
This property defines a file on HDFS
<property>
  <name>hive.exec.scratchdir</name>
  <value>/data01/workspace/hive scratch/dir/on/local/linux/disk</value>
</property>
From: Sanjay Subramanian sanjay.subraman...@wizecommerce.com
Date: Wednesday,
How do I set the property in hive-site.xml that defines the local linux
directory for hive.log ?
Thanks
sanjay
Hey guys,
For testing purposes I am starting up a minicluster using the
http://hadoop.apache.org/docs/r1.2.0/cli_minicluster.html
I was wondering what is a good way to configure log directory for the same.
I tried setting hadoop.log.dir or yarn.log.dir but that seems to have no
effect.
I am
Ok figured it out
- vi /etc/hive/conf/hive-log4j.properties
- Modify this line
#hive.log.dir=/tmp/${user.name}
hive.log.dir=/data01/workspace/hive/log/${user.name}
From: Sanjay Subramanian sanjay.subraman...@wizecommerce.com
Reply-To:
Hello, I have a 20 node Hadoop cluster where each node has 8GB memory and
an 8-core processor. I sometimes get the following error on a random basis:
---
Exception in thread "main"
Thanks, I'm almost 100% sure it's network-related now.
What I tested was unplugging my network :); with it unplugged, the entire
system starts in just a few seconds.
I decided to search Google for slow reverse DNS and I see other people
have complained about very slow reverse DNS lookups (some related to
hadoop /
Hi all,
I'm a computer science undergraduate and have recently started to explore
Hadoop. I find it very interesting and want to get involved both as a
contributor and a developer for this open source project. I have been going
through many textbooks related to Hadoop and HDFS, but still I find
I think seeking is a property of the file system, so any file stored in
HDFS is seekable. The input stream is seekable while the output stream
isn't, since HDFS writes are append-only. FSDataInputStream implements the
Seekable interface.
Thanks,
Rahul
On Thu, May 23, 2013 at 11:01 PM, John Lilley john.lil...@redpoint.net wrote:
I’ve read about splittable compressed
I'll be chastised and have mean things said about me for this.
Get some experience in IT before you start looking at Hadoop. My reasoning
is this: if you don't know how to develop real applications in a
non-Hadoop world, you'll struggle a lot to develop with Hadoop.
Asking what things you need
I agree with Chris… don't worry about whether the technology is called
Hadoop, Bigtable, Lucene, or Hive…. Model the problem and see what the
solution could be…. That's very important.
And Lokesh, please don't mind… we are perhaps writing to you stuff that you
don't want to hear, but it's an important real
Hi hadoop users
I find that one application filed; when I check the container log, it shows
that it always pings [2].
How does this come about?
I'm using the YARN and MRv2(CDH-4.1.2)
[1]resourcemanager.log
2013-05-24 09:45:07,192 INFO
org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher:
Hi,
With all due respect to the senior members of this list, I wanted to first
congratulate Lokesh for his interest in Hadoop. I wonder how many fresh
graduates are interested in this technology; I guess not many. So we have to
welcome Lokesh to the Hadoop world.
I agree with the
Hi
You should check your /usr/bin/hadoop script.
2013/5/23 Dhanasekaran Anbalagan bugcy...@gmail.com
First of all, thank you all.
I accept that I don't know much about real-world problems and have to
begin from scratch to get some insight into what is actually driving these
technologies.
To Chris:
I will start working on finding and implementing some real-world problem
and see how these
Check your HDFS at namenode:50070 to see if these files are there...
*Thanks Regards*
∞
Shashwat Shriparv
On Fri, May 24, 2013 at 9:45 AM, YouPeng Yang yypvsxf19870...@gmail.com wrote:
Assuming you mean failed there instead of filed:
In MR, a ping message is sent over the TaskUmbilicalProtocol from the
task container to the MR AM. A ping is only sent as an alternative liveness
check when there's no progress to report from the task. No progress to
report for a long time generally
You are spot on about the DNS lookup slowing things down. I've faced
the same issue (before I had a local network DNS set up for the WiFi
network I use).
but I'm still more just miffed at how it's knowing I'm a 192 address when I
told it to use localhost.
There are a few configs you need to
Hi, I downloaded Hadoop 2.0.4 and keep getting these errors from the Hadoop
CLI and MapReduce task logs:
13/05/24 14:34:17 WARN util.NativeCodeLoader: Unable to load native-hadoop
library for your platform... using builtin-java classes where applicable
i tried adding $HADOOP_HOME/lib/native/* to
On a smallish (10-node) cluster with only 2 mappers per node, after a few
minutes EOFExceptions crop up on the datanodes; an example is shown below.
Any hint on what to tweak/change in Hadoop or the cluster settings to make
this happier?
2013-05-24 05:03:57,460 INFO