There is no reason to do the block scans. All of the modern kernels will
provide notification when a file or directory is altered.
This could be readily handled with a native application that writes
structured data to a receiver in the DataNode, or via JNA/JNI for pure
Java or mixed code.
After startup, the recipient of the notifications would keep up the block
information and the du information.
Raghu Angadi wrote:
Jason Venner wrote:
There is no reason to do the block scans. All of the modern kernels
will provide notification when a file or directory is altered.
This could
Here is some simple code I wrote using JNA to handle Linux inotify.
This code was my first and only attempt to use JNA.
The JNA jars are available from https://jna.dev.java.net/
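The code itself is not included in this excerpt; as a stand-in, here is a minimal sketch, assuming JNA 3.x on Linux, of what a JNA binding to the libc inotify calls can look like (class names, constants and the directory argument are illustrative, not from the original posting):

import com.sun.jna.Library;
import com.sun.jna.Native;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class InotifyWatch {
    // Map the libc inotify calls directly through JNA.
    public interface LibC extends Library {
        LibC INSTANCE = (LibC) Native.loadLibrary("c", LibC.class);
        int inotify_init();
        int inotify_add_watch(int fd, String path, int mask);
        int read(int fd, byte[] buf, int count);
    }
    // Event masks from /usr/include/sys/inotify.h
    static final int IN_MODIFY = 0x0002, IN_CREATE = 0x0100, IN_DELETE = 0x0200;

    public static void main(String[] args) {
        int fd = LibC.INSTANCE.inotify_init();
        LibC.INSTANCE.inotify_add_watch(fd, args[0], IN_MODIFY | IN_CREATE | IN_DELETE);
        byte[] buf = new byte[4096];
        while (true) {
            int n = LibC.INSTANCE.read(fd, buf, buf.length);   // blocks until events arrive
            ByteBuffer bb = ByteBuffer.wrap(buf, 0, n).order(ByteOrder.nativeOrder());
            while (bb.remaining() >= 16) {                     // struct inotify_event header is 16 bytes
                int wd = bb.getInt(), mask = bb.getInt(), cookie = bb.getInt(), len = bb.getInt();
                byte[] name = new byte[len];                   // name is NUL-padded to len bytes
                bb.get(name);
                System.out.println("wd=" + wd + " mask=0x" + Integer.toHexString(mask)
                        + " name=" + new String(name).trim());
            }
        }
    }
}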
Raghu Angadi wrote:
Jason Venner wrote:
There is no reason to do the block scans. All of the modern kernels
If you specify your dfs data directories as a set of comma-separated tokens,
you will do fine.
<property>
  <name>dfs.data.dir</name>
  <value>${hadoop.tmp.dir}/dfs/data</value>
  <description>Determines where on the local filesystem an DFS data node
  should store its blocks. If this is a comma-delimited
  list of
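For example, a datanode with three data disks could list all of them (mount points are illustrative):

<property>
  <name>dfs.data.dir</name>
  <value>/d1/dfs/data,/d2/dfs/data,/d3/dfs/data</value>
</property>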
to jason.had...@gmail.com in the next little bit.
Jason Venner wrote:
The problem we are having is that datanodes periodically stall for
10-15 minutes and drop off the active list and then come back.
What is going on is that a long-running set of operations is holding the lock
on FSDataset.volumes, and all
I have always assumed (which is clearly my error) that edit log writes
were flushed to storage to ensure that the edit log was consistent
during machine crash recovery.
I have been working through FSEditLog.java and I don't see any calls to
force(true) on the file channel or sync on the file
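For reference, this is the kind of call being discussed; a minimal, standalone illustration (the file name is made up, and this is not the FSEditLog code) of forcing an appended record to stable storage:

import java.io.RandomAccessFile;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;

public class DurableAppend {
    public static void main(String[] args) throws Exception {
        RandomAccessFile raf = new RandomAccessFile("edits.log", "rw");
        FileChannel channel = raf.getChannel();
        channel.position(channel.size());                          // append to the end
        channel.write(ByteBuffer.wrap("OP_ADD example\n".getBytes("UTF-8")));
        channel.force(true);   // push data and file metadata to the device before acknowledging
        raf.close();
    }
}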
Somehow you have alternate versions of the file earlier in the classpath.
Perhaps someone's empty copies are bundled into one of your application
jar files.
Or perhaps the configuration files are not distributed to the datanodes
in the expected locations.
Saptarshi Guha wrote:
For some
We provided a patch for 0.16 that could be retrofitted into 0.19.
Our internal use of this has shown that jstack can hang in some
situations, and that just sending SIGQUIT is safer.
https://issues.apache.org/jira/browse/HADOOP-3994
Ryan LeCompte wrote:
For what it's worth, I started seeing
The path separator is a major issue for a number of items in the
configuration data set that are multiple items packed together via the
path separator.
the class path
the distributed cache
the input path set
all suffer from the path.separator issue for two reasons, one being the difference across
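A small standalone illustration (not Hadoop code; the file names are invented) of how packing several values into one string with path.separator breaks down:

import java.util.regex.Pattern;

public class PathSeparatorDemo {
    public static void main(String[] args) {
        String sep = System.getProperty("path.separator");   // ":" on Unix, ";" on Windows
        // Values packed the way classpath / distributed cache entries are packed:
        String packed = "/jars/a.jar" + sep + "/jars/b.jar" + sep + "hdfs://nn:9000/cache/c.jar";
        // Unpacking by splitting on the separator: on Unix the ":" inside the
        // hdfs:// URI gets split as well, which is one way these lists break.
        for (String entry : packed.split(Pattern.quote(sep))) {
            System.out.println(entry);
        }
    }
}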
Yes this will work. You will need to configure the class path to include
that directory.
The TaskTrackers really only have the classpath as set up by
conf/hadoop-env.sh, and the TaskTracker$Child processes have that classpath
plus the unpacked distributed cache directory.
Saptarshi Guha wrote:
Hello,
The copy rate for the reduces is throttled by the availability of the
data from the maps.
If the map data is not available yet, the effective copy rate goes toward 0.
patek tek wrote:
Hello,
I have been running experiments with Hadoop and noticed that
the copy rate of reducers decreases
In 0.19 there is a chaining facility; I haven't looked at it yet, but it
may provide an alternative to the rather standard pattern of looping over
jobs (sketched below).
You may also want to check what Mahout is doing, as it is a common
problem in that space.
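For context, a sketch of that standard looping pattern on the old mapred API: each iteration's output directory becomes the next iteration's input. The driver class and paths are illustrative; the mapper and reducer are whatever your job uses.

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;

public class IterativeDriver {
    public static void main(String[] args) throws Exception {
        Path current = new Path(args[0]);
        int iterations = Integer.parseInt(args[1]);
        for (int i = 0; i < iterations; i++) {
            JobConf conf = new JobConf(IterativeDriver.class);
            conf.setJobName("iteration-" + i);
            // conf.setMapperClass(...); conf.setReducerClass(...);   // your job here
            Path next = new Path(args[0] + "-iter-" + i);
            FileInputFormat.setInputPaths(conf, current);
            FileOutputFormat.setOutputPath(conf, next);
            JobClient.runJob(conf);      // blocks until this iteration finishes
            current = next;              // this output is the next iteration's input
        }
    }
}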
Delip Rao wrote:
Thanks Chris! I ended up doing something
We recently setup a fuse mount using the 18.2 fuse code, against our
18.1 hdfs, which has been running stably for some time.
We have about 20 datanodes and 50TB or so in our hdfs. The namenode is
running an i686 kernel and has been running with -Xmx1500m.
We have 1,492,093 files in our hdfs
We have just realized that one reason for the 'no live node contains block'
error from DFSClient is that the DFSClient was unable
to open a connection due to insufficient available file descriptors.
FsShell is particularly bad about consuming descriptors and leaving the
Is it possible there is a firewall blocking port 9000 on one or more of
the machines?
We had that happen to us with some machines that were kickstarted by our
IT; the firewall was configured to only allow ssh.
[EMAIL PROTECTED] wrote:
Hi,
I am trying to use hadoop 0.18.1. After I start
We just went from 8k to 64k after some problems.
Karl Anderson wrote:
On 4-Nov-08, at 3:45 PM, Yuri Pradkin wrote:
Hi,
I'm running a current snapshot (-r709609), doing a simple word count
using Python over streaming. I have a relatively moderate setup of 17 nodes.
I'm getting this
We are seeing some strange lockups on a couple of our machines (in
multiple clusters).
Basically the Hadoop processes will hang on the machine (datanode,
tasktracker and tasktracker$child).
And if you happen to tail the log files, the tail will hang; the same
happens if you do a find in the dfs data directory.
Thanks all.
--
Jason Venner
Attributor - Program the Web http://www.attributor.com/
Attributor is hiring Hadoop Wranglers and coding wizards, contact if
interested
We have trouble with that also, particularly when we have JMX enabled in
our jobs.
We have modified the main() that launches the children of the task
tracker to explicitly exit in its finally block. That helps substantially.
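A sketch of the idea (not the actual patch): exit explicitly in a finally block so lingering non-daemon threads, e.g. from JMX, cannot keep the child JVM alive after the task is done.

public class ChildMain {
    public static void main(String[] args) {
        int exitCode = 0;
        try {
            runTask(args);               // stand-in for the real task body
        } catch (Throwable t) {
            t.printStackTrace();
            exitCode = 1;
        } finally {
            System.exit(exitCode);       // force the JVM down regardless of stray threads
        }
    }
    private static void runTask(String[] args) {
        // ... the actual map or reduce work would run here ...
    }
}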
We also have some jobs that do not seem to be killable by the
--
Jason Venner
Attributor - Program the Web http://www.attributor.com/
Attributor is hiring Hadoop Wranglers and coding wizards, contact if
interested
please
help me in this?
Thanks
Pallavi
--
Jason Venner
Attributor - Program the Web http://www.attributor.com/
Attributor is hiring Hadoop Wranglers and coding wizards, contact if
interested
just a reduce task without a map
stage, but you could do it by having a map stage just using the
IdentityMapper class (which passes the data through to the reducers
unchanged), so effectively just doing a reduce.
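A minimal sketch of such a job on the old mapred API (paths are illustrative; IdentityReducer stands in for your real reducer):

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.lib.IdentityMapper;
import org.apache.hadoop.mapred.lib.IdentityReducer;

public class ReduceOnlyJob {
    public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(ReduceOnlyJob.class);
        conf.setJobName("reduce-only");
        conf.setMapperClass(IdentityMapper.class);    // pass-through "map" stage
        conf.setReducerClass(IdentityReducer.class);  // substitute your real Reducer here
        conf.setOutputKeyClass(LongWritable.class);   // TextInputFormat key type
        conf.setOutputValueClass(Text.class);
        FileInputFormat.setInputPaths(conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));
        JobClient.runJob(conf);
    }
}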
--
Jason Venner
Attributor - Program the Web http://www.attributor.com/
Attributor
-dev: 576 people
general: 80 people
General is for cross sub-project questions and announcements and
really should have more people watching it.
The traffic for last month was:
core-user: 692 messages
core-dev: 2679 messages
general: 18 messages
-- Owen
--
Jason Venner
Attributor - Program
for the help. I appreciate your time.
-SM
--
Jason Venner
Attributor - Program the Web http://www.attributor.com/
Attributor is hiring Hadoop Wranglers and coding wizards, contact if
interested
will be launched simultaneously.
What about running two different jobtrackers on the same machines,
looking at the same DFS files? Never tried it myself, but it might be
an approach.
--
James Moore | [EMAIL PROTECTED]
Ruby and Ruby on Rails consulting
blog.restphone.com
--
Jason Venner
Attributor
What are people doing?
For jobs that have a long enough SLA, just shutting down the cluster and
bringing up the secondary as the master works for us.
We have some jobs where that doesn't work well, because the recovery
time is not acceptable.
There has been internal discussion of using DRBD
Check out
https://issues.apache.org/jira/browse/HADOOP-3422
Joe Williams wrote:
I have been attempting to get Hadoop metrics into Ganglia and have been
unsuccessful thus far. I have seen this thread
(http://mail-archives.apache.org/mod_mbox/hadoop-core-user/200712.mbox/raw/[EMAIL PROTECTED]/)
I applied the patch in the jira to my distro
Joe Williams wrote:
Thanks Jason. Until this is implemented, how are you pulling stats
from Hadoop?
-joe
Jason Venner wrote:
Check out
https://issues.apache.org/jira/browse/HADOOP-3422
Joe Williams wrote:
I have been attempting to get
Once the patch is applied you should start seeing the Ganglia metrics.
We do.
Joe Williams wrote:
Once I have the patch applied and have it running, should I see the
metrics? Or do I need to do additional work?
Thanks.
-Joe
Jason Venner wrote:
I applied the patch in the jira to my distro
Joe
If you write a SequenceFile with the results from the RDBMS, you can use
the join primitives to handle this rapidly.
The key is that you have to write the data in the native key sort order.
Since you have a primary key, you should be able to dump the table in
primary key order, and you can define
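A sketch of the dump step (the table, columns, key and value types are illustrative; the point is the ORDER BY, so the SequenceFile is written in native key sort order):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;

public class DumpTableSorted {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        SequenceFile.Writer writer = SequenceFile.createWriter(
            fs, conf, new Path(args[0]), LongWritable.class, Text.class);
        // args[1] is a JDBC URL; the driver is assumed to be on the classpath.
        Connection db = DriverManager.getConnection(args[1]);
        // ORDER BY the primary key so records land in key order.
        ResultSet rs = db.createStatement()
            .executeQuery("SELECT id, payload FROM my_table ORDER BY id");
        LongWritable key = new LongWritable();
        Text value = new Text();
        while (rs.next()) {
            key.set(rs.getLong(1));
            value.set(rs.getString(2));
            writer.append(key, value);    // keys must arrive already sorted
        }
        writer.close();
        db.close();
    }
}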
)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:209)
at
org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2071)
--
Jason Venner
Attributor - Program the Web http://www.attributor.com/
Attributor is hiring Hadoop Wranglers and coding wizards, contact if
interested
When you compile from svn, the svn revision number becomes part of the
required version for HDFS; the last time I looked at this was 0.15.3, but
it may still be happening.
Raghu Angadi wrote:
Check the log from NameNode and DataNode. Most common reason is that
you might be running older version
?
--
Jason Venner
Attributor - Program the Web http://www.attributor.com/
Attributor is hiring Hadoop Wranglers and coding wizards, contact if
interested
It would be very convenient to have this available for building unit
tests for map reduce jobs.
In the interests of avoiding NIH, I am hoping this has been done.
Happy Elephant riding!
--
Jason Venner
Attributor - Program the Web http://www.attributor.com/
Attributor is hiring Hadoop
Nothing like missing a jar file hadoop-...test.jar in the distribution :-[
Jason Venner wrote:
It would be very convenient to have this available for building unit
tests for map reduce jobs.
In the interests of avoiding NIH, I am hoping this has been done.
Happy Elephant riding
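For reference, a sketch of the kind of test the classes in hadoop-*-test.jar enable: an in-process HDFS plus MapReduce cluster. The package names and constructor arguments below are from memory of the 0.18-era API and should be treated as assumptions to check against your release.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.dfs.MiniDFSCluster;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MiniMRCluster;

public class MiniClusterSmokeTest {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        MiniDFSCluster dfs = new MiniDFSCluster(conf, 2, true, null);   // 2 datanodes, format
        FileSystem fs = dfs.getFileSystem();
        MiniMRCluster mr = new MiniMRCluster(2, fs.getUri().toString(), 1);  // 2 tasktrackers
        try {
            JobConf job = mr.createJobConf();    // already points at the mini cluster
            System.out.println("mini job tracker: " + job.get("mapred.job.tracker"));
            // ... configure and run the map/reduce job under test here ...
        } finally {
            mr.shutdown();
            dfs.shutdown();
        }
    }
}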
, at 9:55 PM, Jason Venner wrote:
For the data joins, I let the framework do it - which means one
partition per split - so I have to choose my partition count carefully
to fill the machines.
I had an error in my initial outer join mapper; the join map code now
runs about 40x faster than the old
For the data joins, I let the framework do it - which means one
partition per split - so I have to choose my partition count carefully to
fill the machines.
I had an error in my initial outer join mapper; the join map code now
runs about 40x faster than the old brute-force read-it-all shuffle
,
Shirley
--
Jason Venner
Attributor - Program the Web http://www.attributor.com/
Attributor is hiring Hadoop Wranglers and coding wizards, contact if
interested
rrd file and graph.
Jason Venner wrote:
I first verified that when I was using the file context, I saw the
counters in the report.
Then I switched contexts to Ganglia.
I also instrumented the low-level code. I am hoping someone
understands this off the top of their head, as I don't want
, the datanodes
refuse to start. How can I have a clean start without messing my old
data?
thanks in advance for help.
- Prasad Pingali,
LTRC,
IIIT, Hyderabad.
--
Jason Venner
Attributor - Program the Web http://www.attributor.com/
Attributor is hiring Hadoop Wranglers and coding wizards, contact
: reduces_launched, Type: int32, Value: 0 to
localhost/127.0.0.1:8649
--
Jason Venner
Attributor - Program the Web http://www.attributor.com/
Attributor is hiring Hadoop Wranglers and coding wizards, contact if
interested
--
Jason Venner
Attributor - Program the Web http://www.attributor.com/
Attributor is hiring Hadoop Wranglers and coding wizards, contact if
interested
--
Jason Venner
Attributor - Program the Web http://www.attributor.com/
Attributor is hiring Hadoop Wranglers and coding wizards, contact if
interested
to remain in
this state?
(which also apparently is in-memory vs
serialized to disk...). In general, what does
COMMIT_PENDING mean? (job
done, but output not committed to dfs?)
Thanks!
--
Jason Venner
Attributor - Program the Web http
after not reporting for 60X seconds,
it is clear that incrementing a counter is insufficient to disable the
kill timeout.
How do you disable the kill timeout?
--
Jason Venner
Attributor - Publish with Confidence http://www.attributor.com/
Attributor is hiring Hadoop Wranglers, contact
Well, on deeper reading of the code and the documentation,
reporter.progress() is the required call.
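A minimal sketch of a long-running mapper that makes that call (old mapred API; the key/value types and the compute loop are illustrative):

import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

public class LongComputationMapper extends MapReduceBase
        implements Mapper<LongWritable, Text, Text, Text> {
    public void map(LongWritable key, Text value,
                    OutputCollector<Text, Text> output, Reporter reporter)
            throws IOException {
        for (int step = 0; step < 1000; step++) {
            // ... expensive work for this step ...
            reporter.progress();   // resets the kill timer; incrementing counters alone is not enough
        }
        output.collect(new Text(key.toString()), value);
    }
}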
Jason Venner wrote:
I have a mapper that for each task does extensive computation. In the
computation, I increment a counter once per major operation (about
once every 5 seconds). I can see
This only happens if you add a class from the jar to the JobConf
creation line.
JobConf conf = new JobConf(MyClass.class);
JobConf
public JobConf(Class exampleClass)
Construct a map/reduce job configuration.
Parameters:
exampleClass - a class whose containing jar is used
For the first day or so, when the jobs are viewable via the main page of
the job tracker web interface, the job-specific counters are also
visible. Once the job is only visible in the history page, the counters
are not visible.
Is it possible to view the counters of the older jobs?
--
Jason
We really appreciated the presenters' material, and the lunch and snacks
were also top notch!
--
Jason Venner
Attributor - Publish with Confidence http://www.attributor.com/
Attributor is hiring Hadoop Wranglers, contact if interested
(NameNode.java:130)
at org.apache.hadoop.dfs.NameNode.init(NameNode.java:175)
at org.apache.hadoop.dfs.NameNode.init(NameNode.java:161)
at org.apache.hadoop.dfs.NameNode.createNameNode(NameNode.java:843)
at org.apache.hadoop.dfs.NameNode.main(NameNode.java:852)
--
Jason Venner
Attributor
I see endless spew in the log files of the form
2008-03-11 20:40:45,379 INFO org.apache.hadoop.dfs.DataNode: Datanode 0
forwarding connect ack to upstream firstbadlink is
2008-03-11 20:40:45,172 INFO org.apache.hadoop.dfs.DataNode: Received
block blk_3082015406379486032 of size 7913915 from
of Sciences, Beijing.
--
[EMAIL PROTECTED]
Institute of Computing Technology, Chinese Academy of Sciences, Beijing.
Thanks a lot guys! It worked fine and it was exactly what I was looking for.
Best wishes,
John.
--
Jason Venner
Attributor - Publish with Confidence http
that via HOD?
--
Jason Venner
Attributor - Publish with Confidence http://www.attributor.com/
Attributor is hiring Hadoop Wranglers, contact if interested
We have started our first attempt at this, and do not see the metrics
being reported.
Our first cut is simply trying to report the counters at the end of the job.
A theory is that the job is exiting before the metrics are flushed.
This code is in the driver for our map/reduce task, and is
All of the test/sample jobs fail with either out-of-memory errors or 'The reduce
copier failed'.
Anyone have any tips on this?
Our non-EC2 installations seem to work just fine.
--
Jason Venner
Attributor - Publish with Confidence http://www.attributor.com/
Attributor is hiring Hadoop Wranglers, contact
At the present time I just manually substituted the values in the hodrc,
and that works.
Mahadev Konar wrote:
Could you try with --resource_manager.queue=batch?
Regards
mahadev
-Original Message-
From: Jason Venner [mailto:[EMAIL PROTECTED]
Sent: Monday, February 25, 2008 7:41 PM
could not be allocated.
[2008-02-21 19:46:11,025] DEBUG/10 torque:131 - /usr/bin/qdel 207.server.com
[2008-02-21 19:46:13,079] CRITICAL/50 hod:253 - Cannot allocate cluster
/mnt/scratch/grid/test
[2008-02-21 19:46:13,940] DEBUG/10 hod:391 - return code: 6
--
Jason Venner
Attributor
context for ganglia
jvm.class=org.apache.hadoop.metrics.ganglia.GangliaContext
jvm.period=10
jvm.servers=localhost:8649
--
Jason Venner
Attributor - Publish with Confidence http://www.attributor.com/
Attributor is hiring Hadoop Wranglers, contact if interested
and udp.
Still nothing visible via the Ganglia UI and no rrd file for anything
Hadoop-related.
Jason Venner wrote:
We have modified our metrics file, distributed it and restarted our
cluster. We have gmond running on the nodes, and a machine on the vlan
with gmetad running.
We have statistics
Instead of localhost, in the servers block, we now put the machine that
has gmetad running.
dfs.class=org.apache.hadoop.metrics.ganglia.GangliaContext
dfs.period=10
dfs.servers=GMETAD_HOST:8649
Jason Venner wrote:
Well, with the metrics file changed to perform file based logging,
metrics do
will do. Has anyone tried out this configuration
with Intel or AMD CPUs? Is the memory throughput sufficient?
Jason Venner wrote:
We are starting to build larger clusters, and want to better
understand how to configure the network topology.
Up to now we have just been setting up a private vlan
in recent
releases.
On 2/12/08 11:51 AM, Jason Venner [EMAIL PROTECTED] wrote:
We are starting to build larger clusters, and want to better understand
how to configure the network topology.
Up to now we have just been setting up a private vlan for the small
clusters.
We have been thinking about
currently using version 0.14.4
- Shimi
--
Jason Venner
Attributor - Publish with Confidence http://www.attributor.com/
Attributor is hiring Hadoop Wranglers, contact if interested
This should be one of the features coming in 0.16 via HOD
Steve Schlosser wrote:
Hello all
Is it possible for a Hadoop program to override
mapred.tasktracker.tasks.maximum at runtime? I've found that my job
overloads our nodes when running our default 8 tasks per node, but if
I decrease
Is there a smart way to find the disjoint set between two MapSequence
files that takes advantage of the fact that the data is already sorted?
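One approach is a simple merge-style pass over the two sorted files; a sketch assuming Text keys in plain SequenceFiles (adapt the key type and reader to your files):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;

public class SortedDisjoint {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        SequenceFile.Reader a = new SequenceFile.Reader(fs, new Path(args[0]), conf);
        SequenceFile.Reader b = new SequenceFile.Reader(fs, new Path(args[1]), conf);
        Text ka = new Text(), kb = new Text(), va = new Text(), vb = new Text();
        boolean hasA = a.next(ka, va), hasB = b.next(kb, vb);
        while (hasA && hasB) {
            int cmp = ka.compareTo(kb);
            if (cmp == 0) {                 // key present in both files: not disjoint, skip
                hasA = a.next(ka, va);
                hasB = b.next(kb, vb);
            } else if (cmp < 0) {           // key only in A
                System.out.println("A only: " + ka);
                hasA = a.next(ka, va);
            } else {                        // key only in B
                System.out.println("B only: " + kb);
                hasB = b.next(kb, vb);
            }
        }
        while (hasA) { System.out.println("A only: " + ka); hasA = a.next(ka, va); }
        while (hasB) { System.out.println("B only: " + kb); hasB = b.next(kb, vb); }
        a.close();
        b.close();
    }
}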
--
Jason Venner
Attributor - Publish with Confidence http://www.attributor.com/
Attributor is hiring Hadoop Wranglers, contact if interested
)
at
org.apache.hadoop.dfs.SecondaryNameNode.doCheckpoint(SecondaryNameNode.java:275)
at
org.apache.hadoop.dfs.SecondaryNameNode.run(SecondaryNameNode.java:192)
at java.lang.Thread.run(Thread.java:619)
--
Jason Venner
Attributor - Publish with Confidence http
We are running under Linux with DFS on GigE LANs, kernel
2.6.15-1.2054_FC5smp, with a variety of Xeon steppings for our processors.
Our replication factor was set to 3.
Florian Leibert wrote:
Maybe it helps to know that we're running Hadoop inside amazon's EC2...
Thanks,
Florian
--
Jason
That was the error that we were seeing in our hung reduce tasks. It went
away for us, and we never figured out why. A number of things happened
in our environment around the time it went away.
We shifted to 0.15.2, our cluster moved to a separate switched vlan from
our main network, we started