Re: 0.18.1 datanode psuedo deadlock problem

2009-01-12 Thread Jason Venner
There is no reason to do the block scans. All of the modern kernels will provide you notification when a file or directory is altered. This could be readily handled with a native application that writes structured data to a receiver in the Datanode, or via JNA/JNI for pure Java or mixed

Re: 0.18.1 datanode psuedo deadlock problem

2009-01-12 Thread Jason Venner
startup the recipient of the notifications would keep up the block information and the du information. Raghu Angadi wrote: Jason Venner wrote: There is no reason to do the block scans. All of the modern kernels will provide you notification when a file or directory is altered. This could

Re: 0.18.1 datanode psuedo deadlock problem

2009-01-12 Thread Jason Venner
Here is some simple code I wrote using JNA to handle Linux inotify. This code was my first and only attempt to use JNA. The JNA jars are available from https://jna.dev.java.net/ Raghu Angadi wrote: Jason Venner wrote: There is no reason to do the block scans. All of the modern kernels
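
The JNA code mentioned in this message is not reproduced in the archive snippet. As a rough illustration of the same idea (kernel change notification instead of periodic block scans), here is a minimal pure-Java sketch using the JDK's java.nio.file.WatchService. That API arrived in Java 7, after this thread, and is backed by inotify on Linux. The class name BlockDirWatcher, its methods, and the file name blk_1 are hypothetical; this is not the code from the original post, nor anything in the Datanode.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardWatchEventKinds;
import java.nio.file.WatchEvent;
import java.nio.file.WatchKey;
import java.nio.file.WatchService;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.TimeUnit;

// Hypothetical helper: reports which entries of a directory changed.
class BlockDirWatcher {
    /**
     * Registers a watcher on dir, runs the action, then waits up to
     * timeoutSeconds for the first batch of change events and returns
     * the names of the affected entries.
     */
    static List<String> changesDuring(Path dir, Runnable action, long timeoutSeconds) {
        try (WatchService watcher = dir.getFileSystem().newWatchService()) {
            dir.register(watcher,
                    StandardWatchEventKinds.ENTRY_CREATE,
                    StandardWatchEventKinds.ENTRY_MODIFY,
                    StandardWatchEventKinds.ENTRY_DELETE);
            action.run();
            List<String> changed = new ArrayList<>();
            WatchKey key = watcher.poll(timeoutSeconds, TimeUnit.SECONDS);
            if (key != null) {
                for (WatchEvent<?> event : key.pollEvents()) {
                    // OVERFLOW means events were lost; it carries no file name.
                    if (event.kind() != StandardWatchEventKinds.OVERFLOW) {
                        changed.add(event.context().toString());
                    }
                }
                key.reset();
            }
            return changed;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    /** Creates a file in a fresh temp directory and returns the observed changes. */
    static List<String> demo() {
        try {
            Path dir = Files.createTempDirectory("blocks");
            return changesDuring(dir, () -> {
                try {
                    Files.createFile(dir.resolve("blk_1"));
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            }, 30);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

A real receiver would keep the watcher open and update its block and usage bookkeeping from each event, rather than waiting for a single batch as this sketch does.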

Re: RAID vs. JBOD

2009-01-11 Thread Jason Venner
If you put your dfs directory as a set of comma separated tokens you will do fine. <property> <name>dfs.data.dir</name> <value>${hadoop.tmp.dir}/dfs/data</value> <description>Determines where on the local filesystem an DFS data node should store its blocks. If this is a comma-delimited list of

Re: 0.18.1 datanode psuedo deadlock problem

2009-01-09 Thread Jason Venner
to jason.had...@gmail.com in the next little bit. Jason Venner wrote: The problem we are having is that datanodes periodically stall for 10-15 minutes and drop off the active list and then come back. What is going on is that a long operation set is holding the lock on FSDataset.volumes, and all

Question about the Namenode edit log and syncing the edit log to disk. 0.19.0

2009-01-07 Thread Jason Venner
I have always assumed (which is clearly my error) that edit log writes were flushed to storage to ensure that the edit log was consistent during machine crash recovery. I have been working through FSEditLog.java and I don't see any calls of force(true) on the file channel or sync on the file
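
For readers unfamiliar with the calls being asked about, here is a minimal sketch of an append that returns only after the JDK has been asked to push the data to the storage device. It uses only java.nio; the class DurableLog and the record strings are hypothetical, and this is not FSEditLog's code.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.List;

// Hypothetical example class, not part of Hadoop.
class DurableLog {
    /** Appends one record, then forces it out of the OS cache before returning. */
    static void appendAndSync(Path log, String record) {
        try (FileChannel channel = FileChannel.open(log,
                StandardOpenOption.CREATE,
                StandardOpenOption.WRITE,
                StandardOpenOption.APPEND)) {
            ByteBuffer buf = ByteBuffer.wrap((record + "\n").getBytes(StandardCharsets.UTF_8));
            while (buf.hasRemaining()) {
                channel.write(buf);
            }
            // force(true) requests that both file content and metadata be
            // written to the device; the JDK only guarantees this for
            // files on local storage.
            channel.force(true);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Writes two made-up records to a temp file and reads them back. */
    static List<String> demo() {
        try {
            Path log = Files.createTempFile("edits", ".log");
            appendAndSync(log, "OP_MKDIR /a");
            appendAndSync(log, "OP_DELETE /b");
            return Files.readAllLines(log);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Without the force call, a write that has returned may still sit in the operating system's page cache, which is the crash-recovery gap the question is about.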

Re: Problem loading hadoop-site.xml - dumping parameters

2009-01-05 Thread Jason Venner
somehow you have alternate versions of the file earlier in the class path. Perhaps someone's empty copies are bundled into one of your application jar files. Or perhaps the configuration files are not distributed to the datanodes in the expected locations. Saptarshi Guha wrote: For some

Re: stack trace from hung task

2008-12-30 Thread Jason Venner
We provided a patch for 16 that could be retrofitted into 19. Our internal use of this has shown that jstack can hang in some situations, and that just sending the SIGQUIT is safer. https://issues.apache.org/jira/browse/HADOOP-3994 Ryan LeCompte wrote: For what it's worth, I started seeing

Re: -libjars with multiple jars broken when client and cluster reside on different OSs?

2008-12-30 Thread Jason Venner
The path separator is a major issue with a number of items in the configuration data set that are multiple items packed together via the path separator. The class path, the distributed cache, and the input path set all suffer from the path.separator issue for 2 reasons: 1 being the difference across
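
To make the cross-OS part of the problem concrete, here is a small sketch of what happens when a list packed with one platform's separator is unpacked with another's. The class PathListDemo and the jar names are hypothetical; the separators are passed in explicitly so the example behaves the same on any machine.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical example class, not part of Hadoop.
class PathListDemo {
    /** Splits a packed list of paths on a literal separator string. */
    static List<String> split(String packed, String separator) {
        return Arrays.asList(packed.split(Pattern.quote(separator)));
    }

    public static void main(String[] args) {
        // Packed on a client whose path separator is ";".
        String packed = "a.jar;b.jar";
        // Unpacked on a node whose path separator is ":": one bogus entry.
        System.out.println(split(packed, ":"));
        // Unpacked with the separator that was used to pack: two entries.
        System.out.println(split(packed, ";"));
    }
}
```

Splitting with the wrong separator does not fail; it silently yields a single entry that names no real file, which is why this kind of problem tends to surface later as a missing class or jar.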

Re: Classes Not Found even when classpath is mentioned (Starting mapreduce from another app)

2008-12-23 Thread Jason Venner
Yes, this will work. You will need to configure the class path to include that directory. The TaskTrackers really only have the classpath as set up by conf/hadoop-env.sh, and the TaskTracker$Child processes have that classpath + the unpacked distributed cache directory. Saptarshi Guha wrote: Hello,

Re: Copy-rate of reducers decreases over time

2008-12-23 Thread Jason Venner
The copy rate for the reduces is throttled by the availability of the data from the maps. If the map data is not available yet, the effective copy rate goes toward 0. patek tek wrote: Hello, I have been running experiments with Hadoop and noticed that the copy-rate of reducers decreases

Re: Run Map-Reduce multiple times

2008-12-23 Thread Jason Venner
In 19 there is a chaining facility; I haven't looked at it yet, but it may provide an alternative to the rather standard pattern of looping. You may also want to check what Mahout is doing, as it is a common problem in that space. Delip Rao wrote: Thanks Chris! I ended up doing something

hdfs fuse mount and namenode out of memory conditions.

2008-11-17 Thread Jason Venner
We recently set up a FUSE mount using the 18.2 fuse code, against our 18.1 hdfs, which has been running stably for some time. We have about 20 datanodes and 50TB or so in our hdfs. The namenode is running an i686 kernel and has been running with -Xmx1500m. We have 1,492,093 files in our hdfs

Re: File Descriptors not cleaned up

2008-11-10 Thread Jason Venner
We have just realized that one reason for the 'no live node contains block' error from DFSClient is that the DFSClient was unable to open a connection due to insufficient available file descriptors. FsShell is particularly bad about consuming descriptors and leaving the

Re: Problem while starting Hadoop

2008-11-05 Thread Jason Venner
Is it possible there is a firewall blocking port 9000 on one or more of the machines? We had that happen to us with some machines that were kickstarted by our IT; the firewall was configured to only allow ssh. [EMAIL PROTECTED] wrote: Hi, I am trying to use hadoop 0.18.1. After I start

Re: too many open files? Isn't 4K enough???

2008-11-05 Thread Jason Venner
We just went from 8k to 64k after some problems. Karl Anderson wrote: On 4-Nov-08, at 3:45 PM, Yuri Pradkin wrote: Hi, I'm running current snapshot (-r709609), doing a simple word count using python over streaming. I have a relatively moderate setup of 17 nodes. I'm getting this

Question about: file system lockups with xfs, hadoop 0.16.3 and linux 2.6.18-92.1...PAE i686

2008-10-28 Thread Jason Venner
We are seeing some strange lockups on a couple of our machines (in multiple clusters). Basically the hadoop processes will hang on the machine (datanode, tasktracker and tasktracker$child). And if you happen to tail the log files the tail will hang, if you do a find in the dfs data directory

Question about dfs.datanode.du.reserved and dfs.datanode.du.pct in 0.18.1

2008-10-02 Thread Jason Venner
. Thanks all. -- Jason Venner Attributor - Program the Web http://www.attributor.com/ Attributor is hiring Hadoop Wranglers and coding wizards, contact if interested

Re: Timeouts at reduce stage

2008-09-01 Thread Jason Venner
We have trouble with that also, particularly when we have JMX enabled in our jobs. We have modified the main that launches the children of the task tracker to explicitly exit in its finally block. That helps substantially. We also have some jobs that do not seem to be killable by the

Re: Could not obtain block: blk_-2634319951074439134_1129 file=/user/root/crawl_debug/segments/20080825053518/content/part-00002/data

2008-08-28 Thread Jason Venner
-- Jason Venner

Re: Map Task is failing with out of memory issue

2008-08-28 Thread Jason Venner
please help me in this? Thanks Pallavi -- Jason Venner

Re: how use only a reducer without a mapper

2008-08-27 Thread Jason Venner
just a reduce task without a map stage, but you could do it by having a map stage just using the IdentityMapper class (which passes the data through to the reducers unchanged), so effectively just doing a reduce. -- Jason Venner

Re: Mailing list sizes

2008-08-25 Thread Jason Venner
-dev: 576 people general: 80 people General is for cross sub-project questions and announcements and really should have more people watching it. The traffic for last month was: core-user: 692 messages core-dev: 2679 messages general: 18 messages -- Owen -- Jason Venner

Re: pseudo-global variable constuction

2008-08-19 Thread Jason Venner
for the help. I appreciate your time. -SM -- Jason Venner

Re: How can I control Number of Mappers of a job?

2008-08-01 Thread Jason Venner
will be launched simultaneously. What about running two different jobtrackers on the same machines, looking at the same DFS files? Never tried it myself, but it might be an approach. -- James Moore | [EMAIL PROTECTED] Ruby and Ruby on Rails consulting blog.restphone.com -- Jason Venner

Question about fault tolerance and fail over for name nodes

2008-07-29 Thread Jason Venner
What are people doing? For jobs that have a long enough SLA, just shutting down the cluster and bringing up the secondary as the master works for us. We have some jobs where that doesn't work well, because the recovery time is not acceptable. There has been internal discussion of using DRBD

Re: Hadoop and Ganglia Meterics

2008-07-24 Thread Jason Venner
Check out https://issues.apache.org/jira/browse/HADOOP-3422 Joe Williams wrote: I have been attempting to get Hadoop metrics in Ganglia and have been unsuccessful thus far. I have seen this thread (http://mail-archives.apache.org/mod_mbox/hadoop-core-user/200712.mbox/raw/[EMAIL PROTECTED]/)

Re: Hadoop and Ganglia Meterics

2008-07-24 Thread Jason Venner
I applied the patch in the jira to my distro. Joe Williams wrote: Thanks Jason, until this is implemented how are you pulling stats from Hadoop? -joe Jason Venner wrote: Check out https://issues.apache.org/jira/browse/HADOOP-3422 Joe Williams wrote: I have been attempting to get

Re: Hadoop and Ganglia Meterics

2008-07-24 Thread Jason Venner
Once the patch is applied you should start seeing the ganglia metrics. We do. Joe Williams wrote: Once I have the patch applied and have it running should I see the metrics? Or do I need to do additional work? Thanks. -Joe Jason Venner wrote: I applied the patch in the jira to my distro Joe

Re: Using MapReduce to do table comparing.

2008-07-23 Thread Jason Venner
If you write a SequenceFile with the results from the RDBMS you can use the join primitives to handle this rapidly. The key is that you have to write the data in the native key sort order. Since you have a primary key, you should be able to dump the table in primary key order, and you can define
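
The reason the sort order matters is that two inputs sorted on the same key can be joined in one forward pass, with no shuffle and no lookups. Here is a minimal in-memory sketch of that merge step, assuming ascending integer keys that are unique within each input (as a primary key would be). The class SortedMergeJoin is hypothetical and does not use Hadoop's join package.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical example class, not part of Hadoop.
class SortedMergeJoin {
    /**
     * Inner-joins two tables given as parallel key/value arrays, each
     * sorted ascending by unique key. Returns rows as "key=left|right".
     */
    static List<String> join(int[] leftKeys, String[] leftVals,
                             int[] rightKeys, String[] rightVals) {
        List<String> out = new ArrayList<>();
        int i = 0;
        int j = 0;
        while (i < leftKeys.length && j < rightKeys.length) {
            if (leftKeys[i] < rightKeys[j]) {
                i++;            // left key has no partner on the right
            } else if (leftKeys[i] > rightKeys[j]) {
                j++;            // right key has no partner on the left
            } else {
                out.add(leftKeys[i] + "=" + leftVals[i] + "|" + rightVals[j]);
                i++;
                j++;
            }
        }
        return out;
    }
}
```

For example, joining keys {1, 2, 4} with {2, 3, 4} visits each input once and emits rows only for keys 2 and 4.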

help request: 0.16.0 java.io.IOException: Filesystem closed org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find task_....

2008-07-18 Thread Jason Venner
) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:209) at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2071) -- Jason Venner

Re: Version Mismatch when accessing hdfs through a nonhadoop java application?

2008-07-15 Thread Jason Venner
When you compile from svn, the svn state number becomes part of the required version for hdfs - the last time I looked at it was 0.15.3 but it may still be happening. Raghu Angadi wrote: Check the log from NameNode and DataNode. Most common reason is that you might be running older version

Re: Why is the task run in a child JVM?

2008-07-14 Thread Jason Venner
? -- Jason Venner

Has anyone packed up the src/test/..../ClusterMapReduceTestCase into a separate jar for use by external code

2008-07-09 Thread Jason Venner
It would be very convenient to have this available for building unit tests for map reduce jobs. In the interests of avoiding NIH I am hoping this has been done. Happy Elephant riding! -- Jason Venner

Re: Has anyone packed up the src/test/..../ClusterMapReduceTestCase into a separate jar for use by external code

2008-07-09 Thread Jason Venner
Nothing like missing a jar file hadoop-...test.jar in the distribution :-[ Jason Venner wrote: It would be very convenient to have this available for building unit tests for map reduce jobs. In the interests of avoiding NiH I am hoping this has been done Happy Elephant riding

Re: MapSide Join and left outer or right outer joins?

2008-07-03 Thread Jason Venner
, at 9:55 PM, Jason Venner wrote: For the data joins, I let the framework do it - which means one partition per split - so I have to choose my partition count carefully to fill the machines. I had an error in my initial outer join mapper, the join map code now runs about 40x faster than the old

Re: MapSide Join and left outer or right outer joins?

2008-07-02 Thread Jason Venner
For the data joins, I let the framework do it - which means one partition per split - so I have to choose my partition count carefully to fill the machines. I had an error in my initial outer join mapper, the join map code now runs about 40x faster than the old brute force read it all shuffle

Re: joins in map reduce

2008-06-30 Thread Jason Venner
, Shirley -- Jason Venner

Re: hadoop 0.16.3 Gangia Metric Counter reporting does not appear to work

2008-05-20 Thread Jason Venner
rrd file and graph. Jason Venner wrote: I first verified that when I was using the file context, I saw the counters in the report. Then I switched contexts to ganglia. I also instrumented the low level code. I am hoping someone understands this off the top of their head as I don't want

Re: IP based cluster - address space issues

2008-05-20 Thread Jason Venner
, the datanodes refuse to start. How can I have a clean start without messing my old data? thanks in advance for help. - Prasad Pingali, LTRC, IIIT, Hyderabad. -- Jason Venner

hadoop 0.16.3 Gangia Metric Counter reporting does not appear to work

2008-05-19 Thread Jason Venner
: reduces_launched, Type: int32, Value: 0 to localhost/127.0.0.1:8649 -- Jason Venner

Re: Query against different data types within HDFS using Map/Reduce

2008-05-05 Thread Jason Venner
-- Jason Venner

Re: JobConf: How to pass List/Map

2008-05-01 Thread Jason Venner
-- Jason Venner

Re: OOM error with large # of map tasks

2008-05-01 Thread Jason Venner
to remain in this state? (which also apparently is in-memory vs serialized to disk...). In general, what does COMMIT_PENDING mean? (job done, but output not committed to dfs?) Thanks! -- Jason Venner

Question about reporting progress in mapper tasks. 0.15.3

2008-04-15 Thread Jason Venner
after not reporting for 60X seconds, it is clear that incrementing a counter is insufficient to disable the kill timeout. How do you disable the kill timeout? -- Jason Venner

Re: Question about reporting progress in mapper tasks. 0.15.3 - solved

2008-04-15 Thread Jason Venner
Well, on deeper reading of the code and the documentation, reporter.progress() is the required call. Jason Venner wrote: I have a mapper that for each task does extensive computation. In the computation, I increment a counter once per major operation (about once every 5 seconds). I can see
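
A sketch of the pattern this resolves to: call progress() once per major operation inside the long computation. To keep the example runnable without Hadoop on the class path, ProgressReporter below is a hypothetical one-method stand-in for the Reporter that the framework passes to map(); the class ProgressDemo and the arithmetic in expensiveStep are placeholders as well.

```java
// Hypothetical example class, not part of Hadoop.
class ProgressDemo {
    /** Stand-in for the single Reporter method used here. */
    interface ProgressReporter {
        void progress();
    }

    /**
     * Runs the given number of major operations, signalling liveness
     * after each one, and returns the accumulated result.
     */
    static long compute(int operations, ProgressReporter reporter) {
        long total = 0;
        for (int op = 0; op < operations; op++) {
            total += expensiveStep(op);
            // Tell the framework the task is still making progress.
            reporter.progress();
        }
        return total;
    }

    /** Placeholder for a step that would take several seconds in practice. */
    static long expensiveStep(int op) {
        return (long) op * op;
    }
}
```

In a real mapper the same loop would call progress() on the Reporter argument of map(), alongside whatever counters the job increments.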

Re: Quick jar deployment question...

2008-04-03 Thread Jason Venner
This only happens if you add a class from the jar to the JobConf creation line. JobConf conf = new JobConf(MyClass.class); JobConf public JobConf(Class exampleClass) Construct a map/reduce job configuration. Parameters: exampleClass - a class whose containing jar is used
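
As background on how a class can lead to "its containing jar" at all, here is a small sketch that asks the JVM where a class was loaded from. It uses only the JDK; the class ContainingJar is hypothetical, and Hadoop's own lookup inside JobConf may well be implemented differently.

```java
import java.net.URL;
import java.security.CodeSource;

// Hypothetical example class, not part of Hadoop.
class ContainingJar {
    /**
     * Returns the jar file or class directory the class was loaded from,
     * or null when the JVM does not record one (as for core JDK classes).
     */
    static URL locate(Class<?> cls) {
        CodeSource source = cls.getProtectionDomain().getCodeSource();
        return source == null ? null : source.getLocation();
    }

    public static void main(String[] args) {
        // Typically a jar or directory URL for application classes.
        System.out.println(locate(ContainingJar.class));
        // Core classes such as String have no code source.
        System.out.println(locate(String.class));
    }
}
```

This is why the example class passed to the constructor should be one that lives in the job's own jar: a class from a shared library would point at that library's archive instead.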

Question on how to view the counters of jobs in the job tracker history

2008-04-03 Thread Jason Venner
For the first day or so, when the jobs are viewable via the main page of the job tracker web interface, the job-specific counters are also visible. Once the job is only visible in the history page, the counters are not visible. Is it possible to view the counters of the older jobs? -- Jason

Thank you Yahoo, for the wonderful Hadoop summit

2008-03-25 Thread Jason Venner
We really appreciated the presenters' material, and the lunch and snacks were also top notch! -- Jason Venner

Question about recovering from a corrupted namenode 0.16.0

2008-03-13 Thread Jason Venner
(NameNode.java:130) at org.apache.hadoop.dfs.NameNode.init(NameNode.java:175) at org.apache.hadoop.dfs.NameNode.init(NameNode.java:161) at org.apache.hadoop.dfs.NameNode.createNameNode(NameNode.java:843) at org.apache.hadoop.dfs.NameNode.main(NameNode.java:852) -- Jason Venner

0.16.0 job hang problem dfs related?

2008-03-11 Thread Jason Venner
I see endless spew in the log files of the form 2008-03-11 20:40:45,379 INFO org.apache.hadoop.dfs.DataNode: Datanode 0 forwarding connect ack to upstream firstbadlink is 2008-03-11 20:40:45,172 INFO org.apache.hadoop.dfs.DataNode: Received block blk_3082015406379486032 of size 7913915 from

Re: dynamically adding slaves to hadoop cluster

2008-03-10 Thread Jason Venner
of Sciences, Beijing. -- [EMAIL PROTECTED] Institute of Computing Technology, Chinese Academy of Sciences, Beijing. Thanks a lot guys! It worked fine and it was exactly what I was looking for. Best wishes, John. -- Jason Venner

HOD question wrt virtual mapred master node

2008-03-09 Thread Jason Venner
that via HOD? -- Jason Venner

Question about using the metrics framework.

2008-03-06 Thread Jason Venner
We have started our first attempt at this, and do not see the metrics being reported. Our first cut simply is trying to report the counters at the end of the job. A theory is that the job is exiting before the metrics are flushed. This code is in the driver for our map/reduce task, and is

0.15.3 on ec2 - using manual hadoop install no to ec2 magic: The reduce copier failed

2008-02-25 Thread Jason Venner
All of the test/sample jobs fail with either out of memory or The reduce copier failed. Anyone have any tips on this? Our non ec2 installations seem to work just fine. -- Jason Venner

Re: 0.16.0 HOD python config errors wrt missing '}' on variable names

2008-02-25 Thread Jason Venner
At the present time I just manually substituted the values in the hodrc, and that works Mahadev Konar wrote: Could you try with --resource_manager.queue=batch? Regards mahadev -Original Message- From: Jason Venner [mailto:[EMAIL PROTECTED] Sent: Monday, February 25, 2008 7:41 PM

Re: Problems running a HOD test cluster

2008-02-22 Thread Jason Venner
could not be allocated. [2008-02-21 19:46:11,025] DEBUG/10 torque:131 - /usr/bin/qdel 207.server.com [2008-02-21 19:46:13,079] CRITICAL/50 hod:253 - Cannot allocate cluster /mnt/scratch/grid/test [2008-02-21 19:46:13,940] DEBUG/10 hod:391 - return code: 6 -- Jason Venner

Question on metrics via ganglia

2008-02-21 Thread Jason Venner
context for ganglia jvm.class=org.apache.hadoop.metrics.ganglia.GangliaContext jvm.period=10 jvm.servers=localhost:8649 -- Jason Venner

Re: Question on metrics via ganglia

2008-02-21 Thread Jason Venner
and udp. Still nothing visible via the ganglia ui and no rrd file for anything hadoop related. Jason Venner wrote: We have modified my metrics file, distributed it and restarted our cluster. We have gmond running on the nodes, and a machine on the vlan with gmetad running. We have statistics

Re: Question on metrics via ganglia solved

2008-02-21 Thread Jason Venner
Instead of localhost, in the servers block, we now put the machine that has gmetad running. dfs.class=org.apache.hadoop.metrics.ganglia.GangliaContext dfs.period=10 dfs.servers=GMETAD_HOST:8649 Jason Venner wrote: Well, with the metrics file changed to perform file based logging, metrics do

Re: Question on DFS block placement and 'what is a rack' wrt DFS block placement

2008-02-12 Thread Jason Venner
will do. Has anyone tried out this configuration with Intel or AMD CPUs? Is the memory throughput sufficient? Jason Venner wrote: We are starting to build larger clusters, and want to better understand how to configure the network topology. Up to now we have just been setting up a private vlan

Re: Question on DFS block placement and 'what is a rack' wrt DFS block placement

2008-02-12 Thread Jason Venner
in recent releases. On 2/12/08 11:51 AM, Jason Venner [EMAIL PROTECTED] wrote: We are starting to build larger clusters, and want to better understand how to configure the network topology. Up to now we have just been setting up a private vlan for the small clusters. We have been thinking about

Re: Caching frequently map input files

2008-02-11 Thread Jason Venner
currently using version 0.14.4 - Shimi -- Jason Venner

Re: Override mapred.tasktracker.tasks.maximum?

2008-02-07 Thread Jason Venner
This should be one of the features coming in 0.16 via HOD Steve Schlosser wrote: Hello all Is it possible for a Hadoop program to override mapred.tasktracker.tasks.maximum at runtime? I've found that my job overloads our nodes when running our default 8 tasks per node, but if I decrease

Question about joining two map sequence files - 0.15.3

2008-02-07 Thread Jason Venner
Is there a smart way to find the disjoint set between two MapSequence files, that takes advantage of the fact the data is already sorted? -- Jason Venner
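
One way to read the question is as a request for the keys that appear in exactly one of the two files. Because both inputs are sorted, that set can be produced by a single merge pass. The sketch below works on in-memory key lists rather than on the files themselves, assumes ascending order with no duplicate keys within an input, and uses a hypothetical class name; it does not show an answer from the thread.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical example class, not part of Hadoop.
class SortedDifference {
    /** Returns, in sorted order, the keys present in exactly one input. */
    static List<String> symmetricDifference(List<String> a, List<String> b) {
        List<String> out = new ArrayList<>();
        int i = 0;
        int j = 0;
        while (i < a.size() && j < b.size()) {
            int cmp = a.get(i).compareTo(b.get(j));
            if (cmp < 0) {
                out.add(a.get(i++));    // only in a
            } else if (cmp > 0) {
                out.add(b.get(j++));    // only in b
            } else {
                i++;                    // in both: skip
                j++;
            }
        }
        // Whatever remains in either input has no partner in the other.
        while (i < a.size()) {
            out.add(a.get(i++));
        }
        while (j < b.size()) {
            out.add(b.get(j++));
        }
        return out;
    }
}
```

The cost is one sequential read of each input, which is the advantage the already-sorted layout offers over loading one side into memory and probing it.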

error log question - org.apache.hadoop.dfs.NameNode.Secondary: java.net.UnknownHostException: hdfs

2008-02-06 Thread Jason Venner
) at org.apache.hadoop.dfs.SecondaryNameNode.doCheckpoint(SecondaryNameNode.java:275) at org.apache.hadoop.dfs.SecondaryNameNode.run(SecondaryNameNode.java:192) at java.lang.Thread.run(Thread.java:619) -- Jason Venner

Re: hadoop 0.15.3 r612257 freezes on reduce task

2008-01-29 Thread Jason Venner
We are running under Linux with dfs on GigE lans, kernel 2.6.15-1.2054_FC5smp, with a variety of Xeon steppings for our processors. Our replication factor was set to 3. Florian Leibert wrote: Maybe it helps to know that we're running Hadoop inside amazon's EC2... Thanks, Florian -- Jason

Re: hadoop 0.15.3 r612257 freezes on reduce task

2008-01-29 Thread Jason Venner
That was the error that we were seeing in our hung reduce tasks. It went away for us, and we never figured out why. A number of things happened in our environment around the time it went away. We shifted to 0.15.2, our cluster moved to a separate switched vlan from our main network, we started