Re: Task tracker archive contains too many files

2009-02-04 Thread Amareshwari Sriramadasu
Andrew wrote: I've noticed that task tracker moves all unpacked jars into ${hadoop.tmp.dir}/mapred/local/taskTracker. We are using a lot of external libraries that are deployed via the -libjars option. The total number of files after unpacking is about 20 thousand. After running a number of

Task tracker archive contains too many files

2009-02-04 Thread Andrew
I've noticed that task tracker moves all unpacked jars into ${hadoop.tmp.dir}/mapred/local/taskTracker. We are using a lot of external libraries that are deployed via the -libjars option. The total number of files after unpacking is about 20 thousand. After running a number of jobs, tasks start

Re: Value-Only Reduce Output

2009-02-04 Thread Rasit OZDAS
I tried it myself, it doesn't work. I've also tried the stream.map.output.field.separator and map.output.key.field.separator parameters for this purpose; they don't work either. When Hadoop sees an empty string, it uses the default tab character instead. Rasit 2009/2/4 jason hadoop

Re: Hadoop FS Shell - command overwrite capability

2009-02-04 Thread Rasit OZDAS
John, I also couldn't find a way from the console. Maybe you already know this and prefer not to use it, but the API solves this problem: FileSystem.copyFromLocalFile(boolean delSrc, boolean overwrite, Path src, Path dst). If you have to use the console, it's a longer solution, but you can create a jar for this and call it
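A minimal sketch of the API route described above. The paths are made up for illustration; the point is that overwrite=true replaces an existing destination, which the shell's -copyFromLocal refuses to do.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ForceCopyFromLocal {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();   // loads the cluster configuration from the classpath
    FileSystem fs = FileSystem.get(conf);       // the configured HDFS instance
    // delSrc=false: keep the local file; overwrite=true: replace dst if it already exists
    fs.copyFromLocalFile(false, true,
        new Path("/tmp/input.txt"),             // placeholder local path
        new Path("/user/john/input.txt"));      // placeholder HDFS path
    fs.close();
  }
}

Packaged into a jar, something like this can be run with bin/hadoop jar, as the reply suggests.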

Re: HADOOP-2536 supports Oracle too?

2009-02-04 Thread Enis Soztutar
Hadoop-2536 connects to the db via JDBC, so in theory it should work with the proper JDBC drivers. It has been tested against MySQL, Hsqldb, and PostgreSQL, but not Oracle. To answer your earlier question, the actual SQL statements might not be recognized by Oracle, so I suggest the best way to
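For context, a hedged sketch of how HADOOP-2536's DBInputFormat is wired to a database over JDBC in the 0.19-era org.apache.hadoop.mapred.lib.db API. The driver class, connection URL, credentials, table and column names below are placeholders, not values from this thread.

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.Writable;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.lib.db.DBConfiguration;
import org.apache.hadoop.mapred.lib.db.DBInputFormat;
import org.apache.hadoop.mapred.lib.db.DBWritable;

public class DbJobSetup {

  /** Hypothetical record type: one row with an id and a name column. */
  public static class EmployeeRecord implements Writable, DBWritable {
    long id;
    String name;

    public void readFields(ResultSet rs) throws SQLException {    // read one row from JDBC
      id = rs.getLong(1);
      name = rs.getString(2);
    }
    public void write(PreparedStatement ps) throws SQLException { // unused for input-only jobs
      ps.setLong(1, id);
      ps.setString(2, name);
    }
    public void readFields(DataInput in) throws IOException {     // Hadoop serialization
      id = in.readLong();
      name = Text.readString(in);
    }
    public void write(DataOutput out) throws IOException {
      out.writeLong(id);
      Text.writeString(out, name);
    }
  }

  public static void configure(JobConf job) {
    job.setInputFormat(DBInputFormat.class);
    // Register the JDBC driver and connection details; in principle any
    // driver on the task classpath should work, per the reply above.
    DBConfiguration.configureDB(job,
        "oracle.jdbc.driver.OracleDriver",      // placeholder driver class
        "jdbc:oracle:thin:@dbhost:1521:orcl",   // placeholder connection URL
        "user", "password");                    // placeholder credentials
    // Read the id and name columns of a placeholder "employees" table.
    DBInputFormat.setInput(job, EmployeeRecord.class,
        "employees", null /* conditions */, "id" /* orderBy */,
        "id", "name");
  }
}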

Re: Value-Only Reduce Output

2009-02-04 Thread jason hadoop
For your reduce, the parameter is stream.reduce.input.field.separator if you are supplying a reduce class, and I believe the output format is TextOutputFormat... It looks like you have tried the map parameter for the separator, not the reduce parameter. From 0.19.0 PipeReducer: configure:
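Not the streaming parameter under discussion, but a related illustration of the same goal in the plain Java (0.19-era mapred) API: TextOutputFormat writes only the value, with no key and no tab, when the key it is given is NullWritable. The class below is a hypothetical sketch, not code from this thread.

import java.io.IOException;
import java.util.Iterator;

import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;

public class ValueOnlyReducer extends MapReduceBase
    implements Reducer<Text, Text, NullWritable, Text> {
  public void reduce(Text key, Iterator<Text> values,
                     OutputCollector<NullWritable, Text> output,
                     Reporter reporter) throws IOException {
    while (values.hasNext()) {
      // Emitting NullWritable as the key makes TextOutputFormat print the
      // value alone, with no leading key or separator character.
      output.collect(NullWritable.get(), values.next());
    }
  }
}

A job using this reducer would also set setOutputKeyClass(NullWritable.class) and setOutputValueClass(Text.class) on its JobConf.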

Re: decommissioned node showing up as dead node in web-based interface to namenode (dfshealth.jsp)

2009-02-04 Thread Bill Au
I have been looking into this some more by looking at the output of dfsadmin -report during the decommissioning process. After a node has been decommissioned, dfsadmin -report shows that the node is in the Decommissioned state. The web interface dfshealth.jsp shows it as a dead node. After I

Re: How to use DBInputFormat?

2009-02-04 Thread Rasit OZDAS
Amandeep, "SQL command not properly ended": I get this error whenever I forget the semicolon at the end. I know it doesn't make sense, but I recommend giving it a try. Rasit 2009/2/4 Amandeep Khurana ama...@gmail.com: The same query is working if I write a simple JDBC client and query the

Re: HDD benchmark/checking tool

2009-02-04 Thread Mikhail Yakshin
On Tue, Feb 3, 2009 at 8:53 PM, Dmitry Pushkarev wrote: Recently I have had a number of drive failures that slowed down processes a lot until they were discovered. Is there any easy way or tool to check HDD performance and see if there are any IO errors? Currently I wrote a simple script

Regarding Hadoop multi cluster set-up

2009-02-04 Thread shefali pawar
Hi, I am trying to set up a two-node cluster using Hadoop 0.19.0, with 1 master (which should also work as a slave) and 1 slave node. But while running bin/start-dfs.sh, the datanode is not starting on the slave. I have read the previous mails on the list, but nothing seems to be working in this

Re: Regarding Hadoop multi cluster set-up

2009-02-04 Thread S D
Shefali, Is your firewall blocking port 54310 on the master? John On Wed, Feb 4, 2009 at 12:34 PM, shefali pawar shefal...@rediffmail.com wrote: Hi, I am trying to set up a two-node cluster using Hadoop 0.19.0, with 1 master (which should also work as a slave) and 1 slave node. But while

Batch processing with Hadoop -- does HDFS scale for parallel reads?

2009-02-04 Thread TCK
Hey guys, We have been using Hadoop to do batch processing of logs. The logs get written and stored on a NAS. Our Hadoop cluster periodically copies a batch of new logs from the NAS, via NFS into Hadoop's HDFS, processes them, and copies the output back to the NAS. The HDFS is cleaned up at

Re: Batch processing with Hadoop -- does HDFS scale for parallel reads?

2009-02-04 Thread Brian Bockelman
Hey TCK, We use HDFS+FUSE solely as a storage solution for an application which doesn't understand MapReduce. We've scaled this solution to around 80Gbps. For 300 processes reading from the same file, we get about 20Gbps. Do consider your data retention policies -- I would say that

Re: Re: Regarding Hadoop multi cluster set-up

2009-02-04 Thread shefali pawar
Hi, I will have to check. I can do that tomorrow in college. But if that is the case, what should I do? Should I change the port number and try again? Shefali On Wed, 04 Feb 2009 S D wrote: Shefali, Is your firewall blocking port 54310 on the master? John On Wed, Feb 4, 2009 at 12:34

Re: HADOOP-2536 supports Oracle too?

2009-02-04 Thread Amandeep Khurana
Ok, I'm not sure if I got it correct. Are you saying I should test the statement that Hadoop creates directly against the database? Amandeep Amandeep Khurana Computer Science Graduate Student University of California, Santa Cruz On Wed, Feb 4, 2009 at 7:13 AM, Enis Soztutar enis@gmail.com

Re: HDFS Namenode Heap Size woes

2009-02-04 Thread Sean Knapp
Brian, Jason, Thanks again for your help. Just to close the thread: while following your suggestions I found I had an incredibly large number of files on my data nodes that were being marked for invalidation at startup. I believe they were left behind from an old mass-delete that was followed by a

Re: Batch processing with Hadoop -- does HDFS scale for parallel reads?

2009-02-04 Thread TCK
Thanks, Brian. This sounds encouraging for us. What are the advantages/disadvantages of keeping a persistent storage (HD/K)FS cluster separate from a processing Hadoop+(HD/K)FS cluster? The advantage I can think of is that a permanent storage cluster has different requirements from a

Re: Batch processing with Hadoop -- does HDFS scale for parallel reads?

2009-02-04 Thread Brian Bockelman
Sounds overly complicated. Complicated usually leads to mistakes :) What about just having a single cluster and only running the tasktrackers on the fast CPUs? No messy cross-cluster transferring. Brian On Feb 4, 2009, at 12:46 PM, TCK wrote: Thanks, Brian. This sounds encouraging for

Re: Chukwa documentation

2009-02-04 Thread Ariel Rabkin
Howdy. You do not need Torque. It's not even helpful, as far as I know. You don't need a database, but if you don't have one, you'd probably need to do a bit more work to analyze the collected data in HDFS. If you were going to be using MapReduce for analysis anyway, that's probably a non-issue

Re: Regarding Hadoop multi cluster set-up

2009-02-04 Thread Ian Soboroff
I would love to see someplace a complete list of the ports that the various Hadoop daemons expect to have open. Does anyone have that? Ian On Feb 4, 2009, at 1:16 PM, shefali pawar wrote: Hi, I will have to check. I can do that tomorrow in college. But if that is the case what should i

RE: Control over max map/reduce tasks per job

2009-02-04 Thread Jonathan Gray
I have filed an issue for this: https://issues.apache.org/jira/browse/HADOOP-5170 JG -Original Message- From: Bryan Duxbury [mailto:br...@rapleaf.com] Sent: Tuesday, February 03, 2009 10:59 PM To: core-user@hadoop.apache.org Subject: Re: Control over max map/reduce tasks per job

Re: HADOOP-2536 supports Oracle too?

2009-02-04 Thread Amandeep Khurana
Ok. I created the same database in a MySQL database and ran the same hadoop job against it. It worked. So, that means there is some Oracle-specific issue. It can't be an issue with the JDBC drivers since I am using the same drivers in a simple JDBC client. What could it be? Amandeep Amandeep

Bad connection to FS.

2009-02-04 Thread Mithila Nagendra
Hey all, When I try to copy a folder from the local file system into HDFS using the command hadoop dfs -copyFromLocal, the copy fails and it gives an error which says "Bad connection to FS". How do I get past this? The following is the output at the time of execution:

Re: Bad connection to FS.

2009-02-04 Thread TCK
Mithila, how come there is no NameNode java process listed by your jps command? I would check the hadoop namenode logs to see if there was some startup problem (the location of those logs would be specified in hadoop-env.sh, at least in the version I'm using). -TCK --- On Wed, 2/4/09,

Re: Bad connection to FS.

2009-02-04 Thread Amandeep Khurana
I faced the same issue a few days back. Formatting the namenode made it work for me. Amandeep Amandeep Khurana Computer Science Graduate Student University of California, Santa Cruz On Wed, Feb 4, 2009 at 3:06 PM, Mithila Nagendra mnage...@asu.edu wrote: Hey all When I try to copy a

Re: problem with completion notification from block movement

2009-02-04 Thread Raghu Angadi
Karl Kleinpaste wrote: On Sun, 2009-02-01 at 17:58 -0800, jason hadoop wrote: The Datanodes use multiple threads with locking, and one of the assumptions is that the block report (once per hour by default) takes little time. The datanode will pause while the block report is running, and if it

Re: Bad connection to FS.

2009-02-04 Thread TCK
I believe the debug logs location is still specified in hadoop-env.sh (I just read the 0.19.0 doc). I think you have to shut down all nodes first (stop-all), then format the namenode, and then restart (start-all) and make sure that NameNode comes up too. We are using a very old version,

Re: Bad connection to FS.

2009-02-04 Thread lohit
As noted by others, the NameNode is not running. Before formatting anything (which is like deleting your data), try to see why the NameNode isn't running. Search for the value of HADOOP_LOG_DIR in ./conf/hadoop-env.sh; if you have not set it explicitly, it would default to your hadoop

copying binary files to a SequenceFile

2009-02-04 Thread Mark Kerzner
Hi all, I am copying regular binary files to a SequenceFile, and I am using BytesWritable, to which I am giving all the byte[] content of the file. However, once it hits a file larger than my computer's memory, it may have problems. Is there a better way? Thank you, Mark
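A sketch of the approach described above, with made-up paths: each input file is read fully into memory and stored as a single BytesWritable value, which is exactly why a file larger than the heap becomes a problem.

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;

public class FilesToSequenceFile {
  public static void main(String[] args) throws IOException {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    Path out = new Path("/user/mark/files.seq");          // placeholder output path
    SequenceFile.Writer writer =
        SequenceFile.createWriter(fs, conf, out, Text.class, BytesWritable.class);
    try {
      for (FileStatus stat : fs.listStatus(new Path("/user/mark/input"))) {  // placeholder input dir
        int len = (int) stat.getLen();                    // whole file must fit in memory (and in an int)
        byte[] buf = new byte[len];
        FSDataInputStream in = fs.open(stat.getPath());
        try {
          in.readFully(0, buf);                           // pull the entire file into the buffer
        } finally {
          in.close();
        }
        // One record per file: key = file name, value = full file contents.
        writer.append(new Text(stat.getPath().getName()), new BytesWritable(buf));
      }
    } finally {
      writer.close();
    }
  }
}

One common workaround (not from this thread) is to split large files into fixed-size chunks and append each chunk as its own record, keyed by file name and offset, so that no single record has to fit in memory.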

Not able to copy a file to HDFS after installing

2009-02-04 Thread Rajshekar
Hello, I am new to Hadoop and I just installed it on Ubuntu 8.04 LTS as per the guidance of a web site. I tested it and found it working fine. I tried to copy a file but it is giving some error; please help me out. had...@excel-desktop:/usr/local/hadoop/hadoop-0.17.2.1$ bin/hadoop jar

Hadoop IO performance, prefetch etc

2009-02-04 Thread Songting Chen
Hi, Most of our map jobs are IO bound. However, for the same node, the IO throughput during the map phase is only 20% of its real sequential IO capability (we tested the sequential IO throughput with iozone). I think the reason is that, while each map has a sequential IO request, since there
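A rough back-of-the-envelope illustration of that reasoning (the numbers are assumptions, not measurements from this thread): suppose a disk sustains about 60 MB/s sequentially and each seek plus rotational delay costs about 10 ms. If several concurrent maps interleave 128 KB reads on the same disk, each read costs roughly 2 ms of transfer plus 10 ms of repositioning, giving about 128 KB per 12 ms, i.e. roughly 10 MB/s, which is under 20% of the sequential figure. Larger per-stream reads or read-ahead amortize the seek: at 4 MB per read the same arithmetic gives about 67 ms of transfer per 10 ms seek, or close to 90% of sequential throughput.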

Re: Not able to copy a file to HDFS after installing

2009-02-04 Thread Sagar Naik
Where is the namenode running? localhost or some other host? -Sagar Rajshekar wrote: Hello, I am new to Hadoop and I just installed it on Ubuntu 8.04 LTS as per the guidance of a web site. I tested it and found it working fine. I tried to copy a file but it is giving some error; please help me out