Re: Measuring IO time in map/reduce jobs?

2009-02-13 Thread jdd dhok
Hi, the Linux kernel provides delay accounting information to user space through a netlink socket. You can read more about it here: http://www.mjmwired.net/kernel/Documentation/accounting/taskstats.txt. I think there's a Python tool called iotop that uses this feature. Hope this helps. Regards,
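
As a related aside (not from the original message): similar per-process counters can also be read from /proc/<pid>/io on kernels with IO accounting enabled. A minimal Java sketch, with a made-up class name:

    import java.io.BufferedReader;
    import java.io.FileReader;
    import java.io.IOException;

    // Reads the per-process IO counters the kernel exposes in /proc/self/io
    // (fields such as read_bytes and write_bytes). This is a simpler relative
    // of the netlink taskstats interface mentioned above; Linux only.
    public class ProcIoSnapshot {
        public static void main(String[] args) throws IOException {
            BufferedReader in = new BufferedReader(new FileReader("/proc/self/io"));
            try {
                String line;
                while ((line = in.readLine()) != null) {
                    System.out.println("io-accounting " + line);  // e.g. "read_bytes: 12345"
                }
            } finally {
                in.close();
            }
        }
    }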

Namenode not listening for remote connections to port 9000

2009-02-13 Thread Michael Lynch
Hi, As far as I can tell I've followed the setup instructions for a hadoop cluster to the letter, but I find that the datanodes can't connect to the namenode on port 9000 because it is only listening for connections from localhost. In my case, the namenode is called centos1, and the datanode
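
A common cause for this symptom is that the namenode's hostname resolves to 127.0.0.1 (for example via /etc/hosts), so the server binds only to the loopback interface. A small diagnostic sketch in Java (class name and default hostname are illustrative):

    import java.net.InetAddress;

    // Prints every address a hostname resolves to. If the namenode host
    // (e.g. "centos1") maps to 127.0.0.1 here, remote datanodes will not be
    // able to reach port 9000.
    public class ResolveCheck {
        public static void main(String[] args) throws Exception {
            String host = args.length > 0 ? args[0] : "centos1";
            for (InetAddress a : InetAddress.getAllByName(host)) {
                System.out.println(host + " -> " + a.getHostAddress());
            }
        }
    }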

Pluggable JDBC schemas [Was: How to use DBInputFormat?]

2009-02-13 Thread Fredrik Hedberg
Hi, Please let us know how this works out. Also, it would be nice if people with experience with RDBMSs other than MySQL and Oracle could comment on the syntax and performance of their respective RDBMS with regard to Hadoop. Even if the syntax of the current SQL queries is valid for

Re: Namenode not listening for remote connections to port 9000

2009-02-13 Thread Steve Loughran
Michael Lynch wrote: Hi, As far as I can tell I've followed the setup instructions for a hadoop cluster to the letter, but I find that the datanodes can't connect to the namenode on port 9000 because it is only listening for connections from localhost. In my case, the namenode is called

Re: Namenode not listening for remote connections to port 9000

2009-02-13 Thread Norbert Burger
On Fri, Feb 13, 2009 at 8:37 AM, Steve Loughran ste...@apache.org wrote: Michael Lynch wrote: Hi, As far as I can tell I've followed the setup instructions for a hadoop cluster to the letter, but I find that the datanodes can't connect to the namenode on port 9000 because it is only

Limit number of records or total size in combiner input using jobconf?

2009-02-13 Thread Saptarshi Guha
Hello, Running a MR job on 7 machines failed when it came to processing 53GB. Browsing the errors, org.saptarshiguha.rhipe.GRMapreduce$GRCombiner.reduce(GRMapreduce.java:149) at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.combineAndSpill(MapTask.java:1106) at
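
As far as I know there is no direct "maximum combiner input" setting; the combiner runs over whatever is in the map-side buffer when it spills, so shrinking the buffer shrinks each combiner pass. A hedged sketch of the 0.18-era property names (values are examples only):

    import org.apache.hadoop.mapred.JobConf;

    // Example only: shrink the map-side sort buffer so each spill (and thus
    // each combiner invocation) covers fewer records. Best values depend on
    // the job and available heap.
    public class CombinerBufferSettings {
        public static void main(String[] args) {
            JobConf conf = new JobConf();
            conf.setInt("io.sort.mb", 50);              // total sort buffer, in MB
            conf.set("io.sort.spill.percent", "0.60");  // start spilling at 60% full
            System.out.println("io.sort.mb = " + conf.get("io.sort.mb"));
        }
    }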

Re: Namenode not listening for remote connections to port 9000

2009-02-13 Thread Mark Kerzner
I had a problem where it listened only on 8020, even though I told it to use 9000. On Fri, Feb 13, 2009 at 7:50 AM, Norbert Burger norbert.bur...@gmail.com wrote: On Fri, Feb 13, 2009 at 8:37 AM, Steve Loughran ste...@apache.org wrote: Michael Lynch wrote: Hi, As far as I can tell I've

Re: stable version

2009-02-13 Thread Steve Loughran
Anum Ali wrote: yes On Thu, Feb 12, 2009 at 4:33 PM, Steve Loughran ste...@apache.org wrote: Anum Ali wrote: I am working on Hadoop SVN version 0.21.0-dev. Having some problems regarding running its examples/file from Eclipse. It gives an error: Exception in thread main

Re: stable version

2009-02-13 Thread Anum Ali
This only occurs on Linux; on Windows it's fine. On Fri, Feb 13, 2009 at 7:11 AM, Steve Loughran ste...@apache.org wrote: Anum Ali wrote: yes On Thu, Feb 12, 2009 at 4:33 PM, Steve Loughran ste...@apache.org wrote: Anum Ali wrote: I am working on Hadoop SVN version 0.21.0-dev.

Re: stable version

2009-02-13 Thread Steve Loughran
Anum Ali wrote: This only occurs on Linux; on Windows it's fine. Do a java -version for me, and an ant -diagnostics, and stick both on the bug report https://issues.apache.org/jira/browse/HADOOP-5254 It may be that XInclude only went live in Java 1.6u5; I'm running a JRockit JVM which predates

Re: Too many open files in 0.18.3

2009-02-13 Thread Raghu Angadi
Sean, A few things in your messages are not clear to me. Currently this is what I make out of it: 1) with a 1k limit, you do see the problem; 2) with a 16 limit, it's not clear if you see the problem; 3) with 8k you don't see the problem; 3a) with or without the patch, I don't know. But if

Hadoop Write Performance

2009-02-13 Thread Xavier Stevens
Does anyone have an expected or experienced write speed to HDFS outside of Map/Reduce? Any recommendations on properties to tweak in hadoop-site.xml? Currently I have a multi-threaded writer where each thread is writing to a different file. But after a while I get this: java.io.IOException:
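
For a rough baseline outside Map/Reduce, a standalone write probe along these lines can be used (path and sizes are invented for illustration):

    import java.io.OutputStream;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    // Writes a fixed amount of data to a single HDFS file and reports MB/s.
    // A single-threaded sketch only; a multi-threaded writer would run one
    // of these per thread against different paths.
    public class HdfsWriteProbe {
        public static void main(String[] args) throws Exception {
            FileSystem fs = FileSystem.get(new Configuration());
            byte[] buf = new byte[64 * 1024];
            long bytes = 256L * 1024 * 1024;              // write 256 MB
            Path out = new Path("/tmp/write-probe");
            long start = System.currentTimeMillis();
            OutputStream os = fs.create(out, true);
            try {
                for (long written = 0; written < bytes; written += buf.length) {
                    os.write(buf);
                }
            } finally {
                os.close();
            }
            double secs = (System.currentTimeMillis() - start) / 1000.0;
            System.out.println((bytes / (1024.0 * 1024.0)) / secs + " MB/s");
        }
    }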

Re: Pluggable JDBC schemas [Was: How to use DBInputFormat?]

2009-02-13 Thread Edward Capriolo
One thing to mention is that 'limit' is not standard SQL. Microsoft SQL Server uses SELECT TOP 100 * FROM table. Some RDBMSs may not support any such syntax. To be more SQL-compliant you should use a column such as an auto-increment ID or a DATE as the offset. It is tricky to write anything truly database
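
To illustrate, a small hypothetical helper (not part of DBInputFormat) that builds a vendor-specific "next page" query keyed on an ID column:

    // All names here (PagingQueries, Vendor, buildPageQuery) are invented for
    // illustration. Pages are keyed off the last ID seen, which is portable
    // across databases, unlike LIMIT/OFFSET.
    public class PagingQueries {
        enum Vendor { MYSQL, POSTGRESQL, SQLSERVER }

        static String buildPageQuery(Vendor vendor, String table, String idColumn,
                                     long lastSeenId, int pageSize) {
            switch (vendor) {
                case MYSQL:
                case POSTGRESQL:
                    return "SELECT * FROM " + table + " WHERE " + idColumn + " > "
                        + lastSeenId + " ORDER BY " + idColumn + " LIMIT " + pageSize;
                case SQLSERVER:
                    return "SELECT TOP " + pageSize + " * FROM " + table + " WHERE "
                        + idColumn + " > " + lastSeenId + " ORDER BY " + idColumn;
                default:
                    throw new IllegalArgumentException("unknown vendor");
            }
        }
    }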

capacity scheduler for 0.18.x?

2009-02-13 Thread Bill Au
I see that there is a patch for the fair scheduler for 0.18.1 in HADOOP-3746. Does anyone know if there is a similar patch for the capacity scheduler? I did a search on JIRA but didn't find anything. Bill

Re: Too many open files in 0.18.3

2009-02-13 Thread Sean Knapp
Raghu, Apologies for the confusion. I was seeing the problem with any setting for dfs.datanode.max.xcievers... 1k, 2k and 8k. Likewise, I was also seeing the problem with different open file settings, all the way up to 32k. Since I installed the patch, HDFS has been performing much better. The
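
As an aside, one way to watch descriptor usage from inside the JVM while reproducing this, assuming a Sun/Oracle JVM on Unix (not something discussed in the thread):

    import java.lang.management.ManagementFactory;
    import java.lang.management.OperatingSystemMXBean;
    import com.sun.management.UnixOperatingSystemMXBean;

    // Prints how many file descriptors this JVM currently holds versus its
    // limit. On Sun/Oracle JVMs on Unix the OS MXBean is the Unix variant.
    public class FdCount {
        public static void main(String[] args) {
            OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
            if (os instanceof UnixOperatingSystemMXBean) {
                UnixOperatingSystemMXBean unix = (UnixOperatingSystemMXBean) os;
                System.out.println("open fds: " + unix.getOpenFileDescriptorCount()
                        + " / max: " + unix.getMaxFileDescriptorCount());
            }
        }
    }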

Re: Too many open files in 0.18.3

2009-02-13 Thread Raghu Angadi
Sean Knapp wrote: Raghu, Apologies for the confusion. I was seeing the problem with any setting for dfs.datanode.max.xcievers... 1k, 2k and 8k. Likewise, I was also seeing the problem with different open file settings, all the way up to 32k. Since I installed the patch, HDFS has been performing

Re: Too many open files in 0.18.3

2009-02-13 Thread Sean Knapp
Raghu, Great, thanks for the help. Regards, Sean 2009/2/13 Raghu Angadi rang...@yahoo-inc.com Sean Knapp wrote: Raghu, Apologies for the confusion. I was seeing the problem with any setting for dfs.datanode.max.xcievers... 1k, 2k and 8k. Likewise, I was also seeing the problem with

Running Map and Reduce Sequentially

2009-02-13 Thread Kris Jirapinyo
Is there a way to tell Hadoop not to run Map and Reduce concurrently? I'm running into a problem where I set the JVM to -Xmx768 and it seems like 2 mappers and 2 reducers are running on each machine that only has 1.7GB of RAM, so it complains of not being able to allocate memory... (which makes
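
For context, "2 mappers and 2 reducers per machine" matches the default per-tasktracker slot counts. A hedged sketch of the relevant 0.18-era properties, which normally belong in each node's hadoop-site.xml (values are examples only):

    import org.apache.hadoop.conf.Configuration;

    // Example values for a 1.7 GB machine: fewer concurrent task slots and a
    // smaller per-task heap. These are tasktracker-side settings, shown here
    // only to document the property names.
    public class SlotSettings {
        public static void main(String[] args) {
            Configuration conf = new Configuration();
            conf.setInt("mapred.tasktracker.map.tasks.maximum", 1);
            conf.setInt("mapred.tasktracker.reduce.tasks.maximum", 1);
            conf.set("mapred.child.java.opts", "-Xmx512m");  // per-task heap
            System.out.println(conf.get("mapred.child.java.opts"));
        }
    }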

datanode not being started

2009-02-13 Thread Sandy
Hello, I would really appreciate any help I can get on this! I've suddenly run into a very strange error. When I do bin/start-all I get: hadoop$ bin/start-all.sh starting namenode, logging to /Users/hadoop/hadoop-0.18.2/bin/../logs/hadoop-hadoop-namenode-loteria.cs.tamu.edu.out starting

Re: datanode not being started

2009-02-13 Thread james warren
Sandy - I suggest you take a look into your NameNode and DataNode logs. From the information posted, these likely would be at /Users/hadoop/hadoop-0.18.2/bin/../logs/hadoop-hadoop-namenode-loteria.cs.tamu.edu.log

Re: datanode not being started

2009-02-13 Thread Mithila Nagendra
Hey Sandy, I had a similar problem with Hadoop. All I did was stop all the daemons using stop-all.sh, then format the namenode again using hadoop namenode -format. After this I restarted everything using start-all.sh. I hope you don't have much data on the datanode,

Re: Running Map and Reduce Sequentially

2009-02-13 Thread Rasit OZDAS
Kris, This is the case when you have only 1 reducer. If it doesn't have any side effects for you... Rasit 2009/2/14 Kris Jirapinyo kjirapi...@biz360.com: Is there a way to tell Hadoop to not run Map and Reduce concurrently? I'm running into a problem where I set the JVM to -Xmx768 and it seems

Re: Running Map and Reduce Sequentially

2009-02-13 Thread Kris Jirapinyo
What do you mean when I have only 1 reducer? On Fri, Feb 13, 2009 at 4:11 PM, Rasit OZDAS rasitoz...@gmail.com wrote: Kris, This is the case when you have only 1 reducer. If it doesn't have any side effects for you.. Rasit 2009/2/14 Kris Jirapinyo kjirapi...@biz360.com: Is there a way

Re: Hadoop setup questions

2009-02-13 Thread Rasit OZDAS
I agree with Amar and James. If you require permissions for your project, then: 1. create a group in Linux for your user; 2. give the group write access to all files in HDFS (hadoop dfs -chmod -R g+w / or something like that, I'm not totally sure); 3. change group ownership of all files in HDFS (hadoop dfs
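
For reference, a rough programmatic equivalent of the shell commands above (single path only, not recursive; the group name is a placeholder):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.fs.permission.FsPermission;

    // Grants group write on one HDFS path and changes its group ownership.
    // Unlike "hadoop dfs -chmod -R", this does not recurse into directories.
    public class GrantGroupWrite {
        public static void main(String[] args) throws Exception {
            FileSystem fs = FileSystem.get(new Configuration());
            Path path = new Path(args.length > 0 ? args[0] : "/");
            fs.setPermission(path, new FsPermission((short) 0775));  // rwxrwxr-x
            fs.setOwner(path, null, "mygroup");  // null user = leave owner unchanged
        }
    }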

Re: Hadoop setup questions

2009-02-13 Thread Rasit OZDAS
With this configuration, any user having that group name will be able to write to any location. (I've only tried this on a local network, though.) 2009/2/14 Rasit OZDAS rasitoz...@gmail.com: I agree with Amar and James. If you require permissions for your project, then 1. create a group in Linux

Re: Running Map and Reduce Sequentially

2009-02-13 Thread Amandeep Khurana
Have only one instance of the reduce task. This will run once your map tasks are completed. You can set this in your job conf by using conf.setNumReduceTasks(1). Amandeep Khurana Computer Science Graduate Student University of California, Santa Cruz 2009/2/13 Kris Jirapinyo
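
A minimal sketch of that suggestion against the old (0.18/0.19) mapred API:

    import org.apache.hadoop.mapred.JobConf;

    // Configure the job to use a single reduce task. The reduce function only
    // runs after all maps have completed (the reduce task's shuffle phase may
    // start earlier), so the heavy reduce work is effectively sequential.
    public class SingleReducerJob {
        public static void main(String[] args) {
            JobConf conf = new JobConf();
            conf.setNumReduceTasks(1);
            System.out.println("mapred.reduce.tasks = " + conf.get("mapred.reduce.tasks"));
        }
    }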

Re: Running Map and Reduce Sequentially

2009-02-13 Thread Kris Jirapinyo
I can't afford to have only one reducer as my dataset is huge... right now it is 50GB, so the output.collect() in the reducer will surely run out of Java heap space. 2009/2/13 Amandeep Khurana ama...@gmail.com Have only one instance of the reduce task. This will run once your map tasks are

Re: Running Map and Reduce Sequentially

2009-02-13 Thread Amandeep Khurana
What you can probably do is have the combine function do some reducing before the single reducer starts off. That might help. Amandeep Khurana Computer Science Graduate Student University of California, Santa Cruz 2009/2/13 Kris Jirapinyo kris.jirapi...@biz360.com I can't afford to have only
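
A hedged sketch of wiring in a combiner with the old mapred API; LongSumReducer is only a stand-in for whatever partial aggregation the job's own logic allows:

    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.lib.LongSumReducer;

    // Run a combiner on the map side so output is pre-aggregated before the
    // single reducer sees it. Only valid if the reduce logic is associative
    // and commutative.
    public class CombinerWiring {
        public static void main(String[] args) {
            JobConf conf = new JobConf();
            conf.setCombinerClass(LongSumReducer.class);
            conf.setNumReduceTasks(1);
        }
    }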

Re: Running Map and Reduce Sequentially

2009-02-13 Thread Kris Jirapinyo
Thanks for the recommendation, haven't really looked into how the combiner might be able to help. Now, are there any downsides to having one 50GB file as an output? If I understand correctly, the number of reducers you set for your job is the number of files you will get as output. 2009/2/13

Re: Running Map and Reduce Sequentially

2009-02-13 Thread Amandeep Khurana
Yes, number of output files = number of reducers. There is no downside to having a 50GB file; that really isn't too much data. Of course, multiple reducers would be much faster, but since you want a sequential run, having a single reducer is the only option I am aware of. You could consider

JvmMetrics

2009-02-13 Thread David Alves
Hi, I ran into a use case where I need to keep two contexts for metrics, one being Ganglia and the other being a file context (for offline metrics analysis). I altered JvmMetrics to allow the user to supply a context instead of it getting one by name, and altered the file context for it

Re: stable version

2009-02-13 Thread Anum Ali
The parser problem is related to jar files and can be resolved; it is not a bug. Forwarding a link to its solution: http://www.jroller.com/navanee/entry/unsupportedoperationexception_this_parser_does_not On 2/13/09, Steve Loughran ste...@apache.org wrote: Anum Ali wrote: This only occurs on Linux,