Running hadoop jobs from a client and tuning (was Re: How does hadoop deal with hadoop-site.xml?)

2009-08-20 Thread stephen mulcahy
Should this be changed normally? If so, how large should it normally be? 50% of total system memory? Thanks for any input, -stephen -- Stephen Mulcahy, DI2, Digital Enterprise Research Institute, NUI Galway, IDA Business Park, Lower Dangan, Galway, Ireland http://di2.deri.iehttp

Re: Running hadoop jobs from a client and tuning (was Re: How does hadoop deal with hadoop-site.xml?)

2009-08-21 Thread stephen mulcahy
, typically set according to the cores and memory each map/reduce task will be using. Right, so as an admin, these are probably the more interesting ones to worry about. Also, typically client and datanodes will be the same. Given my comments above, is this correct? Thanks, -stephen
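The point above about sizing task slots by cores and memory can be made concrete with a little arithmetic. Below is a minimal sketch, not a Hadoop-provided formula. It assumes roughly one task per core, a hypothetical amount of memory held back for the daemons and OS, and about one third of slots going to reduces, and it emits candidate values for the Hadoop 0.20 properties mapred.tasktracker.map.tasks.maximum, mapred.tasktracker.reduce.tasks.maximum and mapred.child.java.opts. The function name and all defaults are illustrative assumptions.

```python
def suggest_slots(cores, ram_mb, child_heap_mb, reserved_mb=2048):
    """Hypothetical heuristic for per-TaskTracker slot counts.

    cores         -- CPU cores on the node (assume roughly one task per core)
    ram_mb        -- total RAM on the node, in MB
    child_heap_mb -- heap per task JVM, i.e. the -Xmx in mapred.child.java.opts
    reserved_mb   -- assumed memory kept back for DataNode/TaskTracker daemons and the OS
    """
    usable_mb = max(ram_mb - reserved_mb, 0)
    by_memory = usable_mb // child_heap_mb  # how many task JVMs fit in usable RAM
    by_cpu = cores                          # assumption: one task per core
    # Never go below one map and one reduce slot, even if that oversubscribes memory.
    total = max(min(by_memory, by_cpu), 2)
    reduces = max(total // 3, 1)            # assumption: about a third of slots for reduces
    maps = total - reduces
    return {
        "mapred.tasktracker.map.tasks.maximum": maps,
        "mapred.tasktracker.reduce.tasks.maximum": reduces,
        "mapred.child.java.opts": f"-Xmx{child_heap_mb}m",
    }


if __name__ == "__main__":
    for name, value in suggest_slots(cores=8, ram_mb=16384, child_heap_mb=1024).items():
        print(f"{name} = {value}")
```

With 8 cores, 16 GB of RAM and 1 GB task heaps this suggests 6 map and 2 reduce slots (CPU-bound); with 2 GB heaps on the same box, memory becomes the limit and it suggests 5 map and 2 reduce slots.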

Hadoop 0.20.1 (a big secret?)

2009-09-15 Thread stephen mulcahy
, apologies for publicising it ;)? Thanks, -stephen

Re: Advice on new Datacenter Hadoop Cluster?

2009-09-30 Thread stephen mulcahy
on running things like HBase on the cluster also (which we do). -stephen

Re: Eclipse plugin in Hadoop 0.20.1

2009-10-05 Thread stephen mulcahy
as distributed doesn't currently build against Eclipse 3.5 (and possibly 3.4), but if you apply the patch from http://issues.apache.org/jira/browse/HADOOP-3744 it builds and works fine. -stephen

Re: Recommended file-system for DataNode

2009-10-09 Thread stephen mulcahy
errors when you try to write to that disk (despite df showing you loads of free space), so again, I'm not sure I'd recommend this one. -stephen

Slowdown with Hadoop Sort benchmark when using Jumbo frames?

2010-01-22 Thread stephen mulcahy
testing of Hadoop with Jumbo frames? If so, have you seen similar results or is this a characteristic of my systems/network? Is there an obvious reason why a larger MTU would result in a slowdown in Hadoop? Thanks for your thoughts, -stephen [1] http://wiki.apache.org/hadoop/Sort

Re: Slowdown with Hadoop Sort benchmark when using Jumbo frames?

2010-01-26 Thread stephen mulcahy
increase in the overall bandwidth when moving from an MTU of 1500 to an MTU of 9000. Has anyone else tested Hadoop performance with Jumbo frames? Are you seeing something different to what we're seeing? Thanks, -stephen
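For context on why a larger MTU is usually expected to help, here is a back-of-envelope sketch of the best-case TCP payload efficiency per full-sized Ethernet frame at MTU 1500 versus 9000. It assumes 40 bytes of IPv4 plus TCP headers with no TCP options, 38 bytes of per-frame Ethernet overhead (preamble and start delimiter 8, MAC header 14, FCS 4, inter-frame gap 12) and a gigabit link. It models wire efficiency only, not host-side costs such as buffer allocation or garbage collection.

```python
IP_TCP_HEADERS = 40     # IPv4 (20) + TCP (20) bytes, assuming no IP or TCP options
ETHERNET_OVERHEAD = 38  # preamble+SFD 8, MAC header 14, FCS 4, inter-frame gap 12


def payload_efficiency(mtu):
    """Fraction of wire time that carries TCP payload for one full-sized frame."""
    payload = mtu - IP_TCP_HEADERS
    on_wire = mtu + ETHERNET_OVERHEAD
    return payload / on_wire


def max_goodput_mbps(mtu, link_mbps=1000):
    """Best-case TCP goodput in Mbit/s; link_mbps=1000 assumes gigabit Ethernet."""
    return link_mbps * payload_efficiency(mtu)


if __name__ == "__main__":
    for mtu in (1500, 9000):
        print(f"MTU {mtu}: {payload_efficiency(mtu):.2%} efficient, "
              f"~{max_goodput_mbps(mtu):.0f} Mbit/s on gigabit")
```

On these assumptions the ceiling moves from roughly 949 to 991 Mbit/s, a gain of about 4%, so on the wire alone jumbo frames offer only modest headroom; most of their usual benefit comes from fewer packets (and interrupts) per byte on the hosts.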

Re: Slowdown with Hadoop Sort benchmark when using Jumbo frames?

2010-01-28 Thread stephen mulcahy
buffer allocation costs and garbage collection. I guess there must be something going on anyway, as there is clearly a performance drop-off, which surprised me since lots of apps benefit significantly from jumbo frames. Thanks for the feedback, -stephen

Re: Slowdown with Hadoop Sort benchmark when using Jumbo frames?

2010-01-29 Thread stephen mulcahy

Re: [RFH][Announce] hadoop on its way into Debian

2010-02-02 Thread stephen mulcahy
on Sun Java since that's the only Java recommended by the Hadoop team? -stephen

Re: [RFH][Announce] hadoop on its way into Debian

2010-02-03 Thread stephen mulcahy
If there would be a hard dependency on sun-java, then hadoop could not enter the main repository of Debian, since sun-java is not free. This makes sense; thanks for your efforts on this. -stephen

Network problems Hadoop 0.20.2 and Terasort on Debian 2.6.32 kernel

2010-04-08 Thread stephen mulcahy
welcome (including comments on what distro/kernel combinations others are using). Thanks, -stephen [1] http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=556030 [2] http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=572201

Re: Network problems Hadoop 0.20.2 and Terasort on Debian 2.6.32 kernel

2010-04-09 Thread stephen mulcahy
Allen Wittenauer wrote: On Apr 8, 2010, at 9:37 AM, stephen mulcahy wrote: When I run this on the Debian 2.6.32 kernel - over the course of the run, 1 or 2 datanodes of the cluster enter a state whereby they are no longer responsive to network traffic. How much free memory do you have

Re: Network problems Hadoop 0.20.2 and Terasort on Debian 2.6.32 kernel

2010-04-13 Thread stephen mulcahy
in our cluster was a good opportunity to give back to the community and do some testing on their behalf. With regard to our TeraSort benchmark time of ~23 minutes, is that in the right ballpark for a cluster of 45 datanodes plus a namenode (nn) and a secondary namenode (2nn)? Thanks, -stephen
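One way to compare a TeraSort time against other clusters is to normalise it to throughput. The sketch below assumes the standard 1 TB TeraSort input (10^10 rows of 100 bytes, i.e. 10^12 bytes, as produced by TeraGen); the snippet does not state the data size, so treat that as an assumption. It also assumes the work is spread evenly over the 45 datanodes and uses MB = 10^6 bytes.

```python
# Assumed input size: 10^10 TeraGen rows x 100 bytes = 10^12 bytes (1 TB).
TERASORT_BYTES = 10**10 * 100


def terasort_throughput(minutes, datanodes, total_bytes=TERASORT_BYTES):
    """Return (cluster MB/s, per-datanode MB/s) for one TeraSort run.

    Assumes every datanode does an equal share of the work.
    """
    seconds = minutes * 60
    cluster_mb_s = total_bytes / seconds / 1e6
    return cluster_mb_s, cluster_mb_s / datanodes


if __name__ == "__main__":
    cluster, per_node = terasort_throughput(minutes=23, datanodes=45)
    print(f"~{cluster:.0f} MB/s across the cluster, ~{per_node:.1f} MB/s per datanode")
```

For ~23 minutes on 45 datanodes this works out to roughly 725 MB/s aggregate, or about 16 MB/s sorted per datanode, which is a size-independent figure that is easier to compare across clusters of different sizes.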

Re: Network problems Hadoop 0.20.2 and Terasort on Debian 2.6.32 kernel

2010-04-15 Thread stephen mulcahy
to get an idea of what cluster configs work well and for people who want to sanity check a new cluster. -stephen

Hadoop performance - xfs and ext4

2010-04-22 Thread stephen mulcahy
back to EXT4, but thought the information might be useful/interesting to others. -stephen XFS config chosen from notes at http://everything2.com/index.pl?node_id=1479435

Re: Hadoop performance - xfs and ext4

2010-04-23 Thread stephen mulcahy
with noatime) gives me a cluster which runs TeraSort in about 22.5 minutes. So ext4 looks like the winner from a performance perspective, at least for running TeraSort on my cluster with its specific configuration. -stephen

Re: Hadoop performance - xfs and ext4

2010-04-23 Thread stephen mulcahy
though, what's the best way of addressing that? Do my Apache credentials work for the wiki, or do I need to create a separate account for the Hadoop wiki? -stephen

Re: Hadoop performance - xfs and ext4

2010-05-11 Thread stephen mulcahy
it runs SLOWER with those options, by 5-8%. TeraGen itself seemed to run about 5% faster, but that was only a single run, so I'm not sure how reliable that result is. hth, -stephen
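The caveat about a single TeraGen run is worth acting on: repeat each configuration a few times and compare the difference in means with the run-to-run spread. A minimal sketch using only the Python standard library; the `clearly_different` rule of thumb is an assumption (not a formal statistical test), and the timings in the example are made-up placeholders, not measurements from this thread.

```python
from statistics import mean, stdev


def percent_change(baseline_secs, candidate_secs):
    """Relative change in mean runtime, in percent; positive means the candidate is slower."""
    base, cand = mean(baseline_secs), mean(candidate_secs)
    return (cand - base) / base * 100


def clearly_different(a, b, k=2.0):
    """Crude rule of thumb (an assumption, not a significance test): treat the
    difference in means as real only if it exceeds k times the larger sample
    standard deviation. Needs at least two runs per configuration."""
    return abs(mean(a) - mean(b)) > k * max(stdev(a), stdev(b))


if __name__ == "__main__":
    # Hypothetical placeholder timings in seconds, NOT real results.
    default_opts = [1350, 1362, 1341]
    tuned_opts = [1430, 1445, 1418]
    print(f"change: {percent_change(default_opts, tuned_opts):+.1f}%, "
          f"clearly different: {clearly_different(default_opts, tuned_opts)}")
```

Three runs per configuration is a practical minimum for `stdev`; with a single run there is no way to tell a 5% difference from ordinary variation.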

hadoop 0.20.205.0 multi-user cluster

2011-11-09 Thread stephen mulcahy
: org.apache.hadoop.security.AccessControlException: Permission denied: user=smulcahy, access=EXECUTE, inode=system:hadoop:supergroup:rwx-- . Thanks, -stephen

Re: hadoop 0.20.205.0 multi-user cluster

2011-11-14 Thread stephen mulcahy
The users are remote users. Do I need to create accounts for them on the Hadoop cluster so that I can add them to the hadoop group, or how should this work? Thanks, -stephen

Re: hadoop 0.20.205.0 multi-user cluster

2011-11-14 Thread stephen mulcahy
On 14/11/11 09:38, stephen mulcahy wrote: Hi Raj, Thanks for your reply, comments below. On 09/11/11 18:45, Raj V wrote: Can you try the following? - Change the permission to 775 for /hadoop/mapred/system. As per the previous problem, the permissions still get reset on cluster restart. Am
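The AccessControlException earlier in this thread comes from HDFS's POSIX-like permission model: the owner bits apply if the user owns the inode, otherwise the group bits apply if the user is in the inode's group, otherwise the "other" bits apply, and the superuser bypasses the check. The sketch below models that decision in plain Python to show why a system directory owned by hadoop:supergroup refuses EXECUTE (directory traversal) to another user under a mode such as 0700, but allows it under the 775 suggested above. It is an illustration only: the function and the example user and group names are hypothetical, and 0700 is an assumed full form of the truncated mode in the log.

```python
READ, WRITE, EXECUTE = 4, 2, 1  # permission bits within one owner/group/other triplet


def is_permitted(user, user_groups, owner, group, mode, access, superuser=None):
    """Model of an owner/group/other permission check on a single inode.

    mode   -- octal int such as 0o700 or 0o775
    access -- READ, WRITE or EXECUTE
    superuser -- assumption: in HDFS this is the user that started the NameNode
    """
    if superuser is not None and user == superuser:
        return True  # the superuser bypasses permission checks
    if user == owner:
        bits = (mode >> 6) & 0o7  # owner triplet only, even if group/other are looser
    elif group in user_groups:
        bits = (mode >> 3) & 0o7  # group triplet
    else:
        bits = mode & 0o7         # "other" triplet
    return (bits & access) == access


if __name__ == "__main__":
    for mode in (0o700, 0o775):
        ok = is_permitted("smulcahy", {"users"}, "hadoop", "supergroup", mode, EXECUTE)
        print(f"mode {mode:o}: EXECUTE for smulcahy -> {ok}")
```

Note that 775 grants traversal (r-x) to everyone, but WRITE only to the owner and members of the directory's group, so a job submitter who needs to create files there must be the owner or in that group.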

Re: hadoop 0.20.205.0 multi-user cluster

2011-11-14 Thread stephen mulcahy
reboot). -stephen

native libraries problem in 0.20.205

2011-11-16 Thread stephen mulcahy
the native libs if I comment this out, but I'm just wondering why it would prefer ${HADOOP_PREFIX}/lib/libhadoop.a? Thanks, -stephen

Re: hadoop 0.20.205.0 multi-user cluster

2011-11-16 Thread stephen mulcahy
permissions on hdfs://test/hadoop/mapred/system, setting it to rwx--. Is there a reason for this policy? And how does that fit with multi-user Hadoop? -stephen

Re: hadoop 0.20.205.0 multi-user cluster

2011-11-16 Thread stephen mulcahy
On 16/11/11 14:07, stephen mulcahy wrote: On 14/11/11 20:46, Raj V wrote: Hi Stephen, This is probably happening during jobtracker start. Can you provide any relevant logs from the task tracker log file? You are correct, there is even a helpful message 2011-11-16 15:05:58,076 WARN

Re: native libraries problem in 0.20.205

2011-11-16 Thread stephen mulcahy
with both 32-bit and 64-bit libs in the native dirs these days) - I've tested that on my local cluster and it seems to work with some example jobs (unless they were put into lib/ for some other specific scenario). -stephen

Re: hadoop 0.20.205.0 multi-user cluster

2011-11-16 Thread stephen mulcahy
On 16/11/11 14:52, stephen mulcahy wrote: So, digging further, Hadoop seems to want to create a file mapred.system.dir/<job id>/jobToken for each job I submit. I assume this file is related to the new security stuff. Can I disable this activity until I require the security functionality