Should this normally be changed? If so, how large should it typically be?
50% of total system memory?
Thanks for any input,
-stephen
--
Stephen Mulcahy, DI2, Digital Enterprise Research Institute,
NUI Galway, IDA Business Park, Lower Dangan, Galway, Ireland
http://di2.deri.ie
, typically set according to the cores and memory each map/reduce task
will be using.
Right, so as an admin, these are probably the more interesting ones to
worry about.
Also, the client and datanode settings will typically be the same.
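For what it's worth, the sort of thing I mean is below - a sketch of the
task-slot and child-heap settings as I understand them (assuming those are
the parameters in question; the values are illustrative for an 8-core/16GB
node, not recommendations):

  <!-- mapred-site.xml (values illustrative, not recommendations) -->
  <property>
    <name>mapred.tasktracker.map.tasks.maximum</name>
    <value>6</value>        <!-- roughly cores, minus headroom for the DN/TT daemons -->
  </property>
  <property>
    <name>mapred.tasktracker.reduce.tasks.maximum</name>
    <value>2</value>
  </property>
  <property>
    <name>mapred.child.java.opts</name>
    <value>-Xmx1024m</value>   <!-- per-task heap; slots x heap must fit in physical RAM -->
  </property>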
Given my comments above, is this correct?
Thanks,
-stephen
--
Stephen Mulcahy
, apologies for publicising it ;)?
Thanks,
-stephen
--
Stephen Mulcahy, DI2, Digital Enterprise Research Institute,
NUI Galway, IDA Business Park, Lower Dangan, Galway, Ireland
http://di2.deri.ie http://webstar.deri.ie http://sindice.com
on running things like HBase on the cluster also (which we do).
-stephen
--
Stephen Mulcahy, DI2, Digital Enterprise Research Institute,
NUI Galway, IDA Business Park, Lower Dangan, Galway, Ireland
http://di2.deri.ie http://webstar.deri.ie http://sindice.com
as distributed doesn't currently build against
Eclipse 3.5 (and possibly 3.4) - but if you apply the patch
from http://issues.apache.org/jira/browse/HADOOP-3744 it builds and works
fine.
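In case it saves anyone a few minutes, the steps are roughly as follows
(a sketch - grab whichever patch attachment on that JIRA matches your
version, and use -p1 if the paths don't line up):

  cd /path/to/hadoop-source          # your checkout of the Hadoop tree
  patch -p0 < HADOOP-3744.patch      # patch downloaded from the JIRA above
  ant clean compile                  # rebuild with your usual targets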
-stephen
--
Stephen Mulcahy, DI2, Digital Enterprise Research Institute,
NUI Galway, IDA
errors when you try to write to that disk
(despite df showing you loads of free space), so again, I'm not sure I'd
recommend this one.
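(If anyone hits that symptom, the first things I'd check - a sketch, and
your cause may well differ; device/mount names are illustrative:

  df -i /data1                              # inode exhaustion: IUse% at 100 despite free blocks
  dmesg | tail                              # kernel-level I/O errors from a failing disk
  tune2fs -l /dev/sdb1 | grep -i reserved   # ext2/3/4 blocks reserved for root )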
-stephen
--
Stephen Mulcahy, DI2, Digital Enterprise Research Institute,
NUI Galway, IDA Business Park, Lower Dangan, Galway, Ireland
http://di2.deri.ie
testing of Hadoop with Jumbo frames? If so,
have you seen similar results or is this a characteristic of my
systems/network? Is there an obvious reason why a larger MTU would
result in a slowdown in Hadoop?
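For anyone reproducing the test, this is roughly how we set and
sanity-checked the MTU on each node (a sketch - interface and host names
are illustrative):

  ip link set eth0 mtu 9000            # or: ifconfig eth0 mtu 9000
  ping -M do -s 8972 -c 3 datanode01   # 8972 = 9000 - 28 bytes IP/ICMP headers;
                                       # fails if any hop won't pass jumbo frames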
Thanks for your thoughts,
-stephen
[1] http://wiki.apache.org/hadoop/Sort
--
Stephen
increase in the overall bandwidth when moving from an MTU of 1500 to an
MTU of 9000.
Has anyone else tested Hadoop performance with Jumbo frames? Are you
seeing something different to what we're seeing?
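(The raw-bandwidth comparison can be reproduced with something like iperf
between a pair of nodes - a sketch, hostname illustrative:

  iperf -s                        # on the receiving node
  iperf -c datanode01 -t 30       # on the sender; run once at MTU 1500, once at 9000 )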
Thanks,
-stephen
--
Stephen Mulcahy, DI2, Digital Enterprise Research Institute,
NUI Galway
buffer allocation costs and garbage collection.
I guess there must be something going on anyway - there is clearly a
performance drop-off, which surprised me, as lots of apps benefit
significantly from jumbo frames.
Thanks for the feedback,
-stephen
--
Stephen Mulcahy, DI2, Digital
--
Stephen Mulcahy, DI2, Digital Enterprise Research Institute,
NUI Galway, IDA Business Park, Lower Dangan, Galway, Ireland
http://di2.deri.ie http://webstar.deri.ie http://sindice.com
on Sun Java since that's the only Java recommended by
the Hadoop team?
-stephen
--
Stephen Mulcahy, DI2, Digital Enterprise Research Institute,
NUI Galway, IDA Business Park, Lower Dangan, Galway, Ireland
http://di2.deri.ie http://webstar.deri.ie http://sindice.com
If there were a hard dependency on sun-java, then Hadoop could not enter
the main Debian repository, since sun-java is not free.
This makes sense - thanks for your efforts on this.
-stephen
--
Stephen Mulcahy, DI2, Digital Enterprise Research Institute,
NUI Galway, IDA Business Park
welcome (including comments on what distro/kernel
combinations others are using).
Thanks,
-stephen
[1] http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=556030
[2] http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=572201
--
Stephen Mulcahy, DI2, Digital Enterprise Research Institute,
NUI Galway
Allen Wittenauer wrote:
On Apr 8, 2010, at 9:37 AM, stephen mulcahy wrote:
When I run this on the Debian 2.6.32 kernel - over the course of the run, 1 or
2 datanodes of the cluster enter a state whereby they are no longer responsive
to network traffic.
How much free memory do you have
in our cluster was a good opportunity to give
back to the community and do some testing on their behalf.
With regard to our TeraSort benchmark time of ~23 minutes - is that in
the right ballpark for a cluster of 45 datanodes plus a namenode and a
secondary namenode?
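For reference, the benchmark is just the stock examples jar, run along
these lines (a sketch - jar name and paths vary by version/setup):

  hadoop jar hadoop-*-examples.jar teragen 10000000000 /terasort-in    # 10^10 x 100-byte rows = 1TB
  hadoop jar hadoop-*-examples.jar terasort /terasort-in /terasort-out
  hadoop jar hadoop-*-examples.jar teravalidate /terasort-out /terasort-report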
Thanks,
-stephen
--
Stephen Mulcahy, DI2
to get an idea of what cluster configs work well and
for people who want to sanity check a new cluster.
-stephen
--
Stephen Mulcahy, DI2, Digital Enterprise Research Institute,
NUI Galway, IDA Business Park, Lower Dangan, Galway, Ireland
http://di2.deri.ie http://webstar.deri.ie
back to EXT4, but thought the information
might be useful/interesting to others.
-stephen
XFS config chosen from notes at
http://everything2.com/index.pl?node_id=1479435
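Roughly the shape of it, for anyone who doesn't want to chase the link
(a sketch - exact values per those notes, device/mount names illustrative):

  mkfs.xfs -f -l size=128m /dev/sdb1            # larger log than the default
  mount -o noatime,logbufs=8 /dev/sdb1 /data1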
--
Stephen Mulcahy, DI2, Digital Enterprise Research Institute,
NUI Galway, IDA Business Park, Lower Dangan, Galway
with noatime)
gives me a cluster which runs TeraSort in about 22.5 minutes.
So ext4 looks like the winner, from a performance perspective, at least
for running the TeraSort on my cluster with its specific configuration.
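(Concretely, "ext4 with noatime" just means the data disks are mounted
along these lines in /etc/fstab - device and mount point illustrative:

  /dev/sdb1  /data1  ext4  defaults,noatime  0  2 )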
-stephen
--
Stephen Mulcahy, DI2, Digital Enterprise Research Institute,
NUI
though - what's the best way
of addressing that?
Do my apache credentials work for the wiki or do I need to explicitly
have a new account for the hadoop wiki?
-stephen
--
Stephen Mulcahy, DI2, Digital Enterprise Research Institute,
NUI Galway, IDA Business Park, Lower Dangan, Galway, Ireland
it
runs SLOWER with those options, by between 5 and 8%. The TeraGen itself
seemed to run about 5% faster, but that was only a single run so I'm not
sure how reliable it is.
hth,
-stephen
--
Stephen Mulcahy, DI2, Digital Enterprise Research Institute,
NUI Galway, IDA Business Park, Lower Dangan, Galway
:
org.apache.hadoop.security.AccessControlException: Permission denied:
user=smulcahy, access=EXECUTE, inode=system:hadoop:supergroup:rwx------
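For reference, I can inspect (and, as the hdfs superuser, change) the
permissions in question along these lines - a sketch; "hadoop" is the
superuser on our cluster, and per the rest of this thread the jobtracker
resets the mode on restart, so a plain chmod isn't a lasting fix:

  hadoop fs -ls /hadoop/mapred                                # owner/group/mode of 'system'
  sudo -u hadoop hadoop fs -chmod 755 /hadoop/mapred/system   # as the HDFS superuser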
Thanks,
-stephen
--
Stephen Mulcahy, DI2, Digital Enterprise Research Institute,
NUI Galway, IDA Business Park, Lower Dangan, Galway, Ireland
The users are remote users. Do I need to create accounts on the hadoop
cluster for those users in order to add them to the hadoop group, or how
should this work?
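I'm imagining something along these lines on the cluster side, but I'm
not sure it's right (a sketch - username illustrative):

  sudo useradd -m alice                         # local account, if one is needed at all
  sudo usermod -a -G hadoop alice               # add to the hadoop group
  sudo -u hadoop hadoop fs -mkdir /user/alice   # HDFS home dir, owned by the user
  sudo -u hadoop hadoop fs -chown alice /user/alice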
Thanks,
-stephen
--
Stephen Mulcahy, DI2, Digital Enterprise Research Institute,
NUI Galway, IDA Business Park, Lower Dangan, Galway, Ireland
On 14/11/11 09:38, stephen mulcahy wrote:
Hi Raj,
Thanks for your reply, comments below.
On 09/11/11 18:45, Raj V wrote:
Can you try the following?
- Change the permission to 775 for /hadoop/mapred/system
As per the previous problem, the permissions still get reset on cluster
restart.
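For clarity, what I'm running (a sketch - "hadoop" is the superuser on
our cluster):

  sudo -u hadoop hadoop fs -chmod 775 /hadoop/mapred/system
  hadoop fs -ls /hadoop/mapred    # shows rwxrwxr-x ... until the next restart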
Am
reboot).
-stephen
--
Stephen Mulcahy, DI2, Digital Enterprise Research Institute,
NUI Galway, IDA Business Park, Lower Dangan, Galway, Ireland
http://di2.deri.ie http://webstar.deri.ie http://sindice.com
the native libs if
I comment this out, but I'm just wondering why it would prefer
${HADOOP_PREFIX}/lib/libhadoop.a?
Thanks,
-stephen
--
Stephen Mulcahy, DI2, Digital Enterprise Research Institute,
NUI Galway, IDA Business Park, Lower Dangan, Galway, Ireland
http://di2.deri.ie http://webstar.deri.ie
permissions on hdfs://test/hadoop/mapred/system. Setting
it to rwx------
Is there a reason for this policy? And how does that fit with multi-user
hadoop?
-stephen
--
Stephen Mulcahy, DI2, Digital Enterprise Research Institute,
NUI Galway, IDA Business Park, Lower Dangan, Galway, Ireland
On 16/11/11 14:07, stephen mulcahy wrote:
On 14/11/11 20:46, Raj V wrote:
Hi Stephen
This is probably happening during jobtracker start. Can you provide
any relevant logs from the tasktracker log file?
You are correct - there is even a helpful message:
2011-11-16 15:05:58,076 WARN
with both
32-bit and 64-bit libs in the native dirs these days) - I've tested that
on my local cluster and it seems to work with some example jobs (unless
they were put into lib/ for some other specific scenario).
-stephen
--
Stephen Mulcahy, DI2, Digital Enterprise Research Institute,
NUI
On 16/11/11 14:52, stephen mulcahy wrote:
So, digging further - hadoop seems to want to create a file
mapred.system.dir/<job id>/jobToken
for each job I submit.
I assume this file is related to the new security stuff. Can I disable
this activity until I require the security functionality?
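If it is the security machinery, I'm guessing the relevant knob is the
authentication mode in core-site.xml - a sketch, and I may well be wrong
that it controls the jobToken behaviour:

  <property>
    <name>hadoop.security.authentication</name>
    <value>simple</value>   <!-- the default; 'kerberos' enables the full security stack -->
  </property>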