Hi all,
Can anybody help me configure SWIM -- the Statistical Workload Injector for
MapReduce -- on my Hadoop cluster?
On Wed, Feb 29, 2012 at 11:34 PM, W.P. McNeill bill...@gmail.com wrote:
I can perform HDFS operations from the command line, like hadoop fs -ls
/. Doesn't that mean that the datanode is up?
No. That is just a metadata lookup, which is served by the namenode. Try to cat
some file, like hadoop fs -cat
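A quick way to tell the two cases apart (the path below is just an example):

# metadata only -- answered by the namenode, works even if datanodes are down
hadoop fs -ls /
# block read -- only succeeds if a datanode holding a replica is up
hadoop fs -cat /user/hadoop/sample.txt | head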
Hi all,
I am looking into reusing some existing code for distributed indexing
to test a Mahout tool I am working on
https://issues.apache.org/jira/browse/MAHOUT-944
What I want is to index the Apache Public Mail Archives dataset (200G)
via MapReduce on Hadoop.
I have been going through the
How was your experience with Starfish?
C
On Mar 1, 2012, at 12:35 AM, Mark question wrote:
Thank you for your time and suggestions. I've already tried Starfish, but
not jmap. I'll check it out.
Thanks again,
Mark
On Wed, Feb 29, 2012 at 1:17 PM, Charles Earl charles.ce...@gmail.comwrote:
I do agree that a GitHub project is the way to go, unless you could convince
Cloudera, HortonWorks, or MapR to pick it up and support it. They have enough
committers
Is this potentially worthwhile? Maybe; it depends on how the cluster is
integrated into the overall environment. Companies
From the fairscheduler docs I assume the following should work:
<property>
  <name>mapred.fairscheduler.poolnameproperty</name>
  <value>pool.name</value>
</property>
<property>
  <name>pool.name</name>
  <value>${mapreduce.job.group.name}</value>
</property>
which means that the default pool will be the group of
I've just started playing with the Fair Scheduler. To specify the pool at job
submission time, you set the mapred.fairscheduler.pool property on the
JobConf to the name of the pool you want the job to use.
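If the job goes through ToolRunner/GenericOptionsParser, the same property can be set from the command line; a minimal sketch (the jar, class, pool name, and paths are all made up):

hadoop jar my-job.jar com.example.MyJob -Dmapred.fairscheduler.pool=research input/ output/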
Dave
-Original Message-
From: Merto Mertek [mailto:masmer...@gmail.com]
Sent:
Thanks,
I will be trying the suggestions and will get back to you soon.
On Thu, Mar 1, 2012 at 8:09 PM, Dave Shine
dave.sh...@channelintelligence.com wrote:
I've just started playing with the Fair Scheduler. To specify the pool at
job submission time you set the mapred.fairscheduler.pool
Hi,
I tried what you had said. I added the following to mapred-site.xml:
<property>
  <name>mapred.fairscheduler.poolnameproperty</name>
  <value>pool.name</value>
</property>
<property>
  <name>pool.name</name>
  <value>${mapreduce.job.group.name}</value>
</property>
Funny enough it created a pool with the name
When I send kill -QUIT to a job's task, it doesn't write the stack trace to the log
files. Does anyone know why, or whether I am doing something wrong?
I find the task process using ps -ef | grep attempt. I then go to
logs/userlogs/jobid/attemptid/
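One thing worth checking: the JVM writes its thread dump to stdout, not to the log4j logs, so it ends up in the task's stdout file. A sketch (pid and ids are placeholders; the directory layout follows the one above):

# find the task JVM; its command line contains the attempt id
ps -ef | grep attempt_
# the thread dump goes to the process's stdout
kill -QUIT <pid>
# so look in the task's stdout file:
# $HADOOP_LOG_DIR/userlogs/<job-id>/<attempt-id>/stdout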
Is there a high-quality version of the Hadoop logo anywhere? Even the graphic
presented on the Apache page itself suffers from dreadful JPEG artifacting. A
Google image search didn't inspire much hope on this issue (they all have the
same low-quality JPEG appearance). I'm looking for good
Sorry, false alarm. I was looking at the popup thumbnails in Google image
search. If I click all the way through, there are some high-quality versions
available. Why is the version on the Apache site (and the Wikipedia page) so
poor?
On Mar 1, 2012, at 14:09, Keith Wiley wrote:
Is there
On Thu, Mar 1, 2012 at 2:14 PM, Keith Wiley kwi...@keithwiley.com wrote:
Sorry, false alarm. I was looking at the popup thumbnails in Google image
search. If I click all the way through, there are some high-quality
versions available. Why is the version on the Apache site (and the Wikipedia
Starfish worked great for wordcount, but I didn't run it on my application
because I have only map tasks.
Mark
On Thu, Mar 1, 2012 at 4:34 AM, Charles Earl charles.ce...@gmail.comwrote:
How was your experience with Starfish?
C
On Mar 1, 2012, at 12:35 AM, Mark question wrote:
Thank you for
Excellent!
Thank you.
Sent from my phone, please excuse my brevity.
Keith Wiley, kwi...@keithwiley.com, http://keithwiley.com
Owen O'Malley omal...@apache.org wrote:
On Thu, Mar 1, 2012 at 2:14 PM, Keith Wiley kwi...@keithwiley.com wrote:
Sorry, false
Is this the right procedure to add nodes? I took the steps from the Hadoop wiki FAQ:
http://wiki.apache.org/hadoop/FAQ
1. Update conf/slaves
2. On the new slave nodes, start the datanode and tasktracker
3. Run hadoop balancer
Do I also need to run dfsadmin -refreshnodes?
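For reference, a minimal sketch of those steps on a 0.20-style tarball install (paths assume $HADOOP_HOME is set):

# on each new slave
$HADOOP_HOME/bin/hadoop-daemon.sh start datanode
$HADOOP_HOME/bin/hadoop-daemon.sh start tasktracker
# then, from any node, spread existing blocks onto the new datanode
$HADOOP_HOME/bin/hadoop balancer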
You only have to refresh nodes if you're making use of an allow file (the dfs.hosts include list).
Sent from my iPhone
On Mar 1, 2012, at 18:29, Mohit Anchlia mohitanch...@gmail.com wrote:
Is this the right procedure to add nodes? I took the steps from the Hadoop wiki FAQ:
http://wiki.apache.org/hadoop/FAQ
1. Update
On Thu, Mar 1, 2012 at 4:46 PM, Joey Echeverria j...@cloudera.com wrote:
You only have to refresh nodes if you're making use of an allow file (the dfs.hosts include list).
Thanks. Does that mean that when the tasktracker/datanode starts up, it
finds the namenode using the masters file?
Sent from my iPhone
On Mar 1, 2012,
Not quite. Datanodes get the namenode host from fs.default.name in
core-site.xml. Tasktrackers find the jobtracker from the mapred.job.tracker
setting in mapred-site.xml.
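Concretely, the two settings look like this (host names and ports here are just examples):

In core-site.xml:
<property>
  <name>fs.default.name</name>
  <value>hdfs://namenode.example.com:8020</value>
</property>

In mapred-site.xml:
<property>
  <name>mapred.job.tracker</name>
  <value>jobtracker.example.com:8021</value>
</property>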
Sent from my iPhone
On Mar 1, 2012, at 18:49, Mohit Anchlia mohitanch...@gmail.com wrote:
On Thu, Mar 1, 2012 at 4:46
The masters and slaves files, if I remember correctly, are used to start the
correct daemons on the correct nodes from the master node.
Raj
From: Joey Echeverria j...@cloudera.com
To: common-user@hadoop.apache.org common-user@hadoop.apache.org
Cc:
On Thu, Mar 1, 2012 at 4:57 PM, Joey Echeverria j...@cloudera.com wrote:
Not quite. Datanodes get the namenode host from fs.default.name in
core-site.xml. Tasktrackers find the jobtracker from the
mapred.job.tracker setting in mapred-site.xml.
I actually meant to ask how does
What Joey said is correct for Cloudera's distribution. That said, I am
not confident about the other distributions, as I haven't tried them.
Thanks,
Anil
On Thu, Mar 1, 2012 at 5:10 PM, Raj Vishwanathan rajv...@yahoo.com wrote:
The masters and slaves files, if I remember correctly, are used to start
What Joey said is correct for both the Apache and Cloudera distros. The DN/TT
daemons will connect to the NN/JT using the config files. The masters and slaves
files are used for starting the correct daemons.
From: anil gupta anilg...@buffalo.edu
To:
It is initiated by the slave. If you have defined files to state which slaves can talk to the namenode (using config dfs.hosts) and which hosts cannot (using property dfs.hosts.exclude), then you would need to edit these files and issue the refresh command.
On Mar 1, 2012, at 5:35 PM, Mohit Anchlia
Thanks all for the answers!!
On Thu, Mar 1, 2012 at 5:52 PM, Arpit Gupta ar...@hortonworks.com wrote:
It is initiated by the slave.
If you have defined files to state which slaves can talk to the namenode
(using config dfs.hosts) and which hosts cannot (using
property dfs.hosts.exclude)
Mohit,
New datanodes will connect to the namenode, so that's how the namenode
knows about them. Just make sure the datanodes have the correct fs.default.name
in their configuration and then start them. The namenode can, however,
choose to reject the datanode if you are using the dfs.hosts and
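To use that allow list, dfs.hosts just points at a plain text file of permitted hostnames on the namenode (the path below is made up), and edits take effect after a refresh:

<property>
  <name>dfs.hosts</name>
  <value>/etc/hadoop/conf/allowed-datanodes</value>
</property>

# after editing the file:
hadoop dfsadmin -refreshNodes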
Tried 0.4.15 but am still getting the error. Really lost with this.
My Hadoop release is 0.20.2, from more than a year ago. Could this be related
to the problem?
Marc,
Were the lzo libs on your server upgraded to a higher version recently?
Also, when you deployed a built copy of 0.4.15, did you ensure that you
replaced the older native libs for hadoop-lzo as well?
On Fri, Mar 2, 2012 at 9:05 AM, Marc Sturlese marc.sturl...@gmail.com wrote:
Tried but still
Yes. The steps I followed were:
1. Install lzo 2.06 on a machine with the same kernel as my nodes.
2. Compile hadoop-lzo 0.4.15 there (in /lib I swapped the cdh3u3 jars for my
Hadoop 0.20.2 release).
3. Replace hadoop-lzo-0.4.9.jar with the newly compiled hadoop-lzo-0.4.15.jar in
the hadoop lib directory of all my nodes.
I used to have lzo 2.05, but as I said, I have now installed 2.06.
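For comparison, the usual hadoop-lzo build-and-deploy sequence looks roughly like this (the platform directory and jar name are assumptions; check your build output):

# in the hadoop-lzo source tree
ant clean compile-native tar
# ship the jar to every node
cp build/hadoop-lzo*.jar $HADOOP_HOME/lib/
# and the freshly built native libs alongside Hadoop's
cp build/native/Linux-amd64-64/lib/* $HADOOP_HOME/lib/native/Linux-amd64-64/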
I know this doesn't fix lzo, but have you considered Snappy for the
intermediate output compression? It gets similar compression ratios
and compress/decompress speed, but arguably has better Hadoop
integration.
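If you want to try it, the map-output compression settings would look something like this in mapred-site.xml (assuming your Hadoop build bundles the Snappy codec):

<property>
  <name>mapred.compress.map.output</name>
  <value>true</value>
</property>
<property>
  <name>mapred.map.output.compression.codec</name>
  <value>org.apache.hadoop.io.compress.SnappyCodec</value>
</property>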
-Joey
On Thu, Mar 1, 2012 at 10:01 PM, Marc Sturlese marc.sturl...@gmail.com wrote:
Absolutely. In case I don't find the root of the problem soon I'll definitely
try it.
Hello Folks,
Are there any pointers to comparisons between Apache Pig and Hadoop
Streaming MapReduce jobs?
Also, there is a claim in our company that Pig performs better than plain
MapReduce jobs. Is this true? Are there any such benchmarks available?
Thanks, Subir
Hi,
Please look inside the $HADOOP_HOME/contrib/datajoin folder of the 0.20.2 release.
You will find the jar there.
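For example (the exact jar name varies by release, so list the folder first):

ls $HADOOP_HOME/contrib/datajoin/
# then put it on the classpath for compiling/running your job, e.g.:
export HADOOP_CLASSPATH=$HADOOP_HOME/contrib/datajoin/hadoop-<version>-datajoin.jar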
On Sat, Feb 11, 2012 at 1:09 AM, Bing Li lbl...@gmail.com wrote:
Hi, all,
I am starting to learn advanced Map/Reduce. However, I cannot find the
class DataJoinMapperBase in my downloaded
Hi,
Only HDFS should be enough.
On Fri, Nov 25, 2011 at 1:45 AM, Thanh Do than...@cs.wisc.edu wrote:
hi all,
in order to run DFSIO in my cluster,
do i need to run JobTracker, and TaskTracker,
or just running HDFS is enough?
Many thanks,
Thanh
--
Join me at
Madhu,
That is incorrect. TestDFSIO is a MapReduce job, and you need an HDFS+MR
setup to use it.
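A typical invocation on 0.20.2 looks like this (the test jar name varies by release; -fileSize is per file, in MB):

hadoop jar $HADOOP_HOME/hadoop-0.20.2-test.jar TestDFSIO -write -nrFiles 10 -fileSize 100
hadoop jar $HADOOP_HOME/hadoop-0.20.2-test.jar TestDFSIO -read -nrFiles 10 -fileSize 100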
On Fri, Mar 2, 2012 at 11:07 AM, madhu phatak phatak@gmail.com wrote:
Hi,
Only HDFS should be enough.
On Fri, Nov 25, 2011 at 1:45 AM, Thanh Do than...@cs.wisc.edu wrote:
hi all,
in order to
Considering Pig essentially translates scripts into MapReduce jobs, one
can always write MapReduce jobs at least as good as those Pig generates. You can
refer to the Pig experience paper to see the overhead Pig introduces, but it
has been improving all the time.
Btw, if you really care about the performance, how you
On Fri, Mar 2, 2012 at 10:18 AM, Subir S subir.sasiku...@gmail.com wrote:
Hello Folks,
Are there any pointers to comparisons between Apache Pig and Hadoop
Streaming MapReduce jobs?
I do not see why you seek to compare these two. Pig offers a language
that lets you write data-flow
Hi Harsha,
Sorry, I read DFSIO as DFS input/output, which I thought meant reading and
writing via the HDFS API :)
On Fri, Mar 2, 2012 at 12:32 PM, Harsh J ha...@cloudera.com wrote:
Madhu,
That is incorrect. TestDFSIO is a MapReduce job, and you need an HDFS+MR
setup to use it.
On Fri, Mar 2, 2012 at