Re: Are hadoop fs commands serial or parallel

2011-05-17 Thread Mapred Learn
Thanks, Harsh! So both the APIs and the hadoop client commands allow only serial writes. I was wondering what other ways there are to write data to HDFS in parallel, apart from using multiple parallel threads. Thanks, JJ On May 17, 2011, at 10:59 PM, Harsh J

Re: Are hadoop fs commands serial or parallel

2011-05-17 Thread Harsh J
Hello, Adding to Joey's response, copyFromLocal's current implementation is serial given a list of files. On Wed, May 18, 2011 at 9:57 AM, Mapred Learn wrote: > Thanks Joey! > I will try to find out about copyFromLocal. Looks like the Hadoop APIs write serially, as you pointed out. > > Thanks, > -JJ >

Re: Are hadoop fs commands serial or parallel

2011-05-17 Thread Mapred Learn
Thanks Joey! I will try to find out about copyFromLocal. Looks like the Hadoop APIs write serially, as you pointed out. Thanks, -JJ On May 17, 2011, at 8:32 PM, Joey Echeverria wrote: > The sequence file writer definitely does it serially as you can only > ever write to the end of a file in Hadoop.

Re: Are hadoop fs commands serial or parallel

2011-05-17 Thread Joey Echeverria
The sequence file writer definitely does it serially as you can only ever write to the end of a file in Hadoop. Doing copyFromLocal could write multiple files in parallel (I'm not sure if it does or not), but a single file would be written serially. -Joey On Tue, May 17, 2011 at 5:44 PM, Mapred
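
Since a single file is written serially and copyFromLocal works through its file list one at a time, one way to get parallelism is to start several single-file copies at once. The sketch below is only an illustration of that idea, not something taken from the thread: the class name ParallelCopy, the CommandRunner hook and the thread count are hypothetical, and it assumes the `hadoop` launcher is on the PATH. It relies only on the `hadoop fs -copyFromLocal <localsrc> <dst>` form quoted above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class ParallelCopy {

    /** Hypothetical hook for running one command; lets the logic be tried without a cluster. */
    interface CommandRunner {
        int run(List<String> command) throws Exception;
    }

    /** Builds the argument list for copying one local file into an HDFS directory. */
    static List<String> buildCommand(String localFile, String hdfsDir) {
        return Arrays.asList("hadoop", "fs", "-copyFromLocal", localFile, hdfsDir);
    }

    /** Runs the command as a child process and returns its exit code. */
    static int runProcess(List<String> command) throws Exception {
        return new ProcessBuilder(command).inheritIO().start().waitFor();
    }

    /**
     * Starts one copy per file, at most `threads` at a time.
     * Each file is still written serially; only different files overlap.
     * Returns the exit codes in the same order as the input files.
     */
    static List<Integer> copyAll(List<String> localFiles, String hdfsDir,
                                 int threads, CommandRunner runner) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Integer>> pending = new ArrayList<>();
            for (String file : localFiles) {
                List<String> command = buildCommand(file, hdfsDir);
                Callable<Integer> task = () -> runner.run(command);
                pending.add(pool.submit(task));
            }
            List<Integer> exitCodes = new ArrayList<>();
            for (Future<Integer> result : pending) {
                exitCodes.add(result.get()); // blocks until that copy has finished
            }
            return exitCodes;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        if (args.length < 2) {
            System.err.println("usage: java ParallelCopy <hdfsDir> <localFile>...");
            return;
        }
        List<String> files = Arrays.asList(args).subList(1, args.length);
        System.out.println(copyAll(files, args[0], 4, ParallelCopy::runProcess));
    }
}
```

Whether this helps in practice depends on where the bottleneck is (local disk, network, or the datanodes), so the thread count is something to measure rather than assume.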

Re: matrix-vector multiply in hadoop

2011-05-17 Thread Ted Dunning
Try using the Apache Mahout code that solves exactly this problem. Mahout has a distributed row-wise matrix that is read one row at a time. Dot products with the vector are computed and the results are collected. This capability is used extensively in the large-scale SVDs in Mahout. On Tue, Ma
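
The row-at-a-time scheme described above can be sketched in plain Java. This is not Mahout's API and not a MapReduce job, only the arithmetic: each row is read once, its dot product with the shared vector is computed, and the results are collected by row index. The class and method names are hypothetical.

```java
import java.util.Map;
import java.util.TreeMap;

class RowWiseMultiply {

    /** Dot product of one matrix row with the shared vector (the per-row step). */
    static double dot(double[] row, double[] vector) {
        if (row.length != vector.length) {
            throw new IllegalArgumentException("row and vector lengths differ");
        }
        double sum = 0.0;
        for (int j = 0; j < row.length; j++) {
            sum += row[j] * vector[j];
        }
        return sum;
    }

    /** Reads the matrix one row at a time and collects rowIndex -> result entry. */
    static Map<Integer, Double> multiply(double[][] matrix, double[] vector) {
        Map<Integer, Double> result = new TreeMap<>();
        for (int i = 0; i < matrix.length; i++) {
            result.put(i, dot(matrix[i], vector));
        }
        return result;
    }

    public static void main(String[] args) {
        double[][] m = {{1, 2}, {3, 4}};
        double[] v = {5, 6};
        System.out.println(multiply(m, v)); // prints {0=17.0, 1=39.0}
    }
}
```

In a distributed setting every task would need its own copy of the vector, and because each row's result is independent of the others, the rows can be processed in any order.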

Are hadoop fs commands serial or parallel

2011-05-17 Thread Mapred Learn
Hi, My question is: when I run a command from the HDFS client, e.g. hadoop fs -copyFromLocal, or create a sequence file writer in Java code and append key/values to it through the Hadoop APIs, does it transfer/write data to HDFS serially or in parallel? Thanks in advance, -JJ

Re: How do you run HPROF locally?

2011-05-17 Thread Mark question
or conf.setBoolean("mapred.task.profile", true); Mark On Tue, May 17, 2011 at 4:49 PM, Mark question wrote: > I usually do this setting inside my java program (in run function) as > follows: > > JobConf conf = new JobConf(this.getConf(),My.class); > conf.set("mapred.ta

Re: How do you run HPROF locally?

2011-05-17 Thread Mark question
I usually do this setting inside my Java program (in the run function) as follows: JobConf conf = new JobConf(this.getConf(),My.class); conf.set("mapred.task.profile", "true"); then I'll see some output files in that same working directory. Hope that helps, Mark On Tue,

How do you run HPROF locally?

2011-05-17 Thread W.P. McNeill
I am running a Hadoop Java program in local single-JVM mode via an IDE (IntelliJ). I want to do performance profiling of it. Following the instructions in chapter 5 of *Hadoop: the Definitive Guide*, I added the following properties to my job configuration file. mapred.task.profile t

Re: Hadoop tool-kit for monitoring

2011-05-17 Thread Mark question
Thanks for the inputs, but I'm running on a university cluster, not my own, so are assumptions such as each task (mapper/reducer) taking 1 GB valid? So I guess to tune performance I should try running the job multiple times and rely on execution time as an indicator of success. Thank

Re: Hadoop tool-kit for monitoring

2011-05-17 Thread Konstantin Boudnik
Also, it seems like Ganglia would be very well complemented by Nagios to allow you to monitor the overall health of your cluster. -- Take care, Konstantin (Cos) Boudnik 2CAC 8312 4870 D885 8616 6115 220F 6980 1F27 E622 Disclaimer: Opinions expressed in this email are those of the author, and do

Re: Hadoop tool-kit for monitoring

2011-05-17 Thread Allen Wittenauer
On May 17, 2011, at 3:11 PM, Mark question wrote: > So what other memory consumption tools do you suggest? I don't want to do it > manually and dump statistics into file because IO will affect performance > too. We watch memory with Ganglia. We also tune our systems such that a task wi

Re: Hadoop tool-kit for monitoring

2011-05-17 Thread Mark question
So what other memory consumption tools do you suggest? I don't want to do it manually and dump statistics into file because IO will affect performance too. Thanks, Mark On Tue, May 17, 2011 at 2:58 PM, Allen Wittenauer wrote: > > On May 17, 2011, at 1:01 PM, Mark question wrote: > > > Hi > > >

Re: log4j.properties

2011-05-17 Thread Jamie Cockrill
Hi Shah, You've not mentioned which version of log4j you're using so I'm going to guess 1.2. I'm also not an expert, but I'll give it a go. I don't think you can set a max number of files to keep with the DailyRollingFileAppender. You can with RollingFileAppender. This seems to be a relatively c

Re: Hadoop tool-kit for monitoring

2011-05-17 Thread Allen Wittenauer
On May 17, 2011, at 1:01 PM, Mark question wrote: > Hi > > I need to use hadoop-tool-kit for monitoring. So I followed > http://code.google.com/p/hadoop-toolkit/source/checkout > > and applied the patch in my hadoop.20.2 directory as: patch -p0 < patch.20.2 Looking at the code, be awa

Again ... Hadoop tool-kit for monitoring

2011-05-17 Thread Mark question
Sorry for the spam, but I didn't see my previous email yet. I need to use hadoop-tool-kit for monitoring. So I followed http://code.google.com/p/hadoop-toolkit/source/checkout and applied the patch in my hadoop.20.2 directory as: patch -p0 < patch.20.2 and set a property "mapred.performance.

matrix-vector multiply in hadoop

2011-05-17 Thread Alexandra Anghelescu
Hi all, I was wondering how to go about doing a matrix-vector multiplication using hadoop. I have my matrix in one file and my vector in another. All the map tasks will need the vector file... basically they need to share it. Basically I want my map function to output key-value pairs (i,m[i,j]*v(

Hadoop tool-kit for monitoring

2011-05-17 Thread Mark question
Hi, I need to use hadoop-tool-kit for monitoring. So I followed http://code.google.com/p/hadoop-toolkit/source/checkout and applied the patch in my hadoop.20.2 directory as: patch -p0 < patch.20.2 and set a property "mapred.performance.diagnose" to true in mapred-site.xml. but I don't se

Re: Error in starting tasktracker

2011-05-17 Thread Subhramanian, Deepak
I reinstalled everything and am able to start everything other than the jobtracker. The jobtracker still reports the port as in use, even though I verified with netstat that nothing is listening on it. ipedited:/usr/lib/hadoop-0.20/logs/history # /usr/java/jdk1.6.0_25/bin/jps 7435 SecondaryNameNode 7517 TaskT

Re: Error in starting tasktracker

2011-05-17 Thread Subhramanian, Deepak
Hi Harsh, I tried changing the port and tried again without luck. I changed the port to 8023, and it says port 8023 is in use. But when I ran netstat, 8023 was not listed. I am also using Oozie configured on the system. While trying to work with Oozie the permissions of some of the directories got ch

Re: Error in starting tasktracker

2011-05-17 Thread Harsh J
Deepak, From the logs it appears as if some service on your machine already uses the specified 8021 port. Try shutting down whatever might be using that if possible, or switch your JT's port to something else. On Tue, May 17, 2011 at 9:19 PM, Subhramanian, Deepak wrote: > Hi , > > I am using cd

Error in starting tasktracker

2011-05-17 Thread Subhramanian, Deepak
Hi, I am using cdh3 in pseudo-distributed mode and getting the following error while starting the task tracker and job tracker. Any suggestions? Error for Task Tracker 2011-05-17 13:28:10,234 INFO org.apache.hadoop.mapred.TaskTracker: STARTUP_MSG: /

Debug hadoop error

2011-05-17 Thread jonathan.hwang
I need some help figuring out why my job failed. I built a single-node cluster just to try it out. I followed the example at http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-multi-node-cluster/ Everything seems to be working correctly. I formatted the namenode. Able to con

Re: log4j.properties

2011-05-17 Thread shahsaifi
Please help me out with this.

Re: Exception in thread "AWT-EventQueue-0" java.lang.NullPointerException

2011-05-17 Thread Harsh J
Hey Lạc Trung, I do not see a configuration instance used in your code, but you're using the Configured class. Do you instantiate CopyFiles using Hadoop's ReflectionUtils utility class? Unless that's done, getConf() would return null, which is probably causing the issue. On Sat, May 14, 2011 at

Re: Exception in thread "AWT-EventQueue-0" java.lang.NullPointerException

2011-05-17 Thread Steve Loughran
On 16/05/11 21:12, Lạc Trung wrote: I'm using Hadoop-0.21. --- hut.edu.vn At the top, it's your code, so you get to fix it. The good thing about open source is you can go all the way in. This is what I would do in the same situation -Grab the 0.21 source JAR -add it to your IDE -have a look at

Re: Hadoop's authority access

2011-05-17 Thread Harsh J
Hello, I am not sure of the latter part of your need, but you can add a filter atop the UIs. One good example extension is provided at: https://issues.apache.org/jira/browse/HADOOP-7119 There's also Hue, which provides this functionality of managing users along with a lot of other goodies. Read mor