Re: Hadoop and Cuda, JCuda (CPU+GPU architecture)

2012-10-03 Thread Hemanth Yamijala
You could also try creating a lib directory with the dependent jar and package that along with the job's jar file. Please refer to this blog post for information: http://www.cloudera.com/blog/2011/01/how-to-include-third-party-libraries-in-your-map-reduce-job/ On Wed, Sep 26, 2012 at 4:57 PM, sudh

Re: Passing Command-line Parameters to the Job Submit Command

2012-09-25 Thread Hemanth Yamijala
; > Is my above assumption correct? > > Thanks, > Varad > > On Mon, Sep 24, 2012 at 9:48 AM, Hemanth Yamijala wrote: > >> Varad, >> >> Looking at the code for the PiEstimator class which implements the >> 'pi' example, the two arguments are mandator

Re: Passing Command-line Parameters to the Job Submit Command

2012-09-23 Thread Hemanth Yamijala
Varad, Looking at the code for the PiEstimator class, which implements the 'pi' example, the two arguments are mandatory and are used *before* the job is submitted for execution - i.e. on the client side. In particular, one of them (nSamples) is used not by the MapReduce job, but by the client code

Re: Restricting the number of slave nodes used for a given job (regardless of the # of map/reduce tasks involved)

2012-09-10 Thread Hemanth Yamijala
Hi, I am not sure if there's any way to restrict the tasks to specific machines. However, I think there are some ways of restricting the number of 'slots' that can be used by the job. Also, not sure which version of Hadoop you are on. The capacity scheduler (http://hadoop.apache.org/common/docs/r2.

Re: no output written to HDFS

2012-08-30 Thread Hemanth Yamijala
Hi, Do both input files contain data that needs to be processed by the mapper in the same fashion ? In which case, you could just put the input files under a directory in HDFS and provide that as input. The -input option does accept a directory as argument. Otherwise, can you please explain a lit

Re: HDFS disk consumption.

2010-12-28 Thread Hemanth Yamijala
Hi, On Wed, Dec 29, 2010 at 5:51 AM, Jane Chen wrote: > Is setting dfs.replication to 1 sufficient to stop replication?  How do I > verify that?  I have a pseudo cluster running 0.21.0.  It seems that the hdfs > disk consumption triples the amount of data stored. Setting to 1 is sufficient to
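
(The replication factor a file actually got can be read back through the FileSystem API - a minimal sketch, not from the thread; the path below is made up. Files written before dfs.replication was lowered keep their original factor, which would explain the tripled consumption.)

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class CheckReplication {
      public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        // Hypothetical path: point this at a file you have written.
        FileStatus status = fs.getFileStatus(new Path("/user/jane/data.txt"));
        System.out.println("replication = " + status.getReplication());
      }
    }

Running hadoop fsck / -files shows the same per-file figure from the shell.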

Re: Topology : Script Based Mapping

2010-12-28 Thread Hemanth Yamijala
Hi, On Tue, Dec 28, 2010 at 6:03 PM, Rajgopal Vaithiyanathan wrote: > I wrote a script to map the IP's to a rack. The script is as follows:
>
> for i in $* ; do
>        topo=`echo $i | cut -d"." -f1,2,3 | sed 's/\./-/g'`
>        topo=/rack-$topo" "
>        final=$final$topo
> done
> echo $fi

Re: Question from a Desperate Java Newbie

2010-12-09 Thread Hemanth Yamijala
Not exactly what you may want - but could you try using an HTTP client in Java ? Some of them have the ability to automatically follow redirects, manage cookies etc. Thanks hemanth On Thu, Dec 9, 2010 at 4:35 PM, edward choi wrote: > Excuse me for asking a general Java question here. > I tried to
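
(A minimal sketch of that suggestion using the JDK's built-in java.net.HttpURLConnection rather than any particular third-party client; the URL is a placeholder.)

    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import java.net.HttpURLConnection;
    import java.net.URL;

    public class FetchWithRedirects {
      public static void main(String[] args) throws Exception {
        URL url = new URL("http://example.com/page"); // placeholder URL
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        // Same-protocol redirects are followed by default; this makes it explicit.
        conn.setInstanceFollowRedirects(true);
        BufferedReader in = new BufferedReader(new InputStreamReader(conn.getInputStream()));
        for (String line; (line = in.readLine()) != null; ) {
          System.out.println(line);
        }
        in.close();
      }
    }

Libraries such as Apache HttpClient layer cookie management on top of this, which is presumably what the reply had in mind.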

Re: Hadoop command line arguments

2010-12-03 Thread Hemanth Yamijala
Hi, On Sat, Dec 4, 2010 at 4:50 AM, yogeshv wrote: > > Dear all, > > Which file in the hadoop svn processes/receives the hadoop command line > arguments? > > While executing, for example: hadoop jar > . > 'hadoop' in the above line is a shell script that's present in the hadoop-common/bin location

Re: delay the execution of reducers

2010-12-02 Thread Hemanth Yamijala
Hi, > Changing the parameter for a specific job works better for me. > > But I was asking in general in which configuration file(s) I should change > the value of the parameters. > For parameters in hdfs-site.xml, I should change the configuration file on > each machine. But for parameters in mapr
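
(Given the thread subject, the parameter in question is presumably mapred.reduce.slowstart.completed.maps. A sketch of the per-job route, setting it in client code rather than in any *-site.xml file; the class names are made up.)

    import org.apache.hadoop.mapred.JobConf;

    public class SlowStartSetup {
      public static JobConf configure(Class<?> driverClass) {
        JobConf conf = new JobConf(driverClass);
        // Hold reducers back until all maps finish; the default of 0.05
        // starts them once 5% of the maps have completed.
        conf.setFloat("mapred.reduce.slowstart.completed.maps", 1.0f);
        return conf;
      }
    }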

Re: Memory config for Hadoop cluster

2010-11-07 Thread Hemanth Yamijala
Amandeep, On Fri, Nov 5, 2010 at 11:54 PM, Amandeep Khurana wrote: > On Fri, Nov 5, 2010 at 2:00 AM, Hemanth Yamijala wrote: > >> Hi, >> >> On Fri, Nov 5, 2010 at 2:23 PM, Amandeep Khurana wrote: >> > Right. I meant I'm not using fair or capacity scheduler

Re: Memory config for Hadoop cluster

2010-11-05 Thread Hemanth Yamijala
the settings as 'final' on the job tracker and the task trackers. Then any values submitted by the job would not override the settings. Thanks Hemanth > > -Amandeep > > On Nov 5, 2010, at 1:43 AM, Hemanth Yamijala wrote: > > Hi, > > > I'm not using any schedule

Re: Memory config for Hadoop cluster

2010-11-05 Thread Hemanth Yamijala
0.21, and the names of the parameters are different, though you can see the correspondence with similar variables in Hadoop 0.20. Thanks Hemanth > > -Amandeep > > On Fri, Nov 5, 2010 at 12:21 AM, Hemanth Yamijala wrote: > >> Amandeep, >> >> Which scheduler are you

Re: Memory config for Hadoop cluster

2010-11-05 Thread Hemanth Yamijala
Amandeep, Which scheduler are you using ? Thanks hemanth On Tue, Nov 2, 2010 at 2:44 AM, Amandeep Khurana wrote: > How are the following configs supposed to be used? > > mapred.cluster.map.memory.mb > mapred.cluster.reduce.memory.mb > mapred.cluster.max.map.memory.mb > mapred.cluster.max.reduce.

Re: Granting Permissions to HDFS

2010-10-30 Thread Hemanth Yamijala
Hi, On Thu, Oct 28, 2010 at 5:11 PM, Adarsh Sharma wrote: > Dear all, > I am listing all the HDFS details through the -fs shell. I know the superuser > owns the privileges to list files. But now I want to grant all read and > write privileges to two new users (e.g. Tom and White). Only these
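
(A sketch of what granting such access could look like through the FileSystem API, run as the superuser; the group, user, and path names are invented.)

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.fs.permission.FsPermission;

    public class GrantAccess {
      public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        Path dir = new Path("/user/shared"); // hypothetical directory
        // Hand the directory to a group containing both users, then
        // give the group read/write/execute (mode 0775).
        fs.setOwner(dir, "hadoop", "analysts");
        fs.setPermission(dir, new FsPermission((short) 0775));
      }
    }

The same effect is available from the shell via hadoop fs -chown and hadoop fs -chmod.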

Re: help with rewriting hadoop java code for new API: getPos() and getCounter()

2010-10-30 Thread Hemanth Yamijala
Hi, On Wed, Oct 27, 2010 at 2:19 AM, Bibek Paudel wrote: > [Apologies for cross-posting] > > Hi all, > I am rewriting hadoop java code for the new (0.20.2) API - the code > was originally written for versions <= 0.19. > > 1. What is the equivalent of the getCounter() method ? For example, > the
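
(For reference, a minimal sketch of the counter idiom in the 0.20 API; the enum is illustrative. The old getPos() has no direct equivalent - with TextInputFormat the byte offset simply arrives as the map key.)

    import java.io.IOException;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    public class CountingMapper extends Mapper<LongWritable, Text, Text, LongWritable> {
      enum MyCounters { RECORDS_SEEN } // illustrative counter enum

      @Override
      protected void map(LongWritable offset, Text line, Context context)
          throws IOException, InterruptedException {
        // Old API: reporter.incrCounter(...); the new API goes through the context:
        context.getCounter(MyCounters.RECORDS_SEEN).increment(1);
        context.write(line, offset); // 'offset' is what getPos() used to return
      }
    }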

Re: GC overhead limit exceeded while running Terrier on Hadoop

2010-10-30 Thread Hemanth Yamijala
Hi, On Tue, Oct 26, 2010 at 8:14 PM, siddharth raghuvanshi wrote: > Hi, > > While running Terrier on Hadoop, I am getting the following error again & > again, can someone please point out where the problem is? > > attempt_201010252225_0001_m_09_2: WARN - Error running child > attempt_20101025

Re: Need help on accessing datanodes local filesystem using hadoop map reduce framework

2010-10-23 Thread Hemanth Yamijala
Hi, On Sat, Oct 23, 2010 at 1:44 AM, Burhan Uddin wrote: > Hello, > I am a beginner with the hadoop framework. I am trying to create a distributed > crawling application. I have googled a lot, but the resources are too few. > Can anyone please help me on the following topics. > I suppose you know alrea

Re: nodes with different memory sizes

2010-10-13 Thread Hemanth Yamijala
Hi, You mentioned you'd like to configure different memory settings for the process depending on which nodes the tasks run on. Which process are you referring to here - the Hadoop daemons, or your map/reduce program ? An alternative approach could be to see if you can get only those nodes in Torq

Re: Sorting Numbers using mapreduce

2010-09-05 Thread Hemanth Yamijala
Hi, On Mon, Sep 6, 2010 at 1:47 AM, Neil Ghosh wrote: > Hi, > > I am trying to sort a list of numbers (one per line) using hadoop > mapreduce. > Kindly suggest any reference and code. > > How do I implement a custom input format and recordreader so that both key and > value are the number? > > I a
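
(Not from the thread, but the usual shape of such a job: parse each line into an IntWritable key and let the framework's shuffle sort do the actual work. A sketch; the class names are invented and each class would live in its own file.)

    // SortMapper.java
    import java.io.IOException;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.NullWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    public class SortMapper extends Mapper<LongWritable, Text, IntWritable, NullWritable> {
      @Override
      protected void map(LongWritable offset, Text line, Context context)
          throws IOException, InterruptedException {
        // One number per line; IntWritable keys sort numerically in the shuffle.
        context.write(new IntWritable(Integer.parseInt(line.toString().trim())),
                      NullWritable.get());
      }
    }

    // SortReducer.java
    import java.io.IOException;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.NullWritable;
    import org.apache.hadoop.mapreduce.Reducer;

    public class SortReducer extends Reducer<IntWritable, NullWritable, IntWritable, NullWritable> {
      @Override
      protected void reduce(IntWritable key, Iterable<NullWritable> values, Context context)
          throws IOException, InterruptedException {
        for (NullWritable v : values) { // emit once per duplicate
          context.write(key, v);
        }
      }
    }

With a single reducer the output is totally ordered; with several reducers a TotalOrderPartitioner would be needed.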

Re: Obtaining the number of map slots through the API (Hadoop 0.20.2)

2010-09-05 Thread Hemanth Yamijala
Hi, > > The optimization of one Hadoop job I'm running would benefit from knowing > the > maximum number of map slots in the Hadoop cluster. > > This number can be obtained (if my understanding is correct) by: > > * parsing the mapred-site.xml file to get >  the mapred.tasktracker.map.tasks.maximu
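
(A sketch of the API route, which avoids parsing XML and reflects whatever the JobTracker actually sees; it uses the old-style JobClient, still current in 0.20.2.)

    import org.apache.hadoop.mapred.ClusterStatus;
    import org.apache.hadoop.mapred.JobClient;
    import org.apache.hadoop.mapred.JobConf;

    public class SlotCount {
      public static void main(String[] args) throws Exception {
        JobClient client = new JobClient(new JobConf());
        ClusterStatus status = client.getClusterStatus();
        // Total map slots across all live tasktrackers.
        System.out.println("max map slots = " + status.getMaxMapTasks());
      }
    }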

Re: main" java.lang.UnsupportedClassVersionError: Bad version number in .class

2010-08-30 Thread Hemanth Yamijala
Hi, Can you please confirm if you've set JAVA_HOME in conf/hadoop-env.sh on all the nodes ? Thanks Hemanth On Tue, Aug 31, 2010 at 6:21 AM, Mohit Anchlia wrote: > Hi, > > I am running some basic setup and tests to learn about hadoop. When I > try to start nodes I get this error. I am already using ja

Re: cluster startup problem

2010-08-30 Thread Hemanth Yamijala
Hi, On Mon, Aug 30, 2010 at 8:19 AM, Gang Luo wrote: > Hi all, > I am trying to configure and start a hadoop cluster on EC2. I got some > problems > here. > > > 1. Can I share hadoop code and its configuration across nodes? Say I have a > distributed file system running in the cluster and all th

Re: cluster write permission

2010-08-29 Thread Hemanth Yamijala
Hi, On Sun, Aug 29, 2010 at 10:14 PM, Gang Luo wrote: > Hi all, > I am setting up a hadoop cluster where I have to specify the local directory for > temp files/logs, etc. Should I allow everybody to have write permission to > these directories? Who actually does the write operation? The temp and l

Re: Hadoop startup problem - directory name required

2010-08-25 Thread Hemanth Yamijala
Hmm. Without the / in the property tag, isn't the file malformed XML ? I am pretty sure Hadoop complains in such cases. On Wed, Aug 25, 2010 at 4:44 AM, cliff palmer wrote: > Thanks Allen - that has resolved the problem.  Good catch! > Cliff > > On Tue, Aug 24, 2010 at 3:05 PM, Allen Wittenauer

Re: Managing configurations

2010-08-18 Thread Hemanth Yamijala
Mark, On Wed, Aug 18, 2010 at 10:59 PM, Mark wrote: >  What is the preferred way of managing multiple configurations, i.e. > development, production, etc.? > > Is there some way I can tell hadoop to use a separate conf directory other > than ${hadoop_home}/conf? I think I've read somewhere that one

Re: Hadoop 0.20.2: org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find taskTracker/jobcache/job_201008131730_0001/attempt_201008131730_0001_m_000000_2/output/file.out.index in any of

2010-08-17 Thread Hemanth Yamijala
Hi, > Hi, Hemanth. Thanks for your reply! > > I tried your recommendation, the absolute path, and it worked; I was able to run the > jobs successfully. Thank you! > I was wondering why hadoop.tmp.dir (or mapred.local.dir?) with a relative > path didn't work. I am not entirely sure, but when the daemon

Re: Hadoop 0.20.2: org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find taskTracker/jobcache/job_201008131730_0001/attempt_201008131730_0001_m_000000_2/output/file.out.index in any of

2010-08-13 Thread Hemanth Yamijala
Hi, > 1. I can log in through SSH without a password between master and slaves, it's all right :-)
>
> 2.
>   <property>
>     <name>hadoop.tmp.dir</name>
>     <value>tmp</value>
>   </property>
>
> In fact, 'tmp' is what I want :-)
>
> $HADOOP_HOME
>     + tmp
>         + dfs

Re: Scheduler recommendation

2010-08-11 Thread Hemanth Yamijala
Hi, On Thu, Aug 12, 2010 at 10:31 AM, Hemanth Yamijala wrote: > Hi, > > On Thu, Aug 12, 2010 at 3:35 AM, Bobby Dennett > wrote: >> From what I've read/seen, it appears that, if not the "default" >> scheduler, most installations are using Hadoop's

Re: Scheduler recommendation

2010-08-11 Thread Hemanth Yamijala
Hi, On Thu, Aug 12, 2010 at 3:35 AM, Bobby Dennett wrote: > From what I've read/seen, it appears that, if not the "default" > scheduler, most installations are using Hadoop's Fair Scheduler. Based > on features and our requirements, we're leaning towards using the > Capacity Scheduler; however, t

Re: add priority to task

2010-08-02 Thread Hemanth Yamijala
Hi, On Tue, Aug 3, 2010 at 9:42 AM, saurabhsuman8989 wrote: > > By 'tasks' I mean different tasks under one job. When a job is distributed > into different tasks, can I add priority to those tasks. It would be interesting to know why you want to do this. Can you please explain your use case ? Th

Re: Set variables in mapper

2010-08-02 Thread Hemanth Yamijala
Hi, It would also be worthwhile to look at the Tool interface (http://hadoop.apache.org/common/docs/r0.20.2/mapred_tutorial.html#Tool), which is used by example programs in the MapReduce examples as well. This would allow any arguments to be passed using the -Dvar.name=var.value convention on comm
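
(A minimal sketch of the Tool pattern; the driver and property names are placeholders.)

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.conf.Configured;
    import org.apache.hadoop.util.Tool;
    import org.apache.hadoop.util.ToolRunner;

    public class MyDriver extends Configured implements Tool {
      @Override
      public int run(String[] args) throws Exception {
        // Anything passed as -Dvar.name=var.value on the command line has
        // already been folded into the configuration by ToolRunner.
        System.out.println("var.name = " + getConf().get("var.name"));
        return 0;
      }

      public static void main(String[] args) throws Exception {
        System.exit(ToolRunner.run(new Configuration(), new MyDriver(), args));
      }
    }

Invocation then looks like: hadoop jar myjob.jar MyDriver -Dvar.name=some-value input output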

Re: reuse cached files

2010-08-02 Thread Hemanth Yamijala
. This is *not* to be used by client code, and is not guaranteed to work. In later versions of Hadoop (0.21 and trunk), these methods have been deprecated in the public API and will be removed altogether. Thanks hemanth > > Thanks, > -Gang > > > > - Original Message > From

Re: jobtracker.jsp reports "GC overhead limit exceeded"

2010-08-01 Thread Hemanth Yamijala
Hi, > Actually I enabled all log levels. But I didn't realize I should check the logs in the .out > files and only looked at the .log file, so I didn't see any error msgs. Now I > opened the .out file and saw the following logged exception: > > Exception in thread "IPC Server handler 5 on 50002" > java.lang.OutOfM

Re: reuse cached files

2010-08-01 Thread Hemanth Yamijala
Hi, > Thanks Hemanth. Is there any way to invalidate the reuse and ask Hadoop to > resend exactly the same files to the cache for every job? I may be able to answer this better if I understand the use case. If you need the same files for every job, why would you need to send them afresh each time ? I

Re: Parameters that can be set per job

2010-07-29 Thread Hemanth Yamijala
Hi, > Is there a list of configuration parameters that can be set per job. I'm almost certain there's no list that documents per-job settable parameters that well. From 0.21 onwards, I think a convention adopted is to name all job-related or task-related parameters to include 'job' or 'map' or 'r

Re: reuse cached files

2010-07-29 Thread Hemanth Yamijala
Hi, > if I use distributed cache to send some files to all the nodes in one MR job, > can I reuse these cached files locally in my next job, or will hadoop re-sent > these files again? Cache files are reused across Jobs. From trunk onwards, they will be restricted to be reused across jobs of the

Re: Setting jar for embedded Job (Hadoop 0.20.2)

2010-07-26 Thread Hemanth Yamijala
Hi, > I'd like to run a Hadoop (0.20.2) job > from within another application, using ToolRunner. > > One class of this other application implements the Tool interface. > The implemented run() method:
> * constructs a Job()
> * sets the input/output/mapper/reducer
> * sets the jar file by calling j
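
(For comparison, the two usual ways of telling an embedded Job where its jar lives - a sketch; the path is a placeholder.)

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.mapreduce.Job;

    public class EmbeddedJobSetup {
      public static Job build() throws Exception {
        Job job = new Job(new Configuration(), "embedded-job");
        // Either let Hadoop find the jar containing a class that is
        // packaged in the job jar (typically the driver or mapper)...
        job.setJarByClass(EmbeddedJobSetup.class);
        // ...or name the jar explicitly through the underlying configuration:
        // job.getConfiguration().set("mapred.jar", "/path/to/job.jar");
        return job;
      }
    }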

Re: Hadoop's datajoin

2010-07-12 Thread Hemanth Yamijala
Hi, > I am trying to use hadoop's datajoin for joining two relations. According > to > the Readme file of datajoin, it gives the following syntax: > > $HADOOP_HOME/bin/hadoop jar hadoop-datajoin-examples.jar > org.apache.hadoop.contrib.utils.join.DataJoinJob datajoin/input > datajoin/output

Re: reading distributed cache returns null pointer

2010-07-09 Thread Hemanth Yamijala
Hi, > Thanks for the information. I got your point. What I specifically want to ask is
> that if I use the following method to read my file now in each mapper:
>
>            FileSystem hdfs = FileSystem.get(conf);
>            URI[] uris = DistributedCache.getCacheFiles(conf);
>
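
(For contrast, the common pattern inside a task is to read the *local*, already-downloaded copies rather than re-opening HDFS URIs - a sketch, with error handling kept minimal.)

    import java.io.BufferedReader;
    import java.io.FileReader;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.filecache.DistributedCache;
    import org.apache.hadoop.fs.Path;

    public class CacheReader {
      /** Returns the first line of the first cached file, or null if none. */
      public static String firstLine(Configuration conf) throws Exception {
        // Local paths on the tasktracker's disk, not HDFS paths:
        Path[] localFiles = DistributedCache.getLocalCacheFiles(conf);
        if (localFiles == null || localFiles.length == 0) {
          return null; // null usually means nothing was added via addCacheFile()
        }
        BufferedReader reader = new BufferedReader(new FileReader(localFiles[0].toString()));
        try {
          return reader.readLine();
        } finally {
          reader.close();
        }
      }
    }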

Re: Is "heap size allocation" of namenode dynamic or static?

2010-07-09 Thread Hemanth Yamijala
Edward, Overall, I think the consideration should be about how much load do you expect to support on your cluster. For HDFS, there's a good amount of information about how much RAM is required to support a certain amount of data stored in DFS; something similar can be found for Map/Reduce as well.

Re: Pig share schema between projetcs

2010-07-09 Thread Hemanth Yamijala
John, Can you please redirect this to pig-u...@hadoop.apache.org ? You're more likely to get good responses there. Thanks hemanth On Thu, Jul 8, 2010 at 7:01 AM, John Seer wrote: > > Hello, is there any way to share a schema file in pig for the same table between > projects? > > > -- > View this m

Re: Intermediate files generated.

2010-07-01 Thread Hemanth Yamijala
Alex, > I don't think this is what I am looking for. Essentially, I wish to run both > the mapper as well as the reducer. But at the same time, I wish to make sure that > the temp files that are used between mappers and reducers are of my choice. > Here, the choice means that I can specify the files in HDFS

Re: how to figure out the range of a split that failed?

2010-06-29 Thread Hemanth Yamijala
Hi, > I am running a mapreduce job on my hadoop cluster. > > I am running 10 gigabytes of data and one tiny failed task crashes the whole > operation. > I am up to 98% complete and throwing away all the finished data seems just > like an awful waste. > I'd like to save the finished data and run aga

Re: how often are hadoop configuration files reloaded?

2010-06-29 Thread Hemanth Yamijala
Michael, Configuration is not reloaded for daemons. There is currently no way to refresh configuration once the cluster is started. Some specific aspects - like queue configuration and blacklisted nodes - can be reloaded via commands like hadoop admin refreshQueues or some such. Thanks Hemanth On

Re: hybrid map/reducer scheduler?

2010-06-28 Thread Hemanth Yamijala
Michael, > In addition to default FIFO scheduler, there are fair scheduler and capacity > scheduler. In some sense, fair scheduler can be considered a user-based > scheduling while capacity scheduler does a queue-based scheduling. Is there > or will there be a hybrid scheduler that combines the

Re: memory management of capacity scheduling

2010-06-26 Thread Hemanth Yamijala
Shashank, > Hi, > > Setup Info: > I have a 2-node hadoop (20.2) cluster on Linux boxes. > HW info: 16 CPUs (hyperthreaded) > RAM: 32 GB > > I am trying to configure capacity scheduling. I want to use the memory > management provided by the capacity scheduler. But I am facing a few issues. > I have added hadoop

Re: Stuck MR job

2010-06-23 Thread Hemanth Yamijala
the values set for some specific configuration variables. Unfortunately, the names of those variables have changed from 20 to 21 and trunk. Hence, I need to know the version to specify which ones to look up. Thanks Hemanth > Vidhya > > On 6/23/10 3:16 AM, "Hemanth Yamijala" wrot

Re: Feed hdfs with external data.

2010-06-23 Thread Hemanth Yamijala
> You can pass --config to your bin/hadoop commands. I > think it would also work if you set the HADOOP_CONF_DIR environment > variable to point to this path. > >> >> >> On Wed, Jun 23, 2010 at 10:52 AM, Hemanth Yamijala wrote: >> >>> Pierre, >

Re: Stuck MR job

2010-06-23 Thread Hemanth Yamijala
Vidhya, > Hi >  This looks like a trivial problem but I would be glad if someone can help. > >  I have been trying to run an m-r job on my cluster. I had modified my configs > (primarily reduced the heap sizes for the task tracker and the data nodes) > and restarted my hadoop cluster and the job w

Re: Feed hdfs with external data.

2010-06-23 Thread Hemanth Yamijala
lso work if you set the HADOOP_CONF_DIR environment variable to point to this path. > > > On Wed, Jun 23, 2010 at 10:52 AM, Hemanth Yamijala wrote: > >> Pierre, >> >> > I have a program that generates the data that's supposed to be treated by >> >

Re: Feed hdfs with external data.

2010-06-23 Thread Hemanth Yamijala
Pierre, > I have a program that generates the data that's supposed to be processed by > hadoop. > It's a java program that should write directly to hdfs. > So as a test, I do this:
>
>            Configuration config = new Configuration();
>            FileSystem dfs = FileSystem.get(config);
>
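
(The usual catch in such tests: unless the client's Configuration points at the namenode - via core-site.xml on the classpath, or explicitly - FileSystem.get() quietly returns the local filesystem. A sketch with a placeholder namenode address and path:)

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class HdfsFeeder {
      public static void main(String[] args) throws Exception {
        Configuration config = new Configuration();
        // Placeholder address; must match the cluster's fs.default.name.
        config.set("fs.default.name", "hdfs://namenode.example.com:9000");
        FileSystem dfs = FileSystem.get(config);
        FSDataOutputStream out = dfs.create(new Path("/data/generated/part-0"));
        out.write("generated record\n".getBytes("UTF-8"));
        out.close();
      }
    }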

Re: Hadoop JobTracker Hanging

2010-06-22 Thread Hemanth Yamijala
There was also https://issues.apache.org/jira/browse/MAPREDUCE-1316 whose cause hit clusters at Yahoo! very badly last year. The situation was particularly noticeable in the face of lots of jobs with failed tasks and a specific fix that enabled OutOfBand heartbeats. The latter (i.e. the OOB heartbe

Re: How to set the number of map tasks? (ver 0.20.2)

2010-06-21 Thread Hemanth Yamijala
Felix, > I'm using the new Job class: > > http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/mapreduce/Job.html > > There is a way to set the number of reduce tasks: > > setNumReduceTasks(int tasks) > > However, I don't see how to set the number of MAP tasks. > > I tried to set it
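
(For what it's worth: in both APIs the map count is ultimately the number of input splits, so there is no direct setter. A sketch of the two available levers; the values are illustrative.)

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;

    public class MapCountSetup {
      public static Job build() throws Exception {
        Job job = new Job(new Configuration());
        // A hint only; the InputFormat is free to ignore it:
        job.getConfiguration().setInt("mapred.map.tasks", 10);
        // More reliable: cap the split size so a large input yields
        // more splits, and hence more map tasks (here 64 MB).
        FileInputFormat.setMaxInputSplitSize(job, 64L * 1024 * 1024);
        return job;
      }
    }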

Re: AccessControlException when calling copyFromLocalFile()

2010-06-02 Thread Hemanth Yamijala
Ted, > When the user calling FileSystem.copyFromLocalFile() doesn't have permission > to write to a certain hdfs path:
> Thread [main] (Suspended (exception AccessControlException))
>    DFSClient.mkdirs(String, FsPermission) line: 905
>    DistributedFileSystem.mkdirs(Path, FsPermission) line: 262

Re: JNI native library loading problem in standalone mode

2010-05-31 Thread Hemanth Yamijala
Edward, If it's an option to copy the libraries to a fixed location on all the cluster nodes, you could do that and configure them in the library path via mapred.child.java.opts. Please look at http://bit.ly/ab93Z8 (MapReduce tutorial on Hadoop site) to see how to use this config option for settin
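
(A sketch of the configuration half of that suggestion; the install path is made up, and the setting could equally go in mapred-site.xml.)

    import org.apache.hadoop.conf.Configuration;

    public class NativeLibSetup {
      public static Configuration configure() {
        Configuration conf = new Configuration();
        // Hypothetical location where the .so files were copied on every node.
        // Task JVMs launched by the tasktracker inherit these child options.
        conf.set("mapred.child.java.opts",
                 "-Xmx512m -Djava.library.path=/opt/myapp/native/lib");
        return conf;
      }
    }

A System.loadLibrary("mylib") call inside the task should then resolve against that path.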

Re: Tasktracker appearing from "nowhere"

2010-05-28 Thread Hemanth Yamijala
Peter, > I'm getting the following errors:
>
> WARN org.apache.hadoop.mapred.JobTracker: Serious problem, cannot find record
> of 'previous' heartbeat for 'tracker_m351.ra.wink.com:localhost/127.0.0.1:41885';
> reinitializing the tasktracker
>
> INFO org.apache.hadoop.mapred.JobTracker: Adding

Re: TaskTracker and DataNodes cannot connect to master node (NoRouteToHostException)

2010-05-25 Thread Hemanth Yamijala
Erik, > > I've been unable to resolve this problem on my own so I've decided to ask > for help. I've pasted the logs I have for the DataNode on one of the slave > nodes. The logs for TaskTracker are essentially the same (i.e. same > exception causing a shutdown). > > Any suggestions or hints as to wha

Re: Ordinary file pointer?

2010-05-22 Thread Hemanth Yamijala
Keith, On Sat, May 22, 2010 at 5:01 AM, Keith Wiley wrote: > On May 21, 2010, at 16:07, Mikhail Yakshin wrote: > >> On Fri, May 21, 2010 at 11:09 PM, Keith Wiley wrote: >>> My Java mapper hands its processing off to C++ through JNI.  On the C++ >>> side I need to access a file.  I have already

Re: Setting up a second cluster and getting a weird issue

2010-05-14 Thread Hemanth Yamijala
Andrew, > Just to be clear, I'm only sharing the Hadoop binaries and config files via > NFS.  I don't see how this would cause a conflict - do you have any > additional information? FWIW, we had an experience where we were storing config files on NFS on a large cluster. Randomly (and we guess

Re: Eclipse plugin

2010-05-06 Thread Hemanth Yamijala
Jim, > I have two machines, one is Windows XP and the other one is Windows Vista. I > did the same thing on both machines. The Hadoop Eclipse Plugin works fine in > Windows XP. But I got an error when I run it in Windows Vista. > > I copied hadoop-0.20.2-eclipse-plugin into the Eclipse/plugins folder and > re

Re: separate JVM flags for map and reduce tasks

2010-04-22 Thread Hemanth Yamijala
Vasilis, > I'd like to pass different JVM options for map tasks and different > ones for reduce tasks. I think it should be straightforward to add > mapred.mapchild.java.opts and mapred.reducechild.java.opts to my > conf/mapred-site.xml and process the new options accordingly in > src/mapred/org/apa

Re: How to make HOD apply more than one core on each machine?

2010-04-21 Thread Hemanth Yamijala
Song, >   I guess you are very close to my point. I mean whether we can find a way > to set the qsub parameter "ppn"? From what I could see in the HOD code, it appears you cannot override the ppn value with HOD. You could look at src/contrib/hod/hodlib/NodePools/torque.py, and specifically the m

Re: How to make HOD apply more than one core on each machine?

2010-04-16 Thread Hemanth Yamijala
Song, >   I know it is the way to set the capacity of each node, however, I want to > know how we can make Torque aware that we will run more than 1 mapred > task on each machine. Because if we don't do this, torque will assign other > cores on this machine to other tasks, which may cause a com

Re: How to make HOD apply more than one core on each machine?

2010-04-15 Thread Hemanth Yamijala
Song, >     HOD is good, and can manage a large virtual cluster on a huge physical > cluster. But the problem is, it doesn't apply more than one core for each > machine, and I have already received a complaint from our admin! > I assume what you want is the Map/Reduce cluster that is started by HOD