Re: Which version of Hadoop

2013-04-20 Thread Hemanth Yamijala
2.x.x provides NN high availability. http://hadoop.apache.org/docs/r2.0.3-alpha/hadoop-yarn/hadoop-yarn-site/HDFSHighAvailabilityWithQJM.html However, it is in alpha stage right now. Thanks hemanth On Sat, Apr 20, 2013 at 5:30 PM, Ascot Moss wrote: > Hi, > > I am new to Hadoop, from Hadoop do

Re: Errors about MRunit

2013-04-20 Thread Hemanth Yamijala
+ user@ Please do continue the conversation on the mailing list, in case others like you can benefit from / contribute to the discussion Thanks Hemanth On Sat, Apr 20, 2013 at 5:32 PM, Hemanth Yamijala wrote: > Hi, > > My code is working with having mrunit-0.9.0-incubating-hadoop1

Re: Create and write files on mounted HDFS via java api

2013-04-20 Thread Hemanth Yamijala
Sorry - no. I just wanted to know if you were using FUSE, because I knew of no other way of mounting HDFS. Basically I was wondering if some libraries needed to be on the system path for the Java programs to work. From your response it looks like you aren't using FUSE. So what are you using to mount ? Hema

Re: Errors about MRunit

2013-04-20 Thread Hemanth Yamijala
Hi, If your goal is to use the new API, I am able to get it to work with the following maven configuration: org.apache.mrunit mrunit 0.9.0-incubating hadoop1 If I switch to classifier hadoop2, I get the same errors as what you are facing. Thanks Hemanth On Sat,
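
For reference, the dependency described above would look roughly like this in a pom.xml (the coordinates and classifier come from the snippet; the surrounding XML structure is reconstructed):

    <dependency>
      <groupId>org.apache.mrunit</groupId>
      <artifactId>mrunit</artifactId>
      <version>0.9.0-incubating</version>
      <classifier>hadoop1</classifier>
    </dependency>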

Re: Mapreduce

2013-04-20 Thread Hemanth Yamijala
As this is an HBase specific question, it will be better to ask it on the HBase user mailing list. Thanks Hemanth On Fri, Apr 19, 2013 at 10:46 PM, Adrian Acosta Mitjans < amitj...@estudiantes.uci.cu> wrote: > Hello: > > I'm working on a project, and I'm using hbase for storing the da

Re: Create and write files on mounted HDFS via java api

2013-04-20 Thread Hemanth Yamijala
Are you using FUSE for mounting HDFS ? On Fri, Apr 19, 2013 at 4:30 PM, lijinlong wrote: > I mounted HDFS to a local directory for storage, that is /mnt/hdfs. I can do > the basic file operations such as create, remove, copy etc. just using Linux > commands and the GUI. But when I tried to do the same thi

Re: jobtracker is stopping because of permissions

2013-04-20 Thread Hemanth Yamijala
/mnt/san1 - owned by aye, hadmin and user mapred is trying to write to this directory. Can you look at your core-, hdfs- and mapred-site.xml to see where /mnt/san1 is configured as a value - that might make it more clear what needs to be changed. I suspect this could be one of the system directori

Re: Run multiple HDFS instances

2013-04-18 Thread Hemanth Yamijala
Are you trying to implement something like namespace federation, that's a part of Hadoop 2.0 - http://hadoop.apache.org/docs/r2.0.3-alpha/hadoop-project-dist/hadoop-hdfs/Federation.html On Thu, Apr 18, 2013 at 10:02 PM, Lixiang Ao wrote: > Actually I'm trying to do something like combining mult

Re: How to configure mapreduce archive size?

2013-04-18 Thread Hemanth Yamijala
> Thanks, > > Jane > > *From:* Hemanth Yamijala [mailto:yhema...@thoughtworks.com] > *Sent:* Wednesday, April 17, 2013 9:11 PM > *To:* user@hadoop.apache.org > *Subject:* Re: How to configure mapreduce archive size?

Re: Hadoop fs -getmerge

2013-04-17 Thread Hemanth Yamijala
I don't think that is possible. When we use -getmerge, the destination filesystem happens to be a LocalFileSystem which extends from ChecksumFileSystem. I believe that's why the CRC files are getting in. Would it not be possible for you to ignore them, since they have a fixed extension ? Thanks H

Re: How to configure mapreduce archive size?

2013-04-17 Thread Hemanth Yamijala
; I will contact them again. > > Thanks, > > Jane > > *From:* Hemanth Yamijala [mailto:yhema...@thoughtworks.com] > *Sent:* Tuesday, April 16, 2013 9:35 PM > *To:* user@hadoop.apache.org > *Subject:* Re: How to configur

Re: How to configure mapreduce archive size?

2013-04-16 Thread Hemanth Yamijala
ou help? > > Thanks. > > Xia > > *From:* Hemanth Yamijala [mailto:yhema...@thoughtworks.com] > *Sent:* Thursday, April 11, 2013 9:09 PM > *To:* user@hadoop.apache.org > *Subject:* Re: How to configure map

Re: How to configure mapreduce archive size?

2013-04-11 Thread Hemanth Yamijala
onfiguration().set(TableOutputFormat.OUTPUT_TABLE, > tableName); > > job.setNumReduceTasks(0); > > boolean b = job.waitForCompletion(true); > > *From:* Hemanth Yamijala [mailto:yhema...@thoughtworks.com] > *Sent:* Thursday

Re: Copy Vs DistCP

2013-04-11 Thread Hemanth Yamijala
AFAIK, the cp command works fully from the DFS client. It reads bytes from the InputStream created when the file is opened and writes the same to the OutputStream of the file. It does not work at the level of data blocks. A configuration io.file.buffer.size is used as the size of the buffer used in
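
In essence, a client-side copy like 'hadoop fs -cp' boils down to something like the sketch below: open the source, create the destination, and stream bytes through a client-side buffer sized by io.file.buffer.size (the paths here are made up):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IOUtils;

    public class ClientSideCopy {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        // the buffer used for the byte-by-byte client-side copy
        int buffer = conf.getInt("io.file.buffer.size", 4096);
        FSDataInputStream in = fs.open(new Path("/tmp/src.txt"));
        FSDataOutputStream out = fs.create(new Path("/tmp/dst.txt"));
        IOUtils.copyBytes(in, out, buffer, true); // 'true' closes both streams
      }
    }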

Re: How to configure mapreduce archive size?

2013-04-11 Thread Hemanth Yamijala
oot/mapred/local/archive already goes more than 1G now. Looks > like it does not do the work. Could you advise if what I did is correct? > > local.cache.size > 50 > > Thanks, > > Xia

Re: How to configure mapreduce archive size?

2013-04-08 Thread Hemanth Yamijala
Hi, This directory is used as part of the 'DistributedCache' feature. ( http://hadoop.apache.org/docs/r1.0.4/mapred_tutorial.html#DistributedCache). There is a configuration key "local.cache.size" which controls the amount of data stored under DistributedCache. The default limit is 10GB. However,
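
A sketch of how that key would be set in mapred-site.xml on the tasktrackers (the value is in bytes; 10737418240 is the 10GB default mentioned above):

    <property>
      <name>local.cache.size</name>
      <value>10737418240</value>
    </property>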

Re: Find reducer for a key

2013-03-28 Thread Hemanth Yamijala
t? > > Alberto > > On 28 March 2013 13:12, Hemanth Yamijala > wrote: > > Hmm. That feels like a join. Can't you read the input file on the map > side > > and output those keys along with the original map output keys.. That way > the > > reducer would aut

Re: Find reducer for a key

2013-03-28 Thread Hemanth Yamijala
eys a > particular reducer will receive. > So, my intention is to know the keys in the setup method to store only > the needed lines. > > Thanks, > Alberto > > > On 28 March 2013 11:01, Hemanth Yamijala > wrote: > > Hi, > > > > Not sure if

Re: Find reducer for a key

2013-03-28 Thread Hemanth Yamijala
Hi, Not sure if I am answering your question, but this is the background. Every MapReduce job has a partitioner associated to it. The default partitioner is a HashPartitioner. You can as a user write your own partitioner as well and plug it into the job. The partitioner is responsible for splittin
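
A small sketch of the default behaviour described above: HashPartitioner assigns a key to a reduce partition from its hashCode, so the target reducer for a key can be computed the same way (the key text and reducer count are made up):

    import org.apache.hadoop.io.NullWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.lib.partition.HashPartitioner;

    public class WhichReducer {
      public static void main(String[] args) {
        int numReducers = 10;                    // assumed reducer count
        Text key = new Text("some-key");
        HashPartitioner<Text, NullWritable> p = new HashPartitioner<Text, NullWritable>();
        // equivalent to (key.hashCode() & Integer.MAX_VALUE) % numReducers
        int partition = p.getPartition(key, NullWritable.get(), numReducers);
        System.out.println("Key goes to reducer " + partition);
      }
    }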

Re: Auto clean DistCache?

2013-03-27 Thread Hemanth Yamijala
I don't think it is documented in mapred-default.xml, where it should ideally be. I could see it only in code. You can take a look at it here, if you are interested: http://goo.gl/k5xsI Thanks Hemanth On Wed, Mar 27, 2013 at 7:07 PM, Jean-Marc Spaggiari < jean-m...@spaggiari.org> wrote: > Oh! g

Re: Child JVM memory allocation / Usage

2013-03-27 Thread Hemanth Yamijala
or="./dump.sh" > attempt_201302211510_81218_m_00_0: # Executing /bin/sh -c > "./dump.sh"... > attempt_201302211510_81218_m_00_0: put: File myheapdump.hprof does not > exist. > attempt_201302211510_81218_m_00_0: log4j:WARN No appenders could b

Re: Child JVM memory allocation / Usage

2013-03-27 Thread Hemanth Yamijala
yList.ensureCapacity(ArrayList.java:167) > at java.util.ArrayList.add(ArrayList.java:351) > at > com.hadoop.publicationMrPOC.PublicationMapper.configure(PublicationMapper.java:59) > ... 22 more > > > > > > On Wed, Mar 27, 2013 at 10:16 AM, Hemanth

Re: How to tell my Hadoop cluster to read data from an external server

2013-03-26 Thread Hemanth Yamijala
The stack trace indicates the job client is trying to submit a job to the MR cluster and it is failing. Are you certain that at the time of submitting the job, the JobTracker is running ? (On localhost:54312) ? Regarding using a different file system - it depends a lot on what file system you are

Re: Child JVM memory allocation / Usage

2013-03-26 Thread Hemanth Yamijala
pdump.hprof -XX:OnOutOfMemoryError=./dump.sh' > > This should create the heap dump on hdfs at /tmp/myheapdump_knoguchi. > > Koji > > > On Mar 26, 2013, at 11:53 AM, Hemanth Yamijala wrote: > > > Hi, > > > > I tried to use the -XX:+HeapDumpOnOutOfMemoryE

Re: Child JVM memory allocation / Usage

2013-03-26 Thread Hemanth Yamijala
matching a pattern. However, these are NOT retaining the current working directory. Hence, there is no option to get this from a cluster AFAIK. You are effectively left with the jmap option on pseudo distributed cluster I think. Thanks Hemanth On Tue, Mar 26, 2013 at 11:37 AM, Hemanth Yamijala

Re: Child JVM memory allocation / Usage

2013-03-25 Thread Hemanth Yamijala
aintained by third party. > I only have have a edge node through which I can submit the jobs. > > Is there any other way of getting the dump instead of physically going to > that machine and checking out. > > > > On Tue, Mar 26, 2013 at 10:12 AM, Hemanth Yamijala <

Re: Child JVM memory allocation / Usage

2013-03-25 Thread Hemanth Yamijala
. So I am trying to read > the whole file and load it into a list in the mapper. > > For each and every record I look in this file which I got from distributed > cache. > > — > Sent from iPhone > > > On Mon, Mar 25, 2013 at 6:39 PM, Hemanth Yamijala < > yhema...@t

Re: Child JVM memory allocation / Usage

2013-03-25 Thread Hemanth Yamijala
tried out your suggestion loading 420 MB file into memory. It threw java > heap space error. > > I am not sure where this 1.6 GB of configured heap went to ? > > > On Mon, Mar 25, 2013 at 12:01 PM, Hemanth Yamijala < > yhema...@thoughtworks.com> wrote: > >> Hi,

Re: Child JVM memory allocation / Usage

2013-03-24 Thread Hemanth Yamijala
Hi, The free memory might be low, just because GC hasn't reclaimed what it can. Can you just try reading in the data you want to read and see if that works ? Thanks Hemanth On Mon, Mar 25, 2013 at 10:32 AM, nagarjuna kanamarlapudi < nagarjuna.kanamarlap...@gmail.com> wrote: > io.sort.mb = 256

Re: About running a simple wordcount mapreduce

2013-03-24 Thread Hemanth Yamijala
Which version of Hadoop are you using ? A quick search shows me a bug https://issues.apache.org/jira/browse/HADOOP-5241 that seems to show similar symptoms. However, that was fixed a long while ago. On Sat, Mar 23, 2013 at 4:40 PM, Redwane belmaati cherkaoui < reduno1...@googlemail.com> wrote: >

Re: MapReduce Failed and Killed

2013-03-24 Thread Hemanth Yamijala
Any MapReduce task needs to communicate with the tasktracker that launched it periodically in order to let the tasktracker know it is still alive and active. The time for which silence is tolerated is controlled by a configuration property mapred.task.timeout. It looks like in your case, this has
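
For reference, a sketch of how the timeout looks in mapred-site.xml (600000 ms, i.e. 10 minutes, is the usual default); raising it is a workaround, but the cleaner fix is for long-running tasks to call reporter.progress() / context.progress() periodically:

    <property>
      <name>mapred.task.timeout</name>
      <value>600000</value>
    </property>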

Re: Too many open files error with YARN

2013-03-21 Thread Hemanth Yamijala
hich says it is for backporting 3357 to branch 0.23 > > So, I don't understand whether the fix is really in 2.0.0-alpha, request > you to please clarify. > > Thanks, > Kishore > > > > > > On Thu, Mar 21, 2013 at 9:57 AM, Hemanth Yamijala <

Re: Too many open files error with YARN

2013-03-20 Thread Hemanth Yamijala
There was an issue related to hung connections (HDFS-3357). But the JIRA indicates the fix is available in Hadoop-2.0.0-alpha. Still, would be worth checking on Sandy's suggestion On Wed, Mar 20, 2013 at 11:09 PM, Sandy Ryza wrote: > Hi Kishore, > > 50010 is the datanode port. Does your lsof ind

Re: map reduce and sync

2013-02-24 Thread Hemanth Yamijala
'd > rather keep it like this if I can make it work. > > Any idea besides hadoop version? > > Thanks! > > Lucas > > > > On Sat, Feb 23, 2013 at 11:54 AM, Hemanth Yamijala < > yhema...@thoughtworks.com> wrote: > >> Hi Lucas, >> >> I tried somet

Re: map reduce and sync

2013-02-23 Thread Hemanth Yamijala
> > hadoop -fs -tail works just fine, and reading the file using > org.apache.hadoop.fs.FSDataInputStream also works ok. > > Last thing, the web interface doesn't see the contents, and command hadoop > -fs -ls says the file is empty. > > What am I doing wrong? > >

Re: Reg job tracker page

2013-02-23 Thread Hemanth Yamijala
Yes. It corresponds to the JT start time. Thanks hemanth On Sat, Feb 23, 2013 at 5:37 PM, Manoj Babu wrote: > Bharath, > I can understand that it's a time stamp. > What does the identifier mean? Does it hold the job tracker instance > start time? > > Cheers! > Manoj. > > > On Sat, Feb 23, 2013

Re: Trouble in running MapReduce application

2013-02-23 Thread Hemanth Yamijala
Can you try this ? Pick a class like WordCount from your package and execute this command: javap -classpath -verbose org.myorg.Wordcount | grep version. For e.g. here's what I get for my class: $ javap -verbose WCMapper | grep version minor version: 0 major version: 50 Please paste the out

Re: map reduce and sync

2013-02-22 Thread Hemanth Yamijala
Could you please clarify, are you opening the file in your mapper code and reading from there ? Thanks Hemanth On Friday, February 22, 2013, Lucas Bernardi wrote: > Hello there, I'm trying to use hadoop map reduce to process an open file. The > writing process, writes a line to the file and sync

Re: Hadoop efficient resource isolation

2013-02-21 Thread Hemanth Yamijala
Supporting a multiuser scenario like this is always hard under Hadoop. There are a few configuration knobs that offer some administrative control and protection. Specifically for the problem you describe, you could probably set mapreduce.{map|reduce}.child.ulimit on the tasktrackers, so that any j

Re: How to add another file system in Hadoop

2013-02-21 Thread Hemanth Yamijala
I may be guessing here a bit. Basically a filesystem is identified by the protocol part of the URI of a file - so a file on the S3 filesystem will have a URI like s3://... If you look at the core-default.xml file in Hadoop source, you will see configuration keys like fs.<scheme>.impl and the value is a cla
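
As an illustration, entries of that form appear in core-default.xml of Hadoop 1.x roughly like this (scheme mapped to the implementing FileSystem class):

    <property>
      <name>fs.s3.impl</name>
      <value>org.apache.hadoop.fs.s3.S3FileSystem</value>
    </property>
    <property>
      <name>fs.s3n.impl</name>
      <value>org.apache.hadoop.fs.s3native.NativeS3FileSystem</value>
    </property>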

Re: OutOfMemoryError during reduce shuffle

2013-02-21 Thread Hemanth Yamijala
from occurring, by disallowing the in-memory shuffle from using > up all the JVM heap. > > Is it possible that the continued existence of this OutOfMemoryError > represents a bug in ShuffleRamManager, or in some other code that is > intended to prevent this situation from occurring? &

Re: OutOfMemoryError during reduce shuffle

2013-02-20 Thread Hemanth Yamijala
There are a few tweaks in configuration that may help. Can you please look at http://hadoop.apache.org/docs/r1.0.4/mapred_tutorial.html#Shuffle%2FReduce+Parameters Also, since you have mentioned reducers are unbalanced, could you use a custom partitioner to balance out the outputs ? Or just increas

Re: JUint test failing in HDFS when building Hadoop from source.

2013-02-19 Thread Hemanth Yamijala
Hi, In the past, some tests have been flaky. It would be good if you can search jira and see whether this is a known issue. Else, please file it, and if possible, provide a patch. :) Regarding whether this will be a reliable build, it depends a little bit on what you are going to use it for. For

Re: ClassNotFoundException in Main

2013-02-19 Thread Hemanth Yamijala
, 2013, Fatih Haltas wrote: > Yes, I reorganized the packages but still I am getting the same error. My hadoop > version is 1.0.4 > > On Tuesday, February 19, 2013, Hemanth Yamijala wrote: > > I am not sure if that will actually work, because the class is defined to >

Re: ClassNotFoundException in Main

2013-02-19 Thread Hemanth Yamijala
a:266) > at org.apache.hadoop.util.RunJar.main(RunJar.java:149) > > > > On Tue, Feb 19, 2013 at 8:10 PM, Hemanth Yamijala < > yhema...@thoughtworks.com> wrote: > > Sorry. I did not read the mail correctly. I think the error is in how the > jar has been

Re: ClassNotFoundException in Main

2013-02-19 Thread Hemanth Yamijala
Sorry. I did not read the mail correctly. I think the error is in how the jar has been created. The classes start with root as wordcount_classes, instead of org. Thanks Hemanth On Tuesday, February 19, 2013, Hemanth Yamijala wrote: > Have you used the Api setJarByClass in your main prog

Re: ClassNotFoundException in Main

2013-02-19 Thread Hemanth Yamijala
Have you used the API setJarByClass in your main program? http://hadoop.apache.org/docs/r1.0.4/api/org/apache/hadoop/mapreduce/Job.html#setJarByClass(java.lang.Class) On Tuesday, February 19, 2013, Fatih Haltas wrote: > Hi everyone, > > I know this is the common mistake to not specify the class
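
A minimal driver sketch showing the call; the mapper/reducer wiring is left as a placeholder since it is not part of the thread:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class WordCount {
      public static void main(String[] args) throws Exception {
        Job job = new Job(new Configuration(), "word count");
        // tells Hadoop which jar to ship to the cluster: the one containing this class
        job.setJarByClass(WordCount.class);
        // job.setMapperClass(...); job.setReducerClass(...);  // your own classes here
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
      }
    }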

Re: Database insertion by HAdoop

2013-02-19 Thread Hemanth Yamijala
Hemanth sir. BTW, what exactly is > the kind of processing which you are planning to do on your data. > > Warm Regards, > Tariq > https://mtariq.jux.com/ > cloudfront.blogspot.com > > > On Tue, Feb 19, 2013 at 6:44 AM, Hemanth Yamijala < > yhema...@thoughtworks.co

Re: Database insertion by HAdoop

2013-02-18 Thread Hemanth Yamijala
08, > and we don't need to develop a professional app, we just need to develop it > fast and get our experiment results soon. > Thanks > > > On 02/18/2013 11:58 PM, Hemanth Yamijala wrote: > > What database is this ? Was hbase mentioned ? > > On Monday, February 18, 2013,

Re: Database insertion by HAdoop

2013-02-18 Thread Hemanth Yamijala
What database is this ? Was hbase mentioned ? On Monday, February 18, 2013, Mohammad Tariq wrote: > Hello Masoud, > > You can use the Bulk Load feature. You might find it more > efficient than normal client APIs or using the TableOutputFormat. > > The bulk load feature uses a MapReduce

Re: How to install Oozie 3.3.1 on Hadoop 1.1.1

2013-02-15 Thread Hemanth Yamijala
Hi, It may be useful to post this question on the oozie user mailing list. There are likely to be more expert users there. u...@oozie.apache.org Thanks Hemanth On Friday, February 15, 2013, anand verma wrote: > Hi, > > I am struggling for many days to install Oozie 3.3.1 on Hadoop 1.1.1. > Oozi

Re: How to understand DataNode usages ?

2013-02-14 Thread Hemanth Yamijala
This seems to be related to the % used capacity at a datanode. The values are computed for all the live datanodes, and the range / central limits / deviations are computed based on a sorted list of the values. Thanks hemanth On Thu, Feb 14, 2013 at 2:42 PM, Dhanasekaran Anbalagan wrote: > Hi Gu

Re: Java submit job to remote server

2013-02-12 Thread Hemanth Yamijala
Can you please include the complete stack trace and not just the root. Also, have you set fs.default.name to an HDFS location like hdfs://localhost:9000 ? Thanks Hemanth On Wednesday, February 13, 2013, Alex Thieme wrote: > Thanks for the prompt reply and I'm sorry I forgot to include the > excep
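
For reference, the setting being asked about would look like this in core-site.xml (host and port are only examples):

    <property>
      <name>fs.default.name</name>
      <value>hdfs://localhost:9000</value>
    </property>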

Re: Confused about splitting

2013-02-10 Thread Hemanth Yamijala
Adding on to the response, looking at the existing source code of LineRecordReader, which has a similar function to read across HDFS blocks to align with line boundaries may also help you to write similar code. Harsh had responded with more specific details as to where to look on the list before. F

Re: Cannot use env variables in "hodrc"

2013-02-08 Thread Hemanth Yamijala
Hi, Hadoop On Demand is no longer supported with recent releases of Hadoop. There is no separate user list for HOD related questions. Which version of Hadoop are you using right now ? Thanks hemanth On Wed, Feb 6, 2013 at 8:59 PM, Mehmet Belgin wrote: > Hello again, > > Considering that I hav

Re: Issue with running hadoop program using eclipse

2013-01-31 Thread Hemanth Yamijala
Previously, I have resolved this error by building a jar and then using the API job.setJarByClass(.class). Can you please try that once ? On Thu, Jan 31, 2013 at 6:40 PM, Vikas Jadhav wrote: > Hi I know it class not found error > but I have Map and reduce Class as part of Driver class > So what

Re: Filesystem closed exception

2013-01-30 Thread Hemanth Yamijala
er code not to close the FS. > It will go away when the task ends anyway. > > Thx > > > On Thu, Jan 24, 2013 at 5:26 PM, Hemanth Yamijala < > yhema...@thoughtworks.com> wrote: > >> Hi, >> >> We are noticing a problem where we get a filesystem closed exc

Re: How to find Blacklisted Nodes via cli.

2013-01-30 Thread Hemanth Yamijala
Hi, Part answer: you can get the blacklisted tasktrackers using the command line: mapred job -list-blacklisted-trackers. Also, I think that a blacklisted tasktracker becomes 'unblacklisted' if it works fine after some time. Though I am not very sure about this. Thanks hemanth On Wed, Jan 30,

Re: TT nodes distributed cache failure

2013-01-25 Thread Hemanth Yamijala
Could you post the stack trace from the job logs. Also looking at the task tracker logs on the failed nodes may help. Thanks Hemanth On Friday, January 25, 2013, Terry Healy wrote: > Running hadoop-0.20.2 on a 20 node cluster. > > When running a Map/Reduce job that uses several .jars loaded into

Re: mappers-node relationship

2013-01-25 Thread Hemanth Yamijala
This may be of some use, about how maps are decided: http://wiki.apache.org/hadoop/HowManyMapsAndReduces Thanks Hemanth On Friday, January 25, 2013, jamal sasha wrote: > Hi. > A very very lame question. > Does the number of mappers depend on the number of nodes I have? > How I imagine map-reduce

Re: Filesystem closed exception

2013-01-25 Thread Hemanth Yamijala
> > On Fri, Jan 25, 2013 at 6:56 AM, Hemanth Yamijala > wrote: > > Hi, > > > > We are noticing a problem where we get a filesystem closed exception > when a > > map task is done and is finishing execution. By map task, I literally > mean > > the MapTask clas

Filesystem closed exception

2013-01-24 Thread Hemanth Yamijala
Hi, We are noticing a problem where we get a filesystem closed exception when a map task is done and is finishing execution. By map task, I literally mean the MapTask class of the map reduce code. Debugging this we found that the mapper is getting a handle to the filesystem object and itself calli

Re: Where do/should .jar files live?

2013-01-22 Thread Hemanth Yamijala
On top of what Bejoy said, just wanted to add that when you submit a job to Hadoop using the hadoop jar command, the jars which you reference in the command on the edge/client node will be picked up by Hadoop and made available to the cluster nodes where the mappers and reducers run. Thanks Hemant

Re: passing arguments to hadoop job

2013-01-21 Thread Hemanth Yamijala
t, Reporter reporter) throws > IOException { > int sum = baseSum; > while (values.hasNext()) { > sum += values.next().get(); > } > output.collect(key, new IntWritable(sum)); > } > } > > On Mon, Jan 21, 2013 at 8:29 PM, Hemanth Yamijala

Re: passing arguments to hadoop job

2013-01-21 Thread Hemanth Yamijala
Hi, Please note that you are referring to a very old version of Hadoop. The current stable release is Hadoop 1.x. The API has changed in 1.x. Take a look at the wordcount example here: http://hadoop.apache.org/docs/r1.0.4/mapred_tutorial.html#Example%3A+WordCount+v2.0 But, in principle your meth
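
A sketch of the principle with the new (1.x) API: set the parameter on the job Configuration in the driver and read it back in setup(). The key "wordcount.base.sum" is made up, mirroring the baseSum idea in the quoted code:

    // In the driver:
    //   Configuration conf = new Configuration();
    //   conf.setInt("wordcount.base.sum", 10);
    //   Job job = new Job(conf, "word count");

    import java.io.IOException;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Reducer;

    public class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
      private int baseSum;

      @Override
      protected void setup(Context context) {
        // read the parameter back from the job configuration
        baseSum = context.getConfiguration().getInt("wordcount.base.sum", 0);
      }

      @Override
      protected void reduce(Text key, Iterable<IntWritable> values, Context context)
          throws IOException, InterruptedException {
        int sum = baseSum;
        for (IntWritable v : values) {
          sum += v.get();
        }
        context.write(key, new IntWritable(sum));
      }
    }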

Re: How to unit test mappers reading data from DistributedCache?

2013-01-17 Thread Hemanth Yamijala
Hi, Not sure how to do it using MRUnit, but should be possible to do this using a mocking framework like Mockito or EasyMock. In a mapper (or reducer), you'd use the Context classes to get the DistributedCache files. By mocking these to return what you want, you could potentially run a true unit t
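
A rough sketch of that idea with Mockito (not MRUnit): the hypothetical LookupMapper is assumed to read the local path of its side file from a made-up configuration key instead of calling DistributedCache directly, and to sit in the same package as the test so its protected setup()/map() are callable:

    import static org.mockito.Mockito.*;

    import java.io.File;
    import java.io.PrintWriter;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.junit.Test;

    public class LookupMapperTest {
      @Test
      @SuppressWarnings("unchecked")
      public void mapperUsesLookupFile() throws Exception {
        // a small local stand-in for the DistributedCache file
        File lookup = File.createTempFile("lookup", ".txt");
        PrintWriter w = new PrintWriter(lookup);
        w.println("k1\tv1");
        w.close();

        Configuration conf = new Configuration();
        conf.set("lookup.file.path", lookup.getAbsolutePath()); // made-up key

        Mapper<LongWritable, Text, Text, Text>.Context ctx = mock(Mapper.Context.class);
        when(ctx.getConfiguration()).thenReturn(conf);

        LookupMapper mapper = new LookupMapper();  // hypothetical mapper under test
        mapper.setup(ctx);                         // loads the lookup file via the mocked conf
        mapper.map(new LongWritable(0), new Text("k1"), ctx);

        verify(ctx).write(new Text("k1"), new Text("v1"));
      }
    }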

Re: tcp error

2013-01-16 Thread Hemanth Yamijala
failed when I tried to open it. Restarting the daemons helped. I don't think this problem will come in a normal up-and-running production cluster. Thanks hemanth On Thu, Jan 17, 2013 at 9:48 AM, Hemanth Yamijala wrote: > At the place where you get the error, can you cross check what th

Re: tcp error

2013-01-16 Thread Hemanth Yamijala
At the place where you get the error, can you cross check what the URL is that is being accessed ? Also, can you compare it with the URLs of pages before this that work ? Thanks hemanth On Thu, Jan 17, 2013 at 1:08 AM, jamal sasha wrote: > I am inside a network where I need proxy settings to

Re: config file loactions in Hadoop 2.0.2

2013-01-15 Thread Hemanth Yamijala
Hi, One place where I could find the capacity-scheduler.xml was from source - hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/resources. AFAIK, the masters file is only used for starting the secondary namenode - which has in 2.x been replaced by a pr

Re: FileSystem.workingDir vs mapred.local.dir

2013-01-15 Thread Hemanth Yamijala
Hi, AFAIK, the mapred.local.dir property refers to a set of directories under which different types of data related to mapreduce jobs are stored - for e.g. intermediate data, localized files for a job etc. The working directory for a mapreduce job is configured under a sub directory within one of

Re: Compile error using contrib.utils.join package with new mapreduce API

2013-01-15 Thread Hemanth Yamijala
in 2.x and trunk. Could you check if this provides functionality you require - so we at least know there is new API support in later versions ? Thanks Hemanth On Mon, Jan 14, 2013 at 7:45 PM, Hemanth Yamijala wrote: > Hi, > > No. I didn't find any reference to a working sample.

Re: Compile error using contrib.utils.join package with new mapreduce API

2013-01-14 Thread Hemanth Yamijala
.co.uk> wrote: > Thanks Hemanth > > I appreciate your response > > Did you find any working example of it in use? It looks to me like I’d > still be tied to the old API > > Thanks > > Mike > > *From:* Hemanth

Re: log server for hadoop MR jobs??

2013-01-13 Thread Hemanth Yamijala
To add to that, log aggregation is a feature available with Hadoop 2.0 (where mapreduce is re-written to YARN). The functionality is available via the History Server: http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/HistoryServerRest.html Thanks hemanth On Sat, Jan 12, 2013 a

Re: queues in haddop

2013-01-11 Thread Hemanth Yamijala
Queues in the capacity scheduler are logical data structures into which MapReduce jobs are placed to be picked up by the JobTracker / Scheduler framework, according to some capacity constraints that can be defined for a queue. So, given your use case, I don't think Capacity Scheduler is going to d

Re: JobCache directory cleanup

2013-01-11 Thread Hemanth Yamijala
11, 2013 at 3:28 PM, Ivan Tretyakov wrote: > Thanks for replies! > > keep.failed.task.files set to false. > Config of one of the jobs attached. > > > On Fri, Jan 11, 2013 at 5:44 AM, Hemanth Yamijala < > yhema...@thoughtworks.com> wrote: > >> Good point. F

Re: JobCache directory cleanup

2013-01-10 Thread Hemanth Yamijala
Good point. Forgot that one :-) On Thu, Jan 10, 2013 at 10:53 PM, Vinod Kumar Vavilapalli < vino...@hortonworks.com> wrote: > > > Can you check the job configuration for these ~100 jobs? Do they have > keep.failed.task.files set to true? If so, these files won't be deleted. If > it doesn't, it c

Re: Not committing output in map reduce

2013-01-10 Thread Hemanth Yamijala
Is this the same as: http://stackoverflow.com/questions/6137139/how-to-save-only-non-empty-reducers-output-in-hdfs? i.e. LazyOutputFormat, etc. ? On Thu, Jan 10, 2013 at 4:51 PM, Pratyush Chandra < chandra.praty...@gmail.com> wrote: > Hi, > > I am using s3n as file system. I do not wish to crea
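
For reference, a sketch of the LazyOutputFormat approach from that link: wrap the real output format so part files are created only when the first record is written, leaving nothing behind for empty reducers:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.output.LazyOutputFormat;
    import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

    public class LazyOutputExample {
      public static Job configure() throws Exception {
        Job job = new Job(new Configuration(), "lazy output example");
        // instead of job.setOutputFormatClass(TextOutputFormat.class):
        LazyOutputFormat.setOutputFormatClass(job, TextOutputFormat.class);
        return job;
      }
    }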

Re: JobCache directory cleanup

2013-01-10 Thread Hemanth Yamijala
I just verified it with my Hadoop 1.0.2 version Thanks Hemanth > > > On Thu, Jan 10, 2013 at 8:18 AM, Hemanth Yamijala < > yhema...@thoughtworks.com> wrote: > >> Hi, >> >> The directory name you have provided is >> /data?/mapred/local/taskTracker/perso

Re: JobCache directory cleanup

2013-01-09 Thread Hemanth Yamijala
Hi, The directory name you have provided is /data?/mapred/local/taskTracker/persona/jobcache/. This directory is used by the TaskTracker (slave) daemons to localize job files when the tasks are run on the slaves. Hence, I don't think this is related to the parameter "mapreduce.jobtracker.retiredj

Re: Why the official Hadoop Documents are so messy?

2013-01-08 Thread Hemanth Yamijala
Hi, I am not sure if your complaint is as much about the changing interfaces as it is about documentation. Please note that versions prior to 1.0 did not have stable interfaces as a major requirement. Not by choice, but because the focus was on seemingly more important functionality, stability, p

Re: Differences between 'mapped' and 'mapreduce' packages

2013-01-07 Thread Hemanth Yamijala
From a user perspective, at a high level, the mapreduce package can be thought of as having user facing client code that can be invoked, extended etc as applicable from client programs. The mapred package is to be treated as internal to the mapreduce system, and shouldn't directly be used unless

Re: Reg: Fetching TaskAttempt Details from a RunningJob

2013-01-07 Thread Hemanth Yamijala
Hi, In Hadoop 1.0, I don't think this information is exposed. The TaskInProgress is an internal class and hence cannot / should not be used from client applications. The only way out seems to be to screen scrape the information from the Jobtracker web UI. If you can live with completed events, th

Re: Skipping entire task

2013-01-06 Thread Hemanth Yamijala
Hi, Are tasks being executed multiple times due to failures? Sorry, it was not very clear from your question. Thanks hemanth On Sat, Jan 5, 2013 at 7:44 PM, David Parks wrote: > Thinking here... if you submitted the task programmatically you should be > able to capture the failure of the task

Re: What is the preferred way to pass a small number of configuration parameters to a mapper or reducer

2012-12-30 Thread Hemanth Yamijala
If it is a small number, A seems the best way to me. On Friday, December 28, 2012, Kshiva Kps wrote: > > Which one is current .. > > > What is the preferred way to pass a small number of configuration > parameters to a mapper or reducer? > > > > > > *A. *As key-value pairs in the jobconf object.

Re: Selecting a task for the tasktracker

2012-12-27 Thread Hemanth Yamijala
Hi, Firstly, I am talking about Hadoop 1.0. Please note that in Hadoop 2.x and trunk, the Mapreduce framework is completely revamped to Yarn ( http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/YARN.html) and you may need to look at different interfaces for building your own schedu

Re: What does mapred.map.tasksperslot do?

2012-12-27 Thread Hemanth Yamijala
David, Could you please tell what version of Hadoop you are using ? I don't see this parameter in the stable (1.x) or current branch. I only see references to it with respect to EMR and with Hadoop 0.18 or so. On Thu, Dec 27, 2012 at 1:51 PM, David Parks wrote: > I didn’t come up with much in

Re: Sane max storage size for DN

2012-12-13 Thread Hemanth Yamijala
This is a dated blog post, so it would help if someone with current HDFS knowledge can validate it: http://developer.yahoo.com/blogs/hadoop/posts/2010/05/scalability_of_the_hadoop_dist/ . There is a bit about the RAM required for the Namenode and how to compute it: You can look at the 'Namespace

Re: "attempt*" directories in user logs

2012-12-10 Thread Hemanth Yamijala
However, in the case Oleg is talking about the attempts are: attempt_201212051224_0021_m_00_0 attempt_201212051224_0021_m_02_0 attempt_201212051224_0021_m_03_0 These aren't multiple attempts of a single task, are they ? They are actually different tasks. If they were multiple attempts,

Re: Map tasks processing some files multiple times

2012-12-06 Thread Hemanth Yamijala
gh to work out what I had done. > > Dave > > *From:* Hemanth Yamijala [mailto:yhema...@thoughtworks.com] > *Sent:* Thursday, December 06, 2012 3:25 PM > *To:* user@hadoop.apache.org > *Subject:* Re: Map tasks processing some files

Re: Map tasks processing some files multiple times

2012-12-06 Thread Hemanth Yamijala
David, You are using FileNameTextInputFormat. This is not in Hadoop source, as far as I can see. Can you please confirm where this is being used from ? It seems like the isSplitable method of this input format may need checking. Another thing, given you are adding the same input format for all f
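
For context, an input format of that sort typically looks like the sketch below: extend TextInputFormat and override isSplitable (note the actual method name) so each file goes to exactly one mapper:

    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapreduce.JobContext;
    import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;

    public class WholeFileTextInputFormat extends TextInputFormat {
      @Override
      protected boolean isSplitable(JobContext context, Path file) {
        return false;  // never split: one file -> one map task
      }
    }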

Re: Issue with third party library

2012-12-05 Thread Hemanth Yamijala
Sampath, You mentioned that the file is present in the tasktracker local dir, could you please tell us the full path ? I am wondering if setting the full path will have any impact, rather than specifying the relative path. Another option may be to try to use the addCacheArchive and createSymLink
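
A sketch of the addCacheArchive / createSymlink option mentioned above (the archive path and symlink name are examples): the fragment after '#' becomes a symlink in each task's working directory, so the task can use a fixed relative path:

    import java.net.URI;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.filecache.DistributedCache;

    public class CacheSetup {
      public static void configure(Configuration conf) throws Exception {
        DistributedCache.addCacheArchive(new URI("/apps/libs/mylib.zip#mylib"), conf);
        DistributedCache.createSymlink(conf);  // enable the '#mylib' symlink in task dirs
      }
    }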

Re: Changing hadoop configuration without restarting service

2012-12-04 Thread Hemanth Yamijala
Generally true for the framework config files, but some of the supplementary features can be refreshed without restart. For e.g. scheduler configuration, host files (for included / excluded nodes) ... On Tue, Dec 4, 2012 at 5:33 AM, Cristian Cira wrote: > No. You will have to restart hadoop. Hot

Re: Using Hadoop infrastructure with input streams instead of key/value input

2012-12-04 Thread Hemanth Yamijala
Hi, I have not tried this myself before, but would libhdfs help ? http://hadoop.apache.org/docs/stable/libhdfs.html Thanks Hemanth On Mon, Dec 3, 2012 at 9:52 PM, Wheeler, Bill NPO < bill.npo.whee...@intel.com> wrote: > I am trying to use Hadoop’s partitioning/scheduling/storage > infrastruc

Re: hadoop current properties

2012-11-29 Thread Hemanth Yamijala
It is coming from the default properties file - mapred-default.xml. The order of loading configuration in Hadoop is default.xml > site.xml > job.xml. mapred.task.tracker.report.address 127.0.0.1:0 The interface and port that task tracker server listens on. Since it is only connected to by
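
In other words, the entry flattened into the snippet above is, in essence, this default from mapred-default.xml (127.0.0.1:0 meaning the local interface on an ephemeral port):

    <property>
      <name>mapred.task.tracker.report.address</name>
      <value>127.0.0.1:0</value>
    </property>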

Re: Failed to call hadoop API

2012-11-29 Thread Hemanth Yamijala
Hi, Little confused about where JNI comes in here (you mentioned this in your original email). Also, where do you want to get the information for the hadoop job ? Is it in a program that is submitting a job, or some sort of monitoring application that is monitoring jobs submitted to a cluster by o

Re: problem using s3 instead of hdfs

2012-10-16 Thread Hemanth Yamijala
odAccessorImpl.java:25) > at java.lang.reflect.Method.invoke(Method.java:597) > at org.apache.hadoop.util.RunJar.main(RunJar.java:197) > > > On Tue, Oct 16, 2012 at 3:11 AM, Hemanth Yamijala < > yhema...@thoughtworks.com> wrote: > >> Hi, >> >> I've n

Re: problem using s3 instead of hdfs

2012-10-16 Thread Hemanth Yamijala
Hi, I've not tried this on S3. However, the directory mentioned in the exception is based on the value of this particular configuration key: mapreduce.jobtracker.staging.root.dir. This defaults to ${hadoop.tmp.dir}/mapred/staging. Can you please set this to an S3 location and try ? Thanks Hemanth
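
A sketch of that suggestion in mapred-site.xml (the bucket name is made up; whether the job tracker accepts an S3 staging directory is exactly what the thread is trying to establish):

    <property>
      <name>mapreduce.jobtracker.staging.root.dir</name>
      <value>s3n://my-bucket/mapred/staging</value>
    </property>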

Re: Question about how to find which file takes the longest time to process and how to assign more mappers to process that particular file

2012-10-04 Thread Hemanth Yamijala
Hi, Roughly, this information will be available under the 'Hadoop map task list' page in the Mapreduce web ui (in Hadoop-1.0, which I am assuming is what you are using). You can reach this page by selecting the running tasks link from the job information page. The page has a table that lists all t

Re: Submitting a job to a remote cluster

2012-10-04 Thread Hemanth Yamijala
Hi, Could you please share your setup details - i.e. how many slaves, how many datanodes and tasktrackers. Also, the configuration - in particular hdfs-site.xml ? To answer your question: the datanode address is picked up from hdfs-site.xml, or hdfs-default.xml from the property dfs.datanode.addr

Re: hadoop issue on distributed cluster

2012-10-04 Thread Hemanth Yamijala
Hi, Didn't check everything. But found this in the mapred-site.xml: mapred.job.tracker hdfs://10.99.42.9:8021/ true The value shouldn't be an HDFS URL. Can you please fix this and try ? On Thu, Oct 4, 2012 at 12:32 PM, Ajit Kumar Shreevastava < ajit.shreevast...@hcl.com> wrote: > Hi All,
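
For comparison, the property flattened into the snippet above should take a plain host:port value, roughly:

    <property>
      <name>mapred.job.tracker</name>
      <value>10.99.42.9:8021</value>
      <final>true</final>
    </property>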
