Pre-tasks to redirecting to JobHistoryServer

2016-01-24 Thread Prashant Kommireddi
Hello folks, Have a question - what classes can I look at to understand the way in which application counters/logs are copied to the JHS before redirecting clients to it? Thanks, Prashant

Re: Pig 0.14.0 on Hadoop 2.6.0 deprecation errors

2015-05-12 Thread Prashant Kommireddi
Something that needs correction, just that no one has gotten around to doing it. Please feel free to open a JIRA, even better if you would like to contribute a fix. On Tuesday, May 12, 2015, Anand Murali anand_vi...@yahoo.com wrote: Oliver: Many thanks for reply. If it is not an error why is

Re: max number of application master in YARN

2015-04-30 Thread Prashant Kommireddi
Take a look at yarn.scheduler.capacity.maximum-am-resource-percent On Thu, Apr 30, 2015 at 11:38 AM, Shushant Arora shushantaror...@gmail.com wrote: Is there any configuration in MR2 and YARN to limit concurrent max applications by setting max limit on ApplicationMasters in the cluster?
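For reference, a minimal sketch of the corresponding capacity-scheduler.xml entry; the value 0.2 is illustrative, the default is 0.1 (i.e. ApplicationMasters may consume at most 10% of cluster resources):

```xml
<!-- capacity-scheduler.xml; 0.2 is an illustrative value, the default is 0.1 -->
<property>
  <name>yarn.scheduler.capacity.maximum-am-resource-percent</name>
  <value>0.2</value>
</property>
```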

ProcfsBasedProcessTree

2014-05-28 Thread Prashant Kommireddi
What does ProcfsBasedProcessTree do? Trying to understand a bunch of these messages in the logs of a job that is stuck forever: May 25, 2014 4:01:51 AM org.apache.hadoop.yarn.util.ProcfsBasedProcessTree constructProcessInfo INFO: The process 22793 may have finished in the interim. May 25, 2014

Re: Rewriting Ab-Initio scripts using Hadoop MapReduce

2013-12-27 Thread Prashant Kommireddi
What specific info are you looking for? On Monday, December 23, 2013, Manoj Babu wrote: Hi All, Can anybody share their experience on Rewriting Ab-Initio scripts using Hadoop MapReduce? Cheers! Manoj.

Re: ResourceManager webapp code runs OOM

2013-10-22 Thread Prashant Kommireddi
also be increased. HTH Ravi On Monday, October 21, 2013 5:54 PM, Prashant Kommireddi prash1...@gmail.com wrote: Hello, We are noticing the RM running out of memory in the webapp code. It happens in org.apache.hadoop.yarn.server.resourcemanager.webapp.AppsBlock.renderBlock(Block html

ResourceManager webapp code runs OOM

2013-10-21 Thread Prashant Kommireddi
Hello, We are noticing the RM running out of memory in the webapp code. It happens in org.apache.hadoop.yarn.server.resourcemanager.webapp.AppsBlock.renderBlock(Block html). The StringBuilder object appsTableData grows too large in this case while appending AppInfo. Ignoring the heap size (this

Re: Yarn log directory perms

2013-09-14 Thread Prashant Kommireddi
, September 13, 2013, Harsh J wrote: This is true for MRv1 too, and is done so for security reasons. On Sat, Sep 14, 2013 at 2:37 AM, Prashant Kommireddi prash1...@gmail.com wrote: Hey guys, It looks like the default perms for app/container dirs is set to 710

Yarn log directory perms

2013-09-13 Thread Prashant Kommireddi
Hey guys, It looks like the default perms for app/container dirs is set to 710 and is not configurable. From DefaultContainerExecutor:

    /** Permissions for user log dir. $logdir/$user/$appId */
    private static final short LOGDIR_PERM = (short)0710;

Any reasons for not having this be a

Re: Job end notification does not always work (Hadoop 2.x)

2013-06-25 Thread Prashant Kommireddi
the ResourceManager is in the appropriate position to judge failure of AM v/s failure-of-job. hth, Arun On Jun 22, 2013, at 2:44 PM, Prashant Kommireddi prash1...@gmail.com wrote: Thanks Ravi. Well, in this case its a no-effort :) A failure of AM init should

Re: Job end notification does not always work (Hadoop 2.x)

2013-06-22 Thread Prashant Kommireddi
Following up on this - please let me know if this is expected behavior or a bug, and if you would like me to file a JIRA. On Thu, Jun 20, 2013 at 9:45 PM, Prashant Kommireddi prash1...@gmail.com wrote: Hello, I came across an issue that occurs with the job notification callbacks in MR2. It works fine

Re: Job end notification does not always work (Hadoop 2.x)

2013-06-22 Thread Prashant Kommireddi
the AM OOMs), I agree with you that we can do more. If you feel strongly about this, please create a JIRA and possibly upload a patch. Thanks Ravi -- From: Prashant Kommireddi prash1...@gmail.com To: user@hadoop.apache.org Sent

DFS Permissions on Hadoop 2.x

2013-06-18 Thread Prashant Kommireddi
Hello, We just upgraded our cluster from 0.20.2 to 2.x (with HA) and had a question around disabling dfs permissions on the latter version. For some reason, setting the following config does not seem to work properly:

    <property>
      <name>dfs.permissions.enabled</name>
      <value>false</value>
    </property>

Re: DFS Permissions on Hadoop 2.x

2013-06-18 Thread Prashant Kommireddi
for submitting a new bug report to HDFS. Thanks! Chris Nauroth Hortonworks http://hortonworks.com/ On Tue, Jun 18, 2013 at 12:14 PM, Leo Leung lle...@ddn.com wrote: I believe the property's name should be “dfs.permissions” From: Prashant Kommireddi [mailto:prash1

Re: DFS Permissions on Hadoop 2.x

2013-06-18 Thread Prashant Kommireddi
+ ]); throw new YarnException(e); } In any case, this does not appear to be the right behavior as it does not respect dfs.permissions.enabled (set to false) at any point. Sounds like a bug? Thanks, Prashant On Tue, Jun 18, 2013 at 3:24 PM, Prashant Kommireddi prash1

MiniYARNCluster logs

2013-05-24 Thread Prashant Kommireddi
Hey guys, We are using the MiniYARNCluster and trying to see where the NN, RM, job logs can be found. We see the job logs are present on HDFS but not on any local dirs. Also, none of the master node logs (NN, RM) are available. Digging in a bit further (just looked at this 1 file), I see there

Re: Can anyone point me to a good Map Reduce in memory Join implementation?

2013-02-15 Thread Prashant Kommireddi
Specifically, replicated join - http://pig.apache.org/docs/r0.10.0/perf.html#replicated-joins On Fri, Feb 15, 2013 at 6:22 PM, David Boyd db...@lorenzresearch.com wrote: Use Pig; it has specific directives for in-memory joins of small data sets. The whole thing might require a half a dozen
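For those who want the raw-MapReduce equivalent of Pig's replicated join, here is a minimal map-side hash-join sketch. It assumes the small relation fits in memory and has already been shipped to each node (e.g. via the DistributedCache); the file name small.tsv and the tab-separated layout are hypothetical:

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class ReplicatedJoinMapper extends Mapper<LongWritable, Text, Text, Text> {
  private final Map<String, String> smallTable = new HashMap<String, String>();

  @Override
  protected void setup(Context context) throws IOException {
    // Load the small side into a hash map once per mapper.
    BufferedReader in = new BufferedReader(new FileReader("small.tsv"));
    String line;
    while ((line = in.readLine()) != null) {
      String[] parts = line.split("\t", 2);
      if (parts.length == 2) {
        smallTable.put(parts[0], parts[1]);
      }
    }
    in.close();
  }

  @Override
  protected void map(LongWritable key, Text value, Context context)
      throws IOException, InterruptedException {
    String[] parts = value.toString().split("\t", 2);
    if (parts.length < 2) {
      return;
    }
    String match = smallTable.get(parts[0]);
    if (match != null) {
      // Emit the joined record; no reduce phase is needed.
      context.write(new Text(parts[0]), new Text(parts[1] + "\t" + match));
    }
  }
}
```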

Re: [Hadoop-Help]About Map-Reduce implementation

2013-02-14 Thread Prashant Kommireddi
Hi Mayur, Flume is used for data collection. Pig is used for data processing. For example, if you have a bunch of servers that you want to collect the logs from and push to HDFS - you would use Flume. Now if you need to run some analysis on that data, you could use Pig to do that. Sent from my iPhone

Re: Job history logging

2012-09-17 Thread Prashant Kommireddi
? On Sat, Sep 15, 2012 at 3:07 AM, Prashant Kommireddi prash1...@gmail.com wrote: Hi All, I have a question about job history logging. Seems like history logging is disabled if file creation fails, is there a reason this is done? The following snippet is from JobHistory.JobInfo.logSubmitted

Re: Log file parsing

2012-08-16 Thread Prashant Kommireddi
Take a look at Pig's HadoopJobHistoryLoader http://pig.apache.org/docs/r0.10.0/api/org/apache/pig/piggybank/storage/HadoopJobHistoryLoader.html On Thu, Aug 16, 2012 at 9:34 PM, peter zhangju...@gmail.com wrote: Now, no utilities for job tracker log . -- peter Sent with Sparrow

Re: IOException: too many length or distance symbols

2012-07-29 Thread Prashant Kommireddi
context on how these files were written, etc.? Perhaps open a JIRA with a sample file and test-case to reproduce this? Other env stuff with info on version of hadoop, etc. would help too. On Sat, Jul 21, 2012 at 2:05 AM, Prashant Kommireddi prash1...@gmail.com wrote: I am seeing

Re: EOFException at org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1508)......

2012-05-27 Thread Prashant Kommireddi
I have seen this issue with large file writes using the SequenceFile writer. I have not seen the same issue when testing with writing fairly small files (< 1GB). On Fri, May 25, 2012 at 10:33 PM, Kasi Subrahmanyam kasisubbu...@gmail.com wrote: Hi, If you are using a custom writable object while passing

Re: Namenode EOF Exception

2012-05-15 Thread Prashant Kommireddi
that was a few years ago) carrying these fixes. You ought to upgrade that cluster to the current stable release for the many fixes you can benefit from :) On Mon, May 14, 2012 at 11:58 PM, Prashant Kommireddi prash1...@gmail.com wrote: Thanks Harsh. I am using 0.20.2, I see on the Jira

Re: Namenode EOF Exception

2012-05-14 Thread Prashant Kommireddi
are you running? Cause AFAIK most of the recent stable versions/distros include NN resource monitoring threads which should have placed your NN into safemode the moment all its disks ran near to out of space. On Mon, May 14, 2012 at 10:50 PM, Prashant Kommireddi prash1...@gmail.com wrote: Hi

Re: java.io.IOException: Task process exit with nonzero status of 1

2012-05-11 Thread Prashant Kommireddi
You might be running out of disk space. Check for that on your cluster nodes. -Prashant On Fri, May 11, 2012 at 12:21 AM, JunYong Li lij...@gmail.com wrote: are there errors in the task output file? on the jobtracker.jsp click the Jobid link - tasks link - Taskid link - Task logs link

Re: Best practice to migrate HDFS from 0.20.205 to CDH3u3

2012-05-03 Thread Prashant Kommireddi
Seems like a matter of upgrade. I am not a Cloudera user so would not know much, but you might find some help moving this to Cloudera mailing list. On Thu, May 3, 2012 at 2:51 AM, Austin Chungath austi...@gmail.com wrote: There is only one cluster. I am not copying between clusters. Say I

Re: Compressing map only output

2012-04-30 Thread Prashant Kommireddi
Yes. These are hadoop properties - using set is just a way for Pig to set those properties in your job conf. On Mon, Apr 30, 2012 at 5:25 PM, Mohit Anchlia mohitanch...@gmail.com wrote: Is there a way to compress map-only jobs, so that the map output that gets stored on HDFS as part-m-* files is compressed?
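A minimal sketch of the equivalent job-conf settings, assuming the 1.x-era (old API) property names; for a map-only job the part-m-* files are the final output, so the normal output-compression properties apply (GzipCodec is just an example codec):

```java
import org.apache.hadoop.io.compress.CompressionCodec;
import org.apache.hadoop.io.compress.GzipCodec;
import org.apache.hadoop.mapred.JobConf;

public class CompressedMapOnlyConf {
  public static JobConf build() {
    JobConf conf = new JobConf();
    conf.setNumReduceTasks(0);  // map-only: the part-m-* files are the job output
    conf.setBoolean("mapred.output.compress", true);
    conf.setClass("mapred.output.compression.codec",
        GzipCodec.class, CompressionCodec.class);
    return conf;
  }
}
```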

Re: Distributing MapReduce on a computer cluster

2012-04-23 Thread Prashant Kommireddi
Shailesh, there's a lot that goes into distributing work across tasks/nodes. It's not just distributing work but also fault tolerance, data locality etc. that come into play. It might be good to refer to the Apache Hadoop docs or Tom White's Definitive Guide. Sent from my iPhone On Apr 23, 2012, at

Re: Jobtracker history logs missing

2012-04-09 Thread Prashant Kommireddi
Anyone faced a similar issue or knows what the issue might be? Thanks in advance. On Thu, Apr 5, 2012 at 10:52 AM, Prashant Kommireddi prash1...@gmail.com wrote: Thanks Nitin. I believe the config key you mentioned controls the task attempt logs that go under - ${hadoop.log.dir}/userlogs

Re: Data Node is not Started

2012-04-06 Thread Prashant Kommireddi
Can you check the datanode logs? Maybe it's an incompatible namespace issue. On Apr 6, 2012, at 11:13 AM, Sujit Dhamale sujitdhamal...@gmail.com wrote: Hi all, my DataNode is not started, even after deleting the hadoop*.pid file from /tmp. But still the DataNode is not started. Hadoop Version:

Re: Jobtracker history logs missing

2012-04-05 Thread Prashant Kommireddi
On Thu, Apr 5, 2012 at 3:22 AM, Nitin Khandelwal nitin.khandel...@germinait.com wrote: Hi Prashant, The userlogs for job are deleted after time specified by * mapred.userlog.retain.hours* property defined in mapred-site.xml (default is 24 Hrs). Thanks, Nitin On 5 April 2012 14:26, Prashant

Re: Doubt from the book Definitive Guide

2012-04-04 Thread Prashant Kommireddi
Answers inline. On Wed, Apr 4, 2012 at 4:56 PM, Mohit Anchlia mohitanch...@gmail.com wrote: I am going through the chapter How MapReduce Works and have some confusion: 1) The description of Mapper below says that reducers get the output file using an HTTP call. But the description under The Reduce

Re: Doubt from the book Definitive Guide

2012-04-04 Thread Prashant Kommireddi
Hi Mohit, What would be the advantage? Reducers in most cases read data from all the mappers. In the case where mappers were to write to HDFS, a reducer would still need to read data from other datanodes across the cluster. Prashant On Apr 4, 2012, at 9:55 PM, Mohit Anchlia

Re: Using a combiner

2012-03-14 Thread Prashant Kommireddi
It is a function of the number of spills on the map side, and I believe the default is 3 - so once data has been spilled 3 times, the combiner is run. This number is configurable. Sent from my iPhone On Mar 14, 2012, at 3:26 PM, Gayatri Rao rgayat...@gmail.com wrote: Hi all, I have a quick query on
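A minimal sketch of wiring this up with the 1.x-era old API; MyReducer is a placeholder for your own combiner-compatible reducer class, and min.num.spills.for.combine is, I believe, the property controlling the threshold mentioned above (default 3):

```java
import org.apache.hadoop.mapred.JobConf;

public class CombinerConf {
  public static JobConf build() {
    JobConf conf = new JobConf();
    conf.setCombinerClass(MyReducer.class);        // placeholder reducer class
    conf.setInt("min.num.spills.for.combine", 3);  // combine once >= 3 spills exist
    return conf;
  }
}
```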

Re: 100x slower mapreduce compared to pig

2012-02-28 Thread Prashant Kommireddi
It would be great if we can take a look at what you are doing in the UDF vs the Mapper. 100x slower does not make sense for the same job/logic - it's either the Mapper code, or maybe the cluster was busy at the time you scheduled the MapReduce job? Thanks, Prashant On Tue, Feb 28, 2012 at 4:11 PM,

Re: Adding mahout math jar to hadoop mapreduce execution

2012-01-30 Thread Prashant Kommireddi
How are you building the mapreduce jar? Try not to include the Mahout dist while building the MR jar, and include it only via the -libjars option. On Mon, Jan 30, 2012 at 10:33 PM, Daniel Quach danqu...@cs.ucla.edu wrote: I have been compiling my mapreduce with the jars in the classpath, and I believe
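One caveat worth adding: -libjars is only honored when the driver parses generic options, typically by going through ToolRunner/GenericOptionsParser. A minimal driver sketch (class name and jar path are illustrative):

```java
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class MyDriver extends Configured implements Tool {
  @Override
  public int run(String[] args) throws Exception {
    // job setup elided; getConf() already reflects -libjars and other generic options
    return 0;
  }

  public static void main(String[] args) throws Exception {
    // e.g.: hadoop jar myjob.jar MyDriver -libjars /path/to/mahout-math.jar <job args>
    System.exit(ToolRunner.run(new MyDriver(), args));
  }
}
```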

Re: Killing hadoop jobs automatically

2012-01-29 Thread Prashant Kommireddi
You might want to take a look at the kill command: hadoop job -kill jobid. Prashant On Sun, Jan 29, 2012 at 11:06 PM, praveenesh kumar praveen...@gmail.com wrote: Is there any way through which we can kill hadoop jobs that are taking too long to execute? What I want to achieve is - If

Re: Parallel CSV loader

2012-01-24 Thread Prashant Kommireddi
I am assuming you want to move data between Hadoop and a database. Please take a look at Sqoop. Thanks, Prashant Sent from my iPhone On Jan 24, 2012, at 9:19 AM, Edmon Begoli ebeg...@gmail.com wrote: I am looking to use Hadoop for parallel loading of a CSV file into a non-Hadoop, parallel

Re: hadoop filesystem cache

2012-01-14 Thread Prashant Kommireddi
You mean something different from the DistributedCache? Sent from my iPhone On Jan 14, 2012, at 5:30 PM, Rita rmorgan...@gmail.com wrote: After reading this article, http://www.cloudera.com/blog/2012/01/caching-in-hbase-slabcache/ , I was wondering if there was a filesystem cache for hdfs.

Re: increase number of map tasks

2012-01-12 Thread Prashant Kommireddi
of spills - how do we avoid them? Depends on what is causing the spills. You can have spills on both the map and reduce side; they can be reduced by adjusting config properties such as io.sort.mb and io.sort.factor, and a few others on the reduce side. Tom White's book has a good explanation of these. Thanks, Prashant Kommireddi
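A minimal sketch of the map-side knobs mentioned above, using the 1.x-era property names; the values are illustrative, not recommendations:

```java
import org.apache.hadoop.mapred.JobConf;

public class SpillTuningConf {
  public static JobConf build() {
    JobConf conf = new JobConf();
    conf.setInt("io.sort.mb", 256);     // in-memory sort buffer in MB (default 100)
    conf.setInt("io.sort.factor", 50);  // streams merged at once (default 10)
    return conf;
  }
}
```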

Re: Re: how to set mapred.tasktracker.map.tasks.maximum and mapred.tasktracker.reduce.tasks.maximum

2012-01-10 Thread Prashant Kommireddi
Hi Hao, Ideally you would want to leave out a core each for the TaskTracker and DataNode processes on each node. The rest can be used for maps and reducers. Thanks, Prashant 2012/1/10 hao.wang hao.w...@ipinyou.com Hi, Thanks for your help, your suggestion is very useful. I have another
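For illustration, here is how the slot counts might look in mapred-site.xml on, say, an 8-core node, leaving a core each for the TaskTracker and DataNode as suggested above (the 4/2 split is an assumption, not a recommendation):

```xml
<property>
  <name>mapred.tasktracker.map.tasks.maximum</name>
  <value>4</value>
</property>
<property>
  <name>mapred.tasktracker.reduce.tasks.maximum</name>
  <value>2</value>
</property>
```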

Re: Hadoop MySQL database access

2011-12-28 Thread Prashant Kommireddi
By design reduce would start only after all the maps finish. There is no way for the reduce to begin grouping/merging by key unless all the maps have finished. Sent from my iPhone On Dec 28, 2011, at 8:53 AM, JAGANADH G jagana...@gmail.com wrote: Hi All, I wrote a map reduce program to fetch

Re: Another newbie - problem with grep example

2011-12-23 Thread Prashant Kommireddi
Seems like you do not have /user/MyId/input/conf on HDFS. Try this. cd $HADOOP_HOME_DIR (this should be your hadoop root dir) hadoop fs -put conf input/conf And then run the MR job again. -Prashant Kommireddi On Fri, Dec 23, 2011 at 3:40 PM, Pat Flaherty p...@well.com wrote: Hi, Installed

Re: Configure hadoop scheduler

2011-12-20 Thread Prashant Kommireddi
I am guessing you are trying to use the FairScheduler but you have specified CapacityScheduler in your configuration. You need to change mapreduce.jobtracker.scheduler to FairScheduler. Sent from my iPhone On Dec 20, 2011, at 8:51 AM, Merto Mertek masmer...@gmail.com wrote: Hi, I am having
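For reference, a sketch of the mapred-site.xml entry, assuming the 0.20/1.x-era property name mapred.jobtracker.taskScheduler; the FairScheduler jar must also be on the JobTracker's classpath:

```xml
<property>
  <name>mapred.jobtracker.taskScheduler</name>
  <value>org.apache.hadoop.mapred.FairScheduler</value>
</property>
```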

Re: Regarding pointers for LZO compression in Hive and Hadoop

2011-12-14 Thread Prashant Kommireddi
http://code.google.com/p/hadoop-gpl-packing/ Thanks, Prashant On Wed, Dec 14, 2011 at 11:32 AM, Abhishek Pratap Singh manu.i...@gmail.com wrote: Hi, I'm looking for some useful docs on enabling LZO on a hadoop cluster. I tried a few of the blogs, but somehow it's not working. Here is my

Re: More cores Vs More Nodes ?

2011-12-13 Thread Prashant Kommireddi
Hi Brad, how many tasktrackers did you have on each node in both cases? Thanks, Prashant Sent from my iPhone On Dec 13, 2011, at 9:42 AM, Brad Sarsfield b...@bing.com wrote: Praveenesh, Your question is not naïve; in fact, optimal hardware design can ultimately be a very difficult

Re: Create a single output per each mapper

2011-12-12 Thread Prashant Kommireddi
Take a look at cleanup() method on Mapper. Thanks, Prashant Sent from my iPhone On Dec 12, 2011, at 8:46 PM, Shi Yu sh...@uchicago.edu wrote: Hi, Suppose I have two mappers, each mapper is assigned 10 lines of data. I want to set a counter for each mapper, counting and accumulating, then
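A minimal sketch of that pattern - accumulate in map(), then emit exactly one record per mapper from cleanup(), which runs once after the mapper's input is exhausted (class and key names are illustrative):

```java
import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class CountingMapper extends Mapper<LongWritable, Text, Text, LongWritable> {
  private long count = 0;

  @Override
  protected void map(LongWritable key, Text value, Context context) {
    count++;  // accumulate only; nothing is emitted per record
  }

  @Override
  protected void cleanup(Context context) throws IOException, InterruptedException {
    // called once per mapper, after all input records have been processed
    context.write(new Text("records"), new LongWritable(count));
  }
}
```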

Re: OOM Error Map output copy.

2011-12-09 Thread Prashant Kommireddi
Arun, I faced the same issue and increasing the # of reducers fixed the problem. I was initially under the impression the MR framework spills to disk if data is too large to keep in memory; however, on extraordinarily large reduce inputs this was not the case and the job failed on trying to assign the
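The fix described above amounts to spreading the shuffle input across more reducers so that no single reduce's in-memory merge outgrows its heap; a one-line sketch using the old API (200 is purely illustrative):

```java
import org.apache.hadoop.mapred.JobConf;

public class ReducerCountConf {
  public static JobConf build() {
    JobConf conf = new JobConf();
    conf.setNumReduceTasks(200);  // illustrative; size so each reducer's input fits in memory
    return conf;
  }
}
```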

Re: Hadoop Comic

2011-12-07 Thread Prashant Kommireddi
Here you go: https://docs.google.com/viewer?a=v&pid=explorer&chrome=true&srcid=0B-zw6KHOtbT4MmRkZWJjYzEtYjI3Ni00NTFjLWE0OGItYTU5OGMxYjc0N2M1&hl=en_US&pli=1 Thanks, Prashant On Wed, Dec 7, 2011 at 1:47 AM, shreya@cognizant.com wrote: Hi, Can someone please send me the Hadoop comic. Saw

Re: how to integrate snappy into hadoop

2011-12-07 Thread Prashant Kommireddi
sudo make install
sudo make installcheck
sudo apt-get install libtool
sudo apt-get install automake
Thanks, Prashant Kommireddi On Wed, Dec 7, 2011 at 5:39 PM, Jinyan Xu jinyan...@exar.com wrote: Hi, Anyone else have the experience integrating snappy into hadoop? help

Re: how to integrate snappy into hadoop

2011-12-07 Thread Prashant Kommireddi
I have not tried it with HBase, and yes 0.20.2 is not compatible with it. What is the error you receive when you try compiling Snappy? I don't think compiling Snappy would be dependent on HBase. 2011/12/7 Jinyan Xu jinyan...@exar.com Hi Prashant Kommireddi, Last week, I read build-hadoop-from

Re: HDFS Explained as Comics

2011-11-30 Thread Prashant Kommireddi
Thanks Maneesh. Quick question - does a client really need to know the block size and replication factor? A lot of times the client has no control over these (they are set at the cluster level). -Prashant Kommireddi On Wed, Nov 30, 2011 at 12:51 PM, Dejan Menges dejan.men...@gmail.com wrote: Hi Maneesh, Thanks

Re: HDFS Explained as Comics

2011-11-30 Thread Prashant Kommireddi
to create a file is void create(..., short replication, long blocksize); I presume it means that the client already has knowledge of these values and passes them to the NameNode when creating a new file. Hope that helps. thanks -Maneesh On Wed, Nov 30, 2011 at 1:04 PM, Prashant Kommireddi

Re: Passing data files via the distributed cache

2011-11-25 Thread Prashant Kommireddi
I believe you want to ship data to each node in your cluster before MR begins so the mappers can access files local to their machine. Hadoop tutorial on YDN has some good info on this. http://developer.yahoo.com/hadoop/tutorial/module5.html#auxdata -Prashant Kommireddi On Fri, Nov 25, 2011 at 1