Hello folks,
Have a question - what classes can I look at to understand the way in
which application counters/logs are copied to JHS before redirecting
clients to it?
Thanks,
Prashant
Something that needs correction, just that no one has gotten around to
doing it. Please feel free to open a JIRA, even better if you would like to
contribute a fix.
On Tuesday, May 12, 2015, Anand Murali anand_vi...@yahoo.com wrote:
Oliver:
Many thanks for the reply. If it is not an error, why is
Take a look at
yarn.scheduler.capacity.maximum-am-resource-percent
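As a rough sketch (my example, not from the thread): this property lives in
capacity-scheduler.xml, and 0.1 is the usual default.

<property>
  <name>yarn.scheduler.capacity.maximum-am-resource-percent</name>
  <value>0.1</value>
</property>

It caps the fraction of cluster resources that ApplicationMasters may occupy,
which in turn bounds how many applications can run concurrently.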
On Thu, Apr 30, 2015 at 11:38 AM, Shushant Arora shushantaror...@gmail.com
wrote:
Is there any configuration in MR2 and YARN to limit concurrent max
applications by setting max limit on ApplicationMasters in the cluster?
What does ProcfsBasedProcessTree do? Trying to understand a bunch of
these messages in the logs of a job that is stuck forever
May 25, 2014 4:01:51 AM org.apache.hadoop.yarn.util.ProcfsBasedProcessTree
constructProcessInfo
INFO: The process 22793 may have finished in the interim.
May 25, 2014
What specific info are you looking for?
On Monday, December 23, 2013, Manoj Babu wrote:
Hi All,
Can anybody share their experience rewriting Ab-Initio scripts using
Hadoop MapReduce?
Cheers!
Manoj.
also be increased.
HTH
Ravi
On Monday, October 21, 2013 5:54 PM, Prashant Kommireddi
prash1...@gmail.com wrote:
Hello,
We are noticing the RM running out of memory in the webapp code. It
happens in
org.apache.hadoop.yarn.server.resourcemanager.webapp.AppsBlock.renderBlock(Block
html
Hello,
We are noticing the RM running out of memory in the webapp code. It happens
in
org.apache.hadoop.yarn.server.resourcemanager.webapp.AppsBlock.renderBlock(Block
html).
The StringBuilder object appsTableData grows too large in this case while
appending AppInfo. Ignoring the heap size (this
, September 13, 2013, Harsh J wrote:
This is true for MRv1 too, and is done so for security reasons.
On Sat, Sep 14, 2013 at 2:37 AM, Prashant Kommireddi
prash1...@gmail.com wrote:
Hey guys,
It looks like the default perms for app/container dirs is set to 710
Hey guys,
It looks like the default perms for app/container dirs is set to 710 and is
not configurable. From DefaultContainerExecutor
/** Permissions for user log dir.
* $logdir/$user/$appId */
private static final short LOGDIR_PERM = (short)0710;
Any reasons for not having this be a
the ResourceManager is in the appropriate position to judge failure
of AM v/s failure-of-job.
hth,
Arun
On Jun 22, 2013, at 2:44 PM, Prashant Kommireddi prash1...@gmail.com
wrote:
Thanks Ravi.
Well, in this case it's a no-effort :) A failure of AM init should
Following up on this. Please let me know if this is expected or a bug, and if you
would like me to file a JIRA.
On Thu, Jun 20, 2013 at 9:45 PM, Prashant Kommireddi prash1...@gmail.com wrote:
Hello,
I came across an issue that occurs with the job notification callbacks in
MR2. It works fine
the AM OOMs), I agree with you that we can do more. If you
feel strongly about this, please create a JIRA and possibly upload a patch.
Thanks
Ravi
--
From: Prashant Kommireddi prash1...@gmail.com
To: user@hadoop.apache.org
Sent
Hello,
We just upgraded our cluster from 0.20.2 to 2.x (with HA) and had a
question around disabling dfs permissions on the latter version. For some
reason, setting the following config does not seem to work
<property>
  <name>dfs.permissions.enabled</name>
  <value>false</value>
</property>
for submitting a new bug report to HDFS.
Thanks!
Chris Nauroth
Hortonworks
http://hortonworks.com/
On Tue, Jun 18, 2013 at 12:14 PM, Leo Leung lle...@ddn.com wrote:
I believe the property name should be “dfs.permissions”.
From: Prashant Kommireddi [mailto:prash1
+ ]);
throw new YarnException(e);
}
In any case, this does not appear to be the right behavior as it does
not respect dfs.permissions.enabled (set to false) at any point.
Sounds like a bug?
Thanks, Prashant
On Tue, Jun 18, 2013 at 3:24 PM, Prashant Kommireddi prash1
Hey guys,
We are using the MiniYARNCluster and trying to see where the NN, RM, job
logs can be found. We see the job logs are present on HDFS but not on any
local dirs. Also, none of the master node logs (NN, RM) are available.
Digging in a bit further (just looked at this 1 file), I see there
Specifically, replicated join -
http://pig.apache.org/docs/r0.10.0/perf.html#replicated-joins
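As a rough illustration (the aliases and file names here are made up), a
replicated join in Pig looks like:

big    = LOAD 'big_data' AS (key:chararray, val:int);
small  = LOAD 'small_data' AS (key:chararray, descr:chararray);
joined = JOIN big BY key, small BY key USING 'replicated';

The relation listed last is loaded into memory on each map task, so the join
runs map-side with no reduce phase.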
On Fri, Feb 15, 2013 at 6:22 PM, David Boyd db...@lorenzresearch.com wrote:
Use PIG; it has specific directives for in-memory joins of small
data sets. The whole thing might require a half a dozen
Hi Mayur,
Flume is used for data collection. Pig is used for data processing.
For example, if you have a bunch of servers that you want to collect the
logs from and push to HDFS - you would use flume. Now if you need to
run some analysis on that data, you could use pig to do that.
Sent from my iPhone
?
On Sat, Sep 15, 2012 at 3:07 AM, Prashant Kommireddi
prash1...@gmail.com wrote:
Hi All,
I have a question about job history logging. Seems like history logging is
disabled if file creation fails, is there a reason this is done?
The following snippet is from JobHistory.JobInfo.logSubmitted
Take a look at Pig's HadoopJobHistoryLoader
http://pig.apache.org/docs/r0.10.0/api/org/apache/pig/piggybank/storage/HadoopJobHistoryLoader.html
On Thu, Aug 16, 2012 at 9:34 PM, peter zhangju...@gmail.com wrote:
Now there are no utilities for the job tracker log.
--
peter
Sent with Sparrow
context on how these files were written, etc.?
Perhaps open a JIRA with a sample file and test-case to reproduce
this? Other env stuff with info on version of hadoop, etc. would help
too.
On Sat, Jul 21, 2012 at 2:05 AM, Prashant Kommireddi
prash1...@gmail.com wrote:
I am seeing
I have seen this issue with large file writes using SequenceFile writer.
I have not found the same issue when testing with writing fairly small files
(< 1GB).
On Fri, May 25, 2012 at 10:33 PM, Kasi Subrahmanyam
kasisubbu...@gmail.com wrote:
Hi,
If you are using a custom writable object while passing
that was a few years
ago) carrying these fixes. You ought to upgrade that cluster to the
current stable release for the many fixes you can benefit from :)
On Mon, May 14, 2012 at 11:58 PM, Prashant Kommireddi
prash1...@gmail.com wrote:
Thanks Harsh. I am using 0.20.2; I see on the JIRA
are you running? Cause AFAIK most of the recent
stable versions/distros include NN resource monitoring threads which
should have placed your NN into safemode the moment all its disks ran
nearly out of space.
On Mon, May 14, 2012 at 10:50 PM, Prashant Kommireddi
prash1...@gmail.com wrote:
Hi
You might be running out of disk space. Check for that on your cluster
nodes.
-Prashant
On Fri, May 11, 2012 at 12:21 AM, JunYong Li lij...@gmail.com wrote:
Are there errors in the task output file?
On jobtracker.jsp, click the Jobid link -> Tasks link -> Taskid link ->
Task logs link.
Seems like a matter of upgrade. I am not a Cloudera user so would not know
much, but you might find some help moving this to the Cloudera mailing list.
On Thu, May 3, 2012 at 2:51 AM, Austin Chungath austi...@gmail.com wrote:
There is only one cluster. I am not copying between clusters.
Say I
Yes. These are hadoop properties - using set is just a way for Pig to set
those properties in your job conf.
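A minimal sketch of what that could look like in a Pig script (the property
names assume the MR1-era job conf; newer releases use the
mapreduce.output.fileoutputformat.* equivalents):

set mapred.output.compress true;
set mapred.output.compression.codec org.apache.hadoop.io.compress.GzipCodec;

For a map-only job these apply to the part-m-* files written to HDFS.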
On Mon, Apr 30, 2012 at 5:25 PM, Mohit Anchlia mohitanch...@gmail.com wrote:
Is there a way to compress map only jobs to compress map output that gets
stored on hdfs as part-m-* files?
Shailesh, there's a lot that goes into distributing work across
tasks/nodes. It's not just distributing work but also fault-tolerance,
data locality etc. that come into play. It might be good to refer to the
Hadoop Apache docs or Tom White's Definitive Guide.
Sent from my iPhone
On Apr 23, 2012, at
Anyone faced similar issue or knows what the issue might be?
Thanks in advance.
On Thu, Apr 5, 2012 at 10:52 AM, Prashant Kommireddi prash1...@gmail.com wrote:
Thanks Nitin.
I believe the config key you mentioned controls the task attempt logs
that go under ${hadoop.log.dir}/userlogs
Can you check the datanode logs? Maybe it's an incompatible namespace issue.
On Apr 6, 2012, at 11:13 AM, Sujit Dhamale sujitdhamal...@gmail.com wrote:
Hi all,
my DataNode is not starting.
Even after deleting the hadoop*.pid file from /tmp, the DataNode is still not
starting.
Hadoop Version:
On Thu, Apr 5, 2012 at 3:22 AM, Nitin Khandelwal
nitin.khandel...@germinait.com wrote:
Hi Prashant,
The userlogs for a job are deleted after the time specified by the
mapred.userlog.retain.hours property defined in mapred-site.xml (default
is 24 hours).
Thanks,
Nitin
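For example, to keep them for three days (a sketch, assuming the MR1 property
Nitin mentions), add to mapred-site.xml:

<property>
  <name>mapred.userlog.retain.hours</name>
  <value>72</value>
</property>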
On 5 April 2012 14:26, Prashant
Answers inline.
On Wed, Apr 4, 2012 at 4:56 PM, Mohit Anchlia mohitanch...@gmail.com wrote:
I am going through the chapter "How MapReduce Works" and have some
confusion:
1) The description of the Mapper below says that reducers get the output file
using an HTTP call. But the description under The Reduce
Hi Mohit,
What would be the advantage? Reducers in most cases read data from all
the mappers. In the case where mappers were to write to HDFS, a
reducer would still need to read data from other datanodes across
the cluster.
Prashant
On Apr 4, 2012, at 9:55 PM, Mohit Anchlia
It is a function of the number of spills on the map side, and I believe
the default is 3. So for every 3 times data is spilled, the combiner is
run. This number is configurable.
Sent from my iPhone
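A sketch of the knob involved (assuming the MR1 property
min.num.spills.for.combine, which sets the minimum number of spill files
before the combiner is run during the merge):

<property>
  <name>min.num.spills.for.combine</name>
  <value>3</value>
</property>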
On Mar 14, 2012, at 3:26 PM, Gayatri Rao rgayat...@gmail.com wrote:
Hi all,
I have a quick query on
It would be great if we could take a look at what you are doing in the UDF vs
the Mapper.
100x slower does not make sense for the same job/logic; it's either the Mapper
code, or maybe the cluster was busy at the time you scheduled the MapReduce job?
Thanks,
Prashant
On Tue, Feb 28, 2012 at 4:11 PM,
How are you building the mapreduce jar? Try not to include the Mahout dist
while building the MR jar, and include it only via the -libjars option.
On Mon, Jan 30, 2012 at 10:33 PM, Daniel Quach danqu...@cs.ucla.edu wrote:
I have been compiling my mapreduce with the jars in the classpath, and I
believe
You might want to take a look at the kill command: hadoop job -kill
<jobid>.
Prashant
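For example (the job id below is made up):

hadoop job -list                          # find the id of the running job
hadoop job -kill job_201201300001_0042    # kill it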
On Sun, Jan 29, 2012 at 11:06 PM, praveenesh kumar praveen...@gmail.com wrote:
Is there any way through which we can kill Hadoop jobs that are taking
too much time to execute?
What I want to achieve is - If
I am assuming you want to move data between Hadoop and a database.
Please take a look at Sqoop.
Thanks,
Prashant
Sent from my iPhone
On Jan 24, 2012, at 9:19 AM, Edmon Begoli ebeg...@gmail.com wrote:
I am looking to use Hadoop for parallel loading of CSV file into a
non-Hadoop, parallel
You mean something different from the DistributedCache?
Sent from my iPhone
On Jan 14, 2012, at 5:30 PM, Rita rmorgan...@gmail.com wrote:
After reading this article,
http://www.cloudera.com/blog/2012/01/caching-in-hbase-slabcache/ , I was
wondering if there was a filesystem cache for hdfs.
of spills - how do we avoid them?
Depends on what is causing the spills. You can have spills on the Map and
Reduce side; adjusting config properties such as io.sort.mb,
io.sort.factor, and a few others on the Reduce side can help. Tom White's book
has a good explanation of these.
Thanks,
Prashant Kommireddi
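A sketch of the map-side properties mentioned above (MR1 names; the values
shown are the usual defaults, raise io.sort.mb if the task heap allows):

<property>
  <name>io.sort.mb</name>
  <value>100</value>
</property>
<property>
  <name>io.sort.factor</name>
  <value>10</value>
</property>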
Hi Hao,
Ideally you would want to leave out a core each for Tasktracker and
Datanode processes on each node. The rest could be used for maps and
reducers.
Thanks,
Prashant
2012/1/10 hao.wang hao.w...@ipinyou.com
Hi,
Thanks for your help, your suggestion is very useful.
I have another
By design reduce would start only after all the maps finish. There is
no way for the reduce to begin grouping/merging by key unless all the
maps have finished.
Sent from my iPhone
On Dec 28, 2011, at 8:53 AM, JAGANADH G jagana...@gmail.com wrote:
Hi All,
I wrote a map reduce program to fetch
Seems like you do not have /user/MyId/input/conf on HDFS.
Try this.
cd $HADOOP_HOME_DIR (this should be your hadoop root dir)
hadoop fs -put conf input/conf
And then run the MR job again.
-Prashant Kommireddi
On Fri, Dec 23, 2011 at 3:40 PM, Pat Flaherty p...@well.com wrote:
Hi,
Installed
I am guessing you are trying to use the FairScheduler but you have
specified CapacityScheduler in your configuration. You need to change
mapreduce.jobtracker.scheduler to FairScheduler.
Sent from my iPhone
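A sketch of what that looks like in mapred-site.xml, assuming MR1 (the
scheduler class property is commonly mapred.jobtracker.taskScheduler):

<property>
  <name>mapred.jobtracker.taskScheduler</name>
  <value>org.apache.hadoop.mapred.FairScheduler</value>
</property>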
On Dec 20, 2011, at 8:51 AM, Merto Mertek masmer...@gmail.com wrote:
Hi,
I am having
http://code.google.com/p/hadoop-gpl-packing/
Thanks,
Prashant
On Wed, Dec 14, 2011 at 11:32 AM, Abhishek Pratap Singh manu.i...@gmail.com
wrote:
Hi,
I'm looking for some useful docs on enabling LZO on a Hadoop cluster. I tried
a few of the blogs, but somehow it's not working.
Here is my
Hi Brad, how many tasktrackers did you have on each node in both cases?
Thanks,
Prashant
Sent from my iPhone
On Dec 13, 2011, at 9:42 AM, Brad Sarsfield b...@bing.com wrote:
Praveenesh,
Your question is not naïve; in fact, optimal hardware design can ultimately
be a very difficult
Take a look at the cleanup() method on Mapper.
Thanks,
Prashant
Sent from my iPhone
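A minimal sketch of the idea (new MapReduce API; the counter group/name and
the mapper itself are made up for illustration):

import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class CountingMapper extends Mapper<LongWritable, Text, Text, LongWritable> {
  // accumulate locally within this map task
  private long linesSeen = 0;

  @Override
  protected void map(LongWritable key, Text value, Context context)
      throws IOException, InterruptedException {
    linesSeen++;
    // ... normal map logic ...
  }

  @Override
  protected void cleanup(Context context)
      throws IOException, InterruptedException {
    // report the per-mapper total once, when the task finishes
    context.getCounter("MyCounters", "LINES_PER_MAPPER").increment(linesSeen);
  }
}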
On Dec 12, 2011, at 8:46 PM, Shi Yu sh...@uchicago.edu wrote:
Hi,
Suppose I have two mappers, each mapper is assigned 10 lines of
data. I want to set a counter for each mapper, counting and
accumulating, then
Arun, I faced the same issue and increasing the # of reducers fixed the
problem.
I was initially under the impression the MR framework spills to disk if data is
too huge to keep in memory; however, on extraordinarily large reduce inputs
this was not the case and the job failed on trying to assign the
Here you go
https://docs.google.com/viewer?a=v&pid=explorer&chrome=true&srcid=0B-zw6KHOtbT4MmRkZWJjYzEtYjI3Ni00NTFjLWE0OGItYTU5OGMxYjc0N2M1&hl=en_US&pli=1
Thanks,
Prashant
On Wed, Dec 7, 2011 at 1:47 AM, shreya@cognizant.com wrote:
Hi,
Can someone please send me the Hadoop comic.
Saw
sudo make install
1862 sudo make installcheck
1904 sudo apt-get install libtool
1907 sudo apt-get install automake
Thanks,
Prashant Kommireddi
On Wed, Dec 7, 2011 at 5:39 PM, Jinyan Xu jinyan...@exar.com wrote:
Hi,
Anyone else have experience integrating Snappy into Hadoop? Help
I have not tried it with HBase, and yes 0.20.2 is not compatible with it.
What is the error you receive when you try compiling Snappy? I don't think
compiling Snappy would be dependent on HBase.
2011/12/7 Jinyan Xu jinyan...@exar.com
Hi Prashant Kommireddi,
Last week, I read build-hadoop-from
Thanks Maneesh.
Quick question: does a client really need to know block size and
replication factor? A lot of times the client has no control over these (set
at the cluster level).
-Prashant Kommireddi
On Wed, Nov 30, 2011 at 12:51 PM, Dejan Menges dejan.men...@gmail.com wrote:
Hi Maneesh,
Thanks
to
create a file is
void create(, short replication, long blocksize);
I presume it means that the client already has knowledge of these values
and passes them to the NameNode when creating a new file.
Hope that helps.
thanks
-Maneesh
On Wed, Nov 30, 2011 at 1:04 PM, Prashant Kommireddi
I believe you want to ship data to each node in your cluster before MR
begins so the mappers can access files local to their machine. The Hadoop
tutorial on YDN has some good info on this.
http://developer.yahoo.com/hadoop/tutorial/module5.html#auxdata
-Prashant Kommireddi
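A rough sketch of the DistributedCache approach that tutorial describes (old
org.apache.hadoop.filecache API; the path and class names here are hypothetical):

import java.io.IOException;
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.fs.Path;

public class SideDataExample {
  public static void main(String[] args) throws Exception {
    // driver side: register the side file before submitting the job
    Configuration conf = new Configuration();
    DistributedCache.addCacheFile(new URI("hdfs:///data/lookup.txt"), conf);
    // ... configure and submit the job with this conf ...
  }

  // mapper side (e.g. in setup()): the file is now on the local disk of every task node
  static Path[] localCopies(Configuration conf) throws IOException {
    return DistributedCache.getLocalCacheFiles(conf);
  }
}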
On Fri, Nov 25, 2011 at 1