Unsubscribe

2022-03-26 Thread Kasi Subrahmanyam
On Sat, Mar 26, 2022, 10:13 Ravikumar Govindarajan <
ravikumar.govindara...@gmail.com> wrote:

>


Unsubscribe

2019-07-11 Thread Kasi Subrahmanyam
unsubscribe


Using NLineInputFormat in Hive

2015-10-27 Thread Kasi Subrahmanyam
Hi,
I have a requirement where I have to send one line of a file to one mapper,
but I have to do it using Hive.
How can we implement the functionality of NLineInputFormat in Hive?
I couldn't find anything on this, so I tried the following configuration in Hive:

set hive.merge.mapfiles=false;
set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;

set mapreduce.input.fileinputformat.split.maxsize=100;

I have a file with only a few rows, but I need each row to go to a different mapper.

With the above configuration, though, some of the rows go missing depending on

the max split size I provide.


Thanks,
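
For reference, plain MapReduce gets this one-line-per-mapper behaviour from
NLineInputFormat. Below is a minimal sketch with the new mapreduce API (the
class name, job name and paths are placeholders; Hive itself would still need
its own input-format plumbing):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.NLineInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class OneLinePerMapper {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "one-line-per-mapper");
    job.setJarByClass(OneLinePerMapper.class);
    // NLineInputFormat builds one split per N input lines; with N = 1 every
    // line of the file is handed to its own mapper.
    job.setInputFormatClass(NLineInputFormat.class);
    NLineInputFormat.setNumLinesPerSplit(job, 1);
    NLineInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}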


Definite API for Video Processing

2013-07-24 Thread Kasi Subrahmanyam
Hi,

I am able to find that we have a definite API for processing images in Hadoop,
using HIPI.
Why don't we have the same for videos?

Thanks,
Subbu


Sending the entire file content as value to the mapper

2013-07-11 Thread Kasi Subrahmanyam
Hi Team,

I have a file which has semi-structured text data with no definite start
and end points.
How can I send the entire content of the file at once, as the key or value to
the mapper, instead of line by line?

Thanks,
Subbu
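
The usual pattern for this is an InputFormat that refuses to split the file,
plus a RecordReader that reads it whole. A sketch with the new mapreduce API
follows (untested; the class names are made up):

import java.io.IOException;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.JobContext;
import org.apache.hadoop.mapreduce.RecordReader;
import org.apache.hadoop.mapreduce.TaskAttemptContext;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.FileSplit;

public class WholeFileInputFormat extends FileInputFormat<NullWritable, BytesWritable> {
  @Override
  protected boolean isSplitable(JobContext context, Path file) {
    return false;  // one split per file, so one mapper sees the whole file
  }

  @Override
  public RecordReader<NullWritable, BytesWritable> createRecordReader(
      InputSplit split, TaskAttemptContext context) {
    return new WholeFileRecordReader();
  }

  static class WholeFileRecordReader extends RecordReader<NullWritable, BytesWritable> {
    private FileSplit split;
    private TaskAttemptContext context;
    private final BytesWritable value = new BytesWritable();
    private boolean processed = false;

    @Override
    public void initialize(InputSplit split, TaskAttemptContext context) {
      this.split = (FileSplit) split;
      this.context = context;
    }

    @Override
    public boolean nextKeyValue() throws IOException {
      if (processed) return false;
      // Read the whole file into a single value.
      byte[] contents = new byte[(int) split.getLength()];
      Path file = split.getPath();
      FileSystem fs = file.getFileSystem(context.getConfiguration());
      FSDataInputStream in = fs.open(file);
      try {
        IOUtils.readFully(in, contents, 0, contents.length);
      } finally {
        IOUtils.closeStream(in);
      }
      value.set(contents, 0, contents.length);
      processed = true;
      return true;
    }

    @Override public NullWritable getCurrentKey() { return NullWritable.get(); }
    @Override public BytesWritable getCurrentValue() { return value; }
    @Override public float getProgress() { return processed ? 1.0f : 0.0f; }
    @Override public void close() { }
  }
}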


Output Directory not getting created

2013-07-03 Thread Kasi Subrahmanyam
Hi,

There is a production cluster which has MapR Hadoop installed under user A.

I am trying to run a Hadoop job as another user, B.

My job is unable to create its output in the filesystem under user B, and fails
with the following error:


13/07/03 09:34:00 INFO mapred.FileInputFormat: Total input paths to process
: 2
13/07/03 09:34:00 INFO mapred.JobClient: Creating job's output directory at
maprfs:/user/B/obfl/PQRPT_OBFL_UPTIME_F
13/07/03 09:34:00 INFO mapred.JobClient: Creating job's user history
location directory at maprfs:/user/B/obfl/PQRPT_OBFL_UPTIME_F/_logs
2013-07-03 09:34:16,0527 ERROR Client fs/client/fileclient/cc/client.cc:852
Thread: 140084677048064 Rmdir failed for dir _logs, error Permission
denied(13)
2013-07-03 09:34:16,0527 ERROR Client fs/client/fileclient/cc/client.cc:925
Thread: 140084677048064 Rmdirs failed for dir/file _logs, rpc error 13
2013-07-03 09:34:16,0527 ERROR JniCommon
fs/client/fileclient/cc/jni_common.cc:1293
Thread: 140084677048064 remove: File /user/B/obfl/PQRPT_OBFL_UPTIME_F, rpc
error, Permission denied(13)
2013-07-03 09:34:16,0590 ERROR Client fs/client/fileclient/cc/client.cc:852
Thread: 140084677048064 Rmdir failed for dir _logs, error Permission
denied(13)
2013-07-03 09:34:16,0590 ERROR Client fs/client/fileclient/cc/client.cc:925
Thread: 140084677048064 Rmdirs failed for dir/file _logs, rpc error 13
2013-07-03 09:34:16,0590 ERROR JniCommon
fs/client/fileclient/cc/jni_common.cc:1293
Thread: 140084677048064 remove: File /user/B/obfl/PQRPT_OBFL_LOG_D, rpc
error, Permission denied(13)
2013-07-03 09:34:16,0749 ERROR Client fs/client/fileclient/cc/client.cc:852
Thread: 140084677048064 Rmdir failed for dir _logs, error Permission
denied(13)
2013-07-03 09:34:16,0749 ERROR Client fs/client/fileclient/cc/client.cc:925
Thread: 140084677048064 Rmdirs failed for dir/file _logs, rpc error 13
2013-07-03 09:34:16,0749 ERROR JniCommon
fs/client/fileclient/cc/jni_common.cc:1293
Thread: 140084677048064 remove: File /user/B/obfl/PQRPT_OBFL_UPDATIME_AGG,
rpc error, Permission denied(13)


Thanks,
Subbu
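
This pattern of "Permission denied(13)" errors usually means the output's
parent directory is owned by a different user. A quick way to confirm that
before submitting (a sketch only; the path is taken from the log above) is to
print the owner and mode of the parent directory:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class CheckOutputOwner {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Path outParent = new Path("/user/B/obfl");
    FileSystem fs = outParent.getFileSystem(conf);
    // If the owner printed here is A (or root) rather than B, user B cannot
    // delete or recreate the _logs directories and the job setup fails.
    FileStatus status = fs.getFileStatus(outParent);
    System.out.println(status.getPath() + " owner=" + status.getOwner()
        + " group=" + status.getGroup() + " mode=" + status.getPermission());
  }
}

The usual remedy is for the cluster administrator to chown /user/B (and
everything under it) to user B, or to loosen its permissions; the check above
only confirms the diagnosis.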


Counters across all jobs

2012-08-28 Thread Kasi Subrahmanyam
Hi,

I have around 4 jobs running in a controller.
How can I have a single shared counter that is present in all the jobs and
incremented wherever it is used in a job?

For example, consider a counter ACount.
If job1 increments the counter by 2, job3 by 5, and job4 by 6,
can I have the cumulative counter displayed in the JobTracker output as
job1: 2
job2: 2
job3: 7
job4: 13

Thanks,
Subbu
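
Hadoop counters are scoped to a single job, so the usual workaround is for the
driver to read each finished job's counter and carry the running total forward.
A sketch follows (the enum name is made up; a JobControl-based driver would read
the counters from its finished ControlledJobs in the same way):

import org.apache.hadoop.mapreduce.Job;

public class RunningCounterReport {
  // Hypothetical counter incremented inside the mappers/reducers of each job,
  // e.g. context.getCounter(ChainCounters.ACOUNT).increment(2).
  public enum ChainCounters { ACOUNT }

  // Runs the already-configured jobs one after another and prints the
  // cumulative value of ACOUNT after each job.
  public static void runAndReport(Job... jobs) throws Exception {
    long runningTotal = 0;
    for (Job job : jobs) {
      if (!job.waitForCompletion(true)) {
        throw new RuntimeException("Job failed: " + job.getJobName());
      }
      runningTotal += job.getCounters().findCounter(ChainCounters.ACOUNT).getValue();
      System.out.println(job.getJobName() + ": " + runningTotal);
    }
  }
}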


Re: Operations after killing a job

2012-07-01 Thread Kasi Subrahmanyam
I prefer not to use Oozie if possible.
What I want to know is where the control reaches in the Hadoop classes
if I kill a job explicitly.
That is, can I know which class gets the control, if possible?

Thanks,
Subbu

On Sunday, July 1, 2012, Harsh J wrote:

 A framework like Oozie can help you do what you need here. Take a look
 at its workflow running/management features:
 http://incubator.apache.org/oozie/

 Is this what you're looking for?

 On Sat, Jun 30, 2012 at 3:30 PM, Kasi Subrahmanyam
 kasisubbu...@gmail.com wrote:
  Hi,
 
  If I have a few jobs added to a controller, and I explicitly killed a job in
  between (assuming all the other jobs failed due to dependency),
  can I have the control back to perform some operations after that, or is
  there any API to do that?
  Please correct me if the question is wrong to begin with.
 
 
  Thanks in advance
  Subbu



 --
 Harsh J
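
For what it is worth, the driver thread does get control back once the
controller has nothing left to run; a job killed from outside typically
surfaces in the failed list. A rough sketch with the old mapred JobControl API
(the per-job cleanup is a placeholder):

import org.apache.hadoop.mapred.jobcontrol.Job;
import org.apache.hadoop.mapred.jobcontrol.JobControl;

public class ControllerDriver {
  public static void runWithPostMortem(JobControl control) throws InterruptedException {
    Thread runner = new Thread(control);
    runner.setDaemon(true);
    runner.start();
    // JobControl keeps polling until every job is done (success, failed, or
    // dependent-failed); a job killed externally ends up in the failed list.
    while (!control.allFinished()) {
      Thread.sleep(5000);
    }
    control.stop();
    // Control is back here in the driver, so post-mortem work can run per job.
    for (Job failed : control.getFailedJobs()) {
      System.err.println("Failed or killed: " + failed.getJobName()
          + " - " + failed.getMessage());
      // placeholder: cleanup of that job's partial output would go here
    }
  }
}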



Re: Passing key-value pairs between chained jobs?

2012-06-14 Thread Kasi Subrahmanyam
Hi Michael,
The problem in your second question can be solved if you use
SequenceFileOutputFormat for the first job's output and
SequenceFileInputFormat for the second job's input.
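
Concretely, the wiring looks roughly like this with the old mapred API that the
code below already uses (a sketch; keyClass stands in for ActiveDayKey, and the
intermediate path is whatever the two jobs agree on):

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.WritableComparable;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.SequenceFileInputFormat;
import org.apache.hadoop.mapred.SequenceFileOutputFormat;

public class ChainWiring {
  // keyClass stands in for ActiveDayKey (any WritableComparable works).
  public static void wire(JobConf job1, JobConf job2,
      Class<? extends WritableComparable<?>> keyClass, Path intermediate) {
    // Job 1: write the key/value pairs in binary, via their write() methods,
    // instead of the toString() form that TextOutputFormat produces.
    job1.setOutputFormat(SequenceFileOutputFormat.class);
    job1.setOutputKeyClass(keyClass);
    job1.setOutputValueClass(Text.class);
    FileOutputFormat.setOutputPath(job1, intermediate);

    // Job 2: read the same pairs back via readFields(); the mapper then sees
    // the real key type rather than LongWritable line offsets, which is what
    // causes the ClassCastException with the default TextInputFormat.
    job2.setInputFormat(SequenceFileInputFormat.class);
    FileInputFormat.setInputPaths(job2, intermediate);
  }
}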

On Thu, Jun 14, 2012 at 11:11 PM, Michael Parker michael.g.par...@gmail.com
 wrote:

 Hi all,

 One more question. I have two jobs to run serially using a JobControl.
 The key-value types for the outputs of the reducer of the first job
 are <ActiveDayKey, Text>, where ActiveDayKey is a class that
 implements WritableComparable. And so the key-value types for the
 inputs to the mapper of the second job are <ActiveDayKey, Text>. I'm
 noticing two things:

 First, in the output of the reducer from the first job, each
 ActiveDayKey object is being written as a string using its toString
 method. Since it's a subclass of WritableComparable that already knows
 how to serialize itself using write(DataOutput), is there any way to
 exploit that to write it in binary format? Otherwise, do I need to
 write a subclass of FileOutputFormat?

 Second, the second job fails with java.lang.ClassCastException:
 org.apache.hadoop.io.LongWritable cannot be cast to
 co.adhoclabs.LogProcessor$ActiveDayKey. I'm assuming this is because
 by default the key type is Long for the line number, and here I want
 to ignore the line number and use the ActiveDayKey written on the line
 itself as the key. Again, since ActiveDayKey knows how to deserialize
 itself using readFields(DataInput), is there any way to exploit that
 to read it from the line in binary format? Do I need to write a
 subclass of FileInputFormat?

 Assuming I need to write subclasses of FileOutputFormat and
 FileInputFormat classes, what's a good example of this? The terasort
 example?

 Thanks,
 Mike



Re: Consistent Checksum error using SequenceFileInputFormat against segment/content segment/parse_text folders output by Nutch.

2012-05-09 Thread Kasi Subrahmanyam
Hi Ali,
I also faced this error when I ran jobs, both in local mode and on a cluster.
I was able to solve the problem by removing the .crc file created in the
input folder for the job.
Please check that there is no .crc file in the input.
I hope this solves the problem.

Thanks,
Subbu
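
If stray .crc side-files did end up next to the input data, something along
these lines (a sketch only; the input path is a placeholder) can clear them out
before re-running the job:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.PathFilter;

public class RemoveStaleCrcFiles {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Path input = new Path(args[0]);   // e.g. the segment/content directory
    FileSystem fs = input.getFileSystem(conf);
    // Match only the checksum side-files, not the data files themselves.
    PathFilter crcOnly = new PathFilter() {
      public boolean accept(Path p) {
        return p.getName().endsWith(".crc");
      }
    };
    for (FileStatus stale : fs.listStatus(input, crcOnly)) {
      System.out.println("Deleting " + stale.getPath());
      fs.delete(stale.getPath(), false);
    }
  }
}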

On Wed, May 9, 2012 at 1:31 PM, Ali Safdar Kureishy 
safdar.kurei...@gmail.com wrote:

 Hi,

 I've included both the Nutch and Hadoop mailing lists, since I don't know
 which one of the two is the root cause for this issue, and it might be
 possible to pursue a resolution from both sides.

 What I'm trying to do is to dump the contents of all the fetched pages
 from my nutch crawl -- about 600K of them. I've tried extracting this
 information initially from the *segment/parse_text* folder, but I kept
 receiving the error below, so I switched over to the *segment/content 
 *folder,
 but BOTH of these *consistently *give me the following Checksum Error
 exception which fails the map-reduce job. At the very least I'm hoping to
 get some tip(s) on how to ignore this error and let my job complete.

 *org.apache.hadoop.fs.ChecksumException: Checksum Error
 at
 org.apache.hadoop.mapred.IFileInputStream.doRead(IFileInputStream.java:164)
 at
 org.apache.hadoop.mapred.IFileInputStream.read(IFileInputStream.java:101)
 at org.apache.hadoop.mapred.IFile$Reader.readData(IFile.java:328)
 at org.apache.hadoop.mapred.IFile$Reader.rejigData(IFile.java:358)
 at org.apache.hadoop.mapred.IFile$Reader.readNextBlock(IFile.java:342)
 at org.apache.hadoop.mapred.IFile$Reader.next(IFile.java:404)
 at org.apache.hadoop.mapred.Merger$Segment.next(Merger.java:220)
 at
 org.apache.hadoop.mapred.Merger$MergeQueue.adjustPriorityQueue(Merger.java:330)
 at org.apache.hadoop.mapred.Merger$MergeQueue.next(Merger.java:350)
 at org.apache.hadoop.mapred.Merger.writeFile(Merger.java:156)
 at org.apache.hadoop.mapred.Merger$MergeQueue.merge(Merger.java:499)
 at org.apache.hadoop.mapred.Merger$MergeQueue.merge(Merger.java:381)
 at org.apache.hadoop.mapred.Merger.merge(Merger.java:77)
 at
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer.mergeParts(MapTask.java:1522)
 at
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:1154)
 at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:359)
 at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
 at
 org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:177)
 *

 I'm using the *SequenceFileInputFormat* to read the data in each case.

 I have also attached the Hadoop output (checksum-error.txt). I have no
 idea how to ignore this error or to debug it. I've tried setting the
 boolean *io.skip.checksum.errors* property to *true* on the MapReduce
 Conf object, but it makes no difference. The error still happens
 consistently, so it seems like I'm either not setting the right property,
 or that it is being ignored by Hadoop? Since the error is thrown down in
 the internals of Hadoop, there doesn't seem to be any other way to ignore
 the error either, without changing Hadoop code (that I'm not able to do at
 this point). Is this a problem with the data that was output by Nutch? Or
 is this a bug with Hadoop? *Btw, I ran Nutch in local mode (without
 hadoop), and I'm running the Hadoop job (below) purely as an application
 from Eclipse (not via the bin/hadoop script).*

 Any help or pointers on how to dig further with this would be greatly
 appreciated. If there is any other way for me to ignore these checksum
 errors and let the job complete, do please share that with me as well.

 Here is the code for the reader job using MapReduce:

 package org.q.alt.sc.nutch.readerjobs;

 import java.io.IOException;

 import org.apache.hadoop.conf.Configured;
 import org.apache.hadoop.fs.Path;
 import org.apache.hadoop.io.Text;
 import org.apache.hadoop.mapred.FileInputFormat;
 import org.apache.hadoop.mapred.FileOutputFormat;
 import org.apache.hadoop.mapred.JobClient;
 import org.apache.hadoop.mapred.JobConf;
 import org.apache.hadoop.mapred.MapReduceBase;
 import org.apache.hadoop.mapred.Mapper;
 import org.apache.hadoop.mapred.OutputCollector;
 import org.apache.hadoop.mapred.Reporter;
 import org.apache.hadoop.mapred.SequenceFileInputFormat;
 import org.apache.hadoop.mapred.TextOutputFormat;
 import org.apache.hadoop.mapred.lib.IdentityReducer;
 import org.apache.hadoop.util.Tool;
 import org.apache.hadoop.util.ToolRunner;
 import org.apache.nutch.protocol.Content;

 public class SegmentContentReader extends Configured implements Tool {

 /**
  * @param args
  */
 public static void main(String[] args) throws Exception {
 int exitCode = ToolRunner.run(new SegmentContentReader(), args);
 System.exit(exitCode);
 }

 @Override
 public int run(String[] args) throws Exception {
 if (args.length != 2) {
 System.out.printf(
 Usage: %s [generic 

Re: Getting filename in case of MultipleInputs

2012-05-06 Thread Kasi Subrahmanyam
Yeah Jim,
I have gone through the comments in that JIRA ticket and was able to solve
my problem.
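
For the record, the workaround discussed in MAPREDUCE-1743 boils down to
unwrapping the TaggedInputSplit by reflection, since that class is not public
in 0.20.x. A sketch for the old API (treat this as untested; output types and
the mapper name are made up):

import java.io.IOException;
import java.lang.reflect.Method;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileSplit;
import org.apache.hadoop.mapred.InputSplit;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

public class FileNameAwareMapper extends MapReduceBase
    implements Mapper<LongWritable, Text, Text, Text> {

  public void map(LongWritable key, Text value,
      OutputCollector<Text, Text> out, Reporter reporter) throws IOException {
    try {
      InputSplit split = reporter.getInputSplit();
      if (!(split instanceof FileSplit)) {
        // With MultipleInputs the real split is wrapped in a package-private
        // TaggedInputSplit, so unwrap it reflectively.
        Method getWrapped = split.getClass().getDeclaredMethod("getInputSplit");
        getWrapped.setAccessible(true);
        split = (InputSplit) getWrapped.invoke(split);
      }
      Path file = ((FileSplit) split).getPath();
      out.collect(new Text(file.getName()), value);
    } catch (Exception e) {
      throw new IOException("Could not determine input file name", e);
    }
  }
}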

On Sat, May 5, 2012 at 11:25 PM, Jim Donofrio donofrio...@gmail.com wrote:

 There is already a JIRA for this:

 MAPREDUCE-1743


 On 05/03/2012 09:06 AM, Harsh J wrote:

 Subbu,

 The only way I can think of, is to use an overridden
 InputFormat/RecordReader pair that sets the map.input.file config
 value during its initialization, using the received FileSplit object.

 This should be considered as a bug, however, and even 2.x is affected.
 Can you please file a JIRA on
 https://issues.apache.org/jira/browse/MAPREDUCE and
  post back the ID
 on this thread for posterity?

 On Thu, May 3, 2012 at 6:25 PM, Kasi Subrahmanyam
 kasisubbu...@gmail.com  wrote:

 Hi,

 Could anyone suggest how to get the filename in the mapper?
 I have gone through the JIRA ticket saying that map.input.file doesn't work in
 the case
 of multiple inputs; TaggedInputSplit also doesn't work in the 0.20.2
 version, as it is not a public class.
 I tried to find another approach, but I could find none in my
 search.
 Could anyone suggest a solution other than these?



 Thanks in advance;
 Subbu.


 Thanks,




Re: Cleanup after a Job

2012-05-01 Thread kasi subrahmanyam
Hi Arun,

I can see that the output committer is present in the reducer.
How do I make sure that this committer runs at the end of the job, or does
it run by default at the end of the job?
I can have more than one reducer task.




On Sun, Apr 29, 2012 at 11:28 PM, Arun C Murthy a...@hortonworks.com wrote:

 Use OutputCommitter.(abortJob, commitJob):

 http://hadoop.apache.org/common/docs/r1.0.2/api/org/apache/hadoop/mapred/OutputCommitter.html

 Arun

 On Apr 26, 2012, at 4:44 PM, kasi subrahmanyam wrote:

 Hi

 I have a few jobs added to a Job controller.
 I need an afterJob() to be executed after the completion of each Job.
 For example:

 Here I am actually overriding the Job of JobControl.
 I have Job2 depending on the output of Job1. The input for Job2 is obtained
 after doing some file system operations on the output of Job1. This
 operation should happen in an afterJob() method which is available for each
 Job. How do I make sure that the afterJob() method is called for each Job added
 to the controller before running the jobs that depend on it?


 Thanks


 --
 Arun C. Murthy
 Hortonworks Inc.
 http://hortonworks.com/
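
A sketch of the hook Arun is pointing at, using the old mapred API (the extra
file-system step is a made-up placeholder): subclass FileOutputCommitter,
override commitJob(), and the work runs exactly once per job, regardless of the
number of reduce tasks.

import java.io.IOException;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.FileOutputCommitter;
import org.apache.hadoop.mapred.JobContext;

public class AfterJobCommitter extends FileOutputCommitter {
  @Override
  public void commitJob(JobContext context) throws IOException {
    // Let the normal commit move task outputs into place first.
    super.commitJob(context);
    // Then run the "afterJob"-style work, once per job.
    // Placeholder: hand Job1's output to Job2.
    FileSystem fs = FileSystem.get(context.getJobConf());
    fs.rename(new Path("/data/job1/out"), new Path("/data/job2/in"));
  }
}

It would then be registered on the job that produces the output with
jobConf.setOutputCommitter(AfterJobCommitter.class); abortJob() can be
overridden the same way for the failure path.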





Re: Cleanup after a Job

2012-05-01 Thread kasi subrahmanyam
Hi Robert,
Could you provide me the exact method of the JobControl Job or JobConf
which calls the commitJob method?
Thanks

On Tue, May 1, 2012 at 7:36 PM, Robert Evans ev...@yahoo-inc.com wrote:

  Either abortJob or commitJob will be called for all jobs.  AbortJob will
 be called if the job has failed.  CommitJob will be called if it succeeded.
  The purpose of these is to commit the output of the map/reduce job and
 clean up any temporary files/data that might be lying around.

 CommitTask/abortTask is similar, and is called for each individual task.

 --Bobby Evans



 On 5/1/12 8:32 AM, kasi subrahmanyam kasisubbu...@gmail.com wrote:

 Hi Arun,

 I can see that the output committer is present in the reducer.
 How do I make sure that this committer runs at the end of the job, or does
 it run by default at the end of the job?
 I can have more than one reducer task.




 On Sun, Apr 29, 2012 at 11:28 PM, Arun C Murthy a...@hortonworks.com
 wrote:

 Use OutputCommitter.(abortJob, commitJob):

 http://hadoop.apache.org/common/docs/r1.0.2/api/org/apache/hadoop/mapred/OutputCommitter.html

 Arun

 On Apr 26, 2012, at 4:44 PM, kasi subrahmanyam wrote:

 Hi

 I have a few jobs added to a Job controller.
 I need an afterJob() to be executed after the completion of each Job.
 For example:

 Here I am actually overriding the Job of JobControl.
 I have Job2 depending on the output of Job1. The input for Job2 is obtained
 after doing some file system operations on the output of Job1. This
 operation should happen in an afterJob() method which is available for each
 Job. How do I make sure that the afterJob() method is called for each Job added
 to the controller before running the jobs that depend on it?


 Thanks


 --
 Arun C. Murthy
 Hortonworks Inc.
 http://hortonworks.com/







Re: cygwin single node setup

2012-04-28 Thread kasi subrahmanyam
Hi Onder,
You could try formatting the namenode and restarting the daemons;
that solved my problem most of the time.
Maybe the running daemons were not able to pick up all the datanode
configurations.

On Sat, Apr 28, 2012 at 4:23 PM, Onder SEZGIN ondersez...@gmail.com wrote:

 Hi,

 I am pretty much a newbie and I am following the quick start guide for single
 node set up on Windows using Cygwin.

 In this step,

 $ bin/hadoop fs -put conf input

 I am getting the following errors.

 I have got no file
 at /user/EXT0125622/input/conf/capacity-scheduler.xml. That might be the
 reason for the errors I get, but why does Hadoop look for such a directory, as
 I have not configured anything like that? So supposedly Hadoop is making
 up and looking for such a file and directory?

 Any idea and help is welcome.

 Cheers
 Onder

 12/04/27 13:44:37 WARN hdfs.DFSClient: DataStreamer Exception:
 org.apache.hadoop.ipc.RemoteException: java.io.IOException: File
 /user/EXT0125622/input/conf/capacity-scheduler.xml could only be replicated
 to 0 nodes, instead of 1
at

 org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1558)
at
 org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:696)
at sun.reflect.GeneratedMethodAccessor6.invoke(Unknown Source)
at

 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:601)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:563)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1388)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1384)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at

 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1093)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1382)

at org.apache.hadoop.ipc.Client.call(Client.java:1066)
at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:225)
at $Proxy1.addBlock(Unknown Source)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at

 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at

 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:601)
at

 org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
at

 org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
at $Proxy1.addBlock(Unknown Source)
at

 org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.locateFollowingBlock(DFSClient.java:3507)
at

 org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:3370)
at

 org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2700(DFSClient.java:2586)
at

 org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2826)

  12/04/27 13:44:37 WARN hdfs.DFSClient: Error Recovery for block null bad
 datanode[0] nodes == null
 12/04/27 13:44:37 WARN hdfs.DFSClient: Could not get block locations.
 Source file /user/EXT0125622/input/conf/capacity-scheduler.xml -
 Aborting...
 put: java.io.IOException: File
 /user/EXT0125622/input/conf/capacity-scheduler.xml could only be replicated
 to 0 nodes, instead of 1
 12/04/27 13:44:37 ERROR hdfs.DFSClient: Exception closing file
 /user/EXT0125622/input/conf/capacity-scheduler.xml :
 org.apache.hadoop.ipc.RemoteException: java.io.IOException: File
 /user/EXT0125622/input/conf/capacity-scheduler.xml could only be replicated
 to 0 nodes, instead of 1
at

 org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1558)
at
 org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:696)
at sun.reflect.GeneratedMethodAccessor6.invoke(Unknown Source)
at

 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:601)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:563)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1388)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1384)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at

 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1093)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1382)

 org.apache.hadoop.ipc.RemoteException: java.io.IOException: File
 /user/EXT0125622/input/conf/capacity-scheduler.xml could only be replicated
 to 0 nodes, instead of 1
at

 

Re: Unable to set the heap size on Amazon elastic mapreduce

2012-04-05 Thread kasi subrahmanyam
Hi Shrish,

You need to increase the heap size of the JVM for the changes you made to
the mapred.child settings to take effect.
These changes need to be made in the files of the Hadoop configuration
folder.

In hadoop-env.sh, increase the heap size to maybe 2000 (I think this
property is commented out by default).
Then in mapred-default.xml (in the src folder), override the other
properties such as mapred.map.child.java.opts.

I hope this helps.
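
As a fallback (a sketch only, not EMR-specific advice), the same property can
also be set programmatically on the job configuration, which sidesteps the
bootstrap-action syntax entirely:

import org.apache.hadoop.mapred.JobConf;

public class HeapSizeConfig {
  // Returns the conf with a 1 GB child heap; the map/reduce-specific
  // variants (mapred.map.child.java.opts etc.) would override this per
  // task type if they are set elsewhere.
  public static JobConf withChildHeap(JobConf conf) {
    conf.set("mapred.child.java.opts", "-Xmx1024m");
    return conf;
  }
}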


On Thu, Apr 5, 2012 at 1:26 PM, shrish bajpai
random_unk...@hotmail.co.uk wrote:

   Hi,
 I have tried the following combinations of bootstrap actions to increase
 the heap size of my job but none of them seem to work:

 --mapred-key-value mapred.child.java.opts=-Xmx1024m
 --mapred-key-value mapred.child.ulimit=unlimited

 --mapred-key-value mapred.map.child.java.opts=-Xmx1024m
 --mapred-key-value mapred.map.child.ulimit=unlimited

 -m mapred.map.child.java.opts=-Xmx1024m
 -m mapred.map.child.ulimit=unlimited

 -m mapred.child.java.opts=-Xmx1024m
 -m mapred.child.ulimit=unlimited

 What is the right syntax?

 Thanks

 Shrish




Re: getting NullPointerException while running WordCount example

2012-04-05 Thread kasi subrahmanyam
Hi Sujit,

I think it is a problem with the hostname configuration.
Could you please check whether you added the hostnames of the master and
the slaves to the /etc/hosts file on all the nodes?


On Mon, Apr 2, 2012 at 8:00 PM, Sujit Dhamale sujitdhamal...@gmail.com wrote:

 Can someone please look into the below issue?
 Thanks in advance.

 On Wed, Mar 7, 2012 at 9:09 AM, Sujit Dhamale sujitdhamal...@gmail.com
 wrote:

  Hadoop version : hadoop-0.20.203.0rc1.tar
  Operating System : Ubuntu 11.10
 
 
 
  On Wed, Mar 7, 2012 at 12:19 AM, Harsh J ha...@cloudera.com wrote:
 
  Hi Sujit,
 
  Please also tell us which version/distribution of Hadoop is this?
 
  On Tue, Mar 6, 2012 at 11:27 PM, Sujit Dhamale 
 sujitdhamal...@gmail.com
  wrote:
   Hi,
  
   I am new to Hadoop; I installed Hadoop as per
  
 
 http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-single-node-cluster/
  
 
 http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-single-node-cluste
  
  
  
   while running the WordCount example I am getting a NullPointerException.

   Can someone please look into this issue?

   Thanks in advance!
  
  
   hduser@sujit:~/Desktop/hadoop$ bin/hadoop dfs -ls /user/hduser/data
   Found 3 items
   -rw-r--r--   1 hduser supergroup 674566 2012-03-06 23:04
   /user/hduser/data/pg20417.txt
   -rw-r--r--   1 hduser supergroup1573150 2012-03-06 23:04
   /user/hduser/data/pg4300.txt
   -rw-r--r--   1 hduser supergroup1423801 2012-03-06 23:04
   /user/hduser/data/pg5000.txt
  
   hduser@sujit:~/Desktop/hadoop$ bin/hadoop jar hadoop*examples*.jar
   wordcount /user/hduser/data /user/hduser/gutenberg-outputd
  
   12/03/06 23:14:33 INFO input.FileInputFormat: Total input paths to
  process
   : 3
   12/03/06 23:14:33 INFO mapred.JobClient: Running job:
  job_201203062221_0002
   12/03/06 23:14:34 INFO mapred.JobClient:  map 0% reduce 0%
   12/03/06 23:14:49 INFO mapred.JobClient:  map 66% reduce 0%
   12/03/06 23:14:55 INFO mapred.JobClient:  map 100% reduce 0%
   12/03/06 23:14:58 INFO mapred.JobClient: Task Id :
   attempt_201203062221_0002_r_00_0, Status : FAILED
   Error: java.lang.NullPointerException
  at
   java.util.concurrent.ConcurrentHashMap.get(ConcurrentHashMap.java:768)
  at
  
 
 org.apache.hadoop.mapred.ReduceTask$ReduceCopier$GetMapEventsThread.getMapCompletionEvents(ReduceTask.java:2900)
  at
  
 
 org.apache.hadoop.mapred.ReduceTask$ReduceCopier$GetMapEventsThread.run(ReduceTask.java:2820)
  
   12/03/06 23:15:07 INFO mapred.JobClient: Task Id :
   attempt_201203062221_0002_r_00_1, Status : FAILED
   Error: java.lang.NullPointerException
  at
   java.util.concurrent.ConcurrentHashMap.get(ConcurrentHashMap.java:768)
  at
  
 
 org.apache.hadoop.mapred.ReduceTask$ReduceCopier$GetMapEventsThread.getMapCompletionEvents(ReduceTask.java:2900)
  at
  
 
 org.apache.hadoop.mapred.ReduceTask$ReduceCopier$GetMapEventsThread.run(ReduceTask.java:2820)
  
   12/03/06 23:15:16 INFO mapred.JobClient: Task Id :
   attempt_201203062221_0002_r_00_2, Status : FAILED
   Error: java.lang.NullPointerException
  at
   java.util.concurrent.ConcurrentHashMap.get(ConcurrentHashMap.java:768)
  at
  
 
 org.apache.hadoop.mapred.ReduceTask$ReduceCopier$GetMapEventsThread.getMapCompletionEvents(ReduceTask.java:2900)
  at
  
 
 org.apache.hadoop.mapred.ReduceTask$ReduceCopier$GetMapEventsThread.run(ReduceTask.java:2820)
  
   12/03/06 23:15:31 INFO mapred.JobClient: Job complete:
  job_201203062221_0002
   12/03/06 23:15:31 INFO mapred.JobClient: Counters: 20
   12/03/06 23:15:31 INFO mapred.JobClient:   Job Counters
   12/03/06 23:15:31 INFO mapred.JobClient: Launched reduce tasks=4
   12/03/06 23:15:31 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=22084
   12/03/06 23:15:31 INFO mapred.JobClient: Total time spent by all
   reduces waiting after reserving slots (ms)=0
   12/03/06 23:15:31 INFO mapred.JobClient: Total time spent by all
  maps
   waiting after reserving slots (ms)=0
   12/03/06 23:15:31 INFO mapred.JobClient: Launched map tasks=3
   12/03/06 23:15:31 INFO mapred.JobClient: Data-local map tasks=3
   12/03/06 23:15:31 INFO mapred.JobClient: Failed reduce tasks=1
   12/03/06 23:15:31 INFO mapred.JobClient:
 SLOTS_MILLIS_REDUCES=16799
   12/03/06 23:15:31 INFO mapred.JobClient:   FileSystemCounters
   12/03/06 23:15:31 INFO mapred.JobClient: FILE_BYTES_READ=740520
   12/03/06 23:15:31 INFO mapred.JobClient: HDFS_BYTES_READ=3671863
   12/03/06 23:15:31 INFO mapred.JobClient:
 FILE_BYTES_WRITTEN=2278287
   12/03/06 23:15:31 INFO mapred.JobClient:   File Input Format Counters
   12/03/06 23:15:31 INFO mapred.JobClient: Bytes Read=3671517
   12/03/06 23:15:31 INFO mapred.JobClient:   Map-Reduce Framework
   12/03/06 23:15:31 INFO mapred.JobClient: Map output materialized
   bytes=1474341
   12/03/06 23:15:31 INFO mapred.JobClient: Combine output
  records=102322
   12/03/06 

Re: Read key and values from HDFS

2012-03-30 Thread kasi subrahmanyam
Hi Pedro,
I am not sure we have a single method for reading the data in output files
for different output formats.
But for sequence files we can use the SequenceFile.Reader class in the API to
read them.
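
A minimal sketch of reading arbitrary key/value pairs back out of a
SequenceFile (the path is a placeholder; the key and value classes are
discovered from the file header rather than hard-coded):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Writable;
import org.apache.hadoop.util.ReflectionUtils;

public class SequenceFileDump {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Path file = new Path(args[0]);   // e.g. part-00000 from the reduce output
    FileSystem fs = file.getFileSystem(conf);
    SequenceFile.Reader reader = new SequenceFile.Reader(fs, file, conf);
    try {
      // The file header records the key and value classes.
      Writable key = (Writable) ReflectionUtils.newInstance(reader.getKeyClass(), conf);
      Writable value = (Writable) ReflectionUtils.newInstance(reader.getValueClass(), conf);
      while (reader.next(key, value)) {
        System.out.println(key + "\t" + value);
      }
    } finally {
      reader.close();
    }
  }
}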

On Fri, Mar 30, 2012 at 10:49 PM, Pedro Costa psdc1...@gmail.com wrote:


 The ReduceTask can save the file using several output formats:
 InternalFileOutputFormat, SequenceFileOutputFormat, TeraOutputFormat, etc.

 How can I read the keys and the values from the output file? Can anyone
 give me an example? Is there a way to create just one method that can read
 all different outputformat?


 Thanks,

 --
 Best regards,




Re: start hadoop slave over WAN

2012-03-30 Thread kasi subrahmanyam
Try checking the datanode logs in the logs folder; they might give
some lead.
Maybe there is a namespace ID mismatch that shows up while starting the
datanode.

On Fri, Mar 30, 2012 at 10:32 PM, Ben Cuthbert bencuthb...@ymail.com wrote:

 All

 We have a master in one region and we are trying to start a slave datanode
 in another region. When executing the scripts it appears to log in to the
 remote host, but
 never starts the datanode. When starting HBase, though, it does work. Is there
 a timeout or something with Hadoop?


Re: Job tracker service start issue.

2012-03-23 Thread kasi subrahmanyam
Hi Olivier,

I am not sure whether my suggestion will solve your problem, or whether it is
already solved on your side.
It seems the JobTracker is having a problem accessing the tmp directory.
Try going to core-site.xml and mapred-site.xml and changing the tmp directory to a
new one.
If that still does not work, then manually change the permissions of that
directory using:
chmod -R 777 tmp

On Fri, Mar 23, 2012 at 3:33 PM, Olivier Sallou olivier.sal...@irisa.fr wrote:



 On 3/23/12 8:50 AM, Manish Bhoge wrote:
  I have Hadoop running on a standalone box. When I start the daemons for the
  namenode, secondarynamenode, jobtracker, tasktracker and datanode, they
 start gracefully. But soon after starting the job tracker, the job tracker
  service doesn't show up. When I run 'jps' it shows me all the
  services, including the task tracker, except the Job Tracker.

  Is there any time limit that needs to be set up, or is it going into safe
  mode? Because when I look at the job tracker log, this is what it shows; it
  looks like it is starting, but soon after, it shuts down:
 
  2012-03-22 23:26:04,061 INFO org.apache.hadoop.mapred.JobTracker:
 STARTUP_MSG:
  /
  STARTUP_MSG: Starting JobTracker
  STARTUP_MSG:   host = manish/10.131.18.119
  STARTUP_MSG:   args = []
  STARTUP_MSG:   version = 0.20.2-cdh3u3
  STARTUP_MSG:   build =
 file:///data/1/tmp/nightly_2012-02-16_09-46-24_3/hadoop-0.20-0.20.2+923.195-1~maverick
 -r 217a3767c48ad11d4632e19a22897677268c40c4; compiled by 'root' on Thu Feb
 16 10:22:53 PST 2012
  /
  2012-03-22 23:26:04,140 INFO
 org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager:
 Updating the current master key for generating delegation tokens
  2012-03-22 23:26:04,141 INFO
 org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager:
 Starting expired delegation token remover thread,
 tokenRemoverScanInterval=60 min(s)
  2012-03-22 23:26:04,141 INFO
 org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager:
 Updating the current master key for generating delegation tokens
  2012-03-22 23:26:04,142 INFO org.apache.hadoop.mapred.JobTracker:
 Scheduler configured with (memSizeForMapSlotOnJT, memSizeForReduceSlotOnJT,
 limitMaxMemForMapTasks, limitMaxMemForReduceTasks) (-1, -1, -1, -1)
  2012-03-22 23:26:04,143 INFO org.apache.hadoop.util.HostsFileReader:
 Refreshing hosts (include/exclude) list
  2012-03-22 23:26:04,186 INFO org.apache.hadoop.mapred.JobTracker:
 Starting jobtracker with owner as mapred
  2012-03-22 23:26:04,201 INFO org.apache.hadoop.ipc.Server: Starting
 Socket Reader #1 for port 54311
  2012-03-22 23:26:04,203 INFO org.apache.hadoop.ipc.metrics.RpcMetrics:
 Initializing RPC Metrics with hostName=JobTracker, port=54311
  2012-03-22 23:26:04,206 INFO
 org.apache.hadoop.ipc.metrics.RpcDetailedMetrics: Initializing RPC Metrics
 with hostName=JobTracker, port=54311
  2012-03-22 23:26:09,250 INFO org.mortbay.log: Logging to
 org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via
 org.mortbay.log.Slf4jLog
  2012-03-22 23:26:09,298 INFO org.apache.hadoop.http.HttpServer: Added
 global filtersafety
 (class=org.apache.hadoop.http.HttpServer$QuotingInputFilter)
  2012-03-22 23:26:09,318 INFO org.apache.hadoop.http.HttpServer: Port
 returned by webServer.getConnectors()[0].getLocalPort() before open() is
 -1. Opening the listener on 50030
  2012-03-22 23:26:09,318 INFO org.apache.hadoop.http.HttpServer:
 listener.getLocalPort() returned 50030
 webServer.getConnectors()[0].getLocalPort() returned 50030
  2012-03-22 23:26:09,318 INFO org.apache.hadoop.http.HttpServer: Jetty
 bound to port 50030
  2012-03-22 23:26:09,319 INFO org.mortbay.log: jetty-6.1.26.cloudera.1
  2012-03-22 23:26:09,517 INFO org.mortbay.log: Started
 SelectChannelConnector@0.0.0.0:50030
  2012-03-22 23:26:09,519 INFO org.apache.hadoop.metrics.jvm.JvmMetrics:
 Initializing JVM Metrics with processName=JobTracker, sessionId=
  2012-03-22 23:26:09,519 INFO org.apache.hadoop.mapred.JobTracker:
 JobTracker up at: 54311
  2012-03-22 23:26:09,519 INFO org.apache.hadoop.mapred.JobTracker:
 JobTracker webserver: 50030
  2012-03-22 23:26:09,648 WARN org.apache.hadoop.mapred.JobTracker: Failed
 to operate on mapred.system.dir
 (hdfs://localhost:54310/app/hadoop/tmp/mapred/system) because of
 permissions.
  2012-03-22 23:26:09,648 WARN org.apache.hadoop.mapred.JobTracker: This
 directory should be owned by the user 'mapred (auth:SIMPLE)'
  2012-03-22 23:26:09,650 WARN org.apache.hadoop.mapred.JobTracker:
 Bailing out ...
  org.apache.hadoop.security.AccessControlException: The systemdir
 hdfs://localhost:54310/app/hadoop/tmp/mapred/system is not owned by mapred
 at org.apache.hadoop.mapred.JobTracker.init(JobTracker.java:2241) at
 org.apache.hadoop.mapred.JobTracker.init(JobTracker.java:2050) at