RE: Restricting quota for users in HDFS

2009-06-17 Thread Palleti, Pallavi
Yeah. I meant the same. I want to restrict a directory which is owned by
a particular user.

Thanks
Pallavi

-Original Message-
From: Allen Wittenauer [mailto:a...@yahoo-inc.com] 
Sent: Tuesday, June 16, 2009 11:18 PM
To: core-user@hadoop.apache.org
Subject: Re: Restricting quota for users in HDFS




On 6/15/09 11:16 PM, Palleti, Pallavi pallavi.pall...@corp.aol.com
wrote:
 We have the chown command in hadoop dfs to make a particular directory owned
 by a person. Do we have something similar to create a user with some space
 limit / restrict the disk usage by a particular user?

Quotas are implemented on a per-directory basis, not per-user. There
is no support for "this user can have X space, regardless of where he/she
writes"; only "this directory has a limit of X space, regardless of who
writes there."
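
For illustration only (the directory and numbers here are made up, and the
space-quota command needs a release that ships it; the size suffix syntax may
vary by version):

    # cap the number of names (files/directories) under one user's tree
    hadoop dfsadmin -setQuota 100000 /user/pallavi
    # cap the raw disk space consumed under that tree (newer releases)
    hadoop dfsadmin -setSpaceQuota 500g /user/pallavi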



hadoop-streaming for network simulation

2009-06-17 Thread Simon Lorenz
Hi,

maybe somebody could help me with this. 

What I want to do is use Hadoop Streaming to execute the same
program with different parameters.

I'm using the network simulation software Omnet++ and I want to run
this simulation in parallel.

Omnet programs can be executed from a Linux shell; they just need an
omnetpp.ini file for configuration.
So the first step for me is to get Omnet running on Hadoop with a simple
example and the same parameters.

So which Hadoop parameters do I have to use when I start Hadoop Streaming?
(I think Hadoop Streaming is the right way to do this, or not?)

Actually I try something like this, but the streaming job fails.

bin/hadoop jar hadoop-0.18.3-streaming.jar 
-input /input/fifo  
-output /output/fifo 
-mapper /home/simon/omnetpp-4.0/samples/fifo/fifo -u Cmdenv -c Fifo1
-file /home/simon/omnetpp-4.0/samples/fifo/fifo 
-reducer NONE

So the omnet program is just the mapper.

- the omnetpp.ini file that the omnet fifo example needs to run the job is
located in /input/fifo.

"fifo -u Cmdenv -c Fifo1" are the parameters to start omnet without the
graphic interface. But also when I put these parameters into a shell
script and run the script with hadoop I get the same error.

The omnet program is installed on all machines.

Maybe I have to give the omnetpp.ini file to the omnet-fifo program in some
other way? I don't know.
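
One variant I am wondering about is quoting the mapper command so its
arguments travel with it (just a guess, not something I have verified):

    bin/hadoop jar hadoop-0.18.3-streaming.jar \
        -input /input/fifo \
        -output /output/fifo \
        -mapper "fifo -u Cmdenv -c Fifo1" \
        -file /home/simon/omnetpp-4.0/samples/fifo/fifo \
        -reducer NONE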

I'm no programmer and I don't even know if this approach goes in the
right direction.
I would be very glad for any hints or suggestions.

P.S. Sorry for my bad english

Simon Lorenz
Karlsruhe
Germany



Re: Debugging Map-Reduce programs

2009-06-17 Thread Rakhi Khatwani
Hi,
  You could also use Apache Commons Logging to write logs in your
map/reduce functions, which can then be seen in the JobTracker UI.
That's how we did debugging :)
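
A minimal sketch of that (the class name and log messages are just
placeholders):

    import java.io.IOException;
    import org.apache.commons.logging.Log;
    import org.apache.commons.logging.LogFactory;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.MapReduceBase;
    import org.apache.hadoop.mapred.Mapper;
    import org.apache.hadoop.mapred.OutputCollector;
    import org.apache.hadoop.mapred.Reporter;

    public class MyMapper extends MapReduceBase
        implements Mapper<LongWritable, Text, Text, Text> {

      // messages written here end up in the task's log files, which you can
      // browse from the JobTracker / task web UI
      private static final Log LOG = LogFactory.getLog(MyMapper.class);

      public void map(LongWritable key, Text value,
                      OutputCollector<Text, Text> output, Reporter reporter)
          throws IOException {
        LOG.info("map called with key " + key + ", value length " + value.getLength());
        output.collect(new Text("line"), value);
      }
    }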

Hope it helps
Regards,
Raakhi


On Tue, Jun 16, 2009 at 7:29 PM, jason hadoop jason.had...@gmail.com wrote:

 When you are running in local mode you have 2 basic choices if you want to
 interact with a debugger.
 You can launch from within eclipse or other IDE, or you can setup a java
 debugger transport as part of the mapred.child.java.opts variable, and
 attach to the running jvm.
 By far the simplest is loading via eclipse.

 Your other alternative is to inform the framework to retain the job files
 via keep.failed.task.files (be careful here: you will fill your disk with
 old
 dead data) and then use the IsolationRunner to debug.

 Examples in my book :)
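
 A rough sketch of the debugger-transport option above (the JDWP flags and
 port 8000 are assumptions; with suspend=y each child task JVM waits until a
 debugger attaches):

    import org.apache.hadoop.mapred.JobConf;

    public class DebugOptsExample {
      public static void main(String[] args) {
        JobConf conf = new JobConf(DebugOptsExample.class);
        // each child task JVM will listen for a remote debugger on port 8000
        conf.set("mapred.child.java.opts",
            "-agentlib:jdwp=transport=dt_socket,server=y,suspend=y,address=8000");
        // ... set input/output formats and paths, then JobClient.runJob(conf)
      }
    }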


 On Mon, Jun 15, 2009 at 6:49 PM, bharath vissapragada 
 bharathvissapragada1...@gmail.com wrote:

  I am running in a local mode . Can you tell me how to set those
 breakpoints
  or how to access those files so that i can debug the program.
 
  The program is generating  = java.lang.NumberFormatException: For input
  string: 
 
  But that particular string is the one which is the input to the map class.
  So I think that it is not reading my input correctly. But when I try to
  print the same, it isn't printing to STDOUT.
  I am using the FileInputFormat class:

   FileInputFormat.addInputPath(conf, new
  Path("/home/rip/Desktop/hadoop-0.18.3/input"));
   FileOutputFormat.setOutputPath(conf, new
  Path("/home/rip/Desktop/hadoop-0.18.3/output"));
 
  input and output are folders for inp and outpt.
 
  It is generating these warnings also
 
  09/06/16 12:38:32 WARN fs.FileSystem: "local" is a deprecated filesystem
  name. Use "file:///" instead.
 
  Thanks in advance
 
 
  On Tue, Jun 16, 2009 at 3:50 AM, Aaron Kimball aa...@cloudera.com
 wrote:
 
   On Mon, Jun 15, 2009 at 10:01 AM, bharath vissapragada 
   bhara...@students.iiit.ac.in wrote:
  
Hi all ,
   
When running hadoop in local mode .. can we use print statements to
   print
something to the terminal ...
  
  
   Yes. In distributed mode, each task will write its stdout/stderr to
 files
   which you can access through the web-based interface.
  
  
   
Also iam not sure whether the program is reading my input files ...
 If
  i
keep print statements it isn't displaying any .. can anyone tell me
 how
   to
solve this problem.
  
  
   Is it generating exceptions? Are the files present? If you're running
 in
   local mode, you can use a debugger; set a breakpoint in your map()
 method
   and see if it gets there. How are you configuring the input files for
  your
   job?
  
  
   
   
Thanks in adance,
   
  
 



 --
 Pro Hadoop, a book to guide you from beginner to hadoop mastery,
 http://www.amazon.com/dp/1430219424?tag=jewlerymall
 www.prohadoopbook.com a community for Hadoop Professionals



Error Recovery for block -Aborting to Write

2009-06-17 Thread Pallavi Palleti
Hi all,

We are facing issues while porting some logs to HDFS. The way we are doing it
is using a simple piece of Java code which reads the file and writes to HDFS
using an OutputStream. It was working perfectly fine, but recently we have been
getting the error messages below once in a while, and when we try to read that
data we get the error message "Could not obtain block" and the jobs are
failing.
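
For reference, a stripped-down sketch of that kind of copy (the class name,
paths, and buffer size are placeholders, not our exact code):

    import java.io.FileInputStream;
    import java.io.InputStream;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IOUtils;

    public class LogUploader {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();      // picks up hadoop-site.xml
        FileSystem fs = FileSystem.get(conf);          // the configured HDFS
        InputStream in = new FileInputStream(args[0]); // local log file
        FSDataOutputStream out = fs.create(new Path(args[1])); // HDFS destination
        // copy and close both streams; write failures surface as IOExceptions
        // like the DFSClient warnings shown below
        IOUtils.copyBytes(in, out, 4096, true);
      }
    }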

Can someone tell me what the issue could be?

The error message while writing to HDFS is:

WARN dfs.DFSClient: DFSOutputStream ResponseProcessor
exception  for block blk_-1005367228931083977_10012402java.io.IOException:
Bad response 1 for block blk_-1005367228931083977_10012402 from datanode
xxx.xxx.xxx.88:50010
at
org.apache.hadoop.dfs.DFSClient$DFSOutputStream$ResponseProcessor.run(DFSCli
ent.java:2076)

09/06/08 14:26:41 WARN dfs.DFSClient: Error Recovery for block
blk_-1005367228931083977_10012402 bad datanode[1] xxx.xxx.xxx.88:50010
09/06/08 14:26:41 WARN dfs.DFSClient: Error Recovery for block
blk_-1005367228931083977_10012402 in pipeline xxx.xxx.xxx.79:50010,
xxx.xxx.xxx.88:50010, xxx.xxx.xxx.68:50010: bad datanode
xxx.xxx.xxx.88:50010
09/06/08 14:26:42 WARN dfs.DFSClient: DFSOutputStream ResponseProcessor
exception  for block blk_-1005367228931083977_10012456java.io.IOException:
Bad response 1 for block blk_-1005367228931083977_10012456 from datanode
xxx.xxx.xxx.68:50010
at
org.apache.hadoop.dfs.DFSClient$DFSOutputStream$ResponseProcessor.run(DFSCli
ent.java:2076)

09/06/08 14:26:42 WARN dfs.DFSClient: Error Recovery for block
blk_-1005367228931083977_10012456 bad datanode[1] xxx.xxx.xxx.68:50010
09/06/08 14:26:42 WARN dfs.DFSClient: Error Recovery for block
blk_-1005367228931083977_10012456 in pipeline xxx.xxx.xxx.79:50010,
xxx.xxx.xxx.68:50010: bad datanode xxx.xxx.xxx.68:50010
09/06/08 14:26:43 WARN dfs.DFSClient: DFSOutputStream ResponseProcessor
exception  for block blk_-1005367228931083977_10012457java.io.EOFException
at java.io.DataInputStream.readFully(DataInputStream.java:180)
at java.io.DataInputStream.readLong(DataInputStream.java:399)
at
org.apache.hadoop.dfs.DFSClient$DFSOutputStream$ResponseProcessor.run(DFSCli
ent.java:2052)

09/06/08 14:26:43 WARN dfs.DFSClient: Error Recovery for block
blk_-1005367228931083977_10012457 bad datanode[0] xxx.xxx.xxx.79:50010
IOException - java.io.IOException: All datanodes xxx.xxx.xxx.79:50010 are
bad. Aborting... while writing -

Thanks
Pallavi


not a SequenceFile?

2009-06-17 Thread Shravan Mahankali
Hi Group,

 

I have trouble running a couple of the examples provided by Hadoop. Below are
the error messages I have from the console; could you please advise what could
be the problem and a probable solution?

 

09/06/17 16:30:29 INFO mapred.FileInputFormat: Total input paths to process
: 1

09/06/17 16:30:29 INFO mapred.FileInputFormat: Total input paths to process
: 1

09/06/17 16:30:29 INFO mapred.JobClient: Running job: job_200906171601_0009

09/06/17 16:30:30 INFO mapred.JobClient:  map 0% reduce 0%

09/06/17 16:30:38 INFO mapred.JobClient: Task Id :
attempt_200906171601_0009_m_00_0, Status : FAILED

java.io.IOException: hdfs://localhost:9000/user/root/words not a
SequenceFile

at
org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1458)

at
org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1431)

at
org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1420)

at
org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1415)

at
org.apache.hadoop.mapred.SequenceFileRecordReader.init(SequenceFileRecordR
eader.java:43)

at
org.apache.hadoop.mapred.SequenceFileInputFormat.getRecordReader(SequenceFil
eInputFormat.java:54)

at org.apache.hadoop.mapred.MapTask.run(MapTask.java:219)

at
org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2198)

 

09/06/17 16:30:39 INFO mapred.JobClient: Task Id :
attempt_200906171601_0009_r_00_0, Status : FAILED

java.lang.RuntimeException: java.lang.ClassNotFoundException:
org.apache.hadoop.examples.AggregateWordCount$WordCountPlugInClass

at
org.apache.hadoop.mapred.lib.aggregate.UserDefinedValueAggregatorDescriptor.
createInstance(UserDefinedValueAggregatorDescriptor.java:57)

at
org.apache.hadoop.mapred.lib.aggregate.UserDefinedValueAggregatorDescriptor.
createAggregator(UserDefinedValueAggregatorDescriptor.java:64)

at
org.apache.hadoop.mapred.lib.aggregate.UserDefinedValueAggregatorDescriptor.
init(UserDefinedValueAggregatorDescriptor.java:76)

at
org.apache.hadoop.mapred.lib.aggregate.ValueAggregatorJobBase.getValueAggreg
atorDescriptor(ValueAggregatorJobBase.java:54)

at
org.apache.hadoop.mapred.lib.aggregate.ValueAggregatorJobBase.getAggregatorD
escriptors(ValueAggregatorJobBase.java:65)

at
org.apache.hadoop.mapred.lib.aggregate.ValueAggregatorJobBase.initializeMySp
ec(ValueAggregatorJobBase.java:74)

at
org.apache.hadoop.mapred.lib.aggregate.ValueAggregatorJobBase.configure(Valu
eAggregatorJobBase.java:42)

at
org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:58)

at
org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:82)

at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:240)

at
org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2198)

Caused by: java.lang.ClassNotFoundException:
org.apache.hadoop.examples.AggregateWordCount$WordCountPlugInClass

at java.net.URLClassLoader$1.run(URLClassLoader.java:200)

at java.security.AccessController.doPrivileged(Native Method)

at java.net.URLClassLoader.findClass(URLClassLoader.java:188)

at java.lang.ClassLoader.loadClass(ClassLoader.java:306)

at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:268)

at java.lang.ClassLoader.loadClass(ClassLoader.java:251)

at java.lang.ClassLoader.loadClassInternal(ClassLoader.java:319)

at java.lang.Class.forName0(Native Method)

at java.lang.Class.forName(Class.java:242)

at
org.apache.hadoop.mapred.lib.aggregate.UserDefinedValueAggregatorDescriptor.
createInstance(UserDefinedValueAggregatorDescriptor.java:52)

... 10 more

 

Thank You,

Shravan Kumar. M 

Catalytic Software Ltd. [SEI-CMMI Level 5 Company]

-

This email and any files transmitted with it are confidential and intended
solely for the use of the individual or entity to whom they are addressed.
If you have received this email in error please notify the system
administrator -  mailto:netopshelpd...@catalytic.com
netopshelpd...@catalytic.com

 

 



Re: not a SequenceFile?

2009-06-17 Thread Nick Cen
I guess you have set SequenceFileInputFormat as your input format in the job
configuration, but the file you provided is not a SequenceFile.
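
If that is the case, a minimal sketch of the driver-side fix (the class name
and path here are just for illustration):

    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapred.FileInputFormat;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.TextInputFormat;

    public class InputFormatCheck {
      public static void main(String[] args) {
        JobConf conf = new JobConf(InputFormatCheck.class);
        // plain text such as /user/root/words needs TextInputFormat;
        // SequenceFileInputFormat only reads files written by SequenceFile.Writer
        conf.setInputFormat(TextInputFormat.class);
        FileInputFormat.setInputPaths(conf, new Path("/user/root/words"));
        // ... mapper/reducer setup and JobClient.runJob(conf) as usual
      }
    }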

2009/6/17 Shravan Mahankali shravan.mahank...@catalytic.com

 Hi Group,



 I have trouble running couple of examples provided by Hadoop. Below are the
 error messages I have from the console, could you please advise what could
 be the problem and probable solution?



 09/06/17 16:30:29 INFO mapred.FileInputFormat: Total input paths to process
 : 1

 09/06/17 16:30:29 INFO mapred.FileInputFormat: Total input paths to process
 : 1

 09/06/17 16:30:29 INFO mapred.JobClient: Running job: job_200906171601_0009

 09/06/17 16:30:30 INFO mapred.JobClient:  map 0% reduce 0%

 09/06/17 16:30:38 INFO mapred.JobClient: Task Id :
 attempt_200906171601_0009_m_00_0, Status : FAILED

 java.io.IOException: hdfs://localhost:9000/user/root/words not a
 SequenceFile

at
 org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1458)

at
 org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1431)

at
 org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1420)

at
 org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1415)

at

 org.apache.hadoop.mapred.SequenceFileRecordReader.init(SequenceFileRecordR
 eader.java:43)

at

 org.apache.hadoop.mapred.SequenceFileInputFormat.getRecordReader(SequenceFil
 eInputFormat.java:54)

at org.apache.hadoop.mapred.MapTask.run(MapTask.java:219)

at
 org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2198)



 09/06/17 16:30:39 INFO mapred.JobClient: Task Id :
 attempt_200906171601_0009_r_00_0, Status : FAILED

 java.lang.RuntimeException: java.lang.ClassNotFoundException:
 org.apache.hadoop.examples.AggregateWordCount$WordCountPlugInClass

at

 org.apache.hadoop.mapred.lib.aggregate.UserDefinedValueAggregatorDescriptor.
 createInstance(UserDefinedValueAggregatorDescriptor.java:57)

at

 org.apache.hadoop.mapred.lib.aggregate.UserDefinedValueAggregatorDescriptor.
 createAggregator(UserDefinedValueAggregatorDescriptor.java:64)

at

 org.apache.hadoop.mapred.lib.aggregate.UserDefinedValueAggregatorDescriptor.
 init(UserDefinedValueAggregatorDescriptor.java:76)

at

 org.apache.hadoop.mapred.lib.aggregate.ValueAggregatorJobBase.getValueAggreg
 atorDescriptor(ValueAggregatorJobBase.java:54)

at

 org.apache.hadoop.mapred.lib.aggregate.ValueAggregatorJobBase.getAggregatorD
 escriptors(ValueAggregatorJobBase.java:65)

at

 org.apache.hadoop.mapred.lib.aggregate.ValueAggregatorJobBase.initializeMySp
 ec(ValueAggregatorJobBase.java:74)

at

 org.apache.hadoop.mapred.lib.aggregate.ValueAggregatorJobBase.configure(Valu
 eAggregatorJobBase.java:42)

at
 org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:58)

at
 org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:82)

at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:240)

at
 org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2198)

 Caused by: java.lang.ClassNotFoundException:
 org.apache.hadoop.examples.AggregateWordCount$WordCountPlugInClass

at java.net.URLClassLoader$1.run(URLClassLoader.java:200)

at java.security.AccessController.doPrivileged(Native Method)

at java.net.URLClassLoader.findClass(URLClassLoader.java:188)

at java.lang.ClassLoader.loadClass(ClassLoader.java:306)

at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:268)

at java.lang.ClassLoader.loadClass(ClassLoader.java:251)

at java.lang.ClassLoader.loadClassInternal(ClassLoader.java:319)

at java.lang.Class.forName0(Native Method)

at java.lang.Class.forName(Class.java:242)

at

 org.apache.hadoop.mapred.lib.aggregate.UserDefinedValueAggregatorDescriptor.
 createInstance(UserDefinedValueAggregatorDescriptor.java:52)

... 10 more



 Thank You,

 Shravan Kumar. M

 Catalytic Software Ltd. [SEI-CMMI Level 5 Company]

 -

 This email and any files transmitted with it are confidential and intended
 solely for the use of the individual or entity to whom they are addressed.
 If you have received this email in error please notify the system
 administrator -  mailto:netopshelpd...@catalytic.com
 netopshelpd...@catalytic.com








-- 
http://daily.appspot.com/food/


RE: [ADV] Blatant marketing of the book Pro Hadoop. In honor of the 09 summit here is a 50% off coupon corrected code is LUCKYOU

2009-06-17 Thread zjffdu
HI Jason,

Where can I download your book's Alpha Chapters? I am very interested in
your book about Hadoop.

And I cannot visit the link www.prohadoopbook.com



-Original Message-
From: jason hadoop [mailto:jason.had...@gmail.com] 
Sent: June 9, 2009 20:47
To: core-user@hadoop.apache.org
Subject: [ADV] Blatant marketing of the book Pro Hadoop. In honor of the 09
summit here is a 50% off coupon corrected code is LUCKYOU

http://eBookshop.apress.com CODE LUCKYOU

-- 
Alpha Chapters of my book on Hadoop are available
http://www.apress.com/book/view/9781430219422
www.prohadoopbook.com a community for Hadoop Professionals



Re: Nor OOM Java Heap Space neither GC OverHead Limit Exeeceded

2009-06-17 Thread akhil1988

Thanks Jason.

I went inside the code of the statement and found out that it eventually
makes a binaryRead function call to read a binary file, and there it
gets stuck.

Do you know whether there is any problem in giving a binary file for
addition to the distributed cache?
In the statement DistributedCache.addCacheFile(new
URI("/home/akhil1988/Ner/OriginalNer/Data/"), conf); Data is a directory
which contains some text as well as some binary files. In the statement
Parameters.readConfigAndLoadExternalData("Config/allLayer1.config"); I can
see (in the output messages) that it is able to read the text files but it
gets stuck at the binary files.

So, I think the problem here is: it is not able to read the binary files,
which either have not been transferred to the cache, or a binary file cannot
be read.

Do you know the solution to this?

Thanks,
Akhil


jason hadoop wrote:
 
 Something is happening inside of your (Parameters.
 readConfigAndLoadExternalData(Config/allLayer1.config);)
 code, and the framework is killing the job for not heartbeating for 600
 seconds
 
 On Tue, Jun 16, 2009 at 8:32 PM, akhil1988 akhilan...@gmail.com wrote:
 

 One more thing, finally it terminates there (after some time) by giving
 the
 final Exception:

 java.io.IOException: Job failed!
at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1217)
 at LbjTagger.NerTagger.main(NerTagger.java:109)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at

 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at

 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.RunJar.main(RunJar.java:165)
at org.apache.hadoop.mapred.JobShell.run(JobShell.java:54)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
at org.apache.hadoop.mapred.JobShell.main(JobShell.java:68)


 akhil1988 wrote:
 
  Thank you Jason for your reply.
 
  My Map class is an inner class and it is a static class. Here is the
  structure of my code.
 
  public class NerTagger {

      public static class Map extends MapReduceBase implements
              Mapper<LongWritable, Text, Text, Text> {
          private Text word = new Text();
          private static NETaggerLevel1 tagger1 = new NETaggerLevel1();
          private static NETaggerLevel2 tagger2 = new NETaggerLevel2();

          Map() {
              System.out.println("HI2\n");
              Parameters.readConfigAndLoadExternalData("Config/allLayer1.config");
              System.out.println("HI3\n");
              Parameters.forceNewSentenceOnLineBreaks = Boolean.parseBoolean("true");
              System.out.println("loading the tagger");
              tagger1 = (NETaggerLevel1) Classifier.binaryRead(Parameters.pathToModelFile + ".level1");
              System.out.println("HI5\n");
              tagger2 = (NETaggerLevel2) Classifier.binaryRead(Parameters.pathToModelFile + ".level2");
              System.out.println("Done - loading the tagger");
          }

          public void map(LongWritable key, Text value,
                  OutputCollector<Text, Text> output, Reporter reporter) throws IOException {
              String inputline = value.toString();

              /* Processing of the input pair is done here */
          }
      }

      public static void main(String[] args) throws Exception {
          JobConf conf = new JobConf(NerTagger.class);
          conf.setJobName("NerTagger");

          conf.setOutputKeyClass(Text.class);
          conf.setOutputValueClass(IntWritable.class);

          conf.setMapperClass(Map.class);
          conf.setNumReduceTasks(0);

          conf.setInputFormat(TextInputFormat.class);
          conf.setOutputFormat(TextOutputFormat.class);

          conf.set("mapred.job.tracker", "local");
          conf.set("fs.default.name", "file:///");

          DistributedCache.addCacheFile(new
                  URI("/home/akhil1988/Ner/OriginalNer/Data/"), conf);
          DistributedCache.addCacheFile(new
                  URI("/home/akhil1988/Ner/OriginalNer/Config/"), conf);
          DistributedCache.createSymlink(conf);

          conf.set("mapred.child.java.opts", "-Xmx4096m");

          FileInputFormat.setInputPaths(conf, new Path(args[0]));
          FileOutputFormat.setOutputPath(conf, new Path(args[1]));

          System.out.println("HI1\n");

          JobClient.runJob(conf);
      }
  }
 
  Jason, when the program executes, HI1 and HI2 are printed but it does not
  reach HI3. In the statement
  Parameters.readConfigAndLoadExternalData("Config/allLayer1.config"); it is
  able to access the Config/allLayer1.config file 

Re: Nor OOM Java Heap Space neither GC OverHead Limit Exeeceded

2009-06-17 Thread jason hadoop
I have only ever used the distributed cache to add files, including binary
files such as shared libraries.
It looks like you are adding a directory.

The DistributedCache is not generally used for passing data, but for passing
file names.
The files must be stored in a shared file system (hdfs for simplicity)
already.

The distributed cache makes the names available to the tasks, and the
files are extracted from HDFS and stored in the task-local work area on each
task tracker node.
It looks like you may be storing the contents of your files in the
distributed cache.
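
A rough sketch of the per-file pattern described above (the HDFS path and
symlink name are made up; the file would need to be put into HDFS first, e.g.
with bin/hadoop fs -put):

    import java.net.URI;
    import org.apache.hadoop.filecache.DistributedCache;
    import org.apache.hadoop.mapred.JobConf;

    public class CacheSetup {
      public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(CacheSetup.class);
        // the file must already exist in HDFS; the #fragment names the symlink
        DistributedCache.addCacheFile(
            new URI("/user/akhil1988/cache/model.level1#model.level1"), conf);
        DistributedCache.createSymlink(conf);
        // ... rest of the job setup; inside a task, open "model.level1" from the
        // working directory, or use DistributedCache.getLocalCacheFiles(conf)
      }
    }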

On Wed, Jun 17, 2009 at 6:56 AM, akhil1988 akhilan...@gmail.com wrote:


 Thanks Jason.

 I went inside the code of the statement and found out that it eventually
 makes some binaryRead function call to read a binary file and there it
 strucks.

 Do you know whether there is any problem in giving a binary file for
 addition to the distributed cache.
 In the statement DistributedCache.addCacheFile(new
 URI(/home/akhil1988/Ner/OriginalNer/Data/), conf); Data is a directory
 which contains some text as well as some binary files. In the statement
 Parameters.readConfigAndLoadExternalData(Config/allLayer1.config); I can
 see(in the output messages) that it is able to read the text files but it
 gets struck at the binary files.

 So, I think here the problem is: it is not able to read the binary files
 which either have not been transferred to the cache or a binary file cannot
 be read.

 Do you know the solution to this?

 Thanks,
 Akhil


 jason hadoop wrote:
 
  Something is happening inside of your (Parameters.
  readConfigAndLoadExternalData(Config/allLayer1.config);)
  code, and the framework is killing the job for not heartbeating for 600
  seconds
 
  On Tue, Jun 16, 2009 at 8:32 PM, akhil1988 akhilan...@gmail.com wrote:
 
 
  One more thing, finally it terminates there (after some time) by giving
  the
  final Exception:
 
  java.io.IOException: Job failed!
 at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1217)
  at LbjTagger.NerTagger.main(NerTagger.java:109)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at
 
 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
 at
 
 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
 at java.lang.reflect.Method.invoke(Method.java:597)
 at org.apache.hadoop.util.RunJar.main(RunJar.java:165)
 at org.apache.hadoop.mapred.JobShell.run(JobShell.java:54)
 at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
 at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
 at org.apache.hadoop.mapred.JobShell.main(JobShell.java:68)
 
 
  akhil1988 wrote:
  
   Thank you Jason for your reply.
  
   My Map class is an inner class and it is a static class. Here is the
   structure of my code.
  
   public class NerTagger {
  
   public static class Map extends MapReduceBase implements
   MapperLongWritable, Text, Text, Text{
   private Text word = new Text();
   private static NETaggerLevel1 tagger1 = new
   NETaggerLevel1();
   private static NETaggerLevel2 tagger2 = new
   NETaggerLevel2();
  
   Map(){
   System.out.println(HI2\n);
  
   Parameters.readConfigAndLoadExternalData(Config/allLayer1.config);
   System.out.println(HI3\n);
  
   Parameters.forceNewSentenceOnLineBreaks=Boolean.parseBoolean(true);
  
   System.out.println(loading the tagger);
  
  
 
 tagger1=(NETaggerLevel1)Classifier.binaryRead(Parameters.pathToModelFile+.level1);
   System.out.println(HI5\n);
  
  
 
 tagger2=(NETaggerLevel2)Classifier.binaryRead(Parameters.pathToModelFile+.level2);
   System.out.println(Done- loading the
 tagger);
   }
  
   public void map(LongWritable key, Text value,
   OutputCollectorText, Text output, Reporter reporter ) throws
  IOException
   {
   String inputline = value.toString();
  
   /* Processing of the input pair is done here
 */
   }
  
  
   public static void main(String [] args) throws Exception {
   JobConf conf = new JobConf(NerTagger.class);
   conf.setJobName(NerTagger);
  
   conf.setOutputKeyClass(Text.class);
   conf.setOutputValueClass(IntWritable.class);
  
   conf.setMapperClass(Map.class);
   conf.setNumReduceTasks(0);
  
   conf.setInputFormat(TextInputFormat.class);
   conf.setOutputFormat(TextOutputFormat.class);
  
   conf.set(mapred.job.tracker, local);
   conf.set(fs.default.name, file:///);
  
   DistributedCache.addCacheFile(new
   

Re: [ADV] Blatant marketing of the book Pro Hadoop. In honor of the 09 summit here is a 50% off coupon corrected code is LUCKYOU

2009-06-17 Thread jason hadoop
You can purchase the ebook from www.apress.com. The final copy is now
available.
There is a 50% off coupon good for a few more days, LUCKYOU.

you can try prohadoop.ning.com as an alternative for www.prohadoopbook.com,
or www.prohadoop.com.

What error do you receive when you try to visit www.prohadoopbook.com ?

2009/6/17 zjffdu zjf...@gmail.com

 HI Jason,

 Where can I download your books' Alpha Chapters, I am very interested in
 your book about hadoop.

 And I cannot visit the link www.prohadoopbook.com



 -Original Message-
 From: jason hadoop [mailto:jason.had...@gmail.com]
 Sent: June 9, 2009 20:47
 To: core-user@hadoop.apache.org
 Subject: [ADV] Blatant marketing of the book Pro Hadoop. In honor of the 09
 summit here is a 50% off coupon corrected code is LUCKYOU

 http://eBookshop.apress.com CODE LUCKYOU

 --
 Alpha Chapters of my book on Hadoop are available
 http://www.apress.com/book/view/9781430219422
 www.prohadoopbook.com a community for Hadoop Professionals




-- 
Pro Hadoop, a book to guide you from beginner to hadoop mastery,
http://www.amazon.com/dp/1430219424?tag=jewlerymall
www.prohadoopbook.com a community for Hadoop Professionals


Re: [ADV] Blatant marketing of the book Pro Hadoop. In honor of the 09 summit here is a 50% off coupon corrected code is LUCKYOU

2009-06-17 Thread zhang jianfeng
Hi Jason,

I still cannot visit the links you provided; I am in China, maybe it is some
network problem.

Could you send me the alpha chapters of your book? That would be
appreciated.


Thank you

Jeff Zhang



2009/6/17 jason hadoop jason.had...@gmail.com

 You can purchase the ebook from www.apress.com. The final copy is now
 available.
 There is a 50% off coupon good for a few more days, LUCKYOU.

 you can try prohadoop.ning.com as an alternative for www.prohadoopbook.com
 ,
 or www.prohadoop.com.

 What error do you receive when you try to visit www.prohadoopbook.com ?

 2009/6/17 zjffdu zjf...@gmail.com

  HI Jason,
 
  Where can I download your books' Alpha Chapters, I am very interested in
  your book about hadoop.
 
  And I cannot visit the link www.prohadoopbook.com
 
 
 
  -Original Message-
  From: jason hadoop [mailto:jason.had...@gmail.com]
  Sent: June 9, 2009 20:47
  To: core-user@hadoop.apache.org
  Subject: [ADV] Blatant marketing of the book Pro Hadoop. In honor of the
 09
  summit here is a 50% off coupon corrected code is LUCKYOU
 
  http://eBookshop.apress.com CODE LUCKYOU
 
  --
  Alpha Chapters of my book on Hadoop are available
  http://www.apress.com/book/view/9781430219422
  www.prohadoopbook.com a community for Hadoop Professionals
 
 


 --
 Pro Hadoop, a book to guide you from beginner to hadoop mastery,
 http://www.amazon.com/dp/1430219424?tag=jewlerymall
 www.prohadoopbook.com a community for Hadoop Professionals



hadoop interaction with AFS

2009-06-17 Thread Brock Palen
Ran into an issue with running hadoop on a cluster that also has AFS
installed.  When a user ssh's in they get an 'extra' group id; I
think it is called a 'pag'.


The problem is that when you try to start hadoop from a shell that has a pag,
one of the checks in newer versions stops because there is no group
name to go with the gid.


as an example from the 'groups'  command:

[bro...@nyx-login1 ~]$ groups
cacstaff vasp stata charmm amber8 siesta eecs587f06 nolimit gti  
hyades adina chemkinpro coe molpro aces2 helios mcnp5 matlab id:  
cannot find name for group ID 1093742985

1093742985


If you ssh to the machine itself using ssh keys, the pag is not
created, which is our current workaround, but it is kinda kludgy.


If you need more information let me know.


Brock Palen
bro...@mlds-networks.com
www.mlds-networks.com
MLDS Owner Senior Tech.




Re: hadoop interaction with AFS

2009-06-17 Thread Brian Bockelman

Hey Brock,

I've seen a similar problem at another site.  They were able to solve  
this by upgrading their version of OpenAFS.  Is that an option for you?


Brian

On Jun 17, 2009, at 8:35 AM, Brock Palen wrote:

Ran into an issue with running hadoop on a cluster that also has AFS  
installed.  When a user ssh's in they get an 'extra'  group id, I  
think it is called a 'pag',


Problem is when you try to start hadoop from a shell that has a pag,  
one of the checks in newer versions stops because there is no group  
name to go with the gid,


as an example from the 'groups'  command:

[bro...@nyx-login1 ~]$ groups
cacstaff vasp stata charmm amber8 siesta eecs587f06 nolimit gti  
hyades adina chemkinpro coe molpro aces2 helios mcnp5 matlab id:  
cannot find name for group ID 1093742985

1093742985


If you ssh to the machine its self using ssh keys, the pag is not  
created, which our current work around, but is kinda kludgy,


If you need more information let me know.


Brock Palen
bro...@mlds-networks.com
www.mlds-networks.com
MLDS Owner Senior Tech.





Re: hadoop interaction with AFS

2009-06-17 Thread Brock Palen




Hey Brock,

I've seen a similar problem at another site.  They were able to  
solve this by upgrading their version of OpenAFS.  Is that an  
option for you?


It might be. I see we are running 1.4.8 and they have 1.5 out; not
sure what our cell is running or about compatibility. Good to know.




Brian

On Jun 17, 2009, at 8:35 AM, Brock Palen wrote:

Ran into an issue with running hadoop on a cluster that also has  
AFS installed.  When a user ssh's in they get an 'extra'  group  
id, I think it is called a 'pag',


Problem is when you try to start hadoop from a shell that has a  
pag, one of the checks in newer versions stops because there is no  
group name to go with the gid,


as an example from the 'groups'  command:

[bro...@nyx-login1 ~]$ groups
cacstaff vasp stata charmm amber8 siesta eecs587f06 nolimit gti  
hyades adina chemkinpro coe molpro aces2 helios mcnp5 matlab id:  
cannot find name for group ID 1093742985

1093742985


If you ssh to the machine its self using ssh keys, the pag is not  
created, which our current work around, but is kinda kludgy,


If you need more information let me know.


Brock Palen
bro...@mlds-networks.com
www.mlds-networks.com
MLDS Owner Senior Tech.







Re: hadoop interaction with AFS

2009-06-17 Thread Faisal Khan
If re-building of Hadoop is an option then I guess you can replace 'groups'
command in Shell.java (src/core/org/apache/hadoop/util/Shell.java) with 'id
-rgn'  to get the correct group name.
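
For illustration (made-up shell output), the difference that suggestion
targets:

    $ groups
    cacstaff vasp ... id: cannot find name for group ID 1093742985
    $ id -rgn
    cacstaff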

--
Faisal



On Wed, Jun 17, 2009 at 10:35 AM, Brock Palen bro...@mlds-networks.com wrote:

 Ran into an issue with running hadoop on a cluster that also has AFS
 installed.  When a user ssh's in they get an 'extra'  group id, I think it
 is called a 'pag',

 Problem is when you try to start hadoop from a shell that has a pag, one of
 the checks in newer versions stops because there is no group name to go with
 the gid,

 as an example from the 'groups'  command:

 [bro...@nyx-login1 ~]$ groups
 cacstaff vasp stata charmm amber8 siesta eecs587f06 nolimit gti hyades
 adina chemkinpro coe molpro aces2 helios mcnp5 matlab id: cannot find name
 for group ID 1093742985
 1093742985


 If you ssh to the machine its self using ssh keys, the pag is not created,
 which our current work around, but is kinda kludgy,

 If you need more information let me know.


 Brock Palen
 bro...@mlds-networks.com
 www.mlds-networks.com
 MLDS Owner Senior Tech.





Trying to setup Cluster

2009-06-17 Thread Divij Durve
I'm trying to set up a cluster with 3 different machines running Fedora. I
can't get them to log into the localhost without a password, but that's the
least of my worries at the moment.
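
(For the passwordless-ssh piece, the usual recipe — default key paths assumed,
not something verified on these machines — is:)

    # on the master, as the user that runs hadoop
    ssh-keygen -t rsa -N "" -f ~/.ssh/id_rsa
    # append the public key to authorized_keys on every node,
    # including the master itself (for "ssh localhost")
    cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys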

I am posting my config files and the master and slave files; let me know if
anyone can spot a problem with the configs...


Hadoop-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>

<property>
  <name>dfs.data.dir</name>
  <value>$HADOOP_HOME/dfs-data</value>
  <final>true</final>
</property>

<property>
  <name>dfs.name.dir</name>
  <value>$HADOOP_HOME/dfs-name</value>
  <final>true</final>
</property>

<property>
  <name>hadoop.tmp.dir</name>
  <value>$HADOOP_HOME/hadoop-tmp</value>
  <description>A base for other temporary directories.</description>
</property>

<property>
  <name>fs.default.name</name>
  <value>hdfs://gobi.something.something:54310</value>
  <description>The name of the default file system.  A URI whose
  scheme and authority determine the FileSystem implementation.  The
  uri's scheme determines the config property (fs.SCHEME.impl) naming
  the FileSystem implementation class.  The uri's authority is used to
  determine the host, port, etc. for a FileSystem.</description>
</property>

<property>
  <name>mapred.job.tracker</name>
  <value>kalahari.something.something:54311</value>
  <description>The host and port that the MapReduce job tracker runs
  at.  If "local", then jobs are run in-process as a single map
  and reduce task.
  </description>
</property>

<property>
  <name>mapred.system.dir</name>
  <value>$HADOOP_HOME/mapred-system</value>
  <final>true</final>
</property>

<property>
  <name>dfs.replication</name>
  <value>1</value>
  <description>Default block replication.
  The actual number of replications can be specified when the file is
  created. The default is used if replication is not specified in create
  time.
  </description>
</property>

<property>
  <name>mapred.local.dir</name>
  <value>$HADOOP_HOME/mapred-local</value>
  <name>dfs.replication</name>
  <value>1</value>
</property>

</configuration>


Slave:
kongur.something.something

master:
kalahari.something.something

I execute the dfs-start.sh command from gobi.something.something.

Is there any other info that I should provide in order to help? Also, Kongur
is where I'm running the data node; the master file on Kongur should have
localhost in it, right? Thanks for the help

Divij


Re: Anyway to sort keys before Reduce function in Hadoop ?

2009-06-17 Thread Kunsheng Chen

Thanks, Alex! It is really helpful; at least I know it is sorted in some way.

Furthermore, could I control it as 'ascending' or 'descending' order? Say if my keys
are Integers and I want them to be in descending order, is it easy to do that?


Thanks again,

-Kun

--- On Mon, 6/15/09, Alex Loddengaard a...@cloudera.com wrote:

 From: Alex Loddengaard a...@cloudera.com
 Subject: Re: Anyway to sort keys before Reduce function in Hadoop ?
 To: core-user@hadoop.apache.org
 Date: Monday, June 15, 2009, 11:53 PM
 Hey Kun,
 
 Keys given to a given reducer instance are given in sorted
 order.  Meaning,
 for a given reducer JVM instance, the reduce function will
 be called several
 times, once for each key.  The order in which the keys
 are given to the
 reduce function are sorted.  The sorting happens in
 the shuffle phase, which
 is basically partitioning and sorting.  That said, if
 you have one reducer
 (which isn't possible in large jobs), keys will be given to
 you in sorted
 order.
 
 You may be interested in the combiner phase, which is
 essentially a mini
 reduce that happens before data is transferred between
 mapper and reducer:
 
 http://wiki.apache.org/hadoop/HadoopMapReduce (grep
 for combine)
 
 You may also find these videos useful:
 http://www.cloudera.com/hadoop-training-mapreduce-hdfs
 http://www.cloudera.com/hadoop-training-programming-with-hadoop
 
 Hope this helps.  Let me know if I misunderstood your
 question.
 
 Alex
 
 On Mon, Jun 15, 2009 at 4:22 PM, Kunsheng Chen ke...@yahoo.com
 wrote:
 
 
  Hi everyone,
 
  Is there anyway to sort the keys before Reduce but
 after Map ?
 
 
  I also think of sorting keys myself in Reduce
 function, but it might take
  too many memory once the number of results getting
 large.
 
  I am thinking of using some numeric value as keys in
 Reduce (which was
  calculate by Map). If it is possible, I could output
 my results by some
  orders easily.
 
 
  Thanks in advance,
 
  -Kun
 
 
 
 
 


  


Re: Nor OOM Java Heap Space neither GC OverHead Limit Exeeceded

2009-06-17 Thread akhil1988

Hi Jason!

Thanks for going with me to solve my problem.

To restate things and make them easier to understand: I am working in
local mode, in the directory which contains the job jar and also the Config
and Data directories.

I just removed the following three statements from my code:
 DistributedCache.addCacheFile(new
 URI("/home/akhil1988/Ner/OriginalNer/Data/"), conf);
 DistributedCache.addCacheFile(new
 URI("/home/akhil1988/Ner/OriginalNer/Config/"), conf);
 DistributedCache.createSymlink(conf);

The program still executes to the same point as before and terminates.
That means the above three statements are of no use while working in local
mode. In local mode, the working directory for the map-reduce tasks becomes
the current working directory in which you started the hadoop command to
execute the job.

Since I have removed the DistributedCache.add... statements, there should
be no issue whether I am giving a file name or a directory name as the argument
to it. Now it seems to me that there is some problem in reading the binary
file using binaryRead.

Please let me know if I am going wrong anywhere.

Thanks,
Akhil
 




jason hadoop wrote:
 
 I have only ever used the distributed cache to add files, including binary
 files such as shared libraries.
 It looks like you are adding a directory.
 
 The DistributedCache is not generally used for passing data, but for
 passing
 file names.
 The files must be stored in a shared file system (hdfs for simplicity)
 already.
 
 The distributed cache makes the names available to the tasks, and the the
 files are extracted from hdfs and stored in the task local work area on
 each
 task tracker node.
 It looks like you may be storing the contents of your files in the
 distributed cache.
 
 On Wed, Jun 17, 2009 at 6:56 AM, akhil1988 akhilan...@gmail.com wrote:
 

 Thanks Jason.

 I went inside the code of the statement and found out that it eventually
 makes some binaryRead function call to read a binary file and there it
 strucks.

 Do you know whether there is any problem in giving a binary file for
 addition to the distributed cache.
 In the statement DistributedCache.addCacheFile(new
 URI(/home/akhil1988/Ner/OriginalNer/Data/), conf); Data is a directory
 which contains some text as well as some binary files. In the statement
 Parameters.readConfigAndLoadExternalData(Config/allLayer1.config); I
 can
 see(in the output messages) that it is able to read the text files but it
 gets struck at the binary files.

 So, I think here the problem is: it is not able to read the binary files
 which either have not been transferred to the cache or a binary file
 cannot
 be read.

 Do you know the solution to this?

 Thanks,
 Akhil


 jason hadoop wrote:
 
  Something is happening inside of your (Parameters.
  readConfigAndLoadExternalData(Config/allLayer1.config);)
  code, and the framework is killing the job for not heartbeating for 600
  seconds
 
  On Tue, Jun 16, 2009 at 8:32 PM, akhil1988 akhilan...@gmail.com
 wrote:
 
 
  One more thing, finally it terminates there (after some time) by
 giving
  the
  final Exception:
 
  java.io.IOException: Job failed!
 at
 org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1217)
  at LbjTagger.NerTagger.main(NerTagger.java:109)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at
 
 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
 at
 
 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
 at java.lang.reflect.Method.invoke(Method.java:597)
 at org.apache.hadoop.util.RunJar.main(RunJar.java:165)
 at org.apache.hadoop.mapred.JobShell.run(JobShell.java:54)
 at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
 at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
 at org.apache.hadoop.mapred.JobShell.main(JobShell.java:68)
 
 
  akhil1988 wrote:
  
   Thank you Jason for your reply.
  
   My Map class is an inner class and it is a static class. Here is the
   structure of my code.
  
   public class NerTagger {
  
   public static class Map extends MapReduceBase implements
   MapperLongWritable, Text, Text, Text{
   private Text word = new Text();
   private static NETaggerLevel1 tagger1 = new
   NETaggerLevel1();
   private static NETaggerLevel2 tagger2 = new
   NETaggerLevel2();
  
   Map(){
   System.out.println(HI2\n);
  
   Parameters.readConfigAndLoadExternalData(Config/allLayer1.config);
   System.out.println(HI3\n);
  
  
 Parameters.forceNewSentenceOnLineBreaks=Boolean.parseBoolean(true);
  
   System.out.println(loading the tagger);
  
  
 
 tagger1=(NETaggerLevel1)Classifier.binaryRead(Parameters.pathToModelFile+.level1);
   System.out.println(HI5\n);
  
  
 
 

Re: MapContext.getInputSplit() returns nothing

2009-06-17 Thread Roshan James
Thanks, it looks like I can write a line reader in C++ that roughly does
what the Java version does. This also means that I can deserialise my own
custom formats as well. Thanks!

Roshan

On Tue, Jun 16, 2009 at 12:22 PM, Owen O'Malley omal...@apache.org wrote:

 Sorry, I forget how much isn't clear to people who are just starting.

 FileInputFormat creates FileSplits. The serialization is very stable and
 can't be changed without breaking things. The reason that pipes can't
 stringify it is that the string form of input splits are ambiguous (and
 since it is user code, we really can't make assumptions about it). The
 format of FileSplit is:

 16 bit filename byte length
 filename in bytes
 64 bit offset
 64 bit length

 Technically the filename uses a funky utf-8 encoding, but in practice as
 long as the filename has ascii characters they are ascii. Look at
 org.apache.hadoop.io.UTF8.writeString for the precise definition.
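
 For reference, a rough Java-side sketch of that layout (a sketch only: the
 FileSplit constructor and the deprecated UTF8 helper are assumptions based on
 the 0.18/0.19-era API, and the path is made up):

    import java.io.ByteArrayInputStream;
    import java.io.ByteArrayOutputStream;
    import java.io.DataInputStream;
    import java.io.DataOutputStream;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.UTF8;
    import org.apache.hadoop.mapred.FileSplit;

    public class SplitFormatDemo {
      public static void main(String[] args) throws Exception {
        // serialize a FileSplit the way the framework hands it to pipes
        FileSplit split = new FileSplit(new Path("/data/part-00000"), 0L, 1234L, new String[0]);
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        split.write(new DataOutputStream(bytes));

        // read it back by hand: 16-bit length + filename bytes, then two 64-bit longs
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes.toByteArray()));
        String filename = UTF8.readString(in);
        long offset = in.readLong();
        long length = in.readLong();
        System.out.println(filename + " " + offset + "+" + length);
      }
    }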

 -- Owen



NullPointerException running jps

2009-06-17 Thread Richa Khandelwal
Hi,

I am getting a NullPointerException trying to run the jps command, which is
kind of weird. Does anyone have any idea on this?

Thanks,

-- 
Richa Khandelwal
University of California,
Santa Cruz
CA


Re: Anyway to sort keys before Reduce function in Hadoop ?

2009-06-17 Thread Chuck Lam
an alternative is to create a new WritableComparator and then set it
in the JobConf object with the method setOutputKeyComparatorClass().
You can use IntWritable.Comparator as a start.


On Wed, Jun 17, 2009 at 9:37 AM, tim robertson
timrobertson...@gmail.com wrote:

 I think you can do this by creating your own key type extending IntWritable
 and override the compareTo method to implement this.
 Cheers

 Tim




 On Wed, Jun 17, 2009 at 6:34 PM, Kunsheng Chen ke...@yahoo.com wrote:

 
  Thanks, Alex! It is really helpful, at least I know it is sorted in
  someway.
 
  Furthermore, could I control it as 'Ascend' or 'Descend' order ? Say if my
  keys are Integers, and I want them to be in Descend order, is it easy to do
  that ?
 
 
  Thanks again,
 
  -Kun
 
  --- On Mon, 6/15/09, Alex Loddengaard a...@cloudera.com wrote:
 
   From: Alex Loddengaard a...@cloudera.com
   Subject: Re: Anyway to sort keys before Reduce function in Hadoop ?
   To: core-user@hadoop.apache.org
   Date: Monday, June 15, 2009, 11:53 PM
   Hey Kun,
  
   Keys given to a given reducer instance are given in sorted
   order.  Meaning,
   for a given reducer JVM instance, the reduce function will
   be called several
   times, once for each key.  The order in which the keys
   are given to the
   reduce function are sorted.  The sorting happens in
   the shuffle phase, which
   is basically partitioning and sorting.  That said, if
   you have one reducer
   (which isn't possible in large jobs), keys will be given to
   you in sorted
   order.
  
   You may be interested in the combiner phase, which is
   essentially a mini
   reduce that happens before data is transferred between
   mapper and reducer:
  
   http://wiki.apache.org/hadoop/HadoopMapReduce (grep
   for combine)
  
   You may also find these videos useful:
   http://www.cloudera.com/hadoop-training-mapreduce-hdfs
   http://www.cloudera.com/hadoop-training-programming-with-hadoop
  
   Hope this helps.  Let me know if I misunderstood your
   question.
  
   Alex
  
   On Mon, Jun 15, 2009 at 4:22 PM, Kunsheng Chen ke...@yahoo.com
   wrote:
  
   
Hi everyone,
   
Is there anyway to sort the keys before Reduce but
   after Map ?
   
   
I also think of sorting keys myself in Reduce
   function, but it might take
too many memory once the number of results getting
   large.
   
I am thinking of using some numeric value as keys in
   Reduce (which was
calculate by Map). If it is possible, I could output
   my results by some
orders easily.
   
   
Thanks in advance,
   
-Kun
   
   
   
   
  
 
 
 
 


JobControl for Pipes?

2009-06-17 Thread Roshan James
Hello, Is there any way to express dependencies between map-reduce jobs
(such as in org.apache.hadoop.mapred.jobcontrol) for pipes jobs?  The
provided header Pipes.hh does not seem to reflect any such capabilities.

best,
Roshan


Queries throwing errors.

2009-06-17 Thread Divij Durve
select count(1) from test;

Total MapReduce jobs = 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=number
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=number
In order to set a constant number of reducers:
  set mapred.reduce.tasks=number
Starting Job = job_200906171546_0001, Tracking URL =
http://gobi.mssm.edu:50030/jobdetails.jsp?jobid=job_200906171546_0001
Kill Command =
/home/divij/hive/build/hadoopcore/hadoop-0.19.0/bin/../bin/hadoop job
-Dmapred.job.tracker=gobi.mssm.edu:54211 -kill job_200906171546_0001
2009-06-17 03:47:22,408 map = 0%,  reduce =0%
2009-06-17 03:47:23,414 map = 0%,  reduce =0%
2009-06-17 03:47:24,420 map = 0%,  reduce =0%
2009-06-17 03:47:25,429 map = 0%,  reduce =0%
2009-06-17 03:47:26,432 map = 0%,  reduce =0%
2009-06-17 03:47:27,436 map = 0%,  reduce =0%
2009-06-17 03:47:28,443 map = 0%,  reduce =0%
2009-06-17 03:47:29,448 map = 0%,  reduce =0%
2009-06-17 03:47:30,453 map = 0%,  reduce =0%
2009-06-17 03:47:31,457 map = 0%,  reduce =0%
2009-06-17 03:47:32,461 map = 0%,  reduce =0%
2009-06-17 03:47:33,465 map = 0%,  reduce =0%
2009-06-17 03:47:34,468 map = 0%,  reduce =0%
2009-06-17 03:47:35,472 map = 0%,  reduce =0%
2009-06-17 03:47:36,476 map = 0%,  reduce =0%
2009-06-17 03:47:37,480 map = 0%,  reduce =0%
2009-06-17 03:47:38,484 map = 0%,  reduce =0%
2009-06-17 03:47:39,488 map = 0%,  reduce =0%
2009-06-17 03:47:40,491 map = 0%,  reduce =0%
2009-06-17 03:47:41,495 map = 0%,  reduce =0%
2009-06-17 03:47:42,498 map = 0%,  reduce =0%
2009-06-17 03:47:43,502 map = 0%,  reduce =0%
2009-06-17 03:47:44,509 map = 100%,  reduce =100%
Ended Job = job_200906171546_0001 with errors
FAILED: Execution Error, return code 2 from
org.apache.hadoop.hive.ql.exec.ExecDriver


I have no idea what's wrong. I thought this b


Re: Anyway to sort keys before Reduce function in Hadoop ?

2009-06-17 Thread Owen O'Malley
On Wed, Jun 17, 2009 at 12:26 PM, Chuck Lam chuck@gmail.com wrote:

 an alternative is to create a new WritableComparator and then set it
 in the JobConf object with the method setOutputKeyComparatorClass().
 You can use IntWritable.Comparator as a start.


The important part of that is to define a RawComparator for your key class
and call JobConf.setOutputKeyComparatorClass with it. So if you wanted to
invert the default sort order of IntWritable keys, you could:

public class InvertedIntWritableComparator extends IntWritable.Comparator {

  public int compare(byte[] b1, int s1, int l1,
                     byte[] b2, int s2, int l2) {
    return -1 * super.compare(b1, s1, l1, b2, s2, l2);
  }
}

then

job.setOutputKeyComparatorClass(InvertedIntWritableComparator.class);

-- Owen


Restrict output of mappers to reducers running on same node?

2009-06-17 Thread Tarandeep Singh
Hi,

Can I restrict the output of mappers running on a node to go to reducer(s)
running on the same node?

Let me explain why I want to do this-

I am converting a huge number of XML files into SequenceFiles. So
theoretically I don't even need reducers; mappers would read XML files and
output SequenceFiles. But the problem with this approach is that I will end up
getting a huge number of small output files.

To avoid generating a large number of smaller files, I can run Identity reducers.
But by running reducers, I am unnecessarily transferring data over the network. I
ran a test case using a small subset of my data (~90GB). With map-only
jobs, my cluster finished the conversion in only 6 minutes. But with a map and
Identity-reducer job, it takes around 38 minutes.

I have to process close to a terabyte of data. So I was thinking of faster
alternatives:

* Writing a custom OutputFormat
* Somehow restricting the output of mappers running on a node to go to reducers
running on the same node. Maybe I can write my own partitioner (simple), but I am
not sure how Hadoop's framework assigns partitions to reduce tasks.

Any pointers ?

Or this is not possible at all ?

Thanks,
Tarandeep


Re: Pipes example wordcount-nopipe.cc failed when reading from input splits

2009-06-17 Thread Viral K

Does anybody have any updates on this?

How can we have our own RecordReader in Hadoop pipes?  When I try to print
the context.getInputSplit, I get the filenames along with some junk
characters.  As a result the file open fails.

Anybody got it working?

Viral.



11 Nov. wrote:
 
 I traced into the c++ recordreader code:
   WordCountReader(HadoopPipes::MapContext& context) {
     std::string filename;
     HadoopUtils::StringInStream stream(context.getInputSplit());
     HadoopUtils::deserializeString(filename, stream);
     struct stat statResult;
     stat(filename.c_str(), &statResult);
     bytesTotal = statResult.st_size;
     bytesRead = 0;
     cout << filename << endl;
     file = fopen(filename.c_str(), "rt");
     HADOOP_ASSERT(file != NULL, "failed to open " + filename);
   }
 
 I got nothing for the filename variable, which showed the InputSplit was
 empty.
 
 2008/3/4, 11 Nov. nov.eleve...@gmail.com:

 hi colleagues,
I have set up the single node cluster to test pipes examples.
    wordcount-simple and wordcount-part work just fine, but
  wordcount-nopipe can't run. Here is my command line:

  bin/hadoop pipes -conf src/examples/pipes/conf/word-nopipe.xml -input
 input/ -output out-dir-nopipe1

 and here is the error message printed on my console:

 08/03/03 23:23:06 WARN mapred.JobClient: No job jar file set.  User
 classes may not be found. See JobConf(Class) or JobConf#setJar(String).
 08/03/03 23:23:06 INFO mapred.FileInputFormat: Total input paths to
 process : 1
 08/03/03 23:23:07 INFO mapred.JobClient: Running job:
 job_200803032218_0004
 08/03/03 23:23:08 INFO mapred.JobClient:  map 0% reduce 0%
 08/03/03 23:23:11 INFO mapred.JobClient: Task Id :
 task_200803032218_0004_m_00_0, Status : FAILED
 java.io.IOException: pipe child exception
 at org.apache.hadoop.mapred.pipes.Application.abort(
 Application.java:138)
 at org.apache.hadoop.mapred.pipes.PipesMapRunner.run(
 PipesMapRunner.java:83)
 at org.apache.hadoop.mapred.MapTask.run(MapTask.java:192)
 at org.apache.hadoop.mapred.TaskTracker$Child.main(
 TaskTracker.java:1787)
 Caused by: java.io.EOFException
 at java.io.DataInputStream.readByte(DataInputStream.java:250)
 at
 org.apache.hadoop.io.WritableUtils.readVLong(WritableUtils.java
 :313)
 at org.apache.hadoop.io.WritableUtils.readVInt(WritableUtils.java
 :335)
 at
 org.apache.hadoop.mapred.pipes.BinaryProtocol$UplinkReaderThread.run(
 BinaryProtocol.java:112)

 task_200803032218_0004_m_00_0:
 task_200803032218_0004_m_00_0:
 task_200803032218_0004_m_00_0:
 task_200803032218_0004_m_00_0: Hadoop Pipes Exception: failed to open
 at /home/hadoop/hadoop-0.15.2-single-cluster
 /src/examples/pipes/impl/wordcount-nopipe.cc:67 in
 WordCountReader::WordCountReader(HadoopPipes::MapContext)


 Could anybody tell me how to fix this? That will be appreciated.
 Thanks a lot!

 
 

-- 
View this message in context: 
http://www.nabble.com/Pipes-example-wordcount-nopipe.cc-failed-when-reading-from-input-splits-tp15807856p24084734.html
Sent from the Hadoop core-user mailing list archive at Nabble.com.



Re: Announcing CloudBase-1.3.1 release

2009-06-17 Thread zsongbo
How about the index of CloudBase?

On Wed, Jun 17, 2009 at 4:16 AM, Ru, Yanbo y...@business.com wrote:


 Hi,

 We have released 1.3.1 version of CloudBase on sourceforge-
 https://sourceforge.net/projects/cloudbase

  CloudBase is a data warehouse system for Terabyte & Petabyte scale
  analytics. It is built on top of Map-Reduce architecture. It allows you to
  query flat log files using ANSI SQL.

 Please give it a try and send us your feedback.

 Thanks,

 Yanbo

 Release notes -

 New Features:
 * CREATE CSV tables - One can create tables on top of data in CSV (Comma
 Separated Values) format and query them using SQL. Current implementation
 doesn't accept CSV records which span multiple lines. Data may not be
 processed correctly if a field contains embedded line-breaks. Please visit
 http://cloudbase.sourceforge.net/index.html#userDoc for detailed
 specification of the CSV format.

 Bug fixes:
 * Aggregate function 'AVG' returns the same value as 'SUM' function
 * If a query has multiple aliases, only the last alias works



Re: NullPointerException running jps

2009-06-17 Thread Praveen Yarlagadda
Can you post the stack trace?

On Wed, Jun 17, 2009 at 12:21 PM, Richa Khandelwal richa@gmail.com wrote:

 Hi,

 I am getting a NullPointerException  trying to run the jps command, which
 is
 kind of weird. Anyone has any idea on this?

 Thanks,

 --
 Richa Khandelwal
 University of California,
 Santa Cruz
 CA




-- 
Regards,
Praveen


Re: Restrict output of mappers to reducers running on same node?

2009-06-17 Thread Jothi Padmanabhan
You could look at CombineFileInputFormat to generate a single split out of
several files.

Your partitioner would be able to assign keys to specific reducers, but you
would not have control on which node a given reduce task will run.

Jothi


On 6/18/09 5:10 AM, Tarandeep Singh tarand...@gmail.com wrote:

 Hi,
 
 Can I restrict the output of mappers running on a node to go to reducer(s)
 running on the same node?
 
 Let me explain why I want to do this-
 
 I am converting huge number of XML files into SequenceFiles. So
 theoretically I don't even need reducers, mappers would read xml files and
 output Sequencefiles. But the problem with this approach is I will end up
 getting huge number of small output files.
 
 To avoid generating a large number of smaller files, I can run Identity reducers.
 But by running reducers, I am unnecessarily transferring data over the network. I
 ran a test case using a small subset of my data (~90GB). With map-only
 jobs, my cluster finished the conversion in only 6 minutes, but with a map plus
 Identity-reducer job, it takes around 38 minutes.
 
 I have to process close to a terabyte of data, so I was thinking of faster
 alternatives:
 
 * Writing a custom OutputFormat
 * Somehow restricting the output of mappers running on a node to reducers
 running on the same node. Maybe I can write my own partitioner (simple), but I am
 not sure how Hadoop's framework assigns partitions to reduce tasks.
 
 Any pointers?
 
 Or is this not possible at all?
 
 Thanks,
 Tarandeep



Re: Restrict output of mappers to reducers running on same node?

2009-06-17 Thread jason hadoop
You can open your sequence file in the mapper's configure method, write to it
in your map method, and close it in the mapper's close method.
Then you end up with one sequence file per map task. I am making the assumption that
each key/value pair passed to your map somehow represents a single XML file/item.
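
A minimal sketch of that pattern in the old (org.apache.hadoop.mapred) API follows;
the class name, the key written, and the convertXml() helper are placeholders for
whatever the real conversion needs, and with this approach speculative execution
should probably be turned off so two attempts of the same task do not fight over
one file:

  import java.io.IOException;
  import org.apache.hadoop.fs.FileSystem;
  import org.apache.hadoop.fs.Path;
  import org.apache.hadoop.io.LongWritable;
  import org.apache.hadoop.io.SequenceFile;
  import org.apache.hadoop.io.Text;
  import org.apache.hadoop.mapred.FileOutputFormat;
  import org.apache.hadoop.mapred.JobConf;
  import org.apache.hadoop.mapred.MapReduceBase;
  import org.apache.hadoop.mapred.Mapper;
  import org.apache.hadoop.mapred.OutputCollector;
  import org.apache.hadoop.mapred.Reporter;

  // Map-only converter: each map task opens one SequenceFile in configure(),
  // appends to it for every input record, and closes it in close().
  public class XmlToSeqFileMapper extends MapReduceBase
      implements Mapper<LongWritable, Text, Text, Text> {

    private SequenceFile.Writer writer;

    public void configure(JobConf conf) {
      try {
        // one output file per map task, named after the task attempt id
        Path outDir = FileOutputFormat.getOutputPath(conf);
        Path outFile = new Path(outDir, "seq-" + conf.get("mapred.task.id"));
        FileSystem fs = outFile.getFileSystem(conf);
        writer = SequenceFile.createWriter(fs, conf, outFile,
                                           Text.class, Text.class);
      } catch (IOException e) {
        throw new RuntimeException("could not create SequenceFile", e);
      }
    }

    public void map(LongWritable offset, Text xmlRecord,
                    OutputCollector<Text, Text> unused, Reporter reporter)
        throws IOException {
      // convertXml() stands in for the real XML-to-record conversion
      writer.append(new Text("key"), new Text(convertXml(xmlRecord.toString())));
    }

    public void close() throws IOException {
      writer.close();
    }

    private String convertXml(String xml) {
      return xml;  // placeholder
    }
  }

The job would set conf.setNumReduceTasks(0) so nothing is shuffled, and could use
NullOutputFormat since the SequenceFiles are written as a side effect.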

On Wed, Jun 17, 2009 at 7:29 PM, Jothi Padmanabhan joth...@yahoo-inc.com wrote:

 You could look at CombineFileInputFormat to generate a single split out of
 several files.

 Your partitioner would be able to assign keys to specific reducers, but you
 would not have control over which node a given reduce task runs on.

 Jothi


 On 6/18/09 5:10 AM, Tarandeep Singh tarand...@gmail.com wrote:

  Hi,
 
  Can I restrict the output of mappers running on a node to go to reducer(s)
  running on the same node?
 
  Let me explain why I want to do this-
 
  I am converting a huge number of XML files into SequenceFiles. So
  theoretically I don't even need reducers; mappers would read XML files and
  output SequenceFiles. But the problem with this approach is that I will end up
  with a huge number of small output files.
 
  To avoid generating a large number of smaller files, I can run Identity reducers.
  But by running reducers, I am unnecessarily transferring data over the network. I
  ran a test case using a small subset of my data (~90GB). With map-only
  jobs, my cluster finished the conversion in only 6 minutes, but with a map plus
  Identity-reducer job, it takes around 38 minutes.
 
  I have to process close to a terabyte of data, so I was thinking of faster
  alternatives:
 
  * Writing a custom OutputFormat
  * Somehow restricting the output of mappers running on a node to reducers
  running on the same node. Maybe I can write my own partitioner (simple), but I am
  not sure how Hadoop's framework assigns partitions to reduce tasks.
 
  Any pointers?
 
  Or is this not possible at all?
 
  Thanks,
  Tarandeep




-- 
Pro Hadoop, a book to guide you from beginner to hadoop mastery,
http://www.amazon.com/dp/1430219424?tag=jewlerymall
www.prohadoopbook.com a community for Hadoop Professionals


Re: JobControl for Pipes?

2009-06-17 Thread jason hadoop
Job control is coming with the Hadoop workflow manager; in the meantime,
there is Cascading by Chris Wensel. I do not have any personal experience with
either, and I do not know how Pipes interacts with either.
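
For reference, the Java-side dependency mechanism the question refers to looks
roughly like this; it drives ordinary JobConfs through
org.apache.hadoop.mapred.jobcontrol, and whether a pipes-configured JobConf can be
submitted this way is exactly the part that has not been verified here
(buildFirstConf/buildSecondConf are placeholders):

  import org.apache.hadoop.mapred.JobConf;
  import org.apache.hadoop.mapred.jobcontrol.Job;
  import org.apache.hadoop.mapred.jobcontrol.JobControl;

  // Sketch: run two jobs where the second depends on the first.
  public class TwoStagePipeline {
    public static void main(String[] args) throws Exception {
      JobConf firstConf = buildFirstConf();    // placeholder: fully configured JobConf
      JobConf secondConf = buildSecondConf();  // placeholder: fully configured JobConf

      Job first = new Job(firstConf);
      Job second = new Job(secondConf);
      second.addDependingJob(first);           // second starts only after first succeeds

      JobControl control = new JobControl("two-stage");
      control.addJob(first);
      control.addJob(second);

      Thread runner = new Thread(control);     // JobControl implements Runnable
      runner.start();
      while (!control.allFinished()) {
        Thread.sleep(5000);                    // poll until both jobs are done
      }
      control.stop();
    }

    private static JobConf buildFirstConf()  { return new JobConf(); }
    private static JobConf buildSecondConf() { return new JobConf(); }
  }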

On Wed, Jun 17, 2009 at 12:43 PM, Roshan James 
roshan.james.subscript...@gmail.com wrote:

 Hello, Is there any way to express dependencies between map-reduce jobs
 (such as in org.apache.hadoop.mapred.jobcontrol) for pipes jobs?  The
 provided header Pipes.hh does not seem to reflect any such capabilities.

 best,
 Roshan




-- 
Pro Hadoop, a book to guide you from beginner to hadoop mastery,
http://www.amazon.com/dp/1430219424?tag=jewlerymall
www.prohadoopbook.com a community for Hadoop Professionals


Hadoop Eclipse Plugin

2009-06-17 Thread Praveen Yarlagadda
Hi,

I have a problem configuring the Hadoop Map/Reduce plugin with Eclipse.

Setup Details:

I have a namenode, a jobtracker and two data nodes, all running on Ubuntu.
My setup works fine with the example programs. I want to connect to this setup
from Eclipse.

namenode - 10.20.104.62 - 54310(port)
jobtracker - 10.20.104.53 - 54311(port)

I run Eclipse on a different Windows machine. I want to configure the Map/Reduce
plugin in Eclipse so that I can access HDFS from Windows.

Map/Reduce master
Host - With jobtracker IP, it did not work
Port - With jobtracker port, it did not work

DFS master
Host - With namenode IP, it did not work
Port - With namenode port, it did not work

I tried the other combination too, giving namenode details for the Map/Reduce
master and jobtracker details for the DFS master. That did not work either.
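
For what it is worth, the DFS master fields in the plugin normally correspond to
the namenode host/port (what fs.default.name points at) and the Map/Reduce master
fields to the jobtracker host/port (mapred.job.tracker), so the values above look
like the right pairing. One way to rule out plain connectivity or version problems
is to run a small client from the same Windows machine with the cluster's Hadoop
jar on the classpath; a minimal sketch, with HdfsPing just an illustrative name and
the IP/port taken from the post:

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.FileStatus;
  import org.apache.hadoop.fs.FileSystem;
  import org.apache.hadoop.fs.Path;

  // Connectivity check: list the HDFS root directly, bypassing the Eclipse plugin.
  public class HdfsPing {
    public static void main(String[] args) throws Exception {
      Configuration conf = new Configuration();
      conf.set("fs.default.name", "hdfs://10.20.104.62:54310");  // namenode from the post
      FileSystem fs = FileSystem.get(conf);
      for (FileStatus status : fs.listStatus(new Path("/"))) {
        System.out.println(status.getPath());
      }
      fs.close();
    }
  }

If this fails with a connection or version-mismatch error, the plugin will fail the
same way, which points at networking or mismatched Hadoop jars rather than the
plugin settings themselves.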

If anyone has configured the plugin with Eclipse, please let me know. Even
pointers on how to configure it would be highly appreciated.

Thanks,
Praveen


Re: Pipes example wordcount-nopipe.cc failed when reading from input splits

2009-06-17 Thread Jianmin Woo

I tried this example and it seems that the input/output should only be in 
file:///... format to get correct results.

- Jianmin





From: Viral K khaju...@yahoo-inc.com
To: core-user@hadoop.apache.org
Sent: Thursday, June 18, 2009 8:57:47 AM
Subject: Re: Pipes example wordcount-nopipe.cc failed when reading from input 
splits


Does anybody have any updates on this?

How can we have our own RecordReader in Hadoop Pipes? When I try to print
context.getInputSplit(), I get the filenames along with some junk
characters. As a result, the file open fails.

Anybody got it working?

Viral.



11 Nov. wrote:
 
 I traced into the c++ recordreader code:
   WordCountReader(HadoopPipes::MapContext& context) {
     std::string filename;
     HadoopUtils::StringInStream stream(context.getInputSplit());
     HadoopUtils::deserializeString(filename, stream);
     struct stat statResult;
     stat(filename.c_str(), &statResult);
     bytesTotal = statResult.st_size;
     bytesRead = 0;
     cout << filename << endl;
     file = fopen(filename.c_str(), "rt");
     HADOOP_ASSERT(file != NULL, "failed to open " + filename);
   }
 
 I got nothing for the filename variable, which showed that the InputSplit is
 empty.
 
 2008/3/4, 11 Nov. nov.eleve...@gmail.com:

 Hi colleagues,
    I have set up a single-node cluster to test the pipes examples.
    wordcount-simple and wordcount-part work just fine, but
 wordcount-nopipe can't run. Here is my command line:

  bin/hadoop pipes -conf src/examples/pipes/conf/word-nopipe.xml -input
 input/ -output out-dir-nopipe1

 and here is the error message printed on my console:

 08/03/03 23:23:06 WARN mapred.JobClient: No job jar file set.  User
 classes may not be found. See JobConf(Class) or JobConf#setJar(String).
 08/03/03 23:23:06 INFO mapred.FileInputFormat: Total input paths to
 process : 1
 08/03/03 23:23:07 INFO mapred.JobClient: Running job:
 job_200803032218_0004
 08/03/03 23:23:08 INFO mapred.JobClient:  map 0% reduce 0%
 08/03/03 23:23:11 INFO mapred.JobClient: Task Id :
 task_200803032218_0004_m_00_0, Status : FAILED
 java.io.IOException: pipe child exception
 at org.apache.hadoop.mapred.pipes.Application.abort(
 Application.java:138)
 at org.apache.hadoop.mapred.pipes.PipesMapRunner.run(
 PipesMapRunner.java:83)
 at org.apache.hadoop.mapred.MapTask.run(MapTask.java:192)
 at org.apache.hadoop.mapred.TaskTracker$Child.main(
 TaskTracker.java:1787)
 Caused by: java.io.EOFException
 at java.io.DataInputStream.readByte(DataInputStream.java:250)
 at
 org.apache.hadoop.io.WritableUtils.readVLong(WritableUtils.java
 :313)
 at org.apache.hadoop.io.WritableUtils.readVInt(WritableUtils.java
 :335)
 at
 org.apache.hadoop.mapred.pipes.BinaryProtocol$UplinkReaderThread.run(
 BinaryProtocol.java:112)

 task_200803032218_0004_m_00_0:
 task_200803032218_0004_m_00_0:
 task_200803032218_0004_m_00_0:
 task_200803032218_0004_m_00_0: Hadoop Pipes Exception: failed to open
 at /home/hadoop/hadoop-0.15.2-single-cluster
 /src/examples/pipes/impl/wordcount-nopipe.cc:67 in
 WordCountReader::WordCountReader(HadoopPipes::MapContext)


 Could anybody tell me how to fix this? That will be appreciated.
 Thanks a lot!

 
 

-- 
View this message in context: 
http://www.nabble.com/Pipes-example-wordcount-nopipe.cc-failed-when-reading-from-input-splits-tp15807856p24084734.html
Sent from the Hadoop core-user mailing list archive at Nabble.com.