Re: capacity - resource based scheduling

2012-04-18 Thread Harsh J
Corbin, have you got JVM reuse on? Those extra JVMs may just come from its use (there have been some issues with JVM reuse lately that may be the cause). CS (the CapacityScheduler) does not launch JVMs to fill or reserve slots. On Wed, Apr 18, 2012 at 9:43 AM, Corbin Hoenes cor...@tynt.com wrote: I have a

Re: compression

2012-04-18 Thread Harsh J
Hey, Data in HBase is compressed upon compaction/flushes (i.e. upon creation of the storefiles). Hence the compression is also done over blocks of data (akin to SequenceFiles) and is efficient. The memstore isn't kept compressed nor is the WAL. RPCs in Apache HBase aren't compressed yet, but
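Since compression is applied at the storefile level as described above, it is enabled per column family at table-creation (or alteration) time. A minimal sketch in the HBase shell; the table and family names are made up for illustration, and GZ is just one of the supported codecs:

```
create 'mytable', {NAME => 'cf', COMPRESSION => 'GZ'}
```

Existing tables can be changed the same way with `alter`; the new codec takes effect as storefiles are rewritten by flushes and compactions.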

Setting a timeout for one Map() input processing

2012-04-18 Thread Ondřej Klimpera
Hello, I'd like to ask whether it is possible to set a timeout for processing a single line of text input in the mapper function. The idea is that if processing one line takes too long, Hadoop would cut that processing short and continue with the next input line. Thank you for your answer.

Re: Setting a timeout for one Map() input processing

2012-04-18 Thread Michel Segel
Use multiple threads within the mapper: the main thread starts a timeout thread and a processing thread, takes the result of whichever finishes first, and ignores and kills the other, all within the Mapper.map() method. Sure, it seems possible. (You output from the timeout thread

Re: Setting a timeout for one Map() input processing

2012-04-18 Thread Harsh J
Since you're looking for per-line (and not per-task/file) monitoring, this is best done by your own application code (a monitoring thread, etc.). On Wed, Apr 18, 2012 at 6:09 PM, Ondřej Klimpera klimp...@fit.cvut.cz wrote: Hello, I'd like to ask you if there is a possibility of setting a timeout
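Harsh's suggestion of a monitoring thread in application code can be sketched with a single-thread executor and a bounded Future.get(). Everything here is illustrative: LineTimeout and processLine() are made-up names standing in for the real per-line work, and in an actual job the processWithTimeout() call would sit inside Mapper.map():

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class LineTimeout {
    // Hypothetical stand-in for the real per-line work done in Mapper.map().
    static String processLine(String line) throws InterruptedException {
        if (line.startsWith("slow")) {
            Thread.sleep(5000); // simulate a pathological input line
        }
        return line.toUpperCase();
    }

    // Run processLine with a time budget; return null if it overruns.
    static String processWithTimeout(ExecutorService pool, String line, long millis) {
        Future<String> f = pool.submit(() -> processLine(line));
        try {
            return f.get(millis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            f.cancel(true); // interrupt the worker and skip this line
            return null;
        } catch (Exception e) {
            return null;
        }
    }

    public static void main(String[] args) {
        // One long-lived worker thread, reused across lines.
        ExecutorService pool = Executors.newSingleThreadExecutor();
        System.out.println(processWithTimeout(pool, "fast line", 1000)); // FAST LINE
        System.out.println(processWithTimeout(pool, "slow line", 200));  // null (timed out)
        pool.shutdownNow();
    }
}
```

Reusing one executor across all lines avoids spawning a thread per record; note that cancel(true) only interrupts, so long-running per-line code must actually respond to interruption to be stopped.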

Re: Setting a timeout for one Map() input processing

2012-04-18 Thread Ondřej Klimpera
Thanks, I'll try to implement it and let you know if it worked. On 04/18/2012 04:07 PM, Harsh J wrote: Since you're looking for per-line (and not per-task/file) monitoring, this is best done by your own application code (a monitoring thread, etc.). On Wed, Apr 18, 2012 at 6:09 PM, Ondřej

Re: Pre-requisites for hadoop 0.23/CDH4

2012-04-18 Thread praveenesh kumar
Hi, sweet! Can you please elaborate on how I can tweak my configs to make CDH4/hadoop-0.23 run in a 1.5 GB RAM VM? Regards, Praveenesh On Wed, Apr 18, 2012 at 8:42 AM, Harsh J ha...@cloudera.com wrote: Praveenesh, Speaking minimally (and thereby requiring fewer tweaks on your end), 1.5 GB would
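The knobs typically lowered for a small VM are the NodeManager's container budget and the per-task memory. The fragment below is a sketch with illustrative values, not tested recommendations for a 1.5 GB box:

```xml
<!-- yarn-site.xml / mapred-site.xml fragment; values are illustrative only -->
<property>
  <name>yarn.nodemanager.resource.memory-mb</name>
  <value>1024</value> <!-- total memory the NodeManager offers to containers -->
</property>
<property>
  <name>mapreduce.map.memory.mb</name>
  <value>256</value>
</property>
<property>
  <name>mapreduce.reduce.memory.mb</name>
  <value>256</value>
</property>
```

The daemon heaps themselves (HADOOP_HEAPSIZE in hadoop-env.sh) would also need shrinking from their defaults to fit everything in the VM.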

Re: Profiling Hadoop Job

2012-04-18 Thread Leonardo Urbina
Sorry it took so long to respond, however that did solve it. Thanks! On Thu, Mar 8, 2012 at 7:37 PM, Vinod Kumar Vavilapalli vino...@hortonworks.com wrote: The JobClient is trying to download the profile output to the local directory. It seems like you don't have write permissions in the

Has anyone installed HCE and built it successfully?

2012-04-18 Thread Mark question
Hey guys, I've been stuck with the HCE installation for two days now and can't figure out the problem. The error I get from running (sh build.sh) is cannot execute binary file. I tried setting my JAVA_HOME and ANT_HOME manually and using the script build.sh; no luck. So please, if you've used HCE

Help me with architecture of a somewhat non-trivial mapreduce implementation

2012-04-18 Thread Sky USC
Please help me architect the design of my first significant MR task beyond word count. My program works well, but I am trying to optimize performance to maximize use of available computing resources. I have 3 questions at the bottom. Project description in an abstract sense (written in

Re: capacity - resource based scheduling

2012-04-18 Thread Corbin Hoenes
Yes, JVM reuse is on. I'll turn it off and see if it goes away :) Thanks! On Apr 18, 2012, at 12:03 AM, Harsh J wrote: Corbin, Have you got JVM reuse on? Those extra JVMs may just be from use of that (there've been some issues lately with JVM reuse that may be the cause). CS does not
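For anyone following along, JVM reuse in Hadoop 1.x is controlled by a single job-level property: 1 disables reuse (the default), while -1 reuses a JVM for an unlimited number of the job's tasks. A sketch of the relevant mapred-site.xml (or per-job configuration) fragment:

```xml
<property>
  <name>mapred.job.reuse.jvm.num.tasks</name>
  <value>1</value> <!-- 1 = no reuse; -1 = unlimited reuse within a job -->
</property>
```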

How to rebuild NameNode from DataNode.

2012-04-18 Thread Saburo Fujioka
Hello, I am currently drafting a plan of countermeasures for operational failures of a system. I am investigating a means to rebuild the NameNode from the remaining DataNodes in case the NameNode is lost, but I don't know of one at the moment. What is consistent with those of the DataNodes is the namespaceID

Re: How to rebuild NameNode from DataNode.

2012-04-18 Thread Harsh J
This isn't possible to do. The DN holds no metadata about which file a block belongs to. Keeping redundant copies (2-3, with at least one off-machine) of dfs.name.dir has not proven bad in recovery situations in my experience yet. You should be just fine with adequate redundancy and suitably periodic
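The redundancy described above is configured by listing multiple directories for dfs.name.dir in hdfs-site.xml; the NameNode writes its image and edit log to every listed directory. The paths below, including the NFS mount standing in for the off-machine copy, are examples only:

```xml
<property>
  <name>dfs.name.dir</name>
  <!-- the NameNode mirrors its metadata into each comma-separated directory -->
  <value>/data/1/dfs/nn,/data/2/dfs/nn,/mnt/nfs/dfs/nn</value>
</property>
```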