Class loading in Hadoop and HBase

2014-03-19 Thread Amit Sela
Hi all, I'm running with OSGi-bundled versions of Hadoop 1.0.4 and HBase 0.94.12 that I built. Most issues I've encountered are related to class loaders. One of the patterns I noticed in both projects is: ClassLoader cl = Thread.currentThread().getContextClassLoader(); if(cl == null) { cl
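
The fallback pattern in question, completed as a minimal pure-JDK sketch (the class name here is hypothetical; the Hadoop and HBase sources continue by falling back to a defining class's own loader, which is what trips up OSGi class spaces):

```java
// Sketch of the context-classloader fallback pattern seen in the Hadoop and
// HBase sources: prefer the thread's context class loader, and fall back to
// the loader that defined this class when none is set. Under OSGi this
// fallback resolves to a bundle class loader, which is often not what the
// framework code expects.
public class ClassLoaderFallback {
    public static ClassLoader resolveClassLoader() {
        ClassLoader cl = Thread.currentThread().getContextClassLoader();
        if (cl == null) {
            // No context loader set for this thread: use our own defining loader.
            cl = ClassLoaderFallback.class.getClassLoader();
        }
        return cl;
    }
}
```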

Fair scheduler not assigning tasks

2014-01-20 Thread Amit Sela
I've been using the Fair Scheduler with Hadoop 1.0.4 for a few months now with no issues whatsoever. All of a sudden I have a problem where jobs are stuck in status UNASSIGNED. Submitted jobs are pending for map/reduce slots although the cluster's resources are free. In some of the pools only map slots are a

Custom counters in combiner

2014-01-13 Thread Amit Sela
Hi all, I'm running a MapReduce job that has custom counters incremented in the combiner's reduce function. Looking at the MapReduce web UI I see that, like all counters, it has three columns: Map, Reduce and Total. From what I know, the combiner is executed on the map output, hence runs in Mapp

Re: manipulating key in combine phase

2014-01-13 Thread Amit Sela
ng of the combiner is that it is like a “mapper-side >> pre-reducer” and operates on blocks of data that have already been sorted >> by key, so mucking with the keys doesn’t **seem** like a good idea. >> >> john >> >> >> >> *From:* Amit Sela [mailto:a

manipulating key in combine phase

2014-01-12 Thread Amit Sela
Hi all, I was wondering if it is possible to manipulate the key during the combine phase: say I have a MapReduce job where the key has many qualifiers. I would like to "split" the key into two (or more) keys if it has more than, say, 100 qualifiers. In the combiner class I would do something like: int coun

Re: Setting up Snappy compression in Hadoop

2014-01-02 Thread Amit Sela
2 PM, Ted Yu wrote: > >> Please take a look at >> http://hbase.apache.org/book.html#snappy.compression >> >> Cheers >> >> >> On Wed, Jan 1, 2014 at 8:05 AM, Amit Sela wrote: >> >>> Hi all, >>> >>>

Setting up Snappy compression in Hadoop

2014-01-01 Thread Amit Sela
Hi all, I'm running on Hadoop 1.0.4 and I'd like to use Snappy for map output compression. I'm adding the configurations: configuration.setBoolean("mapred.compress.map.output", true); configuration.set("mapred.map.output.compression.codec", "org.apache.hadoop.io.compress.SnappyCodec"); And I've
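
For what it's worth, the same two properties from the snippet can be set cluster-wide in mapred-site.xml instead of per job; a sketch, assuming the native Snappy libraries are installed on every node (they are required for SnappyCodec to load):

```xml
<!-- mapred-site.xml: enable Snappy for intermediate (map output) compression.
     Requires the Hadoop native libraries with libsnappy on every node. -->
<property>
  <name>mapred.compress.map.output</name>
  <value>true</value>
</property>
<property>
  <name>mapred.map.output.compression.codec</name>
  <value>org.apache.hadoop.io.compress.SnappyCodec</value>
</property>
```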

Add machine with bigger storage to cluster

2013-09-30 Thread Amit Sela
I would like to add new machines to my existing cluster, but they won't be identical to the current nodes. There are two scenarios I'm thinking of: 1. What are the implications (besides initial load balancing) of adding a new node to the cluster, if this node runs on a machine similar to all other nodes

Bzip2 vs Gzip

2013-09-17 Thread Amit Sela
Hi all, I'm using Hadoop 1.0.4 and gzip to store the logs processed by Hadoop (logs are gzipped into block-size files). I read that bzip2 is splittable. Is that so in Hadoop 1.0.4? Does that mean that any input file bigger than the block size will be split between maps? What are the tradeoffs betw

Fair Scheduler pools regardless of users

2013-07-08 Thread Amit Sela
Hi all, I was wondering if there is a way to have the Fair Scheduler ignore the user and submit a job to a specific pool. I would like to have 3-4 pools: 1. Very short (~1 min) routine jobs. 2. Normal processing time (<1 hr) routine jobs. 3. Long (days) experimental jobs. 4. ? ad hoc immediate jobs ?
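
The Hadoop 1.x Fair Scheduler does support this: mapred.fairscheduler.poolnameproperty lets the pool be chosen by an arbitrary job property instead of the submitting user. A sketch of an allocations file for pools like those described above (pool names and limits are illustrative, not a recommendation):

```xml
<!-- fair-scheduler.xml (allocations file): illustrative pools.
     Set mapred.fairscheduler.poolnameproperty (e.g. to "pool.name") in
     mapred-site.xml, then set pool.name=short|normal|long on each job. -->
<allocations>
  <pool name="short">
    <minMaps>10</minMaps>
    <minReduces>5</minReduces>
    <weight>3.0</weight>
  </pool>
  <pool name="normal">
    <minMaps>5</minMaps>
    <minReduces>5</minReduces>
  </pool>
  <pool name="long">
    <maxRunningJobs>2</maxRunningJobs>
    <weight>0.5</weight>
  </pool>
</allocations>
```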

Capacity scheduler for dividing cluster resources

2013-07-07 Thread Amit Sela
Hi everyone, I'm running Hadoop 1.0.4 on a modest cluster (~20 machines) and I would like to divide my cluster resources by jobs' process time. The jobs running on the cluster can be divided as follows: 1. Very short jobs: less than 1 minute. 2. Normal jobs: 2-3 minutes up to an hour or two. 3. V

Using CapacityScheduler to divide resources between jobs (not users)

2013-07-06 Thread Amit Sela
Hi all, I'm running Hadoop 1.0.4 on a modest cluster (~20 machines). The jobs running on the cluster can be divided (resource-wise) as follows: 1. Very short jobs: less than 1 minute. 2. Normal jobs: 2-3 minutes up to an hour or two. 3. Very long jobs: days of processing. (still not active and th

Using CapacityScheduler to divide resources between jobs (not users)

2013-07-06 Thread Amit Sela
Hi all, I'm running Hadoop 1.0.4 on a modest cluster (~20 machines). The jobs running on the cluster can be divided (resource-wise) as follows: 1. Very short jobs: less than 1 minute. 2. Normal jobs: 2-3 minutes up to an hour or two. 3. Very long jobs: days of processing. (still not active and th

Re: Using CapacityScheduler to divide resources between jobs (not users)

2013-07-06 Thread Amit Sela
Sorry, Gmail tab error, please disregard and I will re-send, Thanks. On Sat, Jul 6, 2013 at 5:02 PM, Amit Sela wrote: > Hi all, > > I'm running Hadoop 1.0.4 on a modest cluster (~20 machines). > The jobs running on the cluster can be divided (resource wise) as follows: > >

Using CapacityScheduler to divide resources between jobs (not users)

2013-07-06 Thread Amit Sela
Hi all, I'm running Hadoop 1.0.4 on a modest cluster (~20 machines). The jobs running on the cluster can be divided (resource wise) as follows:

Failing to run ant test on clean Hadoop branch-1 checkout

2013-04-27 Thread Amit Sela
Hi all, I'm trying to run ant test on a clean Hadoop branch-1 checkout. ant works fine but when I run ant test I get a lot of failures: Test org.apache.hadoop.cli.TestCLI FAILED Test org.apache.hadoop.fs.TestFileUtil FAILED Test org.apache.hadoop.fs.TestHarFileSystem FAILED Test org.apache.hadoop

Re: Configuration clone constructor not cloning classloader

2013-04-21 Thread Amit Sela
issues.apache.org/jira/browse/HADOOP-6103, although the fix > never made it into branch-1. Can you create a branch-1 patch for this > please? > > Thanks, > Tom > > On Thu, Apr 18, 2013 at 4:09 AM, Amit Sela wrote: > > Hi all, > > > > I was wondering if there is

Configuration clone constructor not cloning classloader

2013-04-18 Thread Amit Sela
Hi all, I was wondering if there is a good reason why the public Configuration(Configuration other) constructor in Hadoop 1.0.4 doesn't clone the classloader of "other" into the new Configuration? Is this a bug? I'm asking because I'm trying to run a Hadoop client in an OSGi environment and I need to pa

Setting up a Hadoop client in OSGI bundle

2013-04-17 Thread Amit Sela
Hi all, I'm trying to set up a Hadoop client for job submissions (and more) as an OSGi bundle. I've worked through a lot of hardships, but I'm kind of stuck now. When I create a new Job for submission I call setClassLoader() on the Job's Configuration so that it will use the bundle's ClassLoader (Felix), but w

Re: Submitting mapreduce and nothing happens

2013-04-17 Thread Amit Sela
11_02job_201304150711_37 while the webapp shows 11 submissions that were actually executed (not remotely...) On Wed, Apr 17, 2013 at 6:40 AM, Zizon Qiu wrote: > try using job.waitForCompletion(true) instead of job.submit(). > it should show more details. > > > On Mon, Apr 15, 2013 a

Re: Submitting mapreduce and nothing happens

2013-04-16 Thread Amit Sela
Nothing on JT log, but as I mentioned I see this in the client log: [WARN ] org.apache.hadoop.mapred.JobClient » Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same. [INFO ] org.apache.hadoop.mapred.JobClient » Cleaning up the staging are

Re: Submitting mapreduce and nothing happens

2013-04-15 Thread Amit Sela
Reading my own message I understand that maybe it's not clear so just to clarify - the previously mentioned JT ID is indeed the correct ID. Thanks. On Apr 15, 2013 4:35 PM, "Amit Sela" wrote: > This is the JT ID and there is no problem running jobs from command line, > jus

Re: Submitting mapreduce and nothing happens

2013-04-15 Thread Amit Sela
This is the JT ID and there is no problem running jobs from the command line, just remotely. On Apr 15, 2013 4:24 PM, "Harsh J" wrote: > That's interesting; is the JT you're running on the cluster started > with the ID 201304150711 or something else? > > On Mon, Apr 15,

Re: Submitting mapreduce and nothing happens

2013-04-15 Thread Amit Sela
ing, or the cluster doesn't run anything? > > On Mon, Apr 15, 2013 at 3:36 PM, Amit Sela wrote: > > Hi all, > > > > I'm trying to submit a mapreduce job remotely using job.submit() > > > > I get the following: > > > > [WARN ] org.apache

Submitting mapreduce and nothing happens

2013-04-15 Thread Amit Sela
Hi all, I'm trying to submit a mapreduce job remotely using job.submit() I get the following: [WARN ] org.apache.hadoop.mapred.JobClient » Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same. [INFO ] org.apache.hadoop.mapred.JobClient »

Re: Child error

2013-03-13 Thread Amit Sela
10x On Wed, Mar 13, 2013 at 1:56 PM, Azuryy Yu wrote: > don't wait for the patch, it's a very simple fix. just do it. > On Mar 13, 2013 5:04 PM, "Amit Sela" wrote: > >> But the patch will work on 1.0.4, correct ? >> >> On Wed, Mar 13, 2013 at 4:57 AM, George Datsk

Re: Child error

2013-03-13 Thread Amit Sela
on for this bug 1.1.2 > > > George > > > or https://issues.apache.org/jira/browse/MAPREDUCE-4857 > > Which is fixed in 1.0.4 > > *From:* Amit Sela [mailto:am...@infolinks.com ] > *Sent:* Tuesday, March 12, 2013 5:08 AM > *

Re: Child error

2013-03-12 Thread Amit Sela
houldn't differ from 1.0.3 that much no ?) Thanks! On Tue, Mar 12, 2013 at 1:40 PM, Jean-Marc Spaggiari < jean-m...@spaggiari.org> wrote: > Hi Amit, > > Which Hadoop version are you using? > > I have been told it's because of > https://issues.apache.org/jira/bro

Child error

2013-03-12 Thread Amit Sela
Hi all, I have a weird failure occurring every now and then during a MapReduce job. This is the error: java.lang.Throwable: Child Error at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:271) Caused by: java.io.IOException: Task process exit with nonzero status of 255. at org.ap

JobTracker client - max connections

2013-03-05 Thread Amit Sela
Hi all, I'm implementing an API over the JobTracker client, JobClient. My plan is to have a pool of JobClient objects that will expose the ability to submit jobs, poll status, etc. My question is: should I set a maximum pool size? How many connections are too many for the JobTracker
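
The pooling idea can be sketched with a bounded blocking pool; BoundedPool is a hypothetical name, and the pool is generic so the sketch runs without Hadoop on the classpath (T would be org.apache.hadoop.mapred.JobClient in the real setup):

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.function.Supplier;

// Minimal bounded object pool, sketching the "pool of JobClient objects" idea.
// The bound caps how many clients (and hence concurrent JobTracker RPC
// connections) can be handed out at once; callers block until one is free.
public class BoundedPool<T> {
    private final BlockingQueue<T> idle;

    public BoundedPool(int maxSize, Supplier<T> factory) {
        idle = new ArrayBlockingQueue<>(maxSize);
        for (int i = 0; i < maxSize; i++) {
            idle.add(factory.get()); // eagerly create up to the cap
        }
    }

    // Blocks until a client is free, bounding concurrent connections.
    public T acquire() {
        try {
            return idle.take();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for a client", e);
        }
    }

    public void release(T client) {
        idle.add(client); // return the client for reuse
    }
}
```

A reasonable starting assumption is a cap well below the JobTracker's RPC handler count (mapred.job.tracker.handler.count in 1.x), but the right number is something to validate against your own cluster.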

Re: Generic output key class

2013-02-10 Thread Amit Sela
; text.writeFields(out); > } else { > integer.writeFields(out); > } > } > > [... readFields method that works in a similar way] > } > > -Sandy > > On Sun, Feb 10, 2013 at 4:00 AM, Amit Sela wrote: > >> Hi all, >> >> Has anyo

Generic output key class

2013-02-10 Thread Amit Sela
Hi all, Has anyone ever used some kind of a "generic output key" for a MapReduce job? I have a job running multiple tasks and I want them to be able to use both Text and IntWritable as output key classes. Any suggestions? Thanks, Amit.
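
Hadoop ships GenericWritable for exactly this wrap-several-types case; the tagged-payload idea behind it can be sketched with plain JDK streams (GenericKey and its factory methods are hypothetical names, standing in for a GenericWritable subclass that registers Text and IntWritable):

```java
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

// Tagged-payload key: one byte says which concrete type follows, then the
// payload itself. Hadoop's GenericWritable applies the same idea to Writable
// subtypes; plain DataOutput/DataInput stand in here so the sketch runs
// without Hadoop on the classpath.
public class GenericKey {
    private static final byte TEXT = 0, INT = 1;
    private byte tag;
    private String text;
    private int number;

    public static GenericKey of(String s) { GenericKey k = new GenericKey(); k.tag = TEXT; k.text = s; return k; }
    public static GenericKey of(int n)    { GenericKey k = new GenericKey(); k.tag = INT;  k.number = n; return k; }

    public void write(DataOutput out) throws IOException {
        out.writeByte(tag);                       // type tag first
        if (tag == TEXT) out.writeUTF(text); else out.writeInt(number);
    }

    public void readFields(DataInput in) throws IOException {
        tag = in.readByte();                      // dispatch on the tag
        if (tag == TEXT) text = in.readUTF(); else number = in.readInt();
    }

    public String asText() { return tag == TEXT ? text : Integer.toString(number); }
}
```

Note that a real MapReduce key would also need a consistent sort order across the two wrapped types, which this sketch does not address.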

Re: Submitting MapReduce job from remote server using JobClient

2013-01-27 Thread Amit Sela
d not on the cluster. > Regards > Bejoy KS > > Sent from remote device, Please excuse typos > -- > *From: * Amit Sela > *Date: *Thu, 24 Jan 2013 18:15:49 +0200 > *To: * > *ReplyTo: * user@hadoop.apache.org > *Subject: *Re: Submitting Ma

Re: Submitting MapReduce job from remote server using JobClient

2013-01-24 Thread Amit Sela
t; you're looking for. > > On Thu, Jan 24, 2013 at 5:43 PM, Amit Sela wrote: > > Hi all, > > > > I want to run a MapReduce job using the Hadoop Java api from my analytics > > server. It is not the master or even a data node but it has the same > Hadoop >

Submitting MapReduce job from remote server using JobClient

2013-01-24 Thread Amit Sela
Hi all, I want to run a MapReduce job using the Hadoop Java API from my analytics server. It is not the master or even a data node, but it has the same Hadoop installation as all the nodes in the cluster. I tried using JobClient.runJob() but it accepts JobConf as an argument, and when using JobConf it

Using JCUDA with MapReduce

2013-01-20 Thread Amit Sela
Hi all, I was wondering if anyone here has tried using the GPU of a Hadoop node to enhance MapReduce processing? I've read about it, but it always comes down to heavy computations such as matrix multiplications and Monte Carlo algorithms. Did anyone try it with MapReduce jobs that analyze logs or any ot

Re: Hadoop 1.0.4 Performance Problem

2012-11-27 Thread Amit Sela
E-4451, has been resolved for 1.2.0. > > On Tue, Nov 27, 2012 at 3:20 PM, Amit Sela wrote: > > Hi Jon, > > > > I recently upgraded our cluster from Hadoop 0.20.3-append to Hadoop 1.0.4 > > and I haven't noticed any performance issues. By &qu

Re: Hadoop 1.0.4 Performance Problem

2012-11-27 Thread Amit Sela
Hi Jon, I recently upgraded our cluster from Hadoop 0.20.3-append to Hadoop 1.0.4 and I haven't noticed any performance issues. By "multiple assignment feature" do you mean speculative execution (mapred.map.tasks.speculative.execution and mapred.reduce.tasks.speculative.execution) ? On Mon, Nov

Facebook corona compatibility

2012-11-12 Thread Amit Sela
Hi everyone, Does anyone know if the new Corona tools (which Facebook just released as open source) are compatible with Hadoop 1.0.x, or just 0.20.x? Thanks.

HDFS upgrade

2012-10-17 Thread Amit Sela
Hi all, I want to upgrade a 1TB cluster from Hadoop 0.20.3 to Hadoop 1.0.3. I am interested to know how long the HDFS upgrade takes, and in general how long it takes from deploying new versions until the cluster is back to running heavy MapReduce. I'd also appreciate it if someone could elab
