Re: AUTO: Yuan Jin is out of the office. (returning 07/25/2012)

2012-07-23 Thread Jean-Daniel Cryans
Fifth offense.

Yuan Jin is out of the office. - I will be out of the office starting
06/22/2012 and will not return until 06/25/2012. I am out of
Jun 21

Yuan Jin is out of the office. - I will be out of the office starting
04/13/2012 and will not return until 04/16/2012. I am out of
Apr 12

Yuan Jin is out of the office. - I will be out of the office starting
04/02/2012 and will not return until 04/05/2012. I am out of
Apr 2

Yuan Jin is out of the office. - I will be out of the office starting
02/17/2012 and will not return until 02/20/2012. I am out of
Feb 16


On Mon, Jul 23, 2012 at 1:09 PM, Yuan Jin jiny...@cn.ibm.com wrote:


 I am out of the office until 07/25/2012.

 I am out of office.

 For HAMSTER related things, you can contact Jason(Deng Peng Zhou/China/IBM)
 For CFM related things, you can contact Daniel(Liang SH Su/China/Contr/IBM)
 For TMB related things, you can contact Flora(Jun Ying Li/China/IBM)
 For TWB related things, you can contact Kim(Yuan SH Jin/China/IBM)
 For others, I will reply you when I am back.


 Note: This is an automated response to your message "Reducer
 MapFileOutpuFormat" sent on 24/07/2012 4:09:51.

 This is the only notification you will receive while this person is away.


Re: Doubt from the book Definitive Guide

2012-04-05 Thread Jean-Daniel Cryans
On Thu, Apr 5, 2012 at 7:03 AM, Mohit Anchlia mohitanch...@gmail.com wrote:
 Only advantage I was thinking of was that in some cases reducers might be
 able to take advantage of data locality and avoid multiple HTTP calls, no?
 Data is anyway written, so the last merged file could go on HDFS instead of
 local disk.
 I am new to hadoop so I am just asking questions to understand the rationale
 behind using local disk for final output.

So basically it's a tradeoff: you get more replicas to copy from,
but you have 2 more copies to write. Considering that the data is very
short-lived and that it doesn't need to be replicated (since if the
machine fails the maps are replayed anyway), writing 2 replicas that
are potentially unused would be hurtful.

Regarding locality, it might make sense on a small cluster, but the
more nodes you add, the smaller the chance of having local replicas for
each block of data you're looking for.

J-D


Re: Fairscheduler - disable default pool

2012-03-13 Thread Jean-Daniel Cryans
We do it here by setting this:

<poolMaxJobsDefault>0</poolMaxJobsDefault>

So that you _must_ have a pool (that's configured with a different
maxRunningJobs) in order to run jobs.
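
For illustration, here's a minimal allocations-file sketch of that setup
(the pool name and limit below are made up, not our actual values):

<?xml version="1.0"?>
<allocations>
  <poolMaxJobsDefault>0</poolMaxJobsDefault>
  <pool name="etl">
    <maxRunningJobs>10</maxRunningJobs>
  </pool>
</allocations>

Jobs submitted without a pool fall under the 0-job default and just sit
in the queue, while anything submitted to the "etl" pool runs normally.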

Hope this helps,

J-D

On Tue, Mar 13, 2012 at 10:49 AM, Merto Mertek masmer...@gmail.com wrote:
 I know that by design all unmarked jobs go to that pool; however, I am
 doing some testing and I am interested in whether it is possible to disable it.

 Thanks


Re: Regarding Parrallel Iron's claim

2011-12-08 Thread Jean-Daniel Cryans
Isn't that old news?

http://www.dbms2.com/2011/06/10/patent-nonsense-parallel-ironhdfs-edition/

Googling around, it doesn't seem like anything happened after that.

J-D

On Thu, Dec 8, 2011 at 6:52 PM, JS Jang jsja...@gmail.com wrote:
 Hi,

 Does anyone know any discussion in Apache Hadoop regarding the claim by
 Parrallel Iron with their patent against use of HDFS?
 Thanks in advance.

 Regards,
 JS




Re: Regarding Parrallel Iron's claim

2011-12-08 Thread Jean-Daniel Cryans
You could just look at the archives:
http://mail-archives.apache.org/mod_mbox/hadoop-common-user/

It is also indexed by all search engines.

J-D

On Thu, Dec 8, 2011 at 7:44 PM, JS Jang jsja...@gmail.com wrote:
 I appreciate your help, J-D.
 Yes, I wondered whether there was any update since, or any previous discussion
 within Apache Hadoop, as I am new to this mailing list.


 On 12/9/11 12:19 PM, Jean-Daniel Cryans wrote:

 Isn't that old news?

 http://www.dbms2.com/2011/06/10/patent-nonsense-parallel-ironhdfs-edition/

 Googling around, it doesn't seem like anything happened after that.

 J-D

 On Thu, Dec 8, 2011 at 6:52 PM, JS Jang jsja...@gmail.com wrote:

 Hi,

 Does anyone know any discussion in Apache Hadoop regarding the claim by
 Parrallel Iron with their patent against use of HDFS?
 Thanks in advance.

 Regards,
 JS




 --
 
 장정식 / jsj...@gruter.com
 Gruter Inc., Principal, R&D Team
 www.gruter.com
 Cloud, Search and Social
 



Re: Hadoop 0.21

2011-12-06 Thread Jean-Daniel Cryans
Yep.

J-D

On Tue, Dec 6, 2011 at 10:41 AM, Saurabh Sehgal saurabh@gmail.com wrote:
 Hi All,

 According to the Hadoop release notes, version 0.21.0 should not be
 considered stable or suitable for production:

 23 August, 2010: release 0.21.0 available
 This release contains many improvements, new features, bug fixes and
 optimizations. It has not undergone testing at scale and should not be
 considered stable or suitable for production. This release is being
 classified as a minor release, which means that it should be API
 compatible with 0.20.2.


 Is this still the case ?

 Thank you,

 Saurabh


Re: Version of Hadoop That Will Work With HBase?

2011-12-06 Thread Jean-Daniel Cryans
For the record, this thread was started from another discussion in
user@hbase. 0.20.205 does work with HBase 0.90.4; I think the OP was a
little too quick in saying it doesn't.

J-D

On Tue, Dec 6, 2011 at 11:44 AM,  jcfol...@pureperfect.com wrote:

 Sadly, CDH3 is not an option although I wish it was. I need to get an
 official release of HBase from apache to work.

 I've tried every version of HBase 0.89 and up with 0.20.205 and all of
 them throw EOFExceptions. Which version of Hadoop core should I be
 using? HBase 0.94 ships with a 20-append version which doesn't work
 (it throws an EOFException), but when I tried replacing it with the
 hadoop-core included with hadoop 0.20.205 I still get the same
 exception.

 Thanks


   Original Message 
  Subject: Re: Version of Hadoop That Will Work With HBase?
  From: Harsh J ha...@cloudera.com
  Date: Tue, December 06, 2011 2:32 pm
  To: common-user@hadoop.apache.org

  0.20.205 should work, and so should CDH3 or 0.20-append branch builds
  (no longer maintained, after 0.20.205 replaced it though).

  What problem are you facing? Have you ensured HBase does not have a
  bad hadoop version jar in its lib/?

  On Wed, Dec 7, 2011 at 12:55 AM, jcfol...@pureperfect.com wrote:
  
  
   Hi,
  
  
   Can someone please tell me which versions of hadoop contain the
   20-appender code and will work with HBase? According to the Hbase
 docs
   (http://hbase.apache.org/book/hadoop.html), Hadoop 0.20.205 should
 work
   with HBase but it does not appear to.
  
  
   Thanks!
  



  --
  Harsh J



Re: Adjusting column value size.

2011-10-06 Thread Jean-Daniel Cryans
(BCC'd common-user@ since this seems strictly HBase related)

Interesting question... And you probably need all those ints at the same
time right? No streaming? I'll assume no.

So the second solution seems better due to the per-cell storage overhead:
storing one int per cell, you would end up storing more key data than value
data (size-wise).
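
If it helps, here's a rough sketch of option 2 using HBase's Bytes
utility (the helper class, method names and batch size are only
illustrative, not a prescription):

import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.hbase.util.Bytes;

// Illustrative helper: pack a batch of ints into one cell value and back.
public class IntPacking {

  // Pack e.g. 1000 ints into a single byte[] to store as one column value.
  public static byte[] pack(int[] ints) {
    byte[] packed = new byte[ints.length * Bytes.SIZEOF_INT];
    for (int i = 0; i < ints.length; i++) {
      Bytes.putInt(packed, i * Bytes.SIZEOF_INT, ints[i]);
    }
    return packed;
  }

  // Split the cell value back into 4-byte ints on the client side.
  public static List<Integer> unpack(byte[] packed) {
    List<Integer> out = new ArrayList<Integer>(packed.length / Bytes.SIZEOF_INT);
    for (int off = 0; off < packed.length; off += Bytes.SIZEOF_INT) {
      out.add(Bytes.toInt(packed, off));
    }
    return out;
  }
}

You'd then do one Put per packed batch instead of one per int, which is
where the key-overhead savings come from.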

Another thing is that if you pack enough ints together and there's some sort
of repetition, you might be able to use LZO compression on that table.

I'd love to hear about your experimentations once you've done them.

J-D

On Mon, Oct 3, 2011 at 10:58 PM, edward choi mp2...@gmail.com wrote:

 Hi,

 I have a question regarding the performance and column value size.
 I need to store per row several million integers. (Several million is
 important here)
 I was wondering which method would be more beneficial performance wise.

 1) Store each integer to a single column so that when a row is called,
 several million columns will also be called. And the user would map each
 column values to some kind of container (ex: vector, arrayList)
 2) Store, for example, a thousand integers into a single column (by
 concatenating them) so that when a row is called, only several thousand
 columns will be called along. The user would have to split the column value
 into 4 bytes and map the split integer to some kind of container (ex:
 vector, arrayList)

 I am curious which approach would be better. 1) would call several millions
 of columns but no additional process is needed. 2) would call only several
 thousands of columns but additional process is needed.
 Any advice would be appreciated.

 Ed



Re: Using HBase for real time transaction

2011-09-21 Thread Jean-Daniel Cryans
On Wed, Sep 21, 2011 at 8:36 AM, Jignesh Patel jign...@websoft.com wrote:
  I am not looking for a relational database, but at creating a multi-tenant
 database. At this point I am not sure whether it needs transactions or not,
 or even whether that kind of architecture can support transactions.

Currently in HBase nothing prevents you from having multiple tenants,
as long as they have different table names. Also keep in mind that
there's no security implemented, but it *might* make it for 0.92
(crossing fingers).

 Row mutations in HBase are seen by the user as soon as they are done,
 atomicity is guaranteed at the row level, which seems to satisfy his
 requirement. If multi-row transactions are needed then I agree HBase
 might not be what he wants.

 Can't we handle transaction through application or container, before data 
 even goes to HBase?

Sure, you could do something like what Megastore[1] does, but you
really need to evaluate your needs and see if that works.


 And I do have one more doubt, how to handle low read latency?


HBase offers that out of the box; a more precise question would be
what 99th-percentile read latency you need. Just for the sake of
giving a data point, right now our 99p is 20ms, but that's with our
type of workload, machines, front-end caching, etc., so YMMV.

J-D

1. Megastore (transactions are described in chapter 3.3):
http://www.cidrdb.org/cidr2011/Papers/CIDR11_Paper32.pdf


Re: Using HBase for real time transaction

2011-09-20 Thread Jean-Daniel Cryans
While HBase isn't ACID-compliant, it does have some guarantees:

http://hbase.apache.org/acid-semantics.html

J-D

On Tue, Sep 20, 2011 at 2:56 PM, Michael Segel
michael_se...@hotmail.com wrote:

 Since Tom isn't technical... ;-)

 The short answer is No.
  HBase is not capable of being a transactional database because it doesn't
  support transactions.
 Nor is HBase ACID compliant.

 Having said that, yes you can use HBase to serve data in real time.

 HTH

 -Mike


 Subject: Re: Using HBase for real time transaction
 From: jign...@websoft.com
 Date: Tue, 20 Sep 2011 17:25:17 -0400
 To: common-user@hadoop.apache.org

 Tom,
 Let me reword: can HBase be used as a transactional database(i.e. in 
 replacement of mysql)?

 The requirement is to have real time read and write operations. I mean as 
 soon as data is written the user should see the data(Here data should be 
 written in Hbase).

 -Jignesh


 On Sep 20, 2011, at 5:11 PM, Tom Deutsch wrote:

  Real-time means different things to different people. Can you share your
  latency requirements from the time the data is generated to when it needs
  to be consumed, or how you are thinking of using Hbase in the overall
  flow?
 
  
  Tom Deutsch
  Program Director
  CTO Office: Information Management
  Hadoop Product Manager / Customer Exec
  IBM
  3565 Harbor Blvd
  Costa Mesa, CA 92626-1420
  tdeut...@us.ibm.com
 
 
 
 
  Jignesh Patel jign...@websoft.com
  09/20/2011 12:57 PM
  Please respond to
  common-user@hadoop.apache.org
 
 
  To
  common-user@hadoop.apache.org
  cc
 
  Subject
  Using HBase for real time transaction
 
 
 
 
 
 
  We are exploring possibility of using HBase for the real time
  transactions. Is that possible?
 
  -Jignesh
 




Re: Using HBase for real time transaction

2011-09-20 Thread Jean-Daniel Cryans
 I think there has to be some clarification.

 The OP was asking about a mySQL replacement.
  HBase will never be an RDBMS replacement. No transactions means no way of
  doing OLTP.
  It's the wrong tool for that type of work.

Agreed; if you are looking to handle relational data in a relational
fashion, it might be better to look elsewhere.

 Recognize what HBase is and what it is not.

Not sure what you're referring to here.

  This doesn't mean it can't take in or deliver data in real time; it can.
  So if you want to use it in a real-time manner, sure. Note that, like with
  other databases, you will have to do some work to handle real-time data.
  I guess you would have to provide a specific use case for what you want to
  achieve in order to know if it's a good fit.

He says:

 The requirement is to have real time read and write operations. I mean as 
 soon as data is written the user should see the data(Here data should be 
 written in Hbase).

Row mutations in HBase are seen by the user as soon as they are done,
atomicity is guaranteed at the row level, which seems to satisfy his
requirement. If multi-row transactions are needed then I agree HBase
might not be what he wants.
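
To make that concrete, a single Put carrying several columns of one row
is applied atomically and is visible as soon as it returns. A minimal
sketch against the 0.90-era client API (table, family and qualifiers are
made-up names):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class AtomicRowWrite {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HTable table = new HTable(conf, "accounts");   // hypothetical table
    Put put = new Put(Bytes.toBytes("user42"));    // all edits target one row
    put.add(Bytes.toBytes("info"), Bytes.toBytes("balance"), Bytes.toBytes("100"));
    put.add(Bytes.toBytes("info"), Bytes.toBytes("status"), Bytes.toBytes("active"));
    // The row's columns are committed together: a concurrent reader sees
    // either none or all of these updates, never a partial row.
    table.put(put);
    table.close();
  }
}

Anything that has to span more than one row gets no such guarantee,
which is the multi-row transaction case above.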

J-D


Re: Hadoop in Canada

2011-03-29 Thread Jean-Daniel Cryans
(moving to general@ since this is not a question regarding the usage
of the hadoop commons, which I BCC'd)

I moved from Montreal to SF a year and a half ago because I saw two
things: 1) companies weren't interested (they are still trying to get
rid of COBOL or worse) or didn't have the data to use Hadoop (not
enough big companies), and 2) the universities were either uninterested
or just amused by this newcomer. I know of one company that really
does cool stuff with Hadoop in Montreal and it's Hopper
(www.hopper.travel, they are still in closed alpha AFAIK), who also
organized hackreduce.org last weekend. This is what their CEO has to
say to the question "Is there something you would do differently now
if you would start it over?":

"Move to the Valley."

(see the rest here
http://nextmontreal.com/product-market-fit-hopper-travel-fred-lalonde/)

I'm sure there are a lot of other companies that are either
considering using or already using Hadoop to some extent in Canada
but, like anything else, only a portion of them are interested in
talking about it or even organizing an event.

I would actually love to see something getting organized and I'd be on
the first plane to Y**, but I'm afraid that to achieve any sort of
critical mass you'd have to fly in people from all the provinces. Air
Canada becomes a SPOF :P

Now that I think about it, there's probably enough Canucks around here
that use Hadoop that we could have our own little user group. If you
want to have a nice vacation and geek out with us, feel free to stop
by and say hi.

/rant

J-D

On Tue, Mar 29, 2011 at 6:21 AM, James Seigel ja...@tynt.com wrote:
 Hello,

 You might remember me from a couple of weeks back asking if there were any 
 Calgary people interested in a “meetup” about #bigdata or using hadoop.  
 Well, I’ve expanded my search a little to see if any of my Canadian brothers 
 and sisters are using the elephant for good or for evil.  It might be harder 
 to grab coffee, but it would be fun to see where everyone is.

 Shout out if you’d like or ping me, I think it’d be fun to chat!

 Cheers
 James Seigel
 Captain Hammer at Tynt.com


Re: google snappy

2011-03-23 Thread Jean-Daniel Cryans
(Please don't cross-post like that, it only adds confusion. I put
everything in bcc and posted to general instead)

Their README says the following:

Snappy usually is faster than algorithms in the same class (e.g. LZO,
LZF, FastLZ, QuickLZ, etc.) while achieving comparable compression
ratios.

Somebody obviously needs to publish some benchmarks, but knowing
Snappy's origin I can believe that claim.

Relevant jiras:

HADOOP-7206 Integrate Snappy compression
HBASE-3691   Add compressor support for 'snappy', google's compressor

J-D

On Wed, Mar 23, 2011 at 9:52 AM, Weishung Chung weish...@gmail.com wrote:
 Hey my fellow hadoop/hbase developers,

 I just came across this google compression/decompression package yesterday,
 could we make good use of this compression scheme in hadoop? It's written
 in C++ though.

 http://code.google.com/p/snappy/

 I haven't looked closely into this snappy
 package yet but I would love to know about the differences compared to LZO.

 Thank you,
 Wei Shung



Re: HBase crashes when one server goes down

2011-02-14 Thread Jean-Daniel Cryans
Please use the hbase mailing list for HBase-related questions.

Regarding your issue, we'll need more information to help you out.
Have you checked the logs? If you see exceptions in there, did you
google them trying to figure out what's going on?

Finally, does your setup meet all the requirements?
http://hbase.apache.org/notsoquick.html#requirements

J-D

On Mon, Feb 14, 2011 at 9:49 AM, Rodrigo Barreto rodbarr...@gmail.com wrote:
 Hi,

 We are new to Hadoop. We have just configured a cluster with 3 servers and
 everything is working OK except when one server goes down: Hadoop / HDFS
 continues working but HBase stops, and queries do not return results
 until we restart HBase. The HBase configuration is copied below; please
 help us.

 ## HBASE-SITE.XML ###

 <configuration>
        <property>
                <name>hbase.zookeeper.quorum</name>
                <value>master,slave1,slave2</value>
                <description>The directory shared by region servers.
                </description>
        </property>
        <property>
                <name>hbase.rootdir</name>
                <value>hdfs://master:54310/hbase</value>
        </property>
        <property>
                <name>hbase.cluster.distributed</name>
                <value>true</value>
        </property>
        <property>
                <name>hbase.master</name>
                <value>master:6</value>
                <description>The host and port that the HBase master runs
 at.
                </description>
        </property>

        <property>
                <name>dfs.replication</name>
                <value>2</value>
                <description>Default block replication.
                The actual number of replications can be specified when the
 file is created.
                The default is used if replication is not specified in
 create time.
                </description>
        </property>
 </configuration>


 Thanks,

 Rodrigo Barreto.



Re: User History Location

2011-02-11 Thread Jean-Daniel Cryans
For cloudera-related questions, please use their mailing lists.

J-D

2011/2/11 Alexander Schätzle schae...@informatik.uni-freiburg.de:
 Hello,

 I'm a little bit confused about the right key for specifying the User
 History Location in CDH3B3 (which is Hadoop 0.20.2+737). Could anybody
 please give me a short answer which key is the right one and which
 configuration file is the right one to place the key?

 1) mapreduce.job.userhistorylocation ?
 2) hadoop.job.history.user.location ?

 Is the mapred-site.xml the right config-file for this key?

 Thx a lot!

 Best regards,

 Alexander Schätzle
 University of Freiburg, Germany



Re: State of high availability in Hadoop 0.20.1

2010-06-24 Thread Jean-Daniel Cryans
It's the same.

J-D

On Thu, Jun 24, 2010 at 9:44 AM, Stas Oskin stas.os...@gmail.com wrote:
 Just to clarify, I mean the NameNode high availability.

 Regards.

 On Thu, Jun 24, 2010 at 7:43 PM, Stas Oskin stas.os...@gmail.com wrote:

 Hi.

 What is the state of high-availability in Hadoop 0.20.1?

 In Hadoop 0.18.3 the only option was doing DRBD; has anything changed in
 0.20.1?

 Regards.




Re: State of high availability in Hadoop 0.20.1

2010-06-24 Thread Jean-Daniel Cryans
The Backup Namenode will be in 0.21 but it's not a complete NN HA
solution (far from that):

https://issues.apache.org/jira/browse/HADOOP-4539

Dhruba at Facebook has an AvatarNode for 0.20:

https://issues.apache.org/jira/browse/HDFS-976

And the umbrella issue for NN availability is:

https://issues.apache.org/jira/browse/HDFS-1064

J-D

On Thu, Jun 24, 2010 at 10:10 AM, Stas Oskin stas.os...@gmail.com wrote:
 Hi.

 The check-point node is expected to be included in 0.21?

 Regards.

 On Thu, Jun 24, 2010 at 7:47 PM, Jean-Daniel Cryans 
 jdcry...@apache.org wrote:

 It's the same.

 J-D

 On Thu, Jun 24, 2010 at 9:44 AM, Stas Oskin stas.os...@gmail.com wrote:
  Just to clarify, I mean the NameNode high availability.
 
  Regards.
 
  On Thu, Jun 24, 2010 at 7:43 PM, Stas Oskin stas.os...@gmail.com
 wrote:
 
  Hi.
 
  What is the state of high-availability in Hadoop 0.20.1?
 
  In Hadoop 0.18.3 the only option was doing DRBD; has anything changed in
  0.20.1?
 
  Regards.
 
 




Re: Hbase Hive

2010-04-30 Thread Jean-Daniel Cryans
Inline (and added hbase-user to the recipients).

J-D

On Thu, Apr 29, 2010 at 9:23 PM, Amit Kumar amkumar@gmail.com wrote:
 Hi Everyone,

 I want to ask about Hbase and Hive.

 Q1: Is there any dialect available which can be used with Hibernate to
 create persistence with HBase? Has somebody written one? I came across HBql
 at www.hbql.com. Can this be used to create a dialect for HBase?

HBQL queries HBase directly, but it's not SQL-compliant and doesn't
feature relational keywords (since HBase doesn't support them, JOINs
don't scale). I don't know if anybody tried integrating HBQL in
Hibernate... it's still a very young project.


 Q2: Once the data is in there in HBase: in this link I found that it can be
 used with Hive ( https://issues.apache.org/jira/browse/HIVE-705 ). So the
 question is, is it safe enough to use the below architecture for an application:
 Hibernate -- Dialect for HBase -- HBase -- query from HBase using Hive to
 use MapReduce effectively?

Hive goes on top of HBase, so you can use its query language to mine
HBase tables. Be aware that a MapReduce job isn't meant for live
queries, so issuing them from Hibernate doesn't make much sense...
unless you meant something else, in which case please do give
more details.


 Thanks & Regards
 Amit Kumar



Re: DFS too busy/down? while writing back to HDFS.

2010-04-05 Thread Jean-Daniel Cryans
Look at your datanode logs around the same time. You probably either have this

http://wiki.apache.org/hadoop/Hbase/Troubleshooting#A5

or that

http://wiki.apache.org/hadoop/Hbase/FAQ#A6

Also, you seem to be putting a fair number of regions on those region
servers judging by the metrics, so do consider setting HBASE_HEAP higher
than 1GB in conf/hbase-env.sh.

J-D

On Mon, Apr 5, 2010 at 8:38 PM, steven zhuang
steven.zhuang.1...@gmail.com wrote:
 greetings,

         While I was importing data into my HBase cluster, I found that one
  regionserver was down, and by checking the log, I found the following exception:
  *EOFException* (during an HBase memstore flush to an HDFS file? not sure)

         It seems to be caused by the DFSClient not working. I don't know the
  exact reason; maybe it's the heavy load on the machine where the
  datanode is residing, or the disk is full, but I am not sure which DFS
  node I should check.
         Has anybody met the same problem? Any pointer or hint is
  appreciated.

       The log is as follows:


 2010-04-06 03:04:34,065 INFO org.apache.hadoop.hbase.regionserver.HRegion:
 Blocking updates for 'IPC Server handler 20 on 60020' on region
 hbt2table16,,1270522012397: memstore size 128.0m is >= than blocking 128.0m
 size
 2010-04-06 03:04:34,712 DEBUG org.apache.hadoop.hbase.regionserver.Store:
 Completed compaction of 34; new storefile is
 hdfs://rra-03:8887hbase/hbt2table16/2144402082/34/854678344516838047; store
 size is 2.9m
 2010-04-06 03:04:34,715 DEBUG org.apache.hadoop.hbase.regionserver.Store:
 Compaction size of 35: 2.9m; Skipped 0 file(s), size: 0
 2010-04-06 03:04:34,715 DEBUG org.apache.hadoop.hbase.regionserver.Store:
 Started compaction of 5 file(s)  into
 hbase/hbt2table16/compaction.dir/2144402082, seqid=2914432737
 2010-04-06 03:04:35,055 DEBUG org.apache.hadoop.hbase.regionserver.Store:
 Added
 hdfs://rra-03:8887hbase/hbt2table16/2144402082/184/1530971405029654438,
 entries=1489, sequenceid=2914917785, memsize=203.8k, filesize=88.6k to
 hbt2table16,,1270522012397
 2010-04-06 03:04:35,442 DEBUG org.apache.hadoop.hbase.regionserver.Store:
 Completed compaction of 35; new storefile is
 hdfs://rra-03:8887hbase/hbt2table16/2144402082/35/2952180521700205032; store
 size is 2.9m
 2010-04-06 03:04:35,445 DEBUG org.apache.hadoop.hbase.regionserver.Store:
 Compaction size of 36: 2.9m; Skipped 0 file(s), size: 0
 2010-04-06 03:04:35,445 DEBUG org.apache.hadoop.hbase.regionserver.Store:
 Started compaction of 4 file(s)  into
 hbase/hbt2table16/compaction.dir/2144402082, seqid=2914432737
 2010-04-06 03:04:35,469 DEBUG org.apache.hadoop.hbase.regionserver.Store:
 Added
 hdfs://rra-03:8887hbase/hbt2table16/2144402082/185/1984548574711437130,
 entries=2105, sequenceid=2914917785, memsize=286.7k, filesize=123.9k to
 hbt2table16,,1270522012397
 2010-04-06 03:04:35,711 DEBUG org.apache.hadoop.hbase.regionserver.Store:
 Added
 hdfs://rra-03:8887hbase/hbt2table16/2144402082/186/2470661482474884005,
 entries=3031, sequenceid=2914917785, memsize=414.0k, filesize=179.1k to
 hbt2table16,,1270522012397
 2010-04-06 03:04:35,866 DEBUG
 org.apache.hadoop.hbase.io.hfile.LruBlockCache: Block cache LRU eviction
 started.  Attempting to free 20853136 bytes
 2010-04-06 03:04:37,010 DEBUG
 org.apache.hadoop.hbase.io.hfile.LruBlockCache: Block cache LRU eviction
 completed. Freed 20866928 bytes.  Priority Sizes: Single=17.422821MB
 (18269152), Multi=150.70126MB (158021728),Memory=0.0MB (0)
 2010-04-06 03:04:37,542 INFO org.apache.hadoop.hdfs.DFSClient: Exception in
 createBlockOutputStream java.io.EOFException
 2010-04-06 03:04:37,542 INFO org.apache.hadoop.hdfs.DFSClient: Abandoning
 block blk_-6935524980745310745_1391901
 2010-04-06 03:04:37,607 DEBUG org.apache.hadoop.hbase.regionserver.Store:
 Completed compaction of 36; new storefile is
 hdfs://rra-03:8887hbase/hbt2table16/2144402082/36/1570089400510240916; store
 size is 2.9m
 2010-04-06 03:04:37,612 DEBUG org.apache.hadoop.hbase.regionserver.Store:
 Compaction size of 37: 2.9m; Skipped 0 file(s), size: 0
 2010-04-06 03:04:37,612 DEBUG org.apache.hadoop.hbase.regionserver.Store:
 Started compaction of 4 file(s)  into
 hbase/hbt2table16/compaction.dir/2144402082, seqid=2914432737
 2010-04-06 03:04:37,964 INFO org.apache.hadoop.hdfs.DFSClient: Exception in
 createBlockOutputStream java.io.*EOFException*
 2010-04-06 03:04:37,964 INFO org.apache.hadoop.hdfs.DFSClient: Abandoning
 block blk_2467598422201289982_1391902
 2010-04-06 03:04:43,568 INFO org.apache.hadoop.hdfs.DFSClient: Exception in
 createBlockOutputStream java.io.EOFException
 2010-04-06 03:04:43,568 INFO org.apache.hadoop.hdfs.DFSClient: Abandoning
 block blk_-2065206049437531800_1391902
 2010-04-06 03:04:44,044 INFO org.apache.hadoop.hdfs.DFSClient: Exception in
 createBlockOutputStream java.io.EOFException
 2010-04-06 03:04:44,044 INFO org.apache.hadoop.hdfs.DFSClient: Abandoning
 block blk_-3059563223628992257_1391902
 2010-04-06 03:05:01,588 WARN org.apache.hadoop.hdfs.DFSClient: 

Re: region server appearing twice on HBase Master page

2010-03-11 Thread Jean-Daniel Cryans
Bringing the discussion to hbase-user.

That usually happens after a DNS hiccup. There's a fix for that in
https://issues.apache.org/jira/browse/HBASE-2174

J-D

On Wed, Mar 10, 2010 at 1:41 PM, Ted Yu yuzhih...@gmail.com wrote:
 I noticed two lines for the same region server on HBase Master page:
 X.com:60030    1268160765854    requests=0, regions=16, usedHeap=1068,
 maxHeap=6127
 X.com:60030    1268250726442    requests=21, regions=9, usedHeap=1258,
 maxHeap=6127

 I checked there is only one
 org.apache.hadoop.hbase.regionserver.HRegionServer instance running on that
 machine.

 This is from region server log:

 2010-03-10 13:25:38,157 ERROR [IPC Server handler 43 on 60020]
 regionserver.HRegionServer(844):
 org.apache.hadoop.hbase.NotServingRegionException: ruletable,,1268083966723
        at
 org.apache.hadoop.hbase.regionserver.HRegionServer.getRegion(HRegionServer.java:2307)
        at
 org.apache.hadoop.hbase.regionserver.HRegionServer.get(HRegionServer.java:1784)
        at sun.reflect.GeneratedMethodAccessor10.invoke(Unknown Source)
        at
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at
 org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:648)
        at
 org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:915)
 2010-03-10 13:25:38,189 ERROR [IPC Server handler 0 on 60020]
 regionserver.HRegionServer(844):
 org.apache.hadoop.hbase.NotServingRegionException: ruletable,,1268083966723
        at
 org.apache.hadoop.hbase.regionserver.HRegionServer.getRegion(HRegionServer.java:2307)
        at
 org.apache.hadoop.hbase.regionserver.HRegionServer.get(HRegionServer.java:1784)
        at sun.reflect.GeneratedMethodAccessor10.invoke(Unknown Source)
        at
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at
 org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:648)
        at
 org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:915)

 If you know how to troubleshoot, please share.



Re: should data be evenly distributed to each (physical) node

2010-03-04 Thread Jean-Daniel Cryans
There's nothing like reading the manual:
http://hadoop.apache.org/common/docs/r0.20.0/hdfs_design.html#Replica+Placement%3A+The+First+Baby+Steps

Quote:

For the common case, when the replication factor is three, HDFS’s
placement policy is to put one replica on one node in the local rack,
another on a different node in the local rack, and the last on a
different node in a different rack. 

So if you write the data from only 1 machine, every block will have 1
replica on that machine (although you can run the balancer
afterwards).

J-D

On Thu, Mar 4, 2010 at 7:25 AM, openresearch
qiming...@openresearchinc.com wrote:

 I am building a small two node cluster following
 http://www.michael-noll.com/wiki/Running_Hadoop_On_Ubuntu_Linux_(Multi-Node_Cluster)

 Everything seems to be working, except I notice the data is NOT evenly
 distributed to each physical box.
 E.g., when I hadoop dfs -put 6G of data, I am expecting ~3G on each node
 (taking turns every ~64MB); however, I checked dfshealth.jsp and du -k on the
 local box, and found the uploaded data is ONLY residing on the physical box
 where I started dfs -put. That defeats the whole (data locality) purpose of
 hadoop?!

 Please help.

 Thanks

 --
 View this message in context: 
 http://old.nabble.com/should-data-be-evenly-distributed-to-each-%28physical%29-node-tp27782215p27782215.html
 Sent from the Hadoop core-user mailing list archive at Nabble.com.




Re: Hbase VS Hive

2010-03-03 Thread Jean-Daniel Cryans
HBase is used to do random reads on files stored in Hadoop, among
other things. It's really a database.

Hive is a data warehousing infrastructure built on top of Hadoop and
will even soon work on top of HBase too.

J-D

On Wed, Mar 3, 2010 at 9:42 AM, Fitrah Elly Firdaus
fitrah.fird...@gmail.com wrote:
 Hello Everyone

 I want to ask about Hbase and Hive.

 What is the difference between Hbase and Hive? And what is the
 consideration for
 choosing between Hbase and Hive?


 Kind regards




Re: problem regarding hadoop

2010-01-15 Thread Jean-Daniel Cryans
There seems to be a mismatch between the hbase versions you are using.
In particular, there is a known bug when using hbase 0.20.0 with
0.20.1 and 0.20.2. The best thing is to just upgrade to 0.20.2.

J-D

On Thu, Jan 14, 2010 at 12:11 AM, Muhammad Mudassar
mudassa...@gmail.com wrote:
 Basically I am trying to create a table in HBase by using *HBaseAdmin* from
 a Java program, but I am running into trouble: the table is created but
 it does not store anything in it. When I use *batchUpdate.put* to insert
 anything into it, the exception shown in the IDE is

 Exception in thread main java.lang.reflect.UndeclaredThrowableException
        at $Proxy1.getRegionInfo(Unknown Source)
        at
 org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRootRegion(HConnectionManager.java:795)
        at
 org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:465)
        at
 org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:440)
        at
 org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:515)
        at
 org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:474)
        at
 org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:440)
        at
 org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:515)
        at
 org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:478)
        at
 org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:440)
        at
 org.apache.hadoop.hbase.client.HBaseAdmin.createTable(HBaseAdmin.java:159)

 Caused by: org.apache.hadoop.ipc.RemoteException: java.io.IOException:
 java.lang.NoSuchMethodException:
 org.apache.hadoop.hbase.regionserver.HRegionServer.getRow([B)
        at java.lang.Class.getMethod(Class.java:1605)
        at
 org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:627)
        at
 org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:912)

        at
 org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:701)
        at
 org.apache.hadoop.hbase.ipc.HBaseRPC$Invoker.invoke(HBaseRPC.java:321)
        ... 14 more
 Java Result: 1

 when I checked the logs of the hbase master

 On Wed, Jan 13, 2010 at 10:37 PM, Jean-Daniel Cryans 
 jdcry...@apache.org wrote:

 This is probably a question better for common-user rather than hbase.

 But to answer your problem, your JobTracker is able to talk to your
 Namenode but there's something wrong with the Datanode; you should
 grep its log for any exception.

 J-D

 On Wed, Jan 13, 2010 at 3:11 AM, Muhammad Mudassar mudassa...@gmail.com
 wrote:
  Hi, I am running hadoop 0.20.1 on a single node and I am getting a
 problem.
  My hdfs-site configurations are
  <configuration>
  <property>
     <name>dfs.replication</name>
     <value>1</value>
   </property>
  <property>
   <name>hadoop.tmp.dir</name>
   <value>/home/hadoop/Desktop/hadoop-store/hadoop-$hadoop</value>
   <description>A base for other temporary directories.</description>
  </property>
  </configuration>
 
 
  and core site configurations are
  <configuration>
   <property>
     <name>fs.default.name</name>
     <value>hdfs://localhost:54310</value>
   </property>
  <property>
   <name>hadoop.tmp.dir</name>
   <value>/home/hadoop/Desktop/hadoop-store/hadoop-$hadoop</value>
   <description>A base for other temporary directories.</description>
  </property>
  </configuration>
 
 
   the problem is that the jobtracker log file says:
 
  2010-01-13 16:00:33,015 INFO org.apache.hadoop.mapred.JobTracker:
 Scheduler
  configured with (memSizeForMapSlotOnJT, memSizeForReduceSlotOnJT,
  limitMaxMemForMapTasks, limitMaxMemForReduceTasks) (-1, -1, -1, -1)
  2010-01-13 16:00:33,043 INFO org.apache.hadoop.ipc.metrics.RpcMetrics:
  Initializing RPC Metrics with hostName=JobTracker, port=54311
  2010-01-13 16:00:38,309 INFO org.mortbay.log: Logging to
  org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via
  org.mortbay.log.Slf4jLog
  2010-01-13 16:00:38,407 INFO org.apache.hadoop.http.HttpServer: Port
  returned by webServer.getConnectors()[0].getLocalPort() before open() is
 -1.
  Opening the listener on 50030
  2010-01-13 16:00:38,408 INFO org.apache.hadoop.http.HttpServer:
  listener.getLocalPort() returned 50030
  webServer.getConnectors()[0].getLocalPort() returned 50030
  2010-01-13 16:00:38,408 INFO org.apache.hadoop.http.HttpServer: Jetty
 bound
  to port 50030
  2010-01-13 16:00:38,408 INFO org.mortbay.log: jetty-6.1.14
  2010-01-13 16:00:51,429 INFO org.mortbay.log: Started
  selectchannelconnec...@0.0.0.0:50030
  2010-01-13 16:00:51,430 INFO org.apache.hadoop.metrics.jvm.JvmMetrics:
  Initializing JVM Metrics with processName=JobTracker, sessionId=
  2010-01-13 16:00:51,431 INFO org.apache.hadoop.mapred.JobTracker:
 JobTracker
  up

Re: problem regarding hadoop

2010-01-13 Thread Jean-Daniel Cryans
This is probably a question better for common-user rather than hbase.

But to answer your problem, your JobTracker is able to talk to your
Namenode but there's something wrong with the Datanode; you should
grep its log for any exception.

J-D

On Wed, Jan 13, 2010 at 3:11 AM, Muhammad Mudassar mudassa...@gmail.com wrote:
 Hi, I am running hadoop 0.20.1 on a single node and I am getting a problem.
 My hdfs-site configurations are
 <configuration>
 <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
 <property>
  <name>hadoop.tmp.dir</name>
  <value>/home/hadoop/Desktop/hadoop-store/hadoop-$hadoop</value>
  <description>A base for other temporary directories.</description>
 </property>
 </configuration>


 and core site configurations are
 <configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:54310</value>
  </property>
 <property>
  <name>hadoop.tmp.dir</name>
  <value>/home/hadoop/Desktop/hadoop-store/hadoop-$hadoop</value>
  <description>A base for other temporary directories.</description>
 </property>
 </configuration>


 the problem is that the jobtracker log file says:

 2010-01-13 16:00:33,015 INFO org.apache.hadoop.mapred.JobTracker: Scheduler
 configured with (memSizeForMapSlotOnJT, memSizeForReduceSlotOnJT,
 limitMaxMemForMapTasks, limitMaxMemForReduceTasks) (-1, -1, -1, -1)
 2010-01-13 16:00:33,043 INFO org.apache.hadoop.ipc.metrics.RpcMetrics:
 Initializing RPC Metrics with hostName=JobTracker, port=54311
 2010-01-13 16:00:38,309 INFO org.mortbay.log: Logging to
 org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via
 org.mortbay.log.Slf4jLog
 2010-01-13 16:00:38,407 INFO org.apache.hadoop.http.HttpServer: Port
 returned by webServer.getConnectors()[0].getLocalPort() before open() is -1.
 Opening the listener on 50030
 2010-01-13 16:00:38,408 INFO org.apache.hadoop.http.HttpServer:
 listener.getLocalPort() returned 50030
 webServer.getConnectors()[0].getLocalPort() returned 50030
 2010-01-13 16:00:38,408 INFO org.apache.hadoop.http.HttpServer: Jetty bound
 to port 50030
 2010-01-13 16:00:38,408 INFO org.mortbay.log: jetty-6.1.14
 2010-01-13 16:00:51,429 INFO org.mortbay.log: Started
 selectchannelconnec...@0.0.0.0:50030
 2010-01-13 16:00:51,430 INFO org.apache.hadoop.metrics.jvm.JvmMetrics:
 Initializing JVM Metrics with processName=JobTracker, sessionId=
 2010-01-13 16:00:51,431 INFO org.apache.hadoop.mapred.JobTracker: JobTracker
 up at: 54311
 2010-01-13 16:00:51,431 INFO org.apache.hadoop.mapred.JobTracker: JobTracker
 webserver: 50030
 2010-01-13 16:00:51,574 INFO org.apache.hadoop.mapred.JobTracker: Cleaning
 up the system directory
 2010-01-13 16:00:51,643 INFO
 org.apache.hadoop.mapred.CompletedJobStatusStore: Completed job store is
 inactive
 2010-01-13 16:00:51,674 WARN org.apache.hadoop.hdfs.DFSClient: DataStreamer
 Exception: org.apache.hadoop.ipc.RemoteException: java.io.IOException: File
 /home/hadoop/Desktop/hadoop-store/hadoop-$hadoop/mapred/system/
 jobtracker.info could only be replicated to 0 nodes, instead of 1
    at
 org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1267)
    at
 org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:422)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:508)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:959)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:955)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:953)

    at org.apache.hadoop.ipc.Client.call(Client.java:739)
    at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:220)
    at $Proxy4.addBlock(Unknown Source)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at
 org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
    at
 org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
    at $Proxy4.addBlock(Unknown Source)
    at
 org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.locateFollowingBlock(DFSClient.java:2904)
    at
 org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2786)
    at
 org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2000(DFSClient.java:2076)
    at
 org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2262)

 2010-01-13 16:00:51,674 WARN 

SF HBase User Group Meetup Jan. 27th @ StumbleUpon

2010-01-06 Thread Jean-Daniel Cryans
Hi all,

This year's first San Francisco HBase User Group meetup takes place on
January 27th at StumbleUpon. The first talk will be about the upcoming
versions, others to be announced.

RSVP at: http://su.pr/6Cldz7

See you there!

J-D


Re: error while using ArrayWritable

2010-01-01 Thread Jean-Daniel Cryans
This is explained in the javadoc:

http://hadoop.apache.org/common/docs/r0.20.1/api/org/apache/hadoop/io/ArrayWritable.html
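
In short, ArrayWritable can't be instantiated by the framework during
deserialization because it has no no-arg constructor that knows the
element class, so the javadoc's advice is to define a small subclass
that fixes the element type. A minimal sketch (assuming Text elements;
substitute whatever Writable you actually store):

import org.apache.hadoop.io.ArrayWritable;
import org.apache.hadoop.io.Text;

// Subclass so the framework can create instances reflectively on the
// reduce side with the element class already set.
public class TextArrayWritable extends ArrayWritable {
  public TextArrayWritable() {
    super(Text.class);
  }
}

Then use TextArrayWritable (not ArrayWritable) as the map output value
class and in the reducer's value type, and the NoSuchMethodException for
ArrayWritable.<init>() goes away.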

J-D

On Fri, Jan 1, 2010 at 11:29 PM, bharath vissapragada
bhara...@students.iiit.ac.in wrote:
 Hi all ,

 I am using ArrayWritable in my MR job.

 Map outputs <Text, ArrayWritable>.

 Reduce takes <Text, Iterable<ArrayWritable>>.

 The moment I am trying to use the ArrayWritable in reduce using the iterator,
 I get the following error:

 10/01/02 18:23:41 WARN mapred.LocalJobRunner: job_local_0001
 java.lang.RuntimeException: java.lang.NoSuchMethodException:
 org.apache.hadoop.io.ArrayWritable.<init>()
    at
 org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:81)
    at
 org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:62)
    at
 org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:40)
    at
 org.apache.hadoop.mapred.Task$ValuesIterator.readNextValue(Task.java:940)
    at org.apache.hadoop.mapred.Task$ValuesIterator.next(Task.java:880)
    at
 org.apache.hadoop.mapred.ReduceTask$ReduceValuesIterator.moveToNext(ReduceTask.java:237)
    at
 org.apache.hadoop.mapred.ReduceTask$ReduceValuesIterator.next(ReduceTask.java:233)
    at HashJoin.MR_hash$redu.reduce(MR_hash.java:132)
    at mtr.MyTableReduce.reduce(MyTableReduce.java:1)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:436)
    at
 org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:170)
 Caused by: java.lang.NoSuchMethodException:
 org.apache.hadoop.io.ArrayWritable.<init>()
    at java.lang.Class.getConstructor0(Class.java:2723)
    at java.lang.Class.getDeclaredConstructor(Class.java:2002)
    at
 org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:75)
    ... 10 more
 java.io.IOException: Job failed!
    at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1232)
    at HashJoin.MR_hash.run(MR_hash.java:294)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
    at HashJoin.MR_hash.join(MR_hash.java:320)
    at HashJoin.MR_hash.main(MR_hash.java:338)


 Can Anyone tell me what the error is ??



Re: Hey Cloudera can you help us In beating Google Yahoo Facebook?

2009-10-02 Thread Jean-Daniel Cryans
Stan,

First, this is not the Cloudera mailing list and this is not a dev question.

Also, AFAIK, Google uses Hadoop only to interface with people outside,
since MapReduce works the same way.
I think this article is wrong in saying that Google, Yahoo! and
Facebook are using Hadoop via Cloudera, and I'm 99% sure of that. They
all have enough expertise to not be dependent on a support contract,
and Y! even has its own distro of Hadoop (though not supported the way
Cloudera does it). Maybe Leena Rao thought that Cloudera were the only
ones developing Hadoop and took the biggest names out of the PoweredBy
page.

J-D

On Fri, Oct 2, 2009 at 7:02 PM, Smith Stan smiths...@gmail.com wrote:
 Hey Cloudera genius guys .

 I read this

 Via Cloudera, Hadoop is currently used by most of the giants in the
 space including Google, Yahoo, Facebook (we wrote about Facebook’s use
 of Cloudera here), Amazon, AOL, Baidu and more.

 On.
 http://www.techcrunch.com/2009/10/01/hadoop-clusters-get-a-monitoring-client-with-cloudera-desktop/

 if this is true can you guys help us beat Y G and F.

 Is it true that Google uses hadoop?
 Is it true that the above-mentioned giants use Hadoop via Cloudera?

 Thanks,
 Stan S



Re: Error in running hadoop examples

2009-07-16 Thread Jean-Daniel Cryans
Make sure every machine is able to talk to every other one, especially
if you use hostnames defined in /etc/hosts on the master.
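
For example, something like the following in /etc/hosts on every node
(addresses and names here are hypothetical), making sure a node's own
hostname does not resolve to 127.0.0.1:

192.168.1.10  master
192.168.1.11  slave1
192.168.1.12  slave2
127.0.0.1     localhost

The "Too many fetch-failures" / "Connection refused" messages below are
typically reducers that can't resolve or reach the hosts serving the map
output.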

J-D

On Thu, Jul 16, 2009 at 1:04 PM, Pooja Dave davepo...@gmail.com wrote:

 Hi

 I am relatively new to using hadoop. After installing hadoop on 3 machines
 I tried running the word count example on one of the machines running as a
 single node only. However when I try to run the word count example using the
 following command on the terminal:

 had...@user5:~$ /home/hadoop/Desktop/hadoop/bin/hadoop jar
 /home/hadoop/Desktop/hadoop/hadoop-0.19.1-examples.jar wordcount gutenberg
 gut-out

 where : hadoop is my user account and gutenberg is where the txt files for
 the word count example are stored and gut-out is where the result is to be
 stored

 it starts the map-reduce job; however, the reduce gets stuck at 0% even though
 map reaches 100%, and the output on the console is as follows. I need help.
 I have been stuck on this problem for 3 days!

 09/07/16 12:32:01 INFO mapred.FileInputFormat: Total input paths to process
 : 3
 09/07/16 12:32:44 INFO mapred.JobClient: Running job: job_200907161230_0001
 09/07/16 12:32:45 INFO mapred.JobClient:  map 0% reduce 0%
 09/07/16 12:33:33 INFO mapred.JobClient:  map 1% reduce 0%
 09/07/16 12:33:37 INFO mapred.JobClient:  map 3% reduce 0%
 09/07/16 12:33:54 INFO mapred.JobClient:  map 5% reduce 0%
 09/07/16 12:33:57 INFO mapred.JobClient:  map 7% reduce 0%
 09/07/16 12:34:07 INFO mapred.JobClient:  map 9% reduce 0%
 09/07/16 12:34:14 INFO mapred.JobClient:  map 11% reduce 0%
 09/07/16 12:34:21 INFO mapred.JobClient:  map 12% reduce 0%
 09/07/16 12:34:29 INFO mapred.JobClient:  map 14% reduce 0%
 09/07/16 12:34:37 INFO mapred.JobClient:  map 16% reduce 0%
 09/07/16 12:34:44 INFO mapred.JobClient:  map 18% reduce 0%
 09/07/16 12:34:51 INFO mapred.JobClient:  map 20% reduce 0%
 09/07/16 12:34:58 INFO mapred.JobClient:  map 22% reduce 0%
 09/07/16 12:35:09 INFO mapred.JobClient:  map 24% reduce 0%
 09/07/16 12:35:41 INFO mapred.JobClient:  map 25% reduce 0%
 09/07/16 12:36:01 INFO mapred.JobClient:  map 27% reduce 0%
 09/07/16 12:36:10 INFO mapred.JobClient:  map 29% reduce 0%
 09/07/16 12:36:34 INFO mapred.JobClient:  map 31% reduce 0%
 09/07/16 12:36:58 INFO mapred.JobClient:  map 33% reduce 0%
 09/07/16 12:37:08 INFO mapred.JobClient:  map 35% reduce 0%
 09/07/16 12:37:15 INFO mapred.JobClient:  map 37% reduce 0%
 09/07/16 12:37:29 INFO mapred.JobClient:  map 38% reduce 0%
 09/07/16 12:37:31 INFO mapred.JobClient:  map 40% reduce 0%
 09/07/16 12:37:47 INFO mapred.JobClient:  map 42% reduce 0%
 09/07/16 12:37:48 INFO mapred.JobClient:  map 44% reduce 0%
 09/07/16 12:38:04 INFO mapred.JobClient:  map 46% reduce 0%
 09/07/16 12:38:06 INFO mapred.JobClient:  map 48% reduce 0%
 09/07/16 12:38:22 INFO mapred.JobClient:  map 49% reduce 0%
 09/07/16 12:38:23 INFO mapred.JobClient:  map 51% reduce 0%
 09/07/16 12:38:39 INFO mapred.JobClient:  map 53% reduce 0%
 09/07/16 12:38:40 INFO mapred.JobClient:  map 55% reduce 0%
 09/07/16 12:39:17 INFO mapred.JobClient:  map 59% reduce 0%
 09/07/16 12:39:37 INFO mapred.JobClient: Task Id :
 attempt_200907161230_0001_m_00_0, Status : FAILED
 Too many fetch-failures
 09/07/16 12:39:37 WARN mapred.JobClient: Error reading task outputConnection
 refused
 09/07/16 12:39:37 WARN mapred.JobClient: Error reading task outputConnection
 refused
 09/07/16 12:39:43 INFO mapred.JobClient:  map 61% reduce 0%
 09/07/16 12:40:06 INFO mapred.JobClient:  map 64% reduce 0%
 09/07/16 12:40:25 INFO mapred.JobClient:  map 66% reduce 0%
 09/07/16 12:40:27 INFO mapred.JobClient:  map 68% reduce 0%
 09/07/16 12:40:46 INFO mapred.JobClient:  map 70% reduce 0%
 09/07/16 12:40:48 INFO mapred.JobClient:  map 72% reduce 0%
 09/07/16 12:41:06 INFO mapred.JobClient:  map 74% reduce 0%
 09/07/16 12:41:07 INFO mapred.JobClient:  map 75% reduce 0%
 09/07/16 12:41:27 INFO mapred.JobClient:  map 77% reduce 0%
 09/07/16 12:41:28 INFO mapred.JobClient:  map 79% reduce 0%
 09/07/16 12:41:44 INFO mapred.JobClient:  map 81% reduce 0%
 09/07/16 12:41:47 INFO mapred.JobClient:  map 83% reduce 0%
 09/07/16 12:42:03 INFO mapred.JobClient:  map 85% reduce 0%
 09/07/16 12:42:06 INFO mapred.JobClient:  map 87% reduce 0%
 09/07/16 12:42:42 INFO mapred.JobClient:  map 88% reduce 0%
 09/07/16 12:42:45 INFO mapred.JobClient:  map 90% reduce 0%
 09/07/16 12:43:37 INFO mapred.JobClient:  map 92% reduce 0%
 09/07/16 12:43:40 INFO mapred.JobClient:  map 94% reduce 0%
 09/07/16 12:44:30 INFO mapred.JobClient:  map 96% reduce 0%
 09/07/16 12:44:34 INFO mapred.JobClient:  map 98% reduce 0%
 09/07/16 12:45:21 INFO mapred.JobClient:  map 100% reduce 0%
 09/07/16 12:46:27 INFO mapred.JobClient: Task Id :
 attempt_200907161230_0001_m_01_0, Status : FAILED
 Too many fetch-failures
 09/07/16 12:46:27 WARN mapred.JobClient: Error reading task outputConnection
 refused
 09/07/16 12:46:27 WARN mapred.JobClient: Error reading task outputConnection
 refused
 09/07/16