Hi guys,

I was able to get this to work after using bigger VMs for the data nodes. However, the bigger problem I am facing now is that after my MR job completes successfully, I am not seeing any rows loaded in my table (the count shows 0 both via Phoenix and HBase).
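For reference, the counts I mention were along these lines (MY_TABLE stands in for my actual table name, which isn't shown in this thread):

    -- from the Phoenix sqlline client (sqlline.py)
    SELECT COUNT(*) FROM MY_TABLE;

    # from the HBase shell (hbase shell)
    count 'MY_TABLE'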
Am I missing something simple?

Thanks,
Gaurav

On 12 September 2015 at 11:16, Gabriel Reid <gabriel.r...@gmail.com> wrote:
> Around 1400 mappers sounds about normal to me -- I assume your block
> size on HDFS is 128 MB, which works out to 1500 mappers for 200 GB of
> input.
>
> To add to what Krishna asked, can you be a bit more specific on what
> you're seeing (in log files or elsewhere) which leads you to believe
> the data nodes are running out of capacity? Are map tasks failing?
>
> If this is indeed a capacity issue, one thing you should ensure is
> that map output compression is enabled. This doc from Cloudera
> explains this (and the same information applies whether you're using
> CDH or not):
> http://www.cloudera.com/content/cloudera/en/documentation/cdh4/latest/CDH4-Installation-Guide/cdh4ig_topic_23_3.html
>
> In any case, apart from that there isn't any basic thing that you're
> probably missing, so any additional information that you can supply
> about what you're running into would be useful.
>
> - Gabriel
>
>
> On Sat, Sep 12, 2015 at 2:17 AM, Krishna <research...@gmail.com> wrote:
> > 1400 mappers on 9 nodes is about 155 mappers per datanode, which
> > sounds high to me. There are very few specifics in your mail. Are
> > you using YARN? Can you provide details like table structure, # of
> > rows & columns, etc.? Do you have an error stack?
> >
> >
> > On Friday, September 11, 2015, Gaurav Kanade <gaurav.kan...@gmail.com>
> > wrote:
> >>
> >> Hi All
> >>
> >> I am new to Apache Phoenix (and relatively new to MR in general) but
> >> I am trying a bulk insert of a 200GB tab-separated file into an HBase
> >> table. This seems to start off fine and kicks off about 1400 mappers
> >> and 9 reducers (I have 9 data nodes in my setup).
> >>
> >> At some point I seem to be running into problems with this process
> >> as it seems the data nodes run out of capacity (from what I can see
> >> my data nodes have 400GB local space). It does seem that certain
> >> reducers eat up most of the capacity on these - thus slowing down the
> >> process to a crawl and ultimately leading to Node Managers
> >> complaining that Node Health is bad (log-dirs and local-dirs are
> >> bad).
> >>
> >> Is there some inherent setting I am missing that I need to set up
> >> for the particular job?
> >>
> >> Any pointers would be appreciated.
> >>
> >> Thanks
> >>
> >> --
> >> Gaurav Kanade,
> >> Software Engineer
> >> Big Data
> >> Cloud and Enterprise Division
> >> Microsoft

--
Gaurav Kanade,
Software Engineer
Big Data
Cloud and Enterprise Division
Microsoft
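(A note on the map output compression suggestion in the quoted reply above: assuming the load is driven by Phoenix's CsvBulkLoadTool -- the exact command used here isn't shown in the thread -- the standard MRv2 compression properties can be passed per job roughly as sketched below. The jar name, table name, input path, and ZooKeeper quorum are placeholders, and SnappyCodec is just one possible codec.)

    # $'\t' is bash syntax for a literal tab delimiter;
    # delimiter handling may differ between Phoenix versions.
    hadoop jar phoenix-<version>-client.jar org.apache.phoenix.mapreduce.CsvBulkLoadTool \
        -Dmapreduce.map.output.compress=true \
        -Dmapreduce.map.output.compress.codec=org.apache.hadoop.io.compress.SnappyCodec \
        --table MY_TABLE \
        --input /path/to/input.tsv \
        --delimiter $'\t' \
        --zookeeper zk1,zk2,zk3:2181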