Sure, attached below are the job counter values. I checked the final status of the job and it said succeeded. I could not see whether the import tool itself exited cleanly, because I ran it overnight and my machine rebooted at some point to install updates. I wonder if there is some post-processing after the MR job that might have failed because of this?
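One possibility, assuming my reading of the tool is right (I have not verified this in the source): the bulk load client performs a final step after the MR job succeeds, handing the HFiles written by the reducers over to the HBase region servers. If the reboot killed the client process before that handoff, the job history would show success while the table stays empty. If so, I believe something like the following would retry just the handoff by hand (the output directory below is a placeholder for whatever output path my run actually used):

    # retry only the HFile handoff; /tmp/bulkload-output and MYTABLE are placeholders
    hbase org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles \
        /tmp/bulkload-output MYTABLE

Does that sound plausible?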
Thanks for the help!

----------------
Counters for job_1442389862209_0002

File System Counters (Map / Reduce / Total)
  FILE: Number of bytes read: 1520770904675 / 2604849340144 / 4125620244819
  FILE: Number of bytes written: 3031784709196 / 2616689890216 / 5648474599412
  FILE: Number of large read operations: 0 / 0 / 0
  FILE: Number of read operations: 0 / 0 / 0
  FILE: Number of write operations: 0 / 0 / 0
  WASB: Number of bytes read: 186405294283 / 0 / 186405294283
  WASB: Number of bytes written: 0 / 363027342839 / 363027342839
  WASB: Number of large read operations: 0 / 0 / 0
  WASB: Number of read operations: 0 / 0 / 0
  WASB: Number of write operations: 0 / 0 / 0

Job Counters (Map / Reduce / Total)
  Launched map tasks: 0 / 0 / 348
  Launched reduce tasks: 0 / 0 / 9
  Rack-local map tasks: 0 / 0 / 348
  Total megabyte-seconds taken by all map tasks: 0 / 0 / 460560315648
  Total megabyte-seconds taken by all reduce tasks: 0 / 0 / 158604449280
  Total time spent by all map tasks (ms): 0 / 0 / 599687911
  Total time spent by all maps in occupied slots (ms): 0 / 0 / 599687911
  Total time spent by all reduce tasks (ms): 0 / 0 / 103258105
  Total time spent by all reduces in occupied slots (ms): 0 / 0 / 206516210
  Total vcore-seconds taken by all map tasks: 0 / 0 / 599687911
  Total vcore-seconds taken by all reduce tasks: 0 / 0 / 103258105

Map-Reduce Framework (Map / Reduce / Total)
  Combine input records: 0 / 0 / 0
  Combine output records: 0 / 0 / 0
  CPU time spent (ms): 162773540 / 90154160 / 252927700
  Failed Shuffles: 0 / 0 / 0
  GC time elapsed (ms): 7667781 / 1607188 / 9274969
  Input split bytes: 52548 / 0 / 52548
  Map input records: 861890673 / 0 / 861890673
  Map output bytes: 1488284643774 / 0 / 1488284643774
  Map output materialized bytes: 1515865164102 / 0 / 1515865164102
  Map output records: 13790250768 / 0 / 13790250768
  Merged Map outputs: 0 / 3132 / 3132
  Physical memory (bytes) snapshot: 192242380800 / 4546826240 / 196789207040
  Reduce input groups: 0 / 861890673 / 861890673
  Reduce input records: 0 / 13790250768 / 13790250768
  Reduce output records: 0 / 13790250768 / 13790250768
  Reduce shuffle bytes: 0 / 1515865164102 / 1515865164102
  Shuffled Maps: 0 / 3132 / 3132
  Spilled Records: 27580501536 / 23694179168 / 51274680704
  Total committed heap usage (bytes): 186401685504 / 3023044608 / 189424730112
  Virtual memory (bytes) snapshot: 537370951680 / 19158048768 / 556529000448

Phoenix MapReduce Import (Map / Reduce / Total)
  Upserts Done: 861890673 / 0 / 861890673

Shuffle Errors (Map / Reduce / Total)
  BAD_ID: 0 / 0 / 0
  CONNECTION: 0 / 0 / 0
  IO_ERROR: 0 / 0 / 0
  WRONG_LENGTH: 0 / 0 / 0
  WRONG_MAP: 0 / 0 / 0
  WRONG_REDUCE: 0 / 0 / 0

File Input Format Counters (Map / Reduce / Total)
  Bytes Read: 186395934997 / 0 / 186395934997

File Output Format Counters (Map / Reduce / Total)
  Bytes Written: 0 / 363027342839 / 363027342839

On 16 September 2015 at 11:46, Gabriel Reid <gabriel.r...@gmail.com> wrote:
> Can you view (and post) the job counter values from the import job?
> These should be visible in the job history server.
>
> Also, did you see the import tool exit successfully (in the terminal
> where you started it)?
>
> - Gabriel
>
> On Wed, Sep 16, 2015 at 6:24 PM, Gaurav Kanade <gaurav.kan...@gmail.com> wrote:
> > Hi guys
> >
> > I was able to get this to work after using bigger VMs for the data
> > nodes; however, the bigger problem I am now facing is that after my MR
> > job completes successfully, I am not seeing any rows loaded into my
> > table (count shows 0 both via Phoenix and HBase).
> >
> > Am I missing something simple?
> >
> > Thanks
> > Gaurav
> >
> > On 12 September 2015 at 11:16, Gabriel Reid <gabriel.r...@gmail.com> wrote:
> >>
> >> Around 1400 mappers sounds about normal to me -- I assume your block
> >> size on HDFS is 128 MB, which works out to roughly 1500 mappers for
> >> 200 GB of input.
> >>
> >> To add to what Krishna asked, can you be a bit more specific about
> >> what you're seeing (in log files or elsewhere) that leads you to
> >> believe the data nodes are running out of capacity? Are map tasks
> >> failing?
> >>
> >> If this is indeed a capacity issue, one thing you should ensure is
> >> that map output compression is enabled.
> >> This doc from Cloudera explains this (and the same information
> >> applies whether you're using CDH or not):
> >> http://www.cloudera.com/content/cloudera/en/documentation/cdh4/latest/CDH4-Installation-Guide/cdh4ig_topic_23_3.html
> >>
> >> In any case, apart from that, there isn't any basic thing that
> >> you're probably missing, so any additional information you can
> >> supply about what you're running into would be useful.
> >>
> >> - Gabriel
> >>
> >> On Sat, Sep 12, 2015 at 2:17 AM, Krishna <research...@gmail.com> wrote:
> >> > 1400 mappers on 9 nodes is about 155 mappers per datanode, which
> >> > sounds high to me. There are very few specifics in your mail. Are
> >> > you using YARN? Can you provide details like table structure, # of
> >> > rows & columns, etc.? Do you have an error stack?
> >> >
> >> > On Friday, September 11, 2015, Gaurav Kanade <gaurav.kan...@gmail.com> wrote:
> >> >>
> >> >> Hi All
> >> >>
> >> >> I am new to Apache Phoenix (and relatively new to MR in general),
> >> >> but I am trying a bulk insert of a 200 GB tab-separated file into
> >> >> an HBase table. This seems to start off fine and kicks off roughly
> >> >> 1400 mappers and 9 reducers (I have 9 data nodes in my setup).
> >> >>
> >> >> At some point I seem to run into problems with this process, as
> >> >> the data nodes appear to run out of capacity (from what I can see,
> >> >> my data nodes have 400 GB of local space). Certain reducers eat up
> >> >> most of the capacity on these nodes, slowing the process to a
> >> >> crawl and ultimately leading to the Node Managers complaining that
> >> >> node health is bad (log-dirs and local-dirs are bad).
> >> >>
> >> >> Is there some inherent setting I am missing that I need to set for
> >> >> this particular job?
> >> >>
> >> >> Any pointers would be appreciated.
> >> >>
> >> >> Thanks
> >> >>
> >> >> --
> >> >> Gaurav Kanade,
> >> >> Software Engineer
> >> >> Big Data
> >> >> Cloud and Enterprise Division
> >> >> Microsoft
> >
> >
> > --
> > Gaurav Kanade,
> > Software Engineer
> > Big Data
> > Cloud and Enterprise Division
> > Microsoft

--
Gaurav Kanade,
Software Engineer
Big Data
Cloud and Enterprise Division
Microsoft
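P.S. On the earlier map output compression suggestion: my understanding is that it can be enabled per job with -D flags, without touching the cluster config. A sketch of the invocation I have in mind for the next run; the jar name, table, and input path are placeholders for my actual values, I am assuming the Hadoop 2 property names here, and Snappy is only an option if the native libraries are available on the cluster:

    # -D flags must come before the tool's own options;
    # phoenix-<version>-client.jar, MYTABLE, and /data/input.tsv are placeholders
    hadoop jar phoenix-<version>-client.jar org.apache.phoenix.mapreduce.CsvBulkLoadTool \
        -Dmapreduce.map.output.compress=true \
        -Dmapreduce.map.output.compress.codec=org.apache.hadoop.io.compress.SnappyCodec \
        --table MYTABLE --input /data/input.tsv --delimiter $'\t'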