How spark writes to HBASE

vignesh Mon, 22 Jan 2018 07:19:37 -0800

Hi,

I have a Spark job that reads some time-series data and pushes it to
HBase using the HBase client API. I am running this job on a 10-node
cluster. Say that when Spark starts, it picks machine1, machine2, and
machine3 as its executors. Below is my understanding of what happens
when the job inserts a row into HBase.


Based on the row key, a particular region is chosen (looked up from
META), and the row is pushed to that RegionServer's memstore and WAL.
Once the memstore is full, it is flushed to disk. Now assume a
particular row is being processed by an executor on machine2, and the
RegionServer that serves the region the put targets is on machine6.
Will the data be transferred from machine2 to machine6 over the
network and then stored in the memstore on machine6? Or will Spark
wisely launch an executor on machine6 during the write (if dynamic
allocation is turned on) and push the data from there?
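
To make the scenario concrete, here is a toy Python model of the lookup
as I understand it. It is only a sketch, not HBase code: the region
start keys, the host names other than machine2 and machine6, and the
helper names locate_region_host and is_remote_write are all made up
for illustration.

```python
from bisect import bisect_right

# Hypothetical region table: (start_key, host), sorted by start key.
# In a real cluster this mapping comes from META; these values are invented.
REGIONS = [
    ("", "machine4"),
    ("g", "machine6"),
    ("p", "machine9"),
]
START_KEYS = [start for start, _ in REGIONS]


def locate_region_host(row_key: str) -> str:
    """Return the host of the region whose key range covers row_key.

    A region covers keys from its start key (inclusive) up to the next
    region's start key (exclusive).
    """
    idx = bisect_right(START_KEYS, row_key) - 1
    return REGIONS[idx][1]


def is_remote_write(executor_host: str, row_key: str) -> bool:
    """True if the put would have to leave the executor's machine."""
    return locate_region_host(row_key) != executor_host
```

With this table, a row key such as "k" maps to machine6, so
is_remote_write("machine2", "k") is true. That is the case I am asking
about.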


-- 
I.VIGNESH
