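For #1, HFile replication is handled by HDFS. With the HDFS replication
factor set to 3, bulk loaded HFiles are replicated like any other HDFS file,
so no HFile should be lost once the bulk load completes, even if a region
server goes down.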

For #3, multiple HFiles may be generated per region.

bq. If multiple, does LoadIncrementalHFiles merge these HFiles into one

Bulk load does not merge HFiles; each one is loaded into the region as a
separate store file.

For #4, yes: frequent compactions are likely, because each bulk load adds
another small (~20MB) HFile every 5 seconds.
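
To put rough numbers on this, here is a back-of-the-envelope sketch in
Python. It is not a measurement. The 5 second interval and ~20MB size come
from your question; the other two inputs are assumptions: that each load
lands one HFile in a given store, and that a minor compaction is considered
once a store holds 3 files (please check hbase.hstore.compactionThreshold in
your own configuration). The function name is only illustrative.

```python
# Rough estimate of how quickly small bulk loaded HFiles pile up in one store.
# All inputs are assumptions for illustration, not measured values.

def store_file_pressure(interval_s, files_per_load, min_files, load_size_mb):
    """Return (files added per hour, MB added per hour,
    seconds until a store holds `min_files` new files)."""
    loads_per_hour = 3600 // interval_s
    files_per_hour = loads_per_hour * files_per_load
    mb_per_hour = loads_per_hour * load_size_mb
    # Ceiling division: number of loads needed to reach the file threshold.
    loads_to_threshold = -(-min_files // files_per_load)
    seconds_to_threshold = loads_to_threshold * interval_s
    return files_per_hour, mb_per_hour, seconds_to_threshold


if __name__ == "__main__":
    files, mb, secs = store_file_pressure(
        interval_s=5,       # one bulk load every 5 seconds (from the question)
        files_per_load=1,   # assumption: one HFile per store per load
        min_files=3,        # assumption: compaction considered at 3 store files
        load_size_mb=20,    # ~20MB per load (from the question)
    )
    print(f"{files} new HFiles/hour, {mb} MB/hour, "
          f"file threshold reached every {secs} s")
```

Under those assumptions a store collects 720 new HFiles per hour and reaches
the file threshold about every 15 seconds, which is why loading less often
with larger batches would reduce compaction pressure.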

Cheers

On Tue, Jul 21, 2015 at 7:20 AM, Shushant Arora <[email protected]>
wrote:

> 1. Are bulk loaded HFiles not replicated? Does that mean that if a region
> server goes down, all HFiles that were bulk loaded to that server are lost,
> irrespective of HDFS replication being set to 3? If yes, why are bulk
> loaded HFiles not replicated?
>
> 2. Is there any issue with using a timestamp prefix as the table's row key
> when writing via bulk load?
>
> 3. Does a bulk load MR job that uses HFileOutputFormat2 as its output
> format create a single HFile per region, or can there be multiple HFiles
> per region? If multiple, does LoadIncrementalHFiles merge these HFiles into
> one while loading them to the same region, or does it just do a simple copy?
>
> 4. Is there any performance issue if I run a bulk load every 5 seconds,
> each containing ~20MB of data? Does it cause frequent compactions that
> lead to performance problems?
>
