For #1, bulk loaded HFiles are stored in HDFS like any other HFile, so with HDFS replication set to 3 their replication is handled by HDFS. A region server going down should not cause HFile loss once the bulk load has completed.
For #3, multiple HFiles may be generated per region (for example, one per column family).

bq. If multiple, does LoadIncrementalHFiles merge these HFiles into 1

There is no merging of HFiles in bulk load: each HFile is added to the region as a separate store file.

For #4, frequent compactions are likely, since every load adds at least one new, small HFile to each store it touches.

Cheers

On Tue, Jul 21, 2015 at 7:20 AM, Shushant Arora <[email protected]> wrote:
> 1. Does a bulk loaded HFile not get replicated? Does it mean that if a RegionServer
> goes down, all HFiles which were bulk loaded to this server are lost,
> irrespective of HDFS replication being set to 3? If yes, why are bulk loaded
> HFiles not replicated?
>
> 2. Is there any issue with using a timestamp prefix as the row key of a table
> and using bulk load for writing?
>
> 3. Will a bulk load MR job using HFileOutputFormat2 as the output format
> create a single HFile per region, or can it create multiple HFiles per region?
> If multiple, does LoadIncrementalHFiles merge these HFiles into 1 while
> loading them to the same region, or does it just do a simple copy?
>
> 4. Is there any performance issue if I run bulk load every 5 sec,
> each load containing ~20MB of data? Does it create frequent compactions that
> lead to performance issues?
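For #4, a rough back-of-the-envelope estimate makes the compaction point concrete. This is a sketch under stated assumptions, not a measurement: it takes the 5 second interval and ~20MB per load from the question, assumes every load writes at least one HFile into a given store, and assumes hbase.hstore.compactionThreshold is at its default of 3 (check your own hbase-site.xml). The class and method names are hypothetical.

```java
// Back-of-the-envelope estimate for question #4 (hypothetical helper class).
// Assumptions, not measurements: one bulk load every 5 seconds, ~20 MB per load,
// every load writes at least one new HFile into the store being considered.
public class BulkLoadEstimate {
    // Inputs taken from the question.
    static final int LOAD_INTERVAL_SECONDS = 5;
    static final int MB_PER_LOAD = 20;
    // Assumption: hbase.hstore.compactionThreshold left at its default of 3.
    static final int COMPACTION_THRESHOLD = 3;

    // Number of bulk loads in one hour.
    static int loadsPerHour() {
        return 3600 / LOAD_INTERVAL_SECONDS;
    }

    // Total data bulk loaded in one hour, in MB.
    static int megabytesPerHour() {
        return loadsPerHour() * MB_PER_LOAD;
    }

    // Each load adds at least one file to the store, so even a store that
    // starts empty holds COMPACTION_THRESHOLD files after this many seconds.
    static int maxSecondsUntilCompactionEligible() {
        return COMPACTION_THRESHOLD * LOAD_INTERVAL_SECONDS;
    }

    public static void main(String[] args) {
        System.out.println("Bulk loads per hour: " + loadsPerHour());
        System.out.println("Data loaded per hour (MB): " + megabytesPerHour());
        System.out.println("New HFiles per store per hour (at least): " + loadsPerHour());
        System.out.println("Seconds until a store is compaction-eligible (at most): "
                + maxSecondsUntilCompactionEligible());
    }
}
```

Running it prints 720 loads and 14400 MB per hour, and shows a store can reach the compaction threshold within 15 seconds. Actual compaction selection depends on the configured compaction policy, so treat these numbers as an order-of-magnitude guide only.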
