We have a very large body of CSV files (well over 1TB) that need to be imported into HBase. For a single 20GB segment, we're looking at pushing easily 100M flowfiles into HBase, and most of the JSON records we generate from the CSV rows are rather small (roughly 20-250 bytes).
It's going very slowly, and I assume that's because the content and provenance repositories are taxing the disk heavily with that many tiny flowfiles. So I'm wondering if anyone has a suggestion for a good NiFiesque way of solving this. Right now, I'm considering two options:

1. Finding a way to inject the HBase controller service into an ExecuteScript processor, so I can handle the data in large chunks myself: splitting the text and building a List<Put> inside the processor, then issuing one huge batched put (a rough sketch of what I mean is below).
2. Writing a library that lets me generate HFiles from within an ExecuteScript processor.

What I really need is something fast within NiFi that would let me generate huge blocks of updates for HBase and push them out. Any ideas?
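To make option 1 concrete, here's roughly what I have in mind, written as plain Java against the standard hbase-client API (untested sketch; the table name, column family, qualifier, and the CSV parsing are all placeholders, and it assumes a reachable cluster with hbase-site.xml on the classpath — inside ExecuteScript this would be the same calls from Groovy):

    import java.io.IOException;
    import java.nio.charset.StandardCharsets;
    import java.util.ArrayList;
    import java.util.List;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.BufferedMutator;
    import org.apache.hadoop.hbase.client.Connection;
    import org.apache.hadoop.hbase.client.ConnectionFactory;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.util.Bytes;

    public class BatchedHBaseWriter {

        // Placeholder names -- substitute the real table, family, qualifier.
        private static final TableName TABLE = TableName.valueOf("events");
        private static final byte[] CF = Bytes.toBytes("d");
        private static final byte[] QUAL = Bytes.toBytes("json");

        public static void main(String[] args) throws IOException {
            Configuration conf = HBaseConfiguration.create();
            try (Connection conn = ConnectionFactory.createConnection(conf);
                 BufferedMutator mutator = conn.getBufferedMutator(TABLE)) {
                List<Put> batch = new ArrayList<>(10_000);
                // In the real flow, each line of the 20GB segment would be
                // split and converted here instead of this toy input.
                for (String line : new String[] {"row1,{\"a\":1}", "row2,{\"b\":2}"}) {
                    String[] parts = line.split(",", 2);  // rowkey, JSON payload
                    Put put = new Put(Bytes.toBytes(parts[0]));
                    put.addColumn(CF, QUAL, parts[1].getBytes(StandardCharsets.UTF_8));
                    batch.add(put);
                    if (batch.size() >= 10_000) {  // push in big chunks
                        mutator.mutate(batch);
                        batch.clear();
                    }
                }
                mutator.mutate(batch);  // final partial batch
                mutator.flush();
            }
        }
    }

As far as I can tell, BufferedMutator is the right tool here since it buffers and batches mutations to the region servers instead of doing one RPC per flowfile, which is exactly the overhead I'm trying to avoid.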
Thanks,
Mike