This is a very common use case and should work well: bring in the data, use
MergeContent (even when it takes a long time to reach the desired bin size),
and send the merged result on.
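For reference, a MergeContent configuration along these lines might look
like the following. The property names are the standard MergeContent
properties; the values are illustrative assumptions only, not a
recommendation for any particular flow:

```properties
# MergeContent processor properties (example values, tune for your flow)
Merge Strategy            = Bin-Packing Algorithm
Merge Format              = Binary Concatenation
Minimum Group Size        = 100 MB      # hold bins until close to an HDFS block
Maximum Group Size        = 128 MB
Maximum Number of Entries = 100000
Max Bin Age               = 6 hours     # flush a bin even if it never fills
Maximum number of Bins    = 10
```

Setting a Max Bin Age is what lets a bin sit for "long periods of time"
safely: a bin that never reaches the minimum size is still flushed
eventually instead of holding FlowFiles forever.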

If you are seeing the content repo fill up, then let's look into that.  How
large is the content repo?  Is it on its own partition?  What are the repo
settings?  When it appears full on disk, how much data does NiFi see as
active in the flow?
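The repo settings in question live in nifi.properties.  A minimal set worth
checking (the values shown are examples, not necessarily your defaults) might
be:

```properties
# nifi.properties - content repository settings to review (example values)
nifi.content.repository.directory.default=/data/nifi/content_repository
nifi.content.repository.archive.enabled=true
nifi.content.repository.archive.max.retention.period=12 hours
nifi.content.repository.archive.max.usage.percentage=50%
```

If archiving is enabled and the repo shares a partition with other data, the
usage percentage is computed against the whole partition, which can make the
disk look far fuller than the data actively queued in the flow.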


On Aug 6, 2017 8:29 AM, "Steve Champagne" <> wrote:

> Hello,
> I'm pulling data from API endpoints every five minutes and putting it into
> HDFS. This, however, is giving me quite a few small files. 288 files per
> day times however many endpoints I am reading. My current approach for
> handling them is to load the small files into some sort of staging
> directory under each of the endpoint directories. I then have list and
> fetch HDFS processors pulling them back into NiFi so that I can merge them
> based on size. This way I can keep the files in HDFS as they are waiting to
> be merged so they can be queried at any time. When they get close to an
> HDFS block size, I then merge them into an archive directory and delete the
> small files that were merged.
> My biggest problem with this is that I have to pull the files into NiFi
> where they might sit for extended periods waiting to be merged. This causes
> issues that I think are related to those brought up in NIFI-3376, where my
> content repository continues to grow unbounded and fills up my disk.
> I was wondering what other patterns people are using for this sort of
> stuff.
> Thanks!
