Re: Merging Small Files

2017-08-06 Thread Joe Witt
Steve, it is a very common case and should work very well: bring in the data, use MergeContent (even if it takes a long time to reach the desired bin size), then send the result somewhere. If you are seeing the content repo fill up, then let's look into that. How large is the content repo? Is it on its own pa

Merging Small Files

2017-08-06 Thread Steve Champagne
Hello, I'm pulling data from API endpoints every five minutes and putting it into HDFS. This, however, is giving me quite a few small files: 288 files per day, times however many endpoints I am reading. My current approach for handling them is to load the small files into some sort of staging direc
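
For reference, the 288 figure above follows from one file per poll at a five-minute interval (24 * 60 / 5). A minimal Python sketch of that arithmetic is below; the function name `files_per_day` and the endpoint count of 10 are hypothetical, chosen only for illustration, since the thread does not say how many endpoints are read.

```python
MINUTES_PER_DAY = 24 * 60


def files_per_day(poll_interval_minutes: int, endpoint_count: int = 1) -> int:
    """Estimate files written per day, assuming one file per poll per endpoint."""
    if poll_interval_minutes <= 0:
        raise ValueError("poll_interval_minutes must be positive")
    if endpoint_count < 0:
        raise ValueError("endpoint_count must not be negative")
    # Whole polls that fit in a day at the given interval.
    polls_per_day = MINUTES_PER_DAY // poll_interval_minutes
    return polls_per_day * endpoint_count


# One endpoint polled every five minutes, as described in the message.
print(files_per_day(5))  # 288

# Hypothetical: ten endpoints at the same interval.
print(files_per_day(5, endpoint_count=10))  # 2880
```

The count grows linearly with the number of endpoints, which is why the small-file total climbs quickly on HDFS.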