Not directly answering your question, but note that the KafkaChannel makes #3 very easy: simply use a console consumer to read events from the topic used by Flume and see what was written there. Your end users may even be able to do this for themselves.
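
For illustration, here is a rough sketch of what that console consumer invocation might look like, wrapped in a small Python helper that only builds the command line. The broker address, the topic name "flume-channel", and the helper name console_consumer_cmd are my assumptions, not something taken from your setup; check the topic configured on your channel and the flags supported by your Kafka version. The flags shown are the ones from the stock kafka-console-consumer.sh script in recent Kafka releases (older releases took a ZooKeeper address instead of a bootstrap server).

```python
import shlex


def console_consumer_cmd(topic, bootstrap_server="localhost:9092",
                         max_messages=None,
                         script="kafka-console-consumer.sh"):
    """Build the argv for Kafka's console consumer (hypothetical helper).

    topic and bootstrap_server are placeholders; substitute the topic your
    KafkaChannel is configured with and one of your own brokers.
    """
    cmd = [
        script,
        "--bootstrap-server", bootstrap_server,
        "--topic", topic,
        "--from-beginning",  # start from the oldest retained message
    ]
    if max_messages is not None:
        # Stop after this many messages, counted from where reading starts.
        cmd += ["--max-messages", str(max_messages)]
    return cmd


if __name__ == "__main__":
    # Print a copy-pasteable command; "flume-channel" is an assumed topic name.
    print(shlex.join(console_consumer_cmd("flume-channel", max_messages=10)))
```

Note that this reads forward from the oldest retained message rather than backward from the newest, so it is not a "last N puts" tool by itself. Also, depending on how the channel is configured, the messages may be Avro-wrapped Flume events rather than raw bodies, in which case the console output will contain some binary framing around the payload.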

On Mon, Jun 8, 2015 at 7:30 PM, Robert B Hamilton <[email protected]> wrote:
> Is there anything like a logdump tool for flume file channel?
>
> Specifically I’m looking for some way to extract say the event data for the
> last N puts.
>
> Alternatively can the logs be modified so that the last N (sink) commits
> will be ignored on restart?
>
> The scenario that I’m concerned about is this:
>
> 1. server crashes, flume is restarted once the server is brought back.
> 2. End user sees something odd in his HiveQL and speculates that data
>    was lost.
> 3. We peek into the WAL as they existed just before the restart (we
>    saved off a copy) and either
>    a. Find an event corresponding to the missing data and use that to fix
>       the data in the destination, or
>    b. Prove that the event corresponding to the missing data was not
>       present at least as far back as the logs go
>
> I’m just wondering if there is a tool which makes number 3 possible….
