Flume has a tool that will allow you to run all events in the file channel through a piece of custom code you’d supply:

bin/flume-ng tool FCINTEGRITYTOOL

You can see the arguments you’d need to supply when you execute this command.

Thanks,
Hari Shreedharan

> On Jun 8, 2015, at 7:30 PM, Robert B Hamilton <[email protected]> wrote:
>
> Is there anything like a logdump tool for flume file channel?
> Specifically I’m looking for some way to extract say the event data for
> the last N puts.
> Alternatively can the logs be modified so that the last N (sink) commits will
> be ignored on restart?
>
> The scenario that I’m concerned about is this:
>
> 1. server crashes, flume is restarted once the server is brought back.
> 2. End user sees something odd in his HiveQL and speculates that data
>    was lost.
> 3. We peek into the WAL as they existed just before the restart (we
>    saved off a copy) and either
>    a. Find an event corresponding to the missing data and use that to fix
>       the data in the destination, or
>    b. Prove that the event corresponding to the missing data was not
>       present at least as far back as the logs go
>
> I’m just wondering if there is a tool which makes number 3 possible…
