Hi Hari,
Thank you for replying to my question. You are absolutely right: I was using only one channel for both sinks, which was causing the problem. Thanks for pointing that out; one problem is solved. For the spooling directory source, I am processing the files directly using my own custom interceptor. Here is the config for the source:

dnAgent.sources.gpslog.type = spooldir
dnAgent.sources.gpslog.spoolDir = /home/ktspool
dnAgent.sources.gpslog.batchSize = 500
dnAgent.sources.gpslog.channels = MemChannel
dnAgent.sources.gpslog.fileHeader = true
dnAgent.sources.gpslog.deletePolicy = immediate
dnAgent.sources.gpslog.useStrictSpooledFilePolicies = false
dnAgent.sources.gpslog.interceptors = KTFlowProcessInterceptor
dnAgent.sources.gpslog.interceptors.KTFlowProcessInterceptor.type = com.souvikbose.flume.interceptors.KTFlowProcessInterceptor$Builder
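
In case it helps to see how the interceptor plugs into that config, here is a minimal sketch of its shape. The class and package names come from the config above; the method bodies and the parsing logic are placeholders, not the real implementation:

package com.souvikbose.flume.interceptors;

import java.util.ArrayList;
import java.util.List;

import org.apache.flume.Context;
import org.apache.flume.Event;
import org.apache.flume.interceptor.Interceptor;

public class KTFlowProcessInterceptor implements Interceptor {

  @Override
  public void initialize() {
    // nothing to set up
  }

  @Override
  public Event intercept(Event event) {
    // fileHeader = true puts the source file's path in the "file" header;
    // the real implementation parses the file name and replaces the body
    // with the extracted fields as JSON (placeholder below)
    String path = event.getHeaders().get("file");
    if (path == null) {
      return null; // drop events we cannot attribute to a file
    }
    // ... parse the file name and build the JSON payload here ...
    return event;
  }

  @Override
  public List<Event> intercept(List<Event> events) {
    List<Event> out = new ArrayList<Event>(events.size());
    for (Event e : events) {
      Event intercepted = intercept(e);
      if (intercepted != null) {
        out.add(intercepted); // returning null drops the event
      }
    }
    return out;
  }

  @Override
  public void close() {
  }

  public static class Builder implements Interceptor.Builder {
    @Override
    public Interceptor build() {
      return new KTFlowProcessInterceptor();
    }

    @Override
    public void configure(Context context) {
      // no interceptor parameters in the config above
    }
  }
}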

Generally this works great when everything is okay. The problem is that the GPS provider doesn't have full control over what comes in, so sometimes a blank file of 0 bytes arrives, which causes Flume to stop processing with an exception, and I have to restart Flume manually.
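
One possible stopgap (only a sketch, not something I run in production, and it has an obvious race if Flume scans the file first) would be a small sweeper that deletes empty files before the spooldir source opens them. The path comes from the config above:

import java.io.File;

public class ZeroByteSweeper {
  public static void main(String[] args) throws InterruptedException {
    File spoolDir = new File("/home/ktspool"); // spoolDir from the config above
    while (true) {
      File[] files = spoolDir.listFiles();
      if (files != null) {
        for (File f : files) {
          // delete 0-byte files so the spooldir source never opens them
          if (f.isFile() && f.length() == 0) {
            f.delete();
          }
        }
      }
      Thread.sleep(1000); // sweep once per second
    }
  }
}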

P.S.: I am using Flume 1.4.0 on CDH 4.4.0 across 4 data nodes in EC2.

Thanks & Regards,
Souvik
On 12/8/2014 11:36 PM, Hari Shreedharan wrote:
You are likely reading from the same channel for both sinks. That means only one sink gets your data. You'd need two channels connected to the same source, with each sink getting its own channel.
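
For example (a sketch with generic names; the default replicating channel selector puts a copy of every event on both channels, so each sink gets its own copy):

agent.channels = c1 c2
agent.sources.src1.channels = c1 c2
agent.sinks.hdfsSink.channel = c1
agent.sinks.solrSink.channel = c2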

About the Spool Dir not processing data, what format/serializer etc are you using?

Thanks,
Hari


On Mon, Dec 8, 2014 at 3:37 AM, Souvik Bose <[email protected]> wrote:

    Hello All,
    I am stuck with a problem with Flume version 1.4.0. I am using a
    spooling directory source with a custom interceptor to process
    encoded GPS files and save them in HDFS and Solr (using the
    Morphline Solr sink). The main information is stored in the file
    name itself, which comes in on the spool directory; the content is
    irrelevant. So I am using the custom interceptor to extract and
    transform the file header and store the extracted data in JSON
    format as the output of the event.
    My problem comes in three parts:

    1. When a 0-byte file comes in (files generally come in with a "!"
    symbol in the content), Flume stops and throws an exception. We
    don't need the content of the file in any case, but we still hit
    the exception because Flume cannot handle 0-byte files.
    2. When the content contains some weird characters like !ƒ!, Flume
    stops with an exception.
    3. Even when everything is running fine, I am losing some data/
    events. On closer inspection I found that some are available in
    HDFS but not in Solr, and vice versa. I am not using any sink
    groups with failover or load-balancing sink processors. Is it
    because of that?

    I want a solution where I can handle any exception, discard the
    file/data that causes it, and have Flume move on to the next file
    in the spool directory. The data comes in at high velocity, about
    100 files every second, so manually deleting the offending file
    and restarting Flume is my regular practice to get everything back
    on track. But I am sure there must be better ways to handle this
    case. Can you please suggest some better alternatives to my
    approach?

    Thanks & Regards,
    Souvik Bose



--
Met vriendelijke groeten / Mit freundlichen Grüßen / With kind regards,



Delgence | Delivering Intelligence
Delivering high quality IT solutions.

*Souvik Bose*
CIO

Development Office:
Rishi Tech Park Office No. E -3, Premises No. 02-360 Street No. 360 New Town Rajarhat
Kolkata-700156. India

Europe Office:
Liessentstraat 9a, 5405 AH  Uden
The Netherlands

*T* +91 9831607354 | *T* +31 616392268 |
*E* [email protected] | *W* www.delgence.com

/This communication and any attachments hereto may contain confidential information. Unauthorized use or disclosure to additional parties is prohibited. If you are not an intended recipient, kindly notify the sender and destroy all copies in your possession./
