Hi,
I am moving logs from a local machine to an HDFS server using Flume with a
spooling directory source. Each log file contains lakhs (hundreds of thousands)
of lines.
My use case is as follows.
Log file names follow the pattern foldername-filename-timestamp.suffix; an
example file name is
LogFiles-Log1-1463238298.log
My configuration is below:
a1.sinks = k1
a1.channels = c1

#the source
a1.sources.r1.type = spooldir
a1.sources.r1.spoolDir = F:\\SpoolingDirectory
a1.sources.r1.deletePolicy = immediate
a1.sources.r1.fileHeader = true
a1.sources.r1.interceptors = i1
a1.sources.r1.interceptors.i1.type = com.company.CustomInterceptor.CustomInterceptor$Builder

#the sink
a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.fileType = DataStream
a1.sinks.k1.hdfs.fileSuffix = .txt
a1.sinks.k1.hdfs.path = hdfs://localhost:9000/spoolingdirectory/{foldername}

#Channel
a1.channels.c1.type = memory
a1.channels.c1.capacity = 10000
a1.channels.c1.transactionCapacity = 1000

#Flow
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
In the custom interceptor we process each event here, extract the folder name
from the file name, and add it as the {foldername} header, which is used in the
HDFS path. The problem we are facing is that, for a single file with lakhs of
lines, the interceptor extracts the same folder name lakhs of times, which
leads to very high performance degradation.
Is there any way to handle my case without computing the same file header for
every one of those lines?
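One idea I am considering is to remember the folder name for the last file seen, so the extraction runs once per file instead of once per line. Below is a minimal sketch of that idea only, not our actual interceptor. The class name FolderNameCache and its methods are hypothetical, and it assumes the interceptor reads the absolute path from the "file" header (set by the source because fileHeader = true) and that the spooling source hands over the lines of one file consecutively.

```java
// Hypothetical helper (not part of Flume): caches the folder name derived
// from the most recently seen file path, so repeated events from the same
// file skip the parsing step.
final class FolderNameCache {
    private String lastPath;        // path seen on the previous call
    private String lastFolderName;  // folder name parsed from lastPath
    private int parseCount;         // number of real parses, for inspection only

    // Returns the folder name for filePath, parsing only when the path changes.
    // filePath must not be null.
    String folderNameFor(String filePath) {
        if (!filePath.equals(lastPath)) {
            lastFolderName = parseFolderName(filePath);
            lastPath = filePath;
        }
        return lastFolderName;
    }

    // Extracts "foldername" from ".../foldername-filename-timestamp.suffix".
    private String parseFolderName(String filePath) {
        parseCount++;
        // Accept both Windows and Unix separators.
        int sep = Math.max(filePath.lastIndexOf('/'), filePath.lastIndexOf('\\'));
        String baseName = filePath.substring(sep + 1);
        int dash = baseName.indexOf('-');
        // Assumption: if there is no '-', fall back to the whole base name.
        return dash < 0 ? baseName : baseName.substring(0, dash);
    }

    int getParseCount() {
        return parseCount;
    }
}
```

Inside the interceptor's intercept(Event) method, the idea would be to look up the "file" header, call folderNameFor on it, and put the result into the event headers under "foldername". This only removes the repeated parsing; the header is still set on every event, since the HDFS sink resolves the path per event.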
Thanks.