Will Flume support Avro Schema Registry?
I have table data arriving in a Kafka topic as Avro. Flume reads from the Kafka 
topic and writes to HDFS as Parquet, and Hive external tables are created on top 
of the files afterwards. I can see that the data in HDFS is only ever written 
with a .tmp extension and never becomes a proper file; it looks as though the 
Flume sink never calls close on the file. These are the configurations I am using:

agent.channels = c1
agent.sources = s1
agent.sinks = k1

agent.channels.c1.type = memory
agent.channels.c1.capacity = 10000000
agent.channels.c1.transactionCapacity = 1000

agent.sources.s1.type = org.apache.flume.source.kafka.KafkaSource
agent.sources.s1.channels = c1
agent.sources.s1.kafka.bootstrap.servers = XXXX:1025,XXXX:1025,XXXX:1025
agent.sources.s1.kafka.topics=t10

agent.sinks.k1.channel = c1
agent.sinks.k1.type = org.apache.flume.sink.kite.DatasetSink
agent.sinks.k1.kite.dataset.uri = dataset:hdfs://namenodeHA/kite/avro_income_band_1
#agent.sinks.k1.kite.dataset.uri = dataset:hive:hdfs://namenodeHA/kite/avro_income_bandt1/schema_parquet/income_band
agent.sinks.k1.serializer = org.apache.flume.sink.hdfs.AvroEventSerializer$Builder
agent.sinks.k1.hdfs.filePrefix = parquetdata
agent.sinks.k1.hdfs.fileSuffix = .parquet
agent.sinks.k1.hdfs.fileType=DataStream
agent.sinks.k1.serializer.class = kafka.serializer.DefaultEncoder
agent.sinks.k1.kafka.message.coder.schema.registry.class = com.linkedin.camus.schemaregistry.AvroRestSchemaRegistry
agent.sinks.k1.schema.registry.url = http://schema_registry:8081/subjects/income_band/versions/latest
agent.sinks.k1.kite.batchSize = 2
agent.sinks.k1.kite.rollInterval = 30
agent.sinks.k1.kite.flushable.commitOnBatch = true

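In case it helps to narrow this down, my understanding from the Flume docs (an 
assumption on my part, please correct me) is that the Kite Dataset sink ignores 
the hdfs.* and serializer properties above, takes the target schema from the 
dataset itself, and expects each event body to be an Avro datum whose writer 
schema is referenced in a flume.avro.schema.url or flume.avro.schema.literal 
header. A stripped-down version of my sink along those lines would look roughly 
like this (a sketch only; the .avsc path is a placeholder, not my real path):

agent.sources.s1.interceptors = i1
agent.sources.s1.interceptors.i1.type = static
agent.sources.s1.interceptors.i1.key = flume.avro.schema.url
# Placeholder path -- the writer schema the Kafka events were encoded with.
agent.sources.s1.interceptors.i1.value = hdfs://namenodeHA/schemas/income_band.avsc

agent.sinks.k1.channel = c1
agent.sinks.k1.type = org.apache.flume.sink.kite.DatasetSink
agent.sinks.k1.kite.dataset.uri = dataset:hdfs://namenodeHA/kite/avro_income_band_1
agent.sinks.k1.kite.batchSize = 100
agent.sinks.k1.kite.rollInterval = 30

If kite.flushable.commitOnBatch only applies to flushable (Avro) datasets, then 
with a Parquet dataset the .parquet.tmp file would only be renamed when 
kite.rollInterval expires or the sink shuts down cleanly, which might be why I 
never see a finished file.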

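Alternatively, if the schema-registry lookup has to happen outside Flume anyway, 
would the plain HDFS sink be the simpler route? Its roll settings control 
explicitly when the .tmp file is closed and renamed. Something like the sketch 
below is what I have in mind (values are guesses, and note it would write Avro 
container files rather than Parquet, so the Parquet conversion would have to 
move into Hive):

agent.sinks.k1.channel = c1
agent.sinks.k1.type = hdfs
agent.sinks.k1.hdfs.path = hdfs://namenodeHA/kite/avro_income_band_1
agent.sinks.k1.hdfs.filePrefix = incomeband
agent.sinks.k1.hdfs.fileType = DataStream
agent.sinks.k1.serializer = org.apache.flume.sink.hdfs.AvroEventSerializer$Builder
# Close and rename the .tmp file after 30 s, 10000 events, or 128 MB,
# whichever comes first (0 disables a trigger).
agent.sinks.k1.hdfs.rollInterval = 30
agent.sinks.k1.hdfs.rollCount = 10000
agent.sinks.k1.hdfs.rollSize = 134217728
# Also close files left idle, so a quiet topic does not pin a .tmp open.
agent.sinks.k1.hdfs.idleTimeout = 60

The AvroEventSerializer would need the same flume.avro.schema.url header set by 
the interceptor shown above.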

