Does Flume support the Avro Schema Registry? I have table data coming into a Kafka topic as Avro. Flume reads from the Kafka topic and writes to HDFS as Parquet, and Hive external tables are created on top of that data later. I can see that the files in HDFS only ever have a .tmp extension and a proper file is never produced. For some reason the Flume sink does not seem to be calling close on the file. Following are the configurations I am using.
agent.channels = c1
agent.sources = s1
agent.sinks = k1

agent.channels.c1.type = memory
agent.channels.c1.capacity = 10000000
agent.channels.c1.transactionCapacity = 1000

agent.sources.s1.type = org.apache.flume.source.kafka.KafkaSource
agent.sources.s1.channels = c1
agent.sources.s1.kafka.bootstrap.servers = XXXX:1025 XXXX:1025 XXXX:1025
agent.sources.s1.kafka.topics = t10

agent.sinks.k1.channel = c1
agent.sinks.k1.type = org.apache.flume.sink.kite.DatasetSink
agent.sinks.k1.kite.dataset.uri = dataset:hdfs://namenodeHA/kite/avro_income_band_1
#agent.sinks.k1.kite.dataset.uri = dataset:hive:hdfs://namenodeHA/kite/avro_income_bandt1/schema_parquet/income_band
agent.sinks.k1.serializer = org.apache.flume.sink.hdfs.AvroEventSerializer$Builder
agent.sinks.k1.hdfs.filePrefix = parquetdata
agent.sinks.k1.hdfs.fileSuffix = .parquet
agent.sinks.k1.hdfs.fileType = DataStream
agent.sinks.k1.serializer.class = kafka.serializer.DefaultEncoder
agent.sinks.k1.kafka.message.coder.schema.registry.class = com.linkedin.camus.schemaregistry.AvroRestSchemaRegistry
agent.sinks.k1.schema.registry.url = http://schema_registry:8081/subjects/income_band/versions/latest
agent.sinks.k1.kite.batchSize = 2
agent.sinks.k1.kite.rollInterval = 30
agent.sinks.k1.kite.flushable.commitOnBatch = true

agent.sources.s1.channels = c1
agent.sinks.k1.channel = c1
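For context, this is roughly how a Kite dataset such as the one referenced by kite.dataset.uri can be pre-created with a Parquet descriptor so that the DatasetSink has something to write into. This is only a sketch, not my exact code: the schema file name and the class name are placeholders.

import java.io.File;
import org.kitesdk.data.DatasetDescriptor;
import org.kitesdk.data.Datasets;
import org.kitesdk.data.Formats;

public class CreateIncomeBandDataset {
    public static void main(String[] args) throws Exception {
        // Same URI as in the Flume sink config above.
        String uri = "dataset:hdfs://namenodeHA/kite/avro_income_band_1";

        DatasetDescriptor descriptor = new DatasetDescriptor.Builder()
                .schema(new File("income_band.avsc")) // placeholder: local copy of the Avro schema
                .format(Formats.PARQUET)              // store the records as Parquet files
                .build();

        // Create the dataset only if it does not already exist.
        if (!Datasets.exists(uri)) {
            Datasets.create(uri, descriptor);
        }
    }
}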