Will,
The HDFS connector we ship today is for Kafka -> HDFS, so it isn't
reading/processing data in HDFS.
I was discussing both directions because the question was unclear. However,
there's no reason you couldn't create a connector that processes files in
splits to parallelize an HDFS -> Kafka
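To make the "process files in splits" idea concrete, here is a minimal sketch of the planning step only, assuming fixed-size byte ranges handed out round-robin. The names plan_splits and assign_splits are hypothetical and are not part of Kafka Connect. In a real source connector, something like this could feed the per-task configs returned from taskConfigs(maxTasks).

```python
from typing import List, Tuple

Split = Tuple[int, int]  # (byte offset, length in bytes)


def plan_splits(file_size: int, split_size: int) -> List[Split]:
    """Cut a file of file_size bytes into consecutive (offset, length) ranges."""
    if file_size < 0 or split_size <= 0:
        raise ValueError("file_size must be >= 0 and split_size must be > 0")
    return [
        (offset, min(split_size, file_size - offset))
        for offset in range(0, file_size, split_size)
    ]


def assign_splits(splits: List[Split], num_tasks: int) -> List[List[Split]]:
    """Distribute splits round-robin so each task reads its own byte ranges."""
    if num_tasks <= 0:
        raise ValueError("num_tasks must be > 0")
    groups: List[List[Split]] = [[] for _ in range(num_tasks)]
    for index, split in enumerate(splits):
        groups[index % num_tasks].append(split)
    return groups


# Example: a 300-byte file, 128-byte splits, two tasks.
splits = plan_splits(300, 128)
per_task = assign_splits(splits, 2)
```

Note that this only divides up byte ranges. A real reader also has to deal with records that straddle a split boundary, which MapReduce's line-oriented input handles by skipping a partial first line and reading past the end of the split.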
For big files, which are quite common in HDFS, do Connect tasks process the
same file in parallel, the way MapReduce handles file splits? I do not think so.
In that case, a Kafka Connect implementation has no advantage for reading a
single big file unless you also use MapReduce.
On Jan
> However, I'm trying to figure out if I can use Kafka to read Hadoop file.
The question is a bit unclear as to whether you mean "use Kafka to send
data to a Hadoop file" or "use Kafka to read a Hadoop file into a Kafka
topic". But in both cases, Kafka Connect provides a good option.
The more
If you want to know whether Kafka itself can read Hadoop files, then no. But
you can write your own producer that reads from HDFS in whatever way suits you
and pushes to Kafka. We use Kafka as the main queue of our ingestion pipeline:
we read from various sources and push everything to Kafka.
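As a rough illustration of the "write your own producer" approach, the sketch below turns a line-oriented file into one record per line and hands each record to a send callback. It is deliberately independent of any client library: publish_lines is a hypothetical helper, the stream can be any binary file-like object that iterates by line (for example, one opened through an HDFS client), and send stands in for whatever your Kafka producer exposes.

```python
import io
from typing import BinaryIO, Callable


def publish_lines(stream: BinaryIO, send: Callable[[bytes], None]) -> int:
    """Send each non-empty line of stream as one record; return the record count."""
    count = 0
    for raw_line in stream:
        record = raw_line.rstrip(b"\r\n")  # drop the line terminator only
        if not record:
            continue  # skip blank lines rather than sending empty records
        send(record)
        count += 1
    return count


# Stand-in for a real file and a real producer, to show the call shape.
sent = []
total = publish_lines(io.BytesIO(b"first\n\nsecond\r\nthird"), sent.append)
```

With the kafka-python client, for instance, send could be something like lambda v: producer.send(topic, value=v), followed by producer.flush() at the end. The topic name and the choice of client are assumptions here, not something stated in this thread.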
On Tue, Jan 10, 2017 at 6:26 AM,
Can you explain in more detail? Do you want to have files created in HDFS
somehow broken into records and put into Kafka?
> On Jan 9, 2017, at 19:57, Cas Apanowicz wrote:
>
> Hi,
>
> I have general understanding of main Kafka functionality as a streaming tool.
>
Hi,
I have a general understanding of Kafka's main functionality as a streaming tool.
However, I'm trying to figure out whether I can use Kafka to read a Hadoop file.
Can you please advise?
Thanks
Cas