Hi,

related to hdfs2 and normal files, you might find
that Camel sends one message per data chunk,
NOT one message per file (which is what I would expect).
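
For example, with a consumer route like the minimal sketch below (host,
port and path are placeholders, and the chunkSize shown is just the
default 4096), each exchange carries one chunk of a file rather than the
whole file:

    import org.apache.camel.builder.RouteBuilder;

    public class HdfsChunkRoute extends RouteBuilder {
        @Override
        public void configure() {
            // one exchange per 4096-byte chunk of every file in the directory,
            // NOT one exchange per file
            from("hdfs2://localhost:8020/user/camel/in?fileType=NORMAL_FILE&chunkSize=4096")
                .log("received one chunk exchange")
                .to("log:chunks");
        }
    }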

They probably don't intend to change it.

It was reported
as a bug: https://issues.apache.org/jira/browse/CAMEL-8040 (won't fix),
and as a doc enhancement: https://issues.apache.org/jira/browse/CAMEL-8150 (done).

Btw nice catch with that tmp file :)

Josef

On 03/24/2015 09:19 PM, Sergey Zhemzhitsky wrote:
Hello,

A really interesting question.
The answer is in this JIRA issue: https://issues.apache.org/jira/browse/CAMEL-4555
and this diff: 
http://mail-archives.apache.org/mod_mbox/camel-commits/201110.mbox/%3c20111022140442.94f362388...@eris.apache.org%3E

It would be really great if
1. the component made this feature optional, so that multigigabyte data
could be streamed directly from HDFS on a file-by-file basis
(a minimal sketch of such direct streaming follows below), and
2. the component merged the files on the fly without any intermediate
storage.
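
To illustrate point 1, this is what direct streaming looks like with the
plain Hadoop client API (namenode address and path are placeholders, and
this is of course not Camel code): FileSystem.open() returns a stream read
straight from the datanodes, so nothing has to be materialized locally.

    import java.net.URI;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class DirectHdfsRead {
        public static void main(String[] args) throws Exception {
            FileSystem fs = FileSystem.get(URI.create("hdfs://namenode:8020"), new Configuration());
            try (FSDataInputStream in = fs.open(new Path("/data/huge-file.bin"))) {
                byte[] buffer = new byte[64 * 1024];
                int n;
                while ((n = in.read(buffer)) != -1) {
                    // process buffer[0..n) as it arrives; no local temp file is created
                }
            }
        }
    }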

Just raised the JIRA: https://issues.apache.org/jira/browse/CAMEL-8542

Regards,
Sergey

Hi, all!
I'm looking at ways to use the hdfs2 component to read files stored in a Hadoop
directory. As a fairly new Hadoop user, I assume the simplest way is when the
data is stored in the normal file format.
I was looking at the code in the
'org.apache.camel.component.hdfs2.HdfsFileType#NORMAL_FILE' class, which is
responsible for creating the input stream, and noticed that it copies the
whole file to the local file system (into a temp file) before opening the
input stream (in the case when using an 'hdfs://' URI).
I wonder what the reason behind this is? Isn't it possible that the file can be
very large, so that this operation becomes quite costly? Or maybe I'm missing
some basic restriction on using normal files in Hadoop?
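
For reference, the behavior I'm describing looks roughly like this (my own
simplified sketch, not the actual Camel source):

    import java.io.File;
    import java.io.FileInputStream;
    import java.io.InputStream;
    import java.net.URI;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class CopyThenOpenSketch {
        static InputStream openNormalFile(String hdfsUri, String hdfsPath) throws Exception {
            FileSystem fs = FileSystem.get(URI.create(hdfsUri), new Configuration());
            File tmp = File.createTempFile("camel-hdfs", ".tmp");
            // the whole remote file is copied to the local file system first...
            fs.copyToLocalFile(new Path(hdfsPath), new Path(tmp.getAbsolutePath()));
            // ...and only then is the input stream opened, on the local copy
            return new FileInputStream(tmp);
        }
    }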
Thanks in advance
Alexey


