You could also consider using WebHDFS instead of SFTP/Flume. WebHDFS is a
REST API that lets you copy data directly into HDFS.
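For example, a minimal sketch of an upload over WebHDFS through Hadoop's
FileSystem API (the NameNode host, port, and paths are placeholders; the
cluster needs dfs.webhdfs.enabled set to true):

import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class WebHdfsUpload {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // "namenode" and 50070 (the default NameNode HTTP port) are placeholders.
    FileSystem fs = FileSystem.get(URI.create("webhdfs://namenode:50070"), conf);
    fs.copyFromLocalFile(new Path("/local/input.dat"),
                         new Path("/user/hadoop/input.dat"));
    fs.close();
  }
}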
Regards,
Olivier
On 23 January 2014 05:25, sudhakara st sudhakara...@gmail.com wrote:
Hello Kaalu Singh,
Flume is the best match for your requirement. First
Hi,
I posted a question to Stack Overflow yesterday about an issue I'm seeing, but
judging by the low interest (only 7 views in 24 hours, and 3 of them are
probably me! :-)), it seems like I should switch venues. I'm pasting the same
question here in hopes of finding someone with interest.
I think in order to configure a Hadoop job to read compressed input, you
have to specify the compression codec in code or on the command line, like:
-D io.compression.codecs=org.apache.hadoop.io.compress.BZip2Codec
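For reference, the programmatic equivalent would look something like this
(a minimal sketch; the job name is a placeholder):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class CompressedInputJob {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Same effect as -D io.compression.codecs=... on the command line.
    conf.set("io.compression.codecs",
        "org.apache.hadoop.io.compress.BZip2Codec");
    Job job = Job.getInstance(conf, "compressed input job");
    // ... set input/output paths and mapper/reducer classes here ...
  }
}

Note that the -D form only takes effect if your driver parses generic
options via ToolRunner/GenericOptionsParser.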
On Thu, Jan 23, 2014 at 12:40 AM, Williams, Ken ken.willi...@windlogics.com
wrote:
I am sorry if that site is illegal. I just thought it would be useful.
On Wed, Jan 22, 2014 at 11:43 PM, Amr Shahin amrnab...@gmail.com wrote:
Looks pretty illegal to me as well.
On Thu, Jan 23, 2014 at 11:32 AM, Marco Shaw marco.s...@gmail.com wrote:
I'm pretty sure that site is illegal...
What is the interaction between dfs.stream-buffer-size and
dfs.client-write-packet-size?
I see that the default for dfs.stream-buffer-size is 4K. Does anyone have
experience using larger buffers to optimize large writes?
Thanks
John
I believe Hadoop can figure out the codec from the file name extension, and
the bzip2 codec ships with Hadoop as a Java implementation, which is also a
SplittableCompressionCodec.
So 5 GB of bzip2 files generating about 45 mappers is very reasonable,
assuming 128 MB/block (5 * 1024 MB / 128 MB = 40 splits).
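The extension-based detection can be checked with CompressionCodecFactory;
a minimal sketch (the input path is made up):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.compress.CompressionCodec;
import org.apache.hadoop.io.compress.CompressionCodecFactory;

public class CodecProbe {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    CompressionCodecFactory factory = new CompressionCodecFactory(conf);
    // Resolves the codec purely from the ".bz2" file name extension.
    CompressionCodec codec = factory.getCodec(new Path("/data/input.bz2"));
    System.out.println(codec == null ? "no codec" : codec.getClass().getName());
  }
}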
The question is why ONLY one
Hi,
We tried setting up HDFS NameNode federation with 2 NameNodes. I am
facing a few issues.
Can anyone help me understand the points below?
1) How can we configure different namespaces for different NameNodes? Where
exactly do we need to configure this? (A configuration sketch follows below.)
2) After formatting each NN with
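For reference on point 1, the namespace-to-NameNode mapping is driven by
configuration properties along these lines. They normally live in
hdfs-site.xml; they are shown programmatically here only for illustration,
and the service names and hosts are made up:

import org.apache.hadoop.conf.Configuration;

public class FederationConfSketch {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    // Two federated namespaces; each NameNode serves one of them.
    conf.set("dfs.nameservices", "ns1,ns2");
    conf.set("dfs.namenode.rpc-address.ns1", "nn1.example.com:8020");
    conf.set("dfs.namenode.rpc-address.ns2", "nn2.example.com:8020");
    System.out.println(conf.get("dfs.nameservices"));
  }
}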
Hi,
Is hdfs fsck -locations
supposed to show every block with its location? Is the location the IP of
the datanode?
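For reference, the same information is available programmatically via
FileSystem#getFileBlockLocations, where BlockLocation.getNames() returns the
datanodes holding each block as ip:port strings. A minimal sketch (the path
is made up):

import java.util.Arrays;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class BlockLocations {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    FileStatus st = fs.getFileStatus(new Path("/user/mark/file.dat"));
    for (BlockLocation loc : fs.getFileBlockLocations(st, 0, st.getLen())) {
      // getNames() yields "ip:port" strings, one per replica of the block.
      System.out.println(Arrays.toString(loc.getNames()));
    }
  }
}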
Thank you,
Mark
I cannot explain it (your configuration looks fine to me, and you mention that
those mappers can ONLY run on one node in one run, but could be on different
nodes across runs). But as I said, I am not an expert in YARN, as it is also
very new to me. Let's see if someone else on the list can
HDFS does not appear to use dfs.stream-buffer-size.
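If you want to experiment with the write path anyway,
dfs.client-write-packet-size is a client-side setting you can override before
creating a stream. A minimal sketch (the 128 KB value and the path are
arbitrary choices, not recommendations):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class PacketSizeSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Client-side setting; controls the DFS write packets, not a stream buffer.
    conf.setInt("dfs.client-write-packet-size", 128 * 1024);
    FileSystem fs = FileSystem.get(conf);
    FSDataOutputStream out = fs.create(new Path("/tmp/packet-size-test"));
    out.write(new byte[1024 * 1024]);
    out.close();
    fs.close();
  }
}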
On Thu, Jan 23, 2014 at 6:57 AM, John Lilley john.lil...@redpoint.net wrote:
What is the interaction between dfs.stream-buffer-size and
dfs.client-write-packet-size?
I see that the default for dfs.stream-buffer-size is 4K. Does anyone
Hello
I have a use case that requires transferring input files from remote storage
over the SCP protocol (using the JSch jar). To optimize this use case, I have
pre-loaded all my input files into HDFS and modified my use case so that it
copies the required files from HDFS. So, when the TaskTrackers work, it
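For what it's worth, a minimal sketch of the kind of HDFS-side copy such a
task might perform (all paths are placeholders):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class FetchFromHdfs {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    // Pull a pre-staged input file out of HDFS onto the task's local disk.
    fs.copyToLocalFile(new Path("/staging/input-001.dat"),
                       new Path("/tmp/input-001.dat"));
    fs.close();
  }
}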