Re: Question about Flume

2014-01-23 Thread Olivier Renault
You could also consider using WebHDFS instead of sftp / Flume. WebHDFS is a REST API which will allow you to copy data directly into HDFS. Regards, Olivier On 23 January 2014 05:25, sudhakara st sudhakara...@gmail.com wrote: Hello Kaalu Singh, Flume is the best match for your requirement. First
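For the WebHDFS route, a minimal upload sketch (hostnames, user, and paths are placeholders; 50070/50075 are the default NameNode/DataNode HTTP ports of this Hadoop era). WebHDFS creates a file in two steps: the NameNode answers with a 307 redirect, and the file body goes to the DataNode URL it returns:

  # step 1: ask the NameNode where to write (returns 307 with a Location header)
  curl -i -X PUT "http://NAMENODE_HOST:50070/webhdfs/v1/user/me/data.txt?op=CREATE&user.name=me"

  # step 2: PUT the file body to the URL returned in the Location header
  curl -i -X PUT -T data.txt "DATANODE_URL_FROM_LOCATION_HEADER"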

Streaming jobs getting poor locality

2014-01-23 Thread Williams, Ken
Hi, I posted a question to Stack Overflow yesterday about an issue I'm seeing, but judging by the low interest (only 7 views in 24 hours, and 3 of them are probably me! :-)), it seems like I should switch venues. I'm pasting the same question here in hopes of finding someone with interest.

Re: Streaming jobs getting poor locality

2014-01-23 Thread sudhakara st
I think in order to configure a Hadoop job to read compressed input, you have to specify the compression codec in code or on the command line, like -D io.compression.codecs=org.apache.hadoop.io.compress.BZip2Codec On Thu, Jan 23, 2014 at 12:40 AM, Williams, Ken ken.willi...@windlogics.com wrote:
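For illustration, a sketch of a streaming run with the codec supplied as a generic option (the jar path and the input/output paths are placeholders, not from the original thread); note that the -D generic option must come before -input/-output/-mapper/-reducer:

  hadoop jar $HADOOP_HOME/share/hadoop/tools/lib/hadoop-streaming-*.jar \
    -D io.compression.codecs=org.apache.hadoop.io.compress.BZip2Codec \
    -input /data/logs \
    -output /data/out \
    -mapper /bin/cat \
    -reducer /usr/bin/wc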

Re: How to learn hadoop follow Tom White

2014-01-23 Thread Abirami V
I am sorry if that site is illegal. I just thought it would be useful. On Wed, Jan 22, 2014 at 11:43 PM, Amr Shahin amrnab...@gmail.com wrote: Looks pretty illegal to me as well. On Thu, Jan 23, 2014 at 11:32 AM, Marco Shaw marco.s...@gmail.com wrote: I'm pretty sure that site is illegal...

HDFS buffer sizes

2014-01-23 Thread John Lilley
What is the interaction between dfs.stream-buffer-size and dfs.client-write-packet-size? I see that the default for dfs.stream-buffer-size is 4K. Does anyone have experience using larger buffers to optimize large writes? Thanks John
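For context, both settings live in hdfs-site.xml; a sketch with the documented defaults (the values shown are the defaults, so these entries would only be added in order to experiment with larger buffers):

  <property>
    <name>dfs.stream-buffer-size</name>
    <value>4096</value>       <!-- 4 KB -->
  </property>
  <property>
    <name>dfs.client-write-packet-size</name>
    <value>65536</value>      <!-- 64 KB -->
  </property>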

RE: Streaming jobs getting poor locality

2014-01-23 Thread java8964
I believe Hadoop can figure out the codec from the file name extension, and the Bzip2 codec is supported in Hadoop as a Java implementation, which is also a SplittableCompressionCodec. So 5G of bzip2 files generating about 45 mappers is very reasonable, assuming 128M per block. The question is why ONLY one
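To make the extension-based lookup concrete, a minimal sketch using the public CompressionCodecFactory API (the file path is a placeholder):

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.Path;
  import org.apache.hadoop.io.compress.CompressionCodec;
  import org.apache.hadoop.io.compress.CompressionCodecFactory;

  public class CodecLookup {
    public static void main(String[] args) {
      Configuration conf = new Configuration();
      CompressionCodecFactory factory = new CompressionCodecFactory(conf);
      // resolved purely from the ".bz2" extension
      CompressionCodec codec = factory.getCodec(new Path("/data/logs/part-00001.bz2"));
      System.out.println(codec == null ? "no codec found" : codec.getClass().getName());
    }
  }

Rough split arithmetic behind the mapper count: 5 GB at 128 MB per block is about 40 blocks, so roughly 45 map tasks is in the expected range.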

HDFS federation configuration

2014-01-23 Thread AnilKumar B
Hi, We tried setting up HDFS NameNode federation with 2 NameNodes. I am facing a few issues. Can anyone help me understand the points below? 1) How can we configure different namespaces for different NameNodes? Where exactly do we need to configure this? 2) After formatting each NN with
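For reference, a sketch of the federation-related entries in hdfs-site.xml (the nameservice IDs and hostnames are placeholders); each NameNode is then formatted with the same cluster ID so the DataNodes can register with both:

  <property>
    <name>dfs.nameservices</name>
    <value>ns1,ns2</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.ns1</name>
    <value>nn1.example.com:8020</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.ns2</name>
    <value>nn2.example.com:8020</value>
  </property>

  # on each NameNode host, format with a shared (placeholder) cluster id
  hdfs namenode -format -clusterId myCluster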

hdfs fsck -locations

2014-01-23 Thread Mark Kerzner
Hi, Is hdfs fsck -locations supposed to show every block with its location? Is the location the IP of the datanode? Thank you, Mark
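A typical invocation for reference (the path is a placeholder); with -files -blocks -locations the report lists each block followed by the datanode addresses (host:port) holding its replicas:

  hdfs fsck /user/mark/data -files -blocks -locations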

RE: Streaming jobs getting poor locality

2014-01-23 Thread java8964
I cannot explain it (your configuration looks fine to me, and you mention that those mappers can ONLY run on one node in one run, but could be on different nodes across runs). But as I said, I am not an expert in YARN, as it is also very new to me. Let's see if someone else on the list can

Re: HDFS buffer sizes

2014-01-23 Thread Arpit Agarwal
HDFS does not appear to use dfs.stream-buffer-size. On Thu, Jan 23, 2014 at 6:57 AM, John Lilley john.lil...@redpoint.net wrote: What is the interaction between dfs.stream-buffer-size and dfs.client-write-packet-size? I see that the default for dfs.stream-buffer-size is 4K. Does anyone

HDFS data transfer is faster than SCP based transfer?

2014-01-23 Thread rab ra
Hello, I have a use case that requires transfer of input files from remote storage using the SCP protocol (using the JSch jar). To optimize this use case, I have pre-loaded all my input files into HDFS and modified my use case so that it copies the required files from HDFS. So, when a tasktracker works, it
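A minimal sketch of the HDFS-side fetch using the public FileSystem API (the paths are placeholders), as opposed to pulling each file over SCP with JSch:

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.FileSystem;
  import org.apache.hadoop.fs.Path;

  public class FetchFromHdfs {
    public static void main(String[] args) throws Exception {
      Configuration conf = new Configuration();   // picks up fs.defaultFS from the cluster config
      FileSystem fs = FileSystem.get(conf);
      // copy a pre-loaded input file to the task's local working directory
      fs.copyToLocalFile(new Path("/inputs/sample.dat"), new Path("sample.dat"));
      fs.close();
    }
  }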