Re: dfs.block.size

2012-02-28 Thread madhu phatak
You can use FileSystem.getFileStatus(Path p); the FileStatus it returns
carries the block size specific to that file.
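
A minimal sketch of that approach (the path is taken from the command line;
this assumes the client's Hadoop configuration files are on the classpath):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class ShowBlockSize {
      public static void main(String[] args) throws Exception {
        // Reads the cluster settings from the config files on the classpath.
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        Path p = new Path(args[0]);              // file to inspect
        FileStatus status = fs.getFileStatus(p);
        System.out.println(p + ": block size = "
            + status.getBlockSize() + " bytes");
      }
    }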

On Tue, Feb 28, 2012 at 2:50 AM, Kai Voigt  wrote:

> "hadoop fsck  -blocks" is something that I think of quickly.
>
> http://hadoop.apache.org/common/docs/current/commands_manual.html#fsck has
> more details
>
> Kai
>
> Am 28.02.2012 um 02:30 schrieb Mohit Anchlia:
>
> > How do I verify the block size of a given file? Is there a command?
> >
> > On Mon, Feb 27, 2012 at 7:59 AM, Joey Echeverria 
> wrote:
> >
> >> dfs.block.size can be set per job.
> >>
> >> mapred.tasktracker.map.tasks.maximum is per tasktracker.
> >>
> >> -Joey
> >>
> >> On Mon, Feb 27, 2012 at 10:19 AM, Mohit Anchlia
> >> wrote:
> >>> Can someone please suggest if parameters like dfs.block.size,
> >>> mapred.tasktracker.map.tasks.maximum are only cluster wide settings or
> >> can
> >>> these be set per client job configuration?
> >>>
> >>> On Sat, Feb 25, 2012 at 5:43 PM, Mohit Anchlia wrote:
> >>>
> >>>> If I want to change the block size then can I use Configuration in
> >>>> mapreduce job and set it when writing to the sequence file or does it
> >> need
> >>>> to be cluster wide setting in .xml files?
> >>>>
> >>>> Also, is there a way to check the block of a given file?
> >>>>
> >>
> >>
> >>
> >> --
> >> Joseph Echeverria
> >> Cloudera, Inc.
> >> 443.305.9434
> >>
>
> --
> Kai Voigt
> k...@123.org
>
>
>
>
>


-- 
Join me at http://hadoopworkshop.eventbrite.com/


Re: dfs.block.size

2012-02-27 Thread Kai Voigt
"hadoop fsck  -blocks" is something that I think of quickly.

http://hadoop.apache.org/common/docs/current/commands_manual.html#fsck has more 
details
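
(For a programmatic view of the same information, a sketch along these lines
should also work with the FileSystem API of that era; the path is a
placeholder, and the fragment is assumed to sit inside a method that may
throw IOException, with the usual org.apache.hadoop.conf and fs imports:)

    FileSystem fs = FileSystem.get(new Configuration());
    FileStatus status = fs.getFileStatus(new Path("/some/file"));
    System.out.println("block size = " + status.getBlockSize() + " bytes");
    // List the individual blocks and the datanodes holding their replicas.
    BlockLocation[] blocks =
        fs.getFileBlockLocations(status, 0, status.getLen());
    for (BlockLocation b : blocks) {
      System.out.println("offset " + b.getOffset() + ", length "
          + b.getLength() + ", hosts "
          + java.util.Arrays.toString(b.getHosts()));
    }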

Kai

Am 28.02.2012 um 02:30 schrieb Mohit Anchlia:

> How do I verify the block size of a given file? Is there a command?
> 
> On Mon, Feb 27, 2012 at 7:59 AM, Joey Echeverria  wrote:
> 
>> dfs.block.size can be set per job.
>> 
>> mapred.tasktracker.map.tasks.maximum is per tasktracker.
>> 
>> -Joey
>> 
>> On Mon, Feb 27, 2012 at 10:19 AM, Mohit Anchlia 
>> wrote:
>>> Can someone please suggest if parameters like dfs.block.size,
>>> mapred.tasktracker.map.tasks.maximum are only cluster wide settings or
>> can
>>> these be set per client job configuration?
>>> 
>>> On Sat, Feb 25, 2012 at 5:43 PM, Mohit Anchlia wrote:
>>> 
>>>> If I want to change the block size then can I use Configuration in
>>>> mapreduce job and set it when writing to the sequence file or does it
>> need
>>>> to be cluster wide setting in .xml files?
>>>> 
>>>> Also, is there a way to check the block of a given file?
>>>> 
>> 
>> 
>> 
>> --
>> Joseph Echeverria
>> Cloudera, Inc.
>> 443.305.9434
>> 

-- 
Kai Voigt
k...@123.org






Re: dfs.block.size

2012-02-27 Thread Mohit Anchlia
How do I verify the block size of a given file? Is there a command?

On Mon, Feb 27, 2012 at 7:59 AM, Joey Echeverria  wrote:

> dfs.block.size can be set per job.
>
> mapred.tasktracker.map.tasks.maximum is per tasktracker.
>
> -Joey
>
> On Mon, Feb 27, 2012 at 10:19 AM, Mohit Anchlia 
> wrote:
> > Can someone please suggest if parameters like dfs.block.size,
> > mapred.tasktracker.map.tasks.maximum are only cluster wide settings or
> can
> > these be set per client job configuration?
> >
> > On Sat, Feb 25, 2012 at 5:43 PM, Mohit Anchlia wrote:
> >
> >> If I want to change the block size then can I use Configuration in
> >> mapreduce job and set it when writing to the sequence file or does it
> need
> >> to be cluster wide setting in .xml files?
> >>
> >> Also, is there a way to check the block of a given file?
> >>
>
>
>
> --
> Joseph Echeverria
> Cloudera, Inc.
> 443.305.9434
>


Re: dfs.block.size

2012-02-27 Thread Joey Echeverria
dfs.block.size can be set per job.

mapred.tasktracker.map.tasks.maximum is per tasktracker.
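
A sketch of the per-job case against the 0.20-era API (the 256MB value and
job name are only illustrative; dfs.block.size is read on the client side
when a file is created, so it applies to the files the job's tasks write):

    Configuration conf = new Configuration();
    // 256MB blocks for output files this job creates.
    conf.setLong("dfs.block.size", 256L * 1024 * 1024);
    Job job = new Job(conf, "my-job");   // org.apache.hadoop.mapreduce.Job
    // ... set mapper/reducer and input/output paths as usual, then
    // job.waitForCompletion(true);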

-Joey

On Mon, Feb 27, 2012 at 10:19 AM, Mohit Anchlia  wrote:
> Can someone please suggest if parameters like dfs.block.size,
> mapred.tasktracker.map.tasks.maximum are only cluster wide settings or can
> these be set per client job configuration?
>
> On Sat, Feb 25, 2012 at 5:43 PM, Mohit Anchlia wrote:
>
>> If I want to change the block size then can I use Configuration in
>> mapreduce job and set it when writing to the sequence file or does it need
>> to be cluster wide setting in .xml files?
>>
>> Also, is there a way to check the block of a given file?
>>



-- 
Joseph Echeverria
Cloudera, Inc.
443.305.9434


Re: dfs.block.size

2012-02-27 Thread Mohit Anchlia
Can someone please suggest whether parameters like dfs.block.size and
mapred.tasktracker.map.tasks.maximum are only cluster-wide settings, or
whether they can be set per client job configuration?

On Sat, Feb 25, 2012 at 5:43 PM, Mohit Anchlia wrote:

> If I want to change the block size then can I use Configuration in
> mapreduce job and set it when writing to the sequence file or does it need
> to be cluster wide setting in .xml files?
>
> Also, is there a way to check the block of a given file?
>


dfs.block.size

2012-02-25 Thread Mohit Anchlia
If I want to change the block size, can I use the Configuration in a
MapReduce job and set it when writing to the sequence file, or does it need
to be a cluster-wide setting in the .xml files?

Also, is there a way to check the block size of a given file?
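
The replies above confirm this can be set from the client. A minimal sketch
against the 0.20-era SequenceFile API (the output path, key/value types and
128MB value are illustrative; assumes the usual org.apache.hadoop.conf, fs
and io imports):

    Configuration conf = new Configuration();
    // Block size HDFS will use for files this client creates.
    conf.setLong("dfs.block.size", 128L * 1024 * 1024);
    FileSystem fs = FileSystem.get(conf);
    SequenceFile.Writer writer = SequenceFile.createWriter(
        fs, conf, new Path("/tmp/example.seq"),
        LongWritable.class, Text.class);
    writer.append(new LongWritable(1L), new Text("hello"));
    writer.close();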


Re: Question about dfs.block.size setting

2010-07-22 Thread Yu Li
Hi Harsh,

Thanks for your comments. I found "Increasing the number of tasks increases
the framework overhead, but increases load balancing and lowers the cost of
failures." quite useful. But I'm still confused about why increasing the
block size for large jobs should improve performance. According to the
results of my test, while sorting 2TB of data on a 30-node cluster,
increasing the block size from 64M to 256M degraded performance instead of
improving it. Could anybody tell me why this happened?

Any comments on this? Thanks.

Best Regards,
Carp

2010/7/22 Harsh J 

> This article has a few good lines that should clear that doubt of yours:
> http://wiki.apache.org/hadoop/HowManyMapsAndReduces
>
> On Thu, Jul 22, 2010 at 9:17 AM, Yu Li  wrote:
> > Hi all,
> >
> > There are lots of materials from the internet that suggest setting
> > dfs.block.size larger, e.g. from 64M to 256M, when the job is large, and
> > they say the performance will improve. But I'm not clear on why
> > increasing the block size will help. I know that increasing the block
> > size will reduce the number of map tasks for the same input, but why
> > would fewer map tasks improve overall performance?
> >
> > Any comments would be highly valued, and thanks in advance.
> >
> > Best Regards,
> > Carp
> >
>
>
>
> --
> Harsh J
> www.harshj.com
>


Re: Question about dfs.block.size setting

2010-07-21 Thread Harsh J
This article has a few good lines that should clear that doubt of yours:
http://wiki.apache.org/hadoop/HowManyMapsAndReduces

On Thu, Jul 22, 2010 at 9:17 AM, Yu Li  wrote:
> Hi all,
>
> There are lots of materials from the internet that suggest setting
> dfs.block.size larger, e.g. from 64M to 256M, when the job is large, and
> they say the performance will improve. But I'm not clear on why increasing
> the block size will help. I know that increasing the block size will reduce
> the number of map tasks for the same input, but why would fewer map tasks
> improve overall performance?
>
> Any comments would be highly valued, and thanks in advance.
>
> Best Regards,
> Carp
>



-- 
Harsh J
www.harshj.com


Question about dfs.block.size setting

2010-07-21 Thread Yu Li
Hi all,

There are lots of materials from the internet that suggest setting
dfs.block.size larger, e.g. from 64M to 256M, when the job is large, and
they say the performance will improve. But I'm not clear on why increasing
the block size will help. I know that increasing the block size will reduce
the number of map tasks for the same input, but why would fewer map tasks
improve overall performance?
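
(For a rough sense of scale, with illustrative numbers only: a 1TB input
stored in 64MB blocks is split into roughly 1,048,576MB / 64MB = 16,384 map
tasks, while 256MB blocks give about 4,096, so the per-task startup and
scheduling overhead is paid about a quarter as often.)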

Any comments would be highly valued, and thanks in advance.

Best Regards,
Carp


Questions on dfs.block.size

2010-05-03 Thread stan lee
Hi Experts:

Is there any method to make a change to dfs.block.size take effect on files
that were written before the change? Or is that even meaningful?

If I run a job A, would it copy the input files into the HDFS file system
even if those input files are already in HDFS?

If so, then making dfs.block.size apply to old files is perhaps meaningful:
if I find that job A runs slowly and a subsequent job B has the same input
file as job A, then perhaps I can change dfs.block.size so that the number
of map tasks for job B is increased, which may make the job run faster.

If the input file always needs to be copied (even when that file is already
in HDFS) every time a job runs, then making dfs.block.size take effect on
old files would not be meaningful.

Thanks!
Stan Lee


Re: dfs.block.size change not taking affect?

2009-07-24 Thread hadooping

Block size may not be the only answer. Look into the way the namenode
distributes the blocks across your datanodes, and check whether the client's
datanode is creating a bottleneck.



zeevik wrote:
> 
> 
> .. New member here, hello everyone! ..
> 
> I am changing the default dfs.block.size from 64MB to 256MB (or any other
> value) in the hadoop-site.xml file and restarting the cluster to make sure
> the changes are applied. Now the issue is that when I try to put a file
> on HDFS (hadoop fs -put), it seems like the block size is always 64MB
> (browsing the filesystem via the HTTP interface). The Hadoop version is
> 0.19.1 on a 6-node cluster.
> 
> 1. Why is the new block size not reflected when I create/load a new file
> into HDFS?
> 2. How can I see the current parameters and their values on Hadoop, to make
> sure the change in the hadoop-site.xml file took effect at the restart?
> 
> I am trying to load a large file into HDFS and it seems slow (1.5 min for
> 1GB); that's why I am trying to increase the block size.
> 
> Thanks,
> Zeev
> 

-- 
View this message in context: 
http://www.nabble.com/dfs.block.size-change-not-taking-affect--tp24654181p24654233.html
Sent from the Hadoop core-user mailing list archive at Nabble.com.



dfs.block.size change not taking affect?

2009-07-24 Thread zeevik


.. New member here, hello everyone! ..

I am changing the default dfs.block.size from 64MB to 256MB (or any other
value) in the hadoop-site.xml file and restarting the cluster to make sure
the changes are applied. Now the issue is that when I try to put a file
on HDFS (hadoop fs -put), it seems like the block size is always 64MB
(browsing the filesystem via the HTTP interface). The Hadoop version is
0.19.1 on a 6-node cluster.

1. Why is the new block size not reflected when I create/load a new file
into HDFS?
2. How can I see the current parameters and their values on Hadoop, to make
sure the change in the hadoop-site.xml file took effect at the restart?
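
Regarding question 2, one way to check from the client side (a sketch
assuming the 0.19-era API; the path and the 64MB default are illustrative,
and new Configuration() loads the hadoop-site.xml found on the client's
classpath):

    Configuration conf = new Configuration();
    System.out.println("dfs.block.size = "
        + conf.getLong("dfs.block.size", 67108864L));

    FileSystem fs = FileSystem.get(conf);
    FileStatus st = fs.getFileStatus(new Path("/path/just/loaded"));
    System.out.println("block size actually used = " + st.getBlockSize());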

I am trying to load a large file into HDFS and it seems slow (1.5 min for
1GB); that's why I am trying to increase the block size.

Thanks,
Zeev
-- 
View this message in context: 
http://www.nabble.com/dfs.block.size-change-not-taking-affect--tp24654181p24654181.html
Sent from the Hadoop core-user mailing list archive at Nabble.com.