Re: Btrfs slowdown with ceph (how to reproduce)

2012-01-24 Thread Martin Mailand

Hi Chris,
great to hear that. Could you give me a ping once you have fixed it?
Then I can retry it.


-martin

On 24.01.2012 20:40, Chris Mason wrote:

On Tue, Jan 24, 2012 at 08:15:58PM +0100, Martin Mailand wrote:

Hi
I tried the branch on one of my ceph OSDs, and there is a big
difference in the performance.
The average request size stayed high, but after around an hour the
kernel crashed.

IOstat
http://pastebin.com/xjuriJ6J

Kernel trace
http://pastebin.com/SYE95GgH


Aha, this I know how to fix.  Thanks for trying it out.

-chris
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html



Re: Btrfs slowdown with ceph (how to reproduce)

2012-01-24 Thread Chris Mason
On Tue, Jan 24, 2012 at 08:15:58PM +0100, Martin Mailand wrote:
> Hi
> I tried the branch on one of my ceph OSDs, and there is a big
> difference in the performance.
> The average request size stayed high, but after around an hour the
> kernel crashed.
> 
> IOstat
> http://pastebin.com/xjuriJ6J
> 
> Kernel trace
> http://pastebin.com/SYE95GgH

Aha, this I know how to fix.  Thanks for trying it out.

-chris


Re: Btrfs slowdown with ceph (how to reproduce)

2012-01-24 Thread Martin Mailand

Hi
I tried the branch on one of my ceph OSDs, and there is a big difference
in the performance.
The average request size stayed high, but after around an hour the kernel
crashed.


IOstat
http://pastebin.com/xjuriJ6J

Kernel trace
http://pastebin.com/SYE95GgH

-martin

On 23.01.2012 19:50, Chris Mason wrote:

On Mon, Jan 23, 2012 at 01:19:29PM -0500, Josef Bacik wrote:

On Fri, Jan 20, 2012 at 01:13:37PM +0100, Christian Brunner wrote:

As you might know, I have been seeing btrfs slowdowns in our ceph
cluster for quite some time. Even with the latest btrfs code for 3.3
I'm still seeing these problems. To make things reproducible, I've now
written a small test that imitates ceph's behavior:

On a freshly created btrfs filesystem (2 TB size, mounted with
"noatime,nodiratime,compress=lzo,space_cache,inode_cache") I'm opening
100 files. After that I'm doing random writes on these files with a
sync_file_range after each write (each write has a size of 100 bytes)
and ioctl(BTRFS_IOC_SYNC) after every 100 writes.

After approximately 20 minutes, write activity suddenly increases
fourfold and the average request size decreases (see chart in the
attachment).

You can find IOstat output here: http://pastebin.com/Smbfg1aG

I hope that you are able to trace down the problem with the test
program in the attachment.


Ran it, saw the problem, tried the dangerdonteveruse branch in Chris's tree and
formatted the fs with 64k node and leaf sizes, and the problem appeared to go
away.  So surprise surprise, fragmentation is biting us in the ass.  If you can,
try running that branch with 64k node and leaf sizes with your ceph cluster and
see how that works out.  Of course you should only do that if you don't mind
losing everything :).  Thanks,



Please keep in mind this branch is only out there for development, and
it really might have huge flaws.  scrub doesn't work with it correctly
right now, and the IO error recovery code is probably broken too.

Long term though, I think the bigger block sizes are going to make a
huge difference in these workloads.

If you use the very dangerous code:

mkfs.btrfs -l 64k -n 64k /dev/xxx

(-l is leaf size, -n is node size).

64K is the max right now, 32K may help just as much at a lower CPU cost.

-chris



Re: Btrfs slowdown with ceph (how to reproduce)

2012-01-23 Thread Christian Brunner
2012/1/23 Chris Mason :
> On Mon, Jan 23, 2012 at 01:19:29PM -0500, Josef Bacik wrote:
>> On Fri, Jan 20, 2012 at 01:13:37PM +0100, Christian Brunner wrote:
>> > As you might know, I have been seeing btrfs slowdowns in our ceph
>> > cluster for quite some time. Even with the latest btrfs code for 3.3
>> > I'm still seeing these problems. To make things reproducible, I've now
>> > written a small test that imitates ceph's behavior:
>> >
>> > On a freshly created btrfs filesystem (2 TB size, mounted with
>> > "noatime,nodiratime,compress=lzo,space_cache,inode_cache") I'm opening
>> > 100 files. After that I'm doing random writes on these files with a
>> > sync_file_range after each write (each write has a size of 100 bytes)
>> > and ioctl(BTRFS_IOC_SYNC) after every 100 writes.
>> >
>> > After approximately 20 minutes, write activity suddenly increases
>> > fourfold and the average request size decreases (see chart in the
>> > attachment).
>> >
>> > You can find IOstat output here: http://pastebin.com/Smbfg1aG
>> >
>> > I hope that you are able to trace down the problem with the test
>> > program in the attachment.
>>
>> Ran it, saw the problem, tried the dangerdonteveruse branch in Chris's tree and
>> formatted the fs with 64k node and leaf sizes, and the problem appeared to go
>> away.  So surprise surprise, fragmentation is biting us in the ass.  If you can,
>> try running that branch with 64k node and leaf sizes with your ceph cluster and
>> see how that works out.  Of course you should only do that if you don't mind
>> losing everything :).  Thanks,
>>
>
> Please keep in mind this branch is only out there for development, and
> it really might have huge flaws.  scrub doesn't work with it correctly
> right now, and the IO error recovery code is probably broken too.
>
> Long term though, I think the bigger block sizes are going to make a
> huge difference in these workloads.
>
> If you use the very dangerous code:
>
> mkfs.btrfs -l 64k -n 64k /dev/xxx
>
> (-l is leaf size, -n is node size).
>
> 64K is the max right now, 32K may help just as much at a lower CPU cost.

Thanks for taking a look. I'm glad to hear that there is a solution
on the horizon, but I'm not brave enough to try this on our ceph
cluster yet. I'll try it when the code has stabilized a bit.

Regards,
Christian


Re: Btrfs slowdown with ceph (how to reproduce)

2012-01-23 Thread Chris Mason
On Mon, Jan 23, 2012 at 01:19:29PM -0500, Josef Bacik wrote:
> On Fri, Jan 20, 2012 at 01:13:37PM +0100, Christian Brunner wrote:
> > As you might know, I have been seeing btrfs slowdowns in our ceph
> > cluster for quite some time. Even with the latest btrfs code for 3.3
> > I'm still seeing these problems. To make things reproducible, I've now
> > written a small test that imitates ceph's behavior:
> > 
> > On a freshly created btrfs filesystem (2 TB size, mounted with
> > "noatime,nodiratime,compress=lzo,space_cache,inode_cache") I'm opening
> > 100 files. After that I'm doing random writes on these files with a
> > sync_file_range after each write (each write has a size of 100 bytes)
> > and ioctl(BTRFS_IOC_SYNC) after every 100 writes.
> > 
> > After approximately 20 minutes, write activity suddenly increases
> > fourfold and the average request size decreases (see chart in the
> > attachment).
> > 
> > You can find IOstat output here: http://pastebin.com/Smbfg1aG
> > 
> > I hope that you are able to trace down the problem with the test
> > program in the attachment.
>  
> Ran it, saw the problem, tried the dangerdonteveruse branch in Chris's tree and
> formatted the fs with 64k node and leaf sizes, and the problem appeared to go
> away.  So surprise surprise, fragmentation is biting us in the ass.  If you can,
> try running that branch with 64k node and leaf sizes with your ceph cluster and
> see how that works out.  Of course you should only do that if you don't mind
> losing everything :).  Thanks,
> 

Please keep in mind this branch is only out there for development, and
it really might have huge flaws.  scrub doesn't work with it correctly
right now, and the IO error recovery code is probably broken too.

Long term though, I think the bigger block sizes are going to make a
huge difference in these workloads.

If you use the very dangerous code:

mkfs.btrfs -l 64k -n 64k /dev/xxx

(-l is leaf size, -n is node size).

64K is the max right now, 32K may help just as much at a lower CPU cost.
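For anyone wanting to try the larger node sizes without risking a real
disk, the same mkfs invocation can be pointed at a loopback image first.
This is only a sketch; the image path and 2 GB size below are illustrative
and not from this thread, and the dangers of the development branch still
apply once you mount it.

```shell
# Create a sparse 2 GB image so no real device is touched.
truncate -s 2G /tmp/btrfs-64k.img

# Format with 64k leaves and nodes, if btrfs-progs is installed.
# -l is leaf size, -n is node size; 32k is a lighter-CPU alternative.
# (May need root, or -f on newer btrfs-progs, hence the || true.)
if command -v mkfs.btrfs >/dev/null 2>&1; then
    mkfs.btrfs -l 64k -n 64k /tmp/btrfs-64k.img || true
fi
```

The resulting image can then be mounted with `mount -o loop` for testing.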

-chris



Re: Btrfs slowdown with ceph (how to reproduce)

2012-01-23 Thread Josef Bacik
On Fri, Jan 20, 2012 at 01:13:37PM +0100, Christian Brunner wrote:
> As you might know, I have been seeing btrfs slowdowns in our ceph
> cluster for quite some time. Even with the latest btrfs code for 3.3
> I'm still seeing these problems. To make things reproducible, I've now
> written a small test that imitates ceph's behavior:
> 
> On a freshly created btrfs filesystem (2 TB size, mounted with
> "noatime,nodiratime,compress=lzo,space_cache,inode_cache") I'm opening
> 100 files. After that I'm doing random writes on these files with a
> sync_file_range after each write (each write has a size of 100 bytes)
> and ioctl(BTRFS_IOC_SYNC) after every 100 writes.
> 
> After approximately 20 minutes, write activity suddenly increases
> fourfold and the average request size decreases (see chart in the
> attachment).
> 
> You can find IOstat output here: http://pastebin.com/Smbfg1aG
> 
> I hope that you are able to trace down the problem with the test
> program in the attachment.
 
Ran it, saw the problem, tried the dangerdonteveruse branch in Chris's tree and
formatted the fs with 64k node and leaf sizes, and the problem appeared to go
away.  So surprise surprise, fragmentation is biting us in the ass.  If you can,
try running that branch with 64k node and leaf sizes with your ceph cluster and
see how that works out.  Of course you should only do that if you don't mind
losing everything :).  Thanks,

Josef


Btrfs slowdown with ceph (how to reproduce)

2012-01-20 Thread Christian Brunner
As you might know, I have been seeing btrfs slowdowns in our ceph
cluster for quite some time. Even with the latest btrfs code for 3.3
I'm still seeing these problems. To make things reproducible, I've now
written a small test that imitates ceph's behavior:

On a freshly created btrfs filesystem (2 TB size, mounted with
"noatime,nodiratime,compress=lzo,space_cache,inode_cache") I'm opening
100 files. After that I'm doing random writes on these files with a
sync_file_range after each write (each write has a size of 100 bytes)
and ioctl(BTRFS_IOC_SYNC) after every 100 writes.

After approximately 20 minutes, write activity suddenly increases
fourfold and the average request size decreases (see chart in the
attachment).

You can find IOstat output here: http://pastebin.com/Smbfg1aG

I hope that you are able to trace down the problem with the test
program in the attachment.

Thanks,
Christian
#define _GNU_SOURCE

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <sys/ioctl.h>

#define FILE_COUNT 100
#define FILE_SIZE 4194304

#define STRING "0123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789"

#define BTRFS_IOCTL_MAGIC 0x94
#define BTRFS_IOC_SYNC _IO(BTRFS_IOCTL_MAGIC, 8)

int main(int argc, char *argv[]) {
	char *imgname = argv[1];
	char *tempname;
	int fd[FILE_COUNT];
	int ilen, i;

	ilen = strlen(imgname);
	tempname = malloc(ilen + 8);

	/* open FILE_COUNT files named <imgname>.0 .. <imgname>.99 */
	for (i = 0; i < FILE_COUNT; i++) {
		snprintf(tempname, ilen + 8, "%s.%i", imgname, i);
		fd[i] = open(tempname, O_CREAT|O_RDWR, 0600);
	}

	i = 0;
	while (1) {
		int start = rand() % FILE_SIZE;
		int file = rand() % FILE_COUNT;

		putc('.', stderr);

		/* 100-byte write at a random offset, then start writeback
		 * of just that range (0x2 == SYNC_FILE_RANGE_WRITE) */
		lseek(fd[file], start, SEEK_SET);
		write(fd[file], STRING, 100);
		sync_file_range(fd[file], start, 100, 0x2);

		usleep(25000);

		/* full btrfs transaction commit after every 100 writes */
		i++;
		if (i == 100) {
			i = 0;
			ioctl(fd[file], BTRFS_IOC_SYNC);
		}
	}
}

Re: Btrfs slowdown

2011-08-09 Thread Christian Brunner
Hi Sage,

I did some testing with btrfs-unstable yesterday. With the recent
commit from Chris it looks quite good:

"Btrfs: force unplugs when switching from high to regular priority bios"


However I can't test it extensively, because our main environment is
on ext4 at the moment.

Regards
Christian

2011/8/8 Sage Weil:
> Hi Christian,
>
> Are you still seeing this slowness?
>
> sage
>
>
> On Wed, 27 Jul 2011, Christian Brunner wrote:
>> 2011/7/25 Chris Mason :
>> > Excerpts from Christian Brunner's message of 2011-07-25 03:54:47 -0400:
>> >> Hi,
>> >>
>> >> we are running a ceph cluster with btrfs as its base filesystem
>> >> (kernel 3.0). At the beginning everything worked very well, but after
>> >> a few days (2-3) things are getting very slow.
>> >>
>> >> When I look at the object store servers I see heavy disk-i/o on the
>> >> btrfs filesystems (disk utilization is between 60% and 100%). I also
>> >> did some tracing on the Ceph object store daemon, but I'm quite
>> >> certain, that the majority of the disk I/O is not caused by ceph or
>> >> any other userland process.
>> >>
>> >> When I reboot the system(s), the problems go away for another 2-3 days,
>> >> but after that, it starts again. I'm not sure if the problem is
>> >> related to the kernel warning I've reported last week. At least there
>> >> is no temporal relationship between the warning and the slowdown.
>> >>
>> >> Any hints on how to trace this would be welcome.
>> >
>> > The easiest way to trace this is with latencytop.
>> >
>> > Apply this patch:
>> >
>> > http://oss.oracle.com/~mason/latencytop.patch
>> >
>> > And then use latencytop -c for a few minutes while the system is slow.
>> > Send the output here and hopefully we'll be able to figure it out.
>>
>> I've now installed latencytop. Attached are two output files: The
>> first is from yesterday and was created approximately half an hour after
>> the boot. The second one is from today; uptime is 19h. The load on the
>> system is already rising. Disk utilization is approximately at 50%.
>>
>> Thanks for your help.
>>
>> Christian
>>


Re: Btrfs slowdown

2011-08-08 Thread Sage Weil
Hi Christian,

Are you still seeing this slowness?

sage


On Wed, 27 Jul 2011, Christian Brunner wrote:
> 2011/7/25 Chris Mason :
> > Excerpts from Christian Brunner's message of 2011-07-25 03:54:47 -0400:
> >> Hi,
> >>
> >> we are running a ceph cluster with btrfs as its base filesystem
> >> (kernel 3.0). At the beginning everything worked very well, but after
> >> a few days (2-3) things are getting very slow.
> >>
> >> When I look at the object store servers I see heavy disk-i/o on the
> >> btrfs filesystems (disk utilization is between 60% and 100%). I also
> >> did some tracing on the Ceph object store daemon, but I'm quite
> >> certain, that the majority of the disk I/O is not caused by ceph or
> >> any other userland process.
> >>
> >> When I reboot the system(s), the problems go away for another 2-3 days,
> >> but after that, it starts again. I'm not sure if the problem is
> >> related to the kernel warning I've reported last week. At least there
> >> is no temporal relationship between the warning and the slowdown.
> >>
> >> Any hints on how to trace this would be welcome.
> >
> > The easiest way to trace this is with latencytop.
> >
> > Apply this patch:
> >
> > http://oss.oracle.com/~mason/latencytop.patch
> >
> > And then use latencytop -c for a few minutes while the system is slow.
> > Send the output here and hopefully we'll be able to figure it out.
> 
> I've now installed latencytop. Attached are two output files: The
> first is from yesterday and was created approximately half an hour after
> the boot. The second one is from today; uptime is 19h. The load on the
> system is already rising. Disk utilization is approximately at 50%.
> 
> Thanks for your help.
> 
> Christian
> 


Re: Btrfs Slowdown (due to Memory Handling?)

2011-08-04 Thread Chris Mason
Excerpts from Mitch Harder's message of 2011-08-04 14:40:20 -0400:
> On Thu, Aug 4, 2011 at 10:05 AM, Chris Mason  wrote:
> >>
> >> Ok, so I'm going to guess that your problem is really with either file
> >> layout or just us using more metadata pages than the others.  The file
> >> layout part is easy to test, just replace your git repo with a fresh
> >> clone (or completely repack it).
> >
> > Sorry, I should have said replace your git repo with a fresh,
> > non-hardlinked clone.  git clone by default will just make hardlinks if
> > it can, so it has to be a fresh clone.
> >
> > -chris
> >
> 
> Oops, sorry, I let my responses slip off the list.
> 
> You are right about there being a potentially huge difference between
> a cloned git repo and its parent.  I didn't realize it could make
> such a difference.
> 
> This problem now appears to have nothing to do with btrfs.  I can
> replicate the problem on an ext4 partition also if I use a copy of the
> parent git repository instead of a clone.  The problem seems to lie in
> the fragmentation of the git repository.
> 
> If I work with a clone of my linux-btrfs repository, subsequent clones
> are much faster.  Cloning my parent linux-btrfs repo takes about 90
> minutes (when I have restricted free RAM).  Cloning a clone of the
> parent drops down to less than 10 minutes.
> 
> With there being several other threads relating to btrfs 'slow downs',
> I thought this issue might be related.

Great, glad to hear it turned out to be filesystem agnostic.

The original git file format was basically very filesystem unfriendly
and it tends to fragment very badly. 

Linus' solution to this is the pack file format, which is space
efficient and very fast to access.  The only downside is that you need
to repack the repo from time to time or performance tends to fall off a
cliff.

There are git repack and git gc commands that you can use to
restructure things, making the repo both smaller and much faster.
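A quick way to see the repacking Chris describes in action (the throwaway
repository path and commit contents are invented for this example):

```shell
# Build a throwaway repo with a few loose objects, then repack it.
rm -rf /tmp/repack-demo && git init -q /tmp/repack-demo
cd /tmp/repack-demo
for i in 1 2 3; do
    echo "change $i" > file.txt
    git add file.txt
    git -c user.email=demo@example.com -c user.name=demo commit -qm "commit $i"
done

# git gc repacks loose objects and prunes cruft; afterwards the
# objects live in .git/objects/pack/*.pack instead of loose files.
git gc --quiet
ls .git/objects/pack/
```

On a real repository, `git gc` alone is usually enough; `git repack -a -d`
gives finer control over when and how the packs are rebuilt.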

-chris


Re: Btrfs Slowdown (due to Memory Handling?)

2011-08-04 Thread Mitch Harder
On Thu, Aug 4, 2011 at 10:05 AM, Chris Mason  wrote:
> Excerpts from Chris Mason's message of 2011-08-04 11:04:54 -0400:
>> Excerpts from Mitch Harder's message of 2011-08-04 10:45:51 -0400:
>> > On Thu, Aug 4, 2011 at 9:22 AM, Chris Mason  wrote:
>> > > Excerpts from Mitch Harder's message of 2011-08-02 10:35:54 -0400:
>> > >> I'm running into a significant slowdown in Btrfs (> 10x slower than
>> > >> normal) that appears to be due to some issue between how Btrfs is
>> > >> allocating memory, and how the kernel is expecting Btrfs to allocate
>> > >> memory.
>> > >>
>> > >> The problem does seem to be somewhat hardware specific.  I can
>> > >> reproduce on two of my computers (an older AMD Athlon(tm) XP 2600+
>> > >> with PATA, and a newer ACER Aspire netbook with an Atom CPU).  My
>> > >> Core2Duo computer with SATA seems unaffected by this slowdown.
>> > >>
>> > >> I've replicated this on 2.6.38, 2.6.39, and 3.0 kernels.  The
>> > >> following information was all obtained running on a 3.0 kernel merged
>> > >> with the latest 'for-linus' branch of Chris' git repo.  I've also
>> > >> tested on ext4 (no slow down encountered) to make sure the issue
>> > >> wasn't completely unrelated to Btrfs.
>> > >
>> > > Just to double check, what was the top commit of for-linus when you did
>> > > this?
>> > >
>> > > The tracing shows that you're spending your time in mmap'd readahead.
>> > > So one of three things is happening:
>> > >
>> > > 1) The VM is favoring our metadata over data pages for the git packed
>> > > file
>> > >
>> > > 2) We're reading ahead too aggressively, or not aggressively enough
>> > >
>> > > 3) The git pack file is somehow more fragmented, and this is making the
>> > > read ahead much less effective.
>> > >
>> > > The very first thing I'd check is to make sure the .git repo between the
>> > > slow machines and the fast machines are identical.  Git does a lot of
>> > > packing behind the scenes, and so an older repo that isn't freshly
>> > > cloned is going to be slower than a new repo.
>> > >
>> > > -chris
>> > >
>> >
>> > The top commit merged for the kernel used to generate the information
>> > in this post was:
>> >
>> > Btrfs: make sure reserve_metadata_bytes doesn't leak out strange errors
>> > 75c195a2cac2c3c8366c0b87de2d6814c4f4d638
>> >
>> > I have since replicated the slowdown with a kernel merged with the
>> > latest 'for-linus' branch, whose top commit was:
>> > Btrfs: don't call writepages from within write_full_page
>> > 0d10ee2e6deb5c8409ae65b970846344897d5e4e
>>
>> Ok, so I'm going to guess that your problem is really with either file
>> layout or just us using more metadata pages than the others.  The file
>> layout part is easy to test, just replace your git repo with a fresh
>> clone (or completely repack it).
>
> Sorry, I should have said replace your git repo with a fresh,
> non-hardlinked clone.  git clone by default will just make hardlinks if
> it can, so it has to be a fresh clone.
>
> -chris
>

Oops, sorry, I let my responses slip off the list.

You are right about there being a potentially huge difference between
a cloned git repo and its parent.  I didn't realize it could make
such a difference.

This problem now appears to have nothing to do with btrfs.  I can
replicate the problem on an ext4 partition also if I use a copy of the
parent git repository instead of a clone.  The problem seems to lie in
the fragmentation of the git repository.

If I work with a clone of my linux-btrfs repository, subsequent clones
are much faster.  Cloning my parent linux-btrfs repo takes about 90
minutes (when I have restricted free RAM).  Cloning a clone of the
parent drops down to less than 10 minutes.

With there being several other threads relating to btrfs 'slow downs',
I thought this issue might be related.


Re: Btrfs Slowdown (due to Memory Handling?)

2011-08-04 Thread Chris Mason
Excerpts from Mitch Harder's message of 2011-08-02 10:35:54 -0400:
> I'm running into a significant slowdown in Btrfs (> 10x slower than
> normal) that appears to be due to some issue between how Btrfs is
> allocating memory, and how the kernel is expecting Btrfs to allocate
> memory.
> 
> The problem does seem to be somewhat hardware specific.  I can
> reproduce on two of my computers (an older AMD Athlon(tm) XP 2600+
> with PATA, and a newer ACER Aspire netbook with an Atom CPU).  My
> Core2Duo computer with SATA seems unaffected by this slowdown.
> 
> I've replicated this on 2.6.38, 2.6.39, and 3.0 kernels.  The
> following information was all obtained running on a 3.0 kernel merged
> with the latest 'for-linus' branch of Chris' git repo.  I've also
> tested on ext4 (no slow down encountered) to make sure the issue
> wasn't completely unrelated to Btrfs.

Just to double check, what was the top commit of for-linus when you did
this?

The tracing shows that you're spending your time in mmap'd readahead.
So one of three things is happening:

1) The VM is favoring our metadata over data pages for the git packed
file

2) We're reading ahead too aggressively, or not aggressively enough

3) The git pack file is somehow more fragmented, and this is making the
read ahead much less effective.

The very first thing I'd check is to make sure the .git repo between the
slow machines and the fast machines are identical.  Git does a lot of
packing behind the scenes, and so an older repo that isn't freshly
cloned is going to be slower than a new repo.

-chris


Re: Btrfs slowdown

2011-08-03 Thread mck

I can confirm this as well (64-bit, Core i7, single-disk).

> The issue seems to be gone in 3.0.0.

After a few hours working 3.0.0 slows down on me too. The performance
becomes unusable and a reboot is a must. Certain applications
(particularly Evolution and Firefox) are next to permanently greyed out.

I have had a couple of corrupted tree logs recently and had to use
btrfs-zero-log (mentioned in an earlier thread). Otherwise returning to
2.6.38 is the workaround.

~mck

-- 
"A mind that has been stretched will never return to its original
dimension." Albert Einstein
| www.semb.wever.org | www.sesat.no 
| http://tech.finn.no | http://xss-http-filter.sf.net





Btrfs Slowdown (due to Memory Handling?)

2011-08-02 Thread Mitch Harder
I'm running into a significant slowdown in Btrfs (> 10x slower than
normal) that appears to be due to some issue between how Btrfs is
allocating memory, and how the kernel is expecting Btrfs to allocate
memory.

The problem does seem to be somewhat hardware specific.  I can
reproduce on two of my computers (an older AMD Athlon(tm) XP 2600+
with PATA, and a newer ACER Aspire netbook with an Atom CPU).  My
Core2Duo computer with SATA seems unaffected by this slowdown.

I've replicated this on 2.6.38, 2.6.39, and 3.0 kernels.  The
following information was all obtained running on a 3.0 kernel merged
with the latest 'for-linus' branch of Chris' git repo.  I've also
tested on ext4 (no slow down encountered) to make sure the issue
wasn't completely unrelated to Btrfs.

The steps to reproduce are as follows:
Prerequisite:  Have a btrfs partition with a copy of a linux kernel
git repository stored.
(1)  Boot with 768 MB RAM (using 'mem=768M' in the grub command line).
(2)  From a second machine, run a git clone of the kernel git
repository (such as 'git clone
ssh://@/path/to/linux-git-repo').

The clone process slows down when it reaches the 'remote: Compressing
objects:' step.  Looking at the Alt-SysRq-W output and Latencytop
output (see attached), I get a steady stream of memory page faults,
and other memory issues.

The git clone is definitely causing memory pressure when booted with
only 768MB of RAM.  However, I still see plenty of cached RAM
available, and there is little or no activity on my swap partition.

The dmesg output is otherwise silent except for the Alt-SysRq-W
output.  No OOM errors.

A typical 'top' snapshot during the affected period looks like this:

top - 08:53:08 up 32 min,  3 users,  load average: 1.06, 1.01, 0.84
Tasks: 104 total,   1 running, 103 sleeping,   0 stopped,   0 zombie
Cpu(s):  2.3%us, 12.3%sy,  0.0%ni,  0.0%id, 85.1%wa,  0.0%hi,  0.3%si,  0.0%st
Mem:    768452k total,   760248k used,     8204k free,     4396k buffers
Swap:  1004056k total,    13824k used,   990232k free,   352596k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 2876 root      20   0     0    0    0 S 11.0  0.0   1:26.62 btrfs-endio-1
 3117 dontpani  20   0  720m 386m  52m D  4.0 51.5   2:38.78 git
  526 root      20   0     0    0    0 S  0.3  0.0   0:06.42 kswapd0
 2576 root      20   0     0    0    0 S  0.3  0.0   0:44.09 btrfs-endio-0
    1 root      20   0  1844  568  540 S  0.0  0.1   0:00.32 init
    2 root      20   0     0    0    0 S  0.0  0.0   0:00.00 kthreadd
    3 root      20   0     0    0    0 S  0.0  0.0   0:00.00 ksoftirqd/0
    5 root      20   0     0    0    0 S  0.0  0.0   0:00.01 kworker/u:0
    6 root      -2   0     0    0    0 S  0.0  0.0   0:04.17 rcu_kthread
    7 root       0 -20     0    0    0 S  0.0  0.0   0:00.00 cpuset
    8 root       0 -20     0    0    0 S  0.0  0.0   0:00.00 khelper

So, while I may be truly running out of RAM, the kernel doesn't seem
to be handling the issue normally (i.e., pushing more off to swap or
giving OOM errors).

Let me know if you have some feedback on how to track this issue down.
=== Mon Aug  1 14:24:05 2011
Globals: Cause                          Maximum     Percentage
Page fault                            189.6 msec    100.0 %
Process details:
Process kworker/0:1 (395)    Total:    27.6 msec
  .                                     4.9 msec    100.0 %
    worker_thread kthread kernel_thread_helper
Process kswapd0 (526)        Total:    11.5 msec
  kswapd() kernel thread                3.6 msec    100.0 %
    kswapd kthread kernel_thread_helper
Process btrfs-endio-0 (2567) Total:   878.4 msec
  [worker_loop]                         5.0 msec    100.0 %
    worker_loop kthread kernel_thread_helper
Process btrfs-endio-1 (2768) Total:     1.1 msec
  [worker_loop]                         1.1 msec    100.0 %
    worker_loop kthread kernel_thread_helper
Process git (2769)           Total:  1117.1 msec
  Page fault                          189.6 msec    100.0 %
    sleep_on_page_killable wait_on_page_bit_killable
    __lock_page_or_retry filemap_fault __do_fault handle_pte_fault
    handle_mm_fault do_page_fault error_code
=== Mon Aug  1 14:24:15 2011
Globals: Cause                          Maximum     Percentage
Page fault                            388.9 msec     98.0 %
Creating block layer request           74.6 msec      0.8 %
Reading from file                      74.1 msec      0.8 %
[sleep_on_page]                        37.9 msec      0.4 %
Waiting for event (poll)                1.8 msec      0.0 %
Waiting for event (select)              1.7 msec      0.0 %
Process details:
Process sync_supers (259)    Total:     0.5 msec
  Waiting for buffer IO to complete     0.3 msec    100.0 %
    sleep_on_buffer __wait_on_buffer flush_commit_list
    do_journal_end.clone.32 journal_end_sync reiserfs_sync_fs
    reiserfs_write_super sync_supers bdi_sync_supers kthread
    kernel_thread_helper
Process kworker/0:1 (395)    Total:     0.2 msec
  .

Re: Btrfs slowdown

2011-07-28 Thread Sage Weil
On Thu, 28 Jul 2011, Christian Brunner wrote:
> When I look at the latencytop results, there is a high latency when
> calling "btrfs_commit_transaction_async". Isn't "async" supposed to
> return immediately?

It depends.  That function has to block until the commit has started 
before returning in the case where it creates a new btrfs root (i.e., 
snapshot creation).  Otherwise a subsequent operation (after the ioctl 
returns) can sneak in before the snapshot is taken.  (IIRC there was also 
another problem with keeping internal structures consistent, tho I'm 
forgetting the details.)  And there are a bunch of things 
btrfs_commit_transaction() does before setting blocked = 1 that can be 
slow.  There is a fair bit of transaction commit optimization work that 
should eventually be done here that we sadly haven't had the resources to 
look at yet.

sage


Re: Btrfs slowdown

2011-07-28 Thread Christian Brunner
2011/7/28 Marcus Sorensen :
> Christian,
>
> Have you checked up on the disks themselves and hardware? High
> utilization can mean that the i/o load has increased, but it can also
> mean that the i/o capacity has decreased.  Your traces seem to
> indicate that a good portion of the time is being spent on commits,
> that could be waiting on disk. That "wait_for_commit" looks to
> basically just spin waiting for the commit to complete, and at least
> one thing that calls it raises a BUG_ON, not sure if it's one you've
> seen even on 2.6.38.
>
> There could be all sorts of performance-related reasons that aren't
> specific to btrfs or ceph. On our various systems we've seen things
> like the RAID card module being upgraded in newer kernels and our
> disks suddenly going into sleep mode after a bit, dirty_ratio causing
> multiple gigs of memory to sync because it's not optimized for the
> workload, external SAS enclosures that stop communicating a few days
> after reboot (but the disks keep working, with sporadic issues),
> patrol read hitting a bad sector on a disk and causing it to go into
> enhanced error recovery and stop responding, etc.

I'm fairly confident that the hardware is OK. We see the problem on
four machines. It could be a problem with the hpsa driver/firmware,
but we haven't seen this behavior with 2.6.38, and the changes in the
hpsa driver are not that big.

> Maybe you have already tried these things; it's where I would start
> anyway. Look at /proc/meminfo (dirty, writeback, swap, etc.) both
> while the system is functioning desirably and when it's misbehaving.
> Look at anything else that might be in D state. Look not just at
> disk util, but at the workload causing it (e.g., was I doing 300 IOPS
> previously with an average size of 64k, and am I now only managing 50
> IOPS at 64k before disk util reports 100%?). Test the system in a
> filesystem-agnostic manner: when performance is bad through btrfs, is
> it the same as you got on a fresh boot when testing IOPS on /dev/sdb
> or whatever? You're not by chance swapping, after a bit of uptime, on
> any volume that's shared with the underlying disks that make up your
> OSD, obfuscated by a hardware RAID? I didn't see the kernel warning
> you're referring to, just the ixgbe malloc failure you mentioned the
> other day.

I've looked at most of this. What makes me point to btrfs is that the
problem goes away when I reboot one server in our cluster, but persists
on the other systems. So it can't be related to the number of requests
that come in.

> I do not mean to presume that you have not looked at these things
> already. I am not very knowledgeable in btrfs specifically, but I
> would expect any degradation in performance over time to be due to
> what's on disk (lots of small files, fragmented, etc). This is
> obviously not the case in this situation since a reboot recovers the
> performance. I suppose it could also be a memory leak or something
> similar, but you should be able to detect something like that by
> monitoring your memory situation, /proc/slabinfo etc.

It could be related to a memory leak. The machine has a lot of RAM (24
GB), but we have seen page allocation failures in the ixgbe driver
when we are using jumbo frames.
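
Those page allocation failures with jumbo frames are typically high-order allocations failing under memory fragmentation; /proc/buddyinfo shows how many free blocks of each order remain per zone. A sketch for watching it (the file argument exists only so the function is easy to test; order 2 is 16 KiB on 4 KiB pages, roughly what a 9000-byte RX buffer needs):

```shell
# Hedged sketch: print free-block counts per order (columns = order 0, 1, 2, ...)
# for each memory zone. If the order-2-and-up columns drain toward zero over
# uptime, jumbo-frame RX allocations will start failing.
show_buddy() {
    # $1: a buddyinfo-format file; defaults to the live /proc/buddyinfo
    awk '{ out = "zone " $4 ":"
           for (i = 5; i <= NF; i++) out = out " " $i
           print out }' "${1:-/proc/buddyinfo}"
}
[ -r /proc/buddyinfo ] && show_buddy || true
```

Sampling this alongside the slowdowns would show whether fragmentation builds up over the same 2-3 days.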

> Just my thoughts, good luck on this. I am currently running 2.6.39.3
> (btrfs) on the 7 node cluster I put together, but I just built it and
> am comparing between various configs. It will be awhile before it is
> under load for several days straight.

Thanks!

When I look at the latencytop results, there is a high latency when
calling "btrfs_commit_transaction_async". Isn't "async" supposed to
return immediately?

Regards,
Christian


Re: Btrfs slowdown

2011-07-27 Thread Marcus Sorensen
Christian,

Have you checked up on the disks themselves and hardware? High
utilization can mean that the i/o load has increased, but it can also
mean that the i/o capacity has decreased.  Your traces seem to
indicate that a good portion of the time is being spent on commits,
that could be waiting on disk. That "wait_for_commit" looks to
basically just spin waiting for the commit to complete, and at least
one thing that calls it raises a BUG_ON, not sure if it's one you've
seen even on 2.6.38.

There could be all sorts of performance-related reasons that aren't
specific to btrfs or ceph. On our various systems we've seen things
like the RAID card module being upgraded in newer kernels and our
disks suddenly going into sleep mode after a bit, dirty_ratio causing
multiple gigs of memory to sync because it's not optimized for the
workload, external SAS enclosures that stop communicating a few days
after reboot (but the disks keep working, with sporadic issues),
patrol read hitting a bad sector on a disk and causing it to go into
enhanced error recovery and stop responding, etc.

Maybe you have already tried these things; it's where I would start
anyway. Look at /proc/meminfo (dirty, writeback, swap, etc.) both
while the system is functioning desirably and when it's misbehaving.
Look at anything else that might be in D state. Look not just at
disk util, but at the workload causing it (e.g., was I doing 300 IOPS
previously with an average size of 64k, and am I now only managing 50
IOPS at 64k before disk util reports 100%?). Test the system in a
filesystem-agnostic manner: when performance is bad through btrfs, is
it the same as you got on a fresh boot when testing IOPS on /dev/sdb
or whatever? You're not by chance swapping, after a bit of uptime, on
any volume that's shared with the underlying disks that make up your
OSD, obfuscated by a hardware RAID? I didn't see the kernel warning
you're referring to, just the ixgbe malloc failure you mentioned the
other day.
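
One quick, filesystem-agnostic way to get the "what's my average request size now vs. before" number is /proc/diskstats, where field 4 is reads completed and field 6 is sectors read (512-byte sectors), cumulative since boot. A sketch; the device-name pattern is an assumption, and for a current-interval figure you'd sample twice and diff:

```shell
# Hedged sketch: since-boot average read request size per disk from
# /proc/diskstats (field 4 = reads completed, field 6 = sectors read).
avg_read_kib() {
    # $1: a diskstats-format file; defaults to the live /proc/diskstats
    awk '$3 ~ /^(sd[a-z]+|vd[a-z]+|nvme[0-9]+n[0-9]+)$/ && $4 > 0 {
        printf "%s: %.1f KiB/read\n", $3, ($6 * 512) / $4 / 1024
    }' "${1:-/proc/diskstats}"
}
[ -r /proc/diskstats ] && avg_read_kib || true
```

Comparing that figure on a freshly booted box against a degraded one would separate "smaller requests" from "slower disk" without involving btrfs at all.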

I do not mean to presume that you have not looked at these things
already. I am not very knowledgeable in btrfs specifically, but I
would expect any degradation in performance over time to be due to
what's on disk (lots of small files, fragmented, etc). This is
obviously not the case in this situation since a reboot recovers the
performance. I suppose it could also be a memory leak or something
similar, but you should be able to detect something like that by
monitoring your memory situation, /proc/slabinfo etc.
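
For the memory side, the cheapest thing to leave running is a periodic sample of the dirty/writeback counters so the numbers can be lined up with the slowdowns later. A sketch (interval and iteration count are arbitrary; the file argument only exists so the function is easy to test):

```shell
# Hedged sketch: log dirty/writeback page counters with a timestamp.
# Run it in the background and compare the log against periods where
# disk utilization spikes.
sample_meminfo() {
    # $1: a meminfo-format file; defaults to the live /proc/meminfo
    awk '/^(Dirty|Writeback|SwapCached):/ { printf " %s %s %s", $1, $2, $3 }' \
        "${1:-/proc/meminfo}"
}
for i in 1 2; do
    printf '%s' "$(date '+%F %T')"
    sample_meminfo
    echo ""
    sleep 1
done
```

A steadily climbing Dirty figure between samples would point at writeback falling behind rather than at the incoming request load.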

Just my thoughts, good luck on this. I am currently running 2.6.39.3
(btrfs) on the 7 node cluster I put together, but I just built it and
am comparing between various configs. It will be awhile before it is
under load for several days straight.

On Wed, Jul 27, 2011 at 2:41 AM, Christian Brunner  wrote:
> 2011/7/25 Chris Mason :
>> Excerpts from Christian Brunner's message of 2011-07-25 03:54:47 -0400:
>>> Hi,
>>>
>>> we are running a ceph cluster with btrfs as its base filesystem
>>> (kernel 3.0). At the beginning everything worked very well, but after
>>> a few days (2-3) things get very slow.
>>>
>>> When I look at the object store servers I see heavy disk I/O on the
>>> btrfs filesystems (disk utilization is between 60% and 100%). I also
>>> did some tracing on the Ceph object store daemon, but I'm quite
>>> certain that the majority of the disk I/O is not caused by ceph or
>>> any other userland process.
>>>
>>> When I reboot the system(s) the problems go away for another 2-3 days,
>>> but after that it starts again. I'm not sure if the problem is
>>> related to the kernel warning I reported last week. At least there
>>> is no temporal relationship between the warning and the slowdown.
>>>
>>> Any hints on how to trace this would be welcome.
>>
>> The easiest way to trace this is with latencytop.
>>
>> Apply this patch:
>>
>> http://oss.oracle.com/~mason/latencytop.patch
>>
>> And then use latencytop -c for a few minutes while the system is slow.
>> Send the output here and hopefully we'll be able to figure it out.
>
> I've now installed latencytop. Attached are two output files: the
> first is from yesterday and was created approximately half an hour after
> boot. The second one is from today; uptime is 19h. The load on the
> system is already rising, and disk utilization is approximately 50%.
>
> Thanks for your help.
>
> Christian
>


Re: Btrfs slowdown

2011-07-25 Thread Chris Mason
Excerpts from Christian Brunner's message of 2011-07-25 03:54:47 -0400:
> Hi,
> 
> we are running a ceph cluster with btrfs as its base filesystem
> (kernel 3.0). At the beginning everything worked very well, but after
> a few days (2-3) things get very slow.
> 
> When I look at the object store servers I see heavy disk I/O on the
> btrfs filesystems (disk utilization is between 60% and 100%). I also
> did some tracing on the Ceph object store daemon, but I'm quite
> certain that the majority of the disk I/O is not caused by ceph or
> any other userland process.
> 
> When I reboot the system(s) the problems go away for another 2-3 days,
> but after that it starts again. I'm not sure if the problem is
> related to the kernel warning I reported last week. At least there
> is no temporal relationship between the warning and the slowdown.
> 
> Any hints on how to trace this would be welcome.

The easiest way to trace this is with latencytop.

Apply this patch:

http://oss.oracle.com/~mason/latencytop.patch

And then use latencytop -c for a few minutes while the system is slow.
Send the output here and hopefully we'll be able to figure it out.

-chris


Re: Btrfs slowdown

2011-07-25 Thread Jeremy Sanders
Christian Brunner wrote:

> we are running a ceph cluster with btrfs as its base filesystem
> (kernel 3.0). At the beginning everything worked very well, but after
> a few days (2-3) things get very slow.

We get quite a slowdown over time doing rsyncs to different snapshots.
Btrfs seems to go from using several threads in parallel (btrfs-endio-0,1,2,
as shown in top) to just using a single thread (btrfs-delalloc).
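
A cheap way to watch that shift without sitting in top is to count the btrfs kernel threads by name prefix; a sketch (the thread naming assumed here matches 2.6.3x/3.0-era kernels and may differ on yours):

```shell
# Hedged sketch: count btrfs kernel worker threads grouped by name prefix
# (trailing digits stripped), so "several endio threads vs. one delalloc
# thread" shows up directly in the counts.
for f in /proc/[0-9]*/comm; do
    cat "$f" 2>/dev/null
done | sed -n 's/^\(btrfs[a-z-]*\).*/\1/p' | sort | uniq -c
```

Logging that once a minute during the rsync would timestamp exactly when the parallel endio workers give way to the single delalloc thread.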

Jeremy



Re: Btrfs slowdown

2011-07-25 Thread Andrej Podzimek

Just a quick note: the issue seems to be gone in 3.0.0. But that's just a wild
guess based on half an hour without thrashing. :-)

Andrej


Hello,

I can see something similar on the machines I maintain, mostly single-disk 
setups with a 2.6.39 kernel:

1) Heavy and frequent disk thrashing, although less than 20% of RAM is used and 
no swap usage is reported.
2) During the disk thrashing, some processors (usually 2 or 3) spend 100% of 
their time busy waiting, according to htop.
3) Some userspace applications freeze for tens of seconds during the thrashing 
and busy waiting, sometimes even htop itself...

The problem has only been observed on 64-bit multiprocessors (Core i7 laptop 
and Nehalem class server Xeons). A 32-bit multiprocessor (Intel Core Duo) and a 
64-bit uniprocessor (Intel Core 2 Duo class Celeron) do not seem to have any 
issues.

Furthermore, none of the machines had this problem with 2.6.38 and earlier kernels. Btrfs 
"just worked" before 2.6.39. I'll test 3.0 today to see whether some of these 
issues disappear.

Neither ceph nor any other remote/distributed filesystem (not even NFS) runs on 
the machines.

The second problem listed above looks like illegal blocking of a vital spinlock 
during a long disk operation, which freezes some kernel subsystems for an 
inordinate amount of time and causes a number of processors to wait actively 
for tens of seconds. (Needless to say that this is not acceptable on a 
laptop...)

Web browsers (Firefox and Chromium) seem to trigger this issue slightly more 
often than other applications, but I have no detailed statistics to prove this. 
;-)

Two Core i7 class multiprocessors work 100% flawlessly with ext4, although 
their kernel configuration is otherwise identical to the machines that use 
Btrfs.

Andrej


Hi,

we are running a ceph cluster with btrfs as its base filesystem
(kernel 3.0). At the beginning everything worked very well, but after
a few days (2-3) things get very slow.

When I look at the object store servers I see heavy disk I/O on the
btrfs filesystems (disk utilization is between 60% and 100%). I also
did some tracing on the Ceph object store daemon, but I'm quite
certain that the majority of the disk I/O is not caused by ceph or
any other userland process.

When I reboot the system(s) the problems go away for another 2-3 days,
but after that it starts again. I'm not sure if the problem is
related to the kernel warning I reported last week. At least there
is no temporal relationship between the warning and the slowdown.

Any hints on how to trace this would be welcome.

Thanks,
Christian






Re: Btrfs slowdown

2011-07-25 Thread Andrej Podzimek

Hello,

I can see something similar on the machines I maintain, mostly single-disk 
setups with a 2.6.39 kernel:

1) Heavy and frequent disk thrashing, although less than 20% of RAM is 
used and no swap usage is reported.
2) During the disk thrashing, some processors (usually 2 or 3) spend 
100% of their time busy waiting, according to htop.
3) Some userspace applications freeze for tens of seconds during the 
thrashing and busy waiting, sometimes even htop itself...

The problem has only been observed on 64-bit multiprocessors (Core i7 laptop 
and Nehalem class server Xeons). A 32-bit multiprocessor (Intel Core Duo) and a 
64-bit uniprocessor (Intel Core 2 Duo class Celeron) do not seem to have any 
issues.

Furthermore, none of the machines had this problem with 2.6.38 and earlier kernels. Btrfs 
"just worked" before 2.6.39. I'll test 3.0 today to see whether some of these 
issues disappear.

Neither ceph nor any other remote/distributed filesystem (not even NFS) runs on 
the machines.

The second problem listed above looks like illegal blocking of a vital spinlock 
during a long disk operation, which freezes some kernel subsystems for an 
inordinate amount of time and causes a number of processors to wait actively 
for tens of seconds. (Needless to say that this is not acceptable on a 
laptop...)

Web browsers (Firefox and Chromium) seem to trigger this issue slightly more 
often than other applications, but I have no detailed statistics to prove this. 
;-)

Two Core i7 class multiprocessors work 100% flawlessly with ext4, although 
their kernel configuration is otherwise identical to the machines that use 
Btrfs.

Andrej


Hi,

we are running a ceph cluster with btrfs as its base filesystem
(kernel 3.0). At the beginning everything worked very well, but after
a few days (2-3) things get very slow.

When I look at the object store servers I see heavy disk I/O on the
btrfs filesystems (disk utilization is between 60% and 100%). I also
did some tracing on the Ceph object store daemon, but I'm quite
certain that the majority of the disk I/O is not caused by ceph or
any other userland process.

When I reboot the system(s) the problems go away for another 2-3 days,
but after that it starts again. I'm not sure if the problem is
related to the kernel warning I reported last week. At least there
is no temporal relationship between the warning and the slowdown.

Any hints on how to trace this would be welcome.

Thanks,
Christian






Btrfs slowdown

2011-07-25 Thread Christian Brunner
Hi,

we are running a ceph cluster with btrfs as its base filesystem
(kernel 3.0). At the beginning everything worked very well, but after
a few days (2-3) things get very slow.

When I look at the object store servers I see heavy disk I/O on the
btrfs filesystems (disk utilization is between 60% and 100%). I also
did some tracing on the Ceph object store daemon, but I'm quite
certain that the majority of the disk I/O is not caused by ceph or
any other userland process.

When I reboot the system(s) the problems go away for another 2-3 days,
but after that it starts again. I'm not sure if the problem is
related to the kernel warning I reported last week. At least there
is no temporal relationship between the warning and the slowdown.

Any hints on how to trace this would be welcome.

Thanks,
Christian