David,
I hate to question what you've said, but are you sure that you were
getting good performance with 2.8.0? Is it possible that you only got
good performance with 2.7.1, and that switching to 2.8.0 (and 2.8.1)
has caused this performance degradation? I ask because (as Rob has
hinted at) we changed the way we manage the size of datafiles in
releases >=2.8.0, and we've seen performance drops for serial, small
file workloads. It's a bug, and we've fixed it in CVS, but you may be
seeing it in your setup.
-sam
On Jul 1, 2009, at 5:57 PM, David Bonnie wrote:
Sam -
All of the nodes checked out fine with netpipe, still no errors on
any of the adapters.
- Dave
On Wed, Jul 1, 2009 at 4:47 PM, Sam Lang <[email protected]> wrote:
On Jul 1, 2009, at 5:45 PM, David Bonnie wrote:
I'll run it on each node and let you know if anything is out of
place. I believe the above results are fine for GigE, yes?
They certainly don't match with the numbers you're getting from PVFS.
-sam
- Dave
On Wed, Jul 1, 2009 at 4:20 PM, Sam Lang <[email protected]> wrote:
David,
It sounds like your initial thought (that there is a network
problem) could be correct. I would probably explore that first.
What sort of numbers do you get from netpipe runs (or even
bmi_pingpong) between client and server?
-sam
On Jul 1, 2009, at 5:15 PM, David Bonnie wrote:
Sorry for not being clear.
The hardware and software are unchanged. Runs from a few months
ago (on 2.8.0) performed as expected. Current runs (on both 2.8.0
and 2.8.1) are slow.
The nodes are sitting there with very low CPU usage even when
running the benchmark. I'm the only one running any jobs and
there aren't any other processes running (the system load is < 0.02
and the CPU usage is essentially 0%).
The local disks haven't changed and are empty except for the pvfs2
storage space; performance is bad even when I put the PVFS2 file
system storage onto a very fast (>300 MB/s local bandwidth) Atrato
vlun connected over Fibre Channel.
My initial thought is that some hardware along the line died but I
can't seem to pinpoint it. All of the network interfaces show 0
errors and 0 dropped packets.
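(For anyone else checking the same thing: here's a small sketch that reads the per-interface error and drop counters straight from Linux sysfs. The helper name is mine, not a tool from the thread, and it assumes a Linux host with /sys mounted.)

```python
import os

def interface_error_counters(sysfs_root="/sys/class/net"):
    """Return {interface: {counter: value}} for the error/drop
    counters exposed by Linux under /sys/class/net/*/statistics."""
    counters = {}
    for name in sorted(os.listdir(sysfs_root)):
        stats = {}
        for key in ("rx_errors", "tx_errors", "rx_dropped", "tx_dropped"):
            path = os.path.join(sysfs_root, name, "statistics", key)
            with open(path) as f:
                stats[key] = int(f.read().strip())
        counters[name] = stats
    return counters

if __name__ == "__main__":
    for name, stats in sorted(interface_error_counters().items()):
        print(name, stats)
```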
- Dave
On Wed, Jul 1, 2009 at 4:10 PM, Rob Ross <[email protected]> wrote:
Hi David,
I still don't get it: when was the performance good? Same software
and hardware, just some time in the past? Or is there a software
change?
The nodes aren't being used for anything else, there are no rogue
processes, and the local file systems are otherwise empty?
Thanks,
Rob
On Jul 1, 2009, at 5:05 PM, David Bonnie wrote:
Rob -
Performance is down across all PVFS2 installations. The benchmark
simply creates files of a random size (between 1 and 25 MB) in a
single folder on the mounted PVFS2 partition, 16 KB at a time.
It's not anywhere near ideal, but it's the workload I'm working
with.
Prior to this problem we were getting ~22 MB/s write throughput
and we're down to about 2.5 MB/s for no apparent reason. Reads
are down from about 55 MB/s to 30 MB/s. No hardware has changed
and as far as I can tell no hardware has died either.
- Dave
On Wed, Jul 1, 2009 at 4:00 PM, Rob Ross <[email protected]> wrote:
Do you mean that 2.8.0 is fast and 2.8.1 is slow? Can you describe
the benchmark and how you are doing your measurements?
Rob
On Jul 1, 2009, at 4:43 PM, David Bonnie wrote:
Hello all -
I'm having trouble figuring out a problem with performance
degradation on a simple 10-node cluster. Prior runs on the
cluster (before this problem manifested itself) resulted in
bandwidth and IOPS about 10 times higher on a small file creation
workload. Each node is running as a metadata server and a data
server.
The problem is persistent between versions and installations of
PVFS2 2.8.0 and 2.8.1. Rebooting all of the nodes didn't improve
anything. The network connections (simple GigE) showed no errors
or dropped packets. Using different physical disks (both SAS and
FC) didn't improve things. The kernel logs didn't show anything
out of place nor did the pvfs2 server or client logs. It seems
like a network issue but I can't seem to find anything wrong with
any of the connections.
Has anyone seen this kind of problem before? I seem to remember
something on the list before about performance suddenly dropping
but I can't find the message now (of course). Any insight would
be appreciated!
Thanks,
- Dave
_______________________________________________
Pvfs2-developers mailing list
[email protected]
http://www.beowulf-underground.org/mailman/listinfo/pvfs2-developers