Greetings,
we have a customer who reports difficulties fetching a backup as shown
below.
The platform is wal-e 0.6.6 on Python 2.7 on Ubuntu 11.04, Linux kernel
2.6.38-16-virtual on x86_64.
Does anyone have any clue what might be going on here?
cheers
andrew
-------- Original Message --------
So there is something else I noticed here. When it hangs for a while it
seems one or two processes are stuck. Hitting ctrl-c once will kill the
hanging process and I see the rest of the processes complete. Then a new
set of processes starts up and more data gets transferred. This continues
for a while before getting stuck again. Sometimes it clears up but over
time it needs more intervention. When it "finishes" I am left with about
1/4 of the data that should be there.
On Wed, 29 Jan 2014, client wrote:
Hey guys,
I'm trying some practice recovery runs with WAL-E and have a few questions.
I've created a postgres box with WAL-E installed and have an empty data
directory. So I started by issuing:
envdir /etc/wal-e.d/env wal-e backup-fetch /usr/local/pgsql/data LATEST
I found I needed to increase the user's max file limit but it seems to be
running. However, the database is roughly 250GB in size and over the past 3
hours WAL-E has pulled only 17GB down. It's coming down in waves too. It
will grab 3GB in a few minutes then it seems to hang for a while with the
following from strace repeating:
wait4(29419, 0x7fffa849b8d4, WNOHANG, NULL) = 0
wait4(29420, 0x7fffa849b8d4, WNOHANG, NULL) = 0
wait4(28156, 0x7fffa849b8d4, WNOHANG, NULL) = 0
pipe([17, 21]) = 0
fcntl(17, F_GETFD) = 0
fcntl(17, F_SETFD, FD_CLOEXEC) = 0
fcntl(21, F_GETFD) = 0
fcntl(21, F_SETFD, FD_CLOEXEC) = 0
pipe([3531, 3532]) = 0
fcntl(3531, F_GETFD) = 0
fcntl(3531, F_SETFD, FD_CLOEXEC) = 0
fcntl(3532, F_GETFD) = 0
fcntl(3532, F_SETFD, FD_CLOEXEC) = 0
pipe([3533, 3534]) = 0
fcntl(3533, F_GETFD) = 0
fcntl(3533, F_SETFD, FD_CLOEXEC) = 0
fcntl(3534, F_GETFD) = 0
fcntl(3534, F_SETFD, FD_CLOEXEC) = 0
clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f7444df99f0) = -1 ENOMEM (Cannot allocate memory)
close(3534) = 0
close(3533) = 0
close(17) = 0
close(21) = 0
close(3531) = 0
close(3532) = 0
clock_gettime(CLOCK_MONOTONIC, {92269, 103254399}) = 0
gettimeofday({1391032665, 358075}, NULL) = 0
clock_gettime(CLOCK_MONOTONIC, {92269, 103479291}) = 0
epoll_wait(5, {}, 1704, 1) = 0
clock_gettime(CLOCK_MONOTONIC, {92269, 104771194}) = 0
wait4(28159, 0x7fffa849b8d4, WNOHANG, NULL) = 0
wait4(28160, 0x7fffa849b8d4, WNOHANG, NULL) = 0
wait4(28161, 0x7fffa849b8d4, WNOHANG, NULL) = 0
I don't know why it's getting the memory error, as the server has 8GB of RAM
and 3.5GB cached:
# free -m
             total       used       free     shared    buffers     cached
Mem:          7466       7424         42          0         44       3521
-/+ buffers/cache:       3858       3608
Swap:            0          0          0
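For reference, the "-/+ buffers/cache" row can be re-derived from the "Mem:" row. The following is only a minimal sketch of that arithmetic, using the numbers from the free -m output above; the dictionary and the helper name without_cache are made up for illustration and are not part of wal-e or free.

```python
# Values in MB, copied from the "Mem:" row of the `free -m` output above.
mem = {"total": 7466, "used": 7424, "free": 42, "buffers": 44, "cached": 3521}


def without_cache(mem):
    """Return (used, free) with buffers and page cache treated as reclaimable.

    Hypothetical helper for illustration; it mirrors how the
    "-/+ buffers/cache" row is commonly explained.
    """
    reclaimable = mem["buffers"] + mem["cached"]
    return mem["used"] - reclaimable, mem["free"] + reclaimable


used_by_apps, effectively_free = without_cache(mem)
print("used by applications:        %d MB" % used_by_apps)
print("free plus reclaimable cache: %d MB" % effectively_free)
```

The results land within 1 MB of the 3858 / 3608 that free reports, presumably because free rounds each column from kB separately. Either way, only 42 MB is free outright, and there is no swap configured.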
It will eventually continue and copy another ~3GB chunk of data and then
"hang" again. I was wondering if you had any insight on this.