Greetings,

We have a customer who reports difficulties fetching a backup, as shown below.

The platform is wal-e 0.6.6 on Python 2.7 on Ubuntu 11.04, Linux kernel 2.6.38-16-virtual on x86_64.

Does anyone have any clue what might be going on here?

cheers

andrew


 -------- Original Message --------

So there is something else I noticed here.  When it hangs for a while, it
seems one or two processes are stuck.  Hitting ctrl-c once will kill the
hanging process, and I see the rest of the processes complete.  Then a new
set of processes starts up and more data gets transferred.  This continues
for a while before getting stuck again.  Sometimes it clears up, but over
time it needs intervention.  When it "finishes" I am left with about
1/4 of the data that should be there.


On Wed, 29 Jan 2014, client wrote:

Hey guys,

I'm trying some practice recovery runs with WAL-E and have a few questions.
I've created a postgres box with WAL-E installed and have an empty data
directory.  So I started by issuing:

envdir /etc/wal-e.d/env wal-e backup-fetch /usr/local/pgsql/data LATEST

I found I needed to increase the user's max file limit, but it seems to be
running.  However, the database is roughly 250GB in size and over the past 3
hours WAL-E has pulled down only 17GB.  It's coming down in waves too.  It
will grab 3GB in a few minutes, then it seems to hang for a while with the
following from strace repeating:

wait4(29419, 0x7fffa849b8d4, WNOHANG, NULL) = 0
wait4(29420, 0x7fffa849b8d4, WNOHANG, NULL) = 0
wait4(28156, 0x7fffa849b8d4, WNOHANG, NULL) = 0
pipe([17, 21])                          = 0
fcntl(17, F_GETFD)                      = 0
fcntl(17, F_SETFD, FD_CLOEXEC)          = 0
fcntl(21, F_GETFD)                      = 0
fcntl(21, F_SETFD, FD_CLOEXEC)          = 0
pipe([3531, 3532])                      = 0
fcntl(3531, F_GETFD)                    = 0
fcntl(3531, F_SETFD, FD_CLOEXEC)        = 0
fcntl(3532, F_GETFD)                    = 0
fcntl(3532, F_SETFD, FD_CLOEXEC)        = 0
pipe([3533, 3534])                      = 0
fcntl(3533, F_GETFD)                    = 0
fcntl(3533, F_SETFD, FD_CLOEXEC)        = 0
fcntl(3534, F_GETFD)                    = 0
fcntl(3534, F_SETFD, FD_CLOEXEC)        = 0
clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f7444df99f0) = -1 ENOMEM (Cannot allocate memory)
close(3534)                             = 0
close(3533)                             = 0
close(17)                               = 0
close(21)                               = 0
close(3531)                             = 0
close(3532)                             = 0
clock_gettime(CLOCK_MONOTONIC, {92269, 103254399}) = 0
gettimeofday({1391032665, 358075}, NULL) = 0
clock_gettime(CLOCK_MONOTONIC, {92269, 103479291}) = 0
epoll_wait(5, {}, 1704, 1)              = 0
clock_gettime(CLOCK_MONOTONIC, {92269, 104771194}) = 0
wait4(28159, 0x7fffa849b8d4, WNOHANG, NULL) = 0
wait4(28160, 0x7fffa849b8d4, WNOHANG, NULL) = 0
wait4(28161, 0x7fffa849b8d4, WNOHANG, NULL) = 0
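
In a long trace like the one above, the single failing call is easy to miss
among the successful ones.  The following is a minimal sketch, not part of
WAL-E, for picking failed system calls out of pasted strace text.  The function
name failed_syscalls and the two regular expressions are my own assumptions
about the line layout shown here; it tolerates a call whose arguments wrap
onto a second line, as can happen when a trace is pasted into mail.

```python
import re

# Tail of a failed strace line, e.g. "= -1 ENOMEM (Cannot allocate memory)".
FAILED = re.compile(r"=\s*-1\s+(E[A-Z0-9]+)\s+\((.*)\)\s*$")
# Syscall name at the start of a line, e.g. "clone(".
CALL = re.compile(r"^\s*([a-z_0-9]+)\(")


def failed_syscalls(trace_text):
    """Return (syscall, errno_name, message) for each failed call."""
    failures = []
    current_call = None
    for line in trace_text.splitlines():
        started = CALL.match(line)
        if started:
            # Remember the most recent call so a wrapped continuation
            # line can still be attributed to it.
            current_call = started.group(1)
        failed = FAILED.search(line)
        if failed:
            failures.append((current_call, failed.group(1), failed.group(2)))
    return failures


# Excerpt of the trace quoted in this message.
sample = """\
pipe([3533, 3534])                      = 0
fcntl(3534, F_SETFD, FD_CLOEXEC)        = 0
clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD,
child_tidptr=0x7f7444df99f0) = -1 ENOMEM (Cannot allocate memory)
close(3534)                             = 0
"""

print(failed_syscalls(sample))
```

On the excerpt it reports only the clone() call, with ENOMEM.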

I don't know why it's getting the memory error, as the server has 8GB of RAM
and 3.5GB cached:

# free -m
            total       used       free     shared    buffers     cached
Mem:          7466       7424         42          0         44       3521
-/+ buffers/cache:       3858       3608
Swap:            0          0          0
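
As a sanity check on how the "-/+ buffers/cache" row relates to the "Mem:" row,
here is a small sketch that recomputes it.  The helper name
buffers_cache_adjusted is hypothetical, and the column order is assumed to be
the one in the header above (total, used, free, shared, buffers, cached).

```python
def buffers_cache_adjusted(mem_line):
    """From the 'Mem:' line of `free -m`, return a tuple of
    (used excluding buffers/cache, free including buffers/cache) in MB."""
    fields = mem_line.split()
    # Assumed column order: total used free shared buffers cached
    total, used, free, shared, buffers, cached = (int(x) for x in fields[1:7])
    used_by_apps = used - buffers - cached
    free_plus_cache = free + buffers + cached
    return used_by_apps, free_plus_cache


mem = "Mem:          7466       7424         42          0         44       3521"
print(buffers_cache_adjusted(mem))
```

This gives 3859 and 3607 MB, each 1 MB off the 3858 and 3608 that free
printed, presumably because free converts each column to megabytes separately.
These figures only describe free pages and page cache; they do not by
themselves show why clone() returned ENOMEM.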

It will eventually continue and copy another ~3GB chunk of data and then
"hang" again.  I was wondering if you had any insight on this.
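
For scale, the figures quoted earlier (17GB of roughly 250GB pulled in 3 hours)
can be turned into an average rate and a rough time to completion.  This is
simple arithmetic under the assumption that the average rate stays the same;
the function name estimate_remaining_hours is only illustrative.

```python
def estimate_remaining_hours(total_gb, done_gb, elapsed_hours):
    """Return (average GB per hour, hours left at that average rate)."""
    rate = float(done_gb) / elapsed_hours
    remaining = (total_gb - done_gb) / rate
    return rate, remaining


rate, remaining = estimate_remaining_hours(250, 17, 3)
print("%.1f GB/h, about %.0f hours remaining" % (rate, remaining))
```

At that pace the fetch would need about 41 more hours, at roughly 5.7 GB per
hour.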





