Re: error allocating core memory buffers (code 22) at util2.c(106) [sender=3.1.2]

2017-04-11 Thread John Hawkinson via rsync
> But at first blush, it appeared that adding - made things hang
> forever.

Yes. Confirmed against the git HEAD (9e7b8ab7cf66ecd152002926a7da61d8ad862522).

Running:

rsync -n -iaHJAX $d $b

Does some initial work and then gets to:

bash-3.2# lldb -p 6458
(lldb) process attach --pid 6458
Process 6458 stopped
* thread #1: tid = 0x1d5ff0, 0x7fff8c2d83fa libsystem_kernel.dylib`__select 
+ 10, queue = 'com.apple.main-thread', stop reason = signal SIGSTOP
frame #0: 0x7fff8c2d83fa libsystem_kernel.dylib`__select + 10
libsystem_kernel.dylib`__select:
->  0x7fff8c2d83fa <+10>: jae0x7fff8c2d8404; <+20>
0x7fff8c2d83fc <+12>: movq   %rax, %rdi
0x7fff8c2d83ff <+15>: jmp0x7fff8c2d3c78; cerror
0x7fff8c2d8404 <+20>: retq   

Executable module set to "/usr/local/bin/rsync".
Architecture set to: x86_64h-apple-macosx.
(lldb) bt
* thread #1: tid = 0x1d5ff0, 0x7fff8c2d83fa libsystem_kernel.dylib`__select 
+ 10, queue = 'com.apple.main-thread', stop reason = signal SIGSTOP
  * frame #0: 0x7fff8c2d83fa libsystem_kernel.dylib`__select + 10
frame #1: 0x00010dd6aedc rsync`perform_io(needed=47, flags=4) + 2924 at 
io.c:742
frame #2: 0x00010dd6a0bd rsync`send_msg(code=MSG_INFO, buf="[receiver] 
receiving flist for dir 1319\n", len=40, convert=0) + 349 at io.c:958
frame #3: 0x00010dd5c765 rsync`rwrite(code=FINFO, buf="[receiver] 
receiving flist for dir 1319\n", len=40, is_utf8=0) + 469 at log.c:279
frame #4: 0x00010dd5d3f3 rsync`rprintf(code=FINFO, format="[%s] 
receiving flist for dir %d\n") + 627 at log.c:435
frame #5: 0x00010dd394a6 rsync`read_ndx_and_attrs(f_in=0, f_out=4, 
iflag_ptr=0x7fff51ed0f64, type_ptr="\x80", buf="", 
len_ptr=0x7fff51ed0f60) + 646 at rsync.c:362
frame #6: 0x00010dd4483b rsync`recv_files(f_in=0, f_out=4, 
local_name=0x) + 379 at receiver.c:542
frame #7: 0x00010dd566c3 rsync`do_recv(f_in=0, f_out=4, 
local_name=0x) + 899 at main.c:909
frame #8: 0x00010dd54ac7 rsync`do_server_recv(f_in=0, f_out=1, argc=1, 
argv=0x7fff51ed2328) + 1207 at main.c:1078
frame #9: 0x00010dd5431e rsync`start_server(f_in=0, f_out=1, argc=2, 
argv=0x7fff51ed2320) + 286 at main.c:1112
frame #10: 0x00010dd541ee rsync`child_main(argc=2, 
argv=0x7fff51ed2320) + 46 at main.c:1085
frame #11: 0x00010dd852ad rsync`local_child(argc=2, 
argv=0x7fff51ed2320, f_in=0x7fff51ed4328, f_out=0x7fff51ed4324, 
child_main=(rsync`child_main at main.c:1084)) + 701 at pipe.c:167
frame #12: 0x00010dd58e42 rsync`do_cmd(cmd=0x, 
machine=0x, user=0x, 
remote_argv=0x7fee40c04a50, remote_argc=0, f_in_p=0x7fff51ed4328, 
f_out_p=0x7fff51ed4324) + 2210 at main.c:543
frame #13: 0x00010dd57fd2 rsync`start_client(argc=1, 
argv=0x7fee40c04a40) + 2402 at main.c:1414
frame #14: 0x00010dd5747b rsync`main(argc=2, argv=0x7fee40c04a40) + 
2555 at main.c:1652
frame #15: 0x7fff96cad5c9 libdyld.dylib`start + 1
frame #16: 0x7fff96cad5c9 libdyld.dylib`start + 1
(lldb) up
frame #1: 0x00010dd6aedc rsync`perform_io(needed=47, flags=4) + 2924 at 
io.c:742
   739  tv.tv_sec = select_timeout;
   740  tv.tv_usec = 0;
   741
-> 742 cnt = select(max_fd + 1, _fds, _fds, _fds, );
   743  
   744 if (cnt <= 0) {
   745  if (cnt < 0 && errno == EBADF) {
(lldb) fin

And then never returns fomr that stack frame, seems to hang out in
select() forever (unless interrupted by a breakpoint or whatnot).


Bugzilla account request submitted.

--jh...@mit.edu
  John Hawkinson

-- 
Please use reply-all for most replies to avoid omitting the mailing list.
To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html


Re: error allocating core memory buffers (code 22) at util2.c(106) [sender=3.1.2]

2017-04-11 Thread John Hawkinson via rsync
Sorry to keep replying to myself:

> Because this is a Time Machine backup, and there were 66 snapshots of a
> 1 TB disk consuming about 1.5 TB, there were a *lot* of hard links. Many
> of directories rather than individual files, so it's a little

Err, whoops? No, I was tired and confused. They are not hard links
to directories, that would screw up the universe. Still, lots of hard
linked files.

> Do I need to run this under lldb and set a breakpoint in expand_item_list()?
> Quick inspection suggests running with - might give some useful output:


Err...

the result of this was it processed a few files for a minute or so and
then hung in select() and consumed no cpu and there was no disk
activity. Unfortunately apparently my clang/lldb workflow was broken
and I didn't have functional debugging symbols (...) and I also lost
the stack trace I thought I had (inadequate scrollback), so I'm not
sure what was going on. But at first blush, it appeared that adding -
made things hang forever.

Removing it, and rerunning, it's now happily trucking along and has
been for the past hour actually doing work.

--jh...@mit.edu
  John Hawkinson

-- 
Please use reply-all for most replies to avoid omitting the mailing list.
To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html


Re: error allocating core memory buffers (code 22) at util2.c(106) [sender=3.1.2]

2017-04-11 Thread John Hawkinson via rsync
> I guess I can turn on core dumps and increase (unlimit completely) the 
> stack size...
> 
> Although it doesn't seem to have segfaulted, so I'm not sure having
> core dumps enabled would have helped?

Indeed. I reran it on the 50 remaining individual directories,
it seems to have made it through 10 before failing again overnight:

66596  20:38:03 rsync -vP -iaHJAX  `cat /tmp/unback`  
/Volumes/platinum-barratry/x/Backups.backupdb/pb3/

...
rsync: unpack_smb_acl: sys_acl_get_info(): Undefined error: 0 (0)
rsync: unpack_smb_acl: sys_acl_get_info(): Undefined error: 0 (0)
...
.fog.a.. 
2015-06-19-072520/platinum-bar2/Users/jhawk/.emacs.d/auto-save-list/.saves-1541-platinum-bar2.local~
ERROR: out of memory in expand_item_list [sender]
rsync error: error allocating core memory buffers (code 22) at util2.c(106) 
[sender=3.1.2]
bash-3.2#  


No segfault, no coredump, no syslogs.

Do I need to run this under lldb and set a breakpoint in expand_item_list()?
Quick inspection suggests running with - might give some useful output:

1680 if (DEBUG_GTE(FLIST, 3)) {
1681 rprintf(FINFO, "[%s] expand %s to %s bytes, did%s move\n",
1682 who_am_i(), desc, big_num(new_size * item_size),
1683 new_ptr == lp->items ? " not" : "");
1684 }
1685 if (!new_ptr)
1686 out_of_memory("expand_item_list");

Although I imagine that output might be voluminous [but maybe not]?

Again, I don't have time to build test cases and reproduce this
carefully, every run is painful and long and slow. But I'd like to do the
responsible thing if someone can tell me what that is.

> p.s.: If I had to start over, I would have spent less time just deleting
> the data and recopying it, rather than trying to fixup the metadata and

Indeed, it's looking like fixing the metadata with rsync is an order
of magnitude slower, even as far as I've gotten. So maybe it's time to find
another method. I don't think fts(3) is optimized any better for large
hardlink farms, so I think maybe I need a homegrown solution? Ug.

--jh...@mit.edu
  John Hawkinson

-- 
Please use reply-all for most replies to avoid omitting the mailing list.
To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html