I don't think I explained this well enough, but I'm fairly confident that the "safe_strcpy_fn" error is a red herring. That single strcpy error may be a separate issue that can occur under low-memory conditions, but I don't understand how it could be related. To clarify what I was doing:
-Attempt to copy 800GB from one OF system to another OF system via CIFS, from an intermediate machine using Robocopy
-Target system is configured to take snapshots at the 0/24 hr mark
-Snapshots configured to allow up to 500GB of the 1TB vol
-Old OF system has 100Mbit so migration expected to take some time
-Began copy mid-evening
-Checked in about 36hrs later and the system running the robocopy process showed very little network IO (1-2 MB/s total in/out, but still copying data)
-Robocopy log shows no unusual errors
-Checked source OF system and it was functioning normally; access to files was quick from the robocopy management station
-Checked target system and the share was very slow to respond
-Navigated to the NetBIOS name of the target system and it showed 2 shares: my "usr" share and the snapshot share as mentioned below
-I navigate to the snapshot share from the robocopy management station and it responds several minutes later
-I attempt to copy a small file from the snapshot share to the local desktop and it doesn't appear to respond
-SSH into target system: obvious sluggishness, box is under heavy "disk wait" with load at 6+
-Attempt to run "ps auxww > process.log" from the SSH session
-Navigation to target system "usr" share from robocopy management station finally returns within Windows Explorer after 3-5 minutes
-I attempt to copy a small file into the "usr" share
-The SSH session with the target system never returns from the ps command
-An error appears on-screen on the robocopy management station stating the error from my previous email (apparently truncated from the message below; I will try to find it if needed, but it was something to the effect of "name too long")
 - CHANGE: I now suspect that, because of the 1-2 minute delay in everything happening on the system, the error I received came from my attempt to navigate to the snapshot share and copy a file from there to the robocopy management station, which only returned with the "name too long" error several minutes later. I suspect that by this point the box was in fact dead, and that the error was not due to my attempt to copy the small file to the "usr" share
-I connect a monitor to the target system and the out of memory (OOM) killer output is spewing to the screen about once every second or two
-System is non-responsive to any input and disks are no longer being accessed, judging by the HD activity lights
-I hard reset the system and when it comes back up it appears on the network and I can copy test files to/from it
-Check messages log and it is filled with LDAP errors, and at about the time of the final finale there was that strcpy error
-Emailed this list
-Decided to see if it was just a fluke, so without clearing out the data on the target volume I started the robocopy session again with the /PURGE switch (robocopy /S /E /COPY:DAT /DCOPY:T /NP /PURGE > migration.log) and opened an SSH session to monitor it
-As before, the network traffic on the robocopy management system was solidly saturating the network at 80%/80% of its incoming/outgoing rate of 100Mbit
-Several hours later I check in and I think I was lucky to catch the system nearer the beginning of this failure scenario...
-"top" command on SSH session shows box now under load of around 4 with IOWait maxed out
-After a few minutes, IOWait would drop to normal rates and SMBD would then run at around 12-15% CPU
-After a few minutes or less, IOWait would instantly rise to maximum again and all other processing load would drop to zero
-Checking back at the robocopy station, this "on/off" copy scenario is evident in the simple Windows Taskmgr graph of the network IO... dropping to zero for several minutes, then returning to the typical saturated 80Mbit for a short time

You can see the result a bit more clearly in the screenshots here:
http://www.se30.com/users/grindingbassline/pix/storage

Since you certainly understand your system better than I do, I'll try renaming the snapshot and let you know the result, but I doubt it will have any impact, since the snapshot has nothing to do with what I'm doing beyond its mere existence on the system.

-=dave

----------------------------------------
> Date: Sat, 3 Feb 2007 13:35:14 +0000
> From: [EMAIL PROTECTED]
> To: [EMAIL PROTECTED]
> CC: [email protected]
> Subject: Re: [OF-users] OOM crash
>
> dave johnson wrote:
> > I checked the messages and found an asortment of ldap secrets errors
> > (not sure about that) but the only entry i found that seemed suspect was:
> >
> > Feb 1 22:28:38 hopper smbd[5714]: [2007/02/01 22:28:38, 0] lib/util_str.c:safe_strcpy_fn(603)
> > Feb 1 22:28:41 hopper smbd[5714]: ERROR: string overflow by 1 (24 - 23) in safe_strcpy [snapshots.vg0.vol0.sched0.usr 2007-01-31 00.00.06
>
> Rename that share in smb.conf to just [snapshots-] and
> try again.
>
> R.
>
> > F
> >
> > bringing back up, snapshot report shows:
> >
> > *Snapshot name*: sched0
> > *Date/time taken*: January 31, 2007 00:00:06
> > *Block utilization (in MB)*: 181030
> > *Snapshot size (in MB)*: 524288
> > *Share contents*: Yes, do
> > *Save*: N/A
> > *Delete snapshot*: N/A
> >
> > want me to try again with a "ps auxww" process logging all output ?
> >
> > -=dave

_______________________________________________
Openfiler-users mailing list
[email protected]
https://lists.openfiler.com/mailman/listinfo/openfiler-users
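
As a sanity check on the numbers in the message above (800GB to move, a 100Mbit link running at roughly 80%, and a snapshot report showing 181030 MB used of 524288 MB), here is a rough back-of-the-envelope sketch in Python. The function names are made up for illustration and decimal units (1 GB = 1000 MB) are assumed; it only restates the arithmetic and is not a measurement.

```python
# Rough estimate only. Figures come from the message above; the function
# names are hypothetical and decimal units (1 GB = 1000 MB) are assumed.

def transfer_hours(total_gb: float, link_mbit: float, utilization: float) -> float:
    """Hours needed to move total_gb over a link at the given utilization."""
    mbytes_per_sec = link_mbit * utilization / 8  # megabits/s -> megabytes/s
    seconds = total_gb * 1000 / mbytes_per_sec
    return seconds / 3600


def percent_used(used_mb: float, size_mb: float) -> float:
    """Share of the snapshot reserve consumed, as a percentage."""
    return 100.0 * used_mb / size_mb


if __name__ == "__main__":
    # 800GB over 100Mbit at 80% utilization
    print(f"copy time: {transfer_hours(800, 100, 0.80):.1f} h")
    # snapshot block utilization vs. snapshot size from the report
    print(f"snapshot reserve used: {percent_used(181030, 524288):.1f} %")
```

At a sustained 80Mbit the whole copy would need roughly 22 hours, so a copy still crawling along at 1-2 MB/s after 36 hours fits the stalls described above rather than a merely slow link. The snapshot was at about a third of its reserve when the report was taken.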

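
Since the "ps auxww > process.log" attempt never returned, one alternative is to start a small sampler before the copy begins, so that something is already on disk when the box stops responding. Below is a minimal sketch, assuming a Linux /proc filesystem and a Python 3 interpreter on the target (neither is confirmed for this OF box). The file name oom-watch.log, the 10 second interval and all function names are hypothetical.

```python
import os
import time


def parse_loadavg(text: str) -> float:
    """Return the 1-minute load average from /proc/loadavg content."""
    return float(text.split()[0])


def parse_meminfo(text: str) -> dict:
    """Map /proc/meminfo keys to their numeric values (kB for most keys)."""
    info = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        fields = rest.split()
        if fields:
            info[key.strip()] = int(fields[0])
    return info


def sample() -> str:
    """Build one timestamped log line from the current /proc values."""
    with open("/proc/loadavg") as f:
        load = parse_loadavg(f.read())
    with open("/proc/meminfo") as f:
        mem = parse_meminfo(f.read())
    stamp = time.strftime("%Y-%m-%d %H:%M:%S")
    return (f"{stamp} load1={load} "
            f"MemFree={mem.get('MemFree')}kB SwapFree={mem.get('SwapFree')}kB")


def run(path: str = "oom-watch.log", interval: float = 10.0) -> None:
    """Append a sample every `interval` seconds until interrupted."""
    with open(path, "a") as log:
        while True:
            log.write(sample() + "\n")
            log.flush()
            os.fsync(log.fileno())  # push each line to disk before sleeping
            time.sleep(interval)
```

Calling run() loops until it is interrupted. It would probably be wise to point the log at a device other than the one under IO wait, since fsync on a stalled disk can itself block.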