If you can duplicate this, the first thing I'd look at during a slow replication is the output of "sudo netstat -tanp tcp", to see whether you're bumping up against open socket limits.

On Fri, Jan 24, 2014 at 7:40 AM, Scott Weber <[email protected]> wrote:
> I appreciate the digging, but in the case of the test file we were using, it
> is some text that doesn't have dashes or newlines, mixed with image data
> which are big binary blobs.
>
> So strings that look like mime boundaries aren't likely to be present.
>
> -Scott
>
>
> ----- Original Message -----
> From: Nick North <[email protected]>
> To: "[email protected]" <[email protected]>;
> [email protected]
> Cc:
> Sent: Friday, January 24, 2014 9:28 AM
> Subject: Re: Replication of attachment is extremely slow
>
> On 24 January 2014 15:01, Jens Alfke <[email protected]> wrote:
>
>>
>> On Jan 24, 2014, at 5:06 AM, Nick North <[email protected]> wrote:
>>
>> > I'm not really expecting this problem to be the cause of the slowdown:
>> > the attachment needs to contain a lot of initial prefixes of the MIME
>> > boundary string for things to be really bad.
>>
>> This is on the reading side, where the MIME parser is looking for the
>> boundary string that signals the end of the attachment part?
>> But the boundary string has to appear after a CRLF, so the actual sequence
>> to search for starts with "\r\n--". I'd expect the slowdown to happen only
>> if the data contains a lot of those sequences, not just any old hyphens.
>>
>> (Also, that search is really slow enough to be noticeable?! Doesn't Erlang
>> have a native string-search primitive?)
>>
>> —Jens
>>
>> PS: Maybe we should move this thread to the new replication mailing list :)
>
>
> Copied to the replication list (though not with all the preceding posts
> including, with their top and bottom posting).
>
> I don't have the code in front of me, but what you say about the search
> string sounds right, so apologies for the error. However, that makes things
> worse: the current code searches each 4KB block of the attachment for any
> initial prefix of the boundary sequence. If it finds a prefix, but not the
> whole string, it passes the block up to that point through, and starts
> searching again from about the place where the prefix was found, on the
> remainder of the original block, plus the next 4KB appended to the end. So,
> if the boundary sequence begins with "\r", then every occurrence of "\r"
> will slow it down, by causing boundary sequence searching to start again
> from where it occurs, with a larger piece of attachment to search. "\r" is
> probably more common than "-", making the problem more likely to pop up.
>
> Nick
>
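
For comparison with the behaviour Nick describes, here is a minimal Python sketch of a streaming boundary search that only ever holds back the last len(boundary) - 1 bytes between blocks, so a stray "\r" in the data cannot make the search buffer grow. This is not the CouchDB code (which is Erlang, and which I have not reproduced here); the function name `split_at_boundary` and the example boundary string are hypothetical, and the 4KB block size is taken from the description above.

```python
def split_at_boundary(blocks, boundary):
    """Yield body bytes from `blocks` up to (not including) the first `boundary`.

    `blocks` is any iterable of bytes objects (e.g. 4KB reads from a socket).
    Raises ValueError if the stream ends before the boundary is seen.
    Hypothetical helper for illustration only.
    """
    # The longest piece of the boundary that can straddle two blocks.
    keep = len(boundary) - 1
    buf = b""
    for block in blocks:
        buf += block
        idx = buf.find(boundary)
        if idx != -1:
            if idx:
                yield buf[:idx]
            return
        # No full match yet: everything except the last `keep` bytes
        # cannot be part of a boundary, so pass it through now.
        safe = len(buf) - keep
        if safe > 0:
            yield buf[:safe]
            buf = buf[safe:]
    raise ValueError("boundary not found before end of stream")


if __name__ == "__main__":
    boundary = b"\r\n--abc123"  # made-up boundary for the example
    body = b"\r" * 5000 + b"image\r\n-bytes"
    data = body + boundary + b"trailer"
    blocks = [data[i:i + 4096] for i in range(0, len(data), 4096)]
    assert b"".join(split_at_boundary(blocks, boundary)) == body
```

With this approach the buffer never exceeds one block plus len(boundary) - 1 bytes, regardless of how many partial prefixes of the boundary appear in the attachment.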
