[email protected] wrote:
On Wed, Jul 15, 2009 at 03:27:05PM -0700, Brock Pytlik wrote:
Brock Pytlik wrote:
[email protected] wrote:
On Wed, Jul 15, 2009 at 03:08:41PM -0700, Brock Pytlik wrote:
Webrev:
http://cr.opensolaris.org/~bpytlik/ips-10050-v1/

Bug:
reading manifests during search should pick an appropriate buffer size
http://defect.opensolaris.org/bz/show_bug.cgi?id=10050
Stupid question:  What was the rationale for the choice of 512 bytes?

Here's the relevant pydoc for open():

open = class file(object)
 |  file(name[, mode[, buffering]]) -> file object
| | Open a file. The mode can be 'r', 'w' or 'a' for reading (default),
 |  writing or appending.  The file will be created if it doesn't exist
 |  when opened for writing or appending; it will be truncated when
 |  opened for writing.  Add a 'b' to the mode for binary files.
 |  Add a '+' to the mode to allow simultaneous reading and writing.
 |  If the buffering argument is given, 0 means unbuffered, 1 means line
 |  buffered, and larger numbers specify the buffer size.

I may have misunderstood, but if you're just reading a line then you
probably want set buffering to line buffering (1).  Also, you don't seem
to be supplying any mode arguments to open(), "rb" is usually customary
for reading files.  Do you perform any other operations other than
reading a line out of the file here?

-j
I missed that line buffering was an option to file. Let me try that and I'll get back to you with the results in a moment. I'll add the "rb" as well.
So, I'm not sure what "line buffering" is supposed to do, but here' what it does in practice. It sets the buffering size to 1016 in (nearly?) all cases, which means that the I/O becomes about 5-6M/s instead of 3. I appreciate you catching that option, but it seems like, despite what the docs might suggest, line buffering isn't what we want. To some extent, that makes sense right? In order to know how big to make the buffer for each line, I'd have to know how long each line is, which suggests I've already looked at the line. But if I've already looked at the line, why not just return the line in the first place? ;) So I'm not sure what problem line buffering is supposed to solve, but it appears that it's not ours :)

Ok, thanks for looking at that option.  You'll include the file mode
with the final fix, correct?  I'm still curious how you chose 512,
though.  I've come across ancient code in Solaris that picked constants
that seemed like a good idea in 1984, but were horribly outdated in >
2000.  Are we ever likely to need to read more than 512 bytes at a time
in this situation?

-j
The bug report contains the details, but essentially, I looked at all the packages on my system and took statistics over the length of each line. I also plotted a histogram of the length of lines and it looks like a roughly fairly sharp distribution (if I remember my statistic terms right) with the biggest peak around 200 characters, and smaller peaks around 80 characters and 275 characters.

I agree that picking constants is always an area that's fraught with danger, but my question is whether you think the average length of our actions is going to consistently grow over time? If so, then maybe this is a bad idea. My guess is that by and large, 20 years from now, our actions are probably going to be on the same order of magnitude. If that's not true, we can always come back and tweak this.

Brock
_______________________________________________
pkg-discuss mailing list
[email protected]
http://mail.opensolaris.org/mailman/listinfo/pkg-discuss

Reply via email to