[email protected] wrote:
On Wed, Jul 15, 2009 at 03:27:05PM -0700, Brock Pytlik wrote:
Brock Pytlik wrote:
[email protected] wrote:
On Wed, Jul 15, 2009 at 03:08:41PM -0700, Brock Pytlik wrote:
Webrev:
http://cr.opensolaris.org/~bpytlik/ips-10050-v1/
Bug:
reading manifests during search should pick an appropriate buffer size
http://defect.opensolaris.org/bz/show_bug.cgi?id=10050
Stupid question: What was the rationale for the choice of 512 bytes?
Here's the relevant pydoc for open():
open = class file(object)
| file(name[, mode[, buffering]]) -> file object
| | Open a file. The mode can be 'r', 'w' or 'a' for reading
(default),
| writing or appending. The file will be created if it doesn't exist
| when opened for writing or appending; it will be truncated when
| opened for writing. Add a 'b' to the mode for binary files.
| Add a '+' to the mode to allow simultaneous reading and writing.
| If the buffering argument is given, 0 means unbuffered, 1 means line
| buffered, and larger numbers specify the buffer size.
I may have misunderstood, but if you're just reading a line then you
probably want set buffering to line buffering (1). Also, you don't seem
to be supplying any mode arguments to open(), "rb" is usually customary
for reading files. Do you perform any other operations other than
reading a line out of the file here?
-j
I missed that line buffering was an option to file. Let me try that
and I'll get back to you with the results in a moment. I'll add the
"rb" as well.
So, I'm not sure what "line buffering" is supposed to do, but here' what
it does in practice. It sets the buffering size to 1016 in (nearly?) all
cases, which means that the I/O becomes about 5-6M/s instead of 3. I
appreciate you catching that option, but it seems like, despite what the
docs might suggest, line buffering isn't what we want. To some extent,
that makes sense right? In order to know how big to make the buffer for
each line, I'd have to know how long each line is, which suggests I've
already looked at the line. But if I've already looked at the line, why
not just return the line in the first place? ;) So I'm not sure what
problem line buffering is supposed to solve, but it appears that it's
not ours :)
Ok, thanks for looking at that option. You'll include the file mode
with the final fix, correct? I'm still curious how you chose 512,
though. I've come across ancient code in Solaris that picked constants
that seemed like a good idea in 1984, but were horribly outdated in >
2000. Are we ever likely to need to read more than 512 bytes at a time
in this situation?
-j
The bug report contains the details, but essentially, I looked at all
the packages on my system and took statistics over the length of each
line. I also plotted a histogram of the length of lines and it looks
like a roughly fairly sharp distribution (if I remember my statistic
terms right) with the biggest peak around 200 characters, and smaller
peaks around 80 characters and 275 characters.
I agree that picking constants is always an area that's fraught with
danger, but my question is whether you think the average length of our
actions is going to consistently grow over time? If so, then maybe this
is a bad idea. My guess is that by and large, 20 years from now, our
actions are probably going to be on the same order of magnitude. If
that's not true, we can always come back and tweak this.
Brock
_______________________________________________
pkg-discuss mailing list
[email protected]
http://mail.opensolaris.org/mailman/listinfo/pkg-discuss