Re: [pkg-discuss] Code review for 10050 please

Brock Pytlik Wed, 15 Jul 2009 15:50:13 -0700

[email protected] wrote:

On Wed, Jul 15, 2009 at 03:27:05PM -0700, Brock Pytlik wrote:
Brock Pytlik wrote:
[email protected] wrote:
On Wed, Jul 15, 2009 at 03:08:41PM -0700, Brock Pytlik wrote:
Webrev:
http://cr.opensolaris.org/~bpytlik/ips-10050-v1/
Bug:
reading manifests during search should pick an appropriate buffer size
http://defect.opensolaris.org/bz/show_bug.cgi?id=10050
Stupid question:  What was the rationale for the choice of 512 bytes?

Here's the relevant pydoc for open():

open = class file(object)
 |  file(name[, mode[, buffering]]) -> file object
 |
 |  Open a file.  The mode can be 'r', 'w' or 'a' for reading (default),
 |  writing or appending.  The file will be created if it doesn't exist
 |  when opened for writing or appending; it will be truncated when
 |  opened for writing.  Add a 'b' to the mode for binary files.
 |  Add a '+' to the mode to allow simultaneous reading and writing.
 |  If the buffering argument is given, 0 means unbuffered, 1 means line
 |  buffered, and larger numbers specify the buffer size.

I may have misunderstood, but if you're just reading a line then you
probably want set buffering to line buffering (1).  Also, you don't seem
to be supplying any mode arguments to open(), "rb" is usually customary
for reading files.  Do you perform any other operations other than
reading a line out of the file here?

-j
I missed that line buffering was an option to file. Let me try that and I'll get back to you with the results in a moment. I'll add the "rb" as well.
So, I'm not sure what "line buffering" is supposed to do, but here's what it does in practice. It sets the buffering size to 1016 in (nearly?) all cases, which means that the I/O becomes about 5-6M/s instead of 3. I appreciate you catching that option, but it seems like, despite what the docs might suggest, line buffering isn't what we want. To some extent, that makes sense, right? In order to know how big to make the buffer for each line, I'd have to know how long each line is, which suggests I've already looked at the line. But if I've already looked at the line, why not just return the line in the first place? ;) So I'm not sure what problem line buffering is supposed to solve, but it appears that it's not ours :)
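
For readers following along, here is a minimal sketch of the kind of read being discussed: opening a file in "rb" mode with an explicit buffer size and pulling out a single line. The helper name read_line_at, the constant name, and the idea of seeking to a known byte offset are assumptions made for illustration and are not taken from the webrev. The sketch is written for Python 3, where buffering=1 (line buffering) applies only to text mode, so it passes the buffer size directly.

```python
# Assumed name for the value discussed in this thread (512 bytes).
MANIFEST_BUFFER_SIZE = 512


def read_line_at(path, offset, bufsize=MANIFEST_BUFFER_SIZE):
    """Return the line that starts at byte `offset` in the file at `path`.

    Hypothetical helper: the file is opened in binary mode with an
    explicit buffer size, so a short line does not pull in a much
    larger chunk of the file than needed.
    """
    with open(path, "rb", bufsize) as f:
        f.seek(offset)
        return f.readline()
```

A line longer than the buffer is still returned whole; it simply takes more than one underlying read.
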
Ok, thanks for looking at that option.  You'll include the file mode
with the final fix, correct?  I'm still curious how you chose 512,
though.  I've come across ancient code in Solaris that picked constants
that seemed like a good idea in 1984, but were horribly outdated in >
2000.  Are we ever likely to need to read more than 512 bytes at a time
in this situation?

-j

The bug report contains the details, but essentially, I looked at all the packages on my system and took statistics over the length of each line. I also plotted a histogram of the line lengths, and it looks like a fairly sharp distribution (if I remember my statistics terms right) with the biggest peak around 200 characters, and smaller peaks around 80 characters and 275 characters.
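
The measurement described above could be reproduced along these lines. This is only a sketch: the function names and the bin width of 25 are assumptions, and it is not the script used for the bug report.

```python
from collections import Counter


def line_length_histogram(lines, bin_width=25):
    """Count lines per length bin, keyed by the bin's lower bound."""
    counts = Counter()
    for line in lines:
        length = len(line.rstrip("\n"))
        counts[(length // bin_width) * bin_width] += 1
    return dict(sorted(counts.items()))


def fraction_within(lines, limit):
    """Return the fraction of lines whose length is at most `limit`."""
    lengths = [len(line.rstrip("\n")) for line in lines]
    if not lengths:
        return 0.0
    return sum(1 for n in lengths if n <= limit) / len(lengths)
```

Feeding every manifest line through fraction_within(lines, 512) gives a rough answer to how often a 512-byte buffer covers a whole line in one read.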

I agree that picking constants is always an area that's fraught with danger, but my question is whether you think the average length of our actions is going to consistently grow over time? If so, then maybe this is a bad idea. My guess is that by and large, 20 years from now, our actions are probably going to be on the same order of magnitude. If that's not true, we can always come back and tweak this.


Brock
_______________________________________________
pkg-discuss mailing list
[email protected]
http://mail.opensolaris.org/mailman/listinfo/pkg-discuss
