On 18/9/2013 10:36, Roy Smith wrote:

>> Dave Angel <da...@davea.name> wrote (and I agreed with):
>>> I'd suggest you open the file twice, and get two file objects.  Then you
>>> can iterate over them independently.
>
>
> On Sep 18, 2013, at 9:09 AM, Oscar Benjamin wrote:
>> There's no need to use OS resources by opening the file twice or to
>> screw up the IO caching with seek().
>
> There's no reason NOT to use OS resources.  That's what the OS is there for; 
> to make life easier on application programmers.  Opening a file twice costs 
> almost nothing.  File descriptors are almost as cheap as whitespace.
>
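For reference, a minimal sketch of the two-file-object idea, written for Python 3 rather than the Python 2 of the code quoted in this thread. The function name, the "*" marker and the three-line look-ahead are assumptions borrowed from the quoted example, not stated requirements. It uses readline() rather than iteration so that tell() stays usable, and it assumes both handles are opened the same way so a position taken from one is meaningful to the other.

```python
def preview_two_handles(path, marker="*", lookahead=3):
    """Collect every line of the file; after each line containing `marker`,
    also collect an indented preview of the next `lookahead` lines."""
    out = []
    # Two independent file objects on the same file: `outer` drives the
    # main loop, `inner` is only used for looking ahead.
    with open(path) as outer, open(path) as inner:
        while True:
            line = outer.readline()
            if not line:          # empty string means end of file
                break
            out.append(line)
            if marker in line:
                # Move the second handle to just past the current line.
                # `outer` itself is never repositioned.
                inner.seek(outer.tell())
                for _ in range(lookahead):
                    nxt = inner.readline()
                    if not nxt:
                        break
                    out.append("    " + nxt)
    return out
```

Only the look-ahead handle is ever repositioned, so the main read loop runs straight through the file.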
>> Peter's version holds just as many lines as is necessary in an
>> internal Python buffer and performs the minimum possible
>> amount of IO.
>
> I believe by "Peter's version", you're talking about:
>
>> from itertools import islice, tee 
>> 
>> with open("tmp.txt") as f: 
>>     while True: 
>>         for outer in f: 
>>             print outer, 
>>             if "*" in outer: 
>>                 f, g = tee(f) 
>>                 for inner in islice(g, 3): 
>>                     print "   ", inner, 
>>                 break 
>>         else: 
>>             break 
>
>
> There's this note from 
> http://docs.python.org/2.7/library/itertools.html#itertools.tee:
>
>> This itertool may require significant auxiliary storage (depending on how 
>> much temporary data needs to be stored). In general, if one iterator uses 
>> most or all of the data before another iterator starts, it is faster to use 
>> list() instead of tee().
>
>
> I have no idea how that interacts with the pattern above where you call tee() 
> serially.  You're basically doing
>
> with open("my_file") as f:
>     while True:
>         f, g = tee(f)
>
> Are all of those g's just hanging around, eating up memory, while waiting to 
> be garbage collected?  I have no idea.  But I do know that no such problems 
> exist with the two file descriptor versions.
>
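On whether the g's pile up: one rough way to check is to feed tee() objects that can be weakly referenced and count how many are still alive after many rounds of the serial pattern. This is only a sketch in Python 3. Line, live_items_after_serial_tee and the round count are names I made up for the experiment, and whatever it shows is CPython behaviour as I understand it, not something the language guarantees.

```python
import gc
import weakref
from itertools import islice, tee


class Line(object):
    """Stand-in for a line of text (plain str cannot be weakly referenced)."""
    def __init__(self, n):
        self.n = n


def live_items_after_serial_tee(rounds, lookahead=3):
    """Run the serial `f, g = tee(f)` pattern and report
    (items produced, items still alive) while f is still in use."""
    refs = []

    def source():
        n = 0
        while True:
            item = Line(n)
            refs.append(weakref.ref(item))  # track without keeping alive
            yield item
            n += 1

    f = source()
    g = None
    for _ in range(rounds):
        next(f)                       # the "outer" line
        f, g = tee(f)                 # rebinding g drops the previous g
        list(islice(g, lookahead))    # look ahead, then discard the items
    del g
    gc.collect()
    live = sum(1 for r in refs if r() is not None)
    return len(refs), live


total, live = live_items_after_serial_tee(1000)
print("produced:", total, "still alive:", live)
```

Reasoning it through, I would expect the number still alive to stay small and not grow with the number of rounds, since each old g is released when the name is rebound and tee() only has to keep what the slowest surviving iterator has not consumed yet. Other Python implementations may behave differently.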
>> I would expect this to be more
>> efficient as well as less error-prone on Windows.
>> 
>> 
>> Oscar
>> 
>
>
> ---
> Roy Smith
> r...@panix.com

And if you're willing to ignore the possibility that the text file has
Unix line endings, I'm willing to ignore the possibility that the text
file has a huge number of lines.  Everything is MUCH simpler if one
assumes readlines() will work.  Most of these other approaches are much
more complex than the OP probably needs, if he ever gets around to
actually describing his requirements.
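
To show what I mean by simpler, here is a minimal readlines()-style sketch in Python 3. It assumes the requirement is what the quoted tee() code appears to do: print every line, and after any line containing "*" also print an indented preview of the next three lines. That requirement is my guess, and the function name and parameters are made up for illustration.

```python
def preview_after_marker(lines, marker="*", lookahead=3):
    """Return the lines to print: every input line, plus an indented
    preview of the next `lookahead` lines after each line with `marker`."""
    out = []
    for i, line in enumerate(lines):
        out.append(line)
        if marker in line:
            # Slicing past the end of a list is safe; near the end of the
            # file it simply yields fewer than `lookahead` lines.
            out.extend("    " + nxt for nxt in lines[i + 1:i + 1 + lookahead])
    return out
```

With a real file it would be fed something like f.readlines() from open("tmp.txt"), and the result written out with sys.stdout.writelines(). The whole file sits in memory, which is exactly the trade-off I said I was willing to make.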

BTW, please post in plain text; all that HTML is really annoying.


-- 
DaveA


-- 
https://mail.python.org/mailman/listinfo/python-list
