On Sun, Jul 18, 2010 at 05:49, Steven D'Aprano <st...@pearwood.info> wrote:
> On Sun, 18 Jul 2010 08:30:05 pm Richard D. Moores wrote:
>
>> > Taking the string '555', you should get two digraphs: 55_ and _55.
>>
>> That seems wrong to me. When I search on '999999' and there's a
>> '9999999' I don't want to think I've found 2 instances of '999999'.
>> But that's just my preference. Instances should be distinct, IMO,
>> and not overlap.
>
> I think we're talking about different things here.
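
For reference, the two counting conventions in the exchange above can be compared with a short sketch. The helper names `count_overlapping` and `count_non_overlapping` are made up for illustration and do not come from the thread; the overlapping count uses a regex lookahead, which is only one of several ways to do it.

```python
import re


def count_overlapping(text, pattern):
    """Count every start position of pattern in text, overlaps included."""
    # A lookahead matches at each start position without consuming
    # characters, so adjacent matches are allowed to overlap.
    return len(re.findall("(?=" + re.escape(pattern) + ")", text))


def count_non_overlapping(text, pattern):
    """Count distinct, non-overlapping occurrences (what str.count does)."""
    return text.count(pattern)


# '555' holds two overlapping digraphs '55' but only one distinct one.
print(count_overlapping("555", "55"), count_non_overlapping("555", "55"))

# Seven 9s hold two overlapping runs of six 9s but only one distinct run.
print(count_overlapping("9999999", "999999"),
      count_non_overlapping("9999999", "999999"))
```

Which count is appropriate depends on the goal: searching for distinct instances of a pattern, or tallying digraph frequencies.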

Yes. I was as interested in finding non-overlapping patterns as testing
randomness, I suppose because we wouldn't have been sure about the
randomness anyway.

> You're (apparently)
> interested in searching for patterns, in which case looking for
> non-overlapping patterns is perfectly fine. I'm talking about testing
> the randomness of the generator by counting the frequency of digraphs
> and trigraphs, in which case you absolutely do want them to overlap.
> Otherwise, you're throwing away every second digraph, or two out of
> every three trigraphs, which could potentially hide a lot of
> non-randomness.
>
>
>> >> I was surprised that I could read in the whole billion file with
>> >> one gulp without running out of memory.
>> >
>> > Why? One billion bytes is less than a GB. It's a lot, but not
>> > *that* much.
>>
>> I earlier reported that my laptop couldn't handle even 800 million.
>
> What do you mean, "couldn't handle"? Couldn't handle 800 million of
> what? Obviously not bytes,

I meant what the context implied. Bytes. Look back in this thread to see
my description of my laptop's problems.

> because your laptop *can* handle well over
> 800 million bytes. It has 4GB of memory, after all :)
>
> There's a big difference in memory usage between (say):
>
> data = "1"*10**9  # a single string of one billion characters
>
> and
>
> data = ["1"]*10**9  # a list of one billion separate strings
>
> or even
>
> number = 10**(1000000000)-1  # a one billion digit longint
>
> This is just an example, of course. As they say, the devil is in the
> details.

Overkill, Steve.

>> >> Memory usage went to 80% (from
>> >> the usual 35%), but no higher except at first, when I saw 98% for
>> >> a few seconds, and then a drop to 78-80% where it stayed.
>> >
>> > That suggests to me that your PC probably has 2GB of RAM. Am I
>> > close?
>>
>> No. 4GB.
>
> Interesting.
> Presumably the rest of the memory is being used by the
> operating system and other running applications and background
> processes.

I suppose so.

Dick
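
For anyone who wants to check the string-versus-list comparison quoted above, here is a minimal sketch using `sys.getsizeof`. It assumes CPython and scales the size down from the thread's 10**9 to 10**6 so it runs quickly; the variable names are illustrative. Note that `["1"] * n` produces a list of `n` references to one shared string object, and `sys.getsizeof` on a list reports only the list's own reference array, not the objects it points to.

```python
import sys

# Scaled down from the 10**9 used in the thread (an assumption made
# purely so the sketch runs quickly on any machine).
n = 10**6

as_string = "1" * n    # one string object holding n characters
as_list = ["1"] * n    # one list holding n references to a single "1"

string_bytes = sys.getsizeof(as_string)
list_bytes = sys.getsizeof(as_list)  # the list's reference array only

print("string:", string_bytes, "bytes")
print("list:  ", list_bytes, "bytes")
```

The list comes out several times larger than the string because each element costs a pointer rather than a single character. A list of genuinely distinct string objects would cost more again, since each string carries its own object overhead.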