Re: Working with the set of real numbers (was: Finding size of Variable)
Following up on my own post.

On Wed, 05 Mar 2014 07:52:01 +, Steven D'Aprano wrote:
> On Tue, 04 Mar 2014 23:25:37 -0500, Roy Smith wrote:
>> I stopped paying attention to mathematicians when they tried to convince me that the sum of all natural numbers is -1/12.

[...]

In effect, the author Mark Chu-Carroll in the GoodMath blog above wants to make the claim that the divergent sum is not equal to ζ(-1), but everywhere you find that divergent sum in your calculations you can rub it out and replace it with ζ(-1), which is -1/12. In other words, he's accepting that the divergent sum behaves *as if* it were equal to -1/12, he just doesn't want to say that it *is* equal to -1/12.

Is this a mere semantic trick, or a difference of deep and fundamental importance? Mark C-C thinks it's an important difference. Mathematicians who actually work on this stuff all the time think he's making a semantic trick to avoid facing up to the fact that sums of infinite sequences don't always behave like sums of finite sequences.

Here's another mathematician who is even more explicit about what she's complaining about:

http://blogs.scientificamerican.com/roots-of-unity/2014/01/20/is-the-sum-of-positive-integers-negative/

[quote] There is a meaningful way to associate the number -1/12 to the series 1+2+3+4…, but in my opinion, it is misleading to call it the sum of the series. [end quote]

Evelyn Lamb's objection isn't about the mathematics that leads to the conclusion that the sum of natural numbers is equivalent to -1/12. That conclusion is pretty much bulletproof. Her objection is over the use of the word "equals" to describe that association. Or possibly the use of the word "sum" to describe what we're doing when we replace the infinite series with -1/12. Whatever it is that we're doing, it doesn't seem to have the same behavioural properties as summing finitely many finite numbers.

So perhaps she is right, and we shouldn't call the sum of a divergent series a "sum"?
-- Steven -- https://mail.python.org/mailman/listinfo/python-list
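One way to see where a finite -1/12 can come from is a small numerical sketch. It is not the zeta-function argument discussed in the post; it assumes a different, standard regularisation (an exponential cutoff), and the names `smoothed_sum` and `finite_part` are made up for illustration. Each term n is damped by exp(-eps * n), so the sum converges; the result grows like 1/eps**2 as eps shrinks, and once that divergent piece is subtracted the remainder approaches -1/12.

```python
import math

def smoothed_sum(eps):
    """Sum n * exp(-eps * n) over n = 1, 2, 3, ... until terms are negligible."""
    # Assumption: beyond n = 60/eps each term carries a factor below exp(-60),
    # far too small to change the double-precision result.
    n_terms = int(60 / eps) + 1
    return math.fsum(n * math.exp(-eps * n) for n in range(1, n_terms + 1))

def finite_part(eps):
    """Remove the divergent 1/eps**2 piece of the smoothed sum."""
    return smoothed_sum(eps) - 1.0 / eps ** 2

for eps in (0.1, 0.05, 0.01):
    print(eps, finite_part(eps))
print("-1/12 =", -1 / 12)
```

The printed values move toward -0.08333... as eps gets smaller, which is the "behaves as if" association in miniature: the series itself still diverges, and -1/12 only appears after the divergent part has been set aside.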
Re: Working with the set of real numbers (was: Finding size of Variable)
Mathematics? The Flexible String Representation is a very nice example of a mathematical absurdity. jmf PS Do not even think to expect to contradict me. Hint: sheet of paper and pencil. -- https://mail.python.org/mailman/listinfo/python-list
Re: Working with the set of real numbers (was: Finding size of Variable)
On 5 March 2014 07:52, Steven D'Aprano st...@pearwood.info wrote:
> On Tue, 04 Mar 2014 23:25:37 -0500, Roy Smith wrote:
>> I stopped paying attention to mathematicians when they tried to convince me that the sum of all natural numbers is -1/12.
>
> I'm pretty sure they did not. Possibly a physicist may have tried to tell you that, but most mathematicians consider physicists to be lousy mathematicians, and the mere fact that their results seem to actually work in practice is an embarrassment for the entire universe. A mathematician would probably have said that the sum of all natural numbers is divergent and therefore there is no finite answer.

Why the dig at physicists? I think most physicists would be able to tell you that the sum of all natural numbers is not -1/12. In fact most people with very little background in mathematics can tell you that. The argument that the sum of all natural numbers comes to -1/12 is just some kind of hoax. I don't think *anyone* seriously believes it.

> Well, that is, apart from mathematicians like Euler and Ramanujan. When people like them tell you something, you better pay attention.

Really? Euler didn't even know about absolutely convergent series (the point in question) and would quite happily combine infinite series to obtain a formula.

<snip>

> Normally mathematicians will tell you that divergent series don't have a total. That's because often the total you get can vary depending on how you add them up. The classic example is summing the infinite series:
>
> 1 - 1 + 1 - 1 + 1 - ...

There is a distinction between absolute convergence and convergence. Rearranging the order of the terms in the above infinite sum is invalid because the series is not absolutely convergent. For this particular series there is no sense in which its sum converges on an answer but there are other series that cannot be rearranged while still being convergent:

http://en.wikipedia.org/wiki/Harmonic_series_(mathematics)#Alternating_harmonic_series

Personally I think it's reasonable to just say that the sum of the natural numbers is infinite rather than messing around with terms like undefined, divergent, or existence. There is a clear difference between a series (or any limit) that fails to converge asymptotically and another that just goes to +-infinity. The difference is usually also relevant to any practical application of this kind of maths.

Oscar
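The point about the alternating harmonic series can be checked numerically. The sketch below is only an illustration with made-up function names: it sums 1 - 1/2 + 1/3 - 1/4 + ... in its usual order, then sums exactly the same terms in a different order (two positive terms followed by one negative term). The first total approaches ln 2; the rearranged total approaches 1.5 * ln 2, a standard textbook result.

```python
import math

def alternating_harmonic(n_terms):
    """Partial sum of 1 - 1/2 + 1/3 - 1/4 + ... in the usual order."""
    return math.fsum((-1) ** (k + 1) / k for k in range(1, n_terms + 1))

def rearranged(n_blocks):
    """Same terms reordered: two odd-denominator terms, then one even one."""
    total = 0.0
    odd, even = 1, 2
    for _ in range(n_blocks):
        total += 1 / odd + 1 / (odd + 2) - 1 / even
        odd += 4    # the next unused pair of odd denominators
        even += 2   # the next unused even denominator
    return total

print(alternating_harmonic(100000), math.log(2))
print(rearranged(100000), 1.5 * math.log(2))
```

Both loops eventually use every term of the series exactly once, so the different totals come purely from the ordering, which is why rearrangement is only safe for absolutely convergent series.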
Re: Working with the set of real numbers (was: Finding size of Variable)
On Wed, 05 Mar 2014 12:21:37 +, Oscar Benjamin wrote:
> On 5 March 2014 07:52, Steven D'Aprano st...@pearwood.info wrote:
>> On Tue, 04 Mar 2014 23:25:37 -0500, Roy Smith wrote:
>>> I stopped paying attention to mathematicians when they tried to convince me that the sum of all natural numbers is -1/12.
>>
>> I'm pretty sure they did not. Possibly a physicist may have tried to tell you that, but most mathematicians consider physicists to be lousy mathematicians, and the mere fact that their results seem to actually work in practice is an embarrassment for the entire universe. A mathematician would probably have said that the sum of all natural numbers is divergent and therefore there is no finite answer.
>
> Why the dig at physicists?

There is considerable professional rivalry between the branches of science. Physicists tend to look at themselves as the paragon of scientific hardness, and look down at mere chemists, who look down at biologists. (Which is ironic really, since the actual difficulty in doing good science is in the opposite order. Hundreds of years ago, using quite primitive techniques, people were able to predict the path of comets accurately. I'd like to see them predict the path of a house fly.)

According to this greedy reductionist viewpoint, since all living creatures are made up of chemicals, biology is just a subset of chemistry, and since chemicals are made up of atoms, chemistry is likewise just a subset of physics. Physics is the fundamental science, at least according to the physicists, and Real Soon Now they'll have a Theory Of Everything, something small enough to print on a tee-shirt, which will explain everything. At least in principle.

Theoretical physicists who work on the deep, fundamental questions of Space and Time tend to be the worst for this reductionist streak. They have a tendency to think of themselves as elites in an elite field of science. Mathematicians, possibly out of professional jealousy, like to look down at physics as mere applied maths. They also get annoyed that physicists often aren't as rigorous with their maths as they should be.

The controversy over renormalisation in Quantum Electrodynamics (QED) is a good example. When you use QED to try to calculate the strength of the electron's electric field, you end up trying to sum a lot of infinities. Basically, the interaction of the electron's charge with its own electric field gets larger the more closely you look. The sum of all those interactions is a divergent series. So the physicists basically cancelled out all the infinities, and lo and behold just like magic what's left over gives you the right answer. Richard Feynman even described it as "hocus-pocus".

The mathematicians *hated* this, and possibly still do, because it looks like cheating. It's certainly not rigorous, at least it wasn't back in the 1940s. The mathematicians were appalled, and loudly said "You can't do that!" and the physicists basically said "Oh yeah, watch us!" and ignored them, and then the Universe had the terribly bad manners to side with the physicists. QED has turned out to be *astonishingly* accurate, the most accurate physical theory of all time. The hocus-pocus worked.

> I think most physicists would be able to tell you that the sum of all natural numbers is not -1/12. In fact most people with very little background in mathematics can tell you that.

Ah, but there's the rub. People with *very little* background in mathematics will tell you that. People with *a very deep and solid* background in mathematics will tell you different, particularly if their background is complex analysis. (That's *complex numbers*, not complicated -- although it is complicated too.)

> The argument that the sum of all natural numbers comes to -1/12 is just some kind of hoax. I don't think *anyone* seriously believes it.

You would be wrong. I suggest you read the links I gave earlier. Even the mathematicians who complain about describing this using the word "equals" don't try to dispute the fact that you can identify the sum of natural numbers with ζ(-1), or that ζ(-1) = -1/12. They simply dispute that we should describe this association as "equals".

What nobody believes is that the sum of natural numbers is a convergent series that sums to -1/12, because it is provably not.

In other words, this is not an argument about the maths. Everyone who looks at the maths has to admit that it is sound. It's an argument about the words we use to describe this. Is it legitimate to say that the infinite sum *equals* -1/12? Or only that the series has the value -1/12? Or that we can "associate" (talk about a sloppy, non-rigorous term!) the series with -1/12?

>> Well, that is, apart from mathematicians like Euler and Ramanujan. When people like them tell you something, you better pay attention.
>
> Really? Euler didn't even know about absolutely convergent series (the point in question) and would quite happily
Re: Working with the set of real numbers (was: Finding size of Variable)
On Thu, Mar 6, 2014 at 4:43 AM, Steven D'Aprano steve+comp.lang.pyt...@pearwood.info wrote: Physics is the fundamental science, at least according to the physicists, and Real Soon Now they'll have a Theory Of Everything, something small enough to print on a tee-shirt, which will explain everything. At least in principle. Everything is, except what isn't. That's my theory, and I'm sticking to it! ChrisA -- https://mail.python.org/mailman/listinfo/python-list
Re: Working with the set of real numbers (was: Finding size of Variable)
On Wed, Mar 5, 2014 at 9:43 AM, Steven D'Aprano steve+comp.lang.pyt...@pearwood.info wrote:
> At one time, Euler summed an infinite series and got -1, from which he concluded that -1 was (in some sense) larger than infinity. I don't know what justification he gave, but the way I think of it is to take the number line from -∞ to +∞ and then bend it back upon itself so that there is a single infinity, rather like the projective plane only in a single dimension. If you start at zero and move towards increasingly large numbers, then like Buzz Lightyear you can go to infinity and beyond:
>
> 0 -> 1 -> 10 -> 100 -> ... ∞ -> ... -100 -> -10 -> -1 -> 0

This makes me think that maybe the universe is using ones' or two's complement math (is there a negative zero?)...

Chris
Re: Working with the set of real numbers (was: Finding size of Variable)
On 2014-03-05, Chris Kaynor ckay...@zindagigames.com wrote:
> On Wed, Mar 5, 2014 at 9:43 AM, Steven D'Aprano steve+comp.lang.pyt...@pearwood.info wrote:
>> At one time, Euler summed an infinite series and got -1, from which he concluded that -1 was (in some sense) larger than infinity. I don't know what justification he gave, but the way I think of it is to take the number line from -∞ to +∞ and then bend it back upon itself so that there is a single infinity, rather like the projective plane only in a single dimension. If you start at zero and move towards increasingly large numbers, then like Buzz Lightyear you can go to infinity and beyond:
>>
>> 0 -> 1 -> 10 -> 100 -> ... ∞ -> ... -100 -> -10 -> -1 -> 0
>
> This makes me think that maybe the universe is using ones' or two's complement math (is there a negative zero?)...

If the Universe (like most all Python implementations) is using IEEE-754 floating point, there is.

-- 
Grant Edwards   grant.b.edwards at gmail.com
Yow! This PIZZA symbolizes my COMPLETE EMOTIONAL RECOVERY!!
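The negative zero mentioned above is easy to observe from Python. This is a minimal sketch that assumes the interpreter's float is an IEEE-754 double, which is the usual case but is a property of the platform rather than something the language guarantees.

```python
import math

neg_zero = -0.0

# Negative zero compares equal to positive zero...
print(neg_zero == 0.0)

# ...but the sign bit is still there and can be read back.
print(math.copysign(1.0, neg_zero))

# The sign also survives conversion to text.
print(str(neg_zero))
```

Because the two zeros compare equal, code that needs to tell them apart has to inspect the sign explicitly, for example with `math.copysign` as shown.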
Re: Working with the set of real numbers (was: Finding size of Variable)
On 5 March 2014 17:43, Steven D'Aprano steve+comp.lang.pyt...@pearwood.info wrote:
> On Wed, 05 Mar 2014 12:21:37 +, Oscar Benjamin wrote:
>> The argument that the sum of all natural numbers comes to -1/12 is just some kind of hoax. I don't think *anyone* seriously believes it.
>
> You would be wrong. I suggest you read the links I gave earlier. Even the mathematicians who complain about describing this using the word "equals" don't try to dispute the fact that you can identify the sum of natural numbers with ζ(-1), or that ζ(-1) = -1/12. They simply dispute that we should describe this association as "equals".
>
> What nobody believes is that the sum of natural numbers is a convergent series that sums to -1/12, because it is provably not.
>
> In other words, this is not an argument about the maths. Everyone who looks at the maths has to admit that it is sound. It's an argument about the words we use to describe this. Is it legitimate to say that the infinite sum *equals* -1/12? Or only that the series has the value -1/12? Or that we can "associate" (talk about a sloppy, non-rigorous term!) the series with -1/12?

This is the point. You can identify numbers with many different things. It does not mean to say that the thing is equal to that number. I can associate the number 2 with my bike since it has 2 wheels. That doesn't mean that the bike is equal to 2.

So the problem with saying that the sum of the natural numbers equals -1/12 is precisely as you say with the word "equals", because they're not equal! If you restate the conclusion in a more accurate (but technical and less accessible) way, that the analytic continuation of a related set of convergent series has the value -1/12 at the value that would correspond to this divergent series, then it becomes less mysterious. Do I really have to associate the finite negative value found in the analytic continuation with the sum of the series that is provably greater than any finite number?

<snip>

> At one time, Euler summed an infinite series and got -1, from which he concluded that -1 was (in some sense) larger than infinity. I don't know what justification he gave, but the way I think of it is to take the number line from -∞ to +∞ and then bend it back upon itself so that there is a single infinity, rather like the projective plane only in a single dimension. If you start at zero and move towards increasingly large numbers, then like Buzz Lightyear you can go to infinity and beyond:
>
> 0 -> 1 -> 10 -> 100 -> ... ∞ -> ... -100 -> -10 -> -1 -> 0
>
> In this sense, -1/12 is larger than infinity.

There are many examples that appear to show wrapping round from +infinity to -infinity, e.g. the tan function. The thing is that it is not really physical (or meaningful in any direct sense). So for example I might consider the forces on a particle, apply Newton's 2nd law and arrive at a differential equation for the acceleration of the particle, solve the equation and find that the position of the particle at time t is given by tan(t). This would seem to imply that as t increases toward pi/2 the particle heads off infinity miles West but at the exact time pi/2 it wraps around to reappear at infinity miles East and starts heading back toward its starting point. The truth is less interesting: the solution tan(t) becomes invalid at pi/2 and mathematics can tell us nothing about what happens after that even if all the physics we used was exactly true.

> Now of course this is an ad hoc sloppy argument, but I'm not a professional mathematician. However I can tell you that it's pretty close to what the professional mathematicians and physicists do with negative absolute temperatures, and that is rigorous.
>
> http://en.wikipedia.org/wiki/Negative_temperature

The key point from that page is the sentence "A definition of temperature can be based on the relationship". It is clear that temperature is a theoretical abstraction. We have intuitive understandings of what it means but in order for the current body of thermodynamic theory to be consistent it is necessary to sometimes give negative values to the temperature. There's nothing unintuitive about negative temperatures if you understand the usual thermodynamic definitions of temperature.

>> Personally I think it's reasonable to just say that the sum of the natural numbers is infinite rather than messing around with terms like undefined, divergent, or existence. There is a clear difference between a series (or any limit) that fails to converge asymptotically and another that just goes to +-infinity. The difference is usually also relevant to any practical application of this kind of maths.
>
> And this is where you get it exactly backwards. The *practical application* comes from physics, where they do exactly what you argue against: they associate ζ(-1) with the sum of the natural numbers (see, I can avoid the word "equals" too), and *it works*.

I don't know all the details of what they do there and whether or not
Re: Working with the set of real numbers (was: Finding size of Variable)
In article 53176225$0$29987$c3e8da3$54964...@news.astraweb.com, Steven D'Aprano steve+comp.lang.pyt...@pearwood.info wrote:
> Physics is the fundamental science, at least according to the physicists, and Real Soon Now they'll have a Theory Of Everything, something small enough to print on a tee-shirt, which will explain everything. At least in principle.

A mathematician, a chemist, and a physicist are arguing the nature of prime numbers. The chemist says, "All odd numbers are prime. Look, I can prove it. Three is prime. Five is prime. Seven is prime." The mathematician says, "That's nonsense. Nine is not prime." The physicist looks at him and says, "Hmmm, you may be right, but eleven is prime, and thirteen is prime. It appears that within the limits of experimental error, all odd numbers are indeed prime!"
Re: Working with the set of real numbers (was: Finding size of Variable)
On Wed, 05 Mar 2014 21:31:51 -0500, Roy Smith wrote:
> In article 53176225$0$29987$c3e8da3$54964...@news.astraweb.com, Steven D'Aprano steve+comp.lang.pyt...@pearwood.info wrote:
>> Physics is the fundamental science, at least according to the physicists, and Real Soon Now they'll have a Theory Of Everything, something small enough to print on a tee-shirt, which will explain everything. At least in principle.
>
> A mathematician, a chemist, and a physicist are arguing the nature of prime numbers. The chemist says, "All odd numbers are prime. Look, I can prove it. Three is prime. Five is prime. Seven is prime." The mathematician says, "That's nonsense. Nine is not prime." The physicist looks at him and says, "Hmmm, you may be right, but eleven is prime, and thirteen is prime. It appears that within the limits of experimental error, all odd numbers are indeed prime!"

They ask a computer programmer to adjudicate who is right, so he writes a program to print out all the primes:

1 is prime
1 is prime
1 is prime
1 is prime
1 is prime
...

-- 
Steven D'Aprano
http://import-that.dreamwidth.org/
Re: Working with the set of real numbers (was: Finding size of Variable)
On Thu, Mar 6, 2014 at 2:06 PM, Steven D'Aprano steve+comp.lang.pyt...@pearwood.info wrote: They ask a computer programmer to adjudicate who is right, so he writes a program to print out all the primes: 1 is prime 1 is prime 1 is prime 1 is prime 1 is prime And he claimed that he was correct, because he had - as is known to be true in reality - a countably infinite number of primes. ChrisA -- https://mail.python.org/mailman/listinfo/python-list
Re: Working with the set of real numbers (was: Finding size of Variable)
On 2014-03-06, Roy Smith r...@panix.com wrote:
> In article 53176225$0$29987$c3e8da3$54964...@news.astraweb.com, Steven D'Aprano steve+comp.lang.pyt...@pearwood.info wrote:
>> Physics is the fundamental science, at least according to the physicists, and Real Soon Now they'll have a Theory Of Everything, something small enough to print on a tee-shirt, which will explain everything. At least in principle.
>
> A mathematician, a chemist, and a physicist are arguing the nature of prime numbers. The chemist says, "All odd numbers are prime. Look, I can prove it. Three is prime. Five is prime. Seven is prime." The mathematician says, "That's nonsense. Nine is not prime." The physicist looks at him and says, "Hmmm, you may be right, but eleven is prime, and thirteen is prime. It appears that within the limits of experimental error, all odd numbers are indeed prime!"

Assuming spherical odd numbers in a vacuum on a frictionless surface, of course.

-- 
Grant
Re: Working with the set of real numbers (was: Finding size of Variable)
In article 5317e640$0$29985$c3e8da3$54964...@news.astraweb.com, Steven D'Aprano steve+comp.lang.pyt...@pearwood.info wrote:
> On Wed, 05 Mar 2014 21:31:51 -0500, Roy Smith wrote:
>> In article 53176225$0$29987$c3e8da3$54964...@news.astraweb.com, Steven D'Aprano steve+comp.lang.pyt...@pearwood.info wrote:
>>> Physics is the fundamental science, at least according to the physicists, and Real Soon Now they'll have a Theory Of Everything, something small enough to print on a tee-shirt, which will explain everything. At least in principle.
>>
>> A mathematician, a chemist, and a physicist are arguing the nature of prime numbers. The chemist says, "All odd numbers are prime. Look, I can prove it. Three is prime. Five is prime. Seven is prime." The mathematician says, "That's nonsense. Nine is not prime." The physicist looks at him and says, "Hmmm, you may be right, but eleven is prime, and thirteen is prime. It appears that within the limits of experimental error, all odd numbers are indeed prime!"
>
> They ask a computer programmer to adjudicate who is right, so he writes a program to print out all the primes:
>
> 1 is prime
> 1 is prime
> 1 is prime
> 1 is prime
> 1 is prime
> ...

So, a mathematician, a biologist, and a physicist are watching a house. The physicist says, "It appears to be empty." Sometime later, a man and a woman go into the house. Shortly after that, the man and the woman come back out, with a child. The biologist says, "They must have reproduced." The mathematician says, "If one more person goes into the house, it'll be empty again."
Re: Working with the set of real numbers (was: Finding size of Variable)
On Mon, Mar 3, 2014 at 11:35 PM, Chris Angelico ros...@gmail.com wrote:
> In constant space, that will produce the sum of two infinite sequences of digits. (And it's constant time, too, except when it gets a stream of nines. Adding three thirds together will produce an infinite loop as it waits to see if there'll be anything that triggers an infinite cascade of carries.) Now, if there's a way to do that for square rooting a number, then the CF notation has a distinct benefit over the decimal expansion used here. As far as I know, there's no simple way, in constant space and/or time, to progressively yield more digits of a number's square root, working in decimal.

The code for that looks like this:

    import math

    def cf_sqrt(n):
        """Yield the terms of the square root of n as a continued fraction."""
        m = 0
        d = 1
        a = a0 = floor_sqrt(n)
        while True:
            yield a
            next_m = d * a - m
            next_d = (n - next_m * next_m) // d
            if next_d == 0:
                # n is a perfect square, so the expansion terminates.
                break
            next_a = (a0 + next_m) // next_d
            m, d, a = next_m, next_d, next_a

    def floor_sqrt(n):
        """Return the integer part of the square root of n."""
        n = int(n)
        if n == 0:
            return 0
        # Bracket the root between two consecutive powers of two.
        lower = 2 ** int(math.log(n, 2) // 2)
        upper = lower * 2
        while upper - lower > 1:
            mid = (upper + lower) // 2
            if n < mid * mid:
                upper = mid
            else:
                lower = mid
        return lower

The floor_sqrt function is merely doing a simple binary search and could probably be optimized, but then it's only called once during initialization anyway. The meat of the loop, as you can see, is just a constant amount of integer arithmetic. If it were desired to halt once the continued fraction starts to repeat, that would just be a matter of checking whether the triple (m, d, a) has been seen already.

Going back to your example of adding generated digits though, I don't know how to add two continued fractions together without evaluating them.
Re: Working with the set of real numbers (was: Finding size of Variable)
On Tue, Mar 4, 2014 at 4:19 AM, Ian Kelly ian.g.ke...@gmail.com wrote:
>     def cf_sqrt(n):
>         """Yield the terms of the square root of n as a continued fraction."""
>         m = 0
>         d = 1
>         a = a0 = floor_sqrt(n)
>         while True:
>             yield a
>             next_m = d * a - m
>             next_d = (n - next_m * next_m) // d
>             if next_d == 0:
>                 break
>             next_a = (a0 + next_m) // next_d
>             m, d, a = next_m, next_d, next_a

Sorry, all that "next" business is totally unnecessary. More simply:

    def cf_sqrt(n):
        """Yield the terms of the square root of n as a continued fraction."""
        m = 0
        d = 1
        a = a0 = floor_sqrt(n)
        while True:
            yield a
            m = d * a - m
            d = (n - m * m) // d
            if d == 0:
                break
            a = (a0 + m) // d
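The earlier remark about halting once the continued fraction starts to repeat, by checking whether the triple (m, d, a) has been seen already, could be sketched roughly as follows. This is an illustration rather than code from the thread: the name `cf_sqrt_periodic` is made up, and it assumes Python 3.8 or later so that `math.isqrt` can stand in for `floor_sqrt`.

```python
import math

def cf_sqrt_periodic(n):
    """Return (a0, period) for the continued fraction of sqrt(n).

    a0 is the integer part and period is the repeating block of terms;
    the block is empty when n is a perfect square.
    """
    a0 = math.isqrt(n)          # assumption: stands in for floor_sqrt
    if a0 * a0 == n:
        return a0, []
    m, d, a = 0, 1, a0
    seen = set()
    period = []
    while True:
        # Same recurrence as the simplified cf_sqrt loop.
        m = d * a - m
        d = (n - m * m) // d
        a = (a0 + m) // d
        if (m, d, a) in seen:
            # The state has recurred, so the terms cycle from here on.
            return a0, period
        seen.add((m, d, a))
        period.append(a)

print(cf_sqrt_periodic(2))
print(cf_sqrt_periodic(7))
```

For sqrt(2) this gives an integer part of 1 with repeating block [2], and for sqrt(7) an integer part of 2 with repeating block [1, 1, 1, 4]. The set of seen states trades the constant-space property for a guaranteed stop.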
Re: Working with the set of real numbers (was: Finding size of Variable)
In article mailman.7702.1393932047.18130.python-l...@python.org, Ian Kelly ian.g.ke...@gmail.com wrote:
> On Mon, Mar 3, 2014 at 11:35 PM, Chris Angelico ros...@gmail.com wrote:
>> In constant space, that will produce the sum of two infinite sequences of digits. (And it's constant time, too, except when it gets a stream of nines. Adding three thirds together will produce an infinite loop as it waits to see if there'll be anything that triggers an infinite cascade of carries.) Now, if there's a way to do that for square rooting a number, then the CF notation has a distinct benefit over the decimal expansion used here. As far as I know, there's no simple way, in constant space and/or time, to progressively yield more digits of a number's square root, working in decimal.
>
> The code for that looks like this:
>
>     import math
>
>     def cf_sqrt(n):
>         """Yield the terms of the square root of n as a continued fraction."""
>         m = 0
>         d = 1
>         a = a0 = floor_sqrt(n)
>         while True:
>             yield a
>             next_m = d * a - m
>             next_d = (n - next_m * next_m) // d
>             if next_d == 0:
>                 break
>             next_a = (a0 + next_m) // next_d
>             m, d, a = next_m, next_d, next_a
>
>     def floor_sqrt(n):
>         """Return the integer part of the square root of n."""
>         n = int(n)
>         if n == 0:
>             return 0
>         lower = 2 ** int(math.log(n, 2) // 2)
>         upper = lower * 2
>         while upper - lower > 1:
>             mid = (upper + lower) // 2
>             if n < mid * mid:
>                 upper = mid
>             else:
>                 lower = mid
>         return lower
>
> The floor_sqrt function is merely doing a simple binary search and could probably be optimized, but then it's only called once during initialization anyway. The meat of the loop, as you can see, is just a constant amount of integer arithmetic. If it were desired to halt once the continued fraction starts to repeat, that would just be a matter of checking whether the triple (m, d, a) has been seen already.
>
> Going back to your example of adding generated digits though, I don't know how to add two continued fractions together without evaluating them.

That is highly non-trivial indeed. See the gosper.txt reference I gave in another post.
Groetjes Albert -- Albert van der Horst, UTRECHT,THE NETHERLANDS Economic growth -- being exponential -- ultimately falters. albert@spearc.xs4all.nl =n http://home.hccnet.nl/a.w.m.van.der.horst -- https://mail.python.org/mailman/listinfo/python-list
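For contrast with the non-trivial approach referred to above, the naive route of "evaluating them" can be sketched in a few lines. This is only an illustration with a made-up helper name, `convergents`: it turns a finite prefix of continued-fraction terms into exact rational approximations using the standard recurrence for convergents, after which ordinary `Fraction` arithmetic can add the two approximations. It does not produce the terms of the sum incrementally.

```python
from fractions import Fraction

def convergents(terms):
    """Yield successive rational approximations for a list of CF terms."""
    h_prev, h = 1, terms[0]    # numerators h(-1), h(0)
    k_prev, k = 0, 1           # denominators k(-1), k(0)
    yield Fraction(h, k)
    for a in terms[1:]:
        # Standard recurrence: h(i) = a * h(i-1) + h(i-2), same for k.
        h_prev, h = h, a * h + h_prev
        k_prev, k = k, a * k + k_prev
        yield Fraction(h, k)

sqrt2_terms = [1, 2, 2, 2, 2]   # first terms of sqrt(2)
sqrt3_terms = [1, 1, 2, 1, 2]   # first terms of sqrt(3)

approx2 = list(convergents(sqrt2_terms))[-1]
approx3 = list(convergents(sqrt3_terms))[-1]
print(approx2, approx3, approx2 + approx3)
```

The result is only as accurate as the prefixes supplied, which is exactly the limitation that a term-by-term algorithm avoids.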
Re: Working with the set of real numbers (was: Finding size of Variable)
In article mailman.7687.1393902132.18130.python-l...@python.org, Chris Angelico ros...@gmail.com wrote: On Tue, Mar 4, 2014 at 1:45 PM, Albert van der Horst alb...@spenarnc.xs4all.nl wrote: No, the Python built-in float type works with a subset of real numbers: To be more precise: a subset of the rational numbers, those with a denominator that is a power of two. And no more than N bits (53 in a 64-bit float) in the numerator, and the denominator between the limits of the exponent. (Unless it's subnormal. That adds another set of small numbers.) It's a pretty tight set of restrictions, and yet good enough for so many purposes. But it's a far cry from all real numbers. Even allowing for continued fractions adds only some more; I don't think you can represent surds that way. Adding cf's adds all computable numbers in infinite precision. However that is not even a drop in the ocean, as the computable numbers have measure zero. A cf object yielding its coefficients amounts to a program that generates an infinite amount of data (in infinite time), so it is not very surprising it can represent any computable number. Pretty humbling really. ChrisA Groetjes Albert -- Albert van der Horst, UTRECHT,THE NETHERLANDS Economic growth -- being exponential -- ultimately falters. albert@spearc.xs4all.nl =n http://home.hccnet.nl/a.w.m.van.der.horst -- https://mail.python.org/mailman/listinfo/python-list
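The description of floats as rationals whose denominator is a power of two can be checked directly from Python. The sketch below assumes the usual 64-bit IEEE-754 double behind `float`; `float.as_integer_ratio` and `fractions.Fraction` are both in the standard library.

```python
from fractions import Fraction

x = 0.1
num, den = x.as_integer_ratio()   # exact value of the stored float
print(num, den)

# A positive integer is a power of two when it has a single set bit.
print(den & (den - 1) == 0)

# The numerator fits within the 53 significant bits of a double.
print(num.bit_length())

# The stored value is close to, but not equal to, one tenth.
print(Fraction(x) == Fraction(1, 10))
```

The last line prints False: 1/10 has no finite binary expansion, so the float holds the nearest representable rational instead.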
Re: Working with the set of real numbers (was: Finding size of Variable)
On Wed, 05 Mar 2014 02:15:14 +, Albert van der Horst wrote: Adding cf's adds all computable numbers in infinite precision. However that is not even a drop in the ocean, as the computable numbers have measure zero. On the other hand, it's not really clear that the non-computable numbers are useful or necessary for anything. They exist as mathematical abstractions, but they'll never be the result of any calculation or measurement that anyone might do. -- Steven -- https://mail.python.org/mailman/listinfo/python-list
Re: Working with the set of real numbers (was: Finding size of Variable)
On Wednesday, March 5, 2014 9:11:13 AM UTC+5:30, Steven D'Aprano wrote:
> On Wed, 05 Mar 2014 02:15:14 +, Albert van der Horst wrote:
>> Adding cf's adds all computable numbers in infinite precision. However that is not even a drop in the ocean, as the computable numbers have measure zero.
>
> On the other hand, it's not really clear that the non-computable numbers are useful or necessary for anything. They exist as mathematical abstractions, but they'll never be the result of any calculation or measurement that anyone might do.

There are even more extreme versions of this, amounting to roughly this view: "Any infinity supposedly 'larger' than the natural numbers is a nonsensical notion."

See e.g. http://en.wikipedia.org/wiki/Controversy_over_Cantor%27s_theory and the Weyl/Polya bet (pg 10 of http://research.microsoft.com/en-us/um/people/gurevich/Opera/123.pdf )

I cannot find the exact quote, so from memory Weyl says something to this effect:

Cantor's diagonalization PROOF is not in question. Its CONCLUSION very much is. The classical/platonic mathematician (subject to woolly thinking) concludes that the real numbers are a superset of the integers. The constructivist mathematician (who supposedly thinks clearly) only concludes the obvious, viz. that real numbers cannot be enumerated.

To go from 'cannot be enumerated' to 'is a proper superset of' requires the assumption of 'completed infinities', and that is not math but theology.
Re: Working with the set of real numbers (was: Finding size of Variable)
In article c39d5b44-6c7b-40d1-bbb5-791a36af6...@googlegroups.com, Rustom Mody rustompm...@gmail.com wrote:
> I cannot find the exact quote, so from memory Weyl says something to this effect:
>
> Cantor's diagonalization PROOF is not in question. Its CONCLUSION very much is. The classical/platonic mathematician (subject to woolly thinking) concludes that the real numbers are a superset of the integers. The constructivist mathematician (who supposedly thinks clearly) only concludes the obvious, viz. that real numbers cannot be enumerated.
>
> To go from 'cannot be enumerated' to 'is a proper superset of' requires the assumption of 'completed infinities', and that is not math but theology.

I stopped paying attention to mathematicians when they tried to convince me that the sum of all natural numbers is -1/12. Sure, you can manipulate the symbols in a way which is consistent with some set of rules that we believe govern the legal manipulation of symbols, but it just plain doesn't make sense.
Re: Working with the set of real numbers (was: Finding size of Variable)
On Tue, 04 Mar 2014 23:25:37 -0500, Roy Smith wrote: I stopped paying attention to mathematicians when they tried to convince me that the sum of all natural numbers is -1/12.

I'm pretty sure they did not. Possibly a physicist may have tried to tell you that, but most mathematicians consider physicists to be lousy mathematicians, and the mere fact that their results seem to actually work in practice is an embarrassment for the entire universe. A mathematician would probably have said that the sum of all natural numbers is divergent and therefore there is no finite answer.

Well, that is, apart from mathematicians like Euler and Ramanujan. When people like them tell you something, you better pay attention.

We have an intuitive understanding of the properties of addition. You can't add 1000 positive whole numbers and get a negative fraction, that's obvious. But that intuition only applies to *finite* sums. It doesn't even apply to infinite *convergent* series, and they're *easy*. Remember Zeno's Paradoxes? People doubted that the convergent series:

1/2 + 1/4 + 1/8 + 1/16 + ...

added up to 1 for the longest time, even though they could see with their own eyes that it had to. Until they worked out what *infinite* sums actually meant, their intuitions were completely wrong. This is a good lesson for us all.

The sum of all the natural numbers is a divergent infinite series, so we shouldn't expect that our intuitions hold. We can't add it up as if it were a convergent series, because it's not convergent. Nobody disputes that. But perhaps there's another way?

Normally mathematicians will tell you that divergent series don't have a total. That's because often the total you get can vary depending on how you add them up. The classic example is summing the infinite series:

1 - 1 + 1 - 1 + 1 - ...

Depending on how you group them, you can get:

(1 - 1) + (1 - 1) + (1 - 1) ... = 0 + 0 + 0 + ... = 0

or you can get:

1 - (1 - 1 + 1 - 1 + ... ) = 1 - (1 - 1) - (1 - 1) - ... = 1 - 0 - 0 - 0 ... = 1

Or you can do a neat little trick where we define the sum as x:

x = 1 - 1 + 1 - 1 + 1 - ...
x = 1 - (1 - 1 + 1 - 1 + ... )
x = 1 - x
2x = 1
x = 1/2

So at first glance, summing a divergent series is like dividing by zero. You get contradictory results, at least in this case. But that's not necessarily always the case. You do have to be careful when summing divergent series, but that doesn't always mean you can't do it and get a meaningful answer. Sometimes you can, sometimes you can't, it depends on the specific series.

With the sum of the natural numbers, rather than getting three different results from three different methods, mathematicians keep getting the same -1/12 result using various methods. That's a good hint that there is something logically sound going on here, even if it seems unintuitive.

Remember Zeno's Paradoxes? Our intuitions about equality and plus and sums of numbers don't apply to infinite series. We should be at least open to the possibility that while all the *finite* sums:

1 + 2
1 + 2 + 3
1 + 2 + 3 + 4
...

and so on sum to positive whole numbers, that doesn't mean that the *infinite* sum has to total to a positive whole number. Maybe that's not how addition works. I don't know about you, but I've never personally added up an infinite number of ever-increasing quantities to see what the result is. Maybe it is a negative fraction. (I'd say try it and see, but I don't have an infinite amount of time to spend on it.)

And in fact that's exactly what seems to be the case here. Mathematicians can demonstrate an identity (that is, equality) between the divergent sum of the natural numbers and the zeta function ζ(-1), and *that* can be worked out independently, and equals -1/12. So there are a bunch of different ways to show that the divergent sum adds up to -1/12, some of them more rigorous than others. The zeta function method is about as rigorous as they come.
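As a numerical illustration only (this is not the zeta-function argument referred to above, and the function names are made up for the example): the partial sums of 1/2 + 1/4 + ... creep up towards 1, while the natural numbers can be "tamed" by damping each term n with exp(-eps * n). That damped sum grows like 1/eps**2 as eps shrinks, and the part left over after removing that growth settles near -1/12.

```python
import math

def geometric_partial_sum(k):
    """Return 1/2 + 1/4 + ... + 1/2**k, a partial sum of a convergent series."""
    return sum(0.5 ** i for i in range(1, k + 1))

def smoothed_natural_sum(eps):
    """Return sum(n * exp(-eps * n)) for n >= 1, cut off once terms are negligible."""
    terms = int(60 / eps)  # beyond this point exp(-eps * n) is below exp(-60)
    return math.fsum(n * math.exp(-eps * n) for n in range(1, terms + 1))

for k in (5, 10, 50):
    print(k, geometric_partial_sum(k))  # approaches 1 from below

for eps in (0.1, 0.01, 0.001):
    total = smoothed_natural_sum(eps)
    # The damped sum itself blows up like 1/eps**2 as eps -> 0;
    # the remainder after subtracting that term approaches -1/12.
    print(eps, total - 1 / eps ** 2, -1 / 12)
```

The damping function is an arbitrary choice made for this sketch; the point is only that a finite, method-independent remainder can sit underneath a sum that plainly diverges.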
The addition of an infinite number of things behaves differently than the addition of finite numbers of things. More here: http://scitation.aip.org/content/aip/magazine/physicstoday/news/10.1063/PT.5.8029 http://math.ucr.edu/home/baez/week126.html http://en.wikipedia.org/wiki/1_+_2_+_3_+_4_+_%E2%8B%AF and even here: http://scientopia.org/blogs/goodmath/2014/01/20/oy-veh-power-series-analytic-continuations-and-riemann-zeta/ where a mathematician tries *really hard* to discredit the idea that the sum equals -1/12, but ends up proving that it does. So he simply plays a linguistic sleight of hand and claims that despite the series and the zeta function being equal, they're not *actually* equal. In effect, the author Mark Carrol-Chu in the GoodMath blog above wants to make the claim that the divergent sum is not equal to ζ(-1), but everywhere you find that divergent sum in your calculations you can rub it out and replace it with ζ(-1), which is -1/12. In other words, he's
Re: Working with the set of real numbers (was: Finding size of Variable)
In article mailman.6735.1392194885.18130.python-l...@python.org, Chris Angelico ros...@gmail.com wrote: On Wed, Feb 12, 2014 at 7:17 PM, Ben Finney ben+pyt...@benfinney.id.au wrote: Chris Angelico ros...@gmail.com writes: I have yet to find any computer that works with the set of real numbers in any way. Never mind optimization, they simply cannot work with real numbers. Not *any* computer? Not in *any* way? The Python built-in ‘float’ type “works with the set of real numbers”, in a way. No, the Python built-in float type works with a subset of real numbers: To be more precise: a subset of the rational numbers, those with a denominator that is a power of two.

>>> float("pi")
Traceback (most recent call last):
  File "<pyshell#1>", line 1, in <module>
    float("pi")
ValueError: could not convert string to float: 'pi'
>>> float("π")
Traceback (most recent call last):
  File "<pyshell#2>", line 1, in <module>
    float("π")
ValueError: could not convert string to float: 'π'

Same goes for fractions.Fraction and [c]decimal.Decimal. All of them are restricted to some subset of rational numbers, not all reals. The URL:http://docs.python.org/2/library/numbers.html#numbers.Real ABC defines behaviours for types implementing the set of real numbers. What specific behaviour would, for you, qualify as “works with the set of real numbers in any way”? Being able to represent surds, pi, e, etc, for a start. It'd theoretically be possible with an algebraic notation (eg by carrying through some representation like 2*pi rather than 6.28), but otherwise, irrationals can't be represented with finite storage and a digit-based system. An interesting possibility is working with rules that generate the continued fraction sequence of a real number. Say yield() gives the next coefficient (or the next hex digit). It was generally believed that summing two numbers in their cf representation was totally impractical because it required conversion to a rational number.
OTOH if we consider a cf as an ongoing process, the situation is much better. Summing would be a process that yields coefficients of the sum, and you could just stop when you've enough precision. Fascinating stuff. It is described in a self-contained, typewriter-style document gosper.txt that is found on the web in several places e.g. http://home.strw.leidenuniv.nl/~gurkan/gosper.pdf I have a gosper.txt, don't know from where. It really is a cookbook, one could build a python implementation from there, without being overly math savvy. I'd love to hear if someone does it. (In principle a coefficient of a cf can overflow machine precision, but that has never been observed in the wild. A considerable percentage of the coefficients for a random number are ones or otherwise small. The golden ratio has all ones.) ChrisA Groetjes Albert -- Albert van der Horst, UTRECHT,THE NETHERLANDS Economic growth -- being exponential -- ultimately falters. albert@spearc.xs4all.nl =n http://home.hccnet.nl/a.w.m.van.der.horst -- https://mail.python.org/mailman/listinfo/python-list
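The "cf as an ongoing process" idea can be sketched with Python generators. This is not Gosper's arithmetic from the document mentioned above, only the simpler half of the picture: a generator hands out coefficients one at a time, and a consumer turns them into ever better rational approximations and stops when it has enough. The names `golden_ratio_cf`, `e_cf` and `convergents` are invented for this example; the coefficient pattern for e, [2; 1, 2, 1, 1, 4, 1, 1, 6, ...], is the classical one.

```python
from fractions import Fraction
from itertools import islice
import math

def golden_ratio_cf():
    """Coefficients of the golden ratio: all ones."""
    while True:
        yield 1

def e_cf():
    """Coefficients of e: 2, then 1, 2k, 1 repeated for k = 1, 2, 3, ..."""
    yield 2
    k = 2
    while True:
        yield 1
        yield k
        yield 1
        k += 2

def convergents(terms):
    """Yield the successive rational approximations p/q of a coefficient stream."""
    p0, q0 = 0, 1  # numerator/denominator two steps back
    p1, q1 = 1, 0  # numerator/denominator one step back
    for a in terms:
        p0, p1 = p1, a * p1 + p0
        q0, q1 = q1, a * q1 + q0
        yield Fraction(p1, q1)

for approx in islice(convergents(e_cf()), 8):
    print(approx, float(approx) - math.e)
```

Because every stage is lazy, nothing is computed beyond the coefficients actually consumed, which is the property the paragraph above is pointing at.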
Re: Working with the set of real numbers (was: Finding size of Variable)
On Tue, Mar 4, 2014 at 1:45 PM, Albert van der Horst alb...@spenarnc.xs4all.nl wrote: No, the Python built-in float type works with a subset of real numbers: To be more precise: a subset of the rational numbers, those with a denominator that is a power of two. And no more than N bits (53 in a 64-bit float) in the numerator, and the denominator between the limits of the exponent. (Unless it's subnormal. That adds another set of small numbers.) It's a pretty tight set of restrictions, and yet good enough for so many purposes. But it's a far cry from all real numbers. Even allowing for continued fractions adds only some more; I don't think you can represent surds that way. ChrisA -- https://mail.python.org/mailman/listinfo/python-list
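The "rational with a power-of-two denominator" description is easy to check from Python itself. A small sketch using `float.as_integer_ratio()` and `fractions.Fraction`; the helper name `exact_parts` is made up for the example, and the 53-bit bound assumes the usual IEEE 754 double behind Python's float.

```python
from fractions import Fraction

def exact_parts(x):
    """Return the exact (numerator, denominator) of the value a float stores."""
    return x.as_integer_ratio()

num, den = exact_parts(0.1)
print(num, den)

# The denominator has a single bit set, i.e. it is a power of two.
print(den & (den - 1) == 0)

# The numerator fits in the 53-bit significand of a double.
print(num.bit_length())

# So the float written as 0.1 is close to, but not equal to, one tenth.
print(Fraction(0.1) == Fraction(1, 10))
```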
Re: Working with the set of real numbers (was: Finding size of Variable)
On Tuesday, March 4, 2014 8:32:01 AM UTC+5:30, Chris Angelico wrote: On Tue, Mar 4, 2014 at 1:45 PM, Albert van der Horst wrote: No, the Python built-in float type works with a subset of real numbers: To be more precise: a subset of the rational numbers, those with a denominator that is a power of two. And no more than N bits (53 in a 64-bit float) in the numerator, and the denominator between the limits of the exponent. (Unless it's subnormal. That adds another set of small numbers.) It's a pretty tight set of restrictions, and yet good enough for so many purposes. But it's a far cry from all real numbers. Even allowing for continued fractions adds only some more; I don't think you can represent surds that way. See http://www.maths.surrey.ac.uk/hosted-sites/R.Knott/Fibonacci/cfINTRO.html#sqrts -- https://mail.python.org/mailman/listinfo/python-list
Re: Working with the set of real numbers (was: Finding size of Variable)
On Tue, Mar 4, 2014 at 2:13 PM, Rustom Mody rustompm...@gmail.com wrote: But it's a far cry from all real numbers. Even allowing for continued fractions adds only some more; I don't think you can represent surds that way. See http://www.maths.surrey.ac.uk/hosted-sites/R.Knott/Fibonacci/cfINTRO.html#sqrts That's neat, didn't know that. Is there an efficient way to figure out, for any integer N, what its sqrt's CF sequence is? And what about the square roots of non-integers - can you represent √π that way? I suspect, though I can't prove, that there will be numbers that can't be represented even with an infinite series - or at least numbers whose series can't be easily calculated. ChrisA -- https://mail.python.org/mailman/listinfo/python-list
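For the integer case there is a well-known integer-only recurrence that produces the coefficients of sqrt(N) directly, without ever computing a decimal expansion. A minimal sketch follows; the function name `sqrt_cf` is invented here, `math.isqrt` needs Python 3.8 or later, and the stopping rule relies on the standard fact that the repeating block of sqrt(N) ends with the coefficient 2*floor(sqrt(N)). It says nothing about non-integers such as the square root of pi.

```python
from math import isqrt

def sqrt_cf(n):
    """Return (a0, period): sqrt(n) = [a0; period, period, ...].

    For a perfect square the period is empty.
    """
    a0 = isqrt(n)
    if a0 * a0 == n:
        return a0, []
    m, d, a = 0, 1, a0
    period = []
    while a != 2 * a0:       # the repeating block ends with 2 * a0
        m = d * a - m
        d = (n - m * m) // d  # this division is always exact
        a = (a0 + m) // d
        period.append(a)
    return a0, period

for n in (2, 3, 7, 13, 16):
    print(n, sqrt_cf(n))
```

Each step uses only a handful of integer operations, so the coefficients come out one by one in the same lazy spirit as the generators discussed elsewhere in the thread.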
Re: Working with the set of real numbers (was: Finding size of Variable)
On Tuesday, March 4, 2014 9:16:25 AM UTC+5:30, Chris Angelico wrote: On Tue, Mar 4, 2014 at 2:13 PM, Rustom Mody wrote: But it's a far cry from all real numbers. Even allowing for continued fractions adds only some more; I don't think you can represent surds that way. See http://www.maths.surrey.ac.uk/hosted-sites/R.Knott/Fibonacci/cfINTRO.html#sqrts That's neat, didn't know that. Is there an efficient way to figure out, for any integer N, what its sqrt's CF sequence is? And what about the square roots of non-integers - can you represent √π that way? I suspect, though I can't prove, that there will be numbers that can't be represented even with an infinite series - or at least numbers whose series can't be easily calculated.

You are now asking questions that are really (real-ly?) outside my capacities. What I know (which may be quite off the mark :-) ): Just as all real numbers almost by definition have a decimal form (maybe infinite, eg 1/3 becomes 0.3...), all real numbers likewise have a CF form. For some mathematical (aka arcane) reasons the CF form is actually better. Furthermore:

1. Transcendental numbers like e and pi have non-repeating infinite CF forms
2. Algebraic numbers (aka surds) have repeating, maybe finite(?), forms
3. For some numbers it's not known whether they are transcendental or not (vague recollection: pi^sqrt(pi) is one such)
4. Since e^(i*pi) is very much an integer, the above question is surprisingly non-trivial

-- https://mail.python.org/mailman/listinfo/python-list
Re: Working with the set of real numbers (was: Finding size of Variable)
On Tue, 04 Mar 2014 14:46:25 +1100, Chris Angelico wrote: That's neat, didn't know that. Is there an efficient way to figure out, for any integer N, what its sqrt's CF sequence is? And what about the square roots of non-integers - can you represent √π that way? I suspect, though I can't prove, that there will be numbers that can't be represented even with an infinite series - or at least numbers whose series can't be easily calculated. Every rational number can be written as a continued fraction with a finite number of terms[1]. Every irrational number can be written as a continued fraction with an infinite number of terms, just as every irrational number can be written as a decimal number with an infinite number of digits. Most of them (to be precise: an uncountably infinite number of them) will have no simple or obvious pattern. [1] To be pedantic, written as *two* continued fractions, one ending with the term 1, and one with one less term which isn't 1. That is: [a; b, c, d, ..., z, 1] == [a; b, c, d, ..., z+1] Any *finite* CF ending with one can be simplified to use one fewer term. Infinite CFs of course don't have a last term. -- Steven -- https://mail.python.org/mailman/listinfo/python-list
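Both statements about rationals can be checked mechanically. The sketch below (helper names `cf_terms` and `cf_value` are made up for the example) expands a `Fraction` with Euclid's algorithm, rebuilds the value from a list of terms, and confirms that the form ending in 1 and the form with one fewer term describe the same number.

```python
from fractions import Fraction

def cf_terms(q):
    """Continued-fraction terms of a rational, via Euclid's algorithm."""
    q = Fraction(q)
    terms = []
    while True:
        a = q.numerator // q.denominator  # integer part (floor)
        terms.append(a)
        q -= a
        if q == 0:
            return terms                  # always reached for a rational
        q = 1 / q

def cf_value(terms):
    """Rebuild the rational number from a finite list of terms."""
    value = Fraction(terms[-1])
    for a in reversed(terms[:-1]):
        value = a + 1 / value
    return value

print(cf_terms(Fraction(415, 93)))
print(cf_value([4, 2, 6, 7]), cf_value([4, 2, 6, 6, 1]))
```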
Re: Working with the set of real numbers (was: Finding size of Variable)
On Tue, Mar 4, 2014 at 4:53 PM, Steven D'Aprano st...@pearwood.info wrote: On Tue, 04 Mar 2014 14:46:25 +1100, Chris Angelico wrote: That's neat, didn't know that. Is there an efficient way to figure out, for any integer N, what its sqrt's CF sequence is? And what about the square roots of non-integers - can you represent √π that way? I suspect, though I can't prove, that there will be numbers that can't be represented even with an infinite series - or at least numbers whose series can't be easily calculated. Every irrational number can be written as a continued fraction with an infinite number of terms, just as every irrational number can be written as a decimal number with an infinite number of digits. It's easy enough to have that kind of expansion, I'm wondering if it's possible to identify it directly. To render the decimal expansion of a square root by the cut-and-try method, you effectively keep dividing until you find that you're close enough; that means you (a) have to keep the entire number around for each step, and (b) need to do a few steps to find that the digits aren't changing. But if you can take a CF (finite or infinite) and do an O(n) transformation on it to produce that number's square root, then you have an effective means of representing square roots. 
Suppose I make a generator function that represents a fraction:

    def one_third():
        while True:
            yield 3

    def one_seventh():
        while True:
            yield 1; yield 4; yield 2; yield 8; yield 5; yield 7

I could then make a generator that returns the sum of those two:

    def add_without_carry(x, y):
        while True:
            yield next(x)+next(y)

Okay, that's broken for nearly any case, but with a bit more sophistication:

    import itertools

    def add(x, y):
        prev=None   # last digit worked out but not yet emitted
        nines=0     # pending run of 9s that a later carry would turn into 0s
        while True:
            xx,yy=next(x),next(y)
            tot=xx+yy
            if tot==9:
                nines+=1
                continue
            if tot>9:
                # carry: bump the held digit, pending 9s roll over to 0
                if prev is None: raise OverflowError("exceeds 1.0")
                yield prev+1
                tot-=10
                for _ in range(nines): yield 0
                nines=0
            else:
                # no carry: the held digit and any pending 9s stand as they are
                if prev is not None: yield prev
                for _ in range(nines): yield 9
                nines=0
            prev=tot

    def show(n):
        return ''.join(str(_) for _ in itertools.islice(n,20))

    >>> show(add(one_third(),one_seventh()))
    '47619047619047619047'
    >>> show(add(add(add(one_seventh(),one_seventh()),add(one_seventh(),one_seventh())),add(one_seventh(),one_seventh())))
    '85714285714285714285'

In constant space, that will produce the sum of two infinite sequences of digits. (And it's constant time, too, except when it gets a stream of nines. Adding three thirds together will produce an infinite loop as it waits to see if there'll be anything that triggers an infinite cascade of carries.) Now, if there's a way to do that for square rooting a number, then the CF notation has a distinct benefit over the decimal expansion used here. As far as I know, there's no simple way, in constant space and/or time, to progressively yield more digits of a number's square root, working in decimal. ChrisA -- https://mail.python.org/mailman/listinfo/python-list
Re: Finding size of Variable
On Wed, Feb 12, 2014 at 6:49 PM, wxjmfa...@gmail.com wrote: The day you find an operator working on the set of reals (R) and it is somehow optimized for N (the subset of natural numbers), let me know. I have yet to find any computer that works with the set of real numbers in any way. Never mind optimization, they simply cannot work with real numbers. As to operations that are optimized for integers (usually not for naturals - supporting zero and negatives isn't hard), they are legion. In Python, integers have arbitrary precision, but floats, Fractions, and Decimals, don't. Nearly any operation on arbitrarily large numbers will be either more accurate or more efficient (maybe both) with integers than with any of the other types. Letting you know, that's all. ChrisA -- https://mail.python.org/mailman/listinfo/python-list
Working with the set of real numbers (was: Finding size of Variable)
Chris Angelico ros...@gmail.com writes: I have yet to find any computer that works with the set of real numbers in any way. Never mind optimization, they simply cannot work with real numbers. Not *any* computer? Not in *any* way? The Python built-in ‘float’ type “works with the set of real numbers”, in a way. The URL:http://docs.python.org/2/library/numbers.html#numbers.Real ABC defines behaviours for types implementing the set of real numbers. What specific behaviour would, for you, qualify as “works with the set of real numbers in any way”? -- \ “The fact that I have no remedy for all the sorrows of the | `\ world is no reason for my accepting yours. It simply supports | _o__) the strong probability that yours is a fake.” —Henry L. Mencken | Ben Finney -- https://mail.python.org/mailman/listinfo/python-list
Re: Working with the set of real numbers (was: Finding size of Variable)
Integers are integers. (1) Characters are characters. (2) (1) is a unique natural set. (2) is an artificial construct working with 3 sets (unicode). jmf -- https://mail.python.org/mailman/listinfo/python-list
Re: Working with the set of real numbers (was: Finding size of Variable)
On Wed, Feb 12, 2014 at 7:17 PM, Ben Finney ben+pyt...@benfinney.id.au wrote: Chris Angelico ros...@gmail.com writes: I have yet to find any computer that works with the set of real numbers in any way. Never mind optimization, they simply cannot work with real numbers. Not *any* computer? Not in *any* way? The Python built-in ‘float’ type “works with the set of real numbers”, in a way. No, the Python built-in float type works with a subset of real numbers:

>>> float("pi")
Traceback (most recent call last):
  File "<pyshell#1>", line 1, in <module>
    float("pi")
ValueError: could not convert string to float: 'pi'
>>> float("π")
Traceback (most recent call last):
  File "<pyshell#2>", line 1, in <module>
    float("π")
ValueError: could not convert string to float: 'π'

Same goes for fractions.Fraction and [c]decimal.Decimal. All of them are restricted to some subset of rational numbers, not all reals. The URL:http://docs.python.org/2/library/numbers.html#numbers.Real ABC defines behaviours for types implementing the set of real numbers. What specific behaviour would, for you, qualify as “works with the set of real numbers in any way”? Being able to represent surds, pi, e, etc, for a start. It'd theoretically be possible with an algebraic notation (eg by carrying through some representation like 2*pi rather than 6.28), but otherwise, irrationals can't be represented with finite storage and a digit-based system. ChrisA -- https://mail.python.org/mailman/listinfo/python-list
Re: Working with the set of real numbers (was: Finding size of Variable)
On Wednesday, 12 February 2014 09:35:38 UTC+1, wxjm...@gmail.com wrote: Integers are integers. (1) Characters are characters. (2) (1) is a unique natural set. (2) is an artificial construct working with 3 sets (unicode). jmf Addendum: One should not confuse unicode and the implementation of unicode. jmf -- https://mail.python.org/mailman/listinfo/python-list
Re: Finding size of Variable
Chris Angelico writes: On Wed, Feb 12, 2014 at 6:49 PM, wxjmfa...@gmail.com wrote: The day you find an operator working on the set of reals (R) and it is somehow optimized for N (the subset of natural numbers), let me know. ... In Python, integers have arbitrary precision, but floats, Fractions, and Decimals, don't. Nearly any operation on arbitrarily large numbers will be either more accurate or more efficient (maybe both) with integers than with any of the other types. Is that true about Fractions? -- https://mail.python.org/mailman/listinfo/python-list
Re: Finding size of Variable
On Wed, Feb 12, 2014 at 7:57 PM, Jussi Piitulainen jpiit...@ling.helsinki.fi wrote: In Python, integers have arbitrary precision, but floats, Fractions, and Decimals, don't. Nearly any operation on arbitrarily large numbers will be either more accurate or more efficient (maybe both) with integers than with any of the other types. Is that true about Fractions? I'm not 100% sure if fractions.Fraction and decimal.Decimal ever limit the size or precision of their data, but certainly if they don't, it'll be at horrendous expense of performance. (Decimal can add and subtract in reasonable time complexity, but multiplication and division will get slow when you have huge numbers of digits. Fraction can multiply and divide efficiently, but will get crazily slow on addition and subtraction.) Integers are an optimized case in many ways. I can do accurate arbitrary-precision integer arithmetic without worrying about simple operations suddenly saturating the CPU. I can't do that with non-integers in any way. It's not optimized for natural numbers (nonnegative integers), as negatives are just as cheap as positives, but it's certainly an optimization for integers. ChrisA -- https://mail.python.org/mailman/listinfo/python-list
Re: Working with the set of real numbers (was: Finding size of Variable)
Chris Angelico writes: On Wed, Feb 12, 2014 at 7:17 PM, Ben Finney wrote: What specific behaviour would, for you, qualify as “works with the set of real numbers in any way”? Being able to represent surds, pi, e, etc, for a start. It'd theoretically be possible with an algebraic notation (eg by carrying through some representation like 2*pi rather than 6.28), but otherwise, irrationals can't be represented with finite storage and a digit-based system. I've seen papers on exact computable reals that would, in effect, generate more precision when needed for some operation. It wasn't symbolic like 2pi, more like 6.28... with a promise to delve into the ellipsis, and some notable operations not supported. Equality testing was missing, I think, and I think it could not be known in general whether such a number is positive, zero or negative, so even approximate printing in the usual digit notation would not be possible. (Interval arithmetic, I hear, has a similar problem about not knowing the sign of a number.) In stark contrast, exact rationals work nicely, up to efficiency considerations. -- https://mail.python.org/mailman/listinfo/python-list
Re: Finding size of Variable
Chris Angelico writes: On Wed, Feb 12, 2014 at 7:57 PM, Jussi Piitulainen wrote: In Python, integers have arbitrary precision, but floats, Fractions, and Decimals, don't. Nearly any operation on arbitrarily large numbers will be either more accurate or more efficient (maybe both) with integers than with any of the other types. Is that true about Fractions? I'm not 100% sure if fraction.Fraction and decimal.Decimal ever limit the size or precision of their data, but certainly if they don't, it'll be at horrendous expense of performance. (Decimal can add and subtract in reasonable time complexity, but multiplication and division will get slow when you have huge numbers of digits. Fraction can multiply and divide efficiently, but will get crazily slow on addition and subtraction.) Integers are an optimized case in many ways. I can do accurate arbitrary-precision integer arithmetic without worrying about simple operations suddenly saturating the CPU. I can't do that with non-integers in any way. It's not optimized for natural numbers (nonnegative integers), as negatives are just as cheap as positives, but it's certainly an optimization for integers. Right. I don't know about Decimal, but I don't think there are any precision restrictions in Fraction, other than running out of heap (or possibly integer precision). In my (quite limited) experience, the most expensive operation on both exact rationals and exact integers has been the printing, in decimal, of several screenfuls of digits. The actual calculations have taken a couple of seconds and then I have wished that I could interrupt the printing of a single number :) -- https://mail.python.org/mailman/listinfo/python-list
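The precision question raised here can be checked directly. In this sketch `fractions.Fraction` keeps every digit because it is backed by arbitrary-size integers, while `decimal.Decimal` rounds to the precision of the active context (28 significant digits by default; it is set explicitly below so the example does not depend on global state). The variable names are only illustrative.

```python
from decimal import Decimal, localcontext
from fractions import Fraction

# Fraction: exact, however large the numerator and denominator become.
tiny = Fraction(1, 3 ** 200) + Fraction(1, 7 ** 200)
print(tiny.denominator == 21 ** 200)
print(Fraction(1, 3) * 3 == 1)

# Decimal: arithmetic results are rounded to the context precision.
with localcontext() as ctx:
    ctx.prec = 28
    third = Decimal(1) / Decimal(3)
    print(third)
    print(third * 3 == 1)
```

The cost side of the discussion is not shown: the numerator and denominator of `tiny` each run to hundreds of digits, which is where the time goes once many such values are added together.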
Re: Finding size of Variable
On 12/02/2014 07:49, wxjmfa...@gmail.com wrote: On Tuesday, 11 February 2014 20:04:02 UTC+1, Mark Lawrence wrote: On 11/02/2014 18:53, wxjmfa...@gmail.com wrote: On Monday, 10 February 2014 15:43:08 UTC+1, Tim Chase wrote: On 2014-02-10 06:07, wxjmfa...@gmail.com wrote: Python does not save memory at all. A str (unicode string) uses less memory only - and only - because and when one uses explicitly characters which are consuming less memory. Not only is the memory gain zero, Python falls back to the worst case. sys.getsizeof('a' * 100) 125 sys.getsizeof('a' * 100 + 'oe') 240 sys.getsizeof('a' * 100 + 'oe' + '\U0001') 448 If Python used UTF-32 for EVERYTHING, then all three of those cases would be 448, so it clearly disproves your claim that python does not save memory at all. The opposite of what the utf8/utf16 do! sys.getsizeof(('a' * 100 + 'oe' + '\U0001').encode('utf-8')) 123 sys.getsizeof(('a' * 100 + 'oe' + '\U0001').encode('utf-16')) 225 However, as pointed out repeatedly, string-indexing in fixed-width encodings is O(1) while indexing into variable-width encodings (e.g. UTF8/UTF16) is O(N). The FSR gives the benefits of O(1) indexing while saving space when a string doesn't need to use a full 32-bit width. A UTF optimizes the memory and the performance at the same time. It behaves like a mathematical operator, a unique operator for a unique set of elements. Unbeatable. The FSR is an exclusive-or mechanism. If you wish to save memory, you have to encode, and if you are encoding, maybe because you have to, you lose performance. Paradoxical. Your O(1) indexing works only and only because and when you are working explicitly with a static unicode string you never touch. It's a little bit the corresponding performance case of the memory case. jmf Why are you so rude as to continually post your nonsense here that not a single person believes, and at the same time still quite deliberately use gg to post it with double line spacing.
If you lack the courtesy to stop the former, please have the courtesy to stop the latter. -- My fellow Pythonistas, ask not what our language can do for you, ask what you can do for our language. Nonsense? sys.getsizeof('') - sys.getsizeof('a') -1 The day you find an operator working on the set of reals (R) and it is somehow optimized for N (the subset of natural numbers), let me know. A conflict is quickly appearing. Either the operator is not correctly defined or the choice of the set is wrong. You can replace the operator with an encoding and the set with a repertoire of characters. It's the main reason, why we have to live today with all these coding schemes. Even in more sophisticated cases like, CID-fonts or char boxes in a pdf (with the hope you understand how it works). jmf I ask you, members of the jury, to find the accused, jmf, guilty of writing nonsense and deliberately using google groups to double line space. The evidence is directly above and quite clearly prooves, beyond a resonable doubt, that no verdict other than guilty can be recorded. I rest my case, m'lud. -- My fellow Pythonistas, ask not what our language can do for you, ask what you can do for our language. Mark Lawrence --- This email is free from viruses and malware because avast! Antivirus protection is active. http://www.avast.com -- https://mail.python.org/mailman/listinfo/python-list
Re: Finding size of Variable
On Wednesday, February 12, 2014 7:34:42 PM UTC+5:30, Mark Lawrence wrote: I ask you, members of the jury, to find the accused, jmf, guilty of writing nonsense and deliberately using google groups to double line space. The evidence is directly above and quite clearly prooves, beyond a resonable doubt, that no verdict other than guilty can be recorded. I rest my case, m'lud. Is a proof more fool-proof because prove is spelt proove wink? -- https://mail.python.org/mailman/listinfo/python-list
Re: Finding size of Variable
On 12/02/2014 14:14, Rustom Mody wrote: On Wednesday, February 12, 2014 7:34:42 PM UTC+5:30, Mark Lawrence wrote: I ask you, members of the jury, to find the accused, jmf, guilty of writing nonsense and deliberately using google groups to double line space. The evidence is directly above and quite clearly prooves, beyond a resonable doubt, that no verdict other than guilty can be recorded. I rest my case, m'lud. Is a proof more fool-proof because prove is spelt proove wink? Fauultee keebored :) -- My fellow Pythonistas, ask not what our language can do for you, ask what you can do for our language. Mark Lawrence --- This email is free from viruses and malware because avast! Antivirus protection is active. http://www.avast.com -- https://mail.python.org/mailman/listinfo/python-list
Re: Finding size of Variable
On Wednesday, February 12, 2014 7:55:32 PM UTC+5:30, Mark Lawrence wrote: On 12/02/2014 14:14, Rustom Mody wrote: On Wednesday, February 12, 2014 7:34:42 PM UTC+5:30, Mark Lawrence wrote: I ask you, members of the jury, to find the accused, jmf, guilty of writing nonsense and deliberately using google groups to double line space. The evidence is directly above and quite clearly prooves, beyond a resonable doubt, that no verdict other than guilty can be recorded. I rest my case, m'lud. Is a proof more fool-proof because prove is spelt proove wink? Fauultee keebored :) Very O(n)T considering the relation between Fawlty towers and Monty python :-) -- https://mail.python.org/mailman/listinfo/python-list
Re: Working with the set of real numbers (was: Finding size of Variable)
On 2014-02-12, Ben Finney ben+pyt...@benfinney.id.au wrote: Chris Angelico ros...@gmail.com writes: I have yet to find any computer that works with the set of real numbers in any way. Never mind optimization, they simply cannot work with real numbers. Not *any* computer? Not in *any* way? The Python built-in float type works with the set of real numbers, in a way. The only people who think that are people who don't actually _use_ floating point types on computers. What specific behaviour would, for you, qualify as works with the set of real numbers in any way? There's a whole laundry list of things (some of them rather nasty and difficult) you have to worry about when using FP that simply don't apply to real numbers. -- Grant Edwards, grant.b.edwards at gmail.com. Yow! HUGH BEAUMONT died in 1982!! -- https://mail.python.org/mailman/listinfo/python-list
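A few entries from that laundry list, as a short sketch. Each line shows a property that holds for real numbers but fails for Python floats; the behaviour assumes IEEE 754 doubles, which is what CPython's float uses on common platforms, and the variable names are only illustrative.

```python
# Decimal literals are rounded on input, so "obvious" identities can fail.
sum_is_exact = (0.1 + 0.2 == 0.3)

# Addition is not associative.
left = (0.1 + 0.2) + 0.3
right = 0.1 + (0.2 + 0.3)

# Beyond 2**53 neighbouring floats are more than 1 apart,
# so adding 1 can leave a value unchanged.
big = 2.0 ** 53
absorbed = (big + 1.0 == big)

# NaN is not equal to anything, including itself.
nan = float("nan")

print(sum_is_exact, left == right, absorbed, nan == nan)
```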
Re: Working with the set of real numbers (was: Finding size of Variable)
Grant Edwards wrote: Not *any* computer? Not in *any* way? The Python built-in float type works with the set of real numbers, in a way. The only people who think that are people who don't actually _use_ floating point types on computers. FPU parsing the IEEE spec, or?. I didn't quite parse what *you* wrote. To paraphrase: #include <math.h> there are FP_NORMAL and FP_SUBNORMAL people in the world; those who understand IEEE 754 and those who don't. .. --gv -- https://mail.python.org/mailman/listinfo/python-list
Re: Finding size of Variable
On 2014-02-10, Ned Batchelder n...@nedbatchelder.com wrote: On 2/10/14 9:43 AM, Tim Chase wrote: The opposite of what the utf8/utf16 do! sys.getsizeof(('a' * 100 + 'oe' + '\U0001').encode('utf-8')) 123 sys.getsizeof(('a' * 100 + 'oe' + '\U0001').encode('utf-16')) 225 However, as pointed out repeatedly, string-indexing in fixed-width encodings are O(1) while indexing into variable-width encodings (e.g. UTF8/UTF16) are O(N). The FSR gives the benefits of O(1) indexing while saving space when a string doesn't need to use a full 32-bit width. Please don't engage in this debate with JMF. His mind is made up, and he will not be swayed, no matter how persuasive and reasonable your arguments. Just ignore him. I think reasonable criticisms should be contested no matter who posts them. I agree jmf shouldn't be singled out for abuse, summoned, insulted, or have his few controversial opinions brought into other topics. Tim's post was responding to a specific, well-presented criticism of Python's string implementation. Left unchallenged, it might linger unhappily in the air, like a symphony ended on a dominant 7th chord. -- Neil Cerutti -- https://mail.python.org/mailman/listinfo/python-list
Re: Finding size of Variable
On Monday, 10 February 2014 15:43:08 UTC+1, Tim Chase wrote: On 2014-02-10 06:07, wxjmfa...@gmail.com wrote: Python does not save memory at all. A str (unicode string) uses less memory only - and only - because and when one uses explicitly characters which are consuming less memory. Not only the memory gain is zero, Python falls back to the worse case. sys.getsizeof('a' * 100) 125 sys.getsizeof('a' * 100 + 'oe') 240 sys.getsizeof('a' * 100 + 'oe' + '\U0001') 448 If Python used UTF-32 for EVERYTHING, then all three of those cases would be 448, so it clearly disproves your claim that python does not save memory at all. The opposite of what the utf8/utf16 do! sys.getsizeof(('a' * 100 + 'oe' + '\U0001').encode('utf-8')) 123 sys.getsizeof(('a' * 100 + 'oe' + '\U0001').encode('utf-16')) 225 However, as pointed out repeatedly, string-indexing in fixed-width encodings are O(1) while indexing into variable-width encodings (e.g. UTF8/UTF16) are O(N). The FSR gives the benefits of O(1) indexing while saving space when a string doesn't need to use a full 32-bit width. A UTF optimizes memory and performance at the same time. It behaves like a mathematical operator, a unique operator for a unique set of elements. Unbeatable. The FSR is an exclusive-or mechanism. If you wish to save memory, you have to encode, and if you are encoding, maybe because you have to, you lose performance. Paradoxical. Your O(1) indexing works only because, and only when, you are working explicitly with a static unicode string you never touch. It is more or less the performance counterpart of the memory case. jmf
Re: Finding size of Variable
On 11/02/2014 18:53, wxjmfa...@gmail.com wrote: Le lundi 10 février 2014 15:43:08 UTC+1, Tim Chase a écrit : On 2014-02-10 06:07, wxjmfa...@gmail.com wrote: Python does not save memory at all. A str (unicode string) uses less memory only - and only - because and when one uses explicitly characters which are consuming less memory. Not only the memory gain is zero, Python falls back to the worse case. sys.getsizeof('a' * 100) 125 sys.getsizeof('a' * 100 + 'oe') 240 sys.getsizeof('a' * 100 + 'oe' + '\U0001') 448 If Python used UTF-32 for EVERYTHING, then all three of those cases would be 448, so it clearly disproves your claim that python does not save memory at all. The opposite of what the utf8/utf16 do! sys.getsizeof(('a' * 100 + 'oe' + '\U0001').encode('utf-8')) 123 sys.getsizeof(('a' * 100 + 'oe' + '\U0001').encode('utf-16')) 225 However, as pointed out repeatedly, string-indexing in fixed-width encodings are O(1) while indexing into variable-width encodings (e.g. UTF8/UTF16) are O(N). The FSR gives the benefits of O(1) indexing while saving space when a string doesn't need to use a full 32-bit width. A utf optimizes the memory and the performance at the same time. It behaves like a mathematical operator, a unique operator for a unique set of elements. Unbeatable. The FSR is an exclusive or mechanism. I you wish to same memory, you have to encode, and if you are encoding, maybe because you have to, one loses performance. Paradoxal. Your O(1) indexing works only and only because and when you are working explicitly with a static unicode string you never touch. It's a little bit the the corresponding performance case of the memory case. jmf Why are you so rude as to continually post your nonsense here that not a single person believes, and at the same time still quite deliberately use gg to post it with double line spacing. If you lack the courtesy to stop the former, please have the courtesy to stop the latter. 
-- My fellow Pythonistas, ask not what our language can do for you, ask what you can do for our language. Mark Lawrence
Re: Finding size of Variable
Le mardi 11 février 2014 20:04:02 UTC+1, Mark Lawrence a écrit : On 11/02/2014 18:53, wxjmfa...@gmail.com wrote: Le lundi 10 février 2014 15:43:08 UTC+1, Tim Chase a écrit : On 2014-02-10 06:07, wxjmfa...@gmail.com wrote: Python does not save memory at all. A str (unicode string) uses less memory only - and only - because and when one uses explicitly characters which are consuming less memory. Not only the memory gain is zero, Python falls back to the worse case. sys.getsizeof('a' * 100) 125 sys.getsizeof('a' * 100 + 'oe') 240 sys.getsizeof('a' * 100 + 'oe' + '\U0001') 448 If Python used UTF-32 for EVERYTHING, then all three of those cases would be 448, so it clearly disproves your claim that python does not save memory at all. The opposite of what the utf8/utf16 do! sys.getsizeof(('a' * 100 + 'oe' + '\U0001').encode('utf-8')) 123 sys.getsizeof(('a' * 100 + 'oe' + '\U0001').encode('utf-16')) 225 However, as pointed out repeatedly, string-indexing in fixed-width encodings are O(1) while indexing into variable-width encodings (e.g. UTF8/UTF16) are O(N). The FSR gives the benefits of O(1) indexing while saving space when a string doesn't need to use a full 32-bit width. A utf optimizes the memory and the performance at the same time. It behaves like a mathematical operator, a unique operator for a unique set of elements. Unbeatable. The FSR is an exclusive or mechanism. I you wish to same memory, you have to encode, and if you are encoding, maybe because you have to, one loses performance. Paradoxal. Your O(1) indexing works only and only because and when you are working explicitly with a static unicode string you never touch. It's a little bit the the corresponding performance case of the memory case. jmf Why are you so rude as to continually post your nonsense here that not a single person believes, and at the same time still quite deliberately use gg to post it with double line spacing. 
If you lack the courtesy to stop the former, please have the courtesy to stop the latter. -- My fellow Pythonistas, ask not what our language can do for you, ask what you can do for our language. Nonsense? sys.getsizeof('') - sys.getsizeof('a') -1 The day you find an operator working on the set of reals (R) and it is somehow optimized for N (the subset of natural numbers), let me know. A conflict is quickly appearing. Either the operator is not correctly defined or the choice of the set is wrong. You can replace the operator with an encoding and the set with a repertoire of characters. It's the main reason, why we have to live today with all these coding schemes. Even in more sophisticated cases like, CID-fonts or char boxes in a pdf (with the hope you understand how it works). jmf -- https://mail.python.org/mailman/listinfo/python-list
Re: Finding size of Variable
On Saturday, 8 February 2014 03:48:12 UTC+1, Steven D'Aprano wrote: We consider it A GOOD THING that Python spends memory for programmer convenience and safety. Python looks for memory optimizations when it can save large amounts of memory, not utterly trivial amounts. So in a Python wide build, a ten-thousand block character string requires a little bit more than 40KB. In Python 3.3, that can be reduced to only 10KB for a purely Latin-1 string, or 20K for a string without any astral characters. That's the sort of memory savings that are worthwhile, reducing memory usage by 75%. In its attempt to save memory, Python only succeeds in doing worse than any utf* coding scheme. --- Python does not save memory at all. A str (unicode string) uses less memory only - and only - because and when one uses explicitly characters which are consuming less memory. Not only the memory gain is zero, Python falls back to the worse case. sys.getsizeof('a' * 100) 125 sys.getsizeof('a' * 100 + 'oe') 240 sys.getsizeof('a' * 100 + 'oe' + '\U0001') 448 The opposite of what the utf8/utf16 do! sys.getsizeof(('a' * 100 + 'oe' + '\U0001').encode('utf-8')) 123 sys.getsizeof(('a' * 100 + 'oe' + '\U0001').encode('utf-16')) 225 jmf
Re: Finding size of Variable
On Monday, February 10, 2014 4:07:14 PM UTC+2, wxjm...@gmail.com wrote: Interesting: with sys.getsizeof('a' * 100) you get the string type, and with sys.getsizeof(('a' * 100 + 'oe' + '\U0001').encode('utf-8')) you get the bytes type. type('a' * 1) <class 'str'> type(('a' * 100 + 'oe' + '\U0001').encode('utf-8')) <class 'bytes'> Why?
Re: Finding size of Variable
On 10/02/2014 14:25, Asaf Las wrote: On Monday, February 10, 2014 4:07:14 PM UTC+2, wxjm...@gmail.com wrote: Interesting: with sys.getsizeof('a' * 100) you get the string type, and with sys.getsizeof(('a' * 100 + 'oe' + '\U0001').encode('utf-8')) you get the bytes type. type('a' * 1) <class 'str'> type(('a' * 100 + 'oe' + '\U0001').encode('utf-8')) <class 'bytes'> Why? Please don't feed this particular troll, he's spent 18 months driving us nuts with his nonsense. -- My fellow Pythonistas, ask not what our language can do for you, ask what you can do for our language. Mark Lawrence
Re: Finding size of Variable
On 2014-02-10 06:07, wxjmfa...@gmail.com wrote: Python does not save memory at all. A str (unicode string) uses less memory only - and only - because and when one uses explicitly characters which are consuming less memory. Not only the memory gain is zero, Python falls back to the worse case. sys.getsizeof('a' * 100) 125 sys.getsizeof('a' * 100 + 'oe') 240 sys.getsizeof('a' * 100 + 'oe' + '\U0001') 448 If Python used UTF-32 for EVERYTHING, then all three of those cases would be 448, so it clearly disproves your claim that python does not save memory at all. The opposite of what the utf8/utf16 do! sys.getsizeof(('a' * 100 + 'oe' + '\U0001').encode('utf-8')) 123 sys.getsizeof(('a' * 100 + 'oe' + '\U0001').encode('utf-16')) 225 However, as pointed out repeatedly, string-indexing in fixed-width encodings are O(1) while indexing into variable-width encodings (e.g. UTF8/UTF16) are O(N). The FSR gives the benefits of O(1) indexing while saving space when a string doesn't need to use a full 32-bit width. -tkc -- https://mail.python.org/mailman/listinfo/python-list
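The O(1) versus O(N) point can be illustrated with a small sketch. It assumes nothing about CPython's internals: `utf8_index` is a hypothetical helper written for this example, and `'\U0001F600'` is an arbitrary astral character chosen here because the escape in the posts above is truncated.

```python
def utf8_index(data: bytes, n: int) -> str:
    """Return the n-th code point of UTF-8 data by scanning from the start."""
    seen = -1          # index of the last code point whose lead byte we passed
    start = None       # byte offset where code point n begins
    for pos, byte in enumerate(data):
        # Bytes of the form 0b10xxxxxx continue a character; all others start one.
        if byte & 0xC0 != 0x80:
            seen += 1
            if seen == n:
                start = pos
            elif seen == n + 1:
                return data[start:pos].decode("utf-8")
    if start is None:
        raise IndexError(n)
    return data[start:].decode("utf-8")   # n was the final code point


text = "a" * 100 + "œ" + "\U0001F600"
encoded = text.encode("utf-8")

# Indexing the str is a direct lookup; indexing the encoded form has to walk
# over every earlier byte because characters occupy 1 to 4 bytes each.
print(text[100], utf8_index(encoded, 100))
print(text[101] == utf8_index(encoded, 101))
```

A fixed-width layout avoids the scan because the byte offset of character n is simply n times the width, which is the property the flexible string representation keeps while still choosing the narrowest width that fits.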
Re: Finding size of Variable
On 2/10/14 9:43 AM, Tim Chase wrote: On 2014-02-10 06:07, wxjmfa...@gmail.com wrote: Python does not save memory at all. A str (unicode string) uses less memory only - and only - because and when one uses explicitly characters which are consuming less memory. Not only the memory gain is zero, Python falls back to the worse case. sys.getsizeof('a' * 100) 125 sys.getsizeof('a' * 100 + 'oe') 240 sys.getsizeof('a' * 100 + 'oe' + '\U0001') 448 If Python used UTF-32 for EVERYTHING, then all three of those cases would be 448, so it clearly disproves your claim that python does not save memory at all. The opposite of what the utf8/utf16 do! sys.getsizeof(('a' * 100 + 'oe' + '\U0001').encode('utf-8')) 123 sys.getsizeof(('a' * 100 + 'oe' + '\U0001').encode('utf-16')) 225 However, as pointed out repeatedly, string-indexing in fixed-width encodings are O(1) while indexing into variable-width encodings (e.g. UTF8/UTF16) are O(N). The FSR gives the benefits of O(1) indexing while saving space when a string doesn't need to use a full 32-bit width. -tkc Please don't engage in this debate with JMF. His mind is made up, and he will not be swayed, no matter how persuasive and reasonable your arguments. Just ignore him. -- Ned Batchelder, http://nedbatchelder.com -- https://mail.python.org/mailman/listinfo/python-list
Re: Finding size of Variable
On 08/02/2014 02:48, Steven D'Aprano wrote: On Thu, 06 Feb 2014 05:51:54 -0800, wxjmfauth wrote: Sorry, I'm only pointing you may lose memory when working with short strings as it was explained. I really, very really, do not see what is absurd or obsure in: sys.getsizeof('abc' + 'EURO') 46 sys.getsizeof(('abc' + 'EURO').encode('utf-32')) 37 Why do you care about NINE bytes? The least amount of memory in any PC that I know about is 5 bytes, more than fifty million times more. And you are whinging about wasting nine bytes? If you care about that lousy nine bytes, Python is not the language for you. Go and program in C, where you can spent ten or twenty times longer programming, but save nine bytes in every string. Nobody cares about your memory benchmark except you. Python is not designed to save memory, Python is designed to use as much memory as needed to give the programmer an easier job. In C, I can store a single integer in a single byte. In Python, horror upon horrors, it takes 14 bytes!!! py sys.getsizeof(1) 14 We consider it A GOOD THING that Python spends memory for programmer convenience and safety. Python looks for memory optimizations when it can save large amounts of memory, not utterly trivial amounts. So in a Python wide build, a ten-thousand block character string requires a little bit more than 40KB. In Python 3.3, that can be reduced to only 10KB for a purely Latin-1 string, or 20K for a string without any astral characters. That's the sort of memory savings that are worthwhile, reducing memory usage by 75%. Could Python save memory by using UTF-8? Yes. But it would cost complexity and time, strings would be even slower than they are now. That is not a trade-off that the core developers have chosen to make, and I agree with them. This is a C +1 to save memory when compared against this Python +1 :) -- My fellow Pythonistas, ask not what our language can do for you, ask what you can do for our language. 
Mark Lawrence
Re: Finding size of Variable
On Sat, Feb 8, 2014 at 8:17 AM, Mark Lawrence breamore...@yahoo.co.ukwrote: On 08/02/2014 02:48, Steven D'Aprano wrote: On Thu, 06 Feb 2014 05:51:54 -0800, wxjmfauth wrote: Sorry, I'm only pointing you may lose memory when working with short strings as it was explained. I really, very really, do not see what is absurd or obsure in: sys.getsizeof('abc' + 'EURO') 46 sys.getsizeof(('abc' + 'EURO').encode('utf-32')) 37 Why do you care about NINE bytes? The least amount of memory in any PC that I know about is 5 bytes, more than fifty million times more. And you are whinging about wasting nine bytes? One could argue that if you're parsing a particular file, a very large one, that those 9 bytes can go into the optimization of parsing aforementioned file. Of, course we have faster processors, so why care? Because it goes into the optimization of the code one is 'developing' in python. If you care about that lousy nine bytes, Python is not the language for you. Go and program in C, where you can spent ten or twenty times longer programming, but save nine bytes in every string. Nobody cares about your memory benchmark except you. Python is not designed to save memory, Python is designed to use as much memory as needed to give the programmer an easier job. In C, I can store a single integer in a single byte. In Python, horror upon horrors, it takes 14 bytes!!! py sys.getsizeof(1) 14 We consider it A GOOD THING that Python spends memory for programmer convenience and safety. Python looks for memory optimizations when it can save large amounts of memory, not utterly trivial amounts. So in a Python wide build, a ten-thousand block character string requires a little bit more than 40KB. In Python 3.3, that can be reduced to only 10KB for a purely Latin-1 string, or 20K for a string without any astral characters. That's the sort of memory savings that are worthwhile, reducing memory usage by 75%. Could Python save memory by using UTF-8? Yes. 
But it would cost complexity and time, strings would be even slower than they are now. That is not a trade-off that the core developers have chosen to make, and I agree with them. This is a C +1 to save memory when compared against this Python +1 :) -- My fellow Pythonistas, ask not what our language can do for you, ask what you can do for our language. Mark Lawrence -- Best Regards, David Hutto *CEO:* *http://www.hitwebdevelopment.com*
Re: Finding size of Variable
On Sunday, February 9, 2014 4:15:50 AM UTC+5:30, David Hutto wrote: One could argue that if you're parsing a particular file, a very large one, that those 9 bytes can go into the optimization of parsing aforementioned file. Of, course we have faster processors, so why care? Because it goes into the optimization of the code one is 'developing' in python. Yes... There are cases when python is an inappropriate language to use... So??? It's good to get a bit of context here. loop: jmf says python is inappropriate. Someone asks him: Is it? In what case? jmf: No answer. After a delay of a few days, jmp to start of loop. [BTW: In my book this is classic trolling]
Re: Finding size of Variable
On Sat, Feb 8, 2014 at 8:25 PM, Rustom Mody rustompm...@gmail.com wrote: On Sunday, February 9, 2014 4:15:50 AM UTC+5:30, David Hutto wrote: One could argue that if you're parsing a particular file, a very large one, that those 9 bytes can go into the optimization of parsing aforementioned file. Of, course we have faster processors, so why care? Because it goes into the optimization of the code one is 'developing' in python. Yes... There are cases when python is an inappropriate language to use... So??? I didn't say she couldn't optimize in another language, and was just prototyping in Python. I just said she was optimizing her python code...dufus. Its good to get a bit of context here. loop: jmf says python is inappropriate. Someone asks him: Is it? In what case? jmf: No answer After a delay of few days jmp to start of loop loop: mov head,up_your_ass push repeat pop repeat jmp loop [BTW: In my book this classic trolling] -- And the title of this book would be...Pieces of Cliche Bullshit Internet Arguments for Dummies https://mail.python.org/mailman/listinfo/python-list -- Best Regards, David Hutto *CEO:* *http://www.hitwebdevelopment.com http://www.hitwebdevelopment.com* -- https://mail.python.org/mailman/listinfo/python-list
Re: Finding size of Variable
On Sun, Feb 9, 2014 at 1:56 PM, David Hutto dwightdhu...@gmail.com wrote: Yes... There are cases when python is an inappropriate language to use... So??? I didn't say she couldn't optimize in another language, and was just prototyping in Python. I just said she was optimizing her python code...dufus. And there are a *lot* of cases where that is inappropriate language to use. Please don't. ChrisA -- https://mail.python.org/mailman/listinfo/python-list
Re: Finding size of Variable
On Sat, Feb 8, 2014 at 9:59 PM, Chris Angelico ros...@gmail.com wrote: On Sun, Feb 9, 2014 at 1:56 PM, David Hutto dwightdhu...@gmail.com wrote: Yes... There are cases when python is an inappropriate language to use... So??? I didn't say she couldn't optimize in another language, and was just prototyping in Python. I just said she was optimizing her python code...dufus. And there are a *lot* of cases where that is inappropriate language to use. Please don't. ChrisA -- https://mail.python.org/mailman/listinfo/python-list it's also inappropriate for him to call people trolls, while they're just commenting on why what she might be using is a necessity for her particular case of developing in Python, and not using another language, yet. He started it! :P -- Best Regards, David Hutto *CEO:* *http://www.hitwebdevelopment.com http://www.hitwebdevelopment.com* -- https://mail.python.org/mailman/listinfo/python-list
Re: Finding size of Variable
On 2/8/14 9:56 PM, David Hutto wrote: On Sat, Feb 8, 2014 at 8:25 PM, Rustom Mody rustompm...@gmail.com mailto:rustompm...@gmail.com wrote: On Sunday, February 9, 2014 4:15:50 AM UTC+5:30, David Hutto wrote: One could argue that if you're parsing a particular file, a very large one, that those 9 bytes can go into the optimization of parsing aforementioned file. Of, course we have faster processors, so why care? Because it goes into the optimization of the code one is 'developing' in python. Yes... There are cases when python is an inappropriate language to use... So??? I didn't say she couldn't optimize in another language, and was just prototyping in Python. I just said she was optimizing her python code...dufus. Please keep the discussion respectful. Misunderstandings are easy, I suspect this is one of them. There's no reason to start calling people names. Its good to get a bit of context here. loop: jmf says python is inappropriate. Someone asks him: Is it? In what case? jmf: No answer After a delay of few days jmp to start of loop loop: mov head,up_your_ass push repeat pop repeat jmp loop Please keep in mind the Code of Conduct: http://www.python.org/psf/codeofconduct Thanks. [BTW: In my book this classic trolling] -- And the title of this book would be...Pieces of Cliche Bullshit Internet Arguments for Dummies https://mail.python.org/mailman/listinfo/python-list -- Best Regards, David Hutto /*CEO:*/ _http://www.hitwebdevelopment.com_ -- Ned Batchelder, http://nedbatchelder.com -- https://mail.python.org/mailman/listinfo/python-list
Re: Finding size of Variable
Maybe I'll just roll my fat, bald, troll arse out from under the bridge, and comment back, off list, next time. -- https://mail.python.org/mailman/listinfo/python-list
Re: Finding size of Variable
On 2/8/14 10:09 PM, David Hutto wrote: Maybe I'll just roll my fat, bald, troll arse out from under the bridge, and comment back, off list, next time. I'm not sure what happened in this thread. It might be that you think Rustom Mody was referring to you when he said, BTW: In my book this classic trolling. I don't think he was, I think he was referring to JMF. In any case, perhaps it would be best to just take a break? -- Ned Batchelder, http://nedbatchelder.com -- https://mail.python.org/mailman/listinfo/python-list
Re: Finding size of Variable
On Sunday, February 9, 2014 8:46:50 AM UTC+5:30, Ned Batchelder wrote: On 2/8/14 10:09 PM, David Hutto wrote: Maybe I'll just roll my fat, bald, troll arse out from under the bridge, and comment back, off list, next time. I'm not sure what happened in this thread. It might be that you think Rustom Mody was referring to you when he said, BTW: In my book this classic trolling. I don't think he was, I think he was referring to JMF. Of course! And given the turn of this thread, we must hand it to jmf for being even better at trolling than I thought :-) See the first para http://en.wikipedia.org/wiki/Troll_%28Internet%29 -- https://mail.python.org/mailman/listinfo/python-list
Re: Finding size of Variable
On Thu, 06 Feb 2014 05:51:54 -0800, wxjmfauth wrote: Sorry, I'm only pointing you may lose memory when working with short strings as it was explained. I really, very really, do not see what is absurd or obsure in: sys.getsizeof('abc' + 'EURO') 46 sys.getsizeof(('abc' + 'EURO').encode('utf-32')) 37 Why do you care about NINE bytes? The least amount of memory in any PC that I know about is 5 bytes, more than fifty million times more. And you are whinging about wasting nine bytes? If you care about that lousy nine bytes, Python is not the language for you. Go and program in C, where you can spent ten or twenty times longer programming, but save nine bytes in every string. Nobody cares about your memory benchmark except you. Python is not designed to save memory, Python is designed to use as much memory as needed to give the programmer an easier job. In C, I can store a single integer in a single byte. In Python, horror upon horrors, it takes 14 bytes!!! py sys.getsizeof(1) 14 We consider it A GOOD THING that Python spends memory for programmer convenience and safety. Python looks for memory optimizations when it can save large amounts of memory, not utterly trivial amounts. So in a Python wide build, a ten-thousand block character string requires a little bit more than 40KB. In Python 3.3, that can be reduced to only 10KB for a purely Latin-1 string, or 20K for a string without any astral characters. That's the sort of memory savings that are worthwhile, reducing memory usage by 75%. Could Python save memory by using UTF-8? Yes. But it would cost complexity and time, strings would be even slower than they are now. That is not a trade-off that the core developers have chosen to make, and I agree with them. -- Steven -- https://mail.python.org/mailman/listinfo/python-list
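The 10KB / 20KB / 40KB figures follow from the per-character storage width, which can be measured rather than taken on trust. The following is a sketch that assumes CPython 3.3 or later (other implementations may store strings differently); `bytes_per_char` is a helper invented for this example, and the fixed per-object overhead is cancelled out by comparing two lengths.

```python
import sys


def bytes_per_char(ch: str, n: int = 10000) -> float:
    """Estimate storage per character by comparing strings of length n and 2n."""
    small = sys.getsizeof(ch * n)
    large = sys.getsizeof(ch * (2 * n))
    return (large - small) / n


latin1 = bytes_per_char("é")            # fits in Latin-1
bmp = bytes_per_char("€")               # needs 16 bits
astral = bytes_per_char("\U0001F600")   # outside the BMP, needs 32 bits

print(latin1, bmp, astral)
```

On such a build the three values come out in the ratio 1 : 2 : 4, which is where the "reducing memory usage by 75%" for a Latin-1 string comes from.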
Re: Finding size of Variable
On 02/07/2014 06:48 PM, Steven D'Aprano wrote: That is not a trade-off that the core developers have chosen to make, and I agree with them. Even though you haven't broken all the build-bots yet, you can still stop saying them. ;) -- ~Ethan~ -- https://mail.python.org/mailman/listinfo/python-list
Re: Finding size of Variable
On Wednesday, 5 February 2014 12:44:47 UTC+1, Chris Angelico wrote:

On Wed, Feb 5, 2014 at 10:00 PM, Steven D'Aprano steve+comp.lang.pyt...@pearwood.info wrote:

where stopWords.txt is a file of size 4KB

My guess is that if you split a 4K file into words, then put the words into a list, you'll probably end up with 6-8K in memory.

I'd guess rather more; Python strings have a fair bit of fixed overhead, so with a whole lot of small strings, it will get more costly.

    >>> sys.version
    '3.4.0b2 (v3.4.0b2:ba32913eb13e, Jan 5 2014, 16:23:43) [MSC v.1600 32 bit (Intel)]'
    >>> sys.getsizeof("asdf")
    29

Stop words tend to be short, rather than long, words, so I'd look at an average of 2-3 letters per word. Assuming they're separated by spaces or newlines, that means there'll be roughly a thousand of them in the file, for about 25K of overhead. A bit less if the words are longer, but still quite a bit. (Byte strings have slightly less overhead, 17 bytes apiece, but still quite a bit.)

ChrisA

    >>> sum([sys.getsizeof(c) for c in ['a']])
    26
    >>> sum([sys.getsizeof(c) for c in ['a', 'a EURO']])
    68
    >>> sum([sys.getsizeof(c) for c in ['a', 'a EURO', 'aa EURO']])
    112
    >>> sum([sys.getsizeof(c) for c in ['a', 'a EURO', 'aa EURO', 'aaa EURO']])
    158
    >>> sum([sys.getsizeof(c) for c in ['a', 'a EURO', 'aa EURO', 'aaa EURO', ' EURO']])
    238
    >>> sum([sys.getsizeof(c.encode('utf-32-be')) for c in ['a']])
    21
    >>> sum([sys.getsizeof(c.encode('utf-32-be')) for c in ['a', 'a EURO']])
    46
    >>> sum([sys.getsizeof(c.encode('utf-32-be')) for c in ['a', 'a EURO', 'aa EURO']])
    75
    >>> sum([sys.getsizeof(c.encode('utf-32-be')) for c in ['a', 'a EURO', 'aa EURO', 'aaa EURO']])
    108
    >>> sum([sys.getsizeof(c.encode('utf-32-be')) for c in ['a', 'a EURO', 'aa EURO', 'aaa EURO', ' EURO']])
    209
    >>> sum([sys.getsizeof(c) for c in ['a', 'a EURO', 'aa EURO']*3])
    336
    >>> sum([sys.getsizeof(c) for c in ['aa EURO aa EURO']*3])
    150
    >>> sum([sys.getsizeof(c.encode('utf-32')) for c in ['a', 'a EURO', 'aa EURO']*3])
    261
    >>> sum([sys.getsizeof(c.encode('utf-32')) for c in ['aa EURO aa EURO']*3])
    135

jmf
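The fixed per-string overhead ChrisA describes can be estimated by comparing many short strings against the same text held as one string. This is a rough sketch, assuming CPython; the word list is a made-up sample rather than the poster's stopWords.txt, and because the list repeats the same eight objects the sum counts each object many times, as if they were distinct strings.

```python
import sys

# Hypothetical sample: 1000 short "stop words".
words = ["a", "an", "the", "of", "to", "in", "is", "it"] * 125

separate = sum(sys.getsizeof(w) for w in words)   # one object header per word
joined = sys.getsizeof(" ".join(words))           # one header for all the text
overhead_per_word = (separate - joined) / len(words)

print(separate, joined, round(overhead_per_word, 1))
```

The exact numbers depend on the build (32-bit or 64-bit) and Python version, so none are quoted here; the point is only that the per-object header dominates when the strings are two or three characters long.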
Re: Finding size of Variable
On 2/6/14 5:15 AM, wxjmfa...@gmail.com wrote: sum([sys.getsizeof(c) for c in ['a', 'a EURO', 'aa EURO']*3]) 336 sum([sys.getsizeof(c) for c in ['aa EURO aa EURO']*3]) 150 sum([sys.getsizeof(c.encode('utf-32')) for c in ['a', 'a EURO', 'aa EURO']*3]) 261 sum([sys.getsizeof(c.encode('utf-32')) for c in ['aa EURO aa EURO']*3]) 135 jmf JMF, we've told you I-don't-know-how-many-times to stop this. Seriously: think hard about what your purpose is in sending these absurd benchmarks. I guarantee you are not accomplishing it. -- Ned Batchelder, http://nedbatchelder.com -- https://mail.python.org/mailman/listinfo/python-list
Re: Finding size of Variable
On Thursday, 6 February 2014 12:10:08 UTC+1, Ned Batchelder wrote: On 2/6/14 5:15 AM, wxjmfa...@gmail.com wrote: sum([sys.getsizeof(c) for c in ['a', 'a EURO', 'aa EURO']*3]) 336 sum([sys.getsizeof(c) for c in ['aa EURO aa EURO']*3]) 150 sum([sys.getsizeof(c.encode('utf-32')) for c in ['a', 'a EURO', 'aa EURO']*3]) 261 sum([sys.getsizeof(c.encode('utf-32')) for c in ['aa EURO aa EURO']*3]) 135 jmf JMF, we've told you I-don't-know-how-many-times to stop this. Seriously: think hard about what your purpose is in sending these absurd benchmarks. I guarantee you are not accomplishing it. -- Ned Batchelder, http://nedbatchelder.com Sorry, I'm only pointing out that you may lose memory when working with short strings, as was explained. I really, very really, do not see what is absurd or obscure in: sys.getsizeof('abc' + 'EURO') 46 sys.getsizeof(('abc' + 'EURO').encode('utf-32')) 37 I apologize for the a EURO which should have been a real EURO. No idea what happened. jmf
Re: Finding size of Variable
Some mysterious problem with the euro. Let's take a real French char. sys.getsizeof('abc' + 'œ') 46 sys.getsizeof(('abc' + 'œ').encode('utf-32')) 37 or a German char, ẞ sys.getsizeof('abc' + '\N{LATIN CAPITAL LETTER SHARP S}') 46 sys.getsizeof(('abc' + '\N{LATIN CAPITAL LETTER SHARP S}').encode('utf-32')) 37 -- https://mail.python.org/mailman/listinfo/python-list
Re: Finding size of Variable
Ayushi Dalmia wrote:

On Wednesday, February 5, 2014 12:51:31 AM UTC+5:30, Dave Angel wrote:

Ayushi Dalmia ayushidalmia2...@gmail.com wrote in message:

Where am I going wrong? What are the alternatives I can try?

You've rejected all the alternatives so far without showing your code, or even properly specifying your problem. To get the total size of a list of strings, try (untested):

    a = sys.getsizeof(mylist)
    for item in mylist:
        a += sys.getsizeof(item)

This can be high if some of the strings are interned and get counted twice. But you're not likely to get closer without some knowledge of the data objects and where they come from.

-- DaveA

Hello Dave, I just thought that saving others time is better and hence I explained only the subset of my problem. Here is what I am trying to do: I am trying to index the current wikipedia dump without using databases and create a search engine for Wikipedia documents. Note, I CANNOT USE DATABASES.

My approach: I am parsing the wikipedia pages using SAX Parser, and then I am dumping the words along with the posting list (a list of doc ids in which the word is present) into different files after reading 'X' number of pages. These files may contain the same word, so I need to merge them and write the final index again. The final index files must be of limited size. This is where I am stuck. I need to know how to determine the size of the content in a variable before I write it into the file.
Here is the code for my merging: def mergeFiles(pathOfFolder, countFile): listOfWords={} indexFile={} topOfFile={} flag=[0]*countFile data=defaultdict(list) heap=[] countFinalFile=0 for i in xrange(countFile): fileName = pathOfFolder+'\index'+str(i)+'.txt.bz2' indexFile[i]= bz2.BZ2File(fileName, 'rb') flag[i]=1 topOfFile[i]=indexFile[i].readline().strip() listOfWords[i] = topOfFile[i].split(' ') if listOfWords[i][0] not in heap: heapq.heappush(heap, listOfWords[i][0]) At this point you have already done it wrong as your heap contains the complete data and you have done a lot of O(N) tests on the heap. This is both slow and consumes a lot of memory. See http://code.activestate.com/recipes/491285-iterator-merge/ for a sane way to merge sorted data from multiple files. Your code becomes (untested) with open(outfile.txt, wb) as outfile: infiles = [] for i in xrange(countFile): filename = os.path.join(pathOfFolder, 'index'+str(i)+'.txt.bz2') infiles.append(bz2.BZ2File(filename, rb)) outfile.writelines(imerge(*infiles)) for infile in infiles: infile.close() Once you have your data in a single file you can read from that file and do the postprocessing you mention below. while any(flag)==1: temp = heapq.heappop(heap) for i in xrange(countFile): if flag[i]==1: if listOfWords[i][0]==temp: //This is where I am stuck. I cannot wait until memory //error, as I need to do some postprocessing too. try: data[temp].extend(listOfWords[i][1:]) except MemoryError: writeFinalIndex(data, countFinalFile, pathOfFolder) data=defaultdict(list) countFinalFile+=1 topOfFile[i]=indexFile[i].readline().strip() if topOfFile[i]=='': flag[i]=0 indexFile[i].close() os.remove(pathOfFolder+'\index'+str(i)+'.txt.bz2') else: listOfWords[i] = topOfFile[i].split(' ') if listOfWords[i][0] not in heap: heapq.heappush(heap, listOfWords[i][0]) writeFinalIndex(data, countFinalFile, pathOfFolder) countFile is the number of files and writeFileIndex method writes into the file. 
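The streaming merge suggested above can also be sketched with the standard library's `heapq.merge`, which lazily merges already-sorted iterables. This is a minimal sketch rather than the linked recipe: the function name `merge_sorted_files` is made up here, it reads plain text files instead of bz2 archives (`bz2.open(path, "rt")` could be swapped in on Python 3.3+), and it assumes every input file is sorted and every line ends with a newline.

```python
import heapq

def merge_sorted_files(in_paths, out_path):
    """Merge already-sorted text files line by line into out_path.

    Only one pending line per input file is held in memory at a time.
    Assumption: each input is sorted and each line ends with a newline.
    """
    infiles = [open(path, "r", encoding="utf-8") for path in in_paths]
    try:
        with open(out_path, "w", encoding="utf-8") as outfile:
            # heapq.merge yields the lines of all inputs in sorted order.
            outfile.writelines(heapq.merge(*infiles))
    finally:
        for infile in infiles:
            infile.close()
```

Lines are compared as whole strings, so two entries for the same word from different files end up adjacent in the output, where a second pass can combine their posting lists.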
Re: Finding size of Variable
On Tue, 04 Feb 2014 21:35:05 -0800, Ayushi Dalmia wrote:

> On Wednesday, February 5, 2014 12:59:46 AM UTC+5:30, Tim Chase wrote:
>> On 2014-02-04 14:21, Dave Angel wrote:
>>> To get the total size of a list of strings, try (untested):
>>>
>>>     a = sys.getsizeof(mylist)
>>>     for item in mylist:
>>>         a += sys.getsizeof(item)
>>
>> I always find this sort of accumulation weird (well, at least in Python; it's the *only* way in many other languages) and would write it as
>>
>>     a = getsizeof(mylist) + sum(getsizeof(item) for item in mylist)
>
> This also doesn't give the true size. I did the following:

What do you mean by "true size"? Do you mean the amount of space a certain amount of data will take in memory? With or without the overhead of object headers? Or do you mean how much space it will take when written to disk? You have not been clear about what you are trying to measure.

If you are dealing with one-byte characters, you can measure the amount of memory they take up (excluding object overhead) by counting the number of characters: 23 one-byte characters require 23 bytes. Plus the object overhead gives:

    py> sys.getsizeof('a'*23)
    44

44 bytes (23 bytes for the 23 single-byte characters, plus 21 bytes overhead). One thousand such characters take:

    py> sys.getsizeof('a'*1000)
    1021

If you write such a string to disk, it will take 1000 bytes (or 1KB), unless you use some sort of compression.

>     import sys
>     data=[]
>     f=open('stopWords.txt','r')
>     for line in f:
>         line=line.split()
>         data.extend(line)
>     print sys.getsizeof(data)

This will give you the amount of space taken by the list object. It will *not* give you the amount of space taken by the individual strings. A Python list looks like this:

    | header | array of pointers |

The header is of constant or near-constant size; the array depends on the number of items in the list. It may be bigger than the list, e.g. a list with 1000 items might have allocated space for 2000 items. It will never be smaller.

getsizeof(list) only counts the direct size of that list, including the array, but not the things which the pointers point at. If you want the total size, you need to count them as well.

> where stopWords.txt is a file of size 4KB

My guess is that if you split a 4K file into words, then put the words into a list, you'll probably end up with 6-8K in memory.

-- Steven
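The distinction between the list object and the strings it points at can be sketched as follows. The function name `list_of_strings_size` is made up for this illustration, and the figures it returns are specific to the interpreter and platform; it is an estimate, not an exact accounting of process memory.

```python
import sys

def list_of_strings_size(items):
    """Shallow size of the list plus the size of each distinct string object."""
    total = sys.getsizeof(items)        # header + array of pointers only
    seen = set()
    for item in items:
        if id(item) not in seen:        # count an object referenced twice only once
            seen.add(id(item))
            total += sys.getsizeof(item)
    return total

words = ["alpha", "beta", "gamma"]
print(sys.getsizeof(words))             # the list object alone
print(list_of_strings_size(words))      # the list plus the strings it refers to
```

Counting each object once by `id()` avoids inflating the total when the same string object appears in the list several times.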
Re: Finding size of Variable
On Wed, Feb 5, 2014 at 10:00 PM, Steven D'Aprano steve+comp.lang.pyt...@pearwood.info wrote:
>> where stopWords.txt is a file of size 4KB
>
> My guess is that if you split a 4K file into words, then put the words into a list, you'll probably end up with 6-8K in memory.

I'd guess rather more; Python strings have a fair bit of fixed overhead, so with a whole lot of small strings, it will get more costly.

    >>> sys.version
    '3.4.0b2 (v3.4.0b2:ba32913eb13e, Jan 5 2014, 16:23:43) [MSC v.1600 32 bit (Intel)]'
    >>> sys.getsizeof("asdf")
    29

Stop words tend to be short, rather than long, words, so I'd look at an average of 2-3 letters per word. Assuming they're separated by spaces or newlines, that means there'll be roughly a thousand of them in the file; at 25 bytes of fixed overhead per string (29 bytes for the four-letter string above), that is about 25K of overhead. A bit less if the words are longer, but still quite a bit. (Byte strings have slightly less overhead, 17 bytes apiece, but still quite a bit.)

ChrisA
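The fixed per-string cost described above can be measured directly. A minimal sketch, assuming CPython 3.3 or later and ASCII-only words; the word list is a made-up sample, and the overhead figure differs between 32-bit and 64-bit builds and between Python versions.

```python
import sys

words = ["a", "an", "the", "of", "to", "in"]      # hypothetical sample stop words

payload = sum(len(w) for w in words)              # characters of actual text
in_memory = sum(sys.getsizeof(w) for w in words)  # what the str objects occupy
overhead_per_string = sys.getsizeof("")           # fixed cost of an empty str

print("text:", payload, "bytes")
print("as str objects:", in_memory, "bytes")
print("fixed overhead per string:", overhead_per_string, "bytes")
```

For short words the overhead dwarfs the text itself, so a list of many small strings can take several times the size of the file they were read from.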
Re: Finding size of Variable
Ayushi Dalmia ayushidalmia2...@gmail.com Wrote in message:

> On Wednesday, February 5, 2014 12:59:46 AM UTC+5:30, Tim Chase wrote:
>> On 2014-02-04 14:21, Dave Angel wrote:
>>> To get the total size of a list of strings, try (untested):
>>>
>>>     a = sys.getsizeof(mylist)
>>>     for item in mylist:
>>>         a += sys.getsizeof(item)
>>
>> I always find this sort of accumulation weird (well, at least in Python; it's the *only* way in many other languages) and would write it as
>>
>>     a = getsizeof(mylist) + sum(getsizeof(item) for item in mylist)
>>
>> -tkc
>
> This also doesn't give the true size. I did the following:
>
>     import sys
>     data=[]
>     f=open('stopWords.txt','r')
>     for line in f:
>         line=line.split()
>         data.extend(line)
>     print sys.getsizeof(data)

Did you actually READ either of my posts or Tim's? For a container, you can't just use getsizeof on the container.

    a = sys.getsizeof(data)
    for item in data:
        a += sys.getsizeof(item)
    print a

-- DaveA
Re: Finding size of Variable
On Wednesday, February 5, 2014 7:13:34 PM UTC+5:30, Dave Angel wrote:

> Ayushi Dalmia ayushidalmia2...@gmail.com Wrote in message:
>> On Wednesday, February 5, 2014 12:59:46 AM UTC+5:30, Tim Chase wrote:
>>> On 2014-02-04 14:21, Dave Angel wrote:
>>>> To get the total size of a list of strings, try (untested):
>>>>
>>>>     a = sys.getsizeof(mylist)
>>>>     for item in mylist:
>>>>         a += sys.getsizeof(item)
>>>
>>> I always find this sort of accumulation weird (well, at least in Python; it's the *only* way in many other languages) and would write it as
>>>
>>>     a = getsizeof(mylist) + sum(getsizeof(item) for item in mylist)
>>>
>>> -tkc
>>
>> This also doesn't give the true size. I did the following:
>>
>>     import sys
>>     data=[]
>>     f=open('stopWords.txt','r')
>>     for line in f:
>>         line=line.split()
>>         data.extend(line)
>>     print sys.getsizeof(data)
>
> Did you actually READ either of my posts or Tim's? For a container, you can't just use getsizeof on the container.
>
>     a = sys.getsizeof(data)
>     for item in data:
>         a += sys.getsizeof(item)
>     print a
>
> -- DaveA

Yes, I did. I now understand how to find the size.
Re: Finding size of Variable
On 05/02/2014 14:33, Ayushi Dalmia wrote: Please stop sending double line spaced messages, just follow the instructions here https://wiki.python.org/moin/GoogleGroupsPython to prevent this happening, thanks. -- My fellow Pythonistas, ask not what our language can do for you, ask what you can do for our language. Mark Lawrence -- https://mail.python.org/mailman/listinfo/python-list
Finding size of Variable
Hello, I have 10 files and I need to merge them (using K-way merging). The size of each file is around 200 MB. Suppose I keep the merged data in a variable named mergedData. I had thought of checking the size of mergedData using sys.getsizeof(), but it somehow doesn't give the actual amount of memory occupied. For example, if a file in my file system occupies 4 KB and I read all the lines into a list, the size of the list is reported as only around 2100 bytes. Where am I going wrong? What are the alternatives I can try?
Re: Finding size of Variable
Ayushi Dalmia wrote: I have 10 files and I need to merge them (using K way merging). The size of each file is around 200 MB. Now suppose I am keeping the merged data in a variable named mergedData, I had thought of checking the size of mergedData using sys.getsizeof() but it somehow doesn't gives the actual value of the memory occupied. For example, if a file in my file system occupies 4 KB of data, if I read all the lines in a list, the size of the list is around 2100 bytes only. Where am I going wrong? What are the alternatives I can try? getsizeof() gives you the size of the list only; to complete the picture you have to add the sizes of the lines. However, why do you want to keep track of the actual memory used by variables in your script? You should instead concentrate on the algorithm, and as long as either the size of the dataset is manageable or you can limit the amount of data accessed at a given time you are golden. -- https://mail.python.org/mailman/listinfo/python-list
Re: Finding size of Variable
On Tuesday, February 4, 2014 5:10:25 PM UTC+5:30, Peter Otten wrote: Ayushi Dalmia wrote: I have 10 files and I need to merge them (using K way merging). The size of each file is around 200 MB. Now suppose I am keeping the merged data in a variable named mergedData, I had thought of checking the size of mergedData using sys.getsizeof() but it somehow doesn't gives the actual value of the memory occupied. For example, if a file in my file system occupies 4 KB of data, if I read all the lines in a list, the size of the list is around 2100 bytes only. Where am I going wrong? What are the alternatives I can try? getsizeof() gives you the size of the list only; to complete the picture you have to add the sizes of the lines. However, why do you want to keep track of the actual memory used by variables in your script? You should instead concentrate on the algorithm, and as long as either the size of the dataset is manageable or you can limit the amount of data accessed at a given time you are golden. As I said, I need to merge large files and I cannot afford more I/O operations. So in order to minimise the I/O operation I am writing in chunks. Also, I need to use the merged files as indexes later which should be loaded in the memory for fast access. Hence the concern. Can you please elaborate on the point of taking lines into consideration? -- https://mail.python.org/mailman/listinfo/python-list
Re: Finding size of Variable
On Tuesday, February 4, 2014 2:43:21 PM UTC+2, Ayushi Dalmia wrote: As I said, I need to merge large files and I cannot afford more I/O operations. So in order to minimise the I/O operation I am writing in chunks. Also, I need to use the merged files as indexes later which should be loaded in the memory for fast access. Hence the concern. Can you please elaborate on the point of taking lines into consideration? have you tried os.sendfile()? http://docs.python.org/dev/library/os.html#os.sendfile -- https://mail.python.org/mailman/listinfo/python-list
Re: Finding size of Variable
Ayushi Dalmia ayushidalmia2...@gmail.com Wrote in message: getsizeof() gives you the size of the list only; to complete the picture you have to add the sizes of the lines. However, why do you want to keep track of the actual memory used by variables in your script? You should instead concentrate on the algorithm, and as long as either the size of the dataset is manageable or you can limit the amount of data accessed at a given time you are golden. As I said, I need to merge large files and I cannot afford more I/O operations. So in order to minimise the I/O operation I am writing in chunks. Also, I need to use the merged files as indexes later which should be loaded in the memory for fast access. Hence the concern. Can you please elaborate on the point of taking lines into consideration? Please don't doublespace your quotes. If you must use googlegroups, fix its bugs before posting. There's usually no net gain in trying to 'chunk' your output to a text file. The python file system already knows how to do that for a sequential file. For list of strings just add the getsizeof for the list to the sum of the getsizeof of all the list items. -- DaveA -- https://mail.python.org/mailman/listinfo/python-list
Re: Finding size of Variable
On Tuesday, February 4, 2014 6:39:00 PM UTC+5:30, Dave Angel wrote: Ayushi Dalmia ayushidalmia2...@gmail.com Wrote in message: getsizeof() gives you the size of the list only; to complete the picture you have to add the sizes of the lines. However, why do you want to keep track of the actual memory used by variables in your script? You should instead concentrate on the algorithm, and as long as either the size of the dataset is manageable or you can limit the amount of data accessed at a given time you are golden. As I said, I need to merge large files and I cannot afford more I/O operations. So in order to minimise the I/O operation I am writing in chunks. Also, I need to use the merged files as indexes later which should be loaded in the memory for fast access. Hence the concern. Can you please elaborate on the point of taking lines into consideration? Please don't doublespace your quotes. If you must use googlegroups, fix its bugs before posting. There's usually no net gain in trying to 'chunk' your output to a text file. The python file system already knows how to do that for a sequential file. For list of strings just add the getsizeof for the list to the sum of the getsizeof of all the list items. -- DaveA Hey! I need to chunk out the outputs otherwise it will give Memory Error. I need to do some postprocessing on the data read from the file too. If I donot stop before memory error, I won't be able to perform any more operations on it. -- https://mail.python.org/mailman/listinfo/python-list
Re: Finding size of Variable
On Tuesday, February 4, 2014 6:23:19 PM UTC+5:30, Asaf Las wrote: On Tuesday, February 4, 2014 2:43:21 PM UTC+2, Ayushi Dalmia wrote: As I said, I need to merge large files and I cannot afford more I/O operations. So in order to minimise the I/O operation I am writing in chunks. Also, I need to use the merged files as indexes later which should be loaded in the memory for fast access. Hence the concern. Can you please elaborate on the point of taking lines into consideration? have you tried os.sendfile()? http://docs.python.org/dev/library/os.html#os.sendfile os.sendfile will not serve my purpose. I not only need to merge files, but do it in a sorted way. Thus some postprocessing is needed. -- https://mail.python.org/mailman/listinfo/python-list
Re: Finding size of Variable
On 2014-02-04 14:21, Dave Angel wrote: To get the total size of a list of strings, try (untested): a = sys.getsizeof (mylist ) for item in mylist: a += sys.getsizeof (item) I always find this sort of accumulation weird (well, at least in Python; it's the *only* way in many other languages) and would write it as a = getsizeof(mylist) + sum(getsizeof(item) for item in mylist) -tkc -- https://mail.python.org/mailman/listinfo/python-list
Re: Finding size of Variable
On 04/02/2014 19:21, Dave Angel wrote: Ayushi Dalmia ayushidalmia2...@gmail.com Wrote in message: Where am I going wrong? What are the alternatives I can try? You've rejected all the alternatives so far without showing your code, or even properly specifying your problem. To get the total size of a list of strings, try (untested): a = sys.getsizeof (mylist ) for item in mylist: a += sys.getsizeof (item) The documentation for sys.getsizeof: http://docs.python.org/dev/library/sys#sys.getsizeof warns about the limitations of this function when applied to a container, and even points to a recipe by Raymond Hettinger which attempts to do a more complete job. TJG -- https://mail.python.org/mailman/listinfo/python-list
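For nested containers such as a dict mapping words to posting lists, the same idea has to be applied recursively. The following is a minimal sketch in that spirit, not the recipe the documentation links to: the name `total_size` and the set of container types it handles are choices made for this illustration, and objects of other types are counted by their shallow size only.

```python
import sys

def total_size(obj, seen=None):
    """Approximate deep size of nested lists, tuples, sets and dicts."""
    if seen is None:
        seen = set()
    if id(obj) in seen:             # already counted (shared or cyclic reference)
        return 0
    seen.add(id(obj))
    size = sys.getsizeof(obj)
    if isinstance(obj, dict):
        for key, value in obj.items():
            size += total_size(key, seen) + total_size(value, seen)
    elif isinstance(obj, (list, tuple, set, frozenset)):
        for item in obj:
            size += total_size(item, seen)
    return size

posting = ["1", "2"]                # hypothetical doc ids for one word
index = {"word": posting}
print(total_size(index))
```

Walking a large index this way on every insertion is itself expensive, so it suits occasional checks better than a per-line test inside a merge loop.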
Re: Finding size of Variable
On Tuesday, February 4, 2014 7:36:48 PM UTC+5:30, Dennis Lee Bieber wrote:

> On Tue, 4 Feb 2014 05:19:48 -0800 (PST), Ayushi Dalmia ayushidalmia2...@gmail.com declaimed the following:
>> I need to chunk out the outputs otherwise it will give Memory Error. I need to do some postprocessing on the data read from the file too. If I do not stop before memory error, I won't be able to perform any more operations on it.
>
> 10 200MB files is only 2GB... Most any 64-bit processor these days can handle that. Even some 32-bit systems could handle it (WinXP booted with the server option gives 3GB to user processes -- if the 4GB was installed in the machine).
>
> However, you speak of an n-way merge. The traditional merge operation only reads one record from each file at a time, examines them to find which sorts first, writes that record, reads the next record from the file it came from, and then reassesses the set.
>
> You mention needing to chunk the data -- that implies performing a merge sort in which you read a few records from each file into memory, sort them, and write them out to newFile1; then read the same number of records from each file, sort, and write them to newFile2, up to however many files you intend to work with -- at that point you go back and append the next chunk to newFile1. When done, each file contains chunks of n*r records. You now make newFilex the inputs, read/merge the records from those chunks outputting to another file1; when you reach the end of the first chunk in the files you then read/merge the second chunk into another file2. You repeat this process until you end up with only one chunk in one file.
> --
> Wulfraed Dennis Lee Bieber AF6VN
> wlfr...@ix.netcom.com HTTP://wlfraed.home.netcom.com/

The way you mentioned for merging the files is an option, but it will involve a lot of I/O operations. Also, I do not want the size of the file to increase beyond a certain point. When the file size reaches a certain limit, I want to start writing to a new file. This is because I want to load them into memory again later.
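One way to cap the size of each output file without measuring Python objects at all is to count the bytes actually written and start a new file when a limit is reached. A minimal sketch under stated assumptions: the class name `RollingWriter`, the `final<N>.txt` naming scheme and the UTF-8 encoding are all made up for this illustration and do not come from the thread.

```python
import os

class RollingWriter:
    """Write lines to numbered files, starting a new file once a byte limit is reached."""

    def __init__(self, folder, max_bytes, prefix="final"):
        self.folder = folder
        self.max_bytes = max_bytes
        self.prefix = prefix
        self.count = 0        # number of files opened so far
        self.written = 0      # bytes written to the current file
        self.file = None

    def _open_next(self):
        if self.file is not None:
            self.file.close()
        name = "%s%d.txt" % (self.prefix, self.count)
        self.file = open(os.path.join(self.folder, name), "wb")
        self.count += 1
        self.written = 0

    def write_line(self, line):
        data = line.encode("utf-8")
        # Roll over before a write would exceed the limit,
        # but never leave a file empty.
        if self.file is None or (self.written and self.written + len(data) > self.max_bytes):
            self._open_next()
        self.file.write(data)
        self.written += len(data)

    def close(self):
        if self.file is not None:
            self.file.close()
            self.file = None
```

Fed from a streaming merge, each output file stays at or below `max_bytes` (unless a single line is larger than the limit), which bounds the on-disk size of each index piece. The in-memory size after loading will be larger because of per-object overhead.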
Re: Finding size of Variable
On Wednesday, February 5, 2014 12:51:31 AM UTC+5:30, Dave Angel wrote:

> Ayushi Dalmia ayushidalmia2...@gmail.com Wrote in message:
>> Where am I going wrong? What are the alternatives I can try?
>
> You've rejected all the alternatives so far without showing your code, or even properly specifying your problem. To get the total size of a list of strings, try (untested):
>
>     a = sys.getsizeof(mylist)
>     for item in mylist:
>         a += sys.getsizeof(item)
>
> This can be high if some of the strings are interned and get counted twice. But you're not likely to get closer without some knowledge of the data objects and where they come from.
> -- DaveA

Hello Dave, I just thought that saving others' time is better and hence I explained only a subset of my problem. Here is what I am trying to do: I am trying to index the current Wikipedia dump without using databases and create a search engine for Wikipedia documents. Note, I CANNOT USE DATABASES.

My approach: I am parsing the Wikipedia pages using a SAX parser, and then I am dumping the words along with the posting list (a list of doc ids in which the word is present) into different files after reading 'X' number of pages. Now these files may have the same word and hence I need to merge them and write the final index again. These final indexes must be of limited size, as I need to load them into memory again later. This is where I am stuck. I need to know how to determine the size of the content in a variable before I write it to the file.

Here is the code for my merging:

    def mergeFiles(pathOfFolder, countFile):
        listOfWords={}
        indexFile={}
        topOfFile={}
        flag=[0]*countFile
        data=defaultdict(list)
        heap=[]
        countFinalFile=0
        for i in xrange(countFile):
            fileName = pathOfFolder+'\index'+str(i)+'.txt.bz2'
            indexFile[i]= bz2.BZ2File(fileName, 'rb')
            flag[i]=1
            topOfFile[i]=indexFile[i].readline().strip()
            listOfWords[i] = topOfFile[i].split(' ')
            if listOfWords[i][0] not in heap:
                heapq.heappush(heap, listOfWords[i][0])
        while any(flag)==1:
            temp = heapq.heappop(heap)
            for i in xrange(countFile):
                if flag[i]==1:
                    if listOfWords[i][0]==temp:
                        # This is where I am stuck. I cannot wait until memory
                        # error, as I need to do some postprocessing too.
                        try:
                            data[temp].extend(listOfWords[i][1:])
                        except MemoryError:
                            writeFinalIndex(data, countFinalFile, pathOfFolder)
                            data=defaultdict(list)
                            countFinalFile+=1
                        topOfFile[i]=indexFile[i].readline().strip()
                        if topOfFile[i]=='':
                            flag[i]=0
                            indexFile[i].close()
                            os.remove(pathOfFolder+'\index'+str(i)+'.txt.bz2')
                        else:
                            listOfWords[i] = topOfFile[i].split(' ')
                            if listOfWords[i][0] not in heap:
                                heapq.heappush(heap, listOfWords[i][0])
        writeFinalIndex(data, countFinalFile, pathOfFolder)

countFile is the number of files and the writeFinalIndex method writes into the file.
Re: Finding size of Variable
On Wednesday, February 5, 2014 12:59:46 AM UTC+5:30, Tim Chase wrote: On 2014-02-04 14:21, Dave Angel wrote: To get the total size of a list of strings, try (untested): a = sys.getsizeof (mylist ) for item in mylist: a += sys.getsizeof (item) I always find this sort of accumulation weird (well, at least in Python; it's the *only* way in many other languages) and would write it as a = getsizeof(mylist) + sum(getsizeof(item) for item in mylist) -tkc This also doesn't gives the true size. I did the following: import sys data=[] f=open('stopWords.txt','r') for line in f: line=line.split() data.extend(line) print sys.getsizeof(data) where stopWords.txt is a file of size 4KB -- https://mail.python.org/mailman/listinfo/python-list
Re: Finding size of Variable
On Wednesday, February 5, 2014 11:05:05 AM UTC+5:30, Ayushi Dalmia wrote:

> This also doesn't give the true size. I did the following:
>
>     import sys
>     data=[]
>     f=open('stopWords.txt','r')
>     for line in f:
>         line=line.split()
>         data.extend(line)
>     print sys.getsizeof(data)
>
> where stopWords.txt is a file of size 4KB

Try getsizeof("".join(data))

General advice:
- You have been recommended (by Chris??) that you should use a database
- You say you can't use a database (for whatever reason)

Now the fact is you NEED database (functionality). How to escape this catch-22 situation? In computer science it's called, somewhat sardonically, "Greenspun's 10th rule".

And the best way out is to
1 isolate those aspects of database functionality you need
2 temporarily forget about your original problem and implement the (subset of) DBMS functionality you need
3 Use 2 above to implement 1
Re: Finding size of Variable
On Wednesday, February 5, 2014 11:15:09 AM UTC+5:30, Rustom Mody wrote:

> On Wednesday, February 5, 2014 11:05:05 AM UTC+5:30, Ayushi Dalmia wrote:
>> This also doesn't give the true size. I did the following:
>>
>>     import sys
>>     data=[]
>>     f=open('stopWords.txt','r')
>>     for line in f:
>>         line=line.split()
>>         data.extend(line)
>>     print sys.getsizeof(data)
>>
>> where stopWords.txt is a file of size 4KB
>
> Try getsizeof("".join(data))
>
> General advice:
> - You have been recommended (by Chris??) that you should use a database
> - You say you can't use a database (for whatever reason)
>
> Now the fact is you NEED database (functionality). How to escape this catch-22 situation? In computer science it's called, somewhat sardonically, "Greenspun's 10th rule".
>
> And the best way out is to
> 1 isolate those aspects of database functionality you need
> 2 temporarily forget about your original problem and implement the (subset of) DBMS functionality you need
> 3 Use 2 above to implement 1

Hello Rustom, thanks for the enlightenment. I did not know about Greenspun's Tenth Rule. It is interesting to know that. However, it is an academic project and not a research one. Hence I do not have the liberty to choose what to work with. Life is easier with databases, but I am not allowed to use them. Thanks for the tip. I will try to replicate that functionality.