Re: Working with the set of real numbers (was: Finding size of Variable)

2014-03-05 Thread Steven D'Aprano
Following up on my own post.

On Wed, 05 Mar 2014 07:52:01 +, Steven D'Aprano wrote:

 On Tue, 04 Mar 2014 23:25:37 -0500, Roy Smith wrote:
 
 I stopped paying attention to mathematicians when they tried to
 convince me that the sum of all natural numbers is -1/12.
[...]
 In effect, the author Mark Chu-Carroll in the GoodMath blog above wants
 to make the claim that the divergent sum is not equal to ζ(-1), but
 everywhere you find that divergent sum in your calculations you can rub
 it out and replace it with ζ(-1), which is -1/12. In other words, he's
 accepting that the divergent sum behaves *as if* it were equal to -1/12,
 he just doesn't want to say that it *is* equal to -1/12.
 
 Is this a mere semantic trick, or a difference of deep and fundamental
 importance? Mark C-C thinks it's an important difference. Mathematicians
 who actually work on this stuff all the time think he's making a
 semantic trick to avoid facing up to the fact that sums of infinite
 sequences don't always behave like sums of finite sequences.

Here's another mathematician who is even more explicit about what she's 
complaining about:

http://blogs.scientificamerican.com/roots-of-unity/2014/01/20/is-the-sum-of-positive-integers-negative/

[quote]
There is a meaningful way to associate the number -1/12 to the 
series 1+2+3+4…, but in my opinion, it is misleading to call 
it the sum of the series.
[end quote]

Evelyn Lamb's objection isn't about the mathematics that leads to the 
conclusion that the sum of natural numbers is equivalent to -1/12. That 
conclusion is pretty much bulletproof. Her objection is over the use of 
the word "equals" to describe that association. Or possibly the use of 
the word "sum" to describe what we're doing when we replace the infinite 
series with -1/12.

Whatever it is that we're doing, it doesn't seem to have the same 
behavioural properties as summing finitely many finite numbers. So 
perhaps she is right, and we shouldn't call the sum of a divergent series 
a sum?
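
For anyone who wants to poke at this numerically, here is a minimal sketch. It is an illustration only, not something taken from either linked article. It assumes one particular regularisation: damp each term n by exp(-n*eps), sum the now-convergent series, and look at what is left after removing the 1/eps**2 part that blows up as eps shrinks. The function name smoothed_sum and the cutoff of 60/eps terms are choices made for this example.

```python
import math

def smoothed_sum(eps):
    """Sum n * exp(-n * eps) for n = 1, 2, 3, ... (convergent for eps > 0)."""
    # Assumption: terms beyond n = 60/eps are smaller than about 1e-22 and are dropped.
    terms = int(60 / eps)
    return math.fsum(n * math.exp(-n * eps) for n in range(1, terms + 1))

for eps in (0.1, 0.01):
    # The damped sum behaves like 1/eps**2 - 1/12 + (terms that vanish with eps).
    leftover = smoothed_sum(eps) - 1 / eps**2
    print(eps, leftover)

constant_term = smoothed_sum(0.01) - 1 / 0.01**2
```

The damped sum itself grows without bound as eps goes to zero, just as the plain sum does. Only the constant left behind after discarding the divergent piece settles near -1/12, which is one way to see why "sum" is a contentious word for it.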


-- 
Steven
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Working with the set of real numbers (was: Finding size of Variable)

2014-03-05 Thread wxjmfauth
Mathematics?
The Flexible String Representation is a very nice example
of a mathematical absurdity.

jmf

PS Do not even think to expect to contradict me. Hint:
sheet of paper and pencil.



Re: Working with the set of real numbers (was: Finding size of Variable)

2014-03-05 Thread Oscar Benjamin
On 5 March 2014 07:52, Steven D'Aprano st...@pearwood.info wrote:
 On Tue, 04 Mar 2014 23:25:37 -0500, Roy Smith wrote:

 I stopped paying attention to mathematicians when they tried to convince
 me that the sum of all natural numbers is -1/12.

 I'm pretty sure they did not. Possibly a physicist may have tried to tell
 you that, but most mathematicians consider physicists to be lousy
 mathematicians, and the mere fact that their results seem to actually
 work in practice is an embarrassment for the entire universe. A
 mathematician would probably have said that the sum of all natural
 numbers is divergent and therefore there is no finite answer.

Why the dig at physicists? I think most physicists would be able to
tell you that the sum of all natural numbers is not -1/12. In fact
most people with very little background in mathematics can tell you
that.

The argument that the sum of all natural numbers comes to -1/12 is
just some kind of hoax. I don't think *anyone* seriously believes it.

 Well, that is, apart from mathematicians like Euler and Ramanujan. When
 people like them tell you something, you better pay attention.

Really? Euler didn't even know about absolutely convergent series (the
point in question) and would quite happily combine infinite series to
obtain a formula.

<snip>
 Normally mathematicians will tell you that divergent series don't have a
 total. That's because often the total you get can vary depending on how
 you add them up. The classic example is summing the infinite series:

 1 - 1 + 1 - 1 + 1 - ...

There is a distinction between absolute convergence and convergence.
Rearranging the order of the terms in the above infinite sum is
invalid because the series is not absolutely convergent. For this
particular series there is no sense in which its sum converges on an
answer but there are other series that cannot be rearranged while
still being convergent:
http://en.wikipedia.org/wiki/Harmonic_series_(mathematics)#Alternating_harmonic_series
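
A short sketch of what goes wrong when a conditionally convergent series is rearranged. This is an illustration added for readers, using the alternating harmonic series from the link above. The particular rearrangement (one positive term followed by two negative ones) and the function names are choices made for this example.

```python
import math

def alternating_harmonic(blocks):
    """1 - 1/2 + 1/3 - 1/4 + ... in the usual order, two terms per block."""
    return math.fsum(1 / (2 * k - 1) - 1 / (2 * k) for k in range(1, blocks + 1))

def rearranged(blocks):
    """The same terms, reordered: one positive term, then two negative ones."""
    return math.fsum(
        1 / (2 * k - 1) - 1 / (4 * k - 2) - 1 / (4 * k)
        for k in range(1, blocks + 1)
    )

usual = alternating_harmonic(100_000)
shuffled = rearranged(100_000)
print(usual, math.log(2))         # usual order approaches ln 2
print(shuffled, math.log(2) / 2)  # rearranged order approaches half of that
```

Every term of the original series appears exactly once in the rearrangement, yet the limit changes. That cannot happen for an absolutely convergent series.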

Personally I think it's reasonable to just say that the sum of the
natural numbers is infinite rather than messing around with terms like
"undefined", "divergent", or "existence". There is a clear difference
between a series (or any limit) that fails to converge asymptotically
and another that just goes to +-infinity. The difference is usually
also relevant to any practical application of this kind of maths.


Oscar


Re: Working with the set of real numbers (was: Finding size of Variable)

2014-03-05 Thread Steven D'Aprano
On Wed, 05 Mar 2014 12:21:37 +, Oscar Benjamin wrote:

 On 5 March 2014 07:52, Steven D'Aprano st...@pearwood.info wrote:
 On Tue, 04 Mar 2014 23:25:37 -0500, Roy Smith wrote:

 I stopped paying attention to mathematicians when they tried to
 convince me that the sum of all natural numbers is -1/12.

 I'm pretty sure they did not. Possibly a physicist may have tried to
 tell you that, but most mathematicians consider physicists to be lousy
 mathematicians, and the mere fact that their results seem to actually
 work in practice is an embarrassment for the entire universe. A
 mathematician would probably have said that the sum of all natural
 numbers is divergent and therefore there is no finite answer.
 
 Why the dig at physicists? 

There is considerable professional rivalry between the branches of 
science. Physicists tend to look at themselves as the paragon of 
scientific hardness, and look down at mere chemists, who look down at 
biologists. (Which is ironic really, since the actual difficulty in doing 
good science is in the opposite order. Hundreds of years ago, using quite 
primitive techniques, people were able to predict the path of comets 
accurately. I'd like to see them predict the path of a house fly.) 
According to this greedy reductionist viewpoint, since all living 
creatures are made up of chemicals, biology is just a subset of 
chemistry, and since chemicals are made up of atoms, chemistry is 
likewise just a subset of physics.

Physics is the fundamental science, at least according to the physicists, 
and Real Soon Now they'll have a Theory Of Everything, something small 
enough to print on a tee-shirt, which will explain everything. At least 
in principle.

Theoretical physicists who work on the deep, fundamental questions of 
Space and Time tend to be the worst for this reductionist streak. They 
have a tendency to think of themselves as elites in an elite field of 
science. Mathematicians, possibly out of professional jealousy, like to 
look down at physics as mere applied maths.

They also get annoyed that physicists often aren't as rigorous with their 
maths as they should be. The controversy over renormalisation in Quantum 
Electrodynamics (QED) is a good example. When you use QED to try to 
calculate the strength of the electron's electric field, you end up 
trying to sum a lot of infinities. Basically, the interaction of the 
electron's charge with its own electric field gets larger the more 
closely you look. The sum of all those interactions is a divergent 
series. So the physicists basically cancelled out all the infinities, and 
lo and behold just like magic what's left over gives you the right 
answer. Richard Feynman even described it as "hocus-pocus".

The mathematicians *hated* this, and possibly still do, because it looks 
like cheating. It's certainly not rigorous, at least it wasn't back in 
the 1940s. The mathematicians were appalled, and loudly said "You can't 
do that!" and the physicists basically said "Oh yeah, watch us!" and 
ignored them, and then the Universe had the terribly bad manners to side 
with the physicists. QED has turned out to be *astonishingly* accurate, 
the most accurate physical theory of all time. The hocus-pocus worked.


 I think most physicists would be able to tell
 you that the sum of all natural numbers is not -1/12. In fact most
 people with very little background in mathematics can tell you that.

Ah, but there's the rub. People with *very little* background in 
mathematics will tell you that. People with *a very deep and solid* 
background in mathematics will tell you different, particularly if their 
background is complex analysis. (That's *complex numbers*, not 
complicated -- although it is complicated too.)


 The argument that the sum of all natural numbers comes to -1/12 is just
 some kind of hoax. I don't think *anyone* seriously believes it.

You would be wrong. I suggest you read the links I gave earlier. Even the 
mathematicians who complain about describing this using the word "equals" 
don't try to dispute the fact that you can identify the sum of natural 
numbers with ζ(-1), or that ζ(-1) = -1/12. They simply dispute that we 
should describe this association as "equals".

What nobody believes is that the sum of natural numbers is a convergent 
series that sums to -1/12, because it is provably not.

In other words, this is not an argument about the maths. Everyone who 
looks at the maths has to admit that it is sound. It's an argument about 
the words we use to describe this. Is it legitimate to say that the 
infinite sum *equals* -1/12? Or only that the series has the value -1/12? 
Or that we can "associate" (talk about a sloppy, non-rigorous term!) the 
series with -1/12?


 Well, that is, apart from mathematicians like Euler and Ramanujan. When
 people like them tell you something, you better pay attention.
 
 Really? Euler didn't even know about absolutely convergent series (the
 point in question) and would quite happily 

Re: Working with the set of real numbers (was: Finding size of Variable)

2014-03-05 Thread Chris Angelico
On Thu, Mar 6, 2014 at 4:43 AM, Steven D'Aprano
steve+comp.lang.pyt...@pearwood.info wrote:
 Physics is the fundamental science, at least according to the physicists,
 and Real Soon Now they'll have a Theory Of Everything, something small
 enough to print on a tee-shirt, which will explain everything. At least
 in principle.

Everything is, except what isn't.

That's my theory, and I'm sticking to it!

ChrisA


Re: Working with the set of real numbers (was: Finding size of Variable)

2014-03-05 Thread Chris Kaynor
On Wed, Mar 5, 2014 at 9:43 AM, Steven D'Aprano 
steve+comp.lang.pyt...@pearwood.info wrote:

 At one time, Euler summed an infinite series and got -1, from which he
 concluded that -1 was (in some sense) larger than infinity. I don't know
 what justification he gave, but the way I think of it is to take the
 number line from -∞ to +∞ and then bend it back upon itself so that there
 is a single infinity, rather like the projective plane only in a single
 dimension. If you start at zero and move towards increasingly large
 numbers, then like Buzz Lightyear you can go to infinity and beyond:

 0 - 1 - 10 - 1 - ... ∞ - ... -1 - -10 - -1 - 0


This makes me think that maybe the universe is using ones' or two's complement
math (is there a negative zero?)...

Chris


Re: Working with the set of real numbers (was: Finding size of Variable)

2014-03-05 Thread Grant Edwards
On 2014-03-05, Chris Kaynor ckay...@zindagigames.com wrote:
 On Wed, Mar 5, 2014 at 9:43 AM, Steven D'Aprano 
 steve+comp.lang.pyt...@pearwood.info wrote:

 At one time, Euler summed an infinite series and got -1, from which he
 concluded that -1 was (in some sense) larger than infinity. I don't know
 what justification he gave, but the way I think of it is to take the
 number line from -∞ to +∞ and then bend it back upon itself so that there
 is a single infinity, rather like the projective plane only in a single
 dimension. If you start at zero and move towards increasingly large
 numbers, then like Buzz Lightyear you can go to infinity and beyond:

 0 - 1 - 10 - 1 - ... ∞ - ... -1 - -10 - -1 - 0


 This makes me think that maybe the universe is using ones' or two's complement
 math (is there a negative zero?)...

If the Universe (like most all Python implementations) is using
IEEE-754 floating point, there is.
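
A quick sketch of how that negative zero shows up in Python, assuming the usual IEEE-754 double behind the float type. The variable name neg_zero is just for this illustration.

```python
import math

neg_zero = -0.0

# Negative zero compares equal to positive zero...
print(neg_zero == 0.0)

# ...but the sign bit is still there, and copysign can see it.
print(math.copysign(1.0, neg_zero))

# repr keeps the sign as well.
print(repr(neg_zero))

# Python ints, by contrast, have only one zero.
print(-0 == 0, repr(-0))
```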

-- 
Grant Edwards               grant.b.edwards        Yow! This PIZZA symbolizes
                                  at               my COMPLETE EMOTIONAL
                              gmail.com            RECOVERY!!


Re: Working with the set of real numbers (was: Finding size of Variable)

2014-03-05 Thread Oscar Benjamin
On 5 March 2014 17:43, Steven D'Aprano
steve+comp.lang.pyt...@pearwood.info wrote:
 On Wed, 05 Mar 2014 12:21:37 +, Oscar Benjamin wrote:

 The argument that the sum of all natural numbers comes to -1/12 is just
 some kind of hoax. I don't think *anyone* seriously believes it.

 You would be wrong. I suggest you read the links I gave earlier. Even the
 mathematicians who complain about describing this using the word "equals"
 don't try to dispute the fact that you can identify the sum of natural
 numbers with ζ(-1), or that ζ(-1) = -1/12. They simply dispute that we
 should describe this association as "equals".

 What nobody believes is that the sum of natural numbers is a convergent
 series that sums to -1/12, because it is provably not.

 In other words, this is not an argument about the maths. Everyone who
 looks at the maths has to admit that it is sound. It's an argument about
 the words we use to describe this. Is it legitimate to say that the
 infinite sum *equals* -1/12? Or only that the series has the value -1/12?
 Or that we can "associate" (talk about a sloppy, non-rigorous term!) the
 series with -1/12?

This is the point. You can identify numbers with many different
things. It does not mean to say that the thing is equal to that
number. I can associate the number 2 with my bike since it has 2
wheels. That doesn't mean that the bike is equal to 2.

So the problem with saying that the sum of the natural numbers equals
-1/12 is precisely as you say with the word "equals" because they're
not equal!

If you restate the conclusion in a more accurate (but more technical and
less accessible) way, namely that the analytic continuation of a related
set of convergent series has the value -1/12 at the point that would
correspond to this divergent series, then it becomes less mysterious.
Do I really have to associate the finite negative value found in the
analytic continuation with the sum of the series that is provably
greater than any finite number?
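
Spelled out, the restated conclusion looks roughly like this. It is a sketch of one standard textbook route, via the Dirichlet eta function, and is not quoted from the linked posts. The defining series for ζ only converges for Re s > 1, and the value at s = -1 comes from the continuation, not from the series:

```latex
\zeta(s) = \sum_{n=1}^{\infty} n^{-s} \quad (\operatorname{Re} s > 1),
\qquad
\eta(s) = \sum_{n=1}^{\infty} (-1)^{n-1} n^{-s} = \bigl(1 - 2^{1-s}\bigr)\,\zeta(s)

% The relation between \eta and \zeta extends by analytic continuation.
% The continued \eta takes the value 1/4 at s = -1, hence:
\zeta(-1) = \frac{\eta(-1)}{1 - 2^{2}} = \frac{1/4}{-3} = -\frac{1}{12}
```

Nothing in this chain claims that the partial sums 1, 3, 6, 10, ... approach -1/12. They do not.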

<snip>

 At one time, Euler summed an infinite series and got -1, from which he
 concluded that -1 was (in some sense) larger than infinity. I don't know
 what justification he gave, but the way I think of it is to take the
 number line from -∞ to +∞ and then bend it back upon itself so that there
 is a single infinity, rather like the projective plane only in a single
 dimension. If you start at zero and move towards increasingly large
 numbers, then like Buzz Lightyear you can go to infinity and beyond:

 0 - 1 - 10 - 1 - ... ∞ - ... -1 - -10 - -1 - 0

 In this sense, -1/12 is larger than infinity.

There are many examples that appear to show wrapping round from
+infinity to -infinity e.g. the tan function. The thing is that it is
not really physical (or meaningful in any direct sense).

So for example I might consider the forces on a particle, apply
Newton's 2nd law and arrive at a differential equation for the
acceleration of the particle, solve the equation and find that the
position of the particle at time t is given by tan(t). This would seem
to imply that as t increases toward pi/2 the particle heads off
infinity miles West but at the exact time pi/2 it wraps around to
reappear at infinity miles East and starts heading back toward its
starting point. The truth is less interesting: the solution tan(t)
becomes invalid at pi/2 and mathematics can tell us nothing about what
happens after that even if all the physics we used was exactly true.
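
A small numerical sketch of the apparent wrap-around, added as an illustration. The offset delta is arbitrary, and the variable names are just for this example.

```python
import math

delta = 1e-6

# Just before pi/2 the "position" tan(t) is huge and positive...
before = math.tan(math.pi / 2 - delta)

# ...and just after pi/2 the formula returns something huge and negative.
after = math.tan(math.pi / 2 + delta)

print(before, after)
```

The formula happily produces a value on the far side of pi/2, but as argued above, that value says nothing about the particle: the solution stopped being valid at the singularity.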

 Now of course this is an ad hoc sloppy argument, but I'm not a
 professional mathematician. However I can tell you that it's pretty close
 to what the professional mathematicians and physicists do with negative
 absolute temperatures, and that is rigorous.

 http://en.wikipedia.org/wiki/Negative_temperature

The key point from that page is the sentence "A definition of
temperature can be based on the relationship". It is clear that
temperature is a theoretical abstraction. We have intuitive
understandings of what it means but in order for the current body of
thermodynamic theory to be consistent it is necessary to sometimes
give negative values to the temperature. There's nothing unintuitive
about negative temperatures if you understand the usual thermodynamic
definitions of temperature.

 Personally I think it's reasonable to just say that the sum of the
 natural numbers is infinite rather than messing around with terms like
 "undefined", "divergent", or "existence". There is a clear difference between
 a series (or any limit) that fails to converge asymptotically and
 another that just goes to +-infinity. The difference is usually also
 relevant to any practical application of this kind of maths.

 And this is where you get it exactly backwards. The *practical
 application* comes from physics, where they do exactly what you argue
 against: they associate ζ(-1) with the sum of the natural numbers (see, I
 can avoid the word "equals" too), and *it works*.

I don't know all the details of what they do there and whether or not

Re: Working with the set of real numbers (was: Finding size of Variable)

2014-03-05 Thread Roy Smith
In article 53176225$0$29987$c3e8da3$54964...@news.astraweb.com,
 Steven D'Aprano steve+comp.lang.pyt...@pearwood.info wrote:

 Physics is the fundamental science, at least according to the physicists, 
 and Real Soon Now they'll have a Theory Of Everything, something small 
 enough to print on a tee-shirt, which will explain everything. At least 
 in principle.

A mathematician, a chemist, and a physicist are arguing about the nature of 
prime numbers.  The chemist says, "All odd numbers are prime.  Look, I 
can prove it.  Three is prime.  Five is prime.  Seven is prime."  The 
mathematician says, "That's nonsense.  Nine is not prime."  The 
physicist looks at him and says, "Hmm, you may be right, but eleven 
is prime, and thirteen is prime.  It appears that within the limits of 
experimental error, all odd numbers are indeed prime!"


Re: Working with the set of real numbers (was: Finding size of Variable)

2014-03-05 Thread Steven D'Aprano
On Wed, 05 Mar 2014 21:31:51 -0500, Roy Smith wrote:

 In article 53176225$0$29987$c3e8da3$54964...@news.astraweb.com,
  Steven D'Aprano steve+comp.lang.pyt...@pearwood.info wrote:
 
 Physics is the fundamental science, at least according to the
 physicists, and Real Soon Now they'll have a Theory Of Everything,
 something small enough to print on a tee-shirt, which will explain
 everything. At least in principle.
 
 A mathematician, a chemist, and a physicist are arguing about the nature of
 prime numbers.  The chemist says, "All odd numbers are prime.  Look, I
 can prove it.  Three is prime.  Five is prime.  Seven is prime."  The
 mathematician says, "That's nonsense.  Nine is not prime."  The
 physicist looks at him and says, "Hmm, you may be right, but eleven is
 prime, and thirteen is prime.  It appears that within the limits of
 experimental error, all odd numbers are indeed prime!"

They ask a computer programmer to adjudicate who is right, so he writes a 
program to print out all the primes:

1 is prime
1 is prime
1 is prime
1 is prime
1 is prime
...



-- 
Steven D'Aprano
http://import-that.dreamwidth.org/


Re: Working with the set of real numbers (was: Finding size of Variable)

2014-03-05 Thread Chris Angelico
On Thu, Mar 6, 2014 at 2:06 PM, Steven D'Aprano
steve+comp.lang.pyt...@pearwood.info wrote:
 They ask a computer programmer to adjudicate who is right, so he writes a
 program to print out all the primes:

 1 is prime
 1 is prime
 1 is prime
 1 is prime
 1 is prime

And he claimed that he was correct, because he had - as is known to be
true in reality - a countably infinite number of primes.

ChrisA


Re: Working with the set of real numbers (was: Finding size of Variable)

2014-03-05 Thread Grant Edwards
On 2014-03-06, Roy Smith r...@panix.com wrote:
 In article 53176225$0$29987$c3e8da3$54964...@news.astraweb.com,
  Steven D'Aprano steve+comp.lang.pyt...@pearwood.info wrote:

 Physics is the fundamental science, at least according to the
 physicists, and Real Soon Now they'll have a Theory Of Everything,
 something small enough to print on a tee-shirt, which will explain
 everything. At least in principle.

 A mathematician, a chemist, and a physicist are arguing about the nature of 
 prime numbers.  The chemist says, "All odd numbers are prime.  Look, I 
 can prove it.  Three is prime.  Five is prime.  Seven is prime."  The 
 mathematician says, "That's nonsense.  Nine is not prime."  The 
 physicist looks at him and says, "Hmm, you may be right, but eleven 
 is prime, and thirteen is prime.  It appears that within the limits of 
 experimental error, all odd numbers are indeed prime!"

Assuming spherical odd numbers in a vacuum on a frictionless surface,
of course.

-- 
Grant




Re: Working with the set of real numbers (was: Finding size of Variable)

2014-03-05 Thread Roy Smith
In article 5317e640$0$29985$c3e8da3$54964...@news.astraweb.com,
 Steven D'Aprano steve+comp.lang.pyt...@pearwood.info wrote:

 On Wed, 05 Mar 2014 21:31:51 -0500, Roy Smith wrote:
 
  In article 53176225$0$29987$c3e8da3$54964...@news.astraweb.com,
   Steven D'Aprano steve+comp.lang.pyt...@pearwood.info wrote:
  
  Physics is the fundamental science, at least according to the
  physicists, and Real Soon Now they'll have a Theory Of Everything,
  something small enough to print on a tee-shirt, which will explain
  everything. At least in principle.
  
  A mathematician, a chemist, and a physicist are arguing about the nature of
  prime numbers.  The chemist says, "All odd numbers are prime.  Look, I
  can prove it.  Three is prime.  Five is prime.  Seven is prime."  The
  mathematician says, "That's nonsense.  Nine is not prime."  The
  physicist looks at him and says, "Hmm, you may be right, but eleven is
  prime, and thirteen is prime.  It appears that within the limits of
  experimental error, all odd numbers are indeed prime!"
 
 They ask a computer programmer to adjudicate who is right, so he writes a 
 program to print out all the primes:
 
 1 is prime
 1 is prime
 1 is prime
 1 is prime
 1 is prime
 ...

So, a mathematician, a biologist, and a physicist are watching a house.  
The physicist says, "It appears to be empty."  Sometime later, a man and 
a woman go into the house.  Shortly after that, the man and the woman 
come back out, with a child.  The biologist says, "They must have 
reproduced."  The mathematician says, "If one more person goes into the 
house, it'll be empty again."


Re: Working with the set of real numbers (was: Finding size of Variable)

2014-03-04 Thread Ian Kelly
On Mon, Mar 3, 2014 at 11:35 PM, Chris Angelico ros...@gmail.com wrote:
 In constant space, that will produce the sum of two infinite sequences
 of digits. (And it's constant time, too, except when it gets a stream
 of nines. Adding three thirds together will produce an infinite loop
 as it waits to see if there'll be anything that triggers an infinite
 cascade of carries.) Now, if there's a way to do that for square
 rooting a number, then the CF notation has a distinct benefit over the
 decimal expansion used here. As far as I know, there's no simple way,
 in constant space and/or time, to progressively yield more digits of a
 number's square root, working in decimal.

The code for that looks like this:

import math

def cf_sqrt(n):
    """Yield the terms of the square root of n as a continued fraction."""
    m = 0
    d = 1
    a = a0 = floor_sqrt(n)
    while True:
        yield a
        next_m = d * a - m
        next_d = (n - next_m * next_m) // d
        if next_d == 0:
            # n is a perfect square, so the expansion terminates here.
            break
        next_a = (a0 + next_m) // next_d
        m, d, a = next_m, next_d, next_a


def floor_sqrt(n):
    """Return the integer part of the square root of n."""
    n = int(n)
    if n == 0: return 0
    # Bracket the root between two consecutive powers of two, then bisect.
    lower = 2 ** int(math.log(n, 2) // 2)
    upper = lower * 2
    while upper - lower > 1:
        mid = (upper + lower) // 2
        if n < mid * mid:
            upper = mid
        else:
            lower = mid
    return lower


The floor_sqrt function is merely doing a simple binary search and
could probably be optimized, but then it's only called once during
initialization anyway.  The meat of the loop, as you can see, is just
a constant amount of integer arithmetic.  If it were desired to halt
once the continued fraction starts to repeat, that would just be a
matter of checking whether the triple (m, d, a) has been seen already.
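
The halting check described above might look something like this. It is a sketch under a few assumptions: it uses the simpler in-place form of the loop, it substitutes math.isqrt (Python 3.8+) for floor_sqrt, and the name cf_sqrt_period is made up for this example.

```python
from math import isqrt

def cf_sqrt_period(n):
    """Return (prefix, period) of the continued fraction of sqrt(n)."""
    a0 = isqrt(n)  # assumption: stands in for floor_sqrt above
    if a0 * a0 == n:
        return [a0], []  # perfect square: no repeating part
    m, d, a = 0, 1, a0
    seen = {}   # maps each (m, d, a) state to the index of its term
    terms = []
    while (m, d, a) not in seen:
        seen[(m, d, a)] = len(terms)
        terms.append(a)
        m = d * a - m
        d = (n - m * m) // d
        a = (a0 + m) // d
    start = seen[(m, d, a)]  # the expansion repeats from this index onward
    return terms[:start], terms[start:]

print(cf_sqrt_period(2))
print(cf_sqrt_period(7))
```

Storing the seen states costs memory proportional to the period length, so this version gives up the constant-space property of the plain generator.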

Going back to your example of adding generated digits though, I don't
know how to add two continued fractions together without evaluating
them.


Re: Working with the set of real numbers (was: Finding size of Variable)

2014-03-04 Thread Ian Kelly
On Tue, Mar 4, 2014 at 4:19 AM, Ian Kelly ian.g.ke...@gmail.com wrote:
 def cf_sqrt(n):
     """Yield the terms of the square root of n as a continued fraction."""
     m = 0
     d = 1
     a = a0 = floor_sqrt(n)
     while True:
         yield a
         next_m = d * a - m
         next_d = (n - next_m * next_m) // d
         if next_d == 0:
             break
         next_a = (a0 + next_m) // next_d
         m, d, a = next_m, next_d, next_a

Sorry, all that next business is totally unnecessary.  More simply:

def cf_sqrt(n):
    """Yield the terms of the square root of n as a continued fraction."""
    m = 0
    d = 1
    a = a0 = floor_sqrt(n)
    while True:
        yield a
        m = d * a - m
        d = (n - m * m) // d
        if d == 0:
            break
        a = (a0 + m) // d
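
To see what the yielded terms are good for, here is a sketch that turns them into rational approximations (convergents). It is an illustration only: the convergents helper is a name made up for this example, and math.isqrt (Python 3.8+) is used in place of floor_sqrt so the snippet runs on its own.

```python
from fractions import Fraction
from itertools import islice
from math import isqrt

def cf_sqrt(n):
    """Yield the terms of the square root of n as a continued fraction."""
    m = 0
    d = 1
    a = a0 = isqrt(n)  # assumption: stands in for floor_sqrt
    while True:
        yield a
        m = d * a - m
        d = (n - m * m) // d
        if d == 0:
            break
        a = (a0 + m) // d

def convergents(terms):
    """Yield successive rational approximations from continued fraction terms."""
    h_prev, h = 0, 1  # numerators h(-2), h(-1)
    k_prev, k = 1, 0  # denominators k(-2), k(-1)
    for a in terms:
        h_prev, h = h, a * h + h_prev
        k_prev, k = k, a * k + k_prev
        yield Fraction(h, k)

for approx in islice(convergents(cf_sqrt(2)), 6):
    print(approx, float(approx))
```

Each convergent needs only the previous two numerators and denominators, so this stays constant-space apart from the growth of the integers themselves.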


Re: Working with the set of real numbers (was: Finding size of Variable)

2014-03-04 Thread Albert van der Horst
In article mailman.7702.1393932047.18130.python-l...@python.org,
Ian Kelly  ian.g.ke...@gmail.com wrote:
On Mon, Mar 3, 2014 at 11:35 PM, Chris Angelico ros...@gmail.com wrote:
 In constant space, that will produce the sum of two infinite sequences
 of digits. (And it's constant time, too, except when it gets a stream
 of nines. Adding three thirds together will produce an infinite loop
 as it waits to see if there'll be anything that triggers an infinite
 cascade of carries.) Now, if there's a way to do that for square
 rooting a number, then the CF notation has a distinct benefit over the
 decimal expansion used here. As far as I know, there's no simple way,
 in constant space and/or time, to progressively yield more digits of a
 number's square root, working in decimal.

The code for that looks like this:

def cf_sqrt(n):
    """Yield the terms of the square root of n as a continued fraction."""
    m = 0
    d = 1
    a = a0 = floor_sqrt(n)
    while True:
        yield a
        next_m = d * a - m
        next_d = (n - next_m * next_m) // d
        if next_d == 0:
            break
        next_a = (a0 + next_m) // next_d
        m, d, a = next_m, next_d, next_a


def floor_sqrt(n):
    """Return the integer part of the square root of n."""
    n = int(n)
    if n == 0: return 0
    lower = 2 ** int(math.log(n, 2) // 2)
    upper = lower * 2
    while upper - lower > 1:
        mid = (upper + lower) // 2
        if n < mid * mid:
            upper = mid
        else:
            lower = mid
    return lower


The floor_sqrt function is merely doing a simple binary search and
could probably be optimized, but then it's only called once during
initialization anyway.  The meat of the loop, as you can see, is just
a constant amount of integer arithmetic.  If it were desired to halt
once the continued fraction starts to repeat, that would just be a
matter of checking whether the triple (m, d, a) has been seen already.

Going back to your example of adding generated digits though, I don't
know how to add two continued fractions together without evaluating
them.

That is highly non-trivial indeed. See the gosper.txt reference
I gave in another post.

Groetjes Albert
-- 
Albert van der Horst, UTRECHT,THE NETHERLANDS
Economic growth -- being exponential -- ultimately falters.
albert@spearc.xs4all.nl =n http://home.hccnet.nl/a.w.m.van.der.horst



Re: Working with the set of real numbers (was: Finding size of Variable)

2014-03-04 Thread Albert van der Horst
In article mailman.7687.1393902132.18130.python-l...@python.org,
Chris Angelico  ros...@gmail.com wrote:
On Tue, Mar 4, 2014 at 1:45 PM, Albert van der Horst
alb...@spenarnc.xs4all.nl wrote:
No, the Python built-in float type works with a subset of real numbers:

 To be more precise: a subset of the rational numbers, those with a 
 denominator
 that is a power of two.

And no more than N bits (53 in a 64-bit float) in the numerator, and
the denominator between the limits of the exponent. (Unless it's
subnormal. That adds another set of small numbers.) It's a pretty
tight set of restrictions, and yet good enough for so many purposes.
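
A quick way to see those restrictions on an actual float, sketched with the standard library only. The choice of 0.1 as the example value is arbitrary.

```python
from fractions import Fraction

# Every finite float is exactly some integer divided by a power of two.
num, den = (0.1).as_integer_ratio()
print(num, den)

# The denominator is a power of two: exactly one bit is set.
is_power_of_two = den & (den - 1) == 0

# The numerator fits in the 53 bits of a 64-bit float's significand.
print(num.bit_length())

# So the float written as 0.1 is not the rational number 1/10.
exact = Fraction(0.1)
print(exact == Fraction(1, 10))
```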

But it's a far cry from all real numbers. Even allowing for
continued fractions adds only some more; I don't think you can
represent surds that way.

Adding cf's adds all computable numbers in infinite precision.
However that is not even a drop in the ocean, as the computable
numbers have measure zero.
A cf object yielding its coefficients amounts to a program that generates
an infinite amount of data (in infinite time), so it is not
very surprising it can represent any computable number.
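
As a concrete sketch of "a cf object is a program", here is a generator for the continued fraction of e, whose terms follow the well-known pattern [2; 1, 2, 1, 1, 4, 1, 1, 6, ...]. The names cf_e and evaluate are made up for this illustration, and cutting off at 20 terms is an arbitrary choice.

```python
import math
from fractions import Fraction
from itertools import islice

def cf_e():
    """Yield the continued fraction terms of e, forever."""
    yield 2
    k = 1
    while True:
        yield 1
        yield 2 * k
        yield 1
        k += 1

def evaluate(terms):
    """Collapse a finite list of terms into an exact fraction, back to front."""
    value = Fraction(terms[-1])
    for a in reversed(terms[:-1]):
        value = a + 1 / value
    return value

approx = evaluate(list(islice(cf_e(), 20)))
print(approx, float(approx), math.e)
```

The generator never finishes. Any finite computation only ever consumes a prefix of it, which is all "infinite precision" can mean in practice.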

Pretty humbling really.


ChrisA

Groetjes Albert
-- 
Albert van der Horst, UTRECHT,THE NETHERLANDS
Economic growth -- being exponential -- ultimately falters.
albert@spearc.xs4all.nl =n http://home.hccnet.nl/a.w.m.van.der.horst



Re: Working with the set of real numbers (was: Finding size of Variable)

2014-03-04 Thread Steven D'Aprano
On Wed, 05 Mar 2014 02:15:14 +, Albert van der Horst wrote:

 Adding cf's adds all computable numbers in infinite precision. However
 that is not even a drop in the ocean, as the computable numbers have
 measure zero.

On the other hand, it's not really clear that the non-computable numbers 
are useful or necessary for anything. They exist as mathematical 
abstractions, but they'll never be the result of any calculation or 
measurement that anyone might do.



-- 
Steven


Re: Working with the set of real numbers (was: Finding size of Variable)

2014-03-04 Thread Rustom Mody
On Wednesday, March 5, 2014 9:11:13 AM UTC+5:30, Steven D'Aprano wrote:
 On Wed, 05 Mar 2014 02:15:14 +, Albert van der Horst wrote:

  Adding cf's adds all computable numbers in infinite precision. However
  that is not even a drop in the ocean, as the computable numbers have
  measure zero.

 On the other hand, it's not really clear that the non-computable numbers 
 are useful or necessary for anything. They exist as mathematical 
 abstractions, but they'll never be the result of any calculation or 
 measurement that anyone might do.

There are even more extreme versions of this amounting to roughly this view:
Any infinity supposedly 'larger' than the natural numbers is a nonsensical 
notion.

See eg
http://en.wikipedia.org/wiki/Controversy_over_Cantor%27s_theory

and Weyl/Polya bet (pg 10 of 
http://research.microsoft.com/en-us/um/people/gurevich/Opera/123.pdf )

I cannot find the exact quote so from memory Weyl says something to this effect:

Cantor's diagonalization PROOF is not in question.
Its CONCLUSION very much is.
The classical/platonic mathematician (subject to wooly thinking) concludes that 
the real numbers are a superset of the integers

The constructivist mathematician (who supposedly thinks clearly) only concludes
the obvious, viz that real numbers cannot be enumerated

To go from 'cannot be enumerated' to 'is a proper superset of' requires the 
assumption of 'completed infinities' and that is not math but theology


Re: Working with the set of real numbers (was: Finding size of Variable)

2014-03-04 Thread Roy Smith
In article c39d5b44-6c7b-40d1-bbb5-791a36af6...@googlegroups.com,
 Rustom Mody rustompm...@gmail.com wrote:

 I cannot find the exact quote so from memory Weyl says something to this 
 effect:
 
 Cantor's diagonalization PROOF is not in question.
 Its CONCLUSION very much is.
 The classical/platonic mathematician (subject to wooly thinking) concludes 
 that 
 the real numbers are a superset of the integers
 
 The constructivist mathematician (who supposedly thinks clearly) only 
 concludes
 the obvious, viz that real numbers cannot be enumerated
 
 To go from 'cannot be enumerated' to 'is a proper superset of' requires the 
 assumption of 'completed infinities' and that is not math but theology

I stopped paying attention to mathematicians when they tried to convince 
me that the sum of all natural numbers is -1/12.  Sure, you can 
manipulate the symbols in a way which is consistent with some set of 
rules that we believe govern the legal manipulation of symbols, but it 
just plain doesn't make sense.


Re: Working with the set of real numbers (was: Finding size of Variable)

2014-03-04 Thread Steven D'Aprano
On Tue, 04 Mar 2014 23:25:37 -0500, Roy Smith wrote:

 I stopped paying attention to mathematicians when they tried to convince
 me that the sum of all natural numbers is -1/12.  

I'm pretty sure they did not. Possibly a physicist may have tried to tell 
you that, but most mathematicians consider physicists to be lousy 
mathematicians, and the mere fact that their results seem to actually 
work in practice is an embarrassment for the entire universe. A 
mathematician would probably have said that the sum of all natural 
numbers is divergent and therefore there is no finite answer.

Well, that is, apart from mathematicians like Euler and Ramanujan. When 
people like them tell you something, you better pay attention.

We have an intuitive understanding of the properties of addition. You 
can't add 1000 positive whole numbers and get a negative fraction, that's 
obvious. But that intuition only applies to *finite* sums. It doesn't 
even apply to infinite *convergent* series, and they're *easy*. Remember 
Zeno's Paradoxes? People doubted that the convergent series:

1/2 + 1/4 + 1/8 + 1/16 + ... 

added up to 1 for the longest time, even though they could see with their 
own eyes that it had to. Until they worked out what *infinite* sums 
actually meant, their intuitions were completely wrong. This is a good 
lesson for us all.
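
As a quick illustration (a sketch added for this write-up, not part of the original post), the partial sums of that series can be computed exactly with fractions.Fraction. The helper name partial_sums is made up for the example. Each partial sum falls short of 1 by exactly the last term added, which is why the total can only be 1.

```python
from fractions import Fraction
from itertools import islice

def partial_sums():
    """Yield the partial sums of 1/2 + 1/4 + 1/8 + ... as exact fractions."""
    total = Fraction(0)
    term = Fraction(1, 2)
    while True:
        total += term
        yield total
        term /= 2

sums = list(islice(partial_sums(), 10))
# The gap to 1 after n terms is exactly 1/2**n, so it shrinks towards zero.
gaps = [1 - s for s in sums]
```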

The sum of all the natural numbers is a divergent infinite series, so we 
shouldn't expect that our intuitions hold. We can't add it up as if it 
were a convergent series, because it's not convergent. Nobody disputes 
that. But perhaps there's another way?

Normally mathematicians will tell you that divergent series don't have a 
total. That's because often the total you get can vary depending on how 
you add them up. The classic example is summing the infinite series:

1 - 1 + 1 - 1 + 1 - ... 

Depending on how you group them, you can get:

(1 - 1) + (1 - 1) + (1 - 1) ...  
= 0 + 0 + 0 + ... = 0

or you can get:

1 - (1 - 1 + 1 - 1 + ... ) 
= 1 - (1 - 1) - (1 - 1) - ...
= 1 - 0 - 0 - 0 ... 
= 1

Or you can do a neat little trick where we define the sum as x:

x = 1 - 1 + 1 - 1 + 1 - ... 
x = 1 - (1 - 1 + 1 - 1 + ... )
x = 1 - x
2x = 1
x = 1/2


So at first glance, summing a divergent series is like dividing by zero. 
You get contradictory results, at least in this case.
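
For what it's worth, one standard way of assigning a single value to this series is Cesàro summation: average the partial sums instead of taking their limit. The post does not mention it, so treat the following as an aside. It is a minimal sketch with made-up helper names, and it lands on the same 1/2 as the "x = 1 - x" trick.

```python
from fractions import Fraction
from itertools import islice

def grandi_partial_sums():
    """Yield the partial sums of 1 - 1 + 1 - 1 + ... (they go 1, 0, 1, 0, ...)."""
    total = 0
    sign = 1
    while True:
        total += sign
        yield total
        sign = -sign

def cesaro_means(partial_sums):
    """Yield the running average of a stream of partial sums."""
    running = 0
    for n, s in enumerate(partial_sums, start=1):
        running += s
        yield Fraction(running, n)

first = list(islice(grandi_partial_sums(), 6))
means = list(islice(cesaro_means(grandi_partial_sums()), 1000))
# The partial sums never settle, but their averages close in on 1/2.
```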

But that's not necessarily always the case. You do have to be careful 
when summing divergent series, but that doesn't always mean you can't do 
it and get a meaningful answer. Sometimes you can, sometimes you can't, 
it depends on the specific series. With the sum of the natural numbers, 
rather than getting three different results from three different methods, 
mathematicians keep getting the same -1/12 result using various methods. 
That's a good hint that there is something logically sound going on here, 
even if it seems unintuitive.

Remember Zeno's Paradoxes? Our intuitions about equality and plus and 
sums of numbers don't apply to infinite series. We should be at least 
open to the possibility that while all the *finite* sums:

1 + 2
1 + 2 + 3
1 + 2 + 3 + 4
...

and so on sum to positive whole numbers, that doesn't mean that the 
*infinite* sum has to total to a positive whole number. Maybe that's not 
how addition works. I don't know about you, but I've never personally 
added up an infinite number of ever-increasing quantities to see what 
the result is. Maybe it is a negative fraction. (I'd say try it and 
see, but I don't have an infinite amount of time to spend on it.)

And in fact that's exactly what seems to be the case here. Mathematicians can 
demonstrate an identity (that is, equality) between the divergent sum of 
the natural numbers and the zeta function ζ(-1), and *that* can be 
worked out independently, and equals -1/12.
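
A numerical way to see the -1/12 turn up, offered as a sketch under stated assumptions rather than as the zeta-function argument itself: damp each term n by exp(-eps*n) so the sum converges. The damped sum equals 1/eps**2 - 1/12 plus terms that vanish as eps goes to 0, so subtracting the divergent 1/eps**2 part leaves something close to -1/12. The function name, the value of eps and the cutoff are arbitrary choices for this example.

```python
import math

def damped_sum(eps, n_terms=20000):
    """Sum n * exp(-eps * n) for n = 1..n_terms.

    The tail beyond n_terms is negligible once eps * n_terms is large.
    """
    return math.fsum(n * math.exp(-eps * n) for n in range(1, n_terms + 1))

eps = 0.01
# Remove the part that blows up as eps -> 0; what is left is near -1/12.
finite_part = damped_sum(eps) - 1 / eps**2
```

Making eps smaller (and n_terms correspondingly larger) moves finite_part closer to -1/12, up to floating-point error.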

So there are a bunch of different ways to show that the divergent sum 
adds up to -1/12, some of them more rigorous than others. The zeta 
function method is about as rigorous as they come. The addition of an 
infinite number of things behaves differently than the addition of finite 
numbers of things.

More here:

http://scitation.aip.org/content/aip/magazine/physicstoday/news/10.1063/PT.5.8029

http://math.ucr.edu/home/baez/week126.html

http://en.wikipedia.org/wiki/1_+_2_+_3_+_4_+_%E2%8B%AF

and even here:

http://scientopia.org/blogs/goodmath/2014/01/20/oy-veh-power-series-analytic-continuations-and-riemann-zeta/

where a mathematician tries *really hard* to discredit the idea that the 
sum equals -1/12, but ends up proving that it does. So he simply plays a 
linguistic sleight of hand and claims that despite the series and the zeta 
function being equal, they're not *actually* equal.

In effect, the author Mark Carrol-Chu in the GoodMath blog above wants 
to make the claim that the divergent sum is not equal to ζ(-1), but 
everywhere you find that divergent sum in your calculations you can rub 
it out and replace it with ζ(-1), which is -1/12. In other words, he's 

Re: Working with the set of real numbers (was: Finding size of Variable)

2014-03-03 Thread Albert van der Horst
In article mailman.6735.1392194885.18130.python-l...@python.org,
Chris Angelico  ros...@gmail.com wrote:
On Wed, Feb 12, 2014 at 7:17 PM, Ben Finney ben+pyt...@benfinney.id.au wrote:
 Chris Angelico ros...@gmail.com writes:

 I have yet to find any computer that works with the set of real
 numbers in any way. Never mind optimization, they simply cannot work
 with real numbers.

 Not *any* computer? Not in *any* way? The Python built-in ‘float’ type
 “works with the set of real numbers”, in a way.

No, the Python built-in float type works with a subset of real numbers:

To be more precise: a subset of the rational numbers, those with a denominator
that is a power of two.

>>> float("pi")
Traceback (most recent call last):
  File "<pyshell#1>", line 1, in <module>
    float("pi")
ValueError: could not convert string to float: 'pi'
>>> float("π")
Traceback (most recent call last):
  File "<pyshell#2>", line 1, in <module>
    float("π")
ValueError: could not convert string to float: 'π'

Same goes for fractions.Fraction and [c]decimal.Decimal. All of them
are restricted to some subset of rational numbers, not all reals.

 The URL:http://docs.python.org/2/library/numbers.html#numbers.Real ABC
 defines behaviours for types implementing the set of real numbers.

 What specific behaviour would, for you, qualify as “works with the set
 of real numbers in any way”?

Being able to represent surds, pi, e, etc, for a start. It'd
theoretically be possible with an algebraic notation (eg by carrying
through some representation like 2*pi rather than 6.28), but
otherwise, irrationals can't be represented with finite storage and a
digit-based system.

An interesting possibility is working with rules that generate the
continued fraction sequence of a real number. Say yield() gives the
next coefficient (or the next hex digit).
It was generally believed that summing two numbers in their cf representation
was totally impractical because it required conversion to a rational number.
OTOH if we consider a cf as an ongoing progress, the situation is much better.
Summing would be a process that yields coefficients of the sum, and you could
just stop when you've  enough precision. Fascinating stuff.
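
To make the "yield gives the next coefficient" idea concrete, here is a minimal sketch. It does not implement Gosper's arithmetic. It only shows a coefficient stream for a rational number, the all-ones stream for the golden ratio, and how to turn any such stream into successive rational approximations. All function names are invented for the example.

```python
from fractions import Fraction
from itertools import islice

def cf_of_fraction(q):
    """Yield the (finite) continued-fraction coefficients of a rational q."""
    q = Fraction(q)
    while True:
        a = q.numerator // q.denominator   # floor of q
        yield a
        q -= a
        if q == 0:
            return
        q = 1 / q

def golden_ratio():
    """Yield the coefficients of the golden ratio: 1, 1, 1, ..."""
    while True:
        yield 1

def convergents(coeffs):
    """Yield successive rational approximations from a coefficient stream."""
    h0, h1 = 0, 1   # numerators h[n-2], h[n-1]
    k0, k1 = 1, 0   # denominators k[n-2], k[n-1]
    for a in coeffs:
        h0, h1 = h1, a * h1 + h0
        k0, k1 = k1, a * k1 + k0
        yield Fraction(h1, k1)

terms = list(cf_of_fraction(Fraction(415, 93)))
approximations = list(islice(convergents(golden_ratio()), 5))
```

A consumer can stop pulling from convergents() as soon as the approximation is good enough, which is the "ongoing process" view described above.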

It is described in a self-contained, typewriter-style document, gosper.txt,
that can be found on the web in several places, e.g.

http://home.strw.leidenuniv.nl/~gurkan/gosper.pdf
I have a gosper.txt, don't know from where.

It really is a cookbook; one could build a Python implementation from
it without being overly math savvy. I'd love to hear if
someone does it.

(In principle a coefficient of a cf can overflow machine precision, but
that has never been observed in the wild. A considerable percentage
of the coefficients for a random number are ones or otherwise small.
The golden ratio has all ones.)

ChrisA

Groetjes Albert
-- 
Albert van der Horst, UTRECHT,THE NETHERLANDS
Economic growth -- being exponential -- ultimately falters.
albert@spearc.xs4all.nl =n http://home.hccnet.nl/a.w.m.van.der.horst



Re: Working with the set of real numbers (was: Finding size of Variable)

2014-03-03 Thread Chris Angelico
On Tue, Mar 4, 2014 at 1:45 PM, Albert van der Horst
alb...@spenarnc.xs4all.nl wrote:
No, the Python built-in float type works with a subset of real numbers:

 To be more precise: a subset of the rational numbers, those with a denominator
 that is a power of two.

And no more than N bits (53 in a 64-bit float) in the numerator, and
the denominator between the limits of the exponent. (Unless it's
subnormal. That adds another set of small numbers.) It's a pretty
tight set of restrictions, and yet good enough for so many purposes.
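
This is easy to check from Python itself. The sketch below assumes the usual 64-bit IEEE 754 float. It uses float.as_integer_ratio() to show that 0.1 is stored as a fraction whose denominator is a power of two and whose numerator fits in 53 bits, so it cannot be exactly 1/10.

```python
from fractions import Fraction

value = 0.1
numerator, denominator = value.as_integer_ratio()

# A power of two has exactly one bit set.
is_power_of_two = denominator & (denominator - 1) == 0
numerator_bits = numerator.bit_length()

# The stored value is a nearby binary fraction, not exactly one tenth.
differs_from_tenth = Fraction(value) != Fraction(1, 10)
```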

But it's a far cry from all real numbers. Even allowing for
continued fractions adds only some more; I don't think you can
represent surds that way.

ChrisA


Re: Working with the set of real numbers (was: Finding size of Variable)

2014-03-03 Thread Rustom Mody
On Tuesday, March 4, 2014 8:32:01 AM UTC+5:30, Chris Angelico wrote:
 On Tue, Mar 4, 2014 at 1:45 PM, Albert van der Horst wrote:
 No, the Python built-in float type works with a subset of real numbers:
  To be more precise: a subset of the rational numbers, those with a 
  denominator
  that is a power of two.

 And no more than N bits (53 in a 64-bit float) in the numerator, and
 the denominator between the limits of the exponent. (Unless it's
 subnormal. That adds another set of small numbers.) It's a pretty
 tight set of restrictions, and yet good enough for so many purposes.

 But it's a far cry from all real numbers. Even allowing for
 continued fractions adds only some more; I don't think you can
 represent surds that way.

See

http://www.maths.surrey.ac.uk/hosted-sites/R.Knott/Fibonacci/cfINTRO.html#sqrts



Re: Working with the set of real numbers (was: Finding size of Variable)

2014-03-03 Thread Chris Angelico
On Tue, Mar 4, 2014 at 2:13 PM, Rustom Mody rustompm...@gmail.com wrote:
 But it's a far cry from all real numbers. Even allowing for
 continued fractions adds only some more; I don't think you can
 represent surds that way.

 See

 http://www.maths.surrey.ac.uk/hosted-sites/R.Knott/Fibonacci/cfINTRO.html#sqrts

That's neat, didn't know that. Is there an efficient way to figure
out, for any integer N, what its sqrt's CF sequence is? And what about
the square roots of non-integers - can you represent √π that way? I
suspect, though I can't prove, that there will be numbers that can't
be represented even with an infinite series - or at least numbers
whose series can't be easily calculated.

ChrisA


Re: Working with the set of real numbers (was: Finding size of Variable)

2014-03-03 Thread Rustom Mody
On Tuesday, March 4, 2014 9:16:25 AM UTC+5:30, Chris Angelico wrote:
 On Tue, Mar 4, 2014 at 2:13 PM, Rustom Mody  wrote:
  But it's a far cry from all real numbers. Even allowing for
  continued fractions adds only some more; I don't think you can
  represent surds that way.
  See
  http://www.maths.surrey.ac.uk/hosted-sites/R.Knott/Fibonacci/cfINTRO.html#sqrts

 That's neat, didn't know that. Is there an efficient way to figure
 out, for any integer N, what its sqrt's CF sequence is? And what about
 the square roots of non-integers - can you represent √π that way? I
 suspect, though I can't prove, that there will be numbers that can't
 be represented even with an infinite series - or at least numbers
 whose series can't be easily calculated.

You are now asking questions that are really (real-ly?) outside my capacities.

What I know (which may be quite off the mark :-) )

Just as all real numbers almost by definition have a decimal form (may
be infinite eg 1/3 becomes 0.3...) all real numbers likewise have a CF form

For some mathematical (aka arcane) reasons the CF form is actually better.

Furthermore:

1. Transcendental numbers like e and pi have non-repeating infinite CF forms
2. Algebraic numbers (aka surds) have repeating maybe finite(?) forms
3. For some numbers it's not known whether they are transcendental or not
(vague recollection pi^sqrt(pi) is one such)
4. Since e^(i*pi) is very much an integer, the above question is surprisingly 
non-trivial
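
On the earlier question of computing the CF of sqrt(N) directly: for square roots of integers there is a well-known integer-only recurrence, and the resulting sequence is periodic. The sketch below covers only that quadratic case and makes no claim about other algebraic numbers. The name sqrt_cf is made up, n is assumed to be a non-negative integer, and math.isqrt needs Python 3.8 or later.

```python
from itertools import islice
from math import isqrt

def sqrt_cf(n):
    """Yield the continued-fraction coefficients of sqrt(n) for an integer n >= 0."""
    a0 = isqrt(n)
    if a0 * a0 == n:        # perfect square: the expansion is just [a0]
        yield a0
        return
    # Invariant: the current remainder is (sqrt(n) + m) / d.
    m, d, a = 0, 1, a0
    while True:
        yield a
        m = d * a - m
        d = (n - m * m) // d
        a = (a0 + m) // d

root_two = list(islice(sqrt_cf(2), 4))
root_seven = list(islice(sqrt_cf(7), 9))
```

Each step uses a constant number of integer operations on numbers no larger than about 2*sqrt(n), so the stream is cheap to produce.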


Re: Working with the set of real numbers (was: Finding size of Variable)

2014-03-03 Thread Steven D'Aprano
On Tue, 04 Mar 2014 14:46:25 +1100, Chris Angelico wrote:

 That's neat, didn't know that. Is there an efficient way to figure out,
 for any integer N, what its sqrt's CF sequence is? And what about the
 square roots of non-integers - can you represent √π that way? I suspect,
 though I can't prove, that there will be numbers that can't be
 represented even with an infinite series - or at least numbers whose
 series can't be easily calculated.

Every rational number can be written as a continued fraction with a 
finite number of terms[1]. Every irrational number can be written as a 
continued fraction with an infinite number of terms, just as every 
irrational number can be written as a decimal number with an infinite 
number of digits. Most of them (to be precise: an uncountably infinite 
number of them) will have no simple or obvious pattern.


[1] To be pedantic, written as *two* continued fractions, one ending with 
the term 1, and one with one less term which isn't 1. That is:

[a; b, c, d, ..., z, 1] == [a; b, c, d, ..., z+1]


Any *finite* CF ending with one can be simplified to use one fewer term. 
Infinite CFs of course don't have a last term.
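
The footnote is easy to check with exact arithmetic. A small sketch, with a hypothetical helper cf_value that evaluates a finite list of CF terms from the inside out:

```python
from fractions import Fraction

def cf_value(terms):
    """Evaluate the finite continued fraction [t0; t1, t2, ...] exactly."""
    value = Fraction(terms[-1])
    for a in reversed(terms[:-1]):
        value = a + 1 / value
    return value

short_form = cf_value([4, 2, 6, 7])
long_form = cf_value([4, 2, 6, 6, 1])   # same number, last term split as 6 + 1/1
```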



-- 
Steven


Re: Working with the set of real numbers (was: Finding size of Variable)

2014-03-03 Thread Chris Angelico
On Tue, Mar 4, 2014 at 4:53 PM, Steven D'Aprano st...@pearwood.info wrote:
 On Tue, 04 Mar 2014 14:46:25 +1100, Chris Angelico wrote:

 That's neat, didn't know that. Is there an efficient way to figure out,
 for any integer N, what its sqrt's CF sequence is? And what about the
 square roots of non-integers - can you represent √π that way? I suspect,
 though I can't prove, that there will be numbers that can't be
 represented even with an infinite series - or at least numbers whose
 series can't be easily calculated.

 Every irrational number can be written as a
 continued fraction with an infinite number of terms, just as every
 irrational number can be written as a decimal number with an infinite
 number of digits.

It's easy enough to have that kind of expansion, I'm wondering if it's
possible to identify it directly. To render the decimal expansion of a
square root by the cut-and-try method, you effectively keep dividing
until you find that you're close enough; that means you (a) have to
keep the entire number around for each step, and (b) need to do a few
steps to find that the digits aren't changing. But if you can take a
CF (finite or infinite) and do an O(n) transformation on it to produce
that number's square root, then you have an effective means of
representing square roots. Suppose I make a generator function that
represents a fraction:

def one_third():
    while True:
        yield 3

def one_seventh():
    while True:
        yield 1; yield 4; yield 2; yield 8; yield 5; yield 7

I could then make a generator that returns the sum of those two:

def add_without_carry(x, y):
    while True:
        yield next(x)+next(y)

Okay, that's broken for nearly any case, but with a bit more sophistication:

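import itertools

def add(x, y):
    prev=None   # last digit seen, held back in case a carry arrives
    nines=0     # run of 9s after prev that a carry would turn into 0s
    while True:
        xx,yy=next(x),next(y)
        tot=xx+yy
        if tot==9:
            nines+=1
            continue
        if tot>9:
            # Carry: bump the held digit; the pending 9s become 0s.
            if prev is None: raise OverflowError("exceeds 1.0")
            yield prev+1
            tot-=10
            for _ in range(nines): yield 0
            nines=0
        else:
            # No carry: the held digit and any pending 9s stand as they are.
            if prev is not None: yield prev
            for _ in range(nines): yield 9
            nines=0
        prev=tot

def show(n):
    return ''.join(str(_) for _ in itertools.islice(n,20))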

>>> show(add(one_third(),one_seventh()))
'47619047619047619047'
>>> show(add(add(add(one_seventh(),one_seventh()),add(one_seventh(),one_seventh())),add(one_seventh(),one_seventh())))
'85714285714285714285'

In constant space, that will produce the sum of two infinite sequences
of digits. (And it's constant time, too, except when it gets a stream
of nines. Adding three thirds together will produce an infinite loop
as it waits to see if there'll be anything that triggers an infinite
cascade of carries.) Now, if there's a way to do that for square
rooting a number, then the CF notation has a distinct benefit over the
decimal expansion used here. As far as I know, there's no simple way,
in constant space and/or time, to progressively yield more digits of a
number's square root, working in decimal.
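
For comparison, here is a sketch of the schoolbook digit-by-digit square root in decimal. It does yield digits progressively, but the working state (root and rem) grows with every digit, so it is neither constant space nor constant time per digit, which is the cost being pointed at above. The name sqrt_digits is invented, n is assumed to be a non-negative integer, and math.isqrt needs Python 3.8 or later.

```python
from itertools import islice
from math import isqrt

def sqrt_digits(n):
    """Yield the decimal digits of sqrt(n) after the decimal point."""
    root = isqrt(n)             # integer part found so far
    rem = n - root * root       # what is still unaccounted for
    while True:
        rem *= 100              # bring down the next pair of zeros
        d = 0
        # Largest digit d with (20*root + d) * d <= rem.
        while (20 * root + d + 1) * (d + 1) <= rem:
            d += 1
        rem -= (20 * root + d) * d
        root = 10 * root + d    # root keeps growing: state is not constant
        yield d

digits = list(islice(sqrt_digits(2), 10))
```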

ChrisA


Re: Finding size of Variable

2014-02-12 Thread Chris Angelico
On Wed, Feb 12, 2014 at 6:49 PM,  wxjmfa...@gmail.com wrote:
 The day you find an operator working on the set of
 reals (R) and it is somehow optimized for N
 (the subset of natural numbers), let me know.

I have yet to find any computer that works with the set of real
numbers in any way. Never mind optimization, they simply cannot work
with real numbers.

As to operations that are optimized for integers (usually not for
naturals - supporting zero and negatives isn't hard), they are legion.
In Python, integers have arbitrary precision, but floats, Fractions,
and Decimals, don't. Nearly any operation on arbitrarily large numbers
will be either more accurate or more efficient (maybe both) with
integers than with any of the other types.

Letting you know, that's all.

ChrisA


Working with the set of real numbers (was: Finding size of Variable)

2014-02-12 Thread Ben Finney
Chris Angelico ros...@gmail.com writes:

 I have yet to find any computer that works with the set of real
 numbers in any way. Never mind optimization, they simply cannot work
 with real numbers.

Not *any* computer? Not in *any* way? The Python built-in ‘float’ type
“works with the set of real numbers”, in a way.

The <URL:http://docs.python.org/2/library/numbers.html#numbers.Real> ABC
defines behaviours for types implementing the set of real numbers.

What specific behaviour would, for you, qualify as “works with the set
of real numbers in any way”?

-- 
 \  “The fact that I have no remedy for all the sorrows of the |
  `\ world is no reason for my accepting yours. It simply supports |
_o__)  the strong probability that yours is a fake.” —Henry L. Mencken |
Ben Finney



Re: Working with the set of real numbers (was: Finding size of Variable)

2014-02-12 Thread wxjmfauth
Integers are integers. (1)
Characters are characters. (2)

(1) is a unique natural set.

(2) is an artificial construct working
with 3 sets (unicode).

jmf


Re: Working with the set of real numbers (was: Finding size of Variable)

2014-02-12 Thread Chris Angelico
On Wed, Feb 12, 2014 at 7:17 PM, Ben Finney ben+pyt...@benfinney.id.au wrote:
 Chris Angelico ros...@gmail.com writes:

 I have yet to find any computer that works with the set of real
 numbers in any way. Never mind optimization, they simply cannot work
 with real numbers.

 Not *any* computer? Not in *any* way? The Python built-in ‘float’ type
 “works with the set of real numbers”, in a way.

No, the Python built-in float type works with a subset of real numbers:

>>> float("pi")
Traceback (most recent call last):
  File "<pyshell#1>", line 1, in <module>
    float("pi")
ValueError: could not convert string to float: 'pi'
>>> float("π")
Traceback (most recent call last):
  File "<pyshell#2>", line 1, in <module>
    float("π")
ValueError: could not convert string to float: 'π'

Same goes for fractions.Fraction and [c]decimal.Decimal. All of them
are restricted to some subset of rational numbers, not all reals.

 The URL:http://docs.python.org/2/library/numbers.html#numbers.Real ABC
 defines behaviours for types implementing the set of real numbers.

 What specific behaviour would, for you, qualify as “works with the set
 of real numbers in any way”?

Being able to represent surds, pi, e, etc, for a start. It'd
theoretically be possible with an algebraic notation (eg by carrying
through some representation like 2*pi rather than 6.28), but
otherwise, irrationals can't be represented with finite storage and a
digit-based system.

ChrisA


Re: Working with the set of real numbers (was: Finding size of Variable)

2014-02-12 Thread wxjmfauth
Le mercredi 12 février 2014 09:35:38 UTC+1, wxjm...@gmail.com a écrit :
 Integers are integers. (1)
 
 Characters are characters. (2)
 
 
 
 (1) is a unique natural set.
 
 
 
 (2) is an artificial construct working
 
 with 3 sets (unicode).
 
 
 
 jmf

Addendum: One should not confuse unicode and the implementation
of unicode.

jmf


Re: Finding size of Variable

2014-02-12 Thread Jussi Piitulainen
Chris Angelico writes:
 On Wed, Feb 12, 2014 at 6:49 PM,  wxjmfa...@gmail.com wrote:
  The day you find an operator working on the set of
  reals (R) and it is somehow optimized for N
  (the subset of natural numbers), let me know.

...

 In Python, integers have arbitrary precision, but floats, Fractions,
 and Decimals, don't. Nearly any operation on arbitrarily large
 numbers will be either more accurate or more efficient (maybe both)
 with integers than with any of the other types.

Is that true about Fractions?


Re: Finding size of Variable

2014-02-12 Thread Chris Angelico
On Wed, Feb 12, 2014 at 7:57 PM, Jussi Piitulainen
jpiit...@ling.helsinki.fi wrote:
 In Python, integers have arbitrary precision, but floats, Fractions,
 and Decimals, don't. Nearly any operation on arbitrarily large
 numbers will be either more accurate or more efficient (maybe both)
 with integers than with any of the other types.

 Is that true about Fractions?

I'm not 100% sure if fractions.Fraction and decimal.Decimal ever limit
the size or precision of their data, but certainly if they don't,
it'll be at horrendous expense of performance. (Decimal can add and
subtract in reasonable time complexity, but multiplication and
division will get slow when you have huge numbers of digits. Fraction
can multiply and divide efficiently, but will get crazily slow on
addition and subtraction.) Integers are an optimized case in many
ways. I can do accurate arbitrary-precision integer arithmetic without
worrying about simple operations suddenly saturating the CPU. I can't
do that with non-integers in any way.

It's not optimized for natural numbers (nonnegative integers), as
negatives are just as cheap as positives, but it's certainly an
optimization for integers.

ChrisA


Re: Working with the set of real numbers (was: Finding size of Variable)

2014-02-12 Thread Jussi Piitulainen
Chris Angelico writes:
 On Wed, Feb 12, 2014 at 7:17 PM, Ben Finney wrote:
  What specific behaviour would, for you, qualify as “works with the
  set of real numbers in any way”?
 
 Being able to represent surds, pi, e, etc, for a start. It'd
 theoretically be possible with an algebraic notation (eg by carrying
 through some representation like 2*pi rather than 6.28), but
 otherwise, irrationals can't be represented with finite storage and
 a digit-based system.

I've seen papers on exact computable reals that would, in effect,
generate more precision when needed for some operation. It wasn't
symbolic like 2pi, more like 6.28... with a promise to delve into the
ellipsis, and some notable operations not supported.

Equality testing was missing, I think, and I think it could not be
known in general whether such a number is positive, zero or negative,
so even approximate printing in the usual digit notation would not be
possible. (Interval arithmetic, I hear, has a similar problem about
not knowing the sign of a number.)
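
A rough sketch of what such a "promise to delve into the ellipsis" can look like, not taken from any of the papers: represent a real number as a function that, given k, returns a rational within 10**-k of it. Everything here (sqrt_real, add_real, the choice of decimal precision) is an assumption made for the example. Note that nothing in it can decide whether two such numbers are equal; it can only say they agree to k places.

```python
from fractions import Fraction
from math import isqrt

def sqrt_real(n):
    """Return k -> a Fraction within 10**-k of sqrt(n), for an integer n >= 0."""
    def approx(k):
        scale = 10 ** k
        # floor(sqrt(n) * scale) / scale is at most 1/scale below sqrt(n).
        return Fraction(isqrt(n * scale * scale), scale)
    return approx

def add_real(x, y):
    """Sum of two such numbers: ask each operand for one extra digit."""
    def approx(k):
        return x(k + 1) + y(k + 1)
    return approx

total = add_real(sqrt_real(2), sqrt_real(3))
ten_places = total(10)      # more digits are computed only when asked for
```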

In stark contrast, exact rationals work nicely, up to efficiency
considerations.


Re: Finding size of Variable

2014-02-12 Thread Jussi Piitulainen
Chris Angelico writes:
 On Wed, Feb 12, 2014 at 7:57 PM, Jussi Piitulainen wrote:
  In Python, integers have arbitrary precision, but floats, Fractions,
  and Decimals, don't. Nearly any operation on arbitrarily large
  numbers will be either more accurate or more efficient (maybe both)
  with integers than with any of the other types.
 
  Is that true about Fractions?
 
 I'm not 100% sure if fractions.Fraction and decimal.Decimal ever limit
 the size or precision of their data, but certainly if they don't,
 it'll be at horrendous expense of performance. (Decimal can add and
 subtract in reasonable time complexity, but multiplication and
 division will get slow when you have huge numbers of digits. Fraction
 can multiply and divide efficiently, but will get crazily slow on
 addition and subtraction.) Integers are an optimized case in many
 ways. I can do accurate arbitrary-precision integer arithmetic without
 worrying about simple operations suddenly saturating the CPU. I can't
 do that with non-integers in any way.
 
 It's not optimized for natural numbers (nonnegative integers), as
 negatives are just as cheap as positives, but it's certainly an
 optimization for integers.

Right. I don't know about Decimal, but I don't think there are any
precision restrictions in Fraction, other than running out of heap (or
possibly integer precision).
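
A quick check of that, as a sketch: fractions.Fraction keeps an exact integer numerator and denominator, so even very small or awkward values stay exact, while the float equivalents round.

```python
from fractions import Fraction

tiny = Fraction(1, 3 ** 200)                 # denominator has about 96 digits
harmonic = sum(Fraction(1, n) for n in range(1, 51))   # exact sum 1 + 1/2 + ... + 1/50

float_exact = 0.1 + 0.2 == 0.3               # False: binary rounding
fraction_exact = Fraction(1, 10) + Fraction(2, 10) == Fraction(3, 10)   # True
```

The price is that the numerator and denominator can grow quickly, as the size of harmonic.denominator shows.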

In my (quite limited) experience, the most expensive operation on both
exact rationals and exact integers has been the printing, in decimal,
of several screenfuls of digits. The actual calculations have taken a
couple of seconds and then I have wished that I could interrupt the
printing of a single number :)


Re: Finding size of Variable

2014-02-12 Thread Mark Lawrence

On 12/02/2014 07:49, wxjmfa...@gmail.com wrote:

Le mardi 11 février 2014 20:04:02 UTC+1, Mark Lawrence a écrit :

On 11/02/2014 18:53, wxjmfa...@gmail.com wrote:


Le lundi 10 février 2014 15:43:08 UTC+1, Tim Chase a écrit :



On 2014-02-10 06:07, wxjmfa...@gmail.com wrote:







Python does not save memory at all. A str (unicode string)







uses less memory only - and only - because and when one uses







explicitly characters which are consuming less memory.















Not only the memory gain is zero, Python falls back to the







worse case.















sys.getsizeof('a' * 100)







125







sys.getsizeof('a' * 100 + 'oe')







240







sys.getsizeof('a' * 100 + 'oe' + '\U0001')







448















If Python used UTF-32 for EVERYTHING, then all three of those cases







would be 448, so it clearly disproves your claim that python







does not save memory at all.















The opposite of what the utf8/utf16 do!















sys.getsizeof(('a' * 100 + 'oe' +







'\U0001').encode('utf-8'))







123







sys.getsizeof(('a' * 100 + 'oe' +







'\U0001').encode('utf-16'))







225















However, as pointed out repeatedly, string-indexing in fixed-width







encodings are O(1) while indexing into variable-width encodings (e.g.







UTF8/UTF16) are O(N).  The FSR gives the benefits of O(1) indexing







while saving space when a string doesn't need to use a full 32-bit







width.















A utf optimizes the memory and the performance at the same time.



It behaves like a mathematical operator, a unique operator for



a unique set of elements. Unbeatable.







The FSR is an exclusive or mechanism. I you wish to



same memory, you have to encode, and if you are encoding,



maybe because you have to, one loses performance. Paradoxal.







Your O(1) indexing works only and only because and



when you are working explicitly with a static unicode



string you never touch.



It's a little bit the the corresponding performance



case of the memory case.







jmf








Why are you so rude as to continually post your nonsense here that not a

single person believes, and at the same time still quite deliberately

use gg to post it with double line spacing.  If you lack the courtesy to

stop the former, please have the courtesy to stop the latter.



--

My fellow Pythonistas, ask not what our language can do for you, ask

what you can do for our language.




Nonsense?


sys.getsizeof('') - sys.getsizeof('a')

-1


The day you find an operator working on the set of
reals (R) and it is somehow optimized for N
(the subset of natural numbers), let me know.

A conflict is quickly appearing. Either the operator is
not correctly defined or the choice of the set is wrong.

You can replace the operator with an encoding and
the set with a repertoire of characters.

It's the main reason, why we have to live today with
all these coding schemes. Even in more sophisticated
cases like, CID-fonts or char boxes in a pdf (with the
hope you understand how it works).

jmf



I ask you, members of the jury, to find the accused, jmf, guilty of 
writing nonsense and deliberately using google groups to double line 
space.  The evidence is directly above and quite clearly prooves, beyond 
a resonable doubt, that no verdict other than guilty can be recorded.  I 
rest my case, m'lud.


--
My fellow Pythonistas, ask not what our language can do for you, ask 
what you can do for our language.


Mark Lawrence



Re: Finding size of Variable

2014-02-12 Thread Rustom Mody
On Wednesday, February 12, 2014 7:34:42 PM UTC+5:30, Mark Lawrence wrote:

 I ask you, members of the jury, to find the accused, jmf, guilty of 
 writing nonsense and deliberately using google groups to double line 
 space.  The evidence is directly above and quite clearly prooves, beyond 
 a resonable doubt, that no verdict other than guilty can be recorded.  I 
 rest my case, m'lud.

Is a proof more fool-proof because prove is spelt proove wink?


Re: Finding size of Variable

2014-02-12 Thread Mark Lawrence

On 12/02/2014 14:14, Rustom Mody wrote:

On Wednesday, February 12, 2014 7:34:42 PM UTC+5:30, Mark Lawrence wrote:


I ask you, members of the jury, to find the accused, jmf, guilty of
writing nonsense and deliberately using google groups to double line
space.  The evidence is directly above and quite clearly prooves, beyond
a resonable doubt, that no verdict other than guilty can be recorded.  I
rest my case, m'lud.


Is a proof more fool-proof because prove is spelt proove wink?



Fauultee keebored :)

--
My fellow Pythonistas, ask not what our language can do for you, ask 
what you can do for our language.


Mark Lawrence



Re: Finding size of Variable

2014-02-12 Thread Rustom Mody
On Wednesday, February 12, 2014 7:55:32 PM UTC+5:30, Mark Lawrence wrote:
 On 12/02/2014 14:14, Rustom Mody wrote:
  On Wednesday, February 12, 2014 7:34:42 PM UTC+5:30, Mark Lawrence wrote:
  I ask you, members of the jury, to find the accused, jmf, guilty of
  writing nonsense and deliberately using google groups to double line
  space.  The evidence is directly above and quite clearly prooves, beyond
  a resonable doubt, that no verdict other than guilty can be recorded.  I
  rest my case, m'lud.
  Is a proof more fool-proof because prove is spelt proove wink?

 Fauultee keebored :)

Very O(n)T considering the relation between Fawlty towers and Monty python :-)


Re: Working with the set of real numbers (was: Finding size of Variable)

2014-02-12 Thread Grant Edwards
On 2014-02-12, Ben Finney ben+pyt...@benfinney.id.au wrote:
 Chris Angelico ros...@gmail.com writes:

 I have yet to find any computer that works with the set of real
 numbers in any way. Never mind optimization, they simply cannot work
 with real numbers.

 Not *any* computer? Not in *any* way? The Python built-in float
 type works with the set of real numbers, in a way.

The only people who think that are people who don't actually _use_
floating point types on computers.

 What specific behaviour would, for you, qualify as "works with the
 set of real numbers in any way"?

There's a whole laundry list of things (some of them rather nasty and
difficult) you have to worry about when using FP that simply don't
apply to real numbers.
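
A few of the items on that laundry list can be shown in a handful of lines. The sketch below is only an illustration, assuming IEEE 754 double precision (which is what CPython's float uses on common platforms); the variable names are made up for the example.

```python
import math

# 0.1, 0.2 and 0.3 have no exact binary representation, so the sum is slightly off.
total = 0.1 + 0.2
exact_equal = (total == 0.3)
close_enough = math.isclose(total, 0.3)

# Float addition is not associative, unlike addition of real numbers.
left = (0.1 + 0.2) + 0.3
right = 0.1 + (0.2 + 0.3)
associative = (left == right)

# Above 2**53 adjacent doubles are more than 1 apart, so adding 1.0 can be lost.
absorbed = (1e16 + 1.0 == 1e16)

print(exact_equal, close_enough, associative, absorbed)
```

None of these effects exist for real numbers, which is why comparisons with a tolerance (math.isclose) are usually preferred over == for floats.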

-- 
Grant Edwards   grant.b.edwards at gmail.com
Yow! HUGH BEAUMONT died in 1982!!


Re: Working with the set of real numbers (was: Finding size of Variable)

2014-02-12 Thread Gisle Vanem

Grant Edwards wrote:


Not *any* computer? Not in *any* way? The Python built-in float
type works with the set of real numbers, in a way.


The only people who think that are people who don't actually _use_
floating point types on computers.


FPU parsing the IEEE spec, or?. I didn't quite parse what *you* wrote. 
To paraphrase:

 #include <math.h>
 there are FP_NORMAL and FP_SUBNORMAL people in the world; 
  those who understand IEEE 754 and those who don't.  ..
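
For readers who have not met FP_NORMAL and FP_SUBNORMAL: they are classification results from C's fpclassify() in math.h. Python has no direct equivalent, so the following is a rough sketch of the same idea, assuming IEEE 754 doubles; the function name classify is made up for this example.

```python
import math
import sys

def classify(x: float) -> str:
    """Rough Python analogue of C's fpclassify() for a float."""
    if math.isnan(x):
        return "nan"
    if math.isinf(x):
        return "infinite"
    if x == 0.0:
        return "zero"
    # sys.float_info.min is the smallest positive *normalized* float;
    # anything non-zero below it in magnitude is subnormal.
    if abs(x) < sys.float_info.min:
        return "subnormal"
    return "normal"

for value in (1.0, sys.float_info.min, 5e-324, 0.0, float("inf")):
    print(repr(value), classify(value))
```

Subnormal values trade precision for the ability to represent numbers closer to zero, another behaviour that has no counterpart in the reals.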


--gv


Re: Finding size of Variable

2014-02-11 Thread Neil Cerutti
On 2014-02-10, Ned Batchelder n...@nedbatchelder.com wrote:
 On 2/10/14 9:43 AM, Tim Chase wrote:
 The opposite of what the utf8/utf16 do!

 sys.getsizeof(('a' * 100 + 'oe' +
 '\U0001').encode('utf-8'))
 123
 sys.getsizeof(('a' * 100 + 'oe' +
 '\U0001').encode('utf-16'))
 225

 However, as pointed out repeatedly, string-indexing in
 fixed-width encodings are O(1) while indexing into
 variable-width encodings (e.g. UTF8/UTF16) are O(N).  The FSR
 gives the benefits of O(1) indexing while saving space when a
 string doesn't need to use a full 32-bit width.

 Please don't engage in this debate with JMF.  His mind is made
 up, and he will not be swayed, no matter how persuasive and
 reasonable your arguments.  Just ignore him.

I think reasonable criticisms should be contested no matter who
posts them. I agree jmf shouldn't be singled out for abuse,
summoned, insulted, or have his few controversial opinions
brought into other topics. Tim's post was responding to a
specific, well-presented criticism of Python's string
implementation. Left unchallenged, it might linger unhappily in
the air, like a symphony ended on a dominant 7th chord.

-- 
Neil Cerutti



Re: Finding size of Variable

2014-02-11 Thread wxjmfauth
On Monday, 10 February 2014 15:43:08 UTC+1, Tim Chase wrote:
 On 2014-02-10 06:07, wxjmfa...@gmail.com wrote:
  Python does not save memory at all. A str (unicode string)
  uses less memory only - and only - because and when one uses
  explicitly characters which are consuming less memory.

  Not only the memory gain is zero, Python falls back to the
  worse case.

   sys.getsizeof('a' * 100)
  125
   sys.getsizeof('a' * 100 + 'oe')
  240
   sys.getsizeof('a' * 100 + 'oe' + '\U0001')
  448

 If Python used UTF-32 for EVERYTHING, then all three of those cases
 would be 448, so it clearly disproves your claim that python
 does not save memory at all.

  The opposite of what the utf8/utf16 do!

   sys.getsizeof(('a' * 100 + 'oe' +
   '\U0001').encode('utf-8'))
  123
   sys.getsizeof(('a' * 100 + 'oe' +
   '\U0001').encode('utf-16'))
  225

 However, as pointed out repeatedly, string-indexing in fixed-width
 encodings are O(1) while indexing into variable-width encodings (e.g.
 UTF8/UTF16) are O(N).  The FSR gives the benefits of O(1) indexing
 while saving space when a string doesn't need to use a full 32-bit
 width.

A utf optimizes the memory and the performance at the same time.
It behaves like a mathematical operator, a unique operator for
a unique set of elements. Unbeatable.

The FSR is an exclusive-or mechanism. If you wish to
save memory, you have to encode, and if you are encoding,
maybe because you have to, one loses performance. Paradoxical.

Your O(1) indexing works only and only because and
when you are working explicitly with a static unicode
string you never touch.
It's a little bit the corresponding performance
case of the memory case.

jmf


Re: Finding size of Variable

2014-02-11 Thread Mark Lawrence

On 11/02/2014 18:53, wxjmfa...@gmail.com wrote:

On Monday, 10 February 2014 15:43:08 UTC+1, Tim Chase wrote:
On 2014-02-10 06:07, wxjmfa...@gmail.com wrote:

Python does not save memory at all. A str (unicode string)
uses less memory only - and only - because and when one uses
explicitly characters which are consuming less memory.

Not only the memory gain is zero, Python falls back to the
worse case.

sys.getsizeof('a' * 100)
125
sys.getsizeof('a' * 100 + 'oe')
240
sys.getsizeof('a' * 100 + 'oe' + '\U0001')
448

If Python used UTF-32 for EVERYTHING, then all three of those cases
would be 448, so it clearly disproves your claim that python
does not save memory at all.

The opposite of what the utf8/utf16 do!

sys.getsizeof(('a' * 100 + 'oe' +
'\U0001').encode('utf-8'))
123
sys.getsizeof(('a' * 100 + 'oe' +
'\U0001').encode('utf-16'))
225

However, as pointed out repeatedly, string-indexing in fixed-width
encodings are O(1) while indexing into variable-width encodings (e.g.
UTF8/UTF16) are O(N).  The FSR gives the benefits of O(1) indexing
while saving space when a string doesn't need to use a full 32-bit
width.

A utf optimizes the memory and the performance at the same time.
It behaves like a mathematical operator, a unique operator for
a unique set of elements. Unbeatable.

The FSR is an exclusive-or mechanism. If you wish to
save memory, you have to encode, and if you are encoding,
maybe because you have to, one loses performance. Paradoxical.

Your O(1) indexing works only and only because and
when you are working explicitly with a static unicode
string you never touch.
It's a little bit the corresponding performance
case of the memory case.

jmf



Why are you so rude as to continually post your nonsense here that not a 
single person believes, and at the same time still quite deliberately 
use gg to post it with double line spacing.  If you lack the courtesy to 
stop the former, please have the courtesy to stop the latter.


--
My fellow Pythonistas, ask not what our language can do for you, ask 
what you can do for our language.


Mark Lawrence



Re: Finding size of Variable

2014-02-11 Thread wxjmfauth
On Tuesday, 11 February 2014 20:04:02 UTC+1, Mark Lawrence wrote:
 On 11/02/2014 18:53, wxjmfa...@gmail.com wrote:
  On Monday, 10 February 2014 15:43:08 UTC+1, Tim Chase wrote:
  On 2014-02-10 06:07, wxjmfa...@gmail.com wrote:

  Python does not save memory at all. A str (unicode string)
  uses less memory only - and only - because and when one uses
  explicitly characters which are consuming less memory.

  Not only the memory gain is zero, Python falls back to the
  worse case.

  sys.getsizeof('a' * 100)
  125
  sys.getsizeof('a' * 100 + 'oe')
  240
  sys.getsizeof('a' * 100 + 'oe' + '\U0001')
  448

  If Python used UTF-32 for EVERYTHING, then all three of those cases
  would be 448, so it clearly disproves your claim that python
  does not save memory at all.

  The opposite of what the utf8/utf16 do!

  sys.getsizeof(('a' * 100 + 'oe' +
  '\U0001').encode('utf-8'))
  123
  sys.getsizeof(('a' * 100 + 'oe' +
  '\U0001').encode('utf-16'))
  225

  However, as pointed out repeatedly, string-indexing in fixed-width
  encodings are O(1) while indexing into variable-width encodings (e.g.
  UTF8/UTF16) are O(N).  The FSR gives the benefits of O(1) indexing
  while saving space when a string doesn't need to use a full 32-bit
  width.

  A utf optimizes the memory and the performance at the same time.
  It behaves like a mathematical operator, a unique operator for
  a unique set of elements. Unbeatable.

  The FSR is an exclusive-or mechanism. If you wish to
  save memory, you have to encode, and if you are encoding,
  maybe because you have to, one loses performance. Paradoxical.

  Your O(1) indexing works only and only because and
  when you are working explicitly with a static unicode
  string you never touch.
  It's a little bit the corresponding performance
  case of the memory case.

  jmf

 Why are you so rude as to continually post your nonsense here that not a
 single person believes, and at the same time still quite deliberately
 use gg to post it with double line spacing.  If you lack the courtesy to
 stop the former, please have the courtesy to stop the latter.

 --
 My fellow Pythonistas, ask not what our language can do for you, ask
 what you can do for our language.

Nonsense?

>>> sys.getsizeof('') - sys.getsizeof('a')
-1


The day you find an operator working on the set of
reals (R) and it is somehow optimized for N
(the subset of natural numbers), let me know.

A conflict is quickly appearing. Either the operator is
not correctly defined or the choice of the set is wrong.

You can replace the operator with an encoding and
the set with a repertoire of characters.

It's the main reason why we have to live today with
all these coding schemes, even in more sophisticated
cases like CID fonts or char boxes in a PDF (with the
hope you understand how it works).

jmf



Re: Finding size of Variable

2014-02-10 Thread wxjmfauth
On Saturday, 8 February 2014 03:48:12 UTC+1, Steven D'Aprano wrote:

 We consider it A GOOD THING that Python spends memory for programmer
 convenience and safety. Python looks for memory optimizations when it can
 save large amounts of memory, not utterly trivial amounts. So in a Python
 wide build, a ten-thousand block character string requires a little bit
 more than 40KB. In Python 3.3, that can be reduced to only 10KB for a
 purely Latin-1 string, or 20K for a string without any astral characters.
 That's the sort of memory savings that are worthwhile, reducing memory
 usage by 75%.

In its attempt to save memory, Python only succeeds in
doing worse than any of the utf* coding schemes.

---

Python does not save memory at all. A str (unicode string)
uses less memory only - and only - because and when one uses
explicitly characters which are consuming less memory.

Not only the memory gain is zero, Python falls back to the
worse case.

>>> sys.getsizeof('a' * 100)
125
>>> sys.getsizeof('a' * 100 + 'oe')
240
>>> sys.getsizeof('a' * 100 + 'oe' + '\U0001')
448

The opposite of what the utf8/utf16 do!

>>> sys.getsizeof(('a' * 100 + 'oe' + '\U0001').encode('utf-8'))
123
>>> sys.getsizeof(('a' * 100 + 'oe' + '\U0001').encode('utf-16'))
225


jmf


Re: Finding size of Variable

2014-02-10 Thread Asaf Las
On Monday, February 10, 2014 4:07:14 PM UTC+2, wxjm...@gmail.com wrote:
Interesting 

  sys.getsizeof('a' * 100)
here you get string type 

  sys.getsizeof(('a' * 100 + 'oe' + '\U0001').encode('utf-8'))
and here bytes

 type ('a' * 1)
class 'str'
 type(('a' * 100 + 'oe' + '\U0001').encode('utf-8'))
class 'bytes'


Why? 
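
A short sketch of what is going on here, assuming CPython 3.3 or later: str.encode() always returns a new bytes object, so the two getsizeof() calls above measure two different types, each with its own fixed header. The variable names below are only for illustration.

```python
import sys

text = "abc"                     # a str: a sequence of code points
encoded = text.encode("utf-8")   # a bytes object: a sequence of raw bytes

print(type(text).__name__)       # the original object is a str
print(type(encoded).__name__)    # the encoded result is bytes

# The fixed per-object overhead differs between the two types, so comparing
# getsizeof() of a str with getsizeof() of its encoded bytes is not
# a like-for-like comparison.
str_overhead = sys.getsizeof("")
bytes_overhead = sys.getsizeof(b"")
print(str_overhead, bytes_overhead)
```

The bytes object also cannot be indexed by character, only by byte, which is part of what the rest of the thread argues about.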


Re: Finding size of Variable

2014-02-10 Thread Mark Lawrence

On 10/02/2014 14:25, Asaf Las wrote:

On Monday, February 10, 2014 4:07:14 PM UTC+2, wxjm...@gmail.com wrote:
Interesting


sys.getsizeof('a' * 100)

here you get string type


sys.getsizeof(('a' * 100 + 'oe' + '\U0001').encode('utf-8'))

and here bytes


type ('a' * 1)

class 'str'

type(('a' * 100 + 'oe' + '\U0001').encode('utf-8'))

class 'bytes'




Why?



Please don't feed this particular troll, he's spent 18 months driving us 
nuts with his nonsense.


--
My fellow Pythonistas, ask not what our language can do for you, ask 
what you can do for our language.


Mark Lawrence



Re: Finding size of Variable

2014-02-10 Thread Tim Chase
On 2014-02-10 06:07, wxjmfa...@gmail.com wrote:
 Python does not save memory at all. A str (unicode string)
 uses less memory only - and only - because and when one uses
 explicitly characters which are consuming less memory.
 
 Not only the memory gain is zero, Python falls back to the
 worse case.
 
  sys.getsizeof('a' * 100)  
 125
  sys.getsizeof('a' * 100 + 'oe')  
 240
  sys.getsizeof('a' * 100 + 'oe' + '\U0001')  
 448

If Python used UTF-32 for EVERYTHING, then all three of those cases
would be 448, so it clearly disproves your claim that python
does not save memory at all.
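
The per-character width can be measured directly. This is a minimal sketch, assuming CPython 3.3 or later (PEP 393); the helper name bytes_per_char and the sample characters are chosen for this example and do not come from the thread.

```python
import sys

def bytes_per_char(ch: str, n: int = 1000) -> int:
    """Estimate storage per character by differencing two string sizes.

    Subtracting cancels the fixed object header, leaving only the payload.
    """
    return (sys.getsizeof(ch * (2 * n)) - sys.getsizeof(ch * n)) // n

print(bytes_per_char("a"))           # an ASCII character
print(bytes_per_char("\u20ac"))      # a BMP character (euro sign)
print(bytes_per_char("\U00010000"))  # an astral (non-BMP) character
```

On such a build the three results should be 1, 2 and 4: the whole string is stored at the width of its widest character, which is why one astral character makes every character in that string cost four bytes.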

 The opposite of what the utf8/utf16 do!
 
  sys.getsizeof(('a' * 100 + 'oe' +
  '\U0001').encode('utf-8'))  
 123
  sys.getsizeof(('a' * 100 + 'oe' +
  '\U0001').encode('utf-16'))  
 225

However, as pointed out repeatedly, string indexing in fixed-width
encodings is O(1) while indexing into variable-width encodings (e.g.
UTF-8/UTF-16) is O(N).  The FSR (Flexible String Representation,
PEP 393) gives the benefits of O(1) indexing while saving space when
a string doesn't need to use a full 32-bit width.
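
To make the O(N) point concrete, here is a minimal sketch of what fetching the i-th character from UTF-8 bytes involves. It assumes the input is valid UTF-8 and is not how any particular implementation does it; utf8_index is a hypothetical helper written for this example.

```python
def utf8_index(data: bytes, i: int) -> str:
    """Return the i-th code point of valid UTF-8 data by linear scan."""
    if i < 0:
        raise IndexError("negative indexes are not supported in this sketch")
    pos = 0
    count = 0
    while pos < len(data):
        first = data[pos]
        # The lead byte tells us how many bytes this code point occupies.
        if first < 0x80:
            width = 1
        elif first >> 5 == 0b110:
            width = 2
        elif first >> 4 == 0b1110:
            width = 3
        else:
            width = 4
        if count == i:
            return data[pos:pos + width].decode("utf-8")
        pos += width
        count += 1
    raise IndexError("code point index out of range")

sample = ("a" * 100 + "\u0153" + "\U00010000").encode("utf-8")
print(utf8_index(sample, 100), utf8_index(sample, 101))
```

Every lookup has to walk past all earlier characters because their widths vary, whereas a fixed-width layout can jump straight to offset i * width.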

-tkc





Re: Finding size of Variable

2014-02-10 Thread Ned Batchelder

On 2/10/14 9:43 AM, Tim Chase wrote:

On 2014-02-10 06:07, wxjmfa...@gmail.com wrote:

Python does not save memory at all. A str (unicode string)
uses less memory only - and only - because and when one uses
explicitly characters which are consuming less memory.

Not only the memory gain is zero, Python falls back to the
worse case.


sys.getsizeof('a' * 100)

125

sys.getsizeof('a' * 100 + 'oe')

240

sys.getsizeof('a' * 100 + 'oe' + '\U0001')

448


If Python used UTF-32 for EVERYTHING, then all three of those cases
would be 448, so it clearly disproves your claim that python
does not save memory at all.


The opposite of what the utf8/utf16 do!


sys.getsizeof(('a' * 100 + 'oe' +
'\U0001').encode('utf-8'))

123

sys.getsizeof(('a' * 100 + 'oe' +
'\U0001').encode('utf-16'))

225


However, as pointed out repeatedly, string-indexing in fixed-width
encodings are O(1) while indexing into variable-width encodings (e.g.
UTF8/UTF16) are O(N).  The FSR gives the benefits of O(1) indexing
while saving space when a string doesn't need to use a full 32-bit
width.

-tkc





Please don't engage in this debate with JMF.  His mind is made up, and 
he will not be swayed, no matter how persuasive and reasonable your 
arguments.  Just ignore him.



--
Ned Batchelder, http://nedbatchelder.com



Re: Finding size of Variable

2014-02-08 Thread Mark Lawrence

On 08/02/2014 02:48, Steven D'Aprano wrote:

On Thu, 06 Feb 2014 05:51:54 -0800, wxjmfauth wrote:


Sorry, I'm only pointing you may lose memory when working with short
strings as it was explained. I really, very really, do not see what is
absurd or obscure in:


sys.getsizeof('abc' + 'EURO')

46

sys.getsizeof(('abc' + 'EURO').encode('utf-32'))

37



Why do you care about NINE bytes? The least amount of memory in any PC
that I know about is 5 bytes, more than fifty million times more.
And you are whinging about wasting nine bytes?

If you care about that lousy nine bytes, Python is not the language for
you. Go and program in C, where you can spend ten or twenty times longer
programming, but save nine bytes in every string.

Nobody cares about your memory benchmark except you. Python is not
designed to save memory, Python is designed to use as much memory as
needed to give the programmer an easier job. In C, I can store a single
integer in a single byte. In Python, horror upon horrors, it takes 14
bytes!!!

py sys.getsizeof(1)
14

We consider it A GOOD THING that Python spends memory for programmer
convenience and safety. Python looks for memory optimizations when it can
save large amounts of memory, not utterly trivial amounts. So in a Python
wide build, a ten-thousand block character string requires a little bit
more than 40KB. In Python 3.3, that can be reduced to only 10KB for a
purely Latin-1 string, or 20K for a string without any astral characters.
That's the sort of memory savings that are worthwhile, reducing memory
usage by 75%.

Could Python save memory by using UTF-8? Yes. But it would cost
complexity and time, strings would be even slower than they are now. That
is not a trade-off that the core developers have chosen to make, and I
agree with them.





This is a C +1 to save memory when compared against this Python +1 :)

--
My fellow Pythonistas, ask not what our language can do for you, ask 
what you can do for our language.


Mark Lawrence



Re: Finding size of Variable

2014-02-08 Thread David Hutto
On Sat, Feb 8, 2014 at 8:17 AM, Mark Lawrence breamore...@yahoo.co.uk wrote:

 On 08/02/2014 02:48, Steven D'Aprano wrote:

 On Thu, 06 Feb 2014 05:51:54 -0800, wxjmfauth wrote:

  Sorry, I'm only pointing you may lose memory when working with short
 strings as it was explained. I really, very really, do not see what is
 absurd or obscure in:

  sys.getsizeof('abc' + 'EURO')

 46

 sys.getsizeof(('abc' + 'EURO').encode('utf-32'))

 37



 Why do you care about NINE bytes? The least amount of memory in any PC
 that I know about is 5 bytes, more than fifty million times more.
 And you are whinging about wasting nine bytes?


One could argue that if you're parsing a particular file, a very large one,
those 9 bytes can go into the optimization of parsing the aforementioned
file. Of course, we have faster processors, so why care?

Because it goes into the optimization of the code one is 'developing' in
python.



 If you care about that lousy nine bytes, Python is not the language for
 you. Go and program in C, where you can spend ten or twenty times longer
 programming, but save nine bytes in every string.

 Nobody cares about your memory benchmark except you. Python is not
 designed to save memory, Python is designed to use as much memory as
 needed to give the programmer an easier job. In C, I can store a single
 integer in a single byte. In Python, horror upon horrors, it takes 14
 bytes!!!

 py sys.getsizeof(1)
 14

 We consider it A GOOD THING that Python spends memory for programmer
 convenience and safety. Python looks for memory optimizations when it can
 save large amounts of memory, not utterly trivial amounts. So in a Python
 wide build, a ten-thousand block character string requires a little bit
 more than 40KB. In Python 3.3, that can be reduced to only 10KB for a
 purely Latin-1 string, or 20K for a string without any astral characters.
 That's the sort of memory savings that are worthwhile, reducing memory
 usage by 75%.

 Could Python save memory by using UTF-8? Yes. But it would cost
 complexity and time, strings would be even slower than they are now. That
 is not a trade-off that the core developers have chosen to make, and I
 agree with them.




 This is a C +1 to save memory when compared against this Python +1 :)

 --
 My fellow Pythonistas, ask not what our language can do for you, ask what
 you can do for our language.

 Mark Lawrence





-- 
Best Regards,
David Hutto
CEO: http://www.hitwebdevelopment.com


Re: Finding size of Variable

2014-02-08 Thread Rustom Mody
On Sunday, February 9, 2014 4:15:50 AM UTC+5:30, David Hutto wrote:
 One could argue that if you're parsing a particular file, a very large one, 
 that those 9 bytes can go into the optimization of parsing aforementioned 
 file. Of, course we have faster processors, so why care? 
 Because it goes into the optimization of the code one is 'developing' in 
 python.

Yes... There are cases when python is an inappropriate language to use...
So???

It's good to get a bit of context here.

loop:
jmf says python is inappropriate.
Someone asks him: Is it? In what case?
jmf: No answer
After a delay of few days jmp to start of loop

[BTW: In my book this is classic trolling]


Re: Finding size of Variable

2014-02-08 Thread David Hutto
On Sat, Feb 8, 2014 at 8:25 PM, Rustom Mody rustompm...@gmail.com wrote:

 On Sunday, February 9, 2014 4:15:50 AM UTC+5:30, David Hutto wrote:
  One could argue that if you're parsing a particular file, a very large
 one, that those 9 bytes can go into the optimization of parsing
 aforementioned file. Of, course we have faster processors, so why care?
  Because it goes into the optimization of the code one is 'developing' in
 python.

 Yes... There are cases when python is an inappropriate language to use...
 So???


I didn't say  she couldn't optimize in another language, and was just
prototyping in Python. I just said she was optimizing her python
code...dufus.




 Its good to get a bit of context here.

 loop:
 jmf says python is inappropriate.
 Someone asks him: Is it? In what case?
 jmf: No answer
 After a delay of few days jmp to start of loop

 loop:
mov head,up_your_ass
push  repeat
pop repeat
jmp loop

[BTW: In my book this classic trolling]
 --


And the title of this book would be...Pieces of Cliche Bullshit Internet
Arguments for Dummies

https://mail.python.org/mailman/listinfo/python-list




-- 
Best Regards,
David Hutto
CEO: http://www.hitwebdevelopment.com


Re: Finding size of Variable

2014-02-08 Thread Chris Angelico
On Sun, Feb 9, 2014 at 1:56 PM, David Hutto dwightdhu...@gmail.com wrote:

 Yes... There are cases when python is an inappropriate language to use...
 So???


 I didn't say  she couldn't optimize in another language, and was just
 prototyping in Python. I just said she was optimizing her python
 code...dufus.

And there are a *lot* of cases where that is inappropriate language to
use. Please don't.

ChrisA


Re: Finding size of Variable

2014-02-08 Thread David Hutto
On Sat, Feb 8, 2014 at 9:59 PM, Chris Angelico ros...@gmail.com wrote:

 On Sun, Feb 9, 2014 at 1:56 PM, David Hutto dwightdhu...@gmail.com
 wrote:
 
  Yes... There are cases when python is an inappropriate language to
 use...
  So???
 
 
  I didn't say  she couldn't optimize in another language, and was just
  prototyping in Python. I just said she was optimizing her python
  code...dufus.

 And there are a *lot* of cases where that is inappropriate language to
 use. Please don't.

 ChrisA

it's also inappropriate for him to call people trolls, while they're just
commenting on why what she might be using is  a necessity for her
particular case of developing in Python, and not using another language,
yet.


He started it! :P

-- 
Best Regards,
David Hutto
CEO: http://www.hitwebdevelopment.com


Re: Finding size of Variable

2014-02-08 Thread Ned Batchelder

On 2/8/14 9:56 PM, David Hutto wrote:




On Sat, Feb 8, 2014 at 8:25 PM, Rustom Mody rustompm...@gmail.com wrote:

On Sunday, February 9, 2014 4:15:50 AM UTC+5:30, David Hutto wrote:
  One could argue that if you're parsing a particular file, a very
large one, that those 9 bytes can go into the optimization of
parsing aforementioned file. Of, course we have faster processors,
so why care?
  Because it goes into the optimization of the code one is
'developing' in python.

Yes... There are cases when python is an inappropriate language to
use...
So???


I didn't say  she couldn't optimize in another language, and was just
prototyping in Python. I just said she was optimizing her python
code...dufus.


Please keep the discussion respectful.  Misunderstandings are easy, I 
suspect this is one of them.  There's no reason to start calling people 
names.





Its good to get a bit of context here.

loop:
jmf says python is inappropriate.
Someone asks him: Is it? In what case?
jmf: No answer
After a delay of few days jmp to start of loop

loop:
mov head,up_your_ass
push  repeat
pop repeat
jmp loop


Please keep in mind the Code of Conduct:

http://www.python.org/psf/codeofconduct

Thanks.



[BTW: In my book this classic trolling]
--


And the title of this book would be...Pieces of Cliche Bullshit
Internet Arguments for Dummies

https://mail.python.org/mailman/listinfo/python-list




--
Best Regards,
David Hutto
CEO: http://www.hitwebdevelopment.com





--
Ned Batchelder, http://nedbatchelder.com



Re: Finding size of Variable

2014-02-08 Thread David Hutto
Maybe I'll just roll my fat, bald, troll arse out from under the bridge,
and comment back, off list, next time.


Re: Finding size of Variable

2014-02-08 Thread Ned Batchelder

On 2/8/14 10:09 PM, David Hutto wrote:

Maybe I'll just roll my fat, bald, troll arse out from under the bridge,
and comment back, off list, next time.




I'm not sure what happened in this thread.  It might be that you think
Rustom Mody was referring to you when he said, "BTW: In my book this
classic trolling."  I don't think he was; I think he was referring to JMF.


In any case, perhaps it would be best to just take a break?

--
Ned Batchelder, http://nedbatchelder.com



Re: Finding size of Variable

2014-02-08 Thread Rustom Mody
On Sunday, February 9, 2014 8:46:50 AM UTC+5:30, Ned Batchelder wrote:
 On 2/8/14 10:09 PM, David Hutto wrote:
  Maybe I'll just roll my fat, bald, troll arse out from under the bridge,
  and comment back, off list, next time.

 I'm not sure what happened in this thread.  It might be that you think 
 Rustom Mody was referring to you when he said, BTW: In my book this 
 classic trolling.  I don't think he was, I think he was referring to JMF.

Of course!
And given the turn of this thread, we must hand it to jmf for being even better 
at trolling than I thought :-)

See the first para
http://en.wikipedia.org/wiki/Troll_%28Internet%29


Re: Finding size of Variable

2014-02-07 Thread Steven D'Aprano
On Thu, 06 Feb 2014 05:51:54 -0800, wxjmfauth wrote:

 Sorry, I'm only pointing you may lose memory when working with short
 strings as it was explained. I really, very really, do not see what is
 absurd or obscure in:
 
 sys.getsizeof('abc' + 'EURO')
 46
 sys.getsizeof(('abc' + 'EURO').encode('utf-32'))
 37


Why do you care about NINE bytes? The least amount of memory in any PC 
that I know about is 5 bytes, more than fifty million times more. 
And you are whinging about wasting nine bytes?

If you care about that lousy nine bytes, Python is not the language for 
you. Go and program in C, where you can spend ten or twenty times longer
programming, but save nine bytes in every string.

Nobody cares about your memory benchmark except you. Python is not 
designed to save memory, Python is designed to use as much memory as 
needed to give the programmer an easier job. In C, I can store a single 
integer in a single byte. In Python, horror upon horrors, it takes 14 
bytes!!!

py> sys.getsizeof(1)
14
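
When the per-object cost of int really does matter, Python can still pack small integers one per byte. A minimal sketch, assuming CPython; the sample data and variable names are invented for the illustration.

```python
import sys
from array import array

values = list(range(100))  # hypothetical sample data: small non-negative integers

# A list holds pointers to full int objects, and each int carries its own header.
# (CPython caches small ints, so this sum overstates what is really allocated.)
list_bytes = sys.getsizeof(values) + sum(sys.getsizeof(v) for v in values)

# array('B') and bytes both pack one unsigned byte per value, much as C would.
array_bytes = sys.getsizeof(array("B", values))
packed_bytes = sys.getsizeof(bytes(values))

print(list_bytes, array_bytes, packed_bytes)
```

The convenience of arbitrary-precision int objects is the default, and the compact representations are there for the cases that need them.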

We consider it A GOOD THING that Python spends memory for programmer 
convenience and safety. Python looks for memory optimizations when it can 
save large amounts of memory, not utterly trivial amounts. So in a Python 
wide build, a ten-thousand block character string requires a little bit 
more than 40KB. In Python 3.3, that can be reduced to only 10KB for a 
purely Latin-1 string, or 20K for a string without any astral characters. 
That's the sort of memory savings that are worthwhile, reducing memory 
usage by 75%.

Could Python save memory by using UTF-8? Yes. But it would cost
complexity and time: strings would be even slower than they are now. That
is not a trade-off that the core developers have chosen to make, and I 
agree with them.



-- 
Steven


Re: Finding size of Variable

2014-02-07 Thread Ethan Furman

On 02/07/2014 06:48 PM, Steven D'Aprano wrote:


That is not a trade-off that the core developers have chosen to make,
and I agree with them.


Even though you haven't broken all the build-bots yet, you can still stop saying
"them".  ;)

--
~Ethan~
--
https://mail.python.org/mailman/listinfo/python-list


Re: Finding size of Variable

2014-02-06 Thread wxjmfauth
On Wednesday, 5 February 2014 12:44:47 UTC+1, Chris Angelico wrote:
 On Wed, Feb 5, 2014 at 10:00 PM, Steven D'Aprano
 
 steve+comp.lang.pyt...@pearwood.info wrote:
 
  where stopWords.txt is a file of size 4KB
 
 
 
  My guess is that if you split a 4K file into words, then put the words
 
  into a list, you'll probably end up with 6-8K in memory.
 
 
 
 I'd guess rather more; Python strings have a fair bit of fixed
 
 overhead, so with a whole lot of small strings, it will get more
 
 costly.
 
 
 
  sys.version
 
 '3.4.0b2 (v3.4.0b2:ba32913eb13e, Jan  5 2014, 16:23:43) [MSC v.1600 32
 
 bit (Intel)]'
 
  sys.getsizeof(asdf)
 
 29
 
 
 
 Stop words tend to be short, rather than long, words, so I'd look at
 
 an average of 2-3 letters per word. Assuming they're separated by
 
 spaces or newlines, that means there'll be roughly a thousand of them
 
 in the file, for about 25K of overhead. A bit less if the words are
 
 longer, but still quite a bit. (Byte strings have slightly less
 
 overhead, 17 bytes apiece, but still quite a bit.)
 
 
 
 ChrisA

>>> sum([sys.getsizeof(c) for c in ['a']])
26
>>> sum([sys.getsizeof(c) for c in ['a', 'a EURO']])
68
>>> sum([sys.getsizeof(c) for c in ['a', 'a EURO', 'aa EURO']])
112
>>> sum([sys.getsizeof(c) for c in ['a', 'a EURO', 'aa EURO', 'aaa EURO']])
158
>>> sum([sys.getsizeof(c) for c in ['a', 'a EURO', 'aa EURO', 'aaa EURO', ' EURO']])
238

>>> sum([sys.getsizeof(c.encode('utf-32-be')) for c in ['a']])
21
>>> sum([sys.getsizeof(c.encode('utf-32-be')) for c in ['a', 'a EURO']])
46
>>> sum([sys.getsizeof(c.encode('utf-32-be')) for c in ['a', 'a EURO', 'aa EURO']])
75
>>> sum([sys.getsizeof(c.encode('utf-32-be')) for c in ['a', 'a EURO', 'aa EURO', 'aaa EURO']])
108
>>> sum([sys.getsizeof(c.encode('utf-32-be')) for c in ['a', 'a EURO', 'aa EURO', 'aaa EURO', ' EURO']])
209

>>> sum([sys.getsizeof(c) for c in ['a', 'a EURO', 'aa EURO']*3])
336
>>> sum([sys.getsizeof(c) for c in ['aa EURO aa EURO']*3])
150
>>> sum([sys.getsizeof(c.encode('utf-32')) for c in ['a', 'a EURO', 'aa EURO']*3])
261
>>> sum([sys.getsizeof(c.encode('utf-32')) for c in ['aa EURO aa EURO']*3])
135


jmf
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Finding size of Variable

2014-02-06 Thread Ned Batchelder

On 2/6/14 5:15 AM, wxjmfa...@gmail.com wrote:



sum([sys.getsizeof(c) for c in ['a', 'a EURO', 'aa EURO']*3])

336

sum([sys.getsizeof(c) for c in ['aa EURO aa EURO']*3])

150

sum([sys.getsizeof(c.encode('utf-32')) for c in ['a', 'a EURO', 'aa EURO']*3])

261

sum([sys.getsizeof(c.encode('utf-32')) for c in ['aa EURO aa EURO']*3])

135




jmf



JMF, we've told you I-don't-know-how-many-times to stop this. 
Seriously: think hard about what your purpose is in sending these absurd 
benchmarks.  I guarantee you are not accomplishing it.


--
Ned Batchelder, http://nedbatchelder.com

--
https://mail.python.org/mailman/listinfo/python-list


Re: Finding size of Variable

2014-02-06 Thread wxjmfauth
On Thursday, 6 February 2014 12:10:08 UTC+1, Ned Batchelder wrote:
 On 2/6/14 5:15 AM, wxjmfa...@gmail.com wrote:
 
 
 
 
 
  sum([sys.getsizeof(c) for c in ['a', 'a EURO', 'aa EURO']*3])
 
  336
 
  sum([sys.getsizeof(c) for c in ['aa EURO aa EURO']*3])
 
  150
 
  sum([sys.getsizeof(c.encode('utf-32')) for c in ['a', 'a EURO', 'aa 
  EURO']*3])
 
  261
 
  sum([sys.getsizeof(c.encode('utf-32')) for c in ['aa EURO aa EURO']*3])
 
  135
 
 
 
 
 
  jmf
 
 
 
 
 
 JMF, we've told you I-don't-know-how-many-times to stop this. 
 
 Seriously: think hard about what your purpose is in sending these absurd 
 
 benchmarks.  I guarantee you are not accomplishing it.
 
 
 
 -- 
 
 Ned Batchelder, http://nedbatchelder.com

Sorry, I'm only pointing out that you may lose memory when
working with short strings, as was explained.
I really, very really, do not see what is absurd
or obscure in:

>>> sys.getsizeof('abc' + 'EURO')
46
>>> sys.getsizeof(('abc' + 'EURO').encode('utf-32'))
37

I apologize for the "a EURO", which should have
been a real EURO. No idea what happened.

jmf

-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Finding size of Variable

2014-02-06 Thread wxjmfauth
Some mysterious problem with the euro.
Let's take a real French char.
>>> sys.getsizeof('abc' + 'œ')
46
>>> sys.getsizeof(('abc' + 'œ').encode('utf-32'))
37

or a German char, ẞ

>>> sys.getsizeof('abc' + '\N{LATIN CAPITAL LETTER SHARP S}')
46
>>> sys.getsizeof(('abc' + '\N{LATIN CAPITAL LETTER SHARP S}').encode('utf-32'))
37



-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Finding size of Variable

2014-02-05 Thread Peter Otten
Ayushi Dalmia wrote:

 On Wednesday, February 5, 2014 12:51:31 AM UTC+5:30, Dave Angel wrote:
 Ayushi Dalmia ayushidalmia2...@gmail.com Wrote in message:
 
 
 
  
 
  Where am I going wrong? What are the alternatives I can try?
 
 
 
 You've rejected all the alternatives so far without showing your
 
  code, or even properly specifying your problem.
 
 
 
 To get the total size of a list of strings,  try (untested):
 
 
 
 a = sys.getsizeof (mylist )
 
 for item in mylist:
 
 a += sys.getsizeof (item)
 
 
 
 This can be high if some of the strings are interned and get
 
  counted twice. But you're not likely to get closer without some
 
  knowledge of the data objects and where they come
 
  from.
 
 
 
 --
 
 DaveA
 
 Hello Dave,
 
 I just thought that saving others time is better and hence I explained
 only the subset of my problem. Here is what I am trying to do:
 
 I am trying to index the current wikipedia dump without using databases
 and create a search engine for Wikipedia documents. Note, I CANNOT USE
 DATABASES. My approach:
 
 I am parsing the wikipedia pages using SAX Parser, and then, I am dumping
 the words along with the posting list (a list of doc ids in which the word
 is present) into different files after reading 'X' number of pages. Now
 these files may have the same word and hence I need to merge them and
 write the final index again. Now these final indexes must be of limited
 size as I need to be of limited size. This is where I am stuck. I need to
 know how to determine the size of content in a variable before I write
 into the file.
 
 Here is the code for my merging:
 
 def mergeFiles(pathOfFolder, countFile):
 listOfWords={}
 indexFile={}
 topOfFile={}
 flag=[0]*countFile
 data=defaultdict(list)
 heap=[]
 countFinalFile=0
 for i in xrange(countFile):
 fileName = pathOfFolder+'\index'+str(i)+'.txt.bz2'
 indexFile[i]= bz2.BZ2File(fileName, 'rb')
 flag[i]=1
 topOfFile[i]=indexFile[i].readline().strip()
 listOfWords[i] = topOfFile[i].split(' ')
 if listOfWords[i][0] not in heap:
 heapq.heappush(heap, listOfWords[i][0])

At this point you have already done it wrong as your heap contains the 
complete data and you have done a lot of O(N) tests on the heap. 
This is both slow and consumes a lot of memory. See

http://code.activestate.com/recipes/491285-iterator-merge/

for a sane way to merge sorted data from multiple files.  Your code becomes 
(untested)

import bz2
import os

# imerge() is the function from the recipe linked above
with open("outfile.txt", "wb") as outfile:
    infiles = []
    for i in xrange(countFile):
        filename = os.path.join(pathOfFolder, 'index'+str(i)+'.txt.bz2')
        infiles.append(bz2.BZ2File(filename, "rb"))

    outfile.writelines(imerge(*infiles))

    for infile in infiles:
        infile.close()

Once you have your data in a single file you can read from that file and do 
the postprocessing you mention below.
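
If fetching the recipe is inconvenient, the standard library's heapq.merge (available since Python 2.6) does the same kind of lazy k-way merge of already-sorted inputs. A minimal sketch, using in-memory lists of lines as stand-ins for the bz2 files; the sample lines are made up for illustration:

```python
import heapq

# Stand-ins for already-sorted index files. Any iterables of sorted lines
# work here, including open file objects.
run_a = ['apple 1 4\n', 'cat 2\n', 'zebra 9\n']
run_b = ['bird 3\n', 'cat 7\n']
run_c = ['ant 5\n', 'dog 6\n']

# heapq.merge pulls one item at a time from each input, so memory use stays
# proportional to the number of inputs rather than to their total size.
merged = list(heapq.merge(run_a, run_b, run_c))
print(''.join(merged))
```

In real use you would pass the merged iterator straight to outfile.writelines() instead of building a list, so that the full result never sits in memory.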

 
 while any(flag)==1:
 temp = heapq.heappop(heap)
 for i in xrange(countFile):
 if flag[i]==1:
 if listOfWords[i][0]==temp:
 
 //This is where I am stuck. I cannot wait until memory
 //error, as I need to do some postprocessing too. try:
 data[temp].extend(listOfWords[i][1:])
 except MemoryError:
 writeFinalIndex(data, countFinalFile,
 pathOfFolder) data=defaultdict(list)
 countFinalFile+=1
 
 topOfFile[i]=indexFile[i].readline().strip()
 if topOfFile[i]=='':
 flag[i]=0
 indexFile[i].close()
 
os.remove(pathOfFolder+'\index'+str(i)+'.txt.bz2')
 else:
 listOfWords[i] = topOfFile[i].split(' ')
 if listOfWords[i][0] not in heap:
 heapq.heappush(heap, listOfWords[i][0])
 writeFinalIndex(data, countFinalFile, pathOfFolder)
 
 countFile is the number of files and writeFileIndex method writes into the
 file.


-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Finding size of Variable

2014-02-05 Thread Steven D'Aprano
On Tue, 04 Feb 2014 21:35:05 -0800, Ayushi Dalmia wrote:

 On Wednesday, February 5, 2014 12:59:46 AM UTC+5:30, Tim Chase wrote:
 On 2014-02-04 14:21, Dave Angel wrote:
 
  To get the total size of a list of strings,  try (untested):
 
  
  a = sys.getsizeof (mylist )
  for item in mylist:
  a += sys.getsizeof (item)
 
 
 I always find this sort of accumulation weird (well, at least in
 Python; it's the *only* way in many other languages) and would write
 it as
 
   a = getsizeof(mylist) + sum(getsizeof(item) for item in mylist)
 
 
 This also doesn't gives the true size. I did the following:


What do you mean by true size?

Do you mean the amount of space a certain amount of data will take in 
memory? With or without the overhead of object headers? Or do you mean 
how much space it will take when written to disk? You have not been clear 
what you are trying to measure.

If you are dealing with one-byte characters, you can measure the amount 
of memory they take up (excluding object overhead) by counting the number 
of characters: 23 one-byte characters requires 23 bytes. Plus the object 
overhead gives:

py> sys.getsizeof('a'*23)
44

44 bytes (23 bytes for the 23 single-byte characters, plus 21 bytes 
overhead). One thousand such characters takes:

py> sys.getsizeof('a'*1000)
1021

If you write such a string to disk, it will take 1000 bytes (or 1KB), 
unless you use some sort of compression.
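
The same arithmetic can be checked on any interpreter. This is only a sketch: the overhead you see depends on the Python version and platform, so it will probably not be the 21 bytes shown above, but it should be the same for both strings. The variable names are mine:

```python
import sys

short = 'a' * 23
long_ = 'a' * 1000

# Fixed per-object overhead: reported size minus one byte per ASCII character.
overhead_short = sys.getsizeof(short) - len(short)
overhead_long = sys.getsizeof(long_) - len(long_)

# Size when written to disk uncompressed with a one-byte-per-character encoding.
on_disk = len(long_.encode('ascii'))

print(overhead_short, overhead_long, on_disk)
```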

 import sys
 data=[]
 f=open('stopWords.txt','r')
 
 for line in f:
 line=line.split()
 data.extend(line)
 
 print sys.getsizeof(data)

This will give you the amount of space taken by the list object. It will 
*not* give you the amount of space taken by the individual strings.

A Python list looks like this:


| header | array of pointers |


The header is of constant or near-constant size; the array depends on the 
number of items in the list. It may be bigger than the list, e.g. a list 
with 1000 items might have allocated space for 2000 items. It will never 
be smaller.
 
getsizeof(list) only counts the direct size of that list, including the 
array, but not the things which the pointers point at. If you want the 
total size, you need to count them as well.
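
A minimal sketch of that counting for a flat list of strings. The function name total_size and the sample words are assumptions of mine, not anything from the original code. Objects that appear in the list more than once are counted only once, which avoids the over-count Dave warned about for shared strings:

```python
import sys

def total_size(strings):
    """Shallow size of the list plus the size of each distinct string object."""
    seen = set()
    total = sys.getsizeof(strings)
    for s in strings:
        if id(s) not in seen:      # count a shared object only once
            seen.add(id(s))
            total += sys.getsizeof(s)
    return total

words = ['the', 'and', 'of', 'the']   # sample data, not from the thread
words[3] = words[0]                   # make the sharing explicit
shallow = sys.getsizeof(words)
print(shallow, total_size(words))
```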


 where stopWords.txt is a file of size 4KB

My guess is that if you split a 4K file into words, then put the words 
into a list, you'll probably end up with 6-8K in memory.


-- 
Steven
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Finding size of Variable

2014-02-05 Thread Chris Angelico
On Wed, Feb 5, 2014 at 10:00 PM, Steven D'Aprano
steve+comp.lang.pyt...@pearwood.info wrote:
 where stopWords.txt is a file of size 4KB

 My guess is that if you split a 4K file into words, then put the words
 into a list, you'll probably end up with 6-8K in memory.

I'd guess rather more; Python strings have a fair bit of fixed
overhead, so with a whole lot of small strings, it will get more
costly.

>>> sys.version
'3.4.0b2 (v3.4.0b2:ba32913eb13e, Jan  5 2014, 16:23:43) [MSC v.1600 32 bit (Intel)]'
>>> sys.getsizeof("asdf")
29

Stop words tend to be short, rather than long, words, so I'd look at
an average of 2-3 letters per word. Assuming they're separated by
spaces or newlines, that means there'll be roughly a thousand of them
in the file, for about 25K of overhead. A bit less if the words are
longer, but still quite a bit. (Byte strings have slightly less
overhead, 17 bytes apiece, but still quite a bit.)
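
The estimate above can be redone for whatever interpreter you are running. A rough sketch, assuming CPython 3.3+ where the size of an empty str is the fixed overhead of a short ASCII string; the average word length is the guess from the paragraph above, not a measured value:

```python
import sys

file_size = 4 * 1024          # the 4KB stop-word file from the thread
avg_word_len = 3              # assumed average, per the estimate above
separator = 1                 # one space or newline per word

n_words = file_size // (avg_word_len + separator)

# Per-string overhead on this interpreter: size of an empty string object.
per_string_overhead = sys.getsizeof('')
estimated_overhead = n_words * per_string_overhead

print(n_words, per_string_overhead, estimated_overhead)
```

On a 64-bit build the per-string overhead is larger than the 25 bytes seen on the 32-bit build above, so the total overhead comes out correspondingly higher.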

ChrisA
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Finding size of Variable

2014-02-05 Thread Dave Angel
 Ayushi Dalmia ayushidalmia2...@gmail.com Wrote in message:
 On Wednesday, February 5, 2014 12:59:46 AM UTC+5:30, Tim Chase wrote:
 On 2014-02-04 14:21, Dave Angel wrote:
 
  To get the total size of a list of strings,  try (untested):
 
  
 
  a = sys.getsizeof (mylist )
 
  for item in mylist:
 
  a += sys.getsizeof (item)
 
 
 
 I always find this sort of accumulation weird (well, at least in
 
 Python; it's the *only* way in many other languages) and would write
 
 it as
 
 
 
   a = getsizeof(mylist) + sum(getsizeof(item) for item in mylist)
 
 
 
 -tkc
 
 This also doesn't gives the true size. I did the following:
 
 import sys
 data=[]
 f=open('stopWords.txt','r')
 
 for line in f:
 line=line.split()
 data.extend(line)
 
 print sys.getsizeof(data)
 

Did you actually READ either of my posts or Tim's? For a
container, you can't just use getsizeof on the container.

a = sys.getsizeof(data)
for item in data:
    a += sys.getsizeof(item)
print a

-- 
DaveA

-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Finding size of Variable

2014-02-05 Thread Ayushi Dalmia
On Wednesday, February 5, 2014 7:13:34 PM UTC+5:30, Dave Angel wrote:
 Ayushi Dalmia ayushidalmia2...@gmail.com Wrote in message:
 
  On Wednesday, February 5, 2014 12:59:46 AM UTC+5:30, Tim Chase wrote:
 
  On 2014-02-04 14:21, Dave Angel wrote:
 
  
 
   To get the total size of a list of strings,  try (untested):
 
  
 
   
 
  
 
   a = sys.getsizeof (mylist )
 
  
 
   for item in mylist:
 
  
 
   a += sys.getsizeof (item)
 
  
 
  
 
  
 
  I always find this sort of accumulation weird (well, at least in
 
  
 
  Python; it's the *only* way in many other languages) and would write
 
  
 
  it as
 
  
 
  
 
  
 
a = getsizeof(mylist) + sum(getsizeof(item) for item in mylist)
 
  
 
  
 
  
 
  -tkc
 
  
 
  This also doesn't gives the true size. I did the following:
 
  
 
  import sys
 
  data=[]
 
  f=open('stopWords.txt','r')
 
  
 
  for line in f:
 
  line=line.split()
 
  data.extend(line)
 
  
 
  print sys.getsizeof(data)
 
  
 
 
 
 Did you actually READ either of my posts or Tim's? For a
 
  container,  you can't just use getsizeof on the container.
 
  
 
 
 
 a = sys.getsizeof (data)
 
 for item in mylist:
 
   a += sys.getsizeof (data)
 
 print a
 
 
 
 -- 
 
 DaveA

Yes, I did. I now understand how to find the size.
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Finding size of Variable

2014-02-05 Thread Mark Lawrence

On 05/02/2014 14:33, Ayushi Dalmia wrote:

Please stop sending double line spaced messages, just follow the 
instructions here https://wiki.python.org/moin/GoogleGroupsPython to 
prevent this happening, thanks.


--
My fellow Pythonistas, ask not what our language can do for you, ask 
what you can do for our language.


Mark Lawrence

--
https://mail.python.org/mailman/listinfo/python-list


Finding size of Variable

2014-02-04 Thread Ayushi Dalmia
Hello,

I have 10 files and I need to merge them (using K way merging). The size of 
each file is around 200 MB. Now suppose I am keeping the merged data in a 
variable named mergedData, I had thought of checking the size of mergedData
using sys.getsizeof() but it somehow doesn't give the actual value of the
memory occupied.

For example, if a file in my file system occupies 4 KB of data, if I read all 
the lines in a list, the size of the list is around 2100 bytes only.

Where am I going wrong? What are the alternatives I can try?
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Finding size of Variable

2014-02-04 Thread Peter Otten
Ayushi Dalmia wrote:

 I have 10 files and I need to merge them (using K way merging). The size
 of each file is around 200 MB. Now suppose I am keeping the merged data in
 a variable named mergedData, I had thought of checking the size of
 mergedData using sys.getsizeof() but it somehow doesn't gives the actual
 value of the memory occupied.
 
 For example, if a file in my file system occupies 4 KB of data, if I read
 all the lines in a list, the size of the list is around 2100 bytes only.
 
 Where am I going wrong? What are the alternatives I can try?

getsizeof() gives you the size of the list only; to complete the picture you 
have to add the sizes of the lines.

However, why do you want to keep track of the actual memory used by 
variables in your script? You should instead concentrate on the algorithm, 
and as long as either the size of the dataset is manageable or you can limit 
the amount of data accessed at a given time you are golden.

-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Finding size of Variable

2014-02-04 Thread Ayushi Dalmia
On Tuesday, February 4, 2014 5:10:25 PM UTC+5:30, Peter Otten wrote:
 Ayushi Dalmia wrote:
 
 
 
  I have 10 files and I need to merge them (using K way merging). The size
 
  of each file is around 200 MB. Now suppose I am keeping the merged data in
 
  a variable named mergedData, I had thought of checking the size of
 
  mergedData using sys.getsizeof() but it somehow doesn't gives the actual
 
  value of the memory occupied.
 
  
 
  For example, if a file in my file system occupies 4 KB of data, if I read
 
  all the lines in a list, the size of the list is around 2100 bytes only.
 
  
 
  Where am I going wrong? What are the alternatives I can try?
 
 
 
 getsizeof() gives you the size of the list only; to complete the picture you 
 
 have to add the sizes of the lines.
 
 
 
 However, why do you want to keep track of the actual memory used by 
 
 variables in your script? You should instead concentrate on the algorithm, 
 
 and as long as either the size of the dataset is manageable or you can limit 
 
 the amount of data accessed at a given time you are golden.

As I said, I need to merge large files and I cannot afford more I/O operations. 
So in order to minimise the I/O operation I am writing in chunks. Also, I need 
to use the merged files as indexes later which should be loaded in the memory 
for fast access. Hence the concern.

Can you please elaborate on the point of taking lines into consideration?
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Finding size of Variable

2014-02-04 Thread Asaf Las
On Tuesday, February 4, 2014 2:43:21 PM UTC+2, Ayushi Dalmia wrote:
 
 As I said, I need to merge large files and I cannot afford more I/O 
 operations. So in order to minimise the I/O operation I am writing in 
 chunks. Also, I need to use the merged files as indexes later which 
 should be loaded in the memory for fast access. Hence the concern.
 Can you please elaborate on the point of taking lines into consideration?

have you tried os.sendfile()? 

http://docs.python.org/dev/library/os.html#os.sendfile
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Finding size of Variable

2014-02-04 Thread Dave Angel
 Ayushi Dalmia ayushidalmia2...@gmail.com Wrote in message:
 
 getsizeof() gives you the size of the list only; to complete the picture you 
 
 have to add the sizes of the lines.
 
 
 
 However, why do you want to keep track of the actual memory used by 
 
 variables in your script? You should instead concentrate on the algorithm, 
 
 and as long as either the size of the dataset is manageable or you can limit 
 
 the amount of data accessed at a given time you are golden.
 
 As I said, I need to merge large files and I cannot afford more I/O 
 operations. So in order to minimise the I/O operation I am writing in chunks. 
 Also, I need to use the merged files as indexes later which should be loaded 
 in the memory for fast access. Hence the concern.
 
 Can you please elaborate on the point of taking lines into consideration?
 

Please don't doublespace your quotes.  If you must use
 googlegroups,  fix its bugs before posting. 

There's usually no net gain in trying to 'chunk' your output to a
 text file. The python file system already knows how to do that
 for a sequential file.

For list of strings just add the getsizeof for the list to the sum
 of the getsizeof of all the list items. 

-- 
DaveA

-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Finding size of Variable

2014-02-04 Thread Ayushi Dalmia
On Tuesday, February 4, 2014 6:39:00 PM UTC+5:30, Dave Angel wrote:
 Ayushi Dalmia ayushidalmia2...@gmail.com Wrote in message:
 
  
 
  getsizeof() gives you the size of the list only; to complete the picture 
  you 
 
  
 
  have to add the sizes of the lines.
 
  
 
  
 
  
 
  However, why do you want to keep track of the actual memory used by 
 
  
 
  variables in your script? You should instead concentrate on the algorithm, 
 
  
 
  and as long as either the size of the dataset is manageable or you can 
  limit 
 
  
 
  the amount of data accessed at a given time you are golden.
 
  
 
  As I said, I need to merge large files and I cannot afford more I/O 
  operations. So in order to minimise the I/O operation I am writing in 
  chunks. Also, I need to use the merged files as indexes later which should 
  be loaded in the memory for fast access. Hence the concern.
 
  
 
  Can you please elaborate on the point of taking lines into consideration?
 
  
 
 
 
 Please don't doublespace your quotes.  If you must use
 
  googlegroups,  fix its bugs before posting. 
 
 
 
 There's usually no net gain in trying to 'chunk' your output to a
 
  text file. The python file system already knows how to do that
 
  for a sequential file.
 
 
 
 For list of strings just add the getsizeof for the list to the sum
 
  of the getsizeof of all the list items. 
 
 
 
 -- 
 
 DaveA

Hey! 

I need to chunk the output, otherwise it will give a MemoryError. I need to
do some postprocessing on the data read from the file too. If I do not stop
before the memory error, I won't be able to perform any more operations on it.
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Finding size of Variable

2014-02-04 Thread Ayushi Dalmia
On Tuesday, February 4, 2014 6:23:19 PM UTC+5:30, Asaf Las wrote:
 On Tuesday, February 4, 2014 2:43:21 PM UTC+2, Ayushi Dalmia wrote:
 
  
 
  As I said, I need to merge large files and I cannot afford more I/O 
 
  operations. So in order to minimise the I/O operation I am writing in 
 
  chunks. Also, I need to use the merged files as indexes later which 
 
  should be loaded in the memory for fast access. Hence the concern.
 
  Can you please elaborate on the point of taking lines into consideration?
 
 
 
 have you tried os.sendfile()? 
 
 
 
 http://docs.python.org/dev/library/os.html#os.sendfile

os.sendfile will not serve my purpose. I not only need to merge files, but do 
it in a sorted way. Thus some postprocessing is needed. 
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Finding size of Variable

2014-02-04 Thread Tim Chase
On 2014-02-04 14:21, Dave Angel wrote:
 To get the total size of a list of strings,  try (untested):
 
 a = sys.getsizeof (mylist )
 for item in mylist:
 a += sys.getsizeof (item)

I always find this sort of accumulation weird (well, at least in
Python; it's the *only* way in many other languages) and would write
it as

  a = getsizeof(mylist) + sum(getsizeof(item) for item in mylist)

-tkc



-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Finding size of Variable

2014-02-04 Thread Tim Golden

On 04/02/2014 19:21, Dave Angel wrote:

  Ayushi Dalmia ayushidalmia2...@gmail.com Wrote in message:



Where am I going wrong? What are the alternatives I can try?


You've rejected all the alternatives so far without showing your
  code, or even properly specifying your problem.

To get the total size of a list of strings,  try (untested):

a = sys.getsizeof (mylist )
for item in mylist:
 a += sys.getsizeof (item)


The documentation for sys.getsizeof:

  http://docs.python.org/dev/library/sys#sys.getsizeof

warns about the limitations of this function when applied to a 
container, and even points to a recipe by Raymond Hettinger which 
attempts to do a more complete job.
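
To give an idea of what "a more complete job" involves, here is a simplified sketch of a recursive size function. It is not Raymond Hettinger's recipe, just an illustration under the assumption that only dicts, lists, tuples, sets and frozensets need to be followed; the name deep_getsizeof and the sample data are mine:

```python
import sys

def deep_getsizeof(obj, seen=None):
    """Rough recursive size: obj plus everything reachable through
    dicts, lists, tuples, sets and frozensets. Each object is counted once."""
    if seen is None:
        seen = set()
    if id(obj) in seen:
        return 0
    seen.add(id(obj))
    size = sys.getsizeof(obj)
    if isinstance(obj, dict):
        for key, value in obj.items():
            size += deep_getsizeof(key, seen) + deep_getsizeof(value, seen)
    elif isinstance(obj, (list, tuple, set, frozenset)):
        for item in obj:
            size += deep_getsizeof(item, seen)
    return size

index = {'cat': ['doc2', 'doc7'], 'dog': ['doc6']}   # made-up posting lists
print(sys.getsizeof(index), deep_getsizeof(index))
```

Walking a large structure this way is itself slow, so it is better suited to occasional spot checks than to a test inside a tight loop.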


TJG
--
https://mail.python.org/mailman/listinfo/python-list


Re: Finding size of Variable

2014-02-04 Thread Ayushi Dalmia
On Tuesday, February 4, 2014 7:36:48 PM UTC+5:30, Dennis Lee Bieber wrote:
 On Tue, 4 Feb 2014 05:19:48 -0800 (PST), Ayushi Dalmia
 
 ayushidalmia2...@gmail.com declaimed the following:
 
 
 
 
 
 I need to chunk out the outputs otherwise it will give Memory Error. I need 
 to do some postprocessing on the data read from the file too. If I donot 
 stop before memory error, I won't be able to perform any more operations on 
 it.
 
 
 
   10 200MB files is only 2GB... Most any 64-bit processor these days can
 
 handle that. Even some 32-bit systems could handle it (WinXP booted with
 
 the server option gives 3GB to user processes -- if the 4GB was installed
 
 in the machine).
 
 
 
   However, you speak of an n-way merge. The traditional merge operation
 
 only reads one record from each file at a time, examines them for first,
 
 writes that first, reads next record from the file first came from, and
 
 then reassesses the set.
 
 
 
   You mention needed to chunk the data -- that implies performing a merge
 
 sort in which you read a few records from each file into memory, sort them,
 
 and right them out to newFile1; then read the same number of records from
 
 each file, sort, and write them to newFile2, up to however many files you
 
 intend to work with -- at that point you go back and append the next chunk
 
 to newFile1. When done, each file contains chunks of n*r records. You now
 
 make newFilex the inputs, read/merge the records from those chunks
 
 outputting to another file1, when you reach the end of the first chunk in
 
 the files you then read/merge the second chunk into another file2. You
 
 repeat this process until you end up with only one chunk in one file.
 
 -- 
 
   Wulfraed Dennis Lee Bieber AF6VN
 
 wlfr...@ix.netcom.comHTTP://wlfraed.home.netcom.com/

The way you mentioned for merging the file is an option but that will involve a 
lot of I/O operation. Also, I do not want the size of the file to increase 
beyond a certain point. When I reach the file size upto a certain limit, I want 
to start writing in a new file. This is because I want to store them in memory 
again later.
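
One way to cap the size of each output file without measuring Python objects at all is to count the bytes as they are written. A minimal sketch under that assumption; write_in_parts, open_part and the file names are hypothetical, and the demo data is made up:

```python
import os
import tempfile

def write_in_parts(lines, max_bytes, open_part):
    """Write lines to successive files, starting a new file whenever the next
    line would push the current one past max_bytes.

    open_part(n) is a caller-supplied function returning a writable binary
    file object for part number n. Returns the number of parts written.
    """
    part = 0
    written = 0
    out = open_part(part)
    for line in lines:
        encoded = line.encode('utf-8')
        # Roll over only if the current part already holds something.
        if written and written + len(encoded) > max_bytes:
            out.close()
            part += 1
            written = 0
            out = open_part(part)
        out.write(encoded)
        written += len(encoded)
    out.close()
    return part + 1

tmpdir = tempfile.mkdtemp()

def open_part(n):
    return open(os.path.join(tmpdir, 'final%d.txt' % n), 'wb')

parts = write_in_parts(['cat 2 7\n', 'dog 6\n', 'zebra 9\n'], 16, open_part)
sizes = [os.path.getsize(os.path.join(tmpdir, 'final%d.txt' % n))
         for n in range(parts)]
print(parts, sizes)
```

The on-disk size is not the same as the in-memory size once the file is loaded again, so the byte limit would need to be chosen with the per-object overhead discussed elsewhere in this thread in mind.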
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Finding size of Variable

2014-02-04 Thread Ayushi Dalmia
On Wednesday, February 5, 2014 12:51:31 AM UTC+5:30, Dave Angel wrote:
 Ayushi Dalmia ayushidalmia2...@gmail.com Wrote in message:
 
 
 
  
 
  Where am I going wrong? What are the alternatives I can try?
 
 
 
 You've rejected all the alternatives so far without showing your
 
  code, or even properly specifying your problem.
 
 
 
 To get the total size of a list of strings,  try (untested):
 
 
 
 a = sys.getsizeof (mylist )
 
 for item in mylist:
 
 a += sys.getsizeof (item)
 
 
 
 This can be high if some of the strings are interned and get
 
  counted twice. But you're not likely to get closer without some
 
  knowledge of the data objects and where they come
 
  from.
 
 
 
 -- 
 
 DaveA

Hello Dave, 

I just thought that saving others time is better and hence I explained only the 
subset of my problem. Here is what I am trying to do:

I am trying to index the current wikipedia dump without using databases and 
create a search engine for Wikipedia documents. Note, I CANNOT USE DATABASES.
My approach:

I am parsing the wikipedia pages using SAX Parser, and then, I am dumping the 
words along with the posting list (a list of doc ids in which the word is 
present) into different files after reading 'X' number of pages. Now these 
files may have the same word and hence I need to merge them and write the final
index again. Now these final indexes must be of limited size, as I need to load
them into memory again later. This is where I am stuck. I need to know how to
determine the size of content in a variable before I write into the file.

Here is the code for my merging:

import bz2
import heapq
import os
from collections import defaultdict

def mergeFiles(pathOfFolder, countFile):
    listOfWords={}
    indexFile={}
    topOfFile={}
    flag=[0]*countFile
    data=defaultdict(list)
    heap=[]
    countFinalFile=0
    for i in xrange(countFile):
        fileName = pathOfFolder+'\\index'+str(i)+'.txt.bz2'
        indexFile[i]= bz2.BZ2File(fileName, 'rb')
        flag[i]=1
        topOfFile[i]=indexFile[i].readline().strip()
        listOfWords[i] = topOfFile[i].split(' ')
        if listOfWords[i][0] not in heap:
            heapq.heappush(heap, listOfWords[i][0])

    while any(flag)==1:
        temp = heapq.heappop(heap)
        for i in xrange(countFile):
            if flag[i]==1:
                if listOfWords[i][0]==temp:

                    # This is where I am stuck. I cannot wait until memory
                    # error, as I need to do some postprocessing too.
                    try:
                        data[temp].extend(listOfWords[i][1:])
                    except MemoryError:
                        writeFinalIndex(data, countFinalFile, pathOfFolder)
                        data=defaultdict(list)
                        countFinalFile+=1

                    topOfFile[i]=indexFile[i].readline().strip()
                    if topOfFile[i]=='':
                        flag[i]=0
                        indexFile[i].close()
                        os.remove(pathOfFolder+'\\index'+str(i)+'.txt.bz2')
                    else:
                        listOfWords[i] = topOfFile[i].split(' ')
                        if listOfWords[i][0] not in heap:
                            heapq.heappush(heap, listOfWords[i][0])
    writeFinalIndex(data, countFinalFile, pathOfFolder)

countFile is the number of files and the writeFinalIndex method writes into the file.
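
For comparison, a sketch of how the same word-by-word merge might look as a streaming generator, so that only one word's posting list is held in memory at a time. This is not the original code: merge_postings and parse are hypothetical names, the line format "word id id ..." is assumed from the description above, and the key argument to heapq.merge needs Python 3.5 or later:

```python
import heapq
from itertools import groupby

def parse(run):
    """Split each non-empty 'word id id ...' line into a list of fields."""
    for line in run:
        fields = line.split()
        if fields:
            yield fields

def merge_postings(*runs):
    """Yield (word, doc_ids) from several runs sorted by word, combining the
    posting lists of a word that occurs in more than one run."""
    merged = heapq.merge(*[parse(run) for run in runs],
                         key=lambda fields: fields[0])
    for word, group in groupby(merged, key=lambda fields: fields[0]):
        doc_ids = []
        for fields in group:
            doc_ids.extend(fields[1:])
        yield word, doc_ids

run_a = ['apple 1 4\n', 'cat 2\n']   # made-up sample runs
run_b = ['bird 3\n', 'cat 7\n']
result = list(merge_postings(run_a, run_b))
print(result)
```

Because each (word, doc_ids) pair is produced as soon as it is complete, the caller can write it out immediately and keep a running count of bytes written, rather than waiting for a MemoryError.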
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Finding size of Variable

2014-02-04 Thread Ayushi Dalmia
On Wednesday, February 5, 2014 12:59:46 AM UTC+5:30, Tim Chase wrote:
 On 2014-02-04 14:21, Dave Angel wrote:
 
  To get the total size of a list of strings,  try (untested):
 
  
 
  a = sys.getsizeof (mylist )
 
  for item in mylist:
 
  a += sys.getsizeof (item)
 
 
 
 I always find this sort of accumulation weird (well, at least in
 
 Python; it's the *only* way in many other languages) and would write
 
 it as
 
 
 
   a = getsizeof(mylist) + sum(getsizeof(item) for item in mylist)
 
 
 
 -tkc

This also doesn't give the true size. I did the following:

import sys
data=[]
f=open('stopWords.txt','r')

for line in f:
    line=line.split()
    data.extend(line)

print sys.getsizeof(data)

where stopWords.txt is a file of size 4KB
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Finding size of Variable

2014-02-04 Thread Rustom Mody
On Wednesday, February 5, 2014 11:05:05 AM UTC+5:30, Ayushi Dalmia wrote:
 This also doesn't gives the true size. I did the following:

 import sys
 data=[]
 f=open('stopWords.txt','r')

 for line in f:
 line=line.split()
 data.extend(line)

 print sys.getsizeof(data)

 where stopWords.txt is a file of size 4KB

Try getsizeof("".join(data))

General advice:
- You have been recommended (by Chris??) to use a database
- You say you can't use a database (for whatever reason)

Now the fact is you NEED database (functionality).
How to escape this catch-22 situation?
In computer science it's called, somewhat sardonically, Greenspun's 10th rule.

And the best way out is to

1 isolate those aspects of database functionality you need
2 temporarily forget about your original problem and implement the
(subset of) DBMS functionality you need
3 Use 2 above to implement 1
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Finding size of Variable

2014-02-04 Thread Ayushi Dalmia
On Wednesday, February 5, 2014 11:15:09 AM UTC+5:30, Rustom Mody wrote:
 On Wednesday, February 5, 2014 11:05:05 AM UTC+5:30, Ayushi Dalmia wrote:
 
  This also doesn't gives the true size. I did the following:
 
 
 
  import sys
 
  data=[]
 
  f=open('stopWords.txt','r')
 
 
 
  for line in f:
 
  line=line.split()
 
  data.extend(line)
 
 
 
  print sys.getsizeof(data)
 
 
 
  where stopWords.txt is a file of size 4KB
 
 
 
 Try getsizeof(.join(data))
 
 
 
 General advice:
 
 - You have been recommended (by Chris??) that you should use a database
 
 - You say you cant use a database (for whatever reason)
 
 
 
 Now the fact is you NEED database (functionality)
 
 How to escape this catch-22 situation?
 
 In computer science its called somewhat sardonically Greenspun's 10th rule
 
 
 
 And the best way out is to 
 
 
 
 1 isolate those aspects of database functionality you need 
 
 2 temporarily forget about your original problem and implement the dbms
 
 (subset of) DBMS functionality you need
 
 3 Use 2 above to implement 1

Hello Rustom,

Thanks for the enlightenment. I did not know about Greenspun's Tenth Rule.
It is interesting to know that. However, it is an academic project and not a
research one. Hence I do not have the liberty to choose what to work with. Life
is easier with databases though, but I am not allowed to use them. Thanks for
the tip. I will try to replicate that functionality.
-- 
https://mail.python.org/mailman/listinfo/python-list