Re: socket send O(N**2) complexity

2009-09-22 Thread Antoine Pitrou
exarkun at twistedmatrix.com writes:
 
 To the OP, you can get view-like behavior with the buffer builtin. 

And, on Python 3 (or even the 2.7 in development), you can use the memoryview
builtin for similar effect.

Regards

Antoine.


-- 
http://mail.python.org/mailman/listinfo/python-list


Re: socket send O(N**2) complexity

2009-09-22 Thread Nobody
On Mon, 21 Sep 2009 16:33:08 -0400, Jack Diederich wrote:

 AIUI, as a python string is imutable, a slice of a string is a
 new string which points (C char *) to the start of the slice data
 and with a length that is the length of the slice, about 8 bytes
 on 32 bit machine.
 
 Not in CPython.  While some special strings are re-used (empty string,
 single letters) if you take a slice of an existing string a new buffer
 is allocated and the slice memcpy'd into it.

Er, why?

I can understand doing this for mutable sequences, but it doesn't seem
to make much sense for strings.


-- 
http://mail.python.org/mailman/listinfo/python-list


Re: socket send O(N**2) complexity

2009-09-22 Thread Jack Diederich
On Tue, Sep 22, 2009 at 7:25 PM, Nobody nob...@nowhere.com wrote:
 On Mon, 21 Sep 2009 16:33:08 -0400, Jack Diederich wrote:

 AIUI, as a python string is imutable, a slice of a string is a
 new string which points (C char *) to the start of the slice data
 and with a length that is the length of the slice, about 8 bytes
 on 32 bit machine.

 Not in CPython.  While some special strings are re-used (empty string,
 single letters) if you take a slice of an existing string a new buffer
 is allocated and the slice memcpy'd into it.

 Er, why?

 I can understand doing this for mutable sequences, but it doesn't seem
 to make much sense for strings.

Because it adds large amounts of complexity.  If you have a string
that points to 8 bytes in a 1Gig string and the giant string goes out
of scope, what do you do?  You can keep it alive, you can copy out 8
bytes; you can do one of those things now or you can wait until later
when some kind of garbage collection happens.   Once you have that
system why not pile ropes on top of it?  and once you have ropes why
not just make strings mutable ...

If you want much longer mailing list threads about the topic search
the python-dev archives.

-Jack
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: socket send O(N**2) complexity

2009-09-22 Thread Steven D'Aprano
On Wed, 23 Sep 2009 00:25:52 +0100, Nobody wrote:

 On Mon, 21 Sep 2009 16:33:08 -0400, Jack Diederich wrote:
 
 AIUI, as a python string is imutable, a slice of a string is a new
 string which points (C char *) to the start of the slice data and with
 a length that is the length of the slice, about 8 bytes on 32 bit
 machine.
 
 Not in CPython.  While some special strings are re-used (empty string,
 single letters) if you take a slice of an existing string a new buffer
 is allocated and the slice memcpy'd into it.
 
 Er, why?
 
 I can understand doing this for mutable sequences, but it doesn't seem
 to make much sense for strings.

Consider:

huge_string = abcdef*1000*1000*1000
tiny_string = huge_string[42:45]
del huge_string

Under the current behaviour, huge_string will be garbage collected. With 
the proposed string-view, it won't be. It would be surprising and 
disturbing if taking a tiny slice of a huge string prohibited the huge 
string from being garbage collected.


-- 
Steven
-- 
http://mail.python.org/mailman/listinfo/python-list


socket send O(N**2) complexity

2009-09-21 Thread Zac Burns
The mysocket.mysend method given at
http://docs.python.org/howto/sockets.html has an (unwitting?) O(N**2)
complexity for long msg due to the string slicing.

I've been looking for a way to optimize this, but aside from a pure
python 'string slice view' that looks at the original string I can't
think of anything. Perhaps start and end keywords could be added to
send? I can't think of a reason for the end keyword,  but it would be
there for symmetry.

--
Zachary Burns
(407)590-4814
Aim - Zac256FL
Production Engineer (Digital Overlord)
Zindagi Games
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: socket send O(N**2) complexity

2009-09-21 Thread Mike
On Sep 21, 2:03 pm, Zac Burns zac...@gmail.com wrote:
 The mysocket.mysend method given 
 athttp://docs.python.org/howto/sockets.htmlhas an (unwitting?) O(N**2)
 complexity for long msg due to the string slicing.

 I've been looking for a way to optimize this, but aside from a pure
 python 'string slice view' that looks at the original string I can't
 think of anything. Perhaps start and end keywords could be added to
 send? I can't think of a reason for the end keyword,  but it would be
 there for symmetry.

for ch in msg: add to current buffer or start another buffer
for buf in buffers: send(...buf)
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: socket send O(N**2) complexity

2009-09-21 Thread Rob Williscroft
Zac Burns wrote in news:mailman.211.1253559803.2807.python-l...@python.org 
in comp.lang.python:

 The mysocket.mysend method given at
 http://docs.python.org/howto/sockets.html has an (unwitting?) O(N**2)
 complexity for long msg due to the string slicing.
 
 I've been looking for a way to optimize this, but aside from a pure
 python 'string slice view' that looks at the original string I can't
 think of anything. Perhaps start and end keywords could be added to
 send? I can't think of a reason for the end keyword,  but it would be
 there for symmetry.
 

I ran this script on various versions of python I have access to:

#encoding: utf-8
raw_input( start )

s = 'x' * 100
r = [None] * 1000

raw_input( allocated 1 meg +  )

for i in xrange(1000):
  r[i] = s[:]

raw_input( end )

Niether of the CPython versions (2.5 and 3.0 (with modified code))
exibited any memory increase between allocated 1 meg +  and end

pypy-c (1.0.0) showed a 30k jump, and IronPython 2.0 showed a few megs
jump.

AIUI, as a python string is imutable, a slice of a string is a
new string which points (C char *) to the start of the slice data
and with a length that is the length of the slice, about 8 bytes
on 32 bit machine.

So even though a slice assignment new_s = s[:] appears to a python
programmer to make a copy of s, its only the a few bytes of 
metadata (the pointer and the length) that is really copied, the 
strings character data stays where it is.

So the code you cite is in fact O(N) as the copy is constant size.

Rob.
-- 
http://www.victim-prime.dsl.pipex.com/
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: socket send O(N**2) complexity

2009-09-21 Thread exarkun

On 08:00 pm, r...@freenet.co.uk wrote:
Zac Burns wrote in news:mailman.211.1253559803.2807.python- 
l...@python.org

in comp.lang.python:

The mysocket.mysend method given at
http://docs.python.org/howto/sockets.html has an (unwitting?) O(N**2)
complexity for long msg due to the string slicing.

I've been looking for a way to optimize this, but aside from a pure
python 'string slice view' that looks at the original string I can't
think of anything. Perhaps start and end keywords could be added to
send? I can't think of a reason for the end keyword,  but it would be
there for symmetry.


I ran this script on various versions of python I have access to:

#encoding: utf-8
raw_input( start )

s = 'x' * 100
r = [None] * 1000

raw_input( allocated 1 meg +  )

for i in xrange(1000):
 r[i] = s[:]

raw_input( end )

Niether of the CPython versions (2.5 and 3.0 (with modified code))
exibited any memory increase between allocated 1 meg +  and end


You bumped into a special case that CPython optimizes.  s[:] is s.  If 
you repeat your test with s[1:], you'll see memory climb as one might 
normally expect.

pypy-c (1.0.0) showed a 30k jump, and IronPython 2.0 showed a few megs
jump.

AIUI, as a python string is imutable, a slice of a string is a
new string which points (C char *) to the start of the slice data
and with a length that is the length of the slice, about 8 bytes
on 32 bit machine.

So even though a slice assignment new_s = s[:] appears to a python
programmer to make a copy of s, its only the a few bytes of
metadata (the pointer and the length) that is really copied, the
strings character data stays where it is.

So the code you cite is in fact O(N) as the copy is constant size.


This all (basically) valid for the special case of s[:].  For any other 
string slicing, though, the behavior is indeed O(N**2).


To the OP, you can get view-like behavior with the buffer builtin. 
Here's an example of its usage from Twisted, where it is employed for 
exactly the reason raised here:


http://twistedmatrix.com/trac/browser/trunk/twisted/internet/abstract.py#L93

Jean-Paul
--
http://mail.python.org/mailman/listinfo/python-list


Re: socket send O(N**2) complexity

2009-09-21 Thread Jack Diederich
On Mon, Sep 21, 2009 at 4:00 PM, Rob Williscroft r...@freenet.co.uk wrote:
 AIUI, as a python string is imutable, a slice of a string is a
 new string which points (C char *) to the start of the slice data
 and with a length that is the length of the slice, about 8 bytes
 on 32 bit machine.

Not in CPython.  While some special strings are re-used (empty string,
single letters) if you take a slice of an existing string a new buffer
is allocated and the slice memcpy'd into it.

 So even though a slice assignment new_s = s[:] appears to a python
 programmer to make a copy of s, its only the a few bytes of
 metadata (the pointer and the length) that is really copied, the
 strings character data stays where it is.

There is a special case for copy in CPython that avoids copying the
string bytes, BUT in that case it just INCREFs the existing object and
returns it.  There are no new allocations at all.

Be very careful recommending optimizations based on how you think the
underlying implementation works without double checking with the code
first.

-Jack
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: socket send O(N**2) complexity

2009-09-21 Thread Rob Williscroft
 wrote in news:mailman.216.1253565002.2807.python-l...@python.org in 
comp.lang.python:

Niether of the CPython versions (2.5 and 3.0 (with modified code))
exibited any memory increase between allocated 1 meg +  and end
 
 You bumped into a special case that CPython optimizes.  s[:] is s.  If 
 you repeat your test with s[1:], you'll see memory climb as one might 
 normally expect.
 

Thanks for the correction (to Jack also)

Rob.
-- 
http://www.victim-prime.dsl.pipex.com/
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: socket send O(N**2) complexity

2009-09-21 Thread Zac Burns
On Mon, Sep 21, 2009 at 2:10 PM, Rob Williscroft r...@freenet.co.uk wrote:
  wrote in news:mailman.216.1253565002.2807.python-l...@python.org in
 comp.lang.python:

Niether of the CPython versions (2.5 and 3.0 (with modified code))
exibited any memory increase between allocated 1 meg +  and end

 You bumped into a special case that CPython optimizes.  s[:] is s.  If
 you repeat your test with s[1:], you'll see memory climb as one might
 normally expect.


 Thanks for the correction (to Jack also)

 Rob.
 --
 http://www.victim-prime.dsl.pipex.com/
 --
 http://mail.python.org/mailman/listinfo/python-list


Thanks! The buffer object was exactly the 'slice view' that I was
looking for to fix my version of the algorithm.

--
Zachary Burns
(407)590-4814
Aim - Zac256FL
Production Engineer (Digital Overlord)
Zindagi Games
-- 
http://mail.python.org/mailman/listinfo/python-list