Re: [Python-Dev] Memory Error while reading large file

2008-07-31 Thread Steven D'Aprano
On Thu, 31 Jul 2008 03:01:42 pm Sumant Gupta wrote:
> Hi
>
> I have a problem reading very large text file.
> When I call the len function to get the total number of lines in a
> file, I get a memory error. I am reading a list of files in a loop;
> two files are read properly, but when the third file is read, it
> gives a memory error.

I'm not completely sure, but I think that means you're out of memory.

If you have an actual question, I think you would be better off posting 
to the comp.lang.python newsgroup. This mailing list is for development 
of the Python compiler, not for writing Python programs.

I'll save you some time: when you post to comp.lang.python, you should 
post the actual error message you get, and the code that fails.


-- 
Steven
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] urllib.quote and unquote - Unicode issues

2008-07-31 Thread Bill Janssen
> Guido says:
> 
> > Actually, we'd need to look at the various other APIs in Py3k before we can
> > decide whether these should be considered taking or returning bytes or text.
> > It looks like all other APIs in the Py3k version of urllib treat URLs as
> > text.
> 
> 
> Yes, as I said in the bug tracker, I've groveled over the entire stdlib to
> see how my patch affects the behaviour of dependent code. Aside from a few
> minor bits which assumed octets (and did their own encoding/decoding) (which
> I fixed), all the code assumes strings and is very happy to go on assuming
> this, as long as the URIs are encoded with UTF-8, which they almost
> certainly are.

I'm not sure that's sufficient review, though I agree it's necessary.
The major consumers of quote/unquote are not in the Python standard
library.

> (quote will accept either type, while
> unquote will output a str, there will be a new function unquote_to_bytes
> which outputs a bytes - is everyone happy with that?)

No, so don't ask.

Bill


Re: [Python-Dev] urllib.quote and unquote - Unicode issues

2008-07-31 Thread Bill Janssen
> Of course, it's un-Pythonic to enforce pedantry, and we pedants can
> use a string->string encoder correctly.

Sure.  All I was asking was that we not break the existing usage of
the standard library "unquote" by producing a string by *assuming* a
UTF-8 encoded string is what's in those percent-encoded bytes (instead
of, say, ISO 2022-JP).  Let the "new" function produce a string:
"unquote_as_string".

>  > You really want me to remove the encoding= named argument? And hard-code
>  > UTF-8 into these functions?
> 
> A quoting function that accepts bytes *must* have an encoding
> argument.

Huh?  What would it use it for?  The string, if string it is, is
already encoded as octets.  All it needs to do is percent-encode the
reserved octets.  So far as I can see, the "unquote_as_string" is the
function that needs the encoding.  Ah, it's too late, I'll pick this
up tomorrow.

Bill
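
The concern about *assuming* UTF-8 can be illustrated with the functions as
they exist in current Python 3 releases of urllib.parse, which may differ in
detail from the patch under discussion. A minimal sketch, assuming the
percent-encoded octets are ISO-2022-JP rather than UTF-8 (the sample text is
arbitrary):

```python
from urllib.parse import quote_from_bytes, unquote, unquote_to_bytes

text = "漢字"
raw = text.encode("iso2022_jp")   # octets in a non-UTF-8 encoding
quoted = quote_from_bytes(raw)    # quoting bytes needs no encoding argument

# Unquoting to bytes is lossless whatever the original encoding was.
assert unquote_to_bytes(quoted) == raw

# Unquoting to str only round-trips if the caller names the right encoding.
assert unquote(quoted, encoding="iso2022_jp") == text

# With the default UTF-8 guess the call succeeds but returns different text.
assert unquote(quoted) != text
```

Only the str-producing direction needs an encoding; the bytes-in and
bytes-out directions do not.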


Re: [Python-Dev] urllib.quote and unquote - Unicode issues

2008-07-31 Thread Bill Janssen
Also see .

Bill


Re: [Python-Dev] urllib.quote and unquote - Unicode issues

2008-07-31 Thread Stephen J. Turnbull
Bill Janssen writes:

 > > A quoting function that accepts bytes *must* have an encoding
 > > argument.
 > 
 > Huh?  What would it use it for?

Ah, you're right.  I was thinking in terms of an URI builder, where the
quoter would do any required conversion (eg, if the bytes represented
a string in Japanese) to another (possibly scheme-mandated) encoding
(typically UTF-8).  But that doesn't really make sense; the URI
builder should know what to do, and that's a better place to do such
conversions.



Re: [Python-Dev] urllib.quote and unquote - Unicode issues

2008-07-31 Thread Matt Giuca
Alright, I've uploaded the new patch which adds the two requested
bytes-oriented functions, as well as accompanying docs and tests.
http://bugs.python.org/issue3300
http://bugs.python.org/file11009/parse.py.patch6

> I'd rather have two pairs of functions, so that those who want to give
> the readers of their code a clue can do so. I'm not opposed to having
> redundant functions that accept either string or bytes though, unless
> others prefer not to.

Yes, I was in a similar mindset. So the way I've implemented it, quote
accepts either a bytes or a str. Then there's a new function
quote_from_bytes, which is defined precisely like this:

    quote_from_bytes = quote

So either name can be used on either input type, with the idea being that
you should use quote on a str, and quote_from_bytes on a bytes. Is this a
good idea or should it be rewritten so each function permits only one input
type?

> Sorry, I have yet to look at the tracker (only so many minutes in a day...).


Ah, I didn't mean offense. Just that one could read the sordid details of my
investigation on the tracker if one so desired ;)

> I don't mind an encoding argument, as long as it isn't used to change
> the return type (as Bill was proposing).


Yeah, my unquote always outputs a str, and unquote_to_bytes always outputs a
bytes.

Matt
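
For reference, the split between the str-returning and bytes-returning
functions can be sketched with urllib.parse as it behaves in current Python 3
releases. This is an illustration only; patch 6 may have differed in detail
(for instance, current releases make quote_from_bytes reject str input).

```python
from urllib.parse import quote, quote_from_bytes, unquote, unquote_to_bytes

# quote accepts either text or bytes and always returns a str.
assert quote("漢字") == "%E6%BC%A2%E5%AD%97"   # str is encoded as UTF-8 by default
assert quote(b"\xe6\xbc\xa2") == "%E6%BC%A2"

# quote_from_bytes is the bytes-oriented spelling.
assert quote_from_bytes(b"a b") == "a%20b"

# unquote always returns a str; unquote_to_bytes always returns bytes.
assert unquote("%E6%BC%A2%E5%AD%97") == "漢字"
assert unquote_to_bytes("%E6%BC%A2") == b"\xe6\xbc\xa2"
```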


Re: [Python-Dev] urllib.quote and unquote - Unicode issues

2008-07-31 Thread Matt Giuca
Bill wrote:

> I'm not sure that's sufficient review, though I agree it's necessary.
> The major consumers of quote/unquote are not in the Python standard
> library.


I figured that Python 3.0 is designed to fix things, with breaking
third-party code being an acceptable side-effect of that. So the most
important thing when 3.0 is released is that the stdlib is internally
consistent. All other code is "allowed" to be broken. So I've investigated
all the code necessary.

Having said this, my patch breaks almost no code. Your suggestion breaks a
hell of a lot.

> Sure.  All I was asking was that we not break the existing usage of
> the standard library "unquote" by producing a string by *assuming* a
> UTF-8 encoded string is what's in those percent-encoded bytes (instead
> of, say, ISO 2022-JP).  Let the "new" function produce a string:
> "unquote_as_string".


You're assuming that a Python 2.x "str" is the same thing as a Python 3.0
"bytes". It isn't. (If it was, this transition would be trivial). A Python 2
"str" is a non-Unicode string. It can be printed, concatenated with Unicode
strings, etc etc. It has the semantics of a string. The Python 3.0 "bytes"
is not a string at all.

What you're saying is "the old behaviour was to output a bytes, so the new
behaviour should be consistent". But that isn't true - the old behaviour was
to output a string (a non-Unicode one). People, and code, expect it to
output something with string semantics. So making unquote output a bytes is
just as big a change as making it output a (unicode) str. Python 3.0 doesn't
have a type which is like Python 2's "str" type (which is good - that type
was very messy). So the argument that "Python 2 unquote outputs a bytes, so
we should too" is not legitimate.



If you want to keep pushing this, please install my new patch (patch 6).
Then rename "unquote" to "unquote_to_string" and rename "unquote_to_bytes"
to "unquote", and witness the havoc that ensues. Firstly, you break most
Internet-related modules in the standard library.

10 tests failed:
    test_SimpleHTTPServer test_cgi test_email test_http_cookiejar
    test_httpservers test_robotparser test_urllib test_urllib2
    test_urllib2_localnet test_wsgiref

Fixing these isn't a matter of changing test cases (which all but one of my
fixes were). It would require changes to all the modules, to get them to
deal with bytes instead of strings (which would generally mean spraying
.decode("utf-8") all over the place). My code, on the other hand, "tends to
be" compatible with 2.x code.

Here I'm seeing:
BytesWarning: Comparison between bytes and string.
TypeError: expected an object with the buffer interface
http.client.BadStatusLine

For another example, try this:

>>> import http.server
>>> s = http.server.HTTPServer(('', 8000),
...                            http.server.SimpleHTTPRequestHandler)
>>> s.serve_forever()

The current (unpatched) build works, but links to files with non-ASCII
filenames (eg. '漢字') break, because of the URL. This is one example of my
patch directly fixing a bug in real code. With my patch applied, the links
work fine *because URL quoting and unquoting are consistent, and work on all
Unicode characters*.

If you change unquote to output a bytes, it breaks completely. You get a
"TypeError: expected an object with the buffer interface" as soon as the
user visits the page.

Matt
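
The kind of breakage described above can be sketched without starting a
server. This is a simplified illustration, not the actual http.server code:
url_to_relative_path is a hypothetical helper and "srv" an arbitrary base
directory, and the functions are those of urllib.parse in current Python 3
releases.

```python
import posixpath
from urllib.parse import quote, unquote, unquote_to_bytes

def url_to_relative_path(url_path):
    # Hypothetical helper: undo percent-encoding, then join the result
    # onto a text base directory, as string-based path code would.
    return posixpath.join("srv", unquote(url_path).lstrip("/"))

link = "/" + quote("漢字.txt")

# With a str-returning unquote, quoting and unquoting round-trip cleanly.
assert url_to_relative_path(link) == "srv/漢字.txt"

# With bytes output, the same string-based code fails on the type mismatch.
try:
    posixpath.join("srv", unquote_to_bytes(link).lstrip(b"/"))
except TypeError:
    mixing_failed = True
else:
    mixing_failed = False
assert mixing_failed
```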


Re: [Python-Dev] Matrix product

2008-07-31 Thread Nick Coghlan

Sebastien Loisel wrote:
> Dear Raymond,
>
> Thank you for your email.
>
>> I think much of this thread is a repeat of conversations
>> that were held for PEP 225:
>> http://www.python.org/dev/peps/pep-0225/
>>
>> That PEP is marked as deferred.  Maybe it's time to
>> bring it back to life.
>
> This is a much better PEP than the one I had found, and would solve
> all of the numpy problems. The PEP is very well thought-out.


A very interesting read! I wouldn't support some of the more exotic 
elements tacked on to the end (particularly the replacement of the now 
thoroughly entrenched bitwise operators), but the basic idea of 
providing ~op variants of several operators seems fairly sound. I'd be 
somewhat inclined to add ~not, ~and and ~or to the list  even though 
that would pretty much force the semantics to be elementwise for the ~ 
variants (since the standard not, and and or are always objectwise and 
without PEP 335 there's no way for an object to change that).


Cheers,
Nick.
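
The distinction between overloadable operators and the objectwise not, and
and or can be sketched as follows. Vec is a hypothetical container written
only for this illustration; it is not numpy and not part of PEP 225.

```python
class Vec:
    """Hypothetical elementwise container, for illustration only."""

    def __init__(self, items):
        self.items = list(items)

    def __and__(self, other):
        # The & operator can be overloaded, so it can act elementwise.
        return Vec(x and y for x, y in zip(self.items, other.items))

    def __invert__(self):
        # Unary ~ can be overloaded as well.
        return Vec(not x for x in self.items)


a = Vec([True, False, True])
b = Vec([True, True, False])

assert (a & b).items == [True, False, False]
assert (~a).items == [False, True, False]

# "and" and "not" only consult the truth value of the whole object;
# there is no special method that lets Vec make them elementwise.
assert (a and b) is b
assert (not a) is False
```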


Re: [Python-Dev] urllib.quote and unquote - Unicode issues

2008-07-31 Thread Jeff Hall
> quote_from_bytes = quote
>
> So either name can be used on either input type, with the idea being that
> you should use quote on a str, and quote_from_bytes on a bytes. Is this a
> good idea or should it be rewritten so each function permits only one input
> type?

So you can use quote_from_bytes on strings? I assumed Guido meant it was
okay to have quote accept string/byte input and have a function that was
redundant but limited in what it accepted (i.e. quote_from_bytes accepts
only bytes).

I suppose your implementation doesn't break anything... it just strikes me
as "odd"


Re: [Python-Dev] urllib.quote and unquote - Unicode issues

2008-07-31 Thread Matt Giuca
> so you can use quote_from_bytes on strings?

Yes, currently.

> I assumed Guido meant it was okay to have quote accept string/byte input and 
> have a function that was redundant but limited in what it accepted (i.e. 
> quote_from_bytes accepts only bytes)
>
> I suppose your implementation doesn't break anything... it just strikes me as 
> "odd"

Yeah. I get exactly what you mean. Worse is it takes an
encoding/replace argument.

I'm in two minds about whether it should allow this or not. On one
hand, it kind of goes with the Python philosophy of not artificially
restricting the allowed types. And it avoids redundancy in the code.

But I'd be quite happy to let quote_from_bytes restrict its input to
just bytes, to avoid confusion. It would basically be a
slightly-modified version of quote:

def quote_from_bytes(s, safe = '/'):
    if isinstance(safe, str):
        safe = safe.encode('ascii', 'ignore')
    cachekey = (safe, always_safe)
    if not isinstance(s, bytes) or isinstance(s, bytearray):
        raise TypeError("quote_from_bytes() expected a bytes")
    try:
        quoter = _safe_quoters[cachekey]
    except KeyError:
        quoter = Quoter(safe)
        _safe_quoters[cachekey] = quoter
    res = map(quoter, s)
    return ''.join(res)

(Passes test suite).

I think I'm happier with this option. But the "if not isinstance(s,
bytes) or isinstance(s, bytearray)" is not very nice.
(The only difference to quote besides the missing arguments is the two
lines beginning "if not isinstance". Maybe we can generalise the rest
of the function).
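
One way to tidy the type check is to pass a tuple to isinstance. Note that,
because "not" binds tighter than "or", the check as written above also rejects
bytearray; the sketch below assumes the intent was to accept both bytes and
bytearray. It is a simplified stand-in: quote_from_bytes_strict and
_ALWAYS_SAFE are hypothetical names, and the Quoter cache from the patch is
omitted.

```python
import string

# Octets that are never percent-encoded (unreserved characters).
_ALWAYS_SAFE = frozenset((string.ascii_letters + string.digits + "_.-~").encode("ascii"))

def quote_from_bytes_strict(bs, safe="/"):
    # A tuple of types replaces the chained isinstance calls.
    if not isinstance(bs, (bytes, bytearray)):
        raise TypeError("quote_from_bytes() expected bytes")
    if isinstance(safe, str):
        safe = safe.encode("ascii", "ignore")
    allowed = _ALWAYS_SAFE | frozenset(safe)
    # Iterating over bytes yields ints; keep safe octets, escape the rest.
    return "".join(chr(b) if b in allowed else "%{:02X}".format(b) for b in bs)


assert quote_from_bytes_strict(b"a b/\xe6") == "a%20b/%E6"
assert quote_from_bytes_strict(bytearray(b"a b")) == "a%20b"
```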


Re: [Python-Dev] Matrix product

2008-07-31 Thread Cesare Di Mauro
Nick Coghlan wrote:

> Sebastien Loisel wrote:
>> Dear Raymond,
>>
>> Thank you for your email.
>>
>>> I think much of this thread is a repeat of conversations
>>> that were held for PEP 225:
>>> http://www.python.org/dev/peps/pep-0225/
>>>
>>> That PEP is marked as deferred.  Maybe it's time to
>>> bring it back to life.
>>
>> This is a much better PEP than the one I had found, and would solve
>> all of the numpy problems. The PEP is very well thought-out.
>
> A very interesting read! I wouldn't support some of the more exotic
> elements tacked on to the end (particularly the replacement of the now
> thoroughly entrenched bitwise operators), but the basic idea of
> providing ~op variants of several operators seems fairly sound. I'd be
> somewhat inclined to add ~not, ~and and ~or to the list  even though
> that would pretty much force the semantics to be elementwise for the ~
> variants (since the standard not, and and or are always objectwise and
> without PEP 335 there's no way for an object to change that).
>
> Cheers,
> Nick.

I agree: adding ~op will be very interesting.

For example, we can easily provide case-insensitive comparisons for strings:

if foo ~== 'Spam':
  print "It's spam!"

equivalent to:

if foo.upper() == 'SPAM':
  print "It's spam!"

We can save both the CPU time and the memory needed to build
a brand new string that will be discarded after the comparison...

It would also be useful to redefine the /, // and ** operators to do some
common operations:

'spam, egg' / ', '  could be equivalent to iter('spam, egg'.split(', '))  # Generates an iterator

'spam, egg' // ', '  could be equivalent to 'spam, egg'.split(', ')  # Generates a list

and ', ' ** ('spam', 'egg') could be equivalent to ', '.join(('spam', 'egg'))

but unfortunately we know that at the moment built-in types
cannot be "extended" through "monkey patching"...

Cesare
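
Although the built-in str cannot be patched, a subclass can carry these
operators. A minimal sketch in Python 3, where S and same_text are
hypothetical names chosen for this illustration; the proposed ~== syntax does
not exist, so an ordinary method stands in for it.

```python
class S(str):
    """Hypothetical str subclass; names and operators are illustrative only."""

    def __truediv__(self, sep):
        # text / sep: an iterator over the split parts
        return iter(self.split(sep))

    def __floordiv__(self, sep):
        # text // sep: a list of the split parts
        return self.split(sep)

    def __pow__(self, parts, modulo=None):
        # sep ** parts: join the parts with this string
        return self.join(parts)

    def same_text(self, other):
        # Stand-in for the proposed ~== comparison.
        return self.casefold() == other.casefold()


assert list(S("spam, egg") / ", ") == ["spam", "egg"]
assert S("spam, egg") // ", " == ["spam", "egg"]
assert S(", ") ** ("spam", "egg") == "spam, egg"
assert S("spam").same_text("Spam")
```

Unlike the proposed operator, same_text still builds temporary strings, so it
does not deliver the CPU and memory savings mentioned above.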


Re: [Python-Dev] critical issues for 2.6 and 3.0

2008-07-31 Thread Martin v. Löwis
> I've never been through a Python release before, but I find these
> statistics rather worrying if we want to make the October release
> date.

I don't worry. Every Python release had bugs, and there will be
2.6.1 and 3.0.1 releases.

The only sure way to resolve bugs is to revert features. If a certain
module is cause of too many serious bugs, it should be dropped from
the release (perhaps not from the source repository - just removed
from all build processes).

Regards,
Martin


Re: [Python-Dev] Memory Error while reading large file

2008-07-31 Thread Martin v. Löwis
> If you have an actual question

I'd like to stress this point as well. Any good posting one
wants an answer to must include a question, and that question
must be explicitly phrased, and terminated with a question
mark.

(maybe the use of the question mark is more typical in German   
than in English; my stomach turns around when I read a question
that ends with a full stop)

Regards,
Martin


Re: [Python-Dev] Memory Error while reading large file

2008-07-31 Thread Scott Dial

Martin v. Löwis wrote:
> (maybe the use of the question mark is more typical in German
> than in English; my stomach turns around when I read a question
> that ends with a full stop)


There is no loss in translation here. Proper English requires the use of 
a question mark just the same as German, but you can't assume proper 
English will be used on a forum of communication like this one. The OP 
stated his problem, and maybe he doesn't know enough English to actually 
ask his question (I'm guessing by the name "Sumant Gupta"). I don't 
believe you are a native speaker yourself, and I would've expected more 
sympathy from you. Lord knows I hope the recipients of any German I 
write will have some.


-Scott

--
Scott Dial
[EMAIL PROTECTED]
[EMAIL PROTECTED]


Re: [Python-Dev] Memory Error while reading large file

2008-07-31 Thread Guido van Rossum
On Thu, Jul 31, 2008 at 2:38 PM, Scott Dial
<[EMAIL PROTECTED]> wrote:
> Martin v. Löwis wrote:
>>
>> (maybe the use of the question mark is more typical in German
>> than in English; my stomach turns around when I read a question
>> that ends with a full stop)
>
> There is no loss in translation here. Proper English requires the use of a
> question mark just the same as German, but you can't assume proper English
> will be used on a forum of communication like this one. The OP stated his
> problem, and maybe he doesn't know enough English to actually ask his
> question (I'm guessing by the name "Sumant Gupta"). I don't believe you are
> a native speaker yourself, and I would've expected more sympathy from you.
> Lord knows I hope the recipients of any German I write will have some.

On the level of mercy: (a) this is python-dev, which is explicitly
*not* for user questions; (b) the OP didn't show any actual code nor
error messages, which makes it impossible to help him unless you're
clairvoyant. Unfortunately we get quite a few such ill-defined
problems on this list, despite it being the wrong list, and my own
patience wears thin at times too. (I also get quite a bit of personal
mail of the same nature.)

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)


Re: [Python-Dev] Memory Error while reading large file

2008-07-31 Thread Greg Ewing

Martin v. Löwis wrote:
> (maybe the use of the question mark is more typical in German
> than in English; my stomach turns around when I read a question
> that ends with a full stop)


No, it's required in English, too.

--
Greg


[Python-Dev] Looking for the email addresses of some committers

2008-07-31 Thread Brett Cannon
If someone can email me the addresses for the following committers, I
would appreciate it:

* Greg Stein
* Jackilyn Hoxworth
* Jeff Senn
* John Benediktsson
* Mateusz Rukowicz
* Richard Emslie
* Roy Smith

And if any of the above people no longer want commit privileges or
were only given them temporarily (e.g., GSoC) that would be helpful as
well.

-Brett