RedNotebook 1.1.2

2010-12-27 Thread Jendrik Seipp

RedNotebook 1.1.2 has been released.

You can get the tarball, the Windows installer and links to distribution 
packages at

http://rednotebook.sourceforge.net/downloads.html


What is RedNotebook?

RedNotebook is a **graphical journal** and diary helping you keep track 
of notes and thoughts. It includes calendar navigation, customizable
templates, export functionality and word clouds. You can also format,
tag and search your entries. RedNotebook is available in the 
repositories of most common Linux distributions and a Windows installer 
is available. It is written in Python and uses GTK+ for its interface.



What's new?
---
* Add fullscreen mode (F11)
* Highlight all found occurrences of the searched word (LP:614353)
* Highlight mixed markups (**__Bold underline__**)
* Highlight structured headers (=Part=, ==Subpart==, ===Section===, 
====Subsection====, =====Subsubsection=====)
* Document structured headers
* Highlight ``, "", ''
* Write documentation about ``, "", ''
* Let the preview and edit button have the same size
* Fix: Correctly highlight lists (LP:622456)
* Fix: Do not set maximized to True when sending RedNotebook to the tray 
(LP:657421)
* Fix: Add Ctrl-P shortcut for edit button (LP:685609)
* Fix: Add \ to the list of ignored chars for word clouds
* Fix: Escape characters before adding results to the search list
* Fix: Local links with whitespace in latex
* Windows: Fix opening linked files
* Windows: Do not center window to prevent alignment issues
* Windows: Fix image preview (LP:663944)
* Internal: Replace tabs by whitespace in source code
* Many translations updated

Cheers,
Jendrik





--
http://mail.python.org/mailman/listinfo/python-announce-list

   Support the Python Software Foundation:
   http://www.python.org/psf/donations/


Pogo 0.3.1

2010-12-27 Thread Jendrik Seipp
I am proud to announce the release of Pogo 0.3.1, probably the simplest 
and fastest audio player for Linux.


You can get the tarball and an Ubuntu deb package at
http://launchpad.net/pogo


What is Pogo?

Pogo plays your music. Nothing else. It tries to be fast and 
easy to use. Pogo's elementary-inspired design uses the screen space 
very efficiently. It is especially well suited for people who organize 
their music by album on the hard drive. The main interface components 
are a directory tree and a playlist that groups albums in an innovative way.
Pogo is a fork of Decibel Audio Player. Supported file formats include 
Ogg Vorbis, MP3, FLAC, Musepack, Wavpack, and MPEG-4 AAC.

Pogo is written in Python and uses GTK+ and gstreamer.


What's new in 0.3.1 "You are a radar detector" (2010-12-26)
==
* When a track is added from nautilus etc., start playback if not already 
playing
* Show info messages when no music directories have been added
* Stop old search when user clears search field or enters new search phrase
* Add search shortcut (Ctrl-F)
* Do not allow adding root or home directory to music directories
* Translations updated

Cheers,
Jendrik







Fw: Re: User input masks - Access Style

2010-12-27 Thread linmq
 On 2010-12-27, flebber  flebber.c...@gmail.com  wrote:

   Is there any way to use input masks in Python? Similar to the function
   found in Access where a user's input is limited to a type, length and
   format.

   So in my case I want to ensure that numbers are saved in a basic
   format.
   1) Currency so input limited to 000.00 eg 1.00, 2.50, 13.80 etc

 Some GUIs provide this functionality or provide callbacks for validation
 functions that can determine the validity of the input. I don't know of
 any modules that provide formatted input in a terminal. Most terminal
 input functions just read from stdin (in this case a buffered line)
 and output that as a string. It is easy enough to validate whether
 terminal input is in the proper format.

 Your example time code might look like:

     import re
     import sys

     # get the input
     print("Please enter time in the format 'MM:SS:HH': ", end="")
     timeInput = input()

     # validate that the input is in the correct format (usually this would
     # be in a loop that continues until the user enters acceptable data)
     if re.match(r'^[0-9]{2}:[0-9]{2}:[0-9]{2}$', timeInput) is None:
         print("I'm sorry, your input is improperly formatted.")
         sys.exit(1)

     # break the input into its components
     components = timeInput.split(":")
     minutes = int(components[0])
     seconds = int(components[1])
     microseconds = int(components[2])

     # output the time
     print("Your time is: " + "%02d" % minutes + ":" + "%02d" % seconds + ":" +
           "%02d" % microseconds)

 Currency works the same way, validating it against:
 r'''[0-9]+\.[0-9]{2}'''

   For sports times that is time duration not a system or date times
   should I assume that I would need to calculate a user input to a
   decimal number and then recalculate it to present it to user?

 I am not sure what you are trying to do or asking. Python provides time,
 date, datetime, and timedelta objects that can be used for date/time
 calculations, locale based formatting, etc. What you use, if any, will
 depend on what you are actually trying to accomplish. Your example doesn't
 really show you doing much with the time so it is difficult giving you any
 concrete recommendations.
 
 yes you are right I should have clarified. The time is a duration over
 distance, so it's a speed measure.  Ultimately I will need to store the
 times so I may need to use something like SQLAlchemy, but I am nowhere
 near that advanced. I know that most DBs (MySQL, PostgreSQL, etc.) don't
 support time as a duration as such, so I will probably need to store
 it as a decimal and convert it back for the user.
 -- 
 http://mail.python.org/mailman/listinfo/python-list

You can let the user enter the days, hours, minutes, etc. separately
and use the timedelta type to store the duration:

datetime.timedelta([days[, seconds[, microseconds[, milliseconds[, minutes[, 
hours[, weeks]]]]]]])

From Python 2.7 on, you can use timedelta.total_seconds() to convert the 
duration to a number for storage in a database, and later restore the
timedelta from that number with timedelta(seconds=?).

Refer to:
http://docs.python.org/library/datetime.html?highlight=timedelta#timedelta-objects
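
The round trip described above can be sketched as follows. This is only an illustration: the helper names duration_to_seconds and seconds_to_duration are made up for this example, and it assumes the sports time is entered as minutes, seconds and hundredths of a second.

```python
from datetime import timedelta


def duration_to_seconds(minutes, seconds, hundredths):
    """Convert a sports time to a plain number of seconds for storage."""
    duration = timedelta(minutes=minutes, seconds=seconds,
                         milliseconds=hundredths * 10)
    return duration.total_seconds()


def seconds_to_duration(stored):
    """Rebuild the timedelta from the number that was stored."""
    return timedelta(seconds=stored)


stored = duration_to_seconds(1, 23, 45)    # 1 minute 23.45 seconds
restored = seconds_to_duration(stored)
print(stored, restored)
```

Storing a single number keeps the database column simple; the timedelta is only rebuilt when the value has to be shown to the user or used in arithmetic.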



Re: round in 2.6 and 2.7

2010-12-27 Thread Mark Dickinson
On Dec 23, 6:57 pm, Hrvoje Niksic hnik...@xemacs.org wrote:
 I stumbled upon this.  Python 2.6:

 >>> round(9.95, 1)
 10.0

 So it seems that Python is going out of its way to intuitively round
 9.95, while the repr retains the unnecessary digits.

No, Python's not doing anything clever here.  Python 2.6 uses a simple
rounding algorithm that frequently gives the wrong answer for halfway
or near-halfway cases.  It's just luck that in this particular case it
gives the apparently-correct (but actually incorrect) answer.
Martin's already explained that the 2.7 behaviour is correct, and
agrees with string formatting.  However, note that there's still a
disconnect between these two operations in Python 2.7:

>>> round(1.25, 1)
1.3
>>> format(1.25, '.1f')
'1.2'

That's because 'round' in Python 2.x (including 2.7) still rounds
exact halfway cases away from zero, while string formatting rounds
them to the value with even last digit.  In Python 3.x, even this
discrepancy is fixed---everything does round-halfway-to-even.
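
For readers on Python 3, the two tie-breaking rules can be compared side by side. This is a small sketch, not part of the original exchange; it uses the decimal module only to make the two rounding modes explicit, and the variable names are chosen for this example.

```python
from decimal import Decimal, ROUND_HALF_EVEN, ROUND_HALF_UP

# 1.25 is exactly representable in binary, so it is a true halfway case.
builtin_result = round(1.25, 1)           # Python 3: ties go to the even digit
formatted_result = format(1.25, '.1f')    # string formatting: same rule

tenth = Decimal('0.1')
# The rule used by string formatting and by round() in Python 3.
half_even = Decimal('1.25').quantize(tenth, rounding=ROUND_HALF_EVEN)
# Ties away from zero, which is what round() does in Python 2.x.
half_up = Decimal('1.25').quantize(tenth, rounding=ROUND_HALF_UP)

print(builtin_result, formatted_result, half_even, half_up)
```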

 Is the change to round() expected?

Expected, and intentional. :-)

[Martin]
 Float-to-string and string-to-float conversions are correctly rounded.
 The round() function is also now correctly rounded.

 Not sure that this is correct English; I think it means that the
 round() function is now correct.

Well, the correct result of the example the OP gave would be 9.9
exactly.  But since 9.9 isn't exactly representable as a Python float,
we necessarily get an approximation.  The language above is intended
to convey that it's the 'correctly rounded' approximation---that is,
the closest Python float to the true value of 9.9 (with halfway cases
rounded to even, as usual).
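
One way to see why 9.9 is the right answer is to look at the exact value stored for 9.95. The sketch below assumes Python 2.7 or 3.x, where Decimal can be constructed directly from a float; the variable names are only for illustration.

```python
from decimal import Decimal

# Exact decimal expansion of the float that is nearest to 9.95.
stored = Decimal(9.95)

# The stored value lies slightly below the halfway point 9.95,
# so the mathematically correct rounding to one place is 9.9.
is_below_halfway = stored < Decimal('9.95')

rounded = round(9.95, 1)

# 9.9 itself is not exactly representable either; the result is the
# closest float to 9.9.
is_exact = Decimal(rounded) == Decimal('9.9')

print(stored, is_below_halfway, rounded, is_exact)
```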

Mark


Re: string u'hyv\xe4' to file as 'hyvä'

2010-12-27 Thread Alex Willmer
On Dec 27, 6:47 am, Mark Tolonen metolone+gm...@gmail.com wrote:
 gintare g.statk...@gmail.com wrote in message
  In file i find 'hyv\xe4' instead of hyvä.

 When you open a file with codecs.open(), it expects Unicode strings to be
 written to the file.  Don't encode them again.  Also, .writelines() expects
 a list of strings.  Use .write():

     import codecs
     item=u'hyv\xe4'
     F=codecs.open('/opt/finnish.txt', 'w+', 'utf8')
     F.write(item)
     F.close()

Gintare, Mark's code is correct. When you are reading the file back
make sure you understand what you are seeing:

>>> F2 = codecs.open('finnish.txt', 'r', 'utf8')
>>> item2 = F2.read()
>>> item2
u'hyv\xe4'

That might look as though item2 is 7 characters long and contains
a backslash followed by x, e, 4. However, item2 is identical to item:
both contain 4 characters, the final one being a-umlaut. Python
has shown the string using a backslash escape because printing a
non-ASCII character might fail. You can see this directly if your Python
session is running in a terminal (or GUI) that can handle non-ASCII
characters:

>>> print item2
hyvä
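
The same point can be checked without relying on the terminal. The following is a Python 3 sketch (the thread itself uses Python 2); it writes to a temporary directory instead of /opt, which is an assumption made so the example runs anywhere.

```python
import io
import os
import tempfile

item = u'hyv\xe4'    # 4 characters, the last one is a-umlaut

# Hypothetical scratch location; the thread used /opt/finnish.txt.
path = os.path.join(tempfile.mkdtemp(), 'finnish.txt')

# The file object encodes on write, so pass it the text itself.
with io.open(path, 'w', encoding='utf8') as f:
    f.write(item)

# Reading back through the same encoding returns the same 4 characters.
with io.open(path, 'r', encoding='utf8') as f:
    item2 = f.read()

# On disk the a-umlaut is stored as the two UTF-8 bytes C3 A4.
with io.open(path, 'rb') as f:
    raw = f.read()

# ascii() shows the escaped form, like the Python 2 repr in the thread.
print(len(item2), ascii(item2), raw)
```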


Interning own classes like strings for speed and size?

2010-12-27 Thread Ulrich Eckhardt
Hi!

I'm trying to solve a computational problem and of course speed and size are 
important there. Apart from picking the right algorithm, I came across an 
idea that could help speed up things and keep memory requirements down. What 
I have is regions described by min and max coordinates. At first, I just 
modeled these as a simple class containing two values for each axis.

In a second step, I derived this class from tuple instead of object. Some 
code then moved from __init__ to __new__ and some code that modified these 
objects had to be changed to replace them instead. The upside to this is 
that they can be used as keys in sets and dicts, which isn't the case for 
mutable types[1].

What I'm now considering is to only allow a single instance of these objects 
for each set of values, similar to interned strings. What I would gain is 
that I could safely compare objects for identity instead of equality. What 
I'm not yet sure is how much overhead that would give me and/or how to keep 
it low. The idea is to store each instance in a set and after creating a new 
object I would first look up an equal object in the global set and return 
that instead, otherwise add the new one.

The problem I foresee is that if I define equality as identity, this lookup 
when creating will never eliminate duplicates. If I only fall back to 
equality comparison for non-identical objects, I would probably sacrifice 
most of the gain. If I build a dict mapping between the values and the 
actual objects, I would have doubled the required memory and uselessly store 
the same values twice there.

Am I looking in the wrong direction? Is there some better approach? Please 
don't tell me to use C, as I'm specifically interested in learning Python, 
I'm pretty sure I could have solved the problem quickly in C++ otherwise. 
Other suggestions?

Cheers!

Uli


[1] Somebody correct me if I'm wrong, but I believe I could have defined a 
hashing function for the type and thus allowed its use in a set or dict, 
right? However, goofing up because you accidentally modified an object and 
changed its hash value is something I don't want to risk anyway.



Re: Fw: Re: User input masks - Access Style

2010-12-27 Thread flebber
On Dec 27, 7:57 pm, linmq li...@neusoft.com wrote:
Very helpful thanks


Re: Interning own classes like strings for speed and size?

2010-12-27 Thread Daniel Fetchinson
I believe what you are looking for is (some variant of) the singleton pattern:

http://en.wikipedia.org/wiki/Singleton_pattern

How it's done in python see http://www.google.com/search?q=python+singleton

Cheers,
Daniel

-- 
Psss, psss, put it down! - http://www.cafepress.com/putitdown


Re: Interning own classes like strings for speed and size?

2010-12-27 Thread Ulrich Eckhardt
Daniel Fetchinson wrote:
 I believe what you are looking for is (some variant of) the singleton
 pattern:
 
 http://en.wikipedia.org/wiki/Singleton_pattern

Actually, no. What I want is the flyweight pattern instead:

http://en.wikipedia.org/wiki/Flyweight_pattern
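
A minimal flyweight sketch for the tuple-based regions discussed in this thread might look like the following. The class name Region, its four fields and the _interned table are assumptions made for illustration. The table maps each instance to itself, so it stores only references and the values are not kept twice; lookups use the value-based hash and equality inherited from tuple, and callers can compare by identity afterwards. Note that the table holds strong references, so interned regions are never freed, and that in CPython tuple subclasses do not support weak references, so a weakref-based table is not an option here.

```python
class Region(tuple):
    """Immutable (xmin, xmax, ymin, ymax) region, one instance per value."""

    __slots__ = ()     # no per-instance __dict__, keeps instances small
    _interned = {}     # maps each region to itself (the canonical instance)

    def __new__(cls, xmin, xmax, ymin, ymax):
        candidate = super().__new__(cls, (xmin, xmax, ymin, ymax))
        # Return the existing equal instance if there is one,
        # otherwise register the candidate and return it.
        return cls._interned.setdefault(candidate, candidate)


a = Region(0, 10, 0, 5)
b = Region(0, 10, 0, 5)
c = Region(1, 10, 0, 5)
print(a is b, a is c, len(Region._interned))
```

The cost is one temporary tuple and one dict lookup per construction, so the approach pays off mainly when regions are compared far more often than they are created.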

...but thank you for the approach of looking for a suitable pattern!

Cheers!

Uli



Re: type(d) != type(d.copy()) when type(d).issubclass(dict)

2010-12-27 Thread Duncan Booth
kj no.em...@please.post wrote:

In (almost?) all cases any objects constructed by a subclass of a
builtin class will be of the original builtin class.
 
 
 What I *really* would like to know is: how do *you* know this (and
 the same question goes for the other responders who see this behavior
 of dict as par for the course).  Can you show me where it is in
 the documentation?  I'd really appreciate it.  TIA!
 

I know it from experience (and reading source). So far as I can tell it 
isn't explicitly stated anywhere in the documentation.

Mostly the documentation just says a method returns 'a copy of' possibly 
with some modification. For example:

  str.capitalize()
  Return a copy of the string with its first character capitalized and the 
  rest lowercased.

That is ambiguous as it leaves open the question whether it returns a 
string that is a copy or an object of the type being operated upon. It 
happens to be the former but it doesn't actually say.
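
The behaviour is easy to check directly. The sketch below uses two throwaway subclasses, MyStr and MyDict, invented for this example; the results shown by the checks are what CPython does, which, as noted above, the documentation does not spell out.

```python
class MyStr(str):
    pass


class MyDict(dict):
    pass


s = MyStr('hello')
d = MyDict(a=1)

# Inherited methods build their result with the builtin type,
# not with the subclass they were called on.
capitalized_type = type(s.capitalize())
copied_type = type(d.copy())

print(capitalized_type, copied_type)
```

A subclass that needs copies of its own type has to override the method and wrap the result itself.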


-- 
Duncan Booth http://kupuguy.blogspot.com


Re: Interning own classes like strings for speed and size?

2010-12-27 Thread Daniel Fetchinson
 I believe what you are looking for is (some variant of) the singleton
 pattern:

 http://en.wikipedia.org/wiki/Singleton_pattern

 Actually, no. What I want is the flyweight pattern instead:

 http://en.wikipedia.org/wiki/Flyweight_pattern

Oh I see. I did not know about this pattern, but in my defense it
looks like a variant of the singleton pattern :)

Thanks! One always learns something new on python-list.

Cheers,
Daniel


-- 
Psss, psss, put it down! - http://www.cafepress.com/putitdown


Re: How to pop the interpreter's stack?

2010-12-27 Thread Ethan Furman

Steven D'Aprano wrote:

On Sun, 26 Dec 2010 09:15:32 -0800, Ethan Furman wrote:


Steven D'Aprano wrote:

Right. But I have thought of a clever trick to get the result KJ was
asking for, with the minimum of boilerplate code. Instead of this:


def _pre_spam(args):
    if condition(args):
        raise SomeException(message)
    if another_condition(args):
        raise AnotherException(message)
    if third_condition(args):
        raise ThirdException(message)

def spam(args):
    _pre_spam(args)
    do_useful_work()


you can return the exceptions instead of raising them (exceptions are
just objects, like everything else!), and then add one small piece of
boilerplate to the spam() function:


def _pre_spam(args):
    if condition(args):
        return SomeException(message)
    if another_condition(args):
        return AnotherException(message)
    if third_condition(args):
        return ThirdException(message)

def spam(args):
    exc = _pre_spam(args)
    if exc: raise exc
    do_useful_work()

-1

You failed to mention that cleverness is not a prime requisite of the
python programmer -- in fact, it's usually frowned upon.  The big
problem with the above code is you are back to passing errors in-band,
pretty much completely defeating the point of having an out-of-band
channel.


How is that any worse than making _pre_spam() a validation function that 
returns a bool?


def spam(args):
    flag = _pre_spam(args)
    if flag: raise SomeException()
    do_useful_work()


Also -1.


Is that also frowned upon for being too clever?


Frowned upon for being out-of-band, and not as much fun as being clever. 
 ;)  I'm pretty sure you've expressed similar sentiments in the past 
(although my memory could be failing me).


More to the point, the OP had code that said:

  args, kwargs = __pre_spam(*args, **kwargs)

and __pre_spam was either passing back verified (and possibly modified)
parameters, or raising an exception.
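
A self-contained sketch of that shape is shown below. The validation rules (at least one argument, all convertible to int) and the body of spam() are invented for illustration, and a single leading underscore is used in place of the OP's __pre_spam. The point is only that the helper's return value carries nothing but the verified parameters, while errors leave through exceptions.

```python
def _pre_spam(*args, **kwargs):
    """Validate the arguments; return them (possibly modified) or raise."""
    if not args:
        raise TypeError("spam() needs at least one positional argument")
    try:
        numbers = tuple(int(a) for a in args)
    except ValueError:
        raise ValueError("spam() arguments must be integers: %r" % (args,))
    return numbers, kwargs


def spam(*args, **kwargs):
    # No error checking of the return value is needed here:
    # if _pre_spam() returns at all, the arguments are good.
    args, kwargs = _pre_spam(*args, **kwargs)
    return sum(args)


print(spam("1", 2))
```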

~Ethan~



Re: User input masks - Access Style

2010-12-27 Thread Adam Tauno Williams
On Sun, 2010-12-26 at 20:37 -0800, flebber wrote:
 Is there any way to use input masks in Python? Similar to the function
 found in Access where a user's input is limited to a type, length and
 format.

http://faq.pygtk.org/index.py?file=faq14.022.htp&req=show

Typically this is handled by a callback on a keypress event.
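
Independent of the GUI toolkit, such a callback usually boils down to one question: could the text typed so far still become a valid value? The sketch below shows that check for the 000.00 currency mask from the original question. The function names accepts and is_complete are made up for this example, it needs Python 3.4+ for re.fullmatch, and wiring it to an actual keypress event is toolkit-specific and left out.

```python
import re

# Text that could still grow into a value such as 13.80:
# up to three digits, optionally a dot and up to two more digits.
PARTIAL = re.compile(r'[0-9]{0,3}(\.[0-9]{0,2})?')

# A finished value: one to three digits, a dot, exactly two digits.
COMPLETE = re.compile(r'[0-9]{1,3}\.[0-9]{2}')


def accepts(text):
    """Return True if a keypress callback should allow this partial input."""
    return PARTIAL.fullmatch(text) is not None


def is_complete(text):
    """Return True if the field holds a finished currency value."""
    return COMPLETE.fullmatch(text) is not None


for candidate in ("13", "13.", "13.8", "13.80", "13.805", "1a"):
    print(candidate, accepts(candidate), is_complete(candidate))
```

The callback would reject the keypress when accepts() is False for the resulting text, and the form would only be submitted once is_complete() is True.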



Re: string u'hyv\xe4' to file as 'hyvä'

2010-12-27 Thread MRAB

On 27/12/2010 05:56, gintare wrote:

Hello,
STILL do not work. WHAT to be done.

import codecs
item=u'hyv\xe4'
F=codecs.open('/opt/finnish.txt', 'w+', 'utf8')
F.writelines(item.encode('utf8'))
F.close()

As I said in my previous post, you shouldn't be using .writelines, and
you shouldn't encode it when writing it to the file because codecs.open
will do that for you, that's its purpose:

import codecs
item = u'hyv\xe4'
F = codecs.open('/opt/finnish.txt', 'w+', 'utf8')
F.write(item)
F.close()



In file i find 'hyv\xe4' instead of hyvä.

Sorry for mistyping in previous letter about 'latin-1'. I was trying
all possible combinations, when normal example syntax did not work,
before writing to this forum

regards,
gintare



On 27 Dec, 01:14, MRAB pyt...@mrabarnett.plus.com wrote:

On 26/12/2010 22:43, gintare wrote:


Could you please help me with special characters saving to file.



I need to write the string u'hyv\xe4' to file.
I would like to open file and to have line 'hyvä'



import codecs
word= u'hyv\xe4'
F=codecs.open(/opt/finnish.txt, 'w+','Latin-1')


This opens the file using the Latin-1 encoding (although only if you
put the filename in quotes).




F.writelines(item.encode('Latin-1'))


This encodes the Unicode item (did you mean 'word'?) to a bytestring
using the Latin-1 encoding. You opened the file using Latin-1 encoding,
so this is pointless. You should pass a Unicode string; it will encode
it for you.

You're also passing a bytestring to the .writelines method, which
expects a list of strings.

What you should be doing is this:

  F.write(word)


F.writelines(item.encode('utf8'))


This encodes the Unicode item to a bytestring using the UTF-8 encoding.
This is also pointless. You shouldn't be encoding to UTF-8 and then
trying to write it to a file which was opened using Latin-1 encoding!




F.writelines(item)



F.close()



All three writelines give the same result in finnish.txt:   hyv\xe4
I would like to find 'hyvä'.







Language Detection Library/Code

2010-12-27 Thread Shashwat Anand
Can anyone suggest a *language detection library* in python which works on a
phrase of say 2-5 words.


-- 
~l0nwlf


Re: Keeping track of the N largest values

2010-12-27 Thread Stefan Sonnenberg-Carstens

Am 26.12.2010 19:51, schrieb Stefan Sonnenberg-Carstens:

l = []
K = 10

while 1:
    a = input()
    if len(l) == K:
        l.remove(min(l))
    l = [x for x in l if x < a] + [a] + [x for x in l if x > a]
    print l

A minor fault made it into my prog:

l = [0]
K = 10

while 1:
    a = input()
    l = [x for x in l if x < a] + [a] + [x for x in l if x > a]
    if len(l) == K:
        l.remove(min(l))
    print l
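
As an alternative to rebuilding the list on every input, the standard library's heapq module can keep the N largest values in a small heap. This is a different technique from the list-based code above, shown only as a sketch; the function name n_largest_stream is made up for this example.

```python
import heapq


def n_largest_stream(values, n):
    """Return the n largest items of an iterable, largest first."""
    heap = []                    # min-heap holding the current top n
    for v in values:
        if len(heap) < n:
            heapq.heappush(heap, v)
        elif v > heap[0]:        # heap[0] is the smallest value kept so far
            heapq.heapreplace(heap, v)
    return sorted(heap, reverse=True)


top3 = n_largest_stream([5, 1, 9, 3, 7, 9, 2], 3)
print(top3)
```

Each new value costs at most O(log N) instead of a pass over the whole list, and duplicates are kept. When all the data is already available, heapq.nlargest(n, values) does the same job in one call.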


Re: Interning own classes like strings for speed and size?

2010-12-27 Thread Terry Reedy

On 12/27/2010 6:05 AM, Ulrich Eckhardt wrote:

Hi!

I'm trying to solve a computational problem and of course speed and size is
important there. Apart from picking the right algorithm, I came across an
idea that could help speed up things and keep memory requirements down. What
I have is regions described by min and max coordinates.


What sort of numbers are the coordinates? If integers in a finite range, 
your problem is a lot simpler than if float of indefinite precision.


--
Terry Jan Reedy



Re: Trying to parse a HUGE(1gb) xml file

2010-12-27 Thread Alan Meyer

On 12/21/2010 3:16 AM, Stefan Behnel wrote:

Adam Tauno Williams, 20.12.2010 20:49:

...

You need to process the document as a stream of elements; aka SAX.


IMHO, this is the worst advice you can give.


Why do you say that?  I would have thought that using SAX in this 
application is an excellent idea.


I agree that for applications for which performance is not a problem, 
and for which we need to examine more than one or a few element types, a 
tree implementation is more functional, less programmer intensive, and 
provides an easier to understand approach to the data.  But with huge 
amounts of data where performance is a problem SAX will be far more 
practical.  In the special case where only a few elements are of 
interest in a complex tree, SAX can sometimes also be more natural and 
easy to use.


SAX might also be more natural for this application.  The O.P. could 
tell us for sure, but I wonder if perhaps his 1 GB XML file is NOT a 
true single record.  You can store an entire text encyclopedia in less 
than one GB.  What he may have is a large number of logically distinct 
individual records of some kind, each stored as a node in an 
all-encompassing element wrapper.  Building a tree for each record could 
make sense but, if I'm right about the nature of the data, building a 
tree for the wrapper gives very little return for the high cost.


If that's so, then I'd recommend one of two approaches:

1. Use SAX, or

2. Parse out individual logical records using string manipulation on an 
input stream, then build a tree for one individual record in memory 
using one of the DOM or ElementTree implementations.  After each record 
is processed, discard its tree and start on the next record.
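
For the first approach, a minimal SAX handler could look like the sketch below. It assumes, purely for illustration, that each logical record is an element named "rec" inside one wrapper element, and it only counts the records; the class and function names are made up for this example. The parser never builds a tree, so memory use stays flat regardless of file size.

```python
import io
import xml.sax


class RecordCounter(xml.sax.ContentHandler):
    """Count the elements with a given tag name as they stream past."""

    def __init__(self, tag):
        super().__init__()
        self.tag = tag
        self.count = 0

    def startElement(self, name, attrs):
        if name == self.tag:
            self.count += 1


def count_records(source, tag):
    """source may be a file name or an open (binary) file object."""
    handler = RecordCounter(tag)
    xml.sax.parse(source, handler)
    return handler.count


sample = io.BytesIO(b"<all><rec id='1'/><rec id='2'/><other/></all>")
total = count_records(sample, "rec")
print(total)
```

Anything beyond counting needs state kept on the handler instance between callbacks, which is where SAX code starts to grow.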


Alan


Re: Trying to parse a HUGE(1gb) xml file

2010-12-27 Thread Alan Meyer

On 12/26/2010 3:15 PM, Tim Harig wrote:
...

The problem is that XML has become such a de facto standard that it is
used automatically, without thought, even when there are much better
alternatives available.


I agree with you but, as you say, it has become a de facto standard.  As 
a result, we often need to use it unless there is some strong reason to 
use something else.


The same thing can be said about relational databases.  There are 
applications for which a hierarchical database makes more sense, is more 
efficient, and is easier to understand.  But anyone who recommends a 
database that is not relational had better be prepared to defend his 
choice with some powerful reasoning because his management, his 
customers, and the other programmers on his team are probably going to 
need a LOT of convincing.


And of course there are many applications where XML really is the best. 
 It excels at representing complex textual documents while still 
allowing programmatic access to individual items of information.


Alan


Re: __delitem__ feature

2010-12-27 Thread kj
In 4d181afb$0$30001$c3e8da3$54964...@news.astraweb.com Steven D'Aprano 
steve+comp.lang.pyt...@pearwood.info writes:

We know it because it explains the observable facts.

So does Monday-night quarterbacking...


Re: Interning own classes like strings for speed and size?

2010-12-27 Thread Ulrich Eckhardt
Terry Reedy wrote:
 What sort of numbers are the coordinates? If integers in a finite range,
 your problem is a lot simpler than if float of indefinite precision.

Yes, indeed, I could optimize the amount of data required to store the data 
itself, but that would require application-specific handling of the data, 
which is actually not what I want to learn about. If it was that, I'd use a 
language where I have lower-level access to the system. ;)

Thanks nonetheless!

Uli



Re: Trying to parse a HUGE(1gb) xml file

2010-12-27 Thread Stefan Behnel

Alan Meyer, 27.12.2010 21:40:

On 12/21/2010 3:16 AM, Stefan Behnel wrote:

Adam Tauno Williams, 20.12.2010 20:49:

...

You need to process the document as a stream of elements; aka SAX.


IMHO, this is the worst advice you can give.


Why do you say that? I would have thought that using SAX in this
application is an excellent idea.


From my experience, SAX is only practical for very simple cases where 
little state is involved when extracting information from the parse events. 
A typical example is gathering statistics based on single tags - not a very 
common use case. Anything that involves knowing where in the XML tree you 
are to figure out what to do with the event is already too complicated. The 
main drawback of SAX is that the callbacks run into separate method calls, 
so you have to do all the state keeping manually through fields of the SAX 
handler instance.


My serious advice is: don't waste your time learning SAX. It's simply too 
frustrating to debug SAX extraction code into existence. Given how simple 
and fast it is to extract data with ElementTree's iterparse() in a memory 
efficient way, there is really no reason to write complicated SAX code instead.


Stefan

--
http://mail.python.org/mailman/listinfo/python-list


Digitally Signing a XML Document (using SHA1+RSA or SHA1+DSA)

2010-12-27 Thread Anurag Chourasia
Hi All,

I have a requirement to digitally sign a XML Document using SHA1+RSA
or SHA1+DSA

Could someone give me a lead on a library that I can use to fulfill this
requirement?

The XML Document has values such as

<RSASK>-----BEGIN RSA PRIVATE KEY-----
MIIBOgIBAAJBANWzHfF5Bppe4JKlfZDqFUpNLrwNQqguw76g/jmeO6f4i31rDLVQ
n7sYilu65C8vN+qnEGnPB824t/A3yfMu1G0CAQMCQQCOd2lLpgRm6esMblO18WOG
3h8oCNcaydfUa1QmaX0apHlDFnI7UDXpYaHp2VL9gvtSJT5L3ZASMzxRPXJSvzcT
AiEA/16jQh18BAD4q3yk1gKw19I8OuJOYAxFYX9noCEFWUMCIQDWOiYfPtxK3A1s
AFARsDnnHTL4FbRPpiZ79vP+VgqojwIhAKo/F4Fo/VgApceobeQByzqMKCdBiZVd
g5ZU78AWA5DXAiEAjtFuv389hz1eSAA1YSAmmhN3UA54NRlu/U9NVDlccF8CIBkc
Z52oGxy/skwVwI5TBcB1YqXJTT47/6/hTAVMTwaA -----END RSA PRIVATE KEY-----</RSASK>

<RSAPUBK>-----BEGIN PUBLIC KEY-----
MFowDQYJKoZIhvcNAQEBBQADSQAwRgJBANWzHfF5Bppe4JKlfZDqFUpNLrwNQqgu
w76g/jmeO6f4i31rDLVQn7sYilu65C8vN+qnEGnPB824t/A3yfMu1G0CAQM= -----END PUBLIC KEY-----</RSAPUBK>

The XML also has another node that holds a public key as a modulus and
an exponent, which I apparently need to use.

<RSAPK>
  <M>1bMd8XkGml7gkqV9kOoVSk0uvA1CqC7DvqD+OZ47p/iLfWsMtVCfuxiKW7rkLy836qcQac8Hzbi38DfJ8y7UbQ==</M>
  <E>Aw==</E>
</RSAPK>

I am a little thin on this concept and was hoping you could point me to a
library or documentation that I could use.

Thanks a lot for your help.

Regards,
Anurag
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Trying to parse a HUGE(1gb) xml file

2010-12-27 Thread Adam Tauno Williams
On Mon, 2010-12-27 at 22:55 +0100, Stefan Behnel wrote:
 Alan Meyer, 27.12.2010 21:40:
  On 12/21/2010 3:16 AM, Stefan Behnel wrote:
  Adam Tauno Williams, 20.12.2010 20:49:
  ...
  You need to process the document as a stream of elements; aka SAX.
  IMHO, this is the worst advice you can give.
  Why do you say that? I would have thought that using SAX in this
  application is an excellent idea.
  From my experience, SAX is only practical for very simple cases where 
 little state is involved when extracting information from the parse events. 
 A typical example is gathering statistics based on single tags - not a very 
 common use case. Anything that involves knowing where in the XML tree you 
 are to figure out what to do with the event is already too complicated.

I've found that using a stack-model makes traversing complex documents
with SAX quite manageable.  For example, I parse BPML files with SAX.
If the document is nested and context sensitive then I really don't see
how iterparse differs all that much.
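
A minimal sketch of the stack model described above, using the standard
library's xml.sax and written for Python 3. The element names (package,
process, name) and the helper process_names() are made up for illustration;
they are not taken from BPML or from the code discussed in this thread.

```python
import xml.sax


class PathHandler(xml.sax.ContentHandler):
    """Collect the text of <name> elements that sit directly under <process>."""

    def __init__(self):
        super().__init__()
        self.stack = []   # names of the elements that are currently open
        self.names = []   # collected results
        self._text = []   # character data of the innermost open element

    def startElement(self, name, attrs):
        self.stack.append(name)
        self._text = []

    def characters(self, content):
        self._text.append(content)

    def endElement(self, name):
        # The stack tells us where we are, so the same tag name can be
        # handled differently depending on its parent.
        if self.stack[-2:] == ["process", "name"]:
            self.names.append("".join(self._text).strip())
        self.stack.pop()
        self._text = []


def process_names(xml_bytes):
    handler = PathHandler()
    xml.sax.parseString(xml_bytes, handler)
    return handler.names


SAMPLE = (b"<package><name>pkg</name>"
          b"<process><name>order</name></process></package>")
```

Here process_names(SAMPLE) picks up only the <name> inside <process> and
skips the one directly under <package>.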

 My serious advices is: don't waste your time learning SAX. It's simply too 
 frustrating to debug SAX extraction code into existence. Given how simple 
 and fast it is to extract data with ElementTree's iterparse() in a memory 
 efficient way, there is really no reason to write complicated SAX code 
 instead.


-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Trying to parse a HUGE(1gb) xml file

2010-12-27 Thread Tim Harig
On 2010-12-27, Alan Meyer amey...@yahoo.com wrote:
 On 12/26/2010 3:15 PM, Tim Harig wrote:
 ...
 The problem is that XML has become such a defacto standard that it
 used automatically, without thought, even when there are much better
 alternatives available.

 I agree with you but, as you say, it has become a defacto standard.  As 
 a result, we often need to use it unless there is some strong reason to 
 use something else.

XML should be used where it makes sense to do so.  As always, use the
proper tool for the proper job.  XML became a de facto standard, in
part, because it was abused for many uses in the first place, so using it
because it is a de facto standard just piles more and more mistakes
on top of each other.

 The same thing can be said about relational databases.  There are 
 applications for which a hierarchical database makes more sense, is more 
 efficient, and is easier to understand.  But anyone who recommends a 
 database that is not relational had better be prepared to defend his 
 choice with some powerful reasoning because his management, his 
 customers, and the other programmers on his team are probably going to 
 need a LOT of convincing.

I have no particular problem with using other database models in
theory.  In practice, at least until recently, there were few decent
implementations for alternative model databases.  That is starting to
change with the advent of the so-called NoSQL databases.  There are a few
models that I really do like; but, there are also a lot of failed models.
A large part of the problem was the push towards object databases, which
are one of the failed models IMNSHO.  Their failure tended to give some of
the other database models a bad name.

 And of course there are many applications where XML really is the best. 
   It excels at representing complex textual documents while still 
 allowing programmatic access to individual items of information.

Much agreed.  There are many things that XML does very well.  It works
great for XML-RPC style interfaces.  I prefer it over binary formats
for documents.  It does suitably for exporting discrete amounts of
information.

There are however a number of things that it does poorly.  I don't
condone its use for configuration files.  I don't condone its use as a
data store and when you have data approaching gigabytes, that is exactly
how you are using it.
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Interning own classes like strings for speed and size?

2010-12-27 Thread Tim Delaney
On 27 December 2010 22:05, Ulrich Eckhardt dooms...@knuut.de wrote:


 What I'm now considering is to only allow a single instance of these
 objects
 for each set of values, similar to interned strings. What I would gain is
 that I could safely compare objects for identity instead of equality. What
 I'm not yet sure is how much overhead that would give me and/or how to keep
 it low. The idea is to store each instance in a set and after creating a
 new
 object I would first look up an equal object in the global set and return
 that instead, otherwise add the new one.

 The problem I foresee is that if I define equality as identity, this lookup
 when creating will never eliminate duplicates. If I only fall back to
 equality comparison for non-identical objects, I would probably sacrifice
 most of the gain. If I build a dict mapping between the values and the
 actual objects, I would have doubled the required memory and uselessly
 store
 the same values twice there.


The first thing to deal with is the equality check. The way this is generally
done is to first do an identity check, then if that fails fall back to an
equality check. This gives you a fast path for the normal case, but still
gives full equality checks on a slow path.
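
A minimal sketch of that identity-first comparison, written for Python 3.
The class name Point and its two fields are placeholders chosen for
illustration, not the original poster's class.

```python
class Point(object):
    __slots__ = ("x", "y")

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __eq__(self, other):
        if self is other:                  # fast path: same object
            return True
        if not isinstance(other, Point):
            return NotImplemented
        # slow path: compare the stored values
        return (self.x, self.y) == (other.x, other.y)

    def __ne__(self, other):
        result = self.__eq__(other)
        return result if result is NotImplemented else not result

    def __hash__(self):
        # Must agree with __eq__ so instances work as dict and set keys.
        return hash((self.x, self.y))
```

If all instances are canonical, nearly every comparison that returns True
takes the fast path; the slow path is only hit for objects that differ or
that were created outside the canonicalisation step.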

Your assumption of double storage for a dict is somewhat flawed if I
understand you correctly. The mapping:

(value1, value2, ...) => my_object(value1, value2, ...)

*could* result in value1, value2, ... being created and stored twice (hence
the possibility of double storage) and the mapping tuple being stored + your
object. However, if the key and value are the same object, there is only a
single additional reference being stored (within the dict structure of
course).

The way you should probably deal with this is to always create one of your
objects for doing the lookup. Then your algorithm is:

new_object = my_object(value1, value2, ...)

try:
    canonical = canonical_dict[new_object]
except KeyError:
    canonical = canonical_dict[new_object] = new_object

You'd have to structure your __new__ appropriately to do it there, but it is
possible assuming that everything you need for equality testing is done in
__new__.
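
One way the lookup above might be folded into __new__, as a sketch for
Python 3. The class name Vec, its fields and the _canonical dictionary are
assumptions made for the example. Note that the dictionary keeps every
instance alive for the life of the program.

```python
class Vec(object):
    __slots__ = ("x", "y")
    _canonical = {}        # maps each canonical instance to itself

    def __new__(cls, x, y):
        candidate = super().__new__(cls)
        candidate.x = x
        candidate.y = y
        try:
            # An equal object already exists: hand that one back.
            return cls._canonical[candidate]
        except KeyError:
            cls._canonical[candidate] = candidate
            return candidate

    def __eq__(self, other):
        if self is other:
            return True
        if not isinstance(other, Vec):
            return NotImplemented
        return (self.x, self.y) == (other.x, other.y)

    def __hash__(self):
        return hash((self.x, self.y))


a = Vec(1.0, 2.0)
b = Vec(1.0, 2.0)
c = Vec(0.0, 0.0)
```

Because key and value are the same object, the dictionary adds only
references, not second copies of the coordinates.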

If you further want to reduce storage (if it's an issue) you could also
canonicalise the values themselves using a similar technique. You could even
use the same canonicalisation dictionary so long as you could ensure that
none of the different types compare equal (e.g. floats and integers). Note
that as an implementation detail the integers -5...256 are already interned,
but you can't rely on that (the range has changed over time).

Tim Delaney
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Trying to parse a HUGE(1gb) xml file

2010-12-27 Thread Roy Smith
Alan Meyer amey...@yahoo.com wrote:

 On 12/26/2010 3:15 PM, Tim Harig wrote:
 I agree with you but, as you say, it has become a defacto standard.  As 
 a result, we often need to use it unless there is some strong reason to 
 use something else.

This is certainly true.  In the rarefied world of Usenet, we can all 
bash XML (and I'm certainly front and center of the XML-bashing crowd).  
In the real world, however, it's a necessary evil.  Knowing how to work 
with it (at least to some extent) should be in every software engineer's 
bag of tricks.

 The same thing can be said about relational databases.  There are 
 applications for which a hierarchical database makes more sense, is more 
 efficient, and is easier to understand.  But anyone who recommends a 
 database that is not relational had better be prepared to defend his 
 choice with some powerful reasoning because his management, his 
 customers, and the other programmers on his team are probably going to 
 need a LOT of convincing.

This is also true.  In the old days, they used to say, "Nobody ever got 
fired for buying IBM."  Relational databases have pretty much gotten to 
that point.  Suits are comfortable with Oracle and MS SqlServer, and 
even MySQL.  If you want to go NoSQL, the onus will be on you to 
demonstrate that it's the right choice.

Sometimes, even when it is the right choice, it's the wrong choice.  You 
typically have a limited amount of influence capital to spend, and many 
battles to fight.  Sometimes it's right to go along with SQL, even if 
you know it's wrong from a technology point of view, simply because 
taking the easy way out on that battle may let you devote the energy you 
need to win more important battles.

And, anyway, when your SQL database becomes the bottleneck, you can 
always go back and say, "I told you so."  Trust me, if you're ever 
involved in an "I told you so" moment, you really want to be on the 
transmitting end.

 And of course there are many applications where XML really is the best. 
 It excels at representing complex textual documents while still 
 allowing programmatic access to individual items of information.

Yup.  For stuff like that, there really is no better alternative.  To go 
back to my earlier example of

<Parental-Advisory>FALSE</Parental-Advisory>

using 432 bits to store 1 bit of information, stuff like that doesn't 
happen in marked-up text documents.  Most of the file is CDATA (do they 
still use that term in XML, or was that an SGML-ism only?).  The markup 
is a relatively small fraction of the data.  I'm happy to pay a factor 
of 2 or 3 to get structured text that can be machine processed in useful 
ways.  I'm not willing to pay a factor of 432 to get tabular data when 
there are plenty of other, much more reasonable ways to encode it.
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Interning own classes like strings for speed and size?

2010-12-27 Thread Steven D'Aprano
On Mon, 27 Dec 2010 12:05:10 +0100, Ulrich Eckhardt wrote:

 What I'm now considering is to only allow a single instance of these
 objects for each set of values, similar to interned strings. What I
 would gain is that I could safely compare objects for identity instead
 of equality. What I'm not yet sure is how much overhead that would give
 me and/or how to keep it low. The idea is to store each instance in a
 set and after creating a new object I would first look up an equal
 object in the global set and return that instead, otherwise add the new
 one.

Try this technique:

>>> class InternedTuple(tuple):
...     _cache = {}
...     def __new__(cls, *args):
...         t = super().__new__(cls, *args)
...         return cls._cache.setdefault(t, t)
...
>>> t1 = InternedTuple((1.0, 2.0))
>>> t2 = InternedTuple((0.0, 0.0))
>>> t3 = InternedTuple((1.0, 2.0))
>>>
>>> t1 is t2
False
>>> t1 is t3
True
>>> t1 == t2
False
>>> t1 == t3
True



-- 
Steven
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Trying to parse a HUGE(1gb) xml file

2010-12-27 Thread Alan Meyer

On 12/27/2010 4:55 PM, Stefan Behnel wrote:
...

 From my experience, SAX is only practical for very simple cases where
little state is involved when extracting information from the parse
events. A typical example is gathering statistics based on single tags -
not a very common use case. Anything that involves knowing where in the
XML tree you are to figure out what to do with the event is already too
complicated. The main drawback of SAX is that the callbacks run into
separate method calls, so you have to do all the state keeping manually
through fields of the SAX handler instance.

My serious advices is: don't waste your time learning SAX. It's simply
too frustrating to debug SAX extraction code into existence. Given how
simple and fast it is to extract data with ElementTree's iterparse() in
a memory efficient way, there is really no reason to write complicated
SAX code instead.

Stefan



I confess that I hadn't been thinking about iterparse().  I presume that 
clear() is required with iterparse() if we're going to process files of 
arbitrary length.


I should think that this approach provides an intermediate solution. 
It's more work than building the full tree in memory because the 
programmer has to do some additional housekeeping to call clear() at the 
right time and place.  But it's less housekeeping than SAX.


I guess I've done enough SAX, in enough different languages, that I 
don't find it that onerous to use.  When I need an element stack to keep 
track of things I can usually re-use code I've written for other 
applications.  But for a programmer that doesn't do a lot of this stuff, 
I agree, the learning curve with lxml will be shorter and the 
programming and debugging can be faster.


Alan
--
http://mail.python.org/mailman/listinfo/python-list


Re: Language Detection Library/Code

2010-12-27 Thread Katie T
On Mon, Dec 27, 2010 at 7:10 PM, Shashwat Anand
anand.shash...@gmail.com wrote:
 Can anyone suggest a language detection library in python which works on a
 phrase of say 2-5 words.

Generally such libraries work by bi/trigram frequency analysis, which
means you're going to have a fairly high error rate with such small
phrases. If you're only dealing with a handful of languages it may
make more sense to combine an existing library with a simple
dictionary lookup model to improve accuracy.
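
A rough sketch of what such a dictionary lookup step could look like, in
Python 3. The word lists are tiny and purely illustrative, and the optional
fallback argument stands in for whatever n-gram based library is used; none
of the names below come from an existing package.

```python
# Tiny illustrative word lists; a real system would load far larger ones.
COMMON_WORDS = {
    "german": {"der", "die", "und", "ist", "nicht", "ich", "das"},
    "french": {"le", "la", "et", "est", "pas", "je", "les"},
    "italian": {"il", "di", "che", "non", "per", "sono", "una"},
}


def detect(phrase, fallback=None):
    """Return the language whose word list matches best, else ask fallback."""
    words = phrase.lower().split()
    scores = {
        lang: sum(1 for word in words if word in vocab)
        for lang, vocab in COMMON_WORDS.items()
    }
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    (best_lang, best), (_, second) = ranked[0], ranked[1]
    if best > 0 and best > second:
        return best_lang          # a clear winner from the dictionaries
    # No dictionary hit or a tie: defer to the statistical detector, if any.
    return fallback(phrase) if fallback else None
```

The dictionary check handles the short phrases that frequency analysis gets
wrong, and the n-gram detector only has to decide the cases the word lists
cannot.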

Katie
-- 
CoderStack
http://www.coderstack.co.uk/perl-jobs-in-london
The Software Developer Job Board
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Trying to parse a HUGE(1gb) xml file

2010-12-27 Thread Alan Meyer

On 12/27/2010 6:21 PM, Roy Smith wrote:


...  In the old days, they used to say, Nobody ever got
fired for buying IBM.  Relational databases have pretty much gotten to
that point


That's _exactly_ the comparison I had in mind too.

I once worked for a company that made a pitch to a big potential client 
(the BBC) and I made the mistake of telling the client that I didn't 
think a relational database was the best for his particular application. 
 We didn't win that contract and I never made that mistake again!


Alan
--
http://mail.python.org/mailman/listinfo/python-list


Re: __delitem__ feature

2010-12-27 Thread Ian Kelly

On 12/26/2010 11:49 AM, kj wrote:

> In <mailman.302.1293387041.6505.python-l...@python.org> Ian Kelly
> <ian.g.ke...@gmail.com> writes:
>
>> On 12/26/2010 10:53 AM, kj wrote:
>>> P.S. If you uncomment the commented-out line, and comment out the
>>> last line of the __init__ method (which installs self._delitem as
>>> self.__delitem__) then *all* the deletion attempts invoke the
>>> __delitem__ method, and are therefore blocked.  FWIW.
>>
>> Because subclasses of builtins only check the class __dict__ for special
>> method overrides, not the instance __dict__.
>
> How do you know this?


From memory, although it seems I remembered it slightly wrong; it's the 
way new-style classes work in general, not anything to do with builtins 
in particular.
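
The behaviour is easy to check. In this Python 3 sketch the class names D
and E and the calls list are made up for the demonstration; it is not the
code from earlier in the thread.

```python
calls = []


class D(dict):
    pass


d = D(a=1)
# Instance attribute: stored in d.__dict__, which the del statement ignores.
d.__delitem__ = lambda key: calls.append(("instance", key))
del d["a"]                         # looked up on type(d): dict.__delitem__
instance_hook_used = bool(calls)


class E(dict):
    def __delitem__(self, key):
        # Defined on the class, so the del statement does find it.
        calls.append(("class", key))   # record the attempt, delete nothing


e = E(a=1)
del e["a"]
```

After running this, the key is gone from d and the instance-level hook was
never called, while e still holds its key because the class-level
__delitem__ intercepted the deletion.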



> Is this documented?


Yes, as others have pointed out.


> Or is this a case of Monday-night quarterbacking?


Do you mean Monday-morning quarterbacking?  Either way, I don't know 
what you mean by that in this context.


--
http://mail.python.org/mailman/listinfo/python-list


Re: Language Detection Library/Code

2010-12-27 Thread Shashwat Anand
On Tue, Dec 28, 2010 at 6:03 AM, Katie T ka...@coderstack.co.uk wrote:

 On Mon, Dec 27, 2010 at 7:10 PM, Shashwat Anand
 anand.shash...@gmail.com wrote:
  Can anyone suggest a language detection library in python which works on
 a
  phrase of say 2-5 words.

 Generally such libraries work by bi/trigram frequency analysis, which
 means you're going to have a fairly high error rate with such small
 phrases. If you're only dealing with a handful of languages it may
 make more sense to combine an existing library with a simple
 dictionary lookup model to improve accuracy.

 Katie


In fact I'm dealing with very few languages: German, French, Italian,
Portuguese and Russian.
I read papers mentioning bi/trigram frequency but was unable to find any
library.
'guess-language' doesn't perform at all.  The CLD (Compact Language
Detection) module of
Google Chrome performs well, but it is not a standalone library (I hope
someone ports it).

Regarding the dictionary lookup + n-gram approach, I didn't quite understand
what you meant.
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Partition Recursive

2010-12-27 Thread DevPlayer

# parse_url11.py

# devpla...@gmail.com
# 2010-12 (Dec)-27
# A brute force ugly hack from a novice programmer.

# You're welcome to use the code, clean it up, make positive suggestions
# for improvement.

"""
Parse a url string into a list using a generator.
"""

#special_itemMeaning = ";?:@=&#."
#"//",
#"/",
special_item = [";", "?", ":", "@", "=", "&", "#", ".", "/", "//"]

# drop urls with obviously bad formatting - NOTIMPLEMENTED
drop_item = ["|", "localhost", "..", "///"]
ignore_urls_containing = ["php", "cgi"]

def url_parser_generator(url):
    len_text = len(url)
    index = 0
    start1 = 0    # required here if url contains ONLY specials
    start2 = 0    # required here if url contains ONLY non specials
    while index < len_text:

        # LOOP1 == Get an item in the special_item list; can be any length
        if url[index] in special_item:
            start1 = index
            inloop1 = True
            while inloop1:
                if inloop1:
                    if url[start1:index+1] in special_item:
                        #print "[",start1, ":", index+1, "] = ", url[start1:index+1]
                        inloop1 = True
                    else:    # not in ANYMORE, but was in special_item
                        #print "[",start1, ":", index, "] = ", url[start1:index]
                        yield url[start1:index]
                        start1 = index
                        inloop1 = False

                if inloop1:
                    if index < len_text-1:
                        index = index + 1
                    else:
                        #yield url[start1:index]  # NEW
                        inloop1 = False

        elif url[index] in drop_item:
            # not properly implemented at all
            raise NotImplementedError(
                "Processing items in the drop_item list is not "
                "implemented.", url[index])

        elif url[index] in ignore_urls_containing:
            # not properly implemented at all
            raise NotImplementedError(
                "Processing items in the ignore_urls_containing list "
                "is not implemented.", url[index])

        # LOOP2 == Get any item not in the special_item list; can be any length
        elif not url[index] in special_item:
            start2 = index
            inloop2 = True
            while inloop2:
                if inloop2:
                    #if not url[start2:index+1] in special_item:  #- doesnt work
                    if not url[index] in special_item:
                        #print "[",start2, ":", index+1, "] = ", url[start2:index+1]
                        inloop2 = True
                    else:    # not in ANYMORE, but item was not in
                             # special_item before
                        #print "[",start2, ":", index, "] = ", url[start2:index]
                        yield url[start2:index]
                        start2 = index
                        inloop2 = False

                if inloop2:
                    if index < len_text-1:
                        index = index + 1
                    else:
                        #yield url[start2:index]  # NEW
                        inloop2 = False

        else:
            print("%s Not Implemented" % url[index])  # should not get here
            index = index + 1

        if index >= len_text-1:
            break

    # Process any remaining part of URL and yield it to caller.
    # Don't know if last item in url is a special or non special.
    # Used start1 and start2 instead of start and
    # used inloop1 and inloop2 instead of inloop
    # to help debug, as using just start and inloop can be
    # harder to track in a generator.
    if start1 >= start2:
        start = start1
    else:
        start = start2
    yield url[start: index+1]

def parse(url):
    mylist = []
    words = url_parser_generator(url)
    for word in words:
        mylist.append(word)
        #print word
    return mylist

def test():
    urls = {
        0: (True, "http://docs.python.org/dev/library/stdtypes.html?highlight=partition#str.partition"),

        1: (True, "/http:///docs.python.org/dev/library/stdtypes.html?highlight=partition#str.partition"),
        2: (True, "//http:///docs.python.org/dev/library/stdtypes.html?highlight=partition#str.partition"),
        3: (True, "///http:///docs.python.org/dev/library/stdtypes.html?highlight=partition#str.partition"),

        4: (True, "/http:///docs.python.org/dev/library/stdtypes.html?highlight=partition#str.partition/"),
        5: (True, "//http:///docs.python.org/dev/library/stdtypes.html?highlight=partition#str.partition//"),
        6: (True, "///http:///docs.python.org/dev/library/stdtypes.html?highlight=partition#str.partition///"),

        7: (True, "/#/http:///#docs.python..org/dev//library/stdtypes./html??highlight=p=partition#str.partition///"),

        8: (True, "httpdocspythonorgdevlibrarystdtypeshtmlhighlightpartitionstrpartition"),
        9: (True, "httpdocs.pythonorgdevlibrarystdtypeshtmlhighlightpartitionstrpartition"),
        10:

Re: Digitally Signing a XML Document (using SHA1+RSA or SHA1+DSA)

2010-12-27 Thread Adam Tauno Williams
On Tue, 2010-12-28 at 03:25 +0530, Anurag Chourasia wrote:
 Hi All,

 I have a requirement to digitally sign a XML Document using SHA1+RSA
 or SHA1+DSA
 Could someone give me a lead on a library that I can use to fulfill
 this requirement?

http://stuvel.eu/rsa  Never used it though.

 The XML Document has values such as 
 RSASK-BEGIN RSA PRIVATE KEY-
 MIIBOgIBAAJBANWzHfF5Bppe4JKlfZDqFUpNLrwNQqguw76g/jmeO6f4i31rDLVQ
 n7sYilu65C8vN+qnEGnPB824t/A3yfMu1G0CAQMCQQCOd2lLpgRm6esMblO18WOG
 3h8oCNcaydfUa1QmaX0apHlDFnI7UDXpYaHp2VL9gvtSJT5L3ZASMzxRPXJSvzcT
 AiEA/16jQh18BAD4q3yk1gKw19I8OuJOYAxFYX9noCEFWUMCIQDWOiYfPtxK3A1s
 AFARsDnnHTL4FbRPpiZ79vP+VgqojwIhAKo/F4Fo/VgApceobeQByzqMKCdBiZVd
 g5ZU78AWA5DXAiEAjtFuv389hz1eSAA1YSAmmhN3UA54NRlu/U9NVDlccF8CIBkc
 Z52oGxy/skwVwI5TBcB1YqXJTT47/6/hTAVMTwaA -END RSA PRIVATE
 KEY-/RSASK
 RSAPUBK-BEGIN PUBLIC KEY-
 MFowDQYJKoZIhvcNAQEBBQADSQAwRgJBANWzHfF5Bppe4JKlfZDqFUpNLrwNQqgu
 w76g/jmeO6f4i31rDLVQn7sYilu65C8vN+qnEGnPB824t/A3yfMu1G0CAQM= -END
 PUBLIC KEY-/RSAPUBK 

Is this any kind of standard or just something someone made up?  Is
there a namespace for the document?

It seems quite odd that the document contains a *private* key.

If all you need to do is parse the document to retrieve the values, that
seems straightforward enough.
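
For the modulus/exponent node, retrieving the values might look like the
Python 3 sketch below. It assumes the node is <RSAPK> with base64-encoded
<M> and <E> children, which is how the posted fragment appears to be
structured; the helper names are made up. It only decodes the numbers and
does no signing, since the standard library has no RSA implementation.

```python
import base64
import xml.etree.ElementTree as ET


def b64_to_int(text):
    """Decode base64 text into a big-endian unsigned integer."""
    return int.from_bytes(base64.b64decode(text.strip()), "big")


def read_public_key(xml_text):
    """Return (modulus, exponent) from an <RSAPK> node."""
    node = ET.fromstring(xml_text)
    return b64_to_int(node.findtext("M")), b64_to_int(node.findtext("E"))


SAMPLE = """<RSAPK>
  <M>1bMd8XkGml7gkqV9kOoVSk0uvA1CqC7DvqD+OZ47p/iLfWsMtVCfuxiKW7rkLy836qcQac8Hzbi38DfJ8y7UbQ==</M>
  <E>Aw==</E>
</RSAPK>"""

modulus, exponent = read_public_key(SAMPLE)
```

With the values from the post, the exponent decodes to 3 and the modulus is
a 512-bit number, which any RSA library can then take as a public key.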

 And the XML also has another node that has a Public Key with Modules
 and Exponents etc that I apparently need to utilize.
 RSAPK
   M1bMd8XkGml7gkqV9kOoVSk0uvA1CqC7DvqD
 +OZ47p/iLfWsMtVCfuxiKW7rkLy836qcQac8Hzbi38DfJ8y7UbQ==/M 
   EAw==/E 
 /RSAPK

 I am a little thin on this concept and expecting if you could guide me
 to a library/documentation that I could utilize.



-- 
http://mail.python.org/mailman/listinfo/python-list


How to programmatically exit from wsgi's serve_forever() loop

2010-12-27 Thread python
Is it possible to programmatically exit from the wsgiref's
serve_forever() loop?

I tried the following, all without success:

httpd.server_close()
httpd.shutdown()
sys.exit(1)
os._exit(1)  (shouldn't this always abort an application?)
raise KeyboardInterrupt  (Ctrl+Break from console works)

Thanks,
Malcolm
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Trying to parse a HUGE(1gb) xml file

2010-12-27 Thread Alan Meyer
By the way Stefan, please don't take any of my comments as complaints. 
I use lxml more and more in my work.  It's fast, functional and pretty 
elegant.


I've written a lot of code on a lot of projects in my 35 year career but 
I don't think I've written anything anywhere near as useful to anywhere 
near as many people as lxml.


Thank you very much for writing lxml and contributing it to the community.

Alan
--
http://mail.python.org/mailman/listinfo/python-list


Re: Language Detection Library/Code

2010-12-27 Thread Santhosh Kumar
Hi, I have already developed a language detection library in Python. Here is
the link: http://code.google.com/p/langdet/


With Regards,
Santhosh V.Kumar
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Language Detection Library/Code

2010-12-27 Thread Santhosh Kumar
 Hi I already Developed a language detection with Python Here is the Link.
 http://code.google.com/p/langdet/


 
 With Regards,
 Santhosh V.Kumar


-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Trying to parse a HUGE(1gb) xml file

2010-12-27 Thread Stefan Behnel

Roy Smith, 28.12.2010 00:21:

To go back to my earlier example of

 Parental-AdvisoryFALSE/Parental-Advisory

using 432 bits to store 1 bit of information, stuff like that doesn't
happen in marked-up text documents.  Most of the file is CDATA (do they
still use that term in XML, or was that an SGML-ism only?).  The markup
is a relatively small fraction of the data.  I'm happy to pay a factor
of 2 or 3 to get structured text that can be machine processed in useful
ways.  I'm not willing to pay a factor of 432 to get tabular data when
there's plenty of other much more reasonable ways to encode it.


If the above only appears once in a large document, I don't care how much 
space it takes. If it appears all over the place, it will compress down to 
a couple of bits, so I don't care about the space, either.
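
The compression claim is easy to try with zlib. This Python 3 sketch assumes
the element is written as <Parental-Advisory>FALSE</Parental-Advisory>
followed by a newline; the function name is made up.

```python
import zlib

ELEMENT = b"<Parental-Advisory>FALSE</Parental-Advisory>\n"


def compressed_bits_per_element(repeats):
    """Average number of compressed bits spent on each repeated element."""
    raw = ELEMENT * repeats
    packed = zlib.compress(raw, 9)
    return len(packed) * 8.0 / repeats
```

Calling compressed_bits_per_element(10000) gives the per-element cost for a
document that contains nothing but this element. Real documents mix it with
other content, so the figure there will differ.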


It's readability that counts here. Try to reverse engineer a binary format 
that stores the above information in 1 bit.


Stefan

--
http://mail.python.org/mailman/listinfo/python-list


Re: Trying to parse a HUGE(1gb) xml file

2010-12-27 Thread Stefan Behnel

Alan Meyer, 28.12.2010 03:18:

By the way Stefan, please don't take any of my comments as complaints.


I don't. After all, this discussion is more about the general data format 
than the specific tools.




I use lxml more and more in my work. It's fast, functional and pretty elegant.

I've written a lot of code on a lot of projects in my 35 year career but I
don't think I've written anything anywhere near as useful to anywhere near
as many people as lxml.

Thank you very much for writing lxml and contributing it to the community.


Thanks, I'm happy to read that. You're welcome.

Note that lxml also owes a lot to Fredrik Lundh for designing ElementTree 
and to Martijn Faassen for starting to reimplement it on top of libxml2 
(and choosing the name :).


Stefan

--
http://mail.python.org/mailman/listinfo/python-list


Re: Trying to parse a HUGE(1gb) xml file

2010-12-27 Thread Stefan Behnel

Alan Meyer, 28.12.2010 01:29:

On 12/27/2010 4:55 PM, Stefan Behnel wrote:

From my experience, SAX is only practical for very simple cases where
little state is involved when extracting information from the parse
events. A typical example is gathering statistics based on single tags -
not a very common use case. Anything that involves knowing where in the
XML tree you are to figure out what to do with the event is already too
complicated. The main drawback of SAX is that the callbacks run into
separate method calls, so you have to do all the state keeping manually
through fields of the SAX handler instance.

My serious advices is: don't waste your time learning SAX. It's simply
too frustrating to debug SAX extraction code into existence. Given how
simple and fast it is to extract data with ElementTree's iterparse() in
a memory efficient way, there is really no reason to write complicated
SAX code instead.


I confess that I hadn't been thinking about iterparse(). I presume that
clear() is required with iterparse() if we're going to process files of
arbitrary length.

I should think that this approach provides an intermediate solution. It's
more work than building the full tree in memory because the programmer has
to do some additional housekeeping to call clear() at the right time and
place. But it's less housekeeping than SAX.


The iterparse() implementation in lxml.etree allows you to intercept on a 
specific tag name, which is especially useful for large XML documents that 
are basically an endless sequence of (however deeply structured) top-level 
elements - arguably the most common format for gigabyte sized XML files. So 
what I usually do here is to intercept on the top level tag name, clear() 
that tag after use and leave it dangling around, like this:


for _, element in ET.iterparse(source, tag='toptagname'):
    # ... work on the element and its subtree
    element.clear()

That allows you to write simple in-memory tree handling code (iteration, 
XPath, XSLT, whatever), while pushing the performance up (compared to ET's 
iterparse that returns all elements) and keeping the total amount of memory 
usage reasonably low. Even a series of several hundred thousand empty top 
level tags doesn't add up to anything that would truly hurt a decent machine.


In many cases where I know that the XML file easily fits into memory 
anyway, I don't even do any housekeeping at all. And the true advantage is: 
if you ever find that it's needed because the file sizes grow beyond your 
initial expectations, you don't have to touch your tested and readily 
debugged data extraction code, just add a suitable bit of cleanup code, or 
even switch from the initial all-in-memory parse() solution to an 
event-driven iterparse()+cleanup solution.
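
A sketch of that migration path for Python 3, using the standard library's
xml.etree.ElementTree rather than lxml (so without the tag= argument). The
element names and the functions extract(), load_all() and load_streaming()
are invented for the example.

```python
import io
import xml.etree.ElementTree as ET


def extract(item):
    """Data extraction for one <item>; it does not care how it was parsed."""
    return item.get("id"), item.findtext("title")


def load_all(source):
    # Initial version: build the whole tree in memory.
    root = ET.parse(source).getroot()
    return [extract(item) for item in root.iter("item")]


def load_streaming(source):
    # Later version: the same extract(), plus a bit of cleanup.
    for _, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "item":
            yield extract(elem)
            elem.clear()   # drop the subtree once it has been used


SAMPLE = (b'<feed><item id="1"><title>a</title></item>'
          b'<item id="2"><title>b</title></item></feed>')
```

Both functions yield the same records for the same input; only the parsing
and cleanup around extract() change.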




I guess I've done enough SAX, in enough different languages, that I don't
find it that onerous to use. When I need an element stack to keep track of
things I can usually re-use code I've written for other applications. But
for a programmer that doesn't do a lot of this stuff, I agree, the learning
curve with lxml will be shorter and the programming and debugging can be
faster.


I'm aware that SAX has the advantage of being available for more languages. 
But if you are in the lucky position to use Python for XML processing, why 
not just use the tools that it makes available?


Stefan

--
http://mail.python.org/mailman/listinfo/python-list


ANN : PySWITCH Release – 0.1alpha

2010-12-27 Thread Godson Gera
Hi All,

I am glad to announce the first alpha release of PySWITCH.

http://pyswitch.sf.net


The idea of PySWITCH is to offer a complete library to Python and Twisted
programmers for interacting with FreeSWITCH using EventSocket interface. The
target is to cover all FreeSWITCH API commands and Dialplan tools. PySWITCH
handles all the low level details in executing FreeSWITCH commands, so the
programmer can easily concentrate on quickly building FreeSWITCH
applications. As an example, the API functions offered by PySWITCH often
executes many FreeSWITCH commands under the hood and finally returns the
desired result. Suppose you execute a background job, PySWITCH API will
automatically wait and catch the backgroundjob event parse the result and
will fire the deferred.

The current release covers a good number of API commands and a few Dialplan
tools. The protocol communication issues have been ironed out. It has a nice
event callback interface. I'll present its usage in a couple of tutorials soon.


-- 
Thanks & Regards,
Godson Gera
http://godson.in


Re: How to programmatically exit from wsgi's serve_forever() loop

2010-12-27 Thread Ian Kelly

On 12/27/2010 6:05 PM, pyt...@bdurham.com wrote:

> Is it possible to programmatically exit from the wsgiref's
> serve_forever() loop?
> I tried the following, all without success:
> httpd.server_close()
> httpd.shutdown()
> sys.exit(1)
> os._exit(1) (shouldn't this always abort an application?)
> raise KeyboardInterrupt (Ctrl+Break from console works)



>>> help(wsgiref.simple_server.WSGIServer.serve_forever)
Help on method serve_forever in module SocketServer:

serve_forever(self, poll_interval=0.5) unbound wsgiref.simple_server.WSGIServer method
    Handle one request at a time until shutdown.

    Polls for shutdown every poll_interval seconds. Ignores
    self.timeout. If you need to do periodic tasks, do them in
    another thread.

>>> help(wsgiref.simple_server.WSGIServer.shutdown)
Help on method shutdown in module SocketServer:

shutdown(self) unbound wsgiref.simple_server.WSGIServer method
    Stops the serve_forever loop.

    Blocks until the loop has finished. This must be called while
    serve_forever() is running in another thread, or it will
    deadlock.


Did you try:

>>> import threading
>>> threading.Thread(target=httpd.shutdown).start()
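
To illustrate the threading requirement from the help text, here is a minimal
sketch written for Python 3. The trivial `app` and the choice of port 0 (let
the OS pick a free port) are assumptions for the example. The loop runs in a
worker thread, so the main thread can call shutdown() without deadlocking:

```python
import threading
from wsgiref.simple_server import make_server

def app(environ, start_response):
    # Trivial WSGI application, only here so the server has something to serve.
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"hello"]

# Port 0 asks the operating system for any free port.
httpd = make_server("127.0.0.1", 0, app)

# Run the loop in its own thread ...
server_thread = threading.Thread(target=httpd.serve_forever)
server_thread.start()

# ... so that shutdown(), which blocks until the loop has finished,
# can safely be called from this thread.
httpd.shutdown()
server_thread.join()
httpd.server_close()
print("still serving:", server_thread.is_alive())
```

If the shutdown has to be triggered from code that itself runs inside
serve_forever(), such as a request handler, the helper thread shown above
serves the same purpose.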

Cheers,
Ian



[issue10771] descriptor protocol documentation has two different definitions of owner class

2010-12-27 Thread Raymond Hettinger

Raymond Hettinger rhettin...@users.sourceforge.net added the comment:

I agree that the owner terminology is imprecise.
Will work on a doc fix when I get a chance.

--
assignee: d...@python -> rhettinger
nosy: +rhettinger

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue10771
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue10774] test_logging leaves temp files

2010-12-27 Thread Vinay Sajip

Vinay Sajip vinay_sa...@yahoo.co.uk added the comment:

Fix checked into py3k (r87512).

--
resolution:  -> fixed
status: open -> closed

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue10774
___



[issue10626] Bad interaction between test_logging and test_concurrent_futures

2010-12-27 Thread Vinay Sajip

Vinay Sajip vinay_sa...@yahoo.co.uk added the comment:

The reason for the bad interaction is that some of the tests in test_logging 
disable all existing loggers (due to the configuration tests - disabling of 
existing loggers is explicitly tested for), but as a side effect this also 
disabled the concurrent.futures logger.

I've made a change to test_logging which preserves the disabled state of all 
existing loggers across tests, and now all is well when testing

regrtest.py test_concurrent_futures test_logging test_concurrent_futures

after applying Brian's patch of 24 Dec 2010.

The change has been checked into py3k (r87513). However, this raises the wider 
issue of other loggers in stdlib and the effect on them of logging 
configuration calls. I'll raise this on python-dev for discussion.
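
The side effect can be reproduced in a few lines. This is only a sketch: the
logger name `demo_worker` is a hypothetical stand-in for a library logger such
as the one in concurrent.futures, and the save/restore step is a simplified
version of the idea, not the actual test_logging change:

```python
import logging
import logging.config

worker_log = logging.getLogger("demo_worker")  # hypothetical library logger
state_before = worker_log.disabled

# A configuration call that does not mention the logger disables it.
# disable_existing_loggers is spelled out here; True is also its default.
logging.config.dictConfig({"version": 1, "disable_existing_loggers": True})
state_after_config = worker_log.disabled

# Restore the saved state, as a test's cleanup step might do.
worker_log.disabled = state_before
print(state_before, state_after_config, worker_log.disabled)
```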

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue10626
___



[issue10781] minidom Node.writexml method doesn't manage encoding parameter correctly

2010-12-27 Thread Goffi

New submission from Goffi go...@goffi.org:

G'day,

While translating my software to French, I realised that minidom's writexml 
method doesn't handle the encoding parameter correctly: it changes the header of 
the resulting XML, but not the encoding itself (which it should, according to 
the documentation: http://docs.python.org/library/xml.dom.minidom.html).

The attached example doesn't work with writexml, but if I save the output myself 
using toxml's encoding parameter (as in the commented line), it works as expected.

Anyway, it would be better if minidom could handle unicode strings directly.

--
components: XML
files: test_minidom.py
messages: 124709
nosy: Goffi
priority: normal
severity: normal
status: open
title: minidom Node.writexml method doesn't manage encoding parameter correctly
versions: Python 2.6
Added file: http://bugs.python.org/file20174/test_minidom.py

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue10781
___



[issue10781] minidom Node.writexml method doesn't manage encoding parameter correctly

2010-12-27 Thread Martin v . Löwis

Martin v. Löwis mar...@v.loewis.de added the comment:

The documentation is incorrect; writexml does not support an encoding 
parameter. Only Document nodes support the encoding parameter in writexml, and 
it is intentional that its only effect is to fill out the XML declaration.
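
A short sketch of the difference, written for Python 3 (the report itself is
against 2.6, so details there may differ). The sample document is made up for
the example:

```python
import io
from xml.dom import minidom

doc = minidom.parseString("<greeting>caf\u00e9</greeting>")

# writexml() writes text to the given writer; the encoding argument
# only ends up in the XML declaration.
buf = io.StringIO()
doc.writexml(buf, encoding="utf-8")
declared = buf.getvalue()

# toxml() with an encoding returns the encoded bytes themselves.
encoded = doc.toxml(encoding="utf-8")

print(declared)
print(encoded)
```

With writexml() the caller remains responsible for encoding the text when it
is written out, for example by opening the output file with the same encoding.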

I don't understand the last sentence in your report: what is it that you want 
to see supported, and how is that related to this issue?

--
assignee:  -> d...@python
components: +Documentation
nosy: +d...@python, loewis

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue10781
___



[issue10781] minidom Node.writexml method doesn't manage encoding parameter correctly

2010-12-27 Thread Goffi

Goffi go...@goffi.org added the comment:

Thanks for your quick reply.

The last sentence has nothing to do with the report; it was just a general 
remark that it would be nice if minidom could support unicode strings directly.

Should I send a mail to d...@python.org to report the doc issue, or is this one 
sufficient?

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue10781
___



[issue8889] test_support.transient_internet fails on Freebsd because socket has no attribute EAI_NODATA

2010-12-27 Thread R. David Murray

R. David Murray rdmur...@bitdance.com added the comment:

I never forward ported this, but it was fixed in a different way in python3 
during a complete rewrite of transient_internet for other reasons.

--
resolution:  -> fixed
stage: commit review -> committed/rejected
status: open -> closed

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue8889
___



[issue8898] The email package should defer to the codecs module for all aliases

2010-12-27 Thread R. David Murray

R. David Murray rdmur...@bitdance.com added the comment:

Too late for 3.2, will implement for 3.3.

--
title: The email package should defer to the codecs module for  all aliases -> 
The email package should defer to the codecs module for all aliases
versions: +Python 3.3 -Python 3.2

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue8898
___



[issue1685453] email package should work better with unicode

2010-12-27 Thread R. David Murray

R. David Murray rdmur...@bitdance.com added the comment:

Now that we are primarily focused on Python3 development, collecting unicode 
issues is not really all that useful (at least not to me, and I'm currently 
doing the email maintenance), so I'm closing this.  All the relevant issues are 
assigned to me anyway, so I'll be dealing with them by and by.

--
dependencies:  -Add decode_header_as_string method to email.utils, Add utf8 
alias for email charsets, Unicode email address helper, email package and 
Unicode strings handling, email.Header (via add_header) encodes non-ASCII 
content incorrectly, email.Header encode() unicode P2.6, email.header unicode 
fix, email.parser: impossible to read messages encoded in a different encoding, 
email/base64mime.py cannot work, email/charset.py convert() patch, smtplib is 
broken in Python3, unicode in email.MIMEText and email/Charset.py
resolution:  -> out of date
stage: unit test needed -> committed/rejected
status: open -> closed

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue1685453
___



[issue10781] minidom Node.writexml method doesn't manage encoding parameter correctly

2010-12-27 Thread Martin v . Löwis

Martin v. Löwis mar...@v.loewis.de added the comment:

> The last sentence has nothing to do with the report, it was just a general
> remark that it would be nice if minidom could support unicode string directly.

minidom most certainly supports Unicode directly. All element names,
attribute names, and text nodes carry Unicode objects.

> Should I send a mail to d...@python.org to report the doc issue,
> or this one is sufficient ?

This one is sufficient.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue10781
___



[issue1243730] Big speedup in email message parsing

2010-12-27 Thread R. David Murray

R. David Murray rdmur...@bitdance.com added the comment:

Since this is a performance hack and is considerably invasive of the feedparser 
code (and needs updating), I'm deferring it to 3.3.

--
stage: unit test needed -> patch review
versions: +Python 3.3 -Python 2.7, Python 3.1, Python 3.2

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue1243730
___



[issue3244] multipart/form-data encoding

2010-12-27 Thread R. David Murray

Changes by R. David Murray rdmur...@bitdance.com:


--
versions: +Python 3.3 -Python 3.2

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue3244
___



[issue10764] sysconfig and alternative implementations

2010-12-27 Thread Arfrever Frehtes Taifersar Arahesis

Changes by Arfrever Frehtes Taifersar Arahesis arfrever@gmail.com:


--
nosy: +Arfrever

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue10764
___



[issue1162477] Parsing failures in parsedate_tz

2010-12-27 Thread R. David Murray

R. David Murray rdmur...@bitdance.com added the comment:

Somehow I missed this in my pre-beta feature request review :(

--
versions: +Python 3.3 -Python 3.2

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue1162477
___



[issue9864] email.utils.{parsedate, parsedate_tz} should have better return types

2010-12-27 Thread R. David Murray

Changes by R. David Murray rdmur...@bitdance.com:


--
versions: +Python 3.3 -Python 3.2

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue9864
___



[issue1043706] External storage protocol for large email messages

2010-12-27 Thread R. David Murray

Changes by R. David Murray rdmur...@bitdance.com:


--
versions: +Python 3.3 -Python 3.2

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue1043706
___



[issue10764] sysconfig and alternative implementations

2010-12-27 Thread Tarek Ziadé

Tarek Ziadé ziade.ta...@gmail.com added the comment:

Yes that's what we said we would do, and was the second step after the 
extraction of sysconfig from distutils.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue10764
___



[issue6533] Make test_xmlrpc_net functional in the absence of time.xmlrpc.com

2010-12-27 Thread R. David Murray

R. David Murray rdmur...@bitdance.com added the comment:

The skip was added and the service is back and has been for a while, so I'm 
closing this, but see also issue 6027.

--
resolution:  -> out of date
stage: patch review -> committed/rejected
status: open -> closed

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue6533
___



[issue10753] request_uri method of wsgiref module does not support RFC1808 params.

2010-12-27 Thread Senthil Kumaran

Senthil Kumaran orsent...@gmail.com added the comment:

I agree that semicolon-separated segments (params) can be in the PATH portion of 
the URL. I was trying to find out how path;params would be useful in wsgiref 
request_uri's PATH_INFO variable, wherein I assumed PATH_INFO should be a 
file-system path or a method name.

After doing a bit of study, I find that ';' can be part of PATH_INFO in wsgiref 
compliant servers. I found a couple of bugs related to issues where ';' in 
PATH_INFO is not handled properly in other systems - http://bit.ly/g4UHhX

So, I think we can treat ';' as a safe character so that it is not quoted.

Also, RFC 3986 Section 3.3 says that ';', '=' and ',' can be considered safe 
in the PATH component. Should we include those too?
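
The effect of the safe list can be seen with the quoting function itself. This
sketch uses Python 3's urllib.parse.quote directly and is not the
wsgiref.util.request_uri code; the path value is a made-up example:

```python
from urllib.parse import quote

path = "/catalog;lang=en,de/item"   # hypothetical PATH_INFO value

# By default only "/" is treated as safe, so the params get percent-encoded.
default_quoted = quote(path)

# Adding ';', '=' and ',' to the safe list leaves the params readable.
params_kept = quote(path, safe="/;=,")

print(default_quoted)
print(params_kept)
```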

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue10753
___



[issue10782] Not possible to cross-compile due to poor detection of %lld support in printf

2010-12-27 Thread Ben Gamari

New submission from Ben Gamari bgam...@gmail.com:

Configure.in assumes that %lld is not supported by printf when cross-compiling. 
This causes build errors in pyport.h:

In file included from Include/Python.h:58:0,
 from Parser/parser.c:8:
Include/pyport.h:243:13: error: #error This platform's pyconfig.h needs to 
define PY_FORMAT_LONG_LONG
...

What is one supposed to do about this, short of changing the configure script to 
assume support by default?

--
components: Build
messages: 124722
nosy: bgamari
priority: normal
severity: normal
status: open
title: Not possible to cross-compile due to poor detection of %lld support in 
printf
type: compile error
versions: Python 2.7

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue10782
___



[issue740495] API enhancement: poplib.MailReader()

2010-12-27 Thread R. David Murray

Changes by R. David Murray rdmur...@bitdance.com:


--
versions: +Python 3.3 -Python 3.2

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue740495
___



[issue634412] RFC 2112 in email package

2010-12-27 Thread R. David Murray

Changes by R. David Murray rdmur...@bitdance.com:


--
versions: +Python 3.3 -Python 3.2

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue634412
___



[issue795081] email.Message param parsing problem II

2010-12-27 Thread R. David Murray

Changes by R. David Murray rdmur...@bitdance.com:


--
versions: +Python 3.3 -Python 2.7

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue795081
___



[issue1025395] email.Utils.parseaddr fails to parse valid addresses

2010-12-27 Thread R. David Murray

Changes by R. David Murray rdmur...@bitdance.com:


--
versions: +Python 3.1, Python 3.2

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue1025395
___



[issue8769] Straightforward usage of email package fails to round-trip

2010-12-27 Thread R. David Murray

Changes by R. David Murray rdmur...@bitdance.com:


--
versions: +Python 3.1, Python 3.2

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue8769
___



[issue9967] encoded_word regular expression in email.header.decode_header()

2010-12-27 Thread R. David Murray

Changes by R. David Murray rdmur...@bitdance.com:


--
versions: +Python 3.1, Python 3.2

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue9967
___



[issue9298] binary email attachment issue with base64 encoding

2010-12-27 Thread R. David Murray

Changes by R. David Murray rdmur...@bitdance.com:


--
versions: +Python 3.2

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue9298
___



[issue10753] request_uri method of wsgiref module does not support RFC1808 params.

2010-12-27 Thread R. David Murray

R. David Murray rdmur...@bitdance.com added the comment:

If the RFC says they are safe it seems like we should include them in the safe 
list.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue10753
___



[issue1379416] email.Header encode() unicode P2.6

2010-12-27 Thread R. David Murray

R. David Murray rdmur...@bitdance.com added the comment:

Committed to 2.7 in r87515.  On second thought there's no reason to forward 
port the test because Python3 doesn't have the equivalent type-promotion issues.

--
nosy:  -BreamoreBoy
resolution:  -> fixed
stage: patch review -> committed/rejected
status: open -> closed
versions:  -Python 3.1, Python 3.2

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue1379416
___



[issue8719] buildbot: segfault on FreeBSD (signal 11)

2010-12-27 Thread STINNER Victor

STINNER Victor victor.stin...@haypocalc.com added the comment:

https://github.com/haypo/faulthandler/wiki can be tried on this buildbot to get 
more information about this issue. But the module has to be installed on this 
host.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue8719
___



[issue6210] Exception Chaining missing method for suppressing context

2010-12-27 Thread Ethan Furman

Changes by Ethan Furman et...@stoneleaf.us:


--
nosy: +stoneleaf

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue6210
___



[issue9893] Usefulness of the Misc/Vim/ files?

2010-12-27 Thread Brett Cannon

Brett Cannon br...@python.org added the comment:

But if you have a local copy of the Vim files from the community what is 
preventing you from editing them for new keywords and sending a patch to the 
maintainer so that the rest of the community is brought up to speed that much 
faster?

I suspect that not many people beyond core devs use the Misc/Vim file while 
more people in the community use the vim.org files.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue9893
___



[issue10783] struct.pack() and Unicode strings

2010-12-27 Thread David Beazley

New submission from David Beazley d...@dabeaz.com:

Is the struct.pack() function supposed to automatically encode Unicode strings 
into binary?  For example:

>>> struct.pack("10s","Jalape\u00f1o")
b'Jalape\xc3\xb1o\x00'


This is Python 3.2b1.

--
components: Library (Lib)
messages: 124727
nosy: dabeaz
priority: normal
severity: normal
status: open
title: struct.pack() and Unicode strings
type: behavior
versions: Python 3.2

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue10783
___



[issue10779] Change filename encoding to FS encoding in PyErr_WarnExplicit()

2010-12-27 Thread STINNER Victor

STINNER Victor victor.stin...@haypocalc.com added the comment:

Fixed by r87517.

--
resolution:  -> fixed
status: open -> closed

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue10779
___



[issue7056] regrtest runtest_inner calls findtestdir unnecessarily

2010-12-27 Thread R. David Murray

R. David Murray rdmur...@bitdance.com added the comment:

Committed (redid, actually) 2nd patch in r87516.  I may or may not backport it.

--
resolution:  -> fixed
stage: patch review -> committed/rejected
status: open -> closed
versions:  -Python 2.6, Python 2.7, Python 3.1

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue7056
___



[issue10783] struct.pack() and Unicode strings

2010-12-27 Thread David Beazley

David Beazley d...@dabeaz.com added the comment:

Note: This is what happens in Python 2.6.4:

>>> import struct
>>> struct.pack("10s",u"Jalape\u00f1o")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
struct.error: argument for 's' must be a string


--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue10783
___



[issue10778] decoding_fgets() (tokenizer.c) decodes the filename from the wrong encoding

2010-12-27 Thread STINNER Victor

STINNER Victor victor.stin...@haypocalc.com added the comment:

Fixed by r87518.

--
resolution:  -> fixed
status: open -> closed

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue10778
___



[issue10783] struct.pack() and Unicode strings

2010-12-27 Thread David Beazley

David Beazley d...@dabeaz.com added the comment:

Hmmm. Well, the docs seem to say that it's allowed and that it will be encoded 
as UTF-8.  

Given the treatment of Unicode/bytes elsewhere in Python 3, all I can say is 
that this behavior is rather surprising.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue10783
___



[issue10783] struct.pack() and Unicode strings

2010-12-27 Thread R. David Murray

R. David Murray rdmur...@bitdance.com added the comment:

But clearly intentional, and now enshrined in released code.

--
nosy: +mark.dickinson, r.david.murray
resolution:  -> invalid
stage:  -> committed/rejected
status: open -> closed

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue10783
___



[issue4212] email.LazyImporter does not use absolute imports

2010-12-27 Thread R. David Murray

R. David Murray rdmur...@bitdance.com added the comment:

LazyImporter isn't used in Python3.  Without someone motivated to propose a 
patch this isn't going to be changed, so I'm closing the issue.

--
resolution:  -> wont fix
stage:  -> committed/rejected
status: open -> closed

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue4212
___



[issue6210] Exception Chaining missing method for suppressing context

2010-12-27 Thread Ethan Furman

Ethan Furman et...@stoneleaf.us added the comment:

I like MRAB's suggestion best:

MRAB wrote:
> Suggestion: an explicit 'raise' in the exception handler excludes the
> context, but if you want to include it then 'raise with'. For example:
>
> # Exclude the context
> try:
>     command_dict[command]()
> except KeyError:
>     raise CommandError("Unknown command")
>
> # Include the context
> try:
>     command_dict[command]()
> except KeyError:
>     raise with CommandError("Unknown command")

I think we can even strike off the verbiage "in the exception handler"... that 
way, raise always does the same thing -- raise KeyError will raise a KeyError, 
always, not sometimes a KeyError and sometimes a KeyError nested in a 
WhatEverError.
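
For comparison, later Python versions (3.3 and up) settled on a different
spelling from the 'raise with' proposed here: chaining stays on by default and
`raise ... from None` suppresses the context. A minimal sketch, with
CommandError and command_dict as hypothetical names borrowed from the quoted
example:

```python
class CommandError(Exception):
    """Hypothetical application-level error."""

command_dict = {}

def run_chained(command):
    try:
        command_dict[command]()
    except KeyError:
        # Default: the KeyError is kept as __context__ and shown in the traceback.
        raise CommandError("Unknown command")

def run_suppressed(command):
    try:
        command_dict[command]()
    except KeyError:
        # "from None" hides the KeyError from the displayed traceback.
        raise CommandError("Unknown command") from None

for runner in (run_chained, run_suppressed):
    try:
        runner("bogus")
    except CommandError as exc:
        print(runner.__name__, type(exc.__context__).__name__, exc.__suppress_context__)
```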

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue6210
___



[issue10782] Not possible to cross-compile due to poor detection of %lld support in printf

2010-12-27 Thread Roumen Petrov

Roumen Petrov bugtr...@roumenpetrov.info added the comment:

Use config.cache to set ac_cv_have_long_long_format

--
nosy: +rpetrov

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue10782
___



[issue8719] buildbot: segfault on FreeBSD (signal 11)

2010-12-27 Thread David Bolen

David Bolen db3l@gmail.com added the comment:

Wouldn't that module have to be put into the actual source tree, since the 
tests run beneath the interpreter/libraries that are built for the test?

That may be what you meant, but "installed on this host" made me think I could 
do something external on the buildbot, which I don't think would work given that 
the module has to be called from within the tests themselves.

-- David

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue8719
___



[issue6210] Exception Chaining missing method for suppressing context

2010-12-27 Thread Raymond Hettinger

Raymond Hettinger rhettin...@users.sourceforge.net added the comment:

I agree with the OP that we need a way to either suppress chaining or have it 
turned off by default.  A person writing an exception handler should have 
control over what the user sees.

--
nosy: +rhettinger

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue6210
___



[issue10783] struct.pack() and Unicode strings

2010-12-27 Thread Raymond Hettinger

Raymond Hettinger rhettin...@users.sourceforge.net added the comment:

Can we at least offer an optional choice of encoding?

--
nosy: +rhettinger

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue10783
___



[issue6210] Exception Chaining missing method for suppressing context

2010-12-27 Thread Matthew Barnett

Changes by Matthew Barnett pyt...@mrabarnett.plus.com:


--
nosy: +mrabarnett

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue6210
___



[issue8719] buildbot: segfault on FreeBSD (signal 11)

2010-12-27 Thread Mark Dickinson

Changes by Mark Dickinson dicki...@gmail.com:


--
nosy: +mark.dickinson

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue8719
___



[issue10783] struct.pack() and Unicode strings

2010-12-27 Thread David Beazley

David Beazley d...@dabeaz.com added the comment:

Why is it even encoding at all?  Almost every other part of Python 3 forces you 
to be explicit about bytes/string conversion.  For example:

struct.pack("10s", x.encode('utf-8'))

Given that automatic conversion is documented, it's not clear what can be done 
at this point.  However, there are very few other parts of Python 3 that 
perform implicit string-byte conversions like this (at least that I know of 
off-hand).
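
Spelled out as a round trip, the explicit style looks roughly like this. It is
a sketch that reuses the string from the original report; stripping NUL
padding on the way back is an assumption about how the field is used:

```python
import struct

text = "Jalape\u00f1o"

# Encode explicitly, then pack; "10s" pads the 9 UTF-8 bytes with one NUL.
packed = struct.pack("10s", text.encode("utf-8"))

# Unpack and decode explicitly on the way back, dropping the padding.
(raw,) = struct.unpack("10s", packed)
roundtrip = raw.rstrip(b"\x00").decode("utf-8")

print(packed, roundtrip)
```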

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue10783
___



[issue10783] struct.pack() and Unicode strings

2010-12-27 Thread Raymond Hettinger

Raymond Hettinger rhettin...@users.sourceforge.net added the comment:

Many of these kinds of decisions were made quickly, haphazardly, and with 
almost no discussion, and were made by contributors who were new to Python core 
development (not familiar with the API norms).

Given the rat's nest of bytes/text problems in Py3.0 and Py3.1, I think it is 
fair game to fix it now.  The APIs have not been shaken out and battle-tested 
through widespread adoption, so it was fair to expect that the first 
experienced user to come along would find these rough patches.  

ISTM, this should get fixed.  The most innocuous way to do it is to add a 
warning for the implicit conversion.  That way, any existing 3.x code (probably 
precious little) would continue to run.  Another option is to just finish the 
job by adding an encoding parameter that defaults to utf-8.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue10783
___



[issue10783] struct.pack() and Unicode strings

2010-12-27 Thread Raymond Hettinger

Raymond Hettinger rhettin...@users.sourceforge.net added the comment:

A possible answer to "why is this encoding at all": it was probably meant to make 
it easier to transition code from Python 2.x, where strings were usually ASCII and 
it would make no difference in output if encoded in UTF-8.  The 2-to-3 fixer 
was good at handling name changes but not bytes/text issues.  That is just a 
guess at what the developer may have been thinking.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue10783
___



[issue10783] struct.pack() and Unicode strings

2010-12-27 Thread David Beazley

David Beazley d...@dabeaz.com added the comment:

I encountered this issue in the context of distributed 
computing/interprocess communication involving binary-encoded records (and 
encoding/decoding such records using struct). At its core, this is all about 
I/O--something where encoding and decoding matter a lot.  Frankly, it was 
quite surprising that a unicode string would silently pass through struct and 
turn into bytes.  IMHO, the fact that this is even possible encourages a sloppy 
usage of struct that favors programming convenience over correctness--something 
that's only going to end badly for the poor soul who passes non-ASCII 
characters into struct without knowing it. 

A default encoding might be okay as long as it was set to something like ASCII 
or Latin-1 (not UTF-8).  At least then you'd get an encoding error for 
characters that don't fit into a byte.
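
A small sketch of that difference, using only str.encode().  The helper 
try_encode() is a hypothetical name introduced here for illustration: ASCII 
and Latin-1 refuse characters they cannot represent in one byte, while UTF-8 
silently produces a multi-byte sequence.

```python
def try_encode(text, codec):
    """Return the encoded bytes, or None if the codec cannot represent text."""
    try:
        return text.encode(codec)
    except UnicodeEncodeError:
        return None


for codec in ("ascii", "latin-1", "utf-8"):
    # U+00E9 fits in one Latin-1 byte; U+20AC (euro sign) does not.
    print(codec, try_encode("caf\xe9", codec), try_encode("\u20ac", codec))
```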

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue10783
___



[issue10780] Fix filename encoding in PyErr_SetFromWindowsErrWithFilename() (and PyErr_SetExcFromWindowsErrWithFilename())

2010-12-27 Thread STINNER Victor

STINNER Victor victor.stin...@haypocalc.com added the comment:

Fixed by r87519.

--
resolution:  -> fixed
status: open -> closed

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue10780
___



[issue10783] struct.pack() and Unicode strings

2010-12-27 Thread David Beazley

David Beazley d...@dabeaz.com added the comment:

Actually, here's another one of my favorite examples:

>>> import struct
>>> struct.pack("s", "\xf1")
b'\xc3'
>>>

Not only does this not encode the correct value, it doesn't even emit the 
entire UTF-8 encoding (just the first byte of it).   Like I said, pity the poor 
bastard who puts something like that in their code and spends the whole day 
trying to figure out where in the hell '\xf1' magically got turned into '\xc3'.
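
For reference, a short sketch of where the '\xc3' comes from, using explicit 
encoding so it does not depend on the implicit conversion under discussion. 
The variable names are illustrative only.

```python
import struct

# U+00F1 takes two bytes in UTF-8, but only one in Latin-1.
encoded = "\xf1".encode("utf-8")                      # b'\xc3\xb1'

# "s" without a count means "1s", so only the first byte survives.
truncated = struct.pack("s", encoded)                 # b'\xc3'

# Encoding to Latin-1 first keeps the value the caller probably expected.
latin1 = struct.pack("s", "\xf1".encode("latin-1"))   # b'\xf1'
```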

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue10783
___



[issue2636] Regexp 2.7 (modifications to current re 2.2.2)

2010-12-27 Thread Jacques Grove

Jacques Grove jacq...@tripitinc.com added the comment:

Testing issue2636-20101224.zip:

Nested modifiers seem to hang the regex compilation when used in a 
non-capturing group, e.g.:

re.compile("(?:(?i)foo)")

or

re.compile("(?:(?u)foo)")


No problem on stock Python 2.6.5 regex engine.

The unnested version of the same regex compiles fine.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue2636
___



[issue6210] Exception Chaining missing method for suppressing context

2010-12-27 Thread Antoine Pitrou

Antoine Pitrou pit...@free.fr added the comment:

> A person writing an exception handler should have control over what
> the user sees.

There is already support for this in the traceback module (see the
"chain" parameter to various funcs).
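
A minimal sketch of that parameter, assuming traceback.format_exception() is 
the function being used; format_without_context() is a hypothetical helper 
name.  Passing chain=False leaves out the "During handling of the above 
exception" section when formatting.

```python
import traceback


def format_without_context(exc):
    """Format exc alone, without its chained __context__ or __cause__."""
    lines = traceback.format_exception(type(exc), exc, exc.__traceback__,
                                       chain=False)
    return "".join(lines)


try:
    try:
        {}["missing"]
    except KeyError:
        raise ValueError("bad lookup")
except ValueError as exc:
    # Default formatting includes the KeyError that was being handled.
    full = "".join(traceback.format_exception(type(exc), exc,
                                              exc.__traceback__))
    short = format_without_context(exc)
```

This only affects how the handler formats the traceback itself; it does not 
change what the interpreter prints for an uncaught exception.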

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue6210
___



[issue10783] struct.pack() and Unicode strings

2010-12-27 Thread STINNER Victor

STINNER Victor victor.stin...@haypocalc.com added the comment:

This feature was introduced in a big commit from Guido van Rossum (made 
before Python 3.0): r55500. The changelog is strange because it starts with 
"Make test_zipfile pass. The zipfile module now does all I/O in binary mode 
using bytes." but ends with "The _struct needed a patch to support bytes, str8 
and str for the 's' and 'p' formats.". Why was _struct patched at the same time?

Implicit conversion between bytes and str is a very bad idea; it is the root of 
all confusion related to Unicode. The experience with Python 2 demonstrated that 
it should be changed, and it was changed in Python 3.0. But Python 3.0 is a big 
project with many modules. Some modules were completely broken in Python 3.0; 
they work better with 3.1, and we hope that they will be even better with 3.2.

The attached patch removes the implicit conversion for 'c', 's' and 'p' 
formats. I made a similar change in ctypes 5 months ago: issue #8966.

If a program written for Python 3.1 fails because of the patch, it can use 
explicit conversion to stay compatible with 3.1 and 3.2 (patched). I think that 
it's better to use explicit conversion.
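
As an illustration of such explicit conversion (a sketch, not taken from the 
patch), the caller encodes first and passes bytes, which struct accepts with 
or without the implicit conversion.  The variable names are illustrative.

```python
import struct

name = "h\xe9"                    # two characters
data = name.encode("utf-8")       # three bytes; the caller picks the encoding

# Size the field from the encoded length, not from len(name).
record = struct.pack("%ds" % len(data), data)
```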

Implicit conversion on the 'c' format is really weird and it was not documented 
correctly: note (1) is attached to the "b" format, not to the "c" format. 
Example:

   >>> struct.pack('c', 'é')
   struct.error: char format requires bytes or string of length 1
   >>> len('é')
   1

There is also a length issue with the "s" format: struct.pack() truncates a 
unicode string to a length in bytes, not in characters, which is confusing.

   >>> struct.pack('2s', 'ha')
   b'ha'
   >>> struct.pack('2s', 'hé')
   b'h\xc3'
   >>> struct.pack('3s', 'hé')
   b'h\xc3\xa9'

Finally, I don't like implicit conversion from unicode to bytes on pack, 
because it's not symmetrical.

   >>> struct.pack('3s', 'hé')
   b'h\xc3\xa9'
   >>> struct.unpack('3s', b'h\xc3\xa9')
   (b'h\xc3\xa9',)

(str -> pack() -> unpack() -> bytes)
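
For comparison, a sketch of a symmetrical round trip in which both directions 
convert explicitly.  The helpers pack_name() and unpack_name(), the 8-byte 
field size and the UTF-8 default are assumptions made for this example only; 
they are not part of struct or of the attached patch.

```python
import struct


def pack_name(name, size=8, encoding="utf-8"):
    """Encode explicitly and refuse values that would be truncated."""
    data = name.encode(encoding)
    if len(data) > size:
        raise ValueError("encoded name does not fit in %d bytes" % size)
    return struct.pack("%ds" % size, data)   # shorter values are NUL-padded


def unpack_name(record, size=8, encoding="utf-8"):
    """Reverse pack_name(): strip the NUL padding, then decode."""
    (data,) = struct.unpack("%ds" % size, record)
    return data.rstrip(b"\x00").decode(encoding)
```

Here the chain is str -> encode -> pack() -> unpack() -> decode -> str, and a 
value that would lose bytes raises an error instead of being cut in the middle 
of a character.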

--
keywords: +patch
nosy: +haypo
Added file: http://bugs.python.org/file20175/struct.patch

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue10783
___


