[issue39468] Improved the site module's permission handling while writing .python_history

2021-05-03 Thread Aurora


Change by Aurora :


--
pull_requests: +24543
status: pending -> open
pull_request: https://github.com/python/cpython/pull/18210

___
Python tracker 
<https://bugs.python.org/issue39468>
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue39480] referendum reference is needlessly annoying

2020-02-03 Thread Aurora


Change by Aurora :


--
type:  -> enhancement

___
Python tracker 
<https://bugs.python.org/issue39480>
___



[issue39468] Improved the site module's permission handling while writing .python_history

2020-01-31 Thread Aurora


Change by Aurora :


--
title: .python_history write permission improvements -> Improved the site 
module's permission handling while writing .python_history

___
Python tracker 
<https://bugs.python.org/issue39468>
___



[issue39468] Improved the site module's permission handling while writing .python_history

2020-01-31 Thread Aurora


Change by Aurora :


--
status: open -> pending

___
Python tracker 
<https://bugs.python.org/issue39468>
___



[issue39468] .python_history write permission improvements

2020-01-31 Thread Aurora


Change by Aurora :


--
pull_requests:  -17675

___
Python tracker 
<https://bugs.python.org/issue39468>
___



[issue39468] .python_history write permission improvements

2020-01-31 Thread Aurora


Change by Aurora :


--
pull_requests: +17677
pull_request: https://github.com/python/cpython/pull/18299

___
Python tracker 
<https://bugs.python.org/issue39468>
___



[issue39468] .python_history write permission improvements

2020-01-31 Thread Aurora


Change by Aurora :


--
pull_requests:  -17589

___
Python tracker 
<https://bugs.python.org/issue39468>
___



[issue39468] .python_history write permission improvements

2020-01-31 Thread Aurora


Change by Aurora :


--
pull_requests:  -17674

___
Python tracker 
<https://bugs.python.org/issue39468>
___



[issue39468] .python_history write permission improvements

2020-01-31 Thread Aurora


Change by Aurora :


--
pull_requests: +17675
pull_request: https://github.com/python/cpython/pull/39468

___
Python tracker 
<https://bugs.python.org/issue39468>
___



[issue39468] .python_history write permission improvements

2020-01-31 Thread Aurora


Change by Aurora :


--
pull_requests: +17674
pull_request: https://github.com/python/cpython/pull/18299

___
Python tracker 
<https://bugs.python.org/issue39468>
___



[issue39455] Update the documentation for the linecache module

2020-01-31 Thread Aurora


Change by Aurora :


--
stage:  -> resolved
status: open -> closed

___
Python tracker 
<https://bugs.python.org/issue39455>
___



[issue39314] (readline) Autofill the closing parenthesis during auto-completion for functions which accept no arguments at all

2020-01-31 Thread Aurora


Change by Aurora :


--
title: Autofill the closing paraenthesis during auto-completion for functions 
which accept no arguments at all -> (readline) Autofill the closing parenthesis 
during auto-completion for functions which accept no arguments at all

___
Python tracker 
<https://bugs.python.org/issue39314>
___



[issue39455] Update the documentation for the linecache module

2020-01-31 Thread Aurora


Change by Aurora :


--
title: Update the documentation for linecache module -> Update the 
documentation for the linecache module

___
Python tracker 
<https://bugs.python.org/issue39455>
___



[issue39480] referendum reference is needlessly annoying

2020-01-28 Thread Aurora


Aurora  added the comment:

This example is practically against Python's diversity statement.

--
nosy: +opensource-assist

___
Python tracker 
<https://bugs.python.org/issue39480>
___



[issue39468] .python_history write permission improvements

2020-01-27 Thread Aurora


Change by Aurora :


--
pull_requests: +17589
pull_request: https://github.com/python/cpython/pull/18210

___
Python tracker 
<https://bugs.python.org/issue39468>
___



[issue39468] .python_history write permission improvements

2020-01-27 Thread Aurora


Change by Aurora :


--
pull_requests:  -17586

___
Python tracker 
<https://bugs.python.org/issue39468>
___



[issue39468] .python_history write permission improvements

2020-01-27 Thread Aurora


Change by Aurora :


--
keywords: +patch
pull_requests: +17586
stage:  -> patch review
pull_request: https://github.com/python/cpython/pull/18210

___
Python tracker 
<https://bugs.python.org/issue39468>
___



[issue39468] .python_history write permission improvements

2020-01-27 Thread Aurora


Aurora  added the comment:

https://github.com/opensource-assist/cpython/blob/opensource-assist-patch-sitepy-1/Lib/site.py

--

___
Python tracker 
<https://bugs.python.org/issue39468>
___



[issue39468] .python_history write permission improvements

2020-01-27 Thread Aurora


New submission from Aurora :

On a typical Linux system, if you run 'chattr +i /home/user/.python_history'
and then start python and exit it, the following error message is printed:
Error in atexit._run_exitfuncs:
Traceback (most recent call last):
  File "/usr/local/lib/python3.9/site.py", line 446, in write_history
    readline.write_history_file(history)
OSError: [Errno -1] Unknown error -1


With a simple improvement, the site module could detect this case and suggest
that the user run 'chattr -i' on the .python_history file.

Additionally, I don't know whether it would be a good idea to run 'chattr -i'
automatically in such a situation.
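Below is a minimal sketch of the kind of check proposed above. It is not the code CPython ships. It assumes a hypothetical helper, `write_history`, that receives the writer function as an argument; in site.py that would be `readline.write_history_file`. The helper catches the opaque `OSError` and prints a hint in place of the atexit traceback.

```python
import sys


def write_history(write_history_file, history_path, stream=None):
    """Hypothetical helper: write the REPL history, turning an opaque
    OSError (e.g. from an immutable file) into a readable hint."""
    if stream is None:
        stream = sys.stderr
    try:
        write_history_file(history_path)
    except OSError as exc:
        print("Could not write history file %s: %s" % (history_path, exc),
              file=stream)
        # 'chattr -i' only applies on Linux filesystems that support
        # file attributes; elsewhere the hint is merely informative.
        print("If the file is immutable, try: chattr -i %s" % history_path,
              file=stream)
        return False
    return True
```

In site.py this could be registered as `atexit.register(write_history, readline.write_history_file, history)`. The sketch only suggests the fix. Running `chattr -i` automatically is deliberately left out, which matches the open question above.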

--
components: Library (Lib)
messages: 360790
nosy: opensource-assist
priority: normal
severity: normal
status: open
title: .python_history write permission improvements
type: enhancement
versions: Python 3.9

___
Python tracker 
<https://bugs.python.org/issue39468>
___



[issue39455] Update the documentation for linecache module

2020-01-25 Thread Aurora


New submission from Aurora :

Added the definitions for two undocumented functions.

--
assignee: docs@python
components: Documentation
messages: 360709
nosy: docs@python, opensource-assist
priority: normal
pull_requests: 17572
severity: normal
status: open
title: Update the documentation for linecache module
type: enhancement
versions: Python 3.9

___
Python tracker 
<https://bugs.python.org/issue39455>
___



[issue39449] New Assignment operator

2020-01-25 Thread Aurora


Aurora  added the comment:

That's a nice simple idea.

--
nosy: +opensource-assist

___
Python tracker 
<https://bugs.python.org/issue39449>
___



[issue39314] Autofill the closing paraenthesis during auto-completion for functions which accept no arguments at all

2020-01-13 Thread Aurora


Change by Aurora :


--
versions:  -Python 3.9

___
Python tracker 
<https://bugs.python.org/issue39314>
___



[issue39319] ntpath module must not be available on POSIX platforms

2020-01-13 Thread Aurora


Aurora  added the comment:

@eryksun Then the documentation should be modified to note that they work on
both platforms.
I had seen ntpath working on my Linux system, but the documentation was
misleading.
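A quick check of the point discussed here: both path modules are pure string manipulation and import on any platform. `os.path` is simply whichever of the two matches the host.

```python
import ntpath
import os
import posixpath

# Windows-style and POSIX-style path handling, regardless of the host OS.
print(ntpath.join("C:\\Users", "aurora", "file.txt"))  # C:\Users\aurora\file.txt
print(posixpath.join("/home", "aurora", "file.txt"))   # /home/aurora/file.txt
print(ntpath.splitdrive("C:\\Windows\\System32"))      # ('C:', '\\Windows\\System32')

# os.path is an alias for one of the two modules.
print(os.path is ntpath or os.path is posixpath)       # True
```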

--

___
Python tracker 
<https://bugs.python.org/issue39319>
___



[issue39319] ntpath module must not be available on POSIX platforms

2020-01-13 Thread Aurora


New submission from Aurora :

According to https://docs.python.org/dev/library/undoc.html the 'ntpath' module
is an "Implementation of os.path on Win32 and Win64 platforms".
Just like all other Windows-specific modules (like winreg), 'ntpath' must not
be available for use on a POSIX system like Linux.
I guess that 'posixpath' is also available on Windows; if it is, it must not
be available there either.

--
components: Interpreter Core
messages: 359897
nosy: opensource-assist
priority: normal
severity: normal
status: open
title: ntpath module must not be available on POSIX platforms
type: behavior
versions: Python 3.9

___
Python tracker 
<https://bugs.python.org/issue39319>
___



[issue39319] ntpath module must not be available on POSIX platforms

2020-01-13 Thread Aurora


Change by Aurora :


--
components: +Library (Lib) -Interpreter Core
type: behavior -> 

___
Python tracker 
<https://bugs.python.org/issue39319>
___



[issue39314] Autofill the closing paraenthesis during auto-completion for functions which accept no arguments at all

2020-01-12 Thread Aurora


Change by Aurora :


--
title: Autofill the closing paraenthesis during auto-completion for functions 
which accept no arguments -> Autofill the closing paraenthesis during 
auto-completion for functions which accept no arguments at all

___
Python tracker 
<https://bugs.python.org/issue39314>
___



[issue39304] Don't accept a negative number for the count argument in str.replace(old, new[, count])

2020-01-12 Thread Aurora


Aurora  added the comment:

@xtreak
Understood. Just as an afterthought:
I still disagree a little with such an implementation, because it goes so far
into terse coding that it contradicts the principles of mathematics, which
are the basis of computer science and programming.
Python could use another special keyword or something (e.g. the Ellipsis
notation) for this and all similar cases.
You'll get into trouble if you want to explain such a thing to a
mathematician, or if you want to write some pseudo-code based on it, since in
both cases nobody is going to look at the underlying implementation.
It is a bad practice from C, carried over by CPython and spread to others.

--

___
Python tracker 
<https://bugs.python.org/issue39304>
___



[issue39314] Autofill the closing paraenthesis during auto-completion for functions which accept no arguments

2020-01-12 Thread Aurora


New submission from Aurora :

If Python is compiled with the GNU readline headers, it provides
autocompletion for Python functions, etc.

In the Python interpreter environment, if a function name is typed partially,
Python will fill in the rest when a tab character is typed.

If a function accepts no arguments, Python still doesn't fill in the closing
parenthesis during autocompletion, in the hope that the user will provide
arguments, but in such a case that is pointless.
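Here is a sketch of the proposed rule, using `inspect.signature` to decide whether a completed callable gets `()` or just `(`. The helper `complete_callable` is hypothetical. It is not the rlcompleter API, only an illustration of the decision the completer would need to make.

```python
import inspect


def complete_callable(word, obj):
    """Hypothetical helper: completion text for a name bound to obj.

    Appends '()' when the callable takes no parameters at all, and an
    open '(' otherwise so the user can go on typing arguments."""
    if not callable(obj):
        return word
    try:
        params = inspect.signature(obj).parameters
    except (TypeError, ValueError):
        # Some builtins expose no introspectable signature; stay conservative.
        return word + "("
    return word + ("()" if not params else "(")
```

A function whose parameters all have defaults still gets only `(`, since it does accept arguments.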

--
components: Interpreter Core
messages: 359855
nosy: opensource-assist
priority: normal
severity: normal
status: open
title: Autofill the closing paraenthesis during auto-completion for functions 
which accept no arguments
type: enhancement
versions: Python 3.9

___
Python tracker 
<https://bugs.python.org/issue39314>
___



[issue39304] Don't accept a negative number for the count argument in str.replace(old, new[, count])

2020-01-11 Thread Aurora


New submission from Aurora :

It's meaningless for the count argument to have a negative value, since there's 
no such thing as negative count for something.
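For reference, this is the behavior the issue objects to. CPython accepts a negative count and treats it like no limit at all.

```python
s = "a-b-c-d"
print(s.replace("-", "+"))      # a+b+c+d  (no count: replace every occurrence)
print(s.replace("-", "+", 2))   # a+b+c-d  (only the first two)
print(s.replace("-", "+", 0))   # a-b-c-d  (none)
print(s.replace("-", "+", -1))  # a+b+c+d  (negative count: same as "replace all")
```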

--
components: Library (Lib)
messages: 359795
nosy: opensource-assist
priority: normal
severity: normal
status: open
title: Don't accept a negative number for the count argument in 
str.replace(old, new[,count])
type: behavior
versions: Python 3.9

___
Python tracker 
<https://bugs.python.org/issue39304>
___



[issue38441] failing to build the Documentation

2019-10-11 Thread Aurora


New submission from Aurora :

I'm failing to build the cpython/Doc dir.

The full build log is as follows:

mkdir -p build
Building NEWS from Misc/NEWS.d with blurb
PATH=./venv/bin:$PATH sphinx-build -b epub -d build/doctrees -D latex_elements.papersize=  -W . build/epub
Running Sphinx v2.2.0
making output directory... done
building [mo]: targets for 0 po files that are out of date
building [epub]: targets for 476 source files that are out of date
updating environment: [new config] 476 added, 0 changed, 0 removed
reading sources... [100%] whatsnew/index

 

Warning, treated as error:
/home/aurora/A.Code/Python/Reference/python/cpython/Doc/library/email.message.rst:4:duplicate object description of email.message, other instance in library/email.compat32-message, use :noindex: for one of them
make: *** [Makefile:46: build] Error 2


Running on Debian Experimental kernel v5.3

--
assignee: docs@python
components: Documentation
messages: 354425
nosy: aurora, docs@python
priority: normal
severity: normal
status: open
title: failing to build the Documentation
type: compile error
versions: Python 3.9

___
Python tracker 
<https://bugs.python.org/issue38441>
___



Re: Unicode question : turn "José" into u"José"

2006-04-05 Thread aurora
First of all, if you run this on the console, find out your console's
encoding. Mine is English Windows XP, which uses 'cp437'.

C:\>chcp
Active code page: 437

Then

>>> s = "José"
>>> u = u"Jos\u00e9"   # same thing in unicode escape
>>> s.decode('cp437') == u   # use the encoding that matches your console
True
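The same point in Python 3 terms, where byte strings and text are separate types. The bytes a cp437 console delivers for "José" only decode back to the right text with that same codec.

```python
# Bytes as a cp437 console (e.g. English Windows XP) would deliver them.
raw = "José".encode("cp437")
print(raw)                                  # b'Jos\x82'
print(raw.decode("cp437") == "Jos\u00e9")   # True
# Decoding the same bytes with the wrong codec yields different text.
print(raw.decode("latin-1") == "José")      # False
```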


wy




 This is probably stupid and/or misguided but supposing I'm passed a  
 byte-string value that I want to be unicode, this is what I do. I'm sure  
 I'm missing something very important.

 Short version :

 >>> s = "José" #Start with non-unicode string
 >>> unicoded = eval("u'%s'" % "José")

 Long version :

 >>> s = "José" #Start with non-unicode string
 >>> s  #Lets look at it
 'Jos\xe9'
 >>> escaped = s.encode('string_escape')
 >>> escaped
 'Jos\\xe9'
 >>> unicoded = eval("u'%s'" % escaped)
 >>> unicoded
 u'Jos\xe9'

 >>> test = u"José"   #What they should have passed me
 >>> test == unicoded #Am I really getting the same thing?
 True #Yay!





-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Design mini-language for data input

2006-03-21 Thread aurora
Yes. But they have different motivations.

The mini-language concept is to design an input format that is convenient
for a human editor and that stays close to the semi-structured data source. I
think the benefits of ease of editing and flexibility justify writing a
little parsing code.

JSON is mainly designed for data exchange between programs. You can hand
edit JSON data (as well as XML or Python statements), but it is not the most
convenient.

Just not having to type two quotes for every string object is almost
liberating. These quotes are only artifacts of the structured data format.
The idea is to design a format convenient for humans and let code parse it
and build the data structure.

wy



 Hmm,
 Do you know about JSON and YAML?
   http://en.wikipedia.org/wiki/JSON
   http://en.wikipedia.org/wiki/YAML

 They have the advantage of being maintained by a group of people and
 being available for a number of languages. (as well as NOT being XML
 :-)

 - Cheers, Paddy.
 --
 http://paddy3118.blogspot.com/




Re: Design mini-language for data input

2006-03-21 Thread aurora
P.S. It is also a 'mini-language' because it is an ad-hoc design that is
good enough and can easily be implemented for a given problem. This is as
opposed to a general-purpose solution like XML, which is one translation
removed from the original data format and carries too much baggage.

 Just consider you don't have to enter two quotes for every string object  
 is almost liberating. These quotes are only artifacts for structured  
 data format. The idea to design a format convenient for human and let  
 code to parse and built the data structure.

 wy


Design mini-language for data input

2006-03-20 Thread aurora
This is an entry I just added to ASPN. It is a somewhat novel technique I
have employed quite successfully in my code. I am reposting it here for more
exposure and discussion.

http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/475158

wy



Title: Design mini-language for data input


Description:

Many programs need a set of initial data. For ease of use and flexibility,  
design a mini-language for your input data. Use Python's superb text  
handling capability to parse and build the data structure from the input  
text.

Source: Text Source
# this is an example to demonstrate the programming technique
import string

DATA = """
# data source: http://www.mongabay.com/igapo/world_statistics_by_pop.htm
# Country / Capital / Area [sq. km] / 2002 Population Estimate
China / Beijing / 9,596,960 / 1,284,303,705
India / New Delhi / 3,287,590 / 1,045,845,226
United States / Washington DC / 9,629,091 / 280,562,489
Indonesia / Jakarta / 1,919,440 / 231,328,092
Russia / Moscow / 17,075,200 / 144,978,573
"""


def initData():
    """ parse and return a country list of (name, capital, area, population) """

    countries = []
    for line in DATA.splitlines():

        # filter out blank lines/comment lines
        line = line.strip()
        if not line or line.startswith('#'):
            continue

        # 4 fields separated by '/'
        parts = map(string.strip, line.split('/'))
        country, capital, area, population = parts

        # remove commas in numbers
        area = int(area.replace(',',''))
        population = int(population.replace(',',''))

        countries.append((country, capital, area, population))

    return countries


def findLargestCountry(countries):
    # your algorithm here; e.g. the largest by area
    return max(countries, key=lambda c: c[2])


def main():
    countries = initData()
    print findLargestCountry(countries)


if __name__ == '__main__':
    main()


Discussion:

Problem
---

Many programs need a set of initial data. The simplest way is to construct
the Python data structure directly, as shown below. This is often not ideal.
Algorithms and data structures tend to change. Python program statements are
likely to differ literally from the data source, which might be text pulled
from web pages or other places. This means a great deal of effort is often
needed to format and maintain the input as Python statements.

This is a sample program that initializes some geographical data.

# map of country -> (capital, area, population)
COUNTRIES = {}
COUNTRIES['China'] = ('Beijing', 9596960, 1284303705)
COUNTRIES['India'] = ('New Delhi', 3287590, 1045845226)
COUNTRIES['United States'] = ('Washington DC', 9629091, 280562489)
COUNTRIES['Indonesia'] = ('Jakarta', 1919440, 231328092)
COUNTRIES['Russia'] = ('Moscow', 17075200, 144978573)


Mini-language
-

A more flexible approach is to define a mini-language to describe the
data. This can be as simple as formatting the data into a multi-line string.

1. Define the data format in text. It should mirror the data source and be
designed for ease of human editing.

2. Define the data structure.

3. Write glue code to parse the input data and initialize the data  
structure.

In the example above we use one line for each record. Each record has four
fields, country, capital, area and population, separated by slashes. One
of the immediate benefits is that we no longer need to type so many quotes
for string literals. This concise data format is much easier to read and
edit than Python statements.

The parser simply breaks down the input text using splitlines() and then
loops through it line by line. It is useful to allow for some extra white
space so that the format is more forgiving for a human editor. In this case
the numbers (area, population) from the data source contain commas. Rather
than being edited out manually, they are copied into the text as is and then
parsed into integers using

area = int(area.replace(',',''))

Slash is chosen as the separator (rather than the more common comma)
because it does not otherwise appear in the data. A record is parsed into
fields using

line.split('/')

Don't forget to remove extra white space using string.strip()

Finally it builds a list of country records, each a tuple of
(country, capital, area, population). It is just as easy to turn them into
objects or any other data structure as desired.
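As an illustration of turning the records into objects, here is a Python 3 sketch of the same parser that builds `namedtuple` records. The names `Country` and `parse_countries`, and the shortened `DATA`, are hypothetical choices for this sketch, not part of the recipe.

```python
from collections import namedtuple

# Hypothetical record type; field order follows the recipe's tuples.
Country = namedtuple("Country", "name capital area population")

DATA = """
# Country / Capital / Area [sq. km] / 2002 Population Estimate
China / Beijing / 9,596,960 / 1,284,303,705
India / New Delhi / 3,287,590 / 1,045,845,226
Russia / Moscow / 17,075,200 / 144,978,573
"""


def parse_countries(text):
    """Parse the slash-separated mini-language into Country records."""
    records = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        name, capital, area, population = (f.strip() for f in line.split("/"))
        records.append(Country(name, capital,
                               int(area.replace(",", "")),
                               int(population.replace(",", ""))))
    return records


countries = parse_countries(DATA)
largest = max(countries, key=lambda c: c.area)
print(largest.name, largest.area)  # Russia 17075200
```

Named fields make later code like `c.area` self-documenting. The input format and the parsing steps are unchanged from the recipe.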

The mini-language technique can be refined to represent more complex, more  
structured input. It makes transformation and maintenance of input data  
much easier.


Re: datetime iso8601 string input

2006-03-20 Thread aurora
I agree. I just keep rewriting the parse method again and again.

wy

from datetime import datetime

def parse_iso8601_date(s):
    """ Parse date in iso8601 format e.g. 2003-09-15T10:34:54 and
        returns a datetime object.
    """
    y=m=d=hh=mm=ss=0
    if len(s) not in [10,19,20]:
        raise ValueError('Invalid timestamp length - %s' % s)
    if s[4] != '-' or s[7] != '-':
        raise ValueError('Invalid separators - %s' % s)
    if len(s) > 10 and (s[13] != ':' or s[16] != ':'):
        raise ValueError('Invalid separators - %s' % s)
    try:
        y = int(s[0:4])
        m = int(s[5:7])
        d = int(s[8:10])
        if len(s) >= 19:
            hh = int(s[11:13])
            mm = int(s[14:16])
            ss = int(s[17:19])
    except Exception as e:
        raise ValueError('Invalid timestamp - %s: %s' % (s, str(e)))
    return datetime(y,m,d,hh,mm,ss)
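Later Python versions cover the simple case asked about below. Since Python 3.7, `datetime.fromisoformat()` parses the output of `str(datetime)` and `datetime.isoformat()`. A trailing `Z`, as the 20-character form above may carry, is only accepted from Python 3.11 on.

```python
from datetime import datetime

stamp = datetime(2006, 2, 23, 11, 3, 36, 762172)
text = str(stamp)                                       # '2006-02-23 11:03:36.762172'
print(datetime.fromisoformat(text) == stamp)            # True
print(datetime.fromisoformat("2003-09-15T10:34:54"))    # 2003-09-15 10:34:54
print(datetime.fromisoformat("2003-09-15"))             # 2003-09-15 00:00:00
```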


 I was a little surprised to recently discover
 that datetime has no method to input a string
 value.  PEP 321 appears does not convey much
 information, but a timbot post from a couple
 years ago clarifies things:

 http://tinyurl.com/epjqc

 You can stop looking:  datetime doesn't
 support any kind of conversion from string.
 The number of bottomless pits in any datetime
 module is unbounded, and Guido declared this
 particular pit out-of-bounds at the start so
 that there was a fighting chance to get
 *anything* done for 2.3.

 I can understand why datetime can't handle
 arbitrary string inputs, but why not just
 simple iso8601 format -- i.e. the default
 output format for datetime?

 Given a datetime-generated string:

now = str(datetime.datetime.now())
print now
   '2006-02-23 11:03:36.762172'

 Why can't we have a function to accept it
 as string input and return a datetime object?

   datetime.parse_iso8601(now)

 Jeff Bauer
 Rubicon, Inc.




ANN: pyregex 0.5

2006-03-10 Thread aurora
pyregex is a command-line tool for constructing and testing Python regular
expressions. Features include text highlighting, a detailed breakdown of
match groups, substitution, and a syntax quick reference. It is released
into the public domain.

Screenshot and download from http://tungwaiyip.info/software/pyregex.html.

Wai Yip Tung


Usage: pyregex.py [options] -|filename regex [replacement [count]]

Test Python regular expressions. Specify test data's filename or use -
to enter test text from console. Optionally specify a replacement text.

Options:
-f filter mode
-n nnn limit to examine the first nnn lines. default no limit.
-m show only matched line. default False


Regular Expression Syntax

Special Characters

. matches any character except a newline
^ matches the start of the string
$ matches the end of the string or just before the newline at the end of
 the string
* matches 0 or more repetitions of the preceding RE
+ matches 1 or more repetitions of the preceding RE
? matches 0 or 1 repetitions of the preceding RE
{m} exactly m copies of the previous RE should be matched
{m,n} matches from m to n repetitions of the preceding RE
\ either escapes special characters or signals a special sequence
[] indicate a set of characters. Characters can be listed individually,
 or a range of characters can be indicated by giving two characters and
 separating them by a -. Special characters are not active inside sets.
 Including a ^ as the first character matches the complement of the set
| A|B matches either A or B
(...) indicates the start and end of a group
(?...) this is an extension notation. See the documentation for details
(?iLmsux) I ignorecase; L locale; M multiline; S dotall; U unicode; X verbose

*, +, ? and {m,n} are greedy. Append the ? qualifier to match non-greedily.
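A short illustration of greedy versus non-greedy quantifiers, using only the standard `re` module. The sample strings are made up for the example.

```python
import re

html = "<b>bold</b> and <i>italic</i>"
print(re.findall(r"<.*>", html))          # greedy: one match spanning everything
print(re.findall(r"<.*?>", html))         # ['<b>', '</b>', '<i>', '</i>']
print(re.findall(r"a{2,3}", "aaaaaaa"))   # ['aaa', 'aaa']
print(re.findall(r"a{2,3}?", "aaaaaaa"))  # ['aa', 'aa', 'aa']
```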


Special Sequences

\number matches the contents of the group of the same number. Groups are
 numbered starting from 1
\A matches only at the start of the string
\b matches the empty string at the beginning or end of a word
\B matches the empty string not at the beginning or end of a word
\d matches any decimal digit
\D matches any non-digit character
\g<name> use the substring matched by the group named 'name' for sub()
\s matches any whitespace character
\S matches any non-whitespace character
\w matches any alphanumeric character and the underscore
\W matches any non-alphanumeric character
\Z matches only at the end of the string
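A few of the special sequences above in action: `\d`, `\b`, a numbered backreference `\1`, and `\g<name>` in a `sub()` replacement. The sample strings are made up for the example.

```python
import re

date = re.compile(r"(?P<year>\d{4})-(?P<month>\d\d)-(?P<day>\d\d)")
print(date.sub(r"\g<day>/\g<month>/\g<year>", "due 2006-03-10"))  # due 10/03/2006
print(re.findall(r"\bcat\b", "cat concat cat's"))  # ['cat', 'cat']
print(re.sub(r"(\w+) \1", r"\1", "the the cat"))   # the cat
```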


See the Python documentation on Regular Expression Syntax for more details

http://docs.python.org/lib/re-syntax.html
-- 
http://mail.python.org/mailman/listinfo/python-announce-list

Support the Python Software Foundation:
http://www.python.org/psf/donations.html


Re: HTMLTestRunner - generates HTML test report for unittest

2006-01-27 Thread aurora
On Fri, 27 Jan 2006 06:35:46 -0800, Paul McGuire  
[EMAIL PROTECTED] wrote:
 Nice!  I just adapted my pyparsing unit tests to use this tool - took me
 about 3 minutes, and now it's much easier to run and review my unit test
 results.  I especially like the pass/fail color coding, and the  
 drill-down
 to the test output.

 -- Paul

Thank you! I'm glad that it is helpful to you :)


ANN: HTMLTestRunner - generates HTML test report for unittest

2006-01-26 Thread aurora
Greetings,

HTMLTestRunner is an extension to the Python standard library's unittest
module. It generates easy-to-use HTML test reports. See a sample report at
http://tungwaiyip.info/software/sample_test_report.html.

More information and downloads are available at

Wai Yip Tung


Re: decode unicode string using 'unicode_escape' codecs

2006-01-13 Thread aurora
Cool, it works! I have also done some due diligence to make sure that the
utf-8 encoding would not accidentally introduce any Python escapes. I have
written a recipe in the Python cookbook:

Efficient character escapes decoding
http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/466293

wy
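In Python 3, `str` has no `.decode()` method, so the trick below does not carry over directly. One possible equivalent is sketched here; the helper name `decode_escapes` is hypothetical. It relies on the documented fact that the `unicode_escape` codec reads its input bytes as Latin-1. Characters outside Latin-1 are first turned into `\uXXXX` escapes with the `backslashreplace` error handler, and they are restored along with the original escapes. The sketch assumes every backslash in the input starts a valid Python escape.

```python
def decode_escapes(s):
    """Hypothetical helper: decode Python backslash escapes in a str that
    may also contain non-ASCII characters (Python 3 sketch)."""
    # Non-Latin-1 characters become \uXXXX escapes here ...
    escaped = s.encode("latin-1", "backslashreplace")
    # ... and are decoded again together with the original escapes.
    return escaped.decode("unicode_escape")


decoded = decode_escapes("\u20ac\\n\u20ac")   # text: euro, backslash, n, euro
print(len(decoded))                             # 3
print(decoded == "\u20ac\n\u20ac")              # True
```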

 Does this do what you want?

 >>> u'€\\n€'
 u'\x80\\n\x80'
 >>> len(u'€\\n€')
 4
 >>> u'€\\n€'.encode('utf-8').decode('string_escape').decode('utf-8')
 u'\x80\n\x80'

 >>> len(u'€\\n€'.encode('utf-8').decode('string_escape').decode('utf-8'))
 3

 Basically, I convert the unicode string to bytes, escape the bytes using  
 the 'string_escape' codec, and then convert the bytes back into a  
 unicode string.

 HTH,

 STeVe



decode unicode string using 'unicode_escape' codecs

2006-01-12 Thread aurora
I have some unicode strings with some characters encoded using Python
notation, like '\n' for LF. I need to convert them to the actual LF
character. There is a 'unicode_escape' codec that seems to suit my purpose.

>>> encoded = u'A\\nA'
>>> decoded = encoded.decode('unicode_escape')
>>> print len(decoded)
3

Note that both encoded and decoded are unicode strings. I'm trying to use
the builtin codec because I assume it performs better than writing the
decoding in pure Python myself. But I'm not converting between a byte string
and a unicode string.

However it runs into problem in some cases.

encoded = u'€\\n€'
decoded = encoded.decode('unicode_escape')

Traceback (most recent call last):
  File "g:\bin\py_repos\mindretrieve\trunk\minds\x.py", line 9, in ?
    decoded = encoded.decode('unicode_escape')
UnicodeEncodeError: 'ascii' codec can't encode character u'\u20ac' in position 0: ordinal not in range(128)

Reading the documentation more carefully, I found out what happened.
decode('unicode_escape') takes a byte string as its operand and converts it
into a unicode string. Since encoded is already unicode, it is first
implicitly converted to a byte string using the 'ascii' encoding. In this
case that fails because of the '€' character.

So I resigned myself to the fact that 'unicode_escape' doesn't do what I
want. But then I thought about it more deeply and came up with this Python
source code. It runs OK and outputs 3.

-
# -*- coding: utf-8 -*-
print len(u'€\n€')  # 3
-

Think about what happens in the second line. First the parser decodes the
bytes into a unicode string using the UTF-8 encoding. Then it applies the
string literal syntax rules to turn the characters '\n' into LF. The second
step is what I want. There must be something available to the Python
interpreter that is not available to the user. So is there something I have
overlooked?

Anyway I just want to leverage the builtin codecs for performance. I  
figure this would be faster than

   encoded.replace('\\n', '\n')
   ...and so on...

Any other suggestions would be greatly appreciated :)

wy
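For readers on Python 3, where unicode strings are simply str, here is a minimal sketch of one workaround. It rests on an assumption of this sketch, not something from the post: 'unicode_escape' only decodes bytes, so the string is first encoded to Latin-1 with the 'backslashreplace' error handler. That handler writes characters such as '€' as \u20ac escapes, which the codec then turns back into the original characters. The helper name decode_escapes is hypothetical.

```python
def decode_escapes(text):
    """Interpret Python-style escapes such as \\n inside an already-decoded str.

    Characters outside Latin-1 (e.g. the euro sign) are first written as
    \\uXXXX escapes so the bytes-based 'unicode_escape' codec can round-trip them.
    """
    as_bytes = text.encode("latin-1", errors="backslashreplace")
    return as_bytes.decode("unicode_escape")


encoded = "\u20ac\\n\u20ac"   # euro sign, backslash, 'n', euro sign
decoded = decode_escapes(encoded)
print(len(decoded))           # 3
```

A literal backslash placed directly before a character outside Latin-1 would be misread. Treat this as a sketch rather than a general-purpose escape decoder.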

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: performance of recursive generator

2005-08-11 Thread aurora
 You seem to be assuming that a yield statement and a function call are  
 equivalent.  I'm not sure that's a valid assumption.

I don't know. I was hoping the compiler could optimize away the chain of
yields.

 Anyway, here's some data to consider:

 --- test.py ---
 def gen(n):
     if n:
         for i in gen(n // 2):
             yield i
         yield n
         for i in gen(n // 2):
             yield i

 def gen_wrapper(n):
     return list(gen(n))

 def nongen(n, func):
     if n:
         nongen(n // 2, func)
         func(n)
         nongen(n // 2, func)

 def nongen_wrapper(n):
     result = []
     nongen(n, result.append)
     return result
 ---------------

This test somewhat waters down the n^2 issue. The problem lies in the depth
of recursion, which in this case is only log(n). It is probably more
interesting to test:

def gen(n):
    if n:
        yield n
        for i in gen(n-1):
            yield i
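Here is a minimal Python 3 sketch that makes the cost visible. It counts every yield statement executed for the linear chain above, and every call in a callback version of the same walk. The counter lists and function names are illustrative only. For a chain of depth n, the generator version executes n(n+1)/2 yields (10 for n = 4), while the callback version makes n calls.

```python
def chain_gen(n, counter):
    """Recursive generator producing n, n-1, ..., 1; counter[0] counts yields."""
    if n:
        counter[0] += 1
        yield n
        for i in chain_gen(n - 1, counter):
            counter[0] += 1   # each level re-yields what the level below produced
            yield i


def chain_callback(n, func, counter):
    """Same walk with a callback; counter[0] counts calls to func."""
    if n:
        counter[0] += 1
        func(n)
        chain_callback(n - 1, func, counter)


for depth in (4, 100):
    gen_count = [0]
    values = list(chain_gen(depth, gen_count))
    cb_count = [0]
    collected = []
    chain_callback(depth, collected.append, cb_count)
    assert values == collected
    print(depth, gen_count[0], cb_count[0])
# prints "4 10 4" and then "100 5050 100"
```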


performance of recursive generator

2005-08-10 Thread aurora
I love generators and use them a lot. Lately I've been writing some
recursive generators to traverse tree structures. After taking a closer
look, I have some concerns about their performance.

Let's take the inorder traversal from  
http://www.python.org/peps/pep-0255.html as an example.

def inorder(t):
    if t:
        for x in inorder(t.left):
            yield x
        yield t.label
        for x in inorder(t.right):
            yield x

Consider a 4-level-deep tree in which every node has only a right child:

1
 \
  2
   \
    3
     \
      4


Using the recursive generator, the flow would go like this:

main           gen1           gen2           gen3           gen4
----------------------------------------------------------------------
inorder(1..4)
               yield 1
               inorder(2..4)
                              yield 2
               yield 2
                              inorder(3..4)
                                             yield 3
                              yield 3
               yield 3
                                             inorder(4)
                                                            yield 4
                                             yield 4
                              yield 4
               yield 4


Note that there are 4 calls to inorder() and 10 yields. Indeed, the
complexity of traversing this kind of tree would be O(n^2)!


Compare that with a similar recursive function using callback instead of  
generator.

def inorder(t, foo):
    if t:
        inorder(t.left, foo)
        foo(t.label)
        inorder(t.right, foo)


The flow would go like this:

main           stack1         stack2         stack3         stack4
----------------------------------------------------------------------
inorder(1..4)
               foo(1)
               inorder(2..4)
                              foo(2)
                              inorder(3..4)
                                             foo(3)
                                             inorder(4)
                                                            foo(4)


There will be 4 calls to inorder() and 4 calls to foo(), giving a
reasonable O(n) performance.

Is this an inherent issue with recursive generators? Is any compiler
optimization possible?
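A common workaround, which is not a compiler optimization, keeps one generator and replaces the recursion with an explicit stack. Each label is then yielded exactly once, so the right-leaning tree above costs O(n) yields. Below is a minimal Python 3 sketch. The Node class is a hypothetical stand-in for the tree type in the PEP 255 example.

```python
class Node:
    """Minimal binary tree node (hypothetical stand-in for PEP 255's tree)."""

    def __init__(self, label, left=None, right=None):
        self.label = label
        self.left = left
        self.right = right


def inorder_iter(t):
    """In-order traversal as a single generator with an explicit stack."""
    stack = []
    node = t
    while stack or node is not None:
        # Walk as far left as possible, remembering the path on the stack.
        while node is not None:
            stack.append(node)
            node = node.left
        node = stack.pop()
        yield node.label   # each label passes through exactly one yield
        node = node.right


tree = Node(1, right=Node(2, right=Node(3, right=Node(4))))
print(list(inorder_iter(tree)))  # [1, 2, 3, 4]
```

The trade-off is readability. The recursive version mirrors the definition of in-order traversal, while this one manages the path by hand.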


Problem redirecting stdin on Windows

2005-05-25 Thread aurora
On Windows (XP) with the win32 extensions installed, a Python script can be
launched directly from the command line, since the .py extension is
associated with Python. However, it fails if stdin is piped or
redirected.

Assume there is an echo.py that reads from stdin and echoes the input.
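For reference, here is a minimal Python 3 sketch of such an echo.py. The echo function and its defaults are illustrative, not taken from the post. Nothing in it depends on where stdin comes from.

```python
import sys


def echo(src=None, dst=None):
    """Copy every line from src (default: sys.stdin) to dst (default: sys.stdout)."""
    src = sys.stdin if src is None else src
    dst = sys.stdout if dst is None else dst
    for line in src:
        dst.write(line)
```

Saved as echo.py, the function would be called from an `if __name__ == "__main__":` block. The same script works when started through python.exe, so the code itself is not what differs between the two launch methods.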



Launched directly from the command line, this echoes input from the keyboard:

   echo.py



But it causes an error if stdin is redirected:

   echo.py < textfile


   ...
   for line in fp:
   IOError: [Errno 9] Bad file descriptor



However, it works as expected if launched via python.exe:

   c:\Python24\python.exe echo.py < textfile



Why does launching the script directly fail when stdin is redirected? It makes many scripts a lot less useful.


win32clipboard.GetClipboardData() return string with null characters

2005-05-25 Thread aurora
I was using win32clipboard.GetClipboardData() to retrieve the Windows  
clipboard using code similar to the message below:

http://groups-beta.google.com/group/comp.lang.python/msg/3722ba3afb209314?hl=en

Somehow I noticed that the data returned includes \0, followed by some
characters that shouldn't be there. It is easy enough to truncate them.
But why do they get there in the first place? Is the data length
somehow calculated wrong?
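The truncation itself is a one-liner. This minimal Python 3 sketch keeps everything before the first NUL. It only handles the returned string and does not call win32clipboard. The function name is illustrative.

```python
def strip_after_nul(data):
    """Return the part of data before the first NUL character (all of it if none)."""
    return data.split("\0", 1)[0]


print(repr(strip_after_nul("copied text\0leftover")))  # 'copied text'
```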

I'm using Windows XP SP2 with Python 2.4 and pywin32-203.

aurora


Re: Unit testing - one test class/method, or test class/class

2005-02-25 Thread aurora
I do something more or less like your option b. I don't think there is any
orthodox structure to follow. You should use a style that fits your taste.

What I really want to bring up is that you might want to look at
refactoring your module in the first place. 348 test cases for one module
sounds like a large number. That suggests you have a fairly complex module
to test in the first place. Often the biggest benefit of automated unit
testing is that it forces developers to modularize and decouple their code
in order to make it testable. This alone improves the code quality a lot.
If breaking up the module makes sense in your case, the test structure
will follow.
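Here is a minimal sketch of the option b layout (described in the quoted message below) using the standard unittest module. It uses one test class per class under test, plus one class for the module-level functions. Stack and parse_pair are hypothetical stand-ins for the module being tested.

```python
import unittest


# --- Hypothetical module under test -----------------------------------
class Stack:
    """Tiny LIFO stack used only to illustrate the layout."""

    def __init__(self):
        self._items = []

    def push(self, item):
        self._items.append(item)

    def pop(self):
        return self._items.pop()  # raises IndexError when empty


def parse_pair(text):
    """Split 'key=value' into (key, value); value is '' if there is no '='."""
    key, _, value = text.partition("=")
    return key, value


# --- Option b: one test class per class, plus one for module-level functions
class TestStack(unittest.TestCase):
    def test_pop_returns_last_pushed(self):
        s = Stack()
        s.push(1)
        s.push(2)
        self.assertEqual(s.pop(), 2)

    def test_pop_on_empty_stack_raises(self):
        with self.assertRaises(IndexError):
            Stack().pop()


class TestModuleFunctions(unittest.TestCase):
    def test_parse_pair(self):
        self.assertEqual(parse_pair("a=1"), ("a", "1"))
        self.assertEqual(parse_pair("a"), ("a", ""))
```

A runner such as `python -m unittest` or pytest picks these classes up. Moving to option c would mean splitting TestStack into one TestCase per method.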

Hi,
I just found py.test[1] and converted a large unit test module to py.test
format (which is actually almost-no-format-at-all, but I won't get there
now). Having 348 test cases in the module and huge test classes, I started
to think about splitting classes. Basically you have at least three obvious
choices, if you are going for consistency in your test modules:

Choice a:
Create a single test class for the whole module to be tested, whether it
contains multiple classes or not.
...I don't think this method deserves closer inspection. It's probably a
rather poor method to begin with. With py.test, where no subclassing is
required (as opposed to Python's unittest, where you have to subclass
unittest.TestCase), you'd probably be better off just writing a test method
for each class and each class method in the module.

Choice b:
Create a test class for each class in the module, plus one class for any
non-class methods defined in the module.
+ Feels clean, because each test class is mapped to one class in the module
+ It is rather easy to find all tests for a given class
+ Relatively easy to create a class skeleton automatically from the test
  module and the other way round

- Test classes get huge easily
- Missing test methods are not very easy to find[2]
- A test method may depend on other tests in the same class

Choice c:
Create a test class for each non-class method and class method in the
tested module.

+ Test classes are small, easy to find all tests for a given method
+ Helps in test isolation - having a separate test class for a single method
  makes the tested class less dependent on any other methods/classes
+ Relatively easy to create a test module from an existing class (but then
  you are not doing TDD!) but not vice versa
- Large number of classes results in more overhead; more typing, probably
  requires subclassing because of common test class setup methods etc.

What do you think, any important points I'm missing?

Footnotes:
[1] In reality, this is a secret plot to advertise py.test, see
    http://codespeak.net/py/current/doc/test.html
[2] However, this problem disappears if you start with writing your tests
    first: with TDD, you don't have untested methods, because you start by
    writing the tests first, and end up with a module that passes the tests

--
# Edvard Majakari		Software Engineer


Re: running a shell command from a python program

2005-02-23 Thread aurora
In Python 2.4, use the new subprocess module for this. It subsumes the
popen* functions.
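Here is a minimal sketch for the question quoted below, written for current Python 3. subprocess.run with capture_output and text needs 3.7 or later. On 2.4 the rough equivalent is Popen(..., stdout=subprocess.PIPE).communicate(). The helper name command_lines is illustrative.

```python
import subprocess
import sys


def command_lines(args):
    """Run a command and return its standard output as a list of lines."""
    completed = subprocess.run(
        args,
        capture_output=True,  # collect stdout/stderr instead of inheriting them
        text=True,            # decode output to str using the locale encoding
        check=True,           # raise CalledProcessError on a non-zero exit status
    )
    return completed.stdout.splitlines()


# Portable demo: run a child Python that prints two lines.
lines = command_lines([sys.executable, "-c", "print('alpha'); print('beta')"])
print(lines)  # ['alpha', 'beta']
```

On Linux the same helper works for something like command_lines(["ls", "-l"]). Passing a list avoids going through the shell.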

Hi,
   I'm a newbie, so please be gentle :-)
How would I run a shell command in Python?
Here is what I want to do:
I want to run a shell command that outputs some stuff, save it into a
list and do stuff with the contents of that list.
I started with a BASH script actually, until I realized I really needed
better data structures :-)
Is popen the answer? Also, where online would I get access to good
sample code that I could peruse?
I'm running 2.2.3 on Linux, and going strictly by online doc so far.
Thanks!
S C


Re: Python and Ajax technology collaboration

2005-02-23 Thread aurora
It was discussed in the last Bay Area Python Interest Group meeting.
Thursday, February 10, 2005
Agenda: Developing Responsive GUI Applications Using HTML and HTTP
Speakers: Donovan Preston
http://www.baypiggies.net/
The author has a component, LivePage, for this. You can find it at
http://nevow.com/. It is a similar idea to the JavaScript approach but very
Python-centric.


Interesting GUI developments, it seems. Has anyone developed an Ajax
application using Python? Very curious
thx
(Ajax stands for:
XHTML and CSS;
dynamic display and interaction using the Document Object Model;
data interchange and manipulation using XML and XSLT;
asynchronous data retrieval using XMLHttpRequest;
and JavaScript binding everything together
i.e. Google has used these technologies to build Gmail, Google Maps etc.
more info:
http://www.adaptivepath.com/publications/essays/archives/000385.php)


Re: unicode encoding usablilty problem

2005-02-20 Thread aurora
On Sat, 19 Feb 2005 18:44:27 +0100, Fredrik Lundh [EMAIL PROTECTED]  
wrote:

aurora [EMAIL PROTECTED] wrote:
I don't want to mix them. But how could I find them? How do I know this
statement can be a potential problem

  if a==b:

where a and b can be instantiated individually far away from this line of
code that puts them together?
if you don't know where a and b come from, how can you be sure that
your program works at all?  how can you be sure they're both strings?
(a op b can fail in many ways, depending on what a, b, and op are)

a and b are both strings. The issue is whether each is an 8-bit string or a unicode string.

Things work fine, unit tests pass, all until the first non-ASCII characters
come in and then the program breaks.
if you have unit tests, why don't they include Unicode tests?
/F
How do I structure the test cases to guarantee coverage? It is not
practical to test every combination of unicode and 8-bit strings. Adding
non-ASCII characters to the test data would probably make problems pop up
earlier. But it is arduous, and it is hard to spot whether you have left any out.



Re: unicode encoding usablilty problem

2005-02-20 Thread aurora
On Sun, 20 Feb 2005 15:01:09 +0100, Martin v. Löwis [EMAIL PROTECTED]  
wrote:

Nick Coghlan wrote:
Having "", u"", and r"" be immutable, while b"" was mutable would seem
rather inconsistent.
Yes. However, this inconsistency might be desirable. It would, of  
course, mean that the literal cannot be a singleton. Instead, it has
to be a display (?), similar to list or dict displays: each execution
of the byte string literal creates a new object.

An alternative would be to have bytestr be the immutable type
corresponding to the current str (with b"" literals producing
bytestr's), while reserving the bytes name for a mutable byte
sequence.
Indeed. This maze of options has caused the process to get stuck.
People also argue that with such an approach, we could as well
tell users to use array.array for the mutable type. But then,
people complain that it doesn't have all the library support that
strings have.
The main point being, the replacement for 'str' needs to be immutable  
or the upgrade process is going to be a serious PITA.
Somebody really needs to take this in his hands, completing the PEP,
writing a patch, checking applications to find out what breaks.
Regards,
Martin
What is the process for getting a PEP worked out? Are the work and
discussion carried out on the python-dev mailing list? I would be glad to
help out, especially on this particular issue.
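For comparison, Python 3 eventually made b"" literals produce an immutable bytes type and added bytearray as a separate mutable byte sequence. The minimal sketch below shows the difference. is_mutable is an illustrative helper, and it really does overwrite the first element when assignment succeeds.

```python
def is_mutable(buf):
    """Try buf[0] = ord('x'); return True if it worked, False on TypeError."""
    try:
        buf[0] = ord("x")
    except TypeError:
        return False
    return True


print(is_mutable(b"abc"))             # False: bytes, like str, is immutable
print(is_mutable(bytearray(b"abc")))  # True: bytearray is the mutable variant
```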


Re: unicode and socket

2005-02-19 Thread aurora
On 18 Feb 2005 19:10:36 -0800, [EMAIL PROTECTED] wrote:
It's really funny, I cannot send a unicode stream through a socket with
python while all the other languages such as perl, c and java can do it.
Then, how about converting the unicode string to a binary stream? Is it
possible to send binary data through a socket with python?
I was answering your specific question:
How can I send the unicode string to the remote end of the socket as it  
is without any conversion of encode

The answer is that you cannot. It's not that you cannot send unicode, but
you have to encode it first. The same applies to Perl, C or Java. The only
difference is the details of how strings get encoded.

A few posts suggest various means. Or you can check out
codecs.getwriter(), which more closely resembles Java's way.
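Here is a minimal Python 3 sketch of the codecs approach, assuming both ends agree on UTF-8. codecs.getwriter wraps a file object from socket.makefile so that text is encoded as it is written, much like a Java OutputStreamWriter. codecs.getreader decodes on the receiving end. The helper names are illustrative, and socket.socketpair stands in for a real connection.

```python
import codecs
import socket


def send_text(sock, text, encoding="utf-8"):
    """Write text through a codecs StreamWriter that encodes it to bytes."""
    raw = sock.makefile("wb")
    writer = codecs.getwriter(encoding)(raw)
    writer.write(text)   # encoded here; the socket only ever sees bytes
    writer.flush()       # StreamWriter forwards flush() to the underlying file
    raw.close()          # closes the file wrapper; the socket itself stays open


def receive_text(sock, encoding="utf-8"):
    """Read until the peer closes the connection, decoding with the agreed encoding."""
    raw = sock.makefile("rb")
    try:
        return codecs.getreader(encoding)(raw).read()
    finally:
        raw.close()


left, right = socket.socketpair()
send_text(left, "caf\u00e9 \u20ac")
left.close()                      # signals end of data to the reader
received = receive_text(right)    # "caf\u00e9 \u20ac" again
right.close()
```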


unicode encoding usablilty problem

2005-02-18 Thread aurora
I have long found Python's default encoding of strict ASCII frustrating.
For one thing, I would rather get garbage characters than an exception. But
the biggest issue is that Unicode exceptions often pop up in unexpected
places, and only when a non-ASCII or unicode character first finds its way
into the system.

Below is an example. The program may run fine at first. But as soon as a
unicode string u'b' is introduced, the program blows up unexpectedly.

>>> sys.getdefaultencoding()
'ascii'
>>> a='\xe5'
>>> # can print, you think you're ok
... print a
å
>>> b=u'b'
>>> a==b
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe5 in position 0: ordinal not in range(128)


One may suggest that the correct way to do it is to use decode, such as
  a.decode('latin-1') == b
This brings up another issue. Most references and books focus exclusively
on entering unicode literals and using the encode/decode methods. The
fallacy is that string is such a basic data type, used throughout the
program, that you really don't want to make an individual decision every
time you use a string (and take a penalty for any negligence). Java has a
much more usable model: unicode is used internally, and encoding/decoding
decisions are needed only twice, when dealing with input and output.
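Python 3 later adopted roughly the model described here. str is always Unicode, bytes is a separate type, and the encoding is chosen once where data enters or leaves the program. In the minimal sketch below, the function names and the Latin-1/UTF-8 choices are illustrative assumptions.

```python
def read_field(raw: bytes) -> str:
    """Input boundary: decode once (Latin-1 assumed for this data source)."""
    return raw.decode("latin-1")


def write_field(text: str) -> bytes:
    """Output boundary: encode once (UTF-8 assumed for the destination)."""
    return text.encode("utf-8")


a = read_field(b"\xe5")   # the same byte as a='\xe5' in the session above
b = "b"
print(a == b)             # False; comparing two str values never raises
print(write_field(a))     # b'\xc3\xa5'
```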

I am sure these errors are a nuisance to those who are only half aware of
unicode. Even for those who choose to use unicode, it is almost impossible
to ensure their programs work correctly.


Re: Newbie CGI problem

2005-02-18 Thread aurora
Not sure about the repeated hi. But according to the HTTP specification
you are supposed to use \r\n\r\n, not just \n\n.

#!/usr/bin/python
import cgi
print "Content-type: text/html\n\n"
print "hi"
Gives me the following in my browser:
'''
hi
Content-type: text/html
hi
'''
Why are there two 'hi's?
Thanks,
Rory


Re: Newbie CGI problem

2005-02-18 Thread aurora
On Fri, 18 Feb 2005 18:36:10 +0100, Peter Otten [EMAIL PROTECTED] wrote:
Rory Campbell-Lange wrote:
#!/usr/bin/python
import cgi
print "Content-type: text/html\n\n"
print "hi"
Gives me the following in my browser:
'''
hi
Content-type: text/html
hi
'''
Why are there two 'hi's?
You have chosen a bad name for your script: cgi.py.
It is now self-importing. Rename it to something that doesn't clash with
the standard library, and all should be OK.

Peter
You are a genius.


Re: unicode and socket

2005-02-18 Thread aurora
You cannot. Unicode is an abstract data type. It must be encoded into
octets in order to be sent via a socket, and the other end must decode the
octets to retrieve the unicode string. Needless to say, the encoding scheme
must be consistent and understood by both ends.
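Here is a minimal Python 3 sketch of that round trip, assuming both ends agree on UTF-8. socket.socketpair stands in for a real connection, and the helper names are illustrative. The receiver joins all chunks before decoding, because a multi-byte character can be split across two recv() calls.

```python
import socket


def send_unicode(sock, text, encoding="utf-8"):
    """Encode to octets, then send all of them."""
    sock.sendall(text.encode(encoding))


def recv_unicode(sock, encoding="utf-8"):
    """Collect octets until the peer closes, then decode them as a whole."""
    chunks = []
    while True:
        chunk = sock.recv(4096)
        if not chunk:          # empty bytes: the peer closed its end
            break
        chunks.append(chunk)
    return b"".join(chunks).decode(encoding)


sender, receiver = socket.socketpair()
send_unicode(sender, "\u4f60\u597d, world")
sender.close()
message = recv_unicode(receiver)   # the original str again
receiver.close()
```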

On 18 Feb 2005 11:03:46 -0800, [EMAIL PROTECTED] wrote:
hello all,
 I am new to Python, and I have a problem with unicode.
I have a unicode string; when I was going to send it out through a
socket by send(), I got an exception. How can I send the unicode string
to the remote end of the socket as it is without any conversion of
encode, so the remote end of the socket will receive a unicode string?
Thanks


Re: unicode encoding usablilty problem

2005-02-18 Thread aurora
On Fri, 18 Feb 2005 20:18:28 +0100, Walter Dörwald [EMAIL PROTECTED]  
wrote:

aurora wrote:
  [...]
In Java they are distinct data types and the compiler would catch all
incorrect usage. In Python, the interpreter seems to 'help' us by
promoting binary strings to unicode. Things work fine, unit tests pass,
all until the first non-ASCII characters come in and then the program
breaks.
 Is there a scheme for Python developers to use so that they are safe
from incorrect mixing?
Put the following:
import sys
sys.setdefaultencoding("undefined")
in a file named sitecustomize.py somewhere in your Python path and
Python will complain whenever there's an implicit conversion between
str and unicode.
HTH,
Walter Dörwald
That helps! Running the unit tests caught quite a few potential problems
(as well as a lot of safe ASCII string promotions).
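Python 3 no longer converts implicitly between str and bytes, but comparing the two silently returns False. The closest analogue to Walter's trick is the interpreter's -b option, which warns on such comparisons, while -bb turns them into errors. The minimal sketch below runs a snippet under -bb. The helper name is illustrative.

```python
import subprocess
import sys


def mixes_bytes_and_str(snippet):
    """Run snippet in a child interpreter with -bb; True if a bytes/str comparison failed it."""
    proc = subprocess.run(
        [sys.executable, "-bb", "-c", snippet],
        capture_output=True,
        text=True,
    )
    return proc.returncode != 0 and "BytesWarning" in proc.stderr


print(mixes_bytes_and_str("b'a' == 'a'"))  # True
print(mixes_bytes_and_str("'a' == 'a'"))   # False
```

In practice you would simply run the whole test suite with `python -bb`.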


Re: unicode encoding usablilty problem

2005-02-18 Thread aurora
On Fri, 18 Feb 2005 21:16:01 +0100, Martin v. Löwis [EMAIL PROTECTED]  
wrote:

I'd like to point out the
historical reason: Python predates Unicode, so the byte string type
has many convenience operations that you would only expect of
a character string.
We have come up with a transition strategy, allowing existing
libraries to widen their support from byte strings to character
strings. This isn't a simple task, so many libraries still expect
and return byte strings, when they should process character strings.
Instead of breaking the libraries right away, we have defined
a transitional mechanism, which allows to add Unicode support
to libraries as the need arises. This transition is still in
progress.
I understand. So I wasn't yelling "why can't Python be more like Java?" On
the other hand, I also want to point out that making an individual decision
for each string isn't practical and is very error prone. The fact that
unicode and 8-bit strings look alike and work alike in common situations,
but run into problems only with non-ASCII data, is very confusing for most people.


Eventually, the primary string type should be the Unicode
string. If you are curious how far we are still off that goal,
just try running your program with the -U option.
Lots of errors. Among them are gzip (binary?!) and strftime??
I actually quite appreciate Python's power in processing binary data as
8-bit strings. But perhaps we should transition to using unicode for text
strings and treat binary strings as the exception. Right now we have

  '' - 8bit string; u'' unicode string
How about
  b'' - 8bit string; '' unicode string
and no automatic conversion. Perhaps this can be activated by something
like the encoding declarations, so that the transition can happen module by
module.
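Python 3 ended up close to this proposal for literals. '' is a Unicode str, b'' is bytes, and the two are never converted implicitly. Here is a minimal sketch of what that looks like in practice.

```python
text = "caf\u00e9"       # '' literal: Unicode str
data = b"caf\xc3\xa9"    # b'' literal: 8-bit bytes (UTF-8 encoded)

try:
    text + data
    implicit = True
except TypeError:            # no implicit conversion in either direction
    implicit = False

combined = text + data.decode("utf-8")   # the conversion has to be explicit
print(implicit, len(combined))           # False 8
```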


Regards,
Martin


Re: DHTML control from Python?

2005-02-14 Thread aurora
IE should be able to do that. Install the win32 modules. Then you should
be able to simply embed Python using <script language='python'>. Not sure
about Mac. Even on Windows your audience is limited to those who have
IE+python+win32 modules.


Are there any ways to use Python (rather than JavaScript) for controlling
DHTML? I don't mind writing JavaScript stubs which can be called by Python,
so long as I need to do so only once for a particular feature. I'm running
Mac OS X 10.3, so comments as to the best browser for testing this would
also be appreciated.

If you could also email as well as posting a reply, I'd be grateful.
email to : kenneth.m.mcdonald _at_ sbcglobal.net
Thanks,
Ken McDonald


Re: executing VBScript from Python and vice versa

2005-02-04 Thread aurora
Go to the bookstore and get a copy of Python Programming on Win32
by Mark Hammond and Andy Robinson today.
  http://www.oreilly.com/catalog/pythonwin32/
It has everything you need.
Is there a way to make programs written in these two languages communicate
with each other? I am pretty sure that VBScript can access a Python script
because Python is COM compliant. On the other hand, Python might be able to
call a VBScript through WSH. Can somebody provide a simple example? I have
exactly 4 days of experience in Python (and fortunately, much more in VB6)

Thanks.


Re: OT: why are LAMP sites slow?

2005-02-03 Thread aurora
aurora [EMAIL PROTECTED] writes:
Slow compared to what? For a large commercial site with a bigger budget,
better infrastructure and a better implementation, it is not surprising
that they come out ahead of hobbyist sites.
Hmm, as mentioned, I'm not sure what the commercial sites do that's
different.  I take the view that the free software world is capable of
anything that the commercial world is capable of, so I'm not awed just
because a site is commercial.  And sites like Slashdot have pretty big
budgets by hobbyist standards.
Putting implementation aside, does LAMP inherently perform worse than
commercial alternatives like IIS, ColdFusion, Sun ONE or DB2? It sounds
like that's your proposition.
I wouldn't say that.  I don't think Apache is a bottleneck compared
with other web servers.  Similarly I don't see an inherent reason for
Python (or whatever) to be seriously slower than Java servlets.  I
have heard that MySQL doesn't handle concurrent updates nearly as well
as DB2 or Oracle, or for that matter PostgreSQL, so I wonder if busier
LAMP sites might benefit from switching to PostgreSQL (LAMP = LAPP?).
I'm lost. So what are you comparing against when you say LAMP is slow?
What is the reference point? Is it just a general observation that Slashdot
is slower than we would like it to be?

If you are talking about Slashdot, there are many ideas to make it faster.
For example, they could send all 600 comments to the client and let the
user do the querying using DHTML on the client side. This leaves the server
serving mostly static files and would certainly boost performance tremendously.

If you mean that MySQL, or SQL databases in general, are slow, there is
some truth in it. The best things about an SQL database are concurrent
access, transactional semantics and versatile querying. It turns out a lot
of applications can really live without those. If you can rearchitect the
application to use flat files instead of a database, it can often be a big boon.

A lot of this is just implementation. Find the right tool and the right
design for the job. I still don't see a case that a LAMP-based solution is
inherently slow.


Re: hotspot profiler experience and accuracy?

2005-02-02 Thread aurora
Thanks for pointing me to your analysis. Now I know it wasn't me doing
something wrong.

hotspot did lead me to knock down a major performance bottleneck once.
I found that zipfile.ZipFile() basically read the entire zip file at
instantiation time, even though you may only need one file from it
subsequently.

In any case, the function call counts seem to make sense and should give
some insight into the runtime behaviour. The CPU time is just so
misleading.


aurora wrote:
But the numbers look suspicious. Hotspot claims 71.166 CPU seconds but
the actual elapsed time is only 54s. When measuring elapsed time instead
of CPU time the performance gain is only 13% with the profiler running and
down to 10% when not using the profiler.
 Is there something I misunderstood in reading the numbers?
Well, I'm confused too. Look at my post from a few months ago:
   http://tinyurl.com/6awzj
(note that my code contained a few errors and that you need
to use the fixed code that I posted a few replies later).
Perhaps somebody can explain a bit more about this this time? :-)
At the moment, frankly, hotspot seems rather useless.
--Irmen


Re: Printing Filenames with non-Ascii-Characters

2005-02-02 Thread aurora
  print d.encode('cp437')
So I would have to specify the encoding on every call to print? I am sure
to forget, and I don't like the program dying; in my case garbled output
would be much more acceptable.
Marian, I'm with you. You never know whether you have put enough encode()
calls in all the right places, and there is no static type checking to help
you. So the short answer is to set a different default in
sitecustomize.py. I'm trying to write up something about unicode in Python,
once I understand what's going on inside...
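For the "garbled output is acceptable" case, Python 3 offers per-call and per-stream options through the 'replace' error handler. The minimal sketch below assumes the console uses cp437, as discussed in this thread. The helper name is illustrative.

```python
def console_safe(text, encoding="cp437"):
    """Replace characters the console encoding cannot show with '?' instead of raising."""
    return text.encode(encoding, errors="replace").decode(encoding)


print(console_safe("report \u20ac.txt"))  # report ?.txt

# Stream-wide alternative (Python 3.7+), applied once at startup:
# sys.stdout.reconfigure(errors="replace")
```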


Re: Next step after pychecker

2005-02-01 Thread aurora
A frequent error I encounter
  try:
      ...do something...
  except IOError:
      log('encounter an error %s line %d' % filename)
Here in the string interpolation I should supply (filename, lineno).
Usually I have plenty of unit tests to catch errors like this in the main
code. But it is very difficult to exercise the exception handlers, some of
which are added defensively. Unfortunately those untested handlers
sometimes fail precisely when we need them for diagnostic information.

pychecker sometimes gives false alarms: the argument of a string
interpolation may be a valid tuple. It would be great if we could somehow
unit test the exception handlers (without building an extensive library of
mock objects).
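One lightweight way to exercise such a handler without a mock library is to let the test inject the failing operation. Below is a minimal Python 3 sketch. load_config, log, opener and failing_opener are hypothetical names, and the handler shows the corrected two-value interpolation.

```python
messages = []


def log(message):
    """Hypothetical logger that just records messages."""
    messages.append(message)


def load_config(filename, lineno, opener=open):
    """Read a file; on I/O failure log the location and return None."""
    try:
        with opener(filename) as f:
            return f.read()
    except IOError:   # an alias of OSError in Python 3
        # Both values are supplied; '% filename' alone would raise TypeError here.
        log('encounter an error %s line %d' % (filename, lineno))
        return None


def failing_opener(filename):
    """Test double that simulates the I/O error the handler is meant to catch."""
    raise IOError("simulated failure for %s" % filename)


assert load_config("settings.ini", 7, opener=failing_opener) is None
print(messages)  # ['encounter an error settings.ini line 7']
```

The standard library's unittest.mock.patch (Python 3.3+) is an alternative when the failing call cannot be passed in as a parameter.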


Re: Printing Filenames with non-Ascii-Characters

2005-02-01 Thread aurora
On Tue, 01 Feb 2005 20:28:11 +0100, Marian Aldenhövel  
[EMAIL PROTECTED] wrote:

Hi,
I am very new to Python and have run into the following problem. If I do
something like

   dir = os.listdir(somepath)
   for d in dir:
      print d

the program fails for filenames that contain non-ASCII characters.

   'ascii' codec can't encode characters in position 33-34:

I have noticed that this seems to be a very common problem. I have read a
lot of postings regarding it but not really found a solution. Is there a
simple one?
The English Windows command prompt uses the cp437 charset. To print it, use
  print d.encode('cp437')
The issue is that a terminal only understands a certain character set. If
you have a unicode string, like d in your case, you have to encode it before
it can be printed. (We really need a native unicode terminal!!!) If you
don't encode, Python will do it for you. The default encoding is ASCII. Any
string that contains non-ASCII characters will give you trouble. In my
opinion Python is too conservative in using the 'strict' encoding, which
gives users unaware of unicode a lot of woes.

So how did you get a unicode d to start with? If 'somepath' is unicode,
os.listdir returns a list of unicode strings. So why is somepath unicode?
Either you have entered a unicode literal or it comes from some other
source. One possible source is an XML parser, which returns strings in unicode.
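Python 3 keeps the same rule, with str in place of unicode. os.listdir returns str names for a str path and raw bytes names for a bytes path. Here is a minimal sketch. The helper name is illustrative.

```python
import os


def listing_types(path):
    """Return the set of element types os.listdir yields for a str and a bytes path."""
    text_names = os.listdir(path)                # str path   -> list of str
    byte_names = os.listdir(os.fsencode(path))   # bytes path -> list of bytes
    return {type(n) for n in text_names}, {type(n) for n in byte_names}


# The standard-library directory is used only because it is never empty.
print(listing_types(os.path.dirname(os.__file__)))  # ({<class 'str'>}, {<class 'bytes'>})
```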

Windows NT supports unicode filenames. I'm not sure about Linux; the
results may differ slightly.




What I specifically do not understand is why Python wants to interpret the
string as ASCII at all. Where is this setting hidden?

I am running Python 2.3.4 on Windows XP and I want to run the program on
Debian sarge later.
Ciao, MM


hotspot profiler experience and accuracy?

2005-02-01 Thread aurora
I have a parser I need to optimize. It has some disk IO and a lot of
looping over characters.

I used the hotspot profiler to gain insight into optimization options. The
methods that show up at the top of the list seem fairly trivial and do not
look like CPU hogs. Nevertheless I optimized them and got a 25%
performance gain according to hotspot's numbers.

But the numbers look suspicious. Hotspot claims 71.166 CPU seconds but the
actual elapsed time is only 54s. When measuring elapsed time instead of
CPU time the performance gain is only 13% with the profiler running and
down to 10% when not using the profiler.

Is there something I misunderstood in reading the numbers?
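When profiler figures and wall-clock figures disagree, it can help to measure both directly around the same call. The minimal Python 3 sketch below uses time.perf_counter for elapsed time and time.process_time for CPU time of the current process. The helper name is illustrative, and the sleep stands in for waiting such as disk I/O.

```python
import time


def measure(func, *args):
    """Call func(*args) once; return (result, elapsed_seconds, cpu_seconds)."""
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    result = func(*args)
    cpu = time.process_time() - cpu_start
    wall = time.perf_counter() - wall_start
    return result, wall, cpu


# CPU-bound work: the two figures should be close.
total, wall, cpu = measure(sum, range(1_000_000))
print(total, round(wall, 3), round(cpu, 3))

# Waiting: elapsed time grows while CPU time barely moves.
_, wall, cpu = measure(time.sleep, 0.2)
print(round(wall, 3), round(cpu, 3))
```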


Go visit Xah Lee's home page

2005-01-31 Thread aurora
Let's stop discussing the perl-python nonsense. It is so boring.
For a break, just visit Mr Xah Lee's personal page
(http://xahlee.org/PageTwo_dir/Personal_dir/xah.html). You'll find lots of
funny information and quotes from this queer personality. Thankfully no
perl-python stuff there.

Don't miss Mr. Xah Lee's recent pictures at
  http://xahlee.org/PageTwo_dir/Personal_dir/mi_pixra.html
My favorite is the last picture: long-haired Xah Lee sitting contemplatively
in the living room. The caption says "my beautiful hair, fails to resolve
the problems of humanity. And, it is falling apart by age."


Re: Transparent (redirecting) proxy with BaseHTTPServer

2005-01-28 Thread aurora
It should be very safe to count on the Host header. Maybe some really,
really old browsers would not support it, but they probably won't work on
today's WWW anyway. The majority of today's web sites are likely to be
virtually hosted; one Apache may be hosting 50 web addresses. If a client
stripped the host name from the request line and did not send the Host
header either, the web server wouldn't know which address it is really
looking for. If you catch a request that doesn't have a Host header, it is
a good idea to redirect it to a browser upgrade page.

Thanks, aurora ;),
aurora wrote:
If you actually want the IP, resolving the Host header would give you
that.
I'm only interested in the hostname.
 The second form of HTTP request, without the host part, is for
compatibility with the pre-HTTP/1.1 standard. All modern web browsers
should send the Host header.
How safe is the assumption that the Host header will be there? Is it part
of the HTTP/1.1 spec? And does it mean all pre-1.1 clients will fail?
Hmm, maybe I should look on the wire at what's really happening...

thanks again
  Paul


Re: Transparent (redirecting) proxy with BaseHTTPServer

2005-01-27 Thread aurora
If you actually want the IP, resolving the Host header would give you that.
In the redirect case you should get a host header like
Host: www.python.org
From that you can reconstruct the original URL as  
http://www.python.org/ftp/python/contrib/. With that you can open it using  
urllib and proxy the data to the client.

The second form of HTTP request, without the host part, is for
compatibility with the pre-HTTP/1.1 standard. All modern web browsers
should send the Host header.
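Here is a minimal Python 3 sketch of that reconstruction. headers can be the headers attribute of http.server.BaseHTTPRequestHandler, though any mapping with get() works. The helper name is illustrative. The scheme is assumed to be plain http, since the intercepted traffic here is HTTP.

```python
from urllib.parse import urlsplit


def original_url(path, headers):
    """Rebuild the absolute URL of a request, or return None if it can't be known."""
    if urlsplit(path).scheme:   # proxy-style request: path is already absolute
        return path
    host = headers.get("Host")
    if not host:                # pre-HTTP/1.1 client that sent no Host header
        return None
    return "http://" + host + path


print(original_url("/ftp/python/contrib/", {"Host": "www.python.org"}))
# http://www.python.org/ftp/python/contrib/
```

A None result is where the browser-upgrade redirect suggested in this thread would fit.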


Hi list,
My ultimate goal is to have a small HTTP proxy which is able to show a
message specific to the client's name/ip/status, then handle the original
request normally, either by redirecting the client or by acting as a proxy.

I started with a modified[1] version of TinyHTTPProxy posted by Suzuki
Hisao somewhere in 2003 to this list and tried to extend it to my needs.
It works quite well if I configure my client to use it, but using the
iptables REDIRECT feature to point the clients transparently to the
proxy caused some issues.

Precisely, the self.path member variable of BaseHTTPRequestHandler is
missing the command and the host (i.e. www.python.org) part of the
request line for REDIRECTed connections:

without iptables REDIRECT:
self.path - GET http://www.python.org/ftp/python/contrib/ HTTP/1.1
with REDIRECT:
self.path - GET /ftp/python/contrib/ HTTP/1.1
I asked about this on the squid mailing list and was told this is normal
and I have to reconstruct the request line from the real destination IP,
the URL-path and the Host header (if any). If the Host header is sent
it's an (unsafe) no-brainer, but I cannot for the life of me figure out
where to get the real destination IP. Any ideas?

thanks
  Paul
[1] HTTP Debugging Proxy
  Modified by Xavier Defrang (http://defrang.com/)


Re: limited python virtual machine (WAS: Another scripting language implemented into Python itself?)

2005-01-26 Thread aurora
Is it really necessary to build a VM from the ground up that includes OS
abilities? What about JavaScript?


On Wed, Jan 26, 2005 at 05:18:59PM +0100, Alexander Schremmer wrote:
On Tue, 25 Jan 2005 22:08:01 +0100, I wrote:
 sys.safecall(func, maxcycles=1000)
 could enter the safe mode and call the func.
This might be even enhanced like this:
 import sys
 sys.safecall(func, maxcycles=1000,
              allowed_domains=['file-IO', 'net-IO', 'devices', 'gui'],
              allowed_modules=['_sre'])

Any comments about this from someone who already hacked CPython?
Yes, this comes up every couple months and there is only one answer:
This is the job of the OS.
Java largely succeeds at doing sandboxy things because it was written that
way from the ground up (to behave both like a program interpreter and an
OS). Python the language was not, and the CPython interpreter definitely
was not.

Search groups.google.com for previous discussions of this on c.l.py
-Jack
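To illustrate "this is the job of the OS", here is a hedged sketch: instead of the proposed (non-existent) sys.safecall(func, maxcycles=...), untrusted code runs in a separate interpreter process and the kernel caps its CPU time via setrlimit. run_limited is a hypothetical helper name; this is POSIX-only and limits CPU time and nothing else, so it is far from a complete sandbox.

```python
import resource
import subprocess
import sys


def run_limited(code, cpu_seconds=1):
    """Run `code` in a separate interpreter whose CPU time is capped by the kernel.

    POSIX only. File, network and module access are untouched, so this is
    not a complete sandbox, just an OS-enforced analogue of "maxcycles".
    """
    def apply_limits():
        # Runs in the child process just before exec. Once the child has used
        # more CPU time than allowed, the kernel terminates it with a signal.
        resource.setrlimit(resource.RLIMIT_CPU, (cpu_seconds, cpu_seconds))

    return subprocess.run(
        [sys.executable, "-c", code],
        preexec_fn=apply_limits,
        capture_output=True,
        text=True,
        timeout=10 * cpu_seconds,  # wall-clock backstop, e.g. for code that just sleeps
    )


ok = run_limited("print(6 * 7)")
print(ok.returncode, ok.stdout.strip())  # 0 42

runaway = run_limited("while True: pass")
print(runaway.returncode)  # negative (killed by a signal) on POSIX
```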


Re: list unpack trick?

2005-01-23 Thread aurora
On Sat, 22 Jan 2005 10:03:27 -0800, aurora [EMAIL PROTECTED] wrote:
I am thinking more along the lines of string.ljust(). So if we had a
list.ljust(length, filler), we could do something like

   name, value = s.split('=',1).ljust(2,'')
I can always break it down into multiple lines. The good thing about
list unpacking is that it's a really compact and obvious syntax.
Just to clarify: list.ljust() is a feature wish; it should probably be
named something like pad().
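Until lists grow such a method, the wish can be approximated with a small helper function. This is a sketch only: pad is a hypothetical name, not a built-in, and unlike Fredrik's first slicing variant below it never truncates a longer list.

```python
def pad(items, length, filler=None):
    """Return a new list of `items`, padded with `filler` up to `length`.

    Hypothetical stand-in for the wished-for list.ljust()/pad(); inputs
    that are already long enough are copied unchanged, not truncated.
    """
    items = list(items)
    # A negative count gives an empty list, so nothing is appended then.
    items.extend([filler] * (length - len(items)))
    return items


name, value = pad("key=val".split("=", 1), 2, "")    # ("key", "val")
name2, value2 = pad("flag".split("=", 1), 2, "")     # ("flag", "")
```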

Also, there is another thread from a few hours before this one asking
about essentially the same thing.

default value in a list
http://groups-beta.google.com/group/comp.lang.python/browse_frm/thread/f3affefdb4272270


Re: list unpack trick?

2005-01-22 Thread aurora
Thanks. I'm just trying to see if there is some concise syntax available
without getting into obscurity. For my purpose, Siegmund's suggestion
works quite well.

The few forms you have suggested work. But as they refer to the list
multiple times, they need a separate assignment statement like

  list = s.split('=',1)
I am thinking more along the lines of string.ljust(). So if we had a
list.ljust(length, filler), we could do something like

  name, value = s.split('=',1).ljust(2,'')
I can always break it down into multiple lines. The good thing about list
unpacking is that it's a really compact and obvious syntax.
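For the specific case of splitting on the first '=', a later addition to the language gives this compact one-line unpacking without any padding: str.partition() (added in Python 2.5, after this thread) always returns a 3-tuple (head, separator, tail), with empty strings when the separator is missing. parse_pair is just an illustrative name.

```python
def parse_pair(s):
    # str.partition splits at the first "=" and always yields exactly 3 parts,
    # so the unpacking never fails; value is "" when there is no "=".
    name, _, value = s.partition("=")
    return name, value


print(parse_pair("user=aurora"))  # ('user', 'aurora')
print(parse_pair("flag"))         # ('flag', '')
print(parse_pair("a=b=c"))        # ('a', 'b=c')
```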


On Sat, 22 Jan 2005 08:34:27 +0100, Fredrik Lundh [EMAIL PROTECTED]  
wrote:
...
So more generally, is there an easy way to pad a list to a length of n
with filler items appended at the end?
some variants (with varying semantics):
list = (list + n*[item])[:n]
or
list += (n - len(list)) * [item]
or (readable):
if len(list) < n:
    list.extend((n - len(list)) * [item])
etc.
/F


Re: A Fundamental Turn Toward Concurrency in Software

2005-01-08 Thread aurora
Of course there are many performance bottlenecks: CPU, memory, I/O, network,
all the way up to the software design and implementation. As a software
guy myself, I would say that better software design would by far lead to the
greatest performance gain. But that doesn't mean hardware engineers can sit
back and declare this to be software's problem. Even if we are not writing
CPU-intensive applications, we will certainly welcome the free performance
gain coming from a faster CPU or a more optimized compiler.

I think this is significant because it might signify a paradigm shift.
This might well be hype, but let's just assume this is the future direction
of CPU design. Then we might as well start experimenting now. I would just
throw out some random ideas: parallel execution at the statement level,
looking up symbols and attributes predictively, parallelizing hash functions,
dictionary lookup, sorting, list comprehensions, etc., background
just-in-time compilation, etc., etc.

One of the author's ideas is that many of today's mainstream technologies
(like OO) did not come about suddenly but accumulated years of research
before becoming widely used. A lot of these ideas may not work, or may not
seem to matter much today. But in 10 years we might be really glad that we
tried.


aurora [EMAIL PROTECTED] writes:
Just went through an article via Slashdot titled "The Free Lunch Is
Over: A Fundamental Turn Toward Concurrency in Software"
[http://www.gotw.ca/publications/concurrency-ddj.htm]. It argues that
the continuous CPU performance gain we've seen is finally over, and
that future gains would primarily be in the area of software concurrency,
taking advantage of hyperthreading and multicore architectures.
Well, another gain could be had in making the software less wasteful
of CPU cycles.
I'm a pretty experienced programmer by most people's standards but I
see a lot of systems where I can't for the life of me figure out how
they manage to be so slow.  It might be caused by environmental
pollutants emanating from Redmond.


A Fundamental Turn Toward Concurrency in Software

2005-01-07 Thread aurora
Hello!
Just went through an article via Slashdot titled "The Free Lunch Is Over: A
Fundamental Turn Toward Concurrency in Software"
[http://www.gotw.ca/publications/concurrency-ddj.htm]. It argues that the
continuous CPU performance gain we've seen is finally over, and that future
gains would primarily be in the area of software concurrency, taking
advantage of hyperthreading and multicore architectures.

Perhaps something the Python interpreter team can ponder.