The Cython compiler is 20 years old today!

2022-04-04 Thread Stefan Behnel

Dear Python community,

it's now 20 years since Greg Ewing posted his first announcement of Pyrex, 
the tool that is now known and used under the name Cython.


https://mail.python.org/pipermail/python-list/2002-April/126661.html

It was a long way, and I've written up some of it in a blog post:

http://blog.behnel.de/posts/cython-is-20/

Today, if you're working on any kind of larger application in Python, 
you're likely to have some piece of code downloaded into your venv that was 
built with Cython. Or many of them.


I'm proud of what we have achieved. And I'm happy to see and talk to the 
many, many users out there whom we could help to help their users get their 
work done.


Happy anniversary, Cython!

Stefan



PS: The list of Cython-implemented packages on PyPI is certainly 
incomplete, so please add the classifier to yours if it's missing. With 
almost 3000 dependent packages on GitHub (and almost 100,000 related 
repos), I'm sure we can crack the number of 1000 Cython-built packages on 
PyPI as a birthday present. (No spam, please, just honest classifiers.)


https://pypi.org/search/?q=&o=-created&c=Programming+Language+%3A%3A+Cython

https://github.com/cython/cython/network/dependents?dependent_type=PACKAGE
--
https://mail.python.org/mailman/listinfo/python-list


Re: Embedding Python in C

2019-07-23 Thread Stefan Behnel
Jesse Ibarra schrieb am 22.07.19 um 18:12:
> On Saturday, July 20, 2019 at 1:11:51 PM UTC-6, Stefan Behnel wrote:
>> Jesse Ibarra schrieb am 20.07.19 um 04:12:
>>> Sorry, I am not understanding. Smalltalk VW 8.3 does not support Python.
>>> I can only call Python code through the C/Python API.
>>
>> Ok, but that doesn't mean you need to write code that uses the C-API of
>> Python. All you need to do is:
>>
>> 1) Start up a CPython runtime from Smalltalk (see the embedding example I
>> posted) and make it import an extension module that you write (e.g. using
>> the "inittab" mechanism [1]).
>>
>> 2) Use Cython to implement this extension module to provide an interface
>> between your Smalltalk code and your Python code. Use the Smalltalk C-API
>> from your Cython code to call into Smalltalk and exchange data with it.
>>
>> Now you can execute Python code inside of Python and make it call back and
>> forth into your Smalltalk code, through the interface module. And there is
>> no need to use the Python C-API for anything beyond step 1), which is about
>> 5 lines of Python C-API code if you write it yourself. Everything else can
>> be implemented in Cython and Python.
>>
>> Stefan
>>
>>
>> [1]
>> https://docs.python.org/3/extending/embedding.html?highlight=PyImport_appendinittab#extending-embedded-python
> 
> This cleared so much @Stefan, thank you. I just need some clarification if 
> you don't mind.
>  
> In (1), when you say  "import an extension module that you write",  do you 
> mean the Python library that was created "import emb"? Is that gonna be 
> written in Cython or standalone .C file?

Yes. In Cython.


> In (2), what do you mean when you said "Use the Smalltalk C-API from your 
> Cython code to call into Smalltalk and exchange data with it."? 

Not sure what part exactly you are asking about, but you somehow have to
talk to the Smalltalk runtime from your Cython/Python code if you want to
interact with it. I assume that this will be done through the C API that
Smalltalk provides.

Just in case, did you check if there is already a bridge for your purpose?
A quick web search let me find this, not sure if it helps.

https://github.com/ObjectProfile/PythonBridge

Stefan



Re: Embedding Python in C

2019-07-20 Thread Stefan Behnel
Jesse Ibarra schrieb am 20.07.19 um 04:12:
> Sorry, I am not understanding. Smalltalk VW 8.3 does not support Python.
> I can only call Python code through the C/Python API.

Ok, but that doesn't mean you need to write code that uses the C-API of
Python. All you need to do is:

1) Start up a CPython runtime from Smalltalk (see the embedding example I
posted) and make it import an extension module that you write (e.g. using
the "inittab" mechanism [1]).

2) Use Cython to implement this extension module to provide an interface
between your Smalltalk code and your Python code. Use the Smalltalk C-API
from your Cython code to call into Smalltalk and exchange data with it.

Now you can execute Python code inside of Python and make it call back and
forth into your Smalltalk code, through the interface module. And there is
no need to use the Python C-API for anything beyond step 1), which is about
5 lines of Python C-API code if you write it yourself. Everything else can
be implemented in Cython and Python.

Stefan


[1]
https://docs.python.org/3/extending/embedding.html?highlight=PyImport_appendinittab#extending-embedded-python



Re: Embedding Python in C

2019-07-19 Thread Stefan Behnel
Jesse Ibarra schrieb am 17.07.19 um 20:39:
> My options seem rather limited, I need to make a Pipeline from
> (Smalltalk -> C -> Python) then go back (Smalltalk <- C <- Python).
> Since Smalltalk does not support Python directly I have to settle with
> the C/Python API
> (https://docs.python.org/3.6/extending/embedding.html#beyond-very-high-level-embedding-an-overview).
> Any suggestions?

First of all: don't use the C-API! :-)

Use Cython instead. It's a Python-to-C compiler that covers up all the ugly
little details of talking to Python from C (importing a module is just
"import module", and it even adapts to different Python versions
automatically). You can keep writing Python code, and at the same time
trivially use external C code.

https://cython.org/

http://docs.cython.org/en/latest/src/tutorial/

For embedding Python in an external program (in case you really need to do
that and can't live with starting Python instead of Smalltalk), here's an
example:

https://github.com/cython/cython/tree/master/Demos/embed

It uses the "--embed" argument to Cython that generates a C (main) function
to start up the CPython runtime.

Stefan



Re: Using PyArg_ParseTuple to with optional fields.

2019-02-28 Thread Stefan Behnel
Anthony Flury via Python-list schrieb am 28.02.19 um 10:18:
> I am trying to write an extension module with a function (actually an
> __init__ method, but I am not sure that matters) where the function can be
> called as either :
> 
>     my_func()
> 
> or
> 
>     my_func( a, b, c, d) - where a,b,c,d are all doubles.
> 
> I would prefer not to allow the function to be called with some arguments
> specified - it is either all four - or none.

You could do the argument checking yourself, after extracting them.

But (without knowing what you're actually up to here) – are you sure the
above is a good user interface? Did you consider splitting the function
into two, one that takes arguments and one that takes none?
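
A minimal sketch of both options, written in Python for brevity (in a C
extension, the same length check would follow the argument parsing call).
The function names and return values here are hypothetical and only serve
the illustration.

```python
def my_func(*args):
    """Accept either no arguments or exactly four floats (all or none)."""
    if len(args) not in (0, 4):
        raise TypeError(
            "my_func() takes 0 or 4 arguments (%d given)" % len(args))
    # float() rejects values that are not number-like.
    return tuple(float(arg) for arg in args)


# The alternative interface: two functions with fixed signatures.
def my_func_default():
    return ()


def my_func_from_values(a, b, c, d):
    return (float(a), float(b), float(c), float(d))


print(my_func())            # ()
print(my_func(1, 2, 3, 4))  # (1.0, 2.0, 3.0, 4.0)
```

With two separate functions, the signatures document themselves and no
manual argument counting is needed.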

Stefan



Re: C API PyObject_GetAttrString returns not the object I expected

2019-02-10 Thread Stefan Behnel
Barry Scott schrieb am 10.02.19 um 13:08:
> After calling PyObject_GetAttrString() I expected to get a PyObject string 
> back but I found that I had been given a  instead.
> 
> (gdb) p *args_o 
> $4 = 
> 
> What is going on and how do I get from the  to the object 
> I 
> want?

Phil is right about the function itself, but my guess is that you called
GetAttr() on a class instead of an instance. Read up on Python descriptors
to understand what difference that makes.

https://docs.python.org/3/howto/descriptor.html

Basically, you got something like a property object back, but not the value
that the property maintains. If you look up the attribute on the instance,
the property (or descriptor) will hand it to you. The same applies to
method lookups and other special attributes that may also be implemented as
descriptors.
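
The difference is easy to reproduce in plain Python. This is a sketch with
a hypothetical class "Point"; getattr() here corresponds to what
PyObject_GetAttrString() does at the C level.

```python
class Point:
    def __init__(self, x):
        self._x = x

    @property
    def x(self):
        return self._x


point = Point(3)

# Lookup on the class hands back the descriptor object itself.
from_class = getattr(Point, "x")
print(type(from_class).__name__)  # property

# Lookup on the instance lets the descriptor produce the value.
from_instance = getattr(point, "x")
print(from_instance)  # 3

# The descriptor protocol can also be invoked by hand.
by_hand = from_class.__get__(point, Point)
print(by_hand)  # 3
```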

Also take a look at Cython, which is designed to keep users from having to
learn all these things and instead lets you do them in Python.

https://cython.org/

Stefan



Re: Implement C's Switch in Python 3

2019-02-03 Thread Stefan Behnel
Chris Angelico schrieb am 03.02.19 um 02:23:
> Of course, you can also precompute this:
> 
> day_ordinal = mapper(
> [1, 21, 31], "st",
> [2, 22], "nd",
> [3, 23], "rd",
> )
> def f(x): return day_ordinal.get(x, "th")

… in which case I would also 'precompute' the ".get" and give the resulting
callable a properly readable name altogether:

find_special_day_ordinal = mapper(
[1, 21, 31], "st",
[2, 22], "nd",
[3, 23], "rd",
).get

print(find_special_day_ordinal(x, "th"))
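
The "mapper" helper is not shown in this thread. A minimal sketch, assuming
it simply builds a dict from alternating key lists and values, could look
like this (the implementation is an assumption, not the original one):

```python
def mapper(*pairs):
    """Hypothetical helper: dict from alternating (keys, value) arguments."""
    if len(pairs) % 2:
        raise ValueError("expected alternating key lists and values")
    table = {}
    for keys, value in zip(pairs[::2], pairs[1::2]):
        for key in keys:
            table[key] = value
    return table


# Bind the dict's .get method once and give it a readable name.
find_special_day_ordinal = mapper(
    [1, 21, 31], "st",
    [2, 22], "nd",
    [3, 23], "rd",
).get

for day in (1, 2, 3, 4, 11, 22):
    print(day, find_special_day_ordinal(day, "th"))
```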

Stefan



Re: What is your experience porting Python 2.7.x scripts to Python 3.x?

2019-01-23 Thread Stefan Behnel
Cameron Simpson schrieb am 23.01.19 um 00:21:
>  from __future__ import absolute_import, print_function
> 
> gets you a long way.

... and: division.

Stefan



Re: Fastest first

2018-12-24 Thread Stefan Behnel
Avi Gross schrieb am 17.12.18 um 01:00:
> SHORT VERSION: a way to automatically run multiple algorithms in parallel
> and kill the rest when one returns an answer.

One (somewhat seasonal) comment on this: it doesn't always have to be about
killing (processes or threads). You might also consider a cooperative
implementation, where each of the algorithms is allowed to advance by one
"step" in each "round", and is simply discarded when a solution is found
elsewhere, or when it becomes unlikely that this specific algorithm will
contribute a future solution. This could be implemented via a sequence of
generators or coroutines in Python. Such an approach is often used in
simulations (e.g. SimPy and other "discrete event" simulators), where exact
control over the concurrency pattern is desirable.
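
A minimal sketch of such a cooperative "race", assuming each algorithm is
written as a generator that yields None after every unfinished step and
yields its answer once it has one. The names "race" and "linear_search" are
made up for this illustration.

```python
def race(*algorithms):
    """Advance every generator by one step per round; first answer wins."""
    active = list(algorithms)
    while active:
        for gen in list(active):
            try:
                result = next(gen)
            except StopIteration:
                # This algorithm gave up without an answer; discard it.
                active.remove(gen)
                continue
            if result is not None:
                # Discard the remaining algorithms, no killing required.
                for other in active:
                    other.close()
                return result
    return None


def linear_search(items, wanted, name):
    """Toy algorithm: one comparison per step."""
    for index, item in enumerate(items):
        if item == wanted:
            yield (name, index)
            return
        yield None


items = list(range(100))
winner = race(
    linear_search(items, 90, "forward"),
    linear_search(items[::-1], 90, "backward"),
)
print(winner)  # ('backward', 9)
```

The scheduler could just as well give more steps to the algorithms that
look more promising, which is the kind of control that threads or
processes do not offer easily.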

Stefan



Re: Side by side comparison - CPython, nuitka, PyPy

2018-12-24 Thread Stefan Behnel
Anthony Flury via Python-list schrieb am 21.12.18 um 09:06:
> I thought I would look at a side by side comparison of CPython, nuitka and
> PyPy

Interesting choice. Why nuitka?


> *The functionality under test*
> 
> I have a library (called primelib) which implements a Sieve of Eratosthenes
> in pure Python - it was originally written as part of my Project Euler attempts
> 
> Not only does it build a sieve to test primality, it also builds an
> iterable list of primes, and has functionality to calculate the prime
> factors (and exponents) and also calculate all divisors of a given integer
> (up to the size of the sieve).
> 
> To test the primelib there is a simple harness which :
> 
>  * Builds a sieve for integers from 2 to 104729 (104729 is the 10,000th
>    prime number)
>  * Using a pre-built list from primes.utm.edu:
>      o For every integer from 2 to 104729, confirm that the prime sieve
>        and the pre-built list agree on the primality or non-primality
>      o Confirm that the list of ALL primes identified by the sieve is
>        the same as the pre-built list
>      o For every integer from 2 to 104729, get primelib to generate the
>        prime factors and exponents - and confirm that they multiply up
>        to the expected integer
>      o For every integer from 2 to 104729, get primelib to generate the
>        divisors of the integer, and confirm that each divisor does
>        divide cleanly into the integer
> 
> The Sieve is rebuilt between each test, there is no caching of data between
> test cases, so the test harness forces a lot of recalculations.
> 
> I have yet to convert primelib to be Python 3 compatible.
> 
> Exactly the same test harness was run in all 3 cases :
> 
>  * Under CPython 2.7.15, the test harness took around 75 seconds
>    on average over 5 runs - fastest 73, slowest 78.
>  * Under Nuitka 0.6, the test harness (after compilation) took
>    around 85 seconds over 5 runs - fastest 84, slowest 86.
>  * Under PyPy, the test harness took 4.9 seconds on average
>    over 5 runs - fastest 4.79, slowest 5.2.
> 
> I was very impressed at the execution time improvement under PyPy, and a
> little surprised about the lack of improvement under Nuitka.
> 
> I know Nuitka is a work in progress, but given that Nuitka compiles Python
> to C code I would have expected some level of gain, especially in a maths
> heavy implementation.

It compiles to C, yes, but that by itself doesn't mean that it makes it run
faster. Remember that CPython is also written in C, so why should a simple
static translation from Python code to C make it run faster than in CPython?

Cython [1], on the other hand, is an optimising Python-to-C compiler, which
aims to generate fast code and allow users to manually tune it. That's when
you start getting real speedups that are relevant for real-world code.


> This comparison is provided for information only, and is not intended as
> any form of formal benchmark. I don't claim that primelib is as efficient
> as it could be - although every effort was made to try to make it as fast
> as I could.

I understand that it came to life as an exercise, and you probably won't
make production use of it. Actually, I doubt that there is a shortage of
prime detection libraries. ;) Still, thanks for the writeup. It's helpful
to see comparisons of "code how people write it" under different runtimes
from time to time.

Stefan


[1] http://cython.org/



Re: cython3: Cannot start!

2018-12-24 Thread Stefan Behnel
Paulo da Silva schrieb am 22.12.18 um 19:26:
> Sorry if this is OT.
> 
> I decided to give cython a try and cannot run a very simple program!
> 
> 1. I am using Kubuntu 18.04 and installed cython3 (not cython).
> 
> 2. My program tp.pyx:
> 
> # cython: language_level=3
> print("Test",2)
> 
> 3. setup.py
> from distutils.core import setup
> from Cython.Build import cythonize
> 
> setup(
> ext_modules = cythonize("tp.pyx")
> )
> 
> 4. Running it:
> python3 setup.py build_ext --inplace
> python3 -c 'import tp'
> 
> 5. output:
> ('Test', 2)
> 
> This is wrong for python3! It should output
> Test 2
> 
> It seems that it is parsing tp.pyx as a python2 script!
> 
> I tried to change the print to python2
> print "Test",2
> 
> and it recognizes the syntax and outputs
> Test 2
> 
> So, how can I tell cython to use python3?

Ubuntu 18.04 ships Cython 0.26, which has a funny bug that you hit above.
It switches the language-level too late, so that the first token (or word)
in the file is parsed with Py2 syntax. In your case, that's the print
statement, which is really parsed as a (Py2) statement here, not as a (Py3)
function. In normal cases, the language level does not matter for the first
statement in the source (because, why would you have a print() there?), so
it took us a while to find this bug.

pip-installing Cython will get you the latest release, where this bug is
resolved.

Stefan



Re: Calling an unbound method in C using the Public API

2018-08-29 Thread Stefan Behnel
Matthieu Dartiailh schrieb am 29.08.2018 um 16:33:
> I am one of the maintainers of the atom library
> (https://github.com/nucleic/atom). This
> library provides low-memory footprint Python objects, descriptors and
> containers enforcing type validation, and implements the observer pattern.
> 
> For list, we basically subclass the standard Python list in C++ and add the 
> required checks/notifications to some methods (in particular insert, pop and 
> sort). To call the method of standard list object on our custom object, the 
> original author chose to directly access the c-function inside the MethodDef 
> and call it (by first casting to the proper function pointer type).
> 
> Starting with Python 3.7, insert, pop and sort use the FASTCALL calling
> convention, which means that the signature of the function stored inside the
> MethodDef has changed. I have a working solution, but it involves using a
> lot of the CPython private C-API. In particular I needed to access:
> _PyCFunctionFast
> _PyCFunctionFastWithKeywords
> _PyStack_UnpackDict
> 
> I tried to look at the public C API for a way to call an unbound method with
> a minimal cost (in terms of speed and memory). It seems to me, but please
> correct me if I am wrong, that one cannot call a MethodDef using only the
> public API. To use the public C API, one has to use PyCFunction_Call (or a
> variant) that expects a PyCFunctionObject, which binds the MethodDef to an
> instance. In my case, to avoid creating a temporary PyCFunctionObject each
> time I call list.insert on my custom subclass instance, I have to store that
> PyCFunctionObject for each instance. But this means storing 7
> PyCFunctionObject per instance (one for each method of list I need to wrap).
> So I can either use the public API and increase the memory footprint, or slow
> down the code by creating a PyCFunctionObject for each call, or use a large
> amount of the private API.

These functions are very unlikely to change within the Py3.7.x releases,
but they are hopefully going to change for 3.8. See PEP 580. Your feedback
on it is welcome.

https://www.python.org/dev/peps/pep-0580/

Stefan



Re: lxml namespace as an attribute

2018-08-17 Thread Stefan Behnel
Skip Montanaro schrieb am 15.08.2018 um 23:25:
> Much of XML makes no sense to me. Namespaces are one thing. If I'm
> parsing a document where namespaces are defined at the top level, then
> adding namespaces=root.nsmap works when calling the xpath method. I
> more-or-less get that.
> 
> What I don't understand is how I'm supposed to search for a tag when
> the namespace appears to be defined as an attribute of the tag itself.
> I have some SOAP XML I'm trying to parse. It looks roughly like this:
> 
> 
>   
>  ...
>   
>   
>     <Tag xmlns="http://some/new/path">
> ...
> 
>   
> 
> If the document is "doc", I can find the body like so:
> 
> body = doc.xpath(".//Body", namespaces=doc.nsmap)
> 
> I don't understand how to find Tag, however. When I iterate over the
> body's children, printing them out, I see that Tag's name is actually:
> 
> {http://some/new/path}Tag
> 
> yet that namespace is unknown to me until I find Tag. It seems I'm
> stuck in a chicken-and-egg situation. Without knowing that
> http://some/new/path namespace, is there a way to cleanly find all
> instances of Tag?

In addition to what dieter said, let me mention that you do not need to
obey XPath's dictate to use namespace prefixes. lxml provides two ways
of expressing searches with qualified tag names (i.e. "{namespace}tag", also
known as Clark notation).

1) You can use the .find*() methods, which implement a subset of what XPath
can express (the same that the xml.etree.ElementTree library supports,
improvements welcome), but are simpler to use and faster than the XPath
engine. If you need only the first occurrence of a tag, you can say

doc.find(".//{http://some/namespace}Body")

and there is also an .iterfind() method for incremental searches and
.findall() to return all matches as a list.

2) You can use the XPath subclass "ETXPath", which internally translates
qualified tag names to a prefix mapping for you and passes them on into the
normal XPath engine. So this gives you the expressiveness of XPath without
having to care about prefixes.
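
Option 1) can be sketched as follows. To keep the example free of
third-party dependencies, it uses xml.etree.ElementTree from the standard
library, which supports the same .find*() calls. The document and the
envelope namespace are made up for the illustration, and the "{*}"
wildcard assumes Python 3.8 or later.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the SOAP document discussed above.
SOAP = """\
<Envelope xmlns="http://some/envelope">
  <Header/>
  <Body>
    <Tag xmlns="http://some/new/path">payload</Tag>
  </Body>
</Envelope>
"""

doc = ET.fromstring(SOAP)

# Qualified tag name in Clark notation: no prefix mapping needed.
tag = doc.find(".//{http://some/new/path}Tag")
print(tag.text)  # payload

# All matches as a list; .iterfind() would return them lazily.
all_tags = doc.findall(".//{http://some/new/path}Tag")

# Python 3.8+: "{*}" matches the tag name in any namespace, which helps
# when the namespace is not known in advance.
any_namespace = doc.find(".//{*}Tag")
print(any_namespace.tag)  # {http://some/new/path}Tag
```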

Stefan



Re: Tracking a memory leak in C extension - interpreting the output of PYTHONMALLOCSTATS

2018-07-28 Thread Stefan Behnel
Bartosz Golaszewski schrieb am 24.07.2018 um 13:05:
> Ok I've found the problem and it's my fault. From tp_dealloc's documentation:
> 
> ---
> The destructor function should free all references which the instance
> owns, free all memory buffers owned by the instance (using the freeing
> function corresponding to the allocation function used to allocate the
> buffer), and finally (as its last action) call the type’s tp_free
> function.
> ---
> 
> I'm not calling the tp_free function...

If you want to avoid the little traps of the C-API in the future, give
Cython a try. It can generate all the glue code safely for you, and
probably also generates faster wrapper code than you would write yourself.

Stefan



Re: Thread-safe way to add a key to a dict only if it isn't already there?

2018-07-07 Thread Stefan Behnel
Marko Rauhamaa schrieb am 07.07.2018 um 15:41:
> Steven D'Aprano :
>> On Sat, 07 Jul 2018 02:51:41 +0900, INADA Naoki wrote:
>>> D.setdefault('c', None)
>>
>> Oh that's clever!
> 
> Is that guaranteed to be thread-safe? The documentation
> (<https://docs.python.org/3/library/stdtypes.html#dict.setdefault>) makes no
> such promise.

It's implemented in C and it's at least designed to avoid multiple lookups
and hash value calculations, which suggests that it's also thread-safe by
design (or by a side-effect of the design). Whether that's guaranteed, I
cannot say, but a change that makes it non-thread-safe would probably be
very controversial.


> At least _collections_abc.py
> contains this method definition for MutableMapping:
> 
>     def setdefault(self, key, default=None):
>         'D.setdefault(k[,d]) -> D.get(k,d), also set D[k]=d if k not in D'
>         try:
>             return self[key]
>         except KeyError:
>             self[key] = default
>         return default
> 
> There are more such non-thread-safe definitions.

That's a different beast, because Python code can always be interrupted by
thread switches (between each byte code execution). C code cannot, unless
it starts executing byte code (e.g. for calculating a key's hash value) or
explicitly allows a thread switch at a given point.
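
If a documented guarantee is needed rather than an implementation detail,
an explicit lock makes the check-and-insert step atomic for any mapping
type. A minimal sketch; the helper name "add_if_missing" is hypothetical:

```python
import threading

_lock = threading.Lock()


def add_if_missing(mapping, key, default=None):
    """Insert key only if it is absent; return the stored value."""
    with _lock:
        if key not in mapping:
            mapping[key] = default
        return mapping[key]


shared = {}
results = [None] * 8


def worker(slot):
    # Each thread proposes its own fresh object; only one can win.
    results[slot] = add_if_missing(shared, "c", object())


threads = [threading.Thread(target=worker, args=(i,)) for i in range(8)]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()

# Every thread got back the same stored object.
print(len(set(map(id, results))))  # 1
```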

Stefan



Re: Looking for a recent quote about dynamic typing, possibly on this list

2018-07-07 Thread Stefan Behnel
Ben Finney schrieb am 07.07.2018 um 03:38:
> Steven D'Aprano  writes:
> 
>> Somebody gave a quote about dynamic typing, along the lines of
>>
>> "Just because a language allows a lot of dynamic features, doesn't mean 
>> people's code uses a lot of dynamism."
> 
> You did refer us to <http://lambda-the-ultimate.org/node/1519> on
> this forum. That contains a quote attributed to John Aycock with the
> meaning you paraphrase above.

And since the link in that article seems broken, here's the cited paper:

https://legacy.python.org/workshops/2000-01/proceedings/papers/aycock/aycock.html

Stefan


PS: the link could probably be fixed with a redirect from python.org...



Re: List replication operator

2018-05-25 Thread Stefan Behnel
Peter Otten schrieb am 25.05.2018 um 09:28:
> Steven D'Aprano wrote:
> 
>> But what do people think about proposing a new list replication with copy
>> operator?
>>
>> [[]]**5
>>
>> would return a new list consisting of five shallow copies of the inner
>> list.
> 
> Yet another arcanum to learn for beginners with little return.
> If you cannot refrain from tinkering with the language at least concentrate 
> on the features with broad application.
> Thank you.

I might have phrased this a little less ... short, but if it's really just
about avoiding a call to "copy.deepcopy()" in certain special cases at the
cost of adding new syntax, then I have to agree that we'd better avoid
adding the syntax instead.
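
For reference, a short sketch of what the existing spellings do, using only
the current language and the standard library:

```python
import copy

# Plain replication repeats references to the same inner list.
aliased = [[]] * 3
aliased[0].append("x")
print(aliased)  # [['x'], ['x'], ['x']]

# A list comprehension creates a new inner list per item.
independent = [[] for _ in range(3)]
independent[0].append("x")
print(independent)  # [['x'], [], []]

# For nested templates, copy.deepcopy() says explicitly how deep to go.
template = (1, [1, 2, 3])
deep = [copy.deepcopy(template) for _ in range(3)]
deep[0][1].append(4)
print(deep[1][1])  # [1, 2, 3]
```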

Stefan




Re: List replication operator

2018-05-24 Thread Stefan Behnel
Steven D'Aprano schrieb am 25.05.2018 um 04:25:
> On Thu, 24 May 2018 15:12:09 -0400, Ned Batchelder wrote:
> 
>> On 5/24/18 2:17 PM, Steven D'Aprano wrote:
> [...]
>>> But what do people think about proposing a new list replication with
>>> copy operator?
>>>
>>>  [[]]**5
>>>
>>> would return a new list consisting of five shallow copies of the inner
>>> list.
>>>
>> "shallow" will be the next problem.  Do we also need this?:
>>
>>      [[[]]]***5 # j/k
> 
> You might be right: on further thought, I think I want deep copies, not 
> shallow.

But how would that protocol work then? What would happen with a data
structure like this:

[( 1, [1, 2, 3] )] ** 3

? Would it also deep copy the tuple, or ignore it? What about other,
non-builtin sequence types? The '**' operator cannot just recursively call
'**' on the items in the list, because they may not support it. Or they may
support it, but not in the expected way.

And limiting this to lists of lists seems rather arbitrary. What about
subtypes of lists?

Calling "copy.deepcopy()" internally instead of a recursive '**' doesn't
seem safe either, because it also wouldn't know where to stop.

Stefan



Re: Getting Unicode decode error using lxml.iterparse

2018-05-23 Thread Stefan Behnel
digi...@gmail.com schrieb am 23.05.2018 um 00:56:
> I'm trying to read my iTunes library in Python using iterparse. My current 
> stub is:
> 
>  Snip 
> 
> import sys
> import base64
> import datetime
> import xml.etree.ElementTree as ET
> import argparse
> import re
> 
> class Library:
> 
>     unmarshallers = {
>         # collections
>         "array": lambda x: [v.text for v in x],
>         "dict": lambda x:
>             dict((x[i].text, x[i+1].text) for i in range(0, len(x), 2)),
>         "key": lambda x: x.text or "",
> 
>         # simple types
>         "string": lambda x: x.text or "",
>         "data": lambda x: base64.decodestring(x.text or ""),
>         "date": lambda x: datetime.datetime(
>             *map(int, re.findall(r"\d+", x.text))),
>         "true": lambda x: True,
>         "false": lambda x: False,
>         "real": lambda x: float(x.text),
>         "integer": lambda x: int(x.text)
>     }
> 
>     def load(self, file):
>         print('Starting...')
>         parser = ET.iterparse(file)
>         for action, elem in parser:
>             unmarshal = self.unmarshallers.get(elem.tag)
>             if unmarshal:
>                 data = unmarshal(elem)
>                 elem.clear()
>                 elem.text = data
>                 print(elem.text)
>             elif elem.tag != "plist":
>                 raise IOError("unknown plist type: %r" % elem.tag)
>         return parser.root[0].text
> 
>     def __init__(self, infile):
>         self.root = self.load(infile)
> 
> if __name__ == "__main__":
>     parser = argparse.ArgumentParser(
>         description="Parse an iTunes library file to a set of CSV files "
>                     "suitable for import to a database.")
>     parser.add_argument('infile', nargs='?', type=argparse.FileType('r'),
>                         default=sys.stdin)
>     args = parser.parse_args()
>     print('Infile = ', args.infile)
>     library = Library(args.infile)
> 
> 
> My input file (reduced to home in on the error) is:
> 
> 
>  snip -
> 
> 
> 
> 
>   
>   15078
>   
>   NamePart 2. The Death Of Enkidu. 
> Skon Přitele Mého Mne Zdeptal Težče
>   
>   
> 
> 
> 
>  snip 
> 
> I'm getting an error on one part of the XML:
> 
> 
>  File "C:\Users\digit\Anaconda3\lib\encodings\cp1252.py", line 23, in decode
> return codecs.charmap_decode(input,self.errors,decoding_table)[0]
> 
> UnicodeDecodeError: 'charmap' codec can't decode byte 0x8d in position 202: 
> character maps to <undefined>
> 
> 
> I suspect the issue is that it's using cp1252.py, which I don't think is 
> UTF-8 as specified in the XML prolog. Is this an iterparse problem, or am I 
> using it wrongly?

You are not showing enough of the stack trace to properly diagnose this
problem, but seeing your code, my educated guess is that the problem is the
encoding of the file *path name* and not the *content* of the XML file.
I.e. it probably cannot even open the file.

Stefan



Re: Getting Unicode decode error using lxml.iterparse

2018-05-23 Thread Stefan Behnel
dieter schrieb am 23.05.2018 um 08:25:
> If the encoding is not specified, "lxml" will try to determine it
> and finally defaults to "utf-8" (which seems to be the correct encoding
> for your case).

Being an XML parser, it does not do that. XML parsers are designed to
reject non-wellformed content, and that includes anything that cannot be
decoded.

In short, if no encoding is specified, then it's UTF-8, but if there is an
XML declaration that specifies an encoding, then it uses that encoding.

Here, the encoding is specified as UTF-8, so that's what the parser uses.

Note, however, that the library that the OP uses is not lxml but xml.etree,
i.e. the ElementTree XML support in the standard library.
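
The parser can only apply the declared encoding if it gets to see the raw
bytes. A minimal sketch with xml.etree, assuming in-memory data in place of
the real library file (for a file on disk, opening it with mode "rb" has
the same effect); the XML content here is made up:

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the input file, encoded as declared.
xml_bytes = (
    '<?xml version="1.0" encoding="UTF-8"?>\n'
    '<plist><dict><key>Name</key>'
    '<string>Skon Přitele Mého</string></dict></plist>'
).encode("utf-8")

# Binary input: the parser decodes the bytes itself, following the XML
# declaration, so no platform default encoding gets involved.
names = []
for event, elem in ET.iterparse(io.BytesIO(xml_bytes)):
    if elem.tag == "string":
        names.append(elem.text)

print(names)  # ['Skon Přitele Mého']
```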

Stefan



Re: Writing a C extension - borrowed references

2018-03-21 Thread Stefan Behnel
Tom Evans via Python-list schrieb am 20.03.2018 um 18:03:
> On Tue, Mar 20, 2018 at 4:38 PM, Chris Angelico wrote:
>> BTW, have you looked into Cython? It's smart enough to take care of a
>> lot of this sort of thing for you.
> 
> I did a bit; this work is to replace our old python 2 SAML client,
> which used python-lasso and python-libxml2, both packages that are
> built as part of building the C library and thus an utter PITA to
> package for different versions of python (something our Infra team
> were extremely reluctant to do). When the latest (PITA to build)
> version of python-lasso started interfering with python-xmlsec
> (validating an invalid signature was causing segfaults), I got fed up
> of fighting it.

If rewriting is an option (just mentioning it ;) ), something like this
would probably be based on lxml these days. It also comes with a
C/Cython-level API, in case you need to a) do some kind of low-level tree
processing or b) integrate with some external library.


> I actually also maintain a C version of the same code, using the same
> libraries, so porting those few segments of code to Python/C seemed
> more expedient than rewriting in Cython. I'm not writing an API to
> these libraries, just a few functions.

Happy if that works for you, but let me still add a quick note that, from
my experience, rewriting C-API code in Cython tends to be a surprisingly
quick thing to do (it's basically just reverse engineering the Python code
from the C-API code), but it's a one-time investment that can reduce the
long-term maintenance cost a lot, often improves the performance, and
usually also leads to nicer Python APIs, as what you get in the end is more
obvious from the code, so you can focus on the handling more easily.

Stefan



Re: LXML: can't register namespace

2018-03-09 Thread Stefan Behnel
Peter Otten schrieb am 09.03.2018 um 14:11:
> Stefan Behnel wrote:
> 
>> Andrew Z schrieb am 07.03.2018 um 05:03:
>>> Hello,
>>>  with 3.6 and latest greatest lxml:
>>>
>>> from lxml import etree
>>>
>>> tree = etree.parse('Sample.xml')
>>> etree.register_namespace('','http://www.example.com')
>>
>> The default namespace prefix is spelled None (because there is no prefix
>> for it) and not the empty string.
> 
> Does that mean the OP shouldn't use register_namespace() at all or that he's 
> supposed to replace "" with None?

It meant neither of the two, but now that you ask, I would recommend the
first. ;)

An application global setup for the default namespace is never a good idea,
thus my question regarding the actual intention of the OP. Depending on the
context, the right thing to do might be to either not care at all, or to
not use the default namespace but a normally prefixed one instead, or to
define a (default) namespace mapping for a newly created tree, as shown in
the namespace tutorial.

http://lxml.de/tutorial.html#namespaces

Usually, not caring about namespace prefixes is the best approach. Parsers,
serialisers and compressors can deal with them perfectly and safely, humans
should just ignore the clutter, pitfalls and complexity that they introduce.

Stefan



Re: LXML: can't register namespace

2018-03-09 Thread Stefan Behnel
Steven D'Aprano schrieb am 09.03.2018 um 12:41:
> On Fri, 09 Mar 2018 10:22:23 +0100, Stefan Behnel wrote:
> 
>> Andrew Z schrieb am 07.03.2018 um 05:03:
>>> Hello,
>>>  with 3.6 and latest greatest lxml:
>>>
>>> from lxml import etree
>>>
>>> tree = etree.parse('Sample.xml')
>>> etree.register_namespace('','http://www.example.com')
>>
>> The default namespace prefix is spelled None (because there is no prefix
>> for it) and not the empty string.
> 
> Is that documented somewhere?

http://lxml.de/tutorial.html#namespaces


> Is there a good reason not to support "" as the empty prefix?

Well, the "empty prefix" is not an "empty" prefix, it's *no* prefix. The
result is not ":tag" instead of "prefix:tag", the result is "tag".

But even ignoring that difference, why should the API support two ways of
spelling the same thing, and thus encourage users to write diverging code?
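
As a small sketch of the difference (assuming lxml is installed; the element
name "root" and the prefix "ex" are made up for illustration), a per-tree
namespace mapping with None as the key yields an unprefixed tag, while a
string key yields a prefixed one:

```python
from lxml import etree

URI = "http://www.example.com"

# None as the key means "no prefix": the namespace becomes the default one.
default_ns = etree.Element("{%s}root" % URI, nsmap={None: URI})

# A normal string key produces a prefixed name instead.
prefixed = etree.Element("{%s}root" % URI, nsmap={"ex": URI})

default_xml = etree.tostring(default_ns)
prefixed_xml = etree.tostring(prefixed)
```

The mapping is attached to the tree being created, so no application-global
registration is involved.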

Stefan



Re: LXML: can't register namespace

2018-03-09 Thread Stefan Behnel
Andrew Z schrieb am 07.03.2018 um 05:03:
> Hello,
>  with 3.6 and latest greatest lxml:
> 
> from lxml import etree
> 
> tree = etree.parse('Sample.xml')
> etree.register_namespace('','http://www.example.com')

The default namespace prefix is spelled None (because there is no prefix
for it) and not the empty string.


> causes:
> Traceback (most recent call last):
>   File "/home/az/Work/flask/tutorial_1/src/xml_oper.py", line 16, in
> <module>
> etree.register_namespace('','http://www.example.com')
>   File "src/lxml/etree.pyx", line 203, in lxml.etree.register_namespace
> (src/lxml/etree.c:11705)
>   File "src/lxml/apihelpers.pxi", line 1631, in lxml.etree._tagValidOrRaise
> (src/lxml/etree.c:35382)
> ValueError: Invalid tag name ''
> 
> partial Sample.xml:
> 
> 
> http://www.example.com";>
>  
>md_status_nonpro="true" type="INDIVIDUAL" prefix="jadoe">
> 
> 
> 
> it seems to not be happy with the empty tag .
> But i'm not sure why and how to go about it.

Could you explain why you want to do that?

Stefan



Re: How to make Python run as fast (or faster) than Julia

2018-02-23 Thread Stefan Behnel
Steven D'Aprano schrieb am 22.02.2018 um 11:59:
> https://www.ibm.com/developerworks/community/blogs/jfp/entry/Python_Meets_Julia_Micro_Performance?lang=en

Thanks for sharing, Steven.

While it was already suggested between the lines in some of the replies,
I'd like to emphasise that the combination of timeit and result caching
(memoizing) in this post is misleading and not meaningful. It actually
shows a common mistake that easily happens when benchmarking code.

Since timeit repeats the benchmark runs and takes the minimum, it will
*always* return the time it takes to look up the final result in the cache,
and never report the actual performance of the code that is meant to be
benchmarked here. From what I read, this was probably not intended by the
author of the post.
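
A minimal sketch of the pitfall, using a memoized Fibonacci function as an
assumed stand-in for the code in the blog post (the function and the numbers
are mine, not taken from the post):

```python
import timeit
from functools import lru_cache

@lru_cache(maxsize=None)
def fib(n):
    return n if n < 2 else fib(n - 1) + fib(n - 2)

# The first call does the real work and fills the cache.
fib(20)
misses_before = fib.cache_info().misses
hits_before = fib.cache_info().hits

# Every timed call below is answered straight from the cache.
timeit.repeat(lambda: fib(20), number=1000, repeat=3)
misses_after = fib.cache_info().misses
hits_gained = fib.cache_info().hits - hits_before

def run_cold():
    # Clear the cache first so that each run starts from scratch.
    fib.cache_clear()
    return fib(20)

# This measures the computation from a cold cache, not a single lookup.
cold_time = min(timeit.repeat(run_cold, number=100, repeat=3))
```

The miss counter does not move during the first timing run, which shows that
the function body was never executed while the clock was running.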

I have been guilty of this myself more than once, and I have seen it on many
occasions. I've even seen proper Cython benchmark code that a C compiler can
fully analyse as static and replace with a constant, producing miraculous
speedups that would never translate to any real-world gains. These things
happen, but they are mistakes. All they tell us is that
we must always be careful when evaluating benchmarks and their results, and
to take good care that the benchmarks match the intended need, which is to
help us understand and solve a real-world performance problem [1].

Stefan


[1] That's also the problem in the post above and in the original
benchmarks it refers to: there is no real-world problem to be solved here.
Someone wrote slow code and then showed that one language can evaluate that
slow code faster than another. Nothing to see here, keep walking...



Re: Help on convert PyObject to string (c) Python 3.6

2018-02-19 Thread Stefan Behnel
Jason Qian via Python-list schrieb am 04.02.2018 um 17:52:
>This is the case of calling python from c and the python function  will
> return a string.

Hi Jason,

I noticed that you ran into a couple of problems using the C-API, so I
suggest looking into Cython instead. It basically translates Python to C
and supports optional static usage of C and C++ data types, so you get all
the native code interaction *and* all the Python interaction and features,
without having to go through the hassle of learning how the Python
internals work (and why they don't work for you). The code it generates is
probably also faster and safer than what you are currently writing (no
offence, just experience from reading and writing a lot of such code).

http://cython.org/

Stefan



Re: c code generator from python

2018-02-19 Thread Stefan Behnel
bhattacharya.kush...@gmail.com schrieb am 17.01.2018 um 12:03:
> Is there any python framework or any tool as  which can generate C code from 
> python code as it is .

http://cython.org/

Stefan



Re: Return str to a callback raise a segfault if used in string formating

2017-10-14 Thread Stefan Behnel
Vincent Vande Vyvre schrieb am 13.10.2017 um 13:18:
> Le 13/10/17 à 12:39, Paul Moore a écrit :
>> As a specific suggestion, I assume the name of the created file is a
>> string object constructed in the C extension code, somehow. The fact
>> that you're getting the segfault with some uses of that string
>> (specifically, passing it to %-formatting) suggests that there's a bug
>> in the C code that constructs that string. That's where I'd start by
>> looking.

Absolutely.


> That was my first idea, because I can verify the instance of PyUnraw is not
> destroyed when I use the file name, but I was in trouble by the usage of
> the file name in string formatting.
> 
> For example I can use the file name into the slot i.e. shutil.copy(fname,
> "path/renamed.tiff")
> The file is correctly copied.
> 
> In fact, I can do anything with the file name except use it in string
> formatting, then your approach is probably a good way.
> 
> Into the CPython part I have a c-string pythonized by:
>     temp = self->outfname;
>     self->outfname = PyUnicode_FromString(ofname);
>     Py_XDECREF(temp);
> 
> and exposed to Python with:
> static PyMemberDef PyUnraw_members[] = {
>     {"out_file", T_OBJECT_EX, offsetof(PyUnraw, outfname), 0,
>  "Path of the decoded file"},

One more suggestion, in case this is actually your own C code: it's much
easier to write extension modules in Cython than in plain C and C-API code.
Much easier. There is a learning curve, sure, but it unburdens you from so
many pitfalls and boilerplate code that it's always worth switching. And
you also get faster code as a side-effect. How's that for a tradeoff. :)

Stefan
(Cython core developer)



Re: Parentheses (as after "print")

2017-09-26 Thread Stefan Behnel
Stefan Ram schrieb am 26.09.2017 um 17:56:
>   Why do we newbies write »print 2«? Here's another hint.
>   This is an original transcript of what happened to me today:
> 
> |>>> import( operator )
> |  File "<stdin>", line 1
> |import( operator )
> |  ^
> |SyntaxError: invalid syntax
> |
> |>>> import operator
> |
> |>>> help operator
> |  File "<stdin>", line 1
> |help operator
> |^
> |SyntaxError: invalid syntax
> |
> |>>> help( operator )
> |Help on module operator:
> 
>   What happened? I woke up today in parens mood. So I typed:
> 
> import( operator )
> 
>   Python told me that I should type:
> 
> import operator
> 
>   . Fine, Python conditioned me to omit the parens. 
>   So now I was in noparens mood. So I typed:
> 
> help operator
> 
>   . Oops!

But would you also write this?

    for(i in [1,2,3]): ...

    def(func(a,b,c)):
        return(a+b+c)


>   "Don't make me think!"

In language design, some things are worth being keywords, while others are
not. Having fewer keywords is generally a good thing, but for some
functionality, the parser/compiler needs to be able to safely detect and
process it, so some things really need to be keywords.

An import is an assignment, for example. It stores a reference to the
module (or its attributes) in variables. A function call cannot do an
assignment. The same applies to function definitions and loops. They all do
assignments, or even change the program execution flow.

print() and help() are definitely not worth being keywords. They do not
impact the program flow, they don't do any assignments, nothing. That's why
they are simple functions.
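
A short sketch of the distinction, using only the standard library (the
variable names are mine): the import statement binds a name by itself, while
the function form can only return the module and leaves the assignment to
the caller.

```python
import importlib
import keyword

# "import" is a statement: it binds the name "operator" in this namespace.
import operator

# A function call cannot bind a name for us. It returns the module, and the
# assignment has to be written out explicitly.
op = importlib.import_module("operator")
same_module = op is operator

# print is an ordinary object, so it can be stored and passed around.
printer = print

is_import_keyword = keyword.iskeyword("import")
is_print_keyword = keyword.iskeyword("print")
```

In Python 3, keyword.iskeyword() confirms that "import" is reserved and
"print" is not.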

Stefan



Re: How does CPython build it's NEWS or changelog?

2017-09-21 Thread Stefan Behnel
Hartmut Goebel schrieb am 21.09.2017 um 10:59:
> I just discovered that CPython now uses Misc/NEWS.d/next to collect
> changes and there are a lot of Misc/NEWS/*.rst files for the respective
> released version. I'm investigating whether to adopt this for PyInstaller.
> 
> What is the tooling for this? Is there some documentation, maybe a
> mailing list discussion or a bug report?

https://docs.python.org/devguide/committing.html#what-s-new-and-news-entries

https://github.com/larryhastings/blurb

Stefan



Re: Even Older Man Yells At Whippersnappers

2017-09-19 Thread Stefan Behnel
Stefan Ram schrieb am 19.09.2017 um 17:00:
> D'Arcy Cain  writes:
>> of course, I use calculators and computers but I still understand the 
>> theory behind what I am doing.
> 
>   I started out programming in BASIC. Today, I use Python,
>   the BASIC of the 21st century. Python has no GOTO, but when
>   it is executed, its for loop eventually is implemented using
>   a GOTO-like jump instruction. Thanks to my learning of BASIC,
>   /I/ can have this insight. Younger people, who never learned
>   GOTO, may still be able to use Python, but they will not 
>   understand what is going on behind the curtains. Therefore, for
>   a profound understanding of Python, everyone should learn BASIC
>   first, just like I did!

http://entrian.com/goto/

Stefan



Re: Stdlib, what's in, what's out

2017-09-19 Thread Stefan Behnel
John Ladasky schrieb am 19.09.2017 um 08:54:
> I have come to understand from your other posts that adding something to
> the stdlib imposes significant constraints on the release schedules of
> those modules.  I can appreciate the hassle that might cause.  Still,
> now I wonder what I might be missing.

There are many packages on PyPI that reimplement functionality of the
stdlib in some "better" way, by their own definition of "better". Some are
faster, some are more feature-rich, some have a better API, some focus on
making specific special cases faster/easier/whatever.

The stdlib is there to provide a base level of functionality. That base
level tends to be much higher up than in most other programming languages,
but from the point of view of Python, it's still just a base level, however
comfortable it might be.

If you need specific features, more speed, can't live with a certain API or
feel that you are wasting too much developer time by doing something the
way you always did it, search PyPI for something "better" by your own
definition at a given time.

If you can live with what the stdlib provides, stick to it. Keeping foreign
dependencies low is also "better" in some cases.

Stefan



Re: Case-insensitive string equality

2017-09-05 Thread Stefan Behnel
Steve D'Aprano schrieb am 02.09.2017 um 02:31:
> - the German eszett, ß, which has two official[1] uppercase forms: 'SS'
> and an uppercase eszett

I wonder if there is an equivalent to Godwin's Law with respect to
character case related discussions and the German ß.

Stefan



Re: Python 3 removes name binding from outer scope

2017-07-25 Thread Stefan Behnel
Ben Finney schrieb am 25.07.2017 um 08:34:
> Ethan Furman writes:
> 
>> Something like:
>>
>> try:
>> 
>> except ZeroDivisionError as dead_exc:
>> exc = dead_exc
>> 
>> 
>> print(text_template.format(exc=exc)
> 
> That strikes me as busy-work; the name in the ‘except’ clause already
> *has* the object, and is a servicable name already.
> 
> Having to make another name for the same object, merely to avoid some
> surprising behaviour, is IMO un-Pythonic.

It's an extremely rare use case and keeping the exception alive after
handling has clear drawbacks in terms of resource usage (exception
information, tracebacks, frames, local variables, chained exceptions, ...)

This tradeoff was the reason why this was changed in Py3k at the time,
together with the introduction of exception chaining (and some other
cleanups in that corner).

Basically, it's better to save resources by default and let users
explicitly keep them alive if they still need them, than to implicitly hold
on to them in a deep corner of CPython (sys.exc_info()) and let users
figure out how to release them explicitly if they find out that they hurt
and then additionally manage to debug where they are stored. Py2.x did the
latter, and guess how many users knew about it?
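
For reference, a minimal sketch of the Python 3 behaviour and of keeping the
exception alive explicitly (the function names are made up for this example):

```python
def name_survives_except_block():
    try:
        1 / 0
    except ZeroDivisionError as exc:
        pass
    try:
        exc  # the name was unbound when the except block ended
    except NameError:  # UnboundLocalError is a subclass of NameError
        return False
    return True

def keep_exception_explicitly():
    saved = None
    try:
        1 / 0
    except ZeroDivisionError as exc:
        # An explicit second name keeps the exception, its traceback and the
        # referenced frames alive for as long as "saved" exists.
        saved = exc
    return saved

survives = name_survives_except_block()
kept = keep_exception_explicitly()
```

Holding on to the exception is therefore a visible decision in the code
rather than something that happens implicitly.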

Stefan



Re: Progress on the Gilectomy

2017-06-10 Thread Stefan Behnel
Serhiy Storchaka schrieb am 11.06.2017 um 07:11:
> 10.06.17 15:54, Steve D'Aprano пише:
>> Larry Hastings is working on removing the GIL from CPython:
>>
>> https://lwn.net/Articles/723949/
>>
>> For those who don't know the background:
>>
>> - The GIL (Global Interpreter Lock) is used to ensure that only one piece of
>> code can update references to an object at a time.
>>
>> - The downside of the GIL is that CPython cannot take advantage of
>> multiple CPU
>> cores effectively. Hence multi-threaded code is not as fast as it could be.
>>
>> - Past attempts to remove the GIL caused unacceptable slow-downs for
>> single-threaded programs and code run on single-core CPUs.
>>
>> - And also failed to show the expected performance gains for multi-threaded
>> programs on multi-core CPUs. (There was some gain, but not much.)
>>
>>
>> Thanks Larry for your experiments on this!
> 
> The GIL is also used to guarantee the atomicity of many operations and the
> consistency of internal structures without using additional locks. Many
> parts of the core and the stdlib would just not work correctly in a
> multithreaded environment without the GIL.

And the same applies to external extension modules. The GIL is really handy
when it comes to reasoning about safety and correctness of algorithms under
the threat of thread concurrency. Especially in native code, where the
result of an unanticipated race condition is usually a crash rather than an
exception.

Stefan



Re: Python Events in 2017, Need your help.

2017-02-10 Thread Stefan Behnel
Hi!

Germany has three major Python events planned this year:

- PyCon-Web in München (May 27-28th)
- Python-Camp in Köln (April 8-9th)
- PyCon-DE in Karlsruhe (October, dates TBA).

http://pyconweb.org/

https://python-verband.org/informieren/events/pythoncamp-2017

Stefan


Stephane Wirtel via PSF-Community schrieb am 09.01.2017 um 10:54:
> Dear Community,
> 
> For the PythonFOSDEM [1] on 4th and 5th February in Belgium, I would like
> to present some slides with the Python events around the World.  Based on
> https://python.org/events, I have noted that there are missing events, for
> example:
> 
> * PyCon Otto: Italy
> * PyCon UK: United Kingdom
> * PyCon CA: Canada
> * PyCon Ireland: Ireland
> * PyCon France: France
> 
> Some of these events are not yet announced and I understand they are in the
> second semester, and thus, they don't know the location and the dates,
> except for PyCon Otto (April).
> 
> In fact, I have noted that we know some big events in the Python community
> (for example: PyCon US and EuroPython) but do you know the other events,
> maybe the local events, PyCon IE, PyCon UK or PyCon IT?
> 
> I like to know where there is a PyCon or a Django Conf or a PyData Event.
> 
> In fact, I think we can help the Python Community if we submit all the
> events in https://python.org/events.
> 
> This page has been created by the PSF and is maintained by some volunteers.
> 
> I know this list of events:
> * PyCon Cameroon : 20-23 Jan, Cameroon
> * PythonFOSDEM : 4-5 Feb, Belgium
> * PyCon Colombia : 10-12 Feb, Colombia
> * PyCon Pune : 16-20 Feb, India
> * Swiss Python Summit : 17-18 Feb, Switzerland
> * IrPyCon : 17-18 Feb, Iran
> * PyCon SK : 10-13 Mar, Slovakia
> * Django Europe : 3-8 Apr, Italy
> * PyCon Otto : 6-9 Apr, Italy
> * Python Sudeste : 5-7 May, Brazil
> * GeoPython : 8-11 May, Switzerland
> * PyCon US : 17-26 May, USA
> * EuroPython : July, Italy
> * PyCon AU : 3-9 Aug, Australia
> * PyCon UK : September, United Kingdom
> * PyCon CA : November, Canada
> * PyCon Ireland : October, Ireland
> * PyCon FR : October/November, France
> 
> And you ?
> Please, could you check on https://www.python.org/events/ , if you are an
> organizer, please add your event.
> 
> If you think there is a missing event, please, send me the info via
> [email](mailto:steph...@wirtel.be) or via my [twitter
> account](https://twitter.com/matrixise) and I will add it on my slides.
> 
> I would like to present your event.
> 
> Thank you so much for your help.
> 
> Stephane Wirtel
> 
> [1] https://www.python-fosdem.org
> 




Re: windows utf8 & lxml

2016-12-26 Thread Stefan Behnel
Hi!

Sayth Renshaw schrieb am 20.12.2016 um 12:53:
> I have been trying to get a script to work on windows that works on mint. The 
> key blocker has been utf8 errors, most of which I have solved.
> 
> Now however the last error I am trying to overcome, the solution appears to 
> be to use the .decode('windows-1252') to correct an ascii error.
> 
> I am using lxml to read my content and decode is not supported are there any 
> known ways to read with lxml and fix unicode faults?
> 
> The key part of my script is 
> 
> for content in roots:
>     utf8_parser = etree.XMLParser(encoding='utf-8')
>     fix_ascii = utf8_parser.decode('windows-1252')

This looks rather broken. Are you sure this is what your code looks like,
or did you just type this into your email while trying to strip down your
actual code into a simpler example?


>     mytree = etree.fromstring(
>         content.read().encode('utf-8'), parser=fix_ascii)

Note that lxml can parse from Unicode, so once you have decoded your data,
you can just pass it into the parser as is, e.g.

mytree = etree.fromstring(content.decode('windows-1252'))

This is not something I'd encourage since it requires a bit of back and
forth encoding internally and is rather memory inefficient, but if your
decoding is non-trivial, this might still be a viable approach.
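
A minimal sketch of the decode-then-parse approach, with the standard
library's xml.etree.ElementTree as a stand-in and a made-up one-element
document. The same call works with lxml.etree.fromstring(), with the caveat
that lxml rejects Unicode strings that still carry an XML encoding
declaration.

```python
import xml.etree.ElementTree as ET

# Bytes as they might come from a file written on Windows (sample data).
raw = "<name>K\u00f6ln</name>".encode("windows-1252")

# Decode explicitly, then hand the resulting str to the parser as is.
text = raw.decode("windows-1252")
root = ET.fromstring(text)

city = root.text
```

If the document declares its encoding correctly in the XML declaration, it
is usually simpler to pass the undecoded bytes and let the parser do the
decoding.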


> Without the added .decode my code looks like
> 
> for content in roots:
>     utf8_parser = etree.XMLParser(encoding='utf-8')
>     mytree = etree.fromstring(
>         content.read().encode('utf-8'), parser=utf8_parser)
> 
> However doing it in such a fashion returns this error:
> 
> UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 0: 
> invalid start byte

Same thing as above: I don't see how this error message matches the code
you show here. The exception you get might be a Python 2.x problem in the
first place.

Stefan



Re: Cython taking more time than regular Python

2016-09-19 Thread Stefan Behnel
Peter Otten schrieb am 19.09.2016 um 14:55:
> In [7]: %%cython
> def omega(int n):
>     cdef long i
>     cdef long result = 0
>     for i in range(n): result += i
>     return result
>...: 
> 
> In [8]: %timeit omega(10)
> 1 loops, best of 3: 91.6 µs per loop

Note that this is the worst benchmark ever. Any non-dumb C compiler will
happily apply Young Gauß and calculate the result in constant time.
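
For readers who don't know the anecdote: the sum 0 + 1 + ... + (n-1) has the
closed form n*(n-1)/2, so the loop can be replaced by a single expression.
A plain Python sketch of the equivalence (function names are mine):

```python
def omega_loop(n):
    # The loop from the benchmark, written in plain Python.
    result = 0
    for i in range(n):
        result += i
    return result

def omega_closed_form(n):
    # Gauss' formula for 0 + 1 + ... + (n - 1); empty ranges sum to 0.
    return n * (n - 1) // 2 if n > 0 else 0

loop_result = omega_loop(100000)
formula_result = omega_closed_form(100000)
```

A benchmark that an optimiser can reduce to the second form measures the
call overhead, not the loop.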

Stefan




Re: C Python extension to export an Function

2016-09-01 Thread Stefan Behnel
Ganesh Pal schrieb am 01.09.2016 um 17:24:
> Thanks stefan and  Gollwitzer  , good to know there are many ways to do this
> i.e via cython or SWIG   but  the C/Python API
>  is probably the most widely used method

It certainly was, years ago, but I honestly doubt that it still is. I don't
think there is still that much manually written C-API code out there that
is actively maintained. Often enough, it's easier to rewrite the code in
Cython at some point, than to keep maintaining it in C over years.

Manually written C-API code is simply too difficult to maintain, and also
too difficult to get right in the first place. There are just too many ways
to introduce reference leaks, crashes and long standing unnoticed bugs
(believe me, I know what I'm talking about). Even experienced CPython core
devs still make mistakes here from time to time.


> - not for it’s simplicity but for the fact that you can manipulate python
> objects in your C code.

Neither SWIG nor Cython prevent you from doing that, although you'd usually
leave these things to Cython since it already generates faster code for
many operations than you would (or could) write by hand.

Stefan




Re: C Python extension to export an Function

2016-09-01 Thread Stefan Behnel
Ganesh Pal schrieb am 01.09.2016 um 14:30:
> On Thu, Sep 1, 2016 at 12:32 PM, dieter wrote:
>> Ganesh Pal writes:
>>> Iam pretty new to C Python extension , I was able to export few simple
>>> modules to python and it look like the cool thing to do ...
>>
>> Maybe, it is a good idea to have a look at "cython".
>>
>> "cython" is a compiler. It translates Python code enhanced with
>> special annotations into C. The annotations mostly tell the compiler
>> that something ("object", "method", "function", ...) should be at "C"
>> rather than "Python" level, thus avoiding much of Python's overhead
>> and allows to do things possible in "C" but not in "Python".
>>
>> Developing safe "C" extensions for Python is difficult. You
>> need some quite deep understanding of the Python-C interface
>> and must be very careful to observe all requirements (especially
>> those related to proper reference management).
>>
>> Developing "C" extensions with "cython" is much easier as
>> "cython" hides many of the complexities and takes care of most
>> requirements.
> 
> Really appreciate the reply and your suggestion on  trying to use "cython"
> ,  but my whole idea of using  "C" extension is to regular C codes .  We
> have bunch of C code that's already available and   C -Python seems to suit
> me better

From your response it's not obvious whether you are aware that Cython also
makes it substantially easier to *interface* CPython with external C code,
in the same way that it makes it easy (but not necessary) to *avoid*
writing C in the first place. So I thought I'd just mention that this is
not a reason to rule it out as an excellent option.

Stefan




Re: Helloworld with Python C extension

2016-08-29 Thread Stefan Behnel
Ganesh Pal schrieb am 29.08.2016 um 19:30:
> I need your input on the hello world program below. I am trying to add a
> python binding which will return the character for the given index. I am
> on Python 2.7 and Linux
> 
> Example :
> >>> string ='helloworld'
> >>> dda_hello(5)
> >>> 'w'
> 
>  /*
> + * Hello world example for python bindings
> + */
> +
> +static char* string = "helloworld";
> +char dda_hello(int i)
> + {
> +  return string[i];
> + }
> +
> +static PyObject *
> +py_dda_hello(PyObject *self, PyObject *args )
> +{
> +   int index;
> +   char char1;
> +   if (!PyArg_ParseTuple(args, "i", &index))
> +   return NULL;
> +   char1 = dda_hello(index);
> +   return Py_BuildValue("s",char1);
> +}
> +
> +/*
> 
> @@ -1674,6 +1705,10 @@ PyMethodDef xyz_methods[] = {
> +{"dda_hello", py_dda_hello, METH_VARARGS,
> +"Returns the character entered for a given index"},

Here's a Cython implementation (http://cython.org) of your example:

    cdef str string = "helloworld"

    def dda_hello(int i):
        return string[i]

It uses a lot less code than the C-implemented version, but is compatible
with Python 2 and Python 3 and avoids pitfalls like the crash you are
seeing, as well as raising a proper IndexError for invalid index arguments
(and it supports negative indexing). I also wouldn't be surprised if it's
visibly faster than your C implementation.
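
The behaviour described above can be tried out without compiling anything.
This is a plain Python stand-in for the Cython function (same body, minus
the C type declarations), not the compiled module itself:

```python
string = "helloworld"

def dda_hello(i):
    return string[i]

fifth = dda_hello(5)   # index 5 of "helloworld"
last = dda_hello(-1)   # negative indices count from the end

try:
    dda_hello(100)
    out_of_range_raises = False
except IndexError:
    # An invalid index raises an exception instead of reading stray memory.
    out_of_range_raises = True
```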

Unless your intention is to explicitly learn how to use the CPython C-API,
you should give Cython a try instead.

Stefan




Re: PEP suggestion: Uniform way to indicate Python language version

2016-08-21 Thread Stefan Behnel
Stefan Behnel schrieb am 22.08.2016 um 08:03:
> Steven D'Aprano schrieb am 22.08.2016 um 07:35:
>> if sys.version < '3':
>>     import mymodule2 as mymodule
>> else:
>>     import mymodule3 as mymodule
> 
> This condition is going to fail when Python 30.0 comes out.

Oh, sorry - make that Python 10.0, that's way closer! See? I got it wrong
because I failed to understand your hugely obfuscated code! ;)
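
A quick sketch of why the string comparison breaks and what to compare
instead. The version string "10.0.0" is hypothetical, and the helper name is
made up for this example:

```python
import sys

def looks_like_python2(version_string):
    # The check from the quoted code: a lexicographic string comparison.
    return version_string < "3"

# "1" sorts before "3", so a hypothetical Python 10 is taken for Python 2.
string_check_py10 = looks_like_python2("10.0.0")

# Tuples compare element by element as integers, so 10 > 3 as expected.
tuple_check_py10 = (10, 0) < (3,)

# The robust spelling for the running interpreter:
running_python2 = sys.version_info < (3,)
```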

Stefan




Re: PEP suggestion: Uniform way to indicate Python language version

2016-08-21 Thread Stefan Behnel
Steven D'Aprano schrieb am 22.08.2016 um 07:35:
> if sys.version < '3':
> import mymodule2 as mymodule
> else:
> import mymodule3 as mymodule

This condition is going to fail when Python 30.0 comes out.

Stefan




Re: usage of functools.partial in in parallelism

2016-07-31 Thread Stefan Behnel
Sivan Greenberg schrieb am 30.07.2016 um 23:15:
>  I'm wondering about the use of partial in writing parallel code. Is it
> quicker than re-evaluating arguments for a multiple session.get()'s method
> with different , for example (of requests) ?
> 
>  Or maybe it is used to make sure the arguments differ per each invocation
> ? (the parallel invocation is supposedly using tasks / co-routine support
> in Python 3.
> 
>  I can't publish the code I spotted that in.
> 
>  What are ups and downs of using them when are they in context?

I'm having difficulties in understanding what exactly you are asking here
and what you are comparing it with, but partial() is just a way of saying
"I want a single thing that calls *this* function with at least (or
exactly) *these* arguments whenever I call it". It's basically binding a
function and some arguments together into a nice package that the eventual
caller doesn't have to know any special details about.

There is more than one way to do that, but partial is a quick and
straightforward one that is commonly and widely used. Also for entry points when
running code in parallel or concurrently, but there's really nothing
special about that use case.
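
A minimal sketch of that packaging. fetch() is a hypothetical stand-in for
something like session.get(), and the URLs are placeholders; a thread pool
is used here only because it needs no further setup.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import partial

def fetch(session_name, url, timeout=10):
    # Hypothetical stand-in for a real request; it just reports its arguments.
    return "%s GET %s (timeout=%s)" % (session_name, url, timeout)

# Bind the session and the timeout once. Callers only supply the URL.
fetch_with_defaults = partial(fetch, "session-1", timeout=2)

urls = ["https://example.org/a", "https://example.org/b"]
with ThreadPoolExecutor(max_workers=2) as pool:
    # The pool does not need to know anything about sessions or timeouts.
    results = list(pool.map(fetch_with_defaults, urls))
```

A lambda or a small def would do the same job; partial just says it in one
line and keeps the bound arguments inspectable via .func, .args and
.keywords.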

Does that answer your question?

Stefan




Re: Quick poll: gmean or geometric_mean

2016-07-09 Thread Stefan Behnel
Ethan Furman schrieb am 09.07.2016 um 08:27:
> On 07/08/2016 10:49 PM, Random832 wrote:
>> On Sat, Jul 9, 2016, at 01:26, Steven D'Aprano wrote:
> 
>>> hmean and gmean
>>>
>>> harmonic_mean and geometric_mean
>>
>> The latter, definitely.
> 
> My preference is also for the latter.  However, if the rest of the module
> is filled with abbreviated names you may as well be consistent with them.

+1 for consistency, but I'm just fine with the short names. It's in the
statistics module after all, so the context is very narrow and clear and
people who don't know which to use or what the one does that they find in a
given piece of code will have to read the docs and maybe refresh their
rusty math memory anyway. Longer names don't help much with that.

If further clarity is needed in a given code context that uses a direct
name import, renaming the function at the same time is easy enough. I often
do that with "os.path.join", for example, which turns into "join_path" on
import. Same problem, easy solution.
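
The renaming trick, spelled out as a small sketch (the path components are
arbitrary sample values):

```python
import os.path

# Give the directly imported function a name that reads well on its own.
from os.path import join as join_path

config_path = join_path("etc", "app", "settings.ini")
```

The same pattern works for any short function name whose meaning depends on
its module.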

Stefan




Re: Which one is the best XML-parser?

2016-07-02 Thread Stefan Behnel
Random832 schrieb am 24.06.2016 um 15:09:
> On Fri, Jun 24, 2016, at 02:39, dieter wrote:
>> You want an incremental parser if the XML documents are so huge that
>> you must process them incrementally rather than have a data structure
>> representing the whole document (in memory). Incremental parsers
>> for XML are usually called "SAX" parsers.
> 
> You know what would be really nice? A "semi-incremental" parser that can
> e.g. yield (whether through an event or through the iterator protocol) a
> fully formed element (preferably one that can be queried with xpath) at
> a time for each record of a document representing a list of objects.
> Does anything like that exist?

http://lxml.de/parsing.html#incremental-event-parsing

https://docs.python.org/3/library/xml.etree.elementtree.html#pull-api-for-non-blocking-parsing
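
As a rough sketch of the second link, here is the standard library's pull
parser yielding one complete record element at a time while the input
arrives in chunks. The document structure and the chunk boundaries are made
up for this example.

```python
import xml.etree.ElementTree as ET

# The document arrives in pieces, split in the middle of a closing tag.
chunks = [
    "<records><record id='1'><name>a</name></rec",
    "ord><record id='2'><name>b</name></record></records>",
]

parser = ET.XMLPullParser(events=("end",))
records = []

for chunk in chunks:
    parser.feed(chunk)
    # Only elements whose end tag has been seen are reported here.
    for event, elem in parser.read_events():
        if elem.tag == "record":
            records.append((elem.get("id"), elem.findtext("name")))
            elem.clear()  # drop the content once the record is processed

parser.close()
```

With lxml, the elements handed out this way can additionally be queried with
XPath, as described behind the first link.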

Stefan




Re: How are you supposed to define subclasses in C?

2016-04-21 Thread Stefan Behnel
Random832 schrieb am 21.04.2016 um 18:35:
> I was trying to write a proof of concept on including descriptors (e.g.
> a "sys.recursionlimit" instead of set/get methods) in the sys module,
> and couldn't figure out how to "properly" define a type using
> PyType_FromSpecWithBases. Everything I tried just segfaulted. I ended up
> just calling PyObject_CallFunctionObjArgs((PyObject *)&PyType_Type, ...)
> but I assume there's a better way to do it. I couldn't find any examples
> or tutorial.

I suppose you might find Cython useful:

http://cython.org/

http://docs.cython.org/

In a nutshell, it lets you write beautiful Python code instead of ugly C
code full of leaks and crashes, and then takes care of making it faster
than the usual C-API calls you'd write by hand. It has some extended syntax
for extension types and other C-ish things to make their usage explicit.

Stefan




Re: I'd like to add -march=native to my pip builds

2016-04-08 Thread Stefan Behnel
Neal Becker schrieb am 08.04.2016 um 15:27:
> I'd like to add -march=native to my pip builds.  How can I do this?

First of all, make sure you don't install binary packages and wheels.
Changing the C compiler flags will require source builds.

Then, it should be enough to set the CFLAGS environment variable, e.g.

  CFLAGS="-O3 -march=native"  pip install  --no-use-wheel  numpy

Stefan




Re: Writing SOME class methods in C

2015-11-29 Thread Stefan Behnel
Oscar Benjamin schrieb am 18.11.2015 um 13:52:
> On 18 November 2015 at 07:50, Daniel Haude wrote:
>>
>> I'm trying to implement some (but not all) methods of a Python class in C.
>> What I've found on the Net is:
>>  - how to implement entire modules in C so that I can import that module and
>>use the C functions (successfully done it, too).
>>  - how to implement entire classes in C
> 
> I would suggest to use Cython here. You can write your class in Python
> (that will be compiled to C) and then call out to any C code from any
> of its methods.

Or, in fact, do the reverse: Implement the base class in Cython and inherit
from it in a Python class that extends it. That would give you a fast,
native extension type at the base and leaves you with all the freedom to
extend it in Python code or even natively in other Cython code.

I strongly recommend not to resort to writing real C code here (using the
C-API of CPython). It will be slower and will contain more bugs.

Stefan


-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Create a .lua file from Python

2015-10-02 Thread Stefan Behnel
jmp schrieb am 02.10.2015 um 11:03:
> Safety is like speed optimization, you care about it only when it can be a
> problem. And the vast majority (there's a recent trolling thread about the
> equivalent percentage of vast majority if you want to have fun) of python
> code may run on trusted networks. Meaning it's probable you are wrong when
> assuming security of a python snippet is a concern.

Writing code "for internal use only" is ok, but there is never a guarantee
that some of that code won't be reused elsewhere, in an entirely different
context. Or that someone comes up with the idea of adding a REST API
frontend, now that there is a command line interface [1]. If that happens,
I assure you that at least in some cases (be it the "vast majority" or not)
there will be no thorough security audit up-front. Because, you know - it's
code that works and is production proven already. Possibly for years and
years, and through generations of employees, all experienced and trusted.
What can possibly be wrong with such code?

So, it's acceptable to write such code under certain conditions, but at
least someone should leave a visible comment somewhere (as Peter rightfully
did in this case) that the input is not safely validated, so that future
generations of programmers can see immediately that a) security hasn't been
a concern when writing it and b) the author was in fact not a complete
moron, not knowing a bit about the basics of input validation.

It really helps in trust building to find such comments from time to time.

Stefan



[1] mainframes on the Internet, anyone?


-- 
https://mail.python.org/mailman/listinfo/python-list


Re: The Nikola project is deprecating Python 2.7 (+2.x/3.x user survey results)

2015-10-01 Thread Stefan Behnel
Chris Warrick schrieb am 01.10.2015 um 18:26:
> The Nikola developers decided to deprecate Python 2.7 support.

I wonder why it took the Nikola project so long to take that decision.
Python 3.3 came out almost exactly three(!) years ago and seems to have all
major features that they would require. Nikola's PyPI page claims support
of Python 3.3 for just about as long, since version 5.4 or so, which means
that all of their dependencies were already available back then.

It's a different thing for *libraries* that Python 2.x users still depend
on, but for an *application* that has all its (necessary) dependencies
available in Python 3.x, I can't see a general reason to keep supporting
both language versions.

Stefan


-- 
https://mail.python.org/mailman/listinfo/python-list


Re: XML Binding

2015-09-09 Thread Stefan Behnel
dieter schrieb am 09.09.2015 um 10:20:
> Palpandi writes:
>> Is it better to use pyxb than lxml?
>>
>> What are the advantages of lxml and pyxb?
> 
> "pyxb" has a different aim than "lxml".
> 
> "lxml" is a general purpose library to process XML documents.
> It gives you an interface to the document's resources (elements,
> attributes, comments, processing instructions) on a low level
> independent of the document type.

lxml's toolbox is actually larger than that. There's also lxml.objectify
which provides a Python object interface to the XML tree, similar to what
data binding would give you. And you can stick your own Element object
implementations into it if you feel a need to simplify the API itself
and/or adapt it to a given document format.

http://lxml.de/objectify.html

Stefan


-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Logging to a file from a C-extension

2015-08-26 Thread Stefan Behnel
Hi,

welcome to Python (and to this list). The usual way to reply here is
inline, after stripping anything that's not relevant to your reply.


AllanPfalzgraf schrieb am 25.08.2015 um 15:03:
> From: Stefan Behnel:
>> Al Pfalzgraf schrieb am 18.08.2015 um 15:07:
>>> If a logging file is opened at the level of a Python application,
>>> how would the log file name be communicated to a C-extension so that
>>>  logging from the extension would be sent to the same log file?
>> 
>> Writing to the file directly (as was suggested) may not be a good idea
>> as it would bypass the log filtering and formatting. Instead, I'd
>> suggest sending output to a normal Python Logger object.
>> 
>> This is obviously trivial in Cython (where you can just implement it
>> in Python code), but you can do the same in C with just the usual
>> C-API overhead.
> 
> You have understood my question.  I'm new to Python.  Could I use a
> Cython solution to get suggestions on just how to go about this in the C
> extension?  Otherwise could you suggest which C-API functions I should
> be looking at?

Well, my suggestion would be to write the extension in Cython instead of C,
simply because it allows you to care about what you want to achieve instead
of having to concentrate on C-API details like this which try to get in
your way and are difficult to master.

However, if you really want to use the CPython C-API directly, you have two
choices: write the logging setup in Python and execute it from a C string
using PyRun_SimpleString(), or reimplement what Python would do in C using
PyImport_AddModule, PyObject_GetAttr and PyObject_Call*(). The latter is
also what Cython does, except that it generates faster code.
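
As a rough sketch of the second option, here are the same steps spelled out
in Python, one line per C-API call you would need (import the module, look
up an attribute, call it). The helper name log_from_extension, the logger
names "myapp" and "myapp.ext" and the StringIO standing in for the log file
are all made up for illustration.

```python
import importlib
import io
import logging


def log_from_extension(logger_name, message):
    """Mimic, step by step, what the C code would do through the C-API."""
    # 1) get hold of the "logging" module (a module import in C)
    logging_module = importlib.import_module("logging")
    # 2) look up the factory function (PyObject_GetAttr in C)
    get_logger = getattr(logging_module, "getLogger")
    # 3) call it to obtain the Logger object (PyObject_Call* in C)
    logger = get_logger(logger_name)
    # 4) look up the logging method on that object and call it
    getattr(logger, "info")(message)


# The application configures logging as usual. Nothing here is extension specific.
log_buffer = io.StringIO()  # stands in for the application's log file
handler = logging.StreamHandler(log_buffer)
handler.setFormatter(logging.Formatter("%(name)s %(levelname)s %(message)s"))
app_logger = logging.getLogger("myapp")
app_logger.setLevel(logging.INFO)
app_logger.addHandler(handler)

# The "extension" logs through a child logger and inherits filtering and formatting.
log_from_extension("myapp.ext", "hello from the extension")
print(log_buffer.getvalue())
```

The point of going through a Logger object is that the extension never needs
to know the log file name at all; it only needs a logger name.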

Stefan


-- 
https://mail.python.org/mailman/listinfo/python-list


Re: how to handle cpu cache in python ( or fastest way to call a function once)

2015-08-23 Thread Stefan Behnel
Yuzhi Xu schrieb am 23.08.2015 um 08:10:
> I find out that python's VM seems to be very unfriendly with CPU-Cache.
> see:
> http://stackoverflow.com/questions/32163585/how-to-handle-cpu-cache-in-python-or-fastest-way-to-call-a-function-once
> http://stackoverflow.com/questions/32153178/python-functionor-a-code-block-runs-much-slower-with-a-time-interval-in-a-loop
> 
> for example:
> ***
> import time
> a = range(500)
> 
> sum(a)
> 
> for i in range(100):  # just to create a time interval, seems this disturb cpu cache?
>     pass
> 
> 
> st = time.time()
> sum(a)
> print (time.time() - st)*1e6
> 
> *
> time:> 100us
> 
> 
> another case:
> *
> import time
> a = range(500)
> 
> for i in range(10):
>     st = time.time()
>     sum(a)
>     print (time.time() - st)*1e6
> 
> *
> time:~ 20us
> 
> 
> we can see when running frequently, the code becomes much faster.

That does not seem like a straightforward deduction. In particular, the
interpretation that the CPU caching behaviour is to blame here seems rather
far-fetched.

My guess is that it rather has to do with CPython's internal object caching
or something at that level. However, given the absolute timings above, I
wouldn't bother too much finding it out. It's unlikely to hurt real-world
code. (And in fact, the more interesting case where things are happening
several times in a row rather than being a negligible constant one-time
effort seems to be substantially faster in your timings. Congratulations!)
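
For what it's worth, a single time.time() difference at the microsecond
scale is very noisy. A minimal sketch of a more robust measurement, assuming
the same sum() over 500 integers as in the quoted snippets, would use the
timeit module and take the best of several repetitions:

```python
import timeit

data = list(range(500))

# Run sum(data) `number` times per measurement, repeat the measurement
# several times and keep the best one to reduce the impact of noise.
number = 10000
timings = timeit.repeat("sum(data)", globals={"data": data},
                        repeat=5, number=number)
best_per_call_us = min(timings) / number * 1e6
print("best: %.2f us per call" % best_per_call_us)
```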


> is there a solution?

Is there a problem?

Stefan


-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Logging to a file from a C-extension

2015-08-19 Thread Stefan Behnel
Al Pfalzgraf schrieb am 18.08.2015 um 15:07:
> If a logging file is opened at the level of a Python application, how
> would the log file name be communicated to a C-extension so that logging
> from the extension would be sent to the same log file?

Writing to the file directly (as was suggested) may not be a good idea as
it would bypass the log filtering and formatting. Instead, I'd suggest
sending output to a normal Python Logger object.

This is obviously trivial in Cython (where you can just implement it in
Python code), but you can do the same in C with just the usual C-API overhead.

Stefan

-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Module load times

2015-08-13 Thread Stefan Behnel
Joseph L. Casale schrieb am 13.08.2015 um 18:56:
> I have an auto generated module that provides functions exported from a
> c dll. Its rather large and we are considering some dynamic code generation
> and caching, however before I embark on that I want to test import times.
> 
> As the module is all auto generated through XSL, things like __all__ are not
> used,  a consumer only imports one class which has methods for their use.
> 
> It is the internal supporting classes which are large such as the ctype
> function prototypes and structures.

How is the DLL binding implemented? Using "ctypes"? Or something else?

Obviously, instantiating a large ctypes wrapper will take some time. A
binary module would certainly be quicker here, both in terms of import time
and execution time. Since you're generating the code anyway, generating
Cython code instead shouldn't be difficult but would certainly yield faster
code.


> My concern is simply reloading this in Python 3.3+ in a timeit loop is not
> accurate. What is the best way to do this?

What makes you think the import might be a problem? That's a one-time
thing. Or is your application a command-line tool or so that needs to start
and terminate quickly?
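
If you do want a number for it, one way to avoid the reload problem is to
time the import in a fresh interpreter each time. This is only a sketch:
the helper name is made up, and "json" stands in for your generated module.

```python
import subprocess
import sys
import time


def time_fresh_interpreter(statement, runs=3):
    """Best wall-clock time (in seconds) for running `statement` in a new process."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.check_call([sys.executable, "-c", statement])
        timings.append(time.perf_counter() - start)
    return min(timings)


# Interpreter startup alone, then startup plus the import under test.
baseline = time_fresh_interpreter("pass")
with_import = time_fresh_interpreter("import json")
print("startup only: %.3f s, with import: %.3f s" % (baseline, with_import))
```

The difference between the two numbers is a rough estimate of the import
cost, since no module cache survives between the runs.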

Stefan


-- 
https://mail.python.org/mailman/listinfo/python-list


Re: New module (written in C) for using the high-precision QD library

2015-07-31 Thread Stefan Behnel
Chris Angelico schrieb am 31.07.2015 um 09:37:
> On Fri, Jul 31, 2015 at 5:26 PM, Stefan Behnel wrote:
>> Your C code seems to be only about 1500 lines, not too late to translate
>> it. That should save you a couple of hundred lines and at the same time
>> make it work with Python 3 (which it currently doesn't, from what I see).
> 
> To what extent does Cython make this easier? The biggest barrier I
> would expect to see is the bytes/text distinction

Yes, that tends to be a barrier. Cython is mostly just Python, so you can write

    if isinstance(s, unicode):
        s = (<unicode>s).encode('utf8')

and be happy with it ("<unicode>" is a cast in Cython). Such simple code looks
uglier when spelled out using the C-API and wouldn't be a single CPU cycle faster.

But there's also the PyInt/PyLong unification, which can easily get in the
way for a number processing library. In Cython, you can write

    if isinstance(x, (int, long)):
        try:
            c_long = <long>x
        except OverflowError:
            ...  # do slow conversion of large integer here
        else:
            ...  # do fast conversion from c_long here

or something like that and it'll work in Py2.6 through Py3.5 because Cython
does the necessary adaptations internally for you. This code snippet
already has a substantially faster fast-path than what the OP's code does
and it will still be much easier to tune later, in case you notice that the
slow path is too slow after all.

And then there are various helpful little features in the language like,
say, C arrays assigning by value, or freelists for extension types using a
decorator. The OP's code would clearly benefit from those, if only for
readability.

Python is much easier to write and maintain than C. Cython inherits that
property and expands it across C data types. And it generates C code for
you that automatically adapts to the different Python versions in various
ways, both in terms of compatibility and performance.

Stefan


-- 
https://mail.python.org/mailman/listinfo/python-list


Re: New module (written in C) for using the high-precision QD library

2015-07-31 Thread Stefan Behnel
baruc...@gmail.com schrieb am 30.07.2015 um 22:09:
> It is written in pure C with the CPython C-API in order to get the highest 
> possible speed.

This is a common fallacy. Cython should still be able to squeeze another
bit of performance out of your wrapper for you. It tends to know the C-API
better than you would think, and it does things for you that you would
never do in C. It also helps in keeping your code safer and easier to maintain.

Your C code seems to be only about 1500 lines, not too late to translate
it. That should save you a couple of hundred lines and at the same time
make it work with Python 3 (which it currently doesn't, from what I see).

Stefan


-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Foo.__new__ is what species of method?

2015-07-13 Thread Stefan Behnel
Steven D'Aprano schrieb am 14.07.2015 um 06:54:
> On Tuesday 14 July 2015 14:45, Ben Finney wrote:
>> The Python reference says of a class ‘__new__’ method::
>>
>> object.__new__(cls[, ...])
>>
>> Called to create a new instance of class cls. __new__() is a static
>> method (special-cased so you need not declare it as such) that takes
>> the class of which an instance was requested as its first argument.
> 
> This is correct. __new__ is a static method and you need to explicitly 
> provide the cls argument:

And it needs to be that way in order to allow superclass calls in a
subclass's __new__ method:

  class Super(object):
      def __new__(cls):
          return object.__new__(cls)

  class Sub(Super):
      def __new__(cls):
          return Super.__new__(cls)

If it was a classmethod, it would receive the class you call it on as first
argument (i.e. "Super" and "object" above), not the class you want to
instantiate (i.e. "Sub" or "Super").
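
A small runnable sketch of the difference. The "seen" list and the
Base/Derived pair with its make() classmethod are only there for
illustration.

```python
seen = []


class Super(object):
    def __new__(cls):
        seen.append(("Super.__new__", cls.__name__))
        return object.__new__(cls)


class Sub(Super):
    def __new__(cls):
        seen.append(("Sub.__new__", cls.__name__))
        # Called on Super, but cls is still Sub because it is passed explicitly.
        return Super.__new__(cls)


obj = Sub()


# Contrast: a classmethod rebinds cls to the class it is called on.
class Base(object):
    @classmethod
    def make(cls):
        return cls.__name__


class Derived(Base):
    @classmethod
    def make(cls):
        return Base.make()  # cls is Base inside this call, not Derived


print(type(obj).__name__, seen, Derived.make())
```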

Stefan


-- 
https://mail.python.org/mailman/listinfo/python-list


Re: enumerate XML tags (keys that will become headers) along with text (values) and write to CSV in one row (as opposed to "stacked" values with one header)

2015-06-28 Thread Stefan Behnel
Denis McMahon schrieb am 26.06.2015 um 09:44:
> xml data is an unordered list, and are trying to assign an order to it.
> 
> If the xml data was ordered, either each tag would be different, or each 
> tag would have an attribute specifying a sequence number.

XML is not unordered. The document order is well defined and entirely
obvious from the data. Whether this order is relevant and has a meaning or
not is, however, not part of XML itself but is left to the semantics of the
specific document format at hand. Meaning, XML document formats can choose
to ignore that order and define it as irrelevant. That doesn't mean it's
not there for a given document, but it may mean that a re-transmission of
the same document would be allowed to use a different order without
changing the information.

This property applies to pretty much all structured data formats and not
just XML, by the way, also to CSV and other tabular formats.
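
To illustrate, a parser hands you the children in document order, whether
or not the format assigns a meaning to it. The tiny document below is made
up for the example.

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring(
    "<row><name>x</name><value>1</value><name>y</name><value>2</value></row>"
)

# Iterating over an element yields its children in document order.
tags_in_document_order = [child.tag for child in doc]
texts_in_document_order = [child.text for child in doc]
print(tags_in_document_order, texts_in_document_order)
```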

Stefan


-- 
https://mail.python.org/mailman/listinfo/python-list


Re: HOPE: A Python just-in-time compiler for astrophysical computations

2015-06-28 Thread Stefan Behnel
Michael Torrie schrieb am 26.06.2015 um 19:32:
> I've never heard of pythran; I'll have to check it out and see how it
> compares to the ever-growing crop of Python dialect compilers.

My feeling is that Python seems such a simple language on the surface that
people who want to write a special purpose "Python subset" compiler prefer
starting from scratch, rather than contributing to the existing tools. It
takes a while until they understand the actual size of that undertaking and
that's the point where most of these projects just die.

I don't mean all of them. If you have enough time and/or money, you can
certainly get a project going that's relevant enough for a critical
(special purpose) user base to provide an actual benefit. But then, why
invest that time into something completely new that requires major
long-term maintenance efforts, when implementing the desired feature in an
existing compiler would be a one-time investment with a much smaller
overall impact on further maintenance costs?

Not Invented Here Syndrome, I guess...

Stefan


-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Java to Python autoconverters

2015-06-12 Thread Stefan Behnel
Sebastian M Cheung via Python-list schrieb am 12.06.2015 um 13:36:
> Are these available? Any good ones to recommend?

I recommend not doing that. You'd end up with ugly and unidiomatic Python
code that's impossible to maintain, whereas you now (hopefully) have
somewhat idiomatic Java code that should be reasonably maintainable.

If you want to integrate Python code with Java code, take a look at Jython
instead. If that's not what you want, then feel free to unveil your intentions.

Stefan


-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Embedded Python and C Callback functions

2015-06-07 Thread Stefan Behnel
doc.mefi...@gmail.com schrieb am 07.06.2015 um 10:56:
> And I can't use Cython, because I have C++ module, and I have to use it.

That's not a valid reason. Cython supports C++ code just fine.

http://docs.cython.org/src/userguide/wrapping_CPlusPlus.html

Stefan


-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Sorting in reverse is not the same as sorting then reversing

2015-06-05 Thread Stefan Behnel
Steven D'Aprano schrieb am 05.06.2015 um 16:07:
> Sorting in reverse does not give the same result as sorting then reversing.
> 
> It's easiest to see with a key function:
> 
> py> a = ['fox', 'dog', 'DOG', 'cat', 'ape']
> py> b = a[:]
> py> a.sort(key=str.lower, reverse=True)
> py> b.sort(key=str.lower)
> py> b.reverse()
> py> a
> ['fox', 'dog', 'DOG', 'cat', 'ape']
> py> b
> ['fox', 'DOG', 'dog', 'cat', 'ape']
> 
> Sorting in reverse keeps the initial order of any equal elements unchanged.
> Sorting, then reversing, reverses them.
> 
> (Thanks to Tim Peters for the tip.)

... and for implementing this in the first place. :)

For those of you who didn't know and now got interested, the relevant term
here is "stable sorting". It means that elements that compare equal keep
their relative order. That's a general property of Python's sort algorithm.
All that "reverse=True" does is to change "lower than" into "greater than"
and vice versa for elements that compare unequal. It does not change the
behaviour for elements that compare equal, which means that they keep the
same relative order in both cases (reversed/non-reversed).
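
A short sketch using the list from the quoted example. It also shows that
you can get the reverse=True result from a plain ascending sort by reversing
the input before sorting and the output afterwards.

```python
words = ['fox', 'dog', 'DOG', 'cat', 'ape']

# Stable descending sort: 'dog' stays before 'DOG', as in the input.
sorted_reverse = sorted(words, key=str.lower, reverse=True)

# Ascending sort, then reversing: the equal elements swap places.
sorted_then_reversed = sorted(words, key=str.lower)[::-1]

# Reverse, sort ascending, reverse again: same result as reverse=True.
reverse_sort_reverse = list(reversed(sorted(reversed(words), key=str.lower)))

print(sorted_reverse)
print(sorted_then_reversed)
print(reverse_sort_reverse)
```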

Stefan


-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Building an extension module with SWIG

2015-05-30 Thread Stefan Behnel
garyr schrieb am 30.05.2015 um 22:48:
> *snip*
> 
>> Compile it ("cythonize -b foo.pyx") and you'll get an extension module
>> that
>> executes faster than what SWIG would give you and keeps everything in one
>> file to improve readability.
>>
>> [1] http://cython.org/
> 
> Thanks for your reply. My interest is not in computing the gcd but to learn
> how build an extension module. I have some much more complicated C code I
> wish to use.

You can do that with Cython, too.

http://docs.cython.org/src/tutorial/external.html

http://docs.cython.org/src/tutorial/clibraries.html

I might be a bit biased as a core developer, but if the parts of your C
library's API for which you have an immediate use are not so tremendously
huge that it's entirely infeasible for you to write a nicely usable Python
API for them, I'd always recommend using Cython over a wrapper generator
like SWIG. Once you get to the points where it becomes interesting, you'll
always end up having more fun writing a Cython based integration layer than
fighting your up-hill battle against the way the wrapper generator wants
you to design it.

Stefan


-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Building an extension module with SWIG

2015-05-30 Thread Stefan Behnel
garyr schrieb am 30.05.2015 um 18:22:
> I'm trying to create an extension module using SWIG. I've
> succeeded in generating a pyd file but when I import the module I get the
> error message: "SystemError: dynamic module not initialized properly." I
> added an initfoo() function but that didn't solve the problem. Below are the
> various files, a slightly modified version of a SWIG example.
> I'm using Python 2.7
> 
> What am I missing?
> 
> //foo.c:
> #include "foo.h"
> double Foo;
> void initfoo()
> {
> Foo = 3.0;
> }

This is wrong and you also won't need that.


> int gcd(int x, int y) {
>   int g;
>   g = y;
>   while (x > 0) {
> g = x;
> x = y % x;
> y = g;
>   }
>   return g;
> }
> [...]

Just in case you're not bound to SWIG yet, here's a Cython [1] version of
your code:

# put this in a file called "foo.pyx"

    def gcd(int x, int y):
        while x > 0:
            y, x = x, y % x
        return y

Compile it ("cythonize -b foo.pyx") and you'll get an extension module that
executes faster than what SWIG would give you and keeps everything in one
file to improve readability.

Stefan


[1] http://cython.org/


-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Creating a reliable sandboxed Python environment

2015-05-30 Thread Stefan Behnel
Laura Creighton schrieb am 30.05.2015 um 13:24:
> As a point of fact, We've _already got_ Topaz, a Ruby interpreter,
> Hippy, a PHP interpreter, a Prolog interpreter, a Smalltalk
> interpreter, and a JavaScript interpreter.  Recently we got Pyket, a
> Racket compiler.  There also exist plenty of experimental languages
> written by academic language designers, and other crazy people who
> like such things.  But don't ask the PyPy project how hard it is to
> sandbox one versus the other.  From our point of view, they all cost
> the same -- free, as in _already done for you_, same as you get a JIT
> for free, and pluggable garbage collectors for free, etc. etc.

So here the cost of security is actually rewriting the entire language
runtime and potentially also major parts of its ecosystem? Not exactly a
cheap price either.

Stefan


-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Creating a reliable sandboxed Python environment

2015-05-29 Thread Stefan Behnel
Chris Angelico schrieb am 29.05.2015 um 09:41:
> On Fri, May 29, 2015 at 4:18 PM, Stefan Behnel wrote:
>>> Lua's a much weaker language than Python is, though. Can it handle
>>> arbitrary-precision integers? Unicode? Dare I even ask,
>>> arbitrary-precision rationals (fractions.Fraction)?
>>
>> All of those and way more, as long as you use it embedded in Python.
> 
> Okay, so how would you go about using Lua-embedded-in-Python to
> manipulate Unicode text?

Lua only supports byte strings, so Lupa will encode and decode them for
you. If that's not enough, you'll have to work with Python Unicode string
objects through the language interface. (And I just noticed that the
handling can be improved here by overloading Lua operators with Python
operators - not currently implemented.)


> Looks to me as if Lua doesn't have integers at all

The standard number type in Lua is a C double float, i.e. the steady
integer range is somewhere within +/-2^53. That tends to be enough for a
*lot* of use cases. You could change that type in the Lua C code (e.g. to a
64 bit int), but that's usually a bad idea. The same comment as above
applies: if you need Python object features, use Python objects.
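
The 2^53 limit is easy to see from Python, whose float is the same C double.
This is only an illustration of the number type, not of Lua or Lupa itself.

```python
limit = 2 ** 53

# Up to 2**53, every integer survives a round trip through a double.
below_is_exact = int(float(limit - 1)) == limit - 1

# One step beyond, the double can no longer tell the neighbours apart.
above_collapses = float(limit + 1) == float(limit)

# A Python int, in contrast, keeps the exact value.
python_int_is_exact = (limit + 1) - limit == 1

print(below_is_exact, above_collapses, python_int_is_exact)
```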

Embedding Lua in Python gives you access to all of Python's objects and
ecosystem. It may not always be as cool to use as from Python, but in that
case, why not code it in Python in the first place? You wouldn't use
Lua/Lupa to write whole applications, just the user defined parts of them.
The rest can happily remain in Python. And should, for your own sanity.

Stefan


-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Creating a reliable sandboxed Python environment

2015-05-28 Thread Stefan Behnel
Chris Angelico schrieb am 28.05.2015 um 20:51:
> On Fri, May 29, 2015 at 4:41 AM, Stefan Behnel wrote:
>> davidf...@gmail.com schrieb am 26.05.2015 um 04:24:
>>> Has anyone on this list attempted to sandbox Python programs in a
>>> serious fashion? I'd be interested to hear your approach.
>>
>> Not quite sandboxing Python, but I've seen people use my Lupa [1] library
>> for this. They're writing all their code in Python, and then let users
>> embed their own Lua code into it to script their API. The Lua runtime is
>> apparently quite good at sandboxing, and it's really small, just some 600KB
>> or so. Lupa then lets you easily control the access to your Python code at
>> a whitelist level by intercepting all Python attribute lookups.
>>
>> It doesn't add much to your application to embed Lua (or even LuaJIT) in
>> Python, and it gives users a nicely object oriented language to call and
>> orchestrate your Python objects.
> 
> Lua's a much weaker language than Python is, though. Can it handle
> arbitrary-precision integers? Unicode? Dare I even ask,
> arbitrary-precision rationals (fractions.Fraction)?

All of those and way more, as long as you use it embedded in Python.


> Security comes at a price, I guess.

Sure, but features aren't the price here.

Stefan


-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Creating a reliable sandboxed Python environment

2015-05-28 Thread Stefan Behnel
davidf...@gmail.com schrieb am 26.05.2015 um 04:24:
> Has anyone on this list attempted to sandbox Python programs in a
> serious fashion? I'd be interested to hear your approach.

Not quite sandboxing Python, but I've seen people use my Lupa [1] library
for this. They're writing all their code in Python, and then let users
embed their own Lua code into it to script their API. The Lua runtime is
apparently quite good at sandboxing, and it's really small, just some 600KB
or so. Lupa then lets you easily control the access to your Python code at
a whitelist level by intercepting all Python attribute lookups.

It doesn't add much to your application to embed Lua (or even LuaJIT) in
Python, and it gives users a nicely object oriented language to call and
orchestrate your Python objects.

Stefan


[1] https://pypi.python.org/pypi/lupa

-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Strategies for using cffi with C++ code?

2015-05-22 Thread Stefan Behnel
Skip Montanaro schrieb am 22.05.2015 um 19:15:
> 2015-05-22 12:05 GMT-05:00 Lele Gaifax:
>> Maybe http://docs.cython.org/src/userguide/wrapping_CPlusPlus.html?
> 
> Thanks for the reference, Lele. I was thinking in terms of cffi, but
> this might work as well. (Shouldn't cffi interfaces be the thinnest?)

Thin in what sense? cffi is a generic library (at least on CPython), so the
interface can never be as thin as what a dedicated compiler can generate
for a specific piece of interface code. PyPy's cffi can mitigate this by
applying runtime optimisations, but you can't do that in CPython without
running some kind of native code generator, be it a JIT compiler or a
static compiler. cffi can apply the latter (run a C compiler) to a certain
extent, but then you end up with a dependency on a C compiler at *runtime*.
The term "thin" really doesn't apply to that dependency. And even if you
accept to go down that route, you'd still get better results from runtime
compilation with Cython, as it will additionally optimise your interface
code (and thus make it "thinner").

Being a Cython core developer, I'm certainly a somewhat biased expert, but
using Cython to generate a statically compiled and optimised C++ wrapper is
really your best choice. IMHO, it provides the best results/tradeoffs in
terms of developer effort, runtime performance and overall end user experience.

Stefan


-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Calling a function is faster than not calling it?

2015-05-10 Thread Stefan Behnel
Steven D'Aprano schrieb am 10.05.2015 um 11:58:
> Why is calling a function faster than bypassing the function object and
> evaluating the code object itself? And not by a little, but by a lot?
> 
> Here I have a file, eval_test.py:
> 
> # === cut ===
> from timeit import Timer
> 
> def func():
>     a = 2
>     b = 3
>     c = 4
>     return (a+b)*(a-b)/(a*c + b*c)
> 
> 
> code = func.__code__
> assert func() == eval(code)
> 
> t1 = Timer("eval; func()", setup="from __main__ import func")
> t2 = Timer("eval(code)", setup="from __main__ import code")
> 
> # Best of 10 trials.
> print (min(t1.repeat(repeat=10)))
> print (min(t2.repeat(repeat=10)))
> 
> # === cut ===
> 
> 
> Note that both tests include a name lookup for eval, so that as much as
> possible I am comparing the two pieces of code on an equal footing.
> 
> Here are the results I get:
> 
> 
> [steve@ando ~]$ python2.7 eval_test.py
> 0.804041147232
> 1.74012994766
> [steve@ando ~]$ python3.3 eval_test.py
> 0.7233301624655724
> 1.7154695875942707
> 
> Directly eval'ing the code object is easily more than twice as expensive
> as calling the function, but calling the function has to eval the code
> object.

Well, yes, but it does so directly in C code. What you are essentially
doing here is replacing a part of the fast C code path for executing a
Python function by some mostly equivalent but more general Python code. So,
you're basically replacing a function call by another function call to
eval(), plus some extra generic setup overhead.

Python functions know exactly what they have to do internally in order to
execute. eval() cannot make the same streamlined assumptions.

Stefan


-- 
https://mail.python.org/mailman/listinfo/python-list


Re: XML Parsing

2015-04-05 Thread Stefan Behnel
Sepideh Ghanavati schrieb am 06.04.2015 um 04:26:
> I know basic of python and I have an xml file created from csv which has
> three attributes "category", "definition" and "definition description".
> I want to parse through xml file and identify actors, constraints,
> principal from the text. However, I am not sure what is the best way to
> go. Any suggestion?

If it's really generated from a CSV file, you could also parse that instead:

https://docs.python.org/3/library/csv.html

Admittedly, CSV files are simple, but they also have major problems,
especially when it comes to detecting their character encoding and their
specific format (tab/comma/semicolon/space/whatever separated, with or
without escaping, quoted values, ...). Meaning, you can easily end up
reading nonsense from the file instead of the content that was originally
put into it.

So, if you want to parse from XML instead, use ElementTree:

https://docs.python.org/3/library/xml.etree.elementtree.html
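
A minimal sketch of the ElementTree route. I don't know what your XML
actually looks like, so the element names ("rows", "row", "category",
"definition", "description") and the two sample records are assumptions
made up for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical layout: one <row> per CSV line, one child element per column.
xml_text = """\
<rows>
  <row>
    <category>actor</category>
    <definition>Data controller</definition>
    <description>The party that decides how data is processed.</description>
  </row>
  <row>
    <category>constraint</category>
    <definition>Retention limit</definition>
    <description>Data must be deleted after a fixed period.</description>
  </row>
</rows>
"""

root = ET.fromstring(xml_text)

# Turn each <row> into a dict mapping child tag names to their text content.
records = [
    {field.tag: (field.text or "").strip() for field in row}
    for row in root.iter("row")
]

# Group the definitions by their category.
by_category = {}
for record in records:
    by_category.setdefault(record["category"], []).append(record["definition"])

print(by_category)
```

For a file on disk, ET.parse(filename).getroot() replaces the fromstring()
call.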

Stefan

-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Is it possible to deliver different source distributions for different Python versions?

2015-04-05 Thread Stefan Behnel
Dave Hein schrieb am 05.04.2015 um 22:38:
> I would like to distribute a python package with different code for
> Python 2.* than for Python 3.*. (Mostly this is because of different
> unicode string handling).
> 
> There is nothing in setuptools or PyPI that directly supports
> this scenario.
> 
> But perhaps there could be some script run at install time that moves
> the correct source code to the right location? In other words, if I
> included both source code versions in the distribution (in a src2 and
> a src3 subdirectory) then a function invoked at install time could
> detect the python version and copy the appropriate source code to the
> right location.
> 
> Is that at all possible? Is there some install time hook that lets me
> supply custom installation code?

Sure. You can simply change the directory in which distutils looks for your
Python code:

https://docs.python.org/2/distutils/setupscript.html#listing-whole-packages
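
A sketch of what that could look like in setup.py, using the src2/src3
directory names from your mail. The project and package name "mypackage"
and the helper function are made up, and the actual setup() call is left as
a comment so that the snippet can be run on its own.

```python
import sys


def select_source_dir(version_info=sys.version_info):
    """Pick the source tree that matches the running Python major version."""
    return "src3" if version_info[0] >= 3 else "src2"


setup_kwargs = dict(
    name="mypackage",                        # hypothetical project name
    version="0.1",
    package_dir={"": select_source_dir()},   # root directory to search for packages
    packages=["mypackage"],                  # found in src2/ or src3/
)
print(setup_kwargs["package_dir"])

# In a real setup.py, this would be followed by:
#     from setuptools import setup
#     setup(**setup_kwargs)
```

Since setup.py runs under the interpreter that does the installation, the
decision is taken at install time without any extra copying step.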

However, in general, you shouldn't be doing this. It's usually easier
(definitely in the long-term) to keep your sources cross-Py2.x/3.x
compatible, maybe with the help of tools like "six" or "python-future",
than to try to keep separate source trees in sync.

http://python-future.org/

Stefan


-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Speeding up permutations generation

2015-03-06 Thread Stefan Behnel
Ian Kelly schrieb am 06.03.2015 um 18:13:
> On Fri, Mar 6, 2015 at 1:24 AM, Abhiram R wrote:
>>> A list of 100 elements has approximately 9.33 x 10**157 permutations.
>>> If you could somehow generate one permutation every yoctosecond,
>>> exhausting them would still take more than a hundred orders of
>>> magnitude longer than the age of the universe.
>>
>> True that :D I may have exaggerated on the number. Let's consider something
>> more practically manageable => 50 elements with a 50! permutation.
>> Is there a solution now?
> 
> That's still infeasible, as others have pointed out. At one
> permutation every picosecond, you'll still need 9.6 x 10**44 years.
> 
> If the size isn't that important to you and you just want a faster
> implementation of permutations, you could try reimplementing it
> yourself as a C extension. The stdlib implementation is already
> written in C though, so unless you have a better algorithm I doubt
> you'll find much room for optimization.

Well, one obvious "optimisation" in a case like this is to change the order
in which permutations are returned. If processing all of them is
infeasible, then being able to control which ones will be processed can be
a crucial property of a "better" algorithm.
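One way to get that kind of control, sketched under the assumption that a lexicographic index is a useful handle: compute the permutation at a given position directly instead of iterating up to it. nth_permutation is a made-up helper, not a stdlib function.

```python
from math import factorial


def nth_permutation(items, index):
    """Return the permutation of `items` at lexicographic position `index`."""
    pool = list(items)
    if not 0 <= index < factorial(len(pool)):
        raise IndexError("permutation index out of range")
    result = []
    for remaining in range(len(pool), 0, -1):
        # Each choice for the next element covers (remaining - 1)! permutations.
        block = factorial(remaining - 1)
        position, index = divmod(index, block)
        result.append(pool.pop(position))
    return result
```

With this, a program can sample positions at random, or stride through the index range, and process only the permutations it actually cares about.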

Stefan


-- 
https://mail.python.org/mailman/listinfo/python-list


Cython - was: Future of Pypy?

2015-02-23 Thread Stefan Behnel
Dave Farrance wrote on 23.02.2015 at 15:13:
> Dave Cook wrote:
>> On 2015-02-22, Dave Farrance wrote:
>>
>>> It's still quicker to do a re-write in the more cumbersome C
>>
>> You should try Cython.
> 
> I did try Cython when I was trying to figure out what to do about the slow
> speed.  My initial attempt showed no speedup at all.  The documentation
> told me that I needed to change the data types to special C-like types, so
> it seemed to me that it would become half way between Python and C and
> would be as cumbersome to develop as C.  So at that point, I just rewrote
> it in C.

The main selling point of Cython is that, while it gives you the speed of C
if you write C-ish code (because it translates it to the obvious C code),
you don't have to write that C-ish code unless you decide to do so. Right
on the next line, you can use a set comprehension or yield a value back from a
generator. So, it's not "half way between Python and C", it actually covers
both, almost entirely. (Oh, and also C++, if you feel like it.)

Stefan


-- 
https://mail.python.org/mailman/listinfo/python-list


Re: lxml objectify - attribute elements to list.

2015-02-08 Thread Stefan Behnel
Sayth Renshaw wrote on 08.02.2015 at 12:22:
> How can I actually access the values of an element with lxml objectify?
> 
> for example if I had this element in my xml file.
> 
>  VenueCode="151" TrackName="Main" TrackCode="149">
> 
> I can see all the attributes using this.
> 
> In [86]: for child in root.getchildren():
>    ....:     print(child.attrib)
>    ....:
> {}
> {'RequestCode': '', 'RequestId': '0'}
> {}
> {}
> ...
> {}
> {}
> {}
> {'Category': 'Metro', 'AbrClubDesc': 'VRC', 'State': 'VIC', 'ClubCode': 
> '10018', 'Title': 'Victoria Racing Club'}
> {'TrackName': 'Main', 'VenueName': 'Flemington', 'TrackCode': '149', 
> 'VenueAbbr': 'FLEM', 'VenueDesc': 'Flemington', 'VenueCode': '151'}
> {}
> {}
> ...
> 
> Trying to access by attribs isn't working for me.
> 
> In [90]: names = [p.text for p in root.Track.attrib['VenueName']]
> ---
> AttributeError                            Traceback (most recent call last)
>  in ()
> > 1 names = [p.text for p in root.Track.attrib['VenueName']]
> 
> AttributeError: 'str' object has no attribute 'text'

As you can see from the output above, "attrib" is a mapping from strings
(attribute names) to strings (attribute values). So just use

name = root.Track.attrib['VenueName']

or, even simpler:

name = root.Track.get('VenueName')
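A small self-contained sketch of both variants. It uses the standard library's xml.etree.ElementTree as a stand-in, because "attrib" and "get()" behave the same way there; the XML snippet and the "Meeting" root element are assumptions cut down from the attribute output above.

```python
import xml.etree.ElementTree as ET

# Hypothetical, reduced document; only the Track attributes come from the post.
xml_data = """
<Meeting>
  <Track VenueName="Flemington" VenueCode="151" TrackName="Main" TrackCode="149"/>
</Meeting>
"""

root = ET.fromstring(xml_data)
track = root.find("Track")

# "attrib" maps attribute names to attribute values, both plain strings.
name = track.attrib["VenueName"]

# get() does the same lookup and accepts a default for missing attributes.
same_name = track.get("VenueName")
missing = track.get("NoSuchAttribute", "n/a")
```

Indexing "attrib" raises a KeyError for a missing attribute, whereas get() returns the default.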

Stefan


-- 
https://mail.python.org/mailman/listinfo/python-list


Re: xml SAX Parsing in python

2014-12-17 Thread Stefan Behnel
Hi,

Abubakar Roko wrote on 17.12.2014 at 07:30:
> Please I am new in using python to write program. I am trying to parse an XML 
> document using sax parse  and store the parsed result in a tree like 
> definedbelow. XNode class define an xml element which has an ID , a tag, a 
> text value, children element and a parent element
>class XNode(object):
> def __init__(self, ID ="", elmName="", 
> elmValue="", parent=None):
>  self.ID = ID 
> self.elmName=elmName self.elmValue=elmValue   
>   self.childs=[]
> self.parent=parent
> 
>def  getPath(self):
>   if self.parent is None:   return 
> self.elmName else:
>return self.parent.getPath()+"/"+ self.elmName
> I  wrote a program that parse an XML document ,  convert the  document  into 
> the tree like structure defined above and then return the parsed result tothe 
> program that call it.  The program shown below.
> 
> import xml.saximport XMLnode as n
> 
> class XML_Handler ( xml.sax.ContentHandler):
> def __init__(self, root):self.root = rootself.tmp =  
> n.XNode()
> def startElement(self, tag, attributes):#if self.root != None:
> if self.root is not None:
> if len(self.tmp.childs) < 10:ID = self.tmp.ID 
> +"." + "0" + str( len(self.tmp.childs))else:ID = 
> self.tmp.ID +"." + str( len(self.tmp.childs)) 
> self.tmp.childs.append( n.XNode(ID,tag,"",self.tmp)) 
> self.tmp= self.tmp.childs[len(self.tmp.childs)-1]else:
> print "0", tag, self.tmp.getPath()self.root= n.XNode("0", 
> tag,"",None)self.tmp=self.root
> def characters(self, content):self.tmp.elmValue += content.strip()
> def endElement(self, tag):self.tmp= self.tmp.parent
> 
> def parse(self, f):xml.sax.parse(self,f)return self.root
> 
> if ( __name__ == "__main__"):
>  parser = xml.sax.make_parser() 
> parser.setFeature(xml.sax.handler.feature_namespaces, 0) root = None
> Handler = XML_Handler(root)parser.setContentHandler( Handler )
> treRoot= parser.parse("Movies.xml")print treRoot
> Can somebody help me answer the following questionMy Question is how do I 
> return the parsed result through the root instance variable of  of 
> XML_Handler classI try to do it but i always get None as answerI am using 
> Window 7 professional and python 2.7

The formatting of your code example was heavily screwed up, please send a
plain text email next time.

My general advice is to use ElementTree instead of SAX. It's way easier to
use (even for simple tasks). Use iterparse() to get event driven
incremental parsing.

https://docs.python.org/3/library/xml.etree.elementtree.html#xml.etree.ElementTree.iterparse

http://effbot.org/zone/element-iterparse.htm
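As a sketch of what that could look like for the path-tracking use case in the question (the sample document and the helper name iter_paths are made up; the real input would be something like "Movies.xml"):

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical sample data standing in for the real file.
xml_data = (b"<movies>"
            b"<movie><title>Alien</title></movie>"
            b"<movie><title>Brazil</title></movie>"
            b"</movies>")


def iter_paths(source):
    """Yield (path, text) for each element as soon as it is complete."""
    path = []
    for event, element in ET.iterparse(source, events=("start", "end")):
        if event == "start":
            path.append(element.tag)
        else:
            # On "end", the element's text content has been fully parsed.
            yield "/".join(path), (element.text or "").strip()
            path.pop()
            element.clear()  # free content that is no longer needed


results = list(iter_paths(io.BytesIO(xml_data)))
```

The generator hands back its results directly, so there is no handler object whose state has to be read out after parsing.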

Stefan


-- 
https://mail.python.org/mailman/listinfo/python-list


Re: jitpy - Library to embed PyPy into CPython

2014-12-07 Thread Stefan Behnel
Albert-Jan Roskam wrote on 06.12.2014 at 21:28:
> On Fri, Dec 5, 2014 8:54 PM CET Mark Lawrence wrote:
>> For those who haven't heard thought this might be of interest
>> https://github.com/fijal/jitpy
> 
> Interesting, but it is not clear to me when you would use jitpy instead
> of pypy.

I think this is trying to position PyPy more in the same corner as other
JIT compilers for CPython, as opposed to keeping it a completely separate
thing which suffers from being "not CPython". It's a huge dependency, but
so are others.

Being able to choose tools at this level is great, so if PyPy becomes yet
another way to speed up the critical 5% of a CPython application, that's a
good thing.

Stefan

-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Embedded python 'scripting engine' inside Python app

2014-11-23 Thread Stefan Behnel
Chris Angelico wrote on 23.11.2014 at 11:35:
> On Sun, Nov 23, 2014 at 9:28 PM, Patrick Stinson wrote:
>> Is there a better and more secure way to do the python-within-python in
>> order allow users to automate your app?
> 
> More secure? Basically no. You could push the inner script into a
> separate process, but I would recommend simply acknowledging the
> insecurity. Embrace the lack of security and call it a debugging
> feature - make it possible to introspect, control, manipulate internal
> structures. Feature, not flaw. :)

As the author of Lupa, I know that some people have successfully and safely
embedded Lua in Python as a simple, small and object-oriented scripting
language in a sandbox.

https://pypi.python.org/pypi/lupa

The overall syntax isn't quite as great as that of Python, but as long as
you mostly stick to "here's a bunch of functions you can call" or "here's
an object, go and call some methods on it" kind of APIs, there isn't all
that much of a difference either.

Stefan


-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Working with HTML5 documents

2014-11-20 Thread Stefan Behnel
Ian Kelly wrote on 20.11.2014 at 20:44:
> On Thu, Nov 20, 2014 at 12:02 PM, Stefan Behnel wrote:
>> There's also the E-factory for creating (sub-)trees in a nicely objectish
>> way:
>>
>> http://lxml.de/lxmlhtml.html#creating-html-with-the-e-factory
> 
> That looks ugly with all those caps and also hard to extend. Notably
> it seems to be missing any functions to build HTML5 elements, unless
> those have been added in lxml 3.4.

It's actually trivial to extend, and it's designed for it. The factory
simply uses "__getattr__()", so you can ask it for any tag name. The
predefined names in the builder.py module are mainly there to easily detect
typos on user side.

https://github.com/lxml/lxml/blob/master/src/lxml/html/builder.py

If you don't like capital names for constants, just copy the module and
change the tag names to lower case, or use the blank E-factory if you feel
like it.
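To illustrate the "__getattr__()" mechanism, here is a toy factory built on the standard library's ElementTree. It is only a sketch of the idea and not lxml's actual implementation; the class name TinyElementMaker is made up. In lxml itself, the blank factory is available as lxml.builder.E.

```python
import xml.etree.ElementTree as ET


class TinyElementMaker(object):
    """Toy element factory: any attribute name becomes a tag builder."""

    def __getattr__(self, tag):
        def build(*children, **attrib):
            element = ET.Element(tag, attrib)
            for child in children:
                if isinstance(child, str):
                    if len(element):
                        # Text after a child element goes into that child's tail.
                        last = element[-1]
                        last.tail = (last.tail or "") + child
                    else:
                        element.text = (element.text or "") + child
                else:
                    element.append(child)
            return element
        return build


E = TinyElementMaker()

# "article" is an HTML5 tag name; nothing had to be predefined for it.
doc = E.article(E.h1("Title"), E.p("Some ", E.em("text"), "."))
markup = ET.tostring(doc, encoding="unicode")
```

Since no tag name is hard-coded, new elements need no changes to the factory, at the price of not catching typos in tag names.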

Stefan


-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Most gratuitous comments

2014-11-20 Thread Stefan Behnel
Chris Angelico wrote on 20.11.2014 at 06:06:
> On Thu, Nov 20, 2014 at 3:58 PM, Steven D'Aprano wrote:
>> And the award for the most gratuitous comments before an import goes to
>> one of my (former) workmates, who wrote this piece of code:
>>
>> # Used for base64-decoding.
>> import base64
>> # Used for ungzipping.
>> import gzip
> 
> Well hey. Good to know he's using the tools for their intended purposes!

Not necessarily. The comments only suggest that the imports were added (or
at least commented on) with the intended purpose in mind. Whether that
purpose is still what the modules are used for or whether they are even
still in use at all, is unclear from the above.

Stefan


-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Working with HTML5 documents

2014-11-20 Thread Stefan Behnel
Tim wrote on 20.11.2014 at 18:31:
> On Thursday, November 20, 2014 12:04:09 PM UTC-5, Denis McMahon wrote:
>>> On Wednesday, November 19, 2014 2:08:27 PM UTC-7, Denis McMahon wrote:
>>>> So what I'm looking for is a method to create an html5 document using
>>>> "dom manipulation", ie:
>>>>
>>>> doc = new htmldocument(doctype="HTML")
>>>> html = new html5element("html")
>>>> doc.appendChild(html)
>>>> head = new html5element("head")
>>>> html.appendChild(head)
>>>> body = new html5element("body")
>>>> html.appendChild(body)
>>>> title = new html5element("title")
>>>> txt = new textnode("This Is The Title")
>>>> title.appendChild(txt)
>>>> head.appendChild(title)
>>>> para = new html5element("p")
>>>> txt = new textnode("This is some text.")
>>>> para.appendChild(txt)
>>>> body.appendChild(para)
>>>>
>>>> print(doc.serialise())
>>>>
>>>> generates:
>>>>
>>>> <html><head><title>This Is The Title</title></head><body><p>This is some text.</p></body></html>
>>>>
>>>> I'm finding various mechanisms to generate the structure from an
>>>> existing piece of html (eg html5lib, beautifulsoup etc) but I can't
>>>> seem to find any mechanism to generate, manipulate and produce html5
>>>> documents using this dom manipulation approach. Where should I be
>>>> looking?
>>
>> Everything there seems to assume I'll be creating a document serially, eg 
>> that I won't get to some point in the document and decide that I want to 
>> add an element earlier.
>>
>> bs4 and html5lib will parse a document into a tree structure, but they're 
>> not so hot on manipulating the tree structure, eg adding and moving nodes.
>>
>> Actually it looks like bs4 is going to be my best bet, although limited 
>> it does have most of what I'm looking for. I just need to start by giving 
>> it "" to parse.
> 
> I believe lxml should work for this. Here's a snippet that I have used to 
> create an HTML document:
> 
> from lxml import etree
> page = etree.Element('html')
> doc = etree.ElementTree(page)
> 
> head = etree.SubElement(page, 'head')
> body = etree.SubElement(page, 'body')
> table = etree.SubElement(body, 'table')
> 
> etc etc
>
> with open('mynewfile.html', 'wb') as f:
>     doc.write(f, pretty_print=True, method='html')
> 
> (you can leave out the method= option to get xhtml).

There's also the E-factory for creating (sub-)trees in a nicely objectish way:

http://lxml.de/lxmlhtml.html#creating-html-with-the-e-factory

and the just released lxml 3.4.1 has an "htmlfile" context manager that
allows you to generate HTML incrementally:

http://lxml.de/api.html#incremental-xml-generation

Obviously, you can combine both, so you can create a subtree in memory and
write it into an incrementally built HTML stream. Pretty versatile.

Stefan


-- 
https://mail.python.org/mailman/listinfo/python-list


Re: (-1)**1000

2014-10-29 Thread Stefan Behnel
Ned Batchelder wrote on 26.10.2014 at 21:45:
> On 10/26/14 4:07 PM, Tony the Tiger wrote:
>> On Wed, 22 Oct 2014 10:27:34 +0200, ast wrote:
>>
>>> If i am writing (-1)**1000 on a python program, will the interpreter do
>>> (-1)*(-1)*...*(-1) or something clever ?
>>
>> Even vs. odd. It ought to know. I would assume from a set of defined
>> rules how math works.
> 
> There is such a thing as an optimization that isn't worthwhile to perform,
> simply because it's expected to provide so little benefit.  The language
> implementors have to trade off the cost of adding the optimization to the
> implementation, against the possible benefit people would get from it.
> 
> Benefit in this case would have to include a guess as to how often real
> programs would hit the optimization case.

... and also compare it to the number of cases where the optimisation
(which may, for example, need to check for an optimisable value or set of
values) slows down the generic (unoptimised) code path that is actually taken.

Even if the code impact on the implementation is small enough to be
acceptable, an optimisation for unlikely cases may provide a net-loss for
the "normal" code. So there are several reasons why an "obvious"
optimisation may be a bad idea.
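A toy model of that trade-off in Python (the interpreter does this kind of thing in C, so this is only an illustration, and the function name power is made up):

```python
def power(base, exponent):
    # Special case: (-1)**n only depends on whether n is even or odd.
    if base == -1 and isinstance(exponent, int) and exponent >= 0:
        return 1 if exponent % 2 == 0 else -1
    # Generic path: every call that is NOT the special case has still
    # paid for the checks above before it gets here.
    return base ** exponent
```

The special case is cheap when it hits, but every other call now does extra work, so the change only pays off if (-1)**n is common enough in real programs.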

Stefan


-- 
https://mail.python.org/mailman/listinfo/python-list


Re: XML Patch

2014-10-27 Thread Stefan Behnel
Hi,

please keep this on-list.

Nicholas Cole wrote on 26.10.2014 at 22:43:
> On Sun, Oct 26, 2014 at 6:30 PM, Stefan Behnel wrote:
>> Nicholas Cole wrote on 26.10.2014 at 18:00:
>>> I'm looking for a python library that can parse XML Documents and 
>>> create xml-aware "diff" files, and then use those to patch
>>> documents. In other words, I'd like something similar to the Google 
>>> diff-match-patch tools, but something which is XML aware.
>>> 
>>> I can see several projects on Pypi that can generate some form of
>>> xml diff, but I can't seem to see anything that can also do the
>>> patching side of things.
>> 
>> Is there a use case for this?
> 
> Yes - I want to store a series of XML diffs/patches and be able to 
> generate documents by applying them.

Could you be a little more specific? There are lots of ways to generate
XML, but I never heard of anyone who wanted to do this based on diffs
between other documents. What kind of document differences are you talking
about here?

Stefan


-- 
https://mail.python.org/mailman/listinfo/python-list


Re: XML Patch

2014-10-26 Thread Stefan Behnel
Nicholas Cole wrote on 26.10.2014 at 18:00:
> I'm looking for a python library that can parse XML Documents and
> create xml-aware "diff" files, and then use those to patch documents.
> In other words, I'd like something similar to the Google
> diff-match-patch tools, but something which is XML aware.
> 
> I can see several projects on Pypi that can generate some form of xml
> diff, but I can't seem to see anything that can also do the patching
> side of things.

Is there a use case for this?

Stefan


-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Clearing globals in CPython

2014-10-02 Thread Stefan Behnel
Chris Angelico wrote on 02.10.2014 at 16:12:
> On Fri, Oct 3, 2014 at 12:07 AM, Grant Edwards wrote:
>> On 2014-10-01, Steven D'Aprano wrote:
>>
>>> Obviously the easiest way to recover is to exit the current session and
>>> restart it, but as a challenge, can we recover from this state?
>>
>> Python apparently _does_ need a "restart command".
> 
> Apparently not... you saw how easily Peter recovered :)

Right. All we need is a builtin function for that recovery code.

Stefan


-- 
https://mail.python.org/mailman/listinfo/python-list


Re: any way to tell at runtime whether a callable is implemented in Python or C ?

2014-09-26 Thread Stefan Behnel
Chris Angelico wrote on 26.09.2014 at 10:42:
> On Fri, Sep 26, 2014 at 5:47 PM, Wolfgang Maier wrote:
>> is there any reliable and inexpensive way to inspect a callable from running
>> Python code to learn whether it is implemented in Python or C before calling
>> into it ?
> 
> I'm not sure you can say for absolute certain, but the presence of a
> __code__ attribute is strongly suggestive that there's Python code
> behind the function. That might be good enough for your purposes.

Cython implemented native functions have a "__code__" attribute, too. Their
current "__code__.co_code" attribute is empty (no bytecode), but I wouldn't
rely on that for all times either.
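For reference, the heuristic under discussion boils down to something like the following sketch (has_python_code is a made-up name). As said above, a Cython compiled function would also pass this test, so it cannot tell "Python" from "native" reliably.

```python
def has_python_code(func):
    """Heuristic only: True if `func` exposes a __code__ attribute."""
    return hasattr(func, "__code__")


def py_func(x):
    return x + 1


checks = {
    "py_func": has_python_code(py_func),  # plain Python function
    "len": has_python_code(len),          # builtin implemented in C
}
```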

Stefan


-- 
https://mail.python.org/mailman/listinfo/python-list


Re: any way to tell at runtime whether a callable is implemented in Python or C ?

2014-09-26 Thread Stefan Behnel
Wolfgang Maier wrote on 26.09.2014 at 09:47:
> is there any reliable and inexpensive way to inspect a callable from
> running Python code to learn whether it is implemented in Python or C
> before calling into it ?

Not really. Both can have very different types and very different
interfaces. There are types, classes, functions, methods, objects with a
dedicated __call__() method, ... Any of them can be implemented in Python
or C (or other native languages, or a mix of more than one language).

What's your use case? There might be other ways to achieve what you want.

Stefan


-- 
https://mail.python.org/mailman/listinfo/python-list


Re: GCD in Fractions

2014-09-24 Thread Stefan Behnel
blindanagram wrote on 24.09.2014 at 15:25:
> On 24/09/2014 12:44, Steven D'Aprano wrote:
> 
>> blindanagram wrote:
> [snip]
>> - Mathworld says that GCD of two negative numbers is a negative number;
>>
>> - but Mathematica says that GCD of two negative numbers is a positive;
>>
>> - Wikipedia agrees with Mathematica and disagrees with Mathworld;
> 
> After looking at these (and several other) on-line mathematical sites, I
> realised that I would have to go back to long standing mathematical
> references to find how the gcd is defined by those that explicitly cover
> the greatest common divisor for negative integers (I did this before
> raising the issue here).
> 
> All four that I have so far looked at have definitions that lead to
> gcd(a, b) for integers being equal to gcd(|a|, |b|). I hope to visit a
> University library shortly to review more.  Does anyone know of such a
> reference that uses a definition that conflicts with gcd(a, b) for
> integers being equal to gcd(|a|, |b|)?

Steven has already given sources that suggest that the result of gcd()
should be positive. Just like he gave sources that suggest the opposite.
So, the question is not how or where to find even more sources, or to
decide which of those sources is "more right" than the others, the question
is whether such a shaky ground is a reasonable foundation for breaking
other people's code.

We have an open tracker ticket now on changing *something* about the
current situation. Let's just add some new functionality somewhere if
people really want it (as in "need it for their code", not just "want it
for purity reasons" or "sleep better when they know it's out there"), but
please, everyone, stop complaining about "fractions.gcd" not catering for
your needs. It does what it's there for, even if the name is more public or
more generic than you might want. There are other ways to fix the actual
problem and move on.

Stefan


-- 
https://mail.python.org/mailman/listinfo/python-list


Re: GCD in Fractions

2014-09-23 Thread Stefan Behnel
Ian Kelly wrote on 23.09.2014 at 19:39:
> On Tue, Sep 23, 2014 at 11:26 AM, Stefan Behnel wrote:
>> Wolfgang Maier wrote on 23.09.2014 at 18:38:
>>> While at first I thought this to be a rather irrelevant debate over module
>>> private vs public naming conventions, I now think the OP is probably right
>>> and renaming fractions.gcd to fractions._gcd may be a good idea.
>>
>> Making a public API private is rarely a good idea. It should be enough in
>> this case to document the behaviour.
>>
>> And, believe it or not, it actually is documented:
>>
>> https://docs.python.org/3.5/library/fractions.html#fractions.gcd
> 
> I don't think documentation is sufficient in this case. This is the
> kind of thing though that is easy to forget about if you haven't read
> the documentation recently. And with a function like gcd, one
> generally wouldn't expect to *need* to read the documentation.

Interesting. I would definitely consult the documentation first thing if I
were considering passing negative values into a gcd function - into any
implementation, even if I had been the very author myself, just two months
back. I might even take a look at the source to make sure the docs are
correct and up to date, and to look for comments that give further
insights. But maybe that's just me.

Stefan


-- 
https://mail.python.org/mailman/listinfo/python-list


Re: GCD in Fractions

2014-09-23 Thread Stefan Behnel
blindanagram wrote on 23.09.2014 at 19:43:
> On 23/09/2014 18:26, Stefan Behnel wrote:
>> Wolfgang Maier wrote on 23.09.2014 at 18:38:
>>> While at first I thought this to be a rather irrelevant debate over module
>>> private vs public naming conventions, I now think the OP is probably right
>>> and renaming fractions.gcd to fractions._gcd may be a good idea.
>> For negative numbers, the "expected" behaviour seems to be unclear, so the
>> current behaviour is just as good as any, so backwards compatibility
>> concerns clearly win this fight.
> 
> The expected behaviour is not unclear for anyone who takes the
> mathematical properties of the GCD seriously.  It's a shame that Python
> doesn't.

May I ask how you get from one little function in the well-defined scope of
a data type module (which is not named "math" or "integers" or "natural" or
anything like it) to the extrapolation that Python doesn't take
mathematical properties seriously?

If the scope of that function's applicability does not match what you want
in your specific use case, then by all means, don't use it for your
specific use case.

Stefan


-- 
https://mail.python.org/mailman/listinfo/python-list


Re: GCD in Fractions

2014-09-23 Thread Stefan Behnel
Wolfgang Maier wrote on 23.09.2014 at 18:38:
> While at first I thought this to be a rather irrelevant debate over module
> private vs public naming conventions, I now think the OP is probably right
> and renaming fractions.gcd to fractions._gcd may be a good idea.

Making a public API private is rarely a good idea. It should be enough in
this case to document the behaviour.

And, believe it or not, it actually is documented:

https://docs.python.org/3.5/library/fractions.html#fractions.gcd


> Googling for recipes to calculate the gcd using python brings up
> fractions.gcd as a general answer (like at stackoverflow:
> http://stackoverflow.com/questions/11175131/code-for-greatest-common-divisor-in-python)
> and it is not obvious for non-mathematicians to realize that it is NOT a
> generally acceptable solution.

It is. Certainly for positive numbers, which clearly represent the majority
of use cases. It's definitely the "normal" use case, wouldn't you say?

For negative numbers, the "expected" behaviour seems to be unclear, so the
current behaviour is just as good as any, so backwards compatibility
concerns clearly win this fight.
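To make the sign question concrete, here is a plain Euclidean loop together with a thin wrapper for those who want a non-negative result. The loop is an assumption about the kind of implementation being discussed, not a copy of the stdlib source, and both function names are made up.

```python
def euclid_gcd(a, b):
    """Plain Euclidean algorithm; the sign of the result follows from %."""
    while b:
        a, b = b, a % b
    return a


def nonnegative_gcd(a, b):
    """Same computation, normalised to gcd(|a|, |b|)."""
    return abs(euclid_gcd(a, b))
```

With Python's "%" operator, the result of the plain loop takes the sign of the second argument whenever that argument is non-zero, while the wrapper gives the gcd(|a|, |b|) behaviour without touching the existing function.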

Stefan


-- 
https://mail.python.org/mailman/listinfo/python-list


Re: PyCharm refactoring tool?

2014-09-15 Thread Stefan Behnel
George Silva wrote on 15.09.2014 at 21:49:
> It's pretty useful. I use it for some time now and I very much like it.
> [...]
> The most powerful for me are the rename refactor and extract. Works like
> charm (no pun intended).

Ditto.


> On Mon, Sep 15, 2014 at 4:44 PM, Skip Montanaro  wrote:
>> I started up an instance of PyCharm last Friday. It's mostly just been
>> sitting there like a bump on a log. I set things up to use Emacs as my
>> editor. It seems most of its functionality won't be all that useful. Most
>> of my work is on libraries/platforms - stuff which is not runnable in
>> isolation, so the Run menu doesn't look all that useful.

I also do most exec stuff on the command line - it needs to work there
anyway, so the additional config in PyCharm is really something on top that
I often don't do. However, running stuff within PyCharm can still be really
handy because it integrates very nicely with py.test and other test
runners. You get nice visual feedback for your tests, can rerun failing
tests with one click, can visually debug problems, get coverage analysis
for free, etc. It's all very nicely integrated.

Stefan


-- 
https://mail.python.org/mailman/listinfo/python-list


Re: pylint for cython?

2014-09-12 Thread Stefan Behnel
Skip Montanaro wrote on 12.09.2014 at 17:52:
> I have slowly been converting some Python source to Cython. I'm pretty
> conservative in what changes I make, mostly sprinkling a few "cdef",
> "float" and "int" declarations around the pyx file. Still, conservative or
> not, it's enough to choke pylint. Rather than have to maintain a pure
> Python version of my code, it would be nice if pylint had a flag or if
> there was a "cylint" tool available.

If you really just do things like "cdef int x", I recommend using pure
Python syntax for it (in a .py file). That way, you can just run pylint
over it as before.

http://docs.cython.org/src/tutorial/pure.html#static-typing

Specifically, the "@cython.locals()" decorator might be all you need, or
maybe some of the other things like "@cython.cfunc".
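A minimal sketch of what that can look like in a plain .py file. The function mean_of_squares is made up, and the fallback shim in the except branch is purely an assumption for illustration so that the example also runs where Cython is not installed; it is not part of Cython.

```python
try:
    import cython  # resolves to a pure-Python shadow module when uncompiled
except ImportError:
    # Illustrative fallback only, NOT part of Cython.
    class _CythonShim(object):
        int = int
        double = float

        @staticmethod
        def locals(**types):
            return lambda func: func

    cython = _CythonShim()


@cython.locals(n=cython.int, i=cython.int, total=cython.double)
def mean_of_squares(n):
    # Ordinary Python code: pylint sees nothing unusual here.
    total = 0.0
    for i in range(1, n + 1):
        total += i * i
    return total / n
```

Run as normal Python, the decorator does nothing; when the same file is compiled with Cython, the declared names become C variables.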

Stefan


-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Python vs C++

2014-08-22 Thread Stefan Behnel
If you want to add Cython to that (overly simplified) graph, you might get
something like this:

Christian Gollwitzer wrote on 22.08.2014 at 21:25:
> as |--|
> c   ||
> c++   |---|
Cython   ||
> python||

Meaning, there is a lot you can do in Cython that can keep you from having
to write C/C++ code at all. And even if you really have to, it still helps
in keeping that down to a couple of well chosen snippets rather than full
programs.

Stefan


-- 
https://mail.python.org/mailman/listinfo/python-list


Re: error building lxml.etree

2014-08-22 Thread Stefan Behnel
Robin Becker wrote on 22.08.2014 at 17:50:
> I'm trying to build a bunch of extensions in a 2.7 virtual environment on a
> centos 7 VM. I don't know centos very well and I understand centos 7 is
> quite new
> 
>> building 'lxml.etree' extension
>>
>> creating build/temp.linux-x86_64-2.7
>>
>> creating build/temp.linux-x86_64-2.7/src
>>
>> creating build/temp.linux-x86_64-2.7/src/lxml
>>
>> gcc -pthread -fno-strict-aliasing -O2 -g -pipe -Wall
>> -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong
>> --param=ssp-buffer-size=4 -grecord-gcc-switches -m64 -mtune=generic
>> -D_GNU_SOURCE -fPIC -fwrapv -DNDEBUG -O2 -g -pipe -Wall
>> -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong
>> --param=ssp-buffer-size=4   -grecord-gcc-switches -m64 -mtune=generic
>> -D_GNU_SOURCE -fPIC -fwrapv -fPIC -I/usr/include/libxml2
>> -I/home/rptlab/website/xxx/xxx_0/build/lxml/src/lxml/includes
>> -I/usr/include/python2.7 -c src/lxml/lxml.etree.c -o
>> build/temp.linux-x86_64-2.7/src/lxml/lxml.etree.o -w
>>
>> {standard input}: Assembler messages:
>>
>> {standard input}:1858223: Error: unknown pseudo-op: `.'
>>
>> gcc: internal compiler error: Killed (program cc1)
>>
>> Please submit a full bug report,
>>
>> with preprocessed source if appropriate.
>>
>> See  for instructions.
>>
>> error: command 'gcc' failed with exit status 4
> 
> 
> uname -a
>> Linux localhost.localdomain 3.10.0-123.6.3.el7.x86_64 #1 SMP Wed Aug 6
>> 21:12:36 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
> 
> 
> gcc --version
>> gcc (GCC) 4.8.2 20140120 (Red Hat 4.8.2-16)
>> Copyright (C) 2013 Free Software Foundation, Inc.
>> This is free software; see the source for copying conditions.  There is NO
>> warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
> 
> I do have the various devel rpms installed so far as I can tell.
> 
> Has anyone else seen this error? It's entirely possible that it might be I
> don't have enough memory or something

Yes, that's most likely it. Having 500MB+ of free(!) RAM is a good idea for
the build.


> lxml builds almost always take a long time.

For testing, you can speed things up quite substantially by using "-O0" as
your CFLAGS. Not a good idea for a production system, though.

You might also get away with building a (static?) wheel on another
compatible Linux system that has more RAM.

Stefan


-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Python vs C++

2014-08-21 Thread Stefan Behnel
dieter wrote on 22.08.2014 at 08:12:
> David Palao writes:
>>  Why to use C++ instead of python?
> 
> Likely, you would not use Python to implement most parts of an
> operating system (where, for efficiency reasons, some parts
> are even implemented in an assembler language).
> 
> I can imagine that the GNU compiler developers, too, had good
> reasons to implement them in C rather than a scripting language.
> It makes a huge difference whether you wait one or several hours
> before a large system is built.
> 
> "firefox", too, seems to be implemented in C/C++. There, too, I
> see good reasons:
>   *  it is nice when your pages are rendered quickly
> 
>   *  "firefox" depends on lots of external libraries, all of them
>  with C/C++ interfaces; while it is possible to create
>  Python bindings for them, this is quite some work
> 
>   *  as it is, "firefox" is a huge "memory eater"; one might
>  fear that things would be worse if implemented in a
>  higher level language (with everything on the heap).
>  Though, the fear might not be justified.
> 
> 
> All these examples are really large projects. I like Python a lot
> for smaller projects.

While I agree that there are very valid reasons to write C/C++ code (and
operating systems clearly fall into that category), most of the above might
turn out to be fallacies. With a more high-level language, it is easier to
get a system running and then focus on optimisation than in a low-level
language that requires a lot of concentrated work just to get things done
at all. This holds especially in the long run, when the maintenance burden of
low-level code starts getting so much in the way that it becomes harder and
harder to keep improving the system and adding new features.

If, instead, you start with a high-level language, your first
implementation might not be as fast as your first C++ implementation could
have been, but it'll be almost certainly available much earlier, so that
you can then give it real world testing and performance evaluation. That
gives you a head start for optimisation and improvements, which then leads
to a faster system again. Thus, it's not unlikely that you already get an
even faster and better system (in terms of actual user experience) in the
same timeframe that you would otherwise have spent on getting even a first
working version of your system in a low-level language.

And the optimisation that you apply to your system may still include
rewriting parts of it in C++, but then really only those parts where real
world evaluation proved that it's worth the effort and maintenance overhead.

I've given a talk about this topic at PyCon-DE 2012. It's in German, but it
contains a lot of figures that should be understandable even if you don't
understand that language.

http://consulting.behnel.de/PyConDE/2012/ohnecpp.html

Stefan




Re: GIL detector

2014-08-17 Thread Stefan Behnel
Steven D'Aprano wrote on 17.08.2014 at 16:21:
> I wonder whether Ruby programmers are as obsessive about
> Ruby's GIL?

I actually wonder more whether Python programmers are really all that
obsessive about CPython's GIL. Sure, there are always the Loud Guys who
speak up when they feel like no-one's mentioned it for too long, but I'd
expect the vast majority to be just OK with the status quo and not think
about it most of the time. Or, well, think about it when one of the Loud
Guys takes the megaphone, but then put their thoughts back in the attic and
keep doing their daily work.

Personally, I like the GIL. It helps me keep my code simpler and more
predictable. I don't have to care about threading issues all the time and
can otherwise freely choose the right model of parallelism that suits my
current use case when the need arises (and threads are rarely the right
model). I'm sure that's not just me.
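To sketch what "choosing the model when the need arises" might look like in
code: the standard concurrent.futures module lets the same call site run on
different executors. The helper run_parallel and the task fetch_length below
are made-up names for illustration only.

```python
from concurrent.futures import ThreadPoolExecutor


def fetch_length(text):
    # Hypothetical stand-in for a task, e.g. something I/O-bound.
    return len(text)


def run_parallel(func, items, executor_cls=ThreadPoolExecutor, workers=4):
    """Apply func to items using whichever executor class is passed in."""
    with executor_cls(max_workers=workers) as executor:
        # executor.map preserves the order of the input items.
        return list(executor.map(func, items))


if __name__ == "__main__":
    print(run_parallel(fetch_length, ["spam", "eggs", "ham"]))
```

Passing concurrent.futures.ProcessPoolExecutor as executor_cls would switch
the same code to process-based parallelism for CPU-bound work, provided the
function and its arguments can be pickled.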

Stefan




Re: What can Nuitka do?

2014-06-27 Thread Stefan Behnel
CM, 28.06.2014 05:57:
>> Now type 
>> 
>> nuitka --recurse-all something_or_other.py
>>  
>> and hit Enter. What happens?
> 
> I did that and the message is:
> 
>'nuitka' is not recognized as an internal 
>or external command, operable program or batch file.
> 
> which makes sense because some kind of file called 
> nuitka is not in my path. What I wasn't sure of is how 
> to add it, because I looked in the nuitka folder in 
> Python27/Lib/site-packages and there was no file 
> called nuitka.py or nuitka.exe within that folder, 
> and there were a lot of subfolders but I just didn't 
> know what I should do.  

There should be a folder Python27/Scripts that contains the executable
programs that Python packages install.
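If you are unsure where that folder is on your machine, you can ask Python
itself. The following is a small sketch using the standard sysconfig module;
the helper find_installed_script and the list of file extensions it tries are
my own assumptions, not something Nuitka provides.

```python
import os
import sysconfig


def find_installed_script(name):
    """Return the path of a script in this interpreter's scripts folder,
    or None if no matching file exists there."""
    scripts_dir = sysconfig.get_path("scripts")
    # Assumed set of suffixes under which a launcher might be installed.
    for suffix in ("", ".exe", ".bat", ".cmd", ".py"):
        path = os.path.join(scripts_dir, name + suffix)
        if os.path.isfile(path):
            return path
    return None


print(sysconfig.get_path("scripts"))
print(find_installed_script("nuitka"))
```

The first line printed is the folder to add to your PATH (or to prefix the
command with); the second shows whether a nuitka launcher was found there.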

Stefan



