On Mon, Feb 4, 2013 at 3:42 AM, Maciej Fijalkowski <fij...@gmail.com> wrote:


> Seriously which ones? I think msgpack usage is absolutely legit. You
> seem to have different opinions about the design of that software, but
> you did not respond to my concerns even, not to mention the fact that
> it sounds like it's not "obfuscated by jitviewer".
>
> Cheers,
> fijal
>

First I would have tried using cffi to the msgpack c library.  If I wasn't
happy with it I would do a Python port.  So for no lets forget about cffi
and just deal with the current design of this library.

I had tried to minimize the discussion about this library on this forum as
I had already wrote extensive comments on the original blog [1].  Now I
didn't do an extensive review of the code as I only concentrated on a small
portion of it namely in the area of unpacking the msgpack messages.  I'll
just highlight a couple of concerns I had.

The first thing the shocked me  was the use of the struct.pack and
struct.unpack functions.  Normally when you need to pack and unpack often
with the same format you would create a struct object with the desired
format and use this object with its pack and unpack methods.  That way the
format string is not always being parsed but instead once when the struct
object is created.

As Bas pointed out pypy is able to optimize the parsing of the format which
is great but why would you prefer to write code that would run with
horrible performance under CPython when there is an alternative available.
 Now toward the end of the comments on the blog, Bas stated he tried the
struct object under pypy and found it ran slower.  So there is likely
an opportunity for pypy to add another optimization as if pypy can optimize
the struct functions it should be able to handle the struct objects which I
would think would be an easier case to handle purely looking at it from a
high level perspective.

Another issue I had was the msgpack spec is designed in a way to minimize
the need of copying data.  That is you should be able to just use the data
directly from the message buffers.  The normal way to do this with the
struct module is to use the unpack_from and pack_into methods instead of
the pack and unpack methods.  These methods take a buffer and an offset as
opposed to the pack and unpack which would require you to slice out a copy
of the original buffer to pass it in the unpack method.  As Bas pointed out
again pypy is able to optimize this copy created from slicing away which is
great but again why code it in a way that will be slow on CPython when
there is an alternative.

The other issue I mentioned on the blog was the large number of if, elif
statements used to handle each type of msgpack message.  I instead
suggested creating essentialy a list that holds references to struct
objects so that the message type would be used as in index into this list.
 So that way you remove all the if, elif statements and end up with
something like

struct_objects[message_type].unpack_from()

Now I understand that pypy is able to optimize all these if and elif
statements by creating bridges for the various paths through this code but
again why code it this way when it will be slow on CPython.  I would also
assume that using the if elif statements would still have more overhead in
pypy compared to using a list of references although maybe there is not
much of a difference.

Any way this is just the issues I saw with this library which by the way is
no where near as bad as other code I have seen written as a result of users
using the jitviewer.  Unfortunately, I could not discuss these other
projects as they are closed source.

Any way to get to the other part of you reply I assume not responding to
your concerns is about the following

"python is nicer. It does not segfault. Besides, how do you get a
string out of a C library? if you do raw malloc it's prone to be bad.
Etc. etc."

Sorry that was an over sight.  I feel the same way about Python but what's
the real issue of taking the practical approach of using a c library that
is written well and is robust.  I would love to see everything written in
Python but who has the time to port everything over.

In the msgpack c library it would have the responsibility of maintaining
the buffers.  It's API supports creating and freeing these buffers. The
msgpack library would be doing most of the work and the only data that has
to go back and forth between the Python code and the library are just basic
types like int, float, double, strings, etc.  To get a string out of the c
library just slice cffi.buffer to create a copy of it in Python before
calling the function to clear the msgpack buffer.

With using cffi this slicing to create copies of strings into Python and
the overhead of calling into the c functions does add extra work over what
would be done with the code written purely in Python and assuming pypy does
have all the optimizations in place to get you to match the performance of
the msgpack c library.  The question is how much overhead does cffi really
add in this use case, and is it worth doing the Python port to remove that
overhead.

I don't know the answer to this question.  It would require profiling both
cases.

[1] - http://blog.affien.com/archives/2013/01/29/msgpack-for-pypy/

John
_______________________________________________
pypy-dev mailing list
pypy-dev@python.org
http://mail.python.org/mailman/listinfo/pypy-dev

Reply via email to