[Python-Dev] Perceus useful for Python

2020-12-19 Thread Gary Robinson via Python-Dev
A recent technical note from Microsoft describes a new reference counting 
algorithm, Perceus. It seemed worth posting here in case there are any thoughts 
about whether it might be useful for Python. I couldn't find any existing 
references to it in this list.

"""
We introduce Perceus, an algorithm for precise reference counting with reuse 
and specialization. Starting from a functional core language with explicit 
control-flow, Perceus emits precise reference counting instructions such that 
programs are garbage free, where only live references are retained. This 
enables further optimizations, like reuse analysis that allows for guaranteed 
in-place updates at runtime. This in turn enables a novel programming paradigm 
that we call functional but in-place (FBIP). Much like tail-call optimization 
enables writing loops with regular function calls, reuse analysis enables 
writing in-place mutating algorithms in a purely functional way. We give a 
novel formalization of reference counting in a linear resource calculus, and 
prove that Perceus is sound and garbage free. We show evidence that Perceus, as 
implemented in Koka, has good performance and is competitive with other 
state-of-the-art memory collectors.
"""

https://www.microsoft.com/en-us/research/uploads/prod/2020/11/perceus-tr-v1.pdf
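
For concreteness, here is a tiny CPython illustration (mine, not from the paper) of 
the property the abstract calls "garbage free": with precise reference counting, an 
object is reclaimed the instant its last reference goes away, with no tracing pass 
needed. Perceus adds reuse analysis and in-place update on top of this, which 
CPython does not do.

import sys
import weakref

class Node:
    """A trivial object so we can observe when it is reclaimed."""
    pass

obj = Node()
probe = weakref.ref(obj)       # a weak reference does not keep obj alive

print(sys.getrefcount(obj))    # typically 2: the 'obj' name plus the call argument
del obj                        # drop the last strong reference...
print(probe())                 # ...and the object is already gone: prints None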


Re: [Python-Dev] Yet another "A better story for multi-core Python" comment

2015-09-09 Thread Gary Robinson
> 
> I haven't tried getting the SciPy stack running with PyParallel yet.

That would be essential for my use. I would assume a lot of potential 
PyParallel users are in the same boat.

Thanks for the info about PyPy limits. You have a really interesting project. 

-- 

Gary Robinson
gary...@me.com
http://www.garyrobinson.net

> On Sep 9, 2015, at 7:02 PM, Trent Nelson <tr...@snakebite.org> wrote:
> 
> On Wed, Sep 09, 2015 at 04:52:39PM -0400, Gary Robinson wrote:
>> I’m going to seriously consider installing Windows or using a
>> dedicated hosted windows box next time I have this problem so that I
>> can try your solution. It does seem pretty ideal, although the STM
>> branch of PyPy (using http://codespeak.net/execnet/ to access SciPy)
>> might also work at this point.
> 
> I'm not sure how up-to-date this is:
> 
> http://pypy.readthedocs.org/en/latest/stm.html
> 
> But it sounds like there's a 1.5GB memory limit (or maybe 2.5GB now, I
> just peeked at core.h linked in that page) and a 4-core segment limit.
> 
> PyParallel has no memory limit (although it actually does have support
> for throttling back memory pressure by not accepting new connections
> when the system hits 90% physical memory used) and no core limit, and it
> scales linearly with cores+concurrency.
> 
> PyPy-STM and PyParallel are both pretty bleeding edge and experimental
> though so I'm sure we both crash as much as each other when exercised
> outside of our comfort zones :-)
> 
> I haven't tried getting the SciPy stack running with PyParallel yet.
> 
>Trent.



Re: [Python-Dev] Yet another "A better story for multi-core Python" comment

2015-09-09 Thread Gary Robinson
I’m going to seriously consider installing Windows or using a dedicated hosted 
windows box next time I have this problem so that I can try your solution. It 
does seem pretty ideal, although the STM branch of PyPy (using 
http://codespeak.net/execnet/ to access SciPy) might also work at this point.

Thanks!

I still hope CPython has a solution at some point… maybe PyParallel 
functionality will be integrated into Python 4 circa 2023… :)



-- 

Gary Robinson
gary...@me.com
http://www.garyrobinson.net

> On Sep 9, 2015, at 4:33 PM, Trent Nelson <tr...@snakebite.org> wrote:
> 
> On Tue, Sep 08, 2015 at 10:12:37AM -0400, Gary Robinson wrote:
>> There was a huge data structure that all the analysis needed to
>> access. Using a database would have slowed things down too much.
>> Ideally, I needed to access this same structure from many cores at
>> once. On a Power8 system, for example, with its larger number of
>> cores, performance may well have been good enough for production. In
>> any case, my experimentation and prototyping would have gone more
>> quickly with more cores.
>> 
>> But this data structure was simply too big. Replicating it in
>> different processes used memory far too quickly and was the limiting
>> factor on the number of cores I could use. (I could fork with the big
>> data structure already in memory, but copy-on-write issues due to
>> reference counting caused multiple copies to exist anyway.)
> 
> This problem is *exactly* the type of thing that PyParallel excels at,
> just FYI.  PyParallel can load large, complex data structures now, and
> then access them freely from within multiple threads.  I'd recommended
> taking a look at the "instantaneous Wikipedia search server" example as
> a start:
> 
> https://github.com/pyparallel/pyparallel/blob/branches/3.3-px/examples/wiki/wiki.py
> 
> That loads a trie with 27 million entries, creates ~27.1 million
> PyObjects, loads a huge NumPy array, and has a WSS of ~11GB.  I've
> actually got a new version in development that loads 6 tries of the
> most frequent terms for character lengths 1-6.  Once everything is
> loaded, the data structures can be accessed for free in parallel
> threads.
> 
> There are more details regarding how this is achieved on the landing
> page:
> 
> https://github.com/pyparallel/pyparallel
> 
> I've done a couple of consultancy projects now that were very data
> science oriented (with huge data sets), so I really gained an
> appreciation for how common the situation you describe is.  It is
> probably the best demonstration of PyParallel's strengths.
> 
>> Gary Robinson gary...@me.com http://www.garyrobinson.net
> 
>Trent.



Re: [Python-Dev] Yet another "A better story for multi-core Python" comment

2015-09-08 Thread Gary Robinson
> 
> Trent seems to be on to something that requires only a bit of a tilt
> ;-), and despite the caveat above, I agree with David, check it out:

I emailed with Trent a couple years ago about this very topic. The biggest 
issue for me was that it was Windows-only, but it sounds like that restriction 
may be getting closer to possibly going away… (?)



-- 

Gary Robinson
gary...@me.com
http://www.garyrobinson.net



[Python-Dev] Yet another "A better story for multi-core Python" comment

2015-09-08 Thread Gary Robinson
Folks,

If it’s out of line in some way for me to make this comment on this list, let 
me know and I’ll stop! But I do feel strongly about one issue and think it’s 
worth mentioning, so here goes.

I read the "A better story for multi-core Python” with great interest because 
the GIL has actually been a major hindrance to me. I know that for many uses, 
it’s a non-issue. But it was for me.

My situation was that I had a huge (technically mutable, but unchanging) data 
structure which needed a lot of analysis. CPU time was a major factor — things 
took days to run. But even so, my time as a programmer was much more important 
than CPU time. I needed to prototype different algorithms very quickly. Even 
Cython would have slowed me down too much. Also, I had a lot of reason to want 
to make use of the many great statistical functions in SciPy, so Python was an 
excellent choice for me in that way. 

So, even though pure Python might not be the right choice for this program in a 
production environment, it was the right choice for me at the time. And, if I 
could have accessed as many cores as I wanted, it may have been good enough in 
production too. But my work was hampered by one thing:

There was a huge data structure that all the analysis needed to access. Using a 
database would have slowed things down too much. Ideally, I needed to access 
this same structure from many cores at once. On a Power8 system, for example, 
with its larger number of cores, performance may well have been good enough for 
production. In any case, my experimentation and prototyping would have gone 
more quickly with more cores.

But this data structure was simply too big. Replicating it in different 
processes used memory far too quickly and was the limiting factor on the number 
of cores I could use. (I could fork with the big data structure already in 
memory, but copy-on-write issues due to reference counting caused multiple 
copies to exist anyway.)
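
To make the copy-on-write problem concrete, here is a rough, POSIX-only sketch 
(the structure and sizes are made up for illustration). Merely reading the shared 
objects in the child bumps each object's refcount, which writes to the object 
header, dirties the page it lives on, and forces the kernel to copy that page:

import os

# Stand-in for the "huge data structure": lots of small Python objects.
big = [{"id": i, "score": float(i)} for i in range(500_000)]

pid = os.fork()
if pid == 0:
    # Child: a read-only analysis pass.  Every dict and float touched here has
    # its refcount incremented and decremented, so the pages holding those
    # objects stop being shared with the parent even though nothing was
    # logically modified.
    total = sum(item["score"] for item in big)
    print("child total:", total)
    os._exit(0)
else:
    os.waitpid(pid, 0)

(Later CPython releases, 3.7 and up, added gc.freeze() to at least keep the cyclic 
collector from writing into such objects before a fork; ordinary refcount traffic 
still breaks the sharing.)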

So, one thing I am hoping comes out of any effort in the “A better story” 
direction would be a way to share large data structures between processes. Two 
possible solutions:

1) Move the reference counts away from data structures, so copy-on-write isn’t 
an issue. That sounds like a lot of work — I have no idea whether it’s 
practical. It has been mentioned in the “A better story” discussion, but I 
wanted to bring it up again in the context of my specific use-case. Also, it 
seems worth reiterating that even though copy-on-write forking is a Unix thing, 
the midipix project appears to bring it to Windows as well. (http://midipix.org)

2) Have a mode where a particular data structure is not reference counted or 
garbage collected. The programmer would be entirely responsible for manually 
calling del on the structure if he wants to free that memory. I would imagine 
this would be controversial because Python is currently designed in a very 
different way. However, I see no actual risk if one were to use an 
@manual_memory_management decorator or some technique like that to make it very 
clear that the programmer is taking responsibility. I.e., in general, 
information sharing between subinterpreters would occur through message 
passing. But there would be the option of the programmer taking responsibility 
of memory management for a particular structure. In my case, the amount of work 
required for this would have been approximately zero — once the structure was 
created, it was needed for the lifetime of the process. 

Under this second solution, there would be little need to actually remove the 
reference counts from the data structures — they just wouldn’t be accessed. 
Maybe it’s not a practical solution, if only because of the overhead of Python 
needing to check whether a given structure is manually managed or not. In that 
case, the first solution makes more sense.
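
Short of a language change, the closest I could have come in pure Python is to keep 
the big read-only data in a form that has no per-element Python objects at all, so 
there are nearly no refcounts to dirty after a fork. A rough sketch, assuming the 
data can be packed into a NumPy array (which is not true for every structure):

import os

import numpy as np

# One refcounted Python object whose buffer holds all the data.  Scanning the
# buffer in a child process performs no per-element INCREF/DECREF, so the data
# pages stay shared after fork().
scores = np.arange(10_000_000, dtype=np.float64)

pid = os.fork()
if pid == 0:
    print("child mean:", scores.mean())
    os._exit(0)
else:
    os.waitpid(pid, 0)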

In any case I thought this was worth mentioning,  because it has been a real 
problem for me, and I assume it has been a real problem for other people as 
well. If a solution is both possible and practical, that would be great.

Thank you for listening,
Gary


-- 

Gary Robinson
gary...@me.com
http://www.garyrobinson.net



Re: [Python-Dev] Yet another "A better story for multi-core Python" comment

2015-09-08 Thread Gary Robinson
> I guess a third possible solution, although it would probably have
> meant developing something for yourself which would have hit the same
> "programmer time is critical" issue that you noted originally, would
> be to create a module that managed the data structure in shared
> memory, and then use that to access the data from the multiple
> processes.

I think you mean, write a non-python data structure in shared memory, such as 
writing it in C? If so, you’re right, I want to avoid the time overhead for 
writing something like that. Although I have used C data in shared-memory in 
the past when the data structure was simple enough. It’s not a foreign concept 
to me — it just would have been a real nuisance in this case.
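
For what it's worth, the standard library did eventually grow something close to 
this: multiprocessing.shared_memory arrived in Python 3.8, years after this thread, 
and lets several processes map one buffer and view it without writing any C. A 
rough sketch, assuming NumPy for the view:

from multiprocessing import Process, shared_memory

import numpy as np

def worker(name, shape):
    shm = shared_memory.SharedMemory(name=name)          # attach by name, no copy
    data = np.ndarray(shape, dtype=np.float64, buffer=shm.buf)
    print("worker sum:", data.sum())
    del data                                              # release the buffer view
    shm.close()

if __name__ == "__main__":
    shm = shared_memory.SharedMemory(create=True, size=8 * 1_000_000)
    data = np.ndarray((1_000_000,), dtype=np.float64, buffer=shm.buf)
    data[:] = 1.0                                         # populate once, in place
    p = Process(target=worker, args=(shm.name, data.shape))
    p.start()
    p.join()
    del data
    shm.close()
    shm.unlink()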

An in-memory SQLite database would have been too slow, at least if I used any 
kind of ORM. Without an ORM it still would have slowed things down while making 
for code that’s harder to read and write. While I have used in-memory SQLite 
code at times, I’m not sure how much slowdown it would have engendered in this 
case. 
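
For reference, the in-memory SQLite option being weighed here is just the standard 
sqlite3 module pointed at ":memory:"; the overhead is the per-row SQL round trip 
rather than direct attribute access. A tiny sketch:

import sqlite3

con = sqlite3.connect(":memory:")        # the whole database lives in RAM
con.execute("CREATE TABLE scores (id INTEGER PRIMARY KEY, score REAL)")
con.executemany("INSERT INTO scores VALUES (?, ?)",
                ((i, float(i)) for i in range(100_000)))
(total,) = con.execute("SELECT SUM(score) FROM scores").fetchone()
print(total)
con.close()

(Note that a ":memory:" database is private to its connection, so by itself it 
would not have shared the data across processes anyway.)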

> Your suggestion (2), of having a non-refcounted data structure is
> essentially this, doable as an extension module. The core data
> structures all use refcounting, and that's unlikely to change, but
> there's nothing to say that an extension module couldn't implement
> fast data structures with objects allocated from a pool of
> preallocated memory which is only freed as a complete block.

Again, I think you’re talking about non-Python data structures, for instance C 
structures, which could be written to be “fast”? Again, I want to avoid writing 
that kind of code. Sure, for a production project where I had more programmer 
time, that would be a solution, but that wasn’t my situation. And, ideally, 
even if I had more time, I would greatly prefer not to have to spend it on that 
kind of code. I like Python because it saves me time and eliminates potential 
bugs that are associated with languages like C but not with Python (primarily 
memory management related). To the extent that I have to write and debug 
external modules in C or C++, it doesn’t.

But, my view is: I shouldn’t be forced to even think about that kind of thing. 
Python should simply provide a solution. The fact that the reference counters 
are mixed in with the data structure, so that copy-on-write causes copies to be 
made of the data structure shouldn’t be something I should have to discover by 
trial and error, or by having deep knowledge of language and OS internals 
before I start a project, and then have to try to find a way to work around.

Obviously, Python, like any language, will always have limitations, and 
therefore it’s arguable that no one should say that any language “should” do 
anything it doesn’t do; if I don’t like it, I can use a more appropriate 
language. 

But these limitations aren’t obvious up-front. They make the language less 
predictable to people who don’t have deep knowledge of its internals, who just 
want to get something done and think Python (especially combined with things like 
SciPy) looks like a great choice for doing it. And that confusion and uncertainty 
has to be bad for general language acceptance. I don’t see it as a “PR issue” — I 
see it as a practical issue having to do with the cost of knowledge acquisition. 
Indeed, I personally lost a lot of time because I didn’t understand these 
limitations up front!

Solving the problem I mention here would provide real benefits even with the 
current multiprocessing module. But it would also make the “A better story” 
subinterpreter idea a better solution than it would be without it. The 
subinterpreter multi-core solution is a major project — it seems like it would 
be a shame to create that solution and still have it not solve the problem 
discussed here.

Anyway, too much of this post is probably spent proselytizing for my point of 
view. Members of python-dev can judge it as they think fit — I don’t have much 
more to say unless anyone has questions.

But if I’m missing something about the solutions mentioned by Paul, and they 
can be implemented in pure Python, I would be much appreciative if that could 
be explained!

Thanks,
Gary




-- 

Gary Robinson
gary...@me.com
http://www.garyrobinson.net

> On Sep 8, 2015, at 11:44 AM, Paul Moore <p.f.mo...@gmail.com> wrote:
> 
> On 8 September 2015 at 15:12, Gary Robinson <gary...@me.com> wrote:
>> So, one thing I am hoping comes out of any effort in the “A better story” 
>> direction would be a way to share large data structures between processes. 
>> Two possible solutions:
>> 
>> 1) Move the reference counts away from data structures, so copy-on-write 
>> isn’t an issue. That sounds like a lot of work — I have no idea whether it’s 
>> practical. It has been mentioned in the “A better story” discussion, but I 
>> wanted to bring it up again in the context of my specific use

[Python-Dev] Possible C API problem?

2005-06-27 Thread Gary Robinson
Hello,

I was asking about a problem I was having over on the C++-python list, 
and they suggested I report it here as a possible Python problem.

I was getting bus errors with a C module I was linking to, so I factored 
it down to a very small example that reproduced the problem. Here it 
is:


#include <Python.h>

static double gfSumChiSquare = 123.0;

static PyObject *
getSumChiSquare(PyObject *self, PyObject *args)
{
    return Py_BuildValue("d", gfSumChiSquare);
}

static PyMethodDef SimMethods[] = {
    {"getSumChiSquare", getSumChiSquare, METH_NOARGS,
     "Return fSumChiSquare"},
    {NULL, NULL, 0, NULL}        /* Sentinel */
};

PyMODINIT_FUNC
inittestfloat(void)
{
    (void) Py_InitModule("testfloat", SimMethods);
}

That caused a bus error 100% of the time when I simply imported the 
module into Python and called getSumChiSquare(), i.e.:

>>> import testfloat
>>> testfloat.getSumChiSquare()


However, the problem seems to go away if I use METH_VARARGS, and parse 
the non-existent args with 
PyArg_ParseTuple:

#include <Python.h>

static double gfSumChiSquare = 123.0;

static PyObject *
getSumChiSquare(PyObject *self, PyObject *args)
{
    if (!PyArg_ParseTuple(args, ""))
        return NULL;
    return Py_BuildValue("d", gfSumChiSquare);
}

static PyMethodDef SimMethods[] = {
    {"getSumChiSquare", getSumChiSquare, METH_VARARGS,
     "Return fSumChiSquare"},
    {NULL, NULL, 0, NULL}        /* Sentinel */
};

PyMODINIT_FUNC
inittestfloat(void)
{
    (void) Py_InitModule("testfloat", SimMethods);
}


This approach seems to work reliably -- at least variations I've tried 
haven't caused a bus error. But I haven't been able to discern an 
explanation from the docs as to why this would be better. The docs say 
that both METH_VARARGS and METH_NOARGS expect a PyCFunction. So if I am 
calling the function with no arguments, why can't I use METH_NOARGS and 
skip the call to PyArg_ParseTuple?

Could it be that this is a python bug? Or am I doing something wrong?

Note: this is using Python 2.3 on OS X:

Python 2.3 (#1, Sep 13 2003, 00:49:11) 

Thanks in advance for any help or insight you can give,

Gary


-- 

Gary Robinson
CTO
Emergent Music, LLC
[EMAIL PROTECTED]
207-942-3463
Company: http://www.goombah.com
Blog:http://www.garyrobinson.net


Re: [Python-Dev] Possible C API problem?

2005-06-27 Thread Gary Robinson
> It doesn't for me (CVS HEAD, OS X Panther).

Not sure what you mean by CVS HEAD; you mean the latest python from 
cvs? 2.4? I'm still using the Apple python, which is straight 2.3.

> Have you, you know, tried to debug the situation yourself?  If you
> have gcc installed, you probably have gdb installed too...

It's been around 7 years since I've used C, I've forgotten virtually 
everything I may have known about gdb, I've never worked with the 
C-python API before... meanwhile there is intense time pressure to get 
the next release of our product (http://www.goombah.com) ready. So it's 
just not practical for me to take that on myself now. I'm hoping to get 
some help from other pythonistas where someone will say -- yes, it's 
getting a bus error for so-and-so reason, and if you do it this other 
way, you'll be fine...

Thanks,
Gary


-- 

Gary Robinson
CTO
Emergent Music, LLC
[EMAIL PROTECTED]
207-942-3463
Company: http://www.goombah.com
Blog:http://www.garyrobinson.net

On Mon, 27 Jun 2005 21:56:44 +0100, Michael Hudson wrote:
> Gary Robinson [EMAIL PROTECTED] writes:
> 
>> That caused a bus error 100% of the time when I simply imported the 
>> module into Python and called getSumChiSquare(), i.e.:
>> 
>> >>> import testfloat
>> >>> testfloat.getSumChiSquare()
>> 
> It doesn't for me (CVS HEAD, OS X Panther).
> 
>> Could it be that this is a python bug? Or am I doing something wrong?
>> 
>> Note: this is using Python 2.3 on OS X:
>> 
>> Python 2.3 (#1, Sep 13 2003, 00:49:11) 
>> 
>> Thanks in advance for any help or insight you can give,
> 
> Have you, you know, tried to debug the situation yourself?  If you
> have gcc installed, you probably have gdb installed too...
> 
> Cheers,
> mwh