Re: [pypy-dev] [ANN] Python compilers workshop at SciPy this year

2016-03-24 Thread Maciej Fijalkowski
Hi David

I'm sorry, it was not meant to come across as rude.

It seems that the blocker here is full numpy support, which we're
working on right now. We can come back to that discussion once that's
ready.

On Thu, Mar 24, 2016 at 6:31 PM, David Edelsohn  wrote:
> (...)

Re: [pypy-dev] [ANN] Python compilers workshop at SciPy this year

2016-03-24 Thread John Camara
It turns out there is some work in progress in the Spark project to share
its memory with non-JVM programs. See
https://issues.apache.org/jira/browse/SPARK-10399.  Once this is completed
it should be fairly trivial to expose it to Python, and then maybe JIT
integration could be discussed at that time.  This is a huge step forward
over sharing Java objects.  From the title of the ticket it appears it
would be a C++ interface, but looking at the pull request it looks like it
will be a C interface.

In the end the blocker may just come down to PyPy having complete support
for Numpy. Without Numpy the success of this would be somewhat limited
based on user expectations, and without PyPy it may be too slow for many
applications.
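To make the idea concrete, here is a rough sketch, in plain ctypes, of what consuming such a C-level shared buffer from Python without copying could look like. The SPARK-10399 interface is not final, so the buffer, address, and length below are stand-ins for illustration, not the real API:

```python
import ctypes

# Simulate memory owned by another runtime (e.g. the JVM side of the
# hypothetical SPARK-10399 C interface): a raw buffer plus its address.
buf = (ctypes.c_double * 4)(1.0, 2.0, 3.0, 4.0)
address = ctypes.addressof(buf)
length = len(buf)

# Python-side view: cast the raw address back to a typed array without
# copying a single byte.
view = (ctypes.c_double * length).from_address(address)

# Mutations through the view are visible in the "foreign" buffer,
# proving no copy was made.
view[0] = 99.0
print(buf[0])      # 99.0
print(list(view))  # [99.0, 2.0, 3.0, 4.0]
```

The same pattern would apply whichever side allocates the memory; only the ownership and lifetime rules would need to be pinned down by the actual interface.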

On Thu, Mar 24, 2016 at 1:11 PM, John Camara 
wrote:

> (...)
___
pypy-dev mailing list
pypy-dev@python.org
https://mail.python.org/mailman/listinfo/pypy-dev


Re: [pypy-dev] [ANN] Python compilers workshop at SciPy this year

2016-03-24 Thread John Camara
Hi Armin,

At a minimum, tighter execution is required, as well as sharing memory.  But
on the other hand you have raised the bar so high with cffi, with its clean
and unbloated interface, that it would be nice if a library with a similar
spirit existed for Java. Having support in PyPy's JIT to remove all the
marshalling of types would be a big plus on top of the shared memory, and
some integration between the 2 GCs would likely be required.

Maybe the best approach would be a combination of existing libraries and a
new interface that allows for sharing of memory.  Maybe something similar to
numpy arrays, with a better API that avoids the pitfalls of numpy relying on
CPython semantics/implementation details.  After all, the only thing that
needs to be eliminated is the copying/serialization of large data
arrays/structures.
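As a toy illustration of that last point, stdlib only (no numpy, and the sizes are arbitrary), compare the serialization path with a zero-copy shared view:

```python
import pickle

data = bytearray(10_000_000)  # pretend this is a large data array

# Serialization path: pickling produces a full independent copy that
# would then have to cross the process boundary.
blob = pickle.dumps(bytes(data))
assert len(blob) >= len(data)  # the whole payload is duplicated

# Shared-memory path: a memoryview is a window onto the same bytes.
view = memoryview(data)
view[0] = 42
assert data[0] == 42  # change is visible without any copy
```

Anything speaking the buffer protocol (numpy arrays included) can be shared this way; it is exactly this copy/serialize step that a proper cross-VM interface would eliminate.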

John

On Thu, Mar 24, 2016 at 12:20 PM, Armin Rigo  wrote:

> (...)


Re: [pypy-dev] [ANN] Python compilers workshop at SciPy this year

2016-03-24 Thread David Edelsohn
Maciej,

How about a little more useful response of "we'll help you find the
right audience for this discussion and collaborate with you to make
the case."?

- David

On Thu, Mar 24, 2016 at 11:32 AM, Maciej Fijalkowski  wrote:
> Ok fine, but we're not the recipients of such a message.
>
> Please lobby PSF for having a JIT, we all support that :-)
>
> (...)

Re: [pypy-dev] [ANN] Python compilers workshop at SciPy this year

2016-03-24 Thread Armin Rigo
Hi John,

On 24 March 2016 at 13:22, John Camara  wrote:
> (...)  Thus the need for a jffi library.

When I hear "a jffi library" I'm thinking about a new library with a
new API.  I think what you would really like instead is to keep the
existing libraries, but adapt them internally to allow tighter
execution of the Python and Java VMs.

I may be completely wrong about that, but you're also talking to the
wrong guys in the first place :-)


A bientôt,

Armin.


Re: [pypy-dev] [ANN] Python compilers workshop at SciPy this year

2016-03-24 Thread Maciej Fijalkowski
Ok fine, but we're not the recipients of such a message.

Please lobby PSF for having a JIT, we all support that :-)

On Thu, Mar 24, 2016 at 5:23 PM, John Camara  wrote:
> (...)

Re: [pypy-dev] [ANN] Python compilers workshop at SciPy this year

2016-03-24 Thread John Camara
Hi Fijal,

I understand where you're coming from and I'm not trying to convince you to
work on it.  I'm just mainly trying to point out a need that may not be
obvious to this community.  I don't spend much time on big data and
analytics, so I don't have a lot of time to devote to this task.  That could
change in the future, so you never know; I may end up getting involved with
this.

At the end of the day I think it is the PSF that needs to do an honest
assessment of the current state of Python and of programming in general, so
that they can help direct the future of Python.  I think with an honest
assessment it should be clear that it is absolutely necessary for a
dynamic language to have a JIT. Otherwise, a language like Node would not be
growing so quickly on the server side.  An honest assessment would conclude
that Python needs to play a major role in big data and analytics, as we
don't want this to be another area where Python misses the boat.  As with
all languages other than JavaScript, we missed playing an important role on
the web front end.  More recently we missed out on mobile.  I don't think it
is good for us to miss out on big data.  It would be a shame, since we had
such a strong scientific community which initially gave us a huge advantage
over other communities.  Missing out on big data might also be the driver
that moves the scientific community in a different direction, which would be
a big loss to Python.

I personally don't see any particular companies or industries that are
willing to fund the tasks needed to solve these issues.  That's not to say
there are no more funds for Python projects; it's just likely that no one
company will be willing to fund these kinds of projects on their own.  It
really needs the PSF to coordinate these efforts, but they seem to be more
focused on trying to make Python 3 a success instead of improving the
overall health of the community.

I believe that Python is in pretty good shape to solve these issues; it
just needs some funding and focus to get there.

Hopefully the workshop will be successful and help create some focus.

John

On Thu, Mar 24, 2016 at 8:56 AM, Maciej Fijalkowski 
wrote:

> (...)

Re: [pypy-dev] [ANN] Python compilers workshop at SciPy this year

2016-03-24 Thread Maciej Fijalkowski
Hi John

Thanks for explaining the current situation of the ecosystem. I'm not
quite sure what your intention is. PyPy (and CPython) is very easy to
embed through any C-level API, especially with the latest additions to
cffi embedding. If someone feels like doing the work to share stuff
that way (as I presume a lot of the data presented in the JVM can be
represented as some pointer plus a shape describing how to access it),
then he's obviously more than free to do so; I'm even willing to help
with that. Now, this seems like a medium-to-big-size project that will
additionally require quite a bit of community will to endorse. Are you
willing to volunteer to work on such a project and dedicate a lot of
time to it? If not, then there is no way you can convince us to
volunteer our own time to do it - it's just too big and quite a bit
outside our usual areas of interest. If there is some commercial
interest (and I think there might be) in pushing Python, and especially
PyPy, further in that area, we might want to have a better story for
numpy first; but then feel free to send those corporate-interest people
my way and we can maybe organize something. If, however, you want us to
do community service pushing Python solutions in an area I have very
little clue about, I would like to politely decline.
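To sketch what "some pointer and a shape describing how to access it" could mean in practice, here is a minimal, hypothetical example using only ctypes; the 2x3 buffer stands in for JVM-owned data, and none of this reflects an actual Spark or Py4J API:

```python
import ctypes

# A 2x3 row-major int64 buffer standing in for memory owned by the JVM.
rows, cols = 2, 3
buf = (ctypes.c_int64 * (rows * cols))(*range(6))  # elements 0..5
base = ctypes.addressof(buf)
itemsize = ctypes.sizeof(ctypes.c_int64)

def read(i, j):
    # Row-major offset: (i * cols + j) elements from the base pointer;
    # given only (base, shape, itemsize) we can index without copying.
    addr = base + (i * cols + j) * itemsize
    return ctypes.c_int64.from_address(addr).value

print(read(1, 2))  # 5
```

A real interface would add strides and an element-type descriptor on top of this, which is essentially what the buffer protocol and numpy's array interface already standardize on the Python side.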

Cheers,
fijal

On Thu, Mar 24, 2016 at 2:22 PM, John Camara  wrote:
> (...)

Re: [pypy-dev] [ANN] Python compilers workshop at SciPy this year

2016-03-24 Thread John Camara
Besides JPype and PyJNIus there is also https://www.py4j.org/.  I haven't
heard of JPype being used in any recent projects, so I'm assuming it is
outdated by now.  PyJNIus gets used, but I tend to only see it used on
Android projects.  The Py4J project gets used often in numerical/scientific
projects, mainly due to its use in PySpark.  The problem with all these
libraries is that they don't have a way to share large amounts of memory
between the JVM and Python VMs, so large chunks of data have to be
copied/serialized when going between the 2 VMs.

Spark is the de facto standard in cluster computing at this point in
time.  At a high level, Spark executes code that is distributed throughout a
cluster so that the code being executed is as close as possible to where
the data lives, so as to minimize transferring of large amounts of data.
The code that needs to be executed is packaged up into units called
Resilient Distributed Datasets (RDDs).  RDDs are lazily evaluated and are
essentially graphs of the operations that need to be performed on the data.
They are capable of reading data from many types of sources, outputting to
multiple types of sinks, containing the code that needs to be executed,
and are also responsible for caching or keeping results in memory for
future RDDs that may be executed.
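For readers unfamiliar with lazy evaluation, here is a toy, non-Spark sketch of the idea: building the operation graph costs nothing, and work only happens when a result is actually requested:

```python
# Not Spark, just an illustration of RDD-style laziness using generators.
class ToyRDD:
    def __init__(self, source):
        self._source = source  # an iterable; nothing is computed yet

    def map(self, fn):
        return ToyRDD(fn(x) for x in self._source)

    def filter(self, pred):
        return ToyRDD(x for x in self._source if pred(x))

    def collect(self):
        return list(self._source)  # only now does the pipeline execute

rdd = ToyRDD(range(10)).map(lambda x: x * x).filter(lambda x: x % 2 == 0)
print(rdd.collect())  # [0, 4, 16, 36, 64]
```

In real Spark, map and filter are transformations recorded in the graph, and collect is an action that triggers distributed execution; the laziness is what lets the scheduler ship the work to where the data lives.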

If you write all your code in Java or Scala, its execution will be
performed in JVMs distributed across the cluster.  On the other hand, Spark
does not limit its use to only Java-based languages, so Python can be used.
In the case of Python, the PySpark library is used.  When Python is used,
the PySpark library can be used to define the RDDs that will be executed
under the JVM.  In this scenario, only if required, the final results of
the calculations will end up being passed to Python.  I say only if
required, as it's possible the end results may just be left in memory or
used to create an output such as an HDFS file in Hadoop, and do not need to
be transferred to Python. Under this scenario the code is written in Python
but effectively all the "real" work is performed under the JVM.

Often someone writing Python is also going to want to perform some of the
operations under Python.  This can be done as the RDDs that are created can
contain both operations that get performed under the JVM as well as Python
(and of course other languages are supported).  When Python is involved
Spark will start up Python VMs on the required nodes so that the Python
portions of the work can be performed.  The Python VMs can either be
CPython, PyPy, or even a mix of both CPython and PyPy.  The downside to
using non-Java languages is the overhead of passing data between the JVM
and the Python VM, as the memory is not shared between the processes but
instead copied/serialized between them.

Because this data is copied between the 2 VMs, anyone who writes Python
code for this environment always has to be conscious of the data being
copied between the processes, so as to not let the extra overhead become a
large burden.  Quite often the goal will be to first perform the bulk of
the operations under the JVM, and then hopefully only a smaller subset of
the data will have to be processed under Python.  If this can be done then
the overhead can be minimized, and then there are essentially no downsides
to using Python in the pipeline of operations.

If you're unfortunate and need to perform some of the processing early in
the pipeline in Python, and worse yet need to go back and forth many times
between Python and Java, the overhead of copying huge amounts of data can
significantly slow things down, which puts Python at a disadvantage to
Java.

If it were possible to change the model of execution so that the Python VM
could be embedded in the JVM, or vice versa, and the memory could be shared
between the two VMs, this downside of using Python would be eliminated, or
at the very least minimized to the point where it is no longer an issue.
Hence the need for a jffi library.

There is a strong desire by many to use dynamic languages in these
clustered environments, and Python is likely in the best position to become
the language of choice due to its ability to work with C-based libraries
and, of course, its syntax.  The issues that hold Python back at this point
are the serialization overhead, the not-so-great state of packaging, and
not having both the speed of a JIT and complete access to the numpy/scipy
ecosystem.

Luckily for Python, no other dynamic language is a clear winner today.  But
if too much time passes before these issues are solved, I'm sure another
language will step up to the plate.  At this point my expectation is that
Node could make a move.  It already has the speed thanks to the JavaScript
JITs, it already has a great story for packaging and deployment, and its
growth is exploding on the server side due to all the money being poured
into it.  What it strongly lacks 

Re: [pypy-dev] [ANN] Python compilers workshop at SciPy this year

2016-03-24 Thread Hakan Ardo
On Mar 23, 2016 21:49, "Armin Rigo"  wrote:
>
> Hi John,
>
> On 23 March 2016 at 19:16, John Camara  wrote:
> > I would like to suggest one more topic for the workshop. I see a big need
> > for a library (jffi) similar to cffi but that provides a bridge to Java
> > instead of C code. The ability to seamlessly work with native Java data/code
> > would offer a huge improvement (...)
>
> Isn't it what JPype does?  Can you describe how it isn't suitable for
> your needs?

There is also PyJNIus:

https://pyjnius.readthedocs.org/en/latest/
___
pypy-dev mailing list
pypy-dev@python.org
https://mail.python.org/mailman/listinfo/pypy-dev


Re: [pypy-dev] [ANN] Python compilers workshop at SciPy this year

2016-03-23 Thread Armin Rigo
Hi John,

On 23 March 2016 at 19:16, John Camara  wrote:
> I would like to suggest one more topic for the workshop. I see a big need
> for a library (jffi) similar to cffi but that provides a bridge to Java
> instead of C code. The ability to seamlessly work with native Java data/code
> would offer a huge improvement (...)

Isn't it what JPype does?  Can you describe how it isn't suitable for
your needs?


A bientôt,

Armin.


Re: [pypy-dev] [ANN] Python compilers workshop at SciPy this year

2016-03-23 Thread John Camara
Hi Fijal,

I agree that jffi would be a large project and that, without someone
leading it, it would likely not get anywhere.  But I tend to disagree that
it would be a separate goal for the conference.  I realize the goal of the
summit is to talk about native-code compilation for Python, and most would
argue that means executing C code, assembly, or at the very least executing
code at the speed of "C code".  But the reality now is that
numerical/scientific programming increasingly needs to run in a clustered
environment.  So I think we need to be careful not only to solve
yesterday's problems but to make sure we are covering today's and
tomorrow's as well.

Today, big data and analytics, which drive most numerical/scientific
programming, run almost exclusively in clustered environments, with the
Apache Spark ecosystem as the de facto standard.  A few years back,
Python's ace up its sleeve for the scientific community was the numpy/scipy
ecosystem, but we have recently lost that edge by falling behind in
clustered computing.  At this point in time our best move forward on the
numerical/scientific front is to become best buddies with the Spark
ecosystem and make sure we can bridge the numpy/scipy ecosystem to it.  If
we merge the best of both worlds, Python suddenly becomes the go-to
language again for numerical/scientific computing.  Of course we still need
to address what should have been yesterday's problem and deal with the
"native-code compilation" issues.

John

On Wed, Mar 23, 2016 at 2:47 PM, Maciej Fijalkowski 
wrote:

> Hi John
>
> I understand why you're bringing this up, but it's a huge project on
> its own, worth at least a couple of months' worth of work. Without a
> dedicated effort from someone I'm worried it would not go anywhere.
> It's kind of separate from the other goals of the summit.
>
> On Wed, Mar 23, 2016 at 8:16 PM, John Camara 
> wrote:
> > Hi Nathaniel,
> >
> > I would like to suggest one more topic for the workshop. I see a big need
> > for a library (jffi) similar to cffi but that provides a bridge to Java
> > instead of C code. The ability to seamlessly work with native Java data/code
> > would offer a huge improvement when python code needs to work with the
> > Spark/Hadoop ecosystem. The current mechanisms which involve serializing
> > data to/from Java can kill performance for some applications and can render
> > Python unsuitable for these cases.
> >
> > John
> >


Re: [pypy-dev] [ANN] Python compilers workshop at SciPy this year

2016-03-23 Thread Maciej Fijalkowski
Hi John

I understand why you're bringing this up, but it's a huge project on
its own, worth at least a couple of months' worth of work. Without a
dedicated effort from someone I'm worried it would not go anywhere.
It's kind of separate from the other goals of the summit.

On Wed, Mar 23, 2016 at 8:16 PM, John Camara  wrote:
> Hi Nathaniel,
>
> I would like to suggest one more topic for the workshop. I see a big need
> for a library (jffi) similar to cffi but that provides a bridge to Java
> instead of C code. The ability to seamlessly work with native Java data/code
> would offer a huge improvement when python code needs to work with the
> Spark/Hadoop ecosystem. The current mechanisms which involve serializing
> data to/from Java can kill performance for some applications and can render
> Python unsuitable for these cases.
>
> John
>


[pypy-dev] [ANN] Python compilers workshop at SciPy this year

2016-03-23 Thread John Camara
Hi Nathaniel,

I would like to suggest one more topic for the workshop. I see a big need
for a library (jffi) similar to cffi but that provides a bridge to Java
instead of C code. The ability to seamlessly work with native Java
data/code would offer a huge improvement when python code needs to work
with the Spark/Hadoop ecosystem. The current mechanisms which involve
serializing data to/from Java can kill performance for some applications
and can render Python unsuitable for these cases.

John


[pypy-dev] [ANN] Python compilers workshop at SciPy this year

2016-03-22 Thread Nathaniel Smith
Hi all,

I wanted to announce a workshop I'm organizing at SciPy this year, and
invite you to attend!

What: A two-day workshop bringing together folks working on JIT/AOT
compilation in Python.

When/where: July 11-12, in Austin, Texas.

(This is co-located with SciPy 2016, at the same time as the tutorial
sessions, just before the conference proper.)

Website: https://python-compilers-workshop.github.io/

Note that I anticipate that we'll be able to get sponsorship funding
to cover travel costs for folks who can't get their employers to foot
the bill.

Cheers,
-n

-- 
Nathaniel J. Smith -- https://vorpus.org