Ok fine, but we're not the recipients of such a message. Please lobby the PSF for having a JIT, we all support that :-)
On Thu, Mar 24, 2016 at 5:23 PM, John Camara <john.m.cam...@gmail.com> wrote:
> Hi Fijal,
>
> I understand where you're coming from and am not trying to convince you
> to work on it. I'm mainly trying to point out a need that may not be
> obvious to this community. I don't spend much time on big data and
> analytics so I don't have a lot of time to devote to this task. That
> could change in the future, so you never know, I may end up getting
> involved with this.
>
> At the end of the day I think it is the PSF that needs to do an honest
> assessment of the current state of Python and of programming in general,
> so that they can help direct the future of Python. I think with an honest
> assessment it should be clear that it is absolutely necessary that a
> dynamic language have a JIT. Otherwise, a language like Node would not be
> growing so quickly on the server side. An honest assessment would
> conclude that Python needs to play a major role in big data and
> analytics, as we don't want this to be another area where Python misses
> the boat. As with all languages other than JavaScript, we missed playing
> an important role on the web front end. More recently we missed out on
> mobile. I don't think it is good for us to miss out on big data. It would
> be a shame, since we had such a strong scientific community, which
> initially gave us a huge advantage over other communities. Missing out on
> big data might also be the driver that moves the scientific community in
> a different direction, which would be a big loss to Python.
>
> I personally don't see any particular companies or industries that are
> willing to fund the tasks needed to solve these issues. That's not to say
> there are no more funds for Python projects; it's just likely that no one
> company will be willing to fund these kinds of projects on its own.
> It really needs the PSF to coordinate these efforts, but they seemed to
> be more focused on trying to make Python 3 a success than on improving
> the overall health of the community.
>
> I believe that Python is in pretty good shape in being able to solve
> these issues but it just needs some funding and focus to get there.
>
> Hopefully the workshop will be successful and help create some focus.
>
> John
>
> On Thu, Mar 24, 2016 at 8:56 AM, Maciej Fijalkowski <fij...@gmail.com>
> wrote:
>>
>> Hi John
>>
>> Thanks for explaining the current situation of the ecosystem. I'm not
>> quite sure what your intention is. PyPy (and CPython) is very easy to
>> embed through any C-level API, especially with the latest additions to
>> cffi embedding. If someone feels like doing the work to share stuff
>> that way (as I presume a lot of data presented in the JVM can be
>> represented as some pointer plus a shape describing how to access it),
>> then he's obviously more than free to do so, I'm even willing to help
>> with that. Now this seems like a medium-to-big size project that
>> additionally will require quite a bit of community will to endorse. Are
>> you willing to volunteer to work on such a project and dedicate a lot
>> of time to it? If not, then there is no way you can convince us to
>> volunteer our own time to do it - it's just too big and quite far
>> outside our usual areas of interest. If there is some commercial
>> interest (and I think there might be) in pushing Python and especially
>> PyPy further in that area, we might want to have a better story for
>> numpy first, but then feel free to send those corporate interest people
>> my way, we can maybe organize something. If you want us to do community
>> service to push Python solutions in an area I have very little clue
>> about, however, I would like to politely decline.
>>
>> Cheers,
>> fijal
>>
>> On Thu, Mar 24, 2016 at 2:22 PM, John Camara <john.m.cam...@gmail.com>
>> wrote:
>> > Besides JPype and PyJNIus there is also https://www.py4j.org/. I
>> > haven't heard of JPype being used in any recent projects so I am
>> > assuming it is outdated by now. PyJNIus gets used but I tend to only
>> > see it used on Android projects. The Py4J project gets used often in
>> > numerical/scientific projects, mainly due to its use in PySpark. The
>> > problem with all these libraries is that they don't have a way to
>> > share large amounts of memory between the JVM and Python VMs, and so
>> > large chunks of data have to be copied/serialized when going between
>> > the 2 VMs.
>> >
>> > Spark is the de facto standard in cluster computing at this point in
>> > time. At a high level Spark executes code that is distributed
>> > throughout a cluster so that the code being executed is as close as
>> > possible to where the data lives, so as to minimize transferring of
>> > large amounts of data. The code that needs to be executed is packaged
>> > up into units called Resilient Distributed Datasets (RDDs). RDDs are
>> > lazily evaluated and are essentially graphs of the operations that
>> > need to be performed on the data. They are capable of reading data
>> > from many types of sources, outputting to multiple types of sources,
>> > containing the code that needs to be executed, and are also
>> > responsible for caching or keeping results in memory for future RDDs
>> > that may be executed.
>> >
>> > If you write all your code in Java or Scala, its execution will be
>> > performed in JVMs distributed in the cluster. On the other hand, Spark
>> > does not limit its use to only Java-based languages, so Python can be
>> > used. In the case of Python the PySpark library is used. When Python
>> > is used, the PySpark library can be used to define the RDDs that will
>> > be executed under the JVM.
>> > In this scenario, only if required, the final results of the
>> > calculations will end up being passed to Python. I say only if
>> > necessary as it's possible the end results may just be left in memory
>> > or used to create an output such as an HDFS file in Hadoop, and so do
>> > not need to be transferred to Python. Under this scenario the code is
>> > written in Python but effectively all the "real" work is performed
>> > under the JVM.
>> >
>> > Often someone writing Python is also going to want to perform some of
>> > the operations under Python. This can be done, as the RDDs that are
>> > created can contain both operations that get performed under the JVM
>> > as well as Python (and of course other languages are supported). When
>> > Python is involved Spark will start up Python VMs on the required
>> > nodes so that the Python portions of the work can be performed. The
>> > Python VMs can either be CPython, PyPy or even a mix of both CPython
>> > and PyPy. The downside to using non-Java languages is the overhead of
>> > passing data between the JVM and the Python VM, as the memory is not
>> > shared between the processes but instead copied/serialized between
>> > them.
>> >
>> > Because this data is copied between the 2 VMs, anyone who writes
>> > Python code for this environment always has to be conscious of the
>> > data being copied between the processes so as to not let the amount
>> > of the extra overhead become a large burden. Quite often the goal will
>> > be to first perform the bulk of the operations under the JVM and then
>> > hopefully only a smaller subset of the data will have to be processed
>> > under Python. If this can be done then the overhead can be minimized
>> > and then there are essentially no downsides to using Python in the
>> > pipeline of operations.
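To make the copy/serialize cost described above concrete, here is a minimal single-process sketch using only the Python standard library. It is not how Spark or Py4J actually move data. `send_by_copy` and `send_by_view` are hypothetical helpers invented for illustration: the first stands in for a serialized hand-off between two VMs, the second for a shared-memory hand-off.

```python
import pickle
from array import array


def send_by_copy(values):
    """Hypothetical stand-in for crossing a VM boundary by serialization."""
    payload = pickle.dumps(values)   # first copy: object -> bytes
    received = pickle.loads(payload) # second copy: bytes -> new object
    return received, len(payload)


def send_by_view(buffer):
    """Hypothetical stand-in for shared memory: no element data is copied."""
    return memoryview(buffer)


data = array("d", range(1000))       # 1000 C doubles in one contiguous buffer

copied, payload_size = send_by_copy(data)
view = send_by_view(data)

# Mutate the original: the view sees the change, the copy does not.
data[0] = 42.0
print("bytes serialized:", payload_size)
print("copy sees:", copied[0], "| view sees:", view[0])
```

The serialized path pays for the payload on every crossing, which is why going back and forth many times hurts. The view path only hands over a reference to the same buffer, which is roughly what a shared-memory bridge between the two VMs would aim for.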
>> >
>> > If you're unfortunate and need to perform some of the processing early
>> > in the pipeline under Python, and worse yet if there is a need to go
>> > back and forth many times between Python and Java, the overhead of
>> > copying huge amounts of data can significantly slow things down, which
>> > essentially puts Python at a disadvantage to Java.
>> >
>> > If it were possible to change the model of execution such that it was
>> > possible to embed the Python VM in the JVM or vice versa, and the
>> > memory could be shared between the 2 VMs, the downside of using Python
>> > in this environment would be eliminated or at the very least minimized
>> > to the point where it is no longer an issue. Thus the need for a jffi
>> > library.
>> >
>> > There is a strong desire by many to use dynamic languages in these
>> > clustered environments and Python is likely in the best position to
>> > become the language of choice due to its ability to work with C based
>> > libraries and of course its syntax. The issues that hold Python back
>> > at this point are the serialization overhead, the not so great state
>> > of packaging, and not having both the speed of the JIT and complete
>> > access to the numpy/scipy ecosystem.
>> >
>> > Luckily for Python at this point there is no other dynamic language
>> > that is a clear winner today. But if too much time passes before these
>> > issues are solved I'm sure another language will step up to the plate.
>> > At this point my expectation is that Node could likely make a move. It
>> > already has the speed due to the JavaScript JITs, it already has a
>> > great story for packaging and deployment, and its growth is exploding
>> > on the server side due to all the money being poured into it.
>> > What it strongly lacks today is the connection to C/legacy code and
>> > numerical/scientific modules, and of course it also does not have a
>> > solution to the data copying overhead it also has with the JVM.
>> >
>> > Anyway, this is just my 2 cents on what is currently holding Python
>> > back from taking off in this space.
>> >
>> > On Thu, Mar 24, 2016 at 2:32 AM, Hakan Ardo <hakan.a...@gmail.com>
>> > wrote:
>> >>
>> >>
>> >> On Mar 23, 2016 21:49, "Armin Rigo" <ar...@tunes.org> wrote:
>> >> >
>> >> > Hi John,
>> >> >
>> >> > On 23 March 2016 at 19:16, John Camara <john.m.cam...@gmail.com>
>> >> > wrote:
>> >> > > I would like to suggest one more topic for the workshop. I see a
>> >> > > big need for a library (jffi) similar to cffi but that provides a
>> >> > > bridge to Java instead of C code. The ability to seamlessly work
>> >> > > with native Java data/code would offer a huge improvement (...)
>> >> >
>> >> > Isn't that what JPype does? Can you describe how it isn't suitable
>> >> > for your needs?
>> >>
>> >> There is also PyJNIus:
>> >>
>> >> https://pyjnius.readthedocs.org/en/latest/
>> >
>> > _______________________________________________
>> > pypy-dev mailing list
>> > pypy-dev@python.org
>> > https://mail.python.org/mailman/listinfo/pypy-dev