Maciej,

How about a more useful response, such as "we'll help you find the right audience for this discussion and collaborate with you to make the case"?
- David

On Thu, Mar 24, 2016 at 11:32 AM, Maciej Fijalkowski <fij...@gmail.com> wrote:
> Ok fine, but we're not the recipients of such a message.
>
> Please lobby PSF for having a JIT, we all support that :-)
>
> On Thu, Mar 24, 2016 at 5:23 PM, John Camara <john.m.cam...@gmail.com> wrote:
>> Hi Fijal,
>>
>> I understand where you're coming from and I'm not trying to convince you
>> to work on it. I'm mainly trying to point out a need that may not be
>> obvious to this community. I don't spend much time on big data and
>> analytics so I don't have a lot of time to devote to this task. That
>> could change in the future, so you never know, I may end up getting
>> involved with this.
>>
>> At the end of the day I think it is the PSF that needs to do an honest
>> assessment of the current state of Python and of programming in general,
>> so that they can help direct the future of Python. I think with an honest
>> assessment it should be clear that it is absolutely necessary that a
>> dynamic language have a JIT. Otherwise, a language like Node would not be
>> growing so quickly on the server side. An honest assessment would conclude
>> that Python needs to play a major role in big data and analytics, as we
>> don't want this to be another area where Python misses the boat. As with
>> all languages other than JavaScript, we missed playing an important role
>> on the web front end. More recently we missed out on mobile. I don't think
>> it is good for us to miss out on big data. It would be a shame since we
>> had such a strong scientific community, which initially gave us a huge
>> advantage over other communities. Missing out on big data might also be
>> the driver that moves the scientific community in a different direction,
>> which would be a big loss to Python.
>>
>> I personally don't see any particular companies or industries that are
>> willing to fund the tasks needed to solve these issues.
>> It's not to say there are no more funds for Python projects; it's just
>> likely that no one company will be willing to fund these kinds of projects
>> on their own. It really needs the PSF to coordinate these efforts, but
>> they seem to be more focused on trying to make Python 3 a success instead
>> of improving the overall health of the community.
>>
>> I believe that Python is in pretty good shape in being able to solve these
>> issues but it just needs some funding and focus to get there.
>>
>> Hopefully the workshop will be successful and help create some focus.
>>
>> John
>>
>> On Thu, Mar 24, 2016 at 8:56 AM, Maciej Fijalkowski <fij...@gmail.com> wrote:
>>>
>>> Hi John
>>>
>>> Thanks for explaining the current situation of the ecosystem. I'm not
>>> quite sure what your intention is. PyPy (and CPython) is very easy to
>>> embed through any C-level API, especially with the latest additions to
>>> cffi embedding. If someone feels like doing the work to share stuff
>>> that way (as I presume a lot of data presented in the JVM can be
>>> represented as some pointer and a shape describing how to access it),
>>> then he's obviously more than free to do so, and I'm even willing to
>>> help with that. Now this seems like a medium-to-big size project that
>>> additionally will require quite a bit of community will to endorse. Are
>>> you willing to volunteer to work on such a project and dedicate a lot of
>>> time to it? If not, then there is no way you can convince us to
>>> volunteer our own time to do it - it's just too big and quite a bit far
>>> out of our usual areas of interest. If there is some commercial interest
>>> (and I think there might be) in pushing python and especially pypy
>>> further in that area, we might want to have a better story for numpy
>>> first, but then feel free to send those corporate interest people my
>>> way, we can maybe organize something.
>>> If you want us to do community service to push Python solutions in an
>>> area I have very little clue about, however, I would like to politely
>>> decline.
>>>
>>> Cheers,
>>> fijal
>>>
>>> On Thu, Mar 24, 2016 at 2:22 PM, John Camara <john.m.cam...@gmail.com> wrote:
>>> > Besides JPype and PyJNIus there is also https://www.py4j.org/. I
>>> > haven't heard of JPype being used in any recent projects so I assume
>>> > it is outdated by now. PyJNIus gets used but I tend to only see it
>>> > used on Android projects. The Py4J project gets used often in
>>> > numerical/scientific projects, mainly due to its use in PySpark. The
>>> > problem with all these libraries is that they don't have a way to
>>> > share large amounts of memory between the JVM and Python VMs, and so
>>> > large chunks of data have to be copied/serialized when going between
>>> > the 2 VMs.
>>> >
>>> > Spark is the de facto standard in cluster computing at this point in
>>> > time. At a high level Spark executes code that is distributed
>>> > throughout a cluster so that the code being executed is as close as
>>> > possible to where the data lives, so as to minimize transferring of
>>> > large amounts of data. The code that needs to be executed is packaged
>>> > up into units called Resilient Distributed Datasets (RDDs). RDDs are
>>> > lazily evaluated and are essentially graphs of the operations that
>>> > need to be performed on the data. They are capable of reading data
>>> > from many types of sources, outputting to multiple types of sources,
>>> > containing the code that needs to be executed, and are also
>>> > responsible for caching or keeping results in memory for future RDDs
>>> > that may be executed.
>>> >
>>> > If you write all your code in Java or Scala, its execution will be
>>> > performed in JVMs distributed in the cluster. On the other hand, Spark
>>> > does not limit its use to only Java based languages so Python can be
>>> > used.
>>> > In the case of Python the PySpark library is used. When Python is
>>> > used, the PySpark library can be used to define the RDDs that will be
>>> > executed under the JVM. In this scenario, only if required, the final
>>> > results of the calculations will end up being passed to Python. I say
>>> > only if required, as it's possible the end results may just be left in
>>> > memory or used to create an output, such as an HDFS file in Hadoop,
>>> > and so do not need to be transferred to Python. Under this scenario
>>> > the code is written in Python but effectively all the "real" work is
>>> > performed under the JVM.
>>> >
>>> > Often someone writing Python is also going to want to perform some of
>>> > the operations under Python. This can be done, as the RDDs that are
>>> > created can contain both operations that get performed under the JVM
>>> > as well as Python (and of course other languages are supported). When
>>> > Python is involved Spark will start up Python VMs on the required
>>> > nodes so that the Python portions of the work can be performed. The
>>> > Python VMs can either be CPython, PyPy or even a mix of both CPython
>>> > and PyPy. The downside to using non-Java languages is the overhead of
>>> > passing data between the JVM and the Python VM, as the memory is not
>>> > shared between the processes but instead copied/serialized between
>>> > them.
>>> >
>>> > Because this data is copied between the 2 VMs, anyone who writes
>>> > Python code for this environment always has to be conscious of the
>>> > data being copied between the processes, so as to not let the amount
>>> > of the extra overhead become a large burden. Quite often the goal will
>>> > be to first perform the bulk of the operations under the JVM and then
>>> > hopefully only a smaller subset of the data will have to be processed
>>> > under Python.
>>> > If this can be done then the overhead can be minimized and then there
>>> > are essentially no downsides to using Python in the pipeline of
>>> > operations.
>>> >
>>> > If you're unfortunate and need to perform some of the processing early
>>> > in the pipeline under Python, and worse yet if there is a need to go
>>> > back and forth many times between Python and Java, the overhead of
>>> > copying huge amounts of data can significantly slow things down, which
>>> > essentially puts Python at a disadvantage to Java.
>>> >
>>> > If it were possible to change the model of execution so that the
>>> > Python VM could be embedded in the JVM, or vice versa, and the memory
>>> > could be shared between the 2 VMs, the downside of using Python in
>>> > this environment would be eliminated or at the very least minimized to
>>> > the point where it is no longer an issue. Thus the need for a jffi
>>> > library.
>>> >
>>> > There is a strong desire by many to use dynamic languages in these
>>> > clustered environments and Python is likely in the best position to
>>> > become the language of choice due to its ability to work with C based
>>> > libraries and of course its syntax. The issues that hold Python back
>>> > at this point are the serialization overhead, the not so great state
>>> > of packaging, and not having both the speed of the JIT and complete
>>> > access to the numpy/scipy ecosystem.
>>> >
>>> > Luckily for Python, at this point there is no other dynamic language
>>> > that is a clear winner today. But if too much time passes before these
>>> > issues are solved I'm sure another language will step up to the plate.
>>> > At this point my expectation is that Node could likely make a move.
>>> > It already has the speed due to the JavaScript JITs, it already has a
>>> > great story for packaging and deployment, and its growth is exploding
>>> > on the server side due to all the money being poured into it. What it
>>> > strongly lacks today is the connection to C/legacy code and
>>> > numerical/scientific modules, and of course it does not have a
>>> > solution to the data copying overhead it also has with the JVM.
>>> >
>>> > Anyway, this is just my 2 cents on what is currently holding Python
>>> > back from taking off in this space.
>>> >
>>> > On Thu, Mar 24, 2016 at 2:32 AM, Hakan Ardo <hakan.a...@gmail.com> wrote:
>>> >>
>>> >> On Mar 23, 2016 21:49, "Armin Rigo" <ar...@tunes.org> wrote:
>>> >> >
>>> >> > Hi John,
>>> >> >
>>> >> > On 23 March 2016 at 19:16, John Camara <john.m.cam...@gmail.com> wrote:
>>> >> > > I would like to suggest one more topic for the workshop. I see a
>>> >> > > big need for a library (jffi) similar to cffi but that provides a
>>> >> > > bridge to Java instead of C code. The ability to seamlessly work
>>> >> > > with native Java data/code would offer a huge improvement (...)
>>> >> >
>>> >> > Isn't it what JPype does? Can you describe how it isn't suitable for
>>> >> > your needs?
>>> >>
>>> >> There is also PyJNIus:
>>> >>
>>> >> https://pyjnius.readthedocs.org/en/latest/
>>> >
>>> > _______________________________________________
>>> > pypy-dev mailing list
>>> > pypy-dev@python.org
>>> > https://mail.python.org/mailman/listinfo/pypy-dev
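
The copy-versus-share distinction discussed in the thread (data copied/serialized between the JVM and the Python VM, as opposed to memory shared between them) can be illustrated in miniature with the Python standard library alone. This is only a rough single-process analogy, not how Spark, Py4J, or a hypothetical jffi would work: pickle stands in for the serialization path and memoryview stands in for shared memory, and the function names are made up for the illustration.

```python
import pickle
from array import array


def copy_via_serialization(values):
    """Stand-in for the copy/serialize path: the receiver gets its own copy."""
    payload = pickle.dumps(values)   # "sender" serializes the data to bytes
    return pickle.loads(payload)     # "receiver" rebuilds it in new memory


def share_via_buffer(values):
    """Stand-in for the shared-memory path: the receiver gets a view, no copy."""
    return memoryview(values)


data = array("d", [1.0, 2.0, 3.0])

copied = copy_via_serialization(data)
shared = share_via_buffer(data)

# The "producer" updates the buffer in place after handing it over.
data[0] = 99.0

copied_first = copied[0]   # detached copy: does not see the update
shared_first = shared[0]   # same underlying memory: sees the update
```

With the serialized copy, every hand-off pays for a full duplicate of the data, which is the overhead described above; with the view, both sides read the same bytes, which is what sharing memory between the two VMs would aim for.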