Dear Spark Users,

GraphFrames 0.9.2 is out on PyPI as graphframes-py <https://pypi.org/project/graphframes-py/> and on Maven Sonatype Central <https://central.sonatype.com/search?q=io.graphframes> as io.graphframes <https://central.sonatype.com/namespace/io.graphframes>! Documentation is now available on graphframes.io… and we even have a new logo!

[Image: the new GraphFrames logo, new for this release :)]

GraphFrames is BACK!

You can see below that GraphFrames is back! It has seen contributions every week for most of the year, and we have half a dozen active contributors now. This release is due to the efforts of many people, but I need to express our deep gratitude to Sem Sinchenko <https://www.linkedin.com/in/semyon-a-sinchenko/>, who drove this release.

[Image: GraphFrames is back! Semyon Sinchenko deserves the appreciation and respect of all GraphFrames users :)]

The project has gone from *effectively dead* to *vibrant* in the six months since GraphX was deprecated <https://lists.apache.org/thread/qrvo6xrt8zvp5ss73z5spt9q89r0htwo> from Spark, which prompted us to get to work on an all-DataFrame replacement. The chart below shows that contributions are more frequent now than at any time since the project’s inception!

[Chart: After a six-year gap in additions, GraphFrames is back with Spark Connect support!]

New Features in GraphFrames 0.9.2

It was necessary for GraphFrames to support both Spark 4 and Spark Connect to remain integral to the Spark community.

There were many issues resolved <https://github.com/graphframes/graphframes/releases/tag/v0.9.0> in the release, but the core of it was:

- Spark Connect support <https://github.com/graphframes/graphframes/pull/506>
- Spark 4.x support <https://github.com/graphframes/graphframes/pull/608>
- Performance improvements in Connected Components <https://github.com/graphframes/graphframes/pull/552>
- Updated API for Pregel <https://github.com/graphframes/graphframes/issues?q=is%3Aissue+state%3Aclosed+Pregel>
- DataFrame implementation of LabelPropagation <https://graphframes.io/api/scala/org/graphframes/lib/LabelPropagation.html>, GraphX-free
- DataFrame implementation of ShortestPaths <https://graphframes.io/api/scala/org/graphframes/lib/ShortestPaths.html>, GraphX-free
- New groupId <https://central.sonatype.com/namespace/io.graphframes>: io.graphframes
- New PyPI ID <https://pypi.org/project/graphframes-py/>: graphframes-py
- A new website, graphframes.io, with updated documentation <https://graphframes.io/>
- A new Network Motif Finding Tutorial <https://graphframes.io/motif-tutorial.html>
- A lot of additional changes and fixes <https://github.com/graphframes/graphframes/releases/tag/v0.9.0>

State of the Union

The GraphFrames community has achieved our first goal: make the project viable again!

Still in the future?

Property Graphs

Sem has started implementing <https://github.com/graphframes/graphframes/pull/613> Property Graphs for GraphFrames. GraphFrames currently has a relationship field for edges but no type for nodes. In current practice, this means property graph processing requires you to merge all your node schemas together into a kitchen sink schema before using GraphFrames’ algorithms. It is a real drag… property graphs will be a huge improvement! Sem recently outlined a beautiful vision <https://semyonsinchenko.github.io/ssinchenko/post/dreams-about-graph-in-lakehouse/> for property graphs as part of the Open Lakehouse.
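
For anyone who has not hit the kitchen sink problem yet, here is a minimal sketch of what that merge looks like. It uses plain Python dictionaries instead of DataFrames so it runs anywhere; the sample data, the merge_node_schemas helper and the node_type column are all hypothetical names made up for illustration, not part of GraphFrames.

```python
# Two hypothetical node sets with different schemas (illustrative data only).
people = [{"id": "p1", "name": "Alice", "age": 34}]
companies = [{"id": "c1", "name": "Acme", "industry": "tools"}]


def merge_node_schemas(node_sets):
    """Union every node schema into one wide schema, filling gaps with None.

    node_sets maps a node type name to a list of row dictionaries.
    """
    # Collect every column name across all node sets, in first-seen order.
    columns = []
    for rows in node_sets.values():
        for row in rows:
            for key in row:
                if key not in columns:
                    columns.append(key)

    merged = []
    for node_type, rows in node_sets.items():
        for row in rows:
            # Columns a node type does not have become None (null).
            wide = {col: row.get(col) for col in columns}
            # A hand-rolled type tag, since nodes carry no type of their own.
            wide["node_type"] = node_type
            merged.append(wide)
    return merged


vertices = merge_node_schemas({"person": people, "company": companies})
for vertex in vertices:
    print(vertex)
```

Every node ends up carrying every other node type's columns as nulls, which is exactly the drag described above. With real DataFrames, PySpark's DataFrame.unionByName(other, allowMissingColumns=True) performs a comparable null-filling union. Sem's post linked above describes where property graphs could take this.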
Check it out!

Inclusion in Spark

This is actively debated: it would be a lot of trouble to release with Spark, but based on the number of search hits for GraphX <https://www.google.com/search?q=GraphX> versus GraphFrames <https://www.google.com/search?q=GraphFrames>, it would get us 10x as many users. When I put it that way, GraphFrames in Spark sounds pretty good!

GraphX is Deprecated

Spark deprecating GraphX was the call to action that led us to revive GraphFrames, and we heard it well. We’re building DataFrame implementations of all GraphX components. GraphX has already been removed from <https://github.com/graphframes/graphframes/pull/587> ShortestPaths <https://graphframes.io/api/python/graphframes.html#graphframes.GraphFrame.shortestPaths> and from <https://github.com/graphframes/graphframes/pull/558> LabelPropagation <https://graphframes.io/api/python/graphframes.html#graphframes.GraphFrame.labelPropagation>. The rest of the work is being tracked here <https://github.com/graphframes/graphframes/issues/556> and is underway. GraphX will be deprecated from GraphFrames as of 1.0. GraphFrames 2.0 will remove GraphX completely. Soon GraphFrames will be entirely built on DataFrames!

The Sedona Alliance!

Developers from Apache Sedona <https://sedona.apache.org/latest/> joined the development of GraphFrames 0.9. Sedona 1.8.0 will depend on <https://github.com/apache/sedona/pull/2098> the new version. They’ve been a huge help! James Willis <https://www.linkedin.com/in/james-willis/>, Adam Binford <https://www.linkedin.com/in/adam-binford-a10b0321/> and the Apache Sedona team gave us new configurations, helped us fix our CI to enable the 0.9 release, and drove Spark 4 support. James Willis became an official maintainer of GraphFrames to coordinate efforts between these projects.

New Contributors

We have a lot of new contributors for this release!

- @bjornjorgensen <https://github.com/bjornjorgensen> made their first contribution in #471 <https://github.com/graphframes/graphframes/pull/471>
- @Nassizouz <https://github.com/Nassizouz> made their first contribution in #474 <https://github.com/graphframes/graphframes/pull/474>
- @SauronShepherd <https://github.com/SauronShepherd> made their first contribution in #495 <https://github.com/graphframes/graphframes/pull/495>
- @SemyonSinchenko <https://github.com/SemyonSinchenko> made their first contribution in #487 <https://github.com/graphframes/graphframes/pull/487>
- @dmatrix <https://github.com/dmatrix> made their first contribution in #535 <https://github.com/graphframes/graphframes/pull/535>
- @architch <https://github.com/architch> made their first contribution in #320 <https://github.com/graphframes/graphframes/pull/320>
- @james-willis <https://github.com/james-willis> made their first contribution in #563 <https://github.com/graphframes/graphframes/pull/563>
- @dependabot <https://github.com/dependabot>[bot] made their first contribution in #596 <https://github.com/graphframes/graphframes/pull/596>
- @Conor0Callaghan <https://github.com/Conor0Callaghan> made their first contribution in #592 <https://github.com/graphframes/graphframes/pull/592>
- @Kimahriman <https://github.com/Kimahriman> made their first contribution in #608 <https://github.com/graphframes/graphframes/pull/608>

A Call for Help

We are building a list of dependent projects <https://github.com/graphframes/graphframes/discussions/616>, so if you use GraphFrames, please let us know! We want your help testing new versions before the release. Got questions or concerns? Let us know what you think! Find us on Discord in #graphframes on GraphGeeks <https://discord.com/channels/1162999022819225631/1326257052368113674>, or join the GraphFrames Google Group <https://groups.google.com/g/graphframes/>.

Note: this update originally appeared at https://blog.graphlet.ai/graphframes-is-back-with-v0-9-2-5773d55d3291

Thanks,
Russell Jurney | rjur...@graphlet.ai | graphlet.ai | Graphlet AI Blog <https://blog.graphlet.ai/> | LinkedIn <https://linkedin.com/in/russelljurney> | BlueSky <https://bsky.app/profile/rjurney.bsky.social>