Patrick Wendell created SPARK-2709:
--------------------------------------

             Summary: Add a tool for certifying Spark API compatibility
                 Key: SPARK-2709
                 URL: https://issues.apache.org/jira/browse/SPARK-2709
             Project: Spark
          Issue Type: New Feature
          Components: Spark Core
            Reporter: Patrick Wendell
            Assignee: Prashant Sharma


As Spark is packaged by more and more distributors, it would be good to have a 
tool that verifies the API compatibility of a provided Spark package. The tool 
would certify that a vendor distribution of Spark contains all of the APIs 
present in a particular upstream Spark version.

This will help vendors make sure they remain "API compliant" when they make 
changes or backports to Spark. It will also discourage vendors from knowingly 
breaking APIs, because anyone can audit their distribution and see whether they 
have removed support for certain APIs.

I'm hoping a tool like this will avoid API fragmentation in the Spark community.

One "poor man's" implementation of this is that a vendor can just run the 
binary compatibility checks in the Spark build against an upstream version of 
Spark. That's a pretty good start, but it means a third party cannot 
independently audit a distribution.
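
As a rough sketch of that "poor man's" option, a vendor could point MiMa at the 
upstream artifact from their own build. This is only an illustration, assuming 
the sbt MiMa plugin's mimaPreviousArtifacts / mimaReportBinaryIssues keys; the 
version numbers are placeholders, not the actual Spark build wiring:

{code}
// project/plugins.sbt -- pull in the MiMa sbt plugin (version is a placeholder)
addSbtPlugin("com.typesafe" % "sbt-mima-plugin" % "1.1.3")

// build.sbt -- compare the vendor build of spark-core against the upstream artifact
mimaPreviousArtifacts := Set("org.apache.spark" %% "spark-core" % "1.0.2")
{code}

Running "sbt mimaReportBinaryIssues" would then fail the build if any upstream 
binary signature is missing from the vendor package.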

Another approach would be something that lets anyone audit a distribution even 
if they don't have access to the packaging or source code. That would look 
something like this:

1. For each release we publish a manifest of all public APIs (we might borrow 
the MIMA string representation of byte code signatures)
2. We package an auditing tool as a jar file.
3. The user runs a tool with spark-submit that reflectively walks through all 
exposed Spark APIs and makes sure that everything on the manifest is 
encountered.

From the implementation side, this is just brainstorming at this point; a rough 
sketch of step 3 is included below.
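
To make step 3 concrete, here is a minimal sketch of what the audit job could 
do, assuming a hypothetical one-line-per-signature manifest of the form 
"fully.qualified.ClassName#methodName" (both the manifest format and the file 
name are made up for illustration):

{code}
import scala.io.Source

// Sketch of an audit job run via spark-submit. Each manifest line is assumed
// to be "fully.qualified.ClassName#methodName" (hypothetical format).
object ApiAudit {
  def main(args: Array[String]): Unit = {
    val manifestPath = args.headOption.getOrElse("spark-api-manifest.txt")
    val entries = Source.fromFile(manifestPath).getLines().filter(_.nonEmpty).toList

    val missing = entries.filterNot { entry =>
      try {
        val Array(className, methodName) = entry.split("#", 2)
        // Resolve the class on the distribution's classpath and check that a
        // public method with the expected name is still exposed.
        val cls = Class.forName(className)
        cls.getMethods.exists(_.getName == methodName)
      } catch {
        case _: Throwable => false // class missing or not loadable
      }
    }

    if (missing.isEmpty) {
      println(s"All ${entries.size} manifest entries found; distribution looks API compliant.")
    } else {
      println(s"${missing.size} manifest entries are missing:")
      missing.foreach(e => println("  " + e))
      sys.exit(1)
    }
  }
}
{code}

A real tool would need to compare full method signatures (argument and return 
types) rather than just names, which is where the MIMA signature strings from 
step 1 would come in.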




