Github user JoshRosen commented on the pull request:

    https://github.com/apache/spark/pull/4205#issuecomment-71526760
  
    I left an initial pass of comments.  I haven't really dug into the details 
very much yet, but a couple of high-level comments:
    
    - There's a lot of code duplication in the Python code that creates the 
Java RDDs, so it would be nice to see if there's a way to refactor the code to 
remove this duplication.  My concern here is largely around future 
maintainability, since I'm worried that we'll see the copies of the code 
diverge when people make changes without being aware of the duplicate copies.
    - I'd like to avoid repeating the `Java*Like` pattern, since it doesn't 
look necessary here and it has caused problems in the past: see 
https://issues.scala-lang.org/browse/SI-8905 and 
https://issues.apache.org/jira/browse/SPARK-3266.
    
    Now that we're increasingly seeing Spark libraries being written in one JVM 
language and used from another (e.g. a Spark library written against the Java 
API and called from Scala), it might be nice to try to extend GraphX's Scala 
API to expose Java-friendly methods instead of adding a new Java API.  This is 
a major departure from how we've handled Java APIs up until now, but it might 
be a better long-term decision for new code.  I think @rxin may be able to 
chime in here with more details.  GraphX might be a nice context to explore 
this idea since it's a much smaller API than Spark as a whole.
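    As a rough illustration of the "Java-friendly methods on the Scala API" idea, here is a minimal sketch under stated assumptions. `VertexTable`, `mapValues`, `mapValuesJ`, and `valuesAsJavaList` are hypothetical names for illustration only, not GraphX's actual API. The point is that a single Scala class carries both the idiomatic Scala method and a variant that takes a `java.util.function.Function` and returns a `java.util.List`, with no separate `Java*` wrapper class and no shared `*Like` trait. The Java-friendly variant gets a distinct name here (an assumption, not an established convention) because overloading it against the Scala-function version can make lambda arguments ambiguous for Scala callers.

```scala
import java.util.{ArrayList => JArrayList, List => JList}
import java.util.function.{Function => JFunction}

// Hypothetical, simplified stand-in for a graph's vertex collection.
// This is NOT GraphX's API; it only illustrates the API shape.
final class VertexTable[VD](val vertices: Seq[(Long, VD)]) {

  // Scala-friendly: takes a Scala function value.
  def mapValues[VD2](f: VD => VD2): VertexTable[VD2] =
    new VertexTable(vertices.map { case (id, v) => (id, f(v)) })

  // Java-friendly variant defined on the same class: accepts a Java
  // functional interface, so Java 8 callers can pass a lambda directly.
  // It delegates to the Scala method, so there is one implementation.
  def mapValuesJ[VD2](f: JFunction[VD, VD2]): VertexTable[VD2] =
    mapValues((v: VD) => f.apply(v))

  // Java-friendly accessor: copies the vertex values into a java.util.List.
  def valuesAsJavaList: JList[VD] = {
    val out = new JArrayList[VD](vertices.size)
    vertices.foreach { case (_, v) => out.add(v) }
    out
  }
}
```

    From Java, a caller could then write something like `table.mapValuesJ(s -> s.toUpperCase())` without going through a parallel `JavaVertexTable` class, while Scala callers keep using `mapValues`.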

