Github user marmbrus commented on a diff in the pull request:

    https://github.com/apache/spark/pull/3070#discussion_r19757373
  
    --- Diff: mllib/pom.xml ---
    @@ -46,6 +46,11 @@
           <version>${project.version}</version>
         </dependency>
         <dependency>
    +      <groupId>org.apache.spark</groupId>
    +      <artifactId>spark-sql_${scala.binary.version}</artifactId>
    --- End diff --
    
    I think it would be pretty difficult to have a SchemaRDD that didn't at 
least depend on catalyst, and even then there would still be no way to execute 
the projections and structured data input/output that MLlib wants.  I think 
the real problem might be the naming.  Catalyst / Spark SQL core are really 
more about manipulating structured data using Spark, and we actually considered 
not even having SQL in the name (unfortunately, Spark Schema doesn't have the 
same ring to it).
    
    The SQL project has already been carefully factored into pieces to minimize 
the number of dependencies, so I believe the only additional dependency we are 
bringing in here is Parquet (which is kind of the point of this example).


