Github user staslos commented on the pull request:
https://github.com/apache/spark/pull/4434#issuecomment-103645571
We've been using both Spark Core and Spark SQL for over 6 months now.
We're certainly not experts here, but we've found that Spark Core better suits
our data pipeline (as a Pig replacement), while Spark SQL is more of an
analytical tool. When it comes to moving and transforming data, we prefer Spark
Core to Spark SQL's 'magic' because Spark Core is more stable, gives us more
control over the process, and gives us more confidence.
Also, the last time I checked Spark SQL, I couldn't achieve proper Avro
schema evolution, which is absolutely critical for our data pipeline since it
deals with different versions of the same data. The ability to provide both a
reader and a writer schema is priceless, and I couldn't find a way to do this in
Spark SQL. Our data scientists have to use projection in Spark SQL to be able to
read across different versions of the data. Luckily for them, they don't need to
use all the fields and pass them down the pipeline.
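For context, here is a rough sketch of what I mean by supplying an explicit
reader schema with Spark Core and the plain Avro MapReduce input format. The
path and the schema are made up for illustration; the point is only that the
reader schema is attached to the Hadoop job configuration, so Avro resolves
every file against it regardless of the writer schema it was written with:

```scala
import org.apache.avro.Schema
import org.apache.avro.generic.GenericRecord
import org.apache.avro.mapred.AvroKey
import org.apache.avro.mapreduce.{AvroJob, AvroKeyInputFormat}
import org.apache.hadoop.io.NullWritable
import org.apache.hadoop.mapreduce.Job
import org.apache.spark.{SparkConf, SparkContext}

object AvroReaderSchemaSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("avro-reader-schema"))

    // Hypothetical reader schema; in practice it would come from a file or a
    // schema registry. Fields missing in older writer schemas need defaults.
    val readerSchema = new Schema.Parser().parse(
      """{"type":"record","name":"Event","fields":[
        |  {"name":"id","type":"long"},
        |  {"name":"source","type":"string","default":"unknown"}
        |]}""".stripMargin)

    // Attach the reader schema to the job configuration so Avro performs
    // schema resolution against it when decoding each record.
    val job = Job.getInstance()
    AvroJob.setInputKeySchema(job, readerSchema)

    val records = sc.newAPIHadoopFile(
      "s3n://some-bucket/events/*.avro",            // hypothetical input path
      classOf[AvroKeyInputFormat[GenericRecord]],
      classOf[AvroKey[GenericRecord]],
      classOf[NullWritable],
      job.getConfiguration)

    // Only the fields declared in the reader schema are visible downstream.
    records.map { case (key, _) => key.datum().get("id") }
      .take(10)
      .foreach(println)

    sc.stop()
  }
}
```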
Also, correct me if I'm wrong, but Spark SQL is not production ready yet. Our
latest upgrade from Spark 1.2.0 to 1.3.0 proved we were right to stick with
Spark Core, at least for now, while our data scientists were going mad because
their Spark SQL scripts stopped working with S3.
Anyway, thank you, guys, for doing a great job. Feel free to toss this
pull request; I was just thinking back in February that it could be useful for
other people facing the same problem.