[GitHub] incubator-spark pull request: Added parquetFileAsJSON to read Parq...

2014-02-13 Thread laserson
Github user laserson commented on the pull request: https://github.com/apache/incubator-spark/pull/576#issuecomment-35040314 Yes, that's a much better suggestion. Thanks!

2014-02-13 Thread velvia
Github user velvia commented on the pull request: https://github.com/apache/incubator-spark/pull/576#issuecomment-35039082 Uri, what you can do, in Scala, is have an implicit conversion to your own class, effectively extending SparkContext yourself. We do this for our own
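The comment above is cut off, so what follows is only a minimal sketch of the general Scala pattern it describes, not velvia's actual code. To keep it runnable without Spark on the classpath, `FakeContext` stands in for `SparkContext`; `RichContext`, `toRichContext`, and `fileAsJSON` are hypothetical names invented for illustration.

```scala
import scala.language.implicitConversions

// Hypothetical stand-in for SparkContext so the sketch runs without Spark.
class FakeContext {
  // Pretends to read a file and returns one record per line.
  def textFile(path: String): Seq[String] = Seq(s"line from $path")
}

object ContextExtensions {
  // Wrapper class that carries the extra method the base class lacks.
  class RichContext(sc: FakeContext) {
    // Hypothetical helper: wraps each record in a tiny JSON object.
    def fileAsJSON(path: String): Seq[String] =
      sc.textFile(path).map(line => "{\"value\": \"" + line + "\"}")
  }

  // The implicit conversion: wherever this is in scope, a FakeContext
  // can be used as if it had RichContext's methods.
  implicit def toRichContext(sc: FakeContext): RichContext = new RichContext(sc)
}

object Demo {
  import ContextExtensions._

  // fileAsJSON is not defined on FakeContext; the compiler inserts
  // toRichContext(...) to make this call type-check.
  def run(): Seq[String] = new FakeContext().fileAsJSON("data.parquet")
}
```

With this approach the extra method lives in user code and is only visible where `ContextExtensions._` is imported, so nothing has to be added to the context class itself.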

2014-02-13 Thread laserson
Github user laserson closed the pull request at: https://github.com/apache/incubator-spark/pull/576

2014-02-13 Thread laserson
Github user laserson commented on the pull request: https://github.com/apache/incubator-spark/pull/576#issuecomment-35035595 Yes, I have since thought about it more and agree that this would actually be a bad idea. No need to add additional dependencies on other specific file formats

2014-02-13 Thread pwendell
Github user pwendell commented on the pull request: https://github.com/apache/incubator-spark/pull/576#issuecomment-35034019 Hey there - this is interesting but I don't think it's something that we should put inside of SparkContext. It's nice code as an example though for other peopl

2014-02-10 Thread laserson
Github user laserson commented on the pull request: https://github.com/apache/incubator-spark/pull/576#issuecomment-34718389 No, this actually constructs Avro `GenericRecord` objects in memory. The problem is that if you want access to the Parquet data through PySpark, there is no ob

2014-02-10 Thread velvia
Github user velvia commented on the pull request: https://github.com/apache/incubator-spark/pull/576#issuecomment-34715532 My concern with this is that Parquet is typically used for high-performance OLAP queries, and converting it to JSON makes it much slower. Out of curiosity, I have

2014-02-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/incubator-spark/pull/576#issuecomment-34665329 Can one of the admins verify this patch?

2014-02-10 Thread laserson
GitHub user laserson opened a pull request: https://github.com/apache/incubator-spark/pull/576
Added parquetFileAsJSON to read Parquet data into JSON strings
This function makes it incredibly easy to read Parquet data especially with PySpark. Is there any interest in this? It requ