[GitHub] incubator-spark pull request: Added parquetFileAsJSON to read Parq...

2014-02-10 Thread laserson
GitHub user laserson opened a pull request: https://github.com/apache/incubator-spark/pull/576 Added parquetFileAsJSON to read Parquet data into JSON strings This function makes it incredibly easy to read Parquet data, especially with PySpark. Is there any interest in this? It

[GitHub] incubator-spark pull request: Added parquetFileAsJSON to read Parq...

2014-02-10 Thread laserson
GitHub user laserson commented on the pull request: https://github.com/apache/incubator-spark/pull/576#issuecomment-34718389 No, this actually constructs Avro `GenericRecord` objects in memory. The problem is that if you want access to the Parquet data through PySpark, there is no

[GitHub] incubator-spark pull request: Added parquetFileAsJSON to read Parq...

2014-02-13 Thread laserson
GitHub user laserson commented on the pull request: https://github.com/apache/incubator-spark/pull/576#issuecomment-35035595 Yes, I have since thought about it more and agree that this would actually be a bad idea. No need to add additional dependencies on other specific file

[GitHub] incubator-spark pull request: Added parquetFileAsJSON to read Parq...

2014-02-13 Thread laserson
GitHub user laserson closed the pull request at: https://github.com/apache/incubator-spark/pull/576

[GitHub] incubator-spark pull request: Added parquetFileAsJSON to read Parq...

2014-02-13 Thread laserson
GitHub user laserson commented on the pull request: https://github.com/apache/incubator-spark/pull/576#issuecomment-35040314 Yes, that's a much better suggestion. Thanks!

Installing PySpark on a local machine

2013-12-22 Thread Uri Laserson
I'm happy to contribute these, but want to hear what the preferred method is first. Uri -- Uri Laserson, PhD Data Scientist, Cloudera Twitter/GitHub: @laserson +1 617 910 0447 laser...@cloudera.com