Github user laserson commented on the pull request:
https://github.com/apache/incubator-spark/pull/576#issuecomment-35040314
Yes, that's a much better suggestion. Thanks!
Github user velvia commented on the pull request:
https://github.com/apache/incubator-spark/pull/576#issuecomment-35039082
Uri,
What you can do is, in Scala, have an implicit conversion to your own
class, effectively extending SparkContext yourself. We do this for our own
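
The implicit-conversion approach mentioned in this comment could look roughly like the sketch below. To keep it runnable without Spark on the classpath, it uses a hypothetical `StubContext` in place of `SparkContext`. The names `RichContext`, `ContextExtensions`, and `textFileUpper` are also invented for illustration and are not part of Spark or of this pull request.

```scala
import scala.language.implicitConversions

// Hypothetical stand-in for SparkContext so the sketch runs without Spark.
class StubContext(val appName: String) {
  def textFile(path: String): Seq[String] = Seq(s"line from $path")
}

// Hypothetical wrapper class that carries the extra methods.
class RichContext(sc: StubContext) {
  // An added convenience method built on top of the wrapped context.
  def textFileUpper(path: String): Seq[String] =
    sc.textFile(path).map(_.toUpperCase)
}

object ContextExtensions {
  // With this conversion in scope, the compiler wraps a StubContext in a
  // RichContext whenever a RichContext method is called on it.
  implicit def toRichContext(sc: StubContext): RichContext = new RichContext(sc)
}

object Demo {
  import ContextExtensions._

  def main(args: Array[String]): Unit = {
    val sc = new StubContext("demo")
    // textFileUpper is not defined on StubContext; the implicit conversion supplies it.
    println(sc.textFileUpper("data.txt"))
  }
}
```

With a real `SparkContext`, the same pattern would keep format-specific helpers in user code rather than in Spark itself, which matches the direction the thread settles on.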
Github user laserson closed the pull request at:
https://github.com/apache/incubator-spark/pull/576
Github user laserson commented on the pull request:
https://github.com/apache/incubator-spark/pull/576#issuecomment-35035595
Yes, I have since thought about it more and agree that this would actually
be a bad idea. No need to add additional dependencies on other specific file
formats
Github user pwendell commented on the pull request:
https://github.com/apache/incubator-spark/pull/576#issuecomment-35034019
Hey there - this is interesting but I don't think it's something that we
should put inside of SparkContext. It's nice code as an example though for
other peopl
Github user laserson commented on the pull request:
https://github.com/apache/incubator-spark/pull/576#issuecomment-34718389
No, this actually constructs Avro `GenericRecord` objects in memory. The
problem is that if you want access to the Parquet data through PySpark, there
is no ob
Github user velvia commented on the pull request:
https://github.com/apache/incubator-spark/pull/576#issuecomment-34715532
My concern with this is that Parquet is typically used for high performance
OLAP queries, and changing it to JSON makes it much slower. Out of curiosity,
I have
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/incubator-spark/pull/576#issuecomment-34665329
Can one of the admins verify this patch?
GitHub user laserson opened a pull request:
https://github.com/apache/incubator-spark/pull/576
Added parquetFileAsJSON to read Parquet data into JSON strings
This function makes it incredibly easy to read Parquet data, especially with
PySpark. Is there any interest in this? It requ