Github user MLnick commented on the pull request:
https://github.com/apache/spark/pull/455#issuecomment-45432289
As an aside, I am generally -1 on adding a lot of format-specific read/write
code to Spark core.
My view is that this is exactly why the generic InputFormat/OutputFormat support
is there - to provide that custom read/write functionality. It makes sense for
something like Parquet with SparkSQL as the preferred format for efficiency (in
much the same way that SequenceFiles are often the preferred format in many
Hadoop pipelines), but should Spark core contain a standardised .saveAsXXXFile
method for every format? IMO, no - the examples already show how to work with
the common formats, and the generic hooks cover the rest (see the sketch below).
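To make that concrete, here is a rough sketch of what I mean by leaning on the generic hooks - purely illustrative, with placeholder paths, assuming the Spark 1.x pair-RDD implicits:

```scala
import org.apache.hadoop.io.{LongWritable, Text}
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat
import org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.SparkContext._  // pair-RDD implicits on Spark 1.x

object GenericHadoopFormats {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("generic-hadoop-formats"))

    // Any InputFormat can already be read through the generic hook
    // (paths here are placeholders).
    val lines = sc.newAPIHadoopFile[LongWritable, Text, TextInputFormat]("hdfs:///tmp/in")

    // ...and any OutputFormat can be written the same way, without Spark core
    // growing a dedicated saveAsXXXFile method for each format.
    lines.saveAsNewAPIHadoopFile[SequenceFileOutputFormat[LongWritable, Text]]("hdfs:///tmp/out")

    sc.stop()
  }
}
```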
I can see why providing contrib modules for reading/writing structured
(RDBMS-like) data via common formats for SparkSQL makes sense, as there will
probably be one "correct" way of doing this.
But looking at the HBase PR you referenced, I don't see the value of having
that live in Spark. And why does it not simply use an ```OutputFormat```
instead of custom config and write code? (I might be missing something here,
but it seems to add complexity and maintenance burden unnecessarily.)
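For HBase specifically, my mental model is roughly the following - a sketch only, using the stock ```TableOutputFormat``` with a placeholder table and column family and the HBase client API of that era (```Put.add``` etc.), not anything from the PR itself:

```scala
import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.client.Put
import org.apache.hadoop.hbase.io.ImmutableBytesWritable
import org.apache.hadoop.hbase.mapreduce.TableOutputFormat
import org.apache.hadoop.hbase.util.Bytes
import org.apache.hadoop.mapreduce.Job
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.SparkContext._  // pair-RDD implicits on Spark 1.x

object HBaseViaOutputFormat {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("hbase-via-outputformat"))

    // Plain Hadoop/HBase job config - the table name is a placeholder;
    // connection details come from hbase-site.xml on the classpath.
    val conf = HBaseConfiguration.create()
    conf.set(TableOutputFormat.OUTPUT_TABLE, "my_table")
    val job = Job.getInstance(conf)
    job.setOutputFormatClass(classOf[TableOutputFormat[ImmutableBytesWritable]])

    // Map ordinary records to (row key, Put) pairs and hand them to the
    // existing generic save hook - no HBase-specific method in Spark core.
    sc.parallelize(Seq(("row1", "value1"), ("row2", "value2")))
      .map { case (rowKey, value) =>
        val put = new Put(Bytes.toBytes(rowKey))
        put.add(Bytes.toBytes("cf"), Bytes.toBytes("col"), Bytes.toBytes(value))
        (new ImmutableBytesWritable(Bytes.toBytes(rowKey)), put)
      }
      .saveAsNewAPIHadoopDataset(job.getConfiguration)

    sc.stop()
  }
}
```

If the PR is essentially doing the above plus convenience config plumbing, that feels like it belongs in an example or an external module rather than in core.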