Github user tigerquoll commented on the pull request:
https://github.com/apache/spark/pull/5250#issuecomment-87681300
If a user can write Scala code that appropriately deals with the problem,
why can't they write Spark code to deal with it in parallel? Isn't this what
Spark is about? Isn't this a problem that can be readily parallelised? Spark is
being put forward as a data-processing framework; bad data needs to be handled
in some way better than just refusing to have anything to do with it.
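To make that concrete, here is a minimal sketch of the kind of user-level workaround I have in mind, assuming the bad data takes the form of malformed lines in a text input; `parseRecord` and the input path are hypothetical stand-ins:

```scala
import scala.util.Try

// Hypothetical per-record parser; any line that doesn't fit the expected
// "key,intValue" shape will throw, standing in for arbitrary bad data.
def parseRecord(line: String): (String, Int) = {
  val Array(k, v) = line.split(",")
  (k, v.trim.toInt)
}

// Assumes an existing SparkContext `sc`. Bad lines are dropped in parallel
// rather than failing the whole job: Try(...).toOption turns a parse failure
// into None, and flatMap discards the Nones.
val parsed = sc.textFile("hdfs:///data/input")
  .flatMap(line => Try(parseRecord(line)).toOption)
```

This works, but it has to be re-invented for every input format, which is why some supported way to skip or collect bad records would be preferable.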
I believe that parallelising the solution you mentioned means adding to the
public API, which takes time and consideration. The option was conceived as a
scoped, quick-fix solution to at least give users some ability to continue;
the idea would be to retire the option once a new API was in place to deal
with the problem gracefully.
Regarding the option being "presented to the users as a fine thing to
do when I don't believe it is": how about providing the information to the
users and letting them choose for themselves? You make a good point about an
option being a public API, though. What is the understanding of how stable
options are? There are no real Experimental or DeveloperApi tags available here.
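For contrast, an illustration of what I mean by stability tags, using the annotations Spark already ships in org.apache.spark.annotation (the class below is hypothetical):

```scala
import org.apache.spark.annotation.Experimental

// An annotated API advertises its stability level to users and tooling;
// a bare config-option string has no equivalent marker.
@Experimental
class LenientLineParser
```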
The solution you propose is the same one I ended up settling on when first
confronted with the issue, but only after a number of frustrated attempts at
getting Spark to do what I wanted it to. What you proposed, and what I did in
the end, was to give up on using Spark and bash out some standalone code using
Hadoop libraries to do the job. I.e.: I stopped using Spark and used another
tool that made my job easier. I felt that it didn't have to be this way.