Github user tigerquoll commented on the pull request:

    https://github.com/apache/spark/pull/5250#issuecomment-87681300
  
    If a user can write Scala code that appropriately deals with the problem, 
why can't they write Spark code to deal with it in parallel? Isn't this what 
Spark is about? Isn't this a problem that can be readily parallelised? Spark is 
being put forward as a data processing framework - bad data needs to be handled 
in some way better than just refusing to have anything to do with it.
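    To illustrate the kind of parallel handling I mean (a sketch only, not a proposed API - the record parser, field layout, and input path here are all hypothetical), you can wrap per-record parsing in `scala.util.Try` and `flatMap` away the failures, so bad lines are dropped in parallel rather than failing the whole job:

    ```scala
    import scala.util.Try

    // Hypothetical parser: succeeds only for lines of the form "id,value".
    def parseRecord(line: String): (Int, String) = {
      val Array(id, value) = line.split(",", 2)
      (id.trim.toInt, value.trim)
    }

    // Spark version (sketch): malformed lines become None and are discarded.
    // val parsed = sc.textFile("hdfs://.../input")
    //   .flatMap(line => Try(parseRecord(line)).toOption)

    // The same pattern on a plain collection, for reference:
    val lines = Seq("1, foo", "garbage", "2, bar")
    val parsed = lines.flatMap(line => Try(parseRecord(line)).toOption)
    // parsed == Seq((1, "foo"), (2, "bar"))
    ```

    The same shape also lets you route the failures somewhere (e.g. collect the `Failure` cases to a side output) instead of silently dropping them.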
    
    I believe parallelising the solution you mentioned means adding to the 
public API, which takes time and consideration.  The option was intended as a 
scoped, quick-fix solution to at least give users some ability to continue - 
the idea would be to retire the option once a new API was in place to 
gracefully deal with the problem.
    
    In regards to the option being "presented to the users as a fine thing to 
do when I don't believe it is" - how about providing the information to the 
users and letting them choose for themselves? A good point about an option 
being a public API though - what is the understanding of how stable options 
are? No real Experimental or DeveloperApi tags available here.
    
    Your proposed solution was the same solution I ended up settling on when 
first confronted with the issue - but only after a number of frustrated 
attempts at getting Spark to do what I wanted it to.  What you proposed, and 
what I did in the end, was to give up on using Spark and bash out some 
standalone code using Hadoop libraries to do the job.  I.e. I stopped using 
Spark and used another tool that made my job easier.  I felt that it didn't 
have to be this way.

