Github user tigerquoll commented on a diff in the pull request:
https://github.com/apache/spark/pull/5250#discussion_r27377569
--- Diff: core/src/main/scala/org/apache/spark/rdd/HadoopRDD.scala ---
@@ -246,6 +249,15 @@ class HadoopRDD[K, V](
} catch {
case eof: EOFException =>
finished = true
+ case e: Exception =>
--- End diff --
Having been on the receiving end of this, I know that the gzip module
throws an IOException, but unfortunately I don't know which exceptions
the Hadoop input modules throw, or whether they propagate exceptions up
from other third-party libraries. Catching such a broad exception is
mitigated by the fact that this particular option defaults to off and
should only be enabled when you are trying to parse files that you know
are corrupt. In that situation, once the option is turned on, we should
really try to finish processing the files to the best of our ability, so
I think catching 'Exception' is appropriate in this case.
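
For concreteness, here is a minimal sketch of the pattern I mean. This
is not the actual HadoopRDD patch; the names 'ignoreCorruptFiles' and
'CorruptRecordReader' are illustrative stand-ins for the proposed
option and the surrounding read loop:

    import java.io.EOFException

    // Sketch of the discussed catch pattern, not the real HadoopRDD code.
    // 'ignoreCorruptFiles' stands in for the proposed option (default off).
    object CorruptRecordReader {
      def readNext[T](ignoreCorruptFiles: Boolean)(read: => T): Option[T] = {
        try {
          Some(read)
        } catch {
          case _: EOFException =>
            // Truncated stream: stop cleanly, as the existing code does.
            None
          case e: Exception if ignoreCorruptFiles =>
            // Deliberately broad: gzip throws IOException, but other input
            // formats may surface different or wrapped exceptions. Only
            // swallow them when the user has explicitly opted in.
            System.err.println(s"Skipping corrupt record: ${e.getMessage}")
            None
        }
      }
    }

With the guard on the second case, leaving the option off preserves the
current behaviour (the exception still propagates and fails the task),
while turning it on lets the task drain what it can from a corrupt file.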