Github user koeninger commented on the pull request:

    https://github.com/apache/spark/pull/3798#issuecomment-69754083
  
    1.  Yes, I removed the ivy and maven caches, verified the example app failed
    to locate the dependency, re-published from the spark dev version, and
    verified the example app now found it.
    
    2.  Yes, I've tried spark-streaming both provided and included in the
    assembly.  The real issue is probably that spark-streaming-kafka can't be
    marked as provided and must be included in the assembly (more on this in a
    second; see the build sketch after this list).
    
    3. Yes, it's spark-submit to an instance of spark running out of the same
    spark dev version.
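
    For reference, the example app's build is set up roughly like this (module
    names and versions here are illustrative, not the exact build file I'm
    using):

    ```scala
    // spark-streaming can stay "provided" since it's in the Spark assembly on
    // the cluster, but spark-streaming-kafka has to go into the app's uber jar
    // because it isn't part of the Spark distribution.
    libraryDependencies ++= Seq(
      "org.apache.spark" %% "spark-core"            % "1.3.0-SNAPSHOT" % "provided",
      "org.apache.spark" %% "spark-streaming"       % "1.3.0-SNAPSHOT" % "provided",
      "org.apache.spark" %% "spark-streaming-kafka" % "1.3.0-SNAPSHOT"
    )
    ```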
    
    So like I was saying about #2, the class that is failing to load,
     KafkaRDDPartition, is in the spark-streaming-kafka jar, not spark or
    spark-streaming, so it's not available by default.  It clearly will end up
    on the classpath when it's included in an application jar, because the
    committed working version of the code that checkpoints tuples can
    successfully convert to KafkaRDDPartition in restore().  It's just not
    available in the classloader that's reading the checkpoint.  Further
    evidence of this is that if I move only KafkaRDDPartition into the
    spark-streaming artifact, KafkaRDDPartition can be successfully read from
    the checkpoint.
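
    A quick way to see the distinction is to check whether the class is visible
    from the two classloaders involved (just a diagnostic sketch, not code from
    this PR; the fully-qualified name is what I'd expect for the external kafka
    module):

    ```scala
    // Returns true if the given classloader can see the named class.
    def canLoad(name: String, loader: ClassLoader): Boolean =
      try { Class.forName(name, false, loader); true }
      catch { case _: ClassNotFoundException => false }

    val name = "org.apache.spark.streaming.kafka.KafkaRDDPartition"
    // Visible from the app's context classloader (the uber jar is on it) ...
    canLoad(name, Thread.currentThread().getContextClassLoader)
    // ... but not from a loader that only has the Spark distribution jars,
    // which matches the failure I'm seeing when the checkpoint is read.
    canLoad(name, ClassLoader.getSystemClassLoader)
    ```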
    
    KafkaRDDPartition doesn't actually have any dependencies on anything other
    than Partition, so moving it into spark-streaming might be a solution.
    Your call on whether you think that's uglier than saving to/from tuples,
    or whether you want to dig further into the classloader issue.
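
    For what it's worth, the class itself is tiny; it looks roughly like this
    (exact fields from memory, so treat it as a sketch rather than the real
    definition):

    ```scala
    import org.apache.spark.Partition

    // Rough shape of KafkaRDDPartition: it only touches Partition, so moving
    // it into spark-streaming wouldn't drag any kafka dependency along.
    class KafkaRDDPartition(
        val index: Int,
        val topic: String,
        val partition: Int,
        val fromOffset: Long,
        val untilOffset: Long
      ) extends Partition
    ```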
    
    On Mon, Jan 12, 2015 at 10:53 PM, Tathagata Das <[email protected]>
    wrote:
    
    > Can you confirm the following.
    > 1. In the SBT/maven app you use for testing, are you compiling against
    > your development Spark version? That is, the dev version is locally
    > published and you are compiling your app against Spark version
    > 1.3.0-SNAPSHOT?
    > 2. Do you have the spark-streaming dependency in "provided" scope or the
    > default "compile" scope? And are you then creating an uber jar of the app?
    > 3. Are you submitting the app through spark-submit to the same development
    > Spark version you compiled against?
    >
    > On Mon, Jan 12, 2015 at 2:13 PM, Cody Koeninger <[email protected]>
    > wrote:
    >
    > > Yeah, this is on a local development version, after assembly / publish
    > > local.
    > >
    > > Here's a gist of the exception and the diff that causes it (using
    > > KafkaRDDPartition instead of a tuple)
    > >
    > > https://gist.github.com/koeninger/561a61482cd1b5b3600c
    > >
    > > —
    > > Reply to this email directly or view it on GitHub
    > > <https://github.com/apache/spark/pull/3798#issuecomment-69656800>.
    > >
    >
    > —
    > Reply to this email directly or view it on GitHub
    > <https://github.com/apache/spark/pull/3798#issuecomment-69695353>.
    >

