[GitHub] spark pull request: [SPARK-4806] Streaming doc update for 1.2

JoshRosen Wed, 10 Dec 2014 17:34:33 -0800

Github user JoshRosen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/3653#discussion_r21652896
  
    --- Diff: docs/streaming-custom-receivers.md ---
    @@ -191,9 +196,68 @@ The full source code is in the example [JavaCustomReceiver.java](https://github.
     </div>
     </div>
     
    -
    -
    -### Implementing and Using a Custom Actor-based Receiver
    +## Receiver Reliability
    +As discussed in brief in the
    +[Spark Streaming Programming Guide](streaming-programming-guide.html#receiver-reliability),
    +there are two kinds of receivers based on their reliability and fault-tolerance semantics.
    +
    +1. *Reliable Receiver* - For *reliable sources* that allow sent data to be acknowledged, a
    +  *reliable receiver* correctly acknowledges to the source that the data has been received
    +  and stored in Spark reliably (that is, replicated successfully). Usually,
    +  implementing this receiver involves careful consideration of the semantics of source
    +  acknowledgements.
    +1. *Unreliable Receiver* - These are receivers for unreliable sources that do not support
    +  acknowledging. Even for reliable sources, one may implement an unreliable receiver that
    +  do not go into the complexity of acknowledging correctly.
    +
    +To implement a *reliable receiver*, you have to use `store(multiple-records)` to store data.
    +This flavour of `store` is a blocking call which returns only after all the given records have
    +been stored inside Spark. If replication is enabled receiver's configured storage level
    +(enabled by default), then this call returns after replication has completed.
    +Thus it ensures that the data is reliably stored, and the receiver can now acknowledge the
    +source appropriately. This ensures that no data is caused when the receiver fails in the middle
    +of replicating data -- the buffered data will not be acknowledged and hence will be later resent
    +by the source.
    +
    +An *unreliable receiver* does not have to implement any of this logic. It can simply receive
    +records from the source and insert them one-at-a-time using `store(single-record)`. While it does
    +not get the reliability guarantees of `store(multiple-records)`, it has the following advantages.
    +
    +- The system takes care of chunking that data into appropriate sized blocks (look for block
    +interval in the [Spark Streaming Programming Guide](streaming-programming-guide.html)).
    +- The system takes care of controlling the receiving rates if the rate limits have been specified.
    +- Because of these two, *unreliable receivers are simpler to implement than reliable receivers.
    +
    +The following table summarizes the characteristics of both types of receivers
    +
    +<table class="table">
    +<tr>
    +  <th>Receiver Type</th>
    +  <th>Characteristics</th>
    +</tr>
    +<tr>
    +  <td><b>Unreliable Receivers</b></td>
    +  <td>
    +    Simple to implement.<br>
    +    System takes care of block generation and rate control.
    +    No fault-tolerance guarantees, can loose data on receiver failure.
    --- End diff --
    
    loose -> lose


