[GitHub] spark pull request: SPARK-1730. Make receiver store data reliably ...

harishreedharan Tue, 24 Jun 2014 00:46:31 -0700

GitHub user harishreedharan opened a pull request:

    https://github.com/apache/spark/pull/1192


    SPARK-1730. Make receiver store data reliably to avoid data-loss on executor failures.

    Added a new method in Receiver, ReceiverSupervisor, and ReceiverSupervisorImpl
    to store the data and call back a supplied function with a given argument.
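
    The store-then-callback idea described above can be sketched roughly as
    follows. This is only an illustration, not the code in this pull request:
    the method name store_reliably, the Python classes, and the in-memory list
    standing in for block storage are all assumptions. The point it shows is
    that the callback (for example, an acknowledgement to the data source)
    runs only after the data has been stored.

```python
from typing import Any, Callable, Iterable, List


class ReceiverSupervisor:
    """Hypothetical stand-in for the component that persists received data."""

    def __init__(self) -> None:
        # An in-memory list stands in for real block storage (assumption).
        self.stored_blocks: List[List[Any]] = []

    def store_reliably(
        self, records: Iterable[Any], callback: Callable[[Any], None], arg: Any
    ) -> None:
        # Store first. If storing raises, the callback is never invoked,
        # so the source is not told the data is safe.
        self.stored_blocks.append(list(records))
        callback(arg)


class Receiver:
    """Hypothetical receiver that delegates storage to its supervisor."""

    def __init__(self, supervisor: ReceiverSupervisor) -> None:
        self.supervisor = supervisor

    def store_reliably(
        self, records: Iterable[Any], callback: Callable[[Any], None], arg: Any
    ) -> None:
        self.supervisor.store_reliably(records, callback, arg)


# Usage: acknowledge a batch only once it has been stored.
acked: List[str] = []
supervisor = ReceiverSupervisor()
receiver = Receiver(supervisor)
receiver.store_reliably(["event-1", "event-2"], acked.append, "batch-42")
```

    In this sketch the supplied function is acked.append and the given
    argument is a batch identifier; a real receiver would presumably pass
    whatever it needs to acknowledge the batch to its source.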


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/harishreedharan/spark persist-data

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/1192.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1192
    
----
commit 6d6776a45f30e3594a15bda2582f99819c28a583
Author: Hari Shreedharan <hshreedha...@apache.org>
Date:   2014-05-09T06:16:56Z

    SPARK-1729. Make Flume pull data from source, rather than the current push model

    Currently Spark uses Flume's internal Avro Protocol to ingest data from
    Flume. If the executor running the receiver fails, it currently has to be
    restarted on the same node to be able to receive data.

    This commit adds a new Sink which can be deployed to a Flume agent. This
    sink can be polled by a new DStream that is also included in this commit.
    This model ensures that data can be pulled into Spark from Flume even if
    the receiver is restarted on a new node. This also allows the receiver to
    receive data on multiple threads for better performance.

commit d24d9d47795fe0a81fa2d70a4f81c24d2efd8914
Author: Hari Shreedharan <hshreedha...@apache.org>
Date:   2014-05-18T07:58:45Z

    SPARK-1729. Make Flume pull data from source, rather than the current push model

    Update to the previous patch fixing some error cases and also excluding
    Netty dependencies. Also updated the unit tests.

commit 08176adc2a1a4f17562f486e0f897abfb7eba84d
Author: Hari Shreedharan <hshreedha...@apache.org>
Date:   2014-05-18T08:06:22Z

    SPARK-1729. Make Flume pull data from source, rather than the current push model
    
    Exclude IO Netty in the Flume sink.

commit 03d6c1c45bb5e1e00ba0a3618b920481ec3ec51a
Author: Hari Shreedharan <hshreedha...@apache.org>
Date:   2014-05-19T16:24:55Z

    SPARK-1729. Make Flume pull data from source, rather than the current push model
    
    Removing previousArtifact from build spec, so that the build runs fine.

commit 8df37e4911f74253a901502c9232c3db26dc8856
Author: Hari Shreedharan <hshreedha...@apache.org>
Date:   2014-05-20T06:09:02Z

    SPARK-1729. Make Flume pull data from source, rather than the current push model
    
    Updated Maven build to be equivalent of the sbt build.

commit 87775aa52e21804680ed43dc4f789adf718ddb6c
Author: Hari Shreedharan <hshreedha...@apache.org>
Date:   2014-05-21T00:42:40Z

    SPARK-1729. Make Flume pull data from source, rather than the current push model
    
    Fix build with maven.

commit 0f10788487f10234aa39277d4c20556f7c846796
Author: Hari Shreedharan <hshreedha...@apache.org>
Date:   2014-05-24T08:32:32Z

    SPARK-1729. Make Flume pull data from source, rather than the current push model
    
    Added support for polling several Flume agents from a single receiver.

commit c604a3c0fee085679967460f50b563a8d58aedf1
Author: Hari Shreedharan <hshreedha...@apache.org>
Date:   2014-06-05T16:17:05Z

    SPARK-1729. Optimize imports.

commit 9741683173c5dad3148c77d1a0f47b92387b8bdc
Author: Hari Shreedharan <hshreedha...@apache.org>
Date:   2014-06-06T06:38:12Z

    SPARK-1729. Fixes based on review.

commit e7da5128be13130538e41fb5e976089e93f1e149
Author: Hari Shreedharan <hshreedha...@apache.org>
Date:   2014-06-06T06:43:13Z

    SPARK-1729. Fixing import order

commit d6fa3aa25e21be508c695067a858afd0d3ddbd64
Author: Hari Shreedharan <harishreedha...@gmail.com>
Date:   2014-06-10T05:27:19Z

    SPARK-1729. New Flume-Spark integration.
    
    Made the Flume Sink considerably simpler. Added a lot of documentation.

commit 70bcc2ad5b117324652e41f0331eb974ab696966
Author: Hari Shreedharan <harishreedha...@gmail.com>
Date:   2014-06-10T05:34:40Z

    SPARK-1729. New Flume-Spark integration.
    
    Renamed the SparkPollingEvent to SparkFlumePollingEvent.

commit 3c23c182fd8655e0f1a64cee64641f1cc803f7c2
Author: Hari Shreedharan <harishreedha...@gmail.com>
Date:   2014-06-10T23:20:40Z

    SPARK-1729. New Spark-Flume integration.
    
    Minor formatting changes.

commit 0d69604ae319610b9fde1b3a77fd8130f70b4ec2
Author: Hari Shreedharan <harishreedha...@gmail.com>
Date:   2014-06-16T19:44:12Z

    FLUME-1729. Better Flume-Spark integration.
    
    Use readFully instead of read in EventTransformer.

commit bda01fc18daae511603a526ca5fcd2ada97a3de4
Author: Hari Shreedharan <harishreedha...@gmail.com>
Date:   2014-06-17T22:15:36Z

    FLUME-1729. Flume-Spark integration.
    
    Refactoring classes into new files and minor changes in protocol.

commit 4b0c7fcdf654023f56d3e85b8d52ee1d049d8c65
Author: Hari Shreedharan <harishreedha...@gmail.com>
Date:   2014-06-18T05:47:49Z

    FLUME-1729. New Flume-Spark integration.
    
    Avro does not support inheritance, so the error message needs to be part of
    the message itself.

commit 205034dc78a8bda62e373101275cae1870875a21
Author: Hari Shreedharan <harishreedha...@gmail.com>
Date:   2014-06-18T06:32:01Z

    Merging master in

commit e13fab50a38d88f11021282e0da55dcaeab5a20c
Author: Hari Shreedharan <harishreedha...@gmail.com>
Date:   2014-06-24T07:41:20Z

    SPARK-1730. Make receiver store data reliably to avoid data-loss on executor failures.

    Added a new method in Receiver, ReceiverSupervisor, ReceiverSupervisorImpl
    to store the data and callback a supplied function with a given argument.

commit 038b644f1b35ffe10d13ef830e0baa02d5ef7bef
Author: Hari Shreedharan <harishreedha...@gmail.com>
Date:   2014-06-24T07:44:23Z

    Merge remote-tracking branch 'origin/master' into persist-data

----


