I don't entirely agree with that assessment. Not having to pay for extra
cores to run receivers was about as important a motivation for the API as
delivery semantics.

As I said in the JIRA tickets on the topic, if you want to use the direct
API and save offsets to ZooKeeper, you can. The right way to make that
easier is to expose the (currently private) methods that already exist in
KafkaCluster.scala for committing offsets through Kafka's API. I don't
think adding another "do the wrong thing" option is beneficial.
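For what it's worth, the store-offsets-with-results approach doesn't need any
Spark-specific machinery to understand. Here's a minimal sketch of the idea:
commit the batch's results and the offset that produced them in one atomic
step, so a replayed batch is a no-op. Everything here is illustrative (the
object name, and the in-memory map standing in for a real transactional
database), not actual Spark or Kafka API:

```scala
// Sketch only: the in-memory map stands in for a transactional store.
// In a real system, the result write and the offset write would happen
// inside a single database transaction.
object ExactlyOnceSketch {
  // "Database of record": last committed offset per partition, plus a
  // running aggregate standing in for the batch results.
  private var committedOffset = Map.empty[Int, Long].withDefaultValue(-1L)
  private var total = 0L

  // Apply a batch atomically; skip it if its offsets were already committed.
  // Returns true if the batch was applied, false if it was a replay.
  def processBatch(partition: Int, untilOffset: Long, batchSum: Long): Boolean =
    synchronized {
      if (untilOffset <= committedOffset(partition)) {
        false // already committed: replaying after a failure changes nothing
      } else {
        total += batchSum
        committedOffset = committedOffset.updated(partition, untilOffset)
        true
      }
    }

  def currentTotal: Long = total
}
```

If the driver dies after the commit but before acknowledging, reprocessing the
same offset range hits the `untilOffset <= committedOffset` check and is
discarded, which is the whole point of keeping offsets next to the results.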

On Fri, Aug 14, 2015 at 11:34 AM, dutrow <[email protected]> wrote:

> In summary, it appears that the Direct API was intended specifically to
> enable exactly-once semantics. This can be achieved with idempotent
> transformations, or with transactional processing that uses the database
> to guarantee an "onto" mapping of results based on inputs. For the
> latter, you need to store your offsets in the database of record.
>
> If you as a developer do not necessarily need exactly-once semantics, then
> you can probably get by fine using the receiver API.
>
> The hope is that one day the Direct API could be augmented with
> Spark-abstracted offset storage (in ZooKeeper, Kafka, or something else
> outside of the Spark checkpoint), since this would let developers
> easily take advantage of the Direct API's performance benefits and
> simplification of parallelism. I think it would be worth adding, even if it
> came with some "buyer beware" caveats.
>
>
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/Maintaining-Kafka-Direct-API-Offsets-tp24246p24273.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>
>
