GitHub user koeninger commented on the pull request:
https://github.com/apache/spark/pull/2994#issuecomment-97578847
I have a branch of the directStream API that caches consumers.
It had no noticeable impact on processing time, even at 100 partitions /
200ms batches on a production-like workload.
On Wed, Apr 29, 2015 at 3:37 PM, Ben Fradet <[email protected]>
wrote:
> @tdas <https://github.com/tdas> @harishreedharan
> <https://github.com/harishreedharan> Any updates on this?
>
> Since we're incorporating Kafka 0.8.2.1
> <https://github.com/apache/spark/pull/4537> and there is a new
> Producer API
> <http://kafka.apache.org/082/javadoc/index.html?org/apache/kafka/clients/producer/KafkaProducer.html>,
> there might be a need to start over.
>
> It might also be interesting to think about pooling producers (as well as
> consumers, for that matter).
>
> Reply to this email directly or view it on GitHub
> <https://github.com/apache/spark/pull/2994#issuecomment-97577120>.
>