Re: KafkaInputDStream mapping of partitions to tasks

2014-05-04 Thread Aries
her issue.

> > ssc batch creates new RDDs every batch duration, always, even if the previous
> > computation did not finish.
> >
> > But with Kafka, we can consume more RDDs later, after we finish the previous
> > RDDs.
> > That way it would be much simpler to not get OOM’ed when starting from the
> > beginning,
> > because we can consume a lot of data from Kafka during a batch duration and then
> > get OOM.
> >
> > But we just cannot start slow, cannot limit how much to consume during a
> > batch.
> > >
> > > --
> > > View this message in context:
> > > http://apache-spark-user-list.1001560.n3.nabble.com/KafkaInputDStream-mapping-of-partitions-to-tasks-tp3360p3379.html
> > > Sent from the Apache Spark User List mailing list archive at Nabble.com.

Re: KafkaInputDStream mapping of partitions to tasks

2014-03-29 Thread Nicolas Bär
> > n if the previous computation did not finish.
> >
> > But with Kafka, we can consume more RDDs later, after we finish the previous RDDs.
> > That way it would be much simpler to not get OOM'ed when starting from the beginning,
> > because we can consume many data

Re: KafkaInputDStream mapping of partitions to tasks

2014-03-28 Thread Evgeniy Shishkin
> ch duration and then get OOM.
>
> But we just cannot start slow, cannot limit how much to consume during a batch.

Re: KafkaInputDStream mapping of partitions to tasks

2014-03-27 Thread Evgeny Shishkin
alances. But for a Spark cluster I think this time is not enough. If there were a way to wait for every Spark executor to start, rebalance, and only then start to consume, this issue would be less visible.

Re: KafkaInputDStream mapping of partitions to tasks

2014-03-27 Thread Scott Clasen
broke :|

Re: KafkaInputDStream mapping of partitions to tasks

2014-03-27 Thread Evgeny Shishkin
>> ation, always, even if the previous computation did not finish.
>>
>> But with Kafka, we can consume more RDDs later, after we finish the previous RDDs.
>> That way it would be much simpler to not get OOM’ed when starting from the beginning,
>> becau

Re: KafkaInputDStream mapping of partitions to tasks

2014-03-27 Thread Evgeny Shishkin
> dds later, after we finish the previous RDDs.
> That way it would be much simpler to not get OOM’ed when starting from the beginning,
> because we can consume a lot of data from Kafka during a batch duration and then get OOM.
>
> But we just cannot start slow, cannot limit how m

Re: KafkaInputDStream mapping of partitions to tasks

2014-03-27 Thread Tathagata Das
> cannot limit how much to consume during a batch.

Re: KafkaInputDStream mapping of partitions to tasks

2014-03-27 Thread Evgeny Shishkin
ause we can consume a lot of data from Kafka during a batch duration and then get OOM. But we just cannot start slow, cannot limit how much to consume during a batch.

Re: KafkaInputDStream mapping of partitions to tasks

2014-03-27 Thread Scott Clasen

Re: KafkaInputDStream mapping of partitions to tasks

2014-03-27 Thread Evgeny Shishkin

Re: KafkaInputDStream mapping of partitions to tasks

2014-03-27 Thread Scott Clasen

Re: KafkaInputDStream mapping of partitions to tasks

2014-03-27 Thread Patrick Wendell
> 1 task doing work. I would have expected that each task took a subset of the partitions.
>
> Is there a way to make more than one task share the work here? Are my expectations off here?

KafkaInputDStream mapping of partitions to tasks

2014-03-27 Thread Scott Clasen
doing work. I would have expected that each task took a subset of the partitions. Is there a way to make more than one task share the work here? Are my expectations off here?
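
Editor's note: the expectation in the question above (each task taking a subset of the partitions) can be pictured with a small toy sketch. It is plain Python and only illustrates a round-robin split; it is not how KafkaInputDStream actually assigns partitions, and the function name and parameters are hypothetical, not part of Spark or Kafka.

```python
def assign_partitions(num_partitions, num_tasks):
    """Spread partition ids 0..num_partitions-1 over tasks round-robin.

    Hypothetical helper for illustration only; not a Spark or Kafka API.
    """
    if num_tasks <= 0:
        raise ValueError("num_tasks must be positive")
    # One (initially empty) list of partition ids per task.
    assignment = {task: [] for task in range(num_tasks)}
    for partition in range(num_partitions):
        # Partition p goes to task p mod num_tasks.
        assignment[partition % num_tasks].append(partition)
    return assignment


if __name__ == "__main__":
    for task, partitions in assign_partitions(6, 3).items():
        print(f"task {task} -> partitions {partitions}")
```

With more tasks than partitions, some tasks in this sketch simply end up with an empty list, which is one way a task can appear idle.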