Re: Samza Job Slow to Restart

2017-10-24 Thread Liu Bo
I ran into the same problem before and resolved it with Yi's help, xD
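
For anyone who wants to repeat the on-disk size check described in the quoted message below, here is a minimal Python sketch. It assumes the usual broker layout in which each partition lives in a directory named <topic>-<partition> under the broker's log directory. The function name is hypothetical and not taken from this thread.

```python
from pathlib import Path


def topic_size_bytes(log_dir, topic):
    """Sum the sizes of all files in <log_dir>/<topic>-<partition>/ directories.

    Assumption: partition directories are named "<topic>-<number>".
    """
    prefix = topic + "-"
    total = 0
    for part_dir in Path(log_dir).iterdir():
        name = part_dir.name
        # Require a purely numeric suffix so that "foo-bar-0" is not counted
        # when asking about topic "foo".
        if (part_dir.is_dir()
                and name.startswith(prefix)
                and name[len(prefix):].isdigit()):
            total += sum(f.stat().st_size
                         for f in part_dir.iterdir() if f.is_file())
    return total
```

Run on the broker that leads the checkpoint topic, a result in the tens of GB for a topic that should be log-compacted would hint that the log cleaner is not running.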

On 20 October 2017 at 06:10, Yi Pan  wrote:

> Awesome that you have figured it out! Just a general notice: any
> log-compacted topic used in Samza (checkpoint, coordinator stream, and
> changelog topics) may see this slowdown if the Kafka log cleaner thread dies.
>
> Best!
>
> -Yi
>
> On Thu, Oct 19, 2017 at 12:14 PM, XiaoChuan Yu wrote:
>
> > Hi,
> >
> > We were finally able to find out why the job takes so long to start.
> > There was higher than normal network IO during job startup, so we
> > checked the size of the checkpoint topic on disk and it was ~21GB.
> > We then restarted the Kafka node that was the leader for the checkpoint
> > topic; the topic's disk size went down to ~1.8GB and the job started up
> > fairly quickly.
> > It's probably due to a bug in Kafka where the log cleaner died and we never
> > noticed: https://issues.apache.org/jira/browse/KAFKA-3894.
> > We have since been working on upgrading Kafka to avoid this bug.
> > Hope this helps if anyone else ever runs into it.
> >
> > Thanks,
> > Xiaochuan Yu
> >
> > On Sat, Sep 23, 2017 at 6:17 PM XiaoChuan Yu wrote:
> >
> > > >> How long does it take?
> > > > It took around 10 minutes from "Got offset 0 for topic <checkpoint topic>
> > > > ..." to init() being called on the Task.
> > >
> > > > >> Have you measured which parts of the start up sequence take the most
> > > > >> time?
> > > >> - is it checkpoint restoration, or restore of local state?
> > > Should be checkpoint restoration. There is no local state for this job.
> > >
> > > > >> If reading from the checkpoint topic takes the most time, then I'd
> > > > >> recommend reading from the beginning from that topic, and
> > > > >> benchmarking how long it takes? It'll also help to verify if the
> > > > >> checkpoint topic is actually log-compacted.
> > > > I'm not sure how to verify how well Kafka has compacted the topic.
> > > > The cleanup policy is set to compact, though.
> > >
> > > >> Do containers eventually start? Or does the start-up hang?
> > > >> If so, a thread dump will be useful.
> > > It does eventually start up.
> > >
> > > > >> Can you please link and attach the entire log file for us to take a
> > > > >> look?
> > > > Unfortunately there is too much stuff for me to redact from the log
> > > > right now.
> > > However, I can tell you that the job has two input topics both with the
> > > following settings:
> > > systems.kafka.streams.my-special-topic.samza.reset.offset=true
> > > systems.kafka.streams.my-special-topic.samza.offset.default=upcoming
> > > > We thought this would speed up job startup, but to no avail.
> > >
> > > On Wed, Sep 20, 2017 at 3:21 PM Jagadish Venkatraman <
> > > jagadish1...@gmail.com> wrote:
> > >
> > >> Hi Xiaochuan,
> > >>
> > >> >> What does that loop do exactly?
> > >>
> > > >> Most of what the run-loop does is documented in
> > > >> https://samza.apache.org/learn/documentation/0.9/container/event-loop.html
> > >>
> > > >> >> We are running into a problem where it seems to take a very long
> > > >> >> time to restart a Samza job.
> > >>
> > >> Some follow-up questions,
> > >>
> > > >> How long does it take?
> > > >> Have you measured which parts of the start up sequence take the most
> > > >> time?
> > > >> - is it checkpoint restoration, or restore of local state?
> > > >> If reading from the checkpoint topic takes the most time, then I'd
> > > >> recommend reading from the beginning from that topic, and benchmarking
> > > >> how long it takes? It'll also help to verify if the checkpoint topic is
> > > >> actually log-compacted.
> > > >> Do containers eventually start? Or does the start-up hang? If so, a
> > > >> thread dump will be useful.
> > > >> Can you please link and attach the entire log file for us to take a
> > > >> look?
> > >>
> > >> >> 3. Any ideas on how to fix this?
> > >>
> > > >> We can perhaps try to narrow down where the time is spent in startup
> > > >> from the logs? Depending on that, I can suggest a fix :-)
> > >>
> > >> Thanks,
> > >> Jagadish
> > >>
> > > >> On Wed, Sep 20, 2017 at 11:21 AM, XiaoChuan Yu wrote:
> > >>
> > >> > Hi,
> > >> >
> > > >> > We are running into a problem where it seems to take a very long
> > > >> > time to restart a Samza job.
> > >> > We are using Samza 0.9.1 at the moment.
> > >> >
> > > >> > From the logs for a particular container it looks like it has
> > > >> > something to do with reading checkpoints from Kafka:
> > > >> >
> > > >> > 2017-09-20 03:21:02.060 INFO  o.a.s.c.kafka.KafkaCheckpointManager [main] -
> > > >> > Got offset 0 for topic __samza_checkpoint_ver_1_for_test-job_1 and
> > > >> > partition 0. Attempting to fetch messages for checkpoint log.
> > > >> > 2017-09-20 03:21:02.072 INFO  o.a.s.c.kafka.KafkaCheckpointManager [main] -
> > > >> > Get latest offset 42890599 for topic
> > > >> > __samza_checkpoint_ver_1_for_test-job_1 and 
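
The two KafkaCheckpointManager lines quoted above already hint at the root cause: the container starts at offset 0 and has to read up to the latest offset before init() is called. A minimal Python sketch for pulling that offset range out of a container log could look like the following. The regular expressions are written against the two message formats quoted in this thread only, with each log message assumed to be on a single unwrapped line. The function name is hypothetical.

```python
import re

# Patterns based only on the two log messages quoted in this thread.
START_RE = re.compile(r"Got offset (\d+) for topic (\S+)")
LATEST_RE = re.compile(r"Get latest offset (\d+) for topic (\S+)")


def checkpoint_backlog(log_text):
    """Return {topic: latest_offset - start_offset} for each topic that has
    both a "Got offset" and a "Get latest offset" line in log_text.

    Assumption: one partition per topic, as in the quoted log lines.
    """
    starts = {m.group(2): int(m.group(1))
              for m in START_RE.finditer(log_text)}
    latest = {m.group(2): int(m.group(1))
              for m in LATEST_RE.finditer(log_text)}
    return {topic: latest[topic] - starts[topic]
            for topic in starts if topic in latest}
```

The result is only an upper bound on the number of messages to read, because compaction leaves gaps in the offset sequence. Still, a range in the tens of millions, together with a large on-disk size for a topic that should be compacted, would be consistent with the log cleaner having died.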

[GitHub] samza pull request #337: TestExecutionPlanner compilation error

2017-10-24 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/samza/pull/337


---


[GitHub] samza pull request #337: TestExecutionPlanner compilation error

2017-10-24 Thread sborya
GitHub user sborya opened a pull request:

https://github.com/apache/samza/pull/337

TestExecutionPlanner compilation error



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/sborya/samza TestExecutionPlannerCompilationError

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/samza/pull/337.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #337


commit a31a7aa29b7be4bb46f8e651b6b8fa46a65b48e2
Author: Boris Shkolnik 
Date:   2017-10-16T22:25:49Z

reduce debugging from info to debug in KafkaCheckpointManager.java

commit 410ce78ba1ff8dafa2587481473e62ac9cfa6f4f
Author: Boris S 
Date:   2017-10-17T01:20:04Z

Merge branch 'master' of https://github.com/apache/samza

commit d4620d6690f74cad9472d0e27a1b31aeb4156c54
Author: Boris S 
Date:   2017-10-25T00:11:48Z

Merge branch 'master' of https://github.com/apache/samza

commit bbffb79b8b9799a41e8e82ded60f83550736886b
Author: Boris S 
Date:   2017-10-25T00:54:20Z

Merge branch 'master' of https://github.com/apache/samza

commit dc234698bc8a088294b2b696bce9ee9c5b8750e9
Author: Boris S 
Date:   2017-10-25T01:21:52Z

compilation issue




---


[GitHub] samza pull request #328: SAMZA-1457: Set retention for internal streams for ...

2017-10-24 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/samza/pull/328


---


[GitHub] samza pull request #336: StreamOperatorTask does not need to be final.

2017-10-24 Thread sborya
GitHub user sborya opened a pull request:

https://github.com/apache/samza/pull/336

StreamOperatorTask does not need to be final.



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/sborya/samza UnmakeStreamOperatorTaskFinal

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/samza/pull/336.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #336


commit a31a7aa29b7be4bb46f8e651b6b8fa46a65b48e2
Author: Boris Shkolnik 
Date:   2017-10-16T22:25:49Z

reduce debugging from info to debug in KafkaCheckpointManager.java

commit 410ce78ba1ff8dafa2587481473e62ac9cfa6f4f
Author: Boris S 
Date:   2017-10-17T01:20:04Z

Merge branch 'master' of https://github.com/apache/samza

commit d4620d6690f74cad9472d0e27a1b31aeb4156c54
Author: Boris S 
Date:   2017-10-25T00:11:48Z

Merge branch 'master' of https://github.com/apache/samza

commit c44ee5852f4ee209a627387dcc2134eac32a45bf
Author: Boris S 
Date:   2017-10-25T00:13:06Z

StreamOperatorTask does not need to be final




---


[GitHub] samza pull request #335: StreamOperatorTask does not need to be final.

2017-10-24 Thread sborya
Github user sborya closed the pull request at:

https://github.com/apache/samza/pull/335


---


[GitHub] samza pull request #335: StreamOperatorTask does not need to be final.

2017-10-24 Thread sborya
GitHub user sborya opened a pull request:

https://github.com/apache/samza/pull/335

StreamOperatorTask does not need to be final.



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/sborya/samza master

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/samza/pull/335.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #335


commit a31a7aa29b7be4bb46f8e651b6b8fa46a65b48e2
Author: Boris Shkolnik 
Date:   2017-10-16T22:25:49Z

reduce debugging from info to debug in KafkaCheckpointManager.java

commit 410ce78ba1ff8dafa2587481473e62ac9cfa6f4f
Author: Boris S 
Date:   2017-10-17T01:20:04Z

Merge branch 'master' of https://github.com/apache/samza

commit d4620d6690f74cad9472d0e27a1b31aeb4156c54
Author: Boris S 
Date:   2017-10-25T00:11:48Z

Merge branch 'master' of https://github.com/apache/samza




---


Review Request 63267: Add instructions to README.md

2017-10-24 Thread Xinyu Liu

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/63267/
---

Review request for samza and Prateek Maheshwari.


Repository: samza-hello-samza


Description
---

Add instructions to README.md


Diffs
---

  README.md 0f80e9e52240abc071dcdfb56800826ef5d49d7d 
  build.gradle ec451d576a157eba2d8889b63b8574f397937197 


Diff: https://reviews.apache.org/r/63267/diff/1/


Testing
---


Thanks,

Xinyu Liu