Can you post the full log of one of the workers/executors that reads at least one record? You can remove application logs.
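
To tell whether all partitions are behind or only some, a per-partition lag check along these lines may help. This is only a minimal sketch: the class and method names (PartitionLag, lagByPartition) and the sample offsets are hypothetical, and it works on plain maps so it runs without a Kafka client. In a real check I would assume the two maps are filled from KafkaConsumer.endOffsets() and KafkaConsumer.committed() for the consumer group in question. Treating a partition with no committed offset as committed at 0 is a simplification.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

class PartitionLag {
    /**
     * Lag per partition = log-end offset minus committed offset.
     * Assumption: a partition with no committed offset counts as committed at 0.
     */
    static Map<Integer, Long> lagByPartition(Map<Integer, Long> endOffsets,
                                             Map<Integer, Long> committedOffsets) {
        // TreeMap keeps the result sorted by partition number.
        Map<Integer, Long> lag = new TreeMap<>();
        for (Map.Entry<Integer, Long> e : endOffsets.entrySet()) {
            long committed = committedOffsets.getOrDefault(e.getKey(), 0L);
            // Clamp at 0 in case the committed offset is read after the end offset.
            lag.put(e.getKey(), Math.max(0L, e.getValue() - committed));
        }
        return lag;
    }

    public static void main(String[] args) {
        // Hypothetical sample values, not taken from the thread.
        Map<Integer, Long> end = new HashMap<>();
        end.put(0, 1200L);
        end.put(1, 800L);
        end.put(2, 500L);

        Map<Integer, Long> committed = new HashMap<>();
        committed.put(0, 1200L);
        committed.put(1, 650L);

        lagByPartition(end, committed)
            .forEach((p, l) -> System.out.println("partition " + p + " lag " + l));
    }
}
```

Running this at a few points in time and comparing the output would show whether the lag is spread over every partition or concentrated in a few.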

On Mon, Nov 6, 2017 at 12:45 PM, NerdyNick <[email protected]> wrote:

> It's all partitions and it's random on how far behind they show. The topic
> in question has a low events/sec but oversized partition count. So not
> every partition has data to be read the full length of time. So given this,
> it's almost like the partition is drained before the batch end read time.
> Which seems to result in the consumer not posting an update to the offset
> within that batch. Because the offsets will get posted just not with the
> rate of the batches. So it'll look like it falls behind then somewhat
> catches up.
>
> On Mon, Nov 6, 2017 at 12:53 PM, Raghu Angadi <[email protected]> wrote:
>
>> On Mon, Nov 6, 2017 at 10:39 AM, NerdyNick <[email protected]> wrote:
>>
>>> Hey Raghu,
>>>
>>> Runner is Spark on Yarn cluster. Executor/Thread would be the
>>> Thread(Core) within a Spark Executor. I have nothing I can find in the logs
>>> saying that they aren't being closed cleanly outside of forced shutdown.
>>> Interval I have now is 30ms with a min read time of 2secs. Watching the
>>> finished tasks for the read job shows each thread running for the 2secs,
>>> some die early with no log as to why. However watching the offset within
>>> Kafka Manager shows the lag growing, as if the commit isn't happening.
>>> However the event throughput shows it matching that of the kafka topic msgs
>>> inbound. I can't seem to find a working option for getting the offsets
>>> KafkaIO believes it has out of either to compare numbers. I tried the Spark
>>> Metrics sinks provided by the runner, as KafkaIO appears to provide it via
>>> those. But they don't appear to work and actually caused issues, no longer
>>> appeared, with the Streaming stats Spark maintains.
>>>
>>
>> Maybe one or two Kafka partitions aren't being consumed (due to some
>> error). You could check on Kafka if all the partitions are behind or just
>> some. Otherwise what you describe sounds like bugs/issues in the pipeline
>> (spark or user). If feasible, you can try running with direct-runner to see
>> if auto_commit does not advance.
>>
>> I would also encourage you to file bugs about the missing metrics in
>> spark (and/or report here on another thread).
>>
>> Raghu.
>>
>>
>>> Consumer group name is consistent as a bash script is being used to
>>> launch jobs to guarantee uniformity between launches.
>>>
>>> From what I've been able to dissect from the KafkaIO class, I believe
>>> the watermark management layer via the PartitionState might be the best
>>> tie-in for doing manual kafka offset management via the
>>> Consumer.commitSync() or Consumer.commitAsync() methods. This would allow
>>> more uniform and consistent reporting of offset in sync with the values
>>> being maintained in the watermark.
>>>
>>> On Mon, Nov 6, 2017 at 11:11 AM, Raghu Angadi <[email protected]>
>>> wrote:
>>>
>>>> What is the runner?
>>>> Can you elaborate a bit more what you mean by 'executor/thread
>>>> shutting down'? If the KafkaIO reader shut down cleanly, it would close
>>>> the consumer cleanly, triggering auto commit. But if you shut down a
>>>> pipeline, it might not cleanly close the consumers. What is the auto_commit
>>>> interval you have? Please note that there is no way to coordinate
>>>> consistency between Beam pipeline and externally maintained auto commit
>>>> offsets since it is outside Beam. The 'Drain' feature in Dataflow can help
>>>> (it allows a clean shutdown of the pipeline), also note that many runners
>>>> provide clean ways to update a pipeline that keeps all the state from
>>>> previous run (in this case Kafka offsets), which is the only way for Beam
>>>> to provide its processing guarantees across runs.
>>>>
>>>> KafkaIO leaves auto_commit handling entirely to KafkaConsumer. If you
>>>> are seeing consumer is not honoring the auto_committed offset, please check
>>>> the log from KafkaConsumer on the worker. Only user error I could think of
>>>> is some typo in consumer group name upon restart.
>>>>
>>>> Currently KafkaIO does not actively participate in auto_commit
>>>> management. It lets user directly set KafkaConsumer configuration. Maybe
>>>> there is a case for some more active support for auto_commit management.
>>>> Please provide more details in your case so that we can discuss actual
>>>> specifics and potential improvements it provides.
>>>>
>>>>
>>>> On Mon, Nov 6, 2017 at 8:15 AM, NerdyNick <[email protected]> wrote:
>>>>
>>>>> There seems to be a lot of oddities with the auto offset committer and
>>>>> the watermark management as well as kafka offsets in general.
>>>>>
>>>>> One issue I keep having is the auto committer will just not commit any
>>>>> offsets. So the topic will look like it's backing up. From what I've been
>>>>> able to trace on it, it appears to be in relation to the executor/thread
>>>>> shutting down before the auto commit has a chance to run. Even though the
>>>>> min read times are set, it still prematurely shuts down. Turning auto
>>>>> commit interval down seems to help but doesn't resolve the issue. Just
>>>>> seems to allow it to correct itself much quicker.
>>>>>
>>>>> Another I just had happen is after restarting a pipeline the auto
>>>>> committed offsets reset to the earliest record and the pipeline appears to
>>>>> be working on those records. Which is odd in contrary to a lot of things.
>>>>> When I shut the pipeline down it was only a few thousand records behind.
>>>>> The consumer is configured to start at the latest offset not the earliest.
>>>>> Given that, it would appear the recorded watermarks had an odd corruption
>>>>> or something where they believed they were in the past.
>>>>>
>>>>> --
>>>>> Nick Verbeck - NerdyNick
>>>>> ----------------------------------------------------
>>>>> NerdyNick.com
>>>>> Coloco.ubuntu-rocks.org
