[ 
https://issues.apache.org/jira/browse/SAMZA-1572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shanthoosh Venkataraman closed SAMZA-1572.
------------------------------------------

> Add fixed retries on failure in KafkaCheckpointManager
> ------------------------------------------------------
>
>                 Key: SAMZA-1572
>                 URL: https://issues.apache.org/jira/browse/SAMZA-1572
>             Project: Samza
>          Issue Type: Bug
>            Reporter: Shanthoosh Venkataraman
>            Assignee: Shanthoosh Venkataraman
>            Priority: Major
>             Fix For: 0.15.0
>
>
> KafkaCheckpointManager.writeCheckpoint currently goes into an infinite loop 
> when an irrecoverable failure happens. This indefinitely blocks the commit 
> phase (thereby preventing processing). The exception is revealed only 
> during the shutdown of the job, and it makes shutdown block indefinitely 
> because the shutdown markers are ignored by the run loop, which is blocked 
> in the commit phase.
> {code:java}
> 2018/01/22 19:18:10.503 WARN [KafkaCheckpointManager]  [] Failed to write checkpoint log partition entry org.apache.samza.checkpoint.kafka.KafkaCheckpointLogKey@8148c5bb: org.apache.samza.system.SystemProducerException: Flush failed. One or more batches of messages were not sent. Retrying.
> 2018/01/22 19:18:10.604 WARN [KafkaCheckpointManager]  [] Failed to write checkpoint log partition entry org.apache.samza.checkpoint.kafka.KafkaCheckpointLogKey@8148c5bb: org.apache.samza.system.SystemProducerException: Producer was unable to recover from previous exceptio
> 2018/01/22 19:18:10.804 WARN [KafkaCheckpointManager]  [] Failed to write checkpoint log partition entry org.apache.samza.checkpoint.kafka.KafkaCheckpointLogKey@8148c5bb: org.apache.samza.system.SystemProducerException: Producer was unable to recover from previous exceptio
> 2018/01/22 19:18:11.204 WARN [KafkaCheckpointManager]  [] Failed to write checkpoint log partition entry org.apache.samza.checkpoint.kafka.KafkaCheckpointLogKey@8148c5bb: org.apache.samza.system.SystemProducerException: Producer was unable to recover from previous exceptio
> 2018/01/22 19:18:12.005 WARN [KafkaCheckpointManager]  [] Failed to write checkpoint log partition entry org.apache.samza.checkpoint.kafka.KafkaCheckpointLogKey@8148c5bb: org.apache.samza.system.SystemProducerException: Producer was unable to recover from previous exceptio
> 2018/01/22 19:18:13.605 WARN [KafkaCheckpointManager]  [] Failed to write checkpoint log partition entry org.apache.samza.checkpoint.kafka.KafkaCheckpointLogKey@8148c5bb: org.apache.samza.system.SystemProducerException: Producer was unable to recover from previous exceptio
> 2018/01/22 19:18:16.805 WARN [KafkaCheckpointManager]  [] Failed to write checkpoint log partition entry org.apache.samza.checkpoint.kafka.KafkaCheckpointLogKey@8148c5bb: org.apache.samza.system.SystemProducerException: Producer was unable to recover from previous exceptio
> 2018/01/22 19:18:23.205 WARN [KafkaCheckpointManager]  [] Failed to write checkpoint log partition entry org.apache.samza.checkpoint.kafka.KafkaCheckpointLogKey@8148c5bb: org.apache.samza.system.SystemProducerException: Producer was unable to recover from previous exceptio
> 2018/01/22 19:18:33.206 WARN [KafkaCheckpointManager]  [] Failed to write checkpoint log partition entry org.apache.samza.checkpoint.kafka.KafkaCheckpointLogKey@8148c5bb: org.apache.samza.system.SystemProducerException: Producer was unable to recover from previous exceptio
> 2018/01/22 19:18:43.206 WARN [KafkaCheckpointManager]  [] Failed to write checkpoint log partition entry org.apache.samza.checkpoint.kafka.KafkaCheckpointLogKey@8148c5bb: org.apache.samza.system.SystemProducerException: Producer was unable to recover from previous exceptio
> 2018/01/22 19:18:53.206 WARN [KafkaCheckpointManager]  [] Failed to write checkpoint log partition entry org.apache.samza.checkpoint.kafka.KafkaCheckpointLogKey@8148c5bb: org.apache.samza.system.SystemProducerException: Producer was unable to recover from previous exceptio
> 2018/01/22 19:19:03.207 WARN [KafkaCheckpointManager]  [] Failed to write checkpoint log partition entry org.apache.samza.checkpoint.kafka.KafkaCheckpointLogKey@8148c5bb: org.apache.samza.system.SystemProducerException: Producer was unable to recover from previous exceptio
> 2018/01/22 19:19:13.207 WARN [KafkaCheckpointManager]  [] Failed to write checkpoint log partition entry org.apache.samza.checkpoint.kafka.KafkaCheckpointLogKey@8148c5bb: org.apache.samza.system.SystemProducerException: Producer was unable to recover from previous exceptio
> 2018/01/22 19:19:23.207 WARN [KafkaCheckpointManager]  [] Failed to write checkpoint log partition entry org.apache.samza.checkpoint.kafka.KafkaCheckpointLogKey@8148c5bb: org.apache.samza.system.SystemProducerException: Producer was unable to recover from previous exception.. Retrying.
> {code}
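
For illustration only, the "fixed retries" idea in the issue title could look roughly like the sketch below. It is not the actual Samza change: the class name BoundedRetry, the method run, and all parameters are hypothetical, and the doubling backoff with a cap is just one plausible policy. The point is that the loop gives up after a fixed number of attempts and rethrows, so the caller (here, the commit phase) is unblocked and can fail or shut down.

```java
import java.util.concurrent.Callable;

// Hypothetical helper, not part of Samza: retries an action a fixed number
// of times instead of looping forever.
final class BoundedRetry {
    private BoundedRetry() {}

    /**
     * Calls the action up to maxAttempts times. Between failed attempts it
     * sleeps for a backoff that starts at initialBackoffMs and doubles each
     * time, never exceeding maxBackoffMs. If every attempt fails, the last
     * failure is rethrown wrapped in a RuntimeException.
     */
    static <T> T run(Callable<T> action, int maxAttempts,
                     long initialBackoffMs, long maxBackoffMs) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        long backoffMs = initialBackoffMs;
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return action.call();
            } catch (InterruptedException e) {
                // Do not retry on interruption; restore the flag and propagate
                // so that a shutdown request is honored.
                Thread.currentThread().interrupt();
                throw e;
            } catch (Exception e) {
                last = e;
                if (attempt == maxAttempts) {
                    break; // no sleep after the final attempt
                }
                Thread.sleep(backoffMs);
                backoffMs = Math.min(backoffMs * 2, maxBackoffMs);
            }
        }
        throw new RuntimeException(
            "Giving up after " + maxAttempts + " attempts", last);
    }
}
```

A caller would wrap the checkpoint write in a Callable and pass a small attempt limit. Once the limit is reached, the exception surfaces immediately instead of only at shutdown.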



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
