Re: [controller-dev] [release] Autorelease carbon failed to build sal-distributed-datastore from controller
controller had one patch go in today [1]. It looks like an IT may just need to be updated.

[1] https://git.opendaylight.org/gerrit/61433

On Fri, Aug 11, 2017 at 9:12 PM, Jenkins wrote:

> Attention controller-devs,
>
> Autorelease carbon failed to build sal-distributed-datastore from
> controller in build 429. Attached is a snippet of the error message
> related to the failure that we were able to automatically parse as well
> as console logs.
>
> Console Logs:
> https://logs.opendaylight.org/releng/jenkins092/autorelease-release-carbon/429
>
> Jenkins Build:
> https://jenkins.opendaylight.org/releng/job/autorelease-release-carbon/429/
>
> Please review and provide an ETA on when a fix will be available.
>
> Thanks,
> ODL releng/autorelease team

___
controller-dev mailing list
controller-dev@lists.opendaylight.org
https://lists.opendaylight.org/mailman/listinfo/controller-dev
[controller-dev] [release] Autorelease carbon failed to build sal-distributed-datastore from controller
Attention controller-devs,

Autorelease carbon failed to build sal-distributed-datastore from controller in build 429. Attached is a snippet of the error message related to the failure that we were able to automatically parse as well as console logs.

Console Logs:
https://logs.opendaylight.org/releng/jenkins092/autorelease-release-carbon/429

Jenkins Build:
https://jenkins.opendaylight.org/releng/job/autorelease-release-carbon/429/

Please review and provide an ETA on when a fix will be available.

Thanks,
ODL releng/autorelease team

Attachment: error.log.gz (application/gzip)
[controller-dev] [release] Autorelease nitrogen failed to build sal-distributed-datastore from controller
Attention controller-devs,

Autorelease nitrogen failed to build sal-distributed-datastore from controller in build 156. Attached is a snippet of the error message related to the failure that we were able to automatically parse as well as console logs.

Console Logs:
https://logs.opendaylight.org/releng/jenkins092/autorelease-release-nitrogen/156

Jenkins Build:
https://jenkins.opendaylight.org/releng/job/autorelease-release-nitrogen/156/

Please review and provide an ETA on when a fix will be available.

Thanks,
ODL releng/autorelease team

Attachment: error.log.gz (application/gzip)
Re: [controller-dev] Circuit Breaker timed out
Or was there a real disk issue in that machine you were using?

On Fri, Aug 11, 2017 at 10:58 AM, Srini Seetharaman <srini.seethara...@gmail.com> wrote:

> Muthu,
> It's worrisome to hear that you've seen this too. Did it go away with
> Nitrogen or with moving to Akka 2.5 persistence?
>
> I am referring to the following params within the persistence section of
> akka.conf
>
> circuit-breaker {
>   max-failures = 10
>   call-timeout = 10s
>   reset-timeout = 30s
> }
>
> On Thu, Aug 10, 2017 at 10:17 PM, Muthukumaran K <muthukumara...@ericsson.com> wrote:
>
>> Hi Tom, Srini,
>>
>> We have also noticed this with Boron very sporadically even without any
>> explicit action taken on shard like Srini did
>>
>> Srini,
>>
>> Are you referring “journal-plugin-fallback” from
>> http://doc.akka.io/docs/akka/current/scala/general/configuration.html#config-akka-persistence ?
>>
>> Regards
>> Muthu
>>
>> *From:* controller-dev-boun...@lists.opendaylight.org [mailto:controller-dev-boun...@lists.opendaylight.org] *On Behalf Of *Srini Seetharaman
>> *Sent:* Friday, August 11, 2017 9:40 AM
>> *To:* Tom Pantelis
>> *Cc:* controller-dev@lists.opendaylight.org
>> *Subject:* Re: [controller-dev] Circuit Breaker timed out
>>
>> Thanks Tom. I will investigate further on why the local disk operation
>> failed. Seems strange though because I haven't seen anything in dmesg.
>>
>> The default value for the call-timeout is 10s in akka.conf.
>>
>> On Thu, Aug 10, 2017 at 3:20 PM, Tom Pantelis wrote:
>>
>> That error is from akka persistence. It happens if the backend
>> persistence plugin doesn't respond back in time. I've only seen this in a
>> CSIT environment whose disk activity was overloaded. The timeouts can be
>> tweaked - I don't recall exactly what they are but you can find them in the
>> akka docs (names contain circuit-breaker).
Re: [controller-dev] Circuit Breaker timed out
Muthu,
It's worrisome to hear that you've seen this too. Did it go away with Nitrogen or with moving to Akka 2.5 persistence?

I am referring to the following params within the persistence section of akka.conf

circuit-breaker {
  max-failures = 10
  call-timeout = 10s
  reset-timeout = 30s
}

On Thu, Aug 10, 2017 at 10:17 PM, Muthukumaran K <muthukumara...@ericsson.com> wrote:

> Hi Tom, Srini,
>
> We have also noticed this with Boron very sporadically even without any
> explicit action taken on shard like Srini did
>
> Srini,
>
> Are you referring “journal-plugin-fallback” from
> http://doc.akka.io/docs/akka/current/scala/general/configuration.html#config-akka-persistence ?
>
> Regards
> Muthu
>
> *From:* controller-dev-boun...@lists.opendaylight.org [mailto:controller-dev-boun...@lists.opendaylight.org] *On Behalf Of *Srini Seetharaman
> *Sent:* Friday, August 11, 2017 9:40 AM
> *To:* Tom Pantelis
> *Cc:* controller-dev@lists.opendaylight.org
> *Subject:* Re: [controller-dev] Circuit Breaker timed out
>
> Thanks Tom. I will investigate further on why the local disk operation
> failed. Seems strange though because I haven't seen anything in dmesg.
>
> The default value for the call-timeout is 10s in akka.conf.
>
> On Thu, Aug 10, 2017 at 3:20 PM, Tom Pantelis wrote:
>
> That error is from akka persistence. It happens if the backend
> persistence plugin doesn't respond back in time. I've only seen this in a
> CSIT environment whose disk activity was overloaded. The timeouts can be
> tweaked - I don't recall exactly what they are but you can find them in the
> akka docs (names contain circuit-breaker).
>
> On Thu, Aug 10, 2017 at 6:01 PM, Srini Seetharaman <srini.seethara...@gmail.com> wrote:
>
> Hi Tom,
>
> In our ODL deployment that is running in standalone mode with operational
> store persistence enabled, we saw the following error being printed. Once
> the member-1-default-operational shard is shutdown, all write transactions
> after that fail and the system becomes unstable. At this point, we were
> probably doing less than 10 transactions per second. Any idea what is
> causing this? Has anyone seen this before?
>
> 2017-08-07 19:15:59,622 | ERROR | lt-dispatcher-23 | Shard | 176 - com.typesafe.akka.slf4j - 2.4.7 | Failed to persist event type [org.opendaylight.controller.cluster.raft.ReplicatedLogImplEntry] with sequence number [9897493] for persistenceId [member-1-shard-default-operational].
> akka.pattern.CircuitBreaker$$anon$1: Circuit Breaker Timed out.
> 2017-08-07 19:15:59,628 | INFO | lt-dispatcher-24 | Shard | 188 - org.opendaylight.controller.sal-akka-raft - 1.4.2.Boron-SR2 | Stopping Shard member-1-shard-default-operational
> 2017-08-07 19:15:59,629 | ERROR | lt-dispatcher-23 | LocalThreePhaseCommitCohort | 193 - org.opendaylight.controller.sal-distributed-datastore - 1.4.2.Boron-SR2 | Failed to prepare transaction member-1-datastore-operational-fe-5-txn-791019 on backend
> java.lang.RuntimeException: Transaction aborted due to shutdown.
>     at org.opendaylight.controller.cluster.datastore.ShardCommitCoordinator.abortPendingTransactions(ShardCommitCoordinator.java:399)[193:org.opendaylight.controller.sal-distributed-datastore:1.4.2.Boron-SR2]
>     at org.opendaylight.controller.cluster.datastore.Shard.postStop(Shard.java:211)[193:org.opendaylight.controller.sal-distributed-datastore:1.4.2.Boron-SR2]
>     at akka.actor.Actor$class.aroundPostStop(Actor.scala:494)[175:com.typesafe.akka.actor:2.4.7]
>     at akka.persistence.UntypedPersistentActor.akka$persistence$Eventsourced$$super$aroundPostStop(PersistentActor.scala:168)[181:com.typesafe.akka.persistence:2.4.7]
>     at akka.persistence.Eventsourced$class.aroundPostStop(Eventsourced.scala:223)[181:com.typesafe.akka.persistence:2.4.7]
>     at akka.persistence.UntypedPersistentActor.aroundPostStop(PersistentActor.scala:168)[181:com.typesafe.akka.persistence:2.4.7]
>     at akka.actor.dungeon.FaultHandling$class.akka$actor$dungeon$FaultHandling$$finishTerminate(FaultHandling.scala:210)[175:com.typesafe.akka.actor:2.4.7]
>     at akka.actor.dungeon.FaultHandling$class.handleChildTerminated(FaultHandling.scala:293)[175:com.typesafe.akka.actor:2.4.7]
>     at akka.actor.ActorCell.handleChildTerminated(ActorCell.scala:374)[175:com.typesafe.akka.actor:2.4.7]
>     at akka.actor.dungeon.DeathWatch$class.watchedActorTerminated(DeathWatch.scala:61)[175:com.typesafe.akka.actor:2.4.7]
>     at akka.actor.ActorCell.watchedActorTerminated(ActorCell.scala:374)[175:com.typesafe.akka.actor:2.4.7]
>     at akka.actor.ActorCell.invokeAll$1(ActorCell.scala:460)[175:com.typesafe.akka.actor:2.4.7]
>     at akka.actor.ActorCell.systemInvoke(ActorCell.scala:
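
For readers unfamiliar with the three circuit-breaker parameters discussed in this thread (max-failures, call-timeout, reset-timeout), here is a minimal Python sketch of how a breaker of this kind typically behaves. It is an illustration only and not Akka's implementation: the class `CircuitBreaker`, the two exception types, and the simulated clock are hypothetical names invented for this example, and the defaults simply mirror the values quoted from akka.conf.

```python
class CircuitBreakerOpen(Exception):
    """Raised when a call is rejected without being attempted."""


class CallTimedOut(Exception):
    """Raised when a call takes longer than call_timeout."""


class CircuitBreaker:
    """Toy breaker driven by a simulated clock (seconds), for illustration."""

    def __init__(self, max_failures=10, call_timeout=10.0, reset_timeout=30.0):
        self.max_failures = max_failures
        self.call_timeout = call_timeout
        self.reset_timeout = reset_timeout
        self.state = "closed"
        self.failures = 0
        self.opened_at = None
        self.now = 0.0  # simulated time, so the example never really sleeps

    def advance(self, seconds):
        """Let simulated time pass without making a call."""
        self.now += seconds

    def call(self, duration):
        """Simulate one journal write that would take `duration` seconds."""
        if self.state == "open":
            if self.now - self.opened_at < self.reset_timeout:
                raise CircuitBreakerOpen("failing fast")
            self.state = "half-open"  # reset_timeout elapsed: allow one trial call
        # The caller stops waiting at call_timeout, even if the write is slower.
        self.now += min(duration, self.call_timeout)
        if duration > self.call_timeout:
            self._on_failure()
            raise CallTimedOut("Circuit Breaker Timed out.")
        self._on_success()
        return "persisted"

    def _on_failure(self):
        self.failures += 1
        # A failed trial call, or too many failures in a row, opens the breaker.
        if self.state == "half-open" or self.failures >= self.max_failures:
            self.state = "open"
            self.opened_at = self.now

    def _on_success(self):
        # Any success clears the failure count and closes the breaker.
        self.failures = 0
        self.state = "closed"
```

Note that in this sketch a single write slower than call_timeout already fails with a timeout error, even though the breaker itself stays closed until max_failures is reached. That appears consistent with the log above, where one timed-out persist is followed directly by the shard stopping. Raising call-timeout would only give a slow disk more time; it would not explain why the write was slow in the first place.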