Re: [controller-dev] [release] Autorelease carbon failed to build sal-distributed-datastore from controller

2017-08-11 Thread Sam Hague
Controller had one patch go in today [1]. It looks like maybe an IT just
needed to be updated.

[1] https://git.opendaylight.org/gerrit/61433

On Fri, Aug 11, 2017 at 9:12 PM, Jenkins  wrote:

> Attention controller-devs,
>
> Autorelease carbon failed to build sal-distributed-datastore from
> controller in build 429. Attached is a snippet of the error message related
> to the failure that we were able to automatically parse as well as console logs.
>
>
> Console Logs:
> https://logs.opendaylight.org/releng/jenkins092/autorelease-release-carbon/429
>
> Jenkins Build:
> https://jenkins.opendaylight.org/releng/job/autorelease-release-carbon/429/
>
> Please review and provide an ETA on when a fix will be available.
>
> Thanks,
> ODL releng/autorelease team
>
>
> ___
> controller-dev mailing list
> controller-dev@lists.opendaylight.org
> https://lists.opendaylight.org/mailman/listinfo/controller-dev
>
>


[controller-dev] [release] Autorelease carbon failed to build sal-distributed-datastore from controller

2017-08-11 Thread Jenkins
Attention controller-devs,

Autorelease carbon failed to build sal-distributed-datastore from controller in
build 429. Attached is a snippet of the error message related to the
failure that we were able to automatically parse as well as console logs.


Console Logs:
https://logs.opendaylight.org/releng/jenkins092/autorelease-release-carbon/429

Jenkins Build:
https://jenkins.opendaylight.org/releng/job/autorelease-release-carbon/429/

Please review and provide an ETA on when a fix will be available.

Thanks,
ODL releng/autorelease team



error.log.gz
Description: application/gzip


[controller-dev] [release] Autorelease nitrogen failed to build sal-distributed-datastore from controller

2017-08-11 Thread Jenkins
Attention controller-devs,

Autorelease nitrogen failed to build sal-distributed-datastore from controller
in build 156. Attached is a snippet of the error message related to the
failure that we were able to automatically parse as well as console logs.


Console Logs:
https://logs.opendaylight.org/releng/jenkins092/autorelease-release-nitrogen/156

Jenkins Build:
https://jenkins.opendaylight.org/releng/job/autorelease-release-nitrogen/156/

Please review and provide an ETA on when a fix will be available.

Thanks,
ODL releng/autorelease team



error.log.gz
Description: application/gzip


Re: [controller-dev] Circuit Breaker timed out

2017-08-11 Thread Srini Seetharaman
Or was there a real disk issue on the machine you were using?

On Fri, Aug 11, 2017 at 10:58 AM, Srini Seetharaman <
srini.seethara...@gmail.com> wrote:

> Muthu,
> It's worrisome to hear that you've seen this too. Did it go away with
> Nitrogen or with moving to Akka 2.5 persistence?
>
> I am referring to the following params within the persistence section of
> akka.conf:
>
>  circuit-breaker {
>    max-failures = 10
>    call-timeout = 10s
>    reset-timeout = 30s
>  }
>
>
>
> On Thu, Aug 10, 2017 at 10:17 PM, Muthukumaran K <
> muthukumara...@ericsson.com> wrote:
>
>> Hi Tom, Srini,
>>
>>
>>
>> We have also noticed this with Boron, very sporadically, even without any
>> explicit action taken on the shard like Srini did.
>>
>>
>>
>> Srini,
>>
>>
>>
>> Are you referring to “journal-plugin-fallback” from
>> http://doc.akka.io/docs/akka/current/scala/general/configuration.html#config-akka-persistence ?
>>
>>
>>
>> Regards
>>
>> Muthu
>>
>>
>>
>> *From:* controller-dev-boun...@lists.opendaylight.org [mailto:
>> controller-dev-boun...@lists.opendaylight.org] *On Behalf Of *Srini
>> Seetharaman
>> *Sent:* Friday, August 11, 2017 9:40 AM
>> *To:* Tom Pantelis
>> *Cc:* controller-dev@lists.opendaylight.org
>> *Subject:* Re: [controller-dev] Circuit Breaker timed out
>>
>>
>>
>> Thanks Tom. I will investigate further why the local disk operation
>> failed. It seems strange, though, because I haven't seen anything in dmesg.
>>
>>
>>
>> The default value for the call-timeout is 10s in akka.conf.
>>
>>
>>
>> On Thu, Aug 10, 2017 at 3:20 PM, Tom Pantelis 
>> wrote:
>>
>> That error is from Akka persistence. It happens if the backend
>> persistence plugin doesn't respond in time. I've only seen this in a
>> CSIT environment whose disk activity was overloaded. The timeouts can be
>> tweaked; I don't recall exactly what they are, but you can find them in the
>> Akka docs (their names contain circuit-breaker).
>>
>>
>>
>> On Thu, Aug 10, 2017 at 6:01 PM, Srini Seetharaman <
>> srini.seethara...@gmail.com> wrote:
>>
>> Hi Tom,
>>
>> In our ODL deployment, which is running in standalone mode with operational
>> store persistence enabled, we saw the following error being printed. Once
>> the member-1-shard-default-operational shard is shut down, all write
>> transactions after that fail and the system becomes unstable. At this point,
>> we were probably doing fewer than 10 transactions per second. Any idea what
>> is causing this? Has anyone seen this before?
>>
>>
>>
>>
>>
>> 2017-08-07 19:15:59,622 | ERROR | lt-dispatcher-23 | Shard | 176 - com.typesafe.akka.slf4j - 2.4.7 | Failed to persist event type [org.opendaylight.controller.cluster.raft.ReplicatedLogImplEntry] with sequence number [9897493] for persistenceId [member-1-shard-default-operational].
>>
>> akka.pattern.CircuitBreaker$$anon$1: Circuit Breaker Timed out.
>>
>> 2017-08-07 19:15:59,628 | INFO  | lt-dispatcher-24 | Shard | 188 - org.opendaylight.controller.sal-akka-raft - 1.4.2.Boron-SR2 | Stopping Shard member-1-shard-default-operational
>>
>> 2017-08-07 19:15:59,629 | ERROR | lt-dispatcher-23 | LocalThreePhaseCommitCohort | 193 - org.opendaylight.controller.sal-distributed-datastore - 1.4.2.Boron-SR2 | Failed to prepare transaction member-1-datastore-operational-fe-5-txn-791019 on backend
>>
>> java.lang.RuntimeException: Transaction aborted due to shutdown.
>>     at org.opendaylight.controller.cluster.datastore.ShardCommitCoordinator.abortPendingTransactions(ShardCommitCoordinator.java:399)[193:org.opendaylight.controller.sal-distributed-datastore:1.4.2.Boron-SR2]
>>     at org.opendaylight.controller.cluster.datastore.Shard.postStop(Shard.java:211)[193:org.opendaylight.controller.sal-distributed-datastore:1.4.2.Boron-SR2]
>>     at akka.actor.Actor$class.aroundPostStop(Actor.scala:494)[175:com.typesafe.akka.actor:2.4.7]
>>     at akka.persistence.UntypedPersistentActor.akka$persistence$Eventsourced$$super$aroundPostStop(PersistentActor.scala:168)[181:com.typesafe.akka.persistence:2.4.7]
>>     at akka.persistence.Eventsourced$class.aroundPostStop(Eventsourced.scala:223)[181:com.typesafe.akka.persistence:2.4.7]
>>     at akka.persistence.UntypedPersistentActor.aroundPostStop(PersistentActor.scala:168)[181:com.typesafe.akka.persistence:2.4.7]
>>     at akka.actor.dungeon.FaultHandling$class.akka$actor$dungeon$FaultHandling$$finishTerminate(FaultHandling.scala:210)[175:com.typesafe.akka.actor:2.4.7]
>>     at akka.actor.dungeon.FaultHandling$class.handleChildTerminated(FaultHandling.scala:293)[175:com.typesafe.akka.actor:2.4.7]
>>     at akka.actor.ActorCell.handleChildTerminated(ActorCell.scala:374)[175:com.typesafe.akka.actor:2.4.7]
>>     at akka.actor.dungeon.DeathWatch$class.watchedActorTerminated(D
>> 
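The log above points to a chain of events:

1. A journal write times out in the circuit breaker ("Circuit Breaker Timed out.").
2. The persist call fails.
3. The Shard actor stops. Akka persistence stops a persistent actor unconditionally after a persist failure.
4. The shard's postStop aborts pending transactions, which produces "Transaction aborted due to shutdown."

The following is a minimal Java model of that chain under stated assumptions. ModelShard, enqueue and onPersistFailure are hypothetical names, not OpenDaylight or Akka APIs, and the real Shard does considerably more.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/**
 * Hypothetical model of the failure chain seen in the log:
 * persist failure -> shard stops -> pending transactions aborted.
 * None of these classes are OpenDaylight or Akka APIs.
 */
public class ShardFailureChainSketch {

    /** Stand-in for a shard actor holding transactions waiting to commit. */
    static final class ModelShard {
        private final List<String> pendingTransactions = new ArrayList<>();
        private final List<String> log = new ArrayList<>();
        private boolean stopped = false;

        /** Queue a transaction; in this model a stopped shard rejects it. */
        void enqueue(String txId) {
            if (stopped) {
                log.add("Failed to prepare " + txId + ": shard is stopped");
                return;
            }
            pendingTransactions.add(txId);
        }

        /** A journal write failed, e.g. because the circuit breaker timed out. */
        void onPersistFailure(String cause) {
            log.add("Failed to persist event: " + cause);
            // Akka persistence stops the actor after a persist failure.
            stop();
        }

        private void stop() {
            stopped = true;
            log.add("Stopping shard");
            // Plays the role of abortPendingTransactions() in the stack trace.
            for (String txId : pendingTransactions) {
                log.add("Transaction " + txId + " aborted due to shutdown");
            }
            pendingTransactions.clear();
        }

        List<String> log() {
            return Collections.unmodifiableList(log);
        }
    }

    public static void main(String[] args) {
        ModelShard shard = new ModelShard();
        shard.enqueue("txn-1");
        shard.enqueue("txn-2");
        shard.onPersistFailure("Circuit Breaker Timed out.");
        shard.enqueue("txn-3");
        shard.log().forEach(System.out::println);
    }
}
```

Running main prints the persist failure, the stop, the aborts of txn-1 and txn-2, and then the rejection of txn-3. This is consistent with the report that writes keep failing once the shard has stopped.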

Re: [controller-dev] Circuit Breaker timed out

2017-08-11 Thread Srini Seetharaman
Muthu,
It's worrisome to hear that you've seen this too. Did it go away with
Nitrogen or with moving to Akka 2.5 persistence?

I am referring to the following params within the persistence section of
akka.conf:

 circuit-breaker {
   max-failures = 10
   call-timeout = 10s
   reset-timeout = 30s
 }
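Here is a rough illustration of what the three settings control, as a simplified Java model of a circuit breaker using the values above (max-failures = 10, call-timeout = 10s, reset-timeout = 30s). It is a sketch, not Akka's akka.pattern.CircuitBreaker. The class and method names are hypothetical, and the clock is injected so the example needs no sleeps. The model simplifies by classifying a call after the fact from its elapsed time, whereas the real breaker fails the call as soon as call-timeout expires.

The "Circuit Breaker Timed out." error in the log corresponds to a single call exceeding call-timeout. That alone is enough to fail that persist. Only after max-failures consecutive failures does the breaker open and fail calls fast until reset-timeout has elapsed.

```java
import java.time.Duration;
import java.util.function.LongSupplier;

/**
 * Hypothetical, simplified model of a circuit breaker driven by the
 * max-failures, call-timeout and reset-timeout settings. This is NOT
 * akka.pattern.CircuitBreaker; it only illustrates the state transitions.
 */
public class CircuitBreakerSketch {

    enum State { CLOSED, OPEN, HALF_OPEN }

    private final int maxFailures;          // max-failures
    private final long callTimeoutMillis;   // call-timeout
    private final long resetTimeoutMillis;  // reset-timeout
    private final LongSupplier clockMillis; // injected clock, so no sleeping is needed

    private State state = State.CLOSED;
    private int failureCount = 0;
    private long openedAtMillis = 0;

    CircuitBreakerSketch(int maxFailures, Duration callTimeout,
                         Duration resetTimeout, LongSupplier clockMillis) {
        this.maxFailures = maxFailures;
        this.callTimeoutMillis = callTimeout.toMillis();
        this.resetTimeoutMillis = resetTimeout.toMillis();
        this.clockMillis = clockMillis;
    }

    State state() {
        return state;
    }

    /** Returns true if a call may proceed; while OPEN, calls fail fast. */
    boolean allowCall() {
        if (state == State.OPEN
                && clockMillis.getAsLong() - openedAtMillis >= resetTimeoutMillis) {
            state = State.HALF_OPEN; // reset-timeout elapsed: let one trial call through
            return true;
        }
        return state == State.CLOSED;
    }

    /** Records the outcome of a call that allowCall() let through. */
    void recordCall(long elapsedMillis, boolean threw) {
        // A call slower than call-timeout counts as a failure.
        boolean failed = threw || elapsedMillis > callTimeoutMillis;
        if (!failed) {
            state = State.CLOSED; // success (re)closes the breaker and resets the count
            failureCount = 0;
            return;
        }
        failureCount++;
        if (state == State.HALF_OPEN || failureCount >= maxFailures) {
            state = State.OPEN;   // trip: later calls fail fast
            openedAtMillis = clockMillis.getAsLong();
        }
    }

    public static void main(String[] args) {
        long[] now = {0}; // simulated clock in milliseconds
        CircuitBreakerSketch breaker = new CircuitBreakerSketch(
                10, Duration.ofSeconds(10), Duration.ofSeconds(30), () -> now[0]);

        // Ten journal writes in a row, each taking 12 s (longer than call-timeout).
        for (int i = 0; i < 10; i++) {
            if (breaker.allowCall()) {
                breaker.recordCall(12_000, false);
            }
        }
        System.out.println(breaker.state());     // OPEN
        System.out.println(breaker.allowCall()); // false: failing fast

        now[0] += 30_000; // reset-timeout elapses
        System.out.println(breaker.allowCall()); // true: trial call in HALF_OPEN
        breaker.recordCall(50, false);
        System.out.println(breaker.state());     // CLOSED
    }
}
```

Raising call-timeout widens the per-call window before a slow disk write is treated as a failure. max-failures and reset-timeout only govern when the breaker starts and stops failing calls fast.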



On Thu, Aug 10, 2017 at 10:17 PM, Muthukumaran K <
muthukumara...@ericsson.com> wrote:

> Hi Tom, Srini,
>
>
>
> We have also noticed this with Boron, very sporadically, even without any
> explicit action taken on the shard like Srini did.
>
>
>
> Srini,
>
>
>
> Are you referring to “journal-plugin-fallback” from
> http://doc.akka.io/docs/akka/current/scala/general/configuration.html#config-akka-persistence ?
>
>
>
> Regards
>
> Muthu
>
>
>
> *From:* controller-dev-boun...@lists.opendaylight.org [mailto:
> controller-dev-boun...@lists.opendaylight.org] *On Behalf Of *Srini
> Seetharaman
> *Sent:* Friday, August 11, 2017 9:40 AM
> *To:* Tom Pantelis
> *Cc:* controller-dev@lists.opendaylight.org
> *Subject:* Re: [controller-dev] Circuit Breaker timed out
>
>
>
> Thanks Tom. I will investigate further why the local disk operation
> failed. It seems strange, though, because I haven't seen anything in dmesg.
>
>
>
> The default value for the call-timeout is 10s in akka.conf.
>
>
>
> On Thu, Aug 10, 2017 at 3:20 PM, Tom Pantelis 
> wrote:
>
> That error is from Akka persistence. It happens if the backend
> persistence plugin doesn't respond in time. I've only seen this in a
> CSIT environment whose disk activity was overloaded. The timeouts can be
> tweaked; I don't recall exactly what they are, but you can find them in the
> Akka docs (their names contain circuit-breaker).
>
>
>
> On Thu, Aug 10, 2017 at 6:01 PM, Srini Seetharaman <
> srini.seethara...@gmail.com> wrote:
>
> Hi Tom,
>
> In our ODL deployment, which is running in standalone mode with operational
> store persistence enabled, we saw the following error being printed. Once
> the member-1-shard-default-operational shard is shut down, all write
> transactions after that fail and the system becomes unstable. At this point,
> we were probably doing fewer than 10 transactions per second. Any idea what
> is causing this? Has anyone seen this before?
>
>
>
>
>
> 2017-08-07 19:15:59,622 | ERROR | lt-dispatcher-23 | Shard | 176 - com.typesafe.akka.slf4j - 2.4.7 | Failed to persist event type [org.opendaylight.controller.cluster.raft.ReplicatedLogImplEntry] with sequence number [9897493] for persistenceId [member-1-shard-default-operational].
>
> akka.pattern.CircuitBreaker$$anon$1: Circuit Breaker Timed out.
>
> 2017-08-07 19:15:59,628 | INFO  | lt-dispatcher-24 | Shard | 188 - org.opendaylight.controller.sal-akka-raft - 1.4.2.Boron-SR2 | Stopping Shard member-1-shard-default-operational
>
> 2017-08-07 19:15:59,629 | ERROR | lt-dispatcher-23 | LocalThreePhaseCommitCohort | 193 - org.opendaylight.controller.sal-distributed-datastore - 1.4.2.Boron-SR2 | Failed to prepare transaction member-1-datastore-operational-fe-5-txn-791019 on backend
>
> java.lang.RuntimeException: Transaction aborted due to shutdown.
>     at org.opendaylight.controller.cluster.datastore.ShardCommitCoordinator.abortPendingTransactions(ShardCommitCoordinator.java:399)[193:org.opendaylight.controller.sal-distributed-datastore:1.4.2.Boron-SR2]
>     at org.opendaylight.controller.cluster.datastore.Shard.postStop(Shard.java:211)[193:org.opendaylight.controller.sal-distributed-datastore:1.4.2.Boron-SR2]
>     at akka.actor.Actor$class.aroundPostStop(Actor.scala:494)[175:com.typesafe.akka.actor:2.4.7]
>     at akka.persistence.UntypedPersistentActor.akka$persistence$Eventsourced$$super$aroundPostStop(PersistentActor.scala:168)[181:com.typesafe.akka.persistence:2.4.7]
>     at akka.persistence.Eventsourced$class.aroundPostStop(Eventsourced.scala:223)[181:com.typesafe.akka.persistence:2.4.7]
>     at akka.persistence.UntypedPersistentActor.aroundPostStop(PersistentActor.scala:168)[181:com.typesafe.akka.persistence:2.4.7]
>     at akka.actor.dungeon.FaultHandling$class.akka$actor$dungeon$FaultHandling$$finishTerminate(FaultHandling.scala:210)[175:com.typesafe.akka.actor:2.4.7]
>     at akka.actor.dungeon.FaultHandling$class.handleChildTerminated(FaultHandling.scala:293)[175:com.typesafe.akka.actor:2.4.7]
>     at akka.actor.ActorCell.handleChildTerminated(ActorCell.scala:374)[175:com.typesafe.akka.actor:2.4.7]
>     at akka.actor.dungeon.DeathWatch$class.watchedActorTerminated(DeathWatch.scala:61)[175:com.typesafe.akka.actor:2.4.7]
>     at akka.actor.ActorCell.watchedActorTerminated(ActorCell.scala:374)[175:com.typesafe.akka.actor:2.4.7]
>     at akka.actor.ActorCell.invokeAll$1(ActorCell.scala:460)[175:com.typesafe.akka.actor:2.4.7]
>     at akka.actor.ActorCell.systemInvoke(ActorCell.scala:
>