[openstack-dev] [oslo]Introduction of new driver for oslo.messaging

2017-03-31 Thread Deja, Dawid
Hi all,

To work around issues with RabbitMQ scalability, we'd like to introduce
a new driver in oslo.messaging that has nearly no scaling limits[1].
We'd like to have as many eyes on this as possible, since we believe
that this is the technology of the future. Thanks for all reviews.

Dawid Deja

[1] https://review.openstack.org/#/c/452219/
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [mistral] Proposing Michal Gershenzon to the core team

2017-03-01 Thread Deja, Dawid
+1

Michal, remember that with great power comes great responsibility :)

Thanks,
Dawid

On Wed, 2017-03-01 at 19:47 +0300, Renat Akhmerov wrote:
Hi,

Based on the stats of Michal Gershenzon in the Ocata cycle I’d like to promote
her to the core team.
Michal works at Nokia CloudBand and, being a CloudBand engineer, she knows
Mistral very well as a user and behind the scenes helped find a lot of bugs
and make countless improvements, especially in performance.

Overall, she is a deep thinker, cares about details, and always has an unusual
angle of view on any technical problem. She is one of the few people I’m aware
of whom I could call a Mistral expert.
She also participates in almost every community meeting in IRC.

In Ocata she improved her statistics pretty significantly (e.g. ~60 reviews
although the cycle was very short) and is keeping up the good pace now. Also,
Michal is officially planning to allocate more time for upstream development
in Pike.

I believe Michal would be a great addition for the Mistral core team.

Please let me know if you agree with that.

Thanks

[1] 
http://stackalytics.com/?module=mistral-group=ocata_id=michal-gershenzon

Renat Akhmerov
@Nokia




Re: [openstack-dev] [mistral] Meeting Time

2016-11-23 Thread Deja, Dawid
On Tue, 2016-11-22 at 16:29 +, Dougal Matthews wrote:
> 
> 
> On 22 November 2016 at 11:11, Deja, Dawid <dawid.d...@intel.com>
> wrote:
> > Dougal, Renat
> > 
> > If we want to have another meeting that would suit mostly New
> > Zealand
> > and US, can we move current meeting time to slightly earlier, so it
> > fits India better (and also may be more comfortable for people in
> > Europe)?
> 
> That should be fine with me. How much earlier would it need to be?

I would not say that I 'need' it to be earlier. That was just a thought
so we can make it easier for folks from India to participate.


Re: [openstack-dev] [mistral] Meeting Time

2016-11-22 Thread Deja, Dawid
Dougal, Renat

If we want to have another meeting that would suit mostly New Zealand
and the US, can we move the current meeting time slightly earlier, so it
fits India better (and also may be more comfortable for people in
Europe)?

Dawid Deja

On Tue, 2016-11-22 at 17:45 +0700, Renat Akhmerov wrote:
> Dougal,
> 
> I think we should have this alternate meeting. The reason is although
> the current time doesn’t work only for 2-3 people those people are
> very important for the project (I primarily mean Lingxian (kong) and
> Winson (m4dcoder)). As far as I understand, for Hardik Parekh
> (hparekh) the current time also doesn’t work well.
> 
> I’m volunteering to participate in this alternate meeting even if
> it’s not too comfortable for me.
> 
> 
> Renat Akhmerov
> @Nokia
> 
> > On 21 Nov 2016, at 21:14, Dougal Matthews 
> > wrote:
> > 
> > 
> > 
> > On 7 November 2016 at 16:20, Dougal Matthews 
> > wrote:
> > > Hi all,
> > > 
> > > We want to make sure that the Mistral meeting time is as good as
> > > possible, so that everyone interested can attend. To do that,
> > > please add your name/nick and the times that suit you to this
> > > etherpad:
> > > 
> > > https://etherpad.openstack.org/p/mistral-meeting-time
> > > 
> > > If we are really lucky, we will find a time in that for everyone.
> > > If we are slightly lucky we will find two time slots that cover
> > > everyone and the meeting can alternate.
> > > 
> > > We may find that the current time is best for everyone, but I
> > > wanted to make sure.
> > > 
> > > Cheers,
> > > Dougal
> > > 
> > 
> > Hey all,
> > 
> > Thanks to everyone that added their names to the etherpad. It looks
> > like the current time does work for most people. How else can we
> > reach out and increase participation?
> > 
> > I would like to propose that we alternate the meetings each week
> > between the normal time and a time later in the day. This assumes
> > we have a volunteer that is able to regularly attend this meeting
> > and start it - if not, we can just stick with the current meeting
> > time. The other time should be arranged so it suits rakhmerov, kong
> > and m4dcoder (at least).
> > 
> > Hopefully with two alternating meeting slots we can involve more
> > people in the meetings.
> > 
> > I'll bring this up today in the meeting too.
> > 
> > Thanks,
> > Dougal


Re: [openstack-dev] [mistral] Promoting Dougal Matthews to the core team

2016-11-08 Thread Deja, Dawid
+1

It's good to have you on board Dougal!

Dawid Deja

On Tue, 2016-11-08 at 19:46 +1300, Lingxian Kong wrote:
+1 of course!


Cheers,
Lingxian Kong (Larry)

On Tue, Nov 8, 2016 at 6:09 PM, Renat Akhmerov wrote:
Hi,

I’d like to promote Dougal Matthews to the Mistral core team. [1] shows 
Dougal’s Mistral contribution summary for Newton cycle.

Here are the reasons why I would like to see Dougal in the core team:

  *   He reviews a lot and provides valuable comments, especially when it comes
      to discussing the design of new features
  *   He sent 18 patches in the Newton cycle, which may not be a big number, but
      it was his first full cycle in Mistral and I believe he can do more
  *   He is one of the most active team members in general and in IRC
      specifically, always open for communication and easy to talk to
  *   He is an active user of Mistral, which is very important for me since he’s
      capable of providing valuable practical feedback on design, usability,
      reliability etc.
  *   He seems to be very excited about working on Mistral

Besides that I believe Dougal makes a good friendly atmosphere in our 
development team daily in our IRC channel and our weekly IRC meetings.

Team, I would ask you to support me in this promotion.

Thanks

[1] 
http://stackalytics.com/?module=mistral-group_id=d0ugal=newton=marks

Renat Akhmerov
@Nokia




Re: [openstack-dev] [mistral] Who's interested in attending PTG?

2016-10-13 Thread Deja, Dawid
On Thu, 2016-10-06 at 19:21 +0700, Renat Akhmerov wrote:
Hi,

As you likely know, the summit format will change after Barcelona. There will 
be two events now: PTG (Project Team Gathering) for design sessions and 
OpenStack Summit which is more customer/promotion oriented. The first will 
take place in Atlanta on Feb 20-24, 2017 and the second one in May 2017 in 
Boston. More about that at [1]. Please read it.


Hi,

I'd like to go to the PTG in Atlanta but I can't tell right now if I'll have 
budget approved for this event.

Thanks,
Dawid Deja



[openstack-dev] [mistral] Cancelling team meeting - 10/10/2016

2016-10-10 Thread Deja, Dawid
Hi,

We decided to cancel today’s team meeting because most people are either busy
with release activities or on holidays. You can share your status in a reply
to this mail.

My status: I have uploaded a spec for workflow preconditions[1], please take a
look when you have free time.

Thanks,
Dawid Deja

[1] https://review.openstack.org/#/c/384467/


[openstack-dev] [mistral] Team meeting reminder - 08/15/2016

2016-08-15 Thread Deja, Dawid
Hi,

This is a reminder that we’ll have a team meeting today in #openstack-meeting
at 16:00 UTC.

Agenda:

  *   Review action items
  *   Current status (progress, issues, roadblocks, further plans)
  *   Open discussion

Dawid Deja


Re: [openstack-dev] [mistral] Promoting Dawid Deja to core reviewers

2016-08-01 Thread Deja, Dawid
Thank you all! I'll do my best to provide good reviews and make Mistral better.

Regards,
Dawid Deja

On Mon, 2016-08-01 at 11:05 +0700, Renat Akhmerov wrote:
Team, thank you for your support!

Dawid, welcome to the Mistral core team :)

You now can vote +2 and approve patches. Use them wisely!

Renat Akhmerov
@Nokia

On 01 Aug 2016, at 10:59, Hardik wrote:

+1 , Nice work Dawid !

Thanks and Regards,
Hardik Parekh

On Sunday 31 July 2016 06:56 AM, Lingxian Kong wrote:
+1, good job, Dawid!

Regards!
---
Lingxian Kong


On Sat, Jul 30, 2016 at 10:59 PM, Elisha, Moshe (Nokia - IL) wrote:
Hi,

I am not a core reviewer but having met Dawid in person and working closely
with him on some important bug fixes – I fully support the idea.

From: Anastasia Kuznetsova
Reply-To: "OpenStack Development Mailing List (not for usage questions)"
Date: Friday, 29 July 2016 at 15:53
To: "OpenStack Development Mailing List (not for usage questions)"
Subject: Re: [openstack-dev] [mistral] Promoting Dawid Deja to core reviewers

Renat,

I fully support Dawid's promotion! Here is my +1 for Dawid.

Dawid,

I will be glad to see you in the Mistral core team.

On Fri, Jul 29, 2016 at 2:39 PM, Renat Akhmerov wrote:
Hi,

I’d like to promote Dawid Deja working at Intel (ddeja in IRC) to Mistral
core reviewers.

The reason why I want to see Dawid in the core team is that he provides
amazing, very thorough reviews. Just by looking at a few of them I was able to
conclude that he knows the system architecture very well, although he started
contributing actively not so long ago. He always sees things deeply, can
examine a problem from different angles, and demonstrates a solid technical
background in general. He is in the top 5 reviewers now by number of reviews
and the only one who still doesn’t have core status. He also implemented
several very important changes during the Newton cycle. Some of them were in
progress for more than a year (flexible RPC) but Dawid helped to knock them
down elegantly.

Besides purely professional skills that I just mentioned I also want to
say that it’s a great pleasure to work with
Dawid. He’s a bright cheerful guy and a good team player.

Dawid’s statistics are here:
http://stackalytics.com/?module=mistral-group=commits_id=dawid-deja-0


I’m hoping for your support in making this promotion.

Thanks

Renat Akhmerov
@Nokia





--
Best regards,
Anastasia Kuznetsova



Re: [openstack-dev] [oslo][mistral] Saga of process than ack and where can we go from here...

2016-06-03 Thread Deja, Dawid
On Thu, 2016-05-05 at 11:08 +0700, Renat Akhmerov wrote:

On 05 May 2016, at 01:49, Mehdi Abaakouk wrote:


Le 2016-05-04 10:04, Renat Akhmerov a écrit :
No problem. Let’s not call it RPC (btw, I completely agree with that).
But it’s one of the messaging patterns and hence should be under
oslo.messaging I guess, no?

Yes and no, we currently have two APIs (rpc and notification). And
personally I regret having the notification part in oslo.messaging.

RPC and Notification are different beasts, and both are today limited
in terms of features because they share the same driver implementation.

Our RPC error handling is really poor, for example Nova just puts an
instance in ERROR when something bad occurs in the oslo.messaging layer.
This forces the deployer/user to fix the issue manually.

Our Notification system doesn't allow fine-grained routing of messages,
everything goes into one configured topic/queue.

And now we want to add a new one... I'm not against this idea,
but I'm not a huge fan.

Thoughts from folks (mistral and oslo)?
Also, I was not at the Summit, should I conclude that the Tooz+taskflow approach
(which ensures the idempotency of the application within the library API) has
not been accepted by Mistral folks?
Speaking about idempotency, IMO it’s not a central question that we
should be discussing here. Mistral users should have a choice: if they
manage to make their actions idempotent it’s excellent, in many cases
idempotency is certainly possible, btw. If no, then they know about
potential consequences.

You shouldn't mix the idempotency of the user task and the idempotency
of a Mistral action (that will in the end run the user task).
You can make your Mistral task runner implementation idempotent and just
make the workflow to use configurable for the case where the user task is
interrupted or finishes badly, whether the user task is idempotent or not.
This makes things very predictable. You will know for example:
* if the user task has started or not,
* if the error is due to a node power cut while the user task was running,
* if you can safely retry a non-idempotent user task on another node,
* that you will not be impacted by a rabbitmq restart or TCP connection issues,
* ...

With the oslo.messaging approach, everything will just end up in a
generic MessageTimeout error.

The RPC API already has this kind of issue. Applications have unfortunately
dealt with that (and I think they want something better now).
I'm just not convinced we should add a new "working queue" API in
oslo.messaging for task scheduling that has the same issue we already
have with RPC.

Anyway, that's your choice, if you want to rely on this poor structure, I will
not be against it, I'm not involved in Mistral. I just want everybody to be
aware of this.

And even in this case there’s usually a number
of measures that can be taken to mitigate those consequences (rerunning
workflows from certain points after manually fixing problems, rollback
scenarios etc.).

taskflow makes it really easy to describe and automate this kind of workflow.

What I’m saying is: let’s not make that crucial decision now about
what a messaging framework should support or not, let’s make it more
flexible to account for variety of different usage scenarios.

I think the confusion is in the "messaging" keyword, currently oslo.messaging
is a "RPC" framework and a "Notification" framework on top of 'messaging'
frameworks.

The messaging frameworks we use are 'kombu', 'pika', 'zmq' and 'pyngus'.

It’s normal for frameworks to give more rather than less.

I disagree, here we mix different concepts into one library, and all concepts
have to be implemented by different 'messaging frameworks'.
So we fortunately give less to make things just work in the same way with all
drivers for all APIs.

One more thing, at the summit we were discussing the possibility to
define at-most-once/at-least-once individually for Mistral tasks. This
is demanded because there are cases where we need to do it; advanced users
may choose one or another depending on a task/action semantics.
However, it won’t be possible to implement w/o changes in the
underlying messaging framework.

If we go that way, oslo.messaging users and Mistral users have to be aware
that their job/task/action/whatever will perhaps not be called (at-most-once)
or perhaps be called twice (at-least-once).
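
To make the two delivery modes concrete, here is a minimal sketch using a toy in-memory queue. It is not the oslo.messaging API; `InMemoryQueue`, `consume` and `run` are hypothetical names invented for illustration. It assumes the only difference between the modes is whether the consumer acknowledges a message before or after processing it.

```python
from collections import deque


class WorkerCrash(Exception):
    """Simulates a worker process dying."""


class InMemoryQueue:
    """Toy stand-in for a broker queue (hypothetical, not oslo.messaging)."""

    def __init__(self, messages):
        self._ready = deque(messages)
        self._unacked = []

    def get(self):
        # Hand out the next message and remember it until it is acked.
        msg = self._ready.popleft()
        self._unacked.append(msg)
        return msg

    def ack(self, msg):
        self._unacked.remove(msg)

    def requeue_unacked(self):
        # Roughly what a broker does when a consumer's connection drops.
        while self._unacked:
            self._ready.appendleft(self._unacked.pop())

    def __len__(self):
        return len(self._ready)


def consume(queue, handler, ack_first):
    msg = queue.get()
    if ack_first:
        queue.ack(msg)   # at-most-once: a crash in handler loses the message
        handler(msg)
    else:
        handler(msg)     # at-least-once: a crash before ack causes redelivery
        queue.ack(msg)


def run(ack_first, crash_before_work):
    """Deliver one task to a worker that crashes once, then to a healthy one."""
    executed = []
    state = {"crashed": False}

    def handler(msg):
        if not state["crashed"]:
            state["crashed"] = True
            if not crash_before_work:
                executed.append(msg)   # work done, crash before returning
            raise WorkerCrash()
        executed.append(msg)

    queue = InMemoryQueue(["task-1"])
    while len(queue):
        try:
            consume(queue, handler, ack_first)
        except WorkerCrash:
            queue.requeue_unacked()
    return executed
```

In this sketch, `run(ack_first=True, crash_before_work=True)` returns an empty list (the task is never executed), while `run(ack_first=False, crash_before_work=False)` returns the task twice.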

The oslo.messaging/Mistral API and docs must be clear about this behavior, to
avoid having bugs opened against oslo.messaging because a script written via
the Mistral API is not executed as expected "sometimes".
"sometimes" == when deployers have trouble with their rabbitmq (or whatever)
broker, or even just when a deployer restarts a broker node or when a TCP
issue occurs. In the end the backtrace in these cases always shows only an
oslo.messaging trace (the well known MessageTimeout...).


Also oslo.messaging is already a fragile brick used by everybody that a very 
small subset of people maintain (thanks to them).

I'm afraid 

Re: [openstack-dev] [nova][mistral] Automatic evacuation as a long running task

2015-10-08 Thread Deja, Dawid
Hi Matthew,

Thanks for shedding some light on the problems nova has with evacuation of an
instance. It is very important to have those limitations in mind when preparing
the final solution. Or to fix them, as you proposed.

Nevertheless, I would say that evacuationD does more than what calling 'nova
host-evacuate' does. Let's consider the following scenario:

1. Call 'nova host-evacuate HostX'
2. Caller dies during the call - the information that some VMs are still to be
evacuated is lost.

Such a thing would not happen with evacuationD, because it prepares one RabbitMQ
message for each VM that needs to be evacuated. Moreover, it deals with the
situation when the process that lists VMs crashes. In such a case, the whole
operation would be continued by another daemon.
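
A rough sketch of that fan-out, with a plain deque standing in for the RabbitMQ queue. The function names, the message layout and the `inventory` dict are assumptions made for illustration; they are not taken from evacuationD's code.

```python
import json
from collections import deque


def list_vms_on_host(host, inventory):
    # Hypothetical lookup; a real daemon would ask Nova for this list.
    return inventory.get(host, [])


def evacuate_host(host, inventory, queue):
    """Turn one 'evacuate host' job into one queued message per VM."""
    for vm_id in list_vms_on_host(host, inventory):
        queue.append(json.dumps({"action": "evacuate_vm", "vm": vm_id, "host": host}))


def drain(queue, evacuate_vm, max_messages=None):
    """Process pending per-VM messages; max_messages simulates a worker dying early."""
    handled = 0
    while queue and (max_messages is None or handled < max_messages):
        message = json.loads(queue[0])
        evacuate_vm(message["vm"])
        queue.popleft()  # remove the message only after the work is done
        handled += 1
    return handled


def demo():
    inventory = {"HostX": ["vm-1", "vm-2", "vm-3"]}
    queue = deque()
    evacuated = []
    evacuate_host("HostX", inventory, queue)
    drain(queue, evacuated.append, max_messages=1)  # first worker dies after one VM
    drain(queue, evacuated.append)                  # another worker picks up the rest
    return evacuated
```

Because the remaining work lives in the queue rather than in the caller's memory, the second `drain` call can finish what the first one left behind.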

EvacD may also handle another problem that you mentioned: failure of the target
host of an evacuation. In such a scenario, an 'evacuate host' message will be
sent for the new host and EvacD will try to evacuate all of its VMs - even those
in rebuild state. Of course, evacuation of such instances fails, but they would
eventually enter error state and evacuationD would start the resurrection
process. This can be sped up by setting the instances' state to 'error' (except
for those which are in 'active' state) at the beginning of the whole 'evacuate
host' process.

Finally, another action - called 'Look for VM' - could be added. It would check
whether a given VM ended up in active state on a new host; if not, the VM could
be rebuilt. I hope this would give us as much certainty as possible that the VM
is alive.

Dawid

On Tue, 2015-10-06 at 16:34 +0100, Matthew Booth wrote:
Hi, Roman,

Evacuated has been on my radar for a while and this post has prodded me to take 
a look at the code. I think it's worth starting by explaining the problems in 
the current solution. Nova client is currently responsible for doing this 
evacuate. It does:

1. List all instances on the source host
2. Initiate evacuate for each instance

Evacuating a single instance does:

API:
1. Set instance task state to rebuilding
2. Create a migration record with source and dest if specified

Conductor:
3. Call the scheduler to get a destination host if not specified
4. Get the migration object from the db

Compute:
5. Rebuild the instance on dest
6. Update instance.host to dest

Examining single instance evacuation, the first obvious thing to look at is 
what if 2 happen simultaneously. Because step 1 is atomic, it should not be 
possible to initiate 2 evacuations simultaneously of a single instance. 
However, note that this atomic action hasn't updated the instance host, meaning 
the source host remains the owner of this instance. If the evacuation process 
fails to complete, the source host will automatically delete it if it comes 
back up because it will find a migration record, but it will not be rebuilt 
anywhere else. Evacuating it again will fail, because its task state is already 
rebuilding.
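
The hole described above can be sketched with a toy model. The class and field names are illustrative only and are not Nova's data model; the sketch assumes the task state is set with an atomic compare-and-set and that a failed rebuild leaves both the task state and the host untouched.

```python
import threading


class InstanceRecord:
    """Toy model of an instance row (hypothetical, not Nova's schema)."""

    def __init__(self, host):
        self.host = host
        self.task_state = None
        self._lock = threading.Lock()

    def try_start_rebuild(self):
        # Atomic compare-and-set: only one caller can move None -> 'rebuilding'.
        with self._lock:
            if self.task_state is not None:
                return False
            self.task_state = "rebuilding"
            return True


def evacuate(instance, dest, fail_midway=False):
    if not instance.try_start_rebuild():
        return "rejected"
    if fail_midway:
        # The rebuild never completes: task_state stays 'rebuilding' and
        # host still names the source, so a later retry is refused.
        return "stuck"
    instance.host = dest
    instance.task_state = None
    return "evacuated"
```

After `evacuate(instance, "dest", fail_midway=True)`, a second `evacuate(instance, "dest")` is rejected and `instance.host` still points at the source host.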

Also, let's imagine that the conductor crashes. There is not enough state for 
any tool, whether internal or external, to be able to know if the rebuild is 
ongoing somewhere or not, and therefore whether it is safe to retry even if 
that retry would succeed, which it wouldn't.

Which is to say that we can't currently robustly evacuate one instance!

Looking at the nova client side, there is an obvious race there: there is no 
guarantee in step 2 that instances returned in step one have not already been 
evacuated by another process. We're protected here, though because evacuating a 
single instance twice will fail the second time. Note that the process isn't 
idempotent, though, because an evacuation which falls into a hole will never be 
retried.

Moving on to what evacuated does. Evacuated uses rabbit to distribute jobs 
reliably. There are 2 jobs in evacuated:

1. Evacuate host:
  1.1 Get list of all instances on the source host from Nova
  1.2 Send an evacuate vm job for each instance
2. Evacuate vm:
  2.1 Tell Nova to start evacuating an instance

Because we're using rabbit as a reliable message bus, the initiator of one of 
the tasks knows that it will eventually run to completion at least once. Note 
that there's nothing to prevent the task being executed more than once per 
call, though. A task may crash before sending an ack, or may just be really 
slow. However, in both cases, for exactly the same reasons as for the 
implementation in nova client, running more than once should not race. It is 
still not idempotent, though, again for exactly the same reasons as nova client.

Also notice that, exactly as in the nova client implementation, we are not 
asserting that an instance has been evacuated. We are only asserting that we 
called nova.evacuate, which is to say that we got as far as step 2 in the 
evacuation sequence above.

In other words, in terms of robustness, calling evacuated's evacuate host is 
identical to asserting that nova client's evacuate host ran to completion at 
least once, which is quite a lot simpler to do. That's