Re: [openstack-dev] [Marconi] Why is marconi a queue implementation vs a provisioning API?

2014-03-27 Thread Matt Asay
In response to Gil Yehuda's comments on MongoDB and the AGPL (here 
http://lists.openstack.org/pipermail/openstack-dev/2014-March/030510.html), I 
understand the concern about the AGPL. But in this case it's completely, 
absolutely unfounded. As mentioned earlier, MongoDB Inc. wants people to use 
MongoDB, the project. That's why we wrapped the server code (AGPL) in an Apache 
license (drivers). Basically, for 99.999% of the world's population, you can 
use MongoDB under the cover of the Apache license. If you'd like more 
assurance, we're happy to provide it. 

We want people using the world's most popular NoSQL database with the world's 
most popular open source cloud (OpenStack). I think our track record on this is 
100% in the affirmative.
___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [Marconi] Why is marconi a queue implementation vs a provisioning API?

2014-03-20 Thread Mark McLoughlin
On Thu, 2014-03-20 at 01:28 +, Joshua Harlow wrote:
 Proxying from Yahoo's Open Source director (since he wasn't initially
 subscribed to this list; AFAIK he now is) on his behalf.
 
 From Gil Yehuda (Yahoo’s Open Source director).
 
 I would urge you to avoid creating a dependency between OpenStack code
 and any AGPL project, including MongoDB. MongoDB is licensed in a very
 strange manner that is prone to creating unintended licensing mistakes
 (a lawyer’s dream). Indeed, MongoDB itself presents Apache licensed
 drivers – and thus technically, users of those drivers are not
 impacted by the AGPL terms. MongoDB Inc. is in the unique position to
 license their drivers this way (although they appear to violate the
 AGPL license) since MongoDB is not going to sue themselves for their
 own violation. However, others in the community who create MongoDB
 drivers are licensing those drivers under the Apache and MIT licenses –
 which does pose a problem.
 
 Why? The AGPL considers 'Corresponding Source' to be defined as “the
 source code for shared libraries and dynamically linked subprograms
 that the work is specifically designed to require, such as by intimate
 data communication or control flow between those subprograms and other
 parts of the work.” Database drivers *are* work that is designed to
 require by intimate data communication or control flow between those
 subprograms and other parts of the work. So anyone using MongoDB with
 any other driver now invites an unknown --  that one court case, one
 judge, can read the license under its plain meaning and decide that
 AGPL terms apply as stated. We have no way to know how far they apply
 since this license has not been tested in court yet.
 Despite all the FAQs MongoDB puts on their site indicating they don't
 really mean to assert the license terms, normally when you provide a
 license, you mean those terms. If they did not mean those terms, they
 would not use this license. I hope they intended to do something good
 (to get contributions back without impacting applications using their
 database) but, even good intentions have unintended consequences.
 Companies with deep enough pockets to be lawsuit targets, and
 companies who want to be good open source citizens face the problem
 that using MongoDB anywhere invites the future risk of legal
 catastrophe. A simple development change in an open source project can
 change the economics drastically. This is simply unsafe and unwise.
 
 OpenStack's ecosystem is fueled by the interests of many commercial
 ventures who wish to cooperate in the open source manner, but then
 leverage commercial opportunities they hope to create. I suggest that
 using MongoDB anywhere in this project will result in a loss of
 opportunity -- real or perceived, that would outweigh the benefits
 MongoDB itself provides.
 
 tl;dr version: If you want to use MongoDB in your company, that's your
 call. Please don't turn anyone who uses OpenStack components into
 unsuspecting MongoDB users. Instead, decouple the database from the
 project. It's not worth the legal risk, nor the impact on the
 Apache-ness of this project.

Thanks for that, Josh and Gil.

Rather than cross-posting, I think this MongoDB/AGPLv3 discussion should
continue on the legal-discuss mailing list:

  http://lists.openstack.org/pipermail/legal-discuss/2014-March/thread.html#174

Bear in mind that we (OpenStack, as a project and community) need to
judge whether this is a credible concern or not. If some users said they
were only willing to deploy Apache licensed code in their organization,
we would dismiss that notion pretty quickly. Is this AGPLv3 concern
sufficiently credible that OpenStack needs to take it into account when
making important decisions? That's what I'm hoping to get to in the
legal-discuss thread.

Mark.




Re: [openstack-dev] [Marconi] Why is marconi a queue implementation vs a provisioning API?

2014-03-20 Thread Mark McLoughlin
On Wed, 2014-03-19 at 12:37 -0700, Devananda van der Veen wrote:
 Let me start by saying that I want there to be a constructive discussion
 around all this. I've done my best to keep my tone as non-snarky as I could
 while still clearly stating my concerns. I've also spent a few hours
 reviewing the current code and docs. Hopefully this contribution will be
 beneficial in helping the discussion along.

Thanks, I think it does.

 For what it's worth, I don't have a clear understanding of why the Marconi
 developer community chose to create a new queue rather than an abstraction
 layer on top of existing queues. While my lack of understanding there isn't
 a technical objection to the project, I hope they can address this in the
 aforementioned FAQ.
 
 The reference storage implementation is MongoDB. AFAIK, no integrated
 projects require an AGPL package to be installed, and from the discussions
 I've been part of, that would be a show-stopper if Marconi required
 MongoDB. As I understand it, this is why sqlalchemy support was required
 when Marconi was incubated. Saying Marconi also supports SQLA is
 disingenuous because it is a second-class citizen, with incomplete API
 support, is clearly not the recommended storage driver, and is going to be
 unusable at scale (I'll come back to this point in a bit).
 
 Let me ask this. Which back-end is tested in Marconi's CI? That is the
 back-end that matters right now. If that's Mongo, I think there's a
 problem. If it's SQLA, then I think Marconi should declare any features
 which SQLA doesn't support to be optional extensions, make SQLA the
 default, and clearly document how to deploy Marconi at scale with a SQLA
 back-end.
 
 
 Then there's the db-as-a-queue antipattern, and the problems that I have
 seen result from this in the past... I'm not the only one in the OpenStack
 community with some experience scaling MySQL databases. Surely others have
 their own experiences and opinions on whether a database (whether MySQL or
 Mongo or Postgres or ...) can be used in such a way _at_scale_ and not fall
 over from resource contention. I would hope that those members of the
 community would chime into this discussion at some point. Perhaps they'll
 even disagree with me!
 
 A quick look at the code around claim (which, it seems, will be the most
 commonly requested action) shows why this is an antipattern.
 
 The MongoDB storage driver for claims requires _four_ queries just to get a
 message, with a serious race condition (but at least it's documented in the
 code) if multiple clients are claiming messages in the same queue at the
 same time. For reference:
 
 https://github.com/openstack/marconi/blob/master/marconi/queues/storage/mongodb/claims.py#L119
 
 The SQLAlchemy storage driver is no better. It's issuing _five_ queries
 just to claim a message (including a query to purge all expired claims
 every time a new claim is created). The performance of this transaction
 under high load is probably going to be bad...
 
 https://github.com/openstack/marconi/blob/master/marconi/queues/storage/sqlalchemy/claims.py#L83
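
 For illustration, the multi-statement claim flow criticized above can be
 sketched against SQLite. The schema and queries here are simplified,
 hypothetical stand-ins (not Marconi's actual code); the point is that one
 claim costs five statements in one transaction, which is exactly where the
 contention under concurrent claimers comes from.

```python
# Illustrative db-as-a-queue "claim": five statements per claim, all in
# one transaction. Schema and queries are hypothetical, not Marconi's.
import sqlite3
import time
import uuid

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE messages (id TEXT PRIMARY KEY, queue TEXT, body TEXT, claim_id TEXT);
CREATE TABLE claims (id TEXT PRIMARY KEY, queue TEXT, expires REAL);
""")

def claim_messages(queue, ttl=60, limit=10):
    now = time.time()
    claim_id = uuid.uuid4().hex
    with db:  # one transaction, several statements -- the contention point
        # 1. purge expired claims (done on every new claim)
        db.execute("DELETE FROM claims WHERE expires < ?", (now,))
        # 2. release messages whose claim was just purged
        db.execute("UPDATE messages SET claim_id = NULL "
                   "WHERE claim_id IS NOT NULL "
                   "AND claim_id NOT IN (SELECT id FROM claims)")
        # 3. record the new claim
        db.execute("INSERT INTO claims VALUES (?, ?, ?)",
                   (claim_id, queue, now + ttl))
        # 4. find unclaimed messages
        rows = db.execute(
            "SELECT id, body FROM messages "
            "WHERE queue = ? AND claim_id IS NULL LIMIT ?",
            (queue, limit)).fetchall()
        # 5. mark them claimed
        db.executemany("UPDATE messages SET claim_id = ? WHERE id = ?",
                       [(claim_id, mid) for mid, _ in rows])
    return claim_id, rows
```

 Between statements 4 and 5 a second claimer can select the same rows, which
 is the kind of race the MongoDB driver documents in its code.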
 
 Lastly, it looks like the Marconi storage drivers assume the storage
 back-end to be infinitely scalable. AFAICT, the mongo storage driver
 supports mongo's native sharding -- which I'm happy to see -- but the SQLA
 driver does not appear to support anything equivalent for other back-ends,
 eg. MySQL. This relegates any deployment using the SQLA backend to the
 scale of only what one database instance can handle. It's unsuitable for
 any large-scale deployment. Folks who don't want to use Mongo are likely to
 use MySQL and will be promptly bitten by Marconi's lack of scalability with
 this back end.
 
 While there is a lot of room to improve the messaging around what/how/why,
 and I think a FAQ will be very helpful, I don't think that Marconi should
 graduate this cycle because:
 (1) support for a non-AGPL-backend is a legal requirement [*] for Marconi's
 graduation;
 (2) deploying Marconi with sqla+mysql will result in an incomplete and
 unscalable service.
 
 It's possible that I'm wrong about the scalability of Marconi with sqla +
 mysql. If anyone feels that this is going to perform blazingly fast on a
 single mysql db backend, please publish a benchmark and I'll be very happy
 to be proved wrong. To be meaningful, it must have a high concurrency of
 clients creating and claiming messages with (num queues) < (num clients)
 < (num messages), and all clients polling on a reasonably short interval,
 based on whatever the recommended client-rate-limit is. I'd like the test
 to be repeated with both Mongo and SQLA back-ends on the same hardware for
 comparison.

My guess (and it's just a guess) is that the Marconi developers almost
wish their SQLA driver didn't exist after reading your email because of
the confusion it's causing. My understanding is that the SQLA driver is
not intended for production usage.

If Marconi just had a MongoDB driver, I think 

Re: [openstack-dev] [Marconi] Why is marconi a queue implementation vs a provisioning API?

2014-03-20 Thread Flavio Percoco

On 20/03/14 09:09 +, Mark McLoughlin wrote:

On Wed, 2014-03-19 at 12:37 -0700, Devananda van der Veen wrote:

Let me start by saying that I want there to be a constructive discussion
around all this. I've done my best to keep my tone as non-snarky as I could
while still clearly stating my concerns. I've also spent a few hours
reviewing the current code and docs. Hopefully this contribution will be
beneficial in helping the discussion along.


Thanks, I think it does.


Very helpful, Thanks!





My guess (and it's just a guess) is that the Marconi developers almost
wish their SQLA driver didn't exist after reading your email because of
the confusion it's causing. My understanding is that the SQLA driver is
not intended for production usage.


Yeah, pretty much the feeling now! :D

In a more 

Re: [openstack-dev] [Marconi] Why is marconi a queue implementation vs a provisioning API?

2014-03-20 Thread Malini Kamalambal

Let me start by saying that I want there to be a constructive discussion around 
all this. I've done my best to keep my tone as non-snarky as I could while 
still clearly stating my concerns. I've also spent a few hours reviewing the 
current code and docs. Hopefully this contribution will be beneficial in 
helping the discussion along.

For what it's worth, I don't have a clear understanding of why the Marconi 
developer community chose to create a new queue rather than an abstraction 
layer on top of existing queues. While my lack of understanding there isn't a 
technical objection to the project, I hope they can address this in the 
aforementioned FAQ.

The reference storage implementation is MongoDB. AFAIK, no integrated projects 
require an AGPL package to be installed, and from the discussions I've been 
part of, that would be a show-stopper if Marconi required MongoDB. As I 
understand it, this is why sqlalchemy support was required when Marconi was 
incubated. Saying Marconi also supports SQLA is disingenuous because it is a 
second-class citizen, with incomplete API support, is clearly not the 
recommended storage driver, and is going to be unusable at scale (I'll come 
back to this point in a bit).

Let me ask this. Which back-end is tested in Marconi's CI? That is the back-end 
that matters right now. If that's Mongo, I think there's a problem. If it's 
SQLA, then I think Marconi should declare any features which SQLA doesn't 
support to be optional extensions, make SQLA the default, and clearly document 
how to deploy Marconi at scale with a SQLA back-end.


[drivers]
storage = mongodb

[drivers:storage:mongodb]
uri = mongodb://localhost:27017/marconi



http://logs.openstack.org/94/81094/2/check/check-tempest-dsvm-marconi/c006285/logs/etc/marconi/marconi.conf.txt.gz



On a related note I see that marconi has no gating integration tests.
https://review.openstack.org/#/c/81094/2


But then again that is documented in 
https://wiki.openstack.org/wiki/Marconi/Incubation/Graduation#Legal_requirements
We have a devstack-gate job running and will be making it voting this week.


Of the non-gating integration test job, I only see one marconi test being run: 
tempest.api.queuing.test_queues.TestQueues.test_create_queue
 
http://logs.openstack.org/94/81094/2/check/check-tempest-dsvm-marconi/c006285/logs/testr_results.html.gz



I have a separate thread started on the graduation gating requirements w.r.t 
Tempest.
The single test we have in Tempest was a result of the one-liner requirement 
'Project must have a basic devstack-gate job set up'.
The subsequent discussion in the OpenStack QA meeting led me to believe that the 
'basic' job we have is good enough.
Please refer to the email 'Graduation Requirements + Scope of Tempest' for more 
details regarding this.

But that does not mean that 'the single tempest test' is all we have to verify 
the Marconi functionality.
We have had a robust test suite (unit & functional tests – with lots of 
positive & negative test scenarios) for a very long time in Marconi.
See 
http://logs.openstack.org/33/81033/2/check/gate-marconi-python27/35822df/testr_results.html.gz
These tests are run against a sqlite backend.
The gating tests have been using sqlalchemy driver ever since we have had it.
Hope that clarifies !

- Malini






Re: [openstack-dev] [Marconi] Why is marconi a queue implementation vs a provisioning API?

2014-03-20 Thread Clint Byrum
Excerpts from Flavio Percoco's message of 2014-03-19 03:01:19 -0700:
 FWIW, I think there's value in having an sqlalchemy driver. It's
 helpful for newcomers, it integrates perfectly with the gate, and I
 don't want to impose on other folks what they should or shouldn't use in
 production. Marconi may be providing a data API but it's still
 non-opinionated and it wants to support other drivers - or at least provide
 a nice way to implement them. Working on sqlalchemy instead of amqp (or
 redis) was decided in the incubation meeting.
 
 But again, It's an optional driver that we're talking about here. As
 of now, our recommended driver is mongodb's and, as I already mentioned
 in this email, we'll start working on an amqp-based one, which will likely
 become the recommended one. There's also support for redis.
 
 As already mentioned, we have plans to complete the redis driver and
 write an amqp based one and let them both live in the code base.
 Having support for different storage drivers makes Marconi's sharding
 feature more valuable.
 
 

Just to steer this back to technical development discussions a bit:

I suggest the sqla driver be removed. It will never be useful as a queue
backend. It will confuse newcomers because they'll see the schema and
think that it will work and then use it, and then they find out that SQL
is just not suitable for queueing about the time that they're taking a
fire extinguisher to their rack.

'Just use Redis' is pretty interesting as a counter to the concerns
about MongoDB's license situation. Redis, AFAIK, does not have many of the
features that make MongoDB attractive for backing a queue. The primary
one that I would cite is sharding. While MongoDB will manage sharding
for you, Redis works more like Memcached when you want to partition[1].
This is particularly problematic for an operational _storage_ product
as that means if you want to offline a node, you are going to have to
consider what kind of partitioning Marconi has used, and how it will
affect the availability and durability of the data.
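
To make the operational point concrete, here is a sketch of Memcached-style
client-side partitioning (the scheme referenced above). Node names, key names,
and the modulo placement are illustrative assumptions, not Redis or Marconi
code; the point is what happens when you offline a node.

```python
# Client-side partitioning sketch: the client, not the store, decides
# which node owns a key, by hashing the key and taking a modulo.
import hashlib

def node_for(key, nodes):
    """Pick a node by hashing the key -- simple modulo placement."""
    h = int(hashlib.md5(key.encode()).hexdigest(), 16)
    return nodes[h % len(nodes)]

nodes = ["redis-a", "redis-b", "redis-c"]
keys = ["queue-%d" % i for i in range(1000)]
before = {k: node_for(k, nodes) for k in keys}

# Offline one node: most keys now map to a *different* node, so their
# messages are invisible to clients unless the data is migrated first.
after = {k: node_for(k, nodes[:-1]) for k in keys}
moved = sum(1 for k in keys if before[k] != after[k])
```

With naive modulo placement, removing one of three nodes remaps roughly
two-thirds of the keys, which is why taking a node offline requires knowing
exactly what partitioning scheme sits in front of the store.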

All of this to say, if Marconi is going to be high scale, I agree that
SQL can't be used, and even that MongoDB, on technical abilities alone,
makes some sense. But I think what might be simpler is if Marconi just
shifted focus to make the API more like AMQP, and used AMQP on its
backend. This allows cloud operators to deploy what they're used to for
OpenStack, and would still give users something they're comfortable with
(an HTTP API) to consume it.

[1] http://redis.io/topics/partitioning



Re: [openstack-dev] [Marconi] Why is marconi a queue implementation vs a provisioning API?

2014-03-20 Thread Ozgur Akan
Hi,

Marconi manages its own sharding (it doesn't rely on MongoDB's own sharding)
in order to have more control over where data is stored. Sharding is done
based on project_id + queue_id and stored in a catalog. Since Marconi
manages its own shards, it can use the same logic with any storage backend.
If it were Redis, scaling wouldn't be any different than having MongoDB as
the backend.
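
A minimal sketch of such a catalog follows. The names (ShardCatalog) and the
round-robin placement policy are illustrative assumptions, not Marconi's
actual implementation; what matters is that placement lives in an explicit
mapping rather than in a hash function.

```python
# Shard catalog sketch: (project_id, queue_id) -> shard is stored
# explicitly, so placement is independent of the storage backend.
class ShardCatalog:
    def __init__(self, shards):
        self.shards = list(shards)   # available storage shards
        self.catalog = {}            # (project_id, queue_id) -> shard

    def register(self, project_id, queue_id):
        """Assign a queue to a shard on first use and record it."""
        key = (project_id, queue_id)
        if key not in self.catalog:
            # placement policy is pluggable; round-robin keeps it simple
            self.catalog[key] = self.shards[len(self.catalog) % len(self.shards)]
        return self.catalog[key]

    def lookup(self, project_id, queue_id):
        return self.catalog.get((project_id, queue_id))
```

Because the catalog, not a hash function, owns placement, a shard can be
drained by rewriting its catalog entries and migrating the affected queues,
regardless of which storage backend sits behind each shard.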

Marconi (with some work) can also offer different backends at the same time
to provide different performance / durability options to its users. And
users here are not operators but actual customers/users that are using the
queuing service.

MongoDB seems to be a good choice as a storage backend because it doesn't need
VRRP during failover, which makes it much easier to deploy on top of
OpenStack compute at times when moving a VIP can't be done via VRRP. MySQL,
for example, would require a VIP in order to survive a failed master. MongoDB
is relatively easier to manage (and scale) when you have to migrate whole data
sets from one cluster to another.

I don't think an RDBMS is a bad idea, but it might not be practical. MySQL
without the SQL interface can be fast:
https://blogs.oracle.com/mysqlinnodb/entry/mysql_5_7_3_deep

best wishes,
Oz




Re: [openstack-dev] [Marconi] Why is marconi a queue implementation vs a provisioning API?

2014-03-20 Thread Clint Byrum
Excerpts from Ozgur Akan's message of 2014-03-20 14:18:27 -0700:
 Hi,
 
 Marconi manages its own sharding (doesn't rely on MongoDB's own sharding)
 in order to have more control on where data is stored. Sharding is done
 based on project_id + queue_id and stored in a catalog. Since Marconi
 manages its own shards, it can use the same logic with any storage. If it
 was Redis, scaling wouldn't be any different than having MongoDB as
 backend.
 

Cool. Said catalog is duplicated globally then?

 Marconi (with some work) can also offer different backends at the same time
 to provide different performance / durability options to it's users. And
 users here are not operators but actual customers/users that are using the
 queuing service.


Right, sort of like with AMQP how you can ask for reliable delivery or
not?

 MongoDB seems to be a good choice as a storage backend as it doesn't need
 VRRP during failover which makes it much easier to deploy on top of
 OpenStack compute at times when moving a VIP can't be done via VRRP. MySQL
 for example would require a VIP in order to survive a failed master. MongoDB
 is relatively easier to manage (scale) when you have to migrate whole data
 from one cluster to another.


Using Galera, MySQL doesn't require a VIP approach either.

 I don't think an RDBMS is a bad idea but might not be practical. MySQL without
 the SQL interface can be fast:
 https://blogs.oracle.com/mysqlinnodb/entry/mysql_5_7_3_deep
 

The SQL isn't the only problem, and speed isn't the same as scalability
(Fast: Ferrari, Scalable: Bullet Train). You also have MVCC. In InnoDB,
just inserting, updating, and deleting millions of tiny rows in a
concurrent fashion will tie up threads and mutexes, and bog down InnoDB
with millions of tiny transactions.

The linked blog is dealing entirely with scaling excessive tiny reads,
which is important, but not really the problem Marconi faces. There's no
point in discussing how to try and make MySQL or any other MVCC database
work well as a queue backend.

IMO, look at how Qpid and RabbitMQ do durable messaging... that is
the model to copy. But that is why the original email asked 'why isn't
Marconi just provisioning brokers?' -- those brokers have already
implemented this, and it seems wasteful to try to do it again.



Re: [openstack-dev] [Marconi] Why is marconi a queue implementation vs a provisioning API?

2014-03-19 Thread Stan Lagun
Kurt Griffiths,

Thanks for the detailed explanation. Is there a comparison between Marconi and
existing message brokers anywhere that you can point me to?
I can see how your examples could be implemented using other brokers like
RabbitMQ. So why is there a need for another broker? And what is wrong with
the currently deployed RabbitMQ that most OpenStack services are using
(typically via oslo.messaging RPC)?



On Wed, Mar 19, 2014 at 4:00 AM, Kurt Griffiths 
kurt.griffi...@rackspace.com wrote:

 I think we can agree that a data-plane API only makes sense if it is
 useful to a large number of web and mobile developers deploying their apps
 on OpenStack. Also, it only makes sense if it is cost-effective and
 scalable for operators who wish to deploy such a service.

 Marconi was born of practical experience and direct interaction with
 prospective users. When Marconi was kicked off a few summits ago, the
 community was looking for a multi-tenant messaging service to round out
 the OpenStack portfolio. Users were asking operators for something easier
 to work with and more web-friendly than established options such as AMQP.

 To that end, we started drafting an HTTP-based API specification that
 would afford several different messaging patterns, in order to support the
 use cases that users were bringing to the table. We did this completely in
 the open, and received lots of input from prospective users familiar with
 a variety of message broker solutions, including more cloudy ones like
 SQS and Iron.io.

 The resulting design was a hybrid that supported what you might call
 claim-based semantics a la SQS and feed-based semantics a la RSS.
 Application developers liked the idea of being able to use one or the
 other, or combine them to come up with new patterns according to their
 needs. For example:

 1. A video app can use Marconi to feed a worker pool of transcoders. When
 a video is uploaded, it is stored in Swift and a job message is posted to
 Marconi. Then, a worker claims the job and begins work on it. If the
 worker crashes, the claim expires and the message becomes available to be
 claimed by a different worker. Once the worker is finished with the job,
 it deletes the message so that another worker will not process it, and
 claims another message. Note that workers never list messages in this
 use case; those endpoints in the API are simply ignored.

 2. A backup service can use Marconi to communicate with hundreds of
 thousands of backup agents running on customers' machines. Since Marconi
 queues are extremely light-weight, the service can create a different
 queue for each agent, and additional queues to broadcast messages to all
 the agents associated with a single customer. In this last scenario, the
 service would post a message to a single queue and the agents would simply
 list the messages on that queue, and everyone would get the same message.
 This messaging pattern is emergent, and requires no special routing setup
 in advance from one queue to another.

 3. A metering service for an Internet application can use Marconi to
 aggregate usage data from a number of web heads. Each web head collects
 several minutes of data, then posts it to Marconi. A worker periodically
 claims the messages off the queue, performs the final aggregation and
 processing, and stores the results in a DB. So far, this messaging pattern
 is very much like example #1, above. However, since Marconi's API also
 affords the observer pattern via listing semantics, the metering service
 could run an auditor that logs the messages as they go through the queue
 in order to provide extremely valuable data for diagnosing problems in the
 aggregated data.
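The claim lifecycle running through these three examples — claim with a TTL, delete on success, automatic expiry on crash, list without claiming — can be sketched with a toy in-memory model. This is purely illustrative of the semantics, not Marconi's implementation, and every name in it is invented:

```python
import itertools
import time


class ToyQueue:
    """Toy in-memory model of the two consumption styles described above.

    Claim-based: a consumer atomically claims a message for a TTL; if it
    crashes without deleting, the claim expires and the message becomes
    claimable again. Feed-based: an observer lists messages without
    claiming them.
    """

    def __init__(self):
        self._ids = itertools.count()
        self._messages = {}   # id -> body
        self._claims = {}     # id -> claim expiry timestamp

    def post(self, body):
        msg_id = next(self._ids)
        self._messages[msg_id] = body
        return msg_id

    def claim(self, ttl, now=None):
        """Claim one unclaimed (or expired-claim) message, oldest first."""
        now = time.time() if now is None else now
        for msg_id in sorted(self._messages):
            if self._claims.get(msg_id, 0) <= now:
                self._claims[msg_id] = now + ttl
                return msg_id, self._messages[msg_id]
        return None

    def delete(self, msg_id):
        """Worker finished the job: remove the message for good."""
        self._messages.pop(msg_id, None)
        self._claims.pop(msg_id, None)

    def list(self):
        """Observer pattern: read everything, claim nothing."""
        return [self._messages[i] for i in sorted(self._messages)]
```

A crashed transcoder from example #1 simply never calls delete(), so after the TTL the job is claimed by another worker; the auditor from example #3 uses list() and never interferes with claims.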

 Users are excited about what Marconi offers today, and we are continuing
 to evolve the API based on their feedback.

 Of course, app developers aren't the only audience Marconi needs to serve.
 Operators want something that is cost-effective, scales, and is
 customizable for the unique needs of their target market.

 While Marconi has plenty of room to improve (who doesn't?), here is where
 the project currently stands in these areas:

 1. Customizable. Marconi transport and storage drivers can be swapped out,
 and messages can be manipulated in-flight with custom filter drivers.
 Currently we have MongoDB and SQLAlchemy drivers, and are exploring Redis
 and AMQP brokers. Now, the v1.0 API does impose some constraints on the
 backend in order to support the use cases mentioned earlier. For example,
 an AMQP backend would only be able to support a subset of the current API.
 Operators occasionally ask about AMQP broker support, in particular, and
 we are exploring ways to evolve the API in order to support that.

 2. Scalable. Operators can use Marconi's HTTP transport to leverage their
 existing infrastructure and expertise in scaling out web heads. When it
 comes to the backend, for small deployments with minimal throughput needs,
 we are providing a SQLAlchemy driver as a 

Re: [openstack-dev] [Marconi] Why is marconi a queue implementation vs a provisioning API?

2014-03-19 Thread Flavio Percoco

Kurt already gave a quite detailed explanation of why Marconi, what
can you do with it and where it's standing. I'll reply in-line:

On 19/03/14 10:17 +1300, Robert Collins wrote:

So this came up briefly at the tripleo sprint, and since I can't seem
to find a /why/ document
(https://wiki.openstack.org/wiki/Marconi/Incubation#Raised_Questions_.2B_Answers
and https://wiki.openstack.org/wiki/Marconi#Design don't supply this)
we decided at the TC meeting that I should raise it here.

Firstly, let me check my facts :) - Marconi is backed by a modular
'storage' layer which places some conceptual design constraints on the
storage backends that are possible (e.g. I rather expect a 0mq
implementation to be very tricky, at best (vs the RPC style front end
https://wiki.openstack.org/wiki/Marconi/specs/zmq/api/v1 )), and has a
hybrid control/data plane API implementation where one can call into
it to make queues etc, and to consume them.


Those docs refer to a transport driver, not a storage driver. In
Marconi, it's possible to have different protocols on top of the API.
The current one is based on HTTP but there'll likely be others in the
future.

We've changed some things in the API to support amqp based storage drivers.
We had a session during the HKG summit about this and since then, we've
always kept amqp drivers in mind when doing changes on the API. I'm
not saying it's perfect, though.



The API for the queues is very odd from a queueing perspective -
https://wiki.openstack.org/wiki/Marconi/specs/api/v1#Get_a_Specific_Message
- you don't subscribe to the queue, you enumerate and ask for a single
message.


The current way to subscribe to queues is by using polling.
Subscribing is not just tied to the API but also to the transport
itself. As mentioned above, we currently just have support for HTTP.

Also, enumerating is not necessary. For instance, claiming with limit
1 will consume one message.

(Side note: At the incubation meeting, it was recommended not to put
effort into writing new transports but to stabilize the API and work on
a storage backend with a license != AGPL)


And the implementations in tree are mongodb (which is at best
contentious, due to the AGPL and many folks' reasonable concerns about
it), and mysql.


Just to avoid misleading folks that are not familiar with marconi, I
just want to point out that the driver is based on sqlalchemy.


My desires around Marconi are:
- to make sure the queue we have is suitable for use by OpenStack
itself: we have a very strong culture around consolidating technology
choices, and it would be extremely odd to have Marconi be something
that isn't suitable to replace rabbitmq etc as the queue abstraction
in the fullness of time.


Although this could be done in the future, I've heard from many folks
in the community that replacing OpenStack's rabbitmq / qpid / etc layer
with Marconi is a no-go. I don't recall the exact reasons now but I
think I can grab them from logs or something (Unless those folks are
reading this email and want to chime in). FWIW, I'd be more than happy
to *experiment* with this in the future. Marconi is definitely not ready as-is.


- to make sure that deployers with scale / performance needs can have
that met by Marconi
- to make my life easy as a deployer ;)


This has been part of our daily reviews, work and designs. I'm sure
there's room for improvement, though.


So my questions are:
- why isn't the API a queue friendly API (e.g. like


Define *queue friendly*


https://github.com/twitter/kestrel - kestrel which uses the memcache
API, puts put into the queue, gets get from the queue). The current


I don't know kestrel but, how is this different from what Marconi does?


API looks like pretty much the worst case scenario there - CRUD rather
than submit/retrieve with blocking requests (e.g. longpoll vs poll).


I agree there are some limitations from using HTTP for this job, hence
the support for different transports. Just saying *the API is CRUD* is
again misleading and it doesn't highlight the value of having an HTTP
based transport. It's just wrong to think about marconi as *just
another queuing system* instead of considering the use-cases it's
trying to solve.

There's a rough support for websocket in an external project but:

1. It's not official... yet.
2. It was written as a proof of concept for the transport layer.
3. It likely needs to be updated.

https://github.com/FlaPer87/marconi-websocket


- wouldn't it be better to expose other existing implementations of
HTTP message queues like nova does with hypervisors, rather than
creating our own one? E.g. HTTPSQS, RestMQ, Kestrel, queues.io.


We've discussed having support for API extensions in order to allow
some deployments to expose features from a queuing technology that we
don't necessarily consider part of the core API.


  - or even do what Trove does and expose the actual implementation directly?
- whats the plan to fix the API?


Fix the API?

For starters, moving away 

Re: [openstack-dev] [Marconi] Why is marconi a queue implementation vs a provisioning API?

2014-03-19 Thread Thierry Carrez
Flavio Percoco wrote:
 On 19/03/14 10:17 +1300, Robert Collins wrote:
 My desires around Marconi are:
 - to make sure the queue we have is suitable for use by OpenStack
 itself: we have a very strong culture around consolidating technology
 choices, and it would be extremely odd to have Marconi be something
 that isn't suitable to replace rabbitmq etc as the queue abstraction
 in the fullness of time.
 
 Although this could be done in the future, I've heard from many folks
 in the community that replacing OpenStack's rabbitmq / qpid / etc layer
 with Marconi is a no-go. I don't recall the exact reasons now but I
 think I can grab them from logs or something (Unless those folks are
 reading this email and want to chime in). FWIW, I'd be more than happy
 to *experiment* with this in the future. Marconi is definitely not ready
 as-is.

That's the root of this thread. Marconi is not really designed to cover
Robert's use case, which would be to be consumed internally by OpenStack
as a message queue.

I classify Marconi as an application building block (IaaS+), a
convenient, SQS-like way for cloud application builders to pass data
around without having to spin up their own message queue in a VM. I
think that's a relevant use case, as long as performance is not an order
of magnitude worse than the "spin up your own in a VM" alternative.
Personally I don't consider serving the internal needs of OpenStack as
a feature blocker. It would be nice if it could, but the IaaS+ use case
is IMHO compelling enough.

-- 
Thierry Carrez (ttx)





Re: [openstack-dev] [Marconi] Why is marconi a queue implementation vs a provisioning API?

2014-03-19 Thread Mark McLoughlin
On Wed, 2014-03-19 at 10:17 +1300, Robert Collins wrote:
 So this came up briefly at the tripleo sprint, and since I can't seem
 to find a /why/ document
 (https://wiki.openstack.org/wiki/Marconi/Incubation#Raised_Questions_.2B_Answers
 and https://wiki.openstack.org/wiki/Marconi#Design don't supply this)

I think we need a slight reset on this discussion. The way this email
was phrased gives a strong sense of "Marconi is a dumb idea; it's going
to take a lot to persuade me otherwise."

That's not a great way to start a conversation, but it's easy to
understand - a TC member sees a project on the cusp of graduating and,
when they finally get a chance to look closely at it, a number of things
don't make much sense. "Wait! Stop! WTF!" is a natural reaction if you
think a bad decision is about to be made.

We've all got to understand how pressurized a situation these graduation
and incubation discussions are. Projects put an immense amount of work
into proving themselves worthy of being an integrated project, they get
fairly short bursts of interaction with the TC, TC members aren't
necessarily able to do a huge amount of due diligence in advance and yet
TC members are really, really keen to avoid either undermining a healthy
project around some cool new technology or undermining OpenStack by
including an unhealthy project or sub-par technology.

And then there's the time pressure where a decision has to be made by a
certain date and if that decision is "not this time", the six-month
delay until the next chance for a positive decision can be really
draining on motivation and momentum when everybody had been so focused
on getting a positive decision this time around.

We really need cool heads here and, above all, to try our best to assume
good faith, intentions and ability on both sides.


Some of the questions Robert asked are common questions and I know they
were discussed during the incubation review. However, the questions
persist and it's really important that TC members (and the community at
large) feel they can stand behind the answers to those questions. If I'm
chatting to someone and they ask me "why does OpenStack need to
implement its own messaging broker?", I need to have a good answer.

How about we do our best to put the implications for the graduation
decision aside for a bit and focus on collaboratively pulling together a
FAQ that everyone can buy into? The raised questions and answers
section of the incubation review linked above is a good start, but I
think we can take this email as feedback that those questions and
answers need much improvement.

This could be a good pattern for all new projects - if the TC and the
new project can't work together to draft a solid FAQ like this, then
it's not a good sign for the project.

See below for my attempt to summarize the questions and how we might go
about answering them. Is this a reasonable start?

Mark.


Why isn't Marconi simply an API for provisioning and managing AMQP, Kestrel,
ZeroMQ, etc. brokers and queues? Why is a new broker implementation needed?

 = I'm not sure I can summarize the answer here - the need for an HTTP data
plane API, the need for multi-tenancy, etc.? Maybe a table listing the
required features and whether they're provided by these existing solutions.

Maybe there's also an element of "we think we can do a better job". If so,
the point probably worth addressing is "OpenStack shouldn't attempt to write
a new database, or a new hypervisor, or a new SDN controller, or a new block
storage implementation ... so why should we implement a new message
broker?" If this is just a bad analogy, explain why?

Implementing a message queue using an SQL DB seems like a bad idea, why is
Marconi doing that?

 = Perhaps explain why MongoDB is a good storage technology for this use case
and the SQLAlchemy driver is just a toy.

Marconi's default driver depends on MongoDB which is licensed under the AGPL.
This license is currently a no-go for some organizations, so what plans does
Marconi have to implement another production-ready storage driver that supports
all API features?

 = Discuss the Redis driver plans?

Is Marconi designed to be suitable for use by OpenStack itself?

 = Discuss that it's not currently in scope and why not. In what way does the
OpenStack use case differ from the applications Marconi's current API
focused on?

How should a client subscribe to a queue?

 = Discuss that it's not by GET /messages but instead POST /claims?limit=N
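To make that answer concrete, here is a sketch of the claim request a client would build against the v1 HTTP API. The path, query parameter, and body fields follow the Marconi v1 API wiki referenced earlier, but treat them as illustrative rather than authoritative; the base URL and queue name are placeholders:

```python
import json
import urllib.parse
import urllib.request


def build_claim_request(base_url, queue, limit, ttl=300, grace=300):
    """Build (but do not send) a v1 "claim up to `limit` messages" request.

    POST /v1/queues/{queue}/claims?limit=N with a JSON body giving the
    claim TTL and grace period, per the v1 API wiki.
    """
    query = urllib.parse.urlencode({"limit": limit})
    url = "{}/v1/queues/{}/claims?{}".format(base_url, queue, query)
    body = json.dumps({"ttl": ttl, "grace": grace}).encode("utf-8")
    return urllib.request.Request(
        url, data=body, method="POST",
        headers={"Content-Type": "application/json"})
```

The point of the FAQ entry: consuming is a POST against the claims resource, not a GET of the messages listing, which is reserved for the observer pattern.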






Re: [openstack-dev] [Marconi] Why is marconi a queue implementation vs a provisioning API?

2014-03-19 Thread Russell Bryant
On 03/19/2014 07:49 AM, Thierry Carrez wrote:
 Flavio Percoco wrote:
 On 19/03/14 10:17 +1300, Robert Collins wrote:
 My desires around Marconi are: - to make sure the queue we have
 is suitable for use by OpenStack itself: we have a very strong
 culture around consolidating technology choices, and it would
 be extremely odd to have Marconi be something that isn't
 suitable to replace rabbitmq etc as the queue abstraction in
 the fullness of time.
 
 Although this could be done in the future, I've heard from many
 folks in the community that replacing OpenStack's rabbitmq / qpid
 / etc layer with Marconi is a no-go. I don't recall the exact
 reasons now but I think I can grab them from logs or something
 (Unless those folks are reading this email and want to chime in).
 FWIW, I'd be more than happy to *experiment* with this in the
 future. Marconi is definitely not ready as-is.
 
 That's the root of this thread. Marconi is not really designed to
 cover Robert's use case, which would be to be consumed internally
 by OpenStack as a message queue.
 
 I classify Marconi as an application building block (IaaS+), a 
 convenient, SQS-like way for cloud application builders to pass
 data around without having to spin up their own message queue in a
 VM. I think that's a relevant use case, as long as performance is
 not an order of magnitude worse than the spin up your own in a VM
 alternative. Personally I don't consider serving the internal
 needs of OpenStack as a feature blocker. It would be nice if it
 could, but the IaaS+ use case is IMHO compelling enough.

This is my view, as well.  I never considered replacing OpenStack's
current use of messaging within the scope of Marconi.

It's possible we could have yet another project that is a queue
provisioning project in the style of Trove.  I'm not sure that
actually makes sense (an application template you can deploy may
suffice here).  In any case, I view OpenStack's use case and anyone
wanting to use qpid/rabbit/whatever directly separate and out of scope
of Marconi.

-- 
Russell Bryant



Re: [openstack-dev] [Marconi] Why is marconi a queue implementation vs a provisioning API?

2014-03-19 Thread Robert Collins
On 20 March 2014 01:06, Mark McLoughlin mar...@redhat.com wrote:

 I think we need a slight reset on this discussion. The way this email
 was phrased gives a strong sense of Marconi is a dumb idea, it's going
 to take a lot to persuade me otherwise.

Thanks Mark, that's a great point to make. I don't think Marconi is
dumb, but I sure don't understand the "why" behind the list of things
discussed, which you've very nicely rephrased here. Thank you!

-Rob


-- 
Robert Collins rbtcoll...@hp.com
Distinguished Technologist
HP Converged Cloud



Re: [openstack-dev] [Marconi] Why is marconi a queue implementation vs a provisioning API?

2014-03-19 Thread Devananda van der Veen
Let me start by saying that I want there to be a constructive discussion
around all this. I've done my best to keep my tone as non-snarky as I could
while still clearly stating my concerns. I've also spent a few hours
reviewing the current code and docs. Hopefully this contribution will be
beneficial in helping the discussion along.

For what it's worth, I don't have a clear understanding of why the Marconi
developer community chose to create a new queue rather than an abstraction
layer on top of existing queues. While my lack of understanding there isn't
a technical objection to the project, I hope they can address this in the
aforementioned FAQ.

The reference storage implementation is MongoDB. AFAIK, no integrated
projects require an AGPL package to be installed, and from the discussions
I've been part of, that would be a show-stopper if Marconi required
MongoDB. As I understand it, this is why sqlalchemy support was required
when Marconi was incubated. Saying "Marconi also supports SQLA" is
disingenuous because it is a second-class citizen, with incomplete API
support, is clearly not the recommended storage driver, and is going to be
unusable at scale (I'll come back to this point in a bit).

Let me ask this. Which back-end is tested in Marconi's CI? That is the
back-end that matters right now. If that's Mongo, I think there's a
problem. If it's SQLA, then I think Marconi should declare any features
which SQLA doesn't support to be optional extensions, make SQLA the
default, and clearly document how to deploy Marconi at scale with a SQLA
back-end.


Then there's the db-as-a-queue antipattern, and the problems that I have
seen result from this in the past... I'm not the only one in the OpenStack
community with some experience scaling MySQL databases. Surely others have
their own experiences and opinions on whether a database (whether MySQL or
Mongo or Postgres or ...) can be used in such a way _at_scale_ and not fall
over from resource contention. I would hope that those members of the
community would chime into this discussion at some point. Perhaps they'll
even disagree with me!

A quick look at the code around claim (which, it seems, will be the most
commonly requested action) shows why this is an antipattern.

The MongoDB storage driver for claims requires _four_ queries just to get a
message, with a serious race condition (but at least it's documented in the
code) if multiple clients are claiming messages in the same queue at the
same time. For reference:

https://github.com/openstack/marconi/blob/master/marconi/queues/storage/mongodb/claims.py#L119

The SQLAlchemy storage driver is no better. It's issuing _five_ queries
just to claim a message (including a query to purge all expired claims
every time a new claim is created). The performance of this transaction
under high load is probably going to be bad...

https://github.com/openstack/marconi/blob/master/marconi/queues/storage/sqlalchemy/claims.py#L83
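For contrast, a claim does not inherently need multiple racing queries: with a relational backend it can be a single atomic UPDATE that both selects and marks the row, so two concurrent claimers cannot grab the same message. The sketch below uses sqlite and an invented schema — it is not Marconi's code, just an illustration of the alternative:

```python
import sqlite3
import uuid


def atomic_claim(conn, queue):
    """Claim the oldest unclaimed message in one atomic UPDATE.

    The UPDATE selects the target row and marks it in a single statement;
    the follow-up SELECT only fetches the already-claimed row's body.
    """
    claim_id = uuid.uuid4().hex
    cur = conn.execute(
        """UPDATE messages SET claim_id = ?
           WHERE id = (SELECT id FROM messages
                       WHERE queue = ? AND claim_id IS NULL
                       ORDER BY id LIMIT 1)""",
        (claim_id, queue))
    conn.commit()
    if cur.rowcount == 0:
        return None                       # nothing left to claim
    row = conn.execute(
        "SELECT id, body FROM messages WHERE claim_id = ?",
        (claim_id,)).fetchone()
    return claim_id, row[0], row[1]


# Invented demo schema and data (not Marconi's actual tables).
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE messages (
    id INTEGER PRIMARY KEY, queue TEXT, body TEXT, claim_id TEXT)""")
conn.executemany(
    "INSERT INTO messages (queue, body) VALUES (?, ?)",
    [("jobs", "a"), ("jobs", "b")])
conn.commit()
```

Claim expiry, grace periods, and message TTLs are omitted here; adding them is what pushes real implementations toward the extra queries being criticized.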

Lastly, it looks like the Marconi storage drivers assume the storage
back-end to be infinitely scalable. AFAICT, the mongo storage driver
supports mongo's native sharding -- which I'm happy to see -- but the SQLA
driver does not appear to support anything equivalent for other back-ends,
eg. MySQL. This relegates any deployment using the SQLA backend to the
scale of only what one database instance can handle. It's unsuitable for
any large-scale deployment. Folks who don't want to use Mongo are likely to
use MySQL and will be promptly bitten by Marconi's lack of scalability with
this back end.

While there is a lot of room to improve the messaging around what/how/why,
and I think a FAQ will be very helpful, I don't think that Marconi should
graduate this cycle because:
(1) support for a non-AGPL-backend is a legal requirement [*] for Marconi's
graduation;
(2) deploying Marconi with sqla+mysql will result in an incomplete and
unscalable service.

It's possible that I'm wrong about the scalability of Marconi with sqla +
mysql. If anyone feels that this is going to perform blazingly fast on a
single mysql db backend, please publish a benchmark and I'll be very happy
to be proved wrong. To be meaningful, it must have a high concurrency of
clients creating and claiming messages with (num queues) < (num clients)
< (num messages), and all clients polling on a reasonably short interval,
based on whatever the recommended client rate limit is. I'd like the test
to be repeated with both Mongo and SQLA back-ends on the same hardware for
comparison.
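A harness for the benchmark described above could be skeletoned as follows. The shape — post all messages, then have clients poll-and-claim until the backlog is drained, with (num queues) < (num clients) < (num messages) — follows the spec in this email; the callable-based backend interface and the trivial in-memory stand-in are assumptions, and a real run would plug in Mongo- and SQLA-backed post/claim functions on the same hardware:

```python
import collections
import queue
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def run_benchmark(post, claim, num_queues, num_clients, num_messages,
                  poll_interval=0.001):
    """Post num_messages across num_queues, then let num_clients poll and
    claim until everything is consumed; returns (claimed, msgs/sec)."""
    assert num_queues < num_clients < num_messages
    for i in range(num_messages):
        post(i % num_queues, "msg-%d" % i)

    claimed = []
    lock = threading.Lock()

    def client(worker_id):
        q = worker_id % num_queues          # each client polls one queue
        while True:
            msg = claim(q)
            if msg is not None:
                with lock:
                    claimed.append(msg)
                continue
            with lock:
                done = len(claimed) >= num_messages
            if done:
                return
            time.sleep(poll_interval)       # poll on a short interval

    start = time.time()
    with ThreadPoolExecutor(max_workers=num_clients) as pool:
        list(pool.map(client, range(num_clients)))
    return claimed, num_messages / (time.time() - start)


# Trivial in-memory backend standing in for a real deployment under test.
_queues = collections.defaultdict(queue.Queue)

def post(q, body):
    _queues[q].put(body)

def claim(q):
    try:
        return _queues[q].get_nowait()
    except queue.Empty:
        return None
```

A correctness check worth adding to any such run: every message is claimed exactly once, which is precisely where a racy claim implementation would fail.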


Regards,
Devananda

[*]
https://wiki.openstack.org/wiki/Marconi/Incubation/Graduation#Legal_requirements


Re: [openstack-dev] [Marconi] Why is marconi a queue implementation vs a provisioning API?

2014-03-19 Thread Fox, Kevin M
Can someone please give more detail into why MongoDB being AGPL is a problem? 
The drivers that Marconi uses are Apache2 licensed, MongoDB is separated by the 
network stack and MongoDB is not exposed to the Marconi users so I don't think 
the 'A' part of the GPL really kicks in at all since the MongoDB user is the 
cloud provider, not the cloud end user?

Thanks,
Kevin


From: Devananda van der Veen [devananda@gmail.com]
Sent: Wednesday, March 19, 2014 12:37 PM
To: OpenStack Development Mailing List (not for usage questions)
Subject: Re: [openstack-dev] [Marconi] Why is marconi a queue implementation vs 
a provisioning API?


Re: [openstack-dev] [Marconi] Why is marconi a queue implementation vs a provisioning API?

2014-03-19 Thread Fox, Kevin M
Its my understanding that the only case the A in the AGPL would kick in is if 
the cloud provider made a change to MongoDB and exposed the MongoDB instance to 
users. Then the users would have to be able to download the changed code. Since 
Marconi's in front, the user is Marconi, and wouldn't ever want to download the 
source. As far as I can tell, in this use case, the AGPL'ed MongoDB is not 
really any different than the GPL'ed MySQL in footprint here. MySQL is
acceptable, so why isn't MongoDB?

It would be good to get legal's official take on this. It would be a shame to 
make major architectural decisions based on license assumptions that turn out 
not to be true. I'm cc-ing them.

Thanks,
Kevin



Re: [openstack-dev] [Marconi] Why is marconi a queue implementation vs a provisioning API?

2014-03-19 Thread Chris Friesen

On 03/19/2014 02:24 PM, Fox, Kevin M wrote:

Can someone please give more detail into why MongoDB being AGPL is a
problem? The drivers that Marconi uses are Apache2 licensed, MongoDB is
separated by the network stack and MongoDB is not exposed to the Marconi
users so I don't think the 'A' part of the GPL really kicks in at all
since the MongoDB user is the cloud provider, not the cloud end user?


Even if MongoDB was exposed to end-users, would that be a problem?

Obviously the source to MongoDB would need to be made available 
(presumably it already is) but does the AGPL licence contaminate the 
Marconi stuff?  I would have thought that would fall under "mere
aggregation".


Chris



Re: [openstack-dev] [Marconi] Why is marconi a queue implementation vs a provisioning API?

2014-03-19 Thread Sylvain Bauza
2014-03-19 22:38 GMT+01:00 Fox, Kevin M kevin@pnnl.gov:

 Its my understanding that the only case the A in the AGPL would kick in is
 if the cloud provider made a change to MongoDB and exposed the MongoDB
 instance to users. Then the users would have to be able to download the
 changed code. Since Marconi's in front, the user is Marconi, and wouldn't
 ever want to download the source. As far as I can tell, in this use case,
 the AGPL'ed MongoDB is not really any different then the GPL'ed MySQL in
 footprint here. MySQL is acceptable, so why isn't MongoDB?



MongoDB is AGPL but MongoDB drivers are Apache licensed [1].
GPL contamination should not happen if we consider integrating only drivers
in the code.

[1] http://www.mongodb.org/about/licensing/


Re: [openstack-dev] [Marconi] Why is marconi a queue implementation vs a provisioning API?

2014-03-19 Thread Joe Gordon
On Wed, Mar 19, 2014 at 12:37 PM, Devananda van der Veen 
devananda@gmail.com wrote:

 Let me start by saying that I want there to be a constructive discussion
 around all this. I've done my best to keep my tone as non-snarky as I could
 while still clearly stating my concerns. I've also spent a few hours
 reviewing the current code and docs. Hopefully this contribution will be
 beneficial in helping the discussion along.

 For what it's worth, I don't have a clear understanding of why the Marconi
 developer community chose to create a new queue rather than an abstraction
 layer on top of existing queues. While my lack of understanding there isn't
 a technical objection to the project, I hope they can address this in the
 aforementioned FAQ.

 The reference storage implementation is MongoDB. AFAIK, no integrated
 projects require an AGPL package to be installed, and from the discussions
 I've been part of, that would be a show-stopper if Marconi required
 MongoDB. As I understand it, this is why sqlalchemy support was required
 when Marconi was incubated. Saying Marconi also supports SQLA is
 disingenuous because it is a second-class citizen, with incomplete API
 support, is clearly not the recommended storage driver, and is going to be
 unusable at scale (I'll come back to this point in a bit).

 Let me ask this. Which back-end is tested in Marconi's CI? That is the
 back-end that matters right now. If that's Mongo, I think there's a
 problem. If it's SQLA, then I think Marconi should declare any features
 which SQLA doesn't support to be optional extensions, make SQLA the
 default, and clearly document how to deploy Marconi at scale with a SQLA
 back-end.


[drivers]
storage = mongodb

[drivers:storage:mongodb]
uri = mongodb://localhost:27017/marconi



http://logs.openstack.org/94/81094/2/check/check-tempest-dsvm-marconi/c006285/logs/etc/marconi/marconi.conf.txt.gz



On a related note, I see that Marconi has no gating integration tests.
https://review.openstack.org/#/c/81094/2

But then again that is documented in
https://wiki.openstack.org/wiki/Marconi/Incubation/Graduation#Legal_requirements
 We have a devstack-gate job running and will be making it voting this week.

Of the non-gating integration test job, I only see one marconi test being
run: tempest.api.queuing.test_queues.TestQueues.test_create_queue

http://logs.openstack.org/94/81094/2/check/check-tempest-dsvm-marconi/c006285/logs/testr_results.html.gz




 Then there's the db-as-a-queue antipattern, and the problems that I have
 seen result from this in the past... I'm not the only one in the OpenStack
 community with some experience scaling MySQL databases. Surely others have
 their own experiences and opinions on whether a database (whether MySQL or
 Mongo or Postgres or ...) can be used in such a way _at_scale_ and not fall
 over from resource contention. I would hope that those members of the
 community would chime into this discussion at some point. Perhaps they'll
 even disagree with me!

 A quick look at the code around claim (which, it seems, will be the most
 commonly requested action) shows why this is an antipattern.

 The MongoDB storage driver for claims requires _four_ queries just to get
 a message, with a serious race condition (but at least it's documented in
 the code) if multiple clients are claiming messages in the same queue at
 the same time. For reference:

 https://github.com/openstack/marconi/blob/master/marconi/queues/storage/mongodb/claims.py#L119

 The SQLAlchemy storage driver is no better. It's issuing _five_ queries
 just to claim a message (including a query to purge all expired claims
 every time a new claim is created). The performance of this transaction
 under high load is probably going to be bad...

 https://github.com/openstack/marconi/blob/master/marconi/queues/storage/sqlalchemy/claims.py#L83
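
The multi-query claim flow being criticized can be sketched as follows. This is an illustrative stand-in using SQLite, not Marconi's actual driver code; the point is the round trips and the race window, which apply to any naive db-as-a-queue claim:

```python
import sqlite3

# Naive db-as-a-queue claim: multiple round trips per claim, with a
# window between the SELECT and the UPDATE in which another worker
# running the same SELECT sees (and may claim) the same messages.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE messages (id INTEGER PRIMARY KEY, body TEXT, claim_id TEXT)")
conn.executemany(
    "INSERT INTO messages (body) VALUES (?)", [("job-1",), ("job-2",)])

def naive_claim(conn, claim_id, limit=1):
    # Query 1: find unclaimed messages.
    rows = conn.execute(
        "SELECT id, body FROM messages WHERE claim_id IS NULL LIMIT ?",
        (limit,)).fetchall()
    # <-- race window: a second worker's SELECT here returns the same rows
    # Query 2..n: mark each one claimed.
    for msg_id, _body in rows:
        conn.execute(
            "UPDATE messages SET claim_id = ? WHERE id = ?",
            (claim_id, msg_id))
    return rows

print(naive_claim(conn, "worker-a"))  # [(1, 'job-1')]
```

Closing the window requires row locking (e.g. SELECT ... FOR UPDATE) or a serialized transaction per claim, which is exactly the contention under load that the email is pointing at.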

 Lastly, it looks like the Marconi storage drivers assume the storage
 back-end to be infinitely scalable. AFAICT, the mongo storage driver
 supports mongo's native sharding -- which I'm happy to see -- but the SQLA
 driver does not appear to support anything equivalent for other back-ends,
 eg. MySQL. This relegates any deployment using the SQLA backend to the
 scale of only what one database instance can handle. It's unsuitable for
 any large-scale deployment. Folks who don't want to use Mongo are likely to
 use MySQL and will be promptly bitten by Marconi's lack of scalability with
 this back end.

 While there is a lot of room to improve the messaging around what/how/why,
 and I think a FAQ will be very helpful, I don't think that Marconi should
 graduate this cycle because:
 (1) support for a non-AGPL-backend is a legal requirement [*] for
 Marconi's graduation;
 (2) deploying Marconi with sqla+mysql will result in an incomplete and
 unscalable service.


++



 It's possible that I'm wrong about the scalability of Marconi with sqla +
 mysql. If anyone feels that this is going to perform blazingly fast 

Re: [openstack-dev] [Marconi] Why is marconi a queue implementation vs a provisioning API?

2014-03-19 Thread Joshua Harlow
Proxying from yahoo's open source director (since he wasn't initially 
subscribed to this list, afaik he now is) on his behalf.

From Gil Yehuda (Yahoo’s Open Source director).

I would urge you to avoid creating a dependency between Openstack code and any 
AGPL project, including MongoDB. MongoDB is licensed in a very strange manner 
that is prone to creating unintended licensing mistakes (a lawyer’s dream). 
Indeed, MongoDB itself presents Apache licensed drivers – and thus technically, 
users of those drivers are not impacted by the AGPL terms. MongoDB Inc. is in 
the unique position to license their drivers this way (although they appear to 
violate the AGPL license) since MongoDB is not going to sue themselves for 
their own violation. However, others in the community who create MongoDB drivers 
license those drivers under the Apache and MIT licenses – which does pose 
a problem.

Why? The AGPL defines 'Corresponding Source' to include “the source 
code for shared libraries and dynamically linked subprograms that the work is 
specifically designed to require, such as by intimate data communication or 
control flow between those subprograms and other parts of the work.” Database 
drivers *are* works that the application is specifically designed to require, 
with intimate data communication or control flow between those subprograms and 
other parts of the work. So anyone using MongoDB with any other driver invites 
an unknown – one court case, one judge, can read the license under its plain 
meaning and decide that the AGPL terms apply as stated. We have no way to know 
how far they apply, since this license has not been tested in court yet.

Despite all the FAQs MongoDB puts on their site indicating they don't really 
mean to assert the license terms, normally when you provide a license, you mean 
those terms. If they did not mean those terms, they would not use this license. 
I hope they intended to do something good (to get contributions back without 
impacting applications using their database) but, even good intentions have 
unintended consequences. Companies with deep enough pockets to be lawsuit 
targets, and companies who want to be good open source citizens face the 
problem that using MongoDB anywhere invites the future risk of legal 
catastrophe. A simple development change in an open source project can change 
the economics drastically. This is simply unsafe and unwise.

OpenStack's ecosystem is fueled by the interests of many commercial ventures 
who wish to cooperate in the open source manner, but then leverage commercial 
opportunities they hope to create. I suggest that using MongoDB anywhere in 
this project will result in a loss of opportunity -- real or perceived, that 
would outweigh the benefits MongoDB itself provides.

tl;dr version: If you want to use MongoDB in your company, that's your call. 
Please don't turn anyone who uses OpenStack components into unsuspecting 
MongoDB users. Instead, decouple the database from the project. It's not worth 
the legal risk, nor the impact on the Apache-ness of this project.


Gil Yehuda
Sr. Director Of Open Source, Open Standards, Yahoo! Inc.
gyeh...@yahoo-inc.com

From: Fox, Kevin M kevin@pnnl.gov
Reply-To: OpenStack Development Mailing List (not for usage questions) 
openstack-dev@lists.openstack.org
Date: Wednesday, March 19, 2014 at 2:38 PM
To: OpenStack Development Mailing List (not for usage questions) 
openstack-dev@lists.openstack.org
Cc: legal-disc...@lists.openstack.org
Subject: Re: [openstack-dev] [Marconi] Why is marconi a queue implementation vs 
a provisioning API?

Its my understanding that the only case the A in the AGPL would kick in is if 
the cloud provider made a change to MongoDB and exposed the MongoDB instance to 
users. Then the users would have to be able to download the changed code. Since 
Marconi's in front, the user is Marconi, and wouldn't ever want to download the 
source. As far as I can tell, in this use case, the AGPL'ed MongoDB is not 
really any different than the GPL'ed MySQL in footprint here. MySQL is 
acceptable, so why isn't MongoDB?

It would be good to get legal's official take on this. It would be a shame to 
make major architectural decisions based on license assumptions that turn out 
not to be true. I'm cc-ing them.

Thanks,
Kevin

From: Chris Friesen [chris.frie...@windriver.com]
Sent: Wednesday, March 19, 2014 2:24 PM
To: openstack-dev@lists.openstack.org
Subject: Re: [openstack-dev] [Marconi] Why is marconi a queue implementation vs 
a provisioning API?

On 03/19/2014 02:24 PM, Fox, Kevin M wrote:
Can someone please give more detail into why MongoDB being AGPL is a
problem? 

[openstack-dev] [Marconi] Why is marconi a queue implementation vs a provisioning API?

2014-03-18 Thread Robert Collins
So this came up briefly at the tripleo sprint, and since I can't seem
to find a /why/ document
(https://wiki.openstack.org/wiki/Marconi/Incubation#Raised_Questions_.2B_Answers
and https://wiki.openstack.org/wiki/Marconi#Design don't supply this)
we decided at the TC meeting that I should raise it here.

Firstly, let me check my facts :) - Marconi is backed by a modular
'storage' layer which places some conceptual design constraints on the
storage backends that are possible (e.g. I rather expect a 0mq
implementation to be very tricky, at best (vs the RPC style front end
https://wiki.openstack.org/wiki/Marconi/specs/zmq/api/v1 )), and has a
hybrid control/data plane API implementation where one can call into
it to make queues etc, and to consume them.

The API for the queues is very odd from a queueing perspective -
https://wiki.openstack.org/wiki/Marconi/specs/api/v1#Get_a_Specific_Message
- you don't subscribe to the queue, you enumerate and ask for a single
message.

And the implementations in tree are MongoDB (which is at best
contentious, due to the AGPL and many folks' reasonable concerns about
it), and MySQL.

My desires around Marconi are:
 - to make sure the queue we have is suitable for use by OpenStack
itself: we have a very strong culture around consolidating technology
choices, and it would be extremely odd to have Marconi be something
that isn't suitable to replace rabbitmq etc as the queue abstraction
in the fullness of time.
 - to make sure that deployers with scale / performance needs can have
that met by Marconi
 - to make my life easy as a deployer ;)

So my questions are:
 - why isn't the API a queue friendly API (e.g. like
https://github.com/twitter/kestrel - kestrel which uses the memcache
API, puts put into the queue, gets get from the queue). The current
API looks like pretty much the worst case scenario there - CRUD rather
than submit/retrieve with blocking requests (e.g. longpoll vs poll).
 - wouldn't it be better to expose other existing implementations of
HTTP message queues like nova does with hypervisors, rather than
creating our own one? E.g. HTTPSQS, RestMQ, Kestrel, queues.io.
   - or even do what Trove does and expose the actual implementation directly?
 - whats the plan to fix the API?
 - is there a plan / desire to back onto actual queue services (e.g.
AMQP, $anyof the http ones above, etc)
 - what is the current performance -  how many usecs does it take to
put a message, and get one back, in real world use? How many
concurrent clients can a single Marconi API server with one backing
server deliver today?
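
The poll-versus-longpoll distinction raised above can be sketched with a local stand-in for the remote service (here `queue.Queue` plays the broker; the function names are illustrative, not any real Marconi client API):

```python
import queue
import threading
import time

q = queue.Queue()  # local stand-in for the remote message store

def poll_style_get(interval=0.05):
    # What a client of a list/CRUD-style HTTP API ends up writing:
    # repeatedly ask "anything yet?" and sleep between requests.
    while True:
        try:
            return q.get_nowait()
        except queue.Empty:
            time.sleep(interval)

def blocking_get(timeout=5.0):
    # What a submit/retrieve (longpoll) API offers instead: one call
    # that parks until a message arrives or the timeout expires.
    return q.get(timeout=timeout)

# A producer posts a message shortly after the consumer starts waiting;
# the blocking consumer gets it with no busy polling.
threading.Timer(0.1, q.put, args=("hello",)).start()
print(blocking_get())  # hello
```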

As background, 'implement a message queue in a SQL DB' is such a
horrid antipattern it's been a standing joke in many organisations I've
been in - and yet we're preparing to graduate *exactly that*, which is
frankly perplexing.

-Rob

-- 
Robert Collins rbtcoll...@hp.com
Distinguished Technologist
HP Converged Cloud



Re: [openstack-dev] [Marconi] Why is marconi a queue implementation vs a provisioning API?

2014-03-18 Thread Kurt Griffiths
I think we can agree that a data-plane API only makes sense if it is
useful to a large number of web and mobile developers deploying their apps
on OpenStack. Also, it only makes sense if it is cost-effective and
scalable for operators who wish to deploy such a service.

Marconi was born of practical experience and direct interaction with
prospective users. When Marconi was kicked off a few summits ago, the
community was looking for a multi-tenant messaging service to round out
the OpenStack portfolio. Users were asking operators for something easier
to work with and more web-friendly than established options such as AMQP.

To that end, we started drafting an HTTP-based API specification that
would afford several different messaging patterns, in order to support the
use cases that users were bringing to the table. We did this completely in
the open, and received lots of input from prospective users familiar with
a variety of message broker solutions, including more “cloudy” ones like
SQS and Iron.io.

The resulting design was a hybrid that supported what you might call
“claim-based” semantics ala SQS and feed-based semantics ala RSS.
Application developers liked the idea of being able to use one or the
other, or combine them to come up with new patterns according to their
needs. For example:

1. A video app can use Marconi to feed a worker pool of transcoders. When
a video is uploaded, it is stored in Swift and a job message is posted to
Marconi. Then, a worker claims the job and begins work on it. If the
worker crashes, the claim expires and the message becomes available to be
claimed by a different worker. Once the worker is finished with the job,
it deletes the message so that another worker will not process it, and
claims another message. Note that workers never “list” messages in this
use case; those endpoints in the API are simply ignored.
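
The claim/delete worker flow in this use case can be sketched against a tiny in-memory stand-in for the service (the post/claim/delete names here are illustrative, not the real marconiclient API):

```python
import itertools
import time

class FakeQueue:
    """In-memory stand-in for a claim-based message queue."""

    def __init__(self):
        self._msgs = {}  # id -> {"body": ..., "claim_expires": timestamp}
        self._ids = itertools.count(1)

    def post(self, body):
        self._msgs[next(self._ids)] = {"body": body, "claim_expires": 0}

    def claim(self, ttl, limit=1):
        # Return messages that are unclaimed, or whose claim TTL has
        # expired (e.g. the worker holding them crashed).
        now = time.time()
        out = []
        for mid, m in self._msgs.items():
            if m["claim_expires"] <= now:
                m["claim_expires"] = now + ttl
                out.append({"id": mid, "body": m["body"]})
                if len(out) >= limit:
                    break
        return out

    def delete(self, msg_id):
        self._msgs.pop(msg_id, None)

q = FakeQueue()
q.post("video-1.mp4")

done = []
for msg in q.claim(ttl=300):
    done.append(msg["body"])  # "transcode" the video
    q.delete(msg["id"])       # success: remove so no other worker reprocesses it

print(done)              # ['video-1.mp4']
print(q.claim(ttl=300))  # [] -- nothing left to claim
```

If the worker dies before the delete, the claim simply times out and the message becomes claimable again, which is the at-least-once guarantee described above.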

2. A backup service can use Marconi to communicate with hundreds of
thousands of backup agents running on customers' machines. Since Marconi
queues are extremely light-weight, the service can create a different
queue for each agent, and additional queues to broadcast messages to all
the agents associated with a single customer. In this last scenario, the
service would post a message to a single queue and the agents would simply
list the messages on that queue, and everyone would get the same message.
This messaging pattern is emergent, and requires no special routing setup
in advance from one queue to another.
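
The broadcast pattern in this example - one message, many readers, via non-destructive listing - can be sketched in a few lines (purely illustrative data structures, not the Marconi API):

```python
# A "customer" queue that agents list rather than claim: listing is
# non-destructive, so every agent sees the same message.
broadcast = []

def post(q, body):
    q.append({"id": len(q), "body": body})

def list_messages(q, marker=0):
    # Each agent keeps its own marker and pages forward from it;
    # nothing is consumed.
    return q[marker:], len(q)

post(broadcast, "upgrade-agent-to-2.0")

# Three agents list the same queue independently; all get the message.
for agent in ("agent-a", "agent-b", "agent-c"):
    msgs, _marker = list_messages(broadcast)
    print(agent, [m["body"] for m in msgs])
# agent-a ['upgrade-agent-to-2.0']  (and likewise for b and c)
```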

3. A metering service for an Internet application can use Marconi to
aggregate usage data from a number of web heads. Each web head collects
several minutes of data, then posts it to Marconi. A worker periodically
claims the messages off the queue, performs the final aggregation and
processing, and stores the results in a DB. So far, this messaging pattern
is very much like example #1, above. However, since Marconi’s API also
affords the observer pattern via listing semantics, the metering service
could run an auditor that logs the messages as they go through the queue
in order to provide extremely valuable data for diagnosing problems in the
aggregated data.

Users are excited about what Marconi offers today, and we are continuing
to evolve the API based on their feedback.

Of course, app developers aren’t the only audience Marconi needs to serve.
Operators want something that is cost-effective, scales, and is
customizable for the unique needs of their target market.

While Marconi has plenty of room to improve (who doesn’t?), here is where
the project currently stands in these areas:

1. Customizable. Marconi transport and storage drivers can be swapped out,
and messages can be manipulated in-flight with custom filter drivers.
Currently we have MongoDB and SQLAlchemy drivers, and are exploring Redis
and AMQP brokers. Now, the v1.0 API does impose some constraints on the
backend in order to support the use cases mentioned earlier. For example,
an AMQP backend would only be able to support a subset of the current API.
Operators occasionally ask about AMQP broker support, in particular, and
we are exploring ways to evolve the API in order to support that.

2. Scalable. Operators can use Marconi’s HTTP transport to leverage their
existing infrastructure and expertise in scaling out web heads. When it
comes to the backend, for small deployments with minimal throughput needs,
we are providing a SQLAlchemy driver as a non-AGPL alternative to MongoDB.
For large-scale production deployments, we currently provide the MongoDB
driver and will likely add Redis as another option (there is already a POC
driver). And, of course, operators can provide drivers for NewSQL
databases, such as VelocityDB, that are very fast and scale extremely
well. In Marconi, every queue can be associated with a different backend
cluster. This allows operators to scale both up and out, according to what
is most cost-effective for them. Marconi's app-level sharding is currently
done using a