In addition to what @mhrivnak said, for me the big motivation is
transaction support. A single Pulp sync or publish can issue thousands
of writes to the database. A failure in the middle leaves the database
"half-updated" and Pulp has no feasible way to roll back these changes.
This creates a major problem for data correctness in the face of
failures. Transaction support at the database layer will give Pulp an
opportunity to recover from these failures and preserve correctness.
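
To make the "half-updated" problem concrete, here is a minimal sketch of what database-level transactions buy us. It is not Pulp code: it uses Python's built-in sqlite3 module as a stand-in for PostgreSQL, and the `unit` table and `sync_units` function are hypothetical names made up for illustration.

```python
import sqlite3


def sync_units(conn, repo_id, unit_names):
    """Write every unit of one (hypothetical) sync in a single transaction."""
    # Used as a context manager, the connection commits if the block
    # finishes and rolls back every write in it if the block raises.
    with conn:
        for name in unit_names:
            if name is None:
                # Simulate a failure partway through the sync.
                raise ValueError("bad unit in the middle of the sync")
            conn.execute(
                "INSERT INTO unit (repo_id, name) VALUES (?, ?)",
                (repo_id, name),
            )


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE unit (repo_id INTEGER NOT NULL, name TEXT NOT NULL)")

# The third unit fails, so the two earlier inserts are rolled back as well.
try:
    sync_units(conn, 1, ["a.rpm", "b.rpm", None, "d.rpm"])
except ValueError:
    pass
count_after_failure = conn.execute("SELECT COUNT(*) FROM unit").fetchone()[0]

# A sync that completes is committed as a whole.
sync_units(conn, 1, ["a.rpm", "b.rpm"])
count_after_success = conn.execute("SELECT COUNT(*) FROM unit").fetchone()[0]

print(count_after_failure, count_after_success)
```

The failed sync leaves no rows behind, so the database never shows the partial state; only the completed sync is visible.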
From a high level, Pulp's transition to PostgreSQL is about correctness,
not performance. We don't want to give up performance, but it
is a secondary concern behind correctness. Pulp 2.y hasn't done much to
make its write and read performance really benefit from "the mongodb
way"[0], so in switching I expect to see "similar" performance. We would
need to benchmark and quantify the performance of 2.y versus 3.y to
really know. We are not planning to do that, so we may never know, but
here is an outline of a test plan for tracking performance [1].
[0]: loosening write/read consistency and deployments that use sharding
[1]: https://etherpad.net/p/pulp_performance_test_plan
-Brian
On 09/13/2016 09:11 AM, Michael Hrivnak wrote:
We have a thread here about a lot of the 3.0 stack choices, although it
seems to skip past the assumption that we're moving to postgres:
https://www.redhat.com/archives/pulp-list/2016-May/msg00042.html
I can't quickly find another summary of why, so I'll describe the
highlights here:
- Pulp has highly relational data. The core use case is managing the
relationships between content and repositories. Using a relational DB
makes that a lot easier.
- A schemaless DB makes it easy to do writes, but you have to be very
careful when doing reads that your software is prepared for whatever
data structure comes out. If you want to enforce a schema, it has to be
done in software. It's doable, but requires great care.
- Transactions!
- The HA story with mongodb is more complex than most people realize
(certainly more complex than we expected). To get real HA with data
safety, you have to do a lot of the work in your own software.
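
As a rough illustration of the first two points above, here is a minimal sketch of a relational schema where the database itself enforces the content/repository relationships. It is not the Pulp 3 data model: the table and column names are hypothetical, and Python's built-in sqlite3 module stands in for PostgreSQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is on.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE repository (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE
);
CREATE TABLE content_unit (
    id INTEGER PRIMARY KEY,
    filename TEXT NOT NULL
);
CREATE TABLE repository_content (
    repository_id INTEGER NOT NULL REFERENCES repository(id),
    content_unit_id INTEGER NOT NULL REFERENCES content_unit(id),
    PRIMARY KEY (repository_id, content_unit_id)
);
""")

with conn:
    conn.execute("INSERT INTO repository (id, name) VALUES (1, 'zoo')")
    conn.execute("INSERT INTO content_unit (id, filename) VALUES (10, 'bear.rpm')")
    conn.execute("INSERT INTO repository_content VALUES (1, 10)")


def try_insert(sql):
    """Return True if the database accepts the row, False if it rejects it."""
    try:
        with conn:
            conn.execute(sql)
        return True
    except sqlite3.IntegrityError:
        return False


# A link to a content unit that does not exist is refused by the database.
dangling_accepted = try_insert("INSERT INTO repository_content VALUES (1, 999)")
# So is a content unit that is missing a required field.
missing_field_accepted = try_insert(
    "INSERT INTO content_unit (id, filename) VALUES (11, NULL)"
)

# Reading the relationship back is a plain join.
filenames = [
    row[0]
    for row in conn.execute("""
        SELECT c.filename
        FROM content_unit c
        JOIN repository_content rc ON rc.content_unit_id = c.id
        JOIN repository r ON r.id = rc.repository_id
        WHERE r.name = 'zoo'
    """)
]
print(dangling_accepted, missing_field_accepted, filenames)
```

Both bad writes are rejected at write time, so code that reads the data does not have to guard against dangling links or missing fields.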
MongoDB is great at what it does and a good fit for some use cases, but
we learned that it's not the best fit for Pulp.
Michael
On Tue, Sep 13, 2016 at 3:21 AM, Filip Nguyen <[email protected]> wrote:
I heard that Pulp is switching from Mongo to Postgres. Just out of
curiosity, I would like to learn more about the reasons why you
decided to go this direction. Is there any document/email thread
about it?
_______________________________________________
Pulp-dev mailing list
[email protected]
https://www.redhat.com/mailman/listinfo/pulp-dev