In addition to what @mhrivnak said, the big motivation for me is transaction support. A single Pulp sync or publish can issue thousands of writes to the database. A failure in the middle leaves the database "half-updated", and Pulp has no feasible way to roll back those changes. This is a major problem for data correctness in the face of failures. Transaction support at the database layer will give Pulp an opportunity to recover from these failures and preserve correctness.
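
To make the "half-updated" problem concrete, here is a minimal sketch of the rollback behavior a transaction provides. It uses Python's built-in sqlite3 module purely as a stand-in for PostgreSQL, and the table name content_unit and the function sync_units are hypothetical names for illustration, not Pulp code.

```python
import sqlite3


def sync_units(conn, units):
    """Write all units in one transaction (hypothetical helper, not Pulp code).

    Returns True if every write was committed, False if the batch failed
    and was rolled back as a whole.
    """
    try:
        # Used as a context manager, the connection commits if the block
        # finishes normally and rolls back if an exception escapes it.
        with conn:
            for name in units:
                if name is None:
                    # Stand-in for a failure in the middle of a sync.
                    raise ValueError("simulated failure mid-sync")
                conn.execute(
                    "INSERT INTO content_unit (name) VALUES (?)", (name,)
                )
    except ValueError:
        return False
    return True


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE content_unit (name TEXT NOT NULL)")

# The third write fails, so the first two are rolled back as well.
ok = sync_units(conn, ["a.rpm", "b.rpm", None, "d.rpm"])
count = conn.execute("SELECT COUNT(*) FROM content_unit").fetchone()[0]
```

After the failed call the table is empty rather than holding the first two rows, which is the property Pulp cannot get today without doing the bookkeeping itself.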

From a high level, Pulp's transition to PostgreSQL is about correctness, not performance. We don't want to give up performance, but it is a secondary concern behind correctness. Pulp 2.y hasn't done much to make its read and write performance really benefit from "the mongodb way"[0], so in switching I expect to see "similar" performance. We would need to benchmark 2.y against 3.y to really know. We are not planning to do that, so we may never know, but here is an outline of how to track performance [1].

[0]: loosening write/read consistency and deployments that use sharding
[1]: https://etherpad.net/p/pulp_performance_test_plan

-Brian

On 09/13/2016 09:11 AM, Michael Hrivnak wrote:
We have a thread here about a lot of the 3.0 stack choices, although it
seems to skip past the assumption that we're moving to postgres:

https://www.redhat.com/archives/pulp-list/2016-May/msg00042.html

I can't quickly find another summary of why, so I'll describe the
highlights here:

- Pulp has highly relational data. The core use case is managing the
relationships between content and repositories. Using a relational DB
makes that a lot easier.
- A schemaless DB makes it easy to do writes, but you have to be very
careful when doing reads that your software is prepared for whatever
data structure comes out. If you want to enforce a schema, it has to be
done in software. It's doable, but requires great care.
- Transactions!
- The HA story with mongodb is more complex than most people realize
(certainly more complex than we expected). To get real HA with data
safety, you have to do a lot of the work in your own software.

MongoDB is great at what it does and a good fit for some use cases, but
we learned that it's not the best fit for Pulp.

Michael

On Tue, Sep 13, 2016 at 3:21 AM, Filip Nguyen <[email protected]
<mailto:[email protected]>> wrote:

    I heard that Pulp is switching from Mongo to PostgreSQL. Just out of
    curiosity, I would like to learn more about the reasons why you
    decided to go this direction. Is there any document/email thread
    about it?

    _______________________________________________
    Pulp-dev mailing list
    [email protected] <mailto:[email protected]>
    https://www.redhat.com/mailman/listinfo/pulp-dev
    <https://www.redhat.com/mailman/listinfo/pulp-dev>



