On 1/15/2023 2:39 PM, Peter J. Holzer wrote:
On 2023-01-15 10:38:22 -0500, Thomas Passin wrote:
On 1/15/2023 6:14 AM, Peter J. Holzer wrote:
On 2023-01-14 23:26:27 -0500, Dino wrote:
Anyway, my Flask service initializes by loading a big "table" of 100k rows
and 40 columns or so (memory footprint: on the order of 300 MB)

300 MB is large enough that you should at least consider putting that
into a database (SQLite is probably simplest; personally I would go with
PostgreSQL because I'm most familiar with it, and SQLite is a bit of an
outlier).

The main reason for putting it into a database is the ability to use
indexes, so you don't have to scan all 100 k rows for each query.
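
For instance, something along these lines (a minimal sketch; the table
name "items" and the column "category" are made up for illustration):

import sqlite3

conn = sqlite3.connect("data.db")

# Without an index, a query like
#   SELECT * FROM items WHERE category = ?
# has to scan all 100k rows. With an index on the filtered column,
# SQLite can jump straight to the matching rows.
conn.execute(
    "CREATE INDEX IF NOT EXISTS idx_items_category ON items (category)"
)

rows = conn.execute(
    "SELECT * FROM items WHERE category = ?", ("books",)
).fetchall()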

I have an (inherited) server program that uses about 30 MB of data in a
MySQL database and services queries received over the network. It too had
performance problems; adding indexes and smarter joins helped, but not
enough.

I changed the program so that at startup it imports much of the data into
Python dictionaries that are structured to support the kinds of queries that
need the help.  Response time to queries dropped dramatically.
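
The idea, roughly (a sketch with an invented row layout and key names),
is to key a dict on whatever the queries filter by, so each lookup
becomes a hash access instead of a scan over all the rows:

from collections import defaultdict

# Rows as fetched from the database at startup; the layout is made up
# for illustration.
rows = [
    {"id": 1, "category": "books", "price": 12.50},
    {"id": 2, "category": "music", "price": 9.99},
    {"id": 3, "category": "books", "price": 30.00},
]

# Index the rows by the column the queries filter on.
by_category = defaultdict(list)
for row in rows:
    by_category[row["category"]].append(row)

# A query is now a single dict lookup instead of a scan over all rows.
books = by_category["books"]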

This is to be expected: firstly, because you don't have disk accesses
any more; secondly, because you don't have network latency any more; and
thirdly, because you structured the data to fit the queries.

Of course: that's exactly why I made those changes. The tradeoff is using more memory for your program, sometimes a lot more.

The thing to keep in mind is that the relational database model was
invented to have a uniform and simple way to model all data, and that
RDBMSs are designed to handle all workloads (from a single tiny table to
thousands of tables with hundreds of terabytes) reasonably well. For any
given application you can always find a more efficient solution than
using an RDBMS. Sometimes it's simple (just load all the data into a
dict and serve from there), sometimes it's a major research project.
The nice thing about RDBMSs isn't that they are the optimal solution for
anything but that they are a "good enough" solution for a large class of
problems.

Often the solution is careful (and not very normalized) table design to support your queries. In the case I'm discussing, it was easier for me to make Python do the work, and I could afford the memory load. In other cases, you have to put in the work on the database side. Often, for slow queries, disk latency and I/O are not the limiting factors, but you have to do the testing to make sure.
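
One way to do that testing, at least with SQLite, is to ask for the
query plan and check that an index is actually being used (a small
sketch; the table and column names are the same invented ones as above):

import sqlite3

conn = sqlite3.connect("data.db")

# If the plan reports "SCAN" rather than "SEARCH ... USING INDEX",
# the table/index design still needs work for this query.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM items WHERE category = ?",
    ("books",),
).fetchall()
for row in plan:
    print(row)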
