On 1/15/2023 6:14 AM, Peter J. Holzer wrote:
> On 2023-01-14 23:26:27 -0500, Dino wrote:
>> Hello, I have built a PoC service in Python Flask for my work, and - now
>> that the point is made - I need to make it a little more performant (to be
>> honest, chances are that someone else will pick up from where I left off,
>> and implement the same service from scratch in a different language (GoLang?
>> .Net? Java?) but I am digressing).
>>
>> Anyway, my Flask service initializes by loading a big "table" of 100k rows
>> and 40 columns or so (memory footprint: order of 300 Mb)
>
> 300 MB is large enough that you should at least consider putting that
> into a database (Sqlite is probably simplest. Personally I would go with
> PostgreSQL because I'm most familiar with it and Sqlite is a bit of an
> outlier).
>
> The main reason for putting it into a database is the ability to use
> indexes, so you don't have to scan all 100 k rows for each query.

I have an (inherited) server program that uses about 30 MB of data in a MySQL database and services queries received over the network. It too had performance problems; adding indexes and writing smarter joins helped, but not enough.
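For concreteness, this is the kind of index that helped at that stage. A minimal sketch using the stdlib sqlite3 module (the real program used MySQL, and the table and column names here - orders, customer_id - are made up for illustration):

import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)")
# Without this index, a query filtered on customer_id scans every row.
# With it, the engine jumps straight to the matching rows.
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
rows = conn.execute(
    "SELECT id, total FROM orders WHERE customer_id = ?", (42,)).fetchall()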

I changed the program so that at startup it imports much of the data into Python dictionaries that are structured to support the kinds of queries that need the help. Response time to queries dropped dramatically. Some kinds of queries needed more help still, so I saved auxiliary collections of (usually highly pre-processed) data to ordinary files, and those too get imported into dictionaries during startup.
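In outline, the startup import looks like this. Again a sketch with the stdlib sqlite3 module and invented names (a users table, a lookup keyed on email, and a second dictionary grouping rows by region to suit a different query shape):

import sqlite3
from collections import defaultdict

conn = sqlite3.connect("app.db")   # hypothetical database file
conn.row_factory = sqlite3.Row

# One dictionary per query shape.  Direct lookup by email:
users_by_email = {}
# Grouped lookup: every user in a region, for a query that used
# to need a full scan:
users_by_region = defaultdict(list)

for row in conn.execute("SELECT email, region, name FROM users"):
    rec = dict(row)
    users_by_email[rec["email"]] = rec
    users_by_region[rec["region"]].append(rec)

# Answering a query is now a dictionary lookup, not a database
# round trip:
user = users_by_email.get("alice@example.com")
northern = users_by_region["north"]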

Note that these dictionaries do not always match the table structures. Some of them change the structure to make queries easier to process. You may be able to do that with Python code, or by creating SQL views in the database and importing directly from the views (database views take almost no database memory).
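Continuing the startup sketch above (same conn and defaultdict import), the view route can look like this. The view name and the join are invented for illustration:

conn.executescript("""
    -- The view reshapes two normalized tables into the flat form
    -- the lookup dictionary wants, without storing a second copy
    -- of the data.
    CREATE VIEW IF NOT EXISTS v_user_orders AS
        SELECT u.email, o.id AS order_id, o.total
        FROM users u JOIN orders o ON o.customer_id = u.id;
""")

orders_by_email = defaultdict(list)
for email, order_id, total in conn.execute(
        "SELECT email, order_id, total FROM v_user_orders"):
    orders_by_email[email].append((order_id, total))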

The drawback is that all that data is now stored in memory while the program is running - in my case, many hundreds of MB. If that would be too much memory for you (you would need to prototype it to know), you should instead let the database engine do the work. It is far more highly optimized for searching than your own code could ever be.

But there will be a price to pay: denormalizing the database table design. That means including redundant data, organized to match the kinds of queries that will be made. No more 3rd normal form! Your SQL queries will need to be designed to take advantage of the new structure. The cost is a larger database, and redundancies that make it much harder to update the data correctly. Fortunately you do not need to do that during normal operation (my program's data was also static, like yours).
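As a concrete illustration of what denormalizing can look like, still with the invented tables from the sketches above: a reporting query that used to join two tables reads one wide, indexed table instead.

conn.executescript("""
    -- Wide table built once from the normalized source tables.
    -- Customer name and region are duplicated onto every order
    -- row: the redundancy is the price, the join-free indexed
    -- query is the payoff.
    CREATE TABLE order_report AS
        SELECT o.id AS order_id, o.total,
               u.name AS customer_name, u.region
        FROM orders o JOIN users u ON u.id = o.customer_id;
    CREATE INDEX idx_report_region ON order_report (region);
""")

# The query now matches the table layout: no join, and the
# region filter is served by the index.
totals = conn.execute(
    "SELECT customer_name, total FROM order_report WHERE region = ?",
    ("north",),
).fetchall()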

PostgreSQL would probably be a better choice than Sqlite here: it enforces foreign keys by default (Sqlite supports them, but enforcement is off unless you enable it per connection), and it lets you define stored functions in the database itself.


