> On 6 Jul 2022, at 1:35 PM, Florents Tselai <florents.tse...@gmail.com> wrote:
>
>> On 6 Jul 2022, at 1:11 PM, Francisco Olarte <fola...@peoplecall.com> wrote:
>>
>> On Wed, 6 Jul 2022 at 11:55, Florents Tselai <florents.tse...@gmail.com> wrote:
>>> Also, FWIW, looking at the CPU% and MEM% activity in top, it looks like it is doing data-crunching work.
>> ...
>>>>> On 06.07.22 10:42, Florents Tselai wrote:
>>>>>> I have a beefy server (40+ worker processes, 40 GB+ shared buffers) and a table holding (key text, text text) of around 50M rows.
>>>>>> These are text fields extracted from 4-5 page PDFs each.
>>
>> How big is your table? From your query it seems you expect more than 1M-1 (left...), but if you have very big text columns it may be spending a lot of time fully decompressing / reading them (I'm not sure whether left(...) on toasted values is optimized to stop after reading enough). It also has to rewrite a lot of data to insert the columns, and a few milliseconds per row adds up: 50M rows * 1 ms/row = 50k secs ≈ 13.9 hours per millisecond-per-row, so at 2 ms (which may be right for reading a big row, calculating the vector, and writing an even bigger row) it would take more than a day to finish, which I would not rule out given you are asking for a heavy operation.
>
> 50M+ rows, and IIRC pg_relation_size was north of 80 GB or so.
>
>> If you have stopped it I would try doing a 1000-row sample in a copied
>
> Haven't stopped it, as I'm not convinced there's an alternative to just waiting for it to complete :/
>
>> table to get a speed idea. Otherwise, with this query, I would normally monitor disk usage of the disk files as an indication of
>> progress; I'm not sure there is another thing you could look at without disturbing it.
>>
>> FWIW, I would consider high mem usage normal in this kind of query; high CPU would depend on what you call it, but it wouldn't surprise me if it has at least one CPU running at full speed, detoasting and computing vectors. I do not know if ALTER TABLE can go parallel.
>
> You're probably right; a lot of the CPU usage could be detoasting.
>
>> Francisco Olarte.

Actually, I monitored my disk usage and it was **definitely** working, as it had already eaten up an additional 30% of my disk capacity. Thus, I'll have to fall back on my initial solution and use GIN indexes to get ts_vectors on the fly.

Thanks everyone for your comments. You can consider this solved.
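For anyone finding this thread later, a minimal sketch of the fallback described above: instead of materializing a tsvector column (which rewrites the whole heap), a GIN expression index computes the ts_vectors on the fly at index-build time and maintains them on writes. The table and column names (`docs`, `key`, `text`) and the `'english'` configuration are assumptions, not taken from the original query:

```sql
-- Hypothetical sketch: table/column names and text-search config are assumed.
-- The expression must use a two-argument to_tsvector() (explicit config),
-- since only immutable expressions can be indexed.
CREATE INDEX docs_text_fts_idx ON docs
    USING GIN (to_tsvector('english', text));

-- Queries must repeat the exact same expression to use the index:
SELECT key
FROM docs
WHERE to_tsvector('english', text) @@ plainto_tsquery('english', 'query terms');
```

Note that building this index still has to detoast and vectorize all 50M rows once, so the build itself is comparably expensive to the ALTER TABLE; the saving is that no extra tsvector column is stored and the table is not rewritten.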