Re: Add parallelism and glibc dependent only options to reindexdb

2019-07-28 Thread Tom Lane
Andres Freund writes: > On 2019-07-28 10:07:27 -0400, Tom Lane wrote: >> In the long run, might we ever switch to 64-bit OIDs? I dunno. > Depends on the the table, I'd say. Having toast tables have 64bit ids, > and not advance the oid counter, would be quite the advantage over the > current

Re: Add parallelism and glibc dependent only options to reindexdb

2019-07-28 Thread Andres Freund
Hi, On 2019-07-28 10:07:27 -0400, Tom Lane wrote: > In the long run, might we ever switch to 64-bit OIDs? I dunno. > Now that we kicked them out of user tables, it might be feasible, > but by the same token there's not much pressure to do it. Depends on the the table, I'd say. Having toast

Re: Add parallelism and glibc dependent only options to reindexdb

2019-07-28 Thread Tom Lane
Michael Paquier writes: > Just wondering something... List cells include one pointer, one > signed integer and an OID. The two last entries are basically 4-byte > each, hence could we reduce a bit the bloat by unifying both of them? We couldn't really simplify the API that way; for example,

Re: Add parallelism and glibc dependent only options to reindexdb

2019-07-28 Thread Michael Paquier
On Mon, Jul 22, 2019 at 01:05:32PM -0400, Alvaro Herrera wrote: > Fair enough. List has gotten quite sophisticated now, so I understand > the concern. Just wondering something... List cells include one pointer, one signed integer and an OID. The two last entries are basically 4-byte each,

Re: Add parallelism and glibc dependent only options to reindexdb

2019-07-27 Thread Julien Rouhaud
On Sat, Jul 27, 2019 at 3:27 PM Michael Paquier wrote: > > On Sat, Jul 27, 2019 at 11:44:47AM +0200, Julien Rouhaud wrote: > > That's probably still more intuitive than having the count coming from > > either main() or from get_parallel_object_list() depending on the > > process type, so I'm fine

Re: Add parallelism and glibc dependent only options to reindexdb

2019-07-27 Thread Michael Paquier
On Sat, Jul 27, 2019 at 11:44:47AM +0200, Julien Rouhaud wrote: > That's probably still more intuitive than having the count coming from > either main() or from get_parallel_object_list() depending on the > process type, so I'm fine with that alternative. Maybe we could bite > the bullet and add

Re: Add parallelism and glibc dependent only options to reindexdb

2019-07-27 Thread Julien Rouhaud
On Fri, Jul 26, 2019 at 9:41 AM Michael Paquier wrote: > > On Fri, Jul 26, 2019 at 09:36:32AM +0200, Julien Rouhaud wrote: > > I see that you iterate over the SimpleStringList after it's generated. > > Why not computing that while building it in get_parallel_object_list > > (and keep the provided

Re: Add parallelism and glibc dependent only options to reindexdb

2019-07-26 Thread Michael Paquier
On Fri, Jul 26, 2019 at 10:53:03AM +0300, Sergei Kornilov wrote: > Explicit is better than implicit, so I am +1 to commit both patches. Hence my count is incorrect: - Forbid --jobs and --index: Michael P, Sergei K. - Enforce --jobs=1 with --index: Julien R. - Have no restrictions: 0. -- Michael

Re: Add parallelism and glibc dependent only options to reindexdb

2019-07-26 Thread Sergei Kornilov
Hi > So here we go: > - 0001 is your original thing, with --jobs enforced to 1 for the index > part. > - 0002 is my addition to forbid --index with --jobs. > > I am fine to be outvoted regarding 0002, and it is the case based on > the state of this thread with 2:1. We could always revisit this >

Re: Add parallelism and glibc dependent only options to reindexdb

2019-07-26 Thread Michael Paquier
On Fri, Jul 26, 2019 at 09:36:32AM +0200, Julien Rouhaud wrote: > I see that you iterate over the SimpleStringList after it's generated. > Why not computing that while building it in get_parallel_object_list > (and keep the provided table list count) instead? Yeah. I was hesitating to do that,

Re: Add parallelism and glibc dependent only options to reindexdb

2019-07-26 Thread Julien Rouhaud
On Fri, Jul 26, 2019 at 5:27 AM Michael Paquier wrote: > > On Thu, Jul 25, 2019 at 01:00:34PM +0200, Julien Rouhaud wrote: > > The problem is that a user doing something like: > > > > reindexdb -j48 -i some_index -S s1 -S s2 -S s3 > > > > will probably be disappointed to learn that he has to

Re: Add parallelism and glibc dependent only options to reindexdb

2019-07-25 Thread Michael Paquier
On Thu, Jul 25, 2019 at 01:00:34PM +0200, Julien Rouhaud wrote: > The problem is that a user doing something like: > > reindexdb -j48 -i some_index -S s1 -S s2 -S s3 > > will probably be disappointed to learn that he has to run a specific > command for the index(es) that should be reindexed.

Re: Add parallelism and glibc dependent only options to reindexdb

2019-07-25 Thread Julien Rouhaud
On Thu, Jul 25, 2019 at 12:03 PM Michael Paquier wrote: > > On Thu, Jul 25, 2019 at 12:12:40PM +0300, Sergei Kornilov wrote: > > Thank you, v9 code and behavior looks good for me. Builds cleanly, > > works with older releases. I'll mark Ready to Commiter. > > The restriction with --jobs and

Re: Add parallelism and glibc dependent only options to reindexdb

2019-07-25 Thread Sergei Kornilov
Hi >>>  I don't have a strong opinion here. If we change for >=, it'd be >>>  better to also adapt vacuumdb for consistency. I didn't change it for >>>  now, to stay consistent with vacuumdb. >> >>  Yep, no strong opinion from me too. > > My opinion tends towards consistency. Consistency sounds

Re: Add parallelism and glibc dependent only options to reindexdb

2019-07-25 Thread Michael Paquier
On Thu, Jul 25, 2019 at 12:12:40PM +0300, Sergei Kornilov wrote: > Thank you, v9 code and behavior looks good for me. Builds cleanly, > works with older releases. I'll mark Ready to Commiter. The restriction with --jobs and --system is not documented, and that's good to have from the start. This

Re: Add parallelism and glibc dependent only options to reindexdb

2019-07-25 Thread Sergei Kornilov
Hi Thank you, v9 code and behavior looks good for me. Builds cleanly, works with older releases. I'll mark Ready to Commiter. > I don't have a strong opinion here. If we change for >=, it'd be > better to also adapt vacuumdb for consistency. I didn't change it for > now, to stay consistent with

Re: Add parallelism and glibc dependent only options to reindexdb

2019-07-25 Thread Julien Rouhaud
Thanks for the review! On Thu, Jul 25, 2019 at 10:17 AM Sergei Kornilov wrote: > > The following review has been posted through the commitfest application: > make installcheck-world: tested, passed > Implements feature: tested, failed > Spec compliant: not tested >

Re: Add parallelism and glibc dependent only options to reindexdb

2019-07-25 Thread Sergei Kornilov
The following review has been posted through the commitfest application: make installcheck-world: tested, passed Implements feature: tested, failed Spec compliant: not tested Documentation:tested, passed Hi I did some review and have few notes about behavior.

Re: Add parallelism and glibc dependent only options to reindexdb

2019-07-25 Thread Julien Rouhaud
Sorry for the late answer, On Tue, Jul 23, 2019 at 9:38 AM Michael Paquier wrote: > > On Mon, Jul 22, 2019 at 02:40:19PM +0200, Julien Rouhaud wrote: > > Right, so I kept the long option. Also this comment was outdated, as > > the --jobs is now just ignored with a list of indexes, so I fixed

Re: Add parallelism and glibc dependent only options to reindexdb

2019-07-23 Thread Michael Paquier
On Mon, Jul 22, 2019 at 02:40:19PM +0200, Julien Rouhaud wrote: > Right, so I kept the long option. Also this comment was outdated, as > the --jobs is now just ignored with a list of indexes, so I fixed that > too. + if (!parallel) + { + if (user_list == NULL) + { + /*

Re: Add parallelism and glibc dependent only options to reindexdb

2019-07-22 Thread Michael Paquier
On Mon, Jul 22, 2019 at 11:18:06AM -0400, Alvaro Herrera wrote: > BTW "progname" is a global variable in logging.c, and it's initialized > by pg_logging_init(), so there's no point in having a local variable in > main() that's called the same and initialized the same way. You could > just remove

Re: Add parallelism and glibc dependent only options to reindexdb

2019-07-22 Thread Alvaro Herrera
On 2019-Jul-22, Tom Lane wrote: > Alvaro Herrera writes: > > On 2019-Jul-22, Julien Rouhaud wrote: > >> On Mon, Jul 22, 2019 at 5:11 PM Alvaro Herrera > >> wrote: > >>> Can we use List for this instead? > > >> Isn't that for backend code only? > > > Well, we already have palloc() on the

Re: Add parallelism and glibc dependent only options to reindexdb

2019-07-22 Thread Tom Lane
Alvaro Herrera writes: > On 2019-Jul-22, Julien Rouhaud wrote: >> On Mon, Jul 22, 2019 at 5:11 PM Alvaro Herrera >> wrote: >>> Can we use List for this instead? >> Isn't that for backend code only? > Well, we already have palloc() on the frontend side, and list.c doesn't > have any

Re: Add parallelism and glibc dependent only options to reindexdb

2019-07-22 Thread Alvaro Herrera
On 2019-Jul-22, Julien Rouhaud wrote: > On Mon, Jul 22, 2019 at 5:11 PM Alvaro Herrera > wrote: > > > > > I considered this, but it would require to adapt all code that declare > > > SimpleStringList stack variable (vacuumdb.c, clusterdb.c, > > > createuser.c, pg_dumpall.c and pg_dump.c), so it

Re: Add parallelism and glibc dependent only options to reindexdb

2019-07-22 Thread Julien Rouhaud
On Mon, Jul 22, 2019 at 5:11 PM Alvaro Herrera wrote: > > On 2019-Jul-22, Julien Rouhaud wrote: > > > On Mon, Jul 22, 2019 at 6:11 AM Michael Paquier wrote: > > > >simple_string_list_append(, optarg); > > > + tbl_count++; > > >break; > > > The number of items in a simple

Re: Add parallelism and glibc dependent only options to reindexdb

2019-07-22 Thread Alvaro Herrera
On 2019-Jul-19, Julien Rouhaud wrote: > > For the second patch, could you send a rebase with a fix for the > > connection slot when processing the reindex commands? > > Attached, I also hopefully removed all the now unneeded progname usage. BTW "progname" is a global variable in logging.c, and

Re: Add parallelism and glibc dependent only options to reindexdb

2019-07-22 Thread Alvaro Herrera
On 2019-Jul-22, Julien Rouhaud wrote: > On Mon, Jul 22, 2019 at 6:11 AM Michael Paquier wrote: > >simple_string_list_append(, optarg); > > + tbl_count++; > >break; > > The number of items in a simple list is not counted, and vacuumdb does > > the same thing to count

Re: Add parallelism and glibc dependent only options to reindexdb

2019-07-22 Thread Julien Rouhaud
On Mon, Jul 22, 2019 at 6:11 AM Michael Paquier wrote: > > On Fri, Jul 19, 2019 at 08:29:27AM +0200, Julien Rouhaud wrote: > > On Fri, Jul 19, 2019 at 2:35 AM Michael Paquier wrote: > >> For the second patch, could you send a rebase with a fix for the > >> connection slot when processing the

Re: Add parallelism and glibc dependent only options to reindexdb

2019-07-21 Thread Michael Paquier
On Fri, Jul 19, 2019 at 08:29:27AM +0200, Julien Rouhaud wrote: > On Fri, Jul 19, 2019 at 2:35 AM Michael Paquier wrote: >> For the second patch, could you send a rebase with a fix for the >> connection slot when processing the reindex commands? > > Attached, I also hopefully removed all the now

Re: Add parallelism and glibc dependent only options to reindexdb

2019-07-19 Thread Julien Rouhaud
On Fri, Jul 19, 2019 at 2:35 AM Michael Paquier wrote: > > On Thu, Jul 18, 2019 at 09:45:14AM +0900, Michael Paquier wrote: > > On Wed, Jul 17, 2019 at 07:46:10PM +0200, Julien Rouhaud wrote: > >> Is it ok to call pg_free(slots) and let caller have a pointer pointing > > to freed memory? > > > >

Re: Add parallelism and glibc dependent only options to reindexdb

2019-07-18 Thread Michael Paquier
On Thu, Jul 18, 2019 at 09:45:14AM +0900, Michael Paquier wrote: > On Wed, Jul 17, 2019 at 07:46:10PM +0200, Julien Rouhaud wrote: >> Is it ok to call pg_free(slots) and let caller have a pointer pointing > to freed memory? > > The interface has a Setup call which initializes the whole thing,

Re: Add parallelism and glibc dependent only options to reindexdb

2019-07-17 Thread Michael Paquier
On Wed, Jul 17, 2019 at 07:46:10PM +0200, Julien Rouhaud wrote: > On Wed, Jul 17, 2019 at 9:59 AM Michael Paquier wrote: >> On top of that quick lookup, I have done an in-depth review on 0001 to >> bring it to a committable state, fixing a couple of typos, incorrect >> comments (description of

Re: Add parallelism and glibc dependent only options to reindexdb

2019-07-17 Thread Julien Rouhaud
On Wed, Jul 17, 2019 at 9:59 AM Michael Paquier wrote: > > On Tue, Jul 16, 2019 at 02:03:16PM +0200, Julien Rouhaud wrote: > > After more thinking about schema and multiple jobs, I think that > > erroring out is quite user unfriendly, as it's entirely ok to ask > > for > > multiple indexes and

Re: Add parallelism and glibc dependent only options to reindexdb

2019-07-17 Thread Michael Paquier
On Tue, Jul 16, 2019 at 02:03:16PM +0200, Julien Rouhaud wrote: > After more thinking about schema and multiple jobs, I think that > erroring out is quite user unfriendly, as it's entirely ok to ask > for > multiple indexes and multiple object that do support parallelism in > a > single call. So

Re: Add parallelism and glibc dependent only options to reindexdb

2019-07-16 Thread Julien Rouhaud
On Fri, Jul 12, 2019 at 11:47 AM Julien Rouhaud wrote: > > I didn't change the behavior wrt. possible deadlock if user specify > catalog objects using --index or --table and ask for multiple > connection, as I'm afraid that it'll add too much code for a little > benefit. Please shout if you

Re: Add parallelism and glibc dependent only options to reindexdb

2019-07-12 Thread Julien Rouhaud
On Fri, Jul 12, 2019 at 7:57 AM Michael Paquier wrote: > > On Fri, Jul 12, 2019 at 07:49:13AM +0200, Julien Rouhaud wrote: > > It shouldn't be a problem, I reused the same infrastructure as for > > vacuumdb. so run_reindex_command has a new "async" parameter, so when > > there's no parallelism

Re: Add parallelism and glibc dependent only options to reindexdb

2019-07-11 Thread Michael Paquier
On Fri, Jul 12, 2019 at 07:49:13AM +0200, Julien Rouhaud wrote: > It shouldn't be a problem, I reused the same infrastructure as for > vacuumdb. so run_reindex_command has a new "async" parameter, so when > there's no parallelism it's using executeMaintenanceCommand (instead > of PQsendQuery)

Re: Add parallelism and glibc dependent only options to reindexdb

2019-07-11 Thread Julien Rouhaud
On Fri, Jul 12, 2019 at 3:20 AM Michael Paquier wrote: > > On Thu, Jul 11, 2019 at 06:22:25PM +0200, Julien Rouhaud wrote: > > I think t hat it makes the code quite cleaner to have the selection > > outside ConsumeIdleSlot. > > Actually, you have an issue with ConsumeIdleSlot() if there is only >

Re: Add parallelism and glibc dependent only options to reindexdb

2019-07-11 Thread Michael Paquier
On Thu, Jul 11, 2019 at 06:22:25PM +0200, Julien Rouhaud wrote: > I think t hat it makes the code quite cleaner to have the selection > outside ConsumeIdleSlot. Actually, you have an issue with ConsumeIdleSlot() if there is only one parallel slot, no? In this case the current patch returns

Re: Add parallelism and glibc dependent only options to reindexdb

2019-07-11 Thread Julien Rouhaud
On Thu, Jul 11, 2019 at 3:34 PM Michael Paquier wrote: > > On Thu, Jul 11, 2019 at 11:48:20AM +0200, Julien Rouhaud wrote: > > On Thu, Jul 11, 2019 at 6:04 AM Michael Paquier wrote: > >> Get* would be my choice, because we look at the set of parallel slots, > >> and get an idle one. > > > >

Re: Add parallelism and glibc dependent only options to reindexdb

2019-07-11 Thread Michael Paquier
On Thu, Jul 11, 2019 at 11:48:20AM +0200, Julien Rouhaud wrote: > On Thu, Jul 11, 2019 at 6:04 AM Michael Paquier wrote: >> Get* would be my choice, because we look at the set of parallel slots, >> and get an idle one. > > That's what the former GetIdleSlot (that I renamed to get_idle_slot as >

Re: Add parallelism and glibc dependent only options to reindexdb

2019-07-11 Thread Julien Rouhaud
On Thu, Jul 11, 2019 at 6:04 AM Michael Paquier wrote: > > On Wed, Jul 10, 2019 at 09:44:14PM +0200, Julien Rouhaud wrote: > > On Wed, Jul 10, 2019 at 4:15 PM Alvaro Herrera > > wrote: > >> Looking good! > > > > Thanks! > > Confirmed. The last set is much easier to go through. > > >> I'm not

Re: Add parallelism and glibc dependent only options to reindexdb

2019-07-10 Thread Michael Paquier
On Wed, Jul 10, 2019 at 09:44:14PM +0200, Julien Rouhaud wrote: > On Wed, Jul 10, 2019 at 4:15 PM Alvaro Herrera > wrote: >> Looking good! > > Thanks! Confirmed. The last set is much easier to go through. >> I'm not sure about the "Consume" word in ConsumeIdleSlot; >> maybe "Reserve"?

Re: Add parallelism and glibc dependent only options to reindexdb

2019-07-10 Thread Julien Rouhaud
Hi Alvaro, Thanks a lot for the review On Wed, Jul 10, 2019 at 4:15 PM Alvaro Herrera wrote: > > On 2019-Jul-09, Julien Rouhaud wrote: > > > I finished to do a better refactoring, and ended up with this API in > > scripts_parallel: > > Looking good! Thanks! > I'm not sure about the "Consume"

Re: Add parallelism and glibc dependent only options to reindexdb

2019-07-10 Thread Alvaro Herrera
On 2019-Jul-09, Julien Rouhaud wrote: > I finished to do a better refactoring, and ended up with this API in > scripts_parallel: Looking good! I'm not sure about the "Consume" word in ConsumeIdleSlot; maybe "Reserve"? "Obtain"? "Get"? Code commentary: I think the comment that sits atop the

Re: Add parallelism and glibc dependent only options to reindexdb

2019-07-09 Thread Michael Paquier
On Tue, Jul 09, 2019 at 01:09:38PM +0200, Peter Eisentraut wrote: > You can already do that: Run a query through psql to get a list of > affected tables or indexes and feed those to reindexdb using -i or -t > options. Sure, but that's limited if one can only afford a limited amount of downtime

Re: Add parallelism and glibc dependent only options to reindexdb

2019-07-09 Thread Julien Rouhaud
On Tue, Jul 9, 2019 at 9:52 AM Julien Rouhaud wrote: > > On Tue, Jul 9, 2019 at 9:24 AM Michael Paquier wrote: > > > > I have done a lookup of this patch set with a focus on the refactoring > > part, and the split is a bit confusing. > [...] I finished to do a better refactoring, and ended up

Re: Add parallelism and glibc dependent only options to reindexdb

2019-07-09 Thread Peter Eisentraut
On 2019-07-08 21:08, Julien Rouhaud wrote: > Don't get me wrong, I do agree that implementing filtering in the > backend is a better design. What's bothering me is that I also agree > that there will be more glibc breakage, and if that happens within a > few years, a lot of people will still be

Re: Add parallelism and glibc dependent only options to reindexdb

2019-07-09 Thread Julien Rouhaud
Thanks for the review. On Tue, Jul 9, 2019 at 9:24 AM Michael Paquier wrote: > > On Mon, Jul 08, 2019 at 11:02:14PM +0200, Julien Rouhaud wrote: > > On Mon, Jul 8, 2019 at 9:08 PM Julien Rouhaud wrote: > >> I'll resubmit the parallel patch using per-table only > >> approach > > > > Attached. >

Re: Add parallelism and glibc dependent only options to reindexdb

2019-07-08 Thread Bruce Momjian
On Mon, Jul 1, 2019 at 09:51:12AM -0400, Alvaro Herrera wrote: > Now that we have REINDEX CONCURRENTLY, I think reindexdb is going to > gain more popularity. > > Please don't reuse a file name as generic as "parallel.c" -- it's > annoying when navigating source. Maybe conn_parallel.c

Re: Add parallelism and glibc dependent only options to reindexdb

2019-07-08 Thread Julien Rouhaud
On Mon, Jul 8, 2019 at 9:08 PM Julien Rouhaud wrote: > > I'll resubmit the parallel patch using per-table only > approach Attached. 0001-Export-vacuumdb-s-parallel-infrastructure.patch Description: Binary data 0002-Add-parallel-processing-to-reindexdb.patch Description: Binary data

Re: Add parallelism and glibc dependent only options to reindexdb

2019-07-08 Thread Julien Rouhaud
On Mon, Jul 8, 2019 at 9:57 AM Michael Paquier wrote: > > On Fri, Jul 05, 2019 at 07:25:41PM +0200, Julien Rouhaud wrote: > > On Fri, Jul 5, 2019 at 6:16 PM Peter Eisentraut > > wrote: > >> Isn't that also the case for your proposal? We are not going to release > >> a new reindexdb before a

Re: Add parallelism and glibc dependent only options to reindexdb

2019-07-08 Thread Michael Paquier
On Fri, Jul 05, 2019 at 07:25:41PM +0200, Julien Rouhaud wrote: > On Fri, Jul 5, 2019 at 6:16 PM Peter Eisentraut > wrote: >> Isn't that also the case for your proposal? We are not going to release >> a new reindexdb before a new REINDEX. > > Sure, but my point was that once the new reindexdb

Re: Add parallelism and glibc dependent only options to reindexdb

2019-07-05 Thread Julien Rouhaud
On Fri, Jul 5, 2019 at 6:16 PM Peter Eisentraut wrote: > > On 2019-07-02 10:30, Julien Rouhaud wrote: > > That's a great idea, and would make the parallelism in reindexdb much > > simpler. There's however a downside, as users won't have a way to > > benefit from index filtering until they

Re: Add parallelism and glibc dependent only options to reindexdb

2019-07-05 Thread Peter Eisentraut
On 2019-07-02 10:45, Julien Rouhaud wrote: > It just seemed wrong to me to allow a partial processing for something > that's aimed to prevent corruption. I'd think that if users are > knowledgeable enough to only reindex a subset of indexes/tables in > such cases, they can also discard indexes

Re: Add parallelism and glibc dependent only options to reindexdb

2019-07-05 Thread Peter Eisentraut
On 2019-07-02 10:30, Julien Rouhaud wrote: > That's a great idea, and would make the parallelism in reindexdb much > simpler. There's however a downside, as users won't have a way to > benefit from index filtering until they upgrade to this version. OTOH > glibc 2.28 is already there, and a

Re: Add parallelism and glibc dependent only options to reindexdb

2019-07-02 Thread Tomas Vondra
On Tue, Jul 02, 2019 at 10:45:44AM +0200, Julien Rouhaud wrote: On Tue, Jul 2, 2019 at 10:28 AM Tomas Vondra wrote: I wonder why this is necessary: pg_log_error("cannot reindex glibc dependent objects and a subset of objects"); What's the reasoning behind that? It seems like a valid use

Re: Add parallelism and glibc dependent only options to reindexdb

2019-07-02 Thread Julien Rouhaud
On Tue, Jul 2, 2019 at 10:28 AM Tomas Vondra wrote: > > I wonder why this is necessary: > > pg_log_error("cannot reindex glibc dependent objects and a subset of > objects"); > > What's the reasoning behind that? It seems like a valid use case to me - > imagine you have a bug database, but only a

Re: Add parallelism and glibc dependent only options to reindexdb

2019-07-02 Thread Julien Rouhaud
On Tue, Jul 2, 2019 at 9:19 AM Peter Eisentraut wrote: > > On 2019-07-01 22:46, Alvaro Herrera wrote: > > On 2019-Jul-02, Thomas Munro wrote: > >> On Tue, Jul 2, 2019 at 8:34 AM Julien Rouhaud wrote: > >>> Even if that's just me being delusional, I'd still prefer Alvaro's > >>> approach to have

Re: Add parallelism and glibc dependent only options to reindexdb

2019-07-02 Thread Tomas Vondra
On Sun, Jun 30, 2019 at 11:45:47AM +0200, Julien Rouhaud wrote: Hi, With the glibc 2.28 coming, all users will have to reindex almost every indexes after a glibc upgrade to guarantee the lack of corruption. Unfortunately, reindexdb is not ideal for that as it's processing everything using a

Re: Add parallelism and glibc dependent only options to reindexdb

2019-07-02 Thread Julien Rouhaud
On Mon, Jul 1, 2019 at 11:21 PM Peter Geoghegan wrote: > > On Mon, Jul 1, 2019 at 1:34 PM Julien Rouhaud wrote: > > I have a vague recollection that ICU was providing some backward > > compatibility so that even if you upgrade your lib you can still get > > the sort order that was active when

Re: Add parallelism and glibc dependent only options to reindexdb

2019-07-02 Thread Peter Eisentraut
On 2019-07-01 22:46, Alvaro Herrera wrote: > On 2019-Jul-02, Thomas Munro wrote: >> On Tue, Jul 2, 2019 at 8:34 AM Julien Rouhaud wrote: >>> Even if that's just me being delusional, I'd still prefer Alvaro's >>> approach to have distinct switches for each collation system. >> >> Makes sense. But

Re: Add parallelism and glibc dependent only options to reindexdb

2019-07-01 Thread Michael Paquier
On Mon, Jul 01, 2019 at 06:14:20PM +0200, Julien Rouhaud wrote: > On Mon, Jul 1, 2019 at 3:51 PM Alvaro Herrera > wrote: > > > > Please don't reuse a file name as generic as "parallel.c" -- it's > > annoying when navigating source. Maybe conn_parallel.c multiconn.c > > connscripts.c

Re: Add parallelism and glibc dependent only options to reindexdb

2019-07-01 Thread Michael Paquier
On Mon, Jul 01, 2019 at 06:28:13PM +0200, Julien Rouhaud wrote: > On Mon, Jul 1, 2019 at 4:10 PM Tom Lane wrote: >> Couldn't we make this enormously simpler and less bug-prone by just >> dictating that --jobs applies only to reindex-table operations? I had the same argument about the first patch

Re: Add parallelism and glibc dependent only options to reindexdb

2019-07-01 Thread Alvaro Herrera
On 2019-Jul-02, Thomas Munro wrote: > On Tue, Jul 2, 2019 at 8:34 AM Julien Rouhaud wrote: > > Even if that's just me being delusional, I'd still prefer Alvaro's > > approach to have distinct switches for each collation system. > > Hi Julien, > > Makes sense. But why use the name "glibc" in

Re: Add parallelism and glibc dependent only options to reindexdb

2019-07-01 Thread Thomas Munro
On Tue, Jul 2, 2019 at 8:34 AM Julien Rouhaud wrote: > Even if that's just me being delusional, I'd still prefer Alvaro's > approach to have distinct switches for each collation system. Hi Julien, Makes sense. But why use the name "glibc" in the code and user interface? The name of the

Re: Add parallelism and glibc dependent only options to reindexdb

2019-07-01 Thread Peter Geoghegan
On Mon, Jul 1, 2019 at 1:34 PM Julien Rouhaud wrote: > I have a vague recollection that ICU was providing some backward > compatibility so that even if you upgrade your lib you can still get > the sort order that was active when you built your indexes, though > maybe for a limited number of

Re: Add parallelism and glibc dependent only options to reindexdb

2019-07-01 Thread Julien Rouhaud
On Mon, Jul 1, 2019 at 10:13 PM Daniel Verite wrote: > > > > I think you'd be better off to define and document this as "reindex > > > only collation-sensitive indexes", without any particular reference > > > to a reason why somebody might want to do that. > > > > We should still document that

Re: Add parallelism and glibc dependent only options to reindexdb

2019-07-01 Thread Alvaro Herrera
On 2019-Jul-01, Daniel Verite wrote: > But why exclude them? > As a data point, in the last 5 years, the en_US collation in ICU > had 7 different versions (across 11 major versions of ICU): So we need a switch --include-rule=icu-collations? (I mentioned "--exclude-rule=glibc" elsewhere in the

Re: Add parallelism and glibc dependent only options to reindexdb

2019-07-01 Thread Daniel Verite
Julien Rouhaud wrote: > > I think you'd be better off to define and document this as "reindex > > only collation-sensitive indexes", without any particular reference > > to a reason why somebody might want to do that. > > We should still document that indexes based on ICU would be

Re: Add parallelism and glibc dependent only options to reindexdb

2019-07-01 Thread Julien Rouhaud
On Mon, Jul 1, 2019 at 4:10 PM Tom Lane wrote: > > Michael Paquier writes: > > - 0003 begins to be the actual fancy thing with the addition of a > > --jobs option into reindexdb. The main issue here which should be > > discussed is that when it comes to reindex of tables, you basically > > are

Re: Add parallelism and glibc dependent only options to reindexdb

2019-07-01 Thread Julien Rouhaud
On Mon, Jul 1, 2019 at 3:51 PM Alvaro Herrera wrote: > > Please don't reuse a file name as generic as "parallel.c" -- it's > annoying when navigating source. Maybe conn_parallel.c multiconn.c > connscripts.c admconnection.c ...? I could use scripts_parallel.[ch] as I've already used it in the

Re: Add parallelism and glibc dependent only options to reindexdb

2019-07-01 Thread Tom Lane
Michael Paquier writes: > - 0003 begins to be the actual fancy thing with the addition of a > --jobs option into reindexdb. The main issue here which should be > discussed is that when it comes to reindex of tables, you basically > are not going to have any conflicts between the objects

Re: Add parallelism and glibc dependent only options to reindexdb

2019-07-01 Thread Alvaro Herrera
Now that we have REINDEX CONCURRENTLY, I think reindexdb is going to gain more popularity. Please don't reuse a file name as generic as "parallel.c" -- it's annoying when navigating source. Maybe conn_parallel.c multiconn.c connscripts.c admconnection.c ...? If your server crashes or is stopped

Re: Add parallelism and glibc dependent only options to reindexdb

2019-07-01 Thread Julien Rouhaud
On Mon, Jul 1, 2019 at 10:55 AM Michael Paquier wrote: > > On Sun, Jun 30, 2019 at 11:45:47AM +0200, Julien Rouhaud wrote: > > With the glibc 2.28 coming, all users will have to reindex almost > > every indexes after a glibc upgrade to guarantee the lack of > > corruption. Unfortunately,

Re: Add parallelism and glibc dependent only options to reindexdb

2019-07-01 Thread Michael Paquier
On Sun, Jun 30, 2019 at 11:45:47AM +0200, Julien Rouhaud wrote: > With the glibc 2.28 coming, all users will have to reindex almost > every indexes after a glibc upgrade to guarantee the lack of > corruption. Unfortunately, reindexdb is not ideal for that as it's > processing everything using a

Add parallelism and glibc dependent only options to reindexdb

2019-06-30 Thread Julien Rouhaud
Hi, With the glibc 2.28 coming, all users will have to reindex almost every indexes after a glibc upgrade to guarantee the lack of corruption. Unfortunately, reindexdb is not ideal for that as it's processing everything using a single connexion and isn't able to discard indexes that doesn't