On Wed, Nov 15, 2017 at 7:55 PM, Huong Dangminh
<huo-dangm...@ys.jp.nec.com> wrote:
> Hi,
>
>> We are getting the bellow error while trying use Logical Replication with
>> user defined data types in a C program (when call elog function).
>>
>> ERROR: XX000: cache lookup failed for type XXXXX
>>
>
> Sorry for continuously disturbing in this topic, but am I missing something
> here?

No, but I'd suggest providing a procedure for reproducing the problem
if possible, which would help the investigation.

> I mean that in case of type's OID in PUBLICATION host does not exists in
> SUBSCRIPTION host's pg_type,
> it could returns unintended error (the XX000 above) when elog or ereport is
> executed.
>
> For more details, it happen in slot_store_error_callback when it try to call
> format_type_be(localtypoid) for errcontext.
> slot_store_error_callback is set in slot_store_cstrings, slot_modify_cstrings
> function and it also be unset here, so the effect here is small but it
> happens.
>

I think I found the cause of this issue, and it is a bug. It can be
reproduced, for example, if the input function of the data type calls
elog() while changes are being applied in an environment where the OID
of the data type differs between publisher and subscriber. The cause
is that we call format_type_be() with remotetypoid. If the OIDs of the
data type differ between publisher and subscriber, we look it up in
the syscache using an OID that doesn't exist on the subscriber.

As for your patch, I don't think this direction is good. Since the
subscriber already has a LogicalRepTyp cache entry for the type, we
can report the error message using the data type name. So I think
this issue can be fixed by using the remote type name obtained from
the cache.

Also, I'm confused about the errcontext message. Currently we store
in the cache the local data type OID that corresponds to the remote
data type name, and then we look up the local data type name by that
cached local data type OID. This means that the local data type OID
and the remote data type OID always refer to the same data type. We
use both data type OIDs for the log message in
slot_store_error_callback, but I think what the function wants to do
is show different type names when the table definitions on the two
servers differ (e.g. sending jsonb column data to a text column).
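
To make the cache-based idea concrete, here is a minimal standalone
sketch. It is not the attached patch and it does not use the
PostgreSQL headers: Oid is a local typedef, and RemoteTypeEntry,
remote_type_cache, remote_type_name and the OIDs and type names in
the table are all hypothetical stand-ins for a LogicalRepTyp-style
cache. The point is only that the name comes from what the publisher
sent, so nothing is looked up in the local catalog by a remote OID.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for PostgreSQL's Oid type. */
typedef unsigned int Oid;

/* Hypothetical, simplified shape of a LogicalRepTyp-like cache entry. */
typedef struct RemoteTypeEntry
{
    Oid         remoteid;   /* type OID on the publisher */
    const char *nspname;    /* schema name received from the publisher */
    const char *typname;    /* type name received from the publisher */
} RemoteTypeEntry;

/* Example cache contents; these OIDs and names are made up. */
const RemoteTypeEntry remote_type_cache[] = {
    {16385, "public", "my_composite"},
    {16390, "public", "my_enum"},
};

/*
 * Format "schema.type" for a publisher-side OID into buf, using only
 * the cached names.  An unknown OID yields a placeholder string rather
 * than an error, since this is meant to run inside an error context
 * callback where raising another error would hide the original one.
 */
const char *
remote_type_name(Oid remoteid, char *buf, size_t buflen)
{
    size_t      n = sizeof(remote_type_cache) / sizeof(remote_type_cache[0]);
    size_t      i;

    for (i = 0; i < n; i++)
    {
        if (remote_type_cache[i].remoteid == remoteid)
        {
            snprintf(buf, buflen, "%s.%s",
                     remote_type_cache[i].nspname,
                     remote_type_cache[i].typname);
            return buf;
        }
    }

    snprintf(buf, buflen, "unrecognized type %u", remoteid);
    return buf;
}
```

With this table, remote_type_name(16385, buf, sizeof(buf)) fills buf
with "public.my_composite", and an OID that is not in the cache gives
"unrecognized type <oid>" instead of a cache lookup failure.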
I think we should use the type of the local relation attribute rather
than the remote one. The attached draft patch fixes this issue, at
least in my environment. Please review it.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

fix_slot_store_error_callback.patch