[jira] [Commented] (IMPALA-13020) catalog-topic updates >2GB do not work due to Thrift's max message size

2024-05-19 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-13020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17847651#comment-17847651
 ] 

ASF subversion and git services commented on IMPALA-13020:
--

Commit c8415513158842e2ddb1d64891298d76fb0b367f in impala's branch 
refs/heads/branch-4.4.0 from Joe McDonnell
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=c84155131 ]

IMPALA-13020 (part 1): Change thrift_rpc_max_message_size to int64_t

Thrift 0.16.0 introduced a max message size to protect
receivers against a malicious message allocating large
amounts of memory. That limit is a 32-bit signed integer,
so the max value is 2GB. Impala introduced the
thrift_rpc_max_message_size startup option to set that
for Impala's thrift servers.

There are times when Impala wants to send a message that
is larger than 2GB. In particular, the catalog-update
topic for the statestore can exceed 2GBs when there is
a lot of metadata loaded using the old v1 catalog. When
there is a 2GB max message size, the statestore can create
and send a >2GB message, but the impalads will reject
it. This can lead to impalads having stale metadata.

This switches to a patched Thrift that uses an int64_t
for the max message size for C++ code. It does not modify
the limit.

The MaxMessageSize error was being swallowed in TAcceptQueueServer.cpp,
so this fixes that location to always print MaxMessageSize
exceptions.

This is only patching the Thrift C++ library. It does not
patch the Thrift Java library. There are a few reasons for
that:
 - This specific issue involves C++ to C++ communication and
   will be solved by patching the C++ library.
 - C++ is easy to patch as it is built via the native-toolchain.
   There is no corresponding mechanism for patching our Java
   dependencies (though one could be developed).
 - Java modifications have implications for other dependencies
   like Hive which use Thrift to communicate with HMS.
For the Java code that uses max message size, this converts
the 64-bit value to 32-bit value by capping the value at
Integer.MAX_VALUE.

Testing:
 - Added enough tables to produce a >2GB catalog-topic and
   restarted an impalad with a higher limit specific. Without
   the patch, the catalog-topic update would be rejected by the
   impalad. With the patch, it succeeds.

Change-Id: I681b1849cc565dcb25de8c070c18776ce69cbb87
Reviewed-on: http://gerrit.cloudera.org:8080/21367
Reviewed-by: Michael Smith 
Reviewed-by: Joe McDonnell 
Tested-by: Joe McDonnell 
(cherry picked from commit 13df8239d82a61afc3196295a7878ca2ffe91873)


> catalog-topic updates >2GB do not work due to Thrift's max message size
> ---
>
> Key: IMPALA-13020
> URL: https://issues.apache.org/jira/browse/IMPALA-13020
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 4.2.0, Impala 4.3.0
>Reporter: Joe McDonnell
>Assignee: Joe McDonnell
>Priority: Critical
> Fix For: Impala 4.5.0
>
>
> Thrift 0.16.0 added a max message size to protect against malicious packets 
> that can consume a large amount of memory on the receiver side. This max 
> message size is a signed 32-bit integer, so it maxes out at 2GB (which we set 
> via thrift_rpc_max_message_size).
> In catalog v1, the catalog-update statestore topic can become larger than 2GB 
> when there are a large number of tables / partitions / files. If this happens 
> and an Impala coordinator needs to start up (or needs a full topic update for 
> any other reason), it is expecting the statestore to send it the full topic 
> update, but the coordinator actually can't process the message. The 
> deserialization of the message hits the 2GB max message size limit and fails.
> On the statestore side, it shows this message:
> {noformat}
> I0418 16:54:51.727290 3844140 statestore.cc:507] Preparing initial 
> catalog-update topic update for 
> impa...@mcdonnellthrift.vpc.cloudera.com:27000. Size = 2.27 GB
> I0418 16:54:53.889446 3844140 thrift-util.cc:198] TSocket::write_partial() 
> send() : Broken pipe
> I0418 16:54:53.889488 3844140 client-cache.cc:82] ReopenClient(): re-creating 
> client for mcdonnellthrift.vpc.cloudera.com:23000
> I0418 16:54:53.889493 3844140 thrift-util.cc:198] TSocket::write_partial() 
> send() : Broken pipe
> I0418 16:54:53.889503 3844140 thrift-client.cc:116] Error closing connection 
> to: mcdonnellthrift.vpc.cloudera.com:23000, ignoring (write() send(): Broken 
> pipe)
> I0418 16:54:56.052882 3844140 thrift-util.cc:198] TSocket::write_partial() 
> send() : Broken pipe
> I0418 16:54:56.052932 3844140 client-cache.h:363] RPC Error: Client for 
> mcdonnellthrift.vpc.cloudera.com:23000 hit an unexpected exception: write() 
> send(): Broken pipe, type: N6apache6thrift9transport19TTransportExceptionE, 
> 

[jira] [Commented] (IMPALA-13020) catalog-topic updates >2GB do not work due to Thrift's max message size

2024-05-19 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-13020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17847652#comment-17847652
 ] 

ASF subversion and git services commented on IMPALA-13020:
--

Commit c9745fd5b941f52b3cd3496c425722fcbbffe894 in impala's branch 
refs/heads/branch-4.4.0 from Joe McDonnell
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=c9745fd5b ]

IMPALA-13020 (part 2): Split out external vs internal Thrift max message size

The Thrift max message size is designed to protect against malicious
messages that consume a lot of memory on the receiver. This is an
important security measure for externally facing services, but it
can interfere with internal communication within the cluster.
Currently, the max message size is controlled by a single startup
flag for both. This puts tensions between having a low value to
protect against malicious messages versus having a high value to
avoid issues with internal communication (e.g. large statestore
updates).

This introduces a new flag thrift_external_rpc_max_message_size to
specify the limit for externally-facing services. The current
thrift_rpc_max_message_size now applies only for internal services.
Splitting them apart allows setting a much higher value for
internal services (64GB) while leaving the externally facing services
using the current 2GB limit.

This modifies various code locations that wrap a Thrift transport to
pass in the original transport's TConfiguration. This also adds DCHECKs
to make sure that the new transport inherits the max message size. This
limits the locations where we actually need to set max message size.

ThriftServer/ThriftServerBuilder have a setting "is_external_facing"
which can be specified on each ThriftServer. This modifies statestore
and catalog to set is_external_facing to false. All other servers stay
with the default of true.

Testing:
 - This adds a test case to verify that is_external_facing uses the
   higher limit.
 - Ran through the steps in testdata/scale_test_metadata/README.md
   and updated the value in that doc.
 - Created many tables to push the catalog-update topic to be >2GB
   and verified that statestore successfully sends it when an impalad
   restarts.

Change-Id: Ib9a649ef49a8a99c7bd9a1b73c37c4c621661311
Reviewed-on: http://gerrit.cloudera.org:8080/21420
Tested-by: Impala Public Jenkins 
Reviewed-by: Riza Suminto 
Reviewed-by: Michael Smith 
(cherry picked from commit bcff4df6194b2f192d937bb9c031721feccb69df)


> catalog-topic updates >2GB do not work due to Thrift's max message size
> ---
>
> Key: IMPALA-13020
> URL: https://issues.apache.org/jira/browse/IMPALA-13020
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 4.2.0, Impala 4.3.0
>Reporter: Joe McDonnell
>Assignee: Joe McDonnell
>Priority: Critical
> Fix For: Impala 4.5.0
>
>
> Thrift 0.16.0 added a max message size to protect against malicious packets 
> that can consume a large amount of memory on the receiver side. This max 
> message size is a signed 32-bit integer, so it maxes out at 2GB (which we set 
> via thrift_rpc_max_message_size).
> In catalog v1, the catalog-update statestore topic can become larger than 2GB 
> when there are a large number of tables / partitions / files. If this happens 
> and an Impala coordinator needs to start up (or needs a full topic update for 
> any other reason), it is expecting the statestore to send it the full topic 
> update, but the coordinator actually can't process the message. The 
> deserialization of the message hits the 2GB max message size limit and fails.
> On the statestore side, it shows this message:
> {noformat}
> I0418 16:54:51.727290 3844140 statestore.cc:507] Preparing initial 
> catalog-update topic update for 
> impa...@mcdonnellthrift.vpc.cloudera.com:27000. Size = 2.27 GB
> I0418 16:54:53.889446 3844140 thrift-util.cc:198] TSocket::write_partial() 
> send() : Broken pipe
> I0418 16:54:53.889488 3844140 client-cache.cc:82] ReopenClient(): re-creating 
> client for mcdonnellthrift.vpc.cloudera.com:23000
> I0418 16:54:53.889493 3844140 thrift-util.cc:198] TSocket::write_partial() 
> send() : Broken pipe
> I0418 16:54:53.889503 3844140 thrift-client.cc:116] Error closing connection 
> to: mcdonnellthrift.vpc.cloudera.com:23000, ignoring (write() send(): Broken 
> pipe)
> I0418 16:54:56.052882 3844140 thrift-util.cc:198] TSocket::write_partial() 
> send() : Broken pipe
> I0418 16:54:56.052932 3844140 client-cache.h:363] RPC Error: Client for 
> mcdonnellthrift.vpc.cloudera.com:23000 hit an unexpected exception: write() 
> send(): Broken pipe, type: N6apache6thrift9transport19TTransportExceptionE, 
> rpc: N6impala20TUpdateStateResponseE, send: not done
> I0418 16:54:56.052937 

[jira] [Commented] (IMPALA-13020) catalog-topic updates >2GB do not work due to Thrift's max message size

2024-05-17 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-13020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17847419#comment-17847419
 ] 

ASF subversion and git services commented on IMPALA-13020:
--

Commit 13df8239d82a61afc3196295a7878ca2ffe91873 in impala's branch 
refs/heads/master from Joe McDonnell
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=13df8239d ]

IMPALA-13020 (part 1): Change thrift_rpc_max_message_size to int64_t

Thrift 0.16.0 introduced a max message size to protect
receivers against a malicious message allocating large
amounts of memory. That limit is a 32-bit signed integer,
so the max value is 2GB. Impala introduced the
thrift_rpc_max_message_size startup option to set that
for Impala's thrift servers.

There are times when Impala wants to send a message that
is larger than 2GB. In particular, the catalog-update
topic for the statestore can exceed 2GBs when there is
a lot of metadata loaded using the old v1 catalog. When
there is a 2GB max message size, the statestore can create
and send a >2GB message, but the impalads will reject
it. This can lead to impalads having stale metadata.

This switches to a patched Thrift that uses an int64_t
for the max message size for C++ code. It does not modify
the limit.

The MaxMessageSize error was being swallowed in TAcceptQueueServer.cpp,
so this fixes that location to always print MaxMessageSize
exceptions.

This is only patching the Thrift C++ library. It does not
patch the Thrift Java library. There are a few reasons for
that:
 - This specific issue involves C++ to C++ communication and
   will be solved by patching the C++ library.
 - C++ is easy to patch as it is built via the native-toolchain.
   There is no corresponding mechanism for patching our Java
   dependencies (though one could be developed).
 - Java modifications have implications for other dependencies
   like Hive which use Thrift to communicate with HMS.
For the Java code that uses max message size, this converts
the 64-bit value to 32-bit value by capping the value at
Integer.MAX_VALUE.

Testing:
 - Added enough tables to produce a >2GB catalog-topic and
   restarted an impalad with a higher limit specific. Without
   the patch, the catalog-topic update would be rejected by the
   impalad. With the patch, it succeeds.

Change-Id: I681b1849cc565dcb25de8c070c18776ce69cbb87
Reviewed-on: http://gerrit.cloudera.org:8080/21367
Reviewed-by: Michael Smith 
Reviewed-by: Joe McDonnell 
Tested-by: Joe McDonnell 


> catalog-topic updates >2GB do not work due to Thrift's max message size
> ---
>
> Key: IMPALA-13020
> URL: https://issues.apache.org/jira/browse/IMPALA-13020
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 4.2.0, Impala 4.3.0
>Reporter: Joe McDonnell
>Priority: Critical
>
> Thrift 0.16.0 added a max message size to protect against malicious packets 
> that can consume a large amount of memory on the receiver side. This max 
> message size is a signed 32-bit integer, so it maxes out at 2GB (which we set 
> via thrift_rpc_max_message_size).
> In catalog v1, the catalog-update statestore topic can become larger than 2GB 
> when there are a large number of tables / partitions / files. If this happens 
> and an Impala coordinator needs to start up (or needs a full topic update for 
> any other reason), it is expecting the statestore to send it the full topic 
> update, but the coordinator actually can't process the message. The 
> deserialization of the message hits the 2GB max message size limit and fails.
> On the statestore side, it shows this message:
> {noformat}
> I0418 16:54:51.727290 3844140 statestore.cc:507] Preparing initial 
> catalog-update topic update for 
> impa...@mcdonnellthrift.vpc.cloudera.com:27000. Size = 2.27 GB
> I0418 16:54:53.889446 3844140 thrift-util.cc:198] TSocket::write_partial() 
> send() : Broken pipe
> I0418 16:54:53.889488 3844140 client-cache.cc:82] ReopenClient(): re-creating 
> client for mcdonnellthrift.vpc.cloudera.com:23000
> I0418 16:54:53.889493 3844140 thrift-util.cc:198] TSocket::write_partial() 
> send() : Broken pipe
> I0418 16:54:53.889503 3844140 thrift-client.cc:116] Error closing connection 
> to: mcdonnellthrift.vpc.cloudera.com:23000, ignoring (write() send(): Broken 
> pipe)
> I0418 16:54:56.052882 3844140 thrift-util.cc:198] TSocket::write_partial() 
> send() : Broken pipe
> I0418 16:54:56.052932 3844140 client-cache.h:363] RPC Error: Client for 
> mcdonnellthrift.vpc.cloudera.com:23000 hit an unexpected exception: write() 
> send(): Broken pipe, type: N6apache6thrift9transport19TTransportExceptionE, 
> rpc: N6impala20TUpdateStateResponseE, send: not done
> I0418 16:54:56.052937 3844140 client-cache.cc:174] Broken Connection, destroy 
> client for 

[jira] [Commented] (IMPALA-13020) catalog-topic updates >2GB do not work due to Thrift's max message size

2024-05-17 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-13020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17847420#comment-17847420
 ] 

ASF subversion and git services commented on IMPALA-13020:
--

Commit bcff4df6194b2f192d937bb9c031721feccb69df in impala's branch 
refs/heads/master from Joe McDonnell
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=bcff4df61 ]

IMPALA-13020 (part 2): Split out external vs internal Thrift max message size

The Thrift max message size is designed to protect against malicious
messages that consume a lot of memory on the receiver. This is an
important security measure for externally facing services, but it
can interfere with internal communication within the cluster.
Currently, the max message size is controlled by a single startup
flag for both. This puts tensions between having a low value to
protect against malicious messages versus having a high value to
avoid issues with internal communication (e.g. large statestore
updates).

This introduces a new flag thrift_external_rpc_max_message_size to
specify the limit for externally-facing services. The current
thrift_rpc_max_message_size now applies only for internal services.
Splitting them apart allows setting a much higher value for
internal services (64GB) while leaving the externally facing services
using the current 2GB limit.

This modifies various code locations that wrap a Thrift transport to
pass in the original transport's TConfiguration. This also adds DCHECKs
to make sure that the new transport inherits the max message size. This
limits the locations where we actually need to set max message size.

ThriftServer/ThriftServerBuilder have a setting "is_external_facing"
which can be specified on each ThriftServer. This modifies statestore
and catalog to set is_external_facing to false. All other servers stay
with the default of true.

Testing:
 - This adds a test case to verify that is_external_facing uses the
   higher limit.
 - Ran through the steps in testdata/scale_test_metadata/README.md
   and updated the value in that doc.
 - Created many tables to push the catalog-update topic to be >2GB
   and verified that statestore successfully sends it when an impalad
   restarts.

Change-Id: Ib9a649ef49a8a99c7bd9a1b73c37c4c621661311
Reviewed-on: http://gerrit.cloudera.org:8080/21420
Tested-by: Impala Public Jenkins 
Reviewed-by: Riza Suminto 
Reviewed-by: Michael Smith 


> catalog-topic updates >2GB do not work due to Thrift's max message size
> ---
>
> Key: IMPALA-13020
> URL: https://issues.apache.org/jira/browse/IMPALA-13020
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 4.2.0, Impala 4.3.0
>Reporter: Joe McDonnell
>Priority: Critical
>
> Thrift 0.16.0 added a max message size to protect against malicious packets 
> that can consume a large amount of memory on the receiver side. This max 
> message size is a signed 32-bit integer, so it maxes out at 2GB (which we set 
> via thrift_rpc_max_message_size).
> In catalog v1, the catalog-update statestore topic can become larger than 2GB 
> when there are a large number of tables / partitions / files. If this happens 
> and an Impala coordinator needs to start up (or needs a full topic update for 
> any other reason), it is expecting the statestore to send it the full topic 
> update, but the coordinator actually can't process the message. The 
> deserialization of the message hits the 2GB max message size limit and fails.
> On the statestore side, it shows this message:
> {noformat}
> I0418 16:54:51.727290 3844140 statestore.cc:507] Preparing initial 
> catalog-update topic update for 
> impa...@mcdonnellthrift.vpc.cloudera.com:27000. Size = 2.27 GB
> I0418 16:54:53.889446 3844140 thrift-util.cc:198] TSocket::write_partial() 
> send() : Broken pipe
> I0418 16:54:53.889488 3844140 client-cache.cc:82] ReopenClient(): re-creating 
> client for mcdonnellthrift.vpc.cloudera.com:23000
> I0418 16:54:53.889493 3844140 thrift-util.cc:198] TSocket::write_partial() 
> send() : Broken pipe
> I0418 16:54:53.889503 3844140 thrift-client.cc:116] Error closing connection 
> to: mcdonnellthrift.vpc.cloudera.com:23000, ignoring (write() send(): Broken 
> pipe)
> I0418 16:54:56.052882 3844140 thrift-util.cc:198] TSocket::write_partial() 
> send() : Broken pipe
> I0418 16:54:56.052932 3844140 client-cache.h:363] RPC Error: Client for 
> mcdonnellthrift.vpc.cloudera.com:23000 hit an unexpected exception: write() 
> send(): Broken pipe, type: N6apache6thrift9transport19TTransportExceptionE, 
> rpc: N6impala20TUpdateStateResponseE, send: not done
> I0418 16:54:56.052937 3844140 client-cache.cc:174] Broken Connection, destroy 
> client for mcdonnellthrift.vpc.cloudera.com:23000{noformat}
> On the Impala side, it doesn't 

[jira] [Commented] (IMPALA-13020) catalog-topic updates >2GB do not work due to Thrift's max message size

2024-04-18 Thread Joe McDonnell (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-13020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17838830#comment-17838830
 ] 

Joe McDonnell commented on IMPALA-13020:


[~stigahuang] pointed out that the exception is not being printed due to this 
logic in TAcceptQueueServer.cpp (see 
[https://github.com/apache/impala/blob/master/be/src/rpc/TAcceptQueueServer.cpp#L91]
 ):
{noformat}
    } catch (const TTransportException& ttx) {
      if (ttx.getType() != TTransportException::END_OF_FILE) {
        string errStr = string("TAcceptQueueServer client died: ") + ttx.what();
        GlobalOutput(errStr.c_str());
      }{noformat}
The MaxMessageSize exception uses END_OF_FILE, so it doesn't get printed. If we 
print all exceptions, we get:
{noformat}
I0418 18:39:55.026305 3241592 thrift-util.cc:198] TAcceptQueueServer client 
died: MaxMessageSize reached{noformat}

> catalog-topic updates >2GB do not work due to Thrift's max message size
> ---
>
> Key: IMPALA-13020
> URL: https://issues.apache.org/jira/browse/IMPALA-13020
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 4.4.0
>Reporter: Joe McDonnell
>Priority: Critical
>
> Thrift 0.16.0 added a max message size to protect against malicious packets 
> that can consume a large amount of memory on the receiver side. This max 
> message size is a signed 32-bit integer, so it maxes out at 2GB (which we set 
> via thrift_rpc_max_message_size).
> In catalog v1, the catalog-update statestore topic can become larger than 2GB 
> when there are a large number of tables / partitions / files. If this happens 
> and an Impala coordinator needs to start up (or needs a full topic update for 
> any other reason), it is expecting the statestore to send it the full topic 
> update, but the coordinator actually can't process the message. The 
> deserialization of the message hits the 2GB max message size limit and fails.
> On the statestore side, it shows this message:
> {noformat}
> I0418 16:54:51.727290 3844140 statestore.cc:507] Preparing initial 
> catalog-update topic update for 
> impa...@mcdonnellthrift.vpc.cloudera.com:27000. Size = 2.27 GB
> I0418 16:54:53.889446 3844140 thrift-util.cc:198] TSocket::write_partial() 
> send() : Broken pipe
> I0418 16:54:53.889488 3844140 client-cache.cc:82] ReopenClient(): re-creating 
> client for mcdonnellthrift.vpc.cloudera.com:23000
> I0418 16:54:53.889493 3844140 thrift-util.cc:198] TSocket::write_partial() 
> send() : Broken pipe
> I0418 16:54:53.889503 3844140 thrift-client.cc:116] Error closing connection 
> to: mcdonnellthrift.vpc.cloudera.com:23000, ignoring (write() send(): Broken 
> pipe)
> I0418 16:54:56.052882 3844140 thrift-util.cc:198] TSocket::write_partial() 
> send() : Broken pipe
> I0418 16:54:56.052932 3844140 client-cache.h:363] RPC Error: Client for 
> mcdonnellthrift.vpc.cloudera.com:23000 hit an unexpected exception: write() 
> send(): Broken pipe, type: N6apache6thrift9transport19TTransportExceptionE, 
> rpc: N6impala20TUpdateStateResponseE, send: not done
> I0418 16:54:56.052937 3844140 client-cache.cc:174] Broken Connection, destroy 
> client for mcdonnellthrift.vpc.cloudera.com:23000{noformat}
> On the Impala side, it doesn't give a good error, but we see this:
> {noformat}
> I0418 16:54:53.889683 3214537 TAcceptQueueServer.cpp:355] New connection to 
> server StatestoreSubscriber from client 
> I0418 16:54:54.080694 3214136 Frontend.java:1837] Waiting for local catalog 
> to be initialized, attempt: 110
> I0418 16:54:56.080920 3214136 Frontend.java:1837] Waiting for local catalog 
> to be initialized, attempt: 111
> I0418 16:54:58.081131 3214136 Frontend.java:1837] Waiting for local catalog 
> to be initialized, attempt: 112
> I0418 16:55:00.081358 3214136 Frontend.java:1837] Waiting for local catalog 
> to be initialized, attempt: 113{noformat}
> With a patch Thrift that allows an int64_t max message size and setting that 
> to a larger value, the Impala was able to start up (even without restarting 
> the statestored).
> Some clusters that upgrade to a newer version may hit this, as Thrift didn't 
> use to enforce this limit, so this is something we should fix to avoid 
> upgrade issues.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-13020) catalog-topic updates >2GB do not work due to Thrift's max message size

2024-04-18 Thread Joe McDonnell (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-13020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17838828#comment-17838828
 ] 

Joe McDonnell commented on IMPALA-13020:


One way to make the catalog-update topic large is to create tables with many 
partitions. Example:
{noformat}
set mt_dop=20;
create table {0} partitioned by (ss_sold_time_sk) as select ss_item_sk, 
ss_sold_time_sk from tpcds.store_sales;{noformat}
Each of these tables adds about 25MB to the catalog-update topic, so getting to 
about 90 of these tables puts the topic well above 2GB.

This is resource intensive and needs a sizeable machine. Starting the cluster 
with a single impalad (i.e. bin/start-impala-cluster.py -s 1) helps keep the 
memory footprint down. There are easier / less resource intensive ways to do 
this.

> catalog-topic updates >2GB do not work due to Thrift's max message size
> ---
>
> Key: IMPALA-13020
> URL: https://issues.apache.org/jira/browse/IMPALA-13020
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 4.4.0
>Reporter: Joe McDonnell
>Priority: Critical
>
> Thrift 0.16.0 added a max message size to protect against malicious packets 
> that can consume a large amount of memory on the receiver side. This max 
> message size is a signed 32-bit integer, so it maxes out at 2GB (which we set 
> via thrift_rpc_max_message_size).
> In catalog v1, the catalog-update statestore topic can become larger than 2GB 
> when there are a large number of tables / partitions / files. If this happens 
> and an Impala coordinator needs to start up (or needs a full topic update for 
> any other reason), it is expecting the statestore to send it the full topic 
> update, but the coordinator actually can't process the message. The 
> deserialization of the message hits the 2GB max message size limit and fails.
> On the statestore side, it shows this message:
> {noformat}
> I0418 16:54:51.727290 3844140 statestore.cc:507] Preparing initial 
> catalog-update topic update for 
> impa...@mcdonnellthrift.vpc.cloudera.com:27000. Size = 2.27 GB
> I0418 16:54:53.889446 3844140 thrift-util.cc:198] TSocket::write_partial() 
> send() : Broken pipe
> I0418 16:54:53.889488 3844140 client-cache.cc:82] ReopenClient(): re-creating 
> client for mcdonnellthrift.vpc.cloudera.com:23000
> I0418 16:54:53.889493 3844140 thrift-util.cc:198] TSocket::write_partial() 
> send() : Broken pipe
> I0418 16:54:53.889503 3844140 thrift-client.cc:116] Error closing connection 
> to: mcdonnellthrift.vpc.cloudera.com:23000, ignoring (write() send(): Broken 
> pipe)
> I0418 16:54:56.052882 3844140 thrift-util.cc:198] TSocket::write_partial() 
> send() : Broken pipe
> I0418 16:54:56.052932 3844140 client-cache.h:363] RPC Error: Client for 
> mcdonnellthrift.vpc.cloudera.com:23000 hit an unexpected exception: write() 
> send(): Broken pipe, type: N6apache6thrift9transport19TTransportExceptionE, 
> rpc: N6impala20TUpdateStateResponseE, send: not done
> I0418 16:54:56.052937 3844140 client-cache.cc:174] Broken Connection, destroy 
> client for mcdonnellthrift.vpc.cloudera.com:23000{noformat}
> On the Impala side, it doesn't give a good error, but we see this:
> {noformat}
> I0418 16:54:53.889683 3214537 TAcceptQueueServer.cpp:355] New connection to 
> server StatestoreSubscriber from client 
> I0418 16:54:54.080694 3214136 Frontend.java:1837] Waiting for local catalog 
> to be initialized, attempt: 110
> I0418 16:54:56.080920 3214136 Frontend.java:1837] Waiting for local catalog 
> to be initialized, attempt: 111
> I0418 16:54:58.081131 3214136 Frontend.java:1837] Waiting for local catalog 
> to be initialized, attempt: 112
> I0418 16:55:00.081358 3214136 Frontend.java:1837] Waiting for local catalog 
> to be initialized, attempt: 113{noformat}
> With a patch Thrift that allows an int64_t max message size and setting that 
> to a larger value, the Impala was able to start up (even without restarting 
> the statestored).
> Some clusters that upgrade to a newer version may hit this, as Thrift didn't 
> use to enforce this limit, so this is something we should fix to avoid 
> upgrade issues.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org