[GitHub] flink issue #4654: [FLINK-7521] Add config option to set the content length ...
Github user zentol commented on the issue: https://github.com/apache/flink/pull/4654 The actual solution I'm currently thinking of is to have a modified `HttpObjectAggregator` implementation that spills larger messages to disk. ---
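The spilling strategy zentol describes could be sketched roughly as follows. This is a minimal, stdlib-only illustration of the buffering idea, not Netty's actual `HttpObjectAggregator` API; the class name `SpillingBuffer`, the threshold handling, and the temp-file naming are all assumptions for illustration:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;

/**
 * Sketch of "spill to disk" aggregation: buffer an incoming body in memory
 * until it crosses a threshold, then move everything to a temporary file so
 * that large uploads never occupy heap proportional to their size.
 */
public class SpillingBuffer {
    private final int threshold;
    private ByteArrayOutputStream memory = new ByteArrayOutputStream();
    private OutputStream current;
    private Path spillFile; // null while the body still fits in memory

    public SpillingBuffer(int threshold) {
        this.threshold = threshold;
        this.current = memory;
    }

    public void write(byte[] chunk) throws IOException {
        if (spillFile == null && memory.size() + chunk.length > threshold) {
            // Threshold exceeded: copy what is buffered so far to disk and
            // route all further writes straight to the file.
            spillFile = Files.createTempFile("upload-", ".spill");
            OutputStream fileOut = Files.newOutputStream(spillFile);
            memory.writeTo(fileOut);
            memory = null;
            current = fileOut;
        }
        current.write(chunk);
    }

    public boolean isSpilled() {
        return spillFile != null;
    }

    public long size() throws IOException {
        return spillFile != null ? Files.size(spillFile) : memory.size();
    }

    public void close() throws IOException {
        current.close();
    }
}
```

In a real Netty handler the same logic would live inside a `ChannelInboundHandler` that consumes `HttpContent` chunks; the key design point is that heap usage stays bounded by the threshold regardless of upload size or concurrency.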
Github user zentol commented on the issue: https://github.com/apache/flink/pull/4654 Removing the object aggregator will break the remaining pipeline, so that option is out too. As for the client upload, that's exactly what's being done right now: the jar is uploaded to the blob server, and only the blob key is transmitted via REST. The problem is that this requires any client to be written in Java and depend on Flink, whereas we would prefer that any client, written in any language, can communicate with the REST APIs. ---
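To make the language-agnosticism point concrete: a pure-REST upload only requires the client to build a standard `multipart/form-data` request, which any HTTP library in any language can do. The sketch below constructs such a body with only the Java standard library; the field name `jarfile`, the boundary, and the content type are illustrative placeholders, not the actual Flink REST API contract:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

/**
 * Builds a multipart/form-data body for a hypothetical jar upload endpoint.
 * No Flink or Netty dependency is needed on the client side: this is plain
 * HTTP, so a Python, Go, or curl client could produce the same bytes.
 */
public class MultipartBody {
    public static byte[] build(String boundary, String fileName, byte[] jarBytes)
            throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        String header = "--" + boundary + "\r\n"
                + "Content-Disposition: form-data; name=\"jarfile\"; filename=\""
                + fileName + "\"\r\n"
                + "Content-Type: application/java-archive\r\n\r\n";
        out.write(header.getBytes(StandardCharsets.UTF_8));
        out.write(jarBytes);                                  // raw jar payload
        out.write(("\r\n--" + boundary + "--\r\n")            // closing boundary
                .getBytes(StandardCharsets.UTF_8));
        return out.toByteArray();
    }
}
```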
Github user zjureel commented on the issue: https://github.com/apache/flink/pull/4654 @kl0u @zentol Thank you for your suggestions. So removing the code `.addLast(new HttpObjectAggregator(100*1024*1024))` would be a temporary fix? In fact it surprises me that the client sends the job's jar file to the REST server instead of a distributed file system such as HDFS. As @zentol said, the jars may be several hundred MBs large and we obviously can't allocate that much memory in general. I think having the client upload the jars to a distributed file system may be a better way, or we should find a way to create a streaming service in the REST server and save the input stream of the jar to disk as soon as possible instead of aggregating it in memory. What do you think? Thanks ---
Github user zentol commented on the issue: https://github.com/apache/flink/pull/4654 This doesn't solve the problem of the JIRA. For job submission we have to send jars to the cluster, which may be several hundred MBs large; we obviously can't allocate that much memory in general. As @kl0u said, setting this value any higher (it is already _very_ high) will cause insane memory consumption in the object aggregator whenever concurrent requests are in progress (for example, with a 500 MB limit and ten concurrent uploads, the aggregators alone could occupy several GBs of heap). As such this isn't even a temporary fix, as it is not viable for the use-cases that are affected by the limit, so I'm afraid we'll have to reject this PR. ---
Github user kl0u commented on the issue: https://github.com/apache/flink/pull/4654 Hi @zjureel! Thanks for the work! This can be a temporary fix, but I was thinking more of a long-term one where there is no limit. The problems that I can find with such a temporary solution are: 1) if we add this as a configuration parameter and then remove it when we have a proper fix, this may result in confusion for the users; 2) with this fix, the aggregator is going to allocate the specified limit every time there is an element to parse, even though the element may be small, which can be a waste of resources. What do you think @zentol? ---
Github user zjureel commented on the issue: https://github.com/apache/flink/pull/4654 @kl0u I have tried to fix [FLINK-7521](https://issues.apache.org/jira/browse/FLINK-7521) in this PR. Could you please take a look when you're free? Thanks ---