[ https://issues.apache.org/jira/browse/BEAM-6612?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Alex Amato updated BEAM-6612:
-----------------------------
    Summary: PerformanceRegression in QueueingBeamFnDataClient  (was: Remove QueueingBeamFnDataClient)

> PerformanceRegression in QueueingBeamFnDataClient
> -------------------------------------------------
>
>                 Key: BEAM-6612
>                 URL: https://issues.apache.org/jira/browse/BEAM-6612
>             Project: Beam
>          Issue Type: New Feature
>          Components: java-fn-execution
>            Reporter: Alex Amato
>            Assignee: Alex Amato
>            Priority: Major
>              Labels: triaged
>          Time Spent: 50m
>  Remaining Estimate: 0h
>
> Remove QueueingBeamFnDataClient, which made process() calls all run on the
> same thread.
> [~lcwik] and I came up with this design thinking that it was required to
> process the bundle in parallel anyway, and that we would have good performance.
> However, after speaking to Ken, there is no requirement for a bundle or key to
> be processed in parallel. Elements are either iterables or single elements,
> which defines the need for processing a group of elements on the same thread.
> Simply performing this change will lead to the following issues:
> (1) MetricsContainerImpl and MetricsContainer are not thread safe, so when
> the process() functions enter the metric container context, they will be
> accessing a thread-unsafe collection in parallel.
> (2) An ExecutionStateTracker will be needed in every thread, so we will need
> to create an instance and activate it in every gRPC thread which receives a
> new element.
> (Will this get sampled properly, since the trackers will be short-lived?)
> (3) The SimpleExecutionStates being used will need to be thread safe as well?
> I don't think so, because I don't think that the ExecutionStateSampler
> invokes them in parallel.
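
For illustration only, the single-thread behaviour described in the issue (all process() calls running on one thread, so that a thread-unsafe metrics collection is never touched in parallel) might look roughly like the sketch below. This is not Beam code: QueueingSketch, counters, process() and run() are hypothetical stand-ins, the plain HashMap merely plays the role of a thread-unsafe container such as MetricsContainerImpl, and the producer threads stand in for the gRPC threads that receive elements.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class QueueingSketch {
    // Hypothetical stand-in for a thread-unsafe metrics container.
    static final Map<String, Long> counters = new HashMap<>();

    // Hypothetical process() call; only ever invoked from the draining thread.
    static void process(String element) {
        counters.merge("elementsProcessed", 1L, Long::sum);
    }

    // Producers enqueue elements; the calling thread drains the queue and is
    // the only thread that touches the counters map.
    static long run(int producers, int perProducer) throws InterruptedException {
        counters.clear();
        BlockingQueue<String> queue = new LinkedBlockingQueue<>();
        Thread[] threads = new Thread[producers];
        for (int p = 0; p < producers; p++) {
            final int id = p;
            // Each producer stands in for a thread that receives elements.
            threads[p] = new Thread(() -> {
                for (int i = 0; i < perProducer; i++) {
                    queue.add("element-" + id + "-" + i);
                }
            });
            threads[p].start();
        }
        int total = producers * perProducer;
        for (int i = 0; i < total; i++) {
            // take() blocks until a producer has enqueued the next element.
            process(queue.take());
        }
        for (Thread t : threads) {
            t.join();
        }
        return counters.get("elementsProcessed");
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(run(4, 1000));
    }
}
```

If process() were instead called directly from each producer thread, the HashMap would be updated concurrently without synchronization, which is the hazard raised in point (1).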