jtmer opened a new pull request, #15884:
URL: https://github.com/apache/iotdb/pull/15884
## Description
This PR adds a request-pooling engine for multi-request inference on
time-series models such as TimerXL. The change set introduces three core Python
modules (`requestpool.py`, `request.py`, and `utils.py`), plus a self-contained
benchmark harness (guarded by `if __name__ == "__main__":`) that compares pooled vs.
baseline generation speed and numerical fidelity.
### Design
<b>Overlap multiple user requests on one device: </b>`RequestPool.step()`
batches all ready requests every 15 ms whenever no batch is currently running,
and feeds the batch to the model in a single forward pass.
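The scheduling loop can be pictured with a minimal pure-Python sketch. All names here (`RequestPoolSketch`, `waiting`, `running`) are illustrative assumptions, not the actual implementation, and the model forward pass is elided:

```python
import time
from collections import deque


class RequestPoolSketch:
    """Hypothetical sketch of the pooling loop; the real RequestPool
    batches torch tensors and runs a model forward pass."""

    BATCH_INTERVAL_S = 0.015  # 15 ms scheduling tick

    def __init__(self):
        self.waiting = deque()  # requests added via add_request
        self.running = []       # requests in the in-flight batch

    def add_request(self, req):
        self.waiting.append(req)

    def step(self):
        # Only form a new batch when no batch is currently running.
        if self.running:
            return []
        batch, self.waiting = list(self.waiting), deque()
        self.running = batch
        return batch  # would be fed to a single model forward pass

    def run_inference(self):
        while self.waiting or self.running:
            self.step()
            # ... model forward on self.running would happen here ...
            self.running = []  # pretend the pass finished
            time.sleep(self.BATCH_INTERVAL_S)
```

The key property is that `step()` drains every ready request into one batch, so requests arriving within the same 15 ms tick share a forward pass.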
<b>Handle variable sequence lengths: </b>Inputs are left-padded to the batch's
max_len per tensor type, which preserves causal semantics while enabling `torch.cat`.
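The padding rule is easiest to see with plain lists standing in for tensors (the real code pads torch tensors before `torch.cat`; the function name here is an assumption):

```python
def left_pad(sequences, pad_value=0):
    """Left-pad variable-length sequences to a common max_len.

    Left padding keeps the most recent tokens aligned at the right
    edge of every row, which is what preserves causal semantics when
    the rows are stacked into one batch.
    """
    max_len = max(len(s) for s in sequences)
    return [[pad_value] * (max_len - len(s)) + list(s) for s in sequences]
```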
### Behavior & configuration
`RequestPool.add_request` truncates inputs whose length is not an exact multiple of
`config.input_token_len`, ensuring model state alignment.
Oversized write attempts are silently clipped to `max_new_steps`.
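Both corner-case rules can be sketched as follows. Which end of the input is dropped during truncation is an assumption here (dropping the oldest points keeps the most recent context, which is the usual choice for time series), and both function names are hypothetical:

```python
def truncate_to_token_multiple(seq, input_token_len):
    """Keep only the trailing portion of seq whose length is an exact
    multiple of input_token_len (which end is dropped is an assumption)."""
    usable = (len(seq) // input_token_len) * input_token_len
    return seq[len(seq) - usable:]


def clip_steps(requested_steps, max_new_steps):
    # Oversized write attempts are silently clipped, not rejected.
    return min(requested_steps, max_new_steps)
```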
### Class & method organization
`RequestPool`
<b>Public API: </b>`add_request`, `run_inference` (starts the loop), `step` (single
scheduling + forward pass).
`Request`
`id`, `chunk_size`, …, `state`, `cur_step_idx`, `output_tensor`
`write_step_output` pre-allocates a fixed buffer and fills it in place, so there is
no Python-side reallocation after inference starts.
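A minimal sketch of the pre-allocated-buffer pattern, using a plain list where the real code uses a torch tensor (class and parameter names are assumptions):

```python
class RequestSketch:
    """Hypothetical sketch of a Request: the output buffer is sized
    once up front from max_new_steps and filled in place per step."""

    def __init__(self, max_new_steps, chunk_size):
        self.chunk_size = chunk_size
        self.cur_step_idx = 0
        # One fixed buffer; never reallocated after construction.
        self.output_tensor = [0.0] * (max_new_steps * chunk_size)

    def write_step_output(self, step_output):
        # In-place fill of the slice owned by the current step.
        start = self.cur_step_idx * self.chunk_size
        self.output_tensor[start:start + len(step_output)] = step_output
        self.cur_step_idx += 1
```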
`utils`
`split_moe_output` slices a Moe[Causal]LMOutputWithPast into per-request objects.
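The shape of that split can be illustrated with a simplified stand-in (the real `split_moe_output` also has to slice fields like past key/value caches and auxiliary MoE outputs; the function and parameter names below are assumptions):

```python
def split_batched_output(batched_logits, request_ids):
    """Sketch: split one batched forward-pass output back into
    per-request pieces, keyed by request id. Row i of the batch
    belongs to request_ids[i]."""
    return {rid: batched_logits[i] for i, rid in enumerate(request_ids)}
```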
<hr>
This PR has:
- [x] been self-reviewed.
- [x] added comments explaining the "why" and the intent of the code wherever it
would not be obvious to an unfamiliar reader.
- [x] added unit tests.
<hr>
##### Key changed/added classes (or packages if there are too many classes)
in this PR
`ainode.core.inference.requestpool.RequestPool`
`ainode.core.inference.request.Request`
`ainode.core.inference.utils.split_moe_output`