xiangmy21 opened a new pull request, #17065:
URL: https://github.com/apache/iotdb/pull/17065

   # feat: show timeseries [order by timseries] clause
   
   Support `ORDER BY TIMESERIES` in `SHOW TIMESERIES` and Optimize `OFFSET` 
with Subtree Measurement Statistics
   
   ---
   
   ## Brief
   
   This PR introduces ordering support for `SHOW TIMESERIES` by measurement 
name (lexicographical) and optimizes the performance of the `OFFSET` clause 
using subtree measurement statistics.
   
   Currently, `SHOW TIMESERIES` supports the `ORDER BY TIMESERIES` clause.
   When the query scope is within a single database, `OFFSET X` no longer 
requires traversing the first X measurements one by one. Instead, it skips 
subtrees in the metadata tree based on subtree measurement counts.
   
   Example:
   
   ```sql
   IoTDB> show timeseries root.dbtest.** order by timeseries desc limit 5 
offset 5001
   
+--------------------------------+-----+-----------+--------+--------+-----------+----+----------+--------+------------------+--------+
   |                      Timeseries|Alias|   
Database|DataType|Encoding|Compression|Tags|Attributes|Deadband|DeadbandParameters|ViewType|
   
+--------------------------------+-----+-----------+--------+--------+-----------+----+----------+--------+------------------+--------+
   |root.dbtest.plant9.device84.m188| null|root.dbtest|   INT32|     RLE|     
SNAPPY|null|      null|    null|              null|    BASE|
   |root.dbtest.plant9.device84.m187| null|root.dbtest|   INT32|     RLE|     
SNAPPY|null|      null|    null|              null|    BASE|
   |root.dbtest.plant9.device84.m186| null|root.dbtest|   INT32|     RLE|     
SNAPPY|null|      null|    null|              null|    BASE|
   |root.dbtest.plant9.device84.m185| null|root.dbtest|   INT32|     RLE|     
SNAPPY|null|      null|    null|              null|    BASE|
   |root.dbtest.plant9.device84.m184| null|root.dbtest|   INT32|     RLE|     
SNAPPY|null|      null|    null|              null|    BASE|
   
+--------------------------------+-----+-----------+--------+--------+-----------+----+----------+--------+------------------+--------+
   ```
   
   * `DESC`: descending order
   * `ASC`: ascending order (default if not specified)
   
   ---
   
   ## Design & Implementation Details
   
   ### 1. Syntax Extension
   
   * Updated the grammar of `SHOW TIMESERIES` to support 
`orderByTimeseriesClause`.
   * Added ordering-related fields to `ShowTimeSeriesStatement` to record 
sorting requirements.
   
   ### 2. Logical Plan Construction
   
   * Introduced a `SortNode` at the top of the logical plan to enable global 
ordering across partitions.
   * For single-partition queries, `LIMIT` and `OFFSET` are pushed down to the 
metadata tree traversal stage in `LogicalPlanVisitor`.
   
   ### 3. Limit/Offset Handling in Multi-Partition Queries
   
   * For multi-partition queries, `OFFSET` cannot be pushed down.
     Instead, `limit' = limit + offset` is used.
   * Fixed a bug in the previous implementation:
     when `limit = 0` (which is treated as unlimited in IoTDB), `limit'` was 
incorrectly pushed down, causing unexpected truncation of results.
   
   ### 4. Ordered Traversal in Metadata Tree
   
   * In `MTreeBelowSGMemoryImpl`, the iteration strategy of child nodes in 
`schemaReader` is overridden under ordering mode.
   * Child nodes are traversed in lexicographical order by name.
   
   ### 5. Subtree Measurement Count
   
   * Implemented subtree measurement statistics (only for in-memory metadata 
tree):
   
     * Added `subtreeMeasurementCount` to `IMemMNode` and related classes.
     * Maintained `subtreeMeasurementCount` along ancestor paths during 
measurement insertion and deletion.
     * This value is not persisted; it is initialized via a DFS during metadata 
tree loading.
   * During schemaReader construction, `acceptFullMatchedNode` is overridden to 
skip subtree traversal when `subtreeMeasurementCount < offset`.
   
   ---
   
   ## Tests
   
   ### Functional Tests
   * Added `IoTDBShowTimeseriesOrderByTimeseriesIT` to verify correctness of 
ordering and offset behavior.
   ### Performance Evaluation
   
   * Evaluated performance on a single-partition dataset with a typical 
three-level hierarchy:
     ```
     root.plantA.deviceB.measurementC
     ```
     with approximately 900,000 measurements.
   * Query pattern:
     ```sql
     SHOW TIMESERIES root.dbtest.** order by timeseries LIMIT 100 OFFSET X;
     ```
     where `X` ranges from 0 to 900,000 with a step of 5,000.
   * Compared the original implementation and the optimized version.
   * Results are shown in the figure below: 
   <img width="1200" height="800" alt="image" 
src="https://github.com/user-attachments/assets/625a2825-7790-4c95-9cb4-854c3307e7fb";
 />
   
   * Detailed benchmark methodology and results are available at: 
[[link].](https://apache-iotdb-project.feishu.cn/wiki/P0RcwbUleipgDjkKwhqcOkkynJd?from=from_copylink)
   
   ---
   
   ## Impact
   
   * Enables deterministic ordering for `SHOW TIMESERIES`.
   * Significantly reduces traversal cost for large `OFFSET` values in 
single-partition scenarios $O(limit+offset)\to O(limit+TreeHeight)$.
   * Improves correctness of limit/offset handling in multi-partition queries.
   * Introduces minimal overhead to metadata maintenance (in-memory only).
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to