[PR] fix: from pretrain and save pretrain [iotdb]

via GitHub Tue, 17 Jun 2025 19:43:36 -0700


Alchuang22-dev opened a new pull request, #15756:
URL: https://github.com/apache/iotdb/pull/15756


   ## Description
   
   This PR adds support for the IoTDB model format to AINode, so that AINode 
can load and save both legacy PyTorch models and new IoTDB models through 
`from_pretrained` and `save_pretrained`. It fixes Bug1, where 
`DEFAULT_MODEL_FILE_NAME` was incorrectly set to "model.safetensors" instead of 
"model.pt", and extends the system to support both formats through automatic 
detection and conversion.
   
   **This PR is a test version only, so please DO NOT approve or merge it!**
   
   ### Content1: Model Format Auto-Detection and Loading System
   
   The core change is a model format detection system that distinguishes the 
IoTDB format (config.json + model.safetensors) from the legacy format 
(config.yaml + model.pt). Key design decisions:
   
   **Choice of algorithms**: Implemented priority-based format detection where 
IoTDB format takes precedence over legacy format. This ensures forward 
compatibility while maintaining backward compatibility. The detection algorithm 
checks for configuration files in order: config.json, configuration.json, then 
config.yaml.
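
   The priority order described above can be sketched roughly as follows. This 
is a minimal illustration, not the PR's actual `_detect_local_model_format()`: 
the value of `IOTDB_CONFIG_FILES`, the `LEGACY_CONFIG_FILE` name, the function 
signature, and the returned strings are all assumptions.

```python
from pathlib import Path
from typing import Optional

# Assumed values: the PR names IOTDB_CONFIG_FILES but does not show its contents.
IOTDB_CONFIG_FILES = ("config.json", "configuration.json")
# Hypothetical constant for the legacy configuration file.
LEGACY_CONFIG_FILE = "config.yaml"


def detect_local_model_format(model_dir: str) -> Optional[str]:
    """Return "iotdb", "legacy", or None for a local model directory.

    IoTDB config files are checked first, so a directory that contains
    both kinds of configuration is treated as IoTDB format.
    """
    root = Path(model_dir)
    for name in IOTDB_CONFIG_FILES:
        if (root / name).is_file():
            return "iotdb"
    if (root / LEGACY_CONFIG_FILE).is_file():
        return "legacy"
    return None
```

   A directory holding both `config.json` and `config.yaml` would be reported 
as IoTDB format here, which matches the stated precedence.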
   
   **Behavioral aspects**: The system gracefully handles missing files by 
attempting fallback formats. Error conditions are handled with detailed logging 
and appropriate exception types. Configuration values are validated against 
model-specific schemas for both TimerXL and Sundial models.
   
   **Class organization**: Created a layered architecture with:
   - `ModelStorage` as the central coordinator handling both formats
   - `ModelFactory` for format-agnostic model registration and download
   - `ConfigParser` for configuration file parsing and format conversion
   - `SafetensorLoader` for weight file loading with automatic format detection
   
   **Method organization**: Split functionality into focused methods:
   - `_detect_local_model_format()` for format identification
   - `_load_iotdb_model_with_from_pretrained()` for IoTDB model loading
   - `_load_legacy_model()` for backward compatibility
   - `load_weights_for_from_pretrained()` for flexible weight loading
   
   **Naming**: Used descriptive naming that clearly indicates format support:
   - `IOTDB_CONFIG_FILES` and `WEIGHT_FORMAT_PRIORITY` constants
   - `convert_iotdb_config_to_ainode_format()` for configuration conversion
   - `validate_iotdb_config()` for configuration validation
   
   Alternative design considered: a single universal loader versus 
format-specific loaders. The format-specific approach was chosen for better 
error handling and a clearer separation of concerns.
   
   ### Content2: `from_pretrained` and `save_pretrained` Implementation
   
   Implemented full `from_pretrained` and `save_pretrained` support following 
HuggingFace conventions:
   
   **Choice of algorithms**: Used dynamic model class import based on 
model_type configuration, enabling extensible support for new model types. 
Configuration objects are created using `ConfigClass.from_dict()` pattern for 
type safety.
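
   As a rough sketch of the dynamic import idea, a registry can map 
`model_type` to an import path that is resolved with `importlib`. The registry 
name, its entries, and `resolve_model_class()` are hypothetical; standard 
library classes stand in for the real TimerXL and Sundial classes so the 
example runs without AINode installed.

```python
import importlib
from typing import Any, Dict

# Hypothetical registry: model_type -> "module.path:ClassName".
# Stdlib classes are placeholders for real model classes.
MODEL_REGISTRY: Dict[str, str] = {
    "ordered": "collections:OrderedDict",
    "counter": "collections:Counter",
}


def resolve_model_class(config: Dict[str, Any]) -> type:
    """Import and return the class registered for config["model_type"]."""
    model_type = config.get("model_type")
    try:
        target = MODEL_REGISTRY[model_type]
    except KeyError:
        raise ValueError(f"Unsupported model type: {model_type!r}") from None
    module_name, class_name = target.split(":")
    # The module is imported only when a model of this type is requested.
    module = importlib.import_module(module_name)
    return getattr(module, class_name)
```

   Supporting a new model type then only requires a new registry entry, which 
is the extensibility property the paragraph above refers to.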
   
   **Behavioral aspects**: Handles various configuration formats through 
parameter mapping (e.g., `seq_len` → `input_token_len`). Error conditions 
include unsupported model types, corrupted weights, and missing dependencies. 
The system provides graceful degradation when TorchScript conversion fails.
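
   The parameter mapping could look something like the sketch below. Only the 
`seq_len` → `input_token_len` pair comes from this description; the 
`PARAM_ALIASES` table, the function name, and the rule that an explicit 
canonical key wins over its alias are assumptions.

```python
from typing import Any, Dict

# Hypothetical alias table; only this one pair is named in the PR text.
PARAM_ALIASES: Dict[str, str] = {"seq_len": "input_token_len"}


def normalize_config(raw: Dict[str, Any]) -> Dict[str, Any]:
    """Rename aliased keys to their canonical names.

    If both the alias and the canonical key are present, the canonical
    value is kept (an assumed tie-breaking rule).
    """
    normalized = {k: v for k, v in raw.items() if k not in PARAM_ALIASES}
    for alias, canonical in PARAM_ALIASES.items():
        if alias in raw:
            normalized.setdefault(canonical, raw[alias])
    return normalized
```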
   
   **Class organization**: Extended existing classes with new capabilities:
   - `ModelStorage.save_model_with_save_pretrained()` for model saving
   - `ModelStorage.clone_model_with_save_load()` for model cloning
   - `ConfigParser.create_model_config_for_save_pretrained()` for config 
generation
   
   **Method organization**: Separated concerns into atomic operations:
   - Configuration parsing and validation
   - Weight loading and state dict creation
   - Model instantiation with proper error handling
   - Optional optimizations (TorchScript, compilation)
   
   **Naming**: Used standard HuggingFace naming conventions:
   - `from_pretrained()` and `save_pretrained()` method names
   - `state_dict`, `config`, `torch_dtype` parameter names
   - `safe_serialization=True` for safetensors format
   
   Alternative design considered: a custom serialization format versus the 
HuggingFace standard. The HuggingFace standard was chosen for ecosystem 
compatibility and community support.
   
   ### Content3: Enhanced Error Handling and Model State Management
   
   Implemented comprehensive error handling and model lifecycle management:
   
   **Choice of algorithms**: Used state machine pattern for model status 
tracking (LOADING → ACTIVE → INACTIVE/ERROR). Implemented LRU caching with 
memory-based eviction for efficient model management.
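
   A minimal sketch of the status state machine, assuming a transition table 
that the description does not spell out. The four state names are taken from 
this PR; the `ModelStatus` enum, the `ALLOWED` table (including LOADING → 
ERROR), and `transition()` are illustrative assumptions.

```python
from enum import Enum


class ModelStatus(Enum):
    LOADING = "LOADING"
    ACTIVE = "ACTIVE"
    INACTIVE = "INACTIVE"
    ERROR = "ERROR"


# Assumed transition table. The PR only states LOADING -> ACTIVE -> INACTIVE/ERROR;
# allowing LOADING -> ERROR for failed loads is a guess.
ALLOWED = {
    ModelStatus.LOADING: {ModelStatus.ACTIVE, ModelStatus.ERROR},
    ModelStatus.ACTIVE: {ModelStatus.INACTIVE, ModelStatus.ERROR},
    ModelStatus.INACTIVE: set(),
    ModelStatus.ERROR: set(),
}


def transition(current: ModelStatus, new: ModelStatus) -> ModelStatus:
    """Return the new status, or raise if the transition is not allowed."""
    if new not in ALLOWED[current]:
        raise ValueError(f"Illegal transition: {current.value} -> {new.value}")
    return new
```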
   
   **Behavioral aspects**: All operations provide detailed error messages with 
model context. ConfigNode synchronization ensures cluster-wide model status 
consistency. Memory management prevents OOM conditions through proactive cache 
eviction.
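
   The memory-based LRU eviction can be illustrated with the sketch below. It 
is not the code in `ainode.core.util.cache`: the class name, the byte budget, 
and the requirement that callers pass each entry's size explicitly are all 
assumptions made for the example.

```python
from collections import OrderedDict
from typing import Any, Optional


class MemoryBoundedLRU:
    """LRU cache that evicts by total size in bytes rather than entry count."""

    def __init__(self, max_bytes: int) -> None:
        self.max_bytes = max_bytes
        self.used_bytes = 0
        # Maps key -> (value, size_bytes); oldest entries come first.
        self._items: OrderedDict = OrderedDict()

    def get(self, key: str) -> Optional[Any]:
        entry = self._items.get(key)
        if entry is None:
            return None
        self._items.move_to_end(key)  # mark as most recently used
        return entry[0]

    def put(self, key: str, value: Any, size_bytes: int) -> None:
        if size_bytes > self.max_bytes:
            raise ValueError("item is larger than the whole cache budget")
        if key in self._items:
            self.used_bytes -= self._items.pop(key)[1]
        # Evict least recently used entries until the new item fits.
        while self.used_bytes + size_bytes > self.max_bytes:
            _, (_, evicted_size) = self._items.popitem(last=False)
            self.used_bytes -= evicted_size
        self._items[key] = (value, size_bytes)
        self.used_bytes += size_bytes
```

   Evicting before inserting keeps `used_bytes` at or below the budget at all 
times, which is one way to read "proactive" eviction here.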
   
   **Class organization**: Enhanced existing managers with new capabilities:
   - `ModelManager` with status tracking and ConfigNode integration
   - `InferenceManager` with improved strategy selection
   - New exception classes for specific error types
   
   **Method organization**: Added lifecycle management methods:
   - `_update_model_status()` for status synchronization
   - `validate_model_files()` for integrity checking
   - `get_model_status()` and `list_models()` for monitoring
   
   **Naming**: Used consistent naming for status management:
   - Model states: "LOADING", "ACTIVE", "INACTIVE", "ERROR"
   - Exception classes: `ModelLoadingError`, `IoTDBModelError`, 
`WeightFileError`
   - Status methods: `verify_model_success()`, `log_status_result()`
   
   Alternative design considered: synchronous versus asynchronous status 
updates. Synchronous updates were chosen for simplicity and consistency, with 
ConfigNode communication handled separately.
   
   The implementation maintains full backward compatibility while enabling 
modern IoTDB model workflows. All changes are covered by comprehensive error 
handling and logging for production deployment.
   
   <hr>
   
   This PR has:
   - [x] been self-reviewed.
       - [x] concurrent read
       - [x] concurrent write
       - [x] concurrent read and write 
   - [x] added documentation for new or modified features or behaviors.
   - [x] added comments explaining the "why" and the intent of the code 
wherever it would not be obvious for an unfamiliar reader.
   - [x] added or updated version, __license__, or notice information
   - [x] added unit tests or modified existing tests to cover new code paths, 
ensuring the threshold 
     for code coverage.
   - [ ] added integration tests.
   - [ ] been tested in a test IoTDB cluster.
   
   <hr>
   
   ##### Key changed/added classes (or packages if there are too many classes) 
in this PR
   
   **Core Infrastructure:**
   - `ainode.core.constant` - Added IoTDB format constants and file priorities
   - `ainode.core.exception` - Added IoTDB-specific exception classes
   - `ainode.core.config_parser` - Enhanced with IoTDB config support and 
validation
   - `ainode.core.safetensor_loader` - New module for weight file handling
   
   **Model Management:**
   - `ainode.core.model.model_storage` - Enhanced with format detection and 
from_pretrained support
   - `ainode.core.model.model_factory` - Enhanced with automatic format 
detection and download
   - `ainode.core.manager.model_manager` - Added model status management and 
ConfigNode integration
   - `ainode.core.manager.inference_manager` - Improved strategy selection for 
IoTDB models
   
   **Utilities:**
   - `ainode.core.util.cache` - Enhanced memory management and statistics
   - `ainode.core.util.status` - Added model-specific status functions
   - `ainode.core.util.serde` - Added IoTDB format validation support
   
   **Service Layer:**
   - `ainode.core.handler` - Enhanced error handling for IoTDB models
   - `ainode.core.config` - Added IoTDB model configuration options


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
