Alchuang22-dev opened a new pull request, #15756:
URL: https://github.com/apache/iotdb/pull/15756
## Description
This PR adds IoTDB model format support to AINode, so that it works with both
legacy PyTorch models and new IoTDB models through the `from_pretrained` and
`save_pretrained` methods. It addresses Bug1, where `DEFAULT_MODEL_FILE_NAME`
was incorrectly set to "model.safetensors" instead of "model.pt", and extends
the system to support both formats through automatic detection and conversion.
**This PR is a test version only, so please DO NOT merge it!**
### Content1: Model Format Auto-Detection and Loading System
The core enhancement introduces a robust model format detection system that
automatically identifies IoTDB format (config.json + model.safetensors) and
legacy format (config.yaml + model.pt). Key design decisions:
**Choice of algorithms**: Implemented priority-based format detection where
IoTDB format takes precedence over legacy format. This ensures forward
compatibility while maintaining backward compatibility. The detection algorithm
checks for configuration files in order: config.json, configuration.json, then
config.yaml.
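The priority order described above can be sketched roughly as follows. This is a minimal illustration, not the PR's actual code: the constant values and the function name `detect_model_format` are assumptions, since the description names the files but does not show the implementation.

```python
from pathlib import Path
from typing import Optional

# Assumed values: the PR names these files but does not show the constants.
IOTDB_CONFIG_FILES = ("config.json", "configuration.json")
LEGACY_CONFIG_FILE = "config.yaml"


def detect_model_format(model_dir: str) -> Optional[str]:
    """Return "iotdb", "legacy", or None, checking config files in priority order."""
    root = Path(model_dir)
    # IoTDB-format config files are checked first, so they win when both exist.
    for name in IOTDB_CONFIG_FILES:
        if (root / name).is_file():
            return "iotdb"
    # Fall back to the legacy layout only if no IoTDB config file was found.
    if (root / LEGACY_CONFIG_FILE).is_file():
        return "legacy"
    return None
```

With this ordering, a directory that contains both `config.json` and `config.yaml` is treated as IoTDB format.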
**Behavioral aspects**: The system gracefully handles missing files by
attempting fallback formats. Error conditions are handled with detailed logging
and appropriate exception types. Configuration values are validated against
model-specific schemas for both TimerXL and Sundial models.
**Class organization**: Created a layered architecture with:
- `ModelStorage` as the central coordinator handling both formats
- `ModelFactory` for format-agnostic model registration and download
- `ConfigParser` for configuration file parsing and format conversion
- `SafetensorLoader` for weight file loading with automatic format detection
**Method organization**: Split functionality into focused methods:
- `_detect_local_model_format()` for format identification
- `_load_iotdb_model_with_from_pretrained()` for IoTDB model loading
- `_load_legacy_model()` for backward compatibility
- `load_weights_for_from_pretrained()` for flexible weight loading
**Naming**: Used descriptive naming that clearly indicates format support:
- `IOTDB_CONFIG_FILES` and `WEIGHT_FORMAT_PRIORITY` constants
- `convert_iotdb_config_to_ainode_format()` for configuration conversion
- `validate_iotdb_config()` for configuration validation
Alternative design considered: a single universal loader versus
format-specific loaders. The format-specific approach was chosen for better
error handling and clearer separation of concerns.
### Content2: `from_pretrained` and `save_pretrained` Implementation
Implemented full `from_pretrained` and `save_pretrained` support following
HuggingFace conventions:
**Choice of algorithms**: Used dynamic model class import based on
model_type configuration, enabling extensible support for new model types.
Configuration objects are created using `ConfigClass.from_dict()` pattern for
type safety.
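One way to picture the dynamic import by `model_type` is a small registry resolved with `importlib`. The registry name, its entries, and `resolve_model_class` are hypothetical; the real entries would point at the TimerXL and Sundial modules, whose import paths are not given in this description, so standard-library classes stand in to keep the sketch runnable.

```python
import importlib
from typing import Dict, Type

# Hypothetical registry: real entries would point at the TimerXL and Sundial
# model modules, whose import paths are not shown in this PR. Standard-library
# classes stand in here so the sketch runs on its own.
MODEL_CLASS_REGISTRY: Dict[str, str] = {
    "demo_ordered": "collections:OrderedDict",
    "demo_counter": "collections:Counter",
}


def resolve_model_class(model_type: str) -> Type:
    """Import and return the class registered for model_type."""
    try:
        target = MODEL_CLASS_REGISTRY[model_type]
    except KeyError:
        raise ValueError(f"Unsupported model type: {model_type!r}") from None
    module_name, _, class_name = target.partition(":")
    # The module is imported only when the model type is actually requested.
    module = importlib.import_module(module_name)
    return getattr(module, class_name)
```

Supporting a new model type then only requires a new registry entry rather than a change to the loader.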
**Behavioral aspects**: Handles various configuration formats through
parameter mapping (e.g., `seq_len` → `input_token_len`). Error conditions
include unsupported model types, corrupted weights, and missing dependencies.
The system provides graceful degradation when TorchScript conversion fails.
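The parameter mapping could look something like the sketch below. Only the `seq_len` → `input_token_len` pair comes from this description; the table name, the function name, and the rule that an explicit canonical key takes precedence over its alias are assumptions made for illustration.

```python
from typing import Any, Dict

# Only the seq_len -> input_token_len pair comes from the PR description;
# a real mapping table would likely hold more entries.
PARAM_ALIASES: Dict[str, str] = {"seq_len": "input_token_len"}


def normalize_config(raw: Dict[str, Any]) -> Dict[str, Any]:
    """Rename aliased keys; an explicit canonical key wins over its alias."""
    normalized: Dict[str, Any] = {}
    # Copy every key that is not an alias unchanged.
    for key, value in raw.items():
        if key not in PARAM_ALIASES:
            normalized[key] = value
    # Fill in canonical names from aliases, without overwriting explicit values.
    for alias, canonical in PARAM_ALIASES.items():
        if alias in raw and canonical not in normalized:
            normalized[canonical] = raw[alias]
    return normalized
```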
**Class organization**: Extended existing classes with new capabilities:
- `ModelStorage.save_model_with_save_pretrained()` for model saving
- `ModelStorage.clone_model_with_save_load()` for model cloning
- `ConfigParser.create_model_config_for_save_pretrained()` for config generation
**Method organization**: Separated concerns into atomic operations:
- Configuration parsing and validation
- Weight loading and state dict creation
- Model instantiation with proper error handling
- Optional optimizations (TorchScript, compilation)
**Naming**: Used standard HuggingFace naming conventions:
- `from_pretrained()` and `save_pretrained()` method names
- `state_dict`, `config`, `torch_dtype` parameter names
- `safe_serialization=True` for safetensors format
Alternative design considered: a custom serialization format versus the
HuggingFace standard. The HuggingFace standard was chosen for ecosystem
compatibility and community support.
### Content3: Enhanced Error Handling and Model State Management
Implemented comprehensive error handling and model lifecycle management:
**Choice of algorithms**: Used state machine pattern for model status
tracking (LOADING → ACTIVE → INACTIVE/ERROR). Implemented LRU caching with
memory-based eviction for efficient model management.
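A minimal sketch of the status state machine is shown here. The four state names are taken from this description, but the exact transition table (for example, whether LOADING may go directly to ERROR) is an assumption, as are the names `ModelStatus`, `ALLOWED_TRANSITIONS`, and `transition`.

```python
from enum import Enum


class ModelStatus(Enum):
    LOADING = "LOADING"
    ACTIVE = "ACTIVE"
    INACTIVE = "INACTIVE"
    ERROR = "ERROR"


# Assumed transition table; the PR only gives LOADING -> ACTIVE -> INACTIVE/ERROR.
ALLOWED_TRANSITIONS = {
    ModelStatus.LOADING: {ModelStatus.ACTIVE, ModelStatus.ERROR},
    ModelStatus.ACTIVE: {ModelStatus.INACTIVE, ModelStatus.ERROR},
    ModelStatus.INACTIVE: set(),
    ModelStatus.ERROR: set(),
}


def transition(current: ModelStatus, target: ModelStatus) -> ModelStatus:
    """Return the new status, rejecting transitions the table does not allow."""
    if target not in ALLOWED_TRANSITIONS[current]:
        raise ValueError(f"Illegal transition: {current.value} -> {target.value}")
    return target
```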
**Behavioral aspects**: All operations provide detailed error messages with
model context. ConfigNode synchronization ensures cluster-wide model status
consistency. Memory management prevents OOM conditions through proactive cache
eviction.
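LRU caching with memory-based eviction can be illustrated with an `OrderedDict` that tracks a byte budget instead of an entry count. This is a sketch under stated assumptions: the class name `MemoryBoundedLRU` is hypothetical, and the caller supplies each entry's size, whereas the real cache in `ainode.core.util.cache` may measure model memory differently.

```python
from collections import OrderedDict
from typing import Any, Optional


class MemoryBoundedLRU:
    """LRU cache that evicts by total declared size rather than entry count."""

    def __init__(self, max_bytes: int) -> None:
        self.max_bytes = max_bytes
        self.used_bytes = 0
        # Maps key -> (value, size_bytes); insertion order tracks recency.
        self._entries: "OrderedDict[str, tuple]" = OrderedDict()

    def get(self, key: str) -> Optional[Any]:
        if key not in self._entries:
            return None
        self._entries.move_to_end(key)  # mark as most recently used
        return self._entries[key][0]

    def put(self, key: str, value: Any, size_bytes: int) -> None:
        if size_bytes > self.max_bytes:
            raise ValueError("entry is larger than the whole cache budget")
        # Replacing an existing key first releases its old size.
        if key in self._entries:
            self.used_bytes -= self._entries.pop(key)[1]
        # Evict least recently used entries until the new one fits.
        while self.used_bytes + size_bytes > self.max_bytes:
            _, (_, evicted_size) = self._entries.popitem(last=False)
            self.used_bytes -= evicted_size
        self._entries[key] = (value, size_bytes)
        self.used_bytes += size_bytes
```

Evicting before insertion keeps the cache within its budget at all times, which is the property that guards against out-of-memory conditions.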
**Class organization**: Enhanced existing managers with new capabilities:
- `ModelManager` with status tracking and ConfigNode integration
- `InferenceManager` with improved strategy selection
- New exception classes for specific error types
**Method organization**: Added lifecycle management methods:
- `_update_model_status()` for status synchronization
- `validate_model_files()` for integrity checking
- `get_model_status()` and `list_models()` for monitoring
**Naming**: Used consistent naming for status management:
- Model states: "LOADING", "ACTIVE", "INACTIVE", "ERROR"
- Exception classes: `ModelLoadingError`, `IoTDBModelError`, `WeightFileError`
- Status methods: `verify_model_success()`, `log_status_result()`
Alternative design considered: synchronous versus asynchronous status
updates. Synchronous updates were chosen for simplicity and consistency, with
ConfigNode communication handled separately.
The implementation maintains full backward compatibility while enabling
modern IoTDB model workflows. All changes are covered by comprehensive error
handling and logging for production deployment.
<hr>
This PR has:
- [x] been self-reviewed.
- [x] concurrent read
- [x] concurrent write
- [x] concurrent read and write
- [x] added documentation for new or modified features or behaviors.
- [x] added comments explaining the "why" and the intent of the code wherever it would not be obvious for an unfamiliar reader.
- [x] added or updated version, __license__, or notice information
- [x] added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage.
- [ ] added integration tests.
- [ ] been tested in a test IoTDB cluster.
<hr>
##### Key changed/added classes (or packages if there are too many classes) in this PR
**Core Infrastructure:**
- `ainode.core.constant` - Added IoTDB format constants and file priorities
- `ainode.core.exception` - Added IoTDB-specific exception classes
- `ainode.core.config_parser` - Enhanced with IoTDB config support and validation
- `ainode.core.safetensor_loader` - New module for weight file handling
**Model Management:**
- `ainode.core.model.model_storage` - Enhanced with format detection and from_pretrained support
- `ainode.core.model.model_factory` - Enhanced with automatic format detection and download
- `ainode.core.manager.model_manager` - Added model status management and ConfigNode integration
- `ainode.core.manager.inference_manager` - Improved strategy selection for IoTDB models
**Utilities:**
- `ainode.core.util.cache` - Enhanced memory management and statistics
- `ainode.core.util.status` - Added model-specific status functions
- `ainode.core.util.serde` - Added IoTDB format validation support
**Service Layer:**
- `ainode.core.handler` - Enhanced error handling for IoTDB models
- `ainode.core.config` - Added IoTDB model configuration options
--
This is an automated message from the Apache Git Service.