Re: [PR] [QDP] Simplify Python API with unified encode() method [mahout]

2026-01-12 Thread via GitHub


guan404ming merged PR #803:
URL: https://github.com/apache/mahout/pull/803


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]



Re: [PR] [QDP] Simplify Python API with unified encode() method [mahout]

2026-01-12 Thread via GitHub


guan404ming commented on PR #803:
URL: https://github.com/apache/mahout/pull/803#issuecomment-3738465190

   Thanks, I'll send a follow-up for it.





Re: [PR] [QDP] Simplify Python API with unified encode() method [mahout]

2026-01-12 Thread via GitHub


rich7420 commented on PR #803:
URL: https://github.com/apache/mahout/pull/803#issuecomment-3737738254

   Thanks for the update!
   I think some comments were removed unnecessarily.
   Also, shape validation for PyTorch tensors may not be done yet, but that could be a follow-up.





Re: [PR] [QDP] Simplify Python API with unified encode() method [mahout]

2026-01-12 Thread via GitHub


guan404ming commented on PR #803:
URL: https://github.com/apache/mahout/pull/803#issuecomment-3737460419

   Thanks, I've updated the PR with three additions. Please take another look, thanks!
 1. Updated benchmark files to use the unified encode() API
 2. Added tensor shape validation with clear error messages (lib.rs)
 3. Added pathlib.Path support





Re: [PR] [QDP] Simplify Python API with unified encode() method [mahout]

2026-01-12 Thread via GitHub


rich7420 commented on PR #803:
URL: https://github.com/apache/mahout/pull/803#issuecomment-3737344010

   @guan404ming thanks for the patch!
   I think we need to update the encode() calls in the benchmark files as well.
   It would also be better to check the tensor's shape: if it's 1D, use
the single-sample encode path; if it's 2D, use the batch encoding path;
for anything else, throw a clear error explaining which shapes are supported.
   In addition, maybe we could support pathlib.Path objects by using
os.fspath() to convert them to strings before processing.
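
   The two suggestions above (shape-based path selection and pathlib.Path support) can be sketched as follows. This is only an illustration, not the code in lib.rs: the helper names `normalize_path` and `select_encode_path` and the "single"/"batch" labels are hypothetical, and the shape is passed in as a plain tuple so the sketch does not depend on PyTorch.

```python
import os
import pathlib
from typing import Tuple, Union


def normalize_path(source: Union[str, pathlib.Path]) -> str:
    """Convert a str or pathlib.Path into a plain string path.

    os.fspath() returns a str unchanged and calls __fspath__() on
    path-like objects such as pathlib.Path.
    """
    return os.fspath(source)


def select_encode_path(shape: Tuple[int, ...]) -> str:
    """Choose an encoding path from a tensor shape (hypothetical helper).

    1D input maps to the single-sample path, 2D input to the batch path.
    The returned labels are placeholders for the real code paths.
    """
    ndim = len(shape)
    if ndim == 1:
        return "single"
    if ndim == 2:
        return "batch"
    # Anything else is rejected with a message naming the supported shapes.
    raise ValueError(
        f"Unsupported tensor shape {tuple(shape)}: expected 1D (n_features,) "
        f"or 2D (batch_size, n_features), got {ndim}D"
    )
```

   With a real tensor, the shape argument would be something like `tuple(tensor.shape)`.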





Re: [PR] [QDP] Simplify Python API with unified encode() method [mahout]

2026-01-11 Thread via GitHub


guan404ming commented on PR #803:
URL: https://github.com/apache/mahout/pull/803#issuecomment-3737234464

   cc @ryankert01 @400Ping 





[PR] [QDP] Simplify Python API with unified encode() method [mahout]

2026-01-11 Thread via GitHub


guan404ming opened a new pull request, #803:
URL: https://github.com/apache/mahout/pull/803

   ### Purpose of PR
   
    Why
   
   - The API had six different encoding methods (encode, encode_batch, encode_tensor, 
encode_from_parquet, encode_from_arrow_ipc, encode_from_numpy)
   - Users needed to know which method to call for each input type
   
    How
   
   - Unified encode() method - auto-detects lists, NumPy arrays, PyTorch 
tensors, and file paths
   - Default encoding method - encoding_method="amplitude" is now the default
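
   As a rough illustration of the auto-detection described above, the sketch below classifies an input before dispatching. It is not the PR's implementation: `detect_input_kind` and the stand-in `encode` function are hypothetical, the real method's signature may differ, and checking the type's module name is just one assumed way to recognise NumPy arrays and PyTorch tensors without importing either library.

```python
import os
from typing import Any, Dict


def detect_input_kind(data: Any) -> str:
    """Classify an input as "file", "list", "numpy", or "torch"."""
    # File paths: plain strings and path-like objects (e.g. pathlib.Path).
    if isinstance(data, (str, os.PathLike)):
        return "file"
    if isinstance(data, (list, tuple)):
        return "list"
    # Assumption: identify array types by the top-level package that
    # defines their class, so neither numpy nor torch must be imported.
    top_level_module = type(data).__module__.split(".")[0]
    if top_level_module in ("numpy", "torch"):
        return top_level_module
    raise TypeError(
        f"Unsupported input type {type(data).__name__}: expected a list, "
        "NumPy array, PyTorch tensor, or file path"
    )


def encode(data: Any, encoding_method: str = "amplitude") -> Dict[str, str]:
    """Hypothetical stand-in for a unified entry point.

    It only reports which path would be taken; "amplitude" is the
    default encoding method, as stated in the PR description.
    """
    return {"kind": detect_input_kind(data), "encoding_method": encoding_method}
```

   A caller would then write `encode([0.5, 0.5, 0.5, 0.5])` or `encode("data.parquet")` without choosing a method per input type.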
   
   ### Related Issues or PRs
   
   
   
   
   ### Changes Made
   
   - [ ] Bug fix
   - [ ] New feature
   - [x] Refactoring
   - [ ] Documentation
   - [ ] Test
   - [ ] CI/CD pipeline
   - [ ] Other
   
   ### Breaking Changes
   
   - [x] Yes
   - [ ] No
   
   ### Checklist
   
   
   
   - [x] Added or updated unit tests for all changes
   - [x] Added or updated documentation for all changes
   - [x] Successfully built and ran all unit tests or manual tests locally
   - [ ] PR title follows "MAHOUT-XXX: Brief Description" format (if related to 
an issue)
   - [x] Code follows ASF guidelines
   

