akshatshenoi-db commented on PR #56572: URL: https://github.com/apache/spark/pull/56572#issuecomment-4736159999
## AI code review (self-review via spark-dev)

Ran an automated code review at head `fc6c690`.

**Verdict: 0 blocking, 0 non-blocking, 0 nits (clean).**

Checked:
- `readArchive` and `inferWithArchives` are both per-mode and mirror `readFile` / the JSON analogue: single-line splits each entry/file into line records, multi-line tokenizes the whole stream into `rowTag`-delimited records.
- The shared-cursor parse-before-advance invariant holds through the `perInput` helper (lazy tokenization, line strings copied before the entry cursor advances).
- Single-line inference reproduces `readFile`'s line decode exactly, so a single-line archive read infers and scans the same as a single-line directory read.
- Corrupt/missing inputs are skipped as a unit under `ignoreCorruptFiles`/`ignoreMissingFiles`.

Added a single-line read+infer parity test. Not built locally; CI is the gate.

<!-- ai-code-review -->
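
For readers unfamiliar with the parse-before-advance invariant mentioned in the review, here is a minimal Python sketch of the idea. It is not the Scala code in the PR. The names `per_input`, `single_line`, `multi_line`, and `make_archive` are hypothetical stand-ins, the tar container is an assumption made only to get a forward-only shared cursor, and the `<rowTag>...</rowTag>` record shape is assumed for illustration.

```python
import io
import tarfile


def make_archive(files):
    """Build an in-memory tar archive from a {name: text} dict (test helper)."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name, text in files.items():
            data = text.encode("utf-8")
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    buf.seek(0)
    return buf


def single_line(stream):
    """Single-line mode: every line of the entry becomes one record."""
    for raw in stream:
        # decode() creates a new str, so the record no longer depends on the stream
        yield raw.decode("utf-8").rstrip("\r\n")


def multi_line(row_tag):
    """Multi-line mode: tokenize the whole entry into row_tag-delimited records."""
    open_tag, close_tag = f"<{row_tag}>", f"</{row_tag}>"

    def parse(stream):
        text = stream.read().decode("utf-8")
        pos = 0
        while True:
            start = text.find(open_tag, pos)
            if start < 0:
                return
            end = text.find(close_tag, start)
            if end < 0:
                return
            end += len(close_tag)
            yield text[start:end]
            pos = end

    return parse


def per_input(archive_stream, parse_entry):
    """Hypothetical analogue of a per-input helper over a shared archive cursor.

    Mode "r|" reads the archive as a forward-only stream, so an entry's bytes
    are only readable until the cursor moves to the next entry.
    """
    with tarfile.open(fileobj=archive_stream, mode="r|") as tar:
        for entry in tar:
            if not entry.isfile():
                continue
            # Parse the current entry completely, copying records out as
            # strings, BEFORE the loop advances the shared cursor.
            records = list(parse_entry(tar.extractfile(entry)))
            yield from records
```

Materializing `records` before the loop advances is the point of the sketch: a lazily tokenized record that still referenced the entry stream would become invalid once the cursor moved on.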
