viirya commented on PR #56073:
URL: https://github.com/apache/spark/pull/56073#issuecomment-4526246595

   LGTM. The extraction is a straight, behavior-preserving lift of the same 
idiom from
   four sites into one helper, and the Javadoc correctly calls out the one 
footgun
   (callers must assign the returned reference back to the field) — all four 
call
   sites do.
   
   A couple of small, non-blocking nits:
   
   1. `JoinHelper` is a fairly generic name for what is currently a 
single-purpose
      utility. If we don't expect more shared join helpers to land here soon,
      something more specific like `FullOuterJoinHelper` (or even keeping it 
scoped
      as `MatchedBitSet`-style) would make the file's responsibility clearer and
      avoid it becoming a catch-all over time. Up to you.
   
   2. Worth disambiguating the `BitSet` in the Javadoc — e.g. "Spark's
      `org.apache.spark.util.collection.BitSet`" — since Java readers' first
      association is `java.util.BitSet`.
   
   3. Optional: a tiny unit test covering the three branches
      (`bufferSize == 0`, `bufferSize <= capacity`, `bufferSize > capacity`,
      including that the returned reference is the same instance on the reuse 
path
      and a fresh one on realloc) would lock in the contract the Javadoc 
describes.
      Existing `OuterJoinSuite` coverage is fine for behavior, but a direct 
test on
      the helper would be cheap insurance for the reassign-the-return-value
      contract.
   
   None of these are blockers.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to