nnguyen168 commented on PR #55466:
URL: https://github.com/apache/spark/pull/55466#issuecomment-4528175469

   Hi @yadavay-amzn , I did some deeper analysis and found an edge case with 
nested block comments.
   
   pure nested trailing comments are incorrectly kept when they should be 
dropped:
   
     | Input                             | Expected     | Actual                
                     |
     
|-----------------------------------|--------------|--------------------------------------------|
     | SELECT 1; /* comment */           | ["SELECT 1"] | ["SELECT 1"] ✓        
                     |
     | SELECT 1; /* outer /* inner */ */ | ["SELECT 1"] | ["SELECT 1", " /* 
outer /* inner */ */"] ✗ |
   
     Root Cause
   
     At line 697-698, hasPrecedingNonCommentString is recalculated for every 
/*, including nested ones:
     } else if (hasNext && line.charAt(index + 1) == '*') {
       bracketedCommentLevel += 1
       hasPrecedingNonCommentString = beginIndex != index &&
         line.substring(beginIndex, index).exists(!_.isWhitespace)
     }
   
     When the second /* is encountered in /* outer /* inner */ */, the 
substring includes the first /* characters, which are non-whitespace, so 
hasPrecedingNonCommentString incorrectly becomes true.
   
   Suggested Fix
   
     Only set hasPrecedingNonCommentString when entering the outermost comment:
   
   ```
     } else if (hasNext && line.charAt(index + 1) == '*') {
       // Only set hasPrecedingNonCommentString when entering the outermost 
comment
       if (bracketedCommentLevel == 0) {
         hasPrecedingNonCommentString = beginIndex != index &&
           line.substring(beginIndex, index).exists(!_.isWhitespace)
       }
       bracketedCommentLevel += 1
     }
   ```
   
   
     Suggested Test Cases
   
   ```
     // Nested comment WITH preceding SQL (already works)
     "SELECT 1; SELECT 2 /* outer /* inner */ */" -> Seq("SELECT 1", " SELECT 2 
/* outer /* inner */ */"),
     // Pure nested trailing comment (needs fix to work)
     "SELECT 1; /* outer /* inner */ */" -> Seq("SELECT 1"),
     // Leading nested comment with statement
     "/* outer /* inner */ */ SELECT 1;" -> Seq("/* outer /* inner */ */ SELECT 
1"),
     // Inline nested comment
     "SELECT /* nested /* comment */ */ 1;" -> Seq("SELECT /* nested /* comment 
*/ */ 1")
   ```
   
   Happy to help with a follow-up PR if needed!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to