Re: [PR] Ignore repetitions of semicolon [cassandra]

via GitHub Tue, 28 Jan 2025 07:30:52 -0800


bschoening commented on code in PR #3812:
URL: https://github.com/apache/cassandra/pull/3812#discussion_r1931268627



##########
pylib/cqlshlib/test/test_cql_parsing.py:
##########
@@ -804,6 +804,39 @@ def test_strip_comment_blocks_from_input(self):
                                ''')
         self.assertRaises(SyntaxError)
 
+    def test_group_tokens_skip_duplicate_endtokens(self):
+        tokens = [
+            ('reserved_identifier', 'SELECT', (0, 6)),
+            ('star', '*', (7, 8)),
+            ('endtoken', ';', (9, 10)),  # first semicolon
+            ('endtoken', ';', (11, 12)),  # duplicate semicolon to skip
+            ('reserved_identifier', 'FROM', (13, 17)),
+            ('identifier', 'my_table', (18, 26)),
+            ('endtoken', ';', (27, 28)),  # valid semicolon to keep
+            ('reserved_identifier', 'SELECT', (29, 35)),
+            ('identifier', 'another_table', (36, 51)),
+            ('endtoken', ';', (52, 53))  # valid semicolon to keep
+        ]

Review Comment:
   This test doesn't seem to follow the pattern used elsewhere in this file:
   
   >         parsed = parse_cqlsh_statements('some statement')
   >         self.assertSequenceEqual(tokens_with_types(parsed),
   >                                  [list of parsed types])
   
   Also, if I'm reading it right, _SELECT * ; FROM table; SELECT table;_ has a semicolon between SELECT and FROM? That wouldn't be valid CQL.



##########
pylib/cqlshlib/cqlhandling.py:
##########
@@ -151,6 +151,39 @@ def cql_split_statements(self, text):
                     in_batch = True
         return output, in_batch or in_pg_string
 
+    def group_tokens(self, items):
+        """
+        Split an iterable into sublists, using 'endtoken' to mark the end of each sublist.
+        Each sublist accumulates elements until an 'endtoken' is encountered. If the sublist
+        consists only of a single 'endtoken', it is excluded. An empty list is added to the
+        result after the last 'endtoken' for cases like autocompletion.
+
+        Parameters:
+        - items (iterable): An iterable of tokens, including 'endtoken' elements.
+
+        Returns:
+        - list: A list of sublists, with each sublist containing tokens split by 'endtoken'.
+        """
+        thisresult = []
+        output = []
+
+        for i in items:
+            thisresult.append(i)
+
+            # When an 'endtoken' is encountered, start a new sublist
+            if i[0] == 'endtoken':
+                # Skip adding sublist if it contains just one "endtoken"
+                if len(thisresult) > 1:
+                    output.append(thisresult)  # Add valid sublist
+
+                # Start a new sublist after an 'endtoken'
+                thisresult = []

Review Comment:
   Can you use itertools.groupby(list, lambda t: t[0] == 'endtoken')?
   
   Maybe also rename _thisresult_ to _sublist_.
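
   For illustration, a minimal sketch of what a groupby-based version might look like. It is written as a standalone function rather than a method, and it assumes the part of the loop not shown in this hunk appends the trailing (possibly empty) sublist, as the docstring describes. The sample tokens are copied from the test in this PR.

```python
from itertools import groupby


def group_tokens(items):
    """Split tokens into statements, collapsing repeated 'endtoken' items."""
    output = []
    sublist = []
    # groupby yields alternating runs: non-endtokens, then endtokens, and so on.
    for is_end, run in groupby(items, key=lambda t: t[0] == 'endtoken'):
        if not is_end:
            sublist = list(run)
        elif sublist:
            # Keep only the first endtoken of the run; the repeats are dropped.
            sublist.append(next(run))
            output.append(sublist)
            sublist = []
        # A run of endtokens with no statement before it is skipped entirely.

    # Assumption (from the docstring, not the visible hunk): the trailing
    # sublist is always appended, so it is empty after a final endtoken.
    output.append(sublist)
    return output


tokens = [
    ('reserved_identifier', 'SELECT', (0, 6)),
    ('star', '*', (7, 8)),
    ('endtoken', ';', (9, 10)),
    ('endtoken', ';', (11, 12)),
    ('reserved_identifier', 'FROM', (13, 17)),
    ('identifier', 'my_table', (18, 26)),
    ('endtoken', ';', (27, 28)),
    ('reserved_identifier', 'SELECT', (29, 35)),
    ('identifier', 'another_table', (36, 51)),
    ('endtoken', ';', (52, 53)),
]
groups = group_tokens(tokens)
```

   Because groupby already gathers consecutive endtokens into one run, the duplicate check becomes "take the first item of the run" rather than a length test on the accumulator.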



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: pr-unsubscr...@cassandra.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

