bschoening commented on code in PR #3812:
URL: https://github.com/apache/cassandra/pull/3812#discussion_r1931268627

##########
pylib/cqlshlib/test/test_cql_parsing.py:
##########
@@ -804,6 +804,39 @@ def test_strip_comment_blocks_from_input(self):
         ''')
         self.assertRaises(SyntaxError)

+    def test_group_tokens_skip_duplicate_endtokens(self):
+        tokens = [
+            ('reserved_identifier', 'SELECT', (0, 6)),
+            ('star', '*', (7, 8)),
+            ('endtoken', ';', (9, 10)),  # first semicolon
+            ('endtoken', ';', (11, 12)),  # duplicate semicolon to skip
+            ('reserved_identifier', 'FROM', (13, 17)),
+            ('identifier', 'my_table', (18, 26)),
+            ('endtoken', ';', (27, 28)),  # valid semicolon to keep
+            ('reserved_identifier', 'SELECT', (29, 35)),
+            ('identifier', 'another_table', (36, 51)),
+            ('endtoken', ';', (52, 53))  # valid semicolon to keep
+        ]

Review Comment:
   This test doesn't seem to follow the pattern of

   > parsed = parse_cqlsh_statements('some statement')
   > self.assertSequenceEqual(tokens_with_types(parsed),
   >     [list of parsed types])

   Also, _SELECT * ; FROM table; SELECT table;_, if I'm reading it right, has a semicolon between SELECT and FROM? That wouldn't be valid CQL.


##########
pylib/cqlshlib/cqlhandling.py:
##########
@@ -151,6 +151,39 @@ def cql_split_statements(self, text):
                 in_batch = True
         return output, in_batch or in_pg_string

+    def group_tokens(self, items):
+        """
+        Split an iterable into sublists, using 'endtoken' to mark the end of each sublist.
+        Each sublist accumulates elements until an 'endtoken' is encountered. If the sublist
+        consists only of a single 'endtoken', it is excluded. An empty list is added to the
+        result after the last 'endtoken' for cases like autocompletion.
+
+        Parameters:
+        - items (iterable): An iterable of tokens, including 'endtoken' elements.
+
+        Returns:
+        - list: A list of sublists, with each sublist containing tokens split by 'endtoken'.
+        """
+        thisresult = []
+        output = []
+
+        for i in items:
+            thisresult.append(i)
+
+            # When an 'endtoken' is encountered, start a new sublist
+            if i[0] == 'endtoken':
+                # Skip adding sublist if it contains just one "endtoken"
+                if len(thisresult) > 1:
+                    output.append(thisresult)  # Add valid sublist
+
+                # Start a new sublist after an 'endtoken'
+                thisresult = []

Review Comment:
   Can you use itertools.groupby(list, lambda t: t[0] == 'endtoken')? Maybe also rename _thisresult_ to _sublist_.
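
For reference, here is a rough sketch of what the itertools.groupby suggestion could look like. It is not the PR's code: `group_tokens_sketch` is a hypothetical standalone function, and since the quoted hunk is cut off before the end of the method, the handling of the trailing sublist is an assumption taken from the docstring ("An empty list is added to the result after the last 'endtoken'").

```python
from itertools import groupby


def group_tokens_sketch(items):
    """Hypothetical groupby-based variant of group_tokens (not the PR's code)."""
    output = []
    sublist = []

    # groupby yields alternating runs: consecutive non-endtoken tokens (key False)
    # and consecutive endtoken tokens (key True).
    for is_end, run in groupby(items, key=lambda t: t[0] == 'endtoken'):
        if is_end:
            if sublist:
                # Keep only the first endtoken of the run; the rest are duplicates.
                sublist.append(next(run))
                output.append(sublist)
            # A run of endtokens with nothing before it is dropped entirely.
            sublist = []
        else:
            sublist = list(run)

    # Assumption from the docstring: the trailing sublist is always appended,
    # so it is [] when the input ends with an endtoken.
    output.append(sublist)
    return output


if __name__ == '__main__':
    demo = [
        ('reserved_identifier', 'SELECT', (0, 6)),
        ('star', '*', (7, 8)),
        ('endtoken', ';', (9, 10)),
        ('endtoken', ';', (11, 12)),
        ('identifier', 'my_table', (13, 21)),
        ('endtoken', ';', (22, 23)),
    ]
    for group in group_tokens_sketch(demo):
        print(group)
```

Because groupby collapses consecutive semicolons into a single run, the duplicate-skipping falls out of taking only the first element of each endtoken run, which may read more directly than the length check on the accumulated sublist.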