David Knupp has posted comments on this change. (http://gerrit.cloudera.org:8080/15642)

Change subject: IMPALA-9362: Upgrade sqlparse 0.1.19 -> 0.3.1
......................................................................


Patch Set 6:

> Patch Set 6:
>
> I think we can probably live with a 10-15% regression for extreme cases like 
> that, it would be a bit heroic to try to fix it in sqlparse.

I was wondering more whether there was anything we could do in our own code. sqlparse
itself does seem to be slower, but if we simply add a check like this before
stripping leading comments:

    # Only call sqlparse if the statement could begin with a comment
    # ('--' or '/*'); otherwise skip the comparatively slow parse entirely.
    sql_lstripped = sql.lstrip()
    if sql_lstripped and sql_lstripped[0] in ('-', '/'):
      stack = sqlparse.engine.FilterStack()
      strip_leading_comment_filter = StripLeadingCommentFilter()
      stack.stmtprocess.append(strip_leading_comment_filter)
      stack.postprocess.append(sqlparse.filters.SerializerUnicode())
      stripped_line = ''.join(stack.run(sql, 'utf-8'))
      return strip_leading_comment_filter.comment, stripped_line
    else:
      # No leading comment is possible; just decode the (Python 2) byte string.
      return None, sql.decode('utf-8')

...all the tests still pass, and the time improves:

# sqlparse-0.3.1
Time to parse large sql: 0.771239995956
Time to parse large sql: 0.756343841553
Time to parse large sql: 0.755437135696
Time to parse large sql: 0.760624885559
Time to parse large sql: 0.764199018478
Time to parse large sql: 0.753574132919
Time to parse large sql: 0.75120306015
Time to parse large sql: 0.753540039062
Time to parse large sql: 0.764436006546
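A minimal, self-contained sketch of the same guard pattern, written in Python 3 and without sqlparse so it runs on its own. `slow_strip_leading_comment` is a hypothetical stand-in for the FilterStack pipeline above (it only recognizes `--` line comments and `/* ... */` block comments), and `strip_leading_comment` adds the cheap first-character check so statements that cannot start with a comment never reach it. The `stats` counter is also hypothetical, there only to show how often the expensive path runs.

```python
def slow_strip_leading_comment(sql):
    """Hypothetical stand-in for the sqlparse-based comment stripper.

    Returns (comment, remainder). comment is None when the statement has no
    leading comment, in which case the original string is returned unchanged.
    """
    rest = sql.lstrip()
    comments = []
    while True:
        if rest.startswith('--'):
            # Line comment: runs through the next newline (or end of input).
            end = rest.find('\n')
            end = len(rest) if end == -1 else end + 1
        elif rest.startswith('/*'):
            # Block comment: runs through the closing '*/'.
            end = rest.find('*/')
            if end == -1:
                break  # unterminated block comment: leave it in place
            end += 2
        else:
            break
        comments.append(rest[:end])
        rest = rest[end:].lstrip()
    if not comments:
        return None, sql
    return ''.join(comments), rest


def strip_leading_comment(sql, stats=None):
    """Only run the expensive stripper if the first non-blank character
    could begin a comment ('-' for '--', '/' for '/*')."""
    head = sql.lstrip()
    if head and head[0] in ('-', '/'):
        if stats is not None:
            stats['slow_calls'] += 1
        return slow_strip_leading_comment(sql)
    # Fast path: no leading comment is possible.
    return None, sql
```

Because '-' and '/' also start ordinary expressions such as `-1`, the check is only a filter: false positives still go through the full stripper, so the result is identical either way, and only the common comment-free case gets cheaper.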

I'm happy to do that if you think it's worth it, or I'm happy to ship as is if
you're fine with the current patch.


--

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I77a1fd5ae311634a18ee04b8c389d8a3f3a6e001
Gerrit-Change-Number: 15642
Gerrit-PatchSet: 6
Gerrit-Owner: David Knupp
Gerrit-Reviewer: David Knupp
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Joe McDonnell
Gerrit-Reviewer: Sahil Takiar
Gerrit-Reviewer: Thomas Tauber-Marshall
Gerrit-Reviewer: Tim Armstrong
Gerrit-Comment-Date: Wed, 08 Apr 2020 22:13:24 +0000
Gerrit-HasComments: No
