Re: [PR] fix(operator): correct regex escaping in WordCloud operator [texera]
bobbai00 merged PR #4261:
URL: https://github.com/apache/texera/pull/4261
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
Re: [PR] fix(operator): correct regex escaping in WordCloud operator [texera]
bobbai00 commented on code in PR #4261:
URL: https://github.com/apache/texera/pull/4261#discussion_r2892664382
##
common/workflow-operator/src/main/scala/org/apache/texera/amber/operator/visualization/wordCloud/WordCloudOpDesc.scala:
##
@@ -67,7 +66,7 @@ class WordCloudOpDesc extends PythonOperatorDescriptor {
def manipulateTable(): PythonTemplateBuilder = {
pyb"""
|table.dropna(subset = [$textColumn], inplace = True) #remove missing values
- |table = table[table[$textColumn].str.contains(r'\\w', regex=True)]
+ |table = table[table[$textColumn].str.contains(r'\w', regex=True)]
Review Comment:
Test case added
Re: [PR] fix(operator): correct regex escaping in WordCloud operator [texera]
chenlica commented on code in PR #4261:
URL: https://github.com/apache/texera/pull/4261#discussion_r2891360157
Review Comment:
Thanks. Can we introduce a test case to catch such issues earlier?
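One possible shape for such a check, sketched in Python against the generated code as a plain string. The helper name `find_double_escaped_raw_regex` and the sample strings are hypothetical, and the test actually added in this PR is not shown in this thread. The check is a heuristic: it flags any raw-string literal containing a doubled backslash, which would also flag a pattern that intentionally matches a literal backslash.

```python
import re

# Two consecutive backslash characters.
DOUBLE_BACKSLASH = "\\" * 2


def find_double_escaped_raw_regex(generated_code: str) -> list:
    """Return the bodies of r'...' literals that contain a doubled backslash."""
    raw_literals = re.findall(r"\br'([^']*)'", generated_code)
    return [lit for lit in raw_literals if DOUBLE_BACKSLASH in lit]


# Hypothetical generated Python, before and after the fix.
buggy_code = r"table = table[table['text'].str.contains(r'\\w', regex=True)]"
fixed_code = r"table = table[table['text'].str.contains(r'\w', regex=True)]"

print(find_double_escaped_raw_regex(buggy_code))
print(find_double_escaped_raw_regex(fixed_code))
```

A check like this runs on the generated source alone, so it would not need a Python worker or pandas to be available in the test environment.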
Re: [PR] fix(operator): correct regex escaping in WordCloud operator [texera]
bobbai00 commented on code in PR #4261:
URL: https://github.com/apache/texera/pull/4261#discussion_r2891347362
Review Comment:
This regression was introduced in #4189.
Before that change, the code used Scala's `s` interpolator:
```
s"""
|table = table[table['$textColumn'].str.contains(r'\\w', regex=True)]
|""".stripMargin
```
The `s` interpolator processes escape sequences, so `\\` produces a single `\`. The generated Python was therefore r'\w', which is correct.
After switching to the `pyb"""..."""` template:
```
pyb"""
|table = table[table[$textColumn].str.contains(r'\\w', regex=True)]
|"""
```
Triple-quoted strings do no escape processing on their own, and unlike `s`, the pyb template does not add any, so backslashes are literal. `\\w` stays as `\\w`, producing r'\\w' in Python, which
matches a literal `\` followed by `w` instead of word characters.
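To make the difference between the two generated patterns concrete, here is a minimal sketch using Python's standard `re` module in place of pandas `str.contains` (an assumption made to keep the example dependency-free; both test whether the pattern is found anywhere in the string). The sample strings are made up for illustration.

```python
import re

# Hypothetical text values standing in for rows of the text column.
samples = ["hello world", "!!!", r"a\w b", ""]

fixed_pattern = r'\w'    # any word character (letter, digit, underscore)
buggy_pattern = r'\\w'   # a literal backslash followed by the letter w

# Keep only the rows in which the pattern is found somewhere.
kept_by_fixed = [s for s in samples if re.search(fixed_pattern, s)]
kept_by_buggy = [s for s in samples if re.search(buggy_pattern, s)]

print(kept_by_fixed)
print(kept_by_buggy)
```

With the buggy pattern, ordinary text such as "hello world" is filtered out, and only a row that happens to contain a backslash followed by `w` survives.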
Re: [PR] fix(operator): correct regex escaping in WordCloud operator [texera]
chenlica commented on code in PR #4261:
URL: https://github.com/apache/texera/pull/4261#discussion_r2891216492
Review Comment:
@bobbai00 Just curious, how was this problem introduced? WordCloud used to
work well. Can we add a test case?
