avkgh commented on a change in pull request #25654: [SPARK-28912][STREAMING]
Fixed MatchError in getCheckpointFiles()
URL: https://github.com/apache/spark/pull/25654#discussion_r320230298
##########
File path: streaming/src/main/scala/org/apache/spark/streaming/Checkpoint.scala
##########
@@ -102,7 +102,7 @@ class Checkpoint(ssc: StreamingContext, val checkpointTime: Time)
private[streaming]
object Checkpoint extends Logging {
val PREFIX = "checkpoint-"
- val REGEX = (PREFIX + """([\d]+)([\w\.]*)""").r
+ val REGEX = (PREFIX + """([\d]{9,})([\w\.]*)""").r
Review comment:
The intention behind this change was to skip invalid (or perhaps too old)
checkpoint files, since the numeric part of a checkpoint file name is the
current time in milliseconds and therefore cannot be shorter than 9 digits.
This caused some unit tests to fail because they use ManualClock, which
reports a fake time and so generates shorter checkpoint file names such as
`checkpoint-2000` (where 2000 is supposedly the current time in milliseconds).
I now consider this regex change redundant, since filtering out directories
and matching only the final component of the path (p.getName) should be
sufficient to prevent MatchErrors.
I will revert this change to fix the unit test failures.
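
For illustration, here is a minimal standalone sketch of the behaviour described
above. It is not the code in Checkpoint.scala: the object name, the `parse` and
`lastComponent` helpers, and the sample paths are all hypothetical, and
`lastComponent` is only a plain-string stand-in for what p.getName returns. The
two patterns are copied from the diff.

```scala
import scala.util.matching.Regex

// Hypothetical demo object; only the two regex patterns come from the diff.
object CheckpointNameSketch {
  val Prefix = "checkpoint-"

  // Original pattern: any run of digits, then an optional suffix.
  val Lenient: Regex = (Prefix + """([\d]+)([\w\.]*)""").r

  // Pattern from this PR (to be reverted): at least 9 digits.
  val Strict: Regex = (Prefix + """([\d]{9,})([\w\.]*)""").r

  // Hypothetical helper: a Regex used as an extractor must match the whole
  // string, so a default case is needed to avoid a MatchError on other input.
  def parse(regex: Regex, name: String): Option[(Long, String)] = name match {
    case regex(time, suffix) => Some((time.toLong, suffix))
    case _                   => None
  }

  // Plain-string stand-in for taking the final component of a path.
  def lastComponent(path: String): String =
    path.substring(path.lastIndexOf('/') + 1)
}

object CheckpointNameSketchDemo extends App {
  import CheckpointNameSketch._

  val fullPath = "hdfs://host/dir/checkpoint-2000" // assumed sample path

  println(parse(Lenient, fullPath))                 // whole path does not match
  println(parse(Lenient, lastComponent(fullPath)))  // final component matches
  println(parse(Strict, lastComponent(fullPath)))   // rejected: fewer than 9 digits
}
```

With the original pattern, a ManualClock-style name such as `checkpoint-2000`
is accepted once only the final path component is matched, while the 9-digit
pattern rejects it, which is consistent with the unit test failures mentioned
above.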
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
With regards,
Apache Git Services