HeartSaVioR commented on code in PR #41705:
URL: https://github.com/apache/spark/pull/41705#discussion_r1253862022


##########
common/utils/src/main/resources/error/error-classes.json:
##########
@@ -133,6 +133,48 @@
       "Could not load Protobuf class with name <protobufClassName>. <explanation>."
     ]
   },
+  "CANNOT_LOAD_STATE_STORE" : {
+    "message" : [
+      "An error occurred during loading state."
+    ],
+    "subClass" : {
+      "CANNOT_READ_CHECKPOINT" : {
+        "message" : [
+          "Cannot read RocksDB checkpoint metadata of version <versionLine>"
+        ]
+      },
+      "CANNOT_READ_DELTA_FILE" : {
+        "message" : [
+          "Error reading delta file <fileToRead> of <clazz>: <message>"
+        ]
+      },
+      "CANNOT_READ_SNAPSHOT_FILE" : {
+        "message" : [
+          "Error reading snapshot file <fileToRead> of <clazz>: <message>"
+        ]
+      },
+      "CANNOT_READ_STREAMING_STATE_FILE" : {
+        "message" : [
+          "Error reading streaming state file <fileToRead>: the file does not exist. If the stream job is restarted with a new or updated state operation, please create a new checkpoint location or clear the existing checkpoint location."
+        ]
+      },
+      "UNEXPECTED_FILE_SIZE" : {
+        "message" : [
+          "Copied <dfsFile> to <localFile>, expected <expectedSize> bytes, found <localFileSize> bytes."
+        ]
+      },
+      "UNRELEASED_THREAD_ERROR": {
+        "message" : [
+          "<loggingId>: RocksDB instance could not be acquired by <newAcquiredThreadInfo> as it was not released by <acquiredThreadInfo> after <timeWaitedMs> ms.",
+          "Thread holding the lock has trace: <stackTraceOutput>"
+        ]
+      },
+      "WRAPPER" : {
+        "message" : [""]
+      }
+    },
+    "sqlState" : "40000"

Review Comment:
   These are not about inputs and outputs from the point of view of the end-to-end query. You can imagine an operator which has to retain accumulators over the streaming query's lifetime (I'm over-simplifying, but I guess this is more SQL-friendly), and we checkpoint those accumulators for each microbatch to a durable (mostly remote) file system.
   
   RocksDB is what users pick as the local storage for retaining accumulators; picking an in-memory map is also feasible. It's just that when Spark runs a microbatch, it needs to load the state store for that specific microbatch (to continue accumulating), which involves downloading the files from the remote file system, deserializing them, and loading the accumulators into local storage. Various interactions happen there, and we want to capture these cases with different sub-categories.
   
   So it's not directly related to the data users read from or write to, but to the internal data maintained by the streaming query.
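   As one concrete example of such an interaction, the `UNRELEASED_THREAD_ERROR` subclass above guards single-writer access to the local RocksDB instance. A minimal sketch of that timed-acquire pattern, assumed rather than taken from Spark's actual implementation:

   ```scala
   import java.util.concurrent.TimeUnit
   import java.util.concurrent.locks.ReentrantLock

   // Illustrative handle: one lock per RocksDB instance; a failed timed
   // acquire reports the thread still holding the lock, matching the
   // placeholders in the UNRELEASED_THREAD_ERROR message.
   class RocksDBHandle(loggingId: String) {
     private val lock = new ReentrantLock()
     @volatile private var acquiredThreadInfo: String = "none"

     def acquire(timeWaitedMs: Long): Unit = {
       if (!lock.tryLock(timeWaitedMs, TimeUnit.MILLISECONDS)) {
         val newAcquiredThreadInfo = Thread.currentThread().getName
         throw new IllegalStateException(
           s"$loggingId: RocksDB instance could not be acquired by " +
           s"$newAcquiredThreadInfo as it was not released by " +
           s"$acquiredThreadInfo after $timeWaitedMs ms.")
       }
       acquiredThreadInfo = Thread.currentThread().getName
     }

     def release(): Unit = lock.unlock()
   }
   ```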



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

