Re: [PR] feat: add centralized LakeFS error handling for multipart upload and dataset version operations [texera]
aicam merged PR #4177: URL: https://github.com/apache/texera/pull/4177 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
Re: [PR] feat: add centralized LakeFS error handling for multipart upload and dataset version operations [texera]
carloea2 commented on code in PR #4177:
URL: https://github.com/apache/texera/pull/4177#discussion_r2726210252
##
file-service/src/main/scala/org/apache/texera/service/resource/DatasetResource.scala:
##
@@ -384,11 +428,13 @@ class DatasetResource {
}
// Create a commit in LakeFS
- val commit = LakeFSStorageClient.createCommit(
-repoName = repositoryName,
-branch = "main",
-commitMessage = s"Created dataset version: $newVersionName"
- )
+ val commit = withLakeFSErrorHandling {
+LakeFSStorageClient.createCommit(
+ repoName = repositoryName,
+ branch = "main",
+ commitMessage = s"Created dataset version: $newVersionName"
+)
+ }
Review Comment:
**We can keep this approach for now.**
It would be good to open a discussion, and would be good if @Ma77Ball can
give his input since he is working in logging. Here is the things to discuss
later:
I think this PR is really about **how we want error handling to work across
the services calls** (LakeFS now, others later): do we *force* callers to deal
with failures, or do we centralize + make it easy/consistent.
### Option 1: change defs and force handling (typed)
Return `Either[StorageError, A]` (or `F[Either[...]]`). Caller can’t ignore
it.
```scala
storage.commit(repo, branch, msg): Either[StorageError, CommitId]
storage.uploadPart(key, part, bytes): Either[StorageError, Unit]
// usage
storage.commit(repo, branch, msg).map(Ok(_)).leftMap(mapToHttp)
```
Pros: type-safe + enforced. Cons: signature churn + lots of callsite updates
+ need `Error` mapping.
### Option 2: implicit + `Dynamic` `.safe` wrapper (ergonomic, but risky)
**This option adds automatic logs**
Nice callsite but reflection + weak typing (`Any`) + possible runtime method
lookup issues. Notice this method does not require any changes to the def like
commit, or uploadPart but it is optional to the caller to use `.safe` and
Changing the return type (2.1)
```scala
lakefs.safe.commit(repo, branch, msg) // Either[Throwable, Any]
datasetResource.safe.uploadPart(key, part, bs) // Either[Throwable, Any]
```
Keeping the return type and throwing (2.2)
```scala
lakefs.safe.commit(repo, branch, msg)
datasetResource.safe.uploadPart(key, part, bs)
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
Re: [PR] feat: add centralized LakeFS error handling for multipart upload and dataset version operations [texera]
carloea2 commented on PR #4177: URL: https://github.com/apache/texera/pull/4177#issuecomment-3797698027 LGTM. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
Re: [PR] feat: add centralized LakeFS error handling for multipart upload and dataset version operations [texera]
carloea2 commented on code in PR #4177:
URL: https://github.com/apache/texera/pull/4177#discussion_r2726211452
##
file-service/src/main/scala/org/apache/texera/service/resource/DatasetResource.scala:
##
@@ -175,6 +176,47 @@ object DatasetResource {
normalized
}
+ /**
+* Converts LakeFS ApiException to appropriate HTTP exception
+*/
+ private def handleLakeFSException(e: io.lakefs.clients.sdk.ApiException):
Nothing = {
+val rawBody =
Option(e.getResponseBody).filter(_.nonEmpty).getOrElse(e.getMessage)
+
+val message =
+ Try(new
ObjectMapper().readTree(rawBody).get("message").asText()).getOrElse(rawBody)
+
+def errorResponse(status: Int): Response =
+ Response
+.status(status)
+.entity(Map("message" -> message).asJava)
+.`type`(MediaType.APPLICATION_JSON)
+.build()
+
+throw (e.getCode match {
+ case 400 => new BadRequestException(errorResponse(400))
+ case 401 => new NotAuthorizedException(errorResponse(401))
+ case 403 => new ForbiddenException(errorResponse(403))
+ case 404 => new NotFoundException(errorResponse(404))
+ case 409 => new WebApplicationException(errorResponse(409))
+ case 410 => new WebApplicationException(errorResponse(410))
+ case 412 => new WebApplicationException(errorResponse(412))
+ case 416 => new WebApplicationException(errorResponse(416))
+ case 420 => new WebApplicationException(errorResponse(420))
+ case _ => new InternalServerErrorException(errorResponse(500))
+})
Review Comment:
I think it will be good if possible, if not this approach is good enough.
Thanks.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
Re: [PR] feat: add centralized LakeFS error handling for multipart upload and dataset version operations [texera]
carloea2 commented on code in PR #4177:
URL: https://github.com/apache/texera/pull/4177#discussion_r2726211807
##
file-service/src/main/scala/org/apache/texera/service/resource/DatasetResource.scala:
##
@@ -175,6 +176,47 @@ object DatasetResource {
normalized
}
+ /**
+* Converts LakeFS ApiException to appropriate HTTP exception
+*/
+ private def handleLakeFSException(e: io.lakefs.clients.sdk.ApiException):
Nothing = {
+val rawBody =
Option(e.getResponseBody).filter(_.nonEmpty).getOrElse(e.getMessage)
Review Comment:
Got it thanks.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
Re: [PR] feat: add centralized LakeFS error handling for multipart upload and dataset version operations [texera]
carloea2 commented on code in PR #4177:
URL: https://github.com/apache/texera/pull/4177#discussion_r2726210252
##
file-service/src/main/scala/org/apache/texera/service/resource/DatasetResource.scala:
##
@@ -384,11 +428,13 @@ class DatasetResource {
}
// Create a commit in LakeFS
- val commit = LakeFSStorageClient.createCommit(
-repoName = repositoryName,
-branch = "main",
-commitMessage = s"Created dataset version: $newVersionName"
- )
+ val commit = withLakeFSErrorHandling {
+LakeFSStorageClient.createCommit(
+ repoName = repositoryName,
+ branch = "main",
+ commitMessage = s"Created dataset version: $newVersionName"
+)
+ }
Review Comment:
**We can keep this approach for now.**
It would be good to open a discussion, and would be good if @Ma77Ball can
give his input since he is working in logging. Here is the things to discuss
later:
I think this PR is really about **how we want error handling to work across
the services calls** (LakeFS now, others later): do we *force* callers to deal
with failures, or do we centralize + make it easy/consistent.
### Option 1: change defs and force handling (typed)
Return `Either[StorageError, A]` (or `F[Either[...]]`). Caller can’t ignore
it.
```scala
storage.commit(repo, branch, msg): Either[StorageError, CommitId]
storage.uploadPart(key, part, bytes): Either[StorageError, Unit]
// usage
storage.commit(repo, branch, msg).map(Ok(_)).leftMap(mapToHttp)
```
Pros: type-safe + enforced. Cons: signature churn + lots of callsite updates
+ need `Error` mapping.
### Option 2: implicit + `Dynamic` `.safe` wrapper (ergonomic, but risky)
Nice callsite but reflection + weak typing (`Any`) + possible runtime method
lookup issues. Notice this method does not require any changes to the def like
commit, or uploadPart but it is optional to the caller to use `.safe` and
Changing the return type (2.1)
```scala
lakefs.safe.commit(repo, branch, msg) // Either[Throwable, Any]
datasetResource.safe.uploadPart(key, part, bs) // Either[Throwable, Any]
```
Keeping the return type and throwing (2.2)
```scala
lakefs.safe.commit(repo, branch, msg)
datasetResource.safe.uploadPart(key, part, bs)
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
Re: [PR] feat: add centralized LakeFS error handling for multipart upload and dataset version operations [texera]
xuang7 commented on code in PR #4177:
URL: https://github.com/apache/texera/pull/4177#discussion_r2726139588
##
file-service/src/main/scala/org/apache/texera/service/resource/DatasetResource.scala:
##
@@ -384,11 +428,13 @@ class DatasetResource {
}
// Create a commit in LakeFS
- val commit = LakeFSStorageClient.createCommit(
-repoName = repositoryName,
-branch = "main",
-commitMessage = s"Created dataset version: $newVersionName"
- )
+ val commit = withLakeFSErrorHandling {
+LakeFSStorageClient.createCommit(
+ repoName = repositoryName,
+ branch = "main",
+ commitMessage = s"Created dataset version: $newVersionName"
+)
+ }
Review Comment:
Thanks for the suggestion. I currently centralized the LakeFS exception→HTTP
mapping in the file-service (LakeFSExceptionHandler). The safeCall approach
also looks clean and would centralize handling at the storage layer near
LakeFSStorageClient. Would you recommend switching to
LakeFSStorageClient.safeCall(...) so all callers share the same handler?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
Re: [PR] feat: add centralized LakeFS error handling for multipart upload and dataset version operations [texera]
xuang7 commented on code in PR #4177:
URL: https://github.com/apache/texera/pull/4177#discussion_r2726128321
##
file-service/src/main/scala/org/apache/texera/service/resource/DatasetResource.scala:
##
@@ -175,6 +176,47 @@ object DatasetResource {
normalized
}
+ /**
+* Converts LakeFS ApiException to appropriate HTTP exception
+*/
+ private def handleLakeFSException(e: io.lakefs.clients.sdk.ApiException):
Nothing = {
+val rawBody =
Option(e.getResponseBody).filter(_.nonEmpty).getOrElse(e.getMessage)
+
+val message =
+ Try(new
ObjectMapper().readTree(rawBody).get("message").asText()).getOrElse(rawBody)
+
+def errorResponse(status: Int): Response =
+ Response
+.status(status)
+.entity(Map("message" -> message).asJava)
+.`type`(MediaType.APPLICATION_JSON)
+.build()
+
+throw (e.getCode match {
+ case 400 => new BadRequestException(errorResponse(400))
+ case 401 => new NotAuthorizedException(errorResponse(401))
+ case 403 => new ForbiddenException(errorResponse(403))
+ case 404 => new NotFoundException(errorResponse(404))
+ case 409 => new WebApplicationException(errorResponse(409))
+ case 410 => new WebApplicationException(errorResponse(410))
+ case 412 => new WebApplicationException(errorResponse(412))
+ case 416 => new WebApplicationException(errorResponse(416))
+ case 420 => new WebApplicationException(errorResponse(420))
+ case _ => new InternalServerErrorException(errorResponse(500))
+})
Review Comment:
Good point. For now we're using a small fallback map for the common LakeFS
status codes, which are fairly stable. Please let me know if you’d recommend
dynamically extracting codes from the ApiException instead.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
Re: [PR] feat: add centralized LakeFS error handling for multipart upload and dataset version operations [texera]
xuang7 commented on code in PR #4177:
URL: https://github.com/apache/texera/pull/4177#discussion_r2726124590
##
file-service/src/main/scala/org/apache/texera/service/resource/DatasetResource.scala:
##
@@ -175,6 +176,47 @@ object DatasetResource {
normalized
}
+ /**
+* Converts LakeFS ApiException to appropriate HTTP exception
+*/
+ private def handleLakeFSException(e: io.lakefs.clients.sdk.ApiException):
Nothing = {
+val rawBody =
Option(e.getResponseBody).filter(_.nonEmpty).getOrElse(e.getMessage)
+
+val message =
+ Try(new
ObjectMapper().readTree(rawBody).get("message").asText()).getOrElse(rawBody)
+
+def errorResponse(status: Int): Response =
+ Response
+.status(status)
+.entity(Map("message" -> message).asJava)
+.`type`(MediaType.APPLICATION_JSON)
+.build()
+
+throw (e.getCode match {
+ case 400 => new BadRequestException(errorResponse(400))
+ case 401 => new NotAuthorizedException(errorResponse(401))
+ case 403 => new ForbiddenException(errorResponse(403))
+ case 404 => new NotFoundException(errorResponse(404))
+ case 409 => new WebApplicationException(errorResponse(409))
+ case 410 => new WebApplicationException(errorResponse(410))
+ case 412 => new WebApplicationException(errorResponse(412))
+ case 416 => new WebApplicationException(errorResponse(416))
+ case 420 => new WebApplicationException(errorResponse(420))
+ case _ => new InternalServerErrorException(errorResponse(500))
+})
+ }
+
+ /**
+* Wraps a LakeFS call with centralized error handling.
+*/
+ private def withLakeFSErrorHandling[T](lakeFsCall: => T): T = {
+try {
+ lakeFsCall
+} catch {
+ case e: io.lakefs.clients.sdk.ApiException => handleLakeFSException(e)
+}
+ }
+
Review Comment:
Fixed, `LakeFSExceptionHandler` is added under `service/util`. It’s placed
near the DatasetResource.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
Re: [PR] feat: add centralized LakeFS error handling for multipart upload and dataset version operations [texera]
xuang7 commented on code in PR #4177:
URL: https://github.com/apache/texera/pull/4177#discussion_r2726122200
##
file-service/src/main/scala/org/apache/texera/service/resource/DatasetResource.scala:
##
@@ -175,6 +176,47 @@ object DatasetResource {
normalized
}
+ /**
+* Converts LakeFS ApiException to appropriate HTTP exception
+*/
+ private def handleLakeFSException(e: io.lakefs.clients.sdk.ApiException):
Nothing = {
+val rawBody =
Option(e.getResponseBody).filter(_.nonEmpty).getOrElse(e.getMessage)
+
+val message =
+ Try(new
ObjectMapper().readTree(rawBody).get("message").asText()).getOrElse(rawBody)
Review Comment:
Fixed. The updated handler no longer parse the response body, so this
parsing line is removed.
##
file-service/src/main/scala/org/apache/texera/service/resource/DatasetResource.scala:
##
@@ -175,6 +176,47 @@ object DatasetResource {
normalized
}
+ /**
+* Converts LakeFS ApiException to appropriate HTTP exception
+*/
+ private def handleLakeFSException(e: io.lakefs.clients.sdk.ApiException):
Nothing = {
+val rawBody =
Option(e.getResponseBody).filter(_.nonEmpty).getOrElse(e.getMessage)
+
+val message =
+ Try(new
ObjectMapper().readTree(rawBody).get("message").asText()).getOrElse(rawBody)
Review Comment:
Fixed. The updated handler no longer parse the response body, so this
parsing line is removed.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
Re: [PR] feat: add centralized LakeFS error handling for multipart upload and dataset version operations [texera]
xuang7 commented on code in PR #4177:
URL: https://github.com/apache/texera/pull/4177#discussion_r2726120633
##
file-service/src/main/scala/org/apache/texera/service/resource/DatasetResource.scala:
##
@@ -175,6 +176,47 @@ object DatasetResource {
normalized
}
+ /**
+* Converts LakeFS ApiException to appropriate HTTP exception
+*/
+ private def handleLakeFSException(e: io.lakefs.clients.sdk.ApiException):
Nothing = {
+val rawBody =
Option(e.getResponseBody).filter(_.nonEmpty).getOrElse(e.getMessage)
Review Comment:
Good point. The message may not be very sensitive in most cases, but
returning `e.getMessage` to the frontend could still leak internal details. I
have updated the handler to return simplified `fallbackMessages` to clients,
while keeping the original LakeFS error details in server logs for debugging.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
Re: [PR] feat: add centralized LakeFS error handling for multipart upload and dataset version operations [texera]
carloea2 commented on code in PR #4177:
URL: https://github.com/apache/texera/pull/4177#discussion_r2725032491
##
file-service/src/main/scala/org/apache/texera/service/resource/DatasetResource.scala:
##
@@ -175,6 +176,47 @@ object DatasetResource {
normalized
}
+ /**
+* Converts LakeFS ApiException to appropriate HTTP exception
+*/
+ private def handleLakeFSException(e: io.lakefs.clients.sdk.ApiException):
Nothing = {
+val rawBody =
Option(e.getResponseBody).filter(_.nonEmpty).getOrElse(e.getMessage)
+
+val message =
+ Try(new
ObjectMapper().readTree(rawBody).get("message").asText()).getOrElse(rawBody)
Review Comment:
I see we are creating a new ObjectMapper everytime, can we avoid it and
reuse it somehow to reduce the tiny overhead?
##
file-service/src/main/scala/org/apache/texera/service/resource/DatasetResource.scala:
##
@@ -384,11 +428,13 @@ class DatasetResource {
}
// Create a commit in LakeFS
- val commit = LakeFSStorageClient.createCommit(
-repoName = repositoryName,
-branch = "main",
-commitMessage = s"Created dataset version: $newVersionName"
- )
+ val commit = withLakeFSErrorHandling {
+LakeFSStorageClient.createCommit(
+ repoName = repositoryName,
+ branch = "main",
+ commitMessage = s"Created dataset version: $newVersionName"
+)
+ }
Review Comment:
@xuang7 There is another approach with scala implicit, it will look like the
next code snippet, in your preference, is this cleaner?
```
val commit =
LakeFSStorageClient.safeCall(_.createCommit(
repoName = repositoryName,
branch = "main",
commitMessage = s"Created dataset version: $newVersionName"
))
```
##
file-service/src/main/scala/org/apache/texera/service/resource/DatasetResource.scala:
##
@@ -175,6 +176,47 @@ object DatasetResource {
normalized
}
+ /**
+* Converts LakeFS ApiException to appropriate HTTP exception
+*/
+ private def handleLakeFSException(e: io.lakefs.clients.sdk.ApiException):
Nothing = {
+val rawBody =
Option(e.getResponseBody).filter(_.nonEmpty).getOrElse(e.getMessage)
Review Comment:
Is there any security issue if we return `e.getMessage`?
##
file-service/src/main/scala/org/apache/texera/service/resource/DatasetResource.scala:
##
@@ -175,6 +176,47 @@ object DatasetResource {
normalized
}
+ /**
+* Converts LakeFS ApiException to appropriate HTTP exception
+*/
+ private def handleLakeFSException(e: io.lakefs.clients.sdk.ApiException):
Nothing = {
+val rawBody =
Option(e.getResponseBody).filter(_.nonEmpty).getOrElse(e.getMessage)
+
+val message =
+ Try(new
ObjectMapper().readTree(rawBody).get("message").asText()).getOrElse(rawBody)
+
+def errorResponse(status: Int): Response =
+ Response
+.status(status)
+.entity(Map("message" -> message).asJava)
+.`type`(MediaType.APPLICATION_JSON)
+.build()
+
+throw (e.getCode match {
+ case 400 => new BadRequestException(errorResponse(400))
+ case 401 => new NotAuthorizedException(errorResponse(401))
+ case 403 => new ForbiddenException(errorResponse(403))
+ case 404 => new NotFoundException(errorResponse(404))
+ case 409 => new WebApplicationException(errorResponse(409))
+ case 410 => new WebApplicationException(errorResponse(410))
+ case 412 => new WebApplicationException(errorResponse(412))
+ case 416 => new WebApplicationException(errorResponse(416))
+ case 420 => new WebApplicationException(errorResponse(420))
+ case _ => new InternalServerErrorException(errorResponse(500))
+})
Review Comment:
Can the codes be extracted dynamically from code (Maybe from LakeFs) so for
future changes from their side there is no need to maintain this piece of code?
##
file-service/src/main/scala/org/apache/texera/service/resource/DatasetResource.scala:
##
@@ -175,6 +176,47 @@ object DatasetResource {
normalized
}
+ /**
+* Converts LakeFS ApiException to appropriate HTTP exception
+*/
+ private def handleLakeFSException(e: io.lakefs.clients.sdk.ApiException):
Nothing = {
+val rawBody =
Option(e.getResponseBody).filter(_.nonEmpty).getOrElse(e.getMessage)
+
+val message =
+ Try(new
ObjectMapper().readTree(rawBody).get("message").asText()).getOrElse(rawBody)
Review Comment:
Instead of `.get("message")`, can we use `.path("message")` or similar if
available in case the rawBody has nested structure? Or LakeFS exceptions always
use the same structure?
##
file-service/src/main/scala/org/apache/texera/service/resource/DatasetResource.scala:
##
@@ -175,6 +176,47 @@ object DatasetResource {
normalized
}
+ /**
+* Converts LakeFS ApiException to appropriate HTTP exception
+*/
+ private def h
Re: [PR] feat: add centralized LakeFS error handling for multipart upload and dataset version operations [texera]
chenlica commented on PR #4177: URL: https://github.com/apache/texera/pull/4177#issuecomment-3795280471 @carloea2 Please review it first before @aicam does it. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
