Re: [PR] feat: add centralized LakeFS error handling for multipart upload and dataset version operations [texera]

2026-01-27 Thread via GitHub


aicam merged PR #4177:
URL: https://github.com/apache/texera/pull/4177


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]



Re: [PR] feat: add centralized LakeFS error handling for multipart upload and dataset version operations [texera]

2026-01-25 Thread via GitHub


carloea2 commented on code in PR #4177:
URL: https://github.com/apache/texera/pull/4177#discussion_r2726210252


##
file-service/src/main/scala/org/apache/texera/service/resource/DatasetResource.scala:
##
@@ -384,11 +428,13 @@ class DatasetResource {
   }
 
   // Create a commit in LakeFS
-  val commit = LakeFSStorageClient.createCommit(
-repoName = repositoryName,
-branch = "main",
-commitMessage = s"Created dataset version: $newVersionName"
-  )
+  val commit = withLakeFSErrorHandling {
+LakeFSStorageClient.createCommit(
+  repoName = repositoryName,
+  branch = "main",
+  commitMessage = s"Created dataset version: $newVersionName"
+)
+  }

Review Comment:
   **We can keep this approach for now.** 
   
   It would be good to open a discussion, and would be good if @Ma77Ball can 
give his input since he is working in logging. Here is the things to discuss 
later:
   
   I think this PR is really about **how we want error handling to work across 
the services calls** (LakeFS now, others later): do we *force* callers to deal 
with failures, or do we centralize + make it easy/consistent.
   
   ### Option 1: change defs and force handling (typed)
   
   Return `Either[StorageError, A]` (or `F[Either[...]]`). Caller can’t ignore 
it.
   
   ```scala
   storage.commit(repo, branch, msg): Either[StorageError, CommitId]
   storage.uploadPart(key, part, bytes): Either[StorageError, Unit]
   
   // usage
   storage.commit(repo, branch, msg).map(Ok(_)).leftMap(mapToHttp)
   ```
   
   Pros: type-safe + enforced. Cons: signature churn + lots of callsite updates 
+ need `Error` mapping.
   
   ### Option 2: implicit + `Dynamic` `.safe` wrapper (ergonomic, but risky)
   
   **This option adds automatic logs**
   Nice callsite but reflection + weak typing (`Any`) + possible runtime method 
lookup issues. Notice this method does not require any changes to the def like 
commit, or uploadPart but it is optional to the caller to use `.safe` and
   
   Changing the return type (2.1)
   ```scala
   lakefs.safe.commit(repo, branch, msg)  // Either[Throwable, Any]
   datasetResource.safe.uploadPart(key, part, bs) // Either[Throwable, Any]
   ```
   Keeping the return type and throwing (2.2)
   ```scala
   lakefs.safe.commit(repo, branch, msg)
   datasetResource.safe.uploadPart(key, part, bs)
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]



Re: [PR] feat: add centralized LakeFS error handling for multipart upload and dataset version operations [texera]

2026-01-25 Thread via GitHub


carloea2 commented on PR #4177:
URL: https://github.com/apache/texera/pull/4177#issuecomment-3797698027

   LGTM.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]



Re: [PR] feat: add centralized LakeFS error handling for multipart upload and dataset version operations [texera]

2026-01-25 Thread via GitHub


carloea2 commented on code in PR #4177:
URL: https://github.com/apache/texera/pull/4177#discussion_r2726211452


##
file-service/src/main/scala/org/apache/texera/service/resource/DatasetResource.scala:
##
@@ -175,6 +176,47 @@ object DatasetResource {
 normalized
   }
 
+  /**
+* Converts LakeFS ApiException to appropriate HTTP exception
+*/
+  private def handleLakeFSException(e: io.lakefs.clients.sdk.ApiException): 
Nothing = {
+val rawBody = 
Option(e.getResponseBody).filter(_.nonEmpty).getOrElse(e.getMessage)
+
+val message =
+  Try(new 
ObjectMapper().readTree(rawBody).get("message").asText()).getOrElse(rawBody)
+
+def errorResponse(status: Int): Response =
+  Response
+.status(status)
+.entity(Map("message" -> message).asJava)
+.`type`(MediaType.APPLICATION_JSON)
+.build()
+
+throw (e.getCode match {
+  case 400 => new BadRequestException(errorResponse(400))
+  case 401 => new NotAuthorizedException(errorResponse(401))
+  case 403 => new ForbiddenException(errorResponse(403))
+  case 404 => new NotFoundException(errorResponse(404))
+  case 409 => new WebApplicationException(errorResponse(409))
+  case 410 => new WebApplicationException(errorResponse(410))
+  case 412 => new WebApplicationException(errorResponse(412))
+  case 416 => new WebApplicationException(errorResponse(416))
+  case 420 => new WebApplicationException(errorResponse(420))
+  case _   => new InternalServerErrorException(errorResponse(500))
+})

Review Comment:
   I think it will be good if possible, if not this approach is good enough. 
Thanks.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]



Re: [PR] feat: add centralized LakeFS error handling for multipart upload and dataset version operations [texera]

2026-01-25 Thread via GitHub


carloea2 commented on code in PR #4177:
URL: https://github.com/apache/texera/pull/4177#discussion_r2726211807


##
file-service/src/main/scala/org/apache/texera/service/resource/DatasetResource.scala:
##
@@ -175,6 +176,47 @@ object DatasetResource {
 normalized
   }
 
+  /**
+* Converts LakeFS ApiException to appropriate HTTP exception
+*/
+  private def handleLakeFSException(e: io.lakefs.clients.sdk.ApiException): 
Nothing = {
+val rawBody = 
Option(e.getResponseBody).filter(_.nonEmpty).getOrElse(e.getMessage)

Review Comment:
   Got it thanks.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]



Re: [PR] feat: add centralized LakeFS error handling for multipart upload and dataset version operations [texera]

2026-01-25 Thread via GitHub


carloea2 commented on code in PR #4177:
URL: https://github.com/apache/texera/pull/4177#discussion_r2726210252


##
file-service/src/main/scala/org/apache/texera/service/resource/DatasetResource.scala:
##
@@ -384,11 +428,13 @@ class DatasetResource {
   }
 
   // Create a commit in LakeFS
-  val commit = LakeFSStorageClient.createCommit(
-repoName = repositoryName,
-branch = "main",
-commitMessage = s"Created dataset version: $newVersionName"
-  )
+  val commit = withLakeFSErrorHandling {
+LakeFSStorageClient.createCommit(
+  repoName = repositoryName,
+  branch = "main",
+  commitMessage = s"Created dataset version: $newVersionName"
+)
+  }

Review Comment:
   **We can keep this approach for now.** 
   
   It would be good to open a discussion, and would be good if @Ma77Ball can 
give his input since he is working in logging. Here is the things to discuss 
later:
   
   I think this PR is really about **how we want error handling to work across 
the services calls** (LakeFS now, others later): do we *force* callers to deal 
with failures, or do we centralize + make it easy/consistent.
   
   ### Option 1: change defs and force handling (typed)
   
   Return `Either[StorageError, A]` (or `F[Either[...]]`). Caller can’t ignore 
it.
   
   ```scala
   storage.commit(repo, branch, msg): Either[StorageError, CommitId]
   storage.uploadPart(key, part, bytes): Either[StorageError, Unit]
   
   // usage
   storage.commit(repo, branch, msg).map(Ok(_)).leftMap(mapToHttp)
   ```
   
   Pros: type-safe + enforced. Cons: signature churn + lots of callsite updates 
+ need `Error` mapping.
   
   ### Option 2: implicit + `Dynamic` `.safe` wrapper (ergonomic, but risky)
   
   Nice callsite but reflection + weak typing (`Any`) + possible runtime method 
lookup issues. Notice this method does not require any changes to the def like 
commit, or uploadPart but it is optional to the caller to use `.safe` and
   
   Changing the return type (2.1)
   ```scala
   lakefs.safe.commit(repo, branch, msg)  // Either[Throwable, Any]
   datasetResource.safe.uploadPart(key, part, bs) // Either[Throwable, Any]
   ```
   Keeping the return type and throwing (2.2)
   ```scala
   lakefs.safe.commit(repo, branch, msg)
   datasetResource.safe.uploadPart(key, part, bs)
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]



Re: [PR] feat: add centralized LakeFS error handling for multipart upload and dataset version operations [texera]

2026-01-25 Thread via GitHub


xuang7 commented on code in PR #4177:
URL: https://github.com/apache/texera/pull/4177#discussion_r2726139588


##
file-service/src/main/scala/org/apache/texera/service/resource/DatasetResource.scala:
##
@@ -384,11 +428,13 @@ class DatasetResource {
   }
 
   // Create a commit in LakeFS
-  val commit = LakeFSStorageClient.createCommit(
-repoName = repositoryName,
-branch = "main",
-commitMessage = s"Created dataset version: $newVersionName"
-  )
+  val commit = withLakeFSErrorHandling {
+LakeFSStorageClient.createCommit(
+  repoName = repositoryName,
+  branch = "main",
+  commitMessage = s"Created dataset version: $newVersionName"
+)
+  }

Review Comment:
   Thanks for the suggestion. I currently centralized the LakeFS exception→HTTP 
mapping in the file-service (LakeFSExceptionHandler). The safeCall approach 
also looks clean and would centralize handling at the storage layer near 
LakeFSStorageClient. Would you recommend switching to 
LakeFSStorageClient.safeCall(...) so all callers share the same handler?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]



Re: [PR] feat: add centralized LakeFS error handling for multipart upload and dataset version operations [texera]

2026-01-25 Thread via GitHub


xuang7 commented on code in PR #4177:
URL: https://github.com/apache/texera/pull/4177#discussion_r2726128321


##
file-service/src/main/scala/org/apache/texera/service/resource/DatasetResource.scala:
##
@@ -175,6 +176,47 @@ object DatasetResource {
 normalized
   }
 
+  /**
+* Converts LakeFS ApiException to appropriate HTTP exception
+*/
+  private def handleLakeFSException(e: io.lakefs.clients.sdk.ApiException): 
Nothing = {
+val rawBody = 
Option(e.getResponseBody).filter(_.nonEmpty).getOrElse(e.getMessage)
+
+val message =
+  Try(new 
ObjectMapper().readTree(rawBody).get("message").asText()).getOrElse(rawBody)
+
+def errorResponse(status: Int): Response =
+  Response
+.status(status)
+.entity(Map("message" -> message).asJava)
+.`type`(MediaType.APPLICATION_JSON)
+.build()
+
+throw (e.getCode match {
+  case 400 => new BadRequestException(errorResponse(400))
+  case 401 => new NotAuthorizedException(errorResponse(401))
+  case 403 => new ForbiddenException(errorResponse(403))
+  case 404 => new NotFoundException(errorResponse(404))
+  case 409 => new WebApplicationException(errorResponse(409))
+  case 410 => new WebApplicationException(errorResponse(410))
+  case 412 => new WebApplicationException(errorResponse(412))
+  case 416 => new WebApplicationException(errorResponse(416))
+  case 420 => new WebApplicationException(errorResponse(420))
+  case _   => new InternalServerErrorException(errorResponse(500))
+})

Review Comment:
   Good point. For now we're using a small fallback map for the common LakeFS 
status codes, which are fairly stable. Please let me know if you’d recommend 
dynamically extracting codes from the ApiException instead.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]



Re: [PR] feat: add centralized LakeFS error handling for multipart upload and dataset version operations [texera]

2026-01-25 Thread via GitHub


xuang7 commented on code in PR #4177:
URL: https://github.com/apache/texera/pull/4177#discussion_r2726124590


##
file-service/src/main/scala/org/apache/texera/service/resource/DatasetResource.scala:
##
@@ -175,6 +176,47 @@ object DatasetResource {
 normalized
   }
 
+  /**
+* Converts LakeFS ApiException to appropriate HTTP exception
+*/
+  private def handleLakeFSException(e: io.lakefs.clients.sdk.ApiException): 
Nothing = {
+val rawBody = 
Option(e.getResponseBody).filter(_.nonEmpty).getOrElse(e.getMessage)
+
+val message =
+  Try(new 
ObjectMapper().readTree(rawBody).get("message").asText()).getOrElse(rawBody)
+
+def errorResponse(status: Int): Response =
+  Response
+.status(status)
+.entity(Map("message" -> message).asJava)
+.`type`(MediaType.APPLICATION_JSON)
+.build()
+
+throw (e.getCode match {
+  case 400 => new BadRequestException(errorResponse(400))
+  case 401 => new NotAuthorizedException(errorResponse(401))
+  case 403 => new ForbiddenException(errorResponse(403))
+  case 404 => new NotFoundException(errorResponse(404))
+  case 409 => new WebApplicationException(errorResponse(409))
+  case 410 => new WebApplicationException(errorResponse(410))
+  case 412 => new WebApplicationException(errorResponse(412))
+  case 416 => new WebApplicationException(errorResponse(416))
+  case 420 => new WebApplicationException(errorResponse(420))
+  case _   => new InternalServerErrorException(errorResponse(500))
+})
+  }
+
+  /**
+* Wraps a LakeFS call with centralized error handling.
+*/
+  private def withLakeFSErrorHandling[T](lakeFsCall: => T): T = {
+try {
+  lakeFsCall
+} catch {
+  case e: io.lakefs.clients.sdk.ApiException => handleLakeFSException(e)
+}
+  }
+

Review Comment:
   Fixed, `LakeFSExceptionHandler` is added under `service/util`. It’s placed 
near the DatasetResource.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]



Re: [PR] feat: add centralized LakeFS error handling for multipart upload and dataset version operations [texera]

2026-01-25 Thread via GitHub


xuang7 commented on code in PR #4177:
URL: https://github.com/apache/texera/pull/4177#discussion_r2726122200


##
file-service/src/main/scala/org/apache/texera/service/resource/DatasetResource.scala:
##
@@ -175,6 +176,47 @@ object DatasetResource {
 normalized
   }
 
+  /**
+* Converts LakeFS ApiException to appropriate HTTP exception
+*/
+  private def handleLakeFSException(e: io.lakefs.clients.sdk.ApiException): 
Nothing = {
+val rawBody = 
Option(e.getResponseBody).filter(_.nonEmpty).getOrElse(e.getMessage)
+
+val message =
+  Try(new 
ObjectMapper().readTree(rawBody).get("message").asText()).getOrElse(rawBody)

Review Comment:
   Fixed. The updated handler no longer parse the response body, so this 
parsing line is removed.



##
file-service/src/main/scala/org/apache/texera/service/resource/DatasetResource.scala:
##
@@ -175,6 +176,47 @@ object DatasetResource {
 normalized
   }
 
+  /**
+* Converts LakeFS ApiException to appropriate HTTP exception
+*/
+  private def handleLakeFSException(e: io.lakefs.clients.sdk.ApiException): 
Nothing = {
+val rawBody = 
Option(e.getResponseBody).filter(_.nonEmpty).getOrElse(e.getMessage)
+
+val message =
+  Try(new 
ObjectMapper().readTree(rawBody).get("message").asText()).getOrElse(rawBody)

Review Comment:
   Fixed. The updated handler no longer parse the response body, so this 
parsing line is removed.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]



Re: [PR] feat: add centralized LakeFS error handling for multipart upload and dataset version operations [texera]

2026-01-25 Thread via GitHub


xuang7 commented on code in PR #4177:
URL: https://github.com/apache/texera/pull/4177#discussion_r2726120633


##
file-service/src/main/scala/org/apache/texera/service/resource/DatasetResource.scala:
##
@@ -175,6 +176,47 @@ object DatasetResource {
 normalized
   }
 
+  /**
+* Converts LakeFS ApiException to appropriate HTTP exception
+*/
+  private def handleLakeFSException(e: io.lakefs.clients.sdk.ApiException): 
Nothing = {
+val rawBody = 
Option(e.getResponseBody).filter(_.nonEmpty).getOrElse(e.getMessage)

Review Comment:
   Good point. The message may not be very sensitive in most cases, but 
returning `e.getMessage` to the frontend could still leak internal details. I 
have updated the handler to return simplified `fallbackMessages` to clients, 
while keeping the original LakeFS error details in server logs for debugging.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]



Re: [PR] feat: add centralized LakeFS error handling for multipart upload and dataset version operations [texera]

2026-01-25 Thread via GitHub


carloea2 commented on code in PR #4177:
URL: https://github.com/apache/texera/pull/4177#discussion_r2725032491


##
file-service/src/main/scala/org/apache/texera/service/resource/DatasetResource.scala:
##
@@ -175,6 +176,47 @@ object DatasetResource {
 normalized
   }
 
+  /**
+* Converts LakeFS ApiException to appropriate HTTP exception
+*/
+  private def handleLakeFSException(e: io.lakefs.clients.sdk.ApiException): 
Nothing = {
+val rawBody = 
Option(e.getResponseBody).filter(_.nonEmpty).getOrElse(e.getMessage)
+
+val message =
+  Try(new 
ObjectMapper().readTree(rawBody).get("message").asText()).getOrElse(rawBody)

Review Comment:
   I see we are creating a new ObjectMapper everytime, can we avoid it and 
reuse it somehow to reduce the tiny overhead?



##
file-service/src/main/scala/org/apache/texera/service/resource/DatasetResource.scala:
##
@@ -384,11 +428,13 @@ class DatasetResource {
   }
 
   // Create a commit in LakeFS
-  val commit = LakeFSStorageClient.createCommit(
-repoName = repositoryName,
-branch = "main",
-commitMessage = s"Created dataset version: $newVersionName"
-  )
+  val commit = withLakeFSErrorHandling {
+LakeFSStorageClient.createCommit(
+  repoName = repositoryName,
+  branch = "main",
+  commitMessage = s"Created dataset version: $newVersionName"
+)
+  }

Review Comment:
   @xuang7 There is another approach with scala implicit, it will look like the 
next code snippet, in your preference, is this cleaner?
   ```
   val commit =
 LakeFSStorageClient.safeCall(_.createCommit(
   repoName = repositoryName,
   branch = "main",
   commitMessage = s"Created dataset version: $newVersionName"
 ))
   ```



##
file-service/src/main/scala/org/apache/texera/service/resource/DatasetResource.scala:
##
@@ -175,6 +176,47 @@ object DatasetResource {
 normalized
   }
 
+  /**
+* Converts LakeFS ApiException to appropriate HTTP exception
+*/
+  private def handleLakeFSException(e: io.lakefs.clients.sdk.ApiException): 
Nothing = {
+val rawBody = 
Option(e.getResponseBody).filter(_.nonEmpty).getOrElse(e.getMessage)

Review Comment:
   Is there any security issue if we return `e.getMessage`?



##
file-service/src/main/scala/org/apache/texera/service/resource/DatasetResource.scala:
##
@@ -175,6 +176,47 @@ object DatasetResource {
 normalized
   }
 
+  /**
+* Converts LakeFS ApiException to appropriate HTTP exception
+*/
+  private def handleLakeFSException(e: io.lakefs.clients.sdk.ApiException): 
Nothing = {
+val rawBody = 
Option(e.getResponseBody).filter(_.nonEmpty).getOrElse(e.getMessage)
+
+val message =
+  Try(new 
ObjectMapper().readTree(rawBody).get("message").asText()).getOrElse(rawBody)
+
+def errorResponse(status: Int): Response =
+  Response
+.status(status)
+.entity(Map("message" -> message).asJava)
+.`type`(MediaType.APPLICATION_JSON)
+.build()
+
+throw (e.getCode match {
+  case 400 => new BadRequestException(errorResponse(400))
+  case 401 => new NotAuthorizedException(errorResponse(401))
+  case 403 => new ForbiddenException(errorResponse(403))
+  case 404 => new NotFoundException(errorResponse(404))
+  case 409 => new WebApplicationException(errorResponse(409))
+  case 410 => new WebApplicationException(errorResponse(410))
+  case 412 => new WebApplicationException(errorResponse(412))
+  case 416 => new WebApplicationException(errorResponse(416))
+  case 420 => new WebApplicationException(errorResponse(420))
+  case _   => new InternalServerErrorException(errorResponse(500))
+})

Review Comment:
   Can the codes be extracted dynamically from code (Maybe from LakeFs) so for 
future changes from their side there is no need to maintain this piece of code?



##
file-service/src/main/scala/org/apache/texera/service/resource/DatasetResource.scala:
##
@@ -175,6 +176,47 @@ object DatasetResource {
 normalized
   }
 
+  /**
+* Converts LakeFS ApiException to appropriate HTTP exception
+*/
+  private def handleLakeFSException(e: io.lakefs.clients.sdk.ApiException): 
Nothing = {
+val rawBody = 
Option(e.getResponseBody).filter(_.nonEmpty).getOrElse(e.getMessage)
+
+val message =
+  Try(new 
ObjectMapper().readTree(rawBody).get("message").asText()).getOrElse(rawBody)

Review Comment:
   Instead of `.get("message")`, can we use `.path("message")` or similar if 
available in case the rawBody has nested structure? Or LakeFS exceptions always 
use the same structure?



##
file-service/src/main/scala/org/apache/texera/service/resource/DatasetResource.scala:
##
@@ -175,6 +176,47 @@ object DatasetResource {
 normalized
   }
 
+  /**
+* Converts LakeFS ApiException to appropriate HTTP exception
+*/
+  private def h

Re: [PR] feat: add centralized LakeFS error handling for multipart upload and dataset version operations [texera]

2026-01-24 Thread via GitHub


chenlica commented on PR #4177:
URL: https://github.com/apache/texera/pull/4177#issuecomment-3795280471

   @carloea2 Please review it first before @aicam does it.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]