Re: [PR] [#5361] improvment(hadoop-catalog): Introduce a timeout mechanism to get Hadoop File System. [gravitino]

2024-12-04 Thread via GitHub


yuqi1129 commented on PR #5406:
URL: https://github.com/apache/gravitino/pull/5406#issuecomment-2519470208

   @jerryshao 
   Do you have any suggestions on this issue? Should I proceed with it?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@gravitino.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [#5361] improvment(hadoop-catalog): Introduce a timeout mechanism to get Hadoop File System. [gravitino]

2024-10-31 Thread via GitHub


yuqi1129 commented on code in PR #5406:
URL: https://github.com/apache/gravitino/pull/5406#discussion_r1824368270


##
catalogs/catalog-hadoop/src/main/java/org/apache/gravitino/catalog/hadoop/HadoopCatalogOperations.java:
##
@@ -774,6 +778,27 @@ FileSystem getFileSystem(Path path, Map config) throws IOExcepti
   scheme, path, fileSystemProvidersMap.keySet(), fileSystemProvidersMap.values()));
 }
 
-return provider.getFileSystem(path, config);
+int timeoutSeconds =
+(int)
+propertiesMetadata
+.catalogPropertiesMetadata()
+.getOrDefault(config, HadoopCatalogPropertiesMetadata.GET_FILESYSTEM_TIMEOUT_SECONDS);
+try {
+  AtomicReference fileSystem = new AtomicReference<>();
+  Awaitility.await()
+  .atMost(timeoutSeconds, TimeUnit.SECONDS)
+  .until(
+  () -> {
+fileSystem.set(provider.getFileSystem(path, config));

Review Comment:
   If the user sets an incorrect endpoint, the client keeps retrying the connection for a certain amount of time, so the call can block for a long while before it fails.






Re: [PR] [#5361] improvment(hadoop-catalog): Introduce a timeout mechanism to get Hadoop File System. [gravitino]

2024-10-31 Thread via GitHub


jerryshao commented on code in PR #5406:
URL: https://github.com/apache/gravitino/pull/5406#discussion_r1824341558


##
catalogs/catalog-hadoop/src/main/java/org/apache/gravitino/catalog/hadoop/HadoopCatalogOperations.java:
##
@@ -774,6 +778,27 @@ FileSystem getFileSystem(Path path, Map config) throws IOExcepti
   scheme, path, fileSystemProvidersMap.keySet(), fileSystemProvidersMap.values()));
 }
 
-return provider.getFileSystem(path, config);
+int timeoutSeconds =
+(int)
+propertiesMetadata
+.catalogPropertiesMetadata()
+.getOrDefault(config, HadoopCatalogPropertiesMetadata.GET_FILESYSTEM_TIMEOUT_SECONDS);
+try {
+  AtomicReference fileSystem = new AtomicReference<>();
+  Awaitility.await()
+  .atMost(timeoutSeconds, TimeUnit.SECONDS)
+  .until(
+  () -> {
+fileSystem.set(provider.getFileSystem(path, config));

Review Comment:
   Why is it so time-consuming to initialize the filesystem client?






Re: [PR] [#5361] improvment(hadoop-catalog): Introduce a timeout mechanism to get Hadoop File System. [gravitino]

2024-10-31 Thread via GitHub


jerryshao commented on code in PR #5406:
URL: https://github.com/apache/gravitino/pull/5406#discussion_r1824422638


##
catalogs/catalog-hadoop/src/main/java/org/apache/gravitino/catalog/hadoop/HadoopCatalogOperations.java:
##
@@ -774,6 +778,27 @@ FileSystem getFileSystem(Path path, Map config) throws IOExcepti
   scheme, path, fileSystemProvidersMap.keySet(), fileSystemProvidersMap.values()));
 }
 
-return provider.getFileSystem(path, config);
+int timeoutSeconds =
+(int)
+propertiesMetadata
+.catalogPropertiesMetadata()
+.getOrDefault(config, HadoopCatalogPropertiesMetadata.GET_FILESYSTEM_TIMEOUT_SECONDS);
+try {
+  AtomicReference fileSystem = new AtomicReference<>();
+  Awaitility.await()
+  .atMost(timeoutSeconds, TimeUnit.SECONDS)
+  .until(
+  () -> {
+fileSystem.set(provider.getFileSystem(path, config));

Review Comment:
   I don't think this really fixes the problem unless you use another thread to create the FS and poll its status asynchronously.
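
   For illustration, the approach suggested in this comment could look roughly like the sketch below: the blocking call runs on a worker thread and the caller waits on a `Future` with a deadline. This is only a sketch under assumptions, not the code in the PR. `TimedCall` and `callWithTimeout` are hypothetical names, and the `Callable` stands in for `provider.getFileSystem(path, config)`.

```java
import java.io.IOException;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper, not part of Gravitino: bounds a blocking call with a timeout.
class TimedCall {

  /** Runs blockingCall on a worker thread and waits at most timeoutSeconds for its result. */
  static <T> T callWithTimeout(Callable<T> blockingCall, long timeoutSeconds)
      throws IOException {
    // Daemon thread, so a call that never returns cannot keep the JVM alive.
    ExecutorService executor =
        Executors.newSingleThreadExecutor(
            runnable -> {
              Thread thread = new Thread(runnable, "timed-call-worker");
              thread.setDaemon(true);
              return thread;
            });
    Future<T> future = executor.submit(blockingCall);
    try {
      return future.get(timeoutSeconds, TimeUnit.SECONDS);
    } catch (TimeoutException e) {
      // cancel(true) only interrupts the worker; a call that ignores
      // interrupts keeps running in the background.
      future.cancel(true);
      throw new IOException("Timed out after " + timeoutSeconds + " seconds", e);
    } catch (ExecutionException e) {
      throw new IOException("Call failed", e.getCause());
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IOException("Interrupted while waiting", e);
    } finally {
      executor.shutdownNow();
    }
  }

  public static void main(String[] args) throws Exception {
    // A fast call returns its value normally.
    String result = callWithTimeout(() -> "ready", 1);
    System.out.println(result);

    // A slow call (standing in for a client retrying a bad endpoint) is cut off.
    try {
      callWithTimeout(
          () -> {
            Thread.sleep(5_000);
            return "late";
          },
          1);
    } catch (IOException e) {
      System.out.println(e.getMessage());
    }
  }
}
```

   In `getFileSystem` the call would be something like `callWithTimeout(() -> provider.getFileSystem(path, config), timeoutSeconds)`. Creating an executor per call keeps the sketch short; a shared pool would probably be a better fit for real use.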






Re: [PR] [#5361] improvment(hadoop-catalog): Introduce a timeout mechanism to get Hadoop File System. [gravitino]

2024-10-31 Thread via GitHub


yuqi1129 commented on PR #5406:
URL: https://github.com/apache/gravitino/pull/5406#issuecomment-2449681635

   @jerryshao 
   Please help check whether this should be included in release 0.7.0.

