Re: [PR] [#5996] feat(python-client): Using credential in Python GVFS client. [gravitino]

2025-01-08 Thread via GitHub


FANNG1 merged PR #5997:
URL: https://github.com/apache/gravitino/pull/5997


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]



Re: [PR] [#5996] feat(python-client): Using credential in Python GVFS client. [gravitino]

2025-01-08 Thread via GitHub


FANNG1 commented on code in PR #5997:
URL: https://github.com/apache/gravitino/pull/5997#discussion_r190860


##
docs/how-to-use-gvfs.md:
##
@@ -455,18 +455,18 @@ to recompile the native libraries like `libhdfs` and others, and completely repl
 
 ### Configuration
 
-| Configuration item              | Description                                                                                                                                              | Default value | Required                          | Since version    |
-|---------------------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------|---------------|-----------------------------------|------------------|
-| `server_uri`                    | The Gravitino server URI, e.g. `http://localhost:8090`.                                                                                                  | (none)        | Yes                               | 0.6.0-incubating |
-| `metalake_name`                 | The metalake name which the fileset belongs to.                                                                                                          | (none)        | Yes                               | 0.6.0-incubating |
-| `cache_size`                    | The cache capacity of the Gravitino Virtual File System.                                                                                                 | `20`          | No                                | 0.6.0-incubating |
-| `cache_expired_time`            | The value of time that the cache expires after accessing in the Gravitino Virtual File System. The value is in `seconds`.                                | `3600`        | No                                | 0.6.0-incubating |
-| `auth_type`                     | The auth type to initialize the Gravitino client to use with the Gravitino Virtual File System. Currently supports `simple` and `oauth2` auth types.     | `simple`      | No                                | 0.6.0-incubating |
-| `oauth2_server_uri`             | The auth server URI for the Gravitino client when using `oauth2` auth type.                                                                              | (none)        | Yes if you use `oauth2` auth type | 0.7.0-incubating |
-| `oauth2_credential`             | The auth credential for the Gravitino client when using `oauth2` auth type.                                                                              | (none)        | Yes if you use `oauth2` auth type | 0.7.0-incubating |
-| `oauth2_path`                   | The auth server path for the Gravitino client when using `oauth2` auth type. Please remove the first slash `/` from the path, for example `oauth/token`. | (none)        | Yes if you use `oauth2` auth type | 0.7.0-incubating |
-| `oauth2_scope`                  | The auth scope for the Gravitino client when using `oauth2` auth type with the Gravitino Virtual File System.                                            | (none)        | Yes if you use `oauth2` auth type | 0.7.0-incubating |
-| `credential_expired_time_ratio` | The ratio of expiration time for credential from Gravitino. This is used in the
Re: [PR] [#5996] feat(python-client): Using credential in Python GVFS client. [gravitino]

2025-01-07 Thread via GitHub


yuqi1129 commented on code in PR #5997:
URL: https://github.com/apache/gravitino/pull/5997#discussion_r1906592946


##
clients/client-python/gravitino/filesystem/gvfs_config.py:
##
@@ -44,3 +44,8 @@ class GVFSConfig:
 
     GVFS_FILESYSTEM_AZURE_ACCOUNT_NAME = "abs_account_name"
     GVFS_FILESYSTEM_AZURE_ACCOUNT_KEY = "abs_account_key"
+
+    # This configuration marks the expired time of the credential. For instance, if the credential
+    # fetched from the Gravitino server has an expired time of 3600 seconds and the credential_expired_time_ratio is 0.5,
+    # then the credential will be considered as expired after 1800 seconds and the client will try to fetch a new credential.
+    GVFS_FILESYSTEM_CREDENTIAL_EXPIRED_TIME_RATIO = "credential_expired_time_ratio"

Review Comment:
   There is already a configuration item `cache_expired_time` to control the expiration time of the value in the cache; I'm not very much in favor of `credential_cache_expire_ratio`.
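The ratio semantics under discussion can be sketched as follows. `effective_ttl_seconds` is a hypothetical helper, not code from the PR, illustrating how a server-issued credential lifetime is scaled down on the client:

```python
# Hypothetical helper illustrating the credential_expired_time_ratio semantics:
# a credential the server says is valid for `server_ttl` seconds is treated as
# expired on the client after `server_ttl * ratio` seconds, so a fresh one is
# fetched well before the real expiry.
def effective_ttl_seconds(server_ttl: float, ratio: float = 0.5) -> float:
    return server_ttl * ratio

# e.g. a 3600-second credential with ratio 0.5 is refreshed after 1800 seconds
print(effective_ttl_seconds(3600, 0.5))
```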






Re: [PR] [#5996] feat(python-client): Using credential in Python GVFS client. [gravitino]

2025-01-07 Thread via GitHub


jerryshao commented on PR #5997:
URL: https://github.com/apache/gravitino/pull/5997#issuecomment-2576923879

   You can make a final call if you think it's OK. @FANNG1 





Re: [PR] [#5996] feat(python-client): Using credential in Python GVFS client. [gravitino]

2025-01-07 Thread via GitHub


FANNG1 commented on code in PR #5997:
URL: https://github.com/apache/gravitino/pull/5997#discussion_r1906587776


##
clients/client-python/gravitino/filesystem/gvfs_config.py:
##
@@ -44,3 +44,8 @@ class GVFSConfig:
 
     GVFS_FILESYSTEM_AZURE_ACCOUNT_NAME = "abs_account_name"
     GVFS_FILESYSTEM_AZURE_ACCOUNT_KEY = "abs_account_key"
+
+    # This configuration marks the expired time of the credential. For instance, if the credential
+    # fetched from the Gravitino server has an expired time of 3600 seconds and the credential_expired_time_ratio is 0.5,
+    # then the credential will be considered as expired after 1800 seconds and the client will try to fetch a new credential.
+    GVFS_FILESYSTEM_CREDENTIAL_EXPIRED_TIME_RATIO = "credential_expired_time_ratio"

Review Comment:
   `credential will be considered as expired` seems not correct; it's more like removing the credential from the cache. How about using `credential_cache_expire_ratio` like the server side?



##
clients/client-python/gravitino/filesystem/gvfs.py:
##
@@ -1001,10 +1151,67 @@ def _get_abs_filesystem(self):
                 "ABS account key is not found in the options."
             )
 
-        return importlib.import_module("adlfs").AzureBlobFileSystem(
-            account_name=abs_account_name,
-            account_key=abs_account_key,
+        return (
+            TIME_WITHOUT_EXPIRATION,
+            importlib.import_module("adlfs").AzureBlobFileSystem(
+                account_name=abs_account_name,
+                account_key=abs_account_key,
+            ),
+        )
+
+    def _get_most_suitable_s3_credential(self, credentials: List[Credential]):
+        for credential in credentials:
+            # Prefer to use the token credential, if it does not exist, use the
+            # secret key credential.
+            if isinstance(credential, S3TokenCredential):
+                return credential
+
+        for credential in credentials:
+            if isinstance(credential, S3SecretKeyCredential):
+                return credential
+        return None
+
+    def _get_most_suitable_oss_credential(self, credentials: List[Credential]):
+        for credential in credentials:
+            # Prefer to use the token credential, if it does not exist, use the
+            # secret key credential.
+            if isinstance(credential, OSSTokenCredential):
+                return credential
+
+        for credential in credentials:
+            if isinstance(credential, OSSSecretKeyCredential):
+                return credential
+        return None
+
+    def _get_most_suitable_gcs_credential(self, credentials: List[Credential]):
+        for credential in credentials:
+            # Prefer to use the token credential, if it does not exist, return None.
+            if isinstance(credential, GCSTokenCredential):
+                return credential
+        return None
+
+    def _get_most_suitable_abs_credential(self, credentials: List[Credential]):
+        for credential in credentials:
+            # Prefer to use the token credential, if it does not exist, use the
+            # account key credential.
+            if isinstance(credential, ADLSTokenCredential):
+                return credential
+
+        for credential in credentials:
+            if isinstance(credential, AzureAccountKeyCredential):
+                return credential
+        return None
+
+    def _get_expire_time_by_ratio(self, expire_time: int):
+        if expire_time <= 0:
+            return TIME_WITHOUT_EXPIRATION
+
+        ratio = float(
+            self._options.get(
+                GVFSConfig.GVFS_FILESYSTEM_CREDENTIAL_EXPIRED_TIME_RATIO, 0.9

Review Comment:
   I'm ok
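The four `_get_most_suitable_*` helpers in the hunk above all follow the same scan-twice pattern. As a side note, they could be collapsed into one generic function; this is only an illustrative sketch with hypothetical names, not code from the PR:

```python
# Generic version of the credential-preference scan: return the first
# credential of the preferred type, otherwise the first of the fallback
# type, otherwise None.
def pick_credential(credentials, preferred_type, fallback_type=None):
    for credential in credentials:
        if isinstance(credential, preferred_type):
            return credential
    if fallback_type is not None:
        for credential in credentials:
            if isinstance(credential, fallback_type):
                return credential
    return None
```

With this, `_get_most_suitable_s3_credential` would reduce to `pick_credential(credentials, S3TokenCredential, S3SecretKeyCredential)`, and the GCS variant simply omits the fallback type.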






Re: [PR] [#5996] feat(python-client): Using credential in Python GVFS client. [gravitino]

2025-01-07 Thread via GitHub


FANNG1 commented on code in PR #5997:
URL: https://github.com/apache/gravitino/pull/5997#discussion_r1906587776


##
clients/client-python/gravitino/filesystem/gvfs_config.py:
##
@@ -44,3 +44,8 @@ class GVFSConfig:
 
     GVFS_FILESYSTEM_AZURE_ACCOUNT_NAME = "abs_account_name"
     GVFS_FILESYSTEM_AZURE_ACCOUNT_KEY = "abs_account_key"
+
+    # This configuration marks the expired time of the credential. For instance, if the credential
+    # fetched from the Gravitino server has an expired time of 3600 seconds and the credential_expired_time_ratio is 0.5,
+    # then the credential will be considered as expired after 1800 seconds and the client will try to fetch a new credential.
+    GVFS_FILESYSTEM_CREDENTIAL_EXPIRED_TIME_RATIO = "credential_expired_time_ratio"

Review Comment:
   `credential will be considered as expired` seems not correct; it's more like removing the credential from the cache. How about using `credential-cache-expire-ratio` like the server side?






Re: [PR] [#5996] feat(python-client): Using credential in Python GVFS client. [gravitino]

2025-01-07 Thread via GitHub


FANNG1 commented on PR #5997:
URL: https://github.com/apache/gravitino/pull/5997#issuecomment-2576914185

   LGTM except minor comments, @jerryshao do you have time to review again?





Re: [PR] [#5996] feat(python-client): Using credential in Python GVFS client. [gravitino]

2025-01-07 Thread via GitHub


yuqi1129 commented on code in PR #5997:
URL: https://github.com/apache/gravitino/pull/5997#discussion_r1906580485


##
clients/client-python/gravitino/filesystem/gvfs_config.py:
##
@@ -44,3 +44,8 @@ class GVFSConfig:
 
     GVFS_FILESYSTEM_AZURE_ACCOUNT_NAME = "abs_account_name"
     GVFS_FILESYSTEM_AZURE_ACCOUNT_KEY = "abs_account_key"
+
+    # This configuration marks the expired time of the credential. For instance, if the credential
+    # fetched from the Gravitino server has an expired time of 3600 seconds and the credential_expired_time_ratio is 0.5,
+    # then the credential will be considered as expired after 1800 seconds and the client will try to fetch a new credential.
+    GVFS_FILESYSTEM_CREDENTIAL_EXPIRED_TIME_RATIO = "credential_expired_time_ratio"

Review Comment:
   It's also okay for me; I can make changes. Could you please help verify the accuracy of the description above?






Re: [PR] [#5996] feat(python-client): Using credential in Python GVFS client. [gravitino]

2025-01-07 Thread via GitHub


yuqi1129 commented on code in PR #5997:
URL: https://github.com/apache/gravitino/pull/5997#discussion_r1906579610


##
clients/client-python/gravitino/filesystem/gvfs.py:
##
@@ -1001,10 +1151,67 @@ def _get_abs_filesystem(self):
                 "ABS account key is not found in the options."
             )
 
-        return importlib.import_module("adlfs").AzureBlobFileSystem(
-            account_name=abs_account_name,
-            account_key=abs_account_key,
+        return (
+            TIME_WITHOUT_EXPIRATION,
+            importlib.import_module("adlfs").AzureBlobFileSystem(
+                account_name=abs_account_name,
+                account_key=abs_account_key,
+            ),
+        )
+
+    def _get_most_suitable_s3_credential(self, credentials: List[Credential]):
+        for credential in credentials:
+            # Prefer to use the token credential, if it does not exist, use the
+            # secret key credential.
+            if isinstance(credential, S3TokenCredential):
+                return credential
+
+        for credential in credentials:
+            if isinstance(credential, S3SecretKeyCredential):
+                return credential
+        return None
+
+    def _get_most_suitable_oss_credential(self, credentials: List[Credential]):
+        for credential in credentials:
+            # Prefer to use the token credential, if it does not exist, use the
+            # secret key credential.
+            if isinstance(credential, OSSTokenCredential):
+                return credential
+
+        for credential in credentials:
+            if isinstance(credential, OSSSecretKeyCredential):
+                return credential
+        return None
+
+    def _get_most_suitable_gcs_credential(self, credentials: List[Credential]):
+        for credential in credentials:
+            # Prefer to use the token credential, if it does not exist, return None.
+            if isinstance(credential, GCSTokenCredential):
+                return credential
+        return None
+
+    def _get_most_suitable_abs_credential(self, credentials: List[Credential]):
+        for credential in credentials:
+            # Prefer to use the token credential, if it does not exist, use the
+            # account key credential.
+            if isinstance(credential, ADLSTokenCredential):
+                return credential
+
+        for credential in credentials:
+            if isinstance(credential, AzureAccountKeyCredential):
+                return credential
+        return None
+
+    def _get_expire_time_by_ratio(self, expire_time: int):
+        if expire_time <= 0:
+            return TIME_WITHOUT_EXPIRATION
+
+        ratio = float(
+            self._options.get(
+                GVFSConfig.GVFS_FILESYSTEM_CREDENTIAL_EXPIRED_TIME_RATIO, 0.9

Review Comment:
   Setting the default value to 0.5 as suggested?






Re: [PR] [#5996] feat(python-client): Using credential in Python GVFS client. [gravitino]

2025-01-07 Thread via GitHub


FANNG1 commented on code in PR #5997:
URL: https://github.com/apache/gravitino/pull/5997#discussion_r1906577156


##
clients/client-python/gravitino/filesystem/gvfs.py:
##
@@ -1001,10 +1151,67 @@ def _get_abs_filesystem(self):
                 "ABS account key is not found in the options."
             )
 
-        return importlib.import_module("adlfs").AzureBlobFileSystem(
-            account_name=abs_account_name,
-            account_key=abs_account_key,
+        return (
+            TIME_WITHOUT_EXPIRATION,
+            importlib.import_module("adlfs").AzureBlobFileSystem(
+                account_name=abs_account_name,
+                account_key=abs_account_key,
+            ),
+        )
+
+    def _get_most_suitable_s3_credential(self, credentials: List[Credential]):
+        for credential in credentials:
+            # Prefer to use the token credential, if it does not exist, use the
+            # secret key credential.
+            if isinstance(credential, S3TokenCredential):
+                return credential
+
+        for credential in credentials:
+            if isinstance(credential, S3SecretKeyCredential):
+                return credential
+        return None
+
+    def _get_most_suitable_oss_credential(self, credentials: List[Credential]):
+        for credential in credentials:
+            # Prefer to use the token credential, if it does not exist, use the
+            # secret key credential.
+            if isinstance(credential, OSSTokenCredential):
+                return credential
+
+        for credential in credentials:
+            if isinstance(credential, OSSSecretKeyCredential):
+                return credential
+        return None
+
+    def _get_most_suitable_gcs_credential(self, credentials: List[Credential]):
+        for credential in credentials:
+            # Prefer to use the token credential, if it does not exist, return None.
+            if isinstance(credential, GCSTokenCredential):
+                return credential
+        return None
+
+    def _get_most_suitable_abs_credential(self, credentials: List[Credential]):
+        for credential in credentials:
+            # Prefer to use the token credential, if it does not exist, use the
+            # account key credential.
+            if isinstance(credential, ADLSTokenCredential):
+                return credential
+
+        for credential in credentials:
+            if isinstance(credential, AzureAccountKeyCredential):
+                return credential
+        return None
+
+    def _get_expire_time_by_ratio(self, expire_time: int):
+        if expire_time <= 0:
+            return TIME_WITHOUT_EXPIRATION
+
+        ratio = float(
+            self._options.get(
+                GVFSConfig.GVFS_FILESYSTEM_CREDENTIAL_EXPIRED_TIME_RATIO, 0.9

Review Comment:
   I still think `0.9` is too high for the client.



##
clients/client-python/gravitino/filesystem/gvfs_config.py:
##
@@ -44,3 +44,8 @@ class GVFSConfig:
 
     GVFS_FILESYSTEM_AZURE_ACCOUNT_NAME = "abs_account_name"
     GVFS_FILESYSTEM_AZURE_ACCOUNT_KEY = "abs_account_key"
+
+    # This configuration marks the expired time of the credential. For instance, if the credential
+    # fetched from the Gravitino server has an expired time of 3600 seconds and the credential_expired_time_ratio is 0.5,
+    # then the credential will be considered as expired after 1800 seconds and the client will try to fetch a new credential.
+    GVFS_FILESYSTEM_CREDENTIAL_EXPIRED_TIME_RATIO = "credential_expired_time_ratio"

Review Comment:
   credential_expire_ratio?






Re: [PR] [#5996] feat(python-client): Using credential in Python GVFS client. [gravitino]

2025-01-07 Thread via GitHub


yuqi1129 commented on PR #5997:
URL: https://github.com/apache/gravitino/pull/5997#issuecomment-2576830961

   All have been resolved. @FANNG1 please help to take a look again, thanks.





Re: [PR] [#5996] feat(python-client): Using credential in Python GVFS client. [gravitino]

2025-01-07 Thread via GitHub


yuqi1129 commented on code in PR #5997:
URL: https://github.com/apache/gravitino/pull/5997#discussion_r1906528594


##
clients/client-python/gravitino/filesystem/gvfs.py:
##
@@ -1001,10 +1168,61 @@ def _get_abs_filesystem(self):
                 "ABS account key is not found in the options."
             )
 
-        return importlib.import_module("adlfs").AzureBlobFileSystem(
-            account_name=abs_account_name,
-            account_key=abs_account_key,
+        return (
+            TIME_WITHOUT_EXPIRATION,
+            importlib.import_module("adlfs").AzureBlobFileSystem(
+                account_name=abs_account_name,
+                account_key=abs_account_key,
+            ),
         )
 
+    def _get_most_suitable_s3_credential(self, credentials: List[Credential]):
+        for credential in credentials:
+            # Prefer to use the token credential, if it does not exist, use the
+            # secret key credential.
+            if isinstance(credential, S3TokenCredential):
+                return credential
+
+        for credential in credentials:
+            if isinstance(credential, S3SecretKeyCredential):
+                return credential
+        return None
+
+    def _get_most_suitable_oss_credential(self, credentials: List[Credential]):
+        for credential in credentials:
+            # Prefer to use the token credential, if it does not exist, use the
+            # secret key credential.
+            if isinstance(credential, OSSTokenCredential):
+                return credential
+
+        for credential in credentials:
+            if isinstance(credential, OSSSecretKeyCredential):
+                return credential
+        return None
+
+    def _get_most_suitable_gcs_credential(self, credentials: List[Credential]):
+        for credential in credentials:
+            # Prefer to use the token credential, if it does not exist, return None.
+            if isinstance(credential, GCSTokenCredential):
+                return credential
+        return None
+
+    def _get_most_suitable_abs_credential(self, credentials: List[Credential]):
+        for credential in credentials:
+            # Prefer to use the token credential, if it does not exist, use the
+            # account key credential.
+            if isinstance(credential, ADLSTokenCredential):
+                return credential
+
+        for credential in credentials:
+            if isinstance(credential, AzureAccountKeyCredential):
+                return credential
+        return None
+
+    def _get_expire_time_by_ratio(self, expire_time: int):
+        if expire_time <= 0:
+            return TIME_WITHOUT_EXPIRATION
+        return time.time() * 1000 + (expire_time - time.time() * 1000) * 0.9

Review Comment:
   I have added a new configuration item as suggested.
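Folding the new configuration item into the millisecond formula above gives roughly the following. This is a standalone sketch (the sentinel constant and the default ratio are assumptions, not verified against the final PR):

```python
import time

TIME_WITHOUT_EXPIRATION = float("inf")  # assumed sentinel for "never expires"

def get_expire_time_by_ratio(expire_time_ms: float, ratio: float = 0.5) -> float:
    # Scale the remaining lifetime (now .. expire_time_ms) by the ratio so
    # the cached filesystem is refreshed before the credential truly expires.
    if expire_time_ms <= 0:
        return TIME_WITHOUT_EXPIRATION
    now_ms = time.time() * 1000
    return now_ms + (expire_time_ms - now_ms) * ratio
```

A credential with no expiry (`expire_time_ms <= 0`) is kept forever; otherwise the effective expiry always falls strictly between now and the server-issued expiry.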






Re: [PR] [#5996] feat(python-client): Using credential in Python GVFS client. [gravitino]

2025-01-07 Thread via GitHub


yuqi1129 commented on code in PR #5997:
URL: https://github.com/apache/gravitino/pull/5997#discussion_r1906528134


##
clients/client-python/gravitino/filesystem/gvfs.py:
##
@@ -946,13 +1048,54 @@ def _get_s3_filesystem(self):
                 "AWS endpoint url is not found in the options."
             )
 
-        return importlib.import_module("s3fs").S3FileSystem(
-            key=aws_access_key_id,
-            secret=aws_secret_access_key,
-            endpoint_url=aws_endpoint_url,
+        return (
+            sys.maxsize,
+            importlib.import_module("s3fs").S3FileSystem(
+                key=aws_access_key_id,
+                secret=aws_secret_access_key,
+                endpoint_url=aws_endpoint_url,
+            ),
         )
 
-    def _get_oss_filesystem(self):
+    def _get_oss_filesystem(self, fileset_catalog: Catalog, identifier: NameIdentifier):
+        # Can get credential from the fileset
+        try:
+            fileset: GenericFileset = fileset_catalog.as_fileset_catalog().load_fileset(
+                NameIdentifier.of(identifier.namespace().level(2), identifier.name())
+            )
+            credentials = fileset.support_credentials().get_credentials()
+        except (NoSuchCredentialException, CatalogNotInUseException) as e:
+            logger.warning("Failed to get credentials from fileset: %s", e)
+            credentials = []
+
+        credential = self._get_most_suitable_oss_credential(credentials)
+        if credential is not None:
+            oss_endpoint = fileset_catalog.properties()["oss-endpoint"]
+            expire_time = self._get_expire_time_by_ratio(credential.expire_time_in_ms())
+            if isinstance(credential, OSSTokenCredential):
+                fs = importlib.import_module("ossfs").OSSFileSystem(
+                    key=credential.access_key_id(),
+                    secret=credential.secret_access_key(),
+                    token=credential.security_token(),
+                    endpoint=oss_endpoint,
+                )
+                return (expire_time, fs)
+            if isinstance(credential, OSSSecretKeyCredential):
+                return (
+                    expire_time,
+                    importlib.import_module("ossfs").OSSFileSystem(
+                        key=credential.access_key_id(),
+                        secret=credential.secret_access_key(),
+                        endpoint=oss_endpoint,
+                    ),
+                )
+
+        oss_endpoint_url = self._options.get(GVFSConfig.GVFS_FILESYSTEM_OSS_ENDPOINT)

Review Comment:
   done
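The `(expire_time, filesystem)` pairs returned by the factory methods above feed a cache that recreates the filesystem once the credential-derived expiry passes. A minimal sketch of that caching idea (the class and its names are hypothetical, not the PR's actual cache):

```python
import time

class ExpiringFilesystemCache:
    """Caches (expire_time_ms, filesystem) pairs keyed by fileset identifier."""

    def __init__(self, factory):
        # factory(ident) returns an (expire_time_ms, filesystem) pair
        self._factory = factory
        self._entries = {}

    def get(self, ident):
        entry = self._entries.get(ident)
        now_ms = time.time() * 1000
        # Rebuild the filesystem once its expiry has passed.
        if entry is None or entry[0] <= now_ms:
            entry = self._factory(ident)
            self._entries[ident] = entry
        return entry[1]
```

While an entry's expiry is still in the future, repeated `get` calls return the same filesystem object; once it lapses, the factory is invoked again with fresh credentials.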






Re: [PR] [#5996] feat(python-client): Using credential in Python GVFS client. [gravitino]

2025-01-07 Thread via GitHub


yuqi1129 commented on code in PR #5997:
URL: https://github.com/apache/gravitino/pull/5997#discussion_r1906527983


##
clients/client-python/tests/integration/test_gvfs_with_gcs_credential.py:
##
@@ -0,0 +1,112 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+import logging
+import os
+from random import randint
+import unittest
+
+from gcsfs import GCSFileSystem
+
+from gravitino import Catalog, Fileset, GravitinoClient
+from gravitino.filesystem import gvfs
+from tests.integration.test_gvfs_with_gcs import TestGvfsWithGCS
+
+logger = logging.getLogger(__name__)
+
+
+def gcs_with_credential_is_configured():
+    return all(
+        [
+            os.environ.get("GCS_STS_SERVICE_ACCOUNT_JSON_PATH") is not None,
+            os.environ.get("GCS_STS_BUCKET_NAME") is not None,
+        ]
+    )
+
+
+@unittest.skipUnless(gcs_with_credential_is_configured(), "GCS is not configured.")
+class TestGvfsWithGCSCredential(TestGvfsWithGCS):
+    # Before running this test, please make sure gcp-bundle-x.jar has been
+    # copied to the $GRAVITINO_HOME/catalogs/hadoop/libs/ directory
+    key_file = os.environ.get("GCS_STS_SERVICE_ACCOUNT_JSON_PATH")
+    bucket_name = os.environ.get("GCS_STS_BUCKET_NAME")
+    metalake_name: str = "TestGvfsWithGCSCredential_metalake" + str(randint(1, 1))
+
+    @classmethod
+    def _init_test_entities(cls):
+        cls.gravitino_admin_client.create_metalake(
+            name=cls.metalake_name, comment="", properties={}
+        )
+        cls.gravitino_client = GravitinoClient(
+            uri="http://localhost:8090", metalake_name=cls.metalake_name
+        )
+
+        cls.config = {}
+        cls.conf = {}
+        catalog = cls.gravitino_client.create_catalog(
+            name=cls.catalog_name,
+            catalog_type=Catalog.Type.FILESET,
+            provider=cls.catalog_provider,
+            comment="",
+            properties={
+                "filesystem-providers": "gcs",
+                "gcs-credential-file-path": cls.key_file,
+                "gcs-service-account-file": cls.key_file,
+                "credential-providers": "gcs-token",
+            },
+        )
+        catalog.as_schemas().create_schema(
+            schema_name=cls.schema_name, comment="", properties={}
+        )
+
+        cls.fileset_storage_location: str = (
+            f"gs://{cls.bucket_name}/{cls.catalog_name}/{cls.schema_name}/{cls.fileset_name}"
+        )
+        cls.fileset_gvfs_location = (
+            f"gvfs://fileset/{cls.catalog_name}/{cls.schema_name}/{cls.fileset_name}"
+        )
+        catalog.as_fileset_catalog().create_fileset(
+            ident=cls.fileset_ident,
+            fileset_type=Fileset.Type.MANAGED,
+            comment=cls.fileset_comment,
+            storage_location=cls.fileset_storage_location,
+            properties=cls.fileset_properties,
+        )
+
+        cls.fs = GCSFileSystem(token=cls.key_file)
+
+    def test_mkdir(self):

Review Comment:
   I have added some details on why we need to overwrite it. 



##
clients/client-python/gravitino/filesystem/gvfs.py:
##
@@ -946,13 +1048,54 @@ def _get_s3_filesystem(self):
                 "AWS endpoint url is not found in the options."
             )
 
-        return importlib.import_module("s3fs").S3FileSystem(
-            key=aws_access_key_id,
-            secret=aws_secret_access_key,
-            endpoint_url=aws_endpoint_url,
+        return (
+            sys.maxsize,
+            importlib.import_module("s3fs").S3FileSystem(
+                key=aws_access_key_id,
+                secret=aws_secret_access_key,
+                endpoint_url=aws_endpoint_url,
Review Comment:
   done






Re: [PR] [#5996] feat(python-client): Using credential in Python GVFS client. [gravitino]

2025-01-07 Thread via GitHub


yuqi1129 commented on code in PR #5997:
URL: https://github.com/apache/gravitino/pull/5997#discussion_r1906328456


##
clients/client-python/tests/integration/test_gvfs_with_gcs_credential.py:
##
@@ -0,0 +1,112 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+import logging
+import os
+from random import randint
+import unittest
+
+from gcsfs import GCSFileSystem
+
+from gravitino import Catalog, Fileset, GravitinoClient
+from gravitino.filesystem import gvfs
+from tests.integration.test_gvfs_with_gcs import TestGvfsWithGCS
+
+logger = logging.getLogger(__name__)
+
+
+def gcs_with_credential_is_configured():
+    return all(
+        [
+            os.environ.get("GCS_STS_SERVICE_ACCOUNT_JSON_PATH") is not None,
+            os.environ.get("GCS_STS_BUCKET_NAME") is not None,
+        ]
+    )
+
+
+@unittest.skipUnless(gcs_with_credential_is_configured(), "GCS is not configured.")
+class TestGvfsWithGCSCredential(TestGvfsWithGCS):
+    # Before running this test, please make sure gcp-bundle-x.jar has been
+    # copied to the $GRAVITINO_HOME/catalogs/hadoop/libs/ directory
+    key_file = os.environ.get("GCS_STS_SERVICE_ACCOUNT_JSON_PATH")
+    bucket_name = os.environ.get("GCS_STS_BUCKET_NAME")

Review Comment:
   done



##
clients/client-python/gravitino/filesystem/gvfs.py:
##
@@ -918,11 +992,41 @@ def _get_gcs_filesystem(self):
         raise GravitinoRuntimeException(
             "Service account key is not found in the options."
         )
-        return importlib.import_module("gcsfs").GCSFileSystem(
-            token=service_account_key_path
+        return (
+            TIME_WITHOUT_EXPIRATION,
+            importlib.import_module("gcsfs").GCSFileSystem(
+                token=service_account_key_path
+            ),
         )
 
-    def _get_s3_filesystem(self):
+    def _get_s3_filesystem(self, fileset_catalog: Catalog, identifier: NameIdentifier):
+        try:
+            fileset: GenericFileset = fileset_catalog.as_fileset_catalog().load_fileset(
+                NameIdentifier.of(identifier.namespace().level(2), identifier.name())
+            )
+            credentials = fileset.support_credentials().get_credentials()
+        except (NoSuchCredentialException, CatalogNotInUseException) as e:
+            logger.warning("Failed to get credentials from fileset: %s", e)

Review Comment:
   done






Re: [PR] [#5996] feat(python-client): Using credential in Python GVFS client. [gravitino]

2025-01-07 Thread via GitHub


FANNG1 commented on PR #5997:
URL: https://github.com/apache/gravitino/pull/5997#issuecomment-2575308487

   > For S3, I have tested again. Both Java and Python clients can omit the 
endpoint, I'm hesitant to change this old behaviour, maybe another PR to 
optimize it will be proper.
   
   After this PR, does the client side need to configure some catalog properties like `region` and `endpoint`?





Re: [PR] [#5996] feat(python-client): Using credential in Python GVFS client. [gravitino]

2025-01-07 Thread via GitHub


FANNG1 commented on PR #5997:
URL: https://github.com/apache/gravitino/pull/5997#issuecomment-2575304910

   > I have said that I just want to keep the configuration for OSS in Java and 
Python consistent 0.7.0
   
   I agree the OSS endpoint is required. What I doubt is that in some cases you use the OSS endpoint from the server side, while in another condition you use the OSS endpoint from the client side.
   





Re: [PR] [#5996] feat(python-client): Using credentail in python GVFS client. [gravitino]

2025-01-07 Thread via GitHub


FANNG1 commented on code in PR #5997:
URL: https://github.com/apache/gravitino/pull/5997#discussion_r1905456155


##
clients/client-python/gravitino/filesystem/gvfs.py:
##
@@ -1001,10 +1168,61 @@ def _get_abs_filesystem(self):
 "ABS account key is not found in the options."
 )
 
-return importlib.import_module("adlfs").AzureBlobFileSystem(
-account_name=abs_account_name,
-account_key=abs_account_key,
+return (
+TIME_WITHOUT_EXPIRATION,
+importlib.import_module("adlfs").AzureBlobFileSystem(
+account_name=abs_account_name,
+account_key=abs_account_key,
+),
 )
 
+def _get_most_suitable_s3_credential(self, credentials: List[Credential]):
+for credential in credentials:
+# Prefer to use the token credential, if it does not exist, use the
+# secret key credential.
+if isinstance(credential, S3TokenCredential):
+return credential
+
+for credential in credentials:
+if isinstance(credential, S3SecretKeyCredential):
+return credential
+return None
+
+def _get_most_suitable_oss_credential(self, credentials: List[Credential]):
+for credential in credentials:
+# Prefer to use the token credential, if it does not exist, use the
+# secret key credential.
+if isinstance(credential, OSSTokenCredential):
+return credential
+
+for credential in credentials:
+if isinstance(credential, OSSSecretKeyCredential):
+return credential
+return None
+
+def _get_most_suitable_gcs_credential(self, credentials: List[Credential]):
+for credential in credentials:
+# Prefer to use the token credential, if it does not exist, return None.
+if isinstance(credential, GCSTokenCredential):
+return credential
+return None
+
+def _get_most_suitable_abs_credential(self, credentials: List[Credential]):
+for credential in credentials:
+# Prefer to use the token credential, if it does not exist, use the
+# account key credential
+if isinstance(credential, ADLSTokenCredential):
+return credential
+
+for credential in credentials:
+if isinstance(credential, AzureAccountKeyCredential):
+return credential
+return None
+
+def _get_expire_time_by_ratio(self, expire_time: int):
+if expire_time <= 0:
+return TIME_WITHOUT_EXPIRATION
+return time.time() * 1000 + (expire_time - time.time() * 1000) * 0.9

Review Comment:
   If changed to 0.9, only 6 minutes are left to use the filesystem.
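   For reference, a minimal sketch of the arithmetic behind this comment. The 
function mirrors the formula in the PR's `_get_expire_time_by_ratio`; the 
concrete one-hour token lifetime is illustrative only:

```python
def expire_time_by_ratio(now_ms: float, token_expire_ms: float, ratio: float) -> float:
    """Cache-entry expiry: reuse the filesystem for `ratio` of the token's remaining lifetime."""
    return now_ms + (token_expire_ms - now_ms) * ratio


now_ms = 0.0
one_hour_ms = 60 * 60 * 1000.0

# With ratio 0.9 the cached filesystem is reused for 54 minutes of a
# 1-hour token, leaving only a 6-minute safety margin before expiry.
margin_ms = one_hour_ms - expire_time_by_ratio(now_ms, one_hour_ms, 0.9)
print(margin_ms / 60000)  # 6.0 (minutes)
```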






Re: [PR] [#5996] feat(python-client): Using credentail in python GVFS client. [gravitino]

2025-01-07 Thread via GitHub


FANNG1 commented on code in PR #5997:
URL: https://github.com/apache/gravitino/pull/5997#discussion_r1905443289


##
clients/client-python/gravitino/filesystem/gvfs.py:
##
@@ -918,11 +992,41 @@ def _get_gcs_filesystem(self):
 raise GravitinoRuntimeException(
 "Service account key is not found in the options."
 )
-return importlib.import_module("gcsfs").GCSFileSystem(
-token=service_account_key_path
+return (
+TIME_WITHOUT_EXPIRATION,
+importlib.import_module("gcsfs").GCSFileSystem(
+token=service_account_key_path
+),
 )
 
-def _get_s3_filesystem(self):
+def _get_s3_filesystem(self, fileset_catalog: Catalog, identifier: NameIdentifier):
+try:
+fileset: GenericFileset = fileset_catalog.as_fileset_catalog().load_fileset(
+NameIdentifier.of(identifier.namespace().level(2), identifier.name())
+)
+credentials = fileset.support_credentials().get_credentials()
+except (NoSuchCredentialException, CatalogNotInUseException) as e:
+logger.warning("Failed to get credentials from fileset: %s", e)

Review Comment:
   If you couldn't get credentials from the server, e.g. due to 
`CatalogNotInUseException`, what action do you expect?
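   One possible answer being discussed is to fall back to the client-side 
static configuration when server-vended credentials are unavailable. A hedged 
sketch of that fallback, where `load_server_credentials` is a hypothetical 
callable standing in for `fileset.support_credentials().get_credentials()` and 
`static_options` stands in for the client's GVFSConfig keys:

```python
import logging

logger = logging.getLogger(__name__)


def resolve_s3_credentials(load_server_credentials, static_options: dict):
    """Try server-vended credentials first; fall back to static client config.

    Any failure to fetch credentials (e.g. NoSuchCredentialException,
    CatalogNotInUseException in the real client) triggers the fallback.
    """
    try:
        credentials = load_server_credentials()
        if credentials:
            return ("dynamic", credentials)
    except Exception as e:  # the real client catches specific exceptions
        logger.warning("Falling back to static credentials: %s", e)
    return ("static", static_options)
```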






Re: [PR] [#5996] feat(python-client): Using credentail in python GVFS client. [gravitino]

2025-01-07 Thread via GitHub


yuqi1129 commented on code in PR #5997:
URL: https://github.com/apache/gravitino/pull/5997#discussion_r1905112047


##
clients/client-python/gravitino/filesystem/gvfs.py:
##
@@ -918,11 +992,41 @@ def _get_gcs_filesystem(self):
 raise GravitinoRuntimeException(
 "Service account key is not found in the options."
 )
-return importlib.import_module("gcsfs").GCSFileSystem(
-token=service_account_key_path
+return (
+TIME_WITHOUT_EXPIRATION,
+importlib.import_module("gcsfs").GCSFileSystem(
+token=service_account_key_path
+),
 )
 
-def _get_s3_filesystem(self):
+def _get_s3_filesystem(self, fileset_catalog: Catalog, identifier: NameIdentifier):
+try:
+fileset: GenericFileset = fileset_catalog.as_fileset_catalog().load_fileset(
+NameIdentifier.of(identifier.namespace().level(2), identifier.name())
+)
+credentials = fileset.support_credentials().get_credentials()
+except (NoSuchCredentialException, CatalogNotInUseException) as e:
+logger.warning("Failed to get credentials from fileset: %s", e)

Review Comment:
   Can you clearly describe the issue? If I'm not wrong, you mean I should only 
catch `NoSuchCredentialException` and should not catch 
`CatalogNotInUseException`?






Re: [PR] [#5996] feat(python-client): Using credentail in python GVFS client. [gravitino]

2025-01-07 Thread via GitHub


yuqi1129 commented on PR #5997:
URL: https://github.com/apache/gravitino/pull/5997#issuecomment-2574710106

   For S3, I have tested again. Both Java and Python clients can omit the 
endpoint, I'm hesitant to change this old behaviour, maybe another PR to 
optimize it will be proper.





Re: [PR] [#5996] feat(python-client): Using credentail in python GVFS client. [gravitino]

2025-01-07 Thread via GitHub


yuqi1129 commented on PR #5997:
URL: https://github.com/apache/gravitino/pull/5997#issuecomment-2574695190

   @FANNG1 
   
   (screenshot: https://github.com/user-attachments/assets/6fa9cd71-36d8-4810-8b43-ab9f190035bc)
   
   You can comment out L100 in `GravitinoVirtualFileSystemOSSIT` and see what 
happens.
   
   
   (screenshot: https://github.com/user-attachments/assets/49c7575e-aa50-42dc-8961-f3abc4225350)
   
   I have said that I just want to keep the configuration for OSS in Java and 
Python consistent. 
   
   





Re: [PR] [#5996] feat(python-client): Using credentail in python GVFS client. [gravitino]

2025-01-07 Thread via GitHub


FANNG1 commented on code in PR #5997:
URL: https://github.com/apache/gravitino/pull/5997#discussion_r1905080858


##
clients/client-python/gravitino/filesystem/gvfs.py:
##
@@ -918,11 +992,41 @@ def _get_gcs_filesystem(self):
 raise GravitinoRuntimeException(
 "Service account key is not found in the options."
 )
-return importlib.import_module("gcsfs").GCSFileSystem(
-token=service_account_key_path
+return (
+TIME_WITHOUT_EXPIRATION,
+importlib.import_module("gcsfs").GCSFileSystem(
+token=service_account_key_path
+),
 )
 
-def _get_s3_filesystem(self):
+def _get_s3_filesystem(self, fileset_catalog: Catalog, identifier: NameIdentifier):
+try:
+fileset: GenericFileset = fileset_catalog.as_fileset_catalog().load_fileset(
+NameIdentifier.of(identifier.namespace().level(2), identifier.name())
+)
+credentials = fileset.support_credentials().get_credentials()
+except (NoSuchCredentialException, CatalogNotInUseException) as e:
+logger.warning("Failed to get credentials from fileset: %s", e)

Review Comment:
   The exception you catch is `(NoSuchCredentialException, 
CatalogNotInUseException)`






Re: [PR] [#5996] feat(python-client): Using credentail in python GVFS client. [gravitino]

2025-01-07 Thread via GitHub


yuqi1129 commented on code in PR #5997:
URL: https://github.com/apache/gravitino/pull/5997#discussion_r1905046331


##
clients/client-python/tests/integration/test_gvfs_with_s3_credential.py:
##
@@ -0,0 +1,151 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+import logging
+import os
+from random import randint
+import unittest
+
+from s3fs import S3FileSystem
+
+from gravitino import (
+gvfs,
+GravitinoClient,
+Catalog,
+Fileset,
+)
+from gravitino.filesystem.gvfs_config import GVFSConfig
+from tests.integration.test_gvfs_with_s3 import TestGvfsWithS3
+
+logger = logging.getLogger(__name__)
+
+
+def s3_with_credential_is_configured():
+return all(
+[
+os.environ.get("S3_STS_ACCESS_KEY_ID") is not None,
+os.environ.get("S3_STS_SECRET_ACCESS_KEY") is not None,
+os.environ.get("S3_STS_ENDPOINT") is not None,
+os.environ.get("S3_STS_BUCKET_NAME") is not None,
+os.environ.get("S3_STS_REGION") is not None,
+os.environ.get("S3_STS_ROLE_ARN") is not None,
+]
+)
+
+
[email protected](s3_with_credential_is_configured(), "S3 is not configured.")
+class TestGvfsWithS3Credential(TestGvfsWithS3):
+# Before running this test, please make sure aws-bundle-x.jar has been
+# copied to the $GRAVITINO_HOME/catalogs/hadoop/libs/ directory
+s3_access_key = os.environ.get("S3_STS_ACCESS_KEY_ID")
+s3_secret_key = os.environ.get("S3_STS_SECRET_ACCESS_KEY")
+s3_endpoint = os.environ.get("S3_STS_ENDPOINT")
+bucket_name = os.environ.get("S3_STS_BUCKET_NAME")
+s3_sts_region = os.environ.get("S3_STS_REGION")
+s3_role_arn = os.environ.get("S3_STS_ROLE_ARN")
+
metalake_name: str = "TestGvfsWithS3Credential_metalake" + str(randint(1, 1))
+
+def setUp(self):
+self.options = {
+f"{GVFSConfig.GVFS_FILESYSTEM_S3_ACCESS_KEY}": self.s3_access_key,
+f"{GVFSConfig.GVFS_FILESYSTEM_S3_SECRET_KEY}": self.s3_secret_key,
+f"{GVFSConfig.GVFS_FILESYSTEM_S3_ENDPOINT}": self.s3_endpoint,
+}
+
+@classmethod
+def _init_test_entities(cls):
+cls.gravitino_admin_client.create_metalake(
+name=cls.metalake_name, comment="", properties={}
+)
+cls.gravitino_client = GravitinoClient(
+uri="http://localhost:8090";, metalake_name=cls.metalake_name
+)
+
+cls.config = {}
+cls.conf = {}
+catalog = cls.gravitino_client.create_catalog(
+name=cls.catalog_name,
+catalog_type=Catalog.Type.FILESET,
+provider=cls.catalog_provider,
+comment="",
+properties={
+"filesystem-providers": "s3",
+"s3-access-key-id": cls.s3_access_key,
+"s3-secret-access-key": cls.s3_secret_key,
+"s3-endpoint": cls.s3_endpoint,
+"s3-region": cls.s3_sts_region,
+"s3-role-arn": cls.s3_role_arn,
+"credential-providers": "s3-token",
+},
+)
+catalog.as_schemas().create_schema(
+schema_name=cls.schema_name, comment="", properties={}
+)
+
+cls.fileset_storage_location: str = (
+
f"s3a://{cls.bucket_name}/{cls.catalog_name}/{cls.schema_name}/{cls.fileset_name}"
+)
+cls.fileset_gvfs_location = (
+
f"gvfs://fileset/{cls.catalog_name}/{cls.schema_name}/{cls.fileset_name}"
+)
+catalog.as_fileset_catalog().create_fileset(
+ident=cls.fileset_ident,
+fileset_type=Fileset.Type.MANAGED,
+comment=cls.fileset_comment,
+storage_location=cls.fileset_storage_location,
+properties=cls.fileset_properties,
+)
+
+cls.fs = S3FileSystem(
+key=cls.s3_access_key,
+secret=cls.s3_secret_key,
+endpoint_url=cls.s3_endpoint,
+)
+
+# The following tests are copied from tests/integration/test_gvfs_with_s3.py, with some
+# modifications, as `mkdir` and `makedirs` have different behavior in S3; other cloud
+# storage like GCS, ABS, and OSS are similar.
+def 

Re: [PR] [#5996] feat(python-client): Using credentail in python GVFS client. [gravitino]

2025-01-07 Thread via GitHub


yuqi1129 commented on code in PR #5997:
URL: https://github.com/apache/gravitino/pull/5997#discussion_r1905061484


##
clients/client-python/gravitino/filesystem/gvfs.py:
##
@@ -1001,10 +1168,61 @@ def _get_abs_filesystem(self):
 "ABS account key is not found in the options."
 )
 
-return importlib.import_module("adlfs").AzureBlobFileSystem(
-account_name=abs_account_name,
-account_key=abs_account_key,
+return (
+TIME_WITHOUT_EXPIRATION,
+importlib.import_module("adlfs").AzureBlobFileSystem(
+account_name=abs_account_name,
+account_key=abs_account_key,
+),
 )
 
+def _get_most_suitable_s3_credential(self, credentials: List[Credential]):
+for credential in credentials:
+# Prefer to use the token credential, if it does not exist, use the
+# secret key credential.
+if isinstance(credential, S3TokenCredential):
+return credential
+
+for credential in credentials:
+if isinstance(credential, S3SecretKeyCredential):
+return credential
+return None
+
+def _get_most_suitable_oss_credential(self, credentials: List[Credential]):
+for credential in credentials:
+# Prefer to use the token credential, if it does not exist, use the
+# secret key credential.
+if isinstance(credential, OSSTokenCredential):
+return credential
+
+for credential in credentials:
+if isinstance(credential, OSSSecretKeyCredential):
+return credential
+return None
+
+def _get_most_suitable_gcs_credential(self, credentials: List[Credential]):
+for credential in credentials:
+# Prefer to use the token credential, if it does not exist, return None.
+if isinstance(credential, GCSTokenCredential):
+return credential
+return None
+
+def _get_most_suitable_abs_credential(self, credentials: List[Credential]):
+for credential in credentials:
+# Prefer to use the token credential, if it does not exist, use the
+# account key credential
+if isinstance(credential, ADLSTokenCredential):
+return credential
+
+for credential in credentials:
+if isinstance(credential, AzureAccountKeyCredential):
+return credential
+return None
+
+def _get_expire_time_by_ratio(self, expire_time: int):
+if expire_time <= 0:
+return TIME_WITHOUT_EXPIRATION
+return time.time() * 1000 + (expire_time - time.time() * 1000) * 0.9

Review Comment:
   What's the influence if we adjust the value from 0.5 to 0.9?






Re: [PR] [#5996] feat(python-client): Using credentail in python GVFS client. [gravitino]

2025-01-07 Thread via GitHub


yuqi1129 commented on code in PR #5997:
URL: https://github.com/apache/gravitino/pull/5997#discussion_r1905059319


##
clients/client-python/gravitino/filesystem/gvfs.py:
##
@@ -1001,10 +1168,61 @@ def _get_abs_filesystem(self):
 "ABS account key is not found in the options."
 )
 
-return importlib.import_module("adlfs").AzureBlobFileSystem(
-account_name=abs_account_name,
-account_key=abs_account_key,
+return (
+TIME_WITHOUT_EXPIRATION,
+importlib.import_module("adlfs").AzureBlobFileSystem(
+account_name=abs_account_name,
+account_key=abs_account_key,
+),
 )
 
+def _get_most_suitable_s3_credential(self, credentials: List[Credential]):
+for credential in credentials:
+# Prefer to use the token credential, if it does not exist, use the
+# secret key credential.
+if isinstance(credential, S3TokenCredential):
+return credential
+
+for credential in credentials:
+if isinstance(credential, S3SecretKeyCredential):
+return credential
+return None
+
+def _get_most_suitable_oss_credential(self, credentials: List[Credential]):
+for credential in credentials:
+# Prefer to use the token credential, if it does not exist, use the
+# secret key credential.
+if isinstance(credential, OSSTokenCredential):
+return credential
+
+for credential in credentials:
+if isinstance(credential, OSSSecretKeyCredential):
+return credential
+return None
+
+def _get_most_suitable_gcs_credential(self, credentials: List[Credential]):
+for credential in credentials:
+# Prefer to use the token credential, if it does not exist, return None.
+if isinstance(credential, GCSTokenCredential):
+return credential
+return None
+
+def _get_most_suitable_abs_credential(self, credentials: List[Credential]):
+for credential in credentials:
+# Prefer to use the token credential, if it does not exist, use the
+# account key credential
+if isinstance(credential, ADLSTokenCredential):
+return credential
+
+for credential in credentials:
+if isinstance(credential, AzureAccountKeyCredential):
+return credential
+return None
+
+def _get_expire_time_by_ratio(self, expire_time: int):
+if expire_time <= 0:
+return TIME_WITHOUT_EXPIRATION
+return time.time() * 1000 + (expire_time - time.time() * 1000) * 0.9

Review Comment:
   Another point: I think you made similar comments in the Java implementation 
about changing from 0.5 to 0.9. What are your considerations regarding this point?






Re: [PR] [#5996] feat(python-client): Using credentail in python GVFS client. [gravitino]

2025-01-07 Thread via GitHub


yuqi1129 commented on code in PR #5997:
URL: https://github.com/apache/gravitino/pull/5997#discussion_r1905048644


##
clients/client-python/gravitino/filesystem/gvfs.py:
##
@@ -918,11 +992,41 @@ def _get_gcs_filesystem(self):
 raise GravitinoRuntimeException(
 "Service account key is not found in the options."
 )
-return importlib.import_module("gcsfs").GCSFileSystem(
-token=service_account_key_path
+return (
+TIME_WITHOUT_EXPIRATION,
+importlib.import_module("gcsfs").GCSFileSystem(
+token=service_account_key_path
+),
 )
 
-def _get_s3_filesystem(self):
+def _get_s3_filesystem(self, fileset_catalog: Catalog, identifier: NameIdentifier):
+try:
+fileset: GenericFileset = fileset_catalog.as_fileset_catalog().load_fileset(
+NameIdentifier.of(identifier.namespace().level(2), identifier.name())
+)
+credentials = fileset.support_credentials().get_credentials()
+except (NoSuchCredentialException, CatalogNotInUseException) as e:
+logger.warning("Failed to get credentials from fileset: %s", e)

Review Comment:
   Directly throw an exception? What if the fileset has no credentials set for 
an old version?






Re: [PR] [#5996] feat(python-client): Using credentail in python GVFS client. [gravitino]

2025-01-07 Thread via GitHub


yuqi1129 commented on code in PR #5997:
URL: https://github.com/apache/gravitino/pull/5997#discussion_r1905047524


##
clients/client-python/gravitino/filesystem/gvfs.py:
##
@@ -1001,10 +1168,61 @@ def _get_abs_filesystem(self):
 "ABS account key is not found in the options."
 )
 
-return importlib.import_module("adlfs").AzureBlobFileSystem(
-account_name=abs_account_name,
-account_key=abs_account_key,
+return (
+TIME_WITHOUT_EXPIRATION,
+importlib.import_module("adlfs").AzureBlobFileSystem(
+account_name=abs_account_name,
+account_key=abs_account_key,
+),
 )
 
+def _get_most_suitable_s3_credential(self, credentials: List[Credential]):
+for credential in credentials:
+# Prefer to use the token credential, if it does not exist, use the
+# secret key credential.
+if isinstance(credential, S3TokenCredential):
+return credential
+
+for credential in credentials:
+if isinstance(credential, S3SecretKeyCredential):
+return credential
+return None
+
+def _get_most_suitable_oss_credential(self, credentials: List[Credential]):
+for credential in credentials:
+# Prefer to use the token credential, if it does not exist, use the
+# secret key credential.
+if isinstance(credential, OSSTokenCredential):
+return credential
+
+for credential in credentials:
+if isinstance(credential, OSSSecretKeyCredential):
+return credential
+return None
+
+def _get_most_suitable_gcs_credential(self, credentials: List[Credential]):
+for credential in credentials:
+# Prefer to use the token credential, if it does not exist, return None.
+if isinstance(credential, GCSTokenCredential):
+return credential
+return None
+
+def _get_most_suitable_abs_credential(self, credentials: List[Credential]):
+for credential in credentials:
+# Prefer to use the token credential, if it does not exist, use the
+# account key credential
+if isinstance(credential, ADLSTokenCredential):
+return credential
+
+for credential in credentials:
+if isinstance(credential, AzureAccountKeyCredential):
+return credential
+return None
+
+def _get_expire_time_by_ratio(self, expire_time: int):
+if expire_time <= 0:
+return TIME_WITHOUT_EXPIRATION
+return time.time() * 1000 + (expire_time - time.time() * 1000) * 0.9

Review Comment:
   Do you mean the credential should be refreshed within 30 minutes, even if 
the expiration time is an hour? 






Re: [PR] [#5996] feat(python-client): Using credentail in python GVFS client. [gravitino]

2025-01-07 Thread via GitHub


FANNG1 commented on code in PR #5997:
URL: https://github.com/apache/gravitino/pull/5997#discussion_r1905045765


##
clients/client-python/tests/integration/test_gvfs_with_gcs_credential.py:
##
@@ -0,0 +1,112 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+import logging
+import os
+from random import randint
+import unittest
+
+from gcsfs import GCSFileSystem
+
+from gravitino import Catalog, Fileset, GravitinoClient
+from gravitino.filesystem import gvfs
+from tests.integration.test_gvfs_with_gcs import TestGvfsWithGCS
+
+logger = logging.getLogger(__name__)
+
+
+def gcs_with_credential_is_configured():
+return all(
+[
+os.environ.get("GCS_STS_SERVICE_ACCOUNT_JSON_PATH") is not None,
+os.environ.get("GCS_STS_BUCKET_NAME") is not None,
+]
+)
+
+
[email protected](gcs_with_credential_is_configured(), "GCS is not configured.")
+class TestGvfsWithGCSCredential(TestGvfsWithGCS):
+# Before running this test, please make sure gcp-bundle-x.jar has been
+# copied to the $GRAVITINO_HOME/catalogs/hadoop/libs/ directory
+key_file = os.environ.get("GCS_STS_SERVICE_ACCOUNT_JSON_PATH")
+bucket_name = os.environ.get("GCS_STS_BUCKET_NAME")

Review Comment:
   Remove `STS` from the environment variables?






Re: [PR] [#5996] feat(python-client): Using credentail in python GVFS client. [gravitino]

2025-01-07 Thread via GitHub


yuqi1129 commented on code in PR #5997:
URL: https://github.com/apache/gravitino/pull/5997#discussion_r1905044983


##
clients/client-python/gravitino/filesystem/gvfs.py:
##
@@ -866,50 +896,94 @@ def _get_fileset_catalog(self, catalog_ident: NameIdentifier):
 finally:
 write_lock.release()
 
-def _get_filesystem(self, actual_file_location: str):
+def _file_system_is_not_expired(self, expire_time: int):

Review Comment:
   it's okay for me. 






Re: [PR] [#5996] feat(python-client): Using credentail in python GVFS client. [gravitino]

2025-01-07 Thread via GitHub


FANNG1 commented on code in PR #5997:
URL: https://github.com/apache/gravitino/pull/5997#discussion_r1905030532


##
clients/client-python/gravitino/filesystem/gvfs.py:
##
@@ -918,11 +992,41 @@ def _get_gcs_filesystem(self):
 raise GravitinoRuntimeException(
 "Service account key is not found in the options."
 )
-return importlib.import_module("gcsfs").GCSFileSystem(
-token=service_account_key_path
+return (
+TIME_WITHOUT_EXPIRATION,
+importlib.import_module("gcsfs").GCSFileSystem(
+token=service_account_key_path
+),
 )
 
-def _get_s3_filesystem(self):
+def _get_s3_filesystem(self, fileset_catalog: Catalog, identifier: NameIdentifier):
+try:
+fileset: GenericFileset = fileset_catalog.as_fileset_catalog().load_fileset(
+NameIdentifier.of(identifier.namespace().level(2), identifier.name())
+)
+credentials = fileset.support_credentials().get_credentials()
+except (NoSuchCredentialException, CatalogNotInUseException) as e:
+logger.warning("Failed to get credentials from fileset: %s", e)

Review Comment:
   How about throwing an exception here?



##
clients/client-python/gravitino/filesystem/gvfs.py:
##
@@ -1001,10 +1168,61 @@ def _get_abs_filesystem(self):
 "ABS account key is not found in the options."
 )
 
-return importlib.import_module("adlfs").AzureBlobFileSystem(
-account_name=abs_account_name,
-account_key=abs_account_key,
+return (
+TIME_WITHOUT_EXPIRATION,
+importlib.import_module("adlfs").AzureBlobFileSystem(
+account_name=abs_account_name,
+account_key=abs_account_key,
+),
 )
 
+def _get_most_suitable_s3_credential(self, credentials: List[Credential]):
+for credential in credentials:
+# Prefer to use the token credential, if it does not exist, use the
+# secret key credential.
+if isinstance(credential, S3TokenCredential):
+return credential
+
+for credential in credentials:
+if isinstance(credential, S3SecretKeyCredential):
+return credential
+return None
+
+def _get_most_suitable_oss_credential(self, credentials: List[Credential]):
+for credential in credentials:
+# Prefer to use the token credential, if it does not exist, use the
+# secret key credential.
+if isinstance(credential, OSSTokenCredential):
+return credential
+
+for credential in credentials:
+if isinstance(credential, OSSSecretKeyCredential):
+return credential
+return None
+
+def _get_most_suitable_gcs_credential(self, credentials: List[Credential]):
+for credential in credentials:
+# Prefer to use the token credential, if it does not exist, return None.
+if isinstance(credential, GCSTokenCredential):
+return credential
+return None
+
+def _get_most_suitable_abs_credential(self, credentials: List[Credential]):
+for credential in credentials:
+# Prefer to use the token credential, if it does not exist, use the
+# account key credential
+if isinstance(credential, ADLSTokenCredential):
+return credential
+
+for credential in credentials:
+if isinstance(credential, AzureAccountKeyCredential):
+return credential
+return None
+
+def _get_expire_time_by_ratio(self, expire_time: int):
+if expire_time <= 0:
+return TIME_WITHOUT_EXPIRATION
+return time.time() * 1000 + (expire_time - time.time() * 1000) * 0.9

Review Comment:
   `0.9` seems too high; since the filesystem is used outside the control of 
GVFS, how about making it configurable with a default of `0.5`?
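   A minimal sketch of what "configurable with default `0.5`" could look like. 
The option key `credential_expired_time_ratio` and the clamping bounds are 
hypothetical, not part of the PR:

```python
DEFAULT_EXPIRE_RATIO = 0.5


def get_expire_ratio(options: dict) -> float:
    """Read the expiry ratio from client options, clamped to a sane range.

    Clamping guards against values like 0 or 1.5 that would make the cached
    filesystem either never reused or reused past token expiry.
    """
    raw = options.get("credential_expired_time_ratio", DEFAULT_EXPIRE_RATIO)
    ratio = float(raw)
    return min(max(ratio, 0.1), 0.95)
```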



##
clients/client-python/gravitino/filesystem/gvfs.py:
##
@@ -866,50 +896,94 @@ def _get_fileset_catalog(self, catalog_ident: NameIdentifier):
 finally:
 write_lock.release()
 
-def _get_filesystem(self, actual_file_location: str):
+def _file_system_is_not_expired(self, expire_time: int):

Review Comment:
   is `_file_system_expired` more clear?



##
clients/client-python/gravitino/filesystem/gvfs.py:
##
@@ -946,13 +1048,54 @@ def _get_s3_filesystem(self):
 "AWS endpoint url is not found in the options."
 )
 
-return importlib.import_module("s3fs").S3FileSystem(
-key=aws_access_key_id,
-secret=aws_secret_access_key,
-endpoint_url=aws_endpoint_url,
+return (
+sys.maxsize,
+importlib.import_module("s3fs").S3FileSystem(
+

Re: [PR] [#5996] feat(python-client): Using credentail in python GVFS client. [gravitino]

2025-01-06 Thread via GitHub


yuqi1129 commented on code in PR #5997:
URL: https://github.com/apache/gravitino/pull/5997#discussion_r1904843633


##
clients/client-python/gravitino/filesystem/gvfs.py:
##
@@ -866,50 +894,93 @@ def _get_fileset_catalog(self, catalog_ident: NameIdentifier):
 finally:
 write_lock.release()
 
-def _get_filesystem(self, actual_file_location: str):
+# Disable Too many branches (13/12) (too-many-branches)
+# pylint: disable=R0912
+def _get_filesystem(
+self,
+actual_file_location: str,
+fileset_catalog: Catalog,
+name_identifier: NameIdentifier,
+):
 storage_type = self._recognize_storage_type(actual_file_location)
 read_lock = self._cache_lock.gen_rlock()
 try:
 read_lock.acquire()
-cache_value: Tuple[StorageType, AbstractFileSystem] = self._cache.get(
-storage_type
+cache_value: Tuple[int, AbstractFileSystem] = self._cache.get(
+name_identifier
 )
 if cache_value is not None:
-return cache_value
+# if the cache value is not expired, return the cache value
+if cache_value[0] > time.time() * 1000:
+return cache_value[1]
 finally:
 read_lock.release()
 
 write_lock = self._cache_lock.gen_wlock()
 try:
 write_lock.acquire()
-cache_value: Tuple[StorageType, AbstractFileSystem] = 
self._cache.get(
-storage_type
+cache_value: Tuple[int, AbstractFileSystem] = self._cache.get(
+name_identifier
 )
+
 if cache_value is not None:
-return cache_value
+# if the cache value is not expired, return the cache value
+if cache_value[0] > time.time() * 1000:

Review Comment:
   OK, I have changed it to a constant value. I just followed what was done in `S3SecretKeyCredential`, `OSSSecretKeyCredential`, and `AzureAccountKeyCredential`; they all use `0` directly.
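The `0`-for-static-credentials convention and the suggested named constant can be sketched as below; the constant and function names are illustrative, not necessarily the ones the PR settled on.

```python
import sys
import time

# Hypothetical names: a static-key credential reports 0 as its expire time,
# which the GVFS cache treats as "never expires".
CREDENTIAL_WITHOUT_EXPIRATION = 0
CACHE_ENTRY_WITHOUT_EXPIRATION = sys.maxsize


def cache_expire_time(credential_expire_time_in_ms: int) -> int:
    """Map a credential's expire time to the cache entry's expire time."""
    if credential_expire_time_in_ms == CREDENTIAL_WITHOUT_EXPIRATION:
        return CACHE_ENTRY_WITHOUT_EXPIRATION
    return credential_expire_time_in_ms


def is_expired(cache_expire_time_in_ms: int) -> bool:
    """A cache entry is expired once its timestamp is in the past."""
    return cache_expire_time_in_ms <= time.time() * 1000
```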



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]



Re: [PR] [#5996] feat(python-client): Using credentail in python GVFS client. [gravitino]

2025-01-06 Thread via GitHub


yuqi1129 commented on code in PR #5997:
URL: https://github.com/apache/gravitino/pull/5997#discussion_r1904838263


##
clients/client-python/tests/integration/test_gvfs_with_gcs_credential.py:
##
@@ -0,0 +1,112 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+import logging
+import os
+from random import randint
+import unittest
+
+from gcsfs import GCSFileSystem
+
+from gravitino import Catalog, Fileset, GravitinoClient
+from gravitino.filesystem import gvfs
+from tests.integration.test_gvfs_with_gcs import TestGvfsWithGCS
+
+logger = logging.getLogger(__name__)
+
+
+def gcs_with_credential_is_configured():
+return all(
+[
+os.environ.get("GCS_STS_SERVICE_ACCOUNT_JSON_PATH") is not None,
+os.environ.get("GCS_STS_BUCKET_NAME") is not None,
+]
+)
+
+
+@unittest.skipUnless(gcs_with_credential_is_configured(), "GCS is not configured.")
+class TestGvfsWithGCSCredential(TestGvfsWithGCS):
+# Before running this test, please make sure gcp-bundle-x.jar has been
+# copied to the $GRAVITINO_HOME/catalogs/hadoop/libs/ directory
+key_file = os.environ.get("GCS_STS_SERVICE_ACCOUNT_JSON_PATH")
+bucket_name = os.environ.get("GCS_STS_BUCKET_NAME")
+metalake_name: str = "TestGvfsWithGCSCredential_metalake" + str(randint(1, 
1))
+
+@classmethod
+def _init_test_entities(cls):
+cls.gravitino_admin_client.create_metalake(
+name=cls.metalake_name, comment="", properties={}
+)
+cls.gravitino_client = GravitinoClient(
+uri="http://localhost:8090", metalake_name=cls.metalake_name
+)
+
+cls.config = {}
+cls.conf = {}
+catalog = cls.gravitino_client.create_catalog(
+name=cls.catalog_name,
+catalog_type=Catalog.Type.FILESET,
+provider=cls.catalog_provider,
+comment="",
+properties={
+"filesystem-providers": "gcs",
+"gcs-credential-file-path": cls.key_file,
+"gcs-service-account-file": cls.key_file,
+"credential-providers": "gcs-token",
+},
+)
+catalog.as_schemas().create_schema(
+schema_name=cls.schema_name, comment="", properties={}
+)
+
+cls.fileset_storage_location: str = (
+
f"gs://{cls.bucket_name}/{cls.catalog_name}/{cls.schema_name}/{cls.fileset_name}"
+)
+cls.fileset_gvfs_location = (
+
f"gvfs://fileset/{cls.catalog_name}/{cls.schema_name}/{cls.fileset_name}"
+)
+catalog.as_fileset_catalog().create_fileset(
+ident=cls.fileset_ident,
+fileset_type=Fileset.Type.MANAGED,
+comment=cls.fileset_comment,
+storage_location=cls.fileset_storage_location,
+properties=cls.fileset_properties,
+)
+
+cls.fs = GCSFileSystem(token=cls.key_file)
+
+def test_mkdir(self):

Review Comment:
   I can add some description about the detailed reason. Not only `mkdir`; other operations like `makedirs` and `ls` also have similar issues.






Re: [PR] [#5996] feat(python-client): Using credentail in python GVFS client. [gravitino]

2025-01-06 Thread via GitHub


yuqi1129 commented on code in PR #5997:
URL: https://github.com/apache/gravitino/pull/5997#discussion_r1904838263


##
clients/client-python/tests/integration/test_gvfs_with_gcs_credential.py:
##
@@ -0,0 +1,112 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+import logging
+import os
+from random import randint
+import unittest
+
+from gcsfs import GCSFileSystem
+
+from gravitino import Catalog, Fileset, GravitinoClient
+from gravitino.filesystem import gvfs
+from tests.integration.test_gvfs_with_gcs import TestGvfsWithGCS
+
+logger = logging.getLogger(__name__)
+
+
+def gcs_with_credential_is_configured():
+return all(
+[
+os.environ.get("GCS_STS_SERVICE_ACCOUNT_JSON_PATH") is not None,
+os.environ.get("GCS_STS_BUCKET_NAME") is not None,
+]
+)
+
+
+@unittest.skipUnless(gcs_with_credential_is_configured(), "GCS is not configured.")
+class TestGvfsWithGCSCredential(TestGvfsWithGCS):
+# Before running this test, please make sure gcp-bundle-x.jar has been
+# copied to the $GRAVITINO_HOME/catalogs/hadoop/libs/ directory
+key_file = os.environ.get("GCS_STS_SERVICE_ACCOUNT_JSON_PATH")
+bucket_name = os.environ.get("GCS_STS_BUCKET_NAME")
+metalake_name: str = "TestGvfsWithGCSCredential_metalake" + str(randint(1, 
1))
+
+@classmethod
+def _init_test_entities(cls):
+cls.gravitino_admin_client.create_metalake(
+name=cls.metalake_name, comment="", properties={}
+)
+cls.gravitino_client = GravitinoClient(
+uri="http://localhost:8090", metalake_name=cls.metalake_name
+)
+
+cls.config = {}
+cls.conf = {}
+catalog = cls.gravitino_client.create_catalog(
+name=cls.catalog_name,
+catalog_type=Catalog.Type.FILESET,
+provider=cls.catalog_provider,
+comment="",
+properties={
+"filesystem-providers": "gcs",
+"gcs-credential-file-path": cls.key_file,
+"gcs-service-account-file": cls.key_file,
+"credential-providers": "gcs-token",
+},
+)
+catalog.as_schemas().create_schema(
+schema_name=cls.schema_name, comment="", properties={}
+)
+
+cls.fileset_storage_location: str = (
+
f"gs://{cls.bucket_name}/{cls.catalog_name}/{cls.schema_name}/{cls.fileset_name}"
+)
+cls.fileset_gvfs_location = (
+
f"gvfs://fileset/{cls.catalog_name}/{cls.schema_name}/{cls.fileset_name}"
+)
+catalog.as_fileset_catalog().create_fileset(
+ident=cls.fileset_ident,
+fileset_type=Fileset.Type.MANAGED,
+comment=cls.fileset_comment,
+storage_location=cls.fileset_storage_location,
+properties=cls.fileset_properties,
+)
+
+cls.fs = GCSFileSystem(token=cls.key_file)
+
+def test_mkdir(self):

Review Comment:
   I can add it. Not only `mkdir`; other operations like `makedirs` and `ls` also have similar issues.






Re: [PR] [#5996] feat(python-client): Using credentail in python GVFS client. [gravitino]

2025-01-06 Thread via GitHub


yuqi1129 commented on code in PR #5997:
URL: https://github.com/apache/gravitino/pull/5997#discussion_r1904837268


##
clients/client-python/gravitino/filesystem/gvfs.py:
##
@@ -946,13 +1048,54 @@ def _get_s3_filesystem(self):
 "AWS endpoint url is not found in the options."
 )
 
-return importlib.import_module("s3fs").S3FileSystem(
-key=aws_access_key_id,
-secret=aws_secret_access_key,
-endpoint_url=aws_endpoint_url,
+return (
+sys.maxsize,
+importlib.import_module("s3fs").S3FileSystem(
+key=aws_access_key_id,
+secret=aws_secret_access_key,
+endpoint_url=aws_endpoint_url,

Review Comment:
   Using S3FileSystem alone does not need the endpoint parameter, but when applying GVFS to pandas (or other third-party applications), it's required; this has been tested by Jerry.
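A minimal sketch of the point above: third-party readers such as pandas hand `storage_options` to s3fs verbatim, so the endpoint of an S3-compatible store must be present in those options even though s3fs itself treats it as optional. The helper name and example values below are hypothetical.

```python
def build_s3_storage_options(key: str, secret: str, endpoint_url: str = None) -> dict:
    # s3fs defaults to AWS when endpoint_url is absent, but readers that
    # forward these options (e.g. pandas.read_csv(..., storage_options=opts))
    # still need it set for non-AWS, S3-compatible stores.
    options = {"key": key, "secret": secret}
    if endpoint_url is not None:
        options["endpoint_url"] = endpoint_url
    return options


# Hypothetical values for a local S3-compatible endpoint.
options = build_s3_storage_options("my-key", "my-secret", "http://localhost:9000")
```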
   
   






Re: [PR] [#5996] feat(python-client): Using credentail in python GVFS client. [gravitino]

2025-01-06 Thread via GitHub


yuqi1129 commented on code in PR #5997:
URL: https://github.com/apache/gravitino/pull/5997#discussion_r1904832615


##
clients/client-python/gravitino/filesystem/gvfs.py:
##
@@ -946,13 +1048,54 @@ def _get_s3_filesystem(self):
 "AWS endpoint url is not found in the options."
 )
 
-return importlib.import_module("s3fs").S3FileSystem(
-key=aws_access_key_id,
-secret=aws_secret_access_key,
-endpoint_url=aws_endpoint_url,
+return (
+sys.maxsize,
+importlib.import_module("s3fs").S3FileSystem(
+key=aws_access_key_id,
+secret=aws_secret_access_key,
+endpoint_url=aws_endpoint_url,
+),
 )
 
-def _get_oss_filesystem(self):
+def _get_oss_filesystem(self, fileset_catalog: Catalog, identifier: 
NameIdentifier):
+# Can get credential from the fileset
+try:
+fileset: GenericFileset = 
fileset_catalog.as_fileset_catalog().load_fileset(
+NameIdentifier.of(identifier.namespace().level(2), 
identifier.name())
+)
+credentials = fileset.support_credentials().get_credentials()
+except (NoSuchCredentialException, CatalogNotInUseException) as e:
+logger.warning("Failed to get credentials from fileset: %s", e)
+credentials = []
+
+credential = self._get_most_suitable_oss_credential(credentials)
+if credential is not None:
+oss_endpoint = fileset_catalog.properties()["oss-endpoint"]
+expire_time = 
self._get_expire_time_by_ratio(credential.expire_time_in_ms())
+if isinstance(credential, OSSTokenCredential):
+fs = importlib.import_module("ossfs").OSSFileSystem(
+key=credential.access_key_id(),
+secret=credential.secret_access_key(),
+token=credential.security_token(),
+endpoint=oss_endpoint,
+)
+return (expire_time, fs)
+if isinstance(credential, OSSSecretKeyCredential):
+return (
+expire_time,
+importlib.import_module("ossfs").OSSFileSystem(
+key=credential.access_key_id(),
+secret=credential.secret_access_key(),
+endpoint=oss_endpoint,
+),
+)
+
+oss_endpoint_url = 
self._options.get(GVFSConfig.GVFS_FILESYSTEM_OSS_ENDPOINT)

Review Comment:
   The former one is from the catalog (fetched from the Gravitino server) and the other is set by the user on the client.
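That resolution order can be sketched as follows; the key names and the exact fallback precedence are assumptions for illustration, standing in for the catalog property `oss-endpoint` and the client-side GVFS option.

```python
def resolve_oss_endpoint(catalog_properties: dict, client_options: dict) -> str:
    # Prefer the endpoint the server-side catalog advertises; fall back to
    # the client-side GVFS option; fail loudly when neither is set.
    endpoint = catalog_properties.get("oss-endpoint")
    if endpoint is None:
        endpoint = client_options.get("oss_endpoint")  # hypothetical option key
    if endpoint is None:
        raise ValueError("OSS endpoint is not found in catalog properties or client options.")
    return endpoint
```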






Re: [PR] [#5996] feat(python-client): Using credentail in python GVFS client. [gravitino]

2025-01-06 Thread via GitHub


FANNG1 commented on code in PR #5997:
URL: https://github.com/apache/gravitino/pull/5997#discussion_r1904830374


##
clients/client-python/gravitino/filesystem/gvfs.py:
##
@@ -946,13 +1048,54 @@ def _get_s3_filesystem(self):
 "AWS endpoint url is not found in the options."
 )
 
-return importlib.import_module("s3fs").S3FileSystem(
-key=aws_access_key_id,
-secret=aws_secret_access_key,
-endpoint_url=aws_endpoint_url,
+return (
+sys.maxsize,
+importlib.import_module("s3fs").S3FileSystem(
+key=aws_access_key_id,
+secret=aws_secret_access_key,
+endpoint_url=aws_endpoint_url,

Review Comment:
   Yes, Aliyun OSS requires it, but this is AWS.






Re: [PR] [#5996] feat(python-client): Using credentail in python GVFS client. [gravitino]

2025-01-06 Thread via GitHub


FANNG1 commented on code in PR #5997:
URL: https://github.com/apache/gravitino/pull/5997#discussion_r1904829325


##
clients/client-python/gravitino/filesystem/gvfs.py:
##
@@ -946,13 +1048,54 @@ def _get_s3_filesystem(self):
 "AWS endpoint url is not found in the options."
 )
 
-return importlib.import_module("s3fs").S3FileSystem(
-key=aws_access_key_id,
-secret=aws_secret_access_key,
-endpoint_url=aws_endpoint_url,
+return (
+sys.maxsize,
+importlib.import_module("s3fs").S3FileSystem(
+key=aws_access_key_id,
+secret=aws_secret_access_key,
+endpoint_url=aws_endpoint_url,
+),
 )
 
-def _get_oss_filesystem(self):
+def _get_oss_filesystem(self, fileset_catalog: Catalog, identifier: 
NameIdentifier):
+# Can get credential from the fileset
+try:
+fileset: GenericFileset = 
fileset_catalog.as_fileset_catalog().load_fileset(
+NameIdentifier.of(identifier.namespace().level(2), 
identifier.name())
+)
+credentials = fileset.support_credentials().get_credentials()
+except (NoSuchCredentialException, CatalogNotInUseException) as e:
+logger.warning("Failed to get credentials from fileset: %s", e)
+credentials = []
+
+credential = self._get_most_suitable_oss_credential(credentials)
+if credential is not None:
+oss_endpoint = fileset_catalog.properties()["oss-endpoint"]
+expire_time = 
self._get_expire_time_by_ratio(credential.expire_time_in_ms())
+if isinstance(credential, OSSTokenCredential):
+fs = importlib.import_module("ossfs").OSSFileSystem(
+key=credential.access_key_id(),
+secret=credential.secret_access_key(),
+token=credential.security_token(),
+endpoint=oss_endpoint,
+)
+return (expire_time, fs)
+if isinstance(credential, OSSSecretKeyCredential):
+return (
+expire_time,
+importlib.import_module("ossfs").OSSFileSystem(
+key=credential.access_key_id(),
+secret=credential.secret_access_key(),
+endpoint=oss_endpoint,
+),
+)
+
+oss_endpoint_url = 
self._options.get(GVFSConfig.GVFS_FILESYSTEM_OSS_ENDPOINT)

Review Comment:
   If they are the same, why use two variables?






Re: [PR] [#5996] feat(python-client): Using credentail in python GVFS client. [gravitino]

2025-01-06 Thread via GitHub


yuqi1129 commented on code in PR #5997:
URL: https://github.com/apache/gravitino/pull/5997#discussion_r1904803955


##
clients/client-python/gravitino/filesystem/gvfs.py:
##
@@ -1001,10 +1173,58 @@ def _get_abs_filesystem(self):
 "ABS account key is not found in the options."
 )
 
-return importlib.import_module("adlfs").AzureBlobFileSystem(
-account_name=abs_account_name,
-account_key=abs_account_key,
+return (
+sys.maxsize,
+importlib.import_module("adlfs").AzureBlobFileSystem(
+account_name=abs_account_name,
+account_key=abs_account_key,
+),
 )
 
+def _get_most_suitable_s3_credential(self, credentials: List[Credential]):
+for credential in credentials:
+# Prefer to use the token credential, if

Review Comment:
   ok
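The preference order hinted at in the truncated comment above (prefer a short-lived token over a static key) can be sketched like this; the classes are stubs standing in for the real Gravitino credential types.

```python
from typing import List, Optional


class Credential:
    """Stub standing in for the Gravitino credential base class."""


class S3TokenCredential(Credential):
    """Short-lived STS token credential (stub)."""


class S3SecretKeyCredential(Credential):
    """Static access-key/secret-key credential (stub)."""


def most_suitable_s3_credential(credentials: List[Credential]) -> Optional[Credential]:
    # Prefer the short-lived token credential when both kinds are returned;
    # fall back to the static secret-key credential; otherwise None.
    for credential in credentials:
        if isinstance(credential, S3TokenCredential):
            return credential
    for credential in credentials:
        if isinstance(credential, S3SecretKeyCredential):
            return credential
    return None
```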






Re: [PR] [#5996] feat(python-client): Using credentail in python GVFS client. [gravitino]

2025-01-06 Thread via GitHub


yuqi1129 commented on code in PR #5997:
URL: https://github.com/apache/gravitino/pull/5997#discussion_r1904803854


##
clients/client-python/gravitino/filesystem/gvfs.py:
##
@@ -946,13 +1048,54 @@ def _get_s3_filesystem(self):
 "AWS endpoint url is not found in the options."
 )
 
-return importlib.import_module("s3fs").S3FileSystem(
-key=aws_access_key_id,
-secret=aws_secret_access_key,
-endpoint_url=aws_endpoint_url,
+return (
+sys.maxsize,
+importlib.import_module("s3fs").S3FileSystem(
+key=aws_access_key_id,
+secret=aws_secret_access_key,
+endpoint_url=aws_endpoint_url,

Review Comment:
   For the Python client the endpoint is optional, but for the Java client it is a required value, so we keep it required for consistency.
   
   ```text
   java.lang.IllegalArgumentException: Aliyun OSS endpoint should not be null 
or empty. Please set proper endpoint with 'fs.oss.endpoint'.
at 
org.apache.hadoop.fs.aliyun.oss.AliyunOSSFileSystemStore.initialize(AliyunOSSFileSystemStore.java:150)
at 
org.apache.hadoop.fs.aliyun.oss.AliyunOSSFileSystem.initialize(AliyunOSSFileSystem.java:349)
at 
org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:3469)
at org.apache.hadoop.fs.FileSystem.access$300(FileSystem.java:174)
at 
org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:3574)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:3521)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:540)
at org.apache.hadoop.fs.Path.getFileSystem(Path.java:365)
at 
org.apache.gravitino.filesystem.hadoop.integration.test.GravitinoVirtualFileSystemIT.testDelete(GravitinoVirtualFileSystemIT.java:238)
   ```



##
clients/client-python/gravitino/filesystem/gvfs.py:
##
@@ -946,13 +1048,54 @@ def _get_s3_filesystem(self):
 "AWS endpoint url is not found in the options."
 )
 
-return importlib.import_module("s3fs").S3FileSystem(
-key=aws_access_key_id,
-secret=aws_secret_access_key,
-endpoint_url=aws_endpoint_url,
+return (
+sys.maxsize,
+importlib.import_module("s3fs").S3FileSystem(
+key=aws_access_key_id,
+secret=aws_secret_access_key,
+endpoint_url=aws_endpoint_url,
+),
 )
 
-def _get_oss_filesystem(self):
+def _get_oss_filesystem(self, fileset_catalog: Catalog, identifier: 
NameIdentifier):
+# Can get credential from the fileset
+try:
+fileset: GenericFileset = 
fileset_catalog.as_fileset_catalog().load_fileset(
+NameIdentifier.of(identifier.namespace().level(2), 
identifier.name())
+)
+credentials = fileset.support_credentials().get_credentials()
+except (NoSuchCredentialException, CatalogNotInUseException) as e:
+logger.warning("Failed to get credentials from fileset: %s", e)
+credentials = []
+
+credential = self._get_most_suitable_oss_credential(credentials)
+if credential is not None:
+oss_endpoint = fileset_catalog.properties()["oss-endpoint"]

Review Comment:
   ok






Re: [PR] [#5996] feat(python-client): Using credentail in python GVFS client. [gravitino]

2025-01-06 Thread via GitHub


yuqi1129 commented on code in PR #5997:
URL: https://github.com/apache/gravitino/pull/5997#discussion_r190479


##
clients/client-python/gravitino/filesystem/gvfs.py:
##
@@ -946,13 +1048,54 @@ def _get_s3_filesystem(self):
 "AWS endpoint url is not found in the options."
 )
 
-return importlib.import_module("s3fs").S3FileSystem(
-key=aws_access_key_id,
-secret=aws_secret_access_key,
-endpoint_url=aws_endpoint_url,
+return (
+sys.maxsize,
+importlib.import_module("s3fs").S3FileSystem(
+key=aws_access_key_id,
+secret=aws_secret_access_key,
+endpoint_url=aws_endpoint_url,
+),
 )
 
-def _get_oss_filesystem(self):
+def _get_oss_filesystem(self, fileset_catalog: Catalog, identifier: 
NameIdentifier):
+# Can get credential from the fileset
+try:
+fileset: GenericFileset = 
fileset_catalog.as_fileset_catalog().load_fileset(
+NameIdentifier.of(identifier.namespace().level(2), 
identifier.name())
+)
+credentials = fileset.support_credentials().get_credentials()
+except (NoSuchCredentialException, CatalogNotInUseException) as e:
+logger.warning("Failed to get credentials from fileset: %s", e)
+credentials = []
+
+credential = self._get_most_suitable_oss_credential(credentials)
+if credential is not None:
+oss_endpoint = fileset_catalog.properties()["oss-endpoint"]
+expire_time = 
self._get_expire_time_by_ratio(credential.expire_time_in_ms())
+if isinstance(credential, OSSTokenCredential):
+fs = importlib.import_module("ossfs").OSSFileSystem(
+key=credential.access_key_id(),
+secret=credential.secret_access_key(),
+token=credential.security_token(),
+endpoint=oss_endpoint,
+)
+return (expire_time, fs)
+if isinstance(credential, OSSSecretKeyCredential):
+return (
+expire_time,
+importlib.import_module("ossfs").OSSFileSystem(
+key=credential.access_key_id(),
+secret=credential.secret_access_key(),
+endpoint=oss_endpoint,
+),
+)
+
+oss_endpoint_url = 
self._options.get(GVFSConfig.GVFS_FILESYSTEM_OSS_ENDPOINT)

Review Comment:
   They are the same thing. What do you mean here, that `oss_endpoint_url` is not a good name?






Re: [PR] [#5996] feat(python-client): Using credentail in python GVFS client. [gravitino]

2025-01-06 Thread via GitHub


FANNG1 commented on code in PR #5997:
URL: https://github.com/apache/gravitino/pull/5997#discussion_r1904246261


##
clients/client-python/tests/integration/test_gvfs_with_gcs_credential.py:
##
@@ -0,0 +1,112 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+import logging
+import os
+from random import randint
+import unittest
+
+from gcsfs import GCSFileSystem
+
+from gravitino import Catalog, Fileset, GravitinoClient
+from gravitino.filesystem import gvfs
+from tests.integration.test_gvfs_with_gcs import TestGvfsWithGCS
+
+logger = logging.getLogger(__name__)
+
+
+def gcs_with_credential_is_configured():
+return all(
+[
+os.environ.get("GCS_STS_SERVICE_ACCOUNT_JSON_PATH") is not None,
+os.environ.get("GCS_STS_BUCKET_NAME") is not None,
+]
+)
+
+
+@unittest.skipUnless(gcs_with_credential_is_configured(), "GCS is not configured.")
+class TestGvfsWithGCSCredential(TestGvfsWithGCS):
+# Before running this test, please make sure gcp-bundle-x.jar has been
+# copied to the $GRAVITINO_HOME/catalogs/hadoop/libs/ directory
+key_file = os.environ.get("GCS_STS_SERVICE_ACCOUNT_JSON_PATH")
+bucket_name = os.environ.get("GCS_STS_BUCKET_NAME")
+metalake_name: str = "TestGvfsWithGCSCredential_metalake" + str(randint(1, 
1))
+
+@classmethod
+def _init_test_entities(cls):
+cls.gravitino_admin_client.create_metalake(
+name=cls.metalake_name, comment="", properties={}
+)
+cls.gravitino_client = GravitinoClient(
+uri="http://localhost:8090", metalake_name=cls.metalake_name
+)
+
+cls.config = {}
+cls.conf = {}
+catalog = cls.gravitino_client.create_catalog(
+name=cls.catalog_name,
+catalog_type=Catalog.Type.FILESET,
+provider=cls.catalog_provider,
+comment="",
+properties={
+"filesystem-providers": "gcs",
+"gcs-credential-file-path": cls.key_file,
+"gcs-service-account-file": cls.key_file,
+"credential-providers": "gcs-token",
+},
+)
+catalog.as_schemas().create_schema(
+schema_name=cls.schema_name, comment="", properties={}
+)
+
+cls.fileset_storage_location: str = (
+
f"gs://{cls.bucket_name}/{cls.catalog_name}/{cls.schema_name}/{cls.fileset_name}"
+)
+cls.fileset_gvfs_location = (
+
f"gvfs://fileset/{cls.catalog_name}/{cls.schema_name}/{cls.fileset_name}"
+)
+catalog.as_fileset_catalog().create_fileset(
+ident=cls.fileset_ident,
+fileset_type=Fileset.Type.MANAGED,
+comment=cls.fileset_comment,
+storage_location=cls.fileset_storage_location,
+properties=cls.fileset_properties,
+)
+
+cls.fs = GCSFileSystem(token=cls.key_file)
+
+def test_mkdir(self):

Review Comment:
   Could you add the detailed reason in the code about why the test is added in a separate file?






Re: [PR] [#5996] feat(python-client): Using credentail in python GVFS client. [gravitino]

2025-01-06 Thread via GitHub


FANNG1 commented on code in PR #5997:
URL: https://github.com/apache/gravitino/pull/5997#discussion_r1904244945


##
clients/client-python/gravitino/filesystem/gvfs.py:
##
@@ -866,50 +894,93 @@ def _get_fileset_catalog(self, catalog_ident: 
NameIdentifier):
 finally:
 write_lock.release()
 
-def _get_filesystem(self, actual_file_location: str):
+# Disable Too many branches (13/12) (too-many-branches)
+# pylint: disable=R0912
+def _get_filesystem(
+self,
+actual_file_location: str,
+fileset_catalog: Catalog,
+name_identifier: NameIdentifier,
+):
 storage_type = self._recognize_storage_type(actual_file_location)
 read_lock = self._cache_lock.gen_rlock()
 try:
 read_lock.acquire()
-cache_value: Tuple[StorageType, AbstractFileSystem] = 
self._cache.get(
-storage_type
+cache_value: Tuple[int, AbstractFileSystem] = self._cache.get(
+name_identifier
 )
 if cache_value is not None:
-return cache_value
+# if the cache value is not expired, return the cache value
+if cache_value[0] > time.time() * 1000:
+return cache_value[1]
 finally:
 read_lock.release()
 
 write_lock = self._cache_lock.gen_wlock()
 try:
 write_lock.acquire()
-cache_value: Tuple[StorageType, AbstractFileSystem] = 
self._cache.get(
-storage_type
+cache_value: Tuple[int, AbstractFileSystem] = self._cache.get(
+name_identifier
 )
+
 if cache_value is not None:
-return cache_value
+# if the cache value is not expired, return the cache value
+if cache_value[0] > time.time() * 1000:

Review Comment:
   For local and HDFS file systems, you return 0 for the expire time; I suggest defining a constant to represent it.
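The read-lock check, write-lock re-check, and expiration test from the diff above can be sketched as follows; a plain `threading.Lock` stands in for the reader-writer lock the client actually uses, and the names are illustrative.

```python
import threading
import time

# Cache maps a fileset identifier to (expire_time_in_ms, filesystem).
_cache = {}
_cache_lock = threading.Lock()


def get_filesystem(name, build):
    # Optimistic read: reuse the cached file system if it has not expired.
    entry = _cache.get(name)
    if entry is not None and entry[0] > time.time() * 1000:
        return entry[1]
    with _cache_lock:
        # Re-check under the lock: another thread may have rebuilt it.
        entry = _cache.get(name)
        if entry is not None and entry[0] > time.time() * 1000:
            return entry[1]
        expire_time, fs = build()  # build() returns (expire_time_in_ms, fs)
        _cache[name] = (expire_time, fs)
        return fs
```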






Re: [PR] [#5996] feat(python-client): Using credentail in python GVFS client. [gravitino]

2025-01-06 Thread via GitHub


FANNG1 commented on code in PR #5997:
URL: https://github.com/apache/gravitino/pull/5997#discussion_r1904234764


##
clients/client-python/gravitino/filesystem/gvfs.py:
##
@@ -946,13 +1048,54 @@ def _get_s3_filesystem(self):
 "AWS endpoint url is not found in the options."
 )
 
-return importlib.import_module("s3fs").S3FileSystem(
-key=aws_access_key_id,
-secret=aws_secret_access_key,
-endpoint_url=aws_endpoint_url,
+return (
+sys.maxsize,
+importlib.import_module("s3fs").S3FileSystem(
+key=aws_access_key_id,
+secret=aws_secret_access_key,
+endpoint_url=aws_endpoint_url,

Review Comment:
   According to the pydoc, `aws_endpoint_url` seems not to be required?
   ```
   endpoint_url : string (None)
   Use this endpoint_url, if specified. Needed for connecting to non-AWS
   S3 buckets. Takes precedence over `endpoint_url` in client_kwargs.
   ```



##
clients/client-python/gravitino/filesystem/gvfs.py:
##
@@ -946,13 +1048,54 @@ def _get_s3_filesystem(self):
 "AWS endpoint url is not found in the options."
 )
 
-return importlib.import_module("s3fs").S3FileSystem(
-key=aws_access_key_id,
-secret=aws_secret_access_key,
-endpoint_url=aws_endpoint_url,
+return (
+sys.maxsize,
+importlib.import_module("s3fs").S3FileSystem(
+key=aws_access_key_id,
+secret=aws_secret_access_key,
+endpoint_url=aws_endpoint_url,
+),
 )
 
-def _get_oss_filesystem(self):
+def _get_oss_filesystem(self, fileset_catalog: Catalog, identifier: 
NameIdentifier):
+# Can get credential from the fileset
+try:
+fileset: GenericFileset = 
fileset_catalog.as_fileset_catalog().load_fileset(
+NameIdentifier.of(identifier.namespace().level(2), 
identifier.name())
+)
+credentials = fileset.support_credentials().get_credentials()
+except (NoSuchCredentialException, CatalogNotInUseException) as e:
+logger.warning("Failed to get credentials from fileset: %s", e)
+credentials = []
+
+credential = self._get_most_suitable_oss_credential(credentials)
+if credential is not None:
+oss_endpoint = fileset_catalog.properties()["oss-endpoint"]
+expire_time = 
self._get_expire_time_by_ratio(credential.expire_time_in_ms())
+if isinstance(credential, OSSTokenCredential):
+fs = importlib.import_module("ossfs").OSSFileSystem(
+key=credential.access_key_id(),
+secret=credential.secret_access_key(),
+token=credential.security_token(),
+endpoint=oss_endpoint,
+)
+return (expire_time, fs)
+if isinstance(credential, OSSSecretKeyCredential):
+return (
+expire_time,
+importlib.import_module("ossfs").OSSFileSystem(
+key=credential.access_key_id(),
+secret=credential.secret_access_key(),
+endpoint=oss_endpoint,
+),
+)
+
+oss_endpoint_url = 
self._options.get(GVFSConfig.GVFS_FILESYSTEM_OSS_ENDPOINT)

Review Comment:
   What's the difference between `oss_endpoint_url` and `oss_endpoint`?



##
clients/client-python/gravitino/filesystem/gvfs.py:
##
@@ -946,13 +1048,54 @@ def _get_s3_filesystem(self):
 "AWS endpoint url is not found in the options."
 )
 
-return importlib.import_module("s3fs").S3FileSystem(
-key=aws_access_key_id,
-secret=aws_secret_access_key,
-endpoint_url=aws_endpoint_url,
+return (
+sys.maxsize,
+importlib.import_module("s3fs").S3FileSystem(
+key=aws_access_key_id,
+secret=aws_secret_access_key,
+endpoint_url=aws_endpoint_url,
+),
 )
 
-def _get_oss_filesystem(self):
+def _get_oss_filesystem(self, fileset_catalog: Catalog, identifier: 
NameIdentifier):
+# Can get credential from the fileset
+try:
+fileset: GenericFileset = 
fileset_catalog.as_fileset_catalog().load_fileset(
+NameIdentifier.of(identifier.namespace().level(2), 
identifier.name())
+)
+credentials = fileset.support_credentials().get_credentials()
+except (NoSuchCredentialException, CatalogNotInUseException) as e:
+logger.warning("Failed to get credentials from fileset: %s", e)
+credentials = []
+
+credential = self._get_most_suitable_oss_credential(credentials)
+if cr
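
The OSS branch quoted in this hunk (truncated here) builds a filesystem from whichever credential type is available and caches it with an expiry derived from the token's lifetime. Below is a minimal, self-contained sketch of that dispatch; the class names, the 0.9 refresh ratio, and the dict standing in for `ossfs.OSSFileSystem` are assumptions for illustration, not the actual gvfs internals.

```python
import sys
import time
from dataclasses import dataclass


@dataclass
class TokenCredential:
    """Stand-in for OSSTokenCredential (temporary STS token)."""
    access_key_id: str
    secret_access_key: str
    security_token: str
    expire_time_in_ms: int


@dataclass
class SecretKeyCredential:
    """Stand-in for OSSSecretKeyCredential (static key, never expires)."""
    access_key_id: str
    secret_access_key: str


def expire_time_by_ratio(expire_ms: int, ratio: float = 0.9) -> int:
    # Refresh a bit before the real expiry so callers never hold an
    # already-expired token; the 0.9 ratio is an assumed default.
    now_ms = time.time() * 1000
    return int(now_ms + (expire_ms - now_ms) * ratio)


def build_oss_fs(credential, endpoint: str):
    """Return (cache_expire_ms, fs_kwargs); a dict stands in for the filesystem."""
    if isinstance(credential, TokenCredential):
        kwargs = dict(
            key=credential.access_key_id,
            secret=credential.secret_access_key,
            token=credential.security_token,
            endpoint=endpoint,
        )
        return expire_time_by_ratio(credential.expire_time_in_ms), kwargs
    if isinstance(credential, SecretKeyCredential):
        kwargs = dict(
            key=credential.access_key_id,
            secret=credential.secret_access_key,
            endpoint=endpoint,
        )
        # static keys can be cached indefinitely
        return sys.maxsize, kwargs
    raise ValueError(f"unsupported credential type: {type(credential).__name__}")
```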

Re: [PR] [#5996] feat(python-client): Using credentail in python GVFS client. [gravitino]

2025-01-06 Thread via GitHub


yuqi1129 commented on code in PR #5997:
URL: https://github.com/apache/gravitino/pull/5997#discussion_r1904235369


##
clients/client-python/tests/integration/test_gvfs_with_gcs_credential.py:
##
@@ -0,0 +1,112 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+import logging
+import os
+from random import randint
+import unittest
+
+from gcsfs import GCSFileSystem
+
+from gravitino import Catalog, Fileset, GravitinoClient
+from gravitino.filesystem import gvfs
+from tests.integration.test_gvfs_with_gcs import TestGvfsWithGCS
+
+logger = logging.getLogger(__name__)
+
+
+def gcs_with_credential_is_configured():
+return all(
+[
+os.environ.get("GCS_STS_SERVICE_ACCOUNT_JSON_PATH") is not None,
+os.environ.get("GCS_STS_BUCKET_NAME") is not None,
+]
+)
+
+
+@unittest.skipUnless(gcs_with_credential_is_configured(), "GCS is not configured.")
+class TestGvfsWithGCSCredential(TestGvfsWithGCS):
+# Before running this test, please make sure gcp-bundle-x.jar has been
+# copied to the $GRAVITINO_HOME/catalogs/hadoop/libs/ directory
+key_file = os.environ.get("GCS_STS_SERVICE_ACCOUNT_JSON_PATH")
+bucket_name = os.environ.get("GCS_STS_BUCKET_NAME")
+metalake_name: str = "TestGvfsWithGCSCredential_metalake" + str(randint(1, 
1))
+
+@classmethod
+def _init_test_entities(cls):
+cls.gravitino_admin_client.create_metalake(
+name=cls.metalake_name, comment="", properties={}
+)
+cls.gravitino_client = GravitinoClient(
+uri="http://localhost:8090", metalake_name=cls.metalake_name
+)
+
+cls.config = {}
+cls.conf = {}
+catalog = cls.gravitino_client.create_catalog(
+name=cls.catalog_name,
+catalog_type=Catalog.Type.FILESET,
+provider=cls.catalog_provider,
+comment="",
+properties={
+"filesystem-providers": "gcs",
+"gcs-credential-file-path": cls.key_file,
+"gcs-service-account-file": cls.key_file,
+"credential-providers": "gcs-token",
+},
+)
+catalog.as_schemas().create_schema(
+schema_name=cls.schema_name, comment="", properties={}
+)
+
+cls.fileset_storage_location: str = (
+
f"gs://{cls.bucket_name}/{cls.catalog_name}/{cls.schema_name}/{cls.fileset_name}"
+)
+cls.fileset_gvfs_location = (
+
f"gvfs://fileset/{cls.catalog_name}/{cls.schema_name}/{cls.fileset_name}"
+)
+catalog.as_fileset_catalog().create_fileset(
+ident=cls.fileset_ident,
+fileset_type=Fileset.Type.MANAGED,
+comment=cls.fileset_comment,
+storage_location=cls.fileset_storage_location,
+properties=cls.fileset_properties,
+)
+
+cls.fs = GCSFileSystem(token=cls.key_file)
+
+def test_mkdir(self):

Review Comment:
   GCS behaves differently from HDFS, so we need to override this test. OSS and S3 are similar.






Re: [PR] [#5996] feat(python-client): Using credentail in python GVFS client. [gravitino]

2025-01-06 Thread via GitHub


FANNG1 commented on code in PR #5997:
URL: https://github.com/apache/gravitino/pull/5997#discussion_r1904229503


##
clients/client-python/gravitino/filesystem/gvfs.py:
##
@@ -866,50 +894,93 @@ def _get_fileset_catalog(self, catalog_ident: 
NameIdentifier):
 finally:
 write_lock.release()
 
-def _get_filesystem(self, actual_file_location: str):
+# Disable Too many branches (13/12) (too-many-branches)
+# pylint: disable=R0912
+def _get_filesystem(
+self,
+actual_file_location: str,
+fileset_catalog: Catalog,
+name_identifier: NameIdentifier,
+):
 storage_type = self._recognize_storage_type(actual_file_location)
 read_lock = self._cache_lock.gen_rlock()
 try:
 read_lock.acquire()
-cache_value: Tuple[StorageType, AbstractFileSystem] = 
self._cache.get(
-storage_type
+cache_value: Tuple[int, AbstractFileSystem] = self._cache.get(
+name_identifier
 )
 if cache_value is not None:
-return cache_value
+# if the cache value is not expired, return the cache value
+if cache_value[0] > time.time() * 1000:
+return cache_value[1]
 finally:
 read_lock.release()
 
 write_lock = self._cache_lock.gen_wlock()
 try:
 write_lock.acquire()
-cache_value: Tuple[StorageType, AbstractFileSystem] = 
self._cache.get(
-storage_type
+cache_value: Tuple[int, AbstractFileSystem] = self._cache.get(
+name_identifier
 )
+
 if cache_value is not None:
-return cache_value
+# if the cache value is not expired, return the cache value
+if cache_value[0] > time.time() * 1000:
+return cache_value[1]
+
+new_cache_value: Tuple[int, AbstractFileSystem]
 if storage_type == StorageType.HDFS:
 fs_class = 
importlib.import_module("pyarrow.fs").HadoopFileSystem
-fs = ArrowFSWrapper(fs_class.from_uri(actual_file_location))
+new_cache_value = (
+0,
+ArrowFSWrapper(fs_class.from_uri(actual_file_location)),
+)
 elif storage_type == StorageType.LOCAL:
-fs = LocalFileSystem()
+new_cache_value = (0, LocalFileSystem())
 elif storage_type == StorageType.GCS:
-fs = self._get_gcs_filesystem()
+new_cache_value = self._get_gcs_filesystem(
+fileset_catalog, name_identifier
+)
 elif storage_type == StorageType.S3A:
-fs = self._get_s3_filesystem()
+new_cache_value = self._get_s3_filesystem(
+fileset_catalog, name_identifier
+)
 elif storage_type == StorageType.OSS:
-fs = self._get_oss_filesystem()
+new_cache_value = self._get_oss_filesystem(
+fileset_catalog, name_identifier
+)
 elif storage_type == StorageType.ABS:
-fs = self._get_abs_filesystem()
+new_cache_value = self._get_abs_filesystem(
+fileset_catalog, name_identifier
+)
 else:
 raise GravitinoRuntimeException(
 f"Storage type: `{storage_type}` doesn't support now."
 )
-self._cache[storage_type] = fs
-return fs
+self._cache[name_identifier] = new_cache_value
+return new_cache_value[1]
 finally:
 write_lock.release()
 
-def _get_gcs_filesystem(self):
+def _get_gcs_filesystem(self, fileset_catalog: Catalog, identifier: NameIdentifier):
+try:
+fileset: GenericFileset = fileset_catalog.as_fileset_catalog().load_fileset(
+NameIdentifier.of(identifier.namespace().level(2), identifier.name())
+)
+credentials = fileset.support_credentials().get_credentials()
+except (NoSuchCredentialException, CatalogNotInUseException) as e:
+logger.warning("Failed to get credentials from fileset: %s", e)
+credentials = []
+
+credential = self._get_most_suitable_gcs_credential(credentials)
+if credential is not None:
+expire_time = self._get_expire_time_by_ratio(credential.expire_time_in_ms())
+if isinstance(credential, GCSTokenCredential):
+fs = importlib.import_module("gcsfs").GCSFileSystem(
+token=credential.token()
+)
+return (expire_time, fs)
+
 # get 'service-account-key' from gcs_options, if the key is not found, throw an exception
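
The `_get_filesystem` flow reviewed in this message is a double-checked cache lookup keyed by the fileset identifier, where each entry carries its own expiry. A minimal sketch of that pattern follows; it assumes a plain mutex in place of the readers-writer lock used in gvfs, and treats an expire time of 0 as "never expires", which is a convention of this sketch only.

```python
import time
import threading


class ExpiringFSCache:
    """Maps an identifier to (expire_time_ms, filesystem) with
    double-checked locking: a lock-free fast path, then a re-check
    under the lock before creating a new entry."""

    def __init__(self):
        self._cache = {}
        self._lock = threading.Lock()

    def _lookup(self, key):
        entry = self._cache.get(key)
        if entry is not None:
            expire_ms, fs = entry
            # expire_ms == 0 means "never expires" in this sketch
            if expire_ms == 0 or expire_ms > time.time() * 1000:
                return fs
        return None

    def get_or_create(self, key, factory):
        fs = self._lookup(key)          # fast path: unexpired cached entry
        if fs is not None:
            return fs
        with self._lock:
            fs = self._lookup(key)      # double-check under the lock
            if fs is not None:
                return fs
            expire_ms, fs = factory()   # factory returns (expire_ms, fs)
            self._cache[key] = (expire_ms, fs)
            return fs
```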
 

Re: [PR] [#5996] feat(python-client): Using credentail in python GVFS client. [gravitino]

2025-01-06 Thread via GitHub


FANNG1 commented on code in PR #5997:
URL: https://github.com/apache/gravitino/pull/5997#discussion_r1904222921


##
clients/client-python/tests/integration/test_gvfs_with_gcs_credential.py:
##
@@ -0,0 +1,112 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+import logging
+import os
+from random import randint
+import unittest
+
+from gcsfs import GCSFileSystem
+
+from gravitino import Catalog, Fileset, GravitinoClient
+from gravitino.filesystem import gvfs
+from tests.integration.test_gvfs_with_gcs import TestGvfsWithGCS
+
+logger = logging.getLogger(__name__)
+
+
+def gcs_with_credential_is_configured():
+return all(
+[
+os.environ.get("GCS_STS_SERVICE_ACCOUNT_JSON_PATH") is not None,
+os.environ.get("GCS_STS_BUCKET_NAME") is not None,
+]
+)
+
+
+@unittest.skipUnless(gcs_with_credential_is_configured(), "GCS is not configured.")
+class TestGvfsWithGCSCredential(TestGvfsWithGCS):
+# Before running this test, please make sure gcp-bundle-x.jar has been
+# copied to the $GRAVITINO_HOME/catalogs/hadoop/libs/ directory
+key_file = os.environ.get("GCS_STS_SERVICE_ACCOUNT_JSON_PATH")
+bucket_name = os.environ.get("GCS_STS_BUCKET_NAME")
+metalake_name: str = "TestGvfsWithGCSCredential_metalake" + str(randint(1, 
1))
+
+@classmethod
+def _init_test_entities(cls):
+cls.gravitino_admin_client.create_metalake(
+name=cls.metalake_name, comment="", properties={}
+)
+cls.gravitino_client = GravitinoClient(
+uri="http://localhost:8090", metalake_name=cls.metalake_name
+)
+
+cls.config = {}
+cls.conf = {}
+catalog = cls.gravitino_client.create_catalog(
+name=cls.catalog_name,
+catalog_type=Catalog.Type.FILESET,
+provider=cls.catalog_provider,
+comment="",
+properties={
+"filesystem-providers": "gcs",
+"gcs-credential-file-path": cls.key_file,
+"gcs-service-account-file": cls.key_file,
+"credential-providers": "gcs-token",
+},
+)
+catalog.as_schemas().create_schema(
+schema_name=cls.schema_name, comment="", properties={}
+)
+
+cls.fileset_storage_location: str = (
+
f"gs://{cls.bucket_name}/{cls.catalog_name}/{cls.schema_name}/{cls.fileset_name}"
+)
+cls.fileset_gvfs_location = (
+
f"gvfs://fileset/{cls.catalog_name}/{cls.schema_name}/{cls.fileset_name}"
+)
+catalog.as_fileset_catalog().create_fileset(
+ident=cls.fileset_ident,
+fileset_type=Fileset.Type.MANAGED,
+comment=cls.fileset_comment,
+storage_location=cls.fileset_storage_location,
+properties=cls.fileset_properties,
+)
+
+cls.fs = GCSFileSystem(token=cls.key_file)
+
+def test_mkdir(self):

Review Comment:
   Could you provide the reason?






Re: [PR] [#5996] feat(python-client): Using credentail in python GVFS client. [gravitino]

2025-01-06 Thread via GitHub


yuqi1129 commented on code in PR #5997:
URL: https://github.com/apache/gravitino/pull/5997#discussion_r1904073910


##
clients/client-python/gravitino/filesystem/gvfs.py:
##
@@ -866,50 +894,93 @@ def _get_fileset_catalog(self, catalog_ident: 
NameIdentifier):
 finally:
 write_lock.release()
 
-def _get_filesystem(self, actual_file_location: str):
+# Disable Too many branches (13/12) (too-many-branches)
+# pylint: disable=R0912
+def _get_filesystem(
+self,
+actual_file_location: str,
+fileset_catalog: Catalog,
+name_identifier: NameIdentifier,
+):
 storage_type = self._recognize_storage_type(actual_file_location)
 read_lock = self._cache_lock.gen_rlock()
 try:
 read_lock.acquire()
-cache_value: Tuple[StorageType, AbstractFileSystem] = 
self._cache.get(
-storage_type
+cache_value: Tuple[int, AbstractFileSystem] = self._cache.get(
+name_identifier
 )
 if cache_value is not None:
-return cache_value
+# if the cache value is not expired, return the cache value
+if cache_value[0] > time.time() * 1000:
+return cache_value[1]
 finally:
 read_lock.release()
 
 write_lock = self._cache_lock.gen_wlock()
 try:
 write_lock.acquire()
-cache_value: Tuple[StorageType, AbstractFileSystem] = 
self._cache.get(
-storage_type
+cache_value: Tuple[int, AbstractFileSystem] = self._cache.get(
+name_identifier
 )
+
 if cache_value is not None:
-return cache_value
+# if the cache value is not expired, return the cache value
+if cache_value[0] > time.time() * 1000:
+return cache_value[1]
+
+new_cache_value: Tuple[int, AbstractFileSystem]
 if storage_type == StorageType.HDFS:
 fs_class = 
importlib.import_module("pyarrow.fs").HadoopFileSystem
-fs = ArrowFSWrapper(fs_class.from_uri(actual_file_location))
+new_cache_value = (
+0,
+ArrowFSWrapper(fs_class.from_uri(actual_file_location)),
+)
 elif storage_type == StorageType.LOCAL:
-fs = LocalFileSystem()
+new_cache_value = (0, LocalFileSystem())
 elif storage_type == StorageType.GCS:
-fs = self._get_gcs_filesystem()
+new_cache_value = self._get_gcs_filesystem(
+fileset_catalog, name_identifier
+)
 elif storage_type == StorageType.S3A:
-fs = self._get_s3_filesystem()
+new_cache_value = self._get_s3_filesystem(
+fileset_catalog, name_identifier
+)
 elif storage_type == StorageType.OSS:
-fs = self._get_oss_filesystem()
+new_cache_value = self._get_oss_filesystem(
+fileset_catalog, name_identifier
+)
 elif storage_type == StorageType.ABS:
-fs = self._get_abs_filesystem()
+new_cache_value = self._get_abs_filesystem(
+fileset_catalog, name_identifier
+)
 else:
 raise GravitinoRuntimeException(
 f"Storage type: `{storage_type}` doesn't support now."
 )
-self._cache[storage_type] = fs
-return fs
+self._cache[name_identifier] = new_cache_value
+return new_cache_value[1]
 finally:
 write_lock.release()
 
-def _get_gcs_filesystem(self):
+def _get_gcs_filesystem(self, fileset_catalog: Catalog, identifier: 
NameIdentifier):
+try:
+fileset: GenericFileset = 
fileset_catalog.as_fileset_catalog().load_fileset(
+NameIdentifier.of(identifier.namespace().level(2), 
identifier.name())
+)
+credentials = fileset.support_credentials().get_credentials()
+except (NoSuchCredentialException, CatalogNotInUseException) as e:
+logger.warning("Failed to get credentials from fileset: %s", e)
+credentials = []
+
+credential = self._get_most_suitable_gcs_credential(credentials)
+if credential is not None:
+expire_time = self._get_expire_time_by_ratio(credential.expire_time_in_ms())
+if isinstance(credential, GCSTokenCredential):
+fs = importlib.import_module("gcsfs").GCSFileSystem(
+token=credential.token()
+)
+return (expire_time, fs)
+
 # get 'service-account-key' from gcs_options, if the key is not found, throw an exception
   

Re: [PR] [#5996] feat(python-client): Using credentail in python GVFS client. [gravitino]

2025-01-06 Thread via GitHub


yuqi1129 commented on code in PR #5997:
URL: https://github.com/apache/gravitino/pull/5997#discussion_r1904072789


##
clients/client-python/tests/integration/test_gvfs_with_gcs_credential.py:
##
@@ -0,0 +1,112 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+import logging
+import os
+from random import randint
+import unittest
+
+from gcsfs import GCSFileSystem
+
+from gravitino import Catalog, Fileset, GravitinoClient
+from gravitino.filesystem import gvfs
+from tests.integration.test_gvfs_with_gcs import TestGvfsWithGCS
+
+logger = logging.getLogger(__name__)
+
+
+def gcs_with_credential_is_configured():
+return all(
+[
+os.environ.get("GCS_STS_SERVICE_ACCOUNT_JSON_PATH") is not None,
+os.environ.get("GCS_STS_BUCKET_NAME") is not None,
+]
+)
+
+
+@unittest.skipUnless(gcs_with_credential_is_configured(), "GCS is not configured.")
+class TestGvfsWithGCSCredential(TestGvfsWithGCS):
+# Before running this test, please make sure gcp-bundle-x.jar has been
+# copied to the $GRAVITINO_HOME/catalogs/hadoop/libs/ directory
+key_file = os.environ.get("GCS_STS_SERVICE_ACCOUNT_JSON_PATH")
+bucket_name = os.environ.get("GCS_STS_BUCKET_NAME")
+metalake_name: str = "TestGvfsWithGCSCredential_metalake" + str(randint(1, 
1))
+
+@classmethod
+def _init_test_entities(cls):
+cls.gravitino_admin_client.create_metalake(
+name=cls.metalake_name, comment="", properties={}
+)
+cls.gravitino_client = GravitinoClient(
+uri="http://localhost:8090", metalake_name=cls.metalake_name
+)
+
+cls.config = {}
+cls.conf = {}
+catalog = cls.gravitino_client.create_catalog(
+name=cls.catalog_name,
+catalog_type=Catalog.Type.FILESET,
+provider=cls.catalog_provider,
+comment="",
+properties={
+"filesystem-providers": "gcs",
+"gcs-credential-file-path": cls.key_file,
+"gcs-service-account-file": cls.key_file,
+"credential-providers": "gcs-token",
+},
+)
+catalog.as_schemas().create_schema(
+schema_name=cls.schema_name, comment="", properties={}
+)
+
+cls.fileset_storage_location: str = (
+
f"gs://{cls.bucket_name}/{cls.catalog_name}/{cls.schema_name}/{cls.fileset_name}"
+)
+cls.fileset_gvfs_location = (
+
f"gvfs://fileset/{cls.catalog_name}/{cls.schema_name}/{cls.fileset_name}"
+)
+catalog.as_fileset_catalog().create_fileset(
+ident=cls.fileset_ident,
+fileset_type=Fileset.Type.MANAGED,
+comment=cls.fileset_comment,
+storage_location=cls.fileset_storage_location,
+properties=cls.fileset_properties,
+)
+
+cls.fs = GCSFileSystem(token=cls.key_file)
+
+def test_mkdir(self):

Review Comment:
   TestGvfsWithHDFS already has the method, and [test_gvfs_with_gcs_credential.py](https://github.com/apache/gravitino/pull/5997/files/bd8618ec3c446ee3e9df4ce840fc2c4306b257a7#diff-c1af14e87a421259945e3e79d9c2f271743395b0a70eb1a957757d8028e29a19) just overrides it.






Re: [PR] [#5996] feat(python-client): Using credentail in python GVFS client. [gravitino]

2025-01-06 Thread via GitHub


yuqi1129 commented on code in PR #5997:
URL: https://github.com/apache/gravitino/pull/5997#discussion_r1904070502


##
clients/client-python/tests/integration/test_gvfs_with_gcs_credential.py:
##
@@ -0,0 +1,112 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+import logging
+import os
+from random import randint
+import unittest
+
+from gcsfs import GCSFileSystem
+
+from gravitino import Catalog, Fileset, GravitinoClient
+from gravitino.filesystem import gvfs
+from tests.integration.test_gvfs_with_gcs import TestGvfsWithGCS
+
+logger = logging.getLogger(__name__)
+
+
+def gcs_with_credential_is_configured():
+return all(
+[
+os.environ.get("GCS_STS_SERVICE_ACCOUNT_JSON_PATH") is not None,
+os.environ.get("GCS_STS_BUCKET_NAME") is not None,
+]
+)
+
+
+@unittest.skipUnless(gcs_with_credential_is_configured(), "GCS is not configured.")
+class TestGvfsWithGCSCredential(TestGvfsWithGCS):

Review Comment:
   OK, I will create an issue to track this problem, as it's just an improvement.






Re: [PR] [#5996] feat(python-client): Using credentail in python GVFS client. [gravitino]

2025-01-06 Thread via GitHub


yuqi1129 commented on code in PR #5997:
URL: https://github.com/apache/gravitino/pull/5997#discussion_r1904069465


##
clients/client-python/gravitino/filesystem/gvfs.py:
##
@@ -866,50 +894,93 @@ def _get_fileset_catalog(self, catalog_ident: 
NameIdentifier):
 finally:
 write_lock.release()
 
-def _get_filesystem(self, actual_file_location: str):
+# Disable Too many branches (13/12) (too-many-branches)
+# pylint: disable=R0912
+def _get_filesystem(
+self,
+actual_file_location: str,
+fileset_catalog: Catalog,
+name_identifier: NameIdentifier,
+):
 storage_type = self._recognize_storage_type(actual_file_location)
 read_lock = self._cache_lock.gen_rlock()
 try:
 read_lock.acquire()
-cache_value: Tuple[StorageType, AbstractFileSystem] = 
self._cache.get(
-storage_type
+cache_value: Tuple[int, AbstractFileSystem] = self._cache.get(
+name_identifier
 )
 if cache_value is not None:
-return cache_value
+# if the cache value is not expired, return the cache value
+if cache_value[0] > time.time() * 1000:
+return cache_value[1]
 finally:
 read_lock.release()
 
 write_lock = self._cache_lock.gen_wlock()
 try:
 write_lock.acquire()
-cache_value: Tuple[StorageType, AbstractFileSystem] = 
self._cache.get(
-storage_type
+cache_value: Tuple[int, AbstractFileSystem] = self._cache.get(
+name_identifier
 )
+
 if cache_value is not None:
-return cache_value
+# if the cache value is not expired, return the cache value
+if cache_value[0] > time.time() * 1000:

Review Comment:
   I have handled it, see:
   https://github.com/user-attachments/assets/c25ec5de-6d47-45f3-a6fb-84fcc02d89c9
   
   > And could you use a function to check whether the token expires?
   
   OK.
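
The helper being asked for here might look roughly like the sketch below; treating a non-positive expire time as "never expires" (e.g. for local or HDFS filesystems) is an assumption of this sketch, not a confirmed detail of the PR.

```python
import time


def is_expired(expire_time_ms: int) -> bool:
    # A cached (expire_time_ms, fs) entry should be rebuilt once its
    # expire time has passed; non-positive values mean "never expires"
    # in this sketch.
    return expire_time_ms > 0 and expire_time_ms <= time.time() * 1000
```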
   






Re: [PR] [#5996] feat(python-client): Using credentail in python GVFS client. [gravitino]

2025-01-06 Thread via GitHub


FANNG1 commented on code in PR #5997:
URL: https://github.com/apache/gravitino/pull/5997#discussion_r1904042397


##
clients/client-python/tests/integration/test_gvfs_with_gcs_credential.py:
##
@@ -0,0 +1,112 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+import logging
+import os
+from random import randint
+import unittest
+
+from gcsfs import GCSFileSystem
+
+from gravitino import Catalog, Fileset, GravitinoClient
+from gravitino.filesystem import gvfs
+from tests.integration.test_gvfs_with_gcs import TestGvfsWithGCS
+
+logger = logging.getLogger(__name__)
+
+
+def gcs_with_credential_is_configured():
+return all(
+[
+os.environ.get("GCS_STS_SERVICE_ACCOUNT_JSON_PATH") is not None,
+os.environ.get("GCS_STS_BUCKET_NAME") is not None,
+]
+)
+
+
+@unittest.skipUnless(gcs_with_credential_is_configured(), "GCS is not configured.")
+class TestGvfsWithGCSCredential(TestGvfsWithGCS):
+# Before running this test, please make sure gcp-bundle-x.jar has been
+# copied to the $GRAVITINO_HOME/catalogs/hadoop/libs/ directory
+key_file = os.environ.get("GCS_STS_SERVICE_ACCOUNT_JSON_PATH")
+bucket_name = os.environ.get("GCS_STS_BUCKET_NAME")
+metalake_name: str = "TestGvfsWithGCSCredential_metalake" + str(randint(1, 
1))
+
+@classmethod
+def _init_test_entities(cls):
+cls.gravitino_admin_client.create_metalake(
+name=cls.metalake_name, comment="", properties={}
+)
+cls.gravitino_client = GravitinoClient(
+uri="http://localhost:8090", metalake_name=cls.metalake_name
+)
+
+cls.config = {}
+cls.conf = {}
+catalog = cls.gravitino_client.create_catalog(
+name=cls.catalog_name,
+catalog_type=Catalog.Type.FILESET,
+provider=cls.catalog_provider,
+comment="",
+properties={
+"filesystem-providers": "gcs",
+"gcs-credential-file-path": cls.key_file,
+"gcs-service-account-file": cls.key_file,
+"credential-providers": "gcs-token",
+},
+)
+catalog.as_schemas().create_schema(
+schema_name=cls.schema_name, comment="", properties={}
+)
+
+cls.fileset_storage_location: str = (
+
f"gs://{cls.bucket_name}/{cls.catalog_name}/{cls.schema_name}/{cls.fileset_name}"
+)
+cls.fileset_gvfs_location = (
+
f"gvfs://fileset/{cls.catalog_name}/{cls.schema_name}/{cls.fileset_name}"
+)
+catalog.as_fileset_catalog().create_fileset(
+ident=cls.fileset_ident,
+fileset_type=Fileset.Type.MANAGED,
+comment=cls.fileset_comment,
+storage_location=cls.fileset_storage_location,
+properties=cls.fileset_properties,
+)
+
+cls.fs = GCSFileSystem(token=cls.key_file)
+
+def test_mkdir(self):

Review Comment:
   Why not move these tests to `TestGvfsWithHDFS`?



##
clients/client-python/gravitino/filesystem/gvfs.py:
##
@@ -866,50 +894,93 @@ def _get_fileset_catalog(self, catalog_ident: 
NameIdentifier):
 finally:
 write_lock.release()
 
-def _get_filesystem(self, actual_file_location: str):
+# Disable Too many branches (13/12) (too-many-branches)
+# pylint: disable=R0912
+def _get_filesystem(
+self,
+actual_file_location: str,
+fileset_catalog: Catalog,
+name_identifier: NameIdentifier,
+):
 storage_type = self._recognize_storage_type(actual_file_location)
 read_lock = self._cache_lock.gen_rlock()
 try:
 read_lock.acquire()
-cache_value: Tuple[StorageType, AbstractFileSystem] = 
self._cache.get(
-storage_type
+cache_value: Tuple[int, AbstractFileSystem] = self._cache.get(
+name_identifier
 )
 if cache_value is not None:
-return cache_value
+# if the cache value is not expired, return the cache value
+if cache_value[0] > time.time() * 1000:
+return cache_value[1]

Re: [PR] [#5996] feat(python-client): Using credentail in python GVFS client. [gravitino]

2025-01-03 Thread via GitHub


yuqi1129 commented on code in PR #5997:
URL: https://github.com/apache/gravitino/pull/5997#discussion_r1901707988


##
clients/client-python/gravitino/api/credential/adls_token_credential.py:
##
@@ -27,12 +27,12 @@ class ADLSTokenCredential(Credential, ABC):
 
 ADLS_TOKEN_CREDENTIAL_TYPE: str = "adls-token"
 ADLS_DOMAIN: str = "dfs.core.windows.net"
-_STORAGE_ACCOUNT_NAME: str = "azure-storage-account-name"
-_SAS_TOKEN: str = "adls-sas-token"
+STORAGE_ACCOUNT_NAME: str = "azure-storage-account-name"

Review Comment:
   How can I get the constant value? Use a `get` method, or just create a new
constant with the same value?
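For the question above: a public class-level constant in Python can be read directly off the class, so neither a `get` method nor a duplicate constant is needed. A minimal sketch (the class below is an illustration, not the real `ADLSTokenCredential`):

```python
class ADLSTokenCredentialSketch:
    # Public class constants: readable as Class.NAME from any module.
    # The leading-underscore form (_STORAGE_ACCOUNT_NAME) signals
    # "internal, do not reference from outside the class".
    STORAGE_ACCOUNT_NAME = "azure-storage-account-name"
    SAS_TOKEN = "adls-sas-token"


# Callers reference the constant directly, no getter required:
account_key = ADLSTokenCredentialSketch.STORAGE_ACCOUNT_NAME
```

Dropping the underscore (as this diff does) is the usual way to share such a key with other modules without duplicating the string literal.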



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]



Re: [PR] [#5996] feat(python-client): Using credentail in python GVFS client. [gravitino]

2025-01-03 Thread via GitHub


FANNG1 commented on code in PR #5997:
URL: https://github.com/apache/gravitino/pull/5997#discussion_r1901574255


##
clients/client-python/gravitino/api/credential/adls_token_credential.py:
##
@@ -27,12 +27,12 @@ class ADLSTokenCredential(Credential, ABC):
 
 ADLS_TOKEN_CREDENTIAL_TYPE: str = "adls-token"
 ADLS_DOMAIN: str = "dfs.core.windows.net"
-_STORAGE_ACCOUNT_NAME: str = "azure-storage-account-name"
-_SAS_TOKEN: str = "adls-sas-token"
+STORAGE_ACCOUNT_NAME: str = "azure-storage-account-name"

Review Comment:
   it should be `same`






Re: [PR] [#5996] feat(python-client): Using credentail in python GVFS client. [gravitino]

2025-01-02 Thread via GitHub


yuqi1129 commented on code in PR #5997:
URL: https://github.com/apache/gravitino/pull/5997#discussion_r1901535397


##
clients/client-python/gravitino/api/credential/adls_token_credential.py:
##
@@ -27,12 +27,12 @@ class ADLSTokenCredential(Credential, ABC):
 
 ADLS_TOKEN_CREDENTIAL_TYPE: str = "adls-token"
 ADLS_DOMAIN: str = "dfs.core.windows.net"
-_STORAGE_ACCOUNT_NAME: str = "azure-storage-account-name"
-_SAS_TOKEN: str = "adls-sas-token"
+STORAGE_ACCOUNT_NAME: str = "azure-storage-account-name"

Review Comment:
   `seem to other credentials.` What's the meaning of this sentence?






Re: [PR] [#5996] feat(python-client): Using credentail in python GVFS client. [gravitino]

2025-01-02 Thread via GitHub


FANNG1 commented on code in PR #5997:
URL: https://github.com/apache/gravitino/pull/5997#discussion_r1901531443


##
clients/client-python/gravitino/api/credential/adls_token_credential.py:
##
@@ -27,12 +27,12 @@ class ADLSTokenCredential(Credential, ABC):
 
 ADLS_TOKEN_CREDENTIAL_TYPE: str = "adls-token"
 ADLS_DOMAIN: str = "dfs.core.windows.net"
-_STORAGE_ACCOUNT_NAME: str = "azure-storage-account-name"
-_SAS_TOKEN: str = "adls-sas-token"
+STORAGE_ACCOUNT_NAME: str = "azure-storage-account-name"

Review Comment:
   please don't use the internal `STORAGE_ACCOUNT_NAME` properties directly. seem to other credentials.






Re: [PR] [#5996] feat(python-client): Using credentail in python GVFS client. [gravitino]

2024-12-31 Thread via GitHub


yuqi1129 commented on PR #5997:
URL: https://github.com/apache/gravitino/pull/5997#issuecomment-2566488260

   @FANNG1 @jerryshao 
   Please help to review this PR.

