Re: [PR] [#5996] feat(python-client): Using credential in python GVFS client. [gravitino]
FANNG1 merged PR #5997: URL: https://github.com/apache/gravitino/pull/5997 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
Re: [PR] [#5996] feat(python-client): Using credential in python GVFS client. [gravitino]
FANNG1 commented on code in PR #5997:
URL: https://github.com/apache/gravitino/pull/5997#discussion_r190860

##
docs/how-to-use-gvfs.md:
##

@@ -455,18 +455,18 @@ to recompile the native libraries like `libhdfs` and others, and completely repl

 ### Configuration

-| Configuration item | Description | Default value | Required | Since version |
-|--------------------|-------------|---------------|----------|---------------|
-| `server_uri` | The Gravitino server URI, e.g. `http://localhost:8090`. | (none) | Yes | 0.6.0-incubating |
-| `metalake_name` | The metalake name which the fileset belongs to. | (none) | Yes | 0.6.0-incubating |
-| `cache_size` | The cache capacity of the Gravitino Virtual File System. | `20` | No | 0.6.0-incubating |
-| `cache_expired_time` | The value of time that the cache expires after accessing in the Gravitino Virtual File System. The value is in `seconds`. | `3600` | No | 0.6.0-incubating |
-| `auth_type` | The auth type to initialize the Gravitino client to use with the Gravitino Virtual File System. Currently supports `simple` and `oauth2` auth types. | `simple` | No | 0.6.0-incubating |
-| `oauth2_server_uri` | The auth server URI for the Gravitino client when using `oauth2` auth type. | (none) | Yes if you use `oauth2` auth type | 0.7.0-incubating |
-| `oauth2_credential` | The auth credential for the Gravitino client when using `oauth2` auth type. | (none) | Yes if you use `oauth2` auth type | 0.7.0-incubating |
-| `oauth2_path` | The auth server path for the Gravitino client when using `oauth2` auth type. Please remove the first slash `/` from the path, for example `oauth/token`. | (none) | Yes if you use `oauth2` auth type | 0.7.0-incubating |
-| `oauth2_scope` | The auth scope for the Gravitino client when using `oauth2` auth type with the Gravitino Virtual File System. | (none) | Yes if you use `oauth2` auth type | 0.7.0-incubating |
-| `credential_expired_time_ratio` | The ratio of expiration time for credential from Gravitino. This is used in the
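The configuration table above can be assembled into a plain client options dict. A minimal sketch, assuming only the keys documented in the table; all values are placeholders, and `credential_expired_time_ratio` is the key this PR introduces:

```python
# Sketch of a GVFS client options dict; keys come from the documented table,
# values are illustrative placeholders.
options = {
    "auth_type": "oauth2",
    "oauth2_server_uri": "http://localhost:8177",      # placeholder URI
    "oauth2_credential": "client-id:client-secret",    # placeholder secret
    "oauth2_path": "oauth/token",   # note: no leading slash, per the table
    "oauth2_scope": "test",
    "cache_size": 20,               # table default
    "cache_expired_time": 3600,     # seconds, table default
    "credential_expired_time_ratio": 0.5,
}
```

The dict would then be passed as the `options` argument when constructing the GVFS filesystem client.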
Re: [PR] [#5996] feat(python-client): Using credential in python GVFS client. [gravitino]
yuqi1129 commented on code in PR #5997:
URL: https://github.com/apache/gravitino/pull/5997#discussion_r1906592946

##
clients/client-python/gravitino/filesystem/gvfs_config.py:
##

@@ -44,3 +44,8 @@ class GVFSConfig:
     GVFS_FILESYSTEM_AZURE_ACCOUNT_NAME = "abs_account_name"
     GVFS_FILESYSTEM_AZURE_ACCOUNT_KEY = "abs_account_key"
+
+    # This configuration marks the expiration time of the credential. For instance, if the credential
+    # fetched from the Gravitino server has an expiration time of 3600 seconds and
+    # credential_expired_time_ratio is 0.5, then the credential is considered expired after 1800
+    # seconds and a new credential will be fetched.
+    GVFS_FILESYSTEM_CREDENTIAL_EXPIRED_TIME_RATIO = "credential_expired_time_ratio"

Review Comment:
There is already a configuration item `cache_expired_time` to control the expiration time of the value in the cache, so I'm not much in favor of `credential_cache_expire_ratio`.
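The ratio semantics described in the comment can be checked with quick arithmetic. A sketch; the function name is illustrative, not part of the client:

```python
def refresh_after_seconds(ttl_seconds: float, ratio: float) -> float:
    # A credential valid for ttl_seconds is refreshed once `ratio` of its
    # lifetime has elapsed.
    return ttl_seconds * ratio

# A 3600 s credential with ratio 0.5 is refreshed after 1800 s (half its
# lifetime); with ratio 0.9 it is refreshed after 3240 s, leaving only
# 360 s (6 minutes) of slack before the token actually expires.
```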
Re: [PR] [#5996] feat(python-client): Using credential in python GVFS client. [gravitino]
jerryshao commented on PR #5997:
URL: https://github.com/apache/gravitino/pull/5997#issuecomment-2576923879

You can make a final call if you think it's OK. @FANNG1
Re: [PR] [#5996] feat(python-client): Using credential in python GVFS client. [gravitino]
FANNG1 commented on code in PR #5997:
URL: https://github.com/apache/gravitino/pull/5997#discussion_r1906587776

##
clients/client-python/gravitino/filesystem/gvfs_config.py:
##

@@ -44,3 +44,8 @@ class GVFSConfig:
     GVFS_FILESYSTEM_AZURE_ACCOUNT_NAME = "abs_account_name"
     GVFS_FILESYSTEM_AZURE_ACCOUNT_KEY = "abs_account_key"
+
+    # This configuration marks the expiration time of the credential. For instance, if the credential
+    # fetched from the Gravitino server has an expiration time of 3600 seconds and
+    # credential_expired_time_ratio is 0.5, then the credential is considered expired after 1800
+    # seconds and a new credential will be fetched.
+    GVFS_FILESYSTEM_CREDENTIAL_EXPIRED_TIME_RATIO = "credential_expired_time_ratio"

Review Comment:
`credential will be considered as expired` seems not correct; it's more like removing the credential from the cache. Use `credential_cache_expire_ratio`, like the server side?

##
clients/client-python/gravitino/filesystem/gvfs.py:
##

@@ -1001,10 +1151,67 @@ def _get_abs_filesystem(self):
                 "ABS account key is not found in the options."
             )
-        return importlib.import_module("adlfs").AzureBlobFileSystem(
-            account_name=abs_account_name,
-            account_key=abs_account_key,
+        return (
+            TIME_WITHOUT_EXPIRATION,
+            importlib.import_module("adlfs").AzureBlobFileSystem(
+                account_name=abs_account_name,
+                account_key=abs_account_key,
+            ),
+        )
+
+    def _get_most_suitable_s3_credential(self, credentials: List[Credential]):
+        for credential in credentials:
+            # Prefer to use the token credential; if it does not exist, use the
+            # secret key credential.
+            if isinstance(credential, S3TokenCredential):
+                return credential
+
+        for credential in credentials:
+            if isinstance(credential, S3SecretKeyCredential):
+                return credential
+        return None
+
+    def _get_most_suitable_oss_credential(self, credentials: List[Credential]):
+        for credential in credentials:
+            # Prefer to use the token credential; if it does not exist, use the
+            # secret key credential.
+            if isinstance(credential, OSSTokenCredential):
+                return credential
+
+        for credential in credentials:
+            if isinstance(credential, OSSSecretKeyCredential):
+                return credential
+        return None
+
+    def _get_most_suitable_gcs_credential(self, credentials: List[Credential]):
+        for credential in credentials:
+            # Prefer to use the token credential; if it does not exist, return None.
+            if isinstance(credential, GCSTokenCredential):
+                return credential
+        return None
+
+    def _get_most_suitable_abs_credential(self, credentials: List[Credential]):
+        for credential in credentials:
+            # Prefer to use the token credential; if it does not exist, use the
+            # account key credential.
+            if isinstance(credential, ADLSTokenCredential):
+                return credential
+
+        for credential in credentials:
+            if isinstance(credential, AzureAccountKeyCredential):
+                return credential
+        return None
+
+    def _get_expire_time_by_ratio(self, expire_time: int):
+        if expire_time <= 0:
+            return TIME_WITHOUT_EXPIRATION
+
+        ratio = float(
+            self._options.get(
+                GVFSConfig.GVFS_FILESYSTEM_CREDENTIAL_EXPIRED_TIME_RATIO, 0.9

Review Comment:
I'm OK with it.
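The four `_get_most_suitable_*` helpers in the diff above share one shape: scan the credential list for the preferred type first, then fall back to the next type. That pattern can be sketched generically; the credential classes here are stand-ins, not Gravitino's:

```python
def pick_credential(credentials, preference_order):
    # Return the first credential whose type appears earliest in
    # preference_order, or None if nothing matches.
    for cred_type in preference_order:
        for credential in credentials:
            if isinstance(credential, cred_type):
                return credential
    return None


# Stand-in credential types, for illustration only.
class TokenCredential: ...
class SecretKeyCredential: ...
```

With a table such as `[TokenCredential, SecretKeyCredential]`, a token credential wins even when it appears later in the list, matching the diff's "prefer token" behavior while collapsing the four near-identical loops into one.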
Re: [PR] [#5996] feat(python-client): Using credential in python GVFS client. [gravitino]
FANNG1 commented on PR #5997:
URL: https://github.com/apache/gravitino/pull/5997#issuecomment-2576914185

LGTM except for minor comments. @jerryshao, do you have time to review again?
Re: [PR] [#5996] feat(python-client): Using credential in python GVFS client. [gravitino]
yuqi1129 commented on code in PR #5997:
URL: https://github.com/apache/gravitino/pull/5997#discussion_r1906580485

##
clients/client-python/gravitino/filesystem/gvfs_config.py:
##

@@ -44,3 +44,8 @@ class GVFSConfig:
     GVFS_FILESYSTEM_AZURE_ACCOUNT_NAME = "abs_account_name"
     GVFS_FILESYSTEM_AZURE_ACCOUNT_KEY = "abs_account_key"
+
+    # This configuration marks the expiration time of the credential. For instance, if the credential
+    # fetched from the Gravitino server has an expiration time of 3600 seconds and
+    # credential_expired_time_ratio is 0.5, then the credential is considered expired after 1800
+    # seconds and a new credential will be fetched.
+    GVFS_FILESYSTEM_CREDENTIAL_EXPIRED_TIME_RATIO = "credential_expired_time_ratio"

Review Comment:
It's also okay for me; I can make the changes. Could you please help verify the accuracy of the description above?
Re: [PR] [#5996] feat(python-client): Using credential in python GVFS client. [gravitino]
yuqi1129 commented on code in PR #5997:
URL: https://github.com/apache/gravitino/pull/5997#discussion_r1906579610

##
clients/client-python/gravitino/filesystem/gvfs.py:
##

@@ -1001,10 +1151,67 @@ def _get_abs_filesystem(self):
                 "ABS account key is not found in the options."
             )
-        return importlib.import_module("adlfs").AzureBlobFileSystem(
-            account_name=abs_account_name,
-            account_key=abs_account_key,
+        return (
+            TIME_WITHOUT_EXPIRATION,
+            importlib.import_module("adlfs").AzureBlobFileSystem(
+                account_name=abs_account_name,
+                account_key=abs_account_key,
+            ),
+        )
+
+    def _get_most_suitable_s3_credential(self, credentials: List[Credential]):
+        for credential in credentials:
+            # Prefer to use the token credential; if it does not exist, use the
+            # secret key credential.
+            if isinstance(credential, S3TokenCredential):
+                return credential
+
+        for credential in credentials:
+            if isinstance(credential, S3SecretKeyCredential):
+                return credential
+        return None
+
+    def _get_most_suitable_oss_credential(self, credentials: List[Credential]):
+        for credential in credentials:
+            # Prefer to use the token credential; if it does not exist, use the
+            # secret key credential.
+            if isinstance(credential, OSSTokenCredential):
+                return credential
+
+        for credential in credentials:
+            if isinstance(credential, OSSSecretKeyCredential):
+                return credential
+        return None
+
+    def _get_most_suitable_gcs_credential(self, credentials: List[Credential]):
+        for credential in credentials:
+            # Prefer to use the token credential; if it does not exist, return None.
+            if isinstance(credential, GCSTokenCredential):
+                return credential
+        return None
+
+    def _get_most_suitable_abs_credential(self, credentials: List[Credential]):
+        for credential in credentials:
+            # Prefer to use the token credential; if it does not exist, use the
+            # account key credential.
+            if isinstance(credential, ADLSTokenCredential):
+                return credential
+
+        for credential in credentials:
+            if isinstance(credential, AzureAccountKeyCredential):
+                return credential
+        return None
+
+    def _get_expire_time_by_ratio(self, expire_time: int):
+        if expire_time <= 0:
+            return TIME_WITHOUT_EXPIRATION
+
+        ratio = float(
+            self._options.get(
+                GVFSConfig.GVFS_FILESYSTEM_CREDENTIAL_EXPIRED_TIME_RATIO, 0.9

Review Comment:
Setting the default value to 0.5 as suggested?
Re: [PR] [#5996] feat(python-client): Using credential in python GVFS client. [gravitino]
FANNG1 commented on code in PR #5997:
URL: https://github.com/apache/gravitino/pull/5997#discussion_r1906577156

##
clients/client-python/gravitino/filesystem/gvfs.py:
##

@@ -1001,10 +1151,67 @@ def _get_abs_filesystem(self):
                 "ABS account key is not found in the options."
             )
-        return importlib.import_module("adlfs").AzureBlobFileSystem(
-            account_name=abs_account_name,
-            account_key=abs_account_key,
+        return (
+            TIME_WITHOUT_EXPIRATION,
+            importlib.import_module("adlfs").AzureBlobFileSystem(
+                account_name=abs_account_name,
+                account_key=abs_account_key,
+            ),
+        )
+
+    def _get_most_suitable_s3_credential(self, credentials: List[Credential]):
+        for credential in credentials:
+            # Prefer to use the token credential; if it does not exist, use the
+            # secret key credential.
+            if isinstance(credential, S3TokenCredential):
+                return credential
+
+        for credential in credentials:
+            if isinstance(credential, S3SecretKeyCredential):
+                return credential
+        return None
+
+    def _get_most_suitable_oss_credential(self, credentials: List[Credential]):
+        for credential in credentials:
+            # Prefer to use the token credential; if it does not exist, use the
+            # secret key credential.
+            if isinstance(credential, OSSTokenCredential):
+                return credential
+
+        for credential in credentials:
+            if isinstance(credential, OSSSecretKeyCredential):
+                return credential
+        return None
+
+    def _get_most_suitable_gcs_credential(self, credentials: List[Credential]):
+        for credential in credentials:
+            # Prefer to use the token credential; if it does not exist, return None.
+            if isinstance(credential, GCSTokenCredential):
+                return credential
+        return None
+
+    def _get_most_suitable_abs_credential(self, credentials: List[Credential]):
+        for credential in credentials:
+            # Prefer to use the token credential; if it does not exist, use the
+            # account key credential.
+            if isinstance(credential, ADLSTokenCredential):
+                return credential
+
+        for credential in credentials:
+            if isinstance(credential, AzureAccountKeyCredential):
+                return credential
+        return None
+
+    def _get_expire_time_by_ratio(self, expire_time: int):
+        if expire_time <= 0:
+            return TIME_WITHOUT_EXPIRATION
+
+        ratio = float(
+            self._options.get(
+                GVFSConfig.GVFS_FILESYSTEM_CREDENTIAL_EXPIRED_TIME_RATIO, 0.9

Review Comment:
I still think `0.9` is too high for the client side.

##
clients/client-python/gravitino/filesystem/gvfs_config.py:
##

@@ -44,3 +44,8 @@ class GVFSConfig:
     GVFS_FILESYSTEM_AZURE_ACCOUNT_NAME = "abs_account_name"
     GVFS_FILESYSTEM_AZURE_ACCOUNT_KEY = "abs_account_key"
+
+    # This configuration marks the expiration time of the credential. For instance, if the credential
+    # fetched from the Gravitino server has an expiration time of 3600 seconds and
+    # credential_expired_time_ratio is 0.5, then the credential is considered expired after 1800
+    # seconds and a new credential will be fetched.
+    GVFS_FILESYSTEM_CREDENTIAL_EXPIRED_TIME_RATIO = "credential_expired_time_ratio"

Review Comment:
`credential_expire_ratio`?
Re: [PR] [#5996] feat(python-client): Using credential in python GVFS client. [gravitino]
yuqi1129 commented on PR #5997:
URL: https://github.com/apache/gravitino/pull/5997#issuecomment-2576830961

All comments have been resolved. @FANNG1, please take another look, thanks.
Re: [PR] [#5996] feat(python-client): Using credential in python GVFS client. [gravitino]
yuqi1129 commented on code in PR #5997:
URL: https://github.com/apache/gravitino/pull/5997#discussion_r1906528594

##
clients/client-python/gravitino/filesystem/gvfs.py:
##

@@ -1001,10 +1168,61 @@ def _get_abs_filesystem(self):
                 "ABS account key is not found in the options."
             )
-        return importlib.import_module("adlfs").AzureBlobFileSystem(
-            account_name=abs_account_name,
-            account_key=abs_account_key,
+        return (
+            TIME_WITHOUT_EXPIRATION,
+            importlib.import_module("adlfs").AzureBlobFileSystem(
+                account_name=abs_account_name,
+                account_key=abs_account_key,
+            ),
         )

+    def _get_most_suitable_s3_credential(self, credentials: List[Credential]):
+        for credential in credentials:
+            # Prefer to use the token credential; if it does not exist, use the
+            # secret key credential.
+            if isinstance(credential, S3TokenCredential):
+                return credential
+
+        for credential in credentials:
+            if isinstance(credential, S3SecretKeyCredential):
+                return credential
+        return None
+
+    def _get_most_suitable_oss_credential(self, credentials: List[Credential]):
+        for credential in credentials:
+            # Prefer to use the token credential; if it does not exist, use the
+            # secret key credential.
+            if isinstance(credential, OSSTokenCredential):
+                return credential
+
+        for credential in credentials:
+            if isinstance(credential, OSSSecretKeyCredential):
+                return credential
+        return None
+
+    def _get_most_suitable_gcs_credential(self, credentials: List[Credential]):
+        for credential in credentials:
+            # Prefer to use the token credential; if it does not exist, return None.
+            if isinstance(credential, GCSTokenCredential):
+                return credential
+        return None
+
+    def _get_most_suitable_abs_credential(self, credentials: List[Credential]):
+        for credential in credentials:
+            # Prefer to use the token credential; if it does not exist, use the
+            # account key credential.
+            if isinstance(credential, ADLSTokenCredential):
+                return credential
+
+        for credential in credentials:
+            if isinstance(credential, AzureAccountKeyCredential):
+                return credential
+        return None
+
+    def _get_expire_time_by_ratio(self, expire_time: int):
+        if expire_time <= 0:
+            return TIME_WITHOUT_EXPIRATION
+        return time.time() * 1000 + (expire_time - time.time() * 1000) * 0.9

Review Comment:
I have added a new configuration item as suggested.
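The expression in the diff above converts a credential's absolute expiration (epoch milliseconds) into an earlier refresh deadline. A standalone sketch of that computation; `TIME_WITHOUT_EXPIRATION` here is an illustrative sentinel, not necessarily the client's value:

```python
import time

TIME_WITHOUT_EXPIRATION = float("inf")  # illustrative sentinel

def expire_time_by_ratio(expire_time_ms: float, ratio: float = 0.9) -> float:
    # Mirrors the diff: keep only `ratio` of the remaining lifetime, returning
    # an absolute epoch-ms deadline; non-positive input means "never expires".
    if expire_time_ms <= 0:
        return TIME_WITHOUT_EXPIRATION
    now_ms = time.time() * 1000
    return now_ms + (expire_time_ms - now_ms) * ratio
```

For a credential expiring one hour from now, a ratio of 0.5 schedules the refresh thirty minutes early; a ratio of 0.9 leaves only six minutes of slack, which is the concern raised in the review thread.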
Re: [PR] [#5996] feat(python-client): Using credential in python GVFS client. [gravitino]
yuqi1129 commented on code in PR #5997:
URL: https://github.com/apache/gravitino/pull/5997#discussion_r1906528134

##
clients/client-python/gravitino/filesystem/gvfs.py:
##

@@ -946,13 +1048,54 @@ def _get_s3_filesystem(self):
                 "AWS endpoint url is not found in the options."
             )
-        return importlib.import_module("s3fs").S3FileSystem(
-            key=aws_access_key_id,
-            secret=aws_secret_access_key,
-            endpoint_url=aws_endpoint_url,
+        return (
+            sys.maxsize,
+            importlib.import_module("s3fs").S3FileSystem(
+                key=aws_access_key_id,
+                secret=aws_secret_access_key,
+                endpoint_url=aws_endpoint_url,
+            ),
         )

-    def _get_oss_filesystem(self):
+    def _get_oss_filesystem(self, fileset_catalog: Catalog, identifier: NameIdentifier):
+        # Can get credential from the fileset
+        try:
+            fileset: GenericFileset = fileset_catalog.as_fileset_catalog().load_fileset(
+                NameIdentifier.of(identifier.namespace().level(2), identifier.name())
+            )
+            credentials = fileset.support_credentials().get_credentials()
+        except (NoSuchCredentialException, CatalogNotInUseException) as e:
+            logger.warning("Failed to get credentials from fileset: %s", e)
+            credentials = []
+
+        credential = self._get_most_suitable_oss_credential(credentials)
+        if credential is not None:
+            oss_endpoint = fileset_catalog.properties()["oss-endpoint"]
+            expire_time = self._get_expire_time_by_ratio(credential.expire_time_in_ms())
+            if isinstance(credential, OSSTokenCredential):
+                fs = importlib.import_module("ossfs").OSSFileSystem(
+                    key=credential.access_key_id(),
+                    secret=credential.secret_access_key(),
+                    token=credential.security_token(),
+                    endpoint=oss_endpoint,
+                )
+                return (expire_time, fs)
+            if isinstance(credential, OSSSecretKeyCredential):
+                return (
+                    expire_time,
+                    importlib.import_module("ossfs").OSSFileSystem(
+                        key=credential.access_key_id(),
+                        secret=credential.secret_access_key(),
+                        endpoint=oss_endpoint,
+                    ),
+                )
+
+        oss_endpoint_url = self._options.get(GVFSConfig.GVFS_FILESYSTEM_OSS_ENDPOINT)

Review Comment:
done
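The `_get_oss_filesystem` diff above follows a fallback pattern: try the server-vended credentials first, and on any fetch failure degrade gracefully to the static client-side options. A minimal sketch of that control flow; the function names and the exception type are illustrative stand-ins:

```python
import logging

logger = logging.getLogger(__name__)

def resolve_credential(fetch_credentials, static_options):
    # Prefer a credential vended by the server; any fetch failure falls back
    # to the static client-side configuration, matching the diff's shape.
    try:
        credentials = fetch_credentials()
    except RuntimeError as exc:  # stand-in for NoSuchCredentialException etc.
        logger.warning("Failed to get credentials from fileset: %s", exc)
        credentials = []
    if credentials:
        return credentials[0]
    return static_options
```

The design choice is that a missing or unreachable credential provider is logged but never fatal: the client still works for deployments that only configure keys on the client side.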
Re: [PR] [#5996] feat(python-client): Using credential in python GVFS client. [gravitino]
yuqi1129 commented on code in PR #5997:
URL: https://github.com/apache/gravitino/pull/5997#discussion_r1906527983

##
clients/client-python/tests/integration/test_gvfs_with_gcs_credential.py:
##

@@ -0,0 +1,112 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied. See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+import logging
+import os
+from random import randint
+import unittest
+
+from gcsfs import GCSFileSystem
+
+from gravitino import Catalog, Fileset, GravitinoClient
+from gravitino.filesystem import gvfs
+from tests.integration.test_gvfs_with_gcs import TestGvfsWithGCS
+
+logger = logging.getLogger(__name__)
+
+
+def gcs_with_credential_is_configured():
+    return all(
+        [
+            os.environ.get("GCS_STS_SERVICE_ACCOUNT_JSON_PATH") is not None,
+            os.environ.get("GCS_STS_BUCKET_NAME") is not None,
+        ]
+    )
+
+
+@unittest.skipUnless(gcs_with_credential_is_configured(), "GCS is not configured.")
+class TestGvfsWithGCSCredential(TestGvfsWithGCS):
+    # Before running this test, please make sure gcp-bundle-x.jar has been
+    # copied to the $GRAVITINO_HOME/catalogs/hadoop/libs/ directory
+    key_file = os.environ.get("GCS_STS_SERVICE_ACCOUNT_JSON_PATH")
+    bucket_name = os.environ.get("GCS_STS_BUCKET_NAME")
+    metalake_name: str = "TestGvfsWithGCSCredential_metalake" + str(randint(1, 1))
+
+    @classmethod
+    def _init_test_entities(cls):
+        cls.gravitino_admin_client.create_metalake(
+            name=cls.metalake_name, comment="", properties={}
+        )
+        cls.gravitino_client = GravitinoClient(
+            uri="http://localhost:8090", metalake_name=cls.metalake_name
+        )
+
+        cls.config = {}
+        cls.conf = {}
+        catalog = cls.gravitino_client.create_catalog(
+            name=cls.catalog_name,
+            catalog_type=Catalog.Type.FILESET,
+            provider=cls.catalog_provider,
+            comment="",
+            properties={
+                "filesystem-providers": "gcs",
+                "gcs-credential-file-path": cls.key_file,
+                "gcs-service-account-file": cls.key_file,
+                "credential-providers": "gcs-token",
+            },
+        )
+        catalog.as_schemas().create_schema(
+            schema_name=cls.schema_name, comment="", properties={}
+        )
+
+        cls.fileset_storage_location: str = (
+            f"gs://{cls.bucket_name}/{cls.catalog_name}/{cls.schema_name}/{cls.fileset_name}"
+        )
+        cls.fileset_gvfs_location = (
+            f"gvfs://fileset/{cls.catalog_name}/{cls.schema_name}/{cls.fileset_name}"
+        )
+        catalog.as_fileset_catalog().create_fileset(
+            ident=cls.fileset_ident,
+            fileset_type=Fileset.Type.MANAGED,
+            comment=cls.fileset_comment,
+            storage_location=cls.fileset_storage_location,
+            properties=cls.fileset_properties,
+        )
+
+        cls.fs = GCSFileSystem(token=cls.key_file)
+
+    def test_mkdir(self):

Review Comment:
I have added some details on why we need to overwrite it.

##
clients/client-python/gravitino/filesystem/gvfs.py:
##

@@ -946,13 +1048,54 @@ def _get_s3_filesystem(self):
                 "AWS endpoint url is not found in the options."
             )
-        return importlib.import_module("s3fs").S3FileSystem(
-            key=aws_access_key_id,
-            secret=aws_secret_access_key,
-            endpoint_url=aws_endpoint_url,
+        return (
+            sys.maxsize,
+            importlib.import_module("s3fs").S3FileSystem(
+                key=aws_access_key_id,
+                secret=aws_secret_access_key,
+                endpoint_url=aws_endpoint_url,

Review Comment:
done
Re: [PR] [#5996] feat(python-client): Using credential in python GVFS client. [gravitino]
yuqi1129 commented on code in PR #5997:
URL: https://github.com/apache/gravitino/pull/5997#discussion_r1906328456

##
clients/client-python/tests/integration/test_gvfs_with_gcs_credential.py:
##

@@ -0,0 +1,112 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied. See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+import logging
+import os
+from random import randint
+import unittest
+
+from gcsfs import GCSFileSystem
+
+from gravitino import Catalog, Fileset, GravitinoClient
+from gravitino.filesystem import gvfs
+from tests.integration.test_gvfs_with_gcs import TestGvfsWithGCS
+
+logger = logging.getLogger(__name__)
+
+
+def gcs_with_credential_is_configured():
+    return all(
+        [
+            os.environ.get("GCS_STS_SERVICE_ACCOUNT_JSON_PATH") is not None,
+            os.environ.get("GCS_STS_BUCKET_NAME") is not None,
+        ]
+    )
+
+
+@unittest.skipUnless(gcs_with_credential_is_configured(), "GCS is not configured.")
+class TestGvfsWithGCSCredential(TestGvfsWithGCS):
+    # Before running this test, please make sure gcp-bundle-x.jar has been
+    # copied to the $GRAVITINO_HOME/catalogs/hadoop/libs/ directory
+    key_file = os.environ.get("GCS_STS_SERVICE_ACCOUNT_JSON_PATH")
+    bucket_name = os.environ.get("GCS_STS_BUCKET_NAME")

Review Comment:
done

##
clients/client-python/gravitino/filesystem/gvfs.py:
##

@@ -918,11 +992,41 @@ def _get_gcs_filesystem(self):
             raise GravitinoRuntimeException(
                 "Service account key is not found in the options."
             )
-        return importlib.import_module("gcsfs").GCSFileSystem(
-            token=service_account_key_path
+        return (
+            TIME_WITHOUT_EXPIRATION,
+            importlib.import_module("gcsfs").GCSFileSystem(
+                token=service_account_key_path
+            ),
         )

-    def _get_s3_filesystem(self):
+    def _get_s3_filesystem(self, fileset_catalog: Catalog, identifier: NameIdentifier):
+        try:
+            fileset: GenericFileset = fileset_catalog.as_fileset_catalog().load_fileset(
+                NameIdentifier.of(identifier.namespace().level(2), identifier.name())
+            )
+            credentials = fileset.support_credentials().get_credentials()
+        except (NoSuchCredentialException, CatalogNotInUseException) as e:
+            logger.warning("Failed to get credentials from fileset: %s", e)

Review Comment:
done
Re: [PR] [#5996] feat(python-client): Using credential in python GVFS client. [gravitino]
FANNG1 commented on PR #5997:
URL: https://github.com/apache/gravitino/pull/5997#issuecomment-2575308487

> For S3, I have tested again. Both Java and Python clients can omit the endpoint. I'm hesitant to change this old behaviour; maybe another PR to optimize it would be more appropriate.

After this PR, does the client side need to configure some catalog properties like `region` and `endpoint`?
Re: [PR] [#5996] feat(python-client): Using credential in python GVFS client. [gravitino]
FANNG1 commented on PR #5997:
URL: https://github.com/apache/gravitino/pull/5997#issuecomment-2575304910

> I have said that I just want to keep the configuration for OSS in Java and Python consistent 0.7.0

I agree the OSS endpoint is required. What I doubt is that in some cases you use the OSS endpoint from the server side, while in other conditions you use the OSS endpoint from the client side.
Re: [PR] [#5996] feat(python-client): Using credentail in python GVFS client. [gravitino]
FANNG1 commented on code in PR #5997:
URL: https://github.com/apache/gravitino/pull/5997#discussion_r1905456155
##
clients/client-python/gravitino/filesystem/gvfs.py:
##
@@ -1001,10 +1168,61 @@ def _get_abs_filesystem(self):
                "ABS account key is not found in the options."
            )
-        return importlib.import_module("adlfs").AzureBlobFileSystem(
-            account_name=abs_account_name,
-            account_key=abs_account_key,
+        return (
+            TIME_WITHOUT_EXPIRATION,
+            importlib.import_module("adlfs").AzureBlobFileSystem(
+                account_name=abs_account_name,
+                account_key=abs_account_key,
+            ),
         )

+    def _get_most_suitable_s3_credential(self, credentials: List[Credential]):
+        for credential in credentials:
+            # Prefer the token credential; if it does not exist, use the
+            # secret key credential.
+            if isinstance(credential, S3TokenCredential):
+                return credential
+
+        for credential in credentials:
+            if isinstance(credential, S3SecretKeyCredential):
+                return credential
+        return None
+
+    def _get_most_suitable_oss_credential(self, credentials: List[Credential]):
+        for credential in credentials:
+            # Prefer the token credential; if it does not exist, use the
+            # secret key credential.
+            if isinstance(credential, OSSTokenCredential):
+                return credential
+
+        for credential in credentials:
+            if isinstance(credential, OSSSecretKeyCredential):
+                return credential
+        return None
+
+    def _get_most_suitable_gcs_credential(self, credentials: List[Credential]):
+        for credential in credentials:
+            # Prefer the token credential; if it does not exist, return None.
+            if isinstance(credential, GCSTokenCredential):
+                return credential
+        return None
+
+    def _get_most_suitable_abs_credential(self, credentials: List[Credential]):
+        for credential in credentials:
+            # Prefer the token credential; if it does not exist, use the
+            # account key credential.
+            if isinstance(credential, ADLSTokenCredential):
+                return credential
+
+        for credential in credentials:
+            if isinstance(credential, AzureAccountKeyCredential):
+                return credential
+        return None
+
+    def _get_expire_time_by_ratio(self, expire_time: int):
+        if expire_time <= 0:
+            return TIME_WITHOUT_EXPIRATION
+        return time.time() * 1000 + (expire_time - time.time() * 1000) * 0.9
Review Comment:
If changed to 0.9, only 6 minutes are left to use the filesystem.
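The token-over-secret-key preference that recurs in each `_get_most_suitable_*_credential` helper above can be factored into one generic selector. A minimal sketch, using hypothetical stand-in classes rather than the real Gravitino credential types:

```python
# Hypothetical stand-ins for Gravitino's credential classes
# (S3TokenCredential, S3SecretKeyCredential, ...).
class TokenCredential:
    pass


class SecretKeyCredential:
    pass


def pick_credential(credentials, preferred_types):
    """Return the first credential whose type appears earliest in
    preferred_types, or None if nothing matches. This mirrors the
    'token first, then secret key' scan in the helpers above."""
    for cred_type in preferred_types:
        for credential in credentials:
            if isinstance(credential, cred_type):
                return credential
    return None


# A token credential wins even when listed after a secret key credential.
creds = [SecretKeyCredential(), TokenCredential()]
chosen = pick_credential(creds, (TokenCredential, SecretKeyCredential))
```

Scanning once per preferred type (rather than once over the list with a priority map) keeps the preference order explicit, which is the same trade-off the PR's helpers make.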
Re: [PR] [#5996] feat(python-client): Using credentail in python GVFS client. [gravitino]
FANNG1 commented on code in PR #5997:
URL: https://github.com/apache/gravitino/pull/5997#discussion_r1905443289
##
clients/client-python/gravitino/filesystem/gvfs.py:
##
@@ -918,11 +992,41 @@ def _get_gcs_filesystem(self):
            raise GravitinoRuntimeException(
                "Service account key is not found in the options."
            )
-        return importlib.import_module("gcsfs").GCSFileSystem(
-            token=service_account_key_path
+        return (
+            TIME_WITHOUT_EXPIRATION,
+            importlib.import_module("gcsfs").GCSFileSystem(
+                token=service_account_key_path
+            ),
         )

-    def _get_s3_filesystem(self):
+    def _get_s3_filesystem(self, fileset_catalog: Catalog, identifier: NameIdentifier):
+        try:
+            fileset: GenericFileset = fileset_catalog.as_fileset_catalog().load_fileset(
+                NameIdentifier.of(identifier.namespace().level(2), identifier.name())
+            )
+            credentials = fileset.support_credentials().get_credentials()
+        except (NoSuchCredentialException, CatalogNotInUseException) as e:
+            logger.warning("Failed to get credentials from fileset: %s", e)
Review Comment:
If you couldn't get credentials from the server, e.g. because of `CatalogNotInUseException`, what action do you expect?
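One possible answer to the question above is to degrade gracefully: treat a failed credential fetch as "no dynamic credentials" and let the caller fall back to the statically configured keys. A sketch of that pattern, with hypothetical exception classes standing in for the Gravitino ones:

```python
import logging

logger = logging.getLogger(__name__)


# Stand-ins for the Gravitino exception types discussed above.
class NoSuchCredentialException(Exception):
    pass


class CatalogNotInUseException(Exception):
    pass


def get_credentials_or_none(load_credentials):
    """Try to fetch dynamic credentials from the server; on a known
    failure, log a warning and return None so the caller can fall back
    to the static client-side configuration."""
    try:
        return load_credentials()
    except (NoSuchCredentialException, CatalogNotInUseException) as e:
        logger.warning("Failed to get credentials from fileset: %s", e)
        return None


def load_fails():
    raise NoSuchCredentialException("no credential providers configured")


creds = get_credentials_or_none(load_fails)  # returns None; caller falls back
```

Whether `CatalogNotInUseException` belongs in the fallback set (versus being re-raised) is exactly the open question in the thread; the sketch only shows the fallback mechanism itself.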
Re: [PR] [#5996] feat(python-client): Using credentail in python GVFS client. [gravitino]
yuqi1129 commented on code in PR #5997:
URL: https://github.com/apache/gravitino/pull/5997#discussion_r1905112047
##
clients/client-python/gravitino/filesystem/gvfs.py:
##
@@ -918,11 +992,41 @@ def _get_gcs_filesystem(self):
            raise GravitinoRuntimeException(
                "Service account key is not found in the options."
            )
-        return importlib.import_module("gcsfs").GCSFileSystem(
-            token=service_account_key_path
+        return (
+            TIME_WITHOUT_EXPIRATION,
+            importlib.import_module("gcsfs").GCSFileSystem(
+                token=service_account_key_path
+            ),
         )

-    def _get_s3_filesystem(self):
+    def _get_s3_filesystem(self, fileset_catalog: Catalog, identifier: NameIdentifier):
+        try:
+            fileset: GenericFileset = fileset_catalog.as_fileset_catalog().load_fileset(
+                NameIdentifier.of(identifier.namespace().level(2), identifier.name())
+            )
+            credentials = fileset.support_credentials().get_credentials()
+        except (NoSuchCredentialException, CatalogNotInUseException) as e:
+            logger.warning("Failed to get credentials from fileset: %s", e)
Review Comment:
Can you clearly describe the issue? If I'm not wrong, you mean I should only
catch `NoSuchCredentialException` and should not catch
`CatalogNotInUseException`?
Re: [PR] [#5996] feat(python-client): Using credentail in python GVFS client. [gravitino]
yuqi1129 commented on PR #5997: URL: https://github.com/apache/gravitino/pull/5997#issuecomment-2574710106 For S3, I have tested again. Both Java and Python clients can omit the endpoint. I'm hesitant to change this old behaviour; maybe a separate PR to optimize it would be more proper.
Re: [PR] [#5996] feat(python-client): Using credentail in python GVFS client. [gravitino]
yuqi1129 commented on PR #5997: URL: https://github.com/apache/gravitino/pull/5997#issuecomment-2574695190 @FANNG1 (screenshot: https://github.com/user-attachments/assets/6fa9cd71-36d8-4810-8b43-ab9f190035bc) You can comment out L100 in `GravitinoVirtualFileSystemOSSIT` and see what happens (screenshot: https://github.com/user-attachments/assets/49c7575e-aa50-42dc-8961-f3abc4225350). I have said that I just want to keep the configuration for OSS in Java and Python consistent.
Re: [PR] [#5996] feat(python-client): Using credentail in python GVFS client. [gravitino]
FANNG1 commented on code in PR #5997:
URL: https://github.com/apache/gravitino/pull/5997#discussion_r1905080858
##
clients/client-python/gravitino/filesystem/gvfs.py:
##
@@ -918,11 +992,41 @@ def _get_gcs_filesystem(self):
            raise GravitinoRuntimeException(
                "Service account key is not found in the options."
            )
-        return importlib.import_module("gcsfs").GCSFileSystem(
-            token=service_account_key_path
+        return (
+            TIME_WITHOUT_EXPIRATION,
+            importlib.import_module("gcsfs").GCSFileSystem(
+                token=service_account_key_path
+            ),
         )

-    def _get_s3_filesystem(self):
+    def _get_s3_filesystem(self, fileset_catalog: Catalog, identifier: NameIdentifier):
+        try:
+            fileset: GenericFileset = fileset_catalog.as_fileset_catalog().load_fileset(
+                NameIdentifier.of(identifier.namespace().level(2), identifier.name())
+            )
+            credentials = fileset.support_credentials().get_credentials()
+        except (NoSuchCredentialException, CatalogNotInUseException) as e:
+            logger.warning("Failed to get credentials from fileset: %s", e)
Review Comment:
The exception you catch is `(NoSuchCredentialException,
CatalogNotInUseException)`
Re: [PR] [#5996] feat(python-client): Using credentail in python GVFS client. [gravitino]
yuqi1129 commented on code in PR #5997:
URL: https://github.com/apache/gravitino/pull/5997#discussion_r1905046331
##
clients/client-python/tests/integration/test_gvfs_with_s3_credential.py:
##
@@ -0,0 +1,151 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied. See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+import logging
+import os
+from random import randint
+import unittest
+
+from s3fs import S3FileSystem
+
+from gravitino import (
+    gvfs,
+    GravitinoClient,
+    Catalog,
+    Fileset,
+)
+from gravitino.filesystem.gvfs_config import GVFSConfig
+from tests.integration.test_gvfs_with_s3 import TestGvfsWithS3
+
+logger = logging.getLogger(__name__)
+
+
+def s3_with_credential_is_configured():
+    return all(
+        [
+            os.environ.get("S3_STS_ACCESS_KEY_ID") is not None,
+            os.environ.get("S3_STS_SECRET_ACCESS_KEY") is not None,
+            os.environ.get("S3_STS_ENDPOINT") is not None,
+            os.environ.get("S3_STS_BUCKET_NAME") is not None,
+            os.environ.get("S3_STS_REGION") is not None,
+            os.environ.get("S3_STS_ROLE_ARN") is not None,
+        ]
+    )
+
+
[email protected](s3_with_credential_is_configured(), "S3 is not configured.")
+class TestGvfsWithS3Credential(TestGvfsWithS3):
+    # Before running this test, please make sure aws-bundle-x.jar has been
+    # copied to the $GRAVITINO_HOME/catalogs/hadoop/libs/ directory
+    s3_access_key = os.environ.get("S3_STS_ACCESS_KEY_ID")
+    s3_secret_key = os.environ.get("S3_STS_SECRET_ACCESS_KEY")
+    s3_endpoint = os.environ.get("S3_STS_ENDPOINT")
+    bucket_name = os.environ.get("S3_STS_BUCKET_NAME")
+    s3_sts_region = os.environ.get("S3_STS_REGION")
+    s3_role_arn = os.environ.get("S3_STS_ROLE_ARN")
+
+    metalake_name: str = "TestGvfsWithS3Credential_metalake" + str(randint(1, 1))
+
+    def setUp(self):
+        self.options = {
+            f"{GVFSConfig.GVFS_FILESYSTEM_S3_ACCESS_KEY}": self.s3_access_key,
+            f"{GVFSConfig.GVFS_FILESYSTEM_S3_SECRET_KEY}": self.s3_secret_key,
+            f"{GVFSConfig.GVFS_FILESYSTEM_S3_ENDPOINT}": self.s3_endpoint,
+        }
+
+    @classmethod
+    def _init_test_entities(cls):
+        cls.gravitino_admin_client.create_metalake(
+            name=cls.metalake_name, comment="", properties={}
+        )
+        cls.gravitino_client = GravitinoClient(
+            uri="http://localhost:8090", metalake_name=cls.metalake_name
+        )
+
+        cls.config = {}
+        cls.conf = {}
+        catalog = cls.gravitino_client.create_catalog(
+            name=cls.catalog_name,
+            catalog_type=Catalog.Type.FILESET,
+            provider=cls.catalog_provider,
+            comment="",
+            properties={
+                "filesystem-providers": "s3",
+                "s3-access-key-id": cls.s3_access_key,
+                "s3-secret-access-key": cls.s3_secret_key,
+                "s3-endpoint": cls.s3_endpoint,
+                "s3-region": cls.s3_sts_region,
+                "s3-role-arn": cls.s3_role_arn,
+                "credential-providers": "s3-token",
+            },
+        )
+        catalog.as_schemas().create_schema(
+            schema_name=cls.schema_name, comment="", properties={}
+        )
+
+        cls.fileset_storage_location: str = (
+            f"s3a://{cls.bucket_name}/{cls.catalog_name}/{cls.schema_name}/{cls.fileset_name}"
+        )
+        cls.fileset_gvfs_location = (
+            f"gvfs://fileset/{cls.catalog_name}/{cls.schema_name}/{cls.fileset_name}"
+        )
+        catalog.as_fileset_catalog().create_fileset(
+            ident=cls.fileset_ident,
+            fileset_type=Fileset.Type.MANAGED,
+            comment=cls.fileset_comment,
+            storage_location=cls.fileset_storage_location,
+            properties=cls.fileset_properties,
+        )
+
+        cls.fs = S3FileSystem(
+            key=cls.s3_access_key,
+            secret=cls.s3_secret_key,
+            endpoint_url=cls.s3_endpoint,
+        )
+
+    # The following tests are copied from tests/integration/test_gvfs_with_s3.py,
+    # with some modifications, as `mkdir` and `makedirs` have different behavior
+    # in S3; other cloud storage like GCS, ABS, and OSS are similar.
+    def
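The environment guard used by the test above generalizes to any list of required variables. A small sketch of the same idea, with hypothetical variable names for illustration:

```python
import os


def env_is_configured(required_vars):
    """True only when every required environment variable is set,
    mirroring the s3_with_credential_is_configured() guard above."""
    return all(os.environ.get(name) is not None for name in required_vars)


# Hypothetical variable names, not the real S3_STS_* set.
REQUIRED = ["DEMO_ACCESS_KEY_ID", "DEMO_SECRET_ACCESS_KEY"]
configured = env_is_configured(REQUIRED)
```

Evaluating the guard at import time (as the decorator argument does in the test file) means the skip decision is fixed when the module loads, so the variables must be exported before the test runner starts.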
Re: [PR] [#5996] feat(python-client): Using credentail in python GVFS client. [gravitino]
yuqi1129 commented on code in PR #5997:
URL: https://github.com/apache/gravitino/pull/5997#discussion_r1905061484
##
clients/client-python/gravitino/filesystem/gvfs.py:
##
@@ -1001,10 +1168,61 @@ def _get_abs_filesystem(self):
                "ABS account key is not found in the options."
            )
-        return importlib.import_module("adlfs").AzureBlobFileSystem(
-            account_name=abs_account_name,
-            account_key=abs_account_key,
+        return (
+            TIME_WITHOUT_EXPIRATION,
+            importlib.import_module("adlfs").AzureBlobFileSystem(
+                account_name=abs_account_name,
+                account_key=abs_account_key,
+            ),
         )

+    def _get_most_suitable_s3_credential(self, credentials: List[Credential]):
+        for credential in credentials:
+            # Prefer the token credential; if it does not exist, use the
+            # secret key credential.
+            if isinstance(credential, S3TokenCredential):
+                return credential
+
+        for credential in credentials:
+            if isinstance(credential, S3SecretKeyCredential):
+                return credential
+        return None
+
+    def _get_most_suitable_oss_credential(self, credentials: List[Credential]):
+        for credential in credentials:
+            # Prefer the token credential; if it does not exist, use the
+            # secret key credential.
+            if isinstance(credential, OSSTokenCredential):
+                return credential
+
+        for credential in credentials:
+            if isinstance(credential, OSSSecretKeyCredential):
+                return credential
+        return None
+
+    def _get_most_suitable_gcs_credential(self, credentials: List[Credential]):
+        for credential in credentials:
+            # Prefer the token credential; if it does not exist, return None.
+            if isinstance(credential, GCSTokenCredential):
+                return credential
+        return None
+
+    def _get_most_suitable_abs_credential(self, credentials: List[Credential]):
+        for credential in credentials:
+            # Prefer the token credential; if it does not exist, use the
+            # account key credential.
+            if isinstance(credential, ADLSTokenCredential):
+                return credential
+
+        for credential in credentials:
+            if isinstance(credential, AzureAccountKeyCredential):
+                return credential
+        return None
+
+    def _get_expire_time_by_ratio(self, expire_time: int):
+        if expire_time <= 0:
+            return TIME_WITHOUT_EXPIRATION
+        return time.time() * 1000 + (expire_time - time.time() * 1000) * 0.9
Review Comment:
What's the influence if we adjust the value from 0.5 to 0.9?
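For concreteness, the effect of the ratio can be worked through with fixed numbers. A sketch of the formula from `_get_expire_time_by_ratio` with the clock pinned to zero (the real code uses `time.time() * 1000` for "now"):

```python
def expire_time_by_ratio(expire_time_ms, now_ms, ratio):
    """Refresh deadline = now + ratio * (remaining lifetime).
    The cached filesystem is reused until this deadline and rebuilt
    afterwards, leaving (1 - ratio) of the lifetime as safety margin."""
    return now_ms + (expire_time_ms - now_ms) * ratio


ONE_HOUR_MS = 60 * 60 * 1000

# With a one-hour credential: ratio 0.5 refreshes at the 30-minute mark,
# ratio 0.9 at the 54-minute mark, leaving only 6 minutes of margin.
deadline_half = expire_time_by_ratio(ONE_HOUR_MS, 0, 0.5)
deadline_high = expire_time_by_ratio(ONE_HOUR_MS, 0, 0.9)
```

This is the trade-off debated in the thread: a higher ratio reuses each credential longer but shrinks the window for in-flight operations to finish before the credential actually expires.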
Re: [PR] [#5996] feat(python-client): Using credentail in python GVFS client. [gravitino]
yuqi1129 commented on code in PR #5997:
URL: https://github.com/apache/gravitino/pull/5997#discussion_r1905059319
##
clients/client-python/gravitino/filesystem/gvfs.py:
##
@@ -1001,10 +1168,61 @@ def _get_abs_filesystem(self):
                "ABS account key is not found in the options."
            )
-        return importlib.import_module("adlfs").AzureBlobFileSystem(
-            account_name=abs_account_name,
-            account_key=abs_account_key,
+        return (
+            TIME_WITHOUT_EXPIRATION,
+            importlib.import_module("adlfs").AzureBlobFileSystem(
+                account_name=abs_account_name,
+                account_key=abs_account_key,
+            ),
         )

+    def _get_most_suitable_s3_credential(self, credentials: List[Credential]):
+        for credential in credentials:
+            # Prefer the token credential; if it does not exist, use the
+            # secret key credential.
+            if isinstance(credential, S3TokenCredential):
+                return credential
+
+        for credential in credentials:
+            if isinstance(credential, S3SecretKeyCredential):
+                return credential
+        return None
+
+    def _get_most_suitable_oss_credential(self, credentials: List[Credential]):
+        for credential in credentials:
+            # Prefer the token credential; if it does not exist, use the
+            # secret key credential.
+            if isinstance(credential, OSSTokenCredential):
+                return credential
+
+        for credential in credentials:
+            if isinstance(credential, OSSSecretKeyCredential):
+                return credential
+        return None
+
+    def _get_most_suitable_gcs_credential(self, credentials: List[Credential]):
+        for credential in credentials:
+            # Prefer the token credential; if it does not exist, return None.
+            if isinstance(credential, GCSTokenCredential):
+                return credential
+        return None
+
+    def _get_most_suitable_abs_credential(self, credentials: List[Credential]):
+        for credential in credentials:
+            # Prefer the token credential; if it does not exist, use the
+            # account key credential.
+            if isinstance(credential, ADLSTokenCredential):
+                return credential
+
+        for credential in credentials:
+            if isinstance(credential, AzureAccountKeyCredential):
+                return credential
+        return None
+
+    def _get_expire_time_by_ratio(self, expire_time: int):
+        if expire_time <= 0:
+            return TIME_WITHOUT_EXPIRATION
+        return time.time() * 1000 + (expire_time - time.time() * 1000) * 0.9
Review Comment:
Another point: I think there were similar comments in the Java implementation about changing from 0.5 to 0.9. What are your considerations regarding this?
Re: [PR] [#5996] feat(python-client): Using credentail in python GVFS client. [gravitino]
yuqi1129 commented on code in PR #5997:
URL: https://github.com/apache/gravitino/pull/5997#discussion_r1905048644
##
clients/client-python/gravitino/filesystem/gvfs.py:
##
@@ -918,11 +992,41 @@ def _get_gcs_filesystem(self):
            raise GravitinoRuntimeException(
                "Service account key is not found in the options."
            )
-        return importlib.import_module("gcsfs").GCSFileSystem(
-            token=service_account_key_path
+        return (
+            TIME_WITHOUT_EXPIRATION,
+            importlib.import_module("gcsfs").GCSFileSystem(
+                token=service_account_key_path
+            ),
         )

-    def _get_s3_filesystem(self):
+    def _get_s3_filesystem(self, fileset_catalog: Catalog, identifier: NameIdentifier):
+        try:
+            fileset: GenericFileset = fileset_catalog.as_fileset_catalog().load_fileset(
+                NameIdentifier.of(identifier.namespace().level(2), identifier.name())
+            )
+            credentials = fileset.support_credentials().get_credentials()
+        except (NoSuchCredentialException, CatalogNotInUseException) as e:
+            logger.warning("Failed to get credentials from fileset: %s", e)
Review Comment:
Directly throw an exception? What if the fileset has not set credentials, as in an older version?
Re: [PR] [#5996] feat(python-client): Using credentail in python GVFS client. [gravitino]
yuqi1129 commented on code in PR #5997:
URL: https://github.com/apache/gravitino/pull/5997#discussion_r1905047524
##
clients/client-python/gravitino/filesystem/gvfs.py:
##
@@ -1001,10 +1168,61 @@ def _get_abs_filesystem(self):
                "ABS account key is not found in the options."
            )
-        return importlib.import_module("adlfs").AzureBlobFileSystem(
-            account_name=abs_account_name,
-            account_key=abs_account_key,
+        return (
+            TIME_WITHOUT_EXPIRATION,
+            importlib.import_module("adlfs").AzureBlobFileSystem(
+                account_name=abs_account_name,
+                account_key=abs_account_key,
+            ),
         )

+    def _get_most_suitable_s3_credential(self, credentials: List[Credential]):
+        for credential in credentials:
+            # Prefer the token credential; if it does not exist, use the
+            # secret key credential.
+            if isinstance(credential, S3TokenCredential):
+                return credential
+
+        for credential in credentials:
+            if isinstance(credential, S3SecretKeyCredential):
+                return credential
+        return None
+
+    def _get_most_suitable_oss_credential(self, credentials: List[Credential]):
+        for credential in credentials:
+            # Prefer the token credential; if it does not exist, use the
+            # secret key credential.
+            if isinstance(credential, OSSTokenCredential):
+                return credential
+
+        for credential in credentials:
+            if isinstance(credential, OSSSecretKeyCredential):
+                return credential
+        return None
+
+    def _get_most_suitable_gcs_credential(self, credentials: List[Credential]):
+        for credential in credentials:
+            # Prefer the token credential; if it does not exist, return None.
+            if isinstance(credential, GCSTokenCredential):
+                return credential
+        return None
+
+    def _get_most_suitable_abs_credential(self, credentials: List[Credential]):
+        for credential in credentials:
+            # Prefer the token credential; if it does not exist, use the
+            # account key credential.
+            if isinstance(credential, ADLSTokenCredential):
+                return credential
+
+        for credential in credentials:
+            if isinstance(credential, AzureAccountKeyCredential):
+                return credential
+        return None
+
+    def _get_expire_time_by_ratio(self, expire_time: int):
+        if expire_time <= 0:
+            return TIME_WITHOUT_EXPIRATION
+        return time.time() * 1000 + (expire_time - time.time() * 1000) * 0.9
Review Comment:
Do you mean the credential should be refreshed within 30 minutes, even if
the expiration time is an hour?
Re: [PR] [#5996] feat(python-client): Using credentail in python GVFS client. [gravitino]
FANNG1 commented on code in PR #5997:
URL: https://github.com/apache/gravitino/pull/5997#discussion_r1905045765
##
clients/client-python/tests/integration/test_gvfs_with_gcs_credential.py:
##
@@ -0,0 +1,112 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied. See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+import logging
+import os
+from random import randint
+import unittest
+
+from gcsfs import GCSFileSystem
+
+from gravitino import Catalog, Fileset, GravitinoClient
+from gravitino.filesystem import gvfs
+from tests.integration.test_gvfs_with_gcs import TestGvfsWithGCS
+
+logger = logging.getLogger(__name__)
+
+
+def gcs_with_credential_is_configured():
+    return all(
+        [
+            os.environ.get("GCS_STS_SERVICE_ACCOUNT_JSON_PATH") is not None,
+            os.environ.get("GCS_STS_BUCKET_NAME") is not None,
+        ]
+    )
+
+
[email protected](gcs_with_credential_is_configured(), "GCS is not configured.")
+class TestGvfsWithGCSCredential(TestGvfsWithGCS):
+    # Before running this test, please make sure gcp-bundle-x.jar has been
+    # copied to the $GRAVITINO_HOME/catalogs/hadoop/libs/ directory
+    key_file = os.environ.get("GCS_STS_SERVICE_ACCOUNT_JSON_PATH")
+    bucket_name = os.environ.get("GCS_STS_BUCKET_NAME")
Review Comment:
Remove `STS` from the environment variables?
Re: [PR] [#5996] feat(python-client): Using credentail in python GVFS client. [gravitino]
yuqi1129 commented on code in PR #5997:
URL: https://github.com/apache/gravitino/pull/5997#discussion_r1905044983
##
clients/client-python/gravitino/filesystem/gvfs.py:
##
@@ -866,50 +896,94 @@ def _get_fileset_catalog(self, catalog_ident: NameIdentifier):
         finally:
             write_lock.release()

-    def _get_filesystem(self, actual_file_location: str):
+    def _file_system_is_not_expired(self, expire_time: int):
Review Comment:
It's okay for me.
Re: [PR] [#5996] feat(python-client): Using credentail in python GVFS client. [gravitino]
FANNG1 commented on code in PR #5997:
URL: https://github.com/apache/gravitino/pull/5997#discussion_r1905030532
##
clients/client-python/gravitino/filesystem/gvfs.py:
##
@@ -918,11 +992,41 @@ def _get_gcs_filesystem(self):
            raise GravitinoRuntimeException(
                "Service account key is not found in the options."
            )
-        return importlib.import_module("gcsfs").GCSFileSystem(
-            token=service_account_key_path
+        return (
+            TIME_WITHOUT_EXPIRATION,
+            importlib.import_module("gcsfs").GCSFileSystem(
+                token=service_account_key_path
+            ),
         )

-    def _get_s3_filesystem(self):
+    def _get_s3_filesystem(self, fileset_catalog: Catalog, identifier: NameIdentifier):
+        try:
+            fileset: GenericFileset = fileset_catalog.as_fileset_catalog().load_fileset(
+                NameIdentifier.of(identifier.namespace().level(2), identifier.name())
+            )
+            credentials = fileset.support_credentials().get_credentials()
+        except (NoSuchCredentialException, CatalogNotInUseException) as e:
+            logger.warning("Failed to get credentials from fileset: %s", e)
Review Comment:
How about throwing an exception here?
##
clients/client-python/gravitino/filesystem/gvfs.py:
##
@@ -1001,10 +1168,61 @@ def _get_abs_filesystem(self):
                "ABS account key is not found in the options."
            )
-        return importlib.import_module("adlfs").AzureBlobFileSystem(
-            account_name=abs_account_name,
-            account_key=abs_account_key,
+        return (
+            TIME_WITHOUT_EXPIRATION,
+            importlib.import_module("adlfs").AzureBlobFileSystem(
+                account_name=abs_account_name,
+                account_key=abs_account_key,
+            ),
         )

+    def _get_most_suitable_s3_credential(self, credentials: List[Credential]):
+        for credential in credentials:
+            # Prefer the token credential; if it does not exist, use the
+            # secret key credential.
+            if isinstance(credential, S3TokenCredential):
+                return credential
+
+        for credential in credentials:
+            if isinstance(credential, S3SecretKeyCredential):
+                return credential
+        return None
+
+    def _get_most_suitable_oss_credential(self, credentials: List[Credential]):
+        for credential in credentials:
+            # Prefer the token credential; if it does not exist, use the
+            # secret key credential.
+            if isinstance(credential, OSSTokenCredential):
+                return credential
+
+        for credential in credentials:
+            if isinstance(credential, OSSSecretKeyCredential):
+                return credential
+        return None
+
+    def _get_most_suitable_gcs_credential(self, credentials: List[Credential]):
+        for credential in credentials:
+            # Prefer the token credential; if it does not exist, return None.
+            if isinstance(credential, GCSTokenCredential):
+                return credential
+        return None
+
+    def _get_most_suitable_abs_credential(self, credentials: List[Credential]):
+        for credential in credentials:
+            # Prefer the token credential; if it does not exist, use the
+            # account key credential.
+            if isinstance(credential, ADLSTokenCredential):
+                return credential
+
+        for credential in credentials:
+            if isinstance(credential, AzureAccountKeyCredential):
+                return credential
+        return None
+
+    def _get_expire_time_by_ratio(self, expire_time: int):
+        if expire_time <= 0:
+            return TIME_WITHOUT_EXPIRATION
+        return time.time() * 1000 + (expire_time - time.time() * 1000) * 0.9
Review Comment:
`0.9` seems too high; since the filesystem is used outside the control of GVFS, how about making it configurable with a default of `0.5`?
##
clients/client-python/gravitino/filesystem/gvfs.py:
##
@@ -866,50 +896,94 @@ def _get_fileset_catalog(self, catalog_ident: NameIdentifier):
         finally:
             write_lock.release()

-    def _get_filesystem(self, actual_file_location: str):
+    def _file_system_is_not_expired(self, expire_time: int):
Review Comment:
Is `_file_system_expired` clearer?
##
clients/client-python/gravitino/filesystem/gvfs.py:
##
@@ -946,13 +1048,54 @@ def _get_s3_filesystem(self):
                "AWS endpoint url is not found in the options."
            )
-        return importlib.import_module("s3fs").S3FileSystem(
-            key=aws_access_key_id,
-            secret=aws_secret_access_key,
-            endpoint_url=aws_endpoint_url,
+        return (
+            sys.maxsize,
+            importlib.import_module("s3fs").S3FileSystem(
+
Re: [PR] [#5996] feat(python-client): Using credentail in python GVFS client. [gravitino]
yuqi1129 commented on code in PR #5997:
URL: https://github.com/apache/gravitino/pull/5997#discussion_r1904843633
##
clients/client-python/gravitino/filesystem/gvfs.py:
##
@@ -866,50 +894,93 @@ def _get_fileset_catalog(self, catalog_ident: NameIdentifier):
         finally:
             write_lock.release()

-    def _get_filesystem(self, actual_file_location: str):
+    # Disable Too many branches (13/12) (too-many-branches)
+    # pylint: disable=R0912
+    def _get_filesystem(
+        self,
+        actual_file_location: str,
+        fileset_catalog: Catalog,
+        name_identifier: NameIdentifier,
+    ):
         storage_type = self._recognize_storage_type(actual_file_location)
         read_lock = self._cache_lock.gen_rlock()
         try:
             read_lock.acquire()
-            cache_value: Tuple[StorageType, AbstractFileSystem] = self._cache.get(
-                storage_type
+            cache_value: Tuple[int, AbstractFileSystem] = self._cache.get(
+                name_identifier
             )
             if cache_value is not None:
-                return cache_value
+                # if the cache value is not expired, return the cache value
+                if cache_value[0] > time.time() * 1000:
+                    return cache_value[1]
         finally:
             read_lock.release()
         write_lock = self._cache_lock.gen_wlock()
         try:
             write_lock.acquire()
-            cache_value: Tuple[StorageType, AbstractFileSystem] = self._cache.get(
-                storage_type
+            cache_value: Tuple[int, AbstractFileSystem] = self._cache.get(
+                name_identifier
             )
+
             if cache_value is not None:
-                return cache_value
+                # if the cache value is not expired, return the cache value
+                if cache_value[0] > time.time() * 1000:
Review Comment:
OK, I have changed it to a constant value. I just followed what is done in `S3SecretKeyCredential`, `OSSSecretKeyCredential`, and `AzureAccountKeyCredential`; they all use `0` directly.
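The double-checked cache lookup quoted above boils down to storing an `(expire_time_ms, filesystem)` pair and treating a stale entry as a miss. A minimal single-threaded sketch (the real code guards the dict with reader/writer locks, and `TIME_WITHOUT_EXPIRATION` here is an assumed sentinel, not the actual constant):

```python
import time

# Assumed sentinel for "never expires"; the real constant may differ.
TIME_WITHOUT_EXPIRATION = float("inf")


class FilesystemCache:
    """(expire_time_ms, filesystem) entries keyed by fileset identifier."""

    def __init__(self):
        self._cache = {}

    def put(self, key, expire_time_ms, fs):
        self._cache[key] = (expire_time_ms, fs)

    def get(self, key, now_ms=None):
        if now_ms is None:
            now_ms = time.time() * 1000
        value = self._cache.get(key)
        if value is not None and value[0] > now_ms:
            return value[1]  # still valid
        return None  # missing or expired: the caller rebuilds the filesystem


cache = FilesystemCache()
cache.put("catalog.schema.fileset", 1_000, "s3-filesystem")
```

Keying by fileset identifier rather than storage type is the substantive change in the PR: each fileset can now carry its own credential with its own expiry.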
Re: [PR] [#5996] feat(python-client): Using credentail in python GVFS client. [gravitino]
yuqi1129 commented on code in PR #5997:
URL: https://github.com/apache/gravitino/pull/5997#discussion_r1904838263
##
clients/client-python/tests/integration/test_gvfs_with_gcs_credential.py:
##
@@ -0,0 +1,112 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied. See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+import logging
+import os
+from random import randint
+import unittest
+
+from gcsfs import GCSFileSystem
+
+from gravitino import Catalog, Fileset, GravitinoClient
+from gravitino.filesystem import gvfs
+from tests.integration.test_gvfs_with_gcs import TestGvfsWithGCS
+
+logger = logging.getLogger(__name__)
+
+
+def gcs_with_credential_is_configured():
+    return all(
+        [
+            os.environ.get("GCS_STS_SERVICE_ACCOUNT_JSON_PATH") is not None,
+            os.environ.get("GCS_STS_BUCKET_NAME") is not None,
+        ]
+    )
+
+
[email protected](gcs_with_credential_is_configured(), "GCS is not configured.")
+class TestGvfsWithGCSCredential(TestGvfsWithGCS):
+    # Before running this test, please make sure gcp-bundle-x.jar has been
+    # copied to the $GRAVITINO_HOME/catalogs/hadoop/libs/ directory
+    key_file = os.environ.get("GCS_STS_SERVICE_ACCOUNT_JSON_PATH")
+    bucket_name = os.environ.get("GCS_STS_BUCKET_NAME")
+    metalake_name: str = "TestGvfsWithGCSCredential_metalake" + str(randint(1, 1))
+
+    @classmethod
+    def _init_test_entities(cls):
+        cls.gravitino_admin_client.create_metalake(
+            name=cls.metalake_name, comment="", properties={}
+        )
+        cls.gravitino_client = GravitinoClient(
+            uri="http://localhost:8090", metalake_name=cls.metalake_name
+        )
+
+        cls.config = {}
+        cls.conf = {}
+        catalog = cls.gravitino_client.create_catalog(
+            name=cls.catalog_name,
+            catalog_type=Catalog.Type.FILESET,
+            provider=cls.catalog_provider,
+            comment="",
+            properties={
+                "filesystem-providers": "gcs",
+                "gcs-credential-file-path": cls.key_file,
+                "gcs-service-account-file": cls.key_file,
+                "credential-providers": "gcs-token",
+            },
+        )
+        catalog.as_schemas().create_schema(
+            schema_name=cls.schema_name, comment="", properties={}
+        )
+
+        cls.fileset_storage_location: str = (
+            f"gs://{cls.bucket_name}/{cls.catalog_name}/{cls.schema_name}/{cls.fileset_name}"
+        )
+        cls.fileset_gvfs_location = (
+            f"gvfs://fileset/{cls.catalog_name}/{cls.schema_name}/{cls.fileset_name}"
+        )
+        catalog.as_fileset_catalog().create_fileset(
+            ident=cls.fileset_ident,
+            fileset_type=Fileset.Type.MANAGED,
+            comment=cls.fileset_comment,
+            storage_location=cls.fileset_storage_location,
+            properties=cls.fileset_properties,
+        )
+
+        cls.fs = GCSFileSystem(token=cls.key_file)
+
+    def test_mkdir(self):
Review Comment:
I can add a description of the detailed reason. It is not only `mkdir`;
other operations like `makedirs` and `ls` have similar issues.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
Re: [PR] [#5996] feat(python-client): Using credentail in python GVFS client. [gravitino]
yuqi1129 commented on code in PR #5997:
URL: https://github.com/apache/gravitino/pull/5997#discussion_r1904837268
##
clients/client-python/gravitino/filesystem/gvfs.py:
##
@@ -946,13 +1048,54 @@ def _get_s3_filesystem(self):
"AWS endpoint url is not found in the options."
)
-return importlib.import_module("s3fs").S3FileSystem(
-key=aws_access_key_id,
-secret=aws_secret_access_key,
-endpoint_url=aws_endpoint_url,
+return (
+sys.maxsize,
+importlib.import_module("s3fs").S3FileSystem(
+key=aws_access_key_id,
+secret=aws_secret_access_key,
+endpoint_url=aws_endpoint_url,
Review Comment:
Using S3FileSystem directly does not need the endpoint parameter, but when
applying GVFS to pandas (or another third-party application) it is required;
this has been tested by Jerry.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
Re: [PR] [#5996] feat(python-client): Using credentail in python GVFS client. [gravitino]
yuqi1129 commented on code in PR #5997:
URL: https://github.com/apache/gravitino/pull/5997#discussion_r1904832615
##
clients/client-python/gravitino/filesystem/gvfs.py:
##
@@ -946,13 +1048,54 @@ def _get_s3_filesystem(self):
"AWS endpoint url is not found in the options."
)
-return importlib.import_module("s3fs").S3FileSystem(
-key=aws_access_key_id,
-secret=aws_secret_access_key,
-endpoint_url=aws_endpoint_url,
+return (
+sys.maxsize,
+importlib.import_module("s3fs").S3FileSystem(
+key=aws_access_key_id,
+secret=aws_secret_access_key,
+endpoint_url=aws_endpoint_url,
+),
)
-def _get_oss_filesystem(self):
+def _get_oss_filesystem(self, fileset_catalog: Catalog, identifier:
NameIdentifier):
+# Can get credential from the fileset
+try:
+fileset: GenericFileset =
fileset_catalog.as_fileset_catalog().load_fileset(
+NameIdentifier.of(identifier.namespace().level(2),
identifier.name())
+)
+credentials = fileset.support_credentials().get_credentials()
+except (NoSuchCredentialException, CatalogNotInUseException) as e:
+logger.warning("Failed to get credentials from fileset: %s", e)
+credentials = []
+
+credential = self._get_most_suitable_oss_credential(credentials)
+if credential is not None:
+oss_endpoint = fileset_catalog.properties()["oss-endpoint"]
+expire_time =
self._get_expire_time_by_ratio(credential.expire_time_in_ms())
+if isinstance(credential, OSSTokenCredential):
+fs = importlib.import_module("ossfs").OSSFileSystem(
+key=credential.access_key_id(),
+secret=credential.secret_access_key(),
+token=credential.security_token(),
+endpoint=oss_endpoint,
+)
+return (expire_time, fs)
+if isinstance(credential, OSSSecretKeyCredential):
+return (
+expire_time,
+importlib.import_module("ossfs").OSSFileSystem(
+key=credential.access_key_id(),
+secret=credential.secret_access_key(),
+endpoint=oss_endpoint,
+),
+)
+
+oss_endpoint_url =
self._options.get(GVFSConfig.GVFS_FILESYSTEM_OSS_ENDPOINT)
Review Comment:
The former comes from the catalog (fetched from the Gravitino server), and
the latter is set by the user on the client side.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
Re: [PR] [#5996] feat(python-client): Using credentail in python GVFS client. [gravitino]
FANNG1 commented on code in PR #5997:
URL: https://github.com/apache/gravitino/pull/5997#discussion_r1904830374
##
clients/client-python/gravitino/filesystem/gvfs.py:
##
@@ -946,13 +1048,54 @@ def _get_s3_filesystem(self):
"AWS endpoint url is not found in the options."
)
-return importlib.import_module("s3fs").S3FileSystem(
-key=aws_access_key_id,
-secret=aws_secret_access_key,
-endpoint_url=aws_endpoint_url,
+return (
+sys.maxsize,
+importlib.import_module("s3fs").S3FileSystem(
+key=aws_access_key_id,
+secret=aws_secret_access_key,
+endpoint_url=aws_endpoint_url,
Review Comment:
Yes, it is required for Aliyun OSS, but this is AWS.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
Re: [PR] [#5996] feat(python-client): Using credentail in python GVFS client. [gravitino]
FANNG1 commented on code in PR #5997:
URL: https://github.com/apache/gravitino/pull/5997#discussion_r1904829325
##
clients/client-python/gravitino/filesystem/gvfs.py:
##
@@ -946,13 +1048,54 @@ def _get_s3_filesystem(self):
"AWS endpoint url is not found in the options."
)
-return importlib.import_module("s3fs").S3FileSystem(
-key=aws_access_key_id,
-secret=aws_secret_access_key,
-endpoint_url=aws_endpoint_url,
+return (
+sys.maxsize,
+importlib.import_module("s3fs").S3FileSystem(
+key=aws_access_key_id,
+secret=aws_secret_access_key,
+endpoint_url=aws_endpoint_url,
+),
)
-def _get_oss_filesystem(self):
+def _get_oss_filesystem(self, fileset_catalog: Catalog, identifier:
NameIdentifier):
+# Can get credential from the fileset
+try:
+fileset: GenericFileset =
fileset_catalog.as_fileset_catalog().load_fileset(
+NameIdentifier.of(identifier.namespace().level(2),
identifier.name())
+)
+credentials = fileset.support_credentials().get_credentials()
+except (NoSuchCredentialException, CatalogNotInUseException) as e:
+logger.warning("Failed to get credentials from fileset: %s", e)
+credentials = []
+
+credential = self._get_most_suitable_oss_credential(credentials)
+if credential is not None:
+oss_endpoint = fileset_catalog.properties()["oss-endpoint"]
+expire_time =
self._get_expire_time_by_ratio(credential.expire_time_in_ms())
+if isinstance(credential, OSSTokenCredential):
+fs = importlib.import_module("ossfs").OSSFileSystem(
+key=credential.access_key_id(),
+secret=credential.secret_access_key(),
+token=credential.security_token(),
+endpoint=oss_endpoint,
+)
+return (expire_time, fs)
+if isinstance(credential, OSSSecretKeyCredential):
+return (
+expire_time,
+importlib.import_module("ossfs").OSSFileSystem(
+key=credential.access_key_id(),
+secret=credential.secret_access_key(),
+endpoint=oss_endpoint,
+),
+)
+
+oss_endpoint_url =
self._options.get(GVFSConfig.GVFS_FILESYSTEM_OSS_ENDPOINT)
Review Comment:
If they are the same, why use two variables?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
Re: [PR] [#5996] feat(python-client): Using credentail in python GVFS client. [gravitino]
yuqi1129 commented on code in PR #5997:
URL: https://github.com/apache/gravitino/pull/5997#discussion_r1904803955
##
clients/client-python/gravitino/filesystem/gvfs.py:
##
@@ -1001,10 +1173,58 @@ def _get_abs_filesystem(self):
"ABS account key is not found in the options."
)
-return importlib.import_module("adlfs").AzureBlobFileSystem(
-account_name=abs_account_name,
-account_key=abs_account_key,
+return (
+sys.maxsize,
+importlib.import_module("adlfs").AzureBlobFileSystem(
+account_name=abs_account_name,
+account_key=abs_account_key,
+),
)
+def _get_most_suitable_s3_credential(self, credentials: List[Credential]):
+for credential in credentials:
+# Prefer to use the token credential, if
Review Comment:
ok
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
Re: [PR] [#5996] feat(python-client): Using credentail in python GVFS client. [gravitino]
yuqi1129 commented on code in PR #5997:
URL: https://github.com/apache/gravitino/pull/5997#discussion_r1904803854
##
clients/client-python/gravitino/filesystem/gvfs.py:
##
@@ -946,13 +1048,54 @@ def _get_s3_filesystem(self):
"AWS endpoint url is not found in the options."
)
-return importlib.import_module("s3fs").S3FileSystem(
-key=aws_access_key_id,
-secret=aws_secret_access_key,
-endpoint_url=aws_endpoint_url,
+return (
+sys.maxsize,
+importlib.import_module("s3fs").S3FileSystem(
+key=aws_access_key_id,
+secret=aws_secret_access_key,
+endpoint_url=aws_endpoint_url,
Review Comment:
For the Python client, the endpoint is optional, but for the Java client it
is a required value; it is kept here to ensure consistency:
```text
java.lang.IllegalArgumentException: Aliyun OSS endpoint should not be null
or empty. Please set proper endpoint with 'fs.oss.endpoint'.
at
org.apache.hadoop.fs.aliyun.oss.AliyunOSSFileSystemStore.initialize(AliyunOSSFileSystemStore.java:150)
at
org.apache.hadoop.fs.aliyun.oss.AliyunOSSFileSystem.initialize(AliyunOSSFileSystem.java:349)
at
org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:3469)
at org.apache.hadoop.fs.FileSystem.access$300(FileSystem.java:174)
at
org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:3574)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:3521)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:540)
at org.apache.hadoop.fs.Path.getFileSystem(Path.java:365)
at
org.apache.gravitino.filesystem.hadoop.integration.test.GravitinoVirtualFileSystemIT.testDelete(GravitinoVirtualFileSystemIT.java:238)
```
##
clients/client-python/gravitino/filesystem/gvfs.py:
##
@@ -946,13 +1048,54 @@ def _get_s3_filesystem(self):
"AWS endpoint url is not found in the options."
)
-return importlib.import_module("s3fs").S3FileSystem(
-key=aws_access_key_id,
-secret=aws_secret_access_key,
-endpoint_url=aws_endpoint_url,
+return (
+sys.maxsize,
+importlib.import_module("s3fs").S3FileSystem(
+key=aws_access_key_id,
+secret=aws_secret_access_key,
+endpoint_url=aws_endpoint_url,
+),
)
-def _get_oss_filesystem(self):
+def _get_oss_filesystem(self, fileset_catalog: Catalog, identifier:
NameIdentifier):
+# Can get credential from the fileset
+try:
+fileset: GenericFileset =
fileset_catalog.as_fileset_catalog().load_fileset(
+NameIdentifier.of(identifier.namespace().level(2),
identifier.name())
+)
+credentials = fileset.support_credentials().get_credentials()
+except (NoSuchCredentialException, CatalogNotInUseException) as e:
+logger.warning("Failed to get credentials from fileset: %s", e)
+credentials = []
+
+credential = self._get_most_suitable_oss_credential(credentials)
+if credential is not None:
+oss_endpoint = fileset_catalog.properties()["oss-endpoint"]
Review Comment:
ok
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
Re: [PR] [#5996] feat(python-client): Using credentail in python GVFS client. [gravitino]
yuqi1129 commented on code in PR #5997:
URL: https://github.com/apache/gravitino/pull/5997#discussion_r190479
##
clients/client-python/gravitino/filesystem/gvfs.py:
##
@@ -946,13 +1048,54 @@ def _get_s3_filesystem(self):
"AWS endpoint url is not found in the options."
)
-return importlib.import_module("s3fs").S3FileSystem(
-key=aws_access_key_id,
-secret=aws_secret_access_key,
-endpoint_url=aws_endpoint_url,
+return (
+sys.maxsize,
+importlib.import_module("s3fs").S3FileSystem(
+key=aws_access_key_id,
+secret=aws_secret_access_key,
+endpoint_url=aws_endpoint_url,
+),
)
-def _get_oss_filesystem(self):
+def _get_oss_filesystem(self, fileset_catalog: Catalog, identifier:
NameIdentifier):
+# Can get credential from the fileset
+try:
+fileset: GenericFileset =
fileset_catalog.as_fileset_catalog().load_fileset(
+NameIdentifier.of(identifier.namespace().level(2),
identifier.name())
+)
+credentials = fileset.support_credentials().get_credentials()
+except (NoSuchCredentialException, CatalogNotInUseException) as e:
+logger.warning("Failed to get credentials from fileset: %s", e)
+credentials = []
+
+credential = self._get_most_suitable_oss_credential(credentials)
+if credential is not None:
+oss_endpoint = fileset_catalog.properties()["oss-endpoint"]
+expire_time =
self._get_expire_time_by_ratio(credential.expire_time_in_ms())
+if isinstance(credential, OSSTokenCredential):
+fs = importlib.import_module("ossfs").OSSFileSystem(
+key=credential.access_key_id(),
+secret=credential.secret_access_key(),
+token=credential.security_token(),
+endpoint=oss_endpoint,
+)
+return (expire_time, fs)
+if isinstance(credential, OSSSecretKeyCredential):
+return (
+expire_time,
+importlib.import_module("ossfs").OSSFileSystem(
+key=credential.access_key_id(),
+secret=credential.secret_access_key(),
+endpoint=oss_endpoint,
+),
+)
+
+oss_endpoint_url =
self._options.get(GVFSConfig.GVFS_FILESYSTEM_OSS_ENDPOINT)
Review Comment:
They are the same thing. What do you mean here, that `oss_endpoint_url` is
not a good name?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
Re: [PR] [#5996] feat(python-client): Using credentail in python GVFS client. [gravitino]
FANNG1 commented on code in PR #5997:
URL: https://github.com/apache/gravitino/pull/5997#discussion_r1904246261
##
clients/client-python/tests/integration/test_gvfs_with_gcs_credential.py:
##
@@ -0,0 +1,112 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied. See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+import logging
+import os
+from random import randint
+import unittest
+
+from gcsfs import GCSFileSystem
+
+from gravitino import Catalog, Fileset, GravitinoClient
+from gravitino.filesystem import gvfs
+from tests.integration.test_gvfs_with_gcs import TestGvfsWithGCS
+
+logger = logging.getLogger(__name__)
+
+
+def gcs_with_credential_is_configured():
+return all(
+[
+os.environ.get("GCS_STS_SERVICE_ACCOUNT_JSON_PATH") is not None,
+os.environ.get("GCS_STS_BUCKET_NAME") is not None,
+]
+)
+
+
+@unittest.skipUnless(gcs_with_credential_is_configured(), "GCS is not configured.")
+class TestGvfsWithGCSCredential(TestGvfsWithGCS):
+# Before running this test, please make sure gcp-bundle-x.jar has been
+# copied to the $GRAVITINO_HOME/catalogs/hadoop/libs/ directory
+key_file = os.environ.get("GCS_STS_SERVICE_ACCOUNT_JSON_PATH")
+bucket_name = os.environ.get("GCS_STS_BUCKET_NAME")
+metalake_name: str = "TestGvfsWithGCSCredential_metalake" + str(randint(1,
1))
+
+@classmethod
+def _init_test_entities(cls):
+cls.gravitino_admin_client.create_metalake(
+name=cls.metalake_name, comment="", properties={}
+)
+cls.gravitino_client = GravitinoClient(
+uri="http://localhost:8090", metalake_name=cls.metalake_name
+)
+
+cls.config = {}
+cls.conf = {}
+catalog = cls.gravitino_client.create_catalog(
+name=cls.catalog_name,
+catalog_type=Catalog.Type.FILESET,
+provider=cls.catalog_provider,
+comment="",
+properties={
+"filesystem-providers": "gcs",
+"gcs-credential-file-path": cls.key_file,
+"gcs-service-account-file": cls.key_file,
+"credential-providers": "gcs-token",
+},
+)
+catalog.as_schemas().create_schema(
+schema_name=cls.schema_name, comment="", properties={}
+)
+
+cls.fileset_storage_location: str = (
+
f"gs://{cls.bucket_name}/{cls.catalog_name}/{cls.schema_name}/{cls.fileset_name}"
+)
+cls.fileset_gvfs_location = (
+
f"gvfs://fileset/{cls.catalog_name}/{cls.schema_name}/{cls.fileset_name}"
+)
+catalog.as_fileset_catalog().create_fileset(
+ident=cls.fileset_ident,
+fileset_type=Fileset.Type.MANAGED,
+comment=cls.fileset_comment,
+storage_location=cls.fileset_storage_location,
+properties=cls.fileset_properties,
+)
+
+cls.fs = GCSFileSystem(token=cls.key_file)
+
+def test_mkdir(self):
Review Comment:
Could you add the detailed reason in the code about why the test is added in
a separate file?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
Re: [PR] [#5996] feat(python-client): Using credentail in python GVFS client. [gravitino]
FANNG1 commented on code in PR #5997:
URL: https://github.com/apache/gravitino/pull/5997#discussion_r1904244945
##
clients/client-python/gravitino/filesystem/gvfs.py:
##
@@ -866,50 +894,93 @@ def _get_fileset_catalog(self, catalog_ident: NameIdentifier):
         finally:
             write_lock.release()
-    def _get_filesystem(self, actual_file_location: str):
+    # Disable Too many branches (13/12) (too-many-branches)
+    # pylint: disable=R0912
+    def _get_filesystem(
+        self,
+        actual_file_location: str,
+        fileset_catalog: Catalog,
+        name_identifier: NameIdentifier,
+    ):
         storage_type = self._recognize_storage_type(actual_file_location)
         read_lock = self._cache_lock.gen_rlock()
         try:
             read_lock.acquire()
-            cache_value: Tuple[StorageType, AbstractFileSystem] = self._cache.get(
-                storage_type
+            cache_value: Tuple[int, AbstractFileSystem] = self._cache.get(
+                name_identifier
             )
             if cache_value is not None:
-                return cache_value
+                # if the cache value is not expired, return the cache value
+                if cache_value[0] > time.time() * 1000:
+                    return cache_value[1]
         finally:
             read_lock.release()
         write_lock = self._cache_lock.gen_wlock()
         try:
             write_lock.acquire()
-            cache_value: Tuple[StorageType, AbstractFileSystem] = self._cache.get(
-                storage_type
+            cache_value: Tuple[int, AbstractFileSystem] = self._cache.get(
+                name_identifier
             )
+
             if cache_value is not None:
-                return cache_value
+                # if the cache value is not expired, return the cache value
+                if cache_value[0] > time.time() * 1000:
Review Comment:
For local and HDFS, you return 0 for the expire time; I suggest defining a
constant to represent the expire time.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
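Following this suggestion, the bare literals could be replaced with a named constant. The constant name and helper below are hypothetical, sketched from the review exchange rather than taken from `gvfs.py`:

```python
import sys

# Hypothetical constant: local and HDFS filesystems carry no credential,
# so their cache entries should never be considered expired.
TIME_WITHOUT_EXPIRATION = sys.maxsize


def cache_expire_time(credential_expire_ms: int) -> int:
    # A credential expire time of 0 (as S3SecretKeyCredential and friends
    # use) means "no expiration", so map it to the named constant instead
    # of scattering 0 / sys.maxsize literals through the code.
    if credential_expire_ms <= 0:
        return TIME_WITHOUT_EXPIRATION
    return credential_expire_ms
```

The named constant makes the intent ("this cache entry never expires") explicit at every call site.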
Re: [PR] [#5996] feat(python-client): Using credentail in python GVFS client. [gravitino]
FANNG1 commented on code in PR #5997:
URL: https://github.com/apache/gravitino/pull/5997#discussion_r1904234764
##
clients/client-python/gravitino/filesystem/gvfs.py:
##
@@ -946,13 +1048,54 @@ def _get_s3_filesystem(self):
"AWS endpoint url is not found in the options."
)
-return importlib.import_module("s3fs").S3FileSystem(
-key=aws_access_key_id,
-secret=aws_secret_access_key,
-endpoint_url=aws_endpoint_url,
+return (
+sys.maxsize,
+importlib.import_module("s3fs").S3FileSystem(
+key=aws_access_key_id,
+secret=aws_secret_access_key,
+endpoint_url=aws_endpoint_url,
Review Comment:
According to the pydoc, `aws_endpoint_url` seems not to be required?
```
endpoint_url : string (None)
Use this endpoint_url, if specified. Needed for connecting to non-AWS
S3 buckets. Takes precedence over `endpoint_url` in client_kwargs.
```
##
clients/client-python/gravitino/filesystem/gvfs.py:
##
@@ -946,13 +1048,54 @@ def _get_s3_filesystem(self):
"AWS endpoint url is not found in the options."
)
-return importlib.import_module("s3fs").S3FileSystem(
-key=aws_access_key_id,
-secret=aws_secret_access_key,
-endpoint_url=aws_endpoint_url,
+return (
+sys.maxsize,
+importlib.import_module("s3fs").S3FileSystem(
+key=aws_access_key_id,
+secret=aws_secret_access_key,
+endpoint_url=aws_endpoint_url,
+),
)
-def _get_oss_filesystem(self):
+def _get_oss_filesystem(self, fileset_catalog: Catalog, identifier:
NameIdentifier):
+# Can get credential from the fileset
+try:
+fileset: GenericFileset =
fileset_catalog.as_fileset_catalog().load_fileset(
+NameIdentifier.of(identifier.namespace().level(2),
identifier.name())
+)
+credentials = fileset.support_credentials().get_credentials()
+except (NoSuchCredentialException, CatalogNotInUseException) as e:
+logger.warning("Failed to get credentials from fileset: %s", e)
+credentials = []
+
+credential = self._get_most_suitable_oss_credential(credentials)
+if credential is not None:
+oss_endpoint = fileset_catalog.properties()["oss-endpoint"]
+expire_time =
self._get_expire_time_by_ratio(credential.expire_time_in_ms())
+if isinstance(credential, OSSTokenCredential):
+fs = importlib.import_module("ossfs").OSSFileSystem(
+key=credential.access_key_id(),
+secret=credential.secret_access_key(),
+token=credential.security_token(),
+endpoint=oss_endpoint,
+)
+return (expire_time, fs)
+if isinstance(credential, OSSSecretKeyCredential):
+return (
+expire_time,
+importlib.import_module("ossfs").OSSFileSystem(
+key=credential.access_key_id(),
+secret=credential.secret_access_key(),
+endpoint=oss_endpoint,
+),
+)
+
+oss_endpoint_url =
self._options.get(GVFSConfig.GVFS_FILESYSTEM_OSS_ENDPOINT)
Review Comment:
What's the difference between `oss_endpoint_url` and `oss_endpoint`?
##
clients/client-python/gravitino/filesystem/gvfs.py:
##
@@ -946,13 +1048,54 @@ def _get_s3_filesystem(self):
"AWS endpoint url is not found in the options."
)
-return importlib.import_module("s3fs").S3FileSystem(
-key=aws_access_key_id,
-secret=aws_secret_access_key,
-endpoint_url=aws_endpoint_url,
+return (
+sys.maxsize,
+importlib.import_module("s3fs").S3FileSystem(
+key=aws_access_key_id,
+secret=aws_secret_access_key,
+endpoint_url=aws_endpoint_url,
+),
)
-def _get_oss_filesystem(self):
+def _get_oss_filesystem(self, fileset_catalog: Catalog, identifier:
NameIdentifier):
+# Can get credential from the fileset
+try:
+fileset: GenericFileset =
fileset_catalog.as_fileset_catalog().load_fileset(
+NameIdentifier.of(identifier.namespace().level(2),
identifier.name())
+)
+credentials = fileset.support_credentials().get_credentials()
+except (NoSuchCredentialException, CatalogNotInUseException) as e:
+logger.warning("Failed to get credentials from fileset: %s", e)
+credentials = []
+
+credential = self._get_most_suitable_oss_credential(credentials)
+if cr
Re: [PR] [#5996] feat(python-client): Using credentail in python GVFS client. [gravitino]
yuqi1129 commented on code in PR #5997:
URL: https://github.com/apache/gravitino/pull/5997#discussion_r1904235369
##
clients/client-python/tests/integration/test_gvfs_with_gcs_credential.py:
##
@@ -0,0 +1,112 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied. See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+import logging
+import os
+from random import randint
+import unittest
+
+from gcsfs import GCSFileSystem
+
+from gravitino import Catalog, Fileset, GravitinoClient
+from gravitino.filesystem import gvfs
+from tests.integration.test_gvfs_with_gcs import TestGvfsWithGCS
+
+logger = logging.getLogger(__name__)
+
+
+def gcs_with_credential_is_configured():
+return all(
+[
+os.environ.get("GCS_STS_SERVICE_ACCOUNT_JSON_PATH") is not None,
+os.environ.get("GCS_STS_BUCKET_NAME") is not None,
+]
+)
+
+
+@unittest.skipUnless(gcs_with_credential_is_configured(), "GCS is not configured.")
+class TestGvfsWithGCSCredential(TestGvfsWithGCS):
+# Before running this test, please make sure gcp-bundle-x.jar has been
+# copied to the $GRAVITINO_HOME/catalogs/hadoop/libs/ directory
+key_file = os.environ.get("GCS_STS_SERVICE_ACCOUNT_JSON_PATH")
+bucket_name = os.environ.get("GCS_STS_BUCKET_NAME")
+metalake_name: str = "TestGvfsWithGCSCredential_metalake" + str(randint(1,
1))
+
+@classmethod
+def _init_test_entities(cls):
+cls.gravitino_admin_client.create_metalake(
+name=cls.metalake_name, comment="", properties={}
+)
+cls.gravitino_client = GravitinoClient(
+uri="http://localhost:8090", metalake_name=cls.metalake_name
+)
+
+cls.config = {}
+cls.conf = {}
+catalog = cls.gravitino_client.create_catalog(
+name=cls.catalog_name,
+catalog_type=Catalog.Type.FILESET,
+provider=cls.catalog_provider,
+comment="",
+properties={
+"filesystem-providers": "gcs",
+"gcs-credential-file-path": cls.key_file,
+"gcs-service-account-file": cls.key_file,
+"credential-providers": "gcs-token",
+},
+)
+catalog.as_schemas().create_schema(
+schema_name=cls.schema_name, comment="", properties={}
+)
+
+cls.fileset_storage_location: str = (
+
f"gs://{cls.bucket_name}/{cls.catalog_name}/{cls.schema_name}/{cls.fileset_name}"
+)
+cls.fileset_gvfs_location = (
+
f"gvfs://fileset/{cls.catalog_name}/{cls.schema_name}/{cls.fileset_name}"
+)
+catalog.as_fileset_catalog().create_fileset(
+ident=cls.fileset_ident,
+fileset_type=Fileset.Type.MANAGED,
+comment=cls.fileset_comment,
+storage_location=cls.fileset_storage_location,
+properties=cls.fileset_properties,
+)
+
+cls.fs = GCSFileSystem(token=cls.key_file)
+
+def test_mkdir(self):
Review Comment:
GCS has different behaviour compared to HDFS, so we need to overwrite it. OSS and S3 are similar.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
Re: [PR] [#5996] feat(python-client): Using credentail in python GVFS client. [gravitino]
FANNG1 commented on code in PR #5997:
URL: https://github.com/apache/gravitino/pull/5997#discussion_r1904229503
##
clients/client-python/gravitino/filesystem/gvfs.py:
##
@@ -866,50 +894,93 @@ def _get_fileset_catalog(self, catalog_ident: NameIdentifier):
         finally:
             write_lock.release()
-    def _get_filesystem(self, actual_file_location: str):
+    # Disable Too many branches (13/12) (too-many-branches)
+    # pylint: disable=R0912
+    def _get_filesystem(
+        self,
+        actual_file_location: str,
+        fileset_catalog: Catalog,
+        name_identifier: NameIdentifier,
+    ):
         storage_type = self._recognize_storage_type(actual_file_location)
         read_lock = self._cache_lock.gen_rlock()
         try:
             read_lock.acquire()
-            cache_value: Tuple[StorageType, AbstractFileSystem] = self._cache.get(
-                storage_type
+            cache_value: Tuple[int, AbstractFileSystem] = self._cache.get(
+                name_identifier
             )
             if cache_value is not None:
-                return cache_value
+                # if the cache value is not expired, return the cache value
+                if cache_value[0] > time.time() * 1000:
+                    return cache_value[1]
         finally:
             read_lock.release()
         write_lock = self._cache_lock.gen_wlock()
         try:
             write_lock.acquire()
-            cache_value: Tuple[StorageType, AbstractFileSystem] = self._cache.get(
-                storage_type
+            cache_value: Tuple[int, AbstractFileSystem] = self._cache.get(
+                name_identifier
             )
+
             if cache_value is not None:
-                return cache_value
+                # if the cache value is not expired, return the cache value
+                if cache_value[0] > time.time() * 1000:
+                    return cache_value[1]
+
+            new_cache_value: Tuple[int, AbstractFileSystem]
             if storage_type == StorageType.HDFS:
                 fs_class = importlib.import_module("pyarrow.fs").HadoopFileSystem
-                fs = ArrowFSWrapper(fs_class.from_uri(actual_file_location))
+                new_cache_value = (
+                    0,
+                    ArrowFSWrapper(fs_class.from_uri(actual_file_location)),
+                )
             elif storage_type == StorageType.LOCAL:
-                fs = LocalFileSystem()
+                new_cache_value = (0, LocalFileSystem())
             elif storage_type == StorageType.GCS:
-                fs = self._get_gcs_filesystem()
+                new_cache_value = self._get_gcs_filesystem(
+                    fileset_catalog, name_identifier
+                )
             elif storage_type == StorageType.S3A:
-                fs = self._get_s3_filesystem()
+                new_cache_value = self._get_s3_filesystem(
+                    fileset_catalog, name_identifier
+                )
             elif storage_type == StorageType.OSS:
-                fs = self._get_oss_filesystem()
+                new_cache_value = self._get_oss_filesystem(
+                    fileset_catalog, name_identifier
+                )
             elif storage_type == StorageType.ABS:
-                fs = self._get_abs_filesystem()
+                new_cache_value = self._get_abs_filesystem(
+                    fileset_catalog, name_identifier
+                )
             else:
                 raise GravitinoRuntimeException(
                     f"Storage type: `{storage_type}` doesn't support now."
                 )
-            self._cache[storage_type] = fs
-            return fs
+            self._cache[name_identifier] = new_cache_value
+            return new_cache_value[1]
         finally:
             write_lock.release()
-    def _get_gcs_filesystem(self):
+    def _get_gcs_filesystem(self, fileset_catalog: Catalog, identifier: NameIdentifier):
+        try:
+            fileset: GenericFileset = fileset_catalog.as_fileset_catalog().load_fileset(
+                NameIdentifier.of(identifier.namespace().level(2), identifier.name())
+            )
+            credentials = fileset.support_credentials().get_credentials()
+        except (NoSuchCredentialException, CatalogNotInUseException) as e:
+            logger.warning("Failed to get credentials from fileset: %s", e)
+            credentials = []
+
+        credential = self._get_most_suitable_gcs_credential(credentials)
+        if credential is not None:
+            expire_time = self._get_expire_time_by_ratio(credential.expire_time_in_ms())
+            if isinstance(credential, GCSTokenCredential):
+                fs = importlib.import_module("gcsfs").GCSFileSystem(
+                    token=credential.token()
+                )
+                return (expire_time, fs)
+
+        # get 'service-account-key' from gcs_options, if the key is not found, throw an exception
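The caching scheme in the diff above — entries keyed by fileset identifier, each carrying a millisecond expire timestamp alongside the filesystem object — can be sketched independently of gvfs. In this sketch `0` is treated as "never expires" for the local/HDFS cases; that convention is an assumption for illustration, not a quote of the PR:

```python
import time
from typing import Any, Callable, Dict, Tuple

# cache maps an identifier to (expire_time_ms, filesystem-like object)
FsCache = Dict[str, Tuple[int, Any]]


def _entry_is_valid(expire_time_ms: int) -> bool:
    # 0 marks a non-expiring entry; otherwise compare against "now" in ms.
    return expire_time_ms == 0 or expire_time_ms > time.time() * 1000


def get_filesystem(cache: FsCache, ident: str,
                   factory: Callable[[], Tuple[int, Any]]) -> Any:
    entry = cache.get(ident)
    if entry is not None and _entry_is_valid(entry[0]):
        return entry[1]
    # Expired or missing: rebuild, e.g. from a freshly fetched credential.
    entry = factory()
    cache[ident] = entry
    return entry[1]
```

Factoring the expiry test into a helper like `_entry_is_valid` is the "use a function to check whether the token expires" suggestion made later in this thread.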
Re: [PR] [#5996] feat(python-client): Using credentail in python GVFS client. [gravitino]
FANNG1 commented on code in PR #5997:
URL: https://github.com/apache/gravitino/pull/5997#discussion_r1904222921
##
clients/client-python/tests/integration/test_gvfs_with_gcs_credential.py:
##
+def test_mkdir(self):
Review Comment:
Could you provide the reason?
Re: [PR] [#5996] feat(python-client): Using credentail in python GVFS client. [gravitino]
yuqi1129 commented on code in PR #5997:
URL: https://github.com/apache/gravitino/pull/5997#discussion_r1904073910
Re: [PR] [#5996] feat(python-client): Using credentail in python GVFS client. [gravitino]
yuqi1129 commented on code in PR #5997:
URL: https://github.com/apache/gravitino/pull/5997#discussion_r1904072789
##
clients/client-python/tests/integration/test_gvfs_with_gcs_credential.py:
##
+def test_mkdir(self):
Review Comment:
TestGvfsWithHDFS already has the method, and [test_gvfs_with_gcs_credential.py](https://github.com/apache/gravitino/pull/5997/files/bd8618ec3c446ee3e9df4ce840fc2c4306b257a7#diff-c1af14e87a421259945e3e79d9c2f271743395b0a70eb1a957757d8028e29a19) just overwrites it.
Re: [PR] [#5996] feat(python-client): Using credentail in python GVFS client. [gravitino]
yuqi1129 commented on code in PR #5997:
URL: https://github.com/apache/gravitino/pull/5997#discussion_r1904070502
##
clients/client-python/tests/integration/test_gvfs_with_gcs_credential.py:
##
+class TestGvfsWithGCSCredential(TestGvfsWithGCS):
Review Comment:
OK, I will create an issue to track this problem, as it's just an improvement.
Re: [PR] [#5996] feat(python-client): Using credentail in python GVFS client. [gravitino]
yuqi1129 commented on code in PR #5997: URL: https://github.com/apache/gravitino/pull/5997#discussion_r1904069465 ## clients/client-python/gravitino/filesystem/gvfs.py: ## Review Comment: I have handled it, see https://github.com/user-attachments/assets/c25ec5de-6d47-45f3-a6fb-84fcc02d89c9 > and could you use a function to check whether the token expires? ok
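The `_get_expire_time_by_ratio` helper referenced in the diff corresponds to the `credential_expired_time_ratio` setting described in the docs: the client treats a credential as expired slightly before its real expiry so it can refresh in time. A hedged sketch of that idea — the exact semantics and the 0.9 default are assumptions, not quotes of the PR:

```python
import time


def expire_time_by_ratio(expire_time_in_ms: int, ratio: float = 0.9) -> int:
    """Shrink the remaining lifetime of a credential by `ratio`, so the
    client refreshes it before the server-side expiry actually hits."""
    if expire_time_in_ms <= 0:
        return 0  # non-expiring credential keeps the "never expires" marker
    now_ms = int(time.time() * 1000)
    remaining = expire_time_in_ms - now_ms
    return now_ms + int(remaining * ratio)
```

The returned value is what gets stored as the cache entry's expiry timestamp, so a credential with 10 minutes of real lifetime is refreshed after roughly 9.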
Re: [PR] [#5996] feat(python-client): Using credentail in python GVFS client. [gravitino]
FANNG1 commented on code in PR #5997:
URL: https://github.com/apache/gravitino/pull/5997#discussion_r1904042397
##
clients/client-python/tests/integration/test_gvfs_with_gcs_credential.py:
##
+def test_mkdir(self):
Review Comment:
why not move these tests to `TestGvfsWithHDFS` ?
Re: [PR] [#5996] feat(python-client): Using credentail in python GVFS client. [gravitino]
yuqi1129 commented on code in PR #5997: URL: https://github.com/apache/gravitino/pull/5997#discussion_r1901707988 ## clients/client-python/gravitino/api/credential/adls_token_credential.py: ## @@ -27,12 +27,12 @@ class ADLSTokenCredential(Credential, ABC): ADLS_TOKEN_CREDENTIAL_TYPE: str = "adls-token" ADLS_DOMAIN: str = "dfs.core.windows.net" -_STORAGE_ACCOUNT_NAME: str = "azure-storage-account-name" -_SAS_TOKEN: str = "adls-sas-token" +STORAGE_ACCOUNT_NAME: str = "azure-storage-account-name" Review Comment: How can I get constant value? use method `get`or just create a new constants with the same value? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
Re: [PR] [#5996] feat(python-client): Using credentail in python GVFS client. [gravitino]
FANNG1 commented on code in PR #5997: URL: https://github.com/apache/gravitino/pull/5997#discussion_r1901574255 ## clients/client-python/gravitino/api/credential/adls_token_credential.py: ## Review Comment: it should be `same`
Re: [PR] [#5996] feat(python-client): Using credentail in python GVFS client. [gravitino]
yuqi1129 commented on code in PR #5997: URL: https://github.com/apache/gravitino/pull/5997#discussion_r1901535397 ## clients/client-python/gravitino/api/credential/adls_token_credential.py: ## Review Comment: `seem to other credentials.`, what's the meaning of this sentence?
Re: [PR] [#5996] feat(python-client): Using credentail in python GVFS client. [gravitino]
FANNG1 commented on code in PR #5997: URL: https://github.com/apache/gravitino/pull/5997#discussion_r1901531443 ## clients/client-python/gravitino/api/credential/adls_token_credential.py: ## Review Comment: please don't use the internal `STORAGE_ACCOUNT_NAME` properties directly. seem to other credentials.
Re: [PR] [#5996] feat(python-client): Using credentail in python GVFS client. [gravitino]
yuqi1129 commented on PR #5997: URL: https://github.com/apache/gravitino/pull/5997#issuecomment-2566488260 @FANNG1 @jerryshao Please help to review this PR.
