Re: Cybershuttle Replica Catalog API

2023-02-22 Thread Christie, Marcus Aaron
Hi Jayan,

I would like to echo Dimuthu and say that this looks great; I appreciate the 
effort you put into pulling this all together. I have some feedback to share.

The high-level architecture diagram shows the replica id being stored in the 
data catalog. That was an initial idea that we had, but we decided that the 
replica catalog would store the data product id. That seems reflected in your 
API design so I think you already know this, but I wanted to point it out since 
the diagram might be a little confusing for others.

In the ReplicaCatalogAPI.proto the name of the data product id field is 
"parent_data_product_id". I would suggest calling it "data_product_id" instead. 
"parent_data_product_id" means "the id of the parent data product of this data 
product" in the data catalog. It might be confusing to use the same name in 
ReplicaCatalogAPI.proto.


Thanks,

Marcus

> On Feb 18, 2023, at 3:09 PM, Jayan Vidanapathirana 
>  wrote:
> 
> Hi All, 
> 
> As a new contributor to the Cybershuttle project, I have been actively 
> involved in implementing the Data Replica Catalog. This new catalog is 
> designed to interface with both the Apache Airavata Data Catalog [1] and 
> Airavata MFT [2]. This replica catalog should be able to store each replica's 
> resource storage details and the secret/credential details specific to the 
> storage type. The proposed high-level architecture is as follows:
> 
> [image: high-level architecture diagram]
> 
> I will mainly work on the highlighted area (red box) and, as an initial 
> step, have started defining the APIs that communicate with the Replica Catalog. 
> These API calls will be gRPC APIs, and the following methods will be implemented:
> 
> Replica Registration
> 
>   • registerReplicaLocation(DataReplicaCreateRequest createRequest)
>   • updateReplicaLocation(DataReplicaCreateRequest updateRequest)
>   • DataReplicaLocationModel getReplicaLocation(DataReplicaGetRequest 
> getReplicaRequest)
>   • removeReplicaLocation(DataReplicaDeleteRequest deleteReplicaRequest)
>   • getAllReplicaLocations(AllDataReplicaGetRequest allDataGetRequest)
>   • removeAllReplicaLocations(AllDataReplicaDeleteRequest 
> allDataDeleteRequest)
> 
> Storage Registration
> 
> registerSecretForStorage(SecretForStorage request)
> deleteSecretsForStorage(SecretForStorageDeleteRequest request)
> getSecretForStorage(SecretForStorageGetRequest request)
> searchStorages(StorageSearchRequest request)
> listStorages(StorageListRequest request)
> resolveStorageType (StorageTypeResolveRequest request)
> 
> Storage - Internal APIs
> 
> S3StorageListResponse listS3Storage(S3StorageListRequest request) 
> Optional<S3Storage> getS3Storage(S3StorageGetRequest request) 
> S3Storage createS3Storage(S3StorageCreateRequest request) 
> boolean updateS3Storage(S3StorageUpdateRequest request) 
> boolean deleteS3Storage(S3StorageDeleteRequest request) 
> 
> AzureStorageListResponse listAzureStorage(AzureStorageListRequest request) 
> Optional<AzureStorage> getAzureStorage(AzureStorageGetRequest request) 
> AzureStorage createAzureStorage(AzureStorageCreateRequest request) 
> boolean updateAzureStorage(AzureStorageUpdateRequest request) 
> boolean deleteAzureStorage(AzureStorageDeleteRequest request) 
> 
> GCSStorageListResponse listGCSStorage(GCSStorageListRequest request) 
> Optional<GCSStorage> getGCSStorage(GCSStorageGetRequest request) 
> GCSStorage createGCSStorage(GCSStorageCreateRequest request) 
> boolean updateGCSStorage(GCSStorageUpdateRequest request) 
> boolean deleteGCSStorage(GCSStorageDeleteRequest request) 
> 
> Secret Registration
> 
> registerSecret(SecretRegistrationRequest request)
> deleteSecret(SecretDeleteRequest request)
> resolveStorageType (StorageTypeResolveRequest request)
> 
> Secret - Internal APIs
> 
> Optional<S3Secret> getS3Secret(S3SecretGetRequest request) 
> S3Secret createS3Secret(S3SecretCreateRequest request) 
> boolean updateS3Secret(S3SecretUpdateRequest request) 
> boolean deleteS3Secret(S3SecretDeleteRequest request) 
> 
> Optional<AzureSecret> getAzureSecret(AzureSecretGetRequest request) 
> AzureSecret createAzureSecret(AzureSecretCreateRequest request) 
> boolean updateAzureSecret(AzureSecretUpdateRequest request) 
> boolean deleteAzureSecret(AzureSecretDeleteRequest request) 
> 
> Optional<GCSSecret> getGCSSecret(GCSSecretGetRequest request) 
> GCSSecret createGCSSecret(GCSSecretCreateRequest request) 
> boolean updateGCSSecret(GCSSecretUpdateRequest request) 
> boolean deleteGCSSecret(GCSSecretDeleteRequest request) 
> 
> 
> PoC [3]: https://github.com/Jayancv/airavata-replica-catalog (defines the API 
> calls)
> Draft APIs: refer to the attachment replicaCatalogAPIsDocumentation.html [4], 
> which was generated using the PoC [3].
> 
> I greatly appreciate your thoughts and feedback on the designs[5], as they 
> can help us improve and adopt a more generalized approach. Additionally, I 
> would like to identify any other factors that we should take into account to 
> minimize potential issues in the future. Are there any other considerations 
> that we should keep in 

Re: Cybershuttle Replica Catalog API

2023-02-22 Thread Dimuthu Upeksha
Hi Jayan,

This looks great and the APIs are clear enough to understand. Specifically,
automatic documentation generation is a very nice feature. I would like to
suggest a few modifications to the replica data models though.

I understand that you based the replica data models on Airavata's, but there
are a few modifications/generalizations that we need to make here. For
example, the Replica location category [6] might not make sense here as you
already specify the storage type. UserInfo, GroupInfo, and Permissions in the
same proto file do not apply to this design, as those are handled at the Data
Catalog level. What is your expectation for the ReplicaListEntry below?

message ReplicaListEntry {
  string data_replica_id = 1;
  string replica_name = 2;
  StorageType storage_type = 3;
}

If you are planning to provide the grouping of replica items through that,
I suggest updating it in the following way:

message ReplicaGroupEntry {
  string replica_group_id = 1;
  repeated ReplicaGroupEntry directories = 2;
  repeated DataReplicaLocation files = 3;
}

This will provide both grouping and hierarchical replica registration and
you can emulate it as a virtual file hierarchy.
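
To make that concrete, here is a minimal, illustrative Python sketch. Plain
dataclasses stand in for the protobuf-generated classes, and
DataReplicaLocation is reduced to a single replica_name field for brevity; it
only shows how nested ReplicaGroupEntry messages could be walked as a virtual
file hierarchy.

from dataclasses import dataclass, field
from typing import List

@dataclass
class DataReplicaLocation:      # stand-in for the real proto message
    replica_name: str

@dataclass
class ReplicaGroupEntry:        # mirrors the proposed message above
    replica_group_id: str
    directories: List["ReplicaGroupEntry"] = field(default_factory=list)
    files: List[DataReplicaLocation] = field(default_factory=list)

def walk(group: ReplicaGroupEntry, prefix: str = "") -> None:
    # Print every replica as a virtual path rooted at its group ids.
    path = f"{prefix}/{group.replica_group_id}"
    for f in group.files:
        print(f"{path}/{f.replica_name}")
    for d in group.directories:
        walk(d, path)

# Example output: /experiment-123/output.log and /experiment-123/inputs/input.dat
root = ReplicaGroupEntry(
    replica_group_id="experiment-123",
    files=[DataReplicaLocation("output.log")],
    directories=[ReplicaGroupEntry("inputs",
                                   files=[DataReplicaLocation("input.dat")])],
)
walk(root)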

[6]
https://github.com/Jayancv/airavata-replica-catalog/blob/master/replica-catalog-api/stubs/src/main/proto/catalogapi/ReplicaCatalogAPI.proto#L60

Thanks
Dimuthu


On Sat, Feb 18, 2023 at 3:10 PM Jayan Vidanapathirana <
jcvidanapathir...@gmail.com> wrote:

> Hi All,
>
> As a new contributor to the Cybershuttle project, I have been actively
> involved in implementing the Data Replica Catalog. This new catalog is
> designed to interface with both the Apache Airavata Data Catalog [1] and
> Airavata MFT [2]. This replica catalog should be able to store each replica's
> resource storage details and the secret/credential details specific to the
> storage type. The proposed high-level architecture is as follows:
>
> [image: high-level architecture diagram]
>
> I will mainly work on the highlighted area (red box) and, as an initial
> step, have started defining the APIs that communicate with the Replica Catalog.
> These API calls will be gRPC APIs, and the following methods will be implemented:
>
> Replica Registration
>
>
> 1. registerReplicaLocation(DataReplicaCreateRequest createRequest)
> 2. updateReplicaLocation(DataReplicaCreateRequest updateRequest)
> 3. DataReplicaLocationModel getReplicaLocation(DataReplicaGetRequest getReplicaRequest)
> 4. removeReplicaLocation(DataReplicaDeleteRequest deleteReplicaRequest)
> 5. getAllReplicaLocations(AllDataReplicaGetRequest allDataGetRequest)
> 6. removeAllReplicaLocations(AllDataReplicaDeleteRequest allDataDeleteRequest)
>
>
> Storage Registration
>
> registerSecretForStorage(SecretForStorage request)
>
> deleteSecretsForStorage(SecretForStorageDeleteRequest request)
>
> getSecretForStorage(SecretForStorageGetRequest request)
>
> searchStorages(StorageSearchRequest request)
>
> listStorages(StorageListRequest request)
>
> resolveStorageType (StorageTypeResolveRequest request)
>
> Storage - Internal APIs
>
> S3StorageListResponse listS3Storage(S3StorageListRequest request)
>
> Optional<S3Storage> getS3Storage(S3StorageGetRequest request)
>
> S3Storage createS3Storage(S3StorageCreateRequest request)
>
> boolean updateS3Storage(S3StorageUpdateRequest request)
>
> boolean deleteS3Storage(S3StorageDeleteRequest request)
>
> AzureStorageListResponse listAzureStorage(AzureStorageListRequest request)
>
> Optional<AzureStorage> getAzureStorage(AzureStorageGetRequest request)
>
> AzureStorage createAzureStorage(AzureStorageCreateRequest request)
>
> boolean updateAzureStorage(AzureStorageUpdateRequest request)
>
> boolean deleteAzureStorage(AzureStorageDeleteRequest request)
>
> GCSStorageListResponse listGCSStorage(GCSStorageListRequest request)
>
> Optional<GCSStorage> getGCSStorage(GCSStorageGetRequest request)
>
> GCSStorage createGCSStorage(GCSStorageCreateRequest request)
>
> boolean updateGCSStorage(GCSStorageUpdateRequest request)
>
> boolean deleteGCSStorage(GCSStorageDeleteRequest request)
>
> Secret Registration
>
> registerSecret(SecretRegistrationRequest request)
>
> deleteSecret(SecretDeleteRequest request)
>
> resolveStorageType (StorageTypeResolveRequest request)
>
> Secret - Internal APIs
>
>
> Optional<S3Secret> getS3Secret(S3SecretGetRequest request)
>
> S3Secret createS3Secret(S3SecretCreateRequest request)
>
> boolean updateS3Secret(S3SecretUpdateRequest request)
>
> boolean deleteS3Secret(S3SecretDeleteRequest request)
>
> Optional<AzureSecret> getAzureSecret(AzureSecretGetRequest request)
>
> AzureSecret createAzureSecret(AzureSecretCreateRequest request)
>
> boolean updateAzureSecret(AzureSecretUpdateRequest request)
>
> boolean deleteAzureSecret(AzureSecretDeleteRequest request)
>
> Optional<GCSSecret> getGCSSecret(GCSSecretGetRequest request)
>
> GCSSecret createGCSSecret(GCSSecretCreateRequest request)
>
> boolean updateGCSSecret(GCSSecretUpdateRequest request)
>
> boolean deleteGCSSecret(GCSSecretDeleteRequest request)
>
>
> Poc[3] : 

Re: SEAGrid Data Catalog

2023-02-22 Thread Lahiru Jayathilake
Hi All,

To provide an update about the project, we have decided to proceed with
approach one with a few changes. Instead of using Django model classes for
the Computational, Experimental, and Literature data, protobuf generated
model classes will be used to handle communications. This approach will
enable easier accommodation of changes to the models without having to
modify the Django code base.
[image: SMILES Django Portal.png]

The project has been renamed to smiles-django-portal [1], and I have
created a PR [2] which has the functionality of creating a computational
product (note that the frontend has not been implemented yet), as well as
the Python client implementation to call the Airavata Data Catalog service.

With regard to the SMILES protobuf files, there are two possible ways of
creating them:

1. Declaring all the fields in the proto file
All the data related to specific SMILES data products will be mentioned
within the proto file, along with the required fields for the Airavata Data
Catalog product, except metadata. Fields such as data_product_id and name
will be included. When the Airavata Data Catalog gRPC service is invoked,
all the SMILES-specific data will go to the metadata field as a JSON
string.

This method has already been implemented in the previous PR. A sample
protobuf file [3] (this is not the final version) was borrowed from here [4].
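
As a rough illustration of approach 1 (the field names below are made up, not
the final schema), the SMILES-specific values end up flattened into the JSON
string that is sent as the Data Catalog product's metadata:

import json

# In practice these values would come from the protobuf-generated SMILES data
# product message; a plain dict is used here for brevity.
smiles_fields = {
    "molecule_name": "caffeine",
    "basis_set": "6-31G*",
    "total_energy": -679.73,
}

# This JSON string is what gets stored in the Data Catalog's metadata field.
metadata_json = json.dumps(smiles_fields)
print(metadata_json)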

2. Using JSON-LD for metadata
In this schema, we are using the "google.protobuf.Struct" type to represent
the metadata for each data product. This allows us to store JSON-LD data in
the metadata field, as the "google.protobuf.Struct" type can hold arbitrary
JSON data. The rest of the required fields will also be included (e.g.,
data_product_id, name, etc.). The "google.protobuf.Struct" metadata will be
assigned to the 'metadata' field of the Airavata Data Catalog product as a
JSON string.

A sample protobuf file:

syntax = "proto3";

import "google/protobuf/struct.proto";

message ComputationalDP {
  string data_product_id = 1;
  string parent_data_product_id = 2;
  string name = 3;
  google.protobuf.Struct metadata = 4;
}
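
As a short, illustrative sketch of that conversion path (using the standard
Python protobuf runtime; the JSON-LD content below is a made-up example, not
the final schema):

from google.protobuf import json_format
from google.protobuf.struct_pb2 import Struct

json_ld = {
    "@context": "https://schema.org/",
    "@type": "Dataset",
    "name": "sample computational data product",
    "variableMeasured": ["energy", "dipole moment"],
}

# Load the JSON-LD document into the Struct that ComputationalDP.metadata holds.
metadata_struct = Struct()
json_format.ParseDict(json_ld, metadata_struct)

# Serialize back to a JSON string when invoking the Airavata Data Catalog,
# which stores the product's metadata as a JSON string.
metadata_json = json_format.MessageToJson(metadata_struct)
print(metadata_json)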

Using this approach, I believe we can have the following two main
advantages:

- Flexibility (Because the "google.protobuf.Struct" type can hold arbitrary
JSON data, we can represent a wide range of data structures, from simple
key-value pairs to nested objects and arrays. This can make it easier to
work with complex data and integrate it with other systems)
- Type safety (by using the "google.protobuf.Struct" type, we can ensure
that the metadata is well-formed, structured JSON before it is stored)

I'd like to hear your thoughts and feedback on this.

[1] - https://github.com/SciGaP/smiles-django-portal
[2] - https://github.com/SciGaP/smiles-django-portal/pull/1
[3] -
https://github.com/lahirujayathilake/smiles-django-portal/blob/main/data_catalog/proto/computational_dp.proto
[4] -
https://github.com/bhavesh-asana/SEAGrid/blob/main/rpcHandler/ExpDBDataHandler/proto/molecule.proto

Thanks,
Lahiru


On Fri, Feb 17, 2023 at 12:28 PM Lahiru Jayathilake <
lahirujayathil...@gmail.com> wrote:

> Hi Suresh,
>
> Thanks for the advice, sure I will do it as you suggested.
>
> Lahiru
>
> On Thu, Feb 16, 2023 at 7:42 PM Suresh Marru  wrote:
>
>> Hi Lahiru,
>>
>> The two dependencies, a Django-grpc fork (
>> https://github.com/socotecio/django-socio-grpc/) and
>> https://github.com/grpc/grpc-web are reasonably ok. So building on them
>> may not be a bad idea. But if you are hitting too frequent roadblocks, it
>> may be wise to switch to Django-rest-framework and take your approach 1.
>> Sometimes the downsides of depending on not-so-actively maintained
>> dependencies outweigh the technical advantages.
>>
>> So +1 to proceed with gRPC, but if you stumble, revert to the REST-based
>> approach.
>>
>> Suresh
>>
>> On Feb 16, 2023, at 3:24 AM, Lahiru Jayathilake <
>> lahirujayathil...@gmail.com> wrote:
>>
>> Hi Marcus,
>>
>> Thanks for the suggestions and the heads-up. Sure, I will do more
>> investigation on that and get back to you with the details.
>>
>> Thanks,
>> Lahiru
>>
>> On Wed, Feb 15, 2023 at 8:36 PM Christie, Marcus Aaron 
>> wrote:
>>
>>> Hi Lahiru,
>>>
>>> Thanks for putting together this investigation. I'm not 100% sure but it
>>> looks like gRPC-JS only works with Node.js since it uses Node.js APIs. I
>>> think you'll need gRPC-Web to make gRPC calls from a browser. My
>>> understanding is that that requires an Envoy proxy on the server side.
>>> (Rereading your email, I think you probably already know this, but just in
>>> case I thought I would point this out.)
>>>
>>> It looks like django-grpc-framework isn't an active project [1], so I
>>> agree with your concern about depending on it. One issue with using gRPC in
>>> Django, I think, is that the integration that we've done with the Django
>>> framework would need to be re-implemented, things like middleware and
>>> authentication.  It's probably doable, just something