rangadi commented on code in PR #37972: URL: https://github.com/apache/spark/pull/37972#discussion_r984821786
########## connector/proto/src/main/scala/org/apache/spark/sql/proto/utils/SchemaConverters.scala: ########## @@ -0,0 +1,172 @@ +/* Review Comment: We could rename this to `ProtobufSchemaConverters.scala`. I know Avro does not do it, but I find multiple files in the repo with generic names are hard to use in practice. ########## connector/proto/src/test/resources/protobuf/proto_functions_suite.proto: ########## @@ -0,0 +1,142 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +// To compile and create test class: +// protoc --java_out=connector/proto/src/test/resources/protobuf/ connector/proto/src/test/resources/protobuf/proto_functions_suite.proto +// protoc --descriptor_set_out=connector/proto/src/test/resources/protobuf/proto_functions_suite.desc --java_out=connector/proto/src/test/resources/protobuf/org/apache/spark/sql/proto/ connector/proto/src/test/resources/protobuf/proto_functions_suite.proto + +syntax = "proto3"; + +package org.apache.spark.sql.proto; +option java_outer_classname = "SimpleMessageProtos"; + +message SimpleMessageJavaTypes { + int64 id = 1; + string string_value = 2; + int32 int32_value = 3; + int64 int64_value = 4; + double double_value = 5; + float float_value = 6; + bool bool_value = 7; + bytes bytes_value = 8; +} + +message SimpleMessage { + int64 id = 1; + string string_value = 2; + int32 int32_value = 3; + uint32 uint32_value = 4; + sint32 sint32_value = 5; + fixed32 fixed32_value = 6; + sfixed32 sfixed32_value = 7; + int64 int64_value = 8; + uint64 uint64_value = 9; + sint64 sint64_value = 10; + fixed64 fixed64_value = 11; + sfixed64 sfixed64_value = 12; + double double_value = 13; + float float_value = 14; + bool bool_value = 15; + bytes bytes_value = 16; +} + +message SimpleMessageRepeated { + string key = 1; + string value = 2; + enum NestedEnum { + ESTED_NOTHING = 0; + NESTED_FIRST = 1; + NESTED_SECOND = 2; + } + repeated string rstring_value = 3; + repeated int32 rint32_value = 4; + repeated bool rbool_value = 5; + repeated int64 rint64_value = 6; + repeated float rfloat_value = 7; + repeated double rdouble_value = 8; + repeated bytes rbytes_value = 9; + repeated NestedEnum rnested_enum = 10; +} + +message BasicMessage { + int64 id = 1; + string string_value = 2; + int32 int32_value = 3; + int64 int64_value = 4; + double double_value = 5; + float float_value = 6; + bool bool_value = 7; + bytes bytes_value = 8; +} + +message RepeatedMessage { + repeated BasicMessage basic_message = 1; +} + +message SimpleMessageMap { + string key = 1; + string value = 2; + map<string, string> string_mapdata = 3; Review Comment: Is there a unit test that uses this Message? When we support maps, Spark SQL schema for for this would be `MapType(StringType, StringType)` not `ArrayType`. Can we verify in a unit test? This can be detected at at run time by checking `map_entry` flag in MessageOptions : https://github.com/protocolbuffers/protobuf/blob/main/src/google/protobuf/descriptor.proto#L526 ########## connector/proto/pom.xml: ########## @@ -0,0 +1,119 @@ +<?xml version="1.0" encoding="UTF-8"?> Review Comment: (B): Lets look into more details in the PR that adds schema-registry support. Yes, we would have to include the required jars. (C) Lets discuss in my comment for SimpleMessageMap proto used in the tests below. Essentially a map in proto definition should ideally be a `MapType` in Spark. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
