[GitHub] merlimat commented on a change in pull request #1232: Schema registry (1/4)
merlimat commented on a change in pull request #1232: Schema registry (1/4) URL: https://github.com/apache/incubator-pulsar/pull/1232#discussion_r171706547 ## File path: pulsar-common/src/main/proto/PulsarApi.proto ## @@ -408,6 +408,7 @@ message CommandProducerSuccess { // The last sequence id that was stored by this producer in the previous session // This will only be meaningful if deduplication has been enabled. optional int64 last_sequence_id = 3 [default = -1]; + optional bytes schema_version = 4; Review comment: @mgodave I think we need the `schema_version` in producer_success. That should be attached to messages when publishing. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] merlimat commented on a change in pull request #1232: Schema registry (1/4)
merlimat commented on a change in pull request #1232: Schema registry (1/4) URL: https://github.com/apache/incubator-pulsar/pull/1232#discussion_r171651451 ## File path: pulsar-common/src/main/proto/PulsarApi.proto ## @@ -22,6 +22,30 @@ package pulsar.proto; option java_package = "org.apache.pulsar.common.api.proto"; option optimize_for = LITE_RUNTIME; +message Schema { + enum Type { + Json = 1; + Protobuf = 2; + Thrift = 3; + Avro = 4; + Reserved1 = 5; Review comment: I don't think we should need these. When deserializing an unknown enum value, protobuf returns `null`. Also, we typically we use peer protocol version, which is exchanged in the `Connect`/`Connected` phase. There we can check if the other side supports already a new feature or not. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] merlimat commented on a change in pull request #1232: Schema registry (1/4)
merlimat commented on a change in pull request #1232: Schema registry (1/4) URL: https://github.com/apache/incubator-pulsar/pull/1232#discussion_r171623263 ## File path: pulsar-common/src/main/proto/PulsarApi.proto ## @@ -22,6 +22,13 @@ package pulsar.proto; option java_package = "org.apache.pulsar.common.api.proto"; option optimize_for = LITE_RUNTIME; +message Schema { +required string name = 1; +required bytes version = 2; +required bytes schema_data = 3; +repeated KeyValue properties = 4; Review comment: Though I think that, when we attach to a message, the `schema_version` field is anyway defined as `int64` This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] merlimat commented on a change in pull request #1232: Schema registry (1/4)
merlimat commented on a change in pull request #1232: Schema registry (1/4) URL: https://github.com/apache/incubator-pulsar/pull/1232#discussion_r171418965 ## File path: pulsar-common/src/main/proto/PulsarApi.proto ## @@ -197,6 +205,8 @@ message CommandSubscribe { repeated KeyValue metadata = 10; optional bool read_compacted = 11; + + optional int64 schema_version = 12; Review comment: When I subscribe, should I just pass the schema itself? How do I know the version that my current code for `MyClass.class` has? This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] merlimat commented on a change in pull request #1232: Schema registry (1/4)
merlimat commented on a change in pull request #1232: Schema registry (1/4) URL: https://github.com/apache/incubator-pulsar/pull/1232#discussion_r171411681 ## File path: pulsar-common/src/main/proto/PulsarApi.proto ## @@ -387,6 +400,7 @@ message CommandProducerSuccess { // The last sequence id that was stored by this producer in the previous session // This will only be meaningful if deduplication has been enabled. optional int64 last_sequence_id = 3 [default = -1]; + optional Schema schema = 4; Review comment: @mgodave I think that the schema should be on the `CreateProducer` command, but I don't see how it would be used on the `ProducerSuccess`, since it will be not useful for the producer at that point. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] merlimat commented on a change in pull request #1232: Schema registry (1/4)
merlimat commented on a change in pull request #1232: Schema registry (1/4) URL: https://github.com/apache/incubator-pulsar/pull/1232#discussion_r171418590 ## File path: pulsar-common/src/main/proto/PulsarApi.proto ## @@ -22,6 +22,13 @@ package pulsar.proto; option java_package = "org.apache.pulsar.common.api.proto"; option optimize_for = LITE_RUNTIME; +message Schema { +required string name = 1; +required bytes version = 2; +required bytes schema_data = 3; +repeated KeyValue properties = 4; Review comment: I think it was already pointed out, though it's missing now. 1. Shouldn't `version` be a `long` type? 2. Since we are not supporting arbitrary schemas, I think we should identify the type of the schema that the client is sending. We should have an enum of choices, but use `int` type on the wire to prepare for when adding new enum values. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] merlimat commented on a change in pull request #1232: Schema registry (1/4)
merlimat commented on a change in pull request #1232: Schema registry (1/4) URL: https://github.com/apache/incubator-pulsar/pull/1232#discussion_r168293225 ## File path: pulsar-common/src/main/proto/PulsarApi.proto ## @@ -368,6 +387,7 @@ message CommandRedeliverUnacknowledgedMessages { message CommandSuccess { required uint64 request_id = 1; + optional Schema schema = 2; Review comment: I think it would make sense to have a schema at the namespace level (with per-topic override), since we already have a lot of administration tasks at that level. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] merlimat commented on a change in pull request #1232: Schema registry (1/4)
merlimat commented on a change in pull request #1232: Schema registry (1/4) URL: https://github.com/apache/incubator-pulsar/pull/1232#discussion_r168274945 ## File path: pulsar-common/src/main/proto/PulsarApi.proto ## @@ -79,8 +86,9 @@ message MessageMetadata { repeated EncryptionKeys encryption_keys = 13; // Algorithm used to encrypt data key optional string encryption_algo = 14; + optional bytes schema_version = 15; // Additional parameters required by encryption - optional bytes encryption_param = 15; + optional bytes encryption_param = 16; Review comment: We cannot change the integer id of an existing field This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] merlimat commented on a change in pull request #1232: Schema registry (1/4)
merlimat commented on a change in pull request #1232: Schema registry (1/4) URL: https://github.com/apache/incubator-pulsar/pull/1232#discussion_r168274469 ## File path: pulsar-common/src/main/proto/PulsarApi.proto ## @@ -22,6 +22,20 @@ package pulsar.proto; option java_package = "org.apache.pulsar.common.api.proto"; option optimize_for = LITE_RUNTIME; +message Schema { + required string name = 1; + required bytes version = 2; + required bytes schema_data = 3; Review comment: I agree that storing the schema data is agnostic to the format, however these protobufs are relative to the interactions between client and broker. In this case, the producer might use something like `Schema.avroOf(MyClass.class)` or `Schema.jsonOf(MyClass.class)`. In this case, don't we need to communicate to the server what kind of schema the client is using? This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] merlimat commented on a change in pull request #1232: Schema registry (1/4)
merlimat commented on a change in pull request #1232: Schema registry (1/4) URL: https://github.com/apache/incubator-pulsar/pull/1232#discussion_r168236331 ## File path: pulsar-common/src/main/proto/PulsarApi.proto ## @@ -272,6 +289,8 @@ message CommandProducer { /// Add optional metadata key=value to this producer repeated KeyValue metadata= 6; + + optional int64 schema_version = 7; Review comment: It should carry the whole schema here. The application/producer doesn't necessarely know what the version is. Eg: if I'm using a type like : ```java class MyClass { public int a; public int b; }; ``` When I try to publish, I just know the format I'm about to send, and we can extract the avro/json schema out of it. When I try to create producer, I can send that information, and the broker can compare with the schemas already availalable for this topic. If the schema is know, we just return the schema version in the `ProducerSuccess` response. If it's a new (compatible) version of the schema, it will also be stored. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] merlimat commented on a change in pull request #1232: Schema registry (1/4)
merlimat commented on a change in pull request #1232: Schema registry (1/4) URL: https://github.com/apache/incubator-pulsar/pull/1232#discussion_r168231034 ## File path: pulsar-common/src/main/proto/PulsarApi.proto ## @@ -272,6 +289,8 @@ message CommandProducer { /// Add optional metadata key=value to this producer repeated KeyValue metadata= 6; + + optional int64 schema_version = 7; Review comment: Sure, no problem. I think it would be good to have the "final" version of the `PulsarApi.proto` to get a sense of all the changes in the interaction between client & broker. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] merlimat commented on a change in pull request #1232: Schema registry (1/4)
merlimat commented on a change in pull request #1232: Schema registry (1/4) URL: https://github.com/apache/incubator-pulsar/pull/1232#discussion_r167959766 ## File path: pulsar-common/src/main/proto/PulsarApi.proto ## @@ -22,6 +22,20 @@ package pulsar.proto; option java_package = "org.apache.pulsar.common.api.proto"; option optimize_for = LITE_RUNTIME; +message Schema { + required string name = 1; + required bytes version = 2; + required bytes schema_data = 7; +repeated KeyValue properties = 8; +} Review comment: How are we identifying the schema type? (eg: Json vs Avro vs..) This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] merlimat commented on a change in pull request #1232: Schema registry (1/4)
merlimat commented on a change in pull request #1232: Schema registry (1/4) URL: https://github.com/apache/incubator-pulsar/pull/1232#discussion_r167959515 ## File path: pulsar-common/src/main/proto/PulsarApi.proto ## @@ -272,6 +289,8 @@ message CommandProducer { /// Add optional metadata key=value to this producer repeated KeyValue metadata= 6; + + optional int64 schema_version = 7; Review comment: I think that when creating producer & consumer we need to specify the actual schema so that we can validate on broker side, and if that fails, the producer creation fails. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services