[GitHub] [beam] TheNeuralBit commented on a change in pull request #12426: [BEAM-7996] Add support for MapType and Nulls in container types for Python RowCoder
TheNeuralBit commented on a change in pull request #12426: URL: https://github.com/apache/beam/pull/12426#discussion_r465238705 ## File path: model/fn-execution/src/main/resources/org/apache/beam/model/fnexecution/v1/standard_coders.yaml ## @@ -384,3 +384,31 @@ nested: false examples: "\x02\x01\x02\x01": {f_bool: True, f_bytes: null} "\x02\x00\x00\x04ab\x00c": {f_bool: False, f_bytes: "ab\0c"} + +--- + +# Binary data generated with the python SDK: +# +# import typing +# import apache_beam as beam +# class Test(typing.NamedTuple): +# f_map: typing.Mapping[str,int] +# schema = beam.typehints.schemas.named_tuple_to_schema(Test) +# coder = beam.coders.row_coder.RowCoder(schema) +# print("payload = %s" % schema.SerializeToString()) +# examples = (Test(f_map={}), +# Test(f_map={"foo": 9001, "bar": 9223372036854775807}), +# Test(f_map={"everything": None, "is": None, "null!": None, "¯\_(ツ)_/¯": None})) +# for example in examples: +# print("example = %s" % coder.encode(example)) +coder: + urn: "beam:coder:row:v1" + # f_map: map + payload: "\n\x15\n\x05f_map\x1a\x0c*\n\n\x02\x10\x07\x12\x04\x08\x01\x10\x04\x12$d8c8f969-14e6-457f-a8b5-62a1aec7f1cd" + # map ordering is non-deterministic + non_deterministic: True +nested: false Review comment: Done! `nested` is no longer specified. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] TheNeuralBit commented on a change in pull request #12426: [BEAM-7996] Add support for MapType and Nulls in container types for Python RowCoder
TheNeuralBit commented on a change in pull request #12426: URL: https://github.com/apache/beam/pull/12426#discussion_r465238156 ## File path: model/fn-execution/src/main/resources/org/apache/beam/model/fnexecution/v1/standard_coders.yaml ## @@ -384,3 +384,31 @@ nested: false examples: "\x02\x01\x02\x01": {f_bool: True, f_bytes: null} "\x02\x00\x00\x04ab\x00c": {f_bool: False, f_bytes: "ab\0c"} + +--- + +# Binary data generated with the python SDK: +# +# import typing +# import apache_beam as beam +# class Test(typing.NamedTuple): +# f_map: typing.Mapping[str,int] +# schema = beam.typehints.schemas.named_tuple_to_schema(Test) +# coder = beam.coders.row_coder.RowCoder(schema) +# print("payload = %s" % schema.SerializeToString()) +# examples = (Test(f_map={}), +# Test(f_map={"foo": 9001, "bar": 9223372036854775807}), +# Test(f_map={"everything": None, "is": None, "null!": None, "¯\_(ツ)_/¯": None})) +# for example in examples: +# print("example = %s" % coder.encode(example)) +coder: + urn: "beam:coder:row:v1" + # f_map: map + payload: "\n\x15\n\x05f_map\x1a\x0c*\n\n\x02\x10\x07\x12\x04\x08\x01\x10\x04\x12$d8c8f969-14e6-457f-a8b5-62a1aec7f1cd" + # map ordering is non-deterministic + non_deterministic: True +nested: false +examples: + "\x01\x00\x00\x00\x00\x00": {f_map:{}} + "\x01\x00\x00\x00\x00\x02\x03foo\x01\xa9F\x03bar\x01\xff\xff\xff\xff\xff\xff\xff\xff\x7f": {f_map:{"foo": 9001, "bar": 9223372036854775807}} + "\x01\x00\x00\x00\x00\x04\neverything\x00\x02is\x00\x05null!\x00\r\xc2\xaf\\_(\xe3\x83\x84)_/\xc2\xaf\x00": {f_map:{"everything":null, "is": null, "null!": null, "¯\\_(ツ)_/¯": null}} Review comment: Done! This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] TheNeuralBit commented on a change in pull request #12426: [BEAM-7996] Add support for MapType and Nulls in container types for Python RowCoder
TheNeuralBit commented on a change in pull request #12426: URL: https://github.com/apache/beam/pull/12426#discussion_r465216120 ## File path: model/fn-execution/src/main/resources/org/apache/beam/model/fnexecution/v1/standard_coders.yaml ## @@ -384,3 +384,31 @@ nested: false examples: "\x02\x01\x02\x01": {f_bool: True, f_bytes: null} "\x02\x00\x00\x04ab\x00c": {f_bool: False, f_bytes: "ab\0c"} + +--- + +# Binary data generated with the python SDK: +# +# import typing +# import apache_beam as beam +# class Test(typing.NamedTuple): +# f_map: typing.Mapping[str,int] +# schema = beam.typehints.schemas.named_tuple_to_schema(Test) +# coder = beam.coders.row_coder.RowCoder(schema) +# print("payload = %s" % schema.SerializeToString()) +# examples = (Test(f_map={}), +# Test(f_map={"foo": 9001, "bar": 9223372036854775807}), +# Test(f_map={"everything": None, "is": None, "null!": None, "¯\_(ツ)_/¯": None})) +# for example in examples: +# print("example = %s" % coder.encode(example)) +coder: + urn: "beam:coder:row:v1" + # f_map: map + payload: "\n\x15\n\x05f_map\x1a\x0c*\n\n\x02\x10\x07\x12\x04\x08\x01\x10\x04\x12$d8c8f969-14e6-457f-a8b5-62a1aec7f1cd" + # map ordering is non-deterministic + non_deterministic: True +nested: false Review comment: I bet I ran into this way back when and I just opted to pick a (bad) value for `nested` rather than uncovering this issue. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] TheNeuralBit commented on a change in pull request #12426: [BEAM-7996] Add support for MapType and Nulls in container types for Python RowCoder
TheNeuralBit commented on a change in pull request #12426: URL: https://github.com/apache/beam/pull/12426#discussion_r465215557 ## File path: model/fn-execution/src/main/resources/org/apache/beam/model/fnexecution/v1/standard_coders.yaml ## @@ -384,3 +384,31 @@ nested: false examples: "\x02\x01\x02\x01": {f_bool: True, f_bytes: null} "\x02\x00\x00\x04ab\x00c": {f_bool: False, f_bytes: "ab\0c"} + +--- + +# Binary data generated with the python SDK: +# +# import typing +# import apache_beam as beam +# class Test(typing.NamedTuple): +# f_map: typing.Mapping[str,int] +# schema = beam.typehints.schemas.named_tuple_to_schema(Test) +# coder = beam.coders.row_coder.RowCoder(schema) +# print("payload = %s" % schema.SerializeToString()) +# examples = (Test(f_map={}), +# Test(f_map={"foo": 9001, "bar": 9223372036854775807}), +# Test(f_map={"everything": None, "is": None, "null!": None, "¯\_(ツ)_/¯": None})) +# for example in examples: +# print("example = %s" % coder.encode(example)) +coder: + urn: "beam:coder:row:v1" + # f_map: map + payload: "\n\x15\n\x05f_map\x1a\x0c*\n\n\x02\x10\x07\x12\x04\x08\x01\x10\x04\x12$d8c8f969-14e6-457f-a8b5-62a1aec7f1cd" + # map ordering is non-deterministic + non_deterministic: True +nested: false Review comment: Found the issue in Java. It's a bug in CommonCodersTest that occurs when testing both nested and un-nested. It happens because we share an object for the expected value: https://github.com/apache/beam/blob/1bf60b99741ff4d0a8b88e93f5241379e1f3962f/runners/core-construction-java/src/test/java/org/apache/beam/runners/core/construction/CommonCoderTest.java#L193-L196 and it get's mutated when parsing Rows: https://github.com/apache/beam/blob/1bf60b99741ff4d0a8b88e93f5241379e1f3962f/runners/core-construction-java/src/test/java/org/apache/beam/runners/core/construction/CommonCoderTest.java#L380 It's a pretty easy fix I can apply here. Is it ok if the row test case just doesn't specify `nested`, or are you thinking it should be `nested: true`? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] TheNeuralBit commented on a change in pull request #12426: [BEAM-7996] Add support for MapType and Nulls in container types for Python RowCoder
TheNeuralBit commented on a change in pull request #12426: URL: https://github.com/apache/beam/pull/12426#discussion_r465204223 ## File path: model/fn-execution/src/main/resources/org/apache/beam/model/fnexecution/v1/standard_coders.yaml ## @@ -384,3 +384,31 @@ nested: false examples: "\x02\x01\x02\x01": {f_bool: True, f_bytes: null} "\x02\x00\x00\x04ab\x00c": {f_bool: False, f_bytes: "ab\0c"} + +--- + +# Binary data generated with the python SDK: +# +# import typing +# import apache_beam as beam +# class Test(typing.NamedTuple): +# f_map: typing.Mapping[str,int] +# schema = beam.typehints.schemas.named_tuple_to_schema(Test) +# coder = beam.coders.row_coder.RowCoder(schema) +# print("payload = %s" % schema.SerializeToString()) +# examples = (Test(f_map={}), +# Test(f_map={"foo": 9001, "bar": 9223372036854775807}), +# Test(f_map={"everything": None, "is": None, "null!": None, "¯\_(ツ)_/¯": None})) +# for example in examples: +# print("example = %s" % coder.encode(example)) +coder: + urn: "beam:coder:row:v1" + # f_map: map + payload: "\n\x15\n\x05f_map\x1a\x0c*\n\n\x02\x10\x07\x12\x04\x08\x01\x10\x04\x12$d8c8f969-14e6-457f-a8b5-62a1aec7f1cd" + # map ordering is non-deterministic + non_deterministic: True +nested: false +examples: + "\x01\x00\x00\x00\x00\x00": {f_map:{}} + "\x01\x00\x00\x00\x00\x02\x03foo\x01\xa9F\x03bar\x01\xff\xff\xff\xff\xff\xff\xff\xff\x7f": {f_map:{"foo": 9001, "bar": 9223372036854775807}} + "\x01\x00\x00\x00\x00\x04\neverything\x00\x02is\x00\x05null!\x00\r\xc2\xaf\\_(\xe3\x83\x84)_/\xc2\xaf\x00": {f_map:{"everything":null, "is": null, "null!": null, "¯\\_(ツ)_/¯": null}} Review comment: Ack, thanks This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] TheNeuralBit commented on a change in pull request #12426: [BEAM-7996] Add support for MapType and Nulls in container types for Python RowCoder
TheNeuralBit commented on a change in pull request #12426: URL: https://github.com/apache/beam/pull/12426#discussion_r465204097 ## File path: model/fn-execution/src/main/resources/org/apache/beam/model/fnexecution/v1/standard_coders.yaml ## @@ -384,3 +384,31 @@ nested: false examples: "\x02\x01\x02\x01": {f_bool: True, f_bytes: null} "\x02\x00\x00\x04ab\x00c": {f_bool: False, f_bytes: "ab\0c"} + +--- + +# Binary data generated with the python SDK: +# +# import typing +# import apache_beam as beam +# class Test(typing.NamedTuple): +# f_map: typing.Mapping[str,int] +# schema = beam.typehints.schemas.named_tuple_to_schema(Test) +# coder = beam.coders.row_coder.RowCoder(schema) +# print("payload = %s" % schema.SerializeToString()) +# examples = (Test(f_map={}), +# Test(f_map={"foo": 9001, "bar": 9223372036854775807}), +# Test(f_map={"everything": None, "is": None, "null!": None, "¯\_(ツ)_/¯": None})) +# for example in examples: +# print("example = %s" % coder.encode(example)) +coder: + urn: "beam:coder:row:v1" + # f_map: map + payload: "\n\x15\n\x05f_map\x1a\x0c*\n\n\x02\x10\x07\x12\x04\x08\x01\x10\x04\x12$d8c8f969-14e6-457f-a8b5-62a1aec7f1cd" + # map ordering is non-deterministic + non_deterministic: True +nested: false Review comment: > So, I re-iterate: Why is nested: false, instead of nested true if the coding is going to be identical in both context? You're not re-iterating. You're asking the same question with a lot of new and different context. I don't think there's a good reason. If I had to guess - when I added the first row coder test I was new to Beam and coders and I didn't really understand nested vs un-nested, so I just picked one. I'm trying to change it now and it looks like Java is failing some of the nested=true cases - that's probably why. Looking into the failures now. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] TheNeuralBit commented on a change in pull request #12426: [BEAM-7996] Add support for MapType and Nulls in container types for Python RowCoder
TheNeuralBit commented on a change in pull request #12426: URL: https://github.com/apache/beam/pull/12426#discussion_r465194296 ## File path: sdks/python/apache_beam/coders/coder_impl.py ## @@ -530,6 +530,88 @@ def estimate_size(self, unused_value, nested=False): return 1 +class MapCoderImpl(StreamCoderImpl): + """For internal use only; no backwards-compatibility guarantees. + + A coder for typing.Mapping objects.""" + def __init__( + self, + key_coder, # type: CoderImpl + value_coder # type: CoderImpl + ): +self._key_coder = key_coder +self._value_coder = value_coder + + def encode_to_stream(self, value, out, nested): +size = len(value) +out.write_bigendian_int32(size) +for i, kv in enumerate(value.items()): + key, value = kv + last = i == size - 1 Review comment: Good point, done. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] TheNeuralBit commented on a change in pull request #12426: [BEAM-7996] Add support for MapType and Nulls in container types for Python RowCoder
TheNeuralBit commented on a change in pull request #12426: URL: https://github.com/apache/beam/pull/12426#discussion_r465180802 ## File path: model/fn-execution/src/main/resources/org/apache/beam/model/fnexecution/v1/standard_coders.yaml ## @@ -384,3 +384,31 @@ nested: false examples: "\x02\x01\x02\x01": {f_bool: True, f_bytes: null} "\x02\x00\x00\x04ab\x00c": {f_bool: False, f_bytes: "ab\0c"} + +--- + +# Binary data generated with the python SDK: +# +# import typing +# import apache_beam as beam +# class Test(typing.NamedTuple): +# f_map: typing.Mapping[str,int] +# schema = beam.typehints.schemas.named_tuple_to_schema(Test) +# coder = beam.coders.row_coder.RowCoder(schema) +# print("payload = %s" % schema.SerializeToString()) +# examples = (Test(f_map={}), +# Test(f_map={"foo": 9001, "bar": 9223372036854775807}), +# Test(f_map={"everything": None, "is": None, "null!": None, "¯\_(ツ)_/¯": None})) +# for example in examples: +# print("example = %s" % coder.encode(example)) +coder: + urn: "beam:coder:row:v1" + # f_map: map + payload: "\n\x15\n\x05f_map\x1a\x0c*\n\n\x02\x10\x07\x12\x04\x08\x01\x10\x04\x12$d8c8f969-14e6-457f-a8b5-62a1aec7f1cd" + # map ordering is non-deterministic + non_deterministic: True +nested: false Review comment: Robert was saying that the coders used to encode each attribute value are always nested. In the Java implementation we always call the version of `encode` which does not accept a context (as I understand it most implementations default to the nested encoding in this case): https://github.com/apache/beam/blob/1bf60b99741ff4d0a8b88e93f5241379e1f3962f/sdks/java/core/src/main/java/org/apache/beam/sdk/coders/RowCoderGenerator.java#L267-L272 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] TheNeuralBit commented on a change in pull request #12426: [BEAM-7996] Add support for MapType and Nulls in container types for Python RowCoder
TheNeuralBit commented on a change in pull request #12426: URL: https://github.com/apache/beam/pull/12426#discussion_r465180802 ## File path: model/fn-execution/src/main/resources/org/apache/beam/model/fnexecution/v1/standard_coders.yaml ## @@ -384,3 +384,31 @@ nested: false examples: "\x02\x01\x02\x01": {f_bool: True, f_bytes: null} "\x02\x00\x00\x04ab\x00c": {f_bool: False, f_bytes: "ab\0c"} + +--- + +# Binary data generated with the python SDK: +# +# import typing +# import apache_beam as beam +# class Test(typing.NamedTuple): +# f_map: typing.Mapping[str,int] +# schema = beam.typehints.schemas.named_tuple_to_schema(Test) +# coder = beam.coders.row_coder.RowCoder(schema) +# print("payload = %s" % schema.SerializeToString()) +# examples = (Test(f_map={}), +# Test(f_map={"foo": 9001, "bar": 9223372036854775807}), +# Test(f_map={"everything": None, "is": None, "null!": None, "¯\_(ツ)_/¯": None})) +# for example in examples: +# print("example = %s" % coder.encode(example)) +coder: + urn: "beam:coder:row:v1" + # f_map: map + payload: "\n\x15\n\x05f_map\x1a\x0c*\n\n\x02\x10\x07\x12\x04\x08\x01\x10\x04\x12$d8c8f969-14e6-457f-a8b5-62a1aec7f1cd" + # map ordering is non-deterministic + non_deterministic: True +nested: false Review comment: Robert was saying that the coders used to encode each attribute value are always nested. In the Java implementation we always call the version of `encode` which does not accept a context (as I understand it most implementations default to the nested encoding in this case): ``` https://github.com/apache/beam/blob/1bf60b99741ff4d0a8b88e93f5241379e1f3962f/sdks/java/core/src/main/java/org/apache/beam/sdk/coders/RowCoderGenerator.java#L267-L272 ``` This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] TheNeuralBit commented on a change in pull request #12426: [BEAM-7996] Add support for MapType and Nulls in container types for Python RowCoder
TheNeuralBit commented on a change in pull request #12426: URL: https://github.com/apache/beam/pull/12426#discussion_r464732307 ## File path: model/pipeline/src/main/proto/beam_runner_api.proto ## @@ -855,10 +855,21 @@ message StandardCoders { // BOOLEAN: beam:coder:bool:v1 // BYTES: beam:coder:bytes:v1 // ArrayType: beam:coder:iterable:v1 (always has a known length) -// MapType: not yet a standard coder (BEAM-7996) +// MapType: not a standard coder, specification defined below. // RowType: beam:coder:row:v1 // LogicalType: Uses the coder for its representation. // +// The MapType is encoded by: +// - An INT32 representing the size of the map (N) +// - Followed by N interleaved keys and values, encoded with their +// corresponding coder. +// +// Nullable types in container types (ArrayType, MapType) are encoded by: +// - A one byte null indicator, 0x00 for null values, or 0x01 for present +// values. +// - For present values the null indicator is followed by the value +// encoded with it's corresponding coder. +// Review comment: That's right. Nullability is represented on the FieldType: https://github.com/apache/beam/blob/d5a6c2f0ceb7b3e609ba8d97e523b3bded7d0459/model/pipeline/src/main/proto/schema.proto#L47 We _could_ document that schemas should not accept null keys/values in `MapType` and reject them in the SDKs. But there would still be an unused nullable field on the key and value FieldType: https://github.com/apache/beam/blob/d5a6c2f0ceb7b3e609ba8d97e523b3bded7d0459/model/pipeline/src/main/proto/schema.proto#L79-L82 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] TheNeuralBit commented on a change in pull request #12426: [BEAM-7996] Add support for MapType and Nulls in container types for Python RowCoder
TheNeuralBit commented on a change in pull request #12426: URL: https://github.com/apache/beam/pull/12426#discussion_r463311197 ## File path: model/pipeline/src/main/proto/beam_runner_api.proto ## @@ -855,10 +855,21 @@ message StandardCoders { // BOOLEAN: beam:coder:bool:v1 // BYTES: beam:coder:bytes:v1 // ArrayType: beam:coder:iterable:v1 (always has a known length) -// MapType: not yet a standard coder (BEAM-7996) +// MapType: not a standard coder, specification defined below. // RowType: beam:coder:row:v1 // LogicalType: Uses the coder for its representation. // +// The MapType is encoded by: +// - An INT32 representing the size of the map (N) +// - Followed by N interleaved keys and values, encoded with their +// corresponding coder. +// +// Nullable types in container types (ArrayType, MapType) are encoded by: +// - A one byte null indicator, 0x00 for null values, or 0x01 for present +// values. +// - For present values the null indicator is followed by the value +// encoded with it's corresponding coder. +// Review comment: This is specifically for nullable types that are elements of an Array or keys/values of a Map. For rows we encode nulls in a separate bitmask: https://github.com/apache/beam/blob/587dde57cbb2b0095a1fa04b59798d1b62c66f18/model/pipeline/src/main/proto/beam_runner_api.proto#L837-L843 Java schemas will already let you use nullable types for keys and values in Maps. I doubt anyone is relying on it.. but there's a risk if we disallow it now. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] TheNeuralBit commented on a change in pull request #12426: [BEAM-7996] Add support for MapType and Nulls in container types for Python RowCoder
TheNeuralBit commented on a change in pull request #12426: URL: https://github.com/apache/beam/pull/12426#discussion_r463311197 ## File path: model/pipeline/src/main/proto/beam_runner_api.proto ## @@ -855,10 +855,21 @@ message StandardCoders { // BOOLEAN: beam:coder:bool:v1 // BYTES: beam:coder:bytes:v1 // ArrayType: beam:coder:iterable:v1 (always has a known length) -// MapType: not yet a standard coder (BEAM-7996) +// MapType: not a standard coder, specification defined below. // RowType: beam:coder:row:v1 // LogicalType: Uses the coder for its representation. // +// The MapType is encoded by: +// - An INT32 representing the size of the map (N) +// - Followed by N interleaved keys and values, encoded with their +// corresponding coder. +// +// Nullable types in container types (ArrayType, MapType) are encoded by: +// - A one byte null indicator, 0x00 for null values, or 0x01 for present +// values. +// - For present values the null indicator is followed by the value +// encoded with it's corresponding coder. +// Review comment: This is specifically for nullable types that are elements of an Array or keys/values of a Map. For rows we nulls in a separate bitmask: https://github.com/apache/beam/blob/587dde57cbb2b0095a1fa04b59798d1b62c66f18/model/pipeline/src/main/proto/beam_runner_api.proto#L837-L843 Java schemas will already let you use nullable types for keys and values in Maps. I doubt anyone is relying on it.. but there's a risk if we disallow it now. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] TheNeuralBit commented on a change in pull request #12426: [BEAM-7996] Add support for MapType and Nulls in container types for Python RowCoder
TheNeuralBit commented on a change in pull request #12426: URL: https://github.com/apache/beam/pull/12426#discussion_r463308693 ## File path: sdks/python/apache_beam/coders/coder_impl.py ## @@ -530,6 +530,88 @@ def estimate_size(self, unused_value, nested=False): return 1 +class MapCoderImpl(StreamCoderImpl): + """For internal use only; no backwards-compatibility guarantees. + + A coder for typing.Mapping objects.""" + def __init__( + self, + key_coder, # type: CoderImpl + value_coder # type: CoderImpl + ): +self._key_coder = key_coder +self._value_coder = value_coder + + def encode_to_stream(self, value, out, nested): +size = len(value) +out.write_bigendian_int32(size) Review comment: Yeah I thought about that too. I was just trying to exactly replicate what we do in [MapCoder.java] (https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/coders/MapCoder.java) for now though. Do you think I should switch this and MapCoder.java over to using varint here? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] TheNeuralBit commented on a change in pull request #12426: [BEAM-7996] Add support for MapType and Nulls in container types for Python RowCoder
TheNeuralBit commented on a change in pull request #12426: URL: https://github.com/apache/beam/pull/12426#discussion_r463308917 ## File path: sdks/python/apache_beam/coders/coder_impl.py ## @@ -530,6 +530,88 @@ def estimate_size(self, unused_value, nested=False): return 1 +class MapCoderImpl(StreamCoderImpl): + """For internal use only; no backwards-compatibility guarantees. + + A coder for typing.Mapping objects.""" + def __init__( + self, + key_coder, # type: CoderImpl + value_coder # type: CoderImpl + ): +self._key_coder = key_coder +self._value_coder = value_coder + + def encode_to_stream(self, value, out, nested): +size = len(value) +out.write_bigendian_int32(size) +for i, kv in enumerate(value.items()): + key, value = kv + last = i == size - 1 Review comment: Same comment as above, this was just an attempt to replicate the Java logic. I could change them both if you think we should This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] TheNeuralBit commented on a change in pull request #12426: [BEAM-7996] Add support for MapType and Nulls in container types for Python RowCoder
TheNeuralBit commented on a change in pull request #12426: URL: https://github.com/apache/beam/pull/12426#discussion_r463308693 ## File path: sdks/python/apache_beam/coders/coder_impl.py ## @@ -530,6 +530,88 @@ def estimate_size(self, unused_value, nested=False): return 1 +class MapCoderImpl(StreamCoderImpl): + """For internal use only; no backwards-compatibility guarantees. + + A coder for typing.Mapping objects.""" + def __init__( + self, + key_coder, # type: CoderImpl + value_coder # type: CoderImpl + ): +self._key_coder = key_coder +self._value_coder = value_coder + + def encode_to_stream(self, value, out, nested): +size = len(value) +out.write_bigendian_int32(size) Review comment: Yeah I thought about that too. I was just trying to exactly replicate what we do in [MapCoder.java](https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/coders/MapCoder.java) for now though. Do you think I should switch this and MapCoder.java over to using varint here? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] TheNeuralBit commented on a change in pull request #12426: [BEAM-7996] Add support for MapType and Nulls in container types for Python RowCoder
TheNeuralBit commented on a change in pull request #12426: URL: https://github.com/apache/beam/pull/12426#discussion_r463194957 ## File path: model/pipeline/src/main/proto/beam_runner_api.proto ## @@ -855,10 +855,21 @@ message StandardCoders { // BOOLEAN: beam:coder:bool:v1 // BYTES: beam:coder:bytes:v1 // ArrayType: beam:coder:iterable:v1 (always has a known length) -// MapType: not yet a standard coder (BEAM-7996) +// MapType: not a standard coder, specification defined below. // RowType: beam:coder:row:v1 // LogicalType: Uses the coder for its representation. // +// The MapType is encoded by: +// - An INT32 representing the size of the map (N) +// - Followed by N interleaved keys and values, encoded with their +// corresponding coder. +// +// Nullable types in container types (ArrayType, MapType) are encoded by: +// - A one byte null indicator, 0x00 for null values, or 0x01 for present +// values. +// - For present values the null indicator is followed by the value +// encoded with it's corresponding coder. +// Review comment: FYI @lostluck - previously the encoding for MapType and for nullable types within containers was not documented. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] TheNeuralBit commented on a change in pull request #12426: [BEAM-7996] Add support for MapType and Nulls in container types for Python RowCoder
TheNeuralBit commented on a change in pull request #12426: URL: https://github.com/apache/beam/pull/12426#discussion_r463194527 ## File path: runners/core-construction-java/src/test/java/org/apache/beam/runners/core/construction/CommonCoderTest.java ## @@ -366,12 +368,15 @@ private static Object parseField(Object value, Schema.FieldType fieldType) { .map((element) -> parseField(element, fieldType.getCollectionElementType())) .collect(toImmutableList()); case MAP: -Map kvMap = (Map) value; -return kvMap.entrySet().stream() -.collect( -toImmutableMap( -(pair) -> parseField(pair.getKey(), fieldType.getMapKeyType()), -(pair) -> parseField(pair.getValue(), fieldType.getMapValueType(; +Map kvMap = new HashMap<>(); +((Map) value) +.entrySet().stream() +.forEach( +(entry) -> +kvMap.put( +parseField(entry.getKey(), fieldType.getMapKeyType()), +parseField(entry.getValue(), fieldType.getMapValueType(; +return kvMap; Review comment: This change is necessary because `ImmutableMap` (as well as `Collectors.toMap`) does not allow null values, so it errors on the new test cases. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org