[GitHub] [beam] lostluck commented on a change in pull request #12426: [BEAM-7996] Add support for MapType and Nulls in container types for Python RowCoder

2020-08-04 Thread GitBox


lostluck commented on a change in pull request #12426:
URL: https://github.com/apache/beam/pull/12426#discussion_r465224502



##
File path: model/fn-execution/src/main/resources/org/apache/beam/model/fnexecution/v1/standard_coders.yaml
##
@@ -384,3 +384,31 @@ nested: false
 examples:
   "\x02\x01\x02\x01": {f_bool: True, f_bytes: null}
   "\x02\x00\x00\x04ab\x00c": {f_bool: False, f_bytes: "ab\0c"}
+
+---
+
+# Binary data generated with the python SDK:
+#
+# import typing
+# import apache_beam as beam
+# class Test(typing.NamedTuple):
+#   f_map: typing.Mapping[str,int]
+# schema = beam.typehints.schemas.named_tuple_to_schema(Test)
+# coder = beam.coders.row_coder.RowCoder(schema)
+# print("payload = %s" % schema.SerializeToString())
+# examples = (Test(f_map={}),
+# Test(f_map={"foo": 9001, "bar": 9223372036854775807}),
+# Test(f_map={"everything": None, "is": None, "null!": None, "¯\_(ツ)_/¯": None}))
+# for example in examples:
+#   print("example = %s" % coder.encode(example))
+coder:
+  urn: "beam:coder:row:v1"
+  # f_map: map
+  payload: "\n\x15\n\x05f_map\x1a\x0c*\n\n\x02\x10\x07\x12\x04\x08\x01\x10\x04\x12$d8c8f969-14e6-457f-a8b5-62a1aec7f1cd"
+  # map ordering is non-deterministic
+  non_deterministic: True
+nested: false

Review comment:
   Not specifying it at all would be ideal, since that avoids the hack: the SDK can then run only the "nested" tests, just as it does for the other values.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] lostluck commented on a change in pull request #12426: [BEAM-7996] Add support for MapType and Nulls in container types for Python RowCoder

2020-08-04 Thread GitBox


lostluck commented on a change in pull request #12426:
URL: https://github.com/apache/beam/pull/12426#discussion_r465223398



##
File path: model/pipeline/src/main/proto/beam_runner_api.proto
##
@@ -855,10 +855,21 @@ message StandardCoders {
 // BOOLEAN:   beam:coder:bool:v1
 // BYTES: beam:coder:bytes:v1
 //   ArrayType:   beam:coder:iterable:v1 (always has a known length)
-//   MapType: not yet a standard coder (BEAM-7996)
+//   MapType: not a standard coder, specification defined below.
 //   RowType: beam:coder:row:v1
 //   LogicalType: Uses the coder for its representation.
 //
+// The MapType is encoded by:
+//   - An INT32 representing the size of the map (N)
+//   - Followed by N interleaved keys and values, encoded with their
+// corresponding coder.
+//
+// Nullable types in container types (ArrayType, MapType) are encoded by:
+//   - A one byte null indicator, 0x00 for null values, or 0x01 for present
+// values.
+//   - For present values the null indicator is followed by the value
+// encoded with it's corresponding coder.
+//

Review comment:
   @robertwb Your comment fed my own misunderstanding. It is possible to declare a map in schemas as not having null keys/values, but not necessarily on the SDK side.
   Technically, there's no reason that the SDK can't use a version of the container with non-nullable contents if the Key and Value components are not themselves marked as nullable. IIRC, the Java SDK *could* convert ImmutableMaps or similar into just non-nullable Key and non-nullable Value types.
   The issue, as I understand it, is that the limitation is on the SDK language side rather than the schema specification side: as discussed, schema fields can individually have their nullable bits set.
   E.g. Go doesn't have this ambiguity for map types.
   On the other hand, in Go, Iterable/array types, which will be represented by slices, *will* have this ambiguity when used as a field, as they can be nil and could also still be pointers to said reference types. That ambiguity is well known enough that pointers to reference types (maps, slices, chans...) are strongly discouraged in idiomatic Go.
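
   The map layout quoted in the diff above (a big-endian INT32 size, then interleaved keys and values, with a one-byte null indicator in front of nullable values) can be sketched in Python roughly as follows. This is a minimal sketch and not the Beam implementation: the helper names are hypothetical, and the choice of a varint-length-prefixed UTF-8 key plus a varint integer value is an assumption made to match the example bytes added to standard_coders.yaml in this PR. The two-byte row header (`\x01\x00`) from those examples is not produced here.

```python
import struct
from typing import Mapping, Optional


def encode_varint(n: int) -> bytes:
    """Base-128 varint for a non-negative integer (negatives are out of scope)."""
    if n < 0:
        raise ValueError("this sketch only handles non-negative integers")
    out = bytearray()
    while True:
        bits = n & 0x7F
        n >>= 7
        if n:
            out.append(bits | 0x80)  # more bytes follow
        else:
            out.append(bits)
            return bytes(out)


def encode_str(s: str) -> bytes:
    """Assumed key coder: varint byte length followed by UTF-8 bytes."""
    data = s.encode("utf-8")
    return encode_varint(len(data)) + data


def encode_map(m: Mapping[str, Optional[int]]) -> bytes:
    """INT32 size, then N interleaved keys and (null-indicated) values."""
    out = bytearray(struct.pack(">i", len(m)))
    for key, value in m.items():
        out += encode_str(key)
        if value is None:
            out.append(0x00)  # null indicator: value absent
        else:
            out.append(0x01)  # null indicator: value present
            out += encode_varint(value)
    return bytes(out)
```

   Iteration follows the mapping's own order, so two maps with the same contents but different insertion order encode to different bytes, which is the non-determinism flagged in the YAML.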









[GitHub] [beam] lostluck commented on a change in pull request #12426: [BEAM-7996] Add support for MapType and Nulls in container types for Python RowCoder

2020-08-04 Thread GitBox


lostluck commented on a change in pull request #12426:
URL: https://github.com/apache/beam/pull/12426#discussion_r465223398



##
File path: model/pipeline/src/main/proto/beam_runner_api.proto
##
@@ -855,10 +855,21 @@ message StandardCoders {
 // BOOLEAN:   beam:coder:bool:v1
 // BYTES: beam:coder:bytes:v1
 //   ArrayType:   beam:coder:iterable:v1 (always has a known length)
-//   MapType: not yet a standard coder (BEAM-7996)
+//   MapType: not a standard coder, specification defined below.
 //   RowType: beam:coder:row:v1
 //   LogicalType: Uses the coder for its representation.
 //
+// The MapType is encoded by:
+//   - An INT32 representing the size of the map (N)
+//   - Followed by N interleaved keys and values, encoded with their
+// corresponding coder.
+//
+// Nullable types in container types (ArrayType, MapType) are encoded by:
+//   - A one byte null indicator, 0x00 for null values, or 0x01 for present
+// values.
+//   - For present values the null indicator is followed by the value
+// encoded with it's corresponding coder.
+//

Review comment:
   @robertwb Your comment fed my own misunderstanding. It is possible to declare a map as not having null keys/values.
   Technically, there's no reason that the SDK can't use a version of the container with non-nullable contents if the Key and Value components are not themselves marked as nullable. IIRC, the Java SDK *could* convert ImmutableMaps or similar into just non-nullable Key and non-nullable Value types.
   The issue, as I understand it, is that the limitation is on the SDK language side rather than the schema specification side: as discussed, schema fields can individually have their nullable bits set.
   E.g. Go doesn't have this ambiguity for map types.
   On the other hand, in Go, Iterable/array types, which will be represented by slices, *will* have this ambiguity when used as a field, as they can be nil and could also still be pointers to said reference types. That ambiguity is well known enough that pointers to reference types (maps, slices, chans...) are strongly discouraged in idiomatic Go.









[GitHub] [beam] lostluck commented on a change in pull request #12426: [BEAM-7996] Add support for MapType and Nulls in container types for Python RowCoder

2020-08-04 Thread GitBox


lostluck commented on a change in pull request #12426:
URL: https://github.com/apache/beam/pull/12426#discussion_r465202931



##
File path: model/pipeline/src/main/proto/beam_runner_api.proto
##
@@ -855,10 +855,21 @@ message StandardCoders {
 // BOOLEAN:   beam:coder:bool:v1
 // BYTES: beam:coder:bytes:v1
 //   ArrayType:   beam:coder:iterable:v1 (always has a known length)
-//   MapType: not yet a standard coder (BEAM-7996)
+//   MapType: not a standard coder, specification defined below.
 //   RowType: beam:coder:row:v1
 //   LogicalType: Uses the coder for its representation.
 //
+// The MapType is encoded by:
+//   - An INT32 representing the size of the map (N)
+//   - Followed by N interleaved keys and values, encoded with their
+// corresponding coder.
+//
+// Nullable types in container types (ArrayType, MapType) are encoded by:
+//   - A one byte null indicator, 0x00 for null values, or 0x01 for present
+// values.
+//   - For present values the null indicator is followed by the value
+// encoded with it's corresponding coder.
+//

Review comment:
   Also, this requirement means that *all* values must be pointers (or reference types) in Go, as ordinary primitives cannot be nullable. That feels very strange.
   
   EDIT: Please disregard my last comments. I missed that this only applies when the field is specified as nullable.
   I also hadn't realized that the discussion is orthogonal to that (whether to allow nullable map components at all).









[GitHub] [beam] lostluck commented on a change in pull request #12426: [BEAM-7996] Add support for MapType and Nulls in container types for Python RowCoder

2020-08-04 Thread GitBox


lostluck commented on a change in pull request #12426:
URL: https://github.com/apache/beam/pull/12426#discussion_r465213462



##
File path: model/fn-execution/src/main/resources/org/apache/beam/model/fnexecution/v1/standard_coders.yaml
##
@@ -384,3 +384,31 @@ nested: false
 examples:
   "\x02\x01\x02\x01": {f_bool: True, f_bytes: null}
   "\x02\x00\x00\x04ab\x00c": {f_bool: False, f_bytes: "ab\0c"}
+
+---
+
+# Binary data generated with the python SDK:
+#
+# import typing
+# import apache_beam as beam
+# class Test(typing.NamedTuple):
+#   f_map: typing.Mapping[str,int]
+# schema = beam.typehints.schemas.named_tuple_to_schema(Test)
+# coder = beam.coders.row_coder.RowCoder(schema)
+# print("payload = %s" % schema.SerializeToString())
+# examples = (Test(f_map={}),
+# Test(f_map={"foo": 9001, "bar": 9223372036854775807}),
+# Test(f_map={"everything": None, "is": None, "null!": None, "¯\_(ツ)_/¯": None}))
+# for example in examples:
+#   print("example = %s" % coder.encode(example))
+coder:
+  urn: "beam:coder:row:v1"
+  # f_map: map
+  payload: "\n\x15\n\x05f_map\x1a\x0c*\n\n\x02\x10\x07\x12\x04\x08\x01\x10\x04\x12$d8c8f969-14e6-457f-a8b5-62a1aec7f1cd"
+  # map ordering is non-deterministic
+  non_deterministic: True
+nested: false

Review comment:
   Ack. Thank you. I figured as much. If it's not easy to clean up, I'm 
fine with the Go coder test blaming "Legacy Java concerns" in a comment for the 
hackaround.









[GitHub] [beam] lostluck commented on a change in pull request #12426: [BEAM-7996] Add support for MapType and Nulls in container types for Python RowCoder

2020-08-04 Thread GitBox


lostluck commented on a change in pull request #12426:
URL: https://github.com/apache/beam/pull/12426#discussion_r465202931



##
File path: model/pipeline/src/main/proto/beam_runner_api.proto
##
@@ -855,10 +855,21 @@ message StandardCoders {
 // BOOLEAN:   beam:coder:bool:v1
 // BYTES: beam:coder:bytes:v1
 //   ArrayType:   beam:coder:iterable:v1 (always has a known length)
-//   MapType: not yet a standard coder (BEAM-7996)
+//   MapType: not a standard coder, specification defined below.
 //   RowType: beam:coder:row:v1
 //   LogicalType: Uses the coder for its representation.
 //
+// The MapType is encoded by:
+//   - An INT32 representing the size of the map (N)
+//   - Followed by N interleaved keys and values, encoded with their
+// corresponding coder.
+//
+// Nullable types in container types (ArrayType, MapType) are encoded by:
+//   - A one byte null indicator, 0x00 for null values, or 0x01 for present
+// values.
+//   - For present values the null indicator is followed by the value
+// encoded with it's corresponding coder.
+//

Review comment:
   Also, this requirement means that *all* values must be pointers (or 
reference types) in Go as ordinary primitives cannot be nullable. That feels 
very strange.









[GitHub] [beam] lostluck commented on a change in pull request #12426: [BEAM-7996] Add support for MapType and Nulls in container types for Python RowCoder

2020-08-04 Thread GitBox


lostluck commented on a change in pull request #12426:
URL: https://github.com/apache/beam/pull/12426#discussion_r465200619



##
File path: model/fn-execution/src/main/resources/org/apache/beam/model/fnexecution/v1/standard_coders.yaml
##
@@ -384,3 +384,31 @@ nested: false
 examples:
   "\x02\x01\x02\x01": {f_bool: True, f_bytes: null}
   "\x02\x00\x00\x04ab\x00c": {f_bool: False, f_bytes: "ab\0c"}
+
+---
+
+# Binary data generated with the python SDK:
+#
+# import typing
+# import apache_beam as beam
+# class Test(typing.NamedTuple):
+#   f_map: typing.Mapping[str,int]
+# schema = beam.typehints.schemas.named_tuple_to_schema(Test)
+# coder = beam.coders.row_coder.RowCoder(schema)
+# print("payload = %s" % schema.SerializeToString())
+# examples = (Test(f_map={}),
+# Test(f_map={"foo": 9001, "bar": 9223372036854775807}),
+# Test(f_map={"everything": None, "is": None, "null!": None, "¯\_(ツ)_/¯": None}))
+# for example in examples:
+#   print("example = %s" % coder.encode(example))
+coder:
+  urn: "beam:coder:row:v1"
+  # f_map: map
+  payload: "\n\x15\n\x05f_map\x1a\x0c*\n\n\x02\x10\x07\x12\x04\x08\x01\x10\x04\x12$d8c8f969-14e6-457f-a8b5-62a1aec7f1cd"
+  # map ordering is non-deterministic
+  non_deterministic: True
+nested: false
+examples:
+  "\x01\x00\x00\x00\x00\x00": {f_map:{}}
+  "\x01\x00\x00\x00\x00\x02\x03foo\x01\xa9F\x03bar\x01\xff\xff\xff\xff\xff\xff\xff\xff\x7f": {f_map:{"foo": 9001, "bar": 9223372036854775807}}
+  "\x01\x00\x00\x00\x00\x04\neverything\x00\x02is\x00\x05null!\x00\r\xc2\xaf\\_(\xe3\x83\x84)_/\xc2\xaf\x00": {f_map:{"everything":null, "is": null, "null!": null, "¯\\_(ツ)_/¯": null}}

Review comment:
   There needs to be a space between f_map: and {} in all three examples. 
Otherwise it doesn't parse as valid yaml.
   https://yaml.org/spec/1.2/spec.html#id2759963  `Mappings use a colon and 
space  (": ") to mark each key: value pair.`
   
   I copied the example into my local working copy so I could get the code 
going, and the Go YAML parser I'm using is strict.
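
   A quick way to spot this class of problem before a strict parser does is a line-level check such as the sketch below. It is only a heuristic, not a YAML parser: the function name is hypothetical, it blanks out double-quoted strings first so that URNs like "beam:coder:row:v1" are not flagged, and it assumes unquoted keys made of word characters.

```python
import re
from typing import List

# Double-quoted scalars, including backslash escapes such as \x01 or \".
_QUOTED = re.compile(r'"(?:\\.|[^"\\])*"')
# An unquoted key whose colon is immediately followed by a non-space character.
_KEY_NO_SPACE = re.compile(r"(\w+):(?=\S)")


def missing_space_keys(line: str) -> List[str]:
    """Return unquoted mapping keys on this line that lack a space after ':'."""
    stripped = _QUOTED.sub('""', line)  # ignore colons inside quoted strings
    return _KEY_NO_SPACE.findall(stripped)


if __name__ == "__main__":
    bad = r'  "\x01\x00\x00\x00\x00\x00": {f_map:{}}'
    good = r'  "\x01\x00\x00\x00\x00\x00": {f_map: {}}'
    print(missing_space_keys(bad))
    print(missing_space_keys(good))
```

   Running it over each line of the examples block flags `f_map` in all three entries as written in the diff above, and nothing once the space is added.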









[GitHub] [beam] lostluck commented on a change in pull request #12426: [BEAM-7996] Add support for MapType and Nulls in container types for Python RowCoder

2020-08-04 Thread GitBox


lostluck commented on a change in pull request #12426:
URL: https://github.com/apache/beam/pull/12426#discussion_r465200619



##
File path: model/fn-execution/src/main/resources/org/apache/beam/model/fnexecution/v1/standard_coders.yaml
##
@@ -384,3 +384,31 @@ nested: false
 examples:
   "\x02\x01\x02\x01": {f_bool: True, f_bytes: null}
   "\x02\x00\x00\x04ab\x00c": {f_bool: False, f_bytes: "ab\0c"}
+
+---
+
+# Binary data generated with the python SDK:
+#
+# import typing
+# import apache_beam as beam
+# class Test(typing.NamedTuple):
+#   f_map: typing.Mapping[str,int]
+# schema = beam.typehints.schemas.named_tuple_to_schema(Test)
+# coder = beam.coders.row_coder.RowCoder(schema)
+# print("payload = %s" % schema.SerializeToString())
+# examples = (Test(f_map={}),
+# Test(f_map={"foo": 9001, "bar": 9223372036854775807}),
+# Test(f_map={"everything": None, "is": None, "null!": None, "¯\_(ツ)_/¯": None}))
+# for example in examples:
+#   print("example = %s" % coder.encode(example))
+coder:
+  urn: "beam:coder:row:v1"
+  # f_map: map
+  payload: "\n\x15\n\x05f_map\x1a\x0c*\n\n\x02\x10\x07\x12\x04\x08\x01\x10\x04\x12$d8c8f969-14e6-457f-a8b5-62a1aec7f1cd"
+  # map ordering is non-deterministic
+  non_deterministic: True
+nested: false
+examples:
+  "\x01\x00\x00\x00\x00\x00": {f_map:{}}
+  "\x01\x00\x00\x00\x00\x02\x03foo\x01\xa9F\x03bar\x01\xff\xff\xff\xff\xff\xff\xff\xff\x7f": {f_map:{"foo": 9001, "bar": 9223372036854775807}}
+  "\x01\x00\x00\x00\x00\x04\neverything\x00\x02is\x00\x05null!\x00\r\xc2\xaf\\_(\xe3\x83\x84)_/\xc2\xaf\x00": {f_map:{"everything":null, "is": null, "null!": null, "¯\\_(ツ)_/¯": null}}

Review comment:
   There needs to be a space between f_map: and {}. Otherwise it doesn't 
parse as valid yaml.
   https://yaml.org/spec/1.2/spec.html#id2759963
   
   I copied the example into my local working copy so I could get the code 
going, and the Go YAML parser I'm using is strict.









[GitHub] [beam] lostluck commented on a change in pull request #12426: [BEAM-7996] Add support for MapType and Nulls in container types for Python RowCoder

2020-08-04 Thread GitBox


lostluck commented on a change in pull request #12426:
URL: https://github.com/apache/beam/pull/12426#discussion_r465189010



##
File path: model/fn-execution/src/main/resources/org/apache/beam/model/fnexecution/v1/standard_coders.yaml
##
@@ -384,3 +384,31 @@ nested: false
 examples:
   "\x02\x01\x02\x01": {f_bool: True, f_bytes: null}
   "\x02\x00\x00\x04ab\x00c": {f_bool: False, f_bytes: "ab\0c"}
+
+---
+
+# Binary data generated with the python SDK:
+#
+# import typing
+# import apache_beam as beam
+# class Test(typing.NamedTuple):
+#   f_map: typing.Mapping[str,int]
+# schema = beam.typehints.schemas.named_tuple_to_schema(Test)
+# coder = beam.coders.row_coder.RowCoder(schema)
+# print("payload = %s" % schema.SerializeToString())
+# examples = (Test(f_map={}),
+# Test(f_map={"foo": 9001, "bar": 9223372036854775807}),
+# Test(f_map={"everything": None, "is": None, "null!": None, "¯\_(ツ)_/¯": None}))
+# for example in examples:
+#   print("example = %s" % coder.encode(example))
+coder:
+  urn: "beam:coder:row:v1"
+  # f_map: map
+  payload: "\n\x15\n\x05f_map\x1a\x0c*\n\n\x02\x10\x07\x12\x04\x08\x01\x10\x04\x12$d8c8f969-14e6-457f-a8b5-62a1aec7f1cd"
+  # map ordering is non-deterministic
+  non_deterministic: True
+nested: false

Review comment:
   As it stands, this is confusing for SDK authors writing tests against standard_coders.yaml. Now that I've got the Go testing written, I need to explicitly ignore the nested field for the row coders because they're all set to nested: false rather than nested: true.
   
   This is per my thread on the dev list: 
https://lists.apache.org/thread.html/r7da098363e6ce607ce96f9fbedb08f9f4757bedd68846aaeba5dd4f0%40%3Cdev.beam.apache.org%3E
   
   Portability only ever supports nested coders. The semantics of 
standard_coders.yaml say that 
   ```
   #   nested: a boolean meaning whether the coder was used in the nested context. Missing means to
   #   test both contexts, a shorthand for when the coder is invariant across context.
   ```
   
https://github.com/apache/beam/blob/587dde57cbb2b0095a1fa04b59798d1b62c66f18/model/fn-execution/src/main/resources/org/apache/beam/model/fnexecution/v1/standard_coders.yaml#L24
   Meaning that nested: false means that the outer most encoding has the length 
prefix if necessary.
   
   Structurally, there's never a reason for a single schema value to have the wrapping length prefix (it's orthogonal to this aspect of the encoding, as any sub-component is always nested as needed), so it's not included in the various payload examples.
   
   So, I reiterate: why is this nested: false instead of nested: true if the encoding is going to be identical in both contexts?
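
   To make the testing problem concrete, here is a rough Python sketch of how a harness might choose which contexts to run for one standard_coders.yaml entry, following the quoted semantics (missing `nested` means test both). The function names and the dict-shaped spec are hypothetical, and the second function shows the kind of workaround described above for an SDK that only supports the nested context; it is an assumption about how such a harness could look, not code from any Beam SDK.

```python
from typing import Any, List, Mapping

ROW_URN = "beam:coder:row:v1"


def contexts_to_test(spec: Mapping[str, Any]) -> List[bool]:
    """Contexts to exercise per the YAML comment: missing `nested` means both."""
    if "nested" not in spec:
        return [False, True]
    return [bool(spec["nested"])]


def portable_contexts(spec: Mapping[str, Any]) -> List[bool]:
    """Contexts a nested-only SDK would run, with the row-coder workaround."""
    urn = spec.get("coder", {}).get("urn")
    if urn == ROW_URN:
        # Workaround: row specs pin nested: false, but the bytes are the same
        # in both contexts, so ignore the field and run them as nested.
        return [True]
    # Otherwise keep only the nested context, if the spec allows it.
    return [ctx for ctx in contexts_to_test(spec) if ctx]


if __name__ == "__main__":
    row_spec = {"coder": {"urn": ROW_URN}, "nested": False}
    print(contexts_to_test(row_spec))   # what the YAML asks for
    print(portable_contexts(row_spec))  # what a nested-only SDK can run
```

   If the row entries simply omitted `nested`, the special case for `ROW_URN` would not be needed.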
   
   









[GitHub] [beam] lostluck commented on a change in pull request #12426: [BEAM-7996] Add support for MapType and Nulls in container types for Python RowCoder

2020-08-04 Thread GitBox


lostluck commented on a change in pull request #12426:
URL: https://github.com/apache/beam/pull/12426#discussion_r465181033



##
File path: model/pipeline/src/main/proto/beam_runner_api.proto
##
@@ -855,10 +855,21 @@ message StandardCoders {
 // BOOLEAN:   beam:coder:bool:v1
 // BYTES: beam:coder:bytes:v1
 //   ArrayType:   beam:coder:iterable:v1 (always has a known length)
-//   MapType: not yet a standard coder (BEAM-7996)
+//   MapType: not a standard coder, specification defined below.
 //   RowType: beam:coder:row:v1
 //   LogicalType: Uses the coder for its representation.
 //
+// The MapType is encoded by:
+//   - An INT32 representing the size of the map (N)
+//   - Followed by N interleaved keys and values, encoded with their
+// corresponding coder.
+//
+// Nullable types in container types (ArrayType, MapType) are encoded by:
+//   - A one byte null indicator, 0x00 for null values, or 0x01 for present
+// values.
+//   - For present values the null indicator is followed by the value
+// encoded with it's corresponding coder.
+//

Review comment:
   So the verdict here is to ignore the nullability field of a FieldType 
when it's nested in a Map or Array?
   
   That's... unfortunate. Must we bend over backwards to maintain compatibility 
with Java's previous encoding, which wasn't officially a schema encoding until 
this PR? It seems very strange that we're already relying on unspecified 
behaviour.
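
   For reference, here is what a decoder has to do if the null indicator is always present on map values. This is a minimal Python sketch with hypothetical helper names; it assumes the layout visible in the example bytes added to standard_coders.yaml in this PR (varint-length-prefixed UTF-8 keys with no indicator, varint integer values with one), which may not hold for other key or value types.

```python
import struct
from typing import Dict, Optional, Tuple


def read_varint(buf: bytes, pos: int) -> Tuple[int, int]:
    """Read a base-128 varint starting at pos; return (value, new position)."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def decode_map(buf: bytes, pos: int = 0) -> Tuple[Dict[str, Optional[int]], int]:
    """Decode INT32 size, then N (key, null indicator, optional value) entries."""
    (size,) = struct.unpack_from(">i", buf, pos)
    pos += 4
    result: Dict[str, Optional[int]] = {}
    for _ in range(size):
        key_len, pos = read_varint(buf, pos)
        key = buf[pos:pos + key_len].decode("utf-8")
        pos += key_len
        indicator = buf[pos]
        pos += 1
        if indicator == 0x00:
            result[key] = None  # null value: nothing follows the indicator
        elif indicator == 0x01:
            value, pos = read_varint(buf, pos)
            result[key] = value
        else:
            raise ValueError("unexpected null indicator: %r" % indicator)
    return result, pos
```

   The indicator byte is consumed unconditionally, so the decoder never consults the nullability recorded on the value's FieldType.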









[GitHub] [beam] lostluck commented on a change in pull request #12426: [BEAM-7996] Add support for MapType and Nulls in container types for Python RowCoder

2020-08-03 Thread GitBox


lostluck commented on a change in pull request #12426:
URL: https://github.com/apache/beam/pull/12426#discussion_r464647966



##
File path: sdks/python/apache_beam/coders/coder_impl.py
##
@@ -530,6 +530,88 @@ def estimate_size(self, unused_value, nested=False):
 return 1
 
 
+class MapCoderImpl(StreamCoderImpl):
+  """For internal use only; no backwards-compatibility guarantees.
+
+  A coder for typing.Mapping objects."""
+  def __init__(
+  self,
+  key_coder,  # type: CoderImpl
+  value_coder  # type: CoderImpl
+  ):
+self._key_coder = key_coder
+self._value_coder = value_coder
+
+  def encode_to_stream(self, value, out, nested):
+size = len(value)
+out.write_bigendian_int32(size)

Review comment:
   As a practical matter, an int32-max count of anything is more data than our gRPC limits allow for receiving a single value into an SDK. That holds at least until we add a large value protocol to stream in single large values somewhere.
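
   The arithmetic behind that can be sketched as below. The 4 MiB figure is an assumption (gRPC's usual default maximum receive size); the limit Beam actually configures may differ. The per-entry minimum of two bytes assumes an empty-string key (one length byte) and a null value (one indicator byte). All names are hypothetical.

```python
INT32_MAX = 2**31 - 1
# Assumption: gRPC's common default max receive message size, not Beam's setting.
ASSUMED_GRPC_LIMIT_BYTES = 4 * 1024 * 1024
SIZE_PREFIX_BYTES = 4  # the big-endian INT32 map size


def min_encoded_map_bytes(entries: int, min_entry_bytes: int = 2) -> int:
    """Smallest possible encoding of a map with the given number of entries."""
    return SIZE_PREFIX_BYTES + entries * min_entry_bytes


def max_entries_under_limit(limit_bytes: int, min_entry_bytes: int = 2) -> int:
    """Largest entry count whose smallest encoding still fits in limit_bytes."""
    return (limit_bytes - SIZE_PREFIX_BYTES) // min_entry_bytes


if __name__ == "__main__":
    print(min_encoded_map_bytes(INT32_MAX))
    print(max_entries_under_limit(ASSUMED_GRPC_LIMIT_BYTES))
```

   Under these assumptions a map with `INT32_MAX` entries needs roughly 4 GiB at minimum, about a thousand times the assumed message limit, so the INT32 size field is not the binding constraint.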









[GitHub] [beam] lostluck commented on a change in pull request #12426: [BEAM-7996] Add support for MapType and Nulls in container types for Python RowCoder

2020-07-30 Thread GitBox


lostluck commented on a change in pull request #12426:
URL: https://github.com/apache/beam/pull/12426#discussion_r463208404



##
File path: model/pipeline/src/main/proto/beam_runner_api.proto
##
@@ -855,10 +855,21 @@ message StandardCoders {
 // BOOLEAN:   beam:coder:bool:v1
 // BYTES: beam:coder:bytes:v1
 //   ArrayType:   beam:coder:iterable:v1 (always has a known length)
-//   MapType: not yet a standard coder (BEAM-7996)
+//   MapType: not a standard coder, specification defined below.
 //   RowType: beam:coder:row:v1
 //   LogicalType: Uses the coder for its representation.
 //
+// The MapType is encoded by:
+//   - An INT32 representing the size of the map (N)
+//   - Followed by N interleaved keys and values, encoded with their
+// corresponding coder.
+//
+// Nullable types in container types (ArrayType, MapType) are encoded by:
+//   - A one byte null indicator, 0x00 for null values, or 0x01 for present
+// values.
+//   - For present values the null indicator is followed by the value
+// encoded with it's corresponding coder.
+//

Review comment:
   Ack! Thanks!




