Re: Fixing equality of Rows

2018-10-29 Thread Rui Wang
I might misunderstand what portability means in Beam. If portability is designed so that each SDK has its own representation of something, which is then converted to a portable representation, then wrapping byte[] in an object is fine.
-Rui
On Mon, Oct 29, 2018 at 11:26 AM Gleb Kanterov wrote:

Re: Fixing equality of Rows

2018-10-29 Thread Gleb Kanterov
Rui, I'm not completely sure I understand why it isn't possible to find a suitable encoding for portability. As I understand it, the only requirement is a deterministic encoding consistent with equality, so the existing representation of BYTES will work (VarInt followed by bytes). In my understanding, it's
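To illustrate the "VarInt followed by bytes" idea above, here is a minimal sketch of a length-prefixed encoding. It is not Beam's coder code: the class name `LengthPrefixedBytes`, the method `encode`, and the choice of a base-128 varint (7 bits per byte, low-order group first) for the length are all assumptions made for the example. The point it shows is that arrays with equal contents always encode to identical bytes, so the encoding is deterministic and consistent with structural equality.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;

// Hypothetical illustration only; not taken from the Beam code base.
public class LengthPrefixedBytes {

    // Encodes value as: varint(length) followed by the raw bytes.
    // The varint layout (7 bits per byte, high bit = "more bytes follow")
    // is an assumption for this sketch.
    static byte[] encode(byte[] value) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int n = value.length;
        while ((n & ~0x7F) != 0) {
            out.write((n & 0x7F) | 0x80); // low 7 bits, continuation bit set
            n >>>= 7;
        }
        out.write(n);                      // final group, continuation bit clear
        out.write(value, 0, value.length); // payload
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] a = {1, 2, 3};
        byte[] b = {1, 2, 3};
        // Distinct array objects with equal contents produce equal encodings.
        System.out.println(Arrays.equals(encode(a), encode(b))); // true
        System.out.println(Arrays.toString(encode(a)));          // [3, 1, 2, 3]
    }
}
```

Because the output depends only on the array contents, comparing encoded forms gives the same answer as `Arrays.equals` on the original values.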

Re: Fixing equality of Rows

2018-10-29 Thread Lukasz Cwik
I believe Kenn is spot on. The focus of the issue is too narrow, as you're talking about the short-term problem related to Map. Schemas are very similar to coders, and coders have been solving this problem by delegating to the underlying component coder to figure out whether two things are equal. You

Re: Fixing equality of Rows

2018-10-29 Thread Rui Wang
It seems to me that only Map's equality check cannot be solved by deepEquals, because keys cannot be looked up correctly in a Map. If we cannot find a useful use case for Map, we could reject it in Schema and still keep byte[]. Option 3 needs to find a wrapper for byte[] that is language-independent

Re: Fixing equality of Rows

2018-10-29 Thread Gleb Kanterov
There is an indirect connection to RowCoder because `MapCoder` isn't deterministic; therefore, this doesn't hold:
> - also each type (hence Row type) should have portable encoding(s) that respect this equality so shuffling is consistent
I think it's a requirement only for rows we want to

Re: Fixing equality of Rows

2018-10-29 Thread Anton Kedin
About these specific use cases, how useful is it to support Map and List? These seem pretty exotic (maybe they aren't) and I wonder whether it would make sense to just reject them until we have a solid design. And wouldn't the same problems arise even without RowCoder? Is the path in that case to

Re: Fixing equality of Rows

2018-10-29 Thread Kenneth Knowles
I'll summarize my input to the discussion. It is rather high level. But IMO:
- even though schemas are part of Beam Java today, I think they should become part of portability when ready
- so each type in a schema needs a language-independent & encoding-independent notion of domain of values and

Fixing equality of Rows

2018-10-29 Thread Gleb Kanterov
By adding the BYTES type, we broke equality. `RowCoder#consistentWithEquals` is always true, but this property doesn't hold for exotic types such as `Map`, `List`. The root cause is `byte[]`, where `equals` is implemented as reference equality instead of structural equality. Before we jump into solution
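The root cause described in this message can be reproduced in plain Java, without Beam. The sketch below is only an illustration: the class `ByteArrayEquality` and its helper methods are hypothetical names, and `ByteBuffer` is used as one possible wrapper with structural equality, which is an assumption for the example and not necessarily the wrapper the thread settles on.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only; not taken from the Beam code base.
public class ByteArrayEquality {

    // byte[] inherits Object#equals, which compares references.
    static boolean referenceEquals(byte[] a, byte[] b) {
        return a.equals(b);
    }

    // Arrays.equals compares contents element by element.
    static boolean structuralEquals(byte[] a, byte[] b) {
        return Arrays.equals(a, b);
    }

    // List#equals delegates to element equals, so it inherits the problem.
    static boolean listEquals(byte[] a, byte[] b) {
        return Collections.singletonList(a).equals(Collections.singletonList(b));
    }

    // A HashMap keyed by byte[] cannot find an equal-content key.
    static boolean rawKeyFound(byte[] key, byte[] probe) {
        Map<byte[], String> map = new HashMap<>();
        map.put(key, "value");
        return map.containsKey(probe);
    }

    // Wrapping the key in ByteBuffer gives content-based equals and hashCode.
    static boolean wrappedKeyFound(byte[] key, byte[] probe) {
        Map<ByteBuffer, String> map = new HashMap<>();
        map.put(ByteBuffer.wrap(key), "value");
        return map.containsKey(ByteBuffer.wrap(probe));
    }

    public static void main(String[] args) {
        byte[] a = {1, 2, 3};
        byte[] b = {1, 2, 3};
        System.out.println(referenceEquals(a, b));  // false
        System.out.println(structuralEquals(a, b)); // true
        System.out.println(listEquals(a, b));       // false
        System.out.println(rawKeyFound(a, b));      // false
        System.out.println(wrappedKeyFound(a, b));  // true
    }
}
```

The list and map cases show why a deep comparison of the top-level value alone is not enough: collections call `equals` and `hashCode` on their elements and keys, so a raw `byte[]` nested inside them still compares by reference.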