GitHub user henryr commented on the issue:
https://github.com/apache/spark/pull/19683
My guess is that it's safe to do so in our case because of the immediate
projection that happens. In general, emitting JoinedRows where the RHS row is
shared between all JoinedRows could be a problem if some operator mutates that
RHS row. There may be other issues that I'm not aware of (perhaps memory
management concerns about transferring resources between operators).
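
To make the aliasing concern concrete, here is a toy Python sketch. It is not
Spark code: `JoinedRow`, `generate`, and `project` below are hypothetical
stand-ins that only model "rows holding a reference to a shared RHS" and "a
projection that copies fields out". It assumes the projection copies the
values it needs, which is the property that would make the immediate
projection safe.

```python
from dataclasses import dataclass
from typing import Iterator, List, Tuple


@dataclass
class JoinedRow:
    """Toy stand-in for a joined row: holds references, copies nothing."""
    left: List[int]
    right: List[int]


def generate(left_rows: List[List[int]], shared_right: List[int]) -> Iterator[JoinedRow]:
    # Every emitted JoinedRow aliases the same right-hand list.
    for left in left_rows:
        yield JoinedRow(left, shared_right)


def project(row: JoinedRow) -> Tuple[int, int]:
    # Copies the needed fields out into an immutable tuple.
    return (row.left[0], row.right[0])


left_rows = [[1], [2]]

# Case 1: project each row as soon as it is emitted, then mutate the RHS.
right = [10]
projected = [project(r) for r in generate(left_rows, right)]
right[0] = 99  # later mutation cannot affect the already-copied values

# Case 2: buffer the JoinedRows, mutate the shared RHS, then project.
right = [10]
buffered = list(generate(left_rows, right))
right[0] = 99  # every buffered row observes this change
late = [project(r) for r in buffered]

print(projected)
print(late)
```

In the first case the projected values are fixed before the mutation; in the
second, every buffered row reflects the mutated RHS. Whether any real operator
buffers or mutates rows this way is exactly the open question.
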
@davies @cloud-fan you collaborated on SPARK-13476. Do you have any
input on whether it's safe for us to skip the projection inside the generator
if we know the next operator will immediately do a projection? (For context:
projecting all JoinedRows into UnsafeRows inside a generator is very slow
when the input row is huge, e.g. when it contains a large array, and that's
usually wasted work because most of the input row gets projected out
immediately anyway.)