Let's say I have a list of user IDs in a PCollection<String> called userIds,
and I want to run "select * from table where userId=?" for each user. What is
the recommended approach for doing this in Beam?
I have seen approaches where we first turn userIds into a
PCollection<ReadOperation>, then feed them to SpannerIO. Something like
this:
    PCollection<String> userIds = ...
    userIds.apply(
        MapElements.into(TypeDescriptor.of(ReadOperation.class))
            .via((SerializableFunction<String, ReadOperation>) userId ->
                // Bind the ID as a query parameter rather than
                // concatenating it into the SQL string.
                ReadOperation.create().withQuery(
                    Statement.newBuilder(
                            "select * from table where user_id = @userId")
                        .bind("userId").to(userId)
                        .build())))
        .apply(SpannerIO.readAll().withSpannerConfig(spannerConfig));
But is this the most scalable approach? It looks like we will be sending
one SQL statement per user to Spanner, as opposed to doing an actual
batch query like "select * from table where userId in {...userId
list...}".
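To make the batch idea concrete, here is a rough, stdlib-only sketch of what I
have in mind: split the IDs into fixed-size batches and issue one query per
batch, with the whole batch bound as a single array parameter. The class name
UserIdBatches, the batch size, and the query text are just illustrative. I am
assuming that inside a pipeline something like GroupIntoBatches could do the
grouping and that each batch could be bound with the Spanner client's
Statement builder (e.g. bind("userIds").toStringArray(batch)), but I have not
tried that, so please treat it as a guess rather than a working recipe.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Stdlib-only sketch of the batching idea; no Beam or Spanner classes are used.
final class UserIdBatches {
    // One statement per batch, with the batch bound as one array parameter.
    // "table" and "user_id" are the same placeholders as in the question.
    static final String BATCH_QUERY =
        "select * from table where user_id in unnest(@userIds)";

    // Splits ids into consecutive batches of at most batchSize elements.
    static List<List<String>> partition(List<String> ids, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<String>> batches = new ArrayList<>();
        for (int start = 0; start < ids.size(); start += batchSize) {
            int end = Math.min(start + batchSize, ids.size());
            batches.add(new ArrayList<>(ids.subList(start, end)));
        }
        return batches;
    }

    public static void main(String[] args) {
        List<String> userIds = Arrays.asList("u1", "u2", "u3", "u4", "u5");
        for (List<String> batch : partition(userIds, 2)) {
            // In a pipeline, each batch would become a single read request.
            System.out.println(BATCH_QUERY + "  <-  @userIds = " + batch);
        }
    }
}
```

With N user IDs and a batch size of B this would send roughly N/B statements
instead of N, which is the trade-off I am asking about.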
Thanks,
Nick