I'm trying to write a relatively simple plan using the table API, and I'm getting horrific performance on my joins. I discovered after looking at the execution plan in the web UI that a number of the joins had NoUniqueKey on one or both sides of the join. I couldn't understand this as all of my tables should have had well defined unique keys at all times.
However, diving deeper I've discovered that doing any kind of select operation strips my table of its primary key, and I can't fathom why it would do that. For instsance, I have a GROUPS table that has a schema like this... root |-- id: BIGINT NOT NULL |-- uuid: STRING |-- name: STRING |-- has_children: BOOLEAN |-- organization_id: BIGINT |-- CONSTRAINT PK_3386 PRIMARY KEY (id) When I execute *groups = groups.select($("id"),$("organization_id")); *the output suddenly turns into this... root |-- id: BIGINT NOT NULL |-- organization_id: BIGINT The original PK column is being selected, so why would the constraint not be passed along to the new table, ensuring that a subsequent join would have optimal performance. In this case I could work around the problem, but in my real-world use case I need to do a select passing multiple columns to a scalar function which then outputs one of two columns depending on the value of a third (yes, I've made a transistor). If there's no way of getting a select to preserve a pre-existing PK, is there any way i can explicitly tell a new derived table that it should treat one or more columns as it's primary key? regards, Brad