I'm trying to write a relatively simple plan using the table API, and I'm
getting horrific performance on my joins.  I discovered after looking at
the execution plan in the web UI that a number of the joins had NoUniqueKey
on one or both sides of the join.  I couldn't understand this as all of my
tables should have had well defined unique keys at all times.

However, diving deeper I've discovered that doing any kind of select
operation strips my table of its primary key, and I can't fathom why it
would do that.

For instsance, I have a GROUPS table that has a schema like this...

root
 |-- id: BIGINT NOT NULL
 |-- uuid: STRING
 |-- name: STRING
 |-- has_children: BOOLEAN
 |-- organization_id: BIGINT
 |-- CONSTRAINT PK_3386 PRIMARY KEY (id)

When I execute  *groups = groups.select($("id"),$("organization_id")); *the
output suddenly turns into this...

root
 |-- id: BIGINT NOT NULL
 |-- organization_id: BIGINT

The original PK column is being selected, so why would the constraint not
be passed along to the new table, ensuring that a subsequent join would
have optimal performance.

In this case I could work around the problem, but in my real-world use case
I need to do a select passing multiple columns to a scalar function which
then outputs one of two columns depending on the value of a third (yes,
I've made a transistor).

If there's no way of getting a select to preserve a pre-existing PK, is
there any way i can explicitly tell a new derived table that it should
treat one or more columns as it's primary key?

regards,
Brad

Reply via email to