Hi All,
I have a use case where I need to join a table with an index on itself in
order to find the latest entry. In doing so, I've come a case in Phoenix
(4.14.1) where I appear to get incorrect results if I do a join with
TIMESTAMPs where indexes are involved.
I have a suspicion that this is to down to the fact that TIMESTAMPs are
stored differently on index tables (at least based on what I see in the
result set meta data it is BigDecimal) and somehow conversion is not done
properly before joining. However, I thought I would reach out to check with
those that have more expertise in this area before I continue.
Attached, you can find a standalone program that you can run which
reproduces this issue. Here are the basic steps:
1) Create a table which effectively has 3 representations of the value - as
a TIMESTAMP, a VARCHAR and as a CHAR(29). Create indexes for each of these
representations:
CREATE TABLE IF NOT EXISTS TIME_TEST
(
ID VARCHARNOT NULL,
ENTRY_NAMEVARCHAR,
TRANSACTION TIMESTAMP ,
TRANSACTION_STR VARCHAR,
TRANSACTION_CHR CHAR(29)
CONSTRAINT pk PRIMARY KEY(ID)
)
IMMUTABLE_STORAGE_SCHEME=ONE_CELL_PER_COLUMN,
UPDATE_CACHE_FREQUENCY=90,
COLUMN_ENCODED_BYTES=NONE,
IMMUTABLE_ROWS=true
CREATE LOCAL INDEX TIME_TEST_I01 ON TIME_TEST(ENTRY_NAME, TRANSACTION DESC)
CREATE LOCAL INDEX TIME_TEST_I02 ON TIME_TEST(ENTRY_NAME, TRANSACTION_STR
DESC)
CREATE LOCAL INDEX TIME_TEST_I03 ON TIME_TEST(ENTRY_NAME, TRANSACTION_CHR
DESC)
2) Fill up the table with some values, in the different representations. In
the attached test-case I do 100 entries. Such that you have something like :
IDENTRY_NAME TRANSACTION TRANSACTION_STR
TRANSACTION_CHR
-
001 001 1970-01-01 01:00:00.1 1970-01-01
00:00:00.1 1970-01-01 00:00:00.1
0010001 0010001 1970-01-01 01:00:00.10001 1970-01-01
00:00:00.10001 1970-01-01 00:00:00.10001
0010002 0010002 1970-01-01 01:00:00.10002 1970-01-01
00:00:00.10002 1970-01-01 00:00:00.10002
0010003 0010003 1970-01-01 01:00:00.10003 1970-01-01
00:00:00.10003 1970-01-01 00:00:00.10003
3) Essentially do the same MAX / Join query using each of the different
values (e.g. TIMESTAMP vs. VARCHAR vs. CHAR(29)). They should all return
the same number of rows (e.g. the total number of rows on the table). Only
the one based on VARCHAR does:
SELECT TIME_TEST.ENTRY_NAME FROM TIME_TEST
INNER JOIN
(
SELECT ENTRY_NAME, MAX(TRANSACTION_STR) AS TRANSACTION_STR
FROM TIME_TEST where ENTRY_NAME like '0%' GROUP BY ENTRY_NAME
) sub
ON TIME_TEST.ENTRY_NAME = sub.ENTRY_NAME AND TIME_TEST.TRANSACTION_STR =
sub.TRANSACTION_STR
WHERE TIME_TEST.ENTRY_NAME like '0%'
*TOTAL Rows : 100*
SELECT TIME_TEST.ENTRY_NAME FROM TIME_TEST
INNER JOIN
(
SELECT ENTRY_NAME, MAX(TRANSACTION) AS TRANSACTION
FROM TIME_TEST where ENTRY_NAME like '0%' GROUP BY ENTRY_NAME
) sub
ON TIME_TEST.ENTRY_NAME = sub.ENTRY_NAME AND TIME_TEST.TRANSACTION =
sub.TRANSACTION
WHERE TIME_TEST.ENTRY_NAME like '0%'
*TOTAL Rows : 1*
SELECT TIME_TEST.ENTRY_NAME FROM TIME_TEST
INNER JOIN
(
SELECT ENTRY_NAME, MAX(TRANSACTION_CHR) AS TRANSACTION_CHR
FROM TIME_TEST where ENTRY_NAME like '0%' GROUP BY ENTRY_NAME
) sub
ON TIME_TEST.ENTRY_NAME = sub.ENTRY_NAME AND TIME_TEST.TRANSACTION_CHR=
sub.TRANSACTION_CHR
WHERE TIME_TEST.ENTRY_NAME like '0%'
*TOTAL Rows : 0*
Interestingly enough - in the final query, if I change the inner join
expression to be 'CAST( MAX(TRANSACTION_CHR) AS VARCHAR) AS TRANSACTION_CHR'
it works as expected. Hence, leading me to believe this is some sort of
column type conversion issue.
Either way, would appreciate it if anyone could give me any pointers on the
above.
Thanks,
package com.jsteenkamp.phoenix;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.Statement;
import java.sql.Timestamp;
import java.text.DecimalFormat;
import java.time.Instant;
import java.time.format.DateTimeFormatter;
import java.util.TimeZone;
public class PhoenixTimestampEncodingTest
{
public static final String JDBC_URL = "jdbc:phoenix:localhost:63214";
private static final DateTimeFormatter FORMATTER = DateTimeFormatter.ofPattern("-MM-dd HH:mm:ss.n");
public static final int ITEMS_TO_INSERT = 100;
public static final DecimalFormat decf = new DecimalFormat("000");
public static final long STARTING_NANO = 1L;
static
{
TimeZone.setDefault(TimeZone.getTimeZone("UTC"));
}
public static void main(String... args) throws Exception
{
final java.sql.Connection conn = DriverManager.getConnection(JDBC_URL, "sa", "");
final Statement stmt = conn.createStatement();
stmt.execute("DROP TABLE IF EXISTS TIME_TEST ");
String