Hi All, I have a use case where I need to join a table with an index on itself in order to find the latest entry. In doing so, I've come a case in Phoenix (4.14.1) where I appear to get incorrect results if I do a join with TIMESTAMPs where indexes are involved.
I have a suspicion that this is to down to the fact that TIMESTAMPs are stored differently on index tables (at least based on what I see in the result set meta data it is BigDecimal) and somehow conversion is not done properly before joining. However, I thought I would reach out to check with those that have more expertise in this area before I continue. Attached, you can find a standalone program that you can run which reproduces this issue. Here are the basic steps: 1) Create a table which effectively has 3 representations of the value - as a TIMESTAMP, a VARCHAR and as a CHAR(29). Create indexes for each of these representations: CREATE TABLE IF NOT EXISTS TIME_TEST ( ID VARCHAR NOT NULL, ENTRY_NAME VARCHAR , TRANSACTION TIMESTAMP , TRANSACTION_STR VARCHAR , TRANSACTION_CHR CHAR(29) CONSTRAINT pk PRIMARY KEY(ID) ) IMMUTABLE_STORAGE_SCHEME=ONE_CELL_PER_COLUMN, UPDATE_CACHE_FREQUENCY=900000, COLUMN_ENCODED_BYTES=NONE, IMMUTABLE_ROWS=true CREATE LOCAL INDEX TIME_TEST_I01 ON TIME_TEST(ENTRY_NAME, TRANSACTION DESC) CREATE LOCAL INDEX TIME_TEST_I02 ON TIME_TEST(ENTRY_NAME, TRANSACTION_STR DESC) CREATE LOCAL INDEX TIME_TEST_I03 ON TIME_TEST(ENTRY_NAME, TRANSACTION_CHR DESC) 2) Fill up the table with some values, in the different representations. In the attached test-case I do 100 entries. Such that you have something like : ID ENTRY_NAME TRANSACTION TRANSACTION_STR TRANSACTION_CHR ------------------------------------------------------------------------------------------------------------------------- 0010000 0010000 1970-01-01 01:00:00.00001 1970-01-01 00:00:00.000010000 1970-01-01 00:00:00.000010000 0010001 0010001 1970-01-01 01:00:00.000010001 1970-01-01 00:00:00.000010001 1970-01-01 00:00:00.000010001 0010002 0010002 1970-01-01 01:00:00.000010002 1970-01-01 00:00:00.000010002 1970-01-01 00:00:00.000010002 0010003 0010003 1970-01-01 01:00:00.000010003 1970-01-01 00:00:00.000010003 1970-01-01 00:00:00.000010003 3) Essentially do the same MAX / Join query using each of the different values (e.g. TIMESTAMP vs. VARCHAR vs. CHAR(29)). They should all return the same number of rows (e.g. the total number of rows on the table). Only the one based on VARCHAR does: SELECT TIME_TEST.ENTRY_NAME FROM TIME_TEST INNER JOIN ( SELECT ENTRY_NAME, MAX(TRANSACTION_STR) AS TRANSACTION_STR FROM TIME_TEST where ENTRY_NAME like '0%' GROUP BY ENTRY_NAME ) sub ON TIME_TEST.ENTRY_NAME = sub.ENTRY_NAME AND TIME_TEST.TRANSACTION_STR = sub.TRANSACTION_STR WHERE TIME_TEST.ENTRY_NAME like '0%' *TOTAL Rows : 100* SELECT TIME_TEST.ENTRY_NAME FROM TIME_TEST INNER JOIN ( SELECT ENTRY_NAME, MAX(TRANSACTION) AS TRANSACTION FROM TIME_TEST where ENTRY_NAME like '0%' GROUP BY ENTRY_NAME ) sub ON TIME_TEST.ENTRY_NAME = sub.ENTRY_NAME AND TIME_TEST.TRANSACTION = sub.TRANSACTION WHERE TIME_TEST.ENTRY_NAME like '0%' *TOTAL Rows : 1* SELECT TIME_TEST.ENTRY_NAME FROM TIME_TEST INNER JOIN ( SELECT ENTRY_NAME, MAX(TRANSACTION_CHR) AS TRANSACTION_CHR FROM TIME_TEST where ENTRY_NAME like '0%' GROUP BY ENTRY_NAME ) sub ON TIME_TEST.ENTRY_NAME = sub.ENTRY_NAME AND TIME_TEST.TRANSACTION_CHR= sub.TRANSACTION_CHR WHERE TIME_TEST.ENTRY_NAME like '0%' *TOTAL Rows : 0* Interestingly enough - in the final query, if I change the inner join expression to be 'CAST( MAX(TRANSACTION_CHR) AS VARCHAR) AS TRANSACTION_CHR' it works as expected. Hence, leading me to believe this is some sort of column type conversion issue. Either way, would appreciate it if anyone could give me any pointers on the above. Thanks,
package com.jsteenkamp.phoenix; import java.sql.DriverManager; import java.sql.PreparedStatement; import java.sql.ResultSet; import java.sql.Statement; import java.sql.Timestamp; import java.text.DecimalFormat; import java.time.Instant; import java.time.format.DateTimeFormatter; import java.util.TimeZone; public class PhoenixTimestampEncodingTest { public static final String JDBC_URL = "jdbc:phoenix:localhost:63214"; private static final DateTimeFormatter FORMATTER = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss.nnnnnnnnn"); public static final int ITEMS_TO_INSERT = 100; public static final DecimalFormat decf = new DecimalFormat("0000000"); public static final long STARTING_NANO = 10000L; static { TimeZone.setDefault(TimeZone.getTimeZone("UTC")); } public static void main(String... args) throws Exception { final java.sql.Connection conn = DriverManager.getConnection(JDBC_URL, "sa", ""); final Statement stmt = conn.createStatement(); stmt.execute("DROP TABLE IF EXISTS TIME_TEST "); String createTable = "CREATE TABLE IF NOT EXISTS TIME_TEST \n" + "( \n" + " ID VARCHAR NOT NULL,\n" + " ENTRY_NAME VARCHAR ,\n" + " TRANSACTION TIMESTAMP ,\n" + " TRANSACTION_STR VARCHAR ,\n" + " TRANSACTION_CHR CHAR(29) \n" + " CONSTRAINT pk PRIMARY KEY(ID)\n" + ") " + "IMMUTABLE_STORAGE_SCHEME=ONE_CELL_PER_COLUMN,\n" + "UPDATE_CACHE_FREQUENCY=900000,\n" + "COLUMN_ENCODED_BYTES=NONE,\n" + "IMMUTABLE_ROWS=true\n"; String index_1 = "CREATE LOCAL INDEX TIME_TEST_I01 ON TIME_TEST(ENTRY_NAME, TRANSACTION DESC) \n"; String index_2 = "CREATE LOCAL INDEX TIME_TEST_I02 ON TIME_TEST(ENTRY_NAME, TRANSACTION_STR DESC) \n"; String index_3 = "CREATE LOCAL INDEX TIME_TEST_I03 ON TIME_TEST(ENTRY_NAME, TRANSACTION_CHR DESC) \n"; System.out.println(createTable); stmt.execute(createTable); System.out.println(index_1); stmt.execute(index_1); System.out.println(index_2); stmt.execute(index_2); System.out.println(index_3); stmt.execute(index_3); final PreparedStatement updateStmt = conn.prepareStatement("UPSERT INTO TIME_TEST VALUES(?, ?, ?, ?, ?)"); for (int i = 0; i < ITEMS_TO_INSERT; i++) { long nano = STARTING_NANO + i; String item = decf.format(nano ); Instant instant = Instant.ofEpochSecond(0, nano ); Timestamp timestamp = Timestamp.from(instant); String timestampStr = FORMATTER.format(timestamp.toLocalDateTime()); updateStmt.setString(1, item); updateStmt.setString(2, item); updateStmt.setTimestamp(3, timestamp); updateStmt.setString(4, timestampStr); updateStmt.setString(5, timestampStr); updateStmt.executeUpdate(); conn.commit(); } System.out.println("Done updates \n"); String queryOnTimestamp = "SELECT TIME_TEST.ENTRY_NAME FROM TIME_TEST\n" + " INNER JOIN \n" + " ( \n" + " SELECT ENTRY_NAME, MAX(TRANSACTION) AS TRANSACTION\n" + " FROM TIME_TEST where ENTRY_NAME like '0%' GROUP BY ENTRY_NAME \n" + " ) sub \n" + " ON TIME_TEST.ENTRY_NAME = sub.ENTRY_NAME AND TIME_TEST.TRANSACTION = sub.TRANSACTION\n" + "WHERE TIME_TEST.ENTRY_NAME like '0%'\n"; System.out.println(queryOnTimestamp); ResultSet rs = stmt.executeQuery(queryOnTimestamp); int counter = 0; while(rs.next() == true) { counter++; } System.out.println("TOTAL Rows : " + counter + "\n"); String queryOnVarchar = "SELECT TIME_TEST.ENTRY_NAME FROM TIME_TEST\n" + " INNER JOIN \n" + " ( \n" + " SELECT ENTRY_NAME, MAX(TRANSACTION_STR) AS TRANSACTION_STR\n" + " FROM TIME_TEST where ENTRY_NAME like '0%' GROUP BY ENTRY_NAME \n" + " ) sub \n" + " ON TIME_TEST.ENTRY_NAME = sub.ENTRY_NAME AND TIME_TEST.TRANSACTION_STR = sub.TRANSACTION_STR\n" + "WHERE TIME_TEST.ENTRY_NAME like '0%'\n"; System.out.println(queryOnVarchar); rs = stmt.executeQuery(queryOnVarchar); counter = 0; while(rs.next() == true) { counter++; } System.out.println("TOTAL Rows : " + counter + "\n"); String queryOnChar = "SELECT TIME_TEST.ENTRY_NAME FROM TIME_TEST\n" + " INNER JOIN \n" + " ( \n" + " SELECT ENTRY_NAME, MAX(TRANSACTION_CHR) AS TRANSACTION_CHR\n" + " FROM TIME_TEST where ENTRY_NAME like '0%' GROUP BY ENTRY_NAME \n" + " ) sub \n" + " ON TIME_TEST.ENTRY_NAME = sub.ENTRY_NAME AND TIME_TEST.TRANSACTION_CHR= sub.TRANSACTION_CHR\n" + "WHERE TIME_TEST.ENTRY_NAME like '0%'\n"; System.out.println(queryOnChar); rs = stmt.executeQuery(queryOnChar); counter = 0; while(rs.next() == true) { counter++; } System.out.println("TOTAL Rows : " + counter + "\n"); System.exit(0); } }