Hi All,

I have a use case where I need to join a table with an index on itself in
order to find the latest entry. In doing so, I've come a case in Phoenix
(4.14.1) where I appear to get incorrect results if I do a join with
TIMESTAMPs where indexes are involved.

I have a suspicion that this is to down to the fact that TIMESTAMPs are
stored differently on index tables (at least based on what I see in the
result set meta data it is BigDecimal) and somehow conversion is not done
properly before joining. However, I thought I would reach out to check with
those that have more expertise in this area before I continue.

Attached, you can find a standalone program that you can run which
reproduces this issue. Here are the basic steps:

1) Create a table which effectively has 3 representations of the value - as
a TIMESTAMP, a VARCHAR and as a CHAR(29). Create indexes for each of these
representations:

CREATE TABLE IF NOT EXISTS TIME_TEST
(
   ID VARCHAR        NOT NULL,
   ENTRY_NAME        VARCHAR    ,
   TRANSACTION       TIMESTAMP  ,
   TRANSACTION_STR   VARCHAR    ,
   TRANSACTION_CHR   CHAR(29)
   CONSTRAINT pk PRIMARY KEY(ID)
)
IMMUTABLE_STORAGE_SCHEME=ONE_CELL_PER_COLUMN,
UPDATE_CACHE_FREQUENCY=900000,
COLUMN_ENCODED_BYTES=NONE,
IMMUTABLE_ROWS=true

CREATE LOCAL INDEX TIME_TEST_I01 ON TIME_TEST(ENTRY_NAME, TRANSACTION DESC)

CREATE LOCAL INDEX TIME_TEST_I02 ON TIME_TEST(ENTRY_NAME, TRANSACTION_STR
DESC)

CREATE LOCAL INDEX TIME_TEST_I03 ON TIME_TEST(ENTRY_NAME, TRANSACTION_CHR
DESC)

2) Fill up the table with some values, in the different representations. In
the attached test-case I do 100 entries. Such that you have something like :

ID        ENTRY_NAME   TRANSACTION                       TRANSACTION_STR
                TRANSACTION_CHR
-------------------------------------------------------------------------------------------------------------------------
0010000   0010000      1970-01-01 01:00:00.00001         1970-01-01
00:00:00.000010000     1970-01-01 00:00:00.000010000
0010001   0010001      1970-01-01 01:00:00.000010001     1970-01-01
00:00:00.000010001     1970-01-01 00:00:00.000010001
0010002   0010002      1970-01-01 01:00:00.000010002     1970-01-01
00:00:00.000010002     1970-01-01 00:00:00.000010002
0010003   0010003      1970-01-01 01:00:00.000010003     1970-01-01
00:00:00.000010003     1970-01-01 00:00:00.000010003

3) Essentially do the same MAX / Join query using each of the different
values (e.g. TIMESTAMP vs. VARCHAR vs. CHAR(29)). They should all return
the same number of rows (e.g. the total number of rows on the table). Only
the one based on VARCHAR does:

SELECT TIME_TEST.ENTRY_NAME FROM TIME_TEST
 INNER JOIN
(
SELECT ENTRY_NAME, MAX(TRANSACTION_STR) AS TRANSACTION_STR
FROM TIME_TEST where ENTRY_NAME like '0%' GROUP BY ENTRY_NAME
) sub
ON TIME_TEST.ENTRY_NAME = sub.ENTRY_NAME AND TIME_TEST.TRANSACTION_STR =
sub.TRANSACTION_STR
WHERE TIME_TEST.ENTRY_NAME like '0%'

*TOTAL Rows : 100*

SELECT TIME_TEST.ENTRY_NAME FROM TIME_TEST
 INNER JOIN
(
SELECT ENTRY_NAME, MAX(TRANSACTION) AS TRANSACTION
FROM TIME_TEST where ENTRY_NAME like '0%' GROUP BY ENTRY_NAME
) sub
ON TIME_TEST.ENTRY_NAME = sub.ENTRY_NAME AND TIME_TEST.TRANSACTION =
sub.TRANSACTION
WHERE TIME_TEST.ENTRY_NAME like '0%'

*TOTAL Rows : 1*

SELECT TIME_TEST.ENTRY_NAME FROM TIME_TEST
 INNER JOIN
(
SELECT ENTRY_NAME, MAX(TRANSACTION_CHR) AS TRANSACTION_CHR
FROM TIME_TEST where ENTRY_NAME like '0%' GROUP BY ENTRY_NAME
) sub
ON TIME_TEST.ENTRY_NAME = sub.ENTRY_NAME AND TIME_TEST.TRANSACTION_CHR=
sub.TRANSACTION_CHR
WHERE TIME_TEST.ENTRY_NAME like '0%'

*TOTAL Rows : 0*

Interestingly enough - in the final query, if I change the inner join
expression to be 'CAST( MAX(TRANSACTION_CHR) AS VARCHAR) AS TRANSACTION_CHR'
it works as expected. Hence, leading me to believe this is some sort of
column type conversion issue.

Either way, would appreciate it if anyone could give me any pointers on the
above.

Thanks,
package com.jsteenkamp.phoenix;

import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.Statement;
import java.sql.Timestamp;
import java.text.DecimalFormat;
import java.time.Instant;
import java.time.format.DateTimeFormatter;
import java.util.TimeZone;


public class PhoenixTimestampEncodingTest
{
	public static final String JDBC_URL = "jdbc:phoenix:localhost:63214";
	private static final DateTimeFormatter FORMATTER = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss.nnnnnnnnn");
	
	public static final int ITEMS_TO_INSERT = 100;
	
	public static final DecimalFormat decf = new DecimalFormat("0000000"); 
	public static final long STARTING_NANO = 10000L;
	static
	{
		TimeZone.setDefault(TimeZone.getTimeZone("UTC"));
	}
	
	public static void main(String... args) throws Exception
	{
		final java.sql.Connection conn = DriverManager.getConnection(JDBC_URL, "sa", "");
		final Statement stmt = conn.createStatement();
		
		stmt.execute("DROP TABLE IF EXISTS TIME_TEST ");
		
		String createTable = "CREATE TABLE IF NOT EXISTS TIME_TEST \n" + 
				"(	\n" + 
				"   ID									VARCHAR    NOT NULL,\n" + 
				"   ENTRY_NAME							VARCHAR    ,\n" +
				"   TRANSACTION							TIMESTAMP  ,\n" +
				"   TRANSACTION_STR						VARCHAR    ,\n" +
				"   TRANSACTION_CHR						CHAR(29)	\n" +
				"   CONSTRAINT pk PRIMARY KEY(ID)\n" + 
			    ") " +
				"IMMUTABLE_STORAGE_SCHEME=ONE_CELL_PER_COLUMN,\n" + 
				"UPDATE_CACHE_FREQUENCY=900000,\n" + 
				"COLUMN_ENCODED_BYTES=NONE,\n" + 
				"IMMUTABLE_ROWS=true\n";
		
		String index_1 = "CREATE LOCAL INDEX TIME_TEST_I01 ON TIME_TEST(ENTRY_NAME, TRANSACTION DESC) \n";
		String index_2 = "CREATE LOCAL INDEX TIME_TEST_I02 ON TIME_TEST(ENTRY_NAME, TRANSACTION_STR DESC) \n";
		String index_3 = "CREATE LOCAL INDEX TIME_TEST_I03 ON TIME_TEST(ENTRY_NAME, TRANSACTION_CHR DESC) \n";
		
		System.out.println(createTable);
		stmt.execute(createTable);
		
		System.out.println(index_1);
		stmt.execute(index_1);
		System.out.println(index_2);
		stmt.execute(index_2);
		System.out.println(index_3);
		stmt.execute(index_3);
		
		final PreparedStatement updateStmt = conn.prepareStatement("UPSERT INTO TIME_TEST  VALUES(?, ?, ?, ?, ?)");
		
		for (int i = 0; i < ITEMS_TO_INSERT; i++)
		{
			long nano = STARTING_NANO + i;
			
			String item =  decf.format(nano );
			
			Instant instant = Instant.ofEpochSecond(0, nano );
			Timestamp timestamp = Timestamp.from(instant);
			String timestampStr = FORMATTER.format(timestamp.toLocalDateTime());
			
			updateStmt.setString(1, item);
			updateStmt.setString(2, item);
			updateStmt.setTimestamp(3, timestamp);
			updateStmt.setString(4, timestampStr);
			updateStmt.setString(5, timestampStr);
			
			updateStmt.executeUpdate();
			conn.commit();
		}
		
		System.out.println("Done updates \n");
		
		String queryOnTimestamp = "SELECT TIME_TEST.ENTRY_NAME FROM TIME_TEST\n" + 
				" INNER JOIN \n" + 
				"		 ( \n" + 
				"			 SELECT ENTRY_NAME, MAX(TRANSACTION) AS TRANSACTION\n" + 
				"			 FROM TIME_TEST where ENTRY_NAME like '0%' GROUP BY ENTRY_NAME \n" + 
				"		 ) sub \n" + 
				"	 ON TIME_TEST.ENTRY_NAME = sub.ENTRY_NAME AND TIME_TEST.TRANSACTION = sub.TRANSACTION\n" + 
				"WHERE TIME_TEST.ENTRY_NAME like '0%'\n";
		
		System.out.println(queryOnTimestamp);
		
		ResultSet rs = stmt.executeQuery(queryOnTimestamp);
		int counter = 0;
		while(rs.next() == true)
		{
			counter++;
		}
		
		System.out.println("TOTAL Rows : " + counter + "\n");
		
		String queryOnVarchar = "SELECT TIME_TEST.ENTRY_NAME FROM TIME_TEST\n" + 
				" INNER JOIN \n" + 
				"		 ( \n" + 
				"			 SELECT ENTRY_NAME, MAX(TRANSACTION_STR) AS TRANSACTION_STR\n" + 
				"			 FROM TIME_TEST where ENTRY_NAME like '0%' GROUP BY ENTRY_NAME \n" + 
				"		 ) sub \n" + 
				"	 ON TIME_TEST.ENTRY_NAME = sub.ENTRY_NAME AND TIME_TEST.TRANSACTION_STR = sub.TRANSACTION_STR\n" + 
				"WHERE TIME_TEST.ENTRY_NAME like '0%'\n";
		
		System.out.println(queryOnVarchar);

		rs = stmt.executeQuery(queryOnVarchar);
		counter = 0;
		while(rs.next() == true)
		{
			counter++;
		}
		
		System.out.println("TOTAL Rows : " + counter + "\n");
		
		String queryOnChar = "SELECT TIME_TEST.ENTRY_NAME FROM TIME_TEST\n" + 
				" INNER JOIN \n" + 
				"		 ( \n" + 
				"			 SELECT ENTRY_NAME, MAX(TRANSACTION_CHR) AS TRANSACTION_CHR\n" + 
				"			 FROM TIME_TEST where ENTRY_NAME like '0%' GROUP BY ENTRY_NAME \n" + 
				"		 ) sub \n" + 
				"	 ON TIME_TEST.ENTRY_NAME = sub.ENTRY_NAME AND TIME_TEST.TRANSACTION_CHR= sub.TRANSACTION_CHR\n" + 
				"WHERE TIME_TEST.ENTRY_NAME like '0%'\n";
		
		System.out.println(queryOnChar);

		rs = stmt.executeQuery(queryOnChar);
		counter = 0;
		while(rs.next() == true)
		{
			counter++;
		}
		
		System.out.println("TOTAL Rows : " + counter + "\n");
 
		System.exit(0);
	}
}














Reply via email to