0: jdbc:phoenix:localhost:2181> explain select count(*) from historian.data;
+------------------------------------------+
|                   PLAN                   |
+------------------------------------------+
| CLIENT 1-CHUNK PARALLEL 1-WAY FULL SCAN OVER HISTORIAN.DATA |
|     ROW TIMESTAMP FILTER [0, 1470852712807) |
|     SERVER FILTER BY FIRST KEY ONLY      |
|     SERVER AGGREGATE INTO SINGLE ROW     |
+------------------------------------------+
4 rows selected (0.071 seconds)

From: Samarth Jain <sama...@apache.org>
Reply-To: "user@phoenix.apache.org" <user@phoenix.apache.org>
Date: Wednesday, August 10, 2016 at 12:05 AM
To: "user@phoenix.apache.org" <user@phoenix.apache.org>
Subject: Re: Problems with Phoenix bulk loader when using row_timestamp feature

Ryan,

Can you tell us what the explain plan says for the select count(*) query?

- Samarth

On Tue, Aug 9, 2016 at 12:58 PM, Ryan Templeton <rtemple...@hortonworks.com> wrote:

I am working on a project that will be consuming sensor data. The “fact” table is defined as:

CREATE TABLE historian.data (
    assetid unsigned_int not null,
    metricid unsigned_int not null,
    ts timestamp not null,
    val double,
    CONSTRAINT pk PRIMARY KEY (assetid, metricid, ts))
IMMUTABLE_ROWS=true;

I generated a 1 million row CSV sample dataset and used the Phoenix bulk loader to load this data up. The tool reports that all 1,000,000 rows were loaded successfully, which I can confirm via sqlline.

I then dropped and recreated the table to take advantage of the row_timestamp feature:

drop table historian.data;

CREATE TABLE historian.data (
    assetid unsigned_int not null,
    metricid unsigned_int not null,
    ts timestamp not null,
    val double,
    CONSTRAINT pk PRIMARY KEY (assetid, metricid, ts row_timestamp))
IMMUTABLE_ROWS=true;

I reran the bulk loader utility, which says it completed successfully:

[rtempleton@M1 phoenix-client]$ bin/psql.py localhost:2181 -t HISTORIAN.DATA /tmp/data.csv
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/hdp/2.4.3.0-180/phoenix/phoenix-4.4.0.2.4.3.0-180-client.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/hdp/2.4.3.0-180/hadoop/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
16/08/08 20:34:43 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
16/08/08 20:34:44 WARN shortcircuit.DomainSocketFactory: The short-circuit local reads feature cannot be used because libhadoop cannot be loaded.
csv columns from database.
CSV Upsert complete. 1000000 rows upserted
Time: 65.985 sec(s)

But when I run “select count(*) from historian.data” I see that only the first 572 rows appear in the table. These rows correlate to the first 572 rows of the input file.
0: jdbc:phoenix:localhost:2181> select count(*) from historian.data;
+------------------------------------------+
|                 COUNT(1)                 |
+------------------------------------------+
| 572                                      |
+------------------------------------------+
1 row selected (4.541 seconds)

0: jdbc:phoenix:localhost:2181> select min(ts), max(ts) from historian.data;
+------------------------------------------+------------------------------------------+
|                 MIN(TS)                  |                 MAX(TS)                  |
+------------------------------------------+------------------------------------------+
| 2016-08-08 11:05:15.000                  | 2016-08-08 20:36:15.000                  |
+------------------------------------------+------------------------------------------+

Any ideas?

Thanks,
Ryan
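
A minimal diagnostic sketch, not from the thread itself: because ts is declared ROW_TIMESTAMP, Phoenix maps it onto the HBase cell timestamp, and the explain plan above shows the default scan bounded by ROW TIMESTAMP FILTER [0, 1470852712807), i.e. everything up to the current time. One way to test whether the "missing" rows were written with cell timestamps at or beyond that upper bound is to push an explicit far-future predicate on ts into the scan's time range. The timestamp literal below is an assumption chosen only to exceed any plausible data value:

    -- Sketch only: an explicit bound on the ROW_TIMESTAMP column sets the
    -- scan's time range from the predicate instead of [0, current time).
    -- The '2100-01-01' literal is a hypothetical far-future cutoff.
    SELECT count(*) FROM historian.data
    WHERE ts < TO_TIMESTAMP('2100-01-01 00:00:00');

If this bounded count comes back higher than the unbounded 572, the remaining rows were loaded but carry cell timestamps outside the default scan's time range.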