Re: Problems with Phoenix bulk loader when using row_timestamp feature
FYI… The sample data that I loaded in the table started at the current timestamp, with each additional row increasing that value by 1 minute, so the timestamps ranged from the current time up to 999,999 minutes into the future. Turns out this was a bug that prevents the scanner from reading timestamp values greater than the current time. More details here: https://issues.apache.org/jira/browse/PHOENIX-3176

From: default <rtemple...@hortonworks.com>
Reply-To: "user@phoenix.apache.org" <user@phoenix.apache.org>
Date: Wednesday, August 10, 2016 at 1:12 PM
To: "user@phoenix.apache.org" <user@phoenix.apache.org>
Subject: Re: Problems with Phoenix bulk loader when using row_timestamp feature

0: jdbc:phoenix:localhost:2181> explain select count(*) from historian.data;
+--------------------------------------------------------------+
|                             PLAN                             |
+--------------------------------------------------------------+
| CLIENT 1-CHUNK PARALLEL 1-WAY FULL SCAN OVER HISTORIAN.DATA  |
| ROW TIMESTAMP FILTER [0, 1470852712807)                      |
| SERVER FILTER BY FIRST KEY ONLY                              |
| SERVER AGGREGATE INTO SINGLE ROW                             |
+--------------------------------------------------------------+
4 rows selected (0.071 seconds)

From: Samarth Jain <sama...@apache.org>
Reply-To: "user@phoenix.apache.org" <user@phoenix.apache.org>
Date: Wednesday, August 10, 2016 at 12:05 AM
To: "user@phoenix.apache.org" <user@phoenix.apache.org>
Subject: Re: Problems with Phoenix bulk loader when using row_timestamp feature

Ryan,

Can you tell us what the explain plan says for the select count(*) query.

- Samarth

On Tue, Aug 9, 2016 at 12:58 PM, Ryan Templeton <rtemple...@hortonworks.com> wrote:

I am working on a project that will be consuming sensor data.
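The `ROW TIMESTAMP FILTER [0, 1470852712807)` line in the plan is the key: Phoenix implicitly caps the scan range of a row_timestamp table at the current time, so rows stamped in the future are silently invisible. A minimal Python sketch of that filtering behavior (this is an illustration of the semantics, not Phoenix's actual code; the function name and values are hypothetical):

```python
from datetime import datetime, timedelta

def visible_row_count(start, minute_offsets, scan_upper_bound):
    """Count rows whose row timestamp falls inside Phoenix's implicit
    scan range [0, scan_upper_bound): rows stamped at or after the
    upper bound are filtered out of every query."""
    return sum(
        1
        for m in minute_offsets
        if start + timedelta(minutes=m) < scan_upper_bound
    )

# Rows are stamped from load time onward, one minute apart,
# mirroring the sample data described in the thread.
load_time = datetime(2016, 8, 8, 11, 5, 15)
offsets = range(1_000_000)

# A query run roughly 9.5 hours after the first row's timestamp only
# "sees" rows that are still in the past relative to query time.
query_time = datetime(2016, 8, 8, 20, 36, 16)
print(visible_row_count(load_time, offsets, query_time))  # 572
```

Under this model the visible row count grows by one per minute of wall-clock time, which matches the behavior reported below: the data is all present in HBase, but only rows whose timestamp has already "happened" pass the scan-range filter.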
The “fact” table is defined as:

CREATE TABLE historian.data (
  assetid unsigned_int not null,
  metricid unsigned_int not null,
  ts timestamp not null,
  val double
  CONSTRAINT pk PRIMARY KEY (assetid, metricid, ts))
IMMUTABLE_ROWS=true;

I generated a 1 million row CSV sample dataset and used the Phoenix bulk loader to load this data up. The tool reports that all 1,000,000 rows were loaded successfully, which I can confirm via sqlline.

I then dropped and recreated the table to take advantage of the row_timestamp feature:

drop table historian.data;
CREATE TABLE historian.data (
  assetid unsigned_int not null,
  metricid unsigned_int not null,
  ts timestamp not null,
  val double
  CONSTRAINT pk PRIMARY KEY (assetid, metricid, ts row_timestamp))
IMMUTABLE_ROWS=true;

I reran the bulk loader utility, which says it completed successfully:

[rtempleton@M1 phoenix-client]$ bin/psql.py localhost:2181 -t HISTORIAN.DATA /tmp/data.csv
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/hdp/2.4.3.0-180/phoenix/phoenix-4.4.0.2.4.3.0-180-client.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/hdp/2.4.3.0-180/hadoop/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
16/08/08 20:34:43 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
16/08/08 20:34:44 WARN shortcircuit.DomainSocketFactory: The short-circuit local reads feature cannot be used because libhadoop cannot be loaded.
csv columns from database.
CSV Upsert complete. 100 rows upserted
Time: 65.985 sec(s)

But when I run “select count(*) from historian.data” I see that only the first 572 rows appear in the table. These rows correlate to the first 572 rows of the input file.
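For reference, sample data of the shape described above (one row per minute starting at the current time, so most rows are timestamped far in the future) could be generated with a short script along these lines. This is a hypothetical sketch; the column values other than `ts` are placeholders, and the actual generator used in the thread is not shown:

```python
import csv
from datetime import datetime, timedelta

def write_sample_csv(path, num_rows, start=None):
    """Write num_rows of (assetid, metricid, ts, val), where ts starts
    at `start` (default: now) and advances one minute per row, matching
    the HISTORIAN.DATA column order used by the bulk loader."""
    start = start or datetime.now()
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        for i in range(num_rows):
            ts = start + timedelta(minutes=i)
            writer.writerow([1, 1, ts.strftime("%Y-%m-%d %H:%M:%S.000"), 42.0])

write_sample_csv("/tmp/data.csv", 1_000_000)
```

With data generated this way, only the rows whose timestamp is already in the past at query time would be visible through a row_timestamp table.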
0: jdbc:phoenix:localhost:2181> select count(*) from historian.data;
+-----------+
| COUNT(1)  |
+-----------+
| 572       |
+-----------+
1 row selected (4.541 seconds)

0: jdbc:phoenix:localhost:2181> select min(ts), max(ts) from historian.data;
+--------------------------+--------------------------+
|         MIN(TS)          |         MAX(TS)          |
+--------------------------+--------------------------+
| 2016-08-08 11:05:15.000  | 2016-08-08 20:36:15.000  |
+--------------------------+--------------------------+

Any ideas?

Thanks,
Ryan
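The min/max timestamps above are consistent with the scan-range explanation: with rows one minute apart, the span from min(ts) to max(ts) accounts for exactly the 572 visible rows. A quick arithmetic check (this just verifies the numbers reported in the thread, assuming one-minute spacing):

```python
from datetime import datetime

# min(ts) and max(ts) as reported by sqlline
first = datetime(2016, 8, 8, 11, 5, 15)
last = datetime(2016, 8, 8, 20, 36, 15)

# Rows are one minute apart, so the visible-row count is the elapsed
# minutes between the first and last visible row, plus one (inclusive).
elapsed_minutes = int((last - first).total_seconds() // 60)
visible_rows = elapsed_minutes + 1
print(elapsed_minutes, visible_rows)  # 571 572
```

That is, max(ts) sits right at the query's wall-clock time, which is what you would expect if everything past "now" is being filtered out rather than missing from the table.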
Re: Problems with Phoenix bulk loader when using row_timestamp feature
Samarth, filed PHOENIX-3176 for the same.

On Wed, Aug 10, 2016 at 11:42 PM, Ryan Templeton <rtemple...@hortonworks.com> wrote:

> 0: jdbc:phoenix:localhost:2181> explain select count(*) from historian.data;
> +--------------------------------------------------------------+
> |                             PLAN                             |
> +--------------------------------------------------------------+
> | CLIENT 1-CHUNK PARALLEL 1-WAY FULL SCAN OVER HISTORIAN.DATA  |
> | ROW TIMESTAMP FILTER [0, 1470852712807)                      |
> | SERVER FILTER BY FIRST KEY ONLY                              |
> | SERVER AGGREGATE INTO SINGLE ROW                             |
> +--------------------------------------------------------------+
> 4 rows selected (0.071 seconds)