I created https://issues.apache.org/jira/browse/PHOENIX-914
You can assign it to me, James -Vladimir On Sun, Apr 6, 2014 at 11:40 AM, James Taylor <[email protected]>wrote: > Thanks, Vladimir - that's a very nice writeup. Would you mind adding it as > a comment on https://issues.apache.org/jira/browse/PHOENIX-590? Also, > would you be interested in contributing this functionality? > Regards, > James > > > On Sun, Apr 6, 2014 at 11:10 AM, Vladimir Rodionov <[email protected] > > wrote: > >> James, >> >> for many applications one of the column of a table can be (and must be) >> naturally mapped >> >> to HBase timestamp. What it gives us is the optimization on StoreScanner >> where HFiles with timestamps out of range of >> >> a Scan operator will be omitted. Let us say that we have time-series type of >> data (EVENTS) and custom compaction, where we create >> series of HFiles with continuous non-overlapping timestamp ranges. >> >> CREATE TABLE IF NOT EXISTS ODS.EVENTS ( >> METRICID VARCHAR NOT NULL, >> METRICNAME VARCHAR, >> SERVICENAME VARCHAR NOT NULL, >> ORIGIN VARCHAR NOT NULL, >> APPID VARCHAR, >> IPID VARCHAR, >> NVALUE DOUBLE, >> TIME TIMESTAMP NOT NULL /+ TIMESTAMP +/, >> DATA VARCHAR, >> SVALUE VARCHAR >> CONSTRAINT PK PRIMARY KEY (METRICID, SERVICENAME, ORIGIN, APPID, >> IPID, TIME) >> ) SALT_BUCKETS=40, >> IMMUTABLE_ROWS=true,VERSIONS=1,DATA_BLOCK_ENCODING='NONE'; >> >> Make note on TIME TIMESTAMP NOT NULL /+ TIMESTAMP +/ - this is the Hint >> to Phoenix that the column >> >> TIME must be mapped to HBase timestamp. >> >> The Query: >> >> Select all events of type 'X' for last 7 days >> >> SELECT * from EVENTS WHERE METRICID = 'X' and TIME < NOW() and TIME > NOW() >> - 7*24*3600000; (this may be not correct SQL syntax of course) >> >> These types of queries will be efficiently optimized if: >> >> >> 1. Phoenix maps TIME column to HBase timestamp >> >> 2. Phoenix smart enough to map WHERE clause on TIME attribute to Scan >> timerange >> >> >> Although this : >> >> Properties props = new Properties(); >> props.setProperty(PhoenixRuntime.CURRENT_SCN_ATTRIB, Long.toString(ts)); >> Connection conn = DriverManager.connect(myUrl, props); >> >> conn.createStatement().execute("UPSERT INTO myTable VALUES ('a')"); >> conn.commit(); >> >> will work in my case- it may not be efficient from performance point of view >> because for every INSERT/UPSERT >> new Connection object and new Statement is created, beside this we still >> need the optimization 2. (see above). >> >> >> >> -Vladimir >> >> >> >> >> >> >> >> >> >> On Sun, Apr 6, 2014 at 8:45 AM, James Taylor <[email protected]>wrote: >> >>> Hi Vladimir, >>> By default, Phoenix queries show you the "latest" data. You can override >>> this at connection time and do "flashback" queries, DDL, and DML as >>> described here: >>> http://phoenix.incubator.apache.org/faq.html#Can_phoenix_work_on_tables_with_arbitrary_timestamp_as_flexible_as_HBase_API >>> >>> What ever optimizations HBase does, Phoenix will see too, since at the >>> end of the day, we're issuing a regular set of scans. >>> >>> Is this the kind of functionality for which you're looking? Or is it >>> more along the lines of a query returning multiple versions of the same >>> row: https://issues.apache.org/jira/browse/PHOENIX-590? This isn't >>> supported yet. >>> >>> Or maybe something else? >>> >>> Thanks, >>> James >>> >>> >>> On Sat, Apr 5, 2014 at 10:42 AM, Vladimir Rodionov < >>> [email protected]> wrote: >>> >>>> I am not sure it was implemented already, or it was not ... >>>> It would be nice to have a way to tell Phoenix which column is >>>> naturally mapped to HBase timestamp. This will greatly improve range >>>> queries on >>>> this column by utilizing internal HBase optimizations (skipping store >>>> files, which are completely out of the specified timestamp range). >>>> >>>> >>>> -Vladimir Rodionov >>>> >>> >>> >> >
