Please ignore the current thread as I forgot to set email subject :( Please use the new one with the correct subject: "Slow performance on PRIMARY KEY Queries compared to HBase API"
Adrianos Dadis. On Thu, Apr 7, 2016 at 6:20 PM, Adrianos Dadis <[email protected]> wrote: > Hello all, > > as part of a larger project, we are evaluating Phoenix performance > compared to pure HBase API. > > Demo cluster have 6 slave nodes (plus required nodes for additional helper > nodes) and all nodes are real machines (not VMs). Cluster is running > Cloudera 5.4.5 distribution (Hadoop, HBase, etc.). > Phoenix distribution (4.5.2 with some changes by CDH) is provided by > Cloudera too, but it has some minor alterations in order to be compatible > with CDH 5.4.5. > > Our code use phoenix version required by CDH. > <dependency> > <groupId>org.apache.phoenix</groupId> > <artifactId>phoenix-core</artifactId> > <version>4.5.2-cdh5.4.5</version> > </dependency> > > For Phoenix and Cloudera details please check this: > https://blog.cloudera.com/blog/2015/11/new-apache-phoenix-4-5-2-package-from-cloudera-labs/ > > > When running our test, we saw a considerably slower performance on the > select query on row key. > > Our test scenario is the following: > 1) WRITE to HBase 4.000.000 rows using Phoenix > 2) When WRITE is completed, then FETCH random (existed) ROWKEYs from HBase > (using Phoenix). Code that run FETCH is running on the same nodes with > HBase region servers (using Apache Storm streaming engine). > > > The CREATE statement used on my example and the select query code are > below: > CREATE TABLE IF NOT EXISTS TA (ID varchar not null primary key, TA1 > varchar, TA2 varchar, TA3 varchar, TA4 varchar, ta5 varchar) SPLIT ON > ('666833', '1333666', '2000500','2667333', '3334166'); > > The only code related to Phoenix is the following: > > public void findById(String id) { > Connection conn = DriverManager.getConnection(databaseURL); > PreparedStatement ps = null; > ResultSet rs = null; > try { > ps = conn.prepareStatement("select TA1,TA2,TA4 from TA where ID = ?"); > ps.setString(1, id); > rs = ps.executeQuery(); > while (rs.next()) { > LOG.debug("Fetched row OK"); > } > } catch (SQLException ex) { > LOG.error("Could not fetch data for Id {}", Id, ex); > } finally { > rs.close(); > ps.close(); > conn.close(); > } > } > > The explain query (explain select TA1,TA2,TA4 from TA where ID = '10') > returns the following plan: > | CLIENT 1-CHUNK PARALLEL 1-WAY POINT LOOKUP ON 1 KEY OVER TA | > > We have used metrics to get the mean execution time of each part of > previous code: > - ps.executeQuery(): 0.3ms > - rs.next(): 1.2ms > > The query returns only one row and it seems that the slow part is > rs.next(). > The total execution of the same query with HBase API is about 0.3ms , > making phoenix 5 times slower. > > Just to say that the upsert query of a row (not shown in code) takes about > the same time on both implementations (Phoenix and HBase API). > > Is it normal or there is something that I should improve? > > Thanks, > Adrianos Dadis. >
