[ https://issues.apache.org/jira/browse/YARN-3134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14529167#comment-14529167 ]
Zhijie Shen commented on YARN-3134:
-----------------------------------

Li, thanks for updating the patch. Here are some comments about it.

1. How do we choose the cache size and the expiry time?
{code}
connectionCache = CacheBuilder.newBuilder().maximumSize(16)
    .expireAfterAccess(10, TimeUnit.SECONDS).removalListener(
{code}

2. If we use try-with-resources, do we still need to close stmt explicitly? Shall we close the resources in a finally block otherwise? (See the first sketch at the end of this comment.)
{code}
try (Statement stmt = conn.createStatement()) {
{code}
{code}
stmt.close();
conn.commit();
conn.close();
{code}

3. This seems to be a trivial method wrapper:
{code}
private <K> StringBuilder appendVarcharColumnsSQL(
    StringBuilder colNames, ColumnFamilyInfo<K> cfInfo) {
  return appendColumnsSQL(colNames, cfInfo, " VARCHAR");
}
{code}

4. Why should the flow name and version be combined and put in the same cell, rather than stored separately?
{code}
ps.setString(idx++,
    context.getFlowName() + STORAGE_SEPARATOR + context.getFlowVersion());
{code}

5. This check seems unnecessary:
{code}
if (entity.getConfigs() == null
    && entity.getInfo() == null
    && entity.getIsRelatedToEntities() == null
    && entity.getRelatesToEntities() == null) {
  return;
}
{code}

6. Should info be VARBINARY?
{code}
+ INFO_COLUMN_FAMILY + PHOENIX_COL_FAMILY_PLACE_HOLDER + " VARCHAR, "
{code}

7. Should config be VARCHAR?
{code}
appendColumnsSQL(sqlColumns, new ColumnFamilyInfo<>(
    CONFIG_COLUMN_FAMILY, entity.getConfigs().keySet()), " VARBINARY");
{code}

8. Does Phoenix support NUMERIC/DECIMAL? I'm not sure whether we should store the numbers in those types instead.
{code}
+ "singledata VARBINARY "
{code}

9. In storeMetrics, assuming we only deal with the single-value case for now, I think it's better to check first whether the metric is single-value (see the second sketch at the end of this comment). Another question: do we want to ignore the timestamp associated with the single value, or should we add one more column to store it?

10. Regarding the number of connections and threads: is it better to have the same number of connections/threads as the number of app collectors, with the requests of one app always routed to the same thread (see the last sketch at the end of this comment)? I remember we mentioned somewhere that we want isolation between apps; otherwise, an app with more timeline data will occupy more of the writing capacity to the backend. /cc [~sjlee0]

11. In TestTimelineWriterImpl, can we cover the case where the entity has a non-string info value?

12. In TestPhoenixTimelineWriterImpl, can we verify that each cell stores the right data?
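For point 2, here is a minimal sketch of what I have in mind. A resource declared in the try-with-resources header is closed automatically, so the explicit stmt.close() is redundant; whether conn should be closed at all depends on whether the connection cache owns it. The method and variable names below are illustrative, not from the patch:
{code}
import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Statement;

class WriteSketch {
  // stmt is closed automatically when the try block exits, even on
  // exception; no explicit stmt.close() or finally block is needed.
  // conn is left open on purpose, assuming the connection cache owns it.
  static void writeAndCommit(Connection conn, String sql) throws SQLException {
    try (Statement stmt = conn.createStatement()) {
      stmt.executeUpdate(sql);
      conn.commit();
    } // implicit stmt.close() here
  }
}
{code}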
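For point 9, a rough sketch of the single-value guard plus a dedicated timestamp column. I'm assuming the TimelineMetric API exposes getType() and a timestamp-to-value map via getValues(); if the class on our branch looks different, adjust accordingly:
{code}
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.Map;

import org.apache.hadoop.yarn.api.records.timelineservice.TimelineMetric;

class MetricSketch {
  // Guard on the single-value case first, and keep the value's timestamp
  // in its own column instead of silently dropping it.
  static void storeSingleValueMetric(TimelineMetric metric,
      PreparedStatement ps, int idx) throws SQLException {
    if (metric.getType() != TimelineMetric.Type.SINGLE_VALUE
        || metric.getValues().isEmpty()) {
      return; // only non-empty single-value metrics are handled for now
    }
    Map.Entry<Long, Number> single =
        metric.getValues().entrySet().iterator().next();
    ps.setLong(idx, single.getKey());         // proposed timestamp column
    ps.setObject(idx + 1, single.getValue()); // the single value itself
  }
}
{code}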
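For point 10, this is roughly the routing I am imagining: a fixed pool of single-threaded writers, where the same app id always hashes to the same thread (each slot could also pin one cached Phoenix connection). A busy app can then only saturate its own slot. All names here are made up for illustration:
{code}
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class PerAppWriterPool {
  private final ExecutorService[] writers;

  PerAppWriterPool(int numCollectors) {
    writers = new ExecutorService[numCollectors];
    for (int i = 0; i < numCollectors; i++) {
      // One thread per slot; each slot could also own one connection.
      writers[i] = Executors.newSingleThreadExecutor();
    }
  }

  void submitWrite(String appId, Runnable writeTask) {
    // The same app id always maps to the same slot, i.e. the same thread.
    int slot = (appId.hashCode() & Integer.MAX_VALUE) % writers.length;
    writers[slot].submit(writeTask);
  }
}
{code}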
> [Storage implementation] Exploiting the option of using Phoenix to access
> HBase backend
> ---------------------------------------------------------------------------------------
>
>                 Key: YARN-3134
>                 URL: https://issues.apache.org/jira/browse/YARN-3134
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: timelineserver
>            Reporter: Zhijie Shen
>            Assignee: Li Lu
>         Attachments: SettingupPhoenixstorageforatimelinev2end-to-endtest.pdf,
> YARN-3134-040915_poc.patch, YARN-3134-041015_poc.patch,
> YARN-3134-041415_poc.patch, YARN-3134-042115.patch, YARN-3134-042715.patch,
> YARN-3134-YARN-2928.001.patch, YARN-3134-YARN-2928.002.patch,
> YARN-3134-YARN-2928.003.patch, YARN-3134DataSchema.pdf
>
> Quote the introduction on the Phoenix web page:
> {code}
> Apache Phoenix is a relational database layer over HBase delivered as a
> client-embedded JDBC driver targeting low latency queries over HBase data.
> Apache Phoenix takes your SQL query, compiles it into a series of HBase
> scans, and orchestrates the running of those scans to produce regular JDBC
> result sets. The table metadata is stored in an HBase table and versioned,
> such that snapshot queries over prior versions will automatically use the
> correct schema. Direct use of the HBase API, along with coprocessors and
> custom filters, results in performance on the order of milliseconds for small
> queries, or seconds for tens of millions of rows.
> {code}
> It may simplify our implementation of reading/writing data from/to HBase, and
> we can easily build indexes and compose complex queries.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)