[ https://issues.apache.org/jira/browse/YARN-3134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14529167#comment-14529167 ]

Zhijie Shen commented on YARN-3134:
-----------------------------------

Li, thanks for updating the patch. Here are some comments about it.

1. How do we choose the size and the expiry time?
{code}
113         connectionCache = CacheBuilder.newBuilder().maximumSize(16)
114             .expireAfterAccess(10, TimeUnit.SECONDS).removalListener(
{code}
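
If these numbers are arbitrary, maybe we can make them configurable. A sketch, assuming we introduce new configuration keys and that removalListener/cacheLoader are the existing callbacks (the key names below are hypothetical, not existing YarnConfiguration keys):
{code}
// Hypothetical keys, for illustration only.
long cacheSize = conf.getLong(
    "yarn.timeline-service.phoenix-writer.connection-cache-size", 16);
long expirySecs = conf.getLong(
    "yarn.timeline-service.phoenix-writer.connection-cache-expiry-secs", 10);
connectionCache = CacheBuilder.newBuilder()
    .maximumSize(cacheSize)
    .expireAfterAccess(expirySecs, TimeUnit.SECONDS)
    .removalListener(removalListener)
    .build(cacheLoader);
{code}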

2. If we use try-with-resources, do we still need to close stmt explicitly? Shall 
we handle conn.commit() and conn.close() in a finally block?
{code}
235         try (Statement stmt = conn.createStatement()) {
{code}
{code}
272           stmt.close();
273           conn.commit();
274           conn.close();
{code}
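
For illustration, a minimal sketch of the scoping I have in mind (assuming sql holds the statement to execute, and that conn should outlive the statement):
{code}
// try-with-resources closes stmt automatically, even on exceptions,
// so the explicit stmt.close() becomes redundant.
try (Statement stmt = conn.createStatement()) {
  stmt.executeUpdate(sql);
  conn.commit();
} finally {
  // Close the connection in finally so a failed commit cannot leak it.
  conn.close();
}
{code}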

3. This seems to be a trivial method wrapper.
{code}
292       private <K> StringBuilder appendVarcharColumnsSQL(
293           StringBuilder colNames, ColumnFamilyInfo<K> cfInfo) {
294         return appendColumnsSQL(colNames, cfInfo, " VARCHAR");
295       }
{code}

4. Why should the name and the version be combined and put in the same cell, 
rather than stored separately?
{code}
345         ps.setString(idx++,
346             context.getFlowName() + STORAGE_SEPARATOR + context.getFlowVersion());
{code}
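
For comparison, the separated version is just as simple and would let us query or filter on either field independently (a sketch only):
{code}
// Sketch: store name and version in their own cells/columns.
ps.setString(idx++, context.getFlowName());
ps.setString(idx++, context.getFlowVersion());
{code}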

5. This check does not seem necessary.
{code}
356         if (entity.getConfigs() == null
357             && entity.getInfo() == null
358             && entity.getIsRelatedToEntities() == null
359             && entity.getRelatesToEntities() == null) {
360           return;
361         }
{code}

6. Should info be VARBINARY?
{code}
245               + INFO_COLUMN_FAMILY + PHOENIX_COL_FAMILY_PLACE_HOLDER + " VARCHAR, "
{code}

7. Should config be VARCHAR?
{code}
366           appendColumnsSQL(sqlColumns, new ColumnFamilyInfo<>(
367               CONFIG_COLUMN_FAMILY, entity.getConfigs().keySet()), " VARBINARY");
{code}

8. Does Phoenix support numeric/decimal types? I'm not sure whether we should 
store the numbers in these types.
{code}
268               + "singledata VARBINARY "
{code}
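
Phoenix does support typed numeric columns such as BIGINT, DOUBLE and DECIMAL, so one option is a typed column instead of VARBINARY. A hypothetical sketch (the table constant and column names are illustrative, not the patch's schema):
{code}
// Sketch only: typed single-value column plus a timestamp column
// (see comment 9 below).
String createMetricSQL = "CREATE TABLE IF NOT EXISTS " + METRIC_TABLE_NAME
    + " (metricid VARCHAR NOT NULL PRIMARY KEY, "
    + "singledata DOUBLE, "
    + "singledatats BIGINT)";
{code}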

9. In storeMetrics, assuming we only deal with the single-value case for now, I 
think it's better to check whether the metric is single-valued first. Another 
question: do we want to ignore the timestamp associated with the single value, 
or should we add one more column to store it?
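
Something along these lines, perhaps (a sketch, assuming TimelineMetric exposes its type and a timestamp-to-value map as in the branch API; tsIdx and valueIdx are illustrative column indices):
{code}
for (TimelineMetric metric : entity.getMetrics()) {
  // Skip (or later branch on) time series until they are supported.
  if (metric.getType() != TimelineMetric.Type.SINGLE_VALUE) {
    continue;
  }
  // A single value still carries one timestamp; storing it in its own
  // column keeps it queryable instead of silently dropping it.
  Map.Entry<Long, Number> single =
      metric.getValues().entrySet().iterator().next();
  ps.setLong(tsIdx, single.getKey());
  ps.setObject(valueIdx, single.getValue());
}
{code}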

10. W.r.t. the number of connections and threads: is it better to have the same 
number of connection threads as the number of app collectors, with the requests 
of one app routed to the same thread? I remember we mentioned somewhere that we 
want to isolate apps from each other; otherwise, an app with more timeline data 
will occupy more of the write capacity to the backend. /cc [~sjlee0]
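
To make the isolation concrete, one possible shape (purely illustrative, not part of the patch) is a fixed pool with each app pinned to one thread by hashing its id:
{code}
// Sketch: pin each app's writes to a single thread so an app with heavy
// timeline traffic cannot starve the others. Names are illustrative.
private ExecutorService[] writers; // sized to the number of app collectors

private ExecutorService writerFor(String appId) {
  int slot = (appId.hashCode() & Integer.MAX_VALUE) % writers.length;
  return writers[slot];
}
{code}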

11. In TestTimelineWriterImpl, can we cover the case where the entity has a 
non-string info value?

12. In TestPhoenixTimelineWriterImpl, can we verify that each cell is storing 
the right data?
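
For both 11 and 12, a sketch of what the test could do (the table and column names are illustrative, and I'm assuming the entity's getters mirror what the test wrote):
{code}
// Comment 11: give the test entity a non-string info value.
entity.addInfo("infoKey2", 71L);

// Comment 12: read the written row back through JDBC and assert on each cell.
try (Statement stmt = conn.createStatement();
     ResultSet rs = stmt.executeQuery("SELECT created_time, modified_time"
         + " FROM " + ENTITY_TABLE_NAME
         + " WHERE entity_id = '" + entity.getId() + "'")) {
  assertTrue(rs.next());
  assertEquals((long) entity.getCreatedTime(), rs.getLong("created_time"));
  assertEquals((long) entity.getModifiedTime(), rs.getLong("modified_time"));
  assertFalse(rs.next()); // exactly one row for this entity
}
{code}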

> [Storage implementation] Exploiting the option of using Phoenix to access 
> HBase backend
> ---------------------------------------------------------------------------------------
>
>                 Key: YARN-3134
>                 URL: https://issues.apache.org/jira/browse/YARN-3134
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: timelineserver
>            Reporter: Zhijie Shen
>            Assignee: Li Lu
>         Attachments: SettingupPhoenixstorageforatimelinev2end-to-endtest.pdf, 
> YARN-3134-040915_poc.patch, YARN-3134-041015_poc.patch, 
> YARN-3134-041415_poc.patch, YARN-3134-042115.patch, YARN-3134-042715.patch, 
> YARN-3134-YARN-2928.001.patch, YARN-3134-YARN-2928.002.patch, 
> YARN-3134-YARN-2928.003.patch, YARN-3134DataSchema.pdf
>
>
> Quote the introduction on Phoenix web page:
> {code}
> Apache Phoenix is a relational database layer over HBase delivered as a 
> client-embedded JDBC driver targeting low latency queries over HBase data. 
> Apache Phoenix takes your SQL query, compiles it into a series of HBase 
> scans, and orchestrates the running of those scans to produce regular JDBC 
> result sets. The table metadata is stored in an HBase table and versioned, 
> such that snapshot queries over prior versions will automatically use the 
> correct schema. Direct use of the HBase API, along with coprocessors and 
> custom filters, results in performance on the order of milliseconds for small 
> queries, or seconds for tens of millions of rows.
> {code}
> It may simplify our implementation of reading/writing data from/to HBase, and 
> we can easily build indexes and compose complex queries.


