[jira] [Commented] (YARN-3134) [Storage implementation] Exploiting the option of using Phoenix to access HBase backend
[ https://issues.apache.org/jira/browse/YARN-3134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15369786#comment-15369786 ]

Hudson commented on YARN-3134:
--

SUCCESS: Integrated in Hadoop-trunk-Commit #10074 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/10074/])
YARN-3134. Implemented Phoenix timeline writer to access HBase backend. (sjlee: rev 41fb5c738117ab65a2f152a13de8c85476acdc58)
* hadoop-yarn-project/hadoop-yarn/dev-support/findbugs-exclude.xml
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timelineservice/src/main/java/org/apache/hadoop/yarn/server/timelineservice/collector/TimelineCollector.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timelineservice/src/main/java/org/apache/hadoop/yarn/server/timelineservice/storage/PhoenixTimelineWriterImpl.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timelineservice/src/main/java/org/apache/hadoop/yarn/server/timelineservice/collector/TimelineCollectorManager.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timelineservice/pom.xml
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timelineservice/src/test/java/org/apache/hadoop/yarn/server/timelineservice/storage/TestTimelineWriterImpl.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timelineservice/src/test/java/org/apache/hadoop/yarn/server/timelineservice/storage/TestPhoenixTimelineWriterImpl.java

> [Storage implementation] Exploiting the option of using Phoenix to access
> HBase backend
> ---
>
>                 Key: YARN-3134
>                 URL: https://issues.apache.org/jira/browse/YARN-3134
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: timelineserver
>            Reporter: Zhijie Shen
>            Assignee: Li Lu
>             Fix For: YARN-2928
>
>         Attachments: SettingupPhoenixstorageforatimelinev2end-to-endtest.pdf,
> YARN-3134-040915_poc.patch, YARN-3134-041015_poc.patch,
> YARN-3134-041415_poc.patch, YARN-3134-042115.patch, YARN-3134-042715.patch,
> YARN-3134-YARN-2928.001.patch, YARN-3134-YARN-2928.002.patch,
> YARN-3134-YARN-2928.003.patch, YARN-3134-YARN-2928.004.patch,
> YARN-3134-YARN-2928.005.patch, YARN-3134-YARN-2928.006.patch,
> YARN-3134-YARN-2928.007.patch, YARN-3134DataSchema.pdf,
> hadoop-zshen-nodemanager-d-128-95-184-84.dhcp4.washington.edu.out
>
> Quote the introduction on the Phoenix web page:
> {code}
> Apache Phoenix is a relational database layer over HBase delivered as a
> client-embedded JDBC driver targeting low latency queries over HBase data.
> Apache Phoenix takes your SQL query, compiles it into a series of HBase
> scans, and orchestrates the running of those scans to produce regular JDBC
> result sets. The table metadata is stored in an HBase table and versioned,
> such that snapshot queries over prior versions will automatically use the
> correct schema. Direct use of the HBase API, along with coprocessors and
> custom filters, results in performance on the order of milliseconds for small
> queries, or seconds for tens of millions of rows.
> {code}
> It may simplify our implementation's reads/writes from/to HBase, and lets us
> easily build indexes and compose complex queries.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-3134) [Storage implementation] Exploiting the option of using Phoenix to access HBase backend
[ https://issues.apache.org/jira/browse/YARN-3134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14571606#comment-14571606 ]

Vrushali C commented on YARN-3134:
--

After evaluating both backend storage implementations, YARN-3134 (Phoenix-based HBase schema) and YARN-3411 (hybrid HBase schema: vanilla HBase tables in the direct write path and Phoenix-based tables for reporting), in terms of performance, scalability, usability, and maintenance, the conclusion is to use vanilla HBase tables in the direct write path. Attached to YARN-2928 is a write-up that describes how we ended up choosing the approach of writing to vanilla HBase tables (YARN-3411) in the direct write path.
[jira] [Commented] (YARN-3134) [Storage implementation] Exploiting the option of using Phoenix to access HBase backend
[ https://issues.apache.org/jira/browse/YARN-3134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14538466#comment-14538466 ]

Vinod Kumar Vavilapalli commented on YARN-3134:
--

Tx folks, this is great progress!
[jira] [Commented] (YARN-3134) [Storage implementation] Exploiting the option of using Phoenix to access HBase backend
[ https://issues.apache.org/jira/browse/YARN-3134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14534291#comment-14534291 ]

Junping Du commented on YARN-3134:
--

bq. For now, I'm removing the connection cache to make the first step right. I'll change the description of YARN-3595 for the connection cache.

+1. The plan sounds reasonable.
[jira] [Commented] (YARN-3134) [Storage implementation] Exploiting the option of using Phoenix to access HBase backend
[ https://issues.apache.org/jira/browse/YARN-3134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14533656#comment-14533656 ]

Zhijie Shen commented on YARN-3134:
--

Noticed pom.xml is using Phoenix 4.3.0. I retried with this version; the problem still happened.
[jira] [Commented] (YARN-3134) [Storage implementation] Exploiting the option of using Phoenix to access HBase backend
[ https://issues.apache.org/jira/browse/YARN-3134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14533670#comment-14533670 ]

Li Lu commented on YARN-3134:
--

Sure, I'll make those cleanup changes.
[jira] [Commented] (YARN-3134) [Storage implementation] Exploiting the option of using Phoenix to access HBase backend
[ https://issues.apache.org/jira/browse/YARN-3134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14533671#comment-14533671 ]

Li Lu commented on YARN-3134:
--

Looking into this. Seems like we need some more fine-tuning of the synchronization.
[jira] [Commented] (YARN-3134) [Storage implementation] Exploiting the option of using Phoenix to access HBase backend
[ https://issues.apache.org/jira/browse/YARN-3134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14533600#comment-14533600 ]

Junping Du commented on YARN-3134:
--

bq. We've also got some code cleanup work to do, but I put them in a priority lower than getting the performance evaluation done for now.

I disagree. Actually, it usually takes a reviewer more time to identify these code style issues than it would take to simply fix them. Please respect that effort unless you have a different opinion on these comments.
[jira] [Commented] (YARN-3134) [Storage implementation] Exploiting the option of using Phoenix to access HBase backend
[ https://issues.apache.org/jira/browse/YARN-3134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14533393#comment-14533393 ]

Li Lu commented on YARN-3134:
--

I just opened YARN-3595 to track all connection-cache-related discussions. I wrote a summary of the background of this problem. Please feel free to add more.
[jira] [Commented] (YARN-3134) [Storage implementation] Exploiting the option of using Phoenix to access HBase backend
[ https://issues.apache.org/jira/browse/YARN-3134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14533481#comment-14533481 ]

Hadoop QA commented on YARN-3134:
--

| (/) *{color:green}+1 overall{color}* |

|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch | 14m 51s | Pre-patch YARN-2928 compilation is healthy. |
| {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. |
| {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 2 new or modified test files. |
| {color:green}+1{color} | install | 1m 35s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse | 0m 37s | The patch built with eclipse:eclipse. |
| {color:green}+1{color} | findbugs | 0m 34s | The patch does not introduce any new Findbugs (version 2.0.3) warnings. |
| {color:green}+1{color} | yarn tests | 0m 23s | Tests passed in hadoop-yarn-server-timelineservice. |
| | | 25m 54s | |

|| Subsystem || Report/Notes ||
| Patch URL | http://issues.apache.org/jira/secure/attachment/12731281/YARN-3134-YARN-2928.006.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | YARN-2928 / d4a2362 |
| hadoop-yarn-server-timelineservice test log | https://builds.apache.org/job/PreCommit-YARN-Build/7782/artifact/patchprocess/testrun_hadoop-yarn-server-timelineservice.txt |
| Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/7782/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf907.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | https://builds.apache.org/job/PreCommit-YARN-Build/7782/console |

This message was automatically generated.
[jira] [Commented] (YARN-3134) [Storage implementation] Exploiting the option of using Phoenix to access HBase backend
[ https://issues.apache.org/jira/browse/YARN-3134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14533883#comment-14533883 ]

Li Lu commented on YARN-3134:
--

Looked into the concurrency bug. The problem is caused by concurrent operations on Connections racing with the Guava cache's removalListener calls: on cache eviction, active connections may be mistakenly closed. I believe a concurrent algorithm to resolve this is possible, but it is not trivial. For now, I'm removing the connection cache to make the first step right. I'll change the description of YARN-3595 to cover the connection cache. In this JIRA, I'm focusing on code cleanups.
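The hazard described above can be shown with a minimal, self-contained sketch. All names here are hypothetical, and a plain access-ordered LinkedHashMap stands in for the Guava cache: the point is only that an eviction hook which closes connections can close one a caller still holds.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Stand-in for a Phoenix/JDBC connection (hypothetical, for illustration only).
class FakeConnection {
    private boolean open = true;
    void close() { open = false; }
    boolean isOpen() { return open; }
}

public class EvictionRaceSketch {
    public static void main(String[] args) {
        // Tiny LRU cache of capacity 1; the eviction hook closes the evicted
        // connection, mimicking the cache's removalListener.
        Map<String, FakeConnection> cache =
            new LinkedHashMap<String, FakeConnection>(16, 0.75f, true) {
                @Override
                protected boolean removeEldestEntry(Map.Entry<String, FakeConnection> e) {
                    if (size() > 1) {
                        e.getValue().close(); // eviction closes the connection
                        return true;
                    }
                    return false;
                }
            };
        FakeConnection held = new FakeConnection();
        cache.put("user-A", held);                  // a writer grabs this connection
        cache.put("user-B", new FakeConnection());  // eviction closes user-A's entry
        // The writer still holds a reference, but it is now unusable:
        System.out.println("held connection still open? " + held.isOpen()); // prints false
    }
}
```

A correct fix needs the eviction hook to know whether the connection is still in use (e.g. reference counting), which is why the comment calls the concurrent solution nontrivial and removes the cache for now.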
[jira] [Commented] (YARN-3134) [Storage implementation] Exploiting the option of using Phoenix to access HBase backend
[ https://issues.apache.org/jira/browse/YARN-3134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14533422#comment-14533422 ]

Hadoop QA commented on YARN-3134:
--

| (x) *{color:red}-1 overall{color}* |

|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch | 15m 7s | Pre-patch YARN-2928 compilation is healthy. |
| {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. |
| {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 2 new or modified test files. |
| {color:green}+1{color} | install | 1m 33s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse | 0m 38s | The patch built with eclipse:eclipse. |
| {color:green}+1{color} | findbugs | 0m 34s | The patch does not introduce any new Findbugs (version 2.0.3) warnings. |
| {color:red}-1{color} | yarn tests | 15m 20s | Tests failed in hadoop-yarn-server-timelineservice. |
| | | 41m 39s | |

|| Reason || Tests ||
| Timed out tests | org.apache.hadoop.yarn.server.timelineservice.storage.TestPhoenixTimelineWriterImpl |

|| Subsystem || Report/Notes ||
| Patch URL | http://issues.apache.org/jira/secure/attachment/12731261/YARN-3134-YARN-2928.006.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | YARN-2928 / d4a2362 |
| hadoop-yarn-server-timelineservice test log | https://builds.apache.org/job/PreCommit-YARN-Build//artifact/patchprocess/testrun_hadoop-yarn-server-timelineservice.txt |
| Test Results | https://builds.apache.org/job/PreCommit-YARN-Build//testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf903.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | https://builds.apache.org/job/PreCommit-YARN-Build//console |

This message was automatically generated.
[jira] [Commented] (YARN-3134) [Storage implementation] Exploiting the option of using Phoenix to access HBase backend
[ https://issues.apache.org/jira/browse/YARN-3134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14532948#comment-14532948 ] Junping Du commented on YARN-3134: -- Thanks [~gtCarrera9] for updating the patch! This looks much closer. In FileSystemTimelineWriterImpl.java, {code} - out = new PrintWriter(new BufferedWriter(new FileWriter(fileName, true))); + out = new PrintWriter(new BufferedWriter(new OutputStreamWriter( + new FileOutputStream(fileName),UTF-8))); {code} This is not necessary given YARN-3562 is already get in. Can you remove it? In PhoenixTimelineWriterImpl.java, For createTables(), {code} ... + cluster, user, flow_name, flow_version, flow_run DESC, app_id, + type, entity_id)); {code} For putting DESC in key of flow_run, I remember from the other doc in YARN-3411, Vrushali said we should put user as the first key to make sure timeline entities from a particular user can distributed less regions. I agree with that, so I think we should put DESC to user instead of flow_run as we don't really care the sequence of flow_run. [~vrushalic], can you also comment here? {code} + time UNSIGNED_LONG {code} Two spaces here, omit one. In method of storeEntityVariableLengthFields(), {code} if (entity.getConfigs() != null) { appendColumnsSQL(sqlColumns, new ColumnFamilyInfo( CONFIG_COLUMN_FAMILY, entity.getConfigs().keySet())); numPlaceholders += entity.getConfigs().keySet().size(); } {code} We should put the String as type to ColumnFamilyInfo, the same case as getInfo(), getIsRelatedToEntities() and getRelatesToEntities(), etc. Also, we are duplicated calling entity.getConfigs().keySet() twice which should be avoid. In addition, sqlColumns better rename to ColumnDefs which define primary key columns? 
{code} private static K StringBuilder appendColumnsSQL( StringBuilder colNames, ColumnFamilyInfoK cfInfo) { return appendColumnsSQL(colNames, cfInfo, VARCHAR); } {code} Looks like we are adding column families here rather than columns. May be we should rename it to appendColumnFamiliesToSQL? For setStringsForColumnFamily() and setBytesForColumnFamily(), it looks like most code there is duplicated, can we consolidate into one method? {code} @Private @VisibleForTesting void dropTable(String tableName) throws Exception { {code} Document this is only for test. [Storage implementation] Exploiting the option of using Phoenix to access HBase backend --- Key: YARN-3134 URL: https://issues.apache.org/jira/browse/YARN-3134 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Zhijie Shen Assignee: Li Lu Labels: BB2015-05-TBR Attachments: SettingupPhoenixstorageforatimelinev2end-to-endtest.pdf, YARN-3134-040915_poc.patch, YARN-3134-041015_poc.patch, YARN-3134-041415_poc.patch, YARN-3134-042115.patch, YARN-3134-042715.patch, YARN-3134-YARN-2928.001.patch, YARN-3134-YARN-2928.002.patch, YARN-3134-YARN-2928.003.patch, YARN-3134-YARN-2928.004.patch, YARN-3134-YARN-2928.005.patch, YARN-3134DataSchema.pdf Quote the introduction on Phoenix web page: {code} Apache Phoenix is a relational database layer over HBase delivered as a client-embedded JDBC driver targeting low latency queries over HBase data. Apache Phoenix takes your SQL query, compiles it into a series of HBase scans, and orchestrates the running of those scans to produce regular JDBC result sets. The table metadata is stored in an HBase table and versioned, such that snapshot queries over prior versions will automatically use the correct schema. Direct use of the HBase API, along with coprocessors and custom filters, results in performance on the order of milliseconds for small queries, or seconds for tens of millions of rows. 
{code} It may simplify our implementation's reads and writes from/to HBase, and we can easily build indexes and compose complex queries. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
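The row-key discussion above (put user first so one user's entities land in fewer regions; drop DESC from flow_run) can be sketched as Phoenix DDL. This is a hypothetical illustration, not the actual YARN-3134 schema; the table and column names are assumptions using the underscore style discussed later in the thread.

```java
// Hypothetical sketch of the primary-key ordering discussed above: user_id
// leads the row key, and no DESC modifier is applied to flow_run. Builds the
// DDL string only; it is not executed against a real Phoenix cluster here.
public class EntityTableDdl {
    static String createEntityTableSql(String tableName) {
        return "CREATE TABLE IF NOT EXISTS " + tableName + " ("
            + "user_id VARCHAR NOT NULL, "
            + "cluster_id VARCHAR NOT NULL, "
            + "flow_name VARCHAR NOT NULL, "
            + "flow_version VARCHAR NOT NULL, "
            + "flow_run UNSIGNED_LONG NOT NULL, "
            + "app_id VARCHAR NOT NULL, "
            + "entity_type VARCHAR NOT NULL, "
            + "entity_id VARCHAR NOT NULL, "
            + "CONSTRAINT pk PRIMARY KEY (user_id, cluster_id, flow_name, "
            + "flow_version, flow_run, app_id, entity_type, entity_id))";
    }

    public static void main(String[] args) {
        System.out.println(createEntityTableSql("TIMELINE_ENTITY"));
    }
}
```

Leading with user_id means all rows for one user are contiguous in the key space, which is the region-locality argument made above.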
[ https://issues.apache.org/jira/browse/YARN-3134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14530885#comment-14530885 ] Junping Du commented on YARN-3134: -- bq. I have not made the change on the Phoenix connection string since, according to our previous discussion, we're planning to address this after we've decided which implementation to pursue in the future. It should be fine to keep the hard-coded JDBC connection address here. At the least, we should update the name and make it private instead of default (package-level) visibility, as no place outside of the class needs to use it for now. Some other comments against the latest patch: {code} + /** Default Phoenix JDBC driver name */ + static final String DRIVER_CLASS_NAME = "org.apache.phoenix.jdbc.PhoenixDriver"; + /** Default Phoenix timeline entity table name */ + static final String ENTITY_TABLE_NAME = "TIMELINE_ENTITY"; ... {code} Are we going to share these constants with HBaseTimelineWriterImpl? If not, we should mark them as private. In fact, quickly checking the patch in YARN-3411, I would expect we could reuse some of these constants, but that's not something we need to worry about now. {code} + private static final String[] PHOENIX_STORAGE_PK_LIST + = {"cluster", "user", "flow", "version", "run", "appid", "type", "entityid"}; + private static final String[] TIMELINE_EVENT_EXTRA_PK_LIST = + {"timestamp", "eventid"}; + private static final String[] TIMELINE_METRIC_EXTRA_PK_LIST = + {"metricid"}; {code} For key names, we should stay consistent with other places, i.e. replacing with cluster_id, user_id, flow_name, flow_version, etc., or maintaining this in the future will be a significant headache. In addition, for the key naming convention, it sounds like we don't follow any convention for now, e.g. lower/upper CamelCase or something else. 
From the examples of the Phoenix project (http://phoenix.apache.org/views.html or http://phoenix.apache.org/multi-tenancy.html), they use underscores as word separators. I would suggest conforming to the same style here. {code} +try { + conn.commit(); +} catch (SQLException se) { + LOG.error("Failed to close the phoenix connection! " + + se.getLocalizedMessage()); {code} The message is incorrect, as the exception does not happen in closing the connection, but in commit. Last but not least, it looks like we have many helper methods (appendColumnsSQL, setStringsForColumnFamily, setStringsForPrimaryKey, etc.) for constructing the SQL statements. However, these helper methods have no comments and lack unit tests to verify their correctness in all cases. 
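The underscore-separator convention suggested above could also be applied mechanically when mapping Java-style names to column names. A minimal hypothetical helper (not part of any patch in this thread):

```java
// Sketch: convert camelCase identifiers to the underscore-separated style
// used in the Phoenix examples cited above, e.g. "flowVersion" -> "flow_version".
public class ColumnNameStyle {
    static String toUnderscoreStyle(String name) {
        // Insert an underscore before each upper-case letter that follows a
        // lower-case letter or digit, then lower-case the whole name.
        return name.replaceAll("([a-z0-9])([A-Z])", "$1_$2").toLowerCase();
    }
}
```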
[ https://issues.apache.org/jira/browse/YARN-3134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14531544#comment-14531544 ] Hadoop QA commented on YARN-3134: - \\ \\ | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 15m 14s | Pre-patch YARN-2928 compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 2 new or modified test files. | | {color:green}+1{color} | install | 1m 43s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 39s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 0m 35s | The patch does not introduce any new Findbugs (version 2.0.3) warnings. | | {color:green}+1{color} | yarn tests | 0m 22s | Tests passed in hadoop-yarn-server-timelineservice. | | | | 26m 39s | | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12730977/YARN-3134-YARN-2928.005.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | YARN-2928 / 557a395 | | hadoop-yarn-server-timelineservice test log | https://builds.apache.org/job/PreCommit-YARN-Build/7741/artifact/patchprocess/testrun_hadoop-yarn-server-timelineservice.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/7741/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf903.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/7741/console | This message was automatically generated. 
[ https://issues.apache.org/jira/browse/YARN-3134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14531398#comment-14531398 ] Zhijie Shen commented on YARN-3134: --- Some more comments: 1. Maybe it's better to commit the batch per entity. Otherwise, if one entity has some I/O error, all entities in this write call will fail? {code} 186 storeMetrics(entity, currContext, conn); 187 } 188 ps.executeBatch(); 189 conn.commit(); {code} 2. stmt doesn't need to be closed explicitly, but conn still does, right? {code} 277 conn.commit(); 278 } catch (SQLException se) { 279 LOG.error("Failed in init data " + se.getLocalizedMessage()); 280 throw se; 281 } {code} 3. Many of PhoenixTimelineWriterImpl's private functions can be static. bq. except for two points I'm okay if we want to defer it as the future stabilization work. 
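The per-entity commit suggested in point 1 above can be sketched with stand-in types. Conn and the entity strings here are hypothetical stubs, not the real Phoenix writer classes; the point is only the control flow: committing inside the loop isolates one entity's failure from the rest.

```java
import java.sql.SQLException;
import java.util.List;

public class PerEntityCommit {
    // Stand-in for a JDBC connection; only commit() matters for this sketch.
    interface Conn { void commit() throws SQLException; }

    // Returns how many entities were stored successfully.
    static int writeEntities(List<String> entities, Conn conn) {
        int stored = 0;
        for (String entity : entities) {
            try {
                // storeEntity(entity, conn); // real per-entity work would go here
                conn.commit();               // commit per entity, not once per batch
                stored++;
            } catch (SQLException se) {
                // One failed entity no longer fails the whole write call.
            }
        }
        return stored;
    }
}
```

With a single commit after the loop (as in the quoted patch lines), one SQLException would abort all entities in the call; here the other entities still land.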
[ https://issues.apache.org/jira/browse/YARN-3134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14531964#comment-14531964 ] Zhijie Shen commented on YARN-3134: --- bq. I'm relying on the underlying LoadingCache to do the close, which is now consistent with other connection's logic. I see, but do we need to create the table on a separate thread? At that moment the service is not started yet. Some more questions about the schema: {{parent VARCHAR, queue VARCHAR}} seem not to be necessary columns, as we store those in the info section. And we may want to rename some more columns: type -> entity_type; singledata -> single_data; time -> timestamp; and the table metric_singledata -> metric_single_data. Thoughts? 
[ https://issues.apache.org/jira/browse/YARN-3134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14529739#comment-14529739 ] Hadoop QA commented on YARN-3134: - \\ \\ | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 14m 54s | Pre-patch YARN-2928 compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 2 new or modified test files. | | {color:green}+1{color} | install | 1m 35s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 38s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 0m 34s | The patch does not introduce any new Findbugs (version 2.0.3) warnings. | | {color:green}+1{color} | yarn tests | 0m 22s | Tests passed in hadoop-yarn-server-timelineservice. | | | | 26m 15s | | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12730660/YARN-3134-YARN-2928.004.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | YARN-2928 / 557a395 | | hadoop-yarn-server-timelineservice test log | https://builds.apache.org/job/PreCommit-YARN-Build/7719/artifact/patchprocess/testrun_hadoop-yarn-server-timelineservice.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/7719/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf907.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/7719/console | This message was automatically generated. 
[ https://issues.apache.org/jira/browse/YARN-3134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14529377#comment-14529377 ] Li Lu commented on YARN-3134: - Thanks [~zjshen]! I'm addressing your comments. Meanwhile, there are some points that I'd like to discuss: bq. How do we choose the size and the expiry time? Oh, thanks for the reminder! Currently I just set some placeholders for frequent accesses (10-second clean-up time) and medium concurrency (16 threads). We may want to tune this case-by-case in the future. bq. If we use try with resources, do we still need to close stmt? Shall we close them in finally block? I believe this is why we need try-with-resources statements. See https://docs.oracle.com/javase/tutorial/essential/exceptions/tryResourceClose.html bq. So why name and version should be combined and put it the same cell, but not be separated? I've also noticed the HBase implementation and the Phoenix implementation use those two fields in different ways (the HBase implementation is not using the version field, if I understand correctly). How do we want to use the version field? I was trying to use it as a part of the id to uniquely locate one flow run. bq. Does phoenix support numeric/decimal? Not sure if we should store the numbers in these types? Phoenix supports concrete number types such as INTEGER, BIGINT, SMALLINT, FLOAT, DOUBLE, etc., but nothing directly maps to java.lang.Number. 
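The try-with-resources behavior discussed above (the resource is closed when the block exits, even on an exception, and nothing is swallowed) can be demonstrated with a tiny stand-in resource; TrackedStatement below is a stub for this sketch, not a JDBC class.

```java
// Minimal demonstration of try-with-resources: close() runs automatically
// on both the normal and the exceptional exit path, and the exception still
// propagates out of the try block (there is no catch inside it).
public class TryWithResourcesDemo {
    static class TrackedStatement implements AutoCloseable {
        boolean closed = false;
        void execute(boolean fail) throws Exception {
            if (fail) throw new Exception("boom");
        }
        @Override public void close() { closed = true; }
    }

    static TrackedStatement run(boolean fail) {
        TrackedStatement stmt = new TrackedStatement();
        try (TrackedStatement s = stmt) {
            s.execute(fail);
        } catch (Exception e) {
            // The exception reaches this catch; close() has already run.
        }
        return stmt;
    }
}
```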
[ https://issues.apache.org/jira/browse/YARN-3134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14529167#comment-14529167 ] Zhijie Shen commented on YARN-3134: --- Li, thanks for updating the patch. Here are some comments about it. 1. How do we choose the size and the expiry time? {code} 113 connectionCache = CacheBuilder.newBuilder().maximumSize(16) 114 .expireAfterAccess(10, TimeUnit.SECONDS).removalListener( {code} 2. If we use try-with-resources, do we still need to close stmt? Shall we close them in a finally block? {code} 235 try (Statement stmt = conn.createStatement()) { {code} {code} 272 stmt.close(); 273 conn.commit(); 274 conn.close(); {code} 3. Seems to be a trivial method wrapper: {code} 292 private <K> StringBuilder appendVarcharColumnsSQL( 293 StringBuilder colNames, ColumnFamilyInfo<K> cfInfo) { 294 return appendColumnsSQL(colNames, cfInfo, VARCHAR); 295 } {code} 4. So why should name and version be combined and put in the same cell, rather than separated? {code} 345 ps.setString(idx++, 346 context.getFlowName() + STORAGE_SEPARATOR + context.getFlowVersion()); {code} 5. Seems not to be necessary: {code} 356 if (entity.getConfigs() == null && 357 entity.getInfo() == null && 358 entity.getIsRelatedToEntities() == null && 359 entity.getRelatesToEntities() == null) { 360 return; 361 } {code} 6. Should info be varbinary? {code} 245 + INFO_COLUMN_FAMILY + PHOENIX_COL_FAMILY_PLACE_HOLDER + VARCHAR, {code} 7. Should config be varchar? {code} 366 appendColumnsSQL(sqlColumns, new ColumnFamilyInfo( 367 CONFIG_COLUMN_FAMILY, entity.getConfigs().keySet()), VARBINARY); {code} 8. Does phoenix support numeric/decimal? Not sure if we should store the numbers in these types? {code} 268 + singledata VARBINARY {code} 9. In storeMetrics, assuming we only deal with the single-value case for now, I think it's better to check whether the metric is single-valued first. Another question here is whether we want to ignore the associated timestamp of the single value. 
Or we should add one more column to store the timestamp of this value. 10. W.r.t. the number of connections and threads: is it better to have the same number of connection threads as the number of app collectors, with the requests of one app routed to the same thread? This is because I remember somewhere we mentioned that we want isolation between apps. Otherwise, an app with more timeline data will occupy more writing capacity to the backend. /cc [~sjlee0] 11. In TestTimelineWriterImpl, can we cover the case where the entity has a non-String info value? 12. In TestPhoenixTimelineWriterImpl, can we verify that each cell is storing the right data? 
[ https://issues.apache.org/jira/browse/YARN-3134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14527835#comment-14527835 ] Hadoop QA commented on YARN-3134: - \\ \\ | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 14m 51s | Pre-patch YARN-2928 compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 2 new or modified test files. | | {color:green}+1{color} | install | 1m 33s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 37s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 0m 34s | The patch does not introduce any new Findbugs (version 2.0.3) warnings. | | {color:green}+1{color} | yarn tests | 0m 23s | Tests passed in hadoop-yarn-server-timelineservice. | | | | 26m 0s | | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12730332/YARN-3134-YARN-2928.003.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | YARN-2928 / 557a395 | | hadoop-yarn-server-timelineservice test log | https://builds.apache.org/job/PreCommit-YARN-Build/7698/artifact/patchprocess/testrun_hadoop-yarn-server-timelineservice.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/7698/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf903.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/7698/console | This message was automatically generated. 
[ https://issues.apache.org/jira/browse/YARN-3134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14527632#comment-14527632 ] Li Lu commented on YARN-3134: - And, one more thing: I'm closing all PreparedStatements implicitly in the try-with-resources statements. This statement will not swallow any exceptions (since there's no catch after it) but will guarantee the resource is released after the block's execution, even if there are exceptions. 
[ https://issues.apache.org/jira/browse/YARN-3134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14523303#comment-14523303 ] Junping Du commented on YARN-3134: -- Thanks [~gtCarrera9] for updating the patch! Some comments below: {code} + /** Connection string to the deployed Phoenix cluster */ + static final String CONN_STRING = "jdbc:phoenix:localhost:2181:/hbase"; {code} Do we need to make the port number configurable, given that the port could be occupied by other applications in a production environment? Also, CONN_STRING -> CONN_ADDR? {code} + connectionCache = CacheBuilder.newBuilder().maximumSize(16) + .expireAfterAccess(10, TimeUnit.SECONDS).removalListener( + removalListener) + .build(new CacheLoader<Thread, Connection>() { + @Override public Connection load(Thread key) throws Exception { + Connection conn = null; + try { + Class.forName(DRIVER_CLASS_NAME); + conn = DriverManager.getConnection(CONN_STRING); + conn.setAutoCommit(false); + } catch (SQLException se) { + LOG.error("Failed to connect to phoenix server! " + + se.getLocalizedMessage()); + } catch (ClassNotFoundException e) { + LOG.error("Class not found! " + e.getLocalizedMessage()); + } + return conn; + } + } + ); {code} Indentation issue here. Also, we shouldn't swallow fatal exceptions. Only logging the error is not enough; we need to throw the exception back. For the write method: - we should close the PreparedStatement in a finally block whenever we no longer need it, or it could prevent the JDBC connection from being physically closed later. - An IOException should be thrown when catching a SQLException; as commented above, only logging the error is not enough. For the tryInitTable method: rename it to createTable()? Also throw the exception out whenever a SQLException is caught. Two similar issues for the storeEntityVariableLengthFields/storeMetrics/storeEvents methods: 1) the PreparedStatement doesn't get closed, 2) exceptions get swallowed. Also, I see we are constructing SQL statements with complicated parameters. 
Do we need to log the SQL statements (at least at debug level)? In some cases, SQL executes successfully but is not what we expected, so I think it could be helpful for troubleshooting. 
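The "don't swallow fatal exceptions" point above can be sketched without Guava: a plain per-thread map whose loader lets the failure surface to the caller instead of caching a null connection. Everything here is a stand-in sketch (loadConnection() is hypothetical, returning a String token rather than a real JDBC Connection); the real code would do Class.forName(...) plus DriverManager.getConnection(...).

```java
import java.util.concurrent.ConcurrentHashMap;

// Sketch: cache one "connection" per thread id. computeIfAbsent propagates
// any exception thrown by the loader and records nothing in the map, so a
// failed load is visible to the caller rather than silently becoming null.
public class ConnectionCacheSketch {
    private final ConcurrentHashMap<Long, String> cache = new ConcurrentHashMap<>();

    String loadConnection(long threadId) {
        // Stand-in for driver loading + DriverManager.getConnection(...).
        if (threadId < 0) {
            throw new IllegalStateException("Failed to connect to phoenix server!");
        }
        return "conn-" + threadId;
    }

    String getConnection(long threadId) {
        return cache.computeIfAbsent(threadId, this::loadConnection);
    }
}
```

Guava's LoadingCache behaves similarly when the CacheLoader throws: the exception is propagated (wrapped) to the get() caller, which is exactly the behavior the review asks for instead of logging and returning null.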
[jira] [Commented] (YARN-3134) [Storage implementation] Exploiting the option of using Phoenix to access HBase backend
[ https://issues.apache.org/jira/browse/YARN-3134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14522678#comment-14522678 ] Hadoop QA commented on YARN-3134: - \\ \\ | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 14m 48s | Pre-patch YARN-2928 compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 2 new or modified test files. | | {color:green}+1{color} | install | 1m 37s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 37s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 0m 34s | The patch does not introduce any new Findbugs (version 2.0.3) warnings. | | {color:green}+1{color} | yarn tests | 0m 22s | Tests passed in hadoop-yarn-server-timelineservice. | | | | 26m 8s | | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12729627/YARN-3134-YARN-2928.002.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | YARN-2928 / b689f5d | | hadoop-yarn-server-timelineservice test log | https://builds.apache.org/job/PreCommit-YARN-Build/7562/artifact/patchprocess/testrun_hadoop-yarn-server-timelineservice.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/7562/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf907.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/7562/console | This message was automatically generated. 
[jira] [Commented] (YARN-3134) [Storage implementation] Exploiting the option of using Phoenix to access HBase backend
[ https://issues.apache.org/jira/browse/YARN-3134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14520520#comment-14520520 ] Hadoop QA commented on YARN-3134: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | patch | 0m 0s | The patch command could not apply the patch during dryrun. | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12729341/YARN-3134-YARN-2928.runJenkins.001.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / 4c1af15 | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/7546/console | This message was automatically generated. 
[jira] [Commented] (YARN-3134) [Storage implementation] Exploiting the option of using Phoenix to access HBase backend
[ https://issues.apache.org/jira/browse/YARN-3134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14520599#comment-14520599 ] Hadoop QA commented on YARN-3134: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 14m 47s | Pre-patch YARN-2928 compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 2 new or modified test files. | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:red}-1{color} | javac | 7m 58s | The applied patch generated 8 additional warning messages. | | {color:green}+1{color} | javadoc | 9m 41s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 22s | The applied patch does not increase the total number of release audit warnings. | | {color:red}-1{color} | checkstyle | 4m 5s | The applied patch generated 2 additional checkstyle issues. | | {color:green}+1{color} | install | 1m 39s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 38s | The patch built with eclipse:eclipse. | | {color:red}-1{color} | findbugs | 0m 41s | The patch appears to introduce 10 new Findbugs (version 2.0.3) warnings. | | {color:green}+1{color} | yarn tests | 0m 23s | Tests passed in hadoop-yarn-server-timelineservice. 
| | | | 40m 18s | | \\ \\ || Reason || Tests || | FindBugs | module:hadoop-yarn-server-timelineservice | | | Found reliance on default encoding in org.apache.hadoop.yarn.server.timelineservice.storage.FileSystemTimelineWriterImpl.write(String, String, String, String, long, String, TimelineEntity, TimelineWriteResponse): new java.io.FileWriter(String, boolean) At FileSystemTimelineWriterImpl.java:[line 86] | | | org.apache.hadoop.yarn.server.timelineservice.storage.PhoenixTimelineWriterImpl.tryInitTable() may fail to clean up java.sql.Statement on checked exception; obligation to clean up resource created at PhoenixTimelineWriterImpl.java:[line 227] is not discharged | | | PhoenixTimelineWriterImpl.executeQuery(String) may fail to close Statement At PhoenixTimelineWriterImpl.java:[line 492] | | | A prepared statement is generated from a nonconstant String in PhoenixTimelineWriterImpl.storeEntityVariableLengthFields(TimelineEntity, TimelineCollectorContext, Connection) At PhoenixTimelineWriterImpl.java:[line 389] | | | A prepared statement is generated from a nonconstant String in PhoenixTimelineWriterImpl.storeEvents(TimelineEntity, TimelineCollectorContext, Connection) At PhoenixTimelineWriterImpl.java:[line 476] | | | A prepared statement is generated from a nonconstant String in PhoenixTimelineWriterImpl.storeMetrics(TimelineEntity, TimelineCollectorContext, Connection) At PhoenixTimelineWriterImpl.java:[line 433] | | | A prepared statement is generated from a nonconstant String in PhoenixTimelineWriterImpl.write(String, String, String, String, long, String, TimelineEntities) At PhoenixTimelineWriterImpl.java:[line 167] | | | PhoenixTimelineWriterImpl.setBytesForColumnFamily(PreparedStatement, Map, int) makes inefficient use of keySet iterator instead of entrySet
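Several of the findbugs warnings above flag prepared statements built from nonconstant Strings. The usual remedy is to keep the SQL template a compile-time constant and bind only values through `?` placeholders; dynamic identifiers (e.g. Phoenix dynamic columns) must still be concatenated, but can at least be validated first. A hedged sketch — the table name, column names, and the whitelist check are invented for illustration, not taken from the patch:

```java
/**
 * Sketch: keep the SQL template a compile-time constant and bind values
 * via placeholders. Table and column names are illustrative only.
 */
public class SqlTemplates {
  // Constant template: no values are concatenated into the SQL itself.
  static final String UPSERT_ENTITY =
      "UPSERT INTO timeline_entity (cluster, user_id, flow, run, entity_id) "
          + "VALUES (?, ?, ?, ?, ?)";

  /** Whitelist-style check for identifiers that must be inlined. */
  static boolean isSafeIdentifier(String name) {
    return name.matches("[A-Za-z_][A-Za-z0-9_]*");
  }

  public static void main(String[] args) {
    // Five bind positions, one per column.
    System.out.println(UPSERT_ENTITY.chars().filter(c -> c == '?').count());
    System.out.println(isSafeIdentifier("metric_value"));      // a valid identifier
    System.out.println(isSafeIdentifier("x; DROP TABLE t"));   // rejected
  }
}
```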
[jira] [Commented] (YARN-3134) [Storage implementation] Exploiting the option of using Phoenix to access HBase backend
[ https://issues.apache.org/jira/browse/YARN-3134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14518480#comment-14518480 ] Sangjin Lee commented on YARN-3134: --- {{conn.close()}} is called again on l.222 in the finally clause. I think we can remove the call on l.216 and rely on the call in the finally clause. 
[jira] [Commented] (YARN-3134) [Storage implementation] Exploiting the option of using Phoenix to access HBase backend
[ https://issues.apache.org/jira/browse/YARN-3134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14518383#comment-14518383 ] Sangjin Lee commented on YARN-3134: --- Thanks for the patch [~gtCarrera9]! Some quick comments on the patch: I'm a little confused as to whether Connection.close() should be called on every write. I see write() does not call Connection.close() but tryInitTable() calls it. Which is the expected pattern? - l.216: conn.close() is called twice - l.318: Are the flow name and the flow version concatenated without any separators? Wouldn't that be confusing? - l.479: is stmt.close() supposed to be called? For that matter, is it better to wrap the statement creation in executeQuery() and dropTable() with the auto-close try?
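The "auto-close try" Sangjin mentions is Java 7 try-with-resources, which closes the resource even when the body throws, so no explicit stmt.close() is needed. A self-contained sketch with a stand-in resource (the real code would use a java.sql.Statement; `TrackedStatement` and the simulated error are invented so the example runs without a database):

```java
/**
 * Sketch of try-with-resources: the resource is closed automatically,
 * even on an exception. TrackedStatement is a stand-in for Statement.
 */
public class AutoCloseSketch {
  static class TrackedStatement implements AutoCloseable {
    static boolean lastClosed = false;
    void execute(boolean fail) {
      if (fail) throw new RuntimeException("simulated SQL error");
    }
    @Override public void close() { lastClosed = true; }
  }

  static void run(boolean fail) {
    // stmt.close() is invoked when this block exits, normally or not.
    try (TrackedStatement stmt = new TrackedStatement()) {
      stmt.execute(fail);
    }
  }

  public static void main(String[] args) {
    run(false);
    System.out.println("closed: " + TrackedStatement.lastClosed);
    TrackedStatement.lastClosed = false;
    try { run(true); } catch (RuntimeException expected) { }
    System.out.println("closed after exception: " + TrackedStatement.lastClosed);
  }
}
```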
[jira] [Commented] (YARN-3134) [Storage implementation] Exploiting the option of using Phoenix to access HBase backend
[ https://issues.apache.org/jira/browse/YARN-3134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14518458#comment-14518458 ] Li Lu commented on YARN-3134: - Hi [~sjlee0], thanks for the review! We do not need to call Connection.close() after every write, since connections are cached by the loading cache and will be closed upon removal. However, I treat tryInitTable as a special case because there is potentially less reuse after the table creation. Not sure if this makes sense, so I can change it once more. A quick question about this comment: bq. l.216: conn.close() is called twice I'm a little bit confused here because I was trying to close the connection after a commit (for potentially pending data). Could you please give me a hint on why it's closed twice? Thanks!
[jira] [Commented] (YARN-3134) [Storage implementation] Exploiting the option of using Phoenix to access HBase backend
[ https://issues.apache.org/jira/browse/YARN-3134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14516107#comment-14516107 ] Hadoop QA commented on YARN-3134: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | patch | 0m 0s | The patch command could not apply the patch during dryrun. | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12728671/YARN-3134-042715.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / feb68cb | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/7516/console | This message was automatically generated. 
[jira] [Commented] (YARN-3134) [Storage implementation] Exploiting the option of using Phoenix to access HBase backend
[ https://issues.apache.org/jira/browse/YARN-3134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14514388#comment-14514388 ] Junping Du commented on YARN-3134: -- I would like to raise an important issue about reusing JDBC connections in PhoenixTimelineWriterImpl: it sounds like we only release/close these JDBC connections when the writer gets stopped. Given that the writer's lifecycle is the same as TimelineCollectorManager's (in the current design, which could change per the discussions above), which is almost the same as the RM's or NM's, this means we don't close/release any JDBC connections during the whole lifecycle of the NM/RM. That doesn't sound right, as JDBC connections are a pretty expensive and very limited resource (in the traditional DB case); Phoenix could be better since the client only serves the local node, but it could still be expensive with a large number of applications, especially for RMTimelineCollectorManager. In addition, caching the connections per thread also sounds problematic: these threads come from the collectors, and we cache them in a HashMap that could live forever, which could prevent the GC of these collectors even after they should be removed when their applications finish.
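One way to address the lifecycle concern above is to key cached connections by application rather than by thread, and close them explicitly when the application's collector is removed. The sketch below is a hedged illustration, not the patch's design: `CloseableConn` is a stand-in so it runs without Phoenix, and all names are hypothetical.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/**
 * Sketch: per-application connection cache whose entries are closed and
 * dropped when the collector for that application is removed, instead of
 * living for the whole lifetime of the RM/NM.
 */
public class ConnCache {
  static class CloseableConn implements AutoCloseable {
    boolean open = true;
    @Override public void close() { open = false; }
  }

  private final Map<String, CloseableConn> byApp = new ConcurrentHashMap<>();

  CloseableConn get(String appId) {
    // computeIfAbsent is atomic: one connection per app, even under races.
    return byApp.computeIfAbsent(appId, id -> new CloseableConn());
  }

  /** Called when the application's collector is removed. */
  void release(String appId) {
    CloseableConn conn = byApp.remove(appId);
    if (conn != null) conn.close();
  }

  int size() { return byApp.size(); }

  public static void main(String[] args) {
    ConnCache cache = new ConnCache();
    CloseableConn c = cache.get("app_1");
    cache.release("app_1"); // collector removed: connection closed and freed
    System.out.println("open=" + c.open + " cached=" + cache.size());
  }
}
```

Because the map shrinks as applications finish, the cache no longer pins collector threads or connections for the daemon's lifetime.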
[jira] [Commented] (YARN-3134) [Storage implementation] Exploiting the option of using Phoenix to access HBase backend
[ https://issues.apache.org/jira/browse/YARN-3134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14511897#comment-14511897 ] Li Lu commented on YARN-3134: - Hi [~sjlee0], thanks for the review! I'll address your comments soon, but one thing to discuss: the original point of keeping writers inside managers, instead of collectors, is to reuse the storage layer connection. Having static writers in collectors may be too restrictive, while having a writer for each collector may accidentally introduce too many heavyweight storage layer connections. I think there might be some miscommunication in the collector design: I thought collectors would only hold the collection logic, and would always need some sort of manager to wrap other logic such as the web server and writers.
[jira] [Commented] (YARN-3134) [Storage implementation] Exploiting the option of using Phoenix to access HBase backend
[ https://issues.apache.org/jira/browse/YARN-3134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14511976#comment-14511976 ] Sangjin Lee commented on YARN-3134: --- Actually my comment was not so much about collectors having their own writers. It might not be the best idea as you mentioned. The main thing I'm concerned about is the TimelineCollector's dependency on TimelineCollectorManager *just to get the writer*. The real need of the TimelineCollector is the writer, and true to the dependency injection principle, it is the writer that should be injected, either through the constructor or through the setter. Is that feasible?
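The injection Sangjin describes can be sketched as follows. The `Writer` and `Collector` types here are simplified stand-ins for TimelineWriter and TimelineCollector, not the actual YARN classes:

```java
/**
 * Sketch of injecting the writer into the collector directly, removing the
 * collector -> manager back-dependency. Simplified stand-in types.
 */
public class InjectionSketch {
  interface Writer { String write(String entity); }

  static class Collector {
    private final Writer writer; // injected; no manager reference needed
    Collector(Writer writer) { this.writer = writer; }
    String putEntity(String entity) { return writer.write(entity); }
  }

  public static void main(String[] args) {
    // A benchmark or test can now supply its own writer without a manager.
    Collector c = new Collector(e -> "wrote:" + e);
    System.out.println(c.putEntity("entity1"));
  }
}
```

With the writer as the only dependency, the performance benchmark tool mentioned earlier can drive a Collector directly, and the manager is just one of several possible suppliers of the writer.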
[jira] [Commented] (YARN-3134) [Storage implementation] Exploiting the option of using Phoenix to access HBase backend
[ https://issues.apache.org/jira/browse/YARN-3134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14512080#comment-14512080 ] Zhijie Shen commented on YARN-3134: --- I'd like to raise another important issue. According to the schema, config/info/metrics values are written as strings. However, the current data model assumes that they are objects. Therefore they can be Integer, Long, or even nested structures that can be passed through the jackson ObjectMapper. If we write them as strings, how can we read them back and convert them into the corresponding objects? Shall we write them as byte[] instead? On the other hand, I'm not sure we can narrow config/info/metrics values down to String only, as we previously allowed users to put Integer/Long/Float/Double and so on directly into the entity instance. For the metrics, I even think the value shouldn't be a String but usually a decimal object; otherwise, I'm not sure how we should aggregate string values. bq. I'm concerned about is the TimelineCollector's dependency on TimelineCollectorManager just to get the writer. To Sangjin's question, I suggest we let the collector manager set the collector's writer: the Collector doesn't need to have the manager inside, but has a setWriter for the manager to call. Some other comments about the patch details: 1. Should we make it configurable? {code} static final String CONN_STRING = "jdbc:phoenix:localhost:2181:/hbase"; {code} 2. putEntities is invoked by multiple threads; however, HashMap is not thread safe. {code} private HashMap<Thread, Connection> connectionMap = null; {code} 3. When stopping the writer, should we wait until the current outstanding writes are finished? {code} @Override protected void serviceStop() throws Exception { // Close all Phoenix connections for (Connection conn : connectionMap.values()) { try { conn.close(); {code} 4. Shall we set auto commit to false, because we commit the batch? Or is that not necessary for Phoenix? {code} Connection conn = getConnection(Thread.currentThread()); {code} 5. So in the backend, we don't put the flow name and version in different columns? {code} + flow VARCHAR NOT NULL, run UNSIGNED_LONG NOT NULL, {code} {code} ps.setString(idx++, context.getFlowName() + context.getFlowVersion()); {code} 6. The conn seems not to be closed and removed after finishing the writes. It may hold a lot of unnecessary resources after running for a long time.
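On the byte[] question above: one option is to serialize arbitrary info/config values to bytes and back, so Integer/Long/nested values round-trip with their types intact instead of collapsing to strings. The sketch below uses plain Java serialization purely for illustration (the timeline service discussion mentions a Jackson ObjectMapper; the `ValueCodec` helper and its method names are hypothetical):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;

/** Sketch: round-trip arbitrary Serializable values through byte[]. */
public class ValueCodec {
  static byte[] toBytes(Object value) throws IOException {
    ByteArrayOutputStream bos = new ByteArrayOutputStream();
    try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
      oos.writeObject(value);
    }
    return bos.toByteArray();
  }

  static Object fromBytes(byte[] bytes)
      throws IOException, ClassNotFoundException {
    try (ObjectInputStream ois =
        new ObjectInputStream(new ByteArrayInputStream(bytes))) {
      return ois.readObject();
    }
  }

  public static void main(String[] args) throws Exception {
    // A Long comes back as a Long, not as its String form,
    // so numeric aggregation on metric values remains possible.
    Object back = fromBytes(toBytes(42L));
    System.out.println(back.getClass().getSimpleName() + " " + back);
  }
}
```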
[jira] [Commented] (YARN-3134) [Storage implementation] Exploiting the option of using Phoenix to access HBase backend
[ https://issues.apache.org/jira/browse/YARN-3134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14511444#comment-14511444 ] Sangjin Lee commented on YARN-3134: --- Sorry [~gtCarrera9] for the late comments. I took a quick look at the patch, and I have yet to delve deep into the SQL- and schema-related parts of the code. But I have some quick comments on other aspects: (TimelineCollector.java) - I'm curious, is there a strong reason to use the TimelineCollectorManager to obtain the writer? This would introduce a bi-directional (instance) dependency between the TimelineCollector and the TimelineCollectorManager, and it could be problematic. For example, the current timeline service performance benchmark tool uses TimelineCollector directly without creating a manager. Can we avoid this dependency? (PhoenixTimelineWriterImpl.java) - l.87: I wish you could use ThreadLocal directly, but I do get that you'd need to get all the connections at the end when you stop it - Please replace all StringBuffers with StringBuilders. StringBuffers should not be used as a rule, as they do unnecessary synchronization. - l.175: getConnection() is not thread safe with an unsynchronized HashMap. Even though different threads would operate on different keys, it doesn't mean it will be thread safe with HashMap. You need to use ConcurrentHashMap for this, or another thread-safe concurrent solution. - l.198: I'm not quite sure why initializeData() should be called every time the service comes up. Shouldn't we do this only once at the very beginning, when the tables do not exist? Also, the method name initializeData() is a bit misleading. I think initializeTables() is the right name for this? 
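The ThreadLocal wish and the shutdown requirement in the comment above can be combined: a ThreadLocal hands each thread its own connection, while a concurrent registry tracks every connection created so a serviceStop() analogue can still close them all. A hedged sketch with a stand-in `Conn` type (so it runs without a database); all names are hypothetical:

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/**
 * Sketch: ThreadLocal per-thread connections plus a concurrent registry so
 * every connection can still be closed at shutdown. Conn is a stand-in.
 */
public class ThreadLocalConns {
  static class Conn implements AutoCloseable {
    boolean open = true;
    @Override public void close() { open = false; }
  }

  // Thread-safe registry of every connection ever handed out.
  private final Set<Conn> all = ConcurrentHashMap.newKeySet();

  private final ThreadLocal<Conn> perThread = ThreadLocal.withInitial(() -> {
    Conn c = new Conn();
    all.add(c);
    return c;
  });

  /** Each calling thread gets (and keeps) its own connection. */
  Conn get() { return perThread.get(); }

  /** serviceStop() analogue: close everything we handed out. */
  void closeAll() {
    for (Conn c : all) c.close();
    all.clear();
  }

  int created() { return all.size(); }
}
```

This sidesteps the unsynchronized-HashMap race entirely, because the only shared structure is the concurrent set.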
[jira] [Commented] (YARN-3134) [Storage implementation] Exploiting the option of using Phoenix to access HBase backend
[ https://issues.apache.org/jira/browse/YARN-3134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14507914#comment-14507914 ] Vrushali C commented on YARN-3134: -- Hi [~gtCarrera9] Thanks for the patch, I had some questions: - I don't see the isRelatedTo and relatesTo entities being written in this patch. - For the metrics time series, I see that the metric values are being written as a ";"-separated list of values in a string; is that right? But I could not figure out where the timestamps associated with each metric value are stored. Storing metric values as strings would, I think, make numerical queries harder, e.g. how many entities had GC MILLIS that were more than 25% of the CPU MILLIS.
[ https://issues.apache.org/jira/browse/YARN-3134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14507986#comment-14507986 ] Li Lu commented on YARN-3134: - Hi [~vrushalic], in the current version I have not implemented isRelatedTo and relatesTo. I can certainly add this section if it's required for the performance benchmark. My current plan is to use the metrics precision table for aggregations, and to use just the aggregated data for Phoenix SQL queries. I'm open-minded on both points, though.
[ https://issues.apache.org/jira/browse/YARN-3134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14508005#comment-14508005 ] Vrushali C commented on YARN-3134: -- Hi [~gtCarrera9] Thanks! bq. in the current version I've not implemented isRelatedTo and relatesTo. I can certainly add this section if it's required for the performance benchmark. Yes, I think for the PoC we should write everything that the TimelineEntity class has to the backend store. bq. My current plan is to use the metrics precision table for aggregations, and just use the aggregated data for Phoenix SQL queries. Okay, I see. (For my understanding:) how would the query for, say, map-task-level metrics work? There won't be any aggregation at that level, no? I am also wondering how this metrics time-series information would be queried. Could you please explain how the timestamps are stored?
[ https://issues.apache.org/jira/browse/YARN-3134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14508025#comment-14508025 ] Li Lu commented on YARN-3134: - Hi [~vrushalic], sure, will add isRelatedTo and relatesTo since YARN-3431 is close to being finished. For the metrics, my thought is that we may need some time-based aggregations, like taking the average (or max) of a few time-series data points and storing them in an aggregated table. The precision table for now serves as the raw data table. The user can query the aggregation table(s) for data points per hour, per day, and so on. Timestamp information is split into two parts: the time epoch information, marked by the startTime and endTime of the metric object, and the actual time for each point in the time series. Epoch start and end times are used as primary keys in the Phoenix storage for better indexing, and the detailed time for each point is stored with the time series. We can certainly discuss this design, though...
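The per-hour roll-up being proposed can be sketched in plain Java. This is illustrative only: the class and method names, and the hour granularity, are assumptions, not taken from the patch; the real aggregation would read from the precision table and write to an aggregation table via Phoenix.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.TimeUnit;

// Illustrative only: roll raw time-series points (timestampMillis -> value)
// up to per-hour averages, the kind of aggregate the precision table would
// feed into an aggregation table.
class HourlyAggregator {
  static Map<Long, Double> hourlyAverages(Map<Long, Double> rawSeries) {
    // hourStart -> {sum, count}
    Map<Long, double[]> buckets = new TreeMap<>();
    for (Map.Entry<Long, Double> point : rawSeries.entrySet()) {
      long hourStart =
          point.getKey() - point.getKey() % TimeUnit.HOURS.toMillis(1);
      double[] acc = buckets.computeIfAbsent(hourStart, h -> new double[2]);
      acc[0] += point.getValue();
      acc[1] += 1;
    }
    Map<Long, Double> averages = new TreeMap<>();
    for (Map.Entry<Long, double[]> bucket : buckets.entrySet()) {
      averages.put(bucket.getKey(),
          bucket.getValue()[0] / bucket.getValue()[1]);
    }
    return averages;
  }
}
```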
[ https://issues.apache.org/jira/browse/YARN-3134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14505903#comment-14505903 ] Hadoop QA commented on YARN-3134: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | patch | 0m 0s | The patch command could not apply the patch during dryrun. | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12727009/YARN-3134-042115.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / 2c14690 | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/7435//console | This message was automatically generated.
[ https://issues.apache.org/jira/browse/YARN-3134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14503800#comment-14503800 ] Zhijie Shen commented on YARN-3134: --- [~vrushalic], so entity relationship will also be included in the POC implementation, right?
[ https://issues.apache.org/jira/browse/YARN-3134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14503427#comment-14503427 ] Vrushali C commented on YARN-3134: -- Hi [~zjshen] Actually, [~gtCarrera9], [~sjlee0] and I were just discussing this last week. I was thinking of storing relatesTo and isRelatedTo as a separator-joined (comma or some other separator) string that contains the list of entities as a single value, with the column key being isRelatedTo or relatesTo. That way, a reader that needs these values can look at the isRelatedTo or relatesTo columns.
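The separator-joined encoding being discussed might look like the following sketch. The class name, separator choice, and helper names are assumptions for illustration; a real implementation would also need to escape entity ids that can contain the separator character.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical sketch of encoding an isRelatedTo/relatesTo entity set as a
// single separator-joined cell value, and decoding it back on the read path.
class RelationEncoding {
  private static final String SEPARATOR = ",";

  static String encode(Set<String> entityIds) {
    return String.join(SEPARATOR, entityIds);
  }

  static Set<String> decode(String cell) {
    if (cell == null || cell.isEmpty()) {
      return new LinkedHashSet<>();
    }
    // LinkedHashSet preserves the stored order while deduplicating ids.
    return new LinkedHashSet<>(Arrays.asList(cell.split(SEPARATOR)));
  }
}
```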
[ https://issues.apache.org/jira/browse/YARN-3134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14503414#comment-14503414 ] Zhijie Shen commented on YARN-3134: --- I noticed that neither the Phoenix writer nor the HBase writer (YARN-3411) implements writing the entity relationship. However, as we may need more thought and discussion to sort out writing entity relationships, let's keep the current implementation focused on individual entity details: info, configs, events and metrics, and have a separate JIRA for storing entity relationships later. Does that sound good? /cc [~vrushalic]
[ https://issues.apache.org/jira/browse/YARN-3134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14499326#comment-14499326 ] Li Lu commented on YARN-3134: - Hi [~djp] and [~zjshen], thanks a lot for the review! I'll fix them pretty soon and upload a new patch. For now, I'm focusing on correctness, readability, and exception handling. Does that plan sound good to you? Thanks!
[ https://issues.apache.org/jira/browse/YARN-3134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14499264#comment-14499264 ] Zhijie Shen commented on YARN-3134: --- Some thoughts about the backend POC, not limited to the Phoenix writer but applying to the HBase writer too. 1. At the current stage, I suggest we focus on logic correctness and performance tuning. We may have multiple iterations between improving and benchmarking. 2. At the beginning we may not implement storing everything in a timeline entity (such as relationships), but we should at least make sure that what the Phoenix writer and the HBase writer have implemented is identical in terms of the data stored. 3. It would be good to have rich test suites like TimelineStoreTestUtils to ensure the robustness of the writers. Moreover, since that is black-box testing, we can use them to check whether the Phoenix writer and HBase writer behave the same. /cc [~vrushalic] For the Phoenix implementation only: I used the Phoenix writer in a real deployment, and I could see the implementation is not thread safe. A ConcurrentModificationException is thrown upon committing the statements.
[ https://issues.apache.org/jira/browse/YARN-3134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14499398#comment-14499398 ] Li Lu commented on YARN-3134: - Hi [~zjshen], could you please provide some more information to reproduce the failures? The exception stack trace would also be helpful. I'm trying to set up a deployment, but would like to make sure we're seeing consistent problems. Thanks!
[ https://issues.apache.org/jira/browse/YARN-3134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14499764#comment-14499764 ] Junping Du commented on YARN-3134: -- bq. 1. At the current stage, I suggest we focus on logic correctness and performance tuning. We may have multiple iterations between improving and doing benchmark +1. We should get some performance data, which will help us better understand the direction and priorities. bq. For now, I'm focusing on correctness, readability, and exception handling. Does that plan sound good to you? Sounds like a good plan. Thanks [~gtCarrera9].
[ https://issues.apache.org/jira/browse/YARN-3134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14498924#comment-14498924 ] Junping Du commented on YARN-3134: -- Thanks [~gtCarrera9] for delivering a patch here! Just starting to look at the patch. Some initial comments so far: {code} String sql = "CREATE TABLE IF NOT EXISTS " + ENTITY_TABLE_NAME +
    "(cluster VARCHAR NOT NULL, user VARCHAR NOT NULL, " +
    "flow VARCHAR NOT NULL, run UNSIGNED_LONG NOT NULL, " +
    "appid VARCHAR NOT NULL, type VARCHAR NOT NULL, " +
    "entityid VARCHAR NOT NULL, " +
    "creationtime UNSIGNED_LONG, modifiedtime UNSIGNED_LONG, ...
stmt.executeUpdate(sql);
stmt.close();
conn.commit(); {code} Putting raw SQL statements together in this way sounds like a bit of a headache to me, as it will be difficult to debug and maintain in the future. Given that we could have more tables in the pipeline, we may want to refactor this in some way to be more maintainable. BTW, I don't think HBase supports atomic operations across multiple tables. Here we create 3 tables but issue only one commit, which means that if creating the 2nd table fails, the 1st table will still be created and committed successfully and won't be rolled back. Such partial success after commit doesn't sound like good practice to me. An additional problem is that we don't close the connection here, but we need to. {code} private class TimelineEntityCtxt { {code} TimelineEntityCtxt should be TimelineEntityContext; better not to abbreviate words in names (except very obvious ones, like conf for configuration). It also looks exactly the same as TimelineCollectorContext.java. Can we reuse that class instead of creating a new, duplicated one? {code} private <K, V> int setStringsForCf( {code} What does Cf mean? As I mentioned above, don't drop characters from a word in a method name; it breaks the code's readability. {code} private int setStringsForPk(PreparedStatement ps, String clusterId, String userId, {code} setStringsForPk should be setStringsForPrimaryKeys. {code} ResultSet executeQuery(String sql) {
  ResultSet rs = null;
  try {
    Statement stmt = conn.createStatement();
    rs = stmt.executeQuery(sql);
  } catch (SQLException se) {
    LOG.error("SQL exception! " + se.getLocalizedMessage());
  }
  return rs;
} {code} Does getLocalizedMessage() contain enough info (at least the SQL statement executed)? If not, I would prefer that we add the raw SQL statement to the error message when the exception is thrown. {code} // Execute and close
psConfigInfo.execute();
psConfigInfo.close(); {code} In many places like this we forget to put closable resources in a finally block. We should close them even when an exception is thrown. More comments may come later.
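The "close in a finally block" point above can be demonstrated with Java's try-with-resources, which guarantees close() runs even when the body throws. The FakeStatement type below is a toy stand-in for a JDBC PreparedStatement so the example is self-contained; real code would declare the actual Statement/Connection in the try header.

```java
// Toy AutoCloseable standing in for a JDBC PreparedStatement, used to show
// that try-with-resources closes the resource even when execute() throws.
class ClosableDemo {
  static class FakeStatement implements AutoCloseable {
    boolean closed = false;

    void execute(boolean fail) {
      if (fail) {
        throw new RuntimeException("simulated SQL failure");
      }
    }

    @Override
    public void close() {
      closed = true;
    }
  }

  // Runs one statement; returns whether it was closed despite the failure.
  static boolean closedAfterFailure() {
    FakeStatement stmt = new FakeStatement();
    try (FakeStatement s = stmt) {
      s.execute(true); // throws
    } catch (RuntimeException e) {
      // By the time we get here, s.close() has already run.
    }
    return stmt.closed;
  }
}
```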
[ https://issues.apache.org/jira/browse/YARN-3134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14495180#comment-14495180 ] Li Lu commented on YARN-3134: - Hi [~vrushalic], thanks for the note! Yes, events and metrics are missing in the previous POC patch (actually, I should have called them work-in-progress patches). I'm currently working on them and am mostly done. About appending metrics and events: it would be very nice if we could distinguish the creation calls from the append and update calls. For the last two we could have a fast path that does not touch many fields and only works on the delta part of the entity. We can pretty much simulate that with our current writer API by setting unchanged TimelineEntity fields to null or empty (like events/info/metrics). However, in that case we need to provide some wrapper methods in our client to allow users to easily generate the patch, hopefully in user-friendly ways (such as appendMetric(myEntity, myContext, newMetric) or updateInfo(myEntity, myContext, infoKey, infoValue), just some quick examples...).
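The "delta entity" idea above can be sketched as a client-side helper that builds an entity carrying only the changed metric, leaving every other field empty so the writer can take a fast path. The MiniEntity type and the appendMetric signature are illustrative stand-ins, not the real TimelineEntity API.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the delta-update wrapper idea: build a patch object
// that populates only the changed field, leaving everything else empty.
class DeltaClient {
  static class MiniEntity {
    String id;
    Map<String, Long> metrics = new HashMap<>();
    Map<String, String> info = new HashMap<>();
  }

  // Wrapper a client library might expose so users don't hand-build deltas.
  static MiniEntity appendMetric(String entityId, String metric, long value) {
    MiniEntity delta = new MiniEntity();
    delta.id = entityId;
    delta.metrics.put(metric, value); // only the changed field is populated
    return delta;                     // info/events stay empty: fast path
  }
}
```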
[ https://issues.apache.org/jira/browse/YARN-3134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14495519#comment-14495519 ] Hadoop QA commented on YARN-3134: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12725447/YARN-3134-041415_poc.patch against trunk revision fddd552. {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7340//console This message is automatically generated.
[jira] [Commented] (YARN-3134) [Storage implementation] Exploiting the option of using Phoenix to access HBase backend
[ https://issues.apache.org/jira/browse/YARN-3134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14495133#comment-14495133 ] Vrushali C commented on YARN-3134:
Thanks [~gtCarrera9]!
bq. About posting metrics, I was thinking if it's possible to allow users just send the delta to storage, and we can use some information in the timeline entity to infer if the entity itself is already in the entity table? If that's possible then we can have some shortcut (not touching entity table) for faster metrics updating, which may generate the majority of our storage traffic.
Yes, we do need a way to update a single metric for an entity (regardless of other implementation aspects such as which table is used, or whether it's native HBase or Phoenix). We had one as part of the initial proposal for the TimelineWriter interface in YARN-3031, but per the review suggestions there we decided to add it later. I think we do need a writer interface for writing/updating a single metric.
Also, [~gtCarrera9], I think you haven't yet included the entity events in this patch? I recollect a discussion that we should add only some lifecycle events rather than all of them, but in any case the TimelineWriter implementation does need to write events to the backend. I think we may want an API that writes a single event to the backend as well.
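The single-metric update path discussed above could look roughly like the following sketch. Everything here (the class name, method signatures, and the in-memory store standing in for the backend) is hypothetical and only illustrates the shape of such an API; it is not the actual TimelineWriter interface from YARN-3031.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of a single-metric update path. Class and method
// names are invented; this is not the real TimelineWriter interface.
class SingleMetricWriterSketch {
  // entityId -> (metricName -> (timestamp -> value)), standing in for the
  // backend table so the sketch is self-contained.
  private final Map<String, Map<String, TreeMap<Long, Long>>> store =
      new ConcurrentHashMap<>();

  // Update one metric of one entity without rewriting the whole entity.
  void writeMetric(String entityId, String metric, long ts, long value) {
    store.computeIfAbsent(entityId, k -> new ConcurrentHashMap<>())
         .computeIfAbsent(metric, k -> new TreeMap<>())
         .put(ts, value);
  }

  // Read back the most recent value of a metric, or null if absent.
  Long latest(String entityId, String metric) {
    Map<String, TreeMap<Long, Long>> metrics = store.get(entityId);
    if (metrics == null || !metrics.containsKey(metric)) {
      return null;
    }
    return metrics.get(metric).lastEntry().getValue();
  }

  public static void main(String[] args) {
    SingleMetricWriterSketch w = new SingleMetricWriterSketch();
    w.writeMetric("app_1", "MAP_SLOT_MILLIS", 100L, 5L);
    w.writeMetric("app_1", "MAP_SLOT_MILLIS", 200L, 9L);
    System.out.println(w.latest("app_1", "MAP_SLOT_MILLIS"));
  }
}
```

The point of the shape is that a metric update names only (entity, metric, timestamp, value), so the writer never has to touch the rest of the entity row.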
[jira] [Commented] (YARN-3134) [Storage implementation] Exploiting the option of using Phoenix to access HBase backend
[ https://issues.apache.org/jira/browse/YARN-3134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14493156#comment-14493156 ] Zhijie Shen commented on YARN-3134:
Li, thanks for uploading the POC patch. Here are some of my thoughts:
1. One entity may need multiple SQL statements to complete one entity write. Do we need to use a transaction? If not (or if Phoenix doesn't support it), how do we handle the case where the first SQL statement completes but the second doesn't? There will be only partial data. On the other hand, if we do use a transaction, will it significantly degrade write throughput?
2. In YARN-3448, Jonathan suggested a performance improvement for the LevelDB implementation: sequential writes may be quicker than random writes, so we can reorder the records to persist to make them as sequential as possible. In this case, is it better to write the entities one by one (including config and info), assuming the records are in sequence by PK?
3. To answer the offline question of writing multiple metrics, my thought is: a) sync write: no matter whether the user wraps a single metric or multiple metrics, we synchronously write it to the backend. b) async write: the server can respond to the client immediately, but it buffers the entity in a queue; later on, we merge these metrics together (not limited to metrics, but even whole entities) and asynchronously write them to the backend.
4. Do we have a simple deployment, meaning that by default the backend will start an HBase on the local FS and have the Phoenix lib installed? This relates to the question of what the default backend of the timeline service is. If it were a single-node HBase on the local FS, we should make sure it is automatically deployed with default configs.
Thoughts?
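Option 3(b) above, buffering entities and merging their metric updates before an asynchronous flush, could be sketched roughly as follows. The class and method names are invented for illustration and are not the collector's actual code path.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch of option 3(b): buffer updates per entity and merge
// metrics before an asynchronous flush. Names are hypothetical.
class AsyncMergeBuffer {
  // entityId -> (metricName -> latest buffered value)
  private final Map<String, Map<String, Long>> buffer = new HashMap<>();

  // Accept an update immediately; a later update to the same metric
  // overwrites the earlier one, so each flush writes a metric at most once.
  synchronized void enqueue(String entityId, String metric, long value) {
    buffer.computeIfAbsent(entityId, k -> new HashMap<>()).put(metric, value);
  }

  // Drain the merged updates. A real implementation would issue the
  // backend writes here instead of returning the map.
  synchronized Map<String, Map<String, Long>> flush() {
    Map<String, Map<String, Long>> merged = new HashMap<>(buffer);
    buffer.clear();
    return merged;
  }
}
```

The client gets an immediate response from enqueue(), while a background thread calling flush() periodically performs the (fewer, larger) backend writes; the trade-off is that metric values between flushes are lost on a crash.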
[jira] [Commented] (YARN-3134) [Storage implementation] Exploiting the option of using Phoenix to access HBase backend
[ https://issues.apache.org/jira/browse/YARN-3134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14493215#comment-14493215 ] Zhijie Shen commented on YARN-3134:
Another issue: it seems we don't need to build against Phoenix 4.3, as the timeline service writer implementation uses the generic JDBC interface. We may not even need to include it in the Hadoop distribution, though that contradicts the aforementioned simple deployment. The benefit, however, is that users can choose to deploy with Phoenix 2.x - HBase 0.94.x, Phoenix 3.x - HBase 0.94.x, or Phoenix 4.x - HBase 0.98.1+.
[jira] [Commented] (YARN-3134) [Storage implementation] Exploiting the option of using Phoenix to access HBase backend
[ https://issues.apache.org/jira/browse/YARN-3134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14493235#comment-14493235 ] Li Lu commented on YARN-3134:
Hi [~vrushalic] and [~zjshen]! Thanks for the comments!
About [~vrushalic]'s questions: in this POC patch I'm not adding metrics info yet, but that's my next step. I'm storing configs in the entity table under a separate column family, CONFIG_COLUMN_FAMILY. Each config item C(k, v) for a PK is stored at column CONFIG_COLUMN_FAMILY.k, row PK, with value v.
bq. One entity may need multiple sql sentences to complete one entity write. Do we need to use transaction?
That's a very good question that I'm not sure about the answer to right now. Currently we write one entity (with a PK) with two writes, one with only the static columns (C_s) and the other with only the dynamic columns (C_d). HBase guarantees row-level atomicity for each write, so I assume the result after the two calls will be (PK, C_s), (PK, C_d), or (PK, C_s, C_d). The last one is the best case, of course.
bq. In this case, is it better to write the entity one-by-one (including config, info), assuming the records are in sequence by PK?
I think you're right. I will look into this improvement.
About deployment: for end users we can either ship a predefined version of Phoenix+HBase for simpler deployment, or we can allow users to specify the classpath for the Phoenix JDBC driver and choose a version of Phoenix+HBase in a customized way. The latter unavoidably introduces some deployment difficulties, but offers more freedom. For now, I think our short-term focus is to wrap miniclusters to allow UTs to pass in our branch (to be prepared for a branch merge).
About posting metrics, I was thinking whether it's possible to allow users to just send the delta to storage, and whether we can use some information in the timeline entity to infer if the entity itself is already in the entity table. If that's possible, then we can have a shortcut (not touching the entity table) for faster metric updates, which may generate the majority of our storage traffic.
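The dynamic-column write described above can be illustrated by composing the UPSERT statement. The table name, the "entity_id" PK column, and the "c." prefix are all invented for this sketch (the real schema lives in PhoenixTimelineWriterImpl); only the general shape of Phoenix's dynamic-column syntax is being shown.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Sketch: composing a Phoenix UPSERT that writes config items as dynamic
// columns. Table/column names are hypothetical, not the real schema.
class DynamicColumnSql {
  static String upsertConfigs(String table, Map<String, String> configs) {
    StringJoiner cols = new StringJoiner(", ");
    StringJoiner vals = new StringJoiner(", ");
    cols.add("entity_id"); // static PK column from the CREATE TABLE
    vals.add("?");
    for (String key : configs.keySet()) {
      // Phoenix dynamic columns are declared inline with a type.
      cols.add("\"c." + key + "\" VARCHAR");
      vals.add("?");
    }
    return "UPSERT INTO " + table + " (" + cols + ") VALUES (" + vals + ")";
  }

  public static void main(String[] args) {
    Map<String, String> cfg = new LinkedHashMap<>();
    cfg.put("pig.job.submitted.timestamp", "1340218739371");
    System.out.println(upsertConfigs("ENTITY", cfg));
  }
}
```

Each config key becomes its own dynamic column, which is why a second statement (with different dynamic columns) is needed for the other column family, leading to the two-write atomicity question discussed above.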
[jira] [Commented] (YARN-3134) [Storage implementation] Exploiting the option of using Phoenix to access HBase backend
[ https://issues.apache.org/jira/browse/YARN-3134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14491173#comment-14491173 ] Vrushali C commented on YARN-3134:
Hi [~gtCarrera9],
Thanks for the patch! I took a quick look and had two questions:
1) Are you also writing the entity metrics to the backend? I see the primary key, creation/modification times, and configs being written, but could not see entity.getMetrics() anywhere.
2) How are the config column names being written? Are all the columns joined by a separator? If I could see an example that would be great. For example, say a config value of pig.job.submitted.timestamp has a string value of 1340218739371: how does it show up in the column family for configs?
[jira] [Commented] (YARN-3134) [Storage implementation] Exploiting the option of using Phoenix to access HBase backend
[ https://issues.apache.org/jira/browse/YARN-3134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14490804#comment-14490804 ] Hadoop QA commented on YARN-3134:
{color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12724738/YARN-3134-041015_poc.patch against trunk revision 2c17da4.
{color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7310//console
This message is automatically generated.
[jira] [Commented] (YARN-3134) [Storage implementation] Exploiting the option of using Phoenix to access HBase backend
[ https://issues.apache.org/jira/browse/YARN-3134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14490791#comment-14490791 ] Hadoop QA commented on YARN-3134:
{color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12724737/YARN-3134-041015_poc.patch against trunk revision 2c17da4.
{color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7309//console
This message is automatically generated.
[jira] [Commented] (YARN-3134) [Storage implementation] Exploiting the option of using Phoenix to access HBase backend
[ https://issues.apache.org/jira/browse/YARN-3134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14349365#comment-14349365 ] Vrushali C commented on YARN-3134:
Hi [~swagle]
bq. Do the responses to these API calls return any timeseries data?: _GetFlowByAppId_ and _GetAppDetails_
No, not for these two. These are specific to that Hadoop application id, which is unique.
bq. The set of access patterns do not cover query directly by a metricName. Is there a use case for this? (Note: General use case for driving graphs)
In hRaven, we usually fetch everything for a given flow and time range and allow filtering/searching in the UI when querying for a particular metric.
bq. Do you use the hbase native timestamp for querying? This is an obvious optimization for timeseries data.
No, we don't use that one at all. We store the submit time of a flow as the run id in the row key (as well as in columns). The row key for job history is
{code} cluster ! user ! flow name ! run id ! app id {code}
where run id is the submit time of the flow. It is stored as an inverted long, which helps maintain the sorting such that the most recent flow runs are stored first for that flow. When querying for a time series or a time range, having this inverted long in the row key helps to set the start and stop rows for the scan so that it's time bound. E.g.: https://github.com/twitter/hraven/blob/master/hraven-core/src/main/java/com/twitter/hraven/datasource/JobHistoryService.java#L277
bq. however how do you handle out of band data
I'm sorry, I didn't get what out-of-band data is?
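The inverted-long trick described above can be sketched in a few lines; the class name is invented, but the arithmetic matches hRaven's description (Long.MAX_VALUE minus the submit time).

```java
// Sketch of hRaven's inverted-timestamp trick: storing
// Long.MAX_VALUE - submitTime in the row key makes the most recent runs
// sort first under HBase's ascending byte order. Class name is invented.
class InvertedTimestamp {
  static long invert(long ts) {
    return Long.MAX_VALUE - ts;
  }

  public static void main(String[] args) {
    long older = 1340218739371L;
    long newer = 1340218999999L;
    // The newer submit time maps to the smaller inverted value, so an
    // ascending scan returns the most recent run first.
    System.out.println(invert(newer) < invert(older)); // prints "true"
  }
}
```

Because the transform is its own inverse, the original submit time is recovered by applying invert() to the stored value again when reading a row back.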
[jira] [Commented] (YARN-3134) [Storage implementation] Exploiting the option of using Phoenix to access HBase backend
[ https://issues.apache.org/jira/browse/YARN-3134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14349302#comment-14349302 ] Siddharth Wagle commented on YARN-3134:
Thanks [~vrushalic]
*Questions*:
- Do the responses to these API calls return any timeseries data?: _GetFlowByAppId_ and _GetAppDetails_
- The set of access patterns do not cover query directly by a metricName. Is there a use case for this? (Note: General use case for driving graphs)
- Do you use the hbase native timestamp for querying? This is an obvious optimization for timeseries data, however how do you handle out of band data?
[jira] [Commented] (YARN-3134) [Storage implementation] Exploiting the option of using Phoenix to access HBase backend
[ https://issues.apache.org/jira/browse/YARN-3134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14347858#comment-14347858 ] Vrushali C commented on YARN-3134:
There is a draft of some flow (and user and queue) based queries to be supported, put up on JIRA YARN-3050, that could help us with the schema design: https://issues.apache.org/jira/secure/attachment/12695071/Flow%20based%20queries.docx
Sharing the schema of some of the HBase tables in hRaven (detailed schema at https://github.com/twitter/hraven/blob/master/bin/create_schema.rb):
{code}
create 'job_history', {NAME => 'i', COMPRESSION => 'LZO'}
create 'job_history_task', {NAME => 'i', COMPRESSION => 'LZO'}

# job_history (indexed) by jobId table contains 1 column family:
# i: job-level information specifically the rowkey into the
create 'job_history-by_jobId', {NAME => 'i', COMPRESSION => 'LZO'}

# job_history_app_version - stores all version numbers seen for a single app ID
# i: info -- version information
create 'job_history_app_version', {NAME => 'i', COMPRESSION => 'LZO'}

# job_history_agg_daily - stores daily aggregated job info
# the s column family has a TTL of 30 days, it's used as a scratch col family
# it stores the run ids that are seen for that day
# we assume that a flow will not run for more than 30 days, hence it's fine to expire that data
create 'job_history_agg_daily', {NAME => 'i', COMPRESSION => 'LZO', BLOOMFILTER => 'ROWCOL'},
  {NAME => 's', VERSIONS => 1, COMPRESSION => 'LZO', BLOCKCACHE => false, TTL => '2592000'}

# job_history_agg_weekly - stores weekly aggregated job info
# the s column family has a TTL of 30 days
# it stores the run ids that are seen for that week
# we assume that a flow will not run for more than 30 days, hence it's fine to expire that data
create 'job_history_agg_weekly', {NAME => 'i', COMPRESSION => 'LZO', BLOOMFILTER => 'ROWCOL'},
  {NAME => 's', VERSIONS => 1, COMPRESSION => 'LZO', BLOCKCACHE => false, TTL => '2592000'}
{code}
job_history is the main
table. Its row key is cluster!user!application!timestamp!jobID, where cluster, user, and application are stored as strings, and timestamp and jobID are stored as longs.
cluster - unique cluster name (e.g. "cluster1@dc1")
user - user running the application ("edgar")
application - application ID (aka flow name) derived from the job configuration: uses the "batch.desc" property if set, otherwise parses a consistent ID from "mapreduce.job.name"
timestamp - inverted (Long.MAX_VALUE - value) value of the submission time. Storing the value as an inverted timestamp ensures the latest jobs are stored first for that cluster!user!app, which enables faster retrieval of the more recent jobs for the flow.
jobID - stored as the Job Tracker/Resource Manager start time (long), concatenated with the job sequence number: job_201306271100_0001 - [1372352073732L][1L]
How the columns are named in hRaven:
- Each key in the job history file becomes the column name. For example, finishedMaps would be stored as
{code} column=i:finished_maps, timestamp=1425515902000, value=\x00\x00\x00\x00\x00\x00\x00\x05 {code}
In the output above, the timestamp is the HBase cell timestamp.
- We store the configuration information with a column name prefix of c!
{code} column=i:c!yarn.sharedcache.manager.client.thread-count, timestamp=1425515902000, value=50 {code}
- Each counter is stored with a prefix of g!, gr!, or gm!
{code}
For reducer counters, there is a prefix of gr!
column=i:gr!org.apache.hadoop.mapreduce.TaskCounter!SPILLED_RECORDS, timestamp=1425515902000, value=\x00\x00\x00\x00\x00\x00\x00\x02
For mapper counters, there is a prefix of gm!
column=i:gm!org.apache.hadoop.mapreduce.lib.input.FileInputFormatCounter!BYTES_READ, timestamp=1425515902000, value=\x00\x00\x00\x00\x00\x00\x00\x02
{code}
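The job_history row key layout described above can be sketched as a small helper. The class name and the all-string encoding are simplifications for illustration: hRaven serializes the timestamp and job ID as raw longs in the byte-encoded key, not as text.

```java
// Sketch of composing the hRaven job_history row key described above
// (cluster!user!application!invertedTimestamp!jobId). The helper name and
// the all-string encoding are simplifications: hRaven serializes the
// timestamp and job ID as raw longs, not text.
class JobHistoryRowKey {
  static final String SEP = "!";

  static String rowKey(String cluster, String user, String app,
                       long submitTime, String jobId) {
    long inverted = Long.MAX_VALUE - submitTime; // newest runs sort first
    return String.join(SEP, cluster, user, app,
        Long.toString(inverted), jobId);
  }

  public static void main(String[] args) {
    System.out.println(rowKey("cluster1@dc1", "edgar", "myflow",
        1372352073732L, "job_201306271100_0001"));
  }
}
```

Note that the raw-long encoding in real hRaven is what makes the inverted-timestamp ordering hold for keys of any magnitude; decimal strings of different lengths would not sort the same way.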