[jira] [Commented] (PHOENIX-4701) Improve schema of SYSTEM.LOG table
[ https://issues.apache.org/jira/browse/PHOENIX-4701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16461790#comment-16461790 ]

Ankit Singhal commented on PHOENIX-4701:

[~jamestaylor], I have started looking into this today. Do you want it for 4.14 as well?

> Improve schema of SYSTEM.LOG table
> ----------------------------------
>
> Key: PHOENIX-4701
> URL: https://issues.apache.org/jira/browse/PHOENIX-4701
> Project: Phoenix
> Issue Type: Bug
> Reporter: James Taylor
> Assignee: James Taylor
> Priority: Major
> Fix For: 4.14.0, 5.0.0
>
> Attachments: PHOENIX-4701_wip1.patch, PHOENIX-4701_wip2.patch
>
>
> If possible, the SYSTEM.LOG table would benefit greatly (3-5x perf gain)
> from being declared as immutable with a column encoding of 1 byte and a
> storage format of SINGLE_CELL_ARRAY_WITH_OFFSETS.

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (PHOENIX-4701) Improve schema of SYSTEM.LOG table
[ https://issues.apache.org/jira/browse/PHOENIX-4701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16454449#comment-16454449 ]

Ankit Singhal commented on PHOENIX-4701:

bq. I think the best approach would be to persist our client metrics in the SYSTEM.LOG instead of inventing a new mechanism. The metrics capture all the same information as your QueryLogInfo (and much more), roll all the information up to a single set of metrics for each Phoenix statement (aggregating/merging parallel scans, etc), and can emit a single log line (which could be written in a single upsert statement). At SFDC, we emit this information in a layer above (and use Splunk to produce nifty dashboards for monitoring), but this could easily be emitted directly in Phoenix and go through your asynchronous write path (and then use Phoenix queries to produce the same kind of dashboards). The only piece would be to add the concept of a log level to each metric to enable statically controlling which metrics are output.

Thanks [~jamestaylor], I like the approach too.

bq. Would you have any cycles to take my patch further?

I'll try to make the changes accordingly and take it further.
[jira] [Commented] (PHOENIX-4701) Improve schema of SYSTEM.LOG table
[ https://issues.apache.org/jira/browse/PHOENIX-4701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16454340#comment-16454340 ]

Josh Elser commented on PHOENIX-4701:

{quote}
I think the best approach would be to persist our [client metrics|http://phoenix.apache.org/metrics.html] in the SYSTEM.LOG instead of inventing a new mechanism. The metrics capture all the same information as your QueryLogInfo (and much more), roll all the information up to a single set of metrics for each Phoenix statement (aggregating/merging parallel scans, etc), and can emit a single log line (which could be written in a single upsert statement)
{quote}

I like this idea. Making the QueryLog the source of truth for what is happening in a cluster sounds great.

{quote}
I've attached a wip2 patch that writes using Phoenix APIs so that you can have a composite row key to allow querying, use salting, have the table be column encoded, and potentially add secondary indexes.
{quote}

Missed commenting on this the first time, but that's awesome, James!
[jira] [Commented] (PHOENIX-4701) Improve schema of SYSTEM.LOG table
[ https://issues.apache.org/jira/browse/PHOENIX-4701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16452923#comment-16452923 ]

James Taylor commented on PHOENIX-4701:

FYI, [~karanmehta93].
[jira] [Commented] (PHOENIX-4701) Improve schema of SYSTEM.LOG table
[ https://issues.apache.org/jira/browse/PHOENIX-4701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16452569#comment-16452569 ]

James Taylor commented on PHOENIX-4701:

Thanks for the response, [~an...@apache.org] and [~samarthjain]. I think the best approach would be to persist our [client metrics|http://phoenix.apache.org/metrics.html] in the SYSTEM.LOG instead of inventing a new mechanism. The metrics capture all the same information as your QueryLogInfo (and much more), roll all the information up to a single set of metrics for each Phoenix statement (aggregating/merging parallel scans, etc), and can emit a single log line (which could be written in a single upsert statement). At SFDC, we emit this information in a layer above, but this could easily be emitted directly in Phoenix and go through your asynchronous write path. The only piece would be to add the concept of a log level to each metric to enable statically controlling which metrics are output.
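The idea of tagging each metric with a log level, so that a configured level decides which metrics are written, can be sketched roughly as follows. This is only an illustration: the `Level` values, the metric names, and the `filter` helper are hypothetical and are not taken from Phoenix or from the attached patches.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class MetricLogLevelSketch {
    // Hypothetical levels, ordered from least to most verbose.
    enum Level { OFF, INFO, DEBUG, TRACE }

    // Hypothetical metric names, each tagged with the level at which it is emitted.
    enum Metric {
        QUERY_TIME_MS(Level.INFO),
        SCAN_BYTES(Level.DEBUG),
        TASK_QUEUE_WAIT_MS(Level.TRACE);

        final Level level;

        Metric(Level level) {
            this.level = level;
        }
    }

    /** Keeps only the metrics whose level is enabled by the configured level. */
    static Map<Metric, Long> filter(Map<Metric, Long> all, Level configured) {
        Map<Metric, Long> out = new LinkedHashMap<>();
        for (Map.Entry<Metric, Long> e : all.entrySet()) {
            Level level = e.getKey().level;
            // A metric is written when its level is not OFF and is no more
            // verbose than the configured level.
            if (level != Level.OFF && level.compareTo(configured) <= 0) {
                out.put(e.getKey(), e.getValue());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Map<Metric, Long> all = new LinkedHashMap<>();
        all.put(Metric.QUERY_TIME_MS, 12L);
        all.put(Metric.SCAN_BYTES, 2048L);
        all.put(Metric.TASK_QUEUE_WAIT_MS, 3L);
        System.out.println(filter(all, Level.DEBUG).keySet());
    }
}
```

With a configured level of `OFF`, nothing passes the filter, so no log row would need to be written at all.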
[jira] [Commented] (PHOENIX-4701) Improve schema of SYSTEM.LOG table
[ https://issues.apache.org/jira/browse/PHOENIX-4701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16451930#comment-16451930 ]

Ankit Singhal commented on PHOENIX-4701:

{quote}
If possible, the SYSTEM.LOG table would benefit greatly (3-5x perf gain) from being declared as immutable with a column encoding of 1 byte and a storage format of SINGLE_CELL_ARRAY_WITH_OFFSETS.
{quote}

{quote}
I've attached a wip2 patch that writes using Phoenix APIs so that you can have a composite row key to allow querying, use salting, have the table be column encoded, and potentially add secondary indexes. It needs a bit more work, though, because the column values are set in various places for the same query. I think we should log a single line per query - we can cover the failed/exception case and the success case. I think having so many individual Put RPCs won't scale well and even if it did, having a table with only the query ID in the PK means every query would be a full table scan.
{quote}

Agreed, [~jamestaylor], this is a good idea. I thought of using the Phoenix API, but it hurt extensibility, because we can't prepare multiple upsert statements beforehand for the sets of attributes that originate in different parts of the code. We need the logging to start as soon as the query is submitted, so to reduce the number of RPCs, the best we can do here is one UPSERT for every QueryLogState, after merging in LogWriter (using Merger) the logging information received from different parts of the code for that query log state. Let me know if you are fine with this; I can put up a patch for it soon.

{quote}
But do you think we can run into some kind of infinite loop by using the Phoenix API for writing to the SYSTEM.LOG table? If so, we may need to do something similar to what our tracing framework does, where it makes sure writes to the SYSTEM.TRACE table do not generate traces themselves.
{quote}

[~samarthjain], in addition to what [~elserj] said, we are only logging SELECT queries, so writing to SYSTEM.LOG will not generate any new log entries.
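The proposal above (merge the attributes that arrive from different parts of the code, then issue one UPSERT per query log state) could look roughly like the sketch below. It is an assumption-laden illustration, not the LogWriter or Merger from the patch: the class name, the `State` values, the column names, and the table `EXAMPLE.QUERY_LOG` are all hypothetical, and the sketch only builds the statement text and bind values rather than executing anything.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.StringJoiner;

final class QueryLogMergerSketch {
    // Hypothetical states; the thread only says that a QueryLogState exists.
    enum State { STARTED, COMPLETED, FAILED }

    /** One parameterized statement plus the values to bind, in order. */
    static final class Upsert {
        final String sql;
        final List<Object> binds;

        Upsert(String sql, List<Object> binds) {
            this.sql = sql;
            this.binds = binds;
        }
    }

    // Pending column values, keyed by "queryId/state".
    private final Map<String, Map<String, Object>> pending = new LinkedHashMap<>();

    /** Called from different parts of the code; values for the same query and state are merged. */
    void record(String queryId, State state, String column, Object value) {
        pending.computeIfAbsent(queryId + "/" + state, k -> new LinkedHashMap<>())
               .put(column, value);
    }

    /** Builds the single UPSERT for one query and state, or returns null if nothing was recorded. */
    Upsert flush(String queryId, State state) {
        Map<String, Object> cols = pending.remove(queryId + "/" + state);
        if (cols == null) {
            return null;
        }
        StringJoiner names = new StringJoiner(", ");
        StringJoiner marks = new StringJoiner(", ");
        List<Object> binds = new ArrayList<>();
        names.add("QUERY_ID").add("QUERY_STATE");
        marks.add("?").add("?");
        binds.add(queryId);
        binds.add(state.name());
        for (Map.Entry<String, Object> e : cols.entrySet()) {
            names.add(e.getKey());
            marks.add("?");
            binds.add(e.getValue());
        }
        String sql = "UPSERT INTO EXAMPLE.QUERY_LOG (" + names + ") VALUES (" + marks + ")";
        return new Upsert(sql, binds);
    }

    public static void main(String[] args) {
        QueryLogMergerSketch writer = new QueryLogMergerSketch();
        writer.record("q1", State.STARTED, "QUERY_TEXT", "SELECT 1");
        writer.record("q1", State.STARTED, "CLIENT_IP", "10.0.0.1");
        Upsert upsert = writer.flush("q1", State.STARTED);
        System.out.println(upsert.sql + " " + upsert.binds);
    }
}
```

Because the column list is only known at flush time, the statement is built per flush instead of being prepared beforehand, which is the extensibility concern raised in the comment.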
[jira] [Commented] (PHOENIX-4701) Improve schema of SYSTEM.LOG table
[ https://issues.apache.org/jira/browse/PHOENIX-4701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16451546#comment-16451546 ]

Josh Elser commented on PHOENIX-4701:

{quote}
If so, we may need to do something similar to what our tracing framework does, where it makes sure writes to the SYSTEM.TRACE table do not generate traces themselves.
{quote}

This is something that Ankit implemented after the initial patch. I believe any queries against system tables are not logged.
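A guard of the kind described in this part of the thread (log only SELECT statements, and never statements against system tables) might be sketched as below. The class and method names are hypothetical, and this is not a description of how Phoenix actually implements the check; it only shows why such a guard prevents the log writer from feeding itself.

```java
import java.util.Locale;

final class LogGuardSketch {
    static final String SYSTEM_SCHEMA = "SYSTEM";

    /** Returns true only for SELECT statements whose table is outside the SYSTEM schema. */
    static boolean shouldLog(String statementType, String fullTableName) {
        if (!"SELECT".equalsIgnoreCase(statementType)) {
            // UPSERTs, including the ones that write the log itself, are never logged.
            return false;
        }
        String name = fullTableName.toUpperCase(Locale.ROOT);
        return !name.startsWith(SYSTEM_SCHEMA + ".");
    }

    public static void main(String[] args) {
        System.out.println(shouldLog("SELECT", "MY_SCHEMA.ORDERS"));
        System.out.println(shouldLog("UPSERT", "SYSTEM.LOG"));
        System.out.println(shouldLog("SELECT", "SYSTEM.LOG"));
    }
}
```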
[jira] [Commented] (PHOENIX-4701) Improve schema of SYSTEM.LOG table
[ https://issues.apache.org/jira/browse/PHOENIX-4701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16451462#comment-16451462 ]

Samarth Jain commented on PHOENIX-4701:

I haven't looked closely at the original commit, [~jamestaylor]. But do you think we can run into some kind of infinite loop by using the Phoenix API for writing to the SYSTEM.LOG table? If so, we may need to do something similar to what our tracing framework does, where it makes sure writes to the SYSTEM.TRACE table do not generate traces themselves.
[jira] [Commented] (PHOENIX-4701) Improve schema of SYSTEM.LOG table
[ https://issues.apache.org/jira/browse/PHOENIX-4701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16451450#comment-16451450 ]

James Taylor commented on PHOENIX-4701:

[~an...@apache.org] - I've attached a wip2 patch that writes using Phoenix APIs so that you can have a composite row key to allow querying, use salting, have the table be column encoded, and potentially add secondary indexes. It needs a bit more work, though, because the column values are set in various places for the same query. I think we should log a single line per query - we can cover the failed/exception case and the success case. I think having so many individual Put RPCs won't scale well and, even if it did, having a table with only the query ID in the PK means every query would be a full table scan.

Would you have any cycles to take my patch further? Barring that, have we documented this yet? Perhaps we can just call this beta and let users know that the table structure will likely change in the future.

Your thoughts would be valuable too, [~elserj].
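The schema ideas in this comment and in the issue description (composite row key, salting, immutable rows, 1-byte column encoding, SINGLE_CELL_ARRAY_WITH_OFFSETS storage) can be combined in a single Phoenix DDL statement. The sketch below only assembles such a statement as a string. The table name `EXAMPLE.QUERY_LOG`, its columns, and the property name are hypothetical and do not reflect the schema in the attached patches; the table options themselves (`IMMUTABLE_ROWS`, `COLUMN_ENCODED_BYTES`, `IMMUTABLE_STORAGE_SCHEME`, `SALT_BUCKETS`) are standard Phoenix options.

```java
import java.util.Properties;

final class QueryLogDdlSketch {
    // Hypothetical property name; the thread does not name the real one.
    static final String SALT_BUCKETS_PROP = "example.query.log.salt.buckets";

    /** Builds a CREATE TABLE statement; salting is added only when the property is positive. */
    static String buildDdl(Properties props) {
        int saltBuckets = Integer.parseInt(props.getProperty(SALT_BUCKETS_PROP, "0"));
        StringBuilder ddl = new StringBuilder();
        ddl.append("CREATE TABLE IF NOT EXISTS EXAMPLE.QUERY_LOG (\n")
           .append("    START_TIME TIMESTAMP NOT NULL,\n")
           .append("    QUERY_ID VARCHAR NOT NULL,\n")
           .append("    QUERY_TEXT VARCHAR,\n")
           .append("    QUERY_STATE VARCHAR,\n")
           // Composite row key: a leading time column lets time-range queries
           // avoid scanning the whole table.
           .append("    CONSTRAINT PK PRIMARY KEY (START_TIME, QUERY_ID)\n")
           .append(") IMMUTABLE_ROWS=true,\n")
           .append("  COLUMN_ENCODED_BYTES=1,\n")
           .append("  IMMUTABLE_STORAGE_SCHEME=SINGLE_CELL_ARRAY_WITH_OFFSETS");
        if (saltBuckets > 0) {
            ddl.append(",\n  SALT_BUCKETS=").append(saltBuckets);
        }
        return ddl.toString();
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty(SALT_BUCKETS_PROP, "8");
        System.out.println(buildDdl(props));
    }
}
```

Declaring the table immutable fits the "single line per query" approach, since each row is then written once and never updated.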
[jira] [Commented] (PHOENIX-4701) Improve schema of SYSTEM.LOG table
[ https://issues.apache.org/jira/browse/PHOENIX-4701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16448655#comment-16448655 ]

James Taylor commented on PHOENIX-4701:

Attached a WIP patch. I will make the schema changes in the next patch, with the number of salt buckets based on a config property.