[jira] [Commented] (PHOENIX-4701) Improve schema of SYSTEM.LOG table

2018-05-02 Thread Ankit Singhal (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-4701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16461790#comment-16461790
 ] 

Ankit Singhal commented on PHOENIX-4701:


[~jamestaylor], I have started looking into this today. Do you want it for 
4.14 as well?

> Improve schema of SYSTEM.LOG table
> --
>
> Key: PHOENIX-4701
> URL: https://issues.apache.org/jira/browse/PHOENIX-4701
> Project: Phoenix
> Issue Type: Bug
> Reporter: James Taylor
> Assignee: James Taylor
> Priority: Major
> Fix For: 4.14.0, 5.0.0
>
> Attachments: PHOENIX-4701_wip1.patch, PHOENIX-4701_wip2.patch
>
>
> If possible, the SYSTEM.LOG table would benefit greatly (3-5x perf gain) 
> from being declared as immutable with a column encoding of 1 byte and a 
> storage format of SINGLE_CELL_ARRAY_WITH_OFFSETS.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (PHOENIX-4701) Improve schema of SYSTEM.LOG table

2018-04-26 Thread Ankit Singhal (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-4701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16454449#comment-16454449
 ] 

Ankit Singhal commented on PHOENIX-4701:


bq. I think the best approach would be to persist our client metrics in the 
SYSTEM.LOG instead of inventing a new mechanism. The metrics capture all the 
same information as your QueryLogInfo (and much more), roll all the 
information up to a single set of metrics for each Phoenix statement 
(aggregating/merging parallel scans, etc), and can emit a single log line 
(which could be written in a single upsert statement). At SFDC, we emit this 
information in a layer above (and use Splunk to produce nifty dashboards for 
monitoring), but this could easily be emitted directly in Phoenix and go 
through your asynchronous write path (and then use Phoenix queries to produce 
the same kind of dashboards). The only piece would be to add the concept of a 
log level to each metric to enable statically controlling which metrics are 
output.

Thanks [~jamestaylor], I like the approach too.

bq. Would you have any cycles to take my patch further? 

I'll try to make the changes accordingly and take it further. 


[jira] [Commented] (PHOENIX-4701) Improve schema of SYSTEM.LOG table

2018-04-26 Thread Josh Elser (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-4701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16454340#comment-16454340
 ] 

Josh Elser commented on PHOENIX-4701:
-

{quote}I think the best approach would be to persist our [client 
metrics|http://phoenix.apache.org/metrics.html] in the SYSTEM.LOG instead of 
inventing a new mechanism. The metrics capture all the same information as 
your QueryLogInfo (and much more), roll all the information up to a single set 
of metrics for each Phoenix statement (aggregating/merging parallel scans, 
etc), and can emit a single log line (which could be written in a single 
upsert statement)
{quote}
I like this idea. Making the QueryLog the source of truth for what is 
happening in a cluster sounds great.
{quote}I've attached a wip2 patch that writes using Phoenix APIs so that you 
can have a composite row key to allow querying, use salting, have the table be 
column encoded, and potentially add secondary indexes.
{quote}
Missed commenting on this the first time, but that's awesome, James!


[jira] [Commented] (PHOENIX-4701) Improve schema of SYSTEM.LOG table

2018-04-25 Thread James Taylor (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-4701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16452923#comment-16452923
 ] 

James Taylor commented on PHOENIX-4701:
---

FYI, [~karanmehta93].


[jira] [Commented] (PHOENIX-4701) Improve schema of SYSTEM.LOG table

2018-04-25 Thread James Taylor (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-4701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16452569#comment-16452569
 ] 

James Taylor commented on PHOENIX-4701:
---

Thanks for the response, [~an...@apache.org] and [~samarthjain]. I think the 
best approach would be to persist our [client 
metrics|http://phoenix.apache.org/metrics.html] in the SYSTEM.LOG instead of 
inventing a new mechanism. The metrics capture all the same information as 
your QueryLogInfo (and much more), roll all the information up to a single set 
of metrics for each Phoenix statement (aggregating/merging parallel scans, 
etc), and can emit a single log line (which could be written in a single 
upsert statement). At SFDC, we emit this information in a layer above, but this 
could easily be emitted directly in Phoenix and go through your asynchronous 
write path. The only piece would be to add the concept of a log level to each 
metric to enable statically controlling which metrics are output.
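
As a rough illustration of the idea above (not code from either patch), the following Python sketch rolls per-scan metrics up into one set per statement, filters them by a per-metric log level, and produces a single parameterized UPSERT. The metric names, the level assignments, and the table name EXAMPLE.QUERY_LOG are all hypothetical.

```python
from enum import IntEnum


class LogLevel(IntEnum):
    OFF = 0
    INFO = 1
    DEBUG = 2
    TRACE = 3


# Hypothetical metric names, each mapped to the minimum level at which it is output.
METRIC_LEVELS = {
    "SCAN_BYTES": LogLevel.INFO,
    "TASK_COUNT": LogLevel.DEBUG,
    "REJECTED_TASK_COUNT": LogLevel.TRACE,
}


def merge_scan_metrics(per_scan):
    """Sum the metrics of parallel scans into one set per statement."""
    totals = {}
    for scan in per_scan:
        for name, value in scan.items():
            totals[name] = totals.get(name, 0) + value
    return totals


def build_upsert(query_id, metrics, configured_level):
    """Build one UPSERT containing only the metrics enabled at configured_level."""
    enabled = sorted(
        name for name in metrics
        if name in METRIC_LEVELS and METRIC_LEVELS[name] <= configured_level
    )
    columns = ["QUERY_ID"] + enabled
    placeholders = ", ".join(["?"] * len(columns))
    sql = (
        "UPSERT INTO EXAMPLE.QUERY_LOG ("  # hypothetical table name
        + ", ".join(columns)
        + ") VALUES (" + placeholders + ")"
    )
    params = [query_id] + [metrics[name] for name in enabled]
    return sql, params
```

With the level set to INFO only SCAN_BYTES would be written; raising it to DEBUG adds TASK_COUNT, so the level statically controls the width of the logged row.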


[jira] [Commented] (PHOENIX-4701) Improve schema of SYSTEM.LOG table

2018-04-25 Thread Ankit Singhal (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-4701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16451930#comment-16451930
 ] 

Ankit Singhal commented on PHOENIX-4701:


{quote}If possible, the SYSTEM.LOG table would benefit greatly (3-5x perf 
gain) from being declared as immutable with a column encoding of 1 byte and a 
storage format of SINGLE_CELL_ARRAY_WITH_OFFSETS.
{quote}
{quote}I've attached a wip2 patch that writes using Phoenix APIs so that you 
can have a composite row key to allow querying, use salting, have the table be 
column encoded, and potentially add secondary indexes. It needs a bit more 
work, though, because the column values are set in various places for the same 
query. I think we should log a single line per query - we can cover the 
failed/exception case and the success case. I think having so many individual 
Put RPCs won't scale well and even if it did, having a table with only the 
query ID in the PK means every query would be a full table scan.
{quote}
Agreed, [~jamestaylor], this is a good idea. I thought of using the Phoenix 
API, but it limited extensibility because we can't prepare multiple upsert 
statements beforehand for sets of attributes that originate in different parts 
of the code. We need the logging to start as soon as the query is submitted, 
so to reduce the number of RPCs, the best we can do here is one UPSERT for 
every QueryLogState, after merging (using Merger in LogWriter) the logging 
information received from different parts of the code for that query log 
state. Let me know if you are fine with this; I can put up a patch for it soon.
{quote}But do you think we can run into some kind of infinite loop by using 
the Phoenix API for writing to the SYSTEM.LOG table? If so, we may need to do 
something similar to what our tracing framework does where it makes sure 
writes to SYSTEM.TRACE table do not generate traces themselves. 
{quote}
[~samarthjain], in addition to what [~elserj] said, since we are only logging 
SELECT queries, writing to SYSTEM.LOG will not generate any new log entries.
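
The one-UPSERT-per-QueryLogState proposal in this comment could look roughly like the Python sketch below. It is only an illustration under assumptions: the state names, attribute names, and the table name EXAMPLE.QUERY_LOG are hypothetical, and the real Merger/LogWriter code may work differently.

```python
def merge_by_state(events):
    """Merge (query_id, state, attributes) events into one row per (query_id, state).

    Events for the same query and state may arrive from different parts of the
    code; later attributes overwrite earlier ones with the same name.
    """
    rows = {}
    for query_id, state, attrs in events:
        rows.setdefault((query_id, state), {}).update(attrs)
    return rows


def to_upserts(rows, table="EXAMPLE.QUERY_LOG"):  # hypothetical table name
    """Turn each merged row into a single parameterized UPSERT statement."""
    statements = []
    for (query_id, state), attrs in rows.items():
        attr_names = sorted(attrs)
        columns = ["QUERY_ID", "QUERY_STATUS"] + attr_names
        placeholders = ", ".join(["?"] * len(columns))
        sql = "UPSERT INTO {} ({}) VALUES ({})".format(
            table, ", ".join(columns), placeholders
        )
        params = [query_id, state] + [attrs[name] for name in attr_names]
        statements.append((sql, params))
    return statements
```

Three events spread over two states for the same query would therefore produce two UPSERTs rather than three individual writes.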


[jira] [Commented] (PHOENIX-4701) Improve schema of SYSTEM.LOG table

2018-04-24 Thread Josh Elser (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-4701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16451546#comment-16451546
 ] 

Josh Elser commented on PHOENIX-4701:
-

{quote}If so, we may need to do something similar to what our tracing 
framework does where it makes sure writes to SYSTEM.TRACE table do not generate 
traces themselves.
{quote}
This is something that Ankit implemented after the initial patch. I believe any 
queries against system tables are not logged.
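
A guard of the kind described here might be sketched as follows in Python. This is an assumption-laden illustration, not the actual Phoenix check: the function names are hypothetical, and it simply treats any table in the SYSTEM schema as a system table and logs only SELECT statements.

```python
def is_system_table(full_name):
    """Return True if a schema-qualified table name is in the SYSTEM schema."""
    schema, _, table = full_name.upper().partition(".")
    return bool(table) and schema == "SYSTEM"


def should_log(statement_type, tables):
    """Log only SELECT statements that touch no system table.

    Writes to the log table are UPSERTs against SYSTEM.LOG, so they fail both
    conditions and can never trigger another log entry.
    """
    if statement_type.upper() != "SELECT":
        return False
    return not any(is_system_table(name) for name in tables)
```

Because the log write itself is excluded on two independent grounds, the feedback loop raised earlier in the thread cannot start.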


[jira] [Commented] (PHOENIX-4701) Improve schema of SYSTEM.LOG table

2018-04-24 Thread Samarth Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-4701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16451462#comment-16451462
 ] 

Samarth Jain commented on PHOENIX-4701:
---

I haven't looked closely at the original commit, [~jamestaylor]. But do you 
think we can run into some kind of infinite loop by using the Phoenix API for 
writing to the SYSTEM.LOG table? If so, we may need to do something similar 
to what our tracing framework does where it makes sure writes to SYSTEM.TRACE 
table do not generate traces themselves.


[jira] [Commented] (PHOENIX-4701) Improve schema of SYSTEM.LOG table

2018-04-24 Thread James Taylor (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-4701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16451450#comment-16451450
 ] 

James Taylor commented on PHOENIX-4701:
---

[~an...@apache.org] - I've attached a wip2 patch that writes using Phoenix APIs 
so that you can have a composite row key to allow querying, use salting, have 
the table be column encoded, and potentially add secondary indexes. It needs a 
bit more work, though, because the column values are set in various places for 
the same query. I think we should log a single line per query - we can cover 
the failed/exception case and the success case. I think having so many 
individual Put RPCs won't scale well and even if it did, having a table with 
only the query ID in the PK means every query would be a full table scan.
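
For readers unfamiliar with these table options, here is a minimal Python sketch that assembles a DDL statement combining a composite row key, optional salting, immutable rows, 1-byte column encoding, and the SINGLE_CELL_ARRAY_WITH_OFFSETS storage format. The table name EXAMPLE.QUERY_LOG, the columns, and the key order are hypothetical and are not taken from the wip2 patch.

```python
def build_query_log_ddl(salt_buckets):
    """Return a CREATE TABLE statement for an illustrative query-log table."""
    columns = [
        "START_TIME DATE NOT NULL",      # leading key column allows time-range scans
        "TABLE_NAME VARCHAR NOT NULL",
        "QUERY_ID VARCHAR NOT NULL",
        "QUERY VARCHAR",
        "USER_NAME VARCHAR",
    ]
    primary_key = "CONSTRAINT PK PRIMARY KEY (START_TIME, TABLE_NAME, QUERY_ID)"
    options = [
        "IMMUTABLE_ROWS=true",
        "COLUMN_ENCODED_BYTES=1",
        "IMMUTABLE_STORAGE_SCHEME=SINGLE_CELL_ARRAY_WITH_OFFSETS",
    ]
    if salt_buckets > 0:
        # Salting is left out entirely when no buckets are requested.
        options.append("SALT_BUCKETS={}".format(salt_buckets))
    body = ",\n    ".join(columns + [primary_key])
    return "CREATE TABLE EXAMPLE.QUERY_LOG (\n    {}\n) {}".format(
        body, ", ".join(options)
    )
```

Calling build_query_log_ddl(8) yields a salted table definition, while build_query_log_ddl(0) omits SALT_BUCKETS; with more than the query ID in the key, a lookup by time range and table name no longer needs a full table scan.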

Would you have any cycles to take my patch further? Barring that, have we 
documented this yet? Perhaps we can just call this beta and let users know that 
the table structure will likely change in the future.

Your thoughts would be valuable too, [~elserj].


[jira] [Commented] (PHOENIX-4701) Improve schema of SYSTEM.LOG table

2018-04-23 Thread James Taylor (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-4701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16448655#comment-16448655
 ] 

James Taylor commented on PHOENIX-4701:
---

Attached a WIP patch. I will make the schema changes in the next patch, with 
the number of salt buckets based on a config property.
