Re: [DISCUSS] Design for a "query log"

2018-03-02 Thread Josh Elser
Yup, totally agree. Will also try to make these more clear in the doc. We want to queue up the work to "log" and do it in the background. If that queue gets too full, we drop it right off that cliff :)

Thanks, Samarth!

On 3/2/18 5:47 PM, Samarth Jain wrote:
> A couple more points which I
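
The queue-and-drop behavior described in this message can be sketched roughly as follows. This is only an illustration, not Phoenix code: the class name `BestEffortQueryLogger`, the `sink` callable, and the `max_pending` limit are all hypothetical, and the design doc may settle on something quite different.

```python
import queue
import threading


class BestEffortQueryLogger:
    """Hypothetical sketch; none of these names come from Phoenix."""

    def __init__(self, sink, max_pending=1000):
        # Bounded queue: once max_pending entries are waiting, new ones are dropped.
        self._pending = queue.Queue(maxsize=max_pending)
        self._sink = sink  # assumed callable that persists one log entry
        self.dropped = 0
        # Background thread does the actual writing, off the query path.
        self._worker = threading.Thread(target=self._drain, daemon=True)
        self._worker.start()

    def log(self, entry):
        """Enqueue without blocking; return False if the entry was dropped."""
        try:
            self._pending.put_nowait(entry)
            return True
        except queue.Full:
            self.dropped += 1
            return False

    def _drain(self):
        while True:
            entry = self._pending.get()
            try:
                self._sink(entry)
            except Exception:
                # Best effort: a failed log write must never fail a query.
                pass
            finally:
                self._pending.task_done()

    def flush(self):
        """Block until every accepted entry has been handed to the sink."""
        self._pending.join()
```

A caller on the query path would invoke `log(...)` and ignore the return value; the `dropped` counter is there only so an operator could see how often the queue overflowed.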

Re: [DISCUSS] Design for a "query log"

2018-03-02 Thread Samarth Jain
A couple more points which I think you alluded to, Josh, but I would still like to call out:
1) Writing of these query logs to a Phoenix table should be best effort, i.e. a query definitely shouldn't fail because we encountered an issue while writing its log
2) Writing of query logs should happen

Re: [DISCUSS] Design for a "query log"

2018-03-02 Thread Josh Elser
Thanks Nick and Andrew! These are great points.
* A TTL out of the box is a must. That's such a good suggestion.
* Sensitivity of data being stored is also a tricky-serious issue to consider. We'll want to lock the table down and be able to state very clearly what data may show up in it.
* I

Re: [DISCUSS] Design for a "query log"

2018-03-02 Thread Josh Elser
My gut reaction would be to avoid storing queries over the system tables, as they would have more noise than value, but I think this is something that could be entertained!

On 3/2/18 8:02 AM, Artem Ervits wrote:
> +1 Is the idea here to collect information on all types of queries even against

Re: [DISCUSS] Design for a "query log"

2018-03-02 Thread Andrew Purtell
Agree with Nick's points, but let me augment them with an additional suggestion: a tunable/configurable threshold for sampling. In many cases it's enough to sample, e.g., 1% of queries to get sufficient coverage, and this would prune 99% of the actual load from the query log. Also let me underline that
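
As a rough illustration of the sampling threshold suggested here, a per-query coin flip against a configurable rate is enough. The function name `should_log` and the `sample_rate` parameter are hypothetical and do not correspond to any Phoenix setting.

```python
import random


def should_log(sample_rate, rng=random):
    """Return True for roughly `sample_rate` of calls (a fraction from 0.0 to 1.0).

    Hypothetical helper: `sample_rate` stands in for whatever tunable
    threshold the feature would eventually expose.
    """
    if not 0.0 <= sample_rate <= 1.0:
        raise ValueError("sample_rate must be between 0.0 and 1.0")
    # rng.random() is uniform on [0.0, 1.0), so 0.0 never logs and 1.0 always logs.
    return rng.random() < sample_rate
```

With `sample_rate=0.01`, about one query in a hundred would be written to the query log, which is the 1% case mentioned in the message.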

Re: [DISCUSS] Design for a "query log"

2018-03-02 Thread Nick Dimiduk
I'm a big fan of this idea. There was a brief discussion on the topic over on PHOENIX-2715. My first concern is that the collected information is huge -- easily far larger than the user data for a busy cluster. For instance, a couple of tens of GB of stored user data, guideposts set to the default 100 MB,

Re: [DISCUSS] Design for a "query log"

2018-03-02 Thread Artem Ervits
+1 Is the idea here to collect information on all types of queries, even against system tables? It would be nice to keep counts of how many times a query was executed over time. In my mssql days we also had the ability to check when update stats last ran and all kinds of impact information

Re: [DISCUSS] Design for a "query log"

2018-03-01 Thread YoungWoo Kim
Hi Josh,

Thanks for starting this discussion. Overall the design looks good to me! Actually, I have an internal implementation for logging user queries to a text file over PQS. Let me explain our use cases. We wanted to find out the following facts using our ugly hacks:
- Running queries against PQS
-

Re: [DISCUSS] Design for a "query log"

2018-03-01 Thread Josh Elser
Any feedback from folks? Not sure if the silence should be interpreted as ambivalence or plain old being busy :)

On Mon, Feb 26, 2018 at 4:57 PM, Josh Elser wrote:
> Hiya,
>
> I wanted to share this little design doc with you about some feature work
> we've been thinking

[DISCUSS] Design for a "query log"

2018-02-26 Thread Josh Elser
Hiya, I wanted to share this little design doc with you about some feature work we've been thinking about. The following is a Google doc in which anyone should be allowed to comment. Feel free to comment there, or here on the thread. https://s.apache.org/phoenix-query-log The high-level