Re: [GENERAL] Predicting query runtime

2016-09-13 Thread Istvan Soos
On Tue, Sep 13, 2016 at 2:06 AM, Vinicius Segalin  wrote:
> 2016-09-12 18:22 GMT-03:00 Istvan Soos :
>> At Heap we have non-trivial complexity in our analytical queries, and
>> some of them can take a long time to complete. We did analyze features
>> like the query planner's output, our query properties (type,
>> parameters, complexity) and tried to automatically identify factors
>> that contribute the most into the total query time. It turns out that
>> you don't need to use machine learning for the basics, but at this
>> point we were not aiming for predictions yet.
>
> And how did you do that? Manually analyzing some queries?

In this case, it was automatic analysis and feature discovery. We were
generating features out of our query parameters, out of the SQL
string, and also out of the explain analyze output. For each of these
features, we have examined the P(query is slow | feature is present),
and measured its statistical properties (precision, recall,
correlations...).

With these we have built a decision tree-based partitioning, where our
feature-predicates divided the queries into subsets. Such a tree could
be used for predictions, or if we would like to be fancy, we could use
the feature vectors to train a neural network.

Hope this helps for now,
  Istvan


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Predicting query runtime

2016-09-12 Thread Istvan Soos
Hi Vinicius,

At Heap we have non-trivial complexity in our analytical queries, and
some of them can take a long time to complete. We did analyze features
like the query planner's output, our query properties (type,
parameters, complexity) and tried to automatically identify factors
that contribute the most into the total query time. It turns out that
you don't need to use machine learning for the basics, but at this
point we were not aiming for predictions yet.

As a spoiler: queries take long time because they do a lot of IO.
Features like reachback depth and duration (e.g. what period is the
analytical query about) can contribute a lot to the amount of IO,
thus, the query time. I have a blog post in my queue about our
analysis, would gladly bump its priority if there is interest in such
details.

I'm also curious: if you had a great way to predict the time/cost of
the queries, how would you use it?

Best regards,
  Istvan

On Mon, Sep 12, 2016 at 4:03 PM, Vinicius Segalin  wrote:
> Hi everyone,
>
> I'm trying to find a way to predict query runtime (I don't need to be
> extremely precise). I've been reading some papers about it, and people are
> using machine learning to do so. For the feature vector, they use what the
> DBMS's query planner provide, such as operators and their cost. The thing is
> that I haven't found any work using PostgreSQL, so I'm struggling to adapt
> it.
> My question is if anyone is aware of a work that uses machine learning and
> PostgreSQL to predict query runtime, or maybe some other method to perform
> this.
>
> Thank you.
>
> Best regards,
>
> Vinicius Segalin


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general