Hi all,
As some you already know I was forced to implement an improved log support
for my own needs.
The quick solution generally isn't the best one and I understand this
applies to my implementation.
However, I'm willing to fix that so I would like to make some comments on
the last discussion on logging.
On Wed, Apr 10, 2013 at 5:25 PM, Mateusz Loskot <mate...@loskot.net> wrote:
> On 9 April 2013 23:39, Vadim Zeitlin <vz-s...@zeitlins.org> wrote:
> >
> > First, let me explain once again what are my goals here. Basically I'd
> > like to provide more information in the error messages from my code to
> > allow diagnosing what exactly went wrong when using prepared statements.
>
My needs aren't related to what went wrong during execution but what
happened while building the query.
In my case, soci's exception messages are handled externally to soci but I
understand it can be done internally.
The more important thing for me was knowing the values of the query
parameters for each execution.
> > Currently if you run a statement like "insert into foo(bar) values(:bar)"
> > and it fails with e.g. "unique constraint violation on foo.bar" error
> > message, you have no idea which record failed to be inserted because the
> > actual value of ":bar" placeholder doesn't appear anywhere. The best I
> can
> > do is to use session::get_last_query() to give the error message of the
> > form
> >
> > unique constraint violation on foo.bar while executing
> > "insert into foo(bar) values(:bar)"
> >
> > Instead I'd like to be able to say
> >
> > unique constraint violation on foo.bar while executing
> > "insert into foo(bar) values(:bar)" with "bar"='baz'
> >
>
> or maybe even just
> >
> > unique constraint violation on foo.bar while executing
> > "insert into foo(bar) values('baz')"
>
> The first (with separated parameters) is my preferred format and similar
to the one I used in my implementation.
Mixing the values of the query parameters with the literal values in the
query may lead to confusion about the responsibility of each value.
The problem of logging the parameters lies in logging it with the query of
in a separate line.
I tried logging query and params in the same line but I had a hard time
controlling that because where query is logged (prepare) can be too far in
terms of time from where parameters get logged (execute).
I decided for logging it a separate line but that can cause some confusion
if multiple queries get logged at the same time in a same log destination.
I haven't worked on that but a separate line for parameters might need some
indication of which query it belongs to.
> I like your idea. It is important do feature to be able to report errors
> with
> meaningful context. The actual format of report is a secondary issue,
> so it can be decided later.
>
Mateusz, what is your idea of 'meaningful context' for error reporting?
Do you mean including extra info about the source of the error?
(e.g. Error on prepare: bla bla bla)
> > For consistency with the last query itself, it would seem to be logical
> to
> > save the parameters values in statement_impl itself: it already calls
> > session::log_query() in its ctor and it could call some log_parameters()
> in
> > its define_and_bind(). But this is where I have my first question: do you
> > think it could be a noticeable pessimization, from performance point of
> > view, to always log the input parameters here?
>
> There is potential of performance decrease.
> So, we should be careful about it, indeed.
>
I don't think saving parameter values makes much sense.
The query is saved because the statement is prepared with such query.
In the case of parameters, they're just references to variables so what
matters is their values at the time of the execution.
>
> > I haven't done any profiling
> > yet but I have a suspicion that it could add a noticeable overhead,
> > considering that queries can have a lot of parameters and all of them
> would
> > need to be converted to strings (I think, see below) during each
> execution
> > of the query.
>
> I doubt we can perform meaningful benchmarks in short time.
> It would have to take awful lot of factors into account to draw any
> meaningful
> conclusion. In my opinion it's easier to base on assumption that strings
> bashing and formatting is a slow character by character operation.
>
> > And if we need the parameter values for error reporting only, then we
> > could avoid this potential overhead by logging them instead only in case
> an
> > exception is indeed thrown, i.e. catch all the exceptions in
> > statement_impl::execute() (and also in fetch() probably?), log the values
> > of the parameters, and then rethrow. Do you think such approach would be
> > better?
>
> If I had to choose between permanent logging vs on-error logging,
> I'd vote for this approach, to process values on error only.
> Although, it will require us to do more coding work to consider all
> exception aware places and handle the exceptions as you describe.
>
> But this, AFAIU, will loose the feature of logging on demand.
>
> So, perhaps we could have both and still save the performance
> by controlling this behaviour conditionally with use of dedicated
> configuration properties. Have you considered it?
>
> First, in the current implementation (3.2.1) queries are always logged if
log is enabled.
In case of an error, the exception info plus the query and its parameters
are useful indeed.
However, there other kinds of situations (logical errors) out of the scope
of soci.
A user expects that soci as the chosen database interface should (must?) be
able to lend a helping hand when these errors happen.
So I would vote for allowing the user to choose if s/he wants to log never,
only on errors or always.
> > Finally there is also a question of how to log the parameter values
> > exactly. The problem here is that use_type_base itself doesn't provide
> any
> > access to its value. Is there any better way to show parameters than
> > straightforwardly extending use_type_base with some as_string() virtual
> > method? Also, converting all the parameters to strings risks is what
> > worries me from performance point of view, if we do it unconditionally.
> It
> > would be nice to avoid doing this unless really necessary but to make
> this
> > possible we'd need to have a shared (i.e. ref-counted) pointer to
> > use_type_base but this would change the life time of use_type_base
> objects
> > and I'm not sure that it's not going to create any problems. Am I
> worrying
> > too much and should we just make these objects ref-counted or should we
> use
> > the assuredly safe convert-to-string-immediately approach?
>
> This is tough one and I don't have time at the moment, but I will think
> about it
> during this weekend. Perhaps, I will be able to give a sensible response.
>
> Some backends transmit the parameter values as text so we can take
advantage of this fact calling these virtual functions from the soci core
to obtain these values as suggested by Vadin.
Probably we are talking here about calling virtual functions implemented on
the backend classes for the 'standard_use' and the 'vector_use'.
The instances of these classes might hold temporarily the textual
representation of values used to execute query giving us the opportunity to
get them logged at a reduced cost.
For the backends transmitting the parameter values in binary form,
unfortunately there is some cost involved in converting them to strings.
But if the logging is disabled, the future virtual function will never get
called so you only pay for what you use.
Regarding the log format, 'use_type' knows type so it can participate in
defining the log format for its type.
> > Ah, and I lied about "finally". The really last question is about vector
> > use types. I don't use them myself (although I should), so I'm not sure
> how
> > should we log those. For now I was thinking about logging just the number
> > of elements in the vector and its first element, does anybody have any
> > other proposals for them?
>
> I took the approach of logging all the values enclosed between square
brackets paying attention to the nulls on the indicator vector.
Logging the size may be interesting but it's a redundant info if you have
all the values.
For the ORM types (soci::values), I put the number of fields preceding the
field values.
But it only makes sense if we can use multiple ORM types or mix them with
the regular types as the parameter for the same query.
> Ideally, if this was configurable I think:
> - by default, report containre metadata only (size, type(s))
> (BTW, we may also consider {boost|std}::tuple, boost::optional).
> - on demand, dump all content
>
> I personally don't mind but do we (the user) really need this level of
flexibility?
I'm looking forward to hearing your thoughts.
Regards,
Ricardo Andrade
------------------------------------------------------------------------------
Precog is a next-generation analytics platform capable of advanced
analytics on semi-structured data. The platform includes APIs for building
apps and a phenomenal toolset for data science. Developers can use
our toolset for easy data analysis & visualization. Get a free account!
http://www2.precog.com/precogplatform/slashdotnewsletter
_______________________________________________
soci-devel mailing list
soci-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/soci-devel