[Maria-developers] WL#143 New (by Sergei): full-text search engine plugin

2010-09-09 Thread worklog-noreply
---
  WORKLOG TASK
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
TASK...: full-text search engine plugin
CREATION DATE..: Thu, 09 Sep 2010, 07:38
SUPERVISOR.: Sergei
IMPLEMENTOR: 
COPIES TO..: 
CATEGORY...: Server-RawIdeaBin
TASK ID: 143 (https://askmonty.org/worklog/?tid=143)
VERSION: WorkLog-4.0
STATUS.: Un-Assigned
PRIORITY...: 80
WORKED HOURS...: 0
ESTIMATE...: 320 (hours remain)
ORIG. ESTIMATE.: 320

PROGRESS NOTES:



DESCRIPTION:

A new plugin type - full-text search engine.

Ideally, it will allow adding full-text search to any table, independently of
the storage engine, with fully integrated SQL syntax, while still allowing a
choice of different underlying FTS implementations.


ESTIMATED WORK TIME

ESTIMATED COMPLETION DATE
---
WorkLog (v4.0.0)




___
Mailing list: https://launchpad.net/~maria-developers
Post to : maria-developers@lists.launchpad.net
Unsubscribe : https://launchpad.net/~maria-developers
More help   : https://help.launchpad.net/ListHelp


[Maria-developers] WL#144 New (by Sergei): query rewrite api

2010-09-09 Thread worklog-noreply
---
  WORKLOG TASK
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
TASK...: query rewrite api
CREATION DATE..: Thu, 09 Sep 2010, 08:14
SUPERVISOR.: 
IMPLEMENTOR: 
COPIES TO..: 
CATEGORY...: Server-RawIdeaBin
TASK ID: 144 (https://askmonty.org/worklog/?tid=144)
VERSION: WorkLog-4.0
STATUS.: Un-Assigned
PRIORITY...: 60
WORKED HOURS...: 0
ESTIMATE...: 0 (hours remain)
ORIG. ESTIMATE.: 0

PROGRESS NOTES:



DESCRIPTION:

An API for query rewrites.

It may be a special rewrite plugin, part of the storage engine, or some other
API. Preferably it should not force the plugin to parse or work with the SQL
string; instead it may provide a DOM-like representation of the query and let
the plugin manipulate the tree nodes. Perhaps this needs the Abstract Query
Tree task to be done first.
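
A minimal sketch of what manipulating a DOM-like query tree could look like,
written in Python for brevity. Everything here is hypothetical: the Node
class, the node kinds ("select", "from", "table", "column") and the rewrite()
helper are illustration only and are not an existing server API.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Node:
    """One node of a hypothetical DOM-like query tree."""
    kind: str                      # e.g. "select", "from", "table", "column"
    value: str = ""                # identifier or literal text, if any
    children: List["Node"] = field(default_factory=list)


def rewrite(node: Node, rule: Callable[[Node], Node]) -> Node:
    """Apply `rule` bottom-up to every node and return a new tree."""
    new_children = [rewrite(child, rule) for child in node.children]
    return rule(Node(node.kind, node.value, new_children))


def redirect_table(old: str, new: str) -> Callable[[Node], Node]:
    """Build a rule that renames one table and leaves other nodes alone."""
    def rule(node: Node) -> Node:
        if node.kind == "table" and node.value == old:
            return Node("table", new, node.children)
        return node
    return rule


# SELECT a FROM t1, represented as a tree rather than as an SQL string.
query = Node("select", children=[
    Node("column", "a"),
    Node("from", children=[Node("table", "t1")]),
])
rewritten = rewrite(query, redirect_table("t1", "t1_archive"))
```

The plugin only sees and returns tree nodes; it never has to parse SQL text.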


ESTIMATED WORK TIME

ESTIMATED COMPLETION DATE
---
WorkLog (v4.0.0)






[Maria-developers] WL#145 New (by Sergei): user defined data types

2010-09-09 Thread worklog-noreply
---
  WORKLOG TASK
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
TASK...: user defined data types
CREATION DATE..: Thu, 09 Sep 2010, 08:46
SUPERVISOR.: 
IMPLEMENTOR: 
COPIES TO..: 
CATEGORY...: Server-RawIdeaBin
TASK ID: 145 (https://askmonty.org/worklog/?tid=145)
VERSION: WorkLog-4.0
STATUS.: Un-Assigned
PRIORITY...: 40
WORKED HOURS...: 0
ESTIMATE...: 0 (hours remain)
ORIG. ESTIMATE.: 0

PROGRESS NOTES:



DESCRIPTION:

Often we get requests for new data types, like timestamps with microsecond
precision or IPv4/IPv6 addresses. This could be solved with user-defined
types, implemented via plugins or special SQL syntax, whatever is appropriate.
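
As a rough illustration of what a type plugin might have to provide, here is a
Python sketch of an IPv4 type that converts between the text form used in SQL
and a compact stored form. The Ipv4Type class and its to_storage() and
from_storage() methods are assumptions made for this example only; the
conversion itself uses the standard ipaddress module.

```python
import ipaddress


class Ipv4Type:
    """Hypothetical user-defined type: text in SQL, 4 bytes in storage."""
    name = "IPV4"

    @staticmethod
    def to_storage(text: str) -> bytes:
        # Raises ValueError for malformed input, so bad values are rejected.
        return ipaddress.IPv4Address(text).packed

    @staticmethod
    def from_storage(raw: bytes) -> str:
        return str(ipaddress.IPv4Address(raw))


stored = Ipv4Type.to_storage("192.168.0.1")
```

Because the stored form is big-endian, comparing the raw bytes orders
addresses numerically, so 10.0.0.2 sorts before 10.0.0.10.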



ESTIMATED WORK TIME

ESTIMATED COMPLETION DATE
---
WorkLog (v4.0.0)






[Maria-developers] Phone home

2010-09-09 Thread Sergei Golubchik
Hi.

So, the "Phone Home" or "MySQL feedback daemon" or "better name wanted"
feature.
It is something that can be installed together with MariaDB; it will
gather various statistics about how MariaDB is used and send this
information anonymously to mariadb.org.

Not unlike the Uptimes Project or Debian Popularity Contest.

The complete specs will be here:
http://askmonty.org/worklog/Server-Sprint/?tid=12

There are basically four questions I'm thinking about.

1. Should that be a MariaDB plugin or a separate executable ?

I tend to prefer a separate executable. There is no need to keep it in
memory constantly; a cron job will do. Being separate, its bugs won't
affect the server. Being separate, one instance can monitor many MariaDB
servers. It can be upgraded separately, and it's not tied to the server
release schedule.

The drawback - it won't be able to grab MariaDB internals easily, which
means it may not report some data that are worth reporting. But to solve
this we can add an I_S table that provides this information. This way
there's no hidden data to report, everything is available from the
SQL. Which is good :)

2. How to send the data.

We'll use HTTP. It seems to be the most universally working transport.
That's what other projects use too: the Uptimes Project uses UDP or
HTTP, Debian Popularity Contest uses SMTP or HTTP.

We *may* want to add SMTP later, if needed.
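
A minimal sketch of preparing such an HTTP submission with the Python standard
library. The URL, the JSON encoding and the report fields are assumptions for
illustration only; the format is not specified here.

```python
import json
import urllib.request


def build_report_request(url: str, report: dict) -> urllib.request.Request:
    """Encode the report as JSON and wrap it in an HTTP POST request."""
    body = json.dumps(report, sort_keys=True).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/json"},
    )


# Hypothetical endpoint and made-up values, for illustration only.
request = build_report_request(
    "http://feedback.example.invalid/post",
    {"uptime": 3600, "plugins": 4},
)
# Actually sending it would be: urllib.request.urlopen(request, timeout=30)
```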

3. Auditing.

How can we prove to paranoid users that we only send what we say we
send, and no potentially private information?

Possible solution:

  HTTP sending should support a proxy (to work behind firewalls), so one
  can install a logging proxy and record all the data sent. On the other
  hand, we'd like to use SSL too, which a logging proxy cannot inspect.

  We can support, besides direct HTTP, a "wget mode" where the data are
  sent by invoking wget (which supports proxies, SSL and --post-file)
  and one could easily replace wget with a simple script that logs
  all the data.
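
A sketch of the wget mode in Python, assuming a POSIX system. The function
names, the URL and the idea of making the command name a parameter are
assumptions; --quiet, --output-document and --post-file are regular wget
options.

```python
import subprocess
import tempfile
from typing import List


def wget_argv(command: str, post_file: str, url: str) -> List[str]:
    """Build the command line; `command` is wget or a drop-in replacement."""
    return [command, "--quiet", "--output-document=-",
            "--post-file=" + post_file, url]


def send_report(report: bytes, url: str, command: str = "wget") -> int:
    """Write the report to a temporary file and hand it to `command`."""
    # Assumes POSIX, where the child can open the file while we hold it.
    with tempfile.NamedTemporaryFile(suffix=".report") as handle:
        handle.write(report)
        handle.flush()
        return subprocess.run(wget_argv(command, handle.name, url)).returncode
```

An auditing user would pass the path of their own script as `command`; the
script receives the same arguments and can log the posted file before
forwarding it, or not forward it at all.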

4. What to report.

That's the most interesting part :)

Note that not everything below is collected in MariaDB now, but I
describe the ideal case: what would be useful to know to steer MariaDB
development in the right direction.

The principle I used was not "let's grab as much as we can" but "on a
need-to-know basis". For example, we may need to decide whether to
optimize huge IN (...) lists or GIS first. Knowing which is used more
often would help to make the correct decision.

 hardware: CPU, RAM
 OS (linux distribution, kernel)
 mariadb version, memory usage
 parts of config (e.g. buffer sizes)
 list of installed plugins
 number of databases, max/avg number of tables in a database,
 max/avg db/table size
 uptime
 something that indicates the load, e.g. average qps
 how much a particular feature is used:
   Com_ counters from SHOW STATUS
   plugin usage counters
   per feature, like GIS, replication, etc.
   per query parts, like ORDER BY, subquery in the FROM, IN subquery ...
   how useful is query cache (hit ratio?)
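
Two of the items above, average qps and the query cache hit ratio, can be
derived from SHOW GLOBAL STATUS. A sketch in Python, assuming the status rows
were already fetched into a dict; derived_metrics() is a hypothetical helper,
and the hit ratio formula is a common approximation (Com_select does not count
queries answered from the cache), not an agreed definition.

```python
def derived_metrics(status: dict) -> dict:
    """Compute load indicators from SHOW GLOBAL STATUS name/value pairs."""
    uptime = int(status["Uptime"])          # seconds since server start
    questions = int(status["Questions"])    # statements sent by clients
    hits = int(status["Qcache_hits"])       # SELECTs served from the cache
    selects = int(status["Com_select"])     # SELECTs that were executed
    lookups = hits + selects
    return {
        "avg_qps": questions / uptime if uptime else 0.0,
        "qcache_hit_ratio": hits / lookups if lookups else 0.0,
    }


# Made-up sample values, for illustration only.
sample = {"Uptime": "1000", "Questions": "5000",
          "Qcache_hits": "300", "Com_select": "700"}
metrics = derived_metrics(sample)
```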

What else ?

Regards,
Sergei



Re: [Maria-developers] Phone home

2010-09-09 Thread Adam M. Dutko
> So, Phone Home or MySQL feedback daemon or better name wanted
> feature.


Maybe call it "Butler"? Just a thought...

> Not unlike the Uptimes Project or Debian Popularity Contest.


Opt-in only with an easy disable option after opting in... correct?


> The complete specs will be here:
> http://askmonty.org/worklog/Server-Sprint/?tid=12


I imagine the following ...

  (optionally by user) geographic location
  (optionally by user) user information / company name
  (optionally by user) Monty Program Ab customer support contract id

won't be shown to everyone, correct?  So maybe a filtered public versus
unfiltered private view?


> 1. Should that be a MariaDB plugin or a separate executable ?


A separate executable would probably be the best for the reasons you
highlight in your first paragraph.  The drawbacks are probably covered by
the fact that 1) if a user is having that awful of a time, they are probably
able to step through the executing code or 2) the user probably has a
support contract with a company that can step through the code and debug the
problem.  Granted more in depth statistics would be useful, but maybe it
would make sense to have a separate project to create a loadable module that
would be more invasive.  This tool seems to be oriented towards usage and
usage related data, not necessarily troubleshooting/fixing.


> 2. How to send the data.


I imagine that if the code is written with this in mind, it should be easy to
switch out the transport (read: transmission method) layer at a later
time, unless the person coding it ties the data formatting and
submission process tightly to the protocol.
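
One way to keep the two apart, sketched in Python: the report is formatted
once, and the transport is just a callable that receives the finished bytes.
All names here (Transport, format_report, submit, RecordingTransport) and the
JSON encoding are assumptions for illustration.

```python
import json
from typing import Callable, List

# A transport is anything that can deliver an already formatted payload.
Transport = Callable[[bytes], None]


def format_report(report: dict) -> bytes:
    """Formatting knows nothing about how the bytes will be sent."""
    return json.dumps(report, sort_keys=True).encode("utf-8")


def submit(report: dict, transport: Transport) -> None:
    """Format once, then hand the payload to whichever transport is given."""
    transport(format_report(report))


class RecordingTransport:
    """Stand-in transport that only keeps what it was asked to send."""

    def __init__(self) -> None:
        self.sent: List[bytes] = []

    def __call__(self, payload: bytes) -> None:
        self.sent.append(payload)


recorder = RecordingTransport()
submit({"uptime": 1}, recorder)
```

An HTTP or SMTP transport would be another callable with the same signature,
so adding one later would not touch the formatting code.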

> 3. Auditing.


I think the proxy idea, as well as the wget mode are great ideas.  If the
user isn't paranoid and doesn't want to sniff traffic one could also
provide a log of all activities and a separate log for all messages.


> 4. What to report.
>
>  hardware: CPU, RAM


maybe disk speeds? and type?  (SATA vs SAS vs IDE)


>  OS (linux distribution, kernel)


any libraries?


>  number of databases, max/avg number of tables in a database,


the slightly insane might also run multiple instances on a single machine,
so what about checking for other installations?



Just a few thoughts, hopefully they're not distracting or useless.

-Adam