On 08/10/12 15:44, Yaron Koren wrote:
> Hi,
>
> I'm finally trying out SMWSQLStore3, and figuring out how to get my
> extensions compatible with it, so I'm looking at the new table structure
> for the first time.
>
> First of all, is there any documentation about the design decisions that
> went into this new structure? Because it could be that this issue has
> been answered already.

Hi Yaron,

I don't think we have a comprehensive documentation yet (Nischay may 
correct me though). You already have a good idea of certain things; I 
will try to further clarify this below (I think there are some 
misunderstandings).

>
> If not - basically, the way the new database structure seem to work is
> that there's a separate table for each special property.

That is not the case. You are right that the new structure supports 
tables that are used for one property only. This has some advantages:

* The property is determined from the table; no need to store the 
property in each row -- less storage space, smaller indexes, less memory
* Changes for that property can be written independently of changes for 
other properties. For example, modification date changes on each edit, 
yet we do not need to update other properties/tables on each edit -- 
reduced write activity
* The table is smaller and more specific than general-purpose 
property-value tables. This gives MySQL (or any DB) a better chance for 
guessing selectivity when doing join optimisations in queries. -- faster 
query execution

There is a limit on how many tables a DBMS is happy with, but 30 tables 
are not a problem. Reading all data for one subject can become slower if 
the data resides in many tables, but reading data for a particular 
property should be faster, esp. if the respective property is used a lot 
(small tables fit into memory).

Now to your original question: it is not the case that all special 
properties have their own tables. The two things are independent: 
special properties can be stored in the common catch-all tables and 
normal properties can have their own table. It is just that SMW by 
default provides special tables for its own special properties. This 
includes SF properties that are heavily used (this dependency between 
SMW and SF has been there in similar form for many versions; properties 
used by SF all the time [on every page build] always had their datatype 
hardcoded in SMW).

There is a mechanism for users to create tables for "important" 
properties at their own discretion. I am not sure if this is documented yet.

> "Creation date"
> and "Modification date" both have one, as do "Has improper value for",
> and probably some others. Interestingly, there are also tables for the
> Semantic Forms special properties "Has default form" and "Has alternate
> form".
>
> This strikes me as strange design, for a few reasons:
>
> - I assume that this is done to speed up querying; but, as far as I
> know, many of SMW's special properties - like "Creation date" and the
> rest - are rarely queried on.

See above. Querying is only one aspect. Putting rarely used/rarely 
updated data into its own table allows this data to be disregarded in 
many cases, thus taking load off the rest of the DB.

>
> - This could lead to an explosion of database tables - it looks like
> there are about 30 in the new structure, which some might consider an
> explosion already, and if there are a bunch more special properties
> added for metadata (like "last author", "first author", etc.) the number
> could just keep growing indefinitely.

30 should not be a problem for any DBMS, and tables are not added 
automatically when special properties are added.

>
> - It's understandable that Semantic Forms would get special handling,
> but still, having one extension handle things for another introduces
> dependency issues. What if Semantic Forms got other special properties?
> Or what if, say, "Has alternate form" were removed? It could potentially
> lead to compatibility problems.

We can think about how to reduce this dependency. SF could probably 
control the property tables it wants to use by itself. However, I would 
recommend having tables for all of its frequently used properties.

>
> - On that note, SF already has two special properties that don't have
> their own table - "Page has default form" and "Creates pages with form",
> both of which can apply to forms just like "Has default form" and "Has
> alternate form" do. Not that I'm suggesting that SMW should get two more
> tables for these, but on the other hand, I believe SF queries on these
> on a fairly regular basis.

Then it would probably be good to have extra tables.

>
> So what could be done instead? I can think of a few options:
>
> - Store all properties, special and otherwise, in the same set of
> tables, and make better use of indexing to speed up queries.

We already make better use of indexing anyway, to the extent that we 
know how ;-). But MySQL is mainly using table-based selectivity 
measures, so that join optimisation is not very good if everything is 
stored in only a few tables.

The other motive for having many tables is to split frequently changing 
data from infrequently changing data to allow for better update control 
(fewer writes, smaller writes).

>
> - Create tables for general use by all special properties. In the new
> design, SMW has a separate table for each property type, but for special
> properties I think there are really only three types that have been
> used: "Page", "Date" and "String". So it could be that only three
> special property tables would be required. Again, indexing could be used
> here.

That would introduce a distinction between special properties and normal 
properties on the database level. We do not think that this is justified 
(as you argued, there can well be many "special" properties such as 
"first author" that are not so different from normal properties; storing 
them in the same table as "has type" would mix up requirements).

>
> - If Semantic Forms specifically is slowing things down with its
> querying, maybe that extension should change its own handling. It
> doesn't necessarily even need to use SMW to store information on "Has
> default form" and the rest - it could use the standard MediaWiki
> page_props table. SF's special properties are never queried on alongside
> regular properties, so I don't think that would have a negative impact
> on users. That could be fairly easily done (although the use of SMW
> would be preserved for backwards compatibility, probably with the aid of
> a LocalSettings.php setting). I don't know if that would improve
> performance at all, but it might be worth trying out.

I don't know of any problem with SF. It would of course be faster if you 
would store all SF data for one page in one value, instead of storing 
independent values for many properties. But then you would no longer be 
able to query for the pages that use a certain form with #ask. One could 
have both (store a blob with all information in one value and store each 
value independently for queries), but I am not sure that this would be 
worth the effort.

>
> - For that matter, maybe many of the other special properties could be
> stored using the page_props table instead. SMW has always been used to
> store metadata just because it was convenient to do - but if that
> storage is leading to significant performance problems, maybe page_props
> is the faster and (at least to the extent that it reduces table clutter)
> easier alternative.

I am not aware of performance problems related to special property reads 
in SMW. If SMW reads many special property values, then they come from 
many pages; no single page has very many special property values. So 
using page_props would probably not change much in terms of overall read 
counts.

The new system also has a way to recognize when a property has no value 
for a certain subject without looking at its table. So the common case 
that a property has no value (e.g., no "allowed values") could be 
optimised further (tbd).

Markus


------------------------------------------------------------------------------
Don't let slow site performance ruin your business. Deploy New Relic APM
Deploy New Relic app performance management and know exactly
what is happening inside your Ruby, Python, PHP, Java, and .NET app
Try New Relic at no cost today and get our sweet Data Nerd shirt too!
http://p.sf.net/sfu/newrelic-dev2dev
_______________________________________________
Semediawiki-devel mailing list
Semediawiki-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/semediawiki-devel

Reply via email to