On 14/01/12 14:33, James Hong Kong wrote:
> Hi,

Hi James,

>
> Starting with SMW 1.7 and MW 1.18, we began to convert our old legacy
> document system into an SMW-MW based system, which has so far left us
> with more than 700.000 triplets stored in SMW but at the same time
> degraded our response time on SMW-related queries.
>
> Somewhere around 200.000 triplets (this does not mean the number is a
> threshold) we noticed an increased impact on query performance,
> and now we feel the pinch every time we execute a query. We are not
> talking about in-template query performance as seen in the
> Wikia/Familypedia example (we abandoned such practices some time ago).
> Nowadays we encourage users to execute all complex queries either via
> Special:Ask or via an input form that runs a RunQuery, and yes, we are
> using APC to improve caching and response time in general.
>
> We tried to look at external solutions such as 4Store, which is not
> supported on Windows; Virtuoso, which has no real documentation available
> to make it work with SMW (at least we couldn't find any); and Jena, which
> seems to require SMW+. That leaves us with the native SMW store itself, and
> we would like to keep it that way, as every external software means an
> additional fault point and maintenance effort.

Getting Virtuoso to work properly is my next goal, but you are right 
that there is no official support there yet. There are already hacks to 
get it to work, but they have not been integrated into SMW so far.

>
> == Architectural question ==
>
> #1 Could there be an indexing problem in one of the primary
> SMW table key indexes?

Possibly, but none that I am aware of. Did you find out anything about 
the queries that cause the problems and the indexes that they are using? 
If you think that more/different indexes would help, you can also add 
or change them manually on the SMW tables to see if this makes a 
difference (though running SMW_setup.php would undo these changes).
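
As a rough sketch of that kind of check, the following compares the query 
plan for one lookup before and after adding an index. It uses Python's 
built-in sqlite3 module purely so that it runs without a server; on MySQL 
you would run EXPLAIN on the slow query instead. The table "triples", its 
columns, and the index name are hypothetical stand-ins, not the real SMW 
schema.

```python
import sqlite3

def plan(conn, sql, params=()):
    """Return the query plan for `sql` as one readable string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The last column of each plan row holds the human-readable detail.
    return " | ".join(row[-1] for row in rows)

conn = sqlite3.connect(":memory:")
# Hypothetical stand-in for a triple table (subject, property, object ids).
conn.execute("CREATE TABLE triples (s_id INTEGER, p_id INTEGER, o_id INTEGER)")
conn.executemany(
    "INSERT INTO triples VALUES (?, ?, ?)",
    [(i, i % 50, i % 7) for i in range(10000)],
)

sql = "SELECT s_id FROM triples WHERE p_id = ? AND o_id = ?"

before = plan(conn, sql, (3, 4))  # no index yet: expect a full table scan
conn.execute("CREATE INDEX idx_triples_po ON triples (p_id, o_id)")
after = plan(conn, sql, (3, 4))   # the planner can now use the new index

print("before:", before)
print("after: ", after)
```

The exact wording of the plan differs between database engines and 
versions, so compare whether an index is named at all rather than the 
literal text.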

>
> #2 Does SMW natively support MySQL's internal
> query-cache-type/query-cache-size options to improve query performance?
> We made sure MySQL is using the query-cache-type/query-cache-size options,
> but somehow this doesn't show any effect for SMW-related queries.

SMW does not do anything with query-cache-type/query-cache-size. So it 
should not overwrite your global settings in this respect, but maybe the 
performance problem is not caused there.
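
One way to see whether the MySQL query cache is being hit at all is to look 
at the server counters from SHOW GLOBAL STATUS, in particular Qcache_hits 
and Com_select (SELECTs answered from the cache are counted in Qcache_hits 
and not in Com_select). A minimal sketch of the arithmetic, with made-up 
sample numbers and a hypothetical helper name:

```python
def query_cache_hit_rate(status):
    """Fraction of SELECTs that were answered from the query cache.

    `status` maps MySQL status variable names to their values, e.g. as
    read from SHOW GLOBAL STATUS (values often arrive as strings).
    """
    hits = int(status["Qcache_hits"])
    selects = int(status["Com_select"])  # SELECTs that were actually executed
    total = hits + selects
    return hits / total if total else 0.0

# Made-up numbers for illustration only, not measurements from a real wiki.
sample = {"Qcache_hits": "120", "Com_select": "880"}
rate = query_cache_hit_rate(sample)
print(f"query cache hit rate: {rate:.1%}")
```

A rate near zero would suggest the cached results are rarely reused, for 
example because any write to a table invalidates the cached results for 
that table.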

SMW does have a simple Concept mechanism to manually manage query caches 
(kept in a database table). If you have a particularly common/heavily 
used query or query-part, then this could be an option. If you have 
thousands of very different queries that do not share similar 
conditions, then this will hardly be feasible.
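
To illustrate the general idea of a manually managed cache for a shared 
query part (this is not SMW's Concept implementation; the class, function, 
and sample data below are all hypothetical), a sketch could look like this:

```python
# Hypothetical sample data: (subject, property, value) triples.
triples = [
    ("Doc1", "has type", "Report"),
    ("Doc1", "has status", "Final"),
    ("Doc2", "has type", "Report"),
    ("Doc2", "has status", "Draft"),
    ("Doc3", "has type", "Memo"),
    ("Doc3", "has status", "Final"),
]

class ConditionCache:
    """Stores the subjects matching one frequently used condition."""

    def __init__(self, prop, value):
        self.prop = prop
        self.value = value
        self.subjects = None  # nothing cached until refresh() is called

    def refresh(self, data):
        # Recompute by hand; the cache does not follow later data changes.
        self.subjects = {s for s, p, v in data
                         if p == self.prop and v == self.value}

    def members(self):
        if self.subjects is None:
            raise RuntimeError("cache not built; call refresh() first")
        return self.subjects

def query(data, cache, prop, value):
    """Subjects in the cached set that also satisfy one extra condition."""
    members = cache.members()
    return sorted(s for s, p, v in data
                  if s in members and p == prop and v == value)

reports = ConditionCache("has type", "Report")
reports.refresh(triples)
final_reports = query(triples, reports, "has status", "Final")
print(final_reports)
```

The trade-off is visible in the sketch: the shared condition is evaluated 
once and reused, but new data only shows up after the next manual refresh.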

>
> #3 Would a different approach to handling query data, namely storing
> query data in a temporary in-memory table, bring advantages compared to
> the current approach of accessing SMW disk tables every time a query
> is executed? Would an in-memory concept for queried data (SMW data is
> mirrored into a temporary in-memory table for READ purposes only for the
> duration of the actual MySQL session, and every time MySQL is restarted
> the temporary in-memory tables have to be rebuilt) improve query and
> access performance for SMW-related triplets? I guess (I don't know) that
> neither MyISAM nor InnoDB would have an impact, since the bottleneck
> seems to be the disk access needed to execute queries on triplets stored
> in SMW-related tables.

It might of course be possible to optimize the MySQL-based query engine 
for better performance. It would also be possible to make use of memory 
caches in some cases, though this needs some thought about how to manage 
these caches. But overall, I would not put too much development effort 
into optimizing MySQL query performance in particular, given that there 
are projects like Virtuoso and 4Store that spend all their time doing 
mainly that. Connecting Virtuoso should not be so hard (mainly we are 
facing some protocol issues that I have not had time to look at yet; if 
Virtuoso supported SPARQL 1.1, it should work out of the box; we are 
mainly talking about proprietary tweaks in the query syntax here).

>
> Of course there is always a way to improve performance by using better
> hardware (RAID, SSD to improve output performance), but this is a last
> resort approach which we would like to avoid for the moment.

Yes, I agree. I will raise the priority of finally getting Virtuoso 
working a bit; there are other open threads on this list related to this.

Regards,

Markus

_______________________________________________
Semediawiki-devel mailing list
Semediawiki-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/semediawiki-devel
