Re: [SMW-devel] [PATCH] Support LIKE in queries
Dan - I doubt that there will ever be both a regex and a wildcard option in SMW's query language; that seems like overkill, and somewhat bad design. A single such option is enough, and if it happens, behind the scenes, to use both SQL's and PHP's pattern-matching capabilities at different times, that should be hidden from the user. So I doubt that there'll be a need for two different symbols (Markus, or anyone else, correct me if I'm wrong).

So, let me argue in favor of the ~ symbol - hopefully it's not too late before the Sunday evening deadline. :) The Halo extension is a helpful one, but it's a spinoff of SMW, and thus there's no reason why it should hamper design decisions in SMW. That goes for all extensions that use Semantic MediaWiki - I know, for my own part, that the extensions I've created have to do all sorts of work to be compatible with the different versions of SMW. That's as it should be: the spinoffs work around the main application. From what I understand, Halo is currently not compatible with the most recent versions of SMW, so it needs to be modified anyway; there's no need to try to ensure backwards compatibility. And, as you point out, that functionality in Halo might not be getting used at all - though even if it were, that shouldn't affect how SMW is designed.

-Yaron

On Dec 29, 2007 9:54 PM, DanTMan [EMAIL PROTECTED] wrote:

^_^ OK, I thought we escaped with a \, which isn't something that normal users would find easy to use. But a starting-space escape is OK. I would still pick ~ as the best symbol for REGEX and prefer a different operator for wildcards; I guess % is probably best for the wildcard operator. Which brings me to the thought of:

EQ: [[property::value]]
NEQ: [[property::!value]]
GT: [[property::>value]]
LT: [[property::<value]]
WILD: [[property::%value]] (using ? and *)

Also, I propose a few more additions, since they will probably be of good use too:

GTEQ: [[property::>=value]]
LTEQ: [[property::<=value]]
NWILD: [[property::!%value]] (negated wildcard)
REGEX: [[property::~value]] or perhaps [[property::~/value/i]] (/ could of course be replaced with !, [], etc.; any delimiter valid in preg)
NGT: [[property::#>value]] (natural order greater than)
NLT: [[property::#<value]] (natural order less than)
NGTEQ: [[property::#>=value]] (natural order greater than or equal to)
NLTEQ: [[property::#<=value]] (natural order less than or equal to)

Of course, the REGEX one assumes that we can fix the issue of colliding with Halo.

About the negated wildcard: I added that one for one primary reason. Unlike any of the other operators, a wildcard cannot be negated with any other form (> can be negated with <=, EQ with !, and a regex can negate things inside itself, but you can't negate a wildcard). Also, remember to escape things so that \* and \? can be used to match those characters literally; I could draft all the replacements needed, but I have to go do something first.

As for the natural-order ones, if you don't know what they are for: consider values like 1.2.3 and 1.12.3. A normal string comparison thinks that 1.2.3 is greater than 1.12.3, because the third character of one is a 2 and the third character of the other is a 1. A natural-order comparison properly reads the second number as 12. PHP has built-in functions for this, and they would be nice to use.

~Daniel Friesen (Dantman) of:
-The Gaiapedia (http://gaia.wikia.com)
-Wikia ACG on Wikia.com (http://wikia.com/wiki/Wikia_ACG)
-and Wiki-Tools.com (http://wiki-tools.com)

Markus Krötzsch wrote:

On Samstag, 29. Dezember 2007, DanTMan wrote:

A lot of people are accustomed to the ? (single-character match) and * (multi-character match) format. It would be easy to escape the '_'s and '%'s in a match and then replace ? with _ and * with %. (A little preg and \ could still easily escape those.)

Yes, I agree to that. I think, if nobody objects, this fixes the pattern syntax.
So it remains to find a good symbol for the comparator.

I don't know about ~ though; in the languages I've used, I recall ~ having something to do with regex. I'd rather save that character in case we want to be able to use REGEXP matching inside SQL. From what I remember, I think most people with only a little technical insight would adjust most easily to this set:

= Equals
> Greater than
< Less than
>= Greater than or equal to
<= Less than or equal to
! Not
* Multi-character match
? Single-character match
~ regex

As a note: = is not available in the parser function #ask, since it has a special meaning as parameter assignment, as e.g. in format=table. The query is distinguished from the other parameters and print requests in #ask since it has no = symbol and does not start with ?.

But I did have a thought about the @... It's not used anywhere afaik. I did make a suggestion on using a pattern to separate the
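A minimal sketch of how prefix comparators like the ones discussed above might be parsed and evaluated, including the '#' natural-order variant (the NGT/NLT family) that DanTMan motivates with 1.2.3 vs. 1.12.3. The prefix table, the function names, and the idea of compiling a value into a predicate are assumptions for illustration only, not SMW's parser; the >= and <= entries stand in for DanTMan's GTEQ and LTEQ, even though, as Markus notes, = clashes with parameter assignment in #ask. Natural order here just compares digit runs as integers, roughly in the spirit of PHP's strnatcmp.

```python
import operator
import re
from typing import Callable, List, Tuple, Union

# Hypothetical prefix table modelled on DanTMan's proposal. Longer prefixes
# come first so that ">=" is not mistaken for ">".
COMPARATORS: List[Tuple[str, Callable[[object, object], bool]]] = [
    (">=", operator.ge),
    ("<=", operator.le),
    (">", operator.gt),
    ("<", operator.lt),
    ("!", operator.ne),
]

_DIGIT_RUNS = re.compile(r"(\d+)")


def natural_key(text: str) -> List[Union[str, int]]:
    """Split text so that runs of digits compare as integers ("natural order")."""
    parts = _DIGIT_RUNS.split(text)
    # With one capturing group, re.split puts the digit runs at odd indices,
    # so strings and integers line up position by position between two keys.
    return [int(part) if i % 2 else part for i, part in enumerate(parts)]


def make_condition(raw: str) -> Callable[[str], bool]:
    """Compile a value such as '#>=1.2.3' into a predicate over stored strings.

    A leading '#' selects natural order; no comparator prefix means equality.
    """
    natural = raw.startswith("#")
    if natural:
        raw = raw[1:]
    compare, operand = operator.eq, raw
    for prefix, op in COMPARATORS:
        if raw.startswith(prefix):
            compare, operand = op, raw[len(prefix):]
            break
    key = natural_key if natural else str
    return lambda stored: compare(key(stored), key(operand))


# Plain string order puts "1.12.3" below "1.2.3"; natural order does not.
print(make_condition(">1.2.3")("1.12.3"))   # False
print(make_condition("#>1.2.3")("1.12.3"))  # True
```

Compiling each condition once into a predicate keeps the prefix parsing out of the per-value loop; a real implementation would of course push as much of this as possible into SQL.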
Re: [SMW-devel] Performance: (Was: {{#ask}})
On Sonntag, 30. Dezember 2007, Sergey Chernyshev wrote:

[snip]

In fact we have such a site, but it runs on rather unstable hardware (we have a buggy RAID controller or driver :-(). It is our test server at test.ontoworld.org, which was also used for other experiments and is not in perfect shape right now (querying was disabled in order not to impair other experiments). We might set up another, more recent Wikipedia copy sometime in the future.

Great, I'll definitely take a look. What were your main discoveries with English WP and SMW? How many Semantic Templates did you make? I'd say that I'm mostly interested in volume and performance on relatively simple queries, since this is what most large projects would need, although complex queries are also interesting.

Results of a small evaluation on this site are reported in [1]. The problem is that detailed results can change quickly with software versions (SMW changes, of course, but so does MySQL). So things should be taken with a grain of salt.

It would be quite interesting to see the latest WP with the latest SMW to understand the biggest pain points.

The biggest pain IMHO is the unpredictability of query times. It turns out that most queries are rather fast even on big sites, while some are slow for no apparent reason. If we could just limit SQL-side query execution and report timeout errors if needed, things would be much simpler. But I do not know how to achieve that.

Markus

[1] http://korrekt.org/index.php/Semantic_Wikipedia_%28JWS2007%29

Since we're talking about performance, there is another side of performance tuning: perceived performance. This mostly concerns JavaScript, CSS and so on. For example, there is still the problem of SIMILE Timeline being slow to load (although the performance of pages that don't use it has improved now that the client-side code is loaded only on pages that need it). Issues of this kind can be tracked using Firebug with Yahoo's YSlow add-on.
True, and I hope Timeline is really the main performance problem there. I wonder whether we could ship a more stripped-down version of the scripts to decrease load time. I guess we should ask the guys over at SAIL for that ...

Can you describe the modifications you made to the original Timeline code? I used it some time ago and took a closer look at how their code is bundled; I might be able to help migrate it to the new version. BTW, it might make sense to keep the SMW code separate from theirs to make such upgrades easier. I'll be happy to run the tests on a system with a significant amount of data if you need a testbed.

All profiling support is appreciated, but I am not sure how to operationalise testing on our servers (SQL profiling would probably need server access, which is not possible in this case). Insights on JavaScript performance are also useful, but I guess that MySQL tuning could be most important for approaching large sites. If you know about DB optimisation, you can also have a look at our DB layout and at the SQL queries we generate (format=debug).

Actually, I was talking about standardizing some test process that all of us can run on our servers to compare performance and settings, help each other derive best practices in terms of performance, and maybe even find bottlenecks.

Sergey

Thanks,
Markus

--
Markus Krötzsch
Institut AIFB, Universität Karlsruhe (TH), 76128 Karlsruhe
phone +49 (0)721 608 7362, fax +49 (0)721 608 5998
[EMAIL PROTECTED], www http://korrekt.org

-
This SF.net email is sponsored by: Microsoft
Defy all challenges. Microsoft(R) Visual Studio 2005.
http://clk.atdmt.com/MRT/go/vse012070mrt/direct/01/
___
Semediawiki-devel mailing list
Semediawiki-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/semediawiki-devel
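Sergey's idea of a standard test process could start from something as simple as the harness below, which times each named query several times and reports both the median and the worst case, since unpredictable query times are the main pain point Markus mentions. This is only a sketch: how a query is actually issued (a database call, an HTTP request to the wiki, etc.) is left to a caller-supplied function, and all names here are hypothetical.

```python
import statistics
import time
from typing import Callable, Dict


def benchmark(queries: Dict[str, Callable[[], object]], runs: int = 5) -> Dict[str, dict]:
    """Run each named query `runs` times; report median and worst wall-clock time (seconds)."""
    if runs < 1:
        raise ValueError("runs must be at least 1")
    report: Dict[str, dict] = {}
    for name, run_query in queries.items():
        timings = []
        for _ in range(runs):
            start = time.perf_counter()
            run_query()  # e.g. a database call or an HTTP request to the wiki
            timings.append(time.perf_counter() - start)
        report[name] = {
            "runs": runs,
            "median": statistics.median(timings),
            "max": max(timings),  # the worst case exposes unpredictable query times
        }
    return report


if __name__ == "__main__":
    # Stand-in workloads; a real harness would issue the wiki's own queries.
    results = benchmark({
        "small": lambda: sum(range(1_000)),
        "large": lambda: sum(range(100_000)),
    })
    for name, stats in results.items():
        print(f"{name}: median {stats['median']:.6f}s, max {stats['max']:.6f}s "
              f"over {stats['runs']} runs")
```

Reporting the maximum alongside the median matters here: a query that is usually fast but occasionally very slow would look fine if only averages were compared across sites.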
Re: [SMW-devel] SMW Performance
On Sonntag, 30. Dezember 2007, Sergey Chernyshev wrote:

Hmm. I didn't realize there is a way to remove the $smwgQDefaultNamespaces restriction, and that this will enable all namespaces instead of disabling them. Why is this setting not set to NULL by default, then? I don't see any point in restricting namespaces unless it's absolutely necessary for security reasons or something.

I agree. Done.

Markus

Sergey

On Dec 29, 2007 10:29 AM, Markus Krötzsch [EMAIL PROTECTED] wrote:

On Freitag, 28. Dezember 2007, Lau, William (NIH/CIT) [E] wrote:

We have a set of semantic queries in a template. That template is used in some pages. However, judging by the database process list, it seems that this set of queries is processed whenever a page is requested, even when the template is not used by the requested page (e.g. special pages).

Do I understand this correctly? The delivery of special pages that are completely unrelated to said template triggers the ask queries contained therein? This would be very strange behaviour indeed (I cannot currently imagine how or why this should happen in MediaWiki)!

All the SQL queries are generated by the getQueryResult function. Since those queries are very computationally intensive, this bug slows down the entire site. If we take the inline queries out of the template or change $smwgQEnabled to false, the site becomes fast again. Has anyone experienced the same issue?

In general, if queries on some site are too slow, it is useful to configure SMW to support faster querying (with fewer features, of course).
Basic settings one can try to speed up querying are:

    include_once('extensions/SemanticMediaWiki/includes/SMW_Settings.php');
    $smwgQSubcategoryDepth = 0;
    $smwgQSubpropertyDepth = 0;
    $smwgQEqualitySupport = SMW_EQ_NONE;
    $smwgQDefaultNamespaces = NULL;
    enableSemantics('semedia-wiki.localhost');

Those settings will speed up basically all queries by disabling all support for property and category hierarchies, equality (redirects), and namespace restrictions (i.e. queries consider pages in all namespaces, including, e.g., User:). You can experiment to find out which of those, if any, improves your query performance. If you have problems with overly complex user-generated queries, then the parameters $smwgQMaxSize and $smwgQMaxDepth are an option to restrict this.

In general, it should be emphasised that queries should be used in a targeted way. Ontoworld.org had the infamous template {{ask}} for some time, which included queries for almost anything; each of them would simply not appear if it returned no results. Most wikis should rather have individual query templates for specific purposes instead of trying to have one for everything.

Anyway, for further optimisation, we need some pointer to your site, or at least some statistical information concerning its size (Special:SemanticStatistics) and the query structure. Did you mention the SMW version you use? Some of the above assume SMW1.0-RC3, and none will work prior to SMW1.0-RC1.

Markus
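To make the $smwgQMaxSize / $smwgQMaxDepth idea concrete, here is a simplified model of a query as a list of conditions, each of which may carry a nested subquery. Size is counted as the total number of conditions and depth as the deepest nesting level. These counting rules, the property "located in", and all function names are assumptions chosen for illustration; SMW's own accounting (and what it does when a limit is exceeded) may differ.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Condition:
    """One query condition; `subquery` holds any nested conditions (hypothetical model)."""
    text: str
    subquery: List["Condition"] = field(default_factory=list)


def query_size(conditions: List[Condition]) -> int:
    """Total number of conditions, including those inside subqueries."""
    return sum(1 + query_size(c.subquery) for c in conditions)


def query_depth(conditions: List[Condition]) -> int:
    """Deepest nesting level: a flat query has depth 1, an empty one depth 0."""
    return max((1 + query_depth(c.subquery) for c in conditions), default=0)


def check_limits(conditions: List[Condition], max_size: int, max_depth: int) -> List[str]:
    """Return human-readable problems; an empty list means the query is within limits."""
    problems = []
    size, depth = query_size(conditions), query_depth(conditions)
    if size > max_size:
        problems.append(f"query has {size} conditions (limit {max_size})")
    if depth > max_depth:
        problems.append(f"query nesting depth is {depth} (limit {max_depth})")
    return problems


# Cities that are located in something in Category:Country (two levels deep).
query = [
    Condition("Category:City"),
    Condition("located in", [Condition("Category:Country")]),
]
print(query_size(query), query_depth(query))  # 3 2
print(check_limits(query, max_size=2, max_depth=1))
```

Checking limits before any SQL is generated is the cheap part; the hard part Markus describes elsewhere in the list, bounding the actual SQL execution time, is not addressed by this.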
[SMW-devel] Updates: Images, LIKE-search, sorting tables
A brief feature update on some further things that just made it into the soon-to-be-released SMW1.0:

* Properties with Type:Page will now display values in the Image namespace by showing the actual images, at least in Factboxes and #ask query results. [Contribution by Yaron Koren]

* There is an optional pattern-matching string search for properties of Type:String. For instance, the query atom [[property::% Semantic*]] matches anything with a value for property that starts with "Semantic". Since this feature might be costly, it is not enabled by default, but it can be enabled with a configuration option (see SMW_Settings.php). [Contribution by Thomas Bleher]

* Query result tables now sort all kinds of values correctly, especially dates and formatted numerical values (possibly with a unit). So far, the JavaScript would sometimes wrongly sort such values alphabetically. [Contribution by Tobias Matzner]

* There is a new RSS feed feature that allows you to subscribe to query results. Apart from the associated performance issues of thousands of users re-issuing queries regularly, this should be an extremely cool thing. It works simply by creating an inline query with the new format rss, which creates a link to an RSS feed. The parameters rsstitle and rssdescription can be used to describe the feed, and searchlabel is available to customise the link text. Print requests are largely ignored right now, unless they use the label creator or date, in which case they are used to fill the corresponding RSS fields. Otherwise, the last wiki editor and the modification date are used. An example can be found at the top of [1]. Configuration parameters are available for enabling/disabling this feature and for choosing between complete embedding of articles and plain result lists. Note that this is also very nice in conjunction with Firefox's dynamic bookmark feature, enabling browser bookmarks that link to recent query results from your favourite wiki. [Work by Denny and myself]

Thanks to all contributors for code, helpful discussions, and comments on improving SMW. Unless something terrible happens, SMW1.0 will be released in 2007.

Cheers,
Markus

[1] http://ontoworld.org/wiki/Upcoming_events (rather short right now; we plan to add some more events next year ...)
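For the pattern search, the approach agreed in the "[PATCH] Support LIKE in queries" thread was to escape SQL's own wildcards (% and _) and then map ? to _ and * to %, with \* and \? standing for literal characters. A minimal sketch of that translation follows. It is not Thomas Bleher's actual patch; the table name attrs is made up, and SQLite stands in for MySQL only so the example runs on its own (in MySQL, backslash is already the default LIKE escape character, and string-literal escaping differs, so the ESCAPE clause would be written differently or omitted).

```python
import sqlite3

_LIKE_SPECIAL = "%_\\"  # characters that must be escaped inside a LIKE pattern


def _literal(ch: str) -> str:
    """Escape one character so that LIKE matches it literally."""
    return "\\" + ch if ch in _LIKE_SPECIAL else ch


def wildcard_to_like(pattern: str) -> str:
    """Translate a ?/* wildcard pattern into a LIKE pattern using backslash as escape.

    '*' becomes '%', '?' becomes '_', and a backslash makes the next character
    literal, so '\\*' and '\\?' match a real asterisk or question mark.
    """
    out = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\" and i + 1 < len(pattern):
            out.append(_literal(pattern[i + 1]))  # escaped character is literal
            i += 2
            continue
        if ch == "*":
            out.append("%")
        elif ch == "?":
            out.append("_")
        else:
            out.append(_literal(ch))
        i += 1
    return "".join(out)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE attrs (value TEXT)")  # hypothetical table, not SMW's schema
conn.executemany(
    "INSERT INTO attrs VALUES (?)",
    [("Semantic MediaWiki",), ("Semantic Web",), ("MediaWiki",)],
)
# Binding the pattern as a parameter keeps user input out of the SQL text.
rows = conn.execute(
    "SELECT value FROM attrs WHERE value LIKE ? ESCAPE '\\' ORDER BY value",
    (wildcard_to_like("Semantic*"),),
).fetchall()
print(rows)  # [('Semantic MediaWiki',), ('Semantic Web',)]
```

Escaping before translating is what lets users search for a literal % or _ without them being treated as SQL wildcards.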
Re: [SMW-devel] {{#ask}}
My point is that SMW's mission statement is to create a Semantic Wikipedia - that's why I'm saying this. Or am I wrong?

Sergey

On Dec 30, 2007 4:54 AM, cnit [EMAIL PROTECTED] wrote:

I'm not sure if restricting Ask functionality is in line with Wikipedia policies - it's not a modification operation, therefore it should be public, I believe.

Sure, but my MW sites aren't Wikipedia. Also, imagine how resource-intensive it would be on huge datasets with lots of properties and property values defined.

I agree that abuse blocking and request throttling might be a solution here, but in general I wouldn't recommend restricting access, but rather functionality, e.g. a limited number of joins or something like that.

Maybe you're right. It would be nice to have the settings mentioned by Markus ($smwgQSubpropertyDepth, $smwgQEqualitySupport) separately for anonymous users and the rest of the users. The problem is that such queries might be unavailable sometimes.

This kind of action is generally hard to limit and predict, and is therefore quite easy to abuse. This might be a serious bottleneck in SMW adoption by Wikipedia.

Dmitriy

--
Sergey Chernyshev
http://www.sergeychernyshev.com/
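The request throttling mentioned above, combined with Dmitriy's wish for different treatment of anonymous users, could look roughly like this: a sliding-window limiter that allows at most a given number of queries per client within a time window, with a tighter budget for anonymous clients. The class, the limits, and keying anonymous users by IP address are all hypothetical; nothing like this is described as existing in SMW in this thread.

```python
import time
from collections import defaultdict, deque
from typing import Callable, Deque, Dict


class QueryThrottle:
    """Allow at most `limit` queries per `window` seconds for each client key."""

    def __init__(self, limit: int, window: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.limit = limit
        self.window = window
        self.clock = clock  # injectable so behaviour can be tested without sleeping
        self._history: Dict[str, Deque[float]] = defaultdict(deque)

    def allow(self, client: str) -> bool:
        """Record and allow a query for `client`, or refuse it if over budget."""
        now = self.clock()
        history = self._history[client]
        # Forget queries that are older than the window.
        while history and now - history[0] >= self.window:
            history.popleft()
        if len(history) >= self.limit:
            return False
        history.append(now)
        return True


# Hypothetical policy: anonymous clients (keyed by IP) get a tighter budget.
fake_now = [0.0]
anonymous = QueryThrottle(limit=2, window=60.0, clock=lambda: fake_now[0])
registered = QueryThrottle(limit=20, window=60.0, clock=lambda: fake_now[0])

first_three = [anonymous.allow("203.0.113.7") for _ in range(3)]  # third one is refused
registered_ok = all(registered.allow("SomeUser") for _ in range(20))
fake_now[0] = 61.0
after_window = anonymous.allow("203.0.113.7")  # the window has passed, allowed again
print(first_three, registered_ok, after_window)
```

A sliding window was chosen over a fixed per-minute counter so that a client cannot double its budget by sending bursts on either side of a minute boundary.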