Re: [SMW-devel] [PATCH] Support LIKE in queries

2007-12-30 Thread Yaron Koren
Dan - I doubt that there will ever be both a regex and a wildcard option
in SMW's query language - that seems like overkill, and somewhat bad design.
A single such option is enough, and if it happens, behind the scenes, to use
both SQL's and PHP's pattern-matching capabilities at different times, that
should be hidden from the user. So I doubt that there'll be a need for two
different symbols (Markus, or anyone else, correct me if I'm wrong).

So, let me argue in favor of the ~ symbol - hopefully it's not too late
before the Sunday evening deadline. :) The Halo extension is a helpful one,
but it's a spinoff of SMW, and thus there's no reason why it should hamper
design decisions in SMW. That goes for all extensions that use Semantic
MediaWiki - I know, for my own part, that the extensions I've created have
to do all sorts of work to be compatible with the different versions of SMW.
That's as it should be - the spinoffs work around the main application. From
what I understand, Halo is currently not compatible with the most recent
versions of SMW anyway, so it needs to be modified regardless - there's no need
to try to ensure backwards compatibility.

And, as you point out, that functionality in Halo might not be getting used
at all - though even if it were, that shouldn't affect how SMW is designed.

-Yaron


On Dec 29, 2007 9:54 PM, DanTMan  [EMAIL PROTECTED] wrote:

 ^_^ ok, I thought we escaped with a \, which isn't something that normal
 users would find easy to use. But a starting space escape is ok.

 I still would pick ~ as the best symbol for REGEX and would prefer a
 different operator for wild cards. I guess % is probably best for the
 wild card operator, which brings me to the following:

 EQ:    [[property::value]]
 NEQ:   [[property::!value]]
 GT:    [[property::>value]]
 LT:    [[property::<value]]
 WILD:  [[property::%value]] (using ? and *)

 Also, I propose a few more additions, since they will probably be of
 good use too.

 GTEQ:  [[property::>=value]]
 LTEQ:  [[property::<=value]]
 NWILD: [[property::!%value]] (negated wild card)
 REGEX: [[property::~value]] or perhaps [[property::~/value/i]] (the / could
 of course be replaced with !, [], etc., any delimiter valid in preg)
 NGT:   [[property::#>value]] (natural order greater than)
 NLT:   [[property::#<value]] (natural order less than)
 NGTEQ: [[property::#>=value]] (natural order greater than or equal to)
 NLTEQ: [[property::#<=value]] (natural order less than or equal to)
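Read as a grammar, this proposal is a set of prefixes on the value part of [[property::value]]. The Python sketch below (SMW itself is written in PHP; `parse_comparator` and the operator names are hypothetical, borrowed from the labels in the list) shows one way such prefixes could be recognised: try longer prefixes first, so that `!%` is not mistaken for `!` and `#>=` not for `#>`. It assumes GT/LT are written `>`/`<`, and it treats a single leading space as an escape, following the remark at the start of this message. Both are assumptions, not settled SMW syntax.

```python
# Hypothetical sketch of the comparator prefixes proposed above; not SMW code.
# Each entry maps a prefix of the value part of [[property::value]] to a name.
OPERATORS = [
    ("#>=", "NGTEQ"), ("#<=", "NLTEQ"),
    ("#>", "NGT"), ("#<", "NLT"),
    ("!%", "NWILD"),
    (">=", "GTEQ"), ("<=", "LTEQ"),
    (">", "GT"), ("<", "LT"),
    ("!", "NEQ"), ("%", "WILD"), ("~", "REGEX"),
]

# Longest prefixes first, so "!%" wins over "!" and "#>=" over "#>".
_BY_LENGTH = sorted(OPERATORS, key=lambda entry: len(entry[0]), reverse=True)


def parse_comparator(value: str) -> tuple[str, str]:
    """Split a query value into (operator name, operand)."""
    if value.startswith(" "):
        # Assumed escape: a leading space means "compare this text literally".
        return "EQ", value[1:]
    for prefix, name in _BY_LENGTH:
        if value.startswith(prefix):
            return name, value[len(prefix):]
    return "EQ", value
```

Keeping the operators in a table and matching longest-first means a new comparator only needs a new row, as long as its prefix does not collide with an existing one of the same length.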

 Of course, the REGEX one assumes that we can fix the issue of colliding
 with Halo.
 As for the negated wild card, I added it for one primary reason: unlike
 all the other operators, a wild card cannot be negated with any other
 form. (> can be negated with <=, equality with !, and a regex can
 express negation inside itself, but a wild card has no such counterpart.)
 Also, remember to handle escaping, so that \* and \? can be used to match
 those characters literally. I could draft all the replacements needed,
 but I have to go do something first.
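The escaping described here, combined with the mapping suggested further down in this thread (escape SQL's own % and _, then turn ? into _ and * into %), could be sketched in Python as follows. `wildcard_to_like` and `like_match` are hypothetical helpers, and SQLite stands in for MySQL only so that the sketch runs on its own; LIKE case sensitivity and escape handling vary between databases, so treat this as an illustration rather than SMW's implementation.

```python
import sqlite3


def _literal(ch: str) -> str:
    # %, _ and the escape character \ are special in LIKE ... ESCAPE '\',
    # so they must be escaped to match themselves.
    return "\\" + ch if ch in "%_\\" else ch


def wildcard_to_like(pattern: str) -> str:
    """Translate a user pattern (* = any run, ? = one char) to SQL LIKE.

    A backslash makes the next character literal, so \\* and \\? match a
    real asterisk or question mark.
    """
    out = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\" and i + 1 < len(pattern):
            i += 1
            out.append(_literal(pattern[i]))  # escaped char, taken literally
        elif ch == "*":
            out.append("%")
        elif ch == "?":
            out.append("_")
        else:
            out.append(_literal(ch))
        i += 1
    return "".join(out)


def like_match(values: list[str], pattern: str) -> list[str]:
    """Run the translated pattern against an in-memory table of values."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE attr (value TEXT)")
        conn.executemany("INSERT INTO attr VALUES (?)", [(v,) for v in values])
        rows = conn.execute(
            "SELECT value FROM attr WHERE value LIKE ? ESCAPE '\\' ORDER BY value",
            (wildcard_to_like(pattern),),
        ).fetchall()
    finally:
        conn.close()
    return [row[0] for row in rows]
```

For example, `like_match(["Semantic MediaWiki", "Semantic Web", "MediaWiki"], "Semantic*")` returns the two values starting with "Semantic", while a pattern such as `100%*` only matches values that really contain a percent sign after "100".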
 As for the natural order ones, if you don't know what those are for,
 they handle values like 1.2.3 and 1.12.3. A normal > comparison
 considers 1.2.3 greater than 1.12.3, because the third character is a 2
 in the first value and a 1 in the second. A natural order comparison
 correctly reads the second number as 12. PHP has built-in functions for
 this, which would be nice to use.
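The built-in PHP functions alluded to are presumably `strnatcmp()` and `natsort()`. The Python sketch below shows the core idea of natural ordering under a simplifying assumption: PHP's implementation has additional rules (for example around leading zeros and whitespace) that are not reproduced here. Each value is split into text and digit runs, and digit runs are compared by numeric value; `natural_key` and `natural_compare` are illustrative names.

```python
import re


def natural_key(s: str) -> list:
    """Split s into text and digit runs, converting digit runs to int.

    re.split with a capturing group always yields text at even indices and
    digits at odd indices, e.g. "1.12.3" -> ['', '1', '.', '12', '.', '3', ''],
    so two keys always compare str with str and int with int.
    """
    parts = re.split(r"(\d+)", s)
    return [int(p) if i % 2 else p for i, p in enumerate(parts)]


def natural_compare(a: str, b: str) -> int:
    """Return -1, 0 or 1 (only the sign matters, as with strnatcmp())."""
    ka, kb = natural_key(a), natural_key(b)
    return (ka > kb) - (ka < kb)


# Plain string comparison gets version-like values wrong ...
print("1.2.3" > "1.12.3")                   # True
# ... while natural order reads the middle component as 12.
print(natural_compare("1.2.3", "1.12.3"))   # -1
print(sorted(["1.12.3", "1.2.3", "1.10"], key=natural_key))  # ['1.2.3', '1.10', '1.12.3']
```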

 ~Daniel Friesen(Dantman) of:
 -The Gaiapedia ( http://gaia.wikia.com)
 -Wikia ACG on Wikia.com (http://wikia.com/wiki/Wikia_ACG)
 -and Wiki-Tools.com ( http://wiki-tools.com)

 Markus Krötzsch wrote:
  On Saturday, 29 December 2007, DanTMan wrote:
 
  A lot of people are accustomed to the ? (single-character match) and *
  (multi-character match) format. It would be easy to escape the '_'s and
  '%'s in a match and then do a replace of ? to _ and * to %. (A little
  preg and \ could still easily escape those.)
 
 
  Yes, I agree to that. I think, if nobody objects, this fixes the pattern
  syntax. So it remains to find a good symbol for the comparator.
 
 
  I don't know about ~ though; in the languages I've used, I recall ~
  having something to do with regex. I'd rather save that character in
  case we want to be able to use REGEXP matching inside of SQL.
 
  From what I remember, I think most people with only a little insight
  into technical stuff would adjust most easily to this set:
  =  Equals
  >  Greater than
  >= Greater than or equal to
  <  Less than
  <= Less than or equal to
  !  Not
  *  Multi-character match
  ?  Single-character match
  ~  regex
 
 
  As a note: = is not available in the parser function #ask, since it has
  a special meaning there as parameter assignment, as in format=table. In
  #ask, the query is distinguished from the other parameters and print
  requests because it contains no = symbol and does not start with ?.
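The rule just stated can be expressed as a tiny classifier. The following Python sketch is a hypothetical illustration, not SMW's parser code: it checks for a leading ? first and otherwise treats any = as a parameter assignment, which also shows why a comparator such as >= cannot appear inside an #ask query.

```python
def classify_ask_argument(arg: str) -> str:
    """Classify one |-separated #ask argument (hypothetical sketch)."""
    arg = arg.strip()
    if arg.startswith("?"):
        return "printout"   # print request, e.g. ?Population
    if "=" in arg:
        return "parameter"  # parameter assignment, e.g. format=table
    return "query"          # everything else is the query itself


for arg in ["[[Category:City]]", "?Population", "format=table",
            "[[Population::>=1000000]]"]:
    print(f"{arg!r:32} -> {classify_ask_argument(arg)}")
```

The last argument is classified as a parameter even though it was meant as a query condition, which is exactly the conflict with = described above.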
 
 
  But I did have a thought about the @... It's not used anywhere afaik.
  I did make a suggestion on using a pattern to separate the 

Re: [SMW-devel] Performance: (Was: {{#ask}})

2007-12-30 Thread Markus Krötzsch
On Sunday, 30 December 2007, Sergey Chernyshev wrote:
[snip]
  In fact we have such a site, but it runs on rather unstable hardware
  (we have a buggy RAID controller or driver :-(). It is our test server at
  test.ontoworld.org, which was also used for other experiments and is not
  in perfect shape right now (querying was disabled in order not to impair
  other experiments). We might set up another, more recent Wikipedia copy
  at some point in the future.

 Great, I'll definitely take a look. What were your main discoveries with
 English WP and SMW? How many Semantic Templates did you make? I'd say that
 I'm mostly interested in volume and performance on relatively simple
 queries, since this is what most large projects would need, although
 complex queries are also interesting.

Results of a small evaluation on this site are reported in [1]. The problem is 
that detailed results can change quickly with software versions (SMW changes 
of course, but MySQL also does). So things should be taken with a grain of 
salt.


 It would be quite interesting to see the latest WP with the latest SMW to
 understand the biggest pain points.

The biggest pain, IMHO, is the unpredictability of query times. It turns out
that most queries are rather fast even on big sites, while some are slow for
no apparent reason. If we could just limit SQL-side query execution time and
report timeout errors when needed, things would be much simpler. But I do not
know how to achieve that.

Markus


[1] http://korrekt.org/index.php/Semantic_Wikipedia_%28JWS2007%29


   Since we're talking about performance, there is another side of
   performance tuning: perceived performance. This mostly concerns
   JavaScript, CSS and so on. For example, there is still the problem of
   SIMILE Timeline being slow to load (although pages that don't use it
   have improved now that the client-side code is loaded only on pages
   that need it). This kind of issue can be tracked using Firebug with
   Yahoo's YSlow add-on.
 
  True, and I hope Timeline is really the main performance problem there. I
  wonder whether we could ship a more stripped-down version of the scripts
  to decrease load time. I guess we should ask the guys over at SAIL for
  that ...

 Can you describe the modifications you made to the original Timeline code?
 I used it some time ago and took a closer look at how their code is
 bundled, so I might be able to help migrate it to the new version. BTW, it
 might make sense to keep the SMW code separate from theirs to make such
 upgrades easier.

   I'll be happy to run the tests on a system with a significant amount of
   data if you need a testbed.
 
  All profiling support is appreciated, but I am not sure how to
  operationalise testing on our servers (SQL profiling would probably need
  server access, which is not possible in this case). Insights on JavaScript
  performance are also useful, but I guess that MySQL tuning could be most
  important for approaching large sites. If you know about DB optimisation,
  you can also have a look at our DB layout and at the SQL queries we
  generate (format=debug).

 Actually I was talking about standardizing some test process that all of us
 can run on our servers to compare performance and settings and help each
 other to derive best practices in terms of performance and maybe even find
 bottlenecks.

   Sergey

  Thanks,
 
  Markus



-- 
Markus Krötzsch
Institut AIFB, Universität Karlsruhe (TH), 76128 Karlsruhe
phone +49 (0)721 608 7362, fax +49 (0)721 608 5998
[EMAIL PROTECTED], www: http://korrekt.org


signature.asc
Description: This is a digitally signed message part.
-
This SF.net email is sponsored by: Microsoft
Defy all challenges. Microsoft(R) Visual Studio 2005.
http://clk.atdmt.com/MRT/go/vse012070mrt/direct/01/___
Semediawiki-devel mailing list
Semediawiki-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/semediawiki-devel


Re: [SMW-devel] SMW Performance

2007-12-30 Thread Markus Krötzsch
On Sunday, 30 December 2007, Sergey Chernyshev wrote:
 Hmm. I didn't realize that removing the $smwgQDefaultNamespaces
 restriction enables all namespaces instead of disabling them.

 Why isn't this setting NULL by default, then? I don't see any point in
 restricting namespaces unless it's absolutely necessary, for security
 reasons or something.

I agree. Done.

Markus


 Sergey

 On Dec 29, 2007 10:29 AM, Markus Krötzsch [EMAIL PROTECTED] wrote:
  On Friday, 28 December 2007, Lau, William (NIH/CIT) [E] wrote:
   We have a set of semantic queries in a template. That template is used
   in some pages. However, looking at the database process list, it
   seems that this set of queries is processed whenever a page is
   requested, even when the template is not used by the requested page
   (e.g. special pages).
 
  Do I understand this correctly? The delivery of special pages that are
  completely unrelated to said template triggers the ask-queries contained
  therein? This would be very strange behaviour indeed (I cannot currently
  imagine how or why this should happen in MediaWiki)!
 
   All the SQL queries are generated by the
   getQueryResult function. Since those queries are very computationally
   intensive, this bug slows down the entire site. If we take the inline
   queries out of the template or set $smwgQEnabled to false, the site
   becomes fast again. Has anyone experienced the same issue?
 
  In general, if queries on some site are too slow, it is useful to
  configure SMW to support faster querying (with fewer features, of
  course). Basic settings one can try to speed up querying are:
 
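  include_once('extensions/SemanticMediaWiki/includes/SMW_Settings.php');
  $smwgQSubcategoryDepth = 0;           // do not follow category hierarchies
  $smwgQSubpropertyDepth = 0;           // do not follow property hierarchies
  $smwgQEqualitySupport = SMW_EQ_NONE;  // ignore equality (redirects)
  $smwgQDefaultNamespaces = NULL;       // no namespace restriction
  enableSemantics('semedia-wiki.localhost');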
 
  Those settings will speed up basically all queries, disabling all support
  for property and category hierarchies, equality (redirects), and namespace
  restrictions (i.e. queries consider pages in all namespaces, including,
  e.g., User:). You can experiment to see which of those, if any, improves
  your query performance.
 
  If you have problems with overly complex user-generated queries, the
  parameters $smwgQMaxSize and $smwgQMaxDepth can be used to restrict them.
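To make the two limits concrete, here is a Python sketch under an explicit assumption: query size is counted as the number of conditions, and depth as the nesting level of subqueries. SMW's own counting rules may well differ, and `Condition`, `size_and_depth` and `check_limits` are hypothetical names used only for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Condition:
    """One query condition; `subquery` holds conditions nested below it."""
    text: str
    subquery: list[Condition] = field(default_factory=list)


def size_and_depth(conditions: list[Condition]) -> tuple[int, int]:
    """Size = total number of conditions; depth = nesting levels (flat query = 1)."""
    size, depth = 0, 0
    for cond in conditions:
        sub_size, sub_depth = size_and_depth(cond.subquery)
        size += 1 + sub_size
        depth = max(depth, 1 + sub_depth)
    return size, depth


def check_limits(conditions: list[Condition], max_size: int, max_depth: int) -> list[str]:
    """Return the limit violations (an empty list means the query is accepted)."""
    size, depth = size_and_depth(conditions)
    errors = []
    if size > max_size:
        errors.append(f"query size {size} exceeds limit {max_size}")
    if depth > max_depth:
        errors.append(f"query depth {depth} exceeds limit {max_depth}")
    return errors
```

Rejecting a query before any SQL is generated is cheap, which is the appeal of such limits for user-generated queries.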
 
  In general, it should be emphasised that queries should be used in a
  targeted way. Ontoworld.org had the infamous template {{ask}} for some
  time, which included queries for almost anything and simply showed
  nothing for queries that returned no results. Most wikis should rather
  have single query templates for special purposes instead of trying to
  have one for all.
 
  Anyway, for further optimisation, we need some pointer to your site, or
  at least some statistical information concerning its size
  (Special:SemanticStatistics) and the query structure. Which SMW version
  are you using? Some of the above settings assume SMW1.0-RC3, and none will
  work prior to SMW1.0-RC1.
 
  Markus
 
 





[SMW-devel] Updates: Images, LIKE-search, sorting tables

2007-12-30 Thread Markus Krötzsch
A brief feature update of some further things that just made it into the 
soon-to-be-released SMW1.0:

* Properties with Type:Page will now display values in the image namespace by 
showing actual images, at least in Factboxes and #ask query results. 
[Contribution by Yaron Koren]

* There is now an optional pattern-matching string search for properties of 
Type:String. For instance, the query atom [[property::% Semantic*]] matches 
any page whose value for property starts with "Semantic". Since this 
feature might be costly, it is not enabled by default, but it can be enabled 
with a configuration option (see SMW_Settings.php). 
[Contribution by Thomas Bleher]

* Query result tables now support correct sorting for all kinds of values, 
especially for dates and for formatted numerical values (possibly with units). 
Previously, the JavaScript would sometimes wrongly sort such values 
alphabetically. 
[Contribution by Tobias Matzner]

* There is a new RSS feed feature that allows you to subscribe to query 
results. Performance issues with thousands of users re-issuing queries 
regularly aside, this should be an extremely cool thing. It simply works by 
creating an inline query with the new format rss, which creates a link to an 
RSS feed. The parameters rsstitle and rssdescription can be used to describe 
the feed, and searchlabel is available to customise the link text. Print 
requests are largely ignored right now, unless they use the label creator or 
date, in which case they specify the corresponding fields in RSS. Otherwise, 
the last wiki editor and modification date are used. An example can be found 
at the top of [1]. Configuration parameters are available for enabling or 
disabling this feature, and for choosing between complete embedding of 
articles and plain result lists.

Note that this is also very nice in conjunction with Firefox's dynamic bookmark 
feature, enabling browser bookmarks that link to recent query results from 
your favourite wiki.
[Work by Denny and myself]
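The sorting fix listed above comes down to comparing values by what they mean rather than by their text. The Python sketch below illustrates that idea with a hypothetical `sort_key`: cells that parse as dates (ISO or "30 December 2007" style) or as numbers with thousands separators and a trailing unit get a numeric key, and everything else falls back to case-insensitive text. It is not the JavaScript used in SMW, and the accepted formats are assumptions chosen for illustration.

```python
import re
from datetime import datetime

# Leading number, optionally with thousands separators and decimals,
# e.g. "1,200 km" -> "1,200", "3.5 kg" -> "3.5".
_NUMBER = re.compile(r"\s*([-+]?(?:\d{1,3}(?:,\d{3})+|\d+)(?:\.\d+)?)")

# Date formats tried in order; %B uses the current locale's month names.
_DATE_FORMATS = ("%Y-%m-%d", "%d %B %Y")


def sort_key(cell: str) -> tuple:
    """Key that sorts dates chronologically, numbers numerically, text last."""
    text = cell.strip()
    for fmt in _DATE_FORMATS:
        try:
            return (0, datetime.strptime(text, fmt).toordinal())
        except ValueError:
            pass
    match = _NUMBER.match(text)
    if match:
        return (0, float(match.group(1).replace(",", "")))
    return (1, text.lower())


distances = ["1,200 km", "300 km", "45 km"]
print(sorted(distances))                # alphabetical: wrong order
print(sorted(distances, key=sort_key))  # numeric: 45, 300, 1,200
```

The leading 0/1 in each key keeps parsed values and plain text in separate groups, so a float is never compared with a string.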


Thanks to all contributors for code, helpful discussions, and comments on 
improving SMW. Unless something terrible happens, SMW1.0 will be released in 
2007.

Cheers,

Markus

[1] http://ontoworld.org/wiki/Upcoming_events (rather short right now; we plan 
to add some more events in next year ...)



Re: [SMW-devel] {{#ask}}

2007-12-30 Thread Sergey Chernyshev
My point is that SMW's mission statement is to create Semantic Wikipedia -
that's why I'm saying this. Or am I wrong?

Sergey


On Dec 30, 2007 4:54 AM, cnit [EMAIL PROTECTED] wrote:

  I'm not sure if restricting Ask functionality is in line with
  Wikipedia policies - it's not a modification operation, therefore it
  should be public, I believe.
 Sure, but my MW sites aren't Wikipedia. Also, imagine how
 resource-intensive it would be on huge datasets with lots of properties
 and property values defined.

  I agree that abuse blocking and request throttling might be a solution
  here, but in general I wouldn't recommend restricting access, but rather
  functionality, e.g. a limited number of joins or something like that.
 Maybe you're right. It would be nice to have the settings mentioned by
 Markus ($smwgQSubpropertyDepth, $smwgQEqualitySupport) separately for
 anonymous users and all other users. The problem is that such queries
 might then sometimes be unavailable.

  This kind of action is generally hard to limit and predict, and therefore
  it's quite easy to abuse. This might be a serious bottleneck in SMW
  adoption by Wikipedia.
 Dmitriy
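Dmitriy's idea of separate query settings for anonymous and logged-in users could look roughly like the following Python sketch. Nothing here is an SMW feature: `QuerySettings`, `settings_for`, `query_allowed` and the numbers are hypothetical, and the fields only loosely mirror $smwgQSubpropertyDepth, $smwgQEqualitySupport, $smwgQMaxSize and $smwgQMaxDepth (equality support is simplified to on/off).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QuerySettings:
    subproperty_depth: int   # cf. $smwgQSubpropertyDepth
    equality_support: bool   # simplified stand-in for $smwgQEqualitySupport
    max_size: int            # cf. $smwgQMaxSize
    max_depth: int           # cf. $smwgQMaxDepth


# Illustrative values only: cheaper, stricter settings for anonymous users.
ANONYMOUS = QuerySettings(subproperty_depth=0, equality_support=False, max_size=8, max_depth=2)
LOGGED_IN = QuerySettings(subproperty_depth=10, equality_support=True, max_size=16, max_depth=4)


def settings_for(is_anonymous: bool) -> QuerySettings:
    """Pick the query settings for the current request's user class."""
    return ANONYMOUS if is_anonymous else LOGGED_IN


def query_allowed(size: int, depth: int, is_anonymous: bool) -> bool:
    """True if a query of the given size and depth fits the user's limits."""
    s = settings_for(is_anonymous)
    return size <= s.max_size and depth <= s.max_depth
```

The drawback mentioned above shows up directly: a query of size 10 is accepted for a logged-in user but rejected for an anonymous one, so the same page can render differently depending on who views it.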






-- 
Sergey Chernyshev
http://www.sergeychernyshev.com/