Re: [SMW-devel] {{#ask}}

2007-12-28 Thread Markus Krötzsch
On Donnerstag, 27. Dezember 2007, cnit wrote:
  (2) Query answering is done without any caching, and this is clearly a
  problem. While inline queries are computed only once and stored in the
  parser cache afterwards, Special:Ask has no caching facility at all. This
  needs to change in the future. Targeted cache invalidation might still
  be difficult and it is not clear whether the effort is needed (one could
  enable manual cache clearing like for pages). A new query cache --
  design, architecture and implementation -- is needed here.

 Too much caching can hurt dynamic content; it's nice to have a page
 with a query updated at least once every hour or two.

Well, that is not the case for the current parser cache, neither in MW nor in 
SMW. But of course it could be achieved with some server-side cron jobs.
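A Special:Ask query cache with a fixed lifetime would give roughly the "refresh every hour or two" behaviour without any cron job. The sketch below is hypothetical: `QueryResultCache`, its methods and the in-memory array are illustrative assumptions, not SMW code. A real version would more likely sit on MediaWiki's object cache or a database table. Entries are keyed on the whitespace-normalised query text and recomputed once they are older than the TTL, and a manual purge covers the "clear it like a page" case.

```php
<?php
// Hypothetical sketch of a time-limited cache for query results
// (not SMW code; a plain PHP array stands in for real storage).
class QueryResultCache
{
    /** @var array md5 key => [storedAt, result] */
    private $entries = [];
    /** @var int lifetime of an entry in seconds */
    private $ttl;
    /** @var callable returns the current Unix time */
    private $clock;

    public function __construct(int $ttl, ?callable $clock = null)
    {
        $this->ttl = $ttl;
        $this->clock = $clock ?? 'time'; // injectable so expiry can be tested
    }

    // Normalise whitespace so trivially different spellings share one entry.
    private function key(string $query): string
    {
        return md5(trim(preg_replace('/\s+/', ' ', $query)));
    }

    // Return the cached result if it is younger than the TTL,
    // otherwise compute, store and return a fresh one.
    public function get(string $query, callable $compute)
    {
        $key = $this->key($query);
        $now = ($this->clock)();
        if (isset($this->entries[$key]) && $now - $this->entries[$key][0] < $this->ttl) {
            return $this->entries[$key][1];
        }
        $result = $compute($query);
        $this->entries[$key] = [$now, $result];
        return $result;
    }

    // Manual purge, analogous to purging a page.
    public function purge(string $query): void
    {
        unset($this->entries[$this->key($query)]);
    }
}
```

With a TTL of 3600 seconds, results are at most an hour stale. This trades targeted invalidation for simplicity, which matches the doubt above about whether targeted invalidation is worth the effort.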

 Speaking of Special:Ask, I believe it should be limited to registered
 users only. It might slow down the site

Which is due to the lack of caching ...

 and it also invites hackers to try building an exploit query.

My strong hope is that no such query is possible. If security issues with 
queries should exist, I would like to find them sooner rather than later. 

I expected that it would be possible to limit Special page access based on 
some MW mechanism already. Is there no way of configuring MediaWiki this way?

Markus

-- 
Markus Krötzsch
Institut AIFB, Universität Karlsruhe (TH), 76128 Karlsruhe
phone +49 (0)721 608 7362, fax +49 (0)721 608 5998
[EMAIL PROTECTED]    www  http://korrekt.org


signature.asc
Description: This is a digitally signed message part.
-
This SF.net email is sponsored by: Microsoft
Defy all challenges. Microsoft(R) Visual Studio 2005.
http://clk.atdmt.com/MRT/go/vse012070mrt/direct/01/
___
Semediawiki-devel mailing list
Semediawiki-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/semediawiki-devel


Re: [SMW-devel] SMW performance

2007-12-28 Thread cnit
 Forget my previous post. The problem went away when I removed one
 template. It seems the performance issue is related to the
 application rather than the database.
Try setting up eAccelerator for PHP; maybe it would help a bit. Also,
I believe that MW/SMW requires a dedicated server (co-location). We've
tried the usual low-cost hosting (shared with hundreds of other virtual
hosts), and even MW alone was crawling.
It was also slow under Windows; it's OK on a Linux server. Of course
you could also try a dedicated MySQL server. MW even supports MySQL
clustering and web load-balancing, because it's used by Wikipedia, one
of the busiest and largest sites in the world.
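Both suggestions map onto existing MediaWiki settings. The LocalSettings.php fragment below is an untested sketch, not a recommended configuration. `CACHE_ACCEL` tells MediaWiki to use a PHP accelerator's shared memory (eAccelerator, APC or XCache) as its object cache. `$wgDBservers` lists a master plus replicated MySQL servers, and read queries are spread across them. Host names, credentials and load weights are placeholders.

```php
<?php
// LocalSettings.php fragment (sketch; all values are placeholders).

// Keep MediaWiki's object cache in the PHP accelerator's shared memory.
$wgMainCacheType = CACHE_ACCEL;

// The first entry is the master; 'load' weights how read queries are
// distributed, so load 0 keeps ordinary reads off the master.
$wgDBservers = array(
    array(
        'host'     => 'db-master.example.org',
        'dbname'   => 'wikidb',
        'user'     => 'wikiuser',
        'password' => 'secret',
        'type'     => 'mysql',
        'flags'    => DBO_DEFAULT,
        'load'     => 0,
    ),
    array(
        'host'     => 'db-replica.example.org',
        'dbname'   => 'wikidb',
        'user'     => 'wikiuser',
        'password' => 'secret',
        'type'     => 'mysql',
        'flags'    => DBO_DEFAULT,
        'load'     => 1,
    ),
);
```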
Dmitriy




Re: [SMW-devel] {{#ask}}

2007-12-28 Thread cnit
 Well, that is not the case for the current parser cache, neither in MW nor in
 SMW. But of course it could be achieved with some server-side cron jobs.
Ah, I didn't know about MW cron jobs. That sounds nice. I will try to
find some examples. Maybe you're right that such functionality
doesn't belong in the main application itself.

 Which is due to the lack of caching ...
Well, yes. Of course, someone who wants to slow down the site could
use many different queries. But that can be traced in the Apache logs
and the offender banned by IP.
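Finding the heaviest Special:Ask users in the Apache logs could be as simple as counting matching requests per client address. The sketch below is a minimal version that assumes Apache's common/combined log format, with the client IP as the first field and the request line in double quotes. `countAskRequestsByIp` is a hypothetical helper, not part of MW or SMW.

```php
<?php
// Count requests to Special:Ask per client IP in Apache access-log lines.
// Assumes common/combined log format: IP first, request line quoted.
function countAskRequestsByIp(iterable $lines): array
{
    $counts = [];
    foreach ($lines as $line) {
        // e.g. 1.2.3.4 - - [28/Dec/2007:10:00:00 +0100] "GET /wiki/Special:Ask?... HTTP/1.1" 200 512
        if (!preg_match('/^(\S+) \S+ \S+ \[[^\]]*\] "(?:GET|POST) (\S+)/', $line, $m)) {
            continue; // skip malformed lines and other request methods
        }
        // Match /wiki/Special:Ask as well as title=Special%3AAsk (URL-encoded colon).
        if (preg_match('/Special(?::|%3A)Ask/i', $m[2])) {
            $counts[$m[1]] = ($counts[$m[1]] ?? 0) + 1;
        }
    }
    arsort($counts); // busiest clients first
    return $counts;
}
```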

 My strong hope is that no such query is possible. If security issues with
 queries should exist, I would like to find them sooner rather than later.
I hope that, too.

 I expected that it would be possible to limit Special page access based on
 some MW mechanism already. Is there no way of configuring MediaWiki this way?
http://meta.wikimedia.org/wiki/Help:Special_page#Restricted_special_pages
For example, includes/SpecialBlockip.php contains the following check:

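# Permission check: only users with the 'block' right may use this page
if( !$wgUser->isAllowed( 'block' ) ) {
    # Show the standard "permission required" error page and stop
    $wgOut->permissionRequired( 'block' );
    return;
}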
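The same mechanism could be applied to Special:Ask by giving it its own user right and granting that right per user group through `$wgGroupPermissions` in LocalSettings.php. In the fragment below the right name `smw-ask` is hypothetical; SMW does not define such a right. Only `$wgGroupPermissions` and the `isAllowed`/`permissionRequired` calls in the check above are MediaWiki's.

```php
<?php
// LocalSettings.php fragment (sketch). 'smw-ask' is a hypothetical right;
// Special:Ask would also need a check like the SpecialBlockip.php one above,
// calling isAllowed( 'smw-ask' ) instead of isAllowed( 'block' ).
$wgGroupPermissions['*']['smw-ask']    = false; // anonymous visitors
$wgGroupPermissions['user']['smw-ask'] = true;  // any logged-in user
```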

BUT, I've just remembered that the further results links point to
Special:Ask with the query in the URL parameters. In that case, further
results would be unavailable to anonymous users, which is sad. If only
every ask query had its own ID, which could be passed to the further
results page instead of the query itself... Maybe I am asking too much
and an IP ban (see above) is enough.
Dmitriy




Re: [SMW-devel] {{#ask}}

2007-12-28 Thread Markus Krötzsch
On Freitag, 28. Dezember 2007, cnit wrote:
  Well, that is not the case for the current parser cache, neither in MW
  nor in SMW. But of course it could be achieved with some server-side
  cron jobs.

 Ah, I didn't know about MW cron jobs. That sounds nice. I will try to
 find some examples. Maybe you're right that such functionality
 doesn't belong in the main application itself.

What I meant was: a simple cron job can regularly touch LocalSettings.php to 
purge the MW cache globally. Not much interaction with MW is needed for that.
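This works because MediaWiki raises `$wgCacheEpoch` to the modification time of LocalSettings.php, and the parser cache treats any entry rendered before that epoch as expired. In versions that have `$wgInvalidateCacheOnLocalSettingsChange`, that setting controls this behaviour. An hourly crontab entry such as `0 * * * * touch /path/to/LocalSettings.php` (placeholder path) therefore forces re-rendering. The sketch below models that check under simplifying assumptions: it uses Unix timestamps instead of MediaWiki's `YmdHis` strings, and `isParserCacheEntryFresh` is a hypothetical name, not MediaWiki's API.

```php
<?php
// Simplified model of the parser-cache expiry rule (sketch, not MW code):
// a cached rendering is reused only if it is not older than the cache
// epoch, and the epoch is at least the mtime of LocalSettings.php.
function isParserCacheEntryFresh(int $renderedAt, int $configuredEpoch, string $localSettings): bool
{
    clearstatcache(true, $localSettings); // filemtime() results are cached
    $epoch = max($configuredEpoch, (int) @filemtime($localSettings));
    return $renderedAt >= $epoch;
}
```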


  Which is due to the lack of caching ...

 Well, yes. Of course, someone who wants to slow down the site could
 use many different queries. But that can be traced in the Apache logs
 and the offender banned by IP.

  My strong hope is that no such query is possible. If security issues
  with queries should exist, I would like to find them sooner rather than
  later.

 I hope that, too.

  I expected that it would be possible to limit Special page access based
  on some MW mechanism already. Is there no way of configuring MediaWiki
  this way?

 http://meta.wikimedia.org/wiki/Help:Special_page#Restricted_special_pages
 For example, includes/SpecialBlockip.php contains the following check:

 # Permission check
 if( !$wgUser->isAllowed( 'block' ) ) {
     $wgOut->permissionRequired( 'block' );
     return;
 }

 BUT, I've just remembered that the further results links point to
 Special:Ask with the query in the URL parameters. In that case, further
 results would be unavailable to anonymous users, which is sad. If only
 every ask query had its own ID, which could be passed to the further
 results page instead of the query itself... Maybe I am asking too much
 and an IP ban (see above) is enough.

I guess a robust solution for that will still take some time. One could of 
course store inline queries in some table, give each one an ID, and permit 
anyone to use Special:Ask with such an (internal) ID only, whereas making 
custom queries would require further permissions. But this requires some more 
code, and I am not entirely convinced of that design.
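The ID-based design could look roughly like this. Inline queries are registered under a stable ID when their page is parsed, further-results links carry only that ID, and running a free-form query needs an extra permission. Everything here is hypothetical: `QueryRegistry`, the boolean standing in for a user right, and the in-memory array standing in for a database table. The ID is a truncated SHA-1 of the query text, so the same query always gets the same ID.

```php
<?php
// Hypothetical sketch of ID-based access to stored inline queries.
class QueryRegistry
{
    /** @var array<string, string> id => query text (stands in for a DB table) */
    private $queries = [];

    // Called when a page containing an inline query is parsed.
    public function register(string $query): string
    {
        $id = substr(sha1($query), 0, 12); // stable, short ID per distinct query
        $this->queries[$id] = $query;
        return $id;
    }

    // Decide what Special:Ask may run for this request; null means refused.
    // $queryId comes from a further-results link, $customQuery from a form.
    public function resolve(?string $queryId, ?string $customQuery, bool $mayAskFreely): ?string
    {
        if ($customQuery !== null) {
            return $mayAskFreely ? $customQuery : null; // custom queries need the right
        }
        if ($queryId !== null && isset($this->queries[$queryId])) {
            return $this->queries[$queryId];           // stored queries: anyone
        }
        return null;
    }
}
```

Anonymous visitors could then still follow further results links, while hand-crafted queries would require the extra right.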

Did you experience problems with anonymous users accessing Special:Ask? On 
ontoworld it seems that a significant share of Special:Ask requests actually 
come from further results links.

Markus





Re: [SMW-devel] [PATCH] Support LIKE in queries

2007-12-28 Thread Markus Krötzsch
On Freitag, 28. Dezember 2007, Yaron Koren wrote:
 How about ~%substring% instead? The ~ is the symbol for pattern matching
 in Perl and some UNIX languages, and it might be a clearer indicator of
 function than %.


I would immediately use that, but IIRC the Halo extension has a similar syntax 
for a custom edit-distance database function (it requires a modified MySQL 
version, and probably also has significant performance issues).

So the question is whether we want to override that (assuming that this 
particular Halo function is not used widely), or is there another idea for 
doing it? Other imaginable operators on my keyboard would be #, , ?, @ -- 
none really as nice as ~ ...
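Whichever characters are chosen, a configurable `|`-separated comparator list such as `$smwgQComparators` (default `'<|>|!|%'`) could be applied to a query value roughly as follows. A leading comparator is recognised only if it is enabled; otherwise the whole value is taken literally. This is a sketch under that assumption, not SMW's actual parser, and `splitComparator` is a hypothetical helper. It tries longer comparators first, so that a multi-character one such as `~%` could coexist with `~`.

```php
<?php
// Sketch: split a query value like ">1000" into [comparator, operand]
// using a '|'-separated list of enabled comparators (e.g. '<|>|!|%').
function splitComparator(string $value, string $enabledList): array
{
    $comparators = array_filter(explode('|', $enabledList), 'strlen');
    // Longer comparators first, so '~%' is not shadowed by '~'.
    usort($comparators, function ($a, $b) {
        return strlen($b) - strlen($a);
    });
    foreach ($comparators as $c) {
        if (strncmp($value, $c, strlen($c)) === 0) {
            return [$c, (string) substr($value, strlen($c))];
        }
    }
    return ['=', $value]; // no enabled comparator: plain equality
}
```

For example, `splitComparator('%%sub%', '<|>|!|%')` yields `['%', '%sub%']`. The operand then still contains `%` wildcards, which shows the double-% confusion mentioned in the quoted message.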

Markus  


 On Dec 27, 2007 2:16 PM, Markus Krötzsch [EMAIL PROTECTED] wrote:
  Thanks. I have applied the patch, and added a way of configuring this
  feature: the parameter $smwgQComparators gives a (|-separated) list of
  supported comparators, and can be used to enable or disable any of <, >,
  !, and %. By default its value is '<|>|!|%'.
 
  In this way one can also disable ! or even <, > if these are considered
  to be problematic.
 
  I wonder whether one should use another character instead of % as a
  wildcard inside the pattern string, so that no double-% confusion can
  arise. Would * be an alternative, or would it be too confusing w.r.t. the
  old ask print requests? What about +? Corresponding examples
  (preprocessing would in each case ensure full compatibility with SQL):
 
  - %%substring%
  - %*substring*
  - %+substring+
 
  Cheers
 
  Markus
 
  On Donnerstag, 20. Dezember 2007, Asheesh Laroia wrote:
   On Thu, 20 Dec 2007, Thomas Bleher wrote:
Yesterday I needed LIKE queries for properties, so I added it to SMW
(patch attached). It was surprisingly simple.
  
   This would be LIKE TOTALLY AWESOME to get into Semantic MediaWiki.
  
   It would be great if later SMW could have Valgol support
   http://www.indwes.edu/Faculty/bcupp/things/computer/VALGOL.html.
  
   -- Asheesh.
  
   P.S. In all total like seriousness, queries with LIKE support are a
   good idea
  
   --
   The star of riches is shining upon you.
 





Re: [SMW-devel] [PATCH] Support LIKE in queries

2007-12-28 Thread DanTMan
A lot of people are accustomed to the ? (single-character match) and * 
(multi-character match) format. It would be easy to escape the '_'s and 
'%'s in a match and then replace ? with _ and * with %. (With a little 
preg work, a \ could still easily escape those.)
I'm not sure about ~, though; in the languages I've used, I recall ~ 
having something to do with regexes. I'd rather save that character in 
case we want to be able to use SQL's REGEXP matching.
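The conversion described here maps `*` and `?` to SQL wildcards, escapes literal `%` and `_`, and lets a backslash keep `*` or `?` literal. It could be sketched as follows. Assumptions: the SQL side uses `LIKE` with MySQL's default escape character `\`, and `globToLikePattern` is a hypothetical helper. The result still has to go through the database layer's normal string quoting before it reaches SQL.

```php
<?php
// Sketch: turn a user pattern with * and ? wildcards into a MySQL LIKE pattern.
// '\*' and '\?' stay literal; SQL's own % and _ (and \) are escaped.
function globToLikePattern(string $glob): string
{
    $out = '';
    $len = strlen($glob);
    for ($i = 0; $i < $len; $i++) {
        $ch = $glob[$i];
        if ($ch === '\\' && $i + 1 < $len) {
            $ch = $glob[++$i];             // escaped character: always literal
            $out .= in_array($ch, ['%', '_', '\\'], true) ? '\\' . $ch : $ch;
        } elseif ($ch === '*') {
            $out .= '%';                   // any run of characters
        } elseif ($ch === '?') {
            $out .= '_';                   // exactly one character
        } elseif ($ch === '%' || $ch === '_' || $ch === '\\') {
            $out .= '\\' . $ch;            // literal for LIKE
        } else {
            $out .= $ch;
        }
    }
    return $out;
}
```

For example, `*substring*` becomes `%substring%`, and `100%_done` becomes `100\%\_done`, so the user never has to know about SQL's own wildcards.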


From what I remember, I think most people with only a little insight 
into technical stuff would adjust most easily to this set:

= Equals
> Greater than
>= Greater than or equal to
<= Less than or equal to
! Not
* Multi-character match
? Single-character match
~ regex

But I did have a thought about the @... It's not used anywhere afaik.
I did make a suggestion on using a pattern to separate the comparators 
from the match value. It was using [[Property::comparator::match]], but 
as I now remember, SMW lets you use :: to specify multiple properties. 
However, it may be a good idea to use a separator which wouldn't 
cause conflicts with other things. @ is not commonly used and 
does give people a little hint of its use. Or, 
if you want something further from what can actually be used in a title 
(to avoid clashing with things), the # is always invalid.
Say, [[prop::comp@match]] or [[prop::comp#match]]. So for a not: [[Has 
value::!@Value]] or [[Has value::!#Value]].
I'm probably droning on now... But what about finding a good separator 
and allowing textual names, e.g.: EQ[=], NOT/NEQ[!] (!= could be thought 
of), LT[<], GT[>], REGEX(P)[~], LIKE[%_], wildcard[*?], etc...
There is also the possibility of using brackets, instead of a separator, 
to enclose a comparator. I can hardly think of many places which would 
use (NOT) at the start of a title ([[Has value::(NOT) Title]]), or we 
also have the {} and [] type brackets. [] is used by external links, but 
{} is only used in multiples as a template or variable bit and never has 
a use singly; templates and values will already have been parsed out, 
so only the singles remain, and as a bonus, { and } are illegal in 
titles. So [[Has value::{NOT} Title]] is guaranteed never to clash with 
a legal title or match you can make. If you're worried about templates 
and parsing issues, those can't occur when you're using something like 
{{{1}}} as the title ([[Has value::{NOT} {{{1}}}]]), so there's no clash. 
The only potential clash is if someone wants to use {{{comparator|EQ}}} 
to specify the comparator. In that case, we could easily make { EQ } 
valid (trim spaces), so { {{{comparator|EQ}}} } would work.
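The brace form could be parsed with one regular expression. The value has an optional `{NAME}` prefix, with spaces inside the braces trimmed, so that `{ {{{comparator|EQ}}} }` still works once templates are expanded, followed by the match value. The name table below is a hypothetical reading of the textual names suggested above. Neither it nor `parseBracedComparator` is SMW syntax. Unknown names are rejected rather than silently treated as equality.

```php
<?php
// Sketch: parse "{NOT} Title" / "{ LT }5" style values into [symbol, operand].
function parseBracedComparator(string $value): ?array
{
    // Hypothetical name => symbol table for the textual names suggested above.
    $names = [
        'EQ' => '=', 'NOT' => '!', 'NEQ' => '!', 'LT' => '<', 'GT' => '>',
        'REGEX' => '~', 'REGEXP' => '~', 'LIKE' => '%',
    ];
    // "{", optional spaces, a name, optional spaces, "}", then the operand.
    if (!preg_match('/^\{\s*([A-Za-z]+)\s*\}\s*(.*)$/s', $value, $m)) {
        return ['=', trim($value)]; // no braces: plain equality
    }
    $name = strtoupper($m[1]);
    return isset($names[$name]) ? [$names[$name], $m[2]] : null;
}
```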


But... now I'm droning a bit much...

~Daniel Friesen(Dantman) of:
-The Gaiapedia (http://gaia.wikia.com)
-Wikia ACG on Wikia.com (http://wikia.com/wiki/Wikia_ACG)
-and Wiki-Tools.com (http://wiki-tools.com)
