Re: [PATCHES] [HACKERS] Include Lists for Text Search

2008-03-10 Thread Oleg Bartunov

On Sun, 9 Mar 2008, Tom Lane wrote:

> Simon Riggs [EMAIL PROTECTED] writes:
> > I've coded a small patch to allow CaseSensitive synonyms.
>
> Applied with corrections (it'd be good if you at least pretended to test
> stuff before submitting it).
>
> Would a similar parameter be useful for any of the other dictionary
> types?

There are many things desirable to do with dictionaries; for example,
a dictionary that returns the original word plus its normal form. Another
feature is a filtering dictionary, rather than a recognize-and-stop one.
We have a feeling that a little middleware would help implement this,
and CaseSensitive too.

Regards,
Oleg
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: [EMAIL PROTECTED], http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83

--
Sent via pgsql-patches mailing list (pgsql-patches@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-patches


Re: [PATCHES] [HACKERS] Include Lists for Text Search

2008-03-10 Thread Simon Riggs
On Sun, 2008-03-09 at 23:03 -0400, Tom Lane wrote:
> Simon Riggs [EMAIL PROTECTED] writes:
> > I've coded a small patch to allow CaseSensitive synonyms.
>
> Applied with corrections (it'd be good if you at least pretended to test
> stuff before submitting it).

It is a frequent accusation of yours that I don't test things, which is
incorrect. Defending against that makes me a liar twice in your eyes. If
you look more closely at what happens you'll understand that your own
rigid expectations are what causes these problems. 

If you thought at all you'd realise that nobody would be stupid enough
to try to sneak untested code into Postgres; all bugs would point
directly back to anybody attempting that. That isn't true just of
Postgres, it's true of any group of people working together on any task,
not just software or open source software.

As Greg mentions on another thread, not all patches are *intended* to be
production quality by their authors. Many patches are shared for the
purpose of eliciting general feedback. You yourself encourage a group
development approach and specifically punish those people dropping
completely finished code into the queue and expecting it to be
committed as-is. So people produce patches in various states of
readiness, knowing that they may have to produce many versions before it
is finally accepted. Grabbing at a piece of code, then shouting
"unclean, unclean" just destroys the feedback process and leaves
teamwork in tatters.

My arse doesn't need wiping, thanks, nor does my bottom need smacking,
nor are you ever likely to catch me telling fibs. If you think so,
you're wrong and you should reset.

What you will find from me and others, in the past and realistically in
the future too, are patches that vary according to how near to
completion they are. Not the same thing as completed, yet varying in
quality. If they are incomplete it is because of the idea to receive
feedback at various points. Some patches need almost none e.g. truncate
triggers (1-2 versions), some patches need almost constant feedback e.g.
async commit (24+ versions before commit). The existence of an
intermediate patch in no way signals laziness, lack of intention to
complete or any other failure to appreciate the software development
process.

If you want people to work on Postgres alongside you, I'd appreciate a
software development process that didn't roughly equate to charging at a
machine gun trench across a minefield. If you insist on following that
you should at least stop wondering why it is that the few people to have
made more than a few steps are determined and grim individuals and start
thinking about the many skilled people who have chosen non-combatant
status, and why.

-- 
  Simon Riggs
  2ndQuadrant  http://www.2ndQuadrant.com 

  PostgreSQL UK 2008 Conference: http://www.postgresql.org.uk




Re: [PATCHES] [HACKERS] Include Lists for Text Search

2008-03-10 Thread Andrew Dunstan



Simon Riggs wrote:

> As Greg mentions on another thread, not all patches are *intended* to be
> production quality by their authors. Many patches are shared for the
> purpose of eliciting general feedback. You yourself encourage a group
> development approach and specifically punish those people dropping
> completely finished code into the queue and expecting it to be
> committed as-is.

If you post a patch that is not intended to be of production quality, it 
is best to mark it so explicitly. Then nobody can point fingers at you. 
Also, Bruce would then know not to put it in the queue of patches 
waiting for application.


cheers

andrew



Re: [PATCHES] [HACKERS] Include Lists for Text Search

2008-03-10 Thread Simon Riggs
On Mon, 2008-03-10 at 08:24 -0400, Andrew Dunstan wrote:
>
> Simon Riggs wrote:
> > As Greg mentions on another thread, not all patches are *intended* to be
> > production quality by their authors. Many patches are shared for the
> > purpose of eliciting general feedback. You yourself encourage a group
> > development approach and specifically punish those people dropping
> > completely finished code into the queue and expecting it to be
> > committed as-is.
>
> If you post a patch that is not intended to be of production quality, it
> is best to mark it so explicitly. Then nobody can point fingers at you.
> Also, Bruce would then know not to put it in the queue of patches
> waiting for application.

So it can be forgotten about entirely? Hmm.

-- 
  Simon Riggs
  2ndQuadrant  http://www.2ndQuadrant.com 

  PostgreSQL UK 2008 Conference: http://www.postgresql.org.uk




Re: [PATCHES] [HACKERS] Include Lists for Text Search

2008-03-10 Thread Andrew Dunstan



Simon Riggs wrote:

> On Mon, 2008-03-10 at 08:24 -0400, Andrew Dunstan wrote:
>
> > Simon Riggs wrote:
> >
> > > As Greg mentions on another thread, not all patches are *intended* to be
> > > production quality by their authors. Many patches are shared for the
> > > purpose of eliciting general feedback. You yourself encourage a group
> > > development approach and specifically punish those people dropping
> > > completely finished code into the queue and expecting it to be
> > > committed as-is.
> >
> > If you post a patch that is not intended to be of production quality, it
> > is best to mark it so explicitly. Then nobody can point fingers at you.
> > Also, Bruce would then know not to put it in the queue of patches
> > waiting for application.
>
> So it can be forgotten about entirely? Hmm.

I think if you post something marked Work In Progress, there is an 
implied commitment on your part to post something complete at a later stage.


So if it's forgotten you would be the one doing the forgetting. ;-)

cheers

andrew



Re: [PATCHES] [HACKERS] Include Lists for Text Search

2008-03-10 Thread Tom Lane
Andrew Dunstan [EMAIL PROTECTED] writes:
> I think if you post something marked Work In Progress, there is an
> implied commitment on your part to post something complete at a later stage.

It *wasn't* marked Work In Progress, and Simon went out of his way to
cross-post it to -patches, where the thread previously had not been:

http://archives.postgresql.org/pgsql-patches/2007-09/msg00150.php

I don't think either Bruce or I can be faulted for assuming that it was
meant to be applied.  In future perhaps I should take it as a given that
Simon doesn't expect his patches to be applied?

regards, tom lane



Re: [PATCHES] [HACKERS] Include Lists for Text Search

2008-03-10 Thread Simon Riggs
On Mon, 2008-03-10 at 09:42 -0400, Andrew Dunstan wrote:

> I think if you post something marked Work In Progress, there is an
> implied commitment on your part to post something complete at a later stage.
>
> So if it's forgotten you would be the one doing the forgetting. ;-)

But if they aren't on a review list, they won't get reviewed, no matter
what their status. So everybody has to maintain their own status list
and re-submit patches for review monthly until reviewed?

I like the idea of marking things WIP, but I think we need a clear
system where we agree that multiple statuses exist and that they are
described in particular ways.

-- 
  Simon Riggs
  2ndQuadrant  http://www.2ndQuadrant.com 

  PostgreSQL UK 2008 Conference: http://www.postgresql.org.uk




Re: [PATCHES] [HACKERS] Include Lists for Text Search

2008-03-10 Thread Simon Riggs
On Mon, 2008-03-10 at 10:01 -0400, Tom Lane wrote:

> In future perhaps I should take it as a given that
> Simon doesn't expect his patches to be applied?

I think you should take it as a given that Simon would like to try to
work together, sharing ideas and code, without insults and public
derision when things don't fit. 

-- 
  Simon Riggs
  2ndQuadrant  http://www.2ndQuadrant.com 

  PostgreSQL UK 2008 Conference: http://www.postgresql.org.uk




Re: [PATCHES] [HACKERS] Include Lists for Text Search

2008-03-10 Thread Bruce Momjian
Andrew Dunstan wrote:
>
> Simon Riggs wrote:
> > As Greg mentions on another thread, not all patches are *intended* to be
> > production quality by their authors. Many patches are shared for the
> > purpose of eliciting general feedback. You yourself encourage a group
> > development approach and specifically punish those people dropping
> > completely finished code into the queue and expecting it to be
> > committed as-is.
>
> If you post a patch that is not intended to be of production quality, it
> is best to mark it so explicitly. Then nobody can point fingers at you.
> Also, Bruce would then know not to put it in the queue of patches
> waiting for application.

It would still be in that queue because we might just mark it as a TODO.

FYI, during this first release cycle, we need to apply patches and
decide on TODOs.  We skipped TODO discussion during feature freeze, so
we need to do it now for held ideas.

-- 
  Bruce Momjian  [EMAIL PROTECTED]http://momjian.us
  EnterpriseDB http://postgres.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +



Re: [PATCHES] [HACKERS] Include Lists for Text Search

2008-03-10 Thread Tom Lane
Oleg Bartunov [EMAIL PROTECTED] writes:
> On Sun, 9 Mar 2008, Tom Lane wrote:
> > Would a similar parameter be useful for any of the other dictionary
> > types?
>
> There are many things desirable to do with dictionaries; for example,
> a dictionary that returns the original word plus its normal form. Another
> feature is a filtering dictionary, rather than a recognize-and-stop one.
> We have a feeling that a little middleware would help implement this,
> and CaseSensitive too.

Hmm, I can see how some middleware would help with folding or not
folding the input token, but what about the words coming from the
dictionary file (particularly the *output* lexeme)?  It's not apparent
to me that it's sensible to try to control that from outside the
dictionary.
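
Tom's objection can be illustrated with a small C model. Folding an incoming token is easy to do from outside, but the words read from the dictionary file, including the output lexeme, are fixed when the file is loaded, so whatever folding rule the lookup uses must also be applied to them inside the dictionary. This is a sketch under assumptions, not the dict_synonym.c code: SynRule, SynDict, syn_init and syn_lookup are hypothetical names, and folding the output lexeme together with the input word is a choice made for the illustration.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical rule layout, loosely modelled on Syn in dict_synonym.c. */
typedef struct {
    char *in;    /* word matched against incoming tokens */
    char *out;   /* lexeme emitted on a match */
} SynRule;

typedef struct {
    bool     case_sensitive;
    size_t   len;
    SynRule *rules;   /* kept sorted by 'in' so lookup can bsearch */
} SynDict;

/* malloc'd copy of s, lower-cased when fold is true */
static char *copy_word(const char *s, bool fold) {
    size_t n = strlen(s);
    char *r = malloc(n + 1);
    assert(r != NULL);
    for (size_t i = 0; i <= n; i++)   /* i == n copies the terminator */
        r[i] = fold ? (char) tolower((unsigned char) s[i]) : s[i];
    return r;
}

static int cmp_rule(const void *a, const void *b) {
    return strcmp(((const SynRule *) a)->in, ((const SynRule *) b)->in);
}

/* Load time: the file's words, including the output lexeme, are folded
 * by the same rule that lookup later applies to tokens. */
static void syn_init(SynDict *d, const char *pairs[][2], size_t n,
                     bool case_sensitive) {
    bool fold = !case_sensitive;
    d->case_sensitive = case_sensitive;
    d->len = n;
    d->rules = n > 0 ? malloc(n * sizeof(SynRule)) : NULL;
    assert(n == 0 || d->rules != NULL);
    for (size_t i = 0; i < n; i++) {
        d->rules[i].in = copy_word(pairs[i][0], fold);
        d->rules[i].out = copy_word(pairs[i][1], fold);
    }
    if (n > 0)
        qsort(d->rules, n, sizeof(SynRule), cmp_rule);
}

/* Lookup time: fold the token only if the dictionary folded its words. */
static const char *syn_lookup(const SynDict *d, const char *token) {
    if (d->len == 0)
        return NULL;
    char *probe = copy_word(token, !d->case_sensitive);
    SynRule key = { probe, NULL };
    const SynRule *hit = bsearch(&key, d->rules, d->len, sizeof(SynRule), cmp_rule);
    free(probe);
    return hit ? hit->out : NULL;
}
```

Loaded with case folding, a rule mapping bill to account turns the token Bill into account; loaded with case_sensitive set, Bill finds no match while bill still does.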

regards, tom lane



Re: [PATCHES] [HACKERS] Include Lists for Text Search

2008-03-10 Thread Teodor Sigaev

> Hmm, I can see how some middleware would help with folding or not
> folding the input token, but what about the words coming from the
> dictionary file (particularly the *output* lexeme)?  It's not apparent
> to me that it's sensible to try to control that from outside the
> dictionary.


Right now I see a significant advantage of such a layer: the two possible
extensions of dictionaries (filtering and storing the original form) are
independent of the nature of the dictionary. So, instead of modifying every
dictionary, we can add a layer common to all dictionaries, with syntax like:


CREATE/ALTER TEXT SEARCH DICTIONARY foo  (...) WITH ( filtering=on|off, 
store_original=on|off );


Or per token type/dictionary pair.
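
The proposed layer can be sketched in C as a wrapper around any dictionary. Everything here is hypothetical except the two option names, filtering and store_original, which come from the proposal: LexResult, WrappedDict, run_chain and the toy dictionaries are invented for illustration. In this sketch a filtering dictionary passes its first lexeme to the next dictionary instead of ending the chain, and store_original appends the unmodified token to the output of the dictionary that does end it; keeping the last filtered form when no later dictionary recognizes it is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_LEXEMES 8

/* What one dictionary step produces in this sketch. */
typedef struct {
    bool        recognized;            /* did the dictionary know the token? */
    size_t      n;                     /* number of lexemes emitted */
    const char *lexemes[MAX_LEXEMES];
} LexResult;

/* Any dictionary, reduced to a single lookup function (hypothetical). */
typedef LexResult (*LexizeFn)(const char *token);

/* The common layer: one wrapper per dictionary, carrying the two options. */
typedef struct {
    LexizeFn inner;
    bool     filtering;       /* hand output to the next dictionary instead of stopping */
    bool     store_original;  /* also emit the token as it arrived */
} WrappedDict;

/* Run a token through a chain of wrapped dictionaries. */
static LexResult run_chain(const WrappedDict *dicts, size_t ndicts, const char *token) {
    LexResult out = { false, 0, { NULL } };
    const char *current = token;

    assert(dicts != NULL || ndicts == 0);
    for (size_t i = 0; i < ndicts; i++) {
        LexResult r = dicts[i].inner(current);
        if (!r.recognized)
            continue;                   /* unknown here: try the next dictionary */
        if (dicts[i].filtering && r.n > 0) {
            current = r.lexemes[0];     /* normalized form feeds the next one */
            continue;
        }
        out = r;                        /* a normal dictionary ends the chain */
        if (dicts[i].store_original && out.n < MAX_LEXEMES)
            out.lexemes[out.n++] = token;
        return out;
    }
    if (current != token) {             /* assumption: keep the last filtered form */
        out.recognized = true;
        out.n = 1;
        out.lexemes[0] = current;
    }
    return out;
}

/* Toy table-driven dictionaries for the usage example. */
typedef struct { const char *from, *to; } Pair;

static LexResult from_table(const Pair *t, size_t n, const char *token) {
    LexResult r = { false, 0, { NULL } };
    for (size_t i = 0; i < n; i++) {
        if (strcmp(t[i].from, token) == 0) {
            r.recognized = true;
            r.n = 1;
            r.lexemes[0] = t[i].to;
            break;
        }
    }
    return r;
}

static const Pair SPELLING[] = { { "colour", "color" }, { "flavour", "flavor" } };
static const Pair STEMS[]    = { { "color", "color" }, { "colors", "color" } };

static LexResult toy_spelling(const char *t) { return from_table(SPELLING, 2, t); }
static LexResult toy_stem(const char *t)     { return from_table(STEMS, 2, t); }
```

With toy_spelling configured as a filter and toy_stem with store_original on, the token colour comes out as color plus colour, flavour comes out as flavor, and xyz is not recognized. Because the options live in the wrapper, neither toy dictionary had to change.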



--
Teodor Sigaev   E-mail: [EMAIL PROTECTED]
   WWW: http://www.sigaev.ru/



Re: [PATCHES] [HACKERS] Include Lists for Text Search

2008-03-10 Thread Tom Lane
Teodor Sigaev [EMAIL PROTECTED] writes:
> > Hmm, I can see how some middleware would help with folding or not
> > folding the input token, but what about the words coming from the
> > dictionary file (particularly the *output* lexeme)?  It's not apparent
> > to me that it's sensible to try to control that from outside the
> > dictionary.
>
> Right now I see a significant advantage of such a layer: the two possible
> extensions of dictionaries (filtering and storing the original form) are
> independent of the nature of the dictionary. So, instead of modifying every
> dictionary, we can add a layer common to all dictionaries, with syntax like:
>
> CREATE/ALTER TEXT SEARCH DICTIONARY foo  (...) WITH ( filtering=on|off,
> store_original=on|off );
>
> Or per token type/dictionary pair.

Well, if you think this can/should be done somewhere outside the
dictionary, should I revert the applied patch?

regards, tom lane



Re: [PATCHES] [HACKERS] Include Lists for Text Search

2008-03-10 Thread Teodor Sigaev

> Well, if you think this can/should be done somewhere outside the
> dictionary, should I revert the applied patch?


No, that patch is about case sensitivity of the synonym dictionary. I suppose Simon
wants to replace 'bill' with 'account', but doesn't want to get 'account Clinton'.


For other dictionaries (a numbers dictionary, snowball) that option is
meaningless.


--
Teodor Sigaev   E-mail: [EMAIL PROTECTED]
   WWW: http://www.sigaev.ru/



Re: [PATCHES] [HACKERS] Include Lists for Text Search

2008-03-10 Thread Teodor Sigaev
> Right now I see a significant advantage of such a layer: the two possible
> extensions of dictionaries (filtering and storing the original form) are


One more extension: drop too-long words. For example, lower the maximum word
length to prevent long words from being indexed; a word of 100 characters is
suspiciously long for human input.
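
Like the other two extensions, a length limit does not depend on what the dictionary does, so it could live in the same common layer. A minimal sketch, assuming tokens arrive as NUL-terminated UTF-8 strings; utf8_chars and drop_long_tokens are hypothetical helpers. The limit is counted in characters rather than bytes, since in a multibyte encoding a 100-byte token can be far fewer than 100 characters.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Characters in a UTF-8 string: count every byte that is not a
 * continuation byte (bit pattern 10xxxxxx). Assumes valid UTF-8. */
static size_t utf8_chars(const char *s) {
    size_t n = 0;
    for (; *s; s++)
        if (((unsigned char) *s & 0xC0) != 0x80)
            n++;
    return n;
}

/* Compact tokens[] in place, dropping any token longer than max_chars;
 * returns the number of tokens kept. */
static size_t drop_long_tokens(const char **tokens, size_t n, size_t max_chars) {
    size_t kept = 0;
    assert(tokens != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        if (utf8_chars(tokens[i]) <= max_chars)
            tokens[kept++] = tokens[i];
    return kept;
}
```

With a limit of 50, a 100-character run of x is dropped while postgres and café are kept.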


--
Teodor Sigaev   E-mail: [EMAIL PROTECTED]
   WWW: http://www.sigaev.ru/



Re: [PATCHES] [HACKERS] Include Lists for Text Search

2008-03-09 Thread Tom Lane
Simon Riggs [EMAIL PROTECTED] writes:
> I've coded a small patch to allow CaseSensitive synonyms.

Applied with corrections (it'd be good if you at least pretended to test
stuff before submitting it).

Would a similar parameter be useful for any of the other dictionary
types?

regards, tom lane



Re: [PATCHES] [HACKERS] Include Lists for Text Search

2007-09-26 Thread Bruce Momjian

This has been saved for the 8.4 release:

http://momjian.postgresql.org/cgi-bin/pgpatches_hold

---

Simon Riggs wrote:
> On Mon, 2007-09-10 at 10:21 -0400, Tom Lane wrote:
> > Oleg Bartunov [EMAIL PROTECTED] writes:
> > > On Mon, 10 Sep 2007, Simon Riggs wrote:
> > >> Can we include that functionality now?
> > >
> > > This could be realized very easily using dict_strict, which returns
> > > only known words, and the mapping contains only this dictionary. So,
> > > feel free to write it and submit.
> >
> > ... for 8.4.
>
> I've coded a small patch to allow CaseSensitive synonyms.
>
>     CREATE TEXT SEARCH DICTIONARY my_diction (
>         TEMPLATE = biglist,
>         DictFile = words,
>         CaseSensitive = true
>     );
>
> -- 
>   Simon Riggs
>   2ndQuadrant  http://www.2ndQuadrant.com

[ Attachment, skipping... ]

 
 ---(end of broadcast)---
 TIP 6: explain analyze is your friend

-- 
  Bruce Momjian  [EMAIL PROTECTED]  http://momjian.us
  EnterpriseDB   http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [PATCHES] [HACKERS] Include Lists for Text Search

2007-09-10 Thread Simon Riggs
On Mon, 2007-09-10 at 10:21 -0400, Tom Lane wrote:
> Oleg Bartunov [EMAIL PROTECTED] writes:
> > On Mon, 10 Sep 2007, Simon Riggs wrote:
> >> Can we include that functionality now?
> >
> > This could be realized very easily using dict_strict, which returns
> > only known words, and the mapping contains only this dictionary. So,
> > feel free to write it and submit.
>
> ... for 8.4.

I've coded a small patch to allow CaseSensitive synonyms.

    CREATE TEXT SEARCH DICTIONARY my_diction (
        TEMPLATE = biglist,
        DictFile = words,
        CaseSensitive = true
    );
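
The include-list behaviour Oleg suggested (dict_strict returning only known words, with the mapping containing only that dictionary) can be modelled in a few lines of C. This is a hypothetical sketch, not the biglist template or dict_strict: include_list_init, include_list_lexize and keep_included are invented names, and matching is exact, which corresponds to CaseSensitive = true.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* qsort/bsearch comparator for an array of C strings */
static int cmp_str(const void *a, const void *b) {
    return strcmp(*(const char *const *) a, *(const char *const *) b);
}

/* Init time: sort the include list once so lookups can use bsearch. */
static void include_list_init(const char **words, size_t n) {
    if (n > 0)
        qsort(words, n, sizeof *words, cmp_str);
}

/* Strict lookup: the listed word if present, else NULL (token not indexed).
 * Matching is exact, i.e. case-sensitive. */
static const char *include_list_lexize(const char *const *words, size_t n,
                                       const char *token) {
    if (n == 0)
        return NULL;
    const char *const *hit = bsearch(&token, words, n, sizeof *words, cmp_str);
    return hit ? *hit : NULL;
}

/* Compact tokens[] in place, keeping only listed words; returns the count kept. */
static size_t keep_included(const char *const *words, size_t nwords,
                            const char **tokens, size_t ntokens) {
    size_t kept = 0;
    assert(words != NULL || nwords == 0);
    for (size_t i = 0; i < ntokens; i++)
        if (include_list_lexize(words, nwords, tokens[i]) != NULL)
            tokens[kept++] = tokens[i];
    return kept;
}
```

Tokens not on the list produce no lexeme, so with a list of postgres, index and Oracle, only postgres and index would survive from "the postgres index is fast", and oracle would not match Oracle.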

-- 
  Simon Riggs
  2ndQuadrant  http://www.2ndQuadrant.com
Index: src/backend/tsearch/dict_synonym.c
===================================================================
RCS file: /projects/cvsroot/pgsql/src/backend/tsearch/dict_synonym.c,v
retrieving revision 1.4
diff -c -r1.4 dict_synonym.c
*** src/backend/tsearch/dict_synonym.c	25 Aug 2007 02:29:45 -0000	1.4
--- src/backend/tsearch/dict_synonym.c	10 Sep 2007 15:14:21 -0000
***************
*** 29,34 ****
--- 29,35 ----
  typedef struct
  {
  	int			len;	/* length of syn array */
+ 	bool		case_sensitive;
  	Syn		   *syn;
  } DictSyn;
  
***************
*** 83,88 ****
--- 84,90 ----
  			   *end = NULL;
  	int			cur = 0;
  	char	   *line = NULL;
+ 	bool		case_sensitive = false;
  
  	foreach(l, dictoptions)
  	{
***************
*** 95,100 ****
--- 97,107 ----
  	(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
  	 errmsg("unrecognized synonym parameter: \"%s\"",
  			defel->defname)));
+ 
+ 		if (pg_strcasecmp("CaseSensitive", defel->defname) == 0 &&
+ 			pg_strcasecmp("True", defGetString(defel)) == 0)
+ 			case_sensitive = true;
+ 
  	}
  
  	if (!filename)
***************
*** 168,173 ****
--- 175,182 ----
  	d->len = cur;
  	qsort(d->syn, d->len, sizeof(Syn), compareSyn);
  
+ 	d->case_sensitive = case_sensitive;
+ 
  	PG_RETURN_POINTER(d);
  }
  
***************
*** 180,195 ****
  	Syn			key,
  			   *found;
  	TSLexeme   *res;
  
  	/* note: d->len test protects against Solaris bsearch-of-no-items bug */
  	if (len <= 0 || d->len <= 0)
  		PG_RETURN_POINTER(NULL);
  
! 	key.in = lowerstr_with_len(in, len);
  	key.out = NULL;
  
  	found = (Syn *) bsearch(&key, d->syn, d->len, sizeof(Syn), compareSyn);
! 	pfree(key.in);
  
  	if (!found)
  		PG_RETURN_POINTER(NULL);
--- 189,214 ----
  	Syn			key,
  			   *found;
  	TSLexeme   *res;
+ 	bool		need_pfree = false;
  
  	/* note: d->len test protects against Solaris bsearch-of-no-items bug */
  	if (len <= 0 || d->len <= 0)
  		PG_RETURN_POINTER(NULL);
  
! 	if (d->case_sensitive)
! 		key.in = in;
! 	else
! 	{
! 		key.in = lowerstr_with_len(in, len);
! 		need_pfree = true;
! 	}
! 
  	key.out = NULL;
  
  	found = (Syn *) bsearch(&key, d->syn, d->len, sizeof(Syn), compareSyn);
! 
! 	if (need_pfree)
! 		pfree(key.in);
  
  	if (!found)
  		PG_RETURN_POINTER(NULL);
