Re: PossitionIndex (was: Re: [Zope-dev] ZCatalog phrase indexingrevisited)
Chris McDonough wrote:
> It just occurred to me that depending on the splitter to do positions makes it impossible to alter the splitter without reindexing the whole text index... but I think this is a reasonable tradeoff. Other opinions welcome.

This raises the question of how dependent the splitter is on the particularities of the document source. I do not really see how different splitters could be useful for one single document. This is perhaps less obvious than it appears, as you may want to use different splitters for documents in different languages. Taken as a whole, I would say that choosing a splitter is a decision that has to be made at indexing time anyway. But perhaps it's just my imagination that is lacking.

There is a much greater dependence on the lexicon here. And indeed, several different lexicons could be applied to a set of documents, depending on what is wanted.

my 2 cents

Rik

___ Zope-Dev maillist - [EMAIL PROTECTED] http://lists.zope.org/mailman/listinfo/zope-dev ** No cross posts or HTML encoding! ** (Related lists - http://lists.zope.org/mailman/listinfo/zope-announce http://lists.zope.org/mailman/listinfo/zope )
Re: PossitionIndex (was: Re: [Zope-dev] ZCatalog phrase indexingrevisited)
> > Once you're satisfied with the implementation, would you be willing to submit the module to the collector?
>
> Do you think you (or someone else for that matter) could have a look at [1], the method that returns the position in the document, positionInDoc(), to see how that could be made to run much faster? Maybe it is how it is used... It is too slow to be very useful when indexing large amounts of data. Anyway, I suck at making Python fast (or using it the right way, whichever I've fallen prey to this time ;-), and any hints would be greatly appreciated. I've been indexing and searching a lot this weekend, and bar that problem with the indexing speed it seems OK and I have no issues submitting it to the Collector.

Doing something similar (in fact, what I needed was citations of word usage), I took a two-step approach, with the idea that most of the actual returning of results would have to be done on a much smaller subset of documents than if you had to index all documents with word indexes and positions. I use a normal TextIndex for querying. Then, if a document is returned by the query, I start processing the documents. This requires parsing the query in a slightly different way (throw out the NOTs). The two-step approach has the advantage that you can postpone processing the actual documents until you return the results for those specific documents.

Using your positionInDoc will require a _lot_ of processing (why does it use string.split btw and not Splitter?; why split on and not on string.whitespace?). I have used string.find for finding word positions, which is probably faster than looping over a list of words. BTW, I'd rather use Splitter, but word positions appeared not to be reliable (a bug, or something I didn't understand; anyhow, string.find works for me and is fast):

    import string

    def splitit(txt, word):
        # Collect the start offset of every occurrence of word in txt.
        positions = []
        start = 0
        while 1:
            res = string.find(txt, word, start)
            if res == -1:
                break
            else:
                start = res + 1
                positions.append(res)
        return positions

<sidenote>Using re would perhaps also be an option, but allowing regular expressions will complicate searching a lot, so I use the globbing lexicon for expanding and then do the matching on the expanded items (if necessary - not if using [wordpart]*)</sidenote>

Advantages of using this approach:
- it's faster.
- it splits up the query processing into different subparts, which also contributes to speeding things up.
- it's also more flexible, as you can divide searching and parsing over different web requests, and even make them dependent on the number of results. For example: why return text fragments from all documents if your users will not be able to see all the results anyway? Or why return all fragments containing word combinations from one single document when returning a few occurrences from different documents is more useful for your users? Note that this will mainly affect returning text fragments, which may or may not be useful.

There are also a couple of disadvantages (as I see them, but there may be more):
- it only works with exact positions of words in the text, not with word numbers. The "within two words" approach may be remedied by using string.split on substrings, however, if really needed. Depending on your purposes, an even rougher approach is to assume some default length for words (this is a bit faster). These are not very elegant solutions, though.
- because the approach is not so coupled with (Z)Catalog, integration strategies are less obvious (at least for me)
- the positionIndex might be used for further processing as is; in my approach this is less obvious.
another 2 cents

Rik
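The two-step approach described in this message could be sketched roughly as follows. This is only an illustration under stated assumptions: `find_positions`, `fragments` and `two_step_search` are hypothetical names, the documents are plain strings held in a dict, and the list of candidate ids stands in for whatever the normal TextIndex query returned.

```python
def find_positions(text, word):
    """Return the character offset of every occurrence of word in text."""
    positions = []
    start = 0
    while True:
        hit = text.find(word, start)
        if hit == -1:
            break
        positions.append(hit)
        start = hit + 1
    return positions


def fragments(text, word, context=10, limit=2):
    """Return at most `limit` text fragments surrounding `word`."""
    result = []
    for pos in find_positions(text, word)[:limit]:
        begin = max(0, pos - context)
        end = min(len(text), pos + len(word) + context)
        result.append(text[begin:end])
    return result


def two_step_search(documents, candidate_ids, word, max_docs=10):
    """Step 2 only: scan just the documents the index query returned."""
    hits = {}
    for doc_id in candidate_ids[:max_docs]:
        found = fragments(documents[doc_id], word)
        if found:
            hits[doc_id] = found
    return hits
```

The `limit` and `max_docs` parameters reflect the point made above that there is little reason to extract fragments the user will never see.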
Re: [Zope-dev] DCOracle2 Beta 3
On Saturday, June 16, 2001, at 08:41 PM, Andreas Repp wrote:
> (quoting himself)
> > ... To solve the performance problem I'd suggest to cache the user connections for a certain period of time. ...
>
> Of course I haven't meant to 'cache' the connections but to keep them alive while the respective user is active and shut them down after a timeout period has passed.

I was reading the Oracle9i OCI documentation (Oracle 9i was just released) and one of the interesting things it has is connection pooling, with some kind of proxy authorization. As I get a chance I may investigate this further; first to see if it works, then to get a feel for how to integrate it into Zope. It looks like an extension of something you could already do, but the documentation is frustratingly light on examples.

In any case, the promising part was that you could leave the proxied user's password blank, as long as the session was authorized for proxies. That would eliminate the whole necessity of having to track the database-level user passwords.

> ##default_connection_id=(changed via ZSQL manage_main Interface)
> ##connection_type=[ standard | custom | optional ]
> ##connection_options=(individual Oracle Connection String)
> sql yada yada
>
> 'standard' = obvious
> 'custom' = forced custom connection - will raise error if connection_options don't have a valid connection string
> 'optional' = try 'custom' and fall back to 'standard' if it fails
>
> btw: would be nice to have a drop box in the manage_main screen for this stuff in a _far future_ release of ZSQL-Methods ;-)

Well, there are still userid mapping problems present. It's not necessarily reasonable to assume there is a 1:1 mapping between Zope userids and the database userids. What I'd probably be inclined to do is change the *connection* object to look for some other acquired authenticator, which it could use to start a session. That way you can write your own authentication conversions and plug them in.
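The "keep the connection alive while the user is active, shut it down after a timeout" idea quoted above might look something like the sketch below. It is not DCOracle2 or Zope code: `IdleConnectionCache` is a hypothetical name, `connect` is any callable that returns an object with a `close()` method, and the clock is injectable only so the timeout can be exercised without waiting.

```python
import time


class IdleConnectionCache:
    """Keep one connection per user and close it after an idle timeout."""

    def __init__(self, connect, timeout=300.0, clock=time.time):
        self._connect = connect      # callable: user id -> connection
        self._timeout = timeout      # seconds of inactivity allowed
        self._clock = clock
        self._entries = {}           # user id -> (connection, last used)

    def get(self, user_id):
        """Return the user's live connection, opening one if needed."""
        self.expire()
        entry = self._entries.get(user_id)
        conn = entry[0] if entry else self._connect(user_id)
        self._entries[user_id] = (conn, self._clock())
        return conn

    def expire(self):
        """Close and forget every connection idle for longer than the timeout."""
        now = self._clock()
        for user_id, (conn, last_used) in list(self._entries.items()):
            if now - last_used > self._timeout:
                conn.close()
                del self._entries[user_id]
```

Mapping Zope userids to database userids, the open problem raised in the reply, is deliberately left out; the `connect` callable would be the place to plug such a conversion in.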
[Zope-dev] SI prefixes and Zope
Hi,

I suggest making the prefixes Zope uses for object sizes consistent with the SI binary prefixes - see http://physics.nist.gov/cuu/Units/binary.html

Using SI binary prefixes for object sizes will avoid any misunderstanding about the real object size.

Petr

--
Petr Knpek
Network Development
NEXTRA Czech Republic s.r.o. http://www.nextra.cz/
V Celnici 10 / CZ - 117 21 Praha 1 / Czech Republic
Tel: +420/2/96 355 111 / Mobile: +420/604-202 611
E-Mail: [EMAIL PROTECTED]
Contact address: Hlinky 114 / CZ - 603 00 Brno / Czech Republic
Tel: +420/5/43 554 150 / FAX: +420/5/43 554 214
see Disclaimer http://www.nextra.cz/disclaimer/
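For illustration, a size formatter following the binary prefixes described on the NIST page (kibi, mebi, gibi, ..., as standardized by the IEC) might look like this. The function name `format_size` and the one-decimal output format are assumptions, not anything Zope provides.

```python
BINARY_PREFIXES = ["", "Ki", "Mi", "Gi", "Ti", "Pi", "Ei"]


def format_size(num_bytes):
    """Format a byte count using binary prefixes (1 KiB = 1024 B)."""
    value = float(num_bytes)
    for prefix in BINARY_PREFIXES:
        # Stop once the value fits, or when we run out of prefixes.
        if value < 1024.0 or prefix == BINARY_PREFIXES[-1]:
            break
        value /= 1024.0
    if prefix == "":
        return "%d B" % num_bytes
    return "%.1f %sB" % (value, prefix)
```

With this, an object of 1536 bytes is shown as `1.5 KiB`, which cannot be mistaken for 1.5 times 1000 bytes.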
Re: [Zope-dev] dtml-in batching improved
On Wed, Jun 13, 2001 at 04:28:12PM -0700, Jean Lagarde wrote:
> Good day all,
>
> Here is the original code, with my annotated change (I deleted an if test in two places):
>
>     for index in range(first,end):
>         # preset
>         kw['previous-sequence']= 0
>         kw['next-sequence']= 0 # now more often defined then previously
>         #
>         if index==first or index==last:
>             # provide batching information
>             if first > 0:
>                 pstart,pend,psize=opt(0,first+overlap,
>                                       sz,orphan,sequence)
> deleted this test -- if index==first: kw['previous-sequence']=1
>                 kw['previous-sequence-start-index']=pstart-1
>                 kw['previous-sequence-end-index']=pend-1
>                 kw['previous-sequence-size']=pend+1-pstart
>
> (more similar code removed)

This is basically my patch #1. It makes previous-sequence-* and next-sequence-* available throughout the entire dtml-in loop.

This sounds like a good fix, but people may rely on these variables being set only at the start and the end of the iteration, respectively. So this patch may break existing DTML code.

That's why I suggested patch #2, which introduces new variables. Old code will continue to work, but people who want the problem fixed can use the newly introduced variables, which are available throughout the iteration.

Ivo

--
Drs. I.R. van der Wijk -=- Brouwersgracht 132 Amaze Internet Services V.O.F. 1013 HA Amsterdam -=- Tel: +31-20-4688336 Linux/Web/Zope/SQL Fax: +31-20-4688337 Network Solutions Web: http://www.amaze.nl/Consultancy Email: [EMAIL PROTECTED] -=-
Re: [Zope-dev] what transaction does get_transaction().commit() really commit?
Jephte Clain wrote:
> the question is: what transaction is committed with get_transaction().commit()? Is it only the one associated with the connection, or also the transaction the caller is in? I mean, if my method is called from within Zope, is the transaction of the caller committed?

In the default implementation, there is one transaction per thread. When you commit(), you commit everything your caller changed as well. Connections are somewhat independent of transactions, if that helps.

Shane
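A toy model may make the "one transaction per thread" point concrete. Nothing below is ZODB code: `ToyTransaction` and `get_toy_transaction` are hypothetical stand-ins that only mimic the behaviour described in the reply, namely that caller and callee in the same thread share one transaction.

```python
import threading

_local = threading.local()


class ToyTransaction:
    """Hypothetical stand-in, not ZODB's Transaction class."""

    def __init__(self):
        self.pending = []
        self.committed = []

    def register(self, change):
        self.pending.append(change)

    def commit(self):
        # Everything registered in this thread is committed together.
        self.committed.extend(self.pending)
        self.pending = []


def get_toy_transaction():
    """Return the single transaction bound to the calling thread."""
    if not hasattr(_local, "txn"):
        _local.txn = ToyTransaction()
    return _local.txn


def callee():
    get_toy_transaction().register("callee change")
    get_toy_transaction().commit()


def caller():
    get_toy_transaction().register("caller change")
    callee()  # the commit inside also commits the caller's change
    return get_toy_transaction().committed
```

In this model, a method that wants to commit only its own work has no way to do so, which matches the answer given above.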
[Zope-dev] Zope/z2 security issues
Hi All,

We're currently working on some security issues when running Zope in an INSTANCE_HOME (multiple instances running as different users) setup.

The first issue is tightening of the current security. We introduced the ability to set the group ID under which the server runs, so you can put Zope users in a zope group and not give non-group members read/execute permission (o-rwx) on the instance homes, Zope directories, product directories, etc. These patches (along with a zopectl patch) can be found at: http://www.zope.org/Members/maurice

The second issue: if z2.py is started as root, it will setuid() either to nobody or to the username supplied with -u. However, z2.py initializes logging while still running as root, by importing/using ZLogger. This means that the logfiles in INSTANCE_HOME/var will be owned by root (if they did not yet exist), while you would want them to be owned by nobody or by the user the instance should be running as. z2.py setuid()'s to the non-root user after (optionally) opening privileged ports.

If you symlink the logfiles in INSTANCE_HOME/var to /etc (or worse, /etc/passwd or /etc/shadow), you might even be able to destroy these files or insert data into them.

The correct solution would probably be something like:

- run as root
- seteuid(non-root-user)
- initialize logging      -\
- seteuid(root)             } optional (only if privileged ports are required)
- open privileged ports   -/
- setuid(non-root-user)

(note both the setEuid and setuid calls)

However, Python 1.5.2 does not have seteuid(), so this can/will only work with 2.0/2.1 or Zope 2.4 (which requires 2.1).

We don't want to run our production zope servers with python 1.5.2, so our current patch consists of a setuid() at the top of z2.py; we don't run Zope on privileged ports anyway. If anyone wants a patch for this, please mail me.

Cheers,

Ivo

--
Drs. I.R. van der Wijk -=- Brouwersgracht 132 Amaze Internet Services V.O.F. 1013 HA Amsterdam -=- Tel: +31-20-4688336 Linux/Web/Zope/SQL Fax: +31-20-4688337 Network Solutions Web: http://www.amaze.nl/Consultancy Email: [EMAIL PROTECTED] -=-
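The startup order proposed above could be sketched as follows. This is not the z2.py patch: the function name, the callbacks and the `osmod` parameter are assumptions made for illustration, and group handling is left out. One deliberate difference from the list: the sketch always switches back to root before the final `setuid()`, because `setuid()` called without root privileges changes only the effective user ID and would leave a way back to root.

```python
import os


def start_with_dropped_privileges(uid, init_logging, open_ports=None, osmod=os):
    """Run the startup steps in a privilege-safe order.

    osmod defaults to the real os module; it is a parameter only so the
    ordering can be exercised without root privileges.
    """
    osmod.seteuid(uid)       # temporarily become the unprivileged user
    init_logging()           # log files are now created as that user
    # Regain root: needed for privileged ports, and also because setuid()
    # only drops the real and saved IDs when called with root privileges.
    osmod.seteuid(0)
    if open_ports is not None:
        open_ports()         # e.g. bind ports below 1024
    osmod.setuid(uid)        # permanently drop root
```

Because the log files are created while the effective user is already the instance user, a symlink planted in INSTANCE_HOME/var can no longer be followed with root's permissions.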
Re: PossitionIndex (was: Re: [Zope-dev] ZCatalog phrase indexingrevisited)
On Sun, 17 Jun 2001, Chris McDonough wrote:
> index_object, because the splitter return has all the words in order, even the dupes... as you iterate, you can mutate

Is this part of the current formal Splitter Interface? If not, it needs to be if other code is going to depend on it.

Oh, yeah, and where is the formal Splitter interface documented <grin>? I don't see anything in SearchIndex, and a search for "splitter interface" on zope.org didn't turn up anything useful.

--RDM
[Zope-dev] replicating storages
We are considering storing some mission-critical data inside a ZODB, but are worried about data loss if the ZEO server should be destroyed. Has anyone looked at solving this problem?

Replicated-ZEO looks like it will eventually be the ideal solution; the use case is documented at http://www.zope.org/Wikis/DevSite/Projects/ZEOReplicatedStorage/SurviveTotalLossOfStorage

Until then, http://www.it.uc3m.es/~ptb/nbd/ looks promising. Has anyone tried FileStorage over a network block device?

Toby Dickenson [EMAIL PROTECTED]
Re: PossitionIndex (was: Re: [Zope-dev] ZCatalog phrase indexingrevisited)
The Splitter interface is not really documented. However, Zope 2.4 has much better support for 3rd party splitters.

Andreas

- Original Message -
From: R. David Murray [EMAIL PROTECTED]
To: Chris McDonough [EMAIL PROTECTED]
Cc: Erik Enge [EMAIL PROTECTED]; [EMAIL PROTECTED]
Sent: Monday, June 18, 2001 11:39 AM
Subject: Re: PossitionIndex (was: Re: [Zope-dev] ZCatalog phrase indexingrevisited)

> On Sun, 17 Jun 2001, Chris McDonough wrote:
> > index_object, because the splitter return has all the words in order, even the dupes... as you iterate, you can mutate
>
> Is this part of the current formal Splitter Interface? If not, it needs to be if other code is going to depend on it.
>
> Oh, yeah, and where is the formal Splitter interface documented <grin>? I don't see anything in SearchIndex, and a search for "splitter interface" on zope.org didn't turn up anything useful.
>
> --RDM
[Zope-dev] Proposed proposals: password encryption, ZODB RAM
Here are a couple of ideas I'd like to toss out. Proposals can take a lot of time to write and it might be easier this way to flesh out the details.

1) Optional password encryption. Right now passwords are stored as clear text. What's interesting is that Zope can already authenticate against SHA encrypted passwords, it just won't encrypt user passwords unless you force it to. As a test of Zope's ability to authenticate against encrypted passwords, I sneakily implemented the inituser changes with SHA encryption by default. That means that the password for the initial user stored in the database is not possible to decrypt and yet nobody has had any problems with it AFAIK. Since it has been successful, I'd like to suggest we add a checkbox to basic user folders that enables encryption for new passwords, and have it turned on by default. The risk is incompatibility with HTTP digest auth, which I imagine nobody is using right now.

2) If cPickle were to do something similar to intern-ing strings when loading objects from the ZODB, Zope might consume significantly less RAM on busy servers. ZODB uses lots of strings. ZODB caches cannot be shared among threads. But strings, being immutable, can be safely shared. We couldn't just intern the strings since that would make them immortal, but if we used weak references it could work. The only risk is the speed impact during loading of objects.

Shane
[Zope-dev] dtml-in batching badly
Hi,

First, I don't normally post to this list; is it the best place to discuss an apparent bug?

Anyway, the lowdown: if you iterate over a list with a batch size of 1, it messes up towards the end of the sequence. For example, the following code:

    <dtml-call "REQUEST.set('hoo',(1,2,3,4))">
    <dtml-in hoo>
    calling &lt;dtml-in hoo size=1 start=<dtml-var sequence-item>&gt;:<br>
    <dtml-in hoo size=1 start=sequence-item>
    <dtml-var sequence-item>
    </dtml-in>
    <hr>
    </dtml-in>

produces the following output:

    calling <dtml-in hoo size=1 start=1>: 1
    calling <dtml-in hoo size=1 start=2>: 2 3 4
    calling <dtml-in hoo size=1 start=3>: 3 4
    calling <dtml-in hoo size=1 start=4>: 4

That's not the behaviour I'd expect. Can anyone confirm this is a bug?

Cheers,

seb
Re: [Zope-dev] dtml-in batching badly
No, it's not a bug (in code) but a feature. You may call it a bug (in usability), though. What causes it is that the orphan value defaults to 3 when not explicitly set. See the online help for the in tag.

LEE Kwan Soo
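A simplified model of the orphan rule reproduces the output reported in the original post. This is not the dtml-in implementation, only a sketch of the behaviour described above: if fewer than `orphan` items would be left over after a batch, they are pulled into that batch. The names `batch_end` and `batch` are made up for the example.

```python
def batch_end(start, size, orphan, length):
    """Return the 1-based index of the last item in the batch."""
    end = start + size - 1
    # If fewer than `orphan` items would remain after this batch,
    # they are absorbed into it instead of forming a tiny final batch.
    if end + orphan > length:
        end = length
    return end


def batch(sequence, start, size=1, orphan=3):
    """Return the items a batch starting at 1-based `start` would show."""
    end = batch_end(start, size, orphan, len(sequence))
    return list(sequence[start - 1:end])
```

For `hoo = (1, 2, 3, 4)` and `size=1`, a start of 2 leaves only two items after the batch, which is fewer than the default orphan of 3, so the batch grows to `2 3 4`; passing `orphan=0` gives the single item the poster expected.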
Re: [Zope-dev] dtml-in batching badly
> That's not the behaviour I'd expect. Can anyone confirm this is a bug?

As LEE Kwan Soo has already said, it is not a bug, but a clever (too clever?) feature that should maybe not be enabled by default. Every second week or so somebody runs into this and thinks it is a bug.

To the DC people: Do you think it would break any code badly if the default behaviour was orphans=0?
Re: [Zope-dev] Proposed proposals: password encryption, ZODB RAM
On Mon, Jun 18, 2001 at 12:28:54PM -0400, Shane Hathaway wrote:
> 1) Optional password encryption. Right now passwords are stored as clear text. What's interesting is that Zope can already authenticate against SHA encrypted passwords, it just won't encrypt user passwords unless you force it to. As a test of Zope's ability to authenticate against encrypted passwords, I sneakily implemented the inituser changes with SHA encryption by default. That means that the password for the initial user stored in the database is not possible to decrypt and yet nobody has had any problems with it AFAIK. Since it has been successful, I'd like to suggest we add a checkbox to basic user folders that enables encryption for new passwords, and have it turned on by default. The risk is incompatibility with HTTP digest auth, which I imagine nobody is using right now.

There is already a proposal for this:

http://dev.zope.org/Wikis/DevSite/Proposals/EncryptedUserfolderPasswords

You could, of course, create a counter proposal..

--
Martijn Pieters | Software Engineer mailto:[EMAIL PROTECTED] | Digital Creations http://www.digicool.com/ | Creators of Zope http://www.zope.org/
Re: [Zope-dev] Where did DocumentTemplate/VSEval.py go in 2.4.0a1?
Michel Pelletier wrote:
> Should we make an alias for bw-compatibility?

This is now in the trunk, along with some other compatibility changes that allow Python Methods to continue working, courtesy of the NewZopeOrg migration project.

Cheers,

Evan @ digicool
Re: [Zope-dev] dtml-in batching badly
Joachim Werner wrote:
> > That's not the behaviour I'd expect. Can anyone confirm this is a bug?
>
> As LEE Kwan Soo has already said, it is not a bug, but a clever (too clever?) feature that should maybe not be enabled by default. Every second week or so somebody runs into this and thinks it is a bug.
>
> To the DC people: Do you think it would break any code badly if the default behaviour was orphans=0?

Here is a (again perhaps too clever?) suggestion. Make the orphan value equal zero by default if size = 3. Otherwise keep it at three. I'm not opposed to making it zero all the time either.

--
| Casey Duncan | Kaivo, Inc. | [EMAIL PROTECTED]
Re: [Zope-dev] Proposed proposals: password encryption, ZODB RAM
On Monday 18 June 2001 15:33, Martijn Pieters wrote:
> On Mon, Jun 18, 2001 at 12:28:54PM -0400, Shane Hathaway wrote:
> > 1) Optional password encryption. Right now passwords are stored as clear text. What's interesting is that Zope can already authenticate against SHA encrypted passwords, it just won't encrypt user passwords unless you force it to. As a test of Zope's ability to authenticate against encrypted passwords, I sneakily implemented the inituser changes with SHA encryption by default. That means that the password for the initial user stored in the database is not possible to decrypt and yet nobody has had any problems with it AFAIK. Since it has been successful, I'd like to suggest we add a checkbox to basic user folders that enables encryption for new passwords, and have it turned on by default. The risk is incompatibility with HTTP digest auth, which I imagine nobody is using right now.
>
> There is already a proposal for this:
>
> http://dev.zope.org/Wikis/DevSite/Proposals/EncryptedUserfolderPasswords
>
> You could, of course, create a counter proposal..

I'm suggesting a checkbox that enables and disables encryption. Enabling encryption is actually very simple--I've had it enabled on my own box for nearly a year. :-)

Shane
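As a rough picture of what "authenticating against SHA encrypted passwords" involves, here is a minimal sketch. It is not Zope's user folder code: the function names are hypothetical, the `{SHA}` prefix follows the common LDAP-style convention and is assumed here, and unsalted SHA-1 is used only because it mirrors the 2001 discussion, not because it would be a sensible choice for a new system.

```python
import base64
import hashlib
import hmac

PREFIX = "{SHA}"


def encrypt_password(password):
    """Return a one-way '{SHA}...' representation of the password."""
    digest = hashlib.sha1(password.encode("utf-8")).digest()
    return PREFIX + base64.b64encode(digest).decode("ascii")


def validate_password(stored, attempt):
    """Check an attempt against a stored value, hashed or clear text."""
    if stored.startswith(PREFIX):
        candidate = encrypt_password(attempt)
    else:
        # Clear-text entry, as in a user folder with encryption disabled.
        candidate = attempt
    return hmac.compare_digest(stored.encode("utf-8"), candidate.encode("utf-8"))
```

The checkbox being proposed would then only decide whether new passwords pass through `encrypt_password` before being stored; validation works for both kinds of entry. It also shows why HTTP digest auth is at risk: the server no longer has the clear-text password that digest auth needs.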
Re: PossitionIndex (was: Re: [Zope-dev] ZCatalog phrase indexingrevisited)
Rik Hoekstra writes:
> This raises the question how dependent the splitter on the paticularities of the document source - I do not really see how different splitters could be useful for one single document. This is perhaps less obvious than it appears, as you may want to use different splitters for documents in different languages. Taken as a whole I would say choosing a splitter would be a decision that had to be taken at indexing time anyway. But perhaps it's just my imagination that is lacking.

There are lots of things you may want to change based on experience with your index:

* change the set of token boundary characters
  They define where words are broken out.

* change the set of removed characters
  They are removed from the words, usually for normalization. In German, e.g., you can write both "Auto-Lackierer" and "Autolackierer". You want to normalize these different spellings.

* change the set of composing characters
  German is very rich in composite terms. You may want to index under each component term. For this, you need the rules on how the composition is built. For text, it is usually '-'. But if you have computer sources, '_' or ':' may be relevant, too.

Of course, the search must follow the same splitting rules as the indexing did. Changing the rules (the splitter or its configuration) after indexing will make the index inconsistent.

Dieter
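The three knobs listed above can be pictured with a small configurable splitter. This is a sketch, not the Zope Splitter: the class name and parameters are invented, whitespace is always treated as a boundary, and lowercasing is an extra normalization step assumed for the example.

```python
class ConfigurableSplitter:
    """Toy splitter with boundary, removed and composing character sets."""

    def __init__(self, boundaries=".,;!?", removed="", composing=""):
        self.boundaries = boundaries  # characters that end a token
        self.removed = removed        # characters dropped from tokens
        self.composing = composing    # characters joining component terms

    def split(self, text):
        for ch in self.boundaries:
            text = text.replace(ch, " ")
        words = []
        for token in text.split():
            # Break the token into its component terms, if any.
            parts = [token]
            for ch in self.composing:
                parts = [piece for part in parts for piece in part.split(ch)]
            parts = [p for p in parts if p]
            # Normalize the whole token by dropping the removed characters.
            whole = token
            for ch in self.removed:
                whole = whole.replace(ch, "")
            if whole:
                words.append(whole.lower())
            if len(parts) > 1:
                words.extend(p.lower() for p in parts)
        return words
```

With `removed="-"` and `composing="-"`, both "Auto-Lackierer" and "Autolackierer" produce the word `autolackierer`, and the hyphenated form is additionally indexed under `auto` and `lackierer`. A splitter configured without these sets yields `auto-lackierer` instead, which is why an index built with one configuration cannot be searched reliably with another.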
[Zope-dev] better history diff output ?
Hi all,

I want to highlight a wiki page's last changes, so I need a better display of diffs between two revisions. Can you recommend any good free diff code out there with nice HTML output?
Re: PossitionIndex (was: Re: [Zope-dev] ZCatalog phrase indexingrevisited)
These are good ideas to improve the TextIndex. I already encouraged Erik to put it all together into a Fishbowl proposal.

Andreas

- Original Message -
From: Dieter Maurer [EMAIL PROTECTED]
To: Rik Hoekstra [EMAIL PROTECTED]
Cc: Chris McDonough [EMAIL PROTECTED]; Erik Enge [EMAIL PROTECTED]; [EMAIL PROTECTED]
Sent: Monday, June 18, 2001 4:59 PM
Subject: Re: PossitionIndex (was: Re: [Zope-dev] ZCatalog phrase indexingrevisited)

> Rik Hoekstra writes:
> > This raises the question how dependent the splitter on the paticularities of the document source - I do not really see how different splitters could be useful for one single document. This is perhaps less obvious than it appears, as you may want to use different splitters for documents in different languages. Taken as a whole I would say choosing a splitter would be a decision that had to be taken at indexing time anyway. But perhaps it's just my imagination that is lacking.
>
> There are lots of things you may want to change based on experience with your index:
>
> * change the set of token boundary characters
>   They define where words are broken out.
>
> * change the set of removed characters
>   They are removed from the words, usually for normalization. In German, e.g., you can write both "Auto-Lackierer" and "Autolackierer". You want to normalize these different spellings.
>
> * change the set of composing characters
>   German is very rich in composite terms. You may want to index under each component term. For this, you need the rules on how the composition is built. For text, it is usually '-'. But if you have computer sources, '_' or ':' may be relevant, too.
>
> Of course, the search must follow the same splitting rules as the indexing did. Changing the rules (the splitter or its configuration) after indexing will make the index inconsistent.
>
> Dieter
[Zope-dev] decapitate() still needed ?
from http://zwiki.org/FrontPage:

> simon - why is the antidecapitation kludge there?
>
> tav - it was a temporary way around an undocumented feature of DTML Documents, namely if your document's text begins with one or more lines that look like a http: header, they get moved into the actual http headers. Naturally it has led to complications.
>
> aha!
>
> deltab: the undocumented feature seems rather unnecessary - DTML Documents can set their header fields through the RESPONSE object.
>
> db - I would dearly love for it to vanish. No time to raise the issue on zope-dev or tracker right now though --SM

Well, I guess that's not true. Raising it here.

-Simon
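The behaviour described in the wiki excerpt, leading lines that look like HTTP headers being lifted out of the document text, can be imitated with a few lines of Python. This is a hypothetical reconstruction for illustration only, not Zope's decapitate(); the regular expression and the function name are assumptions.

```python
import re

# A line "looks like a header" here if it is Name: value.
_HEADER_LINE = re.compile(r"^([A-Za-z][A-Za-z0-9-]*):[ \t]*(.*)$")


def split_leading_headers(text):
    """Separate leading header-like lines from the rest of the text."""
    headers = []
    lines = text.split("\n")
    index = 0
    while index < len(lines):
        match = _HEADER_LINE.match(lines[index])
        if match is None:
            break
        headers.append((match.group(1), match.group(2)))
        index += 1
    return headers, "\n".join(lines[index:])
```

The complication mentioned above shows up immediately: a wiki page whose first line is "Note: read this first" would lose that line from its body, since nothing distinguishes it from a real header.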