It just occurred to me that relying on the splitter to
determine positions makes it impossible to change the
splitter without reindexing the whole text index... but I
think this is a reasonable tradeoff. Other opinions welcome.
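
To make the tradeoff concrete, here is a minimal sketch (modern Python, not code from PositionIndex) of how a word's stored position shifts when the splitter changes. Both splitter functions and the sample text are hypothetical stand-ins, not the real Zope splitter.

```python
import re

def split_on_whitespace(source):
    """Hypothetical 'old' splitter: break on whitespace only."""
    return source.lower().split()

def split_on_word_chars(source):
    """Hypothetical 'new' splitter: break on anything that is not a letter or digit."""
    return re.findall(r"[a-z0-9]+", source.lower())

def positions(splitter, source, word):
    """Return the positions of `word` in the splitter's output for `source`."""
    return [i for i, w in enumerate(splitter(source)) if w == word]

text = "full-text index"
old = positions(split_on_whitespace, text, "index")   # "full-text" counts as one word
new = positions(split_on_word_chars, text, "index")   # "full" and "text" are two words
print(old, new)
```

Positions recorded under the first splitter no longer line up with what the second one produces, which is why swapping splitters means reindexing.
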
On Sun, 17 Jun 2001 15:57:20 -0400
"Chris McDonough" <[EMAIL PROTECTED]> wrote:
> On Sun, 17 Jun 2001 21:05:47 +0200 (CEST)
> Erik Enge <[EMAIL PROTECTED]> wrote:
> > On Fri, 15 Jun 2001, Chris McDonough wrote:
> > > Once you're satisfied with the implementation, would you be
> > > willing to submit the module to the collector?
> > Do you think you (or someone else, for that matter) could
> > have a look at the method that returns the position in the
> > document - positionInDoc() - to see how it could be made to
> > run much faster? Maybe it is how it is used... It is too slow
> > to be very useful when indexing large amounts of data.
> It looks like you call proximityInsert for each item
> returned from the splitter on the doc source. Instead of
> looking for the position in the source document by splitting
> the source up again within proximityInsert, you can keep a
> simple counter while you iterate over the splitter return in
> index_object, because the splitter return has all the words
> in order, even the dupes... as you iterate, you can update
> the position entry for that word/documentId pair within
> proximityInsert. You never actually need to manually scan
> the document source; instead, just always rely on the
> splitter to bust up the doc, and manipulate the position
> list in place. This is not the most efficient way, but it's
> more efficient than your current way.
> Therefore, the bit in index_object becomes:
>     i = 0
>     for word in splitter(source):
>         self.proximityInsert(word, documentId, i)
>         i = i + 1
> The proximityInsert method becomes:
>     def proximityInsert(self, word, documentId, i):
>         """Insert proximity information about this wid (word id)
>         into the index' proximity bucket."""
>         if not prox.has_key(wid):
>             self._p_changed = 1
>         if i in prox[wid][documentId]: return
>         self._p_changed = 1
> .. and the positionInDoc method goes away.
> I didn't scan too hard for what else in the source this
> would break.
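
For reference, a self-contained sketch of the counter approach described above, written in modern Python rather than the has_key-era style of the original. It is only an illustration under stated assumptions: the class name, the regex splitter, and the use of plain dicts keyed by the word itself (rather than by wid, with persistence and _p_changed handling) are all hypothetical and are not taken from PositionIndex.py.

```python
import re

def simple_splitter(source):
    """Hypothetical stand-in for the real splitter: words in order, dupes included."""
    return re.findall(r"\w+", source.lower())

class PositionIndexSketch:
    """Hypothetical in-memory index: word -> documentId -> list of positions."""

    def __init__(self, splitter=simple_splitter):
        self.splitter = splitter
        self.prox = {}

    def index_object(self, documentId, source):
        # enumerate() plays the role of the manual counter i; the source is
        # split exactly once and never scanned again.
        for i, word in enumerate(self.splitter(source)):
            self.proximityInsert(word, documentId, i)

    def proximityInsert(self, word, documentId, i):
        # Create the nested entries on first sight, then record the position
        # unless it is already present.
        positions = self.prox.setdefault(word, {}).setdefault(documentId, [])
        if i not in positions:
            positions.append(i)

index = PositionIndexSketch()
index.index_object(1, "the cat saw the dog")
print(index.prox["the"][1])  # [0, 3]
```

Because the position comes straight from the loop counter, the cost per word no longer depends on the document's length, which is the point of dropping positionInDoc.
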
> > Anyway, I suck at making Python fast (or using it the right
> > way, whichever I've fallen prey to this time ;-), and any
> > hints would be greatly appreciated.
> > I've been indexing and searching a lot this weekend, and bar
> > that problem with the indexing speed it seems ok and I have
> > no problem submitting it to the Collector.
> >  <URL:http://nittin.net/erik/software/PositionIndex/PositionIndex.py>
Zope-Dev maillist - [EMAIL PROTECTED]