On 3/31/06, Kent Johnson <[EMAIL PROTECTED]> wrote:
> Steve Nelson wrote:
>
> Do you need help getting started with Python or with inverted indexing
> in particular?

Sorry - I should have been clearer. I'm reasonably confident in Python, and if I get stuck with that side of things I will ask for help. I was struggling more with the inverted indexing bit.

I think I want to write a program that will do text searches on documents, to learn about and compare the two approaches I've heard of - inverted indexing and signature files. So I think I am asking for help at the logical / implementation level.

My understanding so far goes something like this:

Suppose I have three documents - as a trivial but fun example, we could make them song lyrics. The simplest thing that could possibly work would be to supply one or two words and literally scan every word in each song for a match. This has two disadvantages - firstly it is likely to produce false positives, and secondly it is likely to be very inefficient.

The next step would be to introduce an index. I think, again, the simplest thing that could possibly work would be a literal index of every word and every document in which it appears. This would save processing time, but wouldn't be very intelligent.

This is where I think the inverted indexing comes in. As I understand it, we can now produce an index of key words, with the document name and the location in the document for each key word. This makes the search more involved, and more intelligent.

Finally, we could have some logic that does some set analysis to return only results that make sense.

Am I thinking along the right lines, or have I misunderstood? Could someone perhaps come up with some code snippets that make clearer examples?

I also need to think about signature files, but I haven't really any idea how these work.

Thanks all!

S.

> Kent
>
> _______________________________________________
> Tutor maillist - Tutor@python.org
> http://mail.python.org/mailman/listinfo/tutor
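
The progression described in the message (scan every word, then index word -> documents, then add the location of each word, then combine results with set operations) could be sketched roughly as below. This is only a minimal illustration under stated assumptions, not a definitive implementation: the three documents are made-up lines standing in for song lyrics, and the names DOCS, tokenize, linear_scan, build_index, search_all and search_phrase are hypothetical helpers invented for this sketch.

```python
import re
from collections import defaultdict

# Hypothetical sample documents (made-up lines standing in for song lyrics).
DOCS = {
    "song_a": "the sun is shining on the hill",
    "song_b": "the rain is falling on the town",
    "song_c": "sun and rain and sun again",
}


def tokenize(text):
    """Lowercase the text and split it into words."""
    return re.findall(r"[a-z']+", text.lower())


def linear_scan(docs, word):
    """Baseline: scan every word of every document for an exact match."""
    return {name for name, text in docs.items() if word in tokenize(text)}


def build_index(docs):
    """Map each word to {document name: [positions of that word]}."""
    index = defaultdict(lambda: defaultdict(list))
    for name, text in docs.items():
        for pos, word in enumerate(tokenize(text)):
            index[word][name].append(pos)
    return index


def search_all(index, words):
    """Return the documents that contain every query word (set intersection)."""
    doc_sets = [set(index.get(w, {})) for w in words]
    return set.intersection(*doc_sets) if doc_sets else set()


def search_phrase(index, first, second):
    """Return the documents where `second` immediately follows `first`."""
    hits = set()
    for name in search_all(index, [first, second]):
        following = set(index[second][name])
        if any(pos + 1 in following for pos in index[first][name]):
            hits.add(name)
    return hits


index = build_index(DOCS)
print(sorted(linear_scan(DOCS, "sun")))               # ['song_a', 'song_c']
print(sorted(search_all(index, ["sun", "rain"])))     # ['song_c']
print(sorted(search_phrase(index, "is", "shining")))  # ['song_a']
```

One note on terminology: a mapping from each word to the documents that contain it is already what is usually called an inverted index; storing the position of each occurrence as well is often called a positional inverted index, and it is what makes the adjacent-word check in search_phrase possible. The index is built once, so each query becomes a few dictionary lookups and set operations instead of a rescan of every document.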