Hi Nancy,
 
Sorry this is a week late.
 
You have identified one of the common, if not classic, problems with
searching: duplicates.  Duplicates are often a side effect of the
automatic indexing process, where the same source text has hyperlinks
pointing to it from multiple places in the help project, website, or body of
content you are feeding to the indexer.
 
So if the anchor tag looks like this: 
<a href="ports.html">reserved TCP port range</a>
 
...and this complete anchor tag appears two or more times with the same anchor
text ("reserved TCP port range" in this example), and the end user runs a
keyword search on "TCP port," "port range," or something similar, the results
the user gets are analogous to the duplicate results in your example.
 
Google and the rest do a 'link cardinality' trick where they check whether the
same anchor text actually points to the same place on the same page, so they
can pick one of the results to list in the top ten or so hits you get back.  It
is not tough to script this if somebody at your site is handy with Python,
Java, JavaScript, or whatever web application language makes sense.  It comes
down to simple string comparisons.  The sequence is this:
 
Query results come back and are fed to your dedup script -->  The dedup script
looks for results whose links are the same.  If the link is the same and the
anchor text is the same, it deletes the duplicate result record, and repeats
until no duplicates remain.  --> De-duplicated query results are passed to the
user.
 
Just ensure that whoever writes the script gives you a switch to turn it on or 
off so you can check the query results in both cases to see if it is working to 
your satisfaction.
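
For what it is worth, here is a minimal Python sketch of that sequence,
including the on/off switch. It is only an illustration under some
assumptions: the Hit record, its href and anchor_text fields, and the
dedupe_hits function are hypothetical names, and your search tool may hand
back results in a different shape. It treats two results as duplicates when
the link matches exactly and the anchor text matches after collapsing
whitespace.

```python
from typing import Iterable, List, NamedTuple


class Hit(NamedTuple):
    """One search result record (hypothetical shape, for illustration)."""
    href: str         # link target, e.g. "ports.html"
    anchor_text: str  # visible link text, e.g. "reserved TCP port range"


def dedupe_hits(hits: Iterable[Hit], enabled: bool = True) -> List[Hit]:
    """Keep the first hit for each (link, anchor text) pair, in order.

    Set enabled=False to pass the results through untouched, so the
    raw and de-duplicated listings can be compared side by side.
    """
    hits = list(hits)
    if not enabled:
        return hits

    seen = set()
    unique = []
    for hit in hits:
        # Assumption: whitespace differences in the anchor text do not matter.
        key = (hit.href.strip(), " ".join(hit.anchor_text.split()))
        if key in seen:
            continue  # same link and same anchor text: drop the duplicate
        seen.add(key)
        unique.append(hit)
    return unique
```

Calling dedupe_hits(results) gives the filtered list, and
dedupe_hits(results, enabled=False) gives back the original list for
comparison.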
 
Cheers,
 
Reid

________________________________

From: Nancy Allison [mailto:nancy.allis...@verizon.net]
Sent: Wed 7/29/2009 8:32 AM
To: Reid Gray
Subject: [Free Framers] Search produces five identical listings




I've used Frame and Mif2Go to create a .chm file. When I use the Search tab to 
search for a term, I get five identical listings for some results, but not all. 
Example: Search for "debounce" produces

Resistor Debounce Values for First Touch Valuation
Resistor Debounce Values for First Touch Valuation
Resistor Debounce Values for First Touch Valuation
Resistor Debounce Values for First Touch Valuation
Resistor Debounce Values for First Touch Valuation
Using the Favorites Tab
Welcome to the Online Help


The first five listings refer to a lengthy topic, which uses the word 
"debounce" seven times, if it matters.

The second two are the H1 and subhead, in the same topic, which uses the word 
twice.

What is causing the multiple listing, and is there anything I can do about it?

--Nancy

