Hi guys,
 
I have a very specific search engine need.
I have run across Nutch and it sounds very promising.
Thanks for your hard work on it.
 
Here is what I need to do:
I need to be able to find every variation of a domain URL, that is,
every URL made up of the domain followed by a variable part.
 
An example of what I need to do is found in my home business on the Internet.
My site is www.RetireQuickly.com/62798, with www.RetireQuickly.com
being the domain URL and the "/62798" being the variable part of the
URL, an ID that is unique to me. All of the 50,000+ Retire Quickly reps
have this URL, www.RetireQuickly.com, followed by 2 to (currently) 5
digits that make the replicated site unique to the representative
who "owns" it. When I search Google for www.RetireQuickly.com
I get around 388 results (many of them duplicates), when I know there should be
at least 50,000 www.RetireQuickly.com URLs with unique rep IDs.
 
Another example would be www.NewVision.net followed by a "/" and then numbers and/or
letters for the rep ID. My research indicates there are around 600,000 representatives
for the company, which provides replicated sites for them. Yet
a Google search only turns up a few hundred results, with many duplicate links among
them.
 
So my question is this:
Is there a prepackaged search utility that will find ALL the variations of
links to a given domain URL? Would Nutch be a good candidate for this type of
search? Would it take a lot of scripting and/or programming to make it do this?
The search that I need is simple even though the programming behind it
may not be. I need to be able to search the entire Internet for all the variations
of a specified domain while rejecting sites that duplicate peculiar links that have
already been found. Obviously, this is an ongoing "harvesting" project.
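
To make the duplicate-rejection part concrete, here is a minimal sketch in
Python, assuming the links have already been collected by some crawler (Nutch
or otherwise). The names rep_id and harvest, and the 2-to-5 digit pattern, are
illustrative assumptions based on the RetireQuickly example above; none of this
is Nutch code.

```python
import re
from urllib.parse import urlsplit

# Assumption: a rep URL is the domain followed by "/" and a 2-to-5 digit ID.
REP_DOMAIN = "www.retirequickly.com"
REP_PATH = re.compile(r"^/(\d{2,5})/?$")


def rep_id(link):
    """Return the rep ID if the link is a replicated-site URL, else None."""
    if "://" not in link:
        link = "http://" + link  # urlsplit needs a scheme to find the host
    parts = urlsplit(link)
    if parts.hostname != REP_DOMAIN:  # hostname is already lower-cased
        return None
    match = REP_PATH.match(parts.path)
    return match.group(1) if match else None


def harvest(links, seen=None):
    """Return rep IDs not seen before, in order of first appearance.

    `seen` is a set that can be kept between runs, so an ongoing
    harvest only reports IDs that are new.
    """
    seen = set() if seen is None else seen
    new_ids = []
    for link in links:
        rid = rep_id(link)
        if rid is not None and rid not in seen:
            seen.add(rid)
            new_ids.append(rid)
    return new_ids
```

Calling harvest() on each new batch of links with the same `seen` set would
report only the rep IDs that have not turned up before. A site such as
www.NewVision.net, whose IDs mix letters and numbers, would need a different
pattern.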
 
Any help or suggestions that you could give me would be most appreciated.
 
Thank you,
Sam Peeples
423-265-7038
