Sam - I'm not quite sure I follow you, but let's see if this fits...
you want to have a document and see if a query matches it? Please
elaborate more on what you're after. Maybe what you're looking for
is the contrib/memory and the MemoryIndex within that Subversion area.
Erik
On 22
Hey-
I have an indexer at my company that I wrote a while back that indexes database
content (users and their profile)...one of the next req. of the project is to
avoid 'spam' in hits. For example if I do a search for oracle, and oracle is in
25 places in someone's bio field...and another person
Sounds like you might have to consider both, if the first one doesn't solve
your issue. A company field sounds like it's a single entry, i.e. one that
can't be "spammed up" with multiple terms, i.e. "Oracle Oracle Oracle". It
also sounds as if you're searching multiple fields, and that some fields
ok, I am implementing a google adsense/adwords-like
system. For example, the website has keywords "nike
red shoe", so it can match text ad with keywords "nike
shoe -blue". Of course, I can always use the text ad
keywords to match the website's keywords. But it will
take too much resource to hav
Index the keywords of your ads with lucene.
Extract all words from your page (ajax), remove stop words, build a
query from the page words by connecting the words with OR and you will
find the best matching ad.
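
The steps above can be sketched in plain Java, without Lucene itself. This is only an illustration under stated assumptions: the class name PageQueryBuilder, the short stop-word list, and the maxClauses cap are hypothetical, and the returned string is meant to be passed to a Lucene query parser afterwards.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper: turns page text into an OR query string.
final class PageQueryBuilder {
    // Illustrative stop-word list (an assumption); a real one would be longer.
    private static final Set<String> STOP_WORDS = new HashSet<>(Arrays.asList(
            "a", "an", "and", "or", "not", "the", "of", "to", "in", "is", "for", "on"));

    // Lowercases the text, splits on non-alphanumerics, drops stop words and
    // duplicates, keeps at most maxClauses words, and joins them with OR.
    static String buildOrQuery(String pageText, int maxClauses) {
        Set<String> unique = new LinkedHashSet<>();
        for (String token : pageText.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (unique.size() >= maxClauses) {
                break; // cap the number of clauses per page
            }
            if (token.isEmpty() || STOP_WORDS.contains(token)) {
                continue;
            }
            unique.add(token);
        }
        return String.join(" OR ", unique);
    }

    public static void main(String[] args) {
        System.out.println(buildOrQuery("The red Nike shoe is on sale for the summer", 10));
    }
}
```

The cap keeps the query below whatever clause limit the search library enforces; raising that limit instead is the alternative mentioned in the message.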
You may need to limit the words per page or set the maximum clauses
to a much higher
: Not sure if this makes sense...but curious if anyone has ideas, or has
: done something like this.
I have a few ideas, none of which are mutually exclusive...
1) look at the Explain output for the various queries you are generating
to help you understand why your boosts aren't having as much
Yes, I thought of that. But since the ads have
negative keywords, it's very possible for the webpage
to match the ads but not the other way around because
of the negative keywords. So the system cannot be
sure that the ads match the webpage until it uses ads'
keyword and negative keywords to rema
Use two document fields, one named positive and one named negative.
Your query then has to look something like this:
positive:(keyword1 keywordN) AND NOT negative:(keyword1 keywordN)
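
A query string of that shape could be assembled as in the sketch below. The field names positive and negative come from the message; the class name AdQueryBuilder is hypothetical, and the sketch assumes the page words are plain alphanumeric tokens that need no query-syntax escaping.

```java
import java.util.List;

// Hypothetical helper: builds the positive/negative field query for one page.
final class AdQueryBuilder {
    // The same page words are searched in both fields: an ad matches if the
    // words hit its positive keywords and none of them hit its negative keywords.
    static String buildAdQuery(List<String> pageWords) {
        if (pageWords.isEmpty()) {
            throw new IllegalArgumentException("at least one page word is required");
        }
        String group = "(" + String.join(" ", pageWords) + ")";
        return "positive:" + group + " AND NOT negative:" + group;
    }

    public static void main(String[] args) {
        System.out.println(buildAdQuery(List.of("nike", "red", "shoe")));
    }
}
```

With this layout a single query per web page is enough, since the negative keywords are checked inside the same query rather than by a second query per ad.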
On 23.10.2005 at 20:50, Sam Lee wrote:
Yes, I thought of that. But since the ads have
negative keywords, it's very possible f
yes, but I will have to do it for each ad as I stated
previously.
webpage www.mysite.com --match--> ad1...ad101
> >
> > Then I match each ad with the webpage.
> > But due to negative keywords:
> > ad1...ad100 --NOT match--> www.mysite.com
> > ad101 --match--> www.mysite.com
> >
> > # of queries
interesting information you have here...I will look into this and let you know
what I come up with.
Thanks!
-Original Message-
From: Chris Hostetter <[EMAIL PROTECTED]>
To: java-user@lucene.apache.org
Sent: Sun, 23 Oct 2005 10:14:13 -0700 (PDT)
Subject: Re: Classifier4J and Lucene
How fast is MemoryIndex? For example, I have a
webpage indexed, and I have 10 queries with
negative keywords to match against this webpage. How
much faster is it compared to using the normal method to
match the same 10 queries to this webpage?
--- Erik Hatcher <[EMAIL PROTECTED]> wrote:
>
Hi,
Someone suggested that I should use MemoryIndex to
match content to a large # of queries. e.g. "nike red
shoes" --match--> "nike shoes -blue" and --match-->
"nike shoes -black"... What if I have 10 of these
queries for each content? and there may be 100 of
these contents.
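
The matching rule in this example can be written out in plain Java to make the intended semantics concrete. This is not MemoryIndex and says nothing about its speed; it is a stand-in that assumes every positive keyword must appear in the content and no "-"-prefixed keyword may appear. The class name KeywordQueryMatcher is hypothetical.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical stand-in for the matching rule; not Lucene's MemoryIndex.
final class KeywordQueryMatcher {
    // Returns true if the content contains every positive term of the query
    // and none of the terms prefixed with '-'.
    static boolean matches(String content, String query) {
        Set<String> words = new HashSet<>(
                Arrays.asList(content.toLowerCase(Locale.ROOT).trim().split("\\s+")));
        for (String term : query.toLowerCase(Locale.ROOT).trim().split("\\s+")) {
            if (term.isEmpty()) {
                continue;
            }
            if (term.startsWith("-")) {
                if (words.contains(term.substring(1))) {
                    return false; // a negative keyword is present
                }
            } else if (!words.contains(term)) {
                return false; // a required keyword is missing
            }
        }
        return true;
    }

    public static void main(String[] args) {
        String content = "nike red shoes";
        for (String query : new String[] {"nike shoes -blue", "nike shoes -black", "nike shoes -red"}) {
            System.out.println(query + " : " + matches(content, query));
        }
    }
}
```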
But how f
12 matches