thanks for your mail

2004-02-15 Thread [EMAIL PROTECTED]
Received your mail we will get back to you shortly


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



'Sponsored' links

2004-02-15 Thread Daniel B. Davis
I am a newbie to Lucene, and this is my first serious posting
to Lucene-user.
This is to solicit comment upon the problem of supplying
a sponsored links capability within Lucene. This capability
would not affect at all which documents are returned by a query,
but would cause any 'sponsored' documents present among the
results to be displayed before other documents in the list
returned.
I have looked over the correspondence in Lucene-user, but
not found anything addressing this topic; if I have missed it,
please tell me where and when, and ignore the rest of this.
It seems to me that there are three ways to achieve the
capability:
1. Preset boost values for 'sponsored' documents, with an
   implied burden of reindexing when sponsors are modified.
2. Post-qualify documents present in the hit list for their
   sponsorship status, building a new hit list.
3. Modify the query to search using both the full query as
   an unsponsored boolean clause with the default boost value,
   and for each sponsor, to repeat the full query ANDed with
   that sponsor with the appropriate boost value.
Are there other strategies not considered?

Assuming a small list of sponsors (10 or fewer) and low
volatility amongst the sponsors (1 change / month or less),
which method is best?
I have been pursuing method #1, almost to the exclusion of
the others, but have encountered an unknown difficulty in the
implementation (separate posting).  In particular, while it is clear
that #3 is doable, I know nothing about the searching burden
added by multiplying the user's query by one plus the count of
sponsors.
Regarding #3, if my understanding is right, then:
   Sponsor names: s1, s2, s3 ...
   words or phrases: s1w1, s1w2, ... , s2w1, s2w2, ... , s3w1 ...
   boost values: s1v, s2v, s3v
then given query q as user input, form:
   q
   or (q and (s1w1 | s1w2 | s1w3 | ...)^s1v)
   or (q and (s2w1 | s2w2 ...)^s2v)
   or (q and (s3w1 ...)^s3v)
Is this correct?
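
A minimal sketch of the query form above, assuming the expanded query is handed to Lucene's query parser as a string (OR, AND, parentheses, quoted phrases and ^boost are standard query-parser syntax). The class SponsoredQueryBuilder, its Sponsor type and the expand method are hypothetical names used only for illustration, and the user query is assumed to be already valid (it is not escaped here).

```java
import java.util.List;

/** Hypothetical helper for strategy #3; not part of Lucene. */
class SponsoredQueryBuilder {

    /** One sponsor: its trigger words or phrases and the boost for its clause. */
    static final class Sponsor {
        final List<String> terms;
        final float boost;

        Sponsor(List<String> terms, float boost) {
            this.terms = terms;
            this.boost = boost;
        }
    }

    /** Returns "(q) OR ((q) AND (t1 OR t2 ...)^boost) OR ..." as a query string. */
    static String expand(String userQuery, List<Sponsor> sponsors) {
        StringBuilder sb = new StringBuilder();
        // The plain user query, so unsponsored documents are still returned.
        sb.append('(').append(userQuery).append(')');
        for (Sponsor s : sponsors) {
            if (s.terms.isEmpty()) {
                continue; // a sponsor without terms contributes no clause
            }
            sb.append(" OR ((").append(userQuery).append(") AND (");
            for (int i = 0; i < s.terms.size(); i++) {
                if (i > 0) {
                    sb.append(" OR ");
                }
                String t = s.terms.get(i);
                // Multi-word entries are quoted so they are parsed as phrases.
                sb.append(t.indexOf(' ') >= 0 ? "\"" + t + "\"" : t);
            }
            sb.append(")^").append(s.boost).append(')');
        }
        return sb.toString();
    }
}
```

Note that a boost only raises a document's score; by itself it does not guarantee that every sponsored document outranks every unsponsored one, so the boost values would need to be chosen with that in mind.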
Does the strategy of search identify any kind of intermediate
sublist to speed up searching? (But then it would start to
resemble #2.)
Rolling one's own for #2 would run query q, and get the
HitCollector, then separately run queries for each of:
s1w1 | s1w2 | s1w3 | ...,
s2w1 | s2w2 ...
s3w1 ...
and merge each hit collector with the one from query q.
(Just AND the bitsets???) Lastly adjust scores and form
a new composite HitCollector.  By this time I have told
everyone much more than I know.
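
A rough sketch of the "AND the bitsets" step, assuming the document numbers matched by each query have already been gathered into a java.util.BitSet (for instance from a HitCollector's collect callback). The class SponsoredHits and the method sponsoredAmong are hypothetical names; score adjustment is left out.

```java
import java.util.BitSet;

/** Hypothetical helper for strategy #2; works on plain bit sets of document numbers. */
class SponsoredHits {

    /**
     * Returns the documents that match both the user query and one sponsor's
     * query. Neither input bit set is modified.
     */
    static BitSet sponsoredAmong(BitSet queryHits, BitSet sponsorHits) {
        BitSet result = (BitSet) queryHits.clone(); // copy, because and() mutates its receiver
        result.and(sponsorHits);                    // keep only bits set in both
        return result;
    }
}
```

Running this once per sponsor gives one bit set per sponsorship level, which can then drive the reordering of the hits from query q.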
Stray thought:-- can HitCollectors be cached at application init?

There are many other questions regarding details of implementation,
but their proper place is another communication.
Just preparing this document for dissemination has helped
greatly.  Any and all comments are much appreciated.
Thank you all.





Intermediate indexing before final

2004-02-15 Thread Daniel B. Davis
I am a newbie to Lucene, and have been learning by experiment
and from the demos.   A problem has arisen when indexing
a document into a small lookaside index, after the document
is created and before it is added to the permanent index.
The lookaside index is used to determine whether the document
is sponsored (i.e. contains any word that causes it to
be included in one of the 'sponsored' document levels).
(A separate letter deals with the larger issues of
sponsorship.)  If it is sponsored, then a setBoost for
the document will be issued, with a level-dependent
value.
The code in question arises from within IndexHTML
near:
doc = new HTMLDocument(file);
writer.addDocument(doc);
In the case at issue, this code has been changed to:
doc = new HTMLDocument(file);
int boost = sponsoredValue(doc);
doc.setBoost(boost);
writer.addDocument(doc);
The sponsoredValue method never returns.

The exception occurs after a longish delay in
Eclipse, about 2-3 seconds.  The document used is:
  http://www.w3.org/TR/xquery
stored as a local file. The same document indexes
correctly when the calls to sponsoredValue and setBoost
are removed.
HTMLDocument was modified in minor ways.  HTMLParser
is destined for modification, but is still vanilla.
Note that changing RAMDirectory to FSDirectory does
not change the behavior.
I greatly appreciate any help. Thank you all.

 -

the Document doc:
  url: Keyword, string
  file: Unindexed, string
  modified: Keyword, string
  uid: as in HTMLdemo, string
  contents: Text, reader
  title: Text, string
  metadata: Text, string
the code:

  private static RAMDirectory ramDir = null;
  private static IndexWriter ramWriter = null;
  private static IndexReader ramReader = null;
  private static IndexSearcher ramSearcher = null;

  public int sponsoredValue(Document doc) {
    .
    .
    .
    ramDir = new RAMDirectory();
    ramWriter = new IndexWriter(ramDir, new StandardAnalyzer(), true);
    ramWriter.addDocument(doc);   // <-- the exception below is raised here
    ramWriter.close();
    ramWriter = null;
    ramReader = IndexReader.open(ramDir);
    ramSearcher = new IndexSearcher(ramReader);
    .
    .
    .
  }

the Exception:
java.io.IOException: Pipe closed
at java.io.PipedInputStream.receive(Unknown Source)
at java.io.PipedInputStream.receive(Unknown Source)
at java.io.PipedOutputStream.write(Unknown Source)
at sun.nio.cs.StreamEncoder$CharsetSE.writeBytes(Unknown Source)
at sun.nio.cs.StreamEncoder$CharsetSE.implWrite(Unknown Source)
at sun.nio.cs.StreamEncoder.write(Unknown Source)
at sun.nio.cs.StreamEncoder.write(Unknown Source)
at java.io.OutputStreamWriter.write(Unknown Source)
at java.io.Writer.write(Unknown Source)
at org.apache.lucene.demo.html.HTMLParser.addText(HTMLParser.java:141)
at org.apache.lucene.demo.html.HTMLParser.HTMLDocument(HTMLParser.java:200)
at org.apache.lucene.demo.html.ParserThread.run(ParserThread.java:69)
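
For what it is worth, the top frames of this trace can be reproduced in plain Java without Lucene: a producer writing into a pipe gets "Pipe closed" once the consumer has closed its end. The sketch below shows only that mechanism. Whether it is what happens inside sponsoredValue (for example, the contents Reader being closed while ParserThread is still writing) is an assumption, not a diagnosis. PipeClosedDemo and writeAfterReaderClosed are hypothetical names.

```java
import java.io.IOException;
import java.io.PipedInputStream;
import java.io.PipedOutputStream;

/** Hypothetical demo: writing to a pipe whose read end was closed early. */
class PipeClosedDemo {

    /** Returns the message of the IOException raised, or null if none was raised. */
    static String writeAfterReaderClosed() {
        try {
            PipedInputStream in = new PipedInputStream();
            PipedOutputStream out = new PipedOutputStream(in);
            out.write('a'); // fits in the pipe's buffer, so this does not block
            in.read();      // the consumer takes one byte...
            in.close();     // ...then closes its end before the producer is done
            out.write('b'); // the producer is still writing
            return null;
        } catch (IOException e) {
            return e.getMessage();
        }
    }
}
```

A related point, also general Java rather than anything Lucene-specific: a Reader can only be consumed once, so a Document whose contents field wraps a Reader may not be indexable a second time after it has been added to the lookaside index.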








Re: 'Sponsored' links

2004-02-15 Thread Grant Ingersoll
Does the sponsored information have to be in the index?  Couldn't you look up the
sponsor info in a database (or something else) after getting back your
initial results and then re-sort the hit list, moving up the sponsored elements while
maintaining the rest of the results as is?  If your list of sponsors is truly that
small, you could just put 'em in a file and load the list into memory.

Then you don't have to re-index when your sponsorships change, and you have no
dependency on Lucene for getting boost values right, etc.

I guess this resembles #2.
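
A minimal sketch of that re-sort, assuming each hit can be reduced to a string identifier (say, its url field) and that the sponsored identifiers have been loaded into an in-memory Set. SponsoredFirst and reorder are hypothetical names, and nothing here touches the Lucene API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Hypothetical helper: moves sponsored hits to the front of a result list. */
class SponsoredFirst {

    /**
     * Stable partition: sponsored ids come first, and within each group the
     * original (relevance) order of the hit list is preserved.
     */
    static List<String> reorder(List<String> hitIds, Set<String> sponsoredIds) {
        List<String> sponsored = new ArrayList<String>();
        List<String> others = new ArrayList<String>();
        for (String id : hitIds) {
            if (sponsoredIds.contains(id)) {
                sponsored.add(id);
            } else {
                others.add(id);
            }
        }
        sponsored.addAll(others); // unsponsored hits follow, order unchanged
        return sponsored;
    }
}
```

With hits d1, d2, d3, d4 and only d3 sponsored, this yields d3, d1, d2, d4. Several sponsorship levels could be handled by running the same partition once per level, lowest level first.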








can't create webapp demo index

2004-02-15 Thread mr scrub
Hi,

I'm having trouble creating the index for the webapp
demo. I had no trouble creating the index for the
non-webapp demo, but I get a NullPointerException
when I try it for the webapp.

I'm on Windows 2000 and here's my input and the error
message that I got:

C:\tomcat\webapps\examples>java org.apache.lucene.demo.IndexHTML -create -index C:\tomcat\webapps\index
 caught a class java.lang.NullPointerException
 with message: null

TIA,
Ted










Field Reindex Question

2004-02-15 Thread Tim Walters
Hi,

I'm thinking of using Lucene in an application that might change the 
field data without modifying the document. It would be nice to only have 
to rewrite the field index information, which is much smaller than the 
information for the document. Would anyone know if this is possible?

Thanks in Advance,
Tim




Re: Field Reindex Question

2004-02-15 Thread Erik Hatcher
You must remove and re-add the entire document to perform an update.  
Such is the (current) nature of Lucene.

	Erik


