Re: Hierarchical classified documents

2006-11-26 Thread karl wettin


On 25 Nov 2006, at 17:26, Robert Koberg wrote:

What I do is add a 'path' field with the xpath to the node. Then  
you first narrow your search by finding documents with paths like:


/node[1]/node[3]*


You use a wildcard query? That can turn out to be very expensive if  
you have thousands and thousands of hierarchies, no?


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Re: Hierarchical classified documents

2006-11-26 Thread Robert Koberg

karl wettin wrote:


On 25 Nov 2006, at 17:26, Robert Koberg wrote:

What I do is add a 'path' field with the xpath to the node. Then you 
first narrow your search by finding documents with paths like:


/node[1]/node[3]*


You use a wildcard query? That can turn out to be very expensive if you 
have thousands and thousands of hierarchies, no?




I think it is a valid option to have, at least for my needs.

I do it for a website hierarchy. Hopefully, a website will not have a 
very deep hierarchy :) Basically, it is presented to the user in a 
right-click context menu on a nav tree.


You /could/ use the wildcard query with the hope that the user has 
drilled down as far as possible. Or you could just search a directory 
and not go any deeper (by not including the wildcard char -- presented 
to the user as a search option).
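The two search modes described above can be sketched in plain Java, independent of Lucene. This is an illustrative model only: `PathMatch` and its method names are hypothetical, and it operates on stored xpath-style path strings directly rather than through a query; a wildcard query like `/node[1]/node[3]*` corresponds to the subtree mode, while omitting the wildcard corresponds to the single-level mode.

```java
import java.util.ArrayList;
import java.util.List;

public class PathMatch {

    // Subtree search: like the wildcard query "/node[1]/node[3]*",
    // matches the node itself and everything below it.
    static boolean inSubtree(String docPath, String prefix) {
        return docPath.equals(prefix) || docPath.startsWith(prefix + "/");
    }

    // Single-level search: matches only direct children of the given
    // path (no wildcard char, so we never descend further).
    static boolean directChild(String docPath, String parent) {
        if (!docPath.startsWith(parent + "/")) return false;
        return docPath.indexOf('/', parent.length() + 1) < 0;
    }

    // Filter a list of stored 'path' field values, deep or shallow.
    static List<String> filter(List<String> paths, String base, boolean deep) {
        List<String> hits = new ArrayList<>();
        for (String p : paths) {
            if (deep ? inSubtree(p, base) : directChild(p, base)) hits.add(p);
        }
        return hits;
    }
}
```

The cost concern raised earlier applies to the deep mode: a trailing wildcard forces the query to enumerate every indexed path under the prefix, so the shallow mode stays cheap as hierarchies grow.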


best,
-Rob





Re: RAMDirectory vs MemoryIndex

2006-11-26 Thread jm

I tested this. I use a single static analyzer for all my documents,
and the caching analyzer was not working properly. I had to add a
method to clear the cache each time a new document was to be indexed,
and then it worked as expected. I have never looked into Lucene's inner
workings, so I am not sure if what I did is correct.

I also had to comment out some code because I merged the memory stuff from
trunk with Lucene 2.0.

Performance was certainly much better (4 times faster in my very rough
testing), but that operation is only a very small part of my processing,
so I will keep the original way, without caching the tokens, just to
be able to use the unmodified Lucene 2.0.  I found a data problem in
my tests, but as I was not going to pursue that improvement for now I
did not look into it.

thanks,
javier

On 11/23/06, Wolfgang Hoschek <[EMAIL PROTECTED]> wrote:

Out of interest, I've checked an implementation of something like
this into AnalyzerUtil SVN trunk:

   /**
    * Returns an analyzer wrapper that caches all tokens generated by
    * the underlying child analyzer's token stream, and delivers those
    * cached tokens on subsequent calls to
    * tokenStream(String fieldName, Reader reader).
    *
    * This can help improve performance in the presence of expensive
    * Analyzer / TokenFilter chains.
    *
    * Caveats:
    * 1) Caching only works if the equals() and hashCode() methods are
    *    properly implemented on the Reader passed to
    *    tokenStream(String fieldName, Reader reader).
    * 2) Caching the tokens of large Lucene documents can lead to out
    *    of memory exceptions.
    * 3) The Token instances delivered by the underlying child analyzer
    *    must be immutable.
    *
    * @param child the underlying child analyzer
    * @return a new analyzer
    */
   public static Analyzer getTokenCachingAnalyzer(final Analyzer child) { ... }


Check it out, and let me know if this is close to what you had in mind.

Wolfgang.
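The caching idea behind this wrapper, and in particular caveat 1 above, can be sketched with plain Java collections. This is an illustrative stand-in, not the AnalyzerUtil implementation: `TokenCache` and its members are hypothetical names, and a simple whitespace split stands in for the expensive Analyzer / TokenFilter chain. The point is that the cache is keyed by the input, so correct `equals()`/`hashCode()` on the key is what makes cache hits possible.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class TokenCache {
    // Keyed by the input text; for Readers (caveat 1 above) the key's
    // equals()/hashCode() would have to be implemented properly.
    private final Map<String, List<String>> cache = new HashMap<>();
    int analyzeCalls = 0; // counts how often the expensive chain ran

    // Stand-in for an expensive Analyzer/TokenFilter chain.
    private List<String> analyze(String text) {
        analyzeCalls++;
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase().split("\\s+")) tokens.add(t);
        return tokens;
    }

    // Returns cached tokens when the same input was analyzed before;
    // otherwise runs the chain once and caches the result.
    public List<String> tokens(String text) {
        return cache.computeIfAbsent(text, this::analyze);
    }
}
```

Caveat 2 (memory) follows directly from this shape: every analyzed document's full token list stays referenced by the map until the cache is cleared.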

On Nov 22, 2006, at 9:19 AM, Wolfgang Hoschek wrote:

> I've never tried it, but I guess you could write an Analyzer and
> TokenFilter that not only feeds into IndexWriter on
> IndexWriter.addDocument(), but as a sneaky side effect also
> simultaneously saves its tokens into a list so that you could later
> turn that list into another TokenStream to be added to MemoryIndex.
> How much this might help depends on how expensive your analyzer
> chain is. For some examples of how to set up analyzers for chains
> of token streams, see MemoryIndex.keywordTokenStream and class
> AnalyzerUtil in the same package.
>
> Wolfgang.
>
> On Nov 22, 2006, at 4:15 AM, jm wrote:
>
>> checking one last thing, just in case...
>>
>> as I mentioned, I have previously indexed the same document in
>> another
>> index (for another purpose). As I am going to use the same analyzer,
>> would it be possible to avoid analyzing the doc again?
>>
>> I see IndexWriter.addDocument() returns void, so there does not seem to
>> be an easy way to do that, no?
>>
>> thanks
>>
>> On 11/21/06, Wolfgang Hoschek <[EMAIL PROTECTED]> wrote:
>>>
>>> On Nov 21, 2006, at 12:38 PM, jm wrote:
>>>
>>> > Ok, thanks, I'll give MemoryIndex a go, and if that is not good
>>> enough
>>> > I will explore the other options then.
>>>
>>> To get started you can use something like this:
>>>
>>> for each document D:
>>>   MemoryIndex index = createMemoryIndex(D, ...)
>>>   for each query Q:
>>>     float score = index.search(Q)
>>>     if (score > 0.0) System.out.println("it's a match");
>>>
>>> private MemoryIndex createMemoryIndex(Document doc, Analyzer analyzer) {
>>>   MemoryIndex index = new MemoryIndex();
>>>   Enumeration iter = doc.fields();
>>>   while (iter.hasMoreElements()) {
>>>     Field field = (Field) iter.nextElement();
>>>     index.addField(field.name(), field.stringValue(), analyzer);
>>>   }
>>>   return index;
>>> }
>>>
>>> >
>>> >
>>> > On 11/21/06, Wolfgang Hoschek <[EMAIL PROTECTED]> wrote:
>>> >> On Nov 21, 2006, at 7:43 AM, jm wrote:
>>> >>
>>> >> > Hi,
>>> >> >
>>> >> > I have to decide between  using a RAMDirectory and
>>> MemoryIndex, but
>>> >> > not sure what approach will work better...
>>> >> >
>>> >> > I have to run many items (tens of thousands) against some
>>> >> queries (100
>>> >> > at most), but I have to do it one item at a time. And I already
>>> >> have
>>> >> > the lucene Document associated with each item, from a previous
>>> >> > operation I perform.
>>> >> >
>>> >> > From what I read MemoryIndex should be faster, but apparently I
>>> >> cannot
>>> >> > reuse the document I already have, and I have to create a new
>>> >> > MemoryIndex per item.
>>> >>
>>> >> A MemoryIndex object holds one document.
>>> >>
>>> >> > Using the RAMDirectory I can use only one of
>>> >> > them, also one IndexWriter, and create a IndexSearcher and
>>> >> IndexReader
>>> >> > per item, for searching and removing the item each time.
>>> >> >
>>> >> > Any thoughts?

Re: Error in QueryTermExtractor.getTermsFromBooleanQuery

2006-11-26 Thread markharw00d

Nope, not seen that one.
Looks like the reference to a missing field is in the Java instance data 
sense, not the Lucene document sense.

Class versioning issues somewhere?
That method takes a parameter called "prohibited", which is the name of 
the field reported in the error. Is the word "prohibited" a reserved 
Java word somewhere now? What JVM are you running on - 1.6?


Cheers
Mark

Otis Gospodnetic wrote:

Hi,

I just moved from 1.9.1 to 2.1-dev.  One error that seems to happen a lot now 
is below.  I haven't had the chance to investigate yet (note the time), but I 
thought I'd throw (no pun intended) it out there and see if anyone else has 
seen this before.

java.lang.NoSuchFieldError: prohibited
at 
org.apache.lucene.search.highlight.QueryTermExtractor.getTermsFromBooleanQuery(QueryTermExtractor.java:91)
at 
org.apache.lucene.search.highlight.QueryTermExtractor.getTerms(QueryTermExtractor.java:66)
at 
org.apache.lucene.search.highlight.QueryTermExtractor.getTerms(QueryTermExtractor.java:59)
at 
org.apache.lucene.search.highlight.QueryTermExtractor.getTerms(QueryTermExtractor.java:45)
at 
org.apache.lucene.search.highlight.QueryScorer.(QueryScorer.java:48)

The only thing I know so far is that the field I'm passing to the highlighter 
is actually empty, so there will be nothing to highlight, but it still 
shouldn't bomb.  Here is a snippet from my code:

TokenStream tokenStream = ANALYZER.tokenStream(_textFieldName,
    new StringReader(text));
highlightText = _highlighter.getBestFragments(tokenStream, text,
    _maxNumFragmentsRequired, "...");
...

That "text" variable holds the content of the field, and I just happen to know 
it's empty/blank (I currently don't store anything in that Field).  I can't test with a 
non-empty field right now to check whether that throws QueryTermExtractor off.

Thanks,
Otis












How to set query time scoring

2006-11-26 Thread Sajid Khan

 I have already set some score at index time. Now I want to set
some score at query time, but I am not sure how to set
the score at query time in Lucene.
 Does anybody have an idea how to do this?

Regards
Sajid
-- 
View this message in context: 
http://www.nabble.com/How-to-set-query-time-scoring-tf2709773.html#a7554766
Sent from the Lucene - Java Users mailing list archive at Nabble.com.





Re: How to set query time scoring

2006-11-26 Thread Bhavin Pandya

Hi Sajid,

As you already boost data at indexing time,
you can boost the query at search time.
E.g. if you are firing a BooleanQuery and a PhraseQuery, you might need to 
boost the PhraseQuery:


PhraseQuery pq = new PhraseQuery();
pq.setBoost(2.0f);
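As a rough illustration of what that boost does, a clause's boost acts as a multiplier on its score contribution, so the PhraseQuery boosted with setBoost(2.0f) counts twice as much as an unboosted clause with the same raw score. The sketch below is a simplified plain-Java model with hypothetical names, not Lucene's actual Similarity formula:

```java
public class BoostDemo {
    // Combined score of two clauses under a simple boost-as-multiplier
    // model: each clause contributes (raw score * boost).
    static float combined(float rawTermScore, float termBoost,
                          float rawPhraseScore, float phraseBoost) {
        return rawTermScore * termBoost + rawPhraseScore * phraseBoost;
    }
}
```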

Thanks.
Bhavin pandya

- Original Message - 
From: "Sajid Khan" <[EMAIL PROTECTED]>

To: 
Sent: Monday, November 27, 2006 10:17 AM
Subject: How to set query time scoring




I have already set some score at index time. Now I want to set
some score at query time, but I am not sure how to set
the score at query time in Lucene.
Does anybody have an idea how to do this?

Regards
Sajid
--
View this message in context: 
http://www.nabble.com/How-to-set-query-time-scoring-tf2709773.html#a7554766

Sent from the Lucene - Java Users mailing list archive at Nabble.com.





Re: Newbie Search Question

2006-11-26 Thread sirakov


Erick Erickson wrote:
> 
> And how are you storing your date? Field.Store.YES? NO? COMPRESSED?
> 

I think here is my problem... I have found this in FileDocument.java:

doc.add(new Field("contents", new FileReader(f)));

Field.Store.YES is missing, but when I try to add this argument, I get an
error message. I'll try to find a solution for the problem, but if you have
any tips for me - please let me know :)

thanks in advance,
sirakov

-- 
View this message in context: 
http://www.nabble.com/Newbie-Search-Question-tf2667479.html#a7555948
Sent from the Lucene - Java Users mailing list archive at Nabble.com.





Database searching using Lucene....

2006-11-26 Thread Inderjeet Kalra
Hi,

I need some input on database searching using Lucene. 

Lucene directly supports document searching, but I am unable to find
an easy and fast way to do database searching. 


Which option would be better - stored procedures (SPs) or the Lucene search
engine - in terms of implementation, performance and security? If anyone has
already done an analysis on this, can you please provide a comparison matrix
or benchmarks?

 

Thanks in advance

 

Regards

Inderjeet



 