Hi All,
I am storing a string value with Field.Store.COMPRESS. At search
time, is there any built-in method to decompress these records, or do we
need some other mechanism to retrieve them?
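In the 2.x API the decompression is transparent: Document.get() on a field
stored with Field.Store.COMPRESS returns the original string, so no extra
algorithm is needed. A minimal sketch, assuming an existing writer, searcher,
docId and longText:

    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;

    // Indexing: the stored value is compressed on disk.
    Document doc = new Document();
    doc.add(new Field("body", longText, Field.Store.COMPRESS, Field.Index.TOKENIZED));
    writer.addDocument(doc);

    // Searching: the value comes back decompressed automatically.
    Document hit = searcher.doc(docId);
    String body = hit.get("body");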
Hi all,
Given a term (e.g. "apple") and a document in the index, how can I get the
term's weight in this document? Is this weight equal to the tf*idf value of
the term?
Thanks!
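Not exactly: the practical score also applies norms, boosts and a sqrt on tf,
but the raw tf*idf can be computed from a term vector plus docFreq(). A sketch
against the 2.x API, assuming the field "content" was indexed with term
vectors and docId is known:

    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.index.Term;
    import org.apache.lucene.index.TermFreqVector;
    import org.apache.lucene.search.DefaultSimilarity;

    IndexReader reader = IndexReader.open("/path/to/index");
    // tf: occurrences of the term in this document (needs term vectors).
    TermFreqVector tfv = reader.getTermFreqVector(docId, "content");
    int idx = tfv.indexOf("apple");
    int tf = (idx >= 0) ? tfv.getTermFrequencies()[idx] : 0;
    // idf: derived from the document frequency by the default Similarity.
    int df = reader.docFreq(new Term("content", "apple"));
    float idf = new DefaultSimilarity().idf(df, reader.numDocs());
    float weight = tf * idf; // raw tf*idf, not the full Lucene score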
[Lucas sent me a zip of the index - thanks!]
I ran CheckIndex on the index and it said this on your _al1 segment:
java.lang.ArrayIndexOutOfBoundsException: Array index out of range: 1000
at org.apache.lucene.util.BitVector.get(BitVector.java:72)
at org.apache.lucene.index.Segment
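For anyone wanting to reproduce this: CheckIndex ships in the core jar of
recent releases and is run from the command line against the index directory
(the jar name varies by version); it prints a per-segment report like the one
above:

    java -cp lucene-core.jar org.apache.lucene.index.CheckIndex /path/to/index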
Thanks, Uwe, for your clarification and for sharing your experience,
which is very helpful!
Jay
Uwe Goetzke wrote:
Hi Jay,
Sorry for the confusion. I wrote NgramStemFilter in an early stage of the project; it is essentially the same as NGramTokenFilter from Otis, with the addition that I add
100% Impossible...
My index has 1 XML field, 3 numeric fields, 1 alphanumeric field. *always*
:-)
Lucas
Michael McCandless wrote:
OK.
I would recommend upgrading to 2.3.1. There were some corruption
issues with term vectors that could cause the wrong document's term
vectors to come back.
That screen shot is spooky! Is it possible that one of the documents
you indexed had that content? (It could simply be a store
LOL, I know
Take a look at the cfs file opened in an editor:
http://img296.imageshack.us/my.php?image=indexow4.jpg
[]s,
Lucas
Yonik Seeley wrote:
On Wed, Mar 26, 2008 at 2:13 PM, Lucas F. A. Teixeira
<[EMAIL PROTECTED]> wrote:
one of the index files
has these log messages from my application ser
Thanks Michael!
Lucene version: 2.3.0
Here is a screenshot of the cfs file opened in an editor:
http://img296.imageshack.us/my.php?image=indexow4.jpg
Take a look!
[]s,
Lucas
Michael McCandless wrote:
OK I think I follow now.
Which version of Lucene was this?
If it's not too large, can you post
On Wed, Mar 26, 2008 at 2:13 PM, Lucas F. A. Teixeira
<[EMAIL PROTECTED]> wrote:
> one of the index files
> has these log messages from my application server inside it,
Wow! That's a new one...
-Yonik
Hi all,
Suppose my query has a "normal" part, for which I want scoring as usual, and
another part which is a big disjunction (OR) query, for which I just want
documents to match and don't care about scoring. Is there a way to
make it fast?
As far as I understand, if the 'no-score' part was the same in many querie
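One common approach is to push the no-score part into a cached filter: a
filter only restricts matches, is never scored, and its bit set can be reused
across queries that share the clause. A sketch, assuming the 2.x wrapper
classes (older releases used QueryFilter for the same job):

    import org.apache.lucene.index.Term;
    import org.apache.lucene.search.BooleanClause;
    import org.apache.lucene.search.BooleanQuery;
    import org.apache.lucene.search.CachingWrapperFilter;
    import org.apache.lucene.search.Filter;
    import org.apache.lucene.search.Hits;
    import org.apache.lucene.search.QueryWrapperFilter;
    import org.apache.lucene.search.TermQuery;

    // The "normal" part, scored as usual.
    TermQuery scored = new TermQuery(new Term("title", "apple"));
    // The big OR part, which only needs to match.
    BooleanQuery disjunction = new BooleanQuery();
    disjunction.add(new TermQuery(new Term("cat", "a")), BooleanClause.Occur.SHOULD);
    disjunction.add(new TermQuery(new Term("cat", "b")), BooleanClause.Occur.SHOULD);
    // Never scored; the cache reuses the bits for every query sharing this clause.
    Filter noScore = new CachingWrapperFilter(new QueryWrapperFilter(disjunction));
    Hits hits = searcher.search(scored, noScore);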
OK I think I follow now.
Which version of Lucene was this?
If it's not too large, can you post the CFS file that got mixed up?
Be sure to cc me directly on the mail because the mailing list
software removes attachments.
Mike
Lucas F. A. Teixeira wrote:
This is just one of the index fil
This is just one of the index files.
As I said, the local disk where the index is generated is not full;
the full disk is the shared storage where my application server stores
its logs.
When this disk hit 100%, all the indexing processes stopped (of course,
all the processing instances of th
Since you're using all the results for a query, and ignoring the
score value, you might try and do the same thing with a relational
database. But I would not expect that to be much faster,
especially when using a field cache.
Other than that, you could also go the other way, and try and
add more
I couldn't quite follow the part about "_al1.cfs".
It sounds like your indexing process hit a disk-full event, and that led
to this error? Can you post the full exception(s) from the disk full?
Which version of Lucene are you using?
Mike
Lucas F. A. Teixeira wrote:
Hello all!
I had a problem
Thank you for the reply. What I did not mention before was that for
iteration we don't care about scoring, so that's not the issue at all.
Creating a Filter with a BitSet seems a much better idea than keeping a
HitIterator in memory. Am I right that in such a case with
MatchAllDocsQuery memory usage would be a
You can store term vectors with positions too. Wouldn't that work
for this case?
Erik
On Mar 25, 2008, at 11:59 PM, John Wang wrote:
I am not sure how term vectors would help me. Term vectors are
ordered by
frequency, not in lex order. Since I know in the dictionary the
terms are
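For reference, a sketch of storing and reading positions with the 2.x term
vector API (doc, text, reader and docId assumed to exist):

    import org.apache.lucene.document.Field;
    import org.apache.lucene.index.TermPositionVector;

    // Index time: ask for positions to be stored with the vector.
    doc.add(new Field("content", text, Field.Store.NO, Field.Index.TOKENIZED,
                      Field.TermVector.WITH_POSITIONS));

    // Read time: a vector stored with positions can be cast.
    TermPositionVector tpv =
        (TermPositionVector) reader.getTermFreqVector(docId, "content");
    int idx = tpv.indexOf("apple");
    int[] positions = tpv.getTermPositions(idx); // token positions within the doc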
Why not keep a Filter in memory? It consists of a single bit per document
and the ordinal position of that bit is the Lucene doc ID. You could create
this reasonably quickly for the *first* query that came in via HitCollector.
Then each time you wanted another chunk, use the filter to know which
d
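A minimal sketch of that idea against the pre-2.4 Filter API (searcher and
query assumed to exist):

    import java.util.BitSet;
    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.search.Filter;
    import org.apache.lucene.search.HitCollector;

    // Build the bit set once, on the first query.
    final BitSet bits = new BitSet(searcher.maxDoc());
    searcher.search(query, new HitCollector() {
        public void collect(int doc, float score) {
            bits.set(doc); // one bit per matching Lucene doc ID
        }
    });
    // Reuse it as a Filter for each subsequent chunk.
    Filter filter = new Filter() {
        public BitSet bits(IndexReader reader) {
            return bits;
        }
    };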
Well, caching is designed to use memory. If you are saying that you
haven't got enough memory to cache all your values then caching them
all isn't going to work, at any level. If you implemented your own
cache you could control memory usage with an LRU algorithm or whatever
made sense for your app
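For the LRU idea, an access-ordered LinkedHashMap is the usual minimal
implementation in plain Java (Java 5+ for the generics):

    import java.util.LinkedHashMap;
    import java.util.Map;

    // Evicts the least recently used entry once capacity is exceeded.
    public class LruCache<K, V> extends LinkedHashMap<K, V> {
        private final int capacity;

        public LruCache(int capacity) {
            super(16, 0.75f, true); // true = access order
            this.capacity = capacity;
        }

        protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
            return size() > capacity;
        }
    }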
Hello all!
I had a problem this week, and I would like to share it with you all.
The WebLogic server that generates my index writes its logs to a shared
storage. During my indexing process (SOLR+Lucene), this shared storage
became 100% full, and everything collapsed (all servers that use this
shared stor
Ivan Vasilev wrote:
Thanks Mathieu,
I tried to check it out, but without success. Anyway, I can do it manually,
but as the contribution is still not approved by Lucene, our chiefs
will not want it included in our project for now.
It's the right decision. I hope the third patch will be good
jets/revuedepresse/browser/trunk/src/java
You can do an svn checkout.
M.
> The bottom line is that reading fields from docs is expensive.
> FieldCache will, I believe, load fields for all documents but only
> once - so the second and subsequent times it will be fast. Even
> without using a cache it is likely that things will speed up because
> of caching by the OS.
A
Hi all,
Breaking proximity data has been discussed several times before, and the
conclusion was that setPositionIncrement is the way to go. Regarding it:
1. Where exactly should it be called to create the gap properly?
2. Is there a way to call it directly somehow while indexing (e.g. after adding
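If the goal is a gap between multiple values of the same field, the usual hook
is the analyzer rather than the token stream itself: IndexWriter adds
getPositionIncrementGap() to the position between successive values of a
field. A sketch with the 2.x API:

    import java.io.Reader;
    import org.apache.lucene.analysis.Analyzer;
    import org.apache.lucene.analysis.TokenStream;
    import org.apache.lucene.analysis.standard.StandardAnalyzer;

    // Puts a large gap between values of the same field so that
    // phrase/proximity queries cannot match across value boundaries.
    public class GapAnalyzer extends Analyzer {
        private final Analyzer delegate = new StandardAnalyzer();

        public TokenStream tokenStream(String fieldName, Reader reader) {
            return delegate.tokenStream(fieldName, reader);
        }

        // Called by IndexWriter between Field values sharing the same name.
        public int getPositionIncrementGap(String fieldName) {
            return 1000;
        }
    }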
On Wed, 2008-03-26 at 10:45 +, Ian Lea wrote:
> If you've got plenty of memory vs index size you could look at
> RAMDirectory or MMapDirectory. Or how about some solid state disks?
> Someone recently posted some very impressive performance stats.
That was probably me. A (very) quick test for
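The RAM trick looks roughly like this with the 2.x API (only sensible when the
index fits comfortably in the heap):

    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.store.Directory;
    import org.apache.lucene.store.FSDirectory;
    import org.apache.lucene.store.RAMDirectory;

    // Copy the whole on-disk index into RAM, then search it from there.
    Directory ram = new RAMDirectory(FSDirectory.getDirectory("/path/to/index"));
    IndexSearcher searcher = new IndexSearcher(ram);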
Ivan Vasilev wrote:
Thanks Mathieu for your help!
The contribution that you have made to Lucene by this patch seems to
be great, but the hunspell dictionary is under the LGPL, which our
company's lawyer does not like.
It's the spelling tool used by OpenOffice and Firefox. Data must be multi
l
Hi
The bottom line is that reading fields from docs is expensive.
FieldCache will, I believe, load fields for all documents but only
once - so the second and subsequent times it will be fast. Even
without using a cache it is likely that things will speed up because
of caching by the OS.
If you'
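In code the cached lookup is a single call (2.x API); note that FieldCache
only handles single-valued, indexed fields:

    import org.apache.lucene.search.FieldCache;

    // First call walks the whole index and caches the values; subsequent
    // calls on the same reader are plain array lookups by doc ID.
    String[] ids = FieldCache.DEFAULT.getStrings(reader, "companyId");
    String idForDoc = ids[docId];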
Thanks Mathieu for your help!
The contribution that you have made to Lucene by this patch seems to be
great, but the hunspell dictionary is under the LGPL, which our company's
lawyer does not like. The Wordnet dictionary seems to be freer and maybe
could help together with your patch.
In the
Hi Jay,
Sorry for the confusion. I wrote NgramStemFilter in an early stage of the
project; it is essentially the same as NGramTokenFilter from Otis, with the
addition that I add begin and end token markers (e.g. word becomes _word_ and
then _w wo rd d_).
As I modified a lot of our lucene co
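The marker idea in isolation, as a plain-Java sketch (gram size 2 assumed):

    import java.util.ArrayList;
    import java.util.List;

    // Wrap the term in begin/end markers before taking character bigrams,
    // so grams at word boundaries become distinct tokens.
    public static List<String> markedBigrams(String term) {
        String marked = "_" + term + "_";          // word -> _word_
        List<String> grams = new ArrayList<String>();
        for (int i = 0; i + 2 <= marked.length(); i++) {
            grams.add(marked.substring(i, i + 2)); // "word" -> _w wo or rd d_
        }
        return grams;
    }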
Hi all,
our problem is to choose the best (fastest) way to iterate over a huge set
of documents (the basic and most important case is to iterate over all
documents in the index). A slow process accesses the documents, and right now
it is done by repeating a query (for instance MatchAllDocsQuery). It processe
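If stored fields are all the process needs, the query can be skipped entirely;
a sketch of a raw doc-ID scan with the 2.x API:

    import org.apache.lucene.document.Document;
    import org.apache.lucene.index.IndexReader;

    // Walking doc IDs directly is usually the cheapest full scan.
    IndexReader reader = IndexReader.open("/path/to/index");
    for (int i = 0; i < reader.maxDoc(); i++) {
        if (reader.isDeleted(i)) continue;   // skip deleted slots
        Document doc = reader.document(i);   // loads stored fields
        // ... process doc ...
    }
    reader.close();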
Hi All,
Thanks for your reply. I would like to mention that companyId is a
multivalued field. I tried Paul's suggestions too, but there doesn't seem to
be much gain; the searcher.doc() method still takes almost the same amount of
time.
> you can use the FieldCache to lookup the companyId for
Hi Grant:
I don't see FunctionQuery in the javadocs. Can you post a link?
Thanks
-john
On Mon, Mar 24, 2008 at 3:32 PM, Grant Ingersoll <[EMAIL PROTECTED]>
wrote:
> See the FunctionQuery and the org.apache.lucene.search.function
> package. You can also implement your own query, as it's n
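A class named exactly FunctionQuery lives in Solr; in Lucene 2.3 the
equivalent pieces are in org.apache.lucene.search.function. A sketch, assuming
that package:

    import org.apache.lucene.index.Term;
    import org.apache.lucene.search.TermQuery;
    import org.apache.lucene.search.function.CustomScoreQuery;
    import org.apache.lucene.search.function.FieldScoreQuery;

    // Score by the value of a numeric field (the field must hold plain numbers).
    FieldScoreQuery priceScore =
        new FieldScoreQuery("price", FieldScoreQuery.Type.FLOAT);
    // Combine a text query's score with the field value.
    CustomScoreQuery q =
        new CustomScoreQuery(new TermQuery(new Term("title", "apple")), priceScore);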