Hello,
We recently upgraded from Lucene 3.6 to Lucene 4.7.2, and we now get
"*java.lang.OutOfMemoryError: GC overhead limit exceeded*" while creating a
WildcardQuery object:
*Code:*
Query query = new WildcardQuery(new Term("id", "someTerm"));
*StackTrace: *j
hi,
I am a second-year undergraduate at the University of Moratuwa, Sri Lanka. For
my second-year project I am building a question answering system (knowledge
base). In this project I have to suggest similar questions previously asked by
other users. I need to find the similarity of two sentences in my application to sugges
My colleague at Lucid Imagination, Tom Hill, will be presenting a free
webinar focused on analysis in Lucene/Solr. If you're interested, please
sign up and join us.
Here is the official notice:
We'd like to invite you to a free webinar our company is offering next
Thursday, 28 January, at 2PM Eas
Are you fetching all of the results for your search? If so, you're
actually measuring the time to pull n stored documents out of the index,
not the time to search over an index of n documents. That would of course be
linear: most of your cost there will be the I/O to actually pull the
document from disk,
sort parameter for the first query (on the EXACT field),
Sort sort = new Sort(new SortField[] { SortField.FIELD_SCORE, new
SortField(FieldConstants.ALBUM_PRIORITY, true) });
Add the first query on the BooleanQuery.
While the other two succeeding queries will use the default sort parameter.
Re
Hi,
Thanks for pointing me to the API. I found the explanation I'm looking for
at:
http://lucene.apache.org/java/2_4_0/api/core/index.html?org/apache/lucene/search/Hits.html
There's an example on how to use the TopDocCollector instead of Hits.
Regards,
Jay Joel Malaluan
Grant I
xx, xxx) - that will return a Hits object
I searched the javadoc API (2.3 and 2.4) and didn't find any method
that returns a TopDocCollector object from a searcher.search(xxx, xxx) call.
It would be a great help if someone could expand on this. I might be able to
use this in a future implementation.
Hi,
You can check out Nutch at http://lucene.apache.org/nutch/.
Regards,
Jay Joel Malaluan
Haroldo Nascimento-2 wrote:
>
>
> Hi,
>
> Is there any crawler that integrates with a Lucene index?
>
> T
d
thing is loveliness is stemmed to "loveli" and loveless is not stemmed at
all.
Has anyone already encountered this, and are there suggestions for other
Analyzers?
Regards,
Jay Malaluan
--
View this message in context:
http://www.nabble.com/Stemming-behavior-tp21089115p21089115.html
Sent fr
query of q2 have two process?
1. Run the query to get results.
2. For filtering
Regards,
Jay
From: Chris Hostetter
To: java-user@lucene.apache.org
Sent: Thursday, December 18, 2008 3:14:51 PM
Subject: Re: Unique results in BooleanQuery
: Let me expound more
Let me expound more on the question. Will the q1 be run on the BooleanQuery q2
and append the results that are not equal to the result of the first query of
q2?
From: Jay Joel Malaluan
To: java-user@lucene.apache.org
Sent: Wednesday, December 17, 2008 2:42
Hi Paul,
But will q1 be run on the BooleanQuery q2, or is q1 just used for filtering?
Regards,
Jay Malaluan
From: Paul Cowan
To: java-user@lucene.apache.org
Sent: Wednesday, December 17, 2008 1:37:15 PM
Subject: Re: Unique results in BooleanQuery
Hi
Hi,
Does anyone know how to get unique hits using the BooleanQuery?
I have 2 queries, and once the 1st query has been processed, the 2nd
query should no longer return results already returned by the 1st query.
Regards,
Jay Malaluan
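
One way to read the question above, as a minimal sketch only: run the two queries separately and drop from the second result list any document ID that the first query already returned. The class and method names are hypothetical, and plain integers stand in for Lucene hits; this is not a Lucene API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class UniqueHits {
    // Hypothetical helper: returns the hits of the second query that did not
    // already appear in the first query's results, preserving their order.
    static List<Integer> excludeSeen(List<Integer> firstHits, List<Integer> secondHits) {
        Set<Integer> seen = new HashSet<>(firstHits);
        List<Integer> unique = new ArrayList<>();
        for (Integer docId : secondHits) {
            // add() returns false when docId is already in the set
            if (seen.add(docId)) {
                unique.add(docId);
            }
        }
        return unique;
    }

    public static void main(String[] args) {
        List<Integer> q1Hits = Arrays.asList(3, 7, 9);    // made-up doc IDs
        List<Integer> q2Hits = Arrays.asList(7, 2, 9, 5); // made-up doc IDs
        System.out.println(excludeSeen(q1Hits, q2Hits));
    }
}
```

Filtering after the fact keeps each query's own ranking intact; whether that is preferable to excluding the first query inside a single BooleanQuery depends on how the results are displayed.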
--
View this message in context:
http://www.nabble.com
flash" when searched.
Regards,
Jay Malaluan
From: Erick Erickson
To: java-user@lucene.apache.org
Sent: Tuesday, December 16, 2008 10:14:13 PM
Subject: Re: Inquiry on Lucene Stemming
Why do you want to do this? The reason I ask is that you're
m
ot the original word "flashing".
Is there an API in Lucene, or a third-party API, that can do the following:
when I pass the word "flash", it also searches for "flashing", "flashed",
"flashes", etc.?
Regards,
Jay Malaluan
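
To illustrate the idea in the question above, here is a toy sketch in which the same normalization is applied to the indexed words and to the query term, so that "flash" matches "flashing", "flashed", and "flashes". The suffix list and the class name are assumptions made for illustration; this is a crude suffix stripper, not the Porter stemmer.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

public class ToyStemSearch {
    // Assumed suffix list, checked in this order; purely illustrative.
    private static final String[] SUFFIXES = {"ing", "ed", "es", "s"};

    // Lower-cases the word and strips the first matching suffix.
    static String stem(String word) {
        String w = word.toLowerCase(Locale.ROOT);
        for (String suffix : SUFFIXES) {
            // Keep at least three characters of stem so short words survive.
            if (w.endsWith(suffix) && w.length() - suffix.length() >= 3) {
                return w.substring(0, w.length() - suffix.length());
            }
        }
        return w;
    }

    // Returns the indexed words whose stem equals the stem of the query term.
    static List<String> search(List<String> indexedWords, String queryTerm) {
        String target = stem(queryTerm);
        List<String> matches = new ArrayList<>();
        for (String word : indexedWords) {
            if (stem(word).equals(target)) {
                matches.add(word);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("flashing", "flashed", "flashes", "flask");
        System.out.println(search(words, "flash"));
    }
}
```

The point of the sketch is only that stemming at both index time and query time makes the variants meet at a common form, so no query expansion is needed.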
BooleanQuery. BoostingQuery is one such attempt but it's not very
flexible (e.g. the damping is independent of the scores of sub queries).
Does anyone know any other existing examples similar to CustomScoreQuery
but deal with multiple sub queries?
Thanks!
Hi,
BoostingQuery is designed to demote the scores of documents when they match
the undesired
query by the boosting/demoting the final score. The problem I see is this
demoting factor is static/universal in the sense that it does not depend on
how much the docs match the negative query terms. Ideal
If it's windows only, you can roll your own with IFilters (
http://www.ifilter.org/).
On Tue, May 13, 2008 at 10:23 AM, Lukas Vlcek <[EMAIL PROTECTED]> wrote:
> Does it make sense to consider using OpenOffice to convert from MS formats
> to PDF or HTML before indexing? Would this yield me a lower
Thanks, Uwe, for your clarification and for sharing your experience
which is very helpful!
Jay
Uwe Goetzke wrote:
Hi Jay,
Sorry for the confusion, I wrote NgramStemFilter in an early stage of the project which is essentially the same as NGramTokenFilter from Otis with the addition that I
Sorry, I could not find the filter in the 2.3 API class list (core +
contrib + test). I am not aware of a Lucene config file either. Could you
please tell me where it is in the 2.3 release?
Thanks!
Jay
Otis Gospodnetic wrote:
Jay,
Have a look at Lucene config, it's all there, including
Hi Uwe,
I am curious what NGramStemFilter is? Is it a combination of porter
stemming and word ngram identification?
Thanks!
Jay
Uwe Goetzke wrote:
Hi Ivan,
No, we do not use StandardAnalyser or StandardTokenizer.
Most data is processed by
fTextTokenStream = result = new
My bad. Thanks for the link!
Jay
Chris Hostetter wrote:
: Do you know why FieldNormModifier is removed from Lucene 2.3?
: thanks.
it wasn't...
http://lucene.apache.org/java/2_3_0/api/contrib-misc/org/apache/lucene/index/FieldNormModifier.html
...it's in the "miscellaneous"
Do you know why FieldNormModifier is removed from Lucene 2.3?
thanks.
Jay
Chris Hostetter wrote:
: I read the doc for the api indexreader.setNorm() after I posted the question
: earlier. To use that setNorm() to modify the field boost, it seems to me that
: one has to know how the boost is
It'd be helpful if there is an api for getting the norm of a given field
in a given doc.
Thanks for the pointers.
Jay
Chris Hostetter wrote:
: I read the doc for the api indexreader.setNorm() after I posted the question
: earlier. To use that setNorm() to modify the field boost, it see
Hi,
It's clear that there is no easy way to do "in-place" doc update in the
lucene index, but I think it should be theoretically possible to update
the field and doc boostings in place, that is, without deleting and
re-adding the doc and its fields. Does anyone know
Thanks for your clarifications, Mark!
Jay
Mark Miller wrote:
5. Although currently IndexSearcher.close() does almost nothing except
to close the internal index reader, it might be safer to close the
searcher itself as well in closeCachedSearcher(), just in case, the
searcher may have
hanks!
Jay
Mark Miller wrote:
For anyone following this thread who would like to check this out, I put
up the new code with the warming capability:
https://issues.apache.org/jira/browse/LUCENE-1026
<https://issues.apache.org/jira/secure/attachment/12374729/IndexAccessor-02.04.2008.zip>
In
You are right, Lucene only throws an IllegalArgumentException when the value
is null. I assume it won't skip the field if the value is empty or null?
Thanks!
Jay
Michael McCandless wrote:
As far as I know, Lucene should accept a field with an empty string
value -- how did you hi
Thanks, Michael, for your quick reply and explanation.
One related question: is it true that the Lucene indexer will reject a field
that has an empty string value? (I saw an IllegalArgumentException.)
It would be nice if Lucene just skipped such a field silently, especially
with the new 2.3 API.
Jay
Michael
the same result.
Is there a way that I can avoid having QueryParser remove that part of my
query?
Thanks,
-Jay
e-specific Analyzer, and then still use the QueryParser with the
StandardAnalyzer at search time. I've considered building a BooleanQuery of
QueryParsers with each QueryParser built with a language-specific Analyzer,
but that seems like it would be bound to be very slow.
Any opinions or thoughts appreciated.
-Jay
xes in
multiple languages would be appreciated.
-Jay
ht be abandon preload the accessors. After all,
the accessors are cached and not created often.
Thanks!
Jay
Mark Miller wrote:
I think it's just a compromise in the design, though it could be
improved. You only ever want a single Writer at a time on the index.
Those two flags are really just
overwrites the existing index, then later he cannot append docs to
the index.
Am I missing something here, or have you not finished the implementation of
getWriter yet?
Thanks!
Jay
Mark Miller wrote:
Ah, thanks for catching that. One of the pieces I did not finish...the
keyword analyzer was placeholder
reset
analyzer/Dir as in my own version.
Jay
Mark Miller wrote:
One final note: if you are using the IndexAccessor and you are only
accessing the index from one JVM, you can use the NoLockFactory and save
some sync cost there.
Jay Yu wrote:
Mark,
Great effort getting the original
at your code to see if I could help. I used a
slightly modified version of the original package in my project but it
breaks some of my tests. I hope your version works better.
Thanks a lot!
Jay
Mark Miller wrote:
I sat down and rewrote IndexAccessor from scratch. I copied in the
same
total time to
parse a query and run a search. I'll try and get around to posting the
code tonight.
- Mark
Jay Yu wrote:
Mark Miller wrote:
Good luck Jay. Keep in mind, pretty much all LuceneIndexAccessor does
is sync Readers with Writers and allow multiple threads to share the
same in
Mark Miller wrote:
Good luck Jay. Keep in mind, pretty much all LuceneIndexAccessor does is
sync Readers with Writers and allow multiple threads to share the same
instances of them -- nothing more. The code just forces Readers to
refresh when Writers are used to change the index. There
r sync.
I will probably give it a try to see how it performs in our system.
Thanks!
Jay
Mark Miller wrote:
The method is synched, but this is because each thread *does* share the
same Searcher. To maintain a cache of searchers across multiple threads,
you've got to sync -- to reference co
method of release(searcher) is costly. On the other hand, if
multiple threads share one searcher then it'd defeat the
purpose of using LuceneIndexAccessor.
Am I missing something here? What's your suggested use case for
LuceneIndexAccessor?
Thanks!
Jay
Mark Miller wrote:
I'll respond a
Thanks for your detailed explanation of the issues and your solutions.
It seems that LuceneIndexAccessor is worth trying first, before I
implement another locking mechanism to ensure proper ordering.
I would appreciate it very much if you'd share your extension with us.
Jay
Mark Miller wrote:
been resolved?
Where did you get the latest release? It is not in the official Lucene
sandbox/contrib.
Finally, are you willing to share your extended version to include your
tweak relating to the MultiSearcher?
Thanks a lot!
Jay
Mark Miller wrote:
I use option 3 extensively and find it very
?
Or am I missing other, better solutions?
Thanks for any suggestion/comment!
Jay
-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
bits
final BitSet filterBitSet = queryFilter.bits(reader);
filterBitSet.flip(0,filterBitSet.size());
Now you have a filter that contains the documents matching the opposite of
what the query specified, and you can use it in subsequent queries
Dan
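
A small stand-alone sketch of the inversion described above, using only `java.util.BitSet`. One caveat worth noting: `BitSet.size()` reports the space currently allocated (typically rounded up to a multiple of 64), not the number of documents, so flipping up to `size()` can set bits past the last document. The sketch assumes a `maxDoc` value, standing in for what `reader.maxDoc()` would return, and flips only that range; the class and variable names are illustrative.

```java
import java.util.BitSet;

public class InvertFilterBits {
    // Returns a copy of 'matches' with every bit in [0, maxDoc) inverted.
    // The original bit set is left untouched.
    static BitSet invert(BitSet matches, int maxDoc) {
        BitSet inverted = (BitSet) matches.clone();
        inverted.flip(0, maxDoc); // the end index is exclusive
        return inverted;
    }

    public static void main(String[] args) {
        int maxDoc = 5;                 // assumed number of documents in the index
        BitSet matches = new BitSet(maxDoc);
        matches.set(1);                 // pretend docs 1 and 3 matched the query
        matches.set(3);
        System.out.println(invert(matches, maxDoc)); // prints {0, 2, 4}
    }
}
```

Working on a clone also avoids mutating a bit set that a caching filter may hand out again later.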
On Tue, 2007-07-24 at 09:40 -0700, Jay Yu wrote:
daniel ro
d
can cheaply be stored, generated once and used often.
Dan
On Mon, 2007-07-23 at 13:57 -0700, Jay Yu wrote:
If you want performance, a better way might be to assign some special
string/value (if it's easy to create) to the missing field of docs and
index the field without tokenizing it. Then you
If you want performance, a better way might be to assign some special
string/value (if it's easy to create) to the missing field of docs and
index the field without tokenizing it. Then you may search for that
special value to find the docs.
Jay
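
The suggestion above can be sketched without any Lucene calls: before indexing, fill the missing field with a sentinel string, then look up that exact value. The sentinel `__MISSING__`, the field name `author`, and the map-based "documents" are all assumptions made for illustration; in Lucene the field would be indexed without tokenizing so that the sentinel stays a single term.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class MissingFieldSentinel {
    static final String MISSING = "__MISSING__"; // assumed sentinel value

    // Returns a copy of the doc in which an absent field is filled with the sentinel.
    static Map<String, String> withSentinel(Map<String, String> doc, String field) {
        Map<String, String> copy = new HashMap<>(doc);
        copy.putIfAbsent(field, MISSING);
        return copy;
    }

    // Exact-match lookup, standing in for a term query on an untokenized field.
    static List<Integer> findMissing(List<Map<String, String>> docs, String field) {
        List<Integer> ids = new ArrayList<>();
        for (int i = 0; i < docs.size(); i++) {
            if (MISSING.equals(docs.get(i).get(field))) {
                ids.add(i);
            }
        }
        return ids;
    }

    public static void main(String[] args) {
        Map<String, String> withAuthor = new HashMap<>();
        withAuthor.put("title", "first");
        withAuthor.put("author", "someone");
        Map<String, String> withoutAuthor = new HashMap<>();
        withoutAuthor.put("title", "second");

        List<Map<String, String>> index = new ArrayList<>();
        index.add(withSentinel(withAuthor, "author"));
        index.add(withSentinel(withoutAuthor, "author"));
        System.out.println(findMissing(index, "author"));
    }
}
```

The sentinel must be a value that can never occur as real field content, otherwise genuine documents would be reported as missing the field.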
Les Fletcher wrote:
Does this particular
Thanks for clarifying this, Chris!
I agree with you that javadocs usually should document everything a class
does, but they often skip a few important things it does do.
Chris Hostetter wrote:
: Does anyone know if the RangeFilter is a cached filter? I could not
: tell from the api.
Generally speaking cla
Hi All,
Does anyone know if the RangeFilter is a cached filter? I could not
tell from the api.
Thanks!
Jay
I had a similar problem with threading, the problem turned out to be that in
the back end of the FSDirectory class I believe it was, there was a
synchronized block on the actual RandomAccessFile resource when reading a
block of data from it... high-concurrency situations caused threads to stack
up
Thanks Richard, I'll check it out.
-Jay
On 6/16/05, Richard Krenek <[EMAIL PROTECTED]> wrote:
> To add to this option, you may want to use this patch
> http://issues.apache.org/bugzilla/show_bug.cgi?id=27743
> This way instead of pulling the entire document back each time, j
I like this approach. This may be what I'm looking for.
Thanks JP!
-Jay
On 6/15/05, Robichaud, Jean-Philippe
<[EMAIL PROTECTED]> wrote:
>
> It may be simpler and more effective to use the Hits object and keep the
> number of times each host was actually "returned"
using HitCollector as Tony
suggests. I was hoping to avoid the HitCollector, but there may be no
other way right now.
Many thanks,
-Jay
On 6/14/05, Erik Hatcher <[EMAIL PROTECTED]> wrote:
>
> On Jun 14, 2005, at 7:23 PM, Jay Hill wrote:
> > I have a need to limit my Hits return
t_id.
Any help is appreciated.
Thanks,
-Jay