java.lang.OutOfMemoryError: WildcardQuery

2016-01-06 Thread Pee Jay
Hello, We recently upgraded from Lucene 3.6 to Lucene 4.7.2 and are hitting "*java.lang.OutOfMemoryError: GC overhead limit exceeded*" while creating a WildcardQuery object: *Code:* Query query = new WildcardQuery(new Term("id", "someTerm")); *StackTrace: *j

for check similarity of two sentences

2015-03-31 Thread hesh jay
hi, I am a second-year undergraduate at the University of Moratuwa, Sri Lanka. For my second-year project I am building a question answering system (knowledge base). In this project I have to suggest similar questions previously asked by other users. I need to find the similarity of two sentences in my application to sugges

Solr Analysis Webinar Jan 28, 2010

2010-01-20 Thread Jay Hill
My colleague at Lucid Imagination, Tom Hill, will be presenting a free webinar focused on analysis in Lucene/Solr. If you're interested, please sign up and join us. Here is the official notice: We'd like to invite you to a free webinar our company is offering next Thursday, 28 January, at 2PM Eas

RE: Lucene performance: is search time linear to the index size?

2009-06-18 Thread Jay Booth
Are you fetching all of the results for your search? If so, you're actually measuring the time to pull n stored documents out of the index, not to search over an index of n documents. That would of course be linear: most of your cost there will be the I/O to actually pull the document from disk,

BooleanQuery inquiry

2009-03-16 Thread Jay Joel Malaluan
sort parameter for the first query (on the EXACT field), Sort sort = new Sort(new SortField[] { SortField.FIELD_SCORE, new SortField(FieldConstants.ALBUM_PRIORITY, true) }); Add the first query to the BooleanQuery, while the two succeeding queries use the default sort parameter. Re

Re: TopDocCollector vs Hits inquiry

2009-02-05 Thread Jay Malaluan
Hi, Thanks for pointing me to the API. I found the explanation I'm looking for at: http://lucene.apache.org/java/2_4_0/api/core/index.html?org/apache/lucene/search/Hits.html There's an example on how to use the TopDocCollector instead of Hits. Regards, Jay Joel Malaluan Grant I

Re: TopDocCollector vs Hits inquiry

2009-02-04 Thread Jay Malaluan
xx, xxx) - that will return a Hits object I was searching the javadoc API (2.3 and 2.4) and didn't find any method that returns a TopDocCollector object from a searcher.search(xxx, xxx) call. It would be a great help if someone could expound on this. I might be able to use this in a future implementation.

Re: Crawler

2009-01-29 Thread Jay Malaluan
Hi, You can check out Nutch at http://lucene.apache.org/nutch/. Regards, Jay Joel Malaluan Haroldo Nascimento-2 wrote: > > > Hi, > > Is there any crawler that integrates with a Lucene index? > > T

Stemming behavior

2008-12-19 Thread Jay Malaluan
d thing is loveliness is stemmed to "loveli" and loveless is not stemmed at all. Has anyone already encountered this and have suggestions on other Analyzers? Regards, Jay Malaluan -- View this message in context: http://www.nabble.com/Stemming-behavior-tp21089115p21089115.html Sent fr

Re: Unique results in BooleanQuery

2008-12-19 Thread Jay Joel Malaluan
query of q2 have two process? 1. Run the query to get results. 2. For filtering Regards, Jay From: Chris Hostetter To: java-user@lucene.apache.org Sent: Thursday, December 18, 2008 3:14:51 PM Subject: Re: Unique results in BooleanQuery : Let me expound more

Re: Unique results in BooleanQuery

2008-12-16 Thread Jay Joel Malaluan
Let me expound more on the question. Will the q1 be run on the BooleanQuery q2 and append the results that are not equal to the result of the first query of q2? From: Jay Joel Malaluan To: java-user@lucene.apache.org Sent: Wednesday, December 17, 2008 2:42

Re: Unique results in BooleanQuery

2008-12-16 Thread Jay Joel Malaluan
Hi Paul, But will q1 be run on the BooleanQuery q2, or is q1 just used for filtering? Regards, Jay Malaluan From: Paul Cowan To: java-user@lucene.apache.org Sent: Wednesday, December 17, 2008 1:37:15 PM Subject: Re: Unique results in BooleanQuery Hi

Unique results in BooleanQuery

2008-12-16 Thread Jay Malaluan
Hi, Does anyone know how to get unique hits using BooleanQuery? I have 2 queries, and once the 1st query is processed, the 2nd query should no longer return results already returned by the 1st query. Regards, Jay Malaluan -- View this message in context: http://www.nabble.com

Re: Inquiry on Lucene Stemming

2008-12-16 Thread Jay Joel Malaluan
flash" when searched. Regards, Jay Malaluan From: Erick Erickson To: java-user@lucene.apache.org Sent: Tuesday, December 16, 2008 10:14:13 PM Subject: Re: Inquiry on Lucene Stemming Why do you want to do this? The reason I ask is that you're m

Inquiry on Lucene Stemming

2008-12-16 Thread Jay Joel Malaluan
ot the original word "flashing". Is there an API in Lucene or a third-party API that can do the following: when I pass the word "flash", it searches for "flashing", "flashed", "flashes", etc.? Regards, Jay Malaluan

CustomScoreQuery and BooleanQuery

2008-08-06 Thread Jay
BooleanQuery. BoostingQuery is one such attempt but it's not very flexible (e.g. the damping is independent of the scores of sub queries). Does anyone know of any other existing examples similar to CustomScoreQuery that deal with multiple sub queries? Thanks!

testing, pls ignore

2008-06-24 Thread Jay dragon

BoostingQuery

2008-06-24 Thread Jay dragon
Hi, BoostingQuery is designed to demote the scores of documents when they match the undesired query by boosting/demoting the final score. The problem I see is that this demoting factor is static/universal in the sense that it does not depend on how much the docs match the negative query terms. Ideal

BoostingQuery

2008-06-23 Thread Jay dragon
Hi, BoostingQuery is designed to demote the scores of documents when they match the undesired query by boosting/demoting the final score. The problem I see is that this demoting factor is static/universal in the sense that it does not depend on how much the docs match the negative query terms. Ideal

Re: Can POI provide reliable text extraction results for production search engine for Word, Excel and PowerPoint formats?

2008-05-13 Thread Jay O'Leary
If it's windows only, you can roll your own with IFilters ( http://www.ifilter.org/). On Tue, May 13, 2008 at 10:23 AM, Lukas Vlcek <[EMAIL PROTECTED]> wrote: > Does it make sense to consider using OpenOffice to convert from MS formats > to PDF or HTML before indexing. Would this yield me a lower

Re: AW: AW: feedback: Indexing speed improvement lucene 2.2->2.3.1

2008-03-26 Thread Jay
Thanks, Uwe, for your clarification and for sharing your experience which is very helpful! Jay Uwe Goetzke wrote: Hi Jay, Sorry for the confusion, I wrote NgramStemFilter in an early stage of the project which is essentially the same as NGramTokenFilter from Otis with the addition that I

Re: AW: feedback: Indexing speed improvement lucene 2.2->2.3.1

2008-03-25 Thread Jay
Sorry, I could not find the filter in the 2.3 API class list (core + contrib + test). I am not aware of a Lucene config file either. Could you please tell me where it is in the 2.3 release? Thanks! Jay Otis Gospodnetic wrote: Jay, Have a look at Lucene config, it's all there, including

Re: AW: feedback: Indexing speed improvement lucene 2.2->2.3.1

2008-03-25 Thread Jay
Hi Uwe, I am curious what NGramStemFilter is? Is it a combination of porter stemming and word ngram identification? Thanks! Jay Uwe Goetzke wrote: Hi Ivan, No, we do not use StandardAnalyser or StandardTokenizer. Most data is processed by fTextTokenStream = result = new

Re: update field boost

2008-02-12 Thread Jay
My bad. Thanks for the link! Jay Chris Hostetter wrote: : Do you know why FieldNormModifier is removed from Lucene 2.3? : thanks. it wasn't... http://lucene.apache.org/java/2_3_0/api/contrib-misc/org/apache/lucene/index/FieldNormModifier.html ...it's in the "miscellaneous"

Re: update field boost

2008-02-12 Thread Jay
Do you know why FieldNormModifier is removed from Lucene 2.3? thanks. Jay Chris Hostetter wrote: : I read the doc for the api indexreader.setNorm() after I posted the question : earlier. To use that setNorm() to modify the field boost, it seems to me that : one has to know how the boost is

Re: update field boost

2008-02-12 Thread Jay
It'd be helpful if there is an api for getting the norm of a given field in a given doc. Thanks for the pointers. Jay Chris Hostetter wrote: : I read the doc for the api indexreader.setNorm() after I posted the question : earlier. To use that setNorm() to modify the field boost, it see

update field boost

2008-02-11 Thread Jay
Hi, It's clear that there is no easy way to do "in-place" doc update in the lucene index, but I think it should be theoretically possible to update the field and doc boostings in place, that is, without deleting and re-adding the doc and its fields. Does anyone know

Re: DefaultIndexAccessor

2008-02-06 Thread Jay
Thanks for your clarifications, Mark! Jay Mark Miller wrote: 5. Although currently IndexSearcher.close() does almost nothing except to close the internal index reader, it might be safer to close the searcher itself as well in closeCachedSearcher(), just in case, the searcher may have

Re: DefaultIndexAccessor

2008-02-05 Thread Jay
hanks! Jay Mark Miller wrote: For anyone following this thread who would like to check this out, I put up the new code with the warming capability: https://issues.apache.org/jira/browse/LUCENE-1026 <https://issues.apache.org/jira/secure/attachment/12374729/IndexAccessor-02.04.2008.zip> In

Re: Reuse single document and fields

2008-02-01 Thread Jay
You are right, Lucene only gives IllegalArgumentException when the value is null. I assume it won't skip the field if the value is empty or null? Thanks! Jay Michael McCandless wrote: As far as I know, Lucene should accept a field with an empty string value -- how did you hi

Re: Reuse single document and fields

2008-02-01 Thread Jay
Thanks, Michael, for your quick reply and explanation. One related question: is it true that the Lucene indexer will reject a field that has the empty string value? (I saw an IllegalArgumentException). It would be nice if Lucene just skipped such a field silently, esp. for the new 2.3 API. Jay Michael

Stemmers remove part of a query when using QueryParser

2008-01-25 Thread Jay Hill
the same result. Is there a way that I can avoid having QueryParser remove that part of my query? Thanks, -Jay

Analyzer choices for indexing and searching multiple languages

2007-12-26 Thread Jay Hill
e-specific Analyzer, and then still use the QueryParser with the StandardAnalyzer at search time. I've considered building a BooleanQuery of QueryParsers with each QueryParser built with a language-specific Analyzer, but that seems like it would be bound to be very slow. Any opinions or thoughts appreciated. -Jay

Analyzer to use with MultiSearcher using various indexes for multiple languages

2007-12-17 Thread Jay Hill
xes in multiple languages would be appreciated. -Jay

Re: thread safe shared IndexSearcher

2007-09-25 Thread Jay Yu
ht be abandon preload the accessors. After all, the accessors are cached and not created often. Thanks! Jay Mark Miller wrote: I think its just a compromise in the design, though it could be improved. You only ever want a single Writer at a time on the index. Those two flags are really just

Re: thread safe shared IndexSearcher

2007-09-25 Thread Jay Yu
overwrites the existing index, then later he cannot append docs to the index. Am I missing something here, or have you not finished the implementation of getWriter yet? Thanks! Jay Mark Miller wrote: Ah, thanks for catching that. One of the pieces I did not finish...the keyword analyzer was placeholder

Re: thread safe shared IndexSearcher

2007-09-24 Thread Jay Yu
reset analyzer/Dir as in my own version. Jay Mark Miller wrote: One final note: if you are using the IndexAccessor and you are only accessing the index from one JVM, you can use the NoLockFactory and save some sync cost there. Jay Yu wrote: Mark, Great effort getting the original

Re: thread safe shared IndexSearcher

2007-09-24 Thread Jay Yu
at your code to see if I could help. I used a slightly modified version of the original package in my project but it breaks some of my tests. I hope your version works better. Thanks a lot! Jay Mark Miller wrote: I have sat down and rewritten IndexAccessor from scratch. I copied in the same

Re: thread safe shared IndexSearcher

2007-09-24 Thread Jay Yu
total time to parse a query and run a search. I'll try and get around to posting the code tonight. - Mark Jay Yu wrote: Mark Miller wrote: Good luck Jay. Keep in mind, pretty much all LuceneIndexAccessor does is sync Readers with Writers and allow multiple threads to share the same in

Re: thread safe shared IndexSearcher

2007-09-20 Thread Jay Yu
Mark Miller wrote: Good luck Jay. Keep in mind, pretty much all LuceneIndexAccessor does is sync Readers with Writers and allow multiple threads to share the same instances of them -- nothing more. The code just forces Readers to refresh when Writers are used to change the index. There
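The behaviour described in this message (threads share one instance, and readers are refreshed once a writer has changed the index) can be illustrated without Lucene. The following is only a minimal sketch of that idea, not LuceneIndexAccessor's actual code: `SharedHandleSketch`, `acquire`, `markChanged`, and the `opener` supplier are all hypothetical names introduced here.

```java
import java.util.function.Supplier;

/** Illustrative only: shares one handle across threads and reopens it after a change. */
final class SharedHandleSketch<T> {
    private final Supplier<T> opener; // hypothetical: opens a fresh view of the index
    private T current;                // instance currently shared by all callers
    private boolean stale = true;     // true until the first open, and after each change

    SharedHandleSketch(Supplier<T> opener) {
        this.opener = opener;
    }

    /** Returns the shared instance, reopening it first if a writer changed the index. */
    synchronized T acquire() {
        if (stale) {
            current = opener.get();
            stale = false;
        }
        return current;
    }

    /** Called after a writer finishes changing the index; the next acquire() reopens. */
    synchronized void markChanged() {
        stale = true;
    }
}
```

A real accessor would also have to close the old instance once no thread is still using it (for example by reference counting); that part is deliberately left out of this sketch.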

Re: thread safe shared IndexSearcher

2007-09-20 Thread Jay Yu
r sync. I will probably give it a try to see how it performs in our system. Thanks! Jay Mark Miller wrote: The method is synched, but this is because each thread *does* share the same Searcher. To maintain a cache of searchers across multiple threads, you've got to sync -- to reference co

Re: thread safe shared IndexSearcher

2007-09-19 Thread Jay Yu
method of release(searcher) is costly. On the other hand, if multiple threads share one searcher then it'd defeat the purpose of using LuceneIndexAccessor. Am I missing something here? What's your suggested use case for LuceneIndexAccessor? Thanks! Jay Mark Miller wrote: I'll respond a

Re: thread safe shared IndexSearcher

2007-09-19 Thread Jay Yu
Thanks for your detailed explanation of the issues and your solutions. It seems that LuceneIndexAccessor is worth trying first before I implement another locking mechanism to ensure proper order. I would appreciate it very much if you'd share your extension with us. Jay Mark Miller wrote:

Re: thread safe shared IndexSearcher

2007-09-19 Thread Jay Yu
been resolved? Where did you get the latest release? It is not in the official Lucene sandbox/contrib. Finally, are you willing to share your extended version to include your tweak relating to the MultiSearcher? Thanks a lot! Jay Mark Miller wrote: I use option 3 extensively and find it very

thread safe shared IndexSearcher

2007-09-19 Thread Jay Yu
? Or do I miss other better solutions? Thanks for any suggestion/comment! Jay - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]

Re: Search for null

2007-07-25 Thread Jay Yu
bits final BitSet filterBitSet = queryFilter.bits(reader); filterBitSet.flip(0,filterBitSet.size()); Now you have a filter that contains documents matching the opposite of that specified by the query, and can be used in subsequent queries Dan On Tue, 2007-07-24 at 09:40 -0700, Jay Yu wrote: daniel ro
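The flip step quoted in this message can be tried with plain `java.util.BitSet`, independent of Lucene. One caveat worth knowing: `BitSet.size()` reports the allocated capacity (a multiple of 64), so flipping up to `size()` can set bits past the last document. The sketch below flips only up to a document count instead; `InvertedFilterSketch`, `invert`, and `maxDoc` are hypothetical names for illustration, and the caller is assumed to know how many documents the index holds.

```java
import java.util.BitSet;

/** Illustrative only: inverts a "documents that matched" bit set. */
final class InvertedFilterSketch {
    /**
     * Returns a copy of matches with every bit in [0, maxDoc) flipped.
     * maxDoc stands in for the number of documents in the index (an assumption of this sketch).
     */
    static BitSet invert(BitSet matches, int maxDoc) {
        BitSet inverted = (BitSet) matches.clone(); // leave the caller's bit set untouched
        inverted.flip(0, maxDoc);                   // the upper bound is exclusive
        return inverted;
    }

    public static void main(String[] args) {
        BitSet matches = new BitSet(5);
        matches.set(1);
        matches.set(3);
        System.out.println(invert(matches, 5)); // documents that did NOT match
    }
}
```

Cloning before flipping keeps the original bit set reusable, which matters if the same filter bits are cached and shared between queries.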

Re: Search for null

2007-07-24 Thread Jay Yu
d can cheaply be stored, generated once and used often. Dan On Mon, 2007-07-23 at 13:57 -0700, Jay Yu wrote: If you want performance, a better way might be to assign some special string/value (if it's easy to create) to the missing field of docs and index the field without tokenizing it. Then you

Re: Search for null

2007-07-23 Thread Jay Yu
If you want performance, a better way might be to assign some special string/value (if it's easy to create) to the missing field of docs and index the field without tokenizing it. Then you may search for that special value to find the docs. Jay Les Fletcher wrote: Does this particular

Re: RangeFilter

2007-07-10 Thread Jay Yu
Thanks for clarifying this, Chris! I agree with you that javadocs usually should document everything they do, but oftentimes they skip a few important things they do. Chris Hostetter wrote: : Does anyone know if the RangeFilter is a cached filter? I could not : tell from the api. Generally speaking cla

RangeFilter

2007-07-10 Thread Jay Yu
Hi All, Does anyone know if the RangeFilter is a cached filter? I could not tell from the api. Thanks! Jay

RE: Throughput doesn't increase when using more concurrent threads

2005-11-21 Thread Jay Booth
I had a similar problem with threading, the problem turned out to be that in the back end of the FSDirectory class I believe it was, there was a synchronized block on the actual RandomAccessFile resource when reading a block of data from it... high-concurrency situations caused threads to stack up

Re: Need a way to set a result limit on a particular field

2005-06-16 Thread Jay Hill
Thanks Richard, I'll check it out. -Jay On 6/16/05, Richard Krenek <[EMAIL PROTECTED]> wrote: > To add to this option, you may want to use this patch > http://issues.apache.org/bugzilla/show_bug.cgi?id=27743 > This way instead of pulling the entire document back each time, j

Re: Need a way to set a result limit on a particular field

2005-06-15 Thread Jay Hill
I like this approach. This may be what I'm looking for. Thanks JP! -Jay On 6/15/05, Robichaud, Jean-Philippe <[EMAIL PROTECTED]> wrote: > > It may be simpler and more effective to use the Hits object and keep the > number of times each host was actually "returned"
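The idea quoted in this message, walking the ranked hits and counting how many times each host has already been returned, can be sketched in plain Java. This is a rough illustration under assumptions, not Lucene API: `PerHostLimitSketch`, `Hit`, `host`, `docId`, and `maxPerHost` are hypothetical names, and the input list is assumed to be already sorted by score.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Illustrative only: caps how many results a single host may contribute. */
final class PerHostLimitSketch {
    /** Hypothetical stand-in for one search result. */
    static final class Hit {
        final String host;
        final int docId;

        Hit(String host, int docId) {
            this.host = host;
            this.docId = docId;
        }
    }

    /** Keeps ranked order, dropping a hit once its host has appeared maxPerHost times. */
    static List<Hit> limitPerHost(List<Hit> rankedHits, int maxPerHost) {
        Map<String, Integer> returnedPerHost = new HashMap<>();
        List<Hit> kept = new ArrayList<>();
        for (Hit hit : rankedHits) {
            int alreadyReturned = returnedPerHost.getOrDefault(hit.host, 0);
            if (alreadyReturned < maxPerHost) {
                kept.add(hit);
                returnedPerHost.put(hit.host, alreadyReturned + 1);
            }
        }
        return kept;
    }
}
```

Because the loop only needs the host of each hit, it can stop as soon as enough results have been kept, which avoids loading every matching document.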

Re: Need a way to set a result limit on a particular field

2005-06-15 Thread Jay Hill
using HitCollector as Tony suggests. I was hoping to avoid the HitCollector, but there may be no other way right now. Many thanks, -Jay On 6/14/05, Erik Hatcher <[EMAIL PROTECTED]> wrote: > > On Jun 14, 2005, at 7:23 PM, Jay Hill wrote: > > I have a need to limit my Hits return

Need a way to set a result limit on a particular field

2005-06-14 Thread Jay Hill
t_id. Any help is appreciated. Thanks, -Jay