Re: [Bulk] RE: Exception at MultiSearcherThread.hits

2009-07-13 Thread Erick Erickson
Please don't hijack a thread, start a new topic. From Hossman: http://people.apache.org/~hossman/#threadhijack Thread Hijacking on Mailing Lists When starting a new discussion on a mailing list, please do not reply to an existing message, instead start a fresh email. Even if you change the

Re: Use of Synonyms

2009-07-13 Thread Erick Erickson
What are you trying to do? I think you'd get a better response if you explained what higher-level task/feature you're trying to implement. Best Erick On Mon, Jul 13, 2009 at 4:54 AM, liat oren oren.l...@gmail.com wrote: Hi all, I have a list of synonyms for every word. Is there a good way to

Re: Is my app a good fit for Lucene?

2009-07-10 Thread Erick Erickson
It would be helpful if you told us what analyzers you're using and what your search code looks like. Even better would be a small, self-contained demonstration app showing the issue. You could well be right that the text format is tripping up tokenizing, but there are other issues. You may have to

Re: Using IN to retrieve data after lucene search.

2009-07-09 Thread Erick Erickson
It depends (tm). How much data are we talking about here? I dislike having to have two data sources for a running app just because it's more complicated, so my first try would be to store all the data in the index and try it. A several Gigabyte index is not a problem at all (depending upon how you

Re: How to use RegexTermEnum

2009-07-03 Thread Erick Erickson
WARNING: I haven't actually tried using RegexTermEnum in a long time, but... I *think* that the constructor positions you at the first term that matches, without calling next(). At least there's nothing I saw in the documentation that indicates you need to call next() before calling term().

Re: Storing a serialized object ?

2009-07-03 Thread Erick Erickson
Hmmm. I'm having trouble understanding what you want to accomplish and why you think storing a java object is appropriate to do in a Lucene index. Perhaps you could expand on your use case here. Best Erick On Fri, Jul 3, 2009 at 3:32 PM, MilleBii mille...@gmail.com wrote: I want to store in

Re: search for percent char with lucene

2009-07-03 Thread Erick Erickson
You have to tell us what analyzers you are using. Many analyzers will throw out non alpha-num characters. Even better, a small, self-contained test case illustrating your problem would help us help you. Best Erick On Fri, Jul 3, 2009 at 5:11 PM, shbn sharon.benkovi...@ewave.co.il wrote: Hi,

Re: optimized searching

2009-06-30 Thread Erick Erickson
in Ian's link, particularly see the section Don't iterate over more hits than necessary. A couple of other things: 1) Loading the entire document just to get a field or two isn't very efficient, think about lazy loading (See FieldSelector) 2) What do you mean when you say not very good? Using
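
A minimal sketch of the lazy-loading idea from point 1), assuming Lucene 2.4-era classes and a hypothetical stored "title" field; MapFieldSelector tells the reader which stored fields to load and skips the rest:

    import java.io.IOException;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.FieldSelector;
    import org.apache.lucene.document.MapFieldSelector;
    import org.apache.lucene.index.IndexReader;

    public class LazyLoadExample {
        // Pulls back only the "title" field of one hit instead of the whole document.
        public static String loadTitle(IndexReader reader, int docId) throws IOException {
            FieldSelector selector = new MapFieldSelector(new String[] { "title" });
            Document doc = reader.document(docId, selector); // unlisted fields are not loaded
            return doc.get("title");
        }
    }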

Re: optimized searching

2009-06-30 Thread Erick Erickson
can you please tune my code to work it faster and better Are you willing to pay me to do your job for you? Sorry to be snarky, but please be aware that we're volunteers here, it's pretty presumptuous to ask for this. You still haven't answered what it is you're trying to do. Why are you

Re: Lucene Term Encoder

2009-06-29 Thread Erick Erickson
You probably need to make sure you understand analyzers before you think about escaping/encoding. For instance, if you use StandardAnalyzer when indexing, the text Las Vegas-Food Dining Place would index the tokens las vegas food dining place, nary a hyphen to be seen. If you used StandardAnalyzer

Re: Indexing

2009-06-25 Thread Erick Erickson
This is really a permissions problem, which has been discussed frequently. I think you'd get farther faster by searching the mail archive (see this page, near the bottom: http://lucene.apache.org/java/docs/mailinglists.html) and see if those

Re: Searching for a special character

2009-06-24 Thread Erick Erickson
First, I highly, highly recommend you get a copy of Luke to examine your index. It'll also help you understand the role of Analyzers. Your first problem is that StandardAnalyzer probably removes the open and close parens. See: http://lucene.apache.org/java/2_4_1/api/index.html so you can't search

Re: Lucene performance: is search time linear to the index size?

2009-06-18 Thread Erick Erickson
Opening a searcher and doing the first query incurs a significant amount of overhead, cache loading, etc. Inferring search times relative to index size with a program like you describe is unreliable. Try firing a few queries at the index without measuring, *then* measure the time it takes for

Re: Lucene performance: is search time linear to the index size?

2009-06-17 Thread Erick Erickson
Are you measuring search time *only* or are you measuring total response time including assembling whatever you assemble? If you're measuring total response time, everything from network latency to what you're doing with each hit may affect response time. This is especially true if you're

Re: Problem with NOT and OR Query

2009-06-16 Thread Erick Erickson
NOT isn't a boolean operator, which is a source of continuous confusion. See: http://lucene.apache.org/java/2_3_2/queryparsersyntax.html#NOT for a part of the explanation, and http://wiki.apache.org/lucene-java/BooleanQuerySyntax Best Erick On Tue, Jun 16, 2009 at 11:24 AM, Sumanta Bhowmik

Re: Fuzzy vs Prefix query Performance

2009-06-15 Thread Erick Erickson
Well, if you're seeing it, it's possible <G> But the first question is always what were you measuring? Be aware that when you open a searcher, the first few queries can fill caches, etc and may take an anomalously long time, especially if you're sorting. So could you give more details of your

Re: How to filter data based on dates?

2009-06-12 Thread Erick Erickson
Why wouldn't two RangeQuerys work for this? Essentially something expressing startdate:[0 TO Systemtime] AND enddate:[Systemtime TO infinity]? Best Erick On Fri, Jun 12, 2009 at 1:00 PM, Muhammad Momin Rashid mo...@abdere.com wrote: Hello Everyone, I need to filter records based on whether
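
A sketch of that two-RangeQuery approach, taking the startdate/enddate field names from the suggestion above and assuming both were indexed as zero-padded, lexicographically comparable strings (Lucene 2.4-era RangeQuery; for fields with many distinct values a ConstantScoreRangeQuery avoids TooManyClauses):

    import org.apache.lucene.index.Term;
    import org.apache.lucene.search.BooleanClause;
    import org.apache.lucene.search.BooleanQuery;
    import org.apache.lucene.search.RangeQuery;

    public class DateWindowQuery {
        // Matches docs whose [startdate, enddate] interval contains 'now',
        // e.g. now = "20090612130000".
        public static BooleanQuery activeAt(String now) {
            RangeQuery startedAlready = new RangeQuery(null, new Term("startdate", now), true); // startdate <= now
            RangeQuery notYetEnded = new RangeQuery(new Term("enddate", now), null, true);      // enddate >= now
            BooleanQuery bq = new BooleanQuery();
            bq.add(startedAlready, BooleanClause.Occur.MUST);
            bq.add(notYetEnded, BooleanClause.Occur.MUST);
            return bq;
        }
    }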

Re: cannot retrieve the values of a field is not stored in the index

2009-06-05 Thread Erick Erickson
Enumerating terms will be inefficient compared to getting the stored field. I'd try storing the fields first until and unless you can demonstrate a problem. BTW, if you're not going to *search* on the field, there's no reason to index it at all. Why do you think you don't want to store the paths? How

Re: Index and search terms containing character -

2009-06-03 Thread Erick Erickson
to understand the queries and the content of the index. Thanks (Erick Balasubramanian Sudaakeran) Tom --- On Sun 31.5.09, Erick Erickson erickerick...@gmail.com wrote: From: Erick Erickson erickerick...@gmail.com Subject: Re: Index and search terms containing character - To: java-user

Re: Sorting fields while searching!

2009-06-01 Thread Erick Erickson
It's really unclear to me what PhysicianFieldInfo.FIRST_NAME_EXACT.toString() returns. I assume the intent is to return a field name, but how that relates to FIRST_NAME_EXACT(Field.Store.YES, Field.Index.UN_TOKENIZED) doesn't mean anything to me. Could you provide some details? Note that if you

Re: No hits while searching!

2009-05-27 Thread Erick Erickson
The most common issue with this kind of thing is that UN_TOKENIZED implies no case folding. So if your case differs you won't get a match. That aside, the very first thing I'd do is get a copy of Luke (google Lucene Luke) and examine the index to see if what's in your index is what you *think* is
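
A small illustration of that point, with a hypothetical "id" field: UN_TOKENIZED values bypass the analyzer, so any lowercasing has to be done in your own code, identically at index and search time:

    import org.apache.lucene.document.Field;
    import org.apache.lucene.index.Term;
    import org.apache.lucene.search.TermQuery;

    public class UntokenizedCaseExample {
        // Normalize case yourself when indexing an untokenized field...
        public static Field idField(String rawId) {
            return new Field("id", rawId.toLowerCase(), Field.Store.YES, Field.Index.UN_TOKENIZED);
        }

        // ...and apply the same normalization when building the query.
        public static TermQuery idQuery(String rawId) {
            return new TermQuery(new Term("id", rawId.toLowerCase()));
        }
    }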

Re: Searching index problems with tomcat

2009-05-27 Thread Erick Erickson
StandardAnalyzer is fine. I loaded your index into Luke and there is exactly one document with philipcimiano in the name field. There is only one document that has researcher in the name field. Both of these documents (using StandardAnalyzer) return one document (doc 12 for PHILIPCIMIANO and doc 4

Re: Hit highlighting for non-english unicode index/queries not working?

2009-05-26 Thread Erick Erickson
analyzer won't be that difficult after going thru your mail. I'll give it a try. I don't have any idea on filters but I'm pretty sure it must be simple and will definitely go through the examples of LIA 2nd Edn. Thank you. --KK On Tue, May 26, 2009 at 6:55 PM, Erick Erickson erickerick

Re: Which analyzer to use for non-english unicoded text?

2009-05-24 Thread Erick Erickson
I don't think there's anything you can use out of the box, but if you search for the mail thread (see searchable archives) for a thread titled Hebrew and Hindi analyzers you might find something useful. Not much help I know, but perhaps a place to start. And yes, you should use the same analyzer

Re: About sort questions

2009-05-21 Thread Erick Erickson
I suspect that your boost values are too small to really influence the scores very much. Have you tried using boost values of, say, d:5^100 OR uid:10^10 OR lang:lisp ? But if you have specific documents that you *know* you want in specific places, why play around with boosting at all? You can use

Re: How to create a new index

2009-05-20 Thread Erick Erickson
Unless something about your problem space *requires* that you reopen the index, you're better off just opening it once, writing all your documents to it, then closing it. Although what you're doing will work, it's not very efficient. And the same thing is *especially* true of the searcher. There's

Re: read between the lines of an index

2009-05-20 Thread Erick Erickson
The Lucene In Action book (at least the first edition and, I presume, the second) has exactly this, called SynonymAnalyzer. The basic idea is that at index time you index your multiple terms with no increment between, so all your synonyms get indexed in the same position. I highly recommend the

Re: Using Luke on a Lucene Index in a Database

2009-05-19 Thread Erick Erickson
Well, you haven't really provided much in the way of details. For instance, what does it mean that your Lucene index is stored in a database? Did you store it as a BLOB? Your problem statement is very hard to understand, please explain in more detail. Pretend you don't know a thing about your app

Re: Getting a score of a specific document

2009-05-18 Thread Erick Erickson
the fields I need. Could you please give me an example of how I create the Filter that filters out a given list of ids? Thanks! Liat 2009/5/18 Erick Erickson erickerick...@gmail.com I'm still unclear what you want the statistics *for*. statistics are pretty meaningless as far as I understand

Re: relevance function for scores

2009-05-18 Thread Erick Erickson
Have you looked at TopDocCollector? Basically, you can tell it to only return you the top N docs by score (N is arbitrary). What you then have is an array of raw score and doc ID pairs AND a max score. NOTE: raw score is not normalized, i.e. is not guaranteed to be between 0 and 1. So now you can
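
A minimal sketch of that collection step, assuming the Lucene 2.4-era TopDocCollector API; the scores are the raw values described above, not normalized to 0..1:

    import java.io.IOException;
    import org.apache.lucene.search.Query;
    import org.apache.lucene.search.ScoreDoc;
    import org.apache.lucene.search.Searcher;
    import org.apache.lucene.search.TopDocCollector;
    import org.apache.lucene.search.TopDocs;

    public class TopNExample {
        public static void printTopTen(Searcher searcher, Query query) throws IOException {
            TopDocCollector collector = new TopDocCollector(10);   // keep only the 10 best hits
            searcher.search(query, collector);
            TopDocs topDocs = collector.topDocs();
            System.out.println("max raw score: " + topDocs.getMaxScore());
            for (ScoreDoc sd : topDocs.scoreDocs) {                // raw score / doc ID pairs
                System.out.println("doc=" + sd.doc + " score=" + sd.score);
            }
        }
    }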

Re: relevance function for scores

2009-05-18 Thread Erick Erickson
- From: Erick Erickson erickerick...@gmail.com Reply-To: java-user@lucene.apache.org To: java-user@lucene.apache.org Subject: Re: relevance function for scores Date: Mon, 18 May 2009 09:13:27 -0400 Have you looked at TopDocCollector? Basically, you can tell it to only return you the top N docs

Re: Getting a score of a specific document

2009-05-17 Thread Erick Erickson
a list of ids that the query should look at, which Filter should I use? Thanks a lot, Liat 2009/5/14 Erick Erickson erickerick...@gmail.com Hmmm, come to think of it, if you pass the Filter to the search I *think* you don't get scores for that clause, but you may want to check it out

Re: Issues with escaping special characters

2009-05-15 Thread Erick Erickson
issue? On Thu, May 14, 2009 at 4:59 PM, Erick Erickson erickerick...@gmail.com wrote: I suspect that what's happening is that StandardAnalyzer is breaking your stream up on the odd characters. All escaping them on the query does is insure that they're not interpreted by the parser

Re: Getting a score of a specific document

2009-05-14 Thread Erick Erickson
I don't know if I'm understanding what you want, but if you have a pre-defined list of documents, couldn't you form a Filter? Then your results would only be the documents you care about. If this is irrelevant, perhaps you could explain a bit more about the problem you're trying to solve. Best
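
One way to build such a Filter, assuming a hypothetical unique, untokenized "id" field; the result is handed to the search call alongside the query, so only the listed documents can appear in the hits:

    import org.apache.lucene.index.Term;
    import org.apache.lucene.search.BooleanClause;
    import org.apache.lucene.search.BooleanQuery;
    import org.apache.lucene.search.Filter;
    import org.apache.lucene.search.QueryWrapperFilter;
    import org.apache.lucene.search.TermQuery;

    public class IdListFilter {
        // Restricts a search to documents whose "id" field matches one of the given values.
        public static Filter forIds(String[] ids) {
            BooleanQuery allowed = new BooleanQuery();
            for (String id : ids) {
                allowed.add(new TermQuery(new Term("id", id)), BooleanClause.Occur.SHOULD);
            }
            return new QueryWrapperFilter(allowed);
        }
    }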

Re: Getting a score of a specific document

2009-05-14 Thread Erick Erickson
on these, but it will take the statistics of the whole index, right? 2009/5/14 Erick Erickson erickerick...@gmail.com I don't know if I'm understanding what you want, but if you have a pre-defined list of documents, couldn't you form a Filter? Then your results would only be the documents you care about

Re: Question wrt Lucene analyzer for different language

2009-05-14 Thread Erick Erickson
No. What is correctly? Are you stemming? In which case using the same analyzer on different languages will not work. This topic has been discussed on the user list frequently, so if you searched that archive (see: http://wiki.apache.org/lucene-java/MailingListArchives) you'd find a wealth of

Re: Issues with escaping special characters

2009-05-14 Thread Erick Erickson
I suspect that what's happening is that StandardAnalyzer is breaking your stream up on the odd characters. All escaping them on the query does is insure that they're not interpreted by the parser as (in this case), the beginning of a group and a MUST operator. So, I claim it correctly feeds

Re: Alphanumeric Search Problem

2009-05-13 Thread Erick Erickson
I'd recommend you get a copy of Luke and examine what's actually in your index when anomalous things happen. In your first post you didn't specify what analyzer you used, I suspect you weren't getting the tokens broken up as you expected. Luke would have shown you. But if you're satisfied

Re: I can't find the package org.apache.lucene.index.memory.AnalyzerUtil

2009-05-11 Thread Erick Erickson
The class is contained in org.apache.lucene.index.memory.AnalyzerUtil Assuming you've installed 2.4, it's in... which is located in the contrib area. Try looking in your 2.4 installation directory/contrib/memory/lucene-memory-2.4.0.jar Best Erick 2009/5/11 Kamal Najib kamal.na...@mytum.de

Re: RegexQuery Incomplete Results

2009-05-08 Thread Erick Erickson
I don't understand your regex at all. Isn't it looking for in with any *single* character in front and back? Given your example, I don't see how you're getting anything back at all. Is this code you're actually executing or just an example? What does toString and/or Explain show? Think about

Re: why setPhraseSlop() not helping

2009-05-07 Thread Erick Erickson
You haven't forced the double quotes through to the parser. Try Query query = qp.parse("\"word1 word2\""); On Thu, May 7, 2009 at 11:14 AM, Seid Mohammed seidy...@gmail.com wrote: I have set the slop for my search to be some terms away for inclusion. unfortunately, the result is the same
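
Spelled out as a sketch (the "content" field name and StandardAnalyzer are assumptions): the quotes have to survive into the string handed to parse(), otherwise no PhraseQuery is built and setPhraseSlop() has nothing to act on:

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.queryParser.ParseException;
    import org.apache.lucene.queryParser.QueryParser;
    import org.apache.lucene.search.Query;

    public class SloppyPhraseExample {
        public static Query parseSloppyPhrase() throws ParseException {
            QueryParser qp = new QueryParser("content", new StandardAnalyzer());
            qp.setPhraseSlop(2);                  // default slop applied to quoted phrases
            return qp.parse("\"word1 word2\"");   // escaped quotes in Java source reach the parser as "
        }
    }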

Re: Exact match on entire field

2009-05-06 Thread Erick Erickson
how much data are you talking about here? Could you use a KeywordAnalyzer (perhaps in a duplicated field) with appropriate filtering (to lowercase, remove punctuation, etc)? Best Erick On Wed, May 6, 2009 at 4:50 AM, Laura Hollink lau...@cs.vu.nl wrote: Hi, I am trying to distinguish between

Re: Searching for partial matches

2009-05-04 Thread Erick Erickson
Why are you using MultiPhraseQuery? It appears (warning, I haven't really used it) to be designed to handle *phrases*. Your problem statement isn't looking at phrases at all, just wildcarded single terms. And you're supposed to call the first MPQ.add with, say, the first word of the *phrase*,

Re: Searching for partial matches

2009-05-04 Thread Erick Erickson
with '*' (e.g. * phrase *), so I tried MultiPhraseQuery instead. Forgive me if I am too newbie, 10 days ago I didn't know this tool existed... Erick Erickson wrote: Why are you using MultiPhraseQuery? It appears (warning, I haven't really used it) to be designed to handle *phrases*. Your

Re: Searching for partial matches

2009-05-04 Thread Erick Erickson
RegexQuery that appears in the API documentation but doesn't exist in the lucene-core-2.4.1.jar? I think that class would be very useful for my problem... Thank you so much!! Erick Erickson wrote: the guys really helped me understand the issues with wildcards, it's harder than you think <G>

Re: multi-field index and search (Not MultiFieldQuery). Help setting up index and search

2009-05-04 Thread Erick Erickson
Hmmm, tricky. Let's see if I understand your problem. Basically, you have a bunch of HSTs that have had some number of items arbitrarily assigned to them, and you want to see if you can make Lucene behave as a kind of expert system to help you classify the next item. I *think* you'd get better

Re: multi-field index and search (Not MultiFieldQuery). Help setting up index and search

2009-05-04 Thread Erick Erickson
everyone's help Christian On Mon, May 4, 2009 at 11:40 AM, Erick Erickson erickerick...@gmail.com wrote: H, tricky. Let's see if I understand your problem. Basically, you have a bunch of HSTs that have had some number of items arbitrarily assigned to them, and you want to see if you

Re: MultiFieldQueryParser - using a different analyzer per field...

2009-05-01 Thread Erick Erickson
This looks like a job for PerFieldAnalyzerWrapper, no MultiFieldQueryParser required. Best Erick On Fri, May 1, 2009 at 3:33 PM, theDude_2 aornst...@webmd.net wrote: Hello fellow Lucene developers! I have a bit of a question - and I can't find the answer in my lucene book I'm
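
A minimal sketch of the suggestion, with hypothetical field names "sku" and "body"; the same wrapper should also be passed to the IndexWriter at index time so both sides agree:

    import org.apache.lucene.analysis.KeywordAnalyzer;
    import org.apache.lucene.analysis.PerFieldAnalyzerWrapper;
    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.queryParser.ParseException;
    import org.apache.lucene.queryParser.QueryParser;
    import org.apache.lucene.search.Query;

    public class PerFieldExample {
        // StandardAnalyzer for most fields, KeywordAnalyzer for the "sku" field only.
        public static Query parse(String userQuery) throws ParseException {
            PerFieldAnalyzerWrapper wrapper = new PerFieldAnalyzerWrapper(new StandardAnalyzer());
            wrapper.addAnalyzer("sku", new KeywordAnalyzer());
            QueryParser qp = new QueryParser("body", wrapper);
            return qp.parse(userQuery);
        }
    }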

Re: MultiFieldQueryParser - using a different analyzer per field...

2009-05-01 Thread Erick Erickson
- From: Erick Erickson [mailto:erickerick...@gmail.com] Sent: Friday, May 01, 2009 11:42 PM To: java-user@lucene.apache.org Subject: Re: MultiFieldQueryParser - using a different analyzer per field... This looks like a job for PerFieldAnalyzerWrapper, no MultiFieldQueryparser required

Re: Indexing becomes slow with time

2009-04-30 Thread Erick Erickson
This is surprising behavior, which is another way of saying that, given what you've said so far, this shouldn't be happening. I'd really look at system metrics, like whether you're swapping etc. In particular you might want to try varying how big you allow your memory footprint to grow before you

Re: How to get the score in percentage

2009-04-29 Thread Erick Erickson
Would a TopDocCollector work for you? You can get a TopDoc object from that collector, from which you can get the max score. That, along with the score provided for each doc should give you a percentage. Best Erick On Wed, Apr 29, 2009 at 5:30 AM, joseph.christopher jos...@kottsoftware.com

Re: Search result ordering

2009-04-29 Thread Erick Erickson
People (including me) use Lucene to page through results all the time, so I'm pretty sure you're OK. So here are my answers... (1) Yes. (2) Well, the default sort is by score so if you want some other ordering you have to sort. (3) You can boost things at index time, but I don't think that's

Re: Search result ordering

2009-04-29 Thread Erick Erickson
of a difference when paging through hits 1-10 vs. hits 300-310. They all seem to take about the same time to evaluate. I'll try using one of the HitCollectors as you suggest to see if it makes a difference. regards, -- Bill Chesky -Original Message- From: Erick Erickson

Re: NOT_ANALYZED field

2009-04-28 Thread Erick Erickson
Well, you haven't shown us your program, so it's hard to tell <G> But my first uninformed guess would be that the case of your search doesn't exactly match the case you indexed when you add letters to your IDs. We need to see the search code particularly, including the analyzers you use (a

Re: Getting values with low scores

2009-04-27 Thread Erick Erickson
Well, you can always implement your own HitCollector and just take the end of the list. But perhaps a fuller explanation of why you need to do this would lead to a better answer Best Erick On Sun, Apr 26, 2009 at 11:41 PM, samd sdoyl...@yahoo.com wrote: I have 2500 documents and need to

Re: Getting values with low scores

2009-04-27 Thread Erick Erickson
about ranking pieces, it's about all of them being available no matter what the rank is. Erick Erickson wrote: Well, you can always implement your own HitCollector and just take the end of the list. But perhaps a fuller explanation of why you need to do this would lead to a better answer

Re: How to search special characters in Lucene

2009-04-24 Thread Erick Erickson
From that I'm able to do this kind of research work. Please help me in this. Erick Erickson wrote: OK, this is a much different problem than you were originally asking about, effectively how to index/search mixed language documents. This topic has been discussed multiple times

Re: How to search special characters in Lucene

2009-04-23 Thread Erick Erickson
specifikation - aftaleseddel nr. 12.]]/com:Note I'm searching the word like rådgiver . When I see the result it is clearly searching for r dgiver. It is omitting the Danish element. Please help me in this. Erick Erickson wrote: Are you *also* using the DutchAnalyzer for your *query

Re: Appropriate analyzer

2009-04-22 Thread Erick Erickson
*If* your terms are simple (that is, not wildcarded), you may get some joy from TermEnum. The idea here would be to find the longest term *already in your index* that satisfies your need and use that to form a simple TermQuery Essentially using TermEnum.skipTo on successively shorter strings
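
One reading of that suggestion as a sketch, using IndexReader.terms(Term), which positions the enumeration at the first term at or after the supplied one (the field handling and prefix-shortening loop here are assumptions, not the poster's code):

    import java.io.IOException;
    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.index.Term;
    import org.apache.lucene.index.TermEnum;

    public class LongestIndexedTermExample {
        // Returns the longest leading substring of 'word' that exists as a term
        // in 'field', or null if none does.
        public static String longestIndexedPrefix(IndexReader reader, String field, String word)
                throws IOException {
            for (int len = word.length(); len > 0; len--) {
                String candidate = word.substring(0, len);
                TermEnum te = reader.terms(new Term(field, candidate)); // first term >= candidate
                try {
                    Term t = te.term();
                    if (t != null && t.field().equals(field) && t.text().equals(candidate)) {
                        return candidate; // exact term is present in the index
                    }
                } finally {
                    te.close();
                }
            }
            return null;
        }
    }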

Re: How to search special characters in Lucene

2009-04-22 Thread Erick Erickson
to identify. Please tell me how to use DutchAnalyzer in my application. A sample example or series of steps would help me. I also attached my index file(.java file). Please help me in this. please.. Erick Erickson wrote: Take a look at DutchAnalyzer. The problem you'll have is if you're indexing

Re: How to search special characters in Lucene

2009-04-21 Thread Erick Erickson
Take a look at DutchAnalyzer. The problem you'll have is if you're indexing this document along with a bunch of documents from other languages. You could search the mail archive for extensive discussions of indexing/ searching documents from several languages. Best Erick On Tue, Apr 21, 2009 at

Re: IndexWriter update method

2009-04-20 Thread Erick Erickson
to correctly find this. Thanks, Billy -Original Message- From: Erick Erickson [mailto:erickerick...@gmail.com] Sent: Friday, April 17, 2009 8:08 PM To: java-user@lucene.apache.org Subject: Re: IndexWriter update method What you're missing is that the example has no unique ID, it wasn't

Re: Indexing Complex XML

2009-04-18 Thread Erick Erickson
Lucene is an *engine*, not an application. *You* have to process the XML, decide what the structure of your index is and index the data. There are many XML parser options, this is just straight Java code. You'll decide what's relevant, and add the contents of the relevant elements to a Lucene

Re: Query scoring

2009-04-17 Thread Erick Erickson
2009/4/16 Erick Erickson erickerick...@gmail.com Hmmm, try query.toString() and/or query.explain(). Also, try using Luke to see what is actually in the document. BTW, what analyzer did you use in Luke? Luke also has an explain (tab?) that will show you what Luke does, which may

Re: IndexWriter update method

2009-04-17 Thread Erick Erickson
What you're missing is that the example has no unique ID, it wasn't created with update in mind. There's no hidden magic for Lucene knowing *what* document you want to have updated, you have to provide it yourself, and it should be unique. Imagine a parts catalog, or an index of a directory
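
A sketch of the parts-catalog idea: you supply your own unique key and hand it to updateDocument() as the delete term (field names here are hypothetical, and the key field must be unique and untokenized for this to work):

    import java.io.IOException;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.index.Term;

    public class UpdateByIdExample {
        // updateDocument() deletes every document matching the term, then adds the new version.
        public static void saveOrReplace(IndexWriter writer, String partNumber, String description)
                throws IOException {
            Document doc = new Document();
            doc.add(new Field("partNumber", partNumber, Field.Store.YES, Field.Index.UN_TOKENIZED));
            doc.add(new Field("description", description, Field.Store.YES, Field.Index.TOKENIZED));
            writer.updateDocument(new Term("partNumber", partNumber), doc);
        }
    }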

Re: Query scoring

2009-04-16 Thread Erick Erickson
Hmmm, try query.toString() and/or query.explain(). Also, try using Luke to see what is actually in the document. BTW, what analyzer did you use in Luke? Luke also has an explain (tab?) that will show you what Luke does, which may be useful. The default operator should be OR, but looking at the

Re: Best way for paging with TopDocs class?

2009-04-16 Thread Erick Erickson
Well, under the covers, the old Hits object *was* reloading the first N pages to get page N + 1, you just didn't see it. Hits also had other, undesirable behaviors. But loading docs N-1 times is not as expensive as you perhaps fear. To get a sorted list, you must sort the entire set of
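
The usual paging pattern with TopDocs, as a sketch against the Lucene 2.4-era TopDocCollector: collect enough hits to cover the requested page, then slice off the last pageSize of them:

    import java.io.IOException;
    import org.apache.lucene.search.Query;
    import org.apache.lucene.search.ScoreDoc;
    import org.apache.lucene.search.Searcher;
    import org.apache.lucene.search.TopDocCollector;
    import org.apache.lucene.search.TopDocs;

    public class PagingExample {
        // Returns page 'page' (0-based) of 'pageSize' hits.
        public static ScoreDoc[] getPage(Searcher searcher, Query query, int page, int pageSize)
                throws IOException {
            int needed = (page + 1) * pageSize;
            TopDocCollector collector = new TopDocCollector(needed);
            searcher.search(query, collector);
            TopDocs topDocs = collector.topDocs();
            int start = page * pageSize;
            int end = Math.min(needed, topDocs.scoreDocs.length);
            ScoreDoc[] result = new ScoreDoc[Math.max(0, end - start)];
            if (end > start) {
                System.arraycopy(topDocs.scoreDocs, start, result, 0, end - start);
            }
            return result;
        }
    }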

Re: Lucene searching across documents

2009-04-11 Thread Erick Erickson
http://people.apache.org/~hossman/#threadhijack Thread Hijacking on Mailing Lists When starting a new discussion on a mailing list, please do not reply to an existing message, instead start a fresh email. Even if you change the subject line of

Re: Sequential match query

2009-04-11 Thread Erick Erickson
Wildcard queries are not lowercased, so depending upon how you're indexing, that may be tripping you up. See http://wiki.apache.org/lucene-java/LuceneFAQ#head-133cf44dd3dff3680c96c1316a663e881eeac35a Best Erick On Fri, Apr 10, 2009 at 2:56 PM, John Seer pulsph...@yahoo.com wrote: Hello, I

Re: Sequential match query

2009-04-11 Thread Erick Erickson
That'll teach me to scan a post. The link I sent you is still relevant, but wildcards are NOT intended to be used to concatenate terms. You want a phrase query or a span query for that. i.e. "A C F"~# where # is the slop, that is, the number of other terms allowed to appear between your desired
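
The same thing built programmatically rather than through the parser, as a sketch (the "body" field name and lowercased terms are assumptions that depend on your analyzer):

    import org.apache.lucene.index.Term;
    import org.apache.lucene.search.PhraseQuery;

    public class SloppyPhraseBuilder {
        // Matches docs where "a", "c" and "f" occur near each other, allowing up to
        // 'slop' positions of movement between them; no wildcards involved.
        public static PhraseQuery nearQuery(int slop) {
            PhraseQuery pq = new PhraseQuery();
            pq.add(new Term("body", "a"));
            pq.add(new Term("body", "c"));
            pq.add(new Term("body", "f"));
            pq.setSlop(slop);
            return pq;
        }
    }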

Re: RangeFilter performance problem using MultiReader

2009-04-11 Thread Erick Erickson
...@thetaphi.de -Original Message- From: Erick Erickson [mailto:erickerick...@gmail.com] Sent: Saturday, April 11, 2009 6:42 PM To: java-user@lucene.apache.org Subject: Re: RangeFilter performance problem using MultiReader OK, I scanned all the e-mails in this thread so I may

Re: Query any data

2009-04-09 Thread Erick Erickson
searching for fieldname:* will be *extremely* expensive as it will, by default, build a giant OR clause consisting of every term in the field. You'll throw MaxClauses exceptions right and left. I'd follow Tim's thread lead first Best Erick 2009/4/8 王巍巍 ww.wang...@gmail.com first you should

Re: How to customize score according to field value?

2009-04-07 Thread Erick Erickson
Do you want the dates to *influence* or *determine* the order? I don't have much help if what you're after is something like docs that are more recent tend to rank higher, although I vaguely remember this question coming up on the user list, maybe a search of the archive would turn something

Re: Multiple Analyzer on Single field

2009-04-07 Thread Erick Erickson
properly so that search become better. Regards, Allahbaksh -Original Message- From: Erick Erickson [mailto:erickerick...@gmail.com] Sent: Monday, April 06, 2009 9:31 PM To: java-user@lucene.apache.org Subject: Re: Multiple Analyzer on Single field This really doesn't make sense

Re: How to search a phrase using quotes in a query ???

2009-04-07 Thread Erick Erickson
the documents that have the exact phrase "the bank of america". Could you help me please ??? Regards Ariel On Mon, Apr 6, 2009 at 5:26 PM, Erick Erickson erickerick...@gmail.com wrote: If you have luke, you should be able to submit your query and use the explain functionality to gain some insights

Re: How to search a phrase using quotes in a query ???

2009-04-06 Thread Erick Erickson
We really need some more data. First, I *strongly* recommend you get a copy of Luke and examine your index to see what is *actually* there. Google lucene luke. That often answers many questions. Second, query.toString is your friend. For instance, if the query you provided below is all that

Re: How to search a phrase using quotes in a query ???

2009-04-06 Thread Erick Erickson
fine. The field where I am searching is the content field. I am using the same analyzer in query and indexing time: SnowBall English Analyzer. I am going to submit later the snippet code. Regards Ariel On Mon, Apr 6, 2009 at 4:37 PM, Erick Erickson erickerick...@gmail.com wrote: We

Re: HeapedScorerDoc using all my memory

2009-04-03 Thread Erick Erickson
How much memory are you allocating for the JVM? And what are your various indexwriter settings (e.g. MaxBufferedDocs, MaxMergeDocs, etc). Have you tried different settings in setRamBufferSizeMB? Best Erick On Fri, Apr 3, 2009 at 7:13 AM, John Byrne john.by...@propylon.com wrote: Hi, I'm
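
For reference, a sketch of the IndexWriter knobs being asked about, assuming a Lucene 2.4-era writer; the values are placeholders, not recommendations:

    import java.io.IOException;
    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.store.FSDirectory;

    public class WriterTuningExample {
        public static IndexWriter openWriter(String path) throws IOException {
            IndexWriter writer = new IndexWriter(FSDirectory.getDirectory(path),
                    new StandardAnalyzer(), IndexWriter.MaxFieldLength.UNLIMITED);
            writer.setRAMBufferSizeMB(32.0);  // flush buffered documents after roughly 32 MB
            writer.setMergeFactor(10);        // how many segments get merged at a time
            return writer;
        }
    }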

Re: HeapedScorerDoc using all my memory

2009-04-03 Thread Erick Erickson
only happened in a production environment that I can't mess with. I am planning to try reproducing it locally soon, but it takes quite a while before it happens. -John Erick Erickson wrote: How much memory are you allocating for the JVM? And what are your various indexwriter settings (e.g

Re: HeapedScorerDoc using all my memory

2009-04-03 Thread Erick Erickson
Thanks for the ideas anyway - I know I really need to come up with some more info on the problem, so I think the next thing I'll do is try to reproduce it locally. -John Erick Erickson wrote: Hmmm, that's odd. How many is a large number of documents? And what is your index size when

Re: Search using MultiSearcher generates OOM on a 1GB total Partitioned indeces

2009-04-02 Thread Erick Erickson
: Erick Erickson erickerick...@gmail.com To: java-user@lucene.apache.org Sent: Wednesday, April 1, 2009 6:51:13 PM Subject: Re: Search using MultiSearcher generates OOM on a 1GB total Partitioned indeces Think about putting this query in Luke and doing an explain for details, but I'm

Re: Search using MultiSearcher generates OOM on a 1GB total Partitioned indeces

2009-04-02 Thread Erick Erickson
default Max Clause is 1024, is there any reason behind this max? Thanks, M From: Erick Erickson erickerick...@gmail.com To: java-user@lucene.apache.org Sent: Thursday, April 2, 2009 2:34:47 PM Subject: Re: Search using MultiSearcher generates OOM on a 1GB

Re: Speed of fuzzy searches

2009-04-02 Thread Erick Erickson
This seems really odd, especially with an index that size. The first question is usually Do you open an IndexReader for each query? If you do, be aware that opening a reader/searcher is expensive, and the first few queries through the system are slow as the caches are built up. The second

Re: Search using MultiSearcher generates OOM on a 1GB total Partitioned indeces

2009-04-01 Thread Erick Erickson
Think about putting this query in Luke and doing an explain for details, but I'm surprised this is working at all without throwing TooManyClauses errors. Under the covers, Lucene expands your wildcards to all terms in the field that match. For instance, assume your document field has the

Re: Empty SinkTokenizer

2009-03-28 Thread Erick Erickson
What kind of failures do you get? And I'm confused by the code. Are you creating a new IndexWriter every time? Do you ever close it? It'd help to see the surrounding code... Best Erick On Sat, Mar 28, 2009 at 1:36 PM, Raymond Balmès raymond.bal...@gmail.comwrote: Hi guys, I'm using a

Re: Syncing lucene index with a database

2009-03-27 Thread Erick Erickson
Yes, updating a document in Lucene is expensive for two reasons: 1) deleting and adding a document does mean there's internal work being done. But it's not all *that* expensive. So this really comes down to how many records you expect to update every 15 minutes. You've gotta try it. 2)

Re: Syncing lucene index with a database

2009-03-26 Thread Erick Erickson
You've got a great grasp of the issues, comments below. But before you do, a lot of this kind of thing is incorporated in SOLR, which is built on Lucene. Particularly updating an index then using it. So you might take a look over there. It even has a DataImportHandler... WARNING: I've only been

Re: i18n numbers

2009-03-26 Thread Erick Erickson
What does the front end look like? Is it a web page or a custom app? And do you expect your users to actually enter the field name? I'd be reluctant to allow any but the geekiest of users to enter the Lucene syntax (i.e. the field names). Users shouldn't know anything about the underlying

Re: query doc boost difference

2009-03-25 Thread Erick Erickson
Could you provide more information about what you expect and what you are seeing? As well as an example of what you've tried? Just saying it didn't work doesn't give us much to go on Best Erick On Wed, Mar 25, 2009 at 5:02 AM, m.harig m.ha...@gmail.com wrote: Hello all Can anyone

Re: How to know the matched field?

2009-03-22 Thread Erick Erickson
Try searching the mail archives, the searchable archive is linked to off the Wiki. This topic has been discussed multiple times but I forget the solutions... Best Erick On Sun, Mar 22, 2009 at 4:30 PM, Paul Libbrecht p...@activemath.org wrote: Hello list, in an auto-completion task, I would

Re: boosting query

2009-03-19 Thread Erick Erickson
This might help you understand Lucene scoring better... http://lucene.apache.org/java/2_4_1/scoring.html The number of occurrences is not the sole determinant of a document's score and boosting won't change that. But I have to ask why counting words is important to you. What problem are you

Re: ParseException: Cannot parse.... too many boolean clauses

2009-03-16 Thread Erick Erickson
What's the query? Wildcard or did you just construct a huge number of clauses? You can always bump the allowed, see BooleanQuery.setMaxClauseCount() Best Erick On Mon, Mar 16, 2009 at 6:52 AM, liat oren oren.l...@gmail.com wrote: Hi, I try to search a long query and get the following error:
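
If raising the limit really is the right answer, it is a one-line, JVM-wide setting (4096 is an arbitrary example value):

    import org.apache.lucene.search.BooleanQuery;

    public class ClauseLimitExample {
        // Call once at startup, before parsing or running the offending query.
        public static void raiseClauseLimit() {
            BooleanQuery.setMaxClauseCount(4096);
        }
    }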

Re: how to index keyword and value

2009-03-15 Thread Erick Erickson
Have you tried working through the getting started guide at http://lucene.apache.org/java/2_4_1/gettingstarted.html? That should give you a good idea of how to create a document in Lucene. Best Erick On Sun, Mar 15, 2009 at 8:49 AM, Seid Mohammed seidy...@gmail.com wrote: that is exactly my

Re: Pagination with MultiSearcher

2009-03-15 Thread Erick Erickson
You could do something with FieldSortedHitQueue as a post-search sort, but I wonder if this would work for you (see http://lucene.apache.org/java/2_4_1/api/org/apache/lucene/search/TopFieldDocs.html)... public TopFieldDocs search(Query
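
A sketch of how that inherited method reads from MultiSearcher's side; it merges the per-index results into one sorted TopFieldDocs (the "date" sort field is a hypothetical example):

    import java.io.IOException;
    import org.apache.lucene.search.MultiSearcher;
    import org.apache.lucene.search.Query;
    import org.apache.lucene.search.ScoreDoc;
    import org.apache.lucene.search.Sort;
    import org.apache.lucene.search.SortField;
    import org.apache.lucene.search.TopFieldDocs;

    public class MultiSearcherPagingExample {
        // Collects the first pageSize hits across all sub-indexes, newest first.
        public static ScoreDoc[] firstPage(MultiSearcher searcher, Query query, int pageSize)
                throws IOException {
            Sort sort = new Sort(new SortField("date", SortField.STRING, true)); // reverse = newest first
            TopFieldDocs docs = searcher.search(query, null, pageSize, sort);    // null filter = no filtering
            return docs.scoreDocs;
        }
    }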

Re: Pagination with MultiSearcher

2009-03-15 Thread Erick Erickson
it can be applied to the search method exposed by MultiSearcher. Would it be possible to clarify a bit more or even point to some reference documentation? Cheers Amin On Sun, Mar 15, 2009 at 1:08 PM, Erick Erickson erickerick...@gmail.com wrote: You could do something with FieldSortedHitQueue

Re: How to search both Tokenized and Untokenized fields

2009-03-11 Thread Erick Erickson
to parse the query and take out the fields and assign their specific analyzer to them. Rokham Erick Erickson wrote: PerFieldAnalyzerWrapper is your friend, assuming that you have separate fields, some tokenized and some not. If you *don't* have separate fields, then we need more details

Re: Questions about analyzer

2009-03-10 Thread Erick Erickson
- From: Erick Erickson erickerick...@gmail.com To: java-user@lucene.apache.org Sent: Friday, March 06, 2009 6:47 PM Subject: Re: Questions about analyzer See below On Fri, Mar 6, 2009 at 1:44 AM, Ganesh emailg...@yahoo.co.in wrote: Hello all 1) Which is best to use Snowball analyzer

Re: index large size file

2009-03-10 Thread Erick Erickson
Sure there are other options. You could decide to index in chunks rather than entire documents. You could decide many things. None of which we can recommend unless we have a clue what you're really trying to accomplish or whether you're encountering a specific problem. I can say that we've

Re: A model for predicting indexing memory costs?

2009-03-10 Thread Erick Erickson
You have my sympathy. Let's see, you're being told we can't give you the tools you need to diagnose/fix the problem, but fix it anyway. Probably with the addendum And fix it by Friday. You might want to consider staging a mutiny until the powers that be can give you a solution. Perhaps working

Re: Using Lucene for user query parsing

2009-03-09 Thread Erick Erickson
to figure out as to whether Lucene is suited for this kind of application. Once again thanks for all the inputs. On Fri, Mar 6, 2009 at 7:15 PM, Erick Erickson erickerick...@gmail.com wrote: Whatever you do will be wrong <G>. What you're saying is that you have structured data that the user wants
