Re: Lucene searching class

2007-10-25 Thread Steven Rowe
Hi Pooja, poojasreejith wrote: I am using lucene2.2.0 for my application. I have a searcher.java class. The problem I am facing is, it is not supporting Query query = QueryParser.parse(q, contents,new StandardAnalyzer()); it shows error; the method parse in the type QueryParser is not

Re: Corpus interpretation

2007-10-24 Thread Steven Rowe
Hi Liaqat, Liaqat Ali wrote: I want to index the Urdu language corpus (200 documents in CES XML DTD format). Is it necessary to break the XML file into 200 different files, or can it be indexed in its original form using Lucene? Kindly guide me in this regard. A Lucene document is composed of

Re: Is there bug in CJKAnalyzer?

2007-10-23 Thread Steven Rowe
Hi Ivan, Ivan Vasilev wrote: But how to understand the meaning of this: “To overcome this, you have to index chinese characters as single tokens (this will increase recall, but decrease precision).” I understand it so: To increase the results I have to use instead of the Chinese another

Re: Questions Lucene

2007-09-11 Thread Steven Rowe
Hi Durga, I have moved this discussion to the java-user list, since the java-dev list is devoted to development of the Java Lucene library, and not to questions about its capabilities. My answers are inline below. [EMAIL PROTECTED] wrote: 1) What are the various languages supported by

Re: Look for strange encodings -- tokenization

2007-09-05 Thread Steven Rowe
poeta simbolista wrote: I'd want to know the best way to look for strange encodings on a Lucene index. i have several inputs where input can have been encoded on different sets. I not always know if my guess about the encoding has been ok. Hence, I'd thought of querying the index for some

Re: Lucene indexing for pdf files

2007-08-31 Thread Steven Rowe
Hi Madhu, Madhu wrote: i am indexing pdf document using pdfbox 7.4, its working fine for some pdf files. for japanese pdf files its giving the below exception. caught a class java.io.IOException with message: Unknown encoding for 'UniJIS-UCS2-H' Can any one help me , how to set the

Re: Postal Code Radius Search

2007-08-29 Thread Steven Rowe
Mike wrote: I've searched the mailing list archives, the web, read the FAQ, etc and I don't see anything relevant so here it goes… I'm trying to implement a radius based searching based on zip/postal codes. Here is a selection of interesting threads from the Lucene ML with relevant info:

Re: performance on filtering against thousands of different publications

2007-08-14 Thread Steven Rowe
Hi Cedric, Cedric Ho wrote: On 8/13/07, Erick Erickson [EMAIL PROTECTED] wrote: Are you iterating through a Hits object that has more than 100 (maybe it's 200 now) entries? Are you loading each document that satisfies the query? Etc. Etc. Unfortunately, yes. And I know this is another big

Re: Lucene in large database contexts

2007-08-10 Thread Steven Rowe
Hi Antonello, Antonello Provenzano wrote: I've been working for a while on the implementation of a website oriented to contents that would contain millions of entries, most of them indexable (such as descriptions, texts, names, etc.). The ideal solution to make them searchable would be to use

Re: multiple field searcher

2007-08-03 Thread Steven Rowe
qaz zaq wrote: I have Search Terms: T1, T2... Tn. Also I have document fields of F1 F2... Fm. I want to search the match documents across F1 to Fm fields,i.e., all of the T1, T2, ...Tn need to be matched, but can be in the combination of T1, T2, ... Tn field. I check the

Re: Search that supports all valid characters in a Unix filename

2007-07-09 Thread Steven Rowe
Hi Ed, Ed Murray wrote: Could someone let me know the best Analyzer to use to get an exact match on a Unix filename when it is inserted into an untokened field. Filenames obviously contain spaces and forward slashes along with other characters. I am using a WhitespaceAnalyzer but when

Re: Rewrite one phrase to another in search query

2007-06-27 Thread Steven Rowe
Hi Aliaksandr, Aliaksandr Radzivanovich wrote: What if I need to search for synonyms, but synonyms can be expanded to phrases of several words? For example, user enters query tcp, then my application should also find documents containing phrase Transmission Control Protocol. And conversely,

Re: JavaCC Download

2007-06-26 Thread Steven Rowe
[EMAIL PROTECTED]: Hi Steven. When I access this address, this message appeared: Forbidden You don't have permission to access /servlets/ProjectHome on this server. What's the problem? Thanks. Steven Rowe wrote: Mahdi Rahimi wrote: Hi. How can I access JavaCC?? Thanks

Re: Porter stemming problem

2007-06-22 Thread Steven Rowe
Hi Rob, Robert Walpole wrote: At the moment I am attempting to do this as follows... analyzer = new PorterStemAnalyzer(); parser = new QueryParser(content, analyzer); Query query = parser.parse(keywords: relaxing); Hits hits = idxSearcher.search(query); ...but this is not returning any

Re: JavaCC Download

2007-06-21 Thread Steven Rowe
Mahdi Rahimi wrote: Hi. How can I access JavaCC?? Thanks https://javacc.dev.java.net/ -- Steve Rowe Center for Natural Language Processing http://www.cnlp.org/tech/lucene.asp

Re: Facet searching on single field with multiple words value

2007-06-21 Thread Steven Rowe
Hi Sawan, Sawan Sharma wrote: Now, The problem occured when I passed the multiple words in term query. e.g.code QueryFilter filter = new QueryFilter(new TermQuery(new Term(FieldName, FieldValue))); code where field name and field value dynamically getting. here we take the example value.

Re: how to search the fields in SimpleAnalyzer

2007-06-19 Thread Steven Rowe
Hi Sebastin, Sebastin wrote: i index my document using SimpleAnalyzer() when i search the Indexed field in the searcher class it doesnt give me the results.help me to sort out this issue. My Code: test=9840836598 test1=bch01 testRecords=(test+ +test1);

Re: Position of matches to affect scoring

2007-06-19 Thread Steven Rowe
Hi Jes, Jesse Prabawa wrote: The Lucene FAQ at http://wiki.apache.org/lucene-java/LuceneFAQ mentions that the position of the matches in the text does not affect scoring. So is there anyway that I can make the position of the matches affect scoring? For example, I want matches that occur at

Re: negative queries

2007-06-18 Thread Steven Rowe
Hi Daniel, Daniel Noll wrote: On Saturday 16 June 2007 11:39:35 Chris Hostetter wrote: : The mailing list has already answered this question dozens of times. : I've been wondering lately, does this list have a FAQ? If so, is this : question on it? The wiki is open to editing by all.

Re: negative queries

2007-06-15 Thread Steven Rowe
Daniel Noll wrote: On Friday 15 June 2007 11:07:25 Antony Sequeira wrote: Hi I am aware that with Lucene I can not do negative only queries such as -foo:bar The mailing list has already answered this question dozens of times. I've been wondering lately, does this list have a FAQ? If

Re: QueryParser stripping special char

2007-06-12 Thread Steven Rowe
Hi Harini, Harini Raghavan wrote: I am trying to create a lucene query to search for companies based on areacode. The phone no. is stored in the lucene index in the form of '415-567-2323'. I need to create a query like +areaCode:415-. But the QueryParser is stripping off the hyphen(-).

Re: How can I search over all documents NOT in a certain subset?

2007-06-08 Thread Steven Rowe
Bowesman [mailto:[EMAIL PROTECTED] Sent: Wednesday, June 06, 2007 11:36 PM To: java-user@lucene.apache.org Subject: Re: How can I search over all documents NOT in a certain subset? Steven Rowe wrote: Conceptually (caveat: untested), you could: 1. Extend Filter[1] (call it DejaVuFilter

Re: I need 'cat???' to match 'cat' again!

2007-06-06 Thread Steven Rowe
Hi Tim, Tim Smith wrote: How can I restore the behavior of the old WildcardQuery under 2.1? I badly need 'cat???' to match 'cat' again just like in the older versions. The behavior you want was last sighted in Java Lucene four releases ago (v1.4.3). See Doug Cutting's response to a similar

Re: Maintain a backup index

2007-06-05 Thread Steven Rowe
Hi Divya, The Lucene library itself provides no support for backup. You might be interested in the Solr project[1], which extends Lucene, and which automates index replication. From the Solr Introduction / Features page[2]: Replication * Efficient distribution of index parts that have

Re: How can I search over all documents NOT in a certain subset?

2007-06-05 Thread Steven Rowe
Hi Hilton, Hilton Campbell wrote: Hello all, In my application I want to perform a search over all the documents that are NOT in a certain subset, and I'm not sure how I should do it. Specifically, the application performs a search and the top N results are shown to the user. The user

Re: WhitespaceAnalyzer [was: Re: regaridng Reader.terms()]

2007-05-29 Thread Steven Rowe
Hi Mohammad, Mohammad Norouzi wrote: [Hoss wrote:] ...are there Persian characters with a category type of SPACE_SEPARATOR, LINE_SEPARATOR, or PARAGRAPH_SEPARATOR ? How can I know that? The Unicode standard's codes[1] for these are: SPACE SEPARATOR: Zs LINE SEPARATOR: Zl
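
One way to answer the "How can I know that?" question without consulting the Unicode tables by hand is to ask the JDK directly. The following is a minimal sketch using only `java.lang.Character`; the class and method names (`SeparatorScan`, `isSeparator`, `separatorsIn`) are hypothetical and do not come from the thread.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Lucene: reports which code points the JDK
// classifies as Unicode separators (general categories Zs, Zl, Zp).
class SeparatorScan {

    /** Returns true if the code point is in category Zs, Zl, or Zp. */
    static boolean isSeparator(int codePoint) {
        int type = Character.getType(codePoint);
        return type == Character.SPACE_SEPARATOR
            || type == Character.LINE_SEPARATOR
            || type == Character.PARAGRAPH_SEPARATOR;
    }

    /** Collects every separator code point in the inclusive range [start, end]. */
    static List<Integer> separatorsIn(int start, int end) {
        List<Integer> found = new ArrayList<Integer>();
        for (int cp = start; cp <= end; cp++) {
            if (isSeparator(cp)) {
                found.add(cp);
            }
        }
        return found;
    }

    public static void main(String[] args) {
        // Scan the Arabic block (U+0600..U+06FF), which Persian text draws on.
        for (int cp : separatorsIn(0x0600, 0x06FF)) {
            System.out.printf("U+%04X%n", cp);
        }
        // ZERO WIDTH NON-JOINER is common in Persian text; print how it is classified.
        System.out.println("U+200C is separator: " + isSeparator(0x200C));
    }
}
```

Running a scan like this over the ranges that actually occur in the corpus shows whether any character in them falls into one of the three separator categories.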

Re: Modifying StandardAnalyzer so that it also splits words after pun ctuation characters that are not followed by whitespace

2007-05-29 Thread Steven Rowe
Hi Michael, Michael Böckling wrote: Hi folks! The topic says it all: I want to modify the StandardAnalyzer so that it also splits words after punctuation characters (.,: etc.) that are NOT followed by a whitespace character, in addition to punctuation characters that ARE followed by

Re: KeywordAnalyzer vs. Field.Index.UN_TOKENIZED

2007-05-24 Thread Steven Rowe
Hi Terry, The one place I know where KeywordAnalyzer is definitely useful is when it is used in conjunction with PerFieldAnalyzerWrapper. Steve dontspamterry wrote: Hi Otis, I tried both ways, did some queries, and results are the same, so I guess it's a matter of preference??? -Terry

WhitespaceAnalyzer [was: Re: regaridng Reader.terms()]

2007-05-23 Thread Steven Rowe
change I've made is not to ignoring unicode characters in Persian and arabic language, because with original WhitespaceAnalyzer it didnt work fine whether it ignore or something else, I dont know but I extends my classes and now I am using my analyzer to index. On 5/22/07, Steven Rowe [EMAIL

Re: regaridng Reader.terms()

2007-05-22 Thread Steven Rowe
Hi Mohammad, May I ask what your language is? And what kind of changes to WhitespaceAnalyzer were required to make it work with your language? If you have made modifications to WhitespaceAnalyzer that are generally useful, please consider contributing your changes back to the Lucene project.

Re: documents with large numbers of fields

2007-05-21 Thread Steven Rowe
Mike Klaas wrote: On 18-May-07, at 1:01 PM, charlie w wrote: Is there an upper limit on the number of fields comprising a document, and if so what is it? There is not. They are relatively costless if omitNorms=False Mike, I think you meant relatively costless if omitNorms=True. Steve

Re: Concept Search

2007-05-16 Thread Steven Rowe
Hi Charles, The need presented by your use case sounds very similar to that served by the SynonymAnalyzer given in Erik Hatcher's and Otis Gospodnetic's excellent book Lucene in Action - take a look: http://lucenebook.com/ Steve Charles Patridge wrote: I have looked around on Lucene web

Re: Concept Search

2007-05-16 Thread Steven Rowe
$ whenever you encountered any of the items in your list, then when concept searching is called for, search on WildAnimals$. Highlighting might be tricky, but certainly do-able, especially with the capabilities of a MemoryIndex.. Erick On 5/16/07, Steven Rowe [EMAIL PROTECTED] wrote: Hi

Re: Multi-field distinct query

2007-05-16 Thread Steven Rowe
due to requirements and the fact that we were having memory issues for cases where a parent had an extremely large number of children (~200,000). -Terry Steven Rowe wrote: Hi Terry, Why not have another index in which a document has one field for the parent and another field containing

Re: Indexing the ORACLE using lucene

2007-05-11 Thread Steven Rowe
Krishna Prasad Mekala wrote: I have to create the index from my Oracle database. Can anybody tell me how to create the index from Oracle using lucene? Check out Marcelo Ochoa's Oracle/Lucene integration: http://issues.apache.org/jira/browse/LUCENE-724 Steve

Re: Extracting a subset of an index

2007-04-03 Thread Steven Rowe
Karl Wettin's code to facilitate index copying may be useful (the below link is to a post of Karl's to the java-dev mailing list): http://www.nabble.com/Resolving-term-vector-even-when-not-stored--t3412160.html Steve Erick Erickson wrote: In the immortal words of Erik H. ...it depends...

Re: how to search over another search

2007-03-27 Thread Steven Rowe
Mohammad Norouzi wrote: Steven, what this means: Each index added must have the same number of documents, but typically each contains different fields. Each document contains the union of the fields of all documents with the same document number. When searching, matches for a query term are

Re: how to search over another search

2007-03-26 Thread Steven Rowe
Hi Mohammad, Have you looked at MultiSearcher? http://lucene.apache.org/java/docs/api/org/apache/lucene/search/MultiSearcher.html Section 5.6 of Lucene in Action covers its use. Steve Mohammad Norouzi wrote: hi I have two separated index but there are some fields that are common between

Re: how to search over another search

2007-03-26 Thread Steven Rowe
to one index, you need to add the same documents in the same order to the other indexes. Failure to do so will result in undefined behavior. - Steve Steven Rowe wrote: Hi Mohammad, Have you looked at MultiSearcher? http://lucene.apache.org/java/docs/api/org/apache/lucene/search

Re: Virtually merge two indexes?

2007-03-26 Thread Steven Rowe
I think ParallelReader, first released in Lucene-Java 1.9, should meet your needs: http://lucene.apache.org/java/docs/api/org/apache/lucene/index/ParallelReader.html - An IndexReader which reads multiple, parallel indexes. Each index added must have the same number of documents, but

Re: Virtually merge two indexes?

2007-03-26 Thread Steven Rowe
Hi Chris, Chris Lu wrote: Hi, Steven, Thanks for the instant reply! But let's see the warning in the ParallelReader javadoc: It is up to you to make sure all indexes are created and modified the same way. For example, if you add documents to one index, you need to add the same documents

Re: pdf box help

2007-03-12 Thread Steven Rowe
This may help: http://www.pdfbox.org/userguide/text_extraction.html#Lucene+Integration ashwin kumar wrote: hi all i am able to convert a pdf in to a text file using pdfbox. and this is the code that i used import org.pdfbox.pdfparser.PDFParser; import org.pdfbox.pdmodel.PDDocument; import

Re: search for phrase with specail chars?

2007-03-12 Thread Steven Rowe
Hi Ruchika, Are there any quote characters in your index (may the Luke be with you[1])? If not, you could just remove all quotes from your query (except the surrounding ones indicating phrase matching, of course), and things will work, as you have indicated. Which version of Lucene are you

Re: Indexing search?

2007-03-06 Thread Steven Rowe
Hi senthil, senthil kumaran wrote: I've indexed 4 among 5 fields with Field.Store.YES Field.Index.NO. And indexed the remaining one, say it's Field Name is *content*, with Field.Store.YES Field.Index.Tokenized(It's value is collective value of other 4 fields and some more values).So my

Re: Why this query is not correct?

2007-01-30 Thread Steven Rowe
Check out QueryParser.setAllowLeadingWildcard(): http://lucene.apache.org/java/docs/api/org/apache/lucene/queryParser/QueryParser.html#setAllowLeadingWildcard(boolean) (though AFAICT this feature is not in any released version of Lucene yet - you'll have to use a nightly build). poeta simbolista

Re: Highlighting brackets bug ?

2007-01-15 Thread Steven Rowe
heikki doeleman wrote: One question though .. is there an easy way to download the sources from the svn repository, in one go ? I did it now by right-clicking links to files The Source Code section of the Lucene Java Developer Resources page

Re: hithighlighter bug

2007-01-10 Thread Steven Rowe
Jason wrote: Hi all, I have come across what I think is a curious but insidious bug with the java lucene hit highlighter. [...] when I search for - Acquisition Plan - in my search results I get: summary(ancilliary stuff deleted) attached to the <em>Acquisition</em> <em>Plan</em>and

Re: Getting a Better Understanding of Lucene's Search Operators

2007-01-10 Thread Steven Rowe
Walt Stoneburner wrote: Do I have correct and complete understanding of the two operators? Not entirely complete :) - more information in the October 2006 thread QueryParser is Badly Broken: http://www.gossamer-threads.com/lists/lucene/java-user/40945

Re: Speed of grouped queries

2007-01-03 Thread Steven Rowe
Hi Sdeck, sdeck wrote: The query for collecting a specific actor is around 200-300 milliseconds, and the movie one, that actually queries each actor, takes roughly 500-700 milliseconds. Yet, for a genre, where you may have 50-100 movies, it takes 500 milliseconds*# of movies I'm having

Re: Speed of grouped queries

2007-01-03 Thread Steven Rowe
/store/RAMDirectory.html? Hope it helps, Steve Steven Rowe wrote: Hi Sdeck, sdeck wrote: The query for collecting a specific actor is around 200-300 milliseconds, and the movie one, that actually queries each actor, takes roughly 500-700 milliseconds. Yet, for a genre, where you may have

Re: Speed of grouped queries

2007-01-03 Thread Steven Rowe
Hi Scott, sdeck wrote: I guess, any ideas why I would run out of heap memory by combining all of those boolean queries together and then running the query? What is happening in the background that would make that occur? Is it storing something in memory, like all of the common terms or

Re: Nested Queries

2006-12-28 Thread Steven Rowe
Hi Kapil, Kapil Chhabra wrote: Hi Steve, Thanks for the response. Actually I am not looking for a query language. My question is, whether Lucene supports Nested Queries or self joins? As per http://lucene.apache.org/java/docs/api/org/apache/lucene/queryParser/QueryParser.html In BNF, the

Re: Lucene id generation

2006-12-19 Thread Steven Rowe
Antonio Bruno wrote: To use but directly the docId would render efficient and fastest the searches much. Thoughts to the possibility of being able to apply a first CachingWrapperFilter F1 on an index and a second CachingWrapperFilter F2 on an other index and after to make (F1 AND F2) and to

Re: Lucene change field values to wrong ones when indexing

2006-12-14 Thread Steven Rowe
Hi Adrian, I don't see anything obviously wrong with your code. Can you give more details about which field values are different from what you expect? I'm guessing it's the id field you're worried about, but it's not clear from what you have written whether it's the title or the id field which

Re: Lucene scoring: coord_q_d factor

2006-12-12 Thread Steven Rowe
Karl Koch wrote: The coord(q,d) normalisation is a score factor based on how many of the query terms are found in the specified document. and described here: http://lucene.apache.org/java/docs/api/org/apache/lucene/search/Similarity.html#formula_coord Does this have a theoretical base? On

Re: Lucene scoring: coord_q_d factor

2006-12-12 Thread Steven Rowe
Karl Koch wrote: Is there any other paper that actually shows the benefit of doing this particular normalisation with coord_q_d? I am not suggesting here that it is not useful, I am just looking for evidence how the idea developed. I think it's a mischaracterization to call coordination a

Re: Using Lucene to search log files

2006-12-11 Thread Steven Rowe
abdul aleem wrote: How to actually retrieve the content of search, Most of the examples in Lucene in Action Searcher gives the results found in number of documents but I couldn't find an API to retrieve the line or paragraph where the search is matched Hi Abdul, I don't know what

Re: is there any way to find unique records ?

2006-11-21 Thread Steven Rowe
Bhavin, Mark Harwood gives a solution that looks almost exactly like what you want: http://www.mail-archive.com/java-user@lucene.apache.org/msg05154.html Steve Chris Hostetter wrote: search the archives for faceted searching and category counts and you should find lots of discussions on

Re: Limiting QueryParser

2006-11-21 Thread Steven Rowe
static String QueryParser.escape(String) should do the trick: http://lucene.apache.org/java/docs/api/org/apache/lucene/queryParser/QueryParser.html#escape(java.lang.String) Look at the bottom of the below-linked page for the list of characters that the above method will escape:
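
To illustrate what this kind of escaping does, here is a hand-rolled sketch that puts a backslash in front of each query-syntax character. It is not Lucene's implementation: the class name `QueryEscapeSketch` is hypothetical, and the character set is an assumption based on the special characters listed in the Lucene query parser syntax documentation. In a real application, call `QueryParser.escape(String)` itself.

```java
// Illustration only: mimics the idea behind QueryParser.escape(String)
// without depending on Lucene. The SPECIAL set below is an assumption.
class QueryEscapeSketch {

    // Characters assumed to carry special meaning in query syntax.
    private static final String SPECIAL = "\\+-!():^[]\"{}~*?|&";

    /** Prefixes every special character in the input with a backslash. */
    static String escape(String input) {
        StringBuilder out = new StringBuilder(input.length() + 8);
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (SPECIAL.indexOf(c) >= 0) {
                out.append('\\');
            }
            out.append(c);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(escape("415-567-2323"));
        System.out.println(escape("title:(a AND b)"));
    }
}
```

Escaped this way, user input reaches the parser as literal text, so none of the query operators remain available to the user.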

Re: Lucene 2.0.1 release date

2006-10-26 Thread Steven Rowe
George Aroush wrote: From your email, I take it that even for the Java folks, they can't accumulate the list of files that make up 2.0.1. Am I right? There has never been and likely will never be a 2.0.1 release. 2.0.1, 2.1 -- these are labels for *potential* future releases. 2.1 is much

Re: Looking for a stemmer that can return all inflected forms

2006-10-16 Thread Steven Rowe
Hi Jong, Jong Kim wrote: I'm looking for a stemmer that is capable of returning all morphological variants of a query term (to be used for high-recall search). For example, given a query term of 'cares', I would like to be able to generate 'cares', 'care', 'cared', and 'caring'. To

Re: Searching pdf, getting page number

2006-10-16 Thread Steven Rowe
Hi Bill, Bill Taylor wrote: On Oct 16, 2006, at 5:44 AM, Christoph Pächter wrote: I know that I can index pdf-files (using a third-party library). Could you please tell me where to find this library? There are several PDF extraction packages listed here (look under the Lucene Document

Re: Performing a like query

2006-10-09 Thread Steven Rowe
Hi Rahil, Rahil wrote: I was just wondering whether there is a difference between the regular expression you sent me i.e. (i) \s*(?:\b|(?=\S)(?=\s)|(?=\s)(?=\S))\s* and (ii) \\b as they lead to the same output. For example, the string search testing a-new string=3/4 results in

Re: Case sensitive / insensitive

2006-10-06 Thread Steven Rowe
Marcus Falck wrote: Any good approaches for allowing case sensitive and case insensitive searches? Except adding an additional field and skipping the LowerCaseFilter. Since this severely increases the index size (and the index already is around 1 TB). Hi Marcus, How about a filter that

Re: Performing a like query

2006-10-06 Thread Steven Rowe
Hi Rahil, Rahil wrote: I couldnt figure out a valid regular expression to write a valid Pattern.compile(String regex) which can tokenise a string into O/E - visual acuity R-eye=6/24 into O,/,E, -, visual, acuity, R, -, eye, =, 6, /, 24. The following regular expression should match
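
The tokenization Rahil describes can also be sketched by matching tokens instead of splitting on boundaries. This is a swapped-in technique, not the regular expression from the reply (which is cut off above): each token is either a run of word characters or a single non-word, non-space symbol. The class name `SymbolTokenizerSketch` is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: keeps words and punctuation symbols as separate tokens
// and drops whitespace.
class SymbolTokenizerSketch {

    // A token is a run of word characters, or one non-word, non-space character.
    private static final Pattern TOKEN = Pattern.compile("\\w+|[^\\w\\s]");

    /** Returns the tokens of the text in order of appearance. */
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<String>();
        Matcher m = TOKEN.matcher(text);
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Prints: [O, /, E, -, visual, acuity, R, -, eye, =, 6, /, 24]
        System.out.println(tokenize("O/E - visual acuity R-eye=6/24"));
    }
}
```

Note that `\w` is ASCII-only by default in `java.util.regex`, so text outside ASCII would need a different character class.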

Re: 'categorized-term' web index

2006-09-28 Thread Steven Rowe
Vladimir Olenin wrote: - is there a place I can get already crawled internet web pages in an archive (10 - 100Gb of data) I don't know the sizes of the corpora mentioned on Lucene Wiki's Resources page, but it's a good place to start:

Re: Lucene In Action Book vs Lucene 2.0

2006-09-27 Thread Steven Rowe
http://svn.apache.org/repos/asf/lucene/java/tags/lucene_2_0_0/CHANGES.txt Otis Gospodnetic wrote: CHANGES.txt is your best source for that answer. KEGan [EMAIL PROTECTED] wrote: What about the internal of Lucene? Are there any major changes in there?

Re: Help wanted

2006-09-20 Thread Steven Rowe
The Resources page on the Lucene Wiki has a collection of articles that may be useful to you: http://wiki.apache.org/jakarta-lucene/Resources Michael McCandless wrote: Mark Miller wrote: I'll one up you: http://www.manning.com/hatcher2/ Might as well save yourself a whole lot of time and

Re: Versions

2006-09-18 Thread Steven Rowe
Hi Luis, Chris Hostetter wrote: Luis Rodrigo Aguado wrote: : I've been looking through the documentation in the official : web-site, and the Javadoc belongs to v2.1, that I could not find : anywhere, anyone has a clue about where to find it or when will it be : officially released?

Re: Documents that know more?

2006-08-29 Thread Steven Rowe
There has been a long-running thread on the java-dev list about how to allow application-specific extra stuff to be placed in the index, at multiple levels of granularity. Some of this conversation is captured on the Wiki at: http://wiki.apache.org/jakarta-lucene/FlexibleIndexing Maybe you

Re: Indexing Documents which has Attachments and are Refered many times!!

2006-08-12 Thread Steven Rowe
As Jason says, you can structure each Lucene document with one Field per content type, and index all data that way. The database is not required. To address your search complexity concern, you can create queries that search only those Field(s) the user wants -- there is no need to have a Field

Re: EMAIL ADDRESS: Tokenize (i.e. an EmailAnalyzer)

2006-07-31 Thread Steven Rowe
Michael J. Prichard wrote: Hey Otis, Sure I would love to! Can you ping me at [EMAIL PROTECTED] and let me know what I need to do? Do I just post it to JIRA? Thanks, Michael Otis Gospodnetic wrote: A good place for that is JIRA. Could you put it there? We have a bunch of

Re: Matching accented with non-accented characters

2006-07-25 Thread Steven Rowe
Rajan, Renuka wrote: I am trying to match accented characters with non-accented characters in French/Spanish and other Western European languages. The use case is that the users may type letters without accents in error and we still want to be able to retrieve valid matches. The one idea,
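
The idea in the excerpt above is cut off, so the following is not necessarily what the thread proposed. It is one common way to get accent-insensitive matching, sketched with only the JDK: decompose each character (NFD) and drop the combining marks, applying the same folding to both indexed text and query text. The class name `AccentFoldingSketch` is hypothetical.

```java
import java.text.Normalizer;

// Hypothetical helper: folds accented Latin letters to their base letters.
class AccentFoldingSketch {

    /** Decomposes the string (NFD) and removes all combining marks. */
    static String stripAccents(String s) {
        String decomposed = Normalizer.normalize(s, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}+", "");
    }

    public static void main(String[] args) {
        System.out.println(stripAccents("caf\u00e9"));        // e with acute accent
        System.out.println(stripAccents("se\u00f1or"));       // n with tilde
        System.out.println(stripAccents("\u00e9l\u00e8ve"));  // acute and grave accents
    }
}
```

Letters that have no canonical decomposition, such as ø, æ, and ß, pass through unchanged and would need an explicit mapping.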

Re: Exact Match Searches and Stop Words

2006-06-20 Thread Steven Rowe
Hugh Ross wrote: The problem is that the standard analyzer removes the stop word (i.e. of) before indexing and searching. Is there an workaround for this? See my response to a similar question here: http://mail-archives.apache.org/mod_mbox/lucene-java-user/200510.mbox/[EMAIL PROTECTED] In

Re: How can I tell Lucene to also use analyzer for Keyword fields

2006-06-12 Thread Steven Rowe
Mordo, Aviran (EXP N-NANNATEK) wrote: What you are asking is not possible. The whole purpose of the analyzer is to tokenize the fields, so if you want them to be tokenized don't use the Keyword fields. Um, KeywordAnalyzer?

Re: lucene search sentence

2006-04-27 Thread Steven Rowe
Anton Feldmann wrote: 3) How do I display the sentence before and after the sentence the hit is in? You could: 1. Make your Lucene Document be a set of three sentences (before, searchable, after), which you store, but write a custom Analyzer which only returns tokens for the searchable

Re: exact match ..

2006-02-20 Thread Steven Rowe
Mufaddal Khumri wrote: lets say i do this while indexing: doc.add(Field.Text(categoryNames, categoryNames)); Now while searching categoryNames, I do a search for digital cameras. I only want to match the exact phrase digital cameras with documents who have exactly the phrase digital cameras

Re: Extract term and its frequency from the index and file?

2005-11-14 Thread Steven Rowe
MALCOLM CLARK wrote: Could you send me the url for HighFreqTerms.java in cvs? ViewCVS URL: http://svn.apache.org/viewcvs.cgi/lucene/java/trunk/contrib/miscellaneous/src/java/org/apache/lucene/misc/HighFreqTerms.java

Re: lucene and databases

2005-10-24 Thread Steven Rowe
Code and examples for embedding Lucene in HSQLDB and Derby relational databases: http://issues.apache.org/jira/browse/LUCENE-434 Rick Hillegas wrote: Thanks to Yonik for replying to my last question about queries and filters. Now I have another issue. I would appreciate any pointers to

Re: Is there a way to get absolutely exact phrase matching (no stop words, etc)

2005-10-24 Thread Steven Rowe
Hi Bob, StandardAnalyzer filters the token stream created by StandardTokenizer through StandardFilter, LowerCaseFilter, and then StopFilter. Unless you supply a stoplist to the StandardAnalyzer constructor, you get the default set of English stopwords, from StopAnalyzer: public static

Re: Indexing and Hit Highlighting OCR Data

2005-06-06 Thread Steven Rowe
There is a proposal to extend indexing (item #11 in the API Changes section): http://wiki.apache.org/jakarta-lucene/Lucene2Whiteboard An excerpt: 11. (Hard) Make indexing more flexible, so that one could e.g., not store positions or even frequencies, or alternately, to store extra