Re: Is Analyzer used when calling IndexWriter.addIndexesNoOptimize()?

2012-12-05 Thread Jack Krupansky
, but if they don't produce the same token stream at both the token and character/byte level, your queries may fail. Rule #1 with Lucene and Solr - always be prepared to completely reindex your data, precisely because ideas about analysis evolve over time. -- Jack Krupansky -Original Message- From

Re: Using alternative scoring mechanism.

2012-12-02 Thread Jack Krupansky
/PerFieldSimilarityWrapper.html This appears to be the change from that original design: https://issues.apache.org/jira/browse/LUCENE-3749 -- Jack Krupansky -Original Message- From: Erick Erickson Sent: Sunday, December 02, 2012 9:54 AM To: java-user Subject: Re: Using alternative scoring

Re: What is flexible indexing in Lucene 4.0 if it's not the ability to make new postings codecs?

2012-11-30 Thread Jack Krupansky
analyzer. -- Jack Krupansky -Original Message- From: Johannes.Lichtenberger Sent: Friday, November 30, 2012 10:15 AM To: java-user@lucene.apache.org Cc: Michael McCandless Subject: Re: What is flexible indexing in Lucene 4.0 if it's not the ability to make new postings codecs? On 11/28/2012

Re: How does lucene handle the wildcard and fuzzy queries ?

2012-11-27 Thread Jack Krupansky
aren't appropriate on user lists. -- Jack Krupansky -Original Message- From: sri krishna Sent: Tuesday, November 27, 2012 12:36 PM To: java-user@lucene.apache.org Subject: How does lucene handle the wildcard and fuzzy queries ? How does lucene handle the prefix queries(wild card

Re: handling different scores related to queries

2012-11-27 Thread Jack Krupansky
calculations. Some of these complex queries are constant score for performance reasons. -- Jack Krupansky -Original Message- From: sri krishna Sent: Tuesday, November 27, 2012 12:38 PM To: java-user Subject: handling different scores related to queries for a search string hello*~ how

Re: Question about ordering rule of SpanNearQuery

2012-11-21 Thread Jack Krupansky
Add debugQuery=true to your query and look at the explain section to see how the scoring is calculated for each document. Sometimes it is counter-intuitive and some factors may differ but those differences can be overwhelmed by other, unrelated factors. -- Jack Krupansky -Original

Re: Question about ordering rule of SpanNearQuery

2012-11-21 Thread Jack Krupansky
#explain(org.apache.lucene.search.Query, int) -- Jack Krupansky -Original Message- From: Jack Krupansky Sent: Wednesday, November 21, 2012 11:44 AM To: java-user@lucene.apache.org Subject: Re: Question about ordering rule of SpanNearQuery Add debugQuery=true to your query and look

Re: Which stemmer?

2012-11-21 Thread Jack Krupansky
quite well, without highlighting the actual indexed term, which can be quite ugly. -- Jack Krupansky -Original Message- From: Elmer van Chastelet Sent: Wednesday, November 21, 2012 8:49 AM To: java-user@lucene.apache.org Subject: Re: Which stemmer? I've just created a small web

Re: Line feed on windows

2012-11-20 Thread Jack Krupansky
into a single string, line delimiters and all. Be careful about encoding though. -- Jack Krupansky -Original Message- From: Mansour Al Akeel Sent: Tuesday, November 20, 2012 1:19 PM To: java-user Subject: Line feed on windows Hello all, We are indexing and storing files contents in lucene

Re: Question about ordering rule of SpanNearQuery

2012-11-19 Thread Jack Krupansky
Unfortunately, there doesn't appear to be any Javadoc that discusses what factors are used to score spans. For example, how to relate the number of times a span matches in a document vs. the exactness of each span match. -- Jack Krupansky -Original Message- From: 杨光 Sent: Monday

Re: how do re-get the doc after the doc was indexed ?

2012-11-17 Thread Jack Krupansky
Sounds like you want path to be a unique key field. So, just do a Lucene search with a TermQuery for the path, which should return one document. No need to mess with Lucene internal doc ids. -- Jack Krupansky -Original Message- From: wgggfiy Sent: Saturday, November 17, 2012 8:08 AM

Re: Which stemmer?

2012-11-15 Thread Jack Krupansky
, and this is independent of what gets returned for a stored field. The stem is simply the means to THAT end. The fact that dog and dogs are not equivalent in KStem is in fact disheartening, at least to me, but it may not be problematic in some use cases. -- Jack Krupansky -Original Message

Re: Which stemmer?

2012-11-14 Thread Jack Krupansky
gave a handful of examples that illustrated how some common words are stemmed. -- Jack Krupansky -Original Message- From: Scott Smith Sent: Wednesday, November 14, 2012 10:55 AM To: java-user@lucene.apache.org Subject: Which stemmer? Does anyone have any experience with the stemmers

Re: Which stemmer?

2012-11-14 Thread Jack Krupansky
/lucene/analysis/en/EnglishMinimalStemFilterFactory.html http://lucene.apache.org/core/4_0_0/analyzers-common/org/apache/lucene/analysis/en/EnglishMinimalStemmer.html -- Jack Krupansky -Original Message- From: Scott Smith Sent: Wednesday, November 14, 2012 5:17 PM To: java-user

Re: content disappears in the index

2012-11-12 Thread Jack Krupansky
(or author_sorted or author_string) field that you copy the name to: <copyField source="author" dest="author_s"/> Query on author, but sort on author_s. -- Jack Krupansky -Original Message- From: Erick Erickson Sent: Monday, November 12, 2012 5:28 AM To: java-user Subject: Re: content

Re: using CharFilter to inject a space

2012-11-04 Thread Jack Krupansky
. Tell us the full problem and then we can focus on legitimate solutions. -- Jack Krupansky -Original Message- From: Erick Erickson Sent: Sunday, November 04, 2012 8:06 AM To: java-user Subject: Re: using CharFilter to inject a space Ahh, I don't know of a better way. I can imagine complex

Re: Running Solr Core/ Tika on Azure

2012-10-29 Thread Jack Krupansky
: http://wiki.apache.org/solr/ExtractingRequestHandler Whether the Azure distribution is full Solr including Solr Cell or not, I cannot answer. Note: For future reference, Solr questions should be asked on the solr-user mailing list. -- Jack Krupansky -Original Message- From: Aloke

Re: lucene 4.0 indexReader is changed

2012-10-26 Thread Jack Krupansky
How about DirectoryReader.html#openIfChanged? See: http://lucene.apache.org/core/4_0_0/core/org/apache/lucene/index/DirectoryReader.html#openIfChanged(org.apache.lucene.index.DirectoryReader) -- Jack Krupansky -Original Message- From: Scott Smith Sent: Friday, October 26, 2012 7:54

Re: Is there anything in Lucene 4.0 that provides 'absolute' scoring so that i can compare the scoring results of different searches ?

2012-10-25 Thread Jack Krupansky
to score them. -- Jack Krupansky -Original Message- From: Paul Taylor Sent: Thursday, October 25, 2012 7:11 AM To: java-user@lucene.apache.org Subject: Is there anything in Lucene 4.0 that provides 'absolute' scoring so that i can compare the scoring results of different searches

Re: App supplied docID in lucene possible?

2012-10-25 Thread Jack Krupansky
Lucene document id. -- Jack Krupansky -Original Message- From: Ravikumar Govindarajan Sent: Thursday, October 25, 2012 6:10 AM To: java-user@lucene.apache.org Subject: App supplied docID in lucene possible? We have the need to re-index some fields in our application frequently. Our

Re: How to use/create an alias to a field?

2012-10-25 Thread Jack Krupansky
With edismax in Solr 3.6/4.0 field aliases are supported: The syntax for aliasing is f.myalias.qf=realfield. A user query for myalias:foo will be queried as realfield:foo. See: http://wiki.apache.org/solr/ExtendedDisMax#Field_aliasing_.2BAC8_renaming -- Jack Krupansky -Original Message

Re: query for documents WITHOUT a field?

2012-10-25 Thread Jack Krupansky
OR allergies IS NULL would be OR (*:* -allergies:[* TO *]) in Lucene/Solr. -- Jack Krupansky -Original Message- From: Vitaly Funstein Sent: Thursday, October 25, 2012 8:25 PM To: java-user@lucene.apache.org Subject: Re: query for documents WITHOUT a field? Sorry for resurrecting
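The "IS NULL" translation above can be sketched as a small string helper. This is only an illustration in plain Java: the class and method names are hypothetical, and the result is query text meant for the classic Lucene/Solr query syntax, not a call into any Lucene API.

```java
public final class MissingFieldClause {
    private MissingFieldClause() {}

    /**
     * Builds "(*:* -field:[* TO *])": start from all documents (*:*)
     * and subtract every document that has any value in the field.
     * Hypothetical helper; assumes the field name needs no escaping.
     */
    public static String forField(String field) {
        return "(*:* -" + field + ":[* TO *])";
    }

    public static void main(String[] args) {
        // The SQL-style "name = 'smith' OR allergies IS NULL" as query text.
        System.out.println("name:smith OR " + forField("allergies"));
    }
}
```

The `*:*` is there because a purely negative clause has nothing to subtract from; pairing it with the negated range gives the clause a set of documents to start with.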

Re: query for documents WITHOUT a field?

2012-10-25 Thread Jack Krupansky
://issues.apache.org/jira/browse/LUCENE-4386 I think it is: new ConstantScoreQuery(new FieldValueFilter(fieldname, false)) Use a SHOULD of that rather than a second level of BooleanQuery. Let us know if it actually works! -- Jack Krupansky -Original Message- From: Vitaly Funstein

Re: StandardAnalyzer functionality change

2012-10-24 Thread Jack Krupansky
Annex #29. That is a standard. See: http://lucene.apache.org/core/4_0_0-ALPHA/analyzers-common/org/apache/lucene/analysis/standard/StandardTokenizer.html http://lucene.apache.org/core/4_0_0-BETA/analyzers-common/org/apache/lucene/analysis/standard/ClassicTokenizer.html -- Jack Krupansky

Re: StandardAnalyzer functionality change

2012-10-24 Thread Jack Krupansky
I didn't explicitly say it, but ClassicAnalyzer does do exactly what you want it to do - work break plus email and URL, or StandardAnalyzer plus email and URL. -- Jack Krupansky -Original Message- From: kiwi clive Sent: Wednesday, October 24, 2012 1:27 PM To: java-user

Re: StandardAnalyzer functionality change

2012-10-24 Thread Jack Krupansky
s/work break/word break/ -- Jack Krupansky -Original Message- From: Jack Krupansky Sent: Wednesday, October 24, 2012 3:52 PM To: java-user@lucene.apache.org ; kiwi clive Subject: Re: StandardAnalyzer functionality change I didn't explicitly say it, but ClassicAnalyzer does do exactly

Re: understanding the need to reindex a document

2012-10-22 Thread Jack Krupansky
source for all non-stored fields, or at least non-stored fields which are not copied from another field which is stored. -- Jack Krupansky -Original Message- From: Shaya Potter Sent: Monday, October 22, 2012 3:47 PM To: java-user@lucene.apache.org Subject: Re: understanding the need

Re: Searching for a search string containing a literal slash doesn't work with QueryParser

2012-10-01 Thread Jack Krupansky
The scape merely assures that the slash will not be parsed as query syntax and will be passed directly to the analyzer, but the standard analyzer will in fact always remove it. Maybe you want the white space analyzer or keyword analyzer (no characters removed.) -- Jack Krupansky

Re: Searching for a search string containing a literal slash doesn't work with QueryParser

2012-10-01 Thread Jack Krupansky
That's The escape merely... -- Jack Krupansky -Original Message- From: Jack Krupansky Sent: Monday, October 01, 2012 9:58 AM To: java-user@lucene.apache.org Subject: Re: Searching for a search string containing a literal slash doesn't work with QueryParser The scape merely assures

Re: Searching for a search string containing a literal slash doesn't work with QueryParser

2012-10-01 Thread Jack Krupansky
You can apply the lower case filter to the whitespace or other analyzer and use that as the analyzer. -- Jack Krupansky -Original Message- From: Jochen Hebbrecht Sent: Monday, October 01, 2012 10:34 AM To: java-user@lucene.apache.org Subject: Re: Searching for a search string

Re: Searching for a search string containing a literal slash doesn't work with QueryParser

2012-10-01 Thread Jack Krupansky
Sorry, I meant apply the filter to the TOKENIZER that the analyzer uses. -- Jack Krupansky -Original Message- From: Jack Krupansky Sent: Monday, October 01, 2012 10:44 AM To: java-user@lucene.apache.org Subject: Re: Searching for a search string containing a literal slash doesn't

Re: Using stop words with snowball analyzer and shingle filter

2012-09-19 Thread Jack Krupansky
it to override the createComponents method that creates the StopFilter, so you would essentially have to copy the source for SnowballAnalyzer and then add in the code to invoke StopFilter.setEnablePositionIncrements the way StopFilterFactory does. -- Jack Krupansky -Original Message- From

Re: how to disable the field cache

2012-09-18 Thread Jack Krupansky
Could you suggest the code for a mock field cache? I mean, what would an anonymous instance look like. -- Jack Krupansky -Original Message- From: karsten-s...@gmx.de Sent: Tuesday, September 18, 2012 9:07 AM To: java-user@lucene.apache.org Subject: Re: how to disable the field cache

Re: how to fully preprocess query before fuzzy search?

2012-09-17 Thread Jack Krupansky
should escape all special characters, and then add the fuzzy query. Note: In 4.0 the fuzzy query is limited to an editing distance of 2. -- Jack Krupansky -Original Message- From: Ilya Zavorin Sent: Monday, September 17, 2012 10:41 AM To: java-user@lucene.apache.org Subject: how to fully

Re: how to fully preprocess query before fuzzy search?

2012-09-17 Thread Jack Krupansky
Either is fine. In fact just escape based on the individual character, not the context. The multi-character context is telling you places where escape is not essential, but that doesn't mean it would hurt. -- Jack Krupansky -Original Message- From: Ilya Zavorin Sent: Monday
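Escaping per character, without looking at context, might look like the sketch below. It is a plain-Java illustration with hypothetical names, and the list of special characters is an assumption based on the classic query parser syntax. Lucene also ships its own `QueryParser.escape`, which is normally the better choice.

```java
public final class QueryEscaper {
    private QueryEscaper() {}

    // Characters assumed to be query syntax in the classic query parser.
    private static final String SPECIAL = "\\+-!():^[]\"{}~*?|&/";

    /** Prefixes every special character with a backslash, regardless of context. */
    public static String escape(String input) {
        StringBuilder sb = new StringBuilder(input.length() * 2);
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (SPECIAL.indexOf(c) >= 0) {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Escape the raw user text first, then append the fuzzy operator.
        System.out.println(escape("AC/DC") + "~");
    }
}
```

Escaping a character that did not strictly need it in its context is harmless, which is why the per-character rule is enough.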

Re: Lucene 4.0 GA time frame

2012-09-14 Thread Jack Krupansky
My personal estimate is that it will likely be within a week or two, but there is no official date. -- Jack Krupansky -Original Message- From: sausarkar Sent: Friday, September 14, 2012 1:05 PM To: java-user@lucene.apache.org Subject: Re: Lucene 4.0 GA time frame Now that the BETA

Re: Hibernate Search with Regex based on Table

2012-09-12 Thread Jack Krupansky
/> <tokenizer class="solr.WhitespaceTokenizerFactory"/> </analyzer> </fieldType> That mapping-ISOLatin1Accent.txt file maps or folds all the accented characters into their base ASCII letters. -- Jack Krupansky -Original Message- From: Robert Streitberger Sent: Wednesday, September 12, 2012 8:45

Re: Hibernate Search with Regex based on Table

2012-09-12 Thread Jack Krupansky
MappingCharFilter can do all of that. The file I referenced already has ae, oe, and ss. That default file handles your umlauts differently, but you can change the rules to suit your exact needs. -- Jack Krupansky -Original Message- From: Robert Streitberger Sent: Wednesday

Re: How to create a Lucene in-memory index at webapp deployment time

2012-09-10 Thread Jack Krupansky
Follow the instructions here: http://lucene.apache.org/core/discussion.html -- Jack Krupansky -Original Message- From: Noopur Julka Sent: Monday, September 10, 2012 12:43 PM To: java-user@lucene.apache.org Cc: Dhananjeyan Balaretnaraja Subject: Re: How to create a Lucene in-memory

Re: Use an analyzer with Term, FuzzyQuery, BooleanQuery and friends

2012-08-27 Thread Jack Krupansky
See: http://mail-archives.apache.org/mod_mbox/lucene-java-user/201208.mbox/%3c42b9aa72526143469066683a1a2fe86a95a...@clg-exch1.clg.Local%3E Just plug in whatever analyzer you use for indexing. -- Jack Krupansky From: Damian Birchler Sent: Monday, August 27, 2012 10:30 AM To: mailto:java-user

Re: Lucene Index backward compatibility related question

2012-08-27 Thread Jack Krupansky
file format (LUCENE-3082). Bottom line: Write a test and see for yourself. -- Jack Krupansky -Original Message- From: Sitowitz, Paul Sent: Wednesday, August 22, 2012 1:31 PM To: java-user@lucene.apache.org Cc: sitow...@gmail.com Subject: Lucene Index backward compatibility related question

Re: Efficient string lookup using Lucene

2012-08-24 Thread Jack Krupansky
I can't speak for any non-Latin languages, but how about simply using the StandardAnalyzer plus the EdgeNGramFilter for indexing (but not query.) The latter would allow a query of run to match running. -- Jack Krupansky -Original Message- From: Ilya Zavorin Sent: Friday, August 24
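To see why index-time edge n-grams let a query of run match running, here is a plain-Java sketch of what an edge n-gram filter emits for one term. It is not Lucene's EdgeNGramTokenFilter; the class name and the gram sizes are assumptions chosen for illustration.

```java
import java.util.ArrayList;
import java.util.List;

public final class EdgeNGrams {
    private EdgeNGrams() {}

    /** Returns the prefixes of term with lengths minGram..maxGram (capped at the term length). */
    public static List<String> of(String term, int minGram, int maxGram) {
        List<String> grams = new ArrayList<>();
        int limit = Math.min(maxGram, term.length());
        for (int len = minGram; len <= limit; len++) {
            grams.add(term.substring(0, len));
        }
        return grams;
    }

    public static void main(String[] args) {
        // The indexed grams for "running" include "run",
        // so an un-grammed query term "run" finds the document.
        System.out.println(of("running", 2, 5));
    }
}
```

The grams are produced only at index time. If the query were n-grammed as well, a query of running would also match every document containing a term that starts with ru.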

Re: Question about BooleanQuery

2012-08-23 Thread Jack Krupansky
BooleanQuery at the Lucene Query level.) Read more detail at: http://searchhub.org/dev/2011/12/28/why-not-and-or-and-not/ -- Jack Krupansky -Original Message- From: heikki Sent: Thursday, August 23, 2012 8:09 AM To: java-user@lucene.apache.org Subject: Question about BooleanQuery

Re: Question about BooleanQuery

2012-08-23 Thread Jack Krupansky
causing the problem since your nested query expression was not fully specified - it needed booleanField_1:true. But, of course, I am guessing what you really want. -- Jack Krupansky -Original Message- From: heikki Sent: Thursday, August 23, 2012 8:38 AM To: java-user@lucene.apache.org

Re: Question about BooleanQuery

2012-08-23 Thread Jack Krupansky
And (NOT booleanField_2 = true ) is really just booleanField_2 = false, right? Unless you are looking for fields that are not populated with any value in addition to an explicit false value. -- Jack Krupansky -Original Message- From: heikki Sent: Thursday, August 23, 2012 9:13 AM

Re: Creating Span Queries from Boolean Queries

2012-08-21 Thread Jack Krupansky
that distance. That gives you a basic BooleanQuery with AND clauses converted to spans. -- Jack Krupansky -Original Message- From: Dave Seltzer Sent: Tuesday, August 21, 2012 6:53 PM To: java-user@lucene.apache.org Subject: Creating Span Queries from Boolean Queries Hi Everyone

Re: TermRangeQuery with multiple words

2012-08-20 Thread Jack Krupansky
You could index the values in both a text and a separate string field. Then you can query the text field by keyword as well as the string field by the full literal value, or as a wildcard or prefix query (e.g., Microsoft*), or as a range query with the full literal string values. -- Jack

Re: Supporting advanced search methods in a user interface

2012-08-16 Thread Jack Krupansky
Although not directly related to UI, the syntax for span queries supported by the LucidWorks Search platform may give you some ideas for how spans can be composed into larger queries: http://lucidworks.lucidimagination.com/display/lweug/Proximity+Operations -- Jack Krupansky -Original

Re: Solr adding Documents / Commit in different Threads

2012-08-14 Thread Jack Krupansky
to make such a large request work, but it may not be worth the effort. -- Jack Krupansky -Original Message- From: Ralf Heyde Sent: Tuesday, August 14, 2012 7:45 AM To: java-user@lucene.apache.org Subject: Solr adding Documents / Commit in different Threads Hello, we currently facing

Re: Solr adding Documents / Commit in different Threads

2012-08-14 Thread Jack Krupansky
Please send such inquiries to the Solr user email list, not the Lucene user list. -- Jack Krupansky -Original Message- From: Ralf Heyde Sent: Tuesday, August 14, 2012 7:45 AM To: java-user@lucene.apache.org Subject: Solr adding Documents / Commit in different Threads Hello, we

Re: Does the string Cla$$War affect Lucene?

2012-08-14 Thread Jack Krupansky
Add qp.setAutoGeneratePhraseQueries(true) before calling qp.parse. Otherwise, the query (clause of the larger BooleanQuery) will be the same as cla OR war, which will match all war documents, plus any cla documents you may have. -- Jack Krupansky -Original Message- From

Re: UnsupportedOperationException: Query should have been rewritten

2012-08-14 Thread Jack Krupansky
= new SpanMultiTermQueryWrapper<WildcardQuery>(wq); // will only match jumps over extremely very lazy broxn dog SpanFirstQuery sfq = new SpanFirstQuery(swq, 3); assertEquals(1, searcher.search(sfq, 10).totalHits); } Or, is the issue simply a peculiarity of getSpans? -- Jack Krupansky -Original

Re: Re:RE: Does the string Cla$$War affect Lucene?

2012-08-14 Thread Jack Krupansky
got no hits with autoGeneratePhraseQueries - which suggests that maybe the index didn't use the same analyzer or maybe the literal text in the title is not exactly what you think it is. You could use the WhitespaceAnalyzer, but that would leave leading and trailing punctuation. -- Jack

Re: Analyzer on query question

2012-08-03 Thread Jack Krupansky
really do need a wiki page for Lucene term analysis. -- Jack Krupansky -Original Message- From: Bill Chesky Sent: Friday, August 03, 2012 9:19 AM To: simon.willna...@gmail.com ; java-user@lucene.apache.org Subject: RE: Analyzer on query question Thanks Simon, Unfortunately, I'm using Lucene

Re: Analyzer on query question

2012-08-03 Thread Jack Krupansky
such as stemming) becomes unnecessary and risky if you are not very careful or very lucky. -- Jack Krupansky -Original Message- From: Ian Lea Sent: Friday, August 03, 2012 1:12 PM To: java-user@lucene.apache.org Subject: Re: Analyzer on query question Bill You're getting the snowball

Re: Analyzer on query question

2012-08-03 Thread Jack Krupansky
); BytesRef bytes = termAtt.getBytesRef(); return new Term(BytesRef.deepCopyOf(bytes)); } else return null; // TODO: Close the StringReader // TODO: Handle terms that analyze into multiple terms (e.g., embedded punctuation) } -- Jack Krupansky -Original Message- From: Bill Chesky Sent

Re: Analyzer on query question

2012-08-03 Thread Jack Krupansky
hope). In theory, this will guarantee fidelity of the query and improve performance (the toString/parse round-trip is not cheap/free.) As I said, the toString/reparse may indeed work for your specific use-case, but isn't quite ideal for general use. -- Jack Krupansky -Original Message

Re: Lucene 4.0 GA time frame

2012-07-31 Thread Jack Krupansky
. The bottom line is that Lucene/Solr 4.0 GA in the fall (before December) is a reasonable expectation, based on the code's current trajectory. -- Jack Krupansky -Original Message- From: Erick Erickson Sent: Tuesday, July 31, 2012 8:46 AM To: java-user@lucene.apache.org Subject: Re

Re: Reindexing after database change

2012-07-28 Thread Jack Krupansky
was created and scan for files near that same date. They may also have a cron job that does incremental updates - look for the crontab. -- Jack Krupansky -Original Message- From: Rodrigo P. Bregalanti Sent: Saturday, July 28, 2012 10:09 AM To: java-user@lucene.apache.org Subject

Re: Question on ElisionFilter with d'

2012-07-25 Thread Jack Krupansky
The filter should work (remove the letter and apostrophe). Could you supply an exact code fragment that shows the literal term, the code invoking the filter, and the exact literal output? And, which release of Lucene? -- Jack Krupansky -Original Message- From: yamo93 Sent

Re: using phrase query with wildcard

2012-07-22 Thread Jack Krupansky
, such as to assure that the match occurs at the beginning, near the beginning, or end of a document. See: http://lucene.apache.org/core/3_6_0/api/core/org/apache/lucene/search/spans/SpanPositionRangeQuery.html -- Jack Krupansky -Original Message- From: Levin, Ilya Sent: Sunday, July 22

Re: QueryParser and BooleanQuery

2012-07-22 Thread Jack Krupansky
The query parser/analyzer is lower-casing the query terms automatically. You have to do the same with the terms for BooleanQuery - Term("cs-method", "GET") should be Term("cs-method", "get"). StandardAnalyzer is doing the lower-casing. -- Jack Krupansky -Original Message- From: Deepak

Re: QueryParser and BooleanQuery

2012-07-22 Thread Jack Krupansky
Yes, I failed to notice that the removal of the slash was yet another instance of the analyzer transforming its input. But the bottom line is that you must do 100% of the same steps that analysis performs. If in doubt, pass your literals through the standard analyzer itself. -- Jack Krupansky

Re: Starts with Query - Return like search

2012-07-04 Thread Jack Krupansky
situations, but maybe it would work well for your case. -- Jack Krupansky -Original Message- From: Ian Lea Sent: Wednesday, July 04, 2012 4:00 AM To: java-user@lucene.apache.org Subject: Re: Starts with Query - Return like search In fact there is an FAQ entry Can I combine wildcard

Re: Starts with Query - Return like search

2012-07-04 Thread Jack Krupansky
Oops... that's EdgeNGramTokenFilter in Lucene. -- Jack Krupansky -Original Message- From: Jack Krupansky Sent: Wednesday, July 04, 2012 4:52 PM To: java-user@lucene.apache.org Subject: Re: Starts with Query - Return like search Here's a Solr field type that supports edge n-grams

Re: Searching both phrase and it's words

2012-06-30 Thread Jack Krupansky
You didn't show us your luceneQuery, but the gist of the solution is to use MUST clauses for each of the individual terms and then a SHOULD of the phrase. You can add an additional boost to the phrase, but lucene should naturally boost documents containing the phrase. -- Jack Krupansky
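The shape of that query can be sketched as query text: each term required (`+`), followed by the quoted phrase as an optional clause that only affects scoring. This is a plain-Java illustration with a hypothetical class name; it assumes non-empty input with no special characters and a parser whose default operator is OR, so the unprefixed phrase is a SHOULD clause.

```java
public final class TermsPlusPhrase {
    private TermsPlusPhrase() {}

    /** Turns "quick brown fox" into: +quick +brown +fox "quick brown fox" */
    public static String build(String userInput) {
        String[] terms = userInput.trim().split("\\s+");
        StringBuilder sb = new StringBuilder();
        for (String term : terms) {
            // '+' marks the term as a MUST clause.
            sb.append('+').append(term).append(' ');
        }
        // The quoted phrase has no prefix, so it is optional and only boosts matches.
        sb.append('"').append(String.join(" ", terms)).append('"');
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(build("quick brown fox"));
    }
}
```

Documents containing all the terms match either way; those that also contain the exact phrase pick up the extra score from the optional clause.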

Re: how to remove the dash

2012-06-25 Thread Jack Krupansky
through the field analyzer for the desired field type. -- Jack Krupansky -Original Message- From: lis...@alphamatrix.org Sent: Monday, June 25, 2012 4:12 PM To: java-user@lucene.apache.org Subject: Re: how to remove the dash More information... If I change System.out.println(Query

Re: how to remove the dash

2012-06-25 Thread Jack Krupansky
Oops... I was mistaken to suggest that a simple term query would invoke the field analyzer - it passes the literal text without invoking any field analyzer. -- Jack Krupansky -Original Message- From: Jack Krupansky Sent: Monday, June 25, 2012 10:14 PM To: java-user@lucene.apache.org

Re: Fast way to get the start of document

2012-06-23 Thread Jack Krupansky
and highlighting on the limited body field. -- Jack Krupansky -Original Message- From: Paul Hill Sent: Friday, June 22, 2012 2:23 PM To: java-user@lucene.apache.org Subject: Fast way to get the start of document Our Hit highlighting (Using the older Highlighter) is wired with a too huge limit, so we

Re: filter by term frequency

2012-06-16 Thread Jack Krupansky
FunctionQuery, ValueSource, and TermFreqValueSource. See: http://lucene.apache.org/solr/api/org/apache/solr/search/function/FunctionQuery.html -- Jack Krupansky -Original Message- From: Mike Sokolov Sent: Saturday, June 16, 2012 2:33 PM To: java-user@lucene.apache.org Subject: filter by term

Re: need to find locations of query hits in doc: works fine for regular text but not for phone numbers

2012-06-14 Thread Jack Krupansky
Look at this code: QueryTermExtractor.getTerms(Query query) http://lucene.apache.org/core/3_6_0/api/contrib-highlighter/org/apache/lucene/search/highlight/QueryTermExtractor.html -- Jack Krupansky -Original Message- From: Ilya Zavorin Sent: Thursday, June 14, 2012 2:36 PM To: java

Re: need to find locations of query hits in doc: works fine for regular text but not for phone numbers

2012-06-13 Thread Jack Krupansky
Try putting the phone number in quotes in the query: String qstr = "\"800-555-1212\""; And check query.toString to see how the query parser analyzed the term, both with and without quotes. And make sure you initialized the query parser with contents as the default field. -- Jack Krupansky

Re: problem understanding the documentation for the TieredMergePolicy class

2012-06-12 Thread Jack Krupansky
size is reached. Then the code compares the number of segments eligible for merge to that limit. If over that limit, the code then scores each merge and selects the merge with the best score. -- Jack Krupansky -Original Message- From: thomas Sent: Tuesday, June 12, 2012 4:43 AM

Re: Stemming - limited index expansion

2012-06-12 Thread Jack Krupansky
. That said, maybe you could clarify your specific intent with an example. Maybe you simply want to internally call some existing stemmer filter and output both the original and stemmed term at the same location? -- Jack Krupansky -Original Message- From: Paul Hill Sent: Tuesday, June 12, 2012

Re: Stemming - limited index expansion

2012-06-12 Thread Jack Krupansky
I forgot about the Solr/Lucene code shuffling. Back in 3.4, WDF was in Solr rather than Lucene. Here's the code: http://svn.apache.org/viewvc/lucene/dev/branches/lucene_solr_3_4/solr/core/src/java/org/apache/solr/analysis/WordDelimiterFilter.java?revision=1166268&view=markup -- Jack Krupansky

Re: CPU usage increased using 3.4.0

2012-06-11 Thread Jack Krupansky
Is the process taking longer or even a lot longer than with the earlier release of Lucene? Is the amount of available JVM memory so low that repeated garbage collections might be occurring? -- Jack Krupansky -Original Message- From: Paul Hill Sent: Monday, June 11, 2012 12:56 PM

Re: easy one? IN and OR stopword help

2012-06-07 Thread Jack Krupansky
uses the default set of stopwords (which includes “or” and “in”)? Try passing null as the second argument to ClassicAnalyzer – it disables the default stop word list. -- Jack Krupansky From: Bob Rhodes Sent: Thursday, June 07, 2012 1:50 PM To: java-user@lucene.apache.org Subject: easy one

Re: RAMDirectory unexpectedly slows

2012-06-04 Thread Jack Krupansky
, so copying data to Java heap space is not useful. -- Jack Krupansky -Original Message- From: Cheng Sent: Monday, June 04, 2012 10:08 AM To: java-user@lucene.apache.org Subject: RAMDirectory unexpectedly slows Hi, My apps need to read from and write to some big indexes frequently. So

Re: OOM during IndexReader open

2012-06-02 Thread Jack Krupansky
consuming memory. -- Jack Krupansky -Original Message- From: nishesh.gu...@emc.com Sent: Friday, June 01, 2012 7:53 PM To: java-user@lucene.apache.org Subject: OOM during IndexReader open Hi, I am getting the following OOM consistently whenever the index is opened . Is it because now

Re: How to implement fuzzy phrase search with Lucene?

2012-06-01 Thread Jack Krupansky
have individual fuzzy terms within a span - which can be more complex than a simple phrase: https://issues.apache.org/jira/browse/LUCENE-2754 -- Jack Krupansky -Original Message- From: harish...@thomsonreuters.com Sent: Friday, June 01, 2012 2:50 AM To: java-user@lucene.apache.org

Re: How to rebuild index

2011-01-21 Thread Jack Krupansky
retrieve the text to be re-indexed - if and only if it was indexed in stored fields. -- Jack Krupansky -Original Message- From: 黄靖宇 Sent: Friday, January 21, 2011 4:04 AM To: java-user@lucene.apache.org Subject: How to rebuild index Hi, I am new to lucene. Recently I was assigned

Re: Performing a query on token length

2011-01-21 Thread Jack Krupansky
do exactly what you asked. -- Jack Krupansky -Original Message- From: Camden Daily Sent: Friday, January 21, 2011 10:15 AM To: java-user@lucene.apache.org Subject: Performing a query on token length Hello all, Does anyone know if it is possible in Lucene to do a query based

Re: Performing a query on token length

2011-01-21 Thread Jack Krupansky
Oops... I only solved half the problem, the other half was to limit length to 20, which would be done with a negated leading wildcard of 21 question marks: first_name:??* -first_name:?????????????????????* -- Jack Krupansky -Original Message- From: Jack Krupansky Sent
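The wildcard trick described here can be sketched as a helper that builds the query text for "token length between min and max": a pattern of min question marks plus `*` requires at least min characters, and a negated pattern of max + 1 question marks plus `*` rules out anything longer. This is plain Java with hypothetical names; the minimum of 2 used in `main` is only an illustrative value.

```java
public final class TokenLengthQuery {
    private TokenLengthQuery() {}

    private static String questionMarks(int count) {
        StringBuilder sb = new StringBuilder(count);
        for (int i = 0; i < count; i++) {
            sb.append('?');
        }
        return sb.toString();
    }

    /** Builds: field:?{min}* -field:?{max+1}* (each ? matches exactly one character). */
    public static String between(String field, int min, int max) {
        return field + ":" + questionMarks(min) + "* -"
             + field + ":" + questionMarks(max + 1) + "*";
    }

    public static void main(String[] args) {
        // Limit length to 20: the negated clause uses 21 question marks.
        System.out.println(between("first_name", 2, 20));
    }
}
```

Leading-wildcard patterns like these can be slow on large indexes, so this is better suited to occasional queries than to a hot path.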

Re: Wildcard Case Sensitivity

2011-01-20 Thread Jack Krupansky
that query term will fail to match anything because the existence of the wildcard suppresses all analysis, including the lowercasing. Once again, different query parsers may behave differently. -- Jack Krupansky -Original Message- From: Amin Mohammed-Coleman Sent: Thursday, January
