Re: Wildcard question

2013-10-09 Thread Jack Krupansky
on big indexes. * <p> * Default: false. */ @Override public void setAllowLeadingWildcard(boolean allowLeadingWildcard) { this.allowLeadingWildcard = allowLeadingWildcard; } And the default is false (leading wildcard not allowed.) -- Jack Krupansky -Original Message- From: Carlos de Luna Saenz

Re: Question about the CompoundWordTokenFilterBase

2013-09-18 Thread Jack Krupansky
Out of curiosity, what is your use case? I mean, the normal use of this filter is to permit a shorthand reference to a long term, but why would you necessarily want to preclude direct reference to the full term? -- Jack Krupansky -Original Message- From: Alex Parvulescu Sent

Re: Can you escape characters you don't want the analyzer to modify

2013-09-17 Thread Jack Krupansky
It sounds like you either need to have a custom analyzer or a field-aware analyzer. -- Jack Krupansky -Original Message- From: Scott Smith Sent: Tuesday, September 17, 2013 4:26 PM To: java-user@lucene.apache.org Subject: Can you escape characters you don't want the analyzer

Re: Query type always Boolean Query even if * and ? are present.

2013-09-12 Thread Jack Krupansky
The trailing asterisk in your query input is escaped with a backslash, so the query parser will not treat it as a wildcard. -- Jack Krupansky -Original Message- From: Ankit Murarka Sent: Thursday, September 12, 2013 10:19 AM To: java-user@lucene.apache.org Subject: Query type always
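The escaping rule can be illustrated with a minimal sketch in plain Java. It does not call Lucene; the class and method names are hypothetical, and the real query parser's handling is more involved. It only shows why a backslash-escaped `*` or `?` is not counted as a wildcard:

```java
final class WildcardEscapeSketch {
    /** Returns true if the term has a '*' or '?' that is not escaped by a preceding backslash. */
    static boolean hasUnescapedWildcard(String term) {
        for (int i = 0; i < term.length(); i++) {
            char c = term.charAt(i);
            if (c == '\\') {
                i++; // skip the character that the backslash escapes
            } else if (c == '*' || c == '?') {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(hasUnescapedWildcard("foo*"));   // true: plain trailing wildcard
        System.out.println(hasUnescapedWildcard("foo\\*")); // false: the asterisk is escaped
    }
}
```

In this sketch, removing the backslash from the input is what turns the term back into a prefix (wildcard) term.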

Re: Query type always Boolean Query even if * and ? are present.

2013-09-12 Thread Jack Krupansky
You're not escaping white space, so your input will be a sequence of terms, which should generate a BooleanQuery. What is the last clause of the BQ? It should be your PrefixQuery. -- Jack Krupansky -Original Message- From: Ankit Murarka Sent: Thursday, September 12, 2013 11:25 AM

Re: Query type always Boolean Query even if * and ? are present.

2013-09-12 Thread Jack Krupansky
want, then you need to escape it. -- Jack Krupansky -Original Message- From: Ankit Murarka Sent: Thursday, September 12, 2013 11:36 AM To: java-user@lucene.apache.org Subject: Re: Query type always Boolean Query even if * and ? are present. BingoThis has solved my case... Thanks a ton

Re: Profiling Solr Lucene for query

2013-09-08 Thread Jack Krupansky
Please send Solr-related inquiries to the Solr user list - this is the Lucene (Java) user list. -- Jack Krupansky -Original Message- From: Manuel Le Normand Sent: Sunday, September 08, 2013 7:03 AM To: java-user@lucene.apache.org Subject: Profiling Solr Lucene for query Hello all

Re: Fuzzy Searching on Lucene / Solr

2013-08-14 Thread Jack Krupansky
The limit of 2 is hard-coded precisely because good performance for editing distances above 2 cannot be guaranteed. -- Jack Krupansky -Original Message- From: Michael Tobias Sent: Wednesday, August 14, 2013 1:00 AM To: java-user@lucene.apache.org Subject: Fuzzy Searching on Lucene

Re: DV limited to 32766 ?

2013-08-09 Thread Jack Krupansky
Check out the discussion on: https://issues.apache.org/jira/browse/LUCENE-4583 StraightBytesDocValuesField fails if bytes > 32k -- Jack Krupansky -Original Message- From: Nicolas Guyot Sent: Friday, August 09, 2013 5:57 PM To: java-user@lucene.apache.org Subject: DV limited to 32766

Re: Query serialization/deserialization

2013-08-04 Thread Jack Krupansky
it would be nice to be able to emit classic Lucene query parser queries where possible Yeah, but then we hit the problem of the Query terms having been through analysis. Maybe it would be nice if we had query syntax to indicate that terms had already been analyzed. -- Jack Krupansky

Re: How to Index each file and then each Line for Complete Phrase Match. Sample Data shown.

2013-08-03 Thread Jack Krupansky
Why not start with something simple? Like, index each log line as a tokenized text field and then do PhraseQuery against that text field? Is there something else you need beyond that? -- Jack Krupansky -Original Message- From: Ankit Murarka Sent: Saturday, August 03, 2013 3:22 AM

Re: Query serialization/deserialization

2013-07-28 Thread Jack Krupansky
to serialization (I've done something similar myself.) This is what is output in the parsedquery section of debugQuery output for a Solr query response. -- Jack Krupansky -Original Message- From: Denis Bazhenov Sent: Sunday, July 28, 2013 1:59 AM To: java-user@lucene.apache.org Subject

Re: how to by pass analyzer one some fields in QueryParser ?

2013-07-28 Thread Jack Krupansky
PerFieldAnalyzerWrapper http://lucene.apache.org/core/4_4_0/analyzers-common/org/apache/lucene/analysis/miscellaneous/PerFieldAnalyzerWrapper.html This analyzer is used to facilitate scenarios where different fields require different analysis techniques. -- Jack Krupansky -Original

Re: Performance measurements

2013-07-25 Thread Jack Krupansky
though it matches everything - no scoring. -- Jack Krupansky -Original Message- From: Arjen van der Meijden Sent: Thursday, July 25, 2013 3:06 PM To: java-user@lucene.apache.org Subject: Re: Performance measurements Hi Sriram, I don't see any obvious mistakes, although you don't need

Re: Performance measurements

2013-07-24 Thread Jack Krupansky
, but more than that is uncharted territory that risks queries taking more than half a second or even multiple seconds and requires a proof of concept implementation to validate reasonable query times. -- Jack Krupansky -Original Message- From: Sriram Sankar Sent: Wednesday, July 24

Re: Performance measurements

2013-07-24 Thread Jack Krupansky
-search. As Adrien indicates, try using raw Lucene filters and you should get much better results. Whether even that will compete with a use-case-specific (graph) search engine remains to be seen. -- Jack Krupansky -Original Message- From: Sriram Sankar Sent: Wednesday, July 24, 2013

Re: Performance measurements

2013-07-24 Thread Jack Krupansky
be wrapped as a CSQ for search so that no scoring would be done. -- Jack Krupansky -Original Message- From: Sriram Sankar Sent: Wednesday, July 24, 2013 3:58 PM To: java-user@lucene.apache.org Subject: Re: Performance measurements On Wed, Jul 24, 2013 at 10:24 AM, Jack Krupansky j

Re: QueryParser for DisjunctionMaxQuery, et al.

2013-07-23 Thread Jack Krupansky
and manipulate and then regenerate a true source query that doesn't have analysis or enrichment (except as the application may explicitly have performed on the tree.) -- Jack Krupansky -Original Message- From: Beale, Jim (US-KOP) Sent: Tuesday, July 23, 2013 10:07 AM To: java-user

Re: Trying to search java.lang.NullPointerException in log file.

2013-07-22 Thread Jack Krupansky
text at query time. What is the exact query text and what are the exact analyzer tokens for that query text and how many are there? -- Jack Krupansky -Original Message- From: Ankit Murarka Sent: Monday, July 22, 2013 10:29 AM To: java-user@lucene.apache.org Subject: Re: Trying to search

Re: Trying to search java.lang.NullPointerException in log file.

2013-07-22 Thread Jack Krupansky
to forget to do that. If you don't, then you will have to hand-analyze the query string and simulate exactly what the standard analyzer did at index time. So, please clarify your situation. -- Jack Krupansky -Original Message- From: Ankit Murarka Sent: Monday, July 22, 2013 6:24 AM To: java

Re: Indexing into SolrCloud

2013-07-18 Thread Jack Krupansky
Sorry, but you need to resend this message to the Solr user list - this is the Lucene user list. -- Jack Krupansky -Original Message- From: Beale, Jim (US-KOP) Sent: Thursday, July 18, 2013 12:34 PM To: java-user@lucene.apache.org Subject: Indexing into SolrCloud Hey folks, I've

Re: Searching for words begining with or

2013-07-18 Thread Jack Krupansky
it, or are you using some other analyzer? -- Jack Krupansky -Original Message- From: ABlaise Sent: Thursday, July 18, 2013 9:19 PM To: java-user@lucene.apache.org Subject: Searching for words begining with or Hi everyone, I am new to this forum, I have made some research for my question

Re: Searching for words begining with or

2013-07-18 Thread Jack Krupansky
( a, an, and, are, as, at, be, but, by, for, if, in, into, is, it, no, not, of, on, or, such, that, the, their, then, there, these, they, this, to, was, will, with ); And... or is on that list! So, the standard analyzer is removing or from the index! That's why the query can't find it. Unless you really want these stop words removed, construct your own analyzer that does not do stop word removal. -- Jack Krupansky -Original Message- From: ABlaise Sent
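The effect described above can be sketched in plain Java. This is a simplified stand-in for an analyzer, not Lucene code: it lowercases, splits on whitespace, and optionally drops the stop words quoted in the message. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

final class StopWordSketch {
    // The stop-word list quoted in the message above.
    static final Set<String> STOP_WORDS = new HashSet<>(Arrays.asList(
        "a", "an", "and", "are", "as", "at", "be", "but", "by", "for", "if",
        "in", "into", "is", "it", "no", "not", "of", "on", "or", "such", "that",
        "the", "their", "then", "there", "these", "they", "this", "to", "was",
        "will", "with"));

    /** Lowercases, splits on whitespace, and optionally drops stop words. */
    static List<String> analyze(String text, boolean removeStopWords) {
        List<String> tokens = new ArrayList<>();
        for (String raw : text.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (raw.isEmpty()) continue;                                 // leading whitespace yields an empty piece
            if (removeStopWords && STOP_WORDS.contains(raw)) continue;  // "or" is dropped here
            tokens.add(raw);
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(analyze("black or white", true));  // "or" never reaches the index
        System.out.println(analyze("black or white", false)); // "or" is kept and searchable
    }
}
```

With stop-word removal on, the token "or" is never indexed, so no query for it can match; with removal off, it is indexed like any other term.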

Re: What is text searching algorithm in Lucene 4.3.1

2013-07-17 Thread Jack Krupansky
class. Unfortunately, that has less Javadoc, although it does cite a key paper on that approach. -- Jack Krupansky -Original Message- From: Erick Erickson Sent: Wednesday, July 17, 2013 8:17 AM To: java-user Subject: Re: What is text searching algorithm in Lucene 4.3.1 Note: as of Lucene

Re: Query expansion in Lucene (4.x)

2013-07-17 Thread Jack Krupansky
: http://en.wikipedia.org/wiki/Query_expansion Lucid: http://docs.lucidworks.com/display/help/Unsupervised+Feedback+Options http://docs.lucidworks.com/display/lweug/Understanding+and+Improving+Relevance#UnderstandingandImprovingRelevance-UnsupervisedFeedback -- Jack Krupansky -Original Message

Re: What is text searching algorithm in Lucene 4.3.1

2013-07-16 Thread Jack Krupansky
The source code is what most people use to understand how Lucene actually works. In some cases the Javadoc comments will point to published papers or web sites for algorithms or approaches. -- Jack Krupansky -Original Message- From: Vinh Đặng Sent: Tuesday, July 16, 2013 10:54 PM

Re: [ANNOUNCE] Web Crawler

2013-07-15 Thread Jack Krupansky
that anybody on this mailing list would engage in such an unethical or unprofessional activity. -- Jack Krupansky -Original Message- From: Ramakrishna Sent: Monday, July 15, 2013 9:13 AM To: java-user@lucene.apache.org Subject: Re: [ANNOUNCE] Web Crawler Hi.. I'm trying nutch to crawl

Re: Lucene in Action

2013-07-10 Thread Jack Krupansky
into underlying Lucene concepts for Solr users, such as the structure of Query objects and tokenization and token filtering, mostly since advanced Solr users run into issues there, but those areas are difficult for Lucene users as well. My e-book: http://www.lulu.com/shop/jack-krupansky/solr-4x-deep-dive

Re: Lucene in Action

2013-07-10 Thread Jack Krupansky
are still valid. -- Jack Krupansky -Original Message- From: Ivan Brusic Sent: Wednesday, July 10, 2013 10:41 AM To: java-user@lucene.apache.org Subject: Re: Lucene in Action Jack, don't you also have a book coming out on O'Reilly? http://shop.oreilly.com/product/0636920028765.do Lucene

Re: Please Help solve problem of bad read performance in lucene 4.2.1

2013-07-07 Thread Jack Krupansky
To be clear, Lucene and Solr are search engines, NOT storage engines. Has someone claimed otherwise to you? What is your query performance in 4.x vs. 3.x? That's the true, proper measure of Lucene and Solr performance. -- Jack Krupansky -Original Message- From: Chris Zhang Sent

Re: Forcing lucene to use specific field when processing parsed query

2013-07-06 Thread Jack Krupansky
of the getters, you create a new Query object of the current type (the type you referenced in the instanceof) and return that new Query object. Recursion would return the new Query object. -- Jack Krupansky -Original Message- From: Puneet Pawaia Sent: Saturday, July 06, 2013 12:54 PM

Re: handling nonexistent fields in an index

2013-07-03 Thread Jack Krupansky
There is a Lucene filter that you can use to check efficiently for whether a field has a value or not. new ConstantScoreQuery(new FieldValueFilter(String field, boolean negate)) -- Jack Krupansky -Original Message- From: David Carlton Sent: Wednesday, July 03, 2013 4:27 PM To: java

Re: highlighting component to searchComponent

2013-07-01 Thread Jack Krupansky
Try asking your question on the “Solr user” email list – this is the Lucene user list! -- Jack Krupansky From: Adrien RUFFIE Sent: Monday, July 01, 2013 4:36 AM To: java-user@lucene.apache.org Subject: highlighting component to searchComponent Hello all I had the following configuration

Re: How to Perform a Full Text Search on a Number with Leading Zeros or Decimals?

2013-06-28 Thread Jack Krupansky
The user could use a regular expression query to match the numbers, but otherwise, you will have to write some specialized token filter to recognize numeric tokens and generate extra tokens at the same position for each token variant that you want to search for. -- Jack Krupansky
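As a rough illustration of the "extra tokens per variant" idea, here is a plain-Java sketch of the variant-generation step only. It is not a Lucene TokenFilter. Which variants are wanted is an assumption: here, the original token plus a form with leading and trailing zeros stripped. All names are hypothetical.

```java
import java.math.BigDecimal;
import java.util.LinkedHashSet;
import java.util.Set;

final class NumericVariantSketch {
    /** Returns the token itself plus, for plain decimal numbers, a zero-stripped form. */
    static Set<String> variants(String token) {
        Set<String> out = new LinkedHashSet<>();
        out.add(token);
        // Only digits with an optional fractional part count as numeric in this sketch.
        if (token.matches("\\d+(\\.\\d+)?")) {
            BigDecimal value = new BigDecimal(token);
            out.add(value.stripTrailingZeros().toPlainString());
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(variants("007"));   // [007, 7]
        System.out.println(variants("12.50")); // [12.50, 12.5]
        System.out.println(variants("abc"));   // [abc]
    }
}
```

A real token filter would emit each extra variant with a position increment of 0 so that it shares the original token's position.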

Re: Language detection

2013-06-27 Thread Jack Krupansky
sending the document to Solr. Tika also has language detection, so you could call Tika from an external process before sending the document to Solr. -- Jack Krupansky -Original Message- From: Hang Mang Sent: Thursday, June 27, 2013 11:45 AM To: java-user@lucene.apache.org Subject

Re: Language detection

2013-06-27 Thread Jack Krupansky
Oops... sorry, I just realized this was on the Lucene-user list. My response was for Solr-ONLY! -- Jack Krupansky -Original Message- From: Jack Krupansky Sent: Thursday, June 27, 2013 1:11 PM To: java-user@lucene.apache.org Subject: Re: Language detection You can use

Re: Questions about doing a full text search with numeric values

2013-06-27 Thread Jack Krupansky
do a better job with punctuation. -- Jack Krupansky -Original Message- From: Todd Hunt Sent: Thursday, June 27, 2013 1:14 PM To: java-user@lucene.apache.org Subject: Questions about doing a full text search with numeric values I am working on an application that is using Tika to index

Re: Securing stored data using Lucene

2013-06-26 Thread Jack Krupansky
is to retrieve an encrypted blob based on an encrypted key, why are you even considering Lucene? -- Jack Krupansky -Original Message- From: Rafaela Voiculescu Sent: Wednesday, June 26, 2013 5:06 AM To: java-user@lucene.apache.org Subject: Re: Securing stored data using Lucene Hello, Thank you

Re: New Lucene User

2013-06-17 Thread Jack Krupansky
Try starting with Solr. You can have your search server up and running without writing any code. And Solr's Data Import Handler can load data direct from the database. -- Jack Krupansky -Original Message- From: raghavendra.k@barclays.com Sent: Monday, June 17, 2013 5:03 PM

Re: compare paragraphs of text - which Query Class to use?

2013-06-14 Thread Jack Krupansky
technique for detecting plagiarism where a lot of the text is similar if not identical. Once you get experience using this technique in Solr, then simply look at the parsed query that edismax generates and do the same in your Lucene Java code. -- Jack Krupansky -Original Message- From

Re: Lucene Indexes explanantion

2013-06-10 Thread Jack Krupansky
code using Lucene. Otherwise, you won't have enough context to understand or even ask intelligent questions. -- Jack Krupansky -Original Message- From: nikhil desai Sent: Monday, June 10, 2013 1:24 PM To: java-user@lucene.apache.org Subject: Lucene Indexes explanantion Hello, My

Re: Lucene Indexes explanantion

2013-06-10 Thread Jack Krupansky
Your stored value could be very different from your indexed (searchable) value. You can also associate payloads with an indexed term. And there are DocValues as well. -- Jack Krupansky -Original Message- From: nikhil desai Sent: Monday, June 10, 2013 8:06 PM To: java-user

Re: Lucene Indexes explanantion

2013-06-10 Thread Jack Krupansky
: - Indexed terms - Stored values - Payloads - DocValues -- Jack Krupansky -Original Message- From: nikhil desai Sent: Monday, June 10, 2013 8:36 PM To: java-user@lucene.apache.org Subject: Re: Lucene Indexes explanantion I don't think I could get much from what you said, could you please

Re: Getting position increments directly from the the index

2013-05-23 Thread Jack Krupansky
It might be nice to inquire as to the largest position for a field in a document. Is that information kept anywhere? Not that I know of, although I suppose it can be calculated at runtime by running through all the terms of the field. Then he could just divide by 1000. -- Jack Krupansky

Re: Getting position increments directly from the the index

2013-05-23 Thread Jack Krupansky
Take a look at the Term Vectors Component: http://wiki.apache.org/solr/TermVectorComponent -- Jack Krupansky -Original Message- From: Igor Shalyminov Sent: Thursday, May 23, 2013 9:54 AM To: java-user@lucene.apache.org Subject: Re: Getting position increments directly from

Re: Getting position increments directly from the the index

2013-05-23 Thread Jack Krupansky
If you add a special end of document term then some of these calculations might be easier. And, give that special term a payload of the sentence count. While you're at it, insert end of sentence terms that could have a payload of the sentence number. -- Jack Krupansky -Original

Re: Case insensitive StringField?

2013-05-21 Thread Jack Krupansky
to trim exterior white space and normalize interior white space. -- Jack Krupansky -Original Message- From: Shahak Nagiel Sent: Tuesday, May 21, 2013 10:06 AM To: java-user@lucene.apache.org Subject: Case insensitive StringField? It appears that StringField instances are treated
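One way to picture that normalization is as a pre-processing step applied to the value both before indexing and before querying. The sketch below is plain Java, not a Lucene analyzer; lowercasing is included on the assumption that case-insensitive matching is the goal, and the names are hypothetical.

```java
import java.util.Locale;

final class KeyNormalizerSketch {
    /** Trims the ends, collapses interior whitespace runs to one space, and lowercases. */
    static String normalize(String value) {
        return value.trim().replaceAll("\\s+", " ").toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        System.out.println(normalize("  New   York ")); // new york
    }
}
```

The same function has to be applied on both the index side and the query side, otherwise the normalized and raw forms will not match.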

Re: Query with phrases, wildcards and fuzziness

2013-05-21 Thread Jack Krupansky
Just escape embedded spaces with a backslash. -- Jack Krupansky -Original Message- From: Ross Simpson Sent: Tuesday, May 21, 2013 8:08 PM To: java-user@lucene.apache.org Subject: Query with phrases, wildcards and fuzziness Hi all, I'm trying to create a fairly complex query

Re: Case insensitive StringField?

2013-05-21 Thread Jack Krupansky
Yes it is. It always will. But... you can escape the spaces with a backslash: Query q = qp.parse("new\\ york"); -- Jack Krupansky -Original Message- From: Shahak Nagiel Sent: Tuesday, May 21, 2013 10:09 PM To: java-user@lucene.apache.org Subject: Re: Case insensitive StringField? Jack
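When the term comes from user input rather than a literal, the backslashes can be added programmatically. The helper below is a hypothetical plain-Java sketch that escapes only whitespace; it is not a Lucene API and does not handle the other query-syntax characters.

```java
final class SpaceEscapeSketch {
    /** Prefixes every whitespace character with a backslash so the parser keeps the term whole. */
    static String escapeSpaces(String term) {
        StringBuilder sb = new StringBuilder(term.length() + 4);
        for (int i = 0; i < term.length(); i++) {
            char c = term.charAt(i);
            if (Character.isWhitespace(c)) {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(escapeSpaces("new york")); // new\ york
    }
}
```

Note that in Java source the single backslash has to be written as `\\` inside a string literal, which is why the literal in the message contains two.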

Re: classic.QueryParser - bug or new behavior?

2013-05-19 Thread Jack Krupansky
Yeah, just go ahead and escape the slash, either with a backslash or by enclosing the whole term in quotes. Otherwise the slash (even embedded in the middle of a term!) indicates the start of a regex query term. -- Jack Krupansky -Original Message- From: Scott Smith Sent: Sunday

Re: lucene and mongodb

2013-05-14 Thread Jack Krupansky
Revolution, but my proposal was not accepted.) See: http://www.datastax.com/what-we-offer/products-services/datastax-enterprise As it says, DataStax Enterprise is completely free for development work. -- Jack Krupansky -Original Message- From: Rider Carrion Cleger Sent: Tuesday, May 14, 2013

Re: [PhraseQuery] Can jakarta apache~10 be searched by offset ?

2013-05-13 Thread Jack Krupansky
a Jira for a new Lucene Query for phrase and or span queries that measures distance by offsets rather than positions. -- Jack Krupansky -Original Message- From: wgggfiy Sent: Monday, May 13, 2013 3:47 AM To: java-user@lucene.apache.org Subject: Re: [PhraseQuery] Can jakarta apache~10

Re: [PhraseQuery] Can jakarta apache~10 be searched by offset ?

2013-05-06 Thread Jack Krupansky
Do you mean the raw character offsets of the starting and ending characters of the terms? No. Although, if you index the text as a raw string, you might be able to come up with a regex query like jakarta.{1,10}apache -- Jack Krupansky -Original Message- From: wgggfiy Sent: Monday
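To see what that pattern accepts, it can be tried against whole strings with `java.util.regex`. This is only an illustration: Lucene's regex queries use their own regular-expression dialect and match against indexed terms, so behavior there may differ. The class and method names are hypothetical.

```java
import java.util.regex.Pattern;

final class CharDistanceRegexSketch {
    // "jakarta", then 1 to 10 arbitrary characters, then "apache".
    static final Pattern NEAR = Pattern.compile("jakarta.{1,10}apache");

    /** True if the entire value matches, mirroring a query against a single raw-string term. */
    static boolean matchesWholeValue(String value) {
        return NEAR.matcher(value).matches();
    }

    public static void main(String[] args) {
        System.out.println(matchesWholeValue("jakarta project apache"));     // true: 9 characters in between
        System.out.println(matchesWholeValue("jakarta 12345678901 apache")); // false: 13 characters in between
    }
}
```

The gap is counted in characters, including the spaces, which is what distinguishes this from a position-based slop.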

Re: Multiple PositionIncrement attributes

2013-04-25 Thread Jack Krupansky
length, say 500. The SpanNearQuery would use n-1 times the sentence separation plus the maximum sentence length. Well, you have to adjust that for how you count sentences - is 1 the current sentence or is that 0? -- Jack Krupansky -Original Message- From: Igor Shalyminov Sent: Thursday
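The slop arithmetic can be written down directly. In the sketch below, `n` counts the current sentence as 1, and the separation of 1000 used in the example is a hypothetical value chosen for illustration; only the maximum sentence length of 500 comes from the message. The names are hypothetical.

```java
final class SentenceSlopSketch {
    /**
     * Slop for "within n sentences": (n - 1) sentence separations plus the
     * longest sentence. Here n = 1 means "same sentence".
     */
    static int slopForSentences(int n, int sentenceSeparation, int maxSentenceLength) {
        if (n < 1) {
            throw new IllegalArgumentException("n must be at least 1");
        }
        return (n - 1) * sentenceSeparation + maxSentenceLength;
    }

    public static void main(String[] args) {
        System.out.println(slopForSentences(1, 1000, 500)); // 500
        System.out.println(slopForSentences(3, 1000, 500)); // 2500
    }
}
```

If sentences are counted from 0 instead, the `n - 1` becomes `n`, which is the adjustment the message refers to.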

Re: WhitespaceTokenizer, incrementToke() ArrayOutOfBoundException

2013-04-15 Thread Jack Krupansky
I didn't read your code, but do you have the reset that is now mandatory and throws AIOOBE if not present? -- Jack Krupansky -Original Message- From: andi rexha Sent: Monday, April 15, 2013 10:21 AM To: java-user@lucene.apache.org Subject: WhitespaceTokenizer, incrementToke

Re: WhitespaceTokenizer, incrementToke() ArrayOutOfBoundException

2013-04-15 Thread Jack Krupansky
of a contract violation). So, I should have said that the contract was mandatory but not enforced... which from a practical perspective negates its mandatory contractual value. -- Jack Krupansky -Original Message- From: Uwe Schindler Sent: Monday, April 15, 2013 11:53 AM To: java-user

Re: How to index Sharepoint files with Lucene

2013-04-10 Thread Jack Krupansky
, or maybe even send each file directly into Tika and then directly index the content into Lucene, if that's what you want. In any case, MCF handles the SharePoint access and crawling. See: http://manifoldcf.apache.org/en_US/index.html -- Jack Krupansky -Original Message- From: Álvaro

Re: MLT Using a Query created in a different index

2013-04-05 Thread Jack Krupansky
In a statistical sense, for the majority of documents, yes, but you could probably find quite a few outlier examples where the results from A to B or from B to A are significantly or even completely different or even non-existent. -- Jack Krupansky -Original Message- From: Peter

Re: MLT Using a Query created in a different index

2013-04-04 Thread Jack Krupansky
or terrible - the selected/query document may not have any representation in the target corpus. -- Jack Krupansky -Original Message- From: Peter Lavin Sent: Thursday, April 04, 2013 1:06 PM To: java-user@lucene.apache.org Subject: MLT Using a Query created in a different index Dear Users, I

Re: Indexing a long list

2013-03-31 Thread Jack Krupansky
whether v(i) or v(j) are or are not present as keywords. -- Jack Krupansky -Original Message- From: Paul Bell Sent: Sunday, March 31, 2013 8:21 AM To: java-user@lucene.apache.org Subject: Indexing a long list Hi All, Suppose I need to index a property whose value is a long list of terms

Re: Indexing a long list

2013-03-31 Thread Jack Krupansky
Multivalued fields are the other approach to keyword value pairs. And if you can denormalize your data, storing structure as separate documents can make sense and support more powerful queries. Although the join capabilities are rather limited. -- Jack Krupansky -Original Message

Re: Multi-value fields in Lucene 4.1

2013-03-22 Thread Jack Krupansky
I don't think there is a way of identifying which of the values of a multivalued field matched. But... I haven't checked the code to be absolutely certain whether there isn't some expert way. Also, realize that multiple values could match, such as if you queried for B*. -- Jack Krupansky

Re: Accent insensitive analyzer

2013-03-22 Thread Jack Krupansky
Try the ASCII Folding Filter: https://lucene.apache.org/core/4_2_0/analyzers-common/org/apache/lucene/analysis/miscellaneous/ASCIIFoldingFilter.html -- Jack Krupansky -Original Message- From: Jerome Blouin Sent: Friday, March 22, 2013 12:22 PM To: java-user@lucene.apache.org Subject

Re: Accent insensitive analyzer

2013-03-22 Thread Jack Krupansky
Start with the Standard Tokenizer: https://lucene.apache.org/core/4_2_0/analyzers-common/org/apache/lucene/analysis/standard/StandardTokenizer.html -- Jack Krupansky -Original Message- From: Jerome Blouin Sent: Friday, March 22, 2013 12:53 PM To: java-user@lucene.apache.org Subject

Re: Getting documents from suggestions

2013-03-16 Thread Jack Krupansky
I don't have time right now to debug your code, but make sure that the analysis is consistent between index and query. For example, Apache vs. apache. -- Jack Krupansky -Original Message- From: Bratislav Stojanovic Sent: Saturday, March 16, 2013 7:29 AM To: java-user

Re: Getting documents from suggestions

2013-03-14 Thread Jack Krupansky
Could you give us some examples of what you expect? I mean, how is your suggested set of documents any different from simply executing a query with the list of suggested terms (using q.op=OR)? Or, maybe you want something like MoreLikeThis? -- Jack Krupansky -Original Message- From

Re: Getting documents from suggestions

2013-03-14 Thread Jack Krupansky
Let's refine this... If a top suggestion is X, do you simply want to know a few of the documents which have the highest term frequency for X? Or is there some other term-oriented metric you might propose? -- Jack Krupansky -Original Message- From: Bratislav Stojanovic Sent

Re: Boolean Query not working in Lucene 4.0

2013-02-26 Thread Jack Krupansky
Try detailing both your expected behavior and the actual behavior. Try providing an actual code snippet and actual index and query data. Is it failing for all types and titles or just for some? -- Jack Krupansky -Original Message- From: saisantoshi Sent: Tuesday, February 26, 2013 6

Re: possible bug on Spellchecker

2013-02-20 Thread Jack Krupansky
Any reason that you are not using the DirectSpellChecker? See: http://lucene.apache.org/core/4_0_0/suggest/org/apache/lucene/search/spell/DirectSpellChecker.html -- Jack Krupansky -Original Message- From: Samuel García Martínez Sent: Wednesday, February 20, 2013 3:34 PM To: java-user

Re: Grouping and tokens

2013-02-18 Thread Jack Krupansky
Please clarify exactly what you want to group by - give a specific example that makes it clear what terms should affect grouping and which shouldn't. -- Jack Krupansky -Original Message- From: Ramprakash Ramamoorthy Sent: Monday, February 18, 2013 6:12 AM To: java-user

Re: Grouping and tokens

2013-02-18 Thread Jack Krupansky
Okay, so, fields that would normally need to be tokenized must be stored as both raw strings for grouping and tokenized text for keyword search. Simply use copyField to copy from one to the other. -- Jack Krupansky -Original Message- From: Ramprakash Ramamoorthy Sent: Monday

Re: Grouping and tokens

2013-02-18 Thread Jack Krupansky
Oops, sorry for the Solr answer. In Lucene you need to simply index the same value, once as a raw string and a second time as a tokenized text field. Grouping would use the raw string version of the data. -- Jack Krupansky -Original Message- From: Jack Krupansky Sent: Monday

Re: fuzzy queries

2013-02-09 Thread Jack Krupansky
a document if EITHER term matches. So, if NEITHER matches (within an editing distance of 2), the document is not a match. -- Jack Krupansky -Original Message- From: Pierre Antoine DuBoDeNa Sent: Saturday, February 09, 2013 12:52 PM To: java-user@lucene.apache.org Subject: Re: fuzzy queries

Re: Wildcard in a text field

2013-02-08 Thread Jack Krupansky
an analyzer that preserves them since they generally will be treated as spaces. -- Jack Krupansky -Original Message- From: Nicolas Roduit Sent: Friday, February 08, 2013 2:49 AM To: java-user@lucene.apache.org Subject: Wildcard in a text field I'm looking for a way of making a query

Re: Wildcard in a text field

2013-02-08 Thread Jack Krupansky
Ah, okay... some people call that prospective search. In any case, there is no direct Lucene support that I know of. There are some references here: http://lucene.apache.org/core/4_0_0/memory/org/apache/lucene/index/memory/MemoryIndex.html -- Jack Krupansky -Original Message- From

Re: How to find related words ?

2013-01-31 Thread Jack Krupansky
of the box. -- Jack Krupansky -Original Message- From: Andrew Gilmartin Sent: Thursday, January 31, 2013 9:04 AM To: java-user@lucene.apache.org Subject: Re: How to find related words ? wgggfiy wrote: en, it seems nice, but I'm puzzled by you and Andrew Gilmartina above, what's the difference

Re: How to find related words ?

2013-01-30 Thread Jack Krupansky
keyword(s) and ask Lucene to extract relevant terms from the top document(s). -- Jack Krupansky -Original Message- From: wgggfiy Sent: Wednesday, January 30, 2013 12:27 PM To: java-user@lucene.apache.org Subject: How to find related words ? In short, you put in a term like Lucene

Re: Questions about FuzzyQuery in Lucene 4.x

2013-01-29 Thread Jack Krupansky
That depends on the value of ed, and the indexed data. Another factor to take into consideration is that a case change (Star vs. star) also counts as an edit. -- Jack Krupansky -Original Message- From: George Kelvin Sent: Tuesday, January 29, 2013 11:49 AM To: java-user

Re: Questions about FuzzyQuery in Lucene 4.x

2013-01-29 Thread Jack Krupansky
and the query - all the literals. In other words, construct a minimal test case that shows the failure. -- Jack Krupansky -Original Message- From: George Kelvin Sent: Tuesday, January 29, 2013 12:28 PM To: java-user@lucene.apache.org Subject: Re: Questions about FuzzyQuery in Lucene 4.x

Re: Questions about FuzzyQuery in Lucene 4.x

2013-01-28 Thread Jack Krupansky
Let's see your code that calls FuzzyQuery. If you happen to pass a prefixLength (3rd parameter) of 3 or more, then ster would not match star (but prefixLength of 2 would match). -- Jack Krupansky -Original Message- From: George Kelvin Sent: Monday, January 28, 2013 5:31 PM To: java
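The interaction between prefix length and edit distance can be modeled in a few lines of plain Java. This is a simplified model, not Lucene's implementation: it uses textbook Levenshtein distance, with no transposition handling and no automaton, and all names are hypothetical.

```java
final class FuzzyPrefixSketch {
    /** Classic Levenshtein distance (insert, delete, substitute), using two rolling rows. */
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j;
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev; prev = curr; curr = tmp; // the row just filled becomes "prev"
        }
        return prev[b.length()];
    }

    /** The first prefixLength characters must match exactly; the rest may differ by maxEdits. */
    static boolean fuzzyMatches(String query, String indexed, int maxEdits, int prefixLength) {
        int p = Math.min(prefixLength, query.length());
        if (!indexed.startsWith(query.substring(0, p))) return false;
        return editDistance(query, indexed) <= maxEdits;
    }

    public static void main(String[] args) {
        System.out.println(fuzzyMatches("ster", "star", 2, 2)); // true: "st" matches, one edit
        System.out.println(fuzzyMatches("ster", "star", 2, 3)); // false: "ste" is not a prefix of "star"
        System.out.println(editDistance("Star", "star"));       // 1: a case change is an edit
    }
}
```

The last line also shows why unanalyzed, mixed-case terms can consume part of the edit budget.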

Re: Readers for extracting textual info from pd/doc/excel for indexing the actual content

2013-01-27 Thread Jack Krupansky
it works: http://wiki.apache.org/solr/ExtractingRequestHandler -- Jack Krupansky -Original Message- From: Adrien Grand Sent: Sunday, January 27, 2013 12:53 PM To: java-user@lucene.apache.org Subject: Re: Readers for extracting textual info from pd/doc/excel for indexing the actual content

Re: Readers for extracting textual info from pd/doc/excel for indexing the actual content

2013-01-27 Thread Jack Krupansky
it, and Solr is based on Lucene. -- Jack Krupansky -Original Message- From: saisantoshi Sent: Sunday, January 27, 2013 2:09 PM To: java-user@lucene.apache.org Subject: Re: Readers for extracting textual info from pd/doc/excel for indexing the actual content We are not using Solr

Re: Indexing multiple fields with one document position

2013-01-21 Thread Jack Krupansky
Send the same input text to two different analyzers for two separate fields. The first analyzer emits only the first attribute. The second analyzer emits only the second attribute. The document position in one will correspond to the document position in the other. -- Jack Krupansky

Re: SpanNearQuery with two boundaries

2013-01-18 Thread Jack Krupansky
+1 I think that accurately states the semantics of the operation you want. -- Jack Krupansky -Original Message- From: Alan Woodward Sent: Friday, January 18, 2013 1:08 PM To: java-user@lucene.apache.org Subject: Re: SpanNearQuery with two boundaries Hi Igor, You could try wrapping

Re: Combine two BooleanQueries by a SpanNearQuery.

2013-01-17 Thread Jack Krupansky
You need to express the boolean query solely in terms of SpanOrQuery and SpanNearQuery. If you can't, ... then it probably can't be done, but you should be able to. How about starting with a plain English description of the problem you are trying to solve? -- Jack Krupansky -Original

Re: Combine two BooleanQueries by a SpanNearQuery.

2013-01-17 Thread Jack Krupansky
exclude terms from a span. -- Jack Krupansky -Original Message- From: Michel Conrad Sent: Thursday, January 17, 2013 12:14 PM To: java-user@lucene.apache.org Subject: Re: Combine two BooleanQueries by a SpanNearQuery. The problem I would like to solve is to have two queries that I will get

Re: Lucene-MoreLikethis

2013-01-15 Thread Jack Krupansky
setMinTermFreq. The default is 2. You don't have any terms with a term frequency above 1. -- Jack Krupansky -Original Message- From: Thomas Keller Sent: Tuesday, January 15, 2013 3:22 PM To: java-user@lucene.apache.org Subject: Lucene-MoreLikethis Hey, I have a question about
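The effect of a minimum term frequency can be shown with a small plain-Java sketch. It is a simplified model of the selection step only, not the MoreLikeThis implementation, and the names are hypothetical: terms that occur fewer than `minTermFreq` times in the source text are discarded.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

final class MinTermFreqSketch {
    /** Returns the terms whose frequency in the text is at least minTermFreq. */
    static List<String> interestingTerms(String text, int minTermFreq) {
        Map<String, Integer> freq = new LinkedHashMap<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (!t.isEmpty()) {
                freq.merge(t, 1, Integer::sum); // count occurrences per term
            }
        }
        List<String> kept = new ArrayList<>();
        for (Map.Entry<String, Integer> e : freq.entrySet()) {
            if (e.getValue() >= minTermFreq) {
                kept.add(e.getKey());
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        System.out.println(interestingTerms("lucene search engine", 2)); // []: every term occurs once
        System.out.println(interestingTerms("lucene search engine", 1)); // all three terms survive
    }
}
```

With short documents where every term occurs once, a threshold of 2 leaves nothing to build a query from, so lowering it to 1 is the usual fix.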

Re: FuzzyQuery in lucene 4.0

2013-01-09 Thread Jack Krupansky
FWIW, new FuzzyQuery(term, 2, 0) is the same as new FuzzyQuery(term), given the current values of defaultMaxEdits (2) and defaultPrefixLength (0). -- Jack Krupansky -Original Message- From: Ian Lea Sent: Wednesday, January 09, 2013 9:44 AM To: java-user@lucene.apache.org Subject: Re

Re: Differences in MLT Query Terms Question

2013-01-08 Thread Jack Krupansky
MoreLikeThis? Or, possibly arv appears later in a document on the second run, after the number of tokens specified by maxNumTokensParsed. -- Jack Krupansky -Original Message- From: Peter Lavin Sent: Tuesday, January 08, 2013 1:46 PM To: java-user@lucene.apache.org Subject: Differences in MLT

Re: TokenFilter state question

2012-12-26 Thread Jack Krupansky
You need a reset method that calls the super reset to reset the parent state and then reset your own state. http://lucene.apache.org/core/4_0_0/core/org/apache/lucene/analysis/TokenStream.html#reset() You probably don't have one, so only the parent state gets reset. -- Jack Krupansky
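
The reset pattern described above can be sketched without Lucene on the classpath. The classes below are hypothetical stand-ins (ParentStream for the parent stream, CountingFilter for a custom filter with its own state), not the Lucene TokenStream API; the point is only the order of operations inside reset().

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public final class ResetSketch {

    /** Hypothetical stand-in for a parent stream that tracks its own position. */
    static class ParentStream {
        private final List<String> tokens;
        private int position; // parent state

        ParentStream(List<String> tokens) {
            this.tokens = tokens;
        }

        /** Returns the next token, or null when the stream is exhausted. */
        String next() {
            return position < tokens.size() ? tokens.get(position++) : null;
        }

        void reset() {
            position = 0;
        }
    }

    /** Hypothetical filter that prefixes each token with a running count. */
    static class CountingFilter extends ParentStream {
        private int emitted; // the filter's own state

        CountingFilter(List<String> tokens) {
            super(tokens);
        }

        @Override
        String next() {
            String token = super.next();
            return token == null ? null : (++emitted) + ":" + token;
        }

        @Override
        void reset() {
            super.reset(); // reset the parent state first
            emitted = 0;   // then reset the filter's own state
        }
    }

    /** Reads the stream to exhaustion and collects what it emitted. */
    static List<String> drain(ParentStream stream) {
        List<String> out = new ArrayList<>();
        for (String t = stream.next(); t != null; t = stream.next()) {
            out.add(t);
        }
        return out;
    }

    public static void main(String[] args) {
        CountingFilter filter = new CountingFilter(Arrays.asList("a", "b"));
        System.out.println(drain(filter)); // prints [1:a, 2:b]
        filter.reset();
        System.out.println(drain(filter)); // prints [1:a, 2:b]
    }
}
```

If CountingFilter did not override reset(), the second pass in this sketch would print [3:a, 4:b]: the parent position would be rewound but the counter would carry over, which is the kind of stale state the message describes.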

Re: Which token filter can combine 2 terms into 1?

2012-12-26 Thread Jack Krupansky
Ah! You're quoting full phrases. You weren't clear about that originally. Thanks for the clarification. -- Jack Krupansky -Original Message- From: Tom Sent: Wednesday, December 26, 2012 5:54 PM To: java-user@lucene.apache.org Subject: Re: Which token filter can combine 2 terms into 1

Re: TokenFilter state question

2012-12-26 Thread Jack Krupansky
: String querystr = "product:(Spring Framework Core) vendor:(SpringSource)"; to String querystr = "product:\"Spring Framework Core\" vendor:(SpringSource)"; -- Jack Krupansky -Original Message- From: Jeremy Long Sent: Wednesday, December 26, 2012 5:52 PM To: java-user

Re: Retrieving granular scores back from Lucene/SOLR

2012-12-25 Thread Jack Krupansky
(org.apache.lucene.search.Query, int) -- Jack Krupansky -Original Message- From: Vishwas Goel Sent: Tuesday, December 25, 2012 11:30 PM To: java-user@lucene.apache.org Subject: Retrieving granular scores back from Lucene/SOLR Hi, I am looking to get a bit more information back from SOLR/Lucene

Re: Which token filter can combine 2 terms into 1?

2012-12-21 Thread Jack Krupansky
And to be more specific, most query parsers will have already separated the terms and will call the analyzer with only one term at a time, so no term recombination is possible for those parsed terms, at query time. -- Jack Krupansky -Original Message- From: Erick Erickson Sent
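
A rough model of that ordering, assuming a parser that splits on whitespace before handing each piece to the analyzer. Everything here is hypothetical (joiningAnalyzer, parseThenAnalyze) and stands in for a real query parser and analyzer chain; it only shows why an analyzer that would merge adjacent words never gets the chance at query time.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public final class ParseThenAnalyzeSketch {

    /**
     * Hypothetical analyzer that lowercases its input and joins all of the
     * whitespace-separated words it is given into a single term.
     */
    static List<String> joiningAnalyzer(String text) {
        String joined = String.join("", text.trim().split("\\s+"));
        return Collections.singletonList(joined.toLowerCase());
    }

    /**
     * Hypothetical parser: splits the query on whitespace first, then runs
     * the analyzer on each piece separately.
     */
    static List<String> parseThenAnalyze(String query) {
        List<String> terms = new ArrayList<>();
        for (String piece : query.trim().split("\\s+")) {
            terms.addAll(joiningAnalyzer(piece));
        }
        return terms;
    }

    public static void main(String[] args) {
        // Given the whole text, the analyzer can combine the two words.
        System.out.println(joiningAnalyzer("Spring Framework"));  // prints [springframework]
        // After parsing, it only ever sees one word at a time.
        System.out.println(parseThenAnalyze("Spring Framework")); // prints [spring, framework]
    }
}
```

The same analyzer gives different results depending on whether it sees the full text, as at index time, or the pre-split pieces, as in this model of query time.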

Re: Which token filter can combine 2 terms into 1?

2012-12-21 Thread Jack Krupansky
You still have the query parser's parsing before analysis to deal with, no matter what magic you code in your analyzer. -- Jack Krupansky -Original Message- From: Tom Sent: Friday, December 21, 2012 2:24 PM To: java-user@lucene.apache.org Subject: Re: Which token filter can combine 2

Re: NGramPhraseQuery with missing terms

2012-12-19 Thread Jack Krupansky
/4_0_0/core/org/apache/lucene/search/BooleanQuery.html#setMinimumNumberShouldMatch(int) -- Jack Krupansky -Original Message- From: 김한규 Sent: Wednesday, December 19, 2012 2:36 AM To: java-user@lucene.apache.org Subject: NGramPhraseQuery with missing terms Hi. I am trying to make

Re: Help needed: search is returning no results

2012-12-18 Thread Jack Krupansky
that fail. -- Jack Krupansky -Original Message- From: Ramon Casha Sent: Tuesday, December 18, 2012 9:14 AM To: java-user@lucene.apache.org Subject: Help needed: search is returning no results I have just downloaded and set up Lucene 4.0.0 to implement a search facility for a web app I'm

precisionStep for days in TrieDate

2012-12-14 Thread Jack Krupansky
table that trie keeps beyond the raw data values or the data values themselves. -- Jack Krupansky

Re: precisionStep for days in TrieDate

2012-12-14 Thread Jack Krupansky
Thanks, you answered the main question - 26 doesn't simply lop off the time of day. Although, I still don't completely follow how trie works (without reading the paper itself.) -- Jack Krupansky -Original Message- From: Uwe Schindler Sent: Friday, December 14, 2012 5:58 PM To: java

Re: Boolean and SpanQuery: different results

2012-12-13 Thread Jack Krupansky
, but unexpected term. -- Jack Krupansky -Original Message- From: Carsten Schnober Sent: Thursday, December 13, 2012 10:49 AM To: java-user@lucene.apache.org Subject: Boolean and SpanQuery: different results Hi, I'm following Grant's advice on how to combine BooleanQuery and SpanQuery (http://mail
