Re: TermVectorOffsetStrategy producing Passages with matches out of order? (causing IndexOutOfBoundsException)

2023-07-04 Thread Chris Hostetter
I hacked up the test a bit so it would compile against 9.0 and confirmed the problem existed there as well. So going back a little farther with some manual bisection (to account for the transition from ant to gradle) led me to the following... # first bad commit:

Re: TermVectorOffsetStrategy producing Passages with matches out of order? (causing IndexOutOfBoundsException)

2023-06-29 Thread Chris Hostetter
With some trial and error I realized two things... 1) the order of the terms in the BooleanQuery seems to matter - but in terms of their "natural order", not the order in the doc (which is why i was so confused trying to reproduce it) 2) the problem happens when using termVectors but

TermVectorOffsetStrategy producing Passages with matches out of order? (causing IndexOutOfBoundsException)

2023-06-29 Thread Chris Hostetter
I've got a user getting java.lang.IndexOutOfBoundsException from the UnifiedHighlighter in Solr 9.1.0 w/Lucene 9.3.0 (And FWIW, this same data, w/same configs, in 8.11.1, purportedly didn't have this problem) I don't really understand the highlighter code very well, but AFAICT: -

Re: Reproducible crash matching phrases

2021-02-10 Thread Chris Hostetter
: I'm attaching an updated file as well this this changes. : : This happens in Lucene 8.8.0 (and probably since 8.4.0). Ok -- cool ... with the updated code i was able to reproduce on branch_8x, and with 8.8 & 8.7 (but not 8.4) -- I've distilled your patch into a test case and attached to a

Re: Reproducible crash matching phrases

2021-02-10 Thread Chris Hostetter
: I've been able to reproduce a crash we are seeing in our product with newer : Lucene versions. Can you be specific? What exact versions of Lucene are you using that reproduces this failure? If you know of other "older" versions where you can't reproduce the problem, that info would also

Re: explainOther SOLR concept?

2019-06-27 Thread Chris Hostetter
: It’s a Solr-only param for adding to debug=true…. at the Lucene level it's just calling the explain() method on an arbitrary docId regardless of whether that doc appear in the topN results for that query (or if it matches the query at all) -Hoss http://www.lucidworks.com/

Re: Question about usage of LuceneTestCase

2018-08-27 Thread Chris Hostetter
: Current version of Luke supports FS based directory implementations only. : (I think it will be better if future versions support non-FS based custom : implementations, such as HdfsDirectoryFactory for users who need it.) : Disabling the randomization, at least for now, sounds reasonable to me

Re: Practical usages of arbitrary Shingles when using a query parser?

2018-07-31 Thread Chris Hostetter
: The query parser is confused by these overlapping positions indeed, which : it interprets as synonyms. I was going to write that you should set the Sure -- i'm not blaming the QueryParser, what it does with the Shingles output makes sense (and actually works! .. just not as efficiently as

Practical usages of arbitrary Shingles when using a query parser?

2018-07-30 Thread Chris Hostetter
Although I've been aware of Shingles and some of the useful applications for a long time, today is the first time i really sat down and tried to do something non-trivial with them myself. My objective seems relatively straightforward: given a corpus of text and some analyzer (for sake of

Re: Size of Document

2018-07-05 Thread Chris Hostetter
: Subject: Size of Document : To: java-user@lucene.apache.org : References: : : : Message-ID: : In-Reply-To: : https://people.apache.org/~hossman/#threadhijack Thread Hijacking on Mailing Lists When starting a new discussion on a mailing list, please do not reply to an existing

Re: ClassicAnalyzer Behavior on accent character

2017-10-26 Thread Chris Hostetter
Classic is ... "classic" ... it exists largely for historical purposes to provide a tokenizer that does exactly what the javadocs say it does (regarding punctuation, "product numbers", and email addresses), so that people who depend on that behavior can continue to rely on it. Standard is ...

Re: payload at the document level

2017-10-05 Thread Chris Hostetter
what you're describing is essentially just DocValues -- for each document, you can have an arbitrary bytes[] (or number, or sorted list of numbers), and you could write a custom query/similarity/collector that can access that "docvalue" at search time to decide if it's a match (or how to score

Re: Changing the default FSLockFactory implementation

2017-05-31 Thread Chris Hostetter
: We are experiencing some “Lock obtain timed out: NativeFSLock@” issues : on our NFS file system, could someone please show me, what’s the right : way to switch the Lucene default NativeFSLockFactory to : SimpleFSLockFactory? You can specify the LockFactory used when opening your

RE: Un-used index files are not getting released

2017-05-11 Thread Chris Hostetter
: We do not open any IndexReader explicitly. We keep one instance on : IndexWriter open (and never close) and for searching we use : SearcherManager. I checked the lsof and did not find any files with : delete status. what exactly does your SearcherManager usage look like? is every searcher =

Re: will lucene traverse all segments to search a 'primary key'term or will it stop as soon as it get one?

2017-04-21 Thread Chris Hostetter
: Lucene by default will search all segments, because it does not know that : your field is a primary key. : : Trejkaz's suggestion to early-terminate should work well. You could also : write custom code that uses TermsEnum on each segment. Before you go too far down the rabbit hole of writing

Re: How to get document effectively. or FieldCache example

2017-04-21 Thread Chris Hostetter
: then which one is right tool for text searching in files. please can you : suggest me? so far all you've done is show us your *indexing* code; and said that after you do a search, calling searcher.doc(docid) on 500,000 documents is slow. But you still haven't described the usecase you are

Re: Automata and Transducer on Lucene 6

2017-04-19 Thread Chris Hostetter
: pairs). It is this kind of : "high-level goal" I asked about. Your answer only adds to the mystery: https://people.apache.org/~hossman/#xyproblem XY Problem Your question appears to be an "XY Problem" ... that is: you are dealing with "X", you are assuming "Y" will help you, and you are

RE: question

2017-01-19 Thread Chris Hostetter
: Yes, they should be the same unless the field is indexed with shingles, in that case order matters. : Markus just to clarify... The examples provided show *strings* which would have to be parsed into Query objects by a query parser. the *default* QueryParser will produce queries that

Re: Where did earthDiameter go?

2017-01-12 Thread Chris Hostetter
I don't know the rhyme/reason but it looks like it was removed (w/o any deprecation first i guess) as part of LUCENE-7123 in commit: ce3114233bdc45e71a315cb6ece64475d2d6b1d4 in that commit, existing callers in the lucene code base were changed to use "2 * GeoProjectionUtils.SEMIMAJOR_AXIS"

Re: Problem sorting long integers

2016-12-13 Thread Chris Hostetter
How are you constructing your SortField at query time? Are you sure you are using SortField.Type.LONG ? Can you show us some minimally self contained reproducible code demonstrating your problem? (ie: create an index with 2 docs, then do a simple search for both and sort them and show that

Re: How exclude empty fields?

2016-11-16 Thread Chris Hostetter
: The issue I have is that some promotions are permanent so they don't have : an endDate set. : : I tried doing: : : ( +Promotion.endDate:[210100TOvariable containing yesterday's date] : || -Promotion.endDate:* ) 1) mixing prefix ops with "||" like this is most certainly not doing what

Re: Unsubscribing problems

2016-09-07 Thread Chris Hostetter
Peyman: I'll contact you off list to try and address your specific problem. As a general reminder for all users: If you need help with the mailing list, step #1 should be to email the automated help system via java-user-help@lucene (identified in the Mailing-List and List-Help mail MIME

Re: BooleanQuery rewrite optimization

2016-08-08 Thread Chris Hostetter
Off the top of my head, i think any optimization like that would also need to account for minNrShouldMatch, wouldn't it? if your query is "(X Y Z #X)" w/minshouldmatch=2, and you rewrite that query to "(+X Y Z)" w/minshouldmatch=2 you now have a semantically diff query that won't match as many
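
The concern above can be sketched with a toy model in plain Java. This is not Lucene code: `MinShouldMatchDemo`, `matches`, `original` and `rewritten` are hypothetical names, and the model assumes "#X" is a required, non-scoring clause and that minshouldmatch counts only the optional clauses. Under those assumptions, a doc containing just X and Y matches the original query but not the rewritten one.

```java
import java.util.List;
import java.util.Set;

public class MinShouldMatchDemo {

    // Toy boolean query: every required term must be present, and at least
    // minShouldMatch of the optional terms must be present.
    // (Assumption: this mirrors how required/optional clauses interact; it is
    // a simplified model, not Lucene's BooleanQuery implementation.)
    public static boolean matches(Set<String> doc, List<String> required,
                                  List<String> optional, int minShouldMatch) {
        if (!doc.containsAll(required)) {
            return false;
        }
        long hits = optional.stream().filter(doc::contains).count();
        return hits >= minShouldMatch;
    }

    // "(X Y Z #X)" w/minshouldmatch=2: X is both optional and required.
    public static boolean original(Set<String> doc) {
        return matches(doc, List.of("X"), List.of("X", "Y", "Z"), 2);
    }

    // "(+X Y Z)" w/minshouldmatch=2: X is only required, so it no longer
    // counts toward the two optional matches.
    public static boolean rewritten(Set<String> doc) {
        return matches(doc, List.of("X"), List.of("Y", "Z"), 2);
    }

    public static void main(String[] args) {
        Set<String> doc = Set.of("X", "Y");
        System.out.println("original:  " + original(doc));   // true
        System.out.println("rewritten: " + rewritten(doc));  // false
    }
}
```

In this model the rewrite would only be safe if minshouldmatch were lowered by one at the same time, since X always satisfies one optional slot in the original form.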

Re: disable field length normalization on specific fields?

2016-03-28 Thread Chris Hostetter
yep, just use a customized similarity that doesn't include a length factor when computing the norm. If you are currently using TFIDFSimilarity (or one of its subclasses) then the computeNorm method delegates to a lengthNorm method, and you can override that to return "1" for fields with a
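
The idea can be illustrated with a standalone sketch that does not depend on Lucene. Everything here is an assumption for illustration: `LengthNormDemo`, its `lengthNorm` method and the "title" field name are hypothetical, and 1/sqrt(numTerms) stands in for a classic TF-IDF style length factor. The real override would live in a Similarity subclass whose exact signature depends on the Lucene version in use.

```java
import java.util.Set;

public class LengthNormDemo {

    // Hypothetical set of fields that should ignore document length.
    static final Set<String> NO_LENGTH_NORM_FIELDS = Set.of("title");

    // Returns a constant 1 for the selected fields, otherwise a
    // 1/sqrt(numTerms) factor that penalizes longer field values.
    public static float lengthNorm(String field, int numTerms) {
        if (NO_LENGTH_NORM_FIELDS.contains(field)) {
            return 1.0f;
        }
        return (float) (1.0 / Math.sqrt(numTerms));
    }

    public static void main(String[] args) {
        // A short and a long "title" get the same factor; "body" does not.
        System.out.println(lengthNorm("title", 4) + " " + lengthNorm("title", 400));
        System.out.println(lengthNorm("body", 4) + " " + lengthNorm("body", 400));
    }
}
```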

Re: 500 millions document for loop.

2015-11-15 Thread Chris Hostetter
: public void collect(int docID) throws IOException { : Document doc = indexSearcher.doc(docID, loadFields); : found.found(doc); : } Based on your description of the calculation you are doing

Re: Lucene 4.x -> 5.x: Converting FieldValueFilter to FieldValueQuery

2015-11-05 Thread Chris Hostetter
: > The fact that you need to index doc values is related to another change in : > which we removed Lucene's FieldCache and now recommend to use doc values : > instead. Until you reindex with doc values, you can temporarily use : > UninvertingReader[1] to have the same behaviour as in Lucene 4.x.

Re: sizes of non-fdt flies affected by compression settings

2015-11-04 Thread Chris Hostetter
: This setting can only affect the size of the fdt (and fdx) files. I suspect : you saw differences in the size of other files because it caused Lucene to : run different merges (because segments had different sizes), and the : compression that we use for postings/terms worked better, but it

Re: sizes of non-fdt flies affected by compression settings

2015-11-04 Thread Chris Hostetter
: This setting can only affect the size of the fdt (and fdx) files. I suspect : you saw differences in the size of other files because it caused Lucene to : run different merges (because segments had different sizes), and the : compression that we use for postings/terms worked better, but it

Re: Pagination using searchAfter

2015-09-04 Thread Chris Hostetter
: I want to use the searchAfter API in IndexSearcher. This API takes ScoreDoc as : argument. Do we need to store the last ScoreDoc value (ScoreDoc value from : previous search)? When multiple users perform search, then it might be : difficult to store the last ScoreDoc value. : : I guess, docid

Re: Getting a proper ID value into every document

2015-06-05 Thread Chris Hostetter
: If you cannot do this for whatever reason, I vaguely remember someone : posting a link to a program they'd put together to do this for a : docValues field, you'd have to search the archives to find it. It was Toke - he generated DocValues for an existing index by writing an IndexReader Filter

Re: multi valued facets

2015-06-04 Thread Chris Hostetter
: Set the field to multiValued=true in your schema. How'd you manage to : get multiple values in there without an indexing error? An existing : index built with Lucene directly? Erik: this isn't a Solr question -- the error message mentioned comes from the lucene/facets FacetsConfig class.

Re: Specifying a Version vs. not specifying a Version

2015-05-29 Thread Chris Hostetter
: Now StandardTokenizer(Version, Reader) is deprecated and the docs say : to use StandardTokenizer(Reader) instead. But I can't do that, because : that constructor hardcodes Version.LATEST, which will break backwards : compatibility in the future (its Javadoc even confirms that this is : the

Re: BytesRef violates the principle of least astonishment

2015-05-20 Thread Chris Hostetter
: I already know how Object#clone() works: May i humbly suggest that you: a) relax a bit; b) keep reading the rest of the javadocs for that method? : As BytesRef#clone() is overriding Object#clone(), I expect it to : comply with that. BytesRef#clone() functions virtually identical to the way

Re: Lucene indexing speed on NVMe drive

2015-04-30 Thread Chris Hostetter
: Hi. I am studying Lucene performance and in particular how it benefits from faster I/O such as SSD and NVMe. : parameters as used in nightlyBench. (Hardware: Intel Xeon, 2.5GHz, 20 : processor ,40 with hyperthreading, 64G Memory) and study indexing speed ... : I get best performance

Lucene/Solr Revolution 2015 - Austin Oct 13-16 - CFP ends next Week

2015-04-30 Thread Chris Hostetter
(cross posted, please confine any replies to general@lucene) A quick reminder and/or heads up for those who haven't heard yet: this year's Lucene/Solr Revolution is happening in Austin Texas in October. The CFP and Early bird registration are currently open. (CFP ends May 8, Early Bird ends

Re: Filters execution efficiency

2015-03-26 Thread Chris Hostetter
FWIW: If you're reading LIA, part of your confusion may be that Filters, and when/how they are factored into iterating over scorers, has changed significantly over the years. : Date: Fri, 27 Mar 2015 00:45:14 +0100 : From: Adrien Grand jpou...@gmail.com : Reply-To: java-user@lucene.apache.org

Re: Eclipse Compiled lucene-core-5.0.0.jar Not Working in Solr

2015-03-09 Thread Chris Hostetter
: If you need to make changes to an existing 4.10 installation, pull down the 4.10 : source code and work from _that_, which you can do with something like: based on the error, i don't think he's trying to drop the lucene-core-5.0.0.jar into a Solr 4 install -- i suspect he's compiled built

Re: Lucene Version Upgrade (3-4) and Java JVM Versions(6-8)

2015-01-27 Thread Chris Hostetter
: I seem to remember reading that certain versions of lucene were : incompatible with some java versions although I cannot find anything to : verify this. As we have tens of thousands of large indexes, backwards : compatibility without the need to reindex on an upgrade is of prime :

REMINDER: ApacheCon 2015 Call For Papers Ends This Week (February 1st)

2015-01-26 Thread Chris Hostetter
(cross posted, please confine replies to general@lucene) ApacheCon 2015 Will be in Austin Texas April 13-17. http://apachecon.com/ The Call For Papers is currently open, but it ends 2015-02-01 (11:55PM GMT-0600)

Re: Details on setting block parameters for Lucene41PostingsFormat

2015-01-13 Thread Chris Hostetter
: : The first int to Lucene41PostingsFormat is the min block size (default : 25) and the second is the max (default 48) for the block tree terms : dict. we were discussing over on the solr-user mailing list how Tom would/could go about configuring Solr to use a custom subclass of

RE: Looking for docs that have certain fields empty (an/or not set)

2015-01-07 Thread Chris Hostetter
: In Lucene you don't need to use a query parser for that, especially : because range Queries is suboptimal and slow: There is already a very : fast query/filter available. Ahmet Arslan already mentioned that, we had : the same discussion a few weeks ago: :

ANNOUNCE: CFP and Travel Assistance now open for ApacheCon North America 2015

2014-12-16 Thread Chris Hostetter
(NOTE: cross posted to several lucene lists, if you have replies, please confine them to general@lucene) -- Forwarded message -- In case you've missed it: - ApacheCon North America returns to Austin, Texas, 13-17 April 2015 http://apachecon.com/ - Call for Papers open

Re: Compiling and running Lucene/Solr based on github does not seem to work

2014-12-05 Thread Chris Hostetter
For future questions about solr, please use solr-user@lucene ... : ant compile : ant test : : successfully. Also Jetty seems to startup fine, but when I access : : http://localhost:8983/solr/ : : then I receive ... Note the Instructions for Building Apache Solr from Source section

Re: How best to compare tow sentences

2014-12-04 Thread Chris Hostetter
: For a number of years I've been doing this for some time by creating a : RAMDirectory, creating a document for one of the sentence and then doing a : search using the other sentence and seeing if we get a good match. This has : worked reasonably well but since improving the performance of

Re: How to map lucene scores to range from 0~100?

2014-11-12 Thread Chris Hostetter
: I met a new trouble. In my system, we should score the doc range from 0 : to 100. There are some easy ways to map lucene scores to this scope. : Thanks for your help~ https://wiki.apache.org/lucene-java/ScoresAsPercentages -Hoss http://www.lucidworks.com/

Re: Dangerous reflection access to sun.misc.Cleaner by class org.apache.lucene.store.MMapDirectory$MMapIndexInput$1 detected!

2014-11-03 Thread Chris Hostetter
FYI: random googling for Dangerous reflection access indicates these are logged by TopSecurityManager in Netbeans random clicking on random messages in the Netbeans forums suggests: 1) these INFO messages are designed to only show up if you run with assertions on (evidently under the

Re: Getting min/max of numeric doc-values facets

2014-10-09 Thread Chris Hostetter
: Is there some way when faceted search is executed, we can retrieve the : possible min/max values of numeric doc-values field with supplied custom : ranges in (LongRangeFacetCounts) or some other way to do it ? : : As i believe this can give application hint, and next search request can be :

Re: Notifications of new Lucene-Releases

2014-10-06 Thread Chris Hostetter
: Lucene doesn't have a dedicated announce list; maybe subscribe to : Apache's announce list? But then you get announcements for all Apache : projects ... maybe add a mail filter ;) there's also the product info feeds which you can subscribe to...

Re: NOTICE: Seeking Moderators for java-user@lucene

2014-10-03 Thread Chris Hostetter
: After a few days (probably on friday?) i'll file an infra request to replace : all current moderators with the new list of volunteers. Thanks to all our volunteers, watch this jira to know when the change happens... https://issues.apache.org/jira/browse/INFRA-8429 -Hoss

NOTICE: Seeking Moderators for java-user@lucene

2014-09-30 Thread Chris Hostetter
Hey folks, I was on vacation for the past 7 days - 6 days ago someone sent an email directly to the java-user moderator list asking for subscription help and never got any response -- indicating that all of our other list moderators are either no longer active, or just happened to be on

Re: Snowball filter - Error instantiating stemmer for a language

2014-09-05 Thread Chris Hostetter
To see about improving the error messages when users make mistakes like this... https://issues.apache.org/jira/browse/LUCENE-5926 -Hoss http://www.lucidworks.com/

Re: Snowball filter - Error instantiating stemmer for a language

2014-09-04 Thread Chris Hostetter
Odd ... the class org/tartarus/snowball/ext/CatalanStemmer.class should exist in the same jar as SnowballPorterFilterFactory, can you please confirm that you see it there? $ jar tf lucene-analyzers-common-4.6-SNAPSHOT.jar | grep CatalanStemmer org/tartarus/snowball/ext/CatalanStemmer.class

Re: Should .tip/.doc/.tii files be missing/deleted?

2014-09-03 Thread Chris Hostetter
: following files (I'm not listing all extensions) are deleted immediately : upon IndexWriter.close() being called: : : *.fdt, *.tip, *.tii, *.pos : : Only the following 5 files are left in all cases : _0.cfe : _0.cfs ...you've got the CompoundFileFormat configured, so each time a segment is

Re: Seeking Additional Moderator Volunteers for java-user@lucene

2014-07-29 Thread Chris Hostetter
On Wed, 23 Jul 2014, Yalamarthi, Vineel wrote: : Can I be volunteer too Vineel: sorry i didn't see your response until now. Thanks for volunteering but asfinfra already processed the request and now we've got plenty of moderators. (i think it was actually processed before you even replied)

Seeking Additional Moderator Volunteers for java-user@lucene

2014-07-23 Thread Chris Hostetter
We're doing some housekeeping of the moderators of this list, and looking for any new folks that would like to volunteer. (we currently have 3 active moderators, 1-2 additional mods would be helpful for good coverage) If you'd like to volunteer to be a moderator, please reply back to this

Re: Seeking Additional Moderator Volunteers for java-user@lucene

2014-07-23 Thread Chris Hostetter
Thanks folks, plenty of new volunteers https://issues.apache.org/jira/browse/INFRA-8082 -Hoss http://www.lucidworks.com/

Re: Different Scores For Same Query on Identical Index

2014-07-16 Thread Chris Hostetter
: I created an index with three documents, ran a query, and noted the scores. : Then I deleted one of the documents using IndexWriter.tryDeleteDocument, and : then re-added the exact same document. (I saved the Document in an instance : variable, so I couldn't have accidentally changed any of the

Re: IndexSearcher.doc thread safe problem

2014-07-09 Thread Chris Hostetter
: 4. Synchronized searcher.doc method call in multi-thread(like this: public : synchronized Document getValue( IndexSearcher searcher, int docId ) { : return searcher.doc( docId ); }) : == every execution is same. :but If I use this method, It is no difference with single thread :

Re: Query rewriting - caching rewritten quries

2014-07-02 Thread Chris Hostetter
: In the system which I develop I have to store many query objects in memory. : The system also receives documents. For each document MemoryIndex is : instantiated. I execute all stored queries on this MemoryIndex. I realized : that searching over MemoryIndex takes much time for query rewriting.

ANNOUNCE: ApacheCon deadlines: CFP June 25 / Travel Assistance Jul 25

2014-06-12 Thread Chris Hostetter
(NOTE: cross-posted announcement, please confine any replies to general@lucene) As you may be aware, ApacheCon will be held this year in Budapest, on November 17-23. (See http://apachecon.eu for more info.) ### ### 1 - Call For Papers - June 25 The CFP for the conference is still open, but

RE: will score get changed as document continuously added.

2014-06-11 Thread Chris Hostetter
: Yes the score will change, because the new documents change the : statistics. In general, scores cannot be seen as absolute numbers, they : are only useful to compare between search results of the exact same : query at the same index snapshot. They have no global meaning. This wiki page
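
A toy calculation shows why the quoted point holds. This sketch assumes a simplified inverse document frequency, 1 + ln(numDocs / (docFreq + 1)); it is not claimed to be the exact formula of any particular Lucene Similarity, and `IdfDriftDemo` and `idf` are hypothetical names. The term's own document frequency stays fixed while the index grows, yet its weight changes.

```java
public class IdfDriftDemo {

    // Simplified IDF (an assumption for illustration, not Lucene's exact
    // formula): rarer terms relative to the index size get a larger weight.
    public static double idf(long numDocs, long docFreq) {
        return 1.0 + Math.log((double) numDocs / (docFreq + 1));
    }

    public static void main(String[] args) {
        long docFreq = 9; // docs containing the term; unchanged between snapshots
        System.out.println("idf at 100 docs:  " + idf(100, docFreq));
        System.out.println("idf at 1000 docs: " + idf(1000, docFreq));
    }
}
```

Because the weight depends on index-wide statistics, the same query against the same document produces a different score once unrelated documents are added, which is why scores are only comparable within one query at one index snapshot.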

Re: absence of searchAfter method with Collector parameter in Lucene IndexSearcher

2014-06-06 Thread Chris Hostetter
: I was wondering why there is no search method in lucene Indexsearcher to : search after last reference by passing collector. Say a method with : signature like searchAfter(Query query, ScoreDoc after, Collector results). searchAfter only makes sense if there is a Sort involved -- either

Re: is there a historical reason why default conjunction operator is OR?

2014-04-16 Thread Chris Hostetter
: I recently wondered, : why lucene's default conjunction operator is OR. : Is there a historical reason for that? The only 'default' is in the query parser -- if you construct the BooleanQuery objects programmatically you must always be explicit about the Occur property of each Clause. In

Re: Lucene 4 single segment performance improvement tips?

2014-03-05 Thread Chris Hostetter
: Our runtime/search use-case is very simple: run filters to select all docs : that match some conditions specified in a filter query (we do not use : Lucene scoring) and return the first 100 docs that match (this is an : over-simplification) first as defined how? in order collected by a custom

ANNOUNCE: Lucene/Solr @ ApacheCon (Denver, April 7-9)

2014-02-27 Thread Chris Hostetter
(cross posted, please keep any replies to general@lucene) ApacheCon Denver is coming up and registration is currently open. In addition to a solid 3 day track of Lucene Solr related talks, there are also some post conference events that are open to anyone even if you don't attend the

Re: Limiting the fields a user can query on

2014-02-19 Thread Chris Hostetter
: Is there a way to limit the fields a user can query by when using the : standard query parser or a way to get all fields/terms that make up a query : without writing custom code for each query subclass? limit in what way? do you want to throw a parse error if they give you

[REMINDER] ApacheCon NA 2014 Travel Assistance Applications Due Feb 7

2014-02-05 Thread Chris Hostetter
(NOTE: cross posted, if you feel the need to reply, please keep it on general@lucene) As a reminder, Travel Assistance Applications for ApacheCon NA 2014 are due on Feb 7th (about 48 hours from now) Details are below, please note that if you have any questions about this program or the

ANNOUNCE: ApacheCon NA 2014 Travel Assistance Applications now open!

2014-01-15 Thread Chris Hostetter
(Note: cross-posted to various lucene user lists, if you have replies please keep them on general@lucene, but please note that specific questions should be addressed to travel-assista...@apache.org) - - - Forwarded Announcement - - - The Travel Assistance Committee (TAC) are pleased

Re: Scanning through inverted index

2013-11-27 Thread Chris Hostetter
: The goal is to construct the iterator : : Iterator: term - [doc1, doc2, ...] That iterator already exists -- it's a DocsEnum. Erick's question is what your *end* goal is .. what are you attempting to do that you are asking about accessing a low level iterator over all the docs that contain

ANNOUNCE: Stump The Chump @ Lucene Revolution EU - Tomorrow

2013-11-05 Thread Chris Hostetter
(Note: cross posted announcement, please confine any replies to solr-user) Hey folks, On Wednesday, I'll be doing a Stump The Chump session at Lucene Revolution EU in Dublin Ireland. http://lucenerevolution.org/stump-the-chump If you aren't familiar with Stump The Chump it is a Q&A style

ANNOUNCE: Lucene/Solr Revolution EU 2013 - Session List Early Bird Pricing

2013-09-24 Thread Chris Hostetter
(NOTE: cross-posted to various lists, please reply only to general@lucene w/ any questions or follow ups) Hey folks, 2 announcements regarding the upcoming Lucene/Solr Revolution EU 2013 in Dublin (November 4-7)... ## 1) Session List Now Posted I'd like to thank everyone who helped vote

ANNOUNCE: Lucene/Solr Revolution EU 2013: Registration Community Voting

2013-08-26 Thread Chris Hostetter
(NOTE: cross-posted to various lists, please reply only to general@lucene w/ any questions or follow ups) 2 Announcements folks should be aware of regarding the upcoming Lucene/Solr Revolution EU 2013 in Dublin... # 1) Registration Now Open Registration is now open for Lucene/Solr

Re: QueryParser for DisjunctionMaxQuery, et al.

2013-07-23 Thread Chris Hostetter
: Subject: QueryParser for DisjunctionMaxQuery, et al. : References: 1374578398714-4079673.p...@n3.nabble.com : In-Reply-To: 1374578398714-4079673.p...@n3.nabble.com https://people.apache.org/~hossman/#threadhijack Thread Hijacking on Mailing Lists When starting a new discussion on a mailing

ANNOUNCE: CFP Lucene/Solr Revolution EU 2013 (Deadline August 2nd)

2013-07-08 Thread Chris Hostetter
(NOTE: cross-posted to various lists, please reply only to general@lucene w/ any questions or follow ups) The Call for Papers for Lucene/Solr Revolution EU 2013 is currently open. http://www.lucenerevolution.org/2013/call-for-papers Lucene/Solr Revolution is the biggest open source

Re: Read an solr index with two different lucene formats

2013-06-14 Thread Chris Hostetter
: I used solr to query the index, and verified that each document does have a : non-blank date field. I suspect that it's because the lucene-3.6 api I am : using can not read datefield correctly from documents written in lucene 1.4 : format. how did you verify that they all have a non-blank

Re: ERROR help me please ,org.apache.lucene.search.IndexSearcher.init(Ljava/lang/String;)V

2013-05-17 Thread Chris Hostetter
: Well IndexSearcher doesn't have a constructor that accepts a string, : maybe you should pass in an indexreader instead? specifically: the code you are trying to run was compiled against a version of lucene in which the IndexSearcher class had a constructor that accepted a single string

Re: Why does index boosting a field to 2.0f on a document have such a dramatic effect

2013-04-04 Thread Chris Hostetter
: At index time I boost the alias field of a small set of documents, setting the : boost to 2.0f, which I thought meant equivalent to doubling the score this doc : would get over another doc, everything else being equal. 1) you haven't shown us enough details to be certain, but based on the

Re: StandardAnalyzer class not present in Lucene 4.2.0

2013-03-25 Thread Chris Hostetter
: Thank you very much Arjen. I had to separately download and install the : jar. it was not present in my lucene installation directory. I had : downloaded the lucene zip file and ran the command ant after extracting : it. Did i miss anything.? if you download build lucene from source, then:

Re: Migrating SnowballAnalyzer to 4.1

2013-02-28 Thread Chris Hostetter
: Subject: Migrating SnowballAnalyzer to 4.1 : References: : calde+61dm8kdn7fj_r+kntecznpgpbjnehz_ehgpm72b-nm...@mail.gmail.com : cafgogc9rcu0n8nd9kaehl9-jdxytjlnsc2xk5goaf2hsbvf...@mail.gmail.com : In-Reply-To: : cafgogc9rcu0n8nd9kaehl9-jdxytjlnsc2xk5goaf2hsbvf...@mail.gmail.com

RE: Searching for keywords .net,c#,...

2013-02-26 Thread Chris Hostetter
: which seems to override incrementToken() ( guess as I don't know java ) : however using lucene.net 3.0.3, I can override Lucene.Net is a completely separate project from Lucene, with its own APIs, release cycles, and user community. Your best bet at getting help from people who are familiar

Re: ApacheCon meetup

2013-02-19 Thread Chris Hostetter
: Subject: ApacheCon meetup : : Any other Lucene/Solr enthusiasts attending ApacheCon in Portland next week? I won't make it to ApacheCon this year (first time in a long time actually) but I'm fairly certain there will be a Lucene MeetUp of some kind -- there always is. This is usually

Re: Large Index Query Help!

2013-01-29 Thread Chris Hostetter
: Subject: Large Index Query Help! : References: 1359429227142-4036943.p...@n3.nabble.com https://people.apache.org/~hossman/#threadhijack Thread Hijacking on Mailing Lists When starting a new discussion on a mailing list, please do not reply to an existing message, instead start a fresh

Re: NPE when adding a Document to an IndexWriter

2013-01-09 Thread Chris Hostetter
: I keep getting an NPE when trying to add a Doc to an IndexWriter. I've : minimized my code to very basic code. what am I doing wrong? pseudo-code: can you post a full test that other people can run to try and reproduce? it doesn't even have to be a junit test -- just some complete javacode

Re: NPE when adding a Document to an IndexWriter

2013-01-09 Thread Chris Hostetter
: thanks for your reply. please see attached. I tried to maintain the : structure of the code that I need to use in the library I'm building. I think : it should work for you as long as you remove the package declaration at the : top. I can't currently try your code, but skimming through it

Re: Which token filter can combine 2 terms into 1?

2012-12-21 Thread Chris Hostetter
: Unfortunately, no...I am not combine every two term into one. I am : combining a specific pair. I'm confused ... you've already said that you expect you will need a custom filter because your usecase is very special -- and you haven't given us many details about exactly when/why/how you want

Re: Question about ordering rule of SpanNearQuery

2012-11-21 Thread Chris Hostetter
: I am confused with the ordering rule about SpanNearQuery. For example, I : indicate the slot in SpanNearQuery is 10. And the results are all the : qualified documents. Is it true that any document with shorter distance ... : it still uses tf-idf algorithm to rank the docs. Or there is

Re: Is there anything in Lucene 4.0 that provides 'absolute' scoring so that i can compare the scoring results of different searches ?

2012-10-25 Thread Chris Hostetter
https://wiki.apache.org/lucene-java/LuceneFAQ#Can_I_filter_by_score.3F https://wiki.apache.org/lucene-java/ScoresAsPercentages The fundamental problem of attempting to compare scores for different searches is the same in your situation as in the goal of trying to normalize scores to a fixed

Re: short search terms

2012-09-26 Thread Chris Hostetter
: I have a key field that will only ever have a length of 3 characters. I am : using a StandardAnalyzer and a QueryParser to create the Query : (parser.parse(string)), and an IndexReader and IndexSearcher to execute the : query (searcher(query)). I can't seem to find a setter to allow for a 3 :

Re: Issue with documentation for org.apache.lucene.analysis.synonym.SynonymMap.Builder.add() method

2012-09-06 Thread Chris Hostetter
: Converted to U+000 by what, I wonder? Javadoc shouldn't be doing that. If : it does, I wonder if we need \\u instead? apparently it is... https://mail-archives.apache.org/mod_mbox/harmony-dev/200802.mbox/%3c47b2f7ae.2000...@gmail.com%3E -Hoss

RE: Seeking more moderators for java-user@lucene

2012-08-28 Thread Chris Hostetter
: I have tried multiple times to unsubscribe, and it never works. Could you unsubscribe me? Anyone having trouble unsubscribing should read the help page on the wiki and follow the instructions there if they need more help...

Seeking more moderators for java-user@lucene

2012-08-27 Thread Chris Hostetter
Greetings subscribers to java-user@lucene. I've been offline for the past ~5 days, and when i looked at my email again this morning I found a message to java-user@lucene sitting in the moderator queue since Aug 22nd. Messages sitting in the queue that long are a good indication that we

[ANNOUNCE] Lucene/Solr @ ApacheCon Europe - August 13th Deadline for CFP and Travel Assistance applications

2012-08-06 Thread Chris Hostetter
ApacheCon Europe will be happening 5-8 November 2012 in Sinsheim, Germany at the Rhein-Neckar-Arena. Early bird tickets go on sale this Monday, 6 August. http://www.apachecon.eu/ The Lucene/Solr track is shaping up to be quite impressive this year, so make your plans to attend

Re: change of API Javadoc interface funtionality in 4.0.x

2012-07-18 Thread Chris Hostetter
: What is the sense of removing the Index from the API Javadoc for Lucene and Solr? It was heavily bloating the size of the releases... https://issues.apache.org/jira/browse/LUCENE-3977 It's pretty easy to turn this back on and rebuild the docs locally. Feel free to open a jira and submit a

RE: How to unsubscribe from this list?

2012-06-25 Thread Chris Hostetter
G.Long: I'm replying to the list so this info is visible to anyone who is curious, but if you have specific followup questions, please reply to java-user-owner@lucene ... : Thanks. I tried this but it did not work so asking :). 1) sending an unsubscribe request will trigger an automated response

Re: need to find locations of query hits in doc: works fine for regular text but not for phone numbers

2012-06-14 Thread Chris Hostetter
: Subject: need to find locations of query hits in doc: works fine for regular : text but not for phone numbers : Message-ID: a57498edec10c64781ea0f7dba665cef264de...@ex2010mb01-1.caci.com : References: 1339635547170-3989548.p...@n3.nabble.com : In-Reply-To:

Re: Bizarre Search order request

2012-05-25 Thread Chris Hostetter
: For example, if I display of 20 results, I might want to limit it to a : maximum of 10 mail, 10 blog and 10 website documents. Which ones : get displayed and how they were ordered would depend on the normal : relevancy ranking, but, for example, once I had 10 mail objects to : display on
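One way to read the request above is as a post-processing pass over hits that are already sorted by relevance: walk the ranked list, skip any hit whose type has reached its cap, and stop once the page is full. The sketch below only illustrates that reading and is not necessarily what the thread went on to recommend. `Hit`, `capByType`, and the type names are hypothetical, and no Lucene API is involved.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class CappedResults {
    /** Hypothetical stand-in for a search hit: a document type plus an id. */
    public static final class Hit {
        public final String type;
        public final String id;
        public Hit(String type, String id) { this.type = type; this.id = id; }
    }

    /**
     * Walks hits in their existing (relevance) order and keeps at most
     * maxPerType hits of each type, stopping once pageSize hits are kept.
     */
    public static List<Hit> capByType(List<Hit> ranked, int pageSize, int maxPerType) {
        Map<String, Integer> seen = new HashMap<>();
        List<Hit> page = new ArrayList<>();
        for (Hit h : ranked) {
            if (page.size() >= pageSize) break;   // page is full
            int count = seen.getOrDefault(h.type, 0);
            if (count >= maxPerType) continue;    // this type is already at its cap
            seen.put(h.type, count + 1);
            page.add(h);
        }
        return page;
    }

    public static void main(String[] args) {
        List<Hit> ranked = new ArrayList<>();
        for (int i = 0; i < 15; i++) ranked.add(new Hit("mail", "m" + i));
        for (int i = 0; i < 15; i++) ranked.add(new Hit("blog", "b" + i));
        List<Hit> page = capByType(ranked, 20, 10);
        System.out.println(page.size() + " hits kept");
    }
}
```

The caller has to fetch more than one page of raw hits for this to work, since capped types are skipped rather than replaced automatically.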

Re: old fashioned.....Too many open files!

2012-05-18 Thread Chris Hostetter
: the point is that I keep the readers open to share them across search. Is : this wrong? your goal is fine, but where in your code do you think you are doing that? I don't see any readers ever being shared. You open new ones (which are never closed) in every call to getSearcher() :
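The difference between sharing a reader and opening a new one on every call can be sketched without Lucene at all. In the minimal illustration below, `FakeReader` is a hypothetical stand-in for an index reader and `SharedReaderSketch` is an invented holder class; the poster's actual `getSearcher()` code is not shown in this snippet, so this is an assumption about the intended pattern rather than a fix for it.

```java
import java.io.Closeable;
import java.util.concurrent.atomic.AtomicInteger;

public class SharedReaderSketch {
    /** Hypothetical stand-in for an index reader; counts how many are opened. */
    public static final class FakeReader implements Closeable {
        public static final AtomicInteger OPENED = new AtomicInteger();
        private boolean closed;
        public FakeReader() { OPENED.incrementAndGet(); }
        public boolean isClosed() { return closed; }
        @Override public void close() { closed = true; }
    }

    private FakeReader shared; // opened once, reused by every caller

    /** Returns the one shared reader, opening it only on first use. */
    public synchronized FakeReader getReader() {
        if (shared == null) {
            shared = new FakeReader();
        }
        return shared;
    }

    /** Closes the shared reader, e.g. when the application shuts down. */
    public synchronized void shutdown() {
        if (shared != null) {
            shared.close();
            shared = null;
        }
    }

    public static void main(String[] args) {
        SharedReaderSketch holder = new SharedReaderSketch();
        FakeReader a = holder.getReader();
        FakeReader b = holder.getReader();
        System.out.println("same instance: " + (a == b));
        holder.shutdown();
    }
}
```

Every open is paired with exactly one close here, which is what keeps file handles from accumulating. Lucene also ships a `SearcherManager` class aimed at this use case; it is not used above because the snippet does not mention it.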

Re: Repeatability of results

2012-04-04 Thread Chris Hostetter
: OK this could make sense (floating point math is frustrating!). : : But, Lucene generally scores one document at a time, so in theory just : changing its docid shouldn't alter the order of float operations. i haven't thought this through, but couldn't scorer re-ordering in BooleanScorer2

RE: SweetSpotSimilarity

2012-02-28 Thread Chris Hostetter
: A picture -- or more precisely a graph -- would be worth a 1000 words. fair enough. I think the reason i never committed one initially was because the formula in the javadocs was trivial to plot in gnuplot... gnuplot> min=0 gnuplot> max=2 gnuplot> base=1.3 gnuplot> xoffset=10 gnuplot> set
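The snippet is cut off before the formula itself, so the following is a sketch under an assumption: that `min`, `max`, `base`, and `xoffset` are the four parameters of the hyperbolic tf curve documented for SweetSpotSimilarity, which has the shape of a shifted and scaled tanh. The class and method names here are invented for illustration, and nothing below calls into Lucene.

```java
public class SweetSpotTfSketch {
    /**
     * Shifted, scaled tanh-like curve:
     *   min + (max - min) / 2 * ((base^x - base^-x) / (base^x + base^-x) + 1)
     * where x = freq - xoffset.
     */
    public static double hyperbolicTf(double freq, double min, double max,
                                      double base, double xoffset) {
        double x = freq - xoffset;
        double up = Math.pow(base, x);
        double down = Math.pow(base, -x);
        double result = min + (max - min) / 2.0 * ((up - down) / (up + down) + 1.0);
        // For very large |x| one power overflows and the ratio becomes NaN;
        // fall back to the plateau the curve is approaching on that side.
        if (Double.isNaN(result)) {
            return x > 0 ? max : min;
        }
        return result;
    }

    public static void main(String[] args) {
        // Parameter values taken from the gnuplot session quoted above.
        for (int freq = 0; freq <= 20; freq += 5) {
            System.out.println(freq + " -> " + hyperbolicTf(freq, 0, 2, 1.3, 10));
        }
    }
}
```

With these values the curve sits halfway between `min` and `max` at `freq == xoffset` and flattens toward `max` as the frequency grows, which is the "sweet spot" behaviour a plot would make visible.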

RE: SweetSpotSimilarity

2012-02-28 Thread Chris Hostetter
: i'll try to get some graphs committed and linked to from the javadocs that : make it more clear how tweaking the settings affect the formula http://svn.apache.org/viewvc?rev=1294920&view=rev -Hoss
