synonym filter and offsets

2010-04-19 Thread Joe Calderon
hello *, I'm having issues with the synonym filter altering token offsets. My input text is "saturday night live"; it is tokenized by the whitespace tokenizer, yielding 3 tokens: [saturday, 0,8], [night, 9,14], [live, 15,19]. On indexing, these are passed through a synonym filter that has this line

Re: Field Collapsing: How to estimate total number of hits

2010-05-12 Thread Joe Calderon
Don't know if it's the best solution, but I have a field I facet on called type; it's either 0 or 1. Combined with collapse.facet=before, I just sum all the values of the facet field to get the total number found. If you don't have such a field, you can always add a field with a single value. --joe On Wed,
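
A minimal Python sketch of the summing step, assuming the facet counts arrive in Solr's flat JSON layout (value, count, value, count, ...) and that every document carries exactly one value in the faceted field; the field name and numbers are made up for illustration.

```python
def total_from_facet(facet_fields, field):
    """Estimate total hits by summing every bucket count for `field`."""
    flat = facet_fields[field]
    # The flat layout alternates value, count, value, count, ...
    counts = flat[1::2]
    return sum(counts)


# Hypothetical facet_counts.facet_fields section for a 0/1 "type" field
facet_fields = {"type": ["0", 120, "1", 80]}
print(total_from_facet(facet_fields, "type"))  # 200
```

If a document could hold several values in the faceted field, this sum would overcount, which is why a single-valued field is the safe choice.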

Re: how to have shards parameter by default

2010-06-10 Thread Joe Calderon
You've created an infinite loop: the shard you query calls all the other shards and itself, and so on. Create a separate requestHandler and query that, e.g. <requestHandler name="/distributed_select" class="solr.SearchHandler"> <lst name="defaults"> <str

Re: DismaxRequestHandler

2010-06-17 Thread Joe Calderon
The qs parameter affects matching, but you have to wrap your query in double quotes, e.g. q="oil spill"&qf=title description&qs=4&defType=dismax. I'm not sure how to formulate such a query to apply that rule just to description, maybe with nested queries... On Thu, Jun 17, 2010 at 12:01 PM, Blargy
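
For reference, a small Python sketch that builds the same dismax request parameters with proper URL encoding; it only assembles the query string (no request is sent), and the parameter values are the ones from the post above.

```python
from urllib.parse import urlencode

params = {
    "q": '"oil spill"',        # double quotes make this a phrase, so qs applies
    "qf": "title description",
    "qs": 4,                   # query slop for explicit phrases in q
    "defType": "dismax",
}
query_string = urlencode(params)
print(query_string)  # q=%22oil+spill%22&qf=title+description&qs=4&defType=dismax
```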

Re: Exact match on a filter

2010-06-17 Thread Joe Calderon
Use a copyField and index the copy as type string; exact matches on that field should then work, as the text won't be tokenized. On Thu, Jun 17, 2010 at 3:13 PM, Pete Chudykowski pchudykow...@shopzilla.com wrote: Hi, I'm trying with no luck to filter on the exact-match value of a field.

Re: DismaxRequestHandler

2010-06-17 Thread Joe Calderon
See Yonik's post on nested queries: http://www.lucidimagination.com/blog/2009/03/31/nested-queries-in-solr/ So, for example, I thought you could possibly do a dismax query across the main fields (in this case just title) and OR that with _query_:{!description:'oil spill'~4} On Thu, Jun 17, 2010 at

Re: federated / meta search

2010-06-17 Thread Joe Calderon
Yes, you can use distributed search across shards with different schemas, as long as the query only references overlapping fields. I usually test adding new fields or tokenizers on one shard and deploy only after I've verified it's working properly. On Thu, Jun 17, 2010 at 1:10 PM, Markus Jelsma

Re: SOLR partial string matching question

2010-06-22 Thread Joe Calderon
You want a combination of WhitespaceTokenizer and EdgeNGramFilter: http://lucene.apache.org/solr/api/org/apache/solr/analysis/WhitespaceTokenizerFactory.html http://lucene.apache.org/solr/api/org/apache/solr/analysis/EdgeNGramFilterFactory.html The first will create tokens for each word, the second
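
To make the combination concrete, here is a rough Python simulation of what the two stages produce: a whitespace split followed by front edge n-grams per token. It is an approximation of the analyzer chain, not Solr code, and the gram sizes are example values.

```python
def edge_ngrams(text, min_gram=1, max_gram=25):
    """Whitespace-split `text`, then emit front edge n-grams of each token."""
    grams = []
    for token in text.split():              # roughly what WhitespaceTokenizer does
        upper = min(max_gram, len(token))
        for n in range(min_gram, upper + 1):
            grams.append(token[:n])         # prefixes of length min_gram..upper
    return grams


print(edge_ngrams("solr rocks", 2, 4))  # ['so', 'sol', 'solr', 'ro', 'roc', 'rock']
```

Because every prefix of every word is indexed, a partial string typed by a user matches as an ordinary term.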

Re: preside != president

2010-06-28 Thread Joe Calderon
The general consensus among people who run into the problem you have is to use a plurals-only stemmer, a synonyms file, or a combination of both (for irregular nouns etc.). If you search the archives you can find info on a plurals stemmer. On Mon, Jun 28, 2010 at 6:49 AM, dar...@ontrenet.com wrote:

Re: Strange query behavior

2010-06-28 Thread Joe Calderon
splitOnCaseChange is creating multiple tokens from 3dsMax; disable it or enable catenateAll. Use the analysis page in the admin tool to see exactly how your text will be indexed by the analyzers without having to reindex your documents. Once you have it right, you can do a full reindex. On Mon, Jun 28,

Re: questions about Solr shards

2010-06-28 Thread Joe Calderon
There is a first-pass query to retrieve the top matching document ids (enough to cover start+rows) from every shard, along with the relevant sorting information. The document ids are then merged, sorted, and limited to the amount needed, and then a second query is sent for the rest of the documents' metadata. On Sun, Jun 27, 2010 at 7:32 PM, Babak
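
The two-pass flow can be simulated in a few lines of Python. This is a sketch of the idea only, assuming each shard is represented as a plain dict of doc id to score; the shard and document names are hypothetical, and the second pass is left as a comment rather than simulated.

```python
import heapq


def distributed_search(shards, start, rows):
    """Two-phase merge. `shards` maps shard name -> {doc_id: score}."""
    needed = start + rows
    # Phase 1: each shard returns only its own top `needed` ids plus the sort value
    candidates = []
    for shard_name, scored_docs in shards.items():
        top = heapq.nlargest(needed, scored_docs.items(), key=lambda kv: kv[1])
        candidates.extend((score, doc_id, shard_name) for doc_id, score in top)
    # Merge on the coordinating node: sort by score, then keep just the page
    candidates.sort(key=lambda c: c[0], reverse=True)
    page = candidates[start:start + rows]
    # Phase 2 (not simulated): a second request to each owning shard would
    # fetch the stored fields for exactly these ids.
    return [(doc_id, shard_name) for _, doc_id, shard_name in page]


shards = {"shard1": {"a1": 0.9, "a2": 0.5}, "shard2": {"b1": 0.8, "b2": 0.7}}
print(distributed_search(shards, 0, 3))  # [('a1', 'shard1'), ('b1', 'shard2'), ('b2', 'shard2')]
```

Note how deep paging gets expensive: every shard must return start+rows candidates, which is one reason people cap start.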

Re: dismax request handler without q

2010-07-20 Thread Joe Calderon
Try something like this: q.alt=*:*&fq=keyphrase:hotel, though if you don't need to query across multiple fields, dismax is probably not the best choice. On Tue, Jul 20, 2010 at 4:57 AM, olivier sallou olivier.sal...@gmail.com wrote: q will search in defaultSearchField if no field name is set, but

Re: Solr 1.4 release candidate

2009-10-14 Thread Joe Calderon
Maybe I'm just not familiar with the way the version numbers work in trunk, but when I build the latest nightly, the jars have names like *-1.5-dev.jar. Is that normal? On Wed, Oct 14, 2009 at 7:01 AM, Yonik Seeley yo...@lucidimagination.com wrote: Folks, we've been in code freeze since Monday

how to get field contents out of Document object

2009-10-14 Thread Joe Calderon
hello *, sorry if this seems like a dumb question; I'm still fairly new to working with Lucene/Solr internals. Given a Document object, what is the proper way to fetch an integer value for a field called num_in_stock? It is both indexed and stored. thx much --joe

lucene 2.9 bug

2009-10-16 Thread Joe Calderon
hello *, I've read in other threads that Lucene 2.9 had a serious bug in it, hence trunk moved to 2.9.1-dev. I'm wondering what the bug is, as I've been using the 2.9.0 version for the past few weeks with no problems. Is it critical to upgrade? --joe

max words/tokens

2009-10-20 Thread Joe Calderon
I have a pretty basic question: is there an existing analyzer that limits the number of words/tokens indexed from a field? Let's say I only wanted to index the top 25 words... thx much --joe
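
The behaviour being asked about amounts to truncating the token stream. A minimal Python sketch, assuming "top 25" means the first 25 tokens in document order and using a plain whitespace split as a stand-in for the real tokenizer:

```python
from itertools import islice


def limit_tokens(tokens, max_token_count=25):
    """Keep only the first `max_token_count` tokens of a token stream."""
    # islice stops pulling from the (possibly lazy) stream once the cap is hit
    return list(islice(tokens, max_token_count))


tokens = "the quick brown fox jumps".split()  # stand-in for a whitespace tokenizer
print(limit_tokens(tokens, 3))  # ['the', 'quick', 'brown']
```

In a real analyzer the same idea would be a TokenFilter that stops returning tokens after a counter reaches the limit.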

Re: max words/tokens

2009-10-20 Thread Joe Calderon
Cool, np, I just didn't want to duplicate code if that already existed. On Tue, Oct 20, 2009 at 12:49 PM, Yonik Seeley yo...@lucidimagination.com wrote: On Tue, Oct 20, 2009 at 1:53 PM, Joe Calderon calderon@gmail.com wrote: i have a pretty basic question, is there an existing analyzer

boostQParser and dismax

2009-10-22 Thread Joe Calderon
hello *, I was just reading over the wiki function query page and found this little gem for boosting recent docs that's much better than what I was doing before: recip(ms(NOW,mydatefield),3.16e-11,1,1). My question is, at the bottom it says The most effective way to use such a boost is to multiply
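
It helps to see what the formula actually does. recip(x,m,a,b) computes a/(m*x+b), and 3.16e-11 is roughly 1 divided by the number of milliseconds in a year, so the boost starts at 1 for a brand-new document and falls to about 0.5 after one year. A small Python check of that arithmetic (the year length used here is an assumption, 365.25 days):

```python
MS_PER_YEAR = 365.25 * 24 * 60 * 60 * 1000  # about 3.156e10 milliseconds


def recip(x, m, a, b):
    """Solr's recip function: a / (m*x + b)."""
    return a / (m * x + b)


def date_boost(age_ms):
    # Same constants as recip(ms(NOW,mydatefield),3.16e-11,1,1)
    return recip(age_ms, 3.16e-11, 1, 1)


print(date_boost(0))                # 1.0 for a document indexed right now
print(date_boost(MS_PER_YEAR))      # roughly 0.5 for a year-old document
print(date_boost(2 * MS_PER_YEAR))  # roughly 0.33 after two years
```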

field collapsing bug (java.lang.ArrayIndexOutOfBoundsException)

2009-10-23 Thread Joe Calderon
It seems to happen when sorting on anything besides strictly score; even score desc, num desc triggers it. Using the latest nightly and the 10/14 patch. Problem accessing /solr/core1/select. Reason: 4731592 java.lang.ArrayIndexOutOfBoundsException: 4731592 at

profiling solr

2009-10-26 Thread Joe Calderon
As a curiosity, I'd like to use a profiler to see where within Solr queries spend most of their time. I'm curious what tools, if any, others use for this type of task. I'm using Jetty as my servlet container, so ideally I'd like a profiler that's compatible with it. --joe

field collapsing exception

2009-10-26 Thread Joe Calderon
Found another exception. I can't find specific steps to reproduce besides starting with an unfiltered result and then, given an int field with values (1,2,3), filtering by 3 triggers it sometimes. This is in an index with very frequent updates and deletes. --joe java.lang.NullPointerException

tokenize after filters

2009-11-02 Thread Joe Calderon
Is it possible to tokenize a field on whitespace after some filters have been applied? E.g.: "A + W Root Beer". The field uses a keyword tokenizer to keep the string together, then it gets converted to "aw root beer" by a custom filter I've made. I now want to split that up into 3 tokens (aw, root,
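
The desired order of operations can be sketched in Python: keep the value whole, rewrite it, and only then split on whitespace. The rewrite rule below is a hypothetical stand-in for the custom filter (lowercase, then join single letters around "+"); it illustrates the pipeline, not a Solr plugin.

```python
import re


def custom_filter(text):
    """Hypothetical stand-in for the custom filter: lowercase, then join
    single letters around '+' ("a + w" -> "aw")."""
    text = text.lower()
    return re.sub(r"\b(\w) \+ (\w)\b", r"\1\2", text)


def analyze(value):
    kept_whole = value               # KeywordTokenizer: the whole value is one token
    rewritten = custom_filter(kept_whole)
    return rewritten.split()         # then split on whitespace


print(analyze("A + W Root Beer"))  # ['aw', 'root', 'beer']
```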

Re: apply a patch on solr

2009-11-03 Thread Joe Calderon
patch -p0 < /path/to/field-collapse-5.patch On Tue, Nov 3, 2009 at 7:48 PM, michael8 mich...@saracatech.com wrote: Hmmm, perhaps I jumped the gun. I just looked over the field collapse patch for SOLR-236 and each file listed in the patch has its own revision #. E.g. from

wildcard oddity

2009-12-15 Thread Joe Calderon
I'm trying to do a wildcard search: q=item_title:(gets*) returns no results, q=item_title:(gets) returns results, q=item_title:(get*) returns results. It seems like * at the end of a token is requiring a character: instead of matching 0 or more, it's acting like 1 or more. The text I'm trying to

Re: SOLR Performance Tuning: Pagination

2009-12-24 Thread Joe Calderon
FWIW, when implementing distributed search I ran into a similar problem, but then I noticed even Google doesn't let you go past page 1000; it's easier to just set a limit on start. On Thu, Dec 24, 2009 at 8:36 AM, Walter Underwood wun...@wunderwood.org wrote: When do users do a query like that?

boosting on string distance

2009-12-29 Thread Joe Calderon
hello *, I want to boost documents that match the query better. Currently I also index my field as a string and boost if I match the string field, but I'm wondering if it's possible to boost with the bf parameter using a formula with the function strdist(). I know one of the columns would be the field
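
For intuition about what an edit-distance boost would reward, here is a Python sketch of a normalized Levenshtein similarity. It assumes, as I understand Lucene's Levenshtein implementation used by strdist's "edit" measure, a normalization of 1 - distance / max(length); treat that normalization as an assumption rather than a guarantee.

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def edit_similarity(a, b):
    """Normalize edit distance to [0, 1], where 1.0 means identical."""
    if not a and not b:
        return 1.0
    return 1.0 - levenshtein(a, b) / max(len(a), len(b))


print(levenshtein("kitten", "sitting"))      # 3
print(edit_similarity("kitten", "sitting"))  # about 0.571
```

A title identical to the query scores 1.0, so a bf built on this kind of value favours closer matches.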

score = result of function query

2009-12-30 Thread Joe Calderon
How can I make the score be solely the output of a function query? The function query wiki page details something like q=boxname:findbox+_val_:"product(product(x,y),z)"&fl=*,score but that doesn't seem to work. --joe

analyzer type=query with NGramTokenFilterFactory forces phrase query

2009-12-31 Thread Joe Calderon
Hello *, I'm trying to make an index to support spelling errors/fuzzy matching. I've indexed my document titles with NGramFilterFactory minGramSize=2 maxGramSize=3; using the analysis page I can see the common grams match between the indexed value and the query value, however when I try to do a
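
The reason n-grams tolerate misspellings is that two spellings still share many grams when those grams are matched as independent terms. A Python sketch with the same gram sizes (2 to 3); the Jaccard overlap score is only an illustrative choice, not how Lucene scores the match.

```python
def char_ngrams(text, min_gram=2, max_gram=3):
    """All character n-grams of length min_gram..max_gram, as a set."""
    grams = set()
    for n in range(min_gram, max_gram + 1):
        for i in range(len(text) - n + 1):
            grams.add(text[i:i + n])
    return grams


def ngram_overlap(a, b):
    """Jaccard overlap of the two n-gram sets (0.0 when both are empty)."""
    ga, gb = char_ngrams(a), char_ngrams(b)
    union = ga | gb
    return len(ga & gb) / len(union) if union else 0.0


print(sorted(char_ngrams("beer")))  # ['be', 'bee', 'ee', 'eer', 'er']
print(ngram_overlap("recieve", "receive"))
```

If the query-side grams are combined into one phrase query instead, a single misspelled gram breaks the match, which is the behaviour this thread is about.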

Re: analyzer type=query with NGramTokenFilterFactory forces phrase query

2009-12-31 Thread Joe Calderon
If this is the expected behaviour, is there a way to override it?[1] [1] me On Thu, Dec 31, 2009 at 10:13 AM, AHMET ARSLAN iori...@yahoo.com wrote: Hello *, im trying to make an index to support spelling errors/fuzzy matching, ive indexed my document titles with NGramFilterFactory

custom wildcarding in qparser

2010-01-08 Thread Joe Calderon
hello *, what do I need to do to make a query parser that works just like the standard query parser but also runs analyzers/tokenizers on a wildcarded term? Specifically, I'm looking to only wildcard the last token. I've tried the edismax qparser and the prefix qparser and neither is exactly what

help implementing a couple of business rules

2010-01-11 Thread Joe Calderon
hello *, I'm looking for help on writing queries to implement a few business rules. 1. Given a set of fields, how to return matches that match across them but not just one specific one, e.g. I'm using a dismax parser currently but I want to exclude any results that only match against a field called

Re: help implementing a couple of business rules

2010-01-11 Thread Joe Calderon
matches, sorry if I was unclear --joe On Mon, Jan 11, 2010 at 10:13 AM, Erik Hatcher erik.hatc...@gmail.com wrote: On Jan 11, 2010, at 12:56 PM, Joe Calderon wrote: 1. given a set of fields how to return matches that match across them but not just one specific one, ex im using a dismax parser

Re: Solr 1.4 Field collapsing - What are the steps for applying the SOLR-236 patch?

2010-01-11 Thread Joe Calderon
It seems to be in flux right now as the Solr developers slowly make improvements and ingest the various pieces into the Solr trunk. I think your best bet might be to use the 12/24 patch and fix any errors where it doesn't apply cleanly. I'm using Solr trunk r892336 with the 12/24 patch. --joe

Re: question about date boosting

2010-01-12 Thread Joe Calderon
I think you need to use the new TrieDateField. On 01/12/2010 07:06 PM, Daniel Higginbotham wrote: Hello, I'm trying to boost results based on date using the first example here: http://wiki.apache.org/solr/SolrRelevancyFAQ#How_can_I_boost_the_score_of_newer_documents However, I'm getting an

Re: Field collapsing patch error

2010-01-19 Thread Joe Calderon
This has come up before; my suggestion would be to use the 12/24 patch with trunk revision 892336: http://www.lucidimagination.com/search/document/797549d29e1810d9/solr_1_4_field_collapsing_what_are_the_steps_for_applying_the_solr_236_patch 2010/1/19 Licinio Fernández Maurelo

create requesthandler with default shard parameter for different query parser

2010-01-21 Thread Joe Calderon
hello *, what is the best way to create a requestHandler for distributed search with a default shards parameter but that can use different query parsers? Thus far I have <requestHandler name="/ds" class="solr.SearchHandler"> <!-- default values for query parameters --> <lst name="defaults">

Re: create requesthandler with default shard parameter for different query parser

2010-01-21 Thread Joe Calderon
the main reason I'm creating the new request handler, or do I put them all as defaults under my new request handler and let the query parser use whichever ones it supports? On Thu, Jan 21, 2010 at 11:45 AM, Yonik Seeley yo...@lucidimagination.com wrote: On Thu, Jan 21, 2010 at 2:39 PM, Joe Calderon

Re: index of facet fields are not same as original string

2010-01-28 Thread Joe Calderon
Facets are based off the indexed version of your string, not the stored version; you probably have an analyzer that's removing punctuation. Most people index the same field multiple ways for different purposes: matching, sorting, faceting, etc. Index a copy of your field as string type and facet
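
A Python simulation of why the facet values differ: faceting on an analyzed text field counts the indexed tokens, while faceting on a string copy counts the untouched values. The "analyzer" here (lowercase, drop punctuation, split) and the sample values are rough stand-ins for illustration.

```python
import re
from collections import Counter


def text_tokens(value):
    """Rough stand-in for a text analyzer: lowercase, drop punctuation, split."""
    return re.findall(r"\w+", value.lower())


stored_values = ["Dr. Pepper", "A&W Root Beer", "Dr. Pepper"]

# Faceting on the analyzed field counts indexed tokens
text_facets = Counter(tok for v in stored_values for tok in text_tokens(v))
# Faceting on a string copy counts the original values as-is
string_facets = Counter(stored_values)

print(text_facets)
print(string_facets)  # Counter({'Dr. Pepper': 2, 'A&W Root Beer': 1})
```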

distributed search and failed core

2010-01-29 Thread Joe Calderon
hello *, in distributed search, when a shard goes down, an error is returned and the search fails. Is there a way to avoid the error and return the results from the shards that are still up? thx much --joe

Re: Basic indexing question

2010-02-02 Thread Joe Calderon
By default Solr will only search the default field; you have to either query all fields, field1:(ore) OR field2:(ore) OR field3:(ore), or use a different query parser like dismax. On Tue, Feb 2, 2010 at 3:31 PM, Stefan Maric sma...@ntlworld.com wrote: I have got a basic configuration of Solr up

Re: Basic indexing question

2010-02-02 Thread Joe Calderon
associated information into a presentable screen anyhow - so I'm not too worried about info being returned by Solr as such) Would that be a reasonable way of using Solr -----Original Message----- From: Joe Calderon [mailto:calderon@gmail.com] Sent: 02 February 2010 23:42 To: solr-user

Re: distributed search and failed core

2010-02-03 Thread Joe Calderon
a shard has failed On Wed, Feb 3, 2010 at 10:55 AM, Yonik Seeley yo...@lucidimagination.com wrote: On Fri, Jan 29, 2010 at 3:31 PM, Joe Calderon calderon@gmail.com wrote: hello *, in distributed search when a shard goes down, an error is returned and the search fails, is there a way to avoid

fuzzy matching / configurable distance function?

2010-02-04 Thread Joe Calderon
Is it possible to configure the distance formula used by fuzzy matching? I see there are others on the function query page under strdist, but I'm wondering if they are applicable to fuzzy matching. thx much --joe

source tree for lucene

2010-02-04 Thread Joe Calderon
I want to recompile Lucene with http://issues.apache.org/jira/browse/LUCENE-2230, but I'm not sure which source tree to use. I tried using the implied trunk revision from the admin/system page, but Solr fails to build with the generated jars, even if I exclude the patches from 2230... I'm wondering

old wildcard highlighting behaviour

2010-02-05 Thread Joe Calderon
hello *, currently with hl.usePhraseHighlighter=true, a query for (joe jack*) will highlight <em>joe jackson</em>. However, after reading the archives, what I'm looking for is the old 1.1 behaviour, so that only <em>joe jack</em> is highlighted. Is this possible in Solr 1.5? thx much --joe

Re: old wildcard highlighting behaviour

2010-02-06 Thread Joe Calderon
...@gmail.com wrote: On iPhone so don't remember exact param I named it, but check wiki - something like hl.highlightMultiTerm - set it to false. - Mark http://www.lucidimagination.com (mobile) On Feb 6, 2010, at 12:00 AM, Joe Calderon calderon@gmail.com wrote: hello *, currently

analysing wild carded terms

2010-02-09 Thread Joe Calderon
hello *, quick question: what would I have to change in the query parser to allow wildcarded terms to go through text analysis?

Re: analysing wild carded terms

2010-02-10 Thread Joe Calderon
Sorry, what I meant to say is apply text analysis to the part of the query that is wildcarded; for example, if a term with Latin-1 diacritics is wildcarded, I'd still like to run it through ISOLatin1AccentFilter. On Wed, Feb 10, 2010 at 4:59 AM, Fuad Efendi f...@efendi.ca wrote: hello *, quick question,
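
A client-side approximation of the idea: fold accents on the prefix before appending the wildcard, so it lines up with accent-folded index terms. This Python sketch uses Unicode decomposition, which covers letters like é or ü but not every mapping an accent filter performs (for example ß or ø); the field name is hypothetical.

```python
import unicodedata


def fold_accents(text):
    """Strip combining accents after Unicode NFKD decomposition."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def wildcard_query(field, prefix):
    # Lowercase and fold the prefix, then add the wildcard
    return f"{field}:{fold_accents(prefix.lower())}*"


print(wildcard_query("title", "Café"))  # title:cafe*
```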

Re: How to reindex data without restarting server

2010-02-11 Thread Joe Calderon
If you use the core model via solr.xml, you can reload a core without having to restart the servlet container: http://wiki.apache.org/solr/CoreAdmin On 02/11/2010 02:40 PM, Emad Mushtaq wrote: Hi, I would like to know if there is a way of reindexing data without restarting the server. Lets
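
A small Python sketch that builds a CoreAdmin RELOAD request URL; the host, port, and core name are placeholders for whatever your solr.xml defines, and the code only constructs the URL without contacting a server.

```python
from urllib.parse import urlencode


def core_reload_url(solr_base, core_name):
    """Build a CoreAdmin RELOAD request URL for one core."""
    query = urlencode({"action": "RELOAD", "core": core_name})
    return f"{solr_base}/admin/cores?{query}"


url = core_reload_url("http://localhost:8983/solr", "core0")
print(url)  # http://localhost:8983/solr/admin/cores?action=RELOAD&core=core0
# Sending a GET to this URL (e.g. with urllib.request.urlopen) asks Solr to reload core0.
```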

reloading sharedlib folder

2010-02-12 Thread Joe Calderon
When using solr.xml, you can specify a sharedLib directory to share among cores. Is it possible to reload the classes in this dir without having to restart the servlet container? It would be useful to be able to make changes to those classes on the fly or be able to drop in new plugins

problem with edgengramtokenfilter and highlighter

2010-02-13 Thread Joe Calderon
I ran into a problem while using the EdgeNGramTokenFilter: it seems to report incorrect offsets when generating tokens. More specifically, all the tokens have offset 0 and the term length as start and end, which leads to goofy highlighting behavior when creating edge grams for tokens beyond the first
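
For comparison, here is a Python sketch of the offsets one would expect: each edge gram keeps the start offset of the word it came from, so grams of the second word point into the second word. It illustrates the expected behaviour only; it is not the LUCENE-2266 patch.

```python
import re


def edge_ngrams_with_offsets(text, min_gram=1, max_gram=3):
    """Front edge n-grams with offsets that point back into `text`."""
    grams = []
    for match in re.finditer(r"\S+", text):  # whitespace-separated tokens
        token, start = match.group(), match.start()
        for n in range(min_gram, min(max_gram, len(token)) + 1):
            # Offsets are relative to the original text, so the highlighter
            # can find grams inside the second word too
            grams.append((token[:n], start, start + n))
    return grams


print(edge_ngrams_with_offsets("joe jackson", 1, 3))
```

With the buggy behaviour described above, the grams from "jackson" would instead report offsets starting at 0, so the highlighter would mark characters inside "joe".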

Re: problem with edgengramtokenfilter and highlighter

2010-02-14 Thread Joe Calderon
LUCENE-2266 filed and patch posted. On 02/13/2010 09:14 PM, Robert Muir wrote: Joe, can you open a Lucene JIRA issue for this? I just glanced at the code and it looks like a bug to me. On Sun, Feb 14, 2010 at 12:07 AM, Joe Calderon calderon@gmail.com wrote: i ran into a problem while

Re: defaultSearchField and DisMaxRequestHandler

2010-02-15 Thread Joe Calderon
No, but you can set a default for the qf parameter with the same value. On 02/15/2010 01:50 AM, Steve Radhouani wrote: Hi there, Can the defaultSearchField option be used by the DisMaxRequestHandler? Thanks, -Steve

Re: Reindex after changing defaultSearchField?

2010-02-17 Thread Joe Calderon
No, you're just changing how you're querying the index, not the actual index. You will need to restart the servlet container or reload the core for the config changes to take effect, though. On 02/17/2010 10:04 AM, Frederico Azeiteiro wrote: Hi, If i change the defaultSearchField in the core

Re: including 'the' dismax query kills results

2010-02-18 Thread Joe Calderon
Use the common grams filter; it'll create tokens that pair stop words with their adjacent terms. On Thu, Feb 18, 2010 at 7:16 AM, Nagelberg, Kallin knagelb...@globeandmail.com wrote: I've noticed some peculiar behavior with the dismax searchhandler. In my case I'm making the search The British Open,
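
A simplified Python simulation of the common-grams idea: every token is kept, and each adjacent pair that involves a common word also becomes a joined token. The underscore separator matches what I believe the CommonGrams filter uses, but positions and query-side handling are ignored here.

```python
def common_grams(tokens, common_words):
    """Emit each token, plus a joined bigram when either neighbour is common."""
    out = []
    for i, tok in enumerate(tokens):
        out.append(tok)
        if i + 1 < len(tokens):
            nxt = tokens[i + 1]
            if tok in common_words or nxt in common_words:
                out.append(f"{tok}_{nxt}")
    return out


print(common_grams(["the", "british", "open"], {"the"}))
# ['the', 'the_british', 'british', 'open']
```

Because "the_british" is a rare term, a query containing "the" can still match selectively instead of having the stop word dropped.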

Re: Autosuggest/Autocomplete with solr 1.4 and EdgeNGrams

2010-02-24 Thread Joe Calderon
I had to create an autosuggest implementation not too long ago. Originally I was using faceting, where I would match wildcards on a tokenized field and facet on an unaltered field. This had the advantage that I could do everything from one index, though it was also limited by the fact suggestions

Re: Changing term frequency according to value of one of the fields

2010-02-26 Thread Joe Calderon
Extend the Similarity class, compile it against the jars in lib, put it in a path Solr can find, and set your schema to use it: http://wiki.apache.org/solr/SolrPlugins#Similarity On 02/25/2010 10:09 PM, Pooja Verlani wrote: Hi, I want to modify Similarity class for my app like the following- Right

Re: Solr 1.4 distributed search configuration

2010-02-26 Thread Joe Calderon
You can set a default shards parameter on the request handler doing distributed search; you can set up two different request handlers, one with a shards default and one without. On Thu, Feb 25, 2010 at 1:35 PM, Jeffrey Zhao jeffrey.z...@metalogic-inc.com wrote: Now I got it, just forgot put qt=search

Re: Search Result differences Standard vs DisMax

2010-03-01 Thread Joe Calderon
What are you using for the mm parameter? If you set it to 1, only one word has to match. On 03/01/2010 05:07 PM, Steve Reichgut wrote: ***Sorry if this was sent twice. I had connection problems here and it didn't look like the first time it went out I have been testing out results for some
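
A Python sketch of how a simple mm value turns into a number of required words. It covers only plain integers and percentages, following the dismax rules as I understand them (percentages round down; negative values say how many may be missing); conditional specs such as "2<-25%" are out of scope.

```python
def min_should_match(spec, num_clauses):
    """Required optional-clause count for a simple mm spec.

    Handles only plain integers ("2", "-1") and percentages ("75%", "-25%").
    """
    spec = spec.strip()
    if spec.endswith("%"):
        pct = int(spec[:-1])
        computed = num_clauses * abs(pct) // 100  # percentages round down
        required = computed if pct >= 0 else num_clauses - computed
    else:
        value = int(spec)
        required = value if value >= 0 else num_clauses + value
    # Never require fewer than 0 or more than all clauses
    return max(0, min(required, num_clauses))


print(min_should_match("1", 4))    # 1: any single word is enough
print(min_should_match("75%", 3))  # 2
```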

Re: Issue on stopword list

2010-03-02 Thread Joe Calderon
Or you can try the CommonGrams filter, which combines tokens next to a stopword. On Tue, Mar 2, 2010 at 6:56 AM, Walter Underwood wun...@wunderwood.org wrote: Don't remove stopwords if you want to search on them. --wunder On Mar 2, 2010, at 5:43 AM, Erick Erickson wrote: This is a classic

Re: indexing a huge data

2010-03-05 Thread Joe Calderon
I've found the CSV update to be exceptionally fast, though others enjoy the flexibility of the DataImportHandler. On Fri, Mar 5, 2010 at 10:21 AM, Mark N nipen.m...@gmail.com wrote: what should be the fastest way to index a documents , I am indexing huge collection of data after extracting

Re: Highlighting

2010-03-09 Thread Joe Calderon
Did you enable the highlighting component in solrconfig.xml? Try setting debugQuery=true to see if the highlighting component is even being called... On Tue, Mar 9, 2010 at 12:23 PM, Lee Smith l...@weblee.co.uk wrote: Hey All I have indexed a whole bunch of documents and now I want to search

Re: Highlighting

2010-03-10 Thread Joe Calderon
Just to make sure we're on the same page, you're saying that the highlight section of the response is empty, right? The results section is never highlighted, but a separate section contains the highlighted fields specified in hl.fl= On Wed, Mar 10, 2010 at 5:23 AM, Ahmet Arslan iori...@yahoo.com

Re: Highlighting

2010-03-10 Thread Joe Calderon
with the query. But from what I believe it should wrap <em/> around the text in the result. So if I search ie Andrew within the return content Ie would have the contents with the word <em>Andrew</em> and hl.fl=attr_content Thank you for you help Begin forwarded message: From: Joe Calderon calderon

Re: Need help in deploying the modified SOLR source code

2010-03-12 Thread Joe Calderon
Do `ant clean dist` within the Solr source and use the resulting war file. In the future, though, you might think about extending the built-in parser and creating a parser plugin rather than modifying the actual sources; see http://wiki.apache.org/solr/SolrPlugins#QParserPlugin for more info

how to create this highlighter behaviour

2010-03-29 Thread Joe Calderon
hello *, I've been using the highlighter and have been pretty happy with its results; however, there's an edge case I'm not sure how to fix. For the query amazing grace, the record matched and highlighted is <em>amazing</em> rendition of <em>amazing grace</em>. Is there any way to only highlight amazing grace

highlighter issue

2010-04-02 Thread Joe Calderon
hello *, I have a field that is indexing the string "the ex-girlfriend" as these tokens: [the, exgirlfriend, ex, girlfriend]. Then they are passed to the edgengram filter; this allows me to match different user spellings and allows for partial highlighting. However, a token like 'ex' would get

Re: highlighter issue

2010-04-02 Thread Joe Calderon
wrote: Will adding the RemoveDuplicatesTokenFilter(Factory) do the trick here? Erik On Apr 2, 2010, at 4:13 PM, Joe Calderon wrote: hello *, i have a field that is indexing the string the ex-girlfriend as these tokens: [the, exgirlfriend, ex, girlfriend] then they are passed
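
A Python sketch of what removing duplicates means here: drop a token whose text already appeared at the same position. Tokens are modelled as (text, position) pairs, a simplification of the real token stream; the sample mimics edge grams of "exgirlfriend" and "ex" stacked at one position.

```python
def remove_duplicates(tokens):
    """Drop tokens whose (text, position) pair has already been emitted."""
    seen = set()
    out = []
    for text, position in tokens:
        key = (text, position)
        if key not in seen:
            seen.add(key)
            out.append((text, position))
    return out


# Edge grams of "exgirlfriend" and "ex" stacked at position 1, then "g" at 2
stream = [("e", 1), ("ex", 1), ("e", 1), ("ex", 1), ("g", 2)]
print(remove_duplicates(stream))  # [('e', 1), ('ex', 1), ('g', 2)]
```

The same text at a different position is kept, since it represents a genuine separate occurrence.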

Re: dealing with duplicates

2009-08-01 Thread Joe Calderon
: Joe Calderon calderon@gmail.com To: solr-user@lucene.apache.org Sent: Friday, July 31, 2009 5:06:48 PM Subject: dealing with duplicates hello all, i have a collection of a few million documents; i have many duplicates in this collection. they have been clustered with a simple algorithm, i

concurrent csv loading

2009-08-06 Thread Joe Calderon
For first-time loads I currently post to /update/csv?commit=false&separator=%09&escape=\&stream.file=workfile.txt&map=NULL:&keepEmpty=false. This works well and finishes in about 20 minutes for my workload. It is mostly CPU-bound: I have an 8-core box and it seems one core takes the brunt of the work.
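
Building that request URL by hand is error-prone (the tab and backslash need encoding), so here is a Python sketch that assembles the same CSV parameters with urlencode. The parameter meanings in the comments reflect my understanding of the CSV handler and should be double-checked against its documentation.

```python
from urllib.parse import parse_qs, urlencode

csv_params = [
    ("commit", "false"),
    ("separator", "\t"),        # tab-separated input (%09 on the wire)
    ("escape", "\\"),           # backslash as the escape character
    ("stream.file", "workfile.txt"),
    ("map", "NULL:"),           # map the literal NULL to an empty value
    ("keepEmpty", "false"),     # then drop empty values
]
url = "/update/csv?" + urlencode(csv_params)
print(url)

# Round-trip check: decoding gives back the original separator
print(parse_qs(url.split("?", 1)[1])["separator"])  # ['\t']
```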

Re: dealing with duplicates

2009-08-10 Thread Joe Calderon
- Original Message From: Joe Calderon calderon@gmail.com To: solr-user@lucene.apache.org Sent: Friday, July 31, 2009 5:06:48 PM Subject: dealing with duplicates hello all, i have a collection of a few million documents; i have many duplicates in this collection. they have been

where to get solr 1.4 nightly

2009-08-20 Thread Joe Calderon
I want to try out the improvements in 1.4, but the nightly site is down: http://people.apache.org/builds/lucene/solr/nightly/ Is there a mirror for nightlies? --joe

shingle filter

2009-08-24 Thread Joe Calderon
hello *, I'm currently faceting on a shingled field to obtain popular phrases and it's working well; however, I'd like to limit the number of shingles that get created. solr.ShingleFilterFactory supports maxShingleSize; can it be made to support a minimum as well? Can someone point me in the
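
A Python sketch of shingling with both a minimum and a maximum size, which is the behaviour being asked for. The min_size parameter is hypothetical here (the post notes only maxShingleSize exists), and unigrams are off by default in this sketch, unlike the real filter.

```python
def shingles(tokens, min_size=2, max_size=3, output_unigrams=False):
    """Word n-grams ("shingles") of min_size..max_size tokens, joined by spaces."""
    out = []
    for i in range(len(tokens)):
        if output_unigrams:
            out.append(tokens[i])
        for size in range(min_size, max_size + 1):
            if i + size <= len(tokens):
                out.append(" ".join(tokens[i:i + size]))
    return out


words = ["please", "divide", "this"]
print(shingles(words, 2, 3))  # ['please divide', 'please divide this', 'divide this']
print(shingles(words, 3, 3))  # ['please divide this']
```

Raising the minimum cuts the number of short, very common shingles, which keeps the facet list focused on longer phrases.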

extended documentation on analyzers

2009-08-27 Thread Joe Calderon
Is there an online resource or a book that contains a thorough list of the available tokenizers and filters and their functionality? http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters is very helpful, but I would like to go through additional filters to make sure I'm not reinventing the wheel

Re: Responses getting truncated

2009-08-28 Thread Joe Calderon
I had a similar issue with text from past requests showing up. This was on a 1.3 nightly; I switched to using the Lucid build of 1.3 and the problem went away. I'm using a nightly of 1.4 right now, also without problems. Then again, your mileage may vary, as I also made a bunch of schema changes that

Re: Responses getting truncated

2009-08-28 Thread Joe Calderon
Yonik has a point; when I ran into this I also upgraded to the latest stable Jetty. I'm using Jetty 6.1.18. On 08/28/2009 04:07 PM, Rupert Fiasco wrote: I deployed LucidWorks with my existing solrconfig / schema and re-indexed my data into it and pushed it out to production, we'll see how it

score = sum of boosts

2009-09-02 Thread Joe Calderon
hello *, what would be the best approach to return the sum of boosts as the score? E.g.: a dismax handler boosts matches to field1^100 and field2^50; a query matches both fields, hence the score for that row would be 150. Is this something I could do with a function query, or do I need to hack up

stemming plurals

2009-09-04 Thread Joe Calderon
I saw some posts regarding stemming plurals in the archives from 2008. I was wondering if this was ever integrated or if custom hackery is still needed. Is there something like a stemplurals analyzer? Is the KStemmer the closest thing? thx much --joe

Re: Geographic clustering

2009-09-08 Thread Joe Calderon
There are clustering libraries like http://bonsai.ims.u-tokyo.ac.jp/~mdehoon/software/cluster/ that have bindings to Perl/Python; you can preprocess your results and create clusters for each zoom level. On Tue, Sep 8, 2009 at 8:08 AM, gwk g...@eyefi.nl wrote: Hi, I just completed a simple

help with solr.PatternTokenizerFactory

2009-09-09 Thread Joe Calderon
hello *, I'm not sure what I'm doing wrong. I have this field defined in schema.xml, and using admin/analysis.jsp it's working as expected: <fieldType name="text_spell" class="solr.TextField"> <analyzer> <charFilter class="solr.HTMLStripCharFilterFactory" /> <tokenizer

query parser question

2009-09-10 Thread Joe Calderon
I have a field called text_stem that has a KStemmer on it, and I'm having trouble matching wildcard searches on a word that got stemmed. For example, I index the word america's, which according to analysis.jsp gets indexed as america after stemming. When matching, I do a query like myfield:(ame*), which

Re: KStem download

2009-09-14 Thread Joe Calderon
Is the source for the Lucid KStemmer available? From the Lucid Solr package I only found the compiled jars. On Mon, Sep 14, 2009 at 11:04 AM, Yonik Seeley yo...@lucidimagination.com wrote: On Mon, Sep 14, 2009 at 1:56 PM, darniz rnizamud...@edmunds.com wrote: Pascal Dimassimo wrote: Hi, I

field collapsing sums

2009-09-30 Thread Joe Calderon
hello all, I have a question on the field collapsing patch. Say I have an integer field called num_in_stock and I collapse by some other column: is it possible to sum up that integer field and return the total in the output? If not, how would I go about extending the collapsing component to support

changing dismax parser to not treat symbols differently

2009-09-30 Thread Joe Calderon
How would I go about modifying the dismax parser to treat +/- as regular text?

Re: field collapsing sums

2009-10-01 Thread Joe Calderon
efficient. Cheers, Uri Joe Calderon wrote: hello all, i have a question on the field collapsing patch, say i have an integer field called num_in_stock and i collapse by some other column, is it possible to sum up that integer field and return the total in the output, if not how would i go

Re: field collapsing sums

2009-10-01 Thread Joe Calderon
thx for the reply. I just want the number of dupes in the query result, but it seems I don't get the correct totals. For example, a non-collapsed dismax query for belgian beer returns X results, but when I collapse and sum the number of docs under collapse_counts, it's much less than X. It

JVM OOM when using field collapse component

2009-10-01 Thread Joe Calderon
I've gotten two different out-of-memory errors while using the field collapsing component, using the latest patch (2009-09-26) and the latest nightly. Has anyone else encountered similar problems? My collection is 5 million results, but I've gotten the error collapsing as little as a few thousand

Re: JVM OOM when using field collapse component

2009-10-02 Thread Joe Calderon
of a few million. Martijn 2009/10/2 Joe Calderon calderon@gmail.com: i gotten two different out of memory errors while using the field collapsing component, using the latest patch (2009-09-26) and the latest nightly, has anyone else encountered similar problems? my collection is 5

Re: stats page slow in latest nightly

2009-10-06 Thread Joe Calderon
when Hoss made my life easier with his simpler patch. Yonik Seeley wrote: Might be the new Lucene fieldCache stats stuff that was recently added? -Yonik http://www.lucidimagination.com On Tue, Oct 6, 2009 at 3:56 PM, Joe Calderon calderon@gmail.com wrote: hello *, ive been noticing

concatenating tokens

2009-10-08 Thread Joe Calderon
hello *, I'm using a combination of tokenizers and filters that give me the desired tokens; however, for a particular field I want to concatenate these tokens back into a single string. Is there a filter to do that? If not, what are the steps needed to make my own filter to concatenate tokens? For
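
The core of such a filter is small: consume the whole stream, join the token texts, and emit one token whose offsets span from the first token's start to the last token's end. A Python sketch, modelling tokens as (text, start, end) tuples; the space separator is an assumption.

```python
def concatenate_tokens(tokens, separator=" "):
    """Collapse (text, start, end) tokens into one token spanning all of them."""
    if not tokens:
        return []
    text = separator.join(tok for tok, _, _ in tokens)
    # Offsets run from the first token's start to the last token's end
    return [(text, tokens[0][1], tokens[-1][2])]


print(concatenate_tokens([("root", 4, 8), ("beer", 9, 13)]))  # [('root beer', 4, 13)]
```

In a Lucene TokenFilter the same logic would read all tokens from the input on the first call, then emit the single joined token and report end of stream.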