hello *, im having issues with the synonym filter altering token offsets,
my input text is
saturday night live
it is tokenized by the whitespace tokenizer yielding 3 tokens
[saturday, 0, 8], [night, 9, 14], [live, 15, 19]
on indexing these are passed through a synonym filter that has this line
dont know if its the best solution, but i have a field i facet on
called type, its values are 0 or 1; combined with collapse.facet=before i just
sum all the values of the facet field to get the total number found.
if you dont have such a field you can always add a field with a single value
--joe
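as a sketch of the summing step (the field name and response shape here are hypothetical, following the usual facet_fields layout of alternating value/count pairs):

```python
# Sum every count in a facet_fields entry to get the total number found.
# Solr returns facet_fields values as [value1, count1, value2, count2, ...].
def total_from_facets(facet_fields, field):
    counts = facet_fields[field]
    return sum(counts[1::2])  # counts sit at the odd positions

facets = {"type": ["0", 120, "1", 35]}  # hypothetical facet output
print(total_from_facets(facets, "type"))  # 155
```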
On Wed,
youve created an infinite loop: the shard you query calls all other
shards and itself, and so on. create a separate requestHandler and
query that, ex:
<requestHandler name="/distributed_select" class="solr.SearchHandler">
  <lst name="defaults">
    <str
the qs parameter affects matching, but you have to wrap your query in
double quotes, ex:
q="oil spill"&qf=title description&qs=4&defType=dismax
im not sure how to formulate such a query to apply that rule just to
description, maybe with nested queries ...
On Thu, Jun 17, 2010 at 12:01 PM, Blargy
use a copyField and index the copy as type string, exact matches on
that field should then work as the text wont be tokenized
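a minimal schema sketch (the field names here are hypothetical, not from the original thread):

```xml
<!-- hypothetical field names; the string copy is never tokenized,
     so filtering on it matches the whole value exactly -->
<field name="title" type="text" indexed="true" stored="true"/>
<field name="title_exact" type="string" indexed="true" stored="false"/>
<copyField source="title" dest="title_exact"/>
```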
On Thu, Jun 17, 2010 at 3:13 PM, Pete Chudykowski
pchudykow...@shopzilla.com wrote:
Hi,
I'm trying with no luck to filter on the exact-match value of a field.
see yonik's post on nested queries
http://www.lucidimagination.com/blog/2009/03/31/nested-queries-in-solr/
so for example i thought you could possibly do a dismax query across
the main fields (in this case just title) and OR that with
_query_:"{!lucene}description:\"oil spill\"~4"
On Thu, Jun 17, 2010 at
yes, you can use distributed search across shards with different
schemas as long as the query only references overlapping fields, i
usually test adding new fields or tokenizers on one shard and deploy
only after i verified its working properly
On Thu, Jun 17, 2010 at 1:10 PM, Markus Jelsma
you want a combination of WhitespaceTokenizer and EdgeNGramFilter
http://lucene.apache.org/solr/api/org/apache/solr/analysis/WhitespaceTokenizerFactory.html
http://lucene.apache.org/solr/api/org/apache/solr/analysis/EdgeNGramFilterFactory.html
the first will create tokens for each word the second
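a sketch of such a fieldType (the name and gram sizes are guesses to adapt):

```xml
<fieldType name="text_prefix" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- leading-edge grams of each word enable prefix matching -->
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="15" side="front"/>
  </analyzer>
</fieldType>
```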
the general consensus among people who run into the problem you have
is to use a plurals only stemmer, a synonyms file or a combination of
both (for irregular nouns etc)
if you search the archives you can find info on a plurals stemmer
On Mon, Jun 28, 2010 at 6:49 AM, dar...@ontrenet.com wrote:
splitOnCaseChange is creating multiple tokens from 3dsMax; disable it
or enable catenateAll. use the analysis page in the admin tool to see
exactly how your text will be indexed by analyzers without having to
reindex your documents, once you have it right you can do a full
reindex.
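the relevant filter line would look something like this (a sketch; other attributes stay at whatever you already use):

```xml
<!-- either stop splitting on case changes, or also emit the concatenated
     form so a term like 3dsMax survives as one token -->
<filter class="solr.WordDelimiterFilterFactory"
        splitOnCaseChange="0" catenateAll="1"
        generateWordParts="1" generateNumberParts="1"/>
```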
On Mon, Jun 28,
there is a first pass query to retrieve all matching document ids from
every shard along with relevant sorting information, the document ids
are then sorted and limited to the amount needed, then a second query
is sent for the rest of the documents' metadata.
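a rough sketch of that first-pass merge (the shard data and ascending sort order are made up for illustration):

```python
import heapq

# First pass: each shard returns (sort_key, doc_id) pairs for its matches,
# already sorted. The coordinator merges the streams and keeps only the top
# `rows` ids; the second pass then fetches full documents for just those ids.
def merge_first_pass(shard_results, rows):
    merged = heapq.merge(*shard_results)  # lazy k-way merge of sorted lists
    return [doc_id for _, doc_id in list(merged)[:rows]]

shard_a = [(0.1, "a1"), (0.5, "a2")]  # hypothetical shard responses,
shard_b = [(0.2, "b1"), (0.9, "b2")]  # sorted ascending for simplicity
print(merge_first_pass([shard_a, shard_b], 2))  # ['a1', 'b1']
```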
On Sun, Jun 27, 2010 at 7:32 PM, Babak
try something like this:
q.alt=*:*&fq=keyphrase:hotel
though if you dont need to query across multiple fields, dismax is
probably not the best choice
On Tue, Jul 20, 2010 at 4:57 AM, olivier sallou
olivier.sal...@gmail.com wrote:
q will search in defaultSearchField if no field name is set, but
maybe im just not familiar with the way the version numbers work in
trunk but when i build the latest nightly the jars have names like
*-1.5-dev.jar, is that normal?
On Wed, Oct 14, 2009 at 7:01 AM, Yonik Seeley
yo...@lucidimagination.com wrote:
Folks, we've been in code freeze since Monday
hello *, sorry if this seems like a dumb question, im still fairly new
to working with lucene/solr internals.
given a Document object, what is the proper way to fetch an integer
value for a field called num_in_stock, it is both indexed and stored
thx much
--joe
hello *, ive read in other threads that lucene 2.9 had a serious bug
in it, hence trunk moved to 2.9.1 dev, im wondering what the bug is as
ive been using the 2.9.0 version for the past weeks with no problems,
is it critical to upgrade?
--joe
i have a pretty basic question, is there an existing analyzer that
limits the number of words/tokens indexed from a field? let say i only
wanted to index the top 25 words...
thx much
--joe
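a sketch of the behavior such a limiting filter would need (a plain whitespace split stands in for the real tokenizer):

```python
# Keep only the first N tokens of a field before indexing; everything after
# the cap is simply dropped.
def limit_tokens(text, max_tokens=25):
    return text.split()[:max_tokens]

print(limit_tokens("one two three four five", 3))  # ['one', 'two', 'three']
```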
cool np, i just didnt want to duplicate code if that already existed.
On Tue, Oct 20, 2009 at 12:49 PM, Yonik Seeley
yo...@lucidimagination.com wrote:
On Tue, Oct 20, 2009 at 1:53 PM, Joe Calderon calderon@gmail.com wrote:
i have a pretty basic question, is there an existing analyzer
hello *, i was just reading over the wiki function query page and
found this little gem for boosting recent docs thats much better than
what i was doing before
recip(ms(NOW,mydatefield),3.16e-11,1,1)
my question is, at the bottom it says
The most effective way to use such a boost is to multiply
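to see why that constant works, recip(x, m, a, b) is just a/(m*x + b); a quick sketch (the one-year figure is approximate):

```python
# Solr's recip(x, m, a, b) computes a / (m*x + b). With x = ms(NOW, mydatefield)
# and m = 3.16e-11 (roughly 1 / milliseconds-per-year), scores halve per year.
def recip(x, m, a, b):
    return a / (m * x + b)

MS_PER_YEAR = 1000 * 60 * 60 * 24 * 365
print(recip(0, 3.16e-11, 1, 1))  # 1.0 for a doc dated right now
print(round(recip(MS_PER_YEAR, 3.16e-11, 1, 1), 2))  # ~0.5 for a year-old doc
```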
seems to happen when sorting on anything besides strictly score; even
score desc, num desc triggers it. using latest nightly and 10/14 patch
Problem accessing /solr/core1/select. Reason:
4731592
java.lang.ArrayIndexOutOfBoundsException: 4731592
at
as a curiosity i'd like to use a profiler to see where solr
queries spend most of their time, im curious what tools if any others
use for this type of task..
im using jetty as my servlet container so ideally i'd like a profiler
thats compatible with it
--joe
found another exception, i cant find specific steps to reproduce
besides starting with an unfiltered result and then given an int field
with values (1,2,3) filtering by 3 triggers it sometimes, this is in
an index with very frequent updates and deletes
--joe
java.lang.NullPointerException
is it possible to tokenize a field on whitespace after some filters
have been applied:
ex: A + W Root Beer
the field uses a keyword tokenizer to keep the string together, then
it will get converted to aw root beer by a custom filter ive made, i
now want to split that up into 3 tokens (aw, root,
patch -p0 < /path/to/field-collapse-5.patch
On Tue, Nov 3, 2009 at 7:48 PM, michael8 mich...@saracatech.com wrote:
Hmmm, perhaps I jumped the gun. I just looked over the field collapse patch
for SOLR-236 and each file listed in the patch has its own revision #.
E.g. from
im trying to do a wild card search
q=item_title:(gets*) returns no results
q=item_title:(gets) returns results
q=item_title:(get*) returns results
seems like * at the end of a token is requiring a character, instead
of being 0 or more its acting like 1 or more
the text im trying to
fwiw, when implementing distributed search i ran into a similar
problem, but then i noticed even google doesnt let you go past page
1000, easier to just set a limit on start
On Thu, Dec 24, 2009 at 8:36 AM, Walter Underwood wun...@wunderwood.org wrote:
When do users do a query like that?
hello *, i want to boost documents that match the query better,
currently i also index my field as a string and boost if i match the
string field
but im wondering if its possible to boost with the bf parameter using
a formula with the function strdist(), i know one of the columns would
be the field
how can i make the score be solely the output of a function query?
the function query wiki page details something like
q=boxname:findbox+_val_:"product(product(x,y),z)"&fl=*,score
but that doesnt seem to work
--joe
Hello *, im trying to make an index to support spelling errors/fuzzy
matching, ive indexed my document titles with NGramFilterFactory
minGramSize=2 maxGramSize=3, using the analysis page i can see the
common grams match between the indexed value and the query value,
however when i try to do a
if this is the expected behaviour is there a way to override it?[1]
[1] me
On Thu, Dec 31, 2009 at 10:13 AM, AHMET ARSLAN iori...@yahoo.com wrote:
Hello *, im trying to make an index
to support spelling errors/fuzzy
matching, ive indexed my document titles with
NGramFilterFactory
hello *, what do i need to do to make a query parser that works just
like the standard query parser but also runs analyzers/tokenizers on a
wildcarded term, specifically im looking to wildcard only the last
token
ive tried the edismax qparser and the prefix qparser and neither is
exactly what
hello *, im looking for help on writing queries to implement a few
business rules.
1. given a set of fields how to return matches that match across them
but not just one specific one, ex im using a dismax parser currently
but i want to exclude any results that only match against a field
called
matches
sorry if i was unclear
--joe
On Mon, Jan 11, 2010 at 10:13 AM, Erik Hatcher erik.hatc...@gmail.com wrote:
On Jan 11, 2010, at 12:56 PM, Joe Calderon wrote:
1. given a set of fields how to return matches that match across them
but not just one specific one, ex im using a dismax parser
it seems to be in flux right now as the solr developers slowly make
improvements and ingest the various pieces into the solr trunk, i think
your best bet might be to use the 12/24 patch and fix any errors where
it doesnt apply cleanly
im using solr trunk r892336 with the 12/24 patch
--joe
I think you need to use the new trieDateField
On 01/12/2010 07:06 PM, Daniel Higginbotham wrote:
Hello,
I'm trying to boost results based on date using the first example
here:http://wiki.apache.org/solr/SolrRelevancyFAQ#How_can_I_boost_the_score_of_newer_documents
However, I'm getting an
this has come up before, my suggestions would be to use the 12/24
patch with trunk revision 892336
http://www.lucidimagination.com/search/document/797549d29e1810d9/solr_1_4_field_collapsing_what_are_the_steps_for_applying_the_solr_236_patch
2010/1/19 Licinio Fernández Maurelo
hello *, what is the best way to create a requesthandler for
distributed search with a default shards parameter but that can use
different query parsers
thus far i have
<requestHandler name="/ds" class="solr.SearchHandler">
  <!-- default values for query parameters -->
  <lst name="defaults">
the main reason im creating the new request
handler, or do i put them all as defaults under my new request handler
and let the query parser use whichever ones it supports?
On Thu, Jan 21, 2010 at 11:45 AM, Yonik Seeley
yo...@lucidimagination.com wrote:
On Thu, Jan 21, 2010 at 2:39 PM, Joe Calderon
facets are based off the indexed version of your string, not the stored
version, you probably have an analyzer thats removing punctuation.
most people index the same field multiple ways for different purposes:
matching, sorting, faceting etc...
index a copy of your field as string type and facet
hello *, in distributed search when a shard goes down, an error is
returned and the search fails, is there a way to avoid the error and
return the results from the shards that are still up?
thx much
--joe
by default solr will only search the default field, you have to
either query all fields field1:(ore) or field2:(ore) or field3:(ore)
or use a different query parser like dismax
On Tue, Feb 2, 2010 at 3:31 PM, Stefan Maric sma...@ntlworld.com wrote:
I have got a basic configuration of Solr up
associated
information into a presentable screen anyhow - so I'm not too worried about
info being returned by Solr as such)
Would that be a reasonable way of using Solr
-Original Message-
From: Joe Calderon [mailto:calderon@gmail.com]
Sent: 02 February 2010 23:42
To: solr-user
a shard has failed
On Wed, Feb 3, 2010 at 10:55 AM, Yonik Seeley
yo...@lucidimagination.com wrote:
On Fri, Jan 29, 2010 at 3:31 PM, Joe Calderon calderon@gmail.com wrote:
hello *, in distributed search when a shard goes down, an error is
returned and the search fails, is there a way to avoid
is it possible to configure the distance formula used by fuzzy
matching? i see there are others listed on the function query page under
strdist, but im wondering if they are applicable to fuzzy matching
thx much
--joe
i want to recompile lucene with
http://issues.apache.org/jira/browse/LUCENE-2230, but im not sure
which source tree to use, i tried using the implied trunk revision
from the admin/system page but solr fails to build with the generated
jars, even if i exclude the patches from 2230...
im wondering
hello *, currently with hl.usePhraseHighlighter=true, a query for (joe
jack*) will highlight <em>joe jackson</em>, however after reading the
archives, what im looking for is the old 1.1 behaviour so that only
<em>joe jack</em> is highlighted, is this possible in solr 1.5 ?
thx much
--joe
...@gmail.com wrote:
On iPhone so don't remember exact param I named it, but check wiki -
something like hl.highlightMultiTerm - set it to false.
- Mark
http://www.lucidimagination.com (mobile)
On Feb 6, 2010, at 12:00 AM, Joe Calderon calderon@gmail.com wrote:
hello *, currently
hello *, quick question, what would i have to change in the query
parser to allow wildcarded terms to go through text analysis?
sorry, what i meant to say is apply text analysis to the part of the
query that is wildcarded, for example if a term with latin1 diacritics
is wildcarded i'd still like to run it through ISOLatin1Filter
On Wed, Feb 10, 2010 at 4:59 AM, Fuad Efendi f...@efendi.ca wrote:
hello *, quick question,
if you use the core model via solr.xml you can reload a core without
having to restart the servlet container,
http://wiki.apache.org/solr/CoreAdmin
On 02/11/2010 02:40 PM, Emad Mushtaq wrote:
Hi,
I would like to know if there is a way of reindexing data without restarting
the server. Lets
when using solr.xml, you can specify a sharedlib directory to share
among cores, is it possible to reload the classes in this dir without
having to restart the servlet container? it would be useful to be able
to make changes to those classes on the fly or be able to drop in new
plugins
i ran into a problem while using the edgengramtokenfilter, it seems to
report incorrect offsets when generating tokens, more specifically all
the tokens have a start offset of 0 and the term length as the end offset, this leads
to goofy highlighting behavior when creating edge grams for tokens
beyond the first
lucene-2266 filed and patch posted.
On 02/13/2010 09:14 PM, Robert Muir wrote:
Joe, can you open a Lucene JIRA issue for this?
I just glanced at the code and it looks like a bug to me.
On Sun, Feb 14, 2010 at 12:07 AM, Joe Calderoncalderon@gmail.comwrote:
i ran into a problem while
no but you can set a default for the qf parameter with the same value
On 02/15/2010 01:50 AM, Steve Radhouani wrote:
Hi there,
Can the defaultSearchField option be used by the DisMaxRequestHandler?
Thanks,
-Steve
no, youre just changing how youre querying the index, not the actual
index; you will need to restart the servlet container or reload the core
for the config changes to take effect tho
On 02/17/2010 10:04 AM, Frederico Azeiteiro wrote:
Hi,
If i change the defaultSearchField in the core
use the common grams filter, itll create tokens for stop words and
their adjacent terms
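a sketch of the filter line (the stopwords file name is just the conventional one):

```xml
<!-- pairs each stopword with its neighbors, so e.g. "the british"
     is indexed as a single token alongside the plain words -->
<filter class="solr.CommonGramsFilterFactory" words="stopwords.txt" ignoreCase="true"/>
```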
On Thu, Feb 18, 2010 at 7:16 AM, Nagelberg, Kallin
knagelb...@globeandmail.com wrote:
I've noticed some peculiar behavior with the dismax searchhandler.
In my case I'm making the search The British Open,
i had to create a autosuggest implementation not too long ago,
originally i was using faceting, where i would match wildcards on a
tokenized field and facet on an unaltered field, this had the
advantage that i could do everything from one index, though it was
also limited by the fact suggestions
extend the similarity class, compile it against the jars in lib, put it in
a path solr can find and set your schema to use it
http://wiki.apache.org/solr/SolrPlugins#Similarity
On 02/25/2010 10:09 PM, Pooja Verlani wrote:
Hi,
I want to modify Similarity class for my app like the following-
Right
you can set a default shard parameter on the request handler doing
distributed search; you can set up two different request handlers, one
with a shards default and one without
On Thu, Feb 25, 2010 at 1:35 PM, Jeffrey Zhao
jeffrey.z...@metalogic-inc.com wrote:
Now I got it, just forgot put qt=search
what are you using for the mm parameter? if you set it to 1 only one
word has to match,
On 03/01/2010 05:07 PM, Steve Reichgut wrote:
***Sorry if this was sent twice. I had connection problems here and it
didn't look like the first time it went out
I have been testing out results for some
or you can try the commongrams filter that combines tokens next to a stopword
On Tue, Mar 2, 2010 at 6:56 AM, Walter Underwood wun...@wunderwood.org wrote:
Don't remove stopwords if you want to search on them. --wunder
On Mar 2, 2010, at 5:43 AM, Erick Erickson wrote:
This is a classic
ive found the csv update to be exceptionally fast, though others enjoy
the flexibility of the data import handler
On Fri, Mar 5, 2010 at 10:21 AM, Mark N nipen.m...@gmail.com wrote:
what should be the fastest way to index a documents , I am indexing huge
collection of data after extracting
did you enable the highlighting component in solrconfig.xml? try setting
debugQuery=true to see if the highlighting component is even being
called...
On Tue, Mar 9, 2010 at 12:23 PM, Lee Smith l...@weblee.co.uk wrote:
Hey All
I have indexed a whole bunch of documents and now I want to search
just to make sure were on the same page, youre saying that the
highlight section of the response is empty right? the results section
is never highlighted but a separate section contains the highlighted
fields specified in hl.fl=
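a sketch of pulling those out (the response slice below is hypothetical):

```python
# Highlights live in a separate "highlighting" section keyed by document id;
# the docs in the main results section are returned unaltered.
response = {
    "response": {"docs": [{"id": "42", "attr_content": "meeting with Andrew"}]},
    "highlighting": {"42": {"attr_content": ["meeting with <em>Andrew</em>"]}},
}

def snippets_for(resp, doc_id, field):
    # empty list when the field produced no snippet for this doc
    return resp["highlighting"].get(doc_id, {}).get(field, [])

print(snippets_for(response, "42", "attr_content"))
```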
On Wed, Mar 10, 2010 at 5:23 AM, Ahmet Arslan iori...@yahoo.com
with the query.
But from what I believe it should wrap <em></em> around the text in the result.
So if I search ie Andrew within the return content Ie would have the
contents with the word <em>Andrew</em>
and hl.fl=attr_content
Thank you for you help
Begin forwarded message:
From: Joe Calderon calderon
do `ant clean dist` within the solr source and use the resulting war
file, though in the future you might think about extending the built in
parser and creating a parser plugin rather than modifying the actual sources
see http://wiki.apache.org/solr/SolrPlugins#QParserPlugin for more info
hello *, ive been using the highlighter and been pretty happy with
its results, however theres an edge case im not sure how to fix
for query: amazing grace
the record matched and highlighted is
<em>amazing</em> rendition of <em>amazing grace</em>
is there any way to only highlight amazing grace
hello *, i have a field that is indexing the string "the
ex-girlfriend" as these tokens: [the, exgirlfriend, ex, girlfriend]
then they are passed to the edgengram filter, this allows me to match
different user spellings and allows for partial highlighting, however
a token like 'ex' would get
wrote:
Will adding the RemoveDuplicatesTokenFilter(Factory) do the trick here?
Erik
On Apr 2, 2010, at 4:13 PM, Joe Calderon wrote:
hello *, i have a field that is indexing the string the
ex-girlfriend as these tokens: [the, exgirlfriend, ex, girlfriend]
then they are passed
: Joe Calderon calderon@gmail.com
To: solr-user@lucene.apache.org
Sent: Friday, July 31, 2009 5:06:48 PM
Subject: dealing with duplicates
hello all, i have a collection of a few million documents; i have many
duplicates in this collection. they have been clustered with a simple
algorithm, i
for first time loads i currently post to
/update/csv?commit=false&separator=%09&escape=\&stream.file=workfile.txt&map=NULL:&keepEmpty=false,
this works well and finishes in about 20 minutes for my work load.
this is mostly cpu bound, i have an 8 core box and it seems one core takes
the brunt of the work.
- Original Message
From: Joe Calderon calderon@gmail.com
To: solr-user@lucene.apache.org
Sent: Friday, July 31, 2009 5:06:48 PM
Subject: dealing with duplicates
hello all, i have a collection of a few million documents; i have many
duplicates in this collection. they have been
i want to try out the improvements in 1.4 but the nightly site is down
http://people.apache.org/builds/lucene/solr/nightly/
is there a mirror for nightlies?
--joe
hello *, im currently faceting on a shingled field to obtain popular
phrases and its working well, however i'd like to limit the number of
shingles that get created, the solr.ShingleFilterFactory supports
maxShingleSize, can it be made to support a minimum as well? can
someone point me in the
is there an online resource or a book that contains a thorough list of
tokenizers and filters available and their functionality?
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters
is very helpful but i would like to go through additional filters to
make sure im not reinventing the wheel
i had a similar issue with text from past requests showing up, this was
on 1.3 nightly, i switched to using the lucid build of 1.3 and the
problem went away, im using a nightly of 1.4 right now also without
probs, then again your mileage may vary as i also made a bunch of schema
changes that
yonik has a point, when i ran into this i also upgraded to the latest
stable jetty, im using jetty 6.1.18
On 08/28/2009 04:07 PM, Rupert Fiasco wrote:
I deployed LucidWorks with my existing solrconfig / schema and
re-indexed my data into it and pushed it out to production, we'll see
how it
hello *, what would be the best approach to return the sum of boosts
as the score?
ex:
a dismax handler boosts matches to field1^100 and field2^50, a query
matches both fields hence the score for that row would be 150
is this something i could do with a function query or do i need to
hack up
i saw some post regarding stemming plurals in the archives from 2008,
i was wondering if this was ever integrated or if custom hackery is
still needed, is there something like a stem-plurals analyzer, or is the
kstemmer the closest thing?
thx much
--joe
there are clustering libraries like
http://bonsai.ims.u-tokyo.ac.jp/~mdehoon/software/cluster/, that have
bindings to perl/python, you can preprocess your results and create
clusters for each zoom level
On Tue, Sep 8, 2009 at 8:08 AM, gwkg...@eyefi.nl wrote:
Hi,
I just completed a simple
hello *, im not sure what im doing wrong i have this field defined in
schema.xml, using admin/analysis.jsp its working as expected,
<fieldType name="text_spell" class="solr.TextField">
  <analyzer>
    <charFilter class="solr.HTMLStripCharFilterFactory"/>
    <tokenizer
i have a field called text_stem that has a kstemmer on it, im having
trouble matching wildcard searches on a word that got stemmed
for example i index the word america's, which according to
analysis.jsp after stemming gets indexed as america
when matching i do a query like myfield:(ame*) which
is the source for the lucid kstemmer available ? from the lucid solr
package i only found the compiled jars
On Mon, Sep 14, 2009 at 11:04 AM, Yonik Seeley
yo...@lucidimagination.com wrote:
On Mon, Sep 14, 2009 at 1:56 PM, darniz rnizamud...@edmunds.com wrote:
Pascal Dimassimo wrote:
Hi,
I
hello all, i have a question on the field collapsing patch, say i have
an integer field called num_in_stock and i collapse by some other
column, is it possible to sum up that integer field and return the
total in the output, if not how would i go about extending the
collapsing component to support
how would i go about modifying the dismax parser to treat +/- as regular text?
efficient.
Cheers,
Uri
Joe Calderon wrote:
hello all, i have a question on the field collapsing patch, say i have
an integer field called num_in_stock and i collapse by some other
column, is it possible to sum up that integer field and return the
total in the output, if not how would i go
thx for the reply, i just want the number of dupes in the query
result, but it seems i dont get the correct totals,
for example a non collapsed dismax query for belgian beer returns X
number results
but when i collapse and sum the number of docs under collapse_counts,
its much less than X
it
ive gotten two different out of memory errors while using the field
collapsing component, using the latest patch (2009-09-26) and the
latest nightly,
has anyone else encountered similar problems? my collection is 5
million results but ive gotten the error collapsing as little as a few
thousand
of a few million.
Martijn
2009/10/2 Joe Calderon calderon@gmail.com:
i gotten two different out of memory errors while using the field
collapsing component, using the latest patch (2009-09-26) and the
latest nightly,
has anyone else encountered similar problems? my collection is 5
when Hoss made my
life easier with his simpler patch.
Yonik Seeley wrote:
Might be the new Lucene fieldCache stats stuff that was recently added?
-Yonik
http://www.lucidimagination.com
On Tue, Oct 6, 2009 at 3:56 PM, Joe Calderon calderon@gmail.com wrote:
hello *, ive been noticing
hello *, im using a combination of tokenizers and filters that give me
the desired tokens, however for a particular field i want to
concatenate these tokens back to a single string, is there a filter to
do that, if not what are the steps needed to make my own filter to
concatenate tokens?
for
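the concatenating step such a custom filter would do can be sketched like this (plain python standing in for a real TokenFilter):

```python
# Buffer every token coming from the upstream analyzer, then emit a single
# joined token once the stream is exhausted.
def concatenate_tokens(tokens, sep=" "):
    return [sep.join(tokens)] if tokens else []

print(concatenate_tokens(["aw", "root", "beer"]))  # ['aw root beer']
```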