impact of omitTermFreqAndPositions=true

2012-01-10 Thread Samarendra Pratap
Hi, I understand that setting omitTermFreqAndPositions=true for a field in schema.xml stores less information in the index, with some restrictions, e.g. on phrase search. But does setting this property to true for a field which is of type string or int, or is analyzed by KeywordAnalyzer, make any

Question about updating index with custom field types

2012-01-10 Thread 罗赛
Hello everyone, I have a question on how to update the index using XML messages when there are some complex custom field types in my index, like: <fieldtype name="offer" class="com.xxx.OfferField"/> And the field offer has some attributes in it... I've read the page http://wiki.apache.org/solr/UpdateXmlMessages

Re: best way to force substitutions in data

2012-01-10 Thread Dmitry Kan
how about using regular expressions: http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.PatternReplaceCharFilterFactory On Tue, Jan 10, 2012 at 1:14 AM, geeky2 gee...@hotmail.com wrote: Hello all, i have been reading the solr book as well as searching the archives of this

Re: best way to force substitutions in data

2012-01-10 Thread Gora Mohanty
On Tue, Jan 10, 2012 at 4:44 AM, geeky2 gee...@hotmail.com wrote: [...] i have a database with approximately 7Million rows that i am bringing in to solr. for a very small sub-set of these 7Million rows (about 130 rows), i need to substitute an old part number for a new part number.  i know
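For a substitution list this small, one option is to rewrite the part numbers before the rows reach Solr. The following is only a minimal sketch of that idea, not the approach recommended in the thread: the part numbers and the function name are hypothetical, and a lookup table is used here in place of the PatternReplaceCharFilterFactory mentioned above.

```python
import re

# Hypothetical old -> new part numbers; the real list would hold about 130 entries.
SUBSTITUTIONS = {
    "OLD-1001": "NEW-2001",
    "OLD-1002": "NEW-2002",
}

# One alternation over every old part number, bounded so that a longer
# number such as OLD-10010 is not rewritten by accident.
PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(old) for old in SUBSTITUTIONS) + r")\b"
)


def substitute_part_numbers(text):
    """Replace each known old part number in text with its new number."""
    return PATTERN.sub(lambda match: SUBSTITUTIONS[match.group(1)], text)
```

Calling substitute_part_numbers on each row's part-number field before indexing leaves the other rows untouched, since the pattern matches only the listed numbers.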

Facet Query using Dates

2012-01-10 Thread Mauro Asprea
Hi, I'm having issues using the new way of faceting dates with Query Facets. The issue is that it returns wrong counts. I tested it using a Date Facet instead, and the Date Facet did return correct counts. I'm using the Sunspot RSolr client and I'm also using the new folding/group feature.

Re: Facet Query using Dates

2012-01-10 Thread Mauro Asprea
I think I solved it... It seems to be because of the - that's just before the query facet name -- Mauro Asprea E-Mail: mauroasp...@gmail.com Mobile: +34 654297582 Skype: mauro.asprea On Tuesday, January 10, 2012 at 11:33 AM, Mauro Asprea wrote: Hi, I'm having issues using the new way

Re: Solr core as a dispatcher

2012-01-10 Thread Hector Castro
In my case the cores are populated with different records that adhere to the same schema. The question about randomly distributing requests is because each core has the `shards` parameter populated so that it can hit the other core's indexes. My question is more about the advantages (if any)

Two documents with same ID but different hash

2012-01-10 Thread Hyttinen Lauri
Hello, I sent some data into the solr/lucene index but when I query the data I see weird results. There are documents with identical id fields but they have different hash values. Apart from the hash values the results are the same. I thought it was impossible to have documents with same

RE: Match raw query string

2012-01-10 Thread McCarroll, Robert
Thank you for your patience and assistance. XML is not my forte, but layoffs and attrition have reduced IT staff well below minimum functional levels here. Thanks to your help, the exact title matches have made it to the first page of results. Robert McCarroll Systems Administration NYS

Re: how to rebuild snowball lib in solr

2012-01-10 Thread Erick Erickson
On a very quick glance, it looks like the source is at: ./lucene/contrib/analyzers/common/src/java/org/tartarus/snowball and from there just compile Lucene and/or Solr as you normally would. See: http://wiki.apache.org/solr/HowToContribute Best Erick On Mon, Jan 9, 2012 at 2:13 PM,

Re: Two documents with same ID but different hash

2012-01-10 Thread Hyttinen Lauri
Hello again, Well, after further review the IDs are different. The difference was just so small that I missed it after staring at it for a few hours. BR, Lauri On 01/10/2012 02:20 PM, Hyttinen Lauri wrote: Hello, I sent some data into the solr/lucene index but when I query the data I see weird

Re: Do Highlighting + proximity using surround query parser

2012-01-10 Thread Ahmet Arslan
I am not able to do highlighting with surround query parser on the returned results. I have tried the highlighting component but it does not return highlighted results. Highlighter does not recognize Surround Query. It must be re-written to enable highlighting in

Re: best way to force substitutions in data

2012-01-10 Thread geeky2
thank you both for the information. Gora, when you mentioned: - For keeping both values, use synonyms. what did you mean exactly. mark -- View this message in context: http://lucene.472066.n3.nabble.com/best-way-to-force-substitutions-in-data-tp3646195p3647920.html Sent from the Solr -

Re: Doing url search in solr is slow

2012-01-10 Thread yu shen
Hi Erick, I changed all my url fields into text (they were string fields before), and added a WordDelimiterFilterFactory, so that url fields can be tokenized into several words. But I still got around 15 seconds response time measured using debugQuery=on, and most of the time is still spent on

Re: best way to force substitutions in data

2012-01-10 Thread Gora Mohanty
On Tue, Jan 10, 2012 at 9:04 PM, geeky2 gee...@hotmail.com wrote: thank you both for the information. Gora, when you mentioned: - For keeping both values, use synonyms. what did you mean exactly. [...] Please take a look at

Re: Two documents with same ID but different hash

2012-01-10 Thread Erick Erickson
I have no idea what you mean by different hash, and you haven't provided much information to go on here. What is your evidence that the document is in the index twice? If you're inspecting the index at a low level that's expected, since documents are just marked as deleted, not immediately removed

Re: Solr core as a dispatcher

2012-01-10 Thread Shawn Heisey
On 1/9/2012 5:15 PM, Hector Castro wrote: Hi, Has anyone had success with multicore single node Solr configurations that have one core acting solely as a dispatcher for the other cores? For example, say you had 4 populated Solr cores – configure a 5th to be the definitive endpoint with

How to debug DIH with MySQL?

2012-01-10 Thread Walter Underwood
I see a missing required title field for every document when I'm using DIH. Yes, these documents have titles in the database. Is there a way to see what exact queries are sent to MySQL or received by MySQL? Here is a relevant chunk of the dataConfig: <entity name="book" query="select * from

Re: How to debug DIH with MySQL?

2012-01-10 Thread dan whelan
Just a guess, but this might need to change from ${biblio.id} to ${book.id}, since the entity name is book instead of biblio. On 1/10/12 10:37 AM, Walter Underwood wrote: I see a missing required title field for every document when I'm using DIH. Yes, these documents have titles in the
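The fix above turns on how a placeholder such as ${book.id} is looked up by entity name. The sketch below is a simplified model of that lookup, not DIH's actual implementation: the resolve function, the SQL text, and the table and column names are hypothetical, and it assumes an unknown entity name resolves to an empty string.

```python
import re

# Matches placeholders of the form ${entity.column}.
PLACEHOLDER = re.compile(r"\$\{(\w+)\.(\w+)\}")


def resolve(template, rows_by_entity):
    """Fill ${entity.column} placeholders from the current row of each entity.

    Assumption: an unknown entity or column resolves to an empty string.
    """
    def lookup(match):
        entity, column = match.group(1), match.group(2)
        return str(rows_by_entity.get(entity, {}).get(column, ""))

    return PLACEHOLDER.sub(lookup, template)


# The parent entity is named "book", so only ${book.id} finds a value.
rows = {"book": {"id": 42}}
good = resolve("select title from titles where book_id = ${book.id}", rows)
bad = resolve("select title from titles where book_id = ${biblio.id}", rows)
```

Under that assumption the second query is left with an empty value after the equals sign, so the child query returns no title, which would be consistent with the missing-field symptom reported here.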

Re: How to debug DIH with MySQL?

2012-01-10 Thread Walter Underwood
Thanks! That looks like it fixed the problem. This list continues to be awesome. Is the function of the name attribute actually described in the docs? I could not figure out what it was for. wunder On Jan 10, 2012, at 10:41 AM, dan whelan wrote: just a guess but this might need to change

Re: How to debug DIH with MySQL?

2012-01-10 Thread Gora Mohanty
On Wed, Jan 11, 2012 at 12:37 AM, Walter Underwood wun...@wunderwood.org wrote: Thanks! That looks like it fixed the problem. This list continues to be awesome. Is the function of the name attribute actually described in the docs? I could not figure out what it was for. Yes, it is, though

Re: How to debug DIH with MySQL?

2012-01-10 Thread Walter Underwood
Right, but that says exactly nothing about how that identifier is used. --wunder On Jan 10, 2012, at 11:23 AM, Gora Mohanty wrote: On Wed, Jan 11, 2012 at 12:37 AM, Walter Underwood wun...@wunderwood.org wrote: Thanks! That looks like it fixed the problem. This list continues to be awesome.

SpellCheck Help

2012-01-10 Thread Donald Organ
I am trying to get the IndexBasedSpellChecker to work. I believe I have everything set up properly and the spellcheck component seems to be running, but the suggestions list is empty. I am using SOLR 3.5 with Jetty. My solrconfig.xml and schema.xml are as follows: solrconfig.xml:

RE: SpellCheck Help

2012-01-10 Thread Dyer, James
Three things to check: 1. Use a higher spellcheck.count than 1. Try 10. IndexBasedSpellChecker pre-filters the possibilities in a first pass of a 2-pass process. If spellcheck.count is too low, all the good suggestions might get filtered on the first pass and then it won't find anything on
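The first suggestion can be tried straight from a request URL. This is a small sketch that only builds the URL: the host, port, handler path and misspelled query are assumptions, and it presumes the handler at that path already has the spellcheck component configured.

```python
from urllib.parse import urlencode


def spellcheck_url(base_url, query, count=10):
    """Build a select URL that asks the spellchecker for up to count suggestions."""
    params = {
        "q": query,
        "spellcheck": "true",
        "spellcheck.count": count,
    }
    return base_url + "?" + urlencode(params)


# Hypothetical local Solr instance and a deliberately misspelled query.
url = spellcheck_url("http://localhost:8983/solr/select", "moive")
```

Raising count here changes only the spellcheck.count parameter, so the effect on the suggestions list can be compared without touching solrconfig.xml.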

Re: SpellCheck Help

2012-01-10 Thread Donald Organ
my copyField was defined as copyfield --- notice the lowercase f On Tue, Jan 10, 2012 at 2:50 PM, Dyer, James james.d...@ingrambook.com wrote: Three things to check: 1. Use a higher spellcheck.count than 1. Try 10. IndexBasedSpellChecker pre-filters the possibilities in a first pass

Stemming numbers

2012-01-10 Thread Tanner Postert
We've had some issues with people searching for a document with the search term '200 movies'. The document is actually titled 'two hundred movies'. Do we need to add every number to our synonyms dictionary to accomplish this? Is it best done at index or search time?
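If the synonyms route is taken, the entries need not be typed by hand. Here is a rough sketch that generates comma-separated synonym lines for the numbers 0 to 999; the function names are hypothetical, and whether multi-word synonyms like these behave well at query time depends on the analyzer setup, so this is an illustration rather than a recommendation.

```python
ONES = [
    "zero", "one", "two", "three", "four", "five", "six", "seven", "eight",
    "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
    "sixteen", "seventeen", "eighteen", "nineteen",
]
TENS = ["", "", "twenty", "thirty", "forty", "fifty",
        "sixty", "seventy", "eighty", "ninety"]


def number_to_words(n):
    """Spell out an integer from 0 to 999 in English, words separated by spaces."""
    if not 0 <= n < 1000:
        raise ValueError("only 0..999 is supported in this sketch")
    if n < 20:
        return ONES[n]
    if n < 100:
        tens, ones = divmod(n, 10)
        return TENS[tens] + (" " + ONES[ones] if ones else "")
    hundreds, rest = divmod(n, 100)
    words = ONES[hundreds] + " hundred"
    return words + (" " + number_to_words(rest) if rest else "")


def synonym_line(n):
    """Format one comma-separated synonym entry, e.g. for a synonyms file."""
    return f"{n}, {number_to_words(n)}"
```

Writing synonym_line(n) for each n in range(1000) to a file gives a thousand entries, which shows how quickly this approach grows if larger numbers must be covered as well.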

RE: ignoreTikaException value

2012-01-10 Thread TRAN-NGOC Minh
Thanks for your reply. I added the argument in the solrconfig.xml and it worked like a charm. Thanks again Minh -Original Message- From: Koji Sekiguchi [mailto:k...@r.email.ne.jp] Sent: mardi 10 janvier 2012 01:25 To: solr-user@lucene.apache.org Subject: Re: ignoreTikaException value

Re: Stemming numbers

2012-01-10 Thread Ted Dunning
On Tue, Jan 10, 2012 at 5:32 PM, Tanner Postert tanner.post...@gmail.com wrote: We've had some issues with people searching for a document with the search term '200 movies'. The document is actually titled 'two hundred movies'. Do we need to add every number to our synonyms dictionary to

Re: Stemming numbers

2012-01-10 Thread Tanner Postert
You mention that is one way to do it; is there another I'm not seeing? On Jan 10, 2012, at 4:34 PM, Ted Dunning ted.dunn...@gmail.com wrote: On Tue, Jan 10, 2012 at 5:32 PM, Tanner Postert tanner.post...@gmail.com wrote: We've had some issues with people searching for a document with the

Re: Stemming numbers

2012-01-10 Thread Ted Dunning
I was afraid you would say that. See http://fora.tv/2009/10/14/ACM_Data_Mining_SIG_Ted_Dunning#fullprogram, click on the Recommendations section to skip to the good part. The point is that cross recommendation can let you learn what sorts of rewrites of this kind are needed. The idea is that

Re: Stemming numbers

2012-01-10 Thread Otis Gospodnetic
Hi Tanner, Here is another simple way: AutoComplete. You know what your users are searching for, you can identify top queries and you can identify common queries that are not finding matches.  This all allows you to figure out what to feed in AutoComplete.  And hopefully your AutoComplete

Re: stopwords as privacy measure

2012-01-10 Thread Michael Lissner
It's a bit of a privacy through obscurity measure, unfortunately. The problem is that American courts do a lousy job of removing social security numbers from cases that I put on my site. I do anonymization before sending the cases to Solr, but if you're clever (and the stopwords weren't in

Re: Solr core as a dispatcher

2012-01-10 Thread shlomi java
Straying a bit from the subject, don't you think it would be useful to have the shards parameter used also when indexing, in order to maintain document uniqueness? I mean as an out of the box feature of Solr. Because the situation today is that a Solr client working with a sharded Solr is

Multiple Sort for Group/Folding

2012-01-10 Thread Mauro Asprea
Hi, I'm having some issues trying to sort my grouped results by more than one field. If I use just one, regardless of which one, it works fine (I mean it sorts). I have a case where the first sorting key is equal for all the head docs of each group, so I expect to return the groups
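For reference, a multi-key sort on grouped results is normally expressed as a comma-separated sort parameter alongside group=true and group.field. The sketch below only assembles those parameters; the field names are hypothetical and it does not reproduce or diagnose the problem described above.

```python
from urllib.parse import urlencode


def grouped_sort_params(group_field, sort_keys):
    """Build request parameters for a grouped query sorted by several keys.

    sort_keys is a list of (field, direction) pairs in priority order.
    """
    sort = ",".join(f"{field} {direction}" for field, direction in sort_keys)
    return {
        "q": "*:*",
        "group": "true",
        "group.field": group_field,
        "sort": sort,
    }


# Hypothetical fields: groups tie on start_date, so score breaks the tie.
params = grouped_sort_params("event_id", [("start_date", "asc"), ("score", "desc")])
query_string = urlencode(params)
```

In Solr's result grouping, sort orders the groups themselves while group.sort orders the documents inside each group, so a tie-breaking key for groups belongs in sort.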

Re: FastVectorHighlighter wiki corrections

2012-01-10 Thread Michael Lissner
Hi, I didn't hear any responses here, so I went ahead and made a bunch of changes to the highlighting parameters wiki: - Highlighter is now known as Original Highlighter so it's more clear that Highlighter doesn't just refer to the highlighting utilities generally. - I need help with