Dismax, Sharding and Elevation

2011-01-13 Thread Oliver Marahrens
Hi all, I have discovered a strange thing with Dismax and Elevation and hope someone can enlighten me on what to do. Whenever I search for something using the elevation Request Handler, the hits are from a normal Lucene query (with elevated results if the search term was defined in elevation.xml).

Re: spell suggest response

2011-01-13 Thread Grijesh.singh
I have done similar work earlier by combining the spell-check component with auto-suggest. Autosuggest provides the words starting with the query term, and spellcheck returns words similar to it. I combined both sets of suggestions in a single list for display - Grijesh -- View this

Re: Dismax, Sharding and Elevation

2011-01-13 Thread Grijesh.singh
From what I have seen in the code for QueryElevationComponent, there is no support for Distributed Search, i.e. query elevation does not work with shards. - Grijesh -- View this message in context: http://lucene.472066.n3.nabble.com/Dismax-Sharding-and-Elevation-tp2247156p2247522.html Sent from the Solr

range queries in solr

2011-01-13 Thread ur lops
Hi, I am sorry to ask this silly question, but I could not find the documentation about this and I am very new to Lucene/Solr. I want to run a range query on one of the multivalued fields, e.g. I have a point, say [10,20], which is the point of intersection of the diagonals of a rectangle. Now I

Solr + Hadoop

2011-01-13 Thread Joan
Hi, I'm trying to build a Solr index with MapReduce (Hadoop) and I'm using https://issues.apache.org/jira/browse/SOLR-1301 but I have a problem with the Hadoop version and this patch. When I compile this patch with Hadoop version 0.21.0, I don't have any problem, but when I'm trying to run my job in

Re: Question on deleting all rows for an index

2011-01-13 Thread kenf_nc
If this is a one-time cleanup, not something you need to do programmatically, you could delete the index directory ( solrDir/data/index ). In my case I have to stop Tomcat, delete .\index and restart Tomcat. It is very fast and starts me out with a fresh, empty, index. Noticed you are multi-core,

Re: basic document crud in an index

2011-01-13 Thread kenf_nc
A/ You have to update all the fields, if you leave one off, it won't be in the document anymore. I have my 'persisted' data stored outside of Solr, so on update I get the stored data, modify it and update Solr with every field (even if one changed). You could also do a Query/Modify/Update
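The Query/Modify/Update pattern mentioned above can be sketched roughly as follows. This is a minimal Python sketch, not Solr client code: the dict stands in for whatever external store holds the persisted data, and `build_full_update` is a hypothetical helper name. The point it illustrates is that the document sent back must carry every field, not only the one that changed.

```python
def build_full_update(stored_doc, changes):
    """Return a complete document: every stored field, with the changes applied.

    Hypothetical helper for illustration; it does not talk to Solr.
    """
    updated = dict(stored_doc)   # copy, so the stored original is left untouched
    updated.update(changes)      # overwrite only the fields that changed
    return updated


# The persisted copy of the document, kept outside of Solr (assumed layout).
stored = {"id": "42", "title": "Old title", "author": "kenf_nc"}

# Only the title changes, but the resulting document still contains all fields.
full_doc = build_full_update(stored, {"title": "New title"})
```

The resulting `full_doc` is what would be re-submitted to the index; sending only `{"id": "42", "title": "New title"}` would drop the `author` field from the indexed document.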

Solr boolean operators

2011-01-13 Thread Xavier Schepler
Hi, with the Lucene query syntax, is : a AND (a OR b) equivalent to : a (absorption) ?

Re: basic document crud in an index

2011-01-13 Thread Markus Jelsma
To fill the gaps: b. the old version remains on disk but is flagged for deletion d. optimize equals merging, the difference is how many segments come out e. yes On Thursday 13 January 2011 15:21:54 kenf_nc wrote: A/ You have to update all the fields, if you leave one off, it won't be in the

Re: Solr boolean operators

2011-01-13 Thread dante stroe
To my understanding: in terms of the results that will be matched by your query ... it's the same. In terms of the score of the results, no: if you are using the first query, the documents that match both the a and the b terms will score higher than the ones matching just the a
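The matching half of this answer is the absorption law, which can be checked with plain sets. A minimal Python sketch, assuming a toy four-document corpus (the document ids and terms are made up for illustration); it models only which documents match, not how Lucene scores them.

```python
# Toy corpus: document id -> set of terms it contains (illustrative data only).
docs = {1: {"a"}, 2: {"a", "b"}, 3: {"b"}, 4: {"c"}}


def matching(term):
    """Return the ids of all documents containing the given term."""
    return {doc_id for doc_id, terms in docs.items() if term in terms}


# Query "a"
only_a = matching("a")

# Query "a AND (a OR b)": intersection of a with the union of a and b
a_and_a_or_b = matching("a") & (matching("a") | matching("b"))
```

Both queries select the same documents. The score difference described above comes from the extra optional clause matching b, which this set model deliberately leaves out.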

Re: Solr boolean operators

2011-01-13 Thread Xavier SCHEPLER
Ok, thanks. That's what I expected :D From: dante stroe dante.st...@gmail.com Sent: Thu Jan 13 15:56:33 CET 2011 To: solr-user@lucene.apache.org Subject: Re: Solr boolean operators To my understanding: in terms of the results that will be matched

Get nearby words?

2011-01-13 Thread darren
Hi, Is there a way to get the relevant nearby words in the entire index given a single word? I want to know all the relevance ranked words before and after the queried word. thanks for any tips. Darren

Re: Multi-word exact keyword case-insensitive search suggestions

2011-01-13 Thread Adam Estrada
Hi, the following seems to work pretty well. <fieldType name="text_ws" class="solr.TextField" positionIncrementGap="100"> <analyzer> <tokenizer class="solr.KeywordTokenizerFactory" /> <filter class="solr.ShingleFilterFactory" maxShingleSize="4" outputUnigrams="true"

Re: segment gets corrupted (after background merge ?)

2011-01-13 Thread Stéphane Delprat
I understand less and less what is happening to my Solr. I did a checkIndex (without -fix) and there was an error... So I did another checkIndex with -fix and then the error was gone. The segment was alright. During checkIndex I do not shut down the Solr server, I just make sure no client

Re: StopFilterFactory and qf containing some fields that use it and some that do not

2011-01-13 Thread Jonathan Rochkind
It's a known 'issue' in dismax, (really an inherent part of dismax's design with no clear way to do anything about it), that qf over fields with different stop word definitions will produce odd results for a query with a stopword. Here's my understanding of what's going on:

Re: Tuning StatsComponent

2011-01-13 Thread Johannes Goll
What field type do you recommend for a float stats.field for optimal Solr 1.4.1 StatsComponent performance ? float, pfloat or tfloat ? Do you recommend to index the field ? 2011/1/12 stockii st...@shopgate.com my field Type is double maybe sint is better ? but i need double ... =( --

RE: StopFilterFactory and qf containing some fields that use it and some that do not

2011-01-13 Thread Dyer, James
I appreciate the reply and blog posting. For now, I just enabled stopwords for all the fields on Qf. We have a very short list anyhow and our legacy search engine didn't even allow field-by-field configuration (stopwords are global on that system). I do wonder...what if (e)dismax had a flag

Re: Improving Solr performance

2011-01-13 Thread supersoft
On the one hand, I found those comments about the reasons for sharding really interesting. The documentation agrees with you about why to split an index into several shards (big size problems) but I don't find any explanation about the drawbacks as an Access Control List. I guess there should be some

Re: Term frequency across multiple documents

2011-01-13 Thread Ahmet Arslan
So you are interested in collection frequency of words. TermsComponent gives you document frequency of terms. You can modify it to give collection frequency info. http://search-lucene.com/m/of5Fn1PUOHU/ --- On Wed, 1/12/11, Juan Grande juan.gra...@gmail.com wrote: From: Juan Grande
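The distinction between the two counts can be shown on a toy corpus. A minimal Python sketch, assuming whitespace tokenization and lowercasing (real Solr analysis chains differ); `term_frequencies` is an illustrative name, not part of TermsComponent.

```python
from collections import Counter


def term_frequencies(documents):
    """Return (document frequency, collection frequency) counters.

    Document frequency: number of documents containing the term.
    Collection frequency: total occurrences of the term across all documents.
    """
    doc_freq = Counter()
    coll_freq = Counter()
    for text in documents:
        tokens = text.lower().split()   # naive tokenizer, assumed for the sketch
        coll_freq.update(tokens)        # count every occurrence
        doc_freq.update(set(tokens))    # count each term once per document
    return doc_freq, coll_freq


docs = ["solr solr lucene", "lucene index", "solr index index"]
doc_freq, coll_freq = term_frequencies(docs)
```

Here "solr" appears in two documents but three times overall, so its document frequency and collection frequency differ.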

Adding a new site to existing solr configuration

2011-01-13 Thread PeterKerk
I still have the default Solr example config running on Jetty. I use Cygwin to start my current site. Now I already have fully configured one solr instance with these files: \example\example-DIH\solr\db\conf\my-data-config.xml \example\example-DIH\solr\db\conf\schema.xml

Re: verifying that an index contains ONLY utf-8

2011-01-13 Thread Peter Karich
Also take a look at icu4j, which is one of the contrib projects ... converting on the fly is not supported by Solr but should be relatively easy in Java. Scanning is also relatively simple (accept only a range). Detection too: http://www.mozilla.org/projects/intl/chardet.html We've created an

DataimportHandler development issue

2011-01-13 Thread Derek Werthmuller
We're just getting started with Solr and are very interested in using Solr for search applications. I've got the rss example working; 1.4.1 didn't work out of the box, but we figured it out, then found fixes in the svn. Anyway, we are learning how to load the data/rss atom feeds into the Solr

Re: verifying that an index contains ONLY utf-8

2011-01-13 Thread Jonathan Rochkind
Scanning for only 'valid' utf-8 is definitely not simple. You can eliminate some obviously not valid utf-8 things by byte ranges, but you can't confirm valid utf-8 alone by byte ranges. There are some bytes that can only come after or before other certain bytes to be valid utf-8. There is no
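The point about byte ranges can be illustrated with a strict decoder. A minimal Python sketch (the thread itself concerns Java and Solr; Python is used here only for brevity, and `is_valid_utf8` is an illustrative helper name): each byte below is individually legal somewhere in UTF-8, yet the sequence as a whole is not.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes decode as UTF-8 under strict error handling."""
    try:
        data.decode("utf-8", errors="strict")
        return True
    except UnicodeDecodeError:
        return False


# 0xC3 is a legal lead byte of a two-byte sequence and 0x28 is plain ASCII "(",
# but 0x28 is not a continuation byte, so the pair is invalid.
broken = b"\xc3\x28"

# The same lead byte followed by a proper continuation byte encodes "é".
well_formed = b"\xc3\xa9"
```

A per-byte range check would accept both inputs; only a decoder that tracks which bytes may follow which can reject the first one. Passing this check shows the bytes are well-formed UTF-8, not that the text was originally authored in UTF-8.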

Re: segment gets corrupted (after background merge ?)

2011-01-13 Thread Michael McCandless
Generally it's not safe to run CheckIndex if a writer is also open on the index. It's not safe because CheckIndex could hit FNFE's on opening files, or, if you use -fix, CheckIndex will change the index out from under your other IndexWriter (which will then cause other kinds of corruption). That

Re: verifying that an index contains ONLY utf-8

2011-01-13 Thread Michael McCandless
The tokens that Lucene sees (pre-4.0) are char[] based (ie, UTF16), so the first place where invalid UTF8 is detected/corrected/etc. is during your analysis process, which takes your raw content and produces char[] based tokens. Second, during indexing, Lucene ensures that the incoming char[]

Variable datasources

2011-01-13 Thread tjpoe
I have several similar databases that I'd like to import from, 14 to be exact. There is also a 15th database where I can get a listing of the 14 databases. I'm trying to do a variable datasource such as: <datasource url="jdbc:mysql://localhost/${local.code}" name="content" /> <datasource

start value in queries zero or one based?

2011-01-13 Thread Dennis Gearon
Do I even need a body for this message? ;-) Dennis Gearon Signature Warning It is always a good idea to learn from your own mistakes. It is usually a better idea to learn from others’ mistakes, so you do not have to make them yourself. from

Re: verifying that an index contains ONLY utf-8

2011-01-13 Thread Robert Muir
On Thu, Jan 13, 2011 at 2:05 PM, Jonathan Rochkind rochk...@jhu.edu wrote: There are various packages of such heuristic algorithms to guess char encoding, I wouldn't try to write my own. icu4j might include such an algorithm, not sure. it does:

Re: start value in queries zero or one based?

2011-01-13 Thread Walter Underwood
On Jan 13, 2011, at 1:28 PM, Dennis Gearon wrote: Do I even need a body for this message? ;-) Dennis Gearon Are you asking is it or should it be? If the latter, we can also discuss Emacs and vi. wunder -- Walter Underwood K6WRU

Re: Solr + Hadoop

2011-01-13 Thread Em
Hi Joan, I am not sure whether it applies, but are you really using Solr 1.4 (not 1.4.1) and were also using the Hadoop-Jars provided by this patch (0.20.1 not 0.0.21)? I ask, because I had some other issues with other classes that were related to different package-definitions etc. - in short:

Re: start value in queries zero or one based?

2011-01-13 Thread Markus Jelsma
Perhaps it would be more useful to RTFM instead of messing around on the mailing list: http://wiki.apache.org/solr/CommonQueryParameters#start Please, read every wiki page you can find and write notes. Do I even need a body for this message? ;-) Dennis Gearon Signature Warning

RE: start value in queries zero or one based?

2011-01-13 Thread Steven A Rowe
Please, read every wiki page you can find and write notes. NO!!! Once you start down this road, there is no turning back! Soon you will feel the need to turn your notes into a new wiki page or a blog post, and people will read those and write notes, and the process will repeat, ad

Re: start value in queries zero or one based?

2011-01-13 Thread Dennis Gearon
I'm migrating to CTO/CEO status in life due to building a small company. I find I don't have too much time for theory. I work with what is. So, what is it, not what should it be. Dennis Gearon Signature Warning It is always a good idea to learn from your own mistakes. It is

Re: verifying that an index contains ONLY utf-8

2011-01-13 Thread Paul
Thanks for all the responses. CharsetDetector does look promising. Unfortunately, we aren't allowed to keep the original of much of our data, so the solr index is the only place it exists (to us). I do have a java app that reindexes, i.e., reads all documents out of one index, does some transform

RE: verifying that an index contains ONLY utf-8

2011-01-13 Thread Jonathan Rochkind
So you're allowed to put the entire original document in a stored field in Solr, but you aren't allowed to stick it in, say, a redis or couchdb too? Ah, bureaucracy. But no reason what you are doing won't work, as you of course already know from doing it. If you actually know the charset of

RE: start value in queries zero or one based?

2011-01-13 Thread Jonathan Rochkind
You could have tried it and seen for yourself on any Solr server in your possession in less time than it took to have this thread. And if you don't have a Solr server, then why do you care? But the answer is 0. http://wiki.apache.org/solr/CommonQueryParameters#start The default value is 0
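Since `start` is a zero-based offset into the result list, a one-based page number has to be converted before it goes into the request. A minimal Python sketch, assuming pages are numbered from 1 in the application; `page_params` is a hypothetical helper, while `q`, `start` and `rows` are the standard Solr query parameters.

```python
from urllib.parse import urlencode


def page_params(query, page, rows=10):
    """Build Solr paging parameters for a one-based page number."""
    if page < 1:
        raise ValueError("page numbers start at 1")
    # Page 1 begins at offset 0 because start is zero-based.
    return {"q": query, "start": (page - 1) * rows, "rows": rows}


params = page_params("solr", page=3, rows=10)
query_string = urlencode(params)   # ready to append to a /select URL
```

With ten rows per page, page 3 starts at offset 20, and the first page uses `start=0`, which is also what Solr assumes when the parameter is omitted.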

Searchers and Warmups

2011-01-13 Thread David Cramer
I'm trying to understand the mechanics behind warming up, when new searchers are registered, and their costs. A quick Google didn't point me in the right direction, so hoping for some of that here. -- David Cramer

Re: Solr + Hadoop

2011-01-13 Thread Alexander Kanarsky
Joan, make sure that you are running the job on a Hadoop 0.21 cluster. (It looks like you have compiled the apache-solr-hadoop jar with Hadoop 0.21 but are using it on a 0.20 cluster). -Alexander

[sfield] Missing in Spatial Search

2011-01-13 Thread Adam Estrada
According to the documentation here: http://wiki.apache.org/solr/SpatialSearch the field that identifies the spatial point data is sfield. See the console output below. Jan 13, 2011 6:49:40 PM org.apache.solr.core.SolrCore execute INFO: [] webapp=/solr path=/select

Re: Multi-word exact keyword case-insensitive search suggestions

2011-01-13 Thread Chamnap Chhorn
Thanks for your reply. However, it doesn't work for my case at all. I think it's a problem with the query parser or something else. It forces me to put double quotes around the search query in order to get any results. <str name="rawquerystring">sim 010</str> <str name="querystring">sim 010</str> <str

Re: segment gets corrupted (after background merge ?)

2011-01-13 Thread Lance Norskog
1) CheckIndex is not supposed to change a corrupt segment, only remove it. 2) Are you using local hard disks, or do you run on a common SAN or remote file server? I have seen corruption errors on SANs, where existing files have random changes. On Thu, Jan 13, 2011 at 11:06 AM, Michael McCandless

Re: Multi-word exact keyword case-insensitive search suggestions

2011-01-13 Thread Estrada Groups
Ahhh...the fun of open source software ;-). Requires a ton of trial and error! I found what worked for me and figured it was worth passing it along. If you don't mind...when you sort everything out on your end, please post results for the rest of us to take a gander at. Cheers, Adam On Jan

use of schema.xml

2011-01-13 Thread Dennis Gearon
I'm going to buy the book for Solr, since it looks like I need to do more of the work than I thought I would. But, from looking at it, the schema file only says: A/ What types of data can be in the 'fields' of the documents B/ If there are any dynamically assigned fields. C/ What parsers are

Re: Solr 4.0 = Spatial Search - How to

2011-01-13 Thread Lance Norskog
Spatial does not support separate fields: you don't need lat/long, only 'coord'. To get latitude/longitude in the coord field from the DIH, you need to use a transformer in the DIH script. It would populate a field 'coord' with a text string made from the lat and lon fields:

Re: use of schema.xml

2011-01-13 Thread Lance Norskog
Correct. Solr and Lucene do not store or enforce the schema. You're on your own :) On Thu, Jan 13, 2011 at 8:09 PM, Dennis Gearon gear...@sbcglobal.net wrote: I'm going to buy the book for Solr, since it looks like I need to do more of the work than I thought I would. But, from looking at

Re: use of schema.xml

2011-01-13 Thread Lance Norskog
Wait- it does enforce the schema names. What it does not enforce is field contents when you change the schema. Since Lucene does not have field replacement, it is not practical to remove or add a field to all existing documents when you change the schema. On Thu, Jan 13, 2011 at 8:15 PM, Lance

Re: use of schema.xml

2011-01-13 Thread Dennis Gearon
I could put 1-10,000 fields in any one document, as long as they are told what type or they are dynamically matched by dynamic fields relative to what's in the schema.xml file? It's very much like google 'big tables' or 'elastic search' that way, right? It's up to me to enforce any field

Re: Improving Solr performance

2011-01-13 Thread Gora Mohanty
On Thu, Jan 13, 2011 at 10:10 PM, supersoft elarab...@gmail.com wrote: On the one hand, I found those comments about the reasons for sharding really interesting. The documentation agrees with you about why to split an index into several shards (big size problems) but I don't find any explanation about

Re: Variable datasources

2011-01-13 Thread Gora Mohanty
On Fri, Jan 14, 2011 at 1:02 AM, tjpoe tanner.post...@gmail.com wrote: [...] I also tried creating datasources for each local and then using a variable datasource in the entity such as: datasource url=jdbc:mysql://localhost/aaa name=content_aaa / datasource url=jdbc:mysql://localhost/bbb

Re: Adding a new site to existing solr configuration

2011-01-13 Thread Gora Mohanty
On Thu, Jan 13, 2011 at 10:47 PM, PeterKerk vettepa...@hotmail.com wrote: I still have the default Solr example config running on Jetty. I use Cygwin to start my current site. Now I already have fully configured one solr instance with these files:

Re: Solr 4.0 = Spatial Search - How to

2011-01-13 Thread Grijesh.singh
I have used that type of location searching, but I have not used spatial search. I wrote my logic at the application end. I have cached the location ids and their lat/long. When queries come in for any location, say New Delhi, my location search logic at the application end calculates the distance

Re: Solr 4.0 = Spatial Search - How to

2011-01-13 Thread caman
Thanks. Here was the issue: concatenating 2 floats (lat,lng) at the MySQL end converted the result to a BLOB. Indexing would fail when storing the BLOB in the 'location' type field. After the BLOB issue was resolved, all worked ok. Thank you all for your help -- View this message in context: