Custom Handler support in Solr-ruby

2011-06-28 Thread Pranav Prakash
Hi, I found the solr-ruby gem (http://wiki.apache.org/solr/solr-ruby) really inflexible when it comes to specifying a handler. The Solr::Request::Select class defines the handler as select, and all other classes inherit from this class. And since the methods in Solr::Connection use one of the classes from

Include synonys in solr

2011-06-28 Thread Romi
Hi, I am using Solr for my searches. I found a synonyms.txt file in which you can manually include synonyms for the words you want. But I suppose it would be very hard to include synonyms manually for each word, as my application has a large amount of data. I want to know is there any way that this

Re: Include synonys in solr

2011-06-28 Thread Gora Mohanty
On Tue, Jun 28, 2011 at 12:54 PM, Romi romijain3...@gmail.com wrote: Hi, I am using Solr for my searches. I found a synonyms.txt file in which you can manually include synonyms for the words you want. Please see

Re: Include synonys in solr

2011-06-28 Thread Michael Kuhlmann
On 28.06.2011 09:24, Romi wrote: But I suppose it would be very hard to include synonyms manually for each word, as my application has a large amount of data. I want to know is there any way that this synonyms.txt file can be generated automatically, referring to all dictionary words I don't get the point

Re: Analyzer creates PhraseQuery

2011-06-28 Thread lboutros
You could add this filter after the NGram filter to prevent the phrase query creation: http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.PositionFilterFactory Ludovic.

Find results with or without whitespace

2011-06-28 Thread Frankie
I'm looking for a way to index/search on terms that may or may not contain spaces. An example will explain better: - Looking for healthcare, I want to find both healthcare and health care. - Looking for health care, I want to find both health care and healthcare. My other constraints are - I

Re: multiple spatial values

2011-06-28 Thread marthinal
Yonik Seeley-2-2 wrote: On Sat, Jun 25, 2011 at 5:56 AM, marthinal <jm.rodriguez.ve...@gmail.com> wrote: sfield, pt and d can all be specified directly in the spatial functions/filters too, and that will override the global params. Unfortunately one must currently use lucene query

Saravanan Chinnadurai/Actionimages is out of the office.

2011-06-28 Thread Saravanan . Chinnadurai
I will be out of the office starting 28/06/2011 and will not return until 30/06/2011. Please email to itsta...@actionimages.com for any urgent issues. Action Images is a division of Reuters Limited and your data will therefore be protected in accordance with the Reuters Group Privacy / Data

Index Version and Epoch Time?

2011-06-28 Thread Pranav Prakash
Hi, I am not sure what the index number value is. It looks like an epoch time, but in my case it points to one month back. However, I can see documents in the index that were added last week. Even after I did a commit, the index number did not change. Isn't it supposed to change on

Re: Include synonys in solr

2011-06-28 Thread Romi
Please see http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.SynonymFilterFactory No offence, but a simple Google search, or a search of the Wiki would have turned this up. Please try such simpler avenues before dashing off a message to the list. Gora, I have already read the

Re: Include synonys in solr

2011-06-28 Thread Romi
I don't want to add all dictionary words to my synonyms.txt; I want to include synonyms only for the words that appear in my data. As you can imagine, if I have, say, 1000 words, it would be very tough to enter synonyms for these 1000 words in synonyms.txt manually. I just want to know

Re: Include synonys in solr

2011-06-28 Thread François Schiettecatte
Well, you need to find word lists and/or a thesaurus. This is one place to start: http://wordlist.sourceforge.net/ I used the US/UK English word list for the synonyms in an index I have because it contains both US and UK English terms. The list lacks some medical terms though so we

Re: Find results with or without whitespace

2011-06-28 Thread roySolr
I had the same problem: http://lucene.472066.n3.nabble.com/Results-with-and-without-whitespace-soccer-club-and-soccerclub-td2934742.html#a2964942

Re: Removing duplicate documents from search results

2011-06-28 Thread Mohammad Shariq
I also have the problem of duplicate docs. I am indexing news articles. Every news article has a source URL. If two news articles have the same URL, only one needs to be indexed, so duplicates should be removed at index time. On 23 June 2011 21:24, simon mtnes...@gmail.com wrote: have you checked out

Re: Removing duplicate documents from search results

2011-06-28 Thread François Schiettecatte
Create a hash from the url and use that as the unique key, md5 or sha1 would probably be good enough. Cheers François On Jun 28, 2011, at 7:29 AM, Mohammad Shariq wrote: I also have the problem of duplicate docs. I am indexing news articles, Every news article will have the source URL, If
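
A minimal sketch of the hashing idea above, in Python. The function name url_key and the whitespace trimming are assumptions made for illustration; nothing here is Solr-specific, and whether the digest ends up as the unique key or in a separate field is left open.

```python
import hashlib


def url_key(url: str) -> str:
    """Return a SHA-1 hex digest of the URL, usable as a stable document key."""
    # Trimming whitespace is an illustrative normalization step (assumption);
    # real data may also need case or trailing-slash handling.
    normalized = url.strip()
    return hashlib.sha1(normalized.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    print(url_key("http://example.com/news/article-1"))
```

Two articles with the same source URL produce the same key, so indexing the second one under that key would replace the first rather than add a duplicate.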

Re: multiple spatial values

2011-06-28 Thread Darren Govoni
Will it be possible to do spatial searches on multi-valued spatial fields soon? I have a latlon field (point) that is multi-valued and don't know how to search against it such that the lats and lons match correctly - since they are split apart. e.g. I have a document with 10 point/latlon

Re: Removing duplicate documents from search results

2011-06-28 Thread Mohammad Shariq
I am making the hash from the URL, but I can't use it as the uniqueKey because I am using a UUID as the uniqueKey. Since I am using Solr only as the index engine and Riak (key-value storage) as the storage engine, I don't want to overwrite on duplicate. I just need to discard the duplicates. 2011/6/28

Re: Default schema - 'keywords' not multivalued

2011-06-28 Thread Tod
On 06/27/2011 11:23 AM, lee carroll wrote: Hi Tod, A list of keywords would be fine in a non-multivalued field: keywords: xxx yyy sss aaa. A multivalued field would allow you to repeat the field when indexing: keywords: xxx, keywords: yyy, keywords: sss, etc. Thanks Lee. The problem is I'm

Re: Find results with or without whitespace

2011-06-28 Thread Frankie
Thank you for your answer. I agree, I can manage predictable values through synonyms. However, most data in this index are company and product names, which sometimes leads to rather strange syntax (mixed upper/lower case, misplaced dashes or spaces). One purpose of using Solr was to help in finding

Re: Removing duplicate documents from search results

2011-06-28 Thread François Schiettecatte
Maybe there is a way to get Solr to reject documents that already exist in the index, but I doubt it; maybe someone else can chime in here. You could do a search for each document prior to indexing it to see if it is already in the index. That is probably non-optimal, maybe it is easiest
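
The "check before indexing" idea could look roughly like the Python sketch below. The index is simulated with an in-memory set of URL signatures; in a real setup the membership test would be a lookup against Solr, which is not shown here. All names (url_signature, filter_new, seen) are hypothetical.

```python
import hashlib


def url_signature(url: str) -> str:
    """MD5 hex digest of the URL, used only to detect duplicates."""
    return hashlib.md5(url.encode("utf-8")).hexdigest()


def filter_new(docs, seen):
    """Return the docs whose URL signature is not yet in `seen`.

    `seen` stands in for the index (assumption) and is updated in place,
    so duplicates inside the same batch are discarded as well.
    """
    fresh = []
    for doc in docs:
        sig = url_signature(doc["url"])
        if sig in seen:
            continue  # discard the duplicate instead of overwriting
        seen.add(sig)
        fresh.append(doc)
    return fresh


if __name__ == "__main__":
    batch = [
        {"id": "uuid-1", "url": "http://example.com/a"},
        {"id": "uuid-2", "url": "http://example.com/a"},
    ]
    print(filter_new(batch, set()))
```

Only the documents returned by filter_new would be sent on for indexing, so an existing document is never touched.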

Re: Removing duplicate documents from search results

2011-06-28 Thread Pranav Prakash
I found the deduplication thing really useful. Although I have not yet started to work on it, as there are some other low hanging fruits I've to capture. Will share my thoughts soon. Pranav Prakash

Re: Analyzer creates PhraseQuery

2011-06-28 Thread Koji Sekiguchi
(11/06/28 16:40), lboutros wrote: You could add this filter after the NGram filter to prevent the phrase query creation : http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.PositionFilterFactory Ludovic. There is an option to avoid producing phrase queries,

Re: Removing duplicate documents from search results

2011-06-28 Thread François Schiettecatte
Indeed, take a look at this: http://wiki.apache.org/solr/Deduplication I have not used it but it looks like it will do the trick. François On Jun 28, 2011, at 8:44 AM, Pranav Prakash wrote: I found the deduplication thing really useful. Although I have not yet started to

Re: Removing duplicate documents from search results

2011-06-28 Thread Mohammad Shariq
Hey François, thanks for your suggestion. I followed the same link (http://wiki.apache.org/solr/Deduplication); the solutions there either make the hash the uniqueKey or overwrite on duplicate, and I don't need either. I need discard on duplicate. I have not used it but it looks like it will do

Re: Include synonys in solr

2011-06-28 Thread Romi
Thanks François Schiettecatte, the information you provided is very helpful. I need to know one more thing: I downloaded one of the given dictionaries but it contains many files. Do I need to add the data from all these files to synonyms.txt? Thanks, Regards, Romi

Re: Removing duplicate documents from search results

2011-06-28 Thread Paul Libbrecht
Mohammad, just in case you meant it, I would like to discourage you from trying to deduplicate *the search result*. There are many things that go wrong if you do that; we had it in one version of the ActiveMath search environment (which uses Lucene): - paging is inappropriate - total count is wrong

Re: Removing duplicate documents from search results

2011-06-28 Thread François Schiettecatte
Yeah, I read the overview which suggests that duplicates can be prevented from entering the index and scanned the rest, it does not look like you can actually drop the document entirely. Maybe I am missing something here. François On Jun 28, 2011, at 9:14 AM, Mohammad Shariq wrote: Hey

Re: Include synonys in solr

2011-06-28 Thread François Schiettecatte
Well no, you need to see which files (if any) will suit your needs; they are not all synonym files. I only needed the UK/US English file, and I needed to process it into a format suitable for the synonyms file. There may well be other word lists on the net suitable for your needs. I would not

Re: multiple spatial values

2011-06-28 Thread Smiley, David W.
It is precisely this limitation which triggered me to develop a grid indexing approach using Geohashes: https://issues.apache.org/jira/browse/SOLR-2155 This patch requires a Solr trunk release. If you have a small number of distinct points in total, and you only need filtering, then the geohash

Re: Index Version and Epoch Time?

2011-06-28 Thread Shalin Shekhar Mangar
On Tue, Jun 28, 2011 at 4:18 PM, Pranav Prakash pra...@gmail.com wrote: I am not sure what is the index number value? It looks like an epoch time, but in my case, this points to one month back. However, i can see documents which were added last week, to be in the index. The index version

Using FieldCache in SolrIndexSearcher - crazy idea?

2011-06-28 Thread Michael Ryan
I am a user of Solr 3.2 and I make use of the distributed search capabilities of Solr using a fairly simple architecture of a coordinator + some shards. Correct me if I am wrong: In a standard distributed search with QueryComponent, the first query sent to the shards asks for fl=myUniqueKey or

Records disappearing

2011-06-28 Thread Brian Lamb
Hi all, I'm having some weird behavior with my dataimport script. Because of memory issues, I've taken to doing a delta import as a full-import with clean=false. My dataimport config file is set up like: <entity name="findDelta" query="SELECT id FROM mytable WHERE date_added >

Re: Default schema - 'keywords' not multivalued

2011-06-28 Thread Chris Hostetter
: I'm streaming over the document content (presumably via tika) and its : gathering the document's metadata which includes the keywords metadata field. : Since I'm also passing that field from the DB to the REST call as a list (as : you suggested) there is a collision because the keywords field

Does Smart Chinese filter work for Traditional Chinese?

2011-06-28 Thread Andy
Hi, According to the doc: http://wiki.apache.org/solr/LanguageAnalysis#Chinese.2C_Japanese.2C_Korean solr.SmartChineseWordTokenFilterFactory is for Simplified Chinese. Does it work for Traditional Chinese too? If not, is there anything equivalent for Traditional Chinese? Thanks.

Re: Analyzer creates PhraseQuery

2011-06-28 Thread entdeveloper
Thanks guys. Both the PositionFilterFactory and the autoGeneratePhraseQueries=false solutions solved the issue.

Re: Index Version and Epoch Time?

2011-06-28 Thread Pranav Prakash
Hi, I am facing multiple issues with Solr and I am not sure what happens in each case. I am quite new to Solr and there are some scenarios I'd like to discuss with you. We have a huge volume of documents to be indexed, somewhere around 5 million. We have a full indexer script which essentially

Re: Custom Query Processing

2011-06-28 Thread Dmitry Kan
You should modify the SolrCore for this, if I'm not mistaken. Would extending LuceneQParserPlugin (solr 1.4) be an option for you? On Tue, Jun 28, 2011 at 12:25 AM, Jamie Johnson jej2...@gmail.com wrote: I have a need to take an incoming solr query and apply some additional constraints to it

Re: Unique document count from index?

2011-06-28 Thread Dmitry Kan
Can you use facet search? facet=true&facet.field=order_no&fq=order_no:(1234 OR 5678 OR ...)&fq=artist:Pink Floyd On Mon, Jun 27, 2011 at 6:44 PM, Olson, Ron rol...@lbpc.com wrote: Hi all- I have a problem that I'm not sure how it can be (if it can be) solved in Solr. I am using Solr 3.2 with
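
As a sketch, query parameters like these can be assembled with Python's standard library so that separators and escaping are handled automatically. The field names order_no and artist come from the message; the base URL, the q=*:* match-all query, and the quoting of the artist name as a phrase are assumptions added for illustration.

```python
from urllib.parse import urlencode


def facet_query_url(base_url, order_numbers, artist):
    """Build a select URL that facets on order_no, filtered by order and artist."""
    order_filter = "order_no:(" + " OR ".join(str(n) for n in order_numbers) + ")"
    params = [
        ("q", "*:*"),                  # match-all main query (assumption)
        ("facet", "true"),
        ("facet.field", "order_no"),
        ("fq", order_filter),
        ("fq", f'artist:"{artist}"'),  # phrase quoting is an assumption
    ]
    # A list of pairs keeps both fq parameters instead of collapsing them.
    return base_url + "?" + urlencode(params)


if __name__ == "__main__":
    print(facet_query_url("http://localhost:8983/solr/select", [1234, 5678], "Pink Floyd"))
```

Passing the parameters as a list of pairs rather than a dict matters here, because fq appears twice.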

Re: Index Version and Epoch Time?

2011-06-28 Thread Jonathan Rochkind
On 6/28/2011 1:38 PM, Pranav Prakash wrote: - Will the commit by incremental indexer script also commit the previously uncommitted changes made by full indexer script before it broke? Yes, as long as the Solr instance hasn't crashed. Anything added but not yet committed sticks around

moving to multicore without changing existing index

2011-06-28 Thread lee carroll
Hi, I'm looking at setting up multi-core indices but also have an existing index. Can I run this index alongside new indexes set up as cores? On a dev machine I've experimented with simply adding solr.xml in solr home and listing the new cores in the cores element, but this breaks the existing index.

Re: moving to multicore without changing existing index

2011-06-28 Thread Jonathan Rochkind
Nope. But you can move your existing index into a core in a multi-core setup. But a multi-core setup is a multi-core setup, there's no way to have an index accessible at a non-core URL in a multi-core setup. On 6/28/2011 2:53 PM, lee carroll wrote: hi I'm looking at setting up multi core

Dynamic Fields vs. Multicore

2011-06-28 Thread Briggs Thompson
Hi All, I was searching around for documentation of the performance differences of having a sharded, single schema, dynamic field set up vs. a multi-core, static multi-schema setup (which I currently have), but I have not had much luck finding what I am looking for. I understand commits and

Solr - search queries not returning results

2011-06-28 Thread Walter Closenfleight
Hello everyone, I believe I am missing something very elementary. The following query returns zero hits: http://localhost:8983/solr/core0/select/?q=testabc However, using solritas, it finds many results: http://localhost:8983/solr/core0/itas?q=testabc Do you have any idea what the issue may

overwirite if not already in index?

2011-06-28 Thread eks dev
Quick question: is there a way with Solr to conditionally update a document on unique id? Meaning the default add behavior if the id is not already in the index, and not touching the index if it is already there. Deletes are not important (no sync issues). I am asking because I noticed with deduplication turned on,

conditionally update document on unique id

2011-06-28 Thread eks dev
Quick question: is there a way with Solr to conditionally update a document on unique id? Meaning the default add behavior if the id is not already in the index, and not touching the index if it is already there. Deletes are not important (no sync issues). I am asking because I noticed with deduplication turned on,

Re: Solr - search queries not returning results

2011-06-28 Thread Tomás Fernández Löbbe
Hi Walter, solritas is probably using Dismax with a set of fields in the qf parameter, while your first query just queries the default field. On Tue, Jun 28, 2011 at 5:07 PM, Walter Closenfleight walter.p.closenflei...@gmail.com wrote: Hello everyone, I believe I am

edismax - Handling collocations mapped to a single token . . ?

2011-06-28 Thread CRB
We are trying to get edismax to handle collocations mapped to a single token. To do so we need to manipulate the chunks (as Hoss referred to them in http://www.lucidimagination.com/blog/2010/05/23/whats-a-dismax/) generated by the dismax parser. We have numerous collocations (terms of speech

Re: moving to multicore without changing existing index

2011-06-28 Thread Tomás Fernández Löbbe
But a multi-core setup is a multi-core setup, there's no way to have an index accessible at a non-core URL in a multi-core setup. Isn't there? What about defaultCoreName parameter? from the wiki: The name of a core that will be used for requests that don't specify a core. If you have one core and

How to Create a weighted function (dismax or otherwise)

2011-06-28 Thread aster
I am trying to create a feature that allows search results to be displayed by this formula sum(weight1*text relevance score, weight2 * price). weight1 and weight2 are numeric values that can be changed to influence the search results. I am sending the following query params to the Solr instance
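
To make the formula concrete, here is a small Python sketch that applies sum(weight1 * relevance, weight2 * price) on the client side and sorts by the result. It only illustrates the arithmetic; it is not how Solr evaluates function queries, and the document fields score and price as well as the function names are hypothetical.

```python
def combined_score(relevance, price, weight1, weight2):
    """Weighted sum from the message: weight1 * relevance + weight2 * price."""
    return weight1 * relevance + weight2 * price


def rank(docs, weight1, weight2):
    """Sort docs by combined score, highest first."""
    return sorted(
        docs,
        key=lambda d: combined_score(d["score"], d["price"], weight1, weight2),
        reverse=True,
    )


if __name__ == "__main__":
    docs = [
        {"id": "a", "score": 2.0, "price": 10.0},
        {"id": "b", "score": 1.0, "price": 30.0},
    ]
    # Changing the weights changes which document comes first.
    print([d["id"] for d in rank(docs, 1.0, 0.0)])
    print([d["id"] for d in rank(docs, 0.0, 1.0)])
```

Because relevance scores and prices are on different scales, the two weights also act as scale factors, which is worth keeping in mind when tuning them.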

Fuzzy Query Param

2011-06-28 Thread entdeveloper
According to the docs on lucene query syntax: Starting with Lucene 1.9 an additional (optional) parameter can specify the required similarity. The value is between 0 and 1, with a value closer to 1 only terms with a higher similarity will be matched. I was messing around with this and started

Re: Using RAMDirectoryFactory in Master/Slave setup

2011-06-28 Thread Lance Norskog
Using RAMDirectory really does not help performance. Java garbage collection has to work around all of the memory taken by the segments. It works out that Solr works better (for most indexes) without using the RAMDirectory. On Sun, Jun 26, 2011 at 2:07 PM, nipunb ni...@walmartlabs.com wrote: