Re: Solr Cloud, Commits and Master/Slave configuration

2012-03-01 Thread eks dev
Thanks Mark. Good, this is probably good enough to give it a try. My analyzers are normally fast, so doing duplicate analysis (at each replica) is probably not going to cost a lot if there is some decent batching. Can this be somehow controlled (depth of this buffer / time till flush or some

Re: Too many values for UnInvertedField faceting on field topic

2012-03-01 Thread Michael Jakl
Hi! On Wed, Feb 29, 2012 at 22:21, Emmanuel Espina espinaemman...@gmail.com wrote: No. But probably we can find another way to do what you want. Please describe the problem and include some numbers to give us an idea of the sizes that you are handling. Number of documents, size of the index,

Re: Couple issues with edismax in 3.5

2012-03-01 Thread Ahmet Arslan
I don't think mm will help here because it defaults to 100% already by the following code. Default behavior of mm has changed recently. So it is a good idea to explicitly set it to 100%. Then all of the search terms must match. Regarding multi-word synonym, what is the best way to handle

Problem using double quotes in search string

2012-03-01 Thread Ramo Karahasan
Hi, I've got an issue when searching with a search string like: 'title:"Blue on Blu' . The original search string is: 'title:"Blue on Blue"' and this works well. If I now delete the last double quote and the e then I get the error below. Is there any filter that can handle such searches which I

Re: Problem using double quotes in search string

2012-03-01 Thread Ahmet Arslan
I've got an issue when searching with a search string like: 'title:"Blue on Blu' . The original search string is: 'title:"Blue on Blue"' and this works well. If I now delete the last double quote and the e then I get the error below. Is there any filter that can handle such searches which I

AW: Problem using double quotes in search string

2012-03-01 Thread Ramo Karahasan
Hi, does that affect my result list? Because if I use dismax, and type into my search field the title blue on blue (without quotes), I get this product as the first result. If I use dismax without boosting and search for blue on blue (without quotes) I'm not getting this result in the first 10

[SoldCloud] leaking file descriptors

2012-03-01 Thread Markus Jelsma
Hi, Yesterday we had an issue with too many open files, which was solved because a username was misspelled. But there is still a problem with open files. We cannot successfully index a few million documents from MapReduce to a 5-node Solr cloud cluster. One of the problems is that after a

Spelling Corrector Algorithm

2012-03-01 Thread Husain, Yavar
Hi, for the spell checking component I set extendedResults to get the frequencies and then select the word with the best frequency. I understand the spell check algorithm is based on edit distance. For example: Query to Solr: Marien. Spell check text returned: Marine (Freq: 120), Market (Freq: 900)

Re: AW: Problem using double quotes in search string

2012-03-01 Thread Ahmet Arslan
does that affect my result list? Because if I use dismax, and type into my search field the title blue on blue (without quotes), I get this product as the first result. If I use dismax without boosting and search for blue on blue (without quotes) I'm not getting this result in the

Re: [SoldCloud] leaking file descriptors

2012-03-01 Thread Bernd Fehling
What is netstat telling you about the connections on the servers? Any connections in CLOSE_WAIT (passive close) hanging? Saw this on my servers last week. Used a little program to spoof a local connection on those servers' ports and was able to fake the TCP stack into closing those connections. It

flashcache and solr/lucene

2012-03-01 Thread dan sutton
Hi, Just wondering if anyone has any experience with Solr and flashcache [https://wiki.archlinux.org/index.php/Flashcache]. My guess is it might be particularly useful for indices not changing that often, and for large indices where an SSD of that size is prohibitive. Cheers, Dan

Re: [SoldCloud] leaking file descriptors

2012-03-01 Thread Sami Siren
Do you have autocommit enabled? I tested this with 1m docs indexed by using the default example config and saw used file descriptors go up to 2400 (did not come down even after the final commit at the end). Then I disabled autocommit, reindexed and the descriptor count stayed pretty much flat at

Re: [SoldCloud] leaking file descriptors

2012-03-01 Thread Markus Jelsma
On Thursday 01 March 2012 13:03:18 Bernd Fehling wrote: What is netstat telling you about the connections on the servers? Any connections in CLOSE_WAIT (passive close) hanging? I can't tell exact numbers right now but there were a lot between all the cores and the indexing clients. Saw

Re: performance between ExternalFileField and Join

2012-03-01 Thread Erick Erickson
Hmmm. ExternalFileFields can only be float values, so I'm not sure the necessary data is straight-forward. Additionally, they are used in function queries. Does this still work? I really don't know the performance characteristics if, say, you have users with access to all documents for SOLR-2272,

Re: Modify Standalone solr server to use it application without http request

2012-03-01 Thread Erick Erickson
Currently, the page you referenced here: http://wiki.apache.org/solr/SolrReplication is the standard way to replicate incremental indexes. You say you're worried about the extra http. Why? Do you have any evidence that this would be a problem? Http isn't inherently inefficient at all, and even if

Re: Need tokenization that finds part of stringvalue

2012-03-01 Thread Erick Erickson
Right, there's nothing in Solr that I know of that'll help here. How would a tokenizer understand that smartphone should be smart phone? There's no general solution for this issue. You can do domain-specific solutions with synonyms for instance, or some other word list that contains terms you're

Re: Need tokenization that finds part of stringvalue

2012-03-01 Thread PeterKerk
I think I didn't explain myself clearly: I need to be able to find substrings. So, it's not that I'd expect Solr to find synonyms, but rather if a piece of text contains the searched text, for example: if title holds smartphone I want it to be found when someone types martph or smar or smart. I

Re: searching top matches of each facet

2012-03-01 Thread Paul
Perfect! Thanks! On Wed, Feb 29, 2012 at 3:29 PM, Emmanuel Espina espinaemman...@gmail.com wrote: I think that what you want is FieldCollapsing: http://wiki.apache.org/solr/FieldCollapsing For example q=my search&group=true&group.field=subject&group.limit=5 Test it to see if that is what you
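
The grouping parameters in the quoted suggestion (q, group, group.field, group.limit) are separate request parameters. A minimal sketch of assembling such a request URL in Python follows; the host, port and /solr/select path are assumptions for illustration and do not come from the thread.

```python
from urllib.parse import urlencode

# Assumed endpoint for illustration only; adjust to your own Solr instance.
SOLR_SELECT = "http://localhost:8983/solr/select"

def grouped_query_url(query, group_field, group_limit):
    """Build a select URL that asks Solr to group results by one field."""
    params = [
        ("q", query),
        ("group", "true"),
        ("group.field", group_field),
        ("group.limit", str(group_limit)),
    ]
    # urlencode joins the pairs with '&' and escapes the space in the query.
    return SOLR_SELECT + "?" + urlencode(params)

url = grouped_query_url("my search", "subject", 5)
print(url)
```

Building the query string with urlencode avoids hand-escaping spaces and keeps each parameter clearly separated.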

AW: AW: Problem using double quotes in search string

2012-03-01 Thread Ramo Karahasan
Hi, what about if a search string starts with $o$ ? This is not recognized by dismax either, right? Is there another filter I have to use? Thanks, Ramo -Original Message- From: Ahmet Arslan [mailto:iori...@yahoo.com] Sent: Thursday, March 1, 2012 12:44 To:

Re: AW: AW: Problem using double quotes in search string

2012-03-01 Thread Ahmet Arslan
what about, if a search string starts with $o$ ? this is not recognized by dismax too, right? Is there another filter I have to use? I don't fully follow your question but it seems that you want to search special characters too? With raw or term query parser plugin you can do that.

handling case insensitive and regex

2012-03-01 Thread Neil Hart
I'm just starting out... for either testing QA TESTING QA I can query with the following strings and find my text: testing TESTING testing* but the following doesn't work. TESTING* any ideas? thanks Neil

Re: Modify Standalone solr server to use it application without http request

2012-03-01 Thread Neel
Hi Erick, Thanks for your post. We are not directly providing search results from the Lucene index to the user. We are processing the Lucene search result and adding additional information to it by getting it from different sources [from other Lucene indexes or from databases]. So, consuming search results

Re: flashcache and solr/lucene

2012-03-01 Thread Robert Stewart
Any segment files on SSD will be faster in cases where the file is not in OS cache. If you have enough RAM a lot of index segment files will end up in OS system cache so it won't have to go to disk anyway. Since most indexes are bigger than RAM an SSD helps a lot. But if index is much larger

errata for solr tutorial

2012-03-01 Thread Nicolai Scheer
Hi! Having just worked through the solr tutorial (http://lucene.apache.org/solr/tutorial.html) I think I found two minor bugs: 1. The delete by query example java -Ddata=args -jar post.jar <delete><query>name:DDR</query></delete> should read java -Ddata=args -jar post.jar

Re: Need tokenization that finds part of stringvalue

2012-03-01 Thread Walter Underwood
I once used a spell checker to break up compound words. It was slow, but worked pretty well. wunder On Mar 1, 2012, at 5:53 AM, Erick Erickson wrote: Right, there's nothing in Solr that I know of that'll help here. How would a tokenizer understand that smartphone should be smart phone?

RE: Spelling Corrector Algorithm

2012-03-01 Thread Dyer, James
Yavar, When you listed what the spell checker returns you put them in this order: Marine (Freq: 120), Market (Freq: 900) and others Was Marine listed first, and then did you pick Market because you thought higher frequency is better? If so, you probably have the right settings already but

RE: Need tokenization that finds part of stringvalue

2012-03-01 Thread Dyer, James
Speaking of which, there is a spellchecker in jira that will detect word-break errors like this. See WordBreakSpellChecker at https://issues.apache.org/jira/browse/LUCENE-3523 . To use it with Solr, you'd also need to apply SOLR-2993 (https://issues.apache.org/jira/browse/SOLR-2993). This

Re: Need tokenization that finds part of stringvalue

2012-03-01 Thread Ahmet Arslan
if title holds smartphone I want it to be found when someone types martph or smar or smart. Peter, so you want a beginsWith/startsWith type of search? You can use wildcard search (with the star operator) for this, e.g. q=smar* Alternatively, if your index size is not huge, you can use

Re: handling case insensitive and regex

2012-03-01 Thread Ahmet Arslan
but the following doesn't work. TESTING* Please see the following writeups: http://wiki.apache.org/solr/MultitermQueryAnalysis http://www.lucidimagination.com/blog/2011/11/29/whats-with-lowercasing-wildcard-multiterm-queries-in-solr/
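
A small Python sketch of why an upper-case wildcard can miss lower-cased index terms. It assumes, for illustration, that the field lower-cases tokens at index time and that the wildcard pattern is compared to indexed terms without being analyzed; fnmatchcase stands in for the wildcard matcher and is not how Solr implements it.

```python
from fnmatch import fnmatchcase

# Assumption: the field's analyzer lower-cases every token at index time.
indexed_terms = {token.lower() for token in "testing QA TESTING QA".split()}

def wildcard_matches(pattern, terms, lowercase_pattern=False):
    """Return the indexed terms matched by a case-sensitive wildcard pattern."""
    if lowercase_pattern:
        pattern = pattern.lower()
    return sorted(term for term in terms if fnmatchcase(term, pattern))

raw_hits = wildcard_matches("TESTING*", indexed_terms)
lowered_hits = wildcard_matches("TESTING*", indexed_terms, lowercase_pattern=True)
print(raw_hits, lowered_hits)
```

Under these assumptions, lower-casing the pattern on the client side before sending the query is one workaround.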

Re: Spelling Corrector Algorithm

2012-03-01 Thread Robert Muir
On Thu, Mar 1, 2012 at 6:43 AM, Husain, Yavar yhus...@firstam.com wrote: Hi For spell checking component I set extendedResults to get the frequencies and then select the word with the best frequency. I understand the spell check algorithm based on Edit Distance. For an example: Query to

Re: Need tokenization that finds part of stringvalue

2012-03-01 Thread PeterKerk
@iorixxx: yes, that is what I need. But also when it's IN the text, not necessarily at the beginning. So using the * character like: q=smart* the product is found, but when I do this: q=*mart* it isn't... why is that? -- View this message in context:

RE: Spelling Corrector Algorithm

2012-03-01 Thread Husain, Yavar
Thanks James. I loved the last line in your mail But in the end, especially with 1-word queries, I doubt even the best algorithms are going to always accurately guess what the user wanted. Absolutely I agree to this; if it is a phrase (instead of single word) then probably we can apply some

RE: Spelling Corrector Algorithm

2012-03-01 Thread Husain, Yavar
Thanks Robert. Yes, that's right, I can get some more accuracy if I use transposition in addition to substitution, insertion and deletion. From: Robert Muir [rcm...@gmail.com] Sent: Thursday, March 01, 2012 9:50 PM To: solr-user@lucene.apache.org Subject: Re:
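
To illustrate the point about transposition with the words from this thread, here is a minimal sketch of edit distance with an optional adjacent-transposition step (the optimal string alignment variant). It is an illustration only, compares case-insensitively by assumption, and is not the code used by Solr's spell checker.

```python
def edit_distance(a, b, allow_transposition=False):
    """Edit distance over substitution, insertion and deletion,
    optionally counting a swap of two adjacent characters as one edit."""
    a, b = a.lower(), b.lower()
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all of a[:i]
    for j in range(cols):
        d[0][j] = j  # insert all of b[:j]
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution or match
            if (allow_transposition and i > 1 and j > 1
                    and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]):
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # adjacent swap
    return d[-1][-1]

for candidate in ("Marine", "Market"):
    print(candidate,
          edit_distance("Marien", candidate),
          edit_distance("Marien", candidate, allow_transposition=True))
```

Without transposition both candidates are two edits away from Marien, so only frequency separates them; with transposition Marine is a single edit away.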

alphanumeric buckets

2012-03-01 Thread AlexR
Hi, I need to build buckets with alphanumeric values. For example: facet.field=person person: Alex(10), Ben(5), George(8), Paul(3), Peter(2), Stefan(9) Now I need all persons in the interval A-C. With facet.query=person[A TO C] I only get the number of matches (15) but I want to have the values

Re: Inconsistent Results with ZooKeeper Ensemble and Four SOLR Cloud Nodes

2012-03-01 Thread Matthew Parker
Added it back in. I still get the same result. On Wed, Feb 29, 2012 at 10:09 PM, Mark Miller markrmil...@gmail.com wrote: Do you have a _version_ field in your schema? I actually just came back to this thread with that thought and then saw your error - so that remains my guess. I'm going to

Simple poll

2012-03-01 Thread ku3ia
Hi, all! It may seem strange, but could those of you who read this post answer some questions? I want to understand whether I am asking too much of my Solr, so: 1) Solr version; 2) Summary doc count; 3) Shards count (if exists); 4) rows count at query (from ... into); 5) Average queries per minute

Re: search.highlight.InvalidTokenOffsetsException in Solr 3.5

2012-03-01 Thread andrew
I have the same problem. This happens only for some documents in the index. Like sharadgaur, the problem ceased when I removed ReversedWildcardFilterFactory from my analysis chain, HTMLStripCharFilterFactory has been there before and after. I am running branch-3.6 r1238628. As far as I can

Re: Inconsistent Results with ZooKeeper Ensemble and Four SOLR Cloud Nodes

2012-03-01 Thread Mark Miller
P.S. FYI you will have to reindex after adding _version_ back to the schema... On Mar 1, 2012, at 3:35 PM, Mark Miller wrote: Any other customizations you are making to solrconfig? On Mar 1, 2012, at 1:48 PM, Matthew Parker wrote: Added it back in. I still get the same result. On Wed, Feb

Re: search.highlight.InvalidTokenOffsetsException in Solr 3.5

2012-03-01 Thread Ahmet Arslan
I have the same problem. This happens only for some documents in the index. Andrew, can you provide a document string and a query pair? I will try to re-produce the exception. Then we can create a test case that fails. Others can look into it.

Re: Need tokenization that finds part of stringvalue

2012-03-01 Thread Ahmet Arslan
--- On Thu, 3/1/12, PeterKerk vettepa...@hotmail.com wrote: From: PeterKerk vettepa...@hotmail.com Subject: Re: Need tokenization that finds part of stringvalue To: solr-user@lucene.apache.org Date: Thursday, March 1, 2012, 6:59 PM @iorixxx: yes, that is what I need. But also when its IN

Re: Need tokenization that finds part of stringvalue

2012-03-01 Thread PeterKerk
@iorixxx: Where can I find that example schema.xml? I downloaded the latest version here: ftp://apache.mirror.easycolocate.nl//lucene/solr/3.5.0 And checked \example\example-DIH\solr\db\conf\schema.xml But no text_rev type is defined in there. And when I find it, can I just make the title field

Re: Solr Design question on spatial search

2012-03-01 Thread Venu Gmail Dev
I don't think Spatial search will fully fit into this. I have 2 approaches in mind but I am not satisfied with either one of them. a) Have 2 separate indexes. First one to store the information about all the cities and second one to store the retail stores information. Whenever user searches

Re: Modify Standalone solr server to use it application without http request

2012-03-01 Thread Erick Erickson
I'm really confused here. Your first question seemed to be about http involved in index replication, which really doesn't seem to be related to your latest post. Can you start over from the beginning? Best Erick On Thu, Mar 1, 2012 at 9:56 AM, Neel neelkant.potlap...@aspiresys.com wrote: Hi

Re: Need tokenization that finds part of stringvalue

2012-03-01 Thread Erick Erickson
One frequent method of doing leading and trailing wildcards is to use ngrams (as distinct from edgengrams). That in combination with phrase queries might work well in this case. You also might be surprised at how little space bigrams take, give it a test and see G.. Best Erick On Thu, Mar 1,
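
A rough Python sketch of the n-gram idea: index every character bigram of a value, split the query the same way, and require the query's bigrams to appear consecutively (the role a phrase query plays). The function names are hypothetical and this models the concept only, not Solr's n-gram filters.

```python
def ngrams(text, n=2):
    """Return the character n-grams of text, in order."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]

def phrase_match(indexed_text, query, n=2):
    """True if the query's n-grams occur consecutively among the indexed n-grams."""
    doc = ngrams(indexed_text.lower(), n)
    wanted = ngrams(query.lower(), n)
    if not wanted:
        return False  # query shorter than n produces no grams to match
    return any(doc[i:i + len(wanted)] == wanted
               for i in range(len(doc) - len(wanted) + 1))

for q in ("martph", "smar", "smart", "phones"):
    print(q, phrase_match("smartphone", q))
```

Requiring consecutive grams matters: for the value abxbc the query abc has both of its bigrams (ab, bc) in the index, yet it is not a substring, and the consecutive check rejects it.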

Re: alphanumeric buckets

2012-03-01 Thread Emmanuel Espina
Only one interval? In that case you could add a filter query and facet in the regular way. That is: facet.field=person&fq=person:[A TO C] But consider that you will get the search results that include those persons only. Thanks Emmanuel 2012/3/1 AlexR alexanderroessler1...@hotmail.com: Hi i
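
As a sketch of what filtering first and then faceting yields, the following Python models the facet counts from the original question as a plain dictionary. It assumes person is a single-valued string field and that the range bounds are inclusive; it illustrates the expected result and makes no call to Solr.

```python
# Facet counts as given in the original question.
person_counts = {"Alex": 10, "Ben": 5, "George": 8,
                 "Paul": 3, "Peter": 2, "Stefan": 9}

def facet_in_range(counts, low, high):
    """Keep only facet values whose string value lies in [low, high]."""
    return {name: n for name, n in counts.items() if low <= name <= high}

buckets = facet_in_range(person_counts, "A", "C")
print(buckets, sum(buckets.values()))
```

The per-value counts add up to the 15 matches reported for the facet.query approach. Note that with plain string comparison a value such as Carl sorts after C, so the upper bound may need to be widened to include names starting with C.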

Making additional solr requests in an QueryResponseWriter

2012-03-01 Thread Donnie McNeal
Hi all, The documents in our Solr index have a parent-child relationship which we have basically flattened in our Solr queries. We have massaged Solr into being the query API for 3rd party data. The relationship is a simple parent-child relationship as follows: category +-sub-category this

Re: Need tokenization that finds part of stringvalue

2012-03-01 Thread Ahmet Arslan
@iorixxx: Where can I find that example schema.xml? Please find text_general_rev at http://svn.apache.org/repos/asf/lucene/dev/trunk/solr/example/solr/conf/schema.xml And when I find it, can I just make the title field which currently is of text type then of text_rev type? Yes, also you

Re: Too many values for UnInvertedField faceting on field topic

2012-03-01 Thread Yonik Seeley
On Thu, Mar 1, 2012 at 3:34 AM, Michael Jakl jakl.mich...@gmail.com wrote: The topic field holds roughly 5 values per doc, but I wasn't able to compute the correct number right now. How many unique values for that field in the whole index? If you have log output (or output from the stats page

Using MLT Handler to find similar documents but also filter similar documents by a keyword.

2012-03-01 Thread Ravish Bhagdev
Hi, Apologies if this has been answered before, I tried searching for it and didn't find anything answering this exactly. I want to find similar documents using MLT Handler using some specified fields but I want to filter down the returned matches with some keywords as well. I looked at the

Re: Inconsistent Results with ZooKeeper Ensemble and Four SOLR Cloud Nodes

2012-03-01 Thread Matthew Parker
I tried publishing to /update/extract request handler using manifold, but got the same result. I also tried swapping out the replication handlers too, but that didn't do anything. Otherwise, that's it. On Thu, Mar 1, 2012 at 3:35 PM, Mark Miller markrmil...@gmail.com wrote: Any other

Re: Inconsistent Results with ZooKeeper Ensemble and Four SOLR Cloud Nodes

2012-03-01 Thread Matthew Parker
I reindex every time I change something. I also delete any zookeeper data too. I'm assuming the windows configuration looked correct? On Thu, Mar 1, 2012 at 3:39 PM, Mark Miller markrmil...@gmail.com wrote: P.S. FYI you will have to reindex after adding _version_ back to the schema... On Mar 1,

Re: Inconsistent Results with ZooKeeper Ensemble and Four SOLR Cloud Nodes

2012-03-01 Thread Mark Miller
I'm assuming the windows configuration looked correct? Yeah, so far I cannot spot any smoking gun... I'm confounded at the moment. I'll reread through everything once more... - Mark

Search by url starting with

2012-03-01 Thread lackadaisical
Hi, I am sorry if this has already been posted. I am new to Solr. I am crawling my site using Nutch and posting it to Solr. I am trying to implement a feature where I want to get all data where the url starts with http://someurl/ Any thoughts? Thanks, Stan -- View this message in

Re: search.highlight.InvalidTokenOffsetsException in Solr 3.5

2012-03-01 Thread Koji Sekiguchi
(12/03/02 6:05), Ahmet Arslan wrote: I have the same problem. This happens only for some documents in the index. Andrew, can you provide a document string and a query pair? I will try to re-produce the exception. Then we can create a test case that fails. Others can look into it. +1.

Re: Couple issues with edismax in 3.5

2012-03-01 Thread Way Cool
Thanks Ahmet! That's good to know someone else also tried to make phrase queries to fix multi-word synonym issue. :-) On Thu, Mar 1, 2012 at 1:42 AM, Ahmet Arslan iori...@yahoo.com wrote: I don't think mm will help here because it defaults to 100% already by the following code. Default

Re: Making additional solr requests in an QueryResponseWriter

2012-03-01 Thread Mikhail Khludnev
Hello Donnie, 1. Nothing besides design considerations prevents you from doing a search in QueryResponseWriter. You have a request, which isn't closed yet, from which you can obtain a searcher. 2. Your use case isn't clear. If you need just to search categories, and return the lists of subcategories

Architectural question structuring solr, multiple instances or filters

2012-03-01 Thread Ramo Karahasan
Hi, I face the issue that I have n business users. Each business user has its own set of products. I want to provide an interface for each business user where he can find only the products he offers. What would be a better solution: 1.) To have one big index and filter by

Re: performance between ExternalFileField and Join

2012-03-01 Thread Tommaso Teofili
Also regarding the Join functionality I remember Yonik pointed out it's O(# unique terms), but I agree with Erick on the ExternalFileField, as you can use it just inside a function query, for example, for boosting. Tommaso 2012/3/1 Erick Erickson erickerick...@gmail.com Hmmm. ExternalFileFields