Replace Patter , with

2012-01-16 Thread stockii
Why does this not work? <fieldType name="city" class="solr.TextField"> <analyzer> <charfilter class="solr.PatternReplaceFilterFactory" pattern="^(\, )$" replacement="" replace="first" /> OR <charfilter

Re: Replace Patter , with

2012-01-16 Thread Koji Sekiguchi
(12/01/16 19:43), stockii wrote: Why does this not work? <fieldType name="city" class="solr.TextField"> <analyzer> <charfilter class="solr.PatternReplaceFilterFactory" pattern="^(\, )$" replacement="" replace="first" /> OR
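
A minimal sketch of one way the field type might be set up, not taken from the reply above. In a Solr schema the charFilter element takes a char filter factory, while solr.PatternReplaceFilterFactory is a token filter that belongs in a filter element after the tokenizer. The anchored pattern ^(\, )$ can also only match when the entire input is exactly ", ". The unanchored pattern, the empty replacement, and the keyword tokenizer below are assumptions about the intent (stripping every ", " from the city value):

```xml
<fieldType name="city" class="solr.TextField">
  <analyzer>
    <!-- Char filter runs on the raw text before tokenization.
         Pattern and replacement are assumptions about the intent. -->
    <charFilter class="solr.PatternReplaceCharFilterFactory"
                pattern=", " replacement=""/>
    <!-- Assumed: keep the whole city value as a single token. -->
    <tokenizer class="solr.KeywordTokenizerFactory"/>
  </analyzer>
</fieldType>
```

The same substitution could instead be done per token with solr.PatternReplaceFilterFactory in a filter element placed after the tokenizer; which of the two fits depends on how the field is tokenized.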

Re: Replace Patter , with

2012-01-16 Thread stockii
okay, thx =) but i replace it now in my data-config ;) - --- System One Server, 12 GB RAM, 2 Solr Instances, 8 Cores, 1 Core with 45 Million Documents other Cores 200.000 - Solr1 for Search-Requests - commit every

Solr 3.5 MoreLikeThis on Date fields

2012-01-16 Thread Jaco Olivier
Hi Everyone, Please help out if you know what is going on. We are upgrading to Solr 3.5 (from 1.4.1) and busy with a Re-Index and Test on our data. Everything seems OK, but Date Fields seem to be broken when using with the MoreLikeThis handler (I also saw the same error on Date Fields using

Re: Can Apache Solr Handle TeraByte Large Data

2012-01-16 Thread Otis Gospodnetic
Hello, From: mustafozbek mustafoz...@gmail.com All documents that we use are rich text documents and we parse them with tika. we need to search real time. Because of real-time requirement, you'll need to use unreleased/dev version of Solr. Robert Stewart

Re: best query for one-box search string over multiple types fields?

2012-01-16 Thread Otis Gospodnetic
Johnny, If you are indexing a catalog of songs and artists you can write a query parser or search component that recognizes known things like song (you must have bohemian rhapsody in your catalog) or artist names (you must have the exact string queen in your catalog) or even their combinations

Re: Solr - Tika(?) memory leak

2012-01-16 Thread Otis Gospodnetic
Wayne, Have you asked on Tika's ML? You may also want to watch https://issues.apache.org/jira/browse/SOLR-2901 Otis  Performance Monitoring SaaS for Solr - http://sematext.com/spm/solr-performance-monitoring/index.html - Original Message - From: Wayne W

Replication and segment files

2012-01-16 Thread Herman Kiefus
We are at times having some difficulty achieving a 'successful' replication. Our Operations personnel have reported the following behavior (which I cannot attest to): A master has a set of segment files (let's say 25). A slave then polls the master, gets the list of segment files that differ

Re: Replication and segment files

2012-01-16 Thread Otis Gospodnetic
Hi Herman, Try adding this to your replication config: <str name="commitReserveDuration">00:00:10</str> See also http://search-lucene.com/?q=commitReserveDuration&fc_project=Solr Otis  Performance Monitoring SaaS for Solr - http://sematext.com/spm/solr-performance-monitoring/index.html
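
For context, a sketch of where that setting would typically sit in the master's solrconfig.xml. Only the commitReserveDuration value comes from the message above; the handler name, the replicateAfter trigger, and the confFiles list are assumptions for illustration:

```xml
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="master">
    <!-- Assumed trigger: publish a new index version after each commit. -->
    <str name="replicateAfter">commit</str>
    <!-- Assumed list of config files to ship to slaves. -->
    <str name="confFiles">schema.xml,stopwords.txt</str>
    <!-- From the message: how long a commit point is reserved (HH:mm:ss)
         so its files are not deleted while a slave is still fetching. -->
    <str name="commitReserveDuration">00:00:10</str>
  </lst>
</requestHandler>
```

If slaves fetch large segment files over a slow link, a longer reserve duration than the one shown may be worth trying.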

RE: Improving Solr Spell Checker Results

2012-01-16 Thread Dyer, James
David, The spellchecker normally won't give suggestions for any term in your index. So even if wever is misspelled in context, if it exists in the index the spell checker will not try correcting it. There are 3 workarounds: 1. Use the patch included with SOLR-2585 (this is for Trunk/4.x

Re: Solr - Tika(?) memory leak

2012-01-16 Thread P Williams
Hi, I'm not sure which version of Solr/Tika you're using but I had a similar experience which turned out to be the result of a design change to PDFBox. https://issues.apache.org/jira/browse/SOLR-2886 Tricia On Sat, Jan 14, 2012 at 12:53 AM, Wayne W waynemailingli...@gmail.com wrote: Hi,

SolrJ Embedded

2012-01-16 Thread spring
Hi, is it possible to use the same index in a Solr webapp and additionally in an EmbeddedSolrServer? The embedded one would be read-only. Thank you.

RE: Can Apache Solr Handle TeraByte Large Data

2012-01-16 Thread Burton-West, Tom
Hello, Searching real-time sounds difficult with that amount of data. With large documents, 3 million documents, and 5TB of data the index will be very large. With indexes that large your performance will probably be I/O bound. Do you plan on allowing phrase or proximity searches? If so,

Re: Detecting replication slave health

2012-01-16 Thread astubbs
Did this ever progress? Shall we make a jira?

Re: Re:Re: Re:Re: problem of solr replcation's speed

2012-01-16 Thread astubbs
For future reference, I had this problem, and it was the debug statements in commons HTTP that were printing out all the binary data to the log, but my console appender was set to INFO so I wasn't seeing them. Setting http commons to INFO level fixed my speed issue (two orders of magnitude

Re: Merging text nodes/blocks of the catchall field

2012-01-16 Thread Erick Erickson
I don't know where the commas are coming from; as far as I know that's not part of Solr. You must have the catchall field defined with multiValued="true", so if you set the increment gap to 0, that should help. When you do that, what does your return look like? Best Erick P.S. It's rather
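
A rough sketch of the increment-gap suggestion in schema.xml terms. The gap is set with the positionIncrementGap attribute on the field type; the type name text_catchall, the field name catchall, and the analyzer chain are hypothetical placeholders:

```xml
<!-- positionIncrementGap="0": no position gap is inserted between the
     separate values of a multiValued field. -->
<fieldType name="text_catchall" class="solr.TextField" positionIncrementGap="0">
  <analyzer>
    <!-- Assumed analyzer chain, for illustration only. -->
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<field name="catchall" type="text_catchall" indexed="true" stored="true" multiValued="true"/>
```

Note that the gap affects positions at index time (phrase and proximity matching across values), so a re-index is needed after changing it.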

Re: Solr Query Multiple words

2012-01-16 Thread Erick Erickson
What have you tried and what have the results been? Because this is well within Solr's oob capabilities. Best Erick On Fri, Jan 13, 2012 at 10:37 AM, vibhoreng04 vibhoren...@gmail.com wrote: Hi , I want to do a 800 words multiple search across the index of 1 million records. Can anyone

Re: FacetComponent: suppress original query

2012-01-16 Thread Erick Erickson
Why not just up the maxBooleanClauses parameter in solrconfig.xml? Best Erick On Sat, Jan 14, 2012 at 1:41 PM, Dmitry Kan dmitry@gmail.com wrote: OK, let me clarify it: if solrconfig has maxBooleanClauses set to 1000 for example, then queries with more than 1000 clauses will be
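
As a sketch, the parameter lives in the query section of solrconfig.xml. The value 4096 is an arbitrary assumption, not a figure from the thread:

```xml
<query>
  <!-- Upper limit on the number of clauses in a single BooleanQuery.
       4096 is an assumed example value. -->
  <maxBooleanClauses>4096</maxBooleanClauses>
</query>
```

Raising the limit trades protection against runaway queries for flexibility, so it is probably worth sizing it to the largest query actually expected.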

Re: SolrJ Embedded

2012-01-16 Thread Erick Erickson
I don't see why not. I'm assuming a *nix system here so when Solr updated an index, any deleted files would hang around. But I have to ask why bother with the Embedded server in the first place? You already have a Solr instance up and running, why not just query that instead, perhaps using SolrJ?

Trying to understand SOLR memory requirements

2012-01-16 Thread Dave
I'm trying to figure out what my memory needs are for a rather large dataset. I'm trying to build an auto-complete system for every city/state/country in the world. I've got a geographic database, and have set up the DIH to pull the proper data in. There are 2,784,937 documents which I've formatted

Re: Trying to understand SOLR memory requirements

2012-01-16 Thread qiu chi
What is the largest -Xmx value you have tried? Your index size does not seem very big. Try -Xmx2048m; it should work. On Tue, Jan 17, 2012 at 9:31 AM, Dave dla...@gmail.com wrote: I'm trying to figure out what my memory needs are for a rather large dataset. I'm trying to build an auto-complete

Re: Trying to understand SOLR memory requirements

2012-01-16 Thread Dave
I've tried up to -Xmx5g. On Mon, Jan 16, 2012 at 9:15 PM, qiu chi chiqiu@gmail.com wrote: What is the largest -Xmx value you have tried? Your index size does not seem very big. Try -Xmx2048m; it should work. On Tue, Jan 17, 2012 at 9:31 AM, Dave dla...@gmail.com wrote: I'm trying to figure

Re: Trying to understand SOLR memory requirements

2012-01-16 Thread qiu chi
You may disable FST lookup and use a Lucene index as the suggest method. FST lookup loads all documents into memory; you can use the Lucene spell checker instead. On Tue, Jan 17, 2012 at 10:31 AM, Dave dla...@gmail.com wrote: I've tried up to -Xmx5g On Mon, Jan 16, 2012 at 9:15 PM, qiu chi

Re: Trying to understand SOLR memory requirements

2012-01-16 Thread Dave
According to http://wiki.apache.org/solr/Suggester FSTLookup is the least memory-intensive of the lookupImpl's. Are you suggesting a different approach entirely or is that a lookupImpl that is not mentioned in the documentation? On Mon, Jan 16, 2012 at 9:54 PM, qiu chi chiqiu@gmail.com
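
For readers following along, a sketch of how the lookupImpl is selected in a Suggester setup of the kind that wiki page describes. The component name, the source field name, the threshold, and buildOnCommit are assumptions, not Dave's actual configuration:

```xml
<searchComponent name="suggest" class="solr.SpellCheckComponent">
  <lst name="spellchecker">
    <str name="name">suggest</str>
    <str name="classname">org.apache.solr.spelling.suggest.Suggester</str>
    <!-- The lookup implementation under discussion. The wiki also lists
         TSTLookup and JaspellLookup as alternatives. -->
    <str name="lookupImpl">org.apache.solr.spelling.suggest.fst.FSTLookup</str>
    <!-- Hypothetical source field for suggestions. -->
    <str name="field">name</str>
    <!-- Assumed: skip terms appearing in fewer than 0.5% of documents. -->
    <float name="threshold">0.005</float>
    <!-- Assumed: rebuild the lookup structure after every commit. -->
    <str name="buildOnCommit">true</str>
  </lst>
</searchComponent>
```

Swapping the lookupImpl value is the only change needed to compare implementations, though the lookup has to be rebuilt afterwards.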

Re: Trying to understand SOLR memory requirements

2012-01-16 Thread Robert Muir
looks like https://issues.apache.org/jira/browse/SOLR-2888. Previously, FST would need to hold all the terms in RAM during construction, but with the patch it uses offline sorts/temporary files. I'll reopen the issue to backport this to the 3.x branch. On Mon, Jan 16, 2012 at 8:31 PM, Dave

Re: Trying to understand SOLR memory requirements

2012-01-16 Thread qiu chi
I remember there is another implementation that uses a Lucene index file as the lookup table rather than the in-memory FST. FST has its advantage in speed, but if you write documents at runtime, reconstructing the FST may cause performance issues. On Tue, Jan 17, 2012 at 11:08 AM, Robert Muir rcm...@gmail.com

Solr Cloud Indexing

2012-01-16 Thread Sujatha Arun
Would it make sense to index on the cloud and periodically [2-4 times/day] replicate the index to our server for searching? Which service should we go with for Solr cloud indexing? Any good and tried services? Regards Sujatha

Re: Solr - Tika(?) memory leak

2012-01-16 Thread Wayne W
Thanks for the links - I've put a posting on the Tika ML. I've just checked and we are using tika-0.2.jar - does anyone know which version I can use with Solr 1.3? Is there any info on upgrading from this far back to the latest version - is it even possible? Or would I need to re-index everything?