Re: when to change rows param?

2011-04-12 Thread Paul Libbrecht
Hoss, as of now I managed to adjust this in the client code before it touches the server so it is not urgent at all anymore. I wanted to avoid touching the client code (which is giving, oh great fun, MSIE concurrency miseries) hence I wanted a server-side rewrite of the maximum number of hits

Re: Can I set up a config-based distributed search

2011-04-12 Thread Ran Peled
Thanks, Ludovic and Jonathan. Yes, this configuration default is exactly what I was looking for. Ran On Mon, Apr 11, 2011 at 7:12 PM, Jonathan Rochkind rochk...@jhu.edu wrote: I have not worked with shards/distributed, but I think you can probably specify them as defaults in your

exceeded limit of maxWarmingSearchers = 4 =(

2011-04-12 Thread stockii
Hello. My NRT search is not correctly configured =( I have 2 Solr instances, one searcher and one updater. The updater starts an update of around 3000 documents every minute, and the searcher starts a commit every minute to refresh the index and read the new docs. These are my cache values for an 36

Re: Solr 3.1 performance compared to 1.4.1

2011-04-12 Thread Marius van Zwijndregt
Hi Lance, Well not actually copied over the whole configuration files, instead i just added in the missing configuration (into a fresh copy of the example directory). By the directory implementation do you mean the readers used by SolrIndexSearcher ? These are: reader :

High (io) load and org.mortbay.jetty.EofException

2011-04-12 Thread Marius van Zwijndregt
Hello ! Every night within my maintenance window, during high load caused by postgresql (vacuum analyze), i see a few (10-30) messages showing up in the solr 3.1 logfile. SEVERE: org.mortbay.jetty.EofException at org.mortbay.jetty.HttpGenerator.flush(HttpGenerator.java:791) at

Re: exceeded limit of maxWarmingSearchers = 4 =(

2011-04-12 Thread stockii
i start a commit on searcher-Core with: .../core/update?commit=true&waitFlush=false - --- System One Server, 12 GB RAM, 2 Solr Instances, 7 Cores, 1 Core with 31 Million Documents other Cores 100.000 - Solr1 for

Berlin Buzzwords - conference schedule released

2011-04-12 Thread Simon Willnauer
Hey folks, The Berlin Buzzwords team recently released the schedule for the conference on high scalability. The conference focuses on the topics search, data analysis and NoSQL. It is to take place on June 6/7th 2011 in Berlin. We are looking forward to two awesome keynote speakers who shaped

Re: exceeded limit of maxWarmingSearchers = 4 =(

2011-04-12 Thread stockii
my filterCache has a warmupTime of ~6000 ... but my config is like this: LRU Cache(maxSize=3000, initialSize=50, autowarmCount=50 ...) should i set maxSize to 50 or a similar value? - --- System One Server, 12 GB RAM, 2
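
A toy sketch of why warmupTime tends to follow autowarmCount rather than maxSize. It assumes, as a simplification of how Solr's caches are generally described, that autowarming re-computes the most recently used entries of the old cache for the new searcher. The names autowarm and regenerate are hypothetical and are not Solr APIs.

```python
from collections import OrderedDict

def autowarm(old_cache, regenerate, autowarm_count):
    """Build a new cache from the most recently used keys of old_cache.

    old_cache is ordered from least to most recently used.
    regenerate(key) stands in for re-running the cached query
    against the new searcher (hypothetical callback).
    """
    new_cache = OrderedDict()
    if autowarm_count <= 0:
        return new_cache
    for key in list(old_cache)[-autowarm_count:]:
        new_cache[key] = regenerate(key)
    return new_cache

# Five cached filters in the old searcher's cache.
old = OrderedDict((f"fq{i}", f"docset{i}") for i in range(5))

calls = []
def regenerate(key):
    calls.append(key)          # each call represents one re-executed query
    return f"fresh-{key}"

warmed = autowarm(old, regenerate, autowarm_count=2)
print(list(warmed), len(calls))
```

In this model the warming cost is one regenerate call per autowarmed entry, so lowering autowarmCount shortens warming while maxSize only bounds how many entries the cache may hold.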

Re: exceeded limit of maxWarmingSearchers = 4 =(

2011-04-12 Thread stockii
oooh. my queryResultCache has a warmupTime of 54000 = ~1 minute. any suggestions ?? - --- System One Server, 12 GB RAM, 2 Solr Instances, 7 Cores, 1 Core with 31 Million Documents other Cores 100.000 - Solr1 for

Re: Decrease warmupTime

2011-04-12 Thread stockii
i am fighting with the same problem, but with jetty. is it also necessary in this case to delete the jetty work-DIR ??? - --- System One Server, 12 GB RAM, 2 Solr Instances, 7 Cores, 1 Core with 31 Million Documents other Cores

Re: Indexing Best Practice

2011-04-12 Thread Darx Oman
Hi Lance, thanks for your reply, but I have a question: is this patch committed to trunk?

AbstractSolrTestCase and Solr 3.1.0

2011-04-12 Thread Tommaso Teofili
Hi all, I am porting a series of Solr plugins previously developed for version 1.4.1 to 3.1.0. I've written some integration tests extending the AbstractSolrTestCase [1] utility class, but now it seems that it wasn't included in the solr-core 3.1.0 artifact, as it's in the solr/src/test directory. Was

function query apply only in the subset of the query

2011-04-12 Thread Marco Martinez
Hi everyone, My situation is as follows: I need to add the value of a field to the score of the docs returned by the query, but not to all the docs. Example: q=car returns 3 docs 1- name=car ford marketValue=1 score=1.3 2- name=car citroen marketValue=2 score=1.3 3- name=car mercedes

Help with Nested Query

2011-04-12 Thread Hasnain
Hi, I'm trying to do something like this in Solr 1.4.1: fq=category_id:(24 79) However, the values inside the parentheses will be fetched through another query. So far I’ve tried using _query_ but it doesn't work the way I want it to. Here is what I'm trying: fq=category_id:(_query_:”{!lucene

Solrj retry handling - prevent ProtocolException: Unbuffered entity enclosing request can not be repeated

2011-04-12 Thread Martin Grotzke
Hi, from time to time we're seeing a ProtocolException: Unbuffered entity enclosing request can not be repeated. in the logs when sending ~500 docs to solr (the stack trace is at the end of the email). I'm aware that this was discussed before (e.g. [1]) and our solution was already to reduce the

Updates during Optimize

2011-04-12 Thread stockii
Hello. When I start an optimize (which takes more than 4 hours), no updates from DIH are possible. I thought Solr copies the whole index and then starts the optimize on the copy, rather than locking the index and optimizing it in place ... =( any way to do both at the same time ? -

Re: AbstractSolrTestCase and Solr 3.1.0

2011-04-12 Thread Robert Muir
On Tue, Apr 12, 2011 at 6:44 AM, Tommaso Teofili tommaso.teof...@gmail.com wrote: Hi all, I am porting a previously series of Solr plugins developed for 1.4.1 version to 3.1.0, I've written some integration tests extending the AbstractSolrTestCase [1] utility class but now it seems that wasn't

Re: XML not coming through from nabble to Gmail

2011-04-12 Thread Erick Erickson
Chris: Here's the nabble URL: http://lucene.472066.n3.nabble.com/Strip-spaces-and-new-line-characters-from-data-tp2795453p2795453.html The message in the Solr list is from alexei on 8-April. Strip spaces and newline characters from data. This started happening a couple (?) of weeks ago and I

Re: XML not coming through from nabble to Gmail

2011-04-12 Thread Erick Erickson
FWIW, I see the xml I just sent in gMail, so I'm guessing things are over on the nabble side, but I have very little evidence.. Erick P.S. It's not a huge deal, getting to the correct message on nabble is just a click away. But it is a bit annoying. On Tue, Apr 12, 2011 at 8:38 AM, Erick

Re: DIH OutOfMemoryError?

2011-04-12 Thread stockii
Make sure streaming is on. -- how to check ? - --- System One Server, 12 GB RAM, 2 Solr Instances, 7 Cores, 1 Core with 31 Million Documents other Cores 100.000 - Solr1 for Search-Requests - commit every Minute - 5GB

SolrException: Unavailable Service

2011-04-12 Thread Phong Dais
Hi, I did not want to hijack this thread ( http://www.mail-archive.com/solr-user@lucene.apache.org/msg34181.html) but I am experiencing the same exact problem mentioned here. To sum up the issue, I am getting intermittent Unavailable Service exception during indexing commit phase. I know that I

RE: XML not coming through from nabble to Gmail

2011-04-12 Thread Steven A Rowe
I've asked on Nabble if they know of a fix for the problem: http://nabble-support.1.n2.nabble.com/solr-dev-mailing-list-tp6023495p6264955.html Steve -----Original Message----- From: Erick Erickson [mailto:erickerick...@gmail.com] Sent: Tuesday, April 12, 2011 8:43 AM To: Chris Hostetter

Re: SolrException: Unavailable Service

2011-04-12 Thread Erick Erickson
If your commit from the client fails, you don't really know the state of your index anyway. All the threads you have sending documents to Solr are adding them to a single internal buffer. Committing flushes that buffer. So if thread 1 gets an error on commit, it will presumably have some

Re: SolrException: Unavailable Service

2011-04-12 Thread Erick Erickson
Sorry, fat fingers. Sent that last e-mail inadvertently. Anyway, if I have this correct, I'd recommend going to autocommit and NOT committing from the clients. That's usually the recommended procedure. This is especially true if you have a master/slave setup, because each commit from each client

Searching during postcommit

2011-04-12 Thread Reeza Edah Tally
Hi, I have been trying to perform a search using a CommonsHttpSolrServer when my postCommit event listener is called. I am not able to find the documents just committed; the post in postCommit caused me to assume that I would; it seems that the commit only takes effect when all postCommit have

Re: function query apply only in the subset of the query

2011-04-12 Thread Erik Hatcher
Try using AND (or set q.op): q=car+AND+_val_:marketValue On Apr 12, 2011, at 07:11 , Marco Martinez wrote: Hi everyone, My situation is the next, I need to sum the value of a field to the score to the docs returned in the query, but not to all the docs, example: q=car returns 3 docs

Analysing all tokens in a stream

2011-04-12 Thread bjornbear
Hi I would like to build a component that during indexing analyses all tokens in a stream and adds metadata to a new field based on my analysis. I have different tasks that I would like to perform, like basic classification and certain more advanced phrase detections. How would I do this? A

Re: AbstractSolrTestCase and Solr 3.1.0

2011-04-12 Thread Tommaso Teofili
Thanks Robert, that was very useful :) Tommaso 2011/4/12 Robert Muir rcm...@gmail.com On Tue, Apr 12, 2011 at 6:44 AM, Tommaso Teofili tommaso.teof...@gmail.com wrote: Hi all, I am porting a previously series of Solr plugins developed for 1.4.1 version to 3.1.0, I've written some

Re: function query apply only in the subset of the query

2011-04-12 Thread Marco Martinez
Thanks, but I tried this and saw that it works in a standard scenario. In my query, however, I use my own query parser, and it seems that it doesn't apply the AND and returns all the docs in the index: My query: _query_:{!bm25}car AND _val_:marketValue - 67000 docs returned Solr query parser car

Re: XML not coming through from nabble to Gmail

2011-04-12 Thread Chris Hostetter
: : Here's the nabble URL: : : http://lucene.472066.n3.nabble.com/Strip-spaces-and-new-line-characters-from-data-tp2795453p2795453.html : : The message in the Solr list is from alexei on 8-April. Strip spaces and : newline characters from data. And the raw message as received by apache...

Re: Updates during Optimize

2011-04-12 Thread Shawn Heisey
On 4/12/2011 6:21 AM, stockii wrote: Hello. When is start an optimize (which takes more than 4 hours) no updates from DIH are possible. i thougt solr is copy the hole index and then start an optimize from the copy and not lock the index and optimize this ... =( any way to do both in the same

Re: Updates during Optimize

2011-04-12 Thread Jason Rutherglen
You can index and optimize at the same time. The current limitation or pause is when the ram buffer is flushing to disk, however that's changing with the DocumentsWriterPerThread implementation, eg, LUCENE-2324. On Tue, Apr 12, 2011 at 8:34 AM, Shawn Heisey s...@elyograg.org wrote: On 4/12/2011

Re: Fwd: machine tags, copy fields and pattern tokenizers

2011-04-12 Thread straup
I'm not sure it's a 100% solution but the new path hierarchy tokenizer seems promising. I've only played with it a little bit, with a little too much booze and not enough sleep (in the sky), so apologies for the potty-mouth-ness of this blog post. http://www.aaronland.info/weblog/2011/04/02/status/#sky

Solr 1.30 Collection Distribution Search

2011-04-12 Thread Li Tan
I have 1 master and 2 slaves set up with 1.30 collection distribution. My front-end web application queries the master. Do I need to change any code in the web application to query the slaves, or does the master request queries from the slaves automatically? Please help, thx.

Re: SolrException: Unavailable Service

2011-04-12 Thread Phong Dais
Erick, My setup is not quite the way you described. I have multiple threads indexing simultaneously, but I only have 1 thread doing the commit after all indexing threads finished. I have multiple instances of this running each in their own java vm. I'm ok with throwing out all the docs indexed
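
A toy sketch of the setup described in this message: several threads index at the same time, and a single commit runs only after every indexing thread has finished. ToyIndex is a hypothetical in-memory stand-in, not a Solr client, and it models pending documents as one shared buffer that a commit flushes as a whole.

```python
import threading

class ToyIndex:
    """In-memory stand-in for a Solr core (hypothetical, for illustration)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._pending = []      # documents added but not yet committed
        self.committed = []     # documents visible after a commit

    def add(self, doc):
        with self._lock:
            self._pending.append(doc)

    def commit(self):
        """Flush everything pending, regardless of which thread added it."""
        with self._lock:
            flushed = len(self._pending)
            self.committed.extend(self._pending)
            self._pending.clear()
            return flushed

def index_batch(index, batch):
    for doc in batch:
        index.add(doc)

def index_then_commit(index, batches):
    workers = [
        threading.Thread(target=index_batch, args=(index, batch))
        for batch in batches
    ]
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join()           # wait until all indexing threads are done
    return index.commit()       # one commit, issued from one thread

index = ToyIndex()
flushed = index_then_commit(index, [["a1", "a2"], ["b1", "b2", "b3"], ["c1"]])
print(flushed)
```

Because the commit is issued once after the joins, a failed commit in this model leaves every batch uncommitted together, which matches the "throw out all the docs" recovery the message describes.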

Re: Solr 1.30 Collection Distribution Search

2011-04-12 Thread Erick Erickson
Yes. You need to put, say, a load balancer in front of your slaves and distribute the requests to the slaves. Best Erick On Tue, Apr 12, 2011 at 2:20 PM, Li Tan litan1...@gmail.com wrote: I have 1 master, and 2 slaves setup with 1.30 collection distribution. My frontwed web application does
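
A minimal sketch of the idea behind distributing requests across the slaves, assuming simple round-robin selection in the client. The slave URLs are hypothetical placeholders, and a dedicated load balancer would normally do this job along with health checks.

```python
from itertools import cycle

# Hypothetical slave URLs; substitute the real hosts.
SLAVES = ["http://slave1:8983/solr", "http://slave2:8983/solr"]

def make_picker(slaves):
    """Return a function that yields the slaves in round-robin order."""
    rotation = cycle(slaves)
    return lambda: next(rotation)

pick_slave = make_picker(SLAVES)
```

Each call to pick_slave() returns the next slave in turn, so search requests alternate between the two hosts while updates keep going to the master.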

Re: SolrException: Unavailable Service

2011-04-12 Thread Erick Erickson
See below: On Tue, Apr 12, 2011 at 2:21 PM, Phong Dais phong.gd...@gmail.com wrote: Erick, My setup is not quite the way you described. I have multiple threads indexing simultaneously, but I only have 1 thread doing the commit after all indexing threads finished. I have multiple

Spellchecking in the Chinese Language

2011-04-12 Thread alexw
Hi, I have been trying to get spellcheck to work in the Chinese language. So far I have not had any luck. Can someone shed some light here, as a general guideline, on what needs to happen? I am using the CJKAnalyzer in the text field type and searching works fine, but spelling does not

Re: Indexing Flickr and Panaramio

2011-04-12 Thread Estrada Groups
Did this go to the list? I think I may need to resubscribe... Sent from my iPhone On Apr 12, 2011, at 12:55 AM, Estrada Groups estrada.adam.gro...@gmail.com wrote: Has anyone tried doing this? Got any tips for someone getting started? Thanks, Adam Sent from my iPhone

Re: Solr 1.30 Collection Distribution Search

2011-04-12 Thread Li
Thanks Erick, I thought the master does that automatically when you set up collection distribution. I wish there were more documentation for 1.3 collection distribution. Do you know how to show the slave stats on the Master admin page, the distribution tab? Thanks in advance guys. Sent from my iPhone On

Re: Indexing Flickr and Panaramio

2011-04-12 Thread Otis Gospodnetic
It did: http://search-lucene.com/?q=panaramio Otis Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch Lucene ecosystem search :: http://search-lucene.com/ - Original Message From: Estrada Groups estrada.adam.gro...@gmail.com To: Estrada Groups

Re: Spellchecking in the Chinese Language

2011-04-12 Thread Otis Gospodnetic
Hi, Does spellchecking in Chinese actually make sense? I once asked a native Chinese speaker about that and the person told me it didn't really make sense. Anyhow, with n-grams, I don't think this could technically work even if it made sense for Chinese, could it? Otis Sematext ::

Re: Searching during postcommit

2011-04-12 Thread Otis Gospodnetic
If I follow things correctly, I think you should be seeing new documents only after the commit is done and the new index searcher is open and available for search. If you are searching before the new searcher is available, you are probably still hitting the old searcher. Otis Sematext ::

Re: Indexing Flickr and Panaramio

2011-04-12 Thread Péter Király
Hi, I indexed Flickr into Lucene about 3 years ago. There is a Flickr API, which covers almost everything you need (as I remember, not every Flickr feature was implemented in the API at that time; for example, collections were not searchable). You can harvest by user ID or by searching for a topic. You can

Re: function query apply only in the subset of the query

2011-04-12 Thread Yonik Seeley
On Tue, Apr 12, 2011 at 10:25 AM, Marco Martinez mmarti...@paradigmatecnologico.com wrote: Thanks but I tried this and I saw that this work in a standard scenario, but in my query i use a my own query parser and it seems that they dont doing the AND and returns all the docs in the index: My

Re: Spellchecking in the Chinese Language

2011-04-12 Thread Luke Lu
It doesn't make sense to spell check individual character sized words, but it makes a lot of sense for phrases. Due to pervasive use of pinyin IM, it's very easy to write phrases that are totally wrong in semantics but sound correct. n-gram should work if it doesn't mangle the characters. On

Re: partial optimize does not reduce the segment number to maxNumSegments

2011-04-12 Thread Renee Sun
Hi Hoss, thanks for your response... you are right I got a typo in my question, but I did use maxSegments, and here is the exact url I used: curl 'http://localhost:8080/solr/97/update?optimize=true&maxSegments=10&waitFlush=true' I used jconsole and du -sk to monitor each partial optimize, and
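
A small sketch of building the update URL from this message so that the three query parameters are joined with `&`. It uses only Python's standard urllib; the helper name solr_update_url is hypothetical, and the host, port and core path are the ones quoted in the message.

```python
from urllib.parse import urlencode

def solr_update_url(base, **params):
    """Return base plus a query string with parameters joined by '&'."""
    return f"{base}?{urlencode(params)}"

optimize_url = solr_update_url(
    "http://localhost:8080/solr/97/update",
    optimize="true",
    maxSegments=10,
    waitFlush="true",
)
print(optimize_url)
```

Building the query string this way avoids hand-typed separators, and when the URL is passed to curl it needs quoting so the shell does not interpret the `&`.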

Re: Spellchecking in the Chinese Language

2011-04-12 Thread alexw
Thanks Otis and Luke. Yes it does make sense to spellcheck phrases in Chinese. Looks like the default Solr spellCheck component is already doing some kind of NGram-ing. When examining the spellCheck index, I did see gram1, gram2, gram3, gram4... The problem is no Chinese terms were indexed into

Re: partial optimize does not reduce the segment number to maxNumSegments

2011-04-12 Thread Chris Hostetter
: /tmp # ls /xxx/solr/data/32455077/index | wc --- this is the start point, 150 seg files : 150 150 946 : /tmp # time curl the number of files in the index directory is not the number of segments. the number of segments is an internal lucene concept that impacts
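
A rough sketch of the distinction made here between files and segments. It assumes the usual Lucene on-disk naming of that era, where each segment's files share a prefix such as `_0` (for example `_0.fdt`, `_0.fdx`, `_0_1.del`) and `segments_N` / `segments.gen` belong to the index as a whole. The file listing below is made up for illustration.

```python
import re

# Assumed naming: per-segment files start with "_<name>", optionally followed
# by "_<generation>", then an extension. Index-wide files do not match.
SEGMENT_FILE = re.compile(r"^(_[0-9a-z]+)(?:_[0-9a-z]+)?\.[0-9a-z]+$")

def segment_names(filenames):
    """Return the set of distinct segment prefixes found in filenames."""
    names = set()
    for filename in filenames:
        match = SEGMENT_FILE.match(filename)
        if match:
            names.add(match.group(1))
    return names

# Made-up directory listing: six files, but only two segments.
listing = ["_0.fdt", "_0.fdx", "_0_1.del", "_1.cfs", "segments_3", "segments.gen"]
print(len(listing), "files,", len(segment_names(listing)), "segments")
```

Under that assumption, counting distinct prefixes rather than piping `ls` into `wc` gives a closer estimate of the segment count that maxSegments is compared against.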

Re: Indexing Flickr and Panaramio

2011-04-12 Thread Estrada Groups
Thanks Peter! I am thinking that I may just use Nutch to do the crawl and index off of these sites. I need to check out the APIs for each to make sure I'm not missing anything related to the geospatial data for each image. Obviously both do the extraction when the images are uploaded so I'm

Vetting Our Architecture: 2 Repeaters and Slaves.

2011-04-12 Thread Parker Johnson
I am hoping to get some feedback on the architecture I've been planning for a medium to high volume site. This is my first time working with Solr, so I want to be sure what I'm planning isn't totally weird, unsupported, etc. We've got a pair of F5 loadbalancers and 4 hosts. 2 of those hosts

Re: Vetting Our Architecture: 2 Repeaters and Slaves.

2011-04-12 Thread Erick Erickson
I think the repeaters are misleading you a bit here. The purpose of a repeater is usually to replicate across a slow network, say in a remote data center, then slaves at that center can get more timely updates. I don't think they add anything to your disaster recovery scenario. So I'll ignore

Re: Solr and Permissions

2011-04-12 Thread Liam O'Boyle
ManifoldCF sounds like it might be the right solution, so long as it's not secretly building a filter query in the back end, otherwise it will hit the same limits. In the meantime, I have made a minor improvement to my filter query; it now scans the permitted IDs and attempts to build a filter

Re: Vetting Our Architecture: 2 Repeaters and Slaves.

2011-04-12 Thread Otis Gospodnetic
Hi Parker, Lovely ASCII art. :) Yes, I think you can simplify this by introducing shared storage (e.g., SAN) that hosts the index to which your active/primary master writes. When your primary master dies, you start your stand-by master that is configured to point to the same index. If there