proximity question

2010-07-06 Thread mike anderson
, Mike Anderson

Re: SolrCloud in production?

2010-08-01 Thread mike anderson
I'd second the request for more information on the current state of SolrCloud. I have a 16-shard Solr setup in production running 1.3, and a lot of the features of SolrCloud would make my life a lot easier. Cheers, Mike On Sat, Jul 24, 2010 at 12:52 PM, Dennis Gearon gear...@sbcglobal.net wrote:

Re: How to retrieve the full corpus

2010-09-06 Thread mike anderson
You might check out Luke, the Lucene Index Toolbox. http://www.getopt.org/luke/ I know you can browse the index and get frequency counts, though I'm not sure if you can export the entire index as a list like what you're looking for. Hope this helps, Mike On Mon, Sep 6, 2010 at 10:52 AM, Roland

Re: upgrade index from 2.9 to 3.x

2010-09-24 Thread mike anderson
at 10:33 AM, Markus Jelsma markus.jel...@buyways.nl wrote: There is a recent thread on this one http://www.mail-archive.com/solr-user@lucene.apache.org/msg40491.html On Friday 24 September 2010 16:30:36 mike anderson wrote: What is the right way to upgrade a solr index from Lucene 2.9.1 to 3.x

benchmarking tools

2009-10-27 Thread Mike Anderson
. If anybody has some insight into this kind of project I'd love to get some feedback. Thanks in advance, Mike Anderson

Re: benchmarking tools

2009-10-28 Thread mike anderson
at java.net's Faban benchmarking framework. We use it extensively for our acceptance tests and tuning exercises. Joshua On Oct 27, 2009, at 1:59 PM, Mike Anderson wrote: I've been making modifications here and there to the Solr source code in hopes to optimize for my particular setup. My

field queries seem slow

2009-11-02 Thread mike anderson
I took a look through my Solr logs this weekend and noticed that the longest queries were on particular fields, like author:albert einstein. Is this a result consistent with other setups out there? If not, is there a trick to make these go faster? I've read up on filter queries and use those when

Re: apply a patch on solr

2009-11-02 Thread mike anderson
You can see what revision the patch was written for at the top of the patch, it will look like this: Index: org/apache/solr/handler/MoreLikeThisHandler.java === --- org/apache/solr/handler/MoreLikeThisHandler.java (revision 772437)
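The revision can also be pulled out of a patch programmatically. A minimal sketch, assuming the patch uses the Subversion-style header quoted in the snippet above; the function name `patch_revisions` is illustrative and not part of any Solr tooling, and the separator and `(working copy)` lines in the sample are assumed from the usual `svn diff` layout:

```python
import re

# Matches SVN-style header lines such as:
#   --- org/apache/solr/handler/MoreLikeThisHandler.java (revision 772437)
_HEADER = re.compile(r"^--- (?P<path>\S+)\s+\(revision (?P<rev>\d+)\)", re.MULTILINE)


def patch_revisions(patch_text):
    """Return {file path: revision number} for every '---' header in the patch."""
    return {m.group("path"): int(m.group("rev")) for m in _HEADER.finditer(patch_text)}


# Sample header; only the 'Index:' and '---' lines come from the snippet above,
# the other two lines are assumed for illustration.
sample = (
    "Index: org/apache/solr/handler/MoreLikeThisHandler.java\n"
    "===================================================================\n"
    "--- org/apache/solr/handler/MoreLikeThisHandler.java (revision 772437)\n"
    "+++ org/apache/solr/handler/MoreLikeThisHandler.java (working copy)\n"
)
print(patch_revisions(sample))
```

Checking out that revision before applying the patch avoids most rejected hunks.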

Re: field queries seem slow

2009-11-04 Thread mike anderson
erickerick...@gmail.com wrote: H, are you sorting? And have your readers been reopened? Is the second query of that sort also slow? If the answer to this last question is no, have you tried some autowarming queries? Best Erick On Mon, Nov 2, 2009 at 4:34 PM, mike anderson

atypical MLT use-case

2009-12-09 Thread Mike Anderson
This is somewhat of an odd use-case for MLT. Basically I'm using it for near-duplicate detection (I'm not using the built in dup detection for a variety of reasons). While this might sound like an okay idea, the problem lies in the order in which things happen. Ideally, duplicate detection would

content stream/MLT

2009-12-09 Thread Mike Anderson
I'm trying to understand how content stream works with respect to MLT. I did a regular MLT query using a document ID and specifying two fields to do MLT on and got back a set of results. I then copied the xml for the document with the aforementioned ID and pasted it to a text file. Then I made the

MLT calculation

2009-12-16 Thread Mike Anderson
How exactly is MLT calculated? I'm trying to gain an intuition for it by tweaking the parameters MLT.qf, MLT.mintf, and MLT.mindf (mostly the former, changing boosts), but so far it's a bit counterintuitive. How does MLT.boost play in? If anybody could point me to a technical description

Re: Lock problems: Lock obtain timed out

2010-01-25 Thread mike anderson
I am getting this exception as well, but disk space is not my problem. What else can I do to debug this? The solr log doesn't appear to lend any other clues.. Jan 25, 2010 4:02:22 PM org.apache.solr.core.SolrCore execute INFO: [] webapp=/solr path=/update params={} status=500 QTime=1990 Jan 25,

Re: solr application for website crawling and indexing html, pdf, word, ... files

2010-01-25 Thread mike anderson
I think you might be looking for Apache Tika. On Mon, Jan 25, 2010 at 3:55 PM, Frank van Lingen fr...@vanlingen.name wrote: I recently started working with solr and find it easy to set up and tinker with. I now want to scale up my setup and was wondering if there is an application/component

Re: Best OCR API for solr

2010-02-04 Thread mike anderson
There might be an OCR plugin for Apache Tika (which does exactly this out of the box except for OCR capability, I believe). http://lucene.apache.org/tika/ -mike 2010/2/4 Kranti™ K K Parisa kranti.par...@gmail.com Hi, Can anyone list the best OCR APIs available to use in combination with

solr-ruby with clustering

2010-03-22 Thread mike anderson
Has anybody got solr-ruby to return a clustering result? (using the clustering component) I'm almost certain the query is correct (I check the solr logs for the query and run it in my browser, get back the cluster output as expected). But when I dump the response from my solr-ruby query the

phrase query with autosuggest (SOLR-1316)

2010-10-06 Thread mike anderson
It seemed like SOLR-1316 was a little too long to continue the conversation. Is there support for quotes indicating a phrase query? For example, my autosuggest query for mike sha ought to return mike shaffer, mike sharp, etc. Instead I get suggestions for mike and for sha, resulting in a collated

how well does multicore scale?

2010-10-21 Thread mike anderson
I'm exploring the possibility of using cores as a solution to bookmark folders in my solr application. This would mean I'll need tens of thousands of cores... does this seem reasonable? I have plenty of CPUs available for scaling, but I wonder about the memory overhead of adding cores (aside from

Re: how well does multicore scale?

2010-10-22 Thread mike anderson
On Fri, Oct 22, 2010 at 1:12 AM, Jonathan Rochkind rochk...@jhu.edu wrote: No, it does not seem reasonable. Why do you think you need a separate core for every user? mike anderson wrote: I'm exploring the possibility of using cores as a solution to bookmark folders in my solr

Re: how well does multicore scale?

2010-10-26 Thread mike anderson
://wiki.apache.org/solr/CoreAdmin Since Solr 1.3 On Fri, Oct 22, 2010 at 1:40 PM, mike anderson saidthero...@gmail.com wrote: Thanks for the advice, everyone. I'll take a look at the API mentioned and do some benchmarking over the weekend. -Mike On Fri, Oct 22, 2010 at 8:50 AM

Re: how well does multicore scale?

2010-10-27 Thread mike anderson
, Oct 26, 2010 at 10:15 AM, Jonathan Rochkind rochk...@jhu.edu wrote: mike anderson wrote: I'm really curious if there is a clever solution to the obvious problem with: So you're better off using a single index and with a user id and use a query filter with the user id when fetching data., i.e
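The single-index approach quoted in this message can be sketched as a small request builder. This is an illustration under assumptions: the field name `user_id` and the helper `user_query_url` are hypothetical, and the base URL should be adjusted to the actual deployment; `q` and `fq` are the standard Solr query and filter-query parameters.

```python
from urllib.parse import urlencode

# Assumed base URL; adjust host, port, and handler to the actual deployment.
SOLR_SELECT = "http://localhost:8983/solr/select"


def user_query_url(query, user_id):
    """Build a select URL that restricts results to one user's documents.

    The user restriction goes in fq (filter query) rather than q, so it does
    not influence relevance scoring and Solr can cache it independently of
    the main query.
    """
    params = {"q": query, "fq": f"user_id:{user_id}"}  # user_id: assumed field name
    return SOLR_SELECT + "?" + urlencode(params)


print(user_query_url("colorful graphs", 42))
```

With this layout every document carries its owner's id, and one index serves all users instead of one core per user.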

Re: how well does multicore scale?

2010-10-27 Thread mike anderson
: On Wed, 2010-10-27 at 14:20 +0200, mike anderson wrote: [...] By my simple math, this would mean that if we want each shard's index to be able to fit in memory, [...] Might I ask why you're planning on using memory-based sharding? The performance gap between memory and SSDs is not very big so

Re: Improving Solr performance

2011-01-07 Thread mike anderson
Making sure the index can fit in memory (you don't have to allocate that much to Solr, just make sure it's available to the OS so it can cache it -- otherwise you are paging the hard drive, which is why you are probably IO bound) has been the key to our performance. We recently opted to use less

Re: Improving Solr performance

2011-01-10 Thread mike anderson
Not sure if this was mentioned yet, but if you are doing slave/master replication you'll need 2x the RAM at replication time. Just something to keep in mind. -mike On Mon, Jan 10, 2011 at 5:01 PM, Toke Eskildsen t...@statsbiblioteket.dk wrote: On Mon, 2011-01-10 at 21:43 +0100, Paul wrote: I

Re: [POLL] Where do you get Lucene/Solr from? Maven? ASF Mirrors?

2011-01-21 Thread mike anderson
[x] ASF Mirrors (linked in our release announcements or via the Lucene website) [] Maven repository (whether you use Maven, Ant+Ivy, Buildr, etc.) [x] I/we build them from source via an SVN/Git checkout. [] Other (someone in your company mirrors them internally or via a downstream project) On

Re: Multicore boosting to only 1 core

2011-02-15 Thread mike anderson
Could you make an additional date field, call it date_boost, that gets populated in all of the cores EXCEPT the one with the newest articles, and then boost on this field? Then when you move articles from the 'newest' core to the rest of the cores you copy over the date to the date_boost field. (I

Re: Index Solr Logs

2011-06-26 Thread mike anderson
Check out Logg.ly. http://www.loggly.com/. They use SOLR to index all kinds of logs, SOLR included. This is a paid service, so maybe not what you're looking for. I've used it though, works great. -Mike On Sun, Jun 26, 2011 at 5:49 AM, Mr Havercamp mrhaverc...@gmail.com wrote: I'm interested to

spellcheck component in 1.4 distributed

2009-08-07 Thread mike anderson
I am e-mailing to inquire about the status of the spellchecking component in 1.4 (distributed). I saw SOLR-785, but it is unreleased and for 1.5. Any help would be much appreciated. Thanks in advance, Mike

Re: How to use key with facet.prefix?

2009-08-08 Thread mike anderson
Hi all, I am e-mailing to inquire about the status of the spellchecking component in 1.4 (distributed). I saw SOLR-785, but it is unreleased and appears to be for 1.5. Any help would be much appreciated. Thanks in advance, Mike (sorry if this sent twice)

Re: How to use key with facet.prefix?

2009-08-08 Thread mike anderson
whoops, sorry guys On Sat, Aug 8, 2009 at 12:37 PM, mike anderson saidthero...@gmail.com wrote: Hi all, I am e-mailing to inquire about the status of the spellchecking component in 1.4 (distributed). I saw SOLR-785, but it is unreleased and appears to be for 1.5. Any help would be much

spellcheck component in 1.4 distributed

2009-08-08 Thread mike anderson
Hi all, I am e-mailing to inquire about the status of the spellchecking component in 1.4 (distributed). I saw SOLR-785, but it is unreleased and appears to be for 1.5. Any help would be much appreciated. Thanks in advance, Mike

ruby client and building spell check dictionary

2009-08-14 Thread Mike Anderson
I set up the spell check component with this code in the config file: <searchComponent name="spellcheck" class="solr.SpellCheckComponent"> <str name="queryAnalyzerFieldType">textSpell</str> <lst name="spellchecker"> <str name="name">titleCheck</str> <str name="classname">solr.IndexBasedSpellChecker</str> <str

MoreLikeThis (MLT) in 1.4 distributed

2009-08-18 Thread mike anderson
I'm trying to get MLT working in 1.4 distributed mode. I was hoping the patch SOLR-788 would do the trick, but after applying the patch by hand to revision 737810 (it kept choking on component/MoreLikeThisComponent.java) I still get nothing. The URL I am using is this:

Re: MoreLikeThis (MLT) in 1.4 distributed

2009-08-18 Thread mike anderson
/solrq=theory+of+colorful+graphs&mlt.mintf=1&mlt=true} status=0 QTime=164 On Tue, Aug 18, 2009 at 11:30 AM, Grant Ingersoll gsing...@apache.org wrote: Are there errors in the logs? -Grant On Aug 18, 2009, at 10:42 AM, mike anderson wrote: I'm trying to get MLT working in 1.4 distributed mode

Re: MoreLikeThis (MLT) in 1.4 distributed

2009-08-18 Thread mike anderson
PM, mike anderson saidthero...@gmail.com wrote: There don't appear to be any related errors in the log. I've included it below anyhow (there is a java.lang.NumberFormatException, I'm not sure what that is). thanks, mike for the query: http://localhost:8983/solr/select?q=%22theory%20of

stopfilterFactory isn't removing field name

2009-09-13 Thread mike anderson
I'm kind of stumped by this one.. is it something obvious? I'm running the latest trunk. In some cases the stopFilterFactory isn't removing the field name. Thanks in advance, -mike From debugQuery (both words are in the stopwords file):

Re: stopfilterFactory isn't removing field name

2009-09-14 Thread mike anderson
the problem is. -mike On Mon, Sep 14, 2009 at 1:10 AM, Yonik Seeley yo...@lucidimagination.com wrote: That's pretty strange... perhaps something to do with your synonyms file mapping for to a zero length token? -Yonik http://www.lucidimagination.com On Mon, Sep 14, 2009 at 12:13 AM, mike anderson

Re: stopfilterFactory isn't removing field name

2009-09-15 Thread mike anderson
Could this be related to SOLR-1423? On Mon, Sep 14, 2009 at 8:51 AM, Yonik Seeley yo...@lucidimagination.com wrote: Thanks, I'll see if I can reproduce... -Yonik http://www.lucidimagination.com On Mon, Sep 14, 2009 at 2:10 AM, mike anderson saidthero...@gmail.com wrote: Yeah