Re: persistent cache

2010-02-16 Thread Tim Terlegård
2010/2/15 Toke Eskildsen t...@statsbiblioteket.dk: From: Tim Terlegård [tim.terleg...@gmail.com] If the index size is more than you can have in RAM, do you recommend to split the index to several servers so it can all be in RAM? I do expect phrase queries. Total index size is 107 GB. *prx

Pragmatic more or less high availability option on 2 servers

2010-02-16 Thread Robert Krüger
Hi, I have to set up a SOLR cluster with some availability concept (is allowed to require manual interaction on fault, however, if there is a better way, I'd be interested in recommendations). I have two servers (A and B for the example) at my disposal. What I was thinking about was the

Query or FilterQuery for exact field match

2010-02-16 Thread gabriele renzi
Hi everyone, in our app we sometimes use solr programmatically to retrieve all the elements that have a certain value in a single-valued single-token field (brand:xxx). Since we are not interested in scoring these results, I was thinking that maybe this should be performed as a filterQuery

WG: Performance-Issues and raising numbers of cumulative inserts

2010-02-16 Thread Bohnsack, Sven
Hi Shalin! Thanks for the quick response. Sadly, it tells me that I have to look elsewhere to fix the problem. Does anyone have an idea what could cause the increasing warmup times? If required I can post some stats. Thanking you in anticipation! Regards, Sven Feed: Solr-Mailing-List

Upgrading from solr1.3 to solr1.4

2010-02-16 Thread Rakhi Khatwani
Hi, i have indexed some data on solr 1.3.0. Now i wanna upgrade to solr 1.4.0 but on the same data. so here are the following steps i performed: 1. extract solr 1.4.0 2. copied the conf and data folder of my index from solr 1.3.0/examples/multicore to solr1.4.0/examples/multicore/ 3.

Tomcat vs Jetty: A Comparative Analysis?

2010-02-16 Thread Steve Radhouani
Hi there, Is there any analysis out there that may help to choose between Tomcat and Jetty to deploy Solr? I wonder whether there's a significant difference between them in terms of performance. Any advice would be much appreciated, -Steve

Re: Query or FilterQuery for exact field match

2010-02-16 Thread gabriele renzi
On Tue, Feb 16, 2010 at 2:04 PM, NarasimhaRaju rajux...@yahoo.com wrote: Hi, using filterQuery(fq) is more efficient because SolrIndexSearcher will make use of filterCache and in your case it returns entire set from the cache instead of searching from the entire index. more info about
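A minimal sketch of the request shape being discussed here, assuming a Solr instance at http://localhost:8983/solr/select (a placeholder URL, not taken from the thread) and the brand:xxx field from the original question. The class and method names are made up for illustration; only java.net.URLEncoder from the JDK is used. The match-all main query q=*:* is also an assumption: the point is that the restriction moves into fq, where it can be answered from the filterCache and does not affect scoring.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper, not part of Solr or SolrJ.
final class FilterQueryUrl {

    // Builds a select URL whose main query matches everything and whose
    // restriction is carried by the fq (filter query) parameter.
    static String build(String baseUrl, String field, String value) {
        try {
            String q = URLEncoder.encode("*:*", "UTF-8");
            String fq = URLEncoder.encode(field + ":" + value, "UTF-8");
            return baseUrl + "?q=" + q + "&fq=" + fq;
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available on the JVM, so this should not happen.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // The base URL is a placeholder; adjust host, port and core as needed.
        System.out.println(build("http://localhost:8983/solr/select", "brand", "xxx"));
    }
}
```

Whether this is actually faster for a given index depends on how often the same filter repeats, so it is worth measuring rather than assuming.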

Re: Tomcat vs Jetty: A Comparative Analysis?

2010-02-16 Thread Ron Chan
I'd doubt if a performance benchmark would be very useful, it ultimately depends on what you are trying to do and what you are comfortable with. We've had successful deployments on both. Any difference in performance is far outweighed by ease of setup/support that you personally find in

IndexSchema object

2010-02-16 Thread Gargate, Siddharth
How can we get an instance of the IndexSchema object in a Tokenizer subclass?

multivalued : how to get file names

2010-02-16 Thread Kranti™ K K Parisa
Hi, When we index using SOLR, we have an option called multivalued. How does that work with multiple files associated with the same document? For example: submitting a form with some fields + list of pdf files index process: 1) considering all the form fields as individual solr input document fields

Re: regarding ranking

2010-02-16 Thread Smith G
Hello, Thanks. That clears my doubts. Coming to point two, can you please tell me which part of the Similarity takes care of the same? Is it possible to implement in such a way that we give more preference to the number of found terms? Also, here in our case we need to give more

dataimporthandler and expungeDeletes=false

2010-02-16 Thread Jorg Heymans
Hi, Can anybody tell me if [1] still applies as of version trunk 03/02/2010 ? I am removing documents from my index using deletedPkQuery and a deltaimport. I can tell from the logs that the removal seems to be working: 16-Feb-2010 15:32:54 org.apache.solr.handler.dataimport.DocBuilder

Re: multivalued : how to get file names

2010-02-16 Thread Erick Erickson
Unless you have *evidence* that indexing each PDF with the form data as a single SOLR document is a problem, I would just index the fields with each document rather than try to index the PDFs as multivalued. The space used by duplicating the form field data is probably a tiny fraction of the

Re: Getting max/min dates from solr index

2010-02-16 Thread Mark N
Thanks. Is it possible to do date faceting on multiple solr shards? I am using an index created in two different shards to do date faceting on field DATE *

Re: persistent cache

2010-02-16 Thread Jason Rutherglen
On a related note. Maybe it'd be good to have wiki page of experiences and possibly stats of various SSD drives? Either on Lucene or Solr wiki sites? 2010/2/16 Tim Terlegård tim.terleg...@gmail.com: 2010/2/15 Toke Eskildsen t...@statsbiblioteket.dk: From: Tim Terlegård

Delete by query discrepancy

2010-02-16 Thread Mat Brown
Hi all, Trying to debug a very sneaky bug in a small Solr extension that I wrote, and I've come across an odd situation. Here's what my test suite does: deleteByQuery(*:*); // add some documents commit(); // test the search This works fine. The test suite that exposed the error (which is

Re: Delete by query discrepancy

2010-02-16 Thread Mark Miller
Mat Brown wrote: Hi all, Trying to debug a very sneaky bug in a small Solr extension that I wrote, and I've come across an odd situation. Here's what my test suite does: deleteByQuery(*:*); // add some documents commit(); // test the search This works fine. The test suite that exposed

Re: regarding ranking

2010-02-16 Thread Ahmet Arslan
Hello, Thanks. That clears my doubts. Coming to the point two, Can you please tell me which part of the Similarity takes care of the same. Is it possible to implement in such a way that we give more preference to number of found terms. public float coord(int overlap, int
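For reference, a standalone sketch of the arithmetic behind the coord factor, written without any Lucene dependency so it runs on its own. As far as I know, Lucene's DefaultSimilarity computes coord as overlap divided by maxOverlap; the squared variant below is a hypothetical illustration of giving more preference to the number of matched terms, not something proposed in the thread. Class and method names are invented for the example.

```java
// Hypothetical, dependency-free illustration of the coord factor.
final class CoordSketch {

    // Fraction of query terms that matched the document
    // (believed to mirror DefaultSimilarity's behaviour).
    static float defaultCoord(int overlap, int maxOverlap) {
        return overlap / (float) maxOverlap;
    }

    // Assumed variant: squaring the fraction penalises partial matches
    // more strongly, so documents matching more terms rank higher.
    static float steeperCoord(int overlap, int maxOverlap) {
        float ratio = overlap / (float) maxOverlap;
        return ratio * ratio;
    }

    public static void main(String[] args) {
        // A document matching 2 of 4 query terms.
        System.out.println(defaultCoord(2, 4));
        System.out.println(steeperCoord(2, 4));
    }
}
```

In a real deployment the override would presumably live in a Similarity subclass registered in the schema; that wiring is left out here because it depends on the Solr version in use.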

Re: Delete by query discrepancy

2010-02-16 Thread Mat Brown
Cool, thanks - just wanted to make sure I'm not insane. Makes sense that there would be a difference if the index is built fresh in that case. On Tue, Feb 16, 2010 at 11:59, Mark Miller markrmil...@gmail.com wrote: Mat Brown wrote: Hi all, Trying to debug a very sneaky bug in a small Solr

Strict Hierarchical Facets (SOLR-64)

2010-02-16 Thread Wadim Kruse
Hi @all, I am getting the same recursive-concatenated results as the guys in the comments (http://issues.apache.org/jira/browse/SOLR-64). I couldn't get hiefacets working, neither with release-1.4.0 nor with branch-1.4.0. I've got a 1.4.0-dev incl. SOLR-64 running and in parallel a 1.4.0-final. I

Re: cannot match on phrase queries

2010-02-16 Thread Kevin Osborn
It definitely had something to do with omitTermFreqAndPositions. As soon as I disabled the option and re-indexed, my queries started working as expected. I suspect it has something to do with terms occupying the same position and losing that information by using omitTermFreqAndPositions, but

Re: Upgrading Tika in Solr

2010-02-16 Thread Grant Ingersoll
I've got a task open to upgrade to 0.6. Will try to get to it this week. Upgrading is usually pretty trivial. On Feb 14, 2010, at 12:37 AM, Liam O'Boyle wrote: Afternoon, I've got a large collections of documents which I'm attempting to add to a Solr index using Tika via the

Re: How to retrieve relevance debug/explain info in code?

2010-02-16 Thread uwdanny
any hints? -- View this message in context: http://old.nabble.com/How-to-retrieve-relevance-%22debug-explain%22-info-in-code--tp27602530p27612814.html Sent from the Solr - User mailing list archive at Nabble.com.

Re: Updating index: Replacing data directory recommended?

2010-02-16 Thread Peter Karich
Hi, any hints or suggestions? Does anyone do the updating this way? Regards, Peter. Hi solr community! Is it recommended to replace the data directory of a heavily used solr instance? (I am aware of the http queries, but that will be too slow) I need a fast way to push development data to

Re: How to retrieve relevance debug/explain info in code?

2010-02-16 Thread Erick Erickson
Any details? This is pretty ambiguous: tacking debugQuery=true onto a URL brings back some stuff. In Lucene, IndexSearcher.explain()? Erick On Tue, Feb 16, 2010 at 1:21 PM, uwdanny uwda...@gmail.com wrote: any hints? -- View this message in context:

Tool for analyzing data in solr

2010-02-16 Thread dipti khullar
Hi All Is there any tool to analyze corrupted data in Solr? I am aware of Luke. But does it show somehow that the data is corrupted? Like some segments are missing or whether some documents have been corrupted - not fully indexed? Thanks Dipti

Re: How to retrieve relevance debug/explain info in code?

2010-02-16 Thread uwdanny
Hi erick, thanks for the reply. my query url includes debugQuery=on and the result page is correctly showing all the debug / explain info. the problem I'm facing is that I cannot get the same debug/explain info in code. I've been trying IndexSearcher.explain(Weight, int ) API, as well as

Preventing mass index delete via DataImportHandler full-import

2010-02-16 Thread Daniel Shane
I've set up a simple DIH import handler with Solr that connects via a database to my data. I have a small worry though. When I call the full-import functions, can I configure Solr (via the XML files) to make sure there are rows to index before wiping everything? What worries me is if, for some

Question about custom Lucene filters and Solr

2010-02-16 Thread Jon Bodner
Hello, I'm interested in using Solr with a custom Lucene Filter (like the one described in section 6.4.1 of the Lucene In Action, Second Edition book). I'd like to filter search results from a Lucene index against information stored in a relational database. I don't want to move the

Re: regarding ranking

2010-02-16 Thread Smith G
Hello, Thanks for your detailed explanation. Do you want to punish *more* long documents? Not a lot, but a bit more than default implementation. It seems lengthNorm is field based and punishing lengthy fields does fit most of the cases in our project. There will be a trade-off

filter queries not fully filtering

2010-02-16 Thread Nagelberg, Kallin
Hi everyone, I am attempting to implement a faceted drill down feature with Solr. I am having problems explaining some results of the fq parameter. Let's say I have two fields, 'people' and 'category'. I do a search for 'dog' and ask to facet on the people and category fields. I am told that

Range Queries, Geospatial

2010-02-16 Thread Fuad Efendi
Hi, I've read a very interesting interview with Ryan, http://www.lucidimagination.com/Community/Hear-from-the-Experts/Podcasts-and-Videos/Interview-Ryan-McKinley Another finding is https://issues.apache.org/jira/browse/SOLR-773 (lucene/contrib/spatial) Is there any more stuff going on for SOLR

Re: Re: Updating index: Replacing data directory recommended?

2010-02-16 Thread Peter Karich
Hi, Oops, sorry. I didn't notice the answer because it was in the bulk folder. I thought this procedure would be a lot faster and have less overhead. Just two lines of shell script. What do you think? Regards, Peter. This should work on Linux. The rsync based replication scripts used

RE: filter queries not fully filtering

2010-02-16 Thread Nagelberg, Kallin
Problem solved. I wasn't quoting the value. Since I was using names such as 'Gary Bettman' solr must have been giving all the Garys. -Original Message- From: Nagelberg, Kallin [mailto:knagelb...@globeandmail.com] Sent: Tuesday, February 16, 2010 3:22 PM To: 'solr-user@lucene.apache.org'
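To illustrate the fix described above, here is a small sketch of building a quoted filter value in Java. The helper name is hypothetical and the escaping covers only backslashes and double quotes, which is an assumption about what a simple name field needs. My understanding is that an unquoted people:Gary Bettman is parsed as two clauses, with only Gary applied to the people field, which would explain the extra matches.

```java
// Hypothetical helper for building an exact-phrase filter clause.
final class PhraseFilter {

    // Wraps the value in double quotes so the whole phrase is matched
    // against the field, escaping characters that would end the phrase early.
    static String quoted(String field, String value) {
        String escaped = value.replace("\\", "\\\\").replace("\"", "\\\"");
        return field + ":\"" + escaped + "\"";
    }

    public static void main(String[] args) {
        // Field name taken from the original question in this thread.
        System.out.println(quoted("people", "Gary Bettman"));
    }
}
```

The resulting string would still need URL encoding before being sent as an fq parameter.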

Re: Deleting spelll checker index

2010-02-16 Thread darniz
Thanks Hoss. Apologies for flooding the post. But still I can't stop thinking about this. I deleted my entire index and now I have 0 documents. Now if I make a query with accrd I still get a suggestion of accord even though there are no documents returned since I deleted my entire index. I hope it

Re: Question about custom Lucene filters and Solr

2010-02-16 Thread Israel Ekpo
Hi Jon, You will need to write a plugin. You will need a custom Query parser and an Update Handler, depending on what you are doing. The implementation of an Update Handler or Update Request Processor is not recommended because it is considered to be advanced. Take a look at the following links

Merge several queries into one result?

2010-02-16 Thread Daniel Shane
Hi all! I'm trying to join 2 indexes together to produce a final result using only Solr + Velocity Response Writer. The problem is that each hit of the main index contains references to some common documents located in another index. For example, the hit could have a field that describes in

Re: Question about custom Lucene filters and Solr

2010-02-16 Thread Jon Bodner
Hi Israel (et al), I don't think that I need an Update Handler; I don't intend to change the values in the search index (in fact, the goal is to build a Lucene index with Hadoop and then point a Solr instance at it). What I'm trying to do is split the document into two locations: one is the

Re: regarding ranking

2010-02-16 Thread Ahmet Arslan
After getting aware of all these combinations, it seems not wise to proceed blindly by punishing whatever we want. Thank you very much for letting me know. Generally most of the people are happy with default solr scoring. Especially in web like search. I am not sure but you can find this

Re: How to retrieve relevance debug/explain info in code?

2010-02-16 Thread uwdanny
update - found the answer API getExplainList in org.apache.solr.util.SolrPluginUtils works. uwdanny wrote: Hi, I was trying to get the detailed explain info in (java) code using the APIs, see codes below, - ResponseBuilder rb (from some inherited process

ConstantScoreQuery and wildcards

2010-02-16 Thread TCK
Hi, It seems that when I do a search with a wildcard (eg, +text:abc*) the Solr standard SearchHandler will construct a ConstantScoreQuery passing in a Filter, so all the documents in the result set are scored the same. Is there a way to make Solr construct a BooleanQuery instead so that scoring

Re: ConstantScoreQuery and wildcards

2010-02-16 Thread Ahmet Arslan
It seems that when I do a search with a wildcard (eg, +text:abc*) the Solr standard SearchHandler will construct a ConstantScoreQuery passing in a Filter, so all the documents in the result set are scored the same. Is there a way to make Solr construct a BooleanQuery instead so that

Re: Copying dynamic fields into default text field messing up fieldNorm?

2010-02-16 Thread Chris Hostetter
: According to this email exchange between Koji and Mat Brown, : : http://www.mail-archive.com/solr-user@lucene.apache.org/msg23759.html : : The boost value from copyField's shouldn't be accumulated into the boost for : the text field, can anyone else verify this? This seem to go against what

Re: How to query multiple fields with phrases

2010-02-16 Thread Chris Hostetter
: I need to do a search that will search 3 different fields and combine : the results. First, it needs to not break the phrase into tokens, but : rather treat it as a phrase for one field. The other fields need to be : parsed with their normal analyzers. your description of your goal is a

Seattle Hadoop/Lucene/NoSQL Meetup; Wed Feb 24th, Feat. MongoDB

2010-02-16 Thread Bradford Stephens
Greetings, It's time for another awesome Seattle Hadoop/Lucene/Scalability/NoSQL Meetup! As always, it's at the University of Washington, Allen Computer Science building, Room 303 at 6:45pm. You can find a map here: http://www.washington.edu/home/maps/southcentral.html?cse Last month, we had a

Re: Request time out in solr

2010-02-16 Thread Chris Hostetter
: I want to know How can I set request timeout through perl by : webservice::solr end or solr end so that I could handle request timeout I've never used WebService::Solr, but its docs say it takes in a user agent object, (ie: LWP::UserAgent) so that's where you can specify the client side

Re: Upgrading from solr1.3 to solr1.4

2010-02-16 Thread Chris Hostetter
:i have indexed some data on solr 1.3.0. Now i wanna upgrade to solr : 1.4.0 but on the same data. : so here are the following steps i performed: : 1. extract solr 1.4.0 : 2. copied the conf and data folder of my index from solr : 1.3.0/examples/multicore to solr1.4.0/examples/multicore/ :

Re: Collating results from multiple indexes

2010-02-16 Thread Will Johnson
Jan Hoydal / Otis, First off, Thanks for mentioning us. We do use some utility functions from SOLR but our index engine is built on top of Lucene only, there are no Solr cores involved. We do have a JOIN operator that allows us to perform relational searches while still acting like a search

Re: Copying dynamic fields into default text field messing up fieldNorm?

2010-02-16 Thread Koji Sekiguchi
Chris Hostetter wrote: : According to this email exchange between Koji and Mat Brown, : : http://www.mail-archive.com/solr-user@lucene.apache.org/msg23759.html : : The boost value from copyField's shouldn't be accumulated into the boost for : the text field, can anyone else verify this? This

Re: Preventing mass index delete via DataImportHandler full-import

2010-02-16 Thread Chris Hostetter
: I have a small worry though. When I call the full-import functions, can : I configure Solr (via the XML files) to make sure there are rows to : index before wiping everything? What worries me is if, for some unknown : reason, we have an empty database, then the full-import will just wipe :

Re: Question about custom Lucene filters and Solr

2010-02-16 Thread Chris Hostetter
: I'm interested in using Solr with a custom Lucene Filter (like the one : described in section 6.4.1 of the Lucene In Action, Second Edition : book). I'd like to filter search results from a Lucene index against : information stored in a relational database. I don't want to move the :

Re: Deleting spelll checker index

2010-02-16 Thread Chris Hostetter
: But still i cant stop thinking about this. : i deleted my entire index and now i have 0 documents. : : Now if i make a query with accrd i still get a suggestion of accord even : though there are no document returned since i deleted my entire index. i : hope it also clear the spell check index

Re: defaultSearchField and DisMaxRequestHandler

2010-02-16 Thread Chris Hostetter
: no but you can set a default for the qf parameter with the same value good call... https://issues.apache.org/jira/browse/SOLR-1776 -Hoss

Re: How to retrieve relevance debug/explain info in code?

2010-02-16 Thread Erick Erickson
Thanks for bringing closure. Erick On Tue, Feb 16, 2010 at 7:13 PM, uwdanny uwda...@gmail.com wrote: update - found the answer API getExplainList in org.apache.solr.util.SolrPluginUtils works. uwdanny wrote: Hi, I was trying to get the detailed explain info in (java) code

Re: Merge several queries into one result?

2010-02-16 Thread Erick Erickson
It's generally a bad idea to try to think of various SOLR/Lucene indexes in a database-like way, Lucene isn't built to do RDBMS-like stuff. The first suggestion is usually to consider flattening your data. That would be something like adding NY and New York in each document. If that's not

Re: implementing profanity detector

2010-02-16 Thread Lance Norskog
A problem is that your profanity list will not stop growing, and with each new word you will want to rescrub the index. We had a thousand-word NOT clause in every query (a filter query would be true for 99% of the index) until we switched to another arrangement. Another small problem was that I

Re: schema design - catch all field question

2010-02-16 Thread Lance Norskog
The data copied from title to content is exactly the strings that you give. The data is copied around, then each field is analyzed. Changing 'title' from text to string makes no difference. On Mon, Feb 15, 2010 at 6:48 AM, adeelmahmood adeelmahm...@gmail.com wrote: I am just trying to

Re: Question on Index Replication

2010-02-16 Thread Lance Norskog
When you change an index you do not have to copy the entire index again. The new part of the index is in separate files and the replication code knows to only pull the differences. Indexing on a master and copying to slaves works very well - there are thousands of Solr installations using that

Re: Tool for analyzing data in solr

2010-02-16 Thread Lance Norskog
This is the CheckIndex program in Lucene. I don't have a link handy for running it, but it is in the lucene-core jar file in solr/lib. On Tue, Feb 16, 2010 at 11:08 AM, dipti khullar dipti.khul...@gmail.com wrote: Hi All Is there any tool to analyze corrupted data in Solr. I am aware of luke.

Re: regarding ranking

2010-02-16 Thread Lance Norskog
Norms are generally not calculated. You need to change the field you want with this attribute: omitNorms=false. On Tue, Feb 16, 2010 at 2:38 PM, Ahmet Arslan iori...@yahoo.com wrote: After getting aware of all these combinations, it seems not wise to proceed blindly by punishing whatever we

Re: Performance-Issues and raising numbers of cumulative inserts

2010-02-16 Thread Lance Norskog
These are some very large numbers. 700k ms is 700 seconds, 4M ms is 4k seconds or 66 minutes. No Solr installation should take this long to warm up. There is something very wrong here. Have you optimized lately? What queries do you run to warm it up? And, the basics: how many documents, how much

Re: Preventing mass index delete via DataImportHandler full-import

2010-02-16 Thread Noble Paul നോബിള്‍ नोब्ळ्
On Wed, Feb 17, 2010 at 8:03 AM, Chris Hostetter hossman_luc...@fucit.org wrote: : I have a small worry though. When I call the full-import functions, can : I configure Solr (via the XML files) to make sure there are rows to : index before wiping everything? What worries me is if, for some

Re: Performance-Issues and raising numbers of cumulative inserts

2010-02-16 Thread Antonio Lobato
I've actually run into this issue; huge, 30 minute warm up times. I've found that reducing the auto-warm count on caches (and the general size of the cache) helped a -lot-, as did making sure my warm up query wasn't something like: q=*:*&facet=true&facet.field=somethingWithAWholeLotOfTerms

Re: Upgrading from solr1.3 to solr1.4

2010-02-16 Thread Rakhi Khatwani
Hi, Solr home: 1.3.0/examples/multicore Type of Queries: Recursive e.g. I search in the index for some name that returns some rows. For each row there is a field called parentid which is a unique key for some other row in the index. The next queries search the index for the parentid . This

Re: Copying dynamic fields into default text field messing up fieldNorm?

2010-02-16 Thread Chris Hostetter
: I believe Koji was mistaken. looking at DocumentBuilder.toDocument, the : boosts have been propagated to copyField destinations since that method was : added in 2007 (initially it didn't deal with copyfields at all, but once : that was fixed it copied the boosts as well.) ... : Hmm,

Re: Tomcat vs Jetty: A Comparative Analysis?

2010-02-16 Thread Steve Radhouani
Thanks Ron. Actually, I'm developing a Web search engine. Would that matter? Thanks. 2010/2/16 Ron Chan rc...@i-tao.com I'd doubt if a performance benchmark would be very useful, it ultimately depends on what you are trying to do and what you are comfortable with. We've had successful