Re: Problem in faceting

2011-02-04 Thread Grijesh
Change the default operator from OR to AND by using q.op or in the schema. - Thanx: Grijesh http://lucidimagination.com -- View this message in context: http://lucene.472066.n3.nabble.com/Problem-in-faceting-tp2422182p248.html Sent from the Solr - User mailing list archive at Nabble.com.

Re: Facet Query

2011-02-04 Thread Grijesh
No, facet query and fq parameters work with any type of query. When you search with facet.query=city:mumbai, it will return a facet like <lst name="facet_queries"> <int name="city:mumbai">3</int> </lst> Facet query is for faceting against a particular query. If you want results for that query
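To make the facet_queries section concrete, here is a minimal Python sketch that pulls the counts out of an XML response. The sample response is hypothetical (hand-written for illustration, including the count of 3), and the helper name facet_query_counts is my own, not part of Solr.

```python
import xml.etree.ElementTree as ET

# Hypothetical response fragment, hand-written for illustration only.
SAMPLE_RESPONSE = """
<response>
  <lst name="facet_counts">
    <lst name="facet_queries">
      <int name="city:mumbai">3</int>
    </lst>
  </lst>
</response>
"""


def facet_query_counts(xml_text):
    """Return {facet query string: count} from a Solr-style XML response."""
    root = ET.fromstring(xml_text.strip())
    counts = {}
    for lst in root.iter("lst"):
        if lst.get("name") == "facet_queries":
            # Each <int> child is named after the facet.query that produced it.
            for item in lst.findall("int"):
                counts[item.get("name")] = int(item.text)
    return counts


if __name__ == "__main__":
    print(facet_query_counts(SAMPLE_RESPONSE))
```

The count only says how many documents match the facet query; to get the documents themselves, the same query would have to be sent as q or fq.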

Re: Problem in faceting

2011-02-04 Thread Bagesh Sharma
But I want the results exactly as the above query returns them. There is no problem with the results it is returning. Problem detail: I have implemented search for my company, in which the user can enter any query in the search box. Now when a user searches for water treatment plant, the results come

Solr faceting on score

2011-02-04 Thread Bagesh Sharma
Hi friends, Is it possible to do faceting on score? I want results from the facets that have a higher score. Please suggest. -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-faceting-on-score-tp2422076p2422076.html Sent from the Solr - User mailing list archive at

Re: Problem in faceting

2011-02-04 Thread Grijesh
Try Solr's new LocalParams; that may help with your requirement. http://wiki.apache.org/solr/LocalParams - Thanx: Grijesh http://lucidimagination.com -- View this message in context: http://lucene.472066.n3.nabble.com/Problem-in-faceting-tp2422182p2422534.html Sent from the Solr -

RE: Problem in faceting

2011-02-04 Thread Pierre GOSSE
Using a facet query like facet.query=+water +treatment +plant ... should give a count of 0 to documents not having all three terms. This could do the trick, if I understand how this parameter works.
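One practical detail when sending such a facet query over HTTP: a literal + in a URL query string is decoded as a space, so the required-term operator has to be percent-encoded. A minimal Python sketch, assuming the standard select handler parameters q, facet and facet.query (the function name is hypothetical):

```python
from urllib.parse import urlencode, parse_qs


def build_facet_query_params(terms, main_query="*:*"):
    """Build a query string whose facet.query requires every term."""
    # "+term" marks a term as required in the Lucene query syntax.
    facet_query = " ".join("+" + term for term in terms)
    return urlencode([
        ("q", main_query),
        ("facet", "true"),
        ("facet.query", facet_query),
    ])


if __name__ == "__main__":
    qs = build_facet_query_params(["water", "treatment", "plant"])
    print(qs)
    # Round trip: the server side should see the plus signs again.
    print(parse_qs(qs)["facet.query"][0])
```

urlencode turns each + into %2B and each space into +, so the decoded value is again +water +treatment +plant.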

Re: SOLR 1.4 and Lucene 3.0.3 index problem

2011-02-04 Thread Churchill Nanje Mambe
Thanks Dominique. I am on Windows... how do I do this on a Windows 7 machine? I have NetBeans and I have the SVN and Ant plugins. Regards, Mambe Churchill Nanje 237 33011349, AfroVisioN Founder, President,CEO http://www.afrovisiongroup.com | http://mambenanje.blogspot.com skypeID: mambenanje

Re: DataImportHandler usage with RDF database

2011-02-04 Thread Otis Gospodnetic
Hi Lewis, I am very interested in DataImportHandler. I have data stored in an RDF db and wish to use this data to boost query results via Solr. I wish to keep this data stored in db as I have a web app which directly maintains this db. Is it possible to use a DataImportHandler to read

Re: geodist and spacial search

2011-02-04 Thread Eric Grobler
Hi Grant, Thanks for the tip. This seems to work: q=*:* fq={!func}geodist() sfield=store pt=49.45031,11.077721 fq={!bbox} sfield=store pt=49.45031,11.077721 d=40 fl=store sort=geodist() asc On Thu, Feb 3, 2011 at 7:46 PM, Grant Ingersoll gsing...@apache.org wrote: Use a filter query? See

Re: What is the best protocol for data transfer rate HTTP or RMI?

2011-02-04 Thread Otis Gospodnetic
Gustavo, I haven't used RMI in 5 years, but last time I used it I remember it being problematic - this is in the context of Lucene-based search involving some 40 different shards/servers, high query rates, and some 2 billion documents, if I remember correctly. I remember us wanting to get

Re: value for maxFieldLength

2011-02-04 Thread Otis Gospodnetic
Lewis, A large maxFieldLength may not necessarily result in OOM - it depends on the -Xmx you are using, the number of concurrent documents being processed, and such. So the first thing I'd look at would be my machine's RAM, then the -Xmx I can afford, and then, based on that, set maxFieldLength. Otis

Re: Using terms and N-gram

2011-02-04 Thread Otis Gospodnetic
Hi, The main difference is that CommonGrams will take 2 adjacent words and put them together, while NGram* stuff will take a single word and chop it up into sequences of one or more characters/letters. If you are stuck with auto-complete stuff, consider
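The distinction can be illustrated with a small Python sketch. This is not Solr's implementation: the function names are hypothetical, and the word-pair function joins every adjacent pair, whereas the real CommonGrams filter, as far as I know, only does so around a configured list of common words.

```python
def char_ngrams(word, min_n, max_n):
    """Chop a single word into character sequences of length min_n..max_n."""
    grams = []
    for n in range(min_n, max_n + 1):
        for start in range(len(word) - n + 1):
            grams.append(word[start:start + n])
    return grams


def word_bigrams(text, sep="_"):
    """Join each pair of adjacent words into one token (simplified)."""
    words = text.split()
    return [words[i] + sep + words[i + 1] for i in range(len(words) - 1)]


if __name__ == "__main__":
    print(char_ngrams("solr", 2, 3))      # pieces of one word
    print(word_bigrams("the quick fox"))  # pairs of adjacent words
```

Character n-grams operate inside a word, which is why they help with partial or misspelled input; word pairs operate across words, which mainly helps phrase queries.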

Re: phrase, inidividual term, prefix, fuzzy and stemming search

2011-02-04 Thread Otis Gospodnetic
Hi, I'll admit I didn't read your email closely, but the first part makes me think that ngrams, which I don't think you mentioned, might be handy for you here, allowing for misspellings without the implementation complexity. Otis Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch

Re: Solr Indexing Performance

2011-02-04 Thread Otis Gospodnetic
Hi, 2 GB for ramBufferSize is probably too much and not needed, but you could increase it from default 32 MB to something like 128 MB or even 512 MB, if you really have that much data where that would make a difference (you mention only 49 PDF files). I'd leave mergeFactor at 10 for now. The

Re: Detect Out of Memory Errors

2011-02-04 Thread Otis Gospodnetic
Hi, There are external tools that one can use to watch Java processes, listen for errors, and restart processes if they die - monit, daemontools, and some Java-specific ones. Otis Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch Lucene ecosystem search ::

Re: Performance optimization of Proximity/Wildcard searches

2011-02-04 Thread Otis Gospodnetic
Salman, I only skimmed your email, but wanted to say that this part sounds a little suspicious: Our warm up script currently executes all distinct queries in our logs having count 5. It was run yesterday (with all the indexing update every It sounds like this will make warmup take a

Re: Performance optimization of Proximity/Wildcard searches

2011-02-04 Thread Otis Gospodnetic
Hi, Sharding is an option too but that too comes with limitations so want to keep that as a last resort but I think there must be other things coz 150GB is not too big for one drive/server with 32GB Ram. Hmm what makes you think 32 GB is enough for your 150 GB index? It depends on

Re: Highlighting with/without Term Vectors

2011-02-04 Thread Otis Gospodnetic
Salman, It also depends on the size of your documents. Re-analyzing 20 fields of 500 bytes each will be a lot faster than re-analyzing 20 fields with 50 KB each. Otis Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch Lucene ecosystem search :: http://search-lucene.com/ -

Re: Solr for finding similar word between two documents

2011-02-04 Thread Otis Gospodnetic
Rohan, You can really do that with Lucene's tokenizers to get individual tokens/words and a HashMap where keys are those words/tokens from the first document. You can then tokenize the second doc and check each of its words in the HashMap. Our Key Phrase Extractor (
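The approach described here can be sketched in a few lines of Python. The regex tokenizer is an assumption standing in for a Lucene tokenizer, and a dict plays the role of the HashMap; the function names are hypothetical.

```python
import re


def tokenize(text):
    """Crude stand-in for a Lucene tokenizer: lowercase alphanumeric runs."""
    return re.findall(r"[a-z0-9]+", text.lower())


def common_words(first_doc, second_doc):
    """Return words of second_doc that also occur in first_doc, in order."""
    # Keys are the words/tokens of the first document.
    seen_in_first = {token: True for token in tokenize(first_doc)}
    reported = set()
    common = []
    for token in tokenize(second_doc):
        if token in seen_in_first and token not in reported:
            reported.add(token)
            common.append(token)
    return common


if __name__ == "__main__":
    print(common_words("Solr is a search server",
                       "Lucene is a search library"))
```

A real setup would probably also drop stop words such as "is" and "a" before comparing, which an analyzer with a stop filter would handle.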

Re: Problem in faceting

2011-02-04 Thread Bagesh Sharma
Sending two separate queries is an approach, but I think it may affect the performance of Solr, because for every new search there will be two queries to Solr. For this reason I was thinking of doing it with a single query. I am going to implement it with two queries now, but if anything is found

Re: Facet Query

2011-02-04 Thread Bagesh Sharma
yes it works fine ... thanks -- View this message in context: http://lucene.472066.n3.nabble.com/Facet-Query-tp2422212p2424155.html Sent from the Solr - User mailing list archive at Nabble.com.

RE: Index Not Matching

2011-02-04 Thread Esclusa, Will
Hello Grijesh, The URL below returns a 404 with the following error: The requested resource (/select/) is not available. -Original Message- From: Grijesh [mailto:pintu.grij...@gmail.com] Sent: Friday, February 04, 2011 12:17 AM To: solr-user@lucene.apache.org Subject: RE: Index Not

Re: Index Not Matching

2011-02-04 Thread Stefan Matheis
try http://localhost:8080/solr/select?q=*:* or while using solr's default port http://localhost:8983/solr/select?q=*:* On Fri, Feb 4, 2011 at 2:50 PM, Esclusa, Will william.escl...@bonton.com wrote: Hello Grijesh, The URL below returns a 404 with the following error: The requested resource

Re: What is the best protocol for data transfer rate HTTP or RMI?

2011-02-04 Thread Gustavo Maia
Hi Otis, You have many documents, 2 billion. Could you explain to me how yours is set up? Mine is set up as follows, but using Lucene. I have 3 machines, each with 6 hard disks. Each disk holds an index fragment of 10 GB. So I have 3 search servers. Each server uses the

Re: What is the best protocol for data transfer rate HTTP or RMI?

2011-02-04 Thread Mattmann, Chris A (388J)
Hi Guys, It depends on what properties you're trying to maximize. I've done several studies of this over the years: http://sunset.usc.edu/~mattmann/pubs/MSST2006.pdf http://sunset.usc.edu/~mattmann/pubs/IWICSS07.pdf http://sunset.usc.edu/~mattmann/pubs/icse-shark08.pdf And if you're really

prices

2011-02-04 Thread Dennis Gearon
Using solr 1.4. I have a price in my schema. Currently it's a tfloat. Somewhere along the way from php, json, solr, and back, extra zeroes are getting truncated along with the decimal point for even dollar amounts. So I have two questions, neither of which seemed to be findable with google.

RE: DataImportHandler usage with RDF database

2011-02-04 Thread McGibbney, Lewis John
Hi Otis... thanks for your thoughts. I don't think DIH can read from a triple store today. It can read from a RDBMS, RSS/Atom feeds, URLs, mail servers, maybe others... Maybe what you should be looking at is the ManifoldCF instead, although I don't think it can fetch data from triple stores

Re: prices

2011-02-04 Thread Yonik Seeley
On Fri, Feb 4, 2011 at 12:56 PM, Dennis Gearon gear...@sbcglobal.net wrote: Using solr 1.4. I have a price in my schema. Currently it's a tfloat. Somewhere along the way from php, json, solr, and back, extra zeroes are getting truncated along with the decimal point for even dollar amounts.

Re: HTTP ERROR 400 undefined field: *

2011-02-04 Thread Jed Glazner
Sorry for the lack of details. It's all clear in my head.. :) We checked out the head revision from the 3.x branch a few weeks ago (https://svn.apache.org/repos/asf/lucene/dev/branches/branch_3x/). We picked up r1058326. We upgraded from a previous checkout (r960098). I am using our

RE: prices

2011-02-04 Thread Jonathan Rochkind
Your prices are just dollars and cents? For actual queries, you might consider an int type rather than a float type. Multiply by a hundred to put it in the index, then multiply your values in queries by a hundred before putting them in the query. Same for range faceting, just divide by 100
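A minimal sketch of that conversion in Python, using Decimal so that even-dollar amounts keep their trailing zeroes when displayed. The field name price_cents and the function names are hypothetical, chosen only for illustration.

```python
from decimal import Decimal, ROUND_HALF_UP


def to_cents(price):
    """Convert a price string such as '12.00' to integer cents for the index."""
    cents = (Decimal(price) * 100).quantize(Decimal("1"), rounding=ROUND_HALF_UP)
    return int(cents)


def from_cents(cents):
    """Convert integer cents back to a display string with two decimals."""
    return str((Decimal(cents) / 100).quantize(Decimal("0.01")))


def price_range_query(low, high, field="price_cents"):
    """Build a range query on the integer field (field name is hypothetical)."""
    return "{}:[{} TO {}]".format(field, to_cents(low), to_cents(high))


if __name__ == "__main__":
    print(to_cents("12.00"))
    print(from_cents(1200))
    print(price_range_query("10.00", "25.00"))
```

Keeping the multiplication and division in one place on the client side avoids mixing dollar values and cent values in queries.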

Re: Using terms and N-gram

2011-02-04 Thread openvictor Open
Hi Otis, That's good I finally made it. For sematext I am afraid that I am too poor to consider this solution :) (I am doing that for fun) Thank you anyway ! 2011/2/4 Otis Gospodnetic otis_gospodne...@yahoo.com Hi, The main difference is that CommonGrams will take 2 adjacent words and put

Re: prices

2011-02-04 Thread Dennis Gearon
That's a good idea, Yonik. So, fields that aren't stored don't get displayed, so the float field in the schema never gets seen by the user. Good, I like it. Dennis Gearon Signature Warning It is always a good idea to learn from your own mistakes. It is usually a better

Re: Performance optimization of Proximity/Wildcard searches

2011-02-04 Thread Salman Akram
I know, so we are not really using it for regular warm-ups (in any case the index is updated on an hourly basis). I just tried it a few times to compare results. The issue is that I am not even sure if warming up is useful with such regular updates. On Fri, Feb 4, 2011 at 5:16 PM, Otis Gospodnetic

Re: Performance optimization of Proximity/Wildcard searches

2011-02-04 Thread Salman Akram
Well, I assume many people out there have indexes larger than 100 GB, and I don't think you will normally have more RAM than 32 GB or 64! As I mentioned, the queries are mostly phrase, proximity, wildcard and combinations of these. What exactly do you mean by distribution of documents? On

NullPointerException on queries to new 3rd core

2011-02-04 Thread Alex Thurlow
I just moved to a multi core solr instance a few weeks ago, and it's been working great. I'm trying to add a 3rd core and I can't query against it though. I'm running 1.4.1 (and tried 1.4.0) with the spatial search plugin. This is the section in solr.xml cores adminPath=/admin/cores

Re: phrase, inidividual term, prefix, fuzzy and stemming search

2011-02-04 Thread Jay Hill
You mentioned that dismax does not support wildcards, but edismax does. Not sure if dismax would have solved your other problems, or whether you just had to shift gears because of the wildcard issue, but you might want to have a look at edismax. -Jay http://www.lucidimagination.com On Mon, Jan

WordDelimiterFilterFactory

2011-02-04 Thread John kim
If I use WordDelimiterFilterFactory during indexing and at query time, will a search for cls500 find cls 500 and cls500x? If so, will it find and score exact matches higher? If not, how do you get exact matches to display first?
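As a rough illustration of the splitting involved, the Python sketch below breaks tokens at letter/digit boundaries. It is an assumption-laden simplification: the real filter has further options (catenation, preserving the original token, case-change splitting) that are ignored here, so it does not settle how matches are scored.

```python
import re


def split_letters_digits(token):
    """Split a token into runs of letters and runs of digits."""
    return re.findall(r"[A-Za-z]+|[0-9]+", token)


def analyze(text):
    """Whitespace-tokenize, then split each token at letter/digit boundaries."""
    parts = []
    for token in text.split():
        parts.extend(split_letters_digits(token))
    return parts


if __name__ == "__main__":
    for sample in ("cls500", "cls 500", "cls500x"):
        print(sample, "->", analyze(sample))
```

Under this simplified splitting, cls500 and cls 500 yield the same parts, and cls500x shares the cls and 500 parts with both.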

Re: WordDelimiterFilterFactory

2011-02-04 Thread Jay Hill
You can always try something like this out in the analysis.jsp page, accessible from the Solr Admin home. Check out that page and see how it allows you to enter text to represent what was indexed, and text for a query. You can then see if there are matches. Very handy to see how the various

Re: What is the best protocol for data transfer rate HTTP or RMI?

2011-02-04 Thread Otis Gospodnetic
Hi Gustavo, I think none of the answers I could give you would be valuable to you now, because they would be from circa 2007 or 2008. We didn't use Solr, just Lucene. Otis Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch Lucene ecosystem search :: http://search-lucene.com/

Re: Highlighting with/without Term Vectors

2011-02-04 Thread Otis Gospodnetic
Hi Salman, Ah, so in the end you *did* have TV enabled on one of your fields! :) (I think this was a problem we were trying to solve a few weeks ago here) How many docs you have in the index doesn't matter here - only N docs/fields that you need to display on a page with N results need to be

Re: Performance optimization of Proximity/Wildcard searches

2011-02-04 Thread Otis Gospodnetic
Salman, Warming up may be useful if your caches are getting decent hit ratios. Plus, you are warming up the OS cache when you warm up. Otis Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch Lucene ecosystem search :: http://search-lucene.com/ - Original Message From:

Re: Performance optimization of Proximity/Wildcard searches

2011-02-04 Thread Otis Gospodnetic
Heh, I'm not sure if this is valid thinking. :) By *matching* doc distribution I meant: what proportion of your millions of documents actually ever get matched and then how many of those make it to the UI. If you have 1000 queries in a day and they all end up matching only 3 of your docs, the

Re: geodist and spacial search

2011-02-04 Thread Bill Bell
Why not just: q=*:* fq={!bbox} sfield=store pt=49.45031,11.077721 d=40 fl=store sort=geodist() asc http://localhost:8983/solr/select?q=*:*&sfield=store&pt=49.45031,11.077721&d=40&fq={!bbox}&sort=geodist%28%29%20asc That will sort, and filter up to 40km. No need for the fq={!func}geodist()
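For building such a request programmatically, here is a hedged Python sketch that assembles the parameters from this message and lets the standard library handle the escaping of braces, parentheses and spaces. The host and port are the ones from the URL in the message; the function name is hypothetical.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def spatial_search_url(base_url, sfield, point, distance_km):
    """Build a select URL that filters by bounding box and sorts by distance."""
    params = [
        ("q", "*:*"),
        ("fq", "{!bbox}"),         # filter to the bounding box around pt
        ("sfield", sfield),        # spatial field to use
        ("pt", point),             # "lat,lon" as a string
        ("d", str(distance_km)),   # distance in km
        ("fl", sfield),
        ("sort", "geodist() asc"),
    ]
    return base_url + "?" + urlencode(params)


if __name__ == "__main__":
    url = spatial_search_url("http://localhost:8983/solr/select",
                             "store", "49.45031,11.077721", 40)
    print(url)
    # Decoding the query string gives back the readable parameter values.
    print(parse_qs(urlsplit(url).query)["sort"][0])
```

Passing the point as a string avoids any surprises from float formatting.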

UIMA Error

2011-02-04 Thread Darx Oman
Hi guys, I'm trying to use the UIMA contrib, but I got the following error ... INFO: [] webapp=/solr path=/select params={clean=false&commit=true&command=status&qt=/dataimport} status=0 QTime=0 05/02/2011 10:54:53 ص org.apache.solr.uima.processor.UIMAUpdateRequestProcessor processText INFO: Analazying