Filtering results based on score

2010-11-01 Thread sivaprasad
Hi, As part of the Solr results I am able to get the max score. If I want to filter the results based on the max score, let's say the max score is 10 and I need only the results between the max score and 50% of the max score. This max score is going to change dynamically. How can we implement this? Do we need to

Solr Relevency Calculation

2010-11-01 Thread sivaprasad
Hi, I have 25 indexed fields in my document. But by default, if I give q=laptops this is going to search on five fields and I am getting the score as part of the search results. How will Solr calculate the score? Is it going to calculate only on the five fields or on the 25 fields which are indexed? What is

Boosting the score based on certain field

2010-11-01 Thread sivaprasad
Hi, In my document I have a field called category. This contains electronics, games, etc. For some of the category values I need to boost the document score. Let us say, for the electronics category, I will set the boosting parameter greater than for the games category. Does anybody have an idea how to

Re: Filtering results based on score

2010-11-01 Thread Ahmet Arslan
As part of the Solr results I am able to get the max score. If I want to filter the results based on the max score, let's say the max score is 10 and I need only the results between the max score and 50% of the max score. This max score is going to change dynamically. How can we implement this? Do we need

Multiple Keyword Search

2010-11-01 Thread Pawan Darira
Hi, There is a situation where I search for more than 1 keyword. My 2 main fields are ad_title and ad_description. I want the results that match all of the keywords in both fields to come on top. Then, one by one, keywords can be dropped in further results. E.g. In a search of 3

Re:Re: problem of solr replcation's speed

2010-11-01 Thread kafka0102
I hacked SnapPuller to log the cost, and the log looks like this: [2010-11-01 17:21:19][INFO][pool-6-thread-1][SnapPuller.java(1037)]readFully1048576 cost 979 [2010-11-01 17:21:19][INFO][pool-6-thread-1][SnapPuller.java(1037)]readFully1048576 cost 4 [2010-11-01

Re:Re:Re: problem of solr replcation's speed

2010-11-01 Thread kafka0102
I suspected my app has some sleeping op every 1s, so I changed ReplicationHandler.PACKET_SZ to 1024 * 1024 * 10 (10MB), and the log result looks like this: [2010-11-01 17:49:29][INFO][pool-6-thread-1][SnapPuller.java(1038)]readFully10485760 cost 3184 [2010-11-01

Re: Design and Usage Questions

2010-11-01 Thread torin farmer
Hm, I do not have a webserver setup for security reasons. I use SVNKit to connect to SVN via the file:// protocol; what I get then is the ByteArrayOutputStream. What would the buffer solution or the DualThread Writer/Reader pair look like? -Original Message- From: Lance Norskog

Re: Design and Usage Questions

2010-11-01 Thread getagrip
Ok, so if I did NOT use SolrJ I could PUSH a Stream to Solr somehow? I do not depend on SolrJ, any connection method would suffice. On 11/01/2010 03:23 AM, Lance Norskog wrote: 2. The SolrJ library handling of content streams is pull, not push. That is, you give it a reader and it pulls

Re: Custom Sorting in Solr

2010-11-01 Thread Ezequiel Calderara
Ok, I imagined that the doubly linked list would be far too complicated for Solr. Now, how can I get Solr to connect to a webservice and do the import? I'm sorry if I'm not clear, sometimes my English gets fuzzy :P On Fri, Oct 29, 2010 at 4:51 PM, Yonik Seeley

Re: solr stuck in writing to inexisting sockets

2010-11-01 Thread Roxana Angheluta
Hi, Yes, sometimes it takes 5 minutes for a query. I agree this is not desirable. However, if the application has no control over the input queries other than closing the socket after a while, Solr should not continue writing the response, but terminate the thread. In general, is there a way

big terms in UnInvertedField

2010-11-01 Thread Koji Sekiguchi
Hello, With the Solr example, using facet.field=text creates an UnInvertedField for the text field in fieldValueCache. After that, I looked at the stats page and was surprised that the counters in *filterCache* were up: lookups : 213 hits : 106 hitratio : 0.49 inserts : 107 evictions : 0 size : 107 warmupTime : 0

Re: big terms in UnInvertedField

2010-11-01 Thread Yonik Seeley
2010/11/1 Koji Sekiguchi k...@r.email.ne.jp: With the Solr example, using facet.field=text creates an UnInvertedField for the text field in fieldValueCache. After that, I looked at the stats page and was surprised that the counters in *filterCache* were up: Are they caused by big words in UnInvertedField? Yes.

Re: big terms in UnInvertedField

2010-11-01 Thread Koji Sekiguchi
Yonik, Thank you for your reply. I just wanted to share my surprise. :) Koji -- http://www.rondhuit.com/en/ (10/11/01 23:17), Yonik Seeley wrote: 2010/11/1 Koji Sekiguchi k...@r.email.ne.jp: With solr example, using facet.field=text creates UnInvertedField for the text field in

Re: Solr Relevency Calculation

2010-11-01 Thread Erick Erickson
Here's a good place to start: http://search.lucidimagination.com/search/out?u=http://lucene.apache.org/java/2_4_0/scoring.html But what do you mean this is going to search on five fields? This

Re: Boosting the score based on certain field

2010-11-01 Thread Erick Erickson
Would simple boosting work? As in category:electronics^2? If not, perhaps you can explain a bit more about what you're trying to accomplish... Best Erick On Sun, Oct 31, 2010 at 10:55 PM, sivaprasad sivaprasa...@echidnainc.com wrote: Hi, In my document I have a field called category. This
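A minimal sketch of the boosting idea suggested above, added for illustration and not taken from the thread. The helper name boosted_query and the boost values are hypothetical. It assumes the standard Lucene query syntax, where + marks a required clause and ^N boosts a clause, and a default operator of OR so that the category clauses stay optional.

```python
def boosted_query(user_query, category_boosts):
    """Build a Lucene-syntax query string (hypothetical helper).

    The user's query is required; each category clause is optional and
    only raises the score of documents in that category.
    """
    clauses = [f"+({user_query})"]
    for category, boost in category_boosts.items():
        clauses.append(f"category:{category}^{boost}")
    return " ".join(clauses)


# Example boosts are made up: electronics is weighted above games.
print(boosted_query("laptops", {"electronics": 2, "games": 1.5}))
```

Whether query-time boosting like this is enough depends on the use case, which is what the reply asks the original poster to explain.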

Re: Multiple Keyword Search

2010-11-01 Thread Erick Erickson
I'm not sure this exactly fits your use-case, but it may come close enough. Have you looked at disMax and the mm parameter (minimum should match)? Best Erick On Mon, Nov 1, 2010 at 5:00 AM, Pawan Darira pawan.dar...@gmail.com wrote: Hi, There is a situation where I search for more than 1
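A rough sketch of what a DisMax request with mm could look like, added for illustration only. The base URL, the helper name dismax_url, and the mm value are assumptions; the field names ad_title and ad_description come from the question. With a low mm, documents matching only some keywords are still returned, and documents matching more keywords will generally tend to score higher.

```python
from urllib.parse import urlencode


def dismax_url(keywords, min_should_match="1",
               base_url="http://localhost:8983/solr/select"):
    """Build a DisMax request URL (hypothetical helper; base_url is assumed)."""
    params = {
        "defType": "dismax",              # use the DisMax query parser
        "q": " ".join(keywords),          # the raw user keywords
        "qf": "ad_title ad_description",  # fields to search, from the question
        "mm": min_should_match,           # minimum number of keywords that must match
    }
    return f"{base_url}?{urlencode(params)}"


print(dismax_url(["red", "leather", "sofa"]))
```

This only builds the URL; it does not contact a server, so the ranking behaviour would still need to be checked against a real index.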

Re: solr stuck in writing to inexisting sockets

2010-11-01 Thread Erick Erickson
I'm going to nudge you in the direction of understanding why the queries take so long in the first place rather than going toward the blunt approach of cutting them off after some time. The fact that you don't control the queries submitted doesn't prevent you from trying to understand what is

Using ICUTokenizerFilter or StandardAnalyzer with UAX#29 support from Solr

2010-11-01 Thread Burton-West, Tom
We are trying to solve some multilingual issues with our Solr analysis filter chain and would like to use the new Lucene 3.x filters that are Unicode compliant. Is it possible to use the Lucene ICUTokenizerFilter or StandardAnalyzer with UAX#29 support from Solr? Is it just a matter of

Re: Using ICUTokenizerFilter or StandardAnalyzer with UAX#29 support from Solr

2010-11-01 Thread Robert Muir
On Mon, Nov 1, 2010 at 12:24 PM, Burton-West, Tom tburt...@umich.edu wrote: We are trying to solve some multilingual issues with our Solr analysis filter chain and would like to use the new Lucene 3.x filters that are Unicode compliant. Is it possible to use the Lucene ICUTokenizerFilter or

Re: Solr in virtual host as opposed to /lib

2010-11-01 Thread Jonathan Rochkind
I think you guys are talking about two different kinds of 'virtual hosts'. Lance is talking about CPU virtualization. Eric appears to be talking about apache virtual web hosts, although Eric hasn't told us how apache is involved in his setup in the first place, so it's unclear. Assuming you

Facet count of zero

2010-11-01 Thread Tod
I'm trying to exclude certain facet results from a facet query. It seems to work, but rather than being excluded from the facet list it's returned with a count of zero. Ex: q=(-foo:bar)&facet=true&facet.field=foo&facet.sort=idx&wt=json&indent=true This returns bar with a count of zero. All the

Problem with phrase matches in Solr

2010-11-01 Thread Moazzam Khan
Hey guys, I have a Solr index where I store information about experts from various fields. The thing is, when I search for "channel marketing" I get people that have the word "channel" or "marketing" in their data. I only want people who have that entire phrase in their bio. I copy the contents of bio

Re: Facet count of zero

2010-11-01 Thread Yonik Seeley
On Mon, Nov 1, 2010 at 12:55 PM, Tod listac...@gmail.com wrote: I'm trying to exclude certain facet results from a facet query. It seems to work but rather than being excluded from the facet list it's returned with a count of zero. If you don't want to see 0 counts, use facet.mincount=1
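As an illustration of the advice above, the query from the original question can be rebuilt with facet.mincount=1 added. This is a sketch only: the base URL and the helper name facet_url are assumptions, and the other parameters are copied from the question as posted.

```python
from urllib.parse import urlencode


def facet_url(base_url="http://localhost:8983/solr/select"):
    """Build the facet request from the question, plus facet.mincount=1."""
    params = [
        ("q", "(-foo:bar)"),
        ("facet", "true"),
        ("facet.field", "foo"),
        ("facet.sort", "idx"),
        ("facet.mincount", "1"),  # hide facet values whose count is 0
        ("wt", "json"),
        ("indent", "true"),
    ]
    return f"{base_url}?{urlencode(params)}"


print(facet_url())
```

The helper only assembles the URL, so the parentheses and colon in q are percent-encoded in the output.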

RE: Solr in virtual host as opposed to /lib

2010-11-01 Thread Eric Martin
I was speaking about Apache virtual hosts. I was concerned that there was an increase in processing time due to the Solr and Nutch instance being housed inside a virtual host as opposed to being dropped in the root of my distro. Thank you for the astute clarification. -Original Message- From:

Re: Problem with phrase matches in Solr

2010-11-01 Thread darren
Take a look at term proximity and phrase query. http://wiki.apache.org/solr/SolrRelevancyCookbook Hey guys, I have a solr index where i store information about experts from various fields. The thing is when I search for channel marketing i get people that have the word channel or marketing

RE: Using ICUTokenizerFilter or StandardAnalyzer with UAX#29 support from Solr

2010-11-01 Thread Burton-West, Tom
Thanks Robert, I'll use the workaround for now (using StandardTokenizerFactory and specifying version 3.1), but I suspect that I don't want the added URL/IP address recognition due to my use case. I've also talked to a couple people who recommended using the ICUTokenFilter with some rule

RE: How does DIH multithreading work?

2010-11-01 Thread Dyer, James
Mark, I have the same question so I did a little research on this. Not a complete answer but here is what I've found: - threads was added with SOLR-1352 (https://issues.apache.org/jira/browse/SOLR-1352). - Also see

RE: indexing '-

2010-11-01 Thread PeterKerk
Guys, the string type did the trick :) Thanks

Re: Using ICUTokenizerFilter or StandardAnalyzer with UAX#29 support from Solr

2010-11-01 Thread Robert Muir
On Mon, Nov 1, 2010 at 1:34 PM, Burton-West, Tom tburt...@umich.edu wrote: Thanks Robert, I'll use the workaround for now (using StandardTokenizerFactory and specifying version 3.1), but I suspect that I don't want the added URL/IP address recognition due to my use case.  I've also talked

Testing/packaging question

2010-11-01 Thread Bernhard Reiter
Hi, I'm pretty much of a Solr newbie currently packaging solrpy for Debian; see http://svn.debian.org/viewsvn/python-modules/packages/python-solrpy/trunk/ In order to run solrpy's supplied tests at build time, I'd need Solr to know about the schema.xml that comes with the tests. Can anyone tell

Re: Facet count of zero

2010-11-01 Thread Tod
On 11/1/2010 1:03 PM, Yonik Seeley wrote: On Mon, Nov 1, 2010 at 12:55 PM, Tod listac...@gmail.com wrote: I'm trying to exclude certain facet results from a facet query. It seems to work but rather than being excluded from the facet list it's returned with a count of zero. If you don't want

Re: Solr in virtual host as opposed to /lib

2010-11-01 Thread Chris Hostetter
: References: aanlktimvv5foc2b=gxo+xs1zwgps9o5t5jorwv3id...@mail.gmail.com : aanlktim30aat8s0nxq_8utxcokv8myyabz8wtxeyl...@mail.gmail.com : aanlktimpo9v_krgaxomd4hocqabibgzdhc+jhhgsq...@mail.gmail.com : aanlktimdvaawj7=b7=pgu+rzm+nobvzdfh4o39nkp...@mail.gmail.com :

Re: Reverse range search

2010-11-01 Thread Jan Høydahl / Cominvent
Hi, I think I have seen a comment on the list from someone with the same need a few months ago. He planned to make a new fieldType to support this, e.g. MinMaxRangeFieldType which would be a polyField type holding both a min and max value, and then you could query it q=myminmaxfield:123 I did

RE: Solr in virtual host as opposed to /lib

2010-11-01 Thread Eric Martin
I don't think you read the entire thread. I'm assuming you made a mistake. -Original Message- From: Chris Hostetter [mailto:hossman_luc...@fucit.org] Sent: Monday, November 01, 2010 11:49 AM To: solr-user@lucene.apache.org Subject: Re: Solr in virtual host as opposed to /lib :

Re: Solr in virtual host as opposed to /lib

2010-11-01 Thread Markus Jelsma
No, he didn't make a mistake but you did. Next time, please start a new thread not by conveniently replying to an existing thread and just changing the subject. Now we have two threads in thread. :) I don't think you read the entire thread. I'm assuming you made a mistake. -Original

RE: Solr in virtual host as opposed to /lib

2010-11-01 Thread Chris Hostetter
: I don't think you read the entire thread. I'm assuming you made a mistake. No mistake. When you sent your first message with the subject Solr in virtual host as opposed to /lib you did so in response to a completely unrelated thread (Searching with wrong keyboard layout or using translit)

is my search fast ?! date search i need some feedback :D

2010-11-01 Thread stockiii
My index is 13M big and I have not indexed all of my documents. The index in the production system should be about 30M documents big. So with my 13M test index I try a search over all documents, with a first query: q:[2008-10-27 12:23:00:00 TO 2009-04-29 23:59:00:00] Then I run the next query, for

Re: Use SolrCloud (SOLR-1873) on trunk, or with 1.4.1?

2010-11-01 Thread Jeremy Hinegardner
I took a swag at applying SOLR-1873 to branch_3x. It applied mostly; most of the rest of the issues were ZooKeeper integrations, and those applied cleanly by hand. There were also a few constants and such that needed to be pulled in from trunk. At the moment, it passes all the tests. I have

Re: How does DIH multithreading work?

2010-11-01 Thread Lance Norskog
It is useful for parsing PDFs on a multi-processor machine. Also, if a sub-entity does an outbound I/O call to a database, a file, or another SOLR (SOLR-1499). Anything where the pipeline time outweighs disk i/o time. Threading happens on a per-document level- there is no concurrent access

Re: solr stuck in writing to inexisting sockets

2010-11-01 Thread Lance Norskog
Besides, I don't know how you'd stop Solr processing a query mid-way through, I don't know of any way to make that happen. The timeAllowed parameter causes a timeout in the Solr server to kill the searching thread. They use that now. But, yes, Erick is right- there is a fundamental problem

Re: Design and Usage Questions

2010-11-01 Thread Lance Norskog
Yes, you can write your own app to read the file with SVNKit and post it to the ExtractingRequestHandler. This would be easiest. On Mon, Nov 1, 2010 at 5:49 AM, getagrip getag...@web.de wrote: Ok, so if I did NOT use SolrJ I could PUSH a Stream to Solr somehow? I do not depend on SolrJ, any

Re: Design and Usage Questions

2010-11-01 Thread Xin Li
If you just want a quick way to query a Solr server, the Perl module WebService::Solr is pretty good. On Mon, Nov 1, 2010 at 4:56 PM, Lance Norskog goks...@gmail.com wrote: Yes, you can write your own app to read the file with SVNkit and post it to the ExtractingRequestHandler. This would be

Re: is my search fast ?! date search i need some feedback :D

2010-11-01 Thread Erick Erickson
Careful here. First searches are known to be slow, various caches are filled up the first time they are used etc. So even though you're measuring the second query, it's still perhaps filling caches. And what are you measuring? The raw search time or the entire response time? These can be quite

Which is faster -- delete or update?

2010-11-01 Thread Andy
My documents have a down_vote field. Every time a user votes down a document, I increment the down_vote field in my database and also re-index the document to Solr to reflect the new down_vote value. During searches, I want to restrict the results to only documents with, say fewer than 3

Re: Which is faster -- delete or update?

2010-11-01 Thread Peter Karich
From the user perspective I wouldn't delete it, because the down-voting could be by mistake or spam or something, and up-voting can resurrect it. It could also be wise to keep the docs to see which content (from which users?) is down-voted, to catch spam accounts. From the dev perspective

Re: Which is faster -- delete or update?

2010-11-01 Thread Erick Erickson
Just deleting a document is faster because all that really happens is the document is marked as deleted. An update is really a delete followed by an add of the same document, so by definition an update will be slower... But... does it really make a difference? How often do you expect this to

Re: Which is faster -- delete or update?

2010-11-01 Thread Jonathan Rochkind
The actual time it takes to delete or update the document is unlikely to make a difference to you. What might make a difference to you is the time it takes to actually finalize the commit, and the time it takes to re-warm your indexes after a commit, and especially the time it takes to run

Field boosting in DataImportHandler transformer

2010-11-01 Thread Brad Kellett
It's not looking very promising, but is there something I'm missing to be able to apply a field boost from within a transformer in the DataImportHandler? Not a boost defined within the schema, but a boost applied to the field from the transformer itself. I know you can do a document boost, but

Possible memory leaks with frequent replication

2010-11-01 Thread Simon Wistow
We've been trying to get a setup in which a slave replicates from a master every few seconds (ideally every second but currently we have it set at every 5s). Everything seems to work fine until, periodically, the slave just stops responding from what looks like it running out of memory:

Re: Re:Re: problem of solr replcation's speed

2010-11-01 Thread Lance Norskog
This is the time to replicate and open the new index, right? Opening a new index can take a lot of time. How many autowarmers and queries are there in the caches? Opening a new index re-runs all of the queries in all of the caches. 2010/11/1 kafka0102 kafka0...@163.com: I suspected my app has

Re: Possible memory leaks with frequent replication

2010-11-01 Thread Lance Norskog
You should query against the indexer. I'm impressed that you got 5s replication to work reliably. On Mon, Nov 1, 2010 at 4:27 PM, Simon Wistow si...@thegestalt.org wrote: We've been trying to get a setup in which a slave replicates from a master every few seconds (ideally every second but

Phrase Query Problem?

2010-11-01 Thread Tod
I have a number of fields I need to do an exact match on. I've defined them as 'string' in my schema.xml. I've noticed that I get back query results that don't have all of the words I'm using to search with. For example:

Re: Phrase Query Problem?

2010-11-01 Thread Ken Stanley
On Mon, Nov 1, 2010 at 10:26 PM, Tod listac...@gmail.com wrote: I have a number of fields I need to do an exact match on. I've defined them as 'string' in my schema.xml. I've noticed that I get back query results that don't have all of the words I'm using to search with. For example:

RE: Ensuring stable timestamp ordering

2010-11-01 Thread Dennis Gearon
How about a timestamp with a GUID appended on the end of it? Dennis Gearon
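One way to read the suggestion above is a sort key made of a fixed-width UTC timestamp followed by a GUID, so that equal timestamps still sort in a single, repeatable order. The sketch below is an illustration under that assumption; the helper name ordering_key and the key layout are hypothetical and not from the thread.

```python
import uuid
from datetime import datetime, timezone


def ordering_key(ts, guid=None):
    """Return '<UTC timestamp>-<GUID>' as a sortable string (hypothetical helper).

    The timestamp is fixed width, so plain string comparison orders keys
    chronologically; the GUID only breaks ties between equal timestamps.
    """
    if guid is None:
        guid = uuid.uuid4()  # random GUID when the caller does not supply one
    stamp = ts.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%S.%fZ")
    return f"{stamp}-{guid}"


now = datetime.now(timezone.utc)
print(ordering_key(now))
```

Note that a random GUID makes the order among equal timestamps arbitrary but fixed once the key is stored; it does not reflect arrival order.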

Default file locking on trunk

2010-11-01 Thread Lance Norskog
Scenario: Git update to current trunk (Nov 1, 2010). Build all. Run Solr in trunk/solr/example with 'java -jar start.jar'. Hit ^C. Jetty reports doing shutdown hook. There is now a data/index with a write lock file in it. I have not attempted to read the index, let alone add something to it. I start