old searchers not closing after optimize or replication

2011-04-20 Thread Bernd Fehling
Hello list, we have a problem: old searchers often do not close after an optimize (on the master) or a replication (on the slaves), so we end up with huge index volumes. The only solution so far is to stop and start Solr, which cleans everything up successfully, but that can only be a workaround. Is the

How could each core share configuration files

2011-04-20 Thread kun xiong
Hi all, Currently in my project, most of the core configurations are the same (solrconfig.xml, dataimport.properties...), and each core keeps its own duplicate copy in its own folder. I am wondering how I could put the common ones in one folder that every core could share, and keep the different ones in

Re: How could each core share configuration files

2011-04-20 Thread lboutros
Perhaps this could help : http://lucene.472066.n3.nabble.com/Shared-conf-td2787771.html#a2789447 Ludovic. 2011/4/20 kun xiong [via Lucene] ml-node+2841801-1701787156-383...@n3.nabble.com Hi all, Currently in my project , most of the core configurations are same(solrconfig.xml,

RE: Custom Sorting

2011-04-20 Thread Michael Owen
OK, thank you for the discussion. As I thought, it's not possible within performance limits. I think the way to go is to record some more stats at index time and use them in boost queries. :) Thanks Mike Date: Tue, 19 Apr 2011 15:12:00 -0400 Subject: Re: Custom Sorting From:

Re: TikaEntityProcessor

2011-04-20 Thread firdous_kind86
Hi, I asked that :) but didn't get it. What dependencies? I am using Solr 1.4 and Tika 0.9. I replaced tika-core 0.9 and tika-parsers 0.9 in /contrib/extraction/lib and also replaced the old version of dataimporthandler-extras with apache-solr-dataimporthandler-extras-3.1.0.jar, but I still have the same problem..

Selecting (and sorting!) by the min/max value from multiple fields

2011-04-20 Thread jmaslac
Hello, the short question is this: is there a way for a search to return a field that is not defined in the schema but is the minimum/maximum value of several (int/float) fields in the SolrDocument? (And what would that search look like?) Longer explanation: I have products and each of them can have a

Re: Selecting (and sorting!) by the min/max value from multiple fields

2011-04-20 Thread Tanguy Moal
Hello, Have you tried reading: http://wiki.apache.org/solr/FunctionQuery#Sort_By_Function From that page I would try something like: http://host:port/solr/select?q=sony&sort=min(min(priceCash,priceCreditCard),priceCoupon)+asc&rows=10&indent=on&debugQuery=on Is that of any help? -- Tanguy On
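Building the same request programmatically keeps the `&` separators and escaping intact. This is a minimal Python sketch; the host and port are placeholders, the field names come from the question, and, as jmaslac points out in this thread, min() is only scheduled for Solr 3.2, so the request will not work on earlier versions.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Placeholder endpoint; host and port are assumptions, not taken from the thread.
BASE_URL = "http://localhost:8983/solr/select"


def build_sort_by_min_url(query: str, fields: list[str], rows: int = 10) -> str:
    """Build a select URL that sorts ascending by the smallest of several fields.

    The fields are nested pairwise, min(min(a,b),c), mirroring the expression
    suggested in the thread.
    """
    expr = fields[0]
    for field in fields[1:]:
        expr = f"min({expr},{field})"
    params = {
        "q": query,
        "sort": f"{expr} asc",
        "rows": rows,
        "indent": "on",
        "debugQuery": "on",
    }
    # urlencode inserts the '&' separators and escapes '(', ',' and ' '.
    return f"{BASE_URL}?{urlencode(params)}"


url = build_sort_by_min_url("sony", ["priceCash", "priceCreditCard", "priceCoupon"])
# Decoding the query string recovers the original sort expression.
print(parse_qs(urlsplit(url).query)["sort"])
# ['min(min(priceCash,priceCreditCard),priceCoupon) asc']
```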

RE: How could each core share configuration files

2011-04-20 Thread Ephraim Ofir
I just use soft-links... Ephraim Ofir -Original Message- From: lboutros [mailto:boutr...@gmail.com] Sent: Wednesday, April 20, 2011 10:09 AM To: solr-user@lucene.apache.org Subject: Re: How could each core share configuration files Perhaps this could help :
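The soft-link approach can be scripted so every core's conf directory points at one shared copy of the common files. A minimal Python sketch; the `shared-conf` directory name, the core names and the file list are hypothetical, and running `ln -s` by hand achieves the same result.

```python
import os
from pathlib import Path


def link_shared_conf(solr_home: Path, cores: list[str], shared_files: list[str]) -> None:
    """Replace per-core copies of common config files with symlinks.

    Assumed (hypothetical) layout:
      solr_home/shared-conf/<file>   the single shared copy
      solr_home/<core>/conf/<file>   symlink pointing at the shared copy
    Files not listed in shared_files stay core-specific.
    """
    shared_dir = solr_home / "shared-conf"
    for core in cores:
        conf_dir = solr_home / core / "conf"
        conf_dir.mkdir(parents=True, exist_ok=True)
        for name in shared_files:
            link = conf_dir / name
            # is_symlink() also catches broken links, which exists() reports as False.
            if link.is_symlink() or link.exists():
                link.unlink()  # drop the duplicated copy or a stale link
            # A relative target keeps the links valid if solr_home is moved.
            os.symlink(os.path.relpath(shared_dir / name, conf_dir), link)
```

Core-specific files (for example a schema that differs per core) are simply left out of `shared_files` and stay as regular files.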

Re: Selecting (and sorting!) by the min/max value from multiple fields

2011-04-20 Thread jmaslac
Tanguy, thanks for the answer. Yes, I have already tried that, but the problem is that the min() function is not yet available (it is scheduled for Solr 3.2). :( Btw, in my original post I asked whether the query could return, in the results, a new field with this computed minimal value - that is redundant,
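Without min(), the usual workaround is the one Michael Owen describes under "Custom Sorting": compute the value at index time. A minimal Python sketch, assuming documents are prepared as dicts before being sent to Solr and that a field named `minPrice` (hypothetical) is declared in the schema as a sortable numeric type. It can then be used directly in `sort=minPrice asc` and returned through `fl`.

```python
PRICE_FIELDS = ("priceCash", "priceCreditCard", "priceCoupon")


def add_min_price(doc: dict) -> dict:
    """Return a copy of doc with a precomputed minPrice field.

    Only price fields actually present on the document are considered;
    if none are present, minPrice is left out.
    """
    prices = [doc[f] for f in PRICE_FIELDS if doc.get(f) is not None]
    enriched = dict(doc)
    if prices:
        enriched["minPrice"] = min(prices)
    return enriched


doc = add_min_price({"id": "p1", "priceCash": 99.0, "priceCoupon": 89.5})
print(doc["minPrice"])  # 89.5
```

The trade-off is the redundancy mentioned above: the field must be recomputed whenever any of the source prices changes.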

Re: KStemmer for Solr 3.x +

2011-04-20 Thread Ofer Fort
Seems like it isn't. In my installation (1.4.1) I used LucidKStemFilterFactory, and when switching the solr.war file to the 3.1 one I get: 14:42:31.664 ERROR [pool-1-thread-1]: java.lang.AbstractMethodError: org.apache.lucene.analysis.TokenStream.incrementToken()Z at

Re: old searchers not closing after optimize or replication

2011-04-20 Thread Erick Erickson
Does this persist? In other words, if you just watch it for some time, does the disk usage go back to normal? Because it's typical that your index size will temporarily spike after the operations you describe as new searchers are warmed up. During that interval, both the old and new searchers are

Re: old searchers not closing after optimize or replication

2011-04-20 Thread Bernd Fehling
Hi Erik, On 20.04.2011 13:56, Erick Erickson wrote: Does this persist? In other words, if you just watch it for some time, does the disk usage go back to normal? Only after restarting the whole Solr does the disk usage go back to normal. Because it's typical that your index size will

Re: old searchers not closing after optimize or replication

2011-04-20 Thread Erick Erickson
Hmmm, this isn't right. You've pretty much eliminated the obvious things. What does lsof show? I'm assuming it shows the files are being held open by your Solr instance, but it's worth checking. I'm not getting the same behavior, admittedly on a Windows box. The only other thing I can think of
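The lsof check can be scripted. On Linux, a file that has been deleted while a process still holds it open is listed by lsof with "(deleted)" after its name, so summing those entries for the Solr JVM shows how much disk the old searchers are pinning. A minimal Python sketch, assuming the default `lsof -p <pid>` column layout (some lsof versions add a TID column, which would shift the fields); the sample output is made up for illustration.

```python
def deleted_open_files(lsof_output: str) -> dict[str, int]:
    """Map each deleted-but-still-open file to its size in bytes.

    Expects the default column layout:
    COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
    """
    result: dict[str, int] = {}
    for line in lsof_output.splitlines()[1:]:  # skip the header row
        parts = line.split(None, 8)  # NAME (last column) may contain spaces
        if len(parts) < 9 or not parts[8].endswith("(deleted)"):
            continue
        name = parts[8][: -len("(deleted)")].rstrip()
        size = parts[6]
        if size.isdigit():
            # The same file can appear under several descriptors; count it once.
            result[name] = int(size)
    return result


# Made-up sample in the shape of `lsof -p <solr pid>` output.
SAMPLE = """COMMAND   PID USER   FD   TYPE DEVICE   SIZE/OFF    NODE NAME
java    12345 solr  mem    REG    8,1  104857600  393219 /var/solr/data/index/_0.cfs (deleted)
java    12345 solr  118r   REG    8,1  104857600  393219 /var/solr/data/index/_0.cfs (deleted)
java    12345 solr  119r   REG    8,1       2048  393300 /var/solr/data/index/segments_5
"""
stale = deleted_open_files(SAMPLE)
print(sum(stale.values()))  # 104857600
```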

Solr - Multi Term highlighting issue

2011-04-20 Thread Ramanathapuram, Rajesh
Hello, I am dealing with a highlighting issue in Solr; I will try to explain it. When I search for a single term in Solr, it wraps an em tag around the words I want to highlight, and all works well. But if I search for multiple terms, for the most part highlighting works well, and then for some of the

Re: old searchers not closing after optimize or replication

2011-04-20 Thread Bernd Fehling
Hi Erik, On 20.04.2011 15:42, Erick Erickson wrote: Hmmm, this isn't right. You've pretty much eliminated the obvious things. What does lsof show? I'm assuming it shows the files are being held open by your Solr instance, but it's worth checking. Just committed new content 3 times and

Re: TikaEntityProcessor

2011-04-20 Thread Andreas Kemkes
I went unsuccessfully down this path - too many incompatibilities among versions - some code changes and recompiling required. See also thread Solr 1.4.1 and Tika 0.9 - some tests not passing for remaining issues. You'll have better luck with the newer Solr 3.1 release, which already uses

Re: TikaEntityProcessor

2011-04-20 Thread firdous_kind86
After reading this post I hoped I could get it working, but I haven't had any success in almost a week: http://lucene.472066.n3.nabble.com/TikaEntityProcessor-not-working-td856965.html#a867572

Multiple Tags and Facets

2011-04-20 Thread Em
Hello, I watched an online video with Chris Hostetter from Lucid Imagination. He showed the possibility of having some facets that exclude *all* filters while also having some facets that take some of the set filters into account while ignoring others. Unfortunately the webinar did not explain
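The behavior described matches Solr's filter tagging and exclusion (available since 1.4): each `fq` gets a `{!tag=...}` local param, and each `facet.field` lists the tags it should ignore with `{!ex=...}`, comma-separated for several. A minimal Python sketch of such a request; the field names, values and tag names are hypothetical.

```python
from urllib.parse import parse_qs, urlencode

# Hypothetical fields: "brand" and "color" carry filters, "category" is facet-only.
params = [
    ("q", "*:*"),
    ("facet", "true"),
    # Tag each filter query so that individual facets can exclude it.
    ("fq", "{!tag=brandF}brand:acme"),
    ("fq", "{!tag=colorF}color:red"),
    # Excludes both tagged filters: counts over everything matching q.
    ("facet.field", "{!ex=brandF,colorF}category"),
    # Excludes only its own filter (multi-select style); color:red still applies.
    ("facet.field", "{!ex=brandF}brand"),
    # Excludes only the color filter; brand:acme still applies.
    ("facet.field", "{!ex=colorF}color"),
]
query_string = urlencode(params)  # a list of pairs keeps repeated keys
print(parse_qs(query_string)["facet.field"])
```

The document result list itself is still restricted by both filters; the exclusions only affect how each facet's counts are computed.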

Re: old searchers not closing after optimize or replication

2011-04-20 Thread Erick Erickson
It looks OK, but it still doesn't explain keeping the old files around. What does the deletionPolicy in your solrconfig.xml look like? It's possible that you're seeing Solr attempt to keep around several optimized copies of the index, but that still doesn't explain why restarting Solr removes them

Re: Solr - Multi Term highlighting issue

2011-04-20 Thread Erick Erickson
Does your configuration have hl.mergeContiguous set to true by any chance? And what happens if you explicitly set this to false on your query? Best Erick On Wed, Apr 20, 2011 at 9:43 AM, Ramanathapuram, Rajesh rajesh.ramanathapu...@turner.com wrote: Hello, I am dealing with a highlighting
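Request parameters override the defaults configured in solrconfig.xml, so the suggestion can be tested per query. A minimal Python sketch of such a query string; the field name `body` is hypothetical.

```python
from urllib.parse import urlencode


def highlight_params(query: str, field: str) -> str:
    """Query string that highlights `field` without merging contiguous fragments."""
    return urlencode({
        "q": query,
        "hl": "true",
        "hl.fl": field,                 # field to highlight
        "hl.mergeContiguous": "false",  # explicit per-request override
    })


print(highlight_params("solr highlighting", "body"))
# q=solr+highlighting&hl=true&hl.fl=body&hl.mergeContiguous=false
```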

HTMLStripCharFilterFactory, highlighting and InvalidTokenOffsetsException

2011-04-20 Thread Robert Gründler
Hi all, I'm getting the following exception when using highlighting for a field containing HTMLStripCharFilterFactory: org.apache.lucene.search.highlight.InvalidTokenOffsetsException: Token ... exceeds length of provided text sized 21 It seems this is a known issue:

Re: Creating a TrieDateField (and other Trie fields) from Lucene Java

2011-04-20 Thread Yonik Seeley
On Tue, Apr 19, 2011 at 11:17 PM, Craig Stires craig.sti...@gmail.com wrote: The barrier I have is that I need to build this offline (without using a solr server, solrconfig.xml, or schema.xml) This is pretty unusual... can you share your use case? Solr can also be run in embedded mode if you

Re: HTMLStripCharFilterFactory, highlighting and InvalidTokenOffsetsException

2011-04-20 Thread Robert Muir
Hi, there is a proposed patch uploaded to the issue. Maybe you can help by reviewing/testing it? 2011/4/20 Robert Gründler rob...@dubture.com: Hi all, i'm getting the following exception when using highlighting for a field containing HTMLStripCharFilterFactory:

stemming filter analyzers, any favorites?

2011-04-20 Thread Robert Petersen
Stemming filter analyzers... anyone have any favorites for particular search domains? Just wondering what people are using. I'm using Lucid K Stemmer and having issues. Seems like it misses a lot of common stems. We went to that because of excessively loose matches on the

Re: stemming filter analyzers, any favorites?

2011-04-20 Thread Erick Erickson
You can get a better sense of exactly what transformations occur if you look at the analysis page (be sure to check the verbose checkbox). I'm surprised that bags doesn't match bag; what does the analysis page say? Best Erick On Wed, Apr 20, 2011 at 1:44 PM, Robert Petersen rober...@buy.com

Bug in solr.KeywordMarkerFilterFactory?

2011-04-20 Thread Demian Katz
I've just started experimenting with the solr.KeywordMarkerFilterFactory in Solr 3.1, and I'm seeing some strange behavior. It seems that every word subsequent to a protected word is also treated as being protected. For testing purposes, I have put the word spelling in my protwords.txt. If I

Re: Bug in solr.KeywordMarkerFilterFactory?

2011-04-20 Thread Yonik Seeley
On Wed, Apr 20, 2011 at 2:01 PM, Demian Katz demian.k...@villanova.edu wrote: I've just started experimenting with the solr.KeywordMarkerFilterFactory in Solr 3.1, and I'm seeing some strange behavior.  It seems that every word subsequent to a protected word is also treated as being

RE: Solr - Multi Term highlighting issue

2011-04-20 Thread Ramanathapuram, Rajesh
Thanks Erick. I tried your suggestion, the issue still exists.

Re: Bug in solr.KeywordMarkerFilterFactory?

2011-04-20 Thread Robert Muir
No, this is only a bug in analysis.jsp. You can see this by comparing analysis.jsp's result for dontstems bees with what the query debug interface shows: <lst name="debug"> <str name="rawquerystring">dontstems bees</str> <str name="querystring">dontstems bees</str> <str name="parsedquery">PhraseQuery(text:"dontstems bee")</str>
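The parsed query above reflects the intended behavior: protection applies to each token individually and never carries over to the following ones. A toy Python model of a keyword marker followed by a stemmer illustrates this; the "stemmer" here just strips a trailing "s" and is a stand-in, not Porter, KStem or Lucene's actual code.

```python
def analyze(text: str, protected: set[str]) -> list[str]:
    """Toy model of a keyword-marker filter followed by a stemmer.

    Each token is checked on its own: protected tokens pass through
    unchanged, every other token goes through the (crude) stemmer.
    """
    out = []
    for token in text.lower().split():
        if token in protected:
            out.append(token)        # marked as keyword: not stemmed
        elif token.endswith("s") and len(token) > 3:
            out.append(token[:-1])   # crude plural stripping
        else:
            out.append(token)
    return out


print(analyze("dontstems bees", {"dontstems"}))  # ['dontstems', 'bee']
```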

RE: Bug in solr.KeywordMarkerFilterFactory?

2011-04-20 Thread Demian Katz
That's good news -- thanks for the help (not to mention the reassurance that Solr itself is actually working right)! Hopefully 3.1.1 won't be too far off, though; when the analysis tool lies, life can get very confusing! :-) - Demian -Original Message- From: Robert Muir

Re: ConcurrentLRUCache$Stats error

2011-04-20 Thread Chris Hostetter
: https://issues.apache.org/jira/browse/SOLR-1797 that issue doesn't seem to have anything to do with the stack trace reported... : SEVERE: java.util.concurrent.ExecutionException: : java.lang.NoSuchMethodError: : org.apache.solr.common.util.ConcurrentLRUCache$Stats.add(Lorg/apache/solr/c :

RE: stemming filter analyzers, any favorites?

2011-04-20 Thread Robert Petersen
I have been doing that, and for the bags example the trailing 's' is not being removed by the KStemmer, so if I index the word bags and search on bag I get no matches. Why wouldn't the trailing 's' get stemmed off? KStemmer is dictionary based, so is bags not in the dictionary? That trailing

entity name issue

2011-04-20 Thread tjtong
Hi guys, I have encountered a problem with an entity name; see the data config code below. The variable '${ea.a_aid}' was always empty. I suspect it is a namespace issue. Does anyone know how to work around it? This is on an Oracle database. I had to use the prefix myschema., otherwise the table name was not

Highest frequency terms for a subset of documents

2011-04-20 Thread Ofer Fort
Hi, I am looking for the best way to find the terms with the highest frequency for a given subset of documents (terms in the text field). My first thought was to do a count facet search, where the query defines the subset of documents and the facet.field is the text field; this gives me the

RE: Highest frequency terms for a subset of documents

2011-04-20 Thread Jonathan Rochkind
I think faceting is probably the best way to do that, indeed. It might be slow, but it's kind of set up for exactly that case, I can't imagine any other technique being faster -- there's stuff that has to be done to look up the info you want. BUT, I see your problem: don't use
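In request terms, this is a query defining the subset plus `rows=0`, `facet=true`, `facet.field=text` and `facet.limit=N`. The toy Python model below shows what those counts mean; it is only a conceptual sketch, since Solr computes them from its own index structures (the UnInvertedField mentioned in the follow-ups) rather than by scanning tokens like this.

```python
from collections import Counter


def top_terms(docs: dict[str, list[str]], subset: set[str], limit: int) -> list[tuple[str, int]]:
    """Facet-style term counts for a tokenized text field, limited to a subset.

    docs maps a document id to its indexed tokens; subset holds the ids the
    query matched. As with a facet count, a term is counted once per matching
    document that contains it, not once per occurrence.
    """
    counts = Counter()
    for doc_id in subset:
        counts.update(set(docs.get(doc_id, [])))
    return counts.most_common(limit)


docs = {
    "1": ["solr", "facet", "slow"],
    "2": ["solr", "facet", "facet"],
    "3": ["solr", "cache"],
    "4": ["lucene", "index"],
}
print(top_terms(docs, {"1", "2", "3"}, 2))  # [('solr', 3), ('facet', 2)]
```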

Re: How to index MS SQL Server column with image type

2011-04-20 Thread Chris Hostetter
: Subject: How to index MS SQL Server column with image type : : Hi all, : : When I index a column(image type) of a table via * : http://localhost:8080/solr/dataimport?command=full-import* : *There is a error like this: String length must be a multiple of four.* For future reference: full

Re: Highest frequency terms for a subset of documents

2011-04-20 Thread Ofer Fort
thanks, but that's what i started with, but it took an even longer time and threw this: Approaching too many values for UnInvertedField faceting on field 'text' : bucket size=15560140 Approaching too many values for UnInvertedField faceting on field 'text : bucket size=15619075 Exception during

Re: Highest frequency terms for a subset of documents

2011-04-20 Thread Ofer Fort
seems like the facet search is not all that suited for a full text field. ( http://search.lucidimagination.com/search/document/178f1a82ff19070c/solr_severe_error_when_doing_a_faceted_search#16562790cda76197 ) Maybe i should go another direction. I think that the HighFreqTerms approach, just not

Re: Highest frequency terms for a subset of documents

2011-04-20 Thread Chris Hostetter
: thanks, but that's what i started with, but it took an even longer time and : threw this: : Approaching too many values for UnInvertedField faceting on field 'text' : : bucket size=15560140 : Approaching too many values for UnInvertedField faceting on field 'text : : bucket size=15619075 :

Re: Highest frequency terms for a subset of documents

2011-04-20 Thread Yonik Seeley
On Wed, Apr 20, 2011 at 7:34 PM, Chris Hostetter hossman_luc...@fucit.org wrote: : thanks, but that's what i started with, but it took an even longer time and : threw this: : Approaching too many values for UnInvertedField faceting on field 'text' : : bucket size=15560140 : Approaching too

Re: Highest frequency terms for a subset of documents

2011-04-20 Thread Ofer Fort
Thanks, but I've disabled the cache already, since my concern is speed and I'm willing to pay the price (memory), and my subsets are not fixed. Does the facet search do any extra work that I don't need and might be able to disable (either by a flag or by a code change)? Somehow I feel, or rather

Re: How to return score without using _val_

2011-04-20 Thread Yonik Seeley
On Tue, Apr 19, 2011 at 11:41 PM, Bill Bell billnb...@gmail.com wrote: I would like to influence the score but I would rather not mess with the q= field since I want the query to dismax for Q. Something like: fq={!type=dismax qf=$qqf v=$qspec} fq={!type=dismax qt=dismaxname v=$qname}

Re: Highest frequency terms for a subset of documents

2011-04-20 Thread Ofer Fort
BTW, I'm using Solr 1.4.1; do 3.1 or 4.0 contain any performance improvements that would make a difference for facet search? Thanks again, Ofer On Thu, Apr 21, 2011 at 2:45 AM, Ofer Fort o...@tra.cx wrote: Thanks but i've disabled the cache already, since my concern is speed and i'm

Re: Highest frequency terms for a subset of documents

2011-04-20 Thread Yonik Seeley
On Wed, Apr 20, 2011 at 7:45 PM, Ofer Fort o...@tra.cx wrote: Thanks but i've disabled the cache already, since my concern is speed and i'm willing to pay the price (memory) Then you should not disable the cache. , and my subset are not fixed. Does the facet search do any extra work that i

Re: Highest frequency terms for a subset of documents

2011-04-20 Thread Ofer Fort
My documents are user entries, so I'm guessing they vary a lot. Tomorrow I'll try 3.1 and also 4.0 and see if they bring an improvement. Thanks guys! On Thu, Apr 21, 2011 at 3:02 AM, Yonik Seeley yo...@lucidimagination.com wrote: On Wed, Apr 20, 2011 at 7:45 PM, Ofer Fort o...@tra.cx wrote:

Solr - upgrade from 1.4.1 to 3.1 - finding AbstractSolrTestCase binaries - help please?

2011-04-20 Thread Bob Sandiford
Hi all. I'm working on upgrading from 1.4.1 to 3.1, and I'm having some trouble with some of the unit test code for our custom Filters. We wrote the tests to extend AbstractSolrTestCase, and I've been reading the thread about the test-harness elements not being present in the 3.1

RE: Creating a TrieDateField (and other Trie fields) from Lucene Java

2011-04-20 Thread Craig Stires
Hi Yonik, The limitations I need to work within have to do with the index already being built as part of an existing process. Currently, the Solr server is in read-only mode and receives new indexes daily from a Java application. The Java app runs Lucene/Tika and is indexing resources within

The issue of import data from database using Solr DIH

2011-04-20 Thread Kevin Xiang
Hi all, I am new to Solr. I am importing data from a database using DIH (Solr 1.4). One document is made up of two entities, and each entity is a table in the database. For example: Table1 has 3 fields; Table2 has 4 fields. If it worked, the document would have 7 fields, but it has only 4; it seems that Solr doesn't

Apache Spam Filter Blocking Messages

2011-04-20 Thread Trey Grainger
Hey (solr-user) mailing list admins, I've tried replying to a thread multiple times tonight, and keep getting a bounce-back with this response: Technical details of permanent failure: Google tried to deliver your message, but it was rejected by the recipient domain. We recommend contacting the

Re: Apache Spam Filter Blocking Messages

2011-04-20 Thread Marvin Humphrey
On Thu, Apr 21, 2011 at 12:30:29AM -0400, Trey Grainger wrote: (FREEMAIL_FROM,FS_REPLICA,HTML_MESSAGE,RCVD_IN_DNSWL_LOW,RFC_ABUSE_POST,SPF_PASS,T_TO_NO_BRKTS_FREEMAIL Note the HTML_MESSAGE in the list of things SpamAssassin didn't like. Apparently I

Need to create dynamic indexes based on different document workspaces

2011-04-20 Thread Gaurav Shingala
Hi, Is there a way to create different solr indexes for different categories? We have different document workspaces and ideally want each workspace to have its own solr index. Thanks, Gaurav