Re: Solr Memory Usage

2010-12-14 Thread Toke Eskildsen
On Tue, 2010-12-14 at 06:07 +0100, Cameron Hurst wrote: [Cameron expected 150MB overhead] As I start to index data and pass queries to the database, I notice a steady rise in RAM usage, but it doesn't stop at 150MB. If I continue to reindex the exact same data set with no additional data

Re: Query performance very slow even after autowarming

2010-12-14 Thread johnnyisrael
Hi Chris, Thanks for looking into it. Here is the sample query: http://localhost:8080/solr/core0/select/?qt=autosuggest&q=a I am using a request handler named autosuggest with the following configuration: <requestHandler name="autosuggest" class="solr.SearchHandler"> <lst
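A minimal sketch of such an autosuggest handler in solrconfig.xml; the qf field name and the defaults below are assumptions, not taken from the original message:

    <requestHandler name="autosuggest" class="solr.SearchHandler">
      <lst name="defaults">
        <str name="defType">dismax</str>
        <!-- assumed field holding the suggestion terms -->
        <str name="qf">suggest_text</str>
        <int name="rows">10</int>
      </lst>
    </requestHandler>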

Google like search

2010-12-14 Thread satya swaroop
Hi All, Can we get results like Google, with some data about the search? I was able to get the first 300 characters of a file, but that is not helpful for me. Can I get the data containing the first occurrence of the matched key in that file? Regards, Satya

Re: Google like search

2010-12-14 Thread Tanguy Moal
Hi Satya, I think what you're looking for is called highlighting, in the sense of highlighting the query terms in their matching context. You could start by googling "solr highlight"; surely the first results will make sense. Solr's wiki results are usually a good entry point :
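A minimal sketch of a highlighting request, assuming a stored field named content (the field name is an assumption):

    http://localhost:8080/solr/select?q=java&hl=true&hl.fl=content&hl.snippets=3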

Re: Google like search

2010-12-14 Thread satya swaroop
Hi Tanguy, I am not asking for highlighting. I think it can be explained with an example. Here I illustrate it: when I post a query like this: http://localhost:8080/solr/select?q=Java&version=2.2&start=0&rows=10&indent=on I would get the result as follows: <response

Re: SpatialTierQueryParserPlugin Loading Error

2010-12-14 Thread Markus Jelsma
Where did you put the jar? All, Can anyone shed some light on this error? I can't seem to get this class to load. I am using the distribution of Solr from Lucid Imagination and the Spatial Plugin from here: https://issues.apache.org/jira/browse/SOLR-773. I don't know how to apply a patch

Is there a way to view the values of stored=false fields in search results?

2010-12-14 Thread Swapnonil Mukherjee
Hi All, I have set up certain fields to be indexed=true and stored=false. According to the documentation, fields marked as stored=false do not appear in search results, which is perfectly OK. But now I have a situation where I need to debug to see the value of these fields. So is there a way to

Re: Is there a way to view the values of stored=false fields in search results?

2010-12-14 Thread Ahmet Arslan
But now I have a situation where I need to debug to see the value of these fields. So is there a way to see the value of stored=false fields? You cannot see the original values. But you can see what is indexed. http://www.getopt.org/luke/ can display it.

RE: Google like search

2010-12-14 Thread Dave Searle
Highlighting is exactly what you need, although if you highlight the whole book, this could slow down your queries. Index/store the first 5000-1 characters and see how you get on -Original Message- From: satya swaroop [mailto:satya.yada...@gmail.com] Sent: 14 December 2010 10:08
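One way to cap the amount of text kept for highlighting is the maxChars attribute on a copyField; a sketch with assumed field names:

    <!-- copy at most the first 5000 characters of body into a smaller field used for highlighting -->
    <copyField source="body" dest="body_excerpt" maxChars="5000"/>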

Re: Query-Expansion, copyFields, flexibility and size of Index (Solr-3.1-SNAPSHOT)

2010-12-14 Thread mdz-munich
Okay, I'll start guessing: - Do we have to write a customized QueryParserPlugin? - At which point does the RequestHandler/QueryParser/whatever decide which query analyzer to use? 10% for every copied field is a lot for us; we're facing terabytes of digitized book data. So we want to keep the

De-duplication not working as I expected - duplicates still getting into the index

2010-12-14 Thread Jason Brown
I have configured de-duplication according to the Wiki. My signature field is defined thus... <field name="signature" type="string" stored="true" indexed="true" multiValued="false" /> and my updateRequestProcessorChain as follows: <updateRequestProcessorChain name="dedupe"> <processor

Re: De-duplication not working as I expected - duplicates still getting into the index

2010-12-14 Thread Markus Jelsma
Check this setting: <bool name="overwriteDupes">false</bool> On Tuesday 14 December 2010 14:26:21 Jason Brown wrote: I have configured de-duplication according to the Wiki. My signature field is defined thus... <field name="signature" type="string" stored="true" indexed="true"
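For reference, a complete dedupe chain along the lines of the wiki example; the fields listed are placeholders, and overwriteDupes must be true for duplicates to be replaced rather than accumulated:

    <updateRequestProcessorChain name="dedupe">
      <processor class="solr.processor.SignatureUpdateProcessorFactory">
        <bool name="enabled">true</bool>
        <str name="signatureField">signature</str>
        <!-- true: an incoming duplicate overwrites the existing document -->
        <bool name="overwriteDupes">true</bool>
        <str name="fields">name,features,cat</str>
        <str name="signatureClass">solr.processor.Lookup3Signature</str>
      </processor>
      <processor class="solr.LogUpdateProcessorFactory" />
      <processor class="solr.RunUpdateProcessorFactory" />
    </updateRequestProcessorChain>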

Re: SpatialTierQueryParserPlugin Loading Error

2010-12-14 Thread Markus Jelsma
Anyway, try putting the jar in work/Jetty_0_0_0_0_8983_solr.war__solr__k1kf17/webapp/WEB-INF/lib/ On Tuesday 14 December 2010 11:10:47 Markus Jelsma wrote: Where did you put the jar? All, Can anyone shed some light on this error. I can't seem to get this class to load. I am using

Re: RAM usage issues

2010-12-14 Thread Erick Erickson
Several observations: 1) If by RAM buffer size you're referring to the value in solrconfig.xml, ramBufferSizeMB, that is a limit on the size of the internal buffer while indexing. When that limit is reached, the data is flushed to disk. It is irrelevant to searching. 2) When you run searches,
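The setting in question as it appears in solrconfig.xml:

    <!-- flush the in-memory indexing buffer to disk once it reaches 32MB -->
    <ramBufferSizeMB>32</ramBufferSizeMB>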

Re: Solr Tika, Text with style

2010-12-14 Thread Grant Ingersoll
To do that, you need to keep the original content and store it in a field. On Dec 11, 2010, at 10:56 AM, ali fathieh wrote: Hello, I've seen this link: http://www.lucidimagination.com/Community/Hear-from-the-Experts/Articles/Content-Extraction-Tika What I got is pure text without any

Re: Search with facet.pivot

2010-12-14 Thread Grant Ingersoll
The formatting of your message is a bit hard to read. Could you please clarify which commands worked and which ones didn't? Since the pivot stuff is relatively new, there could very well be a bug, so if you can give a simple test case that shows what is going on that would also be helpful,

Re: SpatialTierQueryParserPlugin Loading Error

2010-12-14 Thread Grant Ingersoll
For this functionality, you are probably better off using trunk or branch_3x. There are quite a few patches related to that particular one that you will need to apply in order to have it work correctly. On Dec 13, 2010, at 10:06 PM, Adam Estrada wrote: All, Can anyone shed some light on

RE: OutOfMemory GC: GC overhead limit exceeded - Why isn't WeakHashMap getting collected?

2010-12-14 Thread Jonathan Rochkind
But the entirety of the old indexes (no longer on disk) wasn't cached in memory, right? Or is it? Maybe this is me not understanding Lucene enough. I thought that portions of the index were cached in memory, but that sometimes the index reader still has to go to disk to get things that aren't

Re: Google like search

2010-12-14 Thread Tanguy Moal
Satya, In fact the highlighter will select the relevant part of the whole text and return it with the matched terms highlighted. If you do so for a whole book, you will face the issue spotted by Dave (text that is too long). To address that issue, you have the possibility of splitting your book into

RE: OutOfMemory GC: GC overhead limit exceeded - Why isn't WeakHashMap getting collected?

2010-12-14 Thread Upayavira
A Lucene index is made up of segments. Each commit writes a segment. Sometimes, upon commit, some segments are merged together into one, to reduce the overall segment count, as too many segments hinders performance. Upon optimisation, all segments are (typically) merged into a single segment.

Re: Google like search

2010-12-14 Thread satya swaroop
Hi Tanguy, Thanks for your reply. Sorry to ask this type of question: how can we index each chapter of a file as a separate document? As far as I know, we just give the path of the file to Solr to index it... Can you point me to any sources for this... I mean any blogs or wikis... Regards,

Re: Google like search

2010-12-14 Thread Tanguy Moal
To do so, you have several possibilities; I don't know if there is a best one. It depends pretty much on the format of the input file(s), your affinities with a given programming language, some libraries you might need, and the time you're ready to spend on this task. Consider having a look at

Solr 1.4 replication, cleaning up old indexes

2010-12-14 Thread Tim Heckman
When using the index replication over HTTP that was introduced in Solr 1.4, what is the recommended way to periodically clean up old indexes on the slaves? I found references to the snapcleaner script, but that seems to be for the older ssh/rsync replication model. thanks, Tim

Need some guidance on solr-config settings

2010-12-14 Thread Mark
Can anyone offer some advice on what some good settings would be for an index of around 6 million documents totaling around 20-25GB? It seems like when our index gets to this size our CPU load spikes tremendously. What would be some appropriate settings for ramBufferSize and mergeFactor? We

Re: Solr 1.4 replication, cleaning up old indexes

2010-12-14 Thread Shawn Heisey
On 12/14/2010 8:31 AM, Tim Heckman wrote: When using the index replication over HTTP that was introduced in Solr 1.4, what is the recommended way to periodically clean up old indexes on the slaves? I found references to the snapcleaner script, but that seems to be for the older ssh/rsync

Re: Need some guidance on solr-config settings

2010-12-14 Thread Shawn Heisey
On 12/14/2010 8:31 AM, Mark wrote: Can anyone offer some advice on what some good settings would be for an index or around 6 million documents totaling around 20-25gb? It seems like when our index gets to this size our CPU load spikes tremendously. If you are adding, deleting, or updating

Re: OutOfMemory GC: GC overhead limit exceeded - Why isn't WeakHashMap getting collected?

2010-12-14 Thread Jonathan Rochkind
Yeah, I understand basically how caches work. What I don't understand is what happens in replication if the new segment files are successfully copied, but the actual commit fails due to maxWarmingSearchers. The new files are on disk... but the commit could not succeed and there is NOT a

Re: Solr 1.4 replication, cleaning up old indexes

2010-12-14 Thread Tim Heckman
On Tue, Dec 14, 2010 at 10:37 AM, Shawn Heisey s...@elyograg.org wrote: It's supposed to take care of removing the old indexes on its own - when everything is working, it builds an index.timestamp directory, replicates, swaps that directory in to replace index, and deletes the directory with

Re: Solr 1.4 replication, cleaning up old indexes

2010-12-14 Thread Shawn Heisey
On 12/14/2010 9:13 AM, Tim Heckman wrote: Once per day in the morning, I run a full index + optimize into an on deck core. When this is complete, I swap the on deck with the live core. A side-effect of this is that the version number / generation of the live index just went backwards, since the

Re: Very high load

2010-12-14 Thread Shawn Heisey
On 12/13/2010 9:15 PM, Mark wrote: No cache warming queries, and our machines have 8g of memory in them with about 5120m of RAM dedicated to Solr. When our index is around 10-11g in size everything runs smoothly. At around 20g+ it just falls apart. I just replied to your new email thread,

Re: OutOfMemory GC: GC overhead limit exceeded - Why isn't WeakHashMap getting collected?

2010-12-14 Thread Shawn Heisey
On 12/14/2010 9:02 AM, Jonathan Rochkind wrote: 1. Will the existing index searcher have problems because the files have been changed out from under it? 2. Will a future replication -- at which NO new files are available on master -- still trigger a future commit on slave? I'm not really

Re: RAM usage issues

2010-12-14 Thread Shawn Heisey
On 12/13/2010 9:46 PM, Cameron Hurst wrote: When I start the server I am using about 90MB of RAM, which is fine, and from the Google searches I found that is normal. The issue comes when I start indexing data. In my solrconfig.xml file, my maximum RAM buffer is 32MB. In my mind that means that

changing data type

2010-12-14 Thread Wodek Siebor
Using DataImportHandler. In the select statement I use the Oracle decode() function. As a result I have to change the indexed field from int to string. However, during the load Solr throws an exception. Any experience with that? Thanks

Re: OutOfMemory GC: GC overhead limit exceeded - Why isn't WeakHashMap getting collected?

2010-12-14 Thread Jonathan Rochkind
Thanks Shawn, that helps explain things. So the issue there, with using maxWarmingSearchers to try to prevent out-of-control RAM/CPU usage from overlapping on-deck searchers, combined with replication... is if you're still pulling down replications very frequently but using maxWarmingSearchers to prevent

facet.pivot for date fields

2010-12-14 Thread Adeel Qureshi
It doesn't seem like pivot faceting works on dates. I was just curious if that's how it's supposed to be or if I am doing something wrong. If I include a date field in the pivot list, I simply don't get any facet results back for that date field. Thanks Adeel

Re: changing data type

2010-12-14 Thread Erick Erickson
You haven't given us much to go on. Please post: 1) your DIH statement, 2) your schema file, particularly the fieldType and field in question, 3) the exception trace, 4) anything else that comes to mind. Remember, we know nothing about your particular setup... Best, Erick On Tue, Dec 14, 2010 at 3:17

Re: changing data type

2010-12-14 Thread Wodek Siebor
The DIH statement works fine if I run it directly in SQL Developer. It's something like: decode(oracle_field, 0, 'string_1', 1, 'string_2') The oracle_field is of type int, and in schema.xml, since the decode output is a string, the corresponding indexed field is of type string. Is there a
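A sketch of the matching schema.xml declaration, with a hypothetical field name; the point is that once decode() returns text, the target field must use a string type, and documents indexed under the old int type need to be reindexed:

    <field name="status_label" type="string" indexed="true" stored="true"/>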

[DIH] Example for SQL Server

2010-12-14 Thread Adam Estrada
Does anyone have an example config.xml file I can take a look at for SQL Server? I need to index a lot of data from a DB and can't seem to figure out the right syntax, so any help would be greatly appreciated. What is the correct jar file to use and where do I put it in order for it to work?

Re: [DIH] Example for SQL Server

2010-12-14 Thread Erick Erickson
The config isn't really any different for various sql instances, about the only difference is the driver. Have you seen the example in the distribution somewhere like solr_home/example/example-DIH/solr/db/conf/db-data-config.xml? Also, there's a magic URL for debugging DIH at:
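A minimal sketch of a db-data-config.xml for SQL Server, assuming Microsoft's JDBC driver; connection details and the table are placeholders, and the driver jar goes on Solr's classpath (e.g. the core's lib directory):

    <dataConfig>
      <dataSource driver="com.microsoft.sqlserver.jdbc.SQLServerDriver"
                  url="jdbc:sqlserver://localhost:1433;databaseName=mydb"
                  user="solr" password="secret"/>
      <document>
        <entity name="item" query="select id, name, description from item"/>
      </document>
    </dataConfig>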

Changing the default Fuzzy minSimilarity?

2010-12-14 Thread Jan Høydahl / Cominvent
Hi, A fuzzy query foo~ defaults to a similarity of 0.5, i.e. it is equal to foo~0.5. I want to set the default to 0.8 so that if a user enters the query foo~ it equals foo~0.8. I have not seen a way to do this in Solr. A param fuzzy.minSim=0.8 would do the trick. Anything like this, or shall I open
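Until such a parameter exists, the minimum similarity can be spelled out per query, for example:

    http://localhost:8983/solr/select?q=foo~0.8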

Re: Changing the default Fuzzy minSimilarity?

2010-12-14 Thread Robert Muir
On Tue, Dec 14, 2010 at 5:51 PM, Jan Høydahl / Cominvent jan@cominvent.com wrote: Hi, A fuzzy query foo~ defaults to a similarity of 0.5, i.e. equal to foo~0.5 just as an FYI, this isn't true in trunk (4.0) any more. the defaults are changed so that it never enumerates the entire

Re: my index has 500 million docs ,how to improve so lr search performance?

2010-12-14 Thread Alexey Serba
How much memory do you allocate for the JVMs? Considering you have 10 JVMs per server (10*N), you might not have enough memory for the OS file system cache (you need to keep some memory free for that). "all indexs size is about 100G": is this per server or the whole size? On Mon, Nov 15, 2010 at 8:35 AM,

Re: Need some guidance on solr-config settings

2010-12-14 Thread Mark
Excellent reply. You mentioned: I've been experimenting with FastLRUCache versus LRUCache, because I read that below a certain hit ratio, the latter is better. Do you happen to remember what that threshold is? Thanks On 12/14/10 7:59 AM, Shawn Heisey wrote: On 12/14/2010 8:31 AM, Mark

Re: Need some guidance on solr-config settings

2010-12-14 Thread Shawn Heisey
On 12/14/2010 5:05 PM, Mark wrote: Excellent reply. You mentioned: I've been experimenting with FastLRUCache versus LRUCache, because I read that below a certain hit ratio, the latter is better. Do you happen to remember what that threshold is? Thanks Looks like it's 75%, and that it's

Re: Newbie: Indexing unrelated MySQL tables

2010-12-14 Thread Alexey Serba
I figured I would create three entities and relevant schema.xml entries in this way: dataimport.xml: <entity name="Users" query="select id,firstname,lastname from user"></entity> <entity name="Artwork" query="select id,user,name,description from artwork"></entity> <entity name="Jobs" query="select

limit the search results to one category

2010-12-14 Thread sara motahari
Hi all, I am using a dismax request handler with various fields that it searches, but I also want to enable users to select a category from a drop-down list and only get the results that belong to that category. It seems I can't use a nested query with dismax as the first one and standard as

Re: limit the search results to one category

2010-12-14 Thread Ahmet Arslan
I am using a dismax request handler with various fields that it searches, but I also want to enable users to select a category from a drop-down list and only get the results that belong to that category. It seems I can't use a nested query with dismax as the first one and standard as
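The usual approach is a filter query (fq), which combines with dismax without any nesting; a sketch with an assumed category field:

    http://localhost:8080/solr/select?q=user+input&defType=dismax&fq=category:books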

Re: limit the search results to one category

2010-12-14 Thread sara motahari
I guess so. I didn't know I could use it with dismax; I'll try. Thanks, Ahmet. From: Ahmet Arslan iori...@yahoo.com To: solr-user@lucene.apache.org Sent: Tue, December 14, 2010 5:42:51 PM Subject: Re: limit the search results to one category I am using a dismax

Re: Search with facet.pivot

2010-12-14 Thread Anders Dam
I forgot to mention that the query is handled by the Dismax Request Handler. Grant, from the <lst name="params"> tag and down you see all the query parameters used. The only thing varying from query to query is the actual query (q). When searching, for example, on '1000' (q=1000), the facet.pivot fields are

Re: Google like search

2010-12-14 Thread Bhavnik Gajjar
Hi Satya, Coming to your original question, there is one possibility to make Solr emit snippets like Google. Solr query syntax goes like: http://localhost:8080/solr/DefaultInstance/select/?q=java&version=2.2&start=0&rows=10&indent=on&hl=true&hl.snippets=5&hl.fl=Field_Text&fl=Field_Text Note that, the

Re: facet.pivot for date fields

2010-12-14 Thread pankaj bhatt
Hi Adeel, You can make use of the facet.query attribute to make faceting work across a range of dates. Here I am using duration; just replace the field with a date field and the range values with dates in Solr format. So your query parameter will be like this (you can pass multiple
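A sketch of such a request with a hypothetical date field named created (spaces inside the range queries would be URL-encoded in practice):

    facet=true
    &facet.query=created:[NOW-7DAYS TO NOW]
    &facet.query=created:[NOW-1MONTH TO NOW-7DAYS]
    &facet.query=created:[* TO NOW-1MONTH]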

RE: Userdefined Field type - Faceting

2010-12-14 Thread Viswa S
This worked, thanks Yonik. -Viswa Date: Mon, 13 Dec 2010 22:54:35 -0500 Subject: Re: Userdefined Field type - Faceting From: yo...@lucidimagination.com To: solr-user@lucene.apache.org Perhaps try overriding indexedToReadable() also? -Yonik http://www.lucidimagination.com On Mon,