Re: solr-duplicate post management

2009-01-22 Thread S.Selvam Siva
On Thu, Jan 22, 2009 at 7:12 AM, Chris Hostetter hossman_luc...@fucit.org wrote: : what I need is to log the existing urlid and new urlid (of course both will : not be same), when a .xml file of same id (unique field) is posted. : : I want to make this by modifying the solr source. Which file

Intermittent high response times

2009-01-22 Thread hbi dev
Hi all, I have an implementation of Solr (rev. 708837) running on Tomcat 6. Approx 600,000 docs, 2 fairly content heavy text fields, between 4 and 7 facets (depending on what our front end is requesting, and mostly low unique values). 1GB of memory allocated; generally I do not see it using all of

Re: Intermittent high response times

2009-01-22 Thread Otis Gospodnetic
Hi, Is there anything special about those queries? e.g. lots of terms, frequent terms, something else? Is there anything else happening on that server when you see such long queries? Do you see lots of IO or lots of CPU being used during those times? Otis -- Sematext --

Re: Intermittent high response times

2009-01-22 Thread hbi dev
Hi, The criteria rarely vary from others that are much quicker, maybe only in what the start row is. Most of the time the main terms are a single word or just a blank query (q.alt=*:*). My request handler does have a lot of predefined filters; this is included below. Most of this is auto-warmed.

Re: Solr Replication: disk space consumed on slave much higher than on master

2009-01-22 Thread Jaco
Hm, I don't know what to do anymore. I tried this: - Run Tomcat service as local administrator to overcome any permission issues - Installed latest nightly build (I noticed that the item I mentioned before (http://markmail.org/message/yq2ram4f3jblermd) had been committed, which is good - Build a

Re: Solr Replication: disk space consumed on slave much higher than on master

2009-01-22 Thread Shalin Shekhar Mangar
On Thu, Jan 22, 2009 at 10:18 PM, Jaco jdevr...@gmail.com wrote: Hm, I don't know what to do anymore. I tried this: - Run Tomcat service as local administrator to overcome any permissioning issues - Installed latest nightly build (I noticed that item I mentioned before (

Re: Solr Replication: disk space consumed on slave much higher than on master

2009-01-22 Thread Jeff Newburn
We are seeing something very similar. Ours is intermittent and usually happens a great deal on random days. Often it seems to occur during large index updates on the master. On 1/22/09 8:58 AM, Shalin Shekhar Mangar shalinman...@gmail.com wrote: On Thu, Jan 22, 2009 at 10:18 PM, Jaco

Re: Solr Replication: disk space consumed on slave much higher than on master

2009-01-22 Thread Shalin Shekhar Mangar
On Thu, Jan 22, 2009 at 10:37 PM, Jeff Newburn jnewb...@zappos.com wrote: We are seeing something very similar. Ours is intermittent and usually happens a great deal on random days. Often it seems to occur during large index updates on the master. Jeff, is this also on a Windows box? --

Re: Solr Replication: disk space consumed on slave much higher than on master

2009-01-22 Thread Jeff Newburn
My apologies. No, we are using a Linux/Tomcat setup. On 1/22/09 9:15 AM, Shalin Shekhar Mangar shalinman...@gmail.com wrote: On Thu, Jan 22, 2009 at 10:37 PM, Jeff Newburn jnewb...@zappos.com wrote: We are seeing something very similar. Ours is intermittent and usually happens a great deal

Re: Intermittent high response times

2009-01-22 Thread wojtekpia
I'm experiencing similar issues. Mine seem to be related to old generation garbage collection. Can you monitor your garbage collection activity? (I'm using JConsole to monitor it: http://java.sun.com/developer/technicalArticles/J2SE/jconsole.html). In my system, garbage collection usually

Re: Solr Replication: disk space consumed on slave much higher than on master

2009-01-22 Thread Noble Paul നോബിള്‍ नोब्ळ्
Jeff, do you see both the empty index. dirs as well as the extra files in the index? --Noble On Thu, Jan 22, 2009 at 10:37 PM, Jeff Newburn jnewb...@zappos.com wrote: We are seeing something very similar. Ours is intermittent and usually happens a great deal on random days. Often it seems

Re: Solr Replication: disk space consumed on slave much higher than on master

2009-01-22 Thread Jeff Newburn
We have both. A majority of them are just empty but others have almost a full index worth of files. I have also noticed that during a lengthy index update the system will throw errors about how it cannot move one of the index files. Essentially on reindex the system does not replicate until an

Re: Solr Replication: disk space consumed on slave much higher than on master

2009-01-22 Thread Noble Paul നോബിള്‍ नोब्ळ्
This was reported by another user and was fixed recently. Are you using a recent version? --Noble On Fri, Jan 23, 2009 at 12:00 AM, Jeff Newburn jnewb...@zappos.com wrote: We have both. A majority of them are just empty but others have almost a full index worth of files. I have also noticed

Re: Solr Replication: disk space consumed on slave much higher than on master

2009-01-22 Thread Jeff Newburn
Our version is from a few weeks ago. Does this contribute to the directory issues and extra files that are left? On 1/22/09 10:33 AM, Noble Paul നോബിള്‍ नोब्ळ् noble.p...@gmail.com wrote: This was reported by another user and was fixed recently. Are you using a recent version? --Noble On Fri,

Re: Solr Replication: disk space consumed on slave much higher than on master

2009-01-22 Thread Noble Paul നോബിള്‍ नोब्ळ्
I am not sure if it was completely fixed. (This was related to a Lucene bug.) But you can try w/ a recent build and confirm it for us. I have never encountered these during our tests on Windows XP/Linux. I have attached a patch which logs the names of the files which could not get deleted (which

Re: Random queries extremely slow

2009-01-22 Thread oleg_gnatovskiy
My apologies, this is likely the same issue as "Intermittent high response times" by hbi dev. oleg_gnatovskiy wrote: Hello. Our production servers are operating relatively smoothly most of the time running Solr with 19 million listings. However every once in a while the same query that

Re: Newbie Design Questions

2009-01-22 Thread Noble Paul നോബിള്‍ नोब्ळ्
You are out of luck if you are not using a recent version of DIH. The sub entity will work only if you use the FieldReaderDataSource. Then you do not need a ClobTransformer either. The trunk version of DIH can be used w/ the Solr 1.3 release. On Thu, Jan 22, 2009 at 12:59 PM, Gunaranjan Chandraraju

Re: Incorrect Scoring

2009-01-22 Thread Yonik Seeley
DisjunctionMax takes the max score of a disjunction... and max across all fields was slightly higher for the first match. Try setting tie higher (add tie=0.2 to your query or to the defaults in your request handler). http://wiki.apache.org/solr/DisMaxRequestHandler -Yonik On Wed, Jan 21,
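To see why a higher tie can change the ordering, here is a minimal sketch of how a DisjunctionMax-style score is combined, as I understand it: the best per-field score plus tie times the sum of the remaining field scores. The function name and the per-field scores are made up for illustration; real Lucene scores depend on many other factors.

```python
def dismax_score(field_scores, tie=0.0):
    """Combine per-field scores: best score plus tie * (sum of the others)."""
    best = max(field_scores)
    return best + tie * (sum(field_scores) - best)


# Hypothetical per-field scores for two documents.
doc_a = [0.50, 0.00]  # strong match in one field only
doc_b = [0.48, 0.40]  # slightly weaker best field, but matches a second field too

for tie in (0.0, 0.2):
    a, b = dismax_score(doc_a, tie), dismax_score(doc_b, tie)
    winner = "doc_a" if a > b else "doc_b"
    print(f"tie={tie}: doc_a={a:.3f} doc_b={b:.3f} winner={winner}")
```

With tie=0 only the best field counts, so doc_a ranks first; with tie=0.2 the second field contributes enough for doc_b to overtake it.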

Re: Random queries extremely slow

2009-01-22 Thread oleg_gnatovskiy
Actually, my issue might merit a separate discussion, as I did tuning by adjusting the heap to different settings to see what changed. It really had no effect, as with JDK 1.6 garbage collection is parallel, which should no longer interfere with requests during garbage collection

Re: Random queries extremely slow

2009-01-22 Thread Yonik Seeley
On Thu, Jan 22, 2009 at 1:46 PM, oleg_gnatovskiy oleg_gnatovs...@citysearch.com wrote: Hello. Our production servers are operating relatively smoothly most of the time running Solr with 19 million listings. However every once in a while the same query that used to take 100 milliseconds takes

Re: Solr Replication: disk space consumed on slave much higher than on master

2009-01-22 Thread Shalin Shekhar Mangar
On Fri, Jan 23, 2009 at 12:15 AM, Noble Paul നോബിള്‍ नोब्ळ् noble.p...@gmail.com wrote: I have attached a patch which logs the names of the files which could not get deleted (which may help us diagnose the problem). If you are comfortable applying a patch you may try it out. I've committed

Re: Random queries extremely slow

2009-01-22 Thread oleg_gnatovskiy
What are some things that could happen to force files out of the cache on a Linux machine? I don't know what kinds of events to look for... yonik wrote: On Thu, Jan 22, 2009 at 1:46 PM, oleg_gnatovskiy oleg_gnatovs...@citysearch.com wrote: Hello. Our production servers are operating

Re: Random queries extremely slow

2009-01-22 Thread Walter Underwood
The OS keeps recently accessed disk pages in memory. If another process does a lot of disk access, like a backup, the OS might replace the Solr index pages with that process's pages. What kind of storage: local disk, SAN, NFS? wunder On 1/22/09 11:22 AM, oleg_gnatovskiy

Re: Random queries extremely slow

2009-01-22 Thread Otis Gospodnetic
Here is one example: pushing a large newly optimized index onto the server. Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message From: oleg_gnatovskiy oleg_gnatovs...@citysearch.com To: solr-user@lucene.apache.org Sent: Thursday, January 22, 2009

Re: Query Performance while updating teh index

2009-01-22 Thread Otis Gospodnetic
Oleg, This is more of an OS-level thing than a Solr thing, it seems from your emails. If you send answers to my questions we'll be able to help more. Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message From: oleg_gnatovskiy

Re: Random queries extremely slow

2009-01-22 Thread oleg_gnatovskiy
Well this probably isn't the cause of our random slow queries, but might be the cause of the slow queries after pulling a new index. Is there anything we could do to reduce the performance hit we take from this happening? Otis Gospodnetic wrote: Here is one example: pushing a large newly

Re: Query Performance while updating teh index

2009-01-22 Thread oleg_gnatovskiy
We do optimize the index before updates but we get these performance issues even when we pull an empty snapshot. Thus even when our update is tiny, the performance issues still happen. Otis Gospodnetic wrote: This is an old and long thread, and I no longer recall what the specific

Re: Query Performance while updating teh index

2009-01-22 Thread Otis Gospodnetic
OK. Then it's likely not this. You saw the other response about looking at GC to see if maybe that hits you once in a while and slows whatever queries are in flight? Try jconsole. Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message From:

Re: Query Performance while updating teh index

2009-01-22 Thread oleg_gnatovskiy
We've tried it. There doesn't seem to be any connection between GC and the bad performance spikes. Otis Gospodnetic wrote: OK. Then it's likely not this. You saw the other response about looking at GC to see if maybe that hits you once in a while and slows whatever queries are in flight?

Re: numFound problem

2009-01-22 Thread Chris Hostetter
: I have a test search which I know should return 34 docs and it does : : however, numFound says 40 : : with debug enabled, I can see the 40 it has found ... : now, I can probably work round it if had returned me the 40 docs but the problem is it returns 34 docs but gives me a

Re: numFound problem

2009-01-22 Thread Ron Chan
Sorry, I miscounted the number of docs returned. I was thrown when it first returned numFound=40 and lost track after trying a few things. The returned docs are correct and match numFound; there is no problem here. Sorry for the confusion - Original Message - From: Chris

Master failover - seeking comments

2009-01-22 Thread edre...@ha
Hi, We're looking forward to using Solr in a project. We're using a typical setup with one Master and a handful of Slaves. We're using the Master for writes and the Slaves for reads. Standard stuff. Our concern is with downtime of the Master server. I read a few posts that touched on this

Re: Newbie Design Questions

2009-01-22 Thread Gunaranjan Chandraraju
Thanks. A last question: do you have an approximate date for the release of 1.4? If it's going to be soon enough (within a month or so) then I can plan our development around it. Thanks Guna On Jan 22, 2009, at 11:04 AM, Noble Paul നോബിള്‍ नोब्ळ् wrote: You are out of luck if you

Re: Embedded Solr updates not showing until restart

2009-01-22 Thread edre...@ha
Grant Ingersoll-6 wrote: Can you share your code? Or reduce it down to a repeatable test? I'll try to do this. For now I'm proceeding with the HTTP route. We're going to want to revisit this and I'll likely do it at that time. Thanks, Erik -- View this message in context:

Re: How to select *actual* match from a multi-valued field

2009-01-22 Thread Chris Hostetter
: At a high level, I'm trying to do some more intelligent searching using : an app that will send multiple queries to Solr. My current issue is : around multi-valued fields and determining which entry actually : generated the hit for a particular query. strictly speaking, this isn't possible

URL-import field type?

2009-01-22 Thread Paul Libbrecht
Hello list, after searching around for quite a while, including in the DataImportHandler documentation on the wiki (which looks amazing), I couldn't find a way to indicate to solr that the tokens of that field should be the result of analyzing the tokens of the stream at URL-xxx. I know

Re: Performance dead-zone due to garbage collection

2009-01-22 Thread wojtekpia
I'm not sure if you suggested it, but I'd like to try the IBM JVM. Aside from setting my JRE paths, is there anything else I need to do to run inside the IBM JVM? (e.g. re-compiling?) Walter Underwood wrote: What JVM and garbage collector setting? We are using the IBM JVM with their concurrent

Re: Performance dead-zone due to garbage collection

2009-01-22 Thread Walter Underwood
No need to recompile. Install it and change your JAVA_HOME and things should work. The options are different than for the Sun JVM. --wunder On 1/22/09 3:46 PM, wojtekpia wojte...@hotmail.com wrote: I'm not sure if you suggested it, but I'd like to try the IBM JVM. Aside from setting my JRE

Re: Newbie Design Questions

2009-01-22 Thread Noble Paul നോബിള്‍ नोब्ळ्
It is planned for another month or so, but it is never certain. On Fri, Jan 23, 2009 at 3:57 AM, Gunaranjan Chandraraju chandrar...@apple.com wrote: Thanks A last question - do you have any approximate date for the release of 1.4. If its going to be soon enough (within a month or

Re: URL-import field type?

2009-01-22 Thread Noble Paul നോബിള്‍ नोब्ळ्
where is this url coming from? what is the content type of the stream? is it plain text or html? if yes, this is a possible enhancement to DIH On Fri, Jan 23, 2009 at 4:39 AM, Paul Libbrecht p...@activemath.org wrote: Hello list, after searching around for quite a while, including in the

Re: Date Format in QueryParsing

2009-01-22 Thread Chris Hostetter
: When I parse DateRange query in a custom RequestHandler I get the date in : format yyyy-MM-dd'T'HH:mm:ss, but I would like it with the trailing 'Z' for : UTC time. Is there a way to set the desired date format? ... : Query q = QueryParsing.parseQuery(query, req.getSchema()); :
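For reference, the target format in the question corresponds to the pattern yyyy-MM-dd'T'HH:mm:ss'Z' in UTC. The sketch below only illustrates producing that string; it is not the QueryParsing API and does not change what Solr returns. The helper name to_solr_date is hypothetical, and treating naive datetimes as already being in UTC is an assumption.

```python
from datetime import datetime, timezone


def to_solr_date(dt):
    """Format a datetime as a UTC timestamp with a trailing 'Z'."""
    if dt.tzinfo is None:
        # Assumption: naive datetimes are already expressed in UTC.
        dt = dt.replace(tzinfo=timezone.utc)
    return dt.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


print(to_solr_date(datetime(2009, 1, 22, 7, 12, 0)))
```

Timezone-aware values are converted to UTC first, so the trailing 'Z' stays truthful.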

Re: DocumentId, InternalDocID and Query from QueryResponse

2009-01-22 Thread Chris Hostetter
: I am new to Solr. I would like to know how to get DocumentId, : InternalDocID and Query from QueryResponse. I'm going to make some assumptions about what it is you are asking for... 1) by DocumentId, i assume you mean the value of the uniqueKey field you define in your schema.xml -- it's

Re: Master failover - seeking comments

2009-01-22 Thread Shalin Shekhar Mangar
On Fri, Jan 23, 2009 at 3:57 AM, edre...@ha edre...@homeaway.com wrote: Essentially, the plan is to add another Master server, so now we have M1 and M2. Both M1 and M2 are also configured to be slaves of each other. The plan is to put a load balancer in between the Slaves and the Master

How to make Relationships work for Multi-valued Index Fields?

2009-01-22 Thread Gunaranjan Chandraraju
Hi, I may be completely off on this, being new to SOLR, but I am not sure how to index related groups of fields in a document and preserve their 'grouping'. I would appreciate any help on this. Detailed description of the problem below. I am trying to index an entity that can have

Re: how can solr search angainst group of field

2009-01-22 Thread surfer10
Definitely, DisMax does the job by searching one term against multiple fields. But what if my index contains two additional multivalued fields, like category id? I need to search against terms in particular fields of documents, and dismax does this well through qf=field1,field2. How can I filter results which
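If the goal is to restrict DisMax results by another field such as a category id, one common approach is a filter query (fq) alongside qf. The sketch below only builds the query string; the field names, the filter value, and the helper name build_dismax_query are hypothetical. Note that, to my understanding, qf takes a whitespace-separated field list rather than a comma-separated one, and selecting DisMax through defType=dismax is an assumption about the handler setup.

```python
from urllib.parse import urlencode


def build_dismax_query(terms, query_fields, filters):
    """Build a Solr query string that searches several fields and filters the results."""
    params = [
        ("defType", "dismax"),           # assumption: DisMax is selected this way
        ("q", terms),
        ("qf", " ".join(query_fields)),  # whitespace-separated field list
    ]
    # Each filter becomes its own fq parameter; filters restrict results without scoring.
    params += [("fq", f) for f in filters]
    return urlencode(params)


# Hypothetical fields: search 'title' and 'description', keep only category 42.
print(build_dismax_query("camera", ["title", "description"], ["category_id:42"]))
```

Repeating fq once per filter keeps each restriction independent of the others.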

Re: How to make Relationships work for Multi-valued Index Fields?

2009-01-22 Thread Shalin Shekhar Mangar
On Fri, Jan 23, 2009 at 1:08 PM, Gunaranjan Chandraraju chandrar...@apple.com wrote: <record> <coreInfo id="123" .../> <address street="XYZ1" state="CA" ... type="home"/> <address street="XYZ2" state="CA" ... type="Office"/> <address street="XYZ3" state="CA" type="Other"/> </record> I have setup my DIH

Re: How to make Relationships work for Multi-valued Index Fields?

2009-01-22 Thread Shalin Shekhar Mangar
Oops, one more gotcha. The dynamic field support is only in 1.4 trunk. On Fri, Jan 23, 2009 at 1:24 PM, Shalin Shekhar Mangar shalinman...@gmail.com wrote: On Fri, Jan 23, 2009 at 1:08 PM, Gunaranjan Chandraraju chandrar...@apple.com wrote: record coreInfo id=123 , .../ address
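One way to keep each address's fields grouped, assuming the schema declares dynamic fields such as *_street and *_state, is to flatten every address into field names prefixed by its type. This is only a sketch of that idea on the client side, not the DIH configuration discussed in the thread; the function name and the naming scheme are hypothetical.

```python
def flatten_addresses(record_id, addresses):
    """Flatten typed addresses into one flat document keyed by '<type>_<attribute>'."""
    doc = {"id": record_id}
    for addr in addresses:
        prefix = addr["type"].lower()
        for key, value in addr.items():
            if key != "type":
                # e.g. home_street, office_state: would match *_street / *_state
                doc[f"{prefix}_{key}"] = value
    return doc


doc = flatten_addresses("123", [
    {"type": "home", "street": "XYZ1", "state": "CA"},
    {"type": "Office", "street": "XYZ2", "state": "CA"},
])
print(doc)
```

A query on home_street then matches only the home address, which a single multi-valued street field cannot guarantee. Two addresses of the same type would overwrite each other in this sketch.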