Re: Adding callback url to data import handler...Is this possible?
> But a callback url is a very specific requirement. We plan to extend javascript support to the EventListener callback.

I would say the latter is more specific than the former. People who are comfortable writing Java wouldn't need any of these, but the second-best thing for others would be the capability to handle it in their own applications. A URL can be the simplest way to invoke things in the respective application. Doing it via JavaScript sounds like a roundabout way of doing it.

Cheers
Avlesh

2009/10/15 Noble Paul നോബിള്‍ नोब्ळ् <noble.p...@corp.aol.com>:
> I can understand the concern that you do not wish to write Java code. But a callback url is a very specific requirement. We plan to extend javascript support to the EventListener callback. Will it help?
>
> On Wed, Oct 14, 2009 at 11:47 PM, Avlesh Singh <avl...@gmail.com> wrote:
>> Hmmm ... I think this is a valid use case and it might be a good idea to support it in some way. I will post this thread on the dev mailing list to seek opinion.
>>
>> Cheers
>> Avlesh
>>
>> On Wed, Oct 14, 2009 at 11:39 PM, William Pierce <evalsi...@hotmail.com> wrote:
>>> Thanks, Avlesh. Yes, I did take a look at the event listeners. As I mentioned, this would require us to write Java code. Our app(s) are entirely Windows/ASP.NET/C#, so while we could add Java in a pinch, we'd prefer to stick to using Solr via its convenient REST-style interfaces, which make no demands on our app environment. Thanks again for your suggestion!
>>>
>>> Cheers,
>>> Bill
>>>
>>> --------------------------------------------------
>>> From: Avlesh Singh <avl...@gmail.com>
>>> Sent: Wednesday, October 14, 2009 10:59 AM
>>> To: solr-user@lucene.apache.org
>>> Subject: Re: Adding callback url to data import handler...Is this possible?
>>>
>>> Have you had a look at EventListeners in DIH? http://wiki.apache.org/solr/DataImportHandler#EventListeners
>>>
>>> Cheers
>>> Avlesh
>>>
>>> On Wed, Oct 14, 2009 at 11:21 PM, William Pierce <evalsi...@hotmail.com> wrote:
>>>> Folks: I am pretty happy with DIH -- it seems to work very well for my situation. Thanks!!! The one issue I see has to do with the fact that I need to keep polling url/dataimport to check if the data import completed successfully. I need to know when/if the import is completed (successfully or otherwise) so that I can update appropriate structures in our app. What I would like is something like what the Google Checkout API offers -- a callback URL. That is, I should be able to pass along a URL to DIH. Once it has completed the import, it can invoke the provided URL. This provides a callback mechanism for those of us who don't have the liberty to change Solr source code. We can then do the needful upon receiving this callback. If this functionality is already provided in some form/fashion, I'd love to know. All in all, great functionality that has significantly helped me out!
>>>>
>>>> Cheers,
>>>> - Bill
>
> --
> -----------------------------------------------------
> Noble Paul | Principal Engineer | AOL | http://aol.com
Re: storing multiple type of records (Parent - Child Relationship)
Thanks Avlesh for your reply. Yes, I had that idea too, but the problem is that project data could change very rapidly, in which case I will end up changing the associated user details. Say I have just 100 project records but 100,000 user records; then changing one project record means changing all the associated user records -- maybe thousands of them. So, any idea of how to do it, or any suggestions?

Avlesh Singh wrote:
>> but is there a way where we can store user records separately and project records separately, and just give the link in solr, like mentioned below, and still make it searchable and facetable?
>
> With a single core, unfortunately not. Denormalizing data for storage and searches is a regular practice in Solr. It might not sound proper if you try to do this with heavily normalized data, but there is nothing wrong about it.
>
> To be specific, in your case, the fields to facet and search upon are designed correctly. My understanding is that you need the relationships to be preserved only for display. Right? If yes, then you can always create an untokenized field, say a string, and store all the project-specific data in some delimited format, e.g. in your case - projectName$$projectBU$$projectLocation etc. This data can be interpreted in your application to convert it back into a proper relational data structure for each document in the result.
>
> Cheers
> Avlesh
>
> On Thu, Oct 15, 2009 at 9:57 AM, ashokcz <ashokkumar.gane...@tcs.com> wrote:
>> Hi All, I have a specific requirement of storing multiple types of records but don't know how to do it. First let me state the requirement. I have a user table, and a user can be mapped to multiple projects. User table details are: User Name, User Id, address, and other details. I have stored them in solr, but now the mapping between user and project has to be stored. The project table has (project name, location, business unit, etc.). I can still go ahead and store the user as a single record with project details as individual fields, like:
>>
>> UserId: user1
>> UserAddress: india
>> ProjectNames: project1,project2
>> ProjectBU: retail,finance
>> ProjectLocation: UK,US
>>
>> Here I will search in fields like UserId, ProjectBU, ProjectLocation and have made UserAddress, ProjectLocation facets. But is there a way where we can store user records separately and project records separately, and just give the link in solr, like mentioned below, and still make it searchable and facetable?
>>
>> User Details
>> ============
>> UserId: user1
>> UserAddress: india
>> ProjectId: 1,2
>>
>> Project Details
>> ==============
>> ProjectId: 1
>> ProjectNames: project1
>> ProjectBU: retail
>> ProjectLocation: UK
>>
>> ProjectId: 2
>> ProjectNames: project2
>> ProjectBU: finance
>> ProjectLocation: US
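For illustration, a sketch of the flattened-field idea in SolrJ (the field names, the $$ delimiter, and the helper class are illustrative choices, not anything Solr prescribes): the project data rides along as one delimited string per user document, and the application splits it back apart for display.

    import org.apache.solr.common.SolrInputDocument;

    public class DenormalizedUser {
        public static SolrInputDocument build() {
            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("UserId", "user1");
            doc.addField("UserAddress", "india");
            // searchable/facetable multi-valued fields, as in the schema above
            doc.addField("ProjectBU", "retail");
            doc.addField("ProjectBU", "finance");
            // display-only string field: one delimited entry per project
            doc.addField("projectDetails", "project1$$retail$$UK");
            doc.addField("projectDetails", "project2$$finance$$US");
            return doc;
        }

        // on the result side, split each entry back into its parts for display
        public static String[] parse(String entry) {
            return entry.split("\\$\\$"); // -> {name, BU, location}
        }
    }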
Re: Adding callback url to data import handler...Is this possible?
If the JavaScript support enables me to invoke a URL, it's really OK with me.

Cheers,
- Bill

--------------------------------------------------
From: Avlesh Singh <avl...@gmail.com>
Sent: Wednesday, October 14, 2009 11:01 PM
To: solr-user@lucene.apache.org
Subject: Re: Adding callback url to data import handler...Is this possible?

> I would say the latter is more specific than the former. People who are comfortable writing Java wouldn't need any of these, but the second-best thing for others would be the capability to handle it in their own applications. A URL can be the simplest way to invoke things in the respective application. Doing it via JavaScript sounds like a roundabout way of doing it.
> [...]
Filtered search for subset of ids
Hi everybody, I'm new here... and this is my last chance to find a solution for my problem. I'm using acts_as_solr for Ruby on Rails. I need to submit a query against a subset of documents whose ids belong to an array that I want to pass as a parameter -- for instance, something like:

find_by_solr(query, id:[1,2,3,40,51,56])

Actually, I'd just need a way in the options to filter with a kind of SQL IN instead of a RANGE. I guess I need to override some methods, but first of all I want to know whether you consider this possible, and whether you have any hints about how to achieve it. I'm working on an Articles repository, indexing title and content only (but the document id is synchronized with the document id in the MySQL database).

Thanks. (I hope this is not a duplicate... I sent it before, to confirm my subscription :S)

Andrea
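For reference, the raw Solr filter-query syntax that expresses an "IN" over ids looks like the line below; acts_as_solr would need to let such a filter query through to Solr untouched.

fq=id:(1 OR 2 OR 3 OR 40 OR 51 OR 56)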
Re: Boosting of words
Hi,

I am able to see the results when I pass the values in the query browser. When I pass the below query I am able to see the difference in output:

http://localhost:8983/solr/select/?q=java^100%20technology^1

But the user cannot pass these values in the query browser each time to see the output. So where exactly should the value java^100 technology^1 be set? In which file, and which location, to be precise? Please help me.

Regards
Bhaskar

--- On Wed, 10/14/09, AHMET ARSLAN <iori...@yahoo.com> wrote:

From: AHMET ARSLAN <iori...@yahoo.com>
Subject: Re: Boosting of words
To: solr-user@lucene.apache.org
Date: Wednesday, October 14, 2009, 6:41 AM

>> Hi Clark, Thanks for your input. I have a query. I have my XML which contains the following:
>>
>> <add>
>>   <doc>
>>     <field name="url">http://www.sun.com</field>
>>     <field name="title">information</field>
>>     <field name="description">java plays a important role in computer industry for web users</field>
>>   </doc>
>>   <doc>
>>     <field name="url">http://www.askguru.com</field>
>>     <field name="title">homepage</field>
>>     <field name="description">Information about technology is stored in the web sites</field>
>>   </doc>
>>   <doc>
>>     <field name="url">http://www.techie.com</field>
>>     <field name="title">post queries</field>
>>     <field name="description">This web site have more java technology related to web</field>
>>   </doc>
>> </add>
>>
>> When I give “java technology” as my input in the Solr admin page, at present I get output as:
>>
>>   <doc>
>>     <field name="url">http://www.techie.com</field>
>>     <field name="title">post queries</field>
>>     <field name="description">This web site have more java technology related to web</field>
>>   </doc>
>>
>> Now I need to get the docs which have “technology” also. When I give “java technology”, I need to give boosting to docs which have “technology”, so they should display in the below order:
>>
>>   <doc>
>>     <field name="url">http://www.techie.com</field>
>>     <field name="title">post queries</field>
>>     <field name="description">This web site have more java technology related to web</field>
>>   </doc>
>>   <doc>
>>     <field name="url">http://www.askguru.com</field>
>>     <field name="title">homepage</field>
>>     <field name="description">Information about technology is stored in the web sites</field>
>>   </doc>
>>   <doc>
>>     <field name="url">http://www.sun.com</field>
>>     <field name="title">information</field>
>>     <field name="description">java plays a important role in computer industry for web users</field>
>>   </doc>
>>
>> Let me know how to achieve the same?

The query:

java^1 OR technology^100

will do it. Results will be in this order:

1) This web site have more java technology related to web
2) Information about technology is stored in the web sites
3) java plays a important role in computer industry for web users

1) contains both java and technology, 2) contains only technology, 3) contains only java. Is that what you want?

Note that there are no quotes in the query above. And you can adjust the boost factors (1 and 100) according to your needs. Use the OR operator between terms. You set an individual term's boost with the ^ operator.

Hope this helps.
Re: Adding callback url to data import handler...Is this possible?
It is not yet implemented. You may open an issue for the same.

--Noble

On Thu, Oct 15, 2009 at 12:14 PM, William Pierce <evalsi...@hotmail.com> wrote:
> If the JavaScript support enables me to invoke a URL, it's really OK with me.
>
> Cheers,
> - Bill
> [...]

--
-----------------------------------------------------
Noble Paul | Principal Engineer | AOL | http://aol.com
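For anyone who can tolerate a little Java after all, a minimal sketch of the listener route (Solr 1.4 DIH APIs; the class name and the callbackUrl request parameter are inventions for the example). It would be registered in data-config.xml as <document onImportEnd="com.example.CallbackNotifier" ...> and pings whatever URL the import request passed along:

    package com.example;

    import java.net.HttpURLConnection;
    import java.net.URL;
    import org.apache.solr.handler.dataimport.Context;
    import org.apache.solr.handler.dataimport.EventListener;

    public class CallbackNotifier implements EventListener {
        public void onEvent(Context ctx) {
            // e.g. /dataimport?command=full-import&callbackUrl=http://myapp/imported
            Object cb = ctx.getVariableResolver().resolve("dataimporter.request.callbackUrl");
            if (cb == null) return;
            try {
                // fire-and-forget GET; the response body is ignored
                HttpURLConnection conn = (HttpURLConnection) new URL(cb.toString()).openConnection();
                conn.setRequestMethod("GET");
                conn.getResponseCode();
                conn.disconnect();
            } catch (Exception e) {
                // best-effort notification only -- don't fail the import over it
            }
        }
    }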
browse terms of index
Hi,

I use a sample embedded Apache Solr to create a Lucene index with a few documents for test purposes. Documents have text, string, sint, sfloat, bool, and date fields, each of them indexed. At this time they are also stored, but only the document ids will be stored at the end.

I want to list the terms of the index. I didn't find a way to do this with the solr api, so I made a try with Luke (the Lucene api). Here is the code from luke to see the terms of an index:

public void terms(String field) throws CorruptIndexException, IOException {
    validateIndexSet();
    validateOperationPossible();
    SortedMap<String, Integer> termMap = new TreeMap<String, Integer>();
    IndexReader reader = null;
    try {
        reader = IndexReader.open(indexName);
        TermEnum terms = reader.terms(); // return an enumeration of terms
        while (terms.next()) {
            Term term = terms.term();
            if ((field.trim().length() == 0) || field.equals(term.field())) {
                termMap.put(term.field() + ":" + term.text(), new Integer(terms.docFreq()));
            }
        }
        int nkeys = 0;
        for (String key : termMap.keySet()) {
            Lucli.message(key + ":" + termMap.get(key));
            nkeys++;
            if (nkeys > Lucli.MAX_TERMS) {
                break;
            }
        }
    } finally {
        closeReader(reader);
    }
}

But for sfloat fields (it is the same for sint) I don't see the value of the term. The Term class of Lucene has just two fields of type String (name and value). Here are the values returned for the dynamic field f_float of type sfloat:

f_float:┼??
f_float:┼??
f_float:┼?l
f_float:┼??
f_float:┼??

So, is there a way to convert a term to the right type (int, date, float)? Or is there a way to see index terms with the solr api?

Thanks for help,
Jean-François Melian
Re: browse terms of index
Have a look at http://wiki.apache.org/solr/TermsComponent

On Oct 15, 2009, at 5:43 AM, jfmel...@free.fr wrote:
> I want to list the terms of the index. [...] So, is there a way to convert a term to the right type (int, date, float)? Or is there a way to see index terms with the solr api?

--
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using Solr/Lucene: http://www.lucidimagination.com/search
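On the type-conversion half of the question: sint/sfloat fields index a sortable string encoding, which is exactly why the raw terms come out looking like "┼??". Solr's own org.apache.solr.util.NumberUtils should do the round trip -- a sketch (method names recalled from the 1.3/1.4 code base and worth double-checking against your version):

    import org.apache.solr.util.NumberUtils;

    public class DecodeSortable {
        public static void main(String[] args) {
            // what an sfloat field actually stores in the term dictionary for 3.14f ...
            String encoded = NumberUtils.float2sortableStr(3.14f);
            // ... and the raw term text decoded back into the float
            System.out.println(NumberUtils.SortableStr2float(encoded)); // 3.14
        }
    }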
Using DIH's special commands....Help needed
Folks:

I see in the DIH wiki that there are special commands, which according to the wiki "can be given to DIH by adding certain variables to the row returned by any of the components". In my use case, my db contains rows that are marked "PendingDelete". How do I use the $deleteDocByQuery special command to delete these rows using DIH? In other words, where/how do I specify this?

Thanks,
- Bill
Re: 'Down' boosting shorter docs
Another approach is to change the document length normalization formula. See Similarity.lengthNorm() in Lucene.

wunder

On Oct 15, 2009, at 12:45 AM, Andrea D'Ippolito wrote:
> I've read (correct me if I'm wrong) that a solution to achieve that is to over-boost all the other fields, but I guess this works easily only if you have few fields indexed ;)
>
> bye
>
> 2009/10/15 Simon Wistow <si...@thegestalt.org>:
>> Our index has some items in it which basically contain a title and a single-word body. If the user searches for a word in the title (especially if the title is itself only one word) then that doc will get scored quite highly, despite the fact that, in this case, it's not really relevant.
>>
>> I've tried something like
>>
>> qf=title^2.0 content^0.5
>> bf=num_pages
>>
>> but that disproportionately boosts long documents to the detriment of relevancy.
>>
>> bf=product(num_pages,0.05) has no effect, but bf=product(num_pages,0.06) returns a bunch of long documents which don't seem to return any highlighted fields, plus the short document with only the query in the title -- which is progress in that it's almost exactly the opposite of what I want.
>>
>> Any suggestions? Am I going to need to reindex and add the length in bytes or characters of the document?
>>
>> Simon
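For the archives, a rough sketch of that route (Lucene 2.4-era API; the class name and the 100-term floor are arbitrary choices), wired into Solr via <similarity class="com.example.FlatLengthSimilarity"/> in schema.xml:

    package com.example;

    import org.apache.lucene.search.DefaultSimilarity;

    public class FlatLengthSimilarity extends DefaultSimilarity {
        @Override
        public float lengthNorm(String fieldName, int numTerms) {
            // Pretend every document has at least 100 terms, so a one-word body
            // no longer wins on norm alone; longer docs keep the usual 1/sqrt(len).
            int n = Math.max(numTerms, 100);
            return (float) (1.0 / Math.sqrt(n));
        }
    }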
Limit occurences per page of items with same category
I was reading about field collapsing, but I think it is not what I'm looking for. This is the problem I have to solve: after a search, I need to show at most 3 items with the same category per page, displaying 10 items per page.

Suppose the search returns 15 items in this order, ranked by the priority of the search fields (cars and cycles number 4 each in the first page, so one of each should be moved to the 2nd page):

#id  name           category

page 1:
3  -- bmw --------- car
2  -- honda ------- cycle
4  -- mercedes ---- car
14 -- yamaha ------ boat
13 -- ferrari ----- car
10 -- ktm --------- cycle
15 -- jaguar ------ car
12 -- rolls royce - plane
1  -- aprilia ----- cycle
6  -- suzuki ------ cycle

page 2:
7  -- volvo ------- truck
8  -- scania ------ truck
5  -- boeing ------ plane
9  -- yamaha ------ jetski
11 -- toyota ------ car

What I want to know is whether it can be done with solr or some plugin: limit the occurrences of items per page according to a category, for example. So from the first page a car (the jaguar, 15) and a cycle (the suzuki, 6) would be moved to the 2nd page, and both trucks would be moved to the first. Wanted result (see the sketch after this list):

page 1:
3  -- bmw --------- car
2  -- honda ------- cycle
4  -- mercedes ---- car
14 -- yamaha ------ boat
13 -- ferrari ----- car
10 -- ktm --------- cycle
12 -- rolls royce - plane
1  -- aprilia ----- cycle
7  -- volvo ------- truck
8  -- scania ------ truck

page 2:
15 -- jaguar ------ car
6  -- suzuki ------ cycle
5  -- boeing ------ plane
9  -- yamaha ------ jetski
11 -- toyota ------ car

Thank you very much!
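Absent built-in support, one workaround is to over-fetch and reorder client-side. A sketch of the reshuffle the post describes (plain Java; the {id, name, category} array shape is just illustrative): each page hands out at most maxPerCategory slots per category and pushes the overflow to later pages, preserving score order. Running it over the 15 items above with pageSize=10 and maxPerCategory=3 produces exactly the wanted result.

    import java.util.ArrayList;
    import java.util.HashMap;
    import java.util.Iterator;
    import java.util.LinkedList;
    import java.util.List;
    import java.util.Map;

    public class CategoryCapper {
        /** hits are {id, name, category} triples, already in score order. */
        public static List<String[]> reorder(List<String[]> hits, int pageSize, int maxPerCategory) {
            List<String[]> ordered = new ArrayList<String[]>();
            LinkedList<String[]> pending = new LinkedList<String[]>(hits);
            while (!pending.isEmpty()) {
                Map<String, Integer> counts = new HashMap<String, Integer>();
                int slots = pageSize;
                Iterator<String[]> it = pending.iterator();
                while (it.hasNext() && slots > 0) {
                    String[] hit = it.next();
                    Integer c = counts.get(hit[2]);
                    int n = (c == null) ? 0 : c.intValue();
                    if (n < maxPerCategory) {        // category still has room on this page
                        ordered.add(hit);
                        counts.put(hit[2], n + 1);
                        it.remove();
                        slots--;
                    }                                // else it spills to a later page
                }
                while (slots-- > 0 && !pending.isEmpty()) {
                    // page couldn't fill under the cap: relax it rather than loop forever
                    ordered.add(pending.removeFirst());
                }
            }
            return ordered;
        }
    }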
Re: Boosting of words
Hi Bhaskar,

The parameter you're looking for is the Boost Query; remember this means using the Dismax query handler.
http://wiki.apache.org/solr/DisMaxRequestHandler#bq_.28Boost_Query.29

http://localhost:8983/solr/select/?q=video&qt=dismax&bq=cat:electronics^5.0

Michel

On Thu, Oct 15, 2009 at 6:04 AM, bhaskar chandrasekar <bas_s...@yahoo.co.in> wrote:
> I am able to see the results when I pass the values in the query browser. But the user cannot pass these values in the query browser each time to see the output. So where exactly should the value java^100 technology^1 be set? In which file, and which location, to be precise?
> [...]
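To answer the "which file" part directly: defaults like this live in solrconfig.xml, inside the request handler the queries go through. A sketch (the handler name, field names and boost values are illustrative):

    <requestHandler name="dismax" class="solr.SearchHandler">
      <lst name="defaults">
        <str name="defType">dismax</str>
        <str name="qf">title description</str>
        <!-- applied to every query without the user typing it -->
        <str name="bq">description:technology^100</str>
      </lst>
    </requestHandler>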
Re: Solr/Lucene keeps eating up memory while idling
On Oct 14, 2009, at 12:26 PM, nonrenewable wrote:
> I'm curious why this is occurring and whether I can prevent it. This is my scenario: Locally I have an idle running solr 1.3 service using lucene 2.4.1 which has an index of ~330K documents containing ~10 fields each (total size ~12GB).

Did I read that right? 330K docs == 12 GB index.

> Currently I've turned off all caching, lazy field loading, however I do have facet fields set for some request handlers. What I'm seeing is heap space usage increasing by ~1.2MB per 2 sec (by java.lang.String objects). I'm assuming they're being used by lucene but I may be wrong about that, since I have no actual data to confirm it. Why exactly is this happening, considering no requests are being serviced? Shouldn't the memory usage stabilise with a certain set of information and only be affected on requests? Additionally there is a full GC every half hour, which seems very unreasonable on a machine that isn't actually being used as a service.

Can you share the Solr logs and/or your config? Is this happening around a commit or some warming process? After startup, with no requests hitting it and no warming/commits/indexing, I don't see why it would be growing. Do you have custom code?

> I really hope there's just a certain setting that I've overlooked, or a concept I'm not understanding, because otherwise this behaviour seems very unreasonable...
>
> Thanks beforehand,
> Tony

--
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using Solr/Lucene: http://www.lucidimagination.com/search
Re: Boosting of words
> I am able to see the results when I pass the values in the query browser. When I pass the below query I am able to see the difference in output:
> http://localhost:8983/solr/select/?q=java^100%20technology^1
> But the user cannot pass these values in the query browser each time to see the output. So where exactly should the value java^100 technology^1 be set? In which file, and which location, to be precise? Please help me.

Although I do not fully understand you: you need to URL-encode your parameter values before you invoke an HTTP GET.

parameter=urlencode(value, "UTF-8")

Try this url:

/select/?q=java%5E100+OR+technology%5E1&version=2.2

Note that space is encoded into +. Also ^ is encoded into %5E. What kind of solr client are you using? How are you accessing solr? From Java, PHP, Ruby?
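If the client happens to be Java, the encoding step looks like this (a minimal sketch):

    import java.net.URLEncoder;

    public class EncodeQuery {
        public static void main(String[] args) throws Exception {
            String q = "java^100 OR technology^1";
            // URL-encode only the parameter value, never the whole URL
            String url = "http://localhost:8983/solr/select/?q=" + URLEncoder.encode(q, "UTF-8");
            System.out.println(url);
            // -> http://localhost:8983/solr/select/?q=java%5E100+OR+technology%5E1
        }
    }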
Re: Using DIH's special commands....Help needed
On Thu, Oct 15, 2009 at 6:25 PM, William Pierce <evalsi...@hotmail.com> wrote:
> Folks: I see in the DIH wiki that there are special commands, which according to the wiki "can be given to DIH by adding certain variables to the row returned by any of the components". In my use case, my db contains rows that are marked "PendingDelete". How do I use the $deleteDocByQuery special command to delete these rows using DIH? In other words, where/how do I specify this?

The $deleteDocByQuery is for deleting Solr documents by a Solr query and not DB rows.

--
Regards,
Shalin Shekhar Mangar.
Re: Using DIH's special commands....Help needed
Thanks, Shalin. I am sorry if I phrased it incorrectly. Yes, I want to know how to delete documents in the solr index using the $deleteDocByQuery special command. I looked in the wiki doc and could not find out how to do this. Sorry if this is self-evident...

Cheers,
- Bill

--------------------------------------------------
From: Shalin Shekhar Mangar <shalinman...@gmail.com>
Sent: Thursday, October 15, 2009 10:03 AM
To: solr-user@lucene.apache.org
Subject: Re: Using DIH's special commands....Help needed

> The $deleteDocByQuery is for deleting Solr documents by a Solr query and not DB rows.
> [...]
Re: Solr/Lucene keeps eating up memory while idling
> Did I read that right? 330K docs == 12 GB index.

Oops, missed the dot -- 1.2GB, but I don't think that should really make the difference in this case. Even if it was 12 GB it would just have some really juicy documents, right? :)

> Can you share the Solr logs and/or your config? Is this happening around a commit or some warming process? After startup, with no requests hitting it and no warming/commits/indexing, I don't see why it would be growing. Do you have custom code?

There is custom code around the SolrJ API; however, it does not explain this behaviour because of the lack of requests coming through it. There are no indexing, commits or queries sent to the server after it's started up, except for the initial 2 warming queries (can those be to blame for this even with no caches present??). Here they are in the log (it's on its default verbosity, so I'll refrain from posting the whole startup until necessary). After the initial startup, what you see in the log is a GC every 2.5 min and a Full GC every 30 min. No actual activity is present.

Oct 15, 2009 1:13:36 PM org.apache.solr.core.SolrCore execute
INFO: [] webapp=null path=null params={start=0&q=fast_warm&rows=10} hits=0 status=0 QTime=16853
Oct 15, 2009 1:13:36 PM org.apache.solr.core.QuerySenderListener newSearcher
INFO: QuerySenderListener done.
Oct 15, 2009 1:13:36 PM org.apache.solr.core.SolrCore execute
INFO: [] webapp=null path=null params={q=static+firstSearcher+warming+query+from+solrconfig.xml} hits=0 status=0 QTime=204
Oct 15, 2009 1:13:36 PM org.apache.solr.core.QuerySenderListener newSearcher
INFO: QuerySenderListener done

Here is the config on it:

<config>
  <abortOnConfigurationError>${solr.abortOnConfigurationError:true}</abortOnConfigurationError>
  <dataDir>/r9/flare1.data/solr/data</dataDir>
  <indexDefaults>
    <useCompoundFile>false</useCompoundFile>
    <mergeFactor>10</mergeFactor>
    <ramBufferSizeMB>32</ramBufferSizeMB>
    <maxMergeDocs>2147483647</maxMergeDocs>
    <maxFieldLength>1</maxFieldLength>
    <writeLockTimeout>1000</writeLockTimeout>
    <commitLockTimeout>1</commitLockTimeout>
    <lockType>single</lockType>
  </indexDefaults>
  <mainIndex>
    <useCompoundFile>false</useCompoundFile>
    <ramBufferSizeMB>32</ramBufferSizeMB>
    <mergeFactor>10</mergeFactor>
    <maxMergeDocs>2147483647</maxMergeDocs>
    <maxFieldLength>1</maxFieldLength>
    <unlockOnStartup>false</unlockOnStartup>
  </mainIndex>
  <jmx />
  <updateHandler class="solr.DirectUpdateHandler2">
  </updateHandler>
  <query>
    <maxBooleanClauses>1024</maxBooleanClauses>
    <queryResultWindowSize>50</queryResultWindowSize>
    <queryResultMaxDocsCached>200</queryResultMaxDocsCached>
    <HashDocSet maxSize="3000" loadFactor="0.75"/>
    <listener event="newSearcher" class="solr.QuerySenderListener">
      <arr name="queries">
        <lst><str name="q">solr</str><str name="start">0</str><str name="rows">10</str></lst>
        <lst><str name="q">rocks</str><str name="start">0</str><str name="rows">10</str></lst>
        <lst><str name="q">static newSearcher warming query from solrconfig.xml</str></lst>
      </arr>
    </listener>
    <listener event="firstSearcher" class="solr.QuerySenderListener">
      <arr name="queries">
        <lst><str name="q">fast_warm</str><str name="start">0</str><str name="rows">10</str></lst>
        <lst><str name="q">static firstSearcher warming query from solrconfig.xml</str></lst>
      </arr>
    </listener>
    <useColdSearcher>false</useColdSearcher>
    <maxWarmingSearchers>2</maxWarmingSearchers>
  </query>
  <requestDispatcher handleSelect="true">
    <requestParsers enableRemoteStreaming="false" multipartUploadLimitInKB="2048" />
  </requestDispatcher>
  <requestHandler name="standard" class="solr.SearchHandler" default="true">
    <lst name="defaults">
      <str name="echoParams">explicit</str>
    </lst>
  </requestHandler>
  <requestHandler name="dismax" class="solr.SearchHandler">
    <lst name="defaults">
      <str name="defType">dismax</str>
      <str name="echoParams">explicit</str>
      <float name="tie">0.01</float>
      <str name="qf">text^0.5 address_t^2.0 name^1.5 brand^1.1 airport_name_t^1.0</str>
      <str name="pf">text^0.2 address_t^1.1 name^1.5 brand^1.4 brand_exact^1.9 airport_name_t^1.0</str>
      <str name="fl">id,name,price,score</str>
      <int name="ps">100</int>
      <str name="q.alt">*:*</str>
      <str name="hl.fl">text features name</str>
      <str name="f.name.hl.fragsize">0</str>
      <str name="f.name.hl.alternateField">name</str>
      <str name="f.text.hl.fragmenter">regex</str> <!-- defined below -->
      <str name="spellcheck">true</str>
      <str name="spellcheck.extendedResults">true</str>
      <str name="spellcheck.collate">true</str>
      <str name="spellcheck.count">5</str>
    </lst>
    <arr name="last-components">
      <str>spellcheck</str>
    </arr>
  </requestHandler>
  <requestHandler name="partitioned" class="solr.SearchHandler">
    <lst name="defaults">
      <str name="defType">dismax</str>
      <str name="echoParams">explicit</str>
      <str name="qf">text^0.5 features^1.0 name^1.2 id^10.0</str>
      <str name="mm">2&lt;-1 5&lt;-2 6&lt;90%</str>
      <str name="bq">incubationdate_dt:[* TO NOW/DAY-1MONTH]^2.2</str>
    </lst>
    <lst name="appends">
      <str name="fq">inStock:true</str>
Re: Using DIH's special commands....Help needed
On Thu, Oct 15, 2009 at 10:42 PM, William Pierce <evalsi...@hotmail.com> wrote:
> Thanks, Shalin. I am sorry if I phrased it incorrectly. Yes, I want to know how to delete documents in the solr index using the $deleteDocByQuery special command. I looked in the wiki doc and could not find out how to do this.

Sorry, I misunderstood your intent. These special flag variables can be emitted by Transformers. So what you can do is write a Transformer which checks whether the current row contains "PendingDelete" in the column, and if so adds a key/value pair to the row's Map: the key should be $deleteDocByQuery and the value should be the Solr query to be used for deletion. You can write the transformer in Java as well as JavaScript.

--
Regards,
Shalin Shekhar Mangar.
Re: Using DIH's special commands....Help needed
Thanks for your help. Here is my DIH config file; I'd appreciate any help/pointers you may give me. No matter what I do, the documents are not getting deleted from the index. My db has rows whose IndexingStatus field has values of either 1 (which means add it to solr) or 4 (which means delete the document with the primary key from the Solr index). I have two transformers running. Not sure what I am doing wrong.

<dataConfig>
  <script><![CDATA[
    function DeleteRow(row) {
      var jis = row.get('IndexingStatus');
      var jid = row.get('Id');
      if (jis == 4) {
        row.put('$deleteDocById', jid);
      }
      return row;
    }
  ]]></script>
  <dataSource type="JdbcDataSource" driver="com.mysql.jdbc.Driver"
              url="jdbc:mysql://localhost/db" user="**" password="***"/>
  <document>
    <entity name="post" transformer="script:DeleteRow, RegexTransformer"
            query="select Id, a, b, c, IndexingStatus from prod_table
                   where (IndexingStatus = 1 or IndexingStatus = 4)">
      <field column="ptype" splitBy="," sourceColName="a" />
      <field column="wauth" splitBy="," sourceColName="b" />
      <field column="miles" splitBy="," sourceColName="c" />
    </entity>
  </document>
</dataConfig>

Thanks,
- Bill

--------------------------------------------------
From: Shalin Shekhar Mangar <shalinman...@gmail.com>
Sent: Thursday, October 15, 2009 11:03 AM
To: solr-user@lucene.apache.org
Subject: Re: Using DIH's special commands....Help needed

> Sorry, I misunderstood your intent. These special flag variables can be emitted by Transformers.
> [...]
Re: Conditional copyField
Nice find, Ahmet. I'd love to see this formalized in the Solr schema syntax, as it is something I've often wanted to do. maxChars is OK, too, but I would like to see max tokens as well.

On Oct 12, 2009, at 6:31 PM, AHMET ARSLAN wrote:
>> Hi, I am pushing data to solr from two different sources: nutch and a cms. I have a data clash in that in nutch a copyField is required to push the url field to the id field, as it is used as the primary lookup in the nutch solr integration update. The other cms also uses the url field but populates the id field with a different value. Now I can't really change either source definition, so is there a way in solrconfig or schema to check if id is empty and only copy if true, or is there a better way via the updateprocessor?
>
> The copyField declaration has three attributes: source, dest and maxChars. Therefore it can be concluded that there is no way to do it in schema.xml. Luckily, the Wiki [1] has a quick example that implements a conditional copyField.
>
> [1] http://wiki.apache.org/solr/UpdateRequestProcessor

--
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using Solr/Lucene: http://www.lucidimagination.com/search
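For reference, the shape of that processor, closely modeled on the wiki example (the class name is illustrative, and the matching UpdateRequestProcessorFactory plus the solrconfig.xml chain registration are omitted for brevity; the wiki page above shows the full wiring): copy url into id only when the incoming document has no id.

    package com.example;

    import java.io.IOException;
    import org.apache.solr.common.SolrInputDocument;
    import org.apache.solr.update.AddUpdateCommand;
    import org.apache.solr.update.processor.UpdateRequestProcessor;

    public class ConditionalCopyProcessor extends UpdateRequestProcessor {
        public ConditionalCopyProcessor(UpdateRequestProcessor next) {
            super(next);
        }

        @Override
        public void processAdd(AddUpdateCommand cmd) throws IOException {
            SolrInputDocument doc = cmd.getSolrInputDocument();
            // only copy when the cms hasn't already supplied an id
            if (doc.getFieldValue("id") == null && doc.getFieldValue("url") != null) {
                doc.addField("id", doc.getFieldValue("url"));
            }
            super.processAdd(cmd); // hand off to the rest of the chain
        }
    }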
Re: Solr/Lucene keeps eating up memory while idling
Please send a log covering at least the 2.5 minutes you discuss, but upwards of 5 minutes would be good.

On Oct 15, 2009, at 1:26 PM, nonrenewable wrote:
> Oops, missed the dot -- 1.2GB, but I don't think that should really make the difference in this case. Even if it was 12 GB it would just have some really juicy documents, right? :)
> [...]
Re: Using DIH's special commands....Help needed
On Fri, Oct 16, 2009 at 12:46 AM, William Pierce <evalsi...@hotmail.com> wrote:
> Thanks for your help. Here is my DIH config file; I'd appreciate any help/pointers you may give me. No matter what I do, the documents are not getting deleted from the index.
> [...]

One thing I'd try is to use '4' for comparison rather than the number 4 (the type would depend on the sql type). Also, for javascript transformers to work, you must use JDK 6, which has javascript support. The rest looks fine to me.

--
Regards,
Shalin Shekhar Mangar.
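In other words, the script's test would become something like this (a sketch; only the comparison changes, and comparing as a string guards against the JDBC driver handing back a non-numeric type):

    function DeleteRow(row) {
        var jis = row.get('IndexingStatus');
        var jid = row.get('Id');
        // compare as a string rather than a number
        if (jis != null && jis.toString() == '4') {
            row.put('$deleteDocById', jid);
        }
        return row;
    }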
Re: (Solr 1.4 dev) Why solr.common.* packages are in solrj-*.jar ?
: BTW, is there some sort of transition guide for Solr 1.4?
: I see there are changes to how classes are divided into JARs
: like above, and there are some incompatible API changes.
: It'll be great if such information can be part of CHANGES.txt.

CHANGES.txt contains an "Upgrading from Solr 1.3" section ... if there are incompatible API changes for plugins they *should* be identified there. If you know of something that isn't listed, please let us know (the specifics).

-Hoss
Re: Facet query help
: the original pastie (http://pastie.org/650932). I tried the fq query body with
: quotes and without quotes.

The entire fq param shouldn't be in quotes ... just the value that you want to query on (since it's a string field and you want the whole field treated as a single string):

fq = Memory_s:"1 GB"

fq=Memory_s:%221+GB%22

-Hoss
Re: Using DIH's special commands....Help needed
Hi,

For example, my data-import.conf has the following. It allows me to specify a parameter single=pathname on the url used to invoke DIH, which allows a doc to be deleted from the index by -- in my case -- its pathname, which is stored in the field fileAbsolutePath.

<document>
  <!-- ### -->
  <entity name="single-delete"
          dataSource="null"
          processor="XPathEntityProcessor"
          url="${dataimporter.request.single}"
          rootEntity="true"
          flatten="true"
          stream="false"
          forEach="/record"
          transformer="TemplateTransformer">
    <field column="fileAbsolutePath" template="${dataimporter.request.single}" />
    <field column="$deleteDocByQuery" template="fileAbsolutePath:${dataimporter.functions.escapeQueryChars(dataimporter.request.single)}" />
    <field column="solluckey" template="${dataimporter.request.single}" />
  </entity>
</document>

I feel sure this can be optimised!

Fergus.

> The $deleteDocByQuery is for deleting Solr documents by a Solr query and not DB rows.
> [...]

--
===============================================================
Fergus McMenemie               Email: fer...@twig.me.uk
Techmore Ltd                   Phone: (UK) 07721 376021
Unix/Mac/Intranets             Analyst Programmer
===============================================================
Re: Solr/Lucene keeps eating up memory while idling
Here is exactly half an hour from roughly the beginning of logging. There's nothing to see really because no requests are sent; you just see the GC behaviour:

[Full GC 211987K->208493K(432448K), 0.6273480 secs]
[GC 276333K->212269K(438720K), 0.0929710 secs]
[GC 289133K->216269K(439936K), 0.1019780 secs]
[GC 293133K->220205K(436672K), 0.1128410 secs]
[GC 304301K->224429K(441472K), 0.1358250 secs]
[GC 308525K->228685K(431744K), 0.1559950 secs]
[GC 317197K->233069K(437312K), 0.1642160 secs]
[GC 321581K->237613K(432832K), 0.1772830 secs]
[GC 329197K->242093K(435136K), 0.1896270 secs]
[GC 333677K->246701K(436352K), 0.2039880 secs]
[GC 274165K->247917K(437760K), 0.2022640 secs]
[Full GC 247917K->208726K(437760K), 0.7195200 secs]

The heap is set to 1400m, so it'll take a while to hit the roof. I also haven't tested to see if it stabilises, but I'll leave it running now and see what happens to it overnight. I assume that when (if) it reaches the heap limit it'll just do full GCs more often.

Grant Ingersoll-6 wrote:
> Please send a log covering at least the 2.5 minutes you discuss, but upwards of 5 minutes would be good.
Re: Solr/Lucene keeps eating up memory while idling
I just did some allocation profiling on the stock Solr example... it's not completely idle when no requests are being made. There's only one thing allocating memory:

org.mortbay.util.Scanner.scanFiles()

That must be Jetty looking to see if any of the files under webapps have changed. It's really nothing to worry about -- there are no memory leaks, and the activity is extremely minimal -- but if you want to shut it off, it would be a Jetty config option somewhere.

-Yonik
http://www.lucidimagination.com

On Wed, Oct 14, 2009 at 12:26 PM, nonrenewable <nonrenewa...@gmail.com> wrote:
> I'm curious why this is occurring and whether I can prevent it. This is my scenario: Locally I have an idle running solr 1.3 service using lucene 2.4.1 which has an index of ~330K documents containing ~10 fields each (total size ~12GB). Currently I've turned off all caching, lazy field loading, however I do have facet fields set for some request handlers. What I'm seeing is heap space usage increasing by ~1.2MB per 2 sec (by java.lang.String objects). I'm assuming they're being used by lucene but I may be wrong about that, since I have no actual data to confirm it.
>
> Why exactly is this happening, considering no requests are being serviced? Shouldn't the memory usage stabilise with a certain set of information and only be affected on requests? Additionally there is a full GC every half hour, which seems very unreasonable on a machine that isn't actually being used as a service.
>
> I really hope there's just a certain setting that I've overlooked, or a concept I'm not understanding, because otherwise this behaviour seems very unreasonable...
>
> Thanks beforehand,
> Tony
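For anyone who does want to silence it: the scan interval is set in Jetty's etc/jetty.xml. In the Jetty 6 shipped with the Solr example it should be something along these lines (element and class names recalled from Jetty 6 and worth verifying against your copy; 0 disables the periodic scan):

    <!-- inside the hot-deployer block of etc/jetty.xml (Jetty 6; verify locally) -->
    <New class="org.mortbay.jetty.deployer.ContextDeployer">
      <Set name="scanInterval">0</Set>
    </New>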
Re: Using mincount with date facet in Solr 1.4
: But I was getting facets even with count 0. So I tried following
: combinations of mincount parameters, as none was specified in the
: wiki http://wiki.apache.org/solr/SimpleFacetParameters for date faceting.

mincount is not a date faceting option -- it only applies to field value faceting (ie: facet.field=foo) ... that's why it's in the "Field Value Faceting Parameters" section of the wiki page you listed, and not in the "Date Faceting Parameters" section.

-Hoss
RE: Right place to put my Tokenizer jars
: Actually, I meant to say I have my Tokenizer jars in solr/lib.
: I have the jars that my Tokenizer jars depend on in lib/ext,
: as I wanted them to be loaded only once per container
: due to their internal description. Bad idea?

Unless there is something *really* hinky about those dependencies, I wouldn't worry about it -- just put them in solr/lib as well (or sharedLib if you use a solr.xml file).

: This error can be fixed by putting another set of SLF4J jars
: in example/lib/ext, but I don't understand why.

In general what you are seeing is the security model of classloaders ... classes in Solr can access classes in the container's classloader, but classes in the container's loader can't see classes in solr. Even if they are the same classes, they are different *instances* of those classes -- the Class objects themselves are distinct.

If you're really concerned about minimizing duplication and having a really tiny footprint, reconstructing the solr war to contain all of your classes (and removing anything you *don't* need) is your best bet ... but for 99% of the world that's going to be major overkill.

-Hoss
Re: Customizing solr search: SpanQueries (revisited)
: with (in my overridden process() method):
:   String[] selectFields = {"id", "fileName"}; // the subset of fields I am interested in
:   TopDocs results = searcher.search(cmd.getQuery(), 10); // custom spanquery, and many/all hits
:   /* save hit info (doc score) */
:   /* maybe process SpanQuery.getSpans() here, but perhaps try doc oriented results
:      processing approach(?) for tokenization caching/optimization? */

For an approach like this (where you get the top N matches, then process those N to get the spans) you can actually use the existing QueryComponent as is, and just add your own SearchComponent that runs after it and inspects the DocList in the QueryResult to get the Spans and record whatever data you want. Doing that would have the added benefit of leveraging the existing filter/query caches when doing the main search (you would still need to use the caching APIs if you wanted to cache your post-processing work).

The alternate approach using a HitCollector (or the code you've got now asking for TopDocs) bypasses all of Solr's caching -- it will work fine, it's just a question of what you want.

-Hoss
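That suggestion in skeleton form (Solr 1.4 APIs; the class name is illustrative). Such a component would be registered in solrconfig.xml and listed after the query component via <arr name="last-components"><str>spanPostProcessor</str></arr>:

    package com.example;

    import java.io.IOException;
    import org.apache.solr.handler.component.ResponseBuilder;
    import org.apache.solr.handler.component.SearchComponent;
    import org.apache.solr.search.DocIterator;
    import org.apache.solr.search.DocList;

    public class SpanPostProcessor extends SearchComponent {
        @Override
        public void prepare(ResponseBuilder rb) throws IOException {
            // nothing to do before the main query runs
        }

        @Override
        public void process(ResponseBuilder rb) throws IOException {
            DocList docs = rb.getResults().docList; // filled in by QueryComponent
            DocIterator it = docs.iterator();
            while (it.hasNext()) {
                int docId = it.nextDoc();
                float score = it.score();
                // fetch stored fields / walk SpanQuery.getSpans() for docId here
            }
        }

        @Override
        public String getDescription() { return "span post-processor (sketch)"; }

        @Override
        public String getVersion() { return "1.0"; }

        @Override
        public String getSourceId() { return ""; }

        @Override
        public String getSource() { return ""; }
    }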
Re: advice on failover setup
Don,

I neglected to mention the Solr Katta integration patch, SOLR-1395. That's a great place to start, code-wise!

-J

On Wed, Oct 14, 2009 at 4:20 PM, Don Clore <don.cl...@5to1.com> wrote:
> I'm sorry, for clarification: is it the *wiki* pages that are under development, or the features (I'm guessing the latter)? If the latter (ZooKeeperIntegration and KattaIntegration are not available yet), is there any sort of guess as to when these features might become available?
>
> thanks,
> Don
>
> On Wed, Oct 14, 2009 at 2:13 PM, Jason Rutherglen <jason.rutherg...@gmail.com> wrote:
>> Dan,
>>
>> For automatic failover there are 2 wiki pages that may be helpful; however, both are in the development stage.
>>
>> http://wiki.apache.org/solr/ZooKeeperIntegration
>> http://wiki.apache.org/solr/KattaIntegration
>>
>> -J
>>
>> On Wed, Oct 14, 2009 at 12:48 PM, Katz, Dan <dan.k...@fepoc.com> wrote:
>>> Hi folks, I'm tasked with designing a failover architecture for our new Solr server. I've read the Replication section in the docs (http://wiki.apache.org/solr/SolrReplication) and I need some clarification/insight. My questions:
>>>
>>> 1. Is there such a thing as master/master replication?
>>> 2. If we have one master and one slave server, and the master goes down, does the slave automatically become the master? What's the process for bringing the server back up and getting the two back in sync? Is it always a manual process?
>>> 3. We're running Solr inside Tomcat on Windows currently. Any suggestions for a load balancer that will automatically switch to the alternate server if one goes down?
>>>
>>> Thanks in advance,
>>> --
>>> Dan Katz
>>> Lead Web Developer
>>> FEP Operations Center(r)
>>> 202.203.2572 (Direct)
>>> dan.k...@fepoc.com
Re: hadoop configurations for SOLR-1301 patch
Hi Pravin,

You'll need to set up a Hadoop cluster, which is independent of SOLR-1301. 1301 is for building Solr indexes only, so there isn't a master and slave; after building the indexes, one needs to provision them to Solr servers. In my case I only have slaves, because I'm not incrementally indexing on the Hadoop-generated shards.

1301 does need a Hadoop-specific unit test -- I got one started and need to complete it -- which could help a little in understanding.

-J

On Wed, Oct 14, 2009 at 5:45 AM, Pravin Karne <pravin_ka...@persistent.co.in> wrote:
> Hi, I am using the SOLR-1301 patch. I have built solr with the given patch, but I am not able to configure Hadoop for the resulting war. I want to run solr (create indexes) with a 3-node (1+2) cluster. How do I do the Hadoop configuration for this patch? How do I set master and slave?
>
> Thanks
> -Pravin
Re: Using DIH's special commands....Help needed
Use a LogTransformer to see if the value is indeed set:

<entity name="post"
        transformer="script:DeleteRow, RegexTransformer, LogTransformer"
        logTemplate="${post}"
        query="select Id, a, b, c, IndexingStatus from prod_table
               where (IndexingStatus = 1 or IndexingStatus = 4)">

This should print out the entire row after the transformations.

On Fri, Oct 16, 2009 at 3:04 AM, William Pierce <evalsi...@hotmail.com> wrote:
> Thanks for your reply! I tried your suggestion. No luck. I have verified that I have version 1.6.0_05-b13 of java installed. I am running with the nightly bits of October 7. I am pretty much out of ideas at the present time; I'd appreciate any tips/pointers.
>
> Thanks,
> - Bill
> [...]

--
-----------------------------------------------------
Noble Paul | Principal Engineer | AOL | http://aol.com