Re: mergeFactor / indexing speed
Hi Avlesh, hi Otis, hi Grant, hi all,

(enumerating to keep track of all the input)

a) mergeFactor 1000 too high: I'll change that back to 10. I thought it would make Lucene use more RAM before starting IO.
b) ramBufferSize: OK, or maybe more. I'll keep that in mind.
c) solrconfig.xml - default and main index: I've always changed both sections, the default and the main index one.
d) JDBC batch size: I haven't set it. I'll do that.
e) DB server performance: I agree, ping is definitely not much information. I also ran queries against it from my own computer (while the indexer ran), and they came back as fast as usual. Currently I don't have a login to ssh to that machine, but I'm going to try to get one.
f) Network: I'll definitely need to have a look at that once I have access to the db machine.
g) the data
g.1) nested entity in DIH conf: there is only the root and one nested entity. However, that nested entity returns multiple rows (about 10) for one query. (Fetched rows is about 10 times the number of processed documents.)
g.2) my custom EntityProcessor (the code is pasted at the very end of this e-mail):
- iterates over those multiple rows,
- uses one column to create a key in a map,
- uses two other columns to create the corresponding value (string concatenation),
- if a key already exists, it gets the value; if that value is a list, it adds the new value to that list; if it's not a list, it creates one and adds the old and the new value to it.
I refrained from adding any business logic to that processor. It treats all rows alike, no matter whether they hold values that can appear multiple times or values that must appear only once.
g.3) the two transformers
- one to split one value into two (regex):

<field column="person" />
<field column="participant" sourceColName="person" regex="([^\|]+)\|.*" />
<field column="role" sourceColName="person" regex="[^\|]+\|\d+,\d+,\d+,(.*)" />

- one to extract a number from an existing number (bit calculation using the script transformer).
As that one works on a field that is potentially multiValued, it needs to take care of creating and populating a list as well.

<field column="cat" name="cat" />
<script><![CDATA[
function getMainCategory(row) {
    var cat = row.get('cat');
    var mainCat;
    if (cat != null) {
        // check whether cat is an array
        if (cat instanceof java.util.List) {
            var arr = new java.util.ArrayList();
            for (var i = 0; i < cat.size(); i++) {
                mainCat = new java.lang.Integer(cat.get(i) >> 8);
                if (!arr.contains(mainCat)) {
                    arr.add(mainCat);
                }
            }
            row.put('maincat', arr);
        } else {
            // it is a single value
            var mainCat = new java.lang.Integer(cat >> 8);
            row.put('maincat', mainCat);
        }
    }
    return row;
}
]]></script>

(The EpgValueEntityProcessor decides on creating lists on a case-by-case basis: only if a value is specified multiple times for a certain data set does it create a list. This is because I didn't want to put any complex configuration or business logic into it.)

g.4) fields: the DIH extracts 5 fields from the root entity, 11 fields from the nested entity, and the transformers might create 3 additional (multiValued) ones.
schema.xml defines 21 fields (two additional fields: the timestamp field (default=NOW) and a field collecting three other text fields for default search (using copyField)):
- 2 long
- 3 integer
- 3 sint
- 3 date
- 6 text_cs (class="solr.TextField" positionIncrementGap="100"):

<analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory" />
    <filter class="solr.WordDelimiterFilterFactory" splitOnCaseChange="0" generateWordParts="0" generateNumberParts="0" catenateWords="0" catenateNumbers="0" catenateAll="0" />
</analyzer>

- 4 text_de (one is the field populated by copying from the 3 others):

<analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory" />
    <filter class="solr.LengthFilterFactory" min="2" max="5000" />
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords_de.txt" />
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1" />
    <filter class="solr.LowerCaseFilterFactory" />
    <filter class="solr.SnowballPorterFilterFactory" language="German" />
    <filter class="solr.RemoveDuplicatesTokenFilterFactory" />
</analyzer>

Thank you for taking your time!

Cheers,
Chantal

*** EpgValueEntityProcessor.java ***

import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.logging.Logger;
import org.apache.solr.handler.dataimport.Context;
import org.apache.solr.handler.dataimport.SqlEntityProcessor;

public class
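[Editorial note: the EpgValueEntityProcessor paste above is cut off in the archive. The accumulation rule described in g.2 — first occurrence of a key stores a plain value, later occurrences promote it to a list — can be sketched roughly as follows; this is a made-up standalone class for illustration, not Chantal's actual code.]

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class MultiValueAccumulator {

    // Adds value under key: a single value stays a plain String;
    // a second value for the same key promotes the entry to a List.
    @SuppressWarnings("unchecked")
    public static void put(Map<String, Object> row, String key, String value) {
        Object existing = row.get(key);
        if (existing == null) {
            row.put(key, value);
        } else if (existing instanceof List) {
            ((List<Object>) existing).add(value);
        } else {
            List<Object> values = new ArrayList<>();
            values.add(existing);
            values.add(value);
            row.put(key, values);
        }
    }

    public static void main(String[] args) {
        Map<String, Object> row = new HashMap<>();
        put(row, "genre", "drama");
        put(row, "genre", "comedy");
        System.out.println(row.get("genre")); // [drama, comedy]
    }
}
```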
Functions in search result
Solr people,

Can I retrieve results from a function query? For instance, I have a schema in which all documents have a size-in-bytes field. For each query, I also need the sum of the bytes field over the returned documents. I know I can use sum as part of a function query, but I cannot figure out whether it even works for my case. I would prefer doing it in Solr and having the sum in the response header or somewhere similar, instead of iterating over the entire result set myself. Iterating over the result set would not really work for me anyway, since I also need paging through start= and rows= to limit the shown documents while still keeping the sum of bytes the same.

Regards,
- Markus Jelsma Buyways B.V. Tel. 050-3118123 Technisch Architect Friesestraatweg 215c Fax. 050-3118124 http://www.buyways.nl 9743 AD Groningen KvK 01074105
Re: How to configure Solr in Glassfish ?
On 7/20/09 11:08 PM, huenzhao wrote: Yes, I don't know how to set solr.home in Glassfish with CentOS. I tried to configure the solr.home, but the error log is: looking for solr.xml: /var/deploy/solr/solr.xml

Is that the appropriate path for your solr.home? What did you intend to set it to?

-- Ilan Rabinovitch i...@fonz.net --- SCALE 8x: 2010 Southern California Linux Expo Los Angeles, CA http://www.socallinuxexpo.org
Re: Rotating the primary shard in /solr/select
On Wed, Jul 29, 2009 at 2:57 AM, Phillip Farber pfar...@umich.edu wrote:

Is there any value in a round-robin scheme to cycle through the Solr instances supporting a multi-shard index over several machines when sending queries, or is it better to just pick one instance and stick with it? I'm assuming all machines in the cluster have the same hardware specs.

So scenario A (round-robin):
query 1: /solr-shard-1/select?q=dog... shards=shard-1,shard-2
query 2: /solr-shard-2/select?q=dog... shards=shard-1,shard-2
query 3: /solr-shard-1/select?q=dog... shards=shard-1,shard-2
etc.

or scenario B (fixed):
query 1: /solr-shard-1/select?q=dog... shards=shard-1,shard-2
query 2: /solr-shard-1/select?q=dog... shards=shard-1,shard-2
query 3: /solr-shard-1/select?q=dog... shards=shard-1,shard-2
etc.

Is there evidence that distributing the overhead of result merging over more machines (A) gives a performance boost?

We issue distributed search queries through a load balancer. So in effect, the merging server (or aggregator) keeps changing. I don't know if that leads to a performance boost or not, but I guess spreading the load is a good idea.

-- Regards, Shalin Shekhar Mangar.
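[Editorial note: scenario A boils down to rotating which instance receives — and therefore merges — each request, while the shards parameter stays constant. A minimal client-side sketch under that assumption, with made-up host names (in practice a load balancer, as Shalin describes, does the same job):]

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

public class RoundRobinShards {
    private final List<String> endpoints;
    private final AtomicInteger next = new AtomicInteger(0);

    public RoundRobinShards(List<String> endpoints) {
        this.endpoints = endpoints;
    }

    // Each call rotates the server that will do the merging;
    // the shards parameter itself is identical for every query.
    public String buildUrl(String q, String shardsParam) {
        int i = Math.floorMod(next.getAndIncrement(), endpoints.size());
        return endpoints.get(i) + "/select?q=" + q + "&shards=" + shardsParam;
    }

    public static void main(String[] args) {
        RoundRobinShards rr = new RoundRobinShards(Arrays.asList(
                "http://host1/solr-shard-1", "http://host2/solr-shard-2"));
        System.out.println(rr.buildUrl("dog", "shard-1,shard-2"));
        System.out.println(rr.buildUrl("dog", "shard-1,shard-2"));
    }
}
```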
Re: Rotating the primary shard in /solr/select
On Tue, Aug 4, 2009 at 11:26 AM, Rahul R rahul.s...@gmail.com wrote:

Philip, I cannot answer your question, but I do have a question for you. Does aggregation happen at the primary shard? For example, if I have three JVMs:
JVM 1: My application powered by Solr
JVM 2: Shard 1
JVM 3: Shard 2
I initialize my SolrServer like this:
SolrServer _solrServer = new CommonsHttpSolrServer(shard1);
Does aggregation now happen at JVM 2?

Yes.

Is there any other reason for initializing the SolrServer with one of the shard URLs?

The SolrServer is initialized to the server to which you want to send the request. It has nothing to do with distributed search by itself.

-- Regards, Shalin Shekhar Mangar.
Re: Rotating the primary shard in /solr/select
*The SolrServer is initialized to the server to which you want to send the request. It has nothing to do with distributed search by itself.*

But isn't the request sent to all the shards? We set all the shard URLs in the 'shards' parameter of our HttpRequest. Or is it something like: the request is first sent to the server (with which SolrServer is initialized), and from there it is sent to all the other shards?

Regards
Rahul

On Tue, Aug 4, 2009 at 2:29 PM, Shalin Shekhar Mangar shalinman...@gmail.com wrote: [...]
eternal optimize interrupted
Hi, last evening we started an optimize over our Solr index of 45GB. This morning the optimize was still running, discs spinning like crazy, and the index directory had grown to 83GB. We stopped and restarted Tomcat since Solr was unresponsive and we needed to query the index. Now I don't know what to do. How can I find out what ratio of the index is optimized, and how many nights will it take to finish?

Best regards, Thomas Koch, http://www.koch.ro
Re: Rotating the primary shard in /solr/select
On Tue, Aug 4, 2009 at 2:37 PM, Rahul R rahul.s...@gmail.com wrote:

But isn't the request sent to all the shards? We set all the shard URLs in the 'shards' parameter of our HttpRequest. Or is it something like: the request is first sent to the server (with which SolrServer is initialized), and from there it is sent to all the other shards?

The request is sent to the server with which SolrServer is initialized. That server makes use of the shards parameter, queries the other servers, merges the responses and sends the result back to the client.

-- Regards, Shalin Shekhar Mangar.
Re: Picking Facet Fields by Frequency-in-Results
And further on this, if you want a field automatically added to each document with the list of its field names, check out http://issues.apache.org/jira/browse/SOLR-1280

Erik

On Aug 4, 2009, at 1:01 AM, Avlesh Singh wrote:

I understand the general need here. And just extending what you suggested (indexing the field names themselves inside a multiValued field), you can perform a query like this:
/search?q=myquery&facet=true&facet.field=indexedfields&facet.field=field1&facet.field=field2&...&facet.sort=true
You'll get facets for all the fields (passed as multiple facet.field params), including the one that gives you field frequency. You can do all sorts of post-processing on this data to achieve the desired result. Hope this helps.

Cheers
Avlesh

On Tue, Aug 4, 2009 at 2:20 AM, Chris Harris rygu...@gmail.com wrote:

One task when designing a facet-based UI is deciding which fields to facet on and display facets for. One possibility that I hope to explore is to determine which fields to facet on dynamically, based on the search results. In particular, I hypothesize that, for a somewhat heterogeneous index (heterogeneous in terms of which fields a given record might contain), the following rule might be helpful: facet on a given field to the extent that it is frequently set in the documents matching the user's search. For example, let's say my results look like this:

Doc A: f1: foo / f2: bar / f3: N/A / f4: N/A
Doc B: f1: foo2 / f2: N/A / f3: N/A / f4: N/A
Doc C: f1: foo3 / f2: quiz / f3: N/A / f4: buzz
Doc D: f1: foo4 / f2: question / f3: bam / f4: bing

The field usage information for these documents could be summarized like this:

field f1: set in 4 docs
field f2: set in 3 docs
field f3: set in 1 doc
field f4: set in 2 docs

If I were choosing facet fields based on the above rule, I would definitely want to display facets for field f1, since it occurs in all documents. If I had room for another facet in the UI, I would facet f2. If I wanted another one, I'd go with f4, since it's more popular than f3. I probably would ignore f3 in any case, because it's set for only one document.

Has anyone implemented such a scheme with Solr? Any success? (The closest thing I can find is http://wiki.apache.org/solr/ComplexFacetingBrainstorming, which tries to pick which facets to display based not on frequency but more on a ruleset.)

As far as implementation, the most straightforward approach (which wouldn't involve modifying Solr) would apparently be to add a new multiValued field, indexedfields, to each document, which would note which fields actually have a value for each document. So when I pass data to Solr at indexing time, it will look something like this (except of course it will be in valid Solr XML, rather than this schematic):

Doc A: f1: foo / f2: bar / indexedfields: f1, f2
Doc B: f1: foo2 / indexedfields: f1
Doc C: f1: foo3 / f2: quiz / f4: buzz / indexedfields: f1, f2, f4
Doc D: f1: foo4 / f2: question / f3: bam / f4: bing / indexedfields: f1, f2, f3, f4

Then to choose which facets to display, I call
http://myserver/solr/search?q=myquery&facet=true&facet.field=indexedfields&facet.sort=true
and use the frequency information from this query to determine which fields to display in the faceting UI. (To get the actual facet information for those fields, I would query Solr a second time.) Are there any alternatives that would be easier or more efficient?

Thanks,
Chris
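[Editorial note: the selection step Chris describes — rank fields by how many matching documents set them, drop near-empty ones — is simple to do client-side once the counts for the indexedfields facet have been parsed. A sketch under those assumptions, with made-up class and method names:]

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class FacetFieldPicker {

    // counts: field name -> number of matching docs that set the field,
    // e.g. parsed from the facet.field=indexedfields section of the response.
    public static List<String> pick(Map<String, Integer> counts, int maxFields) {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        // most frequently set fields first
        entries.sort((a, b) -> Integer.compare(b.getValue(), a.getValue()));
        List<String> chosen = new ArrayList<>();
        for (Map.Entry<String, Integer> e : entries) {
            if (chosen.size() >= maxFields) break;
            if (e.getValue() <= 1) continue; // ignore fields set in only one doc
            chosen.add(e.getKey());
        }
        return chosen;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        counts.put("f1", 4);
        counts.put("f2", 3);
        counts.put("f3", 1);
        counts.put("f4", 2);
        System.out.println(pick(counts, 3)); // [f1, f2, f4]
    }
}
```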
Re: Rotating the primary shard in /solr/select
Shalin, thank you for the clarification. Philip, I just realized that I have diverted the original topic of the thread. My apologies.

Regards
Rahul

On Tue, Aug 4, 2009 at 3:35 PM, Shalin Shekhar Mangar shalinman...@gmail.com wrote: [...]
Synonym aware string field type
Hi all, I'd like to have a string type which is synonym aware at query time. Is it OK to have something like this:

<fieldType name="sastring" class="solr.StrField">
    <analyzer type="query">
        <tokenizer class="solr.KeywordTokenizerFactory"/>
        <filter class="solr.SynonymFilterFactory" tokenizerFactory="solr.KeywordTokenizerFactory" synonyms="my_synonyms.txt" ignoreCase="true"/>
    </analyzer>
</fieldType>

My questions are:
- Will the index time analyzer stay the default for the type solr.StrField?
- Is the KeywordTokenizerFactory the right one to use for the query time analyzer?

Cheers!
Jerome.

-- Jerome Eteve. Chat with me live at http://www.eteve.net jer...@eteve.net
Re: ClassCastException from custom request handler
Solr version: 1.3.0 694707

solrconfig.xml:
<requestHandler name="livecores" class="LiveCoresHandler" />

public class LiveCoresHandler extends RequestHandlerBase {
    public void init(NamedList args) { }
    public String getDescription() { return ""; }
    public String getSource() { return ""; }
    public String getSourceId() { return ""; }
    public NamedList getStatistics() { return new NamedList(); }
    public String getVersion() { return ""; }

    public void handleRequestBody(SolrQueryRequest req, SolrQueryResponse rsp) {
        Collection<String> names = req.getCore().getCoreDescriptor().getCoreContainer().getCoreNames();
        rsp.add("cores", names);
        // if the cores are dynamic, you prob don't want to cache
        rsp.setHttpCaching(false);
    }
}

2009/8/4 Avlesh Singh avl...@gmail.com

I'm sure I have the class name right - changing it to something patently incorrect results in the expected org.apache.solr.common.SolrException: Error loading class ..., rather than the ClassCastException.

You are right about that, James. Which Solr version are you using? Can you please paste the relevant pieces of your solrconfig.xml and the request handler class you have created?

Cheers
Avlesh

On Mon, Aug 3, 2009 at 10:51 PM, James Brady james.colin.br...@gmail.com wrote:

Hi, Thanks for your suggestions! I'm sure I have the class name right - changing it to something patently incorrect results in the expected org.apache.solr.common.SolrException: Error loading class ..., rather than the ClassCastException. I did have some problems getting my class on the app server's classpath. I'm running with solr.home set to multicore, but creating a multicore/lib directory and putting my request handler class in there resulted in Error loading class errors. I found that setting jetty.class.path to include multicore/lib (and also explicitly point at Solr's core and common JARs) fixed the Error loading class errors, leaving these ClassCastExceptions...

2009/8/3 Avlesh Singh avl...@gmail.com

Can you cross-check the class attribute for your handler in solrconfig.xml? My guess is that it is specified as solr.LiveCoresHandler. It should be the fully qualified class name - com.foo.path.to.LiveCoresHandler - instead. Moreover, I am damn sure that you did not forget to drop your jar into solr.home/lib. Checking once again might not be a bad idea :)

Cheers
Avlesh

On Mon, Aug 3, 2009 at 9:11 PM, James Brady james.colin.br...@gmail.com wrote:

Hi, I'm creating a custom request handler to return a list of live cores in Solr. On startup, I get this exception for each core:

Jul 31, 2009 5:20:39 PM org.apache.solr.common.SolrException log
SEVERE: java.lang.ClassCastException: LiveCoresHandler
at org.apache.solr.core.RequestHandlers$1.create(RequestHandlers.java:152)
at org.apache.solr.core.RequestHandlers$1.create(RequestHandlers.java:161)
at org.apache.solr.util.plugin.AbstractPluginLoader.load(AbstractPluginLoader.java:140)
at org.apache.solr.core.RequestHandlers.initHandlersFromConfig(RequestHandlers.java:169)
at org.apache.solr.core.SolrCore.init(SolrCore.java:444)

I've tried a few variations on the class definition, including extending RequestHandlerBase (as suggested here: http://wiki.apache.org/solr/SolrRequestHandler#head-1de7365d7ecf2eac079c5f8b92ee9af712ed75c2 ) and implementing SolrRequestHandler directly. I'm sure that the Solr libraries I built against and those I'm running on are the same version too, as I unzipped the Solr war file and copied the relevant jars out of there to build against. Any ideas on what could be causing the ClassCastException? I've attached a debugger to the running Solr process, but it didn't shed any light on the issue...

Thanks!
James

-- http://twitter.com/goodgravy 512 300 4210 http://webmynd.com/ Sent from Bury, United Kingdom

-- http://twitter.com/goodgravy 512 300 4210 http://webmynd.com/ Sent from Bury, United Kingdom
Re: ClassCastException from custom request handler
what is the package of LiveCoresHandler? I guess the requestHandler name should be name="/livecores"

On Tue, Aug 4, 2009 at 5:04 PM, James Brady james.colin.br...@gmail.com wrote: [...]

-- - Noble Paul | Principal Engineer | AOL | http://aol.com
Re: ClassCastException from custom request handler
Hi, the LiveCoresHandler is in the default package - the behaviour's the same if I have it in a properly namespaced package too... The requestHandler name can either be a path (starting with '/') or a qt name: http://wiki.apache.org/solr/SolrRequestHandler

2009/8/4 Noble Paul നോബിള് नोब्ळ् noble.p...@corp.aol.com wrote:

what is the package of LiveCoresHandler? I guess the requestHandler name should be name="/livecores"

On Tue, Aug 4, 2009 at 5:04 PM, James Brady james.colin.br...@gmail.com wrote: [...]

-- http://twitter.com/goodgravy 512 300 4210 http://webmynd.com/ Sent from Bury, United Kingdom
Solr 1.4 schedule?
Hi, When is Solr 1.4 scheduled for release? Is there any ballpark date yet? Thanks Rob
Delete solr data from disk space
I am facing a problem in deleting Solr data from disk space. I had 80GB of Solr data. I deleted 30% of this data by using a query in the solr-php client and committed. Now the deleted data is not visible from the Solr UI, but the used disk space is still 80GB. Please reply if you have any solution to free the disk space after deleting some Solr data. Thanks in advance.

-- View this message in context: http://www.nabble.com/Delete-solr-data-from-disk-space-tp24808676p24808676.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Solr 1.4 schedule?
Very soon, I think, is the answer. As well as when it's ready. Solr 1.4 is waiting for the next release of Lucene, which is very soon. Once Lucene comes out, Solr will follow in a week or two, barring release issues. Also, if you look at JIRA:
http://issues.apache.org/jira/secure/IssueNavigator.jspa?reset=true&mode=hide&sorter/order=DESC&sorter/field=priority&resolution=-1&pid=12310230&fixfor=12313351
you can see that there are 34 open issues still assigned to 1.4.

Eric

On Tue, Aug 4, 2009 at 8:08 AM, Robert Young r...@roryoung.co.uk wrote: Hi, When is Solr 1.4 scheduled for release? Is there any ballpark date yet? Thanks Rob
Re: Delete solr data from disk space
Hello, A rigorous but quite effective method is manually deleting the files in your SOLR_HOME/data directory and reindexing the documents you want to keep. This will surely free some disk space.

Cheers,
- Markus Jelsma Buyways B.V. Tel. 050-3118123 Technisch Architect Friesestraatweg 215c Fax. 050-3118124 http://www.buyways.nl 9743 AD Groningen KvK 01074105

On Tue, 2009-08-04 at 06:26 -0700, Ashish Kumar Srivastava wrote: I am facing a problem in deleting Solr data from disk space. I had 80GB of Solr data. I deleted 30% of this data by using a query in the solr-php client and committed. Now the deleted data is not visible from the Solr UI, but the used disk space is still 80GB. Please reply if you have any solution to free the disk space after deleting some Solr data. Thanks in advance.
Re: Delete solr data from disk space
Sorry!! But this solution will not work, because I deleted the data by a certain query. Then how can I know which files should be deleted? I can't delete the whole data set.
Re: Delete solr data from disk space
Hi, Sorry!! But this solution will not work, because I deleted the data by a certain query. Then how can I know which files should be deleted? I can't delete the whole data set.

Markus Jelsma - Buyways B.V. wrote: [...]
Re: Delete solr data from disk space
You simply can't delete individual index files.

Otis -- Sematext is hiring -- http://sematext.com/about/jobs.html?mls Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR

- Original Message From: Ashish Kumar Srivastava ashu.impe...@gmail.com To: solr-user@lucene.apache.org Sent: Tuesday, August 4, 2009 9:41:09 AM Subject: Re: Delete solr data from disk space [...]
Error with UpdateRequestProcessorFactory
Hi folks, I'm having a problem with a custom handler on my Solr. The whole application works fine, but when I do a new checkout from svn and generate a jar file with my handler, I get: SEVERE: java.lang.NoSuchMethodError: org.apache.solr.core.SolrCore.getUpdateProcessorFactory(Ljava/lang/String;)Lorg/apache/solr/update/processor/UpdateRequestProcessorFactory; I checked the versions of my libs and they're OK. I'm using Solr 1.3 and the environment is the same one that worked previously. Does anyone have an idea of what it could be? Thanks! Cheers, -- Daniel Cassiano _ http://www.apontador.com.br/ http://www.maplink.com.br/
Re: Delete solr data from disk space
Hi Ashish, Have you optimized your index? When you delete documents in Lucene they are simply marked as 'deleted'; they aren't physically removed from the disk. To get the disk space back you must run an optimize, which re-writes the index out to disk without the deleted documents, then deletes the original. Toby -- Toby Cole Software Engineer, Semantico Limited Registered in England and Wales no. 03841410, VAT no. GB-744614334. Registered office Lees House, 21-23 Dyke Road, Brighton BN1 3FE, UK. Check out all our latest news and thinking on the Discovery blog http://blogs.semantico.com/discovery-blog/
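The optimize Toby describes can be triggered with a plain update message; a sketch, assuming the default example URL and port (adjust for your install):

```xml
<!-- POST this to http://localhost:8983/solr/update to rewrite the index
     without the deleted documents -->
<optimize/>
```

For example: curl http://localhost:8983/solr/update --data-binary '&lt;optimize/&gt;' -H 'Content-Type: text/xml; charset=utf-8'. Be aware that an optimize can temporarily need roughly double the index size on disk while the merged copy is being written.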
Re: Synonym aware string field type
Hi, KeywordTokenizer will not tokenize your string. I have a feeling that won't work with synonyms unless your field value entirely matches a synonym. Maybe an example will help. If you have: foo canine bar then KeywordTokenizer won't break this into 3 tokens, and so a canine/dog synonym won't work. Yes, if you define the analyzer like that, it will be used both at index and query time. Otis - Original Message From: Jérôme Etévé jerome.et...@gmail.com To: solr-user@lucene.apache.org Sent: Tuesday, August 4, 2009 7:33:28 AM Subject: Synonym aware string field type Hi all, I'd like to have a string type which is synonym aware at query time. Is it OK to have something like that: tokenizerFactory=solr.KeywordTokenizerFactory synonyms=my_synonyms.txt ignoreCase=true/ My questions are: - Will the index time analyzer stay the default for the type solr.StrField? - Is the KeywordTokenizerFactory the right one to use for the query time analyzer? Cheers! Jerome. -- Jerome Eteve. Chat with me live at http://www.eteve.net jer...@eteve.net
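A schema.xml sketch of what Jerome describes (untested; the type name is an invented example). Note that the type has to be solr.TextField, since solr.StrField does not accept custom analyzers:

```xml
<!-- Hypothetical field type: keyword-tokenized, synonyms applied only at query time -->
<fieldType name="string_syn" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="my_synonyms.txt"
            ignoreCase="true" expand="true"/>
  </analyzer>
</fieldType>
```

As Otis points out, with KeywordTokenizer the synonym filter only fires when the whole field value matches a synonym entry.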
Re: Functions in search result
On Aug 4, 2009, at 4:37 AM, Markus Jelsma - Buyways B.V. wrote: Solr people, Can I retrieve results from a function query? For instance, I have a schema in which all documents have a size-in-bytes field. For each query, I also need the sum of the bytes field for the returned documents. I know I can use SUM as part of a function query but I cannot figure out if it even works for me. In short, no. However, see https://issues.apache.org/jira/browse/SOLR-1298 as you are not alone in wanting this. I prefer doing it with Solr and having the sum in the response header or somewhere similar instead of iterating over the entire result set myself. Also, iterating over the result set would not really work for me either, since I also need paging through start= and rows= to limit the shown documents while still keeping the sum of bytes the same. Regards, - Markus Jelsma Buyways B.V. Tel. 050-3118123 Technisch Architect Friesestraatweg 215c Fax. 050-3118124 http://www.buyways.nl 9743 AD Groningen KvK 01074105 -- Grant Ingersoll http://www.lucidimagination.com/ Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using Solr/Lucene: http://www.lucidimagination.com/search
Re: JVM Heap utilization Memory leaks with Solr
Hi Rahul, A) There are no known (to me) memory leaks. I think there are too many variables for a person to tell you what exactly is happening, plus you are dealing with the JVM here. :) Try jmap -histo:live PID-HERE | less and see what's using your memory. Otis - Original Message From: Rahul R rahul.s...@gmail.com To: solr-user@lucene.apache.org Sent: Tuesday, August 4, 2009 1:09:06 AM Subject: JVM Heap utilization Memory leaks with Solr I am trying to track memory utilization with my application that uses Solr. Details of the setup: - 3rd party software: Solaris 10, Weblogic 10, jdk_150_14, Solr 1.3.0 - Hardware: 12 CPU, 24 GB RAM For testing during PSR I am using a smaller subset of the actual data that I want to work with. Details of this smaller subset: - 5 million records, 4.5 GB index size Observations during PSR: A) I have allocated 3.2 GB for the JVM(s) that I used. After all users log out and after a forced GC, only 60% of the heap is reclaimed. As part of the logout process I am invalidating the HttpSession and doing a close() on CoreContainer. From my application's side, I don't believe I am holding on to any resource. I wanted to know if there are known issues surrounding memory leaks with Solr? B) To further test this, I tried deploying with shards. 3.2 GB was allocated to each JVM. All JVMs had 96% free heap space after start up. I got varying results with this. Case 1: Used 6 Weblogic domains. My application was deployed on 1 domain. I split the 5 million index into 5 parts of 1 million each and used them as shards. After multiple users used the system and after a forced GC, around 94-96% of heap was reclaimed in all the JVMs. Case 2: Used 2 Weblogic domains. My application was deployed on 1 domain. On the other, I deployed the entire 5 million document index as one shard.
After multiple users used the system and after a forced GC, around 76% of the heap was reclaimed in the shard JVM, and 96% was reclaimed in the JVM where my application was running. This result further convinces me that my application can be absolved of holding on to memory resources. I am not sure how to interpret these results. For searching, I am using: without shards, EmbeddedSolrServer; with shards, CommonsHttpSolrServer. In terms of Solr objects this is what differs in my code between normal search and shards search (distributed search). After looking at Case 1, I thought that the CommonsHttpSolrServer was more memory efficient, but Case 2 proved me wrong. Or could there still be memory leaks in my application? Any thoughts or suggestions would be welcome. Regards Rahul
Re: ClassCastException from custom request handler
There is *something* strange going on with classloaders; when I put my .class files in the right place in WEB-INF/lib in a repackaged solr.war file, it's not found by the plugin loader (Error loading class). So the plugin classloader isn't seeing stuff inside WEB-INF/lib. That explains why the plugin loader sees my class files when I point jetty.class.path at the right directory, but in that situation I also need to point jetty.class.path at the Solr JARs explicitly. Still, how would ClassCastExceptions be caused by class loader paths not being set correctly? I don't follow you... To get a ClassCastException, the class to cast to must have been found. The cast-to class must either not be in the object's inheritance hierarchy, or be built against a different version, no? 2009/8/4 Noble Paul നോബിള് नोब्ळ् noble.p...@corp.aol.com I guess this is a classloader issue. It is worth trying to put it in the WEB-INF/lib of the solr.war On Tue, Aug 4, 2009 at 5:35 PM, James Brady james.colin.br...@gmail.com wrote: Hi, the LiveCoresHandler is in the default package - the behaviour's the same if I have it in a properly namespaced package too... The requestHandler name can either be a path (starting with '/') or a qt name: http://wiki.apache.org/solr/SolrRequestHandler starting w/ '/' helps in accessing it directly 2009/8/4 Noble Paul നോബിള് नोब्ळ् noble.p...@corp.aol.com what is the package of LiveCoresHandler ?
I guess the requestHandler name should be name=/livecores On Tue, Aug 4, 2009 at 5:04 PM, James Brady james.colin.br...@gmail.com wrote: Solr version: 1.3.0 694707 solrconfig.xml: <requestHandler name="livecores" class="LiveCoresHandler" /> public class LiveCoresHandler extends RequestHandlerBase { public void init(NamedList args) { } public String getDescription() { return ""; } public String getSource() { return ""; } public String getSourceId() { return ""; } public NamedList getStatistics() { return new NamedList(); } public String getVersion() { return ""; } public void handleRequestBody(SolrQueryRequest req, SolrQueryResponse rsp) { Collection<String> names = req.getCore().getCoreDescriptor().getCoreContainer().getCoreNames(); rsp.add("cores", names); // if the cores are dynamic, you prob don't want to cache rsp.setHttpCaching(false); } } 2009/8/4 Avlesh Singh avl...@gmail.com I'm sure I have the class name right - changing it to something patently incorrect results in the expected org.apache.solr.common.SolrException: Error loading class ..., rather than the ClassCastException. You are right about that, James. Which Solr version are you using? Can you please paste the relevant pieces of your solrconfig.xml and the request handler class you have created? Cheers Avlesh On Mon, Aug 3, 2009 at 10:51 PM, James Brady james.colin.br...@gmail.com wrote: Hi, Thanks for your suggestions! I'm sure I have the class name right - changing it to something patently incorrect results in the expected org.apache.solr.common.SolrException: Error loading class ..., rather than the ClassCastException. I did have some problems getting my class on the app server's classpath. I'm running with solr.home set to multicore, but creating a multicore/lib directory and putting my request handler class in there resulted in Error loading class errors.
I found that setting jetty.class.path to include multicore/lib (and also explicitly point at Solr's core and common JARs) fixed the Error loading class errors, leaving these ClassCastExceptions... 2009/8/3 Avlesh Singh avl...@gmail.com Can you cross check the class attribute for your handler in solrconfig.xml? My guess is that it is specified as solr.LiveCoresHandler. It should be fully qualified class name - com.foo.path.to.LiveCoresHandler instead. Moreover, I am damn sure that you did not forget to drop your jar into solr.home/lib. Checking once again might not be a bad idea :) Cheers Avlesh On Mon, Aug 3, 2009 at 9:11 PM, James Brady james.colin.br...@gmail.com wrote: Hi, I'm creating a custom request handler to return a list of live cores in Solr. On startup, I get this exception for each core: Jul 31, 2009 5:20:39 PM org.apache.solr.common. SolrException log SEVERE: java.lang.ClassCastException: LiveCoresHandler at org.apache.solr.core.RequestHandlers$1.create(RequestHandlers.java:152) at org.apache.solr.core.RequestHandlers$1.create(RequestHandlers.java:161) at
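A small self-contained sketch of the failure mode Noble is hinting at: the same .class file loaded by two unrelated classloaders yields two distinct runtime types, so a cast from one to the other fails even though the names match. (ClassLoaderDemo and Plugin are invented names for illustration.)

```java
import java.net.URL;
import java.net.URLClassLoader;

public class ClassLoaderDemo {
    public static class Plugin {}

    public static void main(String[] args) throws Exception {
        // Where this code was compiled to (a directory or jar URL)
        URL here = ClassLoaderDemo.class.getProtectionDomain()
                                        .getCodeSource().getLocation();
        // Two sibling loaders over the same location, sharing no parent
        // except the bootstrap loader
        try (URLClassLoader a = new URLClassLoader(new URL[]{here}, null);
             URLClassLoader b = new URLClassLoader(new URL[]{here}, null)) {
            Object fromA = a.loadClass("ClassLoaderDemo$Plugin")
                            .getDeclaredConstructor().newInstance();
            Class<?> viaB = b.loadClass("ClassLoaderDemo$Plugin");
            // Same class file, different defining loader: not the same type
            System.out.println(viaB.isInstance(fromA));
            try {
                viaB.cast(fromA);
            } catch (ClassCastException e) {
                System.out.println("ClassCastException as expected");
            }
        }
    }
}
```

This is why a plugin loader can find a class yet still fail the cast to SolrRequestHandler: if the Solr JARs end up both on Jetty's class path and inside the webapp, SolrRequestHandler itself can get loaded twice by different loaders.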
Re: Functions in search result
Markus, As far as I know, functions are executed on a per-document/field basis. That is, I don't think any of them aggregate numeric field values from a result set. Otis
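For what it's worth, the StatsComponent on trunk (slated for Solr 1.4) computes exactly this kind of aggregate over the whole matching result set, independent of start/rows paging. A hypothetical request, assuming the field is named bytes:

```
http://localhost:8983/solr/select?q=*:*&stats=true&stats.field=bytes&rows=10
```

The response then carries sum, min, max, and related figures for bytes in a stats section, while rows=10 still limits the documents actually returned, which matches Markus' paging requirement.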
Re: 99.9% uptime requirement
On Mon, 3 Aug 2009 13:15:44 -0700 Robert Petersen rober...@buy.com wrote: Thanks all, I figured there would be more talk about daemontools if there were really a need. I appreciate the input and for starters we'll put two slaves behind a load balancer and grow it from there. Robert, not taking away from daemon tools, but daemon tools won't help you if your whole server goes down. don't put all your eggs in one basket - several servers, load balancer (hardware load balancers x 2, haproxy, etc) and sure, use daemon tools to keep your services running within each server... B _ {Beto|Norberto|Numard} Meijome Why do you sit there looking like an envelope without any address on it? Mark Twain I speak for myself, not my employer. Contents may be hot. Slippery when wet. Reading disclaimers makes you go blind. Writing them is worse. You have been Warned.
Re: ClassCastException from custom request handler
Hi James! You cannot be sure that it sees *your* files. It only sees a class that qualifies with the name that is requested in your code. It's obviously not the class the code expects, though, as it results in a ClassCastException at some point. It might help to have a look at where and why that casting went wrong. I wrote a custom EntityProcessor and deployed it first under WEB-INF/classes, and now in the plugin directory, and that worked without a problem. My first guess is that something with your packaging is wrong. What do you mean by default package? What is the full name of your class, and what does its path in the file system look like? Can you paste the stack trace of the exception? Chantal
Wild card search does not return any result
Hello All, I have two fields: <field name="BUS" type="text" indexed="true" stored="true"/> <field name="ROLE" type="text" indexed="true" stored="true"/> I have a document (which has been indexed) that has a value of ICS for the BUS field and SSE for the ROLE field. When I search for q=BUS:ics I get the result, but if I search for q=BUS:ics* I don't get any match (or result). When I search for q=ROLE:sse or q=ROLE:sse*, both times I get the result. Why does BUS:ics* not return any result? I have the default configuration for the text field type, see below.
<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- in this example, we will only use synonyms at query time
    <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
    -->
    <!-- Case insensitive stop word removal. enablePositionIncrements=true ensures
         that a 'gap' is left to allow for accurate phrase queries. -->
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>
Thanks/Regards, Parvez
Note: this is a re-post; looks like something went wrong the first time around.
Re: Error with UpdateRequestProcessorFactory
Are you using the released Solr 1.3 or some intermediate nightly build? The 1.3 release has the SolrCore.getUpdateProcessorChain(String) method. -- Regards, Shalin Shekhar Mangar.
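A NoSuchMethodError usually means the class actually loaded at runtime comes from a different (older or newer) jar than the one you compiled against. One way to check which jar wins, sketched with an invented class name; in Daniel's case the class to inspect would be org.apache.solr.core.SolrCore:

```java
public class WhereFrom {
    public static void main(String[] args) {
        // Replace with the class whose origin you want to verify,
        // e.g. org.apache.solr.core.SolrCore inside a Solr webapp
        Class<?> c = WhereFrom.class;
        System.out.println(c.getName() + " loaded from "
                + c.getProtectionDomain().getCodeSource().getLocation());
    }
}
```

Run this inside the same container as Solr (for instance from a debug request handler) so the same classloaders are in effect as at the time of the error.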
Re: ClassCastException from custom request handler
Hi Chantal! I've included a stack trace below. I've attached a debugger to the server starting up, and it is finding my class file as expected... I agree it looks like something wrong with how I've deployed the compiled code, but perhaps different Solr versions at compile time and run time? However, I've checked and rechecked that and can't see a problem! The actual ClassCastException is being thrown in an anonymous AbstractPluginLoader instance's create method: http://svn.apache.org/viewvc/lucene/solr/tags/release-1.3.0/src/java/org/apache/solr/util/plugin/AbstractPluginLoader.java?revision=695557 It's the cast to SolrRequestHandler which fails. Aug 4, 2009 4:24:25 PM org.apache.solr.util.plugin.AbstractPluginLoader load INFO: created /update/csv: org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper Aug 4, 2009 4:24:25 PM org.apache.solr.util.plugin.AbstractPluginLoader load INFO: created /admin/: org.apache.solr.handler.admin.AdminHandlers Aug 4, 2009 4:24:25 PM org.apache.solr.common.SolrException log SEVERE: java.lang.ClassCastException: com.jmsbrdy.LiveCoresHandler at org.apache.solr.core.RequestHandlers$1.create(RequestHandlers.java:152) at org.apache.solr.core.RequestHandlers$1.create(RequestHandlers.java:161) at org.apache.solr.util.plugin.AbstractPluginLoader.load(AbstractPluginLoader.java:140) at org.apache.solr.core.RequestHandlers.initHandlersFromConfig(RequestHandlers.java:169) at org.apache.solr.core.SolrCore.init(SolrCore.java:444) at org.apache.solr.core.CoreContainer.create(CoreContainer.java:323) at org.apache.solr.core.CoreContainer.load(CoreContainer.java:216) at org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:104) at org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:69) at org.mortbay.jetty.servlet.FilterHolder.doStart(FilterHolder.java:99) At the moment, my deployment is: 1. compile my single Java file from an Ant script (pointing at the Solr JARs from an exploded solr.war) 2.
copy that class file's directory tree (com/jmsbrdy/LiveCoresHandler.class) to a lib directory in the root of my Jetty install 3. add lib to Jetty's class path 4. add the Solr JARs from the exploded war to Jetty's class path 5. start the server Can you see any problems there?
Re: 99.9% uptime requirement
Right. You don't get to 99.9% by assuming that an 8 hour outage is OK. Design for continuous uptime, with plans for how long it takes to patch around a single point of failure. For example, if your load balancer is a single point of failure, make sure that you can redirect the front end servers to a single Solr server in much less than 8 hours. Also, think about your SLA. Can the search index be more than 8 hours stale? How quickly do you need to be able to replace a failed indexing server? You might be able to run indexing locally on each search server if they are lightly loaded. wunder
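Norberto's suggestion of two slaves behind haproxy might look something like this minimal sketch (hostnames, ports, and timeouts are invented; the ping handler path assumes the stock solrconfig.xml):

```
defaults
    mode http
    timeout connect 5s
    timeout client  30s
    timeout server  30s

frontend solr_in
    bind *:8080
    default_backend solr_slaves

backend solr_slaves
    balance roundrobin
    # take a slave out of rotation when its ping handler stops answering
    option httpchk GET /solr/admin/ping
    server slave1 10.0.0.11:8983 check
    server slave2 10.0.0.12:8983 check
```

As wunder points out, the balancer itself then becomes a single point of failure, hence the advice to run two of them.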
Re: Synonym aware string field type
Hi Otis, Thanks. Yep, this synonym behaviour is the one I want. So if I don't want the synonyms to be applied at index time, I need to specify an index time analyzer, right? Jerome. -- Jerome Eteve. Chat with me live at http://www.eteve.net jer...@eteve.net
Re: ClassCastException from custom request handler
Hi there, could it be that something with the Generics code in the plugin loader classes works not as expected? Citing for example http://stackoverflow.com/questions/372250/java-generics-arrays-and-the-classcastexception this is because Generics only provide type-safety at compile-time. Lines 80-84: @SuppressWarnings("unchecked") protected T create( ResourceLoader loader, String name, String className, Node node ) throws Exception { return (T) loader.newInstance( className, getDefaultPackages() ); } I am not sure what T is at runtime in this case. The subclass (anonymous in RequestHandlers line 139) replaces T with SolrRequestHandler. But what happens in the superclass? Is it using Object? Sorry, I'm not that deep into Generics. Chantal
Aug 4, 2009 4:24:25 PM org.apache.solr.util.plugin.AbstractPluginLoader load INFO: created /update/csv: org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper Aug 4, 2009 4:24:25 PM org.apache.solr.util.plugin.AbstractPluginLoader load INFO: created /admin/: org.apache.solr.handler.admin.AdminHandlers Aug 4, 2009 4:24:25 PM org.apache.solr.common.SolrException log SEVERE: java.lang.ClassCastException: com.jmsbrdy.LiveCoresHandler at org.apache.solr.core.RequestHandlers$1.create(RequestHandlers.java:152) at org.apache.solr.core.RequestHandlers$1.create(RequestHandlers.java:161) at org.apache.solr.util.plugin.AbstractPluginLoader.load(AbstractPluginLoader.java:140) at org.apache.solr.core.RequestHandlers.initHandlersFromConfig(RequestHandlers.java:169) at org.apache.solr.core.SolrCore.init(SolrCore.java:444) at org.apache.solr.core.CoreContainer.create(CoreContainer.java:323) at org.apache.solr.core.CoreContainer.load(CoreContainer.java:216) at org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:104) at org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:69) at org.mortbay.jetty.servlet.FilterHolder.doStart(FilterHolder.java:99) At the moment, my deployment is: 1. compile my single Java file from an Ant script (pointing at the Solr JARs from an exploded solr.war) 2. copy that class file's directory tree (com/jmsbrdy/LiveCoresHandler.class) to a lib in the root of my jetty install 3. add lib to Jetty's class path 4. add the Solr JARs from the exploded war to Jetty's class path 5. start the server Can you see any problems there? 2009/8/4 Chantal Ackermann chantal.ackerm...@btelligent.de Hi James! James Brady schrieb: There is *something* strange going on with classloaders; when I put my .class files in the right place in WEB-INF/lib in a repackaged solr.war file, it's not found by the plugin loader (Error loading class). So the plugin classloader isn't seeing stuff inside WEB-INF/lib. 
That explains why the plugin loader sees my class files when I point jetty.class.path at the right directory, but in that situation I also need to point jetty.class.path at the Solr JARs explicitly. you cannot be sure that it sees *your* files. It only sees a class that qualifies with the name that is requested in your code. It's obviously not the class the code expects, though - as it results in a ClassCastException at some point. It might help to have a look at where and why that casting went wrong. I wrote a custom EntityProcessor and deployed it first under WEB-INF/classes, and now in the plugin directory, and that worked without a problem. My first guess is that something with your packaging is wrong - what do you mean by default package? What is the full name of your class and what does its path in the file system look like? Can you paste the stack trace of the exception? Chantal Still, how would ClassCastExceptions be caused by class loader paths not being set correctly? I don't follow you... To get a ClassCastException, the class to cast to must have been found. The cast-to class must not be in the object's inheritance hierarchy, or be built against a different version, no? 2009/8/4 Noble Paul നോബിള് नोब्ळ् noble.p...@corp.aol.com I guess this is a classloader issue. it is worth trying to put it in the WEB-INF/lib of the solr.war On Tue, Aug 4, 2009 at 5:35 PM, James Bradyjames.colin.br...@gmail.com wrote: Hi, the LiveCoresHandler is in the default package - the
Re: Wild card search does not return any result
Could it be the same reason as described here: http://markmail.org/message/ts65a6jok3ii6nva Otis -- Sematext is hiring -- http://sematext.com/about/jobs.html?mls Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR - Original Message From: Mohamed Parvez par...@gmail.com To: solr-user@lucene.apache.org Sent: Tuesday, August 4, 2009 11:26:45 AM Subject: Wild card search does not return any result Hello All, I have two fields. I have a document (which has been indexed) that has a value of ICS for the BUS field and SSE for the ROLE field. When I search for q=BUS:ics I get the result, but if I search for q=BUS:ics* I don't get any match (or result). When I search for q=ROLE:sse or q=ROLE:sse*, both times I get the result. Why does BUS:ics* not return any result? I have the default configuration for the text field type, see below.

<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
  </analyzer>
</fieldType>

Thanks/Regards, Parvez Note : This is a re-post. Looks like something went wrong the first time around.
Re: Synonym aware string field typ
Yes, you need to specify one or the other then, index-time or query-time, depending on where you want your synonyms to kick in. Eh, hitting reply to this email used your personal email instead of solr-user@lucene.apache.org . Eh eh. Making it hard for people replying to keep the discussion on the list without doing extra work Otis -- Sematext is hiring -- http://sematext.com/about/jobs.html?mls Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR - Original Message From: Jérôme Etévé jerome.et...@gmail.com To: Otis Gospodnetic otis_gospodne...@yahoo.com Cc: solr-user@lucene.apache.org Sent: Tuesday, August 4, 2009 12:39:33 PM Subject: Re: Synonym aware string field typ Hi Otis, Thanks. Yep, this synonym behaviour is the one I want. So if I don't want the synonyms to be applied at index time, I need to specify an index time analyzer right ? Jerome. 2009/8/4 Otis Gospodnetic : Hi, KeywordTokenizer will not tokenize your string. I have a feeling that won't work with synonyms, unless your field value entirely match a synonym. Maybe an example would help: If you have: foo canine bar Then KeywordTokenizer won't break this into 3 tokens. And then canine/dog synonym won't work. Yes, if you define the analyzer like that, it will be used both at index and query time. Otis -- Sematext is hiring -- http://sematext.com/about/jobs.html?mls Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR - Original Message From: Jérôme Etévé To: solr-user@lucene.apache.org Sent: Tuesday, August 4, 2009 7:33:28 AM Subject: Synonym aware string field typ Hi all, I'd like to have a string type which is synonym aware at query time. Is it ok to have something like that: tokenizerFactory=solr.KeywordTokenizerFactory synonyms=my_synonyms.txt ignoreCase=true/ My questions are: - Will the index time analyzer stay the default for the type solr.StrField . - Is the KeywordTokenizerFactory the right one to use for the query time analyzer ? Cheers! Jerome. -- Jerome Eteve. 
Chat with me live at http://www.eteve.net jer...@eteve.net -- Jerome Eteve. Chat with me live at http://www.eteve.net jer...@eteve.net
Re: ClassCastException from custom request handler
Code is from AbstractPluginLoader in the solr plugin package, 1.3 (the regular stable release, no svn checkout). 80-84 @SuppressWarnings(unchecked) protected T create( ResourceLoader loader, String name, String className, Node node ) throws Exception { return (T) loader.newInstance( className, getDefaultPackages() ); }
Re: Synonym aware string field typ
2009/8/4 Otis Gospodnetic otis_gospodne...@yahoo.com: Yes, you need to specify one or the other then, index-time or query-time, depending on where you want your synonyms to kick in. Ok great. Thx ! Eh, hitting reply to this email used your personal email instead of solr-user@lucene.apache.org . Eh eh. Making it hard for people replying to keep the discussion on the list without doing extra work It did the same for me with your message. I had to click 'reply all' . Maybe it's a gmail problem. J. Otis -- Sematext is hiring -- http://sematext.com/about/jobs.html?mls Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR - Original Message From: Jérôme Etévé jerome.et...@gmail.com To: Otis Gospodnetic otis_gospodne...@yahoo.com Cc: solr-user@lucene.apache.org Sent: Tuesday, August 4, 2009 12:39:33 PM Subject: Re: Synonym aware string field typ Hi Otis, Thanks. Yep, this synonym behaviour is the one I want. So if I don't want the synonyms to be applied at index time, I need to specify an index time analyzer right ? Jerome. 2009/8/4 Otis Gospodnetic : Hi, KeywordTokenizer will not tokenize your string. I have a feeling that won't work with synonyms, unless your field value entirely match a synonym. Maybe an example would help: If you have: foo canine bar Then KeywordTokenizer won't break this into 3 tokens. And then canine/dog synonym won't work. Yes, if you define the analyzer like that, it will be used both at index and query time. Otis -- Sematext is hiring -- http://sematext.com/about/jobs.html?mls Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR - Original Message From: Jérôme Etévé To: solr-user@lucene.apache.org Sent: Tuesday, August 4, 2009 7:33:28 AM Subject: Synonym aware string field typ Hi all, I'd like to have a string type which is synonym aware at query time. 
Is it ok to have something like that: tokenizerFactory=solr.KeywordTokenizerFactory synonyms=my_synonyms.txt ignoreCase=true/ My questions are: - Will the index time analyzer stay the default for the type solr.StrField . - Is the KeywordTokenizerFactory the right one to use for the query time analyzer ? Cheers! Jerome. -- Jerome Eteve. Chat with me live at http://www.eteve.net jer...@eteve.net -- Jerome Eteve. Chat with me live at http://www.eteve.net jer...@eteve.net -- Jerome Eteve. Chat with me live at http://www.eteve.net jer...@eteve.net
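For the archives, Jérôme's conclusion (explicit index-time and query-time analyzers, with synonyms applied only on the query side) would look roughly like the following in schema.xml. The type name and synonyms file are illustrative, and this is an untested sketch:

```xml
<fieldType name="string_syn" class="solr.TextField" sortMissingLast="true" omitNorms="true">
  <!-- index time: keep the whole value as a single token, no synonyms -->
  <analyzer type="index">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
  </analyzer>
  <!-- query time: same single token, then expand synonyms -->
  <analyzer type="query">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="my_synonyms.txt" ignoreCase="true" expand="true"/>
  </analyzer>
</fieldType>
```

As Otis notes, a synonym only fires here when the entire field value matches a synonym entry, because KeywordTokenizer emits the whole value as one token.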
Re: ClassCastException from custom request handler
Yeah I was thinking T would be SolrRequestHandler too. Eclipse's debugger can't tell me... Lots of other handlers are created with no problem before my plugin falls over, so I don't think it's a problem with T not being what we expected. Do you know of any working examples of plugins I can download and build in my environment to see what happens? 2009/8/4 Chantal Ackermann chantal.ackerm...@btelligent.de Code is from AbstractPluginLoader in the solr plugin package, 1.3 (the regular stable release, no svn checkout). 80-84 @SuppressWarnings(unchecked) protected T create( ResourceLoader loader, String name, String className, Node node ) throws Exception { return (T) loader.newInstance( className, getDefaultPackages() ); } -- http://twitter.com/goodgravy 512 300 4210 http://webmynd.com/ Sent from Bury, United Kingdom
DisMax - fetching dynamic fields
Hi everybody, I have a couple of dynamic fields in my schema, e.g. rating_* popularity_* The problem I have is that if I try to specify existing fields rating_1 popularity_1 in fl parameter - DisMax handler just ignores them whereas StandardRequestHandler works fine. Any clues what's wrong? Thanks in advance, Alex
Re: DisMax - fetching dynamic fields
Solr 1.4 built from trunk revision 790594 ( 02 Jul 2009 ) On Tue, Aug 4, 2009 at 9:19 PM, Alexey Serbaase...@gmail.com wrote: Hi everybody, I have a couple of dynamic fields in my schema, e.g. rating_* popularity_* The problem I have is that if I try to specify existing fields rating_1 popularity_1 in fl parameter - DisMax handler just ignores them whereas StandardRequestHandler works fine. Any clues what's wrong? Thanks in advance, Alex
Re: ClassCastException from custom request handler
James Brady schrieb: Yeah I was thinking T would be SolrRequestHandler too. Eclipse's debugger can't tell me... You could try disassembling. Or Eclipse opens classes in a very rudimentary format when there is no source code attached. Maybe it shows the actual return value there, instead of T. Lots of other handlers are created with no problem before my plugin falls over, so I don't think it's a problem with T not being what we expected. Do you know of any working examples of plugins I can download and build in my environment to see what happens? No sorry. I've only overridden the EntityProcessor from DataImportHandler, and that is not configured in solrconfig.xml. 2009/8/4 Chantal Ackermann chantal.ackerm...@btelligent.de Code is from AbstractPluginLoader in the solr plugin package, 1.3 (the regular stable release, no svn checkout). 80-84 @SuppressWarnings(unchecked) protected T create( ResourceLoader loader, String name, String className, Node node ) throws Exception { return (T) loader.newInstance( className, getDefaultPackages() ); } -- http://twitter.com/goodgravy 512 300 4210 http://webmynd.com/ Sent from Bury, United Kingdom
Re: DIH: Any way to make update on db table?
Excellent, thanks Avlesh and Noble. -Jay On Mon, Aug 3, 2009 at 9:28 PM, Avlesh Singh avl...@gmail.com wrote: datasource.getData(update mytable ); //though the name is getData() it can execute update commands also Even when the dataSource is readOnly, Noble? Cheers Avlesh 2009/8/4 Noble Paul നോബിള് नोब्ळ् noble.p...@corp.aol.com If you are writing a Transformer (or any other component) you can get hold of a dataSource instance . datasource =Context#getDataSource(name). //then you can invoke datasource.getData(update mytable ); //though the name is getData() it can execute update commands also ensure that you do a datasource.close(); after you are done On Tue, Aug 4, 2009 at 9:40 AM, Avlesh Singhavl...@gmail.com wrote: Couple of things - 1. Your dataSource is probably in readOnly mode. It is possible to fire updates, by specifying readOnly=false in your dataSource. 2. What you are trying to achieve is typically done using a select for update. For MySql, here's the documentation - http://dev.mysql.com/doc/refman/5.0/en/innodb-locking-reads.html 3. You don't need to create a separate entity for firing updates. Writing a database procedure might be a good idea. In that case your query will simply be entity name=mainEntity query=call MyProcedure(); .../. All the heavy lifting can be done by this query. Moreover, update queries only return the number of rows affected and not a resultSet. DIH expects one and hence the exception. Cheers Avlesh On Tue, Aug 4, 2009 at 1:49 AM, Jay Hill jayallenh...@gmail.com wrote: Is it possible for the DataImportHandler to update records in the table it is querying? For example, say I have a query like this in my entity: query=select field1, field2 from someTable where hasBeenIndexed=false Is there a way I can mark each record processed by updating the hasBeenIndexed field? Here's a config I tried: ?xml version=1.0?
dataConfig dataSource type=JdbcDataSource driver=com.mysql.jdbc.Driver url=jdbc:mysql://localhost:3306/solrhacks user=user password=pass/ document name=testingDIHupdate entity name=mainEntity pk=id query=select id, name from tableToIndex where hasBeenIndexed=0 field column=id template=dihTestUpdate-${main.id}/ field column=name name=name/ entity name=updateEntity pk=id query=update tableToIndex set hasBeenIndexed=1 where id=${mainEntity.id} /entity /entity /document /dataConfig It does update the first record, but then an Exception is thrown: Aug 3, 2009 1:15:24 PM org.apache.solr.handler.dataimport.DocBuilder buildDocument SEVERE: Exception while processing: mainEntity document : SolrInputDocument[{id=id(1.0)={1}, name=name(1.0)={John Jones}}] org.apache.solr.handler.dataimport.DataImportHandlerException: Unable to execute query: update tableToIndex set hasBeenIndexed=1 where id=1 Processing Document # 1 at org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.init(JdbcDataSource.java:250) at org.apache.solr.handler.dataimport.JdbcDataSource.getData(JdbcDataSource.java:207) at org.apache.solr.handler.dataimport.JdbcDataSource.getData(JdbcDataSource.java:40) at org.apache.solr.handler.dataimport.SqlEntityProcessor.initQuery(SqlEntityProcessor.java:58) at org.apache.solr.handler.dataimport.SqlEntityProcessor.nextRow(SqlEntityProcessor.java:71) at org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:237) at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:344) at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:370) at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:225) at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:167) at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:333) at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:393) at 
org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:372) Caused by: java.lang.NullPointerException at org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.init(JdbcDataSource.java:248) ... 12 more -Jay -- - Noble Paul | Principal Engineer| AOL | http://aol.com
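Avlesh's first suggestion (readOnly=false) applied to Jay's posted config amounts to a one-attribute change on the dataSource; everything else stays as posted:

```xml
<dataSource type="JdbcDataSource" driver="com.mysql.jdbc.Driver"
            url="jdbc:mysql://localhost:3306/solrhacks"
            user="user" password="pass" readOnly="false"/>
```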
Re: Wild card search does not return any result
Thanks Otis, The thread suggests that this is a bug http://markmail.org/message/ts65a6jok3ii6nva#query:+page:1+mid:qinymqdn6mkocv4k Both SSE and ICS are 3-letter words and both are not part of the English language. SSE* works fine and ICS* does not work; this is surely a bug. Any idea when this bug will be fixed, or if there is any workaround? Thanks/Regards, Parvez GV : 786-693-2228 On Tue, Aug 4, 2009 at 11:48 AM, Otis Gospodnetic otis_gospodne...@yahoo.com wrote: Could it be the same reason as described here: http://markmail.org/message/ts65a6jok3ii6nva Otis -- Sematext is hiring -- http://sematext.com/about/jobs.html?mls Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR - Original Message From: Mohamed Parvez par...@gmail.com To: solr-user@lucene.apache.org Sent: Tuesday, August 4, 2009 11:26:45 AM Subject: Wild card search does not return any result Hello All, I have two fields. I have document(which has been indexed) that has a value of ICS for BUS field and SSE for ROLE filed When I search for q=BUS:ics i get the result, but if i search for q=BUS:ics* i don't get any match (or result) when I search for q=ROLE:sse or q=ROLE:sse*, both the times I get the result. why BUS:ics* does not return any result ? I have the default configuration for text filed, see below. positionIncrementGap=100 ignoreCase=true words=stopwords.txt enablePositionIncrements=true / generateWordParts=1 generateNumberParts=1 catenateWords=1 catenateNumbers=1 catenateAll=0 splitOnCaseChange=1/ protected=protwords.txt/ ignoreCase=true expand=true/ words=stopwords.txt/ generateWordParts=1 generateNumberParts=1 catenateWords=0 catenateNumbers=0 catenateAll=0 splitOnCaseChange=1/ protected=protwords.txt/ Thanks/Regards, Parvez Note : This is a re-post. looks like something went wrong the first time around.
Dynamic Configuration
I have a client who is interested in using Solr/Lucene as their search engine. So far I think it meets 85% of their requirements. I have decided to integrate with JAMon to provide statistical/performance analysis at run-time. The piece I am still missing is dynamic configuration of the indexing engine. Is it possible to programmatically control such things as what fields are indexed based on content type, weights, etc? The key requirement is that these should be modifiable without restarting the server. I thought I may be able to provide this through JMX but these attributes seem to be read-only. Pete -- View this message in context: http://www.nabble.com/Dynamic-Configuration-tp24814729p24814729.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Error with UpdateRequestProcessorFactory
Hi Shalin, On Tue, Aug 4, 2009 at 12:43 PM, Shalin Shekhar Mangarshalinman...@gmail.com wrote: I'm having some problem with a custom handler on my Solr. All the application works fine, but when I do a new checkout from svn and generate a jar file with my handler, I got: SEVERE: java.lang.NoSuchMethodError: org.apache.solr.core.SolrCore.getUpdateProcessorFactory(Ljava/lang/String;)Lorg/apache/solr/update/processor/UpdateRequestProcessorFactory; I checked versions of my libs and they're ok. I'm using Solr 1.3 and the environment is the same that worked previously. Are you using the released Solr 1.3 or some intermediate nightly build? The 1.3 release has the SolrCore.getUpdateProcessorChain(String) method. You are right. I was using some nightly build. I changed to the released 1.3 and it works. Thanks! -- Daniel Cassiano _ Page: http://danielcassiano.net/ http://www.umitproject.org/
RE: facet sorting by index on sint fields
To solve this issue I created a subclass of SortableIntField that overrides the getSortField() method as follows... @Override public SortField getSortField(SchemaField field, boolean reverse) { return new SortField(field.getName(), SortField.INT, reverse); } I'm not really sure of the impact of this change but it seems to now do what I want. I'm curious as to why the SortableIntField supplied with SOLR uses SortField.STRING here. I found some references to it in solr-dev but no conclusions. If anyone has any thoughts about the impact of this change, or why it is not like this by default I'd be very interested to hear. Thanks, Simon -Original Message- From: Simon Stanlake [mailto:sim...@tradebytes.com] Sent: Thursday, July 30, 2009 7:28 PM To: 'solr-user@lucene.apache.org' Subject: facet sorting by index on sint fields Hi, I have a field in my schema specified using field name=wordCount type=sint/ Where sint is specified as follows (the default from schema.xml) fieldType name=sint class=solr.SortableIntField sortMissingLast=true omitNorms=true/ When I do a facet on this field using sort=index I always get the values back in lexicographic order. Eg: adding this to a query string... facet=truefacet.field=wordCountf.wordCount.facet.sort=index gives me lst name=wordCount int name=15/int int name=102/int int name=26/int ... Is this a current limitation of solr faceting or am I missing a configuration step somewhere? I couldn't find any notes in the docs about this. Cheers, Simon
Re: facet sorting by index on sint fields
On Thu, Jul 30, 2009 at 10:28 PM, Simon Stanlakesim...@tradebytes.com wrote: Hi, I have a field in my schema specified using field name=wordCount type=sint/ Where sint is specified as follows (the default from schema.xml) fieldType name=sint class=solr.SortableIntField sortMissingLast=true omitNorms=true/ When I do a facet on this field using sort=index I always get the values back in lexicographic order. Eg: adding this to a query string... facet=truefacet.field=wordCountf.wordCount.facet.sort=index gives me lst name=wordCount int name=15/int int name=102/int int name=26/int ... Is this a current limitation of solr faceting or am I missing a configuration step somewhere? I couldn't find any notes in the docs about this. This is not the intention - seems like a bug somewhere. Is it still broken in trunk? are you using distributed search? -Yonik http://www.lucidimagination.com
Re: facet sorting by index on sint fields
On Tue, Aug 4, 2009 at 5:27 PM, Yonik Seeleyyo...@lucidimagination.com wrote: Is this a current limitation of solr faceting or am I missing a configuration step somewhere? I couldn't find any notes in the docs about this. This is not the intention - seems like a bug somewhere. Is it still broken in trunk? are you using distributed search? OK, I just tried trunk with the example docs, with the popularity field indexed as both int (now trie based) and sint - both seem to work correctly. http://localhost:8983/solr/select?q=*:*facet=truefacet.field=popularityfacet.sort=lex -Yonik http://www.lucidimagination.com
RE: facet sorting by index on sint fields
Oh boy - I had a problem with my deploy scripts that was keeping an old version of the schema.xml file around. SortableIntField is working fine for me now. Sorry to waste everyone's time and thanks for the responses. Simon -Original Message- From: ysee...@gmail.com [mailto:ysee...@gmail.com] On Behalf Of Yonik Seeley Sent: Tuesday, August 04, 2009 2:28 PM To: solr-user@lucene.apache.org Subject: Re: facet sorting by index on sint fields On Thu, Jul 30, 2009 at 10:28 PM, Simon Stanlakesim...@tradebytes.com wrote: Hi, I have a field in my schema specified using field name=wordCount type=sint/ Where sint is specified as follows (the default from schema.xml) fieldType name=sint class=solr.SortableIntField sortMissingLast=true omitNorms=true/ When I do a facet on this field using sort=index I always get the values back in lexicographic order. Eg: adding this to a query string... facet=truefacet.field=wordCountf.wordCount.facet.sort=index gives me lst name=wordCount int name=15/int int name=102/int int name=26/int ... Is this a current limitation of solr faceting or am I missing a configuration step somewhere? I couldn't find any notes in the docs about this. This is not the intention - seems like a bug somewhere. Is it still broken in trunk? are you using distributed search? -Yonik http://www.lucidimagination.com
Re: Dynamic Configuration
pgiesin wrote: I have a client who is interested in using Solr/Lucene as their search engine. So far I think it meets 85% of their requirements. I have decided to integrate with JAMon tp provide statistical/performance analysis at run-time. The piece I am still missing is dynamic configuration of the indexing engine. Is it possible to problematically control such things as what fields are indexed based on content type, weights, etc? The key requirement is that these should be modifiable without restarting the server. I thought I may be able to provide this through JMX but these attributes seem to be read-only. Pete Solr multicore might be an option. It has reload/swap/... commands to reload/switch SolrCore: http://wiki.apache.org/solr/CoreAdmin Koji
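A minimal multicore solr.xml along the lines Koji points to (core names and instanceDirs are illustrative); once cores are declared this way they can be reloaded through the CoreAdmin handler (action=RELOAD) after editing schema or config, with no server restart:

```xml
<solr persistent="true">
  <cores adminPath="/admin/cores">
    <core name="core0" instanceDir="core0"/>
    <core name="core1" instanceDir="core1"/>
  </cores>
</solr>
```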
Re: Wild card search does not return any result
Hi, I doubt it's a bug. It's probably working correctly based on the config, etc., I just don't have enough details about the configuration, your request handler, query rewriting, the data in your index, etc. to tell you what exactly is happening. Otis -- Sematext is hiring -- http://sematext.com/about/jobs.html?mls Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR - Original Message From: Mohamed Parvez par...@gmail.com To: solr-user@lucene.apache.org Sent: Tuesday, August 4, 2009 3:22:53 PM Subject: Re: Wild card search does not return any result Thanks Otis, The thread suggests that this is bug http://markmail.org/message/ts65a6jok3ii6nva#query:+page:1+mid:qinymqdn6mkocv4k Both SSE and ICS are 3 letter word and both are not part of English language. SEE* works fine and ICS* does not work, this is sure a bug. Any idea when will this bug be fixed or if there is any work around. Thanks/Regards, Parvez GV : 786-693-2228 On Tue, Aug 4, 2009 at 11:48 AM, Otis Gospodnetic otis_gospodne...@yahoo.com wrote: Could it be the same reason as described here: http://markmail.org/message/ts65a6jok3ii6nva Otis -- Sematext is hiring -- http://sematext.com/about/jobs.html?mls Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR - Original Message From: Mohamed Parvez To: solr-user@lucene.apache.org Sent: Tuesday, August 4, 2009 11:26:45 AM Subject: Wild card search does not return any result Hello All, I have two fields. I have document(which has been indexed) that has a value of ICS for BUS field and SSE for ROLE filed When I search for q=BUS:ics i get the result, but if i search for q=BUS:ics* i don't get any match (or result) when I search for q=ROLE:sse or q=ROLE:sse*, both the times I get the result. why BUS:ics* does not return any result ? I have the default configuration for text filed, see below. 
positionIncrementGap=100 ignoreCase=true words=stopwords.txt enablePositionIncrements=true / generateWordParts=1 generateNumberParts=1 catenateWords=1 catenateNumbers=1 catenateAll=0 splitOnCaseChange=1/ protected=protwords.txt/ ignoreCase=true expand=true/ words=stopwords.txt/ generateWordParts=1 generateNumberParts=1 catenateWords=0 catenateNumbers=0 catenateAll=0 splitOnCaseChange=1/ protected=protwords.txt/ Thanks/Regards, Parvez Note : This is a re-post. looks like something went wrong the first time around.
Re: eternal optimize interrupted
On Tue, Aug 4, 2009 at 6:04 AM, Thomas Kochtho...@koch.ro wrote: last evening we started an optimize over our solr index of 45GB. This morning the optimize was still running, discs spinning like crazy and the index directory had grown to 83GB. Hmmm, it was probably close to done, given that 45*2=90. But with that size of an index, and given that solr/tomcat wasn't responsive, and that there was a lot of disk IO, perhaps the system was swapping? -Yonik http://www.lucidimagination.com
A Presentation on Building a Hadoop + Lucene System Architecture
Hey all, I just wanted to send a link to a presentation I made on how my company is building its entire core BI infrastructure around Hadoop, HBase, Lucene, and more. It features a decent amount of practical advice: from rules for approaching scalability problems, to why we chose certain aspects of the Hadoop Ecosystem. Perhaps you can use it as justification for your decisions, or as a jumping-off point to utilizing it in the real world. I hope you find it helpful! You can catch it at my blog: http://www.roadtofailure.com . There are also a few inflammatory articles, such as Social Media Kills the RDBMS. Ask me if you have any questions :) -- http://www.hadoopconsulting.com -- Making Hadoop and your web apps that use it scale http://www.roadtofailure.com -- The Fringes of Scalability, Social Media, and Computer Science
Re: Wild card search does not return any result
You read it incorrectly, Parvez. The bug that Bill seems to have found is in the analysis tool and NOT in the search handler itself. The results in your case are as expected: wildcard queries are not analyzed, hence the inconsistency. A workaround is suggested on the same thread, here - http://markmail.org/message/ts65a6jok3ii6nva#query:+page:1+mid:i5zxdbnvspgek2bp+state:results Cheers Avlesh On Wed, Aug 5, 2009 at 12:52 AM, Mohamed Parvez par...@gmail.com wrote: Thanks Otis, The thread suggests that this is bug http://markmail.org/message/ts65a6jok3ii6nva#query:+page:1+mid:qinymqdn6mkocv4k Both SSE and ICS are 3 letter word and both are not part of English language. SEE* works fine and ICS* does not work, this is sure a bug. Any idea when will this bug be fixed or if there is any work around. Thanks/Regards, Parvez GV : 786-693-2228 On Tue, Aug 4, 2009 at 11:48 AM, Otis Gospodnetic otis_gospodne...@yahoo.com wrote: Could it be the same reason as described here: http://markmail.org/message/ts65a6jok3ii6nva Otis -- Sematext is hiring -- http://sematext.com/about/jobs.html?mls Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR - Original Message From: Mohamed Parvez par...@gmail.com To: solr-user@lucene.apache.org Sent: Tuesday, August 4, 2009 11:26:45 AM Subject: Wild card search does not return any result Hello All, I have two fields. I have document(which has been indexed) that has a value of ICS for BUS field and SSE for ROLE filed When I search for q=BUS:ics i get the result, but if i search for q=BUS:ics* i don't get any match (or result) when I search for q=ROLE:sse or q=ROLE:sse*, both the times I get the result. why BUS:ics* does not return any result ? I have the default configuration for text filed, see below.
positionIncrementGap=100 ignoreCase=true words=stopwords.txt enablePositionIncrements=true / generateWordParts=1 generateNumberParts=1 catenateWords=1 catenateNumbers=1 catenateAll=0 splitOnCaseChange=1/ protected=protwords.txt/ ignoreCase=true expand=true/ words=stopwords.txt/ generateWordParts=1 generateNumberParts=1 catenateWords=0 catenateNumbers=0 catenateAll=0 splitOnCaseChange=1/ protected=protwords.txt/ Thanks/Regards, Parvez Note : This is a re-post. looks like something went wrong the first time around.
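One workaround that follows from Avlesh's explanation (my sketch, not taken from the linked thread): since wildcard terms bypass analysis entirely, including stemming and lowercasing, index a parallel copy of the field with a minimal analyzer (e.g. whitespace tokenizer plus lowercase filter, no stemming) and aim wildcard queries at it, lowercasing the term on the client side. Field and type names here are hypothetical:

```xml
<!-- "text_ws_lower" is assumed to be a type with only a whitespace
     tokenizer and a lowercase filter -- no stemming, no stopwords. -->
<field name="BUS_wild" type="text_ws_lower" indexed="true" stored="false"/>
<copyField source="BUS" dest="BUS_wild"/>
<!-- then query: q=BUS_wild:ics*  (term lowercased by the client) -->
```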
Re: JVM Heap utilization Memory leaks with Solr
Otis, Thank you for your response. I know there are a few variables here but the difference in memory utilization with and without shards somehow leads me to believe that the leak could be within Solr. I tried using a profiling tool - Yourkit. The trial version was free for 15 days. But I couldn't find anything of significance. Regards Rahul On Tue, Aug 4, 2009 at 7:35 PM, Otis Gospodnetic otis_gospodne...@yahoo.com wrote: Hi Rahul, A) There are no known (to me) memory leaks. I think there are too many variables for a person to tell you what exactly is happening, plus you are dealing with the JVM here. :) Try jmap -histo:live PID-HERE | less and see what's using your memory. Otis -- Sematext is hiring -- http://sematext.com/about/jobs.html?mls Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR - Original Message From: Rahul R rahul.s...@gmail.com To: solr-user@lucene.apache.org Sent: Tuesday, August 4, 2009 1:09:06 AM Subject: JVM Heap utilization Memory leaks with Solr I am trying to track memory utilization with my Application that uses Solr. Details of the setup : -3rd party Software : Solaris 10, Weblogic 10, jdk_150_14, Solr 1.3.0 - Hardware : 12 CPU, 24 GB RAM For testing during PSR I am using a smaller subset of the actual data that I want to work with. Details of this smaller sub-set : - 5 million records, 4.5 GB index size Observations during PSR: A) I have allocated 3.2 GB for the JVM(s) that I used. After all users logout and doing a force GC, only 60 % of the heap is reclaimed. As part of the logout process I am invalidating the HttpSession and doing a close() on CoreContainer. From my application's side, I don't believe I am holding on to any resource. I wanted to know if there are known issues surrounding memory leaks with Solr ? B) To further test this, I tried deploying with shards. 3.2 GB was allocated to each JVM. All JVMs had 96 % free heap space after start up. I got varying results with this. Case 1 : Used 6 weblogic domains. 
My application was deployed on 1 domain. I split the 5 million index into 5 parts of 1 million each and used them as shards. After multiple users used the system and doing a force GC, around 94 - 96 % of heap was reclaimed in all the JVMs. Case 2: Used 2 weblogic domains. My application was deployed on 1 domain. On the other, I deployed the entire 5 million part index as one shard. After multiple users used the system and doing a force GC, around 76 % of the heap was reclaimed in the shard JVM. And 96 % was reclaimed in the JVM where my application was running. This result further convinces me that my application can be absolved of holding on to memory resources. I am not sure how to interpret these results. For searching, I am using Without Shards : EmbeddedSolrServer With Shards : CommonsHttpSolrServer In terms of Solr objects this is what differs in my code between normal search and shards search (distributed search) After looking at Case 1, I thought that the CommonsHttpSolrServer was more memory efficient but Case 2 proved me wrong. Or could there still be memory leaks in my application ? Any thoughts, suggestions would be welcome. Regards Rahul
Re: Dynamic Configuration
On Wed, Aug 5, 2009 at 12:59 AM, pgiesinpgie...@hubcitymedia.com wrote: I have a client who is interested in using Solr/Lucene as their search engine. So far I think it meets 85% of their requirements. I have decided to integrate with JAMon to provide statistical/performance analysis at run-time. The piece I am still missing is dynamic configuration of the indexing engine. Is it possible to programmatically control such things as what fields are indexed based on content type, weights, etc? The key requirement is that these should be modifiable without restarting the server. I thought I may be able to provide this through JMX but these attributes seem to be read-only. I don't think it is possible to change the behavior of the same field during runtime (it is not even advisable). But you can always write the data to a different field w/ the required attributes using an UpdateProcessor, or you can write a new UpdateRequestHandler Pete -- View this message in context: http://www.nabble.com/Dynamic-Configuration-tp24814729p24814729.html Sent from the Solr - User mailing list archive at Nabble.com. -- - Noble Paul | Principal Engineer| AOL | http://aol.com