Large HDD space usage during commit/optimize
Hello. I have ~37 million docs that I want to index. When I run a full-import I only import 2 million docs at a time, for better control over Solr's heap and disk space. But when I import 2 million docs and Solr starts the commit and the optimize, my used disk space jumps into the sky. Reaction: after a Solr restart, the used space goes down again. Why is Solr using so much space? Can I optimize that? -- View this message in context: http://lucene.472066.n3.nabble.com/Large-Hdd-Space-using-during-commit-optimize-tp1985807p1985807.html Sent from the Solr - User mailing list archive at Nabble.com.
ArrayIndexOutOfBoundsException for query with rows=0 and sort param
Hi, after an upgrade from solr-1.3 to 1.4.1 we're getting an ArrayIndexOutOfBoundsException for a query with rows=0 and a sort param specified:

java.lang.ArrayIndexOutOfBoundsException: 0
 at org.apache.lucene.search.FieldComparator$StringOrdValComparator.copy(FieldComparator.java:660)
 at org.apache.lucene.search.TopFieldCollector$OneComparatorNonScoringCollector.collect(TopFieldCollector.java:84)
 at org.apache.solr.search.SolrIndexSearcher.sortDocSet(SolrIndexSearcher.java:1391)
 at org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:872)
 at org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:341)
 at org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:182)
 at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:195)
 at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
 at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)

The query is e.g.: /select/?sort=popularity+desc&rows=0&start=0&q=foo

When this is changed to rows=1, or when the sort param is removed, the exception is gone and everything's fine. With a clean 1.4.1 installation (unzipped, started the example and posted two documents as described in the tutorial) this issue is not reproducible. Does anyone have a clue what might be the reason for this and how we could fix it on the Solr side? Of course - for a quick fix - I'll change our app so that no sort param is specified when rows=0. Thanks, cheers, Martin -- Martin Grotzke http://twitter.com/martin_grotzke
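Martin's quick fix can be sketched client-side, independent of Solr: strip the sort parameter whenever rows=0 before building the request. This is only an illustration; RowsZeroWorkaround and cleanParams are made-up names, not part of any Solr client API.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the client-side quick fix: drop the sort parameter whenever
// rows=0, since sorting a zero-row result is what triggers the exception.
public class RowsZeroWorkaround {
    static Map<String, String> cleanParams(Map<String, String> params) {
        Map<String, String> out = new HashMap<>(params);
        if ("0".equals(out.get("rows"))) {
            out.remove("sort"); // no docs are returned, so the sort is moot
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, String> p = new HashMap<>();
        p.put("q", "foo");
        p.put("start", "0");
        p.put("rows", "0");
        p.put("sort", "popularity desc");
        System.out.println(cleanParams(p).containsKey("sort")); // prints: false
    }
}
```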
Re: Large HDD space usage during commit/optimize
On Mon, 29 Nov 2010 03:07 -0800, stockii st...@shopgate.com wrote: [snip] What do you mean by 'into the sky'? What percentage increase are you seeing? I'd expect it to double at least. I've heard it suggested that you should have three times the usual space available for an optimise. Remember, while your index is optimising you'll want to keep the original index online and available for searches, so you'll have at least two copies of your index on disk during an optimise. Also, it is my understanding that if you commit infrequently, you won't need to optimise immediately. There's nothing to stop you importing your entire corpus, then doing a single commit. That will leave you with only one segment (or at most two - one that existed before and was empty, and one containing all of your documents). The net result being that you don't need to optimise at that point. Note - I'm no Solr guru, so I could be wrong about some of the above - I'm happy to be corrected. Upayavira
Re: Large HDD space usage during commit/optimize
First, don't optimize after every chunk; it's just making extra work for your system. If you're using a 3.x or trunk build, optimizing doesn't do much for you anyway, but if you must, just optimize after your entire import is done. Optimizing will pretty much copy the old index into a new set of files, so you can expect your disk space to at least double, because Solr/Lucene doesn't delete anything until it's sure that the optimize finished successfully. Imagine the consequence of deleting files as they were copied, to save disk space. Now hit a program error, power glitch or ctrl-c: your indexes would be corrupted. Best, Erick On Mon, Nov 29, 2010 at 6:07 AM, stockii st...@shopgate.com wrote: [snip]
Re: question about Solr SignatureUpdateProcessorFactory
Dear list, another suggestion about SignatureUpdateProcessorFactory. Why can I make a signature of several fields and place the result in one field, but _not_ make a signature of one field and place the result in several fields? Could this be realized without huge programming effort? Best regards, Bernd

Am 29.11.2010 14:30, schrieb Bernd Fehling: Dear list, a question about Solr SignatureUpdateProcessorFactory:

    for (String field : sigFields) {
      SolrInputField f = doc.getField(field);
      if (f != null) {
        *sig.add(field);
        Object o = f.getValue();
        if (o instanceof String) {
          sig.add((String)o);
        } else if (o instanceof Collection) {
          for (Object oo : (Collection)o) {
            if (oo instanceof String) {
              sig.add((String)oo);
            }
          }
        }
      }
    }

Why is the field name (* above) also added to the signature, and not only the content of the field? By purpose or by accident? I would like to suggest removing the field name from the signature and not mixing it up. Best regards, Bernd
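For what it's worth, one plausible reason for hashing the field name along with its value (a guess, not a statement about the actual design intent): without the name, two documents whose text merely shifts between fields would get identical signatures. A hypothetical sketch using java.security.MessageDigest (SignatureDemo is not the real Solr class):

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;

// Demonstrates why mixing field names into the hash can matter: the two
// documents below concatenate to the same value text ("foobar"), but the
// content is split across fields differently.
public class SignatureDemo {
    static String sig(String[][] fields, boolean includeNames) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            for (String[] f : fields) {
                if (includeNames) md.update(f[0].getBytes(StandardCharsets.UTF_8));
                md.update(f[1].getBytes(StandardCharsets.UTF_8));
            }
            StringBuilder sb = new StringBuilder();
            for (byte b : md.digest()) sb.append(String.format("%02x", b));
            return sb.toString();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        String[][] a = {{"title", "foo"}, {"author", "bar"}};
        String[][] b = {{"title", "foob"}, {"author", "ar"}};
        // Without names the signatures collide; with names they differ.
        System.out.println(sig(a, false).equals(sig(b, false))); // prints: true
        System.out.println(sig(a, true).equals(sig(b, true)));   // prints: false
    }
}
```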
Re: question about Solr SignatureUpdateProcessorFactory
Why do you want to do this? It'd be the same value, just stored in multiple fields of the document, which seems a waste. What's the use-case you're addressing? Best, Erick On Mon, Nov 29, 2010 at 8:51 AM, Bernd Fehling bernd.fehl...@uni-bielefeld.de wrote: [snip]
Re: question about Solr SignatureUpdateProcessorFactory
On Monday 29 November 2010 14:51:33 Bernd Fehling wrote: Dear list, another suggestion about SignatureUpdateProcessorFactory. Why can I make a signature of several fields and place the result in one field but _not_ make a signature of one field and place the result in several fields?

Use copyField.

[snip] -- Markus Jelsma - CTO - Openindex http://www.linkedin.com/in/markus17 050-8536620 / 06-50258350
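Markus's copyField suggestion might look roughly like this in schema.xml (field names here are illustrative, not from the original thread): let the processor write the signature once, then copy it into as many fields as needed.

```xml
<field name="signature" type="string" indexed="true" stored="true"/>
<field name="signature_copy" type="string" indexed="true" stored="true"/>
<copyField source="signature" dest="signature_copy"/>
```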
Re: question about Solr SignatureUpdateProcessorFactory
Am 29.11.2010 14:55, schrieb Markus Jelsma: On Monday 29 November 2010 14:51:33 Bernd Fehling wrote: Why can I make a signature of several fields and place the result in one field but _not_ make a signature of one field and place the result in several fields? Use copyField. Ooooh yes, you are right. [snip] Best regards, Bernd
Using Ngram and Phrase search
Hi all, I want to use both EdgeNGram analysis and phrase search, but there is a problem. On a field which does not use EdgeNGram analysis, phrase search works well. But when using EdgeNGram, phrase search is incorrect. I'm using Solr 1.4.0. The result of EdgeNGram analysis for pci express is below. http://lucene.472066.n3.nabble.com/file/n1986848/before.jpg I thought the cause was the term position, so I modified EdgeNGramTokenFilter of lucene-analyzers-2.9.1. After the modification, the result is below. http://lucene.472066.n3.nabble.com/file/n1986848/after.jpg Now the phrase search for pci express against the ngram index works well. But another problem has appeared. For example, when I search for the phrase query pc express, docs containing 'pci express' are matched too. In this case I don't want 'pci express' to match; I just want an exact match on pc express. Please give me your ideas. Thanks, -- View this message in context: http://lucene.472066.n3.nabble.com/Using-Ngram-and-Phrase-search-tp1986848p1986848.html Sent from the Solr - User mailing list archive at Nabble.com.
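One common workaround (a sketch with illustrative field and type names, not from the original message): keep two fields, one analyzed with EdgeNGram for prefix matching and one without for exact phrase matching, and run quoted phrase queries only against the plain field.

```xml
<field name="title" type="text" indexed="true" stored="true"/>
<field name="title_prefix" type="text_edgengram" indexed="true" stored="false"/>
<copyField source="title" dest="title_prefix"/>
```

Phrase queries like "pc express" then go to title, while prefix queries go to title_prefix, so ngram overlap never affects exact matching.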
Solr Hot Backup
Hi all, How can I back up Solr indexes without stopping the server? I saw the following links: http://wiki.apache.org/solr/SolrOperationsTools http://wiki.apache.org/solr/CollectionDistribution but I'm afraid that running these scripts 'on the fly' could corrupt the indexes. Thanks, Piero.
search strangeness
Hi all. I have a little question. Can anyone explain why this Solr search works so strangely? :) For example, in my schema.xml I add some fields with fieldType = text. Here are the 'text' fieldType properties:

<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.WordDelimiterFilterFactory" splitOnNumerics="0" generateWordParts="1" generateNumberParts="0" catenateWords="0" catenateNumbers="0" catenateAll="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>

I copy all my fields into the text field:

<copyField source="name" dest="text"/>
<copyField source="caption" dest="text"/>

Then I add one document to my index. Here is the schema browser for field 'caption':

 term       | frequency
 annual     | 1
 golfer     | 1
 tournament | 1
 welcom     | 1
 3rd        | 1

After that I tried to find this document by these terms:

 annual     - no results
 golfer     - found document
 tournament - no results
 welcom     - found document
 3rd        - no results

I read a lot of forums, some books and http://wiki.apache.org/solr/ but it didn't help me. Can anyone explain why Solr searches so strangely? Or where is my problem? Thank you ...
-- View this message in context: http://lucene.472066.n3.nabble.com/search-strangeness-tp1986895p1986895.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Solr Hot Backup
As I understand it, those tools are more Solr 1.3 related, but I don't see why they shouldn't work on 1.4. I would say it is very unlikely that you will corrupt an index with them. Lucene indexes are write-once; that is, any one index file will never be updated, only replaced. This means that taking a backup is actually exceptionally easy, as (on Unix at least) you can create a copy of the index directory with hard links, which takes milliseconds, even for multi-gigabyte indexes. You just need to make sure you are not committing while you take your backup, and it looks like those tools will take care of that for you. Another perk is that your backups won't take any additional disk space (just the space for the directory data, not the files themselves). As your index changes, disk usage will gradually increase though. Upayavira On Mon, 29 Nov 2010 16:13 +0100, Rodolico Piero p.rodol...@vitrociset.it wrote: [snip]
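The hard-link trick described above can be sketched in plain Java NIO. The paths and file names below are illustrative; in real use you would point it at your Solr index directory and make sure no commit is in progress while snapshotting.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;

public class HardLinkSnapshot {
    // Hard-link every file from src into dst: near-instant, and it uses no
    // extra data blocks because both names point at the same file contents.
    static void snapshot(Path src, Path dst) {
        try {
            Files.createDirectories(dst);
            try (DirectoryStream<Path> files = Files.newDirectoryStream(src)) {
                for (Path f : files) {
                    Files.createLink(dst.resolve(f.getFileName()), f);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Demo against a throwaway "index" directory; returns the backed-up content.
    static String runDemo() {
        try {
            Path root = Files.createTempDirectory("solr-backup-demo");
            Path src = Files.createDirectories(root.resolve("index"));
            Files.write(src.resolve("_0.cfs"), "segment data".getBytes("UTF-8"));
            snapshot(src, root.resolve("backup"));
            return new String(
                Files.readAllBytes(root.resolve("backup").resolve("_0.cfs")), "UTF-8");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(runDemo()); // prints: segment data
    }
}
```

Because Lucene never rewrites an existing segment file, the linked copy stays valid even as the live index moves on to new files.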
BasicHelloRequestHandler plugin
Hi, Thanks for helping us. I'm creating a 'helloworld' plugin in Solr 1.4, in BasicHelloRequestHandler.java. In solrconfig.xml, I added:

<requestHandler name="hello" class="com.polyspot.mercury.handler.BasicHelloRequestHandler">
  <!-- default values for query parameters -->
  <lst name="defaults">
    <str name="message">Default message</str>
    <int name="anumber">-10</int>
  </lst>
</requestHandler>

I verified that the 'hello' plugin is registered correctly at: http://localhost:8983/solr/admin/plugins When I executed: http://localhost:8983/solr/select?qt=hello, a java.lang.AbstractMethodError was raised:

type: status report
message: null

java.lang.AbstractMethodError
 at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
 at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
 at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
 at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
 at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
 at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
 at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
 at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128)
 at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
 at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
 at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:293)
 at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:849)
 at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:583)
 at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:454)
 at java.lang.Thread.run(Thread.java:595)

I suppose that handleRequest in the BasicHelloRequestHandler isn't called.
Here's the BasicHelloRequestHandler.java code:

import com.polyspot.mercury.common.params.HelloParams;
import org.apache.solr.common.SolrException;
import org.apache.solr.common.params.SolrParams;
import org.apache.solr.common.util.NamedList;
import org.apache.solr.common.util.SimpleOrderedMap;
import org.apache.solr.request.SolrQueryRequest;
import org.apache.solr.request.SolrRequestHandler;
import org.apache.solr.response.SolrQueryResponse;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import java.net.URL;

/**
 * User: nguyenht
 * Date: 26 nov. 2010
 */
public class BasicHelloRequestHandler implements SolrRequestHandler {

  protected static Logger log = LoggerFactory.getLogger(BasicHelloRequestHandler.class);
  protected NamedList initArgs = null;
  protected SolrParams defaults;

  /**
   * <code>init</code> will be called just once, immediately after creation.
   * <p>The args are user-level initialization parameters that
   * may be specified when declaring a request handler in
   * solrconfig.xml
   */
  public void init(NamedList args) {
    log.info("initializing BasicHelloRequestHandler: " + args);
    initArgs = args;
    if (args != null) {
      Object o = args.get("defaults");
      if (o != null && o instanceof NamedList) {
        defaults = SolrParams.toSolrParams((NamedList) o);
      }
    }
  }

  /**
   * Handles a query request; this method must be thread safe.
   * <p/>
   * Information about the request may be obtained from <code>req</code> and
   * response information may be set using <code>rsp</code>.
   * <p/>
   * There are no mandatory actions that handleRequest must perform.
   * An empty handleRequest implementation would fulfill
   * all interface obligations.
   */
  public void handleRequest(SolrQueryRequest solrQueryRequest, SolrQueryResponse solrQueryResponse) {
    log.info("handling request for BasicHelloRequestHandler");
    // get request params
    SolrParams params = solrQueryRequest.getParams();
    String message = params.get(HelloParams.MESSAGE);
    if (message == null) {
      throw new SolrException(SolrException.ErrorCode.BAD_REQUEST, "message is mandatory");
    }
    log.info("get anumber");
    Integer anumber = params.getInt(HelloParams.ANUMBER);
    if (anumber == null) {
      anumber = defaults.getInt(HelloParams.ANUMBER);
    }
    int messageLength = message.length();
    // write response
    solrQueryResponse.add("yousaid", message);
    solrQueryResponse.add("message length", messageLength);
    solrQueryResponse.add("optionalNumber", anumber);
  }

  /* methods below are for JMX info */
  public String getName() {
    return this.getClass().getName();
  }

  public String getVersion() {
    return "1"; // TODO implement this
  }

  public String getDescription() {
    return "hello"; // TODO implement this
  }
Preventing index segment corruption when windows crashes
Hi, With the advent of new Windows versions, there are increasing instances of system blue-screens, crashes, freezes and ad-hoc failures. If a Solr index is running at the time of a system halt, this can often corrupt a segments file, requiring the index to be -fix'ed by rewriting the offending file. Aside from the vagaries of automating such fixes, depending on the mergeFactor this can mean quite a few documents permanently lost. Would anyone have any experience/wisdom/insight on ways to mitigate such corruption in Lucene/Solr - e.g. applying a temp-file technique etc.; though perhaps not 'just use Linux'.. :-) There are of course client-side measures that can hold some number of pending documents until they are truly committed, but a server-side/Lucene method would be preferable, if possible. Thanks, Peter
Re: search strangeness
On a quick look with Solr 3.1, these results are puzzling. Are you sure that you are searching the field you think you are? I take it you're searching the text field, but that's controlled by your defaultSearchField entry in schema.xml. Try using the admin page, particularly the full interface link, and turn debugging on; that should give you a better idea of what is actually being searched. Another admin page that's very useful is the analysis page, which will show you exactly what transformations are made to your terms at index and query time, and why. I'm a little suspicious that you've put the stopword filter in a different place in the index and query chains, but I doubt that is the problem. The analysis page will help with that too. But nothing really jumps out at me; if you don't get anywhere with the admin page, perhaps you can show us the field definitions for the name, caption and text fields (not the type, the actual <field/> part of the schema). Also, please post the results of appending debugQuery=on to the request. Best, Erick On Mon, Nov 29, 2010 at 10:06 AM, ramzesua michaelnaza...@gmail.com wrote: [snip]
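Erick's debugQuery suggestion can be tried with a request like this (host, port and field name are illustrative, not from the original message):

```
http://localhost:8983/solr/select?q=caption:annual&debugQuery=on&fl=*,score
```

The parsedquery section of the debug output shows which field and analyzed term Solr actually searched, which usually makes a defaultSearchField or analysis mismatch obvious.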
Solr DataImportHandler (DIH) and Cassandra
Is there any way to use DIH to import from Cassandra? Thanks
bf for Dismax completly ignored by 'recip(ms(NOW,INDAT),3.16e-11,1,1)'
Hello, I've got a problem that I'm unable to solve: As mentioned in the docs, I put recip(ms(NOW,INDAT),3.16e-11,1,1) in the boost-function field bf. It is completely ignored by the dismax search handler. The dismax SearchHandler is set to be the default search handler. If I post solr/select?q={!boost b=recip(ms(NOW,INDAT),3.16e-11,1,1)}SearchTerm to the Solr server, the request is answered as expected, while doing it with the PHP client completely fails. The solrconfig looks like: <str name="bf">recip(ms(NOW,INDAT),3.16e-11,1,1)</str> Maybe someone has an idea? Thanks a lot! Ralf -- View this message in context: http://lucene.472066.n3.nabble.com/bf-for-Dismax-completly-ignored-by-recip-ms-NOW-INDAT-3-16e-11-1-1-tp1987228p1987228.html Sent from the Solr - User mailing list archive at Nabble.com.
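For reference, a bf default normally lives inside the handler's defaults list in solrconfig.xml, roughly like this (a sketch; the handler name and surrounding config are assumptions, not taken from Ralf's setup):

```xml
<requestHandler name="dismax" class="solr.SearchHandler" default="true">
  <lst name="defaults">
    <str name="defType">dismax</str>
    <str name="bf">recip(ms(NOW,INDAT),3.16e-11,1,1)</str>
  </lst>
</requestHandler>
```

If the PHP client sends its requests to a different handler (e.g. the standard /select without qt=dismax), these defaults never apply, which would match the behavior described.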
Boost on newer documents
Hi, I use the dismax query to search across several fields. I find I have a lot of documents with the same document name (one of the fields that the dismax queries), so I wanted to adjust the relevance so that documents with a newer published date rank higher than older documents with the same title. Does anyone know how I can achieve this? Thank You, Jason. If you wish to view the St. James's Place email disclaimer, please use the link below http://www.sjp.co.uk/portal/internet/SJPemaildisclaimer
Re: Boost on newer documents
Hi Jason, maybe just use another field with the creation-/modification-date and boost on that field? Regards, Stefan On Mon, Nov 29, 2010 at 5:28 PM, Jason Brown jason.br...@sjp.co.uk wrote: [snip]
Re: Boost on newer documents
Hi Jason, You can use boost functions in the dismax handler to do this: http://wiki.apache.org/solr/DisMaxQParserPlugin#bf_.28Boost_Functions.29 Mat On Mon, Nov 29, 2010 at 11:28, Jason Brown jason.br...@sjp.co.uk wrote: [snip]
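To get a feel for the numbers, here is a quick sketch of the recip date boost commonly suggested for this: recip(x,m,a,b) computes a/(m*x+b), so with m=3.16e-11 and x = milliseconds since the document's date, a brand-new document gets boost 1.0 and the boost roughly halves after a year (the field name and constants are the usual wiki example, not anything from Jason's schema).

```java
// Standalone arithmetic sketch of Solr's recip() function:
// recip(x, m, a, b) = a / (m*x + b)
public class RecipDemo {
    static double recip(double x, double m, double a, double b) {
        return a / (m * x + b);
    }

    public static void main(String[] args) {
        double msPerYear = 365.25 * 24 * 3600 * 1000; // about 3.156e10
        for (int years : new int[] {0, 1, 5, 10}) {
            // 3.16e-11 * msPerYear is about 1, so the boost roughly
            // halves in the first year and decays slowly afterwards
            System.out.printf("%2d years old -> boost %.3f%n",
                    years, recip(years * msPerYear, 3.16e-11, 1, 1));
        }
    }
}
```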
RE: Boost on newer documents
Great - Thank You. -Original Message- From: Mat Brown [mailto:m...@patch.com] Sent: Mon 29/11/2010 16:33 To: solr-user@lucene.apache.org Subject: Re: Boost on newer documents [snip] If you wish to view the St. James's Place email disclaimer, please use the link below http://www.sjp.co.uk/portal/internet/SJPemaildisclaimer
Re: Large HDD space usage during commit/optimize
Aha, okay, thanks. I didn't know that Solr copies the complete index for an optimize. Can I tell Solr to start an optimize, but without the copy? -- View this message in context: http://lucene.472066.n3.nabble.com/Large-Hdd-Space-using-during-commit-optimize-tp1985807p1987477.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Large HDD space usage during commit/optimize
On Mon, 29 Nov 2010 08:43 -0800, stockii st...@shopgate.com wrote: [snip] No. The copy is there to keep an index available for searches while the optimise is happening, and also to allow for rollback should something go wrong with the optimise. The simplest thing is to keep your commits low (I suspect you could ingest 35m documents with just one commit at the end). In that case, optimisation is not required (optimisation is there to reduce the number of segments in your index, and segments are created by commits; if you don't do many commits, you won't need to optimise - at least not at the point of initial ingestion). Upayavira
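With the DataImportHandler, the whole corpus can be ingested in one request, committing once at the end and skipping the optimize (the /dataimport path is the usual default; adjust host and core to your setup):

```
http://localhost:8983/solr/dataimport?command=full-import&commit=true&optimize=false
```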
Re: Preventing index segment corruption when windows crashes
On Mon, Nov 29, 2010 at 10:46 AM, Peter Sturge peter.stu...@gmail.com wrote: If a Solr index is running at the time of a system halt, this can often corrupt a segments file, requiring the index to be -fix'ed by rewriting the offending file. Really? That shouldn't be possible (if you mean the index is truly corrupt - i.e. you can't open it). -Yonik http://www.lucidimagination.com
DIH causing shutdown hook executing?
Hi, I am in the process of trying to index about 50 mil documents using the data import handler. For some reason, about 2 days into the import, I see the message "shutdown hook executing" in the log and the Solr web server instance exits gracefully. I do not see any errors in the entire log. This has happened twice now, usually 5 mil or so documents into the import process. Does anyone out there know what this message means? It's an INFO log message so I don't think it is caused by any error. Does this problem occur because the OS is asking the server to shut down (for whatever reason), or is there something wrong with the server causing it to shut down? Thanks for any help, Phong
Re: Solr Hot Backup
In Solr 1.4, I think the replication features should be able to accomplish your goal, and will be easier to use and more robust. On 11/29/2010 10:22 AM, Upayavira wrote: [snip]
Re: Solr Hot Backup
Yes, I use replication only for backup, with this call: http://host:8080/solr/replication?command=backup&location=/home/jboss/backup It works fine, but the server must always be up... it's an HTTP call... I also tried the 'backup' script, but it creates hard links, and those are not recommended! -Original Message- From: Jonathan Rochkind [mailto:rochk...@jhu.edu] Sent: Monday, 29 November 2010 19:22 To: solr-user@lucene.apache.org Subject: Re: Solr Hot Backup [snip]
Re: Spellcheck in solr-nutch integration
I solved the problem. All we need to do is modify the schema file. Also, the spellcheck index is first created when spellcheck.build=true. - Kumar Anurag -- View this message in context: http://lucene.472066.n3.nabble.com/Spellcheck-in-solr-nutch-integration-tp1953232p1988252.html Sent from the Solr - User mailing list archive at Nabble.com.
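For anyone finding this thread later: the build is triggered by passing spellcheck.build=true on a request to a handler that has the spellcheck component attached. A request of roughly this shape (host, port, and handler path are assumptions, not from the thread) rebuilds the dictionary:

```
http://localhost:8983/solr/select?q=foo&spellcheck=true&spellcheck.build=true
```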
Re: DIH causing shutdown hook executing?
You're right, the OS is asking the server to shut down. In the default example under Jetty, this is the result of issuing a Ctrl-C. Is it possible that something is asking your server to quit? What servlet container are you running under? Does the Solr server run for more than this period if you're NOT indexing? And are you sure you have enough resources, especially disk space? On another note, I'm surprised that it's taking 2 days to index 5M documents. That's less than 30 docs/second, and Solr should handle a considerably greater load than that. For whatever that's worth... And what version of Solr are you using? You may want to consider writing something in SolrJ to do your indexing; it'll give you more flexible control over indexing than DIH. Best Erick On Mon, Nov 29, 2010 at 1:20 PM, Phong Dais phong.gd...@gmail.com wrote: Hi, I am in the process of trying to index about 50 mil documents using the data import handler. For some reason, about 2 days into the import, I see the message shutdown hook executing in the log and the Solr web server instance exits gracefully. I do not see any errors in the entire log. This has happened twice now, usually 5 mil or so documents into the import process. Does anyone out there know what this message means? It's an INFO log message, so I don't think it is caused by any error. Does this problem occur because the OS is asking the server to shut down (for whatever reason), or is there something wrong with the server causing it to shut down? Thanks for any help, Phong
solr admin
Hello, is there any way to specify anything other than fields in the Solr admin? And I'm not talking about the full interface, which is also very limited. Like: score, fl, fq, ... And yes, I know that I can use the URL... which indeed is not too handy. thanks, Rich __ Information from ESET NOD32 Antivirus, version of virus signature database 5659 (20101129) __ The message was checked by ESET NOD32 Antivirus. http://www.eset.com
Re: solr admin
I honestly don't understand what you're asking here. Specify what in Solr admin other than fields? What is it you're trying to accomplish? Best Erick On Mon, Nov 29, 2010 at 2:56 PM, Papp Richard ccode...@gmail.com wrote: Hello, is there any way to specify anything other than fields in the Solr admin? And I'm not talking about the full interface, which is also very limited. Like: score, fl, fq, ... And yes, I know that I can use the URL... which indeed is not too handy. thanks, Rich
special sorting
Hello, I have many pages with the same content in the search result (the result is the same for some of the cities from the same county)... which means that I have duplicate content. The filter query is something like: +locationId:(60 26a 39a) - for the city with ID 60, and I get the same result for the city with ID 62: +locationId:(62 26a 39a) (cityID, countyID, countryID). How could I use sorting to get a different document order in the results for different cities? (For the same city I always need the same sort order - it cannot be a simple random...) Could I somehow use the cityID parameter as a boost or score? I tried, but couldn't get very far. thanks, Rich
Re: DIH causing shutdown hook executing?
It is entirely possible that the server is asking Solr to shut down. I'll have to ask the admin. I'm running Solr 1.4 inside of Jetty. I definitely have enough disk space. I think I did notice Solr shutting down while it was idle. I just disregarded it as a fluke... perhaps there's something going on. I will try to run this inside of Tomcat and see what happens. Not sure if this is related, but I had to change the lockType to single instead of the default native. With native, I get a lock timeout when starting up Solr. I also have maxDocs set to 1; I did not want to have millions of uncommitted docs. I'm running under Red Hat Linux. Regarding speed, the first million or so documents are done very quickly (maybe 3 hrs), but after that things slow down tremendously. Thanks for the advice regarding SolrJ. I'll definitely look into that. P. On Mon, Nov 29, 2010 at 2:39 PM, Erick Erickson erickerick...@gmail.com wrote: You're right, the OS is asking the server to shut down. In the default example under Jetty, this is the result of issuing a Ctrl-C. [...]
Re: special sorting
Perhaps, depending on your domain logic, you could use function queries to achieve that. http://wiki.apache.org/solr/FunctionQuery#Sort_By_Function Regards, Tommaso 2010/11/29 Papp Richard ccode...@gmail.com Hello, I have many pages with the same content in the search result (the result is the same for some of the cities from the same county)... which means that I have duplicate content. The filter query is something like: +locationId:(60 26a 39a) - for the city with ID 60, and I get the same result for the city with ID 62: +locationId:(62 26a 39a) (cityID, countyID, countryID). How could I use sorting to get a different document order in the results for different cities? (For the same city I always need the same sort order - it cannot be a simple random...) Could I somehow use the cityID parameter as a boost or score? I tried, but couldn't get very far. thanks, Rich
Bad file descriptor Errors
Recently, we have started to get Bad file descriptor errors in one of our Solr instances. This instance is a searcher and its index is stored on a local SSD. The master, however, has its index stored on NFS, which seems to be working fine, currently. I have tried restarting Tomcat and bringing over the index fresh from the master (via snappull/snapinstall). Any help would be greatly appreciated. Thanks, John SEVERE: Exception during commit/optimize: java.lang.RuntimeException: java.io.FileNotFoundException: /u/solr/data/index/_w3vs.fnm (Bad file descriptor) at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:371) at org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:512) at org.apache.solr.core.SolrCore.update(SolrCore.java:771) at org.apache.solr.servlet.SolrUpdateServlet.doPost(SolrUpdateServlet.java:53) at javax.servlet.http.HttpServlet.service(HttpServlet.java:637) at javax.servlet.http.HttpServlet.service(HttpServlet.java:717) at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:290) at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206) at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233) at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191) at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127) at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102) at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109) at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:298) at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:852) at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:588) at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:489) at java.lang.Thread.run(Thread.java:662)
Re: DIH causing shutdown hook executing?
Try without autocommit, or bump the limit up considerably, to see if it changes the behavior. You should not be getting this kind of performance hit after the first million docs, so it's probably worth exploring. See if you can find anything in your logs that indicates what's hogging the critical resource, maybe? Best Erick On Mon, Nov 29, 2010 at 3:08 PM, Phong Dais phong.gd...@gmail.com wrote: It is entirely possible that the server is asking Solr to shut down. I'll have to ask the admin. [...]
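For reference, the autocommit limit discussed here is set in solrconfig.xml. A sketch (the numbers are illustrative, not recommendations) that commits far less often than maxDocs=1:

```xml
<!-- solrconfig.xml: commit automatically after 50,000 docs or 5 minutes,
     whichever comes first, rather than after every single document. -->
<autoCommit>
  <maxDocs>50000</maxDocs>
  <maxTime>300000</maxTime> <!-- milliseconds -->
</autoCommit>
```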
Good example of multiple tokenizers for a single field
I am looking for a clear example of using more than one tokenizer for a single source field. My application has a single body field which until recently was all Latin characters, but we're now encountering both English and Japanese words in a single message. Obviously, we need to be using CJK in addition to WhitespaceTokenizerFactory. I've found some references to using copyFields or NGrams, but I can't quite grasp what the whole solution would look like. -- Jacob Elder @jelder (646) 535-3379
Termvector based result grouping / field collapsing?
I was just in a meeting where we discussed customer feedback on our website. One thing that the users would like to see is galleries where photos that are part of a set are grouped together under a single result. This is basically field collapsing. The problem I've got is that for most of our content, there's nothing to tie different photos together in a coherent way other than similar language in fields like the caption. Is it feasible to use termvector information to automatically group documents with similar (but not identical) data in one or more fields? Thanks, Shawn
Re: Good example of multiple tokenizers for a single field
You can use only one tokenizer per analyzer. You'd better use separate fields + fieldTypes for different languages. I am looking for a clear example of using more than one tokenizer for a single source field. My application has a single body field which until recently was all Latin characters, but we're now encountering both English and Japanese words in a single message. Obviously, we need to be using CJK in addition to WhitespaceTokenizerFactory. I've found some references to using copyFields or NGrams, but I can't quite grasp what the whole solution would look like.
Re: Good example of multiple tokenizers for a single field
The problem is that the field is not guaranteed to contain just a single language. I'm looking for some way to pass it first through CJK, then Whitespace. If I'm totally off-target here, is there a recommended way of dealing with mixed-language fields? On Mon, Nov 29, 2010 at 5:22 PM, Markus Jelsma markus.jel...@openindex.io wrote: You can use only one tokenizer per analyzer. You'd better use separate fields + fieldTypes for different languages. [...] -- Jacob Elder @jelder (646) 535-3379
Re: Good example of multiple tokenizers for a single field
On Mon, Nov 29, 2010 at 5:30 PM, Jacob Elder jel...@locamoda.com wrote: The problem is that the field is not guaranteed to contain just a single language. I'm looking for some way to pass it first through CJK, then Whitespace. If I'm totally off-target here, is there a recommended way of dealing with mixed-language fields? maybe you should consider a tokenizer like StandardTokenizer, that works reasonably well for most languages.
Re: Good example of multiple tokenizers for a single field
StandardTokenizer doesn't handle some of the tokens we need, like @twitteruser, and as far as I can tell, doesn't handle Chinese, Japanese or Korean. Am I wrong about that? On Mon, Nov 29, 2010 at 5:31 PM, Robert Muir rcm...@gmail.com wrote: On Mon, Nov 29, 2010 at 5:30 PM, Jacob Elder jel...@locamoda.com wrote: The problem is that the field is not guaranteed to contain just a single language. I'm looking for some way to pass it first through CJK, then Whitespace. If I'm totally off-target here, is there a recommended way of dealing with mixed-language fields? maybe you should consider a tokenizer like StandardTokenizer, that works reasonably well for most languages. -- Jacob Elder @jelder (646) 535-3379
Re: Good example of multiple tokenizers for a single field
On Mon, Nov 29, 2010 at 5:35 PM, Jacob Elder jel...@locamoda.com wrote: StandardTokenizer doesn't handle some of the tokens we need, like @twitteruser, and as far as I can tell, doesn't handle Chinese, Japanese or Korean. Am I wrong about that? It uses the unigram method for CJK ideographs... the CJKTokenizer just uses the bigram method; it's just an alternative approach. The whitespace tokenizer doesn't work at all for CJK, though, so give up on that!
Re: Good example of multiple tokenizers for a single field
You can only use one tokenizer on a given field, I think. But a tokenizer isn't in fact the only thing that can tokenize; an ordinary filter can change tokenization too, so you could use two filters in a row. You could also write your own custom tokenizer that does what you want, although I'm not entirely sure that turning exactly what you say into code will actually do what you want. I think it's more complicated: I think you'll need a tokenizer that looks for contiguous blocks of CJK characters and does one thing to them, and contiguous blocks of non-CJK characters and does another thing to them, rather than first doing one thing to the whole string and then another. Dealing with mixed-language fields is tricky; I know of no general-purpose good solutions, in part just because of the semantics involved. If you have some strings for the field you know are CJK, and others you know are English, the easiest thing to do is NOT put them in the same field, but put them in different fields, and use dismax (for example) to search both fields on query. But if you can't even tell at index time which is which, or if you have strings that themselves include both CJK and English interspersed with each other, that might not work. For my own case, where everything is just interspersed in the fields and I don't really know what language it is, here's what I do, which is definitely not great for CJK, but is better than nothing: * As a tokenizer, I use the WhitespaceTokenizer. * Then I apply a custom filter that looks for CJK chars, and re-tokenizes any CJK chars into one-token-per-char. This custom filter was written by someone other than me; it is open source, but I'm not sure if it's actually in a public repo, or how well documented it is. I can put you in touch with the author to try and ask. There may also be a more standard filter other than the custom one I'm using that does the same thing?
Jonathan On 11/29/2010 5:30 PM, Jacob Elder wrote: The problem is that the field is not guaranteed to contain just a single language. I'm looking for some way to pass it first through CJK, then Whitespace. If I'm totally off-target here, is there a recommended way of dealing with mixed-language fields? [...]
Re: Good example of multiple tokenizers for a single field
On Mon, Nov 29, 2010 at 5:41 PM, Jonathan Rochkind rochk...@jhu.edu wrote: * As a tokenizer, I use the WhitespaceTokenizer. * Then I apply a custom filter that looks for CJK chars, and re-tokenizes any CJK chars into one-token-per-char. [...] You are describing what StandardTokenizer does.
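A sketch of the separate-fields approach suggested earlier in the thread, as schema.xml fragments. The type and field names are made up for illustration; dismax (or an explicit OR query) would then search both fields:

```xml
<!-- Two analyses of the same source text, wired together via copyField. -->
<fieldType name="text_en" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
<fieldType name="text_cjk" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.CJKTokenizerFactory"/>
  </analyzer>
</fieldType>

<field name="body"     type="text_en"  indexed="true" stored="true"/>
<field name="body_cjk" type="text_cjk" indexed="true" stored="false"/>
<copyField source="body" dest="body_cjk"/>
```

With this layout, a dismax qf of "body body_cjk" covers both scripts without forcing one tokenizer to handle everything.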
RE: solr admin
In the Solr admin (http://localhost:8180/services/admin/) I can specify something like: +category_id:200 +xxx:300, but how can I specify a sort option? sort:category_id+asc regards, Rich -Original Message- From: Erick Erickson [mailto:erickerick...@gmail.com] Sent: Monday, November 29, 2010 22:00 To: solr-user@lucene.apache.org Subject: Re: solr admin I honestly don't understand what you're asking here. Specify what in solr admin other than fields? What is it you're trying to accomplish? Best Erick [...]
RE: special sorting
Hmm, any clue how to use it? Use the location_id somehow? thanks, Rich -Original Message- From: Tommaso Teofili [mailto:tommaso.teof...@gmail.com] Sent: Monday, November 29, 2010 22:08 To: solr-user@lucene.apache.org Subject: Re: special sorting Perhaps, depending on your domain logic, you could use function queries to achieve that. http://wiki.apache.org/solr/FunctionQuery#Sort_By_Function Regards, Tommaso [...]
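Per the Sort_By_Function page Tommaso links, a function can go wherever a sort field could, so one option is to bake the queried city's ID into a non-monotonic function of a document field. A hypothetical request of roughly this shape (sort-by-function landed after Solr 1.4, and the mod/ord functions and the id field here are assumptions to illustrate the idea, not tested syntax):

```
/select?q=foo&fq=%2BlocationId:(62 26a 39a)&sort=mod(sum(ord(id),62),1000) asc
```

Because the constant 62 differs per city, each city would get its own ordering, yet the same city always sorts the same way.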
Re: Solr DataImportHandler (DIH) and Cassandra
The DataSource subclass route is what I will probably be interested in. Are there any working examples of this already out there? On 11/29/10 12:32 PM, Aaron Morton wrote: AFAIK there is nothing pre-written to pull the data out for you. You should be able to create your own DataSource subclass http://lucene.apache.org/solr/api/org/apache/solr/handler/dataimport/DataSource.html using the Hector Java library to pull data from Cassandra. I'm guessing you will need to consider how to perform delta imports. Perhaps using the secondary indexes in 0.7*, or maintaining your own queues or indexes to know what has changed. There is also the Lucandra project, not exactly what you're after, but it may be of interest anyway https://github.com/tjake/Lucandra Hope that helps. Aaron On 30 Nov, 2010, at 05:04 AM, Mark static.void@gmail.com wrote: Is there any way to use DIH to import from Cassandra? Thanks
RE: solr admin
In the Solr admin (http://localhost:8180/services/admin/) I can specify something like: +category_id:200 +xxx:300, but how can I specify a sort option? sort:category_id+asc There is a [FULL INTERFACE] link (/admin/form.jsp), but it does not have a sort option. It seems that you need to append it to your search URL.
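Concretely, appending the standard sort parameter to the select URL would look something like this (host and port are taken from Rich's message, the /services/select path is an assumption, and the leading + signs must be URL-encoded as %2B):

```
http://localhost:8180/services/select?q=%2Bcategory_id:200+%2Bxxx:300&sort=category_id+asc
```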
Re: solr admin
On Mon, Nov 29, 2010 at 8:02 PM, Ahmet Arslan iori...@yahoo.com wrote: in Solr admin (http://localhost:8180/services/admin/) I can specify something like: +category_id:200 +xxx:300 but how can I specify a sort option? sort:category_id+asc There is an [FULL INTERFACE] /admin/form.jsp link but it does not have sort option. It seems that you need to append it to your search url. Heh - yeah... that's an old interface, from the times when sort was specified along with the query. Can someone provide a patch to add a way to specify the sort? -Yonik http://www.lucidimagination.com
Re: Spell checking question from a Solr novice
On Mon, Oct 18, 2010 at 5:24 PM, Jason Blackerby jblacke...@gmail.comwrote: If you know the misspellings you could prevent them from being added to the dictionary with a StopFilterFactory like so: Or, you know, correct the data :-) -- Bill Dueber Library Systems Programmer University of Michigan Library
Re: question about Solr SignatureUpdateProcessorFactory
: Why is the field name (* above) also added to the signature : and not only the content of the field? : : On purpose or by accident? It was definitely deliberate. This way, if your signature fields are fieldA,fieldB,fieldC, then these two documents... Doc1:fieldA:XXX Doc1:fieldB:YYY Doc2:fieldB:XXX Doc2:fieldC:YYY ...don't wind up with identical signature values. : I would like to suggest removing the field name from the signature and : not mixing it up. As mentioned, in the typical case it's important that the field names be included in the signature, but I imagine there would be cases where you wouldn't want them included (like a simple concat Signature for building basic composite keys). I think the Signature API could definitely be enhanced to have additional methods for adding field names vs adding field values. Wanna open an issue in Jira with some suggestions and use cases? -Hoss
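For context, the configuration under discussion is the dedupe update chain in solrconfig.xml; a typical setup looks roughly like this (the chain name, signature field, and source fields are illustrative):

```xml
<updateRequestProcessorChain name="dedupe">
  <processor class="solr.processor.SignatureUpdateProcessorFactory">
    <bool name="enabled">true</bool>
    <str name="signatureField">signature</str>
    <bool name="overwriteDupes">true</bool>
    <!-- The signature mixes in these field NAMES as well as their values. -->
    <str name="fields">fieldA,fieldB,fieldC</str>
    <str name="signatureClass">solr.processor.Lookup3Signature</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>
```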
Re: search strangeness
Hi, Erick. There is a defaultSearchField in my schema.xml. Can you give me an example of your configuration for the text field? (What filters do you use for index and for query?) -- View this message in context: http://lucene.472066.n3.nabble.com/search-strangeness-tp1986895p1989466.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Good example of multiple tokenizers for a single field
On 11/29/2010 3:15 PM, Jacob Elder wrote: I am looking for a clear example of using more than one tokenizer for a single source field. My application has a single body field which until recently was all Latin characters, but we're now encountering both English and Japanese words in a single message. Obviously, we need to be using CJK in addition to WhitespaceTokenizerFactory. What I'd like to see is a CJK filter that runs after tokenization (whitespace in my case) and doesn't do anything but handle the CJK characters. If there are no CJK characters in the token, it should do nothing at all. The CJK tokenizer does a whole host of other things that I want to handle myself. Shawn