Re: Dataimport: Could not load driver: com.mysql.jdbc.Driver
Maybe an encoding issue!? -- View this message in context: http://lucene.472066.n3.nabble.com/Dataimport-Could-not-load-driver-com-mysql-jdbc-Driver-tp2021616p2027138.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Solr -File Based Spell Check
Hi. As far as I know, for file-based spellcheck you need to:

- configure your spellcheck search component in solrconfig.xml, for example:

<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <lst name="spellchecker">
    <str name="classname">solr.FileBasedSpellChecker</str>
    <str name="name">file</str>
    <str name="sourceLocation">spellings.txt</str>
    <str name="characterEncoding">UTF-8</str>
    <str name="spellcheckIndexDir">./spellcheckerFile</str>
  </lst>
</searchComponent>

- then you must get or create spellings.txt, for example:

abaft
abalone
abalones
abandon
abandoned
abandonedly
...

(each correct word on a new line)

- after that you must build your file into an index: http://wiki.apache.org/solr/SpellCheckComponent#spellcheck.build For the build, try this:

http://solr:8983/solr/select?q=*:*&spellcheck=true&spellcheck.build=true

After that you can use spellcheck in your searches, for example:

http://solr:8983/solr/select?q=bingo&spellcheck=true

Try this; if there are errors, post here.. P.S. Please read http://wiki.apache.org/solr/SpellCheckComponent for more information.
Re: Solr -File Based Spell Check
Are you sure you want spellcheck/autosuggest? Because what you're talking about almost sounds like synonyms. Best Erick On Mon, Dec 6, 2010 at 1:37 AM, rajini maski rajinima...@gmail.com wrote: How does the solr file based spell check work? How do we need to enter data in the spelling.txt...I am not clear about its functionality..If anyone know..Please reply. I want to index a word = Wear But while searching I search as =Dress I want to get results for Wear.. How do i apply this functionality.. Awaiting Reply
Re: Solr -File Based Spell Check
Yeah.. I want to use this spell-check only.. I want to create the dictionary myself and give it as input to Solr, because my indexes also have misspelled content, so I want Solr to refer to this file and not an auto-generated one. How do I get this done? I will try the spell check as suggested by Michael... One more main thing I wanted to know: how do I extract the dictionary generated by default? How do I read the .cfs files generated in the index folder? Please reply if you know anything related to this.. Awaiting reply On Mon, Dec 6, 2010 at 7:33 PM, Erick Erickson erickerick...@gmail.com wrote: Are you sure you want spellcheck/autosuggest? Because what you're talking about almost sounds like synonyms. Best Erick On Mon, Dec 6, 2010 at 1:37 AM, rajini maski rajinima...@gmail.com wrote: How does the solr file based spell check work? How do we need to enter data in the spelling.txt...I am not clear about its functionality..If anyone know..Please reply. I want to index a word = Wear But while searching I search as =Dress I want to get results for Wear.. How do i apply this functionality.. Awaiting Reply
Re: FastVectorHighlighter ignoring fragmenter parameter . . .
Koji, Thank you for the reply. Being something of a novice with Solr, I would be grateful if you could clarify my next steps. I infer from your reply that there is no current implementation yet contributed for the FVH similar to the regex fragmenter. Thus I need to write my own custom extensions of the FragmentsBuilder (http://lucene.apache.org/java/3_0_1/api/contrib-fast-vector-highlighter/org/apache/lucene/search/vectorhighlight/FragmentsBuilder.html) and FragListBuilder (http://lucene.apache.org/java/3_0_1/api/contrib-fast-vector-highlighter/org/apache/lucene/search/vectorhighlight/FragListBuilder.html) interfaces to take in and apply the regex. I would be happy to contribute back what I create. Appreciate whatever guidance you can offer, Christopher On 2:59 PM, Koji Sekiguchi wrote: (10/12/05 5:53), CRB wrote: Got the FVH to work in Solr 3.1 (or at least I presume I have, given I can see multi-color highlighting in the output.) But I am not able to get it to recognize the regex fragmenter. I get no change in output if I specify the fragmenter. In fact, I can even enter bogus names for the fragmenter and get no change in the output. Grateful for any suggestions. Settings and output below. Christopher Query: http://localhost:8983/solr/10k-Fragments/select?q=content%3Aliquidity&rows=100&fl=id%2Ccontent&qt=standard&hl.fl=content&hl.useFastVectorHighlighter=true&hl=true&hl.fragmentsBuilder=colored&hl.fragmenter=regex Christopher, Because the algorithm of the FVH is totally different from the (traditional) highlighter, the FVH doesn't look at hl.fragmenter and hl.formatter, but at hl.fragListBuilder and hl.fragmentsBuilder instead. I think your settings and request/response look good, except for hl.fragmenter=regex; the FVH simply ignores that parameter. Koji
Re: Query performance very slow even after autowarming
* Do you use the EdgeNGramFilter in the index analyzer only, or on the query side as well?
* What if you create an additional field first_letter (string) and put the first character/characters (multivalued?) there in your external processing code? Then during search you can filter all documents that start with the letter "a" using the filter query fq=first_letter:a. Would that solve your performance problems?
* It makes sense to specify what you are trying to achieve; then probably more people can help you with that.

On Fri, Dec 3, 2010 at 10:47 AM, johnnyisrael johnnyi.john...@gmail.com wrote: Hi, I am using EdgeNGramFilterFactory on Solr 1.4.1 for my indexing:

<filter class="solr.EdgeNGramFilterFactory" maxGramSize="100" minGramSize="1"/>

Each document has about 5 fields in it, and only one field is indexed with EdgeNGramFilterFactory. I have about 1.4 million documents in my index now and my index size is approx 296 MB. I made the field that is indexed with EdgeNGramFilterFactory the default search field. All my query responses are very slow, some of them taking more than 10 seconds to respond. Queries with single letters are still very slow: /select/?q=m So I tried query warming as follows:

<listener event="newSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <lst><str name="q">a</str></lst>
    <lst><str name="q">b</str></lst>
    <lst><str name="q">c</str></lst>
    ... (one <lst> entry per letter, through z) ...
    <lst><str name="q">z</str></lst>
  </arr>
</listener>

The same as above is done for firstSearcher as well. My cache settings are as follows:

<filterCache class="solr.LRUCache" size="16384" initialSize="4096" autowarmCount="4096"/>
<queryResultCache class="solr.LRUCache" size="16384" initialSize="4096" autowarmCount="1024"/>
<documentCache class="solr.LRUCache" size="16384" initialSize="16384"/>

Still, after query warming, a few single-character searches take up to 3 seconds to respond. Am I doing anything wrong in my cache or autowarm settings, or am I missing anything here? Thanks, Johnny
Re: dataimports response returns before done?
After issuing a dataimport, I've noticed Solr returns a response prior to finishing the import. Is this correct? Is there any way I can make Solr not return until it finishes? Yes, you can add synchronous=true to your request. But be aware that it could take a long time and you may see an HTTP timeout exception. If not, how do I ping for the status, whether it finished or not? See command=status On Fri, Dec 3, 2010 at 8:55 PM, Tri Nguyen tringuye...@yahoo.com wrote: Hi, After issuing a dataimport, I've noticed Solr returns a response prior to finishing the import. Is this correct? Is there any way I can make Solr not return until it finishes? If not, how do I ping for the status, whether it finished or not? thanks, tri
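Since the reply points at command=status, here is a rough client-side sketch (my own illustration, not code from the thread) of deciding when a DIH import has finished by parsing the status response. The sample XML bodies are abridged and the exact element layout is an assumption to verify against your Solr version; in practice you would poll http://host:8983/solr/dataimport?command=status in a loop with a short sleep.

```python
import xml.etree.ElementTree as ET

def dih_is_idle(status_xml: str) -> bool:
    """True if a DIH command=status response reports 'idle'."""
    root = ET.fromstring(status_xml)
    # DIH reports its state in a <str name="status"> element (assumed layout).
    for elem in root.iter("str"):
        if elem.get("name") == "status":
            return (elem.text or "").strip().lower() == "idle"
    return False

# Abridged, illustrative status responses:
busy = ('<response><lst name="responseHeader"><int name="status">0</int></lst>'
        '<str name="status">busy</str></response>')
idle = ('<response><lst name="responseHeader"><int name="status">0</int></lst>'
        '<str name="status">idle</str></response>')

print(dih_is_idle(busy), dih_is_idle(idle))  # False True
```

Note the responseHeader's own status field is an <int>, so scanning <str> elements only ever matches the DIH state.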
How to get all the search results?
Hi, First off, thanks to the group for guiding me to move from the default search handler to dismax. I have a question related to getting all the search results. In the past, with the default search handler, I was getting all the search results (8000) when I passed q=* as the search string, but with dismax I get only 16 results instead of 8000. How do I get all the search results using dismax? Do I need to configure anything to make * (asterisk) work? Thanks, Solr User
Re: Syncing 'delta-import' with 'select' query
Hey Juan, It seems that DataImportHandler is not the right tool for your scenario, and you'd be better off using the Solr XML update protocol. * http://wiki.apache.org/solr/UpdateXmlMessages You can still work around your outdated GUI view problem by calling DIH synchronously, adding synchronous=true to your request. But that won't solve the problem of two parallel requests from two users to a single DIH request handler, because DIH doesn't support that; if the previous request is still running, it bounces the second request. HTH, Alex On Fri, Dec 3, 2010 at 10:33 PM, Juan Manuel Alvarez naici...@gmail.com wrote: Hello everyone! I would like to ask you a question about DIH. I am using a database and DIH to sync against Solr, and a GUI to display and operate on the items retrieved from Solr. When I change the state of an item through the GUI, the following happens: a. The item is updated in the DB. b. A delta-import command is fired to sync the DB with Solr. c. The GUI is refreshed by making a query to Solr. My problem comes between (b) and (c). The delta-import operation is executed in a new thread, so my call returns immediately, refreshing the GUI before the Solr index is updated, causing the item state in the GUI to be outdated. I had two ideas so far: 1. Querying the status of the DIH after the delta-import operation and not returning until it is idle. The problem I see with this is that if other users execute delta-imports, the status will be busy until all operations are finished. 2. Use Zoie. The first problem is that configuring it is not as straightforward as it seems, so I don't want to spend more time trying it until I am sure that this will solve my issue. On the other hand, I think that I may suffer the same problem, since the delta-import is still firing in another thread, so I can't be sure it will be called fast enough. Am I pointing in the right direction, or is there another way to achieve my goal? Thanks in advance! Juan M.
Re: How to get all the search results?
Hello, shouldn't that query syntax be *:* ? Regards, -- Savvas. On 6 December 2010 16:10, Solr User solr...@gmail.com wrote: Hi, First off thanks to the group for guiding me to move from default search handler to dismax. I have a question related to getting all the search results. In the past with the default search handler I was getting all the search results (8000) if I pass q=* as search string but with dismax I was getting only 16 results instead of 8000 results. How to get all the search results using dismax? Do I need to configure anything to make * (asterisk) work? Thanks, Solr User
Re: How to get all the search results?
With dismax, I didn't get any results with *:*. I did the query with these options (q is empty) and got the full row count: q=&rows=0&qt=dismax I have q.alt defined in my dismax handler as *:*; I don't know if that is required or not. Shawn On 12/6/2010 9:17 AM, Savvas-Andreas Moysidis wrote: Hello, shouldn't that query syntax be *:* ? Regards, -- Savvas. On 6 December 2010 16:10, Solr User solr...@gmail.com wrote: Hi, First off thanks to the group for guiding me to move from default search handler to dismax. I have a question related to getting all the search results. In the past with the default search handler I was getting all the search results (8000) if I pass q=* as search string but with dismax I was getting only 16 results instead of 8000 results. How to get all the search results using dismax? Do I need to configure anything to make * (asterisk) work? Thanks, Solr User
Re: How to get all the search results?
For dismax, just pass an empty query (q=) or no q parameter at all. Hello, shouldn't that query syntax be *:* ? Regards, -- Savvas. On 6 December 2010 16:10, Solr User solr...@gmail.com wrote: Hi, First off thanks to the group for guiding me to move from default search handler to dismax. I have a question related to getting all the search results. In the past with the default search handler I was getting all the search results (8000) if I pass q=* as search string but with dismax I was getting only 16 results instead of 8000 results. How to get all the search results using dismax? Do I need to configure anything to make * (asterisk) work? Thanks, Solr User -- http://jetwick.com twitter search prototype
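To make the q.alt point from this thread concrete, a dismax handler in solrconfig.xml might be declared like the sketch below; the handler name and qf field list are placeholders I made up, not taken from the thread. With q empty or absent, q.alt supplies the query, and *:* matches every document:

```xml
<!-- Hypothetical sketch of a dismax handler with a match-all fallback -->
<requestHandler name="dismax" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">dismax</str>
    <!-- placeholder field list -->
    <str name="qf">name description</str>
    <!-- used when q is empty or absent; *:* matches all documents -->
    <str name="q.alt">*:*</str>
  </lst>
</requestHandler>
```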
Index version on slave nodes
Hi, The indexversion command in the replicationHandler on slave nodes returns 0 for indexversion and generation while the details command does return the correct information. I haven't found an existing ticket on this one although https://issues.apache.org/jira/browse/SOLR-1573 has similarities. Cheers, -- Markus Jelsma - CTO - Openindex http://www.linkedin.com/in/markus17 050-8536620 / 06-50258350
Stored field value modification
Hello, Is it possible to manipulate the value of a field before it is stored? I'm indexing a database where some fields contain raw HTML, including named character entities. Using solr.HTMLStripCharFilterFactory on the index analyzer results in this HTML being correctly stripped, and named character entities replaced by the corresponding characters, in the index (as verified when searching, and with Luke). But the stored values of the documents are stored unmodified, so the result sets, including highlights, contain HTML tags (that are escaped) and entities (where the leading '&' is also escaped), which makes handling the results quite difficult. So, is it possible to apply some filters to the data before it is stored in the non-indexed fields? I couldn't find a part of the documentation that said whether it was possible or not; I did find this message in the archives of this list: From: Noble Paul Sent: Tuesday, March 31, 2009 5:41 PM Subject: Re: indexed fields vs stored fields indexed = can be searched (mean you can use this to query). This undergoes tokenization filter etc stored = can be retrieved. No modification to the data. This is stored verbatim which seems to say that it is not possible; but maybe things have changed since then? Any other idea? Given that: - I have zero control over what is stored in the database - using the Solr XML update protocol I could probably transform the data before sending it - ... but I'd much rather continue using DataImportHandler to access the database Thanks, Regards, EB
Re: Stored field value modification
Hi, You can create a custom update request processor [1] to strip unwanted input as it is about to enter the index. [1]: http://wiki.apache.org/solr/UpdateRequestProcessor Cheers, On Monday 06 December 2010 17:36:09 Emmanuel Bégué wrote: Hello, Is it possible to manipulate the value of a field before it is stored? I'm indexing a database where some fields contain raw HTML, including named character entities. Using solr.HTMLStripCharFilterFactory on the index analyzer results in this HTML being correctly stripped, and named character entities replaced by the corresponding characters, in the index (as verified when searching, and with Luke). But the stored values of the documents are stored unmodified, so the result sets, including highlights, contain HTML tags (that are escaped) and entities (where the leading '&' is also escaped), which makes handling the results quite difficult. So, is it possible to apply some filters to the data before it is stored in the non-indexed fields? I couldn't find a part of the documentation that said whether it was possible or not; I did find this message in the archives of this list: From: Noble Paul Sent: Tuesday, March 31, 2009 5:41 PM Subject: Re: indexed fields vs stored fields indexed = can be searched (mean you can use this to query). This undergoes tokenization filter etc stored = can be retrieved. No modification to the data. This is stored verbatim which seems to say that it is not possible; but maybe things have changed since then? Any other idea? Given that: - I have zero control over what is stored in the database - using the Solr XML update protocol I could probably transform the data before sending it - ... but I'd much rather continue using DataImportHandler to access the database Thanks, Regards, EB -- Markus Jelsma - CTO - Openindex http://www.linkedin.com/in/markus17 050-8536620 / 06-50258350
Taxonomy and Faceting
I have been digging through the user lists for Solr and Nutch, as well as reading lots of blogs, etc. I have yet to find a clear answer (maybe there is none). I am trying to find the best way ahead for choosing a technology that will allow the ability to use a large taxonomy for classifying structured and unstructured data and then displaying those categorizations as facets to the user during search. There seem to be several approaches, some of which make use of index time for encoding the terms found in the text, but I have seen no mention of HOW to get those terms from the text. Some sort of text classification software, I am assuming. If this is true, are there any good open source engines that can process text against a taxonomy? The other approach seems to be two patches being developed for Solr 3.0, 792 and 64. Again, I think you would have to have some sort of an engine to give you this information that could then be added at index time. I have also seen some interesting literature on using Drupal and the Solr module. My current architecture uses Nutch (1.2) for crawling, solrindex for indexing (Solr 1.4.1), and Ajax Solr for my UI. I have also seen some talk in webinars/etc. from Lucid Imagination about upcoming development on Native Taxonomy Facets; any idea where that development stands? I have to use the most stable version of Solr/Nutch/Lucene possible for my implementation, because, unfortunately, once I choose, going back will be next to impossible for years to come! Thanks!
Re: How to get all the search results?
Ahhh, right.. in dismax, you pre-define the fields that will be searched upon, is that right? Is it also true that the query is parsed and all special characters escaped? On 6 December 2010 16:25, Peter Karich peat...@yahoo.de wrote: for dismax just pass an empty query all q= or none at all Hello, shouldn't that query syntax be *:* ? Regards, -- Savvas. On 6 December 2010 16:10, Solr User solr...@gmail.com wrote: Hi, First off thanks to the group for guiding me to move from default search handler to dismax. I have a question related to getting all the search results. In the past with the default search handler I was getting all the search results (8000) if I pass q=* as search string but with dismax I was getting only 16 results instead of 8000 results. How to get all the search results using dismax? Do I need to configure anything to make * (asterisk) work? Thanks, Solr User -- http://jetwick.com twitter search prototype
Re: Stored field value modification
- I have zero control over what is stored in the database - using the Solr XML update protocol i could probably transform the data before sending it - ... but I'd much rather continue using DataImportHandler to access the database If you are already using DIH, http://wiki.apache.org/solr/DataImportHandler#HTMLStripTransformer can do what you want.
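For reference, a DIH entity using that transformer might look like the sketch below; the entity, table, and column names are placeholders I made up, and per the wiki the stripHTML="true" flag goes on the field whose markup should be removed before the value reaches the index:

```xml
<!-- Hypothetical example: strip HTML from the "body" column during import;
     names are illustrative, not from this thread. -->
<entity name="doc" transformer="HTMLStripTransformer"
        query="SELECT id, body FROM docs">
  <field column="id"/>
  <field column="body" stripHTML="true"/>
</entity>
```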
Re: Index version on slave nodes
I think this is expected behavior. You have to issue the details command to get the real indexversion for slave machines. Thanks, Xin On Mon, Dec 6, 2010 at 11:26 AM, Markus Jelsma markus.jel...@openindex.io wrote: Hi, The indexversion command in the replicationHandler on slave nodes returns 0 for indexversion and generation while the details command does return the correct information. I haven't found an existing ticket on this one although https://issues.apache.org/jira/browse/SOLR-1573 has similarities. Cheers, -- Markus Jelsma - CTO - Openindex http://www.linkedin.com/in/markus17 050-8536620 / 06-50258350
Dynamically filtering results based on score
I've seen references to score filtering in the list archives with frange being the suggested solution, but I have a slightly different problem that I don't think frange will solve. I basically want to drop a portion of the results based on their score in relation to the other scores in the result set. I've found that some queries produce poor results because they are matching solely based on a field with a very low boost (a product description in my case). Looking at the scores it's very obvious when the result set transitions from good matches to just those pulled in by the description. I've come up with a solution on the client side of things, but need to move this to running within solr because it doesn't play well with facets (facet data is still returned for products that I'm stripping out). The basic approach is to keep a running average of the highest scores, and when a document's score is off by an order of magnitude drop it and everything else (assuming everything is sorted by score desc). This approach seems to work well because in some cases when users just enter 'long tail' terms I want results to still be returned, which a static lower bound in frange won't accommodate. Does anyone have any suggestions for an approach to this? It doesn't look like a filter has access to the scores. It doesn't look like I can subclass SolrIndexSearcher as a number of its methods are private and can't be overridden. It doesn't look like I can modify the ResponseBuilder's results docset after the query but before faceting is applied because I don't have access to the scorer (at least in a SearchComponent). I'm out of ideas for now. Thanks for any assistance, Bryan
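For what it's worth, the running-average cutoff described above can be sketched in a few lines on the client side (my own illustration, not code from the thread; the order-of-magnitude factor is a tunable assumption). The open problem of applying it inside Solr before faceting remains:

```python
def truncate_by_score(hits, factor=10.0):
    """Keep a running average of the scores seen so far (hits sorted by
    score descending) and cut the list at the first hit whose score falls
    more than `factor` times below that average; the tail is dropped."""
    kept, total = [], 0.0
    for doc, score in hits:
        if kept and score < (total / len(kept)) / factor:
            break  # order-of-magnitude drop: discard this and everything after
        kept.append((doc, score))
        total += score
    return kept

results = [("a", 9.1), ("b", 8.7), ("c", 8.2), ("d", 0.4), ("e", 0.1)]
print(truncate_by_score(results))  # drops "d" and "e"
```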
DIH - rdbms to index confusion
I'm new to Solr (and indexing in general) and am having a hard time making the transition from RDBMS to indexing in terms of the DIH/data-config.xml file. I've successfully created a working index (so far) for the simple queries in my DB, but I'm struggling to add a more complex query. When I say simple I mean one or two tables, and when I say complex I'm referring to 3 plus. I have a table that contains the data values I want returned when someone makes a search. This table has, in addition to the data values, 3 ids (FKs) pointing to the data/info that I want the users to be able to search on (while also returning the data values). The general RDBMS query would be something like:

select f.value, g.gar_name, c.cat_name
from foo f, gar g, cat c, dub d
where g.id=f.gar_id and c.id=f.cat_id and d.id=f.dub_id

I tried following the item_category entity used in the DIH example here: http://wiki.apache.org/solr/DataImportHandler#Full_Import_Example and am struggling to get it to work. My current attempt looks like (entity translated to the above RDBMS query):

<dataConfig>
  <dataSource ... />
  <document>
    <entity ...simple query, working for main entity, cat field... >
      <entity name="foo" query="SELECT gar_id FROM foo WHERE cat_id='${cat.id}'">
        <entity name="gar" query="SELECT name FROM gar WHERE id='${f.gar_id}'">
          <field column="name" name="g_name"/>
        </entity>
        <entity name="dub" query="SELECT name FROM dub WHERE id='${f.dub_id}'">
          <field column="name" name="dub_name"/>
        </entity>
        <field column="value" name="f_value"/>
      </entity>
      ...other working entities...
    </entity>
  </document>
</dataConfig>

I'm getting some of the data/info back, but it's not what I am expecting. I'm hoping for/expecting a document/record to look like:

cat_name 1 : g_name 1 : dub_name 1 : f_value 1
cat_name 1 : g_name 1 : dub_name 2 : f_value 2
cat_name 1 : g_name 2 : dub_name 1 : f_value 1
cat_name 1 : g_name 2 : dub_name 2 : f_value 2
cat_name 2 : g_name 1 : dub_name 1 : f_value 1
cat_name 2 : g_name 1 : dub_name 2 : f_value 2
cat_name 2 : g_name 2 : dub_name 1 : f_value 1
cat_name 2 : g_name 2 : dub_name 2 : f_value 2

(All but the values are showing up in the index in some form.) Any suggestions on where my logic is failing? Thanks
Re: Stored field value modification
2010/12/6 Ahmet Arslan iori...@yahoo.com: If you are already using DIH, http://wiki.apache.org/solr/DataImportHandler#HTMLStripTransformer can do what you want. Indeed it can. Many thanks.
Re: Question about Solr Fieldtypes, Chaining of Tokenizers
Yes, that's my conclusion as well, Grant. As for the example output: The symposium of TgThe(RX3fg+and) gene studies should end up tokenizing to: symposium tg the rx3fg and gene studi (assuming I guessed right on the stemming). Anyhow, thanks for the confirmation, guys. Matt On 12/4/2010 8:18 PM, Grant Ingersoll wrote: Could you expand on your example and show the output you want? FWIW, you could simply write a token filter that does the same thing as the WhitespaceTokenizer. -Grant On Dec 3, 2010, at 1:14 PM, Matthew Hall wrote: Hey folks, I'm working with a fairly specific set of requirements for our corpus that needs a somewhat tricky text type for both indexing and searching. The chain currently looks like this:

<tokenizer class="solr.WhitespaceTokenizerFactory"/>
<filter class="solr.PatternReplaceFilterFactory" pattern="(.*?)(\p{Punct}*)$" replacement="$1"/>
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
<filter class="solr.SnowballPorterFilterFactory" language="English" protected="protwords.txt"/>
<filter class="solr.PatternReplaceFilterFactory" pattern="\p{Punct}" replacement=" "/>
<tokenizer class="solr.WhitespaceTokenizerFactory"/>

Now you will notice that I'm trying to add a second tokenizer to this chain at the very end; this is due to the final replacement of punctuation with whitespace. At that point I'd like to further break up these tokens into smaller tokens. The reason for this is that we have a mixed normal-English and scientific corpus. For example, you could expect strings like The symposium of TgThe(RX3fg+and) gene studies being added to the index, and parts of those phrases being searched on. We want to be able to remove the stopwords in the mostly English parts of these types of statements, which the whitespace tokenizer, followed by removing trailing punctuation, followed by the stop filter, takes care of. We do not want to remove references to genetic information contained in allele symbols and the like. Sadly, as far as I can tell, you cannot chain tokenizers in schema.xml, so does anyone have suggestions on how this could be accomplished? Oh, and let me add that the WordDelimiterFilter comes really close to what I want, but since we are unwilling to move our Solr version to trunk (we are on 1.4.x at the moment), the inability to turn off the automatic phrase queries makes it a no-go. We need to be able to make searches on left/right match right/left. My searches through the old material on this subject aren't really showing me much except some advice on using the copyField attribute. But my understanding is that this will simply take your original input to the field and then analyze it in two different ways depending on the field definitions. It would be very nice if it were copying the already-analyzed version of the text... but that's not what it's doing, right? Thanks for any advice on this matter. Matt -- Grant Ingersoll http://www.lucidimagination.com
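As a sanity check on the chain quoted in this thread, here is a rough Python simulation I put together (not Solr code) of every step except the Snowball stemmer. Note it produces "tgthe" as a single token rather than "tg the", because splitting TgThe on the case transition would additionally need something like the WordDelimiterFilter; pure whitespace and punctuation handling cannot do it:

```python
import re

STOPWORDS = {"the", "of", "and", "a", "an"}  # stand-in for stopwords.txt

def analyze(text):
    tokens = text.split()                                   # WhitespaceTokenizer
    tokens = [re.sub(r"[^\w\s]+$", "", t) for t in tokens]  # strip trailing punctuation
    tokens = [t.lower() for t in tokens]                    # LowerCaseFilter
    tokens = [t for t in tokens if t not in STOPWORDS]      # StopFilter
    out = []
    for t in tokens:                                        # punctuation -> space, re-split
        out.extend(re.sub(r"[^\w\s]", " ", t).split())
    return out

print(analyze("The symposium of TgThe(RX3fg+and) gene studies"))
# ['symposium', 'tgthe', 'rx3fg', 'and', 'gene', 'studies']
```

The inner "and" survives even though it is a stopword, because the stop filter runs before the final re-split; that is exactly the behavior the poster wants for allele symbols.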
Using Saxon 9 as a response writer with Solr 3.1 . . ?
Has anyone been able to get Saxon 9 working with Solr 3.1? I was following the wiki page (http://wiki.apache.org/solr/XsltResponseWriter), placing all the saxon-*.jars in Jetty's lib/ext folder and starting with:

java -Djavax.xml.transform.TransformerFactory=net.sf.saxon.TransformerFactoryImpl -jar start.jar

But I get an ugly dump of errors from Jetty:

2010-12-06 13:29:16.515::WARN: failed SolrRequestFilter
java.lang.NoSuchMethodError: net.sf.saxon.dom.DOMEnvelope.getInstance()Lnet/sf/saxon/dom/DOMEnvelope;
    at net.sf.saxon.java.JavaPlatform.initialize(JavaPlatform.java:43)
    at net.sf.saxon.Configuration.<init>(Configuration.java:392)
    at net.sf.saxon.Configuration.<init>(Configuration.java:311)
    at net.sf.saxon.xpath.XPathFactoryImpl.makeConfiguration(XPathFactoryImpl.java:41)
    at net.sf.saxon.xpath.XPathFactoryImpl.<init>(XPathFactoryImpl.java:26)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(Unknown Source)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(Unknown Source)
    at java.lang.reflect.Constructor.newInstance(Unknown Source)
    at java.lang.Class.newInstance0(Unknown Source)
    at java.lang.Class.newInstance(Unknown Source)
    at javax.xml.xpath.XPathFactoryFinder.loadFromService(Unknown Source)
    at javax.xml.xpath.XPathFactoryFinder._newFactory(Unknown Source)
    at javax.xml.xpath.XPathFactoryFinder.newFactory(Unknown Source)
    at javax.xml.xpath.XPathFactory.newInstance(Unknown Source)
    at javax.xml.xpath.XPathFactory.newInstance(Unknown Source)
    at org.apache.solr.core.Config.<clinit>(Config.java:50)
    at org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:68)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(Unknown Source)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(Unknown Source)
    at java.lang.reflect.Constructor.newInstance(Unknown Source)
    at java.lang.Class.newInstance0(Unknown Source)
    at java.lang.Class.newInstance(Unknown Source)
    at org.mortbay.jetty.servlet.Holder.newInstance(Holder.java:153)
    at org.mortbay.jetty.servlet.FilterHolder.doStart(FilterHolder.java:94)
    at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:40)
    at org.mortbay.jetty.servlet.ServletHandler.initialize(ServletHandler.java:594)
    at org.mortbay.jetty.servlet.Context.startContext(Context.java:139)
    at org.mortbay.jetty.webapp.WebAppContext.startContext(WebAppContext.java:1218)
    at org.mortbay.jetty.handler.ContextHandler.doStart(ContextHandler.java:500)
    at org.mortbay.jetty.webapp.WebAppContext.doStart(WebAppContext.java:448)
    at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:40)
    at org.mortbay.jetty.handler.HandlerCollection.doStart(HandlerCollection.java:147)
    at org.mortbay.jetty.handler.ContextHandlerCollection.doStart(ContextHandlerCollection.java:161)
    at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:40)
    at org.mortbay.jetty.handler.HandlerCollection.doStart(HandlerCollection.java:147)
    at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:40)
    at org.mortbay.jetty.handler.HandlerWrapper.doStart(HandlerWrapper.java:117)
    at org.mortbay.jetty.Server.doStart(Server.java:210)
    at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:40)
    at org.mortbay.xml.XmlConfiguration.main(XmlConfiguration.java:929)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
    at java.lang.reflect.Method.invoke(Unknown Source)
    at org.mortbay.start.Main.invokeMain(Main.java:183)
    at org.mortbay.start.Main.start(Main.java:497)
    at org.mortbay.start.Main.main(Main.java:115)
Re: FastVectorHighlighter ignoring fragmenter parameter . . .
Koji, Thank you for the reply. Being something of a novice with Solr, I would be grateful if you could clarify my next steps. I infer from your reply that there is no current implementation yet contributed for the FVH similar to the regex fragmenter. Thus I need to write my own custom extensions of the FragmentsBuilder and FragListBuilder interfaces to take in and apply the regex. I would be happy to contribute back what I create. Appreciate whatever guidance you can offer, Christopher
Re: DIH - rdbms to index confusion
I have a table that contains the data values I'm wanting to return when someone makes a search. This table has, in addition to the data values, 3 ids (FKs) pointing to the data/info that I'm wanting the users to be able to search on (while also returning the data values). The general rdbms query would be something like:

select f.value, g.gar_name, c.cat_name
from foo f, gar g, cat c, dub d
where g.id=f.gar_id and c.id=f.cat_id and d.id=f.dub_id

You can put this general rdbms query as-is into a single DIH entity - no need to split it. You would probably want to split it if your main table had a one-to-many relation with other tables, so that you couldn't retrieve all the data and have a single result-set row per Solr document.
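A minimal sketch of what that single-entity setup could look like in data-config.xml (the connection details and Solr field names here are illustrative assumptions, not taken from the original post):

```xml
<dataConfig>
  <!-- Connection details below are placeholders -->
  <dataSource type="JdbcDataSource" driver="com.mysql.jdbc.Driver"
              url="jdbc:mysql://localhost/mydb" user="user" password="pass"/>
  <document>
    <!-- One entity, one joined query: each result-set row becomes one Solr doc -->
    <entity name="foo"
            query="select f.value, g.gar_name, c.cat_name
                   from foo f, gar g, cat c, dub d
                   where g.id=f.gar_id and c.id=f.cat_id and d.id=f.dub_id">
      <field column="value" name="value"/>
      <field column="gar_name" name="gar_name"/>
      <field column="cat_name" name="cat_name"/>
    </entity>
  </document>
</dataConfig>
```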
Re: Syncing 'delta-import' with 'select' query
Alex: Thanks for the quick reply. When you say "two parallel requests from two users to single DIH request handler", what do you mean by request handler? Are you referring to the HTTP request? Would that mean that if I make the request from different HTTP sessions it would work? Cheers! Juan M. On Mon, Dec 6, 2010 at 1:12 PM, Alexey Serba ase...@gmail.com wrote: Hey Juan, It seems that DataImportHandler is not the right tool for your scenario, and you'd be better off using the Solr XML update protocol. * http://wiki.apache.org/solr/UpdateXmlMessages You can still work around your outdated-GUI-view problem by calling DIH synchronously, adding synchronous=true to your request. But it won't solve the problem of two parallel requests from two users to a single DIH request handler, because DIH doesn't support that; if a previous request is still running it bounces the second request. HTH, Alex On Fri, Dec 3, 2010 at 10:33 PM, Juan Manuel Alvarez naici...@gmail.com wrote: Hello everyone! I would like to ask you a question about DIH. I am using a database and DIH to sync against Solr, and a GUI to display and operate on the items retrieved from Solr. When I change the state of an item through the GUI, the following happens: a. The item is updated in the DB. b. A delta-import command is fired to sync the DB with Solr. c. The GUI is refreshed by making a query to Solr. My problem comes between (b) and (c). The delta-import operation is executed in a new thread, so my call returns immediately, refreshing the GUI before the Solr index is updated, causing the item state in the GUI to be outdated. I have had two ideas so far: 1. Querying the status of the DIH after the delta-import operation and not returning until it is idle. The problem I see with this is that if other users execute delta-imports, the status will be busy until all operations are finished. 2. Use Zoie.
The first problem is that configuring it is not as straightforward as it seems, so I don't want to spend more time trying it until I am sure that this will solve my issue. On the other hand, I think that I may suffer from the same problem, since the delta-import is still firing in another thread, so I can't be sure it will be called fast enough. Am I pointing in the right direction, or is there another way to achieve my goal? Thanks in advance! Juan M.
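For reference, the synchronous call Alex describes can be combined with polling the handler status; a sketch, assuming DIH is registered at /dataimport on the default port:

```shell
# Fire the delta-import and block until it completes (synchronous=true),
# committing in the same request (commit=true). The handler path is an assumption.
curl "http://localhost:8983/solr/dataimport?command=delta-import&commit=true&synchronous=true"

# Alternatively, poll the handler until it reports status "idle" before
# refreshing the GUI:
curl "http://localhost:8983/solr/dataimport?command=status"
```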
Re: Syncing 'delta-import' with 'select' query
When you say "two parallel requests from two users to single DIH request handler", what do you mean by request handler? I mean DIH. Are you referring to the HTTP request? Would that mean that if I make the request from different HTTP sessions it would work? No. It means that when you have two users who simultaneously change two objects in the UI, you have two HTTP requests to DIH to pull changes from the db into the Solr index. If the second request comes while the first is not fully processed, the second request will be rejected. As a result your index would be outdated (w/o the latest update) until the next update.
Re: Taxonomy and Faceting
I'm unsure but maybe you mean something like clustering? Then Carrot2 can do this (at index time, I think): http://search.carrot2.org/stable/search?query=jetwick&view=visu (There is a plugin for Solr.) Or do you already know the categories of your docs, e.g. you already have a category tree and associated documents? Regards, Peter. I have been digging through the user lists for Solr and Nutch, as well as reading lots of blogs, etc. I have yet to find a clear answer (maybe there is none). I am trying to find the best way ahead for choosing a technology that will allow the ability to use a large taxonomy for classifying structured and unstructured data, and then displaying those categorizations as facets to the user during search. There seem to be several approaches, some of which make use of index time for encoding the terms found in the text, but I have seen no mention of HOW to get those terms from the text. Some sort of text classification software, I am assuming. If this is true, are there any good open source engines that can process text against a taxonomy? The other approach seems to be two patches being developed for Solr 3.0, 792 and 64. Again, I think you would have to have some sort of an engine to give you this information that could then be added at index time. I have also seen some interesting literature on using Drupal and the Solr module. My current architecture uses Nutch (1.2) for crawling, solrindex for indexing (Solr 1.4.1), and Ajax Solr for my UI. I have also seen some talk in webinars/etc from Lucid Imagination about upcoming development on Native Taxonomy Facets; any idea where that development stands? I have to use the most stable version of Solr/Nutch/Lucene possible for my implementation, because, unfortunately, once I choose, going back will be next to impossible for years to come! Thanks!
Re: Taxonomy and Faceting
Thanks for the quick response! I was thinking more about the idea of having both structured and unstructured data coming into a system to be indexed/searched. I would like these documents to be processed by some sort of entity/keyword/semantic processing. I have a well-defined taxonomy for my organization (it is quite large) and at the moment we use RetrievalWare to give keyword/classification suggestions. This does NOT work well, though, and RetrievalWare is pretty much useless to us. I want a way to do this process either at index time or search time. All documents should be processed against this taxonomy. I do not want the user to be able to nominate keywords; it must happen automatically. I am assuming it is only natural for these keywords/taxonomy entities to show up as hierarchical facets? From what I can tell, there is no way to tell Solr: here is my taxonomy, classify my documents and give me back facets and facet counts. -- View this message in context: http://lucene.472066.n3.nabble.com/Taxonomy-and-Faceting-tp2028442p2029636.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Syncing 'delta-import' with 'select' query
Thanks for all the help! It is really appreciated. For now, I can afford the parallel-requests problem, but when I put synchronous=true in the delta-import, the call still returns with outdated items. Examining the log, it seems that the commit operation is executed after the operation returns, even when I am using commit=true. Is it possible to also execute the commit synchronously? Cheers! Juan M.
high CPU usage and SelectChannelConnector threads used a lot
Hi, I'm using Solr and have been load testing it for around 4 days. We use the SolrJ client to communicate with a separate Jetty-based Solr process on the same box. After a few days Solr's CPU% is now consistently at or above 100% (multiple processors available) and the application using it is mostly not responding because it times out talking to Solr. I connected VisualVM to the Solr JVM and found that out of the many btpool-# threads there are 4 that are pretty much stuck in the running state 100% of the time. Their names are:

btpool0-1-Acceptor1 SelectChannelConnector @0.0.0.0:9983
btpool0-2-Acceptor2 SelectChannelConnector @0.0.0.0:9983
btpool0-3-Acceptor3 SelectChannelConnector @0.0.0.0:9983
btpool0-9-Acceptor0 SelectChannelConnector @0.0.0.0:9983

The stacks are all the same:

btpool0-2 - Acceptor2 SelectChannelConnector @ 0.0.0.0:9983 - Thread t...@27
   java.lang.Thread.State: RUNNABLE
        at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
        at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:210)
        at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:65)
        at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:69)
        - locked 106a644 (a sun.nio.ch.Util$1)
        - locked 18dd381 (a java.util.Collections$UnmodifiableSet)
        - locked 38d07d (a sun.nio.ch.EPollSelectorImpl)
        at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:80)
        at org.mortbay.io.nio.SelectorManager$SelectSet.doSelect(SelectorManager.java:419)
        at org.mortbay.io.nio.SelectorManager.doSelect(SelectorManager.java:169)
        at org.mortbay.jetty.nio.SelectChannelConnector.accept(SelectChannelConnector.java:124)
        at org.mortbay.jetty.AbstractConnector$Acceptor.run(AbstractConnector.java:516)
        at org.mortbay.thread.BoundedThreadPool$PoolThread.run(BoundedThreadPool.java:442)

   Locked ownable synchronizers:
        - None

All of the other idle thread-pool threads are just waiting for new tasks. The active threads never seem to change; it's always these 4. The selector channel appears to be in the Jetty code, receiving requests from our other process through the SolrJ client. Does anyone know what this might mean or how to address it? Are these running all the time because they are blocked on IO, so not actually consuming CPU? If so, what else might be? Is there a better way to figure out what is pinning the CPU? Some more info that might be useful:

32-bit machine (I know, I know)
2.7GB of RAM for the Solr process, ~2.5 is used
According to VisualVM around 25% of CPU time is spent in GC, with the rest in the application.

Thanks for the help. John
Re: DIH - rdbms to index confusion
I'm not understanding this response. My main table does have a one-to-many relationship with the other tables. What should I be anticipating/wanting for each document if I want to return the values to the user while allowing them to search on the other terms? Thanks. -- View this message in context: http://lucene.472066.n3.nabble.com/DIH-rdbms-to-index-confusion-tp2028543p2030456.html Sent from the Solr - User mailing list archive at Nabble.com.
only index synonyms
Hi, Can the following use case be achieved?

Value to be analysed at index time: "this is a pretty line of text"
Synonym list: pretty => scenic, text => words
Value placed in the index: "scenic words"

That is to say, only the matching synonyms. Basically I want to produce a normalised set of phrases for faceting. Cheers Lee C
Re: high CPU usage and SelectChannelConnector threads used a lot
Hi John, sounds like this bug in NIO: http://jira.codehaus.org/browse/JETTY-937 http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6403933 I think recent versions of Jetty work around this bug, or maybe try the non-NIO socket connector. Kent
Re: only index synonyms
See: http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.SynonymFilterFactory with the => syntax; I think that's what you're looking for. Best Erick On Mon, Dec 6, 2010 at 6:34 PM, lee carroll lee.a.carr...@googlemail.com wrote: Hi, Can the following use case be achieved? Value to be analysed at index time: "this is a pretty line of text". Synonym list: pretty => scenic, text => words. Value placed in the index: "scenic words". That is to say, only the matching synonyms. Basically I want to produce a normalised set of phrases for faceting. Cheers Lee C
Re: FastVectorHighlighter ignoring fragmenter parameter . . .
(10/12/06 23:52), CRB wrote: Koji, Thank you for the reply. Being something of a novice with Solr, I would be grateful if you could clarify my next steps. I infer from your reply that there is no current implementation yet contributed for the FVH similar to the regex fragmenter. Thus I need to write my own custom extensions of the FragmentsBuilder (http://lucene.apache.org/java/3_0_1/api/contrib-fast-vector-highlighter/org/apache/lucene/search/vectorhighlight/FragmentsBuilder.html) and FragListBuilder (http://lucene.apache.org/java/3_0_1/api/contrib-fast-vector-highlighter/org/apache/lucene/search/vectorhighlight/FragListBuilder.html) interfaces to take in and apply the regex. I would be happy to contribute back what I create. Appreciate whatever guidance you can offer, Christopher Christopher, Thank you for being interested in FVH! As I'm not sure a regex-fragmenter-like function can be implemented for FVH, I cannot give you any advice. Sorry about that. Basically, contributions back are always welcome! Thank you, Koji -- http://www.rondhuit.com/en/
Re: Taxonomy and Faceting
That is correct. Solr is a search engine, not a text analysis engine. There are a few open source text analysis systems: Weka, OpenNLP, UIMA. Someone is working on integrating UIMA with Solr: https://issues.apache.org/jira/browse/SOLR-2129 But you should generally assume you will have a batch processing pass over the data before indexing it. -- Lance Norskog goks...@gmail.com
Solr Newbie - need a point in the right direction
Hi, First time poster here - I'm not entirely sure where I need to look for this information. What I'm trying to do is extract some (presumably) structured information from non-uniform data (eg, prices from a nutch crawl) that needs to show in search queries, and I've come up against a wall. I've been unable to figure out where is the best place to begin. I had a look through the solr wiki and did a search via Lucid's search tool and I'm guessing this is handled at index time through my schema? But I've also seen dismax being thrown around as a possible solution and this has confused me. Basically, if you guys could point me in the right direction for resources (even as much as saying, you need X, it's over there) that would be a huge help. Cheers Mark
Out of memory error
Hi, When I try to import data using DIH, I get an out-of-memory error. Below are the configurations I have. Database: MySQL. OS: Windows. No. of documents: 15525532. In db-config.xml I set the batch size to -1. The Solr server is running on a Linux machine with Tomcat. I set the Tomcat arguments as ./startup.sh -Xms1024M -Xmx2048M. Does anybody have an idea where things are going wrong? Regards, JS -- View this message in context: http://lucene.472066.n3.nabble.com/Out-of-memory-error-tp2031761p2031761.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Out of memory error
Batch size -1??? Strange, but could be a problem. Note also that you can't provide parameters to the default startup.sh command; you should modify setenv.sh instead.
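A sketch of the setenv.sh approach the reply suggests (paths and values are illustrative): Tomcat's catalina.sh sources bin/setenv.sh if it exists, so heap flags belong there rather than as arguments to startup.sh, which ignores them.

```shell
# $CATALINA_HOME/bin/setenv.sh -- sourced automatically by catalina.sh
# Heap sizes below are examples; tune for your machine.
export CATALINA_OPTS="-Xms1024m -Xmx2048m"
```

As an aside, batchSize="-1" in the DIH JDBC data source is the documented way to make the MySQL driver stream rows instead of buffering the whole result set in memory, so with 15M rows that setting by itself is unlikely to be the cause.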
Re: Problem with DIH delta-import delete.
Thanks Koji. The problem seems to be that the template transformer is not used when a delete is performed.

...
Dec 7, 2010 7:19:43 AM org.apache.solr.handler.dataimport.DocBuilder collectDelta
INFO: Completed ModifiedRowKey for Entity: entry rows obtained : 0
Dec 7, 2010 7:19:43 AM org.apache.solr.handler.dataimport.DocBuilder collectDelta
INFO: Completed DeletedRowKey for Entity: entry rows obtained : 1223
Dec 7, 2010 7:19:43 AM org.apache.solr.handler.dataimport.DocBuilder collectDelta
INFO: Completed parentDeltaQuery for Entity: entry
Dec 7, 2010 7:19:43 AM org.apache.solr.handler.dataimport.DocBuilder deleteAll
INFO: Deleting stale documents
Dec 7, 2010 7:19:43 AM org.apache.solr.handler.dataimport.SolrWriter deleteDoc
INFO: Deleting document: 787
Dec 7, 2010 7:19:43 AM org.apache.solr.handler.dataimport.SolrWriter deleteDoc
INFO: Deleting document: 786
...

There are entries with ids 787 and 786 in the database, and those are marked as deleted. The query returns the right number of deleted documents and the right rows from the database, but the delete fails because Solr is using the plain numeric id when deleting the document. The same happens with blogs also. Matti 2010/12/4 Koji Sekiguchi k...@r.email.ne.jp: (10/11/17 20:18), Matti Oinas wrote: Solr does not delete documents from the index although delta-import says it has deleted n documents from the index. I'm using version 1.4.1.
The schema looks like:

<fields>
  <field name="uuid" type="string" indexed="true" stored="true" required="true" />
  <field name="type" type="int" indexed="true" stored="true" required="true" />
  <field name="blog_id" type="int" indexed="true" stored="true" />
  <field name="entry_id" type="int" indexed="false" stored="true" />
  <field name="content" type="textgen" indexed="true" stored="true" />
</fields>
<uniqueKey>uuid</uniqueKey>

Relevant fields from the database tables. TABLE: blogs and entries both have:

  Field: id        Type: int(11)              Null: NO   Key: PRI  Default: NULL  Extra: auto_increment
  Field: modified  Type: datetime             Null: YES  Key:      Default: NULL  Extra:
  Field: status    Type: tinyint(1) unsigned  Null: YES  Key:      Default: NULL  Extra:

<?xml version="1.0" encoding="UTF-8" ?>
<dataConfig>
  <dataSource type="JdbcDataSource" driver="com.mysql.jdbc.Driver" ... />
  <document>
    <entity name="blog" pk="id"
            query="SELECT id,description,1 as type FROM blogs WHERE status=2"
            deltaImportQuery="SELECT id,description,1 as type FROM blogs WHERE status=2 AND id='${dataimporter.delta.id}'"
            deltaQuery="SELECT id FROM blogs WHERE '${dataimporter.last_index_time}' &lt; modified AND status=2"
            deletedPkQuery="SELECT id FROM blogs WHERE '${dataimporter.last_index_time}' &lt;= modified AND status=3"
            transformer="TemplateTransformer">
      <field column="uuid" name="uuid" template="blog-${blog.id}" />
      <field column="id" name="blog_id" />
      <field column="description" name="content" />
      <field column="type" name="type" />
    </entity>
    <entity name="entry" pk="id"
            query="SELECT f.id as id,f.content,f.blog_id,2 as type FROM entries f,blogs b WHERE f.blog_id=b.id AND b.status=2"
            deltaImportQuery="SELECT f.id as id,f.content,f.blog_id,2 as type FROM entries f,blogs b WHERE f.blog_id=b.id AND f.id='${dataimporter.delta.id}'"
            deltaQuery="SELECT f.id as id FROM entries f JOIN blogs b ON b.id=f.blog_id WHERE '${dataimporter.last_index_time}' &lt; b.modified AND b.status=2"
            deletedPkQuery="SELECT f.id as id FROM entries f JOIN blogs b ON b.id=f.blog_id WHERE b.status!=2 AND '${dataimporter.last_index_time}' &lt; b.modified"
            transformer="HTMLStripTransformer,TemplateTransformer">
      <field column="uuid" name="uuid" template="entry-${entry.id}" />
      <field column="id" name="entry_id" />
      <field column="blog_id" name="blog_id" />
      <field column="content" name="content" stripHTML="true" />
      <field column="type" name="type" />
    </entity>
  </document>
</dataConfig>

Full import and delta-import work without problems when it comes to adding new documents to the index, but when a blog is deleted (status is set to 3 in the database), the Solr report after delta-import is something like "Indexing completed. Added/Updated: 0 documents. Deleted 81 documents." The problem is that the documents are still found in the Solr index.

1. UPDATE blogs SET modified=NOW(),status=3 WHERE id=26;
2. delta-import =>

<str name="">Indexing completed. Added/Updated: 0 documents. Deleted 81 documents.</str>
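Given the log above (DIH deleting by plain numeric id while the uniqueKey is the templated uuid), one workaround is to make deletedPkQuery return the uniqueKey value directly, building the prefix in SQL, since the transformer is not applied to deleted keys. A sketch for the blog entity (an untested assumption based on this thread's diagnosis; the alias must match the uniqueKey name):

```xml
<!-- deletedPkQuery returns the uniqueKey (uuid) itself, e.g. "blog-26",
     because TemplateTransformer does not run for deletes -->
deletedPkQuery="SELECT CONCAT('blog-', id) AS uuid FROM blogs
                WHERE '${dataimporter.last_index_time}' &lt;= modified AND status=3"
```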
Re: Solr -File Based Spell Check and Read .cfs file generated
Anyone know about it? How do I extract the dictionary generated by default, and how do I read these .cfs files generated in the index folder? Awaiting reply On Mon, Dec 6, 2010 at 7:54 PM, rajini maski rajinima...@gmail.com wrote: Yeah.. I want to use this spell-check only. I want to create the dictionary myself and give it as input to Solr, because my indexes also have mis-spelled content, so I want Solr to refer to this file and not an autogenerated one. How do I get this done? I will try the spell check as suggested by Michael... One more main thing I wanted to know is how to extract the dictionary generated by default, and how do I read the .cfs files generated in the index folder. Please reply if you know anything related to this.. Awaiting reply
how to config DataImport Scheduling
Hi, I want to configure DataImport scheduling but don't know how to do it. I have just created and compiled the scheduling classes with NetBeans, and now have Scheduling.jar. Q: how do I set it up on Tomcat or Solr? (I am using Tomcat 6 on Windows 2008.) Thanks in advance
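If the custom scheduler jar proves awkward, a common alternative is to skip it entirely and let the operating system's scheduler hit the DIH handler over HTTP. A hedged sketch (handler path and URL are assumptions):

```shell
# Unix cron entry: run a full-import every night at 02:00.
# On Windows 2008 the equivalent is a Task Scheduler job invoking the same URL
# via curl or a small script.
0 2 * * * curl "http://localhost:8983/solr/dataimport?command=full-import&commit=true"
```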
Re: only index synonyms
Hi Erik, thanks for the reply. I only want the synonyms to be in the index; how can I achieve that? Sorry, probably missing something obvious in the docs.
Re: only index synonyms
Hi Lee, On Mon, Dec 6, 2010 at 10:56 PM, lee carroll lee.a.carr...@googlemail.com wrote: > Hi Erik Nope, Erik is the other one. :-) > thanks for the reply. I only want the synonyms to be in the index, how can I achieve that? > Sorry, probably missing something obvious in the docs Exactly what he said, use the => syntax. You've already got it. Add the lines

pretty => scenic
text => words

to synonyms.txt, and it will do what you want. Tom
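One caveat worth noting: SynonymFilter on its own passes non-matching tokens through unchanged, so "this is a pretty line of text" would index as "this is a scenic line of words". To get only the synonym targets into the index, as the original post asks, one option is to follow it with a KeepWordFilter listing the target terms. A sketch, with illustrative file and type names:

```xml
<!-- Index-time analyzer: map synonyms, then keep only the mapped terms.
     synonyms.txt contains lines like "pretty => scenic" and "text => words";
     keepwords.txt lists the targets: scenic, words.
     Only "scenic" and "words" survive into the index. -->
<fieldType name="synonyms_only" class="solr.TextField">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="false"/>
    <filter class="solr.KeepWordFilterFactory" words="keepwords.txt"
            ignoreCase="true"/>
  </analyzer>
</fieldType>
```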
Solr JVM performance issue after 2 days
Hi, I am using multi-core Tomcat on 2 servers, 3 languages per server. I am adding documents to Solr at up to 200 doc/sec. When the updating process starts, everything is fine (update performance is at most 200 ms/doc, with about 800 MB of memory used and minimal CPU usage). After 15-17 hours it becomes very slow (more than 900 sec per update), used heap memory is about 15GB, and GC time has grown to more than one hour. I don't know what's wrong with it. Can anyone tell me what the problem is? Does it come from Solr or the JVM? Note: when I stop updating, the CPU stays busy for another 15-20 min., and when I start updating again I have the same issue; but when I stop the Tomcat service and start it again, everything is OK. I am using Tomcat 6 with 18 GB of memory on Windows 2008 Server x64, Solr 1.4.1. Thanks in advance, Hamid