Re: Newbie question on sorting
The easiest way is to do that in the app. That is, return the top 10 to the app (by score), then re-order them there. There's nothing in Solr that I know of that does what you want out of the box.

Best,
Erick

On Mon, Apr 30, 2012 at 11:10 AM, Jacek pjac...@gmail.com wrote:
Hello all, I'm facing this simple problem, yet impossible to resolve for me (I'm a newbie in Solr). I need to sort the results by score (that part is simple, of course), but then I need to take the top 10 results and re-order them (only those top 10 results) by a date field. It's not the same as sort=score,creationdate. Any suggestions will be greatly appreciated!
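Erick's suggestion of re-ordering in the app can be sketched as below. This is a minimal Python example with made-up documents; the field names (score, creationdate) and date format are assumptions, not taken from the original thread.

```python
# Re-sort the top-N results (already ranked by score) by a date field,
# client-side, as Erick suggests. Field names here are hypothetical.
from datetime import datetime

def reorder_top_by_date(results, n=10, date_field="creationdate"):
    """Take the top-n results as returned by Solr (score order) and
    re-order only those n by their date field, newest first."""
    top = results[:n]
    return sorted(
        top,
        key=lambda doc: datetime.strptime(doc[date_field], "%Y-%m-%dT%H:%M:%SZ"),
        reverse=True,
    )

docs = [
    {"id": "a", "score": 3.1, "creationdate": "2012-04-28T10:00:00Z"},
    {"id": "b", "score": 2.9, "creationdate": "2012-04-30T09:00:00Z"},
    {"id": "c", "score": 2.5, "creationdate": "2012-04-29T12:00:00Z"},
]
print([d["id"] for d in reorder_top_by_date(docs, n=3)])  # ['b', 'c', 'a']
```

Note this is exactly *not* sort=score,creationdate: the score ordering only decides which 10 docs are kept, and the date alone decides their final order.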
Re: post.jar failing
Works fine for me with address_xml as string type, indexed, stored on 3.6. What version of Solr are you using?

Best,
Erick

On Mon, Apr 30, 2012 at 4:18 PM, William Bell billnb...@gmail.com wrote:
I am getting a post.jar failure when trying to post the following CDATA field... It used to work on older versions. This is in Solr 3.6.

<add>
<doc>
  <field name="id">SP2514N</field>
  <field name="name">Samsung SpinPoint P120 SP2514N - hard drive - 250 GB - ATA-133</field>
  <field name="manu">Samsung Electronics Co. Ltd.</field>
  <field name="cat">electronics</field>
  <field name="cat">hard drive</field>
  <field name="features">7200RPM, 8MB cache, IDE Ultra ATA-133</field>
  <field name="features">NoiseGuard, SilentSeek technology, Fluid Dynamic Bearing (FDB) motor</field>
  <field name="price">92</field>
  <field name="popularity">6</field>
  <field name="inStock">true</field>
  <field name="address_xml"><![CDATA[<poffL> <poff><offL><off><ad1>2299 9th Ave N Ste 1A</ad1><city>St Petersburg</city><st>FL</st><zip>33713</zip><lat>27.781593</lat><lng>-82.663620</lng><phL/><faxL/></off></offL></poff> </poffL>]]></field>
  <field name="manufacturedate_dt">2006-02-13T15:26:37Z</field>
  <!-- Near Oklahoma city -->
  <field name="store">35.0752,-97.032</field>
</doc>
</add>

Apr 30, 2012 1:53:49 PM org.apache.solr.common.SolrException log
SEVERE: org.apache.solr.common.SolrException: ERROR: [doc=SP2514N] Error adding field 'address_xml'='<eduL> <edu> <edTypC>MEDSCH</edTypC> <inst> <edNm>UNIVERSITY OF COLORADO SCHOOL OF MEDICINE</edNm> <yr>1974</yr> <deg>MD</deg> </inst> </edu> </eduL>'

--
Bill Bell
billnb...@gmail.com
cell 720-256-8076
Re: core sleep/wake
Well, that'll be kinda self-defeating. The whole point of auto-warming is to fill up the caches, consuming memory. Without that, searches will be slow. So the idea of using minimal resources is really antithetical to having these in-memory structures filled up. You can try configuring minimal caches etc. Or just give it lots of memory and count on your OS to swap the pages out if the particular core doesn't get used.

Best,
Erick

On Mon, Apr 30, 2012 at 5:18 PM, oferiko ofer...@gmail.com wrote:
I have a multicore Solr with a lot of cores that contain a lot of data (~50M documents) but are rarely used. Can I load a core from configuration but keep it in a sleep mode, where it has all the configuration available but hardly consumes resources, and based on a query or an update it will come to life? Thanks

--
View this message in context: http://lucene.472066.n3.nabble.com/core-sleep-wake-tp3951850.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: CJKBigram filter questions: single character queries, bigrams created across script/character types
I've no experience in the language nuances. I've found that I had to mix unigram phrase searches with free-text searches in bigram fields. This is for Chinese, not Japanese. The bigram idea comes about apparently because Chinese characters tend to be clumped into 2-3 character words, in a way that is not consistent across different kinds of text. I have no pretense of understanding the whys.

On Mon, Apr 30, 2012 at 2:21 PM, Burton-West, Tom tburt...@umich.edu wrote:
Thanks wunder, I really appreciate the help. Tom

--
Lance Norskog
goks...@gmail.com
Re: correct XPATH syntax
Hi David, I think you should add this option: flatten=true. Then could you try to use this XPath: /MedlineCitationSet/MedlineCitation/AuthorList/Author

See here for the description: http://wiki.apache.org/solr/DataImportHandler#Configuration_in_data-config.xml-1

I don't think that the commonField option is needed here; I think you should remove it.

Ludovic.
- Jouve, France.
Re: Removing old documents
Not sure if there is an automatic way, but we do it via a delete query, and where possible we update the doc under the same id to avoid deletes.

On 01/05/2012 13:43, Bai Shen baishen.li...@gmail.com wrote:
What is the best method to remove old documents? Things that now generate 404 errors, etc. Is there an automatic method, or do I have to do it manually? Thanks.
Re: should slave replication be turned off / on during master clean and re-index?
hello shawn, thanks for the reply.

ok - i did some testing and yes, you are correct. autocommit is doing the commit work in chunks. yes - the slaves are also going from having everything to nothing, then slowly building back up again, lagging behind the master. ... and yes - this is probably not what we need, as far as a replication strategy for the slaves.

you said you don't use autocommit. if so - why don't you use / like autocommit? since we have not done this here, there is no established reference point from an operations perspective. i am looking to formulate some sort of operations strategy, so ANY ideas or input are really welcome.

it seems to me that we have to account for two operational strategies. the first operational mode is a daily append to the solr core after the database tables have been updated. this can probably be done with a simple delta import. i would think that autocommit could remain on for the master, and replication could also be left on so the slaves pick up the changes ASAP. this seems like the mode that we would / should be in most of the time.

the second operational mode would be a build-from-scratch mode, where changes in the schema necessitate a full re-index of the data. given that our site (powered by solr) must be up all of the time, and that our full index time on the master (for the moment) is hovering somewhere around 16 hours, it makes sense that some sort of parallel path - with a cut-over - must be used. in this situation, is it possible to have the indexing process going on in the background, then have one commit at the end, then turn replication on for the slaves? are there disadvantages to this approach?

also - i really like your suggestion of a build core and a live core. is this the approach you use?

thank you for all of the great input
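For the build-core / live-core cut-over described above, Solr's CoreAdmin SWAP action can exchange the two cores once the background re-index and final commit are done. A minimal sketch of building that request follows; the host and core names ("live", "build") are assumptions for illustration.

```python
# Sketch: build the CoreAdmin SWAP request used to cut over from a
# freshly rebuilt "build" core to the serving "live" core.
# Host and core names are hypothetical.
from urllib.parse import urlencode

def swap_cores_url(host, live_core, build_core):
    """Return the CoreAdmin URL that swaps the two named cores."""
    params = {"action": "SWAP", "core": live_core, "other": build_core}
    return "http://%s/solr/admin/cores?%s" % (host, urlencode(params))

url = swap_cores_url("localhost:8983", "live", "build")
print(url)
```

After the swap, queries hitting "live" are served by the newly built index, so slaves replicating from the master pick up the new index in one step rather than tracking 16 hours of incremental commits.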
Re: post.jar failing
Please clarify the problem, because the error message you provide refers to address data that is not in the input data that you provide. It doesn't match! The error refers to an edu element, but the input data uses a poff element.

Maybe you have multiple SP2514N documents; maybe somebody made a copy of the original and edited the address_xml field value. And maybe that edited version that has an edu element has some obvious error.

In short, show us the full actual input address_xml field element, but preferably the entire Solr input document for the version of the SP2514N document that actually generates the error.

-- Jack Krupansky
Grouping ngroups count
Hello all, I tried to use grouping with 2 slices on an index of 35K documents. When I ask for the top 10 rows, grouped by field A, it gives me about 16K groups. But if I ask for the top 20K rows, the ngroups property is now at 30K. Do you know why, and of course how to fix it? Thanks.
Re: extracting/indexing HTML via cURL
Thank you, Jack. So it's not doable/possible to search and highlight keywords within a field that contains the raw formatted HTML, and strip out the HTML tags during analysis... so that a user would get back nothing if they did a search for (e.g.) p?

On Mon, Apr 30, 2012 at 5:17 PM, Jack Krupansky j...@basetechnology.com wrote:
I was thinking that you wanted to index the actual text from the HTML page, but have the stored field value still have the raw HTML with tags. If you just want to store only the raw HTML, a simple string field is sufficient, but then you can't easily do a text search on it. Or, you can have two fields: one string field for the raw HTML (stored, but not indexed), and then do a CopyField to a text field that has the HTMLStripCharFilter to strip the HTML tags and index only the text (indexed, but not stored).

-- Jack Krupansky

-----Original Message----- From: okayndc Sent: Monday, April 30, 2012 5:06 PM To: solr-user@lucene.apache.org Subject: Re: Solr: extracting/indexing HTML via cURL

Great, thank you for the input. My understanding of HTMLStripCharFilter is that it strips HTML tags, which is not what I want ~ is this correct? I want to keep the HTML tags intact.

On Mon, Apr 30, 2012 at 11:55 AM, Jack Krupansky j...@basetechnology.com wrote:
If by extracting HTML content via cURL you mean using SolrCell to parse HTML files, this seems to make sense. The sequence is that regardless of the file type, each file extraction parser will strip off all formatting and produce a raw text stream. Office, PDF, and HTML files are all treated the same in that way. Then, the unformatted text stream is sent through the field type analyzers to be tokenized into terms that Lucene can index. The input string to the field type analyzer is what gets stored for the field, but this occurs after the extraction file parser has already removed formatting. No way for the formatting to be preserved in that case, other than to go back to the original input document before extraction parsing.

If you really do want to preserve full HTML formatted text, you would need to define a field whose field type uses the HTMLStripCharFilter and then directly add documents that direct the raw HTML to that field. There may be some other way to hook into the update processing chain, but that may be too much effort compared to the HTML strip filter.

-- Jack Krupansky

-----Original Message----- From: okayndc Sent: Monday, April 30, 2012 10:07 AM To: solr-user@lucene.apache.org Subject: Solr: extracting/indexing HTML via cURL

Hello, over the weekend I experimented with extracting HTML content via cURL, and I was just wondering why the extraction/indexing process does not include the HTML tags. It seems as though the HTML tags are either being ignored or stripped somewhere in the pipeline. If this is the case, is it possible to include the HTML tags, as I would like to keep the formatted HTML intact? Any help is greatly appreciated.
Re: Solr Merge during off peak times
Hi Prabhu, I don't think such a merge policy exists, but it would be nice to have this option and I imagine it wouldn't be hard to write if you really just base the merge or no merge decision on the time of day (and maybe day of the week). Note that this should go into Lucene, not Solr, so if you decide to contribute your work, please see http://wiki.apache.org/lucene-java/HowToContribute Otis Performance Monitoring for Solr - http://sematext.com/spm From: Prakashganesh, Prabhu prabhu.prakashgan...@dowjones.com To: solr-user@lucene.apache.org solr-user@lucene.apache.org Sent: Tuesday, May 1, 2012 8:45 AM Subject: Solr Merge during off peak times Hi, I would like to know if there is a way to configure index merge policy in solr so that the merging happens during off peak hours. Can you please let me know if such a merge policy configuration exists? Thanks Prabhu
Re: extracting/indexing HTML via cURL
Sorry for the confusion. It is doable. If you feed the raw HTML into a field that has the HTMLStripCharFilter, the stored value will retain the HTML tags, while the indexed text will be stripped of the tags during analysis and be searchable just like a normal text field. Then, search will not see p.

-- Jack Krupansky

-----Original Message----- From: okayndc Sent: Tuesday, May 01, 2012 10:08 AM To: solr-user@lucene.apache.org Subject: Re: extracting/indexing HTML via cURL

Thank you, Jack. So it's not doable/possible to search and highlight keywords within a field that contains the raw formatted HTML, and strip out the HTML tags during analysis... so that a user would get back nothing if they did a search for (e.g.) p?
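Jack's two-field setup (raw HTML stored as-is, stripped text indexed for search) could be sketched in schema.xml roughly as below. The field names, the fieldType name, and the analyzer chain are assumptions for illustration, not taken from this thread.

```xml
<!-- Raw HTML kept verbatim for display/highlighting source (stored, not indexed). -->
<field name="html_raw" type="string" indexed="false" stored="true"/>
<!-- Searchable text with tags stripped at index time (indexed, not stored). -->
<field name="html_text" type="text_html" indexed="true" stored="false"/>
<copyField source="html_raw" dest="html_text"/>

<fieldType name="text_html" class="solr.TextField">
  <analyzer>
    <!-- Removes markup before tokenization, so "p" as a tag is not searchable. -->
    <charFilter class="solr.HTMLStripCharFilterFactory"/>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

With this layout a search on html_text matches only the visible text, while the application reads html_raw to render the original formatting.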
Re: Removing old documents
I'm running Nutch, so it's updating the documents, but I want to remove ones that are no longer available. So in that case, there's no update possible.

On Tue, May 1, 2012 at 8:47 AM, mav.p...@holidaylettings.co.uk wrote:
Not sure if there is an automatic way, but we do it via a delete query, and where possible we update the doc under the same id to avoid deletes.
Re: extracting/indexing HTML via cURL
Awesome, I'll give it a try. Thanks Jack!

On Tue, May 1, 2012 at 10:23 AM, Jack Krupansky j...@basetechnology.com wrote:
Sorry for the confusion. It is doable. If you feed the raw HTML into a field that has the HTMLStripCharFilter, the stored value will retain the HTML tags, while the indexed text will be stripped of the tags during analysis and be searchable just like a normal text field. Then, search will not see p.

-- Jack Krupansky
Re: Removing old documents
Nutch 1.4 has a separate tool to remove 404 and redirect documents from your index based on your CrawlDB. Trunk's SolrIndexer can add and remove documents in one run based on segment data.

On Tuesday 01 May 2012 16:31:47 Bai Shen wrote:
I'm running Nutch, so it's updating the documents, but I want to remove ones that are no longer available. So in that case, there's no update possible.

--
Markus Jelsma - CTO - Openindex
Logging from data-config.xml
I'm getting this error (below) when doing an import. I'd like to add a Log line so I can see if the file path is messed up. So my data-config.xml looks like below, but I'm not getting any extra info in the solr.log file under jetty. Is there a way to log to this log file from data-config.xml?

<dataConfig>
  <dataSource type="FileDataSource" />
  <document>
    <entity name="medlineFileList" processor="FileListEntityProcessor"
            fileName=".*xml" rootEntity="false" dataSource="null"
            baseDir="/index_files/pubmed/">
      <entity name="medlineFiles" processor="XPathEntityProcessor"
              url="${medlineFileList.fileAblsolutePath}"
              forEach="/MedlineCitationSet/MedlineCitation"
              transformer="DateFormatTransformer,TemplateTransformer,RegexTransformer,LogTransformer"
              logTemplate="processing ${medlineFileList.fileAbsolutePath}"
              logLevel="info" stream="true">
        <field column="pmid" xpath="/MedlineCitationSet/MedlineCitation/PMID" commonField="true" />
        ...

Thanks.

INFO: Starting Full Import
May 1, 2012 10:34:29 AM org.apache.solr.handler.dataimport.SolrWriter readIndexerProperties
INFO: Read dataimport.properties
May 1, 2012 10:34:29 AM org.apache.solr.common.SolrException log
SEVERE: Exception while processing: medlineFileList document : null:org.apache.solr.handler.dataimport.DataImportHandlerException: java.lang.RuntimeException: java.io.FileNotFoundException: Could not find file:
at org.apache.solr.handler.dataimport.DataImportHandlerException.wrapAndThrow(DataImportHandlerException.java:64)
at org.apache.solr.handler.dataimport.XPathEntityProcessor.initQuery(XPathEntityProcessor.java:286)
at org.apache.solr.handler.dataimport.XPathEntityProcessor.fetchNextRow(XPathEntityProcessor.java:224)
at org.apache.solr.handler.dataimport.XPathEntityProcessor.nextRow(XPathEntityProcessor.java:204)
at org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:238)
at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:591)
at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:617)
at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:267)
at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:186)
at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:353)
at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:411)
at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:392)
Caused by: java.lang.RuntimeException: java.io.FileNotFoundException: Could not find file:
at org.apache.solr.handler.dataimport.FileDataSource.getFile(FileDataSource.java:113)
at org.apache.solr.handler.dataimport.FileDataSource.getData(FileDataSource.java:85)
at org.apache.solr.handler.dataimport.FileDataSource.getData(FileDataSource.java:47)
at org.apache.solr.handler.dataimport.XPathEntityProcessor.initQuery(XPathEntityProcessor.java:283)
... 10 more
Caused by: java.io.FileNotFoundException: Could not find file:
at org.apache.solr.handler.dataimport.FileDataSource.getFile(FileDataSource.java:111)
... 13 more
Re: Removing old documents
Hi, what I do is I store the date created for when the doc was inserted or updated, and then I run a search/delete query based on that.

Mav

On 01/05/2012 15:31, Bai Shen baishen.li...@gmail.com wrote:
I'm running Nutch, so it's updating the documents, but I want to remove ones that are no longer available. So in that case, there's no update possible.
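Mav's approach (stamp each document with a date on insert/update, then delete everything older than a cutoff) can be sketched as a delete-by-query update message. The field name last_indexed_dt is hypothetical; substitute whatever date field your schema uses.

```python
# Sketch: build a Solr delete-by-query message that removes documents
# whose last-indexed date is older than a cutoff. The field name
# "last_indexed_dt" is hypothetical.
def delete_older_than(cutoff_iso, date_field="last_indexed_dt"):
    """Return the XML update body deleting docs with date_field before cutoff."""
    query = "%s:[* TO %s]" % (date_field, cutoff_iso)
    return "<delete><query>%s</query></delete>" % query

msg = delete_older_than("2012-04-01T00:00:00Z")
print(msg)
# POST this body to /solr/update (Content-Type: text/xml), then commit.
```

After a full re-crawl refreshes the date on every still-reachable document, anything whose stamp predates the crawl start is gone from the source and can be deleted this way.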
Re: Logging from data-config.xml
Fixed the error. It was a stupid typo, but the log msg didn't appear until the typo was fixed. I would have thought they would be unrelated.

On 5/1/12 10:42 AM, "Twomey, David" david.two...@novartis.com wrote:
I'm getting this error (below) when doing an import. I'd like to add a Log line so I can see if the file path is messed up. ...
Re: get a total count
Hello, a related question on this topic: how do I programmatically find the total number of documents across many shards?

For EmbeddedSolrServer, I use the following call to get the total count: solrSearcher.getStatistics().get("numDocs")

With distributed search, how do I get the count of all records in all shards? Apart from doing a *:* query, is there a way to get the total count? I am not able to use the same call above because I am not able to get a handle to the SolrIndexSearcher object with distributed search. The conf and data directories of my index reside directly under a folder called solr (no core) under the weblogic domain. I don't have a SolrCore object. With EmbeddedSolrServer, I used to get the SolrIndexSearcher object using the following call: solrSearcher = (SolrIndexSearcher) SolrCoreObject.getSearcher().get();

Stack information:
OS: Solaris
JDK: 1.5.0_14 32-bit
Solr: 1.3
App Server: Weblogic 10MP1

Thank you.
- Rahul

On Tue, Nov 15, 2011 at 10:49 PM, Otis Gospodnetic otis_gospodne...@yahoo.com wrote:
I'm assuming the question was about how MANY documents have been indexed across all shards.

Answer #1: Look at the Solr Admin Stats page on each of your Solr instances and add up the numDocs numbers you see there.

Answer #2: Use Sematext's free Performance Monitoring tool for Solr. On the Index report, choose "all, sum" in the Solr Host selector, and that will show you the total # of docs across the cluster, total # of deleted docs, total segments, total size on disk, etc. URL: http://www.sematext.com/spm/solr-performance-monitoring/index.html

Otis
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/

From: U Anonym uano...@gmail.com To: solr-user@lucene.apache.org Sent: Monday, November 14, 2011 11:50 AM Subject: get a total count
Hello everyone, a newbie question: how do I find out how many documents have been indexed across all shards? Thanks much!
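Without a SolrIndexSearcher handle, a distributed match-all query with rows=0 returns the cross-shard total in the response's numFound, with no documents fetched. A minimal sketch follows; the host and shard addresses are assumptions, and the parsing helper is run here against a hand-written sample response rather than a live server.

```python
# Sketch: total docs across shards via a distributed q=*:* with rows=0,
# reading numFound from the wt=json response. Hosts are hypothetical.
import json
from urllib.parse import urlencode

def count_all_url(host, shards):
    """Build the distributed count query: matches everything, returns no docs."""
    params = {"q": "*:*", "rows": 0, "wt": "json", "shards": ",".join(shards)}
    return "http://%s/solr/select?%s" % (host, urlencode(params))

def total_from_response(body):
    """Extract numFound (the cross-shard total) from a JSON response body."""
    return json.loads(body)["response"]["numFound"]

url = count_all_url("host1:8983", ["host1:8983/solr", "host2:8983/solr"])
# Example response body in the shape Solr returns:
sample = '{"response": {"numFound": 123456, "start": 0, "docs": []}}'
print(total_from_response(sample))  # 123456
```

Since rows=0, only the count travels over the wire, so this stays cheap even on large indexes.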
Re: get latest 50 documents the fastest way
Hi, the first thing that comes to mind is to not query with *:*, which I'm guessing you are doing, but to run a query with a time range constraint that you know will return enough docs, but not so many that performance suffers. And, of course, thinking beyond Solr, if you really know you always need the last 50, you could simply keep the last 50 in memory somewhere and get them from there, not from Solr, which should be faster.

Otis
Performance Monitoring for Solr / ElasticSearch / HBase - http://sematext.com/spm

From: Yuval Dotan yuvaldo...@gmail.com To: solr-user solr-user@lucene.apache.org Sent: Tuesday, May 1, 2012 10:38 AM Subject: get latest 50 documents the fastest way
Hi guys, we have a use case where we need to get the 50 *latest* documents that match my query - without additional ranking, sorting, etc. on the results. My index contains 1,000,000,000 documents, and I noticed that if the number of found documents is very big (larger than 50% of the index size - 500,000,000 docs) then it takes more than 5 seconds to get the results, even with the rows=50 parameter. Is there a way to get the results faster? Thanks, Yuval
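Otis's time-range idea can be sketched as query parameters: a filter query restricts matching to a recent window (so Solr never ranks the full billion docs), and a descending date sort puts the newest first. The timestamp field name and window are assumptions for illustration.

```python
# Sketch: fetch the newest docs by constraining to a recent time window
# instead of matching the whole index. The "timestamp" field name is
# hypothetical; use whatever date field the schema defines.
from urllib.parse import urlencode

def latest_docs_params(window_start_iso, rows=50, ts_field="timestamp"):
    """Build query params for the newest `rows` docs since window_start_iso."""
    return urlencode({
        "q": "*:*",
        "fq": "%s:[%s TO *]" % (ts_field, window_start_iso),  # narrow the match set
        "sort": "%s desc" % ts_field,                         # newest first
        "rows": rows,
    })

qs = latest_docs_params("2012-05-01T00:00:00Z")
print(qs)
```

If the window turns out to hold fewer than 50 docs, the client widens it and retries; the cost scales with the window size, not the index size.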
Re: post.jar failing
OK. I am using Solr 3.6. I restarted Solr and it started working. No idea why. You were right: I showed the error log from a different document. We might want to add a test case for CDATA.

<add>
  <doc>
    <field name="id">SP2514N</field>
    <field name="name">Samsung SpinPoint P120 SP2514N - hard drive - 250 GB - ATA-133</field>
    <field name="manu">Samsung Electronics Co. Ltd.</field>
    <field name="cat">electronics</field>
    <field name="cat">hard drive</field>
    <field name="features">7200RPM, 8MB cache, IDE Ultra ATA-133</field>
    <field name="features">NoiseGuard, SilentSeek technology, Fluid Dynamic Bearing (FDB) motor</field>
    <field name="price">92</field>
    <field name="popularity">6</field>
    <field name="inStock">true</field>
    <field name="address_xml"><![CDATA[<eduL><edu><edTypC>MEDSCH</edTypC><inst><edNm>UNIVERSITY OF COLORADO &amp; SCHOOL OF MEDICINE</edNm><yr>1974</yr><deg>MD</deg></inst></edu></eduL>]]></field>
    <field name="manufacturedate_dt">2006-02-13T15:26:37Z</field>
    <!-- Near Oklahoma city -->
    <field name="store">35.0752,-97.032</field>
  </doc>
</add>

On Tue, May 1, 2012 at 7:03 AM, Jack Krupansky j...@basetechnology.com wrote: Please clarify the problem, because the error message you provide refers to address data that is not in the input data that you provide. It doesn't match! The error refers to an edu element, but the input data uses a poff element. Maybe you have multiple SP2514N documents; maybe somebody made a copy of the original and edited the address_xml field value. And maybe that edited version that has an edu element has some obvious error. In short, show us the full actual input address_xml field element, but preferably the entire Solr input document, for the version of the SP2514N document that actually generates the error. -- Jack Krupansky -----Original Message----- From: William Bell Sent: Monday, April 30, 2012 4:18 PM To: solr-user@lucene.apache.org Subject: post.jar failing I am getting a post.jar failure when trying to post the following CDATA field... It used to work on older versions. This is in Solr 3.6.

<add>
  <doc>
    <field name="id">SP2514N</field>
    <field name="name">Samsung SpinPoint P120 SP2514N - hard drive - 250 GB - ATA-133</field>
    <field name="manu">Samsung Electronics Co. Ltd.</field>
    <field name="cat">electronics</field>
    <field name="cat">hard drive</field>
    <field name="features">7200RPM, 8MB cache, IDE Ultra ATA-133</field>
    <field name="features">NoiseGuard, SilentSeek technology, Fluid Dynamic Bearing (FDB) motor</field>
    <field name="price">92</field>
    <field name="popularity">6</field>
    <field name="inStock">true</field>
    <field name="address_xml"><![CDATA[<poffL><poff><offL><off><ad1>2299 9th Ave N Ste 1A</ad1><city>St Petersburg</city><st>FL</st><zip>33713</zip><lat>27.781593</lat><lng>-82.663620</lng><phL/><faxL/></off></offL></poff></poffL>]]></field>
    <field name="manufacturedate_dt">2006-02-13T15:26:37Z</field>
    <!-- Near Oklahoma city -->
    <field name="store">35.0752,-97.032</field>
  </doc>
</add>

Apr 30, 2012 1:53:49 PM org.apache.solr.common.SolrException log
SEVERE: org.apache.solr.common.SolrException: ERROR: [doc=SP2514N] Error adding field 'address_xml'='<eduL> <edu> <edTypC>MEDSCH</edTypC> <inst> <edNm>UNIVERSITY OF COLORADO SCHOOL OF MEDICINE</edNm> <yr>1974</yr> <deg>MD</deg> </inst> </edu> </eduL>'
-- Bill Bell billnb...@gmail.com cell 720-256-8076 -- Bill Bell billnb...@gmail.com cell 720-256-8076
Latest solr4 snapshot seems to be giving me a lot of unhappy logging about 'Log4j', should I be concerned?
CoreContainer.java, in the method 'load', finds itself calling loader.newInstance with an 'fname' of 'Log4j' if the slf4j backend is Log4j. e.g.: 2012-05-01 10:40:32,367 org.apache.solr.core.CoreContainer - Unable to load LogWatcher org.apache.solr.common.SolrException: Error loading class 'Log4j' What is it actually looking for? Have I misplaced something?
Re: solr error after replacing schema.xml
PROBLEM RESOLVED. Solr 3.6.0 changed where it looks for stopwords_en.txt (now in the /lang sub-directory). The schema.xml generated by Haystack 2.0.0 beta needs to be edited. Everything is working now. - BillB1951 -- View this message in context: http://lucene.472066.n3.nabble.com/solr-error-after-relacing-schema-xml-tp3940133p3953115.html Sent from the Solr - User mailing list archive at Nabble.com.
question on word parsing control
I have a field that is defined using what I believe is a fairly standard text fieldType. I have documents with the words 'evaluate', 'evaluating', and 'evaluation' in them. When I search on the whole word, it obviously works; if I search on 'eval' it finds nothing. However, for some reason, if I search on 'evalu' it finds all the matches. Is there an indexing or query setting that makes 'evalu' match but not 'eval', and how do I get 'eval' to be a match? Thanks, Ken -- View this message in context: http://lucene.472066.n3.nabble.com/question-on-word-parsing-control-tp3952925.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: post.jar failing
Sounds as if maybe it was some other kind of error having nothing to do with the data itself. Were there any additional errors or exceptions shortly before the failure? Maybe memory was low and some component wouldn't load, or somebody caught an exception without reporting the actual cause. After all, the message you provided said nothing about the actual problem. Maybe Solr itself needs a better diagnostic in that case. -- Jack Krupansky
Re: question on word parsing control
This is a stemming artifact: all of the forms of evaluat* are being stemmed to evalu. That may seem odd, but stemming/stemmers are odd to begin with. 1. You could choose a different stemmer. 2. You could add synonyms to map various forms of the word to the desired form, such as eval. 3. Accept that Solr ain't perfect or optimal for every fine detail. 4. Or, maybe the stemmer behavior is technically perfect, but perfection can be subjective. In this particular case, you might consider a synonym rule such as eval => evaluate. -- Jack Krupansky -----Original Message----- From: kenf_nc Sent: Tuesday, May 01, 2012 9:23 AM To: solr-user@lucene.apache.org Subject: question on word parsing control I have a field that is defined using what I believe is fairly standard text fieldType. I have documents with the words 'evaluate', 'evaluating', 'evaluation' in them. When I search on the whole word, obviously it works, if I search on 'eval' it finds nothing. However for some reason if I search on 'evalu' it finds all the matches. Is that an indexing setting or query setting that will tokenize 'evalu' but not 'eval' and how do I get 'eval' to be a match? Thanks, Ken -- View this message in context: http://lucene.472066.n3.nabble.com/question-on-word-parsing-control-tp3952925.html Sent from the Solr - User mailing list archive at Nabble.com.
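Option 2 (synonyms) could look like the following schema.xml sketch. This assumes a Porter-stemmed text field; the type name text_eval and the synonyms.txt entry are illustrative, not from the thread:

```xml
<!-- schema.xml: map the truncated form onto a full form before stemming -->
<fieldType name="text_eval" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
</fieldType>
```

With a line like `eval => evaluate` in synonyms.txt, a query for 'eval' is rewritten to 'evaluate' before stemming, so it lands on the same 'evalu' term the indexed documents produce.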
Re: Latest solr4 snapshot seems to be giving me a lot of unhappy logging about 'Log4j', should I be concerned?
There is a recent JIRA issue about keeping the last n logs to display in the admin UI. That introduced a problem - and then the fix introduced a problem - and then the fix mitigated the problem but left that ugly logging as a by product. Don't remember the issue # offhand. I think there was a dispute about what should be done with it. On May 1, 2012, at 11:14 AM, Benson Margulies wrote: CoreContainer.java, in the method 'load', finds itself calling loader.NewInstance with an 'fname' of Log4j of the slf4j backend is 'Log4j'. e.g.: 2012-05-01 10:40:32,367 org.apache.solr.core.CoreContainer - Unable to load LogWatcher org.apache.solr.common.SolrException: Error loading class 'Log4j' What is it actually looking for? Have I misplaced something? - Mark Miller lucidimagination.com
Email classification with solr
Hello, just a short question: is it possible to use Solr/Lucene as an e-mail classifier? I mean, analyzing an e-mail to add it automatically to a category (four are available)? Thanks, Ramo
Re: Latest solr4 snapshot seems to be giving me a lot of unhappy logging about 'Log4j', should I be concerned?
On Tue, May 1, 2012 at 12:16 PM, Mark Miller markrmil...@gmail.com wrote: There is a recent JIRA issue about keeping the last n logs to display in the admin UI. That introduced a problem - and then the fix introduced a problem - and then the fix mitigated the problem but left that ugly logging as a by product. Don't remember the issue # offhand. I think there was a dispute about what should be done with it. On May 1, 2012, at 11:14 AM, Benson Margulies wrote: CoreContainer.java, in the method 'load', finds itself calling loader.NewInstance with an 'fname' of Log4j of the slf4j backend is 'Log4j'. Couldn't someone just fix the if statement to say, 'OK, if we're doing log4j, we have no log watcher' and skip all the loud failing on the way? e.g.: 2012-05-01 10:40:32,367 org.apache.solr.core.CoreContainer - Unable to load LogWatcher org.apache.solr.common.SolrException: Error loading class 'Log4j' What is it actually looking for? Have I misplaced something? - Mark Miller lucidimagination.com
Re: hierarchical faceting?
yup.

<fieldType name="cq_tag" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.PathHierarchyTokenizerFactory" delimiter="$"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
  </analyzer>
</fieldType>

<field name="colors" type="cq_tag" indexed="true" stored="true" multiValued="true"/> <!-- red$pink, blue ... -->
<field name="colors_facet" type="string" indexed="true" stored="false" multiValued="true"/> <!-- red$pink, blue ... -->
<copyField source="colors" dest="colors_facet"/>

and ?facet.field=colors_facet On Mon, Apr 30, 2012 at 9:35 PM, Chris Hostetter hossman_luc...@fucit.org wrote: : Is there a tokenizer that tokenizes the string as one token? Using KeywordTokenizer at query time should do what you want. -Hoss
RE: Grouping ngroups count
Hello, When you say 2 slices, do you mean 2 shards? As in, you're doing a distributed query? If you're doing a distributed query, then for group.ngroups to work you need to ensure that all documents for a group exist on a single shard. However, what you're describing sounds an awful lot like this JIRA issue that I entered a while ago for distributed grouping. I found that the hit count was coming only from the shards that ended up having results in the documents that were returned. I didn't test group.ngroups at the time. https://issues.apache.org/jira/browse/SOLR-3316 If this is a similar issue then you should make a new JIRA issue. Cody -----Original Message----- From: Francois Perron [mailto:francois.per...@wantedanalytics.com] Sent: Tuesday, May 01, 2012 6:47 AM To: solr-user@lucene.apache.org Subject: Grouping ngroups count Hello all, I tried to use grouping with 2 slices on an index of 35K documents. When I ask for the top 10 rows, grouped by field A, it gives me about 16K groups. But if I ask for the top 20K rows, the ngroups property is now at 30K. Do you know why and, of course, how to fix it? Thanks.
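For reference, a distributed grouping request of the kind being discussed might look like the sketch below. The host names and shard list are placeholders; the key point is that group.ngroups is only reliable when every document of a group lives on one shard:

```
http://host1:8983/solr/select?q=*:*&group=true&group.field=A
    &group.ngroups=true&shards=host1:8983/solr,host2:8983/solr
```

Routing documents to shards by a hash of field A (rather than round-robin) is one way to satisfy that co-location requirement.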
Re: Solr Parent/Child Searching
Hello Simon, Let me reply to solr-user. We consider BJQ a promising solution for the parent/child use case; we have a facet component prototype for it, but it's too raw and my team had to switch to other challenges temporarily. I participated in SOLR-3076, but the achievement is really modest: I've attached the essential BJQParser with god-mode syntax. I think the next stage should be block indexing support in Solr; I'm not sure how to do that right. I suppose that by next month I'll be able to provide something like essential support for block updates. Regards On Tue, May 1, 2012 at 12:05 AM, Simon Guindon simon.guindon wrote: Hello Mikhail, I came across your blog post about Solr with an alternative approach to the block join solution for LUCENE-3171. We have hit the same situation where we need the parent/child relationship for our Solr queries. I was wondering if your solution was available anywhere? It would be nice if a solution could make its way into Solr at some point :) Thanks and take care, Simon Guindon -- Sincerely yours Mikhail Khludnev. Tech Lead, Grid Dynamics. http://www.griddynamics.com mkhlud...@griddynamics.com
Re: post.jar failing
I am not sure. It just started working. -- Bill Bell billnb...@gmail.com cell 720-256-8076
How to integrate sen and lucene-ja in SOLR 3.x
Hi, Can anyone help me with how to integrate sen and lucene-ja.jar in Solr 3.4, 3.5, or 3.6? Thanks, Shanmugavel -- View this message in context: http://lucene.472066.n3.nabble.com/How-to-integrate-sen-and-lucene-ja-in-SOLR-3-x-tp3953266.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Email classification with solr
There are a number of different routes you can go, one of which is to use SolrCell (Tika) to parse mbox files and then add your own update processor that does whatever mail classification analysis you desire and then generates additional field values for the classification. A simpler approach is to do the analysis yourself outside of Solr and then feed the mbox data for each message into SolrCell along with the specific literal field values derived from your classification analysis. SolrCell (Tika) would then parse the mail message and add your literal field values. Or, you may want to consider fully parsing the mail messages outside of Solr so that you have full control over what gets parsed and which schema fields are used or not used, in addition to your content analysis field values. -- Jack Krupansky -----Original Message----- From: Ramo Karahasan Sent: Tuesday, May 01, 2012 12:17 PM To: solr-user@lucene.apache.org Subject: Email classification with solr Hello, just a short question: Is it possible to use solr/Lucene as a e-mail classifier? I mean, analyzing an e-mail to add it automatically to a category (four are available)? Thanks, Ramo
Re: Upgrading to 3.6 broke cachedsqlentityprocessor
I know about one regression at least. A fix is already committed; see https://issues.apache.org/jira/browse/SOLR-3360 On Tue, May 1, 2012 at 12:53 AM, Brent Mills bmi...@uship.com wrote: I've read some things in JIRA on the new functionality that was put into caching in the DIH, but I wouldn't think it should break the old behavior. It doesn't look as though any errors are being thrown; it's just ignoring the caching part and opening a ton of connections. Also, I cannot find any documentation on the new functionality that was added, so I'm not sure what syntax is valid and what's not. Here is my entity that worked in 3.1 but no longer works in 3.6:

<entity name="Emails"
        query="SELECT * FROM Account.SolrUserSearchEmails WHERE '${dataimporter.request.clean}' != 'false' OR DateUpdated = dateadd(ss, -30, '${dataimporter.last_index_time}')"
        processor="CachedSqlEntityProcessor"
        where="UserID=Users.UserID"/>

-- Sincerely yours Mikhail Khludnev Tech Lead Grid Dynamics http://www.griddynamics.com mkhlud...@griddynamics.com
Re: Does Solr fit my needs?
no problem - you are welcome. Nothing out-of-the-box yet; only the approach is ready: http://blog.mikemccandless.com/2012/01/searching-relational-content-with.html https://issues.apache.org/jira/browse/SOLR-3076 Regards On Mon, Apr 30, 2012 at 12:06 PM, G.Long jde...@gmail.com wrote: Hi :) Thank you all for your answers. I'll try these solutions :) Kind regards, Gary On 27/04/2012 16:31, G.Long wrote: Hi there :) I'm looking for a way to save XML files into some sort of database, and I'm wondering if Solr would fit my needs. The XML files I want to save have a lot of child nodes, which also contain child nodes with multiple values. The depth can be more than 10 levels. After having indexed the files, I would like to be able to query for subparts of those XML files and to reconstruct them as XML files with all their children included. However, I'm wondering whether it is possible with an index like Solr/Lucene to keep or easily recover the structure of my XML data? Thanks for your help, Regards, Gary -- Sincerely yours Mikhail Khludnev Tech Lead Grid Dynamics http://www.griddynamics.com mkhlud...@griddynamics.com
Re: Email classification with solr
Hi Jack, thanks for the feedback. I'm really new to this stuff and not sure I have fully understood it. Currently I've split emails into their properties and saved them into relational tables, for example the body part. Most of my e-mails are HTML emails. Now I have, for example, three categories; newsletter is one of them. I would like to classify incoming emails as newsletters if they fulfill a number of attributes, e.g. the sender's email address contains "newsletter" (or variants of this word) AND the content (body) looks like a newsletter. Is it possible to do that just with Solr? Or do I need other tools for classifying on the basis of text analysis? Isn't it necessary to build up a taxonomy for newsletter emails so that the classifier can match the mail text against some ruleset (a defined taxonomy)? Thanks, Ramo -----Original Message----- From: Jack Krupansky [mailto:j...@basetechnology.com] Sent: Tuesday, May 1, 2012 18:49 To: solr-user@lucene.apache.org Subject: Re: Email classification with solr There are a number of different routes you can go, one of which is to use SolrCell (Tika) to parse mbox files and then add your own update processor that does whatever mail classification analysis you desire and then generates additional field values for the classification. A simpler approach is to do the analysis yourself outside of Solr and then feed the mbox data for each message into SolrCell along with the specific literal field values derived from your classification analysis. SolrCell (Tika) would then parse the mail message and add your literal field values. -- Jack Krupansky -----Original Message----- From: Ramo Karahasan Sent: Tuesday, May 01, 2012 12:17 PM To: solr-user@lucene.apache.org Subject: Email classification with solr Hello, just a short question: is it possible to use Solr/Lucene as an e-mail classifier? I mean, analyzing an e-mail to add it automatically to a category (four are available)? Thanks, Ramo
Re: Latest solr4 snapshot seems to be giving me a lot of unhappy logging about 'Log4j', should I be concerned?
I have a similar issue using log4j for logging with a trunk build: the CoreContainer class prints a big stack trace on our JBoss 4.2.2 startup. I am using slf4j 1.5.2. 10:07:45,918 WARN [CoreContainer] Unable to read SLF4J version java.lang.NoSuchMethodError: org.slf4j.impl.StaticLoggerBinder.getSingleton()Lorg/slf4j/impl/StaticLoggerBinder; at org.apache.solr.core.CoreContainer.load(CoreContainer.java:395) at org.apache.solr.core.CoreContainer.load(CoreContainer.java:355) at org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:304) at org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:101) On Tue, May 1, 2012 at 9:25 AM, Benson Margulies bimargul...@gmail.com wrote: On Tue, May 1, 2012 at 12:16 PM, Mark Miller markrmil...@gmail.com wrote: There is a recent JIRA issue about keeping the last n logs to display in the admin UI. That introduced a problem - and then the fix introduced a problem - and then the fix mitigated the problem but left that ugly logging as a by product. Don't remember the issue # offhand. I think there was a dispute about what should be done with it. On May 1, 2012, at 11:14 AM, Benson Margulies wrote: CoreContainer.java, in the method 'load', finds itself calling loader.newInstance with an 'fname' of 'Log4j' if the slf4j backend is Log4j. Couldn't someone just fix the if statement to say, 'OK, if we're doing log4j, we have no log watcher' and skip all the loud failing on the way? e.g.: 2012-05-01 10:40:32,367 org.apache.solr.core.CoreContainer - Unable to load LogWatcher org.apache.solr.common.SolrException: Error loading class 'Log4j' What is it actually looking for? Have I misplaced something? - Mark Miller lucidimagination.com
Re: AW: Email classification with solr
If you have the code that does all of that analysis, then you could integrate it with Solr using one of the approaches I listed, but Solr itself would not provide any of that analysis. -- Jack Krupansky -----Original Message----- From: Ramo Karahasan Sent: Tuesday, May 01, 2012 1:14 PM To: solr-user@lucene.apache.org Subject: Re: Email classification with solr Hi Jack, thanks for the feedback. I'm really new to this stuff and not sure I have fully understood it. Currently I've split emails into their properties and saved them into relational tables, for example the body part. Most of my e-mails are HTML emails. Now I have, for example, three categories; newsletter is one of them. I would like to classify incoming emails as newsletters if they fulfill a number of attributes, e.g. the sender's email address contains "newsletter" (or variants of this word) AND the content (body) looks like a newsletter. Is it possible to do that just with Solr? Or do I need other tools for classifying on the basis of text analysis? Isn't it necessary to build up a taxonomy for newsletter emails so that the classifier can match the mail text against some ruleset (a defined taxonomy)? Thanks, Ramo
dataimport handler (DIH) - notify when it has finished?
Hello all, is there a notification / trigger / callback mechanism people use that lets them know when a dataimport process has finished? We will be doing daily delta-imports and I need some way for an operations group to know when the DIH has finished. Thank you, -- View this message in context: http://lucene.472066.n3.nabble.com/dataimport-handler-DIH-notify-when-it-has-finished-tp3953339.html Sent from the Solr - User mailing list archive at Nabble.com.
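One low-tech approach is to poll the DIH status handler (/dataimport) until it reports idle. A minimal sketch of the parsing side, assuming a captured status response; the sample XML below is trimmed and hypothetical, not a verbatim Solr payload:

```python
import xml.etree.ElementTree as ET

# The DIH status handler reports "busy" while an import runs and "idle"
# when it is done. In production you would fetch this over HTTP on a timer.
sample_status = """
<response>
  <str name="status">idle</str>
  <str name="Total Documents Processed">1234</str>
</response>
"""

def import_finished(status_xml):
    """True once the DIH status element reads 'idle'."""
    root = ET.fromstring(status_xml)
    for s in root.iter('str'):
        if s.get('name') == 'status':
            return s.text == 'idle'
    return False

print(import_finished(sample_status))  # True
```

An operations script can poll like this after the nightly delta-import kicks off and page the team (or just log) when the status flips to idle.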
How to expand list into multi-valued fields?
I am indexing content from an RDBMS. I have a column in a table with pipe-separated values, and upon indexing I would like to transform these values into multi-valued fields in SOLR's index. For example, ColumnA (from RDBMS) - apple|orange|banana. I want to expand this to, SOLR Index: FruitField=apple FruitField=orange FruitField=banana, or, numbered: FruitField1=apple FruitField2=orange FruitField3=banana. Please help, thank you! -- View this message in context: http://lucene.472066.n3.nabble.com/How-to-expand-list-into-multi-valued-fields-tp3953378.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Grouping ngroups count
Thanks for your response Cody. First, I used distributed grouping on 2 shards and I'm sure that all documents of each group are on the same shard. I took a look at the JIRA issue and it seems really similar. There is the same problem with group.ngroups: the count is calculated in the second pass, so we only get results from the useful shards, and that's why when I increase the rows limit I get the right count (it must use all my shards). Unless it's a feature (I hope not), I will create a new JIRA issue for this. Thanks On 2012-05-01, at 12:32 PM, Young, Cody wrote: Hello, When you say 2 slices, do you mean 2 shards? As in, you're doing a distributed query? If you're doing a distributed query, then for group.ngroups to work you need to ensure that all documents for a group exist on a single shard. However, what you're describing sounds an awful lot like this JIRA issue that I entered a while ago for distributed grouping. I found that the hit count was coming only from the shards that ended up having results in the documents that were returned. I didn't test group.ngroups at the time. https://issues.apache.org/jira/browse/SOLR-3316 If this is a similar issue then you should make a new JIRA issue. Cody -----Original Message----- From: Francois Perron [mailto:francois.per...@wantedanalytics.com] Sent: Tuesday, May 01, 2012 6:47 AM To: solr-user@lucene.apache.org Subject: Grouping ngroups count Hello all, I tried to use grouping with 2 slices on an index of 35K documents. When I ask for the top 10 rows, grouped by field A, it gives me about 16K groups. But if I ask for the top 20K rows, the ngroups property is now at 30K. Do you know why and, of course, how to fix it? Thanks.
Re: How to expand list into multi-valued fields?
Here you go: specify the regex transformer in the entity tag of the DIH config XML like below:

  <entity transformer="RegexTransformer" ... />

and then:

  <field column="ColumnA" name="FruitField" splitBy="\|" />

That's it! - Jeevanandam On 02-05-2012 12:35 am, invisbl wrote: I am indexing content from a RDBMS. I have a column in a table with pipe separated values, and upon indexing I would like to transform these values into multi-valued fields in SOLR's index. For example, ColumnA (From RDBMS) - apple|orange|banana I want to expand this to, SOLR Index FruitField=apple FruitField=orange FruitField=banana or number expand to, SOLR Index FruitField1=apple FruitField2=orange FruitField3=banana Please help, thank you! -- View this message in context: http://lucene.472066.n3.nabble.com/How-to-expand-list-into-multi-valued-fields-tp3953378.html Sent from the Solr - User mailing list archive at Nabble.com.
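If it's ever more convenient to do the split outside DIH, the same transformation (splitBy="\|" producing a multi-valued field) is easy to reproduce client-side before posting the document; a minimal sketch using the field names from the example:

```python
def expand_pipe_field(row, column="ColumnA", field="FruitField"):
    """Mimic RegexTransformer's splitBy: one pipe-separated column
    becomes a multi-valued field (a list of values)."""
    return {field: row[column].split("|")}

doc = expand_pipe_field({"ColumnA": "apple|orange|banana"})
print(doc)  # {'FruitField': ['apple', 'orange', 'banana']}
```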
Re: Removing old documents
Hello, I ran bin/nutch solrclean crawl/crawldb http://127.0.0.1:8983/solr/ both without and with -noCommit, and restarted the Solr server. The log shows that 5 documents were removed, but they are still in the search results. Is this a bug, or is something missing? I use nutch-1.4 and solr 3.5. Thanks. Alex. -Original Message- From: Markus Jelsma markus.jel...@openindex.io To: solr-user solr-user@lucene.apache.org Sent: Tue, May 1, 2012 7:41 am Subject: Re: Removing old documents Nutch 1.4 has a separate tool to remove 404 and redirects documents from your index based on your CrawlDB. Trunk's SolrIndexer can add and remove documents in one run based on segment data. On Tuesday 01 May 2012 16:31:47 Bai Shen wrote: I'm running Nutch, so it's updating the documents, but I'm wanting to remove ones that are no longer available. So in that case, there's no update possible. On Tue, May 1, 2012 at 8:47 AM, mav.p...@holidaylettings.co.uk mav.p...@holidaylettings.co.uk wrote: Not sure if there is an automatic way but we do it via a delete query and where possible we update doc under same id to avoid deletes. On 01/05/2012 13:43, Bai Shen baishen.li...@gmail.com wrote: What is the best method to remove old documents? Things that no generate 404 errors, etc. Is there an automatic method or do I have to do it manually? THanks. -- Markus Jelsma - CTO - Openindex
Re: correct XPATH syntax
Ludovic, Thanks for your help. I tried your suggestion but it didn't work for Authors. Below are 3 snippets: the entity from data-config.xml, the source XML file, and the XML response from Solr.

Data-config:

  <entity name="medlineFiles" processor="XPathEntityProcessor"
          url="${medlineFileList.fileAbsolutePath}"
          forEach="/MedlineCitationSet/MedlineCitation"
          transformer="DateFormatTransformer,TemplateTransformer,RegexTransformer,LogTransformer"
          logTemplate="processing ${medlineFileList.fileAbsolutePath}" logLevel="info"
          flatten="true" stream="true">
    <field column="pmid" xpath="/MedlineCitationSet/MedlineCitation/PMID" commonField="true" />
    <field column="journal_name" xpath="/MedlineCitationSet/MedlineCitation/Article/Journal/Title" commonField="true" />
    <field column="title" xpath="/MedlineCitationSet/MedlineCitation/Article/ArticleTitle" commonField="true" />
    <field column="abstract" xpath="/MedlineCitationSet/MedlineCitation/Article/Abstract/AbstractText" commonField="true" />
    <field column="author" xpath="/MedlineCitationSet/MedlineCitation/Article/AuthorList/Author" commonField="false" />
    <field column="year" xpath="/MedlineCitationSet/MedlineCitation/Article/Journal/JournalIssue/PubDate/Year" commonField="true" />
  </entity>

XML snippet for Author:

  <AuthorList CompleteYN="Y">
    <Author ValidYN="Y">
      <LastName>Malathi</LastName>
      <ForeName>K</ForeName>
      <Initials>K</Initials>
    </Author>
    <Author ValidYN="Y">
      <LastName>Xiao</LastName>
      <ForeName>Y</ForeName>
      <Initials>Y</Initials>
    </Author>
    <Author ValidYN="Y">
      <LastName>Mitchell</LastName>
      <ForeName>A P</ForeName>
      <Initials>AP</Initials>
    </Author>
  </AuthorList>

Response from Solr:

  <arr name="author">
    <str/><str/><str/><str/><str/><str/><str/>
    <str/><str/><str/><str/><str/><str/><str/>
  </arr>
  <str name="journal_name">Journal of cancer research and clinical oncology</str>

Thanks David

On 5/1/12 8:05 AM, lboutros boutr...@gmail.com wrote: Hi David, I think you should add this option: flatten="true" and then could you try to use this XPath: /MedlineCitationSet/MedlineCitation/AuthorList/Author See here for the description:
http://wiki.apache.org/solr/DataImportHandler#Configuration_in_data-config.xml-1 I don't think that the commonField option is needed here; I think you should suppress it. Ludovic. - Jouve France. -- View this message in context: http://lucene.472066.n3.nabble.com/correct-XPATH-syntax-tp3951804p3952812.html Sent from the Solr - User mailing list archive at Nabble.com.
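The empty author values in David's response are consistent with the XPath selecting the Author element itself, whose text lives in child nodes like LastName. A quick way to check that intuition outside Solr (a sketch using Python's ElementTree; DIH's XPath support differs, so this is only illustrative):

```python
import xml.etree.ElementTree as ET

snippet = """
<AuthorList CompleteYN="Y">
  <Author ValidYN="Y"><LastName>Malathi</LastName><ForeName>K</ForeName></Author>
  <Author ValidYN="Y"><LastName>Xiao</LastName><ForeName>Y</ForeName></Author>
</AuthorList>
"""
root = ET.fromstring(snippet)

# Selecting Author yields elements whose own .text is empty (the data
# is in child elements), which matches the empty strings Solr returned:
authors = [(a.text or "").strip() for a in root.findall("Author")]
# Selecting the LastName children yields the actual values:
names = [n.text for n in root.findall("Author/LastName")]
print(authors, names)  # ['', ''] ['Malathi', 'Xiao']
```

This suggests pointing the author field's xpath at a text-bearing child (e.g. .../Author/LastName) rather than at Author itself.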
question on tokenization control
I have a field that is defined using what I believe is fairly standard text fieldType. I have documents with the words 'evaluate', 'evaluating', 'evaluation' in them. When I search on the whole word, obviously it works, if I search on 'eval' it finds nothing. However for some reason if I search on 'evalu' it finds all the matches. Is that an indexing setting or query setting that will tokenize 'evalu' but not 'eval' and how do I get 'eval' to be a match? Thanks, Ken -- View this message in context: http://lucene.472066.n3.nabble.com/question-on-tokenization-control-tp3953550.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: dataimport handler (DIH) - notify when it has finished?
On 1 May 2012 23:12, geeky2 gee...@hotmail.com wrote: Hello all, is there a notification / trigger / callback mechanism people use that allows them to know when a dataimport process has finished? we will be doing daily delta-imports and i need some way for an operations group to know when the DIH has finished. Never tried it myself, but this should meet your needs: http://wiki.apache.org/solr/DataImportHandler#EventListeners Regards, Gora
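Besides EventListeners, an operations group can also poll the DIH status URL and watch for the handler to go idle; a sketch (endpoint path and response shape assumed from a default /dataimport setup):

```python
import xml.etree.ElementTree as ET

def dih_is_idle(status_xml):
    """Parse a /dataimport?command=status response and report whether
    the import has finished (the status element reads 'idle')."""
    root = ET.fromstring(status_xml)
    for node in root.findall("str"):
        if node.get("name") == "status":
            return node.text == "idle"
    return False

# Example of the kind of XML the status command returns (simplified):
sample = """<response>
  <str name="status">idle</str>
  <str name="importResponse"/>
</response>"""
print(dih_is_idle(sample))  # True
```

In practice the ops script would fetch http://host:port/solr/dataimport?command=status on a schedule and alert once the status flips from busy back to idle.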
Boosting documents based on search term/phrase
Is there a way to boost documents based on the search term/phrase?
Re: core sleep/wake
My random searches can be a bit slow on startup, so I still would like to get that lazy load but have more cores available. I'm actually trying now the LotsOfCores way of handling things. Had to work a bit to get the patch suitable for 3.5, but it seems to be doing what I need. On Tue, May 1, 2012 at 2:31 PM, Erick Erickson erickerick...@gmail.com wrote: Well, that'll be kinda self-defeating. The whole point of auto-warming is to fill up the caches, consuming memory. Without that, searches will be slow. So the idea of using minimal resources is really antithetical to having these in-memory structures filled up. You can try configuring minimal caches etc. Or just give it lots of memory and count on your OS to swap the pages out if the particular core doesn't get used. Best Erick On Mon, Apr 30, 2012 at 5:18 PM, oferiko ofer...@gmail.com wrote: I have a multicore solr with a lot of cores that contain a lot of data (~50M documents), but are rarely used. Can I load a core from configuration but keep it in sleep mode, where it has all the configuration available but hardly consumes resources, and based on a query or an update it will come to life? Thanks -- View this message in context: http://lucene.472066.n3.nabble.com/core-sleep-wake-tp3951850.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Boosting documents based on search term/phrase
Do you mean besides query elevation? http://wiki.apache.org/solr/QueryElevationComponent And besides explicit boosting by the user (the ^ suffix operator after a term/phrase)? -- Jack Krupansky -Original Message- From: Donald Organ Sent: Tuesday, May 01, 2012 3:59 PM To: solr-user Subject: Boosting documents based on search term/phrase Is there a way to boost documents based on the search term/phrase?
Re: question on tokenization control
Hi, Is that an indexing setting or query setting that will tokenize 'evalu' but not 'eval'? Without seeing the tokenizers you're using for the field type it's hard to say. You can use Solr's analysis page to see the tokens that are generated by the tokenizers in your analysis chain at both query time and index time. http://localhost:8983/solr/admin/analysis.jsp how do I get 'eval' to be a match? You could use synonyms to map 'eval' to 'evaluation'. Dan On Tue, May 1, 2012 at 8:17 PM, kfdroid kfdr...@gmail.com wrote: I have a field that is defined using what I believe is fairly standard text fieldType. I have documents with the words 'evaluate', 'evaluating', 'evaluation' in them. When I search on the whole word, obviously it works, if I search on 'eval' it finds nothing. However for some reason if I search on 'evalu' it finds all the matches. Is that an indexing setting or query setting that will tokenize 'evalu' but not 'eval' and how do I get 'eval' to be a match? Thanks, Ken -- View this message in context: http://lucene.472066.n3.nabble.com/question-on-tokenization-control-tp3953550.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: question on tokenization control
Use synonyms at index time. Make eval and evaluate equivalent words. wunder On May 1, 2012, at 1:31 PM, Dan Tuffery wrote: Hi, Is that an indexing setting or query setting that will tokenize 'evalu' but not 'eval'? Without seeing the tokenizers you're using for the field type it's hard to say. You can use Solr's analysis page to see the tokens that are generated by the tokenizers in your analysis chain at both query time and index time. http://localhost:8983/solr/admin/analysis.jsp how do I get 'eval' to be a match? You could use synonyms to map 'eval' to 'evaluation'. Dan On Tue, May 1, 2012 at 8:17 PM, kfdroid kfdr...@gmail.com wrote: I have a field that is defined using what I believe is fairly standard text fieldType. I have documents with the words 'evaluate', 'evaluating', 'evaluation' in them. When I search on the whole word, obviously it works, if I search on 'eval' it finds nothing. However for some reason if I search on 'evalu' it finds all the matches. Is that an indexing setting or query setting that will tokenize 'evalu' but not 'eval' and how do I get 'eval' to be a match? Thanks, Ken -- View this message in context: http://lucene.472066.n3.nabble.com/question-on-tokenization-control-tp3953550.html Sent from the Solr - User mailing list archive at Nabble.com. -- Walter Underwood wun...@wunderwood.org
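A note on why 'evalu' matches in the first place: the stock text field type typically ends in a Porter-style stemming filter, which reduces evaluate/evaluating/evaluation to the indexed token evalu, while eval is left alone. A toy illustration (a crude suffix-stripper for these words only, not the real Porter algorithm):

```python
def toy_stem(word):
    """Crude suffix stripping for illustration only (not real Porter).
    Longer suffixes are tried first so 'ating' wins over 'ate'."""
    for suffix in ("ating", "ation", "ate"):
        if word.endswith(suffix):
            return word[: -len(suffix)]
    return word

stems = {w: toy_stem(w) for w in ("evaluate", "evaluating", "evaluation")}
print(stems)             # all three map to 'evalu'
print(toy_stem("eval"))  # 'eval': untouched, so it never equals the indexed stem
```

That is why the synonym suggestion works: mapping eval to evaluate at index time lets the stemmer normalize both sides to the same token.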
Re: NPE when faceting
It may be related to this: http://stackoverflow.com/questions/10124055/solr-faceted-search-throws-nullpointerexception-with-http-500-status We are doing deletes from our index as well, so it is possible that we're running into the same issue. I hope that sheds more light on things. On Tue, May 1, 2012 at 4:51 PM, Jamie Johnson jej2...@gmail.com wrote: I had reported this issue a while back, hoping that it was something with my environment, but that doesn't seem to be the case. I am getting the following stack trace on certain facet queries. Previously when I did an optimize the error went away, does anyone have any insight into why specifically this could be happening? May 1, 2012 8:48:52 PM org.apache.solr.common.SolrException log SEVERE: java.lang.NullPointerException at org.apache.lucene.index.DocTermOrds.lookupTerm(DocTermOrds.java:807) at org.apache.solr.request.UnInvertedField.getTermValue(UnInvertedField.java:636) at org.apache.solr.request.UnInvertedField.getCounts(UnInvertedField.java:411) at org.apache.solr.request.SimpleFacets.getTermCounts(SimpleFacets.java:300) at org.apache.solr.request.SimpleFacets.getFacetFieldCounts(SimpleFacets.java:396) at org.apache.solr.request.SimpleFacets.getFacetCounts(SimpleFacets.java:205) at org.apache.solr.handler.component.FacetComponent.process(FacetComponent.java:81) at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:204) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129) at org.apache.solr.core.SolrCore.execute(SolrCore.java:1550) at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:442) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:263) at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1337) at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:484) at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:119) at
org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:524) at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:233) at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1065) at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:413) at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:192) at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:999) at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:117) at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:250) at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:149) at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:111) at org.eclipse.jetty.server.Server.handle(Server.java:351) at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:454) at org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:47) at org.eclipse.jetty.server.AbstractHttpConnection.content(AbstractHttpConnection.java:900) at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.content(AbstractHttpConnection.java:954) at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:857) at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235) at org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:66) at org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:254) at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:599) at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:534) at java.lang.Thread.run(Thread.java:662)
Re: NPE when faceting
Darn... looks likely that it's another bug from when part of UnInvertedField was refactored into Lucene. We really need some random tests that can catch bugs like these though - I'll see if I can reproduce. Can you open a JIRA issue for this? -Yonik lucenerevolution.com - Lucene/Solr Open Source Search Conference. Boston May 7-10 On Tue, May 1, 2012 at 4:51 PM, Jamie Johnson jej2...@gmail.com wrote: I had reported this issue a while back, hoping that it was something with my environment, but that doesn't seem to be the case. I am getting the following stack trace on certain facet queries. Previously when I did an optimize the error went away, does anyone have any insight into why specifically this could be happening? May 1, 2012 8:48:52 PM org.apache.solr.common.SolrException log SEVERE: java.lang.NullPointerException at org.apache.lucene.index.DocTermOrds.lookupTerm(DocTermOrds.java:807) at org.apache.solr.request.UnInvertedField.getTermValue(UnInvertedField.java:636) at org.apache.solr.request.UnInvertedField.getCounts(UnInvertedField.java:411) at org.apache.solr.request.SimpleFacets.getTermCounts(SimpleFacets.java:300) at org.apache.solr.request.SimpleFacets.getFacetFieldCounts(SimpleFacets.java:396) at org.apache.solr.request.SimpleFacets.getFacetCounts(SimpleFacets.java:205) at org.apache.solr.handler.component.FacetComponent.process(FacetComponent.java:81) at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:204) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129) at org.apache.solr.core.SolrCore.execute(SolrCore.java:1550) at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:442) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:263) at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1337) at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:484) at 
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:119) at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:524) at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:233) at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1065) at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:413) at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:192) at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:999) at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:117) at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:250) at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:149) at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:111) at org.eclipse.jetty.server.Server.handle(Server.java:351) at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:454) at org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:47) at org.eclipse.jetty.server.AbstractHttpConnection.content(AbstractHttpConnection.java:900) at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.content(AbstractHttpConnection.java:954) at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:857) at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235) at org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:66) at org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:254) at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:599) at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:534) at java.lang.Thread.run(Thread.java:662)
Re: Boosting documents based on search term/phrase
query elevation was exactly what I was talking about. Now is there a way to add this to the default query handler? On Tue, May 1, 2012 at 4:26 PM, Jack Krupansky j...@basetechnology.com wrote: Do you mean besides query elevation? http://wiki.apache.org/solr/QueryElevationComponent And besides explicit boosting by the user (the ^ suffix operator after a term/phrase)? -- Jack Krupansky -Original Message- From: Donald Organ Sent: Tuesday, May 01, 2012 3:59 PM To: solr-user Subject: Boosting documents based on search term/phrase Is there a way to boost documents based on the search term/phrase?
Re: Boosting documents based on search term/phrase
Yes, you can add it in the last-components section of the default query handler:

  <arr name="last-components">
    <str>elevator</str>
  </arr>

- Jeevanandam On 02-05-2012 3:53 am, Donald Organ wrote: query elevation was exactly what I was talking about. Now is there a way to add this to the default query handler? On Tue, May 1, 2012 at 4:26 PM, Jack Krupansky j...@basetechnology.com wrote: Do you mean besides query elevation? http://wiki.apache.org/solr/QueryElevationComponent And besides explicit boosting by the user (the ^ suffix operator after a term/phrase)? -- Jack Krupansky -Original Message- From: Donald Organ Sent: Tuesday, May 01, 2012 3:59 PM To: solr-user Subject: Boosting documents based on search term/phrase Is there a way to boost documents based on the search term/phrase?
Re: Boosting documents based on search term/phrase
Here's some doc from Lucid: http://lucidworks.lucidimagination.com/display/solr/The+Query+Elevation+Component -- Jack Krupansky -Original Message- From: Donald Organ Sent: Tuesday, May 01, 2012 5:23 PM To: solr-user@lucene.apache.org Subject: Re: Boosting documents based on search term/phrase query elevation was exactly what I was talking about. Now is there a way to add this to the default query handler? On Tue, May 1, 2012 at 4:26 PM, Jack Krupansky j...@basetechnology.com wrote: Do you mean besides query elevation? http://wiki.apache.org/solr/QueryElevationComponent And besides explicit boosting by the user (the ^ suffix operator after a term/phrase)? -- Jack Krupansky -Original Message- From: Donald Organ Sent: Tuesday, May 01, 2012 3:59 PM To: solr-user Subject: Boosting documents based on search term/phrase Is there a way to boost documents based on the search term/phrase?
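For reference, wiring up elevation takes two pieces: the searchComponent definition that the last-components entry points at, and an elevate.xml mapping query text to pinned document ids. A minimal sketch (the doc id is hypothetical, borrowed from the example docs):

```xml
<!-- solrconfig.xml: define the component referenced as "elevator" -->
<searchComponent name="elevator" class="solr.QueryElevationComponent">
  <str name="queryFieldType">string</str>
  <str name="config-file">elevate.xml</str>
</searchComponent>

<!-- elevate.xml: pin a document to the top for a given query -->
<elevate>
  <query text="hard drive">
    <doc id="SP2514N"/>
  </query>
</elevate>
```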
Re: Removing old documents
Maybe this is the HTTP caching feature? Solr comes with HTTP caching turned on by default, so when you query after making changes your browser does not re-fetch the changed documents. On Tue, May 1, 2012 at 11:53 AM, alx...@aim.com wrote: Hello, I did bin/nutch solrclean crawl/crawldb http://127.0.0.1:8983/solr/ without and with -noCommit and restarted solr server Log shows that 5 documents were removed but they are still in the search results. Is this a bug or something is missing? I use nutch-1.4 and solr 3.5 Thanks. Alex. -Original Message- From: Markus Jelsma markus.jel...@openindex.io To: solr-user solr-user@lucene.apache.org Sent: Tue, May 1, 2012 7:41 am Subject: Re: Removing old documents Nutch 1.4 has a separate tool to remove 404 and redirects documents from your index based on your CrawlDB. Trunk's SolrIndexer can add and remove documents in one run based on segment data. On Tuesday 01 May 2012 16:31:47 Bai Shen wrote: I'm running Nutch, so it's updating the documents, but I'm wanting to remove ones that are no longer available. So in that case, there's no update possible. On Tue, May 1, 2012 at 8:47 AM, mav.p...@holidaylettings.co.uk mav.p...@holidaylettings.co.uk wrote: Not sure if there is an automatic way but we do it via a delete query and where possible we update doc under same id to avoid deletes. On 01/05/2012 13:43, Bai Shen baishen.li...@gmail.com wrote: What is the best method to remove old documents? Things that no generate 404 errors, etc. Is there an automatic method or do I have to do it manually? THanks. -- Markus Jelsma - CTO - Openindex -- Lance Norskog goks...@gmail.com
Re: Removing old documents
I've been surprised to see Firefox cache JSON results even after an empty-cache was ordered... This is quite annoying, but I have gotten accustomed to it by doing the following when I need to debug: add an extra random parameter. But only when debugging! Using wget or curl showed me that the browser (and not Solr caching) was guilty of caching. I think the If-Modified-Since header might be the culprit; it would still be sent even after emptying the cache... paul On 1 May 2012 at 23:57, Lance Norskog wrote: Maybe this is the HTTP caching feature? Solr comes with HTTP caching turned on by default and so when you do queries and changes your browser does not fetch your changed documents. On Tue, May 1, 2012 at 11:53 AM, alx...@aim.com wrote: Hello, I did bin/nutch solrclean crawl/crawldb http://127.0.0.1:8983/solr/ without and with -noCommit and restarted solr server Log shows that 5 documents were removed but they are still in the search results. Is this a bug or something is missing? I use nutch-1.4 and solr 3.5 Thanks. Alex. -Original Message- From: Markus Jelsma markus.jel...@openindex.io To: solr-user solr-user@lucene.apache.org Sent: Tue, May 1, 2012 7:41 am Subject: Re: Removing old documents Nutch 1.4 has a separate tool to remove 404 and redirects documents from your index based on your CrawlDB. Trunk's SolrIndexer can add and remove documents in one run based on segment data. On Tuesday 01 May 2012 16:31:47 Bai Shen wrote: I'm running Nutch, so it's updating the documents, but I'm wanting to remove ones that are no longer available. So in that case, there's no update possible. On Tue, May 1, 2012 at 8:47 AM, mav.p...@holidaylettings.co.uk mav.p...@holidaylettings.co.uk wrote: Not sure if there is an automatic way but we do it via a delete query and where possible we update doc under same id to avoid deletes. On 01/05/2012 13:43, Bai Shen baishen.li...@gmail.com wrote: What is the best method to remove old documents? Things that no generate 404 errors, etc.
Is there an automatic method or do I have to do it manually? THanks. -- Markus Jelsma - CTO - Openindex -- Lance Norskog goks...@gmail.com
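Paul's debugging trick (appending a throwaway parameter so the browser cannot serve a cached copy) can be sketched like this; the URL and parameter names are illustrative:

```python
import random
from urllib.parse import urlencode

def cache_busted(base, params):
    """Append a throwaway '_' parameter so the browser treats each
    request as a distinct URL and cannot serve a stale cached response."""
    qs = dict(params, _=random.randrange(10**9))
    return base + "?" + urlencode(qs)

u1 = cache_busted("http://localhost:8983/solr/select", {"q": "*:*", "wt": "json"})
u2 = cache_busted("http://localhost:8983/solr/select", {"q": "*:*", "wt": "json"})
print(u1)
print(u1 != u2)  # the two URLs almost surely differ, defeating the cache
```

As Paul says, this is for debugging only; checking with wget or curl (which don't cache) is the cleaner way to rule out Solr itself.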
Error with distributed search and Suggester component (Solr 3.4)
Hi list, Does anybody know if the Suggester component is designed to work with shards? I'm asking because the documentation implies that it should (since ...Suggester reuses much of the SpellCheckComponent infrastructure…, and the SpellCheckComponent is documented as supporting a distributed setup). But when I make a request, I get an exception: java.lang.NullPointerException at org.apache.solr.handler.component.QueryComponent.mergeIds(QueryComponent.java:493) at org.apache.solr.handler.component.QueryComponent.handleResponses(QueryComponent.java:390) at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:289) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129) at org.apache.solr.core.SolrCore.execute(SolrCore.java:1368) at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:356) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:252) at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1157) at org.mortbay.servlet.UserAgentFilter.doFilter(UserAgentFilter.java:81) at org.mortbay.servlet.GzipFilter.doFilter(GzipFilter.java:132) at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1157) at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:388) at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216) at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182) at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:765) at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:418) at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230) at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114) at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152) at org.mortbay.jetty.Server.handle(Server.java:326) at 
org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542) at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:923) at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:547) at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212) at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404) at org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:409) at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)

Looking at the QueryComponent.java:493 code, I see:

  SolrDocumentList docs = (SolrDocumentList) srsp.getSolrResponse().getResponse().get("response");
  // calculate global maxScore and numDocsFound
  if (docs.getMaxScore() != null) {   // <-- this is line 493

So I'm assuming the docs variable is null, which would happen if there is no "response" element in the Solr response. If I make a direct request to the request handler in one core (e.g. http://hostname:8080/solr/core0/select?qt=suggest-core&q=rad), the query works. But I see that there's no element named "response", unlike a regular query:

  <response>
    <lst name="responseHeader">
      <int name="status">0</int>
      <int name="QTime">1</int>
    </lst>
    <lst name="spellcheck">
      <lst name="suggestions">
        <lst name="rad">
          <int name="numFound">10</int>
          <int name="startOffset">0</int>
          <int name="endOffset">3</int>
          <arr name="suggestion">
            <str>radair</str>
            <str>radar</str>
          </arr>
        </lst>
      </lst>
    </lst>
  </response>

So I'm wondering if my configuration is just borked and this should work, or whether the fact that the Suggester doesn't return a response field means that it just doesn't work with shards. Thanks, -- Ken http://about.me/kkrugler +1 530-210-6378 -- Ken Krugler http://www.scaleunlimited.com custom big data solutions & training Hadoop, Cascading, Mahout & Solr
response codes from http update requests
Should I be concerned with the HTTP response codes from update requests? I can't find documentation anywhere on what values come back from them (although maybe I'm not looking hard enough). Are they just standard HTTP, with 200 for success and 400/500 for failures? Thanks, Richard
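As far as I know they do follow standard HTTP conventions: 2xx on success, 4xx for bad requests, 5xx for server-side errors, with details in the response body. A trivial client-side sketch of treating anything non-2xx as a failed update:

```python
def update_succeeded(status_code):
    """Treat any 2xx HTTP status from an /update request as success;
    anything else (4xx client error, 5xx server error) as failure."""
    return 200 <= status_code < 300

# e.g. check the code returned by your HTTP client after POSTing to /update
print(update_succeeded(200), update_succeeded(400), update_succeeded(500))
# True False False
```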
Re: Error with distributed search and Suggester component (Solr 3.4)
I should have also included one more bit of information. If I configure the top-level (sharding) request handler to use just the suggest component, like this:

  <requestHandler name="/suggest" class="org.apache.solr.handler.component.SearchHandler">
    <!-- default values for query parameters -->
    <lst name="defaults">
      <str name="echoParams">explicit</str>
      <str name="shards.qt">suggest-core</str>
      <str name="shards">localhost:8080/solr/core0/,localhost:8080/solr/core1/</str>
    </lst>
    <arr name="components">
      <str>suggest</str>
    </arr>
  </requestHandler>

then I don't get an NPE, but I do get a response with no results:

  <response>
    <lst name="responseHeader">
      <int name="status">0</int>
      <int name="QTime">0</int>
      <lst name="params">
        <str name="q">r</str>
      </lst>
    </lst>
  </response>

For completeness, here are the other pieces of the solrconfig.xml puzzle:

  <requestHandler class="org.apache.solr.handler.component.SearchHandler" name="suggest-core">
    <lst name="defaults">
      <str name="spellcheck">true</str>
      <str name="spellcheck.dictionary">suggest-one</str>
      <str name="spellcheck.count">10</str>
    </lst>
    <arr name="components">
      <str>suggest</str>
    </arr>
  </requestHandler>

  <searchComponent class="solr.SpellCheckComponent" name="suggest">
    <lst name="spellchecker">
      <str name="name">suggest-one</str>
      <str name="classname">org.apache.solr.spelling.suggest.Suggester</str>
      <str name="lookupImpl">org.apache.solr.spelling.suggest.fst.FSTLookup</str>
      <str name="field">name</str> <!-- the indexed field to derive suggestions from -->
      <float name="threshold">0.05</float>
      <str name="buildOnCommit">true</str>
    </lst>
    <lst name="spellchecker">
      <str name="name">suggest-two</str>
      <str name="classname">org.apache.solr.spelling.suggest.Suggester</str>
      <str name="lookupImpl">org.apache.solr.spelling.suggest.fst.FSTLookup</str>
      <str name="field">content</str> <!-- the indexed field to derive suggestions from -->
      <float name="threshold">0.0</float>
      <str name="buildOnCommit">true</str>
    </lst>
  </searchComponent>

Thanks, -- Ken On May 1, 2012, at 3:48pm, Ken Krugler wrote: Hi list, Does anybody know if the Suggester component is designed to work with shards?
I'm asking because the documentation implies that it should (since "...Suggester reuses much of the SpellCheckComponent infrastructure...", and the SpellCheckComponent is documented as supporting a distributed setup). But when I make a request, I get an exception:

java.lang.NullPointerException
        at org.apache.solr.handler.component.QueryComponent.mergeIds(QueryComponent.java:493)
        at org.apache.solr.handler.component.QueryComponent.handleResponses(QueryComponent.java:390)
        at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:289)
        at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
        at org.apache.solr.core.SolrCore.execute(SolrCore.java:1368)
        at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:356)
        at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:252)
        at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1157)
        at org.mortbay.servlet.UserAgentFilter.doFilter(UserAgentFilter.java:81)
        at org.mortbay.servlet.GzipFilter.doFilter(GzipFilter.java:132)
        at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1157)
        at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:388)
        at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
        at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
        at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:765)
        at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:418)
        at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
        at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
        at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
        at org.mortbay.jetty.Server.handle(Server.java:326)
        at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
        at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:923)
        at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:547)
        at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)
        at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
        at org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:409)
        at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)

Looking at the QueryComponent.java:493 code, I see:

SolrDocumentList docs = (SolrDocumentList)srsp.getSolrResponse().getResponse().get("response"); //
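For what it's worth, the NPE at mergeIds is consistent with the coordinating QueryComponent expecting a document list ("response") that a pure-suggest shard reply never contains — suggestions come back under the spellcheck section instead. One workaround is to query each core's suggest handler directly and merge client-side. A minimal sketch of such a merge (the shard payloads and function name below are made up for illustration, not Solr API):

```python
# Hypothetical client-side merge of suggestions fetched from each core
# directly, bypassing the sharding handler. Each shard payload here is a
# stand-in for the term/frequency pairs in a core's spellcheck response.
def merge_suggestions(shard_results, count=10):
    """Combine per-shard suggestion lists, summing frequencies per term."""
    totals = {}
    for suggestions in shard_results:
        for term, freq in suggestions:
            totals[term] = totals.get(term, 0) + freq
    # Highest combined frequency first, then alphabetically for stability.
    ranked = sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:count]

core0 = [("solr", 12), ("solaris", 3)]
core1 = [("solr", 7), ("solder", 5)]
print(merge_suggestions([core0, core1]))
# [('solr', 19), ('solder', 5), ('solaris', 3)]
```

This sidesteps the distributed merge entirely, at the cost of one request per core from the application.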
Re: How to integrate sen and lucene-ja in SOLR 3.x
(12/05/02 1:47), Shanmugavel SRD wrote: Hi, Can anyone help me on how to integrate sen and lucene-ja.jar in SOLR 3.4 or 3.5 or 3.6 version? I think lucene-ja.jar no longer exists on the Internet, and it doesn't work with Lucene/Solr 3.x because the interface doesn't match (lucene-ja doesn't know about AttributeSource). Use lucene-gosen, the descendant project of sen/lucene-ja, instead. koji -- Query Log Visualizer for Apache Solr http://soleami.com/
Re: Removing old documents
all caching is disabled and I restarted jetty. The same results. Thanks. Alex. -Original Message- From: Lance Norskog goks...@gmail.com To: solr-user solr-user@lucene.apache.org Sent: Tue, May 1, 2012 2:57 pm Subject: Re: Removing old documents Maybe this is the HTTP caching feature? Solr comes with HTTP caching turned on by default, so when you query after making changes, your browser does not fetch your changed documents. On Tue, May 1, 2012 at 11:53 AM, alx...@aim.com wrote: Hello, I did bin/nutch solrclean crawl/crawldb http://127.0.0.1:8983/solr/ without and with -noCommit and restarted the solr server. The log shows that 5 documents were removed, but they are still in the search results. Is this a bug, or is something missing? I use nutch-1.4 and solr 3.5 Thanks. Alex. -Original Message- From: Markus Jelsma markus.jel...@openindex.io To: solr-user solr-user@lucene.apache.org Sent: Tue, May 1, 2012 7:41 am Subject: Re: Removing old documents Nutch 1.4 has a separate tool to remove 404 and redirect documents from your index based on your CrawlDB. Trunk's SolrIndexer can add and remove documents in one run based on segment data. On Tuesday 01 May 2012 16:31:47 Bai Shen wrote: I'm running Nutch, so it's updating the documents, but I want to remove ones that are no longer available. So in that case, there's no update possible. On Tue, May 1, 2012 at 8:47 AM, mav.p...@holidaylettings.co.uk wrote: Not sure if there is an automatic way, but we do it via a delete query, and where possible we update the doc under the same id to avoid deletes. On 01/05/2012 13:43, Bai Shen baishen.li...@gmail.com wrote: What is the best method to remove old documents? Things that now generate 404 errors, etc. Is there an automatic method or do I have to do it manually? Thanks. -- Markus Jelsma - CTO - Openindex -- Lance Norskog goks...@gmail.com
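If the cleaning tool isn't removing what you expect, the fallback mentioned above is an explicit delete query sent to Solr's update handler, followed by a commit. A minimal sketch of building the delete-by-query message (the example query strings are placeholders; posting it to a live server is left out):

```python
# Sketch: build a Solr delete-by-query update message. POSTing this body
# to http://host:8983/solr/update, then posting <commit/>, removes the
# matching documents. Substitute whatever query identifies your dead URLs.
from xml.sax.saxutils import escape

def delete_by_query(query):
    """Return the XML body for a Solr delete-by-query request."""
    return "<delete><query>%s</query></delete>" % escape(query)

print(delete_by_query("id:http\\://example.com/gone"))
# Remember: deletes are not visible in search results until a commit.
```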
Re: get latest 50 documents the fastest way
you should reverse your sort algorithm. Maybe you can override the tf method of Similarity and return -1.0f * tf() (I don't know whether the default collector allows scores smaller than zero). Or you can hack this by adding a large number, or write your own collector; in its collect(int doc) method you can do something like this: collect(int doc){ float score=scorer.score(); score*=-1.0f; } If you don't sort by relevance score, just set a Sort. On Tue, May 1, 2012 at 10:38 PM, Yuval Dotan yuvaldo...@gmail.com wrote: Hi Guys We have a use case where we need to get the 50 *latest* documents that match my query - without additional ranking, sorting, etc. on the results. My index contains 1,000,000,000 documents, and I noticed that if the number of found documents is very big (larger than 50% of the index size - 500,000,000 docs) then it takes more than 5 seconds to get the results, even with the rows=50 parameter. Is there a way to get the results faster? Thanks Yuval
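If relevance ranking isn't needed at all, the simplest route at the Solr level is to bypass scoring and sort on an indexed date field directly. A sketch of the request parameters (the field name `timestamp` is illustrative — it assumes such a field exists in your schema):

```python
# Sketch: ask Solr for the 50 newest matches by sorting on an indexed
# date field instead of relevance. "timestamp" is a placeholder name.
from urllib.parse import urlencode

params = {
    "q": "your query here",
    "sort": "timestamp desc",  # newest first; no relevance scoring needed
    "rows": 50,
    "fl": "id,timestamp",
}
print("/select?" + urlencode(params))
```

Sorting still has to visit all matching documents, but it avoids score computation and any app-side re-ranking.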
Re: Boosting documents based on search term/phrase
Hi, Can you please give an example of what you mean? Otis Performance Monitoring for Solr / ElasticSearch / HBase - http://sematext.com/spm From: Donald Organ dor...@donaldorgan.com To: solr-user solr-user@lucene.apache.org Sent: Tuesday, May 1, 2012 3:59 PM Subject: Boosting documents based on search term/phrase Is there a way to boost documents based on the search term/phrase?
Re: NPE when faceting
I don't have any more details than I provided here, but I created a ticket with this information. Thanks again https://issues.apache.org/jira/browse/SOLR-3427 On Tue, May 1, 2012 at 5:20 PM, Yonik Seeley yo...@lucidimagination.com wrote: Darn... looks likely that it's another bug from when part of UnInvertedField was refactored into Lucene. We really need some random tests that can catch bugs like these though - I'll see if I can reproduce. Can you open a JIRA issue for this? -Yonik lucenerevolution.com - Lucene/Solr Open Source Search Conference. Boston May 7-10 On Tue, May 1, 2012 at 4:51 PM, Jamie Johnson jej2...@gmail.com wrote: I had reported this issue a while back, hoping that it was something with my environment, but that doesn't seem to be the case. I am getting the following stack trace on certain facet queries. Previously when I did an optimize the error went away, does anyone have any insight into why specifically this could be happening? May 1, 2012 8:48:52 PM org.apache.solr.common.SolrException log

SEVERE: java.lang.NullPointerException
        at org.apache.lucene.index.DocTermOrds.lookupTerm(DocTermOrds.java:807)
        at org.apache.solr.request.UnInvertedField.getTermValue(UnInvertedField.java:636)
        at org.apache.solr.request.UnInvertedField.getCounts(UnInvertedField.java:411)
        at org.apache.solr.request.SimpleFacets.getTermCounts(SimpleFacets.java:300)
        at org.apache.solr.request.SimpleFacets.getFacetFieldCounts(SimpleFacets.java:396)
        at org.apache.solr.request.SimpleFacets.getFacetCounts(SimpleFacets.java:205)
        at org.apache.solr.handler.component.FacetComponent.process(FacetComponent.java:81)
        at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:204)
        at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
        at org.apache.solr.core.SolrCore.execute(SolrCore.java:1550)
        at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:442)
        at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:263)
        at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1337)
        at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:484)
        at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:119)
        at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:524)
        at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:233)
        at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1065)
        at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:413)
        at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:192)
        at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:999)
        at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:117)
        at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:250)
        at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:149)
        at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:111)
        at org.eclipse.jetty.server.Server.handle(Server.java:351)
        at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:454)
        at org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:47)
        at org.eclipse.jetty.server.AbstractHttpConnection.content(AbstractHttpConnection.java:900)
        at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.content(AbstractHttpConnection.java:954)
        at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:857)
        at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235)
        at org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:66)
        at org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:254)
        at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:599)
        at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:534)
        at java.lang.Thread.run(Thread.java:662)
Re: Boosting documents based on search term/phrase
Perfect, this is working well. On Tue, May 1, 2012 at 5:33 PM, Jeevanandam je...@myjeeva.com wrote: Yes, you can add it in the last-components section on the default query handler:

<arr name="last-components">
  <str>elevator</str>
</arr>

- Jeevanandam On 02-05-2012 3:53 am, Donald Organ wrote: query elevation was exactly what I was talking about. Now is there a way to add this to the default query handler? On Tue, May 1, 2012 at 4:26 PM, Jack Krupansky j...@basetechnology.com wrote: Do you mean besides query elevation? http://wiki.apache.org/solr/QueryElevationComponent And besides explicit boosting by the user (the ^ suffix operator after a term/phrase)? -- Jack Krupansky -Original Message- From: Donald Organ Sent: Tuesday, May 01, 2012 3:59 PM To: solr-user Subject: Boosting documents based on search term/phrase Is there a way to boost documents based on the search term/phrase?
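For completeness, the elevator component also relies on an elevate.xml file mapping query strings to the documents to force to the top. A minimal sketch (the query text and doc ids here are made up):

```xml
<!-- elevate.xml: ids are placeholders for real uniqueKey values -->
<elevate>
  <query text="hard drive">
    <doc id="SP2514N"/>
    <doc id="IW-02" exclude="true"/>  <!-- exclude pushes a doc out entirely -->
  </query>
</elevate>
```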
Re: Latest solr4 snapshot seems to be giving me a lot of unhappy logging about 'Log4j', should I be concerned?
check a release since r1332752 If things still look problematic, post a comment on: https://issues.apache.org/jira/browse/SOLR-3426 this should now have a less verbose message with an older SLF4j and with Log4j On Tue, May 1, 2012 at 10:14 AM, Gopal Patwa gopalpa...@gmail.com wrote: I have a similar issue using log4j for logging with the trunk build; the CoreContainer class prints a big stack trace on our jboss 4.2.2 startup. I am using slf4j 1.5.2 10:07:45,918 WARN [CoreContainer] Unable to read SLF4J version java.lang.NoSuchMethodError: org.slf4j.impl.StaticLoggerBinder.getSingleton()Lorg/slf4j/impl/StaticLoggerBinder; at org.apache.solr.core.CoreContainer.load(CoreContainer.java:395) at org.apache.solr.core.CoreContainer.load(CoreContainer.java:355) at org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:304) at org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:101) On Tue, May 1, 2012 at 9:25 AM, Benson Margulies bimargul...@gmail.com wrote: On Tue, May 1, 2012 at 12:16 PM, Mark Miller markrmil...@gmail.com wrote: There is a recent JIRA issue about keeping the last n logs to display in the admin UI. That introduced a problem - and then the fix introduced a problem - and then the fix mitigated the problem but left that ugly logging as a by-product. Don't remember the issue # offhand. I think there was a dispute about what should be done with it. On May 1, 2012, at 11:14 AM, Benson Margulies wrote: CoreContainer.java, in the method 'load', finds itself calling loader.newInstance with an 'fname' of Log4j if the slf4j backend is 'Log4j'. Couldn't someone just fix the if statement to say, 'OK, if we're doing log4j, we have no log watcher' and skip all the loud failing on the way? e.g.: 2012-05-01 10:40:32,367 org.apache.solr.core.CoreContainer - Unable to load LogWatcher org.apache.solr.common.SolrException: Error loading class 'Log4j' What is it actually looking for? Have I misplaced something? - Mark Miller lucidimagination.com
Re: Ampersand issue
If your json value is &amp; the proper xml value is &amp;amp;. What is the value you are setting on the stored field? Is it & or &amp;? On Mon, Apr 30, 2012 at 12:57 PM, William Bell billnb...@gmail.com wrote: One idea was to wrap the field with CDATA. Or base64 encode it. On Fri, Apr 27, 2012 at 7:50 PM, Bill Bell billnb...@gmail.com wrote: We are indexing a simple XML field from SQL Server into Solr as a stored field. We have noticed that the &amp; is outputted as &amp;amp; when using wt=XML. When using wt=JSON we get the normal &amp;. Is there a way to indicate that we don't want to encode the field, since it is already XML, when using wt=XML? Bill Bell Sent from mobile -- Bill Bell billnb...@gmail.com cell 720-256-8076
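The doubled entity is just standard XML escaping applied to a value that was stored already escaped. A quick sketch of what the XML response writer effectively does to a stored value (field contents here are made up):

```python
# Sketch of why wt=XML shows &amp;amp;: the writer escapes whatever is
# stored, including any ampersands that are already part of an entity.
from xml.sax.saxutils import escape

stored = "Barnes &amp; Noble"   # value stored already XML-encoded
print(escape(stored))           # Barnes &amp;amp; Noble  (double-escaped)

raw = "Barnes & Noble"          # store the raw value instead...
print(escape(raw))              # Barnes &amp; Noble  (escaped exactly once)
```

So if the stored field already contains entities, the JSON writer hands them back verbatim while the XML writer escapes them a second time; storing the raw, unescaped XML avoids the doubling.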
Re: Latest solr4 snapshot seems to be giving me a lot of unhappy logging about 'Log4j', should I be concerned?
Yes, I'm the author of that JIRA. On Tue, May 1, 2012 at 8:45 PM, Ryan McKinley ryan...@gmail.com wrote: check a release since r1332752 If things still look problematic, post a comment on: https://issues.apache.org/jira/browse/SOLR-3426 this should now have a less verbose message with an older SLF4j and with Log4j On Tue, May 1, 2012 at 10:14 AM, Gopal Patwa gopalpa...@gmail.com wrote: I have similar issue using log4j for logging with trunk build, the CoreConatainer class print big stack trace on our jboss 4.2.2 startup, I am using sjfj 1.5.2 10:07:45,918 WARN [CoreContainer] Unable to read SLF4J version java.lang.NoSuchMethodError: org.slf4j.impl.StaticLoggerBinder.getSingleton()Lorg/slf4j/impl/StaticLoggerBinder; at org.apache.solr.core.CoreContainer.load(CoreContainer.java:395) at org.apache.solr.core.CoreContainer.load(CoreContainer.java:355) at org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:304) at org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:101) On Tue, May 1, 2012 at 9:25 AM, Benson Margulies bimargul...@gmail.comwrote: On Tue, May 1, 2012 at 12:16 PM, Mark Miller markrmil...@gmail.com wrote: There is a recent JIRA issue about keeping the last n logs to display in the admin UI. That introduced a problem - and then the fix introduced a problem - and then the fix mitigated the problem but left that ugly logging as a by product. Don't remember the issue # offhand. I think there was a dispute about what should be done with it. On May 1, 2012, at 11:14 AM, Benson Margulies wrote: CoreContainer.java, in the method 'load', finds itself calling loader.NewInstance with an 'fname' of Log4j of the slf4j backend is 'Log4j'. Couldn't someone just fix the if statement to say, 'OK, if we're doing log4j, we have no log watcher' and skip all the loud failing on the way? 
e.g.: 2012-05-01 10:40:32,367 org.apache.solr.core.CoreContainer - Unable to load LogWatcher org.apache.solr.common.SolrException: Error loading class 'Log4j' What is it actually looking for? Have I misplaced something? - Mark Miller lucidimagination.com
Looking for a way to separate MySQL query from DIH data-config.xml
Hello everyone, I have a working DIH setup with a couple of long and complicated MySQL queries in data-config.xml. To make it easier/safer for myself and other developers in my company to edit the MySQL query, I'd like to remove it from data-config.xml, store it in a separate file, and then reference it from data-config.xml. Is there anyone who's currently doing this and could share what method was used to accomplish it? At some point on this list I saw someone mention that they had done just what I'm trying to do by putting the query in a separate SQL file as a MySQL stored procedure, and then calling that procedure from the query="" portion of data-config.xml, but I don't quite understand how/at what point that SQL file with the stored procedure would be read by DIH. Does anyone know how this would be done, or have any other suggestions for how to move the query into a separate document? Thanks in advance, Peter
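One hedged sketch of the stored-procedure approach described above, with made-up names: the CREATE PROCEDURE lives in MySQL itself — you load the .sql file once with the mysql client (e.g. mysql mydb < queries.sql), so DIH never reads the file at all; data-config.xml only ever issues the CALL:

```xml
<!-- data-config.xml sketch: the entity calls a procedure defined inside
     MySQL. "get_documents", the connection details, and the column names
     are all placeholders. -->
<dataConfig>
  <dataSource driver="com.mysql.jdbc.Driver"
              url="jdbc:mysql://localhost/mydb"
              user="solr" password="secret"/>
  <document>
    <entity name="doc" query="CALL get_documents()">
      <field column="id" name="id"/>
      <field column="title" name="title"/>
    </entity>
  </document>
</dataConfig>
```

Editing the query then means editing and re-running the .sql file against MySQL, with no change to data-config.xml.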
Re: Error with distributed search and Suggester component (Solr 3.4)
On Tue, May 1, 2012 at 6:48 PM, Ken Krugler kkrugler_li...@transpac.com wrote: Hi list, Does anybody know if the Suggester component is designed to work with shards? I'm not really sure it is? They would probably have to override the default merge implementation specified by SpellChecker. But, all of the current suggesters pump out over 100,000 QPS on my machine, so I'm wondering what the usefulness of this is? And if it was useful, merging results from different machines is pretty inefficient, for suggest you would shard by term instead so that you need only contact a single host? -- lucidimagination.com