facet total score instead of total count
Hi, I have a requirement where I want to sum up the scores of the faceted fields. This will decide the relevancy for us. Is there a way to do it on a facet field? Basically, instead of giving the count of records for a facet field, I would like to have the total sum of scores for those records. Any help is greatly appreciated. Thanks Bharat Jain
logic required for newbie
Hi All, I am very new and learning Solr. I have 10 columns like the following in a table:
1. id
2. name
3. user_id
4. location
5. country
6. landmark1
7. landmark2
8. landmark3
9. landmark4
10. landmark5
When a user searches for a landmark, I want to return only the one landmark that matches; the rest of the landmarks should be ignored. Expected result if the user searches by landmark2:
1. id
2. name
3. user_id
4. location
5. country
7. landmark2
or if they search by landmark9:
1. id
2. name
3. user_id
4. location
5. country
9. landmark9
Please help me design the schema for this kind of requirement. Thanks, with regards
Re: question about relevance
Well you are correct Erik that this is a database-ish thing try to achieve in solr and unfortunately the sin :) had been committed by somebody else :) and now we are running into relevancy issues. Let me try to state the problem more casually. 1. There are user records of type A, B, C etc. (userId field in index is common to all records) 2. A user can have any number of A, B, C etc (e.g. think of A being a language then user can know many languages like french, english, german etc) 3. Records are currently stored as a document in index. 4. A given query can match multiple records for the user 5. If for a user more records are matched (e.g. if he knows both french and german) then he is more relevant and should come top in UI. This is the reason I wanted to add lucene scores assuming the greater score means more relevance. Hope you got what I was saying. Another idea for this situation is doing faceting on userId field and then add the score but currently I think lucene only support facet count, basically solr will give you only count of docs it matched. Can I get sum of the score of documents that matched? Thanks Bharat Jain On Tue, Jul 27, 2010 at 5:58 AM, Erick Erickson erickerick...@gmail.comwrote: I'm having trouble getting my head around what you're trying to accomplish, so if this is off base you know why G. But what it smells like is that you're trying to do database-ish things in a SOLR index, which is almost always the wrong approach. Is there a way to index redundant data with each document so all you have to do to get the relevant users is a simple query? Adding scores is also suspect.. I don't see how that does predictable things. But I'm also failing completely to understand what a relevant user is. not much help, if this is way off base perhaps you could provide some additional use-cases? Best Erick On Mon, Jul 26, 2010 at 2:37 AM, Bharat Jain bharat.j...@gmail.com wrote: Hello All, I have a index which store multiple objects belonging to a user for e.g. schema field name=objType type=... / - Identifies user object type e.g. userBasic or userAdv !-- obj 1 -- field name=first_name type=... / MAPS to userBasicInfoObject field name=last_name type=... / !-- obj 2 -- field name=user_data_1 type=... / - MAPS to userAdvInfoObject field name=user_data_2 type=... / /schema Now when I am doing some query I get multiple records mapping to java objects (identified by objType) that belong to the same user. Now I want to show the relevant users at the top of the list. I am thinking of adding the Lucene scores of different result documents to get the best scores. Is this correct approach to get the relevance of the user? Thanks Bharat Jain
Re: Any tips/guidelines for tuning Solr/Lucene performance in a master/slave/sharding environment
Hi, I think the starting point should be: http://wiki.apache.org/solr/SolrPerformanceFactors For example, you could start playing with the mergeFactor parameter. My 2 cents, Tommaso 2010/7/27 Chengyang atreey...@163.com How do we reduce the index file size, decrease the sync time between nodes, and decrease the index create/update time? Thanks.
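For reference, mergeFactor lives in the index sections of solrconfig.xml; a sketch with illustrative values only (a lower mergeFactor means fewer segments and faster searches, at the cost of slower indexing):

    <!-- solrconfig.xml -->
    <indexDefaults>
      <mergeFactor>10</mergeFactor>
      <ramBufferSizeMB>32</ramBufferSizeMB>
    </indexDefaults>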
Re: Extracting PDF text/comment/callout/typewriter boxes with Solr CELL/Tika/PDFBox
I attached a patch for the Solr 1.4.1 release on https://issues.apache.org/jira/browse/SOLR-1902 that made things work for me. This strange behaviour for me was due to the fact that I copied the patched jars and war inside the dist directory but forgot to update the war inside the example/webapps directory (that is inside Jetty). Hope this helps. Tommaso
2010/7/27 David Thibault dthiba...@esperion.com Alessandro & all, I was having the same issue with Tika crashing on certain PDFs. I also noticed the bug where no content was extracted after upgrading Tika. When I went to the SOLR issue you link to below, I applied all the patches, downloaded the Tika 0.8 jars, restarted Tomcat, posted a file via curl, and got the following error:
SEVERE: java.lang.NoSuchMethodError: org.apache.solr.core.SolrResourceLoader.getClassLoader()Ljava/lang/ClassLoader;
    at org.apache.solr.handler.extraction.ExtractingRequestHandler.inform(ExtractingRequestHandler.java:93)
    at org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.getWrappedHandler(RequestHandlers.java:244)
    at org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:231)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
    at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
    at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
    at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
    at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
    at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
    at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
    at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:298)
    at org.apache.coyote.http11.Http11AprProcessor.process(Http11AprProcessor.java:859)
    at org.apache.coyote.http11.Http11AprProtocol$Http11ConnectionHandler.process(Http11AprProtocol.java:579)
    at org.apache.tomcat.util.net.AprEndpoint$Worker.run(AprEndpoint.java:1555)
    at java.lang.Thread.run(Thread.java:619)
This is really weird because I DID apply the SolrResourceLoader patch that adds the getClassLoader method. I even verified by opening up the JARs and looking at the class file in Eclipse... I can see the SolrResourceLoader.getClassLoader() method. Does anyone know why it can't find the method? After patching the source I did "ant clean dist" in the base directory of the Solr source tree and everything looked like it compiled (BUILD SUCCESSFUL). Then I copied all the jars from dist/ and all the library dependencies from contrib/extraction/lib/ into my SOLR_HOME. Restarting Tomcat, everything in the logs looked good. I'm stumped. It would be very nice to have a Solr implementation using the newest versions of PDFBox & Tika and actually have content being extracted... =) Best, Dave
-----Original Message----- From: Alessandro Benedetti [mailto:benedetti.ale...@gmail.com] Sent: Tuesday, July 27, 2010 6:09 AM To: solr-user@lucene.apache.org Subject: Re: Extracting PDF text/comment/callout/typewriter boxes with Solr CELL/Tika/PDFBox Hi Jon, During the last days we faced the same problem.
Using Solr 1.4.1 classic (Tika 0.4), from some PDF files we can't extract content, and from others Solr throws an exception during the indexing process. You must: update the Tika libraries (in /contrib/extraction/lib) with the tika-core 0.8 snapshot and tika-parsers 0.8, and update PDFBox and all related libraries. After that you have to patch Solr 1.4.1 following this patch: https://issues.apache.org/jira/browse/SOLR-1902?page=com.atlassian.jira.plugin.ext.subversion%3Asubversion-commits-tabpanel This is the first way to solve the problem. Using Solr 1.4.1 (with the Tika 0.8 snapshot and PDFBox updated) no exception is thrown during the indexing process, but no content is extracted. Using the latest Solr trunk (with the Tika 0.8 snapshot and PDFBox updated) all sounds good, but we don't know how stable it is! I hope you now have a clear vision of this issue. Best Regards 2010/7/26 Sharp, Jonathan jsh...@coh.org Every so often I need to index new batches of scanned PDFs, and occasionally Adobe's OCR can't recognize the text in a couple of these documents. In these situations I would like to type a small amount of text onto the document and have it be extracted by Solr CELL. Adobe Pro 9 has a number of different ways to add text directly to a PDF file: *Typewriter *Sticky Note *Callout boxes *Text boxes I tried indexing documents with each of these text additions
Re: SpatialSearch: sorting by distance
Does anybody know if this feature works correctly? Or am I doing something wrong? 2010/7/27 Pavel Minchenkov char...@gmail.com Hi, I'm trying to sort by distance like this: sort=dist(2,lat,lon,55.755786,37.617633) asc In general results are sorted, but some documents are not in the right order. I'm using DistanceUtils.getDistanceMi(...) from Lucene spatial to calculate the real distance after reading documents from Solr. The Solr version is from trunk.
    <fieldType name="double" class="solr.TrieDoubleField" precisionStep="0" omitNorms="true" positionIncrementGap="0"/>
    <field name="lat" type="double" indexed="true" stored="true"/>
    <field name="lon" type="double" indexed="true" stored="true"/>
Thanks. -- Pavel Minchenkov
Re: Integration Problem
Nobody out there who can help me with this problem? I need to edit the result of the javabin writer (adding the results from the webservice). I hope it is possible to do that. Thanks in advance. On Mon 26.07.2010 10:25, Jörg Wißmeier wrote: Hi everybody, I've been working with Solr for a while and have integrated it with Liferay 6.0.3, so every search request from Liferay is processed by Solr and its index. But I have to integrate another system, which offers me a webservice. The results of this webservice should appear in the results of Solr but not in its index. I tried to do that with a custom query handler and a custom response writer, and I'm able to write into the response message of Solr, but only into the response node of the XML message and not into the results node. So is there any solution for writing into the results node of the XML message from Solr? Thanks in advance, best regards, Joerg
solr log file rotation
Hi all, I am running a Solr 1.4 instance on FreeBSD that generates large log files in very short periods. I used /etc/newsyslog to configure log file rotation; however, once the log file is rotated, Solr doesn't write logs to the new file. I'm wondering if there is a way to let Solr know that the log file has been rotated so that it recreates a correct file handle? Thanks Christos
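One alternative worth trying, assuming the stock Solr 1.4 setup that logs through java.util.logging (e.g. the bundled Jetty): let the JVM rotate the log itself, so no external tool has to swap the file out from under a live file handle. A logging.properties sketch, passed via -Djava.util.logging.config.file (the path and sizes are illustrative):

    handlers = java.util.logging.FileHandler
    java.util.logging.FileHandler.pattern = /var/log/solr/solr.%g.log
    java.util.logging.FileHandler.limit = 104857600
    java.util.logging.FileHandler.count = 10
    java.util.logging.FileHandler.append = true
    java.util.logging.FileHandler.formatter = java.util.logging.SimpleFormatter

Here %g is the rotation generation number, limit is the per-file size in bytes, and count is how many rotated files to keep.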
Re: Spellchecking and frequency
Hi Mark, Thanks for that info, it looks very interesting; it would be great to see your code. Out of interest, did you use the dictionary and the phonetic file? Did you see better results with both? In regard to the secondary part that checks the corpus for matching suggestions, would another way to do this be to have an event listener that listens for commits and then builds the dictionary of matching corpus words that way? Then you avoid the performance hit at query time. Cheers, Dan
On Tue, Jul 27, 2010 at 7:04 PM, Mark Holland mark.holl...@zoopla.co.uk wrote: Hi, I found the suggestions returned from the standard Solr spellcheck not to be that relevant. By contrast, aspell, given the same dictionary and misspelled words, gives much more accurate suggestions. I therefore wrote an implementation of SolrSpellChecker that wraps Jazzy, the Java aspell library. I also extended the SpellCheckComponent to take the matrix of suggested words and query the corpus to find the first combination of suggestions which returned a match. This works well for my use case, where term frequency is irrelevant to spelling or scoring. I'd like to publish the code in case someone finds it useful (although it's a bit crude at the moment and will need a decent tidy up). Would it be appropriate to open up a Jira issue for this? Cheers, ~mark
On 27 July 2010 09:33, dan sutton danbsut...@gmail.com wrote: Hi, I've recently been looking into spellchecking in Solr, and was struck by how limited the usefulness of the tool was. Like most corpora, ours contains lots of different spelling mistakes for the same word, so 'spellcheck.onlyMorePopular' is not really that useful unless you click on it numerous times. I was thinking that since most of the time people spell words correctly, why is there no other frequency parameter that could enter into the score? I.e. something like: spell_score ~ edit_dist * freq I'm sure others have come across this issue and was wondering what steps/algorithms they have used to overcome these limitations? Cheers, Dan
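For what spell_score ~ edit_dist * freq could look like concretely, here is a self-contained Java sketch; it is not Solr's implementation, and the log damping of the frequency is just one plausible choice:

    public class FreqAwareSpellScore {

        // classic dynamic-programming Levenshtein distance
        static int editDistance(String a, String b) {
            int[][] d = new int[a.length() + 1][b.length() + 1];
            for (int i = 0; i <= a.length(); i++) d[i][0] = i;
            for (int j = 0; j <= b.length(); j++) d[0][j] = j;
            for (int i = 1; i <= a.length(); i++) {
                for (int j = 1; j <= b.length(); j++) {
                    int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                    d[i][j] = Math.min(Math.min(d[i - 1][j] + 1, d[i][j - 1] + 1),
                                       d[i - 1][j - 1] + cost);
                }
            }
            return d[a.length()][b.length()];
        }

        /** Higher is better: string similarity weighted by how often the candidate occurs. */
        static double spellScore(String misspelled, String candidate, int docFreq) {
            double maxLen = Math.max(misspelled.length(), candidate.length());
            double similarity = 1.0 - editDistance(misspelled, candidate) / maxLen;
            return similarity * Math.log(1 + docFreq);
        }

        public static void main(String[] args) {
            // a frequent correct spelling outranks a rare misspelling of similar distance
            System.out.println(spellScore("beleive", "believe", 5000)); // frequent candidate
            System.out.println(spellScore("beleive", "beleve", 3));     // rare candidate
        }
    }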
Re: Indexing Problem: Where's my data?
Make sure to set stored="true" on every field you expect to be returned in your results for later display. Chantal
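For example, in schema.xml (the field name here is hypothetical):

    <field name="title" type="text" indexed="true" stored="true"/>
    <!-- indexed="true" makes the field searchable; stored="true" makes it returnable -->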
Re: DIH : SQL query (sub-entity) is executed although variable is not set (null or empty list)
Hi Lance! On Wed, 2010-07-28 at 02:31 +0200, Lance Norskog wrote: Should this go into the trunk, or does it only solve problems unique to your use case? The solution is generic but is an extension of XPathEntityProcessor, because I didn't want to touch the solr.war. This way I can deploy the extension into SOLR_HOME/lib. The problem that it solves is not one with XPathEntityProcessor but more general. What it does: it adds an attribute to the entity that I called skipIfEmpty, which takes the variable (it could even take more variables separated by whitespace). On entityProcessor.init(), which is called for sub-entities per row of the root entity (i.e. before every new request to the data source), the value of the attribute is resolved, and if it is null or empty (after trimming), the entity is not processed further. This attribute is only allowed on sub-entities. It would probably be nicer to put that somewhere higher up in the class hierarchy so that all entity processors could make use of it. But I don't know how common the use case is - all the examples I found were more or less joins on primary keys. Cheers, Chantal

Here comes the code ==========

    import static org.apache.solr.handler.dataimport.DataImportHandlerException.SEVERE;

    import java.util.Map;
    import java.util.logging.Logger;

    import org.apache.solr.handler.dataimport.Context;
    import org.apache.solr.handler.dataimport.DataImportHandlerException;
    import org.apache.solr.handler.dataimport.XPathEntityProcessor;

    public class OptionalXPathEntityProcessor extends XPathEntityProcessor {
        private Logger log = Logger.getLogger(OptionalXPathEntityProcessor.class.getName());
        private static final String SKIP_IF_EMPTY = "skipIfEmpty";
        private boolean skip = false;

        @Override
        protected void firstInit(Context context) {
            if (context.isRootEntity()) {
                throw new DataImportHandlerException(SEVERE,
                    "OptionalXPathEntityProcessor not allowed for root entities.");
            }
            super.firstInit(context);
        }

        @Override
        public void init(Context context) {
            // resolve the entity's skipIfEmpty attribute; skip the entity if empty
            String value = context.getResolvedEntityAttribute(SKIP_IF_EMPTY);
            if (value == null || value.trim().isEmpty()) {
                skip = true;
            } else {
                super.init(context);
                skip = false;
            }
        }

        @Override
        public Map<String, Object> nextRow() {
            if (skip) return null;
            return super.nextRow();
        }
    }
Solr using 1500 threads - is that normal?
Hi, Solr seems to be crashing after a JVM exception that new threads cannot be created. I am writing in the hope of advice from someone who has experienced this before. The exception that is causing the problem is: Exception in thread btpool0-5 java.lang.OutOfMemoryError: unable to create new native thread. The memory that is allocated to Solr is 3072MB, which should be enough memory for a ~6GB data set. The documents are not big either; they have around 10 fields, of which only one stores large text ranging between 1k-50k. The top command at the time of the crash shows Solr using around 1500 threads, which I assume is not normal. Could it be that the threads are crashing one by one and new ones are created to cope with the queries? In the log file, right after the exception, there are several thousand commits before the server stalls completely. Normally, the log file would report 20-30 document existence queries per second, then 1 commit per 5-30 seconds, and some more infrequent faceted document searches on the data. However, after the exception there are only commits until the end of the log file. I am wondering if anyone has experienced this before or if it is some sort of known bug in Solr 1.4? Is there a way to increase the detail of the exception in the logfile? I am attaching the output of a grep Exception command on the logfile.
Jul 28, 2010 8:19:31 AM org.apache.solr.common.SolrException log SEVERE: org.apache.solr.common.SolrException: Error opening new searcher. exceeded limit of maxWarmingSearchers=2, try again later.
Jul 28, 2010 8:19:31 AM org.apache.solr.common.SolrException log SEVERE: org.apache.solr.common.SolrException: Error opening new searcher. exceeded limit of maxWarmingSearchers=2, try again later.
Jul 28, 2010 8:19:31 AM org.apache.solr.common.SolrException log SEVERE: org.apache.solr.common.SolrException: Error opening new searcher. exceeded limit of maxWarmingSearchers=2, try again later.
Jul 28, 2010 8:19:32 AM org.apache.solr.common.SolrException log SEVERE: org.apache.solr.common.SolrException: Error opening new searcher. exceeded limit of maxWarmingSearchers=2, try again later.
Jul 28, 2010 8:20:18 AM org.apache.solr.common.SolrException log SEVERE: org.apache.solr.common.SolrException: Error opening new searcher. exceeded limit of maxWarmingSearchers=2, try again later.
Jul 28, 2010 8:20:48 AM org.apache.solr.common.SolrException log SEVERE: org.apache.solr.common.SolrException: Error opening new searcher. exceeded limit of maxWarmingSearchers=2, try again later.
Jul 28, 2010 8:22:43 AM org.apache.solr.common.SolrException log SEVERE: org.apache.solr.common.SolrException: Error opening new searcher. exceeded limit of maxWarmingSearchers=2, try again later.
Jul 28, 2010 8:27:53 AM org.apache.solr.common.SolrException log SEVERE: org.apache.solr.common.SolrException: Error opening new searcher. exceeded limit of maxWarmingSearchers=2, try again later.
Jul 28, 2010 8:27:53 AM org.apache.solr.common.SolrException log SEVERE: org.apache.solr.common.SolrException: Error opening new searcher. exceeded limit of maxWarmingSearchers=2, try again later.
Jul 28, 2010 8:27:53 AM org.apache.solr.common.SolrException log SEVERE: org.apache.solr.common.SolrException: Error opening new searcher. exceeded limit of maxWarmingSearchers=2, try again later.
Jul 28, 2010 8:28:50 AM org.apache.solr.common.SolrException log SEVERE: org.apache.solr.common.SolrException: Error opening new searcher. exceeded limit of maxWarmingSearchers=2, try again later.
Jul 28, 2010 8:33:19 AM org.apache.solr.common.SolrException log SEVERE: org.apache.solr.common.SolrException: Error opening new searcher. exceeded limit of maxWarmingSearchers=2, try again later.
Jul 28, 2010 8:35:08 AM org.apache.solr.common.SolrException log SEVERE: org.apache.solr.common.SolrException: Error opening new searcher. exceeded limit of maxWarmingSearchers=2, try again later.
Jul 28, 2010 8:35:58 AM org.apache.solr.common.SolrException log SEVERE: org.apache.solr.common.SolrException: Error opening new searcher. exceeded limit of maxWarmingSearchers=2, try again later.
Jul 28, 2010 8:35:59 AM org.apache.solr.common.SolrException log SEVERE: org.apache.solr.common.SolrException: Error opening new searcher. exceeded limit of maxWarmingSearchers=2, try again later.
Jul 28, 2010 8:44:31 AM org.apache.solr.common.SolrException log SEVERE: org.apache.solr.common.SolrException: Error opening new searcher. exceeded limit of maxWarmingSearchers=2, try again later.
Jul 28, 2010 8:51:49 AM org.apache.solr.common.SolrException log SEVERE: org.apache.solr.common.SolrException: Error opening new searcher. exceeded limit of maxWarmingSearchers=2, try again later.
Jul 28, 2010 8:55:17 AM org.apache.solr.common.SolrException log SEVERE: org.apache.solr.common.SolrException: Error opening new searcher. exceeded limit of maxWarmingSearchers=2, try again later.
Jul 28, 2010 8:55:17 AM
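A note on the log above: each of those SEVERE lines means a commit tried to open a new searcher while two were already warming, which fits the observation of several thousand commits in a row. A common mitigation is to stop committing on every update and let the server batch commits via autoCommit. A minimal solrconfig.xml sketch, with illustrative values only:

    <!-- solrconfig.xml -->
    <updateHandler class="solr.DirectUpdateHandler2">
      <autoCommit>
        <maxDocs>10000</maxDocs>   <!-- commit after this many docs -->
        <maxTime>60000</maxTime>   <!-- or after this many ms -->
      </autoCommit>
    </updateHandler>

    <query>
      <maxWarmingSearchers>2</maxWarmingSearchers> <!-- the limit the log is hitting -->
    </query>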
Re: Strange search
Try deleting solr.SnowballPorterFilterFactory from your analyzer chain. I had similar problems using the German SnowballPorterFilterFactory.
SolrJ Response + JSON
Hello community, I need to transform SolrJ responses into JSON, after some computing on those results by another application has finished. I cannot do those computations on the Solr side. So, I really have to translate SolrJ's output into JSON. Any experiences how to do so without writing your own JSON writer? Thank you. - Mitch
Get unique values
Hi, In my schema I have (inter alia) the fields CollectionID and CollectionName. These two values always match together, which means that for every value of CollectionID there is a matching value of CollectionName. I am interested in a query which allows me to get the unique values of CollectionID with the matching CollectionNames (the rest of the fields are of no interest to me in this query). I was thinking about facets, but they offer a bit more than I need. Anyone have an idea for a query which would allow me to get these results? Cheers, -- Rafał Zawadzki http://dev.bluszcz.net
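If facets are close enough, a query sketch like the following (host, handler, and parameters assumed) returns every distinct CollectionID without fetching any documents:

    http://localhost:8983/solr/select?q=*:*&rows=0&facet=true&facet.field=CollectionID&facet.limit=-1&facet.mincount=1

Since a facet carries only the field value, one workaround for getting the name too is to index a single combined field at indexing time (e.g. "42|Spring catalogue") and facet on that, splitting on the separator in the client; the combined field and separator are assumptions, not an existing Solr feature.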
Highlighted match snippets highlight non-matched words (such as 0.1 and 0.2)
Hi, I'm observing some strange highlighted words in field value snippets returned from Solr when matched term highlighting (http://wiki.apache.org/solr/HighlightingParameters) is enabled. In some cases, highlighted field value snippets contain highlighted words that are not matches: - this appears to be in addition to highlighting words that are matches - these non-match highlighted words are not pre-highlighted in the indexed content - I've determined these are non-matches by appending debugQuery=1 to the URL and examining the match detail information. I've so far observed this in relation to the strings 0, 0.1, 0.2 and 0.4 in indexed content. Real-life example when searching for [gas]: Relevant matched document result from Solr:
    <doc>
      <str name="description">EXAMPLE prepares an extensive range of traceable calibration gas standards with guaranteed relative uncertainties levels of 0.1% for certain species (PDF 676 KB).</str>
    </doc>
Related highlighted snippet:
    <lst name="7232">
      <arr name="description">
        <str>EXAMPLE prepares an extensive range of traceable calibration <em>gas</em> standards with guaranteed relative uncertainties levels of <em>0.1</em>% for certain species (PDF 676 KB).</str>
      </arr>
    </lst>
Note how the highlight snippet correctly highlights gas and incorrectly highlights 0.1. I've observed similar results for other searches where indexed content contains 0, 0.1, 0.2 and 0.4 and where these numbers are highlighted incorrectly. At this stage I'm trying to determine if this is due to a poor implementation on my behalf or whether this is a bug in Solr. I'd really like to know:
1. Has anyone else observed this behaviour?
2. Might this be a known issue with Solr (I've tried to find out but haven't had any luck)?
3. Can anyone test using something like http://solr/select?hl=true&hl.fl=*&q=(phrase+that+contains+0.1+in+response)&hl.fragsize=0 ?
Thanks, Jon Cram
Re: clustering component
The patch should also work with trunk, but I haven't verified it yet. I've just added a patch against solr trunk to https://issues.apache.org/jira/browse/SOLR-1804. S.
Show elevated Result Differently
I want to show an elevated result differently from the others. Is there any way to do this?
Re: SolrJ Response + JSON
I think you should just be able to add wt=json to the end of your query (or change whatever the existing wt parameter is in your URL). Mark On 28 Jul 2010, at 12:54 pm, MitchK wrote: Hello community, I need to transform SolrJ responses into JSON, after some computing on those results by another application has finished. I cannot do those computations on the Solr side. So, I really have to translate SolrJ's output into JSON. Any experiences how to do so without writing your own JSON writer? Thank you. - Mitch
SolrJ Response + JSON
Hello, Second try to send a mail to the mailing list... I need to translate SolrJ's response into a JSON response. I cannot query Solr directly, because I need to do some math with the response data before I show the results to the client. Any experiences how to translate SolrJ's response into JSON without writing your own JSON writer? Thank you. - Mitch
Re: SolrJ Response + JSON
On 28 Jul 2010, at 2:08 pm, MitchK wrote: Second try to send a mail to the mailing list... Your first attempt got through as well. Here's my original response: I think you should just be able to add wt=json to the end of your query (or change whatever the existing wt parameter is in your URL). Mark On 28 Jul 2010, at 12:54 pm, MitchK wrote: Hello community, I need to transform SolrJ responses into JSON, after some computing on those results by another application has finished. I cannot do those computations on the Solr side. So, I really have to translate SolrJ's output into JSON. Any experiences how to do so without writing your own JSON writer? Thank you. - Mitch
Re: SolrJ Response + JSON
Hi, I got a response to your e-mail in my box 30 minutes ago. Anyway, enable the JSONResponseWriter, if you haven't already, and query with wt=json. Can't get much easier. Cheers, On Wednesday 28 July 2010 15:08:26 MitchK wrote: Hello, Second try to send a mail to the mailing list... I need to translate SolrJ's response into a JSON response. I cannot query Solr directly, because I need to do some math with the response data before I show the results to the client. Any experiences how to translate SolrJ's response into JSON without writing your own JSON writer? Thank you. - Mitch Markus Jelsma - Technisch Architect - Buyways BV http://www.linkedin.com/in/markus17 050-8536620 / 06-50258350
Re: SolrJ Response + JSON
Thank you Markus, Mark. Seems to be a problem with Nabble, not with the mailing list. Sorry. I can create a JSON response when I query Solr directly. But I meant that I query Solr through a SolrJ client (CommonsHttpSolrServer). That means my queries look a little bit like this: http://wiki.apache.org/solr/Solrj#Reading_Data_from_Solr So the response is returned as a QueryResponse object, not as a JSON string. Or am I missing something here? Am 28.07.2010 15:15, schrieb Markus Jelsma: Hi, I got a response to your e-mail in my box 30 minutes ago. Anyway, enable the JSONResponseWriter, if you haven't already, and query with wt=json. Can't get much easier. Cheers, [rest of quoted message snipped - see above]
RE: Extracting PDF text/comment/callout/typewriter boxes with Solr CELL/Tika/PDFBox
Yesterday I did get this working with version 4.0 from trunk. I haven't fully tested it yet, but the content doesn't come through blank anymore, so that's good. Would it be more stable to stick with 1.4.1 and your patch to get to Tika 0.8, or to stick with the 4.0 trunk version? Best, Dave -----Original Message----- From: Tommaso Teofili [mailto:tommaso.teof...@gmail.com] Sent: Wednesday, July 28, 2010 3:31 AM To: solr-user@lucene.apache.org Subject: Re: Extracting PDF text/comment/callout/typewriter boxes with Solr CELL/Tika/PDFBox I attached a patch for Solr 1.4.1 release on https://issues.apache.org/jira/browse/SOLR-1902 that made things work for me. This strange behaviour for me was due to the fact that I copied the patched jars and war inside the dist directory but forgot to update the war inside the example/webapps directory (that is inside Jetty). Hope this helps. Tommaso [rest of quoted thread, including the stack trace, snipped - see the messages above]
Re: Extracting PDF text/comment/callout/typewriter boxes with Solr CELL/Tika/PDFBox
In my opinion, the 1.4.1 version with the patch is more stable, at least until 4.0 is released. 2010/7/28 David Thibault dthiba...@esperion.com Yesterday I did get this working with version 4.0 from trunk. I haven't fully tested it yet, but the content doesn't come through blank anymore, so that's good. Would it be more stable to stick with 1.4.1 and your patch to get to Tika 0.8, or to stick with the 4.0 trunk version? Best, Dave [rest of quoted thread snipped - see the messages above]
Re: SolrJ Response + JSON
You could use org.apache.solr.handler.JsonLoader. That one uses org.apache.noggit.JSONParser internally. I've used the JacksonParser with Spring. http://json.org/ lists parsers for different programming languages. Cheers, Chantal On Wed, 2010-07-28 at 15:08 +0200, MitchK wrote: Hello, Second try to send a mail to the mailing list... I need to translate SolrJ's response into a JSON response. I cannot query Solr directly, because I need to do some math with the response data before I show the results to the client. Any experiences how to translate SolrJ's response into JSON without writing your own JSON writer? Thank you. - Mitch
RE: Solr 3.1 and ExtractingRequestHandler resulting in blank content
If you don't store the content then you can't do highlighting, right? Also, don't you just have to switch the text field to say stored="true" in your schema to store the text? I don't understand why you're differentiating the behavior of ExtractingRequestHandler from the behavior of Solr in general. Doesn't ExtractingRequestHandler just pull the text out of whatever file you send it, and then the rest of the processing happens like any other Solr post? The bug I was experiencing was the same one that someone else brought up on the list yesterday in the emails entitled Extracting PDF text/comment/callout/typewriter boxes with Solr CELL/Tika/PDFBox. It ties back to this bug: https://issues.apache.org/jira/browse/SOLR-1902?page=com.atlassian.jira.plugin.ext.subversion%3Asubversion-commits-tabpanel I saw that email shortly after I sent this one to the list (it figures, doesn't it... =). I tried doing what they suggested on that bug report (patching Solr 1.4.x and using Tika 0.8-SNAPSHOT), but the patches failed when I applied them to my Solr 1.4.1. They have since added a patch for Solr 1.4.1; I haven't tried it yet. However, I did get it working using Solr 4.0 out of trunk (which also uses Tika 0.8 and updated PDFBox jars). I have yet to decide which will be more stable: Solr 4.0 or patched Solr 1.4.1, both with updated PDFBox and Tika jars. Best, Dave
-----Original Message----- From: Lance Norskog [mailto:goks...@gmail.com] Sent: Tuesday, July 27, 2010 8:09 PM To: solr-user@lucene.apache.org Subject: Re: Solr 3.1 and ExtractingRequestHandler resulting in blank content There are two different datasets that Solr (Lucene really) saves from a document: raw storage and the indexed terms. I don't think the ExtractingRequestHandler ever automatically stored the raw data; in fact Lucene works in Strings internally, not raw byte arrays (this is changing). It should be indexed - that means if you search 'text' with a word from the document, it will find those documents and bring back the file name. Your app then has to use the file name. Solr/Lucene is not intended as a general-purpose content store, only an index. The ERH wiki page doesn't quite say this. It describes what the ERH does rather than what it does not do :)
On Mon, Jul 26, 2010 at 12:00 PM, David Thibault dthiba...@esperion.com wrote: Hello all, I'm working on a project with Solr. I had 1.4.1 working OK using ExtractingRequestHandler except that it was crashing on some PDFs. I noticed that the Tika bundled with 1.4.1 was 0.4, which was kind of old. I decided to try updating to 0.7 as per the directions here: http://wiki.apache.org/solr/ExtractingRequestHandler but it was giving me errors (I forget what they were specifically). Then I tried downloading Solr 3.1 from the source repository, which I noticed came with Tika 0.7. I figured this would be an easier route to get working. Now I'm testing with 3.1 and 0.7 and I'm noticing my documents are going into Solr OK, but they all have blank content (no document text stored in Solr). I did see that the default "text" field is not stored. Changing that to stored="true" didn't help. Changing to fmap.content=attr_content&uprefix=attr_content didn't help either. I have attached all relevant info here. Please let me know if someone sees something I don't (it's entirely possible as I'm relatively new to Solr). Schema.xml: <?xml version="1.0" encoding="UTF-8" ?>
<schema name="example" version="1.3">
  <types>
    <fieldType name="string" class="solr.StrField" sortMissingLast="true" omitNorms="true"/>
    <fieldType name="boolean" class="solr.BoolField" sortMissingLast="true" omitNorms="true"/>
    <fieldtype name="binary" class="solr.BinaryField"/>
    <fieldType name="int" class="solr.TrieIntField" precisionStep="0" omitNorms="true" positionIncrementGap="0"/>
    <fieldType name="float" class="solr.TrieFloatField" precisionStep="0" omitNorms="true" positionIncrementGap="0"/>
    <fieldType name="long" class="solr.TrieLongField" precisionStep="0" omitNorms="true" positionIncrementGap="0"/>
    <fieldType name="double" class="solr.TrieDoubleField" precisionStep="0" omitNorms="true" positionIncrementGap="0"/>
    <fieldType name="tint" class="solr.TrieIntField" precisionStep="8" omitNorms="true" positionIncrementGap="0"/>
    <fieldType name="tfloat" class="solr.TrieFloatField" precisionStep="8" omitNorms="true" positionIncrementGap="0"/>
    <fieldType name="tlong" class="solr.TrieLongField" precisionStep="8" omitNorms="true" positionIncrementGap="0"/>
    <fieldType name="tdouble" class="solr.TrieDoubleField" precisionStep="8" omitNorms="true" positionIncrementGap="0"/>
    <fieldType name="date" class="solr.TrieDateField" omitNorms="true" precisionStep="0" positionIncrementGap="0"/>
    <fieldType name="tdate" class="solr.TrieDateField" omitNorms="true" precisionStep="6" positionIncrementGap="0"/>
    <fieldType name="pint" class="solr.IntField" omitNorms="true"/>
    <fieldType name="plong" class="solr.LongField" omitNorms="true"/>
    <fieldType name="pfloat" class="solr.FloatField"
RE: Extracting PDF text/comment/callout/typewriter boxes with Solr CELL/Tika/PDFBox
Thanks, I'll try that then. I kind of figured that'd be the answer, but after fighting with Solr ExtractingRequestHandler for 2 days I also just wanted to be done with it once it started working with 4.0... =) However, stability would be better in the long run. Best, Dave -----Original Message----- From: Alessandro Benedetti [mailto:benedetti.ale...@gmail.com] Sent: Wednesday, July 28, 2010 9:33 AM To: solr-user@lucene.apache.org Subject: Re: Extracting PDF text/comment/callout/typewriter boxes with Solr CELL/Tika/PDFBox In my opinion, the 1.4.1 version with the patch is more stable, at least until 4.0 is released. [rest of quoted thread snipped - see the messages above]
Re: logic required for newbie
You can index each of these fields separately: field1 - id, field2 - name, field3 - user_id, field4 - country, ..., field7 - landmark. While querying you can specify q=landmark9 and this will return results. And if you want only particular fields in the output, use the fl parameter in the query, like: http://localhost:8090/solr/select?indent=on&q=landmark9&fl=ID,user_id,country,landmark This will give you your desired solution. On Wed, Jul 28, 2010 at 12:23 PM, Jonty Rhods jonty.rh...@gmail.com wrote: Hi All, I am very new and learning Solr. [rest of the original question snipped - see above]
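The same query issued through SolrJ, for completeness (a sketch: the server URL matches the one above, and the field names are the thread's own):

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
    import org.apache.solr.client.solrj.response.QueryResponse;

    CommonsHttpSolrServer server = new CommonsHttpSolrServer("http://localhost:8090/solr");
    SolrQuery q = new SolrQuery("landmark9");
    q.set("indent", "on");                                // same params as the URL above
    q.setFields("ID", "user_id", "country", "landmark");  // the fl parameter
    QueryResponse rsp = server.query(q);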
Re: Extracting PDF text/comment/callout/typewriter boxes with Solr CELL/Tika/PDFBox
This was my same feeling :-) and so I went for the trunk to have things working quickly, but I also have to consider which one is the best version, since I am going to deploy it in the near future in an enterprise environment and choosing the best version is an important step. I am quite new to Solr, but I agree with Alessandro that using a slightly patched release should theoretically be more stable than the trunk, which gets many updates weekly (and daily). Cheers, Tommaso 2010/7/28 David Thibault dthiba...@esperion.com Thanks, I'll try that then. I kind of figured that'd be the answer, but after fighting with Solr ExtractingRequestHandler for 2 days I also just wanted to be done with it once it started working with 4.0... =) However, stability would be better in the long run. Best, Dave [rest of quoted thread snipped - see the messages above]
simple question from a newbie
Hi, I'm new to Solr and have a rather dumb question. I want to do a query that returns all the titles that start with a certain letter. For example I have these titles:
Results of in-mine research in support
Cancer Reports
State injury indicators report
Cancer Reports
Indexed dermal bibliography
Childhood agricultural-related injury report
Childhood agricultural injury prevention
I want the query to return:
Cancer Reports
Cancer Reports
Childhood agricultural-related injury report
Childhood agricultural injury prevention
I want something like a dc.title=c* type query. I know that I can facet by dc.title and then use the parameter facet.prefix=c, but it returns something like this:
Cancer Reports [2]
Childhood agricultural-related injury report [1]
Childhood agricultural injury prevention [1]
Vincent Vu Nguyen Division of Science Quality and Translation Office of the Associate Director for Science Centers for Disease Control and Prevention (CDC) 404-498-6154 Century Bldg 2400 Atlanta, GA 30329
Re: SolrJ Response + JSON
Thank you, Chantal. I have looked at this one: http://www.json.org/java/index.html This seems to be an easy-to-understand implementation. However, I am wondering how to determine whether a SolrDocument's field is multiValued or not. The JSONResponseWriter of Solr looks at the schema configuration; however, the client shouldn't do that. How did you solve that problem? Thanks for sharing ideas. - Mitch Am 28.07.2010 15:35, schrieb Chantal Ackermann: You could use org.apache.solr.handler.JsonLoader. That one uses org.apache.noggit.JSONParser internally. I've used the JacksonParser with Spring. http://json.org/ lists parsers for different programming languages. Cheers, Chantal [rest of quoted message snipped - see above]
display solr result in JSP
I am new to Solr. I just got the example XML files indexed and searchable by following the Solr tutorial. I wonder how I can get the search results displayed in a JSP. I really appreciate any suggestions you can give. Thanks so much, Xiaohui
Re: SolrJ Response + JSON
Hi Mitch On Wed, 2010-07-28 at 16:38 +0200, MitchK wrote: Thank you, Chantal. I have looked at this one: http://www.json.org/java/index.html This seems to be an easy-to-understand implementation. However, I am wondering how to determine whether a SolrDocument's field is multiValued or not. The JSONResponseWriter of Solr looks at the schema configuration, but the client shouldn't have to do that. How did you solve that problem? I didn't. I'm not recreating JSON from the SolrJ results. I would try to use the same classes that SolrJ uses, actually. (Writing that without having a further look at the code.) I would avoid recreating existing code as much as possible. About multivalued fields: you need instanceof checks, I guess. The field only contains a list if there really are multiple values. (That's what works for my ScriptTransformer.) Are you sure that you cannot change the Solr results at query time according to your needs? Maybe you should ask for that first (ask for X instead of Y...). Cheers, Chantal
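[For reference, a minimal sketch of the instanceof check Chantal describes, against the SolrJ API of that era; the field name "title" and the surrounding method are illustrative assumptions, not from the thread:]

    import java.util.Collection;
    import org.apache.solr.common.SolrDocument;

    public class FieldValues {
        // getFieldValue returns a single Object for a single-valued field
        // and a Collection when the document really holds multiple values
        static void printValues(SolrDocument doc) {
            Object value = doc.getFieldValue("title"); // "title" is a placeholder field name
            if (value instanceof Collection) {
                for (Object v : (Collection<?>) value) {
                    System.out.println(v); // multivalued case
                }
            } else if (value != null) {
                System.out.println(value); // single-valued case
            }
        }
    }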
Re: logic required for newbie
Hi, thanks for the reply. Actually the requirement is different (sorry if I was unable to clarify it in the first mail). Basically the following are the field names in the schema as well: 1. id 2. name 3. user_id 4. location 5. country 6. landmark1 7. landmark2 8. landmark3 9. landmark4 10. landmark5, which carry text, for example:

<id>1</id>
<name>some name</name>
<user_id>user_id</user_id>
<location>new york</location>
<country>USA</country>
<landmark1>5th avenue</landmark1>
<landmark2>ms departmental store</landmark2>
<landmark3>base bakery</landmark3>
<landmark4>piza hut</landmark4>
<landmark5>ford motor</landmark5>

Now if the user searches by piza then the expected result is:

<id>1</id>
<name>some name</name>
<user_id>user_id</user_id>
<location>new york</location>
<country>USA</country>
<landmark4>piza hut</landmark4>

It means I want to ignore all the other landmarks which do not match. With a filter we can filter the fields, but here I don't know the field name because it depends on the text match. Is there any other solution? I am ready to change the schema or the logic. I am using solrj. Please help me, I am stuck here. with regards On Wed, Jul 28, 2010 at 7:22 PM, rajini maski rajinima...@gmail.com wrote: you can index each of these fields separately... field1- Id field2- name field3- user_id field4- country. field7- landmark While querying you can specify q=landmark9 This will return you results. And if you want only particular fields in the output, use the fl parameter in the query, like http://localhost:8090/solr/select?indent=on&q=landmark9&fl=ID,user_id,country,landmark This will give your desired solution. On Wed, Jul 28, 2010 at 12:23 PM, Jonty Rhods jonty.rh...@gmail.com wrote: Hi All, I am very new and learning solr. I have 10 columns like the following in a table: 1. id 2. name 3. user_id 4. location 5. country 6. landmark1 7. landmark2 8. landmark3 9. landmark4 10. landmark5 When the user searches for a landmark then I want to return only the one landmark which matches. The rest of the landmarks should be ignored. Expected result like the following: if the user searches by landmark2... 1. id 2. name 3. user_id 4. location 5. country 7. landmark2 or if searching by landmark9: 1. id 2. name 3. user_id 4. location 5. country 9. landmark9 Please help me to design the schema for this kind of requirement. thanks with regards
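[One possible way around not knowing the field name up front is to let the highlighter report which landmark field matched. A rough SolrJ 1.4-era sketch, under the assumption that the landmark fields are stored and queried explicitly; the server URL and unique key value are placeholders:]

    import java.util.List;
    import java.util.Map;
    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
    import org.apache.solr.client.solrj.response.QueryResponse;

    public class WhichLandmark {
        public static void main(String[] args) throws Exception {
            CommonsHttpSolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");
            // search all five landmark fields explicitly
            SolrQuery q = new SolrQuery(
                "landmark1:piza OR landmark2:piza OR landmark3:piza OR landmark4:piza OR landmark5:piza");
            q.setHighlight(true);
            for (int i = 1; i <= 5; i++) {
                q.addHighlightField("landmark" + i);
            }
            QueryResponse rsp = server.query(q);
            // highlighting results are keyed by unique id, then by field name;
            // only the fields that actually matched appear in the inner map
            Map<String, Map<String, List<String>>> hl = rsp.getHighlighting();
            Map<String, List<String>> matched = hl.get("1"); // "1" = the document's unique key
            System.out.println(matched.keySet()); // e.g. [landmark4]
        }
    }

[The caveat is that highlighting only works on stored fields, so this avoids a schema change at the cost of a little response parsing.]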
Re: display solr result in JSP
Hi, it is very simple to display a value in a JSP. If you are using solrj then simply store the value in a bean from your java class and you can display it. You can do the same thing in a servlet too: get the solr server response and return it in a bean, or display it directly (in the servlet). Hope you will be able to do it. regards Ranveer On Wednesday 28 July 2010 08:11 PM, Ma, Xiaohui (NIH/NLM/LHC) [C] wrote: I am new to Solr. I just got the example XML files indexed and searchable by following the Solr tutorial. I wonder how I can get the search results displayed in a JSP. I really appreciate any suggestions you can give. Thanks so much, Xiaohui
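[To make that concrete, a minimal sketch of the servlet-plus-bean handoff with SolrJ 1.4; the Solr URL, the "title" field, and results.jsp are assumptions for illustration, not from the thread:]

    import java.io.IOException;
    import java.util.ArrayList;
    import java.util.List;
    import javax.servlet.ServletException;
    import javax.servlet.http.*;
    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
    import org.apache.solr.common.SolrDocument;

    public class SearchServlet extends HttpServlet {
        protected void doGet(HttpServletRequest req, HttpServletResponse resp)
                throws ServletException, IOException {
            try {
                CommonsHttpSolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");
                List<String> titles = new ArrayList<String>();
                for (SolrDocument doc : server.query(new SolrQuery(req.getParameter("q"))).getResults()) {
                    titles.add((String) doc.getFieldValue("title")); // "title" is a placeholder field
                }
                req.setAttribute("titles", titles);
                // the JSP then loops with JSTL: <c:forEach var="t" items="${titles}">${t}</c:forEach>
                req.getRequestDispatcher("/results.jsp").forward(req, resp);
            } catch (Exception e) {
                throw new ServletException(e);
            }
        }
    }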
Re: simple question from a newbie
I think you are using a wild-card search, or should use a wild-card search. But first of all, please provide the schema and configuration file for more details. regards Ranveer On Wednesday 28 July 2010 07:51 PM, Nguyen, Vincent (CDC/OSELS/NCPHI) (CTR) wrote: Hi, I'm new to Solr and have a rather dumb question. I want to do a query that returns all the Titles that start with a certain letter. For example I have these titles: Results of in-mine research in support / Cancer Reports / State injury indicators report / Cancer Reports / Indexed dermal bibliography / Childhood agricultural-related injury report / Childhood agricultural injury prevention. I want the query to return: Cancer Reports / Cancer Reports / Childhood agricultural-related injury report / Childhood agricultural injury prevention. I want something like a dc.title=c* type query. I know that I can facet by dc.title and then use the parameter facet.prefix=c but it returns something like this: Cancer Reports [2] Childhood agricultural-related injury report [1] Childhood agricultural injury prevention [1] Vincent Vu Nguyen Division of Science Quality and Translation Office of the Associate Director for Science Centers for Disease Control and Prevention (CDC) 404-498-6154 Century Bldg 2400 Atlanta, GA 30329
RE: Indexing Problem: Where's my data?
Thanks - but my schema.xml is not recognizing field names specified in the data-config.xml. For example - and I just tested this now - if I have in my data-config.xml:

<field column="product_id" name="pid" />

And then in my schema.xml:

<field name="pid" type="int" indexed="true" stored="true" required="true" />

Then no documents are processed (e.g. I get rows queried, but <str name="Total Documents Processed">0</str> in the data handler UI). But if I change that to:

<field name="product_id" type="int" indexed="true" stored="true" required="true" />

... now documents are processed (e.g. <str name="Total Documents Processed">313</str>). Which, quite frankly, confuses me. I may be doing something else wrong (I changed my SQL as well, so I'm getting another failure, but I think it's separate to this one). -Original Message- From: Lance Norskog [mailto:goks...@gmail.com] Sent: Tuesday, July 27, 2010 8:25 PM To: solr-user@lucene.apache.org Subject: Re: Indexing Problem: Where's my data? Solr respects case for field names. Database fields are supplied in lower-case, so it should be 'attribute_name' and 'string_value'. Also 'product_id', etc. It is easier if you carefully emulate every detail in the examples, for example lower-case names. On Tue, Jul 27, 2010 at 2:59 PM, kenf_nc ken.fos...@realestate.com wrote: for STRING_VALUE, I assume there is a property in the 'select *' results called string_value? if so I'm not sure why it wouldn't work. If not, then that's why, it doesn't have anything to put there. For ATTRIBUTE_NAME, is it possibly a case issue? you called it 'Attribute_Name' in your query, but ATTRIBUTE_NAME in your schema... just something to check I guess. Also, not sure why you are using name= in your fields, for example, <field column="PARENT_FAMILY" name="Parent Family" /> I thought 'column' was the source field name and 'name' was supposed to be the schema field name, and if not there it would assume the 'column' name. You don't have a schema field called Parent Family so it looks like it's defaulting to the column name too, which is lucky for you I suppose. But you may want to either remove 'name=' or make it match the schema. (And I may be completely wrong on this, it's been a while since I got DIH going.) -- Lance Norskog goks...@gmail.com
Re: Spellchecking and frequency
I therefore wrote an implementation of SolrSpellChecker that wraps jazzy, the java aspell library. I also extended the SpellCheckComponent to take the matrix of suggested words and query the corpus to find the first combination of suggestions which returned a match. This works well for my use case, where term frequency is irrelevant to spelling or scoring. This is interesting to me. I also have not been that happy with the standard solr spellcheck. In addition to possibly filing a JIRA for a future fix to Solr itself, another option would be to make your 'alternate' SpellCheck component available as a separate .jar, so anyone could use it just by installing it and specifying it in their solrconfig.xml. I would encourage you to consider that, not as a replacement for suggesting a patch to Solr itself, but so people can use your improved spellchecker immediately, without waiting for possible Solr patches. Jonathan
Re: Is there a cache for a query?
As far as I know all searches get cached at least for some time. I am not sure about field collapse results being cached. - Moazzam http://moazzam-khan.com On Mon, Jul 26, 2010 at 9:48 PM, Li Li fancye...@gmail.com wrote: I want a cache to cache the whole result of a query (all steps including collapse, highlight and facet). I read http://wiki.apache.org/solr/SolrCaching, but can't find a global cache. Maybe I can use an external cache to store key-value pairs. Is there any one in solr?
Re: SolrJ Response + JSON
Hi Chantal, thank you for the feedback. I did not see the wood for the trees! The SolrDocument javadoc (http://lucene.apache.org/solr/api/org/apache/solr/common/SolrDocument.html) says the following: getFieldValue(String name) - Get the value or collection of values for a given field. The magical word here is that little or :-). I will try that tomorrow and give you feedback! Are you sure that you cannot change the SOLR results at query time according to your needs? Unfortunately, it is not possible in this case. Kind regards, Mitch Am 28.07.2010 16:49, schrieb Chantal Ackermann: Hi Mitch On Wed, 2010-07-28 at 16:38 +0200, MitchK wrote: Thank you, Chantal. I have looked at this one: http://www.json.org/java/index.html This seems to be an easy-to-understand implementation. However, I am wondering how to determine whether a SolrDocument's field is multiValued or not. The JSONResponseWriter of Solr looks at the schema configuration, but the client shouldn't have to do that. How did you solve that problem? I didn't. I'm not recreating JSON from the SolrJ results. I would try to use the same classes that SolrJ uses, actually. (Writing that without having a further look at the code.) I would avoid recreating existing code as much as possible. About multivalued fields: you need instanceof checks, I guess. The field only contains a list if there really are multiple values. (That's what works for my ScriptTransformer.) Are you sure that you cannot change the Solr results at query time according to your needs? Maybe you should ask for that first (ask for X instead of Y...). Cheers, Chantal
RE: simple question from a newbie
I think I got it to work. If I do a wildcard search using the dc3.title field it seems to work fine (dc3.title:c*). The dc.title:c* search returns every title that has a word in it that starts with 'c', which isn't exactly what I wanted. I'm guessing that's because of the type=caseInsensitiveSort. Well, here is my schema for reference. Thanks for your help.

<schema name="example" version="1.1">
  <types>
    <fieldType name="string" class="solr.StrField" sortMissingLast="true" omitNorms="true"/>
    <!-- boolean type: true or false -->
    <fieldType name="boolean" class="solr.BoolField" sortMissingLast="true" omitNorms="true"/>
    <fieldType name="integer" class="solr.IntField" omitNorms="true"/>
    <fieldType name="long" class="solr.LongField" omitNorms="true"/>
    <fieldType name="float" class="solr.FloatField" omitNorms="true"/>
    <fieldType name="double" class="solr.DoubleField" omitNorms="true"/>
    <fieldType name="sint" class="solr.SortableIntField" sortMissingLast="true" omitNorms="true"/>
    <fieldType name="slong" class="solr.SortableLongField" sortMissingLast="true" omitNorms="true"/>
    <fieldType name="sfloat" class="solr.SortableFloatField" sortMissingLast="true" omitNorms="true"/>
    <fieldType name="sdouble" class="solr.SortableDoubleField" sortMissingLast="true" omitNorms="true"/>
    <fieldType name="date" class="solr.DateField" sortMissingLast="true" omitNorms="true"/>
    <fieldType name="text_ws" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      </analyzer>
    </fieldType>
    <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
      <analyzer type="index">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
        <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
        <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
      </analyzer>
    </fieldType>
    <fieldType name="textTight" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="false"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
        <filter class="solr.WordDelimiterFilterFactory" generateWordParts="0" generateNumberParts="0" catenateWords="1" catenateNumbers="1" catenateAll="0"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
      </analyzer>
    </fieldType>
    <fieldType name="caseInsensitiveSort" class="solr.TextField" sortMissingLast="true" omitNorms="true">
      <analyzer>
        <tokenizer class="solr.KeywordTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.TrimFilterFactory"/>
      </analyzer>
    </fieldType>
    <fieldtype name="ignored" stored="false" indexed="false" class="solr.StrField"/>
  </types>
  <fields>
    <!-- Fedora specific fields -->
    <field name="PID" type="string" indexed="true" stored="true"/>
    <field name="fgs.state" type="string" indexed="true" stored="true"/>
    <field name="fgs.label" type="text" indexed="true" stored="true"/>
    <field name="fgs.ownerId" type="string" indexed="true" stored="true"/>
    <field name="fgs.createdDate" type="date" indexed="true" stored="true"/>
    <field name="fgs.lastModifiedDate" type="date" indexed="true" stored="true"/>
    <field name="fgs.contentModel" type="string" indexed="true" stored="true"/>
    <field name="fgs.type" type="string" indexed="true" stored="true" multiValued="true"/>
    <!-- DC Fields -->
    <field name="dc.contributor" type="text" indexed="true" stored="true" multiValued="true"/>
    <field name="dc.coverage" type="text" indexed="true" stored="true" multiValued="true"/>
    <field name="dc.creator" type="text" indexed="true" stored="true" multiValued="true"/>
    <field name="dc.date" type="text" indexed="true" stored="true" multiValued="true"/>
    <field name="dc.description" type="text" indexed="true" stored="true" multiValued="true"/>
    <field name="dc.format" type="text" indexed="true" stored="true" multiValued="true"/>
    <field name="dc.identifier" type="text" indexed="true" stored="true" multiValued="true"/>
    <field name="dc.language" type="text" indexed="true" stored="true" multiValued="true"/>
    <field name="dc.publisher" type="text" indexed="true" stored="true" multiValued="true"/>
    <field name="dc.relation" type="text" indexed="true"
Solr 1.4.1 field collapse
Hi guys, I read somewhere that Solr 1.4.1 has field collapse support by default (without patching it) but I haven't been able to confirm it. Is this true? - Moazzam
Re: slave index is bigger than master index
Well I do have disk limitations too, and that's why I think the slave nodes died when replicating data from the master node (as it was just adding on top of the existing index files). :: What do you mean here? Optimizing is too CPU expensive? What I meant by "avoid playing around with slave nodes" is avoiding anything (including optimizing on slave nodes) that may affect the live search performance, unless I have no option. :: Do you mean increase to double size? Yes, as it did before on replication. But I didn't get a chance to run the indexer yesterday.
Re: slave index is bigger than master index
In solrconfig.xml, these two lines control that. Maybe they need to be increased.

<str name="httpConnTimeout">5000</str>
<str name="httpReadTimeout">1</str>

Where do I add those in solrconfig? These lines don't seem to be present in the example solrconfig file...
How do NOT queries work?
I wonder how NOT queries work. Is it a pass over the result set that filters out documents with the NOT property, or something like that? Also, has anybody done performance checks on NOT queries? I want to know whether there is a significant performance degradation when you have a NOT in a query. Thanks... //kaan
RE: display solr result in JSP
Thanks so much for your reply. I don't have much experience with JSP. I found a tag library, and am trying to use <xsltlib:apply xml="<%= url.getContent().toString() %>" xsl="/xsl/result.xsl"/>. Unfortunately I didn't get it to work. Would you please give me more information? I really appreciate your help! Thanks, Xiaohui -Original Message- From: Ranveer [mailto:ranveer.s...@gmail.com] Sent: Wednesday, July 28, 2010 11:27 AM To: solr-user@lucene.apache.org Subject: Re: display solr result in JSP Hi, it is very simple to display a value in a JSP. If you are using solrj then simply store the value in a bean from your java class and you can display it. You can do the same thing in a servlet too: get the solr server response and return it in a bean, or display it directly (in the servlet). Hope you will be able to do it. regards Ranveer On Wednesday 28 July 2010 08:11 PM, Ma, Xiaohui (NIH/NLM/LHC) [C] wrote: I am new to Solr. I just got the example XML files indexed and searchable by following the Solr tutorial. I wonder how I can get the search results displayed in a JSP. I really appreciate any suggestions you can give. Thanks so much, Xiaohui
Re: Total number of terms in an index?
Tom, The total number of terms... Ah well, not a big deal, however yes the flex branch does expose this so we can show this in Solr at some point, hopefully outside of Solr's Luke impl. On Tue, Jul 27, 2010 at 9:27 AM, Burton-West, Tom tburt...@umich.edu wrote: Hi Jason, Are you looking for the total number of unique terms or the total number of term occurrences? CheckIndex reports both, but does a bunch of other work so is probably not the fastest. If you are looking for the total number of term occurrences, you might look at contrib/org/apache/lucene/misc/HighFreqTerms.java. If you are just looking for the total number of unique terms, I wonder if there is some low level API that would allow you to just access the in-memory representation of the tii file and then multiply the number of terms in it by your indexDivisor (default 128). I haven't dug into the code so I don't actually know how the tii file gets loaded into a data structure in memory. If there is api access, it seems like this might be the quickest way to get the number of unique terms. (Of course you would have to do this for each segment.) Tom -Original Message- From: Chris Hostetter [mailto:hossman_luc...@fucit.org] Sent: Monday, July 26, 2010 8:39 PM To: solr-user@lucene.apache.org Subject: Re: Total number of terms in an index? : Sorry, like the subject, I mean the total number of terms. it's not stored anywhere, so the only way to fetch it is to actually iterate all of the terms and count them (that's why LukeRequestHandler is so slow to compute this particular value). If i remember right, someone mentioned at one point that flex would let you store data about stuff like this in your index as part of the segment writing, but frankly i'm still not sure how that will help -- because unless your index is fully optimized, you still have to iterate the terms in each segment to 'de-dup' them. -Hoss
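[For reference, the brute-force count looks like this with the Lucene 3.x-era API; it pays exactly the iterate-everything cost Hoss describes. The index path is a placeholder:]

    import java.io.File;
    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.index.TermEnum;
    import org.apache.lucene.store.FSDirectory;

    public class TermCounter {
        public static void main(String[] args) throws Exception {
            // open read-only; "/path/to/index" is a placeholder
            IndexReader reader = IndexReader.open(FSDirectory.open(new File("/path/to/index")), true);
            TermEnum terms = reader.terms();
            long count = 0;
            while (terms.next()) {
                count++; // visits every unique (field, text) pair once, merged across segments
            }
            terms.close();
            reader.close();
            System.out.println("unique terms: " + count);
        }
    }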
RE: Extracting PDF text/comment/callout/typewriter boxes with Solr CELL/Tika/PDFBox
Tommaso, I used your patch and tried it with the 1.4.1 solr.war from a fresh 1.4.1 distribution, and it still gave me that NoSuchMethodError. However, when I tried it with the newly-patched-and-compiled apache-solr-1.4.2-dev.war file it works. I think I tried that before and it didn't work. In any case, thanks for the patch and the advice. Looks like now it's working for me. Best, Dave -Original Message- From: Tommaso Teofili [mailto:tommaso.teof...@gmail.com] Sent: Wednesday, July 28, 2010 3:31 AM To: solr-user@lucene.apache.org Subject: Re: Extracting PDF text/comment/callout/typewriter boxes with Solr CELL/Tika/PDFBox I attached a patch for the Solr 1.4.1 release on https://issues.apache.org/jira/browse/SOLR-1902 that made things work for me. This strange behaviour for me was due to the fact that I copied the patched jars and war inside the dist directory but forgot to update the war inside the example/webapps directory (that is inside Jetty). Hope this helps. Tommaso 2010/7/27 David Thibault dthiba...@esperion.com Alessandro and all, I was having the same issue with Tika crashing on certain PDFs. I also noticed the bug where no content was extracted after upgrading Tika. When I went to the SOLR issue you link to below, I applied all the patches, downloaded the Tika 0.8 jars, restarted tomcat, posted a file via curl, and got the following error: SEVERE: java.lang.NoSuchMethodError: org.apache.solr.core.SolrResourceLoader.getClassLoader()Ljava/lang/ClassLoader; at org.apache.solr.handler.extraction.ExtractingRequestHandler.inform(ExtractingRequestHandler.java:93) at org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.getWrappedHandler(RequestHandlers.java:244) at org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:231) at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316) at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241) at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235) at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206) at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233) at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191) at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127) at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102) at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109) at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:298) at org.apache.coyote.http11.Http11AprProcessor.process(Http11AprProcessor.java:859) at org.apache.coyote.http11.Http11AprProtocol$Http11ConnectionHandler.process(Http11AprProtocol.java:579) at org.apache.tomcat.util.net.AprEndpoint$Worker.run(AprEndpoint.java:1555) at java.lang.Thread.run(Thread.java:619) This is really weird because I DID apply the SolrResourceLoader patch that adds the getClassLoader method. I even verified by opening up the JARs and looking at the class file in Eclipse... I can see the SolrResourceLoader.getClassLoader() method. Does anyone know why it can't find the method? After patching the source I did ant clean dist in the base directory of the Solr source tree and everything looked like it compiles (BUILD SUCCESSFUL). Then I copied all the jars from dist/ and all the library dependencies from contrib/extraction/lib/ into my SOLR_HOME. Restarting tomcat, everything in the logs looked good. I'm stumped. It would be very nice to have a Solr implementation using the newest versions of PDFBox and Tika and actually have content being extracted... =) Best, Dave -Original Message- From: Alessandro Benedetti [mailto:benedetti.ale...@gmail.com] Sent: Tuesday, July 27, 2010 6:09 AM To: solr-user@lucene.apache.org Subject: Re: Extracting PDF text/comment/callout/typewriter boxes with Solr CELL/Tika/PDFBox Hi Jon, During the last days we faced the same problem. Using Solr 1.4.1 classic (tika 0.4), from some pdf files we can't extract content and from others, Solr throws an exception during the Indexing Process. You must: Update the tika libraries (in /contrib/extraction/lib) with the tika-core 0.8 snapshot and tika-parsers 0.8. Update PDFBox and all related libraries. After that you have to patch Solr 1.4.1 following this patch: https://issues.apache.org/jira/browse/SOLR-1902?page=com.atlassian.jira.plugin.ext.subversion%3Asubversion-commits-tabpanel This is the first way to solve the problem. Using Solr 1.4.1 (with the tika 0.8 snapshot and pdfbox updated) no exception is thrown during the Indexing process, but no content is extracted. Using the latest Solr trunk (with the tika 0.8 snapshot and pdfbox updated)
Re: Total number of terms in an index?
At first I was thinking the TermsComponent might give you this, but oddly it seems not to. http://wiki.apache.org/solr/TermsComponent
RE: How to 'filter' facet results
ManBearPig is still a threat. -Kallin Nagelberg -Original Message- From: Jonathan Rochkind [mailto:rochk...@jhu.edu] Sent: Tuesday, July 27, 2010 7:44 PM To: solr-user@lucene.apache.org Subject: RE: How to 'filter' facet results Is there a way to tell Solr to only return a specific set of facet values? I feel like the facet query must be able to do this, but I'm not really understanding the facet query. In my specific case, I'd like to only see facet values for the same values I pass in as query filters, i.e. if I run this query: fq=keyword:man OR keyword:bear OR keyword:pig facet=on facet.field:keyword then I only want it to return the facet counts for man, bear, and pig. The resulting docs might have a number of different values for keyword, in addition For the general case of filtering facet values, I've wanted to do that too in more complex situations, and there is no good way I've found. For your very specific use case though, yeah, you can do it with facet.query. Leave out the facet.field, but instead: facet.query=keyword:man facet.query=keyword:bear facet.query=keyword:pig You'll get three facet.query results in the response, one each for man, bear, pig. Solr behind the scenes will kind of do three separate 'sub-queries', one for each facet.query, but since the query itself should be cached, you shouldn't notice much difference. Especially if you have a warming query that facets on the keyword field (I'm never entirely sure when caches created by warming queries will be used by a facet.query, or if it depends on the facet method in use, but it can't hurt). Jonathan
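[A quick SolrJ sketch of Jonathan's facet.query suggestion; the field and values come from the question, the server handle is an assumption:]

    import java.util.Map;
    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.SolrServer;
    import org.apache.solr.client.solrj.response.QueryResponse;

    public class FacetQueries {
        static void showCounts(SolrServer server) throws Exception {
            SolrQuery q = new SolrQuery("*:*");
            q.addFilterQuery("keyword:man OR keyword:bear OR keyword:pig");
            q.setFacet(true);
            // one facet.query per value we care about, instead of facet.field=keyword
            q.addFacetQuery("keyword:man");
            q.addFacetQuery("keyword:bear");
            q.addFacetQuery("keyword:pig");
            QueryResponse rsp = server.query(q);
            // counts come back keyed by the literal facet.query strings
            Map<String, Integer> counts = rsp.getFacetQuery();
            System.out.println(counts.get("keyword:bear"));
        }
    }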
Problem with field collapsing
Hi All, Whenever I use field collapsing, the numFound attribute contains exactly as many rows as I put in the rows parameter, instead of the total number of documents that matched the query. Is there a way to rectify this? Thanks, Moazzam
Re: SolrCore has a large number of SolrIndexSearchers retained in infoRegistry
Hi, It didn't seem like it improved the situation. The same exception stack traces are found. I have explicitly defined the index readers to be reopened by specifying that in the solrconfig.xml. The exception occurs when the remote cores are being searched. I am attaching the exceptions in a text file for reference. http://lucene.472066.n3.nabble.com/file/n1002926/solrexceptions.txt solrexceptions.txt Couple of notes: 1. QueryComponent#process is requesting a SolrIndexSearcher twice by calling SolrQueryRequest#getSearcher(), but it is never being closed. I see several instances where getSearcher is being called but is never being properly closed - performing a quick call hierarchy of SolrQueryRequest#getSearcher() and SolrQueryRequest#close() will illustrate this point. 2. It may be the case that this exception was never encountered because typical deployments are not heavily using Distributed Search across multiple Solr Cores, and/or it's a small memory leak and so was never noticed.
Re: Using Solr to perform range queries in Dspace
: I'm trying to use dspace to search across a range of index created and stored
: using Dsindexer.java class. I have seen where Solr can be use to perform

I've never heard of Dsindexer.java, but since this is the first result google returns... http://scm.dspace.org/trac/dspace/browser/trunk/dspace/src/org/dspace/search/DSIndexer.java?rev=970 ...i'm going to assume that's what you are talking about.

: numerical range queries using either TrieIntField,
: TrieDoubleField, TrieLongField, etc.. classes defined in Solr's api or
: SortableIntField.java, SortableLongField, SortableDoubleField.java. I would
: like to know how to implement these classes in Dspace so that I can be able
: to perform numerical range queries. Any help would be greatly appreciated.

i *think* what you are asking is how to use Solr to search the numeric fields in an existing Lucene index (created by the above mentioned java code) -- but i may be wrong (your choice of wording "implement these classes in Dspace" is very perplexing to me). If i'm understanding correctly, then the key to the issue is all in how the numeric values are indexed as lucene Fields in your existing code -- but in the copy of DSIndexer.java i found, there are no numeric fields, just Text fields. If you are indexing the numeric values as simple strings, then in Solr you would want to refer to them using the legacy IntField, FloatField, etc... these assume simple string representations, and will sort properly using the numeric FieldCache -- BUT! -- range queries won't work. Range queries require that the indexed terms be in a logical ordering, which isn't true for simple string representations of numbers (100 is lexicographically before 2). If i actually have your question backwards -- if what you are asking is how to modify the DSIndexer.java class to index fields in the same way as TrieDoubleField, TrieLongField, SortableIntField, etc... -- then the answer is much simpler: all FieldTypes in Solr implement toInternal and toExternal methods ... toInternal is what you need to call to encode your simple numeric values into the format to be indexed -- toExternal (or toObject) is how you can get the original value back out. For the Trie fields, these actually just use some utilities in Lucene, so you could look at the code and use the same utilities w/o ever needing any Solr source code. If i've completely misunderstood your question, please post a followup explaining in more detail what it is you are trying to accomplish. -Hoss
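[As a sketch of the Lucene-side utilities Hoss alludes to (Lucene 2.9+): NumericField writes trie-encoded terms at index time, NumericRangeQuery searches them. The field name and values here are illustrative, not from the thread:]

    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;
    import org.apache.lucene.document.NumericField;
    import org.apache.lucene.search.NumericRangeQuery;

    public class TrieExample {
        static Document makeDoc(int price) {
            Document doc = new Document();
            // precisionStep 4: indexes extra lower-precision terms so range queries stay cheap
            doc.add(new NumericField("price", 4, Field.Store.YES, true).setIntValue(price));
            return doc;
        }

        static NumericRangeQuery<Integer> priceBetween(int lo, int hi) {
            // both endpoints inclusive; precisionStep must match the one used at index time
            return NumericRangeQuery.newIntRange("price", 4, lo, hi, true, true);
        }
    }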
Know which terms are in a document
I would like to search against my index, and then *know* which of a set of given terms were found in each document. For example, let's say I want to show articles with the word pizza or cake in them, but would like to be able to say which of those two was found. I might use this to handle the article differently if it is about pizza, or if it is about cake. I understand I can do multiple queries but I would like to avoid that. One thought I had was to use a highlighter and only return a fragment with the highlighted word, but I'm not sure how to do this with the various highlighting options. Is there a way? Thanks.
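[A rough sketch of that highlighter idea with SolrJ, under two assumptions: the text lives in a stored field called "body" and the unique key field is "id" (both placeholders):]

    import java.util.List;
    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.SolrServer;
    import org.apache.solr.client.solrj.response.QueryResponse;
    import org.apache.solr.common.SolrDocument;

    public class PizzaOrCake {
        static void classify(SolrServer server) throws Exception {
            SolrQuery q = new SolrQuery("body:pizza OR body:cake");
            q.setHighlight(true);
            q.addHighlightField("body");
            q.setHighlightSimplePre("<em>");
            q.setHighlightSimplePost("</em>");
            QueryResponse rsp = server.query(q);
            for (SolrDocument doc : rsp.getResults()) {
                String id = (String) doc.getFieldValue("id");
                List<String> frags = rsp.getHighlighting().get(id).get("body");
                // each fragment wraps the matched word in <em>...</em>, so checking for
                // "<em>pizza</em>" vs "<em>cake</em>" tells the two terms apart per document
                System.out.println(id + " -> " + frags);
            }
        }
    }

[Note that stemming or synonyms on the field will also show up highlighted, so the string check should be as loose as the analysis chain.]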
Re: Show elevated Result Differently
Please expand on what this means, it's quite vague. You might review: http://wiki.apache.org/solr/UsingMailingLists Best Erick On Wed, Jul 28, 2010 at 8:43 AM, Vishal.Arora vis...@value-one.com wrote: I want to show the elevated results differently from the others. Is there any way to do this?
Re: simple question from a newbie
What is the query you submit (don't forget &debugQuery=on)? In particular, what field are you sorting on? But yes, if you're searching on a tokenized field, you'll get matches on all tokens in that field, which are probably single words. And no matter how you sort, you're still getting documents where the whole title doesn't start with c. What happens if you search on your dc3.title instead? It uses the keyword tokenizer, which tokenizes the entire title as a single token. Sort by that one too. Best Erick On Wed, Jul 28, 2010 at 12:26 PM, Nguyen, Vincent (CDC/OSELS/NCPHI) (CTR) v...@cdc.gov wrote: I think I got it to work. If I do a wildcard search using the dc3.title field it seems to work fine (dc3.title:c*). The dc.title:c* search returns every title that has a word in it that starts with 'c', which isn't exactly what I wanted. I'm guessing that's because of the type=caseInsensitiveSort. Well, here is my schema for reference. Thanks for your help.
Re: Solr using 1500 threads - is that normal?
1,500 threads seems extreme by any standards so there is something happening in your install. Even with appservers for web apps, typically 100 would be a fair # of threads. On 7/28/10, Christos Constantinou ch...@simpleweb.co.uk wrote: Hi, Solr seems to be crashing after a JVM exception that new threads cannot be created. I am writing in hope of advice from someone that has experienced this before. The exception that is causing the problem is: Exception in thread btpool0-5 java.lang.OutOfMemoryError: unable to create new native thread The memory that is allocated to Solr is 3072MB, which should be enough memory for a ~6GB data set. The documents are not big either, they have around 10 fields of which only one stores large text ranging between 1k-50k. The top command at the time of the crash shows Solr using around 1500 threads, which I assume is not normal. Could it be that the threads are crashing one by one and new ones are created to cope with the queries? In the log file, right after the exception, there are several thousand commits before the server stalls completely. Normally, the log file would report 20-30 document existence queries per second, then 1 commit per 5-30 seconds, and some more infrequent faceted document searches on the data. However after the exception, there are only commits until the end of the log file. I am wondering if anyone has experienced this before or if it is some sort of known bug from Solr 1.4? Is there a way to increase the details of the exception in the logfile? I am attaching the output of a grep Exception command on the logfile.

Jul 28, 2010 8:19:31 AM org.apache.solr.common.SolrException log SEVERE: org.apache.solr.common.SolrException: Error opening new searcher. exceeded limit of maxWarmingSearchers=2, try again later.
Jul 28, 2010 8:19:31 AM org.apache.solr.common.SolrException log SEVERE: org.apache.solr.common.SolrException: Error opening new searcher. exceeded limit of maxWarmingSearchers=2, try again later.
Jul 28, 2010 8:19:31 AM org.apache.solr.common.SolrException log SEVERE: org.apache.solr.common.SolrException: Error opening new searcher. exceeded limit of maxWarmingSearchers=2, try again later.
Jul 28, 2010 8:19:32 AM org.apache.solr.common.SolrException log SEVERE: org.apache.solr.common.SolrException: Error opening new searcher. exceeded limit of maxWarmingSearchers=2, try again later.
Jul 28, 2010 8:20:18 AM org.apache.solr.common.SolrException log SEVERE: org.apache.solr.common.SolrException: Error opening new searcher. exceeded limit of maxWarmingSearchers=2, try again later.
Jul 28, 2010 8:20:48 AM org.apache.solr.common.SolrException log SEVERE: org.apache.solr.common.SolrException: Error opening new searcher. exceeded limit of maxWarmingSearchers=2, try again later.
Jul 28, 2010 8:22:43 AM org.apache.solr.common.SolrException log SEVERE: org.apache.solr.common.SolrException: Error opening new searcher. exceeded limit of maxWarmingSearchers=2, try again later.
Jul 28, 2010 8:27:53 AM org.apache.solr.common.SolrException log SEVERE: org.apache.solr.common.SolrException: Error opening new searcher. exceeded limit of maxWarmingSearchers=2, try again later.
Jul 28, 2010 8:27:53 AM org.apache.solr.common.SolrException log SEVERE: org.apache.solr.common.SolrException: Error opening new searcher. exceeded limit of maxWarmingSearchers=2, try again later.
Jul 28, 2010 8:27:53 AM org.apache.solr.common.SolrException log SEVERE: org.apache.solr.common.SolrException: Error opening new searcher. exceeded limit of maxWarmingSearchers=2, try again later.
Jul 28, 2010 8:28:50 AM org.apache.solr.common.SolrException log SEVERE: org.apache.solr.common.SolrException: Error opening new searcher. exceeded limit of maxWarmingSearchers=2, try again later.
Jul 28, 2010 8:33:19 AM org.apache.solr.common.SolrException log SEVERE: org.apache.solr.common.SolrException: Error opening new searcher. exceeded limit of maxWarmingSearchers=2, try again later.
Jul 28, 2010 8:35:08 AM org.apache.solr.common.SolrException log SEVERE: org.apache.solr.common.SolrException: Error opening new searcher. exceeded limit of maxWarmingSearchers=2, try again later.
Jul 28, 2010 8:35:58 AM org.apache.solr.common.SolrException log SEVERE: org.apache.solr.common.SolrException: Error opening new searcher. exceeded limit of maxWarmingSearchers=2, try again later.
Jul 28, 2010 8:35:59 AM org.apache.solr.common.SolrException log SEVERE: org.apache.solr.common.SolrException: Error opening new searcher. exceeded limit of maxWarmingSearchers=2, try again later.
Jul 28, 2010 8:44:31 AM org.apache.solr.common.SolrException log SEVERE: org.apache.solr.common.SolrException: Error opening new searcher. exceeded limit of maxWarmingSearchers=2, try again later.
Jul 28, 2010 8:51:49 AM org.apache.solr.common.SolrException log SEVERE: org.apache.solr.common.SolrException: Error opening new searcher. exceeded limit of
Re: Solr using 1500 threads - is that normal?
Your commits are very suspect. How often are you making changes to your index? Do you have autocommit on? Do you commit when updating each document? Committing too often and consequently firing off warmup queries is the first place I'd look. But I agree with dc tech, 1,500 is way more than I would expect. Best Erick On Wed, Jul 28, 2010 at 6:53 AM, Christos Constantinou ch...@simpleweb.co.uk wrote: Hi, Solr seems to be crashing after a JVM exception that new threads cannot be created. I am writing in hope of advice from someone that has experienced this before. The exception that is causing the problem is: Exception in thread btpool0-5 java.lang.OutOfMemoryError: unable to create new native thread The memory that is allocated to Solr is 3072MB, which should be enough memory for a ~6GB data set. The documents are not big either, they have around 10 fields of which only one stores large text ranging between 1k-50k. The top command at the time of the crash shows Solr using around 1500 threads, which I assume is not normal. Could it be that the threads are crashing one by one and new ones are created to cope with the queries? In the log file, right after the exception, there are several thousand commits before the server stalls completely. Normally, the log file would report 20-30 document existence queries per second, then 1 commit per 5-30 seconds, and some more infrequent faceted document searches on the data. However after the exception, there are only commits until the end of the log file. I am wondering if anyone has experienced this before or if it is some sort of known bug from Solr 1.4? Is there a way to increase the details of the exception in the logfile? I am attaching the output of a grep Exception command on the logfile.
Re: WordDelimiterFilter and phrase queries?
: pos token offset
: 1 3 0-1
: 2 diphenyl 2-10
: 3 propanoic 11-20
: 3 diphenylpropanoic 2-20
: Say someone enters the query string 3-diphenylpropanoic
:
: The query parser I'm using transforms this into a phrase query and the
: indexed form is missed because the positions of the terms '3'
: and 'diphenylpropanoic' indicate they are not adjacent?
:
: Is this intended behavior? I expect that the catenated word
: 'diphenylpropanoic' should have a position of 2 based on the position
: of the first term in the concatenation, but perhaps I'm missing

I believe this is correct, but i'm not certain of the reason - i think it's just an implementation detail. Consider the opposite scenario: if your indexed text was diphenyl-propanoic-3 and things worked the way you are suggesting they should, the term diphenylpropanoic would end up at position 1 (with diphenyl) and diphenylpropanoic-3 would not match because then the terms wouldn't be adjacent. damned if you do, damned if you don't. typically for fields where you are using WDF with the concat options you would usually use a bit of slop on the generated phrase queries to allow for the looseness of the position information. (in an ideal world, the token stream wouldn't have monotonic integer positions, it would be a DAG, and then these things would be easily represented, but that's pretty non-trivial to do with the internals.) -Hoss
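[To make the suggested slop concrete: in the standard Lucene query syntax it is the ~N suffix on a phrase (with dismax, the qs parameter plays the same role). This query is illustrative, not from the thread:

q = "3 diphenylpropanoic"~1

One position of slop is enough here, since the catenated token sits one position further from '3' than a strict phrase match expects.]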
Re: Scoring Search for autocomplete
You weren't really clear on how you are generating your autocomplete results -- ie: via TermsComponent on your main index? or via a search on a custom index where each document is a word to be suggested? Assuming the latter, then the approach you describe below sounds good to me, but it doesn't seem like it would really make sense for the former.

: Hi, I have an autocomplete that is currently working with an
: NGramTokenizer so if I search for Yo both New York and Toyota
: are valid results. However I'm trying to figure out how to best
: implement the search so that from a score perspective if the string
: matches the beginning of an entire field it ranks first, followed by
: the beginning of a term and then in the middle of a term. For example
: if I was searching with vi I would want Virginia ahead of West
: Virginia ahead of Five.
:
: I think I can do this with three separate fields, one using a whitespace
: tokenizer and an ngram filter, another using the edge-ngram +
: whitespace and another using keyword+edge-ngram, then doing an or on
: the 3 fields, so that Virginia would match all 3 and get a higher
: score... but this doesn't feel right to me, so I wanted to check for
: better options.
:
: Thanks.

-Hoss
Help with schema design
Hi, I have a use case where I get a document and a list of events that have happened on the document. For example: First document: Some text content Events:

Event Type   Event By   Event Time
Update       Pramod     06062010 2:30:00
Update       Raj        06062010 2:30:00
View         Rahul      07062010 1:30:00

I would like to support queries like "get all documents where Event Type = ? and Event Time is greater than ?", and also queries like "get all the documents updated by Pramod". How should I design my schema to support this use case? Thanks, Regards, Pramod Goyal
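[One common answer - sketched here as an assumption, not from the thread - is to denormalize: index one Solr document per document/event pair, repeating the document fields. A SolrJ sketch with hypothetical field names:]

    import org.apache.solr.client.solrj.SolrServer;
    import org.apache.solr.common.SolrInputDocument;

    public class EventIndexer {
        static void indexEvent(SolrServer server) throws Exception {
            SolrInputDocument d = new SolrInputDocument();
            d.addField("id", "doc1-event1");           // unique key per (document, event) pair
            d.addField("doc_id", "doc1");              // groups events back to their document
            d.addField("content", "Some text content");
            d.addField("event_type", "Update");
            d.addField("event_by", "Pramod");
            d.addField("event_time", "2010-06-06T14:30:00Z"); // Solr date-field format
            server.add(d);
            server.commit();
            // queries then look like:
            //   event_type:Update AND event_time:[2010-06-06T00:00:00Z TO *]
            //   event_type:Update AND event_by:Pramod
        }
    }

[The price of this layout is duplicated document text and the need to de-duplicate on doc_id when presenting results.]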
Is solr able to merge index on different nodes
If I want to create a large index, can I split the indexing across different nodes and then merge all the indexes onto one node? Any further suggestions for this case?
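[If the partial indexes share one schema, the Lucene-level merge is straightforward - a sketch against the Lucene 3.x API of the time, with placeholder paths and analyzer:]

    import java.io.File;
    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.store.Directory;
    import org.apache.lucene.store.FSDirectory;
    import org.apache.lucene.util.Version;

    public class MergeIndexes {
        public static void main(String[] args) throws Exception {
            IndexWriter writer = new IndexWriter(
                    FSDirectory.open(new File("/indexes/merged")),
                    new StandardAnalyzer(Version.LUCENE_30),
                    true, // create a fresh target index
                    IndexWriter.MaxFieldLength.UNLIMITED);
            writer.addIndexesNoOptimize(new Directory[] {
                    FSDirectory.open(new File("/indexes/part1")),
                    FSDirectory.open(new File("/indexes/part2")) });
            writer.optimize(); // optional, but leaves a single clean segment
            writer.close();
        }
    }

[Lucene's contrib/misc also ships an IndexMergeTool that does essentially this from the command line.]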
Re: logic required for newbie
First of all, I hope that in the schema you have set indexed=true and stored=true for the fields. Next, if you have done so, then just search q=landmark:piza... you will get one result set only. Note: there is one constraint about applying analyzers and tokenizers. If you apply the whitespace tokenizer - that is, data type text_ws - only then will you get the 'piza hut' result set even when you query just for 'piza'. If no tokenizer is applied, you will not get it. I hope this was the needed reply. If there is something else, you can easily ask. ;) On Wed, Jul 28, 2010 at 8:42 PM, Jonty Rhods jonty.rh...@gmail.com wrote: Hi, thanks for the reply. Actually the requirement is different (sorry if I was unable to clarify it in the first mail). Basically the following are the field names in the schema as well: 1. id 2. name 3. user_id 4. location 5. country 6. landmark1 7. landmark2 8. landmark3 9. landmark4 10. landmark5, which carry text, for example:

<id>1</id>
<name>some name</name>
<user_id>user_id</user_id>
<location>new york</location>
<country>USA</country>
<landmark1>5th avenue</landmark1>
<landmark2>ms departmental store</landmark2>
<landmark3>base bakery</landmark3>
<landmark4>piza hut</landmark4>
<landmark5>ford motor</landmark5>

Now if the user searches by piza then the expected result is:

<id>1</id>
<name>some name</name>
<user_id>user_id</user_id>
<location>new york</location>
<country>USA</country>
<landmark4>piza hut</landmark4>

It means I want to ignore all the other landmarks which do not match. With a filter we can filter the fields, but here I don't know the field name because it depends on the text match. Is there any other solution? I am ready to change the schema or the logic. I am using solrj. Please help me, I am stuck here. with regards
Re: SolrJ Response + JSON
Yeah right... This query will do it: http://localhost:8090/solr/select/?q=*:*&version=2.2&start=0&rows=10&indent=on&wt=json This will do your work. This is more like using the XSL transformation supported by Solr. :) Regards, Rajani Maski On Wed, Jul 28, 2010 at 6:24 PM, Mark Allan mark.al...@ed.ac.uk wrote: I think you should just be able to add wt=json to the end of your query (or change whatever the existing wt parameter is in your URL). Mark On 28 Jul 2010, at 12:54 pm, MitchK wrote: Hello community, I need to transform SolrJ responses into JSON, after some computing on those results by another application has finished. I cannot do those computations on the Solr side. So, I really have to translate SolrJ's output into JSON. Any experiences how to do so without writing your own JSON writer? Thank you. - Mitch -- The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336.