lucene document via JSON
Hi, is adding, updating, and deleting documents in JSON format possible? My need is mostly updates: I would like to let users update certain fields of an existing document. The alternative is to let the user save the change in a DB and have the server convert and POST XML to Solr, but that is not so elegant :) Thanks, Anton
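The XML workaround mentioned above (server-side conversion before posting to Solr) can be sketched as follows. This is a minimal illustration, not production code; the field names are made-up examples, and real documents would need multi-valued fields and typed values handled too.

```python
from xml.sax.saxutils import escape

def to_solr_add_xml(doc):
    """Render a dict of field -> value as a Solr XML <add> update message."""
    fields = "".join(
        '<field name="%s">%s</field>' % (escape(name), escape(str(value)))
        for name, value in doc.items()
    )
    return "<add><doc>%s</doc></add>" % fields

# The resulting string can be POSTed to Solr's /update handler.
xml = to_solr_add_xml({"id": "42", "title": "hello"})
print(xml)
```

Note that `escape` handles `&`, `<`, and `>` so field values cannot break the XML.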
highlight results from pdf search
Hi. I have some PDF documents indexed through Solr Cell. My highlighting queries work fine on standard XML doc types, e.g. the samples. I would now like to highlight queries against a PDF document. Currently, for my simple examples, I am just indexing a PDF, providing an id and an arbitrary ext.literal. I would like to be able to get highlighted snippets back from the extracted content of the PDF. Is this possible? Thanks in advance for your help, - Ross -- View this message in context: http://www.nabble.com/highlight-results-from-pdf-search-tp23791905p23791905.html Sent from the Solr - User mailing list archive at Nabble.com.
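For reference, a highlighting request against extracted content is just a matter of request parameters. The sketch below shows one plausible shape; the field name "text" is an assumption and must match whatever field the extract handler maps the PDF body into, and that field must be stored for snippets to be generated.

```python
from urllib.parse import urlencode

# Hypothetical highlight request against a field holding extracted PDF text.
params = {
    "q": "text:lucene",   # query against the extracted-content field
    "hl": "true",          # turn highlighting on
    "hl.fl": "text",       # field(s) to generate snippets from
    "fl": "id,score",      # stored fields to return alongside snippets
}
query_string = urlencode(params)
# GET http://localhost:8983/solr/select?<query_string>
print(query_string)
```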
Re: java.lang.RuntimeException: after flush: fdx size mismatch
Whoops, here's the patch (added you directly on the To: so that you get it; Apache's list manager strips patches). Yes, if the fdx file is being deleted out from under Lucene, that would also explain what's happening, though the timing would have to be very tight. What's happening is that Lucene opened _X.fdx for writing, wrote some small number of bytes to it, and then on close found the file no longer exists. I'm not familiar with what exactly happens when you create/unload Solr cores and move them around machines; does this involve moving files from one machine to another (i.e., deleting files)? If so, is there some way to log when such migrations take place and try to correlate them with this exception? Mike

On Fri, May 29, 2009 at 8:29 PM, James X hello.nigerian.spamm...@gmail.com wrote: Hi Mike, I don't see a patch file here? Could another explanation be that the fdx file doesn't exist yet / has been deleted from underneath Lucene? I'm constantly CREATE-ing and UNLOAD-ing Solr cores and, more importantly, moving the bundled cores around between machines. I find it much more likely that there's something wrong with my core admin code than with the Lucene internals :) It's possible that I'm occasionally removing files which are currently in use by a live core. I'm using an ext3 filesystem on a large EC2 instance's own hard disk. I'm not sure how Amazon implements the local hard disk, but I assume it's a real hard disk exposed by the hypervisor. Thanks, James

On Fri, May 29, 2009 at 3:41 AM, Michael McCandless luc...@mikemccandless.com wrote: Very interesting: FieldsWriter thinks it's written 12 bytes to the fdx file, yet the directory says the file does not exist. Can you re-run with this new patch? I suspect that FieldsWriter wrote to one segment, but somehow we are then looking at the wrong segment. The attached patch prints out which segment FieldsWriter actually wrote to. What filesystem / underlying IO system / device are you using?
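The "deleted out from under Lucene" scenario is easy to reproduce on a POSIX filesystem such as ext3: a file can be unlinked while another handle still has it open, writes to the open handle keep succeeding, but the directory no longer lists the file. That is exactly the state the exception reports (bytes written, yet exists=false). A minimal demonstration, with a made-up file name:

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "_X.fdx")
f = open(path, "wb")
f.write(b"\x00" * 12)        # the writer believes it has written 12 bytes
os.remove(path)              # another process deletes the file underneath it
f.write(b"\x00" * 4)         # ...yet further writes still succeed
f.close()
print(os.path.exists(path))  # False: the bytes went to an unlinked inode
```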
Mike

On Thu, May 28, 2009 at 10:53 PM, James X hello.nigerian.spamm...@gmail.com wrote: My apologies for the delay in running this patched Lucene build - I was temporarily pulled onto another piece of work. Here is a sample 'fdx size mismatch' exception using the patch Mike supplied:

SEVERE: java.lang.RuntimeException: after flush: fdx size mismatch: 1 docs vs 0 length in bytes of _1i.fdx exists=false didInit=false inc=0 dSO=1 fieldsWriter.doClose=true fieldsWriter.indexFilePointer=12 fieldsWriter.fieldsFilePointer=2395
    at org.apache.lucene.index.StoredFieldsWriter.closeDocStore(StoredFieldsWriter.java:96)
    at org.apache.lucene.index.DocFieldConsumers.closeDocStore(DocFieldConsumers.java:83)
    at org.apache.lucene.index.DocFieldProcessor.closeDocStore(DocFieldProcessor.java:47)
    at org.apache.lucene.index.DocumentsWriter.closeDocStore(DocumentsWriter.java:367)
    at org.apache.lucene.index.DocumentsWriter.flush(DocumentsWriter.java:567)
    at org.apache.lucene.index.IndexWriter.doFlush(IndexWriter.java:3540)
    at org.apache.lucene.index.IndexWriter.flush(IndexWriter.java:3450)
    at org.apache.lucene.index.IndexWriter.closeInternal(IndexWriter.java:1638)
    at org.apache.lucene.index.IndexWriter.close(IndexWriter.java:1602)
    at org.apache.lucene.index.IndexWriter.close(IndexWriter.java:1578)
    at org.apache.solr.update.SolrIndexWriter.close(SolrIndexWriter.java:153)

Will now run with assertions enabled and see how that affects the behaviour! Thanks, James

-- Forwarded message -- From: James X hello.nigerian.spamm...@gmail.com Date: Thu, May 21, 2009 at 2:24 PM Subject: Re: java.lang.RuntimeException: after flush: fdx size mismatch To: solr-user@lucene.apache.org

Hi Mike, Documents are web pages: about 20 fields, mostly strings, a couple of integers, booleans, and one HTML field (for document body content). I do have a multi-threaded client pushing docs to Solr, so yes, I suppose that would mean I have several active Solr worker threads.
The only exceptions I have are the RuntimeException flush errors, followed by a handful (normally 10-20) of LockObtainFailedExceptions, which I presumed were being caused by the faulty threads dying and failing to release their locks. Oh wait, I am also getting WstxUnexpectedCharException exceptions every now and then:

SEVERE: com.ctc.wstx.exc.WstxUnexpectedCharException: Illegal character ((CTRL-CHAR, code 8)) at [row,col {unknown-source}]: [1,26070]
    at com.ctc.wstx.sr.StreamScanner.throwInvalidSpace(StreamScanner.java:675)
    at com.ctc.wstx.sr.BasicStreamReader.readTextSecondary(BasicStreamReader.java:4668)
    at com.ctc.wstx.sr.BasicStreamReader.readCoalescedText(BasicStreamReader.java:4126)
    at
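That WstxUnexpectedCharException means a document contains a character (here CTRL-CHAR code 8, backspace) that is illegal in XML 1.0, so the indexing client should clean documents before posting them. A sketch of such a filter, keeping only the XML 1.0 character range (tab, LF, and CR are the only legal control characters):

```python
import re

# Characters allowed by XML 1.0: #x9 | #xA | #xD | #x20-#xD7FF |
# #xE000-#xFFFD | #x10000-#x10FFFF. Everything else gets dropped.
_ILLEGAL_XML = re.compile(
    "[^\x09\x0a\x0d\x20-\ud7ff\ue000-\ufffd\U00010000-\U0010ffff]"
)

def strip_illegal_xml_chars(text):
    """Remove characters that an XML 1.0 parser would reject."""
    return _ILLEGAL_XML.sub("", text)

print(strip_illegal_xml_chars("bad\x08char"))  # -> badchar
```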
Re: lucene document via JSON
I have found this: https://issues.apache.org/jira/browse/SOLR-945 It seems like this might solve the problem; interestingly, it's also faster! Question: is there any specific reason this is not in trunk? Also, does this mean that once the issue is resolved, the Data Import Handler will also benefit from it? I can't wait to see this happen. Cheers

--- On Sat, May 30, 2009, Antonio Eggberg antonio_eggb...@yahoo.se wrote: From: Antonio Eggberg antonio_eggb...@yahoo.se Subject: lucene document via JSON To: solr-user@lucene.apache.org Date: Saturday, May 30, 2009, 09:41 Hi, is adding, updating, and deleting documents in JSON format possible? My need is mostly updates: I would like to let users update certain fields of an existing document. The alternative is to let the user save the change in a DB and have the server convert and POST XML to Solr, but that is not so elegant :) Thanks, Anton
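For a sense of the direction SOLR-945 points in, a JSON add/update payload could plausibly look like a list of flat field maps, as sketched below. This is a hypothetical illustration only; the actual wire format is defined by the patch, not by this example.

```python
import json

# Hypothetical JSON update payload: one flat field map per document.
docs = [
    {"id": "doc-1", "title": "hello"},
    {"id": "doc-2", "title": "world"},
]
payload = json.dumps(docs)
# POSTed with Content-Type: application/json to a JSON update handler,
# this would replace the equivalent <add><doc>...</doc></add> XML message.
parsed = json.loads(payload)
print(len(parsed))  # -> 2
```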
When searching for !...@#$%^*() all documents are matched incorrectly
Hi, I'm running Solr 1.3 / Java 1.6. When I run a query like (activity_type:NAME) AND title:(\...@#$%\^\*\(\)), all documents are returned even though there is not a single match: no title matches the string (which has been escaped). My document structure is as follows:

<doc>
  <str name="activity_type">NAME</str>
  <str name="title">Bathing</str>
</doc>

The title field is of type text_title, which is described below:

<fieldType name="text_title" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- in this example, we will only use synonyms at query time
    <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
    -->
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="1" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="1" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>

When I run the query against Luke, no results are returned. Any suggestions are appreciated. -- View this message in context: http://www.nabble.com/When-searching-for-%21%40-%24-%5E-*%28%29-all-documents-are-matched-incorrectly-tp23797731p23797731.html Sent from the Solr - User mailing list archive at Nabble.com.
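The backslash-escaping in the query above can be produced programmatically; SolrJ ships a helper for it (ClientUtils.escapeQueryChars). The sketch below mirrors that escaping rule in miniature. Keep in mind that escaping only protects characters from the query *parser*; the field's analyzer may still discard them afterwards.

```python
# Characters the Solr/Lucene query parser treats specially (a subset,
# modeled on SolrJ's ClientUtils.escapeQueryChars).
SPECIAL = set('\\+-!():^[]"{}~*?|&;')

def escape_query_chars(s):
    """Backslash-escape query-parser metacharacters in a user string."""
    return "".join("\\" + c if c in SPECIAL else c for c in s)

print(escape_query_chars("a+b"))  # -> a\+b
```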
how to do exact search with solrj
Hi, I want to search for "hello the world" in the title field using solrj. I set the query like this: query.addFilterQuery(title); query.setQuery(hello the world); but it returns non-exact matches as well. I know one way to do it is to change the title field from text to string, but is there another way? If I do the search through the Solr Admin web interface with title:"hello the world", it returns exact matches. Thanks. JB
Re: When searching for !...@#$%^*() all documents are matched incorrectly
Two key things to try (for anyone ever wondering why a query matches documents):

1. add debugQuery=true and look at the explain text below the results -- anything that contributed to the score is listed there
2. check /admin/analysis.jsp -- this will let you see how the analyzers break text up into tokens

Not sure off hand, but I'm guessing the WordDelimiterFilterFactory has something to do with it...

On Sat, May 30, 2009 at 5:59 PM, Sam Michaels mas...@yahoo.com wrote: Hi, I'm running Solr 1.3 / Java 1.6. When I run a query like (activity_type:NAME) AND title:(\...@#$%\^\*\(\)), all documents are returned even though there is not a single match. There is no title that matches the string (which has been escaped). [...] When I run the query against Luke, no results are returned. Any suggestions are appreciated.
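The WordDelimiterFilterFactory hunch can be illustrated with a rough stand-in: that filter splits tokens on non-alphanumeric characters, so a token made entirely of punctuation analyzes to zero tokens. A clause that produces no tokens constrains nothing, leaving only activity_type:NAME to match, which is why every document comes back. This is a loose simulation, not the real filter:

```python
import re

def word_delimiter_tokens(text):
    """Very loose stand-in for WordDelimiterFilter: keep alphanumeric runs."""
    return re.findall(r"[A-Za-z0-9]+", text)

print(word_delimiter_tokens("Wi-Fi"))      # -> ['Wi', 'Fi']
print(word_delimiter_tokens("!@#$%^*()"))  # -> []  (the clause vanishes)
```

The /admin/analysis.jsp page mentioned above shows the real filter chain doing the same thing on your actual schema.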
Re: When searching for !...@#$%^*() all documents are matched incorrectly
I'm really curious: what is the most relevant result for that query? wunder

On 5/30/09 7:35 PM, Ryan McKinley ryan...@gmail.com wrote: Two key things to try (for anyone ever wondering why a query matches documents): 1. add debugQuery=true and look at the explain text below the results -- anything that contributed to the score is listed there. 2. check /admin/analysis.jsp -- this will let you see how the analyzers break text up into tokens. Not sure off hand, but I'm guessing the WordDelimiterFilterFactory has something to do with it... On Sat, May 30, 2009 at 5:59 PM, Sam Michaels mas...@yahoo.com wrote: Hi, I'm running Solr 1.3 / Java 1.6. When I run a query like (activity_type:NAME) AND title:(\...@#$%\^\*\(\)), all documents are returned even though there is not a single match. [...]
Re: how to do exact search with solrj
query.setQuery("title:\"hello the world\"") is what you need. Cheers, Avlesh

On Sun, May 31, 2009 at 6:23 AM, Jianbin Dai djian...@yahoo.com wrote: Hi, I want to search for "hello the world" in the title field using solrj. I set the query like this: query.addFilterQuery(title); query.setQuery(hello the world); but it returns non-exact matches as well. I know one way to do it is to change the title field from text to string, but is there another way? If I do the search through the Solr Admin web interface with title:"hello the world", it returns exact matches. Thanks. JB
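The point of the answer is that an exact-phrase search needs the phrase wrapped in double quotes inside the query string, i.e. title:"hello the world"; without the quotes, each word is matched independently against the analyzed text field. The quoting logic can be sketched as a tiny helper (a hypothetical function name, shown for illustration):

```python
def phrase_query(field, phrase):
    """Build a fielded phrase query, e.g. title:"hello the world"."""
    escaped = phrase.replace('"', '\\"')   # guard any embedded quotes
    return '%s:"%s"' % (field, escaped)

print(phrase_query("title", "hello the world"))  # -> title:"hello the world"
```

The resulting string is what gets passed to setQuery; the field itself can stay as a text type.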