lucene document via JSON

2009-05-30 Thread Antonio Eggberg

Hi,

Is it possible to add, update, and delete documents in JSON format? My main
need is updating: I'd like to let users update certain fields of an existing
result.

Another solution is to let the user save the change in a DB and then have the
server convert it and post XML to Solr, but that's not so fancy :)

Thanks
Anton




highlight results from pdf search

2009-05-30 Thread rossputin

Hi.

I have some PDF documents indexed through Solr Cell.  My highlighting
queries work fine on standard XML doc types, e.g. the samples.  I would now
like to highlight some queries on a PDF document.  Currently for my simple
examples I am just indexing a PDF, providing an id, and an arbitrary
ext.literal.  I would like to be able to get highlighted snippets back from
the extracted content of the PDF.  Is this possible?

Thanks in advance for your help,

 - Ross
-- 
View this message in context: 
http://www.nabble.com/highlight-results-from-pdf-search-tp23791905p23791905.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: java.lang.RuntimeException: after flush: fdx size mismatch

2009-05-30 Thread Michael McCandless
Whoops, here's the patch (I added you directly on the To: line so that you
get the patch; Apache's list manager strips patches).

Yes, if the fdx file is getting deleted out from under Lucene, that'd
also explain what's happening.  Though the timing would have to be
very quick.  What's happening is Lucene had opened _X.fdx for writing,
written a small number of bytes to it, and then closed it but found the
file no longer exists.

I'm not familiar with what exactly happens when you create/unload Solr
cores and move them around machines; does this involve moving files
from one machine to another (i.e., deleting files)?  If so, is there
some way to log when such migrations take place and try to correlate
to this exception?
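
One way to get such a log is to route every file removal in the core-admin
code through a single helper that writes a timestamped line, which can later
be lined up against the exception timestamps. A minimal sketch, assuming the
admin code is Java; LoggedFileOps and the logger name "core-migration" are
hypothetical and not part of Solr or Lucene:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.time.Instant;
import java.util.logging.Logger;

// Hypothetical helper for illustration only; not a Solr or Lucene class.
final class LoggedFileOps {
    private static final Logger LOG = Logger.getLogger("core-migration");

    private LoggedFileOps() {}

    /** Formats one log line: UTC timestamp, the path, and whether a file was removed. */
    static String describe(Instant when, Path file, boolean removed) {
        return when + " delete " + file + " removed=" + removed;
    }

    /** Deletes the file if it exists, logs the attempt, and returns true if a file was removed. */
    static boolean deleteLogged(Path file) {
        try {
            boolean removed = Files.deleteIfExists(file);
            LOG.info(describe(Instant.now(), file.toAbsolutePath(), removed));
            return removed;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Any deletion of a segment file shortly before an "fdx size mismatch" would
then show up in the log next to the exception.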

Mike

On Fri, May 29, 2009 at 8:29 PM, James X
hello.nigerian.spamm...@gmail.com wrote:
 Hi Mike, I don't see a patch file here?

 Could another explanation be that the fdx file doesn't exist yet / has been
 deleted from underneath Lucene?

 I'm constantly CREATE-ing and UNLOAD-ing Solr cores, and more importantly,
 moving the bundled cores around between machines. I find it much more likely
 that there's something wrong with my core admin code than there is with the
 Lucene internals :) It's possible that I'm occasionally removing files which
 are currently in use by a live core...

 I'm using an ext3 filesystem on a large EC2 instance's own hard disk. I'm
 not sure how Amazon implement the local hard disk, but I assume it's a real
 hard disk exposed by the hypervisor.

 Thanks,
 James

 On Fri, May 29, 2009 at 3:41 AM, Michael McCandless 
 luc...@mikemccandless.com wrote:

 Very interesting: FieldsWriter thinks it's written 12 bytes to the fdx
 file, yet the directory says the file does not exist.

 Can you re-run with this new patch?  I'm suspecting that FieldsWriter
 wrote to one segment, but somehow we are then looking at the wrong
 segment.  The attached patch prints out which segment FieldsWriter
 actually wrote to.

 What filesystem and underlying IO system/device are you using?

 Mike

 On Thu, May 28, 2009 at 10:53 PM, James X
 hello.nigerian.spamm...@gmail.com wrote:
  My apologies for the delay in running this patched Lucene build - I was
  temporarily pulled onto another piece of work.
 
  Here is a sample 'fdx size mismatch' exception using the patch Mike
  supplied:
 
  SEVERE: java.lang.RuntimeException: after flush: fdx size mismatch: 1 docs
  vs 0 length in bytes of _1i.fdx exists=false didInit=false inc=0 dSO=1
  fieldsWriter.doClose=true fieldsWriter.indexFilePointer=12
  fieldsWriter.fieldsFilePointer=2395
          at org.apache.lucene.index.StoredFieldsWriter.closeDocStore(StoredFieldsWriter.java:96)
          at org.apache.lucene.index.DocFieldConsumers.closeDocStore(DocFieldConsumers.java:83)
          at org.apache.lucene.index.DocFieldProcessor.closeDocStore(DocFieldProcessor.java:47)
          at org.apache.lucene.index.DocumentsWriter.closeDocStore(DocumentsWriter.java:367)
          at org.apache.lucene.index.DocumentsWriter.flush(DocumentsWriter.java:567)
          at org.apache.lucene.index.IndexWriter.doFlush(IndexWriter.java:3540)
          at org.apache.lucene.index.IndexWriter.flush(IndexWriter.java:3450)
          at org.apache.lucene.index.IndexWriter.closeInternal(IndexWriter.java:1638)
          at org.apache.lucene.index.IndexWriter.close(IndexWriter.java:1602)
          at org.apache.lucene.index.IndexWriter.close(IndexWriter.java:1578)
          at org.apache.solr.update.SolrIndexWriter.close(SolrIndexWriter.java:153)
 
 
  Will now run with assertions enabled and see how that affects the behaviour!
 
  Thanks,
  James
 
  -- Forwarded message --
  From: James X hello.nigerian.spamm...@gmail.com
  Date: Thu, May 21, 2009 at 2:24 PM
  Subject: Re: java.lang.RuntimeException: after flush: fdx size mismatch
  To: solr-user@lucene.apache.org
 
 
  Hi Mike,

  Documents are web pages with about 20 fields: mostly strings, a couple
  of integers, booleans and one HTML field (for document body content).
 
  I do have a multi-threaded client pushing docs to Solr, so yes, I suppose
  that would mean I have several active Solr worker threads.
 
  The only exceptions I have are the RuntimeException flush errors, followed
  by a handful (normally 10-20) of LockObtainFailedExceptions, which I
  presumed were being caused by the faulty threads dying and failing to
  release locks.
 
  Oh wait, I am getting WstxUnexpectedCharException exceptions every now and
  then:
  SEVERE: com.ctc.wstx.exc.WstxUnexpectedCharException: Illegal character
  ((CTRL-CHAR, code 8))
   at [row,col {unknown-source}]: [1,26070]
          at com.ctc.wstx.sr.StreamScanner.throwInvalidSpace(StreamScanner.java:675)
          at com.ctc.wstx.sr.BasicStreamReader.readTextSecondary(BasicStreamReader.java:4668)
          at com.ctc.wstx.sr.BasicStreamReader.readCoalescedText(BasicStreamReader.java:4126)
         at
 
 

Re: lucene document via JSON

2009-05-30 Thread antonio_eggberg

I have found this

https://issues.apache.org/jira/browse/SOLR-945

It seems like this might solve the problem. Interestingly, it's also faster!

Question: is there any specific reason this is not in the trunk? Also, does
this mean that once the issue is sorted out, the Data Import Handler will
also benefit from it?

I can't wait to see this happen.

Cheers



When searching for !...@#$%^*() all documents are matched incorrectly

2009-05-30 Thread Sam Michaels

Hi,

I'm running Solr 1.3/Java 1.6.  

When I run a query like  - (activity_type:NAME) AND title:(\...@#$%\^\*\(\))
all the documents are returned even though there is not a single match.
There is no title that matches the string (which has been escaped). 
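
For reference, backslash-escaping of this kind can be done programmatically
before the query string is built. A minimal sketch, assuming the usual Lucene
query-parser metacharacters; QueryEscaper is a hypothetical helper written
for illustration, not a Solr class:

```java
// Hypothetical helper for illustration only; not a Solr or Lucene class.
final class QueryEscaper {
    // Characters assumed to be query-parser syntax: \ + - ! ( ) : ^ [ ] " { } ~ * ? | &
    private static final String SPECIAL = "\\+-!():^[]\"{}~*?|&";

    private QueryEscaper() {}

    /** Returns the input with a backslash placed before every special character. */
    static String escape(String raw) {
        StringBuilder out = new StringBuilder(raw.length() * 2);
        for (int i = 0; i < raw.length(); i++) {
            char c = raw.charAt(i);
            if (SPECIAL.indexOf(c) >= 0) {
                out.append('\\');
            }
            out.append(c);
        }
        return out.toString();
    }
}
```

Escaping only controls how the query is parsed; the field's analyzer still
runs on the escaped term afterwards and may drop punctuation entirely.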

My document structure is as follows

<doc>
  <str name="activity_type">NAME</str>
  <str name="title">Bathing</str>

</doc>


The title field is of type text_title which is described below. 

<fieldType name="text_title" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- in this example, we will only use synonyms at query time
    <filter class="solr.SynonymFilterFactory"
            synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
    -->
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="1" catenateWords="1"
            catenateNumbers="1" catenateAll="1" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="true"/>
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="1" catenateWords="1"
            catenateNumbers="1" catenateAll="1" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>

When I run the query against Luke, no results are returned. Any suggestions
are appreciated.


-- 
View this message in context: 
http://www.nabble.com/When-searching-for-%21%40-%24-%5E-*%28%29-all-documents-are-matched-incorrectly-tp23797731p23797731.html
Sent from the Solr - User mailing list archive at Nabble.com.



how to do exact search with solrj

2009-05-30 Thread Jianbin Dai

Hi,

I want to search for "hello the world" in the title field using solrj. I set
the query filter:
query.addFilterQuery("title");
query.setQuery("hello the world");

but it also returns results that are not exact matches.

I know one way to do it is to set the title field to string instead of text.
But is there any other way I can do it? If I do the search through the Solr
Admin web interface with title:"hello the world", it returns exact matches.

Thanks.

JB


  



Re: When searching for !...@#$%^*() all documents are matched incorrectly

2009-05-30 Thread Ryan McKinley
Two key things to try (for anyone ever wondering why a query matches documents):

1.  add debugQuery=true and look at the explain text below --
anything that contributed to the score is listed there
2.  check /admin/analysis.jsp -- this will let you see how analyzers
break text up into tokens.

Not sure offhand, but I'm guessing the WordDelimiterFilterFactory has
something to do with it...
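
The first suggestion amounts to appending debugQuery=true to the select
request. A minimal sketch of building such a URL from Java using only the
standard library; the base URL (http://localhost:8983/solr/select) is an
assumption based on the stock example setup, and DebugQueryUrl is a
hypothetical helper, not part of SolrJ:

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper for illustration only; not a Solr or SolrJ class.
final class DebugQueryUrl {
    private DebugQueryUrl() {}

    /** Builds a select URL with the query URL-encoded and debugQuery=true appended. */
    static String build(String baseUrl, String query) {
        try {
            return baseUrl + "?q=" + URLEncoder.encode(query, "UTF-8") + "&debugQuery=true";
        } catch (UnsupportedEncodingException e) {
            // UTF-8 support is required of every Java runtime, so this should not happen.
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }
}
```

Fetching the resulting URL in a browser should show the parsed form of the
query alongside the explain output, which makes it easier to see whether a
clause was reduced to nothing by analysis.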






Re: When searching for !...@#$%^*() all documents are matched incorrectly

2009-05-30 Thread Walter Underwood
I'm really curious. What is the most relevant result for that query?

wunder

 
 



Re: how to do exact search with solrj

2009-05-30 Thread Avlesh Singh
query.setQuery("title:\"hello the world\"") is what you need.
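
The quoting is easy to get wrong from Java, because the phrase's double
quotes have to be escaped inside the string literal. A minimal sketch of
building the field-qualified phrase query string with plain Java;
PhraseQueries is a hypothetical helper, and the commented-out line assumes a
SolrJ SolrQuery instance named query:

```java
// Hypothetical helper for illustration only; not a Solr or SolrJ class.
final class PhraseQueries {
    private PhraseQueries() {}

    /** Wraps the text in double quotes, escaping embedded backslashes and quotes, and prefixes the field. */
    static String phrase(String field, String text) {
        String escaped = text.replace("\\", "\\\\").replace("\"", "\\\"");
        return field + ":\"" + escaped + "\"";
    }
}

// Usage with SolrJ (assumes an existing SolrQuery named query):
// query.setQuery(PhraseQueries.phrase("title", "hello the world"));
```

A phrase query on an analyzed text field matches documents that contain the
phrase after analysis, so it is not the same as a whole-field exact match on
a string field.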

Cheers
Avlesh
