Re: Solr Cell revamped as an UpdateProcessor?

Grant Ingersoll Tue, 05 Jan 2010 10:59:02 -0800

On Jan 5, 2010, at 1:53 PM, Zacarias wrote:

> I attached a file to the previous mail. Is there a filter for PDF files,
> or some other reason it did not come through?


The mailer strips attachments, although you might be able to get a zip through.
Perhaps send a pointer to somewhere else or just describe it here.

> 
> On Tue, Jan 5, 2010 at 12:49 PM, Zacarias <[email protected]> wrote:
> 
>> Here is my proposal
>> 
>> Regards
>> 
>> 
>> 
>> 
>> On Tue, Jan 5, 2010 at 12:48 PM, Zacarias <[email protected]> wrote:
>> 
>>> Hi, I'm developing a directory monitor to add to a Solr implementation.
>>> If it would be interesting to you, we will be glad to share it
>>> with the community. I would also like your opinion on the proposal: tell me
>>> whether it looks OK to you; any changes or questions will be
>>> very welcome.
>>> 
>>> Regards
>>> Zacarias
>>> www.linebee.com
>>> 
>>> 
>>> 2009/12/8 Noble Paul നോബിള്‍ नोब्ळ् <[email protected]>
>>> 
>>>> I was referring to SOLR-1358. Anyway, SolrCell as an UpdateProcessor
>>>> is a good idea
>>>> 
>>>> On Tue, Dec 8, 2009 at 4:47 PM, Grant Ingersoll <[email protected]>
>>>> wrote:
>>>>> 
>>>>> On Dec 8, 2009, at 12:22 AM, Noble Paul നോബിള്‍ नोब्ळ् wrote:
>>>>> 
>>>>>> Integrating Extraction w/ DIH is a better option. DIH makes it easier
>>>>>> to do the mapping of fields etc.
>>>>> 
>>>>> Which comment is this directed at?  I'm lacking context here.
>>>>> 
>>>>>> 
>>>>>> 
>>>>>> On Tue, Dec 8, 2009 at 4:59 AM, Grant Ingersoll <[email protected]>
>>>>>> wrote:
>>>>>>> 
>>>>>>> On Dec 7, 2009, at 3:51 PM, Chris Hostetter wrote:
>>>>>>> 
>>>>>>>> 
>>>>>>>> As someone with very little knowledge of Solr Cell and/or Tika, I
>>>>>>>> find myself wondering if ExtractingRequestHandler would make more sense as
>>>>>>>> an extractingUpdateProcessor -- where it could be configured to take
>>>>>>>> either binary fields (or string fields containing URLs) out of the
>>>>>>>> Documents, parse them with Tika, and add the various XPath matching hunks
>>>>>>>> of text back into the document as new fields.
>>>>>>>> 
>>>>>>>> Then ExtractingRequestHandler just becomes a handler that slurps up
>>>>>>>> its ContentStreams and adds them as binary data fields and adds the other
>>>>>>>> literal params as fields.
>>>>>>>> 
>>>>>>>> Wouldn't that make things like SOLR-1358, and using Tika with
>>>>>>>> URLs/filepaths in XML and CSV based updates fairly trivial?
>>>>>>> 
>>>>>>> It probably could, but I am not sure how it works in a processor chain.
>>>>>>> However, I'm not sure I understand how they work all that much either. I
>>>>>>> also plan on adding, BTW, a SolrJ client for Tika that does the extraction
>>>>>>> on the client. In many cases, the ExtrReqHandler is really only designed
>>>>>>> for lighter weight extraction cases, as one would simply not want to send
>>>>>>> that much rich content over the wire.
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> --
>>>>>> -----------------------------------------------------
>>>>>> Noble Paul | Systems Architect| AOL | http://aol.com
>>>>> 
>>>>> --------------------------
>>>>> Grant Ingersoll
>>>>> http://www.lucidimagination.com/
>>>>> 
>>>>> Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids)
>>>>> using Solr/Lucene:
>>>>> http://www.lucidimagination.com/search
>>>>> 
>>>>> 
>>>> 
>>>> 
>>>> 
>>>> --
>>>> -----------------------------------------------------
>>>> Noble Paul | Systems Architect| AOL | http://aol.com
>>>> 
>>> 
>>> 
>> 

--------------------------
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem using Solr/Lucene: 
http://www.lucidimagination.com/search
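
The update-processor idea quoted in this thread (take binary fields out of a
document, parse them, and add the extracted text back as new fields) can be
sketched in plain Java. This is only an illustration under stated assumptions:
it does not use the real Solr update processor or Tika APIs. The class name
ExtractingProcessorSketch, the process method, the "_text" field suffix, and
the extractor function standing in for a Tika parser are all hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

public class ExtractingProcessorSketch {

    /**
     * Returns a copy of doc in which every byte[] field is replaced by a
     * "<name>_text" field holding the text produced by the extractor.
     * The extractor is a hypothetical stand-in for a Tika parse call.
     */
    public static Map<String, Object> process(Map<String, Object> doc,
                                              Function<byte[], String> extractor) {
        Map<String, Object> out = new LinkedHashMap<>();
        for (Map.Entry<String, Object> field : doc.entrySet()) {
            Object value = field.getValue();
            if (value instanceof byte[]) {
                // Binary field: drop the raw bytes, keep the extracted text.
                out.put(field.getKey() + "_text", extractor.apply((byte[]) value));
            } else {
                // Literal field: pass through unchanged.
                out.put(field.getKey(), value);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("id", "doc1");
        doc.put("content", "hello".getBytes(StandardCharsets.UTF_8));

        // Assumption: the "parser" here just decodes UTF-8 bytes.
        Function<byte[], String> extractor =
                bytes -> new String(bytes, StandardCharsets.UTF_8);

        System.out.println(process(doc, extractor));
        // prints {id=doc1, content_text=hello}
    }
}
```

In this shape the request handler would only need to attach the uploaded
content as a binary field; the same step could then run on documents arriving
through other update formats.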
