On Jan 5, 2010, at 1:53 PM, Zacarias wrote:

> I'd attached a file to the previous mail. Is there a filter for PDF files,
> or some other reason?

The mailer strips attachments, although you might be able to get a zip
through. Perhaps send a pointer to somewhere else or just describe it here.

> On Tue, Jan 5, 2010 at 12:49 PM, Zacarias <zacar...@linebee.com> wrote:
>
>> Here is my proposal
>>
>> Regards
>>
>> On Tue, Jan 5, 2010 at 12:48 PM, Zacarias <zacar...@linebee.com> wrote:
>>
>>> Hi, I'm developing a directory monitor to add to a Solr implementation.
>>> Tell me if it could be interesting for you; we will be glad to share it
>>> with the community. I would also like your opinion about the proposal:
>>> whether it looks OK to you, and any changes or questions you have will
>>> be very welcome.
>>>
>>> Regards
>>> Zacarias
>>> www.linebee.com
>>>
>>> 2009/12/8 Noble Paul നോബിള് नोब्ळ् <noble.p...@corp.aol.com>
>>>
>>>> I was referring to SOLR-1358. Anyway, SolrCell as an updateprocessor
>>>> is a good idea
>>>>
>>>> On Tue, Dec 8, 2009 at 4:47 PM, Grant Ingersoll <gsing...@apache.org>
>>>> wrote:
>>>>>
>>>>> On Dec 8, 2009, at 12:22 AM, Noble Paul നോബിള് नोब्ळ् wrote:
>>>>>
>>>>>> Integrating Extraction w/ DIH is a better option. DIH makes it easier
>>>>>> to do the mapping of fields etc.
>>>>>
>>>>> Which comment is this directed at? I'm lacking context here.
>>>>>
>>>>>> On Tue, Dec 8, 2009 at 4:59 AM, Grant Ingersoll <gsing...@apache.org>
>>>>>> wrote:
>>>>>>>
>>>>>>> On Dec 7, 2009, at 3:51 PM, Chris Hostetter wrote:
>>>>>>>
>>>>>>>> As someone with very little knowledge of Solr Cell and/or Tika, I
>>>>>>>> find myself wondering if ExtractingRequestHandler would make more
>>>>>>>> sense as an extractingUpdateProcessor -- where it could be
>>>>>>>> configured to take either binary fields (or string fields
>>>>>>>> containing URLs) out of the Documents, parse them with tika, and
>>>>>>>> add the various XPath matching hunks of text back into the
>>>>>>>> document as new fields.
>>>>>>>>
>>>>>>>> Then ExtractingRequestHandler just becomes a handler that slurps
>>>>>>>> up its ContentStreams and adds them as binary data fields and adds
>>>>>>>> the other literal params as fields.
>>>>>>>>
>>>>>>>> Wouldn't that make things like SOLR-1358, and using Tika with
>>>>>>>> URLs/filepaths in XML and CSV based updates fairly trivial?
>>>>>>>
>>>>>>> It probably could, but I am not sure how it works in a processor
>>>>>>> chain. However, I'm not sure I understand how they work all that
>>>>>>> much either. I also plan on adding, BTW, a SolrJ client for Tika
>>>>>>> that does the extraction on the client. In many cases, the
>>>>>>> ExtrReqHandler is really only designed for lighter weight
>>>>>>> extraction cases, as one would simply not want to send that much
>>>>>>> rich content over the wire.
>>>>>>
>>>>>> --
>>>>>> -----------------------------------------------------
>>>>>> Noble Paul | Systems Architect| AOL | http://aol.com
>>>>>
>>>>> --------------------------
>>>>> Grant Ingersoll
>>>>> http://www.lucidimagination.com/
>>>>>
>>>>> Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids)
>>>>> using Solr/Lucene:
>>>>> http://www.lucidimagination.com/search
>>>>
>>>> --
>>>> -----------------------------------------------------
>>>> Noble Paul | Systems Architect| AOL | http://aol.com

--------------------------
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem using Solr/Lucene:
http://www.lucidimagination.com/search
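
For readers following the quoted thread, the update-processor idea Chris Hostetter floats might look roughly like the sketch below. It is plain Java and deliberately does not use Solr's or Tika's real classes: `Extractor`, `Processor`, `ExtractingProcessor`, `handle`, and the `content` field name are all hypothetical stand-ins, and the "extraction" is a toy that decodes bytes as UTF-8. It only illustrates the split being discussed, with a thin handler that stores the content stream as a binary field and a chain stage that replaces binary fields with extracted text fields.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class ExtractingProcessorSketch {

    /** Hypothetical stand-in for a Tika-backed parser: raw bytes in, named text hunks out. */
    interface Extractor {
        Map<String, String> extract(byte[] raw);
    }

    /** Hypothetical stand-in for one stage of an update processor chain. */
    interface Processor {
        void process(Map<String, Object> doc);
    }

    /** Replaces every binary field with the text fields the extractor produces from it. */
    static final class ExtractingProcessor implements Processor {
        private final Extractor extractor;

        ExtractingProcessor(Extractor extractor) {
            this.extractor = extractor;
        }

        @Override
        public void process(Map<String, Object> doc) {
            // Collect the binary field names first so the map is not modified while iterating.
            List<String> binaryFields = new ArrayList<>();
            for (Map.Entry<String, Object> entry : doc.entrySet()) {
                if (entry.getValue() instanceof byte[]) {
                    binaryFields.add(entry.getKey());
                }
            }
            for (String name : binaryFields) {
                byte[] raw = (byte[]) doc.remove(name);
                for (Map.Entry<String, String> hunk : extractor.extract(raw).entrySet()) {
                    // New field names are "<binary field>_<hunk name>" (an assumed convention).
                    doc.put(name + "_" + hunk.getKey(), hunk.getValue());
                }
            }
        }
    }

    /** Thin "handler": stores the content stream as a binary field and copies literal params. */
    static Map<String, Object> handle(byte[] contentStream,
                                      Map<String, String> literals,
                                      List<Processor> chain) {
        Map<String, Object> doc = new LinkedHashMap<>(literals);
        doc.put("content", contentStream);
        for (Processor processor : chain) {
            processor.process(doc);
        }
        return doc;
    }

    public static void main(String[] args) {
        // Toy extractor: treats the bytes as UTF-8 and returns them as a single "text" hunk.
        Extractor toy = raw -> Map.of("text", new String(raw, StandardCharsets.UTF_8));
        Map<String, Object> doc = handle(
                "hello solr".getBytes(StandardCharsets.UTF_8),
                Map.of("id", "doc1"),
                List.of(new ExtractingProcessor(toy)));
        System.out.println(doc);
    }
}
```

Because the extraction step here only looks at the document's fields, the same stage could in principle run on documents that arrive by other routes (for example XML or CSV updates carrying binary data), which is the appeal raised in the thread. Whether it behaves well inside a real processor chain is exactly the open question Grant raises.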