Nailed it. Stepping through in Eclipse will help out a lot. Have a great weekend folks :-)

On May 1, 2014 11:40 AM, "Chris Mattmann" <chris.mattm...@gmail.com> wrote:
> Hey Lewis,
>
> That's b/c Crawler doesn't do HTTP connections.
> PushPull is the component where that occurs. We
> specifically made Crawler only handle local data,
> and refactored the protocol layer/functionality
> into PushPull; they operate through a shared
> directory structure (a 'staging' dir) and through
> Crawler preconditions and Actions.
>
> Scope out PushPull and then we can discuss.
>
> Thanks dude.
>
> Cheers,
> Chris
>
> ------------------------
> Chris Mattmann
> chris.mattm...@gmail.com
>
>
>
>
> -----Original Message-----
> From: Lewis John Mcgibbney <lewis.mcgibb...@gmail.com>
> Reply-To: <user@oodt.apache.org>
> Date: Thursday, May 1, 2014 10:35 AM
> To: <user@oodt.apache.org>
> Subject: CAS Crawler Crawling Code
>
> >Hi Folks,
> >I'm sitting jumping between ProductCrawler and StdIngester trying to
> >pinpoint _exactly_ where product fetching actually happens.
> >I'm aware of the triple-headed nature of crawler workflows, e.g.
> >preIngestion, postIngestionSuccess and postIngestionFailure... I can see
> >the logic within the ProductCrawler code... what I cannot locate is where
> >HTTP/transport socket connections are created and used.
> >
> >Can anyone please point this out?
> >Thanks
> >Lewis
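For anyone finding this thread in the archive: here is a minimal, self-contained Java sketch of the split Chris describes, where PushPull owns the transport sockets and lands bytes in a shared staging directory, and the Crawler only ever walks that local directory. The class and method names (StagingHandoffSketch, fetchRemoteProduct, ingestLocalFile) and the example URL are hypothetical placeholders for illustration, not actual OODT APIs; for the real thing, see ProductCrawler/StdIngester in cas-crawler and the protocol layer in cas-pushpull.

import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

public class StagingHandoffSketch {

    // PushPull's role: speak the transport protocol (HTTP, FTP, ...)
    // and drop the fetched bytes into the shared staging directory.
    static Path fetchRemoteProduct(URI remote, Path stagingDir) throws IOException {
        Path target = stagingDir.resolve(Paths.get(remote.getPath()).getFileName());
        try (InputStream in = remote.toURL().openStream()) {
            Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
        }
        return target;
    }

    // Crawler's role: walk the local staging dir only -- no sockets here.
    // This is why no HTTP/transport code shows up in ProductCrawler.
    static void crawlStagingDir(Path stagingDir) throws IOException {
        try (DirectoryStream<Path> products = Files.newDirectoryStream(stagingDir)) {
            for (Path product : products) {
                // In real OODT, preIngest preconditions/actions run here,
                // then StdIngester-style ingestion of the local file.
                ingestLocalFile(product);
            }
        }
    }

    // Placeholder for ingestion of an already-local product file.
    static void ingestLocalFile(Path product) {
        System.out.println("Ingesting local product: " + product);
    }

    public static void main(String[] args) throws IOException {
        Path staging = Files.createTempDirectory("staging");
        fetchRemoteProduct(URI.create("https://example.com/data/granule.dat"), staging);
        crawlStagingDir(staging);
    }
}

The point of the sketch is the directory handoff: the only coupling between the two components is the staging dir, which is why you can step through ProductCrawler in Eclipse end to end without ever hitting a socket.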