Hi Abe-san,
This means that the method:
protected boolean isDataIngestable(IFingerprintActivity activities,
String documentIdentifier, DocumentURLFilter filter)
throws ServiceInterruption, ManifoldCFException
... has returned false. If you look at the method, you will see that it
checks the following:
(1) That the response code is 200 (this could be the check that fails if the
login is failing, because a 401 is returned in that case).
(2) That the output connector will accept documents of that length.
(3) Whether the output connector will accept the URL as given.
(4) Whether the URL matches the URLs allowed for indexing in the web job.
(5) Whether there is a mime type at all; the document is rejected if there is
none (which I think is not correct; it should probably ask the output
connector even in this case).
(6) Whether the output connector will accept the document's mime type.
In most cases, the method logs its decision, so you may see additional
output that could clarify why the document is being excluded. If no
additional message is being output, then it is either case (1) or (5). You
would have to add logging code to figure out which one it is.
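To illustrate the kind of logging I mean, here is a minimal, self-contained
sketch of the six checks in the order listed above. It is not the actual
ManifoldCF code: the class IngestabilitySketch, the OutputChecks interface,
ACCEPT_ALL, and rejectionReason are all hypothetical names made up for this
example, and the real method works against the connector's own classes. The
only point is that reporting a distinct reason per check makes cases (1) and
(5) distinguishable.

```java
import java.util.function.Predicate;

// Illustrative only: every name here is hypothetical, not ManifoldCF's API.
class IngestabilitySketch {

    /** Hypothetical stand-in for the questions asked of the output connector. */
    interface OutputChecks {
        boolean acceptsLength(long length);
        boolean acceptsUrl(String url);
        boolean acceptsMimeType(String mimeType);
    }

    /** An output connector stand-in that accepts everything. */
    static final OutputChecks ACCEPT_ALL = new OutputChecks() {
        public boolean acceptsLength(long length) { return true; }
        public boolean acceptsUrl(String url) { return true; }
        public boolean acceptsMimeType(String mimeType) { return true; }
    };

    /**
     * Runs checks (1) through (6) in order and returns the reason for the
     * first failure, or null if the document passes all of them.
     */
    static String rejectionReason(int responseCode, long length, String url,
                                  String mimeType, Predicate<String> jobUrlFilter,
                                  OutputChecks output) {
        if (responseCode != 200) return "(1) response code was " + responseCode;
        if (!output.acceptsLength(length)) return "(2) output connector rejected length " + length;
        if (!output.acceptsUrl(url)) return "(3) output connector rejected the URL";
        if (!jobUrlFilter.test(url)) return "(4) URL is not allowed by the job";
        if (mimeType == null) return "(5) document has no mime type";
        if (!output.acceptsMimeType(mimeType)) return "(6) output connector rejected mime type " + mimeType;
        return null;
    }

    public static void main(String[] args) {
        String url = "https://server.com/url/";
        // A failed login would show up as a non-200 code such as 401.
        System.out.println(rejectionReason(401, 1024L, url, "text/html", u -> true, ACCEPT_ALL));
        // A response without a mime type would fail check (5).
        System.out.println(rejectionReason(200, 1024L, url, null, u -> true, ACCEPT_ALL));
    }
}
```

In the real connector you would log the reason at the two places that
currently return false silently, rather than return a string as this sketch
does.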
Thanks,
Karl
On Tue, Oct 8, 2013 at 2:24 AM, Shinichiro Abe
<[email protected]> wrote:
> Hi,
> I'm sure that the Web Connector supports Basic Authentication and can crawl
> http sites,
> but I'm not sure about the case of an https (SSL) site with basic
> Authentication.
> I can register basic auth with an https:// regex, user, and password, but
> crawling fails.
> Does the connector support basic Authentication over https?
> When I look at the log, it shows "WEB: Decided not to ingest
> 'https://server.com/url/' because it did not match ingestability
> criteria".
> What does this message mean?
>
> Regards,
> Shinichiro Abe