Re: Job stuck without message

2018-10-30 Thread Karl Wright
What I am interested in now is the Document Status report for any one of
the documents that is 'stuck'.  The next crawl time value is the critical
field.  Can you include an example?

Karl


Re: Job stuck without message

2018-10-30 Thread Bisonti Mario
Thanks a lot, Karl.

The job starts, works and indexes for about an hour, and then it freezes. I 
see no error or waiting status in the Document Queue or Simple History reports; 
I see only "OK" statuses, so there are no failures.

I cannot find any log errors other than those in manifoldcf.log.

Solr server is ok
Tika server is ok
Agent is ok
Tomcat with ManifoldCF is ok

I could check whether I can switch, for example, the Tika server or Solr to 
info-level logging.

Thanks.



Re: Job stuck without message

2018-10-30 Thread Karl Wright
Hi Mario,

Please look at the Queue Status report to determine what is waiting and why
it is waiting.
You can also look at the Simple History to see what has been happening.  If
you are getting 100% failures in fetching documents then you may need to
address this because your infrastructure is unhappy.  If the failure is
something that indicates that the document is never going to be readable,
that's a different problem and we might need to address that in the
connector.

Karl



Re: Job stuck without message

2018-10-30 Thread Bisonti Mario

Thanks a lot Karl

Yes, I see many documents in the document queue, but they are inactive.

In fact, I see that no more documents are being indexed in Solr, and the job 
still shows the same number of Active documents (35012).


Re: Job stuck without message

2018-10-30 Thread Karl Wright
The job appears "stuck" because of this warning:

' JCIFS: Possibly transient exception detected on attempt 1 while getting
share security: All pipe instances are busy.'

This means that ManifoldCF will retry this document for a while before it
gives up on it.  It appears to be stuck but it is not.  You can verify that
by looking at the Document Queue report to see what is queued and what
times the various documents will be retried.

Karl


On Tue, Oct 30, 2018 at 5:07 AM Bisonti Mario 
wrote:

> Hello.
>
> I started a job that runs for a few minutes and then gets stuck.
>
> In manifoldcf.log I see:
> at org.apache.manifoldcf.crawler.connectors.sharedrive.SharedDriveConnector.processDocuments(SharedDriveConnector.java:627) [mcf-jcifs-connector.jar:?]
> at org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:399) [mcf-pull-agent.jar:?]
> WARN 2018-10-30T09:21:31,440 (Worker thread '2') - Tika Server: Tika Server rejects: Tika Server rejected document with the following reason: Unprocessable Entity
> WARN 2018-10-30T09:21:33,502 (Worker thread '14') - Tika Server: Tika Server rejects: Tika Server rejected document with the following reason: Unprocessable Entity
> WARN 2018-10-30T09:21:37,725 (Worker thread '30') - Tika Server: Tika Server rejects: Tika Server rejected document with the following reason: Unprocessable Entity
> WARN 2018-10-30T09:21:44,406 (Worker thread '49') - Tika Server: Tika Server rejects: Tika Server rejected document with the following reason: Unprocessable Entity
> WARN 2018-10-30T09:21:47,310 (Worker thread '15') - Tika Server: Tika Server rejects: Tika Server rejected document with the following reason: Unprocessable Entity
> WARN 2018-10-30T09:21:52,000 (Worker thread '27') - Tika Server: Tika Server rejects: Tika Server rejected document with the following reason: Unprocessable Entity
> WARN 2018-10-30T09:21:53,526 (Worker thread '15') - Tika Server: Tika Server rejects: Tika Server rejected document with the following reason: Unprocessable Entity
> WARN 2018-10-30T09:22:04,511 (Worker thread '3') - JCIFS: Possibly transient exception detected on attempt 1 while getting share security: All pipe instances are busy.
> jcifs.smb.SmbException: All pipe instances are busy.
> at jcifs.smb.SmbTransport.checkStatus(SmbTransport.java:569) ~[jcifs-1.3.18.3.jar:?]
> at jcifs.smb.SmbTransport.send(SmbTransport.java:669) ~[jcifs-1.3.18.3.jar:?]
> at jcifs.smb.SmbSession.send(SmbSession.java:238) ~[jcifs-1.3.18.3.jar:?]
> at jcifs.smb.SmbTree.send(SmbTree.java:119) ~[jcifs-1.3.18.3.jar:?]
> at jcifs.smb.SmbFile.send(SmbFile.java:776) ~[jcifs-1.3.18.3.jar:?]
> at jcifs.smb.SmbFile.open0(SmbFile.java:993) ~[jcifs-1.3.18.3.jar:?]
> at jcifs.smb.SmbFile.open(SmbFile.java:1010) ~[jcifs-1.3.18.3.jar:?]
> at jcifs.smb.SmbFileOutputStream.<init>(SmbFileOutputStream.java:142) ~[jcifs-1.3.18.3.jar:?]
> at jcifs.smb.TransactNamedPipeOutputStream.<init>(TransactNamedPipeOutputStream.java:32) ~[jcifs-1.3.18.3.jar:?]
> at jcifs.smb.SmbNamedPipe.getNamedPipeOutputStream(SmbNamedPipe.java:187) ~[jcifs-1.3.18.3.jar:?]
> at jcifs.dcerpc.DcerpcPipeHandle.doSendFragment(DcerpcPipeHandle.java:68) ~[jcifs-1.3.18.3.jar:?]
> at jcifs.dcerpc.DcerpcHandle.sendrecv(DcerpcHandle.java:190) ~[jcifs-1.3.18.3.jar:?]
> at jcifs.dcerpc.DcerpcHandle.bind(DcerpcHandle.java:126) ~[jcifs-1.3.18.3.jar:?]
> at jcifs.dcerpc.DcerpcHandle.sendrecv(DcerpcHandle.java:140) ~[jcifs-1.3.18.3.jar:?]
> at jcifs.smb.SmbFile.getShareSecurity(SmbFile.java:2951) ~[jcifs-1.3.18.3.jar:?]
> at org.apache.manifoldcf.crawler.connectors.sharedrive.SharedDriveConnector.getFileShareSecurity(SharedDriveConnector.java:2438) [mcf-jcifs-connector.jar:?]
> at org.apache.manifoldcf.crawler.connectors.sharedrive.SharedDriveConnector.getFileShareSecuritySet(SharedDriveConnector.java:1221) [mcf-jcifs-connector.jar:?]
> at org.apache.manifoldcf.crawler.connectors.sharedrive.SharedDriveConnector.processDocuments(SharedDriveConnector.java:627) [mcf-jcifs-connector.jar:?]
> at org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:399) [mcf-pull-agent.jar:?]
> WARN 2018-10-30T09:22:10,359 (Worker thread '27') - Tika Server: Tika Server rejects: Tika Server rejected document with the following reason: Unprocessable Entity
> WARN 2018-10-30T09:22:13,932 (Worker thread '12') - Tika Server: Tika Server rejects: Tika Server rejected document with the following reason: Unprocessable Entity
> WARN 2018-10-30T09:22:14,274 (Worker thread '23') - Tika Server: Tika Server rejects: Tika Server rejected document with the following reason: Unprocessable Entity
> WARN 2018-10-30T09:22:19,933 (Worker thread '8') - Tika
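
When a log like the one quoted above mixes many Tika Server rejections with an occasional JCIFS warning, it can help to tally the WARN entries by source before deciding which one matters. The following is only a rough sketch: it assumes each entry sits on a single line in manifoldcf.log (the line breaks in the quoted excerpt come from mail wrapping) and follows the layout seen in this thread. The function name count_warnings and the sample list are hypothetical and are not part of ManifoldCF.

```python
import re
from collections import Counter

# Layout assumed from the excerpt in this thread:
# WARN <timestamp> (Worker thread '<n>') - <source>: <message>
WARN_RE = re.compile(
    r"^WARN\s+(?P<ts>\S+)\s+\(Worker thread '(?P<thread>\d+)'\)\s+-\s+(?P<source>[^:]+):"
)


def count_warnings(lines):
    """Count WARN entries per source prefix, e.g. 'Tika Server' or 'JCIFS'."""
    counts = Counter()
    for line in lines:
        match = WARN_RE.match(line)
        if match:
            counts[match.group("source").strip()] += 1
    return counts


# Three entries copied from the excerpt (unwrapped), plus one non-WARN line
# that the pattern is expected to skip.
sample = [
    "WARN 2018-10-30T09:21:31,440 (Worker thread '2') - Tika Server: Tika Server rejects: "
    "Tika Server rejected document with the following reason: Unprocessable Entity",
    "WARN 2018-10-30T09:21:33,502 (Worker thread '14') - Tika Server: Tika Server rejects: "
    "Tika Server rejected document with the following reason: Unprocessable Entity",
    "WARN 2018-10-30T09:22:04,511 (Worker thread '3') - JCIFS: Possibly transient exception "
    "detected on attempt 1 while getting share security: All pipe instances are busy.",
    "jcifs.smb.SmbException: All pipe instances are busy.",
]

if __name__ == "__main__":
    for source, n in count_warnings(sample).most_common():
        print(f"{source}: {n}")
```

To run it against a real file, pass the open file object instead of the sample list, for example count_warnings(open("manifoldcf.log")). The counts only show which warnings are frequent; as noted earlier in the thread, the report showing when documents will be retried is still the place to confirm whether the job is really stuck.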

Re: web connector : links extraction issues

2018-10-30 Thread Olivier Tavard
Hi Karl,

Thanks for your answer.
I kept looking into this and found the problem. The JavaScript code inside the 
<script> tags contained the character '<'. When it does, link extraction does 
not work with the web connector.
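
For comparison, here is a minimal sketch of the kind of page involved, checked with Python's standard html.parser, which treats the body of a script element as raw text. The page below is a hypothetical stand-in, not the page from this message, and the code does not reproduce the web connector's own parser; it only shows that the link after the script block is recoverable when script contents are skipped as raw text.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect href values from <a> start tags."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


# Hypothetical test page: a script block containing '<', followed by a link.
PAGE = """<html>
<head>
<script type="text/javascript">
  var a = 1, b = 2;
  if (a < b) { document.title = "test"; }
</script>
</head>
<body>
<a href="page2.html">next page</a>
</body>
</html>"""


def extract_links(html):
    """Return the href values found in the given HTML text."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links


if __name__ == "__main__":
    print(extract_links(PAGE))
```

A parser that instead scans the whole document for '<' without special handling of script contents could plausibly lose track at the comparison operator, which would be consistent with the behaviour described here.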

To reproduce it, I created the following page, hosted on a local Apache server, 
and indexed it with MCF 2.11 out of the box.

In the first example, the page was:



test