[jira] [Comment Edited] (TIKA-2776) Tika server child restart

2018-11-28 Thread Tim Allison (JIRA)


[ 
https://issues.apache.org/jira/browse/TIKA-2776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16699108#comment-16699108
 ] 

Tim Allison edited comment on TIKA-2776 at 11/28/18 2:20 PM:
-

Three cheers for logging, and thank you for your patience in configuring those!

Yes, exactly!  It looks like the child process restarted at 2018-11-26 13:18:26 
{{2018-11-26 13:18:26 INFO  MetadataResource:431 - meta 
(application/vnd.openxmlformats}} and then processed more files successfully.  
It can take a few seconds for the server to restart, and {{manifoldcf.log}} 
shows that the initial connectivity dropped at 13:18:25, followed by problems 
logged through the end of 13:18:26 with worker threads unable to reach the 
server.  This is expected.  Are the clients (worker threads 88, 39, 8, 86, 87, 
982, 99, 75, 12) able to sleep and retry after failed connectivity, or do they 
just try once and give up?  
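
To make the "sleep and retry" idea concrete, here is a minimal sketch of what a client-side retry could look like. It is not ManifoldCF's code; the function name {{call_with_retry}}, the attempt count and the delays are all assumptions chosen for illustration.

```python
import time


def call_with_retry(send, attempts=5, first_delay=1.0, sleep=time.sleep):
    """Call send() until it succeeds, sleeping between failed attempts.

    send is any zero-argument callable (hypothetical here) that raises
    OSError on a connection failure; "connection reset" and "connection
    refused" are both OSError subclasses in Python.
    """
    delay = first_delay
    for attempt in range(1, attempts + 1):
        try:
            return send()
        except OSError:
            if attempt == attempts:
                raise  # out of retries: surface the last failure
            sleep(delay)
            delay *= 2  # back off so a restarting server gets time to come up
```

With delays that double from one second, five attempts cover roughly fifteen seconds of downtime, which should be more than the few seconds a child restart appears to take in these logs.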

As a side note, if you add a header telling tika-server what the file name is, 
that filename will be included in the log message so you can figure out which 
file caused the timeout.  

See: https://wiki.apache.org/tika/TikaJAXRS ... in short, add the header to 
your request:
{{"Content-Disposition: attachment; filename=foo.csv"}}
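
As a rough illustration, a client could attach that header as sketched below. The URL {{http://localhost:9998/meta}} is an assumption (port 9998 and the {{meta}} resource appear in the logs above, the host does not), and {{build_tika_request}} is a hypothetical helper, not part of any Tika client.

```python
import urllib.request


def build_tika_request(data, filename, url="http://localhost:9998/meta"):
    """Build a PUT request that tells tika-server the file name.

    The default url is an assumption for illustration; adjust host, port
    and endpoint to match your own tika-server.
    """
    req = urllib.request.Request(url, data=data, method="PUT")
    # Same header as above; the server can then log the name with the request.
    req.add_header("Content-Disposition", f"attachment; filename={filename}")
    return req


# Building the request does not contact the server; sending it would be:
#   with urllib.request.urlopen(build_tika_request(payload, "foo.csv")) as r:
#       print(r.read())
```

File names containing spaces or quotes would need extra quoting, which this sketch does not attempt.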

Some reasons for timeouts: the VM is overtaxed and processing is just slow; 
there is an infinite loop in a parser (these are rare, but they will happen); 
or OCR is running, which can take minutes per document (do you have tesseract 
installed?).




was (Author: talli...@mitre.org):
Three cheers for logging, and thank you for your patience in configuring those!

Yes, exactly!  It looks like the child process restarted at 2018-11-26 13:18:26 
{{2018-11-26 13:18:26 INFO  MetadataResource:431 - meta 
(application/vnd.openxmlformats}} and then processed more files successfully.  
It can take a few seconds for the server to restart, and {{manifoldcf.log}} 
shows that the initial connectivity dropped at 13:18:25, followed by problems 
logged through the end of 13:18:26 with worker threads unable to reach the 
server.  This is expected.  Are the clients (worker threads 88, 39, 8, 86, 87, 
982, 99, 75, 12) able to sleep and retry after failed connectivity, or do they 
just try once and give up?  

As a side note, if you add a header telling tika-server what the file name is, 
that filename will be included in the log message so you can figure out which 
file caused the timeout.  

See: https://wiki.apache.org/tika/TikaJAXRS ... in short, add the header to 
your request:
{{"Content-Disposition: attachment; filename=foo.csv"}}

Some reasons for timeouts: the VM is overtaxed and processing is just slow; 
there is an infinite loop in a parser (these are rare, but they can happen); 
or OCR is running, which can take minutes per document (do you have tesseract 
installed?).



> Tika server child restart
> -
>
> Key: TIKA-2776
> URL: https://issues.apache.org/jira/browse/TIKA-2776
> Project: Tika
>  Issue Type: Bug
>Reporter: Mario Bisonti
>Assignee: Tim Allison
>Priority: Blocker
> Fix For: 2.0.0, 1.20
>
> Attachments: Log.zip, MCF_JOB.png, log4j.xml, log4j_child.xml, 
> log4j_child.xml, man_tika.zip, tikalogchild.log
>
>
> Hallo.
> I use tika server standalone started with the option:
> java -jar /opt/tika/tika-server-1.19.1.jar -spawnChild
> I use ManifoldCF and Solr to index files using tika server.
> Indexing crashes continuously because I get many:
> Tika down, retrying: Connection reset
> etc.
> I suspect that, when a process is restarted, the client crashes as mentioned 
> here:
> _If the child process is in the process of shutting down, and it gets a new 
> request it will return 503 -- Service Unavailable. If the server times out on 
> a file, the client will receive an IOException from the closed socket. Note 
> that all other files that are being processed will end with an IOException 
> from a closed socket when the child process shuts down; e.g. if you send 
> three files to tika-server concurrently, and one of them causes a 
> catastrophic problem requiring the child to shut down, you won't be able to 
> tell which file caused the problems. In the future, we may implement a 
> gentler shutdown than we currently have._
> as reported here https://wiki.apache.org/tika/TikaJAXRS
> How could I work around it?
> Thanks a lot
> Mario



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (TIKA-2776) Tika server child restart

2018-11-23 Thread Tim Allison (JIRA)


[ 
https://issues.apache.org/jira/browse/TIKA-2776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16697134#comment-16697134
 ] 

Tim Allison edited comment on TIKA-2776 at 11/23/18 1:33 PM:
-

Ugh...looks like the {{append=false}} overwrote the child logs -- logger did as 
we asked :( -- which means that we're missing the critical points: the child 
log is missing between the end of tikalogchild.log.1 (12:57) and the beginning 
of tikalogchild.log (13:34).

Let's change this line to "true":
{noformat}
 
{noformat}

From the MCF_client_log, tika was unavailable at 13:24, 13:26, 13:29, 13:31 
and 13:34.  Can you tell from the MCF logs if any files were processed between 
13:24 and 13:34?  Were any files processed after 13:34?




was (Author: talli...@mitre.org):
Ugh...looks like the append=false forces the logs to overwrite, which means 
that we're missing the critical points: the child log is missing between the 
end of tikalogchild.log.1 (12:57) and the beginning of tikalogchild.log (13:34).

Let's change this line to "true":
{noformat}
 
{noformat}

From the MCF_client_log, tika was unavailable at 13:24, 13:26, 13:29, 13:31 
and 13:34.  Can you tell from the MCF logs if any files were processed between 
13:24 and 13:34?  Were any files processed after 13:34?



> Tika server child restart
> -
>
> Key: TIKA-2776
> URL: https://issues.apache.org/jira/browse/TIKA-2776
> Project: Tika
>  Issue Type: Bug
>Reporter: Mario Bisonti
>Assignee: Tim Allison
>Priority: Blocker
> Fix For: 2.0.0, 1.20
>
> Attachments: Log.zip, log4j.xml, log4j_child.xml, log4j_child.xml, 
> tikalogchild.log


[jira] [Comment Edited] (TIKA-2776) Tika server child restart

2018-11-23 Thread Mario Bisonti (JIRA)


[ 
https://issues.apache.org/jira/browse/TIKA-2776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16695874#comment-16695874
 ] 

Mario Bisonti edited comment on TIKA-2776 at 11/23/18 8:46 AM:
---

Hallo Tim, I am finally able to generate the log.

Today I started a processing job from my client to parse files with Tika.

It started processing at 8:30 a.m., and at 13:34 it crashed, as you can see in 
MCF_Client.log.

I see that Tika created the log tikalogchild.log1 and wrote to it at 12:57, 
and then created a new log, tikalogchild.log, at 13:34, I suppose when the 
child was restarted.

So I suppose the client crashed because of this restart?

I attach the three files in Log.zip.

Could you help me understand how to solve this issue?

I am using tika-server-1.20-20181114.215706-48.jar

Thanks a lot.

Mario

[^Log.zip]


was (Author: bisontim):
Hallo Tim, now I am able to generate the log, finally.

Today, I started a processing from my client to parse with tika.

It started to process at 8:30 a.m. and at 13:34 as you see in the MCF_Client.log

I see that Tika created the log tikalogchild.log1 and wrote to it at 12:57, 
and then created a new log, tikalogchild.log, at 13:34, I suppose when the 
child was restarted.

So I suppose the client crashed because of this restart?

I attach the three files in Log.zip.

Could you help me understand how to solve this issue?

I am using tika-server-1.20-20181114.215706-48.jar

Thanks a lot.

Mario

[^Log.zip]

> Tika server child restart
> -
>
> Key: TIKA-2776
> URL: https://issues.apache.org/jira/browse/TIKA-2776
> Project: Tika
>  Issue Type: Bug
>Reporter: Mario Bisonti
>Assignee: Tim Allison
>Priority: Blocker
> Fix For: 2.0.0, 1.20
>
> Attachments: Log.zip, log4j.xml, log4j_child.xml, log4j_child.xml, 
> tikalogchild.log


[jira] [Comment Edited] (TIKA-2776) Tika server child restart

2018-11-22 Thread Mario Bisonti (JIRA)


[ 
https://issues.apache.org/jira/browse/TIKA-2776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16695874#comment-16695874
 ] 

Mario Bisonti edited comment on TIKA-2776 at 11/22/18 12:52 PM:


Hallo Tim, now I am able to generate the log, finally.

Today, I started a processing from my client to parse with tika.

It started to process at 8:30 a.m. and at 13:34 as you see in the MCF_Client.log

I see that Tika created the log tikalogchild.log1 and wrote to it at 12:57, 
and then created a new log, tikalogchild.log, at 13:34, I suppose when the 
child was restarted.

So I suppose the client crashed because of this restart?

I attach the three files in Log.zip.

Could you help me understand how to solve this issue?

I am using tika-server-1.20-20181114.215706-48.jar

Thanks a lot.

Mario

[^Log.zip]


was (Author: bisontim):
Hallo Tim, now I am able to generate the log, finally.

Today, I started a processing from my client to parse with tika.

It started to process at 8:30 a.m. and at 13:34 as you see in the MCF_Client.log

I see that Tika created the log tikalogchild.log1 and wrote to it at 12:57, 
and then created a new log, tikalogchild.log, at 13:34, I suppose when the 
child was restarted.

So I suppose the client crashed because of this restart?

I attach the three files in Log.zip.

Could you help me understand how to solve this issue?

I am using tika-server-1.20-20181114.215706-48.jar

Thanks a lot.

Mario

[^Log.zip]

> Tika server child restart
> -
>
> Key: TIKA-2776
> URL: https://issues.apache.org/jira/browse/TIKA-2776
> Project: Tika
>  Issue Type: Bug
>Reporter: Mario Bisonti
>Assignee: Tim Allison
>Priority: Blocker
> Fix For: 2.0.0, 1.20
>
> Attachments: Log.zip, log4j.xml, log4j_child.xml, log4j_child.xml, 
> tikalogchild.log


[jira] [Comment Edited] (TIKA-2776) Tika server child restart

2018-11-21 Thread Tim Allison (JIRA)


[ 
https://issues.apache.org/jira/browse/TIKA-2776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16694671#comment-16694671
 ] 

Tim Allison edited comment on TIKA-2776 at 11/21/18 12:57 PM:
--

If you're using 1.19.1, you've hit TIKA-2785. The temporary fix is to have the 
ConsoleAppender write to stderr:
{noformat}

 
 
 
 
 
{noformat}
I tested this with 1.19.1 on Windows, and I had success.
{noformat}
java -jar tika-server-1.19.1.jar -JDlog4j.configuration=file:log4j_child.xml -spawnChild
{noformat}
If you have an interest in testing the new mechanism, grab a nightly build of 
tika-server from, e.g. 
[here|https://builds.apache.org/job/tika-branch-1x/131/org.apache.tika$tika-server/artifact/org.apache.tika/tika-server/1.20-20181120.215531-52/tika-server-1.20-20181120.215531-52.jar],
 and you can use the log file as you had it configured. :D


was (Author: talli...@mitre.org):
If you're using 1.19.1, you've hit TIKA-2785.  The temporary fix is to have the 
ConsoleAppender write to stderr:

{noformat}

 
 
 
 
 
{noformat}

I tested this with 1.19.1 on Windows, and I had success.  If you have an 
interest in testing the new mechanism, grab a nightly build of tika-server 
from, e.g. 
[here|https://builds.apache.org/job/tika-branch-1x/131/org.apache.tika$tika-server/artifact/org.apache.tika/tika-server/1.20-20181120.215531-52/tika-server-1.20-20181120.215531-52.jar],
 and you can use the log file as you had it configured. :D

> Tika server child restart
> -
>
> Key: TIKA-2776
> URL: https://issues.apache.org/jira/browse/TIKA-2776
> Project: Tika
>  Issue Type: Bug
>Reporter: Mario Bisonti
>Assignee: Tim Allison
>Priority: Blocker
> Fix For: 2.0.0, 1.20
>
> Attachments: log4j.xml, log4j_child.xml


[jira] [Comment Edited] (TIKA-2776) Tika server child restart

2018-11-19 Thread Mario Bisonti (JIRA)


[ 
https://issues.apache.org/jira/browse/TIKA-2776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16691675#comment-16691675
 ] 

Mario Bisonti edited comment on TIKA-2776 at 11/19/18 1:05 PM:
---

Hallo.

After 5 hours of processing files, the client calling Tika server reported 
these errors:

WARN 2018-11-19T13:47:17,888 (Worker thread '56') - Service interruption 
reported for job 1533797717712 connection 'WinShare': Tika down, retrying: 
Connect to hostanmeubuntu:9998 [hostanmeubuntu/172.16.1.135] failed: Connection 
refused (Connection refused)
 WARN 2018-11-19T13:47:18,006 (Worker thread '96') - Service interruption 
reported for job 1533797717712 connection 'WinShare': Tika down, retrying: 
Connect to hostanmeubuntu:9998 [hostanmeubuntu/172.16.1.135] failed: Connection 
refused (Connection refused)
 WARN 2018-11-19T13:47:18,006 (Worker thread '20') - Service interruption 
reported for job 1533797717712 connection 'WinShare': Tika down, retrying: 
Connect to hostanmeubuntu:9998 [hostanmeubuntu/172.16.1.135] failed: Connection 
refused (Connection refused)
 WARN 2018-11-19T13:47:18,071 (Worker thread '26') - Service interruption 
reported for job 1533797717712 connection 'WinShare': Tika down, retrying: 
Connect to hostanmeubuntu:9998 [hostanmeubuntu/172.16.1.135] failed: Connection 
refused (Connection refused)
 WARN 2018-11-19T13:47:18,116 (Worker thread '27') - JCIFS: Possibly transient 
exception detected on attempt 1 while getting share security: All pipe 
instances are busy.

Perhaps it could be due to:
 _If the server times out on a file, the client will receive an IOException 
from the closed socket. Note that all other files that are being processed will 
end with an IOException from a closed socket when the child process shuts down; 
e.g. if you send three files to tika-server concurrently, and one of them 
causes a catastrophic problem requiring the child to shut down, you won't be 
able to tell which file caused the problems. In the future, we may implement a 
gentler shutdown than we currently have._

Could a gentler shutdown perhaps solve the problem?

Thanks

Mario


was (Author: bisontim):
Hallo.

After 5 hours of processing files, the client calling Tika server reported 
these errors:

WARN 2018-11-19T13:47:17,888 (Worker thread '56') - Service interruption 
reported for job 1533797717712 connection 'WinShare': Tika down, retrying: 
Connect to hostanmeubuntu:9998 [hostanmeubuntu/172.16.1.135] failed: Connection 
refused (Connection refused)
 WARN 2018-11-19T13:47:18,006 (Worker thread '96') - Service interruption 
reported for job 1533797717712 connection 'WinShare': Tika down, retrying: 
Connect to hostanmeubuntu:9998 [hostanmeubuntu/172.16.1.135] failed: Connection 
refused (Connection refused)
 WARN 2018-11-19T13:47:18,006 (Worker thread '20') - Service interruption 
reported for job 1533797717712 connection 'WinShare': Tika down, retrying: 
Connect to hostanmeubuntu:9998 [hostanmeubuntu/172.16.1.135] failed: Connection 
refused (Connection refused)
 WARN 2018-11-19T13:47:18,071 (Worker thread '26') - Service interruption 
reported for job 1533797717712 connection 'WinShare': Tika down, retrying: 
Connect to hostanmeubuntu:9998 [hostanmeubuntu/172.16.1.135] failed: Connection 
refused (Connection refused)
 WARN 2018-11-19T13:47:18,116 (Worker thread '27') - JCIFS: Possibly transient 
exception detected on attempt 1 while getting share security: All pipe 
instances are busy.

Perhaps it could be due to:
_If the server times out on a file, the client will receive an IOException from 
the closed socket. Note that all other files that are being processed will end 
with an IOException from a closed socket when the child process shuts down; 
e.g. if you send three files to tika-server concurrently, and one of them 
causes a catastrophic problem requiring the child to shut down, you won't be 
able to tell which file caused the problems. In the future, we may implement a 
gentler shutdown than we currently have._

Could a gentler shutdown perhaps solve the problem?

Thanks

Mario

> Tika server child restart
> -
>
> Key: TIKA-2776
> URL: https://issues.apache.org/jira/browse/TIKA-2776
> Project: Tika
>  Issue Type: Bug
>Reporter: Mario Bisonti
>Assignee: Tim Allison
>Priority: Blocker
> Fix For: 2.0.0, 1.20
>
> Attachments: log4j.xml, log4j_child.xml

[jira] [Comment Edited] (TIKA-2776) Tika server child restart

2018-11-14 Thread Mario Bisonti (JIRA)


[ 
https://issues.apache.org/jira/browse/TIKA-2776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16686689#comment-16686689
 ] 

Mario Bisonti edited comment on TIKA-2776 at 11/14/18 4:06 PM:
---

Hallo Tim.
 # The error "Caused by: java.lang.OutOfMemoryError: Java heap space" happened 
when I tried to use Tika, launching
 java -jar /opt/tika/tika-server-1.19.1.jar
 so WITHOUT the option "-spawnChild".
 # When you said _The other thing you might want to do...if you aren't 
already...is add a {{waitForServer}} loop along the lines of what I did in 
TikaServerIntegrationTest...for when your client hits a 503._ 
 Do you mean that this code should go in the client that calls tika server, 
 in my case ManifoldCF?
 If so, I will forward your suggestion to the ManifoldCF owner.
 # I have now started tika server on my Windows host with the option 
-spawnChild, to separate ManifoldCF-Solr from Tika server, and the job has been 
running for 5 hours without a crash!
 Note that on my Windows host I use:
 java -version
 java version "1.8.0_92"
 Java(TM) SE Runtime Environment (build 1.8.0_92-b14)
 Java HotSpot(TM) 64-Bit Server VM (build 25.92-b14, mixed mode)
 In contrast, on the Ubuntu host, which runs ManifoldCF-Solr and where, before 
this split test, I ran the tika server with the job that stopped repeatedly, 
I use:
 java -version
openjdk version "10.0.2" 2018-07-17
OpenJDK Runtime Environment (build 10.0.2+13-Ubuntu-1ubuntu0.18.04.3)
OpenJDK 64-Bit Server VM (build 10.0.2+13-Ubuntu-1ubuntu0.18.04.3, mixed mode)
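
For reference, the {{waitForServer}} idea from point 2 could look roughly like the sketch below on the client side. This is not the code from TikaServerIntegrationTest and not ManifoldCF code; the function names, the 30 second limit and the polling interval are assumptions for illustration only.

```python
import time
import urllib.error
import urllib.request


def http_probe(url):
    """Return True if the server answers the URL with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=5) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Covers HTTP errors such as 503 as well as refused/reset connections.
        return False


def wait_for_server(url, max_wait=30.0, interval=1.0,
                    probe=http_probe, sleep=time.sleep, clock=time.monotonic):
    """Poll until probe(url) succeeds; return False once max_wait has passed."""
    deadline = clock() + max_wait
    while True:
        if probe(url):
            return True
        if clock() >= deadline:
            return False
        sleep(interval)
```

A client could call something like {{wait_for_server("http://localhost:9998/tika")}} (host and port are placeholders) after a 503 or a dropped connection, and resend the file only once it returns True.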

Do you know of any issue with the Java version that tika server runs on?

Thanks a lot.

Mario


was (Author: bisontim):
Hallo Tim.
 # The error "Caused by: java.lang.OutOfMemoryError: Java heap space" happened 
when I tried to use Tika, launching
 java -jar /opt/tika/tika-server-1.19.1.jar
 so WITHOUT the option "-spawnChild".
 # When you said _The other thing you might want to do...if you aren't 
already...is add a {{waitForServer}} loop along the lines of what I did in 
TikaServerIntegrationTest...for when your client hits a 503._ 
 Do you mean that this code should go in the client that calls tika server, 
 in my case ManifoldCF?
 If so, I will forward your suggestion to the ManifoldCF owner.
 # I have now started tika server on my Windows host with the option 
-spawnChild, to separate ManifoldCF-Solr from Tika server, and the job has been 
running for 5 hours without a crash!
 Note that on my Windows host I use:
 java -version
 java version "1.8.0_92"
 Java(TM) SE Runtime Environment (build 1.8.0_92-b14)
 Java HotSpot(TM) 64-Bit Server VM (build 25.92-b14, mixed mode)
 In contrast, on the Ubuntu host, which runs ManifoldCF-Solr and where, before 
this split test, I ran the tika server with the job that stopped repeatedly, 
I use:
java -version

openjdk version "10.0.2" 2018-07-17
 OpenJDK Runtime Environment (build 10.0.2+13-Ubuntu-1ubuntu0.18.04.3)
 OpenJDK 64-Bit Server VM (build 10.0.2+13-Ubuntu-1ubuntu0.18.04.3, mixed mode)

 

Do you know of any issue with the Java version that tika server runs on?

Thanks a lot.

Mario

> Tika server child restart
> -
>
> Key: TIKA-2776
> URL: https://issues.apache.org/jira/browse/TIKA-2776
> Project: Tika
>  Issue Type: Bug
>Reporter: Mario Bisonti
>Priority: Major
> Attachments: log4j.xml, log4j_child.xml


[jira] [Comment Edited] (TIKA-2776) Tika server child restart

2018-11-14 Thread Mario Bisonti (JIRA)


[ 
https://issues.apache.org/jira/browse/TIKA-2776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16686689#comment-16686689
 ] 

Mario Bisonti edited comment on TIKA-2776 at 11/14/18 4:04 PM:
---

Hallo Tim.
 # The error "Caused by: java.lang.OutOfMemoryError: Java heap space" happened 
when I tried to use Tika, launching
 java -jar /opt/tika/tika-server-1.19.1.jar
 so WITHOUT the option "-spawnChild".
 # When you said _The other thing you might want to do...if you aren't 
already...is add a {{waitForServer}} loop along the lines of what I did in 
TikaServerIntegrationTest...for when your client hits a 503._ 
 Do you mean that this code should go in the client that calls tika server, 
 in my case ManifoldCF?
 If so, I will forward your suggestion to the ManifoldCF owner.
 # I have now started tika server on my Windows host with the option 
-spawnChild, to separate ManifoldCF-Solr from Tika server, and the job has been 
running for 5 hours without a crash!
 Note that on my Windows host I use:
 java -version
 java version "1.8.0_92"
 Java(TM) SE Runtime Environment (build 1.8.0_92-b14)
 Java HotSpot(TM) 64-Bit Server VM (build 25.92-b14, mixed mode)
 In contrast, on the Ubuntu host, which runs ManifoldCF-Solr and where, before 
this split test, I ran the tika server with the job that stopped repeatedly, 
I use:
java -version

openjdk version "10.0.2" 2018-07-17
 OpenJDK Runtime Environment (build 10.0.2+13-Ubuntu-1ubuntu0.18.04.3)
 OpenJDK 64-Bit Server VM (build 10.0.2+13-Ubuntu-1ubuntu0.18.04.3, mixed mode)

 

Do you know of any issue with the Java version that tika server runs on?

Thanks a lot.

Mario


was (Author: bisontim):
Hallo Tim.
 # The error "Caused by: java.lang.OutOfMemoryError: Java heap space" happened 
when I tried to use Tika, launching
 java -jar /opt/tika/tika-server-1.19.1.jar
 so WITHOUT the option "-spawnChild".
 # When you said _The other thing you might want to do...if you aren't 
already...is add a {{waitForServer}} loop along the lines of what I did in 
TikaServerIntegrationTest...for when your client hits a 503._ 
 Do you mean that this code should go in the client that calls tika server, 
 in my case ManifoldCF?
 If so, I will forward your suggestion to the ManifoldCF owner.
 # I have now started tika server on my Windows host with the option 
-spawnChild, to separate ManifoldCF-Solr from Tika server, and the job has been 
running for 5 hours without a crash!
 Note that on my Windows host I use:
 java -version
 java version "1.8.0_92"
 Java(TM) SE Runtime Environment (build 1.8.0_92-b14)
 Java HotSpot(TM) 64-Bit Server VM (build 25.92-b14, mixed mode)

In contrast, on the Ubuntu host, which runs ManifoldCF-Solr and where, before 
this split test, I ran the tika server with the job that stopped repeatedly, 
I use:
java -version
openjdk version "10.0.2" 2018-07-17
OpenJDK Runtime Environment (build 10.0.2+13-Ubuntu-1ubuntu0.18.04.3)
OpenJDK 64-Bit Server VM (build 10.0.2+13-Ubuntu-1ubuntu0.18.04.3, mixed mode)

 

Do you know of any issue with the Java version that tika server runs on?

Thanks a lot.

Mario

> Tika server child restart
> -
>
> Key: TIKA-2776
> URL: https://issues.apache.org/jira/browse/TIKA-2776
> Project: Tika
>  Issue Type: Bug
>Reporter: Mario Bisonti
>Priority: Major
> Attachments: log4j.xml, log4j_child.xml


[jira] [Comment Edited] (TIKA-2776) Tika server child restart

2018-11-14 Thread Mario Bisonti (JIRA)


[ 
https://issues.apache.org/jira/browse/TIKA-2776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16686689#comment-16686689
 ] 

Mario Bisonti edited comment on TIKA-2776 at 11/14/18 3:27 PM:
---

Hallo Tim.
 # The error "Caused by: java.lang.OutOfMemoryError: Java heap space" happened 
when I tried to use Tika, launching
 java -jar /opt/tika/tika-server-1.19.1.jar
 so WITHOUT the option "-spawnChild".
 # When you said _The other thing you might want to do...if you aren't 
already...is add a {{waitForServer}} loop along the lines of what I did in 
TikaServerIntegrationTest...for when your client hits a 503._ 
 Do you mean that this code should go in the client that calls tika server, 
 in my case ManifoldCF?
 If so, I will forward your suggestion to the ManifoldCF owner.
 # I have now started tika server on my Windows host with the option 
-spawnChild, to separate ManifoldCF-Solr from Tika server, and the job has been 
running for 5 hours without a crash!
 Note that on my Windows host I use:
 java -version
 java version "1.8.0_92"
 Java(TM) SE Runtime Environment (build 1.8.0_92-b14)
 Java HotSpot(TM) 64-Bit Server VM (build 25.92-b14, mixed mode)

instead, in the Ubuntu host, where there are ManildCF-Solr and where I used, 
before this test of splitting, the tika server with the job that stpped 
repeatedly, I use:
java -version
openjdk version "10.0.2" 2018-07-17
OpenJDK Runtime Environment (build 10.0.2+13-Ubuntu-1ubuntu0.18.04.3)
OpenJDK 64-Bit Server VM (build 10.0.2+13-Ubuntu-1ubuntu0.18.04.3, mixed mode)

 

Do you know if there is any issue about the java version where tika server runs?

 

Thanks a lot a lot.

 

Mario
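
For reference, a {{waitForServer}}-style loop on the client side, as raised in point 2, could look roughly like the sketch below. It is not the code from TikaServerIntegrationTest. It assumes that a GET on the server's {{/tika}} endpoint answers 200 once the server is ready, and it uses {{localhost:9998}} only because that address appears in the ManifoldCF log. The names {{WaitForServerSketch}}, {{StatusProbe}} and {{httpProbe}} are hypothetical.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;

final class WaitForServerSketch {

    /** Hypothetical probe: returns an HTTP status code, or throws if the server is unreachable. */
    interface StatusProbe {
        int status() throws IOException;
    }

    /**
     * Polls the probe until it reports 200 or maxAttempts is exhausted.
     * Returns true if the server answered 200, false otherwise.
     */
    static boolean waitForServer(StatusProbe probe, int maxAttempts, long sleepMillis) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                if (probe.status() == 200) {
                    return true;
                }
                // 503 or another status: the child is probably still restarting
            } catch (IOException e) {
                // connection refused or reset: the server is not reachable yet
            }
            if (attempt < maxAttempts) {
                try {
                    Thread.sleep(sleepMillis);
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt();
                    return false;
                }
            }
        }
        return false;
    }

    /** Probe that issues a plain GET against the given URL (assumed readiness endpoint). */
    static StatusProbe httpProbe(String url) {
        return () -> {
            HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
            conn.setRequestMethod("GET");
            conn.setConnectTimeout(2000);
            conn.setReadTimeout(2000);
            try {
                return conn.getResponseCode();
            } finally {
                conn.disconnect();
            }
        };
    }

    public static void main(String[] args) {
        // Assumed address and endpoint; adjust to the actual deployment.
        boolean up = waitForServer(httpProbe("http://localhost:9998/tika"), 30, 1000);
        System.out.println(up ? "tika-server is up" : "tika-server did not come back");
    }
}
```

The client would call {{waitForServer}} after a 503 or a dropped connection and only resubmit documents once it returns true.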

 


> Tika server child restart
> -
>
> Key: TIKA-2776
> URL: https://issues.apache.org/jira/browse/TIKA-2776
> Project: Tika
>  Issue Type: Bug
>Reporter: Mario Bisonti
>Priority: Major
> Attachments: log4j.xml, log4j_child.xml
>
>
> Hello.
> I use tika server standalone started with the option:
> java -jar /opt/tika/tika-server-1.19.1.jar -spawnChild
> I use ManifoldCF and Solr to index files using tika server.
> Indexing keeps crashing because I get many errors like:
> Tika down, retrying: Connection reset
> etc.
> I suspect that the client crashes when the child process is restarted, as 
> mentioned here:
> _If the child process is in the process of shutting down, and it gets a new 
> request it will return 503 -- Service Unavailable. If the server times out on 
> a file, the client will receive an IOException from the closed socket. Note 
> that all other files that are being processed will end with an IOException 
> from a closed socket when the child process shuts down; e.g. if you send 
> three files to tika-server concurrently, and one of them causes a 
> catastrophic problem requiring the child to shut down, you won't be able to 
> tell which file caused the problems. In the future, we may implement a 
> gentler shutdown than we currently have._
> as reported here https://wiki.apache.org/tika/TikaJAXRS
> How could I work around it?
> Thanks a lot
> Mario





[jira] [Comment Edited] (TIKA-2776) Tika server child restart

2018-11-14 Thread Mario Bisonti (JIRA)


[ 
https://issues.apache.org/jira/browse/TIKA-2776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16686689#comment-16686689
 ] 

Mario Bisonti edited comment on TIKA-2776 at 11/14/18 3:26 PM:
---

Hello Tim.
 # The error "Caused by: java.lang.OutOfMemoryError: Java heap space" happened 
when I tried to use Tika, launching
 java -jar /opt/tika/tika-server-1.19.1.jar
 so WITHOUT the option "-spawnChild".
 # When you said _The other thing you might want to do...if you aren't 
already...is add a {{waitForServer}} loop along the lines of what I did in 
TikaServerIntegrationTest...for when your client hits a 503._ 
 Do you mean that the code you mention should go in the client that calls tika 
server?
 In my case, ManifoldCF?
 If yes, I will forward your suggestion to the ManifoldCF owner.
 # I have now started tika server on my Windows host, with the option 
-spawnChild, to separate ManifoldCF-Solr from Tika server, and the job has been 
running for 5 hours without a crash!
 Note that on my Windows host I use:
 java -version
 java version "1.8.0_92"
 Java(TM) SE Runtime Environment (build 1.8.0_92-b14)
 Java HotSpot(TM) 64-Bit Server VM (build 25.92-b14, mixed mode)

On the Ubuntu host, by contrast, which runs ManifoldCF-Solr and where, before 
this split test, I also ran the tika server with the job that stopped 
repeatedly, I use:
 java -version
 openjdk version "10.0.2" 2018-07-17
 OpenJDK Runtime Environment (build 10.0.2+13-Ubuntu-1ubuntu0.18.04.3)
 OpenJDK 64-Bit Server VM (build 10.0.2+13-Ubuntu-1ubuntu0.18.04.3, mixed mode)

Do you know of any issue with the Java version that tika server runs on?

Thanks a lot.

 

Mario

 




[jira] [Comment Edited] (TIKA-2776) Tika server child restart

2018-11-14 Thread Tim Allison (JIRA)


[ 
https://issues.apache.org/jira/browse/TIKA-2776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16686522#comment-16686522
 ] 

Tim Allison edited comment on TIKA-2776 at 11/14/18 2:53 PM:
-

bq. I am not so expert on logging.. be patient please 

Ha!  No problem at all! 

bq. I configures log4j.xml and log4j_child.xml as in the attachment...

I _think_ I just fixed this in TIKA-2782.  For now, you have to avoid 
{{debug="true"}}







[jira] [Comment Edited] (TIKA-2776) Tika server child restart

2018-11-13 Thread Tim Allison (JIRA)


[ 
https://issues.apache.org/jira/browse/TIKA-2776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16685881#comment-16685881
 ] 

Tim Allison edited comment on TIKA-2776 at 11/13/18 11:16 PM:
--

The limitation on STDOUT is a serious (but trivial-to-fix) bug: TIKA-2782

 

I wonder if that's causing your problems?




[jira] [Comment Edited] (TIKA-2776) Tika server child restart

2018-11-13 Thread Tim Allison (JIRA)


[ 
https://issues.apache.org/jira/browse/TIKA-2776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16685640#comment-16685640
 ] 

Tim Allison edited comment on TIKA-2776 at 11/13/18 7:29 PM:
-

{quote}For me is very difficult to investigate why tika server child is 
restarted/crashed. Is there any way to log Tika server?
{quote}
You should be able to use log4j for the parent process as you would expect: 
{{-Dlog4j.configuration=file:log4j.xml}}. You can select 
between {{info}} and {{debug}} when you start the server: {{-log info}}

To specify JVM arguments for the child process, prefix them with {{-J}}, e.g. 
{{-JDlog4j.configuration=file:log4j_child.xml}}.
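
Putting these options together, a launcher could assemble the full command line as in the sketch below. The class and method names are hypothetical. The jar path is the one from the issue description, the two configuration file names follow the examples above, and the placement of the arguments (JVM options before {{-jar}}, tika-server options after the jar) is an assumption based on normal {{java}} usage.

```java
import java.util.ArrayList;
import java.util.List;

final class TikaServerLaunchSketch {

    /** Builds the argument list for starting tika-server with separate parent/child log4j configs. */
    static List<String> buildCommand(String jarPath, String parentLog4j,
                                     String childLog4j, String logLevel) {
        List<String> cmd = new ArrayList<>();
        cmd.add("java");
        // JVM argument for the parent process
        cmd.add("-Dlog4j.configuration=file:" + parentLog4j);
        cmd.add("-jar");
        cmd.add(jarPath);
        // tika-server options
        cmd.add("-spawnChild");
        // -J prefix: the rest is handed to the child JVM as -Dlog4j.configuration=...
        cmd.add("-JDlog4j.configuration=file:" + childLog4j);
        cmd.add("-log");
        cmd.add(logLevel);
        return cmd;
    }

    public static void main(String[] args) {
        List<String> cmd = buildCommand(
                "/opt/tika/tika-server-1.19.1.jar", "log4j.xml", "log4j_child.xml", "info");
        System.out.println(String.join(" ", cmd));
        // To actually start the server one could use:
        // new ProcessBuilder(cmd).inheritIO().start();
    }
}
```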




[jira] [Comment Edited] (TIKA-2776) Tika server child restart

2018-11-13 Thread Tim Allison (JIRA)


[ 
https://issues.apache.org/jira/browse/TIKA-2776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16685640#comment-16685640
 ] 

Tim Allison edited comment on TIKA-2776 at 11/13/18 7:28 PM:
-

bq. For me is very difficult to investigate why tika server child is 
restarted/crashed. Is there any way to log Tika server?

You should be able to use log4j for the parent process as you would expect: 
{{-Dlog4j.configuration=file:log4j.xml}}.  You can select between {{info}} 
and {{debug}} when you start the server: {{-log info}}

To configure logging in the child process, prefix the argument with {{-J}}, e.g. 
{{-JDlog4j.configuration=file:log4j_child.xml}}.




[jira] [Comment Edited] (TIKA-2776) Tika server child restart

2018-11-13 Thread Mario Bisonti (JIRA)


[ 
https://issues.apache.org/jira/browse/TIKA-2776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16684907#comment-16684907
 ] 

Mario Bisonti edited comment on TIKA-2776 at 11/13/18 9:01 AM:
---

Hallo Tim.
I have the response:
Error: Repeated service interruptions - failure processing document: The target server failed to respond

From the ManifoldCF side I read the log:
org.apache.manifoldcf.core.interfaces.ManifoldCFException: Repeated service interruptions - failure processing document: The target server failed to respond
    at org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:489) [mcf-pull-agent.jar:?]
Caused by: org.apache.http.NoHttpResponseException: The target server failed to respond
    at org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:141) ~[httpclient-4.5.6.jar:4.5.6]
    at org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:56) ~[httpclient-4.5.6.jar:4.5.6]
    at org.apache.http.impl.io.AbstractMessageParser.parse(AbstractMessageParser.java:259) ~[httpcore-4.4.10.jar:4.4.10]
    at org.apache.http.impl.DefaultBHttpClientConnection.receiveResponseHeader(DefaultBHttpClientConnection.java:163) ~[httpcore-4.4.10.jar:4.4.10]
    at org.apache.http.impl.conn.CPoolProxy.receiveResponseHeader(CPoolProxy.java:165) ~[httpclient-4.5.6.jar:4.5.6]
    at org.apache.http.protocol.HttpRequestExecutor.doReceiveResponse(HttpRequestExecutor.java:273) ~[httpcore-4.4.10.jar:4.4.10]
    at org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:125) ~[httpcore-4.4.10.jar:4.4.10]
    at org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:272) ~[httpclient-4.5.6.jar:4.5.6]
    at org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:185) ~[httpclient-4.5.6.jar:4.5.6]
    at org.apache.http.impl.execchain.RedirectExec.execute(RedirectExec.java:110) ~[httpclient-4.5.6.jar:4.5.6]
    at org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:185) ~[httpclient-4.5.6.jar:4.5.6]
    at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:118) ~[httpclient-4.5.6.jar:4.5.6]
    at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:56) ~[httpclient-4.5.6.jar:4.5.6]
    at org.apache.manifoldcf.agents.transformation.tikaservice.TikaExtractor.addOrReplaceDocumentWithException(TikaExtractor.java:608) ~[?:?]
    at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineAddEntryPoint.addOrReplaceDocumentWithException(IncrementalIngester.java:3226) ~[mcf-agents.jar:?]
    at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineAddFanout.sendDocument(IncrementalIngester.java:3077) ~[mcf-agents.jar:?]
    at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$MonitoredAddActivityWrapper.sendDocument(IncrementalIngester.java:3471) ~[mcf-agents.jar:?]
    at org.apache.manifoldcf.agents.transformation.documentfilter.DocumentFilter.addOrReplaceDocumentWithException(DocumentFilter.java:208) ~[?:?]
    at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineAddEntryPoint.addOrReplaceDocumentWithException(IncrementalIngester.java:3226) ~[mcf-agents.jar:?]
    at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineAddFanout.sendDocument(IncrementalIngester.java:3077) ~[mcf-agents.jar:?]
    at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineObjectWithVersions.addOrReplaceDocumentWithException(IncrementalIngester.java:2708) ~[mcf-agents.jar:?]
    at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester.documentIngest(IncrementalIngester.java:756) ~[mcf-agents.jar:?]
    at org.apache.manifoldcf.crawler.system.WorkerThread$ProcessActivity.ingestDocumentWithException(WorkerThread.java:1583) ~[mcf-pull-agent.jar:?]
    at org.apache.manifoldcf.crawler.system.WorkerThread$ProcessActivity.ingestDocumentWithException(WorkerThread.java:1548) ~[mcf-pull-agent.jar:?]
    at org.apache.manifoldcf.crawler.connectors.sharedrive.SharedDriveConnector.processDocuments(SharedDriveConnector.java:939) ~[?:?]
    at org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:399) ~[mcf-pull-agent.jar:?]
 WARN 2018-11-13T09:50:58,546 (Worker thread '48') - Service interruption 
reported for job 1533797717712 connection 'WinShare': Tika down, retrying: 
Connect to localhost:9998 [localhost/127.0.0.1, localhost/0:0:0:0:0:0:0:1] 
failed: Connection refused (Connection refused)
 WARN 2018-11-13T09:50:58,606 (Worker thread '34') - Service interruption 
reported for job 1533797717712 connection 'WinShare':