R: web crawler https

2023-09-26 Thread Bisonti Mario
Thanks a lot Karl!
I uploaded the SSL certificate, flagged it as “always trust”, and it works

Mario


From: Karl Wright
Sent: Monday, 25 September 2023 20:41
To: user@manifoldcf.apache.org
Subject: Re: web crawler https

See this article:

https://stackoverflow.com/questions/6784463/error-trustanchors-parameter-must-be-non-empty

ManifoldCF web crawler configuration allows you to drop certs into a local 
trust store for the connection.  You need to either do that (adding whatever 
certificate authority cert you think might be missing), or check the 
"trust https" checkbox.

You can generally debug what certs a site might need by trying to fetch a page 
with curl and using verbose debug mode.
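
(For illustration only, with the internal site name as a placeholder: curl's verbose output shows the certificate chain the server presents, and openssl can dump the issuing CA certificate that would then be added to the connection's certificate list:)

curl -v https://wordpress.example.internal/ -o /dev/null
# print every certificate the server sends; the issuing CA cert shown here is
# the one to save and upload into the web connection's trust store
openssl s_client -connect wordpress.example.internal:443 -showcerts </dev/null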

Karl


On Mon, Sep 25, 2023 at 10:48 AM Bisonti Mario <mario.biso...@vimar.com> wrote:
Hi,
I would like to try indexing an internal WordPress site.
I tried to configure a Web repository connection and a job with seeds, but I always obtain:

WARN 2023-09-25T16:31:50,905 (Worker thread '4') - Service interruption 
reported for job 1695649924581 connection 'Wp': IO exception 
(javax.net.ssl.SSLException)reading header: Unexpected error: 
java.security.InvalidAlgorithmParameterException: the trustAnchors parameter 
must be non-empty

How could I solve this?
Thanks a lot
Mario


web crawler https

2023-09-25 Thread Bisonti Mario
Hi,
I would like to try indexing an internal WordPress site.
I tried to configure a Web repository connection and a job with seeds, but I always obtain:

WARN 2023-09-25T16:31:50,905 (Worker thread '4') - Service interruption 
reported for job 1695649924581 connection 'Wp': IO exception 
(javax.net.ssl.SSLException)reading header: Unexpected error: 
java.security.InvalidAlgorithmParameterException: the trustAnchors parameter 
must be non-empty

How could I solve this?
Thanks a lot
Mario


Documentation issue?

2023-09-14 Thread Bisonti Mario
Hi, I would like to report that at the URL: 
https://manifoldcf.apache.org/release/release-2.25/en_US/index.html I obtain:

Not Found
The requested URL was not found on this server.

Thank you
Mario


R: Long Job on Windows Share

2023-06-07 Thread Bisonti Mario
In the manifoldcf.log I see many:
WARN 2023-06-05T21:36:51,630 (Worker thread '31') - JCIFS: Possibly transient 
exception detected on attempt 2 while getting share security: All pipe 
instances are busy.
jcifs.smb.SmbException: All pipe instances are busy.
at jcifs.smb.SmbTransportImpl.checkStatus2(SmbTransportImpl.java:1441) 
~[jcifs-ng-2.1.2.jar:?]
at jcifs.smb.SmbTransportImpl.checkStatus(SmbTransportImpl.java:1552) 
~[jcifs-ng-2.1.2.jar:?]
at jcifs.smb.SmbTransportImpl.sendrecv(SmbTransportImpl.java:1007) 
~[jcifs-ng-2.1.2.jar:?]
at jcifs.smb.SmbTransportImpl.send(SmbTransportImpl.java:1523) 
~[jcifs-ng-2.1.2.jar:?]


I don’t see any information about whether documents are reindexed or not.

Do I have to search in a different file?

Thanks a lot Mario


R: Long Job on Windows Share

2023-05-26 Thread Bisonti Mario
Thanks a lot Karl

In the “Simple History” in ManifoldCF I see, every day and for every document, 
even if it has not been modified:

26/05/23, 08:47:47 document ingest (SolrShare) 
file:/...Avanzato%202014.pptx
26/05/23, 08:47:46 extract [TikaTrasform]  
file:/...Avanzato%202014.pptx
26/05/23, 08:47:45 access  
file:/...Avanzato%202014.pptx


In Solr, I execute the query to search for the document and I see (omitting the 
extended result):

{
  "responseHeader":{
"status":0,
"QTime":977,
"params":{
  "q":"id:*Avanzato*202014*",
  "_":"1685082709862"}},
  "response":{"numFound":1,"start":0,"docs":[
  {
"id":file:/...Avanzato%202014.pptx,
"last_modified":"2015-03-25T17:27:22Z",
"resourcename":"...Avanzato 2014.pptx",

"content_type":["application/vnd.openxmlformats-officedocument.presentationml.presentation"],
"allow_token_document":["Active+Directory:S-1-5-21-…..",
  "Active+Directory:S-1-..."],
"deny_token_document":["Active+Directory:DEAD_AUTHORITY"],
"allow_token_share":["Active+Directory:S-1-1-0"],
"deny_token_share":["Active+Directory:DEAD_AUTHORITY"],
"deny_token_parent":["__nosecurity__"],
"allow_token_parent":["__nosecurity__"],
"content":["ESER..
"_version_":1766940934228934656}]
  }}


Is this what you meant when you mentioned the “activity log”?

I see that document in Solr, so I suppose that it is indexed.

What could I investigate further?
Thanks a lot

Mario



From: Karl Wright
Sent: Friday, 26 May 2023 07:20
To: user@manifoldcf.apache.org
Subject: Re: Long Job on Windows Share

The jcifs connector does not include a lot of information in the version string 
for a file - basically, the length, and the modified date.  So I would not 
expect there to be a lot of actual work involved if there are no changes to a 
document.

The activity "access" does imply that the system believes that the document 
does need to be reindexed.  It clearly reads the document properly.  I would 
check to be sure it actually indexes the document.  I suspect that your job may 
be reading the file but determining it is not suitable for indexing and then 
repeating that every day.  You can see this by looking for the document in the 
activity log to see what ManifoldCF decided to do with it.

Karl


On Thu, May 25, 2023 at 6:03 AM Bisonti Mario <mario.biso...@vimar.com> wrote:
Hi,
I would like to understand how recrawl works

My job scan, using the “Windows shares” connection type, runs for nearly 18 hours.
My document count is a little over 1 million.

If I check the documents scanned from ManifoldCF I see, for example:
[inline screenshot omitted]

It seems that it reworks the documents every day even if they haven’t been modified.
So, is this correct, or did I choose the wrong kind of job to crawl the documents?

Thanks a lot
Mario




Long Job on Windows Share

2023-05-25 Thread Bisonti Mario
Hi,
I would like to understand how recrawl works

My job scan, using the "Windows shares" connection type, runs for nearly 18 hours.
My document count is a little over 1 million.

If I check the documents scanned from ManifoldCF I see, for example:
[inline screenshot omitted]

It seems that it reworks the documents every day even if they haven't been modified.
So, is this correct, or did I choose the wrong kind of job to crawl the documents?

Thanks a lot
Mario




R: JCIFS: Possibly transient exception detected on attempt 1 while getting share security: All pipe instances are busy

2023-02-09 Thread Bisonti Mario

Hi, could you give me any suggestions to solve my issue?

I note that indexing 1 million documents (Office, PDF, etc.) with Tika 
finishes after nearly 18 hours.

My host is an Ubuntu server with 8 CPUs and 68 GB RAM.

Thanks a lot
Mario



From: Bisonti Mario
Sent: Wednesday, 1 February 2023 17:50
To: user@manifoldcf.apache.org
Subject: R: JCIFS: Possibly transient exception detected on attempt 1 while 
getting share security: All pipe instances are busy

I don't understand.

Would you please explain what "running with a profiler" means?

I start the agent by running the start-agents.sh script, and ZooKeeper too.

/opt/manifoldcf/multiprocess-zk-example-proprietary/runzookeeper.sh
/opt/manifoldcf/multiprocess-zk-example-proprietary/start-agents.sh

Where start-agents.sh is:

#!/bin/bash -e

cd /opt/manifoldcf/multiprocess-zk-example-proprietary/

if [ -e "$JAVA_HOME"/bin/java ] ; then
  if [ -f ./properties.xml ] ; then
    ./executecommand.sh -Dorg.apache.manifoldcf.processid=A org.apache.manifoldcf.agents.AgentRun
    exit $?
  else
    echo "Working directory contains no properties.xml file." 1>&2
    exit 1
  fi
else
  echo "Environment variable JAVA_HOME is not properly set." 1>&2
  exit 1
fi

Thanks a lot Karl.



From: Karl Wright <daddy...@gmail.com>
Sent: Wednesday, 1 February 2023 17:38
To: user@manifoldcf.apache.org
Subject: Re: JCIFS: Possibly transient exception detected on attempt 1 while 
getting share security: All pipe instances are busy

It looks like you are running with a profiler?  That uses a lot of memory.
Karl


On Wed, Feb 1, 2023 at 8:06 AM Bisonti Mario <mario.biso...@vimar.com> wrote:
This is my hs_err_pid_.log

Command Line: -Xms32768m -Xmx32768m 
-Dorg.apache.manifoldcf.configfile=./properties.xml 
-Djava.security.auth.login.config= -Dorg.apache.manifoldcf.processid=A org.apache.manifoldcf.agents.AgentRun

.
.
.
CodeHeap 'non-profiled nmethods': size=120032Kb used=23677Kb max_used=23677Kb 
free=96354Kb
CodeHeap 'profiled nmethods': size=120028Kb used=20405Kb max_used=27584Kb 
free=99622Kb
CodeHeap 'non-nmethods': size=5700Kb used=1278Kb max_used=1417Kb free=4421Kb
Memory: 4k page, physical 72057128k(7300332k free), swap 4039676k(4039676k free)
.
.

Perhaps it could be a RAM problem?

Thanks a lot




From: Bisonti Mario
Sent: Friday, 20 January 2023 10:28
To: user@manifoldcf.apache.org
Subject: R: JCIFS: Possibly transient exception detected on attempt 1 while 
getting share security: All pipe instances are busy

I see that the agent crashed:

#
# A fatal error has been detected by the Java Runtime Environment:
#
#  Internal Error (g1ConcurrentMark.cpp:1665), pid=2537463, tid=2537470
#  fatal error: Overflow during reference processing, can not continue. Please 
increase MarkStackSizeMax (current value: 16777216) and restart.
#
# JRE version: OpenJDK Runtime Environment (11.0.16+8) (build 
11.0.16+8-post-Ubuntu-0ubuntu120.04)
# Java VM: OpenJDK 64-Bit Server VM (11.0.16+8-post-Ubuntu-0ubuntu120.04, mixed 
mode, tiered, g1 gc, linux-amd64)
# Core dump will be written. Default location: Core dumps may be processed with 
"/usr/share/apport/apport -p%p -s%s -c%c -d%d -P%P -u%u -g%g -- %E" (or dumping 
to /opt/manifoldcf/multiprocess-zk-example-proprietary/core.2537463)
#
# If you would like to submit a bug report, please visit:
#   https://bugs.launchpad.net/ubuntu/+source/openjdk-lts
#

---  S U M M A R Y 

Command Line: -Xms32768m -Xmx32768m 
-Dorg.apache.manifoldcf.configfile=./properties.xml 
-Djava.security.auth.login.config= -Dorg.apache.manifoldcf.processid=A 
org.apache.manifoldcf.agents.AgentRun

Host: Intel(R) Xeon(R) Gold 6248 CPU @ 2.50GHz, 8 cores, 68G, Ubuntu 20.04.4 LTS
Time: Fri Jan 20 09:38:54 2023 CET elapsed time: 54532.106681 seconds (0d 15h 
8m 52s)

---  T H R E A D  ---

Current thread (0x7f051940a000):  VMThread "VM Thread" [stack: 
0x7f051c50a000,0x7f051c60a000] [id=2537470]

Stack: [0x7f051c50a000,0x7f051c60a000],  sp=0x7f051c608080,  free 
space=1016k
Native frames: (J=compiled Java code, A=aot compiled Java code, j=interpreted, 
Vv=VM code, C=native code)
V  [libjvm.so+0xe963a9]
V  [libjvm.so+0x67b504]
V  [libjvm.so+0x7604e6]


So, where could I change that parameter?
Is it an Agent configuration?

Thanks a lot
Mario
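
(For reference, a hedged sketch of where this limit could be raised, assuming the agent JVM options for this multiprocess-zk setup come from options.env.unix, as shown in the memory thread further down in this archive; the doubled value is only an example:)

# add to /opt/manifoldcf/multiprocess-zk-example-proprietary/options.env.unix
-XX:MarkStackSizeMax=33554432

# then restart the agents process so the new flag takes effect
/opt/manifoldcf/multiprocess-zk-example-proprietary/stop-agents.sh
/opt/manifoldcf/multiprocess-zk-example-proprietary/start-agents.sh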


From: Karl Wright

R: JCIFS: Possibly transient exception detected on attempt 1 while getting share security: All pipe instances are busy

2023-02-01 Thread Bisonti Mario
I don't understand.

Would you please explain what "running with a profiler" means?

I start the agent by running the start-agents.sh script, and ZooKeeper too.

/opt/manifoldcf/multiprocess-zk-example-proprietary/runzookeeper.sh
/opt/manifoldcf/multiprocess-zk-example-proprietary/start-agents.sh

Where start-agents.sh is:

#!/bin/bash -e

cd /opt/manifoldcf/multiprocess-zk-example-proprietary/

if [ -e "$JAVA_HOME"/bin/java ] ; then
  if [ -f ./properties.xml ] ; then
    ./executecommand.sh -Dorg.apache.manifoldcf.processid=A org.apache.manifoldcf.agents.AgentRun
    exit $?
  else
    echo "Working directory contains no properties.xml file." 1>&2
    exit 1
  fi
else
  echo "Environment variable JAVA_HOME is not properly set." 1>&2
  exit 1
fi

Thanks a lot Karl.



From: Karl Wright
Sent: Wednesday, 1 February 2023 17:38
To: user@manifoldcf.apache.org
Subject: Re: JCIFS: Possibly transient exception detected on attempt 1 while 
getting share security: All pipe instances are busy

It looks like you are running with a profiler?  That uses a lot of memory.
Karl


On Wed, Feb 1, 2023 at 8:06 AM Bisonti Mario <mario.biso...@vimar.com> wrote:
This is my hs_err_pid_.log

Command Line: -Xms32768m -Xmx32768m 
-Dorg.apache.manifoldcf.configfile=./properties.xml 
-Djava.security.auth.login.config= -Dorg.apache.manifoldcf.processid=A org.apache.manifoldcf.agents.AgentRun

.
.
.
CodeHeap 'non-profiled nmethods': size=120032Kb used=23677Kb max_used=23677Kb 
free=96354Kb
CodeHeap 'profiled nmethods': size=120028Kb used=20405Kb max_used=27584Kb 
free=99622Kb
CodeHeap 'non-nmethods': size=5700Kb used=1278Kb max_used=1417Kb free=4421Kb
Memory: 4k page, physical 72057128k(7300332k free), swap 4039676k(4039676k free)
.
.

Perhaps it could be a RAM problem?

Thanks a lot




From: Bisonti Mario
Sent: Friday, 20 January 2023 10:28
To: user@manifoldcf.apache.org
Subject: R: JCIFS: Possibly transient exception detected on attempt 1 while 
getting share security: All pipe instances are busy

I see that the agent crashed:

#
# A fatal error has been detected by the Java Runtime Environment:
#
#  Internal Error (g1ConcurrentMark.cpp:1665), pid=2537463, tid=2537470
#  fatal error: Overflow during reference processing, can not continue. Please 
increase MarkStackSizeMax (current value: 16777216) and restart.
#
# JRE version: OpenJDK Runtime Environment (11.0.16+8) (build 
11.0.16+8-post-Ubuntu-0ubuntu120.04)
# Java VM: OpenJDK 64-Bit Server VM (11.0.16+8-post-Ubuntu-0ubuntu120.04, mixed 
mode, tiered, g1 gc, linux-amd64)
# Core dump will be written. Default location: Core dumps may be processed with 
"/usr/share/apport/apport -p%p -s%s -c%c -d%d -P%P -u%u -g%g -- %E" (or dumping 
to /opt/manifoldcf/multiprocess-zk-example-proprietary/core.2537463)
#
# If you would like to submit a bug report, please visit:
#   https://bugs.launchpad.net/ubuntu/+source/openjdk-lts
#

---  S U M M A R Y 

Command Line: -Xms32768m -Xmx32768m 
-Dorg.apache.manifoldcf.configfile=./properties.xml 
-Djava.security.auth.login.config= -Dorg.apache.manifoldcf.processid=A 
org.apache.manifoldcf.agents.AgentRun

Host: Intel(R) Xeon(R) Gold 6248 CPU @ 2.50GHz, 8 cores, 68G, Ubuntu 20.04.4 LTS
Time: Fri Jan 20 09:38:54 2023 CET elapsed time: 54532.106681 seconds (0d 15h 
8m 52s)

---  T H R E A D  ---

Current thread (0x7f051940a000):  VMThread "VM Thread" [stack: 
0x7f051c50a000,0x7f051c60a000] [id=2537470]

Stack: [0x7f051c50a000,0x7f051c60a000],  sp=0x7f051c608080,  free 
space=1016k
Native frames: (J=compiled Java code, A=aot compiled Java code, j=interpreted, 
Vv=VM code, C=native code)
V  [libjvm.so+0xe963a9]
V  [libjvm.so+0x67b504]
V  [libjvm.so+0x7604e6]


So, where could I change that parameter?
Is it an Agent configuration?

Thanks a lot
Mario


From: Karl Wright <daddy...@gmail.com>
Sent: Wednesday, 18 January 2023 14:59
To: user@manifoldcf.apache.org
Subject: Re: JCIFS: Possibly transient exception detected on attempt 1 while 
getting share security: All pipe instances are busy

When you get a hang like this, getting a thread dump of the agents process is 
essential to figure out what the issue is.  You can't assume that a transient 
error would block anything because that's not how ManifoldCF works, at all.  
Errors push the document in question back on

R: JCIFS: Possibly transient exception detected on attempt 1 while getting share security: All pipe instances are busy

2023-02-01 Thread Bisonti Mario
This is my hs_err_pid_.log

Command Line: -Xms32768m -Xmx32768m 
-Dorg.apache.manifoldcf.configfile=./properties.xml 
-Djava.security.auth.login.config= -Dorg.apache.manifoldcf.processid=A org.apache.manifoldcf.agents.AgentRun

.
.
.
CodeHeap 'non-profiled nmethods': size=120032Kb used=23677Kb max_used=23677Kb 
free=96354Kb
CodeHeap 'profiled nmethods': size=120028Kb used=20405Kb max_used=27584Kb 
free=99622Kb
CodeHeap 'non-nmethods': size=5700Kb used=1278Kb max_used=1417Kb free=4421Kb
Memory: 4k page, physical 72057128k(7300332k free), swap 4039676k(4039676k free)
.
.

Perhaps it could be a RAM problem?

Thanks a lot




From: Bisonti Mario
Sent: Friday, 20 January 2023 10:28
To: user@manifoldcf.apache.org
Subject: R: JCIFS: Possibly transient exception detected on attempt 1 while 
getting share security: All pipe instances are busy

I see that the agent crashed:

#
# A fatal error has been detected by the Java Runtime Environment:
#
#  Internal Error (g1ConcurrentMark.cpp:1665), pid=2537463, tid=2537470
#  fatal error: Overflow during reference processing, can not continue. Please 
increase MarkStackSizeMax (current value: 16777216) and restart.
#
# JRE version: OpenJDK Runtime Environment (11.0.16+8) (build 
11.0.16+8-post-Ubuntu-0ubuntu120.04)
# Java VM: OpenJDK 64-Bit Server VM (11.0.16+8-post-Ubuntu-0ubuntu120.04, mixed 
mode, tiered, g1 gc, linux-amd64)
# Core dump will be written. Default location: Core dumps may be processed with 
"/usr/share/apport/apport -p%p -s%s -c%c -d%d -P%P -u%u -g%g -- %E" (or dumping 
to /opt/manifoldcf/multiprocess-zk-example-proprietary/core.2537463)
#
# If you would like to submit a bug report, please visit:
#   https://bugs.launchpad.net/ubuntu/+source/openjdk-lts
#

---  S U M M A R Y 

Command Line: -Xms32768m -Xmx32768m 
-Dorg.apache.manifoldcf.configfile=./properties.xml 
-Djava.security.auth.login.config= -Dorg.apache.manifoldcf.processid=A 
org.apache.manifoldcf.agents.AgentRun

Host: Intel(R) Xeon(R) Gold 6248 CPU @ 2.50GHz, 8 cores, 68G, Ubuntu 20.04.4 LTS
Time: Fri Jan 20 09:38:54 2023 CET elapsed time: 54532.106681 seconds (0d 15h 
8m 52s)

---  T H R E A D  ---

Current thread (0x7f051940a000):  VMThread "VM Thread" [stack: 
0x7f051c50a000,0x7f051c60a000] [id=2537470]

Stack: [0x7f051c50a000,0x7f051c60a000],  sp=0x7f051c608080,  free 
space=1016k
Native frames: (J=compiled Java code, A=aot compiled Java code, j=interpreted, 
Vv=VM code, C=native code)
V  [libjvm.so+0xe963a9]
V  [libjvm.so+0x67b504]
V  [libjvm.so+0x7604e6]


So, where could I change that parameter?
Is it an Agent configuration?

Thanks a lot
Mario


From: Karl Wright <daddy...@gmail.com>
Sent: Wednesday, 18 January 2023 14:59
To: user@manifoldcf.apache.org
Subject: Re: JCIFS: Possibly transient exception detected on attempt 1 while 
getting share security: All pipe instances are busy

When you get a hang like this, getting a thread dump of the agents process is 
essential to figure out what the issue is.  You can't assume that a transient 
error would block anything because that's not how ManifoldCF works, at all.  
Errors push the document in question back onto the queue with a retry time.
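
(A hedged sketch of how such a thread dump could be taken for the agents process in this setup; the pgrep pattern matches the AgentRun class seen in the command lines above, and the output path is arbitrary:)

AGENT_PID=$(pgrep -f org.apache.manifoldcf.agents.AgentRun)
jstack -l "$AGENT_PID" > /tmp/mcf-agents-threads.txt
# or, with a recent JDK:
jcmd "$AGENT_PID" Thread.print > /tmp/mcf-agents-threads.txt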

Karl


On Wed, Jan 18, 2023 at 6:15 AM Bisonti Mario <mario.biso...@vimar.com> wrote:
Hi Karl.
But I noted that the job was hanging: the processed-document count was stuck at 
the same number, with no further document processing from 6 a.m. until I restarted the agent.




From: Karl Wright <daddy...@gmail.com>
Sent: Wednesday, 18 January 2023 12:10
To: user@manifoldcf.apache.org
Subject: Re: JCIFS: Possibly transient exception detected on attempt 1 while 
getting share security: All pipe instances are busy

Hi, "Possibly transient issue" means that the error will be retried anyway, 
according to a schedule.  There should not need to be any requirement to shut 
down the agents process and restart.
Karl

On Wed, Jan 18, 2023 at 5:08 AM Bisonti Mario <mario.biso...@vimar.com> wrote:
Hi.
Often, I obtain the error:

WARN 2023-01-18T06:18:19,316 (Worker thread '89') - JCIFS: Possibly transient 
exception detected on attempt 1 while getting share security: All pipe 
instances are busy.
jcifs.smb.SmbException: All pipe instances are busy.
at jcifs.smb.SmbTransportImpl.checkStatus2(SmbTransportImpl.java:1441) 
~[jcifs-ng-2.1.2.jar:?]
at jcifs.smb.SmbTransportImpl.checkStatus(SmbTransportImpl.java:1552) 
~[jcifs-ng-2.1.2.jar:?]
at jcifs.smb.SmbTransportImpl.sendrecv(SmbTransportImpl.java:1007) 
~[jcifs-ng-2.1.2.jar:?]
at jcifs.smb.SmbTransportImpl.send(SmbTransportImpl.java:1523) 
~[jcifs-ng-2.1.2.jar:?]
at jcifs.smb.SmbSessionImpl.send(SmbSessionImpl.java:409) 
~[jcifs-ng-2.1.2.jar:?]
at jcifs.smb.SmbTreeImpl.send(SmbTreeImpl.java:472) 
~[

R: JCIFS: Possibly transient exception detected on attempt 1 while getting share security: All pipe instances are busy

2023-01-20 Thread Bisonti Mario
I see that the agent crashed:

#
# A fatal error has been detected by the Java Runtime Environment:
#
#  Internal Error (g1ConcurrentMark.cpp:1665), pid=2537463, tid=2537470
#  fatal error: Overflow during reference processing, can not continue. Please 
increase MarkStackSizeMax (current value: 16777216) and restart.
#
# JRE version: OpenJDK Runtime Environment (11.0.16+8) (build 
11.0.16+8-post-Ubuntu-0ubuntu120.04)
# Java VM: OpenJDK 64-Bit Server VM (11.0.16+8-post-Ubuntu-0ubuntu120.04, mixed 
mode, tiered, g1 gc, linux-amd64)
# Core dump will be written. Default location: Core dumps may be processed with 
"/usr/share/apport/apport -p%p -s%s -c%c -d%d -P%P -u%u -g%g -- %E" (or dumping 
to /opt/manifoldcf/multiprocess-zk-example-proprietary/core.2537463)
#
# If you would like to submit a bug report, please visit:
#   https://bugs.launchpad.net/ubuntu/+source/openjdk-lts
#

---  S U M M A R Y 

Command Line: -Xms32768m -Xmx32768m 
-Dorg.apache.manifoldcf.configfile=./properties.xml 
-Djava.security.auth.login.config= -Dorg.apache.manifoldcf.processid=A 
org.apache.manifoldcf.agents.AgentRun

Host: Intel(R) Xeon(R) Gold 6248 CPU @ 2.50GHz, 8 cores, 68G, Ubuntu 20.04.4 LTS
Time: Fri Jan 20 09:38:54 2023 CET elapsed time: 54532.106681 seconds (0d 15h 
8m 52s)

---  T H R E A D  ---

Current thread (0x7f051940a000):  VMThread "VM Thread" [stack: 
0x7f051c50a000,0x7f051c60a000] [id=2537470]

Stack: [0x7f051c50a000,0x7f051c60a000],  sp=0x7f051c608080,  free 
space=1016k
Native frames: (J=compiled Java code, A=aot compiled Java code, j=interpreted, 
Vv=VM code, C=native code)
V  [libjvm.so+0xe963a9]
V  [libjvm.so+0x67b504]
V  [libjvm.so+0x7604e6]


So, where could I change that parameter?
Is it an Agent configuration?

Thanks a lot
Mario


From: Karl Wright
Sent: Wednesday, 18 January 2023 14:59
To: user@manifoldcf.apache.org
Subject: Re: JCIFS: Possibly transient exception detected on attempt 1 while 
getting share security: All pipe instances are busy

When you get a hang like this, getting a thread dump of the agents process is 
essential to figure out what the issue is.  You can't assume that a transient 
error would block anything because that's not how ManifoldCF works, at all.  
Errors push the document in question back onto the queue with a retry time.

Karl


On Wed, Jan 18, 2023 at 6:15 AM Bisonti Mario <mario.biso...@vimar.com> wrote:
Hi Karl.
But I noted that the job was hanging: the processed-document count was stuck at 
the same number, with no further document processing from 6 a.m. until I restarted the agent.




From: Karl Wright <daddy...@gmail.com>
Sent: Wednesday, 18 January 2023 12:10
To: user@manifoldcf.apache.org
Subject: Re: JCIFS: Possibly transient exception detected on attempt 1 while 
getting share security: All pipe instances are busy

Hi, "Possibly transient issue" means that the error will be retried anyway, 
according to a schedule.  There should not need to be any requirement to shut 
down the agents process and restart.
Karl

On Wed, Jan 18, 2023 at 5:08 AM Bisonti Mario <mario.biso...@vimar.com> wrote:
Hi.
Often, I obtain the error:

WARN 2023-01-18T06:18:19,316 (Worker thread '89') - JCIFS: Possibly transient 
exception detected on attempt 1 while getting share security: All pipe 
instances are busy.
jcifs.smb.SmbException: All pipe instances are busy.
at jcifs.smb.SmbTransportImpl.checkStatus2(SmbTransportImpl.java:1441) 
~[jcifs-ng-2.1.2.jar:?]
at jcifs.smb.SmbTransportImpl.checkStatus(SmbTransportImpl.java:1552) 
~[jcifs-ng-2.1.2.jar:?]
at jcifs.smb.SmbTransportImpl.sendrecv(SmbTransportImpl.java:1007) 
~[jcifs-ng-2.1.2.jar:?]
at jcifs.smb.SmbTransportImpl.send(SmbTransportImpl.java:1523) 
~[jcifs-ng-2.1.2.jar:?]
at jcifs.smb.SmbSessionImpl.send(SmbSessionImpl.java:409) 
~[jcifs-ng-2.1.2.jar:?]
at jcifs.smb.SmbTreeImpl.send(SmbTreeImpl.java:472) 
~[jcifs-ng-2.1.2.jar:?]
at jcifs.smb.SmbTreeConnection.send0(SmbTreeConnection.java:399) 
~[jcifs-ng-2.1.2.jar:?]
at jcifs.smb.SmbTreeConnection.send(SmbTreeConnection.java:314) 
~[jcifs-ng-2.1.2.jar:?]
at jcifs.smb.SmbTreeConnection.send(SmbTreeConnection.java:294) 
~[jcifs-ng-2.1.2.jar:?]
at jcifs.smb.SmbTreeHandleImpl.send(SmbTreeHandleImpl.java:130) 
~[jcifs-ng-2.1.2.jar:?]
at jcifs.smb.SmbTreeHandleImpl.send(SmbTreeHandleImpl.java:117) 
~[jcifs-ng-2.1.2.jar:?]
at jcifs.smb.SmbFile.openUnshared(SmbFile.java:665) 
~[jcifs-ng-2.1.2.jar:?]
at jcifs.smb.SmbPipeHandleImpl.ensureOpen(SmbPipeHandleImpl.java:169) 
~[jcifs-ng-2.1.2.jar:?]
at jcifs.smb.SmbPipeHandleImpl.sendrecv(SmbPipeHandleImpl.java:250) 
~[jcifs-ng-2.1.2.jar:?]
at 
jcifs.dcerpc.DcerpcPipeHandle.doSendReceiveFragment(DcerpcPipeHandle.java:113) 
~[jcifs-ng-2.1.2.jar:?]
at jc

R: JCIFS: Possibly transient exception detected on attempt 1 while getting share security: All pipe instances are busy

2023-01-18 Thread Bisonti Mario
Hi Karl.
But I noted that the job was hanging: the processed-document count was stuck at 
the same number, with no further document processing from 6 a.m. until I restarted the agent.




From: Karl Wright
Sent: Wednesday, 18 January 2023 12:10
To: user@manifoldcf.apache.org
Subject: Re: JCIFS: Possibly transient exception detected on attempt 1 while 
getting share security: All pipe instances are busy

Hi, "Possibly transient issue" means that the error will be retried anyway, 
according to a schedule.  There should not need to be any requirement to shut 
down the agents process and restart.
Karl

On Wed, Jan 18, 2023 at 5:08 AM Bisonti Mario <mario.biso...@vimar.com> wrote:
Hi.
Often, I obtain the error:

WARN 2023-01-18T06:18:19,316 (Worker thread '89') - JCIFS: Possibly transient 
exception detected on attempt 1 while getting share security: All pipe 
instances are busy.
jcifs.smb.SmbException: All pipe instances are busy.
at jcifs.smb.SmbTransportImpl.checkStatus2(SmbTransportImpl.java:1441) 
~[jcifs-ng-2.1.2.jar:?]
at jcifs.smb.SmbTransportImpl.checkStatus(SmbTransportImpl.java:1552) 
~[jcifs-ng-2.1.2.jar:?]
at jcifs.smb.SmbTransportImpl.sendrecv(SmbTransportImpl.java:1007) 
~[jcifs-ng-2.1.2.jar:?]
at jcifs.smb.SmbTransportImpl.send(SmbTransportImpl.java:1523) 
~[jcifs-ng-2.1.2.jar:?]
at jcifs.smb.SmbSessionImpl.send(SmbSessionImpl.java:409) 
~[jcifs-ng-2.1.2.jar:?]
at jcifs.smb.SmbTreeImpl.send(SmbTreeImpl.java:472) 
~[jcifs-ng-2.1.2.jar:?]
at jcifs.smb.SmbTreeConnection.send0(SmbTreeConnection.java:399) 
~[jcifs-ng-2.1.2.jar:?]
at jcifs.smb.SmbTreeConnection.send(SmbTreeConnection.java:314) 
~[jcifs-ng-2.1.2.jar:?]
at jcifs.smb.SmbTreeConnection.send(SmbTreeConnection.java:294) 
~[jcifs-ng-2.1.2.jar:?]
at jcifs.smb.SmbTreeHandleImpl.send(SmbTreeHandleImpl.java:130) 
~[jcifs-ng-2.1.2.jar:?]
at jcifs.smb.SmbTreeHandleImpl.send(SmbTreeHandleImpl.java:117) 
~[jcifs-ng-2.1.2.jar:?]
at jcifs.smb.SmbFile.openUnshared(SmbFile.java:665) 
~[jcifs-ng-2.1.2.jar:?]
at jcifs.smb.SmbPipeHandleImpl.ensureOpen(SmbPipeHandleImpl.java:169) 
~[jcifs-ng-2.1.2.jar:?]
at jcifs.smb.SmbPipeHandleImpl.sendrecv(SmbPipeHandleImpl.java:250) 
~[jcifs-ng-2.1.2.jar:?]
at 
jcifs.dcerpc.DcerpcPipeHandle.doSendReceiveFragment(DcerpcPipeHandle.java:113) 
~[jcifs-ng-2.1.2.jar:?]
at jcifs.dcerpc.DcerpcHandle.sendrecv(DcerpcHandle.java:243) 
~[jcifs-ng-2.1.2.jar:?]
at jcifs.dcerpc.DcerpcHandle.bind(DcerpcHandle.java:216) 
~[jcifs-ng-2.1.2.jar:?]
at jcifs.dcerpc.DcerpcHandle.sendrecv(DcerpcHandle.java:234) 
~[jcifs-ng-2.1.2.jar:?]
at jcifs.smb.SmbFile.getShareSecurity(SmbFile.java:2337) 
~[jcifs-ng-2.1.2.jar:?]
at 
org.apache.manifoldcf.crawler.connectors.sharedrive.SharedDriveConnector.getFileShareSecurity(SharedDriveConnector.java:2468)
 [mcf-jcifs-connector.jar:?]
at 
org.apache.manifoldcf.crawler.connectors.sharedrive.SharedDriveConnector.getFileShareSecuritySet(SharedDriveConnector.java:1243)
 [mcf-jcifs-connector.jar:?]
at 
org.apache.manifoldcf.crawler.connectors.sharedrive.SharedDriveConnector.processDocuments(SharedDriveConnector.java:647)
 [mcf-jcifs-connector.jar:?]

So, I have to stop the agent and restart it, and then the crawling continues.

How could I solve my issue?
Thanks a lot.
Mario


JCIFS: Possibly transient exception detected on attempt 1 while getting share security: All pipe instances are busy

2023-01-18 Thread Bisonti Mario
Hi.
Often, I obtain the error:

WARN 2023-01-18T06:18:19,316 (Worker thread '89') - JCIFS: Possibly transient 
exception detected on attempt 1 while getting share security: All pipe 
instances are busy.
jcifs.smb.SmbException: All pipe instances are busy.
at jcifs.smb.SmbTransportImpl.checkStatus2(SmbTransportImpl.java:1441) 
~[jcifs-ng-2.1.2.jar:?]
at jcifs.smb.SmbTransportImpl.checkStatus(SmbTransportImpl.java:1552) 
~[jcifs-ng-2.1.2.jar:?]
at jcifs.smb.SmbTransportImpl.sendrecv(SmbTransportImpl.java:1007) 
~[jcifs-ng-2.1.2.jar:?]
at jcifs.smb.SmbTransportImpl.send(SmbTransportImpl.java:1523) 
~[jcifs-ng-2.1.2.jar:?]
at jcifs.smb.SmbSessionImpl.send(SmbSessionImpl.java:409) 
~[jcifs-ng-2.1.2.jar:?]
at jcifs.smb.SmbTreeImpl.send(SmbTreeImpl.java:472) 
~[jcifs-ng-2.1.2.jar:?]
at jcifs.smb.SmbTreeConnection.send0(SmbTreeConnection.java:399) 
~[jcifs-ng-2.1.2.jar:?]
at jcifs.smb.SmbTreeConnection.send(SmbTreeConnection.java:314) 
~[jcifs-ng-2.1.2.jar:?]
at jcifs.smb.SmbTreeConnection.send(SmbTreeConnection.java:294) 
~[jcifs-ng-2.1.2.jar:?]
at jcifs.smb.SmbTreeHandleImpl.send(SmbTreeHandleImpl.java:130) 
~[jcifs-ng-2.1.2.jar:?]
at jcifs.smb.SmbTreeHandleImpl.send(SmbTreeHandleImpl.java:117) 
~[jcifs-ng-2.1.2.jar:?]
at jcifs.smb.SmbFile.openUnshared(SmbFile.java:665) 
~[jcifs-ng-2.1.2.jar:?]
at jcifs.smb.SmbPipeHandleImpl.ensureOpen(SmbPipeHandleImpl.java:169) 
~[jcifs-ng-2.1.2.jar:?]
at jcifs.smb.SmbPipeHandleImpl.sendrecv(SmbPipeHandleImpl.java:250) 
~[jcifs-ng-2.1.2.jar:?]
at 
jcifs.dcerpc.DcerpcPipeHandle.doSendReceiveFragment(DcerpcPipeHandle.java:113) 
~[jcifs-ng-2.1.2.jar:?]
at jcifs.dcerpc.DcerpcHandle.sendrecv(DcerpcHandle.java:243) 
~[jcifs-ng-2.1.2.jar:?]
at jcifs.dcerpc.DcerpcHandle.bind(DcerpcHandle.java:216) 
~[jcifs-ng-2.1.2.jar:?]
at jcifs.dcerpc.DcerpcHandle.sendrecv(DcerpcHandle.java:234) 
~[jcifs-ng-2.1.2.jar:?]
at jcifs.smb.SmbFile.getShareSecurity(SmbFile.java:2337) 
~[jcifs-ng-2.1.2.jar:?]
at 
org.apache.manifoldcf.crawler.connectors.sharedrive.SharedDriveConnector.getFileShareSecurity(SharedDriveConnector.java:2468)
 [mcf-jcifs-connector.jar:?]
at 
org.apache.manifoldcf.crawler.connectors.sharedrive.SharedDriveConnector.getFileShareSecuritySet(SharedDriveConnector.java:1243)
 [mcf-jcifs-connector.jar:?]
at 
org.apache.manifoldcf.crawler.connectors.sharedrive.SharedDriveConnector.processDocuments(SharedDriveConnector.java:647)
 [mcf-jcifs-connector.jar:?]

So, I have to stop the agent and restart it, and then the crawling continues.

How could I solve my issue?
Thanks a lot.
Mario


R: Error: Repeated service interruptions - failure processing document: Read timed out

2021-09-30 Thread Bisonti Mario
Additional info.

I am using 2.17-dev version



From: Bisonti Mario
Sent: Tuesday, 28 September 2021 17:01
To: user@manifoldcf.apache.org
Subject: Error: Repeated service interruptions - failure processing document: 
Read timed out

Hello

I have an error on a job that parses a network folder.

This is the Tika error:
2021-09-28 16:14:50 INFO  Server:415 - Started @1367ms
2021-09-28 16:14:50 WARN  ContextHandler:1671 - Empty contextPath
2021-09-28 16:14:50 INFO  ContextHandler:916 - Started 
o.e.j.s.h.ContextHandler@3dd69f5a{/,null,AVAILABLE}
2021-09-28 16:14:50 INFO  TikaServerCli:413 - Started Apache Tika server at 
http://sengvivv02.vimar.net:9998/
2021-09-28 16:15:04 INFO  MetadataResource:484 - meta (application/pdf)
2021-09-28 16:26:46 INFO  MetadataResource:484 - meta (application/pdf)
2021-09-28 16:26:46 INFO  TikaResource:484 - tika (application/pdf)
2021-09-28 16:27:23 INFO  MetadataResource:484 - meta (application/pdf)
2021-09-28 16:27:24 INFO  TikaResource:484 - tika (application/pdf)
2021-09-28 16:27:26 INFO  MetadataResource:484 - meta (application/pdf)
2021-09-28 16:27:26 INFO  TikaResource:484 - tika (application/pdf)
2021-09-28 16:30:28 WARN  PhaseInterceptorChain:468 - Interceptor for 
{http://resource.server.tika.apache.org/}MetadataResource has thrown exception, 
unwinding now
org.apache.cxf.interceptor.Fault: Could not send Message.
at 
org.apache.cxf.interceptor.MessageSenderInterceptor$MessageSenderEndingInterceptor.handleMessage(MessageSenderInterceptor.java:67)
at 
org.apache.cxf.phase.PhaseInterceptorChain.doIntercept(PhaseInterceptorChain.java:308)
at 
org.apache.cxf.interceptor.OutgoingChainInterceptor.handleMessage(OutgoingChainInterceptor.java:90)
at 
org.apache.cxf.phase.PhaseInterceptorChain.doIntercept(PhaseInterceptorChain.java:308)
at 
org.apache.cxf.transport.ChainInitiationObserver.onMessage(ChainInitiationObserver.java:121)
at 
org.apache.cxf.transport.http.AbstractHTTPDestination.invoke(AbstractHTTPDestination.java:265)
at 
org.apache.cxf.transport.http_jetty.JettyHTTPDestination.doService(JettyHTTPDestination.java:247)
at 
org.apache.cxf.transport.http_jetty.JettyHTTPHandler.handle(JettyHTTPHandler.java:79)
at 
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)
at 
org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:235)
at 
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1435)
at 
org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:190)
at 
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1350)
at 
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
at 
org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:191)
at 
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)
at org.eclipse.jetty.server.Server.handle(Server.java:516)
at 
org.eclipse.jetty.server.HttpChannel.lambda$handle$1(HttpChannel.java:388)
at org.eclipse.jetty.server.HttpChannel.dispatch(HttpChannel.java:633)
at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:380)
at 
org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:277)
at 
org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:311)
at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:105)
at org.eclipse.jetty.io.ChannelEndPoint$1.run(ChannelEndPoint.java:104)
at 
org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:882)
at 
org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:1036)
at java.base/java.lang.Thread.run(Thread.java:834)
Caused by: org.eclipse.jetty.io.EofException
at org.eclipse.jetty.io.ChannelEndPoint.flush(ChannelEndPoint.java:279)
at org.eclipse.jetty.io.WriteFlusher.flush(WriteFlusher.java:422)
at org.eclipse.jetty.io.WriteFlusher.write(WriteFlusher.java:277)
at 
org.eclipse.jetty.io.AbstractEndPoint.write(AbstractEndPoint.java:381)
at 
org.eclipse.jetty.server.HttpConnection$SendCallback.process(HttpConnection.java:826)
at 
org.eclipse.jetty.util.IteratingCallback.processing(IteratingCallback.java:241)
at 
org.eclipse.jetty.util.IteratingCallback.iterate(IteratingCallback.java:223)
at org.eclipse.jetty.server.HttpConnection.send(HttpConnection.java:550)
at 
org.eclipse.jetty.server.HttpChannel.sendResponse(HttpChannel.java:915)
at org.eclipse.jetty.server.HttpChannel.write(HttpChannel.java:987)
at org.eclipse.jetty.server.HttpOutput.channelWrite(HttpOutput.java:285)
at org.eclipse.jetty.server.HttpOutput
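
(The log above shows the Tika server answering /meta and /tika requests at http://sengvivv02.vimar.net:9998/. As a hedged check that the server itself is still responsive, independent of ManifoldCF, it can be queried directly; the sample file name is a placeholder:)

curl -s http://sengvivv02.vimar.net:9998/version
curl -s -T sample.pdf -H "Accept: text/plain" http://sengvivv02.vimar.net:9998/tika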

Error: Repeated service interruptions - failure processing document: Read timed out

2021-09-28 Thread Bisonti Mario
Hello

I have an error on a job that parses a network folder.

This is the Tika error:
2021-09-28 16:14:50 INFO  Server:415 - Started @1367ms
2021-09-28 16:14:50 WARN  ContextHandler:1671 - Empty contextPath
2021-09-28 16:14:50 INFO  ContextHandler:916 - Started 
o.e.j.s.h.ContextHandler@3dd69f5a{/,null,AVAILABLE}
2021-09-28 16:14:50 INFO  TikaServerCli:413 - Started Apache Tika server at 
http://sengvivv02.vimar.net:9998/
2021-09-28 16:15:04 INFO  MetadataResource:484 - meta (application/pdf)
2021-09-28 16:26:46 INFO  MetadataResource:484 - meta (application/pdf)
2021-09-28 16:26:46 INFO  TikaResource:484 - tika (application/pdf)
2021-09-28 16:27:23 INFO  MetadataResource:484 - meta (application/pdf)
2021-09-28 16:27:24 INFO  TikaResource:484 - tika (application/pdf)
2021-09-28 16:27:26 INFO  MetadataResource:484 - meta (application/pdf)
2021-09-28 16:27:26 INFO  TikaResource:484 - tika (application/pdf)
2021-09-28 16:30:28 WARN  PhaseInterceptorChain:468 - Interceptor for 
{http://resource.server.tika.apache.org/}MetadataResource has thrown exception, 
unwinding now
org.apache.cxf.interceptor.Fault: Could not send Message.
at 
org.apache.cxf.interceptor.MessageSenderInterceptor$MessageSenderEndingInterceptor.handleMessage(MessageSenderInterceptor.java:67)
at 
org.apache.cxf.phase.PhaseInterceptorChain.doIntercept(PhaseInterceptorChain.java:308)
at 
org.apache.cxf.interceptor.OutgoingChainInterceptor.handleMessage(OutgoingChainInterceptor.java:90)
at 
org.apache.cxf.phase.PhaseInterceptorChain.doIntercept(PhaseInterceptorChain.java:308)
at 
org.apache.cxf.transport.ChainInitiationObserver.onMessage(ChainInitiationObserver.java:121)
at 
org.apache.cxf.transport.http.AbstractHTTPDestination.invoke(AbstractHTTPDestination.java:265)
at 
org.apache.cxf.transport.http_jetty.JettyHTTPDestination.doService(JettyHTTPDestination.java:247)
at 
org.apache.cxf.transport.http_jetty.JettyHTTPHandler.handle(JettyHTTPHandler.java:79)
at 
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)
at 
org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:235)
at 
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1435)
at 
org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:190)
at 
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1350)
at 
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
at 
org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:191)
at 
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)
at org.eclipse.jetty.server.Server.handle(Server.java:516)
at 
org.eclipse.jetty.server.HttpChannel.lambda$handle$1(HttpChannel.java:388)
at org.eclipse.jetty.server.HttpChannel.dispatch(HttpChannel.java:633)
at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:380)
at 
org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:277)
at 
org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:311)
at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:105)
at org.eclipse.jetty.io.ChannelEndPoint$1.run(ChannelEndPoint.java:104)
at 
org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:882)
at 
org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:1036)
at java.base/java.lang.Thread.run(Thread.java:834)
Caused by: org.eclipse.jetty.io.EofException
at org.eclipse.jetty.io.ChannelEndPoint.flush(ChannelEndPoint.java:279)
at org.eclipse.jetty.io.WriteFlusher.flush(WriteFlusher.java:422)
at org.eclipse.jetty.io.WriteFlusher.write(WriteFlusher.java:277)
at 
org.eclipse.jetty.io.AbstractEndPoint.write(AbstractEndPoint.java:381)
at 
org.eclipse.jetty.server.HttpConnection$SendCallback.process(HttpConnection.java:826)
at 
org.eclipse.jetty.util.IteratingCallback.processing(IteratingCallback.java:241)
at 
org.eclipse.jetty.util.IteratingCallback.iterate(IteratingCallback.java:223)
at org.eclipse.jetty.server.HttpConnection.send(HttpConnection.java:550)
at 
org.eclipse.jetty.server.HttpChannel.sendResponse(HttpChannel.java:915)
at org.eclipse.jetty.server.HttpChannel.write(HttpChannel.java:987)
at org.eclipse.jetty.server.HttpOutput.channelWrite(HttpOutput.java:285)
at org.eclipse.jetty.server.HttpOutput.close(HttpOutput.java:638)
at 
org.apache.cxf.transport.http_jetty.JettyHTTPDestination$JettyOutputStream.close(JettyHTTPDestination.java:329)
at 

R: Memory problem on Agent ?

2020-10-05 Thread Bisonti Mario
I think you are referring to options.env.unix?

sudo nano options.env.unix
-Xmx32768m
-Xmx32768m




From: Karl Wright
Sent: Friday, 2 October 2020 17:31
To: user@manifoldcf.apache.org
Subject: Re: Memory problem on Agent ?

Please check your -Xmx switch.

Memory will not be released because that is not how Java works.  It allocates 
the memory it needs and periodically garbage collects within that.  You have 
given it too much memory and you should not expect Java to release it ever.  
The solution is to give it less.  A rule of thumb is to leave 10gb free for 
system usage and divide the remainder among your Java processes.
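
(Applying that rule of thumb to the roughly 70 GB host described in this thread, a hedged example would be to give the agents JVM a smaller heap in options.env.unix, leaving room for the OS and the other Java processes such as ZooKeeper, Solr and Tika; the figures below are illustrative only:)

# options.env.unix (multiprocess-zk example), example values only
-Xms16384m
-Xmx16384m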

Thanks,
Karl


On Fri, Oct 2, 2020 at 11:21 AM Bisonti Mario <mario.biso...@vimar.com> wrote:
Yes, but it seems that, when the indexing finishes, the memory is not released.


From: Karl Wright <daddy...@gmail.com>
Sent: Friday, 2 October 2020 17:14
To: user@manifoldcf.apache.org
Subject: Re: Memory problem on Agent ?

Hi Mario,

Java processes only use the memory you hand them.

It looks like you are handing Java more memory than your machine has.

This will not work.

Karl


On Fri, Oct 2, 2020 at 10:45 AM Bisonti Mario <mario.biso...@vimar.com> wrote:

Hello.

When I scan the content of the repository, I note that the memory used is very 
high and it isn’t released,

i.e. 60 GB out of 70 GB available.

I tried to free it by shutting down the agent, but I am not able to:

/opt/manifoldcf/multiprocess-zk-example-proprietary/stop-agents.sh
OpenJDK 64-Bit Server VM warning: INFO: os::commit_memory(0x7f4d5800, 
34359738368, 0) failed; error='Not enough space' (errno=12)
#
# There is insufficient memory for the Java Runtime Environment to continue.
# Native memory allocation (mmap) failed to map 34359738368 bytes for 
committing reserved memory.
# An error report file with more information is saved as:
# /opt/manifoldcf/multiprocess-zk-example-proprietary/hs_err_pid2796.log

So, to free memory, I have to restart the server
How could I solve this?

Thanks a lot
Mario



R: Memory problem on Agent ?

2020-10-02 Thread Bisonti Mario
Yes, but it seems that, when the indexing finishes, the memory is not released.


From: Karl Wright
Sent: Friday, 2 October 2020 17:14
To: user@manifoldcf.apache.org
Subject: Re: Memory problem on Agent ?

Hi Mario,

Java processes only use the memory you hand them.

It looks like you are handing Java more memory than your machine has.

This will not work.

Karl


On Fri, Oct 2, 2020 at 10:45 AM Bisonti Mario <mario.biso...@vimar.com> wrote:

Hello.

When I scan the content of the repository, I note that the memory used is very 
high and it isn’t released,

i.e. 60 GB out of 70 GB available.

I tried to free it by shutting down the agent, but I am not able to:

/opt/manifoldcf/multiprocess-zk-example-proprietary/stop-agents.sh
OpenJDK 64-Bit Server VM warning: INFO: os::commit_memory(0x7f4d5800, 
34359738368, 0) failed; error='Not enough space' (errno=12)
#
# There is insufficient memory for the Java Runtime Environment to continue.
# Native memory allocation (mmap) failed to map 34359738368 bytes for 
committing reserved memory.
# An error report file with more information is saved as:
# /opt/manifoldcf/multiprocess-zk-example-proprietary/hs_err_pid2796.log

So, to free memory, I have to restart the server
How could I solve this?

Thanks a lot
Mario



Memory problem on Agent ?

2020-10-02 Thread Bisonti Mario

Hello.

When I scan the content of the repository, I note that the memory used is very 
high and it isn't released,

i.e. 60 GB out of 70 GB available.

I tried to free it by shutting down the agent, but I am not able to:

/opt/manifoldcf/multiprocess-zk-example-proprietary/stop-agents.sh
OpenJDK 64-Bit Server VM warning: INFO: os::commit_memory(0x7f4d5800, 
34359738368, 0) failed; error='Not enough space' (errno=12)
#
# There is insufficient memory for the Java Runtime Environment to continue.
# Native memory allocation (mmap) failed to map 34359738368 bytes for 
committing reserved memory.
# An error report file with more information is saved as:
# /opt/manifoldcf/multiprocess-zk-example-proprietary/hs_err_pid2796.log

So, to free memory, I have to restart the server
How could I solve this?

Thanks a lot
Mario



R: Job interrupted

2020-08-26 Thread Bisonti Mario
Hello Karl,

Thanks a lot!
I recompiled with the latest modification and my job now ends correctly!

Thanks a lot
Mario


From: Karl Wright
Sent: Monday, 24 August 2020 15:22
To: user@manifoldcf.apache.org
Subject: Re: Job interrupted

Ok, I found the 'hard fail' situation.  Here is a patch to fix it:

Index: 
connectors/jcifs/connector/src/main/java/org/apache/manifoldcf/crawler/connectors/sharedrive/SharedDriveConnector.java
===
--- 
connectors/jcifs/connector/src/main/java/org/apache/manifoldcf/crawler/connectors/sharedrive/SharedDriveConnector.java
  (revision 1881006)
+++ 
connectors/jcifs/connector/src/main/java/org/apache/manifoldcf/crawler/connectors/sharedrive/SharedDriveConnector.java
  (working copy)
@@ -1349,7 +1349,7 @@
   Logging.connectors.warn("JCIFS: 'File in Use' response when "+activity+" 
for "+documentIdentifier+": retrying...",se);
   // 'File in Use' skip the document and keep going
   throw new ServiceInterruption("Timeout or other service interruption: 
"+se.getMessage(),se,currentTime + 30L,
-currentTime + 3 * 60 * 6L,-1,true);
+currentTime + 3 * 60 * 6L,-1,false);
 }
 else if (se.getMessage().indexOf("cannot find") != -1 || 
se.getMessage().indexOf("cannot be found") != -1)
 {

I'll commit to trunk as well.
Karl
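
(For reference, a unified diff in this form can normally be applied from the root of a ManifoldCF source checkout before rebuilding the jcifs connector; the source path and patch file name below are placeholders:)

cd /path/to/manifoldcf-source
patch -p0 < sharedrive-file-in-use.patch
# then rebuild mcf-jcifs-connector.jar following the project's normal build instructions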

On Mon, Aug 24, 2020 at 9:19 AM Karl Wright <daddy...@gmail.com> wrote:
Ok, then let me examine the code and see why it's not catching it.
Karl


On Mon, Aug 24, 2020 at 8:49 AM Bisonti Mario <mario.biso...@vimar.com> wrote:
Yes, I see only that exception inside the manifoldcf.log and the job stops with:


Error: Repeated service interruptions - failure processing document: The 
process cannot access the file because it is being used by another process.


From: Karl Wright <daddy...@gmail.com>
Sent: Monday, 24 August 2020 12:27
To: user@manifoldcf.apache.org
Subject: Re: Job interrupted

Well, we look for certain kinds of exceptions from JCIFS and allow the job to 
continue if we can't succeed.  You have to be sure though that the failure was 
from *that* exception.  The reason I point that out is because we already have 
a check for that, I believe.

Karl


On Mon, Aug 24, 2020 at 5:55 AM Bisonti Mario <mario.biso...@vimar.com> wrote:
Yes, but afterwards I obtain:

Error: Repeated service interruptions - failure processing document: The 
process cannot access the file because it is being used by another process.

And the job stops


From: Karl Wright <daddy...@gmail.com>
Sent: Monday, 24 August 2020 11:52
To: user@manifoldcf.apache.org
Subject: Re: Job interrupted

Hi,
That's a warning.  The job will keep running and the document will be retried 
later.

Karl


On Mon, Aug 24, 2020 at 5:24 AM Bisonti Mario <mario.biso...@vimar.com> wrote:
Hello.
I have some problems with the job being interrupted.
The job executes a Windows share scan.

After many errors, it sometimes stops.

I see many errors in the manifoldcf.log:


at 
org.apache.manifoldcf.crawler.connectors.sharedrive.SharedDriveConnector.processDocuments(SharedDriveConnector.java:610)
 [mcf-jcifs-connector.jar:?]
at 
org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:399) 
[mcf-pull-agent.jar:?]
WARN 2020-08-24T11:17:25,501 (Worker thread '59') - JCIFS: 'File in Use' 
response when getting document version for 
smb://fileserver.net/Workgroups/Dir/Dir2/finename.xlsx: retrying...
jcifs.smb.SmbException: The process cannot access the file because it is being 
used by another process.
at jcifs.smb.SmbTransportImpl.checkStatus2(SmbTransportImpl.java:1441) 
~[jcifs-ng-2.1.2.jar:?]
at jcifs.smb.SmbTransportImpl.checkStatus(SmbTransportImpl.java:1552) 
~[jcifs-ng-2.1.2.jar:?]
at jcifs.smb.SmbTransportImpl.sendrecv(SmbTransportImpl.java:1007) 
~[jcifs-ng-2.1.2.jar:?]
at jcifs.smb.SmbTransportImpl.send(SmbTransportImpl.java:1523) 
~[jcifs-ng-2.1.2.jar:?]
at jcifs.smb.SmbSessionImpl.send(SmbSessionImpl.java:409) 
~[jcifs-ng-2.1.2.jar:?]
at jcifs.smb.SmbTreeImpl.send(SmbTreeImpl.java:472) 
~[jcifs-ng-2.1.2.jar:?]
at jcifs.smb.SmbTreeConnection.send0(SmbTreeConnection.java:399) 
~[jcifs-ng-2.1.2.jar:?]
at jcifs.smb.SmbTreeConnection.send(SmbTreeConnection.java:314) 
~[jcifs-ng-2.1.2.jar:?]
at jcifs.smb.SmbTreeConnection.send(SmbTreeConnection.java:294) 
~[jcifs-ng-2.1.2.jar:?]
at jcifs.smb.SmbTr

R: Job interrupted

2020-08-24 Thread Bisonti Mario
Yes, I see only that exception inside the manifoldcf.log and the job stops with:


Error: Repeated service interruptions - failure processing document: The 
process cannot access the file because it is being used by another process.


From: Karl Wright
Sent: Monday, 24 August 2020 12:27
To: user@manifoldcf.apache.org
Subject: Re: Job interrupted

Well, we look for certain kinds of exceptions from JCIFS and allow the job to 
continue if we can't succeed.  You have to be sure though that the failure was 
from *that* exception.  The reason I point that out is because we already have 
a check for that, I believe.

Karl


On Mon, Aug 24, 2020 at 5:55 AM Bisonti Mario <mario.biso...@vimar.com> wrote:
Yes, but afterwards I obtain:

Error: Repeated service interruptions - failure processing document: The 
process cannot access the file because it is being used by another process.

And the job stops


From: Karl Wright <daddy...@gmail.com>
Sent: Monday, 24 August 2020 11:52
To: user@manifoldcf.apache.org
Subject: Re: Job interrupted

Hi,
That's a warning.  The job will keep running and the document will be retried 
later.

Karl


On Mon, Aug 24, 2020 at 5:24 AM Bisonti Mario <mario.biso...@vimar.com> wrote:
Hello.
I have some problems with the job being interrupted.
The job executes a Windows share scan.

After many errors, it sometimes stops.

I see many errors in the manifoldcf.log:


at 
org.apache.manifoldcf.crawler.connectors.sharedrive.SharedDriveConnector.processDocuments(SharedDriveConnector.java:610)
 [mcf-jcifs-connector.jar:?]
at 
org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:399) 
[mcf-pull-agent.jar:?]
WARN 2020-08-24T11:17:25,501 (Worker thread '59') - JCIFS: 'File in Use' 
response when getting document version for 
smb://fileserver.net/Workgroups/Dir/Dir2/finename.xlsx: retrying...
jcifs.smb.SmbException: The process cannot access the file because it is being 
used by another process.
at jcifs.smb.SmbTransportImpl.checkStatus2(SmbTransportImpl.java:1441) 
~[jcifs-ng-2.1.2.jar:?]
at jcifs.smb.SmbTransportImpl.checkStatus(SmbTransportImpl.java:1552) 
~[jcifs-ng-2.1.2.jar:?]
at jcifs.smb.SmbTransportImpl.sendrecv(SmbTransportImpl.java:1007) 
~[jcifs-ng-2.1.2.jar:?]
at jcifs.smb.SmbTransportImpl.send(SmbTransportImpl.java:1523) 
~[jcifs-ng-2.1.2.jar:?]
at jcifs.smb.SmbSessionImpl.send(SmbSessionImpl.java:409) 
~[jcifs-ng-2.1.2.jar:?]
at jcifs.smb.SmbTreeImpl.send(SmbTreeImpl.java:472) 
~[jcifs-ng-2.1.2.jar:?]
at jcifs.smb.SmbTreeConnection.send0(SmbTreeConnection.java:399) 
~[jcifs-ng-2.1.2.jar:?]
at jcifs.smb.SmbTreeConnection.send(SmbTreeConnection.java:314) 
~[jcifs-ng-2.1.2.jar:?]
at jcifs.smb.SmbTreeConnection.send(SmbTreeConnection.java:294) 
~[jcifs-ng-2.1.2.jar:?]
at jcifs.smb.SmbTreeHandleImpl.send(SmbTreeHandleImpl.java:130) 
~[jcifs-ng-2.1.2.jar:?]
at jcifs.smb.SmbTreeHandleImpl.send(SmbTreeHandleImpl.java:117) 
~[jcifs-ng-2.1.2.jar:?]
at jcifs.smb.SmbFile.withOpen(SmbFile.java:1747) ~[jcifs-ng-2.1.2.jar:?]
at jcifs.smb.SmbFile.withOpen(SmbFile.java:1716) ~[jcifs-ng-2.1.2.jar:?]
at jcifs.smb.SmbFile.withOpen(SmbFile.java:1710) ~[jcifs-ng-2.1.2.jar:?]
at jcifs.smb.SmbFile.queryPath(SmbFile.java:763) ~[jcifs-ng-2.1.2.jar:?]
at jcifs.smb.SmbFile.exists(SmbFile.java:844) ~[jcifs-ng-2.1.2.jar:?]
at 
org.apache.manifoldcf.crawler.connectors.sharedrive.SharedDriveConnector.fileExists(SharedDriveConnector.java:2188)
 [mcf-jcifs-connector.jar:?]
at 
org.apache.manifoldcf.crawler.connectors.sharedrive.SharedDriveConnector.processDocuments(SharedDriveConnector.java:610)
 [mcf-jcifs-connector.jar:?]
at 
org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:399) 
[mcf-pull-agent.jar:?]
WARN 2020-08-24T11:17:25,502 (Worker thread '59') - Service interruption 
reported for job 1533797717712 connection 'WinShare': Timeout or other service 
interruption: The process cannot access the file because it is being used by 
another process.


What  could I check?

Thanks a lot
Mario


R: Job interrupted

2020-08-24 Thread Bisonti Mario
Yes, but afterwards I obtain:

Error: Repeated service interruptions - failure processing document: The 
process cannot access the file because it is being used by another process.

And the job stops


From: Karl Wright
Sent: Monday, 24 August 2020 11:52
To: user@manifoldcf.apache.org
Subject: Re: Job interrupted

Hi,
That's a warning.  The job will keep running and the document will be retried 
later.

Karl


On Mon, Aug 24, 2020 at 5:24 AM Bisonti Mario <mario.biso...@vimar.com> wrote:
Hello.
I have some problems with the job being interrupted.
The job executes a Windows share scan.

After many errors, it sometimes stops.

I see many errors in the manifoldcf.log:


at 
org.apache.manifoldcf.crawler.connectors.sharedrive.SharedDriveConnector.processDocuments(SharedDriveConnector.java:610)
 [mcf-jcifs-connector.jar:?]
at 
org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:399) 
[mcf-pull-agent.jar:?]
WARN 2020-08-24T11:17:25,501 (Worker thread '59') - JCIFS: 'File in Use' 
response when getting document version for 
smb://fileserver.net/Workgroups/Dir/Dir2/finename.xlsx: retrying...
jcifs.smb.SmbException: The process cannot access the file because it is being 
used by another process.
at jcifs.smb.SmbTransportImpl.checkStatus2(SmbTransportImpl.java:1441) 
~[jcifs-ng-2.1.2.jar:?]
at jcifs.smb.SmbTransportImpl.checkStatus(SmbTransportImpl.java:1552) 
~[jcifs-ng-2.1.2.jar:?]
at jcifs.smb.SmbTransportImpl.sendrecv(SmbTransportImpl.java:1007) 
~[jcifs-ng-2.1.2.jar:?]
at jcifs.smb.SmbTransportImpl.send(SmbTransportImpl.java:1523) 
~[jcifs-ng-2.1.2.jar:?]
at jcifs.smb.SmbSessionImpl.send(SmbSessionImpl.java:409) 
~[jcifs-ng-2.1.2.jar:?]
at jcifs.smb.SmbTreeImpl.send(SmbTreeImpl.java:472) 
~[jcifs-ng-2.1.2.jar:?]
at jcifs.smb.SmbTreeConnection.send0(SmbTreeConnection.java:399) 
~[jcifs-ng-2.1.2.jar:?]
at jcifs.smb.SmbTreeConnection.send(SmbTreeConnection.java:314) 
~[jcifs-ng-2.1.2.jar:?]
at jcifs.smb.SmbTreeConnection.send(SmbTreeConnection.java:294) 
~[jcifs-ng-2.1.2.jar:?]
at jcifs.smb.SmbTreeHandleImpl.send(SmbTreeHandleImpl.java:130) 
~[jcifs-ng-2.1.2.jar:?]
at jcifs.smb.SmbTreeHandleImpl.send(SmbTreeHandleImpl.java:117) 
~[jcifs-ng-2.1.2.jar:?]
at jcifs.smb.SmbFile.withOpen(SmbFile.java:1747) ~[jcifs-ng-2.1.2.jar:?]
at jcifs.smb.SmbFile.withOpen(SmbFile.java:1716) ~[jcifs-ng-2.1.2.jar:?]
at jcifs.smb.SmbFile.withOpen(SmbFile.java:1710) ~[jcifs-ng-2.1.2.jar:?]
at jcifs.smb.SmbFile.queryPath(SmbFile.java:763) ~[jcifs-ng-2.1.2.jar:?]
at jcifs.smb.SmbFile.exists(SmbFile.java:844) ~[jcifs-ng-2.1.2.jar:?]
at 
org.apache.manifoldcf.crawler.connectors.sharedrive.SharedDriveConnector.fileExists(SharedDriveConnector.java:2188)
 [mcf-jcifs-connector.jar:?]
at 
org.apache.manifoldcf.crawler.connectors.sharedrive.SharedDriveConnector.processDocuments(SharedDriveConnector.java:610)
 [mcf-jcifs-connector.jar:?]
at 
org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:399) 
[mcf-pull-agent.jar:?]
WARN 2020-08-24T11:17:25,502 (Worker thread '59') - Service interruption 
reported for job 1533797717712 connection 'WinShare': Timeout or other service 
interruption: The process cannot access the file because it is being used by 
another process.


What  could I check?

Thanks a lot
Mario


Job interrupted

2020-08-24 Thread Bisonti Mario
Hallo.
I have some problems with the job being interrupted.
The job executes a Windows share scan.

After many errors, sometimes it stops

I see in the manifoldcf.log many errors:


at 
org.apache.manifoldcf.crawler.connectors.sharedrive.SharedDriveConnector.processDocuments(SharedDriveConnector.java:610)
 [mcf-jcifs-connector.jar:?]
at 
org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:399) 
[mcf-pull-agent.jar:?]
WARN 2020-08-24T11:17:25,501 (Worker thread '59') - JCIFS: 'File in Use' 
response when getting document version for 
smb://fileserver.net/Workgroups/Dir/Dir2/finename.xlsx: retrying...
jcifs.smb.SmbException: The process cannot access the file because it is being 
used by another process.
at jcifs.smb.SmbTransportImpl.checkStatus2(SmbTransportImpl.java:1441) 
~[jcifs-ng-2.1.2.jar:?]
at jcifs.smb.SmbTransportImpl.checkStatus(SmbTransportImpl.java:1552) 
~[jcifs-ng-2.1.2.jar:?]
at jcifs.smb.SmbTransportImpl.sendrecv(SmbTransportImpl.java:1007) 
~[jcifs-ng-2.1.2.jar:?]
at jcifs.smb.SmbTransportImpl.send(SmbTransportImpl.java:1523) 
~[jcifs-ng-2.1.2.jar:?]
at jcifs.smb.SmbSessionImpl.send(SmbSessionImpl.java:409) 
~[jcifs-ng-2.1.2.jar:?]
at jcifs.smb.SmbTreeImpl.send(SmbTreeImpl.java:472) 
~[jcifs-ng-2.1.2.jar:?]
at jcifs.smb.SmbTreeConnection.send0(SmbTreeConnection.java:399) 
~[jcifs-ng-2.1.2.jar:?]
at jcifs.smb.SmbTreeConnection.send(SmbTreeConnection.java:314) 
~[jcifs-ng-2.1.2.jar:?]
at jcifs.smb.SmbTreeConnection.send(SmbTreeConnection.java:294) 
~[jcifs-ng-2.1.2.jar:?]
at jcifs.smb.SmbTreeHandleImpl.send(SmbTreeHandleImpl.java:130) 
~[jcifs-ng-2.1.2.jar:?]
at jcifs.smb.SmbTreeHandleImpl.send(SmbTreeHandleImpl.java:117) 
~[jcifs-ng-2.1.2.jar:?]
at jcifs.smb.SmbFile.withOpen(SmbFile.java:1747) ~[jcifs-ng-2.1.2.jar:?]
at jcifs.smb.SmbFile.withOpen(SmbFile.java:1716) ~[jcifs-ng-2.1.2.jar:?]
at jcifs.smb.SmbFile.withOpen(SmbFile.java:1710) ~[jcifs-ng-2.1.2.jar:?]
at jcifs.smb.SmbFile.queryPath(SmbFile.java:763) ~[jcifs-ng-2.1.2.jar:?]
at jcifs.smb.SmbFile.exists(SmbFile.java:844) ~[jcifs-ng-2.1.2.jar:?]
at 
org.apache.manifoldcf.crawler.connectors.sharedrive.SharedDriveConnector.fileExists(SharedDriveConnector.java:2188)
 [mcf-jcifs-connector.jar:?]
at 
org.apache.manifoldcf.crawler.connectors.sharedrive.SharedDriveConnector.processDocuments(SharedDriveConnector.java:610)
 [mcf-jcifs-connector.jar:?]
at 
org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:399) 
[mcf-pull-agent.jar:?]
WARN 2020-08-24T11:17:25,502 (Worker thread '59') - Service interruption 
reported for job 1533797717712 connection 'WinShare': Timeout or other service 
interruption: The process cannot access the file because it is being used by 
another process.


What  could I check?

Thanks a lot
Mario


R: How to reset job status

2020-08-19 Thread Bisonti Mario
After your suggestions, I solved and the job finished

Thanks


Da: Karl Wright 
Inviato: mercoledì 19 agosto 2020 12:51
A: user@manifoldcf.apache.org
Oggetto: Re: How to reset job status

So Mario,

First it appears that you mysteriously cannot build where everyone else can.  
Now you are having mysterious problems with ManifoldCF being able to do basic 
state transitions.  I'm unable to reproduce any of these things.  More 
worrisome, you seem to have the opinion that rather than fix underlying 
deployment or infrastructure issues, the right solution is just to hack away at 
the database or the code.

This doesn't work for me.

I'd like to help you out here but there's a basic level of cooperation needed 
for that.  The way you do deployments in ManifoldCF that we know will be 
successful is by starting with one of the distribution examples and (if needed) 
modifying that to meet your individual needs.  If you are having bizarre things 
take place, almost always it's because you didn't start with one of the 
examples and therefore you wound up configuring things in a bizarre way.  So if 
you cannot get past your current problem, I STRONGLY recommend you start over:

- Checkout a new copy of trunk and build following the instructions I gave in 
the other email thread.  Follow them to the letter please.
- Pick your deployment model.
- Point it at your database instance.
- Start it USING THE SCRIPTS PROVIDED.

Your problems should resolve.  If not, you should have logging in 
manifoldcf.log telling you what is going wrong.

Karl


On Wed, Aug 19, 2020 at 6:40 AM Karl Wright 
mailto:daddy...@gmail.com>> wrote:
You do not see log output.  Therefore I need to ask you some questions.

What deployment model are you using?  single process or multi-process?  what is 
the synchronization method?

On Wed, Aug 19, 2020 at 6:38 AM Karl Wright 
mailto:daddy...@gmail.com>> wrote:
Usually when you shut down the agents process (or the whole thing) and restart 
it will fix problems like that UNLESS the problem persists because a step in 
the state flow is failing.  If it is failing you would see log output.  Do you 
see log output?

Karl


On Wed, Aug 19, 2020 at 5:40 AM Bisonti Mario 
mailto:mario.biso...@vimar.com>> wrote:
No, I don’t have a notification connector, but it isn’t the problem.
Manifoldcf.log is empty.

The problem is that the job is in a hanging state and I would like to reset its state.



Da: Karl Wright mailto:daddy...@gmail.com>>
Inviato: mercoledì 19 agosto 2020 11:31
A: user@manifoldcf.apache.org<mailto:user@manifoldcf.apache.org>
Oggetto: Re: How to reset job status

There should be output in your manifoldcf.log file, no?  This may be the result 
of you not having a notification connector's code actually registered so you 
get no class found errors.  The only solution is to put the missing jar in 
place and restart your agents process.  Have a look at the log to confirm.

Karl


On Wed, Aug 19, 2020 at 4:56 AM Bisonti Mario 
mailto:mario.biso...@vimar.com>> wrote:
Hallo
I have a job in a status “End notification” that hangs on this state.

Is there a way to reset it?

I tried the script lock-clean.sh without effect.

In this state I am not able to manage jobs.


 What could I try, please?


Thanks a lot
Mario


R: How to reset job status

2020-08-19 Thread Bisonti Mario
No, I don’t have a notification connector, but it isn’t the problem.
Manifoldcf.log is empty.

The problem is that the job is in a hanging state and I would like to reset its state.



Da: Karl Wright 
Inviato: mercoledì 19 agosto 2020 11:31
A: user@manifoldcf.apache.org
Oggetto: Re: How to reset job status

There should be output in your manifoldcf.log file, no?  This may be the result 
of you not having a notification connector's code actually registered so you 
get no class found errors.  The only solution is to put the missing jar in 
place and restart your agents process.  Have a look at the log to confirm.

Karl


On Wed, Aug 19, 2020 at 4:56 AM Bisonti Mario 
mailto:mario.biso...@vimar.com>> wrote:
Hallo
I have a job in a status “End notification” that hangs on this state.

Is there a way to reset it?

I tried the script lock-clean.sh without effect.

In this state I am not able to manage jobs.


 What could I try, please?


Thanks a lot
Mario


How to reset job status

2020-08-19 Thread Bisonti Mario
Hallo
I have a job in a status “End notification” that hangs on this state.

Is there a way to reset it?

I tried the script lock-clean.sh without effect.

In this state I am not able to manage jobs.


 What could I try, please?


Thanks a lot
Mario


R: Manifold with OpenJDK

2019-10-17 Thread Bisonti Mario
Hallo, I use Ubuntu 18.04.02 LTS with:
openjdk version "11.0.4" 2019-07-16

And I have no issue with ManifoldCF

Mario

Da: Markus Schuch 
Inviato: giovedì 17 ottobre 2019 07:35
A: user@manifoldcf.apache.org; Praveen Bejji 
Oggetto: Re: Manifold with OpenJDK

Hi Praveen,

we use openjdk 8 in dockered red hat linux for 2 years now and didn't have 
problems with it.

We had one minor issue when we migrated: the image processing capabilities of 
openjdk are somehow different from Oracle JDK. One of our connectors creates 
image thumbnails and on openjdk some results had weird colours.

Cheers
Markus
Am 16. Oktober 2019 17:32:16 MESZ schrieb Praveen Bejji 
mailto:praveen.b...@gmail.com>>:
Hi,

We are planning on using ManifoldCF with Open JDK 1.8 on Linux  server. Can you 
please let us know if there are any known issues/challenges on using ManifldCF 
with Open JDK?


Thanks,
Praveen

--
Diese Nachricht wurde von meinem Android-Mobiltelefon mit K-9 Mail gesendet.


R: Documentum connection not working

2019-07-16 Thread Bisonti Mario
Hallo.
Thanks. I didn’t read the documentation about the sidecar Documentum process.

Thanks a lot.



.

Da: Karl Wright 
Inviato: martedì 16 luglio 2019 13:20
A: user@manifoldcf.apache.org
Oggetto: Re: Documentum connection not working

Are you running the documentum connector sidecar processes?  You need to be 
running those, and the documentum_server process must include a valid DFC 
distribution with a valid configuration file.  This is where the documentum 
server name comes from.

The documentation for "how to build and deploy" describes all this.
Karl


On Tue, Jul 16, 2019 at 6:12 AM Bisonti Mario 
mailto:mario.biso...@vimar.com>> wrote:
Hallo.

I am using MCF 2.12
I would like to create a Repository connection to a Documentum Docbase

I always obtain the error:
Connection temporarily failed: Connection refused to host: 127.0.0.1; nested 
exception is: java.net.ConnectException: Connection refused (Connection refused)

I don’t understand if there is a problem about the webtop url?
docbasename=abc_test
webtopbaseurl=http://servenname.domain.net/documentale/component/main/
docbaseusername=documentum
domain=
docbasepassword=

Or is there any other issue about this?

Furthermore I was trying to create an Authority Connection to Documentum but I 
obtain the error:
Connection failed: Transient remote exception creating session: Connection 
refused to host: 127.0.0.1; nested exception is: java.net.ConnectException: 
Connection refused (Connection refused)
In this case, I don’t see any field about the documentum server name:
docbasename=abc_test
docbaseusername=documentum
domain=
usernamecaseinsensitive=false
cachelifetimemins=1
cachelrusize=1000
usesystemacls=true
docbasepassword=

So how could it work?

Thanks a lot
Mario


Documentum connection not working

2019-07-16 Thread Bisonti Mario
Hallo.

I am using MCF 2.12
I would like to create a Repository connection to a Documentum Docbase

I always obtain the error:
Connection temporarily failed: Connection refused to host: 127.0.0.1; nested 
exception is: java.net.ConnectException: Connection refused (Connection refused)

I don’t understand if there is a problem about the webtop url?
docbasename=abc_test
webtopbaseurl=http://servenname.domain.net/documentale/component/main/
docbaseusername=documentum
domain=
docbasepassword=

Or is there any other issue about this?

Furthermore I was trying to create an Authority Connection to Documentum but I 
obtain the error:
Connection failed: Transient remote exception creating session: Connection 
refused to host: 127.0.0.1; nested exception is: java.net.ConnectException: 
Connection refused (Connection refused)
In this case, I don’t see any field about the documentum server name:
docbasename=abc_test
docbaseusername=documentum
domain=
usernamecaseinsensitive=false
cachelifetimemins=1
cachelrusize=1000
usesystemacls=true
docbasepassword=

So how could it work?

Thanks a lot
Mario


R: Threw exception: 'Driver class not found: net.sourceforge.jtds.jdbc.Driver'

2019-02-26 Thread Bisonti Mario
Great !

The problem wasn’t about the driver!
Your suggestion illuminated me!

It isn’t necessary to add the lib-proprietary folder to properties.xml; my 
problem was that I had deployed in Tomcat:
/opt/manifoldcf/web/war/mcf-api-service.war
/opt/manifoldcf/web/war/mcf-authority-service.war
/opt/manifoldcf/web/war/mcf-crawler-ui.war


So, I undeployed the above .war and deployed:
/opt/manifoldcf/web-proprietary/war/mcf-api-service.war
/opt/manifoldcf/web-proprietary/war/mcf-authority-service.war
/opt/manifoldcf/web-proprietary/war/mcf-crawler-ui.war


And It works!!!

Yeh!
Thanks a lot!
Mario
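(A quick way to verify which deployed war actually bundles the JDBC driver is to list its contents; a sketch, reusing the proprietary war path shown above:)

# confirm that the proprietary crawler-ui war ships the jtds driver
unzip -l /opt/manifoldcf/web-proprietary/war/mcf-crawler-ui.war | grep -i jtds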







Da: Karl Wright 
Inviato: martedì 26 febbraio 2019 14:29
A: user@manifoldcf.apache.org
Oggetto: Re: Threw exception: 'Driver class not found: 
net.sourceforge.jtds.jdbc.Driver'

The UI web interface finds the jar in the deployed war file.  If you are not 
using the proprietary war, it will not contain the jtds jar.

Since you aren't seeing this error in the log for the java-agents process, I 
bet that's what you did.

Karl


On Tue, Feb 26, 2019 at 2:55 AM Bisonti Mario 
mailto:mario.biso...@vimar.com>> wrote:
Hi Karl.

The only message that I obtain from the UI web interface is 'Driver class not 
found: net.sourceforge.jtds.jdbc.Driver'; there is no info in manifoldcf.log, so I 
don’t know how I could investigate it.

I obtain the same:
/opt/manifoldcf/lib-proprietary$jar -tf jtds-1.2.4.jar | grep Driver
net/sourceforge/jtds/jdbc/Driver.class

What could I try to solve the issue?
Thanks a lot
Mario


Da: Karl Wright mailto:daddy...@gmail.com>>
Inviato: lunedì 25 febbraio 2019 19:36
A: user@manifoldcf.apache.org<mailto:user@manifoldcf.apache.org>
Oggetto: Re: Threw exception: 'Driver class not found: 
net.sourceforge.jtds.jdbc.Driver'

Hi, any news here?
Karl

On Wed, Feb 20, 2019 at 1:35 PM Karl Wright 
mailto:daddy...@gmail.com>> wrote:
No, I stand corrected: the right class is in that jar:

>>>>>>
C:\wip\mcf\trunk\dist\lib-proprietary>"c:\Program 
Files\Java\jdk1.8.0_181\bin\jar" -tf jtds-1.2.4.jar | grep Driver
net/sourceforge/jtds/jdbc/Driver.class
<<<<<<

Interesting that it cannot be found.  Can I ask again which process it is that 
dumps this message?  Is it the UI?

Karl


On Wed, Feb 20, 2019 at 11:42 AM Karl Wright 
mailto:daddy...@gmail.com>> wrote:
It's also possible that the jtds driver now needs another jar included as a 
dependency for working with mssql.
Unfortunately, you'll probably need to figure this out on your own.  Please let 
me know what you find so that I can update instructions or code.

Karl


On Wed, Feb 20, 2019 at 11:31 AM Karl Wright 
mailto:daddy...@gmail.com>> wrote:
OK, you are basically having trouble with the JDBC connector, not the basic 
functioning of ManifoldCF.  That was not clear.

The JDBC driver class name for MSSQL has likely been updated and we'll need to 
figure out what it got changed to.

Karl


On Wed, Feb 20, 2019 at 11:20 AM Bisonti Mario 
mailto:mario.biso...@vimar.com>> wrote:
Yes, in fact:
administrator@sengvivv01:/opt/manifoldcf/multiprocess-zk-example-proprietary$ 
more options.env.unix
-Xms3048m
-Xmx3048m
-Dorg.apache.manifoldcf.configfile=./properties.xml
-cp
.:../lib/mcf-core.jar:../lib/mcf-agents.jar:../lib/mcf-pull-agent.jar:../lib/hsqldb-2.3.2.jar:../lib/postgresql-42.1.3.jar:../lib/commons-codec-1.10.jar:../lib/commons-collections-3.2.1.jar:../l
ib/commons-collections4-4.1.jar:../lib/commons-discovery-0.5.jar:../lib/commons-el-1.0.jar:../lib/commons-exec-1.3.jar:../lib/commons-fileupload-1.3.3.jar:../lib/commons-io-2.5.jar:../lib/common
s-lang-2.6.jar:../lib/commons-lang3-3.6.jar:../lib/commons-logging-1.2.jar:../lib/ecj-4.3.1.jar:../lib/gson-2.8.0.jar:../lib/guava-25.1-jre.jar:../lib/httpclient-4.5.6.jar:../lib/httpcore-4.4.10.jar:../lib/jasper-6.0.35.jar:../lib/jasper-el-6.0.35.jar:../lib/javax.servlet-api-3.1.0.jar:../lib/jna-4.3.0.jar:../lib/jna-platform-4.3.0.jar:../lib/json-simple-1.1.1.jar:../lib/jsp-api-2.1-glassfish-2.1.v20091210.jar:../lib/juli-6.0.35.jar:../lib/log4j-1.2-api-2.4.1.jar:../lib/log4j-api-2.4.1.jar:../lib/log4j-core-2.4.1.jar:../lib/mail-1.4.5.jar:../lib/serializer-2.7.1.jar:../lib/slf4j-api-1.7.25.jar:../lib/slf4j-simple-1.7.25.jar:../lib/velocity-1.7.jar:../lib/xalan-2.7.1.jar:../lib/xercesImpl-2.10.0.jar:../lib/xml-apis-1.4.01.jar:../lib/zookeeper-3.4.10.jar:../lib-proprietary/jtds-1.2.4.jar:../lib-proprietary/mariadb-java-client-1.1.7.jar:../lib-proprietary/mysql-connector-java-5.1.33.jar:


And:
administrator@sengvivv01:/opt/manifoldcf/multiprocess-zk-example-proprietary$ 
sudo -u tomcat ./initialize.sh
[sudo] password for administrator:
Configuration file successfully read
[main] INFO org.apache.zookeeper.ZooKeeper - Client 
environment:zookeeper.version=3.4.10-39d3a4f269333c922ed3db283be479f9deacaa0f, 
built on 03/23/2017 10:13 GMT
[main] INFO org.apache.zookeeper.ZooKeeper - Client 
environment:host.name<https

R: Threw exception: 'Driver class not found: net.sourceforge.jtds.jdbc.Driver'

2019-02-20 Thread Bisonti Mario
 registered notification connector 
'org.apache.manifoldcf.crawler.notifications.rocketchat.RocketChatConnector'
Successfully registered notification connector 
'org.apache.manifoldcf.crawler.notifications.email.EmailConnector'
Successfully initialized database and registered all connectors
[Shutdown thread] INFO org.apache.zookeeper.ZooKeeper - Session: 
0x1690ba14da3005b closed
[main-EventThread] INFO org.apache.zookeeper.ClientCnxn - EventThread shut down 
for session: 0x1690ba14da3005b
[Shutdown thread] INFO org.apache.zookeeper.ZooKeeper - Session: 
0x1690ba14da3005a closed
[main-EventThread] INFO org.apache.zookeeper.ClientCnxn - EventThread shut down 
for session: 0x1690ba14da3005a
[Shutdown thread] INFO org.apache.zookeeper.ZooKeeper - Session: 
0x1690ba14da30059 closed
[main-EventThread] INFO org.apache.zookeeper.ClientCnxn - EventThread shut down 
for session: 0x1690ba14da30059



Furthermore, my MCF start from Tomcat Service and I configured  
/etc/systemd/system/tomcat.service :
[Unit]
Description=Apache Tomcat Web Application Container
After=network.target

[Service]
Type=forking

Environment=JAVA_HOME=/usr/lib/jvm/java-1.11.0-openjdk-amd64
Environment=CATALINA_PID=/opt/tomcat/temp/tomcat.pid
Environment=CATALINA_HOME=/opt/tomcat
Environment=CATALINA_BASE=/opt/tomcat
Environment='CATALINA_OPTS=-Xms512M -Xmx1024M -server -XX:+UseParallelGC 
-Dorg.apache.manifoldcf.configfile=/opt/manifoldcf/multiprocess-zk-example-proprietary/properties.xml'
Environment='JAVA_OPTS=-Djava.awt.headless=true 
-Djava.security.egd=file:/dev/./urandom'

ExecStart=/opt/tomcat/bin/startup.sh
ExecStop=/opt/tomcat/bin/shutdown.sh

User=tomcat
Group=tomcat
UMask=0007
RestartSec=10
Restart=always

[Install]
WantedBy=multi-user.target


And I added to properties.xml:
  
  

  
  


When I try to create a new “Repository connector”, I selected “JDBC”; after I 
chose “MSSQL” as the database, I filled in server, username and password, I saved, 
and I obtained the error:

Threw exception: 'Driver class not found: net.sourceforge.jtds.jdbc.Driver'



Da: Karl Wright 
Inviato: mercoledì 20 febbraio 2019 17:06
A: user@manifoldcf.apache.org
Oggetto: Re: Threw exception: 'Driver class not found: 
net.sourceforge.jtds.jdbc.Driver'

You should be doing the following to run initialize.sh:

cd dist/multiprocess-zk-example-proprietary
./initialize.sh

The class path is pulled in from options.env.unix, which should include your 
jar:

>>>>>>
C:\wip\mcf\trunk\dist\multiprocess-zk-example-proprietary>more options.env.unix
-Xms512m
-Xmx512m
-Dorg.apache.manifoldcf.configfile=./properties.xml
-cp
.:../lib/mcf-core.jar:../lib/mcf-agents.jar:../lib/mcf-pull-agent.jar:../lib/hsqldb-2.3.2.jar:../lib/postgresql-42.1.3.jar:../lib/commons-codec-1.10.jar:../lib/commons-collections-3.2.1.jar:../lib/commons-collections4-4.1.jar:../lib/commons-discovery-0.5.jar:../lib/commons-el-1.0.jar:../lib/commons-exec-1.3.jar:../lib/commons-fileupload-1.3.3.jar:../lib/commons-io-2.5.jar:../lib/commons-lang-2.6.jar:../lib/commons-lang3-3.6.jar:../lib/commons-logging-1.2.jar:../lib/ecj-4.3.1.jar:../lib/gson-2.8.0.jar:../lib/guava-25.1-jre.jar:../lib/httpclient-4.5.6.jar:../lib/httpcore-4.4.10.jar:../lib/jasper-6.0.35.jar:../lib/jasper-el-6.0.35.jar:../lib/javax.servlet-api-3.1.0.jar:../lib/jna-4.3.0.jar:../lib/jna-platform-4.3.0.jar:../lib/json-simple-1.1.1.jar:../lib/jsp-api-2.1-glassfish-2.1.v20091210.jar:../lib/juli-6.0.35.jar:../lib/log4j-1.2-api-2.4.1.jar:../lib/log4j-api-2.4.1.jar:../lib/log4j-core-2.4.1.jar:../lib/mail-1.4.5.jar:../lib/serializer-2.7.1.jar:../lib/slf4j-api-1.7.25.jar:../lib/slf4j-simple-1.7.25.jar:../lib/velocity-1.7.jar:../lib/xalan-2.7.1.jar:../lib/xercesImpl-2.10.0.jar:../lib/xml-apis-1.4.01.jar:../lib/zookeeper-3.4.10.jar:../lib-proprietary/jtds-1.2.4.jar:../lib-proprietary/mariadb-java-client-1.1.7.jar:../lib-proprietary/mysql-connector-java-5.1.33.jar:
<<<<<<

If it does not include your jar, it's because you did not rebuild after you 
placed the jar in the appropriate place.

Karl


On Wed, Feb 20, 2019 at 10:58 AM Karl Wright 
mailto:daddy...@gmail.com>> wrote:
The question is: how are you *starting* the processes?  and what process are 
you seeing the error from?  You should *not* need to make any changes to the 
configuration if you put the jar file in place before building.

Karl


On Wed, Feb 20, 2019 at 9:47 AM Bisonti Mario 
mailto:mario.biso...@vimar.com>> wrote:
Thanks, Karl, but I didn’t download the .jar files manually.

I compiled MCF 2.12 and I found the jar in the lib-proprietary folder.

I added in properties.xml the :

  

I tried to run initialize.sh on the db but I have the same error.




Da: Karl Wright mailto:daddy...@gmail.com>>
Inviato: mercoledì 20 febbraio 2019 15:15
A: user@manifoldcf.apache.org<mailto:user@manifoldcf.apache.org>
Oggetto: Re: Threw exception: 'Driver class not found: 
net.sourceforge.jtds.jdbc.Driver'

Hi Mario,

You can't just plop down a jar 

R: Threw exception: 'Driver class not found: net.sourceforge.jtds.jdbc.Driver'

2019-02-20 Thread Bisonti Mario
Thanks, Karl, but I didn’t download the .jar files manually.

I compiled MCF 2.12 and I found the jar in the lib-proprietary folder.

I added in properties.xml the :

  

I tried to run initialize.sh on the db but I have the same error.




Da: Karl Wright 
Inviato: mercoledì 20 febbraio 2019 15:15
A: user@manifoldcf.apache.org
Oggetto: Re: Threw exception: 'Driver class not found: 
net.sourceforge.jtds.jdbc.Driver'

Hi Mario,

You can't just plop down a jar in a directory and have this work, because 
ManifoldCF requires all JDBC drivers to be in the root classpath.  They are 
therefore built into the classpath, which should happen if you use the startup 
scripts.  Please review the "how-to-build-and-deploy" page.

Karl


On Wed, Feb 20, 2019 at 9:09 AM Bisonti Mario 
mailto:mario.biso...@vimar.com>> wrote:

Hallo, I would like to use MSSQL as repository.

I use /opt/manifoldcf/multiprocess-zk-example-proprietary/ so I added in 
properties.xml
.
.
  
.

My MCF 2.12 was compiled by me.
In the folder:
/opt/manifoldcf/lib-proprietary I have the files:
jtds-1.2.4.jar
mariadb-java-client-1.1.7.jar
mysql-connector-java-5.1.33.jar
jtds-1.2.4.jar

My tomcat startup script points to:
Environment='CATALINA_OPTS=-Xms512M -Xmx1024M -server -XX:+UseParallelGC 
-Dorg.apache.manifoldcf.configfile=/opt/manifoldcf/multiprocess-zk-example-proprietary/properties.xml'

I restarted all, but when I try to use a MSSQL repository I obtain:
Threw exception: 'Driver class not found: net.sourceforge.jtds.jdbc.Driver'

What could I check?
Thanks a lot

Mario



R: Job hang in aborting state for along time

2019-02-11 Thread Bisonti Mario
Today I migrated from postgres 9.3 to postgres 11.1 and it is working well.

I use MCF 2.12


Mario

Da: Karl Wright 
Inviato: lunedì 11 febbraio 2019 13:29
A: user@manifoldcf.apache.org
Oggetto: Re: Job hang in aborting state for along time

I know that 9.x works properly.  I expect later versions to also work properly, 
but I don't have any actual knowledge of that.
Karl


On Mon, Feb 11, 2019 at 5:32 AM Cihad Guzel 
mailto:cguz...@gmail.com>> wrote:
Hi Karl,

Which PostgreSQL version do you recommend to use with MCF 2.12?
 9.3, 9.4,10+ or 11+?

Thanks,
Cihad Guzel


Karl Wright mailto:daddy...@gmail.com>>, 11 Şub 2019 Pzt, 
04:01 tarihinde şunu yazdı:
No, it is not normal.  I expect that the MySQL transaction issues are causing 
lots of problems.

Karl


On Sun, Feb 10, 2019 at 7:13 PM Cihad Guzel 
mailto:cguz...@gmail.com>> wrote:
Hi Karl,

I use MySQL. I'll also try with PostgreSQL.

All docs were processed one day ago. Is it normal for the aborting process or 
finishing up threads to take so long?

Thanks,
Cihad Guzel


Karl Wright mailto:daddy...@gmail.com>>, 11 Şub 2019 Pzt, 
02:37 tarihinde şunu yazdı:
What database is this?
Basically, the "unexpected job status" means that the framework found something 
that should not have been possible, if the database had been properly enforcing 
ACID transactional constraints.  Is this MySQL?  Because if so it's known to 
have this problem.

It also looks like MCF is trying to recover from some other problem (usually a 
database error).  I can tell this because that's what the particular thread in 
question does.  In order to recover, all worker threads must finish up with 
what they are doing and then everything can resync -- and that's not working 
because the database isn't in agreement that all the worker threads are shut 
down.

Karl


On Sun, Feb 10, 2019 at 6:23 PM Cihad Guzel 
mailto:cguz...@gmail.com>> wrote:
Hi,

I am trying the external TIKA extractor. I have 4 continuous file crawler jobs. Two of 
them have the external Tika extractor. One of them has processed all its documents, which is 
only 98 docs. The job hangs in the "Aborting" state when I manually abort it. I 
waited more than 1 day and only then did the state change.

How can I find the problem?

mysql> SELECT status, errortext, type, startmethod, id FROM jobs;
++---+--+-+---+
| status | errortext | type | startmethod | id|
++---+--+-+---+
| N  | NULL  | C| D   | 1549371059083 |
| X  | NULL  | C| D   | 1549371135463 |
| N  | NULL  | C| D   | 1549371226082 |
| N  | NULL  | C| D   | 1549805173512 |
++---+--+-+---+

I'm not sure this is relevant, but I have a lot of error log entries like 
this:

ERROR 2019-02-10T22:47:28,178 (Job reset thread) - Exception tossed: Unexpected 
job status encountered: 33
org.apache.manifoldcf.core.interfaces.ManifoldCFException: Unexpected job 
status encountered: 33
at org.apache.manifoldcf.crawler.jobs.Jobs.returnJobToActive(Jobs.java:2145) 
~[mcf-pull-agent.jar:?]
at 
org.apache.manifoldcf.crawler.jobs.JobManager.resetJobs(JobManager.java:8608) 
~[mcf-pull-agent.jar:?]
at 
org.apache.manifoldcf.crawler.system.JobResetThread.run(JobResetThread.java:77) 
[mcf-pull-agent.jar:?]
ERROR 2019-02-10T22:47:28,182 (Job reset thread) - Exception tossed: Unexpected 
job status encountered: 33
org.apache.manifoldcf.core.interfaces.ManifoldCFException: Unexpected job 
status encountered: 33
at org.apache.manifoldcf.crawler.jobs.Jobs.returnJobToActive(Jobs.java:2145) 
~[mcf-pull-agent.jar:?]
at 
org.apache.manifoldcf.crawler.jobs.JobManager.resetJobs(JobManager.java:8608) 
~[mcf-pull-agent.jar:?]
at 
org.apache.manifoldcf.crawler.system.JobResetThread.run(JobResetThread.java:77) 
[mcf-pull-agent.jar:?]


Regards,
Cihad Güzel


Postgres db maintenance

2019-02-08 Thread Bisonti Mario
Hallo.
I noted that my postgres dbname is 28GB

Is there a way to clean old data or do I need to maintain all data in my db?
Thanks a lot

Mario
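(A minimal sketch for seeing where the space goes before deciding what to clean; it uses standard PostgreSQL catalog views, and "dbname" is the placeholder database name used elsewhere in these mails:)

# list the ten largest tables in the ManifoldCF database
sudo -u postgres psql dbname -c "
  SELECT relname, pg_size_pretty(pg_total_relation_size(relid)) AS total_size
  FROM pg_catalog.pg_statio_user_tables
  ORDER BY pg_total_relation_size(relid) DESC
  LIMIT 10;"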


R: Job slower

2019-01-28 Thread Bisonti Mario
I read that vacuum full isn’t good for Postgres versions >= 9.3, so I don’t 
execute it.
Today, after the weekly run of “vacuumdb --all --analyze”, I see that the job 
finished in 8 hours and a half, so less than the last execution.
I will monitor over the next days whether the job takes more time or not.

I am thinking of executing
“vacuumdb --all --analyze”
daily, if it could help me.
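(As a sketch, the weekly crontab entry quoted below could simply be duplicated with a daily schedule; the time is only an example:)

# run a daily analyze/vacuum at 02:15 as the postgres user
15 2 * * * vacuumdb --all --analyze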



Da: Karl Wright 
Inviato: venerdì 25 gennaio 2019 17:39
A: user@manifoldcf.apache.org
Oggetto: Re: Job slower

Did you try 'vacuum full'?

Karl


On Fri, Jan 25, 2019 at 3:47 AM Bisonti Mario 
mailto:mario.biso...@vimar.com>> wrote:
Hallo.
I use MCF 2.12 and postgresql 9.3.25 Solr 7.6 Tika 1.19 on Ubuntu Server 18.04

Weekly I scheduled by crontab  for the user postgres :
15 8 * * Sun vacuumdb --all --analyze
20 10 * * Sun reindexdb postgres
25 10 * * Sun reindexdb dbname

I see that the job that indexes 70 documents daily runs slower day by day.
It ran in 8 hours a few weeks ago, but now it runs in 12 hours and the number of 
documents has not changed much.

What could I do to speed up the job?

Thanks a lot
Mario


Job slower

2019-01-25 Thread Bisonti Mario
Hallo.
I use MCF 2.12 and postgresql 9.3.25 Solr 7.6 Tika 1.19 on Ubuntu Server 18.04

Weekly I scheduled by crontab  for the user postgres :
15 8 * * Sun vacuumdb --all --analyze
20 10 * * Sun reindexdb postgres
25 10 * * Sun reindexdb dbname

I see that the job that indexes 70 documents daily runs slower day by day.
It ran in 8 hours a few weeks ago, but now it runs in 12 hours and the number of 
documents has not changed much.

What could I do to speed up the job?

Thanks a lot
Mario


R: How to notify mail by SMTP

2019-01-15 Thread Bisonti Mario
Hallo.
I share a small script, in case it could be useful for anyone, to notify by mail 
if some jobs are not in “Done” status.
I schedule it in the morning, once a day.

checkjob.sh

#!/bin/bash

# Write the job statuses to a file
curl -u admin:admin http://localhost:8080/mcf-api-service/json/jobstatuses \
  > /tmp/curljobstatuses.txt

# Count the number of jobs
numberjobs=$(grep -o -i '"job_id"' /tmp/curljobstatuses.txt | wc -l)

# Count the number of jobs in status “done”
numberdone=$(grep -o -i '{"_type_":"status","_value_":"done"}' \
  /tmp/curljobstatuses.txt | wc -l)

# If the number of jobs in “Done” status is not equal to the number of jobs,
# send an email
if [ $numberdone -ne $numberjobs ]
then
    echo "There are ManifoldCF jobs not completed" | \
        mail -s "Job MCF not completed" mailrecei...@domain.net
fi

exit 0
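(One small hardening worth considering, a sketch that is not part of the original script: bail out and still send a mail if the API call itself fails, so an unreachable ManifoldCF instance is also reported:)

if ! curl -fsS -u admin:admin \
     http://localhost:8080/mcf-api-service/json/jobstatuses > /tmp/curljobstatuses.txt
then
    echo "ManifoldCF API not reachable" | \
        mail -s "Job MCF check failed" mailrecei...@domain.net
    exit 1
fi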




Da: Bisonti Mario
Inviato: giovedì 6 dicembre 2018 14:14
A: user@manifoldcf.apache.org
Oggetto: R: How to notify mail by SMTP

Hi Karl.
Yes, I created a “notification connection” but I don’t have the SMTP protocol 
available.
I see only “Host name” “Port” …


Da: Karl Wright mailto:daddy...@gmail.com>>
Inviato: giovedì 6 dicembre 2018 13:49
A: user@manifoldcf.apache.org<mailto:user@manifoldcf.apache.org>
Oggetto: Re: How to notify mail by SMTP

Hi Mario, there is an email notification connector.  Have you tried to 
configure that?

On Thu, Dec 6, 2018, 3:50 AM Bisonti Mario 
mailto:mario.biso...@vimar.com> wrote:
Hallo.
I would like to notify by mail the end of a job.
I use an SMTP server but I am not able to figure out how to configure this.


I read 
https://lists.apache.org/list.html?user@manifoldcf.apache.org:dfr=2016-4-1|dto=2019-4-30:smtp
 but I understand that there is no way to configure SMTP now, isn’t it?

Thanks a lot Mario


R: How to notify mail by SMTP

2018-12-06 Thread Bisonti Mario
Hi Karl.
Yes, I created a “notification connection” but I don’t have the SMTP protocol 
available.
I see only “Host name” “Port” …


Da: Karl Wright 
Inviato: giovedì 6 dicembre 2018 13:49
A: user@manifoldcf.apache.org
Oggetto: Re: How to notify mail by SMTP

Hi Mario, there is an email notification connector.  Have you tried to 
configure that?

On Thu, Dec 6, 2018, 3:50 AM Bisonti Mario 
mailto:mario.biso...@vimar.com> wrote:
Hallo.
I would like to notify by mail the end of a job.
I use an SMTP server but I am not able to figure out how to configure this.


I read 
https://lists.apache.org/list.html?user@manifoldcf.apache.org:dfr=2016-4-1|dto=2019-4-30:smtp
 but I understand that there is no way to configure SMTP now, isn’t it?

Thanks a lot Mario


How to notify mail by SMTP

2018-12-06 Thread Bisonti Mario
Hallo.
I would like to notify by mail the end of a job.
I use an SMTP server but I am not able to figure out how to configure this.


I read 
https://lists.apache.org/list.html?user@manifoldcf.apache.org:dfr=2016-4-1|dto=2019-4-30:smtp
 but I understand that there is no way to configure SMTP now, isn’t it?

Thanks a lot Mario


R: External Tika Server

2018-12-05 Thread Bisonti Mario
utside raster
   at 
sun.awt.image.IntegerInterleavedRaster.createWritableChild(IntegerInterleavedRaster.java:470)
   at 
sun.awt.image.IntegerInterleavedRaster.createChild(IntegerInterleavedRaster.java:514)
   at 
sun.java2d.pipe.GeneralCompositePipe.renderPathTile(GeneralCompositePipe.java:106)
   at sun.java2d.pipe.AAShapePipe.renderTiles(AAShapePipe.java:201)
   at sun.java2d.pipe.AAShapePipe.renderPath(AAShapePipe.java:159)
   at sun.java2d.pipe.AAShapePipe.fill(AAShapePipe.java:68)
   at 
sun.java2d.pipe.PixelToParallelogramConverter.fill(PixelToParallelogramConverter.java:164)
   at sun.java2d.pipe.ValidatePipe.fill(ValidatePipe.java:160)
   at sun.java2d.SunGraphics2D.fill(SunGraphics2D.java:2527)
   at 
org.apache.pdfbox.rendering.GroupGraphics.fill(GroupGraphics.java:418)
   at 
org.apache.pdfbox.rendering.PageDrawer.fillPath(PageDrawer.java:759)
   at 
org.apache.pdfbox.contentstream.operator.graphics.FillNonZeroRule.process(FillNonZeroRule.java:36)
   at 
org.apache.pdfbox.contentstream.PDFStreamEngine.processOperator(PDFStreamEngine.java:848)
   at 
org.apache.pdfbox.contentstream.PDFStreamEngine.processStreamOperators(PDFStreamEngine.java:503)
   at 
org.apache.pdfbox.contentstream.PDFStreamEngine.processTransparencyGroup(PDFStreamEngine.java:238)
   at 
org.apache.pdfbox.rendering.PageDrawer.access$1800(PageDrawer.java:112)
   at 
org.apache.pdfbox.rendering.PageDrawer$TransparencyGroup.(PageDrawer.java:1641)
   at 
org.apache.pdfbox.rendering.PageDrawer$TransparencyGroup.(PageDrawer.java:1484)
   at 
org.apache.pdfbox.rendering.PageDrawer.showTransparencyGroup(PageDrawer.java:1425)
   at 
org.apache.pdfbox.contentstream.operator.graphics.DrawObject.process(DrawObject.java:66)
   at 
org.apache.pdfbox.contentstream.PDFStreamEngine.processOperator(PDFStreamEngine.java:848)
   at 
org.apache.pdfbox.contentstream.PDFStreamEngine.processStreamOperators(PDFStreamEngine.java:503)
   at 
org.apache.pdfbox.contentstream.PDFStreamEngine.processStream(PDFStreamEngine.java:477)
   at 
org.apache.pdfbox.contentstream.PDFStreamEngine.processPage(PDFStreamEngine.java:150)
   at 
org.apache.pdfbox.rendering.PageDrawer.drawPage(PageDrawer.java:254)
   at 
org.apache.pdfbox.rendering.PDFRenderer.renderImage(PDFRenderer.java:245)
   at 
org.apache.tika.parser.pdf.AbstractPDF2XHTML.doOCROnCurrentPage(AbstractPDF2XHTML.java:329)
   at 
org.apache.tika.parser.pdf.AbstractPDF2XHTML.endPage(AbstractPDF2XHTML.java:418)
   at 
org.apache.tika.parser.pdf.PDF2XHTML.endPage(PDF2XHTML.java:162)
   at 
org.apache.pdfbox.text.PDFTextStripper.processPage(PDFTextStripper.java:393)
   at 
org.apache.tika.parser.pdf.PDF2XHTML.processPage(PDF2XHTML.java:147)
   at 
org.apache.pdfbox.text.PDFTextStripper.processPages(PDFTextStripper.java:319)
   at 
org.apache.pdfbox.text.PDFTextStripper.writeText(PDFTextStripper.java:266)
   at 
org.apache.tika.parser.pdf.PDF2XHTML.process(PDF2XHTML.java:117)
   at org.apache.tika.parser.pdf.PDFParser.parse(PDFParser.java:172)
   at 
org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
   ... 42 more
INFO  tika (application/pdf)
WARN  No Unicode mapping for arrowhookright (45) in font LSUPIB+CMMI10

On Tue, Dec 4, 2018 at 3:36 PM Bisonti Mario 
mailto:mario.biso...@vimar.com>> wrote:

In my tika server, I added:
-spawnChild -taskTimeoutMillis 100
To bypass the timeout problem

Mario


Da: Furkan KAMACI mailto:furkankam...@gmail.com>>
Inviato: martedì 4 dicembre 2018 10:16
A: user@manifoldcf.apache.org<mailto:user@manifoldcf.apache.org>; Rafa Haro 
mailto:rh...@apache.org>>
Oggetto: Re: External Tika Server

Hi Rafa,

I can parse the same document via the HTTP URL of Tika Server. I thought that there 
may be a timeout parameter within ManifoldCF while communicating with Tika 
Server :)

Kind Regards,
Furkan KAMACI

On Tue, Dec 4, 2018 at 12:13 PM Rafa Haro 
mailto:rh...@apache.org>> wrote:
Hi Furkan,

You seem to be getting a Timeout from Tesseract. This might be happening with 
large documents (too many pages). Maybe there is some configuration parameter 
for increasing timeouts that you can use at Tika side

Rafa

On Tue, Dec 4, 2018 at 9:58 AM Furkan KAMACI 
mailto:furkankam...@gmail.com>> wrote:
Hi,

I try to test external OCR capabilities of Tika Server with ManifoldCF 2.11. 
Documents are parsed when I curl documents into Tika Server directly. However, 
when I try to parse them via Tika Server I get this error for most of the 
documents (not all of them):

INFO  meta (application/

R: External Tika Server

2018-12-04 Thread Bisonti Mario

In my tika server, I added:
-spawnChild -taskTimeoutMillis 100
To bypass the timeout problem

Mario
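(For reference, a sketch of how those flags fit into the Tika Server start command; the jar name/version, port and timeout value are assumptions, while the flags themselves are the ones quoted above:)

# -spawnChild forks parsing into a child JVM that can be restarted if it hangs;
# -taskTimeoutMillis bounds how long a single parse may take
java -jar tika-server-1.19.jar -p 9998 -spawnChild -taskTimeoutMillis 300000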


Da: Furkan KAMACI 
Inviato: martedì 4 dicembre 2018 10:16
A: user@manifoldcf.apache.org; Rafa Haro 
Oggetto: Re: External Tika Server

Hi Rafa,

I can parse the same document via the HTTP URL of Tika Server. I thought that there 
may be a timeout parameter within ManifoldCF while communicating with Tika 
Server :)

Kind Regards,
Furkan KAMACI

On Tue, Dec 4, 2018 at 12:13 PM Rafa Haro 
mailto:rh...@apache.org>> wrote:
Hi Furkan,

You seem to be getting a Timeout from Tesseract. This might be happening with 
large documents (too many pages). Maybe there is some configuration parameter 
for increasing timeouts that you can use at Tika side

Rafa

On Tue, Dec 4, 2018 at 9:58 AM Furkan KAMACI 
mailto:furkankam...@gmail.com>> wrote:
Hi,

I try to test external OCR capabilities of Tika Server with ManifoldCF 2.11. 
Documents are parsed when I curl documents into Tika Server directly. However, 
when I try to parse them via Tika Server I get this error for most of the 
documents (not all of them):

INFO  meta (application/msword)
WARN  meta: Text extraction failed
org.apache.tika.exception.TikaException: Unable to extract PDF content
at org.apache.tika.parser.pdf.PDF2XHTML.process(PDF2XHTML.java:139)
at org.apache.tika.parser.pdf.PDFParser.parse(PDFParser.java:172)
at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
at org.apache.tika.parser.ParserDecorator.parse(ParserDecorator.java:188)
at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:143)
at org.apache.tika.server.resource.TikaResource.parse(TikaResource.java:402)
at 
org.apache.tika.server.resource.MetadataResource.parseMetadata(MetadataResource.java:126)
at 
org.apache.tika.server.resource.MetadataResource.getMetadata(MetadataResource.java:60)
at sun.reflect.GeneratedMethodAccessor5.invoke(Unknown Source)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.apache.cxf.service.invoker.AbstractInvoker.performInvocation(AbstractInvoker.java:179)
at 
org.apache.cxf.service.invoker.AbstractInvoker.invoke(AbstractInvoker.java:96)
at org.apache.cxf.jaxrs.JAXRSInvoker.invoke(JAXRSInvoker.java:193)
at org.apache.cxf.jaxrs.JAXRSInvoker.invoke(JAXRSInvoker.java:103)
at 
org.apache.cxf.interceptor.ServiceInvokerInterceptor$1.run(ServiceInvokerInterceptor.java:59)
at 
org.apache.cxf.interceptor.ServiceInvokerInterceptor.handleMessage(ServiceInvokerInterceptor.java:96)
at 
org.apache.cxf.phase.PhaseInterceptorChain.doIntercept(PhaseInterceptorChain.java:308)
at 
org.apache.cxf.transport.ChainInitiationObserver.onMessage(ChainInitiationObserver.java:121)
at 
org.apache.cxf.transport.http.AbstractHTTPDestination.invoke(AbstractHTTPDestination.java:267)
at 
org.apache.cxf.transport.http_jetty.JettyHTTPDestination.doService(JettyHTTPDestination.java:247)
at 
org.apache.cxf.transport.http_jetty.JettyHTTPHandler.handle(JettyHTTPHandler.java:79)
at 
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)
at 
org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:257)
at 
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1317)
at 
org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:205)
at 
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1219)
at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:144)
at 
org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:219)
at 
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)
at org.eclipse.jetty.server.Server.handle(Server.java:531)
at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:352)
at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:260)
at 
org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:281)
at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:102)
at org.eclipse.jetty.io.ChannelEndPoint$2.run(ChannelEndPoint.java:118)
at 
org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKill.java:333)
at 
org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:310)
at 
org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:168)
at 
org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:126)
at 
org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:366)
at 
org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:762)
at 
org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:680)
at java.lang.Thread.run(Thread.java:748)
Caused by: 

R: Job stuck without message

2018-11-28 Thread Bisonti Mario
I attached a row that corresponds to one of these documents in this 
mail



I obtain the pid of:
"/bin/bash -e  
/opt/manifoldcf/multiprocess-zk-example-proprietary/start-agents.sh"
The pid is 1233

I tried to use
sudo jstack -l 1233 > /tmp/jstack_start_agent.log

but I obtain:
1233: Unable to open socket file /proc/1233/cwd/.attach_pid1233: target process 
1233 doesn't respond within 10500ms or HotSpot VM not loaded

Perhaps this isn’t the right way to obtain a thread dump?
Excuse me but I am not a Linux expert..
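(A sketch of an alternative way to take the dump: the PID above belongs to the bash wrapper that ran start-agents.sh, not to the JVM itself, and the tomcat user is an assumption based on how initialize.sh was run earlier in these mails:)

# find the PID of the agents JVM rather than the shell wrapper
jps -l
# or: pgrep -f mcf-pull-agent

# run jstack as the same user that owns the JVM
sudo -u tomcat jstack -l <java-pid> > /tmp/jstack_agents.log

# or ask the JVM to write the thread dump to its own stdout/log
kill -3 <java-pid>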




Da: Karl Wright 
Inviato: mercoledì 28 novembre 2018 16:36
A: user@manifoldcf.apache.org
Oggetto: Re: Job stuck without message

Another thing you could do is get a thread dump of the agents process.

Karl


On Wed, Nov 28, 2018 at 10:35 AM Karl Wright 
mailto:daddy...@gmail.com>> wrote:
Can you look into the database jobqueue table and provide a row that 
corresponds to one of these documents?

Thanks,
Karl
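(A sketch of a query that would pull such a row; the jobqueue column name docid is an assumption about the ManifoldCF schema, and "dbname" is a placeholder:)

sudo -u postgres psql dbname -c \
  "SELECT * FROM jobqueue WHERE docid LIKE '%<part-of-the-document-path>%';"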


On Wed, Nov 28, 2018 at 10:26 AM Bisonti Mario 
mailto:mario.biso...@vimar.com>> wrote:
Hallo.
Repository has Max connection=10

In the “Document Status” report I see many items with:
State=“Not yet processed”
Status=”Ready for processing”
Scheduled=01-01-1970 01:00:00.000”
Scheduled Action=”Process”




But the job no longer makes progress.


Da: Karl Wright mailto:daddy...@gmail.com>>
Inviato: mercoledì 28 novembre 2018 16:03
A: user@manifoldcf.apache.org<mailto:user@manifoldcf.apache.org>
Oggetto: Re: Job stuck without message

"Pipe instances are busy" occurs because you are overloading the SMB access to 
your servers.  How many connections do you have allocated for your repository 
connection?  You probably want to limit this to 2-3 if you see this error a 
lot, and it appears you do.

" Tika Server: Tika Server rejects: Tika Server rejected document with the 
following reason: Unprocessable Entity" means the document is not properly 
formed XML.  The rejection will mean the document isn't indexed, but this will 
not stop the job.

If nothing is happening and you don't know why, I'd suggest looking at the 
Document Status report to figure out what documents are not being processed and 
why.  It is quite possible they are all in the process of being retried because 
of the "Pipe instances" issue above.

Karl

On Wed, Nov 28, 2018 at 9:46 AM Bisonti Mario 
mailto:mario.biso...@vimar.com>> wrote:
Hallo Karl.
I am taking up this ticket again because, now that I use ZooKeeper, my job works for 7 hours 
and then it is in a hung state.
It shows as running but it seems to be hanging, with no log output for 1 hour.

These are the last manifoldcf.log lines:


at jcifs.smb.SmbFile.open(SmbFile.java:1010) ~[jcifs-1.3.18.3.jar:?]
at jcifs.smb.SmbFileOutputStream.(SmbFileOutputStream.java:142) 
~[jcifs-1.3.18.3.jar:?]
at 
jcifs.smb.TransactNamedPipeOutputStream.(TransactNamedPipeOutputStream.java:32)
 ~[jcifs-1.3.18.3.jar:?]
at 
jcifs.smb.SmbNamedPipe.getNamedPipeOutputStream(SmbNamedPipe.java:187) 
~[jcifs-1.3.18.3.jar:?]
at 
jcifs.dcerpc.DcerpcPipeHandle.doSendFragment(DcerpcPipeHandle.java:68) 
~[jcifs-1.3.18.3.jar:?]
at jcifs.dcerpc.DcerpcHandle.sendrecv(DcerpcHandle.java:190) 
~[jcifs-1.3.18.3.jar:?]
at jcifs.dcerpc.DcerpcHandle.bind(DcerpcHandle.java:126) 
~[jcifs-1.3.18.3.jar:?]
at jcifs.dcerpc.DcerpcHandle.sendrecv(DcerpcHandle.java:140) 
~[jcifs-1.3.18.3.jar:?]
at jcifs.smb.SmbFile.getShareSecurity(SmbFile.java:2951) 
~[jcifs-1.3.18.3.jar:?]
at 
org.apache.manifoldcf.crawler.connectors.sharedrive.SharedDriveConnector.getFileShareSecurity(SharedDriveConnector.java:2446)
 [mcf-jcifs-connector.jar:?]
at 
org.apache.manifoldcf.crawler.connectors.sharedrive.SharedDriveConnector.getFileShareSecuritySet(SharedDriveConnector.java:1222)
 [mcf-jcifs-connector.jar:?]
at 
org.apache.manifoldcf.crawler.connectors.sharedrive.SharedDriveConnector.processDocuments(SharedDriveConnector.java:627)
 [mcf-jcifs-connector.jar:?]
at 
org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:399) 
[mcf-pull-agent.jar:?]
WARN 2018-11-28T14:46:21,524 (Worker thread '59') - JCIFS: Possibly transient 
exception detected on attempt 1 while getting share security: All pipe 
instances are busy.
jcifs.smb.SmbException: All pipe instances are busy.
at jcifs.smb.SmbTransport.checkStatus(SmbTransport.java:569) 
~[jcifs-1.3.18.3.jar:?]
at jcifs.smb.SmbTransport.send(SmbTransport.java:669) 
~[jcifs-1.3.18.3.jar:?]
at jcifs.smb.SmbSession.send(SmbSession.java:238) 
~[jcifs-1.3.18.3.jar:?]
at jcifs.smb.SmbTree.send(SmbTree.java:119) ~[jcifs-1.3.18.3.jar:?]
at jcifs.smb.SmbFile.send(SmbFile.java:776) ~[jcifs-1.3.18.3.jar:?]
at jcifs.smb.SmbFile.open0(SmbFile.java:993) ~[jcifs-1.3.18.3.jar:?]
at jcifs.smb.SmbFile.open(SmbFile.java:1010) ~[jcifs-1.3.18.3.jar:?]
at jcifs.smb

R: Job stuck without message

2018-11-28 Thread Bisonti Mario
.jar:?]
at jcifs.dcerpc.DcerpcHandle.bind(DcerpcHandle.java:126) 
~[jcifs-1.3.18.3.jar:?]
at jcifs.dcerpc.DcerpcHandle.sendrecv(DcerpcHandle.java:140) 
~[jcifs-1.3.18.3.jar:?]
at jcifs.smb.SmbFile.getShareSecurity(SmbFile.java:2951) 
~[jcifs-1.3.18.3.jar:?]
at 
org.apache.manifoldcf.crawler.connectors.sharedrive.SharedDriveConnector.getFileShareSecurity(SharedDriveConnector.java:2446)
 [mcf-jcifs-connector.jar:?]
at 
org.apache.manifoldcf.crawler.connectors.sharedrive.SharedDriveConnector.getFileShareSecuritySet(SharedDriveConnector.java:1222)
 [mcf-jcifs-connector.jar:?]
at 
org.apache.manifoldcf.crawler.connectors.sharedrive.SharedDriveConnector.processDocuments(SharedDriveConnector.java:627)
 [mcf-jcifs-connector.jar:?]
at 
org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:399) 
[mcf-pull-agent.jar:?]
WARN 2018-11-28T14:46:55,175 (Worker thread '83') - JCIFS: Possibly transient 
exception detected on attempt 1 while getting share security: All pipe 
instances are busy.
jcifs.smb.SmbException: All pipe instances are busy.
at jcifs.smb.SmbTransport.checkStatus(SmbTransport.java:569) 
~[jcifs-1.3.18.3.jar:?]
at jcifs.smb.SmbTransport.send(SmbTransport.java:669) 
~[jcifs-1.3.18.3.jar:?]
at jcifs.smb.SmbSession.send(SmbSession.java:238) 
~[jcifs-1.3.18.3.jar:?]
at jcifs.smb.SmbTree.send(SmbTree.java:119) ~[jcifs-1.3.18.3.jar:?]
at jcifs.smb.SmbFile.send(SmbFile.java:776) ~[jcifs-1.3.18.3.jar:?]
at jcifs.smb.SmbFile.open0(SmbFile.java:993) ~[jcifs-1.3.18.3.jar:?]
at jcifs.smb.SmbFile.open(SmbFile.java:1010) ~[jcifs-1.3.18.3.jar:?]
at jcifs.smb.SmbFileOutputStream.(SmbFileOutputStream.java:142) 
~[jcifs-1.3.18.3.jar:?]
at 
jcifs.smb.TransactNamedPipeOutputStream.(TransactNamedPipeOutputStream.java:32)
 ~[jcifs-1.3.18.3.jar:?]
at 
jcifs.smb.SmbNamedPipe.getNamedPipeOutputStream(SmbNamedPipe.java:187) 
~[jcifs-1.3.18.3.jar:?]
at 
jcifs.dcerpc.DcerpcPipeHandle.doSendFragment(DcerpcPipeHandle.java:68) 
~[jcifs-1.3.18.3.jar:?]
at jcifs.dcerpc.DcerpcHandle.sendrecv(DcerpcHandle.java:190) 
~[jcifs-1.3.18.3.jar:?]
at jcifs.dcerpc.DcerpcHandle.bind(DcerpcHandle.java:126) 
~[jcifs-1.3.18.3.jar:?]
at jcifs.dcerpc.DcerpcHandle.sendrecv(DcerpcHandle.java:140) 
~[jcifs-1.3.18.3.jar:?]
at jcifs.smb.SmbFile.getShareSecurity(SmbFile.java:2951) 
~[jcifs-1.3.18.3.jar:?]
at 
org.apache.manifoldcf.crawler.connectors.sharedrive.SharedDriveConnector.getFileShareSecurity(SharedDriveConnector.java:2446)
 [mcf-jcifs-connector.jar:?]
at 
org.apache.manifoldcf.crawler.connectors.sharedrive.SharedDriveConnector.getFileShareSecuritySet(SharedDriveConnector.java:1222)
 [mcf-jcifs-connector.jar:?]
at 
org.apache.manifoldcf.crawler.connectors.sharedrive.SharedDriveConnector.processDocuments(SharedDriveConnector.java:627)
 [mcf-jcifs-connector.jar:?]
at 
org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:399) 
[mcf-pull-agent.jar:?]



I don’t know what to check.

Tika server is ok, and it does not restart anymore.



Da: Karl Wright 
Inviato: martedì 6 novembre 2018 15:27
A: user@manifoldcf.apache.org
Oggetto: Re: Job stuck without message

I added a couple of questions to the ticket.  Please reply.

Thanks,
Karl


On Tue, Nov 6, 2018 at 8:56 AM Bisonti Mario 
mailto:mario.biso...@vimar.com>> wrote:
Thanks a lot, Karl.
I created a ticket.
https://issues.apache.org/jira/browse/CONNECTORS-1554


Thanks

Mario



Da: Karl Wright mailto:daddy...@gmail.com>>
Inviato: martedì 6 novembre 2018 14:28
A: user@manifoldcf.apache.org<mailto:user@manifoldcf.apache.org>
Oggetto: Re: Job stuck without message

ok, can you create a ticket?  Also, I'd appreciate it if you can look at the 
simple history for one of these documents; I need to see what happened to it 
last.

Thanks,
Karl


On Tue, Nov 6, 2018 at 7:32 AM Bisonti Mario 
mailto:mario.biso...@vimar.com>> wrote:
My version is 2.11




Da: Karl Wright mailto:daddy...@gmail.com>>
Inviato: martedì 6 novembre 2018 13:07
A: user@manifoldcf.apache.org<mailto:user@manifoldcf.apache.org>
Oggetto: Re: Job stuck without message

Thanks.
What version of ManifoldCF are you using?  We fixed a problem a while back 
having to do with documents that (because of error processing) get put into a 
"ready for processing" state which don't have any document priority set.  But 
this should have been addressed, certainly, by the most recent release and 
probably by 2.10 as well.

Karl


On Tue, Nov 6, 2018 at 5:43 AM Bisonti Mario 
mailto:mario.biso..

Error Job stop after repeatidly interruption

2018-11-08 Thread Bisonti Mario
Hallo.

I am trying to index more than 500 documents in a Windows Share.

It happens that the job is interrupted due to repeated interruptions.
This is the manifoldcf.log:
.
.
WARN 2018-11-07T21:53:25,296 (Worker thread '59') - Service interruption 
reported for job 1533797717712 connection 'WinShare': Tika down, retrying: 
Connect to localhost:9998 [localhost/127.0.0.1, localhost/0:0:0:0:0:0:0:1] 
failed: Connection refused (Connection refused)
WARN 2018-11-07T21:53:25,476 (Worker thread '89') - Service interruption 
reported for job 1533797717712 connection 'WinShare': Tika down, retrying: 
Connect to localhost:9998 [localhost/127.0.0.1, localhost/0:0:0:0:0:0:0:1] 
failed: Connection refused (Connection refused)
WARN 2018-11-07T21:53:33,814 (Worker thread '15') - JCIFS: Possibly transient 
exception detected on attempt 1 while getting share security: All pipe 
instances are busy.
jcifs.smb.SmbException: All pipe instances are busy.
at jcifs.smb.SmbTransport.checkStatus(SmbTransport.java:569) 
~[jcifs-1.3.18.3.jar:?]
at jcifs.smb.SmbTransport.send(SmbTransport.java:669) 
~[jcifs-1.3.18.3.jar:?]
at jcifs.smb.SmbSession.send(SmbSession.java:238) 
~[jcifs-1.3.18.3.jar:?]
at jcifs.smb.SmbTree.send(SmbTree.java:119) ~[jcifs-1.3.18.3.jar:?]
at jcifs.smb.SmbFile.send(SmbFile.java:776) ~[jcifs-1.3.18.3.jar:?]
at jcifs.smb.SmbFile.open0(SmbFile.java:993) ~[jcifs-1.3.18.3.jar:?]
at jcifs.smb.SmbFile.open(SmbFile.java:1010) ~[jcifs-1.3.18.3.jar:?]
at jcifs.smb.SmbFileOutputStream.(SmbFileOutputStream.java:142) 
~[jcifs-1.3.18.3.jar:?]
at 
jcifs.smb.TransactNamedPipeOutputStream.(TransactNamedPipeOutputStream.java:32)
 ~[jcifs-1.3.18.3.jar:?]
at 
jcifs.smb.SmbNamedPipe.getNamedPipeOutputStream(SmbNamedPipe.java:187) 
~[jcifs-1.3.18.3.jar:?]
at 
jcifs.dcerpc.DcerpcPipeHandle.doSendFragment(DcerpcPipeHandle.java:68) 
~[jcifs-1.3.18.3.jar:?]
at jcifs.dcerpc.DcerpcHandle.sendrecv(DcerpcHandle.java:190) 
~[jcifs-1.3.18.3.jar:?]
at jcifs.dcerpc.DcerpcHandle.bind(DcerpcHandle.java:126) 
~[jcifs-1.3.18.3.jar:?]
at jcifs.dcerpc.DcerpcHandle.sendrecv(DcerpcHandle.java:140) 
~[jcifs-1.3.18.3.jar:?]
at jcifs.smb.SmbFile.getShareSecurity(SmbFile.java:2951) 
~[jcifs-1.3.18.3.jar:?]
at 
org.apache.manifoldcf.crawler.connectors.sharedrive.SharedDriveConnector.getFileShareSecurity(SharedDriveConnector.java:2438)
 [mcf-jcifs-connector.jar:?]
at 
org.apache.manifoldcf.crawler.connectors.sharedrive.SharedDriveConnector.getFileShareSecuritySet(SharedDriveConnector.java:1221)
 [mcf-jcifs-connector.jar:?]
at 
org.apache.manifoldcf.crawler.connectors.sharedrive.SharedDriveConnector.processDocuments(SharedDriveConnector.java:627)
 [mcf-jcifs-connector.jar:?]
at 
org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:399) 
[mcf-pull-agent.jar:?]
WARN 2018-11-07T21:53:57,861 (Worker thread '12') - JCIFS: Possibly transient 
exception detected on attempt 1 while getting share security: All pipe 
instances are busy.
jcifs.smb.SmbException: All pipe instances are busy.
at jcifs.smb.SmbTransport.checkStatus(SmbTransport.java:569) 
~[jcifs-1.3.18.3.jar:?]
at jcifs.smb.SmbTransport.send(SmbTransport.java:669) 
~[jcifs-1.3.18.3.jar:?]
at jcifs.smb.SmbSession.send(SmbSession.java:238) 
~[jcifs-1.3.18.3.jar:?]
at jcifs.smb.SmbTree.send(SmbTree.java:119) ~[jcifs-1.3.18.3.jar:?]
at jcifs.smb.SmbFile.send(SmbFile.java:776) ~[jcifs-1.3.18.3.jar:?]
at jcifs.smb.SmbFile.open0(SmbFile.java:993) ~[jcifs-1.3.18.3.jar:?]
at jcifs.smb.SmbFile.open(SmbFile.java:1010) ~[jcifs-1.3.18.3.jar:?]
at jcifs.smb.SmbFileOutputStream.<init>(SmbFileOutputStream.java:142) 
~[jcifs-1.3.18.3.jar:?]
at 
jcifs.smb.TransactNamedPipeOutputStream.<init>(TransactNamedPipeOutputStream.java:32)
 ~[jcifs-1.3.18.3.jar:?]
at 
jcifs.smb.SmbNamedPipe.getNamedPipeOutputStream(SmbNamedPipe.java:187) 
~[jcifs-1.3.18.3.jar:?]
at 
jcifs.dcerpc.DcerpcPipeHandle.doSendFragment(DcerpcPipeHandle.java:68) 
~[jcifs-1.3.18.3.jar:?]
at jcifs.dcerpc.DcerpcHandle.sendrecv(DcerpcHandle.java:190) 
~[jcifs-1.3.18.3.jar:?]
at jcifs.dcerpc.DcerpcHandle.bind(DcerpcHandle.java:126) 
~[jcifs-1.3.18.3.jar:?]
at jcifs.dcerpc.DcerpcHandle.sendrecv(DcerpcHandle.java:140) 
~[jcifs-1.3.18.3.jar:?]
at jcifs.smb.SmbFile.getShareSecurity(SmbFile.java:2951) 
~[jcifs-1.3.18.3.jar:?]
at 
org.apache.manifoldcf.crawler.connectors.sharedrive.SharedDriveConnector.getFileShareSecurity(SharedDriveConnector.java:2438)
 [mcf-jcifs-connector.jar:?]
at 
org.apache.manifoldcf.crawler.connectors.sharedrive.SharedDriveConnector.getFileShareSecuritySet(SharedDriveConnector.java:1221)
 [mcf-jcifs-connector.jar:?]
at 
org.apache.manifoldcf.crawler.connectors.sharedrive.SharedDriveConnector.processDocuments(SharedDriveConnector.java:627)
 

R: Job stuck without message

2018-11-06 Thread Bisonti Mario
Thanks a lot, Karl.
I created a ticket.
https://issues.apache.org/jira/browse/CONNECTORS-1554


Thanks

Mario



Da: Karl Wright 
Inviato: martedì 6 novembre 2018 14:28
A: user@manifoldcf.apache.org
Oggetto: Re: Job stuck without message

ok, can you create a ticket?  Also, I'd appreciate it if you can look at the 
simple history for one of these documents; I need to see what happened to it 
last.

Thanks,
Karl


On Tue, Nov 6, 2018 at 7:32 AM Bisonti Mario 
mailto:mario.biso...@vimar.com>> wrote:
My version is 2.11




Da: Karl Wright mailto:daddy...@gmail.com>>
Inviato: martedì 6 novembre 2018 13:07
A: user@manifoldcf.apache.org<mailto:user@manifoldcf.apache.org>
Oggetto: Re: Job stuck without message

Thanks.
What version of ManifoldCF are you using?  We fixed a problem a while back 
having to do with documents that (because of error processing) get put into a 
"ready for processing" state which don't have any document priority set.  But 
this should have been addressed, certainly, by the most recent release and 
probably by 2.10 as well.

Karl


On Tue, Nov 6, 2018 at 5:43 AM Bisonti Mario 
mailto:mario.biso...@vimar.com>> wrote:
Hallo Karl.
When it hangs I see in the Queue status:

And in the Document Status:


Da: Karl Wright mailto:daddy...@gmail.com>>
Inviato: martedì 30 ottobre 2018 19:32
A: user@manifoldcf.apache.org<mailto:user@manifoldcf.apache.org>
Oggetto: Re: Job stuck without message

What I am interested in now is the Document Status report for any one of the 
documents that is 'stuck'.  The next crawl time value is the critical field.  
Can you include an example?

Karl

On Tue, Oct 30, 2018, 12:36 PM Bisonti Mario 
mailto:mario.biso...@vimar.com>> wrote:
Thanks a lot, Karl.

It happens that the job starts, works and indexes for an hour, and then it 
freezes. I have no error or waiting status in the Document Queue or Simple 
History, only “OK” statuses, so there are no failures.

I am not able to see any log errors other than those in manifoldcf.log.

Solr server is ok
Tika server is ok
Agent is ok
Tomcat with ManifoldCF is ok

I could check whether I can put, for example, the Tika server or Solr into info 
log mode.

Thanks..


Da: Karl Wright mailto:daddy...@gmail.com>>
Inviato: martedì 30 ottobre 2018 16:38
A: user@manifoldcf.apache.org<mailto:user@manifoldcf.apache.org>
Oggetto: Re: Job stuck without message

Hi Mario,

Please look at the Queue Status report to determine what is waiting and why it 
is waiting.
You can also look at the Simple History to see what has been happening.  If you 
are getting 100% failures in fetching documents then you may need to address 
this because your infrastructure is unhappy.  If the failure is something that 
indicates that the document is never going to be readable, that's a different 
problem and we might need to address that in the connector.

Karl


On Tue, Oct 30, 2018 at 10:33 AM Bisonti Mario 
mailto:mario.biso...@vimar.com>> wrote:

Thanks a lot Karl

Yes, I see many docs in the docs queue but they are inactive.

In fact I see that no more docs are indexed in Solr, and the job stays at the 
same number of active docs (35012)




Da: Karl Wright mailto:daddy...@gmail.com>>
Inviato: martedì 30 ottobre 2018 13:59
A: user@manifoldcf.apache.org<mailto:user@manifoldcf.apache.org>
Oggetto: Re: Job stuck without message

The reason the job is "stuck" is because:

' JCIFS: Possibly transient exception detected on attempt 1 while getting share 
security: All pipe instances are busy.'

This means that ManifoldCF will retry this document for a while before it gives 
up on it.  It appears to be stuck but it is not.  You can verify that by 
looking at the Document Queue report to see what is queued and what times the 
various documents will be retried.

Karl


On Tue, Oct 30, 2018 at 5:07 AM Bisonti Mario 
mailto:mario.biso...@vimar.com>> wrote:
Hallo.

I started a job that works for some minutes, and then it gets stuck.

In the manifoldcf.log I see:
at 
org.apache.manifoldcf.crawler.connectors.sharedrive.SharedDriveConnector.processDocuments(SharedDriveConnector.java:627)
 [mcf-jcifs-connector.jar:?]
at 
org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:399) 
[mcf-pull-agent.jar:?]
WARN 2018-10-30T09:21:31,440 (Worker thread '2') - Tika Server: Tika Server 
rejects: Tika Server rejected document with the following reason: Unprocessable 
Entity
WARN 2018-10-30T09:21:33,502 (Worker thread '14') - Tika Server: Tika Server 
rejects: Tika Server rejected document with the following reason: Unprocessable 
Entity
WARN 2018-10-30T09:21:37,725 (Worker thread '30') - Tika Server: Tika Server 
rejects: Tika Server rejected document with the following reason: Unprocessable 
Entity
WARN 2018-10-30T09:21:44,406 (Worker thread '49') - Tika Server: Tika Server 
rejects: Tika Server rejected document with the following reason: Unprocessable 

R: Job stuck without message

2018-11-06 Thread Bisonti Mario
My version is 2.11




Da: Karl Wright 
Inviato: martedì 6 novembre 2018 13:07
A: user@manifoldcf.apache.org
Oggetto: Re: Job stuck without message

Thanks.
What version of ManifoldCF are you using?  We fixed a problem a while back 
having to do with documents that (because of error processing) get put into a 
"ready for processing" state which don't have any document priority set.  But 
this should have been addressed, certainly, by the most recent release and 
probably by 2.10 as well.

Karl


On Tue, Nov 6, 2018 at 5:43 AM Bisonti Mario 
mailto:mario.biso...@vimar.com>> wrote:
Hallo Karl.
When it hangs I see in the Queue status:

And in the Document Status:


Da: Karl Wright mailto:daddy...@gmail.com>>
Inviato: martedì 30 ottobre 2018 19:32
A: user@manifoldcf.apache.org<mailto:user@manifoldcf.apache.org>
Oggetto: Re: Job stuck without message

What I am interested in now is the Document Status report for any one of the 
documents that is 'stuck'.  The next crawl time value is the critical field.  
Can you include an example?

Karl

On Tue, Oct 30, 2018, 12:36 PM Bisonti Mario 
mailto:mario.biso...@vimar.com>> wrote:
Thanks a lot, Karl.

It happens that the job starts, works and indexes for an hour, and then it 
freezes. I have no error or waiting status in the Document Queue or Simple 
History, only “OK” statuses, so there are no failures.

I am not able to see any log errors other than those in manifoldcf.log.

Solr server is ok
Tika server is ok
Agent is ok
Tomcat with ManifoldCF is ok

I could check whether I can put, for example, the Tika server or Solr into info 
log mode.

Thanks..


Da: Karl Wright mailto:daddy...@gmail.com>>
Inviato: martedì 30 ottobre 2018 16:38
A: user@manifoldcf.apache.org<mailto:user@manifoldcf.apache.org>
Oggetto: Re: Job stuck without message

Hi Mario,

Please look at the Queue Status report to determine what is waiting and why it 
is waiting.
You can also look at the Simple History to see what has been happening.  If you 
are getting 100% failures in fetching documents then you may need to address 
this because your infrastructure is unhappy.  If the failure is something that 
indicates that the document is never going to be readable, that's a different 
problem and we might need to address that in the connector.

Karl


On Tue, Oct 30, 2018 at 10:33 AM Bisonti Mario 
mailto:mario.biso...@vimar.com>> wrote:

Thanks a lot Karl

Yes, I see many docs in the docs queue but they are inactive.

In fact I see that no more docs are indexed in Solr, and the job stays at the 
same number of active docs (35012)




Da: Karl Wright mailto:daddy...@gmail.com>>
Inviato: martedì 30 ottobre 2018 13:59
A: user@manifoldcf.apache.org<mailto:user@manifoldcf.apache.org>
Oggetto: Re: Job stuck without message

The reason the job is "stuck" is because:

' JCIFS: Possibly transient exception detected on attempt 1 while getting share 
security: All pipe instances are busy.'

This means that ManifoldCF will retry this document for a while before it gives 
up on it.  It appears to be stuck but it is not.  You can verify that by 
looking at the Document Queue report to see what is queued and what times the 
various documents will be retried.

Karl


On Tue, Oct 30, 2018 at 5:07 AM Bisonti Mario 
mailto:mario.biso...@vimar.com>> wrote:
Hallo.

I started a job that works for some minutes, and then it gets stuck.

In the manifoldcf.log I see:
at 
org.apache.manifoldcf.crawler.connectors.sharedrive.SharedDriveConnector.processDocuments(SharedDriveConnector.java:627)
 [mcf-jcifs-connector.jar:?]
at 
org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:399) 
[mcf-pull-agent.jar:?]
WARN 2018-10-30T09:21:31,440 (Worker thread '2') - Tika Server: Tika Server 
rejects: Tika Server rejected document with the following reason: Unprocessable 
Entity
WARN 2018-10-30T09:21:33,502 (Worker thread '14') - Tika Server: Tika Server 
rejects: Tika Server rejected document with the following reason: Unprocessable 
Entity
WARN 2018-10-30T09:21:37,725 (Worker thread '30') - Tika Server: Tika Server 
rejects: Tika Server rejected document with the following reason: Unprocessable 
Entity
WARN 2018-10-30T09:21:44,406 (Worker thread '49') - Tika Server: Tika Server 
rejects: Tika Server rejected document with the following reason: Unprocessable 
Entity
WARN 2018-10-30T09:21:47,310 (Worker thread '15') - Tika Server: Tika Server 
rejects: Tika Server rejected document with the following reason: Unprocessable 
Entity
WARN 2018-10-30T09:21:52,000 (Worker thread '27') - Tika Server: Tika Server 
rejects: Tika Server rejected document with the following reason: Unprocessable 
Entity
WARN 2018-10-30T09:21:53,526 (Worker thread '15') - Tika Server: Tika Server 
rejects: Tika Server rejected document with the following reason: Unprocessable 
Entity
WARN 20

R: Job stuck without message

2018-10-30 Thread Bisonti Mario
Thanks a lot, Karl.

It happens that the job starts, works and indexes for an hour, and then it 
freezes. I have no error or waiting status in the Document Queue or Simple 
History, only “OK” statuses, so there are no failures.

I am not able to see any log errors other than those in manifoldcf.log.

Solr server is ok
Tika server is ok
Agent is ok
Tomcat with ManifoldCF is ok

I could check whether I can put, for example, the Tika server or Solr into info 
log mode.

Thanks..


Da: Karl Wright 
Inviato: martedì 30 ottobre 2018 16:38
A: user@manifoldcf.apache.org
Oggetto: Re: Job stuck without message

Hi Mario,

Please look at the Queue Status report to determine what is waiting and why it 
is waiting.
You can also look at the Simple History to see what has been happening.  If you 
are getting 100% failures in fetching documents then you may need to address 
this because your infrastructure is unhappy.  If the failure is something that 
indicates that the document is never going to be readable, that's a different 
problem and we might need to address that in the connector.

Karl


On Tue, Oct 30, 2018 at 10:33 AM Bisonti Mario 
mailto:mario.biso...@vimar.com>> wrote:

Thanks a lot Karl

Yes, I see many docs in the docs queue but they are inactive.

In fact I see that no more docs are indexed in Solr, and the job stays at the 
same number of active docs (35012)




Da: Karl Wright mailto:daddy...@gmail.com>>
Inviato: martedì 30 ottobre 2018 13:59
A: user@manifoldcf.apache.org<mailto:user@manifoldcf.apache.org>
Oggetto: Re: Job stuck without message

The reason the job is "stuck" is because:

' JCIFS: Possibly transient exception detected on attempt 1 while getting share 
security: All pipe instances are busy.'

This means that ManifoldCF will retry this document for a while before it gives 
up on it.  It appears to be stuck but it is not.  You can verify that by 
looking at the Document Queue report to see what is queued and what times the 
various documents will be retried.

Karl


On Tue, Oct 30, 2018 at 5:07 AM Bisonti Mario 
mailto:mario.biso...@vimar.com>> wrote:
Hallo.

I started a job that works for some minutes, and then it gets stuck.

In the manifoldcf.log I see:
at 
org.apache.manifoldcf.crawler.connectors.sharedrive.SharedDriveConnector.processDocuments(SharedDriveConnector.java:627)
 [mcf-jcifs-connector.jar:?]
at 
org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:399) 
[mcf-pull-agent.jar:?]
WARN 2018-10-30T09:21:31,440 (Worker thread '2') - Tika Server: Tika Server 
rejects: Tika Server rejected document with the following reason: Unprocessable 
Entity
WARN 2018-10-30T09:21:33,502 (Worker thread '14') - Tika Server: Tika Server 
rejects: Tika Server rejected document with the following reason: Unprocessable 
Entity
WARN 2018-10-30T09:21:37,725 (Worker thread '30') - Tika Server: Tika Server 
rejects: Tika Server rejected document with the following reason: Unprocessable 
Entity
WARN 2018-10-30T09:21:44,406 (Worker thread '49') - Tika Server: Tika Server 
rejects: Tika Server rejected document with the following reason: Unprocessable 
Entity
WARN 2018-10-30T09:21:47,310 (Worker thread '15') - Tika Server: Tika Server 
rejects: Tika Server rejected document with the following reason: Unprocessable 
Entity
WARN 2018-10-30T09:21:52,000 (Worker thread '27') - Tika Server: Tika Server 
rejects: Tika Server rejected document with the following reason: Unprocessable 
Entity
WARN 2018-10-30T09:21:53,526 (Worker thread '15') - Tika Server: Tika Server 
rejects: Tika Server rejected document with the following reason: Unprocessable 
Entity
WARN 2018-10-30T09:22:04,511 (Worker thread '3') - JCIFS: Possibly transient 
exception detected on attempt 1 while getting share security: All pipe 
instances are busy.
jcifs.smb.SmbException: All pipe instances are busy.
at jcifs.smb.SmbTransport.checkStatus(SmbTransport.java:569) 
~[jcifs-1.3.18.3.jar:?]
at jcifs.smb.SmbTransport.send(SmbTransport.java:669) 
~[jcifs-1.3.18.3.jar:?]
at jcifs.smb.SmbSession.send(SmbSession.java:238) 
~[jcifs-1.3.18.3.jar:?]
at jcifs.smb.SmbTree.send(SmbTree.java:119) ~[jcifs-1.3.18.3.jar:?]
at jcifs.smb.SmbFile.send(SmbFile.java:776) ~[jcifs-1.3.18.3.jar:?]
at jcifs.smb.SmbFile.open0(SmbFile.java:993) ~[jcifs-1.3.18.3.jar:?]
at jcifs.smb.SmbFile.open(SmbFile.java:1010) ~[jcifs-1.3.18.3.jar:?]
at jcifs.smb.SmbFileOutputStream.<init>(SmbFileOutputStream.java:142) 
~[jcifs-1.3.18.3.jar:?]
at 
jcifs.smb.TransactNamedPipeOutputStream.<init>(TransactNamedPipeOutputStream.java:32)
 ~[jcifs-1.3.18.3.jar:?]
at 
jcifs.smb.SmbNamedPipe.getNamedPipeOutputStream(SmbNamedPipe.java:187) 
~[jcifs-1.3.18.3.jar:?]
at 
jcifs.dcerpc.DcerpcPipeHandle.doSendFragment(DcerpcPipeHandle.java:68) 
~[jcifs-1.3.18.3.jar:?]
at jcifs.dcerpc.DcerpcHandle.sendrecv(DcerpcHand

R: Job stuck without message

2018-10-30 Thread Bisonti Mario

Thanks a lot Karl

Yes, I see many docs in the docs queue but they are inactive.

In fact I see that no more docs are indexed in Solr, and the job stays at the 
same number of active docs (35012)




Da: Karl Wright 
Inviato: martedì 30 ottobre 2018 13:59
A: user@manifoldcf.apache.org
Oggetto: Re: Job stuck without message

The reason the job is "stuck" is because:

' JCIFS: Possibly transient exception detected on attempt 1 while getting share 
security: All pipe instances are busy.'

This means that ManifoldCF will retry this document for a while before it gives 
up on it.  It appears to be stuck but it is not.  You can verify that by 
looking at the Document Queue report to see what is queued and what times the 
various documents will be retried.

Karl


On Tue, Oct 30, 2018 at 5:07 AM Bisonti Mario 
mailto:mario.biso...@vimar.com>> wrote:
Hallo.

I started a job that works for some minutes, and then it gets stuck.

In the manifoldcf.log I see:
at 
org.apache.manifoldcf.crawler.connectors.sharedrive.SharedDriveConnector.processDocuments(SharedDriveConnector.java:627)
 [mcf-jcifs-connector.jar:?]
at 
org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:399) 
[mcf-pull-agent.jar:?]
WARN 2018-10-30T09:21:31,440 (Worker thread '2') - Tika Server: Tika Server 
rejects: Tika Server rejected document with the following reason: Unprocessable 
Entity
WARN 2018-10-30T09:21:33,502 (Worker thread '14') - Tika Server: Tika Server 
rejects: Tika Server rejected document with the following reason: Unprocessable 
Entity
WARN 2018-10-30T09:21:37,725 (Worker thread '30') - Tika Server: Tika Server 
rejects: Tika Server rejected document with the following reason: Unprocessable 
Entity
WARN 2018-10-30T09:21:44,406 (Worker thread '49') - Tika Server: Tika Server 
rejects: Tika Server rejected document with the following reason: Unprocessable 
Entity
WARN 2018-10-30T09:21:47,310 (Worker thread '15') - Tika Server: Tika Server 
rejects: Tika Server rejected document with the following reason: Unprocessable 
Entity
WARN 2018-10-30T09:21:52,000 (Worker thread '27') - Tika Server: Tika Server 
rejects: Tika Server rejected document with the following reason: Unprocessable 
Entity
WARN 2018-10-30T09:21:53,526 (Worker thread '15') - Tika Server: Tika Server 
rejects: Tika Server rejected document with the following reason: Unprocessable 
Entity
WARN 2018-10-30T09:22:04,511 (Worker thread '3') - JCIFS: Possibly transient 
exception detected on attempt 1 while getting share security: All pipe 
instances are busy.
jcifs.smb.SmbException: All pipe instances are busy.
at jcifs.smb.SmbTransport.checkStatus(SmbTransport.java:569) 
~[jcifs-1.3.18.3.jar:?]
at jcifs.smb.SmbTransport.send(SmbTransport.java:669) 
~[jcifs-1.3.18.3.jar:?]
at jcifs.smb.SmbSession.send(SmbSession.java:238) 
~[jcifs-1.3.18.3.jar:?]
at jcifs.smb.SmbTree.send(SmbTree.java:119) ~[jcifs-1.3.18.3.jar:?]
at jcifs.smb.SmbFile.send(SmbFile.java:776) ~[jcifs-1.3.18.3.jar:?]
at jcifs.smb.SmbFile.open0(SmbFile.java:993) ~[jcifs-1.3.18.3.jar:?]
at jcifs.smb.SmbFile.open(SmbFile.java:1010) ~[jcifs-1.3.18.3.jar:?]
at jcifs.smb.SmbFileOutputStream.<init>(SmbFileOutputStream.java:142) 
~[jcifs-1.3.18.3.jar:?]
at 
jcifs.smb.TransactNamedPipeOutputStream.<init>(TransactNamedPipeOutputStream.java:32)
 ~[jcifs-1.3.18.3.jar:?]
at 
jcifs.smb.SmbNamedPipe.getNamedPipeOutputStream(SmbNamedPipe.java:187) 
~[jcifs-1.3.18.3.jar:?]
at 
jcifs.dcerpc.DcerpcPipeHandle.doSendFragment(DcerpcPipeHandle.java:68) 
~[jcifs-1.3.18.3.jar:?]
at jcifs.dcerpc.DcerpcHandle.sendrecv(DcerpcHandle.java:190) 
~[jcifs-1.3.18.3.jar:?]
at jcifs.dcerpc.DcerpcHandle.bind(DcerpcHandle.java:126) 
~[jcifs-1.3.18.3.jar:?]
at jcifs.dcerpc.DcerpcHandle.sendrecv(DcerpcHandle.java:140) 
~[jcifs-1.3.18.3.jar:?]
at jcifs.smb.SmbFile.getShareSecurity(SmbFile.java:2951) 
~[jcifs-1.3.18.3.jar:?]
at 
org.apache.manifoldcf.crawler.connectors.sharedrive.SharedDriveConnector.getFileShareSecurity(SharedDriveConnector.java:2438)
 [mcf-jcifs-connector.jar:?]
at 
org.apache.manifoldcf.crawler.connectors.sharedrive.SharedDriveConnector.getFileShareSecuritySet(SharedDriveConnector.java:1221)
 [mcf-jcifs-connector.jar:?]
at 
org.apache.manifoldcf.crawler.connectors.sharedrive.SharedDriveConnector.processDocuments(SharedDriveConnector.java:627)
 [mcf-jcifs-connector.jar:?]
at 
org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:399) 
[mcf-pull-agent.jar:?]
WARN 2018-10-30T09:22:10,359 (Worker thread '27') - Tika Server: Tika Server 
rejects: Tika Server rejected document with the following reason: Unprocessable 
Entity
WARN 2018-10-30T09:22:13,932 (Worker thread '12') - Tika Server: Tika Server 
rejects: Tika Server rejected document with the following reason: Unprocessable 
Entity
WARN 

Job stuck without message

2018-10-30 Thread Bisonti Mario
Hallo.

I started a job that works for some minutes, and then it gets stuck.

In the manifoldcf.log I see:
at 
org.apache.manifoldcf.crawler.connectors.sharedrive.SharedDriveConnector.processDocuments(SharedDriveConnector.java:627)
 [mcf-jcifs-connector.jar:?]
at 
org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:399) 
[mcf-pull-agent.jar:?]
WARN 2018-10-30T09:21:31,440 (Worker thread '2') - Tika Server: Tika Server 
rejects: Tika Server rejected document with the following reason: Unprocessable 
Entity
WARN 2018-10-30T09:21:33,502 (Worker thread '14') - Tika Server: Tika Server 
rejects: Tika Server rejected document with the following reason: Unprocessable 
Entity
WARN 2018-10-30T09:21:37,725 (Worker thread '30') - Tika Server: Tika Server 
rejects: Tika Server rejected document with the following reason: Unprocessable 
Entity
WARN 2018-10-30T09:21:44,406 (Worker thread '49') - Tika Server: Tika Server 
rejects: Tika Server rejected document with the following reason: Unprocessable 
Entity
WARN 2018-10-30T09:21:47,310 (Worker thread '15') - Tika Server: Tika Server 
rejects: Tika Server rejected document with the following reason: Unprocessable 
Entity
WARN 2018-10-30T09:21:52,000 (Worker thread '27') - Tika Server: Tika Server 
rejects: Tika Server rejected document with the following reason: Unprocessable 
Entity
WARN 2018-10-30T09:21:53,526 (Worker thread '15') - Tika Server: Tika Server 
rejects: Tika Server rejected document with the following reason: Unprocessable 
Entity
WARN 2018-10-30T09:22:04,511 (Worker thread '3') - JCIFS: Possibly transient 
exception detected on attempt 1 while getting share security: All pipe 
instances are busy.
jcifs.smb.SmbException: All pipe instances are busy.
at jcifs.smb.SmbTransport.checkStatus(SmbTransport.java:569) 
~[jcifs-1.3.18.3.jar:?]
at jcifs.smb.SmbTransport.send(SmbTransport.java:669) 
~[jcifs-1.3.18.3.jar:?]
at jcifs.smb.SmbSession.send(SmbSession.java:238) 
~[jcifs-1.3.18.3.jar:?]
at jcifs.smb.SmbTree.send(SmbTree.java:119) ~[jcifs-1.3.18.3.jar:?]
at jcifs.smb.SmbFile.send(SmbFile.java:776) ~[jcifs-1.3.18.3.jar:?]
at jcifs.smb.SmbFile.open0(SmbFile.java:993) ~[jcifs-1.3.18.3.jar:?]
at jcifs.smb.SmbFile.open(SmbFile.java:1010) ~[jcifs-1.3.18.3.jar:?]
at jcifs.smb.SmbFileOutputStream.<init>(SmbFileOutputStream.java:142) 
~[jcifs-1.3.18.3.jar:?]
at 
jcifs.smb.TransactNamedPipeOutputStream.<init>(TransactNamedPipeOutputStream.java:32)
 ~[jcifs-1.3.18.3.jar:?]
at 
jcifs.smb.SmbNamedPipe.getNamedPipeOutputStream(SmbNamedPipe.java:187) 
~[jcifs-1.3.18.3.jar:?]
at 
jcifs.dcerpc.DcerpcPipeHandle.doSendFragment(DcerpcPipeHandle.java:68) 
~[jcifs-1.3.18.3.jar:?]
at jcifs.dcerpc.DcerpcHandle.sendrecv(DcerpcHandle.java:190) 
~[jcifs-1.3.18.3.jar:?]
at jcifs.dcerpc.DcerpcHandle.bind(DcerpcHandle.java:126) 
~[jcifs-1.3.18.3.jar:?]
at jcifs.dcerpc.DcerpcHandle.sendrecv(DcerpcHandle.java:140) 
~[jcifs-1.3.18.3.jar:?]
at jcifs.smb.SmbFile.getShareSecurity(SmbFile.java:2951) 
~[jcifs-1.3.18.3.jar:?]
at 
org.apache.manifoldcf.crawler.connectors.sharedrive.SharedDriveConnector.getFileShareSecurity(SharedDriveConnector.java:2438)
 [mcf-jcifs-connector.jar:?]
at 
org.apache.manifoldcf.crawler.connectors.sharedrive.SharedDriveConnector.getFileShareSecuritySet(SharedDriveConnector.java:1221)
 [mcf-jcifs-connector.jar:?]
at 
org.apache.manifoldcf.crawler.connectors.sharedrive.SharedDriveConnector.processDocuments(SharedDriveConnector.java:627)
 [mcf-jcifs-connector.jar:?]
at 
org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:399) 
[mcf-pull-agent.jar:?]
WARN 2018-10-30T09:22:10,359 (Worker thread '27') - Tika Server: Tika Server 
rejects: Tika Server rejected document with the following reason: Unprocessable 
Entity
WARN 2018-10-30T09:22:13,932 (Worker thread '12') - Tika Server: Tika Server 
rejects: Tika Server rejected document with the following reason: Unprocessable 
Entity
WARN 2018-10-30T09:22:14,274 (Worker thread '23') - Tika Server: Tika Server 
rejects: Tika Server rejected document with the following reason: Unprocessable 
Entity
WARN 2018-10-30T09:22:19,933 (Worker thread '8') - Tika Server: Tika Server 
rejects: Tika Server rejected document with the following reason: Unprocessable 
Entity
WARN 2018-10-30T09:23:59,920 (Worker thread '39') - Tika Server: Tika Server 
rejects: Tika Server rejected document with the following reason: Unprocessable 
Entity
WARN 2018-10-30T09:24:09,059 (Worker thread '43') - Tika Server: Tika Server 
rejects: Tika Server rejected document with the following reason: Unprocessable 
Entity



What could I check?

Tika server works as standalone.

Could you help  me?

Thanks a lot

Mario
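
Since the standalone Tika server answers HTTP on port 9998, one way to see these
"Unprocessable Entity" rejections directly is to PUT one of the failing files at
the server and print the status code it returns. The following is only a rough,
self-contained sketch (Java 9+); the file path is an assumption, and the Accept
header simply asks for extracted plain text:

>>>>>>
// Rough sketch (not part of ManifoldCF): send one failing document straight to the
// standalone Tika server and print the HTTP status.  200 means text was extracted;
// 422 is the "Unprocessable Entity" rejection seen in the crawler log.
// The default file path below is an assumption for illustration only.
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class TikaPutCheck {
  public static void main(String[] args) throws IOException {
    Path doc = Paths.get(args.length > 0 ? args[0] : "/tmp/failing-document.xlsx");
    HttpURLConnection conn =
        (HttpURLConnection) new URL("http://localhost:9998/tika").openConnection();
    conn.setRequestMethod("PUT");
    conn.setDoOutput(true);
    conn.setRequestProperty("Accept", "text/plain"); // ask Tika for extracted plain text
    try (OutputStream out = conn.getOutputStream()) {
      Files.copy(doc, out); // stream the document body to the server
    }
    System.out.println("Tika server answered HTTP " + conn.getResponseCode());
    InputStream in = conn.getResponseCode() < 400
        ? conn.getInputStream() : conn.getErrorStream();
    if (in != null) {
      byte[] body = in.readAllBytes();
      System.out.println(new String(body, 0, Math.min(body.length, 500), "UTF-8"));
    }
  }
}
<<<<<<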



R: Add field to Output Solr

2018-10-16 Thread Bisonti Mario
I set in the job the connection:

  1.  Repository: WinShare
  2.  Transformation: Allowed Documents
  3.  Transformation: TikaExternal
  4.  Transformation: MetadataExtractor
  5.  Output: SolrShare

so, in Allowed Contents I put the allowed mime types and extensions

in the Field Mapping I added the last_author and author mappings, and I 
unchecked “keep all metadata”

in the metadata expressions I checked “Keep all incoming metadata” and “remove 
empty metadata values”

Obviously, my Solr schema has to contain the fields last_author and author, 
besides the fields that I specified in the output connection SolrShare Schema tab.


It works: in the Solr index I find the added fields last_author and author 
(where they aren’t empty).
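
For completeness: if last_author were not yet defined in the Solr schema, it could
also be added through the Schema API instead of editing the schema by hand. This
only applies to a managed schema, and the core name, URL and field type in the
sketch below are assumptions, not taken from this installation:

>>>>>>
// Sketch only: define a last_author field via the Solr Schema API.
// Assumes a managed schema, a core named core_share on localhost:8983,
// and the text_general field type; adjust these to the real installation.
import java.io.IOException;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

public class AddLastAuthorField {
  public static void main(String[] args) throws IOException {
    String body = "{ \"add-field\": { \"name\": \"last_author\","
        + " \"type\": \"text_general\", \"indexed\": true, \"stored\": true } }";
    HttpURLConnection conn = (HttpURLConnection)
        new URL("http://localhost:8983/solr/core_share/schema").openConnection();
    conn.setRequestMethod("POST");
    conn.setDoOutput(true);
    conn.setRequestProperty("Content-Type", "application/json");
    try (OutputStream out = conn.getOutputStream()) {
      out.write(body.getBytes(StandardCharsets.UTF_8)); // send the add-field command
    }
    System.out.println("Schema API returned HTTP " + conn.getResponseCode());
  }
}
<<<<<<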

I hope that my approach is the right way to set up the 
ManifoldCF-Solr-Tika architecture.

Thanks a lot, Karl for your patience..

Mario




Da: Karl Wright 
Inviato: martedì 16 ottobre 2018 13:11
A: user@manifoldcf.apache.org
Oggetto: Re: Add field to Output Solr

If it's not in your PDFs, Tika won't extract it.
If you merely want to copy another field, you can use the Metadata Adjuster 
transformer to do that.

Karl


On Tue, Oct 16, 2018 at 4:38 AM Bisonti Mario 
mailto:mario.biso...@vimar.com>> wrote:
Hallo
I am using Tika server as processor of file pdf, doc, etc

I configured:
In my solr output connection, so, when I index the documents I see the field:
id
last_modified
resourcename
content_type
allow_token_document
deny_token_document
allow_token_share
deny_token_share
stream_size
creator
deny_token_parent
allow_token_parent
content
_version_


In my schema of Solr, I have the field last_author that I would like to be 
indexed.
How can I add it?

Thanks a lot

Mario


Add field to Output Solr

2018-10-16 Thread Bisonti Mario
Hallo
I am using Tika server as processor of file pdf, doc, etc

I configured:
In my solr output connection, so, when I index the documents I see the field:
id
last_modified
resourcename
content_type
allow_token_document
deny_token_document
allow_token_share
deny_token_share
stream_size
creator
deny_token_parent
allow_token_parent
content
_version_


In my schema of Solr, I have the field last_author that I would like to be 
indexed.
How can I add it?

Thanks a lot

Mario


R: How to set Tika with ManifoldCF and Solr

2018-10-12 Thread Bisonti Mario
Hallo.
I downloaded and compiled ManifoldCF 2.11 from scratch and used the internal 
Tika, but I get the same problem.


Da: Karl Wright 
Inviato: giovedì 11 ottobre 2018 19:29
A: user@manifoldcf.apache.org
Oggetto: Re: How to set Tika with ManifoldCF and Solr

I cannot reproduce your problem.  Perhaps you can download a new instance and 
configure it from scratch using the embedded tika?  If that works it should be 
possible to figure out what the difference is.

Karl

On Thu, Oct 11, 2018, 12:23 PM Bisonti Mario 
mailto:mario.biso...@vimar.com>> wrote:
I tried to update Solr, Tika server and ManifoldCF to the latest versions.

I tried to add another transformation before the Tika transformation to filter 
the allowed documents, as you suggested in another discussion, but nothing changed: 
I always have the same Result Code: EXCLUDEDMIMETYPE


I read another discussion ( 
https://lists.apache.org/thread.html/66a3f9780bbcc98e404e25f5a0e56a8a6c007448642c3bc15a366ed2@%3Cuser.manifoldcf.apache.org%3E )
 but I don’t understand if they solved the issue

☹

Thanks a lot.
Mario






Da: Karl Wright mailto:daddy...@gmail.com>>
Inviato: giovedì 11 ottobre 2018 14:57
A: user@manifoldcf.apache.org<mailto:user@manifoldcf.apache.org>
Oggetto: Re: How to set Tika with ManifoldCF and Solr

When the "use extracting update handler" checkbox is UNCHECKED, the mime types 
you list are IGNORED.  Only "text" mime types are accepted by 
the Solr connection in that case.  But that is exactly what the Tika extractor 
sends along, and many other people do this, and I can make it work fine here, 
so I don't know what you are doing wrong.

Karl


On Thu, Oct 11, 2018 at 8:37 AM Bisonti Mario 
mailto:mario.biso...@vimar.com>> wrote:
This is my solr output connection:

I tried to put content_type as “Mime type field name:” but the result is always 
the same

Could it be that, with the flag unchecked, ManifoldCF doesn’t use the mime types 
specified?

I am using a snapshot version of ManifoldCF from three months ago.




Da: Karl Wright mailto:daddy...@gmail.com>>
Inviato: giovedì 11 ottobre 2018 14:20
A: user@manifoldcf.apache.org<mailto:user@manifoldcf.apache.org>
Oggetto: Re: How to set Tika with ManifoldCF and Solr

I confirmed that both the Tika Service transformer and the Tika transformer 
check the same exact mime type:

>>>>>>
  @Override
  public boolean checkMimeTypeIndexable(VersionContext pipelineDescription, 
String mimeType, IOutputCheckActivity checkActivity)
throws ManifoldCFException, ServiceInterruption
  {
// We should see what Tika will transform
// MHL
// Do a downstream check
return checkActivity.checkMimeTypeIndexable("text/plain;charset=utf-8");
  }
<<<<<<

So: please verify that your Solr connection is set up correctly and the "use 
extracting update handler" box is UNCHECKED.

Thanks,
Karl


On Thu, Oct 11, 2018 at 8:16 AM Karl Wright 
mailto:daddy...@gmail.com>> wrote:
When you uncheck the "use extracting update handler" checkbox, the Solr 
connection only accepts text/plain, and no binary formats.  The Tika extractor, 
though, should set the mime type always to "text/plain".  Since the Simple 
History says otherwise, I wonder if there's a problem with the external Tika 
extractor.  Perhaps you can try the internal one to get your pipeline working 
first?  If the external one does not send the right mime type, then we need to 
correct that so you should open a ticket.

Thanks,
Karl


On Thu, Oct 11, 2018 at 8:10 AM Bisonti Mario 
mailto:mario.biso...@vimar.com>> wrote:
Now the document isn’t ingested by solr because I obtain:

Solr connector rejected document due to mime type restrictions: 
(application/vnd.openxmlformats-officedocument.spreadsheetml.sheet)


But the mime type is on the tab


And the settings worked well when I used Tika inside solr.

Could you help me?
Thanks

Da: Bisonti Mario mailto:mario.biso...@vimar.com>>
Inviato: giovedì 11 ottobre 2018 14:03
A: user@manifoldcf.apache.org<mailto:user@manifoldcf.apache.org>
Oggetto: R: How to set Tika with ManifoldCF and Solr


My mistake…
As you wrote me I had to uncheck “use extracting update handler”

Now I have to understand the field mentioned in schema etc.

Da: Bisonti Mario mailto:mario.biso...@vimar.com>>
Inviato: giovedì 11 ottobre 2018 13:45
A: user@manifoldcf.apache.org<mailto:user@manifoldcf.apache.org>
Oggetto: R: How to set Tika with ManifoldCF and Solr

I se

R: How to set Tika with ManifoldCF and Solr

2018-10-11 Thread Bisonti Mario
This is my solr output connection:


I tried to put content_type as “Mime type field name:” but the result is always 
the same

Could it be that, with the flag unchecked, ManifoldCF doesn’t use the mime types 
specified?

I am using a snapshot version of ManifoldCF from three months ago.




Da: Karl Wright 
Inviato: giovedì 11 ottobre 2018 14:20
A: user@manifoldcf.apache.org
Oggetto: Re: How to set Tika with ManifoldCF and Solr

I confirmed that both the Tika Service transformer and the Tika transformer 
check the same exact mime type:

>>>>>>
  @Override
  public boolean checkMimeTypeIndexable(VersionContext pipelineDescription, 
String mimeType, IOutputCheckActivity checkActivity)
throws ManifoldCFException, ServiceInterruption
  {
// We should see what Tika will transform
// MHL
// Do a downstream check
return checkActivity.checkMimeTypeIndexable("text/plain;charset=utf-8");
  }
<<<<<<

So: please verify that your Solr connection is set up correctly and the "use 
extracting update handler" box is UNCHECKED.

Thanks,
Karl


On Thu, Oct 11, 2018 at 8:16 AM Karl Wright 
mailto:daddy...@gmail.com>> wrote:
When you uncheck the "use extracting update handler" checkbox, the Solr 
connection only accepts text/plain, and no binary formats.  The Tika extractor, 
though, should set the mime type always to "text/plain".  Since the Simple 
History says otherwise, I wonder if there's a problem with the external Tika 
extractor.  Perhaps you can try the internal one to get your pipeline working 
first?  If the external one does not send the right mime type, then we need to 
correct that so you should open a ticket.

Thanks,
Karl


On Thu, Oct 11, 2018 at 8:10 AM Bisonti Mario 
mailto:mario.biso...@vimar.com>> wrote:
Now the document isn’t ingested by solr because I obtain:

Solr connector rejected document due to mime type restrictions: 
(application/vnd.openxmlformats-officedocument.spreadsheetml.sheet)


But the mime type is on the tab


And the settings worked well when I used Tika inside solr.

Could you help me?
Thanks

Da: Bisonti Mario mailto:mario.biso...@vimar.com>>
Inviato: giovedì 11 ottobre 2018 14:03
A: user@manifoldcf.apache.org<mailto:user@manifoldcf.apache.org>
Oggetto: R: How to set Tika with ManifoldCF and Solr


My mistake…
As you wrote me I had to uncheck “use extracting update handler”

Now I have to understand the field mentioned in schema etc.

Da: Bisonti Mario mailto:mario.biso...@vimar.com>>
Inviato: giovedì 11 ottobre 2018 13:45
A: user@manifoldcf.apache.org<mailto:user@manifoldcf.apache.org>
Oggetto: R: How to set Tika with ManifoldCF and Solr

I see the job processed but without the document inside.
10-11-2018 13:32:25.649

job end

1539153700219(G_IT_Area_condivisa_Mario_XLSM)

0

1

10-11-2018 13:32:14.211

job start

1539153700219(G_IT_Area_condivisa_Mario_XLSM)

0

1





Do I have to uncheck “Use the Extract Update Handler” on my Solr output 
connection?






Da: Karl Wright mailto:daddy...@gmail.com>>
Inviato: giovedì 11 ottobre 2018 13:36
A: user@manifoldcf.apache.org<mailto:user@manifoldcf.apache.org>
Oggetto: Re: How to set Tika with ManifoldCF and Solr

Please have a look at your "Simple History" report to see why the documents 
aren't getting indexed.

Thanks,
Karl


On Thu, Oct 11, 2018 at 7:10 AM Bisonti Mario 
mailto:mario.biso...@vimar.com>> wrote:
Thanks Karl.
I tried, but it doesn’t index documents.
It seemes that it doesn’t see them?

Perhaps is the “Ignore Tika exception that I don’t know where to set in 
ManifoldCF  the problem?





Da: Karl Wright mailto:daddy...@gmail.com>>
Inviato: giovedì 11 ottobre 2018 12:24
A: user@manifoldcf.apache.org<mailto:user@manifoldcf.apache.org>
Oggetto: Re: How to set Tika with ManifoldCF and Solr

Hi Mario,

(1) When you use the Tika server externally, you do not get the boilerpipe HTML 
extractor available for configuration and use.  That is because it's external 
now.
(2) In your Solr connection, you want to uncheck the box that says "use 
extracting update handler", and you want to change the output handler from 
"/update/extract" to just "/update".

Karl
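
For reference, with the extracting update handler unchecked the connector posts
already-extracted text and metadata to the plain /update handler rather than raw
binaries to /update/extract. A minimal sketch of the kind of request that handler
accepts follows; the core name, document id and field values are invented for the
illustration:

>>>>>>
// Illustration only: a plain /update request carrying text that was already
// extracted upstream (by the Tika transformer), as opposed to /update/extract,
// which would receive the raw binary.  Core name, id and field values are made up.
import java.io.IOException;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

public class PlainUpdateExample {
  public static void main(String[] args) throws IOException {
    String docs = "[ { \"id\": \"file://server/share/example.docx\","
        + " \"content\": \"plain text already extracted by Tika\" } ]";
    HttpURLConnection conn = (HttpURLConnection) new URL(
        "http://localhost:8983/solr/core_share/update?commit=true").openConnection();
    conn.setRequestMethod("POST");
    conn.setDoOutput(true);
    conn.setRequestProperty("Content-Type", "application/json"); // JSON document list
    try (OutputStream out = conn.getOutputStream()) {
      out.write(docs.getBytes(StandardCharsets.UTF_8));
    }
    System.out.println("/update returned HTTP " + conn.getResponseCode());
  }
}
<<<<<<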


On Thu, Oct 11, 2018 at 4:45 AM Bisonti Mario 
mailto:mario.biso...@vimar.com>> wrote:
Hallo.
I would like to use Tika server started from command line into ManifoldCF so, 
ManifoldCF as Trasformation connector, process with Tika and index to the 
output connecto Solr.

I started Tika server:
java -jar /opt/tika/tika-server-1.19.1.jar

After, I created a transformation connection with TikaServer: localhost and 
Tika port 998 and connection works.

After, I created a job and in the Tab Connection I inserted the Transformation 
yet created Before the Output Solr.



Note that I don’t see the tab “Excepition” and “Boilerplate”
Why this?

Fu

R: How to set Tika with ManifoldCF and Solr

2018-10-11 Thread Bisonti Mario
Now the document isn’t ingested by solr because I obtain:


Solr connector rejected document due to mime type restrictions: 
(application/vnd.openxmlformats-officedocument.spreadsheetml.sheet)


But the mime type is on the tab


And the settings worked well when I used Tika inside solr.

Could you help me?
Thanks

Da: Bisonti Mario 
Inviato: giovedì 11 ottobre 2018 14:03
A: user@manifoldcf.apache.org
Oggetto: R: How to set Tika with ManifoldCF and Solr


My mistake…
As you wrote me I had to uncheck “use extracting update handler”

Now I have to understand the field mentioned in schema etc.

Da: Bisonti Mario mailto:mario.biso...@vimar.com>>
Inviato: giovedì 11 ottobre 2018 13:45
A: user@manifoldcf.apache.org<mailto:user@manifoldcf.apache.org>
Oggetto: R: How to set Tika with ManifoldCF and Solr

I see the job processed but without the document inside.
10-11-2018 13:32:25.649

job end

1539153700219(G_IT_Area_condivisa_Mario_XLSM)

0

1

10-11-2018 13:32:14.211

job start

1539153700219(G_IT_Area_condivisa_Mario_XLSM)

0

1





Do I have to uncheck “Use the Extract Update Handler” on my Solr output 
connection?






Da: Karl Wright mailto:daddy...@gmail.com>>
Inviato: giovedì 11 ottobre 2018 13:36
A: user@manifoldcf.apache.org<mailto:user@manifoldcf.apache.org>
Oggetto: Re: How to set Tika with ManifoldCF and Solr

Please have a look at your "Simple History" report to see why the documents 
aren't getting indexed.

Thanks,
Karl


On Thu, Oct 11, 2018 at 7:10 AM Bisonti Mario 
mailto:mario.biso...@vimar.com>> wrote:
Thanks Karl.
I tried, but it doesn’t index documents.
It seemes that it doesn’t see them?

Perhaps is the “Ignore Tika exception that I don’t know where to set in 
ManifoldCF  the problem?





Da: Karl Wright mailto:daddy...@gmail.com>>
Inviato: giovedì 11 ottobre 2018 12:24
A: user@manifoldcf.apache.org<mailto:user@manifoldcf.apache.org>
Oggetto: Re: How to set Tika with ManifoldCF and Solr

Hi Mario,

(1) When you use the Tika server externally, you do not get the boilerpipe HTML 
extractor available for configuration and use.  That is because it's external 
now.
(2) In your Solr connection, you want to uncheck the box that says "use 
extracting update handler", and you want to change the output handler from 
"/update/extract" to just "/update".

Karl


On Thu, Oct 11, 2018 at 4:45 AM Bisonti Mario 
mailto:mario.biso...@vimar.com>> wrote:
Hallo.
I would like to use Tika server started from command line into ManifoldCF so, 
ManifoldCF as Trasformation connector, process with Tika and index to the 
output connecto Solr.

I started Tika server:
java -jar /opt/tika/tika-server-1.19.1.jar

After, I created a transformation connection with TikaServer: localhost and 
Tika port 998 and connection works.

After, I created a job and in the Tab Connection I inserted the Transformation 
yet created Before the Output Solr.



Note that I don’t see the tab “Excepition” and “Boilerplate”
Why this?

Furthermore, if I start the job, I see that Solr hangs with exception:
2018-10-11 10:03:47.268 WARN  (qtp1223240796-17) [   x:core_share] 
o.e.j.s.HttpChannel /solr/core_share/update/extract
java.lang.NoClassDefFoundError: org/apache/tika/exception/TikaException
at java.lang.Class.forName0(Native Method) ~[?:?]
at java.lang.Class.forName(Class.java:374) ~[?:?]

infact, I renamed the tika .jar:
in the folder : solr/contrib/extraction/lib to be sure that solr doesn’t use 
Tika because I would like that Manifoldcfuses Tika buti t doesn’t work.

Have I to configure solr to don’t use Tika I suppose.

How to do this?

I see 
https://datafari.atlassian.net/wiki/spaces/DATAFARI/pages/107708451/Data+Extraction+Tika+Embedded+in+Solr+Deactivation+Configuration
 but I don’t have Datafari, so, in a standard Solr configuration, how can I 
deactivate Tika?

Thanks a lot

Mario



R: How to set Tika with ManifoldCF and Solr

2018-10-11 Thread Bisonti Mario

My mistake…
As you wrote me I had to uncheck “use extracting update handler”

Now I have to understand the field mentioned in schema etc.

Da: Bisonti Mario 
Inviato: giovedì 11 ottobre 2018 13:45
A: user@manifoldcf.apache.org
Oggetto: R: How to set Tika with ManifoldCF and Solr

I see the job processed but without the document inside.
10-11-2018 13:32:25.649

job end

1539153700219(G_IT_Area_condivisa_Mario_XLSM)

0

1

10-11-2018 13:32:14.211

job start

1539153700219(G_IT_Area_condivisa_Mario_XLSM)

0

1





Do I have to uncheck “Use the Extract Update Handler” on my Solr output 
connection?






Da: Karl Wright mailto:daddy...@gmail.com>>
Inviato: giovedì 11 ottobre 2018 13:36
A: user@manifoldcf.apache.org<mailto:user@manifoldcf.apache.org>
Oggetto: Re: How to set Tika with ManifoldCF and Solr

Please have a look at your "Simple History" report to see why the documents 
aren't getting indexed.

Thanks,
Karl


On Thu, Oct 11, 2018 at 7:10 AM Bisonti Mario 
mailto:mario.biso...@vimar.com>> wrote:
Thanks Karl.
I tried, but it doesn’t index documents.
It seemes that it doesn’t see them?

Perhaps is the “Ignore Tika exception that I don’t know where to set in 
ManifoldCF  the problem?





Da: Karl Wright mailto:daddy...@gmail.com>>
Inviato: giovedì 11 ottobre 2018 12:24
A: user@manifoldcf.apache.org<mailto:user@manifoldcf.apache.org>
Oggetto: Re: How to set Tika with ManifoldCF and Solr

Hi Mario,

(1) When you use the Tika server externally, you do not get the boilerpipe HTML 
extractor available for configuration and use.  That is because it's external 
now.
(2) In your Solr connection, you want to uncheck the box that says "use 
extracting update handler", and you want to change the output handler from 
"/update/extract" to just "/update".

Karl


On Thu, Oct 11, 2018 at 4:45 AM Bisonti Mario 
mailto:mario.biso...@vimar.com>> wrote:
Hallo.
I would like to use Tika server started from command line into ManifoldCF so, 
ManifoldCF as Trasformation connector, process with Tika and index to the 
output connecto Solr.

I started Tika server:
java -jar /opt/tika/tika-server-1.19.1.jar

After, I created a transformation connection with TikaServer: localhost and 
Tika port 998 and connection works.

After, I created a job and in the Tab Connection I inserted the Transformation 
yet created Before the Output Solr.



Note that I don’t see the tab “Excepition” and “Boilerplate”
Why this?

Furthermore, if I start the job, I see that Solr hangs with exception:
2018-10-11 10:03:47.268 WARN  (qtp1223240796-17) [   x:core_share] 
o.e.j.s.HttpChannel /solr/core_share/update/extract
java.lang.NoClassDefFoundError: org/apache/tika/exception/TikaException
at java.lang.Class.forName0(Native Method) ~[?:?]
at java.lang.Class.forName(Class.java:374) ~[?:?]

infact, I renamed the tika .jar:
in the folder : solr/contrib/extraction/lib to be sure that solr doesn’t use 
Tika because I would like that Manifoldcfuses Tika buti t doesn’t work.

Have I to configure solr to don’t use Tika I suppose.

How to do this?

I see 
https://datafari.atlassian.net/wiki/spaces/DATAFARI/pages/107708451/Data+Extraction+Tika+Embedded+in+Solr+Deactivation+Configuration
 but I don’t have Datafari, so, in a standard Solr configuration, how can I 
deactivate Tika?

Thanks a lot

Mario



R: How to set Tika with ManifoldCF and Solr

2018-10-11 Thread Bisonti Mario
I see the job processed but without the document inside.
10-11-2018 13:32:25.649

job end

1539153700219(G_IT_Area_condivisa_Mario_XLSM)

0

1

10-11-2018 13:32:14.211

job start

1539153700219(G_IT_Area_condivisa_Mario_XLSM)

0

1





Do I have to uncheck “Use the Extract Update Handler” on my Solr output 
connection?






Da: Karl Wright 
Inviato: giovedì 11 ottobre 2018 13:36
A: user@manifoldcf.apache.org
Oggetto: Re: How to set Tika with ManifoldCF and Solr

Please have a look at your "Simple History" report to see why the documents 
aren't getting indexed.

Thanks,
Karl


On Thu, Oct 11, 2018 at 7:10 AM Bisonti Mario 
mailto:mario.biso...@vimar.com>> wrote:
Thanks Karl.
I tried, but it doesn’t index documents.
It seemes that it doesn’t see them?

Perhaps is the “Ignore Tika exception that I don’t know where to set in 
ManifoldCF  the problem?





Da: Karl Wright mailto:daddy...@gmail.com>>
Inviato: giovedì 11 ottobre 2018 12:24
A: user@manifoldcf.apache.org<mailto:user@manifoldcf.apache.org>
Oggetto: Re: How to set Tika with ManifoldCF and Solr

Hi Mario,

(1) When you use the Tika server externally, you do not get the boilerpipe HTML 
extractor available for configuration and use.  That is because it's external 
now.
(2) In your Solr connection, you want to uncheck the box that says "use 
extracting update handler", and you want to change the output handler from 
"/update/extract" to just "/update".

Karl


On Thu, Oct 11, 2018 at 4:45 AM Bisonti Mario 
mailto:mario.biso...@vimar.com>> wrote:
Hallo.
I would like to use Tika server started from command line into ManifoldCF so, 
ManifoldCF as Trasformation connector, process with Tika and index to the 
output connecto Solr.

I started Tika server:
java -jar /opt/tika/tika-server-1.19.1.jar

After, I created a transformation connection with TikaServer: localhost and 
Tika port 998 and connection works.

After, I created a job and in the Tab Connection I inserted the Transformation 
yet created Before the Output Solr.


Note that I don’t see the tab “Excepition” and “Boilerplate”
Why this?

Furthermore, if I start the job, I see that Solr hangs with exception:
2018-10-11 10:03:47.268 WARN  (qtp1223240796-17) [   x:core_share] 
o.e.j.s.HttpChannel /solr/core_share/update/extract
java.lang.NoClassDefFoundError: org/apache/tika/exception/TikaException
at java.lang.Class.forName0(Native Method) ~[?:?]
at java.lang.Class.forName(Class.java:374) ~[?:?]

infact, I renamed the tika .jar:
in the folder : solr/contrib/extraction/lib to be sure that solr doesn’t use 
Tika because I would like that Manifoldcfuses Tika buti t doesn’t work.

Have I to configure solr to don’t use Tika I suppose.

How to do this?

I see 
https://datafari.atlassian.net/wiki/spaces/DATAFARI/pages/107708451/Data+Extraction+Tika+Embedded+in+Solr+Deactivation+Configuration
 but I don’t have Datafari, so, in a standard Solr configuration, how can I 
deactivate Tika?

Thanks a lot

Mario



R: How to set Tika with ManifoldCF and Solr

2018-10-11 Thread Bisonti Mario
Thanks Karl.
I tried, but it doesn’t index documents.
It seems that it doesn’t see them.

Perhaps the problem is the “Ignore Tika exception” option, which I don’t know 
where to set in ManifoldCF?





Da: Karl Wright 
Inviato: giovedì 11 ottobre 2018 12:24
A: user@manifoldcf.apache.org
Oggetto: Re: How to set Tika with ManifoldCF and Solr

Hi Mario,

(1) When you use the Tika server externally, you do not get the boilerpipe HTML 
extractor available for configuration and use.  That is because it's external 
now.
(2) In your Solr connection, you want to uncheck the box that says "use 
extracting update handler", and you want to change the output handler from 
"/update/extract" to just "/update".

Karl


On Thu, Oct 11, 2018 at 4:45 AM Bisonti Mario 
mailto:mario.biso...@vimar.com>> wrote:
Hallo.
I would like to use Tika server started from command line into ManifoldCF so, 
ManifoldCF as Trasformation connector, process with Tika and index to the 
output connecto Solr.

I started Tika server:
java -jar /opt/tika/tika-server-1.19.1.jar

After, I created a transformation connection with TikaServer: localhost and 
Tika port 998 and connection works.

After, I created a job and in the Tab Connection I inserted the Transformation 
yet created Before the Output Solr.


Note that I don’t see the tab “Excepition” and “Boilerplate”
Why this?

Furthermore, if I start the job, I see that Solr hangs with exception:
2018-10-11 10:03:47.268 WARN  (qtp1223240796-17) [   x:core_share] 
o.e.j.s.HttpChannel /solr/core_share/update/extract
java.lang.NoClassDefFoundError: org/apache/tika/exception/TikaException
at java.lang.Class.forName0(Native Method) ~[?:?]
at java.lang.Class.forName(Class.java:374) ~[?:?]

infact, I renamed the tika .jar:
in the folder : solr/contrib/extraction/lib to be sure that solr doesn’t use 
Tika because I would like that Manifoldcfuses Tika buti t doesn’t work.

Have I to configure solr to don’t use Tika I suppose.

How to do this?

I see 
https://datafari.atlassian.net/wiki/spaces/DATAFARI/pages/107708451/Data+Extraction+Tika+Embedded+in+Solr+Deactivation+Configuration
 but I don’t have Datafari, so, in a standard Solr configuration, how can I 
deactivate Tika?

Thanks a lot

Mario



How to set Tika with ManifoldCF and Solr

2018-10-11 Thread Bisonti Mario
Hallo.
I would like to use a Tika server started from the command line with ManifoldCF, 
so that ManifoldCF, through a transformation connector, processes documents with 
Tika and indexes them to the Solr output connector.

I started Tika server:
java -jar /opt/tika/tika-server-1.19.1.jar

Afterwards, I created a transformation connection with Tika server host localhost 
and Tika port 9998, and the connection works.
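
A quick way to double-check that the host and port configured in the transformation
connection really point at a running Tika server is a plain GET against the /tika
endpoint, which normally answers with a short greeting. A minimal sketch, assuming
the localhost/9998 values above:

>>>>>>
// Minimal sketch: confirm a Tika server is listening on the configured host/port.
// A "Connection refused" here is what the crawler would also see.
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;

public class TikaLivenessCheck {
  public static void main(String[] args) throws IOException {
    HttpURLConnection conn =
        (HttpURLConnection) new URL("http://localhost:9998/tika").openConnection();
    conn.setRequestMethod("GET");
    System.out.println("Tika server answered HTTP " + conn.getResponseCode());
  }
}
<<<<<<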

Afterwards, I created a job, and in the Connection tab I inserted the 
transformation created earlier, before the Solr output.


Note that I don’t see the “Exception” and “Boilerplate” tabs.
Why is this?

Furthermore, if I start the job, I see that Solr hangs with exception:
2018-10-11 10:03:47.268 WARN  (qtp1223240796-17) [   x:core_share] 
o.e.j.s.HttpChannel /solr/core_share/update/extract
java.lang.NoClassDefFoundError: org/apache/tika/exception/TikaException
at java.lang.Class.forName0(Native Method) ~[?:?]
at java.lang.Class.forName(Class.java:374) ~[?:?]

In fact, I renamed the Tika .jar files in the folder solr/contrib/extraction/lib 
to be sure that Solr doesn’t use Tika, because I would like ManifoldCF to use 
Tika, but it doesn’t work.

I suppose I have to configure Solr not to use Tika.

How to do this?

I see 
https://datafari.atlassian.net/wiki/spaces/DATAFARI/pages/107708451/Data+Extraction+Tika+Embedded+in+Solr+Deactivation+Configuration
 but I don’t have Datafari, so, in a standard Solr configuration, how can I 
deactivate Tika?

Thanks a lot

Mario




R: Different time in Simple History Report

2018-08-14 Thread Bisonti Mario
I used those commands:

sudo service tomcat start
cd /home/administrator/mcfsorce/
sudo svn co https://svn.apache.org/repos/asf/manifoldcf/trunk
cd trunk
sudo ant make-core-deps
sudo ant make-deps
sudo ant build
cd /opt/
ll
sudo mv manifoldcf manifoldcf_ok
sudo mkdir /opt/manifoldcf
sudo cp -a /home/administrator/mcfsorce/trunk/dist/. /opt/manifoldcf/

sudo cp 
/opt/manifoldcf_ok/multiprocess-file-example-proprietary/options.env.unix 
/opt/manifoldcf/multiprocess-file-example-proprietary/
sudo cp /opt/manifoldcf_ok/multiprocess-file-example-proprietary/stop-agents.sh 
/opt/manifoldcf/multiprocess-file-example-proprietary/
sudo cp 
/opt/manifoldcf_ok/multiprocess-file-example-proprietary/start-agents.sh 
/opt/manifoldcf/multiprocess-file-example-proprietary/
sudo cp 
/opt/manifoldcf_ok/multiprocess-file-example-proprietary/start-agents-2.sh 
/opt/manifoldcf/multiprocess-file-example-proprietary/
sudo cp /opt/manifoldcf_ok/multiprocess-file-example-proprietary/properties.xml 
/opt/manifoldcf/multiprocess-file-example-proprietary/

sudo service tomcat start


I obtained some warnings in the compilation but no errors.



Da: Karl Wright 
Inviato: martedì 14 agosto 2018 16:03
A: user@manifoldcf.apache.org
Oggetto: Re: Different time in Simple History Report

It does not look at all like you have properly built with the changed source 
code.

Karl


On Tue, Aug 14, 2018 at 9:51 AM Bisonti Mario 
mailto:mario.biso...@vimar.com>> wrote:
I am not able to check the file..

Furthermore, I try to explain better the behaviour that I see with the 
attachment



Da: Karl Wright mailto:daddy...@gmail.com>>
Inviato: martedì 14 agosto 2018 15:25
A: user@manifoldcf.apache.org<mailto:user@manifoldcf.apache.org>
Oggetto: Re: Different time in Simple History Report

There were a number of files committed.


On Tue, Aug 14, 2018 at 9:02 AM Bisonti Mario 
mailto:mario.biso...@vimar.com>> wrote:
I don’t obtain a different result.
Where could I check the commit?
Which file was committed, so I can check it?
Thanks a lot


Da: Karl Wright mailto:daddy...@gmail.com>>
Inviato: martedì 14 agosto 2018 14:17
A: user@manifoldcf.apache.org<mailto:user@manifoldcf.apache.org>
Oggetto: Re: Different time in Simple History Report

Ok, I committed code that insures that all times displayed in reports are in 
the browser client timezone.  The same timezone is used throughout.  Hopefully 
this will clear up any remaining confusion.

Karl


On Tue, Aug 14, 2018 at 6:33 AM Bisonti Mario 
mailto:mario.biso...@vimar.com>> wrote:
This from 
http://browserspy.dk/date.php





Da: Karl Wright mailto:daddy...@gmail.com>>
Inviato: martedì 14 agosto 2018 12:20
A: user@manifoldcf.apache.org<mailto:user@manifoldcf.apache.org>
Oggetto: Re: Different time in Simple History Report

Hi Mario,

I did not change how the Start Time filter was defined in the UI at all.  The 
only change was in how the report data was presented.

Can you please check your browser time?

Karl


On Tue, Aug 14, 2018 at 6:13 AM Bisonti Mario 
mailto:mario.biso...@vimar.com>> wrote:
Hi Karl.

In my environment, browser time and server timezone are the same.

Before the modification, I saw that the “Start Time:” filter was right, i.e. in 
the browser timezone, which is the same as the server timezone (Europe/Rome); 
instead the report had the “Start Time” column equal to my timezone plus two hours.

After the modification, I see the “Start Time:” filter at my timezone less one hour, 
and the report has the “Start Time” column equal to my timezone less two hours.

So, for me, since the server timezone equals the browser timezone, the right behaviour 
would be for both the “Start Time:” filter and the “Start Time” column to show that 
same timezone.



Da: Karl Wright mailto:daddy...@gmail.com>>
Inviato: martedì 14 agosto 2018 12:04
A: user@manifoldcf.apache.org<mailto:user@manifoldcf.apache.org>
Oggetto: Re: Different time in Simple History Report

Hi Mario,

The UI uses browser time exclusively.  The columns displayed, however, are 
based on the server's time.  This is how MCF functioned up until the year 2016, 
when the columns were changed to display UTC instead.  I reverted that behavior 
with my commit.

I am not sure I know what it is you are asking for me to do here.  Do you want 
all times displayed in UTC?  Or, all times displayed in the browser timezone?  
Or all times displayed in the server timezone?

Karl


On Tue, Aug 14, 2018 at 5:13 AM Bisonti Mario 
mailto:mario.biso...@vimar.com>> wrote:
Hallo.
I compiled, but with this version I see the time two hours behind the right time, 
and the report shows the wrong time compared to the actual time, as you can see in 
the attachment.

S

R: Different time in Simple History Report

2018-08-14 Thread Bisonti Mario
I don’t obtain a different result.
Where could I check the commit?
Which file was committed, so I can check it?
Thanks a lot


Da: Karl Wright 
Inviato: martedì 14 agosto 2018 14:17
A: user@manifoldcf.apache.org
Oggetto: Re: Different time in Simple History Report

Ok, I committed code that insures that all times displayed in reports are in 
the browser client timezone.  The same timezone is used throughout.  Hopefully 
this will clear up any remaining confusion.

Karl


On Tue, Aug 14, 2018 at 6:33 AM Bisonti Mario 
mailto:mario.biso...@vimar.com>> wrote:
This from 
http://browserspy.dk/date.php





Da: Karl Wright mailto:daddy...@gmail.com>>
Inviato: martedì 14 agosto 2018 12:20
A: user@manifoldcf.apache.org<mailto:user@manifoldcf.apache.org>
Oggetto: Re: Different time in Simple History Report

Hi Mario,

I did not change how the Start Time filter was defined in the UI at all.  The 
only change was in how the report data was presented.

Can you please check your browser time?

Karl


On Tue, Aug 14, 2018 at 6:13 AM Bisonti Mario 
mailto:mario.biso...@vimar.com>> wrote:
Hi Karl.

In my environment, browser time and server timezone are the same.

Before the modification, I saw that the “Start Time:” filter was right, i.e. in 
the browser timezone, which is the same as the server timezone (Europe/Rome); 
instead the report had the “Start Time” column equal to my timezone plus two hours.

After the modification, I see the “Start Time:” filter at my timezone less one hour, 
and the report has the “Start Time” column equal to my timezone less two hours.

So, for me, since the server timezone equals the browser timezone, the right behaviour 
would be for both the “Start Time:” filter and the “Start Time” column to show that 
same timezone.



Da: Karl Wright mailto:daddy...@gmail.com>>
Inviato: martedì 14 agosto 2018 12:04
A: user@manifoldcf.apache.org<mailto:user@manifoldcf.apache.org>
Oggetto: Re: Different time in Simple History Report

Hi Mario,

The UI uses browser time exclusively.  The columns displayed, however, are 
based on the server's time.  This is how MCF functioned up until the year 2016, 
when the columns were changed to display UTC instead.  I reverted that behavior 
with my commit.

I am not sure I know what it is you are asking for me to do here.  Do you want 
all times displayed in UTC?  Or, all times displayed in the browser timezone?  
Or all times displayed in the server timezone?

Karl


On Tue, Aug 14, 2018 at 5:13 AM Bisonti Mario 
mailto:mario.biso...@vimar.com>> wrote:
Hallo.
I compiled, but with this version I see the time two hours behind the right time, 
and the report shows the wrong time compared to the actual time, as you can see in 
the attachment.

So I rolled back to the previous version.





Da: Karl Wright mailto:daddy...@gmail.com>>
Inviato: venerdì 10 agosto 2018 17:23
A: user@manifoldcf.apache.org<mailto:user@manifoldcf.apache.org>
Oggetto: Re: Different time in Simple History Report

Try it now.

Karl


On Fri, Aug 10, 2018 at 10:57 AM Bisonti Mario 
mailto:mario.biso...@vimar.com>> wrote:
Yes
sudo ant make-core-deps
Buildfile: /home/administrator/mcfsorce/trunk/build.xml
Trying to override old definition of task javac

BUILD FAILED
/home/administrator/mcfsorce/trunk/build.xml:2929: taskdef class 
de.thetaphi.forbiddenapis.ant.AntTask cannot be found
using the classloader AntClassLoader[]

Total time: 0 seconds
administrator@sengvivv01:~/mcfsorce/trunk$

But the downloaded trunk directory is very small, whereas the previous trunk 
checkout was bigger:
administrator@sengvivv01:~/mcfsorce$ du -sh tr*
121M    trunk
1.8G    trunk_19062018




Da: Karl Wright mailto:daddy...@gmail.com>>
Inviato: venerdì 10 agosto 2018 16:47
A: user@manifoldcf.apache.org<mailto:user@manifoldcf.apache.org>
Oggetto: Re: Different time in Simple History Report

Did you first do:

ant make-core-deps

?

Karl


On Fri, Aug 10, 2018 at 5:04 AM Bisonti Mario 
mailto:mario.biso...@vimar.com>> wrote:
Thanks Karl.
I tried to compile the trunk version but I obtain:

Buildfile: /home/administrator/mcfsorce/trunk/build.xml
Trying to override old definition of task javac

BUILD FAILED
/home/administrator/mcfsorce/trunk/build.xml:2929: taskdef class 
de.thetaphi.forbiddenapis.ant.AntTask cannot be found
using the classloader AntClassLoader[]





Da: Karl Wright mailto:daddy...@gmail.com>>
Inviato: venerdì 10 agosto 2018 10:53
A: user@manifoldcf.apache.org<mailto:user@manifoldcf.apache.org>
Oggetto: Re: Different time in Simple History Report

I've committed a change to trunk which will restore the pre-2016 behavior.

Karl

On Fri, Aug 10, 2018 at 3:40 AM Karl Wrigh

R: Different time in Simple History Report

2018-08-14 Thread Bisonti Mario
This from http://browserspy.dk/date.php





Da: Karl Wright 
Inviato: martedì 14 agosto 2018 12:20
A: user@manifoldcf.apache.org
Oggetto: Re: Different time in Simple History Report

Hi Mario,

I did not change how the Start Time filter was defined in the UI at all.  The 
only change was in how the report data was presented.

Can you please check your browser time?

Karl


On Tue, Aug 14, 2018 at 6:13 AM Bisonti Mario 
mailto:mario.biso...@vimar.com>> wrote:
Hi Karl.

In my environment, browser time and server timezone are the same.

Before the modification, I saw that the “Start Time:” filter was right, i.e. in 
the browser timezone, which is the same as the server timezone (Europe/Rome); 
instead the report had the “Start Time” column equal to my timezone plus two hours.

After the modification, I see the “Start Time:” filter at my timezone less one hour, 
and the report has the “Start Time” column equal to my timezone less two hours.

So, for me, since the server timezone equals the browser timezone, the right behaviour 
would be for both the “Start Time:” filter and the “Start Time” column to show that 
same timezone.



Da: Karl Wright mailto:daddy...@gmail.com>>
Inviato: martedì 14 agosto 2018 12:04
A: user@manifoldcf.apache.org<mailto:user@manifoldcf.apache.org>
Oggetto: Re: Different time in Simple History Report

Hi Mario,

The UI uses browser time exclusively.  The columns displayed, however, are 
based on the server's time.  This is how MCF functioned up until the year 2016, 
when the columns were changed to display UTC instead.  I reverted that behavior 
with my commit.

I am not sure I know what it is you are asking for me to do here.  Do you want 
all times displayed in UTC?  Or, all times displayed in the browser timezone?  
Or all times displayed in the server timezone?

Karl


On Tue, Aug 14, 2018 at 5:13 AM Bisonti Mario 
mailto:mario.biso...@vimar.com>> wrote:
Hallo.
I compiled, but with this version I see the time two hours behind the right time, 
and the report shows the wrong time compared to the actual time, as you can see in 
the attachment.

So I rolled back to the previous version.





Da: Karl Wright mailto:daddy...@gmail.com>>
Inviato: venerdì 10 agosto 2018 17:23
A: user@manifoldcf.apache.org<mailto:user@manifoldcf.apache.org>
Oggetto: Re: Different time in Simple History Report

Try it now.

Karl


On Fri, Aug 10, 2018 at 10:57 AM Bisonti Mario 
mailto:mario.biso...@vimar.com>> wrote:
Yes
sudo ant make-core-deps
Buildfile: /home/administrator/mcfsorce/trunk/build.xml
Trying to override old definition of task javac

BUILD FAILED
/home/administrator/mcfsorce/trunk/build.xml:2929: taskdef class 
de.thetaphi.forbiddenapis.ant.AntTask cannot be found
using the classloader AntClassLoader[]

Total time: 0 seconds
administrator@sengvivv01:~/mcfsorce/trunk$

But the downloaded trunk directory is very small, whereas the previous trunk 
checkout was bigger:
administrator@sengvivv01:~/mcfsorce$ du -sh tr*
121M    trunk
1.8G    trunk_19062018




Da: Karl Wright mailto:daddy...@gmail.com>>
Inviato: venerdì 10 agosto 2018 16:47
A: user@manifoldcf.apache.org<mailto:user@manifoldcf.apache.org>
Oggetto: Re: Different time in Simple History Report

Did you first do:

ant make-core-deps

?

Karl


On Fri, Aug 10, 2018 at 5:04 AM Bisonti Mario 
mailto:mario.biso...@vimar.com>> wrote:
Thanks Karl.
I tried to compile the trunk version but I obtain:

Buildfile: /home/administrator/mcfsorce/trunk/build.xml
Trying to override old definition of task javac

BUILD FAILED
/home/administrator/mcfsorce/trunk/build.xml:2929: taskdef class 
de.thetaphi.forbiddenapis.ant.AntTask cannot be found
using the classloader AntClassLoader[]





Da: Karl Wright mailto:daddy...@gmail.com>>
Inviato: venerdì 10 agosto 2018 10:53
A: user@manifoldcf.apache.org<mailto:user@manifoldcf.apache.org>
Oggetto: Re: Different time in Simple History Report

I've committed a change to trunk which will restore the pre-2016 behavior.

Karl

On Fri, Aug 10, 2018 at 3:40 AM Karl Wright 
mailto:daddy...@gmail.com>> wrote:
The code that formats the time is here:

>>>>>>
String startTimeString = 
org.apache.manifoldcf.ui.util.Formatter.formatTime(Converter.asLong(row.getValue("starttime")));
<<<<<<

This explicitly uses UTC as the timezone:

>>>>>>
  /** Format a long as an understandable date.
  *@param time is the long.
  *@return the date, as a human-readable string.  This date will be in local 
time.
  */
  public static String formatTime(long time)
  {
Calendar c = new GregorianCalendar(TimeZone.getTimeZone("UTC"), 
Locale.ROOT);
c.setTimeInMillis(time);
// We want to format this string in a compact way:
// mm-dd- hh:mm:ss.mmm
StringBuilder returnString = new StringBuilder();
writechars(returnString,c.get(Calendar.MONTH)+1,2);
returnString.append("-");
w

R: Different time in Simple History Report

2018-08-14 Thread Bisonti Mario
Hi Karl.

In my environment, browser time and server timezone are the same.

Before the modification, I saw that the “Start Time:” filter was right, i.e. in 
the browser timezone, which is the same as the server timezone (Europe/Rome); 
instead the report had the “Start Time” column equal to my timezone plus two hours.

After the modification, I see the “Start Time:” filter at my timezone less one hour, 
and the report has the “Start Time” column equal to my timezone less two hours.

So, for me, since the server timezone equals the browser timezone, the right behaviour 
would be for both the “Start Time:” filter and the “Start Time” column to show that 
same timezone.



Da: Karl Wright 
Inviato: martedì 14 agosto 2018 12:04
A: user@manifoldcf.apache.org
Oggetto: Re: Different time in Simple History Report

Hi Mario,

The UI uses browser time exclusively.  The columns displayed, however, are 
based on the server's time.  This is how MCF functioned up until the year 2016, 
when the columns were changed to display UTC instead.  I reverted that behavior 
with my commit.

I am not sure I know what it is you are asking for me to do here.  Do you want 
all times displayed in UTC?  Or, all times displayed in the browser timezone?  
Or all times displayed in the server timezone?

Karl


On Tue, Aug 14, 2018 at 5:13 AM Bisonti Mario 
mailto:mario.biso...@vimar.com>> wrote:
Hallo.
I compiled, but with this version I see the time two hours behind the right time, 
and the report shows the wrong time compared to the actual time, as you can see in 
the attachment.

So I rolled back to the previous version.





Da: Karl Wright mailto:daddy...@gmail.com>>
Inviato: venerdì 10 agosto 2018 17:23
A: user@manifoldcf.apache.org<mailto:user@manifoldcf.apache.org>
Oggetto: Re: Different time in Simple History Report

Try it now.

Karl


On Fri, Aug 10, 2018 at 10:57 AM Bisonti Mario 
mailto:mario.biso...@vimar.com>> wrote:
Yes
sudo ant make-core-deps
Buildfile: /home/administrator/mcfsorce/trunk/build.xml
Trying to override old definition of task javac

BUILD FAILED
/home/administrator/mcfsorce/trunk/build.xml:2929: taskdef class 
de.thetaphi.forbiddenapis.ant.AntTask cannot be found
using the classloader AntClassLoader[]

Total time: 0 seconds
administrator@sengvivv01:~/mcfsorce/trunk$

But the downloaded trunk directory is very small, whereas the previous trunk 
checkout was bigger:
administrator@sengvivv01:~/mcfsorce$ du -sh tr*
121M    trunk
1.8G    trunk_19062018




Da: Karl Wright mailto:daddy...@gmail.com>>
Inviato: venerdì 10 agosto 2018 16:47
A: user@manifoldcf.apache.org<mailto:user@manifoldcf.apache.org>
Oggetto: Re: Different time in Simple History Report

Did you first do:

ant make-core-deps

?

Karl


On Fri, Aug 10, 2018 at 5:04 AM Bisonti Mario 
mailto:mario.biso...@vimar.com>> wrote:
Thanks Karl.
I tried to compile the trunk version but I obtain:

Buildfile: /home/administrator/mcfsorce/trunk/build.xml
Trying to override old definition of task javac

BUILD FAILED
/home/administrator/mcfsorce/trunk/build.xml:2929: taskdef class 
de.thetaphi.forbiddenapis.ant.AntTask cannot be found
using the classloader AntClassLoader[]





Da: Karl Wright mailto:daddy...@gmail.com>>
Inviato: venerdì 10 agosto 2018 10:53
A: user@manifoldcf.apache.org<mailto:user@manifoldcf.apache.org>
Oggetto: Re: Different time in Simple History Report

I've committed a change to trunk which will restore the pre-2016 behavior.

Karl

On Fri, Aug 10, 2018 at 3:40 AM Karl Wright 
mailto:daddy...@gmail.com>> wrote:
The code that formats the time is here:

>>>>>>
String startTimeString = 
org.apache.manifoldcf.ui.util.Formatter.formatTime(Converter.asLong(row.getValue("starttime")));
<<<<<<

This explicitly uses UTC as the timezone:

>>>>>>
  /** Format a long as an understandable date.
  *@param time is the long.
  *@return the date, as a human-readable string.  This date will be in local 
time.
  */
  public static String formatTime(long time)
  {
Calendar c = new GregorianCalendar(TimeZone.getTimeZone("UTC"), 
Locale.ROOT);
c.setTimeInMillis(time);
// We want to format this string in a compact way:
// mm-dd- hh:mm:ss.mmm
StringBuilder returnString = new StringBuilder();
writechars(returnString,c.get(Calendar.MONTH)+1,2);
returnString.append("-");
writechars(returnString,c.get(Calendar.DAY_OF_MONTH),2);
returnString.append("-");
writechars(returnString,c.get(Calendar.YEAR),4);
returnString.append(" ");
writechars(returnString,c.get(Calendar.HOUR_OF_DAY),2);
returnString.append(":");
writechars(returnString,c.get(Calendar.MINUTE),2);
returnString.append(":");
writechars(returnString,c.get(Calendar.SECOND),2);
returnString.append(".");
writechars(returnString,c.get(Calendar.MILLISECOND),3);
r

R: Different time in Simple History Report

2018-08-14 Thread Bisonti Mario
Hallo.
I compiled, but with this version I see the time two hours behind the right time, 
and the report shows the wrong time compared to the actual time, as you can see in 
the attachment.

So I rolled back to the previous version.





Da: Karl Wright 
Inviato: venerdì 10 agosto 2018 17:23
A: user@manifoldcf.apache.org
Oggetto: Re: Different time in Simple History Report

Try it now.

Karl


On Fri, Aug 10, 2018 at 10:57 AM Bisonti Mario 
mailto:mario.biso...@vimar.com>> wrote:
Yes
sudo ant make-core-deps
Buildfile: /home/administrator/mcfsorce/trunk/build.xml
Trying to override old definition of task javac

BUILD FAILED
/home/administrator/mcfsorce/trunk/build.xml:2929: taskdef class 
de.thetaphi.forbiddenapis.ant.AntTask cannot be found
using the classloader AntClassLoader[]

Total time: 0 seconds
administrator@sengvivv01:~/mcfsorce/trunk$

But the downloaded trunk directory is very small, whereas the previous trunk 
checkout was bigger:
administrator@sengvivv01:~/mcfsorce$ du -sh tr*
121M    trunk
1.8G    trunk_19062018




Da: Karl Wright mailto:daddy...@gmail.com>>
Inviato: venerdì 10 agosto 2018 16:47
A: user@manifoldcf.apache.org<mailto:user@manifoldcf.apache.org>
Oggetto: Re: Different time in Simple History Report

Did you first do:

ant make-core-deps

?

Karl


On Fri, Aug 10, 2018 at 5:04 AM Bisonti Mario 
mailto:mario.biso...@vimar.com>> wrote:
Thanks Karl.
I tried to compile the trunk version but I obtain:

Buildfile: /home/administrator/mcfsorce/trunk/build.xml
Trying to override old definition of task javac

BUILD FAILED
/home/administrator/mcfsorce/trunk/build.xml:2929: taskdef class 
de.thetaphi.forbiddenapis.ant.AntTask cannot be found
using the classloader AntClassLoader[]





Da: Karl Wright mailto:daddy...@gmail.com>>
Inviato: venerdì 10 agosto 2018 10:53
A: user@manifoldcf.apache.org<mailto:user@manifoldcf.apache.org>
Oggetto: Re: Different time in Simple History Report

I've committed a change to trunk which will restore the pre-2016 behavior.

Karl

On Fri, Aug 10, 2018 at 3:40 AM Karl Wright 
mailto:daddy...@gmail.com>> wrote:
The code that formats the time is here:

>>>>>>
String startTimeString = 
org.apache.manifoldcf.ui.util.Formatter.formatTime(Converter.asLong(row.getValue("starttime")));
<<<<<<

This explicitly uses UTC as the timezone:

>>>>>>
  /** Format a long as an understandable date.
  *@param time is the long.
  *@return the date, as a human-readable string.  This date will be in local 
time.
  */
  public static String formatTime(long time)
  {
Calendar c = new GregorianCalendar(TimeZone.getTimeZone("UTC"), 
Locale.ROOT);
c.setTimeInMillis(time);
// We want to format this string in a compact way:
// mm-dd- hh:mm:ss.mmm
StringBuilder returnString = new StringBuilder();
writechars(returnString,c.get(Calendar.MONTH)+1,2);
returnString.append("-");
writechars(returnString,c.get(Calendar.DAY_OF_MONTH),2);
returnString.append("-");
writechars(returnString,c.get(Calendar.YEAR),4);
returnString.append(" ");
writechars(returnString,c.get(Calendar.HOUR_OF_DAY),2);
returnString.append(":");
writechars(returnString,c.get(Calendar.MINUTE),2);
returnString.append(":");
writechars(returnString,c.get(Calendar.SECOND),2);
returnString.append(".");
writechars(returnString,c.get(Calendar.MILLISECOND),3);
return returnString.toString();
  }
<<<<<<

This was last changed:

>>>>>>
1756230    kwright        Calendar c = new GregorianCalendar(TimeZone.getTimeZone("UTC"), Locale.ROOT);
<<<<<<

The reason for the change:

>>>>>>

r1756230 | kwright | 2016-08-12 18:20:00 -0400 (Fri, 12 Aug 2016) | 1 line

Fix for CONNECTORS-1332.  Committed on behalf of Furkan KAMACI.
<<<<<<
CONNECTORS-1332 is about calling forbidden APIS:

>>>>>>
We should avoid forbidden calls <https://github.com/policeman-tools/forbidden-apis/wiki> 
and check for it in the ant build.
<<<<<<

The actual change was:
>>>>>>
C:\wip\mcf\trunk\framework\ui-core\src\main\java\org\apache\manifoldcf\ui\util>svn
 diff -c 1756230
Index: Formatter.java
===
--- Formatter.java  (revision 1756229)
+++ Formatter.java  (revision 1756230)
@@ -32,7 +32,7 @@
   */
   public static String formatTime(long time)
   {
-Calendar c = new GregorianCalenda

R: Different time in Simple History Report

2018-08-10 Thread Bisonti Mario
Yes
sudo ant make-core-deps
Buildfile: /home/administrator/mcfsorce/trunk/build.xml
Trying to override old definition of task javac

BUILD FAILED
/home/administrator/mcfsorce/trunk/build.xml:2929: taskdef class 
de.thetaphi.forbiddenapis.ant.AntTask cannot be found
using the classloader AntClassLoader[]

Total time: 0 seconds
administrator@sengvivv01:~/mcfsorce/trunk$

But the downloaded trunk directory is very small, whereas the previous trunk 
checkout was bigger:
administrator@sengvivv01:~/mcfsorce$ du -sh tr*
121M    trunk
1.8G    trunk_19062018




Da: Karl Wright 
Inviato: venerdì 10 agosto 2018 16:47
A: user@manifoldcf.apache.org
Oggetto: Re: Different time in Simple History Report

Did you first do:

ant make-core-deps

?

Karl


On Fri, Aug 10, 2018 at 5:04 AM Bisonti Mario 
mailto:mario.biso...@vimar.com>> wrote:
Thanks Karl.
I tried to compile the trunk version but I obtain:

Buildfile: /home/administrator/mcfsorce/trunk/build.xml
Trying to override old definition of task javac

BUILD FAILED
/home/administrator/mcfsorce/trunk/build.xml:2929: taskdef class 
de.thetaphi.forbiddenapis.ant.AntTask cannot be found
using the classloader AntClassLoader[]





Da: Karl Wright mailto:daddy...@gmail.com>>
Inviato: venerdì 10 agosto 2018 10:53
A: user@manifoldcf.apache.org<mailto:user@manifoldcf.apache.org>
Oggetto: Re: Different time in Simple History Report

I've committed a change to trunk which will restore the pre-2016 behavior.

Karl

On Fri, Aug 10, 2018 at 3:40 AM Karl Wright 
mailto:daddy...@gmail.com>> wrote:
The code that formats the time is here:

>>>>>>
String startTimeString = 
org.apache.manifoldcf.ui.util.Formatter.formatTime(Converter.asLong(row.getValue("starttime")));
<<<<<<

This explicitly uses UTC as the timezone:

>>>>>>
  /** Format a long as an understandable date.
  *@param time is the long.
  *@return the date, as a human-readable string.  This date will be in local 
time.
  */
  public static String formatTime(long time)
  {
Calendar c = new GregorianCalendar(TimeZone.getTimeZone("UTC"), 
Locale.ROOT);
c.setTimeInMillis(time);
// We want to format this string in a compact way:
// mm-dd- hh:mm:ss.mmm
StringBuilder returnString = new StringBuilder();
writechars(returnString,c.get(Calendar.MONTH)+1,2);
returnString.append("-");
writechars(returnString,c.get(Calendar.DAY_OF_MONTH),2);
returnString.append("-");
writechars(returnString,c.get(Calendar.YEAR),4);
returnString.append(" ");
writechars(returnString,c.get(Calendar.HOUR_OF_DAY),2);
returnString.append(":");
writechars(returnString,c.get(Calendar.MINUTE),2);
returnString.append(":");
writechars(returnString,c.get(Calendar.SECOND),2);
returnString.append(".");
writechars(returnString,c.get(Calendar.MILLISECOND),3);
return returnString.toString();
  }
<<<<<<

This was last changed:

>>>>>>
1756230    kwright        Calendar c = new GregorianCalendar(TimeZone.getTimeZone("UTC"), Locale.ROOT);
<<<<<<

The reason for the change:

>>>>>>

r1756230 | kwright | 2016-08-12 18:20:00 -0400 (Fri, 12 Aug 2016) | 1 line

Fix for CONNECTORS-1332.  Committed on behalf of Furkan KAMACI.
<<<<<<
CONNECTORS-1332 is about calling forbidden APIS:

>>>>>>
We should avoid forbidden calls <https://github.com/policeman-tools/forbidden-apis/wiki> 
and check for it in the ant build.
<<<<<<

The actual change was:
>>>>>>
C:\wip\mcf\trunk\framework\ui-core\src\main\java\org\apache\manifoldcf\ui\util>svn
 diff -c 1756230
Index: Formatter.java
===
--- Formatter.java  (revision 1756229)
+++ Formatter.java  (revision 1756230)
@@ -32,7 +32,7 @@
   */
   public static String formatTime(long time)
   {
-Calendar c = new GregorianCalendar();
+Calendar c = new GregorianCalendar(TimeZone.getTimeZone("UTC"), 
Locale.ROOT);
 c.setTimeInMillis(time);
 // We want to format this string in a compact way:
 // mm-dd- hh:mm:ss.mmm
<<<<<<


As you see, formerly the timezone was local time.  The change required an 
explicit timezone in order to pass the forbidden APIs test, and UTC was used.

I am happy to try to change this since it's been this way only since 2016, if I 
can find a way that will not break forbiddenAPIs.

Karl


On Fri, Aug 10, 2018 at 2:42 AM Bisonti Mario 
mailt

R: Different time in Simple History Report

2018-08-10 Thread Bisonti Mario
Thanks Karl.
I tried to compile the trunk version but I obtain:

Buildfile: /home/administrator/mcfsorce/trunk/build.xml
Trying to override old definition of task javac

BUILD FAILED
/home/administrator/mcfsorce/trunk/build.xml:2929: taskdef class 
de.thetaphi.forbiddenapis.ant.AntTask cannot be found
using the classloader AntClassLoader[]





Da: Karl Wright 
Inviato: venerdì 10 agosto 2018 10:53
A: user@manifoldcf.apache.org
Oggetto: Re: Different time in Simple History Report

I've committed a change to trunk which will restore the pre-2016 behavior.

Karl

On Fri, Aug 10, 2018 at 3:40 AM Karl Wright 
mailto:daddy...@gmail.com>> wrote:
The code that formats the time is here:

>>>>>>
String startTimeString = 
org.apache.manifoldcf.ui.util.Formatter.formatTime(Converter.asLong(row.getValue("starttime")));
<<<<<<

This explicitly uses UTC as the timezone:

>>>>>>
  /** Format a long as an understandable date.
  *@param time is the long.
  *@return the date, as a human-readable string.  This date will be in local 
time.
  */
  public static String formatTime(long time)
  {
Calendar c = new GregorianCalendar(TimeZone.getTimeZone("UTC"), 
Locale.ROOT);
c.setTimeInMillis(time);
// We want to format this string in a compact way:
// mm-dd- hh:mm:ss.mmm
StringBuilder returnString = new StringBuilder();
writechars(returnString,c.get(Calendar.MONTH)+1,2);
returnString.append("-");
writechars(returnString,c.get(Calendar.DAY_OF_MONTH),2);
returnString.append("-");
writechars(returnString,c.get(Calendar.YEAR),4);
returnString.append(" ");
writechars(returnString,c.get(Calendar.HOUR_OF_DAY),2);
returnString.append(":");
writechars(returnString,c.get(Calendar.MINUTE),2);
returnString.append(":");
writechars(returnString,c.get(Calendar.SECOND),2);
returnString.append(".");
writechars(returnString,c.get(Calendar.MILLISECOND),3);
return returnString.toString();
  }
<<<<<<

This was last changed:

>>>>>>
1756230    kwright        Calendar c = new GregorianCalendar(TimeZone.getTimeZone("UTC"), Locale.ROOT);
<<<<<<

The reason for the change:

>>>>>>

r1756230 | kwright | 2016-08-12 18:20:00 -0400 (Fri, 12 Aug 2016) | 1 line

Fix for CONNECTORS-1332.  Committed on behalf of Furkan KAMACI.
<<<<<<
CONNECTORS-1332 is about calling forbidden APIS:

>>>>>>
We should avoid forbidden calls <https://github.com/policeman-tools/forbidden-apis/wiki> 
and check for it in the ant build.
<<<<<<

The actual change was:
>>>>>>
C:\wip\mcf\trunk\framework\ui-core\src\main\java\org\apache\manifoldcf\ui\util>svn
 diff -c 1756230
Index: Formatter.java
===
--- Formatter.java  (revision 1756229)
+++ Formatter.java  (revision 1756230)
@@ -32,7 +32,7 @@
   */
   public static String formatTime(long time)
   {
-Calendar c = new GregorianCalendar();
+Calendar c = new GregorianCalendar(TimeZone.getTimeZone("UTC"), 
Locale.ROOT);
 c.setTimeInMillis(time);
 // We want to format this string in a compact way:
 // mm-dd- hh:mm:ss.mmm
<<<<<<


As you see, formerly the timezone was local time.  The change required an 
explicit timezone in order to pass the forbidden APIs test, and UTC was used.

I am happy to try to change this since it's been this way only since 2016, if I 
can find a way that will not break forbiddenAPIs.

Karl


On Fri, Aug 10, 2018 at 2:42 AM Bisonti Mario 
mailto:mario.biso...@vimar.com>> wrote:
Hallo Karl.
My server timezone is set the same as the browser timezone (Europe/Rome), as you can see, 
but the list is two hours behind my time zone.
So it seems that the list uses “universal time” instead of the local time zone.

administrator@sengvivv01:~$ timedatectl
  Local time: Fri 2018-08-10 08:39:28 CEST
  Universal time: Fri 2018-08-10 06:39:28 UTC
RTC time: Fri 2018-08-10 06:39:28
   Time zone: Europe/Rome (CEST, +0200)
   System clock synchronized: yes
systemd-timesyncd.service active: yes
 RTC in local TZ: no


What could I do?
Thanks a lot



Da: Karl Wright mailto:daddy...@gmail.com>>
Inviato: giovedì 9 agosto 2018 21:36
A: user@manifoldcf.apache.org<mailto:user@manifoldcf.apache.org>
Oggetto: Re: Different time in Simple History Report

R: Different time in Simple History Report

2018-08-10 Thread Bisonti Mario
Hallo Karl.
My server timezone is set the same as the browser timezone (Europe/Rome), as you can see, 
but the list is two hours behind my time zone.
So it seems that the list uses “universal time” instead of the local time zone.

administrator@sengvivv01:~$ timedatectl
  Local time: Fri 2018-08-10 08:39:28 CEST
  Universal time: Fri 2018-08-10 06:39:28 UTC
RTC time: Fri 2018-08-10 06:39:28
   Time zone: Europe/Rome (CEST, +0200)
   System clock synchronized: yes
systemd-timesyncd.service active: yes
 RTC in local TZ: no


What could I do?
Thanks a lot



Da: Karl Wright 
Inviato: giovedì 9 agosto 2018 21:36
A: user@manifoldcf.apache.org
Oggetto: Re: Different time in Simple History Report

Hi Mario.

The pulldown allows you to select times based on the current (browser) time 
zone.

The display is in *server* timezone.  That accounts for the difference.

Karl


On Thu, Aug 9, 2018 at 10:23 AM Bisonti Mario 
mailto:mario.biso...@vimar.com>> wrote:
Hallo
I see a difference in the start time shown in the “Simple History Report”.

It seems to be two hours late.

Do I have to set a timezone for this report?

Thanks a lot
See the attachment


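
A quick way to double-check which timezone the JVM on the MCF server is actually using (which is what "server time" effectively means for the report columns); a small sketch using any JDK on that machine:

# The TZ environment variable or -Duser.timezone can make the JVM default differ
# from what timedatectl reports for the OS, so print it from Java itself:
cat > /tmp/TZCheck.java <<'EOF'
public class TZCheck {
  public static void main(String[] args) {
    System.out.println(java.util.TimeZone.getDefault().getID());
  }
}
EOF
(cd /tmp && javac TZCheck.java && java TZCheck)   # expected to print e.g. Europe/Rome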


Different time in Simple History Report

2018-08-09 Thread Bisonti Mario
Hallo
I see a difference in the start time shown in the “Simple History Report”.

It seems to be two hours late.

Do I have to set a timezone for this report?

Thanks a lot
See the attachment




R: Job stuck internal http error 500

2018-08-08 Thread Bisonti Mario
I substituted all four of my Tika 1.17 .jar files (parsers, core, java7, xmp) with 
the 1.19 nightly versions and it works!
No more 500 error and the file has been indexed!

From the link:
https://builds.apache.org/job/tika-branch-1x/73/
you can use the subfolder:
Apache Tika core
Apache Tika Java-7 Components
Apache Tika parsers
Apache Tika XMP

I downloaded the:
tika-xmp-1.19-20180807.184545-61.jar
tika-core-1.19-20180807.184018-61.jar
tika-parsers-1.19-20180807.184508-61.jar
tika-java7-1.19-20180807.185414-60.jar
and I renamed them to:
-rw-r--r-- 1 root root  687651 Aug  8 14:16 tika-core-1.19.jar
-rw-r--r-- 1 root root   14012 Aug  8 14:16 tika-java7-1.19.jar
-rw-r--r-- 1 root root 1131862 Aug  8 14:16 tika-parsers-1.19.jar
-rw-r--r-- 1 root root   34447 Aug  8 14:16 tika-xmp-1.19.jar

So, in my /opt/solr-7.3.1/contrib/extraction/lib directory of solr I have:
-rw-r--r-- 1 root root  663109 Dec  9  2017 tika-core-1.17.jarOLD
-rw-r--r-- 1 root root  687651 Aug  8 14:16 tika-core-1.19.jar
-rw-r--r-- 1 root root   13268 Dec  9  2017 tika-java7-1.17.jarOLD
-rw-r--r-- 1 root root   14012 Aug  8 14:16 tika-java7-1.19.jar
-rw-r--r-- 1 root root 1078626 Dec  9  2017 tika-parsers-1.17.jarOO
-rw-r--r-- 1 root root 1131862 Aug  8 14:16 tika-parsers-1.19.jar
-rw-r--r-- 1 root root   33705 Dec  9  2017 tika-xmp-1.17.jarOLD
-rw-r--r-- 1 root root   34447 Aug  8 14:16 tika-xmp-1.19.jar

You have to restart solr to use the new tika version

The Tika 1.19 version will be released in the next few weeks.

Here is the link about my issue:

https://issues.apache.org/jira/browse/TIKA-2703?page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel=16573125#comment-16573125


Mario



Da: Karl Wright 
Inviato: mercoledì 8 agosto 2018 14:54
A: user@manifoldcf.apache.org
Oggetto: Re: Job stuck internal http error 500

Thanks for the update!

Did the Tika people say when 1.19 will be released?

Karl


On Wed, Aug 8, 2018 at 8:29 AM Bisonti Mario 
mailto:mario.biso...@vimar.com>> wrote:
Hallo
You were right, Karl.

I have been helped by the tika people and they patched the tika jar of the solr 
installation and the problem was solved!

Now I solved it using the Tika 1.19 nightly build.


Thanks a lot.



Da: Karl Wright mailto:daddy...@gmail.com>>
Inviato: venerdì 27 luglio 2018 12:39
A: user@manifoldcf.apache.org<mailto:user@manifoldcf.apache.org>
Oggetto: Re: Job stuck internal http error 500

I am afraid you will need to open a Tika ticket, and be prepared to attach your 
file to it.

Thanks,

Karl


On Fri, Jul 27, 2018 at 6:04 AM Bisonti Mario 
mailto:mario.biso...@vimar.com>> wrote:
It isn’t a memory problem, because bigger xls files (30 MB) have been processed.

This xlsm file, with many colors etc., hangs.
I suppose it is a Tika/Solr error, but I don’t know how to solve it
☹

Oggetto: R: Job stuck internal http error 500

Yes, I am using:
/opt/manifoldcf/multiprocess-file-example-proprietary
I set:
sudo nano options.env.unix
-Xms2048m
-Xmx2048m

But I obtain the same error.
My doubt is that it could be a solr/tika problem.
What could I do?
I restrict the scan to a single file and I obtain the same error



Da: Karl Wright mailto:daddy...@gmail.com>>
Inviato: venerdì 27 luglio 2018 11:36
A: user@manifoldcf.apache.org<mailto:user@manifoldcf.apache.org>
Oggetto: Re: Job stuck internal http error 500

I am presuming you are using the examples.  If so, edit the options file to 
grant more memory to your agents process by increasing the Xmx value.

Karl

On Fri, Jul 27, 2018, 3:04 AM Bisonti Mario 
mailto:mario.biso...@vimar.com>> wrote:
Hallo.
My job is getting stuck indexing an xlsx file of 38MB

What could I do to solve my problem?

In the following there is the error:
2018-07-27 08:55:15.562 WARN  (qtp1521083627-52) [   x:core_share] 
o.e.j.s.HttpChannel /solr/core_share/update/extract
java.lang.OutOfMemoryError
at 
java.base/java.lang.AbstractStringBuilder.hugeCapacity(AbstractStringBuilder.java:188)
at 
java.base/java.lang.AbstractStringBuilder.newCapacity(AbstractStringBuilder.java:180)
at 
java.base/java.lang.AbstractStringBuilder.ensureCapacityInternal(AbstractStringBuilder.java:147)
at 
java.base/java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:660)
at java.base/java.lang.StringBuilder.append(StringBuilder.java:195)
at 
org.apache.solr.handler.extraction.SolrContentHandler.characters(SolrContentHandler.java:302)
at 
org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
at 
org.apache.tika.sax.SecureContentHandler.characters(SecureContentHandler.java:270)
at 
org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
at 
org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
at 
org.apache.tika.sax.ContentHandlerDecorator.c

R: Job stuck internal http error 500

2018-08-08 Thread Bisonti Mario
Hallo
You were right, Karl.

I have been helped by the tika people and they patched the tika jar of the solr 
installation and the problem was solved!

Now I solved it using the Tika 1.19 nightly build.


Thanks a lot.



Da: Karl Wright 
Inviato: venerdì 27 luglio 2018 12:39
A: user@manifoldcf.apache.org
Oggetto: Re: Job stuck internal http error 500

I am afraid you will need to open a Tika ticket, and be prepared to attach your 
file to it.

Thanks,

Karl


On Fri, Jul 27, 2018 at 6:04 AM Bisonti Mario 
mailto:mario.biso...@vimar.com>> wrote:
It isn’t a memory problem, because bigger xls files (30 MB) have been processed.

This xlsm file, with many colors etc., hangs.
I suppose it is a Tika/Solr error, but I don’t know how to solve it
☹

Oggetto: R: Job stuck internal http error 500

Yes, I am using:
/opt/manifoldcf/multiprocess-file-example-proprietary
I set:
sudo nano options.env.unix
-Xms2048m
-Xmx2048m

But I obtain the same error.
My doubt is that it could be a solr/tika problem.
What could I do?
I restrict the scan to a single file and I obtain the same error



Da: Karl Wright mailto:daddy...@gmail.com>>
Inviato: venerdì 27 luglio 2018 11:36
A: user@manifoldcf.apache.org<mailto:user@manifoldcf.apache.org>
Oggetto: Re: Job stuck internal http error 500

I am presuming you are using the examples.  If so, edit the options file to 
grant more memory to your agents process by increasing the Xmx value.

Karl

On Fri, Jul 27, 2018, 3:04 AM Bisonti Mario 
mailto:mario.biso...@vimar.com>> wrote:
Hallo.
My job is getting stuck indexing an xlsx file of 38MB

What could I do to solve my problem?

In the following there is the error:
2018-07-27 08:55:15.562 WARN  (qtp1521083627-52) [   x:core_share] 
o.e.j.s.HttpChannel /solr/core_share/update/extract
java.lang.OutOfMemoryError
at 
java.base/java.lang.AbstractStringBuilder.hugeCapacity(AbstractStringBuilder.java:188)
at 
java.base/java.lang.AbstractStringBuilder.newCapacity(AbstractStringBuilder.java:180)
at 
java.base/java.lang.AbstractStringBuilder.ensureCapacityInternal(AbstractStringBuilder.java:147)
at 
java.base/java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:660)
at java.base/java.lang.StringBuilder.append(StringBuilder.java:195)
at 
org.apache.solr.handler.extraction.SolrContentHandler.characters(SolrContentHandler.java:302)
at 
org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
at 
org.apache.tika.sax.SecureContentHandler.characters(SecureContentHandler.java:270)
at 
org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
at 
org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
at 
org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
at 
org.apache.tika.sax.SafeContentHandler.access$001(SafeContentHandler.java:46)
at 
org.apache.tika.sax.SafeContentHandler$1.write(SafeContentHandler.java:82)
at 
org.apache.tika.sax.SafeContentHandler.filter(SafeContentHandler.java:140)
at 
org.apache.tika.sax.SafeContentHandler.characters(SafeContentHandler.java:287)
at 
org.apache.tika.sax.XHTMLContentHandler.characters(XHTMLContentHandler.java:279)
at 
org.apache.tika.sax.XHTMLContentHandler.characters(XHTMLContentHandler.java:306)
at 
org.apache.tika.parser.microsoft.ooxml.OOXMLTikaBodyPartHandler.run(OOXMLTikaBodyPartHandler.java:147)
at 
org.apache.tika.parser.microsoft.ooxml.OOXMLWordAndPowerPointTextHandler.handleEndOfRun(OOXMLWordAndPowerPointTextHandler.java:468)
at 
org.apache.tika.parser.microsoft.ooxml.OOXMLWordAndPowerPointTextHandler.endElement(OOXMLWordAndPowerPointTextHandler.java:450)
at 
org.apache.tika.sax.ContentHandlerDecorator.endElement(ContentHandlerDecorator.java:136)
at 
org.apache.tika.sax.ContentHandlerDecorator.endElement(ContentHandlerDecorator.java:136)
at 
java.xml/com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.endElement(AbstractSAXParser.java:609)
at 
java.xml/com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanEndElement(XMLDocumentFragmentScannerImpl.java:1714)
at 
java.xml/com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl$FragmentContentDriver.next(XMLDocumentFragmentScannerImpl.java:2879)
at 
java.xml/com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(XMLDocumentScannerImpl.java:602)
at 
java.xml/com.sun.org.apache.xerces.internal.impl.XMLNSDocumentScannerImpl.next(XMLNSDocumentScannerImpl.java:112)
at 
java.xml/com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(XMLDocumentFragmentScannerImpl.java:532)
at 
java.xml/com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:8

R: Job stuck internal http error 500

2018-07-27 Thread Bisonti Mario
It isn’t a memory problem, because bigger xls files (30 MB) have been processed.

This xlsm file, with many colors etc., hangs.
I suppose it is a Tika/Solr error, but I don’t know how to solve it
☹

Oggetto: R: Job stuck internal http error 500

Yes, I am using:
/opt/manifoldcf/multiprocess-file-example-proprietary
I set:
sudo nano options.env.unix
-Xms2048m
-Xmx2048m

But I obtain the same error.
My doubt is that it could be a solr/tika problem.
What could I do?
I restrict the scan to a single file and I obtain the same error



Da: Karl Wright mailto:daddy...@gmail.com>>
Inviato: venerdì 27 luglio 2018 11:36
A: user@manifoldcf.apache.org<mailto:user@manifoldcf.apache.org>
Oggetto: Re: Job stuck internal http error 500

I am presuming you are using the examples.  If so, edit the options file to 
grant more memory to your agents process by increasing the Xmx value.

Karl

On Fri, Jul 27, 2018, 3:04 AM Bisonti Mario 
mailto:mario.biso...@vimar.com>> wrote:
Hallo.
My job is getting stuck indexing an xlsx file of 38MB

What could I do to solve my problem?

In the following there is the error:
2018-07-27 08:55:15.562 WARN  (qtp1521083627-52) [   x:core_share] 
o.e.j.s.HttpChannel /solr/core_share/update/extract
java.lang.OutOfMemoryError
at 
java.base/java.lang.AbstractStringBuilder.hugeCapacity(AbstractStringBuilder.java:188)
at 
java.base/java.lang.AbstractStringBuilder.newCapacity(AbstractStringBuilder.java:180)
at 
java.base/java.lang.AbstractStringBuilder.ensureCapacityInternal(AbstractStringBuilder.java:147)
at 
java.base/java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:660)
at java.base/java.lang.StringBuilder.append(StringBuilder.java:195)
at 
org.apache.solr.handler.extraction.SolrContentHandler.characters(SolrContentHandler.java:302)
at 
org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
at 
org.apache.tika.sax.SecureContentHandler.characters(SecureContentHandler.java:270)
at 
org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
at 
org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
at 
org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
at 
org.apache.tika.sax.SafeContentHandler.access$001(SafeContentHandler.java:46)
at 
org.apache.tika.sax.SafeContentHandler$1.write(SafeContentHandler.java:82)
at 
org.apache.tika.sax.SafeContentHandler.filter(SafeContentHandler.java:140)
at 
org.apache.tika.sax.SafeContentHandler.characters(SafeContentHandler.java:287)
at 
org.apache.tika.sax.XHTMLContentHandler.characters(XHTMLContentHandler.java:279)
at 
org.apache.tika.sax.XHTMLContentHandler.characters(XHTMLContentHandler.java:306)
at 
org.apache.tika.parser.microsoft.ooxml.OOXMLTikaBodyPartHandler.run(OOXMLTikaBodyPartHandler.java:147)
at 
org.apache.tika.parser.microsoft.ooxml.OOXMLWordAndPowerPointTextHandler.handleEndOfRun(OOXMLWordAndPowerPointTextHandler.java:468)
at 
org.apache.tika.parser.microsoft.ooxml.OOXMLWordAndPowerPointTextHandler.endElement(OOXMLWordAndPowerPointTextHandler.java:450)
at 
org.apache.tika.sax.ContentHandlerDecorator.endElement(ContentHandlerDecorator.java:136)
at 
org.apache.tika.sax.ContentHandlerDecorator.endElement(ContentHandlerDecorator.java:136)
at 
java.xml/com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.endElement(AbstractSAXParser.java:609)
at 
java.xml/com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanEndElement(XMLDocumentFragmentScannerImpl.java:1714)
at 
java.xml/com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl$FragmentContentDriver.next(XMLDocumentFragmentScannerImpl.java:2879)
at 
java.xml/com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(XMLDocumentScannerImpl.java:602)
at 
java.xml/com.sun.org.apache.xerces.internal.impl.XMLNSDocumentScannerImpl.next(XMLNSDocumentScannerImpl.java:112)
at 
java.xml/com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(XMLDocumentFragmentScannerImpl.java:532)
at 
java.xml/com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:888)
at 
java.xml/com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:824)
at 
java.xml/com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(XMLParser.java:141)
at 
java.xml/com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(AbstractSAXParser.java:1213)
at 
java.xml/com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl$JAXPSAXParser.parse(SAXParserImpl.java:635)
at 
java.xml/com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl.parse(SAXParserImpl.java:324)
 

R: Job stuck internal http error 500

2018-07-27 Thread Bisonti Mario
Yes, I am using:
/opt/manifoldcf/multiprocess-file-example-proprietary
I set:
sudo nano options.env.unix
-Xms2048m
-Xmx2048m

But I obtain the same error.
My doubt is that it could be a solr/tika problem.
What could I do?
I restrict the scan to a single file and I obtain the same error



Da: Karl Wright 
Inviato: venerdì 27 luglio 2018 11:36
A: user@manifoldcf.apache.org
Oggetto: Re: Job stuck internal http error 500

I am presuming you are using the examples.  If so, edit the options file to 
grant more memory to your agents process by increasing the Xmx value.

Karl

On Fri, Jul 27, 2018, 3:04 AM Bisonti Mario 
mailto:mario.biso...@vimar.com>> wrote:
Hallo.
My job is getting stuck indexing an xlsx file of 38MB

What could I do to solve my problem?

In the following there is the error:
2018-07-27 08:55:15.562 WARN  (qtp1521083627-52) [   x:core_share] 
o.e.j.s.HttpChannel /solr/core_share/update/extract
java.lang.OutOfMemoryError
at 
java.base/java.lang.AbstractStringBuilder.hugeCapacity(AbstractStringBuilder.java:188)
at 
java.base/java.lang.AbstractStringBuilder.newCapacity(AbstractStringBuilder.java:180)
at 
java.base/java.lang.AbstractStringBuilder.ensureCapacityInternal(AbstractStringBuilder.java:147)
at 
java.base/java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:660)
at java.base/java.lang.StringBuilder.append(StringBuilder.java:195)
at 
org.apache.solr.handler.extraction.SolrContentHandler.characters(SolrContentHandler.java:302)
at 
org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
at 
org.apache.tika.sax.SecureContentHandler.characters(SecureContentHandler.java:270)
at 
org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
at 
org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
at 
org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
at 
org.apache.tika.sax.SafeContentHandler.access$001(SafeContentHandler.java:46)
at 
org.apache.tika.sax.SafeContentHandler$1.write(SafeContentHandler.java:82)
at 
org.apache.tika.sax.SafeContentHandler.filter(SafeContentHandler.java:140)
at 
org.apache.tika.sax.SafeContentHandler.characters(SafeContentHandler.java:287)
at 
org.apache.tika.sax.XHTMLContentHandler.characters(XHTMLContentHandler.java:279)
at 
org.apache.tika.sax.XHTMLContentHandler.characters(XHTMLContentHandler.java:306)
at 
org.apache.tika.parser.microsoft.ooxml.OOXMLTikaBodyPartHandler.run(OOXMLTikaBodyPartHandler.java:147)
at 
org.apache.tika.parser.microsoft.ooxml.OOXMLWordAndPowerPointTextHandler.handleEndOfRun(OOXMLWordAndPowerPointTextHandler.java:468)
at 
org.apache.tika.parser.microsoft.ooxml.OOXMLWordAndPowerPointTextHandler.endElement(OOXMLWordAndPowerPointTextHandler.java:450)
at 
org.apache.tika.sax.ContentHandlerDecorator.endElement(ContentHandlerDecorator.java:136)
at 
org.apache.tika.sax.ContentHandlerDecorator.endElement(ContentHandlerDecorator.java:136)
at 
java.xml/com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.endElement(AbstractSAXParser.java:609)
at 
java.xml/com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanEndElement(XMLDocumentFragmentScannerImpl.java:1714)
at 
java.xml/com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl$FragmentContentDriver.next(XMLDocumentFragmentScannerImpl.java:2879)
at 
java.xml/com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(XMLDocumentScannerImpl.java:602)
at 
java.xml/com.sun.org.apache.xerces.internal.impl.XMLNSDocumentScannerImpl.next(XMLNSDocumentScannerImpl.java:112)
at 
java.xml/com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(XMLDocumentFragmentScannerImpl.java:532)
at 
java.xml/com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:888)
at 
java.xml/com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:824)
at 
java.xml/com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(XMLParser.java:141)
at 
java.xml/com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(AbstractSAXParser.java:1213)
at 
java.xml/com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl$JAXPSAXParser.parse(SAXParserImpl.java:635)
at 
java.xml/com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl.parse(SAXParserImpl.java:324)
at java.xml/javax.xml.parsers.SAXParser.parse(SAXParser.java:197)
at 
org.apache.tika.parser.microsoft.ooxml.AbstractOOXMLExtractor.handleGeneralTextContainingPart(AbstractOOXMLExtractor.java:506)
at 
org.apache.tika.parser.microsoft.ooxml.XSSFExcelExtractorDecorator.process

Re: Solr connection, max connections and CPU

2018-07-27 Thread Bisonti Mario
Thanks a lot Karl!!!

On 2018/07/26 13:28:47, Karl Wright  wrote:
> Hi Mario,>
>
> There is no connection between the number of CPUs and the number output>
> connections.  You pick the maximum number of output connections based on>
> the number of listening threads that you can use at the same time in Solr.>
>
> Karl>
>
> On Thu, Jul 26, 2018 at 9:22 AM Bisonti Mario >
> wrote:>
>
> > Hallo, I setup solr connection in the "Output connections" of Manifold>
> >>
> >>
> >>
> > I don't understand if there is a relation between "Max Connections" and>
> > the number of CPUs in the host.>
> >>
> >>
> >>
> > Could you help me to understand it?>
> >>
> >>
> >>
> > Thanks a lot>
> >>
> > Mario>
> >>
>


Job stuck internal http error 500

2018-07-27 Thread Bisonti Mario
Hallo.
My job is getting stuck indexing an xlsx file of 38MB

What could I do to solve my problem?

In the following there is the error:
2018-07-27 08:55:15.562 WARN  (qtp1521083627-52) [   x:core_share] 
o.e.j.s.HttpChannel /solr/core_share/update/extract
java.lang.OutOfMemoryError
at 
java.base/java.lang.AbstractStringBuilder.hugeCapacity(AbstractStringBuilder.java:188)
at 
java.base/java.lang.AbstractStringBuilder.newCapacity(AbstractStringBuilder.java:180)
at 
java.base/java.lang.AbstractStringBuilder.ensureCapacityInternal(AbstractStringBuilder.java:147)
at 
java.base/java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:660)
at java.base/java.lang.StringBuilder.append(StringBuilder.java:195)
at 
org.apache.solr.handler.extraction.SolrContentHandler.characters(SolrContentHandler.java:302)
at 
org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
at 
org.apache.tika.sax.SecureContentHandler.characters(SecureContentHandler.java:270)
at 
org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
at 
org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
at 
org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
at 
org.apache.tika.sax.SafeContentHandler.access$001(SafeContentHandler.java:46)
at 
org.apache.tika.sax.SafeContentHandler$1.write(SafeContentHandler.java:82)
at 
org.apache.tika.sax.SafeContentHandler.filter(SafeContentHandler.java:140)
at 
org.apache.tika.sax.SafeContentHandler.characters(SafeContentHandler.java:287)
at 
org.apache.tika.sax.XHTMLContentHandler.characters(XHTMLContentHandler.java:279)
at 
org.apache.tika.sax.XHTMLContentHandler.characters(XHTMLContentHandler.java:306)
at 
org.apache.tika.parser.microsoft.ooxml.OOXMLTikaBodyPartHandler.run(OOXMLTikaBodyPartHandler.java:147)
at 
org.apache.tika.parser.microsoft.ooxml.OOXMLWordAndPowerPointTextHandler.handleEndOfRun(OOXMLWordAndPowerPointTextHandler.java:468)
at 
org.apache.tika.parser.microsoft.ooxml.OOXMLWordAndPowerPointTextHandler.endElement(OOXMLWordAndPowerPointTextHandler.java:450)
at 
org.apache.tika.sax.ContentHandlerDecorator.endElement(ContentHandlerDecorator.java:136)
at 
org.apache.tika.sax.ContentHandlerDecorator.endElement(ContentHandlerDecorator.java:136)
at 
java.xml/com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.endElement(AbstractSAXParser.java:609)
at 
java.xml/com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanEndElement(XMLDocumentFragmentScannerImpl.java:1714)
at 
java.xml/com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl$FragmentContentDriver.next(XMLDocumentFragmentScannerImpl.java:2879)
at 
java.xml/com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(XMLDocumentScannerImpl.java:602)
at 
java.xml/com.sun.org.apache.xerces.internal.impl.XMLNSDocumentScannerImpl.next(XMLNSDocumentScannerImpl.java:112)
at 
java.xml/com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(XMLDocumentFragmentScannerImpl.java:532)
at 
java.xml/com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:888)
at 
java.xml/com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:824)
at 
java.xml/com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(XMLParser.java:141)
at 
java.xml/com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(AbstractSAXParser.java:1213)
at 
java.xml/com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl$JAXPSAXParser.parse(SAXParserImpl.java:635)
at 
java.xml/com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl.parse(SAXParserImpl.java:324)
at java.xml/javax.xml.parsers.SAXParser.parse(SAXParser.java:197)
at 
org.apache.tika.parser.microsoft.ooxml.AbstractOOXMLExtractor.handleGeneralTextContainingPart(AbstractOOXMLExtractor.java:506)
at 
org.apache.tika.parser.microsoft.ooxml.XSSFExcelExtractorDecorator.processShapes(XSSFExcelExtractorDecorator.java:279)
at 
org.apache.tika.parser.microsoft.ooxml.XSSFExcelExtractorDecorator.buildXHTML(XSSFExcelExtractorDecorator.java:185)
at 
org.apache.tika.parser.microsoft.ooxml.AbstractOOXMLExtractor.getXHTML(AbstractOOXMLExtractor.java:135)
at 
org.apache.tika.parser.microsoft.ooxml.XSSFExcelExtractorDecorator.getXHTML(XSSFExcelExtractorDecorator.java:120)
at 
org.apache.tika.parser.microsoft.ooxml.OOXMLExtractorFactory.parse(OOXMLExtractorFactory.java:143)
at 
org.apache.tika.parser.microsoft.ooxml.OOXMLParser.parse(OOXMLParser.java:106)
at 

Solr connection, max connections and CPU

2018-07-26 Thread Bisonti Mario
Hallo, I set up the Solr connection in the "Output connections" of ManifoldCF.

I don't understand whether there is a relation between "Max Connections" and the 
number of CPUs on the host.

Could you help me to understand it?

Thanks a lot
Mario
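
For sizing "Max Connections", the relevant ceiling on the Solr side is Jetty's request thread pool rather than the CPU count; a hedged sketch for checking it, assuming a standard Solr 7.x layout:

# Solr serves update requests from Jetty's thread pool, so its size is the practical
# upper bound for concurrent MCF output connections (path and property names assume Solr 7.x):
grep -n 'solr.jetty.threads' /opt/solr-7.3.1/server/etc/jetty.xml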


Webdav Repository

2018-06-21 Thread Bisonti Mario
Hallo.

Is it possible to scan a remote webdav repository?

I don’t find any info about it

Thanks a lot

Mario


script to schedule MCF Jobs by crontab login unauthorized

2018-06-19 Thread Bisonti Mario
Hallo, I used a script to remotely start a job from crontab on MCF 2.9.1 and it 
worked.
The same script now, on MCF 2.10, does not work.

Now, I tried this command:

curl -c "cookie" -XPOST 'http://localhost:8080/mcf-api-service/json/LOGIN' -d 
@/SCRIPTS/user.json

where user.json is:
{
"user":"admin",
"password":"admin"
}

And, in the log of Tomcat I obtain the error Unauthorized 401:
POST /mcf-api-service/json/LOGIN HTTP/1.1" 401

Has the security for remotely-called commands changed?

Thanks a lot

Mario
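
A minimal sketch of the cookie-based pattern, assuming the quick-start API service on
localhost:8080 as above; the start/<job id> PUT endpoint and the placeholder job id are
assumptions to verify against the REST API documentation for your MCF version:

# log in once and store the session cookie (same payload file as above)
curl -s -c /tmp/mcf-cookie -X POST 'http://localhost:8080/mcf-api-service/json/LOGIN' -d @/SCRIPTS/user.json
# reuse the stored cookie for later calls, e.g. starting a job by its (hypothetical) id
curl -s -b /tmp/mcf-cookie -X PUT 'http://localhost:8080/mcf-api-service/json/start/1526900000000'

If the LOGIN call itself returns 401, it is worth double-checking the JSON field names
the API expects before suspecting the cookie handling.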


R: FATAL 2018-06-18T18:29:23,676 (Worker thread '36') - Error tossed: null

2018-06-19 Thread Bisonti Mario

Hallo Karl!

Now I found out how to build (my first build …) and I am now using the
/multiprocess-file-example-proprietary folder, and I deployed it into Tomcat.

I recreated the configuration that I used on the binary version and I created the
same job.
It works !!!

I see in manifoldcf.log the error:
WARN 2018-06-19T12:11:52,476 (Worker thread '5') - JCIFS: Possibly transient 
exception detected on attempt 1 while getting share security: All pipe 
instances are busy.
jcifs.smb.SmbException: All pipe instances are busy.
at jcifs.smb.SmbTransport.checkStatus(SmbTransport.java:563) 
~[jcifs.jar:?]
at jcifs.smb.SmbTransport.send(SmbTransport.java:663) ~[jcifs.jar:?]
at jcifs.smb.SmbSession.send(SmbSession.java:238) ~[jcifs.jar:?]
at jcifs.smb.SmbTree.send(SmbTree.java:119) ~[jcifs.jar:?]
at jcifs.smb.SmbFile.send(SmbFile.java:775) ~[jcifs.jar:?]
at jcifs.smb.SmbFile.open0(SmbFile.java:992) ~[jcifs.jar:?]
at jcifs.smb.SmbFile.open(SmbFile.java:1009) ~[jcifs.jar:?]
at jcifs.smb.SmbFileOutputStream.(SmbFileOutputStream.java:142) 
~[jcifs.jar:?]
at 
jcifs.smb.TransactNamedPipeOutputStream.(TransactNamedPipeOutputStream.java:32)
 ~[jcifs.jar:?]
at 
jcifs.smb.SmbNamedPipe.getNamedPipeOutputStream(SmbNamedPipe.java:187) 
~[jcifs.jar:?]
at 
jcifs.dcerpc.DcerpcPipeHandle.doSendFragment(DcerpcPipeHandle.java:68) 
~[jcifs.jar:?]
at jcifs.dcerpc.DcerpcHandle.sendrecv(DcerpcHandle.java:190) 
~[jcifs.jar:?]
at jcifs.dcerpc.DcerpcHandle.bind(DcerpcHandle.java:126) ~[jcifs.jar:?]
at jcifs.dcerpc.DcerpcHandle.sendrecv(DcerpcHandle.java:140) 
~[jcifs.jar:?]
at jcifs.smb.SmbFile.getShareSecurity(SmbFile.java:2946) ~[jcifs.jar:?]
at 
org.apache.manifoldcf.crawler.connectors.sharedrive.SharedDriveConnector.getFileShareSecurity(SharedDriveConnector.java:2438)
 [mcf-jcifs-connector.jar:?]
at 
org.apache.manifoldcf.crawler.connectors.sharedrive.SharedDriveConnector.getFileShareSecuritySet(SharedDriveConnector.java:1221)
 [mcf-jcifs-connector.jar:?]
at 
org.apache.manifoldcf.crawler.connectors.sharedrive.SharedDriveConnector.processDocuments(SharedDriveConnector.java:627)
 [mcf-jcifs-connector.jar:?]
at 
org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:399) 
[mcf-pull-agent.jar:?]
WARN 2018-06-19T12:11:53,435 (Worker thread '3') - JCIFS: Possibly transient 
exception detected on attempt 1 while getting share security: All pipe 
instances are busy.
jcifs.smb.SmbException: All pipe instances are busy.
at jcifs.smb.SmbTransport.checkStatus(SmbTransport.java:563) 
~[jcifs.jar:?]
at jcifs.smb.SmbTransport.send(SmbTransport.java:663) ~[jcifs.jar:?]
at jcifs.smb.SmbSession.send(SmbSession.java:238) ~[jcifs.jar:?]
at jcifs.smb.SmbTree.send(SmbTree.java:119) ~[jcifs.jar:?]
at jcifs.smb.SmbFile.send(SmbFile.java:775) ~[jcifs.jar:?]
at jcifs.smb.SmbFile.open0(SmbFile.java:992) ~[jcifs.jar:?]
at jcifs.smb.SmbFile.open(SmbFile.java:1009) ~[jcifs.jar:?]
at jcifs.smb.SmbFileOutputStream.(SmbFileOutputStream.java:142) 
~[jcifs.jar:?]
at 
jcifs.smb.TransactNamedPipeOutputStream.(TransactNamedPipeOutputStream.java:32)
 ~[jcifs.jar:?]
at 
jcifs.smb.SmbNamedPipe.getNamedPipeOutputStream(SmbNamedPipe.java:187) 
~[jcifs.jar:?]


but it doesn't get stuck any more, so for me it isn't a problem if they are only
warnings!

Very very good!
Thanks for your help!

Mario
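
If you want to keep an eye on how often these transient JCIFS warnings happen over a
crawl (purely as a monitoring aid, assuming the log file location used elsewhere in
this thread):

grep -c "Possibly transient exception" logs/manifoldcf.log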


Da: Karl Wright 
Inviato: martedì 19 giugno 2018 09:52
A: user@manifoldcf.apache.org
Oggetto: Re: FATAL 2018-06-18T18:29:23,676 (Worker thread '36') - Error tossed: 
null

Hi Mario,

You cannot patch the binary.  You must build from source to apply the patch.

The easiest way forward is to check out trunk directly (with svn) and build it.
The trunk svn URL is https://svn.apache.org/repos/asf/manifoldcf/trunk .

Karl


On Tue, Jun 19, 2018 at 3:35 AM Bisonti Mario 
mailto:mario.biso...@vimar.com>> wrote:
Hallo.
Note that I specified the mime types on my Solr output connection.

Furthermore, I used the binary distribution; how could I patch it with your fix?

I see that my job is stuck on 3 docs, with:

WARN 2018-06-19T09:29:21,366 (Worker thread '14') - JCIFS: Possibly transient 
exception detected on attempt 1 while getting share security: All pipe 
instances are busy.
jcifs.smb.SmbException: All pipe instances are busy.
at jcifs.smb.SmbTransport.checkStatus(SmbTransport.java:563) 
~[jcifs-1.3.19.jar:?]
at jcifs.smb.SmbTransport.send(SmbTransport.java:663) 
~[jcifs-1.3.19.jar:?]
at jcifs.smb.SmbSession.send(SmbSession.java:238) ~[jcif

R: FATAL 2018-06-18T18:29:23,676 (Worker thread '36') - Error tossed: null

2018-06-19 Thread Bisonti Mario
') - Error tossed: null
java.lang.NullPointerException
FATAL 2018-06-19T09:29:36,849 (Worker thread '42') - Error tossed: null


Thanks a lot for your great help

Da: Karl Wright 
Inviato: lunedì 18 giugno 2018 20:39
A: user@manifoldcf.apache.org
Oggetto: Re: FATAL 2018-06-18T18:29:23,676 (Worker thread '36') - Error tossed: 
null

Created CONNECTORS-1510 and committed a fix.
Karl

On Mon, Jun 18, 2018 at 2:33 PM Karl Wright 
mailto:daddy...@gmail.com>> wrote:
It certainly is a particular file -- the mime type is null, and that's causing 
this line to blow up:
final String lowerMimeType = mimeType.toLowerCase(Locale.ROOT);


That code was added a couple of revs back to address a different problem; it's 
a trivial fix:

final String lowerMimeType = (mimeType != 
null)?mimeType.toLowerCase(Locale.ROOT):null;

(This is HttpPoster line 811)

Karl


On Mon, Jun 18, 2018 at 2:30 PM Steph van Schalkwyk 
mailto:st...@remcam.net>> wrote:

Looks like a particular file may be causing this. Try to find the filename it
crashes on and copy that to a small crawl directory. Repeat the crawl.


On Mon, Jun 18, 2018 at 11:34 AM, Bisonti Mario 
mailto:mario.biso...@vimar.com>> wrote:
Hallo

I configured ManifoldCF 2.10 with Tomcat 9.0.8 and Postgres 9.3

I configured multiprocess-file-example


When I create a Job to scan a big Windows share (22000 docs: Word, PDF, etc.),
ManifoldCF crashes with the message:
at 
org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:399) 
[mcf-pull-agent.jar:?]
FATAL 2018-06-18T18:29:23,676 (Worker thread '36') - Error tossed: null
java.lang.NullPointerException
at 
org.apache.manifoldcf.agents.output.solr.HttpPoster.checkMimeTypeIndexable(HttpPoster.java:811)
 ~[?:?]
at 
org.apache.manifoldcf.agents.output.solr.SolrConnector.checkMimeTypeIndexable(SolrConnector.java:534)
 ~[?:?]
at 
org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineCheckEntryPoint.checkMimeTypeIndexable(IncrementalIngester.java:2937)
 ~[mcf-agents.jar:?]
at 
org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineCheckFanout.checkMimeTypeIndexable(IncrementalIngester.java:2864)
 ~[mcf-agents.jar:?]
at 
org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineObject.checkMimeTypeIndexable(IncrementalIngester.java:2589)
 ~[mcf-agents.jar:?]
at 
org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester.checkMimeTypeIndexable(IncrementalIngester.java:273)
 ~[mcf-agents.jar:?]
at 
org.apache.manifoldcf.crawler.system.WorkerThread$ProcessActivity.checkMimeTypeIndexable(WorkerThread.java:2029)
 ~[mcf-pull-agent.jar:?]
at 
org.apache.manifoldcf.crawler.connectors.sharedrive.SharedDriveConnector.checkIncludeFile(SharedDriveConnector.java:1439)
 ~[?:?]
at 
org.apache.manifoldcf.crawler.connectors.sharedrive.SharedDriveConnector$ProcessDocumentsFilter.accept(SharedDriveConnector.java:4874)
 ~[?:?]
at jcifs.smb.SmbFile.doFindFirstNext(SmbFile.java:2016) ~[?:?]
at jcifs.smb.SmbFile.doEnum(SmbFile.java:1741) ~[?:?]
at jcifs.smb.SmbFile.listFiles(SmbFile.java:1718) ~[?:?]
at jcifs.smb.SmbFile.listFiles(SmbFile.java:1707) ~[?:?]
at 
org.apache.manifoldcf.crawler.connectors.sharedrive.SharedDriveConnector.fileListFiles(SharedDriveConnector.java:2318)
 ~[?:?]
at 
org.apache.manifoldcf.crawler.connectors.sharedrive.SharedDriveConnector.processDocuments(SharedDriveConnector.java:798)
 ~[?:?]
at 
org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:399) 
[mcf-pull-agent.jar:?]


If I use a smaller windows share it works.

Note that with ManifoldCF 2.9.1, HSQLDB and QuickStart with Jetty, it worked.

What could I do?

Thanks a lot
Mario



FATAL 2018-06-18T18:29:23,676 (Worker thread '36') - Error tossed: null

2018-06-18 Thread Bisonti Mario
Hallo

I configured ManifoldCF 2.10 with Tomcat 9.0.8 and Postgres 9.3

I configured multiprocess-file-example


When I create a Job to scan a big Windows share (22000 docs: Word, PDF, etc.),
ManifoldCF crashes with the message:
at 
org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:399) 
[mcf-pull-agent.jar:?]
FATAL 2018-06-18T18:29:23,676 (Worker thread '36') - Error tossed: null
java.lang.NullPointerException
at 
org.apache.manifoldcf.agents.output.solr.HttpPoster.checkMimeTypeIndexable(HttpPoster.java:811)
 ~[?:?]
at 
org.apache.manifoldcf.agents.output.solr.SolrConnector.checkMimeTypeIndexable(SolrConnector.java:534)
 ~[?:?]
at 
org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineCheckEntryPoint.checkMimeTypeIndexable(IncrementalIngester.java:2937)
 ~[mcf-agents.jar:?]
at 
org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineCheckFanout.checkMimeTypeIndexable(IncrementalIngester.java:2864)
 ~[mcf-agents.jar:?]
at 
org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineObject.checkMimeTypeIndexable(IncrementalIngester.java:2589)
 ~[mcf-agents.jar:?]
at 
org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester.checkMimeTypeIndexable(IncrementalIngester.java:273)
 ~[mcf-agents.jar:?]
at 
org.apache.manifoldcf.crawler.system.WorkerThread$ProcessActivity.checkMimeTypeIndexable(WorkerThread.java:2029)
 ~[mcf-pull-agent.jar:?]
at 
org.apache.manifoldcf.crawler.connectors.sharedrive.SharedDriveConnector.checkIncludeFile(SharedDriveConnector.java:1439)
 ~[?:?]
at 
org.apache.manifoldcf.crawler.connectors.sharedrive.SharedDriveConnector$ProcessDocumentsFilter.accept(SharedDriveConnector.java:4874)
 ~[?:?]
at jcifs.smb.SmbFile.doFindFirstNext(SmbFile.java:2016) ~[?:?]
at jcifs.smb.SmbFile.doEnum(SmbFile.java:1741) ~[?:?]
at jcifs.smb.SmbFile.listFiles(SmbFile.java:1718) ~[?:?]
at jcifs.smb.SmbFile.listFiles(SmbFile.java:1707) ~[?:?]
at 
org.apache.manifoldcf.crawler.connectors.sharedrive.SharedDriveConnector.fileListFiles(SharedDriveConnector.java:2318)
 ~[?:?]
at 
org.apache.manifoldcf.crawler.connectors.sharedrive.SharedDriveConnector.processDocuments(SharedDriveConnector.java:798)
 ~[?:?]
at 
org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:399) 
[mcf-pull-agent.jar:?]


If I use a smaller windows share it works.

Note that with ManifoldCF 2.9.1, HSQLDB and QuickStart with Jetty, it worked.

What could I do?

Thanks a lot
Mario


R: connectors.xml modified: new repository not in the list

2018-06-15 Thread Bisonti Mario
I solved!
I executed:
sudo ./initialize.sh
And connectors have been refreshed!

Thanks!



Da: Bisonti Mario 
Inviato: venerdì 15 giugno 2018 11:46
A: user@manifoldcf.apache.org
Oggetto: R: connectors.xml modified: new repository not in the list

I left the name jcifs-1.3.19.jar without renaming it because when I used it with
Jetty on HSQLDB it worked.

Now I am using MCF 2.10

Thanks




Oggetto: RE: connectors.xml modified: new repository not in the list

Hello Mario,

Your jcifs is named jcifs.jar or jcifs-1.3.19.jar ?

What is your ManifoldCF version ?

Maxence,




Objet : connectors.xml modified: new repository not in the list

Hallo.
I installed ManifoldCF on Tomcat and with postgres and I am configuring for use 
with folder /manifoldcf/multiprocess-file-example

Now I would like to add repository "Windows Shares" so I decommented in 
connectors.xml:


And I added on connector-lib-proprietary jcifs-1.3.19.jar

I restarted Tomcat but I don't see the choice "Windows shares" repository from 
manifoldcf
It seems that It didn't reload the list of the connectors.
How could I refresh them?
Thanks a lot
Mario




R: connectors.xml modified: new repository not in the list

2018-06-15 Thread Bisonti Mario
I left the name jcifs-1.3.19.jar without renaming it because when I used it with
Jetty on HSQLDB it worked.

Now I am using MCF 2.10

Thanks



Da: msaunier 
Inviato: venerdì 15 giugno 2018 11:42
A: user@manifoldcf.apache.org
Oggetto: RE: connectors.xml modified: new repository not in the list

Hello Mario,

Your jcifs is named jcifs.jar or jcifs-1.3.19.jar ?

What is your ManifoldCF version ?

Maxence,


De : Bisonti Mario [mailto:mario.biso...@vimar.com]
Envoyé : vendredi 15 juin 2018 11:39
À : user@manifoldcf.apache.org<mailto:user@manifoldcf.apache.org>
Objet : connectors.xml modified: new repository not in the list

Hallo.
I installed ManifoldCF on Tomcat and with postgres and I am configuring for use 
with folder /manifoldcf/multiprocess-file-example

Now I would like to add repository "Windows Shares" so I decommented in 
connectors.xml:


And I added on connector-lib-proprietary jcifs-1.3.19.jar

I restarted Tomcat but I don't see the choice "Windows shares" repository from 
manifoldcf
It seems that It didn't reload the list of the connectors.
How could I refresh them?
Thanks a lot
Mario




connectors.xml modified: new repository not in the list

2018-06-15 Thread Bisonti Mario
Hallo.
I installed ManifoldCF on Tomcat and with postgres and I am configuring for use 
with folder /manifoldcf/multiprocess-file-example

Now I would like to add the "Windows Shares" repository, so I uncommented its entry in
connectors.xml:


And I added jcifs-1.3.19.jar to connector-lib-proprietary.

I restarted Tomcat but I don't see the "Windows shares" choice in the repository list
in ManifoldCF.
It seems that it didn't reload the list of connectors.
How could I refresh them?
Thanks a lot
Mario
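
The uncommented connectors.xml line was stripped by the archive; as a sketch, the
Windows Shares entry typically looks like the following (the class name matches the
SharedDriveConnector seen in the stack traces elsewhere in this archive; check the
commented-out template in your own connectors.xml for the exact form):

<repositoryconnector name="Windows shares" class="org.apache.manifoldcf.crawler.connectors.sharedrive.SharedDriveConnector"/>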





R: Job in aborting status

2018-06-13 Thread Bisonti Mario
Ciao Karl.

I am not able to thread dump the start.jar process because I obtain:
Error attaching to core file: cannot open binary file
sun.jvm.hotspot.debugger.DebuggerException: cannot open binary file
.
.


Furthermore, I set in logging.xml:





  

  
%5p %d{ISO8601} (%t) - %m%n
  

  
  

  

  



Perhaps this isn't the right way to enable the debug logging?
Thanks a lot
Mario
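
An alternative that avoids core files entirely, assuming a full JDK (not just a JRE)
is installed: find the PID of the running start.jar process and ask the live JVM for a
thread dump.

# find the PID of the Jetty start.jar process
jps -l | grep start.jar
# write a thread dump of the live process to a file (replace 12345 with the real PID)
jstack 12345 > /tmp/manifoldcf-threaddump.txt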



Da: Karl Wright 
Inviato: martedì 12 giugno 2018 17:40
A: user@manifoldcf.apache.org
Oggetto: Re: Job in aborting status

Then I cannot explain the behavior you are seeing.  Also, debug output is quite 
verbose so clearly you are not setting that up right either.

If you want me to give a further analysis, please provide a thread dump of the 
manifoldcf process.

Karl


On Tue, Jun 12, 2018 at 10:38 AM Bisonti Mario 
mailto:mario.biso...@vimar.com>> wrote:
For Job “A” it use as repository “Windows Share” connector  and for output 
“Solr”
For Job “B” it use as repository “Generic Web” connector and for output “Solr”

No own connector

I set DEBUG but I have no log


Da: Karl Wright mailto:daddy...@gmail.com>>
Inviato: martedì 12 giugno 2018 16:22
A: user@manifoldcf.apache.org<mailto:user@manifoldcf.apache.org>
Oggetto: Re: Job in aborting status

Hi Mario,

What repository connector are you using for Job "B"?  Is it your own connector? 
 If so, you likely have bugs in it that are causing problems with the entire 
framework.  Please verify that this is the case; ManifoldCF In Action is freely 
available online and you should read it before writing connectors.

The problems are not likely due to HSQLDB internal locks.

Major errors should be logged already in manifoldcf.log by default.  If you 
want to set up connector debug logging, you need to set a properties.xml 
property, not a logging.xml property:



See: 
https://www.mail-archive.com/user@manifoldcf.apache.org/msg01034.html
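
The property lines were stripped by the archive; as a sketch, the connector debug
setting in properties.xml is usually along these lines (verify the property name
against the logging how-to linked above):

<property name="org.apache.manifoldcf.connectors" value="DEBUG"/>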



On Tue, Jun 12, 2018 at 10:03 AM Bisonti Mario 
mailto:mario.biso...@vimar.com>> wrote:
I setup jobs :
Job “A” to crawls “Windows Shares”
Job “B” to crawl my internal site

The problem was when I tried to aborted the second job “B”
It hang in aborting state

After, I tried to start job “A” but it hanged in “Starting” state, and not 
start, after, I tried to abort it too and it hanged in Aborting state as of “B” 
job.

I increased log level of logging.xml to “info” but when I start manifoldcf as 
standalone I do not have many info on the logs/manifoldcf.log

I read:
INFO 2018-06-12T15:58:02,748 (main) - dataFileCache open start
INFO 2018-06-12T15:58:02,753 (main) - dataFileCache open end

And nothing more

So, I think that there could be a lock situation in the internal HSQLDB that I 
am not able to solve.




Da: Karl Wright mailto:daddy...@gmail.com>>
Inviato: martedì 12 giugno 2018 15:46
A: user@manifoldcf.apache.org<mailto:user@manifoldcf.apache.org>
Oggetto: Re: Job in aborting status

Hi Mario,

If you are using the single-process model, then stuck locks are not the problem 
and the lock-clean script is inappropriate to use.  Locks are all internal in 
that model.  That is why lock-clean is only distributed as part of the 
file-based multiprocess example.

Please tell me more about what you have set up for your jobs on this example.  
How many are there, and how many documents are involved?  The embedded HSQLDB 
database has limits because it caches all tables in memory, so the 
single-process example is not going to be able to handle huge jobs.

Please have a look at the log to be sure there are no serious errors in it.

Thanks,
Karl




On Tue, Jun 12, 2018 at 9:26 AM Bisonti Mario 
mailto:mario.biso...@vimar.com>> wrote:
No, I am testing on the /example directory so I am using local HSQLDB
I copied lock-clean.sh script from the 
/usr/share/manifoldcf/multiprocess-file-example to the 
/usr/share/manifoldcf/example to try to clean-up my situation, but perhaps the 
script isn’t good for me because I am using jetty on the example directory?

Thanks




Da: Karl Wright mailto:daddy...@gmail.com>>
Inviato: martedì 12 giugno 2018 15:23
A: user@manifoldcf.apache.org<mailto:user@manifoldcf.apache.org>
Oggetto: Re: Job in aborting status

Hi Mario,

It appears you are trying to use embedded HSQLDB in a multiprocess environment. 
 That is not possible.

In a multiprocess environment, you have the following choices:

(1) standalone HSQLDB
(2) postgresql
(3) mysql

Thanks,
Karl


On Tue, Jun 12, 2018 at 9:06 AM Bisonti Mario 
mailto:mario.biso...@vimar.com>> wrote:
Thanks Karl.
I tried to execute lock-clean from my example directory after I stop manifoldcf 
but

Recall: Job in aborting status

2018-06-13 Thread Bisonti Mario
Bisonti Mario would like to recall the message "Job in aborting status".

R: Job in aborting status

2018-06-13 Thread Bisonti Mario
Ciao Karl.

I am not able to thread dump the start.jar process because I obtain:
Error attaching to core file: cannot open binary file
sun.jvm.hotspot.debugger.DebuggerException: cannot open binary file
.
.


Furthermore, I set in logging.xml:





  

  
%5p %d{ISO8601} (%t) - %m%n
  

  
  

  

  



Perhaps this isn't the right way to enable the debug logging?
Thanks a lot
Mario





Da: Karl Wright 
Inviato: martedì 12 giugno 2018 17:40
A: user@manifoldcf.apache.org
Oggetto: Re: Job in aborting status

Then I cannot explain the behavior you are seeing.  Also, debug output is quite 
verbose so clearly you are not setting that up right either.

If you want me to give a further analysis, please provide a thread dump of the 
manifoldcf process.

Karl


On Tue, Jun 12, 2018 at 10:38 AM Bisonti Mario 
mailto:mario.biso...@vimar.com>> wrote:
For Job “A” it use as repository “Windows Share” connector  and for output 
“Solr”
For Job “B” it use as repository “Generic Web” connector and for output “Solr”

No own connector

I set DEBUG but I have no log


Da: Karl Wright mailto:daddy...@gmail.com>>
Inviato: martedì 12 giugno 2018 16:22
A: user@manifoldcf.apache.org<mailto:user@manifoldcf.apache.org>
Oggetto: Re: Job in aborting status

Hi Mario,

What repository connector are you using for Job "B"?  Is it your own connector? 
 If so, you likely have bugs in it that are causing problems with the entire 
framework.  Please verify that this is the case; ManifoldCF In Action is freely 
available online and you should read it before writing connectors.

The problems are not likely due to HSQLDB internal locks.

Major errors should be logged already in manifoldcf.log by default.  If you 
want to set up connector debug logging, you need to set a properties.xml 
property, not a logging.xml property:



See: 
https://www.mail-archive.com/user@manifoldcf.apache.org/msg01034.html



On Tue, Jun 12, 2018 at 10:03 AM Bisonti Mario 
mailto:mario.biso...@vimar.com>> wrote:
I setup jobs :
Job “A” to crawls “Windows Shares”
Job “B” to crawl my internal site

The problem was when I tried to aborted the second job “B”
It hang in aborting state

After, I tried to start job “A” but it hanged in “Starting” state, and not 
start, after, I tried to abort it too and it hanged in Aborting state as of “B” 
job.

I increased log level of logging.xml to “info” but when I start manifoldcf as 
standalone I do not have many info on the logs/manifoldcf.log

I read:
INFO 2018-06-12T15:58:02,748 (main) - dataFileCache open start
INFO 2018-06-12T15:58:02,753 (main) - dataFileCache open end

And nothing more

So, I think that there could be a lock situation in the internal HSQLDB that I 
am not able to solve.




Da: Karl Wright mailto:daddy...@gmail.com>>
Inviato: martedì 12 giugno 2018 15:46
A: user@manifoldcf.apache.org<mailto:user@manifoldcf.apache.org>
Oggetto: Re: Job in aborting status

Hi Mario,

If you are using the single-process model, then stuck locks are not the problem 
and the lock-clean script is inappropriate to use.  Locks are all internal in 
that model.  That is why lock-clean is only distributed as part of the 
file-based multiprocess example.

Please tell me more about what you have set up for your jobs on this example.  
How many are there, and how many documents are involved?  The embedded HSQLDB 
database has limits because it caches all tables in memory, so the 
single-process example is not going to be able to handle huge jobs.

Please have a look at the log to be sure there are no serious errors in it.

Thanks,
Karl




On Tue, Jun 12, 2018 at 9:26 AM Bisonti Mario 
mailto:mario.biso...@vimar.com>> wrote:
No, I am testing on the /example directory so I am using local HSQLDB
I copied lock-clean.sh script from the 
/usr/share/manifoldcf/multiprocess-file-example to the 
/usr/share/manifoldcf/example to try to clean-up my situation, but perhaps the 
script isn’t good for me because I am using jetty on the example directory?

Thanks




Da: Karl Wright mailto:daddy...@gmail.com>>
Inviato: martedì 12 giugno 2018 15:23
A: user@manifoldcf.apache.org<mailto:user@manifoldcf.apache.org>
Oggetto: Re: Job in aborting status

Hi Mario,

It appears you are trying to use embedded HSQLDB in a multiprocess environment. 
 That is not possible.

In a multiprocess environment, you have the following choices:

(

R: Job in aborting status

2018-06-12 Thread Bisonti Mario
For Job “A” it uses the “Windows Share” repository connector and “Solr” for output.
For Job “B” it uses the “Generic Web” repository connector and “Solr” for output.

No own connector

I set DEBUG but I have no log


Da: Karl Wright 
Inviato: martedì 12 giugno 2018 16:22
A: user@manifoldcf.apache.org
Oggetto: Re: Job in aborting status

Hi Mario,

What repository connector are you using for Job "B"?  Is it your own connector? 
 If so, you likely have bugs in it that are causing problems with the entire 
framework.  Please verify that this is the case; ManifoldCF In Action is freely 
available online and you should read it before writing connectors.

The problems are not likely due to HSQLDB internal locks.

Major errors should be logged already in manifoldcf.log by default.  If you 
want to set up connector debug logging, you need to set a properties.xml 
property, not a logging.xml property:



See: 
https://www.mail-archive.com/user@manifoldcf.apache.org/msg01034.html



On Tue, Jun 12, 2018 at 10:03 AM Bisonti Mario 
mailto:mario.biso...@vimar.com>> wrote:
I setup jobs :
Job “A” to crawls “Windows Shares”
Job “B” to crawl my internal site

The problem was when I tried to aborted the second job “B”
It hang in aborting state

After, I tried to start job “A” but it hanged in “Starting” state, and not 
start, after, I tried to abort it too and it hanged in Aborting state as of “B” 
job.

I increased log level of logging.xml to “info” but when I start manifoldcf as 
standalone I do not have many info on the logs/manifoldcf.log

I read:
INFO 2018-06-12T15:58:02,748 (main) - dataFileCache open start
INFO 2018-06-12T15:58:02,753 (main) - dataFileCache open end

And nothing more

So, I think that there could be a lock situation in the internal HSQLDB that I 
am not able to solve.




Da: Karl Wright mailto:daddy...@gmail.com>>
Inviato: martedì 12 giugno 2018 15:46
A: user@manifoldcf.apache.org<mailto:user@manifoldcf.apache.org>
Oggetto: Re: Job in aborting status

Hi Mario,

If you are using the single-process model, then stuck locks are not the problem 
and the lock-clean script is inappropriate to use.  Locks are all internal in 
that model.  That is why lock-clean is only distributed as part of the 
file-based multiprocess example.

Please tell me more about what you have set up for your jobs on this example.  
How many are there, and how many documents are involved?  The embedded HSQLDB 
database has limits because it caches all tables in memory, so the 
single-process example is not going to be able to handle huge jobs.

Please have a look at the log to be sure there are no serious errors in it.

Thanks,
Karl




On Tue, Jun 12, 2018 at 9:26 AM Bisonti Mario 
mailto:mario.biso...@vimar.com>> wrote:
No, I am testing on the /example directory so I am using local HSQLDB
I copied lock-clean.sh script from the 
/usr/share/manifoldcf/multiprocess-file-example to the 
/usr/share/manifoldcf/example to try to clean-up my situation, but perhaps the 
script isn’t good for me because I am using jetty on the example directory?

Thanks




Da: Karl Wright mailto:daddy...@gmail.com>>
Inviato: martedì 12 giugno 2018 15:23
A: user@manifoldcf.apache.org<mailto:user@manifoldcf.apache.org>
Oggetto: Re: Job in aborting status

Hi Mario,

It appears you are trying to use embedded HSQLDB in a multiprocess environment. 
 That is not possible.

In a multiprocess environment, you have the following choices:

(1) standalone HSQLDB
(2) postgresql
(3) mysql

Thanks,
Karl


On Tue, Jun 12, 2018 at 9:06 AM Bisonti Mario 
mailto:mario.biso...@vimar.com>> wrote:
Thanks Karl.
I tried to execute lock-clean from my example directory after I stop manifoldcf 
but I obtain:


administrator@sslrvivv01:/usr/share/manifoldcf/example$ sudo -E ./lock-clean.sh
Configuration file successfully read
Synchronization storage cleaned up
2018-06-12 15:03:35,395 Shutdown thread FATAL Unable to register shutdown hook 
because JVM is shutting down. java.lang.IllegalStateException: Cannot add new 
shutdown hook as this is not started. Current state: STOPPED
at 
org.apache.logging.log4j.core.util.DefaultShutdownCallbackRegistry.addShutdownCallback(DefaultShutdownCallbackRegistry.java:113)
at 
org.apache.logging.log4j.core.impl.Log4jContextFactory.addShutdownCallback(Log4jContextFactory.java:271)
at 
org.apache.logging.log4j.core.LoggerContext.setUpShutdownHook(LoggerContext.java:256)
at 
org.apache.logging.log4j.core.LoggerContext.start(LoggerContext.java:216)
at 
org.apache.logging.log4j.core.impl.Log4jContextFactory.getContext(Log4jContextFactory.java:146)
at 
org.apa

R: Job in aborting status

2018-06-12 Thread Bisonti Mario
No, I am testing on the /example directory so I am using local HSQLDB
I copied lock-clean.sh script from the 
/usr/share/manifoldcf/multiprocess-file-example to the 
/usr/share/manifoldcf/example to try to clean-up my situation, but perhaps the 
script isn’t good for me because I am using jetty on the example directory?

Thanks




Da: Karl Wright 
Inviato: martedì 12 giugno 2018 15:23
A: user@manifoldcf.apache.org
Oggetto: Re: Job in aborting status

Hi Mario,

It appears you are trying to use embedded HSQLDB in a multiprocess environment. 
 That is not possible.

In a multiprocess environment, you have the following choices:

(1) standalone HSQLDB
(2) postgresql
(3) mysql

Thanks,
Karl


On Tue, Jun 12, 2018 at 9:06 AM Bisonti Mario 
mailto:mario.biso...@vimar.com>> wrote:
Thanks Karl.
I tried to execute lock-clean from my example directory after I stop manifoldcf 
but I obtain:


administrator@sslrvivv01:/usr/share/manifoldcf/example$ sudo -E ./lock-clean.sh
Configuration file successfully read
Synchronization storage cleaned up
2018-06-12 15:03:35,395 Shutdown thread FATAL Unable to register shutdown hook 
because JVM is shutting down. java.lang.IllegalStateException: Cannot add new 
shutdown hook as this is not started. Current state: STOPPED
at 
org.apache.logging.log4j.core.util.DefaultShutdownCallbackRegistry.addShutdownCallback(DefaultShutdownCallbackRegistry.java:113)
at 
org.apache.logging.log4j.core.impl.Log4jContextFactory.addShutdownCallback(Log4jContextFactory.java:271)
at 
org.apache.logging.log4j.core.LoggerContext.setUpShutdownHook(LoggerContext.java:256)
at 
org.apache.logging.log4j.core.LoggerContext.start(LoggerContext.java:216)
at 
org.apache.logging.log4j.core.impl.Log4jContextFactory.getContext(Log4jContextFactory.java:146)
at 
org.apache.logging.log4j.core.impl.Log4jContextFactory.getContext(Log4jContextFactory.java:41)
at org.apache.logging.log4j.LogManager.getContext(LogManager.java:270)
at org.apache.log4j.Logger$PrivateManager.getContext(Logger.java:59)
at org.apache.log4j.Logger.getLogger(Logger.java:37)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.hsqldb.lib.FrameworkLogger.(Unknown Source)
at org.hsqldb.lib.FrameworkLogger.getLog(Unknown Source)
at org.hsqldb.lib.FrameworkLogger.getLog(Unknown Source)
at org.hsqldb.persist.Logger.getEventLogger(Unknown Source)
at org.hsqldb.persist.Logger.logInfoEvent(Unknown Source)
at org.hsqldb.persist.DataFileCache.logInfoEvent(Unknown Source)
at org.hsqldb.persist.DataFileCache.open(Unknown Source)
at org.hsqldb.persist.Log.getCache(Unknown Source)
at org.hsqldb.persist.Logger.getCache(Unknown Source)
at org.hsqldb.persist.Logger.newStore(Unknown Source)
at 
org.hsqldb.persist.PersistentStoreCollectionDatabase.getStore(Unknown Source)
at org.hsqldb.Table.getRowStore(Unknown Source)
at org.hsqldb.TableBase.isEmpty(Unknown Source)
at org.hsqldb.TableWorks.addIndex(Unknown Source)
at org.hsqldb.StatementSchema.getResult(Unknown Source)
at org.hsqldb.StatementSchema.execute(Unknown Source)
at org.hsqldb.Session.executeCompiledStatement(Unknown Source)
   at org.hsqldb.scriptio.ScriptReaderText.readDDL(Unknown Source)
at org.hsqldb.scriptio.ScriptReaderBase.readAll(Unknown Source)
at org.hsqldb.persist.Log.processScript(Unknown Source)
at org.hsqldb.persist.Log.open(Unknown Source)
at org.hsqldb.persist.Logger.open(Unknown Source)
at org.hsqldb.Database.reopen(Unknown Source)
at org.hsqldb.Database.open(Unknown Source)
at org.hsqldb.DatabaseManager.getDatabase(Unknown Source)
at org.hsqldb.DatabaseManager.newSession(Unknown Source)
at org.hsqldb.jdbc.JDBCConnection.(Unknown Source)
at org.hsqldb.jdbc.JDBCDriver.getConnection(Unknown Source)
at org.hsqldb.jdbc.JDBCDriver.connect(Unknown Source)
at java.sql.DriverManager.getConnection(DriverManager.java:664)
at java.sql.DriverManager.getConnection(DriverManager.java:247)
at 
org.apache.manifoldcf.core.database.DBInterfaceHSQLDB.closeDatabase(DBInterfaceHSQLDB.java:161)
at 
org.apache.manifoldcf.core.system.ManifoldCF$DatabaseShutdown.closeDatabase(ManifoldCF.java:1680)
at 
org.apache.manifoldcf.core.system.ManifoldCF$DatabaseShutdown.doCleanup(ManifoldCF.java:1664)
at 
org.apache.manifoldcf.core.system.ManifoldCF.cleanUpEnvironment(ManifoldCF.java:1540)
at 
org.apache.manifoldcf.core.system.ManifoldCF$ShutdownThread.run(ManifoldCF.java:1718)

What could I do?

R: Job in aborting status

2018-06-12 Thread Bisonti Mario


On Tue, Jun 12, 2018 at 7:11 AM Bisonti Mario 
mailto:mario.biso...@vimar.com>> wrote:
Hallo.

I have jobs in aborting status and it hangs.
I tried to restart manifoldcf, I restarted the machine, but the job hangs in 
aborting status.

Now, I am not able to start every job because they stay in starting status

How could I solve it?

Thanks.


Job in aborting status

2018-06-12 Thread Bisonti Mario
Hallo.

I have jobs in Aborting status and they hang.
I tried to restart ManifoldCF, I restarted the machine, but the jobs stay stuck in
Aborting status.

Now I am not able to start any job because they stay in Starting status.

How could I solve it?

Thanks.


seeds not working?

2018-06-12 Thread Bisonti Mario
Hallo.
I created a job to crawl a site and I want to crawl only a subfolder, so I used as
seed:
http://abc.mydomain.net/intranet/aaa/

But I see that it is also crawling:
http://abc.mydomain.net/intranet/abc/
http://abc.mydomain.net/intranet/abd/
etc.

Why is this?
What did I get wrong?

Thanks a lot
Mario
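
One thing worth checking, as a guess (it assumes the job's "Include in crawl" filter
was left at the default .*): the seed only controls where crawling starts, while link
filtering is done by the include patterns, so a pattern anchored to the subfolder may
be what is wanted, for example:

http://abc\.mydomain\.net/intranet/aaa/.*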



List of file to index or remove to Solr

2014-09-18 Thread Bisonti Mario
Hallo.

Scenario:

I would like to index a list of files, for example:
http://aaa.bb.com/ccc/folder1/doc1.pdf
http://aaa.bb.com/ccc/folder1/doc2.pdf
http://aaa.bb.com/ccc/folder1/doc3.pdf

On another day, I might want to remove from the index, for example,
http://aaa.bb.com/ccc/folder1/doc2.pdf
and add
http://aaa.bb.com/ccc/folder1/doc4.pdf
How could I do this?
Could I do this by means of an XML file that lists the files and tells Solr which
action to execute (add/delete)?
Would ManifoldCF read the XML and perform the action?
A generic connector pointed at an XML file with an entry-point action?
The API?
Could you help me?
I am a little bit confused about it.
Thanks a lot.

Mario







R: List of file to index or remove to Solr

2014-09-18 Thread Bisonti Mario
Yes, you understood.

But how could I do it programmatically?

Because I have a list of files to index and/or delete, and I can't edit the job
manually every time the list changes.




Da: Karl Wright [mailto:daddy...@gmail.com]
Inviato: giovedì 18 settembre 2014 12:26
A: user@manifoldcf.apache.org
Oggetto: Re: List of file to index or remove to Solr

Hi Mario,
I'm having some difficulty understanding your scenario.  It sounds like you are 
asking if MCF will allow you to change document specifications and will honor 
that on subsequent job runs.  The answer is: it does.

Thanks,
Karl

On Thu, Sep 18, 2014 at 5:40 AM, Bisonti Mario 
mario.biso...@vimar.commailto:mario.biso...@vimar.com wrote:
Hallo.

Scenario:

I would like to index a list of file for example:
http://aaa.bb.com/ccc/folder1/doc1.pdf
http://aaa.bb.com/ccc/folder1/doc2.pdf
http://aaa.bb.com/ccc/folder1/doc3.pdf
At another day, it could be that I want to remove from indexing for example
http://aaa.bb.com/ccc/folder1/doc2.pdf
and add
http://aaa.bb.com/ccc/folder1/doc4.pdf
How could I do this?
Could I do this by means of an xml file to instruct Solr on the action to 
execute (add/delete) and the list of files?
ManifoldCF read the xml and make the action ?
Generic connector to an xml file with entrypoint action?
API ?
Could you help me?
I am a little bit confused on it..
Thanks a lot.

Mario







R: Web crawling , robots.txt and access credentials

2014-09-17 Thread Bisonti Mario
Hallo.
Does MCF use the robots.txt at http://aaa.bb.com/ccc/robots.txt, or does it
look for robots.txt only at the root, http://aaa.bb.com/ ?
I restarted today, after many hours, so I suppose the cache expired, but MCF still
scans everything in the subfolders.
I read on the postgres table robotsdata of MCF:
binary data;aaa.bb.com:80;1410939267040

Details of the MCF job:
Seeds:
http://aaa.bb.com/ccc/
Include in crawl : .*
Include in index: .*
Include only hosts matching seeds? X

Thanks a lot
Mario
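
A quick way to see which robots.txt the site actually serves, independent of MCF (the
robots exclusion standard only defines /robots.txt at the host root, so a file under
/ccc/ is generally not consulted):

curl -sI http://aaa.bb.com/robots.txt
curl -sI http://aaa.bb.com/ccc/robots.txt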








Da: Karl Wright [mailto:daddy...@gmail.com]
Inviato: martedì 16 settembre 2014 19:22
A: user@manifoldcf.apache.org
Oggetto: Re: Web crawling , robots.txt and access credentials

Hi Mario,
I looked at your robots.txt.  In its current form, it should disallow 
EVERYTHING from your site.  The reason is that some of your paths start with 
/, but the allow clauses do not.
As for why MCF is letting files through, I suspect that this is because MCF 
caches robots data.  If you changed the file and expected MCF to pick that up 
immediately, it won't.  The cached copy expires after, I believe, 1 hour.  It's 
kept in the database so even if you recycle the agents process it won't purge 
the cache.
Karl
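
To make that concrete, a corrected version of the file quoted below would look roughly
like this, with every path written from the host root the way the Disallow line
already is (a sketch only: Allow handling is a crawler-specific extension, and it
assumes the file is served from the root while the documents live under /ccc/):

User-agent: *
Disallow: /
Allow: /ccc/folder1/doc1.pdf
Allow: /ccc/folder1/doc2.pdf
Allow: /ccc/folder1/doc3.pdf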

On Tue, Sep 16, 2014 at 11:44 AM, Karl Wright 
daddy...@gmail.commailto:daddy...@gmail.com wrote:
Authentication does not bypass robots ever.
You will want to turn on connector debug logging to see the decisions that the 
web connector is making with respect to which documents are fetched or not 
fetched, and why.

Karl

On Tue, Sep 16, 2014 at 11:04 AM, Bisonti Mario 
mario.biso...@vimar.commailto:mario.biso...@vimar.com wrote:

Hallo.

I would like to crawl some documents in a subfolder of a web site:
http://aaa.bb.com/

Structure is:
http://aaa.bb.com/ccc/folder1
http://aaa.bb.com/ccc/folder2
http://aaa.bb.com/ccc/folder3

Folder ccc and subfolder, are with a Basic security
username: joe
Password: p

I want to permit the crawling of only some docs on folder1
So I put robots.txt on
http://aaa.bb.com/ccc/robots.txt

The contents of file robots.txt is
User-agent: *
Disallow: /
Allow: folder1/doc1.pdf
Allow: folder1/doc2.pdf
Allow: folder1/doc3.pdf


I setup on MCF 1.7 a repository web connection with:
“Obey robots.txt for all fetches”
and on Access credentials:
http://aaa.bb.com/ccc/
Basic authentication: joe and ppp

When I create a job :
Include in crawl : .*
Include in index: .*
Include only hosts matching seeds? X

and I start it, it crawls all the content of folder1, folder2,
and folder3,
instead of, as I expected, only:
http://aaa.bb.com/ccc/folder1/doc1.pdf

http://aaa.bb.com/ccc/folder1/doc2.pdf

http://aaa.bb.com/ccc/folder1/doc3.pdf


Why this?

Perhaps the Basic Authentication bypasses the specific “Obey robots.txt for all
fetches” setting?

Thanks a lot for your help.
Mario





R: Connection status:Threw exception: 'Driver class not found: com.mysql.jdbc.Driver'

2014-09-16 Thread Bisonti Mario
Thanks a lot!

Connection now is working!

Mario



Da: Karl Wright [mailto:daddy...@gmail.com]
Inviato: lunedì 15 settembre 2014 16:26
A: user@manifoldcf.apache.org
Oggetto: Re: Connection status:Threw exception: 'Driver class not found: 
com.mysql.jdbc.Driver'

Hi Mario,
You can download the -src and -lib distribution, and then run ant make-deps 
build, and you should be able to use proprietary MySQL database connections.

Thanks,
Karl
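
Putting those steps together as one rough sequence (the paths and the built output
location are assumptions; check the build instructions shipped with the source
distribution):

# unpack the -src and -lib distributions side by side, then from the source directory:
ant make-deps build
# then run the proprietary example so the MySQL driver ends up on the classpath:
cd dist/example-proprietary
java -jar start.jar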

On Mon, Sep 15, 2014 at 10:24 AM, Bisonti Mario 
mario.biso...@vimar.commailto:mario.biso...@vimar.com wrote:
I understood.

Infact, I haven’t example-proprietary because I use a binary version of 
ManifoldCF so i can’t use MySQL as repository connection.

Thanks a lot.




Da: Karl Wright [mailto:daddy...@gmail.commailto:daddy...@gmail.com]
Inviato: lunedì 15 settembre 2014 16:21

A: user@manifoldcf.apache.orgmailto:user@manifoldcf.apache.org
Oggetto: Re: Connection status:Threw exception: 'Driver class not found: 
com.mysql.jdbc.Driver'

Hi Mario,
cd /usr/share/manifoldcf/example-proprietary
sudo java -jar start.jar
If you do not have example-proprietary, it is because you did not actually 
build ManifoldCF yourself.  In order to use MySQL as a backend, you must build 
ManifoldCF yourself.

Thanks,
Karl

On Mon, Sep 15, 2014 at 10:14 AM, Bisonti Mario 
mario.biso...@vimar.commailto:mario.biso...@vimar.com wrote:
I am on
/usr/share/manifoldcf/example/
and I execute: sudo java -jar start.jar

Instead mysql-connector-java-5.1.32-bin.jar  is in 
/usr/share/manifoldcf/connector-lib-proprietary/
So, how could I run ManifoldCF ?
Excuse me but I am not a linux expertise…
Thanks a lot.



Da: Karl Wright [mailto:daddy...@gmail.commailto:daddy...@gmail.com]
Inviato: lunedì 15 settembre 2014 16:04
A: user@manifoldcf.apache.orgmailto:user@manifoldcf.apache.org
Oggetto: Re: Connection status:Threw exception: 'Driver class not found: 
com.mysql.jdbc.Driver'

Hi Mario,
You need to run ManifoldCF out of one of the example-proprietary directories in 
order for it to pick up the mysql jar in the classpath.

Thanks,
Karl

On Mon, Sep 15, 2014 at 9:32 AM, Bisonti Mario 
mario.biso...@vimar.commailto:mario.biso...@vimar.com wrote:


Hallo.

I tried to setup a mysql repository connection but I obtain the error mentioned.

I put mysql-connector-java-5.1.32-bin.jar in
/apache-manifoldcf-1.7/connector-lib-proprietary
and
/apache-manifoldcf-1.7/lib-proprietary
folder but I obtain the error in the object

What could I check?

Thanks a lot

Mario





R: Connection status:Threw exception: 'Driver class not found: com.mysql.jdbc.Driver'

2014-09-16 Thread Bisonti Mario
When I start a test  job to extract a table I obtain:

Error: Bad seed query; doesn't return $(IDCOLUMN) column. Try using quotes 
around $(IDCOLUMN) variable, e.g. $(IDCOLUMN).

My configuration:

Seeding query:
SELECT command_id AS $(IDCOLUMN) FROM icinga_commands

Data query:
SELECT command_id AS $(IDCOLUMN), command_line AS $(URLCOLUMN),object_id AS 
$(DATACOLUMN) FROM icinga_commands where command_id IN $(IDLIST)

What could I check?

Thanks a lot

Mario


Da: Bisonti Mario
Inviato: martedì 16 settembre 2014 09:58
A: 'user@manifoldcf.apache.org'
Oggetto: R: Connection status:Threw exception: 'Driver class not found: 
com.mysql.jdbc.Driver'

Thanks a lot!

Connection now is working!

Mario



Da: Karl Wright [mailto:daddy...@gmail.com]
Inviato: lunedì 15 settembre 2014 16:26
A: user@manifoldcf.apache.orgmailto:user@manifoldcf.apache.org
Oggetto: Re: Connection status:Threw exception: 'Driver class not found: 
com.mysql.jdbc.Driver'

Hi Mario,
You can download the -src and -lib distribution, and then run ant make-deps 
build, and you should be able to use proprietary MySQL database connections.

Thanks,
Karl

On Mon, Sep 15, 2014 at 10:24 AM, Bisonti Mario 
mario.biso...@vimar.commailto:mario.biso...@vimar.com wrote:
I understood.

Infact, I haven’t example-proprietary because I use a binary version of 
ManifoldCF so i can’t use MySQL as repository connection.

Thanks a lot.




Da: Karl Wright [mailto:daddy...@gmail.commailto:daddy...@gmail.com]
Inviato: lunedì 15 settembre 2014 16:21

A: user@manifoldcf.apache.orgmailto:user@manifoldcf.apache.org
Oggetto: Re: Connection status:Threw exception: 'Driver class not found: 
com.mysql.jdbc.Driver'

Hi Mario,
cd /usr/share/manifoldcf/example-proprietary
sudo java –jar start.jar
If you do not have example-proprietary, it is because you did not actually 
build ManifoldCF yourself.  In order to use MySQL as a backend, you must build 
ManifoldCF yourself.

Thanks,
Karl

On Mon, Sep 15, 2014 at 10:14 AM, Bisonti Mario 
mario.biso...@vimar.commailto:mario.biso...@vimar.com wrote:
I am on
/usr/share/manifoldcf/example/
and I execute: sudo java –jar start.jar

Instead mysql-connector-java-5.1.32-bin.jar  is in 
/usr/share/manifoldcf/connector-lib-proprietary/
So, how could I run ManifoldCF ?
Excuse me but I am not a linux expertise…
Thanks a lot.



Da: Karl Wright [mailto:daddy...@gmail.commailto:daddy...@gmail.com]
Inviato: lunedì 15 settembre 2014 16:04
A: user@manifoldcf.apache.orgmailto:user@manifoldcf.apache.org
Oggetto: Re: Connection status:Threw exception: 'Driver class not found: 
com.mysql.jdbc.Driver'

Hi Mario,
You need to run ManifoldCF out of one of the example-proprietary directories in 
order for it to pick up the mysql jar in the classpath.

Thanks,
Karl

On Mon, Sep 15, 2014 at 9:32 AM, Bisonti Mario 
mario.biso...@vimar.commailto:mario.biso...@vimar.com wrote:


Hallo.

I tried to setup a mysql repository connection but I obtain the error mentioned.

I put mysql-connector-java-5.1.32-bin.jar in
/apache-manifoldcf-1.7/connector-lib-proprietary
and
/apache-manifoldcf-1.7/lib-proprietary
folder but I obtain the error in the object

What could I check?

Thanks a lot

Mario





R: Connection status:Threw exception: 'Driver class not found: com.mysql.jdbc.Driver'

2014-09-16 Thread Bisonti Mario
Yes, but I obtained the same error.

SELECT command_id AS “$(IDCOLUMN)” FROM icinga_commands
I tried the query SELECT command_id AS $(IDCOLUMN) FROM icinga_commands by a 
MySql Client and it works.







Da: Karl Wright [mailto:daddy...@gmail.com]
Inviato: martedì 16 settembre 2014 12:17
A: user@manifoldcf.apache.org
Oggetto: Re: Connection status:Threw exception: 'Driver class not found: 
com.mysql.jdbc.Driver'

Hi Mario,
Did you try putting quotes in your query around $(IDCOLUMN) as it suggests?  
For some databases this is necessary to preserve case properly.
Karl
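
For reference, a sketch of the quoted form; whether ANSI double quotes or MySQL
backticks apply depends on the server's SQL mode, so treat both as variants to try,
and note that they must be plain ASCII quote characters rather than typographic ones:

SELECT command_id AS "$(IDCOLUMN)" FROM icinga_commands
SELECT command_id AS `$(IDCOLUMN)` FROM icinga_commands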

On Tue, Sep 16, 2014 at 4:53 AM, Bisonti Mario 
mario.biso...@vimar.commailto:mario.biso...@vimar.com wrote:
When I start a test  job to extract a table I obtain:

Error: Bad seed query; doesn't return $(IDCOLUMN) column. Try using quotes 
around $(IDCOLUMN) variable, e.g. $(IDCOLUMN).

My configuration:

Seeding query:
SELECT command_id AS $(IDCOLUMN) FROM icinga_commands

Data query:
SELECT command_id AS $(IDCOLUMN), command_line AS $(URLCOLUMN),object_id AS 
$(DATACOLUMN) FROM icinga_commands where command_id IN $(IDLIST)

What could I check?

Thanks a lot

Mario


Da: Bisonti Mario
Inviato: martedì 16 settembre 2014 09:58
A: 'user@manifoldcf.apache.orgmailto:user@manifoldcf.apache.org'
Oggetto: R: Connection status:Threw exception: 'Driver class not found: 
com.mysql.jdbc.Driver'

Thanks a lot!

Connection now is working!

Mario



Da: Karl Wright [mailto:daddy...@gmail.com]
Inviato: lunedì 15 settembre 2014 16:26

A: user@manifoldcf.apache.orgmailto:user@manifoldcf.apache.org
Oggetto: Re: Connection status:Threw exception: 'Driver class not found: 
com.mysql.jdbc.Driver'

Hi Mario,
You can download the -src and -lib distribution, and then run ant make-deps 
build, and you should be able to use proprietary MySQL database connections.

Thanks,
Karl

On Mon, Sep 15, 2014 at 10:24 AM, Bisonti Mario 
mario.biso...@vimar.commailto:mario.biso...@vimar.com wrote:
I understood.

Infact, I haven’t example-proprietary because I use a binary version of 
ManifoldCF so i can’t use MySQL as repository connection.

Thanks a lot.




Da: Karl Wright [mailto:daddy...@gmail.commailto:daddy...@gmail.com]
Inviato: lunedì 15 settembre 2014 16:21

A: user@manifoldcf.apache.orgmailto:user@manifoldcf.apache.org
Oggetto: Re: Connection status:Threw exception: 'Driver class not found: 
com.mysql.jdbc.Driver'

Hi Mario,
cd /usr/share/manifoldcf/example-proprietary
sudo java –jar start.jar
If you do not have example-proprietary, it is because you did not actually 
build ManifoldCF yourself.  In order to use MySQL as a backend, you must build 
ManifoldCF yourself.

Thanks,
Karl

On Mon, Sep 15, 2014 at 10:14 AM, Bisonti Mario 
mario.biso...@vimar.commailto:mario.biso...@vimar.com wrote:
I am on
/usr/share/manifoldcf/example/
and I execute: sudo java –jar start.jar

Instead mysql-connector-java-5.1.32-bin.jar  is in 
/usr/share/manifoldcf/connector-lib-proprietary/
So, how could I run ManifoldCF ?
Excuse me but I am not a linux expertise…
Thanks a lot.



Da: Karl Wright [mailto:daddy...@gmail.commailto:daddy...@gmail.com]
Inviato: lunedì 15 settembre 2014 16:04
A: user@manifoldcf.apache.orgmailto:user@manifoldcf.apache.org
Oggetto: Re: Connection status:Threw exception: 'Driver class not found: 
com.mysql.jdbc.Driver'

Hi Mario,
You need to run ManifoldCF out of one of the example-proprietary directories in 
order for it to pick up the mysql jar in the classpath.

Thanks,
Karl

On Mon, Sep 15, 2014 at 9:32 AM, Bisonti Mario 
mario.biso...@vimar.commailto:mario.biso...@vimar.com wrote:


Hallo.

I tried to setup a mysql repository connection but I obtain the error mentioned.

I put mysql-connector-java-5.1.32-bin.jar in
/apache-manifoldcf-1.7/connector-lib-proprietary
and
/apache-manifoldcf-1.7/lib-proprietary
folder but I obtain the error in the object

What could I check?

Thanks a lot

Mario






R: Connection status:Threw exception: 'Driver class not found: com.mysql.jdbc.Driver'

2014-09-16 Thread Bisonti Mario
Yes, it works, and “ “ aren’t necessary.

Note this:
from MySql Client
SELECT command_id AS $(IDCOLUMN) FROM icinga_commands
not work
instead
SELECT command_id AS “$(IDCOLUMN)” FROM icinga_commands
Works.
So it seems that “ “ are necessary, but when I use them inside ManifoldCF it
doesn’t work with “ “


Mario



Da: Karl Wright [mailto:daddy...@gmail.com]
Inviato: martedì 16 settembre 2014 13:50
A: user@manifoldcf.apache.org
Oggetto: Re: Connection status:Threw exception: 'Driver class not found: 
com.mysql.jdbc.Driver'

Hi Mario,
What's happening is that the JDBC connector cannot find the proper column in 
the resultset.
Can you do the following in the mysql client:
SELECT command_id AS lcf__id FROM icinga_commands
Please let me know what the returned columns are.  If there is not a column 
that precisely matches lcf__id then that explains the error.
Karl

On Tue, Sep 16, 2014 at 7:41 AM, Bisonti Mario 
mario.biso...@vimar.commailto:mario.biso...@vimar.com wrote:
Yes, but I obtained the same error.

SELECT command_id AS “$(IDCOLUMN)” FROM icinga_commands
I tried the query SELECT command_id AS $(IDCOLUMN) FROM icinga_commands by a 
MySql Client and it works.







Da: Karl Wright [mailto:daddy...@gmail.commailto:daddy...@gmail.com]
Inviato: martedì 16 settembre 2014 12:17

A: user@manifoldcf.apache.orgmailto:user@manifoldcf.apache.org
Oggetto: Re: Connection status:Threw exception: 'Driver class not found: 
com.mysql.jdbc.Driver'

Hi Mario,
Did you try putting quotes in your query around $(IDCOLUMN) as it suggests?  
For some databases this is necessary to preserve case properly.
Karl

On Tue, Sep 16, 2014 at 4:53 AM, Bisonti Mario 
mario.biso...@vimar.commailto:mario.biso...@vimar.com wrote:
When I start a test  job to extract a table I obtain:

Error: Bad seed query; doesn't return $(IDCOLUMN) column. Try using quotes 
around $(IDCOLUMN) variable, e.g. $(IDCOLUMN).

My configuration:

Seeding query:
SELECT command_id AS $(IDCOLUMN) FROM icinga_commands

Data query:
SELECT command_id AS $(IDCOLUMN), command_line AS $(URLCOLUMN),object_id AS 
$(DATACOLUMN) FROM icinga_commands where command_id IN $(IDLIST)

What could I check?

Thanks a lot

Mario


Da: Bisonti Mario
Inviato: martedì 16 settembre 2014 09:58
A: 'user@manifoldcf.apache.orgmailto:user@manifoldcf.apache.org'
Oggetto: R: Connection status:Threw exception: 'Driver class not found: 
com.mysql.jdbc.Driver'

Thanks a lot!

Connection now is working!

Mario



Da: Karl Wright [mailto:daddy...@gmail.com]
Inviato: lunedì 15 settembre 2014 16:26

A: user@manifoldcf.apache.orgmailto:user@manifoldcf.apache.org
Oggetto: Re: Connection status:Threw exception: 'Driver class not found: 
com.mysql.jdbc.Driver'

Hi Mario,
You can download the -src and -lib distribution, and then run ant make-deps 
build, and you should be able to use proprietary MySQL database connections.

Thanks,
Karl

On Mon, Sep 15, 2014 at 10:24 AM, Bisonti Mario 
mario.biso...@vimar.commailto:mario.biso...@vimar.com wrote:
I understood.

Infact, I haven’t example-proprietary because I use a binary version of 
ManifoldCF so i can’t use MySQL as repository connection.

Thanks a lot.




Da: Karl Wright [mailto:daddy...@gmail.commailto:daddy...@gmail.com]
Inviato: lunedì 15 settembre 2014 16:21

A: user@manifoldcf.apache.orgmailto:user@manifoldcf.apache.org
Oggetto: Re: Connection status:Threw exception: 'Driver class not found: 
com.mysql.jdbc.Driver'

Hi Mario,
cd /usr/share/manifoldcf/example-proprietary
sudo java –jar start.jar
If you do not have example-proprietary, it is because you did not actually 
build ManifoldCF yourself.  In order to use MySQL as a backend, you must build 
ManifoldCF yourself.

Thanks,
Karl

On Mon, Sep 15, 2014 at 10:14 AM, Bisonti Mario 
mario.biso...@vimar.commailto:mario.biso...@vimar.com wrote:
I am on
/usr/share/manifoldcf/example/
and I execute: sudo java –jar start.jar

Instead mysql-connector-java-5.1.32-bin.jar  is in 
/usr/share/manifoldcf/connector-lib-proprietary/
So, how could I run ManifoldCF ?
Excuse me but I am not a linux expertise…
Thanks a lot.



Da: Karl Wright [mailto:daddy...@gmail.commailto:daddy...@gmail.com]
Inviato: lunedì 15 settembre 2014 16:04
A: user@manifoldcf.apache.orgmailto:user@manifoldcf.apache.org
Oggetto: Re: Connection status:Threw exception: 'Driver class not found: 
com.mysql.jdbc.Driver'

Hi Mario,
You need to run ManifoldCF out of one of the example-proprietary directories in 
order for it to pick up the mysql jar in the classpath.

Thanks,
Karl

On Mon, Sep 15, 2014 at 9:32 AM, Bisonti Mario 
mario.biso...@vimar.commailto:mario.biso...@vimar.com wrote:


Hallo.

I tried to setup a mysql repository connection but I obtain the error mentioned.

I put mysql-connector-java-5.1.32-bin.jar in
/apache-manifoldcf-1.7/connector-lib-proprietary
and
/apache-manifoldcf-1.7/lib-proprietary
folder but I obtain the error in the object

What could I check?

Thanks a lot

Mario







R: Connection status:Threw exception: 'Driver class not found: com.mysql.jdbc.Driver'

2014-09-16 Thread Bisonti Mario
MySql Version : '5.5.38'

ManifoldCF : 1.7

The problem still remains with single quote ‘  ‘

How could I attach a comment to the ticket, please?


Mario




Da: Karl Wright [mailto:daddy...@gmail.com]
Inviato: martedì 16 settembre 2014 14:05
A: user@manifoldcf.apache.org
Oggetto: Re: Connection status:Threw exception: 'Driver class not found: 
com.mysql.jdbc.Driver'

Hi Mario,
I've create CONNECTORS-1032 to track your issue.  MySQL queries have worked 
fine in the past against MySQL 5.5.  So I suggest that you try single-quotes, 
and if that does not work either, we're going to have to have more information 
and some debugging time.
First -- what version of MySQL is this?
Second, what version of MCF are you working with?
I will propose a debugging output patch that will let us see what the column 
names the JDBC query is returning if I have that information.  Please attach it 
as a comment to the ticket.

Thanks,
Karl

On Tue, Sep 16, 2014 at 7:59 AM, Bisonti Mario 
mario.biso...@vimar.commailto:mario.biso...@vimar.com wrote:
Yes, it works, and “ “ aren’t necessary.

Note this:
from MySql Client
SELECT command_id AS $(IDCOLUMN) FROM icinga_commands
not work
instead
SELECT command_id AS “$(IDCOLUMN)” FROM icinga_commands
Works.
So it seems that “ “ are necessary, but when I use insiede ManifoldCF it 
doesn’t work with “ “


Mario



Da: Karl Wright [mailto:daddy...@gmail.com]
Sent: Tuesday, 16 September 2014 13:50
To: user@manifoldcf.apache.org
Subject: Re: Connection status:Threw exception: 'Driver class not found: 
com.mysql.jdbc.Driver'

Hi Mario,
What's happening is that the JDBC connector cannot find the proper column in 
the resultset.
Can you do the following in the mysql client:
SELECT command_id AS lcf__id FROM icinga_commands
Please let me know what the returned columns are.  If there is not a column 
that precisely matches lcf__id then that explains the error.
Karl
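
The same check can also be run over plain JDBC rather than through the mysql client, which is closer to what the connector sees. A hedged sketch; the connection URL, user, and password are placeholders, not values from this thread:

  import java.sql.Connection;
  import java.sql.DriverManager;
  import java.sql.ResultSet;
  import java.sql.ResultSetMetaData;
  import java.sql.Statement;

  public class ColumnLabelCheck {
    public static void main(String[] args) throws Exception {
      Class.forName("com.mysql.jdbc.Driver");
      try (Connection c = DriverManager.getConnection(
               "jdbc:mysql://localhost/icinga", "user", "password");
           Statement s = c.createStatement();
           ResultSet rs = s.executeQuery(
               "SELECT command_id AS lcf__id FROM icinga_commands")) {
        ResultSetMetaData md = rs.getMetaData();
        for (int i = 1; i <= md.getColumnCount(); i++) {
          // The connector needs a column whose label is exactly lcf__id (same case).
          System.out.println(md.getColumnLabel(i) + " / " + md.getColumnName(i));
        }
      }
    }
  }

If the alias comes back in a different case, or only getColumnName() reflects it, that would line up with Karl's explanation; older Connector/J releases are known to differ in how alias metadata is reported, so the driver version is worth noting too.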

On Tue, Sep 16, 2014 at 7:41 AM, Bisonti Mario 
mario.biso...@vimar.com wrote:
Yes, but I obtained the same error.

SELECT command_id AS “$(IDCOLUMN)” FROM icinga_commands
I tried the query SELECT command_id AS $(IDCOLUMN) FROM icinga_commands from a 
MySQL client and it works.







Da: Karl Wright [mailto:daddy...@gmail.com]
Sent: Tuesday, 16 September 2014 12:17
To: user@manifoldcf.apache.org
Subject: Re: Connection status:Threw exception: 'Driver class not found: 
com.mysql.jdbc.Driver'

Hi Mario,
Did you try putting quotes in your query around $(IDCOLUMN) as it suggests?  
For some databases this is necessary to preserve case properly.
Karl

On Tue, Sep 16, 2014 at 4:53 AM, Bisonti Mario 
mario.biso...@vimar.com wrote:
When I start a test job to extract a table I obtain:

Error: Bad seed query; doesn't return $(IDCOLUMN) column. Try using quotes 
around $(IDCOLUMN) variable, e.g. "$(IDCOLUMN)".

My configuration:

Seeding query:
SELECT command_id AS $(IDCOLUMN) FROM icinga_commands

Data query:
SELECT command_id AS $(IDCOLUMN), command_line AS $(URLCOLUMN),object_id AS 
$(DATACOLUMN) FROM icinga_commands where command_id IN $(IDLIST)

What could I check?

Thanks a lot

Mario
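
A hedged way to validate both queries outside ManifoldCF is to substitute a small hand-written id list for $(IDLIST) and run them over plain JDBC. The sketch below assumes $(URLCOLUMN) and $(DATACOLUMN) expand to lcf__url and lcf__data in the same way $(IDCOLUMN) expands to lcf__id (only the lcf__id expansion is confirmed in this thread); the connection details and sample ids are placeholders.

  import java.sql.Connection;
  import java.sql.DriverManager;
  import java.sql.ResultSet;
  import java.sql.Statement;

  public class DataQueryCheck {
    public static void main(String[] args) throws Exception {
      Class.forName("com.mysql.jdbc.Driver");
      // $(IDLIST) replaced by hand with a parenthesized list of sample ids.
      String dataQuery =
          "SELECT command_id AS lcf__id, command_line AS lcf__url, object_id AS lcf__data "
        + "FROM icinga_commands WHERE command_id IN (1, 2, 3)";
      try (Connection c = DriverManager.getConnection(
               "jdbc:mysql://localhost/icinga", "user", "password");
           Statement s = c.createStatement();
           ResultSet rs = s.executeQuery(dataQuery)) {
        while (rs.next()) {
          System.out.println(rs.getString("lcf__id") + " -> " + rs.getString("lcf__url"));
        }
      }
    }
  }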


Da: Bisonti Mario
Sent: Tuesday, 16 September 2014 09:58
To: 'user@manifoldcf.apache.org'
Subject: R: Connection status:Threw exception: 'Driver class not found: 
com.mysql.jdbc.Driver'

Thanks a lot!

The connection is working now!

Mario



Da: Karl Wright [mailto:daddy...@gmail.com]
Sent: Monday, 15 September 2014 16:26
To: user@manifoldcf.apache.org
Subject: Re: Connection status:Threw exception: 'Driver class not found: 
com.mysql.jdbc.Driver'

Hi Mario,
You can download the -src and -lib distribution, and then run ant make-deps 
build, and you should be able to use proprietary MySQL database connections.

Thanks,
Karl

On Mon, Sep 15, 2014 at 10:24 AM, Bisonti Mario 
mario.biso...@vimar.com wrote:
I understood.

In fact, I don’t have example-proprietary because I use a binary version of 
ManifoldCF, so I can’t use MySQL as the repository connection.

Thanks a lot.




Da: Karl Wright [mailto:daddy...@gmail.com]
Sent: Monday, 15 September 2014 16:21
To: user@manifoldcf.apache.org
Subject: Re: Connection status:Threw exception: 'Driver class not found: 
com.mysql.jdbc.Driver'

Hi Mario,
cd /usr/share/manifoldcf/example-proprietary
sudo java -jar start.jar
If you do not have example-proprietary, it is because you did not actually 
build ManifoldCF yourself.  In order to use MySQL as a backend, you must build 
ManifoldCF yourself.

Thanks,
Karl

On Mon, Sep 15, 2014 at 10:14 AM, Bisonti Mario 
mario.biso...@vimar.com wrote:
I am on
/usr/share/manifoldcf/example/
and I execute: sudo java -jar start.jar

Connection status:Threw exception: 'Driver class not found: com.mysql.jdbc.Driver'

2014-09-15 Thread Bisonti Mario


Hallo.

I tried to set up a MySQL repository connection, but I obtain the error mentioned in the subject.

I put mysql-connector-java-5.1.32-bin.jar in
/apache-manifoldcf-1.7/connector-lib-proprietary
and
/apache-manifoldcf-1.7/lib-proprietary
folders, but I still obtain the error mentioned in the subject

What could I check?

Thanks a lot

Mario


R: Connection status:Threw exception: 'Driver class not found: com.mysql.jdbc.Driver'

2014-09-15 Thread Bisonti Mario
I understood.

In fact, I don’t have example-proprietary because I use a binary version of 
ManifoldCF, so I can’t use MySQL as the repository connection.

Thanks a lot.




Da: Karl Wright [mailto:daddy...@gmail.com]
Sent: Monday, 15 September 2014 16:21
To: user@manifoldcf.apache.org
Subject: Re: Connection status:Threw exception: 'Driver class not found: 
com.mysql.jdbc.Driver'

Hi Mario,
cd /usr/share/manifoldcf/example-proprietary
sudo java -jar start.jar
If you do not have example-proprietary, it is because you did not actually 
build ManifoldCF yourself.  In order to use MySQL as a backend, you must build 
ManifoldCF yourself.

Thanks,
Karl

On Mon, Sep 15, 2014 at 10:14 AM, Bisonti Mario 
mario.biso...@vimar.com wrote:
I am on
/usr/share/manifoldcf/example/
and I execute: sudo java -jar start.jar

Instead, mysql-connector-java-5.1.32-bin.jar is in 
/usr/share/manifoldcf/connector-lib-proprietary/
So, how can I run ManifoldCF?
Excuse me, but I am not a Linux expert…
Thanks a lot.



Da: Karl Wright [mailto:daddy...@gmail.com]
Sent: Monday, 15 September 2014 16:04
To: user@manifoldcf.apache.org
Subject: Re: Connection status:Threw exception: 'Driver class not found: 
com.mysql.jdbc.Driver'

Hi Mario,
You need to run ManifoldCF out of one of the example-proprietary directories in 
order for it to pick up the mysql jar in the classpath.

Thanks,
Karl

On Mon, Sep 15, 2014 at 9:32 AM, Bisonti Mario 
mario.biso...@vimar.com wrote:


Hallo.

I tried to set up a MySQL repository connection, but I obtain the error mentioned in the subject.

I put mysql-connector-java-5.1.32-bin.jar in
/apache-manifoldcf-1.7/connector-lib-proprietary
and
/apache-manifoldcf-1.7/lib-proprietary
folders, but I still obtain the error mentioned in the subject

What could I check?

Thanks a lot

Mario




R: Populate field Solr

2014-08-29 Thread Bisonti Mario
Ok, thanks.

Tab Name
Name:ScanPdftatankamNEW

Tab Connection
Stage  Type        Precedent  Description  Connection name
1.     Repository                          ConnessioneWeb
2.     Output      1.                      Solr

Tab Forced metadata
Parameter name:category
Parameter value: manuale

Tab Seeds
http://tatankam.herobo.com/prova/sotto/

Tab Inclusions
Include in crawl:
.*sotto*
Include in index:
.*sotto*


Tab Security, Metadata, Solr Field Mapping
Empty


I omit the Schedule tab because I start the job manually.
I am using ManifoldCF 1.7

Thanks a lot for your support

Mario





Da: Karl Wright [mailto:daddy...@gmail.com]
Sent: Thursday, 28 August 2014 17:54
To: user@manifoldcf.apache.org
Subject: Re: Populate field Solr

Hi Mario,
No metadata whatsoever is getting through to Solr.
Can you cut/paste the data on the view page of your job please?  View your job, 
and then select the output so I can see how everything is configured.

Karl

On Thu, Aug 28, 2014 at 11:30 AM, Bisonti Mario 
mario.biso...@vimar.com wrote:
INFO  - 2014-08-28 17:26:47.372; 
org.apache.solr.update.processor.LogUpdateProcessor; [collection1] webapp=/solr 
path=/update/extract 
params={commitWithin=1&literal.id=http://tatankam.herobo.com/prova/sotto/&resource.name=index.html&wt=xml&version=2.2}
 {add=[http://tatankam.herobo.com/prova/sotto/ (1477694830537605120)]} 0 5
INFO  - 2014-08-28 17:26:48.976; 
org.apache.solr.update.processor.LogUpdateProcessor; [collection1] webapp=/solr 
path=/update/extract 
params={commitWithin=1&literal.id=http://tatankam.herobo.com/prova/sotto/Using%2520the%2520various%2520optional%2520Film%2520Adapters.pdf&resource.name=Using%2520the%2520various%2520optional%2520Film%2520Adapters.pdf&wt=xml&version=2.2}
 {add=[http://tatankam.herobo.com/prova/sotto/Using%20the%20various%20optional%20Film%20Adapters.pdf (1477694832220569600)]} 0 4
INFO  - 2014-08-28 17:26:51.409; 
org.apache.solr.update.processor.LogUpdateProcessor; [collection1] webapp=/solr 
path=/update/extract 
params={commitWithin=1&literal.id=http://tatankam.herobo.com/prova/sotto/DopoFullCrawl.pdf&resource.name=DopoFullCrawl.pdf&wt=xml&version=2.2}
 {add=[http://tatankam.herobo.com/prova/sotto/DopoFullCrawl.pdf (1477694834770706432)]} 0 67
INFO  - 2014-08-28 17:26:51.747; 
org.apache.solr.update.processor.LogUpdateProcessor; [collection1] webapp=/solr 
path=/update/extract 
params={commitWithin=1&literal.id=http://tatankam.herobo.com/prova/sotto/SAP%2520SSO%2520Authentication%2520with%2520verify.pdf&resource.name=SAP%2520SSO%2520Authentication%2520with%2520verify.pdf&wt=xml&version=2.2}
 {add=[http://tatankam.herobo.com/prova/sotto/SAP%20SSO%20Authentication%20with%20verify.pdf (1477694835126173696)]} 0 58
INFO  - 2014-08-28 17:26:57.372; org.apache.solr.update.DirectUpdateHandler2; 
start 
commit{,optimize=false,openSearcher=false,waitSearcher=true,expungeDeletes=false,softCommit=true,prepareCommit=false}
INFO  - 2014-08-28 17:26:57.377; org.apache.solr.search.SolrIndexSearcher; 
Opening Searcher@45d1f61c[collection1] main
INFO  - 2014-08-28 17:26:57.377; org.apache.solr.core.QuerySenderListener; 
QuerySenderListener sending requests to Searcher@45d1f61c[collection1] 
main{StandardDirectoryReader(segments_alc:42455:nrt _ex1(4.9):C4)}
INFO  - 2014-08-28 17:26:57.378; org.apache.solr.core.QuerySenderListener; 
QuerySenderListener done.
INFO  - 2014-08-28 17:26:57.378; org.apache.solr.core.SolrCore; [collection1] 
Registered new searcher Searcher@45d1f61c[collection1] 
main{StandardDirectoryReader(segments_alc:42455:nrt _ex1(4.9):C4)}
INFO  - 2014-08-28 17:26:57.378; org.apache.solr.update.DirectUpdateHandler2; 
end_commit_flush
INFO  - 2014-08-28 17:27:01.329; org.apache.solr.update.DirectUpdateHandler2; 
start 
commit{,optimize=false,openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=false,prepareCommit=false}
INFO  - 2014-08-28 17:27:01.344; org.apache.solr.core.SolrDeletionPolicy; 
SolrDeletionPolicy.onCommit: commits: num=2

commit{dir=NRTCachingDirectory(MMapDirectory@/usr/share/solr/example/solr/collection1/data/index
 
lockFactory=NativeFSLockFactory@/usr/share/solr/example/solr/collection1/data/index;
 maxCacheMB=48.0 maxMergeSizeMB=4.0),segFN=segments_alc,generation=13728}

commit{dir=NRTCachingDirectory(MMapDirectory@/usr/share/solr/example/solr/collection1/data/index
 
lockFactory=NativeFSLockFactory@/usr/share/solr/example/solr/collection1/data/index;
 maxCacheMB=48.0 maxMergeSizeMB=4.0),segFN=segments_ald,generation=13729}
INFO  - 2014-08-28 17:27:01.344; org.apache.solr.core.SolrDeletionPolicy; 
newest commit generation = 13729
INFO  - 2014-08-28 17:27:01.345; org.apache.solr.core.SolrCore; 
SolrIndexSearcher has not changed - not re-opening: 
org.apache.solr.search.SolrIndexSearcher
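
None of the /update/extract requests in the log above carries any literal.<field> parameter other than literal.id, which is how one can tell the forced "category" value never reached Solr. For comparison, a hedged SolrJ sketch of a request that does carry it; this is not the ManifoldCF connector code, it assumes a recent SolrJ client, and the host, core, and file names are placeholders.

  import java.io.File;
  import org.apache.solr.client.solrj.SolrClient;
  import org.apache.solr.client.solrj.impl.HttpSolrClient;
  import org.apache.solr.client.solrj.request.AbstractUpdateRequest;
  import org.apache.solr.client.solrj.request.ContentStreamUpdateRequest;

  public class ForcedMetadataPost {
    public static void main(String[] args) throws Exception {
      try (SolrClient solr =
               new HttpSolrClient.Builder("http://localhost:8983/solr/collection1").build()) {
        ContentStreamUpdateRequest req = new ContentStreamUpdateRequest("/update/extract");
        req.addFile(new File("DopoFullCrawl.pdf"), "application/pdf");
        req.setParam("literal.id", "http://tatankam.herobo.com/prova/sotto/DopoFullCrawl.pdf");
        req.setParam("literal.category", "manuale");  // the value the Forced metadata tab should add
        req.setAction(AbstractUpdateRequest.ACTION.COMMIT, true, true);
        solr.request(req);
      }
    }
  }

When metadata does get through, the corresponding request line in the Solr log shows an extra literal.category=manuale parameter next to literal.id.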

R: Populate field Solr

2014-08-29 Thread Bisonti Mario

Hi Karl.
In the “Solr Field Mapping” tab there are no field mappings, and “Keep all 
metadata” is unchecked.

I see in the output connection “Solr” that in the Schema tab:
Use the Extract Update Handler: Checked

Is this right?

Did you use Tika on a PDF file for your testing?




Mario







Da: Karl Wright [mailto:daddy...@gmail.com]
Sent: Friday, 29 August 2014 13:46
To: Karl Wright; Bisonti Mario; user@manifoldcf.apache.org
Subject: RE: Populate field Solr

Hi Mario,

I tried this here on 1.7 and it worked as expected.

Please look at your solr field mapping tab.  There is a checkbox there which 
suppresses all unmapped fields.  How is this set for you?

Karl

Sent from my Windows Phone

From: Karl Wright
Sent: 8/29/2014 6:37 AM
To: Bisonti Mario; user@manifoldcf.apache.org
Subject: RE: Populate field Solr
Hi Mario,

The reason I wanted the view job output is because there are multiple ways you 
can do forced metadata with a we connection.  There's a Forced Metadata tab, a 
Metadata tab, and you can add a Metadata Transformer to the pipeline as well.

I will have a look at why Forced Metadata is no longer working, but I suggest 
that you try the other two possibilities while I do that.

Thanks,

Karl

Sent from my Windows Phone

From: Bisonti Mario
Sent: 8/29/2014 2:50 AM
To: user@manifoldcf.apache.org
Subject: R: Populate field Solr
Ok, thanks.

Tab Name
Name:ScanPdftatankamNEW

Tab Connection
Stage  Type        Precedent  Description  Connection name
1.     Repository                          ConnessioneWeb
2.     Output      1.                      Solr

Tab Forced metadata
Parameter name:category
Parameter value: manuale

Tab Seeds
http://tatankam.herobo.com/prova/sotto/

Tab Inclusions
Include in crawl:
.*sotto*
Include in index:
.*sotto*


Tab Security, Metadata, Solr Field Mapping
Empty


I omit the Schedule tab because I start the job manually.
I am using ManifoldCF 1.7

Thanks a lot for your support

Mario





Da: Karl Wright [mailto:daddy...@gmail.com]
Sent: Thursday, 28 August 2014 17:54
To: user@manifoldcf.apache.org
Subject: Re: Populate field Solr

Hi Mario,
No metadata whatsoever is getting through to Solr.
Can you cut/paste the data on the view page of your job please?  View your job, 
and then select the output so I can see how everything is configured.

Karl

On Thu, Aug 28, 2014 at 11:30 AM, Bisonti Mario 
mario.biso...@vimar.com wrote:
INFO  - 2014-08-28 17:26:47.372; 
org.apache.solr.update.processor.LogUpdateProcessor; [collection1] webapp=/solr 
path=/update/extract 
params={commitWithin=1&literal.id=http://tatankam.herobo.com/prova/sotto/&resource.name=index.html&wt=xml&version=2.2}
 {add=[http://tatankam.herobo.com/prova/sotto/ (1477694830537605120)]} 0 5
INFO  - 2014-08-28 17:26:48.976; 
org.apache.solr.update.processor.LogUpdateProcessor; [collection1] webapp=/solr 
path=/update/extract 
params={commitWithin=1&literal.id=http://tatankam.herobo.com/prova/sotto/Using%2520the%2520various%2520optional%2520Film%2520Adapters.pdf&resource.name=Using%2520the%2520various%2520optional%2520Film%2520Adapters.pdf&wt=xml&version=2.2}
 {add=[http://tatankam.herobo.com/prova/sotto/Using%20the%20various%20optional%20Film%20Adapters.pdf (1477694832220569600)]} 0 4
INFO  - 2014-08-28 17:26:51.409; 
org.apache.solr.update.processor.LogUpdateProcessor; [collection1] webapp=/solr 
path=/update/extract 
params={commitWithin=1&literal.id=http://tatankam.herobo.com/prova/sotto/DopoFullCrawl.pdf&resource.name=DopoFullCrawl.pdf&wt=xml&version=2.2}
 {add=[http://tatankam.herobo.com/prova/sotto/DopoFullCrawl.pdf (1477694834770706432)]} 0 67
INFO  - 2014-08-28 17:26:51.747; 
org.apache.solr.update.processor.LogUpdateProcessor; [collection1] webapp=/solr 
path=/update/extract 
params={commitWithin=1&literal.id=http://tatankam.herobo.com/prova/sotto/SAP%2520SSO%2520Authentication%2520with%2520verify.pdf&resource.name=SAP%2520SSO%2520Authentication%2520with%2520verify.pdf&wt=xml&version=2.2}
 {add=[http://tatankam.herobo.com/prova/sotto/SAP%20SSO%20Authentication%20with%20verify.pdf (1477694835126173696)]} 0 58
INFO  - 2014-08-28 17:26:57.372; org.apache.solr.update.DirectUpdateHandler2; 
start 
commit{,optimize=false,openSearcher=false,waitSearcher=true,expungeDeletes=false,softCommit=true,prepareCommit=false}
INFO  - 2014-08-28 17:26:57.377; org.apache.solr.search.SolrIndexSearcher; 
Opening Searcher@45d1f61c[collection1] main
INFO  - 2014-08-28 17:26:57.377; org.apache.solr.core.QuerySenderListener; 
QuerySenderListener sending requests to Searcher@45d1f61c[collection1] 
main{StandardDirectoryReader(segments_alc:42455:nrt _ex1(4.9
