Sharepoint 2013 indexation time performance

2018-06-13 Thread Olivier Tavard
Hi,

I have a question regarding the performance of the SharePoint repository 
connector.
Recently we ran some tests using MCF 2.8.1 to crawl documents on a 
SharePoint 2013 server. There were only a few documents: 700, all located in 
the same document list.
For a full crawl, the indexing speed was about 60 docs/min (with security 
enabled: SharePoint native authority connector).
For an incremental crawl with one file modified and one file added, the 
indexing speed was 80 docs/min.

I know that performance depends on many factors, but if someone 
has already indexed large SharePoint servers and could post the performance 
figures obtained, it would be great to compare.
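
As a rough way to compare figures like these, a rate in docs/min can be turned into a crawl duration. The following is a minimal sketch and not part of MCF: the function name is made up for illustration, and it assumes the rate stays constant over the whole crawl and that the incremental rate counts every document checked, not only the changed ones.

```python
def crawl_minutes(doc_count: int, docs_per_minute: float) -> float:
    """Estimate crawl duration in minutes at a constant indexing rate."""
    if docs_per_minute <= 0:
        raise ValueError("docs_per_minute must be positive")
    return doc_count / docs_per_minute


# Figures reported in this message: 700 documents,
# 60 docs/min for the full crawl and 80 docs/min for the incremental one.
full_crawl = crawl_minutes(700, 60)
incremental_crawl = crawl_minutes(700, 80)

print(f"full crawl:        {full_crawl:.2f} min")
print(f"incremental crawl: {incremental_crawl:.2f} min")
```

The same function can be used to extrapolate to a larger document count, with the caveat that throughput on a large server is unlikely to scale linearly from a 700-document test.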

Tests were run on two VMs on an ESXi server (CPU: Xeon D-1520, RAM: 64 GB):
- VM for SharePoint 2013:
all SharePoint services installed on the same VM
4 vCPU, 16 GB RAM

- VM for MCF:
4 vCPU, 24 GB RAM
MCF 2.8.1, multiprocess with ZooKeeper

Thanks,

Best regards,

Olivier TAVARD



Re: Job in aborting status

2018-06-13 Thread Karl Wright
I'm not in a position to teach you how to use the Java tools, but:
(1) You want to use the JDK, and
(2) The utility you want to run to get a thread dump is jstack (distributed
with the JDK).  If you can't attach to the process, there's a switch to
force attachment: -F
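
For illustration, the jstack invocation described above could be wrapped as follows. This is only a sketch: the helper names are hypothetical, the process ID has to be looked up separately (for example with jps or ps), and it assumes jstack is on the PATH and is run as the same OS user that owns the JVM. The -F switch is the one mentioned above; newer JDK releases may not accept it.

```python
import shutil
import subprocess
from typing import List


def build_jstack_command(pid: int, force: bool = False) -> List[str]:
    """Build the jstack command line for the given JVM process ID."""
    command = ["jstack"]
    if force:
        # Forced attachment, for when the normal attach fails.
        command.append("-F")
    command.append(str(pid))
    return command


def thread_dump(pid: int, force: bool = False) -> str:
    """Run jstack against a running JVM and return the thread dump text."""
    if shutil.which("jstack") is None:
        # jstack ships with the JDK, not with a plain JRE.
        raise FileNotFoundError("jstack not found on PATH; a JDK is required")
    result = subprocess.run(
        build_jstack_command(pid, force),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

Calling thread_dump(pid) with the PID of the start.jar process would return the dump as a string, which can then be saved to a file and attached to a reply.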

Karl


On Wed, Jun 13, 2018 at 5:26 AM Bisonti Mario 
wrote:

> Hi Karl.
>
> I am not able to take a thread dump of the start.jar process because I get:
> Error attaching to core file: cannot open binary file
> sun.jvm.hotspot.debugger.DebuggerException: cannot open binary file
> .
> .
>
> Furthermore, I set the following in logging.xml:
>
> [XML markup stripped by the archive; only the pattern layout survives]
> %5p %d{ISO8601} (%t) - %m%n
>
> Perhaps this is not the right way to enable debug logging?
>
> Thanks a lot,
>
> Mario
>
> *From:* Karl Wright
> *Sent:* Tuesday, 12 June 2018 17:40
> *To:* user@manifoldcf.apache.org
> *Subject:* Re: Job in aborting status
>
>
>
> Then I cannot explain the behavior you are seeing.  Also, debug output is
> quite verbose so clearly you are not setting that up right either.
>
>
>
> If you want me to give a further analysis, please provide a thread dump of
> the manifoldcf process.
>
>
>
> Karl
>
>
>
>
>
> On Tue, Jun 12, 2018 at 10:38 AM Bisonti Mario 
> wrote:
>
> Job “A” uses the “Windows Share” repository connector and the “Solr”
> output connector.
>
> Job “B” uses the “Generic Web” repository connector and the “Solr”
> output connector.
>
> Neither is my own connector.
>
> I set DEBUG but I get no log output.
>
>
>
>
>
> *From:* Karl Wright
> *Sent:* Tuesday, 12 June 2018 16:22
> *To:* user@manifoldcf.apache.org
> *Subject:* Re: Job in aborting status
>
>
>
> Hi Mario,
>
>
>
> What repository connector are you using for Job "B"?  Is it your own
> connector?  If so, you likely have bugs in it that are causing problems
> with the entire framework.  Please verify that this is the case; ManifoldCF
> In Action is freely available online and you should read it before writing
> connectors.
>
>
> The problems are not likely due to HSQLDB internal locks.
>
> Major errors should be logged already in manifoldcf.log by default.  If
> you want to set up connector debug logging, you need to set a
> properties.xml property, not a logging.xml property:
>
> 
>
>
>
> See: https://www.mail-archive.com/user@manifoldcf.apache.org/msg01034.html
> 
>
>
>
>
>
>
>
> On Tue, Jun 12, 2018 at 10:03 AM Bisonti Mario 
> wrote:
>
> I set up two jobs:
>
> Job “A” to crawl “Windows Shares”
>
> Job “B” to crawl my internal site
>
> The problem started when I tried to abort the second job, “B”:
> it hung in the “Aborting” state.
>
> Afterwards, I tried to start job “A”, but it hung in the “Starting” state and
> did not start. I then tried to abort it too, and it hung in the “Aborting”
> state just like job “B”.
>
> I increased the log level in logging.xml to “info”, but when I start
> ManifoldCF standalone I do not get much information in logs/manifoldcf.log.
>
> I see only:
>
> INFO 2018-06-12T15:58:02,748 (main) - dataFileCache open start
>
> INFO 2018-06-12T15:58:02,753 (main) - dataFileCache open end
>
> And nothing more.
>
> So I think there could be a lock situation in the internal HSQLDB
> that I am not able to resolve.
>
>
>
>
>
>
>
>
>
> *From:* Karl Wright
> *Sent:* Tuesday, 12 June 2018 15:46
> *To:* user@manifoldcf.apache.org
> *Subject:* Re: Job in aborting status
>
>
>
> Hi Mario,
>
>
>
> If you are using the single-process model, then stuck locks are not the
> problem and the lock-clean script is inappropriate to use.  Locks are all
> internal in that model.  That is why lock-clean is only distributed as part
> of the file-based multiprocess example.
>
>
>
> Please tell me more about what you have set up for your jobs on this
> example.  How many are there, and how many documents are involved?  The
> embedded HSQLDB database has limits because it caches all tables in memory,
> so the single-process example is not going to be able to handle huge jobs.
>
> Please have a look at the log to be sure there are no serious errors in it.
>
>
>
> Thanks,
>
> Karl
>
>
>
>
>
>
>
>
>
> On Tue, Jun 12, 2018 at 9:26 AM Bisonti Mario 
> wrote:
>
> No, I am testing in the /example directory, so I am using the local HSQLDB.
>
> I copied the lock-clean.sh script from
> /usr/share/manifoldcf/multiprocess-file-example to
> /usr/share/manifoldcf/example to try to clean up my situation, but perhaps
> the script is not suitable for me because I am using Jetty in the example
> directory?
>
>
>
> Thanks
>
>
>
>
>
>
>
>
>
> *From:* Karl Wright
> *Sent:* Tuesday, 12 June 2018 15:23
> *To:* user@manifoldcf.apache.org
> 

Re: Job in aborting status

2018-06-13 Thread Bisonti Mario
Hi Karl.

I am not able to take a thread dump of the start.jar process because I get:
Error attaching to core file: cannot open binary file
sun.jvm.hotspot.debugger.DebuggerException: cannot open binary file
.
.


Furthermore, I set the following in logging.xml:

[XML markup stripped by the archive; only the pattern layout survives]
%5p %d{ISO8601} (%t) - %m%n

Perhaps this is not the right way to enable debug logging?
Thanks a lot,
Mario



From: Karl Wright
Sent: Tuesday, 12 June 2018 17:40
To: user@manifoldcf.apache.org
Subject: Re: Job in aborting status

Then I cannot explain the behavior you are seeing.  Also, debug output is quite 
verbose so clearly you are not setting that up right either.

If you want me to give a further analysis, please provide a thread dump of the 
manifoldcf process.

Karl


On Tue, Jun 12, 2018 at 10:38 AM Bisonti Mario 
<mario.biso...@vimar.com> wrote:
Job “A” uses the “Windows Share” repository connector and the “Solr” 
output connector.
Job “B” uses the “Generic Web” repository connector and the “Solr” 
output connector.

Neither is my own connector.

I set DEBUG but I get no log output.


From: Karl Wright <daddy...@gmail.com>
Sent: Tuesday, 12 June 2018 16:22
To: user@manifoldcf.apache.org
Subject: Re: Job in aborting status

Hi Mario,

What repository connector are you using for Job "B"?  Is it your own connector? 
 If so, you likely have bugs in it that are causing problems with the entire 
framework.  Please verify that this is the case; ManifoldCF In Action is freely 
available online and you should read it before writing connectors.

The problems are not likely due to HSQLDB internal locks.

Major errors should be logged already in manifoldcf.log by default.  If you 
want to set up connector debug logging, you need to set a properties.xml 
property, not a logging.xml property:



See: 
https://www.mail-archive.com/user@manifoldcf.apache.org/msg01034.html
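
The property itself was lost from the message above when the archive stripped the XML. As a hedged illustration only, the sketch below checks a properties.xml document for a connector log-level property. The property name org.apache.manifoldcf.connectors and the value DEBUG are assumptions based on the ManifoldCF documentation, not taken from this thread, and the helper name and sample file content are made up for the example.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Assumed property name; the original line was stripped from the email.
CONNECTOR_LOG_PROPERTY = "org.apache.manifoldcf.connectors"


def get_property(properties_xml: str, name: str) -> Optional[str]:
    """Return the value of the named <property> element, or None if absent."""
    root = ET.fromstring(properties_xml)
    for prop in root.iter("property"):
        if prop.get("name") == name:
            return prop.get("value")
    return None


# Hypothetical properties.xml content, for illustration only.
sample = """<configuration>
  <property name="org.apache.manifoldcf.logconfigfile" value="./logging.xml"/>
  <property name="org.apache.manifoldcf.connectors" value="DEBUG"/>
</configuration>"""

level = get_property(sample, CONNECTOR_LOG_PROPERTY)
if level is None:
    print("connector log level is not set in properties.xml")
else:
    print("connector log level:", level)
```

Reading the real file from the example directory and passing its text to get_property would show whether the setting is actually present in the file the process reads at startup.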



On Tue, Jun 12, 2018 at 10:03 AM Bisonti Mario 
<mario.biso...@vimar.com> wrote:
I set up two jobs:
Job “A” to crawl “Windows Shares”
Job “B” to crawl my internal site

The problem started when I tried to abort the second job, “B”:
it hung in the “Aborting” state.

Afterwards, I tried to start job “A”, but it hung in the “Starting” state and 
did not start. I then tried to abort it too, and it hung in the “Aborting” 
state just like job “B”.

I increased the log level in logging.xml to “info”, but when I start ManifoldCF 
standalone I do not get much information in logs/manifoldcf.log.

I see only:
INFO 2018-06-12T15:58:02,748 (main) - dataFileCache open start
INFO 2018-06-12T15:58:02,753 (main) - dataFileCache open end

And nothing more.

So I think there could be a lock situation in the internal HSQLDB that I 
am not able to resolve.




From: Karl Wright <daddy...@gmail.com>
Sent: Tuesday, 12 June 2018 15:46
To: user@manifoldcf.apache.org
Subject: Re: Job in aborting status

Hi Mario,

If you are using the single-process model, then stuck locks are not the problem 
and the lock-clean script is inappropriate to use.  Locks are all internal in 
that model.  That is why lock-clean is only distributed as part of the 
file-based multiprocess example.

Please tell me more about what you have set up for your jobs on this example.  
How many are there, and how many documents are involved?  The embedded HSQLDB 
database has limits because it caches all tables in memory, so the 
single-process example is not going to be able to handle huge jobs.

Please have a look at the log to be sure there are no serious errors in it.

Thanks,
Karl




On Tue, Jun 12, 2018 at 9:26 AM Bisonti Mario 
<mario.biso...@vimar.com> wrote:
No, I am testing in the /example directory, so I am using the local HSQLDB.
I copied the lock-clean.sh script from 
/usr/share/manifoldcf/multiprocess-file-example to 
/usr/share/manifoldcf/example to try to clean up my situation, but perhaps the 
script is not suitable for me because I am using Jetty in the example directory?

Thanks




From: Karl Wright <daddy...@gmail.com>
Sent: Tuesday, 12 June 2018 15:23
To: user@manifoldcf.apache.org
Subject: Re: Job in aborting status

Hi Mario,

It appears you are trying to use embedded HSQLDB in a multiprocess environment. 
 That is not possible.

In a multiprocess environment, you have the following choices:

(1) standalone HSQLDB
(2) postgresql
(3) mysql

Thanks,
Karl


On Tue, Jun 12, 2018 at 9:06 AM Bisonti Mario 
<mario.biso...@vimar.com> wrote:
Thanks Karl.
I tried to run lock-clean from my example directory after stopping ManifoldCF, 
but I get:


administrator@sslrvivv01:/usr/share/manifoldcf/example$ sudo -E 

Recall: Job in aborting status

2018-06-13 Thread Bisonti Mario
Bisonti Mario would like to recall the message "Job in aborting status".
