date:20180613

Sharepoint 2013 indexation time performance

2018-06-13 Thread Olivier Tavard

Hi,

I have a question regarding the performance of the SharePoint repository
connector.
Recently we ran some tests using MCF 2.8.1 to crawl documents on a
SharePoint 2013 server. There were only a few documents: 700, all located in
the same document list.
For a full crawl, the indexing speed was about 60 docs/min (with security
activated: SharePoint native authority connector).
For an incremental crawl with one file modified and one file added, the
indexing speed was 80 docs/min.
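As a rough sanity check on these figures, the sketch below converts a constant docs/min rate into wall-clock time. It assumes throughput stays constant as the repository grows, which real crawls rarely do; the 1,000,000-document case is a hypothetical illustration, not a measurement.

```python
def crawl_minutes(doc_count: int, docs_per_minute: float) -> float:
    """Naive estimate of crawl duration: documents divided by a constant rate."""
    if docs_per_minute <= 0:
        raise ValueError("docs_per_minute must be positive")
    return doc_count / docs_per_minute


# Full crawl reported above: 700 documents at about 60 docs/min.
full = crawl_minutes(700, 60)
print(f"700 docs at 60 docs/min: {full:.1f} min")  # prints "... 11.7 min"

# Hypothetical larger repository, assuming (unrealistically) linear scaling.
large = crawl_minutes(1_000_000, 60)
print(f"1,000,000 docs at 60 docs/min: {large / 60:.1f} h ({large / 1440:.1f} days)")
# prints "... 277.8 h (11.6 days)"
```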

I know that performance of course depends on many factors, but if someone
has already indexed large SharePoint servers and could post the performance
obtained, it would be great to compare.

Tests were run on two VMs on an ESXi server (CPU: Xeon D-1520, RAM: 64 GB):
- VM for SharePoint 2013:
all SharePoint services installed on the same VM
4 vCPU, 16 GB RAM

- VM for MCF:
4 vCPU, 24 GB RAM
MCF 2.8.1, multiprocess with ZooKeeper

Thanks,

Best regards,

Olivier TAVARD

Re: Job in aborting status

2018-06-13 Thread Karl Wright

I'm not in a position to teach you how to use the Java tools, but:
(1) You want to use the JDK, and
(2) The utility you want to run to get a thread dump is jstack (distributed
with the JDK).  If you can't attach to the process, there's a switch to
force attachment: -F
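A minimal Python sketch of these steps, assuming a JDK whose `jps` and `jstack` tools are on the PATH and that the ManifoldCF JVM shows up in `jps -l` output as `start.jar` (as in the single-process example discussed in this thread). For a live process `jstack` takes the numeric process ID rather than a JAR name, and it generally has to be run as the same OS user that started the JVM. The helper names `find_pid`, `jstack_command` and `thread_dump` are illustrative only, not part of ManifoldCF or the JDK.

```python
import subprocess
from typing import List, Optional


def find_pid(jps_output: str, main_name: str) -> Optional[int]:
    """Return the PID whose main class or JAR (as printed by `jps -l`) ends with main_name."""
    for line in jps_output.splitlines():
        parts = line.split(maxsplit=1)
        if len(parts) == 2 and parts[0].isdigit() and parts[1].strip().endswith(main_name):
            return int(parts[0])
    return None


def jstack_command(pid: int, force: bool = False) -> List[str]:
    """Build a jstack invocation; -F forces a dump when normal attachment fails."""
    cmd = ["jstack"]
    if force:
        cmd.append("-F")
    cmd.append(str(pid))  # jstack expects the numeric PID, not the JAR name
    return cmd


def thread_dump(main_name: str = "start.jar", force: bool = False) -> str:
    """Locate the JVM with jps, then run jstack against it and return the dump text."""
    jps = subprocess.run(["jps", "-l"], capture_output=True, text=True, check=True).stdout
    pid = find_pid(jps, main_name)
    if pid is None:
        raise RuntimeError(f"no JVM running {main_name} found in jps output")
    result = subprocess.run(jstack_command(pid, force), capture_output=True, text=True, check=True)
    return result.stdout
```

On the ManifoldCF host, `print(thread_dump())` would print the dump; `thread_dump(force=True)` adds the `-F` switch mentioned above.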

Karl


On Wed, Jun 13, 2018 at 5:26 AM Bisonti Mario wrote:

> Hi Karl,
>
> I am not able to get a thread dump of the start.jar process because I get:
> Error attaching to core file: cannot open binary file
> sun.jvm.hotspot.debugger.DebuggerException: cannot open binary file
> .
> .
>
> Furthermore, I set the following in logging.xml:
>
> %5p %d{ISO8601} (%t) - %m%n
>
> For debugging, perhaps this isn't the right approach?
>
> Thanks a lot,
>
> Mario
>
> *From:* Karl Wright
> *Sent:* Tuesday, 12 June 2018 17:40
> *To:* user@manifoldcf.apache.org
> *Subject:* Re: Job in aborting status
>
> Then I cannot explain the behavior you are seeing.  Also, debug output is
> quite verbose so clearly you are not setting that up right either.
>
> If you want me to give a further analysis, please provide a thread dump of
> the manifoldcf process.
>
> Karl
>
> On Tue, Jun 12, 2018 at 10:38 AM Bisonti Mario wrote:
>
> For Job “A” I use the “Windows Share” repository connector and the “Solr”
> output connector.
>
> For Job “B” I use the “Generic Web” repository connector and the “Solr”
> output connector.
>
> No custom connector of my own.
>
> I set DEBUG but I get no log output.
>
> *From:* Karl Wright
> *Sent:* Tuesday, 12 June 2018 16:22
> *To:* user@manifoldcf.apache.org
> *Subject:* Re: Job in aborting status
>
> Hi Mario,
>
> What repository connector are you using for Job "B"?  Is it your own
> connector?  If so, you likely have bugs in it that are causing problems
> with the entire framework.  Please verify that this is the case; ManifoldCF
> In Action is freely available online and you should read it before writing
> connectors.
>
> The problems are not likely due to HSQLDB internal locks.
>
> Major errors should be logged already in manifoldcf.log by default.  If
> you want to set up connector debug logging, you need to set a
> properties.xml property, not a logging.xml property:
>
> See: https://www.mail-archive.com/user@manifoldcf.apache.org/msg01034.html
>
> On Tue, Jun 12, 2018 at 10:03 AM Bisonti Mario wrote:
>
> I set up two jobs:
>
> Job “A” to crawl “Windows Shares”
>
> Job “B” to crawl my internal site
>
> The problem began when I tried to abort the second job, “B”: it hung in
> the Aborting state.
>
> Afterwards, I tried to start job “A”, but it hung in the “Starting” state
> and never started. I then tried to abort it too, and it hung in the
> Aborting state, just like job “B”.
>
> I increased the log level in logging.xml to “info”, but when I start
> ManifoldCF standalone I do not get much information in logs/manifoldcf.log.
>
> I see only:
>
> INFO 2018-06-12T15:58:02,748 (main) - dataFileCache open start
> INFO 2018-06-12T15:58:02,753 (main) - dataFileCache open end
>
> And nothing more.
>
> So I think there could be a lock situation in the internal HSQLDB
> that I am not able to solve.
>
> *From:* Karl Wright
> *Sent:* Tuesday, 12 June 2018 15:46
> *To:* user@manifoldcf.apache.org
> *Subject:* Re: Job in aborting status
>
> Hi Mario,
>
> If you are using the single-process model, then stuck locks are not the
> problem and the lock-clean script is inappropriate to use.  Locks are all
> internal in that model.  That is why lock-clean is only distributed as part
> of the file-based multiprocess example.
>
> Please tell me more about what you have set up for your jobs on this
> example.  How many are there, and how many documents are involved?  The
> embedded HSQLDB database has limits because it caches all tables in memory,
> so the single-process example is not going to be able to handle huge jobs.
>
> Please have a look at the log to be sure there are no serious errors in it.
>
> Thanks,
>
> Karl
>
> On Tue, Jun 12, 2018 at 9:26 AM Bisonti Mario wrote:
>
> No, I am testing in the /example directory, so I am using the local HSQLDB.
>
> I copied the lock-clean.sh script from
> /usr/share/manifoldcf/multiprocess-file-example to
> /usr/share/manifoldcf/example to try to clean up my situation, but perhaps
> the script isn't suitable for me because I am using Jetty in the example
> directory?
>
> Thanks
>
> *From:* Karl Wright
> *Sent:* Tuesday, 12 June 2018 15:23
> *To:* user@manifoldcf.apache.org
>

Re: Job in aborting status

2018-06-13 Thread Bisonti Mario

Hi Karl,

I am not able to get a thread dump of the start.jar process because I get:
Error attaching to core file: cannot open binary file
sun.jvm.hotspot.debugger.DebuggerException: cannot open binary file
.
.

Furthermore, I set the following in logging.xml:

%5p %d{ISO8601} (%t) - %m%n

For debugging, perhaps this isn't the right approach?
Thanks a lot,
Mario

From: Karl Wright
Sent: Tuesday, 12 June 2018 17:40
To: user@manifoldcf.apache.org
Subject: Re: Job in aborting status

Then I cannot explain the behavior you are seeing. Also, debug output is quite
verbose so clearly you are not setting that up right either.

If you want me to give a further analysis, please provide a thread dump of the
manifoldcf process.

Karl

On Tue, Jun 12, 2018 at 10:38 AM Bisonti Mario
<mario.biso...@vimar.com> wrote:
For Job “A” I use the “Windows Share” repository connector and the “Solr”
output connector.
For Job “B” I use the “Generic Web” repository connector and the “Solr” output connector.

No custom connector of my own.

I set DEBUG but I get no log output.

From: Karl Wright <daddy...@gmail.com>
Sent: Tuesday, 12 June 2018 16:22
To: user@manifoldcf.apache.org
Subject: Re: Job in aborting status

Hi Mario,

What repository connector are you using for Job "B"? Is it your own connector?
If so, you likely have bugs in it that are causing problems with the entire
framework. Please verify that this is the case; ManifoldCF In Action is freely
available online and you should read it before writing connectors.

The problems are not likely due to HSQLDB internal locks.

Major errors should be logged already in manifoldcf.log by default. If you
want to set up connector debug logging, you need to set a properties.xml
property, not a logging.xml property:

See:
https://www.mail-archive.com/user@manifoldcf.apache.org/msg01034.html
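The actual property line in Karl's reply did not survive in this archive. As an illustration only, the sketch below adds or updates a `<property name="..." value="..."/>` entry in a `properties.xml` document using Python's standard `xml.etree.ElementTree`. The `<configuration>` root element and the property name `org.apache.manifoldcf.connectors` with value `DEBUG` are assumptions; verify them against the linked message and the documentation for your ManifoldCF version.

```python
import xml.etree.ElementTree as ET


def set_property(xml_text: str, name: str, value: str) -> str:
    """Return xml_text with <property name=name value=value/> set on the root element.

    An existing property with the same name is updated; otherwise a new one is appended.
    """
    root = ET.fromstring(xml_text)
    for prop in root.findall("property"):
        if prop.get("name") == name:
            prop.set("value", value)
            break
    else:
        ET.SubElement(root, "property", {"name": name, "value": value})
    return ET.tostring(root, encoding="unicode")


# Assumed property name for connector debug logging; verify for your ManifoldCF version.
CONNECTOR_LOGGING = "org.apache.manifoldcf.connectors"

print(set_property("<configuration></configuration>", CONNECTOR_LOGGING, "DEBUG"))
```

ElementTree drops XML comments when parsing by default, so apply this to a copy of `properties.xml` and review the result rather than overwriting the live file.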

On Tue, Jun 12, 2018 at 10:03 AM Bisonti Mario
<mario.biso...@vimar.com> wrote:
I set up two jobs:
Job “A” to crawl “Windows Shares”
Job “B” to crawl my internal site

The problem began when I tried to abort the second job, “B”:
it hung in the Aborting state.

Afterwards, I tried to start job “A”, but it hung in the “Starting” state and
never started. I then tried to abort it too, and it hung in the Aborting state,
just like job “B”.

I increased the log level in logging.xml to “info”, but when I start ManifoldCF
standalone I do not get much information in logs/manifoldcf.log.

I see only:
INFO 2018-06-12T15:58:02,748 (main) - dataFileCache open start
INFO 2018-06-12T15:58:02,753 (main) - dataFileCache open end

And nothing more.

So I think there could be a lock situation in the internal HSQLDB that I
am not able to solve.

From: Karl Wright <daddy...@gmail.com>
Sent: Tuesday, 12 June 2018 15:46
To: user@manifoldcf.apache.org
Subject: Re: Job in aborting status

Hi Mario,

If you are using the single-process model, then stuck locks are not the problem
and the lock-clean script is inappropriate to use. Locks are all internal in
that model. That is why lock-clean is only distributed as part of the
file-based multiprocess example.

Please tell me more about what you have set up for your jobs on this example.
How many are there, and how many documents are involved? The embedded HSQLDB
database has limits because it caches all tables in memory, so the
single-process example is not going to be able to handle huge jobs.

Please have a look at the log to be sure there are no serious errors in it.
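Checking the log for serious entries can be scripted. The sketch below assumes log lines follow the `%5p %d{ISO8601} (%t) - %m%n` layout quoted in this thread, which matches lines such as `INFO 2018-06-12T15:58:02,748 (main) - dataFileCache open start`. Lines that do not match (for example stack-trace lines) are attached to the preceding entry, and the set of levels treated as serious is a choice made for this example.

```python
import re
from dataclasses import dataclass, field
from typing import Iterable, List

# Layout "%5p %d{ISO8601} (%t) - %m%n": a space-padded level, an ISO 8601
# timestamp, the thread name in parentheses, then " - " and the message.
LINE_RE = re.compile(r"^\s*(TRACE|DEBUG|INFO|WARN|ERROR|FATAL) (\S+) \((.*?)\) - (.*)$")

# Levels this sketch treats as worth a closer look.
SERIOUS = {"WARN", "ERROR", "FATAL"}


@dataclass
class LogEntry:
    level: str
    timestamp: str
    thread: str
    message: str
    extra: List[str] = field(default_factory=list)  # continuation lines, e.g. stack traces


def parse(lines: Iterable[str]) -> List[LogEntry]:
    """Group raw log lines into entries; non-matching lines extend the previous entry."""
    entries: List[LogEntry] = []
    for raw in lines:
        line = raw.rstrip("\n")
        m = LINE_RE.match(line)
        if m:
            entries.append(LogEntry(*m.groups()))
        elif entries and line.strip():
            entries[-1].extra.append(line)
    return entries


def serious(entries: Iterable[LogEntry]) -> List[LogEntry]:
    """Keep only entries whose level is in SERIOUS."""
    return [e for e in entries if e.level in SERIOUS]
```

For instance, `serious(parse(open("logs/manifoldcf.log", encoding="utf-8")))` would return only the WARN, ERROR and FATAL entries from the standalone example's log, each with its stack trace in `extra`.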

Thanks,
Karl

On Tue, Jun 12, 2018 at 9:26 AM Bisonti Mario
<mario.biso...@vimar.com> wrote:
No, I am testing in the /example directory, so I am using the local HSQLDB.
I copied the lock-clean.sh script from
/usr/share/manifoldcf/multiprocess-file-example to
/usr/share/manifoldcf/example to try to clean up my situation, but perhaps the
script isn't suitable for me because I am using Jetty in the example directory?

Thanks

From: Karl Wright <daddy...@gmail.com>
Sent: Tuesday, 12 June 2018 15:23
To: user@manifoldcf.apache.org
Subject: Re: Job in aborting status

Hi Mario,

It appears you are trying to use embedded HSQLDB in a multiprocess environment.
That is not possible.

In a multiprocess environment, you have the following choices:

(1) standalone HSQLDB
(2) PostgreSQL
(3) MySQL

Thanks,
Karl

On Tue, Jun 12, 2018 at 9:06 AM Bisonti Mario
<mario.biso...@vimar.com> wrote:
Thanks, Karl.
I tried to run lock-clean from my example directory after stopping ManifoldCF,
but I get:

administrator@sslrvivv01:/usr/share/manifoldcf/example$ sudo -E

Recall: Job in aborting status

2018-06-13 Thread Bisonti Mario

Bisonti Mario would like to recall the message "Job in aborting status".

Re: Job in aborting status

2018-06-13 Thread Bisonti Mario

Hi Karl,

I am not able to get a thread dump of the start.jar process because I get:
Error attaching to core file: cannot open binary file
sun.jvm.hotspot.debugger.DebuggerException: cannot open binary file
.
.

Furthermore, I set the following in logging.xml:

%5p %d{ISO8601} (%t) - %m%n

For debugging, perhaps this isn't the right approach?
Thanks a lot,
Mario

Mario Bisonti
Information and Communications Technology

VIMAR SpA
Tel. +39 0424 488 600
mario.biso...@vimar.com
Take care of the environment. Print only if necessary.

