Re: Reconciliation of documents crawled

2014-09-16 Thread lalit jangra
Thanks Karl,

Of the three methods you suggested, I believe writing to a
file would be the easiest; correct me if I am wrong.

What I initially thought was that, while the job is running, I would need to
write counter values for each document seeded and processed, since we call the
addSeedDocument() and processDocument() methods for each document. In that
case it would not be easy to reconcile after the job completes, as I would have
loads of data once the job finishes and mapping it would be tough. This is
why I am trying to avoid a file-based mechanism. I would also hit the
tracking issue, since the connector object is called multiple times and
multiple agents are running in parallel.

Please suggest.

Regards.

On Tue, Sep 16, 2014 at 11:59 AM, Karl Wright daddy...@gmail.com wrote:

 Hi Lalit,

 So, let me clarify: you want some independent measure as to whether every
 document seeded, per job, has been in fact processed?

 If that is a correct statement, there is by definition no in-code way to
 do it, since there are multiple agents running in your setup. Each agent
 may process some of the documents, and certainly no agent will process all
 of them.  Also, restarting any agents process will lose the information you
 are attempting to record.

 So you are stuck with three possibilities:

 The first possibility is to use [INFO] statements written to the log.
 This would work, but you don't have the information you need in your
 connector (specifically the job ID), so you would have to add these logging
 statements to various places in the ManifoldCF framework.

 The second possibility is to make use of the history database table, where
 events are recorded.  You could create two new activity types, also written
 within the framework, for tracking seeding of records and for tracking
 processing of records.  There are already activity types for job start and
 end.

 Finally, the third possibility: If you must absolutely avoid the file
 system, you would have to write a tracking process which allowed ManifoldCF
 threads to connect via sockets and communicate document seeding and
 processing events.  Once again, within the framework, you would transmit
 events to the recording process.  This system would be at risk of losing
 tracking data when your tracking process needed to be restarted, however.

 None of these are trivial to implement.  Essentially, keeping track of
 documents is what MCF uses the database for in the first place, so this
 requirement is like insisting that there be a second ManifoldCF there to be
 sure that the first one did the right thing.  It's an incredible waste of
 resources, frankly.  Using the log is perhaps the simplest to implement and
 most consistent with what clients might be expecting, but it has very
 significant I/O costs.  Using the history table has a similar problem,
 while also putting your database under load.  The last solution requires a
 lot of well-constructed code and remains vulnerable to system instability.
 Take your pick.
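 As a rough illustration of the log-based option, the sketch below counts
 seeded and processed documents per job from a log file. The log line format
 (`RECON SEEDED job=... doc=...` and `RECON PROCESSED job=... doc=...`) and the
 function name `reconcile` are assumptions made for this example; ManifoldCF
 does not emit such lines itself, so they would have to come from logging
 statements added in the framework as described above.

```shell
#!/bin/sh
# Hypothetical log lines (an assumption, not ManifoldCF's own output):
#   [INFO] RECON SEEDED job=<jobid> doc=<docid>
#   [INFO] RECON PROCESSED job=<jobid> doc=<docid>

# reconcile LOGFILE JOBID
# Prints how many distinct documents were seeded and processed for one job.
reconcile() {
  logfile=$1
  jobid=$2
  # The trailing space after the job ID keeps job=42 from matching job=421.
  seeded=$(grep -F "RECON SEEDED job=$jobid " "$logfile" \
    | sed 's/.*doc=//' | sort -u | wc -l | tr -d ' ')
  processed=$(grep -F "RECON PROCESSED job=$jobid " "$logfile" \
    | sed 's/.*doc=//' | sort -u | wc -l | tr -d ' ')
  echo "job=$jobid seeded=$seeded processed=$processed"
}
```

 Counting distinct document IDs rather than raw lines means a document that is
 processed more than once is still counted once. With several agents, each
 writing its own log, the files would need to be concatenated first.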

 Karl


 Thanks,
 Karl


 On Tue, Sep 16, 2014 at 12:54 AM, lalit jangra lalit.j.jan...@gmail.com
 wrote:

 Greetings,

 As part of our implementation, I need to put a reconciliation mechanism in
 place that verifies how many documents have been crawled for a
 job and reports the result in the logs.

 The first thing that came to mind was to put counters in, e.g., the CMIS
 connector code, in the addSeed() and processDocuments() methods, and increment
 them as we progress. But as far as I can see, CmisRepositoryConnector.java is
 called for each seeded document to be ingested, so these counters are
 not accurate. Is there any way to persist these counters within the
 code itself? I do not want to persist them in the file system.

 Please suggest.
 --
 Regards,
 Lalit.





-- 
Regards,
Lalit.


Re: Reconciliation of documents crawled

2014-09-16 Thread Karl Wright
If you are going to write to a file, you might as well write to the log
file, since that mechanism is already available.

Karl







Re: regarding database configuration

2014-09-16 Thread Jitu
Hi Karl,
Today, when deploying the WAR on the server, I got the error below. What is
the template1 database used for?

Error getting connection: FATAL: no pg_hba.conf entry for host
serverHost, user postgres, database template1, SSL off

Thanks,
Jitu

On Tue, Sep 16, 2014 at 1:51 AM, Karl Wright daddy...@gmail.com wrote:

 Hi Jitu,

 You can read about the parameters here:


 http://manifoldcf.apache.org/release/trunk/en_US/how-to-build-and-deploy.html#file+properties

 Some hints:

 (1) org.apache.manifoldcf.database.username is the name of a user that
 will be CREATED to own the MCF database instance.
 (2) org.apache.manifoldcf.database.password is the password that user will
 be given.
 (3) org.apache.manifoldcf.dbsuperusername is the name of the database
 superuser, which is the user that will be used to create the ManifoldCF
 database instance.
 (4) org.apache.manifoldcf.dbsuperuserpassword is the corresponding
 superuser password.

 Only some operations require that the superuser name and password be
 correct: specifically, database initialization and upgrade.  In the
 single-process example this happens when ManifoldCF is started.  In the
 multiprocess example it happens when initialize.bat/.sh is run.

 Thanks,
 Karl



 On Mon, Sep 15, 2014 at 3:58 PM, Jitu abj...@gmail.com wrote:

 Hi,

 I am currently using ManifoldCF to crawl a SharePoint server, with
 PostgreSQL as the database. Is it mandatory to specify both username and
 dbsuperusername in my properties.xml file? If I specify only dbsuperusername,
 the agents are not able to access the database. If I specify only username,
 I get another error. Please correct me if I am wrong.


   <property name="org.apache.manifoldcf.database.username" value="postgresUser"/>
   <property name="org.apache.manifoldcf.database.password" value="postgresPassword"/>
   <property name="org.apache.manifoldcf.dbsuperusername" value="postgresUser"/>
   <property name="org.apache.manifoldcf.dbsuperuserpassword" value="postgresPassword"/>

 My question is: can we specify the username and password only once? If so,
 in which properties?

 Thanks,
 Jitu





Re: Connection status:Threw exception: 'Driver class not found: com.mysql.jdbc.Driver'

2014-09-16 Thread Bisonti Mario
Thanks a lot!

The connection is now working!

Mario



From: Karl Wright [mailto:daddy...@gmail.com]
Sent: Monday, 15 September 2014 16:26
To: user@manifoldcf.apache.org
Subject: Re: Connection status:Threw exception: 'Driver class not found: 
com.mysql.jdbc.Driver'

Hi Mario,
You can download the -src and -lib distribution, and then run ant make-deps 
build, and you should be able to use proprietary MySQL database connections.

Thanks,
Karl

On Mon, Sep 15, 2014 at 10:24 AM, Bisonti Mario 
mario.biso...@vimar.com wrote:
I understand.

In fact, I don't have example-proprietary because I am using a binary release of 
ManifoldCF, so I can't use MySQL for a repository connection.

Thanks a lot.




From: Karl Wright [mailto:daddy...@gmail.com]
Sent: Monday, 15 September 2014 16:21
To: user@manifoldcf.apache.org
Subject: Re: Connection status:Threw exception: 'Driver class not found: 
com.mysql.jdbc.Driver'

Hi Mario,
cd /usr/share/manifoldcf/example-proprietary
sudo java -jar start.jar
If you do not have example-proprietary, it is because you did not actually 
build ManifoldCF yourself.  In order to use MySQL as a backend, you must build 
ManifoldCF yourself.

Thanks,
Karl

On Mon, Sep 15, 2014 at 10:14 AM, Bisonti Mario 
mario.biso...@vimar.com wrote:
I am in
/usr/share/manifoldcf/example/
and I execute: sudo java -jar start.jar

However, mysql-connector-java-5.1.32-bin.jar is in 
/usr/share/manifoldcf/connector-lib-proprietary/
So, how should I run ManifoldCF?
Excuse me, but I am not a Linux expert…
Thanks a lot.



From: Karl Wright [mailto:daddy...@gmail.com]
Sent: Monday, 15 September 2014 16:04
To: user@manifoldcf.apache.org
Subject: Re: Connection status:Threw exception: 'Driver class not found: 
com.mysql.jdbc.Driver'

Hi Mario,
You need to run ManifoldCF out of one of the example-proprietary directories in 
order for it to pick up the mysql jar in the classpath.

Thanks,
Karl

On Mon, Sep 15, 2014 at 9:32 AM, Bisonti Mario 
mario.biso...@vimar.com wrote:


Hello.

I tried to set up a MySQL repository connection, but I get the error mentioned in the subject.

I put mysql-connector-java-5.1.32-bin.jar in the
/apache-manifoldcf-1.7/connector-lib-proprietary
and
/apache-manifoldcf-1.7/lib-proprietary
folders, but I still get the error in the subject line.

What could I check?

Thanks a lot

Mario





Zookeeper configured MCF not working in production mode

2014-09-16 Thread Erlend Garåsen
I'm not able to run MCF 1.7 properly with Zookeeper-based 
synchronization. After some hours, it just stops fetching documents. 
Until now I have been using FileLockManager to get around this problem.


A thread dump and my manifoldcf.log file can be found here:
http://folk.uio.no/erlendfg/manifoldcf/

Erlend


Re: MCF cluster agents do not shut down.

2014-09-16 Thread lalit jangra
Hi Karl,

Below are the logs I get when I try to stop the agents. Please suggest.


[root@iwdc1preecma03 ecmadmin]# cd
/app/IW/MCF1.7/dist/multiprocess-zk-example

[root@iwdc1preecma03 multiprocess-zk-example]# ./stop-agents.sh

Configuration file successfully read

[main] INFO org.apache.zookeeper.ZooKeeper - Client
environment:zookeeper.version=3.4.5-1392090, built on 09/30/2012 17:52 GMT

[main] INFO org.apache.zookeeper.ZooKeeper - Client environment:host.name=
iwdc1preecma03.iwater.ie

[main] INFO org.apache.zookeeper.ZooKeeper - Client
environment:java.version=1.7.0_25

[main] INFO org.apache.zookeeper.ZooKeeper - Client
environment:java.vendor=Oracle Corporation

[main] INFO org.apache.zookeeper.ZooKeeper - Client
environment:java.home=/app/IW/java/jre

[main] INFO org.apache.zookeeper.ZooKeeper - Client
environment:java.class.path=.:../lib/mcf-pull-agent.jar:../lib/mcf-agents.jar:../lib/mcf-core.jar:../lib/hsqldb.jar:../lib/derbyLocale_zh_TW.jar:../lib/derbyLocale_zh_CN.jar:../lib/derbyLocale_ru.jar:../lib/derbyLocale_pt_BR.jar:../lib/derbyLocale_pl.jar:../lib/derbyLocale_ko_KR.jar:../lib/derbyLocale_ja_JP.jar:../lib/derbyLocale_it.jar:../lib/derbyLocale_hu.jar:../lib/derbyLocale_fr.jar:../lib/derbyLocale_es.jar:../lib/derbyLocale_de_DE.jar:../lib/derbyLocale_cs.jar:../lib/derbytools.jar:../lib/derbynet.jar:../lib/derby.jar:../lib/postgresql.jar:../lib/mail.jar:../lib/slf4j-simple.jar:../lib/slf4j-api.jar:../lib/velocity.jar:../lib/xml-apis.jar:../lib/xercesImpl.jar:../lib/xalan.jar:../lib/servlet-api.jar:../lib/serializer.jar:../lib/log4j.jar:../lib/commons-logging.jar:../lib/commons-lang.jar:../lib/commons-io.jar:../lib/httpclient.jar:../lib/httpcore.jar:../lib/commons-fileupload.jar:../lib/commons-el.jar:../lib/commons-collections.jar:../lib/commons-codec.jar:../lib/json-simple.jar:../lib/json.jar:../lib/zookeeper.jar:../lib/activation.jar:../lib/saaj-impl.jar:../lib/saaj-api.jar:../lib/wsdl4j.jar:../lib/wss4j.jar:../lib/axis.jar:../lib/axis-jaxrpc.jar:../lib/commons-discovery.jar:../lib/geronimo-javamail_1.4_spec.jar:../lib/castor.jar:

[main] INFO org.apache.zookeeper.ZooKeeper - Client
environment:java.library.path=/usr/java/packages/lib/amd64:/usr/lib64:/lib64:/lib:/usr/lib

[main] INFO org.apache.zookeeper.ZooKeeper - Client
environment:java.io.tmpdir=/tmp

[main] INFO org.apache.zookeeper.ZooKeeper - Client
environment:java.compiler=NA

[main] INFO org.apache.zookeeper.ZooKeeper - Client environment:os.name
=Linux

[main] INFO org.apache.zookeeper.ZooKeeper - Client
environment:os.arch=amd64

[main] INFO org.apache.zookeeper.ZooKeeper - Client
environment:os.version=2.6.32-358.el6.x86_64

[main] INFO org.apache.zookeeper.ZooKeeper - Client environment:user.name
=root

[main] INFO org.apache.zookeeper.ZooKeeper - Client
environment:user.home=/root

[main] INFO org.apache.zookeeper.ZooKeeper - Client
environment:user.dir=/app/IW/MCF1.7/dist/multiprocess-zk-example

[main] INFO org.apache.zookeeper.ZooKeeper - Initiating client connection,
connectString=iwdc1preecma03:2181,iwdc1preecma03:2182,iwdc1preecma03:2183,iwdc2preecma04:2181,iwdc2preecma04:2182,iwdc2preecma04:2183
sessionTimeout=4000
watcher=org.apache.manifoldcf.core.lockmanager.ZooKeeperConnection$ZooKeeperWatcher@78618ac5

[main-SendThread(iwdc1preecma03.iwater.ie:2181)] INFO
org.apache.zookeeper.ClientCnxn - Opening socket connection to server
iwdc1preecma03.iwater.ie/10.231.72.24:2181. Will not attempt to
authenticate using SASL (unknown error)

[main-SendThread(iwdc1preecma03.iwater.ie:2181)] INFO
org.apache.zookeeper.ClientCnxn - Socket connection established to
iwdc1preecma03.iwater.ie/10.231.72.24:2181, initiating session

[main-SendThread(iwdc1preecma03.iwater.ie:2181)] INFO
org.apache.zookeeper.ClientCnxn - Unable to read additional data from
server sessionid 0x0, likely server has closed socket, closing socket
connection and attempting reconnect

[main-SendThread(iwdc2preecma04.iwater.ie:2182)] INFO
org.apache.zookeeper.ClientCnxn - Opening socket connection to server
iwdc2preecma04.iwater.ie/10.231.72.25:2182. Will not attempt to
authenticate using SASL (unknown error)

[main-SendThread(iwdc2preecma04.iwater.ie:2182)] INFO
org.apache.zookeeper.ClientCnxn - Socket connection established to
iwdc2preecma04.iwater.ie/10.231.72.25:2182, initiating session

[main-SendThread(iwdc2preecma04.iwater.ie:2182)] INFO
org.apache.zookeeper.ClientCnxn - Unable to read additional data from
server sessionid 0x0, likely server has closed socket, closing socket
connection and attempting reconnect

[main-SendThread(iwdc1preecma03.iwater.ie:2183)] INFO
org.apache.zookeeper.ClientCnxn - Opening socket connection to server
iwdc1preecma03.iwater.ie/10.231.72.24:2183. Will not attempt to
authenticate using SASL (unknown error)

[main-SendThread(iwdc1preecma03.iwater.ie:2183)] INFO
org.apache.zookeeper.ClientCnxn - Socket connection established to
iwdc1preecma03.iwater.ie/10.231.72.24:2183, initiating session


Re: Connection status:Threw exception: 'Driver class not found: com.mysql.jdbc.Driver'

2014-09-16 Thread Bisonti Mario
When I start a test job to extract a table, I get:

Error: Bad seed query; doesn't return $(IDCOLUMN) column. Try using quotes 
around $(IDCOLUMN) variable, e.g. "$(IDCOLUMN)".

My configuration:

Seeding query:
SELECT command_id AS $(IDCOLUMN) FROM icinga_commands

Data query:
SELECT command_id AS $(IDCOLUMN), command_line AS $(URLCOLUMN), object_id AS 
$(DATACOLUMN) FROM icinga_commands WHERE command_id IN $(IDLIST)

What could I check?

Thanks a lot

Mario







Re: Zookeeper configured MCF not working in production mode

2014-09-16 Thread Erlend Garåsen

On 16.09.14 10:53, lalit jangra wrote:

Hi Erlend,

Can you please elaborate on how you have configured ZooKeeper-based
synchronization: is it in standalone mode or clustered mode? How many
ZooKeeper nodes are you running on each node, and how many agents are
you running?


I'm not very familiar with Zookeeper, so I have just followed the 
examples inside the multiprocess-zk-example folder, i.e.:

$MCF_HOME/../runzookeeper.sh > /dev/null 2>&1 &
# Reading global properties:
$MCF_HOME/../setglobalproperties.sh > /dev/null 2>&1 &
# Starting Agent process:
$MCF_HOME/processes/executecommand.sh org.apache.manifoldcf.agents.AgentRun \
1>$LOGDIR/mcf_agent.stdout.log 2>$LOGDIR/mcf_agent.stderr.log &
pid=$!


The above lines are from my startup script. I see now that I haven't 
specified -Dorg.apache.manifoldcf.processid=A, I'm not sure this is 
important, but I can of course try to include that into my script and 
restart everything.


So to the question about how many zookeeper nodes I'm using, the answer 
is one. The same applies to the number of running agents.


Erlend


Re: Zookeeper configured MCF not working in production mode

2014-09-16 Thread Karl Wright
Hi Erlend,

The zookeeper configuration supplied will likely fill up your disk with
zookeeper synch data, because the parameters that control the cleanup of
that data are not properly set up for long-term execution.
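
For illustration only: ZooKeeper 3.4 and later has two zoo.cfg settings,
autopurge.purgeInterval (in hours) and autopurge.snapRetainCount, that control
automatic cleanup of old snapshots and transaction logs. Whether these are the
parameters meant above is an assumption, and the values 24 and 3 are
placeholders, not recommendations. The helper name `enable_autopurge` is
hypothetical. A minimal sketch that adds them to a config file if they are
missing:

```shell
#!/bin/sh
# enable_autopurge ZOOCFG
# Appends ZooKeeper autopurge settings to the given zoo.cfg unless they are
# already present. The values are placeholders chosen for this example.
enable_autopurge() {
  cfg=$1
  grep -q '^autopurge\.purgeInterval=' "$cfg" \
    || echo 'autopurge.purgeInterval=24' >> "$cfg"
  grep -q '^autopurge\.snapRetainCount=' "$cfg" \
    || echo 'autopurge.snapRetainCount=3' >> "$cfg"
}
```

ZooKeeper reads zoo.cfg at startup, so the server would need a restart for the
settings to take effect.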

Graeme Seaton would be the best resource for using Zookeeper properly; he's
on this list and I've cc'd him directly as well.

Karl





Re: Connection status:Threw exception: 'Driver class not found: com.mysql.jdbc.Driver'

2014-09-16 Thread Karl Wright
Hi Mario,

Did you try putting quotes in your query around $(IDCOLUMN) as it
suggests?  For some databases this is necessary to preserve case properly.

Karl











Re: MCF cluster agents do not shut down.

2014-09-16 Thread Karl Wright
Hi Lalit,

I asked for thread dumps, not logs.  To get a thread dump, use your jdk's
jstack command on the agents process that won't shut down.
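
A minimal sketch of capturing such a dump to a file. `jstack <pid>` is the
standard JDK tool; the function name `dump_threads` and the output file name
are made up for this example. It assumes the JDK's bin directory is on the
PATH and that the command is run as the same user that owns the agents
process.

```shell
#!/bin/sh
# dump_threads PID [OUTDIR]
# Writes a thread dump of the given JVM process to a file and prints the
# file's path. The file naming scheme is arbitrary.
dump_threads() {
  pid=$1
  outdir=${2:-.}
  out="$outdir/agents-threads-$pid.txt"
  jstack "$pid" > "$out" && echo "$out"
}
```

The process ID can be looked up with `jps -l`, which lists running JVMs by
main class. In the multiprocess examples the agents process would likely show
up as org.apache.manifoldcf.agents.AgentRun.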

Thanks,
Karl



Re: Getting errors in zookeeper logs

2014-09-16 Thread Karl Wright
Hi Lalit,

I believe there is no space between -Xmx and 1024m:  -Xmx1024m.  Same
with -Xms.

Karl
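
Besides the stray space, a value holding two options has to be quoted, or the shell hands the second option to export as a separate argument. A minimal sketch, assuming the setting lives in a file that the ZooKeeper start scripts source (the stock zkEnv.sh reads conf/java.env when it exists); the temporary path below is a stand-in for illustration, not a path taken from this thread:

```shell
#!/bin/sh
# Hypothetical stand-in for zookeeper/conf/java.env; adjust for a real install.
ENV_FILE="${TMPDIR:-/tmp}/java.env.example"

# Quote the whole value: unquoted, the shell would assign only -Xms1024m
# and pass -Xmx1024m to `export` as a separate (invalid) name.
cat > "$ENV_FILE" <<'EOF'
export JVMFLAGS="-Xms1024m -Xmx1024m"
EOF

# Source the file the way a start script would, then show the result.
. "$ENV_FILE"
echo "JVMFLAGS=$JVMFLAGS"
```

After a restart, the running process list (for example ps output) should show both options on the ZooKeeper java command line if the file was picked up.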


On Tue, Sep 16, 2014 at 4:25 AM, lalit jangra lalit.j.jan...@gmail.com
wrote:

 Greetings,

 I updated the ZooKeeper Java heap settings by adding java/.conf under the
 zookeeper/conf folder, added the line below on all six ZooKeeper nodes, and
 restarted.

 export JVMFLAGS=-Xms 1024m -Xmx 1024m

 I still see ZooKeeper connection resets while starting the agent, and my
 crawl is stuck.

 Please suggest. Is there any way to read the ZooKeeper logs, as these are
 in binary format?

 Regards.



 On Mon, Sep 15, 2014 at 11:58 PM, lalit jangra lalit.j.jan...@gmail.com
 wrote:

 Thanks Karl,

 I am running the ZooKeeper instances with the zkServer.sh script, and I will
 try your suggestions.

 Regards.

 On Mon, Sep 15, 2014 at 10:48 PM, Karl Wright daddy...@gmail.com wrote:

 If you are running a batch/shell script to start zookeeper, have a look
 at the script you are running.  I am sure there is a way to include an
 environment variable that controls the amount of memory, or at least Java
 options.  The java option you'd include would be something like: -Xmx500m
 (for 500 megabytes), or -Xmx1g (for 1 gigabyte), etc.

 Karl


 On Mon, Sep 15, 2014 at 1:16 PM, Karl Wright daddy...@gmail.com wrote:

 How are you starting your zookeeper instances?
 Karl


 On Mon, Sep 15, 2014 at 1:14 PM, lalit jangra lalit.j.jan...@gmail.com
  wrote:

 Thanks Karl,

 After updating the configuration, I am still hitting the same ZooKeeper
 connection reset issue.

 I was trying to assign memory to the ZooKeeper instances but could not
 see any way to do so. Can you suggest one?

 What else can I do?


 Regards.

 On Mon, Sep 15, 2014 at 10:39 PM, Karl Wright daddy...@gmail.com
 wrote:

 Hi Lalit,

 If you have more than one unspecified Java process, EACH ONE will
 allocate 25% of available memory by default.  So you will have to do more
 than just free up some MCF memory to get this to work.

 Karl


 On Mon, Sep 15, 2014 at 12:29 PM, lalit jangra 
 lalit.j.jan...@gmail.com wrote:

 Thanks Karl,

 I think this is why my ZooKeeper nodes are resetting connections: they
 are unstable. In the meantime I will reduce MCF memory to 1.5G and leave
 the rest unassigned, so that 5.5G is left for Java itself, which is more
 than the 25% rule needs, and see if it works.

 I also checked the ZooKeeper documentation but could not find any specific
 guidance in it.
 Regards.

 On Mon, Sep 15, 2014 at 9:52 PM, Karl Wright daddy...@gmail.com
 wrote:

 Hi Lalit,

 I can't speak for Solr's memory consumption, but you absolutely
 need to give Solr enough memory to avoid OOM errors or things will not
 work properly.

 As for MCF, 3G is more than enough; probably you could give it 1G
 and be fine.

 For Zookeeper, remember that it is a Java process.  On 64-bit unix
 machines, Java by default takes 25% of the total system memory.  I would
 look at their documentation to figure out what they need, and assign
 precisely that amount, otherwise zk will obviously not be stable.

 Thanks,
 Karl


 On Mon, Sep 15, 2014 at 12:17 PM, lalit jangra 
 lalit.j.jan...@gmail.com wrote:

 Hi Karl,

 Out of 12G, I have assigned 5G to Solr, as I was seeing a lot of Out
 of Memory errors/Java heap space issues while crawling large jobs; since
 then it seems to be OK. I have also assigned 3G to MCF, where it is quite
 comfortable. I am assuming the remaining 4G is enough for the OS and the
 ZooKeeper nodes. I am currently running a job for 35K documents and can
 see more than 500MB of memory free.

 Any thoughts?

 Regards.

 On Mon, Sep 15, 2014 at 8:45 PM, Karl Wright daddy...@gmail.com
 wrote:

 HI Lalit,

 The best way in Java to assess memory usage is to turn on JVM
 garbage collection verbose output.  Then you can see how often the system
 garbage collects etc, and whether post-GC usage grows over time.

 12G should be more than enough, so if you find you are running
 into memory limits with that configuration, it would be worth trying to
 figure out why that is happening.

 Karl


 On Mon, Sep 15, 2014 at 11:04 AM, lalit jangra 
 lalit.j.jan...@gmail.com wrote:

 Hi Karl,

 Could the ZooKeeper connection reset messages be caused by the system
 running at the top of its memory limits? I have 12G of RAM and can see
 it using 11.5G while a job is running.


 Is there any way to work out how much memory to give the ZooKeeper nodes
 and, if so, is there any yardstick?

 Regards.

 On Mon, Sep 15, 2014 at 7:16 PM, Karl Wright daddy...@gmail.com
  wrote:

 Hi Lalit,

 Looks like this is the result of a tomcat shutdown, and is a
 probable race condition bug in Zookeeper:


 http://mail-archives.apache.org/mod_mbox/tomcat-users/201306.mbox/%3cbay174-w32b2284bedae503e9d22d3a8...@phx.gbl%3E

 Karl


 On Mon, Sep 15, 2014 at 9:41 AM, lalit jangra 
 lalit.j.jan...@gmail.com wrote:

 Hi Karl,

 Along with this, i could see below errors in tomcat
 catalina.out.

 Sep 15, 2014 1:06:14 PM
 org.apache.catalina.loader.WebappClassLoader 

Re: Zookeeper configured MCF not working in production mode

2014-09-16 Thread Karl Wright
Hi Erlend,

If you could obtain a thread dump from the agents process when MCF hangs
that would also be very helpful.


IN GENERAL, when something hangs in Java, it's essential to get a thread
dump in order to diagnose the problem.

Thanks,
Karl
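
For readers who have not taken a thread dump before, here is a minimal sketch. It assumes a JDK is installed so that jstack is on the PATH; the function name, its arguments, and the output file are illustrative and not part of ManifoldCF:

```shell
#!/bin/sh
# dump_threads PID OUTFILE
# Writes a thread dump of the JVM with the given PID to OUTFILE using jstack
# (a JDK tool). The function name and arguments are illustrative only.
dump_threads() {
  pid="$1"
  out="$2"
  if [ -z "$pid" ] || [ -z "$out" ]; then
    echo "usage: dump_threads PID OUTFILE" >&2
    return 2
  fi
  if ! command -v jstack >/dev/null 2>&1; then
    # Without a JDK, `kill -3 PID` makes the JVM print the dump to its own stdout.
    echo "jstack not found; try: kill -3 $pid" >&2
    return 1
  fi
  jstack "$pid" > "$out"
}
```

A call would look like dump_threads 12345 /tmp/mcf_agent.threads.txt, where 12345 stands for the PID of the agents JVM. With the kill -3 fallback, the dump lands wherever the process's stdout is redirected.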


On Tue, Sep 16, 2014 at 5:06 AM, Erlend Garåsen e.f.gara...@usit.uio.no
wrote:

 On 16.09.14 10:53, lalit jangra wrote:

 Hi Erlend,

 Can you please elaborate on how you have configured ZooKeeper-based
 synchronization: is it in standalone mode or clustered mode? How many
 ZooKeeper nodes are you running on each node, and how many agents are
 you running?


 I'm not very familiar with Zookeeper, so I have just followed the examples
 inside the multiprocess-zk-example folder, i.e.:
 $MCF_HOME/../runzookeeper.sh > /dev/null 2>&1 &
 # Reading global properties:
 $MCF_HOME/../setglobalproperties.sh > /dev/null 2>&1 &
 # Starting Agent process:
 $MCF_HOME/processes/executecommand.sh org.apache.manifoldcf.agents.AgentRun \
 1>$LOGDIR/mcf_agent.stdout.log 2>$LOGDIR/mcf_agent.stderr.log &
 pid=$!

 The above lines are from my startup script. I see now that I haven't
 specified -Dorg.apache.manifoldcf.processid=A, I'm not sure this is
 important, but I can of course try to include that into my script and
 restart everything.

 So to the question about how many zookeeper nodes I'm using, the answer is
 one. The same applies to the number of running agents.

 Erlend



Re: Getting errors in zookeeper logs

2014-09-16 Thread lalit jangra
Sorry Karl,

It's a typo; the actual value is export JVMFLAGS=-Xms1024m -Xmx1024m.

Regards.


ManifoldCF and Zookeeper

2014-09-16 Thread Karl Wright
Hi all,

I added Zookeeper synchronization support in 1.5 as part of making MCF
clusterable.  While this has worked for some, others have had
difficulty getting a zookeeper setup correctly configured for long-term
crawling.  I'd very much love to find out what is wrong with these failing
setups, and (if possible) add a page to our documentation about zookeeper
best practices.  However, I'm definitely not the person to do that.

Are there any readers on this forum who have had success in this area?  If
so, can you let us know what you did to get zookeeper happy for the long
term?

Thanks,
Karl


Re: Zookeeper configured MCF not working in production mode

2014-09-16 Thread Karl Wright
Sorry, Erlend, I missed that.  The thread dump indicates that it is waiting
for the Zookeeper server to respond.  Do you have corresponding zookeeper
server logs?

Thanks,
Karl


On Tue, Sep 16, 2014 at 7:08 AM, Erlend Garåsen e.f.gara...@usit.uio.no
wrote:

 On 16.09.14 12:27, Karl Wright wrote:

  If you could obtain a thread dump from the agents process when MCF hangs
 that would also be very helpful.


 Hmm, I thought I did that:
 http://folk.uio.no/erlendfg/manifoldcf/

 Erlend



Re: Zookeeper configured MCF not working in production mode

2014-09-16 Thread lalit jangra
Hello,

To keep ZooKeeper from taking too much disk space, use the parameters
below. They purge old data that is no longer needed.

autopurge.snapRetainCount=3 : default value
autopurge.purgeInterval=1: default value

Feel free to adjust them as needed.

Regards.
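
One way to apply these two settings without hand-editing is sketched below. The helper name set_prop and the temporary config path are assumptions made for illustration. As far as I know, the ZooKeeper documentation describes autopurge.purgeInterval as a number of hours, with 0 (purging disabled) used when the key is absent, so it is worth setting explicitly:

```shell
#!/bin/sh
# set_prop FILE KEY VALUE
# Sets KEY=VALUE in a properties-style file: rewrites the line if KEY is
# already present, appends it otherwise. The helper name is illustrative.
set_prop() {
  file="$1"; key="$2"; value="$3"
  if grep -q "^${key}=" "$file" 2>/dev/null; then
    # Rewrite the existing line via a temp file (portable; avoids sed -i).
    sed "s|^${key}=.*|${key}=${value}|" "$file" > "$file.tmp" && mv "$file.tmp" "$file"
  else
    echo "${key}=${value}" >> "$file"
  fi
}

# Hypothetical config path; a real install would point at its own zoo.cfg.
CFG="${TMPDIR:-/tmp}/zoo.cfg.example"
printf 'tickTime=2000\ndataDir=/tmp/zookeeper\nclientPort=2181\n' > "$CFG"

set_prop "$CFG" autopurge.snapRetainCount 3   # keep the 3 newest snapshots
set_prop "$CFG" autopurge.purgeInterval 1     # run the purge task every hour
cat "$CFG"
```

The settings are read at startup, so each ZooKeeper node needs a restart after the change.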

On Tue, Sep 16, 2014 at 3:46 PM, Karl Wright daddy...@gmail.com wrote:

 Hi Erlend,

 The zookeeper configuration supplied will likely fill up your disk with
 zookeeper synch data, because the parameters that control the cleanup of
 that data are not properly set up for long-term execution.

 Graeme Seaton would be the best resource for using Zookeeper properly;
 he's on this list and I've cc'd him directly as well.

 Karl







-- 
Regards,
Lalit.


Re: Zookeeper configured MCF not working in production mode

2014-09-16 Thread Karl Wright
Hi Erlend,

How many worker threads are you using?  How many documents (about) do you
crawl before things hang?

You may also want to try to increase the parameter: maxClientCnxns in
zookeeper.cfg to something bigger, if you have a lot of worker threads.
I'm thinking 1000 or some such.  See if it makes a difference for you.

I'll try a large crawl here using Zookeeper also, but it would be good to
know your parameters before I begin.

Karl
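
A quick way to see what limit a config file currently carries is sketched below. It assumes, as the suggestion above does, that the number of client connections grows with the number of worker threads. The helper name, the temporary file, and the worker count are made up for illustration. In ZooKeeper, maxClientCnxns limits concurrent connections per client IP address, and 0 removes the limit:

```shell
#!/bin/sh
# cnxn_limit FILE
# Prints the maxClientCnxns value from a ZooKeeper config file,
# or "unset" if the key is absent. The helper name is illustrative.
cnxn_limit() {
  value=$(sed -n 's/^maxClientCnxns=//p' "$1" | tail -n 1)
  echo "${value:-unset}"
}

# Hypothetical config file for illustration.
CFG="${TMPDIR:-/tmp}/zoo.cfg.cnxns"
printf 'clientPort=2181\nmaxClientCnxns=1000\n' > "$CFG"

limit=$(cnxn_limit "$CFG")
workers=100   # assumed worker thread count, not taken from this thread
if [ "$limit" != "unset" ] && [ "$limit" -ne 0 ] && [ "$limit" -lt "$workers" ]; then
  echo "maxClientCnxns=$limit may be too low for $workers worker threads"
else
  echo "maxClientCnxns=$limit"
fi
```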




Re: Connection status:Threw exception: 'Driver class not found: com.mysql.jdbc.Driver'

2014-09-16 Thread Bisonti Mario
Yes, but I got the same error with:

SELECT command_id AS “$(IDCOLUMN)” FROM icinga_commands

I tried the query SELECT command_id AS $(IDCOLUMN) FROM icinga_commands from a
MySQL client and it works.







From: Karl Wright [mailto:daddy...@gmail.com]
Sent: Tuesday, 16 September 2014 12:17
To: user@manifoldcf.apache.org
Subject: Re: Connection status:Threw exception: 'Driver class not found: com.mysql.jdbc.Driver'

Hi Mario,
Did you try putting quotes in your query around $(IDCOLUMN) as it suggests?  
For some databases this is necessary to preserve case properly.
Karl

On Tue, Sep 16, 2014 at 4:53 AM, Bisonti Mario mario.biso...@vimar.com wrote:
When I start a test job to extract a table I get:

Error: Bad seed query; doesn't return $(IDCOLUMN) column. Try using quotes
around $(IDCOLUMN) variable, e.g. "$(IDCOLUMN)".

My configuration:

Seeding query:
SELECT command_id AS $(IDCOLUMN) FROM icinga_commands

Data query:
SELECT command_id AS $(IDCOLUMN), command_line AS $(URLCOLUMN), object_id AS $(DATACOLUMN) FROM icinga_commands WHERE command_id IN $(IDLIST)

What could I check?

Thanks a lot

Mario


From: Bisonti Mario
Sent: Tuesday, 16 September 2014 09:58
To: 'user@manifoldcf.apache.org'
Subject: Re: Connection status:Threw exception: 'Driver class not found: com.mysql.jdbc.Driver'

Thanks a lot!

Connection now is working!

Mario



From: Karl Wright [mailto:daddy...@gmail.com]
Sent: Monday, 15 September 2014 16:26

To: user@manifoldcf.apache.org
Subject: Re: Connection status:Threw exception: 'Driver class not found: com.mysql.jdbc.Driver'

Hi Mario,
You can download the -src and -lib distribution, and then run ant make-deps 
build, and you should be able to use proprietary MySQL database connections.

Thanks,
Karl

On Mon, Sep 15, 2014 at 10:24 AM, Bisonti Mario mario.biso...@vimar.com wrote:
I understand.

In fact, I don't have example-proprietary because I use a binary version of
ManifoldCF, so I can't use MySQL as a repository connection.

Thanks a lot.




From: Karl Wright [mailto:daddy...@gmail.com]
Sent: Monday, 15 September 2014 16:21

To: user@manifoldcf.apache.org
Subject: Re: Connection status:Threw exception: 'Driver class not found: com.mysql.jdbc.Driver'

Hi Mario,
cd /usr/share/manifoldcf/example-proprietary
sudo java -jar start.jar
If you do not have example-proprietary, it is because you did not actually 
build ManifoldCF yourself.  In order to use MySQL as a backend, you must build 
ManifoldCF yourself.

Thanks,
Karl

On Mon, Sep 15, 2014 at 10:14 AM, Bisonti Mario mario.biso...@vimar.com wrote:
I am in
/usr/share/manifoldcf/example/
and I execute: sudo java -jar start.jar

However, mysql-connector-java-5.1.32-bin.jar is in
/usr/share/manifoldcf/connector-lib-proprietary/
So, how should I run ManifoldCF?
Excuse me, but I am not a Linux expert…
Thanks a lot.



From: Karl Wright [mailto:daddy...@gmail.com]
Sent: Monday, 15 September 2014 16:04
To: user@manifoldcf.apache.org
Subject: Re: Connection status:Threw exception: 'Driver class not found: com.mysql.jdbc.Driver'

Hi Mario,
You need to run ManifoldCF out of one of the example-proprietary directories in 
order for it to pick up the mysql jar in the classpath.

Thanks,
Karl

On Mon, Sep 15, 2014 at 9:32 AM, Bisonti Mario mario.biso...@vimar.com wrote:


Hello.

I tried to set up a MySQL repository connection but I get the error mentioned.

I put mysql-connector-java-5.1.32-bin.jar in the
/apache-manifoldcf-1.7/connector-lib-proprietary
and
/apache-manifoldcf-1.7/lib-proprietary
folders, but I get the error in the subject line.

What could I check?

Thanks a lot

Mario






Re: Connection status:Threw exception: 'Driver class not found: com.mysql.jdbc.Driver'

2014-09-16 Thread Bisonti Mario
Yes, it works, and “ “ aren’t necessary.

Note this: from the MySQL client,
SELECT command_id AS $(IDCOLUMN) FROM icinga_commands
does not work, whereas
SELECT command_id AS “$(IDCOLUMN)” FROM icinga_commands
works.
So it seems that “ “ are necessary, but when I use them inside ManifoldCF it
doesn’t work with “ “.


Mario



From: Karl Wright [mailto:daddy...@gmail.com]
Sent: Tuesday, 16 September 2014 13:50
To: user@manifoldcf.apache.org
Subject: Re: Connection status:Threw exception: 'Driver class not found: com.mysql.jdbc.Driver'

Hi Mario,
What's happening is that the JDBC connector cannot find the proper column in 
the resultset.
Can you do the following in the mysql client:
SELECT command_id AS lcf__id FROM icinga_commands
Please let me know what the returned columns are.  If there is not a column 
that precisely matches lcf__id then that explains the error.
Karl








Re: Connection status:Threw exception: 'Driver class not found: com.mysql.jdbc.Driver'

2014-09-16 Thread Karl Wright
Hi Mario,

I've created CONNECTORS-1032 to track your issue.  MySQL queries have worked
fine in the past against MySQL 5.5.  So I suggest that you try
single-quotes, and if that does not work either, we're going to need
more information and some debugging time.

First -- what version of MySQL is this?
Second, what version of MCF are you working with?

Once I have that information, I will propose a debugging output patch that
will let us see what column names the JDBC query is returning.
Please attach the information as a comment to the ticket.

Thanks,
Karl


On Tue, Sep 16, 2014 at 7:59 AM, Bisonti Mario mario.biso...@vimar.com
wrote:

  Yes, it works, and “ “ aren’t necessary.



 Note this:
 from MySql Client

 SELECT command_id AS $(IDCOLUMN) FROM icinga_commands
 not work
 instead
 SELECT command_id AS “$(IDCOLUMN)” FROM icinga_commands

 Works.

 So it seems that “ “ are necessary, but when I use insiede ManifoldCF it
 doesn’t work with “ “





 Mario







 *Da:* Karl Wright [mailto:daddy...@gmail.com]
 *Inviato:* martedì 16 settembre 2014 13:50

 *A:* user@manifoldcf.apache.org
 *Oggetto:* Re: Connection status:Threw exception: 'Driver class not
 found: com.mysql.jdbc.Driver'



 Hi Mario,

 What's happening is that the JDBC connector cannot find the proper column
 in the resultset.

 Can you do the following in the mysql client:

 SELECT command_id AS lcf__id FROM icinga_commands

 Please let me know what the returned columns are.  If there is not a
 column that precisely matches lcf__id then that explains the error.

 Karl



 On Tue, Sep 16, 2014 at 7:41 AM, Bisonti Mario mario.biso...@vimar.com
 wrote:

  Yes, but I obtained the same error.



 SELECT command_id AS “$(IDCOLUMN)” FROM icinga_commands

 I tried the query SELECT command_id AS $(IDCOLUMN) FROM icinga_commands
 by a MySql Client and it works.















 *Da:* Karl Wright [mailto:daddy...@gmail.com]
 *Inviato:* martedì 16 settembre 2014 12:17


 *A:* user@manifoldcf.apache.org
 *Oggetto:* Re: Connection status:Threw exception: 'Driver class not
 found: com.mysql.jdbc.Driver'



 Hi Mario,

 Did you try putting quotes in your query around $(IDCOLUMN) as it
 suggests?  For some databases this is necessary to preserve case properly.

 Karl



 On Tue, Sep 16, 2014 at 4:53 AM, Bisonti Mario mario.biso...@vimar.com
 wrote:

  When I start a test  job to extract a table I obtain:



 Error: Bad seed query; doesn't return $(IDCOLUMN) column. Try using quotes
 around $(IDCOLUMN) variable, e.g. $(IDCOLUMN).



 My configuration:



 Seeding query:

 SELECT command_id AS $(IDCOLUMN) FROM icinga_commands



 Data query:
 SELECT command_id AS $(IDCOLUMN), command_line AS $(URLCOLUMN),object_id
 AS $(DATACOLUMN) FROM icinga_commands where command_id IN $(IDLIST)



 What could I check?



 Thanks a lot



 Mario





 *Da:* Bisonti Mario
 *Inviato:* martedì 16 settembre 2014 09:58
 *A:* 'user@manifoldcf.apache.org'
 *Oggetto:* R: Connection status:Threw exception: 'Driver class not found:
 com.mysql.jdbc.Driver'



 Thanks a lot!



 Connection now is working!



 Mario







 *Da:* Karl Wright [mailto:daddy...@gmail.com daddy...@gmail.com]
 *Inviato:* lunedì 15 settembre 2014 16:26


 *A:* user@manifoldcf.apache.org
 *Oggetto:* Re: Connection status:Threw exception: 'Driver class not
 found: com.mysql.jdbc.Driver'



 Hi Mario,

 You can download the -src and -lib distribution, and then run ant
 make-deps build, and you should be able to use proprietary MySQL database
 connections.

 Thanks,
 Karl



 On Mon, Sep 15, 2014 at 10:24 AM, Bisonti Mario mario.biso...@vimar.com
 wrote:

  I understood.



 Infact, I haven’t example-proprietary because I use a binary version of
 ManifoldCF so i can’t use MySQL as repository connection.



 Thanks a lot.









 *Da:* Karl Wright [mailto:daddy...@gmail.com]
 *Inviato:* lunedì 15 settembre 2014 16:21


 *A:* user@manifoldcf.apache.org
 *Oggetto:* Re: Connection status:Threw exception: 'Driver class not
 found: com.mysql.jdbc.Driver'



 Hi Mario,

 cd /usr/share/manifoldcf/example-proprietary
 sudo java –jar start.jar

 If you do not have example-proprietary, it is because you did not actually
 build ManifoldCF yourself.  In order to use MySQL as a backend, you must
 build ManifoldCF yourself.

 Thanks,
 Karl



 On Mon, Sep 15, 2014 at 10:14 AM, Bisonti Mario mario.biso...@vimar.com
 wrote:

  I am in

 /usr/share/manifoldcf/example/

 and I execute: sudo java -jar start.jar

 However, mysql-connector-java-5.1.32-bin.jar is in
 /usr/share/manifoldcf/connector-lib-proprietary/

 So, how should I run ManifoldCF?

 Excuse me, but I am not a Linux expert.

 Thanks a lot.







 *From:* Karl Wright [mailto:daddy...@gmail.com]
 *Sent:* Monday, 15 September 2014 16:04
 *To:* user@manifoldcf.apache.org
 *Subject:* Re: Connection status:Threw exception: 'Driver class not
 found: com.mysql.jdbc.Driver'



 Hi Mario,

 You need to run ManifoldCF out of one of the example-proprietary
 directories 

Re: Zookeeper configured MCF not working in production mode

2014-09-16 Thread Karl Wright
Believe it or not, I was able to reproduce this here with a crawl of 10
documents.  I get this in the Zookeeper server-side log, hundreds of times:


[SyncThread:0] ERROR org.apache.zookeeper.server.NIOServerCnxn - Unexpected Exception:
java.nio.channels.CancelledKeyException
        at sun.nio.ch.SelectionKeyImpl.ensureValid(SelectionKeyImpl.java:73)
        at sun.nio.ch.SelectionKeyImpl.interestOps(SelectionKeyImpl.java:77)
        at org.apache.zookeeper.server.NIOServerCnxn.sendBuffer(NIOServerCnxn.java:153)
        at org.apache.zookeeper.server.NIOServerCnxn.sendResponse(NIOServerCnxn.java:1076)
        at org.apache.zookeeper.server.FinalRequestProcessor.processRequest(FinalRequestProcessor.java:170)
        at org.apache.zookeeper.server.SyncRequestProcessor.flush(SyncRequestProcessor.java:167)
        at org.apache.zookeeper.server.SyncRequestProcessor.run(SyncRequestProcessor.java:101)
[SyncThread:0] ERROR org.apache.zookeeper.server.NIOServerCnxn - Unexpected Exception:
java.nio.channels.CancelledKeyException
        at sun.nio.ch.SelectionKeyImpl.ensureValid(SelectionKeyImpl.java:73)
        at sun.nio.ch.SelectionKeyImpl.interestOps(SelectionKeyImpl.java:77)
        at org.apache.zookeeper.server.NIOServerCnxn.sendBuffer(NIOServerCnxn.java:153)
        at org.apache.zookeeper.server.NIOServerCnxn.sendResponse(NIOServerCnxn.java:1076)
        at org.apache.zookeeper.server.FinalRequestProcessor.processRequest(FinalRequestProcessor.java:170)
        at org.apache.zookeeper.server.SyncRequestProcessor.flush(SyncRequestProcessor.java:167)
        at org.apache.zookeeper.server.SyncRequestProcessor.run(SyncRequestProcessor.java:101)


... and then everything locks up.  I have no idea what is happening; seems
to be an NIO exception ZooKeeper is not expecting.

Karl


On Tue, Sep 16, 2014 at 7:52 AM, Erlend Garåsen e.f.gara...@usit.uio.no
wrote:


 Ouch, I forgot to place the Zookeeper logs on web. Since they do not
 include timestamps and I have restarted MCF after a few changes, I guess it
 will be difficult to get the relevant lines. I'll do that next time it
 hangs, probably in the end of the day.

 I will add the new Zookeeper configuration settings as Lalit suggested
 next time I'm restarting MCF.

  How many worker threads are you using?  How many documents (about) do
 you crawl before things hang?


 Throttling - max connections: 30
 Throttling - Max fetches/min: 100
 Bandwidth - max connections: 25
 Bandwidth - max kbytes/sec: 8000
 Bandwidth - max fetches/min: 20

 I have four jobs configured. The one I'm running now has 100,000 documents
 configured. In total, around 110,000 documents for all four jobs.

 I guess there are more documents involved since the largest job excludes a
 lot of documents based on sophisticated and complex filtering rules. Maybe
 50% more even though they are not added to Solr (but they are of course
 fetched).

 Erlend


 You may also want to try to increase the parameter: maxClientCnxns in
 zookeeper.cfg to something bigger, if you have a lot of worker threads.
 I'm thinking 1000 or some such.  See if it makes a difference for you.


 I'll try that at next restart.

 Erlend



R: Connection status:Threw exception: 'Driver class not found: com.mysql.jdbc.Driver'

2014-09-16 Thread Bisonti Mario
MySQL version: '5.5.38'

ManifoldCF: 1.7

The problem still remains with single quotes (' ').

How can I attach a comment to the ticket, please?


Mario




From: Karl Wright [mailto:daddy...@gmail.com]
Sent: Tuesday, 16 September 2014 14:05
To: user@manifoldcf.apache.org
Subject: Re: Connection status:Threw exception: 'Driver class not found: 
com.mysql.jdbc.Driver'

Hi Mario,
I've created CONNECTORS-1032 to track your issue.  MySQL queries have worked 
fine in the past against MySQL 5.5.  So I suggest that you try single quotes, 
and if that does not work either, we're going to need more information 
and some debugging time.
First -- what version of MySQL is this?
Second, what version of MCF are you working with?
Once I have that information, I will propose a debugging output patch that 
will let us see which column names the JDBC query is returning.  Please attach 
the information as a comment to the ticket.

Thanks,
Karl

On Tue, Sep 16, 2014 at 7:59 AM, Bisonti Mario 
mario.biso...@vimar.com wrote:
Yes, it works, and double quotes aren't necessary.

Note this, from the MySQL client:
SELECT command_id AS $(IDCOLUMN) FROM icinga_commands
does not work, whereas
SELECT command_id AS "$(IDCOLUMN)" FROM icinga_commands
works.
So it seems that double quotes are necessary, but when I use them inside 
ManifoldCF, the query doesn't work with double quotes.


Mario



From: Karl Wright [mailto:daddy...@gmail.com]
Sent: Tuesday, 16 September 2014 13:50

To: user@manifoldcf.apache.org
Subject: Re: Connection status:Threw exception: 'Driver class not found: 
com.mysql.jdbc.Driver'

Hi Mario,
What's happening is that the JDBC connector cannot find the proper column in 
the resultset.
Can you do the following in the mysql client:
SELECT command_id AS lcf__id FROM icinga_commands
Please let me know what the returned columns are.  If there is not a column 
that precisely matches lcf__id then that explains the error.
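
The check described above can be sketched roughly as follows. This is only an illustration, not ManifoldCF's own code: it uses Python's built-in sqlite3 module as a stand-in for the MySQL server, a made-up one-row icinga_commands table, and a hypothetical helper named result_columns(). Only the column alias lcf__id and the table and column names come from the thread.

```python
import sqlite3


def result_columns(conn, query):
    """Return the column names of the query's result set (hypothetical helper)."""
    cursor = conn.execute(query)
    return [col[0] for col in cursor.description]


# In-memory SQLite database standing in for the real MySQL server (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE icinga_commands "
    "(command_id INTEGER, command_line TEXT, object_id INTEGER)"
)
conn.execute("INSERT INTO icinga_commands VALUES (1, 'check_ping', 10)")

columns = result_columns(conn, "SELECT command_id AS lcf__id FROM icinga_commands")

# The seeding query is only usable if a column named exactly lcf__id comes back.
has_id_column = "lcf__id" in columns
print(columns, has_id_column)
```

Running the same query in the mysql client and reading the header row of the output gives the equivalent information for the real database.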
Karl

On Tue, Sep 16, 2014 at 7:41 AM, Bisonti Mario 
mario.biso...@vimar.com wrote:
Yes, but I got the same error with:

SELECT command_id AS "$(IDCOLUMN)" FROM icinga_commands

I tried the query SELECT command_id AS $(IDCOLUMN) FROM icinga_commands in a 
MySQL client and it works.







From: Karl Wright [mailto:daddy...@gmail.com]
Sent: Tuesday, 16 September 2014 12:17

To: user@manifoldcf.apache.org
Subject: Re: Connection status:Threw exception: 'Driver class not found: 
com.mysql.jdbc.Driver'

Hi Mario,
Did you try putting quotes in your query around $(IDCOLUMN) as it suggests?  
For some databases this is necessary to preserve case properly.
Karl

On Tue, Sep 16, 2014 at 4:53 AM, Bisonti Mario 
mario.biso...@vimar.com wrote:
When I start a test job to extract a table, I get:

Error: Bad seed query; doesn't return $(IDCOLUMN) column. Try using quotes 
around $(IDCOLUMN) variable, e.g. $(IDCOLUMN).

My configuration:

Seeding query:
SELECT command_id AS $(IDCOLUMN) FROM icinga_commands

Data query:
SELECT command_id AS $(IDCOLUMN), command_line AS $(URLCOLUMN),object_id AS 
$(DATACOLUMN) FROM icinga_commands where command_id IN $(IDLIST)

What could I check?

Thanks a lot

Mario




Re: Connection status:Threw exception: 'Driver class not found: com.mysql.jdbc.Driver'

2014-09-16 Thread Karl Wright
Hi Mario,

Register and log in, and then you can attach a comment.

Karl



Re: Zookeeper configured MCF not working in production mode

2014-09-16 Thread Karl Wright
After some research, I found that increasing the zookeeper.cfg tick time
count from 2000 to 5000 makes this problem go away for me.

Clearly we still have an issue with resetting ZooKeeper connections after
tick timeout failures.  The connections are reset, but the state of the
connections is somehow incorrect.  I'll need to do more research to figure
out how this can be addressed.

For the interim, increasing the tick time seems to be a reasonable
workaround.
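
A minimal sketch of applying that workaround to a configuration file, assuming the "tick time" above corresponds to ZooKeeper's tickTime setting (in milliseconds). The helper set_cfg_values() and the two-line sample file are hypothetical and only for illustration; the real zookeeper.cfg will contain other settings that should be left alone.

```python
def set_cfg_values(cfg_text, updates):
    """Return cfg_text with each key in updates set to its new value.

    Hypothetical helper: existing key=value lines are rewritten in place,
    comment lines are kept, and keys not yet present are appended.
    """
    remaining = dict(updates)
    out = []
    for line in cfg_text.splitlines():
        key = line.split("=", 1)[0].strip()
        is_setting = "=" in line and not line.lstrip().startswith("#")
        if is_setting and key in remaining:
            out.append(f"{key}={remaining.pop(key)}")
        else:
            out.append(line)
    # Append any settings that were not already in the file.
    out.extend(f"{key}={value}" for key, value in remaining.items())
    return "\n".join(out) + "\n"


# Sample content only; clientPort=2181 is an illustrative value.
sample = "tickTime=2000\nclientPort=2181\n"
updated = set_cfg_values(sample, {"tickTime": 5000})
print(updated)
```

The ZooKeeper server has to be restarted before a changed value takes effect.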

Thanks,
Karl





Re: Getting errors in zookeeper logs

2014-09-16 Thread Karl Wright
Hi Lalit,

Please see my email about increasing the value of the tick interval
significantly.  I think this will help a lot.  There are still issues that
I need to deal with, but you may be able to succeed in the interim with
that one change.

Hopefully I'll also have a code fix as well, but that may take longer.

Thanks,
Karl


On Tue, Sep 16, 2014 at 6:33 AM, lalit jangra lalit.j.jan...@gmail.com
wrote:

 Sorry Karl,

 It's a typo; the actual value is export JVMFLAGS=-Xms1024m -Xmx1024m.

 Regards.

 On Tue, Sep 16, 2014 at 3:50 PM, Karl Wright daddy...@gmail.com wrote:

 Hi Lalit,

 I believe there is no space between -Xmx and 1024m:  -Xmx1024m.  Same
 with -Xms.
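
 To illustrate the point, here is a small sketch that flags heap options
 written with a space between the flag and its size. The helper
 find_bad_heap_flags() is hypothetical and only checks the -Xms/-Xmx form
 discussed here; it is not part of ZooKeeper or the JVM.

```python
import re
import shlex

# A well-formed heap flag: -Xms or -Xmx immediately followed by a size,
# for example -Xmx1024m or -Xmx1g.
HEAP_FLAG = re.compile(r"^-Xm[sx]\d+[kKmMgG]?$")


def find_bad_heap_flags(jvmflags):
    """Return the -Xms/-Xmx tokens that are not joined to a size (hypothetical helper)."""
    tokens = shlex.split(jvmflags)
    return [
        token
        for token in tokens
        if token.startswith(("-Xms", "-Xmx")) and not HEAP_FLAG.match(token)
    ]


bad = find_bad_heap_flags("-Xms 1024m -Xmx 1024m")
good = find_bad_heap_flags("-Xms1024m -Xmx1024m")
print(bad, good)
```

 Note also that when the value is set in a shell script, it normally needs
 quotes around it (export JVMFLAGS="-Xms1024m -Xmx1024m"), otherwise the
 shell treats the second option as a separate word.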

 Karl


 On Tue, Sep 16, 2014 at 4:25 AM, lalit jangra lalit.j.jan...@gmail.com
 wrote:

 Greetings,

 I updated the ZooKeeper Java heap settings by adding java/.conf under the
 zookeeper/conf folder, added the line below on all six ZooKeeper nodes, and
 restarted.

 export JVMFLAGS=-Xms 1024m -Xmx 1024m

 I can still see ZooKeeper connection resets while starting the agent, and my
 crawl is stuck.

 Please suggest. Is there any way to read the ZooKeeper logs, as these
 are in binary format?

 Regards.



 On Mon, Sep 15, 2014 at 11:58 PM, lalit jangra lalit.j.jan...@gmail.com
  wrote:

 Thanks Karl,

 I am running zookeepers using zkServer.sh script file and i will try
 with your suggestions.

 Regards.

 On Mon, Sep 15, 2014 at 10:48 PM, Karl Wright daddy...@gmail.com
 wrote:

 If you are running a batch/shell script to start zookeeper, have a
 look at the script you are running.  I am sure there is a way to include an
 environment variable that controls the amount of memory, or at least Java
 options.  The java option you'd include would be something like: -Xmx500m
 (for 500 megabytes), or -Xmx1g (for 1 gigabyte), etc.

 Karl


 On Mon, Sep 15, 2014 at 1:16 PM, Karl Wright daddy...@gmail.com
 wrote:

 How are you starting your zookeeper instances?
 Karl


 On Mon, Sep 15, 2014 at 1:14 PM, lalit jangra 
 lalit.j.jan...@gmail.com wrote:

 Thanks Karl,

 After updating the configuration, I am still hitting the same ZooKeeper
 connection reset issue.

 I was trying to assign memory to the ZooKeeper instances, but I could not
 see any way to do that. Can you suggest a way?

 What else can I do?


 Regards.

 On Mon, Sep 15, 2014 at 10:39 PM, Karl Wright daddy...@gmail.com
 wrote:

 Hi Lalit,

 If you have more than one unspecified Java process, EACH ONE will
 allocate 25% of available memory by default.  So you will have to do more
 than just free up some MCF memory to get this to work.

 Karl


 On Mon, Sep 15, 2014 at 12:29 PM, lalit jangra 
 lalit.j.jan...@gmail.com wrote:

 Thanks Karl,

 I think this is the reason why my ZooKeeper nodes are resetting
 connections: instability. What I will try in the meantime is to reduce
 MCF memory to 1.5G and leave the rest unassigned, so that comes to 5.5G
 for Java itself, more than the 25% rule, and see if it works.

 I also checked the ZooKeeper documentation, but I could not take any
 specific inputs from it.

 Regards.

 On Mon, Sep 15, 2014 at 9:52 PM, Karl Wright daddy...@gmail.com
 wrote:

 Hi Lalit,

 I can't speak for Solr's memory consumption, but you absolutely
 need to give Solr enough memory to avoid OOM errors or things will not work
 properly.

 As for MCF, 3G is more than enough; probably you could give it 1G
 and be fine.

 For Zookeeper, remember that it is a Java process.  On 64-bit
 unix machines, Java by default takes 25% of the total system memory.  I
 would look at their documentation to figure out what they need, and assign
 precisely that amount, otherwise zk will obviously not be stable.

 Thanks,
 Karl


 On Mon, Sep 15, 2014 at 12:17 PM, lalit jangra 
 lalit.j.jan...@gmail.com wrote:

 Hi Karl,

 Out of 12G, I have assigned 5G to Solr, as I could see a lot of
 Out of Memory errors/Java heap space issues while crawling large jobs; after
 that it seems to be OK. I have also assigned 3G to MCF, where it is quite
 comfortable. The remaining 4G, I am assuming, is enough for the OS and the
 ZooKeeper nodes. I am currently running a job for 35K documents and I can see
 more than 500MB of memory free.

 Any thoughts?

 Regards.

 On Mon, Sep 15, 2014 at 8:45 PM, Karl Wright daddy...@gmail.com
  wrote:

 HI Lalit,

 The best way in Java to assess memory usage is to turn on JVM
 garbage collection verbose output.  Then you can see how often the system
 garbage collects etc, and whether post-GC usage grows over time.

 12G should be more than enough, so if you find you are running
 into memory limits with that configuration, it would be worth trying to
 figure out why that is happening.

 Karl


 On Mon, Sep 15, 2014 at 11:04 AM, lalit jangra 
 lalit.j.jan...@gmail.com wrote:

 Hi Karl,

 Could I be seeing ZooKeeper connection reset messages because the system is
 running close to its memory limit? I have 12G of RAM and can see it's using
 11.5G while the job is running.


 Is there any way i should ascertain memory to zookeeper nodes
  if 

Re: Web crawling , robots.txt and access credentials

2014-09-16 Thread Karl Wright
Hi Mario,

I looked at your robots.txt.  In its current form, it should disallow
EVERYTHING from your site.  The reason is that some of your paths start
with /, but the allow clauses do not.

As for why MCF is letting files through, I suspect that this is because MCF
caches robots data.  If you changed the file and expected MCF to pick that
up immediately, it won't.  The cached copy expires after, I believe, 1
hour.  It's kept in the database so even if you recycle the agents process
it won't purge the cache.
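
To illustrate the leading-slash point, here is a small sketch using Python's standard urllib.robotparser module. It is not MCF's robots implementation, which may apply rules in a different order; Python's parser takes the first rule that matches, so the corrected example lists the Allow line before Disallow. The /ccc/ prefix in the corrected rule and the folder2 URL are assumptions made for the example.

```python
from urllib.robotparser import RobotFileParser


def allowed(robots_lines, url, agent="*"):
    """Parse the given robots.txt lines and report whether url may be fetched."""
    parser = RobotFileParser()
    parser.parse(robots_lines)
    return parser.can_fetch(agent, url)


# The rules as posted: the Allow path has no leading slash, so it never
# matches a URL path (URL paths always start with "/").
as_posted = [
    "User-agent: *",
    "Disallow: /",
    "Allow: folder1/doc1.pdf",
]

# Assumed correction: full path with a leading slash, listed first because
# this parser uses the first matching rule.
with_leading_slash = [
    "User-agent: *",
    "Allow: /ccc/folder1/doc1.pdf",
    "Disallow: /",
]

doc1 = "http://aaa.bb.com/ccc/folder1/doc1.pdf"
other = "http://aaa.bb.com/ccc/folder2/index.html"  # hypothetical URL

print(allowed(as_posted, doc1))
print(allowed(with_leading_slash, doc1))
print(allowed(with_leading_slash, other))
```

With Python's parser, the posted rules block doc1.pdf, while the corrected rules allow it and still block everything else.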

Karl


On Tue, Sep 16, 2014 at 11:44 AM, Karl Wright daddy...@gmail.com wrote:

 Authentication does not bypass robots ever.

 You will want to turn on connector debug logging to see the decisions that
 the web connector is making with respect to which documents are fetched or
 not fetched, and why.

 Karl


 On Tue, Sep 16, 2014 at 11:04 AM, Bisonti Mario mario.biso...@vimar.com
 wrote:



 Hello.

 I would like to crawl some documents in a subfolder of a web site:

 http://aaa.bb.com/

 The structure is:

 http://aaa.bb.com/ccc/folder1
 http://aaa.bb.com/ccc/folder2
 http://aaa.bb.com/ccc/folder3

 Folder ccc and its subfolders are protected with Basic authentication:
 Username: joe
 Password: p

 I want to permit crawling of only some documents in folder1,
 so I put a robots.txt at

 http://aaa.bb.com/ccc/robots.txt

 The content of robots.txt is:

 User-agent: *
 Disallow: /
 Allow: folder1/doc1.pdf
 Allow: folder1/doc2.pdf
 Allow: folder1/doc3.pdf

 On MCF 1.7, I set up a web repository connection with:
 "Obey robots.txt for all fetches"
 and under Access credentials:
 http://aaa.bb.com/ccc/
 Basic authentication: joe and ppp

 When I create a job with:

 Include in crawl: .*
 Include in index: .*
 Include only hosts matching seeds? X

 and start it, it crawls all the content of folder1, folder2, and folder3,
 instead of, as I expected, only:

 http://aaa.bb.com/ccc/folder1/doc1.pdf
 http://aaa.bb.com/ccc/folder1/doc2.pdf
 http://aaa.bb.com/ccc/folder1/doc3.pdf

 Why is this?

 Does Basic authentication perhaps bypass the "Obey robots.txt
 for all fetches" setting?

 Thanks a lot for your help.

 Mario







Re: Zookeeper configured MCF not working in production mode

2014-09-16 Thread Karl Wright
I believe I've fixed the problem for real.  There's a patch attached to the
CONNECTORS-1031 ticket, which should be applicable to 1.7.  The fix is
already checked into the dev_1x branch, as well as trunk (which is MCF 2.0,
so don't use that yet).

I also believe that we're going to need to make a 1.7.1 release that
contains this fix, and others of similar importance.

Karl

