Re: Reconciliation of documents crawled
Thanks Karl,

Of the three methods you suggested, I believe writing to a file would be the easiest; correct me if I am wrong.

My initial thought was that, while the job is running, I would need to write counter values for each document seeded and processed, since the addSeedDocument() and processDocument() methods are called for each document. In that case it would not be easy to reconcile after the job completes: I would have loads of data once the job finishes, and mapping it would be tough. This is why I am trying to avoid a file-based mechanism. I would also hit the tracking issue, since the connector object is called multiple times and multiple agents run in parallel.

Please suggest.

Regards.

On Tue, Sep 16, 2014 at 11:59 AM, Karl Wright daddy...@gmail.com wrote:

Hi Lalit,

So, let me clarify: you want some independent measure of whether every document seeded, per job, has in fact been processed?

If that is a correct statement, there is by definition no in-code way to do it, since there are multiple agents running in your setup. Each agent may process some of the documents, and certainly no agent will process all of them. Also, restarting any agents process will lose the information you are attempting to record. So you are stuck with three possibilities:

The first possibility is to use [INFO] statements written to the log. This would work, but you don't have the information you need in your connector (specifically the job ID), so you would have to add these logging statements to various places in the ManifoldCF framework.

The second possibility is to make use of the history database table, where events are recorded. You could create two new activity types, also written within the framework, for tracking seeding of records and for tracking processing of records. There are already activity types for job start and end.

Finally, the third possibility: if you absolutely must avoid the file system, you would have to write a tracking process that allows ManifoldCF threads to connect via sockets and communicate document seeding and processing events. Once again, you would transmit events to the recording process from within the framework. This system would be at risk of losing tracking data whenever your tracking process needed to be restarted, however.

None of these are trivial to implement. Essentially, keeping track of documents is what MCF uses the database for in the first place, so this requirement is like insisting that there be a second ManifoldCF there to be sure that the first one did the right thing. It's an incredible waste of resources, frankly.

Using the log is perhaps the simplest to implement and most consistent with what clients might be expecting, but it has very significant I/O costs. Using the history table has a similar problem, while also putting your database under load. The last solution requires a lot of well-constructed code and remains vulnerable to system instability. Take your pick.

Thanks,
Karl

On Tue, Sep 16, 2014 at 12:54 AM, lalit jangra lalit.j.jan...@gmail.com wrote:

Greetings,

As part of an implementation, I need to put a reconciliation mechanism in place that can verify how many documents have been crawled for a job and display that in the logs. The first thing that came to mind was to put counters in, e.g., the CMIS connector code, in the addSeed() and processDocuments() methods, and increment them as we progress. But as far as I could see for CMIS, CmisRepositoryConnector.java gets called for each seeded document to be ingested, so these counters are not accurate. Is there any way to persist these counters within the code itself? I do not want to persist them in the file system.

Please suggest.

--
Regards,
Lalit.
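A minimal Python sketch of the after-the-fact reconciliation that the log-based option implies, assuming the framework had been patched to emit one line per seeding or processing event. The line format, the RECON tag, and the function name reconcile are hypothetical; ManifoldCF does not write such lines out of the box. Since each agent handles only part of a job, lines from every agent's log would have to be fed in together.

```python
import re
from collections import defaultdict

# Hypothetical line format (an assumption, not something ManifoldCF emits):
#   [INFO] RECON job=<jobId> event=<seeded|processed> doc=<documentId>
LINE = re.compile(r"\[INFO\] RECON job=(\S+) event=(seeded|processed) doc=(\S+)")


def reconcile(lines):
    """Return, per job, the sorted document ids that were seeded but never processed."""
    seeded = defaultdict(set)
    processed = defaultdict(set)
    for line in lines:
        match = LINE.search(line)
        if match is None:
            continue  # skip unrelated log output
        job, event, doc = match.groups()
        target = seeded if event == "seeded" else processed
        target[job].add(doc)  # sets absorb repeated events for the same document
    return {job: sorted(docs - processed[job]) for job, docs in seeded.items()}


# Lines as they might appear once the logs of all agents are concatenated.
sample = [
    "[INFO] RECON job=1410 event=seeded doc=a",
    "[INFO] RECON job=1410 event=seeded doc=b",
    "some unrelated log line",
    "[INFO] RECON job=1410 event=processed doc=a",
    "[INFO] RECON job=1410 event=processed doc=a",
]
print(reconcile(sample))
```

Using sets rather than counters means a document that is seeded or processed more than once does not skew the result, which is the inaccuracy described for per-call counters in the connector.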
Re: Reconciliation of documents crawled
If you are going to write to a file, you might as well write to the log file, since that mechanism is already available.

Karl

On Tue, Sep 16, 2014 at 2:44 AM, lalit jangra lalit.j.jan...@gmail.com wrote:

Thanks Karl, Of the three methods you suggested, I believe writing to a file would be the easiest; correct me if I am wrong. [...]
Re: regarding database configuration
Hi Karl,

Today, when deploying the war on the server, I got the error below. What is the use of the template1 db?

Error getting connection: FATAL: no pg_hba.conf entry for host serverHost, user postgres, database template1, SSL off

Thanks,
Jitu

On Tue, Sep 16, 2014 at 1:51 AM, Karl Wright daddy...@gmail.com wrote:

Hi Jitu,

You can read about the parameters here:
http://manifoldcf.apache.org/release/trunk/en_US/how-to-build-and-deploy.html#file+properties

Some hints:

(1) org.apache.manifoldcf.database.username is the name of a user that will be CREATED to own the MCF database instance.
(2) org.apache.manifoldcf.database.password is the password that user will be given.
(3) org.apache.manifoldcf.dbsuperusername is the name of the database superuser, which is the user that will be used to create the ManifoldCF database instance.
(4) org.apache.manifoldcf.dbsuperuserpassword is the corresponding superuser password.

Only some operations require that the superuser name and password be correct: specifically, those performed when database initialization/upgrade is taking place. In the single-process example, this occurs when ManifoldCF is started. In the multiprocess example, this occurs when initialize.bat/.sh is run.

Thanks,
Karl

On Mon, Sep 15, 2014 at 3:58 PM, Jitu abj...@gmail.com wrote:

Hi,

I am currently using ManifoldCF for crawling from a SharePoint server, using a Postgres db. In my properties.xml file, is it mandatory to mention both username and dbsuperusername? If I use just dbsuperusername, the agents are not able to access the db. If I mention just username, I get another error. Please correct me if I am wrong.

<property name="org.apache.manifoldcf.database.username" value="postgresUser"/>
<property name="org.apache.manifoldcf.database.password" value="postgresPassword"/>
<property name="org.apache.manifoldcf.dbsuperusername" value="postgresUser"/>
<property name="org.apache.manifoldcf.dbsuperuserpassword" value="postgresPassword"/>

My question is: can we mention the username and password only once? If so, in which properties?

Thanks,
Jitu
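As a rough illustration of the four properties Karl lists, the Python sketch below reports which of them a properties.xml does not set explicitly. The function name, the sample user names and passwords, and the assumption that the file's root element simply wraps the property elements are illustrative only; the sketch says nothing about what ManifoldCF does when a property is left out.

```python
import xml.etree.ElementTree as ET

# The four database-related property names from Karl's hints.
DB_PROPERTIES = [
    "org.apache.manifoldcf.database.username",
    "org.apache.manifoldcf.database.password",
    "org.apache.manifoldcf.dbsuperusername",
    "org.apache.manifoldcf.dbsuperuserpassword",
]


def unset_db_properties(xml_text):
    """Return the database property names that the given XML does not set."""
    root = ET.fromstring(xml_text)
    present = {prop.get("name") for prop in root.iter("property")}
    return [name for name in DB_PROPERTIES if name not in present]


# Hypothetical values: a dedicated owner account plus the database superuser.
example = """<configuration>
  <property name="org.apache.manifoldcf.database.username" value="mcfowner"/>
  <property name="org.apache.manifoldcf.database.password" value="changeme"/>
  <property name="org.apache.manifoldcf.dbsuperusername" value="postgres"/>
</configuration>"""

print(unset_db_properties(example))
```

The sample deliberately uses two different accounts, mirroring the distinction in the hints between the user that owns the MCF database and the superuser that creates it.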
Re: Connection status:Threw exception: 'Driver class not found: com.mysql.jdbc.Driver'
Thanks a lot! The connection is now working!

Mario

From: Karl Wright [mailto:daddy...@gmail.com]
Sent: Monday, September 15, 2014 16:26
To: user@manifoldcf.apache.org
Subject: Re: Connection status:Threw exception: 'Driver class not found: com.mysql.jdbc.Driver'

Hi Mario,

You can download the -src and -lib distribution, and then run ant make-deps build, and you should be able to use proprietary MySQL database connections.

Thanks,
Karl

On Mon, Sep 15, 2014 at 10:24 AM, Bisonti Mario mario.biso...@vimar.com wrote:

I understood. In fact, I don't have example-proprietary because I use a binary version of ManifoldCF, so I can't use MySQL as a repository connection.

Thanks a lot.

From: Karl Wright [mailto:daddy...@gmail.com]
Sent: Monday, September 15, 2014 16:21
To: user@manifoldcf.apache.org
Subject: Re: Connection status:Threw exception: 'Driver class not found: com.mysql.jdbc.Driver'

Hi Mario,

cd /usr/share/manifoldcf/example-proprietary
sudo java -jar start.jar

If you do not have example-proprietary, it is because you did not actually build ManifoldCF yourself. In order to use MySQL as a backend, you must build ManifoldCF yourself.

Thanks,
Karl

On Mon, Sep 15, 2014 at 10:14 AM, Bisonti Mario mario.biso...@vimar.com wrote:

I am in /usr/share/manifoldcf/example/ and I execute:

sudo java -jar start.jar

However, mysql-connector-java-5.1.32-bin.jar is in /usr/share/manifoldcf/connector-lib-proprietary/

So, how should I run ManifoldCF? Excuse me, but I am not a Linux expert…

Thanks a lot.

From: Karl Wright [mailto:daddy...@gmail.com]
Sent: Monday, September 15, 2014 16:04
To: user@manifoldcf.apache.org
Subject: Re: Connection status:Threw exception: 'Driver class not found: com.mysql.jdbc.Driver'

Hi Mario,

You need to run ManifoldCF out of one of the example-proprietary directories in order for it to pick up the mysql jar in the classpath.

Thanks,
Karl

On Mon, Sep 15, 2014 at 9:32 AM, Bisonti Mario mario.biso...@vimar.com wrote:

Hello.

I tried to set up a MySQL repository connection, but I get the error mentioned. I put mysql-connector-java-5.1.32-bin.jar in the /apache-manifoldcf-1.7/connector-lib-proprietary and /apache-manifoldcf-1.7/lib-proprietary folders, but I still get the error in the subject.

What could I check?

Thanks a lot
Mario
Zookeeper configured MCF not working in production mode
I'm not able to run MCF 1.7 properly with Zookeeper-based synchronization. After some hours, it just stops fetching documents. Until now I have been using FileLockManager to get around this problem. A thread dump and my manifoldcf.log file can be found here: http://folk.uio.no/erlendfg/manifoldcf/ Erlend
Re: MCF cluster agents do not shut down.
Hi Karl,

Below are the logs I get when I try to stop the agents. Please suggest.

[root@iwdc1preecma03 ecmadmin]# cd /app/IW/MCF1.7/dist/multiprocess-zk-example
[root@iwdc1preecma03 multiprocess-zk-example]# ./stop-agents.sh
Configuration file successfully read
[main] INFO org.apache.zookeeper.ZooKeeper - Client environment:zookeeper.version=3.4.5-1392090, built on 09/30/2012 17:52 GMT
[main] INFO org.apache.zookeeper.ZooKeeper - Client environment:host.name=iwdc1preecma03.iwater.ie
[main] INFO org.apache.zookeeper.ZooKeeper - Client environment:java.version=1.7.0_25
[main] INFO org.apache.zookeeper.ZooKeeper - Client environment:java.vendor=Oracle Corporation
[main] INFO org.apache.zookeeper.ZooKeeper - Client environment:java.home=/app/IW/java/jre
[main] INFO org.apache.zookeeper.ZooKeeper - Client environment:java.class.path=.:../lib/mcf-pull-agent.jar:../lib/mcf-agents.jar:../lib/mcf-core.jar:../lib/hsqldb.jar:../lib/derbyLocale_zh_TW.jar:../lib/derbyLocale_zh_CN.jar:../lib/derbyLocale_ru.jar:../lib/derbyLocale_pt_BR.jar:../lib/derbyLocale_pl.jar:../lib/derbyLocale_ko_KR.jar:../lib/derbyLocale_ja_JP.jar:../lib/derbyLocale_it.jar:../lib/derbyLocale_hu.jar:../lib/derbyLocale_fr.jar:../lib/derbyLocale_es.jar:../lib/derbyLocale_de_DE.jar:../lib/derbyLocale_cs.jar:../lib/derbytools.jar:../lib/derbynet.jar:../lib/derby.jar:../lib/postgresql.jar:../lib/mail.jar:../lib/slf4j-simple.jar:../lib/slf4j-api.jar:../lib/velocity.jar:../lib/xml-apis.jar:../lib/xercesImpl.jar:../lib/xalan.jar:../lib/servlet-api.jar:../lib/serializer.jar:../lib/log4j.jar:../lib/commons-logging.jar:../lib/commons-lang.jar:../lib/commons-io.jar:../lib/httpclient.jar:../lib/httpcore.jar:../lib/commons-fileupload.jar:../lib/commons-el.jar:../lib/commons-collections.jar:../lib/commons-codec.jar:../lib/json-simple.jar:../lib/json.jar:../lib/zookeeper.jar:../lib/activation.jar:../lib/saaj-impl.jar:../lib/saaj-api.jar:../lib/wsdl4j.jar:../lib/wss4j.jar:../lib/axis.jar:../lib/axis-jaxrpc.jar:../lib/commons-discovery.jar:../lib/geronimo-javamail_1.4_spec.jar:../lib/castor.jar:
[main] INFO org.apache.zookeeper.ZooKeeper - Client environment:java.library.path=/usr/java/packages/lib/amd64:/usr/lib64:/lib64:/lib:/usr/lib
[main] INFO org.apache.zookeeper.ZooKeeper - Client environment:java.io.tmpdir=/tmp
[main] INFO org.apache.zookeeper.ZooKeeper - Client environment:java.compiler=NA
[main] INFO org.apache.zookeeper.ZooKeeper - Client environment:os.name=Linux
[main] INFO org.apache.zookeeper.ZooKeeper - Client environment:os.arch=amd64
[main] INFO org.apache.zookeeper.ZooKeeper - Client environment:os.version=2.6.32-358.el6.x86_64
[main] INFO org.apache.zookeeper.ZooKeeper - Client environment:user.name=root
[main] INFO org.apache.zookeeper.ZooKeeper - Client environment:user.home=/root
[main] INFO org.apache.zookeeper.ZooKeeper - Client environment:user.dir=/app/IW/MCF1.7/dist/multiprocess-zk-example
[main] INFO org.apache.zookeeper.ZooKeeper - Initiating client connection, connectString=iwdc1preecma03:2181,iwdc1preecma03:2182,iwdc1preecma03:2183,iwdc2preecma04:2181,iwdc2preecma04:2182,iwdc2preecma04:2183 sessionTimeout=4000 watcher=org.apache.manifoldcf.core.lockmanager.ZooKeeperConnection$ZooKeeperWatcher@78618ac5
[main-SendThread(iwdc1preecma03.iwater.ie:2181)] INFO org.apache.zookeeper.ClientCnxn - Opening socket connection to server iwdc1preecma03.iwater.ie/10.231.72.24:2181. Will not attempt to authenticate using SASL (unknown error)
[main-SendThread(iwdc1preecma03.iwater.ie:2181)] INFO org.apache.zookeeper.ClientCnxn - Socket connection established to iwdc1preecma03.iwater.ie/10.231.72.24:2181, initiating session
[main-SendThread(iwdc1preecma03.iwater.ie:2181)] INFO org.apache.zookeeper.ClientCnxn - Unable to read additional data from server sessionid 0x0, likely server has closed socket, closing socket connection and attempting reconnect
[main-SendThread(iwdc2preecma04.iwater.ie:2182)] INFO org.apache.zookeeper.ClientCnxn - Opening socket connection to server iwdc2preecma04.iwater.ie/10.231.72.25:2182. Will not attempt to authenticate using SASL (unknown error)
[main-SendThread(iwdc2preecma04.iwater.ie:2182)] INFO org.apache.zookeeper.ClientCnxn - Socket connection established to iwdc2preecma04.iwater.ie/10.231.72.25:2182, initiating session
[main-SendThread(iwdc2preecma04.iwater.ie:2182)] INFO org.apache.zookeeper.ClientCnxn - Unable to read additional data from server sessionid 0x0, likely server has closed socket, closing socket connection and attempting reconnect
[main-SendThread(iwdc1preecma03.iwater.ie:2183)] INFO org.apache.zookeeper.ClientCnxn - Opening socket connection to server iwdc1preecma03.iwater.ie/10.231.72.24:2183. Will not attempt to authenticate using SASL (unknown error)
[main-SendThread(iwdc1preecma03.iwater.ie:2183)] INFO org.apache.zookeeper.ClientCnxn - Socket connection established to iwdc1preecma03.iwater.ie/10.231.72.24:2183, initiating session
Re: Connection status:Threw exception: 'Driver class not found: com.mysql.jdbc.Driver'
When I start a test job to extract a table I obtain:

Error: Bad seed query; doesn't return $(IDCOLUMN) column. Try using quotes around $(IDCOLUMN) variable, e.g. "$(IDCOLUMN)".

My configuration:

Seeding query:
SELECT command_id AS $(IDCOLUMN) FROM icinga_commands

Data query:
SELECT command_id AS $(IDCOLUMN), command_line AS $(URLCOLUMN), object_id AS $(DATACOLUMN) FROM icinga_commands WHERE command_id IN $(IDLIST)

What could I check?

Thanks a lot
Mario

From: Bisonti Mario
Sent: Tuesday, September 16, 2014 09:58
To: 'user@manifoldcf.apache.org'
Subject: Re: Connection status:Threw exception: 'Driver class not found: com.mysql.jdbc.Driver'

Thanks a lot! Connection now is working! [...]
Re: Zookeeper configured MCF not working in production mode
On 16.09.14 10:53, lalit jangra wrote:

Hi Erlend, Can you please elaborate on how you have configured zookeeper-based synchronization? Is it in standalone mode or clustered mode? How many zookeeper nodes are you running on each node, and how many agents are you running?

I'm not very familiar with Zookeeper, so I have just followed the examples inside the multiprocess-zk-example folder, i.e.:

    $MCF_HOME/../runzookeeper.sh > /dev/null 2>&1 &
    # Reading global properties:
    $MCF_HOME/../setglobalproperties.sh > /dev/null 2>&1
    # Starting Agent process:
    $MCF_HOME/processes/executecommand.sh org.apache.manifoldcf.agents.AgentRun \
      1>$LOGDIR/mcf_agent.stdout.log 2>$LOGDIR/mcf_agent.stderr.log &
    pid=$!

The above lines are from my startup script. I see now that I haven't specified -Dorg.apache.manifoldcf.processid=A. I'm not sure this is important, but I can of course try to include it in my script and restart everything.

So, to the question about how many zookeeper nodes I'm using: the answer is one. The same applies to the number of running agents.

Erlend
Re: Zookeeper configured MCF not working in production mode
Hi Erlend,

The zookeeper configuration supplied will likely fill up your disk with zookeeper synch data, because the parameters that control the cleanup of that data are not properly set up for long-term execution.

Graeme Seaton would be the best resource for using Zookeeper properly; he's on this list and I've cc'd him directly as well.

Karl

On Tue, Sep 16, 2014 at 5:06 AM, Erlend Garåsen e.f.gara...@usit.uio.no wrote:

I'm not very familiar with Zookeeper, so I have just followed the examples inside the multiprocess-zk-example folder [...]
Re: Connection status:Threw exception: 'Driver class not found: com.mysql.jdbc.Driver'
Hi Mario,

Did you try putting quotes in your query around $(IDCOLUMN) as it suggests? For some databases this is necessary to preserve case properly.

Karl

On Tue, Sep 16, 2014 at 4:53 AM, Bisonti Mario mario.biso...@vimar.com wrote:

When I start a test job to extract a table I obtain: Error: Bad seed query; doesn't return $(IDCOLUMN) column. [...]
Re: MCF cluster agents do not shut down.
Hi Lalit,

I asked for thread dumps, not logs. To get a thread dump, use your jdk's jstack command on the agents process that won't shut down.

Thanks,
Karl

On Tue, Sep 16, 2014 at 4:51 AM, lalit jangra lalit.j.jan...@gmail.com wrote:

Hi Karl, Below are the logs i get when i try to stop agents. Please suggest. [...]
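For readers who want to script the thread dump Karl asks for, here is a small Python sketch that wraps the JDK's jstack command. The helper name thread_dump and its jstack parameter are assumptions made for illustration; the process id of the stuck agents process has to be found separately and is not supplied here.

```python
import shutil
import subprocess


def thread_dump(pid, jstack="jstack"):
    """Run `jstack <pid>` and return the thread dump it prints, as text."""
    if shutil.which(jstack) is None:
        # jstack is part of the JDK tools, so a JRE-only install will not have it.
        raise FileNotFoundError(f"{jstack} was not found on PATH")
    result = subprocess.run(
        [jstack, str(pid)],
        capture_output=True,  # collect stdout/stderr instead of printing them
        text=True,            # decode output to str
        check=True,           # raise CalledProcessError on a non-zero exit
    )
    return result.stdout
```

A call such as thread_dump(12345), with 12345 replaced by the real process id, returns the dump as a string that can be written to a file and attached to a reply. The command generally needs to be run as the same user that owns the Java process.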
Re: Getting errors in zookeeper logs
Hi Lalit,

I believe there should be no space between -Xmx and 1024m: -Xmx1024m. Same with -Xms.

Karl

On Tue, Sep 16, 2014 at 4:25 AM, lalit jangra lalit.j.jan...@gmail.com wrote:

Greetings,

I updated the zookeeper Java heap settings by adding java/.conf under the zookeeper/conf folder, added the line below on all six zookeeper nodes, and restarted.

export JVMFLAGS=-Xms 1024m -Xmx 1024m

I can still see zookeeper connection resets while starting the agent, and my crawl is stuck. Please suggest. Is there any way to read the zookeeper logs, as these are in binary format?

Regards.

On Mon, Sep 15, 2014 at 11:58 PM, lalit jangra lalit.j.jan...@gmail.com wrote:

Thanks Karl,

I am running the zookeepers using the zkServer.sh script file and I will try your suggestions.

Regards.

On Mon, Sep 15, 2014 at 10:48 PM, Karl Wright daddy...@gmail.com wrote:

If you are running a batch/shell script to start zookeeper, have a look at the script you are running. I am sure there is a way to include an environment variable that controls the amount of memory, or at least Java options. The java option you'd include would be something like: -Xmx500m (for 500 megabytes), or -Xmx1g (for 1 gigabyte), etc.

Karl

On Mon, Sep 15, 2014 at 1:16 PM, Karl Wright daddy...@gmail.com wrote:

How are you starting your zookeeper instances?

Karl

On Mon, Sep 15, 2014 at 1:14 PM, lalit jangra lalit.j.jan...@gmail.com wrote:

Thanks Karl,

After updating the configuration, I am still hitting the same zookeeper connection reset issue. I was trying to assign memory to the zookeeper instances but I could not see any way to do so. Can you suggest a way? What else can I do?

Regards.

On Mon, Sep 15, 2014 at 10:39 PM, Karl Wright daddy...@gmail.com wrote:

Hi Lalit,

If you have more than one unspecified Java process, EACH ONE will allocate 25% of available memory by default. So you will have to do more than just free up some MCF memory to get this to work.

Karl

On Mon, Sep 15, 2014 at 12:29 PM, lalit jangra lalit.j.jan...@gmail.com wrote:

Thanks Karl,

I think this is the reason why my zookeeper nodes are resetting connections due to instability. What I will try in the meantime is to reduce MCF memory to 1.5G and leave the rest unassigned, so that will leave 5.5G for Java itself, more than the 25% rule, and see if it works. I also checked the Zookeeper documentation but could not take any specific inputs from it.

Regards.

On Mon, Sep 15, 2014 at 9:52 PM, Karl Wright daddy...@gmail.com wrote:

Hi Lalit,

I can't speak for Solr's memory consumption, but you absolutely need to give Solr enough memory to avoid OOM errors or things will not work properly. As for MCF, 3G is more than enough; probably you could give it 1G and be fine. For Zookeeper, remember that it is a Java process. On 64-bit unix machines, Java by default takes 25% of the total system memory. I would look at their documentation to figure out what they need, and assign precisely that amount, otherwise zk will obviously not be stable.

Thanks,
Karl

On Mon, Sep 15, 2014 at 12:17 PM, lalit jangra lalit.j.jan...@gmail.com wrote:

Hi Karl,

Out of 12G, I have assigned 5G to Solr, as I could see a lot of Out of Memory errors/Java heap space issues while crawling large jobs, after which it seems to be OK. I have also assigned 3G to MCF, where it is quite comfortable. The remaining 4G, I am assuming, is enough for the OS and the zookeeper nodes. I am currently running a job for 35K documents and I can see more than 500MB of memory free. Any thoughts?

Regards.

On Mon, Sep 15, 2014 at 8:45 PM, Karl Wright daddy...@gmail.com wrote:

Hi Lalit,

The best way in Java to assess memory usage is to turn on JVM garbage collection verbose output. Then you can see how often the system garbage collects etc, and whether post-GC usage grows over time. 12G should be more than enough, so if you find you are running into memory limits with that configuration, it would be worth trying to figure out why that is happening.

Karl

On Mon, Sep 15, 2014 at 11:04 AM, lalit jangra lalit.j.jan...@gmail.com wrote:

Hi Karl,

Could I be seeing zookeeper connection reset messages because the system is running at the top of its memory limits? I have 12G of RAM and can see it is using 11.5G while the job is running. Is there any way I should assign memory to the zookeeper nodes, and if so, is there any yardstick?

Regards.

On Mon, Sep 15, 2014 at 7:16 PM, Karl Wright daddy...@gmail.com wrote:

Hi Lalit,

Looks like this is the result of a tomcat shutdown, and is a probable race condition bug in Zookeeper:

http://mail-archives.apache.org/mod_mbox/tomcat-users/201306.mbox/%3cbay174-w32b2284bedae503e9d22d3a8...@phx.gbl%3E

Karl

On Mon, Sep 15, 2014 at 9:41 AM, lalit jangra lalit.j.jan...@gmail.com wrote:

Hi Karl,

Along with this, I could see the errors below in tomcat catalina.out.

Sep 15, 2014 1:06:14 PM org.apache.catalina.loader.WebappClassLoader
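A minimal sketch of the arithmetic behind Karl's point about default heap sizes, using the figures quoted in this thread: 12G of RAM, 5G for Solr, 3G for MCF. It assumes three ZooKeeper JVMs per host started without -Xmx (the connect string in the log above lists three ports per host) and takes the 25% default from Karl's message. The function name is hypothetical.

```shell
#!/bin/sh
# Worst-case memory commitment in GB (hypothetical helper, integer maths).
# $1 = total RAM, $2 = Solr heap, $3 = MCF heap,
# $4 = number of JVMs started without an explicit -Xmx.
worst_case_gb() {
  default_heap=$(( $1 / 4 ))            # 25% of RAM per unconfigured JVM
  echo $(( $2 + $3 + $4 * default_heap ))
}

total=12
committed=$(worst_case_gb "$total" 5 3 3)
echo "committed=${committed}G of ${total}G"   # 5 + 3 + 3*3 = 17
```

Under these assumptions the JVMs could together grow well past the physical memory of the machine, which is presumably why an explicit -Xmx for each ZooKeeper process matters here.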
Re: Zookeeper configured MCF not working in production mode
Hi Erlend,

If you could obtain a thread dump from the agents process when MCF hangs that would also be very helpful. IN GENERAL, when something hangs in Java, it's essential to get a thread dump in order to diagnose the problem.

Thanks,
Karl

On Tue, Sep 16, 2014 at 5:06 AM, Erlend Garåsen e.f.gara...@usit.uio.no wrote:

On 16.09.14 10:53, lalit jangra wrote:

> Hi Erlend, Can you please elaborate on how you have configured zookeeper based synchronization, is it in stand alone mode or clustered mode? How many zookeeper nodes are you running for each node and how many agents are you running?

I'm not very familiar with Zookeeper, so I have just followed the examples inside the multiprocess-zk-example folder, i.e.:

$MCF_HOME/../runzookeeper.sh > /dev/null 2>&1 &
# Reading global properties:
$MCF_HOME/../setglobalproperties.sh > /dev/null 2>&1
# Starting Agent process:
$MCF_HOME/processes/executecommand.sh org.apache.manifoldcf.agents.AgentRun \
  1>$LOGDIR/mcf_agent.stdout.log 2>$LOGDIR/mcf_agent.stderr.log &
pid=$!

The above lines are from my startup script. I see now that I haven't specified -Dorg.apache.manifoldcf.processid=A. I'm not sure this is important, but I can of course try to include that in my script and restart everything.

So to the question about how many zookeeper nodes I'm using, the answer is one. The same applies to the number of running agents.

Erlend
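One way to capture the thread dump Karl asks for, sketched under the assumption that a JDK is installed on the machine and that the agents process id is known (for example the pid saved by the startup script). The helper name and the output file name are hypothetical.

```shell
#!/bin/sh
# Write a thread dump of a running JVM (hypothetical helper).
# $1 = JVM process id, $2 = output file (used only when jstack is available).
dump_threads() {
  if command -v jstack >/dev/null 2>&1; then
    jstack "$1" > "$2"      # JDK tool: prints the stacks of all threads
  else
    kill -QUIT "$1"         # the JVM then prints the dump to its own stdout
  fi
}

# Example (not run here):
#   dump_threads "$pid" "$LOGDIR/mcf_agent.threads.txt"
```

With the kill -QUIT fallback the dump ends up wherever the agent's standard output is redirected, so it is worth checking that log afterwards.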
Re: Getting errors in zookeeper logs
Sorry Karl,

It's a typo; the actual values are export JVMFLAGS=-Xms1024m -Xmx1024m.

Regards.

On Tue, Sep 16, 2014 at 3:50 PM, Karl Wright daddy...@gmail.com wrote:

Hi Lalit,

I believe there should be no space between -Xmx and 1024m: -Xmx1024m. Same with -Xms.

Karl

On Tue, Sep 16, 2014 at 4:25 AM, lalit jangra lalit.j.jan...@gmail.com wrote:

Greetings,

I updated the zookeeper Java heap settings by adding java/.conf under the zookeeper/conf folder, added the line below on all six zookeeper nodes, and restarted.

export JVMFLAGS=-Xms 1024m -Xmx 1024m

I can still see zookeeper connection resets while starting the agent, and my crawl is stuck. Please suggest. Is there any way to read the zookeeper logs, as these are in binary format?

Regards.

On Mon, Sep 15, 2014 at 11:58 PM, lalit jangra lalit.j.jan...@gmail.com wrote:

Thanks Karl,

I am running the zookeepers using the zkServer.sh script file and I will try your suggestions.

Regards.

On Mon, Sep 15, 2014 at 10:48 PM, Karl Wright daddy...@gmail.com wrote:

If you are running a batch/shell script to start zookeeper, have a look at the script you are running. I am sure there is a way to include an environment variable that controls the amount of memory, or at least Java options. The java option you'd include would be something like: -Xmx500m (for 500 megabytes), or -Xmx1g (for 1 gigabyte), etc.

Karl

On Mon, Sep 15, 2014 at 1:16 PM, Karl Wright daddy...@gmail.com wrote:

How are you starting your zookeeper instances?

Karl

On Mon, Sep 15, 2014 at 1:14 PM, lalit jangra lalit.j.jan...@gmail.com wrote:

Thanks Karl,

After updating the configuration, I am still hitting the same zookeeper connection reset issue. I was trying to assign memory to the zookeeper instances but I could not see any way to do so. Can you suggest a way? What else can I do?

Regards.

On Mon, Sep 15, 2014 at 10:39 PM, Karl Wright daddy...@gmail.com wrote:

Hi Lalit,

If you have more than one unspecified Java process, EACH ONE will allocate 25% of available memory by default. So you will have to do more than just free up some MCF memory to get this to work.

Karl

On Mon, Sep 15, 2014 at 12:29 PM, lalit jangra lalit.j.jan...@gmail.com wrote:

Thanks Karl,

I think this is the reason why my zookeeper nodes are resetting connections due to instability. What I will try in the meantime is to reduce MCF memory to 1.5G and leave the rest unassigned, so that will leave 5.5G for Java itself, more than the 25% rule, and see if it works. I also checked the Zookeeper documentation but could not take any specific inputs from it.

Regards.

On Mon, Sep 15, 2014 at 9:52 PM, Karl Wright daddy...@gmail.com wrote:

Hi Lalit,

I can't speak for Solr's memory consumption, but you absolutely need to give Solr enough memory to avoid OOM errors or things will not work properly. As for MCF, 3G is more than enough; probably you could give it 1G and be fine. For Zookeeper, remember that it is a Java process. On 64-bit unix machines, Java by default takes 25% of the total system memory. I would look at their documentation to figure out what they need, and assign precisely that amount, otherwise zk will obviously not be stable.

Thanks,
Karl

On Mon, Sep 15, 2014 at 12:17 PM, lalit jangra lalit.j.jan...@gmail.com wrote:

Hi Karl,

Out of 12G, I have assigned 5G to Solr, as I could see a lot of Out of Memory errors/Java heap space issues while crawling large jobs, after which it seems to be OK. I have also assigned 3G to MCF, where it is quite comfortable. The remaining 4G, I am assuming, is enough for the OS and the zookeeper nodes. I am currently running a job for 35K documents and I can see more than 500MB of memory free. Any thoughts?

Regards.

On Mon, Sep 15, 2014 at 8:45 PM, Karl Wright daddy...@gmail.com wrote:

Hi Lalit,

The best way in Java to assess memory usage is to turn on JVM garbage collection verbose output. Then you can see how often the system garbage collects etc, and whether post-GC usage grows over time. 12G should be more than enough, so if you find you are running into memory limits with that configuration, it would be worth trying to figure out why that is happening.

Karl

On Mon, Sep 15, 2014 at 11:04 AM, lalit jangra lalit.j.jan...@gmail.com wrote:

Hi Karl,

Could I be seeing zookeeper connection reset messages because the system is running at the top of its memory limits? I have 12G of RAM and can see it is using 11.5G while the job is running. Is there any way I should assign memory to the zookeeper nodes, and if so, is there any yardstick?

Regards.

On Mon, Sep 15, 2014 at 7:16 PM, Karl Wright daddy...@gmail.com wrote:

Hi Lalit,

Looks like this is the result of a tomcat shutdown, and is a probable race condition bug in Zookeeper:

http://mail-archives.apache.org/mod_mbox/tomcat-users/201306.mbox/%3cbay174-w32b2284bedae503e9d22d3a8...@phx.gbl%3E

Karl

On Mon, Sep 15, 2014 at 9:41 AM, lalit jangra
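A small sketch of the two pitfalls around this setting, assuming the file is read by a POSIX shell (ZooKeeper's zkEnv.sh sources conf/java.env when that file exists; whether this installation uses it is an assumption). The sizes must follow -Xms and -Xmx without a space, and the value needs quoting so that the shell keeps both options in one variable. check_heap_flags is a hypothetical helper; -verbose:gc is the standard JVM switch for the GC output Karl mentions.

```shell
#!/bin/sh
# Quoted, so both options end up in JVMFLAGS; no space before the sizes.
export JVMFLAGS="-Xms1024m -Xmx1024m -verbose:gc"

# Hypothetical helper: succeeds only if no heap option is followed by a space.
check_heap_flags() {
  case "$1 " in
    *"-Xms "*|*"-Xmx "*) return 1 ;;
    *) return 0 ;;
  esac
}

check_heap_flags "$JVMFLAGS" && echo "heap flags look well-formed"
```

Without the quotes, the shell would pass the second option to export as a separate word instead of making it part of the value.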
ManifoldCF and Zookeeper
Hi all,

I added Zookeeper synchronization support in 1.5 as part of making MCF clusterable. While this has worked for some, others have had difficulty getting a zookeeper setup correctly configured for long-term crawling. I'd very much love to find out what is wrong with these failing setups, and (if possible) add a page to our documentation about zookeeper best practices. However, I'm definitely not the person to do that.

Are there any readers on this forum who have had success in this area? If so, can you let us know what you did to keep zookeeper happy for the long term?

Thanks,
Karl
Re: Zookeeper configured MCF not working in production mode
Sorry, Erlend, I missed that. The thread dump indicates that it is waiting for the Zookeeper server to respond. Do you have corresponding zookeeper server logs?

Thanks,
Karl

On Tue, Sep 16, 2014 at 7:08 AM, Erlend Garåsen e.f.gara...@usit.uio.no wrote:

On 16.09.14 12:27, Karl Wright wrote:

> If you could obtain a thread dump from the agents process when MCF hangs that would also be very helpful.

Hmm, I thought I did that: http://folk.uio.no/erlendfg/manifoldcf/

Erlend
Re: Zookeeper configured MCF not working in production mode
Hello,

To keep zookeeper from taking too much disk space, use the parameters below. They will help to purge extra data one may not need.

autopurge.snapRetainCount=3 : default value
autopurge.purgeInterval=1: default value

Feel free to update as per your needs.

Regards.

On Tue, Sep 16, 2014 at 3:46 PM, Karl Wright daddy...@gmail.com wrote:

Hi Erlend,

The zookeeper configuration supplied will likely fill up your disk with zookeeper synch data, because the parameters that control the cleanup of that data are not properly set up for long-term execution. Graeme Seaton would be the best resource for using Zookeeper properly; he's on this list and I've cc'd him directly as well.

Karl

On Tue, Sep 16, 2014 at 5:06 AM, Erlend Garåsen e.f.gara...@usit.uio.no wrote:

On 16.09.14 10:53, lalit jangra wrote:

> Hi Erlend, Can you please elaborate on how you have configured zookeeper based synchronization, is it in stand alone mode or clustered mode? How many zookeeper nodes are you running for each node and how many agents are you running?

I'm not very familiar with Zookeeper, so I have just followed the examples inside the multiprocess-zk-example folder, i.e.:

$MCF_HOME/../runzookeeper.sh > /dev/null 2>&1 &
# Reading global properties:
$MCF_HOME/../setglobalproperties.sh > /dev/null 2>&1
# Starting Agent process:
$MCF_HOME/processes/executecommand.sh org.apache.manifoldcf.agents.AgentRun \
  1>$LOGDIR/mcf_agent.stdout.log 2>$LOGDIR/mcf_agent.stderr.log &
pid=$!

The above lines are from my startup script. I see now that I haven't specified -Dorg.apache.manifoldcf.processid=A. I'm not sure this is important, but I can of course try to include that in my script and restart everything.

So to the question about how many zookeeper nodes I'm using, the answer is one. The same applies to the number of running agents.

Erlend

--
Regards,
Lalit.
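A sketch of adding the two autopurge settings to a configuration file only when they are not already present. ensure_setting is a hypothetical helper, and the demo works on a scratch file rather than a real zookeeper.cfg. As far as the ZooKeeper administrator's guide describes it, autopurge.purgeInterval is given in hours and purging stays disabled unless it is set to 1 or more, so the value is worth setting explicitly rather than relying on a default.

```shell
#!/bin/sh
# Append key=value to a config file unless the key is already set
# (hypothetical helper; $1 = file, $2 = key, $3 = value).
ensure_setting() {
  if ! awk -F= -v k="$2" '$1 == k { found = 1 } END { exit !found }' "$1" 2>/dev/null; then
    printf '%s=%s\n' "$2" "$3" >> "$1"
  fi
}

cfg=$(mktemp)                          # scratch file standing in for the real config
printf 'tickTime=2000\n' > "$cfg"
ensure_setting "$cfg" autopurge.snapRetainCount 3
ensure_setting "$cfg" autopurge.purgeInterval 1
cat "$cfg"
rm -f "$cfg"
```

An existing value is left untouched, so running the helper twice does not duplicate lines; ZooKeeper has to be restarted to pick up the change.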
Re: Zookeeper configured MCF not working in production mode
Hi Erlend,

How many worker threads are you using? How many documents (about) do you crawl before things hang?

You may also want to try to increase the parameter maxClientCnxns in zookeeper.cfg to something bigger, if you have a lot of worker threads. I'm thinking 1000 or some such. See if it makes a difference for you.

I'll try a large crawl here using Zookeeper also, but it would be good to know your parameters before I begin.

Karl

On Tue, Sep 16, 2014 at 7:21 AM, lalit jangra lalit.j.jan...@gmail.com wrote:

Hello,

To keep zookeeper from taking too much disk space, use the parameters below. They will help to purge extra data one may not need.

autopurge.snapRetainCount=3 : default value
autopurge.purgeInterval=1: default value

Feel free to update as per your needs.

Regards.

On Tue, Sep 16, 2014 at 3:46 PM, Karl Wright daddy...@gmail.com wrote:

Hi Erlend,

The zookeeper configuration supplied will likely fill up your disk with zookeeper synch data, because the parameters that control the cleanup of that data are not properly set up for long-term execution. Graeme Seaton would be the best resource for using Zookeeper properly; he's on this list and I've cc'd him directly as well.

Karl

On Tue, Sep 16, 2014 at 5:06 AM, Erlend Garåsen e.f.gara...@usit.uio.no wrote:

On 16.09.14 10:53, lalit jangra wrote:

> Hi Erlend, Can you please elaborate on how you have configured zookeeper based synchronization, is it in stand alone mode or clustered mode? How many zookeeper nodes are you running for each node and how many agents are you running?

I'm not very familiar with Zookeeper, so I have just followed the examples inside the multiprocess-zk-example folder, i.e.:

$MCF_HOME/../runzookeeper.sh > /dev/null 2>&1 &
# Reading global properties:
$MCF_HOME/../setglobalproperties.sh > /dev/null 2>&1
# Starting Agent process:
$MCF_HOME/processes/executecommand.sh org.apache.manifoldcf.agents.AgentRun \
  1>$LOGDIR/mcf_agent.stdout.log 2>$LOGDIR/mcf_agent.stderr.log &
pid=$!

The above lines are from my startup script. I see now that I haven't specified -Dorg.apache.manifoldcf.processid=A. I'm not sure this is important, but I can of course try to include that in my script and restart everything.

So to the question about how many zookeeper nodes I'm using, the answer is one. The same applies to the number of running agents.

Erlend

--
Regards,
Lalit.
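Before raising maxClientCnxns it can be useful to see what the configuration currently says. A minimal sketch, assuming a plain key=value file; get_setting is a hypothetical helper and the scratch file stands in for zookeeper.cfg. According to the ZooKeeper documentation the limit applies per client IP address, which would explain why many worker threads on one host can run into it.

```shell
#!/bin/sh
# Print the value of a key from a key=value config file, or a fallback
# (hypothetical helper; $1 = file, $2 = key, $3 = fallback).
get_setting() {
  value=$(awk -F= -v k="$2" '$1 == k { v = $2 } END { print v }' "$1" 2>/dev/null)
  echo "${value:-$3}"
}

cfg=$(mktemp)                          # scratch file standing in for zookeeper.cfg
printf 'clientPort=2181\nmaxClientCnxns=1000\n' > "$cfg"
echo "maxClientCnxns=$(get_setting "$cfg" maxClientCnxns unset)"
rm -f "$cfg"
```

If the helper prints the fallback, the key is absent and ZooKeeper uses its built-in default for that version.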
Re: Connection status:Threw exception: 'Driver class not found: com.mysql.jdbc.Driver'
Yes, but I obtained the same error with:

SELECT command_id AS “$(IDCOLUMN)” FROM icinga_commands

I tried the query

SELECT command_id AS $(IDCOLUMN) FROM icinga_commands

from a MySQL client and it works.

From: Karl Wright [mailto:daddy...@gmail.com]
Sent: Tuesday, September 16, 2014 12:17
To: user@manifoldcf.apache.org
Subject: Re: Connection status:Threw exception: 'Driver class not found: com.mysql.jdbc.Driver'

Hi Mario,

Did you try putting quotes in your query around $(IDCOLUMN) as it suggests? For some databases this is necessary to preserve case properly.

Karl

On Tue, Sep 16, 2014 at 4:53 AM, Bisonti Mario mario.biso...@vimar.com wrote:

When I start a test job to extract a table I obtain:

Error: Bad seed query; doesn't return $(IDCOLUMN) column. Try using quotes around $(IDCOLUMN) variable, e.g. "$(IDCOLUMN)".

My configuration:

Seeding query:
SELECT command_id AS $(IDCOLUMN) FROM icinga_commands

Data query:
SELECT command_id AS $(IDCOLUMN), command_line AS $(URLCOLUMN), object_id AS $(DATACOLUMN) FROM icinga_commands where command_id IN $(IDLIST)

What could I check?

Thanks a lot
Mario

From: Bisonti Mario
Sent: Tuesday, September 16, 2014 09:58
To: 'user@manifoldcf.apache.org'
Subject: Re: Connection status:Threw exception: 'Driver class not found: com.mysql.jdbc.Driver'

Thanks a lot! The connection is now working!

Mario

From: Karl Wright [mailto:daddy...@gmail.com]
Sent: Monday, September 15, 2014 16:26
To: user@manifoldcf.apache.org
Subject: Re: Connection status:Threw exception: 'Driver class not found: com.mysql.jdbc.Driver'

Hi Mario,

You can download the -src and -lib distribution, and then run ant make-deps build, and you should be able to use proprietary MySQL database connections.

Thanks,
Karl

On Mon, Sep 15, 2014 at 10:24 AM, Bisonti Mario mario.biso...@vimar.com wrote:

I understand. In fact, I don’t have example-proprietary because I use a binary version of ManifoldCF, so I can’t use MySQL as a repository connection.

Thanks a lot.

From: Karl Wright [mailto:daddy...@gmail.com]
Sent: Monday, September 15, 2014 16:21
To: user@manifoldcf.apache.org
Subject: Re: Connection status:Threw exception: 'Driver class not found: com.mysql.jdbc.Driver'

Hi Mario,

cd /usr/share/manifoldcf/example-proprietary
sudo java -jar start.jar

If you do not have example-proprietary, it is because you did not actually build ManifoldCF yourself. In order to use MySQL as a backend, you must build ManifoldCF yourself.

Thanks,
Karl

On Mon, Sep 15, 2014 at 10:14 AM, Bisonti Mario mario.biso...@vimar.com wrote:

I am in /usr/share/manifoldcf/example/ and I execute:

sudo java -jar start.jar

whereas mysql-connector-java-5.1.32-bin.jar is in /usr/share/manifoldcf/connector-lib-proprietary/

So, how should I run ManifoldCF?

Excuse me, but I am not a Linux expert…

Thanks a lot.

From: Karl Wright [mailto:daddy...@gmail.com]
Sent: Monday, September 15, 2014 16:04
To: user@manifoldcf.apache.org
Subject: Re: Connection status:Threw exception: 'Driver class not found: com.mysql.jdbc.Driver'

Hi Mario,

You need to run ManifoldCF out of one of the example-proprietary directories in order for it to pick up the mysql jar in the classpath.

Thanks,
Karl

On Mon, Sep 15, 2014 at 9:32 AM, Bisonti Mario mario.biso...@vimar.com wrote:

Hello.

I tried to set up a MySQL repository connection but I get the error mentioned. I put mysql-connector-java-5.1.32-bin.jar in the /apache-manifoldcf-1.7/connector-lib-proprietary and /apache-manifoldcf-1.7/lib-proprietary folders, but I get the error in the subject line.

What could I check?

Thanks a lot
Mario
Re: Connection status:Threw exception: 'Driver class not found: com.mysql.jdbc.Driver'
Yes, it works, and “ ” aren’t necessary.

Note this: from the MySQL client,

SELECT command_id AS $(IDCOLUMN) FROM icinga_commands

does not work, whereas

SELECT command_id AS “$(IDCOLUMN)” FROM icinga_commands

works.

So it seems that “ ” are necessary, but when I use them inside ManifoldCF it doesn’t work with “ ”.

Mario

From: Karl Wright [mailto:daddy...@gmail.com]
Sent: Tuesday, September 16, 2014 13:50
To: user@manifoldcf.apache.org
Subject: Re: Connection status:Threw exception: 'Driver class not found: com.mysql.jdbc.Driver'

Hi Mario,

What's happening is that the JDBC connector cannot find the proper column in the resultset. Can you do the following in the mysql client:

SELECT command_id AS lcf__id FROM icinga_commands

Please let me know what the returned columns are. If there is not a column that precisely matches lcf__id then that explains the error.

Karl

On Tue, Sep 16, 2014 at 7:41 AM, Bisonti Mario mario.biso...@vimar.com wrote:

Yes, but I obtained the same error with:

SELECT command_id AS “$(IDCOLUMN)” FROM icinga_commands

I tried the query

SELECT command_id AS $(IDCOLUMN) FROM icinga_commands

from a MySQL client and it works.

From: Karl Wright [mailto:daddy...@gmail.com]
Sent: Tuesday, September 16, 2014 12:17
To: user@manifoldcf.apache.org
Subject: Re: Connection status:Threw exception: 'Driver class not found: com.mysql.jdbc.Driver'

Hi Mario,

Did you try putting quotes in your query around $(IDCOLUMN) as it suggests? For some databases this is necessary to preserve case properly.

Karl

On Tue, Sep 16, 2014 at 4:53 AM, Bisonti Mario mario.biso...@vimar.com wrote:

When I start a test job to extract a table I obtain:

Error: Bad seed query; doesn't return $(IDCOLUMN) column. Try using quotes around $(IDCOLUMN) variable, e.g. "$(IDCOLUMN)".

My configuration:

Seeding query:
SELECT command_id AS $(IDCOLUMN) FROM icinga_commands

Data query:
SELECT command_id AS $(IDCOLUMN), command_line AS $(URLCOLUMN), object_id AS $(DATACOLUMN) FROM icinga_commands where command_id IN $(IDLIST)

What could I check?

Thanks a lot
Mario

From: Bisonti Mario
Sent: Tuesday, September 16, 2014 09:58
To: 'user@manifoldcf.apache.org'
Subject: Re: Connection status:Threw exception: 'Driver class not found: com.mysql.jdbc.Driver'

Thanks a lot! The connection is now working!

Mario

From: Karl Wright [mailto:daddy...@gmail.com]
Sent: Monday, September 15, 2014 16:26
To: user@manifoldcf.apache.org
Subject: Re: Connection status:Threw exception: 'Driver class not found: com.mysql.jdbc.Driver'

Hi Mario,

You can download the -src and -lib distribution, and then run ant make-deps build, and you should be able to use proprietary MySQL database connections.

Thanks,
Karl

On Mon, Sep 15, 2014 at 10:24 AM, Bisonti Mario mario.biso...@vimar.com wrote:

I understand. In fact, I don’t have example-proprietary because I use a binary version of ManifoldCF, so I can’t use MySQL as a repository connection.

Thanks a lot.

From: Karl Wright [mailto:daddy...@gmail.com]
Sent: Monday, September 15, 2014 16:21
To: user@manifoldcf.apache.org
Subject: Re: Connection status:Threw exception: 'Driver class not found: com.mysql.jdbc.Driver'

Hi Mario,

cd /usr/share/manifoldcf/example-proprietary
sudo java -jar start.jar

If you do not have example-proprietary, it is because you did not actually build ManifoldCF yourself. In order to use MySQL as a backend, you must build ManifoldCF yourself.

Thanks,
Karl

On Mon, Sep 15, 2014 at 10:14 AM, Bisonti Mario mario.biso...@vimar.com wrote:

I am in /usr/share/manifoldcf/example/ and I execute:

sudo java -jar start.jar

whereas mysql-connector-java-5.1.32-bin.jar is in /usr/share/manifoldcf/connector-lib-proprietary/

So, how should I run ManifoldCF?

Excuse me, but I am not a Linux expert…

Thanks a lot.

From: Karl Wright [mailto:daddy...@gmail.com]
Sent: Monday, September 15, 2014 16:04
To: user@manifoldcf.apache.org
Subject: Re: Connection status:Threw exception: 'Driver class not found: com.mysql.jdbc.Driver'

Hi Mario,

You need to run ManifoldCF out of one of the example-proprietary directories in order for it to pick up the mysql jar in the classpath.

Thanks,
Karl

On Mon, Sep 15, 2014 at 9:32 AM, Bisonti Mario mario.biso...@vimar.com wrote:

Hello.

I tried to set up a MySQL repository connection but I get the error mentioned. I put mysql-connector-java-5.1.32-bin.jar in the /apache-manifoldcf-1.7/connector-lib-proprietary and /apache-manifoldcf-1.7/lib-proprietary folders, but I get the error in the subject line.

What could I check?

Thanks a lot
Mario
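To compare like with like in the mysql client, it may help to expand the placeholder by hand first, since the client itself knows nothing about $(IDCOLUMN). A minimal sketch, assuming the placeholder stands for lcf__id as Karl's test query implies; expand_idcolumn is a hypothetical stand-in and not the connector's actual code. The query is kept in single quotes so that the shell does not try to run $(IDCOLUMN) as a command.

```shell
#!/bin/sh
# Replace the $(IDCOLUMN) placeholder with the column name Karl mentions
# (hypothetical helper; $1 = query text).
expand_idcolumn() {
  printf '%s\n' "$1" | sed 's/\$(IDCOLUMN)/lcf__id/g'
}

expand_idcolumn 'SELECT command_id AS $(IDCOLUMN) FROM icinga_commands'
# prints: SELECT command_id AS lcf__id FROM icinga_commands
```

The quotes in the messages above appear as typographic quotes (“ ”), which MySQL does not treat as quoting characters; whether that comes from the mail client or from what was actually typed into the job definition cannot be told from the thread.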
Re: Connection status:Threw exception: 'Driver class not found: com.mysql.jdbc.Driver'
Hi Mario,

I've created CONNECTORS-1032 to track your issue.

MySQL queries have worked fine in the past against MySQL 5.5. So I suggest that you try single-quotes, and if that does not work either, we're going to need more information and some debugging time.

First -- what version of MySQL is this? Second, what version of MCF are you working with? I will propose a debugging output patch that will let us see what column names the JDBC query is returning if I have that information. Please attach it as a comment to the ticket.

Thanks,
Karl

On Tue, Sep 16, 2014 at 7:59 AM, Bisonti Mario mario.biso...@vimar.com wrote:

Yes, it works, and “ ” aren’t necessary.

Note this: from the MySQL client,

SELECT command_id AS $(IDCOLUMN) FROM icinga_commands

does not work, whereas

SELECT command_id AS “$(IDCOLUMN)” FROM icinga_commands

works.

So it seems that “ ” are necessary, but when I use them inside ManifoldCF it doesn’t work with “ ”.

Mario

From: Karl Wright [mailto:daddy...@gmail.com]
Sent: Tuesday, September 16, 2014 13:50
To: user@manifoldcf.apache.org
Subject: Re: Connection status:Threw exception: 'Driver class not found: com.mysql.jdbc.Driver'

Hi Mario,

What's happening is that the JDBC connector cannot find the proper column in the resultset. Can you do the following in the mysql client:

SELECT command_id AS lcf__id FROM icinga_commands

Please let me know what the returned columns are. If there is not a column that precisely matches lcf__id then that explains the error.

Karl

On Tue, Sep 16, 2014 at 7:41 AM, Bisonti Mario mario.biso...@vimar.com wrote:

Yes, but I obtained the same error with:

SELECT command_id AS “$(IDCOLUMN)” FROM icinga_commands

I tried the query

SELECT command_id AS $(IDCOLUMN) FROM icinga_commands

from a MySQL client and it works.

From: Karl Wright [mailto:daddy...@gmail.com]
Sent: Tuesday, September 16, 2014 12:17
To: user@manifoldcf.apache.org
Subject: Re: Connection status:Threw exception: 'Driver class not found: com.mysql.jdbc.Driver'

Hi Mario,

Did you try putting quotes in your query around $(IDCOLUMN) as it suggests? For some databases this is necessary to preserve case properly.

Karl

On Tue, Sep 16, 2014 at 4:53 AM, Bisonti Mario mario.biso...@vimar.com wrote:

When I start a test job to extract a table I obtain:

Error: Bad seed query; doesn't return $(IDCOLUMN) column. Try using quotes around $(IDCOLUMN) variable, e.g. "$(IDCOLUMN)".

My configuration:

Seeding query:
SELECT command_id AS $(IDCOLUMN) FROM icinga_commands

Data query:
SELECT command_id AS $(IDCOLUMN), command_line AS $(URLCOLUMN), object_id AS $(DATACOLUMN) FROM icinga_commands where command_id IN $(IDLIST)

What could I check?

Thanks a lot
Mario

From: Bisonti Mario
Sent: Tuesday, September 16, 2014 09:58
To: 'user@manifoldcf.apache.org'
Subject: Re: Connection status:Threw exception: 'Driver class not found: com.mysql.jdbc.Driver'

Thanks a lot! The connection is now working!

Mario

From: Karl Wright [mailto:daddy...@gmail.com]
Sent: Monday, September 15, 2014 16:26
To: user@manifoldcf.apache.org
Subject: Re: Connection status:Threw exception: 'Driver class not found: com.mysql.jdbc.Driver'

Hi Mario,

You can download the -src and -lib distribution, and then run ant make-deps build, and you should be able to use proprietary MySQL database connections.

Thanks,
Karl

On Mon, Sep 15, 2014 at 10:24 AM, Bisonti Mario mario.biso...@vimar.com wrote:

I understand. In fact, I don’t have example-proprietary because I use a binary version of ManifoldCF, so I can’t use MySQL as a repository connection.

Thanks a lot.

From: Karl Wright [mailto:daddy...@gmail.com]
Sent: Monday, September 15, 2014 16:21
To: user@manifoldcf.apache.org
Subject: Re: Connection status:Threw exception: 'Driver class not found: com.mysql.jdbc.Driver'

Hi Mario,

cd /usr/share/manifoldcf/example-proprietary
sudo java -jar start.jar

If you do not have example-proprietary, it is because you did not actually build ManifoldCF yourself. In order to use MySQL as a backend, you must build ManifoldCF yourself.

Thanks,
Karl

On Mon, Sep 15, 2014 at 10:14 AM, Bisonti Mario mario.biso...@vimar.com wrote:

I am in /usr/share/manifoldcf/example/ and I execute:

sudo java -jar start.jar

whereas mysql-connector-java-5.1.32-bin.jar is in /usr/share/manifoldcf/connector-lib-proprietary/

So, how should I run ManifoldCF?

Excuse me, but I am not a Linux expert…

Thanks a lot.

From: Karl Wright [mailto:daddy...@gmail.com]
Sent: Monday, September 15, 2014 16:04
To: user@manifoldcf.apache.org
Subject: Re: Connection status:Threw exception: 'Driver class not found: com.mysql.jdbc.Driver'

Hi Mario,

You need to run ManifoldCF out of one of the example-proprietary directories
Re: Zookeeper configured MCF not working in production mode
Believe it or not, I was able to reproduce this here with a crawl of 10 documents. I get this in the Zookeeper server-side log, hundreds of times:

[SyncThread:0] ERROR org.apache.zookeeper.server.NIOServerCnxn - Unexpected Exception:
java.nio.channels.CancelledKeyException
    at sun.nio.ch.SelectionKeyImpl.ensureValid(SelectionKeyImpl.java:73)
    at sun.nio.ch.SelectionKeyImpl.interestOps(SelectionKeyImpl.java:77)
    at org.apache.zookeeper.server.NIOServerCnxn.sendBuffer(NIOServerCnxn.java:153)
    at org.apache.zookeeper.server.NIOServerCnxn.sendResponse(NIOServerCnxn.java:1076)
    at org.apache.zookeeper.server.FinalRequestProcessor.processRequest(FinalRequestProcessor.java:170)
    at org.apache.zookeeper.server.SyncRequestProcessor.flush(SyncRequestProcessor.java:167)
    at org.apache.zookeeper.server.SyncRequestProcessor.run(SyncRequestProcessor.java:101)
[SyncThread:0] ERROR org.apache.zookeeper.server.NIOServerCnxn - Unexpected Exception:
java.nio.channels.CancelledKeyException
    at sun.nio.ch.SelectionKeyImpl.ensureValid(SelectionKeyImpl.java:73)
    at sun.nio.ch.SelectionKeyImpl.interestOps(SelectionKeyImpl.java:77)
    at org.apache.zookeeper.server.NIOServerCnxn.sendBuffer(NIOServerCnxn.java:153)
    at org.apache.zookeeper.server.NIOServerCnxn.sendResponse(NIOServerCnxn.java:1076)
    at org.apache.zookeeper.server.FinalRequestProcessor.processRequest(FinalRequestProcessor.java:170)
    at org.apache.zookeeper.server.SyncRequestProcessor.flush(SyncRequestProcessor.java:167)
    at org.apache.zookeeper.server.SyncRequestProcessor.run(SyncRequestProcessor.java:101)

... and then everything locks up.

I have no idea what is happening; seems to be an NIO exception ZooKeeper is not expecting.

Karl

On Tue, Sep 16, 2014 at 7:52 AM, Erlend Garåsen e.f.gara...@usit.uio.no wrote:

Ouch, I forgot to place the Zookeeper logs on the web. Since they do not include timestamps and I have restarted MCF after a few changes, I guess it will be difficult to get the relevant lines. I'll do that next time it hangs, probably at the end of the day.

I will add the new Zookeeper configuration settings as Lalit suggested next time I'm restarting MCF.

> How many worker threads are you using? How many documents (about) do you crawl before things hang?

Throttling - max connections: 30
Throttling - max fetches/min: 100
Bandwidth - max connections: 25
Bandwidth - max kbytes/sec: 8000
Bandwidth - max fetches/min: 20

I have four jobs configured. The one I'm running now has 100,000 documents configured. In total, around 110,000 documents for all four jobs. I guess there are more documents involved since the largest job excludes a lot of documents based on sophisticated and complex filtering rules. Maybe 50% more even though they are not added to Solr (but they are of course fetched).

Erlend

> You may also want to try to increase the parameter: maxClientCnxns in zookeeper.cfg to something bigger, if you have a lot of worker threads. I'm thinking 1000 or some such. See if it makes a difference for you.

I'll try that at next restart.

Erlend
Re: Connection status:Threw exception: 'Driver class not found: com.mysql.jdbc.Driver'
MySQL version: '5.5.38'
ManifoldCF: 1.7

The problem still remains with single quotes ' '.

How can I attach a comment to the ticket, please?

Mario

From: Karl Wright [mailto:daddy...@gmail.com]
Sent: Tuesday, 16 September 2014 14:05
To: user@manifoldcf.apache.org
Subject: Re: Connection status:Threw exception: 'Driver class not found: com.mysql.jdbc.Driver'

Hi Mario,

I've created CONNECTORS-1032 to track your issue. MySQL queries have worked fine in the past against MySQL 5.5, so I suggest that you try single quotes. If that does not work either, we're going to need more information and some debugging time.

First, what version of MySQL is this? Second, what version of MCF are you working with? If I have that information, I will propose a debugging output patch that will let us see which column names the JDBC query is returning. Please attach it as a comment to the ticket.

Thanks,
Karl

On Tue, Sep 16, 2014 at 7:59 AM, Bisonti Mario mario.biso...@vimar.com wrote:

Yes, it works, and " " aren't necessary.

Note this: from the MySQL client,

    SELECT command_id AS $(IDCOLUMN) FROM icinga_commands

does not work, whereas

    SELECT command_id AS "$(IDCOLUMN)" FROM icinga_commands

works.

So it seems that " " are necessary, but when I use them inside ManifoldCF it doesn't work with " ".

Mario

From: Karl Wright [mailto:daddy...@gmail.com]
Sent: Tuesday, 16 September 2014 13:50
To: user@manifoldcf.apache.org
Subject: Re: Connection status:Threw exception: 'Driver class not found: com.mysql.jdbc.Driver'

Hi Mario,

What's happening is that the JDBC connector cannot find the proper column in the result set. Can you run the following in the mysql client:

    SELECT command_id AS lcf__id FROM icinga_commands

Please let me know what the returned columns are. If there is no column that precisely matches lcf__id, then that explains the error.

Karl

On Tue, Sep 16, 2014 at 7:41 AM, Bisonti Mario mario.biso...@vimar.com wrote:

Yes, but I obtained the same error with:

    SELECT command_id AS "$(IDCOLUMN)" FROM icinga_commands

I tried the query

    SELECT command_id AS $(IDCOLUMN) FROM icinga_commands

from a MySQL client and it works.

From: Karl Wright [mailto:daddy...@gmail.com]
Sent: Tuesday, 16 September 2014 12:17
To: user@manifoldcf.apache.org
Subject: Re: Connection status:Threw exception: 'Driver class not found: com.mysql.jdbc.Driver'

Hi Mario,

Did you try putting quotes in your query around $(IDCOLUMN), as it suggests? For some databases this is necessary to preserve case properly.

Karl

On Tue, Sep 16, 2014 at 4:53 AM, Bisonti Mario mario.biso...@vimar.com wrote:

When I start a test job to extract a table, I obtain:

    Error: Bad seed query; doesn't return $(IDCOLUMN) column. Try using quotes around $(IDCOLUMN) variable, e.g. "$(IDCOLUMN)".

My configuration:

Seeding query:

    SELECT command_id AS $(IDCOLUMN) FROM icinga_commands

Data query:

    SELECT command_id AS $(IDCOLUMN), command_line AS $(URLCOLUMN), object_id AS $(DATACOLUMN) FROM icinga_commands WHERE command_id IN $(IDLIST)

What could I check?

Thanks a lot,
Mario

From: Bisonti Mario
Sent: Tuesday, 16 September 2014 09:58
To: 'user@manifoldcf.apache.org'
Subject: Re: Connection status:Threw exception: 'Driver class not found: com.mysql.jdbc.Driver'

Thanks a lot! The connection is now working!

Mario

From: Karl Wright [mailto:daddy...@gmail.com]
Sent: Monday, 15 September 2014 16:26
To: user@manifoldcf.apache.org
Subject: Re: Connection status:Threw exception: 'Driver class not found: com.mysql.jdbc.Driver'

Hi Mario,

You can download the -src and -lib distributions and then run ant make-deps build, and you should be able to use proprietary MySQL database connections.

Thanks,
Karl

On Mon, Sep 15, 2014 at 10:24 AM, Bisonti Mario mario.biso...@vimar.com wrote:

I understand. In fact, I don't have example-proprietary because I use a binary version of ManifoldCF, so I can't use MySQL as a repository connection.

Thanks a lot.

From: Karl Wright [mailto:daddy...@gmail.com]
Sent: Monday, 15 September 2014 16:21
To: user@manifoldcf.apache.org
Subject: Re: Connection status:Threw exception: 'Driver class not found: com.mysql.jdbc.Driver'

Hi Mario,

    cd /usr/share/manifoldcf/example-proprietary
    sudo java -jar start.jar

If you do not have example-proprietary, it is because you did not actually build ManifoldCF yourself. In order to use MySQL as a backend, you must build ManifoldCF yourself.

Thanks,
Karl

On Mon, Sep 15, 2014 at 10:14 AM, Bisonti Mario mario.biso...@vimar.com wrote:

I am in /usr/share/manifoldcf/example/ and I execute:

    sudo java -jar start.jar
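
The check Karl describes in this thread, that the seeding query must return a column whose name precisely matches what $(IDCOLUMN) expands to, can be sketched roughly as follows. This is only an illustration under stated assumptions: Python's built-in sqlite3 stands in for MySQL and for the MCF JDBC connector, the helpers `expand` and `returned_columns` and the sample row are hypothetical, and the name lcf__id is taken from Karl's test query above. MySQL and its JDBC driver may report column names differently, which is exactly what the proposed debugging patch is meant to reveal.

```python
import sqlite3

ID_COLUMN = "lcf__id"  # the name Karl's test query uses in place of $(IDCOLUMN)


def expand(template):
    """Substitute the $(IDCOLUMN) variable (hypothetical helper)."""
    return template.replace("$(IDCOLUMN)", ID_COLUMN)


def returned_columns(conn, query):
    """Return the column names the driver reports for a query (hypothetical helper)."""
    cursor = conn.execute(query)
    return [col[0] for col in cursor.description]


# In-memory SQLite stands in for the MySQL icinga_commands table (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE icinga_commands (command_id INTEGER, command_line TEXT)")
conn.execute("INSERT INTO icinga_commands VALUES (1, 'check_ping')")

seeding = expand('SELECT command_id AS "$(IDCOLUMN)" FROM icinga_commands')
columns = returned_columns(conn, seeding)

# An exact, case-sensitive match is what the seeding step needs to find.
print(columns, ID_COLUMN in columns)

# A differently cased alias fails the exact comparison.
upper = returned_columns(conn, 'SELECT command_id AS "LCF__ID" FROM icinga_commands')
print(upper, ID_COLUMN in upper)
```

Running the same comparison against the real database, with the real driver, is the quickest way to see which name actually comes back.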
Re: Connection status:Threw exception: 'Driver class not found: com.mysql.jdbc.Driver'
Hi Mario,

Register and log in, and then you can attach a comment.

Karl

On Tue, Sep 16, 2014 at 8:34 AM, Bisonti Mario mario.biso...@vimar.com wrote:

MySQL version: '5.5.38'
ManifoldCF: 1.7

The problem still remains with single quotes ' '.

How can I attach a comment to the ticket, please?

Mario
Re: Zookeeper configured MCF not working in production mode
After some research, I found that increasing the tick time in zookeeper.cfg from 2000 to 5000 makes this problem go away for me.

Clearly we still have an issue with resetting ZooKeeper connections after tick timeout failures. The connections are reset, but their state is somehow incorrect. I'll need to do more research to figure out how this can be addressed. In the interim, increasing the tick time seems to be a reasonable workaround.

Thanks,
Karl

On Tue, Sep 16, 2014 at 8:14 AM, Karl Wright daddy...@gmail.com wrote:

Believe it or not, I was able to reproduce this here with a crawl of 10 documents. I get this in the ZooKeeper server-side log, hundreds of times:

    [SyncThread:0] ERROR org.apache.zookeeper.server.NIOServerCnxn - Unexpected Exception:
    java.nio.channels.CancelledKeyException
        at sun.nio.ch.SelectionKeyImpl.ensureValid(SelectionKeyImpl.java:73)
        at sun.nio.ch.SelectionKeyImpl.interestOps(SelectionKeyImpl.java:77)
        at org.apache.zookeeper.server.NIOServerCnxn.sendBuffer(NIOServerCnxn.java:153)
        at org.apache.zookeeper.server.NIOServerCnxn.sendResponse(NIOServerCnxn.java:1076)
        at org.apache.zookeeper.server.FinalRequestProcessor.processRequest(FinalRequestProcessor.java:170)
        at org.apache.zookeeper.server.SyncRequestProcessor.flush(SyncRequestProcessor.java:167)
        at org.apache.zookeeper.server.SyncRequestProcessor.run(SyncRequestProcessor.java:101)
    [SyncThread:0] ERROR org.apache.zookeeper.server.NIOServerCnxn - Unexpected Exception:
    java.nio.channels.CancelledKeyException
        at sun.nio.ch.SelectionKeyImpl.ensureValid(SelectionKeyImpl.java:73)
        at sun.nio.ch.SelectionKeyImpl.interestOps(SelectionKeyImpl.java:77)
        at org.apache.zookeeper.server.NIOServerCnxn.sendBuffer(NIOServerCnxn.java:153)
        at org.apache.zookeeper.server.NIOServerCnxn.sendResponse(NIOServerCnxn.java:1076)
        at org.apache.zookeeper.server.FinalRequestProcessor.processRequest(FinalRequestProcessor.java:170)
        at org.apache.zookeeper.server.SyncRequestProcessor.flush(SyncRequestProcessor.java:167)
        at org.apache.zookeeper.server.SyncRequestProcessor.run(SyncRequestProcessor.java:101)

... and then everything locks up. I have no idea what is happening; it seems to be an NIO exception that ZooKeeper is not expecting.

Karl

On Tue, Sep 16, 2014 at 7:52 AM, Erlend Garåsen e.f.gara...@usit.uio.no wrote:

Ouch, I forgot to place the ZooKeeper logs on the web. Since they do not include timestamps and I have restarted MCF after a few changes, I guess it will be difficult to get the relevant lines. I'll do that the next time it hangs, probably at the end of the day.

I will add the new ZooKeeper configuration settings, as Lalit suggested, the next time I restart MCF.

> How many worker threads are you using? How many documents (about) do you crawl before things hang?

Throttling - max connections: 30
Throttling - max fetches/min: 100
Bandwidth - max connections: 25
Bandwidth - max kbytes/sec: 8000
Bandwidth - max fetches/min: 20

I have four jobs configured. The one I'm running now has 100,000 documents configured. In total there are around 110,000 documents for all four jobs. I guess there are more documents involved, since the largest job excludes a lot of documents based on sophisticated and complex filtering rules. Maybe 50% more, even though they are not added to Solr (but they are of course fetched).

Erlend

> You may also want to try to increase the parameter maxClientCnxns in zookeeper.cfg to something bigger, if you have a lot of worker threads. I'm thinking 1000 or some such. See if it makes a difference for you.

I'll try that at the next restart.

Erlend
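
The two configuration changes discussed in this thread (tickTime raised from 2000 to 5000, and maxClientCnxns raised to something like 1000) can be applied to a properties-style file with a small script. The following is a minimal sketch, not an official tool: `apply_overrides` is a hypothetical helper, and the sample file contents (dataDir, clientPort) are illustrative assumptions rather than anyone's actual configuration.

```python
def apply_overrides(cfg_text, overrides):
    """Return cfg_text with key=value settings replaced, or appended if absent."""
    remaining = dict(overrides)
    out = []
    for line in cfg_text.splitlines():
        key = line.split("=", 1)[0].strip()
        is_setting = "=" in line and not line.lstrip().startswith("#")
        if is_setting and key in remaining:
            out.append(f"{key}={remaining.pop(key)}")
        else:
            out.append(line)
    # Settings that were not already present are appended at the end.
    out.extend(f"{key}={value}" for key, value in remaining.items())
    return "\n".join(out) + "\n"


# Illustrative contents only; a real zookeeper.cfg will differ.
SAMPLE_CFG = """\
# sample zookeeper.cfg (contents are illustrative)
tickTime=2000
dataDir=/var/lib/zookeeper
clientPort=2181
"""

updated = apply_overrides(SAMPLE_CFG, {"tickTime": 5000, "maxClientCnxns": 1000})
print(updated)
```

The ZooKeeper servers have to be restarted for changed settings to take effect, which matches Erlend's plan to apply them at the next restart.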
Re: Getting errors in zookeeper logs
Hi Lalit,

Please see my email about increasing the value of the tick interval significantly. I think this will help a lot. There are still issues that I need to deal with, but you may be able to succeed in the interim with that one change. Hopefully I'll also have a code fix, but that may take longer.

Thanks,
Karl

On Tue, Sep 16, 2014 at 6:33 AM, lalit jangra lalit.j.jan...@gmail.com wrote:

Sorry Karl,

It's a typo; the actual values are export JVMFLAGS=-Xms1024m -Xmx1024m.

Regards.

On Tue, Sep 16, 2014 at 3:50 PM, Karl Wright daddy...@gmail.com wrote:

Hi Lalit,

I believe there is no space between -Xmx and 1024m: -Xmx1024m. Same with -Xms.

Karl

On Tue, Sep 16, 2014 at 4:25 AM, lalit jangra lalit.j.jan...@gmail.com wrote:

Greetings,

I updated the ZooKeeper Java heap settings by adding java/.conf under the zookeeper/conf folder, added the line below on all six ZooKeeper nodes, and restarted.

    export JVMFLAGS=-Xms 1024m -Xmx 1024m

I can still see ZooKeeper connection resets while starting the agent, and my crawl is stuck. Please suggest.

Is there any way to read the ZooKeeper logs, as they are in binary format?

Regards.

On Mon, Sep 15, 2014 at 11:58 PM, lalit jangra lalit.j.jan...@gmail.com wrote:

Thanks Karl,

I am running the ZooKeeper instances using the zkServer.sh script, and I will try your suggestions.

Regards.

On Mon, Sep 15, 2014 at 10:48 PM, Karl Wright daddy...@gmail.com wrote:

If you are running a batch/shell script to start ZooKeeper, have a look at the script you are running. I am sure there is a way to include an environment variable that controls the amount of memory, or at least the Java options. The Java option you'd include would be something like -Xmx500m (for 500 megabytes) or -Xmx1g (for 1 gigabyte), etc.

Karl

On Mon, Sep 15, 2014 at 1:16 PM, Karl Wright daddy...@gmail.com wrote:

How are you starting your ZooKeeper instances?

Karl

On Mon, Sep 15, 2014 at 1:14 PM, lalit jangra lalit.j.jan...@gmail.com wrote:

Thanks Karl,

After updating the configuration, I am still hitting the same ZooKeeper connection reset issue. I was trying to assign memory to the ZooKeeper instances, but I could not see any way to do so. Can you suggest a way? What else can I do?

Regards.

On Mon, Sep 15, 2014 at 10:39 PM, Karl Wright daddy...@gmail.com wrote:

Hi Lalit,

If you have more than one Java process with unspecified memory settings, EACH ONE will allocate 25% of available memory by default. So you will have to do more than just free up some MCF memory to get this to work.

Karl

On Mon, Sep 15, 2014 at 12:29 PM, lalit jangra lalit.j.jan...@gmail.com wrote:

Thanks Karl,

I think this is the reason why my ZooKeeper nodes are resetting connections: instability. What I will try in the meantime is to reduce MCF memory to 1.5G and leave the rest unassigned, so that leaves 5.5G for Java itself, more than the 25% rule, and see if it works.

I also checked the ZooKeeper documentation, but I could not take any specific inputs from it.

Regards.

On Mon, Sep 15, 2014 at 9:52 PM, Karl Wright daddy...@gmail.com wrote:

Hi Lalit,

I can't speak for Solr's memory consumption, but you absolutely need to give Solr enough memory to avoid OOM errors, or things will not work properly.

As for MCF, 3G is more than enough; you could probably give it 1G and be fine.

For ZooKeeper, remember that it is a Java process. On 64-bit Unix machines, Java by default takes 25% of the total system memory. I would look at their documentation to figure out what they need and assign precisely that amount; otherwise zk will obviously not be stable.

Thanks,
Karl

On Mon, Sep 15, 2014 at 12:17 PM, lalit jangra lalit.j.jan...@gmail.com wrote:

Hi Karl,

Out of 12G, I have assigned 5G to Solr, as I could see a lot of out-of-memory errors and Java heap space issues while crawling large jobs; after that it seems to be OK. I have also assigned 3G to MCF, where it is quite comfortable. The remaining 4G, I am assuming, is enough for the OS and the ZooKeeper nodes.

I am currently running a job for 35K documents, and I can see more than 500MB of memory free.

Any thoughts?

Regards.

On Mon, Sep 15, 2014 at 8:45 PM, Karl Wright daddy...@gmail.com wrote:

Hi Lalit,

The best way in Java to assess memory usage is to turn on verbose JVM garbage collection output. Then you can see how often the system garbage collects, and whether post-GC usage grows over time.

12G should be more than enough, so if you find you are running into memory limits with that configuration, it would be worth trying to figure out why that is happening.

Karl

On Mon, Sep 15, 2014 at 11:04 AM, lalit jangra lalit.j.jan...@gmail.com wrote:

Hi Karl,

Could I be seeing ZooKeeper connection reset messages because the system is running at the top of its memory limits? I have 12G of RAM and can see it is using 11.5G while the job is running. Is there any way i should ascertain memory to zookeeper nodes if
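
The spacing problem Karl points out in the JVMFLAGS line comes down to how a shell splits a line into words. The sketch below uses Python's shlex module, which only approximates POSIX shell word splitting, to show the three variants. The quoted form at the end is an assumption on my part: the thread never shows quotes, and it is not known whether the original file had them and the mail archive dropped them.

```python
import shlex


def tokens(command_line):
    """Split a command line into words using POSIX-style quoting rules."""
    return shlex.split(command_line)


# Space after -Xms / -Xmx: the sizes become separate words.
with_spaces = tokens("export JVMFLAGS=-Xms 1024m -Xmx 1024m")
print(with_spaces)

# No space, but unquoted: only the first flag is part of the assignment.
unquoted = tokens("export JVMFLAGS=-Xms1024m -Xmx1024m")
print(unquoted)

# Quoted (assumed form): both flags stay inside the single JVMFLAGS value.
quoted = tokens('export JVMFLAGS="-Xms1024m -Xmx1024m"')
print(quoted)
```

Only in the quoted variant does the whole string "-Xms1024m -Xmx1024m" end up as the value assigned to JVMFLAGS, so it is worth checking the actual file on each node for quotes as well as for stray spaces.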
Re: Web crawling , robots.txt and access credentials
Hi Mario,

I looked at your robots.txt. In its current form, it should disallow EVERYTHING from your site. The reason is that some of your paths start with /, but the allow clauses do not.

As for why MCF is letting files through, I suspect that this is because MCF caches robots data. If you changed the file and expected MCF to pick that up immediately, it won't. The cached copy expires after, I believe, 1 hour. It's kept in the database, so even if you recycle the agents process it won't purge the cache.

Karl

On Tue, Sep 16, 2014 at 11:44 AM, Karl Wright daddy...@gmail.com wrote:

Authentication never bypasses robots. You will want to turn on connector debug logging to see the decisions that the web connector is making with respect to which documents are fetched or not fetched, and why.

Karl

On Tue, Sep 16, 2014 at 11:04 AM, Bisonti Mario mario.biso...@vimar.com wrote:

Hello.

I would like to crawl some documents in a subfolder of a web site: http://aaa.bb.com/

The structure is:

http://aaa.bb.com/ccc/folder1
http://aaa.bb.com/ccc/folder2
http://aaa.bb.com/ccc/folder3

Folder ccc and its subfolders are protected with Basic security (username: joe, password: p).

I want to permit the crawling of only some docs in folder1, so I put robots.txt at http://aaa.bb.com/ccc/robots.txt

The contents of the robots.txt file are:

    User-agent: *
    Disallow: /
    Allow: folder1/doc1.pdf
    Allow: folder1/doc2.pdf
    Allow: folder1/doc3.pdf

On MCF 1.7 I set up a web repository connection with "Obey robots.txt for all fetches", and under Access credentials: http://aaa.bb.com/ccc/ with Basic authentication: joe and ppp

When I create a job with:

Include in crawl: .*
Include in index: .*
Include only hosts matching seeds? X

and I start it, it crawls all the content of folder1, folder2, and folder3, instead of, as I expected, only:

http://aaa.bb.com/ccc/folder1/doc1.pdf
http://aaa.bb.com/ccc/folder1/doc2.pdf
http://aaa.bb.com/ccc/folder1/doc3.pdf

Why is this? Perhaps Basic authentication bypasses the "Obey robots.txt for all fetches" setting?

Thanks a lot for your help.

Mario
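
Karl's point about the missing leading slash can be illustrated with a simplified prefix matcher. This is a sketch under stated assumptions, not MCF's robots parser: `is_allowed` is a hypothetical function, it applies the longest-matching-prefix rule (with ties going to allow) as I understand it from the Robots Exclusion Protocol, and the corrected Allow paths below are my guess at what the absolute form would be.

```python
def is_allowed(path, rules):
    """Decide whether a URL path may be fetched.

    rules is a list of (kind, prefix) pairs, kind being "allow" or "disallow".
    The longest matching prefix wins; no matching rule means allowed.
    """
    best_kind, best_len = "allow", -1
    for kind, prefix in rules:
        if prefix and path.startswith(prefix):
            longer = len(prefix) > best_len
            tie_for_allow = len(prefix) == best_len and kind == "allow"
            if longer or tie_for_allow:
                best_kind, best_len = kind, len(prefix)
    return best_kind == "allow"


# Rules as posted: the Allow paths have no leading slash.
posted = [
    ("disallow", "/"),
    ("allow", "folder1/doc1.pdf"),
]

# Assumed correction: absolute paths in the Allow clauses.
corrected = [
    ("disallow", "/"),
    ("allow", "/ccc/folder1/doc1.pdf"),
]

doc = "/ccc/folder1/doc1.pdf"
print(is_allowed(doc, posted))       # the Allow prefix never matches a path starting with /
print(is_allowed(doc, corrected))    # the longer Allow prefix now beats Disallow: /
print(is_allowed("/ccc/folder2/x.pdf", corrected))
```

Because every request path starts with /, an Allow value such as folder1/doc1.pdf can never be a prefix of one, which leaves Disallow: / as the only matching rule.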
Re: Zookeeper configured MCF not working in production mode
I believe I've fixed the problem for real. There's a patch attached to the CONNECTORS-1031 ticket, which should be applicable to 1.7. The fix is already checked into the dev_1x branch, as well as trunk (which is MCF 2.0, so don't use that yet).

I also believe that we're going to need to make a 1.7.1 release that contains this fix, and others of similar importance.

Karl

On Tue, Sep 16, 2014 at 9:15 AM, Karl Wright daddy...@gmail.com wrote:

After some research, I found that increasing the tick time in zookeeper.cfg from 2000 to 5000 makes this problem go away for me.

Clearly we still have an issue with resetting ZooKeeper connections after tick timeout failures. The connections are reset, but their state is somehow incorrect. I'll need to do more research to figure out how this can be addressed. In the interim, increasing the tick time seems to be a reasonable workaround.

Thanks,
Karl