Yes, I am using DUCC v2.0.1. I have a three-node cluster with 32 GB, 40 GB, and 28 GB of RAM. The job runs fine for 15-20 days; after that it goes into an infinite loop with the same batch of IDs. We have a 75-minute cap for a job to complete; if it does not finish, it starts again, so every 75 minutes a new job starts, but with the same ID batch as the previous one. Not a single document gets ingested into the data store, and it stays in this state until we restart the server.

Is this because of DUCC v2.0.1? Does this version have a known bug like this?

Could this problem be caused by Java heap space?

Please suggest something, as there is nothing in the logs related to my problem.

Thanks and Regards
Priyank Sharma

On Friday 10 November 2017 09:00 PM, Eddie Epstein wrote:
Hi Priyank,

Looks like you are running DUCC v2.0.x. Many bugs have been fixed in
subsequent versions, the latest being v2.2.1. Newer versions have a
ducc_update command that will upgrade an existing install, but given all
the changes since v2.0.x I suggest a clean install.

Eddie

On Fri, Nov 10, 2017 at 12:11 AM, priyank sharma <[email protected]>
wrote:

There is nothing on the work item page or the performance page on the web
server. There is only one log file, for the main node, and no log files for
the other two nodes. The DUCC job processes are not able to pick up the data
from the data source, and no UIMA aggregator is working for those batches.

Could the issue be caused by Java heap space? We are giving 4 GB of RAM to
the job process.
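As a quick sanity check that the 4 GB setting actually reaches the job process, you can log the JVM's view of its own heap at startup. This is a generic JVM sketch, not DUCC-specific API; the class name is made up for illustration, and in practice the same two calls could go in a UIMA component's initialize() method:

```java
// Sketch: print the heap limits the JVM actually received, to confirm
// the job process really got the intended 4 GB (-Xmx4g).
// HeapCheck is a hypothetical class name used only for this example.
public class HeapCheck {
    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        long maxMb = rt.maxMemory() / (1024 * 1024);     // -Xmx ceiling
        long totalMb = rt.totalMemory() / (1024 * 1024); // currently allocated
        System.out.println("max heap (MB):   " + maxMb);
        System.out.println("total heap (MB): " + totalMb);
        // If maxMb is far below 4096, the -Xmx setting is not reaching
        // this process, and heap exhaustion becomes a likely suspect.
    }
}
```

If the printed maximum is much smaller than expected, the JVM arguments configured for the job are not being applied to the worker process.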

Attaching the Log file.

Thanks and Regards
Priyank Sharma

On Thursday 09 November 2017 04:33 PM, Lou DeGenaro wrote:

The first place to look is in your job's logs.  Visit the ducc-mon jobs
page ducchost:42133/jobs.jsp, then click on the id of your job.  Examine
the logs by clicking on each log file name, looking for any revealing
information.
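If it helps, here is one way to scan a job's log files in bulk for common memory-failure signatures instead of opening each file in the browser. The directory used below is a stand-in created only to make the sketch self-contained; point LOGDIR at the real log directory shown on your job's ducc-mon page:

```shell
# Sketch: grep a DUCC job's logs for memory-related errors.
# LOGDIR here is a temporary stand-in; substitute the job's actual
# log directory from the ducc-mon jobs page.
LOGDIR=$(mktemp -d)
echo "java.lang.OutOfMemoryError: Java heap space" > "$LOGDIR/jd.log"

# List every log file containing a memory-failure signature.
grep -l -E "OutOfMemoryError|GC overhead limit exceeded" "$LOGDIR"/*.log
```

Any file this prints is worth opening first; an OutOfMemoryError near the stuck batch would point at the heap rather than a DUCC bug.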

Feel free to post non-confidential snippets here, or if you'd like to chat
in real time we can use hipchat.

Lou.

On Thu, Nov 9, 2017 at 5:19 AM, priyank sharma <[email protected]>
wrote:

All!
I have a problem with a DUCC cluster in which a job process gets stuck
and keeps processing the same batch again and again. Because of the
maximum-duration limit, the batch gets the reason or extraordinary status
"CanceledByUser" and then gets restarted with the same IDs. This usually
happens after 15 to 20 days and goes away after restarting the DUCC
cluster. Looking at the data store used by the CAS consumer to ingest
data, the data for this batch never gets ingested, so most probably this
data is not being processed.

How can I check whether this data is being processed or not?

Are the resources the issue, and why does the data get processed after
restarting the cluster?

We have a three-node cluster with 32 GB, 40 GB, and 28 GB of RAM.



--
Thanks and Regards
Priyank Sharma



