Are you running with a shared file system on your cluster? Is your user log directory located there? Look at the DUCC daemon log files located in $DUCC_HOME/logs. They should provide some clues as to what is wrong. Feel free to post (non-confidential versions of) them here for a second opinion.
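As a starting point, something like the following sketch can scan those daemon logs for error-level messages. The install path `/opt/ducc` is only a fallback assumption; the actual daemon log file names vary by host and release, so the search is deliberately broad.

```shell
# Sketch: scan the DUCC daemon logs for recent errors.
# DUCC_HOME is assumed to point at your install; /opt/ducc is a guess.
LOGDIR="${DUCC_HOME:-/opt/ducc}/logs"

if [ -d "$LOGDIR" ]; then
    # Broad sweep for error-level lines and stack traces across all
    # daemon logs; show only the most recent matches.
    grep -rniE 'error|exception|shutdown' "$LOGDIR" | tail -n 50
else
    echo "log directory not found: $LOGDIR"
fi
```

If the grep turns up nothing, widening the pattern or checking the per-daemon logs individually is the next step.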
Lou.

On Fri, Nov 10, 2017 at 12:11 AM, priyank sharma <[email protected]> wrote:

> There is nothing on the work item page or the performance page on the web
> server. There is only one log file, for the main node; there are no log
> files for the other two nodes. The DUCC job processes are not able to pick
> up data from the data source, and no UIMA aggregator is working for those
> batches.
>
> Could the issue be caused by Java heap space? We are giving 4 GB of RAM to
> the job process.
>
> Attaching the log file.
>
> Thanks and Regards
> Priyank Sharma
>
> On Thursday 09 November 2017 04:33 PM, Lou DeGenaro wrote:
>
>> The first place to look is in your job's logs. Visit the ducc-mon jobs
>> page ducchost:42133/jobs.jsp, then click on the id of your job. Examine
>> the logs by clicking on each log file name, looking for any revealing
>> information.
>>
>> Feel free to post non-confidential snippets here, or if you'd like to
>> chat in real time we can use HipChat.
>>
>> Lou.
>>
>> On Thu, Nov 9, 2017 at 5:19 AM, priyank sharma <[email protected]> wrote:
>>
>>> All!
>>>
>>> I have a problem with our DUCC cluster in which a job process gets stuck
>>> and keeps processing the same batch again and again: after the maximum
>>> duration, the batch gets the reason/extraordinary status
>>> "CanceledByUser" and then gets restarted with the same IDs. This usually
>>> happens after 15 to 20 days and goes away after restarting the DUCC
>>> cluster. Looking at the data store that the CAS consumer ingests into,
>>> the data for this batch never gets ingested, so most probably this data
>>> is not being processed.
>>>
>>> How can I check whether this data is being processed or not?
>>>
>>> Are resources the issue, and why does the data get processed only after
>>> restarting the cluster?
>>>
>>> We have a three-node cluster with 32 GB, 40 GB, and 28 GB of RAM.
>>>
>>> --
>>> Thanks and Regards
>>> Priyank Sharma
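To confirm or rule out the heap-space theory above, one quick check is to grep the job's process logs for JVM heap-exhaustion messages. A minimal sketch, assuming the job's user logs live under `~/ducc/logs/<job-id>` (the path and the job id `12345` are hypothetical placeholders; use the log directory shown on the ducc-mon job page):

```shell
# Sketch: check a job's process logs for heap exhaustion.
# The path and job id below are placeholders -- substitute your own.
JOB_LOG_DIR="${HOME}/ducc/logs/12345"

if [ -d "$JOB_LOG_DIR" ]; then
    # These JVM messages confirm heap pressure; if they appear,
    # raise -Xmx in the job's JVM args rather than restarting the cluster.
    grep -rn -e 'OutOfMemoryError' -e 'GC overhead limit exceeded' "$JOB_LOG_DIR" \
        || echo "no heap-related errors in $JOB_LOG_DIR"
else
    echo "job log directory not found: $JOB_LOG_DIR"
fi
```

If no heap errors show up, the repeated "CanceledByUser" restarts are more likely a work-item timeout or scheduling issue than memory, and the daemon logs become the better place to look.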
