[jira] [Commented] (MAPREDUCE-4522) DBOutputFormat Times out on large batch inserts
[ https://issues.apache.org/jira/browse/MAPREDUCE-4522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15173168#comment-15173168 ]

Shyam Gavulla commented on MAPREDUCE-4522:
------------------------------------------

I also tested it by running org.apache.hadoop.examples.DBCountPageView.java; I didn't see any exceptions, and the test was successful.

> DBOutputFormat Times out on large batch inserts
> -----------------------------------------------
>
>                 Key: MAPREDUCE-4522
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4522
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: task-controller
>    Affects Versions: 0.20.205.0
>            Reporter: Nathan Jarus
>              Labels: newbie
>
> In DBRecordWriter#close(), progress is never updated. In large batch inserts,
> this can cause the reduce task to time out due to the amount of time it takes
> the SQL engine to process that insert.
> Potential solutions I can see:
> * Don't batch inserts; do the insert when DBRecordWriter#write() is called (awful)
> * Spin up a thread in DBRecordWriter#close() and update progress in that. (gross)
> I can provide code for either if you're interested.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
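The approach under test here — flushing the accumulated statements in fixed-size chunks and reporting progress between chunks, so the SQL engine's processing time no longer trips the task timeout — can be sketched as below. This is a minimal stand-in, not the actual patch: `executeBatch` and `reportProgress` are assumed placeholders for `PreparedStatement#executeBatch()` and the task context's `progress()` call.

```java
import java.util.function.IntConsumer;

public class ChunkedFlush {
    /**
     * Flushes totalRecords buffered statements in chunks of batchSize,
     * invoking executeBatch with each chunk's size and reportProgress
     * after every chunk (so the task timeout is reset between chunks).
     * Returns the number of chunks executed.
     */
    public static int flush(int totalRecords, int batchSize,
                            IntConsumer executeBatch, Runnable reportProgress) {
        int chunks = 0;
        int remaining = totalRecords;
        while (remaining > 0) {
            int n = Math.min(batchSize, remaining);
            executeBatch.accept(n);   // stand-in for PreparedStatement#executeBatch()
            reportProgress.run();     // stand-in for context.progress()
            remaining -= n;
            chunks++;
        }
        return chunks;
    }
}
```

With a batch size of 1000, a reducer emitting 2500 records would flush in three chunks and report progress three times, instead of blocking once in close() for the whole insert.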
[jira] [Commented] (MAPREDUCE-4522) DBOutputFormat Times out on large batch inserts
[ https://issues.apache.org/jira/browse/MAPREDUCE-4522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15173166#comment-15173166 ]

Shyam Gavulla commented on MAPREDUCE-4522:
------------------------------------------

[~ozawa] I made the configuration change. I added a property in mapred-default.xml:

  <property>
    <name>mapreduce.output.dboutputformat.batch-size</name>
    <value>1000</value>
    <description>The batch size of SQL statements that will be executed
    before reporting progress. Default is 1000.</description>
  </property>

and a constant in MRJobConfig.java:

  public static final String MR_DBOUTPUTFORMAT_BATCH_SIZE =
      "mapreduce.output.dboutputformat.batch-size";

Let me know if this is good and I will create a patch.
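The constant-plus-default pattern described in the comment above can be sketched standalone. This mirrors Hadoop's `Configuration#getInt(key, defaultValue)` lookup, but uses a plain `Map` in place of `Configuration` so the sketch is self-contained; the class name `BatchSizeConfig` is illustrative, not from the patch.

```java
import java.util.Map;

public class BatchSizeConfig {
    // Key as proposed for MRJobConfig.java
    public static final String MR_DBOUTPUTFORMAT_BATCH_SIZE =
        "mapreduce.output.dboutputformat.batch-size";

    // Default matching the mapred-default.xml value
    public static final int DEFAULT_BATCH_SIZE = 1000;

    /** Mimics Configuration#getInt(key, default): parse if set, else default. */
    public static int getBatchSize(Map<String, String> conf) {
        String v = conf.get(MR_DBOUTPUTFORMAT_BATCH_SIZE);
        return v == null ? DEFAULT_BATCH_SIZE : Integer.parseInt(v);
    }
}
```

Keeping the default in mapred-default.xml and the key in MRJobConfig follows the usual Hadoop convention, so users can override the batch size per job without code changes.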
[jira] [Commented] (MAPREDUCE-4522) DBOutputFormat Times out on large batch inserts
[ https://issues.apache.org/jira/browse/MAPREDUCE-4522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15172182#comment-15172182 ]

Shyam Gavulla commented on MAPREDUCE-4522:
------------------------------------------

Yes, I can make the batch size configurable. I will work on it.
[jira] [Commented] (MAPREDUCE-6622) Add capability to set JHS job cache to a task-based limit
[ https://issues.apache.org/jira/browse/MAPREDUCE-6622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15172155#comment-15172155 ]

Ray Chiang commented on MAPREDUCE-6622:
---------------------------------------

Good observation, Zhihai. Since it's an optional setting that's off by default, I'd be fine with adding it to the 2.6/2.7 line.

> Add capability to set JHS job cache to a task-based limit
> ---------------------------------------------------------
>
>                 Key: MAPREDUCE-6622
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6622
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: jobhistoryserver
>    Affects Versions: 2.7.2
>            Reporter: Ray Chiang
>            Assignee: Ray Chiang
>            Priority: Critical
>              Labels: supportability
>             Fix For: 2.9.0
>
>         Attachments: MAPREDUCE-6622.001.patch, MAPREDUCE-6622.002.patch,
> MAPREDUCE-6622.003.patch, MAPREDUCE-6622.004.patch, MAPREDUCE-6622.005.patch,
> MAPREDUCE-6622.006.patch, MAPREDUCE-6622.007.patch, MAPREDUCE-6622.008.patch,
> MAPREDUCE-6622.009.patch, MAPREDUCE-6622.010.patch, MAPREDUCE-6622.011.patch,
> MAPREDUCE-6622.012.patch, MAPREDUCE-6622.013.patch, MAPREDUCE-6622.014.patch
>
> When setting the property mapreduce.jobhistory.loadedjobs.cache.size, the
> cached jobs can be of varying size. This is generally not a problem when the
> job sizes are uniform or small, but when jobs can be very large (say, more
> than 250k tasks), the JHS heap can grow tremendously.
> In cases where multiple jobs are very large, the JHS can lock up and spend
> all its time in GC. However, since the cache is holding on to all the jobs,
> not much heap space can be freed.
> Since the total number of tasks loaded is directly proportional to the amount
> of heap used, a property that caps the number of tasks allowed in the cache
> should help prevent the JHS from locking up.
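The improvement described above amounts to bounding the loaded-jobs cache by total task count rather than by number of jobs, since heap use scales with tasks, not jobs. A minimal sketch of that idea using an LRU map with a task-count "weight" cap is below; the class, method names, and the choice to always retain the most recently inserted job even when it alone exceeds the cap are illustrative assumptions, not the actual JobHistoryServer code (which manages its cache differently).

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

public class TaskWeightedJobCache {
    private final long maxTotalTasks;
    private long totalTasks = 0;

    // access-order LinkedHashMap iterates least-recently-used entries first
    private final LinkedHashMap<String, Integer> taskCounts =
        new LinkedHashMap<>(16, 0.75f, true);

    public TaskWeightedJobCache(long maxTotalTasks) {
        this.maxTotalTasks = maxTotalTasks;
    }

    /** Adds a job with its task count, evicting LRU jobs past the task cap. */
    public void put(String jobId, int tasks) {
        Integer old = taskCounts.put(jobId, tasks);
        totalTasks += tasks - (old == null ? 0 : old);
        // Evict least-recently-used jobs until under the cap, but keep the
        // entry just inserted so a single oversized job is still usable.
        Iterator<Map.Entry<String, Integer>> it = taskCounts.entrySet().iterator();
        while (totalTasks > maxTotalTasks && taskCounts.size() > 1 && it.hasNext()) {
            Map.Entry<String, Integer> lru = it.next();
            if (lru.getKey().equals(jobId)) continue; // skip the new entry
            totalTasks -= lru.getValue();
            it.remove();
        }
    }

    public boolean contains(String jobId) { return taskCounts.containsKey(jobId); }

    public long totalTasks() { return totalTasks; }
}
```

Under a task cap this way, three 100k-task jobs against a 250k cap would evict the oldest job, whereas a job-count cap of 3 would happily keep all 300k tasks on the heap.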