[jira] [Commented] (MAPREDUCE-4522) DBOutputFormat Times out on large batch inserts

2016-02-29 Thread Shyam Gavulla (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15173168#comment-15173168
 ] 

Shyam Gavulla commented on MAPREDUCE-4522:
--

I also tested it by running org.apache.hadoop.examples.DBCountPageView; I 
didn't see any exceptions and the test was successful. 

> DBOutputFormat Times out on large batch inserts
> ---
>
> Key: MAPREDUCE-4522
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4522
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: task-controller
>Affects Versions: 0.20.205.0
>Reporter: Nathan Jarus
>  Labels: newbie
>
> In DBRecordWriter#close(), progress is never updated. In large batch inserts, 
> this can cause the reduce task to time out due to the amount of time it takes 
> the SQL engine to process that insert. 
> Potential solutions I can see:
> Don't batch inserts; do the insert when DBRecordWriter#write() is called 
> (awful)
> Spin up a thread in DBRecordWriter#close() and update progress in that. 
> (gross)
> I can provide code for either if you're interested. 
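The second workaround described above (spinning up a thread that reports progress while close() blocks on the batch insert) can be sketched in plain Java. This is a hedged illustration only: the `Runnable` here stands in for Hadoop's `Progressable#progress()`, and `ProgressKeepAlive` is a hypothetical name, not code from any patch.

```java
// Sketch: a daemon thread that periodically invokes a progress callback so a
// long-running operation (e.g. PreparedStatement#executeBatch in close()) does
// not trip the task timeout. The callback stands in for Progressable#progress.
public class ProgressKeepAlive implements AutoCloseable {
    private final Thread reporter;
    private volatile boolean running = true;

    public ProgressKeepAlive(Runnable progress, long intervalMillis) {
        reporter = new Thread(() -> {
            while (running) {
                progress.run();          // tell the framework the task is alive
                try {
                    Thread.sleep(intervalMillis);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
            }
        });
        reporter.setDaemon(true);        // never keep the JVM alive on its own
        reporter.start();
    }

    @Override
    public void close() throws InterruptedException {
        running = false;
        reporter.interrupt();
        reporter.join();
    }
}
```

Usage would wrap the blocking call: open a `ProgressKeepAlive` before `executeBatch()`, close it after, e.g. in a try-with-resources block.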



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MAPREDUCE-4522) DBOutputFormat Times out on large batch inserts

2016-02-29 Thread Shyam Gavulla (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15173166#comment-15173166
 ] 

Shyam Gavulla commented on MAPREDUCE-4522:
--

[~ozawa] I made the configuration change. Added a property in 
mapred-default.xml:

<property>
  <name>mapreduce.output.dboutputformat.batch-size</name>
  <value>1000</value>
  <description>The batch size of SQL statements that will be executed 
  before reporting progress. Default is 1000.</description>
</property>


Added a constant in MRJobConfig.java:

public static final String MR_DBOUTPUTFORMAT_BATCH_SIZE =
    "mapreduce.output.dboutputformat.batch-size";

Let me know if this looks good and I will create a patch. 
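For illustration, the batching-and-flush behavior that a batch-size setting like this would control might look like the following plain-Java sketch. It is an assumption-laden stand-in, not the actual patch: `BatchingWriter`, `Flusher`, and `Progress` are hypothetical names standing in for DBRecordWriter, PreparedStatement#executeBatch, and Progressable respectively.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: buffer records and flush every batchSize writes, reporting progress
// after each flush so the reduce task is not timed out mid-insert.
public class BatchingWriter<T> {
    public interface Flusher<T> { void flush(List<T> batch); } // ~ executeBatch
    public interface Progress { void report(); }               // ~ progress()

    private final int batchSize;
    private final Flusher<T> flusher;
    private final Progress progress;
    private final List<T> buffer = new ArrayList<>();

    public BatchingWriter(int batchSize, Flusher<T> flusher, Progress progress) {
        this.batchSize = batchSize;
        this.flusher = flusher;
        this.progress = progress;
    }

    public void write(T record) {
        buffer.add(record);
        if (buffer.size() >= batchSize) {
            flushAndReport();
        }
    }

    public void close() {
        if (!buffer.isEmpty()) {
            flushAndReport();   // flush the trailing partial batch
        }
    }

    private void flushAndReport() {
        flusher.flush(new ArrayList<>(buffer));
        buffer.clear();
        progress.report();      // keep the task alive between batches
    }
}
```

The point of the configurable size is the trade-off it exposes: smaller batches report progress more often (safer against timeouts) at the cost of more round trips to the SQL engine.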



[jira] [Commented] (MAPREDUCE-4522) DBOutputFormat Times out on large batch inserts

2016-02-29 Thread Shyam Gavulla (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15172182#comment-15172182
 ] 

Shyam Gavulla commented on MAPREDUCE-4522:
--

Yes, I can make the batch size configurable. I will work on it. 



[jira] [Commented] (MAPREDUCE-6622) Add capability to set JHS job cache to a task-based limit

2016-02-29 Thread Ray Chiang (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15172155#comment-15172155
 ] 

Ray Chiang commented on MAPREDUCE-6622:
---

Good observation, Zhihai.

Since it's an optional setting that's off by default, I'd be fine with adding 
it to the 2.6/2.7 line.


> Add capability to set JHS job cache to a task-based limit
> -
>
> Key: MAPREDUCE-6622
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6622
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: jobhistoryserver
>Affects Versions: 2.7.2
>Reporter: Ray Chiang
>Assignee: Ray Chiang
>Priority: Critical
>  Labels: supportability
> Fix For: 2.9.0
>
> Attachments: MAPREDUCE-6622.001.patch, MAPREDUCE-6622.002.patch, 
> MAPREDUCE-6622.003.patch, MAPREDUCE-6622.004.patch, MAPREDUCE-6622.005.patch, 
> MAPREDUCE-6622.006.patch, MAPREDUCE-6622.007.patch, MAPREDUCE-6622.008.patch, 
> MAPREDUCE-6622.009.patch, MAPREDUCE-6622.010.patch, MAPREDUCE-6622.011.patch, 
> MAPREDUCE-6622.012.patch, MAPREDUCE-6622.013.patch, MAPREDUCE-6622.014.patch
>
>
> When setting the property mapreduce.jobhistory.loadedjobs.cache.size, the 
> cached jobs can be of varying size. This is generally not a problem when 
> the job sizes are uniform or small, but when jobs are very large (say, 
> greater than 250k tasks), the JHS heap can grow tremendously.
> When multiple cached jobs are very large, the JHS can lock up and spend 
> all its time in GC. Since the cache is holding on to all of those jobs, 
> not much heap space can be freed up.
> Because the total number of loaded tasks is directly proportional to the 
> amount of heap used, adding a property that caps the number of tasks 
> allowed in the cache should help prevent the JHS from locking up.


