Hello,
Thank you for your answers; this solves the issue.
I have set mapred.max.split.size to 1024000000 in hive-site.xml and jobs
are now using an appropriate number of mappers.
I have played a little with different configurations and
CombineHiveInputFormat gives better performance than HiveInputFormat in
my case.
Thanks again.
--
Wojciech Langiewicz
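
For anyone finding this thread later, the settings described in the message above might look roughly like the fragment below in hive-site.xml. This is only a sketch: the split size is the value quoted in the message, the fully qualified class name org.apache.hadoop.hive.ql.io.CombineHiveInputFormat is assumed for hive.input.format, and the comments are illustrative rather than copied from any shipped config file.

```xml
<!-- Sketch of hive-site.xml overrides; values are taken from this thread or assumed -->
<configuration>
  <property>
    <name>hive.input.format</name>
    <!-- Assumed fully qualified name of CombineHiveInputFormat -->
    <value>org.apache.hadoop.hive.ql.io.CombineHiveInputFormat</value>
  </property>
  <property>
    <name>mapred.max.split.size</name>
    <!-- Upper bound on the size of a combined split, in bytes -->
    <value>1024000000</value>
  </property>
</configuration>
```

A smaller mapred.max.split.size should produce more splits and therefore more map tasks, so the right value depends on the input size and the cluster.
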
On 29.07.2011 05:43, Carl Steinbach wrote:
Hi Wojciech,
Vaibhav is correct. There's a configuration problem in the copy of
hive-default.xml that ships with CDH3u1 which sets
hive.input.format=CombineHiveInputFormat, but leaves mapred.max.split.size
undefined. You can fix this problem by setting mapred.max.split.size in
hive-default.xml to some reasonable value (it currently defaults
to 256000000 on trunk).
Sorry for the inconvenience.
Carl
On Thu, Jul 28, 2011 at 11:28 AM, Aggarwal, Vaibhav <vagg...@amazon.com> wrote:
If you are using CombineHiveInputFormat it might be the case that all files
are being combined into one large split and hence only 1 mapper gets created.

If that is the case you can set the max split size in the hive-default.xml
config file to create more splits and hence more map tasks:

<property>
  <name>mapred.max.split.size</name>
  <value>134217728</value>
  <description>The maximum size chunk that map input should be split
  into.</description>
</property>

Thanks
Vaibhav

From: Edward Capriolo [mailto:edlinuxg...@gmail.com]
Sent: Thursday, July 28, 2011 7:10 AM
To: user@hive.apache.org
Subject: Re: Hive 0.7 using only one mapper

On Thu, Jul 28, 2011 at 9:23 AM, Wojciech Langiewicz <wlangiew...@gmail.com> wrote:
Hello,
I'm having an issue running Hive jobs after updating from Hive 0.5 to Hive
0.7 (from CDHb4 to CDHu1).
No matter what query I'm running, Hive always uses one mapper.
I have tried different queries with various sizes of input and ones with
many reducers or no reducers.
For version 0.5 everything worked correctly.
I'm attaching my hive-site.xml: https://gist.github.com/1111531
I have also tested jobs with Pig, and those jobs use multiple mappers - so
I guess this is a Hive issue.
Thank you for all your help.
--
Wojciech Langiewicz
You should also check that your hive-default.xml and other conf/ files are
up to date with 0.7.X. Having older versions of those files can lead to problems.
Edward