Re: CombineHiveInputFormat not working
I'm running CDH 5.3.3 (Hadoop 2.5.0 + CDH patches), so hopefully neither of those two issues applies. I'll try the two configs suggested and report back.

Thanks!

On Wed, Sep 30, 2015 at 3:14 PM, Ryan Harris wrote:

> I would suggest trying:
>
> set hive.hadoop.supports.splittable.combineinputformat = true;
>
> you might also need to increase mapreduce.input.fileinputformat.split.minsize to something larger, like 32MB
>
> set mapreduce.input.fileinputformat.split.minsize = 33554432;
>
> Depending on your hadoop distro and version, be potentially aware of
> https://issues.apache.org/jira/browse/MAPREDUCE-1597
> and
> https://issues.apache.org/jira/browse/MAPREDUCE-5537
>
> test it and see...
>
> *From:* Pradeep Gollakota [mailto:pradeep...@gmail.com]
> *Sent:* Wednesday, September 30, 2015 3:33 PM
> *To:* user@hive.apache.org
> *Subject:* Re: CombineHiveInputFormat not working
>
> mapred.min.split.size = mapreduce.input.fileinputformat.split.minsize = 1
> mapred.max.split.size = mapreduce.input.fileinputformat.split.maxsize = 134217728
> hive.hadoop.supports.splittable.combineinputformat = false
>
> My average file size is pretty small... it's usually between 500K and 20MB.
>
> So it looks like the splittable support is turned off? I've been seeing some posts on the mailing list saying there's correctness problems when using this and LZO.
>
> Is this still the case? Can I turn this on with LZ4?
>
> Thanks!
>
> On Wed, Sep 30, 2015 at 1:38 PM, Ryan Harris wrote:
>
> Also...
>
> mapreduce.input.fileinputformat.split.maxsize
>
> and, what is the size of your input files?
>
> *From:* Ryan Harris
> *Sent:* Wednesday, September 30, 2015 2:37 PM
> *To:* 'user@hive.apache.org'
> *Subject:* RE: CombineHiveInputFormat not working
>
> what are your values for:
>
> mapred.min.split.size
> mapred.max.split.size
> hive.hadoop.supports.splittable.combineinputformat
>
> *From:* Pradeep Gollakota [mailto:pradeep...@gmail.com]
> *Sent:* Wednesday, September 30, 2015 2:20 PM
> *To:* user@hive.apache.org
> *Subject:* CombineHiveInputFormat not working
>
> Hi all,
>
> I have an external table with the following DDL.
>
> ```
> DROP TABLE IF EXISTS raw_events;
> CREATE EXTERNAL TABLE IF NOT EXISTS raw_events (
>   raw_event_string string)
> PARTITIONED BY (dc string, community string, dt string)
> STORED AS TEXTFILE
> LOCATION '/lithium/events/{dc}/{community}/events/{year}/{month}/{day}'
> ```
>
> The files are loaded externally and are LZ4 compressed. When I run a query on this table for a single day, I'm getting 1 mapper per file even though the input format is set to CombineHiveInputFormat.
>
> Does anyone know if CombineHiveInputFormat does not work with LZ4 compressed files or have any idea why split combination is not working?
>
> Thanks!
> Pradeep
>
> --
> THIS ELECTRONIC MESSAGE, INCLUDING ANY ACCOMPANYING DOCUMENTS, IS CONFIDENTIAL and may contain information that is privileged and exempt from disclosure under applicable law. If you are neither the intended recipient nor responsible for delivering the message to the intended recipient, please note that any dissemination, distribution, copying or the taking of any action in reliance upon the message is strictly prohibited. If you have received this communication in error, please notify the sender immediately. Thank you.
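For anyone following along, the two settings suggested in the quoted reply could be tried in a single Hive session roughly like this. It is only a sketch: the `hive.input.format` line just restates what the thread says is already configured, and the query with its `dt` value is made up for illustration.

```sql
-- Assumption: restates the input format the thread says is already in use.
set hive.input.format=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat;

-- The two settings suggested in the thread.
-- 33554432 bytes = 32 * 1024 * 1024 = 32 MB.
set hive.hadoop.supports.splittable.combineinputformat=true;
set mapreduce.input.fileinputformat.split.minsize=33554432;

-- Hypothetical single-day query; the dt value and its format are assumptions.
SELECT count(*)
FROM raw_events
WHERE dt = '2015-09-30';
```

Settings issued with `set` apply to the current session only, so this can be tested without touching the cluster-wide configuration.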
RE: CombineHiveInputFormat not working
I would suggest trying:

set hive.hadoop.supports.splittable.combineinputformat = true;

You might also need to increase mapreduce.input.fileinputformat.split.minsize to something larger, like 32MB:

set mapreduce.input.fileinputformat.split.minsize = 33554432;

Depending on your Hadoop distro and version, be aware of

https://issues.apache.org/jira/browse/MAPREDUCE-1597

and

https://issues.apache.org/jira/browse/MAPREDUCE-5537

Test it and see...
Re: CombineHiveInputFormat not working
mapred.min.split.size = mapreduce.input.fileinputformat.split.minsize = 1
mapred.max.split.size = mapreduce.input.fileinputformat.split.maxsize = 134217728
hive.hadoop.supports.splittable.combineinputformat = false

My average file size is pretty small... it's usually between 500K and 20MB.

So it looks like the splittable support is turned off? I've been seeing some posts on the mailing list saying there are correctness problems when using this with LZO.

Is this still the case? Can I turn this on with LZ4?

Thanks!
RE: CombineHiveInputFormat not working
Also...

mapreduce.input.fileinputformat.split.maxsize

and, what is the size of your input files?
RE: CombineHiveInputFormat not working
What are your values for:

mapred.min.split.size
mapred.max.split.size
hive.hadoop.supports.splittable.combineinputformat
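In case it helps anyone reading later: in the Hive CLI, issuing `set` with just a property name (and no value) prints the setting currently in effect, so the values asked about in this thread can be read off as sketched below. The last line uses the newer Hadoop 2 property name that also comes up in this thread.

```sql
-- `set <name>;` with no "=value" prints the value in effect for this session.
set mapred.min.split.size;
set mapred.max.split.size;
set hive.hadoop.supports.splittable.combineinputformat;

-- Newer property name for the maximum split size.
set mapreduce.input.fileinputformat.split.maxsize;
```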
CombineHiveInputFormat not working
Hi all,

I have an external table with the following DDL.

```
DROP TABLE IF EXISTS raw_events;
CREATE EXTERNAL TABLE IF NOT EXISTS raw_events (
  raw_event_string string)
PARTITIONED BY (dc string, community string, dt string)
STORED AS TEXTFILE
LOCATION '/lithium/events/{dc}/{community}/events/{year}/{month}/{day}'
```

The files are loaded externally and are LZ4 compressed. When I run a query on this table for a single day, I'm getting 1 mapper per file even though the input format is set to CombineHiveInputFormat.

Does anyone know whether CombineHiveInputFormat does not work with LZ4 compressed files, or have any idea why split combination is not working?

Thanks!
Pradeep
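To make the scenario concrete, a single-day query of the kind described might look like the sketch below. The `dt` value and its format are hypothetical, since the message does not show how the partitions are named, and `count(*)` stands in for whatever query was actually run.

```sql
-- Print the input format in effect; the message says this is CombineHiveInputFormat.
set hive.input.format;

-- Hypothetical single-day query; filtering on the dt partition column
-- restricts the scan to that day's partitions.
SELECT count(*)
FROM raw_events
WHERE dt = '2015-09-30';
```

The mapper count is normally visible in the job information the Hive CLI prints when the stage launches, which is where a one-mapper-per-file pattern would show up.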