I think we noticed this around SeqFileTableSource. It almost seems like the table source doesn't explicitly set those configs, and the CrunchInputFormat expects it to be set to *false* to enable combine files.
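To make the suspected behaviour concrete, here is a rough, standalone sketch. It is not Crunch code: it only models a boolean lookup that falls back to "disabled" when the flag is absent. The class name, the `combineEnabled` helper, the key string, and the default of true are all assumptions based on my reading of the linked CrunchInputFormat lines.

```java
import java.util.HashMap;
import java.util.Map;

public class CombineFlagSketch {
    // Assumption: stands in for the value of RuntimeParameters.DISABLE_COMBINE_FILE.
    static final String DISABLE_COMBINE_FILE = "crunch.disable.combine.file";

    // Hypothetical helper: treats a missing flag as "true" (combining disabled),
    // so combining is only enabled when the flag is explicitly set to "false".
    static boolean combineEnabled(Map<String, String> conf) {
        String raw = conf.get(DISABLE_COMBINE_FILE);
        boolean disabled = (raw == null) || Boolean.parseBoolean(raw);
        return !disabled;
    }

    public static void main(String[] args) {
        // A source that never sets the flag (what the table source appears to do).
        Map<String, String> unset = new HashMap<String, String>();

        // A source that sets the flag to "false" explicitly.
        Map<String, String> explicitFalse = new HashMap<String, String>();
        explicitFalse.put(DISABLE_COMBINE_FILE, "false");

        System.out.println("flag unset      -> combine enabled: " + combineEnabled(unset));
        System.out.println("flag = \"false\"  -> combine enabled: " + combineEnabled(explicitFalse));
    }
}
```

If that reading is right, a table source that never sets the flag would behave like the first case, and calling inputConf(RuntimeParameters.DISABLE_COMBINE_FILE, "false") on it would presumably give the second.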
https://github.com/apache/crunch/blob/apache-crunch-0.15.0/crunch-core/src/main/java/org/apache/crunch/io/seq/SeqFileTableSource.java#L38-L48
https://github.com/apache/crunch/blob/apache-crunch-0.15.0/crunch-core/src/main/java/org/apache/crunch/impl/mr/run/CrunchInputFormat.java#L55-L57

I believe the Avro table source is working fine since it's an extension of AvroFileSource; however, SeqFileTableSource doesn't follow the same pattern. It is an extension of FileTableSourceImpl, and I wonder if that's part of the problem.

Thanks,
Nithin

On Mon, Apr 17, 2017 at 8:25 PM Micah Whitacre <[email protected]> wrote:

> It might have been me:
> https://issues.apache.org/jira/browse/CRUNCH-331
>
> Also can you clarify where you see it being set to true? In the current
> stream of code they are both set the same[1][2].
>
> [1] -
> https://github.com/apache/crunch/blob/047d8fd36773608a3d2cf6445881173e7d26377c/crunch-core/src/main/java/org/apache/crunch/io/seq/SeqFileSource.java#L42
> [2] -
> https://github.com/apache/crunch/blob/047d8fd36773608a3d2cf6445881173e7d26377c/crunch-core/src/main/java/org/apache/crunch/io/avro/AvroFileSource.java#L44
>
> On Mon, Apr 17, 2017 at 7:33 PM, Josh Wills <[email protected]> wrote:
>
>> +tomwhite
>>
>> I think Tom was the one who set this originally, but it might be my
>> faulty memory. :/
>>
>> J
>>
>> On Mon, Apr 17, 2017 at 2:11 PM, Kodimala,Rajashekar <
>> [email protected]> wrote:
>>
>>> Hello Team,
>>>
>>> Recently we have observed that the Crunch API by default disables the
>>> combine file flag for sequence files, but does not disable it when the
>>> input files are Avro files. Is there any specific reason why combine
>>> file for sequence files is disabled by default?
>>>
>>> seqFileSource.inputConf(RuntimeParameters.DISABLE_COMBINE_FILE, "true");
>>>
>>> Thanks
>>>
>>> --
>>>
>>> *Rajashekar Kodimala*
>>> Software Engineer, Population Health Dev
>>> [email protected]
>>> www.cerner.com
