Yes, the double negative needed to enable combining the small files is a bit confusing. I think that the SeqFileTableSource combining small files by default is an oversight rather than intentional.
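To make the double negative concrete, here is a minimal standalone sketch of the behaviour described in this thread. It is not the Crunch code: the class and method names are hypothetical, the key string is my assumption of what RuntimeParameters.DISABLE_COMBINE_FILE resolves to, and the "unset means disabled" default is taken from Nithin's description rather than verified against a particular release.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the check described in this thread; not Crunch code.
class CombineFlagSketch {
    // Assumed value of RuntimeParameters.DISABLE_COMBINE_FILE; verify against your Crunch version.
    static final String DISABLE_COMBINE_FILE = "crunch.disable.combine.file";

    // Combining is on only when the "disable" flag is explicitly "false".
    // An unset flag is treated as "true" (combining disabled), per the thread.
    static boolean combineEnabled(Map<String, String> inputConf) {
        String raw = inputConf.get(DISABLE_COMBINE_FILE);
        boolean disabled = (raw == null) || Boolean.parseBoolean(raw);
        return !disabled;
    }

    public static void main(String[] args) {
        // A source that never sets the flag.
        Map<String, String> unset = new HashMap<>();

        // A source that explicitly opts in to combining.
        Map<String, String> optedIn = new HashMap<>();
        optedIn.put(DISABLE_COMBINE_FILE, "false");

        System.out.println("flag unset: combine = " + combineEnabled(unset));
        System.out.println("flag false: combine = " + combineEnabled(optedIn));
    }
}
```

If that reading is right, a source that never sets the flag gets no combining, and opting in on a real source would presumably be the same inputConf call from the original question with "false" instead of "true".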

On Mon, Apr 17, 2017 at 8:43 PM, Nithin Asokan <[email protected]> wrote:

> I think we noticed this around SeqFileTableSource. It almost seems like
> the table source didn't explicitly sets those configs; and the
> CrunchInputFormat expects it to be set to *false* to enable combine
> files.
>
> https://github.com/apache/crunch/blob/apache-crunch-0.15.0/crunch-core/src/main/java/org/apache/crunch/io/seq/SeqFileTableSource.java#L38-L48
> https://github.com/apache/crunch/blob/apache-crunch-0.15.0/crunch-core/src/main/java/org/apache/crunch/impl/mr/run/CrunchInputFormat.java#L55-L57
>
> I believe Avro table source is working fine since it's an extension of the
> AvroFileSource; however SeqFileTableSource doesn't follow the same pattern;
> It is an extension of FileTableSourceImpl. And I wonder if it's part of the
> problem.
>
> Thanks,
> Nithin
>
> On Mon, Apr 17, 2017 at 8:25 PM Micah Whitacre <[email protected]> wrote:
>
>> It might have been me:
>> https://issues.apache.org/jira/browse/CRUNCH-331
>>
>> Also can you clarify where you see it being set to true? In the current
>> stream of code they are both set the same[1][2].
>>
>> [1] - https://github.com/apache/crunch/blob/047d8fd36773608a3d2cf6445881173e7d26377c/crunch-core/src/main/java/org/apache/crunch/io/seq/SeqFileSource.java#L42
>> [2] - https://github.com/apache/crunch/blob/047d8fd36773608a3d2cf6445881173e7d26377c/crunch-core/src/main/java/org/apache/crunch/io/avro/AvroFileSource.java#L44
>>
>> On Mon, Apr 17, 2017 at 7:33 PM, Josh Wills <[email protected]> wrote:
>>
>>> +tomwhite
>>>
>>> I think Tom was the one who set this originally, but it might be my
>>> faulty memory. :/
>>>
>>> J
>>>
>>> On Mon, Apr 17, 2017 at 2:11 PM, Kodimala,Rajashekar <[email protected]> wrote:
>>>
>>>> Hello Team,
>>>>
>>>> Recently we have observed that Crunch API by default disabling the
>>>> combine file flag in sequence files, but it is not disabling when input
>>>> files are avro files. Is their any specific reason for why combine file for
>>>> sequence files is disabled by default.
>>>>
>>>> seqFileSource.inputConf(RuntimeParameters.DISABLE_COMBINE_FILE, "true");
>>>>
>>>> Thanks
>>>>
>>>> --
>>>> *Rajashekar Kodimala*
>>>> Software Engineer, Population Health Dev
>>>> [email protected]
>>>> www.cerner.com
>>>>
>>>> CONFIDENTIALITY NOTICE This message and any included attachments are
>>>> from Cerner Corporation and are intended only for the addressee. The
>>>> information contained in this message is confidential and may constitute
>>>> inside or non-public information under international, federal, or state
>>>> securities laws. Unauthorized forwarding, printing, copying, distribution,
>>>> or use of such information is strictly prohibited and may be unlawful. If
>>>> you are not the addressee, please promptly delete this message and notify
>>>> the sender of the delivery error by e-mail or you may call Cerner's
>>>> corporate offices in Kansas City, Missouri, U.S.A at (+1) (816)221-1024.
