Yes, the double negative needed to enable combining small files is a bit
confusing. I think that the SeqFileTableSource combining small files
by default is an oversight rather than intentional.
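
To make the double negative concrete, here is a minimal, self-contained sketch of the flag logic as it is described in this thread. It is not the actual CrunchInputFormat code: the class and method names are hypothetical, the property string is my assumption for the value behind RuntimeParameters.DISABLE_COMBINE_FILE, and the default of "true" (no combining when the flag is unset) is inferred from the discussion below, not checked against a specific release.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the per-source input configuration; not a Crunch class.
class CombineFlagSketch {
    // Assumed property name behind RuntimeParameters.DISABLE_COMBINE_FILE.
    static final String DISABLE_COMBINE_FILE = "crunch.disable.combine.file";

    // Combining is on only when the "disable" flag is explicitly "false".
    // Assumption: an unset flag behaves like "true" (combining off).
    static boolean combineEnabled(Map<String, String> inputConf) {
        String raw = inputConf.getOrDefault(DISABLE_COMBINE_FILE, "true");
        return !Boolean.parseBoolean(raw);
    }

    public static void main(String[] args) {
        // A source that never sets the flag.
        Map<String, String> unset = new HashMap<>();

        // A source that sets the flag to "true" (combining disabled).
        Map<String, String> disabled = new HashMap<>();
        disabled.put(DISABLE_COMBINE_FILE, "true");

        // A source that sets the flag to "false" (combining enabled).
        Map<String, String> enabled = new HashMap<>();
        enabled.put(DISABLE_COMBINE_FILE, "false");

        System.out.println("flag unset   -> combine: " + combineEnabled(unset));
        System.out.println("flag \"true\"  -> combine: " + combineEnabled(disabled));
        System.out.println("flag \"false\" -> combine: " + combineEnabled(enabled));
    }
}
```

Under those assumptions, only the source that sets the flag to "false" gets its small files combined. With the real API, the equivalent override would be the inputConf call quoted in the original message, passing "false" instead of "true".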

On Mon, Apr 17, 2017 at 8:43 PM, Nithin Asokan <[email protected]> wrote:

> I think we noticed this around SeqFileTableSource. It seems the table
> source doesn't explicitly set those configs, and CrunchInputFormat
> expects the flag to be set to *false* to enable combine files.
>
> https://github.com/apache/crunch/blob/apache-crunch-0.15.0/crunch-core/src/main/java/org/apache/crunch/io/seq/SeqFileTableSource.java#L38-L48
> https://github.com/apache/crunch/blob/apache-crunch-0.15.0/crunch-core/src/main/java/org/apache/crunch/impl/mr/run/CrunchInputFormat.java#L55-L57
>
> I believe the Avro table source is working fine since it's an extension of
> AvroFileSource; however, SeqFileTableSource doesn't follow the same pattern.
> It is an extension of FileTableSourceImpl, and I wonder if that's part of
> the problem.
>
> Thanks,
> Nithin
>
>
> On Mon, Apr 17, 2017 at 8:25 PM Micah Whitacre <[email protected]>
> wrote:
>
>> It might have been me:
>> https://issues.apache.org/jira/browse/CRUNCH-331
>>
>> Also, can you clarify where you see it being set to true?  In the current
>> code they are both set the same way [1][2].
>>
>> [1] - https://github.com/apache/crunch/blob/047d8fd36773608a3d2cf6445881173e7d26377c/crunch-core/src/main/java/org/apache/crunch/io/seq/SeqFileSource.java#L42
>> [2] - https://github.com/apache/crunch/blob/047d8fd36773608a3d2cf6445881173e7d26377c/crunch-core/src/main/java/org/apache/crunch/io/avro/AvroFileSource.java#L44
>>
>>
>> On Mon, Apr 17, 2017 at 7:33 PM, Josh Wills <[email protected]> wrote:
>>
>>> +tomwhite
>>>
>>> I think Tom was the one who set this originally, but it might be my
>>> faulty memory. :/
>>>
>>> J
>>>
>>> On Mon, Apr 17, 2017 at 2:11 PM, Kodimala,Rajashekar <
>>> [email protected]> wrote:
>>>
>>>> Hello Team,
>>>>
>>>>
>>>>
>>>> Recently we observed that the Crunch API disables the combine file flag
>>>> by default for sequence files, but does not disable it when the input
>>>> files are Avro files. Is there any specific reason why combine file is
>>>> disabled by default for sequence files?
>>>>
>>>>
>>>>
>>>> seqFileSource.inputConf(RuntimeParameters.DISABLE_COMBINE_FILE, "true");
>>>>
>>>>
>>>>
>>>> Thanks
>>>>
>>>> --
>>>>
>>>> *Rajashekar Kodimala*
>>>>
>>>> Software Engineer, Population Health Dev
>>>>
>>>> [email protected]
>>>>
>>>> www.cerner.com
>>>>
>>>>
>>>>
>>>>
>>>>
>>>
>>>
>>
