I think we noticed this around SeqFileTableSource. It almost seems like the
table source doesn't explicitly set those configs, and the
CrunchInputFormat expects it to be set to *false* to enable combine files.

https://github.com/apache/crunch/blob/apache-crunch-0.15.0/crunch-core/src/main/java/org/apache/crunch/io/seq/SeqFileTableSource.java#L38-L48
https://github.com/apache/crunch/blob/apache-crunch-0.15.0/crunch-core/src/main/java/org/apache/crunch/impl/mr/run/CrunchInputFormat.java#L55-L57
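
To make the behavior described above concrete, here is a minimal,
self-contained sketch. It is not the Crunch code itself: the class name, the
method name, and the literal key string are assumptions for illustration (in
Crunch the key is referenced as RuntimeParameters.DISABLE_COMBINE_FILE), and
the "disabled unless explicitly false" default is my reading of the linked
CrunchInputFormat lines.

```java
import java.util.HashMap;
import java.util.Map;

public class CombineFlagSketch {
    // Assumed key string, used here only for illustration.
    static final String DISABLE_COMBINE_FILE = "crunch.disable.combine.file";

    // Models the check: combine files are used only when the "disable" flag
    // is explicitly "false". An unset or unparseable value falls back to the
    // assumed default of "disabled".
    static boolean combineEnabled(Map<String, String> inputConf) {
        String value = inputConf.get(DISABLE_COMBINE_FILE);
        boolean disabled = true; // assumed default when the source sets nothing
        if ("false".equalsIgnoreCase(value)) {
            disabled = false;
        } else if ("true".equalsIgnoreCase(value)) {
            disabled = true;
        }
        return !disabled;
    }

    public static void main(String[] args) {
        // A source that never sets the flag (the suspected table source case).
        Map<String, String> unset = new HashMap<>();
        // A source that sets the flag to "false" in its input configuration.
        Map<String, String> explicit = new HashMap<>();
        explicit.put(DISABLE_COMBINE_FILE, "false");

        System.out.println("flag unset: combine enabled = " + combineEnabled(unset));
        System.out.println("flag false: combine enabled = " + combineEnabled(explicit));
    }
}
```

If that reading is right, a source that never sets the flag ends up with
combine files turned off even though nothing explicitly set it to "true".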

I believe the Avro table source is working fine since it's an extension of
AvroFileSource; however, SeqFileTableSource doesn't follow the same pattern.
It is an extension of FileTableSourceImpl, and I wonder if that's part of the
problem.

Thanks,
Nithin


On Mon, Apr 17, 2017 at 8:25 PM Micah Whitacre <[email protected]> wrote:

> It might have been me:
> https://issues.apache.org/jira/browse/CRUNCH-331
>
> Also can you clarify where you see it being set to true?  In the current
> stream of code they are both set the same[1][2].
>
> [1] -
> https://github.com/apache/crunch/blob/047d8fd36773608a3d2cf6445881173e7d26377c/crunch-core/src/main/java/org/apache/crunch/io/seq/SeqFileSource.java#L42
> [2] -
> https://github.com/apache/crunch/blob/047d8fd36773608a3d2cf6445881173e7d26377c/crunch-core/src/main/java/org/apache/crunch/io/avro/AvroFileSource.java#L44
>
>
> On Mon, Apr 17, 2017 at 7:33 PM, Josh Wills <[email protected]> wrote:
>
>> +tomwhite
>>
>> I think Tom was the one who set this originally, but it might be my
>> faulty memory. :/
>>
>> J
>>
>> On Mon, Apr 17, 2017 at 2:11 PM, Kodimala,Rajashekar <
>> [email protected]> wrote:
>>
>>> Hello Team,
>>>
>>>
>>>
>>> Recently we have observed that the Crunch API by default disables the
>>> combine file flag for sequence files, but does not disable it when the
>>> input files are Avro files. Is there any specific reason why combine file
>>> is disabled by default for sequence files?
>>>
>>>
>>>
>>> seqFileSource.inputConf(RuntimeParameters.DISABLE_COMBINE_FILE, "true");
>>>
>>>
>>>
>>> Thanks
>>>
>>> --
>>>
>>> *Rajashekar Kodimala*
>>>
>>> Software Engineer, Population Health Dev
>>>
>>> [email protected]
>>>
>>> www.cerner.com
>>>
>>>
>>>
>>>
>>>
>>
>>
>
