Yeah, per-source config is done via Tap.sourceConfInit and Tap.sinkConfInit, so these custom settings only apply after one of those methods is called.
So it can't be used to control things that happen before then, e.g. the heap size of your mappers or things like that.

On Fri, Jan 6, 2017 at 6:00 PM, Kostya Salomatin <[email protected]> wrote:

> Wow, per source config is really useful. I've needed this feature for a
> while, did not know it already existed.
>
> Kostya
>
> On Fri, Jan 6, 2017 at 5:06 PM, 'Alex Levenson' via Scalding Development <[email protected]> wrote:
>
>> I think you can set this per-source as well (instead of for all sources)
>> by overriding `tapConfig` here:
>> https://github.com/twitter/scalding/blob/develop/scalding-core/src/main/scala/com/twitter/scalding/HfsConfPropertySetter.scala#L55
>>
>> On Fri, Jan 6, 2017 at 4:58 PM, 'Oscar Boykin' via Scalding Development <[email protected]> wrote:
>>
>>> You want to set this config:
>>>
>>> http://docs.cascading.org/cascading/2.2/javadoc/constant-values.html#cascading.tap.hadoop.HfsProps.COMBINE_INPUT_FILES
>>>
>>> "cascading.hadoop.hfs.combine.files" -> true
>>>
>>> which you can do in the job:
>>>
>>> override def config = super.config + ("cascading.hadoop.hfs.combine.files" -> true)
>>>
>>> or with a -Dcascading.hadoop.hfs.combine.files=true option to hadoop.
>>>
>>> That should work. Let us know if it does not.
>>>
>>> On Fri, Jan 6, 2017 at 12:52 PM Nikhil J Joshi <[email protected]> wrote:
>>>
>>>> Hi,
>>>>
>>>> I recently converted a Pig script to an equivalent Scalding job. When
>>>> running the Pig script on input consisting of many small files, I see
>>>> the inputs are combined, as per the logs here:
>>>>
>>>> org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1000
>>>> 06-01-2017 14:37:58 PST referral-scoring_scoring_feature-generation-v2_extract-postfeast-fields-jobs-basic org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1000
>>>> 06-01-2017 14:37:58 PST referral-scoring_scoring_feature-generation-v2_extract-postfeast-fields-jobs-basic org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths (combined) to process : 77
>>>> 06-01-2017 14:37:58 PST referral-scoring_scoring_feature-generation-v2_extract-postfeast-fields-jobs-basic INFO - 2017-01-06 22:37:58,517 org.apache.hadoop.mapreduce.JobSubmitter - number of splits:77
>>>>
>>>> However, the Scalding job doesn't seem to combine the inputs and runs
>>>> 1000 mappers, one per input file, which is causing bad performance. Is
>>>> there something wrong with the way I am executing the Scalding job?
>>>>
>>>> The part of the script responsible for the step above is
>>>>
>>>> private val ids: TypedPipe[Int] = TypedPipe
>>>>   .from(PackedAvroSource[Identifiers](args("identifiers")))
>>>>   .map { featureNamePrefix match {
>>>>     case "member" => _.getMemberId.toInt
>>>>     case "item" => _.getItemId.toInt
>>>>   }}
>>>>
>>>> Any help is greatly appreciated.
>>>> Thanks,
>>>> Nikhil
>>>>
>>>> --
>>>> You received this message because you are subscribed to the Google
>>>> Groups "Scalding Development" group.
>>>> To unsubscribe from this group and stop receiving emails from it, send
>>>> an email to [email protected].
>>>> For more options, visit https://groups.google.com/d/optout.
>>
>> --
>> Alex Levenson
>> @THISWILLWORK
>
> --
> Konstantin
> mailto:[email protected]

--
Alex Levenson
@THISWILLWORK
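
For anyone finding this thread later, here is a minimal sketch of the job-level setting discussed above. It assumes scalding-core is on the classpath; the job name, the `input`/`output` arguments and the TextLine/TypedTsv sources are hypothetical and only there to make the example complete. The value is written as the string "true" (an assumption on my part, not what the thread shows) so that the result keeps the Map[AnyRef, AnyRef] type that Job.config declares.

```scala
import com.twitter.scalding._

// Hypothetical job, for illustration only; not the job from this thread.
class CombineSmallFilesJob(args: Args) extends Job(args) {

  // Ask Cascading's Hfs tap to combine many small input files into fewer
  // splits. This applies to every source in the job, not just one.
  // The value is passed as a String so the map stays Map[AnyRef, AnyRef].
  override def config: Map[AnyRef, AnyRef] =
    super.config ++ Map("cascading.hadoop.hfs.combine.files" -> "true")

  // Placeholder pipeline: read lines, emit their lengths.
  TypedPipe.from(TextLine(args("input")))
    .map(_.length)
    .write(TypedTsv[Int](args("output")))
}
```

Because this goes through the job config, it is set before any tap is initialised, which is the difference from the per-source route: settings applied via `tapConfig` only take effect once sourceConfInit/sinkConfInit has run. Passing -Dcascading.hadoop.hfs.combine.files=true to hadoop, as mentioned above, should have the same job-wide effect without a code change.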
