Hi Alex,

Thanks for the explanation. I realized that we are still on Scalding 0.13 with Scala 2.10, and some of these features were not introduced until 0.16. I will need to figure out a workaround for this issue.
Thanks,
Nikhil

On Tue, Jan 10, 2017 at 1:05 PM Alex Levenson <[email protected]> wrote:
> If PackedAvroSource extends FileSource (which extends HfsTapProvider) --
> or if it just extends HfsTapProvider on its own, then you can just do
> something like:
>
> new PackedAvroSource[Identifiers](args("identifiers")) with
> HfsConfPropertySetter {
>   override def tapConfig = Config(Map("foo" -> "bar"))
> }
>
> Does that make sense?
>
> On Tue, Jan 10, 2017 at 9:49 AM, Nikhil J Joshi <[email protected]>
> wrote:
>
> Hi Alex,
>
> I am trying the `HfsConfPropertySetter` way, but I couldn't find an example
> showing how to implement it correctly. Could you share some more details
> on this? Example code would be great.
>
> Thanks again,
> Nikhil
>
> On Fri, Jan 6, 2017 at 6:23 PM Nikhil J Joshi <[email protected]> wrote:
>
> Thanks Oscar and Alex. I will follow up and update you on these incredible
> ideas.
> Have a great weekend,
> Nikhil
>
> On Fri, Jan 6, 2017 at 6:12 PM Alex Levenson <[email protected]>
> wrote:
>
> Yeah, per-source config is done via Tap.sourceConfInit and Tap.sinkConfInit
> -- so these custom settings will only apply after one of those methods is
> called.
>
> So it can't be used to control things that happen before then, e.g., the
> heap size of your mappers or things like that.
>
> On Fri, Jan 6, 2017 at 6:00 PM, Kostya Salomatin <[email protected]>
> wrote:
>
> Wow, per-source config is really useful. I've needed this feature for a
> while and did not know it already existed.
>
> Kostya
>
> On Fri, Jan 6, 2017 at 5:06 PM, 'Alex Levenson' via Scalding Development <
> [email protected]> wrote:
>
> I think you can set this per-source as well (instead of for all sources)
> by overriding `tapConfig` here:
> https://github.com/twitter/scalding/blob/develop/scalding-core/src/main/scala/com/twitter/scalding/HfsConfPropertySetter.scala#L55
>
> On Fri, Jan 6, 2017 at 4:58 PM, 'Oscar Boykin' via Scalding Development <
> [email protected]> wrote:
>
> You want to set this config:
>
> http://docs.cascading.org/cascading/2.2/javadoc/constant-values.html#cascading.tap.hadoop.HfsProps.COMBINE_INPUT_FILES
>
> "cascading.hadoop.hfs.combine.files" -> true
>
> which you can do in the job:
>
> override def config = super.config + ("cascading.hadoop.hfs.combine.files" -> "true")
>
> or with a -Dcascading.hadoop.hfs.combine.files=true
> option to hadoop.
>
> That should work. Let us know if it does not.
>
> On Fri, Jan 6, 2017 at 12:52 PM Nikhil J Joshi <[email protected]>
> wrote:
>
> Hi,
>
> I recently converted a Pig script to an equivalent Scalding job.
> While running the Pig script on input consisting of many small files, I see
> that the inputs are combined, as per the logs here:
>
> org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths
> to process : 1000
> 06-01-2017 14:37:58 PST referral-scoring_scoring_feature-generation-v2_extract-postfeast-fields-jobs-basic
> org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total
> input paths to process : 1000
> 06-01-2017 14:37:58 PST referral-scoring_scoring_feature-generation-v2_extract-postfeast-fields-jobs-basic
> org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total
> input paths (combined) to process : 77
> 06-01-2017 14:37:58 PST referral-scoring_scoring_feature-generation-v2_extract-postfeast-fields-jobs-basic
> INFO - 2017-01-06 22:37:58,517 org.apache.hadoop.mapreduce.JobSubmitter -
> number of splits:77
>
> However, the Scalding job doesn't seem to combine them and instead runs 1000
> mappers, one per input file, which is causing bad performance. Is there
> something wrong with the way I am executing the Scalding job?
>
> The part of the script responsible for the step above is:
>
> private val ids: TypedPipe[Int] = TypedPipe
>   .from(PackedAvroSource[Identifiers](args("identifiers")))
>   .map { featureNamePrefix match {
>     case "member" => _.getMemberId.toInt
>     case "item" => _.getItemId.toInt
>   }}
>
> Any help is greatly appreciated.
> Thanks,
> Nikhil
>
> --
> You received this message because you are subscribed to the Google Groups
> "Scalding Development" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> For more options, visit https://groups.google.com/d/optout.
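The Pig logs show 1000 input paths being packed into 77 splits, while the Scalding job gets one mapper per file. As a rough illustration of what "combining" small inputs means, here is a minimal sketch that greedily packs consecutive files into splits up to a size cap. It is a simplified model, not Hadoop's `CombineFileInputFormat` or Pig's actual algorithm (real implementations typically also group by node or rack locality), and `CombineSplitsSketch`, `InputFile`, `combine`, and the sizes used are hypothetical.

```scala
// Simplified, hypothetical model of combining many small input files into fewer
// splits. Real combiners also consider data locality; this only packs by size.
object CombineSplitsSketch {
  final case class InputFile(path: String, bytes: Long)

  /** Greedily packs files, in order, into splits whose total size stays at or
    * below maxSplitBytes. A single file larger than the cap gets its own split. */
  def combine(files: Seq[InputFile], maxSplitBytes: Long): Vector[Vector[InputFile]] = {
    require(maxSplitBytes > 0, "maxSplitBytes must be positive")
    val (done, current, _) =
      files.foldLeft((Vector.empty[Vector[InputFile]], Vector.empty[InputFile], 0L)) {
        case ((splits, cur, curBytes), f) =>
          if (cur.nonEmpty && curBytes + f.bytes > maxSplitBytes)
            (splits :+ cur, Vector(f), f.bytes) // close the current split, start a new one
          else
            (splits, cur :+ f, curBytes + f.bytes) // keep filling the current split
      }
    if (current.nonEmpty) done :+ current else done
  }
}
```

For example, 1000 files of 10 MB each with a 128 MB cap give 12 files per split (a 13th would exceed the cap), so 84 splits instead of 1000 mappers. These numbers are illustrative and are not meant to reproduce the 77 in the log.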
>
> --
> Alex Levenson
> @THISWILLWORK
>
> --
> Konstantin mailto:[email protected]
>
> --
> Nikhil J Joshi
> Senior Applied Researcher - Machine Learning, Data Science
> LinkedIn Corp.
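Nikhil asked for an example of the `HfsConfPropertySetter` approach. The sketch below uses hypothetical stand-in types (`ToyFileSource`, `ToyConfPropertySetter`), not Scalding's real classes, to show the Scala mechanics of Alex's `new Source(...) with Trait { override def tapConfig = ... }` idiom. Following Alex's description, it also models that per-source settings only reach the conf when that source's `sourceConfInit` runs, so other sources and anything read earlier are unaffected. The property key is the one from Oscar's message.

```scala
import scala.collection.mutable

// Hypothetical stand-ins (NOT Scalding's real classes) for a file source and for
// HfsConfPropertySetter, used only to show the anonymous mix-in idiom and when
// per-source settings take effect.
object PerSourceConfSketch {

  /** Stand-in for a file-based source whose tap initializes a job conf. */
  class ToyFileSource(val path: String) {
    /** Called when this source's tap is initialized for reading; adds nothing by default. */
    def sourceConfInit(conf: mutable.Map[String, String]): Unit = ()
  }

  /** Stand-in for HfsConfPropertySetter: per-source settings, empty unless overridden. */
  trait ToyConfPropertySetter extends ToyFileSource {
    def tapConfig: Map[String, String] = Map.empty

    // Per-source settings are copied into the conf only here, at tap-init time,
    // so anything read from the conf before this call never sees them.
    override def sourceConfInit(conf: mutable.Map[String, String]): Unit = {
      conf ++= tapConfig
      super.sourceConfInit(conf)
    }
  }

  /** The idiom from the thread: instantiate the source and mix in the setter inline. */
  val identifiersSource: ToyFileSource =
    new ToyFileSource("identifiers") with ToyConfPropertySetter {
      override def tapConfig = Map("cascading.hadoop.hfs.combine.files" -> "true")
    }
}
```

Against real Scalding, the same shape is Alex's `new PackedAvroSource[Identifiers](...) with HfsConfPropertySetter { override def tapConfig = Config(...) }`, assuming `PackedAvroSource` extends `HfsTapProvider` as he suggests; per Nikhil's note about 0.13 vs 0.16, this may not be available on older Scalding releases.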
