Hi Alex,

Thanks for the explanation. I realized that we are still on 0.13 with Scala
2.10, and some of these things were not introduced until 0.16. I will need to
figure out a workaround for this issue.

Thanks,
Nikhil

On Tue, Jan 10, 2017 at 1:05 PM Alex Levenson <[email protected]>
wrote:

> If PackedAvroSource extends FileSource (which extends HfsTapProvider) --
> or if it just extends HfsTapProvider on its own, then you can just do
> something like:
>
> new PackedAvroSource[Identifiers](args("identifiers")) with
> HfsConfPropertySetter {
>   override def tapConfig = Config(Map("foo" -> "bar"))
> }
>
> Does that make sense?
>
> On Tue, Jan 10, 2017 at 9:49 AM, Nikhil J Joshi <[email protected]>
> wrote:
>
> Hi Alex,
>
> I am trying the `HfsConfPropertySetter` approach, but I couldn't find an
> example showing how to implement it correctly. Could you share some more
> details on this? Some example code would be great.
>
> Thanks again,
> Nikhil
>
> On Fri, Jan 6, 2017 at 6:23 PM Nikhil J Joshi <[email protected]> wrote:
>
> Thanks Oscar and Alex. I will follow up and update you on these incredible
> ideas.
> Have a great weekend,
> Nikhil
>
> On Fri, Jan 6, 2017 at 6:12 PM Alex Levenson <[email protected]>
> wrote:
>
> Yeah per-source config is done via Tap.sourceConfInit and Tap.sinkConfInit
> -- so these custom settings will only apply after one of those methods is
> called.
>
> So it can't be used to control things that happen before then, e.g., the
> heap size of your mappers.
>
> On Fri, Jan 6, 2017 at 6:00 PM, Kostya Salomatin <[email protected]>
> wrote:
>
> Wow, per-source config is really useful. I've needed this feature for a
> while and did not know it already existed.
>
> Kostya
>
> On Fri, Jan 6, 2017 at 5:06 PM, 'Alex Levenson' via Scalding Development <
> [email protected]> wrote:
>
> I think you can set this per-source as well (instead of for all sources)
> by overriding `tapConfig` here:
> https://github.com/twitter/scalding/blob/develop/scalding-core/src/main/scala/com/twitter/scalding/HfsConfPropertySetter.scala#L55
>
> On Fri, Jan 6, 2017 at 4:58 PM, 'Oscar Boykin' via Scalding Development <
> [email protected]> wrote:
>
> You want to set this config:
>
>
> http://docs.cascading.org/cascading/2.2/javadoc/constant-values.html#cascading.tap.hadoop.HfsProps.COMBINE_INPUT_FILES
>
> "cascading.hadoop.hfs.combine.files" -> true
>
> which you can do in the job:
>
> override def config = super.config + ("cascading.hadoop.hfs.combine.files"
> -> true)
>
> or with a -Dcascading.hadoop.hfs.combine.files=true option to hadoop.
>
> That should work. Let us know if it does not.
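
A minimal sketch of the job-level override described above, assuming a standard Scalding `Job` subclass. The class name `CombineSmallFilesJob` is hypothetical, and passing the flag as the string "true" (rather than a Scala Boolean) is an assumption made so that the value fits the `Map[AnyRef, AnyRef]` type of `Job.config`. It needs the Scalding dependency on the classpath and has not been checked against any particular Scalding version.

```scala
import com.twitter.scalding._

// Hypothetical job name; only the config override matters here.
class CombineSmallFilesJob(args: Args) extends Job(args) {

  // Job.config is a Map[AnyRef, AnyRef], so this sketch passes the flag as
  // the string "true" (an assumption) instead of a Scala Boolean.
  override def config: Map[AnyRef, AnyRef] =
    super.config + ("cascading.hadoop.hfs.combine.files" -> "true")

  // ... the pipeline definition (e.g. the TypedPipe.from(...) step) goes here ...
}
```

This applies the setting to every source in the job, unlike a per-source `tapConfig` override.
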
>
> On Fri, Jan 6, 2017 at 12:52 PM Nikhil J Joshi <[email protected]>
> wrote:
>
> Hi,
>
>
> I recently converted a Pig script to an equivalent Scalding job. When running
> the Pig script on input consisting of many small files, I see that the inputs
> are combined, as per the logs here:
>
>
> org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1000
> 06-01-2017 14:37:58 PST referral-scoring_scoring_feature-generation-v2_extract-postfeast-fields-jobs-basic org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1000
> 06-01-2017 14:37:58 PST referral-scoring_scoring_feature-generation-v2_extract-postfeast-fields-jobs-basic org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths (combined) to process : 77
> 06-01-2017 14:37:58 PST referral-scoring_scoring_feature-generation-v2_extract-postfeast-fields-jobs-basic INFO - 2017-01-06 22:37:58,517 org.apache.hadoop.mapreduce.JobSubmitter - number of splits:77
>
> However, the Scalding job doesn't seem to combine the inputs and runs 1000
> mappers, one per input file, which is causing bad performance. Is there
> something wrong with the way I am executing the Scalding job?
>
> The part of the script responsible for the step above is:
>
> private val ids: TypedPipe[Int] = TypedPipe
>     .from(PackedAvroSource[Identifiers](args("identifiers")))
>     .map{ featureNamePrefix match {
>       case "member" => _.getMemberId.toInt
>       case "item" => _.getItemId.toInt
>     }}
>
> Any help is greatly appreciated.
> Thanks,
> Nikhil
>
> --
> You received this message because you are subscribed to the Google Groups
> "Scalding Development" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> For more options, visit https://groups.google.com/d/optout.
>
> --
> Alex Levenson
> @THISWILLWORK
>
> --
> Konstantin                              mailto:[email protected]
>
> --
>
> Nikhil J Joshi
> Senior Applied Researcher - Machine Learning, Data Science
> LinkedIn Corp.
>
-- 

Nikhil J Joshi
Senior Applied Researcher - Machine Learning, Data Science
LinkedIn Corp.
