Your suggestion makes sense. I will give the Typed API a try. Thanks very much for your input!
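Below is a minimal sketch of the implicit type-class pattern Alex describes in the reply quoted below. It assumes the list-valued column is modelled as a Seq[Double] field of a case class. Sparser, ListRecord, SparseRecord and the "size{index:value,...}" sparse format are all hypothetical choices made for illustration. None of them are Scalding APIs:

```scala
// Hypothetical names throughout: Sparser, ListRecord, SparseRecord and
// SparseConversions are illustrations, not Scalding or Algebird APIs.

// Type class: "how to turn a value of type T into a sparse String".
trait Sparser[T] {
  def sparse(value: T): String
}

object Sparser {
  // Summoner so callers can write Sparser[Seq[Double]].sparse(xs)
  def apply[T](implicit s: Sparser[T]): Sparser[T] = s

  // Assumed sparse format: "<size>{index:value,...}", non-zero entries only
  implicit val seqDoubleSparser: Sparser[Seq[Double]] = new Sparser[Seq[Double]] {
    def sparse(value: Seq[Double]): String =
      value.zipWithIndex
        .collect { case (v, i) if v != 0.0 => s"$i:$v" }
        .mkString(s"${value.size}{", ",", "}")
  }
}

// With strong types, the list column is identified by its type,
// not by a "_list_" substring in its field name.
final case class ListRecord(name: String, scores: Seq[Double])
final case class SparseRecord(name: String, scores: String)

object SparseConversions {
  // The compiler picks the Sparser instance from the argument's static type
  def toSparse[T](value: T)(implicit s: Sparser[T]): String = s.sparse(value)

  // A plain function like this could be passed to a TypedPipe's map
  def convert(r: ListRecord): SparseRecord =
    SparseRecord(r.name, toSparse(r.scores))
}
```

On a TypedPipe[ListRecord], the conversion could then be applied with records.map(SparseConversions.convert). The compiler, not a field-name check, decides which Sparser is used. A missing instance therefore shows up as a compile error rather than a runtime cast failure.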
On Friday, June 30, 2017 at 4:13:53 PM UTC-7, Alex Levenson wrote:
>
> Yeah, I guess what I was sort of getting at is that if you are using the
> Typed API, you try to use types instead of names for these sorts of things,
> and a lot of your code about casting one type to another goes away. But it
> can be painful to rewrite your entire world in this way. I'm not familiar
> enough with the untyped API to tell you how to do this, unfortunately, but
> at least here at Twitter we would try to push our users towards using
> strong types, and maybe the implicit typeclass pattern for extractors /
> converters / etc. For example, you can see how TypedPipe.sumByKey takes an
> implicit strongly typed Semigroup which explains how to "sum" two values.
> Similarly, you can create an implicit Sparser type class that is picked
> based on the types of the data at compile time.
>
> On Fri, Jun 30, 2017 at 4:07 PM, <[email protected]> wrote:
>
>> Thanks for the reply Alex!
>>
>> I'm trying to implement a couple of scenarios. The first scenario is
>> pretty much what I explained in the post (i.e. appending a fixed
>> prefix/suffix to every field name in a pipe). The second scenario is that
>> I want to iterate through all fields in a pipe and call a function on them
>> based on their names. For example, let's say I have a bunch of different
>> fields in a pipe, and if a field name contains the string "_list_" I want
>> to convert the List[Any] to a sparse representation of the list in
>> String format.
>> I guess if I write a Cascading Function in Java and invoke the "each"
>> method on my pipe, that should do the trick, but I was wondering if there
>> is a cleaner/easier way of doing this in Scalding:
>>
>> import java.util.List;
>>
>> import cascading.flow.FlowProcess;
>> import cascading.operation.BaseOperation;
>> import cascading.operation.Function;
>> import cascading.operation.FunctionCall;
>> import cascading.tuple.Fields;
>> import cascading.tuple.Tuple;
>> import cascading.tuple.TupleEntry;
>>
>> public class Sparser extends BaseOperation<Void> implements Function<Void>
>> {
>>   public Sparser()
>>   {
>>     // Fields.ARGS: the result keeps the same field names as the arguments
>>     // (a single declared field such as "sum" would not match a result
>>     // tuple holding one value per argument)
>>     super( Fields.ARGS );
>>   }
>>
>>   public Sparser( Fields fieldDeclaration )
>>   {
>>     super( fieldDeclaration );
>>   }
>>
>>   @Override
>>   public void operate( FlowProcess flowProcess, FunctionCall<Void> functionCall )
>>   {
>>     // the argument TupleEntry carries both the field names and the values
>>     TupleEntry arguments = functionCall.getArguments();
>>     Fields fieldNames = arguments.getFields();
>>     Tuple values = arguments.getTuple();
>>
>>     // create a Tuple to hold our result values
>>     Tuple result = new Tuple();
>>
>>     for( int i = 0; i < fieldNames.size(); i++ )
>>     {
>>       Object value = values.getObject( i );
>>
>>       if( fieldNames.get( i ).toString().contains( "_list_" ) )
>>       {
>>         // assumes the value is a java.util.List; a Scala List would
>>         // need converting first
>>         List<?> list = (List<?>) value;
>>         String sparseRepresentation = list.toString(); // TO BE IMPLEMENTED
>>         result.add( sparseRepresentation );
>>       }
>>       else
>>       {
>>         // pass other fields through unchanged; casting them to String
>>         // would fail for non-String values such as an Integer age
>>         result.add( value );
>>       }
>>     }
>>
>>     // return the result Tuple
>>     functionCall.getOutputCollector().add( result );
>>   }
>> }
>>
>> btw, I'm not sure I understand what you mean by "an extractor method".
>> Can you please send me a pointer to an example?
>>
>> Any input is greatly appreciated!
>>
>> On Friday, June 30, 2017 at 3:51:09 PM UTC-7, Alex Levenson wrote:
>>>
>>> Probably not what you want to hear, but the scalding dev team is really
>>> only developing + supporting the Typed API at this point -- which would
>>> make something like this even more difficult.
>>>
>>> But the question I'd probably ask is what are you trying to do, and can
>>> you use strong types, the Typed API, and maybe an extractor method or
>>> similar instead?
>>>
>>> On Fri, Jun 30, 2017 at 2:13 PM, <[email protected]> wrote:
>>>
>>>> Here is the question:
>>>>
>>>> Assume I have a pipe and I want to rename all the fields in the pipe
>>>> programmatically, meaning that I don't want to hard-code the field names
>>>> in my code. Any idea how I can do this?
>>>>
>>>> As a concrete example, assume I have a pipe with two fields, "name" and
>>>> "age", and I want to rename these fields to "employee_name" and
>>>> "employee_age". Obviously, the natural solution is to write a piece of
>>>> code like this:
>>>>
>>>>   pipe.rename(('name, 'age) -> ('employee_name, 'employee_age))
>>>>
>>>> or
>>>>
>>>>   pipe.rename(new Fields("name", "age") -> new Fields("employee_name", "employee_age"))
>>>>
>>>> However, what I need is to be able to iterate through all fields in the
>>>> pipe without knowing their names.
>>>>
>>>> There are a couple of methods (resolveIncomingOperationArgumentFields
>>>> and resolveIncomingOperationPassThroughFields) callable on a pipe that
>>>> look promising, but the issue is that they both take an input argument
>>>> of type cascading.flow.planner.Scope, and I don't know where to get one
>>>> in a Scalding job.
>>>>
>>>> Another solution that comes to mind is using the "each" method on the
>>>> pipe, implementing a Cascading function, and passing it to the each
>>>> statement. But I was not able to find any sample code for that either.
>>>>
>>>> Thanks!
>>>>
>>>> --
>>>> You received this message because you are subscribed to the Google
>>>> Groups "Scalding Development" group.
>>>> To unsubscribe from this group and stop receiving emails from it, send
>>>> an email to [email protected].
>>>> For more options, visit https://groups.google.com/d/optout.
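For the renaming half of the original question, one workaround is to compute the new names in Scala and pass both lists to rename. This is only a sketch. It assumes the current names are already known in Scala code when the job is built, for example because the source declares them. It does not solve discovering names at planning time, which is what the Scope-based methods in the question are about. withPrefix and withSuffix are hypothetical helpers:

```scala
// Hypothetical helpers: derive new field names from a known list of old ones.
object FieldRenaming {
  // Prepend a fixed prefix to every name, e.g. "name" -> "employee_name"
  def withPrefix(names: Seq[String], prefix: String): Seq[String] =
    names.map(prefix + _)

  // Append a fixed suffix to every name, e.g. "name" -> "name_v2"
  def withSuffix(names: Seq[String], suffix: String): Seq[String] =
    names.map(_ + suffix)
}

// Untested usage sketch for a fields-API job (needs cascading.tuple.Fields):
//   val from = Seq("name", "age")
//   val to   = FieldRenaming.withPrefix(from, "employee_")
//   pipe.rename(new Fields(from: _*) -> new Fields(to: _*))
```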
>>>>
>>>
>>> --
>>> Alex Levenson
>>> @THISWILLWORK
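Alex's point that TypedPipe.sumByKey takes an implicit Semigroup can be illustrated with plain Scala collections. The Semigroup trait below is a self-contained stand-in. Algebird's real one is com.twitter.algebird.Semigroup. LocalSum.sumByKey is a hypothetical in-memory analogue, not the Scalding method:

```scala
// Stand-in for Algebird's Semigroup, defined here so the sketch is
// self-contained; it is NOT the Algebird trait itself.
trait Semigroup[V] {
  def plus(left: V, right: V): V
}

object Semigroup {
  // Summing two Ints means adding them
  implicit val intSemigroup: Semigroup[Int] = new Semigroup[Int] {
    def plus(left: Int, right: Int): Int = left + right
  }

  // Summing two Sets means taking their union
  implicit def setSemigroup[A]: Semigroup[Set[A]] = new Semigroup[Set[A]] {
    def plus(left: Set[A], right: Set[A]): Set[A] = left ++ right
  }
}

object LocalSum {
  // Hypothetical in-memory analogue of sumByKey: the implicit Semigroup[V]
  // chosen at compile time decides what "sum" means for the values.
  def sumByKey[K, V](pairs: Seq[(K, V)])(implicit sg: Semigroup[V]): Map[K, V] =
    pairs.foldLeft(Map.empty[K, V]) { case (acc, (k, v)) =>
      acc.updated(k, acc.get(k).fold(v)(prev => sg.plus(prev, v)))
    }
}
```

Calling sumByKey on pairs whose value type has no Semigroup instance (String, in this sketch) fails to compile. That is the kind of compile-time check Alex is contrasting with name-based casting.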
