Re: getting/manipulating all fields in a pipe in scalding

mstroger Fri, 30 Jun 2017 16:18:35 -0700

Your suggestion makes sense. I will give the Typed API a try. Thanks
very much for your input!


On Friday, June 30, 2017 at 4:13:53 PM UTC-7, Alex Levenson wrote:
>
> Yeah, I guess what I was sort of getting at is that if you are using the
> Typed API, you try to use types instead of names for these sorts of things, 
> and a lot of your code about casting one type to another goes away. But it 
> can be painful to rewrite your entire world in this way. I'm not familiar 
> enough with the un-typed API to tell you how to do this unfortunately, but 
> at least here at Twitter we would try to push our users towards using 
> strong types, and maybe the implicit typeclass pattern for extractors / 
> converters / etc. For example, you can see how TypedPipe.sumByKey takes an 
> implicit strongly typed Semigroup which explains how to "sum" two values. 
> Similarly, you can create an implicit Sparser type class that is picked 
> based on the types of the data at compile time.
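A plain-Java analogue of the type-class idea above, as a minimal sketch. `Semigroup`, `SumByKeySketch`, `LONG_SUM` and `STRING_CONCAT` are hypothetical names, not Scalding's or Algebird's API. Java has no implicits, so the instance is passed as an ordinary argument; in Scala the compiler would pick it from the value type.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical type class: "how to combine two values of type V".
// Loosely modeled on the Semigroup idea; not Algebird's actual interface.
interface Semigroup<V> {
    V plus(V left, V right);
}

final class SumByKeySketch {
    private SumByKeySketch() {}

    // One instance per concrete type. In Scala these would be implicit values
    // chosen at compile time; here the caller passes them explicitly.
    static final Semigroup<Long> LONG_SUM = (a, b) -> a + b;
    static final Semigroup<String> STRING_CONCAT = (a, b) -> a + b;

    // In-memory analogue of summing values per key: the Semigroup argument,
    // not a lookup by field name, decides how two values are combined.
    static <K, V> Map<K, V> sumByKey(List<Map.Entry<K, V>> pairs, Semigroup<V> sg) {
        Map<K, V> out = new LinkedHashMap<>();
        for (Map.Entry<K, V> e : pairs) {
            out.merge(e.getKey(), e.getValue(), sg::plus);
        }
        return out;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Long>> pairs =
            List.of(Map.entry("a", 1L), Map.entry("b", 2L), Map.entry("a", 3L));
        System.out.println(sumByKey(pairs, LONG_SUM)); // {a=4, b=2}
    }
}
```

Swapping `LONG_SUM` for `STRING_CONCAT` changes how values are combined without touching `sumByKey`. An implicit `Sparser` type class would follow the same shape, with one instance per data type.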
>
> On Fri, Jun 30, 2017 at 4:07 PM, <[email protected]> wrote:
>
>> Thanks for the reply Alex!
>>
>> I'm trying to implement a couple of scenarios. The first scenario is
>> pretty much what I explained in the post (i.e. appending a fixed
>> prefix/suffix to every field name in a pipe). The second scenario is that
>> I want to iterate through all fields in a pipe and call a function on them
>> based on their names. For example, let's say I have a bunch of different
>> fields in a pipe, and if a field name contains the string "_list_" I want
>> to convert the List[Any] to a sparse representation of the list in
>> String format. I guess if I write a Cascading Function in Java and invoke
>> an "each" method on my pipe that should do the trick, but I was wondering
>> if there is a cleaner/easier way of doing this in Scalding:
>>
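>> import java.util.List;
>>
>> import cascading.flow.FlowProcess;
>> import cascading.operation.BaseOperation;
>> import cascading.operation.Function;
>> import cascading.operation.FunctionCall;
>> import cascading.tuple.Fields;
>> import cascading.tuple.Tuple;
>> import cascading.tuple.TupleEntry;
>>
>> // Emits one value per argument field: fields whose name contains "_list_"
>> // are converted to a string; all other values are passed through unchanged.
>> public class Sparser extends BaseOperation<Void> implements Function<Void>
>> {
>>   public Sparser()
>>   {
>>     // Fields.ARGS: declare the same output fields as the incoming arguments,
>>     // since one value is emitted per argument (a single "sum" field would not match).
>>     super(Fields.ARGS);
>>   }
>>
>>   public Sparser(Fields fieldDeclaration)
>>   {
>>     super(fieldDeclaration);
>>   }
>>
>>   @Override
>>   @SuppressWarnings("unchecked")
>>   public void operate(FlowProcess flowProcess, FunctionCall<Void> functionCall)
>>   {
>>     // names and values of the argument fields
>>     Fields fieldNames = functionCall.getArgumentFields();
>>     TupleEntry arguments = functionCall.getArguments();
>>
>>     // one output value per input value, in the same order
>>     Tuple result = new Tuple();
>>
>>     int i = 0;
>>     for (Object obj : arguments.getTuple())
>>     {
>>       if (fieldNames.get(i).toString().contains("_list_"))
>>       {
>>         // Assumes the value is a java.util.List; a Scala List stored by
>>         // Scalding would need converting first, or this cast fails.
>>         List<Double> values = (List<Double>) obj;
>>         String sparseRepresentation = values.toString(); // TO BE IMPLEMENTED
>>         result.add(sparseRepresentation);
>>       }
>>       else
>>       {
>>         // pass through as-is; casting to String would fail for non-String fields
>>         result.add(obj);
>>       }
>>       i++;
>>     }
>>
>>     // return the result Tuple
>>     functionCall.getOutputCollector().add(result);
>>   }
>> }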
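Here is one possible way to fill in the `toString()` placeholder marked `TO BE IMPLEMENTED`. It is sketched under an assumed encoding, because the thread does not define "sparse representation": the list length, then `index:value` pairs for the non-zero entries. `SparseFormat` and `toSparseString` are hypothetical names.

```java
import java.util.List;
import java.util.StringJoiner;

final class SparseFormat {
    private SparseFormat() {}

    // Assumed encoding: "<size>:{<index>:<value>,...}", listing only non-zero entries.
    static String toSparseString(List<Double> values) {
        StringJoiner entries = new StringJoiner(",", "{", "}");
        for (int i = 0; i < values.size(); i++) {
            Double v = values.get(i);
            // Skip zeros (and nulls) so only informative positions are kept.
            if (v != null && v != 0.0) {
                entries.add(i + ":" + v);
            }
        }
        return values.size() + ":" + entries;
    }

    public static void main(String[] args) {
        System.out.println(toSparseString(List.of(0.0, 1.5, 0.0, 2.0))); // 4:{1:1.5,3:2.0}
    }
}
```

Keeping the length lets a reader rebuild the dense list, including trailing zeros. Whether that matters depends on what consumes the string.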
>>
>> btw, I'm not sure I understand what you mean by "an extractor method";
>> can you please send me a pointer to an example?
>>
>> Any input is greatly appreciated!
>>
>> On Friday, June 30, 2017 at 3:51:09 PM UTC-7, Alex Levenson wrote:
>>>
>>> Probably not what you want to hear, but the scalding dev team is really 
>>> only developing + supporting the Typed API at this point -- which would 
>>> make something like this even more difficult.
>>> But the question I'd probably ask is: what are you trying to do, and can
>>> you use strong types, the Typed API, and maybe an extractor method or
>>> similar instead?
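Here is a rough plain-Java sketch of one reading of "strong types plus an extractor method". `Employee` and `fromRow` are hypothetical and not part of Scalding. The untyped row is converted into a typed value in one place, and later code uses `e.name` and `e.age` instead of field-name strings.

```java
import java.util.List;
import java.util.Objects;

// Hypothetical strongly typed record standing in for the ('name, 'age) fields.
final class Employee {
    final String name;
    final int age;

    Employee(String name, int age) {
        this.name = Objects.requireNonNull(name);
        this.age = age;
    }

    // Extractor: the only place that knows the row layout and does the casts.
    static Employee fromRow(List<?> row) {
        if (row.size() != 2) {
            throw new IllegalArgumentException("expected 2 columns, got " + row.size());
        }
        String name = (String) row.get(0);
        int age = ((Number) row.get(1)).intValue();
        return new Employee(name, age);
    }

    public static void main(String[] args) {
        Employee e = fromRow(List.of("alice", 30));
        System.out.println(e.name + " is " + e.age); // alice is 30
    }
}
```

With a type like this, renaming `name` to `employee_name` largely stops being a problem, because the meaning comes from the `Employee` type rather than from field labels.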
>>>
>>> On Fri, Jun 30, 2017 at 2:13 PM, <[email protected]> wrote:
>>>
>>>> Here is the question:
>>>>
>>>> Assume I have a pipe and I want to rename all the fields in the pipe
>>>> programmatically, meaning that I don't want to hard-code the field names
>>>> in my code. Any idea how I can do this?
>>>>
>>>> As a concrete example, assume I have a pipe with two fields: "name" and 
>>>> "age" and I want to rename these fields to "employee_name" and 
>>>> "employee_age". Obviously the natural solution is to write a piece of code 
>>>> as below:
>>>>
>>>> pipe.rename(('name, 'age) -> ('employee_name, 'employee_age))
>>>>
>>>> or 
>>>>
>>>> pipe.rename(new Fields("name", "age") -> new Fields("employee_name", "employee_age"))
>>>>
>>>> However, what I need is to be able to iterate through all fields in the 
>>>> pipe without knowing their names.
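Once the incoming names are available as strings, building the new names is a simple order-preserving mapping. This minimal sketch assumes the names have already been obtained somehow; getting them from a Scalding pipe at plan time is the open question here. `FieldRenaming` and `decorateAll` are hypothetical helpers.

```java
import java.util.ArrayList;
import java.util.List;

final class FieldRenaming {
    private FieldRenaming() {}

    // Returns prefix + name + suffix for every name, preserving order, so the
    // i-th output name corresponds to the i-th input field.
    static List<String> decorateAll(List<String> names, String prefix, String suffix) {
        List<String> out = new ArrayList<>(names.size());
        for (String name : names) {
            out.add(prefix + name + suffix);
        }
        return out;
    }

    public static void main(String[] args) {
        // Stand-in for names discovered at runtime rather than hard-coded.
        List<String> incoming = List.of("name", "age");
        System.out.println(decorateAll(incoming, "employee_", "")); // [employee_name, employee_age]
    }
}
```

Position i of the output matches position i of the input, which is the pairing a `rename(old -> new)` call expects.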
>>>>
>>>> There are a couple of methods (resolveIncomingOperationArgumentFields
>>>> and resolveIncomingOperationPassThroughFields) callable on a pipe which
>>>> look promising, but they both take an input argument of type
>>>> cascading.flow.planner.Scope, and I don't know where to get one
>>>> in a Scalding job.
>>>>
>>>> Another solution that comes to mind is to call the "each" method on the
>>>> pipe with a Cascading Function that I implement myself. But I was not
>>>> able to find any sample code for that either.
>>>>
>>>> Thanks!
>>>>
>>>> -- 
>>>> You received this message because you are subscribed to the Google 
>>>> Groups "Scalding Development" group.
>>>> To unsubscribe from this group and stop receiving emails from it, send 
>>>> an email to [email protected].
>>>> For more options, visit https://groups.google.com/d/optout.
>>>>
>>>
>>>
>>>
>>> -- 
>>> Alex Levenson
>>> @THISWILLWORK
>>>
>>
>
>
>
>
