Sure! Sorry I don't know how to do that at the cascading layer.

On Fri, Jun 30, 2017 at 4:18 PM, <[email protected]> wrote:

> Your suggestion makes sense. I will try to give typed API a try. Thanks
> very much for your input!
>
> On Friday, June 30, 2017 at 4:13:53 PM UTC-7, Alex Levenson wrote:
>>
>> Yeah, I guess what I was sort of getting as is that if you are using the
>> Typed API, you try to use types instead of names for these sorts of things,
>> and a lot of your code about casting one type to another goes away. But it
>> can be painful to rewrite your entire world in this way. I'm not familiar
>> enough with the un-typed API to tell you how to do this unfortunately, but
>> at least here at Twitter we would try to push our users towards using
>> strong types, and maybe the implicit typeclass pattern for extractors /
>> converters / etc. For example, you can see how TypedPipe.sumByKey takes an
>> implicit strongly typed Semigroup which explains how to "sum" two values.
>> Similarly, you can create an implicit Sparser type class that is picked
>> based on the types of the data at compile time.
>>
>> On Fri, Jun 30, 2017 at 4:07 PM, <[email protected]> wrote:
>>
>>> Thanks for the reply Alex!
>>>
>>> I'm trying to implement a couple of scenarios. The first scenario is
>>> pretty much what I explained in the post (i.e. appending a fixed
>>> prefix/suffix to every field name in a pipe). The second scenario, is that
>>> I want to iterate through all fields in a pipe and call a function on them
>>> based on their names. For example, let's say I have a bunch of different
>>> fields in a pipe and if the pipe name contains the string "_list_" I want
>>> to convert the List[Any] to a sparse representation of the list in the
>>> String format. I guess if I write a Cascading Function in java and invoke
>>> an "each" method on my pipe that should do the trick, but I was wondering
>>> if there is a cleaner/easier way of doing this in scalding:
>>>
>>> import java.util.Iterator;
>>> import cascading.operation.*;
>>> import cascading.tuple.*;
>>> import cascading.flow.*;
>>>
>>> public class Sparser extends BaseOperation<Tuple> implements
>>> Function<Tuple>
>>> {
>>> public Sparser()
>>>   {
>>>   super(new Fields( "sum" ) );
>>>   }
>>>
>>> public Sparser( Fields fieldDeclaration )
>>>   {
>>>   super(fieldDeclaration );
>>>   }
>>>
>>> public void operate( FlowProcess flowProcess, FunctionCall<Tuple>
>>> functionCall )
>>>   {
>>>   // get the arguments TupleEntry
>>>   Fields fieldNames = functionCall.getArgumentFields();
>>>   TupleEntry arguments = functionCall.getArguments();
>>>
>>>   // create a Tuple to hold our result values
>>>   Tuple result = new Tuple();
>>>
>>>   Iterator iterator = arguments.getTuple().iterator();
>>>   int i = 0;
>>>   while(iterator.hasNext())
>>>   {
>>>       Object obj = iterator.next();
>>>       if (fieldNames.get(i).toString().contains("_list_")){
>>>           java.util.List<Double> tmp = (java.util.List<Double>)obj;
>>>           String sparsRepresentation = tmp.toString();// TO BE
>>> IMPLEMENTED
>>>           result.add(sparsRepresentation);
>>>       }
>>>       else
>>>           result.add((String)obj);
>>>       i++;
>>>   }
>>>
>>>   // return the result Tuple
>>>   functionCall.getOutputCollector().add( result );
>>>   }
>>> }
>>>
>>> btw, I'm not sure if I understand what you mean by "an extractor
>>> method", can you please send me a pointer to an example?
>>>
>>> Any input is greatly appreciated!
>>>
>>> On Friday, June 30, 2017 at 3:51:09 PM UTC-7, Alex Levenson wrote:
>>>>
>>>> Probably not what you want to hear, but the scalding dev team is really
>>>> only developing + supporting the Typed API at this point -- which would
>>>> make something like this even more difficult.
>>>> But the question I'd probably ask is what are you trying to do, and can
>>>> you use strong types, the Typed Api, and maybe an extractor method or
>>>> similar instead?
>>>>
>>>> On Fri, Jun 30, 2017 at 2:13 PM, <[email protected]> wrote:
>>>>
>>>>> Here is the question:
>>>>>
>>>>> Assume I have a pipe and I want to rename all the fields in the pipe
>>>>> programmatically, meaning that I don't want to hard code the field names 
>>>>> in
>>>>> my code. Any idea how I can do this?
>>>>>
>>>>> As a concrete example, assume I have a pipe with two fields: "name"
>>>>> and "age" and I want to rename these fields to "employee_name" and
>>>>> "employee_age". Obviously the natural solution is to write a piece of code
>>>>> as below:
>>>>>
>>>>> pipe.rename(('name, 'age) -> ('employee_name, 'employee_age))
>>>>>
>>>>> or
>>>>>
>>>>> pipe.rename(new Fields("name", "age") ->  new Fields("employee_name",
>>>>> "employee_age"))
>>>>>
>>>>> However, what I need is to be able to iterate through all fields in
>>>>> the pipe without knowing their names.
>>>>>
>>>>> There are a couple of methods (resolveIncomingOperationArgumentFields
>>>>> and resolveIncomingOperationPassThroughFields) callable on a pipe
>>>>> which look promising but the issue is that they both take and input
>>>>> argument of type cascading.flow.planner.Scope which I don't know where can
>>>>> I get it from in a scalding job.
>>>>>
>>>>> Another solution that comes to my mind is using "each" method on the
>>>>> pipe and implementing a cascading function and pass it to the each
>>>>> statement. But I was now able to find any sample code for that either.
>>>>>
>>>>> Thanks!
>>>>>
>>>>> --
>>>>> You received this message because you are subscribed to the Google
>>>>> Groups "Scalding Development" group.
>>>>> To unsubscribe from this group and stop receiving emails from it, send
>>>>> an email to [email protected].
>>>>> For more options, visit https://groups.google.com/d/optout.
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Alex Levenson
>>>> @THISWILLWORK
>>>>
>>> --
>>> You received this message because you are subscribed to the Google
>>> Groups "Scalding Development" group.
>>> To unsubscribe from this group and stop receiving emails from it, send
>>> an email to [email protected].
>>> For more options, visit https://groups.google.com/d/optout.
>>>
>>
>>
>>
>> --
>> Alex Levenson
>> @THISWILLWORK
>>
> --
> You received this message because you are subscribed to the Google Groups
> "Scalding Development" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> For more options, visit https://groups.google.com/d/optout.
>



-- 
Alex Levenson
@THISWILLWORK

-- 
You received this message because you are subscribed to the Google Groups 
"Scalding Development" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
For more options, visit https://groups.google.com/d/optout.

Reply via email to