Re: group by and apply the same reduce method on all (non-group by) fields

mstroger Tue, 22 Aug 2017 11:29:24 -0700

hmm, I guess this is what I've been looking for. I'll give it a shot. 
Thanks for your help Cyrille!
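
For anyone finding this thread later, here is a minimal sketch of how the KeyedRecord idea quoted below might look. It is only an illustration under some assumptions: plain Scala collections stand in for a TypedPipe so it runs without Scalding, and the names VariableColumns, combine and reduceByKey are hypothetical helpers, not part of Scalding or Algebird. It also assumes every row with the same key carries the same number of value columns.

```scala
// Record shape suggested in the quoted message: one key plus a
// variable number of value columns.
case class KeyedRecord[K, T](key: K, values: Iterable[T])

// Hypothetical helpers; plain collections stand in for a TypedPipe.
object VariableColumns {
  // Combine two rows column by column using the same function `fn`.
  // Assumes both rows have the same number of columns.
  def combine[T](fn: (T, T) => T)(a: Vector[T], b: Vector[T]): Vector[T] = {
    require(a.size == b.size, "rows must have the same number of columns")
    a.zip(b).map { case (x, y) => fn(x, y) }
  }

  // Group by key, then reduce each group with the element-wise combine.
  def reduceByKey[K, T](rows: Seq[KeyedRecord[K, T]])(fn: (T, T) => T): Map[K, Vector[T]] =
    rows.groupBy(_.key).map { case (k, rs) =>
      k -> rs.map(_.values.toVector).reduce((a, b) => combine(fn)(a, b))
    }
}

object Demo {
  def main(args: Array[String]): Unit = {
    // Illustrative rows only; the column count is not fixed in the types.
    val rows = Seq(
      KeyedRecord("a", Seq(1, 2, 3)),
      KeyedRecord("a", Seq(10, 20, 30)),
      KeyedRecord("b", Seq(5, 5, 5))
    )
    val out = VariableColumns.reduceByKey(rows)(_ + _)
    // out("a") == Vector(11, 22, 33); out("b") == Vector(5, 5, 5)
    println(out)
  }
}
```

In an actual job the same element-wise function could presumably be passed to a reduce on the grouped pipe instead of to the collection-based reduceByKey above; I have not verified that part.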


On Tuesday, August 22, 2017 at 11:19:06 AM UTC-7, Cyrille Chépélov wrote:
>
> Hi, 
>
> In the variable column scenario, you can build a TupleGetter that 
> transforms your Cascading tuples into something that looks like
>
>    case class KeyedRecord[K, T](key: K, values: Iterable[T]) 
>
> Then your TypedPipe is TypedPipe[KeyedRecord[K, T]] 
>
> (fancier stuff is quite possible if you can derive a stronger subtype 
> that consumes an exact number of columns based on the leftmost columns, 
> e.g. when parsing EDI records) 
>
>   - - Cyrille 
>
> On 22 August 2017 at 8:11:06 PM, [email protected] wrote:
>
>> Thanks very much Oscar. It seems that even in the typed API we need to know 
>> the number of columns (or the type structure) in advance (e.g. 
>> TypedPipe[(K, (T, T, T, T, T))]). What if you are building up a pipe 
>> dynamically and don't know at compile time how many columns it will have? 
>> Say all I know is that the first column is the one I want to group by, and 
>> I have no idea how many other columns are in the pipe. 
>> Do you know whether the typed API can handle such a scenario?
>>
>>
>> On Monday, August 21, 2017 at 2:42:08 PM UTC-7, Oscar Boykin wrote:
>>>
>>> in the typed API, this would be something like:
>>>
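>>> val input: TypedPipe[(K, (T, T, T, T, T))] = ???
>>>
>>> // the same Semigroup[T] is applied to each of the five value columns
>>> val sg: Semigroup[T] = Semigroup.from[T] { (t1, t2) => fn(t1, t2) }
>>> input.group.sum(Semigroup.semigroup5(sg, sg, sg, sg, sg))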
>>>
>>> There is no convenient way to do that in the Fields API that I can see at 
>>> the moment.
>>>
>>> On Mon, Aug 21, 2017 at 2:25 PM, <[email protected]> wrote:
>>>
>>>> Hi all,
>>>>
>>>> In the Fields-based API, is there any way to group by a field and apply 
>>>> the same reduce method to all other fields?
>>>>
>>>> I'm thinking about something like:
>>>>
>>>> pipe.groupBy(new Fields("fieldName"))(_.reduce(Fields.ALL -> 
>>>> Fields.ARGS){ (accum:TupleEntry, next:TupleEntry) =>
>>>>       someMethod(accum, next)
>>>>     })
>>>>
>>>> The above code compiles, but I believe it does not generate the 
>>>> output I expect (in the output schema in the execution trace I see only 
>>>> the group-by field name and nothing else).
>>>>
>>>> As a concrete example, assume the following is your pipe:
>>>>
>>>> field1, field2, field3
>>>> 1     , 2     , 3
>>>> 1     , 1     , 1
>>>> 2     , 10    , 1
>>>>
>>>> I want to group the pipe by field1 and sum up the values in the other 
>>>> fields so that the output is:
>>>>
>>>> field1, field2, field3
>>>> 1     , 13    , 5
>>>>
>>>>
>>>> Of course, the logic I want to implement in practice is more 
>>>> complex than simple summation, and since I don't know how many fields I 
>>>> have in the pipe, using the typed API is not an option.
>>>>
>>>> Thanks!
>>>>
>>>> -- 
>>>> You received this message because you are subscribed to the Google 
>>>> Groups "Scalding Development" group.
>>>> To unsubscribe from this group and stop receiving emails from it, send 
>>>> an email to [email protected].
>>>> For more options, visit https://groups.google.com/d/optout.
>>>>
>>>
>>
>

