I was able to implement the logic I was looking for using the typed API.
However, I believe the Fields-based API can also handle this scenario. In
case anyone reading this thread is interested, you should be able to do it
as follows:
pipe.groupBy('some_field) {
  _.reduce(Fields.VALUES -> Fields.ARGS) { (cumulativeTuple: Tuple, next: Tuple) =>
    // do whatever you want here and return a Tuple
    ???
  }
}
Fields.VALUES makes sure you iterate over the non-grouping fields, and
Fields.ARGS makes sure you get as many output fields as there are
non-grouping fields.
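
To make the intended semantics concrete, here is a rough plain-Scala sketch of "group by the first column, then apply the same reduce function to every other column", without knowing the number of columns in advance. It does not use Scalding at all: rows are modeled as Vector[Long], and the names ColumnwiseReduce, combine and groupAndReduce are made up for illustration. Treat it as a model of what the reduce body is supposed to compute, not as the library's behavior.

```scala
// Plain-Scala model of a column-wise reduce per group (no Scalding involved).
// All names here are illustrative assumptions, not part of any library.
object ColumnwiseReduce {
  type Row = Vector[Long]

  // Combine two rows of equal width position by position using f.
  def combine(f: (Long, Long) => Long)(acc: Row, next: Row): Row = {
    require(acc.length == next.length, "rows must have the same number of columns")
    acc.zip(next).map { case (a, b) => f(a, b) }
  }

  // Group on column 0, then reduce the remaining columns of each group.
  def groupAndReduce(rows: Seq[Row])(f: (Long, Long) => Long): Map[Long, Row] =
    rows
      .groupBy(_.head)
      .map { case (key, group) =>
        key -> group.map(_.tail).reduce((a, b) => combine(f)(a, b))
      }

  def main(args: Array[String]): Unit = {
    val rows = Seq(Vector(1L, 2L, 3L), Vector(1L, 1L, 1L), Vector(2L, 10L, 1L))
    // Print one line per group: the key followed by the reduced columns.
    groupAndReduce(rows)(_ + _).toSeq.sortBy(_._1).foreach { case (k, vs) =>
      println((k +: vs).mkString(", "))
    }
  }
}
```

The function passed as f plays the role of the per-field logic; swapping `_ + _` for something else changes the reduction without touching the grouping code.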
Hope it helps.
On Tuesday, August 22, 2017 at 11:28:59 AM UTC-7, [email protected] wrote:
>
> Hmm, I guess this is what I've been looking for. I'll give it a shot.
> Thanks for your help Cyrille!
>
> On Tuesday, August 22, 2017 at 11:19:06 AM UTC-7, Cyrille Chépélov wrote:
>>
>> Hi,
>>
>> In the variable column scenario, you can build a TupleGetter that
>> transforms your Cascading tuples into something that looks like
>>
>> case class KeyedRecord[K, T](key: K, values: Iterable[T])
>>
>> Then your TypedPipe is TypedPipe[KeyedRecord[K, T]]
>>
>> (Fancier stuff is quite possible if you can derive a stronger subtype
>> consuming an exact number of columns based on the leftmost columns, e.g.
>> parsing EDI records.)
>>
>> - - Cyrille
>>
>> On August 22, 2017 at 8:11:06 PM, [email protected] wrote:
>>
>>> Thanks very much, Oscar. It seems that even in the typed API we need to
>>> know the number of columns (or the type structure) in advance (e.g.
>>> TypedPipe[(K, (T, T, T, T, T))]). What if you are building up a pipe
>>> dynamically and don't know at compile time how many columns you will have?
>>> Let's say all I know is that the first column is the one I want to group
>>> by, and I have no idea how many other columns are in the pipe. Do you know
>>> if the typed API is able to handle such a scenario?
>>>
>>>
>>> On Monday, August 21, 2017 at 2:42:08 PM UTC-7, Oscar Boykin wrote:
>>>>
>>>> In the typed API, this would be something like:
>>>>
>>>> val input: TypedPipe[(K, (T, T, T, T, T))] = ???
>>>>
>>>> input.group.sum(Semigroup.semigroup5(Semigroup.from { (t1, t2) =>
>>>> fn(t1, t2) }))
>>>>
>>>> There is no convenient way to do that in the Fields API that I see at
>>>> the moment.
>>>>
>>>> On Mon, Aug 21, 2017 at 2:25 PM, <[email protected]> wrote:
>>>>
>>>>> Hi all,
>>>>>
>>>>> In the Fields-based API, is there any way to group by a field and apply
>>>>> the same reduce method to all other fields?
>>>>>
>>>>> I'm thinking about something like:
>>>>>
>>>>> pipe.groupBy(new Fields("fieldName"))(_.reduce(Fields.ALL ->
>>>>> Fields.ARGS){ (accum:TupleEntry, next:TupleEntry) =>
>>>>> someMethod(accum, next)
>>>>> })
>>>>>
>>>>> The above code compiles, but I believe it does not generate the output I
>>>>> expect (in the output schema in the execution trace I only see the
>>>>> group-by field name and nothing else).
>>>>>
>>>>> As a concrete example, assume the following is your pipe:
>>>>>
>>>>> field1, field2, field3
>>>>> 1 , 2 , 3
>>>>> 1 , 1 , 1
>>>>> 2 , 10 , 1
>>>>>
>>>>> I want to group the pipe by field1 and sum up the values in the other
>>>>> fields so that the output is:
>>>>>
>>>>> field1, field2, field3
>>>>> 1 , 3 , 4
>>>>> 2 , 10 , 1
>>>>>
>>>>>
>>>>> Of course, the logic that I want to implement in practice is more
>>>>> complex than simple summation, and I don't know how many fields I have
>>>>> in the pipe, so using the typed API is not an option.
>>>>>
>>>>> Thanks!
>>>>>
>>>>> --
>>>>> You received this message because you are subscribed to the Google
>>>>> Groups "Scalding Development" group.
>>>>> To unsubscribe from this group and stop receiving emails from it, send
>>>>> an email to [email protected].
>>>>> For more options, visit https://groups.google.com/d/optout.
>>>>>
>>>>