Hmm, I guess this is what I've been looking for. I'll give it a shot. Thanks for your help, Cyrille!
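For reference, here is a minimal sketch of Cyrille's KeyedRecord idea. It assumes the first column is the grouping key and that every row has the same number of value columns. Plain Scala collections stand in for a TypedPipe so it runs without Scalding, and `combine`, `reduceByKey` and `parse` are hypothetical helper names. The rows are the ones from the concrete example in the original question, with the third row keyed by 1, since that is what the expected output 13, 5 implies.

```scala
// Record shape from Cyrille's message: a grouping key plus however many
// remaining columns the row happens to have.
case class KeyedRecord[K, T](key: K, values: Iterable[T])

object KeyedRecordSketch {
  // Hypothetical helper: combine two rows' values position by position with
  // the same binary function. Assumes both rows have the same column count.
  def combine[T](fn: (T, T) => T)(a: Iterable[T], b: Iterable[T]): Iterable[T] = {
    require(a.size == b.size, s"column count mismatch: ${a.size} vs ${b.size}")
    a.zip(b).map { case (x, y) => fn(x, y) }
  }

  // Hypothetical stand-in for group-then-reduce on a TypedPipe: group records
  // by key and fold each group's values with `combine`.
  def reduceByKey[K, T](records: Seq[KeyedRecord[K, T]])(fn: (T, T) => T): Map[K, Vector[T]] =
    records
      .groupBy(_.key)
      .map { case (k, rs) => k -> rs.map(_.values).reduce((a, b) => combine(fn)(a, b)).toVector }

  // Hypothetical parser: first column is the key, the rest are the values.
  def parse(row: Seq[Int]): KeyedRecord[Int, Int] = KeyedRecord(row.head, row.tail)

  def main(args: Array[String]): Unit = {
    val rows = Seq(Seq(1, 2, 3), Seq(1, 1, 1), Seq(1, 10, 1))
    println(reduceByKey(rows.map(parse))(_ + _)) // Map(1 -> Vector(13, 5))
  }
}
```

On an actual `TypedPipe[KeyedRecord[K, T]]`, the same `combine` function would presumably be used in the reduce step after grouping by `_.key`. The number of value columns then only has to agree at runtime, not at compile time, which is the point of Cyrille's suggestion.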
On Tuesday, August 22, 2017 at 11:19:06 AM UTC-7, Cyrille Chépélov wrote:
>
> Hi,
>
> In the variable-column scenario, you can build a TupleGetter that
> transforms your Cascading tuples into something that looks like
>
> case class KeyedRecord[K, T](key: K, values: Iterable[T])
>
> Then your TypedPipe is TypedPipe[KeyedRecord[K, T]]
>
> (Fancier things are quite possible if you can derive a stronger subtype
> that consumes an exact number of columns based on the leftmost columns,
> e.g. when parsing EDI records.)
>
> -- Cyrille
>
> On August 22, 2017 8:11:06 PM, [email protected] wrote:
>
>> Thanks very much, Oscar. It seems that even in the typed API we need to know
>> the number of columns (or the type structure) in advance (e.g.
>> TypedPipe[(K, (T, T, T, T, T))]). What if you are dynamically building up a
>> pipe and you don't know how many columns you will have at compile time?
>> (Let's say all I know is that the first column is the one I want to group
>> by, and I have no idea how many other columns are in the pipe.)
>> Do you know if the typed API is able to handle such a scenario?
>>
>>
>> On Monday, August 21, 2017 at 2:42:08 PM UTC-7, Oscar Boykin wrote:
>>>
>>> In the typed API, this would be something like:
>>>
>>> val input: TypedPipe[(K, (T, T, T, T, T))] = ???
>>>
>>> input.group.sum(Semigroup.semigroup5(Semigroup.from { (t1, t2) => fn(t1, t2) }))
>>>
>>> There is no convenient way to do that in the Fields API that I see at
>>> the moment.
>>>
>>> On Mon, Aug 21, 2017 at 2:25 PM, <[email protected]> wrote:
>>>
>>>> Hi all,
>>>>
>>>> In the Fields-based API, is there any way to group by a field and apply
>>>> the same reduce function to all other fields?
>>>>
>>>> I'm thinking about something like:
>>>>
>>>> pipe.groupBy(new Fields("fieldName"))(_.reduce(Fields.ALL -> Fields.ARGS) {
>>>>   (accum: TupleEntry, next: TupleEntry) => someMethod(accum, next)
>>>> })
>>>>
>>>> The above code compiles, but I believe it does not generate the
>>>> output I expect (in the output schema in the execution trace, I only see
>>>> the groupBy field name and nothing else).
>>>>
>>>> As a concrete example, assume the following is your pipe:
>>>>
>>>> field1, field2, field3
>>>> 1     , 2     , 3
>>>> 1     , 1     , 1
>>>> 1     , 10    , 1
>>>>
>>>> I want to group the pipe by field1 and sum up the values in the other
>>>> fields so that the output is:
>>>>
>>>> field1, field2, field3
>>>> 1     , 13    , 5
>>>>
>>>> Of course, the logic I want to implement in practice is more complex
>>>> than simple summation, and I don't know how many fields I have in the
>>>> pipe, so using the typed API is not an option.
>>>>
>>>> Thanks!
>>>>
>>>> --
>>>> You received this message because you are subscribed to the Google
>>>> Groups "Scalding Development" group.
>>>> To unsubscribe from this group and stop receiving emails from it, send
>>>> an email to [email protected].
>>>> For more options, visit https://groups.google.com/d/optout.
>>>>
>>>
>>>
>>
>
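For comparison, Oscar's typed-API suggestion fixes the number of value columns in the type and lifts one per-column function to a tuple, so tuples are combined position by position. Below is a plain-Scala sketch of that idea for the two value columns of the concrete example (field2, field3). Ordinary collections stand in for `input.group.sum`, and `liftPair` and `reduceByKey` are hypothetical names.

```scala
object FixedAritySketch {
  // Hypothetical helper: lift one binary function to pairs, applying it
  // column by column. The arity (2 here) is fixed by the type, as in the
  // 5-tuple version in Oscar's message.
  def liftPair[T](fn: (T, T) => T): ((T, T), (T, T)) => (T, T) =
    (a, b) => (fn(a._1, b._1), fn(a._2, b._2))

  // Hypothetical stand-in for grouping by key and summing the values:
  // group rows by key, then reduce each group's value pairs with the lifted function.
  def reduceByKey[K, T](rows: Seq[(K, (T, T))])(fn: (T, T) => T): Map[K, (T, T)] =
    rows.groupBy(_._1).map { case (k, kvs) => k -> kvs.map(_._2).reduce(liftPair(fn)) }

  def main(args: Array[String]): Unit = {
    // field1 is the key; (field2, field3) are the value columns.
    val rows = Seq((1, (2, 3)), (1, (1, 1)), (1, (10, 1)))
    println(reduceByKey(rows)(_ + _)) // Map(1 -> (13,5))
  }
}
```

Handling a third value column would mean changing both the tuple type and the lifted function, which is why this approach needs the column count at compile time.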
