Hi,
In the variable column scenario, you can build a TupleGetter that
transforms your Cascading tuples into something that looks like
case class KeyedRecord[K, T](key: K, values: Iterable[T])
Then your TypedPipe is a TypedPipe[KeyedRecord[K, T]].
(Fancier schemes are quite possible if you can derive a more specific
subtype that consumes an exact number of columns based on the leftmost
columns, e.g. when parsing EDI records.)
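A rough sketch of that idea, using plain Scala collections in place of a real TypedPipe so it runs without Scalding on the classpath. The names fromRow, combine and reduceByKey are made up for illustration, and the assumption that each raw row is a non-empty Seq with the key in the first position is mine, not something Cascading guarantees:

```scala
// One key plus however many value columns the row happens to carry.
final case class KeyedRecord[K, T](key: K, values: Iterable[T])

object KeyedRecordSketch {
  // Assumption: a raw row is a non-empty Seq whose first element is the key.
  def fromRow[T](row: Seq[T]): KeyedRecord[T, T] =
    KeyedRecord(row.head, row.tail)

  // Combine two records column by column with the user-supplied function.
  // The key of the first record is kept; callers only combine equal keys.
  def combine[K, T](a: KeyedRecord[K, T], b: KeyedRecord[K, T])(
      fn: (T, T) => T): KeyedRecord[K, T] = {
    require(a.values.size == b.values.size, "rows must have the same width")
    KeyedRecord(a.key, a.values.zip(b.values).map { case (x, y) => fn(x, y) })
  }

  // Group by key, then fold each group down to a single record.
  def reduceByKey[K, T](records: Seq[KeyedRecord[K, T]])(
      fn: (T, T) => T): Map[K, KeyedRecord[K, T]] =
    records.groupBy(_.key).map { case (k, rs) =>
      k -> rs.reduce((a, b) => combine(a, b)(fn))
    }
}
```

With a real pipe, the same combine function would be the thing handed to the per-key reduction; the number of value columns never appears in a type, only in the length of values at run time.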
- - Cyrille
On 22 August 2017 at 8:11:06 PM, [email protected] wrote:
Thanks very much, Oscar. It seems that even in the typed API we need to know
the number of columns (or the type structure) in advance (e.g.
TypedPipe[(K, (T, T, T, T, T))]). What if you are dynamically building up a
pipe and you don't know at compile time how many columns you will have?
Let's say all I know is that the first column is the one I want to group
by, and I have no idea how many other columns are in the pipe.
Do you know if the typed API is able to handle such a scenario?
On Monday, August 21, 2017 at 2:42:08 PM UTC-7, Oscar Boykin wrote:
In the typed API, this would be something like:
val input: TypedPipe[(K, (T, T, T, T, T))] = ???
input.group.sum(Semigroup.semigroup5(Semigroup.from { (t1, t2) => fn(t1, t2) }))
There is no convenient way to do that in the Fields API that I can see at
the moment.
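To make the shape of that concrete without pulling in Scalding or Algebird, here is a minimal sketch in plain Scala of what "apply the same fn to every position of a 5-tuple, per key" amounts to. lift5 and reduceByKey are hypothetical helper names, and a Seq stands in for the TypedPipe; this is not the library's implementation:

```scala
object Tuple5Lift {
  // Lift a pairwise function on T to 5-tuples of T, position by position.
  def lift5[T](fn: (T, T) => T)
      : ((T, T, T, T, T), (T, T, T, T, T)) => (T, T, T, T, T) =
    (a, b) =>
      (fn(a._1, b._1), fn(a._2, b._2), fn(a._3, b._3),
       fn(a._4, b._4), fn(a._5, b._5))

  // Group rows by key and reduce the value tuples with the lifted function.
  def reduceByKey[K, T](rows: Seq[(K, (T, T, T, T, T))])(
      fn: (T, T) => T): Map[K, (T, T, T, T, T)] = {
    val combine = lift5(fn)
    rows.groupBy(_._1).map { case (k, kvs) =>
      k -> kvs.map(_._2).reduce(combine)
    }
  }
}
```

Note that the tuple width is fixed in the types here, so this only works when the number of columns is known at compile time.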
On Mon, Aug 21, 2017 at 2:25 PM, [email protected] wrote:
Hi all,
In the Fields-based API, is there any way to group by a field and apply the
same reduce method to all other fields?
I'm thinking about something like:
pipe.groupBy(new Fields("fieldName"))(_.reduce(Fields.ALL -> Fields.ARGS) {
  (accum: TupleEntry, next: TupleEntry) =>
    someMethod(accum, next)
})
The above code compiles, but I believe it does not generate the output I
expect (in the output schema in the execution trace I only see the
group-by field name and nothing else).
As a concrete example, assume the following is your pipe:
field1, field2, field3
1 , 2 , 3
1 , 1 , 1
2 , 10 , 1
I want to group the pipe by field1 and sum up the values in the other
fields so that the output is:
field1, field2, field3
1 , 13 , 5
Of course, the logic that I want to implement in practice is more complex
than simple summation, and I don't know how many fields I have in the
pipe, so using the typed API is not an option.
Thanks!
--
You received this message because you are subscribed to the Google Groups
"Scalding Development" group.
To unsubscribe from this group and stop receiving emails from it, send an
email to [email protected].
For more options, visit https://groups.google.com/d/optout.