Hi,

In the variable column scenario, you can build a TupleGetter that transforms your Cascading tuples into something that looks like

  case class KeyedRecord[K, T](key: K, values: Iterable[T])

Then your TypedPipe is TypedPipe[KeyedRecord[K, T]]

(Fancier things are quite possible if you can derive a stronger subtype that consumes an exact number of columns based on the leftmost columns, e.g. when parsing EDI records.)
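
As a rough illustration of the KeyedRecord idea above, here is a minimal sketch in plain Scala. It uses ordinary collections instead of a TypedPipe so that it runs on its own. The helpers fromRow, combine and reduceByKey are hypothetical names, not Scalding API, and the sketch assumes the conversion from a Cascading tuple hands you each row as a Seq[Any] with the key in the first position.

```scala
// Hypothetical sketch: plain Scala collections stand in for a TypedPipe.
final case class KeyedRecord[K, T](key: K, values: Iterable[T])

object KeyedRecordSketch {
  // Turn one untyped row (first column = key, the rest = values) into a KeyedRecord.
  def fromRow[K, T](row: Seq[Any])(toKey: Any => K, toValue: Any => T): KeyedRecord[K, T] =
    KeyedRecord(toKey(row.head), row.tail.map(toValue))

  // Combine two records with the same key, column by column, using one function.
  // zip stops at the shorter record, so rows are assumed to have equal width.
  def combine[K, T](a: KeyedRecord[K, T], b: KeyedRecord[K, T])(fn: (T, T) => T): KeyedRecord[K, T] = {
    require(a.key == b.key, "records must share a key")
    KeyedRecord(a.key, a.values.zip(b.values).map { case (x, y) => fn(x, y) })
  }

  // Group by key and reduce each group; a stand-in for grouping and summing a pipe.
  def reduceByKey[K, T](records: Seq[KeyedRecord[K, T]])(fn: (T, T) => T): Map[K, Seq[T]] =
    records.groupBy(_.key).map { case (k, rs) =>
      k -> rs.reduce((a, b) => combine(a, b)(fn)).values.toSeq
    }
}
```

The number of value columns never appears in a type here, which is the point: only the key type K and the element type T are fixed at compile time.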

 - - Cyrille




On 22 August 2017 at 8:11:06 PM, [email protected] wrote:

Thanks very much, Oscar. It seems that even in the typed API we need to know
the number of columns (or the type structure) in advance (e.g.
TypedPipe[(K, (T, T, T, T, T))]). What if you are building up a pipe
dynamically and don't know at compile time how many columns you will have?
Let's say all I know is that the first column is the one I want to group
by, and I have no idea how many other columns are in the pipe.
Do you know whether the typed API can handle such a scenario?


On Monday, August 21, 2017 at 2:42:08 PM UTC-7, Oscar Boykin wrote:

In the typed API, this would be something like:

val input: TypedPipe[(K, (T, T, T, T, T))] = ???

val sg: Semigroup[T] = Semigroup.from { (t1, t2) => fn(t1, t2) }
input.group.sum(Semigroup.semigroup5(sg, sg, sg, sg, sg))

There is no convenient way to do that in the Fields API that I can see at
the moment.
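
To show what the tuple semigroup above amounts to, here is a minimal sketch in plain Scala. SimpleSemigroup, tuple5 and sumByKey are hypothetical stand-ins written for this example; they are not Algebird's or Scalding's API, and ordinary collections replace the pipe so the sketch runs on its own.

```scala
// Hypothetical stand-in for a semigroup type class; not Algebird's API.
trait SimpleSemigroup[A] { def plus(l: A, r: A): A }

object SimpleSemigroups {
  // Build a semigroup from a function that is assumed to be associative.
  def from[A](fn: (A, A) => A): SimpleSemigroup[A] =
    new SimpleSemigroup[A] { def plus(l: A, r: A): A = fn(l, r) }

  // Lift one semigroup to 5-tuples by applying it at each position.
  def tuple5[A](sg: SimpleSemigroup[A]): SimpleSemigroup[(A, A, A, A, A)] =
    from[(A, A, A, A, A)] { (l, r) =>
      (sg.plus(l._1, r._1), sg.plus(l._2, r._2), sg.plus(l._3, r._3),
       sg.plus(l._4, r._4), sg.plus(l._5, r._5))
    }

  // Stand-in for group + sum: reduce the values of each key with the semigroup.
  def sumByKey[K, V](rows: Seq[(K, V)])(sg: SimpleSemigroup[V]): Map[K, V] =
    rows.groupBy(_._1).map { case (k, kvs) =>
      k -> kvs.map(_._2).reduce((a, b) => sg.plus(a, b))
    }
}
```

The arity (five) is baked into the type, which is why this approach needs the column count at compile time.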

On Mon, Aug 21, 2017 at 2:25 PM, <[email protected]> wrote:

Hi all,

In the Fields-based API, is there any way to group by one field and apply
the same reduce method to all the other fields?

I'm thinking about something like:

pipe.groupBy(new Fields("fieldName"))(_.reduce(Fields.ALL -> Fields.ARGS) {
  (accum: TupleEntry, next: TupleEntry) => someMethod(accum, next)
})

The above code compiles, but I believe it does not generate the output I
expect (in the output schema in the execution trace I only see the group-by
field name and nothing else).

As a concrete example, assume the following is your pipe:

field1, field2, field3
1     , 2     , 3
1     , 1     , 1
2     , 10    , 1

I want to group the pipe by field1 and sum up the values in the other
fields so that the output is:

field1, field2, field3
1     , 3     , 4
2     , 10    , 1


Of course, the logic I want to implement in practice is more complex than
simple summation, and I don't know how many fields I have in the pipe, so
using the typed API is not an option.

Thanks!

--
You received this message because you are subscribed to the Google Groups
"Scalding Development" group.
To unsubscribe from this group and stop receiving emails from it, send an
email to [email protected].
For more options, visit https://groups.google.com/d/optout.



