Hey Cyrille,
It is perfect. I gave it a try and it worked great. Thank you!!
Charan.
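
For anyone reading this thread later, here is a minimal sketch of what the pipeline quoted below computes, written against plain Scala collections rather than a TypedPipe so it runs without a Hadoop setup. The Info case class and the two-argument getKey are hypothetical stand-ins (the thread does not show the real definitions), and groupBy followed by a sum plays the role of .group.sum.

```scala
// Hypothetical stand-in for the thread's Info record: field name -> value.
final case class Info(values: Map[String, String])

object DistinctCounts {
  val fields: List[String] = List("blue", "yellow", "red")

  // Hypothetical stand-in for getKey; it takes the record explicitly.
  // A missing field yields "", which then counts as one distinct value.
  def getKey(info: Info, fieldName: String): String =
    info.values.getOrElse(fieldName, "")

  // Same steps as the quoted pipeline:
  //   1. emit one (fieldName, value) pair per field and record,
  //   2. keep only the distinct pairs,
  //   3. turn each distinct pair into (fieldName, 1),
  //   4. group by field name and sum the ones.
  def distinctCounts(db: Seq[Info]): Map[String, Int] =
    db.flatMap(info => fields.map(f => (f, getKey(info, f))))
      .distinct
      .map { case (f, _) => (f, 1) }
      .groupBy(_._1)
      .map { case (f, ones) => (f, ones.map(_._2).sum) }
}
```

Because every field travels through the same stream keyed by field name, all the counts come out of one pass as a single collection of (fieldName, count) pairs, which is why the Scalding version can be written to one location.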
On Sunday, December 3, 2017 at 1:29:07 AM UTC-8, Cyrille Chépélov wrote:
>
> Hi,
>
> how about
>
> DB.flatMap {
> info => fields.map(fieldName => (fieldName, getKey(fieldName)) )
> }.distinct
> .map { case (fieldName, distinctValue) => (fieldName, 1) }
> .group.sum
>
> ?
> -- Cyrille
>
> On 03/12/2017 at 07:44, [email protected] wrote:
>
>
> Consider the following snippet of Scalding code:
>
> val fields = List[String]("blue", "yellow", "red")
>
> def getDistinctCount(DB: TypedPipe[Info])(implicit flowDef: cascading.flow.FlowDef, mode: com.twitter.scalding.Mode) = {
>
>   // Returns a set of values (String) for the input
>   val jsonValue: TypedPipe[String] = DB.map { info => getKey("blue") }
>
>   // Returns the distinct count
>   val distinctCount: ValuePipe[Int] = jsonValue.distinct.map { x => 1 }.sum
>
>   // Writes the value to an HDFS location
>   distinctCount.write(TypedTsv(("HDFS Location")))
>
> }
>
> I have to calculate the distinct count for every field in the list. One
> way is to iterate through the list, calculate the distinct count for each
> string, write each result to a different HDFS location, and merge them into
> a local file. In reality I have to do this for at least twenty fields, which
> makes the process really slow.
>
> Is there a more efficient way to calculate the distinct count for every field
> and write the results to a single HDFS location in one go? Thank you!!
>
> --
> You received this message because you are subscribed to the Google Groups
> "Scalding Development" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> For more options, visit https://groups.google.com/d/optout.