Hey Cyrille,

It is a perfect solution. I gave it a try and it worked perfectly fine. Thank
you!!
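
For anyone finding this thread later, here is a rough sketch of why the
suggestion works. It uses plain Scala collections instead of a TypedPipe so
it runs without a cluster. The Info case class, the two-argument getKey and
the sample rows are hypothetical stand-ins for illustration only, and
groupBy plus a sum stands in for Scalding's .group.sum.

```scala
// Plain-collections sketch of the pipeline; no Scalding dependency.
// `Info`, `getKey` and the sample data are hypothetical, for illustration.
object DistinctCountSketch {
  final case class Info(values: Map[String, String])

  // Hypothetical helper: extracts the value stored under `fieldName`.
  def getKey(info: Info, fieldName: String): String =
    info.values.getOrElse(fieldName, "")

  val fields: List[String] = List("blue", "yellow", "red")

  def distinctCounts(db: Seq[Info]): Map[String, Int] =
    db.flatMap(info => fields.map(f => (f, getKey(info, f)))) // (field, value) pairs
      .distinct                                               // keep each pair once
      .map { case (f, _) => (f, 1) }                          // one per distinct value
      .groupBy(_._1)                                          // stands in for .group
      .map { case (f, ones) => (f, ones.map(_._2).sum) }      // stands in for .sum

  def main(args: Array[String]): Unit = {
    val db = Seq(
      Info(Map("blue" -> "a", "yellow" -> "x", "red" -> "p")),
      Info(Map("blue" -> "b", "yellow" -> "x", "red" -> "p")),
      Info(Map("blue" -> "a", "yellow" -> "y", "red" -> "p"))
    )
    // Print one "field<TAB>count" line per field, sorted by field name.
    distinctCounts(db).toSeq.sortBy(_._1).foreach { case (f, n) => println(s"$f\t$n") }
  }
}
```

Because every value is tagged with its field name before the distinct step,
all fields go through the data in one pass and the result is a single set of
(field, count) rows that can be written to one location.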

Charan

On Sunday, December 3, 2017 at 1:29:07 AM UTC-8, Cyrille Chépélov wrote:
>
> Hi,
>
> How about
>
>     DB.flatMap { 
>       info => fields.map(fieldName => (fieldName, getKey(fieldName)) )
>     }.distinct
>     .map { case (fieldName, distinctValue) => (fieldName, 1) }
>     .group.sum
>
> ?        
>     -- Cyrille
>
> On 03/12/2017 at 07:44, [email protected] wrote:
>
>      
> Consider the following snippet of Scalding code:
>
>     val fields = List[String]("blue", "yellow", "red")
>
>     def getDistinctCount(DB: TypedPipe[Info])(
>         implicit flowDef: cascading.flow.FlowDef,
>         mode: com.twitter.scalding.Mode) = {
>
>       // Returns a set of values (String) for the input
>       val jsonValue: TypedPipe[String] = DB.map { info => getKey("blue") }
>
>       // Returns the distinct count
>       val distinctCount: ValuePipe[Int] = jsonValue.distinct.map { x => 1 }.sum
>
>       // Writes the value to an HDFS location
>       distinctCount.write(TypedTsv("HDFS Location"))
>     }
>
> I have to calculate the distinct count for every field in the list. One
> way is to iterate through the list, calculate the distinct count for each
> string, write each result to a different HDFS location, and merge the
> results into a local file. In reality I have to do this for at least
> twenty fields, which makes the process really slow.
>
> Is there a better way to calculate the distinct count for every field
> and write the results to a single HDFS location in one go? Thank you!
>
> -- 
> You received this message because you are subscribed to the Google Groups 
> "Scalding Development" group.
> To unsubscribe from this group and stop receiving emails from it, send an 
> email to [email protected].
> For more options, visit https://groups.google.com/d/optout.
>
>
>
