Would you mind explaining why you opted for flatMap instead of map in
"DB.flatMap"? Thank you!!

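For reference, here is a minimal sketch of the distinction being asked about. It uses plain Scala collections as a stand-in for a Scalding TypedPipe. The Info case class, the two-argument getKey, and the sample rows are all hypothetical, since the thread shows none of them. With map, each record yields one element that is itself a list of (field, value) pairs. With flatMap, those per-record lists are flattened into a single collection of pairs, which is the shape that distinct and a per-field count need.

```scala
// Hypothetical stand-in: the thread does not show the real Info type.
final case class Info(values: Map[String, String])

object FlatMapVsMap {
  val fields = List("blue", "yellow", "red")

  // Assumed signature: look up one field's value in a record.
  def getKey(info: Info, fieldName: String): String =
    info.values.getOrElse(fieldName, "")

  // Made-up sample data, three records.
  val db = List(
    Info(Map("blue" -> "a", "yellow" -> "x", "red" -> "p")),
    Info(Map("blue" -> "b", "yellow" -> "x", "red" -> "p")),
    Info(Map("blue" -> "a", "yellow" -> "y", "red" -> "p"))
  )

  // map: one output element per record, each a whole List of pairs.
  val nested: List[List[(String, String)]] =
    db.map(info => fields.map(f => (f, getKey(info, f))))

  // flatMap: the per-record lists are concatenated into one flat list of pairs.
  val flat: List[(String, String)] =
    db.flatMap(info => fields.map(f => (f, getKey(info, f))))

  // Same shape as the suggested pipeline: distinct pairs, then a count per field.
  val distinctCounts: Map[String, Int] =
    flat.distinct.groupBy(_._1).map { case (f, pairs) => (f, pairs.size) }
}
```

On a TypedPipe the same reasoning should apply: map would produce a TypedPipe[List[(String, String)]], so distinct would compare whole lists, whereas flatMap produces a TypedPipe[(String, String)] that can be deduplicated and grouped by field name.
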
On Monday, December 4, 2017 at 10:43:31 AM UTC-8, [email protected] 
wrote:
>
>
> Hey Cyrille,
>
> It is a perfect solution. I gave it a try and it worked perfectly fine. Thank
> you!!
>
> Charan
>
> On Sunday, December 3, 2017 at 1:29:07 AM UTC-8, Cyrille Chépélov wrote:
>>
>> Hi,
>>
>> how about 
>>
>>     DB.flatMap { info =>
>>       fields.map(fieldName => (fieldName, getKey(fieldName)))
>>     }.distinct
>>       .map { case (fieldName, distinctValue) => (fieldName, 1) }
>>       .group.sum
>>
>> ?        
>>     -- Cyrille
>>
>> On 03/12/2017 at 07:44, [email protected] wrote:
>>
>>      
>> Consider the following snippet of Scalding code:
>>
>>     val fields = List[String]("blue", "yellow", "red")
>>
>>     def getDistinctCount(DB: TypedPipe[Info])(implicit flowDef: cascading.flow.FlowDef, mode: com.twitter.scalding.Mode) = {
>>
>>       // Returns a set of values (String) for the input
>>       val jsonValue: TypedPipe[String] = DB.map { info => getKey("blue") }
>>
>>       // Returns the distinct count
>>       val distinctCount: ValuePipe[Int] = jsonValue.distinct.map { x => 1 }.sum
>>
>>       // Writes the value to an HDFS location
>>       distinctCount.write(TypedTsv("HDFS Location"))
>>     }
>>
>> I have to calculate the distinct count for every field in the list. One
>> way is to iterate through the list, calculate the distinct count for each
>> string, write each result to a different HDFS location, and merge them into
>> a local file. In reality I have to do this for at least twenty fields, which
>> makes the process really slow.
>>
>> Is there a more efficient way to calculate the distinct count for every field
>> and write the results to a single HDFS location in one go? Thank you!!
>>
>> -- 
>> You received this message because you are subscribed to the Google Groups 
>> "Scalding Development" group.
>> To unsubscribe from this group and stop receiving emails from it, send an 
>> email to [email protected].
>> For more options, visit https://groups.google.com/d/optout.
>>
>>
>>

