Would you mind explaining why you opted for flatMap instead of map in "DB.flatMap"? Thank you!!
On Monday, December 4, 2017 at 10:43:31 AM UTC-8, [email protected] wrote:
>
> Hey Cyrille,
>
> It is a perfect solution. I gave it a try and it worked perfectly fine.
> Thank you!!
>
> Charan
>
> On Sunday, December 3, 2017 at 1:29:07 AM UTC-8, Cyrille Chépélov wrote:
>>
>> Hi,
>>
>> how about
>>
>> DB.flatMap {
>>   info => fields.map(fieldName => (fieldName, getKey(fieldName)) )
>> }.distinct
>>  .map { case (fieldName, distinctValue) => (fieldName, 1) }
>>  .group.sum
>>
>> ?
>> -- Cyrille
>>
>> On 03/12/2017 at 07:44, [email protected] wrote:
>>
>> Consider the following snippet of Scalding code:
>>
>> val fields = List[String]("blue", "yellow", "red")
>>
>> def getDistinctCount(DB: TypedPipe[Info])(implicit flowDef: cascading.flow.FlowDef, mode: com.twitter.scalding.Mode) = {
>>
>>   val jsonValue: TypedPipe[String] = DB.map { info => getKey("blue") } // Returns a set of values (String) for the input
>>
>>   val distinctCount: ValuePipe[Int] = jsonValue.distinct.map { x => 1 }.sum // Returns the distinct count
>>
>>   distinctCount.write(TypedTsv(("HDFS Location"))) // Writes the value to an HDFS location
>>
>> }
>>
>> I have to calculate the distinct count for every field in the list. One
>> way is to iterate through the list, calculate the distinct count for each
>> string, write each result to a different HDFS location, and merge them
>> into a local file. In reality I have to do this for at least twenty
>> fields, which makes the process really slow.
>>
>> Is there a more efficient way to calculate the distinct count for every
>> field and write it to a single HDFS location in one go? Thank you!!
>>
>> --
>> You received this message because you are subscribed to the Google Groups
>> "Scalding Development" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to [email protected].
>> For more options, visit https://groups.google.com/d/optout.
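On the flatMap question at the top of the thread: the function passed in returns a whole list of (fieldName, value) pairs for each record. With map, that would give one list per record, a pipe of lists. With flatMap, the lists are flattened so that each pair becomes its own element, which is what the later distinct and per-field grouping need. The sketch below shows the difference with plain Scala collections, which have the same map/flatMap semantics as a TypedPipe. Note that Info, the two-argument getKey, and the sample data are hypothetical stand-ins, not the code from the thread.

```scala
// Hypothetical stand-in for the thread's Info record (assumption).
final case class Info(values: Map[String, String])

object FlatMapVsMap {
  val fields = List("blue", "yellow", "red")

  // Assumption: getKey extracts one field's value from one record.
  def getKey(info: Info, fieldName: String): String =
    info.values.getOrElse(fieldName, "")

  // Made-up sample data standing in for the DB pipe.
  val db = List(
    Info(Map("blue" -> "a", "yellow" -> "x", "red" -> "p")),
    Info(Map("blue" -> "a", "yellow" -> "y", "red" -> "p")),
    Info(Map("blue" -> "b", "yellow" -> "y", "red" -> "p"))
  )

  // map: one element per record, each element is itself a list of pairs.
  val nested: List[List[(String, String)]] =
    db.map(info => fields.map(f => (f, getKey(info, f))))

  // flatMap: the per-record lists are flattened into individual pairs.
  val flat: List[(String, String)] =
    db.flatMap(info => fields.map(f => (f, getKey(info, f))))

  // Deduplicate the (field, value) pairs, then count the pairs per field.
  val distinctCounts: Map[String, Int] =
    flat.distinct.groupBy(_._1).map { case (f, pairs) => (f, pairs.size) }

  def main(args: Array[String]): Unit =
    println(distinctCounts)
}
```

Here nested has 3 elements (one list per record), while flat has 9 (one pair per record and field). Only the flattened form can be deduplicated and grouped by field name directly.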
