Hi,
I have the same problem here: I need to map some values to ids, and each id must be a unique Int. For now I can use a local zipWithIndex, but that won't last for long. The only workaround I've found is to do something like this:

    // Count the elements in each partition, ordered by partition index.
    val partitionsSizes = dataset
      .mapPartitionsWithIndex { case (index, itr) => Iterator((index, itr.size)) }
      .collect()
      .sortBy { case (i, _) => i }
      .map { case (_, v) => v }

    // Cumulative sum: entry i is the first id assigned to partition i.
    val partitionsStartIndex = partitionsSizes.scanLeft(0)(_ + _)
    val partitionsInfo = sc.broadcast(partitionsSizes.zip(partitionsStartIndex))

    // Pair each element with its global id, giving (value, id).
    dataset.mapPartitionsWithIndex { case (index, itr) =>
      val (size, start) = partitionsInfo.value(index)
      itr.zip((start until (start + size)).iterator)
    }

Probably not the best solution (it requires two maps and a collect), and I haven't tested it. Let me know if it works :)

Guillaume
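
To check the offset arithmetic without a cluster, here is a minimal sketch on plain Scala collections. It is not Spark code: each inner Seq stands in for one partition, and the names PartitionOffsets and indexPartitions are made up for the illustration.

```scala
object PartitionOffsets {
  // Each inner Seq plays the role of one partition (an assumption of this sketch).
  def indexPartitions[T](partitions: Seq[Seq[T]]): Seq[Seq[(T, Int)]] = {
    val sizes = partitions.map(_.size)
    // Cumulative sum: starts(i) is the first id handed out to partition i.
    // scanLeft yields one extra trailing element (the total), which zip drops.
    val starts = sizes.scanLeft(0)(_ + _)
    partitions.zip(starts).map { case (part, start) =>
      part.zipWithIndex.map { case (value, i) => (value, start + i) }
    }
  }

  def main(args: Array[String]): Unit = {
    val partitions = Seq(Seq("a", "b"), Seq.empty[String], Seq("c", "d", "e"))
    // Ids are consecutive across partitions; the empty partition uses none.
    println(indexPartitions(partitions).flatten)
  }
}
```

The point of the sketch is that every partition only needs its own start offset, so the second pass can run independently per partition once the sizes are known.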

