Hi, I’m working on a use case using Spark Streaming. I need to process an RDD of strings so that the lines are grouped by IP and, within each group, sorted by time. Could somebody tell me the right transformation?
Input:

2014-10-23 08:18:38,904 [192.168.10.1] bbbb
2014-10-23 08:18:38,907 [192.168.10.1] ccc
2014-10-23 08:18:39,910 [192.168.102.1] hhhh
2014-10-23 08:18:38,934 [192.168.10.1] eeee
2014-10-23 08:18:39,032 [192.168.102.1] ffff
2014-10-23 08:18:38,149 [192.168.10.1] aaaa
2014-10-23 08:18:39,582 [192.168.102.1] gggg
2014-10-23 08:18:38,691 [192.168.10.1] dddd

Expected result:

Array(192.168.10.1, ArrayBuffer(
    2014-10-23 08:18:38,149 [192.168.10.1] aaaa,
    2014-10-23 08:18:38,904 [192.168.10.1] bbbb,
    2014-10-23 08:18:38,907 [192.168.10.1] ccc,
    2014-10-23 08:18:38,691 [192.168.10.1] dddd,
    2014-10-23 08:18:38,934 [192.168.10.1] eeee))
(192.168.102.1, ArrayBuffer(
    2014-10-23 08:18:39,032 [192.168.102.1] ffff,
    2014-10-23 08:18:39,582 [192.168.102.1] gggg,
    2014-10-23 08:18:39,910 [192.168.102.1] hhhh))

Thanks,
Ping
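
For illustration, here is a minimal sketch of one way the grouping and sorting could be written in Scala. It is not a definitive answer. It assumes every line starts with a fixed-width "yyyy-MM-dd HH:mm:ss,SSS" timestamp (23 characters, which sorts correctly as a plain string) and carries the IP between the first pair of square brackets. The names GroupLogLines, ipOf, timestampOf and groupByIpSortedByTime are made up for this example. Note that a strict sort by timestamp places the 08:18:38,691 line (dddd) ahead of the 08:18:38,904 line (bbbb).

```scala
import org.apache.spark.SparkContext._ // pair-RDD implicits; only required on Spark releases before 1.3
import org.apache.spark.rdd.RDD

// Hypothetical helper object; all names here are illustrative, not part of Spark.
object GroupLogLines {
  // Matches the text between the first "[" and the following "]".
  private val IpPattern = """\[([^\]]+)\]""".r

  // Returns the bracketed IP, or None if the line has no "[...]" part.
  def ipOf(line: String): Option[String] =
    IpPattern.findFirstMatchIn(line).map(_.group(1))

  // Assumption: the first 23 characters are the timestamp, and its
  // fixed-width format makes string order equal to chronological order.
  def timestampOf(line: String): String = line.take(23)

  def groupByIpSortedByTime(lines: RDD[String]): RDD[(String, Seq[String])] =
    lines
      .flatMap(line => ipOf(line).map(ip => (ip, line))) // key by IP, drop lines without one
      .groupByKey()                                      // all lines of one IP end up together
      .mapValues(_.toSeq.sortBy(timestampOf))            // sort each group by timestamp
}
```

With a DStream[String] named, say, lines, the same function could presumably be applied to each batch via lines.transform(GroupLogLines.groupByIpSortedByTime _), in which case the grouping covers one batch at a time rather than the whole stream. One design caveat: groupByKey materializes all lines of a given IP in memory on a single executor, so this sketch fits best when the per-IP groups are reasonably small.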