I have a dataset (trimmed and simplified) with two columns, as shown below.

Date          Subject
2015-01-14    "SEC Inquiry"
2014-02-12    "Happy birthday"
2014-02-13    "Re: Happy birthday"
2015-01-16    "Re: SEC Inquiry"
2015-01-18    "Fwd: Re: SEC Inquiry"

I have imported it into a Spark DataFrame. What I want is to groupBy the
Subject field, but using a partial match rather than an exact one, so that
messages are grouped by discussion topic.

For example, in the case above I would like to group all messages whose
subject contains "SEC Inquiry", which should return the following group:

2015-01-14    "SEC Inquiry"
2015-01-16    "Re: SEC Inquiry"
2015-01-18    "Fwd: Re: SEC Inquiry"
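
To make the intent concrete, here is a minimal plain-Python sketch (not Spark
code) of the grouping key I have in mind. The helper names topic_key and
group_by_topic and the prefix pattern are my own assumptions for illustration:
I am assuming a "topic" is simply the subject with any leading "Re:", "Fw:" or
"Fwd:" prefixes stripped.

```python
import re
from collections import defaultdict

# Assumption: reply/forward markers appear only as leading "Re:", "Fw:" or
# "Fwd:" prefixes (any case), possibly repeated, e.g. "Fwd: Re: ...".
PREFIX = re.compile(r"^(?:\s*(?:re|fwd?)\s*:\s*)+", re.IGNORECASE)


def topic_key(subject):
    """Return the subject with leading reply/forward prefixes removed."""
    return PREFIX.sub("", subject).strip()


def group_by_topic(rows):
    """Group (date, subject) pairs by their normalized topic."""
    groups = defaultdict(list)
    for date, subject in rows:
        groups[topic_key(subject)].append((date, subject))
    return dict(groups)


rows = [
    ("2015-01-14", "SEC Inquiry"),
    ("2014-02-12", "Happy birthday"),
    ("2014-02-13", "Re: Happy birthday"),
    ("2015-01-16", "Re: SEC Inquiry"),
    ("2015-01-18", "Fwd: Re: SEC Inquiry"),
]

grouped = group_by_topic(rows)
for date, subject in grouped["SEC Inquiry"]:
    print(date, subject)
```

In Spark, I imagine the same key could be built as a derived column (for
example with regexp_replace from pyspark.sql.functions) and then passed to
groupBy, but I have not verified that this is the recommended approach.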

A similar use case is grouping by year. In the example above, that would
mean a partial match on the Date field, i.e. a groupBy on Date where rows
are matched by year ("2014" or "2015").
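
Again as a plain-Python sketch rather than Spark code, this is the kind of
derived key I mean for the year case. The names year_key and group_by_year
are hypothetical, and I am assuming the dates are strings in YYYY-MM-DD form.

```python
from collections import defaultdict
from datetime import datetime


def year_key(date_str):
    """Extract the year from a date string (assumed format: YYYY-MM-DD)."""
    return datetime.strptime(date_str, "%Y-%m-%d").year


def group_by_year(rows):
    """Group (date, subject) pairs by the year of the date."""
    groups = defaultdict(list)
    for date, subject in rows:
        groups[year_key(date)].append((date, subject))
    return dict(groups)


rows = [
    ("2015-01-14", "SEC Inquiry"),
    ("2014-02-12", "Happy birthday"),
    ("2014-02-13", "Re: Happy birthday"),
    ("2015-01-16", "Re: SEC Inquiry"),
    ("2015-01-18", "Fwd: Re: SEC Inquiry"),
]

by_year = group_by_year(rows)
for year in sorted(by_year):
    print(year, len(by_year[year]))
```

I would expect the Spark equivalent to derive the year as a column first (for
instance with the year function in pyspark.sql.functions) and group on that,
but that is my guess rather than something I have confirmed.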

Looking forward to a reply or a solution to the above.

- Suraj
