Thanks for trying this out.

Better support for groupby (e.g. https://github.com/apache/beam/pull/13843
and https://github.com/apache/beam/pull/13637) will be available in the next
Beam release (2.29, currently in progress; you could try out HEAD in the
meantime if you want).
Note, however, that Beam PCollections are by definition unordered, so
unless you sort a partition and immediately do something with it, that
ordering may not be preserved. If you could let us know what you're trying
to do with this ordering, that would be helpful.
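
To make "sort a partition and immediately do something with it" concrete,
here is a minimal sketch using the core Beam Python API rather than the
DataFrame API. The field names "car" and "ts", the helper sort_within_key,
and run_example are hypothetical, chosen only for illustration; it also
assumes each key's rows fit in memory on a single worker.

```python
from operator import itemgetter


def sort_within_key(keyed_rows, sort_field="ts"):
    """Sort one key's rows by sort_field; rows are assumed to be dicts."""
    key, rows = keyed_rows
    # sorted() materializes the whole group, so this assumes a single
    # partition is small enough to hold in memory.
    return key, sorted(rows, key=itemgetter(sort_field))


def run_example():
    # Imported here so the helper above stays usable without Beam installed.
    import apache_beam as beam

    # Hypothetical input rows: "car" is the partition key, "ts" the sort key.
    rows = [
        {"car": "a", "ts": 3},
        {"car": "b", "ts": 1},
        {"car": "a", "ts": 1},
    ]
    with beam.Pipeline() as p:
        _ = (
            p
            | beam.Create(rows)
            | beam.Map(lambda row: (row["car"], row))  # key each row
            | beam.GroupByKey()                        # one element per key
            | beam.Map(sort_within_key)                # sort inside that element
            | beam.Map(print)                          # consume the sorted list here
        )
```

Each output element is a (key, sorted list) pair, so the order holds inside
that list. Anything that needs the order should read it from that list in
the same step or the next one, since flattening the rows back into a
PCollection would lose it. Calling run_example() runs the pipeline locally.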

- Robert


On Thu, Apr 1, 2021 at 7:31 PM Wenbing Bai <[email protected]>
wrote:

> Hi Beam users,
>
> I have a use case to partition my PCollection by some key, and then sort
> my rows within the same partition by some other key.
>
> I feel the Beam DataFrame API could be a candidate solution, but I cannot
> figure out how to make it work. Specifically, I tried df.groupby, expecting
> my data to be distributed to different nodes. I also tried df.sort_values,
> but it sorts my whole dataset, which is not what I need.
>
> Can someone shed some light on this?
>
> Wenbing Bai
>
> Senior Software Engineer
>
> Data Infrastructure, Cruise
>
> Pronouns: She/Her
