You can also throw in a SecondarySort (https://crunch.apache.org/user-guide.html#secsort) to get each group presorted.

On Thu, Apr 28, 2016 at 2:16 PM David Ortiz <[email protected]> wrote:
> I think I am confused as to what you're going for. A parallelDo over the
> PGroupedTable should do exactly what you described. You get key,
> Iterable<DataRecord> for a single key, at which point you can do whatever
> you want in the DoFn. That's exactly what I had to do on a flow at work,
> where I do a groupByKey on a PTable, then in the ensuing parallelDo, create
> a List out of the Iterable<Record> and do some aggregate functions over it.
>
> On Thu, Apr 28, 2016 at 2:59 PM Robinson, Landon - Landon <
> [email protected]> wrote:
>
>> Crunch Gurus,
>>
>> We need to process some data in order, so parallelDo shouldn’t work for
>> this approach. We’ve looked at SequentialDo, but not sure how exactly to
>> make it work… (Not much documentation on it.)
>> *DataRecord is a Java object with getters and setters.*
>>
>> Right now, we have a PGroupedTable<String, DataRecord> where the String
>> keys in the PGT are linked to multiple DataRecord objects (standard PGT
>> behavior).
>> What we need to do now is loop through all records for a particular key,
>> sort them, and do some simple calculations.
>>
>> *What is the best way/standard way to process a PGroupedTable so that
>> records corresponding to the same key are all kept together and processed?*
>>
>> Right now we know how to crack open a PGT in the local code and flip
>> through it (the SingleUseIterable), but we need to make a new dataset out
>> of it, not just play with it.
>>
>> Any direction or guidance would be appreciated!
>>
>> Landon Robinson
>> Big Data & Hadoop Engineer
>> IT Business Intelligence, Lowe’s Companies Inc.
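For reference, here is a minimal sketch of the per-key step David describes: copy the Iterable for one key into a List, sort it, then run a simple calculation over it. It is plain Java with no Crunch dependency, so it only shows the logic that would sit inside the DoFn's process method. The DataRecord fields (timestamp, amount), the GroupProcessing class, and the running-total calculation are all assumptions made for illustration; the thread does not say what the real records contain or what needs to be computed.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical stand-in for the DataRecord bean mentioned in the thread.
// The real field names are not given; timestamp and amount are assumptions.
class DataRecord {
    private final long timestamp;
    private final double amount;

    DataRecord(long timestamp, double amount) {
        this.timestamp = timestamp;
        this.amount = amount;
    }

    long getTimestamp() { return timestamp; }
    double getAmount() { return amount; }
}

// Hypothetical helper holding the logic a DoFn would apply to one key's values.
class GroupProcessing {

    // The grouped Iterable can only be traversed once, so copy it into a
    // List before sorting. (In a real job the framework may reuse value
    // objects while iterating, so each record may need to be copied first.)
    static List<DataRecord> sortedCopy(Iterable<DataRecord> values) {
        List<DataRecord> records = new ArrayList<>();
        for (DataRecord record : values) {
            records.add(record);
        }
        records.sort(Comparator.comparingLong(DataRecord::getTimestamp));
        return records;
    }

    // Example "simple calculation": a running total of amount in timestamp order.
    static List<Double> runningTotals(Iterable<DataRecord> values) {
        List<Double> totals = new ArrayList<>();
        double sum = 0.0;
        for (DataRecord record : sortedCopy(values)) {
            sum += record.getAmount();
            totals.add(sum);
        }
        return totals;
    }
}
```

In a Crunch pipeline, something like runningTotals would be called once per key from the DoFn passed to parallelDo on the PGroupedTable, with the results emitted to build the new dataset. With SecondarySort the values would already arrive sorted, so the in-memory sort could be dropped.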
