[jira] [Updated] (CRUNCH-663) Expose Record-level File Path to Processing Functions
[ https://issues.apache.org/jira/browse/CRUNCH-663?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ben Roling updated CRUNCH-663:
------------------------------
    Attachment: CRUNCH-663-v2.patch

> Expose Record-level File Path to Processing Functions
> -----------------------------------------------------
>
>                 Key: CRUNCH-663
>                 URL: https://issues.apache.org/jira/browse/CRUNCH-663
>             Project: Crunch
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Ben Roling
>            Assignee: Josh Wills
>            Priority: Major
>         Attachments: CRUNCH-663-v2.patch, CRUNCH-663.patch
>
>
> We have some processing pipelines where we want to know the file path that
> each record being processed came from. It would be nice if this could be
> exposed to the DoFns in our pipelines.
>
> This same desire was expressed a little over 1 year ago on the mailing list:
> [http://mail-archives.apache.org/mod_mbox/crunch-user/201611.mbox/%3CCAG-tO+Y42KRFiocg1RJT4qFcyvkPjFSfZa4z=wk34arip4w...@mail.gmail.com%3E]
>
> Unfortunately, that thread dead-ended.
>
> I will use the comments section and a patch to propose a simple, albeit
> slightly hacky solution. Another alternative would be to create a new Source
> that provides a PCollection>, but I'm not sure of the
> effort it would take to create that.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
[jira] [Commented] (CRUNCH-663) Expose Record-level File Path to Processing Functions
[ https://issues.apache.org/jira/browse/CRUNCH-663?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16347598#comment-16347598 ]

Ben Roling commented on CRUNCH-663:
-----------------------------------

Added a new patch where I modified CombineFileIT to test this new property.