[jira] [Updated] (CRUNCH-663) Expose Record-level File Path to Processing Functions

2018-01-31 Thread Ben Roling (JIRA)

 [ 
https://issues.apache.org/jira/browse/CRUNCH-663?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ben Roling updated CRUNCH-663:
--
Attachment: CRUNCH-663-v2.patch

> Expose Record-level File Path to Processing Functions
> -
>
> Key: CRUNCH-663
> URL: https://issues.apache.org/jira/browse/CRUNCH-663
> Project: Crunch
> Issue Type: Improvement
> Components: Core
> Reporter: Ben Roling
> Assignee: Josh Wills
> Priority: Major
> Attachments: CRUNCH-663-v2.patch, CRUNCH-663.patch
>
>
> We have some processing pipelines where we want to know the file path that 
> each record being processed came from.  It would be nice if this could be 
> exposed to the DoFns in our pipelines.
>  
> This same desire was expressed a little over 1 year ago on the mailing list:
> [http://mail-archives.apache.org/mod_mbox/crunch-user/201611.mbox/%3CCAG-tO+Y42KRFiocg1RJT4qFcyvkPjFSfZa4z=wk34arip4w...@mail.gmail.com%3E]
>  
> Unfortunately, that thread dead-ended.
>  
> I will use the comments section and a patch to propose a simple, albeit 
> slightly hacky solution.  Another alternative would be to create a new Source 
> that provides a PCollection>, but I'm not sure of the 
> effort it would take to create that.
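
For readers skimming the thread, the property-based idea can be sketched roughly as follows. This is a minimal illustration in plain Java, not the code in the attached patch and not the Crunch API: the property name `example.current.input.file`, the `ProcessFn` interface, and the `run` helper are all hypothetical stand-ins. The reader side publishes the current file path into a shared configuration before handing over each file's records, and the processing function looks it up.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class FilePathSketch {
    // Hypothetical property name; the real name is whatever the patch defines.
    static final String CURRENT_FILE_KEY = "example.current.input.file";

    /** Stand-in for a processing function that can read job configuration. */
    interface ProcessFn {
        void process(String record, Map<String, String> conf, List<String> out);
    }

    /**
     * Simulates a reader: before emitting the records of each "file",
     * it stores that file's path in the configuration under CURRENT_FILE_KEY.
     */
    static List<String> run(Map<String, List<String>> files, ProcessFn fn) {
        Map<String, String> conf = new HashMap<>();
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, List<String>> file : files.entrySet()) {
            conf.put(CURRENT_FILE_KEY, file.getKey());
            for (String record : file.getValue()) {
                fn.process(record, conf, out);
            }
        }
        return out;
    }

    /** Tags every record with the path it was read from. */
    static List<String> demo() {
        Map<String, List<String>> files = new LinkedHashMap<>();
        files.put("/data/a.txt", Arrays.asList("r1", "r2"));
        files.put("/data/b.txt", Arrays.asList("r3"));
        return run(files, (record, conf, out) ->
                out.add(conf.get(CURRENT_FILE_KEY) + ":" + record));
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

The trade-off this shows is why the approach is called slightly hacky above: the path travels through side-channel state rather than through the record type, so the function only sees the right value while the reader keeps that state current.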



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (CRUNCH-663) Expose Record-level File Path to Processing Functions

2018-01-31 Thread Ben Roling (JIRA)

[ 
https://issues.apache.org/jira/browse/CRUNCH-663?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16347598#comment-16347598
 ] 

Ben Roling commented on CRUNCH-663:
---

Added a new patch where I modified CombineFileIT to test this new property.



