Ok, thanks for the info Bryan. I'll log a jira ticket for making the change that I mentioned in my previous mail.
- Gabriel On Thu, Jul 2, 2015 at 4:13 PM Bryan Baugher <[email protected]> wrote: > The code throwing the exception was our code. We can get around it for > now. Just thought it was strange to see such a large config value. > > On Thu, Jul 2, 2015 at 12:26 AM Gabriel Reid <[email protected]> > wrote: > >> From what I remember, the original intent of storing the dotfile in the >> Configuration object was: >> * to have a simple way of making it available to clients without putting >> anything dotfile-specific in an API (because dotfiles are only available >> for MR pipelines) >> * making it available as soon as MRPipeline.plan is called (which rules >> out putting it in the PipelineResult object) >> * also making it easy to get at after a full pipeline had been run >> >> Looking at how things are now since CRUNCH-418 and CRUNCH-438, I think >> that most of the above points are no longer valid. The dotfile can be >> retrieved via the MRExecutor that is returned from MRPipeline.plan, and can >> be automatically written to an output directory. The only reason to keep it >> around in the Configuration object is for backwards compatibility. >> >> What I would propose is that we deprecate >> PlanningParameters#PIPELINE_PLAN_DOTFILE, and remove it in an upcoming >> release. That means we probably still need to work around the issue that >> Brian is encountering though. >> >> @Brian, was the code throwing the exception your own code, or is there a >> hard limit in the Configuration class somewhere? My initial thought is that >> we could skip adding the dotfile to the Configuration that is serialized, >> and only add it when we return the Configuration from >> MRPipeline.getConfiguration. >> >> - Gabriel >> >> >> >> On Thu, Jul 2, 2015 at 12:05 AM Christian Tzolov < >> [email protected]> wrote: >> >>> Hi Bryan, Josh, >>> >>> IIRC this comes from the original dotfile jobplan implementation. I kept >>> it for backward compatible. You can see that only the "jobplan" (e.g. the >>> original/main plan) is stored in the Configuration. >>> >>> +Gabriel i am not sure I remember the original intent to have the >>> jobplan stored in the Configuration? >>> >>> >>> >>> >>> >>> On Wed, Jul 1, 2015 at 11:02 PM, Josh Wills <[email protected]> >>> wrote: >>> >>>> +Christian >>>> >>>> I'm not sure what the intent was there-- Christian? >>>> >>>> J >>>> >>>> On Wed, Jul 1, 2015 at 12:29 PM, Bryan Baugher <[email protected]> >>>> wrote: >>>> >>>>> We recently ran into an issue where our code to serialize a pipeline's >>>>> configuration was throwing an exception because one of the key/values in >>>>> the config was too big (65k characters). We found this key/value was >>>>> 'crunch.planner.dotfile' which is included in the pipelines config from >>>>> Crunch. >>>>> >>>>> My question is why does Crunch provide this value into the config >>>>> object? >>>>> >>>>> Crunch saves the dotfile string in the MRExecutor context[1] and I >>>>> don't think any pipeline would need this at runtime. It also seems like >>>>> there are no references to this config value anywhere within Crunch other >>>>> then to write the value into the config object. >>>>> >>>>> [1] - >>>>> https://github.com/apache/crunch/blob/d176778cf803374506cb7743069a05e28e07e2cf/crunch-core/src/main/java/org/apache/crunch/impl/mr/plan/DotfileUtills.java#L139-L140 >>>>> >>>>> >>>> >>>
