The code throwing the exception was our code. We can get around it for now. Just thought it was strange to see such a large config value.
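The thread doesn't say how the serializer worked. The sketch below is a minimal illustration that *assumes* (a guess) it wrote each key and value with Java's `DataOutputStream.writeUTF`. That method rejects any string whose modified UTF-8 encoding exceeds 65,535 bytes, which roughly matches the "65k characters" reported in the original message further down. `ConfigSerializer` and `serialize` are hypothetical names, and a plain `Map` stands in for a Hadoop `Configuration`. The workaround skips `crunch.planner.dotfile`, which the thread notes isn't needed at runtime.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UTFDataFormatException;
import java.io.UncheckedIOException;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical serializer; a Map<String, String> stands in for a Hadoop Configuration.
final class ConfigSerializer {
    // Largest encoded length DataOutputStream.writeUTF accepts (16-bit length prefix).
    static final int MAX_UTF_BYTES = 65535;

    // Key reported in the thread as holding the oversized dotfile value.
    static final String DOTFILE_KEY = "crunch.planner.dotfile";

    // Writes every key/value pair with writeUTF, skipping the dotfile entry.
    static byte[] serialize(Map<String, String> conf) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            for (Map.Entry<String, String> e : conf.entrySet()) {
                if (DOTFILE_KEY.equals(e.getKey())) {
                    continue; // plan graph is not needed at runtime, so leave it out
                }
                out.writeUTF(e.getKey());
                out.writeUTF(e.getValue());
            }
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        Map<String, String> conf = new LinkedHashMap<>();
        conf.put("mapreduce.job.name", "example");
        // Stand-in for a large plan: one byte over the writeUTF limit.
        conf.put(DOTFILE_KEY, "x".repeat(MAX_UTF_BYTES + 1));

        // Writing the oversized value directly fails...
        try (DataOutputStream out = new DataOutputStream(new ByteArrayOutputStream())) {
            out.writeUTF(conf.get(DOTFILE_KEY));
            System.out.println("unexpected: write succeeded");
        } catch (UTFDataFormatException expected) {
            System.out.println("writeUTF rejected the dotfile value");
        }

        // ...while the filtered serialization succeeds.
        System.out.println("serialized bytes: " + serialize(conf).length); // prints "serialized bytes: 29"
    }
}
```

Filtering one known key is the narrowest fix. Any other value over the limit would still fail, so a more defensive serializer might check encoded lengths or use a length-prefixed format without the 16-bit cap.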
On Thu, Jul 2, 2015 at 12:26 AM Gabriel Reid wrote:
> From what I remember, the original intent of storing the dotfile in the Configuration object was:
> * to have a simple way of making it available to clients without putting anything dotfile-specific in an API (because dotfiles are only available for MR pipelines)
> * to make it available as soon as MRPipeline.plan is called (which rules out putting it in the PipelineResult object)
> * to make it easy to get at after a full pipeline had been run
>
> Looking at how things are now since CRUNCH-418 and CRUNCH-438, I think that most of the above points are no longer valid. The dotfile can be retrieved via the MRExecutor that is returned from MRPipeline.plan, and can be automatically written to an output directory. The only reason to keep it around in the Configuration object is backwards compatibility.
>
> What I would propose is that we deprecate PlanningParameters#PIPELINE_PLAN_DOTFILE and remove it in an upcoming release. That means we probably still need to work around the issue that Bryan is encountering, though.
>
> @Bryan, was the code throwing the exception your own code, or is there a hard limit in the Configuration class somewhere? My initial thought is that we could skip adding the dotfile to the Configuration that is serialized, and only add it when we return the Configuration from MRPipeline.getConfiguration.
>
> - Gabriel
>
> On Thu, Jul 2, 2015 at 12:05 AM Christian Tzolov wrote:
>
>> Hi Bryan, Josh,
>>
>> IIRC this comes from the original dotfile jobplan implementation. I kept it for backward compatibility. You can see that only the "jobplan" (i.e. the original/main plan) is stored in the Configuration.
>>
>> +Gabriel, I'm not sure I remember the original intent of storing the jobplan in the Configuration. Do you?
>>
>> On Wed, Jul 1, 2015 at 11:02 PM, Josh Wills wrote:
>>
>>> +Christian
>>>
>>> I'm not sure what the intent was there-- Christian?
>>>
>>> J
>>>
>>> On Wed, Jul 1, 2015 at 12:29 PM, Bryan Baugher wrote:
>>>
>>>> We recently ran into an issue where our code to serialize a pipeline's configuration was throwing an exception because one of the key/values in the config was too big (65k characters). We found this key/value was 'crunch.planner.dotfile', which Crunch includes in the pipeline's config.
>>>>
>>>> My question is: why does Crunch put this value into the config object?
>>>>
>>>> Crunch saves the dotfile string in the MRExecutor context[1] and I don't think any pipeline would need this at runtime. It also seems like there are no references to this config value anywhere within Crunch other than to write the value into the config object.
>>>>
>>>> [1] - https://github.com/apache/crunch/blob/d176778cf803374506cb7743069a05e28e07e2cf/crunch-core/src/main/java/org/apache/crunch/impl/mr/plan/DotfileUtills.java#L139-L140
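Gabriel's suggestion is to keep the dotfile out of the Configuration that gets serialized and add it only when MRPipeline.getConfiguration is called. The sketch below shows roughly how that could look. It is not Crunch's code: `PipelineConfigHolder`, `recordPlan`, and `configForSerialization` are hypothetical names, and a `Map<String, String>` stands in for Hadoop's `Configuration`.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a pipeline; a plain Map replaces Hadoop's Configuration.
final class PipelineConfigHolder {
    static final String DOTFILE_KEY = "crunch.planner.dotfile";

    private final Map<String, String> conf = new HashMap<>();
    private String planDotfile; // held separately, never written into conf

    void set(String key, String value) {
        conf.put(key, value);
    }

    // Called at planning time: remember the dotfile without touching conf.
    void recordPlan(String dotfile) {
        this.planDotfile = dotfile;
    }

    // What would be shipped to tasks: never contains the dotfile.
    Map<String, String> configForSerialization() {
        return new HashMap<>(conf);
    }

    // What client code sees: a copy with the dotfile added for backwards compatibility.
    Map<String, String> getConfiguration() {
        Map<String, String> copy = new HashMap<>(conf);
        if (planDotfile != null) {
            copy.put(DOTFILE_KEY, planDotfile);
        }
        return copy;
    }

    public static void main(String[] args) {
        PipelineConfigHolder pipeline = new PipelineConfigHolder();
        pipeline.set("mapreduce.job.name", "example");
        pipeline.recordPlan("digraph G { a -> b; }");

        System.out.println(pipeline.configForSerialization().containsKey(DOTFILE_KEY)); // false
        System.out.println(pipeline.getConfiguration().containsKey(DOTFILE_KEY));       // true
    }
}
```

Returning a copy from `getConfiguration` is a choice made here so callers cannot push the dotfile back into the config that gets serialized. Whether the real method could return a copy rather than the live object is a separate question the thread doesn't settle.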
