[ https://issues.apache.org/jira/browse/PIG-10?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Alan Gates resolved PIG-10.
---------------------------
Resolution: Invalid
There is no requirement in Pig that every tuple in a relation share the same
schema, so storing the schema once up front in intermediate results will not
always be an option. Even where the schema is known, tuples can contain complex
data types with no guaranteed schema (such as maps), whose contents would still
need markers in the encoded data. We could optimize for the case where all
tuples share one schema and contain only atomic data, but it's not clear how we
would know that to be the case.
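To make the trade-off above concrete, here is a minimal Java sketch. It assumes a toy format with hypothetical one-byte type markers (INT_MARKER, STRING_MARKER, MAP_MARKER) and a hypothetical class IntermediateEncodingSketch; it is not Pig's actual intermediate-file code. When every row has the same atomic column types, it writes the schema once in a header and then writes bare values. Otherwise (rows with differing schemas, or a map whose values have no declared type) it falls back to a marker before every value.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Arrays;
import java.util.List;
import java.util.Map;

public class IntermediateEncodingSketch {
    // Hypothetical one-byte type markers (not Pig's real constants).
    static final byte INT_MARKER = 1;
    static final byte STRING_MARKER = 2;
    static final byte MAP_MARKER = 3;

    // Writes one value, optionally preceded by its type marker.
    static void writeValue(DataOutputStream out, Object v, boolean withMarker) throws IOException {
        if (v instanceof Integer) {
            if (withMarker) out.writeByte(INT_MARKER);
            out.writeInt((Integer) v);
        } else if (v instanceof String) {
            if (withMarker) out.writeByte(STRING_MARKER);
            out.writeUTF((String) v);
        } else if (v instanceof Map) {
            // Map values have no declared type, so each one always carries a marker.
            Map<?, ?> m = (Map<?, ?>) v;
            if (withMarker) out.writeByte(MAP_MARKER);
            out.writeInt(m.size());
            for (Map.Entry<?, ?> e : m.entrySet()) {
                out.writeUTF(String.valueOf(e.getKey()));
                writeValue(out, e.getValue(), true);
            }
        } else {
            throw new IllegalArgumentException("unsupported value: " + v);
        }
    }

    // Marker for an atomic value, or -1 for a complex or unsupported one.
    static byte atomicMarker(Object v) {
        if (v instanceof Integer) return INT_MARKER;
        if (v instanceof String) return STRING_MARKER;
        return -1;
    }

    // Shared column types if every row is flat and identical in shape, else null.
    // Deciding this requires looking at every row before writing any of them.
    static byte[] sharedAtomicSchema(List<List<Object>> rows) {
        byte[] schema = null;
        for (List<Object> row : rows) {
            byte[] rowSchema = new byte[row.size()];
            for (int i = 0; i < row.size(); i++) {
                rowSchema[i] = atomicMarker(row.get(i));
                if (rowSchema[i] < 0) return null;                 // complex value: fall back
            }
            if (schema == null) schema = rowSchema;
            else if (!Arrays.equals(schema, rowSchema)) return null; // rows differ: fall back
        }
        return schema;
    }

    // Encodes rows with a one-time schema header when allowed and possible,
    // otherwise with a type marker before every value in every row.
    static byte[] encode(List<List<Object>> rows, boolean allowHeader) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buf)) {
            byte[] schema = allowHeader ? sharedAtomicSchema(rows) : null;
            out.writeBoolean(schema != null);
            if (schema != null) {               // schema written once, up front
                out.writeInt(schema.length);
                out.write(schema);
            }
            out.writeInt(rows.size());
            for (List<Object> row : rows) {
                if (schema == null) out.writeInt(row.size()); // row widths may differ
                for (Object v : row) writeValue(out, v, schema == null);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);  // not expected for an in-memory stream
        }
        return buf.toByteArray();
    }

    public static void main(String[] args) {
        List<List<Object>> flat = List.of(List.<Object>of(1, "a"), List.<Object>of(2, "b"));
        System.out.println(encode(flat, true).length);   // schema header once
        System.out.println(encode(flat, false).length);  // marker before every value
    }
}
```

For the two-row relation in main this prints 25 and then 31 bytes. Note that sharedAtomicSchema can only make its decision because the sketch holds the whole relation in memory; a writer streaming intermediate results row by row would need buffering or a second pass to know up front that every tuple is flat and identically typed.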
> reduce encoding of intermediate results
> ---------------------------------------
>
> Key: PIG-10
> URL: https://issues.apache.org/jira/browse/PIG-10
> Project: Pig
> Issue Type: Improvement
> Components: impl
> Reporter: Olga Natkovich
>
> Currently, in intermediate results, the data is written with a marker for
> every column in every row. For instance, if we are writing a row whose
> schema is (bag, atom), we'll write:
> BAGMARKER BAGDATA ATOMMARKER ATOMDATA
> There's no reason to write the markers for every row. It should be
> sufficient to write them once at the beginning of the file and then
> remember them for subsequent rows.