[jira] Commented: (PIG-1115) [zebra] temp files are not cleaned.
[ https://issues.apache.org/jira/browse/PIG-1115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12834373#action_12834373 ] Hong Tang commented on PIG-1115: Why not request that the patch be backported to Hadoop 0.21 (btw, do you mean Hadoop 0.21 or 0.20)? [zebra] temp files are not cleaned. --- Key: PIG-1115 URL: https://issues.apache.org/jira/browse/PIG-1115 Project: Pig Issue Type: Bug Affects Versions: 0.7.0 Reporter: Hong Tang Assignee: Gaurav Jain Attachments: PIG-1115.patch Temp files created by zebra during table creation are not cleaned up when there is a task failure, which results in wasted disk space. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (PIG-1115) [zebra] temp files are not cleaned.
[zebra] temp files are not cleaned. --- Key: PIG-1115 URL: https://issues.apache.org/jira/browse/PIG-1115 Project: Pig Issue Type: Bug Reporter: Hong Tang Temp files created by zebra during table creation are not cleaned up when there is a task failure, which results in wasted disk space. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
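The general shape of the fix being asked for above is to make the failure path remove the partially written temp output. A minimal sketch (not the actual PIG-1115 patch; `TempCleanupSketch`, `writeOrCleanUp`, and the byte[] payload are illustrative stand-ins for zebra's table-creation code path):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

class TempCleanupSketch {
    // Write the temp file, and if the write fails partway through,
    // delete whatever was written so failed tasks don't leak disk space.
    static void writeOrCleanUp(Path tmp, byte[] data) throws IOException {
        boolean ok = false;
        try {
            Files.write(tmp, data);
            ok = true;
        } finally {
            if (!ok) {
                Files.deleteIfExists(tmp); // cleanup on task failure
            }
        }
    }
}
```

In a real MapReduce setting this cleanup would typically live in the task-abort path rather than around each write.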
[jira] Commented: (PIG-992) [zebra] Separate Schema-related files into a Schema package
[ https://issues.apache.org/jira/browse/PIG-992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12763439#action_12763439 ] Hong Tang commented on PIG-992: --- Comments: - In many places, both types.ParseException and schema.ParseException are thrown. Do you really want both? - In the following {noformat} +public enum ColumnType implements Writable { {noformat} Is the Writable interface actually used? You have a rather odd pattern of asymmetric readFields and write: {noformat} + @Override + public void readFields(DataInput in) throws IOException { +// no op, instantiated by the caller + } + + @Override + public void write(DataOutput out) throws IOException { +Utils.writeString(out, name); + } {noformat} - In the following code {noformat} + public static class ColumnSchema { +public String name; +public ColumnType type; +public Schema schema; +public int index; // field index in schema {noformat} Exposing fields as all-public seems like a bad idea. - Is there a specific use case that requires the schema to be mutable at any time? (minor nit: the comment says add a field, but the code seems to add a column to the schema). {noformat} + /** + * add a field + */ + public void add(ColumnSchema f) throws ParseException + { +add(f, false); + } {noformat} - Why is Schema.equals(Object) not implemented on top of the static version of the method (or vice versa)? - In Schema.readFields(), the Version string from the input is not checked for compatibility. - In the following {noformat} + private void init(String[] columnNames) throws ParseException { +// the arg must be of type or they will be treated as the default type +// TODO: verify column names don't contain COLUMN_DELIMITER {noformat} It seems that the TODO should not involve too much work; please consider not deferring it. 
- Need more detailed documentation on the spec of the parameter for Schema.getColumnSchema(String name) {noformat} + /** + * Get a column's schema + */ + public ColumnSchema getColumnSchema(String name) throws ParseException + { {noformat} - Schema.getColumnSchemaOnParsedName and Schema.getColumnSchema seem to be copy/paste code. - Schema.getColumnSchema(ParsedName pn) has the side effect of modifying the parameter pn. The javadoc reads cryptic to me. - There are many classes generated by JavaCC. It is probably better not to include them in the patch (and to put the generated source under build/src). Other minor issues: - Typically contrib projects should use the same version string as the parent project. - Style: there are some very long lines. - There are a few whitespace changes. That should be avoided if possible. - In the following {noformat} +} catch (org.apache.hadoop.zebra.schema.ParseException e) { + throw new AssertionError("Invalid Projection: " + e.getMessage()); {noformat} consider changing AssertionError to IllegalArgumentException. - In the following: {noformat} + /* + * helper class to parse a column name string one section at a time and find the required + * type for the parsed part. + */ + public static class ParsedName { +public String mName; +int mKeyOffset; // the offset where the keys string starts +public ColumnType mDT = ColumnType.ANY; // parent's type {noformat} The description seems to indicate that this should not be a public class. I tried to understand the body of the class and do not feel that it serves a general purpose. - The following seems like a useless assignment: {noformat} + private long mVersion = schemaVersion; {noformat} - {noformat} /** + * Normalize the schema string. + * + * @param value + * the input string representation of the schema. + * @return the normalized string representation. 
+ */ + public static String normalize(String value) { +String result = new String(); + +if (value == null || value.trim().isEmpty()) + return result; + +StringBuilder sb = new StringBuilder(); +String[] parts = value.trim().split(COLUMN_DELIMITER); +for (int nx = 0; nx < parts.length; nx++) { + if (nx > 0) sb.append(COLUMN_DELIMITER); + sb.append(parts[nx].trim()); +} +return sb.toString(); + } {noformat} There is a wasted value.trim(). - In Schema.equals(Object), instead of comparing class equality, using instanceof is typically better. - Use StringBuilder instead in the following code: {noformat} +String merged = new String(); +for (int i = 0; i < columnNames.length; i++) { + if (i > 0) merged += ","; + merged += columnNames[i]; +} {noformat} - There are a few indentation problems. [zebra] Separate Schema-related files into a Schema package - Key: PIG-992 URL: https://issues.apache.org/jira/browse/PIG-992 Project: Pig Issue Type: Improvement
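The StringBuilder rewrite the review asks for would look something like the sketch below (class and method names are illustrative; the point is one buffer instead of a fresh String per `+=`):

```java
class JoinSketch {
    // StringBuilder version of the string-concatenation loop quoted above.
    static String join(String[] columnNames) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < columnNames.length; i++) {
            if (i > 0) sb.append(',');   // delimiter between names, not before the first
            sb.append(columnNames[i]);
        }
        return sb.toString();
    }
}
```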
[jira] Resolved: (PIG-526) Order of key, value pairs not preserved in MAP type.
[ https://issues.apache.org/jira/browse/PIG-526?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hong Tang resolved PIG-526. --- Resolution: Won't Fix Order of key, value pairs not preserved in MAP type. -- Key: PIG-526 URL: https://issues.apache.org/jira/browse/PIG-526 Project: Pig Issue Type: Bug Components: data Affects Versions: 0.2.0 Reporter: Hong Tang PIG uses HashMap to deserialize the Pig MAP type which will not observe the order of key, value pairs. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-879) Pig should provide a way for input location string in load statement to be passed as-is to the Loader
[ https://issues.apache.org/jira/browse/PIG-879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12729744#action_12729744 ] Hong Tang commented on PIG-879: --- 1) and 3) are roughly equivalent from the user's point of view, and are preferred for customized loaders that do not want pig to do the escaping at all. Pig should provide a way for input location string in load statement to be passed as-is to the Loader - Key: PIG-879 URL: https://issues.apache.org/jira/browse/PIG-879 Project: Pig Issue Type: Bug Affects Versions: 0.3.0 Reporter: Pradeep Kamath Due to multiquery optimization, Pig always converts the filenames to absolute URIs (see http://wiki.apache.org/pig/PigMultiQueryPerformanceSpecification - section about Incompatible Changes - Path Names and Schemes). This is necessary since the script may have "cd .." statements between load or store statements, and if the load statements have relative paths, we would need to convert to absolute paths to know where to load/store from. To do this QueryParser.massageFilename() has the code below[1] which basically gives the fully qualified hdfs path. However, the issue with this approach is that if the filename string is something like hdfs://localhost.localdomain:39125/user/bla/1,hdfs://localhost.localdomain:39125/user/bla/2, the code below[1] actually translates this to hdfs://localhost.localdomain:38264/user/bla/1,hdfs://localhost.localdomain:38264/user/bla/2 and throws an exception that it is an incorrect path. Some loaders may want to interpret the filenames (the input location string in the load statement) in any way they wish and may want Pig to not make absolute paths out of them. There are a few options to address this: 1) A command line switch to indicate to Pig that pathnames in the script are all absolute and hence Pig should not alter them and pass them as-is to Loaders and Storers. 
2) A keyword in the load and store statements to indicate the same intent to pig 3) A property which users can supply on the cmdline or in pig.properties to indicate the same intent. 4) A method in LoadFunc - relativeToAbsolutePath(String filename, String curDir) - which does the conversion to absolute; this way the Loader can choose to implement it as a no-op. Thoughts? -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
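Option 4) can be sketched as a hook on the loader, so a loader that treats the location string as opaque (e.g. a comma-separated list of URIs) simply opts out of absolutization. This is an illustration only; `LocationResolver` and the class names below are not actual Pig API:

```java
// Hypothetical hook for option 4): each loader decides how (or whether)
// to absolutize its input location string.
interface LocationResolver {
    String relativeToAbsolutePath(String location, String curDir);
}

// Default behavior: prefix relative paths with the current directory.
class DefaultResolver implements LocationResolver {
    public String relativeToAbsolutePath(String location, String curDir) {
        if (location.contains("://")) return location; // already a full URI
        return curDir + "/" + location;
    }
}

// A loader that interprets the location itself makes the conversion a no-op,
// so strings like "hdfs://a/1,hdfs://a/2" pass through untouched.
class PassThroughResolver implements LocationResolver {
    public String relativeToAbsolutePath(String location, String curDir) {
        return location;
    }
}
```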
[jira] Commented: (PIG-879) Pig should provide a way for input location string in load statement to be passed as-is to the Loader
[ https://issues.apache.org/jira/browse/PIG-879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12729771#action_12729771 ] Hong Tang commented on PIG-879: --- Both are valid arguments. The problem with 2) and 4) is that they require changes to the load statement syntax or the load-func API and would take longer to get there. I guess we could structure the fix in two phases: Phase One: support 1) and 3), so that we have the minimum to move along without having to disable multi-query optimization completely. Users would be able to modify the script to change all relative paths to absolute ones (such usage should be rare enough that most people would not be impacted). Phase Two: support either 2) or 4) (but I do not think we need both). Personally I think 4) would be better because the loader should be the one that interprets the location string syntax. Pig should provide a way for input location string in load statement to be passed as-is to the Loader - Key: PIG-879 URL: https://issues.apache.org/jira/browse/PIG-879 Project: Pig Issue Type: Bug Affects Versions: 0.3.0 Reporter: Pradeep Kamath Due to multiquery optimization, Pig always converts the filenames to absolute URIs (see http://wiki.apache.org/pig/PigMultiQueryPerformanceSpecification - section about Incompatible Changes - Path Names and Schemes). This is necessary since the script may have "cd .." statements between load or store statements, and if the load statements have relative paths, we would need to convert to absolute paths to know where to load/store from. 
To do this QueryParser.massageFilename() has the code below[1] which basically gives the fully qualified hdfs path. However, the issue with this approach is that if the filename string is something like hdfs://localhost.localdomain:39125/user/bla/1,hdfs://localhost.localdomain:39125/user/bla/2, the code below[1] actually translates this to hdfs://localhost.localdomain:38264/user/bla/1,hdfs://localhost.localdomain:38264/user/bla/2 and throws an exception that it is an incorrect path. Some loaders may want to interpret the filenames (the input location string in the load statement) in any way they wish and may want Pig to not make absolute paths out of them. There are a few options to address this: 1) A command line switch to indicate to Pig that pathnames in the script are all absolute and hence Pig should not alter them and pass them as-is to Loaders and Storers. 2) A keyword in the load and store statements to indicate the same intent to pig 3) A property which users can supply on the cmdline or in pig.properties to indicate the same intent. 4) A method in LoadFunc - relativeToAbsolutePath(String filename, String curDir) - which does the conversion to absolute; this way the Loader can choose to implement it as a no-op. Thoughts? -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-833) Storage access layer
[ https://issues.apache.org/jira/browse/PIG-833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12716470#action_12716470 ] Hong Tang commented on PIG-833: --- Jeff, just like the SQL effort, the space of columnar storage is also wide open, and I think it is more beneficial to the overall health of the hadoop ecosystem. That being said, I also looked at the patch attached to HIVE-352. It appears that what the patch does is a level below our stated objectives. Specifically, the guts of the implementation (RCFile) is very close in spirit to TFile as described in HADOOP-3315, which seems to have had its first comprehensive patch back in December 2008. Storage access layer Key: PIG-833 URL: https://issues.apache.org/jira/browse/PIG-833 Project: Pig Issue Type: New Feature Reporter: Jay Tang A layer is needed to provide a high-level data access abstraction and a tabular view of data in Hadoop, and could free Pig users from implementing their own data storage/retrieval code. This layer should also include a columnar storage format in order to provide fast data projection, CPU/space-efficient data serialization, and a schema language to manage physical storage metadata. Eventually it could also support predicate pushdown for further performance improvement. Initially, this layer could be a contrib project in Pig and become a hadoop subproject later on. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-794) Use Avro serialization in Pig
[ https://issues.apache.org/jira/browse/PIG-794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12712397#action_12712397 ] Hong Tang commented on PIG-794: --- - It appears that the code added a three-byte sync-mark \1\2\3 before every tuple. - There is no escaping of sync-mark collisions in user code. - The introduction of the sync mark also defeats the purpose of using Avro in the first place (sharing a common serialization format). Use Avro serialization in Pig - Key: PIG-794 URL: https://issues.apache.org/jira/browse/PIG-794 Project: Pig Issue Type: Improvement Components: impl Affects Versions: 0.2.0 Reporter: Rakesh Setty Fix For: 0.2.0 Attachments: avro-0.1-dev-java_r765402.jar, AvroStorage.patch, jackson-asl-0.9.4.jar, PIG-794.patch We would like to use Avro serialization in Pig to pass data between MR jobs instead of the current BinStorage. Attached is an implementation of AvroBinStorage which performs significantly better compared to BinStorage on our benchmarks. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-793) Improving memory efficiency of Tuple implementation
[ https://issues.apache.org/jira/browse/PIG-793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12705188#action_12705188 ] Hong Tang commented on PIG-793: --- Two ideas: # when loading a tuple from serialized data, keep it as a byte array and only instantiate datums when get/set calls are made. This would help if we are moving tuples from one container to another container. {code} class LazyTuple implements Tuple { ArrayList<Object> fields; // null if not deserialized DataByteArray lazyBytes; // e.g. serialized bytes of tuple in avro format. } {code} # improving DataByteArray. It may be changed to an interface (needs get(), offset(), and length()), and use a DataByteArrayFactory to create instances in two ways: ## DataByteArrayFactory.createPrivate(byte[], offset, length), if we need to keep a private copy of the buffer. ## DataByteArrayFactory.createShared(byte[], offset, length), if the input buffer can be shared with the data byte array object. In this case, the contract would be that the caller will no longer access the portion of the byte array from offset to offset+length (exclusive). There could be three different implementations of this: - The current implementation will be used for createPrivate(). - An implementation for small buffers (offset/length can be represented in short/short). - An implementation for large buffers (offset/length are int/int, and length is large enough). Note that the change to DataByteArray would break the current semantics where the offset is always 0, and the length is always the length of the buffer. Improving memory efficiency of Tuple implementation --- Key: PIG-793 URL: https://issues.apache.org/jira/browse/PIG-793 Project: Pig Issue Type: Improvement Reporter: Olga Natkovich Currently, our tuple is a real pig and uses a lot of extra memory. 
There are several places where we can improve memory efficiency: (1) Laying out memory for the fields rather than using java objects, since each object for a numeric field takes 16 bytes. (2) For the cases where we know the schema, using Java arrays rather than ArrayList. There might be more. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
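The lazy-deserialization idea (#1 in the comment above) can be fleshed out as follows. This is a sketch, not Pig's Tuple implementation; the toy decoder (comma-separated UTF-8 strings) stands in for whatever real serialization format (e.g. Avro) would be used:

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;

class LazyTuple {
    private ArrayList<Object> fields; // null until the first get()
    private byte[] lazyBytes;         // serialized tuple bytes

    LazyTuple(byte[] serialized) { this.lazyBytes = serialized; }

    Object get(int i) {
        materialize();
        return fields.get(i);
    }

    // Non-null only while the tuple is still in its cheap, serialized form;
    // a container-to-container move could copy these bytes directly.
    byte[] rawBytes() { return lazyBytes; }

    private void materialize() {
        if (fields != null) return;
        // Toy decoder for illustration only.
        String s = new String(lazyBytes, StandardCharsets.UTF_8);
        fields = new ArrayList<>(Arrays.asList((Object[]) s.split(",")));
        lazyBytes = null; // drop the raw bytes once fields are instantiated
    }
}
```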
[jira] Issue Comment Edited: (PIG-793) Improving memory efficiency of Tuple implementation
[ https://issues.apache.org/jira/browse/PIG-793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12705188#action_12705188 ] Hong Tang edited comment on PIG-793 at 5/1/09 4:59 PM: --- Two ideas: # when loading a tuple from serialized data, keep it as a byte array and only instantiate datums when get/set calls are made. This would help if we are moving tuples from one container to another container. {code} class LazyTuple implements Tuple { ArrayList<Object> fields; // null if not deserialized DataByteArray lazyBytes; // e.g. serialized bytes of tuple in avro format. } {code} # improving DataByteArray. It may be changed to an interface (needs get(), offset(), and length()), and use a DataByteArrayFactory to create instances in two ways: ## DataByteArrayFactory.createPrivate(byte[], offset, length), if we need to keep a private copy of the buffer. ## DataByteArrayFactory.createShared(byte[], offset, length), if the input buffer can be shared with the data byte array object. In this case, the contract would be that the caller will no longer access the portion of the byte array from offset to offset+length (exclusive). There could be three different implementations of this: - The current implementation will be used for createPrivate(). - An implementation for small buffers (offset/length can be represented in short/short). - An implementation for large buffers (offset/length are int/int, and length is large enough). Note that the change to DataByteArray would break the current semantics where the offset is always 0, and the length is always the length of the buffer. was (Author: hong.tang): Two ideas: # when loading tuple from serialized data, keep it as a byte array and only instantiate datums when get/set calls are made. This would help if we are moving tuples from one container to another container. {code} class LazyTuple implements Tuple { ArrayList<Object> fields; // null if not deserialized DataByteArray lazyBytes; // e.g. serialized bytes of tuple in avro format. } {code} # improving DataByteArray. it may be changed to an interface (need get(), offset(), and length() ), and use a DataByteArrayFactory to create instances in two ways: ## DataByteArrayFactory.createPrivate(byte[], offset, length), if we need to keep a private copy of the buffer. ## DataByteArrayCreateShared(). if the input buffer can be shared with the data byte array object. In this case, the contract would be that caller will no longer access the portion of byte array from offset to offset+length (exclusive). There could be three different implementations of this: - The current implementation will be used for createPrivate(). - An implementation for small buffers (offset/length can be represented in short/short). - An implementation for large buffers (offset/length are int/int, and length is large enough) Note that the change to DataByteArray would break the current semantics where the offset is always 0, and length is always the length of the buffer. Improving memory efficiency of Tuple implementation --- Key: PIG-793 URL: https://issues.apache.org/jira/browse/PIG-793 Project: Pig Issue Type: Improvement Reporter: Olga Natkovich Currently, our tuple is a real pig and uses a lot of extra memory. There are several places where we can improve memory efficiency: (1) Laying out memory for the fields rather than using java objects, since each object for a numeric field takes 16 bytes. (2) For the cases where we know the schema, using Java arrays rather than ArrayList. There might be more. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-652) Need to give user control of OutputFormat
[ https://issues.apache.org/jira/browse/PIG-652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12672690#action_12672690 ] Hong Tang commented on PIG-652: --- You probably want to provide a utility method for getting the StoreFunc back from a JobConf, instead of forcing people to copy/paste internal pig code... Need to give user control of OutputFormat - Key: PIG-652 URL: https://issues.apache.org/jira/browse/PIG-652 Project: Pig Issue Type: New Feature Components: impl Reporter: Alan Gates Assignee: Alan Gates Pig currently allows users some control over InputFormat via the Slicer and Slice interfaces. It does not allow any control over the OutputFormat and RecordWriter interfaces. It just allows the user to implement a storage function that controls how the data is serialized. For hadoop tables, we will need to allow custom OutputFormats that prepare output information and objects needed by a Table store function. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-652) Need to give user control of OutputFormat
[ https://issues.apache.org/jira/browse/PIG-652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12672711#action_12672711 ] Hong Tang commented on PIG-652: --- One more thing that is still not clear to me. StoreFunc does not implement any serialization interface, and it depends on an all-string constructor to properly construct the object. How does my customized TableStoreFunc instance convey this information to PIG? Need to give user control of OutputFormat - Key: PIG-652 URL: https://issues.apache.org/jira/browse/PIG-652 Project: Pig Issue Type: New Feature Components: impl Reporter: Alan Gates Assignee: Alan Gates Pig currently allows users some control over InputFormat via the Slicer and Slice interfaces. It does not allow any control over the OutputFormat and RecordWriter interfaces. It just allows the user to implement a storage function that controls how the data is serialized. For hadoop tables, we will need to allow custom OutputFormats that prepare output information and objects needed by a Table store function. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-653) Make fieldsToRead work in loader
[ https://issues.apache.org/jira/browse/PIG-653?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12672176#action_12672176 ] Hong Tang commented on PIG-653: --- My quibble is that the interface uses null to indicate "all required" for nested fields, but uses a concrete class for top-level fields. Any justification for why possible future extensions are only applicable to top-level fields but not nested fields? Make fieldsToRead work in loader Key: PIG-653 URL: https://issues.apache.org/jira/browse/PIG-653 Project: Pig Issue Type: New Feature Reporter: Alan Gates Assignee: Pradeep Kamath Attachments: PIG-653-2.comment Currently pig does not call the fieldsToRead function in LoadFunc, thus it does not provide information to load functions on what fields are needed. We need to implement a visitor that determines (where possible) which fields in a file will be used and relays that information to the load function. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-652) Need to give user control of OutputFormat
[ https://issues.apache.org/jira/browse/PIG-652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12671254#action_12671254 ] Hong Tang commented on PIG-652: --- I might be missing something. How can the outputformat class retrieve the schema information? The output format is constructed with its default constructor, and then its getRecordWriter is called with a name like part-001 or part-002, but not the path to the basic table. Need to give user control of OutputFormat - Key: PIG-652 URL: https://issues.apache.org/jira/browse/PIG-652 Project: Pig Issue Type: New Feature Components: impl Reporter: Alan Gates Assignee: Alan Gates Pig currently allows users some control over InputFormat via the Slicer and Slice interfaces. It does not allow any control over the OutputFormat and RecordWriter interfaces. It just allows the user to implement a storage function that controls how the data is serialized. For hadoop tables, we will need to allow custom OutputFormats that prepare output information and objects needed by a Table store function. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-652) Need to give user control of OutputFormat
[ https://issues.apache.org/jira/browse/PIG-652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12670981#action_12670981 ] Hong Tang commented on PIG-652: --- Since this API is supposed to provide backend-specific output classes, shouldn't the API take a parameter describing the backend? For the MR backend, the returned class would implement OutputFormat<Text, Tuple>? Also, the keys in the JobConf object describing path, schema, compression, etc. need to be made public. Need to give user control of OutputFormat - Key: PIG-652 URL: https://issues.apache.org/jira/browse/PIG-652 Project: Pig Issue Type: New Feature Components: impl Reporter: Alan Gates Assignee: Alan Gates Pig currently allows users some control over InputFormat via the Slicer and Slice interfaces. It does not allow any control over the OutputFormat and RecordWriter interfaces. It just allows the user to implement a storage function that controls how the data is serialized. For hadoop tables, we will need to allow custom OutputFormats that prepare output information and objects needed by a Table store function. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-652) Need to give user control of OutputFormat
[ https://issues.apache.org/jira/browse/PIG-652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12671028#action_12671028 ] Hong Tang commented on PIG-652: --- How do I get the schema information from Pig? I thought you would put the schema in the JobConf and pass it to the customized OutputFormat class to create the RecordWriter. Need to give user control of OutputFormat - Key: PIG-652 URL: https://issues.apache.org/jira/browse/PIG-652 Project: Pig Issue Type: New Feature Components: impl Reporter: Alan Gates Assignee: Alan Gates Pig currently allows users some control over InputFormat via the Slicer and Slice interfaces. It does not allow any control over the OutputFormat and RecordWriter interfaces. It just allows the user to implement a storage function that controls how the data is serialized. For hadoop tables, we will need to allow custom OutputFormats that prepare output information and objects needed by a Table store function. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
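The pattern being discussed (the front end stores the schema under a well-known configuration key; the OutputFormat reads it back when creating its RecordWriter) can be sketched with a plain java.util.Properties standing in for Hadoop's JobConf. The key name and class below are illustrative, not Pig's actual public API:

```java
import java.util.Properties;

class SchemaConfSketch {
    // Illustrative key; a real Pig would publish this name as public API.
    static final String SCHEMA_KEY = "pig.output.schema";

    // Called on the front end, before the job is submitted.
    static void setSchema(Properties conf, String schema) {
        conf.setProperty(SCHEMA_KEY, schema);
    }

    // Called by the custom OutputFormat when building its RecordWriter.
    static String getSchema(Properties conf) {
        String s = conf.getProperty(SCHEMA_KEY);
        if (s == null)
            throw new IllegalStateException("schema was not set by the front end");
        return s;
    }
}
```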
[jira] Commented: (PIG-526) Order of key, value pairs not preserved in MAP type.
[ https://issues.apache.org/jira/browse/PIG-526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12647432#action_12647432 ] Hong Tang commented on PIG-526: --- I understand your concern. But just as I rephrased, the issue here is that PIG gives the user no control over which concrete Map class to use when deserializing the tuples. Probably a good compromise is to allow the user to specify which Map class should be used when performing Tuple deserialization. Order of key, value pairs not preserved in MAP type. -- Key: PIG-526 URL: https://issues.apache.org/jira/browse/PIG-526 Project: Pig Issue Type: Bug Components: data Affects Versions: types_branch Reporter: Hong Tang PIG uses HashMap to deserialize the Pig MAP type, which will not observe the order of key, value pairs. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
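The compromise above hinges on the concrete Map implementation: java.util.LinkedHashMap preserves insertion order, while HashMap does not. A minimal sketch of a pluggable deserializer, where a Supplier stands in for whatever hook Pig would actually expose (class and method names are illustrative):

```java
import java.util.Map;
import java.util.function.Supplier;

class MapDeserSketch {
    // Instead of hard-coding "new HashMap<>()", take a factory so the user
    // can pick an order-preserving Map class (e.g. LinkedHashMap::new).
    static Map<String, Object> deserialize(String[] keys, Object[] vals,
                                           Supplier<Map<String, Object>> factory) {
        Map<String, Object> m = factory.get();
        for (int i = 0; i < keys.length; i++)
            m.put(keys[i], vals[i]); // insertion order == serialized order
        return m;
    }
}
```

With `LinkedHashMap::new`, iterating the result yields keys in exactly the order they appeared in the serialized tuple.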