[jira] Updated: (PIG-1198) [zebra] performance improvements

2010-02-25 Thread Chao Wang (JIRA)

[
https://issues.apache.org/jira/browse/PIG-1198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Chao Wang updated PIG-1198:
---

Patch reviewed.

Some feedback:

1) In the fillRowSplit() method, reader.close() should always be called at
the end;

2) In mapreduce.TableInputFormat.getRowSplits(), the batchSize variable is not
needed.
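
A minimal sketch of point 1, assuming the reader is opened before the work
starts and must be released on every exit path. RowReader below is a
hypothetical stand-in, not Zebra's actual reader class, and this fillRowSplit()
only illustrates the try/finally shape, not the method in the patch.

```java
import java.io.Closeable;
import java.io.IOException;

class CloseAlwaysSketch {
    // Illustrative only: returns a row count and closes the reader on both
    // the success path and the failure path.
    static long fillRowSplit(RowReader reader, boolean fail) throws IOException {
        try {
            return reader.countRows(fail);
        } finally {
            reader.close(); // runs whether countRows() returned or threw
        }
    }

    public static void main(String[] args) {
        RowReader ok = new RowReader();
        RowReader bad = new RowReader();
        try {
            System.out.println("rows=" + fillRowSplit(ok, false) + " closed=" + ok.closed);
            fillRowSplit(bad, true);
        } catch (IOException e) {
            System.out.println("read failed, closed=" + bad.closed);
        }
    }
}

// Hypothetical reader used only for this sketch.
class RowReader implements Closeable {
    boolean closed = false;

    long countRows(boolean fail) throws IOException {
        if (fail) {
            throw new IOException("simulated read failure");
        }
        return 42L;
    }

    @Override
    public void close() {
        closed = true;
    }
}
```

The finally block is what guarantees the close; placing reader.close() as the
last statement of the method body would skip it whenever an exception is
thrown earlier.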

Patch looks good overall. +1

[zebra] performance improvements

Key: PIG-1198
URL: https://issues.apache.org/jira/browse/PIG-1198
Project: Pig
Issue Type: Improvement
Affects Versions: 0.6.0
Reporter: Yan Zhou
Assignee: Yan Zhou
Fix For: 0.7.0

Attachments: PIG-1198.patch, PIG-1198.patch

Currently, input splits are generated by row on individual TFiles. As a
result, one split is generated for every TFile, even for TFiles smaller than
one block. Consequently, many mappers, and many waves, are needed to handle
the many small TFiles generated by as many mappers/reducers that wrote the
data. This issue can be addressed by generating input splits that can include
multiple TFiles.
For sorted tables, the per-table key distribution, which is used to generate
proper input splits, includes key distributions from column groups even if
they are not in the projection. This incurs the extra cost of unnecessary
computation and, worse, produces unreasonable input splits.
For unsorted tables, when row splits are generated on a union of tables, the
FileSplits are generated for each table and then lumped together to form the
final list of splits passed to Map/Reduce. The undesirable consequence is that
the number of splits depends on the number of tables in the union and is not
controlled solely by the number of splits used by the Map/Reduce framework.
The input split's goal size is calculated over all column groups, even if some
of them are not in the projection.
For input splits that span multiple files in one column group, all files are
opened at startup. This is unnecessary and holds resources from start to end.
Files should be opened when needed and closed when no longer needed.
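
The first point above, generating splits that can include multiple small
TFiles, can be illustrated with a simple greedy grouping. This is only a
sketch under stated assumptions: the class name, the pack() method, and the
use of plain file lengths are hypothetical, and the patch itself may group
files differently (for example, by also splitting large TFiles by row).

```java
import java.util.ArrayList;
import java.util.List;

class SplitPackingSketch {
    // Greedily groups consecutive file lengths into splits of at most
    // goalSize bytes. A file larger than goalSize ends up in a split of its
    // own; this sketch does not subdivide large files.
    static List<List<Long>> pack(long[] fileLengths, long goalSize) {
        List<List<Long>> splits = new ArrayList<List<Long>>();
        List<Long> current = new ArrayList<Long>();
        long currentBytes = 0;
        for (long len : fileLengths) {
            // Start a new split when adding this file would exceed the goal.
            if (!current.isEmpty() && currentBytes + len > goalSize) {
                splits.add(current);
                current = new ArrayList<Long>();
                currentBytes = 0;
            }
            current.add(len);
            currentBytes += len;
        }
        if (!current.isEmpty()) {
            splits.add(current);
        }
        return splits;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        long[] files = {10 * mb, 20 * mb, 30 * mb, 50 * mb, 5 * mb};
        List<List<Long>> splits = pack(files, 64 * mb);
        System.out.println(splits.size() + " splits for " + files.length + " files");
    }
}
```

With a 64 MB goal size, the five files in main() are grouped into two splits
rather than five, which is the reduction in mapper count the description is
after.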

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Updated: (PIG-1198) [zebra] performance improvements

2010-02-25 Thread Yan Zhou (JIRA)

[
https://issues.apache.org/jira/browse/PIG-1198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Yan Zhou updated PIG-1198:
--

Attachment: PIG-1198.patch

To address the review comments.

[zebra] performance improvements

Key: PIG-1198
URL: https://issues.apache.org/jira/browse/PIG-1198
Project: Pig
Issue Type: Improvement
Affects Versions: 0.6.0
Reporter: Yan Zhou
Assignee: Yan Zhou
Fix For: 0.7.0

Attachments: PIG-1198.patch, PIG-1198.patch, PIG-1198.patch

[jira] Updated: (PIG-1198) [zebra] performance improvements

2010-02-23 Thread Yan Zhou (JIRA)

[
https://issues.apache.org/jira/browse/PIG-1198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Yan Zhou updated PIG-1198:
--

Attachment: PIG-1198.patch

This patch is based upon the load-store-redesign branch and thus, because of
the different code base, might differ slightly from the final patch to be
applied to the trunk. This patch is therefore for review only; no submission
is intended.

[zebra] performance improvements

Key: PIG-1198
URL: https://issues.apache.org/jira/browse/PIG-1198
Project: Pig
Issue Type: Improvement
Affects Versions: 0.6.0
Reporter: Yan Zhou
Assignee: Yan Zhou
Fix For: 0.7.0

Attachments: PIG-1198.patch
