[jira] Commented: (PIG-599) BufferedPositionedInputStream isn't buffered
[ https://issues.apache.org/jira/browse/PIG-599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12673399#action_12673399 ]

Olga Natkovich commented on PIG-599:

Alan, has this patch been committed?

> BufferedPositionedInputStream isn't buffered
>
> Key: PIG-599
> URL: https://issues.apache.org/jira/browse/PIG-599
> Project: Pig
> Issue Type: Bug
> Components: impl
> Affects Versions: types_branch
> Reporter: Alan Gates
> Assignee: Alan Gates
> Fix For: types_branch
>
> Attachments: loadperf-2.patch, loadperf.patch
>
> org.apache.pig.impl.io.BufferedPositionedInputStream is not actually
> buffered. This is because it sits atop an FSDataInputStream (somewhere down
> the stack), which is buffered. So to avoid double buffering, which can be
> bad, BufferedPositionedInputStream was written without buffering. But the
> FSDataInputStream is far enough down the stack that it is still quite costly
> to call read() individually for each byte. A run through a profiler shows
> that a fair amount of time is being spent in
> BufferedPositionedInputStream.read().

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
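The issue boils down to this: per-byte read() calls travel through the whole stream stack even though the bottom layer is buffered. One way to make the outer stream "actually buffered" without losing position tracking is to pull bytes in bulk into a local array and compute the logical position from the buffer state. The following is a minimal sketch under that assumption, using plain java.io instead of Hadoop types; the class name BufferedPositionSketch and its getPosition() method are hypothetical, and this is not the loadperf patch attached to the issue.

```java
import java.io.IOException;
import java.io.InputStream;

/**
 * Hypothetical sketch (not the PIG-599 patch): a stream that really buffers
 * while still reporting the offset of the next byte the caller will consume.
 */
class BufferedPositionSketch extends InputStream {
    private final InputStream in;  // underlying stream (in Pig, ultimately an FSDataInputStream)
    private final byte[] buf;      // local read-ahead buffer
    private int count = 0;         // number of valid bytes in buf
    private int pos = 0;           // index in buf of the next byte to return
    private long bufStart;         // stream offset corresponding to buf[0]

    BufferedPositionSketch(InputStream in, long startOffset, int bufSize) {
        if (bufSize <= 0) {
            throw new IllegalArgumentException("bufSize must be positive");
        }
        this.in = in;
        this.buf = new byte[bufSize];
        this.bufStart = startOffset;
    }

    /** Refills the buffer with one bulk read; returns false at end of stream. */
    private boolean fill() throws IOException {
        bufStart += count;  // only called once the old buffer is fully consumed
        pos = 0;
        count = 0;
        int n = in.read(buf, 0, buf.length);
        if (n <= 0) {
            return false;
        }
        count = n;
        return true;
    }

    /** Single-byte reads are served from the local array, not the underlying stack. */
    @Override
    public int read() throws IOException {
        if (pos >= count && !fill()) {
            return -1;
        }
        return buf[pos++] & 0xff;
    }

    /** Offset of the next byte read() will return; read-ahead bytes are not counted. */
    public long getPosition() {
        return bufStart + pos;
    }

    @Override
    public void close() throws IOException {
        in.close();
    }
}
```

The design choice that matters for a loader is that getPosition() reports where the caller has logically read to (the underlying offset minus the bytes still sitting unread in the buffer), not how far the read-ahead has advanced.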
[jira] Commented: (PIG-599) BufferedPositionedInputStream isn't buffered
[ https://issues.apache.org/jira/browse/PIG-599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12662359#action_12662359 ]

Benjamin Reed commented on PIG-599:
---

+1 looks good.
[jira] Commented: (PIG-599) BufferedPositionedInputStream isn't buffered
[ https://issues.apache.org/jira/browse/PIG-599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12661384#action_12661384 ]

Benjamin Reed commented on PIG-599:
---

There is a problem with using a buffered stream together with compression. We have to do some really subtle things to get the mapping of compression blocks to positions right, so that the load functions work properly. If read-ahead happens underneath, things break. (It would be excellent if someone had a better way of doing it.) In the absence of a better idea, I think we should check the stream we are buffering and skip the buffering if it is a compressed stream.
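The check proposed in the comment above could look roughly like the following sketch: wrap the stream in a read-ahead buffer only when it is not a decompressing stream. It assumes java.util.zip.InflaterInputStream (the superclass of GZIPInputStream) as a stand-in for "compressed stream"; in Pig on Hadoop the relevant types would presumably be Hadoop's compression streams (for example subclasses of org.apache.hadoop.io.compress.CompressionInputStream), which are not used here. The BufferingPolicy class and maybeBuffer() helper are hypothetical names.

```java
import java.io.BufferedInputStream;
import java.io.InputStream;
import java.util.zip.InflaterInputStream;

/** Hypothetical helper illustrating "skip buffering for compressed streams". */
final class BufferingPolicy {
    private BufferingPolicy() {}

    /**
     * Returns a buffered wrapper for plain streams, but leaves decompressing
     * streams untouched. InflaterInputStream is only a stand-in for whatever
     * compressed-stream types the real loader has to recognize.
     */
    static InputStream maybeBuffer(InputStream in, int bufSize) {
        if (in instanceof InflaterInputStream) {
            // Read-ahead here would disturb the mapping of compression
            // blocks to positions, so the stream is returned unchanged.
            return in;
        }
        return new BufferedInputStream(in, bufSize);
    }
}
```

The trade-off is that compressed input keeps the per-byte read() cost described in the issue; only uncompressed input gets the buffering speedup.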