[ 
https://issues.apache.org/jira/browse/PIG-36?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12546105
 ] 

Utkarsh Srivastava commented on PIG-36:
---------------------------------------

RandomSampleLoader is only meant to load a few random rows of the input, so it 
does not care whether skip() actually skipped the amount requested. In fact, 
we want every disk seek to yield a sample (the bigger our sample size, the 
better our quantile accuracy). So if we called skip() and it did not skip the 
requested amount, we would still want to take a sample from the current 
position before moving on.
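
To make the difference concrete, here is a minimal sketch (not the actual 
RandomSampleLoader code) that contrasts the looping skip FindBugs asks for 
with the best-effort skip described above. The names SkipSketch, skipFully, 
skipBestEffort and ShortSkipStream are made up for illustration, and 
ShortSkipStream only simulates a stream that skips less than requested.

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

class SkipSketch {
    /** Loops until n bytes are skipped or end of stream; returns bytes skipped. */
    static long skipFully(InputStream in, long n) throws IOException {
        long remaining = n;
        while (remaining > 0) {
            long skipped = in.skip(remaining);
            if (skipped > 0) {
                remaining -= skipped;
            } else if (in.read() == -1) {
                break;          // end of stream, nothing left to skip
            } else {
                remaining--;    // read() consumed one byte
            }
        }
        return n - remaining;
    }

    /** Best effort: a single skip() call; the caller samples from wherever it lands. */
    static long skipBestEffort(InputStream in, long n) throws IOException {
        return n > 0 ? in.skip(n) : 0;
    }

    /** Hypothetical stream that skips at most 3 bytes per call (simulates a short skip). */
    static class ShortSkipStream extends FilterInputStream {
        ShortSkipStream(InputStream in) {
            super(in);
        }

        @Override
        public long skip(long n) throws IOException {
            return super.skip(Math.min(n, 3));
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] data = new byte[20];
        InputStream a = new ShortSkipStream(new ByteArrayInputStream(data));
        InputStream b = new ShortSkipStream(new ByteArrayInputStream(data));
        System.out.println("best effort skipped: " + skipBestEffort(a, 10));
        System.out.println("loop skipped:        " + skipFully(b, 10));
    }
}
```

For a sampler, either landing position is a valid place to read the next 
row; the loop only matters when the caller needs to end up at an exact 
offset.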

> FindBugs: Method ignores results of InputStream.skip()
> ------------------------------------------------------
>
>                 Key: PIG-36
>                 URL: https://issues.apache.org/jira/browse/PIG-36
>             Project: Pig
>          Issue Type: Bug
>          Components: impl
>            Reporter: Patrick Hunt
>
> InputStreams don't always skip as much as they are asked to skip, so this 
> needs to be done in a loop:
>               if (toSkip > 0)
>                       in.skip(toSkip);
>               return t;
> Severity and Description: M B RR: 
>   org.apache.pig.impl.builtin.RandomSampleLoader.getNext() ignores result of 
>   org.apache.pig.impl.io.BufferedPositionedInputStream.skip(long)
> Path: pig-apache/src/org/apache/pig/impl/builtin
> Resource: RandomSampleLoader.java
> Location: line 49
> Creation Time: 1196213971062
> Id: 22891

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
