[ 
https://issues.apache.org/jira/browse/UIMA-210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marshall Schor closed UIMA-210.
-------------------------------


I think this is fine.

Here's my thinking on the pros/cons:

Heap used:

The overall heap consumed by both methods, for a file N bytes long, ignoring 
the heap consumed by the result, which would be the same in both cases:

Current: 10,000 chars for buf + approx N to N/2 (depending on encoding) chars 
in the string buffer + maybe lots of garbage as the string buffer is repeatedly 
expanded (estimated as approx N to N/2).  One way to reduce this is to get the 
file length in bytes and preallocate the string buffer to, say, N/2.
Previous: N chars in buf

So for large files, the previous approach could be wasteful by overallocating 
buf when the encoding uses more than one byte per character, and the current 
approach is wasteful in that the string buffer is reallocated repeatedly.
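
To make the preallocation idea concrete, here is a minimal sketch, not the 
actual FileUtils code.  The class name, the method name readAll and its 
parameters are made up for illustration; it assumes the caller already has the 
file length in bytes and picks the chunk size.  It sizes the string buffer at 
N/2 up front and appends only the chars that each read actually returned:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, for illustration only.
final class PreallocatedRead {

  // Reads everything from the reader into a String.
  // fileLengthInBytes is used only as a sizing hint (N/2 chars);
  // the builder still grows if the hint turns out to be too small.
  static String readAll(Reader reader, long fileLengthInBytes, int chunkSize)
      throws IOException {
    long hint = Math.max(16, fileLengthInBytes / 2);
    int initialCapacity = (int) Math.min(Integer.MAX_VALUE - 8, hint);
    StringBuilder sb = new StringBuilder(initialCapacity);
    char[] buf = new char[chunkSize];
    int count;
    // read() may return fewer chars than buf.length, so append only 'count'.
    while ((count = reader.read(buf)) != -1) {
      sb.append(buf, 0, count);
    }
    return sb.toString();
  }

  public static void main(String[] args) throws IOException {
    byte[] bytes = "some file content".getBytes(StandardCharsets.UTF_8);
    Reader reader = new InputStreamReader(
        new ByteArrayInputStream(bytes), StandardCharsets.UTF_8);
    System.out.println(readAll(reader, bytes.length, 10000));
  }
}
```

With a hint of N/2 the buffer expands at most about once for single-byte 
encodings, and a larger chunkSize can be passed for big files.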

Performance:
For larger files, on some systems, there may be a large benefit from reading in 
more than 10,000 bytes at a time.

One other point - I've grown fond of coding tests against null as "null == 
otherObject" rather than the more "natural" "otherObject == null" because if I 
accidentally write "=" instead of "==" it gives me a compile error :-).
 

> faulty use of .read(buffer...) in several places - not checking for fewer 
> than expected bytes/chars read
> --------------------------------------------------------------------------------------------------------
>
>                 Key: UIMA-210
>                 URL: https://issues.apache.org/jira/browse/UIMA-210
>             Project: UIMA
>          Issue Type: Bug
>          Components: Core Java Framework
>    Affects Versions: 2.1
>            Reporter: Marshall Schor
>         Assigned To: Marshall Schor
>             Fix For: 2.1
>
>
> The definition of most instances of stream.read(bufferArray) says it reads 
> *up to* the length of the array.  We earlier had an issue on a multi-core 
> machine where the length read was much less than the length of the buffer or 
> of the file. (This was in Vinci).  The solution is to wrap these calls in 
> code that looks like this from the XTalkTransporter (this assumes the file 
> length is known):
>   static public void readFully(byte[] b, int length, InputStream in)
>       throws IOException {
>     int read_so_far = 0;
>     while (read_so_far < length) {
>       int count = in.read(b, read_so_far, length - read_so_far);
>       if (count < 0) {
>         throw new EOFException();
>       }
>       read_so_far += count;
>     }
>   }
> Code which is broken can be found by scanning for .read(
> Ones I found scanning are:
> VinciTAEClient
> FileUtils (copyFile method) 
> (Note: similarly named class FileUtil (no final "s") has a copyFile method 
> that is OK)
> XMLUtil.java has a fragment in detectXmlFileEncoding that could fail 
> incorrectly:
>       // store the 1st text byte and read next 6 bytes of XML file
>       buffer[byteCounter++] = (byte) nextByte;
>       // ERROR: NOT ALLOWING FOR FEWER BYTES READ
>       if (iStream.read(buffer, byteCounter, bytes2put - 1) != bytes2put - 1)
>         throw new IOException("cannot read file");
> There are multiple instances of code in JcasSofaTest that don't allow for the 
> possibility of reading fewer bytes than the buffer size; here's one:
>       dest = new byte[4];
>       is.close();
>       is = intArrayView.getSofaDataStream();
>       assertTrue(is != null);
>       int i = 0;
>       while (is.read(dest) != -1) {
>         assertTrue(ByteBuffer.wrap(dest).getInt() == intArrayFS.get(i++));
>         ;
>       }
> And another one like this in SofaTest.
> The doCheckpoint method of DebugControlThread has the problem.
> In our examples, the following have the problem:
> CasMultiplierExampleApplication 
> FileSystemCollectionReader
> ExampleApplication
> PrintAnnotations
> JetExpander
> And in uimaj-tools:
> FileSystemCollectionReader
> CasTreeViewer
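
For call sites like the XMLUtil fragment and the 4-byte loop in JcasSofaTest, 
where running out of data early is not necessarily an error, a variant of 
readFully that reports how many bytes it actually got may be easier to drop in. 
This is only a sketch: the class name and the method readUpTo are invented here 
and are not part of UIMA or Vinci.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;

// Hypothetical helper class, for illustration only.
final class ShortReadSafe {

  // Keeps calling read() until 'length' bytes are stored in b starting at
  // 'offset', or the stream ends.  Returns the number of bytes stored,
  // which is less than 'length' only if end of stream was reached.
  static int readUpTo(InputStream in, byte[] b, int offset, int length)
      throws IOException {
    int total = 0;
    while (total < length) {
      int count = in.read(b, offset + total, length - total);
      if (count < 0) {
        break; // end of stream: let the caller decide if this is an error
      }
      total += count;
    }
    return total;
  }

  public static void main(String[] args) throws IOException {
    InputStream is = new ByteArrayInputStream(new byte[] {0, 0, 0, 7, 0, 0, 0, 9});
    byte[] dest = new byte[4];
    // Each iteration sees a full 4-byte value even if read() returns less.
    while (readUpTo(is, dest, 0, dest.length) == dest.length) {
      System.out.println(ByteBuffer.wrap(dest).getInt());
    }
  }
}
```

A caller that needs exactly N bytes can compare the return value with N and 
throw, which gives the same behavior as readFully above.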

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
