Re: [Jprogramming] Scanning a large file

2006-05-16 Thread Chris Burke
Oleg Kobchenko wrote: We need a general purpose read line functionality. It is common in C runtime and in other languages. Although it is possible to do in J, it's better not to do the low-level stuff every time. I suggest that we add two new definitions to the files script. One is

Re: [Jprogramming] Scanning a large file

2006-05-16 Thread Yoel Jacobsen
I have tried it on a 1.2GB file. Since my laptop has only 1GB RAM I killed the process when it had consumed 500MB (and rising). Yoel On 5/15/06, Henry Rich [EMAIL PROTECTED] wrote: Try x ([: I. E.) y -- For information

RE: [Jprogramming] Scanning a large file

2006-05-16 Thread Miller, Raul D
Chris Burke wrote: if. len #dat do. if. p s do. dat=. dat, LF else. 'file not in LF-delimited lines' 13!:8[3 Note that this assumes that the last line of the file is terminated by a line feed. Otherwise, there can be a spurious error if the file is slightly larger
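The point about an unterminated last line can be illustrated with a small sketch. It is written in Python rather than J, and the names ensure_final_lf and split_lines are hypothetical, not taken from the files script; it only shows the idea of appending a missing final LF before cutting the data into lines.

```python
LF = b"\n"

def ensure_final_lf(data: bytes) -> bytes:
    # Append an LF only when the data is non-empty and lacks a final terminator.
    if data and not data.endswith(LF):
        return data + LF
    return data

def split_lines(data: bytes) -> list[bytes]:
    # After normalising, every line ends in LF, so the last split item is empty.
    data = ensure_final_lf(data)
    return data.split(LF)[:-1]
```

With this approach a file whose last line has no terminator yields the same lines as one that has it, so no error needs to be raised for that case.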

RE: [Jprogramming] Scanning a large file

2006-05-16 Thread Joey K Tuttle
At 09:38 -0400 2006/05/16, Miller, Raul D wrote: Chris Burke wrote: if. len #dat do. if. p s do. dat=. dat, LF else. 'file not in LF-delimited lines' 13!:8[3 Note that this assumes that the last line of the file is terminated by a line feed. Otherwise, there can

Re: [Jprogramming] Scanning a large file

2006-05-16 Thread Alain Miville de Chêne
It is all relative. The LF can be seen (as you do) as end of line or as new line. In the first case, all lines should end with end of line. In the second, LF cuts one line from another. When editing a text file, and requesting to place the cursor at end of file, with no LF at the end the

Re: [Jprogramming] Scanning a large file

2006-05-16 Thread Joey K Tuttle
Certainly, in my experience, LF, CR, or CRLF are considered as EOL (in ..IX, MAC, PC OSs). Going way back, these things came from input devices such as the IBM 1050 which was an early typewriter terminal. It had the charming attribute that the return key did just that (returned the carriage as on
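As a rough illustration of treating LF, CR, and CRLF alike as end of line, the following Python sketch (not J, and not from the thread; normalize_eol is a hypothetical name) rewrites all three conventions to a single LF. CRLF is replaced first so that it does not turn into two line breaks.

```python
def normalize_eol(data: bytes) -> bytes:
    # Replace CRLF before bare CR, otherwise each CRLF would become two LFs.
    return data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")
```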

Re: [Jprogramming] Scanning a large file

2006-05-16 Thread Oleg Kobchenko
These are interesting stories about line terminators. I agree on providing all the data. But I think absence of final terminator is more a stylistic issue (or a matter of choice) than a defect. Hence, it is more like truthful conveying than alerting cleanliness. Here's on cygwin: [EMAIL PROTECTED]

Re: [Jprogramming] Scanning a large file

2006-05-16 Thread Joey K Tuttle
OK, MS (not bashing women :) Excel - the problem is, one often doesn't have the choice not to use it in the sense that people send files exported from Excel... A case where you can choose not to use it includes things like trying to use Excel to open a text file that starts with the ascii

Re: [Jprogramming] Scanning a large file

2006-05-16 Thread Alain Miville de Chêne
Our company is entirely using OpenOffice. It is a mature product to replace MS Office. Joey K Tuttle wrote: OK, MS (not bashing women :) Excel - the problem is, one often doesn't have the choice not to use it in the sense that people send files exported from Excel... ...

RE: [Jprogramming] Scanning a large file

2006-05-16 Thread Joey K Tuttle
At 15:29 -0400 2006/05/16, Miller, Raul D wrote: Joey K Tuttle wrote: OK, MS (not bashing women :) Excel - the problem is, one often doesn't have the choice not to use it in the sense that people send files exported from Excel... And sometimes those files are broken or virus infected,

Re: [Jprogramming] Scanning a large file

2006-05-16 Thread Chris Burke
Miller, Raul D wrote: Chris Burke wrote: if. len #dat do. if. p s do. dat=. dat, LF else. 'file not in LF-delimited lines' 13!:8[3 Note that this assumes that the last line of the file is terminated by a line feed. Otherwise, there can be a spurious error if the

Re: [Jprogramming] Scanning a large file

2006-05-16 Thread Chris Burke
Oleg Kobchenko wrote: It's a great idea to include line reading into a standard library. Here are a few comments. There are two differences from the original readlines: - overlapped reading (not once and only once) (with asserting presence of LF in current block) - automatic removal

RE: [Jprogramming] Scanning a large file

2006-05-16 Thread Miller, Raul D
Chris Burke wrote: I am in two minds on the buffer. It does impact performance, though not by much. But it means that after the block of 1e6 bytes is read in, it is immediately copied because it is appended to the tail of the previous block. So the question is whether this performance hit is
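The buffering trade-off described here can be sketched as follows. This is a Python illustration under stated assumptions, not the J definition under discussion: iter_lines is a hypothetical name, and the default block size of 1,000,000 bytes simply mirrors the 1e6 figure mentioned above. The incomplete tail of each block is carried over and prepended to the next block, which is where the extra copy comes from.

```python
import io

def iter_lines(stream, block_size=1_000_000):
    """Yield LF-delimited lines (without the LF) from a binary stream, block by block."""
    tail = b""
    while True:
        block = stream.read(block_size)
        if not block:
            break
        buf = tail + block          # the copy: previous tail joined to the new block
        cut = buf.rfind(b"\n")
        if cut < 0:
            tail = buf              # no complete line yet, keep accumulating
            continue
        for line in buf[:cut].split(b"\n"):
            yield line
        tail = buf[cut + 1:]        # partial line left for the next block
    if tail:
        yield tail                  # last line without a final LF

# Usage with a tiny block size to force lines to span blocks.
lines = list(iter_lines(io.BytesIO(b"one\ntwo\nthree"), block_size=4))
```

Memory use stays bounded by the block size plus the longest line, at the price of one concatenation per block.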

Re: [Jprogramming] Scanning a large file

2006-05-16 Thread Oleg Kobchenko
I am not sure about overlapped either. Raul's idea about special-casing sounds good. And the discussion on spread of copy. In my test, the impact was 5-7% or so -- a good price for streaming. I think the bottleneck is in looping in u;.2 and the line proc itself. I ran the UNIX wc, and it

RE: [Jprogramming] Scanning a large file

2006-05-16 Thread Joey K Tuttle
At 20:54 -0700 2006/05/16, Oleg Kobchenko wrote: http://support.microsoft.com/kb/215591/ ID,NAME 666,MS Don' B H8N Yes - I knew the workaround and even puzzled out that the origination of the bug is that SYLK files begin with ID;. You would think that some bright programmer could