Don, I found this helpful: http://en.wikipedia.org/wiki/UTF-8 . A UTF-8
character starting with 226 (0xE2) should be 3 bytes long. One handy thing with
UTF-8 is that continuation bytes always fall in the range 128-191, so you can
find the multi-byte boundaries from any point without parsing the string from
its left end.
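To illustrate the byte-length rule, here is a minimal sketch in Python rather than J. The helper names utf8_seq_len and char_starts are made up for this example. It reads the sequence length off a lead byte and lists the character start offsets by skipping continuation bytes (those of the form 10xxxxxx).

```python
def utf8_seq_len(lead):
    """Return the UTF-8 sequence length implied by a lead byte (0 if not a lead byte)."""
    if lead < 0x80:
        return 1          # 0xxxxxxx: plain ASCII
    if 0xC2 <= lead <= 0xDF:
        return 2          # 110xxxxx
    if 0xE0 <= lead <= 0xEF:
        return 3          # 1110xxxx
    if 0xF0 <= lead <= 0xF4:
        return 4          # 11110xxx
    return 0              # continuation byte (10xxxxxx) or invalid lead


def char_starts(data):
    """Offsets where a character starts: every byte that is not 10xxxxxx."""
    return [i for i, b in enumerate(data) if (b & 0xC0) != 0x80]


print(utf8_seq_len(226))                    # 3
print(utf8_seq_len(195))                    # 2
print(char_starts(bytes([92, 195, 191])))   # [0, 1]
```

The last line uses the three trailing bytes of the ÿ path quoted further down in this thread (92 195 191): a backslash followed by one two-byte character.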
The case of ÿ took a bit of research. I was able to get Bill Lam's solution to
work for this case with this change to bfsize:
try. fh=: CreateFileR ((1 u: 7 u: y),{.a.);0;0;NULLPTR;OPEN_EXISTING;0;0 catch.
I don't think this is a general solution, so I'll continue my research.
As far as I can tell, ÿ is a valid 'extended ASCII' character. When J gets this
name in fdir, it gets displayed using the UTF-8 code (195 191{a.). Bill's verb
remaps this code back to a single 8 bit character:
a.i.(1 u: 7 u: ]) 'ÿ'
255
Which works fine as input to the ANSI version of CreateFileR to open the file.
With a UTF-8 name that cannot be mapped to single character values less than
256, this method fails:
c_p=: '沒有問題'
a.i. c_p
230 178 146 230 156 137 229 149 143 233 161 140
'' 1!:2 <c_p
fdir c_p
+------------+------------------+-+---+------+
|沒有問題|2008 12 2 15 34 22|0|rw-|-----a|
+------------+------------------+-+---+------+
bfsize_jbf_ 'ÿ'
0 300
bfsize_jbf_ c_p
+---+--------------------------------------------------------------------+
|123|The filename, directory name, or volume label syntax is incorrect. |
+---+--------------------------------------------------------------------+
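The difference between the two cases can be sketched outside J. The Python below is only an illustration. It assumes that 1 u: 7 u: behaves like decoding UTF-8 and then keeping only code points below 256, which is what Latin-1 encoding does, and the helper name to_single_bytes is hypothetical.

```python
def to_single_bytes(utf8_bytes):
    """Decode UTF-8 and re-encode as one byte per character, or None if impossible."""
    text = utf8_bytes.decode("utf-8")
    try:
        return text.encode("latin-1")   # only works for code points below 256
    except UnicodeEncodeError:
        return None


y_umlaut = bytes([195, 191])                       # UTF-8 for ÿ
print(list(to_single_bytes(y_umlaut)))             # [255]

c_p = "沒有問題".encode("utf-8")
print(list(c_p)[:3])                               # [230, 178, 146]
print(to_single_bytes(c_p))                        # None

# The xxW (Unicode) Windows calls take UTF-16 instead, which can hold any name.
print(len("沒有問題".encode("utf-16-le")))          # 8
```

So the remapping trick only helps for names whose characters all fit in a single byte; anything else needs the wide-character route.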
More later
--
David Mitchell
Don Guinn wrote:
Well David, I am struggling with UTF8 as well. Let me give you another
interesting problem. As you can probably tell from the things I have been
running into, I'm trying to look at disks with some of the crazy names that
various software packages create. To make matters worse, some video files
are over 2G. So, I have to use bigfiles to get the proper size.
After I dug into another error I just got, it looks like the filename
contains what may be an invalid UTF8 character. Here is what I got: At
offset 74 into yy is a pair of invalid (I think) UTF8 characters. They
are: 226 128{a. . From what I understand about UTF8, the low-order six bits
in the second character should not be all zeros. But I'm probably wrong. I
used Windows directory display and got the file name there and verified that
they are really there and not something done by fdir. It looks like whoever
created the name wanted to put in an apostrophe. This is the first 30 bytes
from the filename in the Windows directory:
09 From Holberg's Time- Suit
Gee, looking at the text displayed in this E-Mail it comes out as an
apostrophe leaning to the right! Maybe it is a valid UTF8 code.
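A quick check of that guess, sketched in Python rather than J. The third byte, 153, is an assumption: the message only shows 226 128, and 153 is what would complete the sequence for a right-leaning apostrophe (U+2019).

```python
second = 128
# A continuation byte only needs its top two bits to be 10; the low six may be zero.
print((second & 0xC0) == 0x80)                 # True

# Assumed full sequence: 226 128 153 (the 153 is not shown in the message).
apostrophe = bytes([226, 128, 153]).decode("utf-8")
print(hex(ord(apostrophe)))                    # 0x2019

# The two bytes alone are just an incomplete 3-byte sequence.
try:
    bytes([226, 128]).decode("utf-8")
    complete = True
except UnicodeDecodeError:
    complete = False
print(complete)                                # False
```

Under that assumption, 226 128 is simply the first two bytes of a valid 3-byte character, not an invalid pair.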
But what puzzles me about all this is how do fdir and fsize find the file
when the calls bigfiles uses do not? By the way, ignore the double
backslashes in the file name. Bug in my program, but it doesn't seem to
hurt. Will fix. Another mystery as to why it doesn't hurt.
yy
D:\\Documents\My Music\Grieg\Edvard Grieg\Grieg- Peer Gynt\09 From Holberg's
Time- Suite in Olden Style ('Holberg Suite'), for piano (or string
orchestra), Op. 40- No. 1, Prelude.wma
fsize yy
2848179
bfsize_jbf_ yy
+-+--------------------------------------------+
|2|The system cannot find the file specified. |
+-+--------------------------------------------+
bfsize_jbf_ >(1 u: 7 u: ])yy
+---+--------------------------------------------------------------------+
|123|The filename, directory name, or volume label syntax is incorrect. |
+---+--------------------------------------------------------------------+
On Tue, Dec 2, 2008 at 8:12 AM, David Mitchell <[EMAIL PROTECTED]> wrote:
Sigh, my caution came home to roost. When I wrote bigfiles, I hadn't done
much work with Unicode, so I used the xxA versions (ANSI) of the Windows API
functions and didn't put any provision in for using the xxW (Unicode)
versions.
Let me see if I can fix this simply.
--
David Mitchell
Don Guinn wrote:
Thank you so much for the suggestion. Will try.
In the meantime I ran into this cute problem of a UTF8 character in a file
name. I have no idea how it got there, but it fouled up bigfiles. I looked
in the directory using the Vista directory display and it showed the file
name with the "y" with the double dots over it as well. I have no idea how
Windows handles UTF8 characters in a file name, but parts of it seem to work
and some not. I don't know if there is really anything in bigfiles to fix,
but it is interesting.
bfsize_jbf_ 'D:\Documents\Jump Drive\Backup\J\J5User\ÿ'
+-+--------------------------------------------+
|2|The system cannot find the file specified. |
+-+--------------------------------------------+
fdir 'D:\Documents\Jump Drive\Backup\J\J5User\ÿ'
+--+---------------+--+---+------+
|ÿ|2004 8 5 7 5 44|36|rw-|-----a|
+--+---------------+--+---+------+
a.i._3{.'D:\Documents\Jump Drive\Backup\J\J5User\ÿ'
92 195 191
On Mon, Dec 1, 2008 at 5:45 PM, David Mitchell <[EMAIL PROTECTED]> wrote:
The simpler solution:
Change bigfiles.ijs
...
bfsize=: 3 : 0
if. t y do.
try. fh=: CreateFileR (y,{.a.);0;0;NULLPTR;OPEN_EXISTING;0;0 catch.
...
Changing the second parameter of CreateFileR to 0 allows bfsize to get a
file handle of an open file.
--
David Mitchell
(clip)
Don Guinn wrote:
Here is the problem: A file "xxx" in the current directory is opened in one
J session, and then another session tries to get its size using bigfiles.
Session 0:
h=:1!:21<'xxx'
h
34035648
Session 1:
load 'C:\j602\system\packages\files\bigfiles.ijs'
fdir 'xxx'
+---+-------------------+-+---+------+
|xxx|2006 12 15 10 41 25|3|rw-|-----a|
+---+-------------------+-+---+------+
bfsize_jbf_ 'xxx'
+--+---------------------------------------------------------------------------------+
|32|The process cannot access the file because it is being used by another process.  |
+--+---------------------------------------------------------------------------------+
Notice that somehow fdir is able to get the size of the file but bfsize_jbf_
is not.
Back to session 0:
1!:22 h
1
And now in session 1:
bfsize_jbf_ 'xxx'
0 3
Once the file is closed in session 0 the file is accessible to bigfiles.
How is fdir able to get to the file size? Is it reading the directory
directly avoiding having to open the file? Or what?
On Sun, Nov 30, 2008 at 7:43 PM, David Mitchell <[EMAIL PROTECTED]> wrote:
Don Guinn wrote:
I ran into problems when using fdir (1!:0) when the files were larger than 2
gig. The size of the file is wrong as is documented. I addressed this by
using bigfiles and it worked well until I tried to get the size of a file
that was open by another application. Then bfsize_jbf_ fails. What confuses
me is that somehow 1!:0 can retrieve the size of the file (though incorrect)
even though it is open to another application. As best as I can tell all
file tools provided by Microsoft require file handles implying that the file
has been opened.
So how does 1!:0 get the file size for files open to other applications? It
would be nice if bigfiles could use the same technique.
----------------------------------------------------------------------
For information about J forums see
http://www.jsoftware.com/forums.htm
This worked for me:
NB. ANSI FindFirstFile looks the file up by name without opening it
FindFirstFile=: 'kernel32 FindFirstFileA i *c *'&cd
FindClose=: 'kernel32 FindClose i i'&cd
findd=:318$'a'   NB. buffer that receives the WIN32_FIND_DATA structure
ffsize=:28+i.8   NB. bytes 28-35: nFileSizeHigh, then nFileSizeLow
INVALID_HANDLE_VALUE=: _1
NB. =========================================================
NB.*fffsize v get file size using windows API
NB. form: fffsize file_path_name
NB. result: file size, or '' if the lookup fails
fffsize=: 3 : 0
'fh ft fv'=. FindFirstFile y;findd
if. INVALID_HANDLE_VALUE~:fh do.
  r=.b32to64 ctoi"1]2 4$ffsize{fv
  FindClose fh
else.
  r=.''
end.
r
)
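For readers who do not use J, the size extraction step can be sketched in Python. This only illustrates the arithmetic on a hand-built buffer and makes no Windows call. The offset 28 and the high-then-low order follow the WIN32_FIND_DATA layout, and the function name file_size_from_find_data is hypothetical.

```python
import struct

FIND_DATA_SIZE = 318   # same buffer length as findd above
SIZE_OFFSET = 28       # nFileSizeHigh at 28-31, nFileSizeLow at 32-35


def file_size_from_find_data(buf):
    """Combine the two little-endian 32-bit halves into one 64-bit size."""
    high, low = struct.unpack_from("<II", buf, SIZE_OFFSET)
    return (high << 32) | low


# Fake buffer standing in for what FindFirstFile would fill in.
buf = bytearray(FIND_DATA_SIZE)
struct.pack_into("<II", buf, SIZE_OFFSET, 1, 5)
print(file_size_from_find_data(buf))   # 4294967301
```

With the high half at 0 the result is just the low half, which is why sizes under 4 GB look the same either way.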
--
David Mitchell