Ulf Zibis wrote:
On 10.05.2010 19:53, Ulf Zibis wrote:
On 10.05.2010 03:05, Xueming Shen wrote:
Ulf,
Can you be more specific? I'm not sure I understand your question.
What "buffering"
are we talking about here?
In http://cr.openjdk.java.net/~sherman/6945564_6948903/webrev ,
I think byte[]
On 11.05.2010 18:41, Xueming Shen wrote:
Ulf Zibis wrote:
SOME of my comments below ARE meant for
http://cr.openjdk.java.net/~sherman/6945564_6948903/webrev
I marked the others. ;-)
- use Arrays.binarySearch() in Character.UnicodeBlock.of().
This one can be discussed in a separate thread,
Ulf Zibis wrote:
SOME of my comments below ARE meant for
http://cr.openjdk.java.net/~sherman/6945564_6948903/webrev
I marked the others. ;-)
- use Arrays.binarySearch() in Character.UnicodeBlock.of().
This one can be discussed in a separate thread, I would prefer to stay
with the script supp
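
For readers unfamiliar with the suggestion, the idea of a range lookup via Arrays.binarySearch() can be sketched as follows. This is only an illustration and not the code in Character.UnicodeBlock.of(): the class name, the arrays, and the four-block table are assumptions made up for the example.

```java
import java.util.Arrays;

public class BlockLookupSketch {
    // Hypothetical, tiny table: sorted first code points of four blocks.
    private static final int[] BLOCK_STARTS = {0x0000, 0x0080, 0x0100, 0x0180};
    private static final String[] BLOCK_NAMES = {
        "BASIC_LATIN", "LATIN_1_SUPPLEMENT", "LATIN_EXTENDED_A", "LATIN_EXTENDED_B"
    };
    // Last code point covered by this illustrative table.
    private static final int LAST_CODE_POINT = 0x024F;

    /** Returns the block name for cp, or null if cp is outside the table. */
    static String blockOf(int cp) {
        if (cp > LAST_CODE_POINT) {
            return null;
        }
        int i = Arrays.binarySearch(BLOCK_STARTS, cp);
        if (i < 0) {
            // Not an exact block start: binarySearch returns -(insertionPoint) - 1,
            // and the containing block is the one just before the insertion point.
            i = -i - 2;
        }
        return i < 0 ? null : BLOCK_NAMES[i];
    }

    public static void main(String[] args) {
        System.out.println(blockOf('A'));
        System.out.println(blockOf(0x17F));
        System.out.println(blockOf(0x250));
    }
}
```

The lookup costs O(log n) comparisons instead of a linear scan over all block ranges.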
On 10.05.2010 19:53, Ulf Zibis wrote:
On 10.05.2010 03:05, Xueming Shen wrote:
Ulf,
Can you be more specific? I'm not sure I understand your question.
What "buffering"
are we talking about here?
In http://cr.openjdk.java.net/~sherman/6945564_6948903/webrev ,
I think byte[] ba could be saved
SOME of my comments below ARE meant for
http://cr.openjdk.java.net/~sherman/6945564_6948903/webrev
I marked the others. ;-)
-Ulf
On 11.05.2010 02:05, Xueming Shen wrote:
Ulf,
My apologies for distracting you with that "smaller size alternative"; as
I said in my previous email,
please only "re
Ulf,
My apologies for distracting you with that "smaller size alternative"; as I
said in my previous email,
please only "review" the bits at
http://cr.openjdk.java.net/~sherman/6945564_6948903/webrev
It's fine if you are interested in the stuff I experimented at
http://cr.openjdk.java.net/~sherman/
Some additional thoughts:
- out.writeShort((short)(num & 0x)); ---short form--->
out.writeShort((short)num);
- use Arrays.binarySearch() in Character.UnicodeBlock.of().
- "if (notFirst)" could be saved if you would first append the first
word to sb outside the while loop.
- StringBuilder
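
A small check of the first suggestion above. The mask in the quoted line is cut off in this listing, so the sketch assumes it was 0xFFFF; under that assumption the masked and unmasked casts give the same short for every int. The class and method names are made up for the example.

```java
public class NarrowingCastDemo {
    // Long form, assuming the elided mask was 0xFFFF.
    static short masked(int num) {
        return (short) (num & 0xFFFF);
    }

    // Short form: the narrowing cast already keeps only the low 16 bits.
    static short plain(int num) {
        return (short) num;
    }

    public static void main(String[] args) {
        int[] samples = {0, 1, -1, 0x7FFF, 0x8000, 0xFFFF, 0x12345678, Integer.MIN_VALUE};
        for (int n : samples) {
            System.out.println(n + " -> " + masked(n) + " / " + plain(n));
        }
    }
}
```

Note also that DataOutputStream.writeShort takes an int and writes only its two low-order bytes, so the explicit cast is not needed for correctness either.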
Ulf,
The stuff under http://cr.openjdk.java.net/~sherman/script/webrev.00 is just an
idea about a
smaller-size alternative. It is not intended to replace the final bits
for review at
http://cr.openjdk.java.net/~sherman/6945564_6948903/webrev
My bad, I probably should not have mixed two things in one email.
On 10.05.2010 03:05, Xueming Shen wrote:
Ulf,
Can you be more specific? I'm not sure I understand your question.
What "buffering"
are we talking about here?
In http://cr.openjdk.java.net/~sherman/6945564_6948903/webrev ,
I think byte[] ba could be saved in initNamePool(), as you could
directly
Ulf,
Can you be more specific? I'm not sure I understand your question. What
"buffering"
are we talking about here? If you are referring to the code below
dis = new DataInputStream(new InflaterInputStream(
AccessController.doPrivileged(new PrivilegedAction()
{
public InputS
Sherman, I don't understand why you use so much buffering.
The InputStream from getResourceAsStream, and I believe InflaterInputStream
too, is already buffered.
My understanding until now was that access to buffered byte streams is
as fast as access to naked byte arrays.
Am I wrong?
-Ulf
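
To make the stream layering under discussion concrete, here is a minimal, self-contained sketch of reading deflated data through DataInputStream over InflaterInputStream into a byte[]. It is not the webrev code: the class and method names are hypothetical, and an in-memory byte array stands in for the resource stream.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.InflaterInputStream;

public class InflateReadDemo {
    /** Compresses raw bytes so that there is something to inflate. */
    static byte[] deflate(byte[] raw) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (DeflaterOutputStream dos = new DeflaterOutputStream(bos)) {
            dos.write(raw);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bos.toByteArray();
    }

    /** Inflates exactly rawLength bytes into one array with a single readFully call. */
    static byte[] inflateFully(byte[] compressed, int rawLength) {
        try (DataInputStream dis = new DataInputStream(
                new InflaterInputStream(new ByteArrayInputStream(compressed)))) {
            byte[] ba = new byte[rawLength];
            dis.readFully(ba);
            return ba;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] raw = "LATIN CAPITAL LETTER A".getBytes(StandardCharsets.US_ASCII);
        byte[] back = inflateFully(deflate(raw), raw.length);
        System.out.println(new String(back, StandardCharsets.US_ASCII));
    }
}
```

InflaterInputStream keeps an internal buffer for the compressed input, which is the buffering the question refers to; whether an extra byte[] on top is worthwhile is exactly what the thread debates.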
On 08.05.2010
Hi,
The API proposals for Unicode script support below have been approved.
6945564: Unicode script support in Character class
6948903: Make Unicode scripts available for use in regular expressions
Here is the final webrev ready for push.
http://cr.openjdk.java.net/~sherman/6945564_6948903/web
Hi,
#4860714 has been closed as a dup (to work around an internal process
problem) of my newly created
#6948903 for the regex script support.
So here are the CCC drafts for
6945564: Unicode script support in Character class
6948903: Make Unicode scripts available for use in regular expressions
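
For orientation, this is roughly how the Character-side API of 6945564 is used in the form that shipped with Java 7 (Character.UnicodeScript.of, Character.UnicodeScript.forName, Character.getName). The wrapper class and its method names are made up for the example.

```java
public class ScriptApiDemo {
    /** Script of a single code point. */
    static Character.UnicodeScript scriptOf(int codePoint) {
        return Character.UnicodeScript.of(codePoint);
    }

    /** Script looked up by its Unicode script name or alias (case-insensitive). */
    static Character.UnicodeScript scriptNamed(String name) {
        return Character.UnicodeScript.forName(name);
    }

    /** Unicode character name of a code point. */
    static String nameOf(int codePoint) {
        return Character.getName(codePoint);
    }

    public static void main(String[] args) {
        System.out.println(scriptOf(0x03B1));      // GREEK SMALL LETTER ALPHA
        System.out.println(scriptNamed("Greek"));
        System.out.println(nameOf('A'));
    }
}
```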
I have corrected the statistics:
current code from Sherman:
- A Map.Entry object counts 24 bytes (40 on 64-bit machine)
- An Integer object for the key counts 12 bytes (20 on 64-bit machine)
- A String object counts 36 + 2*length, so for average character name
length of 26:
88 bytes (102
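
The 32-bit figures quoted above can be checked with simple arithmetic. The sketch below only restates the numbers from the message (24 bytes per Map.Entry, 12 per Integer key, 36 + 2*length per String); the per-entry sum is derived here and does not appear in the truncated preview, and the class and method names are hypothetical.

```java
public class EntryFootprint {
    // 32-bit figures as quoted in the message.
    static final int MAP_ENTRY_BYTES = 24;
    static final int INTEGER_KEY_BYTES = 12;

    /** String footprint per the message's formula: 36 + 2 * length. */
    static int stringBytes(int length) {
        return 36 + 2 * length;
    }

    /** Derived sum of entry, key, and name string for one map entry. */
    static int perEntryBytes(int nameLength) {
        return MAP_ENTRY_BYTES + INTEGER_KEY_BYTES + stringBytes(nameLength);
    }

    public static void main(String[] args) {
        int avgNameLength = 26; // average character name length from the message
        System.out.println("String:    " + stringBytes(avgNameLength) + " bytes");
        System.out.println("Per entry: " + perEntryBytes(avgNameLength) + " bytes");
    }
}
```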
On 24.04.2010 01:09, Xueming Shen wrote:
Yes, the final table takes about 500k; we might consider using a
weakref or something, if memory is really
a concern. But the table will get initialized only if you invoke
Character.getName(),
Sherman, how did you compute that value:
- A Map.Entry obj
On 27.04.2010 19:03, Xueming Shen wrote:
Ulf Zibis wrote:
I'm wondering, as script.txt only has ~120k.
Ulf, you know we are not talking about Unicode script but Unicode
character name here, right?
Unicode character name data is stored in UnicodeData.txt, you can find
it at make/tools/Unic
Ulf Zibis wrote:
On 27.04.2010 06:25, Xueming Shen wrote:
Ulf Zibis wrote:
On 24.04.2010 01:09, Xueming Shen wrote:
I changed the data file "format" a bit, so now the overall
uniName.dat is less than 88k (last version is 122+k), but
then I can no longer use cpLen as the capacity for the hash
Oops, added attachment.
-Ulf
On 27.04.2010 16:35, Ulf Zibis wrote:
On 27.04.2010 06:25, Xueming Shen wrote:
Ulf Zibis wrote:
On 24.04.2010 01:09, Xueming Shen wrote:
I changed the data file "format" a bit, so now the overall
uniName.dat is less than 88k (last version is 122+k), but
th
On 27.04.2010 06:25, Xueming Shen wrote:
Ulf Zibis wrote:
On 24.04.2010 01:09, Xueming Shen wrote:
I changed the data file "format" a bit, so now the overall
uniName.dat is less than 88k (last version is 122+k), but
then I can no longer use cpLen as the capacity for the hashmap. I'm now
usin
Ulf Zibis wrote:
I would like to have the 3 special cases INHERITED, COMMON and
UNKNOWN together at the beginning or end of the enum list.
Why? Since the current list is generated by the script from
Scripts.txt, it's in the order in which
they appear in Scripts.txt; any particular reason
Ulf Zibis wrote:
On 24.04.2010 01:09, Xueming Shen wrote:
I changed the data file "format" a bit, so now the overall uniName.dat
is less than 88k (last version is 122+k), but
then I can no longer use cpLen as the capacity for the hashmap. I'm now
using a hardcoded 2 for 5.2.
Again, is 88k
On 24.04.2010 01:09, Xueming Shen wrote:
I changed the data file "format" a bit, so now the overall uniName.dat
is less than 88k (last version is 122+k), but
then I can no longer use cpLen as the capacity for the hashmap. I'm now
using a hardcoded 2 for 5.2.
Again, is 88k the compressed or
On 27.04.2010 00:01, Xueming Shen wrote:
Ulf Zibis wrote:
I would like to see the full names redundantly in the aliases map.
Needs only ~100 * (4 + 4) bytes in HashMap.
These are implementation details; we can defer the difference for now.
I said that with the alternative of UnicodeScript
Ulf Zibis wrote:
On 26.04.2010 07:28, Xueming Shen wrote:
Can I assume we are all OK with at least the API part of the latest
webrev/blenderrev of
the script support in j.l.Character and j.u.r.Pattern, including
j.l.Character.getName().
I guess you mean:
public static enum Unicod
Ulf Zibis wrote:
On 24.04.2010 01:09, Xueming Shen wrote:
Ulf Zibis wrote:
- I like the idea of saving the data in a compressed binary file
instead of classfile static data.
- wouldn't PreHashMaps be initialized faster than normal HashMaps in
j.l.Character.UnicodeScript and j.l.CharacterName?
On 24.04.2010 01:09, Xueming Shen wrote:
Ulf Zibis wrote:
- I like the idea of saving the data in a compressed binary file
instead of classfile static data.
- wouldn't PreHashMaps be initialized faster than normal HashMaps in
j.l.Character.UnicodeScript and j.l.CharacterName?
I don't think so.
On 26.04.2010 07:28, Xueming Shen wrote:
Can I assume we are all OK with at least the API part of the latest
webrev/blenderrev of
the script support in j.l.Character and j.u.r.Pattern, including
j.l.Character.getName().
I guess you mean:
public static enum UnicodeScript {
Can I assume we are all OK with at least the API part of the latest
webrev/blenderrev of
the script support in j.l.Character and j.u.r.Pattern, including
j.l.Character.getName().
http://cr.openjdk.java.net/~sherman/script/blenderrev.html
http://cr.openjdk.java.net/~sherman/script/webrev
Martin Buchholz wrote:
Providing script support is obvious and non-controversial,
because other regex programming environments provide it.
Check that the behavior and syntax of the extension is
consistent with e.g. ICU, python, and especially perl
(5.12 just released!)
http://perldoc.perl.org/pe
Providing script support is obvious and non-controversial,
because other regex programming environments provide it.
Check that the behavior and syntax of the extension is
consistent with e.g. ICU, python, and especially perl
(5.12 just released!)
http://perldoc.perl.org/perlunicode.html
I would a
On 24.04.2010 01:09, Xueming Shen wrote:
Ulf Zibis wrote:
- I like the idea of saving the data in a compressed binary file
instead of classfile static data.
- wouldn't PreHashMaps be initialized faster than normal HashMaps in
j.l.Character.UnicodeScript and j.l.CharacterName?
I don't think so.
Ulf Zibis wrote:
- I like the idea of saving the data in a compressed binary file
instead of classfile static data.
- wouldn't PreHashMaps be initialized faster than normal HashMaps in
j.l.Character.UnicodeScript and j.l.CharacterName?
I don't think so. The key for these 2 cases is the whole unico
Yuri Gaevsky wrote:
Hi Sherman,
A couple of minor comments:
- There is a typo (Uniocde) in
Character.UnicodeScript.forName(java.lang.String):
"Returns the UnicodeScript with the given Uniocde script name or the
script
name alias. "
- Shouldn't the method be more specific i
Ulf Zibis wrote:
(3) the syntax for script constructs. In addition to the "normal"
\p{InScriptName} and \P{InScriptName} for the script support
I'm also adding
\p{script=ScriptName} \P{script=ScriptName} for the new script
support
\p{block=BlockName} \P{block=BlockName} for the "
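
As an illustration of the keyword forms quoted above, the following sketch exercises \p{script=...}, \P{script=...} and \p{block=...} with java.util.regex.Pattern as they behave in released JDKs (Java 7 and later). The helper class is hypothetical, and the sketch does not cover the prefix forms discussed in the proposal.

```java
import java.util.regex.Pattern;

public class ScriptRegexDemo {
    /** True if the whole input matches the regex. */
    static boolean matches(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        String greek = "\u03b1\u03b2\u03b3"; // alpha, beta, gamma
        // Characters whose script is Greek.
        System.out.println(matches("\\p{script=Greek}+", greek));
        // Negated form: characters whose script is not Greek.
        System.out.println(matches("\\P{script=Greek}+", "abc"));
        // Characters in the Greek block.
        System.out.println(matches("\\p{block=Greek}+", greek));
    }
}
```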
On 22.04.2010 10:01, Xueming Shen wrote:
Hi,
Here is the webrev of the proposal to add Unicode script support in
regex and j.l.Character.
http://cr.openjdk.java.net/~sherman/script/webrev
and the corresponding blenderrev
http://cr.openjdk.java.net/~sherman/script/blenderrev.html
Please c