
Re: Suggestions?

2018-02-22 Thread John Knightley via Unicode


On 22.02.2018 05:01, David Starner via Unicode wrote:

On Wed, Feb 21, 2018 at 7:55 AM Jeb Eldridge via Unicode
 wrote:


Where can I post suggestions and feedback for Unicode?


Here is as good as any place. There are specific places for a few
specific things, but likely if you do have something that's likely to
get changed, you'll need the help of someone here to get through the
process. It is a quarter-century old technical standard embedded in
most electronics, so I would temper any expectations for major
changes; it works the way it works because that's the way previous
versions worked, and nobody is interested in the trouble changing
them would involve.



Yes and no. This list is for informal discussion, so someone unsure 
about things may start here, but posting on this list does not count as 
feedback or suggestions to Unicode. So by all means post here some of 
your ideas and understand more.


Regards
John Knightley




Re: Suggestions?

2018-02-21 Thread David Starner via Unicode

On Wed, Feb 21, 2018 at 7:55 AM Jeb Eldridge via Unicode <
unicode@unicode.org> wrote:

> Where can I post suggestions and feedback for Unicode?
>

Here is as good as any place. There are specific places for a few specific
things, but likely if you do have something that's likely to get changed,
you'll need the help of someone here to get through the process. It is a
quarter-century old technical standard embedded in most electronics, so I
would temper any expectations for major changes; it works the way it works
because that's the way previous versions worked, and nobody is interested
in the trouble changing them would involve.

Re: Suggestions?

2018-02-21 Thread James Kass via Unicode

http://www.unicode.org/faq/faq_on_faqs.html#34

Re: Suggestions?

2018-02-21 Thread Philippe Verdy via Unicode

The Unicode website has a section for feedback in its menu, but in separate
projects for TUS and for CLDR.
Feedback is also requested for every proposed amendment to the
standard, annexes, and data. First search for the relevant topic on the
website, then look at the sidebar if there's no specific feedback link in
the main page content.
Feedback or proposals are submitted through an online form, and will then be
forwarded by email to interested subcommittees and possible subscribers.
For data submission to CLDR, this is done by the survey tool, when it is
open.
For reference implementations, that have an opensourced repository,
feedback is submitted via the links given in the repository itself.

Basically, you need to look for the most relevant topic, and then use the
appropriate link so that this can be sorted and sent to the correct people.
There's also a feedback for questions related to Unicode memberships, or
for legal requests.

There's also a general feedback link, but don't expect an immediate
response: it may take time to reach the right people to get an answer, and
unsorted or unqualified feedback takes time to be classified and extracted
from the fog of incoming spam and irrelevant submissions.

If you don't know where to post, this mailing list can guide you, but this
is not the place to submit a formal request. Various people (including
me) may reply to you, and any reply you receive from this list is not
officially endorsed by Unicode. This is more of a "community" list used to
connect interested people, to discuss how to improve proposals, to get
guidance before submitting a qualified formal request, or to ask for peer
review before submitting it.

2018-02-21 16:23 GMT+01:00 Jeb Eldridge via Unicode :

> Where can I post suggestions and feedback for Unicode?

Re: Suggestions?

2018-02-21 Thread Asmus Freytag via Unicode


  
  
On 2/21/2018 7:23 AM, Jeb Eldridge via Unicode wrote:

> Where can I post suggestions and feedback for Unicode?

What kinds of suggestions / what kind of
feedback are we talking about?
A./

Suggestions?

2018-02-21 Thread Jeb Eldridge via Unicode



Where can I post suggestions and feedback for Unicode?

Re: Suggestions in Unicode Indic FAQ

2003-02-05 Thread Doug Ewell

Kent Karlsson  wrote:

> Consider English.  If I write "", that may well be a spell error.

Or even "Ŋŋŋŋ!", as Michael Everson wrote in WG2 N2306.

-Doug Ewell
 Fullerton, California

RE: Suggestions in Unicode Indic FAQ

2003-02-03 Thread Kent Karlsson




> > No, with proper reordering (and "normal" display mode), the e-matra at
> > the beginning of the second word would appear to be last glyph of the
> > first "word".  Similarly, for the second case, the e-matra glyph would
> > have come to the left of the pa.  The fluent reader (ok, not me...)
> > would then see those errors anyway, just like I can find spelling
> > errors in Swedish, most often without any kind of special marking. (I'm
> > assuming through-out that reordrant combining characters are reordered.)
> 
> Illegal sequences

There are no illegal sequences.

> are not reordered as you indicated.

Then that is a problem with the display software you are using.

> Also, as far as I
> know there is no mention of reordering of illegal input sequence (or
> invalid combining mark) in Unicode standard.

Again, there are no "illegal input sequences".

> Consider the last set of glyphs (left-to-right, top-to-bottom) in the
> attached image. It is the rendering effect of illegal input sequence

See above.

> "Devanagari Vowel Sign I" [U+093F] + "Devanagari Letter Ka" 
> [U+0915] and without any dotted circle.

Let's see if I understand you. <093F, 0915> is the input.  Since
093F is a combining character, one should (not must, but should)
treat this *as if* the input was <0020, 093F, 0915>.  Since 093F
is also reordrant, one must reorder it before the preceding base
character (at least, more for consonant clusters), so the output
glyphs would be <>. 
(But your image does not show that.)
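
The two steps described above (treat a leading combining character as if it followed a SPACE, then move a reordrant vowel sign before the preceding base) can be sketched roughly as follows. This is only an illustration under simplifying assumptions: `PREFIX_MATRAS` and `visual_order` are hypothetical names, only U+093F is treated as reordrant, and consonant clusters and ligatures are ignored.

```python
import unicodedata

# Assumption for this sketch: only DEVANAGARI VOWEL SIGN I is reordrant.
PREFIX_MATRAS = {"\u093f"}


def visual_order(text):
    """Return the characters of `text` in a simplified left-to-right glyph order."""
    # A leading combining mark is handled *as if* it were preceded by a SPACE.
    if text and unicodedata.category(text[0]).startswith("M"):
        text = " " + text
    out = []
    for ch in text:
        if ch in PREFIX_MATRAS and out:
            # Reordrant mark: place it before the preceding base character.
            out.insert(len(out) - 1, ch)
        else:
            out.append(ch)
    return out


wrong_order = visual_order("\u093f\u0915")  # vowel sign typed before ka
usual_order = visual_order("\u0915\u093f")  # ka followed by vowel sign
```

Under these assumptions the two inputs give the same glyph order except for the space that sits between the vowel sign and the ka in the first case.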

> As you might be knowing the correct input
> sequence should be U+0915 followed by U+093F.

That would be a different input (whether that is correct or
not depends on the author's intent).

> In that case the result would
> have been similar to what appears right now. 

Similar ONLY if you disregard the space "glyph" that should
have been there.

> (Though some more
> sophisticated font/application may want to replace the appearing glyph for
> U+093F to be substituted by some other glyph with proper attachment point).

That may be.

> Now there is no way that user can identify this illegal input sequence
> without dotted circle.

Yes, there is.  Don't disregard the space "glyph".

> In the worst case even this rendered glyph is
> attached to the character from a class (for example, consonant cluster of
> "Ka" "Virama" "Ma") for which the glyph has been designed to render with.
> In such case even a fluent reader can not identify the error.
> 
> > 
> > There are spelling errors, yes.  But there are other ways of indicating
> > spelling errors, that are (by now) fairly conventional for any language
> > (as long as there is an appropriate dictionary installed), and that also
> > are more general (in catching more spelling errors) and less obtrusive
> > (the author really wants to write it that way, for some reason).
> > 
> > > Apparently, Michka used a non-OpenType Bengali Unicode font when
> > > he embedded the fonts into the page.  As long as you are looking
> > > at the page on-line, with the embedded fonts, these errors are
> > > invisible.  
> > > 
> > > It may be typographically horrible.  It *should* be typographically
> > > horrible in order to illustrate bad sequences clearly.
> > 
> > I'd prefer little red wiggly lines under the word, or yellow background
> > or some such (just for screen display, not for printing; screen grabs
> > not counted).  And that for any spelling "error".
> 
> Spelling mistakes can be categorized into two different classes.

???

> One
> arising from illegal input sequence (e.g., Vowel Sign E as the first
> character in a word)

There are no illegal input sequences.

> and the other one is legal input sequence with no
> contextual meaning in the dictionary.

A simple spell checker just checks if the word is in the 
dictionary or not (without worrying about the context).
That would catch what you call "illegal input sequences" too.
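
As a rough illustration of that point, a context-free dictionary lookup flags a word that starts with a vowel sign in exactly the same way as any other unknown word. The one-word dictionary and the name `misspelled` are made up for this sketch.

```python
def misspelled(words, dictionary):
    """Return the words that are not in the dictionary (no context is used)."""
    return [w for w in words if w not in dictionary]


# Hypothetical one-word dictionary: ka followed by vowel sign i.
dictionary = {"\u0915\u093f"}

# The second word has the vowel sign typed before the consonant.
typed = ["\u0915\u093f", "\u093f\u0915"]

flagged = misspelled(typed, dictionary)
```

No special knowledge of combining marks is needed here; the mistyped sequence is simply not a dictionary entry.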

> While indication of the second type
> of mistake is generally used only in sophisticated applications like word
> processor,

Why?  There is nothing in principle preventing a spell checker
from being used in a "plain text" editor.

> everyone wants to know the first kind of mistake.

Without a spell checker, but with proper rendering, spelling
errors can be detected by a fluent reader, since they look
different also without any dotted circles. For some ambiguous
Indic cases, like a prefix matra, consonant, postfix matra, all
possible character sequences for them are misspellings (as far
as I know).

> With your
> explanation it seems that even plain text editor is not useful at all to
> identify such common typing mistakes!

Consider English.  If I write "", that may well be a spell error.
Do I deserve to get the rendering of that string to be littered by
dotted circles just because a sequence of four n's "has to" be
a spell error?

/Kent K

> - Keyur

RE: Suggestions in Unicode Indic FAQ

2003-02-03 Thread Kent Karlsson

> --- Kent Karlsson <[EMAIL PROTECTED]> wrote:
> > > 
> > > No fallback rendering is coming into picture with your explanation. 
> > 
> > Yes, there is.  A character sequence  (say)
> > is very unlikely to have a ligature, specially adapted (and fitting)
> > adjustment points, or similar.  The rendering would in that sense
> > need to use a fallback mechanism that renders an "approximation"
> > for this rare combination.
> 
> Do you mean to say that an application has to take care of combination of

s/has to/should, also in display,/

> all other Unicode characters with each combining marks in the fallback

Including multiple combining marks on one base character.

> mechanism for such approximation? Can you count the number of combinations
> which may result in millions!?

Many, many more.  Which is why you need a fallback mechanism (rather
than ligatures, adjustment points, etc. which cannot handle that many
combinations).
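
To get a feel for the scale, one can count code points with Python's standard `unicodedata` module. This is only a back-of-the-envelope sketch: it treats every assigned, non-control, non-mark code point as a possible base, and the exact figures depend on the Unicode version bundled with the interpreter.

```python
import sys
import unicodedata

marks = 0  # code points in a Mark category (Mn, Mc, Me)
bases = 0  # other assigned, non-control code points (rough stand-in for bases)

for cp in range(sys.maxunicode + 1):
    cat = unicodedata.category(chr(cp))
    if cat.startswith("M"):
        marks += 1
    elif not cat.startswith("C"):
        bases += 1

one_mark = bases * marks           # base + one combining mark
two_marks = bases * marks * marks  # base + two combining marks

print(f"{marks} marks, {bases} candidate bases")
print(f"base + 1 mark: {one_mark:,}; base + 2 marks: {two_marks:,}")
```

Even with a single mark per base the count is far too large to cover with hand-made ligatures or adjustment points, which is the argument for a generic fallback.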

In the case of Indic postfix and prefix matras, the general handling is
in principle simple: for the postfix ones, nothing special need be done,
for the prefix ones (i.e. the reordrant ones) do the reordering (before
the preceding base character at least, for certain Indic combinations,
move it even earlier).  Then you have the "visual order".  I'm
ignoring ligature formation here, but that has to be done as well. For
the superscript, subscript, and split matras (and other combining
marks) the general  approach is a bit more complicated.  See
http://www.unicode.org/notes/tn2/ for hints.

/Kent K

RE: Suggestions in Unicode Indic FAQ

2003-02-02 Thread Keyur Shroff

--- Kent Karlsson <[EMAIL PROTECTED]> wrote:
> > 
> > No fallback rendering is coming into picture with your explanation. 
> 
> Yes, there is.  A character sequence  (say)
> is very unlikely to have a ligature, specially adapted (and fitting)
> adjustment points, or similar.  The rendering would in that sense
> need to use a fallback mechanism that renders an "approximation"
> for this rare combination.

Do you mean to say that an application has to take care of combination of
all other Unicode characters with each combining marks in the fallback
mechanism for such approximation? Can you count the number of combinations
which may result in millions!?

- Keyur

__
Do you Yahoo!?
Yahoo! Mail Plus - Powerful. Affordable. Sign up now.
http://mailplus.yahoo.com

RE: Suggestions in Unicode Indic FAQ

2003-02-02 Thread Keyur Shroff

--- Kent Karlsson <[EMAIL PROTECTED]> wrote:

> > 
> > Without that dotted circle appearing, the e-matra would appear to
> > have been properly encoded, 
> 
> No, with proper reordering (and "normal" display mode), the e-matra at
> the beginning of the second word would appear to be last glyph of the
> first "word".  Similarly, for the second case, the e-matra glyph would
> have come to the left of the pa.  The fluent reader (ok, not me...)
> would then see those errors anyway, just like I can find spelling
> errors in Swedish, most often without any kind of special marking. (I'm
> assuming through-out that reordrant combining characters are reordered.)

Illegal sequences are not reordered as you indicated. Also, as far as I
know there is no mention of reordering of illegal input sequence (or
invalid combining mark) in Unicode standard.

Consider the last set of glyphs (left-to-right, top-to-bottom) in the
attached image. It is the rendering effect of illegal input sequence
"Devanagari Vowel Sign I" [U+093F] + "Devanagari Letter Ka" [U+0915] and
without any dotted circle. As you might be knowing the correct input
sequence should be U+0915 followed by U+093F. In that case the result would
have been similar to what appears right now. (Though some more
sophisticated font/application may want to replace the appearing glyph for
U+093F to be substituted by some other glyph with proper attachment point).
Now there is no way that user can identify this illegal input sequence
without dotted circle. In the worst case even this rendered glyph is
attached to the character from a class (for example, consonant cluster of
"Ka" "Virama" "Ma") for which the glyph has been designed to render with.
In such case even a fluent reader can not identify the error.

> 
> There are spelling errors, yes.  But there are other ways of indicating
> spelling errors, that are (by now) fairly conventional for any language
> (as long as there is an appropriate dictionary installed), and that also
> are more general (in catching more spelling errors) and less obtrusive
> (the author really wants to write it that way, for some reason).
> 
> > Apparently, Michka used a non-OpenType Bengali Unicode font when
> > he embedded the fonts into the page.  As long as you are looking
> > at the page on-line, with the embedded fonts, these errors are
> > invisible.  
> > 
> > It may be typographically horrible.  It *should* be typographically
> > horrible in order to illustrate bad sequences clearly.
> 
> I'd prefer little red wiggly lines under the word, or yellow background
> or some such (just for screen display, not for printing; screen grabs
> not counted).  And that for any spelling "error".

Spelling mistakes can be categorized into two different classes. One
arising from illegal input sequence (e.g., Vowel Sign E as the first
character in a word) and the other one is legal input sequence with no
contextual meaning in the dictionary. While indication of the second type
of mistake is generally used only in sophisticated applications like word
processor, everyone wants to know the first kind of mistake. With your
explanation it seems that even plain text editor is not useful at all to
identify such common typing mistakes!

- Keyur


RE: Suggestions in Unicode Indic FAQ

2003-01-31 Thread Kent Karlsson

Keyur Shroff wrote:
...
> 
> No fallback rendering is coming into picture with your explanation. 

Yes, there is.  A character sequence  (say)
is very unlikely to have a ligature, specially adapted (and fitting)
adjustment points, or similar.  The rendering would in that sense
need to use a fallback mechanism that renders an "approximation"
for this rare combination.

...
> Here is the para you are talking about.
> 
> [Quote]
[...]
> should be rendered as if they had a space as a base character."
> [/Quote]
> 
> In the text there is no mention of explicitly inputting space character
> before any combining mark that is defective combining character.

The text says "as if". Which I also emphasised before.

> Also, the word "should be rendered" implies that it is recommendation. 

Yes.  A rather good one.  

> > By removing that particular fallback mechanism from implementations
[inserting dotted circle glyphs for allegedly "invalid" combinations]
> > as well as the TUS text!  (I'm serious!) This particular fallback
> > mechanism is NOT recommended as it stands.  
> 
> Note that the text has been written in the section "Implementation
> Guidelines". Can't it be considered as recommendation?

That particular one, no.  Just an example [that isn't very good,
outside of a general "show invisibles" mode].

> > But since its mention is erroneously taken as a recommendation, I'd 
> > suggest removing also its mention.
> 
> This is disastrous! What will happen to the systems which already
> implemented this recommendation!?

It's not a recommendation.

> Will they be considered invalid
> implementation afterwards? What about stability?

They are ugly implementations as they are.  And will stay ugly
implementations.  Stability is good ;-).

/Kent K

RE: Suggestions in Unicode Indic FAQ

2003-01-31 Thread Keyur Shroff

--- Kent Karlsson <[EMAIL PROTECTED]> wrote:
> 
> > Clearly, since in this case the sign is not
> > preceded by any consonant base, it has to be rendered using one of the
> > mechanisms specified in fallback rendering of non-spacing marks.
> 
> If it is preceded by a SPACE (or is first in a string/paragraph/similar)
> it should be rendered as a "freestanding" glyph (no dotted circle).  If it
> is preceded, in the source string, by, say, FULL STOP, a typographically
> acceptable rendering would be to have the vowel sign E glyph float on
> top of the glyph for the FULL STOP (no dotted circle).

No fallback rendering is coming into picture with your explanation. 

> > I add that this is a good way of displaying a combining mark that has no
> > base character, i.e. one occurring at the begin of a line or paragraph.
>
> No, those should be displayed *as if* preceded by a SPACE (TUS 3.0 page 121).

Now here you are really talking about fallback rendering :-). 

Here is the para you are talking about.

[Quote]
"In a degenerate case, a nonspacing mark occurs as the first character in
the text or is separated from its base character by a line separator,
paragraph separator, or other formatting character that causes a positional
separation. This result is called a defective combining character sequence
(see chapter 3.5, Combinations). Defective combining character sequences
should be rendered as if they had a space as a base character."
[/Quote]
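
The degenerate case in the quoted paragraph can be detected mechanically. The following is a minimal sketch, not a complete implementation: `SEPARATORS` and `defective_positions` are hypothetical names, the choice of separator characters is an assumption, and any character in a Mark category (not only nonspacing ones) is treated as a combining mark.

```python
import unicodedata

# Assumption: these characters count as line/paragraph separators here.
SEPARATORS = {"\n", "\r", "\u2028", "\u2029"}


def defective_positions(text):
    """Return indexes of combining marks that start a defective sequence."""
    positions = []
    prev = None
    for i, ch in enumerate(text):
        is_mark = unicodedata.category(ch).startswith("M")
        # A mark at the start of the text, or right after a separator,
        # has no base character to attach to.
        if is_mark and (prev is None or prev in SEPARATORS):
            positions.append(i)
        prev = ch
    return positions


leading = defective_positions("\u0947\u0915")           # vowel sign E first
after_break = defective_positions("\u0915\u0947\n\u0947")  # mark after newline
normal = defective_positions("\u0915\u0947")            # ka + vowel sign E
```

A renderer could then apply whatever fallback it prefers at the reported positions, for example drawing the mark as if it followed a space.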

In the text there is no mention of explicitly inputting space character
before any combining mark that is defective combining character. Also, the
word "should be rendered" implies that it is recommendation. 

> > 
> > Then how can we take care of fallback mechanism?
> 
> By removing that particular fallback mechanism from implementations
> as well as the TUS text!  (I'm serious!) This particular fallback
> mechanism is NOT recommended as it stands.  

Note that the text has been written in the section "Implementation
Guidelines". Can't it be considered as recommendation? (although not
necessary for implementation)

> But since its mention is erroneously taken as a recommendation, I'd 
> suggest removing also its mention.

This is disastrous! What will happen to the systems which already
implemented this recommendation!? Will they be considered invalid
implementation afterwards? What about stability?

- Keyur


RE: Suggestions in Unicode Indic FAQ

2003-01-30 Thread jameskass

.
Kent Karlsson wrote,

> > I add that this is a good way of displaying a combining mark that has no
> > base character, i.e. one occurring at the begin of a line or paragraph.
> 
> No, those should be displayed *as if* preceded by a SPACE (TUS 3.0 page 121).

So it says.  But, the 'space method' could be interpreted as being under
the "Simple Overlap" fallback rendering method, since the paragraph from
which you are quoting appears immediately after the paragraph treating 
with "Simple Overlap" method.

In the first paragraph under "Fallback Rendering" (bottom of page 120), 
the "Show Hidden" method is described.  It would have been redundant to 
say that degenerate cases **under the "Show Hidden" method** need to
be displayed as if on a dotted circle, since the "Show Hidden" method
is **all about displaying on dotted circles** whenever there is an
inability to draw the sequence.  Like, in the event of an invalid
sequence.

Whether to use "Show Hidden" or "Simple Overlap" method to display 
invalid or degenerate sequences should be left up to the various
software developers.

It does seem to be very easy to spot bad input when the dotted circle
appears in the display.  Stands out like a sore thumb, which was
probably the intention.

Perhaps this section of T.U.S. could stand some clarification.

Best regards,

James Kass
.

RE: Suggestions in Unicode Indic FAQ

2003-01-30 Thread John Hudson

At 01:20 AM 1/30/2003, Marco Cimarosti wrote:


However, I totally agree with Kent that this funny rendering is *not* a
requirement of the Unicode standard, as Keyur Shroff seems to suggest. It is
just an example of many "several methods [that] are available to deal with"
strange sequences.


Perhaps there is some confusion here because the use of the dotted circle 
is an explicit recommendation of the MS Indic OpenType spec, which is what 
the majority of Indian font developers are now working with. So there may 
be some confusion between what is expected in the MS spec and what is 
required by Unicode.

John Hudson

Tiro Typeworks		www.tiro.com
Vancouver, BC		[EMAIL PROTECTED]

A book is a visitor whose visits may be rare,
or frequent, or so continual that it haunts you
like your shadow and becomes a part of you.
   - al-Jahiz, The Book of Animals

RE: Suggestions in Unicode Indic FAQ

2003-01-30 Thread Marco Cimarosti

Keyur Shroff wrote:
> > However, I totally agree with Kent that this funny rendering is *not* a
> > requirement of the Unicode standard, as Keyur Shroff seems to suggest. It
> > is just an example of many "several methods [that] are available to deal
> > with" strange sequences.
> 
> A sequence should not be treated as "strange" sequence if it has been
> written intentionally. It may have some contextual meaning.

I said "strange" in the sense of character sequences that are not part of
the ordinary spelling of any language. In fact, a thing like a matra
floating in the air or on a dotted circle is something that you'd only see
in a text (not necessarily *in* an Indian language) which talks about
spelling, character sets, and the like.

> Also, what is good or bad is also subjective. It may also vary from one
> script to another.

Yes, but what is mandatory and what is not in Unicode should not be too
subjective, else we could not call it a "standard".

_ Marco

RE: Suggestions in Unicode Indic FAQ

2003-01-30 Thread Kent Karlsson


> Let me give a proper example this time. Consider a "Vowel Sign E" [U+0947]
> appearing after any non-consonant character. This sign is generally
> attached to the consonants. It has zero advance width with negative left
> side bearing in the font. 

Ok.

> Clearly, since in this case the sign is not
> preceded by any consonant base, it has to be rendered using one of the
> mechanisms specified in fallback rendering of non-spacing marks.

If it is preceded by a SPACE (or is first in a string/paragraph/similar)
it should be rendered as a "freestanding" glyph (no dotted circle).  If it
is preceded, in the source string, by, say, FULL STOP, a typographically
acceptable rendering would be to have the vowel sign E glyph float on
top of the glyph for the FULL STOP (no dotted circle).  Similarly for a
vowel sign E that follows a LATIN CAPITAL LETTER A. (But I don't expect
good positioning, just readable.) Again similarly, a vowel sign I that
follows an EQUAL SIGN should be rendered as a vowel sign I glyph to the
left of an EQUAL SIGN glyph.  No dotted circle. (I know that the reordrant
vowel signs may reorder over more than the preceding base character IF it
is a (sub)string in an Indic script.) Again similarly, a 
string should be rendered as a KA + II + II glyph sequence (invoking
any ligature for KA + II if there is one in the font; II + II is
unlikely to have any ligature, since it is not used by any orthography). 
No dotted circle(s). The fallback hinted at in TUS 3.0 that uses dotted
circles is 1) typographically horrible, and 2) cannot indicate that
there is any error in the given character sequence.

...
> the application. Now in order to render it with dotted circle if we
> introduce the circle in the text before this sign then also the circle is
> invalid base for this "Vowel Sign E".

No base character is invalid for any combining character.

...
> > Languages or syllable boundaries have nothing to do with this. These
> > special
> > sequences should *never* be part of any syllable or word in any language:
> > they are just a way of showing the shape of a glyph, to be used when,
> > e.g., talking about typography or spelling.
> 
> Then how can we take care of fallback mechanism?

By removing that particular fallback mechanism from implementations
as well as the TUS text!  (I'm serious!) This particular fallback
mechanism is NOT recommended as it stands.  But since its mention is
erroneously taken as a recommendation, I'd suggest removing also its
mention.  That mechanism is as bad as misplacing glyphs for combining
marks on the glyph(s) for the follow-on character, if not worse.
("Show invisibles" (for all of the text or a "user" selected run
of the text) is an entirely different story.)

/Kent K

RE: Suggestions in Unicode Indic FAQ

2003-01-30 Thread Kent Karlsson


> > I don't know where you find support for that position in that text.
> > Can you please quote?  There are no "invalid base consonants" for
> > any dependent vowel (for Indic scripts; similarly for any other script).
> 
> Actually, there is a mention of displaying combining marks on dotted
> circles:

I know.  But there is no mention (that I have found) of "invalid base
characters" or any recommendation for using dotted circles especially
for Indic scripts.

> I add that this is a good way of displaying a combining mark that has no
> base character, i.e. one occurring at the begin of a line or paragraph.

No, those should be displayed *as if* preceded by a SPACE (TUS 3.0 page 121).

/Kent K

RE: Suggestions in Unicode Indic FAQ

2003-01-30 Thread Keyur Shroff

--- Marco Cimarosti <[EMAIL PROTECTED]> wrote:

> 
> I add that this is a good way of displaying a combining mark that has no
> base character, i.e. one occurring at the begin of a line or paragraph.
> 
> However, I totally agree with Kent that this funny rendering is *not* a
> requirement of the Unicode standard, as Keyur Shroff seems to suggest. It
> is just an example of many "several methods [that] are available to deal
> with" strange sequences.

A sequence should not be treated as "strange" sequence if it has been
written intentionally. It may have some contextual meaning.

> 
> > Any combining characters can be placed on any base characters without
> > there being any dotted circles displayed.

Not only that, but it is also desirable. How can one write a vowel matra
both with and without dotted circle in a single document if Unicode
recommends to place it only on top of space character? Matra with dotted
circle is sometimes useful as in the case of printing/explaining Unicode
standard. A user may want to hide dotted circle in the same document in
order to explain the actual shape of the matra character, i.e., without
dotted circle. Both kind of rendering behaviour is possible. There should
be some mechanism either to turn on or off dotted circle depending on the
default behaviour.
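
One way to picture such an on/off mechanism is a display-time switch that picks the placeholder base for a mark that has none. This is a hedged sketch only: `with_placeholder` and its `show_dotted_circle` flag are hypothetical, the function works on a display copy of the text rather than the backing store, and it handles only a mark at the very start of the string.

```python
import unicodedata

DOTTED_CIRCLE = "\u25cc"  # U+25CC DOTTED CIRCLE
SPACE = " "


def with_placeholder(text, show_dotted_circle):
    """Return a display copy of `text` with a placeholder base if needed."""
    if text and unicodedata.category(text[0]).startswith("M"):
        base = DOTTED_CIRCLE if show_dotted_circle else SPACE
        return base + text
    return text


on_circle = with_placeholder("\u0947", show_dotted_circle=True)
on_space = with_placeholder("\u0947", show_dotted_circle=False)
unchanged = with_placeholder("\u0915\u0947", show_dotted_circle=True)
```

With a switch like this, the same document could show a matra with or without the dotted circle without changing the stored characters.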

Also, what is good or bad is also subjective. It may also vary from one
script to another.

- Keyur


RE: Suggestions in Unicode Indic FAQ

2003-01-30 Thread Marco Cimarosti

Kent Karlsson wrote:
> Keyur Shroff wrote
> [...]
> > In Indic scripts any sign that appear in text not in conjunction with a
> > valid consonant base may be rendered with dotted circle as fallback
> > mechanism (Section 5.14 "Rendering Nonspacing Marks"
> > http://www.unicode.org/uni2book/ch05.pdf).
> 
> I don't know where you find support for that position in that text.
> Can you please quote?  There are no "invalid base consonants" for
> any dependent vowel (for Indic scripts; similarly for any other script).

Actually, there is a mention of displaying combining marks on dotted
circles:

"Several methods are available to deal with an unknown composed
character sequence that is outside of a fixed, renderable set [...]. One
method (Show Hidden) indicates the inability to draw the sequence by drawing
the base character first and then rendering the nonspacing mark as an
individual unit - with the nonspacing mark positioned on a dotted circle."
(The Unicode Standard 3.0, page 120 - 5.14 Rendering Nonspacing Marks -
Fallback Rendering)

I add that this is a good way of displaying a combining mark that has no
base character, i.e. one occurring at the begin of a line or paragraph.

However, I totally agree with Kent that this funny rendering is *not* a
requirement of the Unicode standard, as Keyur Shroff seems to suggest. It is
just an example of many "several methods [that] are available to deal with"
strange sequences.

> > Any system implementing this as
> > default behaviour should not be considered buggy.
> 
> Indeed they are.  And it should certainly not be default behaviour.

In this case, I disagree with Kent: displaying these dotted circles is not
mandatory, but certainly not a bug.

> Any combining characters can be placed on any base characters without
> there being any dotted circles displayed.

True. But notice that Kent (against his own opinion) correctly wrote "can",
not "must".

> [...]

_ Marco

Re: Suggestions in Unicode Indic FAQ

2003-01-30 Thread Aditya Gokhale




 
To support what Keyur has to say I will add a few more things.

Take for instance a "vowel sign" (matras as we call them here in India),
e.g. say e (U093F), combined with a consonant like ka (U0915): in that
sequence it forms ke. (Please see the first image.) The repositioning of the
shape happens automatically. If anyone puts the e matra (U093F) first and
then the consonant ka (U0915), then they should be highlighted; putting a space
can still make them look like ka. So to make this mistake very explicit, we have
to put a dotted circle. If a wrong combination is stored, it will create a lot
of problems in searching the data as well as in sorting. Please refer to the BIS
(Bureau of Indian Standards) ISCII 91 / 88 documentation for this where in
 

 
 
-Aditya
 
 
- Original Message - 
From: "Keyur Shroff" <[EMAIL PROTECTED]>
To: "'Unicode Mailing List'" <[EMAIL PROTECTED]>

> --- Marco Cimarosti <[EMAIL PROTECTED]> wrote:
> > Keyur Shroff wrote:
> > > But sometimes a user may want visual representation of these
> > > symbols in two different ways: with dotted circle and
> > > without dotted circle.
> >
> > Why not using a dotted circle character explicitly, when you want to see
> > one?
>
> Note that whenever I mention the word "combining mark" I am really talking
> about "vowel signs (matras)" and other modifiers in Indic scripts which is
> script dependent. I am sorry if I have confused you with the combining
> diacritical marks in the block [U+0300-U+036F] which I really didn't mean.
>
> Let me give a proper example this time. Consider a "Vowel Sign E" [U+0947]
> appearing after any non-consonant character. This sign is generally
> attached to the consonants. It has zero advance width with negative left
> side bearing in the font. Clearly, since in this case the sign is not
> preceded by any consonant base, it has to be rendered using one of the
> mechanisms specified in fallback rendering of non-spacing marks. If we
> render it with space, as you said, then we have to insert "space" character
> at the time of fallback rendering (which can be taken care in rendering
> pipeline) even though space character is not present in backing store of
> the application. Now in order to render it with dotted circle if we
> introduce the circle in the text before this sign then also the circle is
> invalid base for this "Vowel Sign E". As a result, again fallback rendering
> will take place with rendering circle and the vowel sign positionally
> separate. In this case first dotted circle will appear which will be
> followed by vowel sign (matra) on top of space character.
>
> If you know any other way to solve this problem then please explain. Also
> let me know if I have misinterpreted the text written in Unicode standard.
>
> > > Example of
> > > this could be RAsup on top of dotted circle and RAsup on top of space
> > > character. Current use of space character to eliminate dotted
> > > circle is really painful and may create problems in determining
> > > language and syllable boundaries.
> >
> > Languages or syllable boundaries have nothing to do with this. These
> > special
> > sequences should *never* be part of any syllable or word in any language:
> > they are just a way of showing the shape of a glyph, to be used when,
> > e.g., talking about typography or spelling.
>
> Then how can we take care of fallback mechanism?
>
> Thanks for taking pain for answering my queries :-)
>
> - Keyur

RE: Suggestions in Unicode Indic FAQ

2003-01-29 Thread Keyur Shroff

--- Marco Cimarosti <[EMAIL PROTECTED]> wrote:
> Keyur Shroff wrote:
> > But sometimes a user may want visual representation of these 
> > symbols in two different ways: with dotted circle and
> > without dotted circle.
> 
> Why not use a dotted circle character explicitly, when you want to see
> one?

Note that whenever I mention the word "combining mark" I am really talking
about "vowel signs (matras)" and other modifiers in Indic scripts which is
script dependent. I am sorry if I have confused you with the combining
diacritical marks in the block [U+0300-U+036F] which I really didn't mean.

Let me give a proper example this time. Consider a "Vowel Sign E" [U+0947]
appearing after any non-consonant character. This sign is generally
attached to consonants. It has zero advance width with a negative left
side bearing in the font. Clearly, since in this case the sign is not
preceded by any consonant base, it has to be rendered using one of the
mechanisms specified for fallback rendering of non-spacing marks. If we
render it with a space, as you said, then we have to insert a "space"
character at the time of fallback rendering (which can be taken care of in
the rendering pipeline) even though no space character is present in the
backing store of the application. Now, if we introduce a dotted circle in
the text before this sign in order to render it with a dotted circle, then
the circle is also an invalid base for this "Vowel Sign E". As a result,
fallback rendering will again take place, with the circle and the vowel sign
rendered positionally separate. In this case the dotted circle will appear
first, followed by the vowel sign (matra) on top of a space character.
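
The situation described above (a sign that is not preceded by a consonant base) can be detected with a short check. This is a rough Python sketch under stated assumptions, not a rendering engine: it treats only U+0915..U+0939 as Devanagari consonants, treats any Mn/Mc character in the Devanagari block as a dependent sign, and the function names are invented for illustration.

```python
import unicodedata

DOTTED_CIRCLE = "\u25CC"  # the DOTTED CIRCLE character itself


def is_devanagari_consonant(ch):
    # Assumption for this sketch: KA..HA (U+0915..U+0939) only.
    return "\u0915" <= ch <= "\u0939"


def is_dependent_sign(ch):
    # Marks in the Devanagari block: general category Mn or Mc
    # (vowel signs, virama, nukta, and similar modifiers).
    return "\u0900" <= ch <= "\u097F" and unicodedata.category(ch) in ("Mn", "Mc")


def signs_without_consonant_base(text):
    """Return indices of Devanagari signs that do not follow a consonant."""
    orphans = []
    has_base = False
    for i, ch in enumerate(text):
        if is_devanagari_consonant(ch):
            has_base = True
        elif is_dependent_sign(ch):
            if not has_base:
                orphans.append(i)
        else:
            # Any other character (space, dotted circle, Latin, ...) ends the cluster.
            has_base = False
    return orphans


print(signs_without_consonant_base("\u0915\u0947"))           # []  (KA + VOWEL SIGN E)
print(signs_without_consonant_base(" \u0947"))                # [1] (SPACE + VOWEL SIGN E)
print(signs_without_consonant_base(DOTTED_CIRCLE + "\u0947")) # [1] (DOTTED CIRCLE + VOWEL SIGN E)
```

What a display engine then does with such a sign (dotted circle, bare glyph, or something else) is exactly the policy question under discussion in this thread; the sketch only finds the candidates.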

If you know any other way to solve this problem then please explain. Also
let me know if I have misinterpreted the text written in Unicode standard.

> 
> > Example of
> > this could be RAsup on top of dotted circle and RAsup on top of space
> > character. Current use of space character to eliminate dotted 
> > circle is really painful and may create problems in determining 
> > language and syllable boundaries.
> 
> Languages or syllable boundaries have nothing to do with this. These
> special
> sequences should *never* be part of any syllabe or word in any language:
> they are just a way of showing the shape of a glyph, to be used when,
> e.g., talking about typography or spelling.

Then how can we take care of the fallback mechanism?

Thanks for taking the trouble to answer my queries :-)

- Keyur


RE: Suggestions in Unicode Indic FAQ

2003-01-29 Thread Kent Karlsson

Keyur Shroff wrote
> Kent Karlsson <[EMAIL PROTECTED]> wrote:
> > 
> > A space followed by a dependent vowel sign should display just the
> > dependent vowel sign, no dotted circle.  Indeed, (except for a "show
> > invisibles" mode, or a "character chart" display mode) no (Indic or
> > other)
> > text that does not contain the *character* DOTTED CIRCLE should ever
> > display a dotted circle as part of the displayed text. Systems that
> > do display a dotted circle (in normal display mode) where there is
> > no such *character* in the displayed text are buggy!
> 
> In Indic scripts any sign that appear in text not in 
> conjunction with a
> valid consonant base may be rendered with dotted circle as fallback
> mechanism (Section 5.14 "Rendering Nonspacing Marks"
> http://www.unicode.org/uni2book/ch05.pdf).

I don't know where you find support for that position in that text.
Can you please quote?  There are no "invalid base consonants" for
any dependent vowel (for Indic scripts; similarly for any other script).

> Any system implementing this as
> default behaviour should not be considered buggy.

Indeed they are.  And it should certainly not be default behaviour.

Any combining characters can be placed on any base characters without
there being any dotted circles displayed.  In particular, any combining
Devanagari characters (note: including, in principle, several dependent
vowels, even if that does not occur in any (existing) orthography) can
be placed on any Devanagari base character as well as SPACE (and other
punctuation). What should result is a reasonable composed glyph, no
dotted circle in sight (except in show invisibles mode, which I'm not
discussing here). Spelling errors should be indicated otherwise, since
they are of a very different nature.

> For scripts other than Indic scripts, it may be useful to render the
> nonspacing mark without dotted circle because even after 
> rendering it as an
> overlap glyph, the result is recognizable. However, for Indic 
> scripts use
> of dotted circle is very useful as default behaviour since it gives
> immediate feedback to the user that there may be some 
> defective combining
> character in the text. Most of the time such errors are unintentional
> rather than intentional.

No combination of base + combining characters is defective per se.
Even if the scripts are different within the combining sequence.
(Note also that the combining characters in the 0300 block are script
independent.) Spelling errors are something else entirely.

> Unicode has provision to remove this dotted circle.

I'm not sure what you are talking about here.

> Space 
> character is used
> to give indication to fallback mechanism that no dotted 
> circle should be
> used while rendering this stand alone sign which is normally 
> attached to
> other characters. This is useful when sometimes user want to 
> display the
> sign without any circle. Also, with this scheme it is 
> possible to show some
> combining marks with dotted circle and some without dotted circle.

The fallback mechanism discussed in section 5.14 of TUS 3.0 is
the use of less than ideal (typographically!) mechanisms to display
an *approximation* of the glyph(s) for the combining sequence.

An exceedingly bad approximation is displaying a dotted circle as a
fake base (again: disregarding "show invisibles", or "chart" modes,
which, however, should be consistent and show a dotted circle fake
base for ALL combining characters occurring in the text).  The use
of this exceedingly bad approximation (in normal display mode) does
in no way indicate that the combining sequence is at all defective.
It may indicate that the display engine (or the font) is defective...

/Kent K

RE: Suggestions in Unicode Indic FAQ

2003-01-29 Thread Marco Cimarosti

Keyur Shroff wrote:
> But sometimes a user may want visual representation of these 
> symbols in two different ways: with dotted circle and
> without dotted circle.

Why not use a dotted circle character explicitly when you want to see one?

> Example of
> this could be RAsup on top of dotted circle and RAsup on top of space
> character. Current use of space character to eliminate dotted 
> circle is really painful and may create problems in determining 
> language and syllable boundaries.

Languages or syllable boundaries have nothing to do with this. These special
sequences should *never* be part of any syllable or word in any language:
they are just a way of showing the shape of a glyph, to be used when, e.g.,
talking about typography or spelling.

> The main problem with space character is that unlike
> ZWJ/ZWNJ/Dotted Circle, it falls within the range of other 
> important script "Latin". 

Plain wrong! White-space characters and punctuation do not belong to any
script: characters such as " ", "!" and "?" are used for many scripts and
languages. Even the "danda" punctuation, which is in the Devanagari range,
does not belong to Devanagari: it is also used for other Indic scripts.

> Use of INV character in one shot can solve all these
> problems. We can put it in "consonant" class which
> can help text processing applications. [...]

How can calling a "consonant" something which has nothing to do with
consonants help anybody doing anything?

_ Marco

RE: Suggestions in Unicode Indic FAQ

2003-01-29 Thread Keyur Shroff

--- Kent Karlsson <[EMAIL PROTECTED]> wrote:
> 
> A space followed by a dependent vowel sign should display just the
> dependent vowel sign, no dotted circle.  Indeed, (except for a "show
> invisibles" mode, or a "character chart" display mode) no (Indic or
> other)
> text that does not contain the *character* DOTTED CIRCLE should ever
> display a dotted circle as part of the displayed text. Systems that
> do display a dotted circle (in normal display mode) where there is
> no such *character* in the displayed text are buggy!

In Indic scripts, any sign that appears in text without a valid consonant
base may be rendered with a dotted circle as a fallback mechanism
(Section 5.14 "Rendering Nonspacing Marks",
http://www.unicode.org/uni2book/ch05.pdf). Any system implementing this as
its default behaviour should not be considered buggy. What the default
rendering behaviour should be (i.e., show hidden or not) may vary from one
script to another and also depends on implementation policy.

For scripts other than Indic scripts, it may be useful to render the
nonspacing mark without dotted circle because even after rendering it as an
overlap glyph, the result is recognizable. However, for Indic scripts use
of dotted circle is very useful as default behaviour since it gives
immediate feedback to the user that there may be some defective combining
character in the text. Most of the time such errors are unintentional
rather than intentional.

Unicode has a provision to remove this dotted circle. The space character is
used to indicate to the fallback mechanism that no dotted circle should be
used while rendering a stand-alone sign which is normally attached to
other characters. This is useful when a user wants to display the
sign without any circle. Also, with this scheme it is possible to show some
combining marks with a dotted circle and some without.

- Keyur


RE: Suggestions in Unicode Indic FAQ

2003-01-29 Thread Keyur Shroff

--- Marco Cimarosti <[EMAIL PROTECTED]> wrote:

> Why not representing INV with a double ZWJ? E.g.:
> 
>   ISCII            Unicode
>   KA halant INV    KA virama ZWJ ZWJ
>   RA halant INV    RA virama ZWJ ZWJ (i.e., repha)
>   INV halant RA    ZWJ ZWJ virama RA (RAsub)
> 
> This has the advantage that the most common sequences will work OK also
> on
> old display engines implemented *before* the double-ZWJ convention is
> introduced.
> 
> E.g., sequence "KA virama ZWJ ZWJ" works well also on an old engine, for
> the
> simple reason that the first ZWJ is enough to do the work, and  the
> second ZWJ is invisible.
> 
> Of course, an old engine will still display a  for  virama
> ZWJ ZWJ>, but that is not worse than displaying  followed by a
> white box, which is what would happen with your new INV character.

Certainly. This looks more promising because even RAsub has two alternate
forms. One form is used with consonants KA, KHA, GHA, etc and the other
form is used with consonants TTA, TTHA, DDA, DDHA, etc. With your ZWJ based
scheme we can insert as many ZWJ as we wish to produce all possible
alternate forms!

But sometimes a user may want visual representation of these symbols in two
different ways: with dotted circle and without dotted circle. Example of
this could be RAsup on top of dotted circle and RAsup on top of space
character. Current use of space character to eliminate dotted circle is
really painful and may create problems in determining language and syllable
boundaries. The main problem with space character is that unlike
ZWJ/ZWNJ/Dotted Circle, it falls within the range of other important script
"Latin". Finally it may affect all important text processing which uses
Unicode characters to find language boundaries. Use of INV character in one
shot can solve all these problems. We can put it in "consonant" class which
can help text processing applications. Moreover, it will be difficult for
all possible to provide upward compatibility all the time even though it is
desirable. Implementation of Unicode will need to be upgraded with every
introduction of new glyphs or rules. Otherwise applications have to
explicitly declare the version of Unicode used in implementation.

- Keyur


RE: Suggestions in Unicode Indic FAQ

2003-01-29 Thread Kent Karlsson


> The [new] INV character in Unicode can also be used for displaying dependent
> vowel matras without dotted circle.

A space followed by a dependent vowel sign should display just the
dependent vowel sign, no dotted circle.  Indeed, (except for a "show
invisibles" mode, or a "character chart" display mode) no (Indic or other)
text that does not contain the *character* DOTTED CIRCLE should ever
display a dotted circle as part of the displayed text. Systems that
do display a dotted circle (in normal display mode) where there is
no such *character* in the displayed text are buggy!

/Kent K

(B.t.w. the chart dotted circle glyph for combining characters
looks a bit different from the (normal) glyph for DOTTED CIRCLE.)

RE: Suggestions in Unicode Indic FAQ

2003-01-29 Thread Marco Cimarosti

Keyur Shroff wrote:
> In the FAQ
>http://www.unicode.org/faq/indic.html#16
> 
> It is mentioned that following are equivalent
> 
> ISCII            Unicode
> KA halant INV    KA virama ZWJ
> RA halant INV    RAsup (i.e., repha)

The last line is really bizarre! I would agree that it is plain wrong...

What is supposed to appear in column "Unicode" is the Unicode *encoding*
equivalent to the  in the "ISCII" column. But "RAsup (i.e.,
repha)" is the description of a *glyph*.

> In fact there is no way in Unicode to produce RAsup directly, 
> i.e., without using base consonant. [...]

I agree. This issue has been raised several times, and several viable
solutions have been proposed, but I don't remember that Unicode "officials"
ever showed to even acknowledge the problem.

But probably this has been noted down and discussed. I hope to see an
official solution in TUS 4.0.

> SUGGESTION-3:
> 
> Use of SPACE character as consonant may create problem for 
> state machine which finds language/syllable boundary.
> In fact we need a codepoint for one invisible consonant
> (similar to INV in ISCII) in Unicode which can solve
> this problem with Unicode.
> 
> After inclusion of INV character the following can be recommended.
> 
> ISCII            Unicode
> KA halant INV    KA virama INV
> RA halant INV    RA virama INV (i.e., repha)
> INV halant RA    INV virama RA (RAsub)

Why not represent INV with a double ZWJ? E.g.:

ISCII            Unicode
KA halant INV    KA virama ZWJ ZWJ
RA halant INV    RA virama ZWJ ZWJ (i.e., repha)
INV halant RA    ZWJ ZWJ virama RA (RAsub)

This has the advantage that the most common sequences will work OK also on
old display engines implemented *before* the double-ZWJ convention is
introduced.

E.g., sequence "KA virama ZWJ ZWJ" works well also on an old engine, for the
simple reason that the first ZWJ is enough to do the work, and  the second
ZWJ is invisible.

Of course, an old engine will still display a  for , but that is not worse than displaying  followed by a
white box, which is what would happen with your new INV character.
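
For readers who want to see the proposed sequences as actual code points, here is a small Python sketch. It only spells out the double-ZWJ convention suggested in this message, which is a proposal made on the list and not something defined by the Unicode Standard; the names INV_PROPOSED, PROPOSED and code_points are invented for the illustration.

```python
KA = "\u0915"      # DEVANAGARI LETTER KA
RA = "\u0930"      # DEVANAGARI LETTER RA
VIRAMA = "\u094D"  # DEVANAGARI SIGN VIRAMA (halant)
ZWJ = "\u200D"     # ZERO WIDTH JOINER

# Proposal from this message only: stand in for ISCII INV with two ZWJs.
INV_PROPOSED = ZWJ + ZWJ

PROPOSED = {
    "KA halant INV": KA + VIRAMA + INV_PROPOSED,
    "RA halant INV": RA + VIRAMA + INV_PROPOSED,
    "INV halant RA": INV_PROPOSED + VIRAMA + RA,
}


def code_points(text):
    """Format a string as space-separated U+XXXX values."""
    return " ".join(f"U+{ord(ch):04X}" for ch in text)


for iscii, seq in PROPOSED.items():
    print(f"{iscii:15} {code_points(seq)}")

# Dropping the trailing ZWJ leaves the sequence the FAQ already lists,
# which is the backward-compatibility argument made above.
print(PROPOSED["KA halant INV"][:-1] == KA + VIRAMA + ZWJ)  # True
```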

_ Marco

Suggestions in Unicode Indic FAQ

2003-01-29 Thread Keyur Shroff

Hello,

There are a few discrepancies in the Indic FAQ. Though they were reported
earlier by Andy White, I see they are still there in the FAQ. I also sent a
clarification, but by mistake I sent the mail to the Yahoo group where this
mailing list is archived, and hence my mail never reached this mailing list.
You can refer to the link http://groups.yahoo.com/group/unicode/message/16352


The following are the suggestions.

SUGGESTION-1:

In the FAQ
   http://www.unicode.org/faq/indic.html#2
it is mentioned that

ISCII:             Unicode:
Halant + Halant    Halant + ZWJ

produce a similar result. This is wrong. In ISCII, Halant+Halant is known as
explicit halant, and its Unicode equivalent sequence is Halant+ZWNJ. So ZWJ
should be replaced by ZWNJ.
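
The corrected mapping can be written down as a tiny converter. This is a hedged Python sketch, not an ISCII decoder: it works on symbolic token names ("KA", "HALANT", ...) chosen for this example rather than on real ISCII byte values, and it implements only the explicit-halant rule stated above.

```python
VIRAMA = "\u094D"  # DEVANAGARI SIGN VIRAMA (halant)
ZWNJ = "\u200C"    # ZERO WIDTH NON-JOINER

# Hypothetical token names standing in for ISCII codes.
TOKEN_TO_UNICODE = {
    "KA": "\u0915",
    "RA": "\u0930",
    "HALANT": VIRAMA,
}


def iscii_tokens_to_unicode(tokens):
    """Convert symbolic ISCII tokens, mapping HALANT HALANT to VIRAMA + ZWNJ."""
    out = []
    i = 0
    while i < len(tokens):
        is_explicit_halant = (
            tokens[i] == "HALANT"
            and i + 1 < len(tokens)
            and tokens[i + 1] == "HALANT"
        )
        if is_explicit_halant:
            out.append(VIRAMA + ZWNJ)  # explicit halant
            i += 2
        else:
            out.append(TOKEN_TO_UNICODE[tokens[i]])
            i += 1
    return "".join(out)


text = iscii_tokens_to_unicode(["KA", "HALANT", "HALANT", "RA"])
print(" ".join(f"U+{ord(ch):04X}" for ch in text))  # U+0915 U+094D U+200C U+0930
```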


SUGGESTION-2:

In the FAQ
   http://www.unicode.org/faq/indic.html#16

It is mentioned that the following are equivalent:

ISCII            Unicode
KA halant INV    KA virama ZWJ
RA halant INV    RAsup (i.e., repha)

In fact there is no way in Unicode to produce RAsup directly, i.e., without
using a base consonant. The sequence "RA virama ZWJ" will actually produce
half-RA (or eyelash-RA), which is commonly used in Marathi. Eyelash-RA can
also be produced with the sequence "RA Halant Nukta", both in ISCII
(where it is known as soft halant) and in Unicode (just for conformance with
ISCII).

Also, in the same answer the following sequence is recommended:

ISCII            Unicode
INV halant RA    SPACE virama RA (RAsub)



SUGGESTION-3:

Use of the SPACE character as a consonant may create problems for a state
machine that finds language/syllable boundaries. In fact we need a code point
for an invisible consonant (similar to INV in ISCII) in Unicode, which would
solve this problem.

After inclusion of an INV character, the following can be recommended:

ISCII            Unicode
KA halant INV    KA virama INV
RA halant INV    RA virama INV (i.e., repha)
INV halant RA    INV virama RA (RAsub)

The INV character in Unicode can also be used for displaying dependent
vowel matras without dotted circle.

Unicode
INV Vowel sign O
INV Vowel sign AI

etc. This can replace existing definition of "SPACE" as invisible consonant
depending on the context.

Any other pointers!!?

- Keyur



Re: Suggestions for next print edition

2001-12-03 Thread juuichiketajin



> You can always search the big Unihan.txt file on the kJapaneseKun
> and kJapaneseOn fields, which provide whatever information we have
> on pronunciation of the characters in Japanese.
> 
> If you are just stuck looking up stuff because it isn't marked up
> for Japanese, try getting Sanseido's Unicode Kanji
> Information Dictionary, which has the first 20,902 kanji in Unicode
> (the most useful set) all marked up with all the Japanese pronunciations
> (where they have any). 

The first suggestion is useless. The file is too freaking big so maybe I'll go with 
the second. Thanks.

-- 

___
Get your free email from http://www.ranmamail.com

Powered by Outblaze

Re: Suggestions for next print edition

2001-12-03 Thread Kenneth Whistler


11-digit boy suggested:

> 1. Unicode points are NUMBERS. Numbers can be written in ANY base. 
> Knowing decimal values of codepoints is sometimes useful, so please 
> print them in the next edition of the Unicode book.

The UTC has already decided not to do that, as it clutters up the
charts. Hexadecimal is far more useful to most implementers of
the standard. Hex/decimal conversion is only as far away as the
little calculator accessories available on any OS these days.
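
For anyone who prefers a script to a calculator, the conversion is a one-liner in most languages. A minimal Python sketch (the function names are made up here):

```python
def to_decimal(code_point):
    """Convert 'U+4E71' or '4E71' to its decimal value."""
    text = code_point.upper()
    if text.startswith("U+"):
        text = text[2:]
    return int(text, 16)


def to_u_notation(value):
    """Convert a decimal value back to U+XXXX notation."""
    return f"U+{value:04X}"


for cp in ("U+4E71", "U+3060", "U+3080"):
    print(cp, "=", to_decimal(cp))

print(to_u_notation(20081))  # U+4E71
```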

> 
> 2. There was a Shift-JIS index for kanji. I don't know much about 
> kanji, but it seems to me that they are arranged in a-i-u-e-o order 
> of on'yomi. Why not print little hiragana letters at the top to aid 
> people searching for a kanji?

Again, this is not the function of the charts. The radical/stroke
index is available for general lookup, but we cannot provide phonetic
indices, too, for Japanese, Chinese, Korean, and Vietnamese lookup.
You can always search the big Unihan.txt file on the kJapaneseKun
and kJapaneseOn fields, which provide whatever information we have
on pronunciation of the characters in Japanese.
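
As a rough illustration of such a search, the Python sketch below streams the file line by line, so its size should not matter much. It assumes the tab-separated layout "U+XXXX, field name, value" with "#" comment lines; the sample lines are made up in that layout for the demonstration (they are not copied from the file), and japanese_readings is a name invented here.

```python
import io


def japanese_readings(lines, wanted):
    """Yield (code_point, field, value) for Japanese readings matching `wanted`."""
    fields = {"kJapaneseOn", "kJapaneseKun"}
    target = wanted.upper()
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments
        parts = line.rstrip("\n").split("\t", 2)
        if len(parts) != 3:
            continue  # not a data line in the assumed layout
        code_point, field, value = parts
        if field in fields and target in value.upper().split():
            yield code_point, field, value


# Illustrative sample in the assumed layout, not real file contents.
SAMPLE = (
    "# sample data for demonstration only\n"
    "U+4E71\tkJapaneseOn\tRAN\n"
    "U+4E71\tkSomeOtherField\tplaceholder\n"
)

for hit in japanese_readings(io.StringIO(SAMPLE), "ran"):
    print(hit)
```

With the real data, the same function could be fed an open file object, for example `open("Unihan.txt", encoding="utf-8")`, where the path is a placeholder for wherever the file was saved.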

If you are just stuck looking up stuff because it isn't marked up
for Japanese, try getting Sanseido's Unicode Kanji
Information Dictionary, which has the first 20,902 kanji in Unicode
(the most useful set) all marked up with all the Japanese pronunciations
(where they have any). I suspect that Sanseido will soon be
updating that dictionary to include Vertical Extension A, as well.

--Ken

Suggestions for next print edition

2001-12-02 Thread juuichiketajin


1. Unicode points are NUMBERS. Numbers can be written in ANY base. Knowing decimal 
values of codepoints is sometimes useful, so please print them in the next edition of 
the Unicode book.

2. There was a Shift-JIS index for kanji. I don't know much about kanji, but it seems 
to me that they are arranged in a-i-u-e-o order of on'yomi. Why not print little 
hiragana letters at the top to aid people searching for a kanji?

Remember how I could not find the "ran" of "randamu" before? Let's see this time... 
Aha! There it is!
I know it was somewhere between "mo(kuyoubi)" and "(fu)ro". Better than stroke / 
radical, I wonder?
* Disclaimer: From what I hear, the Japanese do NOT write "randamu" as U+4E71 U+3060 
U+3080. They use U+30E9 U+30F3 U+30C0 U+30E0. But the first is cuter. ^_^
-- 


Re: unicode + oracle query....... (suggestions needed...)

2000-09-27 Thread Michael \(michka\) Kaplan


Sandeep,

Can you explain exactly what you are doing to get the data from ASP into the
Oracle database? Perhaps post the ASP code? Like most scripting languages,
VBScript and JScript both support UCS-2, and it is usually the Oracle
ODBC or OLE DB driver that has the job of converting the text from UCS-2 to
UTF-8. I wonder whether what you are seeing is some type of "double
conversion".
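
To make the reported symptom concrete (U+F95F expected as EF A5 9F but stored as EF A5 BF), here is a small Python sketch. The first function just shows the correct UTF-8 bytes. The second is purely a hypothesis for illustration: it imitates some conversion step that cannot represent bytes in the 80..9F range and substitutes BF. It is not a description of what any Oracle driver actually does.

```python
def utf8_hex(ch):
    """Return the UTF-8 encoding of a character as hex bytes."""
    return " ".join(f"{b:02X}" for b in ch.encode("utf-8"))


def simulate_lossy_hop(data):
    """Hypothetical lossy step: bytes 0x80..0x9F are replaced with 0xBF."""
    return bytes(0xBF if 0x80 <= b <= 0x9F else b for b in data)


original = "\uF95F"
print(utf8_hex(original))  # EF A5 9F

damaged = simulate_lossy_hop(original.encode("utf-8"))
print(" ".join(f"{b:02X}" for b in damaged))  # EF A5 BF

# The damaged bytes are still valid UTF-8, but for a different character.
print(f"U+{ord(damaged.decode('utf-8')):04X}")  # U+F97F
```

If the stored bytes match this pattern, that would point toward the UTF-8 bytes being treated as single-byte characters somewhere between the script and the database, which is the kind of thing the three items below should help pin down.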

So the things that would be interesting to know:

1) The data access method to Oracle
2) Version of the driver being used
3) A sample of the code/script being used

michka

a new book on internationalization in VB at
http://www.i18nWithVB.com/

- Original Message -
From: "Sandeep Krishna" <[EMAIL PROTECTED]>
To: "Unicode List" <[EMAIL PROTECTED]>
Cc: <[EMAIL PROTECTED]>
Sent: Wednesday, September 27, 2000 3:12 AM
Subject: Re: unicode + oracle query... (suggestions needed...)


> i mean all the entries at both Web server machine's registry and
Oracle
> Database server machine's registry or either one.
> in our setup... my machine is the Web Server and the Oracle Server is a
> separate machine
> please clarify
>
> regards,
>
> Sandeep
> - Original Message -
> From: Kedar Moghe <[EMAIL PROTECTED]>
> To: Unicode List <[EMAIL PROTECTED]>
> Cc: <[EMAIL PROTECTED]>
> Sent: Wednesday, September 27, 2000 3:21 PM
> Subject: RE: unicode + oracle query... (suggestions needed...)
>
>
> Sandeep,
>
> I think you need to change at following three places,
> HKEY_LOCAL_MACHINE\ORACLE\NLS_LANG
> HKEY_LOCAL_MACHINE\ORACLE\ALL_HOMES\ID0\NLS_LANG
> HKEY_LOCAL_MACHINE\ORACLE\HOME0\NLS_LANG
>
> Best of luck
>
> Regards,
>
> Kedar
>
> -Original Message-
> From: Sandeep Krishna [mailto:[EMAIL PROTECTED]]
> Sent: Wednesday, September 27, 2000 5:45 PM
> To: Carl W. Brown; Bob Verbrugge; Kedar Moghe
> Cc: [EMAIL PROTECTED]
> Subject: Re: unicode + oracle query... (suggestions needed...)
>
>
> hi...
>
> i m thoroughly confused.
> actually the registry entries for oracle shows 3 entries for NLS_LANG.
> and that too at the WEB SERVER end and at the DATABASE SERVER end.
> so that makes tooo many combinations...
>
> can someone indicate which of these NLS_LANG entries have to be set as
> "AMERICAN_AMERICA.UTF8" and if some of them doesnt need this...what
exactly
> should be there
>
> pls suggest necessary messures..
>
> regards,
>
> Sandeep
>
>
>
>
> - Original Message -
> From: Bob Verbrugge <[EMAIL PROTECTED]>
> To: Sandeep Krishna <[EMAIL PROTECTED]>
> Sent: Wednesday, September 27, 2000 1:30 PM
> Subject: Re: unicode + oracle query... (suggestions needed...)
>
>
> Sandeep,
>
> You probably need to change the NLS_LANG Oracle setting in the registry.
> Look under
> HKEY_LOCAL_MACHINE\SOFTWARE\ORACLE for this setting and change the
> character set part to UTF8.
>
> Bob.
>
>
> - Original Message -
> From: "Sandeep Krishna" <[EMAIL PROTECTED]>
> To: "Unicode List" <[EMAIL PROTECTED]>
> Cc: <[EMAIL PROTECTED]>
> Sent: Wednesday, September 27, 2000 9:16 AM
> Subject: Re: unicode + oracle query... (suggestions needed...)
>
>
> > hi,
> >
> > thankx for responding.
> >
> > but when u mention change in the registry..
> > could u elaborate about where exactly in reg and what changes are
required
> >
> > my registry setting shows NLS = American_English.UTF8.
> >
> > is this the setting u indicated..or something to so with the charset
entry
> :
> > autodetect and autodetect_all (in classid...Mime>database>charset..)
> >
> > pls do elaborate
> >
> > regards,
> >
> > Sandeep
> >
> >
> >
> > - Original Message -
> > From: Kedar Moghe <[EMAIL PROTECTED]>
> > To: 'Sandeep Krishna' <[EMAIL PROTECTED]>
> > Sent: Wednesday, September 27, 2000 11:20 AM
> > Subject: RE: unicode + oracle query... (suggestions needed...)
> >
> >
> > Sandeep,
> >
> > I think you need to set the registry charset to UTF8 where database is
> > installed. We were was getting the same problem when we use to send
UTF-8
> > strings to oracle database after conversion from Shift-JIS to UTF8. That
> > time also the byte sequence of the retrieved string is getting changed
and
> > some of the bytes are getting replaced with BF.
> >
> > Regards,
> >
> > Kedar
> >
> > -Original Message-
> > From: Sandeep Krishna [mailto:[E

RE: unicode + oracle query....... (suggestions needed...)

2000-09-27 Thread Kedar Moghe


Only the registry entries on the database machine. Not any other entry.

Regards,

Kedar

-Original Message-
From: Sandeep Krishna [mailto:[EMAIL PROTECTED]]
Sent: Wednesday, September 27, 2000 6:21 PM
To: Kedar Moghe
Cc: [EMAIL PROTECTED]
Subject: Re: unicode + oracle query... (suggestions needed...)


i mean all the entries at both Web server machine's registry and Oracle
Database server machine's registry or either one.
in our setup... my machine is the Web Server and the Oracle Server is a
separate machine
please clarify

regards,

Sandeep
- Original Message -
From: Kedar Moghe <[EMAIL PROTECTED]>
To: Unicode List <[EMAIL PROTECTED]>
Cc: <[EMAIL PROTECTED]>
Sent: Wednesday, September 27, 2000 3:21 PM
Subject: RE: unicode + oracle query... (suggestions needed...)


Sandeep,

I think you need to change at following three places,
HKEY_LOCAL_MACHINE\ORACLE\NLS_LANG
HKEY_LOCAL_MACHINE\ORACLE\ALL_HOMES\ID0\NLS_LANG
HKEY_LOCAL_MACHINE\ORACLE\HOME0\NLS_LANG

Best of luck

Regards,

Kedar

-Original Message-
From: Sandeep Krishna [mailto:[EMAIL PROTECTED]]
Sent: Wednesday, September 27, 2000 5:45 PM
To: Carl W. Brown; Bob Verbrugge; Kedar Moghe
Cc: [EMAIL PROTECTED]
Subject: Re: unicode + oracle query... (suggestions needed...)


hi...

i m thoroughly confused.
actually the registry entries for oracle shows 3 entries for NLS_LANG.
and that too at the WEB SERVER end and at the DATABASE SERVER end.
so that makes tooo many combinations...

can someone indicate which of these NLS_LANG entries have to be set as
"AMERICAN_AMERICA.UTF8" and if some of them doesnt need this...what exactly
should be there

pls suggest necessary messures..

regards,

Sandeep




- Original Message -
From: Bob Verbrugge <[EMAIL PROTECTED]>
To: Sandeep Krishna <[EMAIL PROTECTED]>
Sent: Wednesday, September 27, 2000 1:30 PM
Subject: Re: unicode + oracle query... (suggestions needed...)


Sandeep,

You probably need to change the NLS_LANG Oracle setting in the registry.
Look under
HKEY_LOCAL_MACHINE\SOFTWARE\ORACLE for this setting and change the
character set part to UTF8.
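
For reference, an NLS_LANG value has the form LANGUAGE_TERRITORY.CHARSET, so "the character set part" is whatever follows the dot. The Python sketch below only manipulates the string; it does not read or write the registry, and the function names are invented for illustration.

```python
def parse_nls_lang(value):
    """Split 'LANGUAGE_TERRITORY.CHARSET' into (language, territory, charset)."""
    lang_terr, _, charset = value.partition(".")
    language, _, territory = lang_terr.partition("_")
    return language, territory, charset


def with_charset(value, charset):
    """Return the NLS_LANG value with only its character set part replaced."""
    language, territory, _ = parse_nls_lang(value)
    return f"{language}_{territory}.{charset}"


print(parse_nls_lang("AMERICAN_AMERICA.UTF8"))
print(with_charset("AMERICAN_AMERICA.WE8ISO8859P1", "UTF8"))  # AMERICAN_AMERICA.UTF8
```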

Bob.


- Original Message -
From: "Sandeep Krishna" <[EMAIL PROTECTED]>
To: "Unicode List" <[EMAIL PROTECTED]>
Cc: <[EMAIL PROTECTED]>
Sent: Wednesday, September 27, 2000 9:16 AM
Subject: Re: unicode + oracle query... (suggestions needed...)


> hi,
>
> thankx for responding.
>
> but when u mention change in the registry..
> could u elaborate about where exactly in reg and what changes are required
>
> my registry setting shows NLS = American_English.UTF8.
>
> is this the setting u indicated..or something to so with the charset entry
:
> autodetect and autodetect_all (in classid...Mime>database>charset..)
>
> pls do elaborate
>
> regards,
>
> Sandeep
>
>
>

Re: unicode + oracle query....... (suggestions needed...)

2000-09-27 Thread Sandeep Krishna


I mean: should all the entries be changed in both the web server machine's
registry and the Oracle database server machine's registry, or in only one
of them?
In our setup, my machine is the web server and the Oracle server is a
separate machine.
Please clarify.

regards,

Sandeep

RE: unicode + oracle query....... (suggestions needed...)

2000-09-27 Thread Kedar Moghe


Sandeep,

I think you need to change it in the following three places:
HKEY_LOCAL_MACHINE\ORACLE\NLS_LANG
HKEY_LOCAL_MACHINE\ORACLE\ALL_HOMES\ID0\NLS_LANG
HKEY_LOCAL_MACHINE\ORACLE\HOME0\NLS_LANG

Best of luck

Regards,

Kedar


Re: unicode + oracle query....... (suggestions needed...)

2000-09-27 Thread Sandeep Krishna


hi...

I am thoroughly confused.
The registry actually shows 3 entries for NLS_LANG under Oracle,
and that is the case both at the WEB SERVER end and at the DATABASE SERVER end.
So that makes too many combinations...

Can someone indicate which of these NLS_LANG entries have to be set to
"AMERICAN_AMERICA.UTF8" and, if some of them don't need this, what exactly
should be there?

Please suggest the necessary measures.

regards,

Sandeep




- Original Message -
From: Bob Verbrugge <[EMAIL PROTECTED]>
To: Sandeep Krishna <[EMAIL PROTECTED]>
Sent: Wednesday, September 27, 2000 1:30 PM
Subject: Re: unicode + oracle query... (suggestions needed...)


Sandeep,

You probably need to change the NLS_LANG Oracle setting in the registry.
Look under
HKEY_LOCAL_MACHINE\SOFTWARE\ORACLE for this setting and change the
character set part to UTF8.

Bob.
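
For reference, an NLS_LANG value has the form LANGUAGE_TERRITORY.CHARSET, so
"the character set part" is whatever follows the dot. A minimal Python sketch
of swapping only that part is below. The helper names split_nls_lang and
with_charset are made up for illustration, and WE8ISO8859P1 is used only as
an example of a non-UTF8 starting value; nothing in this thread says which
value was actually present.

```python
def split_nls_lang(value):
    """Split 'LANGUAGE_TERRITORY.CHARSET' into (language, territory, charset)."""
    locale_part, _, charset = value.partition(".")
    language, _, territory = locale_part.partition("_")
    return language, territory, charset


def with_charset(value, charset):
    """Return the same NLS_LANG value with only the character set replaced."""
    language, territory, _ = split_nls_lang(value)
    return f"{language}_{territory}.{charset}"


# Hypothetical starting value; only the part after the dot changes.
print(with_charset("AMERICAN_AMERICA.WE8ISO8859P1", "UTF8"))
# AMERICAN_AMERICA.UTF8
```

The language and territory parts are left untouched, which matches the advice
above to change only the character set.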



Re: unicode + oracle query....... (suggestions needed...)

2000-09-27 Thread Jörg Knappen


Sandeep Krishna wrote:

>   * some unicode characters(or rather code points.) like' F95F' when encoded
> in UTF-8 was being encoded as EF A5 BF, when it should have been encoded as
> EF A5 9F..  in fact many unicode charcters whose encoded form had to had a
> byte in the range (80..9F) were being somehow changed to BF ... thus
> resulting in incorrect retrieval

Oops, it seems that this particular version of Oracle is only 7.5-bit clean ...
Hope they fix it soon; otherwise you need UTF-7d5 (unofficial) as a workaround.

--J"org Knappen

RE: unicode + oracle query....... (suggestions needed...)

2000-09-27 Thread Carl W. Brown




Sandeep,

What version of Oracle? What API?

Carl


Re: unicode + oracle query....... (suggestions needed...)

2000-09-27 Thread Sandeep Krishna


hi,

Thanks for responding.

But when you mention a change in the registry,
could you elaborate on where exactly in the registry and what changes are required?

My registry setting shows NLS = American_English.UTF8.

Is this the setting you indicated, or is it something to do with the charset
entries autodetect and autodetect_all (in classid...Mime>database>charset..)?

Please do elaborate.

regards,

Sandeep



- Original Message -
From: Kedar Moghe <[EMAIL PROTECTED]>
To: 'Sandeep Krishna' <[EMAIL PROTECTED]>
Sent: Wednesday, September 27, 2000 11:20 AM
Subject: RE: unicode + oracle query... (suggestions needed...)


Sandeep,

I think you need to set the registry charset to UTF8 on the machine where the
database is installed. We were getting the same problem when we used to send
UTF-8 strings to an Oracle database after conversion from Shift-JIS to UTF-8.
In that case, too, the byte sequence of the retrieved string was changed and
some of the bytes were replaced with BF.

Regards,

Kedar
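
The BF substitution described above is what a lossy single-byte conversion
step would produce. As a hypothetical model only (the thread does not confirm
the mechanism), suppose some layer treats bytes 80..9F as unmappable and
substitutes BF, which is the inverted question mark in ISO 8859-1. The
function name simulate_lossy_step is made up for this sketch.

```python
def simulate_lossy_step(data: bytes) -> bytes:
    """Replace every byte in 0x80..0x9F with 0xBF; leave other bytes alone.

    This is an assumed model of the corruption, not Oracle's actual code.
    """
    return bytes(0xBF if 0x80 <= b <= 0x9F else b for b in data)


correct = bytes([0xEF, 0xA5, 0x9F])      # UTF-8 for U+F95F, as reported
damaged = simulate_lossy_step(correct)

print(correct.hex(" ").upper())  # EF A5 9F
print(damaged.hex(" ").upper())  # EF A5 BF
```

If this model is right, only characters whose UTF-8 form contains a byte in
80..9F are damaged, which agrees with the reported symptom.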


unicode + oracle query....... (suggestions needed...)

2000-09-26 Thread Sandeep Krishna




hi

Actually, I have been trying to use ASPs (with UTF-8 encoding) to write Unicode
characters to an Oracle DB table (VARCHAR2 field) and then retrieve them
back.
(I used UTF-8 encoding both for writing to the database and for
retrieving and displaying.)

There were some surprising observations:

* Each Unicode character was taking 7 bytes in the database (instead of the
expected 2 or 3).
* Some Unicode characters (or rather code points) like F95F were being
encoded in UTF-8 as EF A5 BF when they should have been encoded as
EF A5 9F. In fact, many Unicode characters whose encoded form should contain
a byte in the range 80..9F were somehow having that byte changed to BF,
resulting in incorrect retrieval.

I was unable to find the reasons for these strange occurrences.
Please suggest what the causes could be.

regards,

Sandeep.


***
SANDEEP KRISHNA
Member Technical Staff (Priceline.com)
H.C.L. Technologies Limited
A-1 C&D, Sector -16, NOIDA, UP, India.
Ph:  91-11-91-4516321 (extn. 1062)
Fax: 91-11-91-4510713, 4510226
E-Mail : [EMAIL PROTECTED]
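
The expected byte sequence in the message above can be checked by hand. The
sketch below builds the three-byte UTF-8 form of a code point from its bits
and compares it with Python's own encoder. The function name utf8_three_byte
is made up for illustration, and it covers only the three-byte range (it does
not reject surrogates).

```python
def utf8_three_byte(cp: int) -> bytes:
    """Encode a code point in U+0800..U+FFFF as three UTF-8 bytes."""
    if not 0x0800 <= cp <= 0xFFFF:
        raise ValueError("only the three-byte range is handled in this sketch")
    return bytes([
        0xE0 | (cp >> 12),          # lead byte: 1110xxxx
        0x80 | ((cp >> 6) & 0x3F),  # continuation: 10xxxxxx
        0x80 | (cp & 0x3F),         # continuation: 10xxxxxx
    ])


encoded = utf8_three_byte(0xF95F)
print(encoded.hex(" ").upper())              # EF A5 9F
print(len(encoded))                          # 3
print(encoded == "\uf95f".encode("utf-8"))   # True
```

So U+F95F should occupy three bytes ending in 9F; a stored form of seven bytes,
or one ending in BF, means something changed the data after encoding.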
