Re: Persian UTF-8 MySql collation

2004-07-05 Thread Peter Cruickshank
On Sat, 3 Jul 2004 22:10:24 +0430
"Ehsan Akhgari" <[EMAIL PROTECTED]> wrote:

[...]

> > I think you and the team I'm working with are trying to do
> > the same thing - it would be great if we could work together
> > and come up with a solution that anyone else can use too.
> 
> I looked around a bit, and it seems like MySQL 4.1.x will be supporting
> UTF-8.  MySQL 4.0.x doesn't have that support (the version I'm using on
> the production server is 4.0.18-standard.)  Because of that,
> incorporating that support into MySQL might require a lot more work than
> I currently imagine. Unfortunately in that case, I'll have to leave MySQL
> as it is, and sort the data at the client side (less efficient, but
> requiring less development time), and since the application I'm working
> on doesn't store very big chunks of data in the db, I may decide to
> sacrifice performance for development time.

Right. I was thinking about adding UTF-8 Persian collation to MySql 4.1.x
- our project will involve a fairly large amount of data, so we'd like to
have the option of sorting at the DB level.

> > What's involved in creating a collation file? These three pages:
> > http://dev.mysql.com/doc/mysql/en/Adding_character_set.html
> > http://dev.mysql.com/doc/mysql/en/Character_arrays.html
> > http://dev.mysql.com/doc/mysql/en/String_collating.html
> > seem to say that it's not too difficult, if you know what
> > you're doing?
> > (Which I don't. I'm just a humble PHP programmer)
> 
> Well, that seems to be for single-byte code pages.  The Persian character
> coding system used in glibc is UTF-8, and that will require patching
> MySQL source code.  And like I said, because of MySQL's lack of UTF-8
> support, it might require more work than I imagine.  I think I can handle
> it from a technical point of view (I'm good at C/C++) but I'm quite
> pressed for free time...

... which is why we're hoping to use MySql 4.1.x 

> > ... it seems it would be great to create a mySql Persian
> > collation file rather than changing the source, with all the
> > problems that would lead to of having to re-patch the code
> > every time there's a new MySql release? Or is that inevitable?
> 
> Well, if we decide to change the MySQL source code, we can submit our
> patches to MySQL team, and hopefully they will incorporate it into their
> new releases.  Of course in that case we might have to look into adding
> that support to MySQL 4.1.x as well (if it doesn't already have it.)  So
> there's no need for re-patching.  There's just a need for time!  :-)

Nope, no Persian collation file for MySql 4.1.x as far as I can see (which
is where we came in!)

> In case I decide not to spend the time in the development of Persian
> collation support in MySQL, I'll be glad to help your team in case they
> need technical programming help.  In that case, I'll let you know
> off-list (remind me if you don't get any note from me within a week,
> please.)

We may be in touch... :-)

Cheers

-- 
Peter Cruickshank
peter # cruickshank # biz

___
PersianComputing mailing list
[EMAIL PROTECTED]
http://lists.sharif.edu/mailman/listinfo/persiancomputing


Re: Persian UTF-8 MySql collation

2004-07-05 Thread Roozbeh Pournader
On Tue, 2004-06-29 at 19:41, C Bobroff wrote:

> If you're talking about sorting, it was recently pointed out (see
> archives) that Windows Server 2003 can sort Persian properly.

I would appreciate it if someone could volunteer to run a test data set
that FarsiWeb has on it. I'm 100% sure they won't support Hamzas or
Harakat properly.

roozbeh




RE: Persian UTF-8 MySql collation

2004-07-05 Thread Ehsan Akhgari
> [Ehsan, you just replied to me.  Answering on list.]

My bad.  Sorry, I meant to reply to the list.

> Well, you may wish to read a couple of documents.  Read the Unicode
> Collation Algorithm, for example.  Just read the intro or something like
> that.  The point is that Persian collation is only a small table fed to
> the Unicode Collation Algorithm.
> So yes, there is a free Persian collation implementation: Glibc + the
> fa_IR locale.

Good point, thanks.  I'll investigate it.

> What you have seen is the binary encoded table.  The source is in the
> fa_IR locale source file.

Thanks, I'll try Googling for it.

> Guys, both of you, if you don't have Glib,

You mean glibc, right?

> and your system
> does not provide what you need, you:
>
> * Either forget about Persian Collation, or
> * Implement your own minimal collation, or

That's what I have in mind, currently.

> * Consider using something like Glibc or uClibc with Persian
>   locale as a library.  Not sure how uClibc deals with Persian
>   locale.

Thanks again,

-
Ehsan Akhgari

Farda Technology (http://www.farda-tech.com/)

[ Email: [EMAIL PROTECTED] ]
[ WWW: http://www.beginthread.com/Ehsan ]





RE: Persian UTF-8 MySql collation

2004-07-05 Thread Ehsan Akhgari
> Right. I was thinking about adding UTF-8 Persian collation to MySql 4.1.x
> - our project will involve a fairly large amount of data, so we'd like to
> have the option of sorting at the DB level.

I've never tested MySQL 4.1.x.  Have you tried it?  How is the UTF-8
support?  Have you tried Persian collation in MySQL 4.1.x to see how much
better it is compared to 4.0.x?

Unfortunately I'm not willing to look into 4.1.x at this time: it's in beta,
and we don't use beta products on our production servers, so doing so would
do my project no good.

> ... which is why we're hoping to use MySql 4.1.x

I'd give it a try if I were in your shoes.

> Nope, no Persian collation file for MySql 4.1.x as far as I can see
> (which is where we came in!)

How does 4.1.x get Persian sorting?  Like 4.0.x?


-
Ehsan Akhgari

Farda Technology (http://www.farda-tech.com/)

[ Email: [EMAIL PROTECTED] ]
[ WWW: http://www.beginthread.com/Ehsan ]





RE: Persian UTF-8 MySql collation

2004-07-05 Thread Ehsan Akhgari
> That might work for Ehsan, but it sadly wouldn't save much effort for
> us since PHP doesn't do Persian UTF-8 collation (that I've been able
> to get working anyway), or provide access to strxfrm()
>
> :-(
>
> - which is why MySql seemed the least bad option.

Hmmm, if you've compiled PHP with glibc, I suppose you could simply do the
following (code not tested):



And yes, PHP doesn't provide access to strxfrm, but I think it's trivial to
write a PHP extension which provides that function.
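
Where glibc and a Persian locale are available, the strxfrm() route
discussed above might look roughly like this. It is a sketch in Python
rather than PHP (and it is not the untested code referred to earlier in this
message), assuming a locale named fa_IR.UTF-8 is installed on the machine;
sort_with_locale is a hypothetical helper name, and it returns None when the
locale is missing instead of guessing at an order.

```python
import locale


def sort_with_locale(words, locale_name="fa_IR.UTF-8"):
    """Sort words using the C library's collation for locale_name.

    Returns None if the locale is not installed (an assumption of this
    sketch; a real application might fall back to something else).
    """
    previous = locale.setlocale(locale.LC_COLLATE)  # remember current setting
    try:
        locale.setlocale(locale.LC_COLLATE, locale_name)
    except locale.Error:
        return None
    try:
        # strxfrm() turns each string into a key whose plain comparison
        # matches the locale's collation order.
        return sorted(words, key=locale.strxfrm)
    finally:
        locale.setlocale(locale.LC_COLLATE, previous)


result = sort_with_locale(["گل", "پدر", "آب"])
print(result if result is not None else "fa_IR.UTF-8 locale not installed")
```

The same idea carries over to any language that exposes setlocale() and
strcoll() or strxfrm(); the order you get depends entirely on the locale
data shipped with the C library.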

-
Ehsan Akhgari

Farda Technology (http://www.farda-tech.com/)

[ Email: [EMAIL PROTECTED] ]
[ WWW: http://www.beginthread.com/Ehsan ]


