RE: Persian PC-Kimmo 0.8 released

2004-05-11 Thread Ehsan Akhgari
 For anyone who's interested, Persian PC-Kimmo version
 0.8 has just been released.  It's available here:

 http://home.byu.net/jmd56/download/persian-pckimmo-0.8.tar.gz

Thanks, Jon, for releasing this version.  It looks a lot better than the
previous one!

 The biggest thing holding them back from being a 1.0 is a relatively
 small lexicon (~1350 words).  The morphology engine achieves about
 two-thirds recognition on a corpus of about 3.5 million words.
 And of course, it's GPL'ed.

Hmmm, do you have a list of the words in the current lexicon?  (I'm not
familiar with PC-KIMMO specific commands, so I can't parse them on my own.)
What should I do to help add more words?

 Any helpful feedback would be appreciated.

I find the new tree-style recognition very helpful:

n+mi+]+im NEG+DUR+come.PRES+1P

1:
                Top
                 |
                Verb
        _________|_________
   VNEGPREFIX           VNStem
       n+          ________|________
      NEG+      VPREFIX           VStem
                  mi+               |
                 DUR+            V1Stem
                             _______|_______
                          V2Stem        VPSUFFIX
                             |             +im
                          V3Stem           +1P
                             |
                             V
                             ]
                         come.PRES

Top:
[ cat:   Top ]

1 parse found

n+mi+]+m NEG+DUR+come.PRES+1S

1:
                Top
                 |
                Verb
        _________|_________
   VNEGPREFIX           VNStem
       n+          ________|________
      NEG+      VPREFIX           VStem
                  mi+               |
                 DUR+            V1Stem
                             _______|_______
                          V2Stem        VPSUFFIX
                             |             +m
                          V3Stem           +1S
                             |
                             V
                             ]
                         come.PRES

Top:
[ cat:   Top ]

1 parse found

I was wondering if there's some way to retrieve the tree-structured data in
a format which is easy to parse (the ASCII style is too difficult for a
computer program to parse), something like an XML format maybe?
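
Until a proper Kimmo-to-XML tool exists, the flat gloss line printed above
each tree is already machine-friendly.  The following is only a rough sketch,
assuming the two-column "form gloss" layout shown above with "+" separating
morphemes.  The element and attribute names (word, morpheme, form, gloss) are
made up for illustration and are not a PC-Kimmo format.

```python
import xml.etree.ElementTree as ET

def gloss_to_xml(line):
    """Turn 'n+mi+]+im NEG+DUR+come.PRES+1P' into a small XML string.

    Assumes one whitespace-separated form/gloss pair per line and the
    same number of '+'-separated pieces on both sides.
    """
    form, gloss = line.split()
    forms = form.split("+")
    glosses = gloss.split("+")
    if len(forms) != len(glosses):
        raise ValueError("form and gloss have different morpheme counts")
    word = ET.Element("word", form=form)  # hypothetical element names
    for f, g in zip(forms, glosses):
        ET.SubElement(word, "morpheme", form=f, gloss=g)
    return ET.tostring(word, encoding="unicode")

print(gloss_to_xml("n+mi+]+im NEG+DUR+come.PRES+1P"))
```

This keeps only the morpheme/gloss pairing, not the nesting of the tree
(Verb, VNStem, VStem, ...), so it would suit stemming but not full parsing.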

-
Ehsan Akhgari

Farda Technology (http://www.farda-tech.com/)

List Owner: [EMAIL PROTECTED]

[ Email: [EMAIL PROTECTED] ]
[ WWW: http://www.beginthread.com/Ehsan ]



___
PersianComputing mailing list
[EMAIL PROTECTED]
http://lists.sharif.edu/mailman/listinfo/persiancomputing


RE: Persian PC-Kimmo 0.8 released

2004-05-13 Thread Ehsan Akhgari
Thanks for your reply, Jon.

 Thanks for asking.   All the words are in
 tab-separated text files, as in noun.lex, verb.lex,
 etc.   They get converted to a kimmo-usable file such
 as fa-noun.lex, fa-verb.lex, etc. using the db2lex perl scripts in the
 scripts directory.  The verb and adjective files use a specific script
 written for them; all others use the plain script.  Also see the
 orthography.txt file for the romanization scheme.  It also has some
 other goodies.

 I would love to add any additions you might make to the lexicon in the
 next release.

I suppose I can use roman2unicode to convert the roman encoding into
readable plain text (I'm not fast at reading the roman notation).  That way,
I can import the data into Excel, sort it alphabetically, and start adding
new stuff...
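
The sort step could also be scripted instead of going through Excel.  A
minimal sketch, assuming each line of a source file such as noun.lex is
tab-separated with the headword in the first column.  The real column layout
is not described in this thread, so that is an assumption, and the sample
rows below are made up.

```python
import csv
import io

def sorted_entries(tsv_text, key_column=0):
    """Parse tab-separated lexicon text and sort the rows by one column."""
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t",
                        quoting=csv.QUOTE_NONE)
    rows = [row for row in reader if row]  # skip empty lines
    return sorted(rows, key=lambda row: row[key_column])

sample = "ketAb\tbook\nAb\twater\nmard\tman\n"  # hypothetical sample rows
for row in sorted_entries(sample):
    print("\t".join(row))
```

Note that this sorts by code point, which is fine for spotting duplicates but
is not Persian alphabetical order once the text is converted to Unicode.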

 As you can see, it needs a little more work on the morphophonemic
 rules, but it should work fine for stemming purposes.

Yes, it's pretty good at recognizing the stem of the word.

 Hans Nelson is the man to talk to.  He's working on a Kimmo output to
 XML program.  I don't know much about
 it, but here's his email:   [EMAIL PROTECTED]

Thanks for your hint.  I'll try to contact him.  In case you're interested,
I can send the final result of our discussion to you off-list.

-
Ehsan Akhgari



Farsi Stemming Algorithm

2004-04-28 Thread Ehsan Akhgari
Hi all,

Does anyone know of any free Farsi stemming algorithm, like the Porter
algorithm for English?

Thanks a lot!
-
Ehsan Akhgari



RE: Farsi Stemming Algorithm

2004-04-30 Thread Ehsan Akhgari
 One of the things that drives me nuts about the software is that it
 claims to run on Solaris/Sparc, Win/x86, MacOS, or BSD, but apparently
 no Linux (I have a Sparc box, so I'm lucky :-).  The source code is
 downloadable, but it currently doesn't seem to compile on Linux/x86.
 It does have a callable C interface, as documented in the kimmolib.txt
 in this file [2].  In fact, I'm working on an AI program that calls
 PC-Kimmo to do morphology.  Batch mode is used via the 'take' command,
 and using a .tak file.

Here's an update.  I tried to build the whole pc-parse package on Linux
(RedHat 9.0) using gcc 3.2.2, and it compiled without a single problem.  I
also tried running PC-Kimmo, and it was working smoothly.  I noticed that in
the README, they claim to have tested the build process on the following
platforms:

  1. Debian GNU/Linux 2.2 (kernel 2.2.17) / gcc 2.95.2, glibc 2.1.3-24
  2. Red Hat Linux 7.3 (kernel 2.4.18) / gcc 2.96, glibc 2.2.5-34
  3. Red Hat Linux 8.0 (kernel 2.4.18-14) / gcc 3.2-7, glibc 2.2.93-5
  4. OpenBSD 3.1 / gcc 2.95.3
  5. Mac OS X (10.2) / gcc 3.1
  6. cygwin 1.3.10-1 (Windows XP Pro) / gcc 2.95.3-5

Maybe you're trying an older version?

-
Ehsan Akhgari



RE: Farsi Stemming Algorithm

2004-04-30 Thread Ehsan Akhgari
 (I'll just reply to your other post here)  I guess I didn't know about
 a new pc-parse release.  Where did you get the newest source code?
 That's terrific news for me.

Well, the release I downloaded is approximately one year old, but here's the
URL I downloaded it from:

ftp://ftp.sil.org/software/unix/pc-parse-src-20030321.tgz

To build it, I just did a typical ./configure; make; make install; - there
was nothing more than that.  What compiler version have you used to compile
it?

Let me know if you still have compilation problems.  I might be able to help
if I can reproduce them here.

 I'm very interested in any work you'd work on, including a PHP
 extension.  Maybe SIL.org might be interested as well.

Actually, what I'm working on is an English/Persian search engine which can
be placed on any site with no need to download/install anything.  It's
nearly finished, I only have to translate the web UI into Persian, and also
implement stemming for Persian in the engine.  Originally I planned to
implement a stemming algorithm myself, but I figured that I can't be
considered an expert in Persian grammar/linguistics at all, so I prefer to
use already working solutions, and your work seems to be the *best* choice.

The PHP extension would be quite a thin wrapper, but anyway I'll definitely
provide you with the source code when I'm finished.  You'll also be welcome
to a copy of the search engine's source code itself if you're interested.

 Give me a week and I'll email them to the email address in your
 signature, unless you tell me otherwise.

Thanks a lot!  I highly appreciate your great help.

-
Ehsan Akhgari



RE: Miscellaneous web issues

2004-05-18 Thread Ehsan Akhgari
 An important note: what Notepad does here is only acceptable. It's
 not even recommended. HTML 4 clearly doesn't allow a UTF-8 BOM to appear
 before the HTML tag. Notepad is supposed to be a text editor. A text
 editor shouldn't insert markup by itself. BTW, ISIRI 6219 strongly
 discourages the use of a BOM in UTF-8 files.

The problem here is that web protocols (HTML for example) don't allow the
BOM, and Notepad is not an HTML editor, so there's nothing to prevent it
from adding the BOM.  Check out:

http://www.unicode.org/faq/utf_bom.html#28

-
Ehsan Akhgari


'I generally take life as it comes my way', said Death.





RE: Miscellaneous web issues

2004-05-18 Thread Ehsan Akhgari
 First of all, thank you very much for all the patient and lengthy
 explanations. Very nice of you to share so many tips!
 (Thanks to the others too who answered on and off list!)

Happy to help!

[snip]
 Now that 2 people have said to change ZWNJ to \u200c, I tried that but
 it didn't work. I don't think I have the right tool.

 I couldn't do it in Notepad because as I said, it's WYSIWYG in Persian
 script so if I do a global replacement and stick \u200c in the middle
 of Persian script, that's obviously not going to work (and I also
 tried it for good measure and it didn't work but there may be many
 reasons it didn't work out using Notepad.)

I don't know what you mean here.  Why doesn't it work in Notepad?  Note that
on Windows XP, you can't type ZWNJ inside the Find/Replace dialog box - you
need to copy/paste it from inside the Notepad text editor window.  Another
reason not to use Notepad.

 Then, since you recommended Frontpage, I tried that. Earlier, it had
 not even occurred to me to attempt to open a .js file in Frontpage
 (version 2000.) This time I fooled it by changing the extension from .js
 to .html and so was able to open it in html view where all the unicode
 was in numeric style. I changed all the &#8204; to \u200c but now I
 see that also has not worked.

Well, I don't know what the problem is here...

BTW, FrontPage 2003 can open the .js file (using File | Open, or drag and
drop) and render the UTF-8 characters without converting them to numeric
entities just fine.  Don't try putting them in an HTML file.  Don't know
about FrontPage 2000, though.

 I think I'm not going to use Notepad for making bidirectional arrays
 from now on! That is insane to go to such great lengths!

Yeah, it's definitely so.

 Not sure what you have in mind here, but at this point, I'll be glad
 just to make it work with ZWNJ.

In the JS code, try to replace the trailing ZWNJ-raa and ZWNJ-o with nothing
using a regex.
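
The regex idea looks roughly like this.  It is sketched in Python here, and a
JavaScript regex would be analogous.  The spellings are assumptions: "raa" is
taken to be U+0631 U+0627 and "o" to be U+0648, each preceded by ZWNJ
(U+200C), and strip_trailing_particle is a made-up helper name.

```python
import re

ZWNJ = "\u200c"
# Assumed spellings: "raa" as U+0631 U+0627 and "o" as U+0648.
TRAILING = re.compile(ZWNJ + "(?:\u0631\u0627|\u0648)$")

def strip_trailing_particle(word):
    """Remove a word-final ZWNJ+raa or ZWNJ+o, if present."""
    return TRAILING.sub("", word)

# U+06A9 U+062A U+0627 U+0628 followed by ZWNJ + raa
print(strip_trailing_particle("\u06a9\u062a\u0627\u0628\u200c\u0631\u0627"))
```

Writing the characters as \u escapes sidesteps the editor problems described
above, since the source file itself stays plain ASCII.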

HTH,
-
Ehsan Akhgari



RE: Miscellaneous web issues

2004-05-20 Thread Ehsan Akhgari
 You can re-live its creation here in the archives:
 http://lists.sharif.edu/pipermail/persiancomputing/2003-June/000538.html
[snip]

Thanks for the links.  Seems like a very handy keyboard.  BTW, why does the
Shift-Space combination not work?

 Done! Beautiful!
 I hope the Mozilla users appreciate all this trouble.

 Thanks again for all your help!

You're welcome! :-)

-
Ehsan Akhgari



RE: Miscellaneous web issues

2004-05-25 Thread Ehsan Akhgari
 What is notepad? A text editor? Text editors should not insert a UTF-8
 BOM either. The problem is that Microsoft sometimes invents
 non-standard things and then pushes them so hard that Unicode adds them to
 parts of the standard (or an FAQ). Microsoft conventions for .txt
 files in the Unicode FAQ look sarcastic to me.

Well, maybe you're right, but I don't see how a text editor is supposed to
know the encoding of a file without some kind of mark.  See, HTTP transfers
the character set using the Content-Type response header.  In HTML, it's
specified with a <meta http-equiv="Content-Type" ...> tag.  In XML, the
default encoding is UTF-8, and if a document is encoded in another encoding,
it must be specified in the <?xml ... ?> declaration.  Plain text files have
no means of identifying the character encoding, so a single text file can be
interpreted as UTF-7, UTF-8, UTF-16, UTF-32, etc. if there's nothing to
declare the exact character encoding used.

The point here is that the protocols which do not allow a BOM are those that
provide other means of specifying the character encoding.  A certain byte
stream can have multiple interpretations depending on what content encoding
you use to interpret it, and there must be some way to cut off this
confusion.
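
To make the point concrete, here is a rough sketch of how a reader of plain
text files might use the BOM as that mark.  The function names and the choice
of fallback encoding are assumptions for illustration; only the BOM byte
sequences themselves (taken from Python's codecs module) are fixed.

```python
import codecs

# Longest signatures first: the UTF-32 LE BOM begins with the UTF-16 LE BOM.
BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]

def sniff_bom(data):
    """Return (encoding, bom_length) for a leading BOM, or (None, 0)."""
    for bom, name in BOMS:
        if data.startswith(bom):
            return name, len(bom)
    return None, 0

def decode_text(data, fallback="utf-8"):
    """Decode bytes using the BOM if present, else a caller-chosen fallback."""
    encoding, skip = sniff_bom(data)
    return data[skip:].decode(encoding or fallback)

print(sniff_bom(codecs.BOM_UTF8 + b"salam"))
```

Without a BOM the function can only guess (here, by trusting the fallback),
which is exactly the ambiguity described above.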

YMMV,
-
Ehsan Akhgari



RE: Miscellaneous web issues

2004-05-25 Thread Ehsan Akhgari
  Thanks for the links.  Seems like a very handy keyboard.  BTW, why does
  the Shift-Space combination not work?

 Bug in Microsoft keyboard layout creation tool. Use Shift-B
 temporarily.

Thanks.

I've not done any work in this arena, so what I propose here might make no
sense.  Sorry if that's so.  But, the M$ page on the keyboard layout
creation tool says the tool simplifies the process of creating a keyboard
layout.  Would there be any way to assign ZWNJ to Shift+Space by coding the
keyboard layout tool manually?  If you can send me the C/C++ source file
off-list, I'll try to investigate it further.

If not, I guess Shift+B is not that bad as well.  The keyboard layout rocks,
even without having Shift+Space in place.  :-)

-
Ehsan Akhgari



RE: Persian-English Dictionary -- Was: Iranian Mac User group

2004-06-04 Thread Ehsan Akhgari
[snip]
 I'm sure this dictionary must have been funded by the Iranian
 government and no profits expected. I'm shocked to see that less than
 a dozen US universities have purchased it. I should think the author
 and publisher would be very happy to see it put online and all the
 efforts go to some use.  Surely they will agree if their name is kept
 with the data!  As for the technical part, I no longer have any doubts
 as to the abilities of the members of this group, especially after
 hearing the keyboard hack job for the sake of the ZWNJ earlier today!

:-)

I did the keyboard job just because I thought it's a lot easier to use
Shift+Space instead of Shift+B, and also because I was in the process of
typing in a lot of Persian data.  It took only about half an hour (not the
time to download the MSKLC tool of course) and improved my typing speed
considerably.

About your proposal, I'm personally interested in doing the technical part
of the job.  I volunteer to implement a web interface for the dictionary,
and I can also provide the hosting for the web interface.  I can provide
some amount of web space for the data as well, but I think we'll need other
people's help as well, because I would guess the whole data would be *huge*.
If the data has to reside on multiple web servers, I can code some sort of
distributed query mechanism which fetches the definitions from remote web
servers and displays them to the end user transparently.


-
Ehsan Akhgari



RE: Misinformation!

2004-06-04 Thread Ehsan Akhgari
 Here is a solution (in fact a hack) that if implemented correctly, can
 resolve some of the issues till people and Google start using correct
 software:

 With a little tweaking, the web servers can translate the correct
 Unicode to the incorrect unicode desired so much by the Win9X users.
 That is, the web server looks at the browser request, and if it can
 detect Win9X, translates all U+06CC's in the document to U+064A (and
 all other required translations). The same technique could be used to
 fool google into generating correct search results. That is, the web
 server generates a Win9X friendly version of the document and appends
 it to the original document. You can also allocate tags with which the
 user of the web server can disable or enable some of these features. This
 may even make one gain some advantage over other web hosting
 companies.

That solves half of the problem.  On Win9x, the key d on the keyboard
inserts an Arabic YEH, and on Win2K+, it inserts FARSI YEH.  So, if you use
this method, when a user types in a word containing yeh in Google's
search box on Win9x, they wouldn't find your site.

The best hack (or solution, as one might call it) I've found for this is
feeding Google a version of the page which contains both forms of each word
(using YEH and FARSI YEH) so that the chances of Google finding your page
for a certain keyword are maximized.  Of course, certain measures must be
taken to prevent bad results; for example, the proximity of the words must
not be disturbed.  Nevertheless, this will cause other problems, such as
distorted keyword density, which cannot be solved reliably.  The problem
must really be fixed in the search engine code, and such hacks have their
own downsides.  The search engine project I've been working on
(www.ariasearch.com) handles this (and the ARABIC KAF vs. KEHEH
problem) among other problems for searching in Persian text.
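
On the search engine side, the usual shape of such a fix is to fold both
spellings to one code point when indexing and again when querying.  A minimal
sketch of that idea follows.  It is not how AriaSearch or Google actually do
it, and fold_persian is a made-up name.

```python
# Fold the Arabic code points to the Persian ones before indexing or querying.
FOLD = str.maketrans({
    "\u064a": "\u06cc",  # ARABIC LETTER YEH -> ARABIC LETTER FARSI YEH
    "\u0643": "\u06a9",  # ARABIC LETTER KAF -> ARABIC LETTER KEHEH
})

def fold_persian(text):
    """Map Win9x-style YEH/KAF to the code points used on Win2K+."""
    return text.translate(FOLD)

win9x_query = "\u0643\u062a\u0627\u0628\u064a"   # typed with KAF and YEH
win2k_text = "\u06a9\u062a\u0627\u0628\u06cc"    # same word, KEHEH and FARSI YEH
print(fold_persian(win9x_query) == fold_persian(win2k_text))
```

Applying the same fold to documents and queries means neither side needs to
carry duplicate word forms, though it would need care on pages that mix in
Arabic-language text.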

 Of course, the solution above is only a transient one, and it is up to
 people to upgrade their Win9X machines to something that is
 Unicode-compliant, also it is up to Google to program their systems
 such that it can understand that both U+06CC and U+064A are the same
 shape and hence should be regarded the same for searching unless user
 requests otherwise. This is the same as case-insensitive search that
 is usually implemented by mapping all upper and lower case characters
 -- in documents and queries alike -- to uppercase.

Yeah that's right.  Of course great attention must be paid so that it
doesn't break Arabic search results.


-
Ehsan Akhgari


He who sees the abyss, but with eagle's eyes - he who with eagle's talons
grasps the abyss: he has courage.
-Thus Spoke Zarathustra, F. W. Nietzsche





RE: Misinformation!

2004-06-04 Thread Ehsan Akhgari
 There's a difference in the case of C++ standard and web
 standards:  Writing non-standard C++ code only produces compile-time
 problems, but if you happen to compile the code, it works correctly
 (or supposed to do so).

Well, that's not exactly so.  Some non-conformant behavior tends to generate
(maybe subtle) runtime behavior differences.  But I see what your point here
is.

 But it's quite a different case in web.
 30-40 percent is low enough to get ignored, counting that the other
 way you are sacrificing the other 60-70% for not being able to find
 the document by searching in Google.  And note that even with Win9x
 and a recent IE, and updated fonts, there's no problem.

I'd definitely do so if the Google search problem couldn't be solved.  But
I've been using a method I've mentioned in my other post to solve that
problem as well.  This was the best way of having the best of the two worlds
that I could think of, but I'm wide open for suggestions/improvements to
this idea.

 About using HTML entities, no matter what the encoding of the page is,
 HTML entities generate Unicode characters.

They do on most browsers, but browsers are not required to do so.  Consider
a browser which can't handle UTF-8 (well, or at all).

 It's quite common to see
 people exporting Persian documents in MS Word, and get an HTML page
 encoded in MS Arabic encoding, with Persian Yeh and Keh encoded in
 HTML entities.

Yes, and that will make their document even more difficult for search
engines to index.  And of course, I'd argue that CP1256/ISO-8859-6 is
not suitable for Persian documents, but that's another story perhaps.

 PS.  BTW, I just found that using Harakat (kasre, fathe, ...) also
 prevents a hit in Google search :(.  That's quite expected, but perhaps
 I should reconsider my habit of putting those tiny marks everywhere.

That's another sad fact.  I really think that Google must seriously consider
handling such details in their indexing process.  That's also one
of the things that AriaSearch.com handles.
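
For the Harakat case, one plausible approach is simply to drop the marks from
both the indexed text and the query.  This is only a sketch of that idea and
not a description of any real engine.  The range below covers U+064B through
U+0652 (fathatan to sukun), and other marks may need adding.

```python
import re

# U+064B..U+0652: fathatan, dammatan, kasratan, fatha, damma, kasra,
# shadda, sukun.
HARAKAT = re.compile("[\u064b-\u0652]")

def strip_harakat(text):
    """Remove the common vowel marks so marked and bare spellings match."""
    return HARAKAT.sub("", text)

marked = "\u06a9\u0650\u062a\u0627\u0628"  # the word with a kasra on the first letter
bare = "\u06a9\u062a\u0627\u0628"
print(strip_harakat(marked) == bare)
```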

---

Hmmm, now that we're here, how about gathering some volunteers who can work
with Google to fix some of these problems?  In the past, I've contacted
Google on a number of occasions about small problems in their services, and
they seemed quite willing to fix them.  Maybe we would hopefully have a more
Persian-friendly Google in the future this way.

If you feel that this is a good idea, I'd be pleased to take part in that
team.  Comments?

-
Ehsan Akhgari



RE: Persian-English Dictionary -- Was: Iranian Mac User group

2004-06-08 Thread Ehsan Akhgari
  I volunteer to implement a web interface for the dictionary,
 Excellent!
 You'll have to make it so that whether the user types in bi[ZWNJ]kaar,
 bikaar, or bi kaar, the word will be found!

Yes, that's right.  This is relatively easy to implement.
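
One simple way to get that behaviour is to index every headword under a key
with ZWNJ and spaces removed, and to build the same key from the query.  A
minimal sketch, in which the names and the in-memory dict are illustrative
assumptions and not a design for the real dictionary:

```python
import re

# ZWNJ (U+200C) and any whitespace are dropped so all spellings share a key.
SEPARATORS = re.compile("[\u200c\\s]+")

def lookup_key(word):
    """Key shared by 'bi<ZWNJ>kaar', 'bikaar' and 'bi kaar'."""
    return SEPARATORS.sub("", word)

index = {}

def add_entry(headword, definition):
    index.setdefault(lookup_key(headword), []).append((headword, definition))

def find(query):
    return index.get(lookup_key(query), [])

add_entry("bi\u200ckaar", "jobless")  # sample entry, romanized as in the post
print(find("bi kaar"))
```

The trade-off is that two different entries whose spellings differ only in
spacing land under one key, so a lookup returns a list, not a single entry.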

  but I think we'll need other people's help as well, because I would
  guess the whole data would be *huge*.
 Will this require separate dedicated server(s)?
 (I'm thinking about Behdad and the Persian Digital Library here...)

Hmmm, not necessarily *dedicated*.  As long as there's enough web space for
some part of the data to reside on the server, and I have access to it to
install an application which processes the queries locally, it doesn't
really have to be dedicated, unless the server's already fully loaded by
other tasks.  I don't think we'll need dedicated servers for this job.  The
process of searching can be done fast enough.

-
Ehsan Akhgari



RE: Persian UTF-8 MySql collation

2004-07-03 Thread Ehsan Akhgari
 Ehsan - are you thinking about adding glibc collation to the
 strings/ctype-MYSET.c file? Or something more fundamental?

Well, to tell you the truth, I'm not really sure, since I've not checked the
MySQL source tree yet.  But yes, I'm going to see if glibc support can be
incorporated into MySQL's charset handling mechanism.

 I think you and the team I'm working with are trying to do
 the same thing - it would be great if we could work together
 and come up with a solution that anyone else can use too.

I looked around a bit, and it seems like MySQL 4.1.x will be supporting
UTF-8.  MySQL 4.0.x doesn't have that support (the version I'm using on the
production server is 4.0.18-standard.)  Because of that, incorporating that
support into MySQL might require a lot more work than I currently imagine.
Unfortunately in that case, I'll have to leave MySQL as it is, and sort the
data on the client side (less efficient, but requiring less development
time), and since the application I'm working on doesn't store very big
chunks of data in the db, I may decide to sacrifice performance for
development time.
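
Client-side sorting could look roughly like the sketch below.  It is a
simplification that ranks letters by their position in the 32-letter Persian
alphabet (with alef-madda placed first) and ignores the finer rules a real
collation such as glibc's would apply.  The names are made up, and the letters
are written as \u escapes to keep the source ASCII.

```python
PERSIAN_ALPHABET = (
    "\u0622\u0627\u0628\u067e\u062a\u062b\u062c\u0686\u062d\u062e"  # aa a b p t s j ch h kh
    "\u062f\u0630\u0631\u0632\u0698\u0633\u0634\u0635\u0636\u0637"  # d z r z zh s sh s z t
    "\u0638\u0639\u063a\u0641\u0642\u06a9\u06af\u0644\u0645\u0646"  # z ain gh f q k g l m n
    "\u0648\u0647\u06cc"                                            # v h y
)
RANK = {ch: i for i, ch in enumerate(PERSIAN_ALPHABET)}

def persian_key(word):
    """Sort key: alphabet position for Persian letters, code point otherwise."""
    return [(0, RANK[ch]) if ch in RANK else (1, ord(ch)) for ch in word]

words = [
    "\u062a\u0627\u0628",        # taab
    "\u067e\u062f\u0631",        # pedar
    "\u0698\u0627\u0644\u0647",  # zhaaleh
    "\u0633\u06cc\u0628",        # sib
]
print(sorted(words, key=persian_key))
```

With this key the order is pedar, taab, zhaaleh, sib, whereas a plain
code-point sort would put pe and zhe after all the letters shared with
Arabic.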

 What's involved in creating a collation file? These pages:
 http://dev.mysql.com/doc/mysql/en/Adding_character_set.html
 http://dev.mysql.com/doc/mysql/en/Character_arrays.html
 http://dev.mysql.com/doc/mysql/en/String_collating.html
 seem to say that it's not too difficult, if you know what
 you're doing?
 (Which I don't. I'm just a humble PHP programmer)

Well, that seems to be for single-byte code pages.  The Persian character
coding system used in glibc is UTF-8, and that will require patching MySQL
source code.  And like I said, because of MySQL's lack of UTF-8 support, it
might require more work than I imagine.  I think I can handle it from a
technical point of view (I'm good at C/C++) but I'm quite pressed for free
time...

 ... it seems it would be great to create a mySql Persian
 collation file rather than changing the source, with all the
 problems that would lead to of having to re-patch the code
 every time there's a new MySql release? Or is that inevitable?

Well, if we decide to change the MySQL source code, we can submit our
patches to MySQL team, and hopefully they will incorporate it into their new
releases.  Of course in that case we might have to look into adding that
support to MySQL 4.1.x as well (if it doesn't already have it.)  So there's no
need for re-patching.  There's just a need for time!  :-)

In case I decide not to spend the time in the development of Persian
collation support in MySQL, I'll be glad to help your team in case they need
technical programming help.  In that case, I'll let you know off-list
(remind me if you don't get any note from me within a week, please.)


-
Ehsan Akhgari



RE: Persian UTF-8 MySql collation

2004-07-05 Thread Ehsan Akhgari
 Right. I was thinking about adding UTF-8 Persian collation to MySql
 4.1.x
 - our project will involve a fairly large amount of data, so we'd like
 to have the option of sorting at the DB level.

I've never tested MySQL 4.1.x.  Have you tried it?  How is the UTF-8
support?  Have you tried Persian collation in MySQL 4.1.x to see how much
better it is compared to 4.0.x?

Unfortunately I won't be able to look into 4.1.x at this time, since it's
Beta, and we don't use Beta products on our production servers, so doing so
would do no good to my project.

 ... which is why we're hoping to use MySql 4.1.x

I'd give it a try if I were in your shoes.

 Nope, no Persian collation file for MySql 4.1.x as far as I can see
 (which is where we came in!)

How does 4.1.x get Persian sorting?  Like 4.0.x?


-
Ehsan Akhgari



RE: Linux teaching website

2004-08-07 Thread Ehsan Akhgari
 BTW Ehsan, I consider this off-topic. This is about Persian support in
 software and computers, software written to handle Persian text, etc.
 This is not a list to gather volunteers for a website that happens to
 be about an operating system and in Persian.

 Not that I'm not personally interested, but only that it is off-topic.

Oh, I'm sorry for posting off-topic to the list.  I'll try not to do so
again.  :-)

-
Ehsan Akhgari



RE: Vi/Emacs editor with RTL support

2004-09-01 Thread Ehsan Akhgari
 Not anything really useful.  Vim has a rightleft mode (:set
 rightleft), which is useful for ONLY RIGHT-TO-LEFT text.

 Emacs, it's worse:  there's an emacs-unicode branch, an
 emacs-bidi branch, and the emacs-head branch.  They have been
 trying to merge the three of them for a few years now!

Thanks for your reply, Behdad.

So, is there any editor you would recommend that has good support for
bidirectional (Persian and English) text, and preferably supporting HTML
(but an editor without HTML support will also be just fine)?  The latest one
I'm working with is Bluefish, but it has some minor problems, and I'm
looking to see if there's something better available.

TIA,
-
Ehsan Akhgari

Learn Linux in Persian: http://www.persian-linux.org/





RE: farsiweb.info

2004-10-31 Thread Ehsan Akhgari
 Hi friends,

 The FarsiWeb Project's website http://farsiweb.info/ is now
 up-to-date with a new Wiki system.

Congrats on the new site!

I took a quick look, and I have a comment regarding the design.  It seems to
me that you're using a transparent PNG file as the background for the pages.
IE doesn't support this feature of PNG files correctly, so the pages render
half unreadable on IE.  I suggest changing this, and the easiest way would
be not to use a transparent PNG (no need for that, anyway - just let the
background be white.)  Fortunately real browsers (Firefox and Mozilla)
render it just fine!

Other than that, the layout seems very nice.  Thanks for your efforts.

-
Ehsan Akhgari



RE: farsiweb.info

2004-11-01 Thread Ehsan Akhgari
 Ah, that's a good sign, that none of us at FarsiWeb uses IE anymore!
 BTW, IIRC, 8bit transparent PNG works in IE too.

I'm not sure.  What I can say for sure is the image won't render correctly
in IE.  Hmm, BTW, at a second look, IE fails to render the layout correctly
as well!  Of course that's not as bad as how the background image looks.

-
Ehsan Akhgari



RE: farsiweb.info

2004-11-01 Thread Ehsan Akhgari
 Humm, would you check http://farsitex.org/?  I think it worked in IE
 when I designed it.

Done.  It looks pretty good; only the non-link items in the left hand menu
might not be very readable (or it might be my lack of perfect sight.)

-
Ehsan Akhgari


Light without eyes illuminates nothing.





RE: Miscellaneous web issues

2004-12-01 Thread Ehsan Akhgari
 Roozbeh, it is a long time and I don't remember your answer to this
 email. What happened to this new dll?

AFAIK, it still hasn't been put on SourceForge.  If you're interested, I can
mail it to you off-list.

-
Ehsan Akhgari



RE: Parsnegar to Unicode conversion AND phonetic Farsi keyboard with English keyboard

2004-12-18 Thread Ehsan Akhgari
Mr. Khazaee misdirected the email to me personally.  I thought I'd send it
to the whole list.

 -Original Message-
 From: khazaee [mailto:[EMAIL PROTECTED]
 Sent: 2004/12/18 10:27 AM
 To: Ehsan Akhgari
 Subject: RE: Parsnegar to Unicode conversion AND phonetic
 Farsi keyboard with English keyboard


 Do you want to define a user-defined keyboard for the Linux
 operating system or not?
 For Linux, you can refer to the Persian keyboard
 on farsilinux.org.
 You can easily change the position of the Persian letters on your keyboard.
 Regards.
 -- Original Message --
 From: Ehsan Akhgari [EMAIL PROTECTED]
 Date:  Fri, 17 Dec 2004 22:50:03 +0330

 
 
 Also, I was wondering if anyone knows a way of defining a user-defined
 keyboard to use with Farsi Unicode, similar to Parsnegar, which allows
 to define a phonetic Farsi keyboard with English keyboards, so that,
 when typing in Microsoft Word in Farsi, I could use key J for letter jim,
 A for letter alef, etc.
 
 You need your custom keyboard layout.  M$ has a tool for that:
 Microsoft Keyboard Layout Creator.  You can use it to create your fully
 (well, nearly fully) customized keyboard layout for Windows.
 
 -
 Ehsan Akhgari
 


RE: openoffice zwnj

2005-01-04 Thread Ehsan Akhgari
 That's a famous bug that will happen in applications. KDE also had
 that bug for quite a time until Behdad fixed it. The bug is because
 the application or the rendering engine asks the font for a glyph for
 the character, where it shouldn't.
 The application or the rendering engine should not pass ZWNJ (and a
 few other invisible Unicode characters) down.

Great to know it's been fixed.  Do you know exactly which version of KDE
first included the fix?  I've noticed that this bug seriously
affects the usability of KDE for Persian computing.
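
A rough sketch of the idea quoted above, in Python: characters such as ZWNJ
are filtered out before anything is looked up in the font.  The set of code
points is an illustrative subset chosen for this example, and
code_points_for_font is a hypothetical name, not KDE's or OpenOffice's
actual code.

```python
# Code points the text layer should handle itself instead of asking the
# font for a glyph.  Illustrative subset only, not a complete list.
INVISIBLE = {
    0x200C,  # ZERO WIDTH NON-JOINER (ZWNJ)
    0x200D,  # ZERO WIDTH JOINER
    0x200E,  # LEFT-TO-RIGHT MARK
    0x200F,  # RIGHT-TO-LEFT MARK
}


def code_points_for_font(text):
    """Return the code points that should be looked up in the font."""
    return [ord(ch) for ch in text if ord(ch) not in INVISIBLE]
```

A real shaping engine would still have to honor ZWNJ when choosing letter
forms; only the glyph lookup skips it.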

Thanks,
-
Ehsan Akhgari



RE: Farsi in MS Powerpoint

2005-02-01 Thread Ehsan Akhgari
 Hi

 I would like to write farsi in microsoft powerpoint for presentation
 purposes. Would it be possible at all? If yes, how this can be done?
 What alternatives are available.

 I appreciate your help.

It is possible.  Simply switch to a Persian keyboard layout and type
your text.  I seem to remember that some versions of MS PowerPoint did not
support right-to-left text properly (I don't remember exactly what the
problem was).

A very good alternative to MS PowerPoint is OpenOffice.org
(www.openoffice.org) version 1.1.3.  I have used it to create Persian
presentations with no problems.


-
Ehsan Akhgari



RE: Persian in Windows Applications

2005-02-02 Thread Ehsan Akhgari





  
  I'm going to program and develop a Windows application
  and I want to use Persian in the user interface.
  I'm using Windows XP and Unicode in the programming language.
  But is there any trick or rule to make the application work fine in
  older Windows versions? (98, ME)
  Or does just using Unicode make everything fine?

Win9x does not support Unicode internally.  M$ has developed the so-called
MSLU (Microsoft Layer for Unicode)[1], which provides Unicode compatibility
at the Windows API level for Win9x.  I have used it, and it does work, but be
warned that these OSes still do *not* support Unicode natively; all MSLU can
do is implement API stubs for the Unicode versions of Win32 functions (such
as CreateFileW), which allows you to build your app in Unicode mode in
Visual C++.

What I've ended up doing in the past is to build the whole UI in HTML and
embed an HTML rendering engine in my app.  I've used the WebBrowser control
(the same control used by IE).  This requires you to distribute a
customized[2] version of IE with your app which has "Arabic" support
built in, and to write some JavaScript code that lets the user type Persian
in your application even if they don't have a Persian keyboard installed (you
can find several JS snippets on the web to use as a starting point).  You can
also use Gecko, Mozilla's great HTML rendering engine.  If you decide to use
the WebBrowser control, check out
http://www.beginthread.com/Article/Ehsan/WebBrowser%20Goodies/
for some articles about customizations of the control that you may need in
your own applications.

All of this, of course, applies to Visual C++.  If you use some other
programming tool, you'll have to do your own research, though I think that
few of them support MSLU.

[1] You can download it from
http://www.microsoft.com/msdownload/platformsdk/sdkupdate/psdkredist.htm.
[2] You can deploy a customized IE install using the IE Administration Kit
(IEAK).


-Ehsan Akhgari




Re: problem in mysql data display

2005-04-13 Thread Ehsan Akhgari
mzz wrote:
 Hi everyone, I have a problem with a MySQL database:
 when I review my table containing Persian data in phpMyAdmin,
 I can see and edit the data correctly, but when I
 use my script to query my tables using PHP, it displays my
 table data as '?' (question marks).
 I am using MySQL server 4.1,
 PHP 4.xx, and UTF-8 encoding in my pages.
 OS: Win2000 Server.
 Regards,
 zarbizade.

Can you dump the table into a file from the PHP script and then make
sure the data in the file is correct (and in UTF-8 encoding)?
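
One possible way to do that check on the dumped file, sketched in Python
rather than PHP.  The function name and the returned keys are made up for
this example.  It only reports whether the bytes are valid UTF-8, how many
Arabic-script letters survived, and how many literal '?' characters are
present.

```python
def inspect_dump(raw: bytes) -> dict:
    """Summarize whether dumped bytes look like intact UTF-8 Persian text."""
    try:
        text = raw.decode("utf-8")
        valid = True
    except UnicodeDecodeError:
        text = ""
        valid = False
    # Count characters from the basic Arabic block (U+0600..U+06FF),
    # which covers the Persian letters.
    arabic = sum(1 for ch in text if "\u0600" <= ch <= "\u06FF")
    return {
        "valid_utf8": valid,
        "arabic_letters": arabic,
        "question_marks": text.count("?"),
    }
```

If the dump is valid UTF-8 but contains only question marks, the data was
probably already lost between the server and the script, not in the page
encoding.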

Ehsan


Re: problem in mysql data display

2005-04-13 Thread Ehsan Akhgari
Sadeq Naqashzade wrote:
 Salaam,
 One of my friends has the same problem (but I don't).  I'm using mysqli
 and he is using the mysql extension.  Try mysqli; this may help you.
 - Sadeq

Thanks, but I wasn't the one who asked the question!  I'm CCing the OP 
as well as the list.

Ehsan


Re: MySQL 4.0, FULL-TEXT Indexing and Search Arabic Data, Unicode

2005-11-28 Thread Ehsan Akhgari





  
  Dear Ehsan,
  You suggested a creative solution.  Thank you.
  My application consists of a database and two user interfaces.
  The first UI is used for data entry, where I parse a given XML file,
  extract and "Romanize" its data - based on a "Persian-Roman Conversion
  Map" - and then insert it into the DB.  Luckily, PHP provides a very fast
  function for such conversions, named strtr().
  Now I have a "Roman DB".
  The second UI is used for data retrieval (searching), where I
  "Romanize" the given search argument and look for it through the DB
  records.  The results will be decoded and converted to Persian before
  being sent to stdout.

I've actually implemented this approach in a project.  I have not yet
published the code, but if you want, I can make it available under the GPL.
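
For readers following along, the romanization round trip described above
might look roughly like this in Python (the thread itself uses PHP's
strtr()).  The four-letter map is a made-up sample, not the actual
"Persian-Roman Conversion Map" from either project.

```python
# HYPOTHETICAL sample map: a real conversion map would cover the whole
# alphabet and must stay one-to-one so that it can be reversed.
PERSIAN_TO_ROMAN = {
    "\u0633": "s",  # seen
    "\u0644": "l",  # lam
    "\u0627": "A",  # alef
    "\u0645": "m",  # mim
}

TO_ROMAN = str.maketrans(PERSIAN_TO_ROMAN)
TO_PERSIAN = str.maketrans({v: k for k, v in PERSIAN_TO_ROMAN.items()})


def romanize(text: str) -> str:
    """Convert Persian text to its romanized form before storing or searching."""
    return text.translate(TO_ROMAN)


def deromanize(text: str) -> str:
    """Convert romanized search results back to Persian for display."""
    return text.translate(TO_PERSIAN)
```

Note that the reverse step is only safe if the stored text never contains
genuine Latin letters that also appear in the map.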

Ehsan


Re: IranSystem to Unicode (UTF-8) converter

2005-12-06 Thread Ehsan Akhgari





  Hello, I don't know whether you have this software or not;
  if you do, please send it to me.

I wrote a PHP script to do exactly that a couple of days ago at work.  It's
relatively simple, using Roozbeh Pournader's conversion table.  All you have
to do is read the input string byte by byte and output the corresponding
UTF-8 sequences in reverse order.  The only gotcha I ran into is that Latin
characters (or numbers) in the middle of the text should not be reversed.
This is caused by the way IranSystem encodes strings.
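
The steps above can be sketched as follows, in Python rather than PHP.  This
is not the script mentioned in the message, and SAMPLE_TABLE uses
placeholder byte values purely for illustration; it is NOT the real
IranSystem table, which would have to come from the conversion table
referred to above.

```python
# HYPOTHETICAL byte-to-character table with placeholder byte values.
SAMPLE_TABLE = {
    0x90: "\u0627",  # alef
    0x91: "\u0628",  # beh
    0x92: "\u062F",  # dal
}


def convert(raw: bytes, table=SAMPLE_TABLE) -> str:
    """Map bytes through the table and reverse the result, keeping runs of
    ASCII letters and digits in their original order."""
    out = []  # converted pieces, still in input order
    run = []  # current run of ASCII letters/digits
    for byte in raw:
        if byte < 0x80 and chr(byte).isalnum():
            run.append(chr(byte))
            continue
        if run:
            # Pre-reverse the run so the final reversal restores it.
            out.append("".join(run)[::-1])
            run = []
        fallback = chr(byte) if byte < 0x80 else "\ufffd"
        out.append(table.get(byte, fallback))
    if run:
        out.append("".join(run)[::-1])
    return "".join(out)[::-1]
```

Bytes missing from the table are passed through if they are ASCII and
replaced with U+FFFD otherwise; encode the returned string as UTF-8 when
writing it out.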

Ehsan