On Wed, Mar 4, 2009 at 12:24 PM, John Machin <sjmac...@lexicon.net> wrote:
> On 4/03/2009 2:48 PM, Peng Huang wrote: > > On Wed, Mar 4, 2009 at 11:40 AM, John Machin <sjmac...@lexicon.net> > wrote: > > > >> On 4/03/2009 2:12 PM, Peng Huang wrote: > >>> Hi Igor Tandetnik, > >>> > >>> Thanks for your quick reply. > >>> > >>> Your solution works. But in some cases, each y%d may has two or three > >>> choices. So the SQL will become very complex, we need ( 2 * 2 * 2 * 2) > >> sub > >>> where statements. Does SQLite have some build-in features to optimize > >> those > >>> kinds of SQL statements? Or do you have other suggestions to optimize > the > >>> database of SQL statements? > >>> > >>> For example: > >>> "SELECT * FROM py_phrase WHERE ylen = ? and (y0 = ? or y0 = ?) and (y1 > = > >> ? > >>> or y1 =? ) and (y2 = ? or y2 = ?) and (y3 = ? or y3 = ? or y3 = ?) > ORDERY > >> BY > >>> user_freq DESC, freq DESC" > >>> > >> (y3 = ? or y3 = ? or y3 = ?) > >> is *logically* equivalent to > >> (y3 IN (?, ?, ?)) > >> > >> Does it run at the same speed? > > > > > > Its improvement is not obviously. :( > > OK, next try: > > Input is 1 to 4 pinyin characters ... with tone or without tone? > Even with tone, (number of different pinyin characters (a few hundred, > about 410 IIRC)) * 5 tones will fit easily in 16 bits. So a possible > pronunciation of a 4-character phrase will easily fit in a 64-bit number. > > So instead of complications with y0, y1, y2, y3 just use one composite > number e.g. (((y0 * 10000 + y1) * 10000) + y2 * 10000) + y3 -- note that > for phrases shorter tan 4 characters, you would need a special code > (zero) for "no character at all". > > Instead of complicated logic to generate varing sizes and shapes of > <query1> union all <query 2> ...., you just have to produce a list of > the possible composite keys and select * from the_table where > composite_key in (?, ?, ?,...) > This is a good idea to make thing easily. But it is not suitable for other kinds of use cases. I don't know if you know Chinese and Hanyu Pinyin. Let me explain it for you. Most of Chinese chars' pronunciation (Pinyin) has two parts - vowel and consonant. One use case is some users will not type complete Pinyin for each Chinese chars. For example: User types complete Pinyin for first char, but just types consonant for second and third chars. In this case, I will use below SQL cmd. The columns s%d is for chars' consonant. So composing four y%d into one is not suitable. :( > > I'd be interested to get Igor's thoughts on how much speed up might be > achievable by doing this. For Igor's idea. I did not write test code. Just type SQL command in sqlite3 program. The improvement is obviously. I think it is acceptable for user typing. > > > Another thought: if you have enough memory (is this for a PC or a > phone?), an in-memory hash-table might be a good idea. It is a desktop service. Other applications need it to type Chinese. So it should not use many memory. BTW, the database of phrases is about 100M includes index. > > > HTH, > > John > _______________________________________________ > sqlite-users mailing list > sqlite-users@sqlite.org > http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users > _______________________________________________ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users