Re: [HACKERS] [GSoC] (Is it OK to choose items without % mark in theToDoList) (is it an acceptable idea to build index on Flash Disk)

2008-03-25 Thread mx

 On Tue, 25 Mar 2008, mx wrote:

  The atomic unit of flash is the page (typically 512~2048 bytes). Pages are
  organized into blocks, typically of 32 or 64 pages. All read and write
  operations happen at page granularity, but erase operations happen
  at block granularity.

 You made a subtle switch here I wanted to emphasise.  Your original
 message suggested flash is an increasingly important storage mechanism
 because flash devices like SSD drives are going to be more popular in the
 future; that is true.  However, what you're describing is something more
 like how flash is used in raw embedded systems applications.  The kinds of
 SSD drives that are becoming popular for database use abstract away all of
 this low-level block mess and hide it with approaches like sophisticated
 write-leveling algorithms.

Maybe I gave too much detail about raw flash devices.
In fact, what I want to show is the asymmetric speed of reads and writes.
Any flash device, including an SSD, has this characteristic.
For a Samsung 64GB SSD (PATA IDE, 2.5-inch), the maximum sequential read
speed is 57MB/s, while the maximum sequential write speed is 38MB/s
according to the product datasheet.

Anyway, in terms of performance as seen from outside, a write is more
expensive than a read. Some strategy that trades reads for writes may be
worth considering. So the asymmetric speed of reads and writes means it is
still valuable to do some work on SSDs.
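
To put rough numbers on the read/write asymmetry above, here is a small
Python sketch. It takes the two datasheet figures quoted above, treating
57MB/s as the sequential read rate and 38MB/s as the sequential write rate.
The function names are made up for illustration, and real workloads
(random I/O in particular) will not follow peak sequential rates.

```python
# Rough arithmetic on the datasheet figures quoted in this message.
# Assumption: 57 MB/s is sequential read, 38 MB/s is sequential write.
READ_MB_S = 57.0
WRITE_MB_S = 38.0


def transfer_seconds(megabytes, rate_mb_s):
    """Time to move `megabytes` at a steady `rate_mb_s`."""
    return megabytes / rate_mb_s


def extra_reads_affordable(write_mb_saved):
    """MB of extra sequential reads that take as long as the writes avoided."""
    return transfer_seconds(write_mb_saved, WRITE_MB_S) * READ_MB_S


# Reads are 1.5x as fast as writes on these figures.
ratio = READ_MB_S / WRITE_MB_S
```

On these figures, a strategy that avoids 38MB of writes still breaks even if
it costs up to 57MB of additional sequential reads.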

You don't (and possibly can't) even know what
 the underlying structure is like.  And even if you did, the fact that
 there's always a regular operating system and filesystem underneath
 PostgreSQL writes will make it uncertain whether the writes are only
 touching the tiny portion of flash you want to target anyway.  They may
 write a whole OS block regardless.

Yeah, you're right. This is the most confusing thing. I wish my thesis
work could be independent of the low-level flash device. But that's very
hard in fact, just as you said.
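
As a rough illustration of the point about whole-block writes, here is a
minimal Python sketch. It assumes PostgreSQL's default 8192-byte block size
as the I/O unit and a 2048-byte flash page (the upper end of the range
quoted earlier); the function names are hypothetical, and since an SSD hides
its real layout, the flash page count is only nominal.

```python
import math

PG_BLOCK_SIZE = 8192    # PostgreSQL's default block size (BLCKSZ)
FLASH_PAGE_SIZE = 2048  # assumed flash page size, not taken from any device


def flash_pages_touched(bytes_changed, io_unit=PG_BLOCK_SIZE,
                        flash_page=FLASH_PAGE_SIZE):
    """Nominal flash pages written when whole I/O units are flushed."""
    io_units = math.ceil(bytes_changed / io_unit)
    bytes_written = io_units * io_unit
    return math.ceil(bytes_written / flash_page)


def write_amplification(bytes_changed, io_unit=PG_BLOCK_SIZE):
    """Bytes actually written divided by bytes logically changed."""
    io_units = math.ceil(bytes_changed / io_unit)
    return io_units * io_unit / bytes_changed
```

Under these assumptions, changing 128 bytes in one block still writes the
full 8192 bytes, which is 64 times the data that actually changed.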

-- 
Have a good day;-)
Best Regards,
Xiao Meng

━
Data and Knowledge Engineering Research Center,CST
Harbin Institute of Technology, Harbin, China
Gtalk: [EMAIL PROTECTED]
MSN: [EMAIL PROTECTED]
Blog: http://xiaomeng.yo2.cn

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [GSoC] (Is it OK to choose items without % mark in theToDoList) (is it an acceptable idea to build index on Flash Disk)

2008-03-25 Thread mx
Thank you for all of your advice.
I think you're right. I should be more realistic. There is so much
work to do if I want to work on flash disks. It's too difficult
to complete the task in only one summer. Obviously, it's not an
appropriate project idea for GSoC anyway.
Maybe I'll do it in the future, after I've done enough of my
thesis work.

So, I have finally decided to focus on the project idea of improving the
hash index for now. It's more valuable, and also challenging.

Any suggestions about the project idea of improving the hash index?

-- 
Have a good day;-)
Best Regards,
Xiao Meng

━
Data and Knowledge Engineering Research Center,CST
Harbin Institute of Technology, Harbin, Heilongjiang, China
Gtalk: [EMAIL PROTECTED]
MSN: [EMAIL PROTECTED]
Blog: http://xiaomeng.yo2.cn



[HACKERS] [GSoC] (Is it OK to choose items without % mark in theToDoList) (is it an acceptable idea to build index on Flash Disk)

2008-03-24 Thread mx
Hello, everyone!
I'm a student in China, and I'm preparing for GSoC 2008 these days.
I have two questions about GSoC.

1. There's a paragraph about the Example Proposal Ideas on the PostgreSQL
Summer Projects website.

*TODO Items*: A number of the items on our TODO list have been marked as
 good projects for beginners who are new to the PostgreSQL code. Items on
 this list have the advantage of already having general community agreement
 that the feature is desirable. These items should also have some general
 discussion available in the mailing list archives to help get you started.
 *You can find these items on the TODO list
 (http://www.postgresql.org/docs/faqs.TODO.html); they will be marked
 with a percent sign (%).*


I didn't pay attention to this paragraph before, so I chose some items
without % in the list.
*Is it OK?* By the way, I'm writing a proposal for multi-column hash now.

2. I'm currently in my fourth year of studies, and I'm in a lab doing
database research.
My thesis work is about B-Tree indexes on NAND flash disks. I want to do it
based on PostgreSQL.
I know an embedded server is a feature that PostgreSQL doesn't want. But
flash disks are developing very fast. The trend is that flash disks will
replace magnetic disks one day, just as Jim Gray said: "Tape is dead, disk is
tape, flash is disk", though nowadays flash devices are only widely used in
embedded devices.
*So, how about a project idea on NAND flash disks that is not aimed at
resource-limited environments?*
*Is it an acceptable idea?*

Anyway, hope to hear from you, Thanks!

-- 
Best Regards,
Meng Xiao



Re: [HACKERS] [GSoC] (Is it OK to choose items without % mark in theToDoList) (is it an acceptable idea to build index on Flash Disk)

2008-03-24 Thread mx
Thank you for your suggestion!

 The biggest problem with the hash index is currently that there's no
 significant performance advantage over b-tree. If you want to work on hash
 indexes, I would suggest doing benchmarking and looking at ways to
 improve performance, before spending time on making it multi-column
 capable. And missing WAL logging is a big issue as well.

It's a good suggestion! My work is useless if the performance of the hash
index is not good enough.
I'll follow your suggestion and consider improving hash performance first.
It's more challenging and exciting work.

On Tue, Mar 25, 2008 at 12:23 AM, Heikki Linnakangas 
[EMAIL PROTECTED] wrote:

 Maybe, hard to tell without more details. What difference does it make
 if the b-tree is on a flash device, as opposed to disk? What's different
 in general when you run on a flash disk?

 The embedded server idea in the not wanted list refers to the idea
 of running PostgreSQL in the same process as the client. If I understood
 you correctly, you're proposing something quite different.


OK, I'll explain it in more detail.

The atomic unit of flash is the page (typically 512~2048 bytes).
Pages are organized into blocks, typically of 32 or 64 pages.
All read and write operations happen at page granularity, but erase
operations happen at block granularity.

Flash has a weird characteristic: erase-before-write. You can't just
overwrite a page; you have to erase the whole block and then write the
page. So a read operation is faster than a write operation (by about
2~200 times, depending on the device).
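
To make the page/block description above concrete, here is a minimal toy
model in Python. It is only a sketch: the class name, method names and the
chosen sizes are illustrative assumptions, not a real device interface, and
a real SSD controller would remap pages rather than rewrite a block in
place.

```python
# Toy model of NAND flash: reads and programs work on pages, erases on blocks.
# All names and sizes here are illustrative assumptions, not a real device API.

PAGE_SIZE = 2048        # bytes per page (the text gives 512~2048 as typical)
PAGES_PER_BLOCK = 64    # the text gives 32 or 64 as typical


class ToyFlash:
    def __init__(self, num_blocks):
        self.pages = [None] * (num_blocks * PAGES_PER_BLOCK)  # None = erased
        self.erase_count = 0
        self.program_count = 0

    def block_of(self, page_no):
        return page_no // PAGES_PER_BLOCK

    def read(self, page_no):
        return self.pages[page_no]

    def program(self, page_no, data):
        # A page can only be programmed while it is in the erased state.
        if self.pages[page_no] is not None:
            raise ValueError("page must be erased before it is written")
        self.pages[page_no] = data
        self.program_count += 1

    def erase_block(self, block_no):
        start = block_no * PAGES_PER_BLOCK
        for p in range(start, start + PAGES_PER_BLOCK):
            self.pages[p] = None
        self.erase_count += 1

    def overwrite_in_place(self, page_no, data):
        # Naive update: save the block, erase it, then write every live page back.
        block_no = self.block_of(page_no)
        start = block_no * PAGES_PER_BLOCK
        saved = self.pages[start:start + PAGES_PER_BLOCK]
        saved[page_no - start] = data
        self.erase_block(block_no)
        for offset, content in enumerate(saved):
            if content is not None:
                self.program(start + offset, content)
```

In this model, updating one page of a block that holds two live pages costs
one block erase plus two page programs, which is where the read/write
asymmetry comes from.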

It's a big problem when we just run a database designed for magnetic
disks. We just overwrite a page when we update a B-Tree, but that's not a
good way to update on a flash disk. Currently, there are some research
results on this problem. They use a method similar to log-structured file
systems, where every node is encoded as many log entries. So they can
reduce the cost of updates by using a log.
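
The log-based idea can be sketched roughly as follows in Python. This is
only an illustration of the general approach, assuming a node is kept as a
base image plus appended log entries; the class and method names are
hypothetical and it is not the scheme from any particular paper.

```python
# Sketch of a B-tree node kept as a base image plus appended log entries.
# Illustration of the general idea only; all names are hypothetical.

class LoggedNode:
    def __init__(self, keys=()):
        self.base = sorted(keys)   # last fully written node image
        self.log = []              # appended entries: ("ins" | "del", key)

    def insert(self, key):
        # Append a small log entry instead of overwriting the node's page.
        self.log.append(("ins", key))

    def delete(self, key):
        self.log.append(("del", key))

    def keys(self):
        # Reads cost more: the log is replayed on top of the base image.
        current = set(self.base)
        for op, key in self.log:
            if op == "ins":
                current.add(key)
            else:
                current.discard(key)
        return sorted(current)

    def compact(self):
        # Occasionally rewrite the node image once and drop the log.
        self.base = self.keys()
        self.log = []
```

The design trades extra work on reads (replaying the log) for fewer
expensive page rewrites, which fits the asymmetric speed of flash.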

In my opinion, we would have to change the access method and some parts of
the storage manager greatly. Is it too hard for a beginner to take on as a
GSoC project?


Finally, please make some suggestions, thanks!

-- 
Have a good day;-)
Best Regards,
Meng Xiao
