Re: [HACKERS] [GSoC] (Is it OK to choose items without % mark in the ToDoList) (is it an acceptable idea to build index on Flash Disk)
> On Tue, 25 Mar 2008, mx wrote:
>> The atomic unit of flash is the page (typically 512 to 2048 bytes). Pages are organized into blocks, typically of 32 or 64 pages. All read and write operations happen at page granularity, but erase operations happen at block granularity.
>
> You made a subtle switch here I wanted to emphasise. Your original message suggested flash is an increasingly important storage mechanism because flash devices like SSD drives are going to be more popular in the future; that is true. However, what you're describing is something more like how flash is used in raw embedded systems applications. The kinds of SSD drives that are becoming popular for database use abstract away all of this low-level block mess and hide it with approaches like sophisticated wear-leveling algorithms.

Maybe I gave too much detail about raw flash devices. What I really want to show is the asymmetric speed of reads and writes. Any flash device, including an SSD, has this characteristic. For a Samsung 64GB SSD (PATA IDE, 2.5"), the maximum sequential read is 57MB/s, while the maximum sequential write is 38MB/s according to the product datasheet. Anyway, seen from the outside in terms of performance, a write is more expensive than a read, so a strategy that trades reads for writes may be worth considering. That asymmetry is why it is still valuable to do some work on SSDs.

> You don't (and possibly can't) even know what the underlying structure is like. And even if you did, the fact that there's always a regular operating system and filesystem underneath PostgreSQL writes will make it uncertain that the writes are only touching the tiny portion of flash you want to target anyway. They may write a whole OS block regardless.

Yeah, you're right. This is the most confusing part. I wish my thesis work could be independent of the low-level flash device, but in fact that is very hard, just as you said.
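As a rough illustration of the "trade reads for writes" idea, here is a small sketch that uses only the two datasheet figures quoted above (57MB/s read, and 38MB/s taken to be the sequential write rate). The function names are hypothetical, and the model ignores seek time, erase cost, and caching, so it is a back-of-the-envelope check rather than a measurement.

```python
READ_MB_S = 57.0   # sequential read rate quoted from the datasheet
WRITE_MB_S = 38.0  # assumed to be the sequential write rate from the same datasheet


def transfer_seconds(read_mb, write_mb):
    """Time to move the given volumes at the quoted sequential rates."""
    return read_mb / READ_MB_S + write_mb / WRITE_MB_S


def worth_trading(extra_read_mb, saved_write_mb):
    """True if doing extra reads to avoid some writes lowers total transfer time."""
    return extra_read_mb / READ_MB_S < saved_write_mb / WRITE_MB_S


# Extra MB that may be read per MB of write avoided before the trade stops paying off.
break_even = READ_MB_S / WRITE_MB_S
```

With these figures the break-even ratio is 1.5, so under this simple model a design only gains if it reads less than 1.5MB extra for every 1MB of write it avoids.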
--
Have a good day;-)
Best Regards,
Xiao Meng
━
Data and Knowledge Engineering Research Center, CST
Harbin Institute of Technology, Harbin, China
Gtalk: [EMAIL PROTECTED]
MSN: [EMAIL PROTECTED]
Blog: http://xiaomeng.yo2.cn

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [GSoC] (Is it OK to choose items without % mark in the ToDoList) (is it an acceptable idea to build index on Flash Disk)
Thank you for all of your advice. I think you're right; I should be more realistic. There is so much work to do if I want to work on flash disks that it is too difficult to complete the task in a single summer. Obviously, it's not an appropriate project idea for GSoC. Maybe I'll do it in the future, after I've made enough progress on my thesis work.

So I have finally decided to focus on the project idea of improving the hash index for now. It's more valuable, and also challenging. Any suggestions about the project idea of improving the hash index?

--
Have a good day;-)
Best Regards,
Xiao Meng
[HACKERS] [GSoC] (Is it OK to choose items without % mark in the ToDoList) (is it an acceptable idea to build index on Flash Disk)
Hello, everyone!

I'm a student in China, and I'm preparing for GSoC 2008 these days. I have two questions about GSoC.

1. There's a paragraph about the Example Proposal Ideas on the PostgreSQL Summer Projects website:

> *TODO Items*: A number of the items on our TODO list have been marked as good projects for beginners who are new to the PostgreSQL code. Items on this list have the advantage of already having general community agreement that the feature is desirable. These items should also have some general discussion available in the mailing list archives to help get you started. *You can find these items on the TODO list (http://www.postgresql.org/docs/faqs.TODO.html), they will be marked with a percent sign (%)*.

I didn't pay attention to this paragraph before, so I chose some items without % in the list. *Is it OK?* By the way, I'm writing a proposal for multi-column hash now.

2. I'm currently in my fourth year of studies, and I'm in a lab doing database research. My thesis work is about B-Tree indexes on NAND flash disks, and I want to base it on PostgreSQL. I know an embedded server is a feature that PostgreSQL doesn't want. But flash disks are developing very fast. The trend is that flash disks will replace magnetic disks one day, just as Jim Gray said: "Tape is dead, disk is tape, flash is disk", though nowadays flash devices are widely used only in embedded devices. *So, how about a project idea on NAND flash disks outside of limited-resource environments?* *Is it an acceptable idea?*

Anyway, I hope to hear from you. Thanks!

--
Best Regards,
Meng Xiao
Re: [HACKERS] [GSoC] (Is it OK to choose items without % mark in the ToDoList) (is it an acceptable idea to build index on Flash Disk)
Thank you for your suggestion!

> The biggest problem with the hash index is currently that there's no significant performance advantage over b-tree. If you want to work on hash indexes, I would suggest doing benchmarking and looking at ways to improve performance, before spending time on making it multi-column capable. And missing WAL logging is a big issue as well.

It's a good suggestion! My work is useless if the hash index does not perform well enough. I'll follow your suggestion and look at improving hash performance first. It's more challenging and exciting work.

On Tue, Mar 25, 2008 at 12:23 AM, Heikki Linnakangas [EMAIL PROTECTED] wrote:

> Maybe, hard to tell without more details. What difference does it make if the b-tree is on a flash device, as opposed to disk? What's different in general when you run on a flash disk? The embedded server idea in the not wanted list refers to the idea of running PostgreSQL in the same process as the client. If I understood you correctly, you're proposing something quite different.

OK, I'll explain it in more detail.

The atomic unit of flash is the page (typically 512 to 2048 bytes). Pages are organized into blocks, typically of 32 or 64 pages. All read and write operations happen at page granularity, but erase operations happen at block granularity.

Flash has a weird characteristic: erase-before-write. You can't just overwrite a page; you have to erase the whole block and then write the page. So a read operation is faster than a write operation (about 2 to 200 times, depending on the device).

This is a big problem when we simply run a database designed for magnetic disks. We just overwrite a page when we update a B-Tree, but that is not a good way to update on a flash disk. Currently, there are some research results on this problem. They use a method similar to log-structured file systems, in which every node is encoded as a series of log entries, so the log reduces the number of in-place updates.
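To make the erase-before-write point concrete, here is a minimal sketch in Python. It is a toy model under stated assumptions, not how any real device or the cited research works: `FlashBlock`, `update_in_place`, `append_log_entry` and `compare` are hypothetical names, a block has 64 pages, and only erase and page-write counts are tracked.

```python
PAGES_PER_BLOCK = 64  # the message cites 32 or 64 pages per block


class FlashBlock:
    """Toy model of one NAND erase block (hypothetical, for illustration only)."""

    def __init__(self, pages_per_block=PAGES_PER_BLOCK):
        self.pages = [None] * pages_per_block  # None means the page is erased
        self.page_writes = 0
        self.erases = 0

    def program(self, page_no, data):
        # A page can only be written while it is in the erased state.
        if self.pages[page_no] is not None:
            raise ValueError("page must be erased before it is written again")
        self.pages[page_no] = data
        self.page_writes += 1

    def erase(self):
        # Erase always covers the whole block, never a single page.
        self.pages = [None] * len(self.pages)
        self.erases += 1


def update_in_place(block, page_no, data):
    """Overwrite one page the way a magnetic-disk design would."""
    saved = list(block.pages)  # copy out every page of the block
    saved[page_no] = data
    block.erase()
    for i, content in enumerate(saved):
        if content is not None:
            block.program(i, content)  # write every live page back


def append_log_entry(block, entry):
    """Log-style alternative: record the change in the next erased page."""
    free = block.pages.index(None)  # raises ValueError once the block is full
    block.program(free, entry)
    return free


def compare(updates=10, live_pages=8):
    """Count (erases, page writes) for both strategies on the same workload."""
    in_place = FlashBlock()
    logged = FlashBlock()
    for i in range(live_pages):
        in_place.program(i, f"node {i} v0")
        logged.program(i, f"node {i} v0")
    for n in range(1, updates + 1):
        update_in_place(in_place, 0, f"node 0 v{n}")
        append_log_entry(logged, f"node 0 delta {n}")
    return {
        "in_place": (in_place.erases, in_place.page_writes),
        "logged": (logged.erases, logged.page_writes),
    }
```

In this model, `compare(updates=10, live_pages=8)` gives 10 erases and 88 page writes for in-place updates, against 0 erases and 18 page writes for the log. The sketch leaves out what the log costs elsewhere: reading a node means replaying its entries, and a full block still has to be cleaned eventually.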
In my opinion, we would have to change the access method and some parts of the storage manager substantially. Is that too hard for a beginner to take on as a GSoC project?

Finally, please make some suggestions. Thanks!

--
Have a good day;-)
Best Regards,
Meng Xiao