Hi, Neil! I agree, with one qualification: modern filesystems (I know this is true of XFS and ext4) use indexes or other techniques to speed up lookups in large directories, so fetching a file by name does not require visiting every entry in the directory. However, having one file per table row is not a good idea anyway.
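If anyone wants to check the difference between a lookup by name and a full directory scan on their own machine, something along these lines might do. It is only a rough sketch: Python is my own choice here, the helper names and the file count are made up for illustration, and the timings will depend heavily on the filesystem and on what is already in the cache.

```python
import os
import tempfile
import time


def make_files(directory, count):
    """Create `count` empty files named rec_0000000, rec_0000001, ... (hypothetical naming)."""
    for i in range(count):
        with open(os.path.join(directory, f"rec_{i:07d}"), "w"):
            pass


def time_lookup_by_name(directory, name):
    """Time one os.stat() on a known file name; no directory listing is requested."""
    start = time.perf_counter()
    os.stat(os.path.join(directory, name))
    return time.perf_counter() - start


def time_full_listing(directory):
    """Time reading every directory entry, roughly what ls or shell completion has to do."""
    start = time.perf_counter()
    entries = os.listdir(directory)
    return time.perf_counter() - start, len(entries)


if __name__ == "__main__":
    count = 10000  # assumed test size; raise it to approach the 1 million case
    with tempfile.TemporaryDirectory() as d:
        make_files(d, count)
        lookup = time_lookup_by_name(d, f"rec_{count // 2:07d}")
        listing, seen = time_full_listing(d)
        print(f"stat by name:      {lookup:.6f} s")
        print(f"listing {seen} entries: {listing:.6f} s")
```

On a filesystem with indexed directories I would expect the single stat to stay cheap as the count grows, while the full listing grows with the number of entries. That would also fit the slowness seen with ls and shell completion in the message below.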
On Sun, Oct 7, 2012 at 8:59 PM, Neil D'Souza <nxd...@yahoo.com> wrote:
> Dear Rashad,
>
> I have been developing a survey system and cross tab engine. I was
> experimenting with my random data generator, which is part of the suite.
> Someone on hackernews mentioned an app that could handle thousands of
> records, and someone enquired if it could handle millions of records. So I
> thought, let's check what kind of speeds my software can do. I got my
> random data generator to generate 1 million data files, then another part
> of the software wrote out the files into a single flat file, and I did my
> experiments.
>
> Let me tell you that after I generated the 1 million files, the system
> got very slow. Every time I cd'd into the directory containing the 1 million
> files, the next command took minutes to execute. I guess it is because the
> OS has to read the directory entries, and the directory entry itself is
> about 40 MB (showing at 1 level higher than itself):
>
> drwxrwxr-x 5 nxd nxd 40255488 Oct 6 11:39 attempt2
>
> Above is the directory that contained the 1 million files. My laptop is an
> i5 with 8 GB of RAM. So I am not sure your idea of flat files is very
> good. Another reason I think it got slow would be that the buffer cache
> got full of data I would never need, etc. Also, "zsh" would not be able to
> do command completion, again because the command line size parameter would
> be exceeded. I think you would just move the bottleneck from database
> search to directory entry search. Also, directory entries have no concept
> of regular expressions or indexing, which is what would make the search
> fast. Each and every single entry has to be visited before you decide that
> this is the file which you want.
>
> I think you should try out a few experiments yourself before you rush
> into something like this.
> My software is open source, so if you want to create a few million files,
> the random data generator can be used and I can show you how to use it.
>
> Regards,
> Neil
>
> ________________________________
> From: Mohammed Rashad <mohammedrasha...@gmail.com>
> To: witty-interest@lists.sourceforge.net
> Sent: Sunday, October 7, 2012 7:14 PM
> Subject: [Wt-interest] Wt File IO vs Database IO
>
> All,
>
> Due to the large amount of data used in a crowd-sourced mapping project, I
> have decided to completely eliminate the use of a database and use flat
> files for storage, retrieval and query.
>
> I thought of storing each db record as an individual file. This will help
> the retrieval speed, and no search is needed across the entire db or a
> file.
>
> But if a table has tens of thousands of records, users accessing
> (same/different) records from different places will result in N number of
> file I/O operations.
>
> Will this be a bottleneck in the application? Consider each file to be of
> size <= 15 KB.
>
> The main reason to eliminate the db is the performance bottleneck in
> database I/O.
>
> So will moving to the new model help in any way, given that the number of
> users and the amount of data will be much more than expected?
>
> --
> Regards,
> Rashad
>
> ------------------------------------------------------------------------------
> Don't let slow site performance ruin your business. Deploy New Relic APM
> Deploy New Relic app performance management and know exactly
> what is happening inside your Ruby, Python, PHP, Java, and .NET app
> Try New Relic at no cost today and get our sweet Data Nerd shirt too!
> http://p.sf.net/sfu/newrelic-dev2dev
> _______________________________________________
> witty-interest mailing list
> witty-interest@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/witty-interest