On Wednesday, April 16, 2014, Jeff Janes <jeff.ja...@gmail.com> wrote:
> On Tue, Apr 15, 2014 at 6:36 PM, Nick Eubank <nickeub...@gmail.com> wrote:
>
>> Hi all,
>>
>> A few years ago someone said Postgres on Windows can't set work_mem
>> above about 2 GB
>> (www.postgresql.org/message-id/17895.1315869...@sss.pgh.pa.us --
>> seems to be the same for maintenance_work_mem). I'm finding the limit
>> is still present.
>>
>> I'm doing single-user, single-connection, data-intensive queries and
>> would like to set a higher value on Windows to make better use of the
>> 16 GB of built-in RAM (I don't control the platform, so I can't jump
>> to Linux).
>>
>> Has anyone found a workaround?
>>
>
> Before worrying much about that, I'd just give it a try at the highest
> value it will let you set and see what happens.
>
> If you want to do something like a hashed aggregate that would have been
> predicted to fit in 6GB but not in 1.999GB, then you will lose out on the
> hash agg by not being able to set the memory higher. On the other hand,
> if your queries want to use sorts that will spill to disk anyway, the
> exact value of work_mem usually doesn't matter much as long as it is not
> absurdly small (1MB is absurdly small for analytics; 64MB is probably
> not). In fact, a very large work_mem can be worse in those cases, because
> large priority queue heaps are unfriendly to the CPU cache. (Based on
> Linux experience, but I don't see why that would not carry over to
> Windows.)
>
> Frankly, I think you've bitten off more than you can chew. 600GB of csv
> is going to expand to probably 3TB of postgresql data once loaded. If you
> can't control the platform, I'm guessing your disk array options are no
> better than your OS options are.
>
> ACID compliance is expensive, both in storage overhead and in processing
> time, and I don't think you can afford that and probably don't need it.
> Any chance you could give up on databases and get what you need just
> using pipelines of sort, cut, uniq, awk, perl, etc. (or whatever their
> Windows equivalents are)?
>
> Cheers,
>
> Jeff

Thanks Jeff -- you're clearly correct that SQL is not the optimal tool for
this, as I'm clearly learning. I just can't find anything MADE for
single-user big-data transformations. :/ I may resort to that kind of
pipeline approach; I just have so many transformations to do that I was
hoping I could use a declarative language for them. But your point about
hash map size is excellent. I have no idea how big an index for this would
be...
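
(For reference, here is a minimal sketch of the experiment Jeff suggests:
set work_mem for the session and use EXPLAIN to see whether the planner
picks an in-memory HashAggregate or a GroupAggregate sitting on a sort
that may spill to disk. The table and column names below, big_table and
some_key, are placeholders, not anything from this thread.)

  -- work_mem can be changed per session; no server restart needed.
  SET work_mem = '1GB';

  -- The plan will show either HashAggregate (in memory) or
  -- GroupAggregate on top of a Sort node.
  EXPLAIN (ANALYZE, BUFFERS)
  SELECT some_key, count(*)
  FROM big_table
  GROUP BY some_key;

  -- Repeat with a smaller setting and compare the two plans.
  SET work_mem = '64MB';
  EXPLAIN (ANALYZE, BUFFERS)
  SELECT some_key, count(*)
  FROM big_table
  GROUP BY some_key;

If a Sort node reports something like "Sort Method: external merge
Disk: ...", the query is spilling; if the larger work_mem turns the plan
into a HashAggregate instead, the extra memory is doing real work.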