I have considered an EC2/S3 combination for development. After a
cost-benefit analysis of VPS hosting and EC2, I decided to go with
VPS hosting until my application is ready for launch and all the
kinks in the code are worked out.

The lack of persistent storage in EC2 means that I would have to back
up the index, code and configuration data all the time, as reboots
in EC2 cause complete data loss. I would prefer to deal with that when I
really need the full power of EC2 to serve the search queries.

EC2 bandwidth is not free and costs 10c/GB for incoming data. Since I
have to crawl a few terabytes of data each month, I am better off with
a VPS hosting plan that comes with bandwidth included. EC2-to-S3
bandwidth is free, but setting up a system to back up the server data
was distracting: I am in the initial phase of development, and trying
to set up a reliable EC2 development framework was taking up a
considerable chunk of my time.
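
To put a rough number on the bandwidth point, here is a minimal sketch
of the arithmetic. The 10c/GB inbound rate is the figure quoted above;
the monthly volumes (1, 3 and 5 TB) are only illustrative stand-ins for
"a few terabytes", and 1 TB is assumed to be 1000 GB.

```python
# Rough monthly cost of inbound crawl traffic on EC2.
# Assumption: 10c/GB inbound, as quoted in the message above.
EC2_INBOUND_USD_PER_GB = 0.10
# Assumption: decimal terabytes (1 TB = 1000 GB).
GB_PER_TB = 1000


def inbound_cost_usd(terabytes_per_month: float) -> float:
    """Return the monthly inbound transfer cost in US dollars."""
    return terabytes_per_month * GB_PER_TB * EC2_INBOUND_USD_PER_GB


if __name__ == "__main__":
    # Hypothetical crawl volumes; the real figure is "a few terabytes".
    for tb in (1, 3, 5):
        print(f"{tb} TB/month -> ${inbound_cost_usd(tb):.2f}")
```

Whatever the exact volume, the inbound charge grows linearly with the
crawl size, which is why a plan with bandwidth included looks better
at this stage.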

Hence, my question about what hardware developers initially chose for
their projects.



On Dec 23, 2007 4:01 PM, Otis Gospodnetic <[EMAIL PROTECTED]> wrote:
> Have you considered using EC2 during your testing/development stage?  It 
> would be safer than investing in the wrong hardware with insufficient 
> knowledge of the exact demands and requirements.
>
> Otis
>
> --
> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
>
>
> ----- Original Message ----
> From: v k <[EMAIL PROTECTED]>
> To: [EMAIL PROTECTED];
> Sent: Tuesday, December 18, 2007 5:21:28 PM
> Subject: Infrastructure Question
>
> Hello,
>
> I am using Lucene to build an index from roughly 10 million documents.
> The documents are about 4 TB in total.
>
> After some trial runs indexing a subset of the documents, I am trying
> to figure out a hosting service configuration to create a full index
> from the entire 10 TB of data. As I am still unsure how this project
> will turn out, I am not purchasing hardware/RAM but considering a web
> host for the following purposes:
> 1) Download the data and start indexing it.
> 2) Host the web front end to access this index, which will be a Python
> framework (e.g. Django).
>
> I am seriously contemplating signing up with Joyent for this plan:
> AMD Opteron x64 multi-core servers with 4GiB RAM per core
> 1/16 (Burstable up to 95%)
> 1 TB bandwidth/month, 1 GB RAM, plus as much NAS storage as I can
> afford to pay for.
>
> My QUESTION is: will this RAM and CPU be sufficient during
> development of the search application and building of the index, or
> is it so under-equipped in terms of hardware that the
> development version of my application will not work?
> I understand that having more RAM is always good, but is 1 GB as good as
> nothing?
>
> This setup is NOT for production but for development, so I can get
> my hands dirty with Lucene, which will require plenty of tweaks as the
> project moves along.
>
> What initial configuration would you recommend for a development
> version given the corpus size? I am not even sure how large my index
> will be at this point.
>
> I hope to build my indexes this way, and once the search
> infrastructure is working and the web front end is complete, I plan to
> worry about redundancy, availability and scalability for the many
> users I hope to provide this free service for :-)
>
> Many of you in this forum have built successful products with Lucene.
> To name a few I am aware of -  Ken Krugle, James Ryley, Dennis Kubes
>
> Some of you must have started with small machines, test setups, etc.,
> where you built your initial search apps. I hope to receive some
> advice about my plan and approach to start building an infrastructure
> to support my Lucene app.
>
> Thank you.
>
> Venkat