Re: Generating large datasets for Solr proof-of-concept

2011-09-17 Thread Pulkit Singhal
Thanks Hoss. I agree that the way you restated the question is better for getting results. BTW I think you've tipped me off to exactly what I needed with this URL: http://bbyopen.com/ Thanks! - Pulkit On Fri, Sep 16, 2011 at 4:35 PM, Chris Hostetter hossman_luc...@fucit.org wrote: : Has anyone

Re: Generating large datasets for Solr proof-of-concept

2011-09-16 Thread Toke Eskildsen
On Thu, 2011-09-15 at 22:54 +0200, Pulkit Singhal wrote: Has anyone ever had to create large mock/dummy datasets for test environments or for POCs/Demos to convince folks that Solr was the wave of the future? Yes, but I did it badly. The problem is that real data are not random so any simple
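One way to illustrate the point that real data are not random: term frequencies in real text are heavily skewed, while a naive generator picks every word with equal probability. The sketch below is an assumption about what a less naive generator could look like, not something described in the thread. It draws words with Zipf-like weights (rank r gets weight 1/r^s). The function names and the exponent `s` are hypothetical.

```python
import random


def zipf_weights(n, s=1.0):
    """Return n weights where the word at 1-based rank r gets 1 / r**s."""
    return [1.0 / (rank ** s) for rank in range(1, n + 1)]


def sample_terms(vocabulary, k, seed=0, s=1.0):
    """Draw k words from vocabulary, favouring the earliest entries.

    vocabulary is assumed to be ordered from most to least common.
    A fixed seed keeps the generated data reproducible between runs.
    """
    rng = random.Random(seed)
    weights = zipf_weights(len(vocabulary), s)
    return rng.choices(vocabulary, weights=weights, k=k)
```

With a vocabulary ordered by expected popularity, the first few words dominate the output, which is closer to what an index of real product text looks like than a uniform draw.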

Re: Generating large datasets for Solr proof-of-concept

2011-09-16 Thread Chris Hostetter
: Has anyone ever had to create large mock/dummy datasets for test : environments or for POCs/Demos to convince folks that Solr was the : wave of the future? Any tips would be greatly appreciated. I suppose : it sounds a lot like crawling even though it started out as innocent : DIH usage. the

Generating large datasets for Solr proof-of-concept

2011-09-15 Thread Pulkit Singhal
Hello Everyone, I have a goal of populating Solr with a million unique products in order to create a test environment for a proof of concept. I started out by using DIH with Amazon RSS feeds but I've quickly realized that there's no way I can glean a million products from one RSS feed. And I'd go

Re: Generating large datasets for Solr proof-of-concept

2011-09-15 Thread Daniel Skiles
I've done it using SolrJ and a *lot* of parallel processes feeding dummy data into the server. On Thu, Sep 15, 2011 at 4:54 PM, Pulkit Singhal pulkitsing...@gmail.com wrote: Hello Everyone, I have a goal of populating Solr with a million unique products in order to create a test
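A minimal sketch of the dummy-data half of that approach, written in Python rather than SolrJ and not taken from the thread. It generates product documents with unique ids and groups them into batches that separate worker processes could each send to the server. The field names, the vocabulary lists, and the helper names `make_products` and `batches` are all hypothetical, and the actual indexing call is left out because it depends on the client in use.

```python
import itertools
import random

# Hypothetical vocabulary, for illustration only.
BRANDS = ["Acme", "Globex", "Initech", "Umbrella"]
CATEGORIES = ["camera", "laptop", "phone", "tablet"]
ADJECTIVES = ["compact", "wireless", "rugged", "slim"]


def make_products(count, seed=0):
    """Yield `count` dummy product documents, each with a unique id."""
    rng = random.Random(seed)
    for i in range(count):
        brand = rng.choice(BRANDS)
        category = rng.choice(CATEGORIES)
        yield {
            "id": f"prod-{i:07d}",  # the counter guarantees uniqueness
            "name": f"{brand} {rng.choice(ADJECTIVES)} {category}",
            "brand": brand,
            "category": category,
            "price": round(rng.uniform(5.0, 2000.0), 2),
        }


def batches(docs, size):
    """Group an iterable of documents into lists of at most `size`."""
    it = iter(docs)
    while True:
        chunk = list(itertools.islice(it, size))
        if not chunk:
            return
        yield chunk
```

For a million products, `batches(make_products(1_000_000), 1000)` yields 1000 lists of 1000 documents each. Because the documents are generated lazily, only one batch needs to be in memory at a time.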

Re: Generating large datasets for Solr proof-of-concept

2011-09-15 Thread Markus Jelsma
If we want to test with huge amounts of data we feed portions of the internet. The problem is it takes a lot of bandwidth and lots of computing power to get to a `reasonable` size. On the positive side, you deal with real text so it's easier to tune for relevance. I think it's easier to create

Re: Generating large datasets for Solr proof-of-concept

2011-09-15 Thread Pulkit Singhal
Ah missing } doh! BTW I still welcome any ideas on how to build an e-commerce test base. It doesn't have to be Amazon; that was just my approach. Anyone? - Pulkit On Thu, Sep 15, 2011 at 8:52 PM, Pulkit Singhal pulkitsing...@gmail.com wrote: Thanks for all the feedback thus far. Now to get

Re: Generating large datasets for Solr proof-of-concept

2011-09-15 Thread Pulkit Singhal
Thanks for all the feedback thus far. Now to get a little technical about it :) I was thinking of putting all the Amazon tags that yield roughly 5 results each into a file and then running my RSS DIH off of that. I came up with the following config but something is

Re: Generating large datasets for Solr proof-of-concept

2011-09-15 Thread Lance Norskog
http://aws.amazon.com/datasets DBPedia might be the easiest to work with: http://aws.amazon.com/datasets/2319 Amazon has a lot of these things. Infochimps.com is a marketplace for free and pay versions. Lance On Thu, Sep 15, 2011 at 6:55 PM, Pulkit Singhal pulkitsing...@gmail.com wrote: Ah