Thanks Hoss. I agree that the way you restated the question is better
for getting results. BTW I think you've tipped me off to exactly what
I needed with this URL: http://bbyopen.com/
Thanks!
- Pulkit
On Fri, Sep 16, 2011 at 4:35 PM, Chris Hostetter
hossman_luc...@fucit.org wrote:
: Has anyone
On Thu, 2011-09-15 at 22:54 +0200, Pulkit Singhal wrote:
Has anyone ever had to create large mock/dummy datasets for test
environments or for POCs/Demos to convince folks that Solr was the
wave of the future?
Yes, but I did it badly. The problem is that real data are not random, so any simple
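(Not from the thread, but to illustrate the point that real data are not random: real corpora have heavily skewed term frequencies, which a uniform random generator misses. Below is a minimal stdlib-only sketch, with class and parameter names of my own choosing, that samples term ranks from an approximate Zipf distribution so generated dummy text at least gets the skew right.)

```java
import java.util.Random;

// Sample term ranks from an approximate Zipf distribution so generated
// text shows the skewed term frequencies real corpora have, unlike a
// uniform random generator.
public class ZipfTerms {
    private final double[] cdf;              // cumulative probabilities per rank
    private final Random rnd = new Random(42);

    public ZipfTerms(int vocabSize, double exponent) {
        double[] weights = new double[vocabSize];
        double norm = 0;
        for (int rank = 1; rank <= vocabSize; rank++) {
            weights[rank - 1] = 1.0 / Math.pow(rank, exponent);
            norm += weights[rank - 1];
        }
        cdf = new double[vocabSize];
        double running = 0;
        for (int i = 0; i < vocabSize; i++) {
            running += weights[i] / norm;
            cdf[i] = running;
        }
    }

    // Returns a term rank in [0, vocabSize); rank 0 is the most frequent term.
    public int nextTerm() {
        double u = rnd.nextDouble();
        for (int i = 0; i < cdf.length; i++) {
            if (u <= cdf[i]) return i;
        }
        return cdf.length - 1;
    }

    public static void main(String[] args) {
        ZipfTerms z = new ZipfTerms(1000, 1.1);
        int[] counts = new int[1000];
        for (int i = 0; i < 100_000; i++) counts[z.nextTerm()]++;
        // The top-ranked term dominates a mid-ranked one, as in real text.
        System.out.println(counts[0] > counts[500]);
    }
}
```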
: Has anyone ever had to create large mock/dummy datasets for test
: environments or for POCs/Demos to convince folks that Solr was the
: wave of the future? Any tips would be greatly appreciated. I suppose
: it sounds a lot like crawling even though it started out as innocent
: DIH usage.
Hello Everyone,
I have a goal of populating Solr with a million unique products in
order to create a test environment for a proof of concept. I started
out by using DIH with Amazon RSS feeds but I've quickly realized that
there's no way I can glean a million products from one RSS feed. And
I'd go
I've done it using SolrJ and a *lot* of parallel processes feeding dummy
data into the server.
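(To illustrate the "lots of parallel processes" idea: a minimal stdlib-only sketch of workers generating dummy product docs over disjoint id ranges so ids stay unique. The index() stub and all names here are mine; in real code it would be a SolrJ client add call plus periodic commits.)

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

// Sketch: feed a million dummy product docs from parallel workers.
public class ParallelFeeder {
    static final AtomicLong indexed = new AtomicLong();

    // Build one dummy product "document" keyed by a unique id.
    static String dummyProduct(long id) {
        return "{id:" + id + ", name:\"product-" + id + "\", price:" + (id % 1000) + "}";
    }

    // Stand-in for a real SolrJ client add; here we just count docs.
    static void index(String doc) {
        indexed.incrementAndGet();
    }

    public static void main(String[] args) throws Exception {
        int workers = 8;
        long total = 1_000_000;
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        for (int w = 0; w < workers; w++) {
            final int worker = w;
            pool.submit(() -> {
                // Each worker owns a disjoint id stride, so ids stay unique.
                for (long id = worker; id < total; id += workers) {
                    index(dummyProduct(id));
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
        System.out.println(indexed.get());
    }
}
```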
On Thu, Sep 15, 2011 at 4:54 PM, Pulkit Singhal pulkitsing...@gmail.com wrote:
Hello Everyone,
I have a goal of populating Solr with a million unique products in
order to create a test
If we want to test with huge amounts of data, we feed portions of the internet.
The problem is it takes a lot of bandwidth and lots of computing power to get
to a `reasonable` size. On the positive side, you deal with real text, so it's
easier to tune for relevance.
I think it's easier to create
Ah missing } doh!
BTW I still welcome any ideas on how to build an e-commerce test base.
It doesn't have to be Amazon, that was just my approach. Anyone?
- Pulkit
On Thu, Sep 15, 2011 at 8:52 PM, Pulkit Singhal pulkitsing...@gmail.com wrote:
Thanks for all the feedback thus far. Now to get a little technical about it :)
I was thinking of putting all the Amazon tags that yield roughly 5
results each into a file, and then running my RSS DIH off of that. I
came up with the following config but something is
http://aws.amazon.com/datasets
DBPedia might be the easiest to work with:
http://aws.amazon.com/datasets/2319
Amazon has a lot of these things.
Infochimps.com is a marketplace for free and paid versions.
Lance
On Thu, Sep 15, 2011 at 6:55 PM, Pulkit Singhal pulkitsing...@gmail.com wrote:
Ah