Best implementation for multi-price store?

2013-11-21 Thread Alejandro Marqués Rodríguez
Hi,

I've recently been asked to implement an application to search products from
several stores, each store having different prices and stock for the same
product.

So I have products that have the usual fields (name, description, brand,
etc.) and also the number of units and the price for each store. I must be
able to filter for a given store and order by stock or price for that store.
The application should also allow increasing the number of stores, the number
of store-dependent fields and the number of products without much work.

The numbers for the application are more or less 100 stores and 7M products.

I've been thinking of some ways of defining the index structure but I don't
know which one is better, as I think each one has its pros and cons.


   1. *Each product-store as a document:* Denormalizing the information so
   for every product and store I have a different document. Pros are that I
   can filter and order without problems and that adding a new store-dependent
   field is very easy. Cons are that the index goes from 7M documents to 700M
   and that most of the info is redundant, as most of the fields are repeated
   among stores.
   2. *Each field-store as a field:* For example, for price I would have
   store1_price, store2_price, and so on. Pros are that the index stays at 7M
   documents, and I can still filter and sort by those fields. Cons are that I
   have to add some logic so that if I filter by one store I sort by the
   associated price field, and that the number of fields grows as the number
   of store-dependent fields x the number of stores. I don't know if having
   more fields affects performance, but adding new store-dependent fields will
   increase the number of fields even more.
   3. *Join:* The first time I read about Solr joins I thought they were the
   way to go in this case, but after reading a bit more and doing some tests
   I'm not so sure about it... Maybe I've done it wrong, but I think it also
   denormalizes the info (so I will also have 700M documents) and besides I
   can't order or filter by store fields.
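
To put rough numbers on options 1 and 2, here is a small Python sketch. It uses the figures given above (about 100 stores and 7M products); the choice of exactly two store-dependent fields (price and units) and the storeN_ naming pattern are assumptions for illustration only.

```python
# Back-of-the-envelope comparison of options 1 and 2.
# PRODUCTS and STORES come from the message; STORE_FIELDS is an assumption.
PRODUCTS = 7_000_000
STORES = 100
STORE_FIELDS = ["price", "units"]


def docs_per_product_store(products: int, stores: int) -> int:
    """Option 1: one document per (product, store) pair."""
    return products * stores


def store_field_names(stores: int, fields: list[str]) -> list[str]:
    """Option 2: one field name per (store, store-dependent field) pair."""
    return [f"store{s}_{f}" for s in range(1, stores + 1) for f in fields]


option1_docs = docs_per_product_store(PRODUCTS, STORES)
option2_fields = store_field_names(STORES, STORE_FIELDS)

print(option1_docs)         # documents in the index under option 1
print(PRODUCTS)             # documents in the index under option 2
print(len(option2_fields))  # distinct store-dependent field names under option 2
```

Under these assumptions the trade-off is 100 times more documents (option 1) against a couple of hundred extra field names (option 2). The sketch says nothing about how either choice affects query performance.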


I must say my preferred option is number 2, since I don't duplicate
information, I keep a relatively small number of documents and I can filter
and sort by the store fields. However, my main concern here is that I don't
know if having too many fields in a document will be harmful to performance.

Which one do you think is the best approach for this application? Is there
a better approach that I have missed?

Thanks in advance



-- 
Alejandro Marqués Rodríguez

Paradigma Tecnológico
http://www.paradigmatecnologico.com
Avenida de Europa, 26. Ática 5. 3ª Planta
28224 Pozuelo de Alarcón
Tel.: 91 352 59 42


RE: Best implementation for multi-price store?

2013-11-21 Thread Petersen, Robert
Hi,

I'd go with (2) also, but using dynamic fields so you don't have to define all
the storeX_price fields in your schema but rather just one *_price field. Then
when you filter on store:store1 you'd know to sort with store1_price, and
likewise for units. That should be pretty straightforward.
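
The "filter on the store, sort on that store's field" logic could look roughly like the Python sketch below. It only builds the standard q, fq and sort request parameters; the helper name store_query_params, the multivalued store field and the <store>_units field name are assumptions, not something confirmed in this thread.

```python
from urllib.parse import urlencode


def store_query_params(text: str, store: str, sort_by: str = "price",
                       descending: bool = False) -> dict[str, str]:
    """Build Solr select parameters for one store.

    Assumes every product document has a `store` field listing the stores
    that carry it, plus dynamic fields named `<store>_price` and
    `<store>_units` (hypothetical names).
    """
    if sort_by not in ("price", "units"):
        raise ValueError(f"unsupported sort field: {sort_by}")
    direction = "desc" if descending else "asc"
    return {
        "q": text,
        "fq": f"store:{store}",                    # restrict to one store
        "sort": f"{store}_{sort_by} {direction}",  # sort on that store's field
    }


params = store_query_params("laptop", "store1", sort_by="price")
print(urlencode(params))  # query string to append to the select URL
```

Keeping the field-name mapping in one helper means the rest of the application never has to know how the per-store fields are named.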

Hope that helps,
Robi


Re: Best implementation for multi-price store?

2013-11-21 Thread Alejandro Marqués Rodríguez
Hi Robert,

That was the idea: dynamic fields, so that, as you said, it is easier to sort
and filter. Besides, with dynamic fields it would be easier to add new
stores, as I wouldn't have to modify the schema :)
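
As a rough illustration of why a new store needs no schema change, this Python sketch flattens per-store offers into fields that *_price and *_units dynamic field patterns would match. The helper product_document and the id, name and store fields are hypothetical.

```python
def product_document(product: dict, offers: dict) -> dict:
    """Flatten per-store offers into `<store>_price` / `<store>_units` fields.

    `offers` maps a store id to {"price": ..., "units": ...}. The `store`
    field (list of stores carrying the product) is an assumed field used
    for filtering.
    """
    doc = dict(product)
    doc["store"] = sorted(offers)
    for store, offer in offers.items():
        for field, value in offer.items():
            doc[f"{store}_{field}"] = value
    return doc


offers = {"store1": {"price": 9.99, "units": 3}}
doc = product_document({"id": "p1", "name": "Example product"}, offers)

# Adding a store only adds new keys to the document; the field names still
# follow the same *_price and *_units patterns.
offers["store101"] = {"price": 8.49, "units": 12}
doc_with_new_store = product_document({"id": "p1", "name": "Example product"}, offers)
print(sorted(doc_with_new_store))
```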

Thanks for the answer!

