Re: Storing 2 dimension array in Solr
Hi,

I will look into the pseudo-join. Jack, I have my doubts about further denormalization, but I will take up the rest of the points you made. Thank you.

Basically, we have 2 different Solr indexes. One table is rarely updated, but this group-disease table is updated frequently and new diseases are added very often, so we maintain them separately. While querying, we need a join operation on tables 1 and 2.

So far, I have created a test Solr index with 100k dynamic fields on each document. I have yet to test it further. It took almost 1.5 hours to create the index for 1500 groups, each group having almost 90k dynamic fields. I also added a doc_static field, which copies all the integer values from the *_disease fields (via copy fields) into this one field. While querying, I use only this field for retrieval.

If there are better approaches, please let me know.

Thanks - David

On Sun, Oct 13, 2013 at 6:37 PM, Jack Krupansky j...@basetechnology.com wrote:
> Yeah, something like that. The key or ID field would probably just be the composition of the group and disease fields. [...]
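For the join between the two indexes discussed in this message, a pseudo-join request could be assembled roughly as follows. This is only a sketch: the core name group_disease and the field names group and disease are hypothetical, not taken from the thread. It relies on Solr's {!join from=... to=... fromIndex=...} query parser syntax. A pseudo-join returns only documents from the core being queried, so fields from the "from" side are not included in the results.

```python
from urllib.parse import urlencode


def join_query(from_field, to_field, from_index, inner_query):
    """Build a Solr pseudo-join query string.

    The inner query runs against `from_index`; values of `from_field` in its
    matches are then looked up in `to_field` of the core being queried.
    """
    return "{!join from=%s to=%s fromIndex=%s}%s" % (
        from_field, to_field, from_index, inner_query)


# Hypothetical names: a frequently updated "group_disease" core joined to the
# rarely updated main core on a shared "group" field.
params = {
    "q": join_query("group", "group", "group_disease", "disease:disease1"),
    "wt": "json",
}
query_string = urlencode(params)  # ready to append to .../select?

print(params["q"])
print(query_string)
```

Whether this performs acceptably depends on the size of both indexes, so it is worth measuring against the denormalized layout before committing to it.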
Re: Storing 2 dimension array in Solr
Hi Jack,

Regarding your point, "each element of the array as a solr document, with a group field and a disease field": did you mean it this way?

  <doc>
    group1_grp: G1
    disease1_d: 2,
    disease2_d: 3,
  </doc>
  <doc>
    group1_grp: G2
    disease1_d: 2,
    disease2_d: 3,
    disease3_d: 1,
    disease4_d: 1,
  </doc>

That is, similar to the first case, with dynamic fields for the diseases? Will it be a performance issue if the number of disease fields grows to millions?

On Sun, Oct 13, 2013 at 9:00 AM, Jack Krupansky j...@basetechnology.com wrote:
> You may be better off indexing each element of the array as a solr document, with a group field and a disease field. [...]
Re: Storing 2 dimension array in Solr
This sounds like a denormalization issue. Don't be afraid <G>. Actually, I've seen from 50M to 300M small docs on a Solr node, depending on query type, hardware, etc. So that gives you a place to start being cautious about the number of docs in your system. If the full expansion of your table numbers in that range, you might be just fine denormalizing the data.

Alternatively, there's the pseudo-join capability to consider. I'm usually hesitant to recommend that, but Joel is committing some really interesting stuff in the join area which you might take a look at if the existing pseudo-join isn't performant enough.

But I'd consider denormalizing the data as the first approach.

Best,
Erick

On Sun, Oct 13, 2013 at 8:07 AM, David Philip davidphilipshe...@gmail.com wrote:
> Hi Jack, for the point: each element of the array as a solr document, with a group field and a disease field. Did you mean it this way: [...]
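The sizing rule of thumb above can be checked with simple arithmetic. The sketch below uses the figures David reports elsewhere in this thread (1500 groups, almost 90k diseases per group). The 10% density is purely an assumed value for illustration, and cell_document_count is a hypothetical helper, not part of any Solr API.

```python
def cell_document_count(groups, diseases_per_group, density=1.0):
    """Number of documents if every present (group, disease) cell is one doc.

    `density` is the fraction of cells that actually hold a value.
    """
    return round(groups * diseases_per_group * density)


# Full expansion: every cell becomes a document.
full = cell_document_count(1500, 90_000)

# Sparse expansion under an assumed 10% density (illustrative only).
sparse = cell_document_count(1500, 90_000, density=0.10)

print(full)    # 135000000
print(sparse)  # 13500000
print(50_000_000 <= full <= 300_000_000)  # True
```

Under these numbers the fully expanded table falls inside the 50M to 300M range mentioned above, and a sparse layout would be well below it.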
Re: Storing 2 dimension array in Solr
I think he means a doc for each element, so you have a disease occurrence index:

  <doc>
    <group>1</group>
    <dis>1</dis>
    <occurrence>exist</occurrence>
    <unique field>1-1</unique field>
  </doc>

Assuming (and it's a pretty fair assumption?) that most groups have only a subset of diseases, this will be a sparse matrix, so just don't index the occurrence when the value does not exist. Basically, denormalize by adding fields which don't relate to the key.

This will work fine on modest hardware, with no thought given to performance, for 5 million docs. It will work fine with some thought and hardware for very large numbers. It's worth a go anyway, just to test. It should probably be the first method you try out.

On 13 October 2013 12:10, Erick Erickson erickerick...@gmail.com wrote:
> This sounds like a denormalization issue. Don't be afraid <G>. [...]
Re: Storing 2 dimension array in Solr
Yeah, something like that. The key or ID field would probably just be the composition of the group and disease fields.

The other thing is that if occurrence is simply a boolean, you can omit the field and simply not add a document when that disease is not present for that group. If the majority of the diseases are not present for a given group, that would eliminate a lot of documents. If occurrence is not a boolean, keep the field, but again do not add a document when the disease is not present for that group.

My usual, over-generalized rule for dynamic fields is that they are a powerful tool, but only if used in moderation. Millions would not be moderation.

-- Jack Krupansky

-----Original Message-----
From: Lee Carroll
Sent: Sunday, October 13, 2013 8:35 AM
To: solr-user@lucene.apache.org
Subject: Re: Storing 2 dimension array in Solr

I think he means a doc for each element. [...]
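As a rough sketch of the composite key and the "omit the document when the disease is absent" idea, the snippet below builds the documents needed when a new disease arrives. The field names id, group, disease and occurrence, the "-" separator, and both helper functions are assumptions made for illustration. The thread does not fix a schema.

```python
def make_doc(group, disease, occurrence=None):
    """One document per (group, disease) cell, keyed by a composite id."""
    doc = {"id": f"{group}-{disease}", "group": group, "disease": disease}
    if occurrence is not None:
        # Keep the field only when occurrence carries more than presence.
        doc["occurrence"] = occurrence
    return doc


def docs_for_new_disease(disease, present_in):
    """Documents to add for a new disease.

    `present_in` maps group -> occurrence for the groups where the disease
    is present; groups that are not listed get no document at all.
    """
    return [make_doc(g, disease, occ) for g, occ in sorted(present_in.items())]


# A new disease that is present in only two groups yields two documents,
# however many groups exist in total.
new_docs = docs_for_new_disease("disease4", {"group3": "slight", "group1": "exist"})
for doc in new_docs:
    print(doc)
```

With this layout, adding a disease never rewrites existing documents, which is the main contrast with one wide document per group.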
Storing 2 dimension array in Solr
Hi,

I have a 2-dimensional array and want to persist it in Solr. How can I do that? Sample case:

            disease1    disease2    disease3
  group1    exist       slight      not found
  group2    slight      not found   exist
  group3    slight      exist

The values can also be stored as codes: exist = 1, not found = 2, slight = 3, and so on.

Note: this array is updated frequently. Every time a new disease gets added, I have to add the description of that disease to all groups. At query time, I will only fetch by row, that is, by group: for example, group = group2 returns that group's row.

Any suggestions on how I can achieve this? I am thankful to the forum for replying with patience. Once I achieve this, I will blog about it and share it with all.

Thanks - David
Re: Storing 2 dimension array in Solr
David:

This feels like it may be an XY problem. _Why_ do you want to store a 2-dimensional array, and what do you want to do with it? Maybe there are better approaches.

Best
Erick

On Sat, Oct 12, 2013 at 2:07 AM, David Philip davidphilipshe...@gmail.com wrote:
> Hi, I have a 2 dimension array and want it to be persisted in solr. How can I do that? [...]
Re: Storing 2 dimension array in Solr
Hi Erick,

We have a set of groups, as represented below. New columns (the diseases in the matrix below) keep arriving, and we need to add them as new columns. For each such column, every group has a value of 1, 2, 3 or 4 (exist, slight, na, notfound). While querying, we need to get the entire row for group:group1. We will not be searching on the column (*_disease) values, so they are not indexed (index=false) but are stored (stored=true). For example, we query group:group1 and need to get back the entire row: exist, slight, not found. I hope this explanation is clearer.

            disease1    disease2    disease3
  group1    exist       slight      not found
  group2    slight      not found   exist
  group3    slight      exist
  groupK    -           na          exist

Thanks - David

On Sat, Oct 12, 2013 at 11:39 PM, Erick Erickson erickerick...@gmail.com wrote:
> David: This feels like it may be an XY problem. [...]
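To make the value coding concrete, here is a small sketch that decodes a stored row back into labels. It uses the mapping stated in this message (1 = exist, 2 = slight, 3 = na, 4 = notfound). The stored_row dictionary is a hypothetical stand-in for the stored fields Solr would return for group:group1.

```python
# Code-to-label mapping as described in this message.
CODES = {1: "exist", 2: "slight", 3: "na", 4: "notfound"}


def decode_row(stored_fields):
    """Translate stored integer codes into their labels, field by field."""
    return {field: CODES[value] for field, value in stored_fields.items()}


# Hypothetical stored fields for group1 (stored but not indexed).
stored_row = {"disease1": 1, "disease2": 2, "disease3": 4}
decoded = decode_row(stored_row)
print(decoded)
```

Storing small integers rather than strings keeps each stored value compact, at the cost of a lookup on the client side.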
Re: Storing 2 dimension array in Solr
Isn't this just indexing each row as a separate document with a suitable ID, groupN in your example?

On Sat, Oct 12, 2013 at 2:43 PM, David Philip davidphilipshe...@gmail.com wrote:
> Hi Erick, We have set of groups as represented below. New columns (diseases as in below matrix) keep coming and we need to add them as new column. [...]
Re: Storing 2 dimension array in Solr
Hi Erick,

Yes, it is. But the columns here are added dynamically and very frequently. They can grow to as many as 1 million right now. So, is 1 document with 1 million dynamic fields fine? Or is there another approach?

While searching the web, I found that docValues are column-oriented:
http://searchhub.org/2013/04/02/fun-with-docvalues-in-solr-4-2/
However, I did not understand how to use docValues to add these columns. What is the recommended approach?

Thanks - David

On Sun, Oct 13, 2013 at 3:33 AM, Erick Erickson erickerick...@gmail.com wrote:
> Isn't this just indexing each row as a separate document with a suitable ID groupN in your example?
Re: Storing 2 dimension array in Solr
You may be better off indexing each element of the array as a Solr document, with a group field and a disease field. Then you can easily and efficiently add new diseases. To query a row, you then query for the group field having the desired group.

If possible, index the array as sparse: no document for a disease if it is not present for that group.

-- Jack Krupansky

-----Original Message-----
From: David Philip
Sent: Saturday, October 12, 2013 9:56 PM
To: solr-user@lucene.apache.org
Subject: Re: Storing 2 dimension array in Solr

Hi Erick, Yes it is. But the columns here are dynamically and very frequently added. [...]
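The one-document-per-element layout suggested in this message can be sketched in memory as follows. This is an illustration only: the field names group, disease and occurrence are assumed, "not found" is treated as the absent value, and get_row merely stands in for a Solr query such as group:group1.

```python
# Sample matrix from the thread, one inner dict per group (row).
matrix = {
    "group1": {"disease1": "exist", "disease2": "slight", "disease3": "not found"},
    "group2": {"disease1": "slight", "disease2": "not found", "disease3": "exist"},
}


def to_cell_docs(matrix, absent="not found"):
    """Flatten the matrix into one document per present (group, disease) cell."""
    docs = []
    for group, row in matrix.items():
        for disease, occurrence in row.items():
            if occurrence == absent:
                continue  # sparse: absent cells produce no document
            docs.append({"group": group, "disease": disease, "occurrence": occurrence})
    return docs


def get_row(docs, group):
    """In-memory stand-in for querying the group field: rebuild one row."""
    return {d["disease"]: d["occurrence"] for d in docs if d["group"] == group}


docs = to_cell_docs(matrix)
print(len(docs))
print(get_row(docs, "group1"))
```

Any disease missing from the rebuilt row is read as absent for that group, which is what makes the sparse layout work without storing the empty cells.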