Re: Storing 2 dimension array in Solr

2013-10-14 Thread David Philip
Hi,

  I will look into the pseudo-join.

Jack,
I have my doubts about further denormalization. I will take up the rest of
the points you made. Thank you.
Basically, we have 2 different Solr indexes. One table is rarely updated, but
this group-disease table is updated frequently and new diseases are added
very often, so we maintain them separately. While querying we need a join
operation on table 1 and table 2.

So far I have been able to create a test Solr index with 100k dynamic fields
in each document; I have yet to test it further. It took almost 1.5 hours to
create the index for 1500 groups, each group having almost 90k dynamic fields.

I also added a doc_static field, into which the whole set of integers from
the disease fields is copied (copy fields). While querying I use only this
field to retrieve.
If there are any better approaches, please let me know.

Thanks - David
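
For reference, the pseudo-join mentioned above is expressed in Solr through the `{!join}` query parser. The following is only a rough Python sketch of how such a query string could be assembled. The core name `group_disease` and the field names `group` and `disease` are hypothetical, chosen for illustration; they do not come from this thread.

```python
from urllib.parse import urlencode


def join_query(from_field: str, to_field: str, from_index: str, inner_query: str) -> str:
    """Build a Solr join query using the {!join} local-params syntax.

    The inner query runs against `from_index`; documents in the core being
    queried are returned when their `to_field` matches a `from_field` value.
    """
    return f"{{!join from={from_field} to={to_field} fromIndex={from_index}}}{inner_query}"


# Hypothetical names: a rarely updated core is queried, joined against a
# frequently updated "group_disease" core.
q = join_query("group", "group", "group_disease", "disease:disease1")
params = urlencode({"q": q, "fl": "*"})
```

Note that the join acts as a filter: only fields of the documents in the queried core come back, not fields from the `fromIndex` side, so it may not cover every "join table 1 and table 2" use case.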







Re: Storing 2 dimension array in Solr

2013-10-13 Thread David Philip
Hi Jack, regarding the point "each element of the array as a Solr document,
with a group field and a disease field":
Did you mean it this way?

<doc>
  group1_grp: G1
  disease1_d: 2,
  disease2_d: 3,
</doc>
<doc>
  group1_grp: G2
  disease1_d: 2,
  disease2_d: 3,
  disease3_d: 1,
  disease4_d: 1,
</doc>

That is, similar to the first case, with dynamic fields for the diseases?
Will it be a performance issue if the number of disease fields grows to millions?












Re: Storing 2 dimension array in Solr

2013-10-13 Thread Erick Erickson
This sounds like a denormalization issue. Don't be afraid <G>.

Actually, I've seen from 50M to 300M small docs on a Solr node,
depending on query type, hardware, etc. So that gives you a
place to start being cautious about the number of docs in your
system. If the full expansion of your table numbers in that range,
you might be just fine denormalizing the data.

Alternatively, there's the pseudo join capability to consider. I'm
usually hesitant to recommend that, but Joel is committing some
really interesting stuff in the join area which you might take a look
at if the existing pseudo-join isn't performant enough.

But I'd consider denormalizing the data as the first approach.

Best,
Erick
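
To make the "full expansion" check concrete, here is a small back-of-the-envelope sketch in Python. The 1500 groups and roughly 90k diseases per group are the figures David reports later in this thread; the 10% density and the function name `expanded_doc_count` are assumptions added purely for illustration.

```python
def expanded_doc_count(groups: int, diseases: int, density: float = 1.0) -> int:
    """Number of documents if every non-empty (group, disease) cell becomes one doc.

    `density` is the assumed fraction of cells that actually hold a value.
    """
    return round(groups * diseases * density)


# Range of small docs per Solr node mentioned above.
LOW, HIGH = 50_000_000, 300_000_000

full = expanded_doc_count(1500, 90_000)                 # every cell indexed
sparse = expanded_doc_count(1500, 90_000, density=0.1)  # hypothetical 10% fill

print(full, LOW <= full <= HIGH)
print(sparse, sparse < LOW)
```

With these inputs the fully expanded table lands inside the 50M to 300M range, and a sparse version falls well below it.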



Re: Storing 2 dimension array in Solr

2013-10-13 Thread Lee Carroll
I think he means a doc for each element, so you have a disease occurrence
index:

<doc>
  <group>group1</group>
  <dis>dis1</dis>
  <occurrence>exist</occurrence>
  <unique field>1-1</unique field>
</doc>

Assuming (and it's a pretty fair assumption?) that most groups have only a
subset of diseases, this will be a sparse matrix, so just don't index a doc
where the occurrence value does not exist.

Basically, denormalize by adding fields which don't relate to the key.

This will work fine on modest hardware, with no thought given to performance,
for 5 million docs. It will work fine with some thought and hardware for very
large numbers. It's worth a go anyway just to test. It should probably be
your first method to try out.





Re: Storing 2 dimension array in Solr

2013-10-13 Thread Jack Krupansky
Yeah, something like that. The key or ID field would probably just be the 
composition of the group and disease fields.


The other thing is, if occurrence is simply a boolean, omit the field and omit the
document if that disease is not present for that group. If the majority of
the diseases are not present for a specified group, that would eliminate a
lot of documents. Or, if occurrence is not a boolean, keep the field, but
again don't add a document if the disease is not present for that group.


My usual, over-generalized rule for dynamic fields is that they are a 
powerful tool, but only if used in moderation. Millions would not be 
moderation.


-- Jack Krupansky
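
A small Python sketch of the two ideas above: a key composed from the group and disease fields, and an occurrence field that is left out when it is a plain boolean. The "-" separator, the field names and both helper functions are assumptions for illustration only. The row lookup is done in memory here; against Solr it would be a query on the group field.

```python
from typing import Dict, Iterable, Optional


def make_doc(group: str, disease: str, occurrence: Optional[str] = None) -> Dict[str, str]:
    """One document per (group, disease) cell with a composite id."""
    doc = {"id": f"{group}-{disease}", "group": group, "disease": disease}
    if occurrence is not None:  # boolean case: the document's existence is the value
        doc["occurrence"] = occurrence
    return doc


def row_for_group(docs: Iterable[Dict[str, str]], group: str) -> Dict[str, object]:
    """Reassemble one matrix row from the per-cell documents of a group."""
    return {d["disease"]: d.get("occurrence", True) for d in docs if d["group"] == group}


docs = [
    make_doc("group1", "disease1", "exist"),
    make_doc("group1", "disease2", "slight"),
    make_doc("group2", "disease3"),  # boolean style: present, no occurrence field
]
row1 = row_for_group(docs, "group1")
row2 = row_for_group(docs, "group2")
```

Because the id is deterministic, re-sending a document for the same group and disease overwrites the old one rather than creating a duplicate.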


Storing 2 dimension array in Solr

2013-10-12 Thread David Philip
Hi,

  I have a 2-dimensional array and want to persist it in Solr. How can I
do that?

Sample case:

          disease1   disease2    disease3
group1    exist      slight      not found
groups2   slight     not found   exist
group2    slight exist

The values can also be stored as codes: exist = 1, not found = 2, slight = 3, ...

Note: This array is updated frequently. Every time a new disease gets added,
I have to add a description of that disease to all groups. At query time I
will only get by row, i.e. by group; for example, the row for group = group2.

Any suggestions on how I can achieve this? I am thankful to the forum for
replying with patience; once I achieve this, I will blog about it and share
it with all.

Thanks - David


Re: Storing 2 dimension array in Solr

2013-10-12 Thread Erick Erickson
David:

This feels like it may be an XY problem. _Why_ do you
want to store a 2-dimensional array and what
do you want to do with it? Maybe there are better
approaches.

Best
Erick





Re: Storing 2 dimension array in Solr

2013-10-12 Thread David Philip
Hi Erick,

   We have a set of groups as represented below. New columns (diseases, as in
the matrix below) keep coming and we need to add them as new columns. For
each such column, we have values such as 1, 2, 3 or 4 (exist, slight, na,
notfound) for the respective groups.

While querying we need to get the entire row for group:group1. We will
not be searching on the column (*_disease) values, so they are indexed=false
but stored=true.

For example, we query for group:group1 and need to get the entire row:
exist, slight, not found. I hope this explanation is clearer.

          disease1   disease2    disease3
group1    exist      slight      not found
group2    slight     not found   exist
group3    slight exist
groupK    -na exist



Thanks - David





 



Re: Storing 2 dimension array in Solr

2013-10-12 Thread Erick Erickson
Isn't this just indexing each row as a separate document
with a suitable ID (groupN in your example)?
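
As a rough illustration of the row-per-document layout, here is a Python sketch that turns one matrix row into one document. The `_disease` suffix follows the `*_disease` pattern David mentions; the helper name `row_to_doc`, the `id` field and the numeric codes used in the example are assumptions.

```python
from typing import Dict


def row_to_doc(group: str, row: Dict[str, int], suffix: str = "_disease") -> Dict[str, object]:
    """Turn one matrix row into one document whose id is the group name.

    Each column becomes a dynamic field named <column><suffix>.
    """
    doc: Dict[str, object] = {"id": group}
    for column, value in row.items():
        doc[f"{column}{suffix}"] = value
    return doc


# Assumed codes: 1 = exist, 2 = slight, 4 = notfound.
doc = row_to_doc("group1", {"disease1": 1, "disease2": 2, "disease3": 4})
```

Fetching the document with id `group1` then returns the whole row in one hit, at the cost of one dynamic field per column.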


  
 



Re: Storing 2 dimension array in Solr

2013-10-12 Thread David Philip
Hi Erick, yes it is. But the columns here are added dynamically and very
frequently. They can grow to as many as 1 million right now. So, is 1 document
with 1 million dynamic fields fine? Or is there any other approach?

While searching the web, I found that docValues are column-oriented:
http://searchhub.org/2013/04/02/fun-with-docvalues-in-solr-4-2/
However, I did not understand how to use docValues to add these columns.

What is the recommended approach?

Thanks - David






   
  
 



Re: Storing 2 dimension array in Solr

2013-10-12 Thread Jack Krupansky
You may be better off indexing each element of the array as a solr document, 
with a group field and a disease field. Then you can easily and efficiently 
add new diseases. Then to query a row, you query for the group field having 
the desired group.


If possible, index the array as being sparse - no document for a disease if 
it is not present for that group.


-- Jack Krupansky
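
A minimal Python sketch of the layout described above: one document per (group, disease) element, with absent diseases skipped so the index stays sparse. The field names, the `ABSENT` markers and the helper `matrix_to_docs` are illustrative assumptions, not something specified in this thread.

```python
from typing import Dict, Iterator

# Hypothetical input: rows keyed by group, columns keyed by disease.
MATRIX: Dict[str, Dict[str, str]] = {
    "group1": {"disease1": "exist", "disease2": "slight", "disease3": "not found"},
    "group2": {"disease1": "slight", "disease2": "not found", "disease3": "exist"},
}

# Assumed markers meaning "this disease is not present for this group".
ABSENT = {"not found", "na"}


def matrix_to_docs(matrix: Dict[str, Dict[str, str]]) -> Iterator[Dict[str, str]]:
    """Yield one document per present cell; absent cells produce no document."""
    for group, row in matrix.items():
        for disease, occurrence in row.items():
            if occurrence in ABSENT:
                continue
            yield {"group": group, "disease": disease, "occurrence": occurrence}


docs = list(matrix_to_docs(MATRIX))
```

Adding a new disease then means adding new documents only for the groups where it is present; existing documents are left untouched.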
