Re: SOLR - Documents with large number of fields ~ 450
Hi John,

Mark is right. DocValues can be enabled in two ways: RAM-resident (the default) or on-disk. You can read more here: http://www.slideshare.net/LucidImagination/column-stride-fields-aka-docvalues

Regards.

On 22 March 2013 16:55, John Nielsen wrote:
> "with the on disk option". Could you elaborate on that?
>
> [...]
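The two modes could look something like this in a Solr 4.2 schema.xml. This is a minimal sketch, not a tested configuration: the field and type names are made up for illustration, and the `required`/`default` constraint on DocValues fields mentioned elsewhere in this thread is reflected in the `default=""` attribute.

```xml
<!-- Hypothetical schema.xml excerpt for Solr 4.2; names are examples only. -->

<!-- Default: DocValues held in RAM -->
<fieldType name="string_dv" class="solr.StrField" sortMissingLast="true"/>

<!-- On-disk DocValues via the per-type codec format -->
<fieldType name="string_dv_disk" class="solr.StrField" sortMissingLast="true"
           docValuesFormat="Disk"/>

<!-- A DocValues field in 4.x must be required or carry a default value -->
<field name="category" type="string_dv_disk" indexed="true" stored="false"
       docValues="true" default=""/>
```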
Re: SOLR - Documents with large number of fields ~ 450
"with the on disk option". Could you elaborate on that? Den 22/03/2013 05.25 skrev "Mark Miller" : > You might try using docvalues with the on disk option and try and let the > OS manage all the memory needed for all the faceting/sorting. This would > require Solr 4.2. > > - Mark > > On Mar 21, 2013, at 2:56 AM, kobe.free.wo...@gmail.com wrote: > > > Hello All, > > > > Scenario: > > > > My data model consist of approx. 450 fields with different types of > data. We > > want to include each field for indexing as a result it will create a > single > > SOLR document with *450 fields*. The total of number of records in the > data > > set is *755K*. We will be using the features like faceting and sorting on > > approx. 50 fields. > > > > We are planning to use SOLR 4.1. Following is the hardware configuration > of > > the web server that we plan to install SOLR on:- > > > > CPU: 2 x Dual Core (4 cores) | RAM: 12GB | Storage: 212 GB > > > > Questions : > > > > 1)What's the best approach when dealing with documents with large number > of > > fields. What's the drawback of having a single document with a very large > > number of fields. Does SOLR support documents with large number of > fields as > > in my case? > > > > 2)Will there be any performance issue if i define all of the 450 fields > for > > indexing? Also if faceting is done on 50 fields with document having > large > > number of fields and huge number of records? > > > > 3)The name of the fields in the data set are quiet lengthy around 60 > > characters. Will it be a problem defining fields with such a huge name in > > the schema file? Is there any best practice to be followed related to > naming > > convention? Will big field names create problem during querying? > > > > Thanks! > > > > > > > > -- > > View this message in context: > http://lucene.472066.n3.nabble.com/SOLR-Documents-with-large-number-of-fields-450-tp4049633.html > > Sent from the Solr - User mailing list archive at Nabble.com. > >
Re: SOLR - Documents with large number of fields ~ 450
Hi,

I have a collection with more than 4K fields, mostly Trie*Field types. It is used for faceting, sorting, searching, and the StatsComponent. It works pretty well on Amazon 4 x m1.large (7.5 GB RAM) EC2 boxes. I'm using SolrCloud in a multi-AZ setup with ephemeral storage. The index is managed via mmap, with 4 GB for the Java heap and CMS for GC.

There are currently 800K records, and there will be about 2M. Query response times are much longer (a couple to a dozen seconds) during bulk loading, but I think that is fairly typical. Indexing takes much longer than for records with fewer fields. I'm sending updates in 5 MB batches. No OOM issues.

Regarding DocValues: I believe they are a great improvement for faceting, but their limitations are annoying. As far as I can tell, a field has to be required or have a default value, which is not possible in my case (I can't default some figures to 0, as that may affect other results displayed to the end user). I wish that would change.

Regards.

On 21 March 2013 07:56, kobe.free.wo...@gmail.com wrote:
> [...]
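The 5 MB update batches mentioned above can be produced with a simple size-based grouping. The helper below is just a sketch of the idea (its name and its use of JSON serialization are assumptions, not necessarily how the poster did it):

```python
import json


def batches_by_size(docs, max_bytes=5 * 1024 * 1024):
    """Group docs into update batches of at most ~max_bytes of serialized JSON."""
    batch, size = [], 0
    for doc in docs:
        doc_size = len(json.dumps(doc).encode("utf-8"))
        # Flush the current batch before this doc would push it over the limit.
        if batch and size + doc_size > max_bytes:
            yield batch
            batch, size = [], 0
        batch.append(doc)
        size += doc_size
    if batch:
        yield batch


# Usage: each yielded batch would be posted to /update as one request.
docs = [{"id": i, "f": "x" * 1000} for i in range(100)]
batches = list(batches_by_size(docs, max_bytes=3000))
```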
Re: SOLR - Documents with large number of fields ~ 450
You might try using docvalues with the on disk option, and let the OS manage all the memory needed for the faceting/sorting. This would require Solr 4.2.

- Mark

On Mar 21, 2013, at 2:56 AM, kobe.free.wo...@gmail.com wrote:
> [...]
Re: SOLR - Documents with large number of fields ~ 450
Hi,

In short, I suspect you'll OOM if you sort and facet on all those fields.

Otis
--
Solr & ElasticSearch Support
http://sematext.com/

On Thu, Mar 21, 2013 at 2:56 AM, kobe.free.wo...@gmail.com wrote:
> [...]
Re: SOLR - Documents with large number of fields ~ 450
You will definitely be pushing the limits of reasonable performance. Maybe 4-5 years from now you will be able to get decent performance with hundreds of fields and dozens of faceted fields, but today I'd be surprised if you could get decent performance with more than about 100 fields and a dozen facets.

The length of a field name should not be a problem for queries, other than readability. Just be sure to stick with Java-style names (alpha, digit, underscore).

The bottom line: do a proof of concept (POC) first, and tell us how it performs.

-- Jack Krupansky

-Original Message- From: kobe.free.wo...@gmail.com Sent: Thursday, March 21, 2013 2:56 AM To: solr-user@lucene.apache.org Subject: SOLR - Documents with large number of fields ~ 450

[...]
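Jack's "Java-style names" advice can be checked mechanically. The pattern below is a conservative interpretation of that advice (letters, digits, underscore, not starting with a digit), not an official Solr rule:

```python
import re

# Conservative field-name check per the advice above; the exact pattern
# is an assumption, not something Solr enforces.
FIELD_NAME = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")

# A 60-character name is fine by this rule; only the characters matter.
assert FIELD_NAME.match("customer_last_purchase_date_by_region_and_product_line_2013")
assert not FIELD_NAME.match("450-fields")   # hyphens would clash with query syntax
assert not FIELD_NAME.match("2013_sales")   # leading digit
```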
SOLR - Documents with large number of fields ~ 450
Hello All,

Scenario:

My data model consists of approx. 450 fields with different types of data. We want to include each field for indexing; as a result, it will create a single SOLR document with *450 fields*. The total number of records in the data set is *755K*. We will be using features like faceting and sorting on approx. 50 fields.

We are planning to use SOLR 4.1. Following is the hardware configuration of the web server that we plan to install SOLR on:

CPU: 2 x Dual Core (4 cores) | RAM: 12GB | Storage: 212 GB

Questions:

1) What's the best approach when dealing with documents with a large number of fields? What's the drawback of having a single document with a very large number of fields? Does SOLR support documents with a large number of fields, as in my case?

2) Will there be any performance issue if I define all of the 450 fields for indexing? Also if faceting is done on 50 fields, with documents having a large number of fields and a huge number of records?

3) The names of the fields in the data set are quite lengthy, around 60 characters. Will it be a problem defining fields with such long names in the schema file? Is there any best practice to be followed related to naming conventions? Will big field names create problems during querying?

Thanks!

--
View this message in context: http://lucene.472066.n3.nabble.com/SOLR-Documents-with-large-number-of-fields-450-tp4049633.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: Documents With large number of fields
Indexing should be fine, depending on your total document count. I think the potential issue is the FieldCache at query time. It should be roughly linear in the number of documents, fields, and unique terms per field for string values. So do two tests: index 1,000 docs, then 2,000 docs. In each case, check Java memory usage after a simple query, then after a query with a significant number of these faceted fields, and then after a couple more queries with a high number of distinct faceted fields. Multiply those memory-use increments to scale up to your expected number of documents; that should give you a semi-decent estimate of the memory the JVM will need. CPU requirements are harder to estimate, but the memory has to work out first. Likewise, the delta in index size between 1,000 and 2,000 docs should give you a number to scale up to total index size, roughly, depending on the relative uniqueness of field values.

-- Jack Krupansky

-Original Message- From: Keswani, Nitin - BLS CTR Sent: Monday, May 14, 2012 10:27 AM To: solr-user@lucene.apache.org Subject: RE: Documents With large number of fields

[...]
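The two-point measurement Jack describes boils down to a linear extrapolation. A sketch, with made-up example numbers (the 40 MB / 55 MB figures are illustrative, not measurements):

```python
def extrapolate(mem_1k, mem_2k, target_docs):
    """Linearly extrapolate JVM memory use from two POC measurements.

    mem_1k / mem_2k: bytes used after faceted queries at 1,000 and 2,000 docs.
    """
    per_doc = (mem_2k - mem_1k) / 1000.0   # incremental bytes per document
    base = mem_1k - per_doc * 1000         # fixed overhead independent of doc count
    return base + per_doc * target_docs


# Example: 40 MB at 1,000 docs, 55 MB at 2,000 docs, scaled to 755K docs.
est = extrapolate(40e6, 55e6, 755_000)     # about 11.35 GB -- near this box's RAM
```

With hypothetical numbers like these, the estimate lands close to the 12 GB the original poster has, which is consistent with the OOM warnings elsewhere in the thread.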
Re: Documents With large number of fields
Nitin,

I meant to reply, but I think the thing to watch out for is Lucene segment merges. I believe this is another thing I saw in a client engagement where the client had a crazy number of fields; if I recall correctly, it was segment merges that were painfully slow. So try creating a non-trivial index where you have some big Lucene index segments and see how that goes. This was a while back, before various Lucene indexing improvements (different merge policies, non-blocking flushing to disk, etc.) were implemented, so things may be different now.

Otis
Performance Monitoring for Solr / ElasticSearch / HBase - http://sematext.com/spm

> From: "Keswani, Nitin - BLS CTR"
> To: "solr-user@lucene.apache.org"
> Sent: Monday, May 14, 2012 10:27 AM
> Subject: RE: Documents With large number of fields
>
> [...]
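If segment merges do turn out to be the bottleneck, the merge policy Otis mentions is configurable in solrconfig.xml. A hypothetical excerpt (the values shown are illustrative defaults, not tuning advice):

```xml
<!-- Hypothetical solrconfig.xml excerpt (Solr 3.x/4.x); values are illustrative. -->
<indexConfig>
  <mergePolicy class="org.apache.lucene.index.TieredMergePolicy">
    <int name="maxMergeAtOnce">10</int>
    <int name="segmentsPerTier">10</int>
  </mergePolicy>
  <!-- Larger RAM buffer means fewer, larger flushed segments to merge -->
  <ramBufferSizeMB>100</ramBufferSizeMB>
</indexConfig>
```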
RE: Documents With large number of fields
Unfortunately I never got any response. However, I did a POC with a document containing 400 fields and loaded around 1,000 docs on my local machine. I didn't see any issues, but then again the document set was very small. Hopefully, as mentioned below, providing enough memory should help alleviate any performance issues.

Thanks.

Regards,

Nitin Keswani

-Original Message- From: Jack Krupansky [mailto:j...@basetechnology.com] Sent: Sunday, May 13, 2012 10:42 PM To: solr-user@lucene.apache.org Subject: Re: Documents With large number of fields

I didn't see any response. There was a similar issue recently, where someone had 400 faceted fields with 50-70 facets per query and was running out of memory due to accumulation of the FieldCache for those faceted fields, but that was on a 3 GB system. It probably could be done, assuming a fair number of 64-bit sharded machines.

-- Jack Krupansky

[...]
Re: Documents With large number of fields
I didn't see any response. There was a similar issue recently, where someone had 400 faceted fields with 50-70 facets per query and was running out of memory due to accumulation of the FieldCache for those faceted fields, but that was on a 3 GB system. It probably could be done, assuming a fair number of 64-bit sharded machines.

-- Jack Krupansky

-Original Message- From: Darren Govoni Sent: Sunday, May 13, 2012 7:56 PM To: solr-user@lucene.apache.org Subject: Re: Documents With large number of fields

Was there a response to this?

[...]
Re: Documents With large number of fields
Was there a response to this?

On Fri, 2012-05-04 at 10:27 -0400, Keswani, Nitin - BLS CTR wrote:
> [...]
Re: Documents With large number of fields
I'm also interested in this. Same situation.

On Fri, 2012-05-04 at 10:27 -0400, Keswani, Nitin - BLS CTR wrote:
> [...]
Documents With large number of fields
Hi,

My data model consists of different types of data, and each data type has its own characteristics. If I include the unique characteristics of each type of data, my single Solr document could end up containing 300-400 fields. In order to drill down into this data set, I would have to provide faceting on most of these fields so that I can drill down to a very small set of documents.

Here are some of the questions:

1) What's the best approach when dealing with documents with a large number of fields? Should I keep a single document with a large number of fields, or split my document into a number of smaller documents where each document would consist of some of the fields?

2) From an operational point of view, what's the drawback of having a single document with a very large number of fields? Can Solr support documents with a large number of fields (say 300 to 400)?

Thanks.

Regards,

Nitin Keswani