Re: Solr using all available CPU and becoming unresponsive

2021-01-12 Thread Charlie Hull

Hi Jeremy,

You might find our recent blog on Debugging Solr Performance Issues 
useful 
https://opensourceconnections.com/blog/2021/01/05/a-solr-performance-debugging-toolkit/ 
- also check out Savan Das' blog which is linked within.


Best

Charlie

On 12/01/2021 14:53, Michael Gibney wrote:

Ahh ok. If those are your only fieldType definitions, and most of your
config is copied from the default, then SOLR-13336 is unlikely to be the
culprit. Looking at more general options, off the top of my head:
1. make sure you haven't allocated all physical memory to heap (leave a
decent amount for OS page cache)
2. disable swap, if you can (this is esp. important if using network
storage as swap). There are potential downsides to this (so proceed with
caution); but if part of your heap gets swapped out (and it almost
certainly will, with a sufficiently large heap) full GCs lead to a swap
storm that compounds the problem. (fwiw, this is probably the first thing
I'd recommend looking into and trying, because it's so easy, and can in
some cases yield a dramatic improvement. N.b., I'm talking about `swapoff
-a`, not `sysctl -w vm.swappiness=0` -- I find that the latter does *not*
eliminate swapping in the way that's needed to achieve the desired goal in
this case. Again, exercise caution in doing this, discuss, research, etc.).
Related documentation was added in 8.5, but absolutely applies to 7.3.1 as
well:
https://lucene.apache.org/solr/guide/8_7/taking-solr-to-production.html#avoid-swapping-nix-operating-systems
-- the note there about "lowering swappiness" being an acceptable
alternative contradicts my experience, but I suppose ymmv?
3. if you're faceting on fields -- especially high-cardinality fields (many
values) -- make sure that you have `docValues=true, uninvertible=false`
configured (to ensure that you're not building large on-heap data
structures when there's an alternative that doesn't require it).
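
For illustration, a faceted field set up that way might look something like this in schema.xml (the field and type names here are hypothetical, not taken from the schema under discussion):

    <field name="category" type="string" indexed="true" stored="true"
           multiValued="true" docValues="true" uninvertible="false"/>

With docValues in place, faceting reads column-oriented structures backed by the OS page cache instead of un-inverting the field onto the Java heap.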

These are all recommendations that are explained in more detail by others
elsewhere; I think they should all apply to 7.3.1; fwiw, I would recommend
upgrading if you have the (human) bandwidth to do so. Good luck!

Michael

On Tue, Jan 12, 2021 at 8:39 AM Jeremy Smith  wrote:


Thanks Michael,
  SOLR-13336 seems intriguing.  I'm not a solr expert, but I believe
these are the relevant sections from our schema definition:

[schema excerpt lost in the mail archive: two text fieldType definitions, both with positionIncrementGap="100"; the second (multiValued="false") applies a StopFilterFactory (words="stopwords.txt") at index and query time and adds a SynonymGraphFilterFactory (synonyms="synonyms.txt" ignoreCase="true" expand="true") at query time]

Our other fieldTypes don't have any analyzers attached to them.


If SOLR-13336 is the cause of the issue is the best remedy to upgrade to
solr 8?  It doesn't look like the fix was back patched to 7.x.

Our schema has some issues arising from not fully understanding Solr and
just copying existing structures from the defaults.  In this case,
stopwords.txt is completely empty and synonyms.txt is just the default
synonyms.txt, which seems not useful at all for us.  Could I just take out
the StopFilterFactory and SynonymGraphFilterFactory from the query section
(and maybe the StopFilterFactory from the index section as well)?

Thanks again,
Jeremy


From: Michael Gibney 
Sent: Monday, January 11, 2021 8:30 PM
To: solr-user@lucene.apache.org 
Subject: Re: Solr using all available CPU and becoming unresponsive

Hi Jeremy,
Can you share your analysis chain configs? (SOLR-13336 can manifest in a
similar way, and would affect 7.3.1 with a susceptible config, given the
right (wrong?) input ...)
Michael

On Mon, Jan 11, 2021 at 5:27 PM Jeremy Smith  wrote:


Hello all,
  We have been struggling with an issue where solr will intermittently
use all available CPU and become unresponsive.  It will remain in this
state until we restart.  Solr will remain stable for some time, usually a
few hours to a few days, before this happens again.  We've tried

adjusting

the caches and adding memory to both the VM and JVM, but we haven't been
able to solve the issue yet.

Here is some info about our server:
Solr:
   Solr 7.3.1, running on Java 1.8
   Running in cloud mode, but there's only one core

Host:
   CentOS7
   8 CPU, 56GB RAM
   The only other processes running on this VM are two zookeepers, one for
this Solr instance, one for another Solr instance

Solr Config:
  - One Core
  - 36 Million documents (Max Doc), 28 million (Num Docs)
  - ~15GB
  - 10-20 Requests/second
  - The schema is fairly large (~100 fields) and we allow faceting and
searching on many, but not all, of the fields
  - Data are imported once per minute through the DataImportHandler, with

a

hard commit at the end.  We usually index ~100-500 documents per minute,
with many of these being updates to existing documents.

Cache settings:

[cache definitions lost in the mail archive; the surviving attributes are size="256" initialSize="256" autowarmCount="8" showItems="64" for the filterCache discussed below, size="256" initialSize="256" autowarmCount="0" for a second cache, and size="1024" initialSize="1024" autowarmCount="0" for a third]

For the filterCache, we have tried sizes as low as 128, which caused our
CPU usage to go up and didn't solve our issue.  autowarmCount used to be
much higher, but we have reduced it to try to address this issue.

Re: Solr using all available CPU and becoming unresponsive

2021-01-12 Thread Michael Gibney
Ahh ok. If those are your only fieldType definitions, and most of your
config is copied from the default, then SOLR-13336 is unlikely to be the
culprit. Looking at more general options, off the top of my head:
1. make sure you haven't allocated all physical memory to heap (leave a
decent amount for OS page cache)
2. disable swap, if you can (this is esp. important if using network
storage as swap). There are potential downsides to this (so proceed with
caution); but if part of your heap gets swapped out (and it almost
certainly will, with a sufficiently large heap) full GCs lead to a swap
storm that compounds the problem. (fwiw, this is probably the first thing
I'd recommend looking into and trying, because it's so easy, and can in
some cases yield a dramatic improvement. N.b., I'm talking about `swapoff
-a`, not `sysctl -w vm.swappiness=0` -- I find that the latter does *not*
eliminate swapping in the way that's needed to achieve the desired goal in
this case. Again, exercise caution in doing this, discuss, research, etc.).
Related documentation was added in 8.5, but absolutely applies to 7.3.1 as
well:
https://lucene.apache.org/solr/guide/8_7/taking-solr-to-production.html#avoid-swapping-nix-operating-systems
-- the note there about "lowering swappiness" being an acceptable
alternative contradicts my experience, but I suppose ymmv?
3. if you're faceting on fields -- especially high-cardinality fields (many
values) -- make sure that you have `docValues=true, uninvertible=false`
configured (to ensure that you're not building large on-heap data
structures when there's an alternative that doesn't require it).

These are all recommendations that are explained in more detail by others
elsewhere; I think they should all apply to 7.3.1; fwiw, I would recommend
upgrading if you have the (human) bandwidth to do so. Good luck!

Michael

On Tue, Jan 12, 2021 at 8:39 AM Jeremy Smith  wrote:

> Thanks Michael,
>  SOLR-13336 seems intriguing.  I'm not a solr expert, but I believe
> these are the relevant sections from our schema definition:
>
> [schema excerpt lost in the mail archive: two text fieldType definitions, both with positionIncrementGap="100"; the second (multiValued="false") applies a StopFilterFactory (words="stopwords.txt") at index and query time and adds a SynonymGraphFilterFactory (synonyms="synonyms.txt" ignoreCase="true" expand="true") at query time]
>
> Our other fieldTypes don't have any analyzers attached to them.
>
>
> If SOLR-13336 is the cause of the issue is the best remedy to upgrade to
> solr 8?  It doesn't look like the fix was back patched to 7.x.
>
> Our schema has some issues arising from not fully understanding Solr and
> just copying existing structures from the defaults.  In this case,
> stopwords.txt is completely empty and synonyms.txt is just the default
> synonyms.txt, which seems not useful at all for us.  Could I just take out
> the StopFilterFactory and SynonymGraphFilterFactory from the query section
> (and maybe the StopFilterFactory from the index section as well)?
>
> Thanks again,
> Jeremy
>
> 
> From: Michael Gibney 
> Sent: Monday, January 11, 2021 8:30 PM
> To: solr-user@lucene.apache.org 
> Subject: Re: Solr using all available CPU and becoming unresponsive
>
> Hi Jeremy,
> Can you share your analysis chain configs? (SOLR-13336 can manifest in a
> similar way, and would affect 7.3.1 with a susceptible config, given the
> right (wrong?) input ...)
> Michael
>
> On Mon, Jan 11, 2021 at 5:27 PM Jeremy Smith  wrote:
>
> > Hello all,
> >  We have been struggling with an issue where solr will intermittently
> > use all available CPU and become unresponsive.  It will remain in this
> > state until we restart.  Solr will remain stable for some time, usually a
> > few hours to a few days, before this happens again.  We've tried
> adjusting
> > the caches and adding memory to both the VM and JVM, but we haven't been
> > able to solve the issue yet.
> >
> > Here is some info about our server:
> > Solr:
> >   Solr 7.3.1, running on Java 1.8
> >   Running in cloud mode, but there's only one core
> >
> > Host:
> >   CentOS7
> >   8 CPU, 56GB RAM
> >   The only other processes running on this VM are two zookeepers, one for
> > this Solr instance, one for another Solr instance
> >
> > Solr Config:
> >  - One Core
> >  - 36 Million documents (Max Doc), 28 million (Num Docs)
> >  - ~15GB
> >  - 10-20 Requests/second
> >  - The

Re: Solr using all available CPU and becoming unresponsive

2021-01-12 Thread Jeremy Smith
Thanks Michael,
 SOLR-13336 seems intriguing.  I'm not a solr expert, but I believe these 
are the relevant sections from our schema definition:

[schema excerpt lost in the mail archive: two text fieldType definitions, both with positionIncrementGap="100"; the second (multiValued="false") applies a StopFilterFactory (words="stopwords.txt") at index and query time and adds a SynonymGraphFilterFactory (synonyms="synonyms.txt" ignoreCase="true" expand="true") at query time]

Our other fieldTypes don't have any analyzers attached to them.


If SOLR-13336 is the cause of the issue is the best remedy to upgrade to solr 
8?  It doesn't look like the fix was back patched to 7.x.

Our schema has some issues arising from not fully understanding Solr and just 
copying existing structures from the defaults.  In this case, stopwords.txt is 
completely empty and synonyms.txt is just the default synonyms.txt, which seems 
not useful at all for us.  Could I just take out the StopFilterFactory and 
SynonymGraphFilterFactory from the query section (and maybe the 
StopFilterFactory from the index section as well)?
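
For reference, a pared-down version of that second fieldType without the stop and synonym filters might look roughly like this (the fieldType name and tokenizer are placeholders, since the archive stripped the original XML):

    <fieldType name="text_plain" class="solr.TextField"
               positionIncrementGap="100" multiValued="false">
      <analyzer>
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
    </fieldType>

With identical index- and query-time analysis, a single <analyzer> block is enough.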

Thanks again,
Jeremy


From: Michael Gibney 
Sent: Monday, January 11, 2021 8:30 PM
To: solr-user@lucene.apache.org 
Subject: Re: Solr using all available CPU and becoming unresponsive

Hi Jeremy,
Can you share your analysis chain configs? (SOLR-13336 can manifest in a
similar way, and would affect 7.3.1 with a susceptible config, given the
right (wrong?) input ...)
Michael

On Mon, Jan 11, 2021 at 5:27 PM Jeremy Smith  wrote:

> Hello all,
>  We have been struggling with an issue where solr will intermittently
> use all available CPU and become unresponsive.  It will remain in this
> state until we restart.  Solr will remain stable for some time, usually a
> few hours to a few days, before this happens again.  We've tried adjusting
> the caches and adding memory to both the VM and JVM, but we haven't been
> able to solve the issue yet.
>
> Here is some info about our server:
> Solr:
>   Solr 7.3.1, running on Java 1.8
>   Running in cloud mode, but there's only one core
>
> Host:
>   CentOS7
>   8 CPU, 56GB RAM
>   The only other processes running on this VM are two zookeepers, one for
> this Solr instance, one for another Solr instance
>
> Solr Config:
>  - One Core
>  - 36 Million documents (Max Doc), 28 million (Num Docs)
>  - ~15GB
>  - 10-20 Requests/second
>  - The schema is fairly large (~100 fields) and we allow faceting and
> searching on many, but not all, of the fields
>  - Data are imported once per minute through the DataImportHandler, with a
> hard commit at the end.  We usually index ~100-500 documents per minute,
> with many of these being updates to existing documents.
>
> Cache settings:
> [cache definitions lost in the mail archive; the surviving attributes are size="256" initialSize="256" autowarmCount="8" showItems="64" for the filterCache discussed below, size="256" initialSize="256" autowarmCount="0" for a second cache, and size="1024" initialSize="1024" autowarmCount="0" for a third]
>
> For the filterCache, we have tried sizes as low as 128, which caused our
> CPU usage to go up and didn't solve our issue.  autowarmCount used to be
> much higher, but we have reduced it to try to address this issue.
>
>
> The behavior we see:
>
> Solr is normally using ~3-6GB of heap and we usually have ~20GB of free
> memory.  Occasionally, though, solr is not able to free up memory and the
> heap usage climbs.  Analyzing the GC logs shows a sharp incline of usage
> with the GC (the default CMS) working hard to free memory, but not
> accomplishing much.  Eventually, it fills up the heap, maxes out the CPUs,
> and never recovers.  We have tried to analyze the logs to see if there are
> particular queries causing issues or if there are network issues to
> zookeeper, but we haven't been able to find any patterns.  After the issues
> start, we often see session timeouts to zookeeper, but it doesn't appear
> that they are the cause.
>
>
>
> Does anyone have any recommendations on things to try or metrics to look
> into or configuration issues I may be overlooking?
>
> Thanks,
> Jeremy
>
>


Re: Solr using all available CPU and becoming unresponsive

2021-01-11 Thread Michael Gibney
Hi Jeremy,
Can you share your analysis chain configs? (SOLR-13336 can manifest in a
similar way, and would affect 7.3.1 with a susceptible config, given the
right (wrong?) input ...)
Michael

On Mon, Jan 11, 2021 at 5:27 PM Jeremy Smith  wrote:

> Hello all,
>  We have been struggling with an issue where solr will intermittently
> use all available CPU and become unresponsive.  It will remain in this
> state until we restart.  Solr will remain stable for some time, usually a
> few hours to a few days, before this happens again.  We've tried adjusting
> the caches and adding memory to both the VM and JVM, but we haven't been
> able to solve the issue yet.
>
> Here is some info about our server:
> Solr:
>   Solr 7.3.1, running on Java 1.8
>   Running in cloud mode, but there's only one core
>
> Host:
>   CentOS7
>   8 CPU, 56GB RAM
>   The only other processes running on this VM are two zookeepers, one for
> this Solr instance, one for another Solr instance
>
> Solr Config:
>  - One Core
>  - 36 Million documents (Max Doc), 28 million (Num Docs)
>  - ~15GB
>  - 10-20 Requests/second
>  - The schema is fairly large (~100 fields) and we allow faceting and
> searching on many, but not all, of the fields
>  - Data are imported once per minute through the DataImportHandler, with a
> hard commit at the end.  We usually index ~100-500 documents per minute,
> with many of these being updates to existing documents.
>
> Cache settings:
> [cache definitions lost in the mail archive; the surviving attributes are size="256" initialSize="256" autowarmCount="8" showItems="64" for the filterCache discussed below, size="256" initialSize="256" autowarmCount="0" for a second cache, and size="1024" initialSize="1024" autowarmCount="0" for a third]
>
> For the filterCache, we have tried sizes as low as 128, which caused our
> CPU usage to go up and didn't solve our issue.  autowarmCount used to be
> much higher, but we have reduced it to try to address this issue.
>
>
> The behavior we see:
>
> Solr is normally using ~3-6GB of heap and we usually have ~20GB of free
> memory.  Occasionally, though, solr is not able to free up memory and the
> heap usage climbs.  Analyzing the GC logs shows a sharp incline of usage
> with the GC (the default CMS) working hard to free memory, but not
> accomplishing much.  Eventually, it fills up the heap, maxes out the CPUs,
> and never recovers.  We have tried to analyze the logs to see if there are
> particular queries causing issues or if there are network issues to
> zookeeper, but we haven't been able to find any patterns.  After the issues
> start, we often see session timeouts to zookeeper, but it doesn't appear
> that they are the cause.
>
>
>
> Does anyone have any recommendations on things to try or metrics to look
> into or configuration issues I may be overlooking?
>
> Thanks,
> Jeremy
>
>


Solr using all available CPU and becoming unresponsive

2021-01-11 Thread Jeremy Smith
Hello all,
 We have been struggling with an issue where solr will intermittently use 
all available CPU and become unresponsive.  It will remain in this state until 
we restart.  Solr will remain stable for some time, usually a few hours to a 
few days, before this happens again.  We've tried adjusting the caches and 
adding memory to both the VM and JVM, but we haven't been able to solve the 
issue yet.

Here is some info about our server:
Solr:
  Solr 7.3.1, running on Java 1.8
  Running in cloud mode, but there's only one core

Host:
  CentOS7
  8 CPU, 56GB RAM
  The only other processes running on this VM are two zookeepers, one for this 
Solr instance, one for another Solr instance

Solr Config:
 - One Core
 - 36 Million documents (Max Doc), 28 million (Num Docs)
 - ~15GB
 - 10-20 Requests/second
 - The schema is fairly large (~100 fields) and we allow faceting and searching 
on many, but not all, of the fields
 - Data are imported once per minute through the DataImportHandler, with a hard 
commit at the end.  We usually index ~100-500 documents per minute, with many 
of these being updates to existing documents.

Cache settings:

[cache definitions lost in the mail archive; the surviving attributes are size="256" initialSize="256" autowarmCount="8" showItems="64" for the filterCache discussed below, size="256" initialSize="256" autowarmCount="0" for a second cache, and size="1024" initialSize="1024" autowarmCount="0" for a third]

For the filterCache, we have tried sizes as low as 128, which caused our CPU 
usage to go up and didn't solve our issue.  autowarmCount used to be much 
higher, but we have reduced it to try to address this issue.


The behavior we see:

Solr is normally using ~3-6GB of heap and we usually have ~20GB of free memory. 
 Occasionally, though, solr is not able to free up memory and the heap usage 
climbs.  Analyzing the GC logs shows a sharp incline of usage with the GC (the 
default CMS) working hard to free memory, but not accomplishing much.  
Eventually, it fills up the heap, maxes out the CPUs, and never recovers.  We 
have tried to analyze the logs to see if there are particular queries causing 
issues or if there are network issues to zookeeper, but we haven't been able to 
find any patterns.  After the issues start, we often see session timeouts to 
zookeeper, but it doesn't appear that they are the cause.



Does anyone have any recommendations on things to try or metrics to look into 
or configuration issues I may be overlooking?

Thanks,
Jeremy



Re: Handling failure when adding docs to Solr using SolrJ

2020-09-17 Thread Erick Erickson
I recommend _against_ issuing explicit commits from the client; let
your solrconfig.xml autocommit settings take care of it. Make sure
either your soft or hard commits open a new searcher for the docs
to be searchable.

I’ll bend a little bit if you can _guarantee_ that you only ever have one
indexing client running and basically only ever issue the commit at the
end.

There’s another strategy, do the solrClient.add() command with the
commitWithin parameter.

As far as failures, look at 
https://lucene.apache.org/solr/7_3_0/solr-core/org/apache/solr/update/processor/TolerantUpdateProcessor.html
that’ll give you a better clue about _which_ docs failed. From there, though,
it’s a bit of debugging to figure out why that particular doc failed; usually
people record the docs that failed for later analysis and/or look at the Solr
logs, which usually give a more detailed reason for _why_ a document failed...
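
A minimal SolrJ sketch of that approach (the URL, collection name, and field are made up; adapt as needed):

    import java.util.Collections;
    import java.util.List;

    import org.apache.solr.client.solrj.SolrClient;
    import org.apache.solr.client.solrj.impl.HttpSolrClient;
    import org.apache.solr.common.SolrInputDocument;

    public class IndexWithCommitWithin {
        public static void main(String[] args) throws Exception {
            SolrClient client =
                new HttpSolrClient.Builder("http://localhost:8983/solr/mycollection").build();
            List<SolrInputDocument> docs = buildDocs();
            try {
                // Ask Solr to commit within 30 seconds; no explicit commit() from the client.
                client.add(docs, 30_000);
            } catch (Exception e) {
                // Record the failed batch for later analysis; the Solr server log
                // usually explains why a particular document was rejected.
                System.err.println("Batch of " + docs.size() + " docs failed: " + e);
            } finally {
                client.close();
            }
        }

        private static List<SolrInputDocument> buildDocs() {
            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("id", "example-1");
            return Collections.singletonList(doc);
        }
    }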

Best,
Erick

> On Sep 17, 2020, at 1:09 PM, Steven White  wrote:
> 
> Hi everyone,
> 
> I'm trying to figure out when and how I should handle failures that may
> occur during indexing.  In the sample code below, look at my comment and
> let me know what state my index is in when things fail:
> 
>   SolrClient solrClient = new HttpSolrClient.Builder(url).build();
> 
>   solrClient.add(solrDocs);
> 
>   // #1: What to do if add() fails?  And how do I know if all or some of
> my docs in 'solrDocs' made it to the index or not ('solrDocs' is a list of
> 1 or more doc), should I retry add() again?  Retry with a smaller chunk?
> Etc.
> 
>   if (doCommit == true)
>   {
>  solrClient.commit();
> 
>   // #2: What to do if commit() fails?  Re-issue commit() again?
>   }
> 
> Thanks
> 
> Steven



Handling failure when adding docs to Solr using SolrJ

2020-09-17 Thread Steven White
Hi everyone,

I'm trying to figure out when and how I should handle failures that may
occur during indexing.  In the sample code below, look at my comment and
let me know what state my index is in when things fail:

   SolrClient solrClient = new HttpSolrClient.Builder(url).build();

   solrClient.add(solrDocs);

   // #1: What to do if add() fails?  And how do I know if all or some of
my docs in 'solrDocs' made it to the index or not ('solrDocs' is a list of
1 or more doc), should I retry add() again?  Retry with a smaller chunk?
Etc.

   if (doCommit == true)
   {
  solrClient.commit();

   // #2: What to do if commit() fails?  Re-issue commit() again?
   }

Thanks

Steven


Re: Querying solr using many QueryParser in one call

2020-07-20 Thread Charlie Hull

Hi,

It's very hard to answer questions like 'how fast/slow might this be' -
the best way to find out is to try, e.g. to build a prototype that you
can time. To be useful, this prototype should use representative data and
queries. Once you have this, you can try improving performance with
strategies like the caching you describe.
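
A small SolrJ sketch of such a timing prototype, reusing the parameters from the question below (the URL, collection, qf, and mm values are placeholders):

    import org.apache.solr.client.solrj.SolrClient;
    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.impl.HttpSolrClient;
    import org.apache.solr.client.solrj.response.QueryResponse;

    public class MultiParserTiming {
        public static void main(String[] args) throws Exception {
            SolrClient client =
                new HttpSolrClient.Builder("http://localhost:8983/solr/mycollection").build();
            SolrQuery q = new SolrQuery();
            q.setQuery("({!edismax qf=$qf v=$searchQuery mm=$mm}) OR ({!edismax qf=$qf v=$docIdQuery mm=0 sow=true})");
            q.setFields("id", "title", "isTermMatch:exists(query({!type=edismax qf=$qf v=blah}))", "score");
            q.set("qf", "title description");   // placeholder field list
            q.set("searchQuery", "blah");
            q.set("docIdQuery", "5985612 6339445 5357348");
            q.set("mm", "2");                   // placeholder mm value
            // Run the query repeatedly against representative data and record Solr's own timing.
            for (int i = 0; i < 10; i++) {
                QueryResponse rsp = client.query(q);
                System.out.println("run " + i + ": QTime=" + rsp.getQTime()
                        + " ms, hits=" + rsp.getResults().getNumFound());
            }
            client.close();
        }
    }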


Charlie

On 16/07/2020 18:14, harjag...@gmail.com wrote:

Hi All,
Below are questions regarding querying solr using many QueryParsers in one
call.
We need to do a search by keyword and also include a few specific
documents in the result. We don't want to use the elevator component as that would
put those mandatory documents at the top of the result. We would like to mix
those mandatory documents with the organic keyword lookup result set and also
make sure those mandatory documents take part in other scoring mechanisms
like bq's. On top of this we would also need to classify documents matched by
keyword lookup against the mandatory docs. We ended up using the below solr query
params to achieve it.

fl=id,title,isTermMatch:exists(query({!type=edismax qf=$qf v=blah})),score
q=({!edismax qf=$qf v=$searchQuery mm=$mm}) OR ({!edismax qf=$qf
v=$docIdQuery mm=0 sow=true})
docIdQuery=5985612 6339445 5357348
searchQuery=blah

Below are my questions:
1. As you can see, we are calling three query parsers in one call; what would be
the performance implication of the search?
2. Two of those queries, the one in q and the one in fl, are the
same; would the query result cache help?
3. In general, what are the implications for performance when we do a search
calling multiple query parsers in a single call?



--
Sent from: https://lucene.472066.n3.nabble.com/Solr-User-f472068.html



--
Charlie Hull
OpenSource Connections, previously Flax

tel/fax: +44 (0)8700 118334
mobile:  +44 (0)7767 825828
web: www.o19s.com



Querying solr using many QueryParser in one call

2020-07-16 Thread harjags
Hi All,

Below are questions regarding querying solr using many QueryParsers in one
call.

We need to do a search by keyword and also include a few specific
documents in the result. We don't want to use the elevator component as that would
put those mandatory documents at the top of the result. We would like to mix
those mandatory documents with the organic keyword lookup result set and also
make sure those mandatory documents take part in other scoring mechanisms
like bq's. On top of this we would also need to classify documents matched by
keyword lookup against the mandatory docs. We ended up using the below solr query
params to achieve it.

*fl*=id,title,isTermMatch:exists(query({!type=edismax qf=$qf v=blah})),score
*q*=({!edismax qf=$qf v=$searchQuery mm=$mm}) OR ({!edismax qf=$qf
v=$docIdQuery mm=0 sow=true}) 
*docIdQuery*=5985612 6339445 5357348
*searchQuery*=blah

Below are my questions:
1. As you can see, we are calling three query parsers in one call; what would be
the performance implication of the search?
2. Two of those queries, the one in q and the one in fl, are the
same; would the query result cache help?
3. In general, what are the implications for performance when we do a search
calling multiple query parsers in a single call?



--
Sent from: https://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Querying solr using many QueryParser in one call

2020-07-16 Thread harjag...@gmail.com
Hi All,
Below are questions regarding querying solr using many QueryParsers in one
call.
We need to do a search by keyword and also include a few specific
documents in the result. We don't want to use the elevator component as that would
put those mandatory documents at the top of the result. We would like to mix
those mandatory documents with the organic keyword lookup result set and also
make sure those mandatory documents take part in other scoring mechanisms
like bq's. On top of this we would also need to classify documents matched by
keyword lookup against the mandatory docs. We ended up using the below solr query
params to achieve it.

fl=id,title,isTermMatch:exists(query({!type=edismax qf=$qf v=blah})),score
q=({!edismax qf=$qf v=$searchQuery mm=$mm}) OR ({!edismax qf=$qf
v=$docIdQuery mm=0 sow=true})
docIdQuery=5985612 6339445 5357348
searchQuery=blah

Below are my questions:
1. As you can see, we are calling three query parsers in one call; what would be
the performance implication of the search?
2. Two of those queries, the one in q and the one in fl, are the
same; would the query result cache help?
3. In general, what are the implications for performance when we do a search
calling multiple query parsers in a single call?



--
Sent from: https://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: Ignore faceting for particular fields in solr using Solrconfig.xml

2019-05-23 Thread Bernd Fehling

Have a look at "invariants" for your requestHandler in solrconfig.xml.
It might be an option for you.

Regards
Bernd


Am 22.05.19 um 22:23 schrieb RaviTeja:

Hello Solr Expert,

How are you?

I am trying to ignore faceting for some of the fields. Can you please help me
ignore faceting using solrconfig.xml?
I tried, but I could only disable faceting for all of the fields, which isn't
useful. I'm trying to ignore some specific fields.

Really Appreciate your help for the response!

Regards,
Ravi



Re: Ignore faceting for particular fields in solr using Solrconfig.xml

2019-05-22 Thread Erick Erickson
Just don’t ask for them. Or are you saying that users can specify arbitrary fields
to facet on and you want to prevent certain fields from being possible?

No, there’s no good way to do that in solrconfig.xml. You could write a query 
component that stripped out certain fields from the facet.field parameter.
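
A rough sketch of such a component (the class name and the blocked field are made up, and this is untested):

    import java.io.IOException;

    import org.apache.solr.common.params.FacetParams;
    import org.apache.solr.common.params.ModifiableSolrParams;
    import org.apache.solr.handler.component.ResponseBuilder;
    import org.apache.solr.handler.component.SearchComponent;

    public class FacetFieldFilterComponent extends SearchComponent {
        private static final String BLOCKED_FIELD = "internal_notes"; // hypothetical field to hide

        @Override
        public void prepare(ResponseBuilder rb) throws IOException {
            ModifiableSolrParams params = new ModifiableSolrParams(rb.req.getParams());
            String[] requested = params.getParams(FacetParams.FACET_FIELD);
            if (requested == null) {
                return;
            }
            params.remove(FacetParams.FACET_FIELD);
            for (String field : requested) {
                if (!BLOCKED_FIELD.equals(field)) {
                    params.add(FacetParams.FACET_FIELD, field); // keep everything else
                }
            }
            rb.req.setParams(params);
        }

        @Override
        public void process(ResponseBuilder rb) throws IOException {
            // nothing to do here; the filtering happened in prepare()
        }

        @Override
        public String getDescription() {
            return "Strips certain fields from facet.field";
        }
    }

It would then be registered in solrconfig.xml and added to the request handler's component list.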

Likely the easiest would be to do that in the application I assume you have 
between Solr and your users.

Best,
Erick

> On May 22, 2019, at 1:23 PM, RaviTeja  wrote:
> 
> Hello Solr Expert,
> 
> How are you?
> 
> I am trying to ignore faceting for some of the fields. Can you please help me
> ignore faceting using solrconfig.xml?
> I tried, but I could only disable faceting for all of the fields, which isn't
> useful. I'm trying to ignore some specific fields.
> 
> Really Appreciate your help for the response!
> 
> Regards,
> Ravi



Ignore faceting for particular fields in solr using Solrconfig.xml

2019-05-22 Thread RaviTeja
Hello Solr Expert,

How are you?

I am trying to ignore faceting for some of the fields. Can you please help me
ignore faceting using solrconfig.xml?
I tried, but I could only disable faceting for all of the fields, which isn't
useful. I'm trying to ignore some specific fields.

Really Appreciate your help for the response!

Regards,
Ravi


Broken pipe when indexing in solr using ConcurrentUpdateSolrClient

2018-06-17 Thread BIBI Yassin
Hello,
Sometimes we get a "broken pipe" error when indexing 40,000,000 documents
using ConcurrentUpdateSolrClient in Java.

I have found on Google that it's probably because of a timeout, but in
ConcurrentUpdateSolrClient we can't configure the timeout.

For example, the last broken pipe happened while indexing 40,000,000 documents;
several broken pipes occurred and 16,000 files weren't indexed.

Thank you for helping.

Yassin


Unable to load multipart Email to SOLR using TIka

2018-02-06 Thread Anantharaman, Srinatha (Contractor)
Hi,

I am trying to load a multipart MIME email file into Solr using Tika and
Morphline.
I am using Flume to load continuously, but it fails when it finds
attachments inside an email (multiple parts).
It does not throw any error, but it fails to index the email into Solr.

Could you please help me to resolve the issue?

Regards,
~Sri


CVE-2016-6809: Java code execution for serialized objects embedded in MATLAB files parsed by Apache Solr using Apache Tika

2017-10-26 Thread Shalin Shekhar Mangar
CVE-2016-6809: Java code execution for serialized objects embedded in
MATLAB files parsed by Apache Solr using Tika

Severity: Important

Vendor:
The Apache Software Foundation

Versions Affected:
Solr 5.0.0 to 5.5.4
Solr 6.0.0 to 6.6.1
Solr 7.0.0 to 7.0.1

Description:

Apache Solr uses Apache Tika for parsing binary file types such as
doc, xls, pdf etc. Apache Tika wraps the jmatio parser
(https://github.com/gradusnikov/jmatio) to handle MATLAB files. The
parser uses native deserialization on serialized Java objects embedded
in MATLAB files. A malicious user could inject arbitrary code into a
MATLAB file that would be executed when the object is deserialized.

This vulnerability was originally described at
http://mail-archives.apache.org/mod_mbox/tika-user/201611.mbox/%3C2125912914.1308916.1478787314903%40mail.yahoo.com%3E

Mitigation:
Users are advised to upgrade to either Solr 5.5.5 or Solr 6.6.2 or Solr 7.1.0
releases which have fixed this vulnerability.

Solr 5.5.5 upgrades the jmatio parser to v1.2 and disables the Java
deserialisation support to protect against this vulnerability.

Solr 6.6.2 and Solr 7.1.0 have upgraded the bundled Tika to v1.16.

Once upgrade is complete, no other steps are required.

References:
https://issues.apache.org/jira/browse/SOLR-11486
https://issues.apache.org/jira/browse/SOLR-10335
https://wiki.apache.org/solr/SolrSecurity

-- 
Regards,
Shalin Shekhar Mangar.


Could not find collection , Error while ingesting to Solr using Flume and Morphlines

2017-04-26 Thread Anantharaman, Srinatha (Contractor)
Hi,

Though I see the collection config is uploaded to ZooKeeper, I get the below error while
ingesting data to Solr using Flume and Morphline.
Kindly let me know if you need more details.

017-04-26 18:25:31,767 (SinkRunner-PollingRunner-DefaultSinkProcessor) [DEBUG - 
org.kitesdk.morphline.base.AbstractCommand.beforeNotify(AbstractCommand.java:142)]
 beforeNotify()
2017-04-26 18:25:31,767 (SinkRunner-PollingRunner-DefaultSinkProcessor) [TRACE 
- 
org.kitesdk.morphline.base.AbstractCommand.beforeNotify(AbstractCommand.java:140)]
 beforeNotify: {lifecycle=[COMMIT_TRANSACTION]}
2017-04-26 18:25:31,768 (SinkRunner-PollingRunner-DefaultSinkProcessor) [DEBUG 
- 
org.kitesdk.morphline.base.AbstractCommand.beforeNotify(AbstractCommand.java:142)]
 beforeNotify()
2017-04-26 18:25:31,768 (SinkRunner-PollingRunner-DefaultSinkProcessor) [TRACE 
- 
org.kitesdk.morphline.base.AbstractCommand.beforeNotify(AbstractCommand.java:140)]
 beforeNotify: {lifecycle=[COMMIT_TRANSACTION]}
2017-04-26 18:25:31,768 (SinkRunner-PollingRunner-DefaultSinkProcessor) [DEBUG 
- 
org.kitesdk.morphline.base.AbstractCommand.beforeNotify(AbstractCommand.java:142)]
 beforeNotify()
2017-04-26 18:25:31,772 (SinkRunner-PollingRunner-DefaultSinkProcessor) [ERROR 
- 
org.apache.flume.sink.solr.morphline.MorphlineSink.process(MorphlineSink.java:163)]
 Morphline Sink SolrSink: Unable to process event from channel FileChannel. 
Exception follows.
org.apache.solr.common.SolrException: Could not find collection : esearch
at 
org.apache.solr.common.cloud.ClusterState.getCollection(ClusterState.java:162)
at 
org.apache.solr.client.solrj.impl.CloudSolrServer.directUpdate(CloudSolrServer.java:305)
at 
org.apache.solr.client.solrj.impl.CloudSolrServer.request(CloudSolrServer.java:533)
at 
org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:124)
at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:68)
at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:54)
at 
org.apache.solr.morphlines.solr.SolrServerDocumentLoader.loadBatch(SolrServerDocumentLoader.java:91)
at 
org.apache.solr.morphlines.solr.SolrServerDocumentLoader.commitTransaction(SolrServerDocumentLoader.java:79)
at 
org.apache.solr.morphlines.solr.LoadSolrBuilder$LoadSolr.doNotify(LoadSolrBuilder.java:95)
at 
org.kitesdk.morphline.base.AbstractCommand.notify(AbstractCommand.java:132)
at org.kitesdk.morphline.base.Connector.notify(Connector.java:57)
at 
org.kitesdk.morphline.base.AbstractCommand.doNotify(AbstractCommand.java:150)
at 
org.kitesdk.morphline.base.AbstractCommand.notify(AbstractCommand.java:132)
at org.kitesdk.morphline.base.Connector.notify(Connector.java:57)
at 
org.kitesdk.morphline.base.AbstractCommand.doNotify(AbstractCommand.java:150)
at 
org.kitesdk.morphline.base.AbstractCommand.notify(AbstractCommand.java:132)
at org.kitesdk.morphline.base.Connector.notify(Connector.java:57)
at 
org.kitesdk.morphline.base.AbstractCommand.doNotify(AbstractCommand.java:150)
at 
org.kitesdk.morphline.base.AbstractCommand.notify(AbstractCommand.java:132)
at org.kitesdk.morphline.base.Connector.notify(Connector.java:57)
at 
org.kitesdk.morphline.base.AbstractCommand.doNotify(AbstractCommand.java:150)
at 
org.kitesdk.morphline.base.AbstractCommand.notify(AbstractCommand.java:132)
at org.kitesdk.morphline.base.Connector.notify(Connector.java:57)
at 
org.kitesdk.morphline.base.AbstractCommand.doNotify(AbstractCommand.java:150)
at 
org.kitesdk.morphline.base.AbstractCommand.notify(AbstractCommand.java:132)
at 
org.kitesdk.morphline.base.AbstractCommand.doNotify(AbstractCommand.java:150)
at 
org.kitesdk.morphline.base.AbstractCommand.notify(AbstractCommand.java:132)
at 
org.kitesdk.morphline.base.Notifications.notify(Notifications.java:96)
at 
org.kitesdk.morphline.base.Notifications.notifyCommitTransaction(Notifications.java:61)
at 
org.apache.flume.sink.solr.morphline.MorphlineHandlerImpl.commitTransaction(MorphlineHandlerImpl.java:149)
at 
org.apache.flume.sink.solr.morphline.MorphlineSink.process(MorphlineSink.java:156)
at 
org.apache.flume.sink.DefaultSinkProcessor.process(DefaultSinkProcessor.java:68)
at org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:147)
at java.lang.Thread.run(Thread.java:745)


Regards,
~Sri


Re: Issues with ingesting to Solr using Flume

2017-04-25 Thread Shawn Heisey
On 4/20/2017 9:02 AM, Anantharaman, Srinatha (Contractor) wrote:
> Hi all,
>
> I am trying to ingest data to Solr 6.3 using Flume 1.5 on the Hortonworks 2.5
> platform. I am facing the below issue while sinking the data:
>
> 19 Apr 2017 19:54:26,943 ERROR [lifecycleSupervisor-1-3] 
> (org.apache.flume.lifecycle.LifecycleSupervisor$MonitorRunnable.run:253)  - 
> Unable to start SinkRunner: { 
> policy:org.apache.flume.sink.DefaultSinkProcessor@130344d7 counterGroup:{ 
> name:null counters:{} } } - Exception follows.
> org.kitesdk.morphline.api.MorphlineCompilationException: No command builder 
> registered for name: detectMimeType near: {
> # /etc/flume/conf/morphline.conf: 48
> "detectMimeType" : {
> # /etc/flume/conf/morphline.conf: 50
> "includeDefaultMimeTypes" : true
> }
> }

I know nothing at all about Flume, but reading that message, Solr is not
mentioned anywhere.  My recommendation is to ask for help on this
problem using a Flume resource.  If Solr is doing something wrong, they
should be able to help you find evidence showing that.  At that point,
you can come back to this thread with that evidence.

Are there any ERROR or WARN messages in the Solr logs?

Thanks,
Shawn



Issues with ingesting to Solr using Flume

2017-04-20 Thread Anantharaman, Srinatha (Contractor)
Hi all,

I am trying to ingest data to Solr 6.3 using Flume 1.5 on the Hortonworks 2.5
platform. I am facing the below issue while sinking the data:

19 Apr 2017 19:54:26,943 ERROR [lifecycleSupervisor-1-3] 
(org.apache.flume.lifecycle.LifecycleSupervisor$MonitorRunnable.run:253)  - 
Unable to start SinkRunner: { 
policy:org.apache.flume.sink.DefaultSinkProcessor@130344d7 counterGroup:{ 
name:null counters:{} } } - Exception follows.
org.kitesdk.morphline.api.MorphlineCompilationException: No command builder 
registered for name: detectMimeType near: {
# /etc/flume/conf/morphline.conf: 48
"detectMimeType" : {
# /etc/flume/conf/morphline.conf: 50
"includeDefaultMimeTypes" : true
}
}

The morphline config file is as below


id : morphline1

importCommands : ["org.kitesdk.**", "org.apache.solr.**"]
#importCommands : ["com.cloudera.**", "org.kitesdk.**"]

commands :
[

  { detectMimeType { includeDefaultMimeTypes : true } }

  {

solrCell {

  solrLocator : ${solrLocator}

  captureAttr : true

  lowernames : true

  capture : [_attachment_body, _attachment_mimetype, basename, content, 
content_encoding, content_type, file, meta,text]

  parsers : [ # { parser : org.apache.tika.parser.txt.TXTParser }

# { parser : org.apache.tika.parser.AutoDetectParser }
  #{ parser : org.apache.tika.parser.asm.ClassParser }
  #{ parser : org.gagravarr.tika.FlacParser }
  #{ parser : 
org.apache.tika.parser.executable.ExecutableParser }
  #{ parser : org.apache.tika.parser.font.TrueTypeParser }
  #{ parser : org.apache.tika.parser.xml.XMLParser }
  #{ parser : org.apache.tika.parser.html.HtmlParser }
  #{ parser : org.apache.tika.parser.image.TiffParser }
  # { parser : org.apache.tika.parser.mail.RFC822Parser }
  #{ parser : org.apache.tika.parser.mbox.MboxParser, 
additionalSupportedMimeTypes : [message/x-emlx] }
  #{ parser : org.apache.tika.parser.microsoft.OfficeParser 
}
  #{ parser : org.apache.tika.parser.hdf.HDFParser }
  #{ parser : org.apache.tika.parser.odf.OpenDocumentParser 
}
  #{ parser : org.apache.tika.parser.pdf.PDFParser }
  #{ parser : org.apache.tika.parser.rtf.RTFParser }
  { parser : org.apache.tika.parser.txt.TXTParser }
  #{ parser : org.apache.tika.parser.chm.ChmParser }
]

 fmap : { content : text }
 }

  }
  { generateUUID { field : id } }

  { sanitizeUnknownSolrFields { solrLocator : ${solrLocator} } }


  { logDebug { format : "output record: {}", args : ["@{}"] } }

  { loadSolr: { solrLocator : ${solrLocator} } }

]

  }

]


I have copied all required jar files to the Flume classpath. Kindly let me know the
solution for this issue.

Regards,
~Sri



HBase table indexing in Solr using morphline conf

2016-12-15 Thread Gurdeep Singh
Hi All,

I am trying to index an HBase table into Solr using the HBase indexer and a
morphline conf file.

The issue I'm facing is that one of the columns in the HBase table is a count
field (with integer values), and except for this column all other string-type
HBase columns are getting indexed in Solr as expected (only this count
field is not getting indexed in Solr).

Below is how I configured this column in morphline file:

--
{
   inputColumn : "a:count"(a is one of the column family in HBase table)
   outputField : "count"
   type :"int"
   source: value
}
--

In Solr schema.xml also, I kept count as int.
I also tried changing type in morphline file as long/double, but no luck.

However when I set this column as "string" in morphline and in Solr's
schema.xml, I see the column in Solr but it shows data with type mismatch
error:
"ERROR SCHEMA-INDEX-MISMATCH", stringValue=123

Please advise how to index integer-type data from an HBase table into Solr
using morphline.

Thanks in advance



Best Regards,
Gurdeep

gurdeepgan...@gmail.com


Adding retries for making calls to solr using solrj

2016-11-16 Thread pdj xyz
Hi,

We are seeing transient connection resets in our custom solr client (a
wrapper around solrj). We want to add retries to all methods that we are
currently using so that we are able to upload successfully. However,
I'm not sure if there's any relevant documentation on which methods are
idempotent and which aren't.

Our use case - We have a single solr host. We aren't using solr cloud or
anything fancy.

We want to upload an index to Solr host. To do that, we first:
1) Disable replication
2) delete old index
3) upload new index
4) commit the changes (rollback if there's an exception)
5) run a solr query and perform some validations
6) run /admin/luke and perform some validation.
7) enable replication

We're currently thinking it should be OK to retry each of these 6
requests (at least for Socket Exceptions), but would like
guidance/confirmation. Any documentation on this would be really helpful.
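
A minimal sketch of the kind of retry wrapper described above (the retry count and backoff are arbitrary, and whether a given operation is safe to retry still has to be confirmed per step):

    import java.util.concurrent.Callable;

    public final class SolrRetry {
        // Retry a SolrJ call a few times on transient failures such as connection resets.
        public static <T> T withRetries(Callable<T> call, int maxAttempts, long backoffMs) throws Exception {
            Exception last = null;
            for (int attempt = 1; attempt <= maxAttempts; attempt++) {
                try {
                    return call.call();
                } catch (Exception e) {
                    last = e;
                    Thread.sleep(backoffMs * attempt); // simple linear backoff between attempts
                }
            }
            throw last;
        }
    }

For the read-only steps (the validation query, /admin/luke), something like SolrRetry.withRetries(() -> client.query(q), 3, 1000L) is safe to retry; the mutating steps need the idempotency confirmation asked about above.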

Thanks

-- 
Pranay


Re: Recommended api/lib to search Solr using PHP

2016-06-02 Thread Scott Chu
ay 
of document object. Use 'foreach' to iterate it.
8. A document object is essentially an array of fields. Say its variable name is
$doc. You can access a field in 3 ways:
a> Use 'foreach' to iterate it.
b> Use $doc['fieldname'] to get a specific field, e.g. $doc['ID'].
c> Use $doc->fieldname to get a specific field, e.g. $doc->ID.

How to paginate in PHP
===
1. Basically, just use $query->setStart(...)->setRows(...) in a loop, setting the
appropriate start row offset and page size for your pagination.
2. More advanced: you can try writing object-reuse code as shown in the official
documentation and paginate that way to save memory and reduce GC in the PHP engine.

That's all for now. The rest will be your own homework.

Scott Chu,scott@udngroup.com
2016/6/2 (週四)
- Original Message - 
From: Shawn Heisey 
To: solr-user 
CC: 
Date: 2016/5/31 (週二) 02:57
Subject: Re: Recommended api/lib to search Solr using PHP


On 5/30/2016 12:32 PM, GW wrote: 
> I would say look at the urls for searches you build in the query tool 
> 
> In my case 
> 
> http://172.16.0.1:8983/solr/#/products/query 
> 
> When you build queries with the Query tool, for example an edismax query, 
> the URL is there for you to copy. 
> Use the url structure with curl in your programming/scripting. The results 
> come back as REST data. 
> 
> This is what I do with PHP and it's pretty tight. 

Be careful with URLs in the admin UI. 

URLs with "#" in them will *only* work in a browser. They are not the 
REST endpoints. 

When you run a query in the admin UI, it will give you a URL to make the 
same query, but it will NOT be the URL in the address bar of the 
browser. There is a link right above the query results. 

Thanks, 
Shawn 





Re: Recommended api/lib to search Solr using PHP

2016-05-30 Thread scott.chu

For those who might have the same need to use Solarium, this is the best tutorial I
can find by googling; it's actually a chapter in the book "Apache Solr PHP
Integration":

https://www.packtpub.com/packtlib/book/Big-Data-and-Business-Intelligence/9781782164920/1/ch01lvl1sec13/Installing%20Solarium

I followed it and was able to install and use Solarium correctly.

scott.chu,scott@udngroup.com
2016/5/31 (週二)
- Original Message - 
From: Shawn Heisey 
To: solr-user 
CC: 
Date: 2016/5/31 (週二) 02:57
Subject: Re: Recommended api/lib to search Solr using PHP


On 5/30/2016 12:32 PM, GW wrote: 
> I would say look at the urls for searches you build in the query tool 
> 
> In my case 
> 
> http://172.16.0.1:8983/solr/#/products/query 
> 
> When you build queries with the Query tool, for example an edismax query, 
> the URL is there for you to copy. 
> Use the url structure with curl in your programming/scripting. The results 
> come back as REST data. 
> 
> This is what I do with PHP and it's pretty tight. 

Be careful with URLs in the admin UI. 

URLs with "#" in them will *only* work in a browser. They are not the 
REST endpoints. 

When you run a query in the admin UI, it will give you a URL to make the 
same query, but it will NOT be the URL in the address bar of the 
browser. There is a link right above the query results. 

Thanks, 
Shawn 





Re(2): Recommended api/lib to search Solr using PHP

2016-05-30 Thread scott.chu
Thanks, guys! My engineers just found another thing called 'SolrPhpClient', but
I am trying Solarium again. It just looks like a well-structured API. (Note:
Actually, I've noticed it from the very beginning when it was developed but never
gave it a try.)


scott.chu,scott@udngroup.com
2016/5/31 (週二)
- Original Message - 
From: GW 
To: solr-user ; scott(自己) 
CC: 
Date: 2016/5/31 (週二) 02:32
Subject: Re: Recommended api/lib to search Solr using PHP


I would say look at the urls for searches you build in the query tool 

In my case 

http://172.16.0.1:8983/solr/#/products/query 

When you build queries with the Query tool, for example an edismax query, 
the URL is there for you to copy. 
Use the url structure with curl in your programming/scripting. The results 

come back as REST data. 

This is what I do with PHP and it's pretty tight. 


On 30 May 2016 at 02:29, scott.chu <scott@udngroup.com> wrote: 

> 
> We have two legacy in-house applications written in PHP 5.2.6 and 5.5.3. 

> Our engineers currently just use fopen with url to search Solr but it's 
> kinda unenough when we want to do more advanced, complex queries. We've 
> tried to use something called 'Solarium' but its installtion steps has 
> something to do with symphony, which is kinda complicated. We can't get the 
> installation done ok. I'd like to know if there are some other 
> better-structured PHP libraries or APIs? 
> 
> Note: Solr is 5.4.1. 
> 
> scott.chu,scott@udngroup.com 
> 2016/5/30 (週一) 
> 





Re: Recommended api/lib to search Solr using PHP

2016-05-30 Thread Shawn Heisey
On 5/30/2016 12:32 PM, GW wrote:
> I would say look at the urls for searches you build in the query tool
>
> In my case
>
> http://172.16.0.1:8983/solr/#/products/query
>
> When you build queries with the Query tool, for example an edismax query,
> the URL is there for you to copy.
> Use the url structure with curl in your programming/scripting. The results
> come back as REST data.
>
> This is what I do with PHP and it's pretty tight.

Be careful with URLs in the admin UI.

URLs with "#" in them will *only* work in a browser.  They are not the
REST endpoints. 

When you run a query in the admin UI, it will give you a URL to make the
same query, but it will NOT be the URL in the address bar of the
browser.  There is a link right above the query results.
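
For example, the admin-UI address-bar URL from GW's message, http://172.16.0.1:8983/solr/#/products/query, corresponds to a REST endpoint of roughly this form (the parameters are only illustrative):

    http://172.16.0.1:8983/solr/products/select?q=*:*&defType=edismax&wt=json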

Thanks,
Shawn



Re: Recommended api/lib to search Solr using PHP

2016-05-30 Thread GW
I would say look at the urls for searches you build in the query tool

In my case

http://172.16.0.1:8983/solr/#/products/query

When you build queries with the Query tool, for example an edismax query,
the URL is there for you to copy.
Use the url structure with curl in your programming/scripting. The results
come back as REST data.

This is what I do with PHP and it's pretty tight.


On 30 May 2016 at 02:29, scott.chu  wrote:

>
> We have two legacy in-house applications written in PHP 5.2.6 and 5.5.3.
> Our engineers currently just use fopen with url to search Solr but it's
> kinda unenough when we want to do more advanced, complex queries. We've
> tried to use something called 'Solarium' but its installtion steps has
> something to do with symphony, which is kinda complicated. We can't get the
> installation done ok. I'd like to know if there are some other
> better-structured PHP libraries or APIs?
>
> Note: Solr is 5.4.1.
>
> scott.chu,scott@udngroup.com
> 2016/5/30 (週一)
>


Re: Recommended api/lib to search Solr using PHP

2016-05-30 Thread John Blythe
we also use Solarium. the documentation is pretty spotty in some cases (tho
they've recently updated it, or at least the formatting, which seems to be
a move in the right direction), but overall pretty simple to use. some good
plugins at hand to help extend the base power, too. i'd say give it a whirl

-- 
*John Blythe*
Product Manager & Lead Developer

251.605.3071 | j...@curvolabs.com
www.curvolabs.com

58 Adams Ave
Evansville, IN 47713

On Mon, May 30, 2016 at 4:27 AM, Georg Sorst  wrote:

> We've had good experiences with Solarium, so it's probably worth spending
> some time in getting it to run.
>
> scott.chu  schrieb am Mo., 30. Mai 2016 um
> 09:30 Uhr:
>
> >
> > We have two legacy in-house applications written in PHP 5.2.6 and 5.5.3.
> > Our engineers currently just use fopen with url to search Solr but it's
> > kinda unenough when we want to do more advanced, complex queries. We've
> > tried to use something called 'Solarium' but its installtion steps has
> > something to do with symphony, which is kinda complicated. We can't get
> the
> > installation done ok. I'd like to know if there are some other
> > better-structured PHP libraries or APIs?
> >
> > Note: Solr is 5.4.1.
> >
> > scott.chu,scott@udngroup.com
> > 2016/5/30 (週一)
> >
>


Re: Recommended api/lib to search Solr using PHP

2016-05-30 Thread Shawn Heisey
On 5/30/2016 1:29 AM, scott.chu wrote:
> We have two legacy in-house applications written in PHP 5.2.6 and 5.5.3. Our 
> engineers currently just use fopen with url to search Solr but it's kinda 
> unenough when we want to do more advanced, complex queries. We've tried to 
> use something called 'Solarium' but its installtion steps has something to do 
> with symphony, which is kinda complicated. We can't get the installation done 
> ok. I'd like to know if there are some other better-structured PHP libraries 
> or APIs? 

There are a *lot* of PHP clients out there.  Note that none of them can
be supported here, because they are all third-party software.

https://wiki.apache.org/solr/IntegratingSolr#PHP

Thanks,
Shawn



Re: Recommended api/lib to search Solr using PHP

2016-05-30 Thread Georg Sorst
We've had good experiences with Solarium, so it's probably worth spending
some time in getting it to run.

scott.chu  schrieb am Mo., 30. Mai 2016 um
09:30 Uhr:

>
> We have two legacy in-house applications written in PHP 5.2.6 and 5.5.3.
> Our engineers currently just use fopen with url to search Solr but it's
> kinda unenough when we want to do more advanced, complex queries. We've
> tried to use something called 'Solarium' but its installtion steps has
> something to do with symphony, which is kinda complicated. We can't get the
> installation done ok. I'd like to know if there are some other
> better-structured PHP libraries or APIs?
>
> Note: Solr is 5.4.1.
>
> scott.chu,scott@udngroup.com
> 2016/5/30 (週一)
>


Recommended api/lib to search Solr using PHP

2016-05-30 Thread scott.chu

We have two legacy in-house applications written in PHP 5.2.6 and 5.5.3. Our
engineers currently just use fopen with a URL to search Solr, but that's not
enough when we want to do more advanced, complex queries. We've tried to use
something called 'Solarium', but its installation steps have something to do with
Symfony, which is kinda complicated. We can't get the installation done OK.
I'd like to know if there are some other better-structured PHP libraries or
APIs?

Note: Solr is 5.4.1.

scott.chu,scott@udngroup.com
2016/5/30 (週一)


RE: Teiid with Solr - using any other engine except the SolrDefaultQueryEngine

2015-12-31 Thread Mark Horninger
Thanks for the response, Alex.  

I am trying to accomplish a Federated search of SQL Server and Solr.  I guess I 
should have given more detail on this.

The overall plan is to do the following:
1. SSIS ETL data from multiple sources into SQL Server
2. SSIS call to update Solr Indexing.
3. SQL standard "=" matching when possible to reduce the candidate data set.
4. Dismax match based on a rule set Joining SQL Server candidate dataset 
against Solr indexing set using a join operator.
5. Cache possible matches in SQL Server for a given record in order for a human 
to disposition them.

From what I read, Carrot is great for Solr clustering, but once you get into 
RDBMS, you're out of luck.


 
-Original Message-
From: Alexandre Rafalovitch [mailto:arafa...@gmail.com] 
Sent: Thursday, December 31, 2015 12:44 AM
To: solr-user <solr-user@lucene.apache.org>
Subject: Re: Teiid with Solr - using any other engine except the 
SolrDefaultQueryEngine

Are you trying to do federated search? What about carrot? Not the one that 
ships with Solr, the parent project.

Regards,
   Alex
On 31 Dec 2015 12:21 am, "Mark Horninger" <mhornin...@grayhairsoftware.com>
wrote:

> I have gotten Teiid and Solr wired up, but it seems like the only way 
> to query is with the default Solr Query Engine, and nothing else.  In 
> asking Dr. Google, this is a data black hole.  The more I look at it, 
> the more I think I'm going to end up having to write a custom 
> translator.  Is there anyone else out there who has had this 
> challenge, and if so, how did you overcome it?
>
> Thanks In Advance!
>
> -Mark H.
>
>
> [GrayHair]
> GHS Confidentiality Notice
>
> This e-mail message, including any attachments, is for the sole use of 
> the intended recipient(s) and may contain confidential and privileged 
> information. Any unauthorized review, use, disclosure or distribution 
> of this information is prohibited, and may be punishable by law. If 
> this was sent to you in error, please notify the sender by reply 
> e-mail and destroy all copies of the original message.
>
> GrayHair Software <http://www.grayhairSoftware.com>
>
>


Re: Teiid with Solr - using any other engine except the SolrDefaultQueryEngine

2015-12-31 Thread Erick Erickson
In addition, and depending on your time-frame, you may want to work
with Solr 6.0
and the "ParallelSQL" option. NOTE: this is _very_ new. People are using it but
it'll probably have some rough edges for a while, not to mention you're using an
unreleased version of Solr.

BTW, Solr 6.0 is also current "trunk". In Solr-speak, "trunk" is "the
next major
version of Solr". When we start the process of releasing 6.0 (no firm
commitment,
but Q1 2016 has been mentioned) then "trunk" will be synonymous with 7.0

Anyway, another thing to consider if you can't do what Alexandre and Erik
are talking about (and I completely agree that that is the first
approach to try)
is the "export" handler, which is far more suitable for
returning larger numbers of rows than the usual "query" or "select" handlers.
This last is gibberish perhaps; just put it on a note on your wall, and when
you start asking "why does it take so long for Solr to return 10M rows" you
can glance at the note and remember.
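
For reference, an export-handler request looks roughly like this (the collection and field names are placeholders; /export requires a sort parameter and an fl list of docValues fields):

    http://localhost:8983/solr/mycollection/export?q=*:*&sort=id asc&fl=id,other_field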

Best,
Erick

On Thu, Dec 31, 2015 at 7:38 AM, Erik Hatcher <erik.hatc...@gmail.com> wrote:
> Yeah, rather than go to all the complexity here, definitely see if providing 
> a list of id’s to a Solr filter query (fq) works well for you.  Many do this 
> sort of thing, and with the “terms” query parser it’s tractable to provide 
> fairly big lists of id’s to filter on… fq={!terms f=id}1,2,3,…, I’d 
> definitely recommend giving {!terms} a try before doing anything custom.
>
> I’m with Alexandre in the recommendation to get everything into Solr and use 
> that for the “front end” :)  (many folks come around to this way after 
> exploring more complicated arrangements)
>
> —
> Erik Hatcher, Senior Solutions Architect
> http://www.lucidworks.com <http://www.lucidworks.com/>
>
>
>
>> On Dec 31, 2015, at 10:29 AM, Alexandre Rafalovitch <arafa...@gmail.com> 
>> wrote:
>>
>> Actually, this does not sound like a federated search. It sounds like
>> you want to pre-filter possible records with SQL query before doing
>> the rest of the search in Solr. The simple option would be to see if
>> Solr alone can handle it and avoid the complicated integration..
>>
>> But if not, a custom search component (to inject pre-checked list of
>> IDs as an FQ) or a custom Query Parser to provide the ids might be
>> able to do the trick.
>>
>> If you are ok with post-filtering against SQL, so the Solr has to do a
>> full search and you just save on re-hydrating and shipping the
>> records, then you also have post-filters or upcoming xjoin
>> https://issues.apache.org/jira/browse/SOLR-7341 .
>>
>> But yes, a custom translator of some sort looks inevitable if you want
>> to implement your use case as described.
>>
>> Regards,
>>   Alex.
>> 
>> Newsletter and resources for Solr beginners and intermediates:
>> http://www.solr-start.com/
>>
>>
>> On 31 December 2015 at 22:18, Mark Horninger
>> <mhornin...@grayhairsoftware.com> wrote:
>>> Thanks for the response, Alex.
>>>
>>> I am trying to accomplish a Federated search of SQL Server and Solr.  I 
>>> guess I should have given more detail on this.
>>>
>>> The overall plan is to do the following:
>>> 1. SSIS ETL data from multiple sources into SQL Server
>>> 2. SSIS call to update Solr Indexing.
>>> 3. SQL standard "=" matching when possible to reduce the candidate data set.
>>> 4. Dismax match based on a rule set Joining SQL Server candidate dataset 
>>> against Solr indexing set using a join operator.
>>> 5. Cache possible matches in SQL Server for a given record in order for a 
>>> human to disposition them.
>>>
>>> From what I read, Carrot is great for Solr clustering, but once you get 
>>> into RDBMS, you're out of luck.
>>>
>>>
>>>
>>> -Original Message-
>>> From: Alexandre Rafalovitch [mailto:arafa...@gmail.com]
>>> Sent: Thursday, December 31, 2015 12:44 AM
>>> To: solr-user <solr-user@lucene.apache.org>
>>> Subject: Re: Teiid with Solr - using any other engine except the 
>>> SolrDefaultQueryEngine
>>>
>>> Are you trying to do federated search? What about carrot? Not the one that 
>>> ships with Solr, the parent project.
>>>
>>> Regards,
>>>   Alex
>>> On 31 Dec 2015 12:21 am, "Mark Horninger" <mhornin...@grayhairsoftware.com>
>>> wrote:
>>>
>>>> I have gotten Teiid and Solr wired up, but it seems like the only way
>>>> to query is with the default Solr Query Engine, and nothing else.  In
>>>> asking Dr. Google, this is a data black hole.  The more I look at it,
>>>> the more I think I'm going to end up having to write a custom
>>>> translator.  Is there anyone else out there who has had this
>>>> challenge, and if so, how did you overcome it?
>>>>
>>>> Thanks In Advance!
>>>>
>>>> -Mark H.
>>>>
>


Re: Teiid with Solr - using any other engine except the SolrDefaultQueryEngine

2015-12-31 Thread Alexandre Rafalovitch
Actually, this does not sound like a federated search. It sounds like
you want to pre-filter possible records with SQL query before doing
the rest of the search in Solr. The simple option would be to see if
Solr alone can handle it and avoid the complicated integration.

But if not, a custom search component (to inject pre-checked list of
IDs as an FQ) or a custom Query Parser to provide the ids might be
able to do the trick.

If you are ok with post-filtering against SQL, so the Solr has to do a
full search and you just save on re-hydrating and shipping the
records, then you also have post-filters or upcoming xjoin
https://issues.apache.org/jira/browse/SOLR-7341 .

But yes, a custom translator of some sort looks inevitable if you want
to implement your use case as described.

Regards,
   Alex.

Newsletter and resources for Solr beginners and intermediates:
http://www.solr-start.com/


On 31 December 2015 at 22:18, Mark Horninger
<mhornin...@grayhairsoftware.com> wrote:
> Thanks for the response, Alex.
>
> I am trying to accomplish a Federated search of SQL Server and Solr.  I guess 
> I should have given more detail on this.
>
> The overall plan is to do the following:
> 1. SSIS ETL data from multiple sources into SQL Server
> 2. SSIS call to update Solr Indexing.
> 3. SQL standard "=" matching when possible to reduce the candidate data set.
> 4. Dismax match based on a rule set Joining SQL Server candidate dataset 
> against Solr indexing set using a join operator.
> 5. Cache possible matches in SQL Server for a given record in order for a 
> human to disposition them.
>
> From what I read, Carrot is great for Solr clustering, but once you get into 
> RDBMS, you're out of luck.
>
>
>
> -Original Message-
> From: Alexandre Rafalovitch [mailto:arafa...@gmail.com]
> Sent: Thursday, December 31, 2015 12:44 AM
> To: solr-user <solr-user@lucene.apache.org>
> Subject: Re: Teiid with Solr - using any other engine except the 
> SolrDefaultQueryEngine
>
> Are you trying to do federated search? What about carrot? Not the one that 
> ships with Solr, the parent project.
>
> Regards,
>Alex
> On 31 Dec 2015 12:21 am, "Mark Horninger" <mhornin...@grayhairsoftware.com>
> wrote:
>
>> I have gotten Teiid and Solr wired up, but it seems like the only way
>> to query is with the default Solr Query Engine, and nothing else.  In
>> asking Dr. Google, this is a data black hole.  The more I look at it,
>> the more I think I'm going to end up having to write a custom
>> translator.  Is there anyone else out there who has had this
>> challenge, and if so, how did you overcome it?
>>
>> Thanks In Advance!
>>
>> -Mark H.
>>


Re: Teiid with Solr - using any other engine except the SolrDefaultQueryEngine

2015-12-31 Thread Erik Hatcher
Yeah, rather than go to all the complexity here, definitely see if providing a 
list of id’s to a Solr filter query (fq) works well for you.  Many do this sort 
of thing, and with the “terms” query parser it’s tractable to provide fairly 
big lists of id’s to filter on… fq={!terms f=id}1,2,3,…, I’d definitely 
recommend giving {!terms} a try before doing anything custom.
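
A minimal sketch of that request, assuming the uniqueKey field is "id" and the
id values are placeholders:

  curl -G 'http://localhost:8983/solr/mycollection/select' \
    --data-urlencode 'q=*:*' \
    --data-urlencode 'fq={!terms f=id}12,27,134,556'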

I’m with Alexandre in the recommendation to get everything into Solr and use 
that for the “front end” :)  (many folks come around to this way after 
exploring more complicated arrangements)

—
Erik Hatcher, Senior Solutions Architect
http://www.lucidworks.com <http://www.lucidworks.com/>



> On Dec 31, 2015, at 10:29 AM, Alexandre Rafalovitch <arafa...@gmail.com> 
> wrote:
> 
> Actually, this does not sound like a federated search. It sounds like
> you want to pre-filter possible records with SQL query before doing
> the rest of the search in Solr. The simple option would be to see if
> Solr alone can handle it and avoid the complicated integration..
> 
> But if not, a custom search component (to inject pre-checked list of
> IDs as an FQ) or a custom Query Parser to provide the ids might be
> able to do the trick.
> 
> If you are ok with post-filtering against SQL, so the Solr has to do a
> full search and you just save on re-hydrating and shipping the
> records, then you also have post-filters or upcoming xjoin
> https://issues.apache.org/jira/browse/SOLR-7341 .
> 
> But yes, a custom translator of some sort looks inevitable if you want
> to implement your use case as described.
> 
> Regards,
>   Alex.
> 
> Newsletter and resources for Solr beginners and intermediates:
> http://www.solr-start.com/
> 
> 
> On 31 December 2015 at 22:18, Mark Horninger
> <mhornin...@grayhairsoftware.com> wrote:
>> Thanks for the response, Alex.
>> 
>> I am trying to accomplish a Federated search of SQL Server and Solr.  I 
>> guess I should have given more detail on this.
>> 
>> The overall plan is to do the following:
>> 1. SSIS ETL data from multiple sources into SQL Server
>> 2. SSIS call to update Solr Indexing.
>> 3. SQL standard "=" matching when possible to reduce the candidate data set.
>> 4. Dismax match based on a rule set Joining SQL Server candidate dataset 
>> against Solr indexing set using a join operator.
>> 5. Cache possible matches in SQL Server for a given record in order for a 
>> human to disposition them.
>> 
>> From what I read, Carrot is great for Solr clustering, but once you get into 
>> RDBMS, you're out of luck.
>> 
>> 
>> 
>> -Original Message-
>> From: Alexandre Rafalovitch [mailto:arafa...@gmail.com]
>> Sent: Thursday, December 31, 2015 12:44 AM
>> To: solr-user <solr-user@lucene.apache.org>
>> Subject: Re: Teiid with Solr - using any other engine except the 
>> SolrDefaultQueryEngine
>> 
>> Are you trying to do federated search? What about carrot? Not the one that 
>> ships with Solr, the parent project.
>> 
>> Regards,
>>   Alex
>> On 31 Dec 2015 12:21 am, "Mark Horninger" <mhornin...@grayhairsoftware.com>
>> wrote:
>> 
>>> I have gotten Teiid and Solr wired up, but it seems like the only way
>>> to query is with the default Solr Query Engine, and nothing else.  In
>>> asking Dr. Google, this is a data black hole.  The more I look at it,
>>> the more I think I'm going to end up having to write a custom
>>> translator.  Is there anyone else out there who has had this
>>> challenge, and if so, how did you overcome it?
>>> 
>>> Thanks In Advance!
>>> 
>>> -Mark H.
>>> 



Re: Teiid with Solr - using any other engine except the SolrDefaultQueryEngine

2015-12-30 Thread Alexandre Rafalovitch
Are you trying to do federated search? What about carrot? Not the one that
ships with Solr, the parent project.

Regards,
   Alex
On 31 Dec 2015 12:21 am, "Mark Horninger" 
wrote:

> I have gotten Teiid and Solr wired up, but it seems like the only way to
> query is with the default Solr Query Engine, and nothing else.  In asking
> Dr. Google, this is a data black hole.  The more I look at it, the more I
> think I'm going to end up having to write a custom translator.  Is there
> anyone else out there who has had this challenge, and if so, how did you
> overcome it?
>
> Thanks In Advance!
>
> -Mark H.
>
>


Teiid with Solr - using any other engine except the SolrDefaultQueryEngine

2015-12-30 Thread Mark Horninger
I have gotten Teiid and Solr wired up, but it seems like the only way to query 
is with the default Solr Query Engine, and nothing else.  In asking Dr. Google, 
this is a data black hole.  The more I look at it, the more I think I'm going 
to end up having to write a custom translator.  Is there anyone else out there 
who has had this challenge, and if so, how did you overcome it?

Thanks In Advance!

-Mark H.





Re: Autostart Zookeeper and Solr using scripting

2015-10-19 Thread Scott Stults
Hi Adrian,

I'd probably start with the expect command and "echo ruok | nc <host> <port>"
for a simple script. You might also want to try the Netflix Exhibitor REST
interface:

https://github.com/Netflix/exhibitor/wiki/REST-Cluster
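
A rough sketch of such a startup script, assuming three ZooKeeper nodes and a
standard bin/solr install (hosts, ports and paths are placeholders):

  #!/bin/sh
  # wait until every ZooKeeper node answers "imok" before starting Solr
  for zk in zk1:2181 zk2:2181 zk3:2181; do
    host=${zk%:*}; port=${zk#*:}
    until [ "$(echo ruok | nc "$host" "$port")" = "imok" ]; do
      echo "waiting for $zk ..."; sleep 5
    done
  done
  /opt/solr/bin/solr start -c -z zk1:2181,zk2:2181,zk3:2181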


k/r,
Scott

On Thu, Oct 15, 2015 at 2:01 AM, Adrian Liew 
wrote:

> Hi,
>
> I am trying to implement some scripting to detect if all Zookeepers have
> started in a cluster, then restart the solr servers. Has anyone achieved
> this yet through scripting?
>
> I also saw there is the ZookeeperClient that is available in .NET via a
> nuget package. Not sure if this could be also implemented to check if a
> zookeeper is running.
>
> Any thoughts on anyone using a script to perform this?
>
> Regards,
> Adrian
>
>


-- 
Scott Stults | Founder & Solutions Architect | OpenSource Connections, LLC
| 434.409.2780
http://www.opensourceconnections.com


Autostart Zookeeper and Solr using scripting

2015-10-15 Thread Adrian Liew
Hi,

I am trying to implement some scripting to detect if all Zookeepers have 
started in a cluster, then restart the solr servers. Has anyone achieved this 
yet through scripting?

I also saw there is the ZookeeperClient that is available in .NET via a nuget 
package. Not sure if this could be also implemented to check if a zookeeper is 
running.

Any thoughts on anyone using a script to perform this?

Regards,
Adrian



Deep paging in solr using cursorMark

2015-01-27 Thread CKReddy Bhimavarapu
Hi,
 Using cursorMark we overcome deep paging; so far so good. As far
as I understand, the cursorMark is unique for each query, depending on
the sort values (beyond the unique id) and also on the number of rows.
 But my concern is whether Solr internally creates a different set for
each distinct query's sort values, and whether those sets last forever.
1. If they last forever, do they consume server RAM?
2. If they occupy server RAM, is there any way to destroy or clean
them up?

please let me know if I am wrong.

Thanks in advance.
-- 
chaitu. chaitu...@gmail.com


Re: Deep paging in solr using cursorMark

2015-01-27 Thread Brendan Humphreys
Apologies in advance for hijacking the thread, but somewhat related, does
anyone have experience with using cursorMark and elevations at the same
time? When I tried this, either passing elevatedIds via solrJ or specifying
them in elevate.xml, I got an AIOOBE if a cursorMark was also specified.
When I get a chance, I'll try to reproduce in a unit test. Just wondering
if anyone has encountered this and has a workaround. This is the one thing
that is stopping us adopting cursor-based pagination :-(

Cheers,
-Brendan

On 28 January 2015 at 02:51, Chris Hostetter hossman_luc...@fucit.org
wrote:


 :  But my concern is if solr internally creates a different set for
 each
 : and every different queries upon sort values and they lasts for ever I
 : think.


 https://cwiki.apache.org/confluence/display/solr/Pagination+of+Results

 Cursors in Solr are a logical concept, that doesn't involve caching any
 state information on the server.  Instead the sort values of the last
 document returned to the client are used to compute a mark representing
 a logical point in the ordered space of sort values.




 -Hoss
 http://www.lucidworks.com/



Re: Deep paging in solr using cursorMark

2015-01-27 Thread Yonik Seeley
On Tue, Jan 27, 2015 at 3:29 AM, CKReddy Bhimavarapu
chaitu...@gmail.com wrote:
 Hi,
  Using CursorMark we over come the Deep paging so far so good. As far
 as I understand cursormark unique for each and every query depending on
 sort values other than unique id and also depends up on number of rows.
  But my concern is if solr internally creates a different set for each
 and every different queries upon sort values and they lasts for ever I
 think.
 1. if it lasts for ever does they consume server ram or not.
 2. if it is occupying server ram does there is any way to destroy or clean
 it.

No, there is no server-side state cached.  Think of it as a cookie.

-Yonik


Re: Deep paging in solr using cursorMark

2015-01-27 Thread Chris Hostetter

:  But my concern is if solr internally creates a different set for each
: and every different queries upon sort values and they lasts for ever I
: think.


https://cwiki.apache.org/confluence/display/solr/Pagination+of+Results

Cursors in Solr are a logical concept, that doesn't involve caching any 
state information on the server.  Instead the sort values of the last 
document returned to the client are used to compute a mark representing 
a logical point in the ordered space of sort values.
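
For reference, a minimal sketch of cursor-based paging (collection and field
names are placeholders; the sort must include the uniqueKey as a tie-breaker):

  # first page: cursorMark=*
  curl -G 'http://localhost:8983/solr/mycollection/select' \
    --data-urlencode 'q=*:*' \
    --data-urlencode 'rows=100' \
    --data-urlencode 'sort=score desc,id asc' \
    --data-urlencode 'cursorMark=*'
  # next pages: pass back the nextCursorMark from the previous response,
  # until the value you get back stops changing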




-Hoss
http://www.lucidworks.com/


Re: AW: AW: auto completion search with solr using NGrams in SOLR

2014-10-01 Thread xoku
Help me! I can't get the full result.
<str name="spellcheck.count">200</str>
Example:
I search for: file
Expected result: file name documentabcxyz
But Solr returns these suggestions (suggest: result term object):
-

[suggestions:protected] = Array
(
[0] = file
[1] = file (whitespace)
[2] = file n
[3] = file nam
[4] = file name
[5] = file name (whitespace)
[6] = file name do
[7] = file name doc
[8] = file name docu (always 14 characters)

)
---
When the suggestion reaches 14 characters it stops, so the last result shown is "file name docu".








--
View this message in context: 
http://lucene.472066.n3.nabble.com/auto-completion-search-with-solr-using-NGrams-in-SOLR-tp3998559p4162063.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: AW: AW: auto completion search with solr using NGrams in SOLR

2014-10-01 Thread Erick Erickson
Perhaps your ngram filter is set to terminate at 14 (maxGram)?
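
For illustration, this is the kind of index-time filter to look for in schema.xml
(the attribute values here are just a guess at the poster's setup):

  <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="14"/>
  <!-- raising maxGramSize (e.g. to 25 or more) would let longer suggestions through -->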

Best,
Erick

On Wed, Oct 1, 2014 at 3:18 AM, xoku xoan...@anlab.info wrote:
 help me!
 i can't find all result.
 <str name="spellcheck.count">200</str>
 Ex:
 i find: file
 result expected: file name documentabcxyz
 but solr return result (suggest: result term object) :
 -

 [suggestions:protected] = Array
 (
 [0] = file
 [1] = file (whitespace)
 [2] = file n
 [3] = file nam
 [4] =file name
 [5] = file name (whitespace)
 [6] = file name do
 [7] = file name doc
 [8] = file name docu (always is 14 character)

 )
 ---
 when result is 14 character, it stop and show result is file name docu.








 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/auto-completion-search-with-solr-using-NGrams-in-SOLR-tp3998559p4162063.html
 Sent from the Solr - User mailing list archive at Nabble.com.


Re: Indexing database in Solr using Data Import Handler

2014-07-17 Thread Sam Barber
Hi,



You have the wrong varname in your sub query.



select favouritedby from filefav where id=
'${filemetadata.id}'



should be

select favouritedby from filefav where id=
'${restaurant.id}'
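
A sketch of the corrected sub-entity, following the entity names from the
original data-config.xml (the variable must reference the parent entity's
name, not the table name):

  <entity name="filefav"
          query="select favouritedby from filefav where id='${restaurant.id}'">
    <field column="favouritedby" name="favouritedby1" />
  </entity>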


Re: AW: AW: auto completion search with solr using NGrams in SOLR

2014-04-08 Thread atpatil11
Hi, I have done the same changes as you described and changed the code with my
field names. However, I'm getting the following error. I even reverted the edited
code, but it's still throwing the same error. We're running Solr 4.6. When I
restart Solr it says solr (pid 4610) already running.

SolrCore Initialization Failures

ole-beta:
org.apache.solr.common.SolrException:org.apache.solr.common.SolrException:
copyField dest :'autocomplete_text' is not an explicit field and doesn't
match a dynamicField.. Schema file is
/opt/bitnami/apache-solr/solr/collection1/schema.xml 



--
View this message in context: 
http://lucene.472066.n3.nabble.com/auto-completion-search-with-solr-using-NGrams-in-SOLR-tp3998559p4129807.html
Sent from the Solr - User mailing list archive at Nabble.com.


AutoSuggest like Google in Solr using Solarium Client.

2014-03-17 Thread Sohan Kalsariya
Can anyone suggest me the best practices how to do SpellCheck and
AutoSuggest in solarium.
Can anyone give me example for that?


-- 
Regards,
*Sohan Kalsariya*


RE: AutoSuggest like Google in Solr using Solarium Client.

2014-03-17 Thread Suresh Soundararajan
Hi Sohan,

The best approach for the auto suggest is using the facet query.

Please refer the link : 
http://solr.pl/en/2010/10/18/solr-and-autocomplete-part-1/
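
A minimal sketch of the facet.prefix approach described there (the field name is
a placeholder; "so" stands in for the user's typed prefix):

  curl -G 'http://localhost:8983/solr/mycollection/select' \
    --data-urlencode 'q=*:*' \
    --data-urlencode 'rows=0' \
    --data-urlencode 'facet=true' \
    --data-urlencode 'facet.field=name_autocomplete' \
    --data-urlencode 'facet.prefix=so' \
    --data-urlencode 'facet.limit=10'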


Thanks,
SureshKumar.S


From: Sohan Kalsariya sohankalsar...@gmail.com
Sent: Monday, March 17, 2014 8:14 PM
To: solr-user@lucene.apache.org
Subject: AutoSuggest like Google in Solr using Solarium Client.

Can anyone suggest me the best practices how to do SpellCheck and
AutoSuggest in solarium.
Can anyone give me example for that?


--
Regards,
*Sohan Kalsariya*


Re: AutoSuggest like Google in Solr using Solarium Client.

2014-03-17 Thread Michael McCandless
I think it's best to use one of the many autosuggesters Lucene/Solr provide?

E.g. AnalyzingInfixSuggester is running here:
http://jirasearch.mikemccandless.com

But that's just one suggester... there are many more.
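
A rough sketch of wiring that up through Solr's SuggestComponent in
solrconfig.xml (field and fieldType names are placeholders; this assumes a
recent 4.x release):

  <searchComponent name="suggest" class="solr.SuggestComponent">
    <lst name="suggester">
      <str name="name">infixSuggester</str>
      <str name="lookupImpl">AnalyzingInfixLookupFactory</str>
      <str name="dictionaryImpl">DocumentDictionaryFactory</str>
      <str name="field">title</str>
      <str name="suggestAnalyzerFieldType">text_general</str>
    </lst>
  </searchComponent>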

Mike McCandless

http://blog.mikemccandless.com


On Mon, Mar 17, 2014 at 10:44 AM, Sohan Kalsariya
sohankalsar...@gmail.com wrote:
 Can anyone suggest me the best practices how to do SpellCheck and
 AutoSuggest in solarium.
 Can anyone give me example for that?


 --
 Regards,
 *Sohan Kalsariya*


Re: AutoSuggest like Google in Solr using Solarium Client.

2014-03-17 Thread bbi123
Not sure if you have already seen this one..

http://www.solarium-project.org/2012/01/suggester-query-support/

You can also use edge N gram filter to implement typeahead auto suggest.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/AutoSuggest-like-Google-in-Solr-using-Solarium-Client-tp4124821p4124871.html
Sent from the Solr - User mailing list archive at Nabble.com.


RE: SOLR USING 100% percent CPU and not responding after a while

2014-01-28 Thread heaven
I have the same problem, please look at the image:
http://lucene.472066.n3.nabble.com/file/n4114026/Screenshot_733.png 

And this is on idle. Index size is about 90Gb. Solr 4.4.0. Memory is not an
issue, there's a lot. RAID 10 (15000RPM rapid hdd).



--
View this message in context: 
http://lucene.472066.n3.nabble.com/SOLR-USING-100-percent-CPU-and-not-responding-after-a-while-tp4021359p4114026.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: SOLR USING 100% percent CPU and not responding after a while

2014-01-28 Thread Otis Gospodnetic
Hi,

Show us more graphs.  Is the GC working hard?  Any of the JVM mem pools at
or near 100%?  SPM for Solr is your friend for long term
monitoring/alerting/trends, jconsole and visualvm for a quick look.

Otis
--
Performance Monitoring * Log Analytics * Search Analytics
Solr & Elasticsearch Support * http://sematext.com/


On Tue, Jan 28, 2014 at 2:11 PM, heaven aheave...@gmail.com wrote:

 I have the same problem, please look at the image:
 http://lucene.472066.n3.nabble.com/file/n4114026/Screenshot_733.png

 And this is on idle. Index size is about 90Gb. Solr 4.4.0. Memory is not an
 issue, there's a lot. RAID 10 (15000RPM rapid hdd).



 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/SOLR-USING-100-percent-CPU-and-not-responding-after-a-while-tp4021359p4114026.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Re: AW: AW: auto completion search with solr using NGrams in SOLR

2013-10-28 Thread anurag.sharma
Hi ... I am trying to build autocomplete functionality using your post. But I
am getting the following error

2577 [coreLoadExecutor-3-thread-1] WARN
org.apache.solr.spelling.suggest.Suggester – Loading stored lookup data failed
java.io.FileNotFoundException:
/home/anurag/Downloads/solr-4.4.0/example/solr/collection1/data/suggest/tst.dat
(No such file or directory)
at java.io.FileInputStream.open(Native Method)
at java.io.FileInputStream.<init>(FileInputStream.java:137)
at org.apache.solr.spelling.suggest.Suggester.init(Suggester.java:116)
at org.apache.solr.handler.component.SpellCheckComponent.inform(SpellCheckComponent.java:623)
at org.apache.solr.core.SolrResourceLoader.inform(SolrResourceLoader.java:601)
at org.apache.solr.core.SolrCore.<init>(SolrCore.java:830)
at org.apache.solr.core.SolrCore.<init>(SolrCore.java:629)

I am using Solr 4.4. Does the suggester component still work in this version?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/auto-completion-search-with-solr-using-NGrams-in-SOLR-tp3998559p4098032.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Error while importing HBase data to Solr using the DataImportHandler

2013-09-11 Thread ppatel
Hi,

Can you provide me an example of data-config.xml? Because with my HBase
configuration, I am getting:
Full Import failed:java.lang.RuntimeException: java.lang.RuntimeException:
org.apache.solr.handler.dataimport.DataImportHandlerException:
java.lang.NoSuchMethodError:
org.apache.hadoop.net.NetUtils.getInputStream(Ljava/net/Socket;)Ljava/io/InputStream;

AND

Exception while processing: item document :
SolrInputDocument[]:org.apache.solr.handler.dataimport.DataImportHandlerException:
Unable to execute SCANNER: [tableName=Item, startRow=null, stopRow=null,
columns=[{Item|r}, {Item|m}, {Item|u}]] Processing Document # 1

My data-config.xml:

<dataConfig>

  <dataSource type="HbaseDataSource" name="HBase" host="127.0.0.1" port="2181" />

  <document name="Item">

    <entity name="item"
            pk="ROW_KEY"
            dataSource="HBase"
            processor="HbaseEntityProcessor"
            tableName="Item"
            onError="abort"
            columns="Item|r, Item|m, Item|u"
            query="scan 'Item', {COLUMNS => ['r','m', 'u']}"
            deltaImportQuery=""
            deltaQuery="">

      <field column="ROW_KEY" name="id" />
      <field column="r" name="r" />
      <field column="m" name="m" />
      <field column="u" name="u" />

    </entity>

  </document>
</dataConfig>

Please respond ASAP.

Thanks in advance!!




--
View this message in context: 
http://lucene.472066.n3.nabble.com/Error-while-importing-HBase-data-to-Solr-using-the-DataImportHandler-tp4085613p4089402.html
Sent from the Solr - User mailing list archive at Nabble.com.


Error while importing HBase data to Solr using the DataImportHandler

2013-08-20 Thread Jamshaid Ashraf
Hi,

I want to import HBase(0.90.4) data to solr 4.3.0 using DIH (Data import
handler) for this I used https://code.google.com/p/hbase-solr-dataimport/;
project. Whenever I run data import handler 
http://localhost:8080/solr/#/collection1/dataimport; it throws following
error in log.

*Jar:*
hbase-solr-dataimport-0.0.1.jar

*Error:*
Full Import failed:java.lang.NoClassDefFoundError:
org/apache/hadoop/hbase/HBaseConfiguration

Your early response would be appreciated!

Thanks & Regards,
Jamshaid


Re: Error while importing HBase data to Solr using the DataImportHandler

2013-08-20 Thread tamanjit.bin...@yahoo.co.in
You would need to add the jar that is missing to the Solr web-inf\lib folder.
You can do that using winzip etc into the lib folder of solr.war. Then you
need to redeployed the changed solr.war and restart your webcontainer.

The jar is available here:
http://code.google.com/p/hbase-solr-dataimport/downloads/detail?name=hbase-solr-dataimport-0.0.1.jar&can=2&q=
  
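
A hedged sketch of repacking the war from a shell (paths are placeholders):

  mkdir -p WEB-INF/lib
  cp hbase-solr-dataimport-0.0.1.jar WEB-INF/lib/
  jar -uf /path/to/solr.war WEB-INF/lib/hbase-solr-dataimport-0.0.1.jar
  # then redeploy solr.war and restart the servlet container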



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Error-while-importing-HBase-data-to-Solr-using-the-DataImportHandler-tp4085613p4085616.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Error while importing HBase data to Solr using the DataImportHandler

2013-08-20 Thread Jamshaid Ashraf
Thanks tamanjit!

Issue is resolved I just further added following jars as well:

hbase-0.90.4.jar
hadoop-core-0.20-append-r1056497.jar
zookeeper-3.4.5.jar

Regards,
Jamshaid


On Tue, Aug 20, 2013 at 2:32 PM, tamanjit.bin...@yahoo.co.in 
tamanjit.bin...@yahoo.co.in wrote:

 You would need to add the jar that is missing to the Solr web-inf\lib
 folder.
 You can do that using winzip etc into the lib folder of solr.war. Then you
 need to redeployed the changed solr.war and restart your webcontainer.

 The jar is available here:

 http://code.google.com/p/hbase-solr-dataimport/downloads/detail?name=hbase-solr-dataimport-0.0.1.jar&can=2&q=
 



 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/Error-while-importing-HBase-data-to-Solr-using-the-DataImportHandler-tp4085613p4085616.html
 Sent from the Solr - User mailing list archive at Nabble.com.



RE: SOLR USING 100% percent CPU and not responding after a while

2013-08-08 Thread nitin4php
Hi Biva,

Any luck on this?

Even we are facing same issue with exactly same configuration and setup.

Any inputs will help a lot.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/SOLR-USING-100-percent-CPU-and-not-responding-after-a-while-tp4021359p4083234.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: solr - using fq parameter does not retrieve an answer

2013-08-06 Thread Mysurf Mail
Thanks.


On Mon, Aug 5, 2013 at 4:57 PM, Shawn Heisey s...@elyograg.org wrote:

 On 8/5/2013 2:35 AM, Mysurf Mail wrote:
  When I query using
 
  http://localhost:8983/solr/vault/select?q=*:*
 
  I get results including the following
 
  <doc>
    ...
    ...
    <int name="VersionNumber">7</int>
    ...
  </doc>
 
  Now I try to get only that row so I add to my query fq=VersionNumber:7
 
  http://localhost:8983/solr/vault/select?q=*:*fq=VersionNumber:7
 
  And I get nothing.
  Any idea?

 Is the VersionNumber field indexed?  If it's not, you won't be able to
 search on it.

 If you change your schema so that the field has 'indexed=true', you'll
 have to reindex.

 http://wiki.apache.org/solr/HowToReindex

 When you are retrieving a single document, it's better to use the q
 parameter rather than the fq parameter.  Querying a single document will
 pollute the cache.  It's a lot better to pollute the queryResultCache
 than the filterCache.  The former is generally much larger than the
 latter and better able to deal with pollution.

 Thanks,
 Shawn




solr - using fq parameter does not retrieve an answer

2013-08-05 Thread Mysurf Mail
When I query using

http://localhost:8983/solr/vault/select?q=*:*

I get results including the following

<doc>
  ...
  ...
  <int name="VersionNumber">7</int>
  ...
</doc>

Now I try to get only that row so I add to my query fq=VersionNumber:7

http://localhost:8983/solr/vault/select?q=*:*fq=VersionNumber:7

And I get nothing.
Any idea?


Re: solr - using fq parameter does not retrieve an answer

2013-08-05 Thread Jack Krupansky

Is VersionNumber an indexed field, or just stored?

-- Jack Krupansky

-Original Message- 
From: Mysurf Mail 
Sent: Monday, August 05, 2013 4:35 AM 
To: solr-user@lucene.apache.org 
Subject: solr - using fq parameter does not retrieve an answer 


When I query using

http://localhost:8983/solr/vault/select?q=*:*

I get results including the following

<doc>
  ...
  ...
  <int name="VersionNumber">7</int>
  ...
</doc>

Now I try to get only that row so I add to my query fq=VersionNumber:7

http://localhost:8983/solr/vault/select?q=*:*fq=VersionNumber:7

And I get nothing.
Any idea?


Re: solr - using fq parameter does not retrieve an answer

2013-08-05 Thread Shawn Heisey
On 8/5/2013 2:35 AM, Mysurf Mail wrote:
 When I query using
 
 http://localhost:8983/solr/vault/select?q=*:*
 
 I get results including the following
 
 <doc>
   ...
   ...
   <int name="VersionNumber">7</int>
   ...
 </doc>
 
 Now I try to get only that row so I add to my query fq=VersionNumber:7
 
 http://localhost:8983/solr/vault/select?q=*:*fq=VersionNumber:7
 
 And I get nothing.
 Any idea?

Is the VersionNumber field indexed?  If it's not, you won't be able to
search on it.

If you change your schema so that the field has 'indexed=true', you'll
have to reindex.

http://wiki.apache.org/solr/HowToReindex

When you are retrieving a single document, it's better to use the q
parameter rather than the fq parameter.  Querying a single document will
pollute the cache.  It's a lot better to pollute the queryResultCache
than the filterCache.  The former is generally much larger than the
latter and better able to deal with pollution.
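
For illustration, the schema change being described would look something like
this (assuming an "int" fieldType already exists in the schema), followed by a
reindex and a q-based lookup:

  <field name="VersionNumber" type="int" indexed="true" stored="true" />

  http://localhost:8983/solr/vault/select?q=VersionNumber:7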

Thanks,
Shawn



Indexing Oracle Database in Solr using Data Import Handler

2013-07-23 Thread archit2112
I'm trying to index an Oracle Database 10g XE instance using Solr's Data Import Handler.

My data-config.xml looks like this

<dataConfig>
  <dataSource type="JdbcDataSource" driver="oracle.jdbc.OracleDriver"
              url="jdbc:oracle:thin:@XXX.XXX.XXX.XXX::xe" user="XX"
              password="XX" />
  <document name="product_info">
    <entity name="product" query="select * from product">
      <field column="pid" name="id" />
      <field column="pname" name="itemName" />
      <field column="initqty" name="itemQuantity" />
      <field column="remQty" name="remQuantity" />
      <field column="price" name="itemPrice" />
      <field column="specification" name="specifications" />
    </entity>
  </document>
</dataConfig>

My schema.xml looks like this -

<field name="id" type="text_general" indexed="true" stored="true"
       required="true" multiValued="false" />
<field name="itemName" type="text_general" indexed="true" stored="true"
       multiValued="true" omitNorms="true" termVectors="true" />
<field name="itemQuantity" type="text_general" indexed="true"
       stored="true" multiValued="true" omitNorms="true" termVectors="true" />
<field name="remQuantity" type="text_general" indexed="true"
       stored="true" multiValued="true" omitNorms="true" termVectors="true" />
<field name="itemPrice" type="text_general" indexed="true" stored="true"
       multiValued="true" omitNorms="true" termVectors="true" />
<field name="specifications" type="text_general" indexed="true"
       stored="true" multiValued="true" omitNorms="true" termVectors="true" />
<field name="brand" type="text_general" indexed="true" stored="true"
       multiValued="true" omitNorms="true" termVectors="true" />
<field name="itemCategory" type="text_general" indexed="true"
       stored="true" multiValued="true" omitNorms="true" termVectors="true" />

Now when I try to index it, Solr is not able to read the columns of the
table and therefore indexing fails. It says that the document is missing the
unique key "id" which, as you can see, is clearly present in the document. Also,
when such an exception is thrown, the log usually shows
which fields were picked up for the document. However, in this case
no fields are being read at all.

But if I change my query, everything works perfectly. The modified
data-config.xml:

<dataConfig>
  <dataSource name="db1" type="JdbcDataSource"
              driver="oracle.jdbc.OracleDriver"
              url="jdbc:oracle:thin:@XXX.XXX.XX.XX::xe" user=""
              password="X" />
  <document name="product_info">
    <entity name="products" dataSource="db1" query="select pid as id, pname as
        itemName, initqty as itemQuantity, remqty as remQuantity, price as itemPrice,
        specification as specifications from product">
      <field column="id" name="id" />
      <field column="itemName" name="itemName" />
      <field column="itemQuantity" name="itemQuantity" />
      <field column="remQuantity" name="remQuantity" />
      <field column="itemPrice" name="itemPrice" />
      <field column="specifications" name="specifications" />
    </entity>
  </document>
</dataConfig>

Why is this happening? How do I solve it? How does giving an alias affect the
indexing process? Thanks in advance.




--
View this message in context: 
http://lucene.472066.n3.nabble.com/Indexing-Oracle-Database-in-Solr-using-Data-Import-Handler-tp4079649.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Indexing database in Solr using Data Import Handler

2013-07-11 Thread Gora Mohanty
On 11 July 2013 11:13, archit2112 archit2...@gmail.com wrote:


 Im trying to index MySql database using Data Import Handler in solr.
[...]
 Everything is working but the favouritedby1 field is not getting indexed
 ,
 ie, that field does not exist when i run the *:* query. Can you please
 help
 me out?

Please show us your schema.xml. Does it have
a favouritedby1 field, and the other fields that
you are trying to add through DIH?

Regards,
Gora


Indexing database in Solr using Data Import Handler

2013-07-10 Thread archit2112

I'm trying to index a MySQL database using the Data Import Handler in Solr.

I have made two tables. The first table holds the metadata of a file.

create table filemetadata (
id varchar(20) primary key ,
filename varchar(50),
path varchar(200),
size varchar(10),
author varchar(50)
) ;

The second table contains the favourite info about a particular file in
the above table.

create table filefav (
fid varchar(20) primary key ,
id varchar(20),
favouritedby varchar(300),
favouritedtime varchar(10),
FOREIGN KEY (id) REFERENCES filemetadata(id) 
) ;

As you can see id is a foreign key.

To index this I have written the following data-config.xml:

<dataConfig>
  <dataSource type="JdbcDataSource" driver="com.mysql.jdbc.Driver"
              url="jdbc:mysql://localhost:3306/test" user="root" password="root" />
  <document name="filemetadata">

    <entity name="restaurant" query="select * from filemetadata">
      <field column="id" name="id" />

      <entity name="filefav" query="select favouritedby from filefav where id=
          '${filemetadata.id}'">
        <field column="favouritedby" name="favouritedby1" />
      </entity>

      <field column="filename" name="name1" />
      <field column="path" name="path1" />
      <field column="size" name="size1" />
      <field column="author" name="author1" />

    </entity>

  </document>
</dataConfig>

Everything is working, but the favouritedby1 field is not getting indexed,
i.e., that field does not exist when I run the *:* query. Can you please help
me out?




--
View this message in context: 
http://lucene.472066.n3.nabble.com/Indexing-database-in-Solr-using-Data-Import-Handler-tp4077180.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solr using a ridiculous amount of memory

2013-06-16 Thread Erick Erickson
John:

If you'd like to add your experience to the Wiki, create
an ID and let us know what it is and we'll add you to the
contributors list. Unfortunately we had problems with
spam pages to we added this step.

Make sure you include your logon in the request.

Thanks,
Erick

On Fri, Jun 14, 2013 at 8:55 AM, John Nielsen j...@mcb.dk wrote:
 Sorry for not getting back to the list sooner. It seems like I finally
 solved the memory problems by following Toke's instruction of splitting the
 cores up into smaller chunks.

 After some major refactoring, our 15 cores have now turned into ~500 cores
 and our memory consumption has dropped dramaticly. Running 200 webshops now
 actually uses less memory as our 24 test shops did before.

 Thank you to everyone who helped, and especially to Toke.

 I looked at the wiki, but could not find any reference to this unintuitive
 way of using memory. Did I miss it somewhere?



 On Fri, Apr 19, 2013 at 1:30 PM, Erick Erickson 
 erickerick...@gmail.comwrote:

 Hmmm. There has been quite a bit of work lately to support a couple of
 things that might be of interest (4.3, which Simon cut today, probably
 available to all mid next week at the latest). Basically, you can
 choose to pre-define all the cores in solr.xml (so-called old style)
 _or_ use the new-style solr.xml which uses auto-discover mode to
 walk the indicated directory and find all the cores (indicated by the
 presence of a 'core.properties' file). Don't know if this would make
 your particular case easier, and I should warn you that this is
 relatively new code (although there are some reasonable unit tests).

 You also have the option to only load the cores when they are
 referenced, and only keep N cores open at a time (loadOnStartup and
 transient properties).

 See: http://wiki.apache.org/solr/CoreAdmin#Configuration and
 http://wiki.apache.org/solr/Solr.xml%204.3%20and%20beyond

 Note, the docs are somewhat sketchy, so if you try to go down this
 route let us know anything that should be improved (or you can be
 added to the list of wiki page contributors and help out!)

 Best
 Erick

 On Thu, Apr 18, 2013 at 8:31 AM, John Nielsen j...@mcb.dk wrote:
  You are missing an essential part: Both the facet and the sort
  structures needs to hold one reference for each document
  _in_the_full_index_, even when the document does not have any values in
  the fields.
 
 
  Wow, thank you for this awesome explanation! This is where the penny
  dropped for me.
 
  I will definetely move to a multi-core setup. It will take some time and
 a
  lot of re-coding. As soon as I know the result, I will let you know!
 
 
 
 
 
 
  --
  Med venlig hilsen / Best regards
 
  *John Nielsen*
  Programmer
 
 
 
  *MCB A/S*
  Enghaven 15
  DK-7500 Holstebro
 
  Kundeservice: +45 9610 2824
  p...@mcb.dk
  www.mcb.dk




 --
 Med venlig hilsen / Best regards

 *John Nielsen*
 Programmer



 *MCB A/S*
 Enghaven 15
 DK-7500 Holstebro

 Kundeservice: +45 9610 2824
 p...@mcb.dk
 www.mcb.dk


Re: Solr using a ridiculous amount of memory

2013-06-16 Thread adityab
It was interesting to read this post. I had a similar issue on Solr v4.2.1. The
nature of our documents is that they have huge multiValued fields, and we were
able to knock out our server in about 30 minutes.
We then found a bug, LUCENE-4995, which was causing all the problems.
Applying the patch has helped a lot.
Not sure it's related, but you might want to check that out.
Thanks.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-using-a-ridiculous-amount-of-memory-tp4050840p4070803.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solr using a ridiculous amount of memory

2013-06-16 Thread Jack Krupansky
Yeah, this is yet another anti-pattern we need to be discouraging - large 
multivalued fields. They indicate that the data model is not well balanced 
and aligned with the strengths of Solr and Lucene.


-- Jack Krupansky

-Original Message- 
From: adityab

Sent: Sunday, June 16, 2013 9:36 AM
To: solr-user@lucene.apache.org
Subject: Re: Solr using a ridiculous amount of memory

It was interesting to read this post. I had similar issue on Solr v4.2.1. 
The

nature of our document is that it has huge multiValued fields and we were
able to knock off out server in about 30muns
We then found a bug Lucene-4995 which was causing all the problem.
Applying the patch has helped a lot.
Not sure related but you might want to check that out.
Thanks.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-using-a-ridiculous-amount-of-memory-tp4050840p4070803.html
Sent from the Solr - User mailing list archive at Nabble.com. 



Re: Solr using a ridiculous amount of memory

2013-06-14 Thread John Nielsen
Sorry for not getting back to the list sooner. It seems like I finally
solved the memory problems by following Toke's instruction of splitting the
cores up into smaller chunks.

After some major refactoring, our 15 cores have now turned into ~500 cores
and our memory consumption has dropped dramatically. Running 200 webshops now
actually uses less memory than our 24 test shops did before.

Thank you to everyone who helped, and especially to Toke.

I looked at the wiki, but could not find any reference to this unintuitive
way of using memory. Did I miss it somewhere?



On Fri, Apr 19, 2013 at 1:30 PM, Erick Erickson erickerick...@gmail.comwrote:

 Hmmm. There has been quite a bit of work lately to support a couple of
 things that might be of interest (4.3, which Simon cut today, probably
 available to all mid next week at the latest). Basically, you can
 choose to pre-define all the cores in solr.xml (so-called old style)
 _or_ use the new-style solr.xml which uses auto-discover mode to
 walk the indicated directory and find all the cores (indicated by the
 presence of a 'core.properties' file). Don't know if this would make
 your particular case easier, and I should warn you that this is
 relatively new code (although there are some reasonable unit tests).

 You also have the option to only load the cores when they are
 referenced, and only keep N cores open at a time (loadOnStartup and
 transient properties).

 See: http://wiki.apache.org/solr/CoreAdmin#Configuration and
 http://wiki.apache.org/solr/Solr.xml%204.3%20and%20beyond

 Note, the docs are somewhat sketchy, so if you try to go down this
 route let us know anything that should be improved (or you can be
 added to the list of wiki page contributors and help out!)

 Best
 Erick

 On Thu, Apr 18, 2013 at 8:31 AM, John Nielsen j...@mcb.dk wrote:
  You are missing an essential part: Both the facet and the sort
  structures needs to hold one reference for each document
  _in_the_full_index_, even when the document does not have any values in
  the fields.
 
 
  Wow, thank you for this awesome explanation! This is where the penny
  dropped for me.
 
  I will definetely move to a multi-core setup. It will take some time and
 a
  lot of re-coding. As soon as I know the result, I will let you know!
 
 
 
 
 
 
  --
  Med venlig hilsen / Best regards
 
  *John Nielsen*
  Programmer
 
 
 
  *MCB A/S*
  Enghaven 15
  DK-7500 Holstebro
 
  Kundeservice: +45 9610 2824
  p...@mcb.dk
  www.mcb.dk




-- 
Med venlig hilsen / Best regards

*John Nielsen*
Programmer



*MCB A/S*
Enghaven 15
DK-7500 Holstebro

Kundeservice: +45 9610 2824
p...@mcb.dk
www.mcb.dk


Re: Solr using a ridiculous amount of memory

2013-06-14 Thread Toke Eskildsen
On Fri, 2013-06-14 at 14:55 +0200, John Nielsen wrote:
 Sorry for not getting back to the list sooner.

Time not important, only feedback important (apologies to Fifth
Element).

 After some major refactoring, our 15 cores have now turned into ~500 cores
 and our memory consumption has dropped dramaticly. Running 200 webshops now
 actually uses less memory as our 24 test shops did before.

That's great to hear. One core/shop also sounds like a cleaner setup.

 I looked at the wiki, but could not find any reference to this unintuitive
 way of using memory. Did I miss it somewhere?

I am not aware of a wikified explanation, but a section on Why does
Solr use so much memory? with some suggestions for changes to setup
would seem appropriate. You are not the first to have these kinds of
problems.


Thank you for closing the issue,
Toke Eskildsen



Re: Solr using a ridiculous amount of memory

2013-04-19 Thread Erick Erickson
Hmmm. There has been quite a bit of work lately to support a couple of
things that might be of interest (4.3, which Simon cut today, probably
available to all mid next week at the latest). Basically, you can
choose to pre-define all the cores in solr.xml (so-called old style)
_or_ use the new-style solr.xml which uses auto-discover mode to
walk the indicated directory and find all the cores (indicated by the
presence of a 'core.properties' file). Don't know if this would make
your particular case easier, and I should warn you that this is
relatively new code (although there are some reasonable unit tests).

You also have the option to only load the cores when they are
referenced, and only keep N cores open at a time (loadOnStartup and
transient properties).

See: http://wiki.apache.org/solr/CoreAdmin#Configuration and
http://wiki.apache.org/solr/Solr.xml%204.3%20and%20beyond

Note, the docs are somewhat sketchy, so if you try to go down this
route let us know anything that should be improved (or you can be
added to the list of wiki page contributors and help out!)

Best
Erick

On Thu, Apr 18, 2013 at 8:31 AM, John Nielsen j...@mcb.dk wrote:
 You are missing an essential part: Both the facet and the sort
 structures needs to hold one reference for each document
 _in_the_full_index_, even when the document does not have any values in
 the fields.


 Wow, thank you for this awesome explanation! This is where the penny
 dropped for me.

 I will definetely move to a multi-core setup. It will take some time and a
 lot of re-coding. As soon as I know the result, I will let you know!






 --
 Med venlig hilsen / Best regards

 *John Nielsen*
 Programmer



 *MCB A/S*
 Enghaven 15
 DK-7500 Holstebro

 Kundeservice: +45 9610 2824
 p...@mcb.dk
 www.mcb.dk


Re: Solr using a ridiculous amount of memory

2013-04-18 Thread John Nielsen
 That was strange. As you are using a multi-valued field with the new
setup, they should appear there.

Yes, the new field we use for faceting is a multi valued field.

 Can you find the facet fields in any of the other caches?

Yes, here it is, in the field cache:

http://screencast.com/t/mAwEnA21yL

 I hope you are not calling the facets with facet.method=enum? Could you
paste a typical facet-enabled search request?

Here is a typical example (I added newlines for readability):

http://172.22.51.111:8000/solr/default1_Danish/search
?defType=edismax
q=*%3a*
facet.field=%7b!ex%3dtagitemvariantoptions_int_mv_7+key%3ditemvariantoptions_int_mv_7%7ditemvariantoptions_int_mv
facet.field=%7b!ex%3dtagitemvariantoptions_int_mv_9+key%3ditemvariantoptions_int_mv_9%7ditemvariantoptions_int_mv
facet.field=%7b!ex%3dtagitemvariantoptions_int_mv_8+key%3ditemvariantoptions_int_mv_8%7ditemvariantoptions_int_mv
facet.field=%7b!ex%3dtagitemvariantoptions_int_mv_2+key%3ditemvariantoptions_int_mv_2%7ditemvariantoptions_int_mv
fq=site_guid%3a(10217)
fq=item_type%3a(PRODUCT)
fq=language_guid%3a(1)
fq=item_group_1522_combination%3a(*)
fq=is_searchable%3a(True)
sort=item_group_1522_name_int+asc, variant_of_item_guid+asc
querytype=Technical
fl=feed_item_serialized
facet=true
group=true
group.facet=true
group.ngroups=true
group.field=groupby_variant_of_item_guid
group.sort=name+asc
rows=0

 Are you warming all the sort- and facet-fields?

I'm sorry, I don't know. I have the field value cache commented out in my
config, so... Whatever is default?

Removing the custom sort fields is unfortunately quite a bit more difficult
than my other facet modification.

The problem is that each item can have several sort orders. The sort order
to use is defined by a group number which is known ahead of time. The group
number is included in the sort order field name. To solve it in the same
way i solved the facet problem, I would need to be able to sort on a
multi-valued field, and unless I'm wrong, I don't think that it's possible.

I am quite stumped on how to fix this.




On Wed, Apr 17, 2013 at 3:06 PM, Toke Eskildsen t...@statsbiblioteket.dkwrote:

 John Nielsen [j...@mcb.dk]:
  I never seriously looked at my fieldValueCache. It never seemed to get
 used:

  http://screencast.com/t/YtKw7UQfU

 That was strange. As you are using a multi-valued field with the new
 setup, they should appear there. Can you find the facet fields in any of
 the other caches?

 ...I hope you are not calling the facets with facet.method=enum? Could you
 paste a typical facet-enabled search request?

  Yep. We still do a lot of sorting on dynamic field names, so the field
 cache
  has a lot of entries. (9.411 entries as we speak. This is considerably
 lower
  than before.). You mentioned in an earlier mail that faceting on a field
  shared between all facet queries would bring down the memory needed.
  Does the same thing go for sorting?

 More or less. Sorting stores the raw string representations (utf-8) in
 memory so the number of unique values has more to say than it does for
 faceting. Just as with faceting, a list of pointers from documents to
 values (1 value/document as we are sorting) is maintained, so the overhead
 is something like

 #documents*log2(#unique_terms*average_term_length) +
 #unique_terms*average_term_length
 (where average_term_length is in bits)

 Caveat: This is with the index-wide sorting structure. I am fairly
 confident that this is what Solr uses, but I have not looked at it lately
 so it is possible that some memory-saving segment-based trickery has been
 implemented.

  Does those 9411 entries duplicate data between them?

 Sorry, I do not know. SOLR- discusses the problems with the field
 cache and duplication of data, but I cannot infer if it is has been solved
 or not. I am not familiar with the stat breakdown of the fieldCache, but it
 _seems_ to me that there are 2 or 3 entries for each segment for each sort
 field. Guesstimating further, let's say you have 30 segments in your index.
 Going with the guesswork, that would bring the number of sort fields to
 9411/3/30 ~= 100. Looks like you use a custom sort field for each client?

 Extrapolating from 1.4M documents and 180 clients, let's say that there
 are 1.4M/180/5 unique terms for each sort-field and that their average
 length is 10. We thus have
 1.4M*log2(1500*10*8) + 1500*10*8 bit ~= 23MB
 per sort field or about 4GB for all the 180 fields.

 With this few unique values, the doc-value structure is by far the
 biggest, just as with facets. As opposed to the faceting structure, this is
 fairly close to the actual memory usage. Switching to a single sort field
 would reduce the memory usage from 4GB to about 55MB.

  I do commit a bit more often than i should. I get these in my log file
 from
  time to time: PERFORMANCE WARNING: Overlapping onDeckSearchers=2

 So 1 active searcher and 2 warming searchers. Ignoring that one of the
 warming searchers is highly likely to 

Re: Solr using a ridiculous amount of memory

2013-04-18 Thread Toke Eskildsen
On Thu, 2013-04-18 at 08:34 +0200, John Nielsen wrote:

 
[Toke: Can you find the facet fields in any of the other caches?]

 Yes, here it is, in the field cache:

 http://screencast.com/t/mAwEnA21yL
 
Ah yes, mystery solved, my mistake.

 http://172.22.51.111:8000/solr/default1_Danish/search

[...]

 fq=site_guid%3a(10217)

This constraints to hits to a specific customer, right? Any search will
only be in a single customer's data?

 
[Toke: Are you warming all the sort- and facet-fields?]

 I'm sorry, I don't know. I have the field value cache commented out in
 my config, so... Whatever is default?

(a bit shaky here) I would say not warming. You could check simply by
starting solr and looking at the caches before you issue any searches.

This fits the description of your searchers gradually eating memory
until your JVM OOMs. Each time a new field is faceted or sorted upon, it
it added to the cache. As your index is relatively small and the number
of values in the single fields is small, the initialization time for a
field is so short that it is not a performance problem. Memory wise is
is death by a thousand cuts.

If you did explicit warming of all the possible fields for sorting and
faceting, your would allocate it all up front and would be sure that
there would be enough memory available. But it would take much longer
than your current setup. You might want to try it out (no need to fiddle
with Solr setup, just make a script and fire wgets as this has the same
effect).

 The problem is that each item can have several sort orders. The sort
 order to use is defined by a group number which is known ahead of
 time. The group number is included in the sort order field name. To
 solve it in the same way i solved the facet problem, I would need to
 be able to sort on a multi-valued field, and unless I'm wrong, I don't
 think that it's possible.

That is correct.

Three suggestions off the bat:

1) Reduce the number of sort fields by mapping names.
Count the maximum number of unique sort fields for any given customer.
That will be the total number of sort fields in the index. For each
group number for a customer, map that number to one of the index-wide
sort fields.
This only works if the maximum number of unique fields is low (let's say
a single field takes 50MB, so 20 fields should be okay).

2) Create a custom sorter for Solr.
Create a field with all the sort values, prefixed by group ID. Create a
structure (or reuse the one from Lucene) with a doc-terms map with all
the terms in-memory. When sorting, extract the relevant compare-string
for a document by iterating all the terms for the document and selecting
the one with the right prefix.
Memory wise this scales linear to the number of terms instead of the
number of fields, but it would require quite some coding.

3) Switch to a layout where each customer has a dedicated core.
The basic overhead is a lot larger than for a shared index, but it would
make your setup largely immune to the adverse effect of many documents
coupled with many facet- and sort-fields.

- Toke Eskildsen, State and University Library, Denmark




Re: Solr using a ridiculous amount of memory

2013-04-18 Thread John Nielsen

  http://172.22.51.111:8000/solr/default1_Danish/search

 [...]

  fq=site_guid%3a(10217)

 This constraints to hits to a specific customer, right? Any search will
 only be in a single customer's data?


Yes, thats right. No search from any given client ever returns anything
from another client.


[Toke: Are you warming all the sort- and facet-fields?]

  I'm sorry, I don't know. I have the field value cache commented out in
  my config, so... Whatever is default?

 (a bit shaky here) I would say not warming. You could check simply by
 starting solr and looking at the caches before you issue any searches.


The field cache shows 0 entries at startup. On the running server, forcing
a commit (and thus opening a new searcher) does not change the number of
entries.


  The problem is that each item can have several sort orders. The sort
  order to use is defined by a group number which is known ahead of
  time. The group number is included in the sort order field name. To
  solve it in the same way i solved the facet problem, I would need to
  be able to sort on a multi-valued field, and unless I'm wrong, I don't
  think that it's possible.

 That is correct.

 Three suggestions off the bat:

 1) Reduce the number of sort fields by mapping names.
 Count the maximum number of unique sort fields for any given customer.
 That will be the total number of sort fields in the index. For each
 group number for a customer, map that number to one of the index-wide
 sort fields.
 This only works if the maximum number of unique fields is low (let's say
 a single field takes 50MB, so 20 fields should be okay).


I just checked our DB. Our worst case scenario client has over a thousand
groups for sorting. Granted, it may be, probably is, an error with the
data. It is an interesting idea though and I will look into this posibility.


 3) Switch to a layout where each customer has a dedicated core.
 The basic overhead is a lot larger than for a shared index, but it would
 make your setup largely immune to the adverse effect of many documents
 coupled with many facet- and sort-fields.


Now this is where my brain melts down.

If I understand the fieldCache mechanism correctly (which i can see that I
don't), the data used for faceting and sorting is saved in the fieldCache
using a key comprised of the fields used for said faceting/sorting. That
data only contains the data which is actually used for the operation. This
is what the fq queries are for.

So if i generate a core for each client, I would have a client specific
fieldCache containing the data from that client. Wouldn't I just split up
the same data into several cores?

I'm afraid I don't understand how this would help.


-- 
Med venlig hilsen / Best regards

*John Nielsen*
Programmer



*MCB A/S*
Enghaven 15
DK-7500 Holstebro

Kundeservice: +45 9610 2824
p...@mcb.dk
www.mcb.dk


Re: Solr using a ridiculous amount of memory

2013-04-18 Thread Toke Eskildsen
On Thu, 2013-04-18 at 11:59 +0200, John Nielsen wrote:
 Yes, thats right. No search from any given client ever returns
 anything from another client.

Great. That makes the 1 core/client solution feasible.

[No sort & facet warmup is performed]

[Suggestion 1: Reduce the number of sort fields by mapping]

[Suggestion 3: 1 core/customer]

 If I understand the fieldCache mechanism correctly (which i can see
 that I don't), the data used for faceting and sorting is saved in the
 fieldCache using a key comprised of the fields used for said
 faceting/sorting. That data only contains the data which is actually
 used for the operation. This is what the fq queries are for.
 
You are missing an essential part: Both the facet and the sort
structures needs to hold one reference for each document
_in_the_full_index_, even when the document does not have any values in
the fields.

It might help to visualize the structures as arrays of values with docID
as index: String[] myValues = new String[1400000] takes up 1.4M * 32 bit
(or more for a 64 bit machine) = 5.6MB, even when it is empty.

Note: Neither String-objects, nor Java references are used for the real
facet- and sort-structures, but the principle is quite the same.

 So if i generate a core for each client, I would have a client
 specific fieldCache containing the data from that client. Wouldn't I
 just split up the same data into several cores?

The same terms, yes, but not the same references.

Let's say your customer has 10K documents in the index and that there
are 100 unique values, each 10 bytes long, in each group .

As each group holds its own separate structure, we use the old formula
to get the memory overhead:

#documents*log2(#unique_terms*average_term_length) +
#unique_terms*average_term_length
 
1.4M*log2(100*(10*8)) + 100*(10*8) bit = 1.2MB + 1KB.

Note how the values themselves are just 1KB, while the nearly empty
reference list takes 1.2MB.


Compare this to a dedicated core with just the 10K documents:
10K*log2(100*(10*8)) + 100*(10*8) bit = 8.5KB + 1KB.

The terms take up exactly the same space, but the heap requirement for
the references is reduced by 99%.

Now, 25GB for 180 clients means 140MB/client with your current setup.
I do not know the memory overhead of running a core, but since Solr can
run fine with 32MB for small indexes, it should be smaller than that.
You will of course have to experiment and to measure.


- Toke Eskildsen, State and University Library, Denmark




Re: Solr using a ridiculous amount of memory

2013-04-18 Thread John Nielsen
 You are missing an essential part: Both the facet and the sort
 structures needs to hold one reference for each document
 _in_the_full_index_, even when the document does not have any values in
 the fields.


Wow, thank you for this awesome explanation! This is where the penny
dropped for me.

I will definitely move to a multi-core setup. It will take some time and a
lot of re-coding. As soon as I know the result, I will let you know!






-- 
Med venlig hilsen / Best regards

*John Nielsen*
Programmer



*MCB A/S*
Enghaven 15
DK-7500 Holstebro

Kundeservice: +45 9610 2824
p...@mcb.dk
www.mcb.dk


Re: Solr using a ridiculous amount of memory

2013-04-17 Thread John Nielsen
I managed to get this done. The facet queries now facets on a multivalue
field as opposed to the dynamic field names.

Unfortunately it doesn't seem to have done much difference, if any at all.

Some more information that might help:

The JVM memory seems to be eaten up slowly. I don't think that there is one
single query that causes the problem. My test case (dumping 180 clients on
top of solr) takes hours before it causes an OOM. Often a full day. The
memory usage wobbles up and down, so the GC is at least partially doing its
job. It still works its way up to 100% eventually. When that happens it
either OOM's or it stops the world and brings the memory consumption to
10-15 gigs.

I did try to facet on all products across all clients (about 1.4 mil docs)
and I could not make it OOM on a server with a 4 gig jvm. This was on a
dedicated test server with my test being the only traffic.

I am beginning to think that this may be related to traffic volume and not
just on the type of query that I do.

I tried to calculate the memory requirement example you gave me above based
on the change that got rid of the dynamic fields.

documents = ~1.400.000
references = 11.200.000 (we facet on two multivalue fields with 4
values each on average, so 1.400.000 * 2 * 4 = 11.200.000)
unique values = 1.132.344 (total number of variant options across all
clients. This is what we facet on)

1.400.000 * log2(11.200.000) + 1.400.000 * log2(1132344) = ~14MB per field
(we have 4 fields)?

I must be calculating this wrong.






On Mon, Apr 15, 2013 at 2:10 PM, John Nielsen j...@mcb.dk wrote:

 I did a search. I have no occurrence of UnInverted in the solr logs.

  Another explanation for the large amount of memory presents itself if
  you use a single index: If each of your clients facet on at least one
  fields specific to the client (client123_persons or something like
  that), then your memory usage goes through the roof.

 This is exactly how we facet right now! I will definitely rewrite the
 relevant parts of our product to test this out before moving further down
 the docValues path.

 I will let you know as soon as I know one way or the other.


 On Mon, Apr 15, 2013 at 1:38 PM, Toke Eskildsen 
 t...@statsbiblioteket.dkwrote:

 On Mon, 2013-04-15 at 10:25 +0200, John Nielsen wrote:

  The FieldCache is the big culprit. We do a huge amount of faceting so
  it seems right.

 Yes, you wrote that earlier. The mystery is that the math does not check
 out with the description you have given us.

  Unfortunately I am super swamped at work so I have precious little
  time to work on this, which is what explains my silence.

 No problem, we've all been there.
 
 [Band aid: More memory]

  The extra memory helped a lot, but it still OOM with about 180 clients
  using it.

 You stated earlier that you have a solr "cluster" and your total(?) index
 size was 35GB, with each "register" being between 15k and 30k. I am
 using the quotes to signify that it is unclear what you mean. Is your
 cluster multiple machines (I'm guessing no), multiple Solr's, cores,
 shards or maybe just a single instance prepared for later distribution?
 Is a register a core, shard or simply a logical part (one client's data)
 of the index?

 If each client has their own core or shard, that would mean that each
 client uses more than 25GB/180 bytes ~= 142MB of heap to access 35GB/180
 ~= 200MB of index. That sounds quite high and you would need a very
 heavy facet to reach that.

 If you could grep UnInverted from the Solr log file and paste the
 entries here, that would help to clarify things.


 Another explanation for the large amount of memory presents itself if
 you use a single index: If each of your clients facet on at least one
 fields specific to the client (client123_persons or something like
 that), then your memory usage goes through the roof.

 Assuming an index with 10M documents, each with 5 references to a modest
 10K unique values in a facet field, the simplified formula
   #documents*log2(#references) + #references*log2(#unique_values) bit
 tells us that this takes at least 110MB with field cache based faceting.

 180 clients @ 110MB ~= 20GB. As that is a theoretical low, we can at
 least double that. This fits neatly with your new heap of 64GB.


 If my guessing is correct, you can solve your memory problems very
 easily by sharing _all_ the facet fields between your clients.
 This should bring your memory usage down to a few GB.

 You are probably already restricting their searches to their own data by
 filtering, so this should not influence the returned facet values and
 counts, as compared to separate fields.

 This is very similar to the thread Facets with 5000 facet fields BTW.

  Today I finally managed to set up a test core so I can begin to play
  around with docValues.

 If you are using a single index with the individual-facet-fields for
 each client approach, the DocValues will also have scaling issues, as
 the amount of values (of which the 

RE: Solr using a ridiculous amount of memory

2013-04-17 Thread Toke Eskildsen
John Nielsen [j...@mcb.dk] wrote:
 I managed to get this done. The facet queries now facets on a multivalue 
 field as opposed to the dynamic field names.

 Unfortunately it doesn't seem to have done much difference, if any at all.

I am sorry to hear that.

 documents = ~1.400.000
 references 11.200.000  (we facet on two multivalue fields with each 4 values 
 on average, so 1.400.000 * 2 * 4 = 11.200.000
 unique values = 1.132.344 (total number of variant options across all clients.
 This is what we facet on)

 1.400.000 * log2(11.200.000) + 1.400.000 * log2(1132344) = ~14MB per field 
 (we have 4 fields)?

 I must be calculating this wrong.

No, that sounds about right. In reality you need to multiply with 3 or 4, so 
let's round to 50MB/field: 1.4M documents with 2 fields with 5M 
references/field each is not very much and should not take a lot of memory. In 
comparison, we facet on 12M documents with 166M references and do some other 
stuff (in Lucene with a different faceting implementation, but at this level it 
is equivalent to Solr's in terms of memory). Our heap is 3GB.

I am surprised about the lack of UnInverted from your logs as it is logged on 
INFO level. It should also be available from the admin interface under 
collection/Plugin / Stats/CACHE/fieldValueCache. But I am guessing you got your 
numbers from that and that the list only contains the few facets you mentioned 
previously? It might be wise to sanity check by summing the memSizes though; 
they ought to take up far below 1GB.

From your description, your index is small and your faceting requirements
modest. An SSD-equipped laptop should be adequate as a server. So we are back
to "the math does not check out".


You stated that you were unable to make a 4GB JVM OOM when you just performed 
faceting (I guesstimate that it will also run fine with just ½GB or at least 
with 1GB, based on the numbers above) and you have observed that the field 
cache eats the memory. This does indicate that the old caches are somehow not 
freed when the index is updated. That is strange as Solr should take care of 
that automatically.

Guessing wildly: Do you issue a high frequency of small updates with frequent
commits? If you pause the indexing, does memory use fall back to the single GB
level (You probably need to trigger a full GC to check that)? If that is the 
case, it might be a warmup problem with old warmups still running when new 
commits are triggered.

Regards,
Toke Eskildsen, State and University Library, Denmark

Re: Solr using a ridiculous amount of memory

2013-04-17 Thread John Nielsen
 I am surprised about the lack of UnInverted from your logs as it is
logged on INFO level.

Nope, no trace of it. No mention either in Logging - Level from the admin
interface.

 It should also be available from the admin interface under
collection/Plugin / Stats/CACHE/fieldValueCache.

I never seriously looked at my fieldValueCache. It never seemed to get used:

http://screencast.com/t/YtKw7UQfU

 You stated that you were unable to make a 4GB JVM OOM when you just
performed faceting (I guesstimate that it will also run fine with just ½GB
or at least with 1GB, based on the
 numbers above) and you have observed that the field cache eats the
memory.

Yep. We still do a lot of sorting on dynamic field names, so the field
cache has a lot of entries. (9.411 entries as we speak. This is
considerably lower than before.) You mentioned in an earlier mail that
faceting on a field shared between all facet queries would bring down the
memory needed. Does the same thing go for sorting? Do those 9411 entries
duplicate data between them? If this is where all the memory is going, I
have a lot of coding to do.

 Guessing wildly: Do you issue a high frequency small updates with
frequent commits? If you pause the indexing, does memory use fall back to
the single GB level

I do commit a bit more often than I should. I get these in my log file from
time to time: "PERFORMANCE WARNING: Overlapping onDeckSearchers=2". The way I
understand this is that two searchers are being warmed at the same time and
that one will be discarded when it finishes its auto warming procedure. If
the math above is correct, I would need tens of searchers auto
warming in parallel to cause my problem. If I misunderstand how this works,
do let me know.

My indexer has a cleanup routine that deletes replay logs and other things
when it has nothing to do. This includes running a commit on the solr
server to make sure nothing is ever in a state where something is not
written to disk anywhere. In theory it can commit once every 60 seconds,
though I doubt that ever happens. The less work the indexer has, the more
often it commits. (Yes, I know, it's on my todo list.)

Other than that, my autocommit settings look like this:

<autoCommit>
  <maxTime>60000</maxTime>
  <maxDocs>6000</maxDocs>
  <openSearcher>false</openSearcher>
</autoCommit>

The control panel says that the warm up time of the last searcher is 5574.
Is that seconds or milliseconds?
http://screencast.com/t/d9oIbGLCFQwl

I would prefer not to turn off the indexer unless the numbers above
suggest that I really should try this. Waiting for a full GC would take a
long time. Unfortunately I don't know of a way to provoke a full GC on
command.


On Wed, Apr 17, 2013 at 11:48 AM, Toke Eskildsen 
t...@statsbiblioteket.dkwrote:

 John Nielsen [j...@mcb.dk] wrote:
  I managed to get this done. The facet queries now facets on a multivalue
 field as opposed to the dynamic field names.

  Unfortunately it doesn't seem to have done much difference, if any at
 all.

 I am sorry to hear that.

  documents = ~1.400.000
  references 11.200.000  (we facet on two multivalue fields with each 4
 values
  on average, so 1.400.000 * 2 * 4 = 11.200.000
  unique values = 1.132.344 (total number of variant options across all
 clients.
  This is what we facet on)

  1.400.000 * log2(11.200.000) + 1.400.000 * log2(1132344) = ~14MB per
 field (we have 4 fields)?

  I must be calculating this wrong.

 No, that sounds about right. In reality you need to multiply with 3 or 4,
 so let's round to 50MB/field: 1.4M documents with 2 fields with 5M
 references/field each is not very much and should not take a lot of memory.
 In comparison, we facet on 12M documents with 166M references and do some
 other stuff (in Lucene with a different faceting implementation, but at
 this level it is equivalent to Solr's in terms of memory). Our heap is 3GB.

 I am surprised about the lack of UnInverted from your logs as it is
 logged on INFO level. It should also be available from the admin interface
 under collection/Plugin / Stats/CACHE/fieldValueCache. But I am guessing
 you got your numbers from that and that the list only contains the few
 facets you mentioned previously? It might be wise to sanity check by
 summing the memSizes though; they ought to take up far below 1GB.

 From your description, your index is small and your faceting requirements
 modest. A SSD-equipped laptop should be adequate as server. So we are back
 to math does not check out.


 You stated that you were unable to make a 4GB JVM OOM when you just
 performed faceting (I guesstimate that it will also run fine with just ½GB
 or at least with 1GB, based on the numbers above) and you have observed
 that the field cache eats the memory. This does indicate that the old
 caches are somehow not freed when the index is updated. That is strange as
 Solr should take care of that automatically.

 Guessing wildly: Do you issue a high frequency small updates with frequent
 commits? If you pause the 

RE: Solr using a ridiculous amount of memory

2013-04-17 Thread Toke Eskildsen
John Nielsen [j...@mcb.dk]:
 I never seriously looked at my fieldValueCache. It never seemed to get used:

 http://screencast.com/t/YtKw7UQfU

That was strange. As you are using a multi-valued field with the new setup, 
they should appear there. Can you find the facet fields in any of the other 
caches?

...I hope you are not calling the facets with facet.method=enum? Could you 
paste a typical facet-enabled search request?

 Yep. We still do a lot of sorting on dynamic field names, so the field cache
 has a lot of entries. (9.411 entries as we speak. This is considerably lower
 than before.). You mentioned in an earlier mail that faceting on a field
 shared between all facet queries would bring down the memory needed.
 Does the same thing go for sorting?

More or less. Sorting stores the raw string representations (utf-8) in memory 
so the number of unique values has more to say than it does for faceting. Just 
as with faceting, a list of pointers from documents to values (1 value/document 
as we are sorting) is maintained, so the overhead is something like

#documents*log2(#unique_terms*average_term_length) + 
#unique_terms*average_term_length
(where average_term_length is in bits)

Caveat: This is with the index-wide sorting structure. I am fairly confident 
that this is what Solr uses, but I have not looked at it lately so it is 
possible that some memory-saving segment-based trickery has been implemented.

 Does those 9411 entries duplicate data between them?

Sorry, I do not know. SOLR- discusses the problems with the field cache and 
duplication of data, but I cannot infer if it has been solved or not. I am 
not familiar with the stat breakdown of the fieldCache, but it _seems_ to me 
that there are 2 or 3 entries for each segment for each sort field. 
Guesstimating further, let's say you have 30 segments in your index. Going with 
the guesswork, that would bring the number of sort fields to 9411/3/30 ~= 100. 
Looks like you use a custom sort field for each client?

Extrapolating from 1.4M documents and 180 clients, let's say that there are 
1.4M/180/5 unique terms for each sort-field and that their average length is 
10. We thus have
1.4M*log2(1500*10*8) + 1500*10*8 bit ~= 23MB 
per sort field or about 4GB for all the 180 fields.

With this few unique values, the doc-value structure is by far the biggest, 
just as with facets. As opposed to the faceting structure, this is fairly close 
to the actual memory usage. Switching to a single sort field would reduce the 
memory usage from 4GB to about 55MB.

 I do commit a bit more often than i should. I get these in my log file from
 time to time: PERFORMANCE WARNING: Overlapping onDeckSearchers=2

So 1 active searcher and 2 warming searchers. Ignoring that one of the warming 
searchers is highly likely to finish well ahead of the other one, that means 
that your heap must hold 3 times the structures for a single searcher. With the 
old heap size of 25GB that left only 8GB for a full dataset. Subtract the 4GB 
for sorting and a similar amount for faceting and you have your OOM.

Tweaking your ingest to avoid 3 overlapping searchers will lower your memory 
requirements by 1/3. Fixing the facet & sorting logic will bring it down to 
laptop size.

 The control panel says that the warm up time of the last searcher is 5574. Is 
 that seconds or milliseconds?
 http://screencast.com/t/d9oIbGLCFQwl

milliseconds, I am fairly sure. It is much faster than I anticipated. Are you 
warming all the sort- and facet-fields?

 Waiting for a full GC would take a long time.

Until you have fixed the core memory issue, you might consider doing an 
explicit GC every night to clean up and hope that it does not occur 
automatically at daytime (or whenever your clients use it).

 Unfortunately I don't know of a way to provoke a full GC on command.

VisualVM, which is delivered with the Oracle JDK (look somewhere in the bin 
folder), is your friend. Just start it on the server and click on the relevant 
process.

Regards,
Toke Eskildsen

RE: Solr using a ridiculous amount of memory

2013-04-17 Thread Toke Eskildsen
Whoops. I made some mistakes in the previous post.

Toke Eskildsen [t...@statsbiblioteket.dk]:

 Extrapolating from 1.4M documents and 180 clients, let's say that
 there are 1.4M/180/5 unique terms for each sort-field and that their
 average length is 10. We thus have
 1.4M*log2(1500*10*8) + 1500*10*8 bit ~= 23MB
 per sort field or about 4GB for all the 180 fields.

That would be 10 bytes and thus 80 bits. The results were correct though.

 So 1 active searcher and 2 warming searchers. Ignoring that one of
 the warming searchers is highly likely to finish well ahead of the other
 one, that means that your heap must hold 3 times the structures for
 a single searcher.

This should be taken with a grain of salt as it depends on whether or not there 
is any re-use of segments. There might be for sorting.

Apologies for any confusion,
Toke Eskildsen


Re: Solr using a ridiculous amount of memory

2013-04-15 Thread Toke Eskildsen
On Sun, 2013-03-24 at 09:19 +0100, John Nielsen wrote:
 Our memory requirements are running amok. We have less than a quarter of
 our customers running now and even though we have allocated 25GB to the JVM
 already, we are still seeing daily OOM crashes.

Out of curiosity: Did you manage to pinpoint the memory eater in your
setup?

- Toke Eskildsen



Re: Solr using a ridiculous amount of memory

2013-04-15 Thread John Nielsen
Yes and no,

The FieldCache is the big culprit. We do a huge amount of faceting so it
seems right. Unfortunately I am super swamped at work so I have precious
little time to work on this, which is what explains my silence.

Out of desperation, I added another 32G of memory to each server and
increased the JVM size to 64G from 25G. The servers are running with 96G
memory right now (this is the max amount supported by the hardware) which
leaves solr somewhat starved for memory. I am aware of the performance
implications of doing this but I have little choice.

The extra memory helped a lot, but it still OOMs with about 180 clients
using it. Unfortunately I need to support at least double that. After
upgrading the RAM, I ran for almost two weeks with the same workload that
used to OOM a couple of times a day, so it doesn't look like a leak.

Today I finally managed to set up a test core so I can begin to play around
with docValues.

I actually have a couple of questions regarding docValues:
1) If I facet on multiple fields and only some of those fields are using
docValues, will I still get the memory saving benefit of docValues? (One of
the facet fields uses null values and will require a lot of work in our
product to fix.)
2) If I just use docValues on one small core with very limited traffic at
first for testing purposes, how can I test that it is actually using the
disk for caching?

I really appreciate all the help I have received on this list so far. I do
feel confident that I will be able to solve this issue eventually.



On Mon, Apr 15, 2013 at 9:00 AM, Toke Eskildsen t...@statsbiblioteket.dkwrote:

 On Sun, 2013-03-24 at 09:19 +0100, John Nielsen wrote:
  Our memory requirements are running amok. We have less than a quarter of
  our customers running now and even though we have allocated 25GB to the
 JVM
  already, we are still seeing daily OOM crashes.

 Out of curiosity: Did you manage to pinpoint the memory eater in your
 setup?

 - Toke Eskildsen




-- 
Med venlig hilsen / Best regards

*John Nielsen*
Programmer



*MCB A/S*
Enghaven 15
DK-7500 Holstebro

Kundeservice: +45 9610 2824
p...@mcb.dk
www.mcb.dk


Re: Solr using a ridiculous amount of memory

2013-04-15 Thread Toke Eskildsen
On Mon, 2013-04-15 at 10:25 +0200, John Nielsen wrote:

 The FieldCache is the big culprit. We do a huge amount of faceting so
 it seems right.

Yes, you wrote that earlier. The mystery is that the math does not check
out with the description you have given us.

 Unfortunately I am super swamped at work so I have precious little
 time to work on this, which is what explains my silence.

No problem, we've all been there.
 
[Band aid: More memory]

 The extra memory helped a lot, but it still OOM with about 180 clients
 using it.

You stated earlier that you have a solr "cluster" and your total(?) index
size was 35GB, with each "register" being between 15k and 30k. I am
using the quotes to signify that it is unclear what you mean. Is your
cluster multiple machines (I'm guessing no), multiple Solr's, cores,
shards or maybe just a single instance prepared for later distribution?
Is a register a core, shard or simply a logical part (one client's data)
of the index?

If each client has their own core or shard, that would mean that each
client uses more than 25GB/180 bytes ~= 142MB of heap to access 35GB/180
~= 200MB of index. That sounds quite high and you would need a very
heavy facet to reach that.

If you could grep UnInverted from the Solr log file and paste the
entries here, that would help to clarify things.


Another explanation for the large amount of memory presents itself if
you use a single index: If each of your clients facet on at least one
fields specific to the client (client123_persons or something like
that), then your memory usage goes through the roof.

Assuming an index with 10M documents, each with 5 references to a modest
10K unique values in a facet field, the simplified formula
  #documents*log2(#references) + #references*log2(#unique_values) bit
tells us that this takes at least 110MB with field cache based faceting.

180 clients @ 110MB ~= 20GB. As that is a theoretical low, we can at
least double that. This fits neatly with your new heap of 64GB.
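
A quick way to sanity-check those numbers is to plug them straight into the
formula (a plain-Java sketch; the figures are the ones from the example above):

public class FacetHeapEstimate {
    static double log2(double x) { return Math.log(x) / Math.log(2); }

    public static void main(String[] args) {
        double documents = 10000000;           // 10M documents
        double references = documents * 5;     // 5 references per document
        double uniqueValues = 10000;           // 10K unique values in the facet field
        // #documents*log2(#references) + #references*log2(#unique_values), in bits
        double bits = documents * log2(references) + references * log2(uniqueValues);
        System.out.printf("per client:  ~%.0f MB%n", bits / 8 / 1024 / 1024);
        System.out.printf("180 clients: ~%.1f GB%n", 180 * bits / 8 / 1024 / 1024 / 1024);
    }
}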


If my guessing is correct, you can solve your memory problems very
easily by sharing _all_ the facet fields between your clients.
This should bring your memory usage down to a few GB.

You are probably already restricting their searches to their own data by
filtering, so this should not influence the returned facet values and
counts, as compared to separate fields.

This is very similar to the thread Facets with 5000 facet fields BTW.
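
On the query side, the shared-field approach could look roughly like this (a
SolrJ sketch; the field names client_id and persons are made up for
illustration, the point being that all clients facet on the same field and are
confined to their own documents by a filter query):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class SharedFacetFieldSketch {
    public static void main(String[] args) throws Exception {
        HttpSolrServer solr = new HttpSolrServer("http://localhost:8983/solr");
        SolrQuery query = new SolrQuery("*:*");
        query.addFilterQuery("client_id:123"); // each client only ever sees its own documents
        query.setFacet(true);
        query.addFacetField("persons");        // one field shared by all clients, not client123_persons
        query.setFacetMinCount(1);
        QueryResponse rsp = solr.query(query);
        System.out.println(rsp.getFacetField("persons").getValues());
    }
}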

 Today I finally managed to set up a test core so I can begin to play
 around with docValues.

If you are using a single index with the individual-facet-fields for
each client approach, the DocValues will also have scaling issues, as
the amount of values (of which the majority will be null) will be
  #clients*#documents*#facet_fields
This means that adding a new client will be progressively more
expensive.

On the other hand, if you use a lot of small shards, DocValues should
work for you.

Regards,
Toke Eskildsen




Re: Solr using a ridiculous amount of memory

2013-04-15 Thread John Nielsen
I did a search. I have no occurrence of UnInverted in the solr logs.

 Another explanation for the large amount of memory presents itself if
 you use a single index: If each of your clients facet on at least one
 fields specific to the client (client123_persons or something like
 that), then your memory usage goes through the roof.

This is exactly how we facet right now! I will definitely rewrite the
relevant parts of our product to test this out before moving further down
the docValues path.

I will let you know as soon as I know one way or the other.


On Mon, Apr 15, 2013 at 1:38 PM, Toke Eskildsen t...@statsbiblioteket.dkwrote:

 On Mon, 2013-04-15 at 10:25 +0200, John Nielsen wrote:

  The FieldCache is the big culprit. We do a huge amount of faceting so
  it seems right.

 Yes, you wrote that earlier. The mystery is that the math does not check
 out with the description you have given us.

  Unfortunately I am super swamped at work so I have precious little
  time to work on this, which is what explains my silence.

 No problem, we've all been there.
 
 [Band aid: More memory]

  The extra memory helped a lot, but it still OOM with about 180 clients
  using it.

 You stated earlier that you has a solr cluster and your total(?) index
 size was 35GB, with each register being between 15k and 30k. I am
 using the quotes to signify that it is unclear what you mean. Is your
 cluster multiple machines (I'm guessing no), multiple Solr's, cores,
 shards or maybe just a single instance prepared for later distribution?
 Is a register a core, shard or a simply logical part (one client's data)
 of the index?

 If each client has their own core or shard, that would mean that each
 client uses more than 25GB/180 bytes ~= 142MB of heap to access 35GB/180
 ~= 200MB of index. That sounds quite high and you would need a very
 heavy facet to reach that.

 If you could grep UnInverted from the Solr log file and paste the
 entries here, that would help to clarify things.


 Another explanation for the large amount of memory presents itself if
 you use a single index: If each of your clients facet on at least one
 fields specific to the client (client123_persons or something like
 that), then your memory usage goes through the roof.

 Assuming an index with 10M documents, each with 5 references to a modest
 10K unique values in a facet field, the simplified formula
   #documents*log2(#references) + #references*log2(#unique_values) bit
 tells us that this takes at least 110MB with field cache based faceting.

 180 clients @ 110MB ~= 20GB. As that is a theoretical low, we can at
 least double that. This fits neatly with your new heap of 64GB.


 If my guessing is correct, you can solve your memory problems very
 easily by sharing _all_ the facet fields between your clients.
 This should bring your memory usage down to a few GB.

 You are probably already restricting their searches to their own data by
 filtering, so this should not influence the returned facet values and
 counts, as compared to separate fields.

 This is very similar to the thread Facets with 5000 facet fields BTW.

  Today I finally managed to set up a test core so I can begin to play
  around with docValues.

 If you are using a single index with the individual-facet-fields for
 each client approach, the DocValues will also have scaling issues, as
 the amount of values (of which the majority will be null) will be
   #clients*#documents*#facet_fields
 This means that the adding a new client will be progressively more
 expensive.

 On the other hand, if you use a lot of small shards, DocValues should
 work for you.

 Regards,
 Toke Eskildsen





-- 
Med venlig hilsen / Best regards

*John Nielsen*
Programmer



*MCB A/S*
Enghaven 15
DK-7500 Holstebro

Kundeservice: +45 9610 2824
p...@mcb.dk
www.mcb.dk


Re: Solr using a ridiculous amount of memory

2013-04-15 Thread Upayavira
Might be obvious, but just in case - remember that you'll need to
re-index your content once you've added docValues to your schema, in
order to get the on-disk files to be created.

Upayavira

On Mon, Mar 25, 2013, at 03:16 PM, John Nielsen wrote:
 I apologize for the slow reply. Today has been killer. I will reply to
 everyone as soon as I get the time.
 
 I am having difficulties understanding how docValues work.
 
 Should I only add docValues to the fields that I actually use for sorting
 and faceting or on all fields?
 
 Will the docValues magic apply to the fields i activate docValues on or
 on
 the entire document when sorting/faceting on a field that has docValues
 activated?
 
 I'm not even sure which question to ask. I am struggling to understand
 this
 on a conceptual level.
 
 
 On Sun, Mar 24, 2013 at 7:11 PM, Robert Muir rcm...@gmail.com wrote:
 
  On Sun, Mar 24, 2013 at 4:19 AM, John Nielsen j...@mcb.dk wrote:
 
   Schema with DocValues attempt at solving problem:
   http://pastebin.com/Ne23NnW4
   Config: http://pastebin.com/x1qykyXW
  
 
  This schema isn't using docvalues, due to a typo in your config.
  it should not be DocValues=true but docValues=true.
 
  Are you not getting an error? Solr needs to throw exception if you
  provide invalid attributes to the field. Nothing is more frustrating
  than having a typo or something in your configuration and solr just
  ignores this, reports no error, and doesnt work the way you want.
  I'll look into this (I already intend to add these checks to analysis
  factories for the same reason).
 
  Separately, if you really want the terms data and so on to remain on
  disk, it is not enough to just enable docvalues for the field. The
  default implementation uses the heap. So if you want that, you need to
  set docValuesFormat=Disk on the fieldtype. This will keep the
  majority of the data on disk, and only some key datastructures in heap
  memory. This might have significant performance impact depending upon
  what you are doing so you need to test that.
 
 
 
 
 -- 
 Med venlig hilsen / Best regards
 
 *John Nielsen*
 Programmer
 
 
 
 *MCB A/S*
 Enghaven 15
 DK-7500 Holstebro
 
 Kundeservice: +45 9610 2824
 p...@mcb.dk
 www.mcb.dk


Re: Solr using a ridiculous amount of memory

2013-03-25 Thread John Nielsen
I apologize for the slow reply. Today has been killer. I will reply to
everyone as soon as I get the time.

I am having difficulties understanding how docValues work.

Should I only add docValues to the fields that I actually use for sorting
and faceting or on all fields?

Will the docValues magic apply to the fields I activate docValues on or on
the entire document when sorting/faceting on a field that has docValues
activated?

I'm not even sure which question to ask. I am struggling to understand this
on a conceptual level.


On Sun, Mar 24, 2013 at 7:11 PM, Robert Muir rcm...@gmail.com wrote:

 On Sun, Mar 24, 2013 at 4:19 AM, John Nielsen j...@mcb.dk wrote:

  Schema with DocValues attempt at solving problem:
  http://pastebin.com/Ne23NnW4
  Config: http://pastebin.com/x1qykyXW
 

 This schema isn't using docvalues, due to a typo in your config.
 it should not be DocValues=true but docValues=true.

 Are you not getting an error? Solr needs to throw exception if you
 provide invalid attributes to the field. Nothing is more frustrating
 than having a typo or something in your configuration and solr just
 ignores this, reports no error, and doesnt work the way you want.
 I'll look into this (I already intend to add these checks to analysis
 factories for the same reason).

 Separately, if you really want the terms data and so on to remain on
 disk, it is not enough to just enable docvalues for the field. The
 default implementation uses the heap. So if you want that, you need to
 set docValuesFormat=Disk on the fieldtype. This will keep the
 majority of the data on disk, and only some key datastructures in heap
 memory. This might have significant performance impact depending upon
 what you are doing so you need to test that.




-- 
Med venlig hilsen / Best regards

*John Nielsen*
Programmer



*MCB A/S*
Enghaven 15
DK-7500 Holstebro

Kundeservice: +45 9610 2824
p...@mcb.dk
www.mcb.dk


Solr using a ridiculous amount of memory

2013-03-24 Thread John Nielsen
Hello all,

We are running a solr cluster which is now running solr-4.2.

The index is about 35GB on disk with each register between 15k and 30k.
(This is simply the size of a full xml reply of one register. I'm not sure
how to measure it otherwise.)

Our memory requirements are running amok. We have less than a quarter of
our customers running now and even though we have allocated 25GB to the JVM
already, we are still seeing daily OOM crashes. We used to just allocate
more memory to the JVM, but with the way solr is scaling, we would need
well over 100GB of memory on each node to finish the project, and that's
just not going to happen. I need to lower the memory requirements somehow.

I can see from the memory dumps we've done that the field cache is by far
the biggest sinner. Of special interest to me is the recent introduction of
DocValues which supposedly mitigates this issue by using memory outside the
JVM. I just can't, because of lack of documentation, seem to make it work.

We do a lot of faceting. One client facets on about 50.000 docs of approx
30k each on 5 fields. I understand that this is VERY memory intensive.

Schema with DocValues attempt at solving problem:
http://pastebin.com/Ne23NnW4
Config: http://pastebin.com/x1qykyXW

The cache is pretty well tuned. Any lower and I get evictions.

Come hell or high water, my JVM memory requirements must come down. Simply
moving some memory load outside of the JVM would be awesome! Making it not
use the field cache for anything would also (probably) work for me. I
thought about killing off my other caches, but from the dumps, they just
don't seem to use that much memory.

I am at my wits end. Any help would be sorely appreciated.

-- 
Med venlig hilsen / Best regards

*John Nielsen*
Programmer



*MCB A/S*
Enghaven 15
DK-7500 Holstebro

Kundeservice: +45 9610 2824
p...@mcb.dk
www.mcb.dk


Re: Solr using a ridiculous amount of memory

2013-03-24 Thread Jack Krupansky
Just to get started, do you hit OOM quickly with a few expensive queries, or 
is it after a number of hours and lots of queries?


Does Java heap usage seem to be growing linearly as queries come in, or are 
there big spikes?


How complex/rich are your queries (e.g., how many terms, wildcards, faceted 
fields, sorting, etc.)?


As a baseline experiment, start a Solr server, see how much Java heap is 
used/available. Then do a couple of typical queries, and check the heap size 
again. Then do a couple more similar but different (to avoid query cache 
matches), and check the heap again. Maybe do that a few times to get a 
handle on the baseline memory required and whether there might be a leak of 
some sort. Do enough queries to hit all of the fields, facets, sorting, 
etc. that are likely to be encountered in one of your typical days that hits 
OOM - just not the volume of queries. The goal is to determine if there is 
something inherently memory intensive in your index/queries, or something 
relating to a leak based on total query volume.


-- Jack Krupansky

-Original Message- 
From: John Nielsen

Sent: Sunday, March 24, 2013 4:19 AM
To: solr-user@lucene.apache.org
Subject: Solr using a ridiculous amount of memory

Hello all,

We are running a solr cluster which is now running solr-4.2.

The index is about 35GB on disk with each register between 15k and 30k.
(This is simply the size of a full xml reply of one register. I'm not sure
how to measure it otherwise.)

Our memory requirements are running amok. We have less than a quarter of
our customers running now and even though we have allocated 25GB to the JVM
already, we are still seeing daily OOM crashes. We used to just allocate
more memory to the JVM, but with the way solr is scaling, we would need
well over 100GB of memory on each node to finish the project, and thats
just not going to happen. I need to lower the memory requirements somehow.

I can see from the memory dumps we've done that the field cache is by far
the biggest sinner. Of special interest to me is the recent introduction of
DocValues which supposedly mitigates this issue by using memory outside the
JVM. I just can't, because of lack of documentation, seem to make it work.

We do a lot of facetting. One client facets on about 50.000 docs of approx
30k each on 5 fields. I understand that this is VERY memory intensive.

Schema with DocValues attempt at solving problem:
http://pastebin.com/Ne23NnW4
Config: http://pastebin.com/x1qykyXW

The cache is pretty well tuned. Any lower and i get evictions.

Come hell or high water, my JVM memory requirements must come down. Simply
moving some memory load outside of the JVM would be awesome! Making it not
use the field cache for anything would also (probably) work for me. I
thought about killing off my other caches, but from the dumps, they just
don't seem to use that much memory.

I am at my wits end. Any help would be sorely appreciated.

--
Med venlig hilsen / Best regards

*John Nielsen*
Programmer



*MCB A/S*
Enghaven 15
DK-7500 Holstebro

Kundeservice: +45 9610 2824
p...@mcb.dk
www.mcb.dk 



Re: Solr using a ridiculous amount of memory

2013-03-24 Thread Robert Muir
On Sun, Mar 24, 2013 at 4:19 AM, John Nielsen j...@mcb.dk wrote:

 Schema with DocValues attempt at solving problem:
 http://pastebin.com/Ne23NnW4
 Config: http://pastebin.com/x1qykyXW


This schema isn't using docvalues, due to a typo in your config:
it should not be DocValues=true but docValues=true.

Are you not getting an error? Solr needs to throw exception if you
provide invalid attributes to the field. Nothing is more frustrating
than having a typo or something in your configuration and solr just
ignores this, reports no error, and doesn't work the way you want.
I'll look into this (I already intend to add these checks to analysis
factories for the same reason).

Separately, if you really want the terms data and so on to remain on
disk, it is not enough to just enable docvalues for the field. The
default implementation uses the heap. So if you want that, you need to
set docValuesFormat="Disk" on the fieldtype. This will keep the
majority of the data on disk, and only some key datastructures in heap
memory. This might have significant performance impact depending upon
what you are doing so you need to test that.
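
Put together, a minimal schema sketch of what that looks like (field and type
names here are illustrative, not taken from the pastebins above):

<fieldType name="string_dv_disk" class="solr.StrField" docValuesFormat="Disk"/>

<field name="item_variant" type="string_dv_disk" indexed="true" stored="false"
       multiValued="true" docValues="true"/>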


RE: Solr using a ridiculous amount of memory

2013-03-24 Thread Toke Eskildsen
From: John Nielsen [j...@mcb.dk]:
 The index is about 35GB on disk with each register between 15k and 30k.
 (This is simply the size of a full xml reply of one register. I'm not sure
 how to measure it otherwise.)

 Our memory requirements are running amok. We have less than a quarter of
 our customers running now and even though we have allocated 25GB to the JVM
 already, we are still seeing daily OOM crashes.

That does sound a bit peculiar. I do not understand what you mean by "register", 
though. How many documents does your index hold?

 I can see from the memory dumps we've done that the field cache is by far
 the biggest sinner.

Do you sort on a lot of different fields?

 We do a lot of facetting. One client facets on about 50.000 docs of approx
 30k each on 5 fields. I understand that this is VERY memory intensive.

To get a rough approximation of memory usage, we need the total number of 
documents, the average number of values for each of the 5 fields for a document 
and the number of unique values in each of the 5 fields. The rule of thumb I 
use for lower ceiling is

#documents*log2(#references) + #references*log2(#unique_values) bit

If your whole index has 10M documents, each of which has 100 values for each 
field, with each field having 50M unique values, then the memory requirement 
would be more than 10M*log2(100*10M) + 100*10M*log2(50M) bit ~= 340MB/field ~= 
1.6GB for faceting on all fields. Even when we multiply that with 4 to get a 
more real-world memory requirement, it is far from the 25GB that you are 
allocating. Either you have an interestingly high number somewhere in the 
equation or something's off.

Regards,
Toke Eskildsen

RE: Solr using a ridiculous amount of memory

2013-03-24 Thread Toke Eskildsen
Toke Eskildsen [t...@statsbiblioteket.dk]:
 If your whole index has 10M documents, which each has 100 values
 for each field, with each field having 50M unique values, then the 
 memory requirement would be more than 
 10M*log2(100*10M) + 100*10M*log2(50M) bit ~= 340MB/field ~=
 1.6GB for faceting on all fields.

Whoops. Missed a 0 when calculating. The case above would actually take more 
than 15GB, probably also more than the 25GB you have allocated.


Anyway, I see now in your solrconfig that your main facet fields are cat, 
manu_exact, content_type and author_s, with the 5th being maybe price, 
popularity or manufacturedate_dt?

cat seems like category (relatively few references, few uniques), content_type 
probably has a single value/item and again few uniques. No memory problem 
there, unless you have a lot of documents (100M-range). That leaves manu_exact 
and author_s. If those are freetext fields with item descriptions or similar, 
that might explain the OOM.

Could you describe the facet fields in more detail and provide us with the 
total document count?


Quick sanity check: If you are using a Linux server, could you please verify 
that your virtual memory is set to unlimited with 'ulimit -v'?

Regards,
Toke Eskildsen


Re: Solr using a ridiculous amount of memory

2013-03-24 Thread Jack Krupansky
A step I meant to include was that after you warm Solr with a 
representative collection of queries that references all of the fields, 
facets, sorting, etc. that your daily load will reference, check the Java 
heap size at that point, and then set your Java heap limit to a moderate 
level higher, like 256M, restart, and then see what happens.


The theory is that if you have too much available heap, Java will gradually 
fill it all with garbage (no leaks implied, but maybe some leaks as well), 
and then a Java GC will be an expensive hit, and sometimes a rapid flow of 
incoming requests at that point can cause Java to freak out and even hit OOM 
even though a more graceful garbage collection would eventually free up tons 
of garbage.


So, by only allowing for a moderate amount of garbage, more frequent GCs 
will be less intensive and less likely to cause weird situations.


The other part of the theory is that it is usually better to leave tons of 
memory to the OS for efficiently caching files, rather than force Java to 
manage large amounts of memory, which it typically does not do so well.


-- Jack Krupansky

-Original Message- 
From: Jack Krupansky

Sent: Sunday, March 24, 2013 2:00 PM
To: solr-user@lucene.apache.org
Subject: Re: Solr using a ridiculous amount of memory

Just to get started, do you hit OOM quickly with a few expensive queries, or
is it after a number of hours and lots of queries?

Does Java heap usage seem to be growing linearly as queries come in, or are
there big spikes?

How complex/rich are your queries (e.g., how many terms, wildcards, faceted
fields, sorting, etc.)?

As a baseline experiment, start a Solr server, see how much Java heap is
used/available. Then do a couple of typical queries, and check the heap size
again. Then do a couple more similar but different (to avoid query cache
matches), and check the heap again. Maybe do that a few times to get a
handle on the baseline memory required and whether there might be a leak of
some sort. Do enough queries to hits all of the fields, facets, sorting,
etc. that are likely to be encountered in one of your typical days that hits
OOM - just not the volume of queries. The goal is to determine if there is
something inherently memory intensive in your index/queries, or something
relating to a leak based on total query volume.

-- Jack Krupansky

-Original Message- 
From: John Nielsen

Sent: Sunday, March 24, 2013 4:19 AM
To: solr-user@lucene.apache.org
Subject: Solr using a ridiculous amount of memory

Hello all,

We are running a solr cluster which is now running solr-4.2.

The index is about 35GB on disk with each register between 15k and 30k.
(This is simply the size of a full xml reply of one register. I'm not sure
how to measure it otherwise.)

Our memory requirements are running amok. We have less than a quarter of
our customers running now and even though we have allocated 25GB to the JVM
already, we are still seeing daily OOM crashes. We used to just allocate
more memory to the JVM, but with the way solr is scaling, we would need
well over 100GB of memory on each node to finish the project, and thats
just not going to happen. I need to lower the memory requirements somehow.

I can see from the memory dumps we've done that the field cache is by far
the biggest sinner. Of special interest to me is the recent introduction of
DocValues which supposedly mitigates this issue by using memory outside the
JVM. I just can't, because of lack of documentation, seem to make it work.

We do a lot of facetting. One client facets on about 50.000 docs of approx
30k each on 5 fields. I understand that this is VERY memory intensive.

Schema with DocValues attempt at solving problem:
http://pastebin.com/Ne23NnW4
Config: http://pastebin.com/x1qykyXW

The cache is pretty well tuned. Any lower and i get evictions.

Come hell or high water, my JVM memory requirements must come down. Simply
moving some memory load outside of the JVM would be awesome! Making it not
use the field cache for anything would also (probably) work for me. I
thought about killing off my other caches, but from the dumps, they just
don't seem to use that much memory.

I am at my wits end. Any help would be sorely appreciated.

--
Med venlig hilsen / Best regards

*John Nielsen*
Programmer



*MCB A/S*
Enghaven 15
DK-7500 Holstebro

Kundeservice: +45 9610 2824
p...@mcb.dk
www.mcb.dk 



Re: AW: AW: auto completion search with solr using NGrams in SOLR

2013-01-24 Thread AnnaVak
Thanks for your solution, it works for me too. I'm new to Solr, but how can I
additionally fetch other fields, not only the field that was used for
searching? For example, I have product title and image fields and I want to
get the title but also the image related to that title. How can I do this?

Thanks in advance 



--
View this message in context: 
http://lucene.472066.n3.nabble.com/auto-completion-search-with-solr-using-NGrams-in-SOLR-tp3998559p4035931.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: AW: AW: auto completion search with solr using NGrams in SOLR

2013-01-24 Thread Naresh
Hi,
You can fetch all the stored fields by passing them as part of
the *fl* parameter. Go through
http://wiki.apache.org/solr/CommonQueryParameters#fl
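
For example, with SolrJ this would look roughly like the following (a minimal
sketch; the title and image field names come from the question above, the query
and URL are illustrative):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.SolrDocument;

public class AutocompleteWithExtraFields {
    public static void main(String[] args) throws Exception {
        HttpSolrServer solr = new HttpSolrServer("http://localhost:8983/solr");
        SolrQuery query = new SolrQuery("title:ipho*");  // the autocomplete query itself
        query.setFields("title", "image");               // fl=title,image - both must be stored fields
        QueryResponse rsp = solr.query(query);
        for (SolrDocument doc : rsp.getResults()) {
            System.out.println(doc.getFieldValue("title") + " -> " + doc.getFieldValue("image"));
        }
    }
}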


On Thu, Jan 24, 2013 at 8:56 PM, AnnaVak anna.vakulc...@gmail.com wrote:

 Thanks for your solution it works for me too, I'm new with Solr but how I
 can
 additionally fetch another fields not only field that was used for
 searching? For example I have product title and image fields and I want to
 get the title but also related to this title image ? How can I do this?

 Thanks in advance



 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/auto-completion-search-with-solr-using-NGrams-in-SOLR-tp3998559p4035931.html
 Sent from the Solr - User mailing list archive at Nabble.com.




-- 
Regards
Naresh


Re: POST query with non-ASCII to solr using httpclient wont work

2013-01-14 Thread Jie Sun
Unfortunately solrj is not an option here...
we will have to make a quick fix with a patch out in production.

I am still unable to make Solr (3.5) take a url-encoded query. Again,
passing a non-urlencoded query string works with non-ASCII (Chinese), but it
fails to return anything when sending the request urlencoded + Chinese.

any suggestion?
thanks
jie



--
View this message in context: 
http://lucene.472066.n3.nabble.com/POST-query-with-non-ASCII-to-solr-using-httpclient-wont-work-tp4032957p4033262.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: POST query with non-ASCII to solr using httpclient wont work

2013-01-14 Thread Uwe Reh

Hi Jie,

Maybe there is a simple solution. When we used Tomcat as the servlet 
container for solr I noticed similar problems. Even with the hints from 
the solr wiki about unicode and Tomcat, I wasn't able to fix this.
So we switched back to Jetty; queries like q=allfields2%3A能力 are 
reliable now.


Uwe

BTW: I have no idea at all what these Japanese signs mean. So just 
let me append two of the 31 hits in our bibliographic catalog



doc
  str name=idHEB052032124/str
  str name=raw_fullrecordalg: 5203212
001@ $0205
001A $4:13-05-97
001B $t13:12:07.000$01999:10-06-10
001D $0:99-99-99
001U $0utf8
001X $00
002@ $0Aau
003@ $0052032124
007I $0NacsisBN09679884
010@ $ajpn
011@ $a1993
013H $0z
019@ $ajp
021A $ULatn$T01$aNōryoku kaihatsu no shisutemu$hYaguchi Hajime
021A $UJpan$T01$a@能力開発のシステム$h矢口新著
028A $ULatn$T01$9165745363$8Yaguchi, Hajime
028A $UJpan$T01$d新$a矢口
033A $ULatn$T01$pTokyo$nNōryoku Kaihatsu Kōgaku Sentaa
033A $UJpan$T01$p東久留米$n能力開発工学センター
034D $a274 S.
034M $aIll.
036E $aYaguchi Hajime senshū$l2
036F $l2$9052031527$8Yaguchi Hajime senshū$x12
037B $aSysteme zur Entwicklung der Fähigkeiten
046L $aIn japan. Schr.
...
247C/01 $9102595631$8351457-2 4/457Marburg, Universität Marburg, Bibliothek 
des Japan-Zentrums (BJZ)
  /str
/doc
doc
  str name=idHEB286840723/str
  str name=raw_fullrecordalg: 28684072
001@ $03
001A $00030:04-01-12
001B $t22:29:11.000$01999:04-01-12
001C $t10:48:47.000$00030:04-01-12
001D $00030:04-01-12
001U $0utf8
001X $00
002@ $0Aau
003@ $0286840723
004A $A978-4-88319-546-6
007A $0286840723$aHEB
010@ $ajpn
011@ $a2010
021A $ULatn$T01$aShin kanzen masutā kanji nihongo nōryoku shiken ; N1$hIshii 
Reiko ...
021A $UJpan$T01$a新完全マスター漢字日本語能力試験 ; N1$h石井怜子 [ほか] 著
027A $ULatn$T01$aShin kanzen masutā kanji : nihongo nōryoku shiken ; enu ichi / 
Ishii Reiko ...
027A $UJpan$T01$a新完全マスター漢字 : 日本語能力試験 ; N1 / 石井怜子 [ほか] 著
028C $9230917593$8Ishii, Reiko
033A $ULatn$T01$pTōkyō$nSurīē nettowāku
033A $UJpan$T01$p東京$nスリーエーネットワーク
034D $aviii, 197, 21S.
034I $a26cm
044A $S4$aNihongokyōiku(Taigaikokujin)
045Z $aEI 4650
...
247C/01 $9102599157$8601220-6 30/220Frankfurt, Universität Frankfurt, 
Institut für Orientalische und Ostasiatische Philologien, Japanologie
  /str
/doc





POST query with non-ASCII to solr using httpclient wont work

2013-01-12 Thread Jie Sun
When I use HttpClient and its PostMethod to post a query with some Chinese,
solr fails to return any records, or returns everything.
... ...
method = new PostMethod(solrReq);
method.getParams().setContentCharset("UTF-8");
method.setRequestHeader("Content-Type",
"application/x-www-form-urlencoded; charset=UTF-8");
... ...

I used tcpdump and found out that the query my application above sent is a
urlencoded query string to solr (see the q=xxx part):

../SPOST /solr/413/select HTTP/1.1
Content-Type: application/x-www-form-urlencoded; charset=UTF-8
Accept: */*
User-Agent: Jakarta Commons-HttpClient/3.1
Host: 172.20.73.142:8080
Content-Length: 192

q=type%3Amessage+AND+customer_id%3A413+AND+subject_zhs%3A%E8%83%BD%E5%8A%9B+hl.fl=qt=standardwt=standardrows=20
17:09:55.592527 IP xxx yyy.webcache: tcp 0
... ...

I found that this urlencoding is what is causing the solr query to fail. I found
this by copying the above urlencoded query to a file and using the curl command;
I then got the same error, but if I replace the above query with the decoded
string, it works with solr:

curl -v -H 'Content-type:application/x-www-form-urlencoded; charset=utf-8' 
http://localhost:8080/solr/413/select --data @/tmp/chinese_query

when /tmp/chinese_query has the following, it works with solr:
q=type:message+AND+customer_id:413+AND+subject_zhs:能力+hl.fl=qt=standardwt=standardrows=20

But if I switch /tmp/chinese_query to use the urlencoded string, it fails
again with the same error:
q=type%3Amessage+AND+customer_id%3A413+AND+subject_zhs%3A%E8%83%BD%E5%8A%9B+hl.fl=qt=standardwt=standardrows=20

So, my conclusion:
1) solr (I am using 3.5) only accepts a decoded query string; it fails with a
url-encoded query
2) httpclient will send out a urlencoded string no matter what (there seems to
be no way to make it send out a POST request without urlencoding the body).

Am I missing something, or do you have any suggestions about what I am doing wrong?
thanks
Jie
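
For comparison, a sketch of letting HttpClient 3.1 build and encode the form
body itself instead of passing a pre-encoded query string (PostMethod.addParameter()
is part of Commons HttpClient 3.x; whether this sidesteps the decoding problem
also depends on the servlet container's charset configuration, so treat it as
something to try rather than a confirmed fix):

import org.apache.commons.httpclient.HttpClient;
import org.apache.commons.httpclient.methods.PostMethod;

public class SolrPostQuerySketch {
    public static void main(String[] args) throws Exception {
        PostMethod method = new PostMethod("http://localhost:8080/solr/413/select");
        method.getParams().setContentCharset("UTF-8");
        // Let HttpClient encode each parameter; do not urlencode the values yourself.
        method.addParameter("q", "type:message AND customer_id:413 AND subject_zhs:能力");
        method.addParameter("qt", "standard");
        method.addParameter("wt", "standard");
        method.addParameter("rows", "20");
        int status = new HttpClient().executeMethod(method);
        System.out.println(status + "\n" + method.getResponseBodyAsString());
        method.releaseConnection();
    }
}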



--
View this message in context: 
http://lucene.472066.n3.nabble.com/POST-query-with-non-ASCII-to-solr-using-httpclient-wont-work-tp4032957.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: POST query with non-ASCII to solr using httpclient wont work

2013-01-12 Thread Otis Gospodnetic
Jie Sun,

Just use solrj :)

Otis
Solr & ElasticSearch Support
http://sematext.com/
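
A minimal sketch of the same query through SolrJ, which handles the parameter
encoding internally (HttpSolrServer is the SolrJ 3.6+/4.x class; on older SolrJ
the equivalent is CommonsHttpSolrServer; the URL and fields are taken from the
request in the quoted message):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrServer;

public class SolrjChineseQuerySketch {
    public static void main(String[] args) throws Exception {
        HttpSolrServer solr = new HttpSolrServer("http://172.20.73.142:8080/solr/413");
        SolrQuery query = new SolrQuery("type:message AND customer_id:413 AND subject_zhs:能力");
        query.setRows(20);
        System.out.println(solr.query(query).getResults().getNumFound());
    }
}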
On Jan 12, 2013 7:40 PM, Jie Sun jsun5...@yahoo.com wrote:

 When I use HttpClient and its PostMethod to post a query with some Chinese,
 solr fails returning any record, or return everything.
 ... ...
 method = new PostMethod(solrReq);
 method.getParams().setContentCharset(UTF-8);
 method.setRequestHeader(Content-Type,
 application/x-www-form-urlencoded; charset=UTF-8);
 ... ...

 I used tcp dump and found out the query my application above sent is an
 urlencoded query string to solr (see the q=xxx part):

 ../SPOST /solr/413/select HTTP/1.1
 Content-Type: application/x-www-form-urlencoded; charset=UTF-8
 Accept: */*
 User-Agent: Jakarta Commons-HttpClient/3.1
 Host: 172.20.73.142:8080
 Content-Length: 192


 q=type%3Amessage+AND+customer_id%3A413+AND+subject_zhs%3A%E8%83%BD%E5%8A%9B+hl.fl=qt=standardwt=standardrows=20
 17:09:55.592527 IP xxx yyy.webcache: tcp 0
 ... ...

 I found this urlencoding is what causing solr query failing. I found this
 by
 copying the above urlencoded query to a file and use curl command, then I
 got same error, but if I replace the above query with decoded string, then
 it works with solr:

 curl -v -H 'Content-type:application/x-www-form-urlencoded; charset=utf-8'
 http://localhost:8080/solr/413/select --data @/tmp/chinese_query

 when /tmp/chinese_query has following it works with solr:

 q=type:message+AND+customer_id:413+AND+subject_zhs:能力+hl.fl=qt=standardwt=standardrows=20

 But if I switched the /tmp/chinese_query  to use urlencoded string, it
 fails
 again with same error:

 q=type%3Amessage+AND+customer_id%3A413+AND+subject_zhs%3A%E8%83%BD%E5%8A%9B+hl.fl=qt=standardwt=standardrows=20

 So, my conclusion:
 1) solr (I am using 3.5) only accept decoded query string, it fails with
 url
 encoded query
 2) httpclient will send out urlencoded string no matter what (there is no
 way seems to me to make it sends out request in POST without urlencoding
 the
 body).

 am I missing something, or do you have any suggestion what I am doing
 wrong?
 thanks
 Jie



 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/POST-query-with-non-ASCII-to-solr-using-httpclient-wont-work-tp4032957.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Re: POST query with non-ASCII to solr using httpclient wont work

2013-01-12 Thread Jie Sun
:-) Otis, I also looked at the solrJ source code; it seems to be exactly what I am
doing here... but I probably will do what you suggested ... thanks
Jie



--
View this message in context: 
http://lucene.472066.n3.nabble.com/POST-query-with-non-ASCII-to-solr-using-httpclient-wont-work-tp4032957p4032973.html
Sent from the Solr - User mailing list archive at Nabble.com.

