Re: Best practices for Solr (how to update jar files safely)

2016-02-19 Thread Shawn Heisey
On 2/19/2016 8:47 PM, Brian Wright wrote:
> Here's the fundamental issue when talking about use of software like
> Solr in a corporate environment. As a systems engineer at Marketo, my
> best practices recommendations and justifications are based on the
> documentation provided by the project owners. If the project's docs
> state that something is feasible without any warnings, not only will
> the software engineer latch onto that documentation and want to use it
> in that way, my hands will be tied as a systems engineer when I'm
> expected to roll that architecture to production, as I have nothing to
> argue against that use case.

I've updated the "Taking Solr to Production" reference guide page with
what I feel is an appropriate caution against running multiple instances
in a typical installation.  I'd actually like to use stronger language,
but I worry that doing so will also discourage those who have a viable
use case.

This change to the reference guide is live on the Confluence wiki, but
it will most likely not appear in a released version of the guide until
the 6.0 version.  The 5.5 version of the guide has already been built
and is in the final approval phase now.

Thanks,
Shawn



Re: Best practices for Solr (how to update jar files safely)

2016-02-19 Thread Brian Wright

Hi Shawn,

Thanks for the information.

On 2/19/16 3:32 PM, Shawn Heisey wrote:

You will use fewer resources if you only run one Solr instance on each
machine.  You can still have different considerations for different
hardware with one instance -- on the servers with more resources,
configure a larger Java heap and run more indexes.
Yes, I do realize this box would perform better overall with a single 
instance. However, jump down to the bottom comment for more detail.



If the plan is to run SolrCloud, having only one Solr instance per
physical machine will ensure that SolrCloud never places more than one
replica for a shard on the same physical host, and it will do this
without special configuration.
I will confirm the use of SolrCloud in this environment. This hasn't 
been mentioned so far, but I don't always get all information from the 
software architects immediately when I'm still in build mode.



The documentation is driven by what users ask for.  A lot of users ask
how to run multiple instances on one machine.  Your idea above would be
my preference on how to handle the documentation.  Or perhaps leave the
instructions in there, but include a strong warning indicating that one
instance will usually work better.

Each time I see somebody ask how to run multiple instances, I give them
the same advice I gave you.  It is often ignored.


Here's the fundamental issue when talking about use of software like 
Solr in a corporate environment. As a systems engineer at Marketo, my 
best practices recommendations and justifications are based on the 
documentation provided by the project owners. If the project's docs 
state that something is feasible without any warnings, not only will the 
software engineer latch onto that documentation and want to use it in 
that way, my hands will be tied as a systems engineer when I'm expected 
to roll that architecture to production, as I have nothing to argue 
against that use case.


So, here's the word of caution to project owners. Many companies work 
just like Marketo. The software team will design and build a platform 
based on what the documentation states is possible. My job (in addition 
to building a functional system) is to not only read through the install 
docs to ensure Marketo is following best practices, but also to be the 
voice of clarity when concepts are cloudy. If a doc explicitly says 
"don't do this" or "warning: not supported", I have immediate 
justification to recommend not following that path when going into 
production. However, when docs are written that state that something is 
possible without a hint of "but this is bad and here's why", my hands 
are tied when talking to management. They need hard facts and reasons 
not to go into production with a specific design. The management here is 
fact-driven and needs to see reasons from the project owners / 
developers (not from me) why any given setup is not recommended.


I am personally not at all fond of doubling up services of any type when 
going into production simply for the reason that either of the two 
processes could bring down the whole box and take down both instances. 
But, that argument alone isn't strong enough to justify not doing it. 
This issue isn't limited to Solr / Java. This type of failure can happen 
with any application of any type. However, for management to take a 
design change seriously, I need technical ammo (in the form of docs that 
state major performance degradation) that I can take to them and say, 
"Hey, look at this. It says not to do this because of <reasons>," and then 
we come up with an alternative design.


I would also like to note that we have a current environment, although 
much smaller, which has been running two instances per box seemingly 
without issues for several years. For me to argue that this is not 
possible or is a bad idea goes against the experience we've already 
accrued running two Solr JVMs side by side. So, if there are legitimate 
benchmarks showing, for example, that side-by-side instances degrade 
each other more than two Xen VMs on the same system, or more than a 
single instance running alone, then I have a strong reason to suggest an 
alternative design. Unfortunately, our past use of Solr already 
justifies the use of two instances in this new replacement system.


Thanks.


Thanks,
Shawn


--

*Brian Wright*
*Sr. Systems Engineer*
901 Mariners Island Blvd Suite 200
San Mateo, CA 94404 USA
*Email* bri...@marketo.com
*Phone* +1.650.539.3530
*www.marketo.com*




Re: Facet count with expand and collapse

2016-02-19 Thread Joel Bernstein
With collapse and expand the facet counts will be calculated for the
collapsed data set. You can also use the tag/exclude feature to exclude the
collapse filter query and generate facets on the uncollapsed set.
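
For example (field names here are made up), a request along these lines:

q=*:*&fq={!collapse field=group_id tag=coll}&expand=true&facet=true&facet.field={!ex=coll}category

should compute the category facet over the uncollapsed set, while a plain
facet.field=category would count only the collapsed (group head) documents.
This is just a sketch of the tag/exclude idea, not a tested query.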

Joel Bernstein
http://joelsolr.blogspot.com/

On Wed, Feb 17, 2016 at 10:44 AM, Anil  wrote:

> HI,
>
> will there be any change in the facet count in case of expand and collpase
> ? please clarify.
>
> Regards,
> Anil
>


Re: Best practices for Solr (how to update jar files safely)

2016-02-19 Thread Shawn Heisey
On 2/19/2016 3:40 PM, Brian Wright wrote:
> Without going into excessive detail on our design, I won't be able to
> sufficiently justify an answer to your question as to the why of it.
> Suffice it to say we plan to deploy this indexing for our entire
> customer base. Because of the size of these document collections and the
> way they will grow over time, doubling up on machines is not feasible
> in our current infrastructure at this time. It may be justified later,
> but not today. It's less expensive to add more CPUs and RAM than to
> double up on physical machines. Additionally, there are further
> budgetary constraints going into our international datacenters which
> prevent us from having identical clusters across the board, thus
> requiring us to double up. We're not talking about 2 or 3 machines here.
> We're talking 128 running instances of Solr with 64 clusters and many
> shards.
>

You will use fewer resources if you only run one Solr instance on each
machine.  You can still have different considerations for different
hardware with one instance -- on the servers with more resources,
configure a larger Java heap and run more indexes.

> However, that doesn't preclude the use of something like Docker or KVM
> to allow encapsulation of each Solr environment on a virtual machine
> which is hooked to a fast storage subsystem.

I started out with a Solr install using virtual machines on the free
VMWare ESXi.  Because it was impossible to remotely monitor that system,
I switched it to a Xen environment, where the hypervisor was controlled
by a full Linux installation, but the cost was still zero.  It also
seemed to perform better than VMWare.

Later I did an experiment where I set up the exact same hardware without
virtualization.  Before, each host was running several virtual
machines, each with an install of Solr.  After the change, the machine
was running one install of Solr, handling all of the same indexes that
were originally handled by the VMs.  Performance was noticeably better,
and administration got a LOT better.  One OS install, one IP address,
one TCP port, one hostname, one JVM, instead of four or five of each.

If the plan is to run SolrCloud, having only one Solr instance per
physical machine will ensure that SolrCloud never places more than one
replica for a shard on the same physical host, and it will do this
without special configuration.
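
As a purely illustrative sketch (the collection name and counts are made
up), a Collections API call such as:

http://localhost:8983/solr/admin/collections?action=CREATE&name=products&numShards=64&replicationFactor=2&maxShardsPerNode=1

issued against a cluster of 128 single-instance hosts would place exactly
one core on each host, so no shard could end up with both of its replicas
on the same physical machine.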

> I would also suggest that if the recommendation is not to run two
> instances side-by-side, then the documentation regarding how to set
> this up should be removed and a strong statement put in its place that
> running multiple Solr instances is not a supported configuration.
> Right now, the documentation does not state this and, in fact, implies
> that it is perfectly fine to run multiple instances side by side as
> long as independent disks are used to hold the instances.

The documentation is driven by what users ask for.  A lot of users ask
how to run multiple instances on one machine.  Your idea above would be
my preference on how to handle the documentation.  Or perhaps leave the
instructions in there, but include a strong warning indicating that one
instance will usually work better.

Each time I see somebody ask how to run multiple instances, I give them
the same advice I gave you.  It is often ignored.

Thanks,
Shawn



Re: Best practices for Solr (how to update jar files safely)

2016-02-19 Thread Brian Wright

Hi Shawn,

Without going into excessive detail on our design, I won't be able to 
sufficiently justify an answer to your question as to the why of it. 
Suffice it to say we plan to deploy this indexing for our entire 
customer base. Because of the size of these document collections and the 
way they will grow over time, doubling up on machines is not feasible 
in our current infrastructure at this time. It may be justified later, 
but not today. It's less expensive to add more CPUs and RAM than to 
double up on physical machines. Additionally, there are further 
budgetary constraints going into our international datacenters which 
prevent us from having identical clusters across the board, thus 
requiring us to double up. We're not talking about 2 or 3 machines here. 
We're talking 128 running instances of Solr with 64 clusters and many 
shards.


However, that doesn't preclude the use of something like Docker or KVM 
to allow encapsulation of each Solr environment on a virtual machine 
which is hooked to a fast storage subsystem.


I would also suggest that if the recommendation is not to run two 
instances side-by-side, then the documentation regarding how to set this 
up should be removed and a strong statement put in its place that 
running multiple Solr instances is not a supported configuration. Right 
now, the documentation does not state this and, in fact, implies that it 
is perfectly fine to run multiple instances side by side as long as 
independent disks are used to hold the instances.


Note, this was not my design and I am not a fan of doing this, but I'm not 
the person making this decision. I am the person who's tasked with 
implementing this design choice.


Thanks.

On 2/17/16 10:19 PM, Shawn Heisey wrote:

On 2/17/2016 10:38 PM, Brian Wright wrote:

We have a new project to use Solr. Our Solr instance will use Jetty
rather than Tomcat. We plan to extend the Solr core system by adding
additional classes (jar files) to the
/opt/solr/server/solr-webapp/webapp/WEB-INF/lib directory to extend
features. We also plan to run two instances of Solr on each physical
server preferably from a single installed Solr instance. I've read the
best practices doc on running two Solr instances, and while it's
detailed about how to set up two instances, it doesn't cover our
specific use case.

Why do you want to run multiple instances on one server?  Unless you
have a REALLY good reason to have more than one instance per server,
don't do it.  One instance can handle many indexes with no problem.

The only valid reason I can think of to run more than one instance per
machine is when a single instance requires a VERY large heap.  In that
case, it *might* be better to run two instances that each have a smaller
heap, so that garbage collection times are lower.  I personally would
add more machines, rather than run multiple instances.

Generally the best way to load custom jars (and contrib components like
the dataimport handler) in Solr is to create a "lib" directory in the
solr home (where solr.xml lives) and place all extra jars there.  They
will be loaded once when Solr starts, and all cores will have access to
them.
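
As a rough sketch only (the paths are illustrative and assume a default
service install), the layout would look something like:

  /var/solr/data/solr.xml                    <- the solr home
  /var/solr/data/lib/my-custom-plugin.jar    <- extra jars go here
  /var/solr/data/mycore/conf/solrconfig.xml  <- cores can then reference those classes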

The rest of your email was concerned with running multiple instances.
If you *REALLY* want to go against advice and do this, here's the
recommended way:

https://cwiki.apache.org/confluence/display/solr/Taking+Solr+to+Production#TakingSolrtoProduction-RunningmultipleSolrnodesperhost

It is very likely possible to run multiple instances out of the same
installation directory, but I am not sure how to do it.

Thanks,
Shawn



--

*Brian Wright*
*Sr. Systems Engineer*
901 Mariners Island Blvd Suite 200
San Mateo, CA 94404 USA
*Email* bri...@marketo.com
*Phone* +1.650.539.3530
*www.marketo.com*




Re: How to boost query based on result of subquery?

2016-02-19 Thread Rajesh Hazari
Hi Ed,

Have you looked into the ExternalFileField type (for example, a field named
position_external_field in your schema)? It can be mapped to your position
field (assuming those values are not changed very often), and
position_external_field can then be used in your boost function.

This works if you can come up with unique keys for the position field, since
it is an application-specific field. If the values are finite, the external
file can contain entries such as:
position_5=5
position_25=25
position_55=55

for example: boost=custom_function(field(query_position_external),
field(position_external))

For more info, refer to the ExternalFileField documentation on the Solr wiki
and in the reference guide.

Pros: the value of this field can be refreshed on every newSearcher and
firstSearcher event. A rough configuration sketch follows below.

Cons: the file has to reside in the data folder of each replica, and
updating the file will have to be done by something like a bash script.
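
As an illustration only (the field, key, and file names are made up; check
the exact attributes against your Solr version's reference guide), the
pieces would look roughly like this.

In schema.xml:

  <fieldType name="extPosition" class="solr.ExternalFileField"
             keyField="item_id" defVal="0"/>
  <field name="position_external" type="extPosition"
         indexed="false" stored="false"/>

An external file named external_position_external in the data directory of
each core/replica, keyed by the unique key:

  30d1e667=5
  3cf18028=23
  be1b2643=21

In solrconfig.xml, to reload the values on searcher events:

  <listener event="newSearcher"
            class="org.apache.solr.schema.ExternalFileFieldReloader"/>
  <listener event="firstSearcher"
            class="org.apache.solr.schema.ExternalFileFieldReloader"/>

The value can then be used in function queries, e.g. field(position_external).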

*Please ignore if this may not work for you.*

*Rajesh**.*

On Fri, Feb 19, 2016 at 1:19 PM, Edward P  wrote:

> Hello,
>
> I am using Solr 5.4.0, one collection, multiple shards with replication.
> Sample documents:
> {
> "item_id": "30d1e667",
> "date": "2014-01-01",
> "position": "5",
> "description": "automobile license plate holder"
> }
>
> {
> "item_id": "3cf18028",
> "date": "2013-01-01",
> "position": "23",
> "description": "dinner plate"
> }
>
> {
> "item_id": "be1b2643",
> "date": "2013-06-01",
> "position": "21",
> "description": "ceramic plate"
> }
>
>
> The client sends 2 queries like this:
> (1) /select?q=item_id:30d1e667=position
> (2) /select?q=plate_position=5=custom_function($query_position,
> $position)=item_id,date,description
>
> The idea is, we have an application-specific data field "position" which we
> use to compare 2 items. The client looks up a particular item by item_id,
> gets the position data, then sends it back in the 2nd query to influence
> the ranking of items when performing a text search for "plate". Our
> custom_function is app-specific and may for example derive the boost from
> the difference of query_position and document's position.
>
> My need is: I want to combine these into one query, so the client will only
> have to send something like:
>
> /select?query_item_id=30d1e667_text=plate={… use of Solr nested
> queries, boost functions etc …}=item_id,date,description
>
> I want this to be one request so that both queries are executed against the
> same searcher (because index updates may change the position values) and so
> the details of using the "position" field are abstracted from the client.
>
> I have considered the query(subquery,default) function. This is close, but
> not exactly what I need because it returns the subquery score, not document
> values.
>
> The join query parser is also close to what I need, but I can't see how to
> use it to direct the results of a subquery into the boost function of
> another.
>
> So how can I, in a single Solr request, extract a value from the result
> document of one subquery, and pass that value into a boost function for a
> 2nd query, all using the same underlying searcher? If it's not possible
> with existing nested/sub-queries, then should I explore writing a custom
> SearchComponent, QParser, or some other plugin?
>
> thanks,
> Ed
>


Indexing Parent/Child on SolrCloud

2016-02-19 Thread naeem.tahir
Hi,

   We are implementing a solution on SolrCloud involving parent/child documents. 
I had a few questions:

1. In a nested document structure, are the parent and its children always indexed on 
the same shard?
2. Are there any limitations on the number of child documents in a nested structure?
3. Are there any other limitations/stumbling blocks we should be aware of while working 
with nested documents on SolrCloud?
   Thanks,
    Naeem


Re: Retrieving 1000 records at a time

2016-02-19 Thread Mark Robinson
Thanks Shawn!

Best,
Mark.

On Wed, Feb 17, 2016 at 7:48 PM, Shawn Heisey  wrote:

> On 2/17/2016 3:49 PM, Mark Robinson wrote:
> > I have around 121 fields out of which 12 of them are indexed and almost
> all
> > 121 are stored.
> > Average size of a doc is 10KB.
> >
> > I was checking for start=0, rows=1000.
> > We were querying a Solr instance which was on another server and I think
> > network lag might have come into the picture also.
> >
> > I did not go for any caching as I wanted good response time in the first
> > time querying itself.
>
> Stored fields, which contain the data that is returned to the client in
> the response, are compressed on disk.  Uncompressing this data can
> contribute to the time on a slow query, but I do not think it can
> explain 30 seconds of delay.  Very large documents can be particularly
> slow to decompress, but you have indicated that each entire document is
> about 10K in size, which is not huge.
>
> It is more likely that the delay is caused by one of two things,
> possibly both:
>
> * Extremely long garbage collection pauses due to a heap that is too
> small or VERY huge (beyond 32GB) with inadequate GC tuning.
> * Not enough system memory to effectively cache the index.
>
> Some additional info that may be helpful in tracking this down further:
>
> * For each core on one machine, the size on disk of the data directory.
> * For each core, the number of documents and the number of deleted
> documents.
> * The max heap size for the Solr JVM.
> * Whether there is more than one Solr instance per server.
> * The total installed memory size in the server.
> * Whether or not the server is used for other applications.
> * What operating system the server is running.
> * Whether the index is distributed or contained in a single core.
> * Whether Solr is in SolrCloud mode or not.
> * Solr version.
>
> Thanks,
> Shawn
>
>


How to boost query based on result of subquery?

2016-02-19 Thread Edward P
Hello,

I am using Solr 5.4.0, one collection, multiple shards with replication.
Sample documents:
{
"item_id": "30d1e667",
"date": "2014-01-01",
"position": "5",
"description": "automobile license plate holder"
}

{
"item_id": "3cf18028",
"date": "2013-01-01",
"position": "23",
"description": "dinner plate"
}

{
"item_id": "be1b2643",
"date": "2013-06-01",
"position": "21",
"description": "ceramic plate"
}


The client sends 2 queries like this:
(1) /select?q=item_id:30d1e667=position
(2) /select?q=plate_position=5=custom_function($query_position,
$position)=item_id,date,description

The idea is, we have an application-specific data field "position" which we
use to compare 2 items. The client looks up a particular item by item_id,
gets the position data, then sends it back in the 2nd query to influence
the ranking of items when performing a text search for "plate". Our
custom_function is app-specific and may for example derive the boost from
the difference of query_position and document's position.

My need is: I want to combine these into one query, so the client will only
have to send something like:

/select?query_item_id=30d1e667_text=plate={… use of Solr nested
queries, boost functions etc …}=item_id,date,description

I want this to be one request so that both queries are executed against the
same searcher (because index updates may change the position values) and so
the details of using the "position" field are abstracted from the client.

I have considered the query(subquery,default) function. This is close, but
not exactly what I need because it returns the subquery score, not document
values.

The join query parser is also close to what I need, but I can't see how to
use it to direct the results of a subquery into the boost function of
another.

So how can I, in a single Solr request, extract a value from the result
document of one subquery, and pass that value into a boost function for a
2nd query, all using the same underlying searcher? If it's not possible
with existing nested/sub-queries, then should I explore writing a custom
SearchComponent, QParser, or some other plugin?

thanks,
Ed


Re: AW: OutOfMemory when batchupdating from SolrJ

2016-02-19 Thread Shawn Heisey
On 2/19/2016 3:08 AM, Clemens Wyss DEV wrote:
> The logic is somewhat this:
>
> SolrClient solrClient = new HttpSolrClient( coreUrl );
> while ( got more elements to index )
> {
>   batch = create 100 SolrInputDocuments
>   solrClient.add( batch )
>  }

How much data is going into each of those SolrInputDocument objects?

If the amount of data is very small (a few kilobytes), then this sounds
like your program has a memory leak.  Can you provide more code detail? 
Ideally, you would make the entire code available by placing it on the
Internet somewhere and providing a URL.  If there's anything sensitive
in the code, like passwords or public IP addresses, feel free to redact
it, but try not to remove anything that affects how the code operates.

Thanks,
Shawn



Re: OutOfMemory when batchupdating from SolrJ

2016-02-19 Thread Susheel Kumar
Clemens,

What I understand from your emails above is that you are creating
SolrInputDocuments in batches inside a loop, and those objects are created on
the heap.  SolrJ/SolrClient doesn't have any control over removing those
objects from the heap; that is handled by garbage collection.  So your program
may end up in a situation where there is no heap memory left, or the GC is not
able to free enough memory, and it hits an OOM, because the loop keeps running
and creating more and more objects.  By default a Java program gets a fairly
small maximum heap unless you set -Xmx when you launch it.

Hope that clarifies.
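
For illustration only, here is a minimal sketch of the batching pattern under
discussion (the core URL and field names are made up, and whether/when to
commit is a separate question).  The point is that the batch list is reused
and cleared after add() returns, so the documents become eligible for garbage
collection:

import java.util.ArrayList;
import java.util.List;
import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.common.SolrInputDocument;

public class BatchIndexer {
    public static void main(String[] args) throws Exception {
        SolrClient solrClient = new HttpSolrClient("http://localhost:8983/solr/mycore");
        List<SolrInputDocument> batch = new ArrayList<>();
        for (int i = 0; i < 100000; i++) {          // stand-in for "got more elements to index"
            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("id", "doc-" + i);
            doc.addField("title_s", "example " + i);
            batch.add(doc);
            if (batch.size() == 100) {
                solrClient.add(batch);              // send the batch to Solr
                batch.clear();                      // drop references so they can be GC'd
            }
        }
        if (!batch.isEmpty()) {
            solrClient.add(batch);                  // send the final partial batch
        }
        solrClient.close();
    }
}

With a pattern like this, only about one batch worth of SolrInputDocuments
should be strongly reachable on the client side at any time.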

On Fri, Feb 19, 2016 at 12:11 PM, Clemens Wyss DEV 
wrote:

> Thanks Susheel,
> but I am having problems in and am talking about SolrJ, i.e. the
> "client-side of Solr" ...
>
> -Ursprüngliche Nachricht-
> Von: Susheel Kumar [mailto:susheel2...@gmail.com]
> Gesendet: Freitag, 19. Februar 2016 17:23
> An: solr-user@lucene.apache.org
> Betreff: Re: OutOfMemory when batchupdating from SolrJ
>
> Clemens,
>
> First allocating higher or right amount of heap memory is not a workaround
> but becomes a requirement depending on how much heap memory your Java
> program needs.
> Please read about why Solr need heap memory at
> https://wiki.apache.org/solr/SolrPerformanceProblems
>
> Thanks,
> Susheel
>
>
>
> On Fri, Feb 19, 2016 at 9:17 AM, Clemens Wyss DEV 
> wrote:
>
> > > increase heap size
> > this is a "workaround"
> >
> > Doesn't SolrClient free part of its buffer? At least documents it has
> > sent to the Solr-Server?
> >
> > -Ursprüngliche Nachricht-
> > Von: Susheel Kumar [mailto:susheel2...@gmail.com]
> > Gesendet: Freitag, 19. Februar 2016 14:42
> > An: solr-user@lucene.apache.org
> > Betreff: Re: OutOfMemory when batchupdating from SolrJ
> >
> > When you run your SolrJ Client Indexing program, can you increase heap
> > size similar below.  I guess it may be on your client side you are
> > running int OOM... or please share the exact error if below doesn't
> > work/is the issue.
> >
> >  java -Xmx4096m 
> >
> >
> > Thanks,
> >
> > Susheel
> >
> > On Fri, Feb 19, 2016 at 6:25 AM, Clemens Wyss DEV
> > 
> > wrote:
> >
> > > Guessing on ;) :
> > > must I commit after every "batch", in order to force a flushing of
> > > org.apache.solr.client.solrj.request.RequestWriter$LazyContentStream
> > > et
> > al?
> > >
> > > OTH it is propagated to NOT "commit" from a (SolrJ) client
> > >
> > > https://lucidworks.com/blog/2013/08/23/understanding-transaction-log
> > > s-
> > > softcommit-and-commit-in-sorlcloud/
> > > 'Be very careful committing from the client! In fact, don’t do it'
> > >
> > > I would not want to commit "just to flush a client side buffer" ...
> > >
> > > -Ursprüngliche Nachricht-
> > > Von: Clemens Wyss DEV [mailto:clemens...@mysign.ch]
> > > Gesendet: Freitag, 19. Februar 2016 11:09
> > > An: solr-user@lucene.apache.org
> > > Betreff: AW: OutOfMemory when batchupdating from SolrJ
> > >
> > > The char[] which occupies 180MB has the following "path to root"
> > >
> > > char[87690841] @ 0x7940ba658   > > name="_my_id">shopproducts#...
> > > |-  java.lang.Thread @ 0x7321d9b80  SolrUtil
> > > |executorService for core 'fust-1-fr_CH_1' -3-thread-1 Thread
> > > |- value java.lang.String @ 0x79e804110   > > |boost="1.0"> > > name="_my_id">shopproducts#...
> > > |  '- str org.apache.solr.common.util.ContentStreamBase$StringStream
> > > | @
> > > 0x77fd84680
> > > | |-  java.lang.Thread @ 0x7321d9b80  SolrUtil
> > > executorService for core 'fust-1-fr_CH_1' -3-thread-1
> > > | |- contentStream
> > > org.apache.solr.client.solrj.request.RequestWriter$LazyContentStream
> > > @
> > > 0x77fd846a0
> > > | |  |-  java.lang.Thread @ 0x7321d9b80  SolrUtil
> > > executorService for core 'fust-1-fr_CH_1' -3-thread-1 Thread
> > > | |  |- [0] org.apache.solr.common.util.ContentStream[1] @
> > 0x79e802fb8
> > > | |  |  '-  java.lang.Thread @ 0x7321d9b80  SolrUtil
> > > |executorService for core 'fust-1-fr_CH_1' -3-thread-1 Thread
> > >
> > > And there is another byte[] with 260MB.
> > >
> > > The logic is somewhat this:
> > >
> > > SolrClient solrClient = new HttpSolrClient( coreUrl ); while ( got
> > > more elements to index ) {
> > >   batch = create 100 SolrInputDocuments
> > >   solrClient.add( batch )
> > >  }
> > >
> > >
> > > -Ursprüngliche Nachricht-
> > > Von: Clemens Wyss DEV [mailto:clemens...@mysign.ch]
> > > Gesendet: Freitag, 19. Februar 2016 09:07
> > > An: solr-user@lucene.apache.org
> > > Betreff: OutOfMemory when batchupdating from SolrJ
> > >
> > > Environment: Solr 5.4.1
> > >
> > > I am facing OOMs when batchupdating SolrJ. I am seeing approx
> > > 30'000(!) SolrInputDocument instances, although my batchsize is 100.
> > > I.e. I call solrClient.add( documents ) for every 100 documents only.
> > > So I'd expect to see at most 100 SolrInputDocument's in memory at
> > > any moment UNLESS

RE: Slow commits

2016-02-19 Thread Adam Neal [Extranet]
I'm out of the office now so I don't have the numbers to hand but from memory I 
think there are probably around 800-1000 fields or so. I will confirm on Monday.

If I have time over the weekend I will try to recreate the problem at home and 
see if I can post up a sample.

From: Yonik Seeley [ysee...@gmail.com]
Sent: 19 February 2016 16:25
To: solr-user@lucene.apache.org
Subject: Re: Slow commits

On Fri, Feb 19, 2016 at 8:51 AM, Adam Neal [Extranet]  wrote:
> I've recently upgraded from 4.10.2 to 5.3.1 and I've hit an issue with slow 
> commits on one of my cores. The core in question is relatively small (56k 
> docs) and the issue only shows when commiting after a number of deletes, 
> commiting after additions is fine. As an example commiting after deleting 
> approximately 10% of the documents takes around 25mins. The same test on the 
> 4.10.2 instance takes around 1 second.
>
> I have done some investigation and the problem appears to be caused by having 
> dynamic fields, the core in question has a large number, performing the same 
> operation on this core with the dynamic fields removed sees a big improvement 
> on the performance with the commit taking 11 seconds (still not quite on a 
> par with 4.10.2).

Dynamic fields is a Solr schema concept, and does not translate to any
differences in Lucene.
You may be hitting something due to a large number of fields (at the
lucene level, each field name is a different field).  How many
different fields (i.e. fieldnames) do you have across the entire
index?

-Yonik



ERROR while updating the record in sorlcoud

2016-02-19 Thread Mugeesh Husain
I am getting the below error while indexing in SolrCloud; I am using the
implicit router.

null:org.apache.solr.common.SolrException: Error trying to proxy request for
url: http:/localhost:8984/solr/Restaurant_Restaurant_2_replica1/update
at 
org.apache.solr.servlet.HttpSolrCall.remoteQuery(HttpSolrCall.java:598)
at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:446)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:210)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:179)
at
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652)
at
org.eclipse.jetty.servlets.UserAgentFilter.doFilter(UserAgentFilter.java:83)
at org.eclipse.jetty.servlets.GzipFilter.doFilter(GzipFilter.java:364)
at
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652)
at
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:585)
at
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
at
org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:577)
at
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:223)
at
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1127)
at
org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:515)
at
org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
at
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1061)
at
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
at
org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:215)
at
org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:110)
at
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97)
at org.eclipse.jetty.server.Server.handle(Server.java:499)
at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:310)
at
org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:257)
at
org.eclipse.jetty.io.AbstractConnection$2.run(AbstractConnection.java:540)
at
org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:635)
at
org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:555)
at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.http.NoHttpResponseException: 45.33.57.46:8984 failed
to respond
at
org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:143)
at
org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:57)
at
org.apache.http.impl.io.AbstractMessageParser.parse(AbstractMessageParser.java:261)
at
org.apache.http.impl.AbstractHttpClientConnection.receiveResponseHeader(AbstractHttpClientConnection.java:283)
at
org.apache.http.impl.conn.DefaultClientConnection.receiveResponseHeader(DefaultClientConnection.java:251)
at
org.apache.http.impl.conn.ManagedClientConnectionImpl.receiveResponseHeader(ManagedClientConnectionImpl.java:197)
at
org.apache.http.protocol.HttpRequestExecutor.doReceiveResponse(HttpRequestExecutor.java:272)
at
org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:124)
at
org.apache.http.impl.client.DefaultRequestDirector.tryExecute(DefaultRequestDirector.java:685)
at
org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:487)
at
org.apache.http.impl.client.AbstractHttpClient.doExecute(AbstractHttpClient.java:882)
at
org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:82)
at
org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:107)
at
org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:55)
at 
org.apache.solr.servlet.HttpSolrCall.remoteQuery(HttpSolrCall.java:565)




--
View this message in context: 
http://lucene.472066.n3.nabble.com/ERROR-while-updating-the-record-in-sorlcoud-tp4258380.html
Sent from the Solr - User mailing list archive at Nabble.com.


AW: OutOfMemory when batchupdating from SolrJ

2016-02-19 Thread Clemens Wyss DEV
Thanks Susheel, 
but I am having problems in and am talking about SolrJ, i.e. the "client-side 
of Solr" ...

-Original Message-
From: Susheel Kumar [mailto:susheel2...@gmail.com] 
Sent: Friday, 19 February 2016 17:23
To: solr-user@lucene.apache.org
Subject: Re: OutOfMemory when batchupdating from SolrJ

Clemens,

First allocating higher or right amount of heap memory is not a workaround but 
becomes a requirement depending on how much heap memory your Java program needs.
Please read about why Solr need heap memory at 
https://wiki.apache.org/solr/SolrPerformanceProblems

Thanks,
Susheel



On Fri, Feb 19, 2016 at 9:17 AM, Clemens Wyss DEV 
wrote:

> > increase heap size
> this is a "workaround"
>
> Doesn't SolrClient free part of its buffer? At least documents it has 
> sent to the Solr-Server?
>
> -Ursprüngliche Nachricht-
> Von: Susheel Kumar [mailto:susheel2...@gmail.com]
> Gesendet: Freitag, 19. Februar 2016 14:42
> An: solr-user@lucene.apache.org
> Betreff: Re: OutOfMemory when batchupdating from SolrJ
>
> When you run your SolrJ Client Indexing program, can you increase heap 
> size similar below.  I guess it may be on your client side you are 
> running int OOM... or please share the exact error if below doesn't 
> work/is the issue.
>
>  java -Xmx4096m 
>
>
> Thanks,
>
> Susheel
>
> On Fri, Feb 19, 2016 at 6:25 AM, Clemens Wyss DEV 
> 
> wrote:
>
> > Guessing on ;) :
> > must I commit after every "batch", in order to force a flushing of 
> > org.apache.solr.client.solrj.request.RequestWriter$LazyContentStream 
> > et
> al?
> >
> > OTH it is propagated to NOT "commit" from a (SolrJ) client
> >
> > https://lucidworks.com/blog/2013/08/23/understanding-transaction-log
> > s-
> > softcommit-and-commit-in-sorlcloud/
> > 'Be very careful committing from the client! In fact, don’t do it'
> >
> > I would not want to commit "just to flush a client side buffer" ...
> >
> > -Ursprüngliche Nachricht-
> > Von: Clemens Wyss DEV [mailto:clemens...@mysign.ch]
> > Gesendet: Freitag, 19. Februar 2016 11:09
> > An: solr-user@lucene.apache.org
> > Betreff: AW: OutOfMemory when batchupdating from SolrJ
> >
> > The char[] which occupies 180MB has the following "path to root"
> >
> > char[87690841] @ 0x7940ba658   > name="_my_id">shopproducts#...
> > |-  java.lang.Thread @ 0x7321d9b80  SolrUtil 
> > |executorService for core 'fust-1-fr_CH_1' -3-thread-1 Thread
> > |- value java.lang.String @ 0x79e804110   > |boost="1.0"> > name="_my_id">shopproducts#...
> > |  '- str org.apache.solr.common.util.ContentStreamBase$StringStream 
> > | @
> > 0x77fd84680
> > | |-  java.lang.Thread @ 0x7321d9b80  SolrUtil
> > executorService for core 'fust-1-fr_CH_1' -3-thread-1
> > | |- contentStream
> > org.apache.solr.client.solrj.request.RequestWriter$LazyContentStream 
> > @
> > 0x77fd846a0
> > | |  |-  java.lang.Thread @ 0x7321d9b80  SolrUtil
> > executorService for core 'fust-1-fr_CH_1' -3-thread-1 Thread
> > | |  |- [0] org.apache.solr.common.util.ContentStream[1] @
> 0x79e802fb8
> > | |  |  '-  java.lang.Thread @ 0x7321d9b80  SolrUtil 
> > |executorService for core 'fust-1-fr_CH_1' -3-thread-1 Thread
> >
> > And there is another byte[] with 260MB.
> >
> > The logic is somewhat this:
> >
> > SolrClient solrClient = new HttpSolrClient( coreUrl ); while ( got 
> > more elements to index ) {
> >   batch = create 100 SolrInputDocuments
> >   solrClient.add( batch )
> >  }
> >
> >
> > -Ursprüngliche Nachricht-
> > Von: Clemens Wyss DEV [mailto:clemens...@mysign.ch]
> > Gesendet: Freitag, 19. Februar 2016 09:07
> > An: solr-user@lucene.apache.org
> > Betreff: OutOfMemory when batchupdating from SolrJ
> >
> > Environment: Solr 5.4.1
> >
> > I am facing OOMs when batchupdating SolrJ. I am seeing approx
> > 30'000(!) SolrInputDocument instances, although my batchsize is 100.
> > I.e. I call solrClient.add( documents ) for every 100 documents only.
> > So I'd expect to see at most 100 SolrInputDocument's in memory at 
> > any moment UNLESS
> > a) solrClient.add is "asynchronous" in its nature. Then 
> > QueryResponse would be an async-result?
> > or
> > b) SolrJ is spooling the documents in client-side
> >
> > What might be going wrong?
> >
> > Thx for your advices
> > Clemens
> >
> >
>


Boost exact search

2016-02-19 Thread Loïc Stéphan
Hello,

 

We are trying to boost exact matches to improve relevance.

We followed this article:
http://everydaydeveloper.blogspot.fr/2012/02/solr-improve-relevancy-by-boosting.html
and this one:
http://stackoverflow.com/questions/29103155/solr-exact-match-boost-over-text-containing-the-exact-match
but it doesn't work for us.

 

What is the best way to do this ?

 

Thanks in advance

 



 


--

LOIC STEPHAN
TMA (Application Maintenance) Manager
www.w-seils.com
lstep...@w-seils.com
Tel +33 (0)2 28 22 75 42


Re: Slow commits

2016-02-19 Thread Yonik Seeley
On Fri, Feb 19, 2016 at 8:51 AM, Adam Neal [Extranet]  wrote:
> I've recently upgraded from 4.10.2 to 5.3.1 and I've hit an issue with slow 
> commits on one of my cores. The core in question is relatively small (56k 
> docs) and the issue only shows when commiting after a number of deletes, 
> commiting after additions is fine. As an example commiting after deleting 
> approximately 10% of the documents takes around 25mins. The same test on the 
> 4.10.2 instance takes around 1 second.
>
> I have done some investigation and the problem appears to be caused by having 
> dynamic fields, the core in question has a large number, performing the same 
> operation on this core with the dynamic fields removed sees a big improvement 
> on the performance with the commit taking 11 seconds (still not quite on a 
> par with 4.10.2).

Dynamic fields is a Solr schema concept, and does not translate to any
differences in Lucene.
You may be hitting something due to a large number of fields (at the
lucene level, each field name is a different field).  How many
different fields (i.e. fieldnames) do you have across the entire
index?
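
For example (illustrative names), a single declaration like

  <dynamicField name="*_s" type="string" indexed="true" stored="true"/>

matched by documents that contain color_s, size_s, brand_s, and so on ends up
as a separate Lucene field for each concrete name, so the per-field cost
depends on the number of distinct field names, not on the number of
<dynamicField> declarations in the schema.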

-Yonik


Re: OutOfMemory when batchupdating from SolrJ

2016-02-19 Thread Susheel Kumar
Clemens,

First allocating higher or right amount of heap memory is not a workaround
but becomes a requirement depending on how much heap memory your Java
program needs.
Please read about why Solr need heap memory at
https://wiki.apache.org/solr/SolrPerformanceProblems

Thanks,
Susheel



On Fri, Feb 19, 2016 at 9:17 AM, Clemens Wyss DEV 
wrote:

> > increase heap size
> this is a "workaround"
>
> Doesn't SolrClient free part of its buffer? At least documents it has sent
> to the Solr-Server?
>
> -Ursprüngliche Nachricht-
> Von: Susheel Kumar [mailto:susheel2...@gmail.com]
> Gesendet: Freitag, 19. Februar 2016 14:42
> An: solr-user@lucene.apache.org
> Betreff: Re: OutOfMemory when batchupdating from SolrJ
>
> When you run your SolrJ Client Indexing program, can you increase heap
> size similar below.  I guess it may be on your client side you are running
> int OOM... or please share the exact error if below doesn't work/is the
> issue.
>
>  java -Xmx4096m 
>
>
> Thanks,
>
> Susheel
>
> On Fri, Feb 19, 2016 at 6:25 AM, Clemens Wyss DEV 
> wrote:
>
> > Guessing on ;) :
> > must I commit after every "batch", in order to force a flushing of
> > org.apache.solr.client.solrj.request.RequestWriter$LazyContentStream et
> al?
> >
> > OTH it is propagated to NOT "commit" from a (SolrJ) client
> >
> > https://lucidworks.com/blog/2013/08/23/understanding-transaction-logs-
> > softcommit-and-commit-in-sorlcloud/
> > 'Be very careful committing from the client! In fact, don’t do it'
> >
> > I would not want to commit "just to flush a client side buffer" ...
> >
> > -Ursprüngliche Nachricht-
> > Von: Clemens Wyss DEV [mailto:clemens...@mysign.ch]
> > Gesendet: Freitag, 19. Februar 2016 11:09
> > An: solr-user@lucene.apache.org
> > Betreff: AW: OutOfMemory when batchupdating from SolrJ
> >
> > The char[] which occupies 180MB has the following "path to root"
> >
> > char[87690841] @ 0x7940ba658   > name="_my_id">shopproducts#...
> > |-  java.lang.Thread @ 0x7321d9b80  SolrUtil
> > |executorService for core 'fust-1-fr_CH_1' -3-thread-1 Thread
> > |- value java.lang.String @ 0x79e804110   > name="_my_id">shopproducts#...
> > |  '- str org.apache.solr.common.util.ContentStreamBase$StringStream @
> > 0x77fd84680
> > | |-  java.lang.Thread @ 0x7321d9b80  SolrUtil
> > executorService for core 'fust-1-fr_CH_1' -3-thread-1
> > | |- contentStream
> > org.apache.solr.client.solrj.request.RequestWriter$LazyContentStream @
> > 0x77fd846a0
> > | |  |-  java.lang.Thread @ 0x7321d9b80  SolrUtil
> > executorService for core 'fust-1-fr_CH_1' -3-thread-1 Thread
> > | |  |- [0] org.apache.solr.common.util.ContentStream[1] @
> 0x79e802fb8
> > | |  |  '-  java.lang.Thread @ 0x7321d9b80  SolrUtil
> > |executorService for core 'fust-1-fr_CH_1' -3-thread-1 Thread
> >
> > And there is another byte[] with 260MB.
> >
> > The logic is somewhat this:
> >
> > SolrClient solrClient = new HttpSolrClient( coreUrl ); while ( got
> > more elements to index ) {
> >   batch = create 100 SolrInputDocuments
> >   solrClient.add( batch )
> >  }
> >
> >
> > -Ursprüngliche Nachricht-
> > Von: Clemens Wyss DEV [mailto:clemens...@mysign.ch]
> > Gesendet: Freitag, 19. Februar 2016 09:07
> > An: solr-user@lucene.apache.org
> > Betreff: OutOfMemory when batchupdating from SolrJ
> >
> > Environment: Solr 5.4.1
> >
> > I am facing OOMs when batchupdating SolrJ. I am seeing approx
> > 30'000(!) SolrInputDocument instances, although my batchsize is 100.
> > I.e. I call solrClient.add( documents ) for every 100 documents only.
> > So I'd expect to see at most 100 SolrInputDocument's in memory at any
> > moment UNLESS
> > a) solrClient.add is "asynchronous" in its nature. Then QueryResponse
> > would be an async-result?
> > or
> > b) SolrJ is spooling the documents in client-side
> >
> > What might be going wrong?
> >
> > Thx for your advices
> > Clemens
> >
> >
>


Re: Cannot talk to ZooKeeper - Updates are disabled.

2016-02-19 Thread Bogdan Marinescu

And is there a workaround?
That JIRA issue is full of ZooKeeper problems. I doubt it will be 
solved anytime soon.



On 02/19/2016 04:38 PM, Binoy Dalal wrote:

There's a JIRA ticket regarding this, and as of yet is unresolved.
https://issues.apache.org/jira/browse/SOLR-3274

On Fri, Feb 19, 2016 at 2:11 PM Bogdan Marinescu <
bogdan.marine...@awinta.com> wrote:


Hi,

  From time to time I get org.apache.solr.common.SolrException: Cannot
talk to ZooKeeper - Updates are disabled.

Most likely when Solr receives a lot of documents. My question is, why
is this happening and how do I get around it?

Stacktrace:
org.apache.solr.common.SolrException: Cannot talk to ZooKeeper - Updates
are disabled.
  at

org.apache.solr.update.processor.DistributedUpdateProcessor.zkCheck(DistributedUpdateProcessor.java:1482)
  at

org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:664)
  at
org.apache.solr.handler.loader.XMLLoader.processUpdate(XMLLoader.java:250)
  at
org.apache.solr.handler.loader.XMLLoader.load(XMLLoader.java:177)
  at

org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:98)
  at

org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
  at

org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:143)
  at org.apache.solr.core.SolrCore.execute(SolrCore.java:2064)
  at
org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:654)
  at
org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:450)
  at

org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:227)
  at

org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:196)
  at

awinta.mdm.solr.filter.AuraSolrDispatchFilter.doFilter(AuraSolrDispatchFilter.java:58)
  at

org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652)
  at
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:585)
  at

org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
  at
org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:553)
  at

org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:223)
  at

org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1127)
  at
org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:515)
  at

org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
  at

org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1061)
  at

org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
  at

org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:215)
  at

org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:110)
  at

org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97)
  at org.eclipse.jetty.server.Server.handle(Server.java:497)
  at
org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:310)
  at
org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:257)
  at
org.eclipse.jetty.io.AbstractConnection$2.run(AbstractConnection.java:540)
  at

org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:635)
  at

org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:555)
  at java.lang.Thread.run(Thread.java:745)

Thanks

Bogdan Marinescu





RE: Slow commits

2016-02-19 Thread Adam Neal [Extranet]
max heap on both instances is the same at 8gig, and only using around 1.5gig at 
the time of testing.

5.3.1 index size is 90MB 
4.10.2  index size is 125MB



From: Shawn Heisey [apa...@elyograg.org]
Sent: 19 February 2016 15:45
To: solr-user@lucene.apache.org
Subject: Re: Slow commits

On 2/19/2016 6:51 AM, Adam Neal [Extranet] wrote:
> I've recently upgraded from 4.10.2 to 5.3.1 and I've hit an issue with slow 
> commits on one of my cores. The core in question is relatively small (56k 
> docs) and the issue only shows when commiting after a number of deletes, 
> commiting after additions is fine. As an example commiting after deleting 
> approximately 10% of the documents takes around 25mins. The same test on the 
> 4.10.2 instance takes around 1 second.

This sounds like one of two possibilities, with a strong possibility
that it's both:

* You are encountering extreme garbage collection pauses from a heap
that's too small.
* The deletes are leading to segment merges that are proceeding slowly.

What is the max heap set to on the new version?  Do you know what it was
set to on the old version?

How much disk space is used by your 56000 document index?

Thanks,
Shawn




Re: Slow commits

2016-02-19 Thread Shawn Heisey
On 2/19/2016 6:51 AM, Adam Neal [Extranet] wrote:
> I've recently upgraded from 4.10.2 to 5.3.1 and I've hit an issue with slow 
> commits on one of my cores. The core in question is relatively small (56k 
> docs) and the issue only shows when commiting after a number of deletes, 
> commiting after additions is fine. As an example commiting after deleting 
> approximately 10% of the documents takes around 25mins. The same test on the 
> 4.10.2 instance takes around 1 second.

This sounds like one of two possibilities, with a strong possibility
that it's both:

* You are encountering extreme garbage collection pauses from a heap
that's too small.
* The deletes are leading to segment merges that are proceeding slowly.

What is the max heap set to on the new version?  Do you know what it was
set to on the old version?

How much disk space is used by your 56000 document index?

Thanks,
Shawn



Re: Cannot talk to ZooKeeper - Updates are disabled.

2016-02-19 Thread Binoy Dalal
There's a JIRA ticket regarding this, and as of yet is unresolved.
https://issues.apache.org/jira/browse/SOLR-3274

On Fri, Feb 19, 2016 at 2:11 PM Bogdan Marinescu <
bogdan.marine...@awinta.com> wrote:

> Hi,
>
>  From time to time I get org.apache.solr.common.SolrException: Cannot
> talk to ZooKeeper - Updates are disabled.
>
> Most likely when Solr receives a lot of documents. My question is, why
> is this happening and how do I get around it?
>
> Stacktrace:
> org.apache.solr.common.SolrException: Cannot talk to ZooKeeper - Updates
> are disabled.
>  at
>
> org.apache.solr.update.processor.DistributedUpdateProcessor.zkCheck(DistributedUpdateProcessor.java:1482)
>  at
>
> org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:664)
>  at
> org.apache.solr.handler.loader.XMLLoader.processUpdate(XMLLoader.java:250)
>  at
> org.apache.solr.handler.loader.XMLLoader.load(XMLLoader.java:177)
>  at
>
> org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:98)
>  at
>
> org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
>  at
>
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:143)
>  at org.apache.solr.core.SolrCore.execute(SolrCore.java:2064)
>  at
> org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:654)
>  at
> org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:450)
>  at
>
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:227)
>  at
>
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:196)
>  at
>
> awinta.mdm.solr.filter.AuraSolrDispatchFilter.doFilter(AuraSolrDispatchFilter.java:58)
>  at
>
> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652)
>  at
> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:585)
>  at
>
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
>  at
> org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:553)
>  at
>
> org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:223)
>  at
>
> org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1127)
>  at
> org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:515)
>  at
>
> org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
>  at
>
> org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1061)
>  at
>
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
>  at
>
> org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:215)
>  at
>
> org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:110)
>  at
>
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97)
>  at org.eclipse.jetty.server.Server.handle(Server.java:497)
>  at
> org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:310)
>  at
> org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:257)
>  at
> org.eclipse.jetty.io.AbstractConnection$2.run(AbstractConnection.java:540)
>  at
>
> org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:635)
>  at
>
> org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:555)
>  at java.lang.Thread.run(Thread.java:745)
>
> Thanks
>
> Bogdan Marinescu
>
-- 
Regards,
Binoy Dalal


AW: OutOfMemory when batchupdating from SolrJ

2016-02-19 Thread Clemens Wyss DEV
> increase heap size
this is a "workaround"

Doesn't SolrClient free part of its buffer? At least documents it has sent to 
the Solr-Server? 

-Original Message-
From: Susheel Kumar [mailto:susheel2...@gmail.com] 
Sent: Friday, 19 February 2016 14:42
To: solr-user@lucene.apache.org
Subject: Re: OutOfMemory when batchupdating from SolrJ

When you run your SolrJ client indexing program, can you increase the heap size 
as shown below?  I guess it may be on your client side that you are running into 
an OOM... or please share the exact error if the below doesn't work / isn't the issue.

 java -Xmx4096m 


Thanks,

Susheel

On Fri, Feb 19, 2016 at 6:25 AM, Clemens Wyss DEV 
wrote:

> Guessing on ;) :
> must I commit after every "batch", in order to force a flushing of 
> org.apache.solr.client.solrj.request.RequestWriter$LazyContentStream et al?
>
> OTH it is propagated to NOT "commit" from a (SolrJ) client
>
> https://lucidworks.com/blog/2013/08/23/understanding-transaction-logs-
> softcommit-and-commit-in-sorlcloud/
> 'Be very careful committing from the client! In fact, don’t do it'
>
> I would not want to commit "just to flush a client side buffer" ...
>
> -Ursprüngliche Nachricht-
> Von: Clemens Wyss DEV [mailto:clemens...@mysign.ch]
> Gesendet: Freitag, 19. Februar 2016 11:09
> An: solr-user@lucene.apache.org
> Betreff: AW: OutOfMemory when batchupdating from SolrJ
>
> The char[] which occupies 180MB has the following "path to root"
>
> char[87690841] @ 0x7940ba658   name="_my_id">shopproducts#...
> |-  java.lang.Thread @ 0x7321d9b80  SolrUtil 
> |executorService for core 'fust-1-fr_CH_1' -3-thread-1 Thread
> |- value java.lang.String @ 0x79e804110   name="_my_id">shopproducts#...
> |  '- str org.apache.solr.common.util.ContentStreamBase$StringStream @
> 0x77fd84680
> | |-  java.lang.Thread @ 0x7321d9b80  SolrUtil
> executorService for core 'fust-1-fr_CH_1' -3-thread-1
> | |- contentStream
> org.apache.solr.client.solrj.request.RequestWriter$LazyContentStream @
> 0x77fd846a0
> | |  |-  java.lang.Thread @ 0x7321d9b80  SolrUtil
> executorService for core 'fust-1-fr_CH_1' -3-thread-1 Thread
> | |  |- [0] org.apache.solr.common.util.ContentStream[1] @ 0x79e802fb8
> | |  |  '-  java.lang.Thread @ 0x7321d9b80  SolrUtil 
> |executorService for core 'fust-1-fr_CH_1' -3-thread-1 Thread
>
> And there is another byte[] with 260MB.
>
> The logic is somewhat this:
>
> SolrClient solrClient = new HttpSolrClient( coreUrl ); while ( got 
> more elements to index ) {
>   batch = create 100 SolrInputDocuments
>   solrClient.add( batch )
>  }
>
>
> -Ursprüngliche Nachricht-
> Von: Clemens Wyss DEV [mailto:clemens...@mysign.ch]
> Gesendet: Freitag, 19. Februar 2016 09:07
> An: solr-user@lucene.apache.org
> Betreff: OutOfMemory when batchupdating from SolrJ
>
> Environment: Solr 5.4.1
>
> I am facing OOMs when batchupdating SolrJ. I am seeing approx 
> 30'000(!) SolrInputDocument instances, although my batchsize is 100. 
> I.e. I call solrClient.add( documents ) for every 100 documents only. 
> So I'd expect to see at most 100 SolrInputDocument's in memory at any 
> moment UNLESS
> a) solrClient.add is "asynchronous" in its nature. Then QueryResponse 
> would be an async-result?
> or
> b) SolrJ is spooling the documents in client-side
>
> What might be going wrong?
>
> Thx for your advices
> Clemens
>
>


RE: Slow commits

2016-02-19 Thread Adam Neal [Extranet]
Just some additional information: the problem mainly occurs when the dynamic fields 
are stored. Just having them indexed reduces the commit time to around 20 
seconds. Unfortunately, I need them stored.

From: Adam Neal [Extranet] [an...@mass.co.uk]
Sent: 19 February 2016 13:51
To: solr-user@lucene.apache.org
Subject: Slow commits


I've recently upgraded from 4.10.2 to 5.3.1 and I've hit an issue with slow 
commits on one of my cores. The core in question is relatively small (56k docs) 
and the issue only shows when committing after a number of deletes; committing 
after additions is fine. As an example, committing after deleting approximately 
10% of the documents takes around 25 minutes. The same test on the 4.10.2 instance 
takes around 1 second.

I have done some investigation and the problem appears to be caused by having 
dynamic fields; the core in question has a large number of them. Performing the same 
operation on this core with the dynamic fields removed gives a big improvement, 
with the commit taking 11 seconds (still not quite on a par with 4.10.2).

At the moment for this core it is quicker in 5.3.1 to delete everything and 
reindex the data than it is to delete old documents.

Are there any changes that may have caused this or anything I should be doing 
differently in 5.3.1?

I intend to download 5.4.1 next week to see if that improves things.

Thanks

Adam




Slow commits

2016-02-19 Thread Adam Neal [Extranet]
I've recently upgraded from 4.10.2 to 5.3.1 and I've hit an issue with slow 
commits on one of my cores. The core in question is relatively small (56k docs) 
and the issue only shows when committing after a number of deletes; committing 
after additions is fine. As an example, committing after deleting approximately 
10% of the documents takes around 25 minutes. The same test on the 4.10.2 instance 
takes around 1 second.

I have done some investigation and the problem appears to be caused by having 
dynamic fields; the core in question has a large number of them. Performing the same 
operation on this core with the dynamic fields removed gives a big improvement, 
with the commit taking 11 seconds (still not quite on a par with 4.10.2).

At the moment for this core it is quicker in 5.3.1 to delete everything and 
reindex the data than it is to delete old documents.

Are there any changes that may have caused this or anything I should be doing 
differently in 5.3.1?

I intend to download 5.4.1 next week to see if that improves things.

Thanks

Adam



Re: OutOfMemory when batchupdating from SolrJ

2016-02-19 Thread Susheel Kumar
And if it is on the Solr side, please increase the heap size there:
https://cwiki.apache.org/confluence/display/solr/JVM+Settings

On Fri, Feb 19, 2016 at 8:42 AM, Susheel Kumar 
wrote:

> When you run your SolrJ client indexing program, can you increase the heap size
> as shown below? I guess it may be on your client side that you are running into
> OOM... Please share the exact error if the setting below doesn't work or isn't
> the issue.
>
>  java -Xmx4096m 
>
>
> Thanks,
>
> Susheel
>
> On Fri, Feb 19, 2016 at 6:25 AM, Clemens Wyss DEV 
> wrote:
>
>> Guessing on ;) :
>> must I commit after every "batch", in order to force a flushing of
>> org.apache.solr.client.solrj.request.RequestWriter$LazyContentStream et al?
>>
>> OTH it is propagated to NOT "commit" from a (SolrJ) client
>>
>> https://lucidworks.com/blog/2013/08/23/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/
>> 'Be very careful committing from the client! In fact, don’t do it'
>>
>> I would not want to commit "just to flush a client side buffer" ...
>>
>> -Ursprüngliche Nachricht-
>> Von: Clemens Wyss DEV [mailto:clemens...@mysign.ch]
>> Gesendet: Freitag, 19. Februar 2016 11:09
>> An: solr-user@lucene.apache.org
>> Betreff: AW: OutOfMemory when batchupdating from SolrJ
>>
>> The char[] which occupies 180MB has the following "path to root"
>>
>> char[87690841] @ 0x7940ba658  > name="_my_id">shopproducts#...
>> |-  java.lang.Thread @ 0x7321d9b80  SolrUtil executorService
>> |for core 'fust-1-fr_CH_1' -3-thread-1 Thread
>> |- value java.lang.String @ 0x79e804110  > name="_my_id">shopproducts#...
>> |  '- str org.apache.solr.common.util.ContentStreamBase$StringStream @
>> 0x77fd84680
>> | |-  java.lang.Thread @ 0x7321d9b80  SolrUtil
>> executorService for core 'fust-1-fr_CH_1' -3-thread-1
>> | |- contentStream
>> org.apache.solr.client.solrj.request.RequestWriter$LazyContentStream @
>> 0x77fd846a0
>> | |  |-  java.lang.Thread @ 0x7321d9b80  SolrUtil
>> executorService for core 'fust-1-fr_CH_1' -3-thread-1 Thread
>> | |  |- [0] org.apache.solr.common.util.ContentStream[1] @ 0x79e802fb8
>> | |  |  '-  java.lang.Thread @ 0x7321d9b80  SolrUtil
>> |executorService for core 'fust-1-fr_CH_1' -3-thread-1 Thread
>>
>> And there is another byte[] with 260MB.
>>
>> The logic is somewhat this:
>>
>> SolrClient solrClient = new HttpSolrClient( coreUrl ); while ( got more
>> elements to index ) {
>>   batch = create 100 SolrInputDocuments
>>   solrClient.add( batch )
>>  }
>>
>>
>> -Ursprüngliche Nachricht-
>> Von: Clemens Wyss DEV [mailto:clemens...@mysign.ch]
>> Gesendet: Freitag, 19. Februar 2016 09:07
>> An: solr-user@lucene.apache.org
>> Betreff: OutOfMemory when batchupdating from SolrJ
>>
>> Environment: Solr 5.4.1
>>
>> I am facing OOMs when batchupdating SolrJ. I am seeing approx 30'000(!)
>> SolrInputDocument instances, although my batchsize is 100. I.e. I call
>> solrClient.add( documents ) for every 100 documents only. So I'd expect to
>> see at most 100 SolrInputDocument's in memory at any moment UNLESS
>> a) solrClient.add is "asynchronous" in its nature. Then QueryResponse
>> would be an async-result?
>> or
>> b) SolrJ is spooling the documents in client-side
>>
>> What might be going wrong?
>>
>> Thx for your advices
>> Clemens
>>
>>
>


Re: OutOfMemory when batchupdating from SolrJ

2016-02-19 Thread Susheel Kumar
When you run your SolrJ client indexing program, can you increase the heap size
as shown below? I guess it may be on your client side that you are running into
OOM... Please share the exact error if the setting below doesn't work or isn't the issue.

 java -Xmx4096m 


Thanks,

Susheel

On Fri, Feb 19, 2016 at 6:25 AM, Clemens Wyss DEV 
wrote:

> Guessing on ;) :
> must I commit after every "batch", in order to force a flushing of
> org.apache.solr.client.solrj.request.RequestWriter$LazyContentStream et al?
>
> OTH it is propagated to NOT "commit" from a (SolrJ) client
>
> https://lucidworks.com/blog/2013/08/23/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/
> 'Be very careful committing from the client! In fact, don’t do it'
>
> I would not want to commit "just to flush a client side buffer" ...
>
> -Ursprüngliche Nachricht-
> Von: Clemens Wyss DEV [mailto:clemens...@mysign.ch]
> Gesendet: Freitag, 19. Februar 2016 11:09
> An: solr-user@lucene.apache.org
> Betreff: AW: OutOfMemory when batchupdating from SolrJ
>
> The char[] which occupies 180MB has the following "path to root"
>
> char[87690841] @ 0x7940ba658   name="_my_id">shopproducts#...
> |-  java.lang.Thread @ 0x7321d9b80  SolrUtil executorService
> |for core 'fust-1-fr_CH_1' -3-thread-1 Thread
> |- value java.lang.String @ 0x79e804110   name="_my_id">shopproducts#...
> |  '- str org.apache.solr.common.util.ContentStreamBase$StringStream @
> 0x77fd84680
> | |-  java.lang.Thread @ 0x7321d9b80  SolrUtil
> executorService for core 'fust-1-fr_CH_1' -3-thread-1
> | |- contentStream
> org.apache.solr.client.solrj.request.RequestWriter$LazyContentStream @
> 0x77fd846a0
> | |  |-  java.lang.Thread @ 0x7321d9b80  SolrUtil
> executorService for core 'fust-1-fr_CH_1' -3-thread-1 Thread
> | |  |- [0] org.apache.solr.common.util.ContentStream[1] @ 0x79e802fb8
> | |  |  '-  java.lang.Thread @ 0x7321d9b80  SolrUtil
> |executorService for core 'fust-1-fr_CH_1' -3-thread-1 Thread
>
> And there is another byte[] with 260MB.
>
> The logic is somewhat this:
>
> SolrClient solrClient = new HttpSolrClient( coreUrl ); while ( got more
> elements to index ) {
>   batch = create 100 SolrInputDocuments
>   solrClient.add( batch )
>  }
>
>
> -Ursprüngliche Nachricht-
> Von: Clemens Wyss DEV [mailto:clemens...@mysign.ch]
> Gesendet: Freitag, 19. Februar 2016 09:07
> An: solr-user@lucene.apache.org
> Betreff: OutOfMemory when batchupdating from SolrJ
>
> Environment: Solr 5.4.1
>
> I am facing OOMs when batchupdating SolrJ. I am seeing approx 30'000(!)
> SolrInputDocument instances, although my batchsize is 100. I.e. I call
> solrClient.add( documents ) for every 100 documents only. So I'd expect to
> see at most 100 SolrInputDocument's in memory at any moment UNLESS
> a) solrClient.add is "asynchronous" in its nature. Then QueryResponse
> would be an async-result?
> or
> b) SolrJ is spooling the documents in client-side
>
> What might be going wrong?
>
> Thx for your advices
> Clemens
>
>


AW: OutOfMemory when batchupdating from SolrJ

2016-02-19 Thread Clemens Wyss DEV
Just guessing ;) :
must I commit after every "batch" in order to force a flushing of 
org.apache.solr.client.solrj.request.RequestWriter$LazyContentStream et al? 

OTOH, it is recommended NOT to "commit" from a (SolrJ) client:
https://lucidworks.com/blog/2013/08/23/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/
'Be very careful committing from the client! In fact, don’t do it'

I would not want to commit "just to flush a client-side buffer" ...
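
One way to sidestep an explicit client-side commit in such a loop is commitWithin;
a minimal, self-contained sketch (the core URL and field names are illustrative
assumptions, not taken from this indexer):

import java.util.ArrayList;
import java.util.List;

import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.common.SolrInputDocument;

public class CommitWithinSketch {
    public static void main(String[] args) throws Exception {
        // Hypothetical core URL
        try (HttpSolrClient solrClient = new HttpSolrClient("http://localhost:8983/solr/mycore")) {
            List<SolrInputDocument> batch = new ArrayList<>();
            for (int i = 0; i < 100; i++) {
                SolrInputDocument doc = new SolrInputDocument();
                doc.addField("id", "doc-" + i);
                batch.add(doc);
            }
            // Send the batch and ask Solr to commit it within 60 seconds;
            // no solrClient.commit() is issued from the client.
            solrClient.add(batch, 60_000);
        }
    }
}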

-Ursprüngliche Nachricht-
Von: Clemens Wyss DEV [mailto:clemens...@mysign.ch] 
Gesendet: Freitag, 19. Februar 2016 11:09
An: solr-user@lucene.apache.org
Betreff: AW: OutOfMemory when batchupdating from SolrJ

The char[] which occupies 180MB has the following "path to root"

char[87690841] @ 0x7940ba658  shopproducts#...
|-  java.lang.Thread @ 0x7321d9b80  SolrUtil executorService 
|for core 'fust-1-fr_CH_1' -3-thread-1 Thread
|- value java.lang.String @ 0x79e804110  shopproducts#...
|  '- str org.apache.solr.common.util.ContentStreamBase$StringStream @ 
0x77fd84680
| |-  java.lang.Thread @ 0x7321d9b80  SolrUtil executorService 
for core 'fust-1-fr_CH_1' -3-thread-1
| |- contentStream 
org.apache.solr.client.solrj.request.RequestWriter$LazyContentStream @ 
0x77fd846a0
| |  |-  java.lang.Thread @ 0x7321d9b80  SolrUtil 
executorService for core 'fust-1-fr_CH_1' -3-thread-1 Thread
| |  |- [0] org.apache.solr.common.util.ContentStream[1] @ 0x79e802fb8
| |  |  '-  java.lang.Thread @ 0x7321d9b80  SolrUtil 
|executorService for core 'fust-1-fr_CH_1' -3-thread-1 Thread

And there is another byte[] with 260MB.

The logic is somewhat this:

SolrClient solrClient = new HttpSolrClient( coreUrl ); while ( got more 
elements to index ) {
  batch = create 100 SolrInputDocuments
  solrClient.add( batch )
 }


-Ursprüngliche Nachricht-
Von: Clemens Wyss DEV [mailto:clemens...@mysign.ch]
Gesendet: Freitag, 19. Februar 2016 09:07
An: solr-user@lucene.apache.org
Betreff: OutOfMemory when batchupdating from SolrJ

Environment: Solr 5.4.1

I am facing OOMs when batch updating from SolrJ. I am seeing approx 30'000(!) 
SolrInputDocument instances, although my batch size is 100, i.e. I call 
solrClient.add( documents ) for every 100 documents only. So I'd expect to see 
at most 100 SolrInputDocuments in memory at any moment UNLESS
a) solrClient.add is "asynchronous" in its nature (then QueryResponse would be 
an async result?)
or
b) SolrJ is spooling the documents on the client side

What might be going wrong?

Thx for your advices
Clemens



Re: SOLR ranking

2016-02-19 Thread Alessandro Benedetti
Ok Binoy, now it is clearer :)
Yes, if we add sorting and faceting as additional optional requirements, doing
2 queries could be a perilous path!

Cheers

On 19 February 2016 at 09:24, Ere Maijala  wrote:

> If he needs faceting or something (I didn't see that specified), doing two
> queries won't do, of course..
>
> --Ere
>
>
> 19.2.2016, 2.22, Binoy Dalal kirjoitti:
>
>> Hi Alessandro,
>> Don't get me wrong. Using mm, ps and pf can and absolutely will solve his
>> problem.
>>
>> Like I said above, my solution is meant to be a quick and dirty fix. It's
>> really not that complex and shouldn't take more than an hour to setup at
>> the app level. Moreover I suggested it because he said it was urgent for
>> him and setting up a proper config with mm, pf and ps might take him much
>> longer.
>>
>> Hope this clears things up :)
>>
>> On Fri, 19 Feb 2016, 05:31 Alessandro Benedetti 
>> wrote:
>>
>> Hey Binoi ,
>>> can't understand why such complexity to be honest :/
>>> Can you explain me why playing with :
>>>
>>> edismax
>>> mm ( percentage of query terms you want to be in the results)
>>> pf ( the fields you want to be boosted if phrase matches )
>>> ps ( slop to allow)
>>>
>>> Should not solve the problem instead of the 2 phases query ?
>>>
>>> Cheers
>>>
>>> On 18 February 2016 at 18:09, Binoy Dalal 
>>> wrote:
>>>
>>> Here's an alternative solution that may be of some help.
 Here I'm assuming that you are not directly outputting the search
 results
 to the user and have some sort of layer between the results from solr
 and
 presentation to the user where some additional processing can be

>>> performed.
>>>

 1) You already know that you want phrase matches to show up higher than
 single matches. In this case, why not do an explicit phrase match first,
 with some slop or as is based on how close you want the phrase terms be

>>> to
>>>
 each other.
 2) Once you have the results from the first query, fire an OR query with
 your terms and get those results.
 3) Put results from (2) after (1) and present to the user. This happens

>>> in
>>>
 the app layer.

 This is essentially the same as running a query as such: "Rheumatoid
 Arthritis"~slop OR (Rhuematoid AND Arthritis) but you don't need to
 worry
 about the ordering because you're sorting your results.

 Now, this will obviously take more time since you're querying twice and
 then doing the addtional processing in the app layer, but provided your
 architecture is balanced enough and can cope with a little extra load, I

>>> do
>>>
 not think that your performance will take that bad a hit. Moreover since
 you're in a hurry, you could implement this as a quick and dirty
 solution
 to meet the project goals, provided it fits the acceptance parameters
 and
 then later play around with the scoring/sorting and figure out the best
 possible setup to suit your needs.

 On Thu, Feb 18, 2016 at 4:22 PM Emir Arnautovic <
 emir.arnauto...@sematext.com> wrote:

 Hi Nitin,
> Can you send us how your parsed query looks like (from debug output).
>
> Thanks,
> Emir
>
> On 17.02.2016 08:38, Nitin.K wrote:
>
>> Hi Binoy,
>>
>> We are searching for both phrases and individual words
>> but we want that only those documents which are having phrases will
>>
> come

> first in the order and then the individual app.
>>
>> termPositions = true is also not working in my case.
>>
>> I have also removed the string type from copy fields. kindly look
>>
> into
>>>
 the
>
>> changed configuration below:
>>
>> Hi Emir,
>>
>> I have changed the cofiguration as per your suggestion, added pf2 /
>>
> pf3.

> Yes, i saw the difference but still the ranking is not getting
>>
> followed
>>>
 correctly in case of phrases.
>>
>> Changed configuration;
>>
>> >
> stored="true"
>
>> />
>> >
> stored="false"
>>>
 />
>
>>
>> > stored="true"/>
>> >
> stored="false"/>
>
>>
>> > multiValued="true"/>
>> >
> stored="false"
>>>
 multiValued="true"/>
>>
>> > multiValued="true"/>
>> >
> stored="false"

> multiValued="true"/>
>>
>> >
> stored="false"/>

>
>> Copy fields again for the reference :
>>
>> 
>> 
>> 
>> 
>> 
>>
>> Added following field type:
>>
>> > positionIncrementGap="100" omitNorms="true">
>>
>>
>>>
> ignoreCase="true"
>>>
 words="stopwords.txt" />
>>
>>
>> 
>>
>> Removed the string type from the copy fields.
>>

AW: OutOfMemory when batchupdating from SolrJ

2016-02-19 Thread Clemens Wyss DEV
The char[] which occupies 180MB has the following "path to root"

char[87690841] @ 0x7940ba658  shopproducts#...
|-  java.lang.Thread @ 0x7321d9b80  SolrUtil executorService for 
core 'fust-1-fr_CH_1' -3-thread-1 Thread
|- value java.lang.String @ 0x79e804110  shopproducts#...
|  '- str org.apache.solr.common.util.ContentStreamBase$StringStream @ 
0x77fd84680
| |-  java.lang.Thread @ 0x7321d9b80  SolrUtil executorService 
for core 'fust-1-fr_CH_1' -3-thread-1
| |- contentStream 
org.apache.solr.client.solrj.request.RequestWriter$LazyContentStream @ 
0x77fd846a0
| |  |-  java.lang.Thread @ 0x7321d9b80  SolrUtil 
executorService for core 'fust-1-fr_CH_1' -3-thread-1 Thread
| |  |- [0] org.apache.solr.common.util.ContentStream[1] @ 0x79e802fb8
| |  |  '-  java.lang.Thread @ 0x7321d9b80  SolrUtil 
executorService for core 'fust-1-fr_CH_1' -3-thread-1 Thread

And there is another byte[] with 260MB.

The logic is somewhat this:

SolrClient solrClient = new HttpSolrClient( coreUrl );
while ( got more elements to index )
{
  batch = create 100 SolrInputDocuments
  solrClient.add( batch )
 }
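
For reference, a fully spelled-out version of that loop; HttpSolrClient.add(...)
performs a synchronous HTTP request, so in principle only the current batch should
be reachable from client code. A minimal sketch (the core URL, document count and
field names are illustrative assumptions):

import java.util.ArrayList;
import java.util.List;

import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.common.SolrInputDocument;

public class BatchIndexerSketch {

    // Illustrative stand-in for "create 100 SolrInputDocuments".
    static SolrInputDocument toDoc(int i) {
        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("_my_id", "shopproducts#" + i);
        return doc;
    }

    public static void main(String[] args) throws Exception {
        String coreUrl = "http://localhost:8983/solr/mycore"; // assumption
        int total = 100_000;                                  // "got more elements to index"
        int batchSize = 100;

        try (HttpSolrClient solrClient = new HttpSolrClient(coreUrl)) {
            List<SolrInputDocument> batch = new ArrayList<>(batchSize);
            for (int i = 0; i < total; i++) {
                batch.add(toDoc(i));
                if (batch.size() == batchSize) {
                    solrClient.add(batch); // blocks until the HTTP request completes
                    batch.clear();         // drop our references so the docs can be GC'd
                }
            }
            if (!batch.isEmpty()) {
                solrClient.add(batch);
            }
        }
    }
}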


-Ursprüngliche Nachricht-
Von: Clemens Wyss DEV [mailto:clemens...@mysign.ch] 
Gesendet: Freitag, 19. Februar 2016 09:07
An: solr-user@lucene.apache.org
Betreff: OutOfMemory when batchupdating from SolrJ

Environment: Solr 5.4.1

I am facing OOMs when batch updating from SolrJ. I am seeing approx 30'000(!) 
SolrInputDocument instances, although my batch size is 100, i.e. I call 
solrClient.add( documents ) for every 100 documents only. So I'd expect to see 
at most 100 SolrInputDocuments in memory at any moment UNLESS 
a) solrClient.add is "asynchronous" in its nature (then QueryResponse would be 
an async result?) 
or 
b) SolrJ is spooling the documents on the client side

What might be going wrong?

Thx for your advices
Clemens



Re: SOLR ranking

2016-02-19 Thread Ere Maijala
If he needs faceting or something (I didn't see that specified), doing 
two queries won't do, of course..


--Ere

19.2.2016, 2.22, Binoy Dalal kirjoitti:

Hi Alessandro,
Don't get me wrong. Using mm, ps and pf can and absolutely will solve his
problem.

Like I said above, my solution is meant to be a quick and dirty fix. It's
really not that complex and shouldn't take more than an hour to set up at
the app level. Moreover, I suggested it because he said it was urgent for
him, and setting up a proper config with mm, pf and ps might take him much
longer.

Hope this clears things up :)

On Fri, 19 Feb 2016, 05:31 Alessandro Benedetti 
wrote:


Hey Binoy,
I can't understand why such complexity, to be honest :/
Can you explain to me why playing with:

edismax
mm (percentage of query terms you want to be in the results)
pf (the fields you want to be boosted if the phrase matches)
ps (slop to allow)

should not solve the problem, instead of the 2-phase query?

Cheers
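
For what it's worth, a minimal SolrJ sketch of that edismax approach (the core
name, field names and boosts are only illustrative, loosely borrowed from the
query later in this thread):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.response.QueryResponse;

public class EdismaxSketch {
    public static void main(String[] args) throws Exception {
        try (HttpSolrClient solr = new HttpSolrClient("http://localhost:8983/solr/tgl")) {
            SolrQuery q = new SolrQuery("rheumatoid arthritis");
            q.set("defType", "edismax");
            q.set("qf", "topic_title^100 subtopic_title^40 content^3"); // fields to search
            q.set("pf", "topic_title^200 subtopic_title^80 content^6"); // boost phrase matches
            q.set("ps", "1");    // phrase slop
            q.set("mm", "100%"); // require all query terms to match
            QueryResponse rsp = solr.query(q);
            System.out.println(rsp.getResults().getNumFound() + " hits");
        }
    }
}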

On 18 February 2016 at 18:09, Binoy Dalal  wrote:


Here's an alternative solution that may be of some help.
Here I'm assuming that you are not directly outputting the search results
to the user and have some sort of layer between the results from solr and
presentation to the user where some additional processing can be performed.

1) You already know that you want phrase matches to show up higher than
single matches. In this case, why not do an explicit phrase match first,
with some slop or as-is, based on how close you want the phrase terms to be
to each other.
2) Once you have the results from the first query, fire an OR query with
your terms and get those results.
3) Put results from (2) after (1) and present to the user. This happens in
the app layer.

This is essentially the same as running a query as such: "Rheumatoid
Arthritis"~slop OR (Rheumatoid AND Arthritis), but you don't need to worry
about the ordering because you're sorting your results.

Now, this will obviously take more time since you're querying twice and
then doing the additional processing in the app layer, but provided your
architecture is balanced enough and can cope with a little extra load, I do
not think that your performance will take that bad a hit. Moreover, since
you're in a hurry, you could implement this as a quick and dirty solution
to meet the project goals, provided it fits the acceptance parameters, and
then later play around with the scoring/sorting and figure out the best
possible setup to suit your needs.
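
A rough SolrJ sketch of this two-query merge (the core name, query text and the
uniqueKey field "id" are assumptions; deduplication keeps the phrase hits on top):

import java.util.LinkedHashMap;
import java.util.Map;

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.common.SolrDocument;

public class TwoPassSearchSketch {
    public static void main(String[] args) throws Exception {
        try (HttpSolrClient solr = new HttpSolrClient("http://localhost:8983/solr/tgl")) {
            // Keyed by uniqueKey so phrase hits win and duplicates from pass 2 are dropped.
            Map<Object, SolrDocument> merged = new LinkedHashMap<>();

            // Pass 1: phrase match with a little slop -> these go first.
            SolrQuery phrase = new SolrQuery("\"rheumatoid arthritis\"~2");
            phrase.setRows(100);
            for (SolrDocument d : solr.query(phrase).getResults()) {
                merged.putIfAbsent(d.getFieldValue("id"), d);
            }

            // Pass 2: plain OR query -> appended after the phrase hits.
            SolrQuery terms = new SolrQuery("rheumatoid OR arthritis");
            terms.setRows(100);
            for (SolrDocument d : solr.query(terms).getResults()) {
                merged.putIfAbsent(d.getFieldValue("id"), d);
            }

            merged.values().forEach(d -> System.out.println(d.getFieldValue("id")));
        }
    }
}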

On Thu, Feb 18, 2016 at 4:22 PM Emir Arnautovic <
emir.arnauto...@sematext.com> wrote:


Hi Nitin,
Can you send us how your parsed query looks like (from debug output).

Thanks,
Emir

On 17.02.2016 08:38, Nitin.K wrote:

Hi Binoy,

We are searching for both phrases and individual words, but we want the
documents containing the phrase to come first in the order, and the
individual-word matches after them.

termPositions = true is also not working in my case.

I have also removed the string type from the copy fields. Kindly look into
the changed configuration below:

Hi Emir,

I have changed the configuration as per your suggestion and added pf2 / pf3.

Yes, I saw the difference, but the ranking is still not being applied
correctly in the case of phrases.

Changed configuration:


stored="true"

/>

stored="false"

/>




stored="false"/>




stored="false"

multiValued="true"/>



stored="false"

multiValued="true"/>


stored="false"/>


Copy fields again for the reference :







Added following field type:


   
   
   
ignoreCase="true"

words="stopwords.txt" />
   
   


Removed the string type from the copy fields.

Changed Query :







http://localhost:8983/solr/tgl/select?q=rheumatoid%20arthritis=xml=1.0=200=AND=true=edismax=true=true=true;

pf=topTitle^200 subtopTitle^80 indTerm^40 drugString^30 tglData^6&
pf2=topTitle^200 subtopTitle^80 indTerm^40 drugString^30 tglData^6&
pf3=topTitle^200 subtopTitle^80 indTerm^40 drugString^30 tglData^6&
qf=topic_title^100 subtopic_title^40 index_term^20 drug^15 content^3

After making these changes, I am able to get my search results correctly
for a single term, but in the case of a phrase search I am still not able to
get the results in the correct order.

Hi Modassar,

I tried using mm=100, but the order is still the same.

Hi Alessandro,

I have not yet tried the slop parameter. By default it is taken as 1.0, as I
saw when I looked at it in debug mode. I will definitely revert to you, so let
me try this option too.

All,

Please suggest if anyone has any other suggestion on this. I have to
implement it on an urgent basis, and I think I am very close to it. Thanks to
all of you. I have reached this level just because of you guys.

Thanks and Regards,
Nitin





Re: Hitting complex multilevel pivot queries in solr

2016-02-19 Thread Alvaro Cabrerizo
Hi,

The only way I can imagine is to create that auxiliary field and perform
the facet on it. It means that you have to know "a priori" the kind of
report (facet field) you need.

For example, if your current data (SolrDocument) is:

{
   "id": 3757,
   "country": "CountryX",
   "state": "StateY",
   "part_num: "part_numZ",
   "part_code": "part_codeW"
}

It should be changed at index time to:

{
   "id": 3757,
   "country": "CountryX",
   "state": "StateY",
   "part_num: "part_numZ",
   "part_code": "part_codeW",
   "auxField": "CountryX StateY part_numZ part_codeW"
}

And then perform the query faceting by auxField.
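
If the concatenation should happen inside Solr at index time, a bare-bones custom
update processor could look roughly like this (the class name and the hard-coded
field names are assumptions); it would then be wired into an
updateRequestProcessorChain in solrconfig.xml:

import java.io.IOException;

import org.apache.solr.common.SolrInputDocument;
import org.apache.solr.request.SolrQueryRequest;
import org.apache.solr.response.SolrQueryResponse;
import org.apache.solr.update.AddUpdateCommand;
import org.apache.solr.update.processor.UpdateRequestProcessor;
import org.apache.solr.update.processor.UpdateRequestProcessorFactory;

public class ConcatAuxFieldProcessorFactory extends UpdateRequestProcessorFactory {

    @Override
    public UpdateRequestProcessor getInstance(SolrQueryRequest req, SolrQueryResponse rsp,
                                              UpdateRequestProcessor next) {
        return new UpdateRequestProcessor(next) {
            @Override
            public void processAdd(AddUpdateCommand cmd) throws IOException {
                SolrInputDocument doc = cmd.getSolrInputDocument();
                // Concatenate the four pivot fields into one auxiliary facet field
                // (null handling omitted for brevity in this sketch).
                String aux = doc.getFieldValue("country") + " "
                        + doc.getFieldValue("state") + " "
                        + doc.getFieldValue("part_num") + " "
                        + doc.getFieldValue("part_code");
                doc.setField("auxField", aux);
                super.processAdd(cmd);
            }
        };
    }
}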


Regards.

On Fri, Feb 19, 2016 at 1:15 AM, Lewin Joy (TMS) 
wrote:

> Hi,
>
> The fields are single valued. But, the requirement will be at query time
> rather than index time. This is because, we will be having many such
> scenarios with different fields.
> I hoped we could concatenate at query time. I just need top 100 counts
> from the leaf level of the pivot.
> I'm also looking at facet.threads which could give responses to an extent.
> But It does not solve my issue.
>
> However, the Endeca equivalent of this application seems to be working
> well.
> Example Endeca Query:
>
> RETURN Results as SELECT Count(1) as "Total" GROUP BY "Country", "State",
> "part_num", "part_code" ORDER BY "Total" desc PAGE(0,100)
>
>
> -Lewin
>
>
> -Original Message-
> From: Alvaro Cabrerizo [mailto:topor...@gmail.com]
> Sent: Thursday, February 18, 2016 3:06 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Hitting complex multilevel pivot queries in solr
>
> Hi,
>
> The idea of copying fields into a new one (or various) during indexing and
> then facet the new field (or fields) looks promising. More information
> about data will be helpful (for example if the fields:country, state.. are
> single or multivalued). For example if all of the fields are single valued,
> then the combination of country,state,part_num,part_code looks like a file
> path country/state/part_num/part_code and maybe (don't know your business
> rules), the solr.PathHierarchyTokenizerFactory
>  could be
> an option to research instead of facet pivoting. On the other hand, I don't
> think that the copy field <
> https://cwiki.apache.org/confluence/display/solr/Copying+Fields> feature
> can help you to build that auxiliary field. I think that configuring an
> updateRequestProcessorChain <
> https://wiki.apache.org/solr/UpdateRequestProcessor>and building your own
> UpdateRequestProcessorFactory to concat the
> country,state,part_num,part_code values can be better way.
>
> Hope it helps.
>
> On Thu, Feb 18, 2016 at 8:47 PM, Lewin Joy (TMS) 
> wrote:
>
> > Still splitting my head over this one.
> > Let me know if anyone has any idea I could try.
> >
> > Or, is there a way to concatenate these 4 fields onto a dynamic field
> > and do a facet.field on top of this one?
> >
> > Thanks. Any idea is helpful to try.
> >
> > -Lewin
> >
> > -Original Message-
> > From: Lewin Joy (TMS) [mailto:lewin@toyota.com]
> > Sent: Wednesday, February 17, 2016 4:29 PM
> > To: solr-user@lucene.apache.org
> > Subject: Hitting complex multilevel pivot queries in solr
> >
> > Hi,
> >
> > Is there an efficient way to hit solr for complex time consuming queries?
> > I have a requirement where I need to pivot on 4 fields. Two fields
> > contain facet values close to 50. And the other 2 fields have 5000 and
> 8000 values.
> > Pivoting on the 4 fields would crash the server.
> >
> > Is there a better way to get the data?
> >
> > Example Query Params looks like this:
> > =country,state,part_num,part_code
> >
> > Thanks,
> > Lewin
> >
> >
> >
> >
>


Cannot talk to ZooKeeper - Updates are disabled.

2016-02-19 Thread Bogdan Marinescu

Hi,

From time to time I get org.apache.solr.common.SolrException: Cannot 
talk to ZooKeeper - Updates are disabled.


Most likely when Solr receives a lot of documents. My question is: why 
is this happening, and how do I get around it?


Stacktrace:
org.apache.solr.common.SolrException: Cannot talk to ZooKeeper - Updates are disabled.
    at org.apache.solr.update.processor.DistributedUpdateProcessor.zkCheck(DistributedUpdateProcessor.java:1482)
    at org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:664)
    at org.apache.solr.handler.loader.XMLLoader.processUpdate(XMLLoader.java:250)
    at org.apache.solr.handler.loader.XMLLoader.load(XMLLoader.java:177)
    at org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:98)
    at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:143)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:2064)
    at org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:654)
    at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:450)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:227)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:196)
    at awinta.mdm.solr.filter.AuraSolrDispatchFilter.doFilter(AuraSolrDispatchFilter.java:58)
    at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652)
    at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:585)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
    at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:553)
    at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:223)
    at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1127)
    at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:515)
    at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
    at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1061)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
    at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:215)
    at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:110)
    at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97)
    at org.eclipse.jetty.server.Server.handle(Server.java:497)
    at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:310)
    at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:257)
    at org.eclipse.jetty.io.AbstractConnection$2.run(AbstractConnection.java:540)
    at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:635)
    at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:555)
    at java.lang.Thread.run(Thread.java:745)

Thanks

Bogdan Marinescu


OutOfMemory when batchupdating from SolrJ

2016-02-19 Thread Clemens Wyss DEV
Environment: Solr 5.4.1

I am facing OOMs when batch updating from SolrJ. I am seeing approx 30'000(!) 
SolrInputDocument instances, although my batch size is 100, i.e. I call 
solrClient.add( documents ) for every 100 documents only. So I'd expect to see 
at most 100 SolrInputDocuments in memory at any moment UNLESS 
a) solrClient.add is "asynchronous" in its nature (then QueryResponse would be 
an async result?) 
or 
b) SolrJ is spooling the documents on the client side

What might be going wrong?

Thx for your advices
Clemens