Solr 7.4.0 - bug in JMX cache stats?

2018-09-06 Thread Bojan Šmid
Hi,

  it seems the format of the cache MBeans changed with 7.4.0. From what I
can see, a similar change wasn't made for other MBeans, which may mean it was
accidental and may be a bug.

  In Solr 7.3.* the format was (each attribute on its own, with a numeric type):

mbean:
solr:dom1=core,dom2=gettingstarted,dom3=shard1,dom4=replica_n1,category=CACHE,scope=searcher,name=filterCache

attributes:
  lookups java.lang.Long = 0
  hits java.lang.Long = 0
  cumulative_evictions java.lang.Long = 0
  size java.lang.Long = 0
  hitratio java.lang.Float = 0.0
  evictions java.lang.Long = 0
  cumulative_lookups java.lang.Long = 0
  cumulative_hitratio java.lang.Float = 0.0
  warmupTime java.lang.Long = 0
  inserts java.lang.Long = 0
  cumulative_inserts java.lang.Long = 0
  cumulative_hits java.lang.Long = 0


  With 7.4.0 there is a single attribute "Value" (java.lang.Object):

mbean:
solr:dom1=core,dom2=gettingstarted,dom3=shard1,dom4=replica_n1,category=CACHE,scope=searcher,name=filterCache

attributes:
  Value java.lang.Object = {lookups=0, evictions=0,
cumulative_inserts=0, cumulative_hits=0, hits=0, cumulative_evictions=0,
size=0, hitratio=0.0, cumulative_lookups=0, cumulative_hitratio=0.0,
warmupTime=0, inserts=0}
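  In case someone else polls these caches over JMX: until this is clarified, a
client-side workaround is to flatten the single "Value" attribute back into
per-metric numbers. A rough Python sketch, assuming your poller hands you the
value in the brace-delimited string form shown above (a workaround, not a fix):

```python
def flatten_cache_value(value_str):
    """Parse a Solr 7.4 cache 'Value' string such as
    '{lookups=0, hitratio=0.0, ...}' into a dict of numbers."""
    metrics = {}
    for pair in value_str.strip("{}").split(","):
        name, _, raw = pair.strip().partition("=")
        # the two hitratio metrics are floats, everything else is a long
        metrics[name] = float(raw) if "." in raw else int(raw)
    return metrics

# the exact attribute value from the example above
sample = ("{lookups=0, evictions=0, cumulative_inserts=0, cumulative_hits=0, "
          "hits=0, cumulative_evictions=0, size=0, hitratio=0.0, "
          "cumulative_lookups=0, cumulative_hitratio=0.0, warmupTime=0, "
          "inserts=0}")
metrics = flatten_cache_value(sample)
```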


  So the question is: was this an intentional change, or a bug?

  Thanks,

Bojan


Re: Geospatial clustering + zoom in/out help

2014-02-03 Thread Bojan Šmid
Hi David,

  I was hoping to get an answer on the Geospatial topic from you :). These
links basically confirm that the approach I wanted to take should work OK with
a similar (or even bigger) amount of data than I plan to have. Instead of my
custom NxM division of the world, I'll try the existing GeoHash encoding; it
may be good enough (and will be quicker to implement).
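  For the archives: the property that makes GeoHash fit this use case is that
a hash at precision k is simply the first k characters of a longer hash, so
truncating one indexed hash yields all the zoom levels. A minimal sketch of
the standard encoding (not Solr's implementation, just the algorithm):

```python
_BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # geohash alphabet

def geohash(lat, lon, precision=11):
    """Encode lat/lon as a geohash string of `precision` characters."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    bits = []
    even = True  # geohash interleaves bits, starting with longitude
    while len(bits) < precision * 5:
        if even:
            mid = (lon_lo + lon_hi) / 2.0
            if lon >= mid:
                bits.append(1)
                lon_lo = mid
            else:
                bits.append(0)
                lon_hi = mid
        else:
            mid = (lat_lo + lat_hi) / 2.0
            if lat >= mid:
                bits.append(1)
                lat_lo = mid
            else:
                bits.append(0)
                lat_hi = mid
        even = not even
    # pack each group of 5 bits into one base-32 character
    chars = []
    for i in range(0, len(bits), 5):
        idx = 0
        for b in bits[i:i + 5]:
            idx = (idx << 1) | b
        chars.append(_BASE32[idx])
    return "".join(chars)
```

Clustering at zoom level k is then just grouping on the first k characters of
the indexed hash.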

  Thanks!

  Bojan


On Fri, Jan 31, 2014 at 8:27 PM, Smiley, David W. dsmi...@mitre.org wrote:

 Hi Bojan.

 You've got some good ideas here, along the lines of some that others have
 tried.  I threw together a page on the wiki about this subject some time
 ago that I'm sure you will find interesting.  It references a relevant
 Stack Overflow post, and also a presentation at DrupalCon which had a
 segment from a guy using the same approach you suggest here, involving
 field collapsing and/or the stats component.  The video shows it in action.

 http://wiki.apache.org/solr/SpatialClustering

 It would be helpful for everyone if you share your experience with
 whatever you choose, once you give an approach a try.

 ~ David
 
 From: Bojan Šmid [bos...@gmail.com]
 Sent: Thursday, January 30, 2014 1:15 PM
 To: solr-user@lucene.apache.org
 Subject: Geospatial clustering + zoom in/out help

 Hi,

 I have an index with 300K docs with lat,lon. I need to cluster the docs
 based on lat,lon for display in the UI. The user then needs to be able to
 click on any cluster and zoom in (up to 11 levels deep).

 I'm using Solr 4.6, and I'm wondering how best to implement this
 efficiently.

 Some more specific questions are below.

 I need to:

 1) cluster data points at different zoom levels

 2) click on a specific cluster and zoom in

 3) be able to select a region (bounding box or polygon) and show clusters
 in the selected area

 What's the best way to implement this so that queries are fast?

 What I thought I would try, but maybe there are better ways:

 * divide the world in NxM large squares and then each of these squares into
 4 more squares, and so on - 11 levels deep

 * at index time figure out all squares (at all 11 levels) each data point
 belongs to and index that info into 11 different fields: e.g.
 id=1 name=foo lat=x lon=y zoom1=square1_62  zoom2=square1_62_47
 zoom3=square1_62_47_33 

 * at search time, use field collapsing on zoomX field to get which docs
 belong to which square on particular level

 * calculate center point of each square (by calculating mean value of
 positions for all points in that square) using StatsComponent (facet on
 zoomX field, avg on lat and lon fields) - I would consider those squares as
 separate clusters (one square is one cluster) and center points of those
 squares as center points of clusters derived from them
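 The index-time square assignment above can be sketched like this (using a
 2x2 split at the top level instead of NxM, for brevity; the id format is
 illustrative):

```python
def cell_ids(lat, lon, max_zoom=11):
    """Assign a point to one square per zoom level (quadtree-style).
    Each level's id extends the previous level's id, e.g.
    'square_3', 'square_3_1', 'square_3_1_2', ..."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    ids, path = [], "square"
    for _ in range(max_zoom):
        lat_mid = (lat_lo + lat_hi) / 2.0
        lon_mid = (lon_lo + lon_hi) / 2.0
        # quadrant number 0-3: bit 2 = north half, bit 1 = east half
        quad = (2 if lat >= lat_mid else 0) + (1 if lon >= lon_mid else 0)
        if lat >= lat_mid:
            lat_lo = lat_mid
        else:
            lat_hi = lat_mid
        if lon >= lon_mid:
            lon_lo = lon_mid
        else:
            lon_hi = lon_mid
        path += "_%d" % quad
        ids.append(path)
    return ids

# at index time: zoom1 = ids[0], zoom2 = ids[1], ..., zoom11 = ids[10]
ids = cell_ids(45.0, 90.0)
```

 Field collapsing on zoomX then groups points by square at level X.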

 I *think* the problem with this approach is that:

 * there will be many unique values in the fields for deeper zoom levels,
 which means field collapsing / StatsComponent may not be fast enough

 * clusters will not look very natural, because I would have many clusters on
 each zoom level, and what are really single geographical clusters would be
 displayed as multiple clusters, since their points would in some cases be
 dispersed across multiple squares. But that may be OK

 * a lot will depend on how the squares are calculated - linearly dividing
 360 degrees by N to get squares of equal size in degrees would cause issues
 with the actual (ground) sizes of the squares and the counts of points in
 each of them


 So I'm wondering if there is a better way?

 Thanks,


   Bojan



SolrCloud - KeeperErrorCode = NoNode - after restart

2013-12-20 Thread Bojan Šmid
Hi,

  I have a cluster with 5 Solr nodes (4.6 release) and 5 ZooKeeper nodes,
with around 2000 collections (each with a single shard, each shard having 1
or 2 replicas), running on Tomcat. Each Solr node hosts around 1000 physical
cores.

  When starting any node, I almost always see errors like:

2013-12-19 18:45:42,454 [coreLoadExecutor-4-thread-721] ERROR org.apache.solr.cloud.ZkController - Error getting leader from zk
org.apache.solr.common.SolrException: Could not get leader props
    at org.apache.solr.cloud.ZkController.getLeaderProps(ZkController.java:945)
    at org.apache.solr.cloud.ZkController.getLeaderProps(ZkController.java:909)
    at org.apache.solr.cloud.ZkController.getLeader(ZkController.java:873)
    at org.apache.solr.cloud.ZkController.register(ZkController.java:807)
    at org.apache.solr.cloud.ZkController.register(ZkController.java:757)
    at org.apache.solr.core.ZkContainer.registerInZk(ZkContainer.java:272)
    at org.apache.solr.core.CoreContainer.registerCore(CoreContainer.java:489)
    at org.apache.solr.core.CoreContainer$1.call(CoreContainer.java:272)
    at org.apache.solr.core.CoreContainer$1.call(CoreContainer.java:263)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
    at java.util.concurrent.FutureTask.run(FutureTask.java:166)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
    at java.util.concurrent.FutureTask.run(FutureTask.java:166)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:722)
Caused by: org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /collections/core6_20131120/leaders/shard1
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:111)
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
    at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1151)
    at org.apache.solr.common.cloud.SolrZkClient$7.execute(SolrZkClient.java:264)
    at org.apache.solr.common.cloud.SolrZkClient$7.execute(SolrZkClient.java:261)
    at org.apache.solr.common.cloud.ZkCmdExecutor.retryOperation(ZkCmdExecutor.java:65)

  It happens for just some cores, usually about 10-20 out of the 1000 on a
node (each time different cores fail). These 10-20 cores are then marked as
"down" and are never recovered, while the other cores work OK.

  I did check ZK; there really is no node
/collections/core_20131120/leaders/shard1, but
/collections/core_20131120/leaders exists, so it looks like shard1 was
removed (maybe during a previous shutdown?).

  Also, when I stop all nodes and clear the ZK state, and after that start
Solr (rolling-starting the nodes one by one), all nodes start properly and
all cores are properly loaded (active). But after that, the first restart of
any Solr node causes issues on that node.

  Any ideas about the possible cause? And shouldn't Solr try to recover
from such a situation?

  Thanks,

  Bojan


Aggregating data with Solr, getting group stats

2013-07-15 Thread Bojan Šmid
Hi,

  I see there are a few ways in Solr that can almost be used for my use
case, but all of them appear to fall short eventually.

  Here is what I am trying to do. Consider the following document structure
(there are many more fields in play, but this is enough for an example):

Manufacturer
ProductType
Color
Size
Price
CountAvailableItems

  Based on user parameters (a search string, some filters), I would fetch a
set of documents. What I need is to group the resulting documents by
different attribute combinations (say, Manufacturer + Color, or ProductType +
Color + Size, or ...) and get stats (max Price, avg Price, number of
available items) for those groups.

  Possible solutions in Solr:

1) StatsComponent - provides all the stats I would need, but its grouping
functionality is basic: it can group on a single field (stats.field +
stats.facet), while I need field combinations. There is an issue,
https://issues.apache.org/jira/browse/SOLR-2472, which tried to deal with
that, but it looks like it got stuck in the past.
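The only workaround I can see for 1) is concatenating each needed combination
into its own field at index time (e.g. a manu_color field holding "Sony|Red")
and then running stats.facet on that single field - field names here are made
up:

```
# index time: populate one combined field per combination, e.g.
#   manu_color = Manufacturer + "|" + Color
# query time: stats on Price, grouped by the combined field
/select?q=*:*&rows=0&stats=true&stats.field=Price&stats.facet=manu_color
```

This explodes the number of fields if many combinations are needed, which is
exactly why I'd prefer a proper solution.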

2) Pivot Faceting - seems like it would provide all the grouping logic I
need, and in combination with https://issues.apache.org/jira/browse/SOLR-3583
("Percentiles for facets, pivot facets, and distributed pivot facets") it
would bring percentiles and averages. However, I would still miss things like
Max/Min/Sum, and that issue is not committed yet anyway. I would also depend
on another yet-to-be-committed issue,
https://issues.apache.org/jira/browse/SOLR-2894, for distributed support.

3) Configurable Collectors - https://issues.apache.org/jira/browse/SOLR-4465
- seems promising, but it allows grouping by just one field and, probably a
bigger problem, it seems it was just a POC and will need an overhaul before
it is anywhere near ready for commit.


  Are there any other options I missed?

  Thanks,

  Bojan


DataImportHandler - too many connections MySQL error after upgrade to Solr 1.4 release

2010-02-10 Thread Bojan Šmid
Hi all,

  I had DataImportHandler working perfectly on a Solr 1.4 nightly build from
June 2009. I upgraded Solr to the 1.4 release and started getting errors:


Caused by: com.mysql.jdbc.exceptions.MySQLNonTransientConnectionException:
Server connection failure during transaction. Due to underlying exception:
'com.mysql.jdbc.exceptions.MySQLNonTransientConnectionException: Too many
connections'.


  This is the same machine and the same setup (except the new Solr) that
never had problems before. The error doesn't pop up at the beginning; DIH
runs for a few hours and then breaks (after a few million rows are
processed).

  Solr is the only process using MySQL, and max_connections on MySQL is set
to 100, so it seems there might be a connection leak in DIH. A few more
details on the setup:
  MySQL version 5.0.67
  driver: mysql-connector-java-5.0.8-bin.jar
  Java: 1.6.0_14
  connection URL parameters : autoReconnect=true, batchSize=-1
  OS : CentOS 5.2
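  For reference, the dataSource part of my data-config.xml looks roughly like
this (host, database, and credentials are placeholders):

```
<dataConfig>
  <dataSource type="JdbcDataSource"
              driver="com.mysql.jdbc.Driver"
              url="jdbc:mysql://localhost:3306/mydb?autoReconnect=true"
              batchSize="-1"
              user="solr"
              password="..."/>
  <!-- entities omitted -->
</dataConfig>
```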

  Did anyone else have similar problems with the 1.4 release?


  Regards


conditional sorting

2009-10-02 Thread Bojan Šmid
Hi all,

I need to sort my query hits by different criteria depending on the number
of hits. For instance, if there are fewer than 10 hits, sort by
date_entered; otherwise, sort by popularity.

Does anyone know if there is a way to do that with a single query, or will I
have to send another query with the desired sort criterion after I inspect
the number of hits on my client?
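The two-query fallback would look roughly like this (a Python sketch; the
URL and field names are just from my example, and the threshold mirrors the
10 hits above):

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

SOLR = "http://localhost:8983/solr/select"  # illustrative URL

def choose_sort(num_found, threshold=10):
    # fewer hits than the threshold: prefer recency, otherwise popularity
    return "date_entered desc" if num_found < threshold else "popularity desc"

def search(q):
    # query 1: rows=0 just to learn numFound
    count_url = SOLR + "?" + urlencode({"q": q, "rows": 0, "wt": "json"})
    num_found = json.load(urlopen(count_url))["response"]["numFound"]
    # query 2: the real fetch, with the sort chosen from the count
    params = {"q": q, "rows": 10, "sort": choose_sort(num_found), "wt": "json"}
    return json.load(urlopen(SOLR + "?" + urlencode(params)))
```

The extra round trip is what I was hoping to avoid.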

Thx


Re: conditional sorting

2009-10-02 Thread Bojan Šmid
I tried to simplify the problem, but the point is that I could have really
complex requirements. For instance, if none of the first 5 results are older
than one year, sort by X; otherwise, sort by Y.

So the question is: is there a way to make Solr recognize complex
situations and apply different sorting criteria?

Bojan


On Fri, Oct 2, 2009 at 4:22 PM, Uri Boness ubon...@gmail.com wrote:

 If the threshold is only 10, why can't you always sort by popularity, and
 if the result set is smaller than 10, re-sort on the client side based on
 date_entered?

 Uri


 Bojan Šmid wrote:

 Hi all,

 I need to sort my query hits by different criteria depending on the number
 of hits. For instance, if there are fewer than 10 hits, sort by
 date_entered; otherwise, sort by popularity.

 Does anyone know if there is a way to do that with a single query, or I'll
 have to send another query with desired sort criterion after I inspect
 number of hits on my client?

 Thx






Re: SolrCoreAware analyzer

2009-02-27 Thread Bojan Šmid
Thanks for your suggestions.

I do need SolrCore, but I could probably live with just SolrResourceLoader,
while also creating my own FieldType (which can be ResourceLoaderAware).

Bojan


On Thu, Feb 26, 2009 at 11:48 PM, Chris Hostetter
hossman_luc...@fucit.orgwrote:


  : I am writing a custom analyzer for my field type. This analyzer would
  : need to use SolrResourceLoader and SolrConfig, so I want to make it
  : SolrCoreAware.

  1) Solr's support for using Analyzer instances is mainly just to make it
  easy for people who already have existing Analyzer impls that they want to
  use -- if you're writing something new, I would suggest implementing the
  TokenizerFactory API.

 2) Do you really need access to the SolrCore, or do you just need access
 to the SolrResourceLoader?  Because there is also the ResourceLoaderAware
 API.  If you take a look at StopFilterFactory you can see an example of
 how it's used.

 FWIW: The reasons Solr doesn't support SolrCoreAware Analysis related
 plugins (TokenizerFactory and TokenFilterFactory) are:

  a. it kept the initialization a lot simpler.  Currently SolrCore knows
  about the IndexSchema, but the IndexSchema doesn't know anything about the
  SolrCore.
  b. it allows for more reuse of the schema-related code independent of the
  rest of Solr (there was talk at one point of promoting all of the
  IndexSchema/FieldType/Token*Factory code into a Lucene-Java contrib, but
  so far no one has stepped up to work out the refactoring)


 -Hoss




SolrCoreAware analyzer

2009-02-26 Thread Bojan Šmid
Hello,

I am writing a custom analyzer for my field type. This analyzer would need
to use SolrResourceLoader and SolrConfig, so I want to make it
SolrCoreAware.

However, it seems that Analyzer classes aren't supposed to be used in this
way (as described in http://wiki.apache.org/solr/SolrPlugins). Is there any
way to provide my analyzer with SolrCore?

The list of valid SolrCoreAware classes is in SolrResourceLoader (line 465
on current Solr trunk), so I could create a patch (which would enable
Analyzers to get the SolrCore) for my Solr instance, but I would rather
avoid maintaining patches just for myself (it just complicates maintenance).

Thanks in advance,

Bojan