Re: severe problems with soft and hard commits in a large index

2015-05-06 Thread Dmitry Kan
Do you see any (or a lot of) warming searchers on deck, i.e. a value for N in:

PERFORMANCE WARNING: Overlapping onDeckSearchers=N
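
For context, the limit related to that warning is the maxWarmingSearchers
setting in solrconfig.xml; a typical entry, using the usual example default
rather than a recommendation, looks like:

  <maxWarmingSearchers>2</maxWarmingSearchers>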

On Wed, May 6, 2015 at 10:58 AM, adfel70 adfe...@gmail.com wrote:

 Hello
 I have a cluster of 16 shards, 3 replicas. the cluster indexed nested
 documents.
 it currently has 3 billion documents overall (parent and children).
 each shard has around 200 million docs. size of each shard is 250GB.
 this runs on 12 machines. each machine has 4 SSD disks and 4 solr
 processes.
 each process has 28GB heap.  each machine has 196GB RAM.

 I perform periodic indexing throughout the day. each indexing cycle adds
 around 1.5 million docs. I keep the indexing load light - 2 processes with
 bulks of 20 docs.

 My use case demands that each indexing cycle will be visible only when the
 whole cycle finishes.

 I tried various methods of using soft and hard commits:

 1. using auto hard commit with time=10secs (opensearcher=false) and an
 explicit soft commit when the indexing finishes.
 2. using auto soft commit with time=10/30/60secs during the indexing.
 3. not using soft commit at all, just using auto hard commit with
 time=10secs during the indexing (opensearcher=false) and an explicit hard
 commit with opensearcher=true when the cycle finishes.


 with all methods I encounter pretty much the same problem:
 1. heavy GCs when soft commit is performed (methods 1,2) or when hardcommit
 opensearcher=true is performed. these GCs cause heavy latency (average
 latency is 3 secs. latency during the problem is 80secs)
 2. if indexing cycles come too often, which causes softcommits or
 hardcommits(opensearcher=true) occur with a small interval one after
 another
 (around 5-10minutes), I start getting many OOM exceptions.


 Thank you.



 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/severe-problems-with-soft-and-hard-commits-in-a-large-index-tp4204068.html
 Sent from the Solr - User mailing list archive at Nabble.com.




-- 
Dmitry Kan
Luke Toolbox: http://github.com/DmitryKey/luke
Blog: http://dmitrykan.blogspot.com
Twitter: http://twitter.com/dmitrykan
SemanticAnalyzer: www.semanticanalyzer.info


Re: severe problems with soft and hard commits in a large index

2015-05-06 Thread Toke Eskildsen
On Wed, 2015-05-06 at 00:58 -0700, adfel70 wrote:
 each shard has around 200 million docs. size of each shard is 250GB.
 this runs on 12 machines. each machine has 4 SSD disks and 4 solr processes.
 each process has 28GB heap.  each machine has 196GB RAM.

[...]

 1. heavy GCs when soft commit is performed (methods 1,2) or when hardcommit
 opensearcher=true is performed. these GCs cause heavy latency (average
 latency is 3 secs. latency during the problem is 80secs)

Sanity check: Are you sure the pauses are due to garbage collection?

You have a fairly large heap and, judging from your previous post
"problem with facets - out of memory exception", you are doing
non-trivial faceting. Are you using DocValues, as Marc suggested?
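
For reference, DocValues is just a schema.xml attribute on the field; a
minimal sketch, with a hypothetical field name, would be:

  <field name="my_facet_field" type="string" indexed="true" stored="false"
   docValues="true"/>

Changing that attribute does require reindexing the field.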


- Toke Eskildsen, State and University Library, Denmark




What is the best practice to Backup and delete a core from SOLR Master-Slave architecture

2015-05-06 Thread sangeetha.subraman...@gtnexus.com
Hi,

I am a newbie to SOLR. I have set up a Master-Slave configuration with SOLR 4.0. I 
am trying to identify the best way to back up an old core and delete it 
so as to free up space on the disk.

I did get the information on how to unload a core and delete the indexes from 
the core.

Unloading - http://localhost:8983/solr/admin/cores?action=UNLOAD&core=core0
Delete Indexes - 
http://localhost:8983/solr/admin/cores?action=UNLOAD&core=core0&deleteIndex=true

What is the best approach to remove the old core ?


*   Approach 1

o   Unload the core in both the Master and Slave servers AND delete the index only 
from the Master server (retain the indexes on the Slave server as a backup). If I 
retain the indexes on the Slave server, is there a way to bring them back to the 
Master server at a later point?

*   Approach 2

o   Unload and delete the indexes from both the Master and Slave servers. Before 
deleting, take a backup of the data dir of the old core from the file system. I am 
not sure if this is even possible.

Is there any other, better way of doing this? Please let me know.

Thanks
Sangeetha


Re: SolrCloud collection properties

2015-05-06 Thread Markus Heiden
We currently have many custom properties defined in
core.properties which are used in our solrconfig.xml, e.g.
 <str name="enabled">${solr.enable.cachewarming:true}</str>

Now we want to migrate to SolrCloud and want to define these properties for
a collection. But defining properties when creating a collection just
writes them into the core.properties of the created cores. This is a pain,
because we have a lot of properties and each one has to be specified as a URL
parameter. Furthermore, it seems that these properties are not propagated to
the cores of new shards if you e.g. split a shard, which is error-prone.
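
To illustrate, what we do today looks roughly like this (property names
shortened and hypothetical), with every property appended to the CREATE call:

http://host:8983/solr/admin/collections?action=CREATE&name=mycollection&numShards=2&collection.configName=myconf&property.solr.enable.cachewarming=false&property.foo=bar

Each property.* parameter only ends up in the core.properties files of the
cores created by that particular call.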

As you already mentioned, we could resolve these properties ourselves by
using many configsets instead of just one. My question was whether it is
possible to use just one configset in this case and specify collection-
specific properties at the collection level. That seems to me the better
way to handle the configuration complexity.

Markus

2015-05-06 3:48 GMT+02:00 Erick Erickson erickerick...@gmail.com:

 _What_ properties? Details matter

 And how do you do this now? Assuming you do this with separate conf
 directories, these are then just configsets in Zookeeper and you can
 have as many of them as you want. Problem here is that each one of
 them is a complete set of schema and config files, AFAIK the config
 set is the finest granularity that you have OOB.

 Best,
 Erick

 On Tue, May 5, 2015 at 6:55 AM, Markus Heiden markus.hei...@s24.com
 wrote:
  Hi,
 
  we are trying to migrate from Solr 4.10 to SolrCloud 4.10. I understood
  that SolrCloud uses collections as abstraction from the cores. What I am
  missing is a possibility to store collection-specific properties in
  Zookeeper. Using property.foo=bar in CREATE-URLs just sets core-specific
  properties which are not distributed, e.g. if I migrate a shard from one
  node to another.
 
  How do I define collection-specific properties (to be used in
  solrconfig.xml and schema.xml) which get distributed with the collection
 to
  all nodes?
 
   Why do I try that? Currently we have different cores whose structure is
   identical, but which each have some specific properties. I would like to
   have a single configuration for them in Zookeeper from which I want to
   create different collections, which just differ in the value of some
   properties.
 
  Markus



Re: severe problems with soft and hard commits in a large index

2015-05-06 Thread adfel70
1. yes, I'm sure that the pauses are due to GCs. I monitor the cluster and
continuously receive metrics from the system and from the Java process.
I see clearly that when a soft commit is triggered, major GCs start occurring
(sometimes recurring on the same process) and latency rises.
I use the CMS GC and JDK 1.7 (u75).

2. My previous post was about another use case, but nevertheless I have
configured docValues on the faceted fields.


Toke Eskildsen wrote
 On Wed, 2015-05-06 at 00:58 -0700, adfel70 wrote:
 each shard has around 200 million docs. size of each shard is 250GB.
 this runs on 12 machines. each machine has 4 SSD disks and 4 solr
 processes.
 each process has 28GB heap.  each machine has 196GB RAM.
 
 [...]
 
 1. heavy GCs when soft commit is performed (methods 1,2) or when
 hardcommit
 opensearcher=true is performed. these GCs cause heavy latency (average
 latency is 3 secs. latency during the problem is 80secs)
 
 Sanity check: Are you sure the pauses are due to garbage collection?
 
 You have a fairly large heap and judging from your previous post
 problem with facets  - out of memory exception, you are doing
 non-trivial faceting. Are you using DocValues, as Marc suggested?
 
 
 - Toke Eskildsen, State and University Library, Denmark





--
View this message in context: 
http://lucene.472066.n3.nabble.com/severe-problems-with-soft-and-hard-commits-in-a-large-index-tp4204068p4204088.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Finding out optimal hash ranges for shard split

2015-05-06 Thread Shalin Shekhar Mangar
Nope, there is no way to find that out without actually doing the split. If
you have composite keys then you could also split using the prefix of a
composite id via the split.key parameter.
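
For reference, a split by route key looks roughly like this (collection name
and key are made up):

http://host:8983/solr/admin/collections?action=SPLITSHARD&collection=mycollection&split.key=abc!

With split.key you don't pass a shard name; Solr works out which shard holds
that key's hash range.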

On Wed, May 6, 2015 at 9:32 AM, anand.mahajan an...@zerebral.co.in wrote:

 Looks like its not possible to find out the optimal hash ranges for a split
 before you actually split it. So the only way out is to keep splitting out
 the large subshards?



 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/Finding-out-optimal-hash-ranges-for-shard-split-tp4203609p4204045.html
 Sent from the Solr - User mailing list archive at Nabble.com.




-- 
Regards,
Shalin Shekhar Mangar.


Re: Solr 5.0 - uniqueKey case insensitive ?

2015-05-06 Thread Daniel Collins
Ah, I remember seeing this when we first started using Solr (which was 4.0
because we needed Solr Cloud). I never got around to filing an issue for it
(oops!), but we have a note in our schema to leave the key field a normal
string (like Bruno, we had tried to lowercase it, which failed).
We didn't really know Solr in those days, and hadn't really thought about
it since then, but Hoss' and Erick's explanations make perfect sense now!

Since shard routing is (basically) done on hashes of the unique key, if I
have 2 documents which are the same, but have values "HELLO" and "hello",
they might well hash to completely different shards, so the update
logistics would be horrible.

Bruno, why do you need to lowercase at all then?  You said in your example
that your client application always supplies "pn" and it is always
uppercase, so presumably all adds/updates could be done directly on that
field (as a normal string with no lowercasing).  Where does the case
insensitivity come in; is that only for searching?  If so, couldn't you add
a search field (called "id"), and update your app to search using that (or
make that your default search field; I guess it depends on whether your calling app
explicitly uses the "pn" field name in its searches)?
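
A minimal schema sketch of that idea, reusing the string_ci type from Bruno's
config below and keeping pn as the plain string uniqueKey (field names here
are only a suggestion), would be something like:

  <field name="pn" type="string" indexed="true" stored="true" required="true"/>
  <field name="pn_ci" type="string_ci" indexed="true" stored="false"/>
  <copyField source="pn" dest="pn_ci"/>
  <uniqueKey>pn</uniqueKey>

Case-insensitive searches would then go against pn_ci, while adds/updates
keep using the exact-case pn key.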


On 6 May 2015 at 01:55, Erick Erickson erickerick...@gmail.com wrote:

 Well, working fine may be a bit of an overstatement. That has never
 been officially supported, so it just happened to work in 3.6.

 As Chris points out, if you're using SolrCloud then this will _not_
 work as routing happens early in the process, i.e. before the analysis
 chain gets the token so various copies of the doc will exist on
 different shards.

 Best,
 Erick

 On Mon, May 4, 2015 at 4:19 PM, Bruno Mannina bmann...@free.fr wrote:
  Hello Chris,
 
  yes I confirm on my SOLR3.6 it works fine since several years, and each
 doc
  added with same code is updated not added.
 
  To be more clear, I receive docs with a field name pn and it's the
  uniqueKey, and it always in uppercase
 
  so I must define in my schema.xml
 
  <field name="id" type="string" multiValued="false" indexed="true"
  required="true" stored="true"/>
  <field name="pn" type="text_general" multiValued="true" indexed="true"
  stored="false"/>
  ...
  <uniqueKey>id</uniqueKey>
  ...
  <copyField source="id" dest="pn"/>
 
  but the application that use solr already exists so it requests with pn
  field not id, i cannot change that.
  and in each docs I receive, there is not id field, just pn field, and  i
  cannot also change that.
 
  so there is a problem no ? I must import a id field and request a pn
 field,
  but I have a pn field only for import...
 
 
 
  On 05/05/2015 01:00, Chris Hostetter wrote:
 
  : On SOLR3.6, I defined a string_ci field like this:
  :
  : <fieldType name="string_ci" class="solr.TextField"
  : sortMissingLast="true" omitNorms="true">
  :   <analyzer>
  :     <tokenizer class="solr.KeywordTokenizerFactory"/>
  :     <filter class="solr.LowerCaseFilterFactory"/>
  :   </analyzer>
  : </fieldType>
  :
  : <field name="pn" type="string_ci" multiValued="false" indexed="true"
  : required="true" stored="true"/>
 
 
  I'm really surprised that field would have worked for you (reliably) as a
  uniqueKey field even in Solr 3.6.

  the best practice for something like what you describe has always (going
  back to Solr 1.x) been to use a copyField to create a case insensitive
  copy of your uniqueKey for searching.

  if, for some reason, you really want case insensitive *updates* (so a doc
  with id "foo" overwrites a doc with id "FOO") then the only reliable way to
  make something like that work is to do the lowercasing in an
  UpdateProcessor to ensure it happens *before* the docs are distributed to
  the correct shard, and so the correct existing doc is overwritten (even if
  you aren't using solr cloud)
 
 
 
  -Hoss
  http://www.lucidworks.com/
 
 
 
 
  ---
  This email contains no virus or malware
  because avast! Antivirus protection is active.
  http://www.avast.com
 



severe problems with soft and hard commits in a large index

2015-05-06 Thread adfel70
Hello
I have a cluster of 16 shards, 3 replicas. the cluster indexes nested
documents.
it currently has 3 billion documents overall (parent and children).
each shard has around 200 million docs. size of each shard is 250GB.
this runs on 12 machines. each machine has 4 SSD disks and 4 solr processes.
each process has 28GB heap.  each machine has 196GB RAM.

I perform periodic indexing throughout the day. each indexing cycle adds
around 1.5 million docs. I keep the indexing load light - 2 processes with
bulks of 20 docs.

My use case demands that each indexing cycle will be visible only when the
whole cycle finishes.

I tried various methods of using soft and hard commits:

1. using auto hard commit with time=10secs (openSearcher=false) and an
explicit soft commit when the indexing finishes.
2. using auto soft commit with time=10/30/60secs during the indexing.
3. not using soft commit at all, just using auto hard commit with
time=10secs during the indexing (openSearcher=false) and an explicit hard
commit with openSearcher=true when the cycle finishes.


with all methods I encounter pretty much the same problem:
1. heavy GCs when a soft commit is performed (methods 1,2) or when a hard commit
with openSearcher=true is performed. these GCs cause heavy latency (average
latency is 3 secs. latency during the problem is 80 secs)
2. if indexing cycles come too often, so that soft commits or
hard commits (openSearcher=true) occur at small intervals one after another
(around 5-10 minutes), I start getting many OOM exceptions.


Thank you.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/severe-problems-with-soft-and-hard-commits-in-a-large-index-tp4204068.html
Sent from the Solr - User mailing list archive at Nabble.com.


New core on Solr Cloud

2015-05-06 Thread shacky
Hi.
This is my first experience with Solr Cloud.
I installed three Solr nodes with three ZooKeeper instances and they
seemed to start well.
Now I have to create a new replicated core and I'm trying to find out
how I can do it.
I found many examples about how to create shards and cores, but I have
to create one core with only one shard replicated on all three nodes
(so basically I want to have the same data on all three nodes).

Could you help me to understand what is the correct way to make this, please?

Thank you very much!
Bye


Solr not getting Start. Error : Could not find the main class: org.apache.solr.util.SolrCLI

2015-05-06 Thread Mayur Champaneria
Hello,

When I start solr-5.1.0 on Ubuntu 12.04 with

*/bin/var/www/solr-5.0.0/bin ./solr start*


Solr starts and shows the following:

*Started Solr server on port 8983 (pid=14457). Happy searching!*


But when I open http://localhost:8983/solr/ it does not load.
So I checked the status with

*/bin/var/www/solr-5.0.0/bin ./solr status*


and at the end I got the error below:


*Exception in thread "main" java.lang.UnsupportedClassVersionError:
org/apache/solr/util/SolrCLI : Unsupported major.minor version 51.0
at java.lang.ClassLoader.defineClass1(Native Method)
at java.lang.ClassLoader.defineClass(ClassLoader.java:643)
at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
at java.net.URLClassLoader.defineClass(URLClassLoader.java:277)
at java.net.URLClassLoader.access$000(URLClassLoader.java:73)
at java.net.URLClassLoader$1.run(URLClassLoader.java:212)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:205)
at java.lang.ClassLoader.loadClass(ClassLoader.java:323)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:294)
at java.lang.ClassLoader.loadClass(ClassLoader.java:268)

Could not find the main class: org.apache.solr.util.SolrCLI. Program will exit.

please visit http://localhost:8983/solr*


The same thing happens when starting Solr in SolrCloud mode.

Please help me in this.


-- 
Thanks & Regards,


Mayur Champaneria

PHP Developer ( MMT )
Vertex Softwares


Re: Finding out optimal hash ranges for shard split

2015-05-06 Thread anand.mahajan
Okay - thanks for the confirmation, Shalin.  Could this be a feature request
in the Collections API - that we have a "split shard dry run" API that accepts
a sub-shard count as a request param and returns the optimal shard ranges for
the requested number of sub-shards, along with the respective
document counts for each of the sub-shards? Users could then use these
shard ranges for the actual split.




--
View this message in context: 
http://lucene.472066.n3.nabble.com/Finding-out-optimal-hash-ranges-for-shard-split-tp4203609p4204100.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: New core on Solr Cloud

2015-05-06 Thread shacky
Ok, I found out that the creation of a new core/collection on Solr 5.1
is done with the bin/solr script.
So I created a new collection with this command:

./solr create_collection -c test -replicationFactor 3

Is this the correct way?
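
For reference, I believe the Collections API equivalent of that command would
be roughly:

http://localhost:8983/solr/admin/collections?action=CREATE&name=test&numShards=1&replicationFactor=3

i.e. one shard with three replicas, so the same data ends up on all three nodes.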

Thank you very much,
Bye!

2015-05-06 10:02 GMT+02:00 shacky shack...@gmail.com:
 Hi.
 This is my first experience with Solr Cloud.
 I installed three Solr nodes with three ZooKeeper instances and they
 seemed to start well.
 Now I have to create a new replicated core and I'm trying to found out
 how I can do it.
 I found many examples about how to create shards and cores, but I have
 to create one core with only one shard replicated on all three nodes
 (so basically I want to have the same data on all three nodes).

 Could you help me to understand what is the correct way to make this, please?

 Thank you very much!
 Bye


ZooKeeperException: Could not find configName for collection

2015-05-06 Thread shacky
Hi list.

I created a new collection on my new SolrCloud installation, the new
collection is shown and replicated on all three nodes, but on the
first node (only on this one) I get this error:

new_core: 
org.apache.solr.common.cloud.ZooKeeperException:org.apache.solr.common.cloud.ZooKeeperException:
Could not find configName for collection new_core found:null

I cannot see any core named new_core on that node, and I also tried
to remove it:

root@index1:/opt/solr# ./bin/solr delete -c new_core
Connecting to ZooKeeper at zk1,zk2,zk3
ERROR: Collection new_core not found!

Could you help me, please?

Thank you very much!
Bye


Re: Finding out optimal hash ranges for shard split

2015-05-06 Thread Shalin Shekhar Mangar
Hi Anand,

The nature of the hash function (murmur3) should lead to an approximately
uniform distribution of documents across sub-shards. Have you investigated
why, if at all, the sub-shards are not balanced? Do you use composite keys,
e.g. abc!id1, which could cause the imbalance?

I don't think there is a (cheap) way to implement what you are asking in
the current scheme of things because unless we go through each id and
calculate the hash, we have no way of knowing the optimal distribution.
However, if we were to store the hash of the key as a separate field in the
index then it should be possible to binary search for hash ranges which
lead to approx. equal distribution of docs in sub-shards. We can implement
something like that inside Solr.

On Wed, May 6, 2015 at 4:42 PM, anand.mahajan an...@zerebral.co.in wrote:

 Okay - Thanks for the confirmation Shalin.  Could this be a feature request
 in the Collections API - that we have a Split shard dry run API that
 accepts
 sub-shards count as a request param and returns the optimal shard ranges
 for
 the number of sub-shards requested to be created along with the respective
 document counts for each of the sub-shards? The users can then use this
 shard ranges for the actual split?




 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/Finding-out-optimal-hash-ranges-for-shard-split-tp4203609p4204100.html
 Sent from the Solr - User mailing list archive at Nabble.com.




-- 
Regards,
Shalin Shekhar Mangar.


Re: Solr not getting Start. Error : Could not find the main class: org.apache.solr.util.SolrCLI

2015-05-06 Thread Markus Heiden
UnsupportedClassVersionError means you have an old JDK. Use a more recent
one.

Markus

2015-05-06 12:59 GMT+02:00 Mayur Champaneria ma...@matchmytalent.com:

 Hello,

 When I starting solr-5.1.0 in Ubuntu 12.04 by,

 */bin/var/www/solr-5.0.0/bin ./solr start*


 Solr is being started and shows as below,

 *Started Solr server on port 8983 (pid=14457). Happy searching!*


 When I starting Solr on http://localhost:8983/solr/ its not starting.
 Then I have checking the status by

 */bin/var/www/solr-5.0.0/bin ./solr status*


 then at the end I have got an error as below,


 *Exception in thread main java.lang.UnsupportedClassVersionError:
 org/apache/solr/util/SolrCLI : Unsupported major.minor version 51.0
 at java.lang.ClassLoader.defineClass1(Native Method)
 at java.lang.ClassLoader.defineClass(ClassLoader.java:643)
 at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
 at java.net.URLClassLoader.defineClass(URLClassLoader.java:277)
 at java.net.URLClassLoader.access$000(URLClassLoader.java:73)
 at java.net.URLClassLoader$1.run(URLClassLoader.java:212)
 at java.security.AccessController.doPrivileged(Native Method)
 at java.net.URLClassLoader.findClass(URLClassLoader.java:205)
 at java.lang.ClassLoader.loadClass(ClassLoader.java:323)
 at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:294)
 at java.lang.ClassLoader.loadClass(ClassLoader.java:268)

 Could not find the main class: org.apache.solr.util.SolrCLI. Program will
 exit.

 please visit http://localhost:8983/solr*


 Same thing is repeating when starting solr on SolrCloud

 Please help me in this.


 --
 *Thanks  **Regards,*


 *Mayur Champaneria*

 *PHP Developer ( MMT )*
 *Vertex Softwares*



Will field type change require complete re-index?

2015-05-06 Thread Vishal Sharma
Hi,

I have been using Solr for some time now and by mistake I used String for my
date fields. The format of the string is like this: 2015-05-05T13:24:10Z


Now, if I need to change the field type from String to date, will this
require a complete reindex?



Vishal Sharma
Team Leader, SFDC
T: +1 302 753 5105
E: vish...@grazitti.com
www.grazitti.com


Solr with logstash solr_http output plugin and geoip filter

2015-05-06 Thread Daniel Marsh
Hi,

I'm currently using solr to index a moderate amount of information with
the help of logstash and the solr_http contrib output plugin.

solr is receiving documents, I've got banana as a web interface and I am
running it with a schemaless core.

I'm feeding documents via the contrib plugin solr_http and logstash. One
of the filters I'm using is geoip with the following setup:

  geoip {
    source => "subject_ip"
    database => "/opt/logstash/vendor/geoip/GeoLiteCity.dat"
    target => "geoip"
    fields => ["latitude", "longitude"]
  }

However this created a string field called geoip with the value:
{latitude=2.0, longitude=13.0, location=[2.0, 13.0]}

This is meant to become three sub fields:
geoip.latitude = 2.0
geoip.longitude = 13.0
geoip.location = 2.0, 13.0

The above setup worked with logstash feeding into elasticsearch,
resulting in geoip.location being populated correctly as a field itself.

Given that it did work with ES, I assume the first issue is that Solr either does
not know how to parse such a value into additional sub-fields with values, or I
simply have not configured Solr correctly (I'm betting on the latter).

I have only been using solr for about 8 hours (installed today), had to
try something as no amount of tweaking would resolve the indexing
performance issues I had with ES - I'm now indexing the same amount of
data into solr at near real-time on the exact same machine that was
running ES where indexing would stop after about 2 hours.

The whole point of the geoip field is to get geoip.location which will
be the location field used by bettermap on the banana web interface.

I am not running SiLK.
I am running solr 5.1, logstash 1.4.

Regards,
Daniel


Re: Multiple index.timestamp directories using up disk space

2015-05-06 Thread rishi
We use the following merge policy on SSDs and are running on physical
machines with a Linux OS.

<mergeFactor>10</mergeFactor>
<mergePolicy class="org.apache.lucene.index.TieredMergePolicy"/>
<mergeScheduler
  class="org.apache.lucene.index.ConcurrentMergeScheduler">
  <int name="maxThreadCount">3</int>
  <int name="maxMergeCount">15</int>
</mergeScheduler>
<ramBufferSizeMB>64</ramBufferSizeMB>

Not sure if it's very aggressive, but it's something we keep to prevent
deleted documents from taking up too much space in our index.

Is there some error message that Solr logs when the rename and deletion of the
directories fails? If so we could monitor our logs to get a better idea of
the root cause. At present we can only react when things go wrong, based on
disk space alarms.

Thanks,
Rishi.
 



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Multiple-index-timestamp-directories-using-up-disk-space-tp4201098p4204145.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: severe problems with soft and hard commits in a large index

2015-05-06 Thread adfel70
Thank you for the detailed answer.
How can I decrease the impact of opening a searcher in such a large index,
especially the impact of heap usage that causes OOMs?

regarding GC tuning - I am doing that.
here are the params I use:
AggressiveOpts
UseLargePages
ParallelRefProcEnabled
CMSParallelRemarkEnabled
CMSMaxAbortablePrecleanTime=6000
CMSTriggerPermRatio=80
CMSInitiatingOccupancyFraction=70
UseCMSInitiatingOccupancyOnly
CMSFullGCsBeforeCompaction=1
PretenureSizeThreshold=64m
CMSScavengeBeforeRemark
UseConcMarkSweepGC
MaxTenuringThreshold=8
TargetSurvivorRatio=90
SurvivorRatio=4
NewRatio=2
Xms16gb
Xmn28gb

any input on this?

How many documents per shard are recommended?
Note that I use nested documents. total collection size is 3 billion docs,
number of parent docs is 600 million. the rest are children.



Shawn Heisey-2 wrote
 On 5/6/2015 1:58 AM, adfel70 wrote:
 I have a cluster of 16 shards, 3 replicas. the cluster indexed nested
 documents.
 it currently has 3 billion documents overall (parent and children).
 each shard has around 200 million docs. size of each shard is 250GB.
 this runs on 12 machines. each machine has 4 SSD disks and 4 solr
 processes.
 each process has 28GB heap.  each machine has 196GB RAM.
 
 I perform periodic indexing throughout the day. each indexing cycle adds
 around 1.5 million docs. I keep the indexing load light - 2 processes
 with
 bulks of 20 docs.
 
 My use case demands that each indexing cycle will be visible only when
 the
 whole cycle finishes.
 
 I tried various methods of using soft and hard commits:
 
 I personally would configure autoCommit on a five minute (maxTime of
 300000) interval with openSearcher=false.  The use case you have
 outlined (not seeing changed until the indexing is done) demands that
 you do NOT turn on autoSoftCommit, that you do one manual commit at the
 end of indexing, which could be either a soft commit or a hard commit.
 I would recommend a soft commit.
 
 Because it is the openSearcher part of a commit that's very expensive,
 you can successfully do autoCommit with openSearcher=false on an
 interval like 10 or 15 seconds and not see much in the way of immediate
 performance loss.  That commit is still not free, not only in terms of
 resources, but in terms of java heap garbage generated.
 
 The general advice with commits is to do them as infrequently as you
 can, which applies to ANY commit, not just those that make changes
 visible.
 
 with all methods I encounter pretty much the same problem:
 1. heavy GCs when soft commit is performed (methods 1,2) or when
 hardcommit
 opensearcher=true is performed. these GCs cause heavy latency (average
 latency is 3 secs. latency during the problem is 80secs)
 2. if indexing cycles come too often, which causes softcommits or
 hardcommits(opensearcher=true) occur with a small interval one after
 another
 (around 5-10minutes), I start getting many OOM exceptions.
 
 If you're getting OOM, then either you need to change things so Solr
 requires less heap memory, or you need to increase the heap size.
 Changing things might be either the config or how you use Solr.
 
 Are you tuning your garbage collection?  With a 28GB heap, tuning is not
 optional.  It's so important that the startup scripts in 5.0 and 5.1
 include it, even though the default max heap is 512MB.
 
 Let's do some quick math on your memory.  You have four instances of
 Solr on each machine, each with a 28GB heap.  That's 112GB of memory
 allocated to Java.  With 196GB total, you have approximately 84GB of RAM
 left over for caching your index.
 
 A 16-shard index with three replicas means 48 cores.  Divide that by 12
 machines and that's 4 replicas on each server, presumably one in each
 Solr instance.  You say that the size of each shard is 250GB, so you've
 got about a terabyte of index on each server, but only 84GB of RAM for
 caching.
 
 Even with SSD, that's not going to be anywhere near enough cache memory
 for good Solr performance.
 
 All these memory issues, including GC tuning, are discussed on this wiki
 page:
 
 http://wiki.apache.org/solr/SolrPerformanceProblems
 
 One additional note: By my calculations, each filterCache entry will be
 at least 23MB in size.  This means that if you are using the filterCache
 and the G1 collector, you will not be able to avoid humongous
 allocations, which is any allocation larger than half the G1 region
 size.  The max configurable G1 region size is 32MB.  You should use the
 CMS collector for your GC tuning, not G1.  If you can reduce the number
 of documents in each shard, G1 might work well.
 
 Thanks,
 Shawn





--
View this message in context: 
http://lucene.472066.n3.nabble.com/severe-problems-with-soft-and-hard-commits-in-a-large-index-tp4204068p4204148.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: severe problems with soft and hard commits in a large index

2015-05-06 Thread adfel70
I don't see any of these.
I've seen them before in other clusters and uses of Solr, but I don't see any
of these messages here.



Dmitry Kan-2 wrote
 Do you seen any (a lot?) of the warming searchers on deck, i.e. value for
 N:
 
 PERFORMANCE WARNING: Overlapping onDeckSearchers=N
 
 On Wed, May 6, 2015 at 10:58 AM, adfel70 lt;

 adfel70@

 gt; wrote:
 
 Hello
 I have a cluster of 16 shards, 3 replicas. the cluster indexed nested
 documents.
 it currently has 3 billion documents overall (parent and children).
 each shard has around 200 million docs. size of each shard is 250GB.
 this runs on 12 machines. each machine has 4 SSD disks and 4 solr
 processes.
 each process has 28GB heap.  each machine has 196GB RAM.

 I perform periodic indexing throughout the day. each indexing cycle adds
 around 1.5 million docs. I keep the indexing load light - 2 processes
 with
 bulks of 20 docs.

 My use case demands that each indexing cycle will be visible only when
 the
 whole cycle finishes.

 I tried various methods of using soft and hard commits:

 1. using auto hard commit with time=10secs (opensearcher=false) and an
 explicit soft commit when the indexing finishes.
 2. using auto soft commit with time=10/30/60secs during the indexing.
 3. not using soft commit at all, just using auto hard commit with
 time=10secs during the indexing (opensearcher=false) and an explicit hard
 commit with opensearcher=true when the cycle finishes.


 with all methods I encounter pretty much the same problem:
 1. heavy GCs when soft commit is performed (methods 1,2) or when
 hardcommit
 opensearcher=true is performed. these GCs cause heavy latency (average
 latency is 3 secs. latency during the problem is 80secs)
 2. if indexing cycles come too often, which causes softcommits or
 hardcommits(opensearcher=true) occur with a small interval one after
 another
 (around 5-10minutes), I start getting many OOM exceptions.


 Thank you.



 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/severe-problems-with-soft-and-hard-commits-in-a-large-index-tp4204068.html
 Sent from the Solr - User mailing list archive at Nabble.com.

 
 
 
 -- 
 Dmitry Kan
 Luke Toolbox: http://github.com/DmitryKey/luke
 Blog: http://dmitrykan.blogspot.com
 Twitter: http://twitter.com/dmitrykan
 SemanticAnalyzer: www.semanticanalyzer.info





--
View this message in context: 
http://lucene.472066.n3.nabble.com/severe-problems-with-soft-and-hard-commits-in-a-large-index-tp4204068p4204123.html
Sent from the Solr - User mailing list archive at Nabble.com.


getting frequent CorruptIndexException and inconsistent data though core is active

2015-05-06 Thread adfel70
Hi
I'm getting org.apache.lucene.index.CorruptIndexException 
liveDocs.count()=2000699 info.docCount()=2047904 info.getDelCount()=47207
(filename=_ney_1g.del).

This just happened for the 4th time in 2 weeks.
each time this happens in another core, usually when a replica tries to
recover; it then reports that it succeeded, and then the
CorruptIndexException is thrown while trying to open a searcher.

this core is marked as active and thus queries can get redirected there,
and this causes data inconsistency for users.
this occurs with solr 4.10.3; it should be noted that I use nested docs.

ANOTHER problem is that replicas can get an inconsistent number of docs with no
exception being reported.
This usually occurs when one of the replicas goes down during indexing. what
I end up getting is the leader being on an older version than the replicas
or having fewer docs than the replicas. switching leaders (stopping the
leader so that another replica becomes the leader) does not fix the problem.

this occurs both in solr 4.10.3 and in solr 4.8





--
View this message in context: 
http://lucene.472066.n3.nabble.com/getting-frequent-CorruptIndexException-and-inconsistent-data-though-core-is-active-tp4204129.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: /suggest through SolrJ?

2015-05-06 Thread Alessandro Benedetti
Exactly Tommaso,
I was referring to that!

I wrote another mail to the dev mailing list; I will open a Jira issue for
that!

Cheers

2015-04-29 12:16 GMT+01:00 Tommaso Teofili tommaso.teof...@gmail.com:

 2015-04-27 19:22 GMT+02:00 Alessandro Benedetti 
 benedetti.ale...@gmail.com
 :

  Just had the very same problem, and I confirm that currently is quite a
  mess to manage suggestions in SolrJ !
  I have to go with manual Json parsing.
 

 or very not nice NamedList API mess (see an example in JR Oak [1][2]).

 Regards,
 Tommaso

 p.s.:
 note that this applies to Solr 4.7.1 API, but reading the thread it seems
 the problem is still there.

 [1] :

 https://github.com/apache/jackrabbit-oak/blob/trunk/oak-solr-core/src/main/java/org/apache/jackrabbit/oak/plugins/index/solr/query/SolrQueryIndex.java#L318
 [2] :

 https://github.com/apache/jackrabbit-oak/blob/trunk/oak-solr-core/src/main/java/org/apache/jackrabbit/oak/plugins/index/solr/query/SolrQueryIndex.java#L370



 
  Cheers
 
  2015-02-02 12:17 GMT+00:00 Jan Høydahl jan@cominvent.com:
 
   Using the /suggest handler wired to SuggestComponent, the
   SpellCheckResponse objects are not populated.
   Reason is that QueryResponse looks for a top-level element named
   "spellcheck":

     else if ( "spellcheck".equals( n ) )  {
       _spellInfo = (NamedList<Object>) res.getVal( i );
       extractSpellCheckInfo( _spellInfo );
     }
  
   Earlier the suggester was the same as the Spell component, but now with
   its own component, suggestions are put in suggest.
  
   I think we're lacking a SuggestResponse.java for parsing suggest
   responses..??
  
   --
   Jan Høydahl, search solution architect
   Cominvent AS - www.cominvent.com
  
26. sep. 2014 kl. 07.27 skrev Clemens Wyss DEV clemens...@mysign.ch
 :
   
Thx to you two.
   
Just in case anybody else is trying to do this. The following SolrJ
   code corresponds to the http request
GET http://localhost:8983/solr/solrpedia/suggest?q=atmo
of "Solr in Action" (chapter 10):
...
SolrServer server = new HttpSolrServer( "http://localhost:8983/solr/solrpedia" );
SolrQuery query = new SolrQuery( "atmo" );
query.setRequestHandler( "/suggest" );
QueryResponse queryresponse = server.query( query );
...
queryresponse.getSpellCheckResponse().getSuggestions();
...
   
   
-Ursprüngliche Nachricht-
Von: Shawn Heisey [mailto:s...@elyograg.org]
Gesendet: Donnerstag, 25. September 2014 17:37
An: solr-user@lucene.apache.org
Betreff: Re: /suggest through SolrJ?
   
On 9/25/2014 8:43 AM, Erick Erickson wrote:
You can call anything from SolrJ that you can call from a URL.
SolrJ has lots of convenience stuff to set particular parameters,
parse the response, etc... But in the end it's communicating with
 Solr
via a URL.
   
Take a look at something like SolrQuery for instance. It has a nice
command setFacetPrefix. Here's the entire method:
   
public SolrQuery setFacetPrefix( String field, String prefix ) {
   this.set( FacetParams.FACET_PREFIX, prefix );
   return this;
}
   
which is really
   this.set( "facet.prefix", prefix ); All it's really doing is
setting a SolrParams key/value pair which is equivalent to
facet.prefix=blahblah on a URL.
   
As I remember, there's a setPath method that you can use to set
 the
destination for the request to suggest (or maybe /suggest). It's
something like that.
   
Yes, like Erick says, just use SolrQuery for most accesses to Solr on
   arbitrary URL paths with arbitrary URL parameters.  The set method is
  how
   you include those parameters.
   
The SolrQuery method Erick was talking about at the end of his email
 is
   setRequestHandler(String), and you would set that to /suggest.  Full
   disclosure about what this method actually does: it also sets the qt
parameter, but with the modern example Solr config, the qt parameter
   doesn't do anything -- you must actually change the URL path on the
   request, which this method will do if the value starts with a forward
  slash.
   
Thanks,
Shawn
   
  
  
 
 
  --
  --
 
  Benedetti Alessandro
  Visiting card : http://about.me/alessandro_benedetti
 
  Tyger, tyger burning bright
  In the forests of the night,
  What immortal hand or eye
  Could frame thy fearful symmetry?
 
  William Blake - Songs of Experience -1794 England
 




-- 
--

Benedetti Alessandro
Visiting card : http://about.me/alessandro_benedetti

Tyger, tyger burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry?

William Blake - Songs of Experience -1794 England


Re: Solr not getting Start. Error : Could not find the main class: org.apache.solr.util.SolrCLI

2015-05-06 Thread Shawn Heisey
On 5/6/2015 6:37 AM, Markus Heiden wrote:
 UnsupportedClassVersionError means you have an old JDK. Use a more recent
 one.

Specifically, "Unsupported major.minor version 51.0" means you are
trying to use Java 6 (1.6.0) to run a program that requires Java 7
(1.7.0).  Solr 4.8 and later (including the 5.x versions) requires Java 7.

If you're looking for the absolute minimum requirements, you only need
the JRE, not the JDK.

Thanks,
Shawn



Re: Finding out optimal hash ranges for shard split

2015-05-06 Thread anand.mahajan
Yes - I'm using 2-level composite ids and that has caused the imbalance for
some shards.
It's car data, and the composite ids are of the form year+make!model!a-couple-
of-other-specifications, e.g. 2013Ford!Edge!123456 - but there are
just far too many 2013 or 2011 Ford cars that go and occupy the same shards.
This was done because co-location of these docs is required for a few
of the search requirements - to avoid hitting all shards all the time - and
all queries always have the year and make combination specified, so it is
easier to work out the target shard for the query.

Regarding storing the hash against each document and then querying to find
out the optimal ranges - could it be done so that Solr maintains incremental
counters for each hash in the range for the shard, and the
Collections SPLITSHARD API could then use this internally to propose the
optimal shard ranges for the split?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Finding-out-optimal-hash-ranges-for-shard-split-tp4203609p4204124.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Will field type change require complete re-index?

2015-05-06 Thread Shawn Heisey
On 5/6/2015 7:03 AM, Vishal Sharma wrote:
 Now, If I need to change the field type to date from String will this
 require complete reindex?

Yes, it absolutely will require a complete reindex.  A change like that
probably will result in errors on queries until a reindex is done.  You
may even need to completely delete the index directory and restart Solr
before doing your reindex to get rid of the old segments that have
information incompatible with your new schema.

http://wiki.apache.org/solr/HowToReindex

Thanks,
Shawn



Re: severe problems with soft and hard commits in a large index

2015-05-06 Thread Shawn Heisey
On 5/6/2015 1:58 AM, adfel70 wrote:
 I have a cluster of 16 shards, 3 replicas. the cluster indexed nested
 documents.
 it currently has 3 billion documents overall (parent and children).
 each shard has around 200 million docs. size of each shard is 250GB.
 this runs on 12 machines. each machine has 4 SSD disks and 4 solr processes.
 each process has 28GB heap.  each machine has 196GB RAM.
 
 I perform periodic indexing throughout the day. each indexing cycle adds
 around 1.5 million docs. I keep the indexing load light - 2 processes with
 bulks of 20 docs.
 
 My use case demands that each indexing cycle will be visible only when the
 whole cycle finishes.
 
 I tried various methods of using soft and hard commits:

I personally would configure autoCommit on a five minute (maxTime of
300000) interval with openSearcher=false.  The use case you have
outlined (not seeing changes until the indexing is done) demands that
you do NOT turn on autoSoftCommit, that you do one manual commit at the
end of indexing, which could be either a soft commit or a hard commit.
I would recommend a soft commit.

Because it is the openSearcher part of a commit that's very expensive,
you can successfully do autoCommit with openSearcher=false on an
interval like 10 or 15 seconds and not see much in the way of immediate
performance loss.  That commit is still not free, not only in terms of
resources, but in terms of java heap garbage generated.

The general advice with commits is to do them as infrequently as you
can, which applies to ANY commit, not just those that make changes visible.
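
As a sketch, the solrconfig.xml piece of what I'm describing would look like
this (the commit at the end of each indexing cycle is still issued explicitly
by your indexing job):

<autoCommit>
  <maxTime>300000</maxTime>
  <openSearcher>false</openSearcher>
</autoCommit>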

 with all methods I encounter pretty much the same problem:
 1. heavy GCs when soft commit is performed (methods 1,2) or when hardcommit
 opensearcher=true is performed. these GCs cause heavy latency (average
 latency is 3 secs. latency during the problem is 80secs)
 2. if indexing cycles come too often, which causes softcommits or
 hardcommits(opensearcher=true) occur with a small interval one after another
 (around 5-10minutes), I start getting many OOM exceptions.

If you're getting OOM, then either you need to change things so Solr
requires less heap memory, or you need to increase the heap size.
Changing things might be either the config or how you use Solr.

Are you tuning your garbage collection?  With a 28GB heap, tuning is not
optional.  It's so important that the startup scripts in 5.0 and 5.1
include it, even though the default max heap is 512MB.

Let's do some quick math on your memory.  You have four instances of
Solr on each machine, each with a 28GB heap.  That's 112GB of memory
allocated to Java.  With 196GB total, you have approximately 84GB of RAM
left over for caching your index.

A 16-shard index with three replicas means 48 cores.  Divide that by 12
machines and that's 4 replicas on each server, presumably one in each
Solr instance.  You say that the size of each shard is 250GB, so you've
got about a terabyte of index on each server, but only 84GB of RAM for
caching.

Even with SSD, that's not going to be anywhere near enough cache memory
for good Solr performance.

All these memory issues, including GC tuning, are discussed on this wiki
page:

http://wiki.apache.org/solr/SolrPerformanceProblems

One additional note: By my calculations, each filterCache entry will be
at least 23MB in size (a filterCache entry is a bitset with one bit per
document, so 200 million docs / 8 bits per byte is roughly 24MB).  This
means that if you are using the filterCache and the G1 collector, you will
not be able to avoid humongous allocations, which is any allocation larger
than half the G1 region size.  The max configurable G1 region size is 32MB.
You should use the CMS collector for your GC tuning, not G1.  If you can
reduce the number of documents in each shard, G1 might work well.

Thanks,
Shawn



Trying to get AnalyzingInfixSuggester to work in Solr?

2015-05-06 Thread O. Olson
I'm trying to get the AnalyzingInfixSuggester to work but I'm not successful.
I'd be grateful if someone can point me to a working example. 

Problem:
My content is product descriptions similar to a BestBuy or NewEgg catalog.
My problem is that I'm getting only single words in the suggester results.
E.g. if I type 'len', I get the suggester results like 'Lenovo' but not
'Lenovo laptop' or something larger/longer than a single word. 

There is a suggestion here:
http://blog.mikemccandless.com/2013/06/a-new-lucene-suggester-based-on-infix.html
that the search at:
http://jirasearch.mikemccandless.com/search.py?index=jira is powered by the
AnalyzingInfixSuggester. If this is true, then when I use that site's suggester
I get results of more than a single word, but I don't with my setup,
i.e. on my setup I get only single words. My configuration is:


<searchComponent class="solr.SpellCheckComponent" name="suggest">
  <lst name="spellchecker">
    <str name="name">suggest</str>
    <str name="classname">org.apache.solr.spelling.suggest.Suggester</str>
    <str name="lookupImpl">org.apache.solr.spelling.suggest.fst.AnalyzingInfixLookupFactory</str>
    <str name="field">text</str>
    <float name="threshold">0.005</float>
    <str name="buildOnCommit">true</str>
    <str name="suggestAnalyzerFieldType">text_general</str>
    <bool name="exactMatchFirst">true</bool>
  </lst>
</searchComponent>

<requestHandler class="org.apache.solr.handler.component.SearchHandler"
  name="/suggest">
  <lst name="defaults">
    <str name="spellcheck">true</str>
    <str name="spellcheck.dictionary">suggest</str>
    <str name="spellcheck.onlyMorePopular">true</str>
    <str name="spellcheck.count">5</str>
    <str name="spellcheck.collate">true</str>
  </lst>
  <arr name="components">
    <str>suggest</str>
  </arr>
</requestHandler>

I copy the contents of all of my fields to a single field called 'text'. The
'text_general' type is exactly as in the solr examples:
http://svn.apache.org/viewvc/lucene/dev/trunk/solr/example/example-DIH/solr/db/conf/schema.xml?view=markup
 

I'd be grateful if anyone can help me. I don't know what to look at. Thank
you in advance.

O. O.





--
View this message in context: 
http://lucene.472066.n3.nabble.com/Trying-to-get-AnalyzingInfixSuggester-to-work-in-Solr-tp4204163.html
Sent from the Solr - User mailing list archive at Nabble.com.


Completion Suggester in Solr

2015-05-06 Thread Pradeep Bhattiprolu
Hi

Is there an equivalent of ElasticSearch's Completion Suggester in Solr?

I use both Solr and ES, in different projects.

I am not able to find a solution in Solr where I can use:

1) an FSA structure
2) multiple terms as synonyms
3) a weight assigned to each document based on certain heuristics, e.g.
popularity score, user search history etc.


Any kind of help, or pointers to relevant examples and documentation, is
highly appreciated.

thanks in advance.

Pradeep


A defect in Schema API with Add a New Copy Field Rule?

2015-05-06 Thread Steven White
Hi Everyone,

I am using the Schema API to add a new copy field per:
https://cwiki.apache.org/confluence/display/solr/Schema+API#SchemaAPI-AddaNewCopyFieldRule

Unlike the other Add APIs, this one will not fail if you add an existing
copy field object.  In fact, when I call the API over and over, the
entry appears over and over in the schema.xml file, like so:

  <copyField source="author" dest="text"/>
  <copyField source="author" dest="text"/>
  <copyField source="author" dest="text"/>
  <copyField source="author" dest="text"/>

Is this the expected behaviour or a bug?  As a side question, is there any
harm in having multiple copyField entries like I ended up with?

A final question: why is there no "Replace a Copy Field"?  Is this by design,
due to some limitation, or was the API just never implemented?

Thanks

Steve


5.1.0 Heatmap + Geotools

2015-05-06 Thread Joseph Obernberger
Hi - I'm very interested in the new heatmap capability of Solr 5.1.0.  
Has anyone looked at combining GeoTools' HeatmapProcess method with this 
data?  I'm trying this now, but I keep getting an empty image from the 
GridCoverage2D object.

Any pointers/tips?
Thank you!

-Joe


Re: Trying to get AnalyzingInfixSuggester to work in Solr?

2015-05-06 Thread Rajesh Hazari
Yes, textSuggest is of type text_general, with the definition below:

<fieldType name="text_general" class="solr.TextField"
 positionIncrementGap="100" sortMissingLast="true" omitNorms="true">
  <analyzer type="index">
    <tokenizer class="solr.ClassicTokenizerFactory"/>
    <filter class="solr.ClassicFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.KeywordMarkerFilterFactory"
     protected="protwords.txt"/>
    <filter class="solr.ShingleFilterFactory" maxShingleSize="5"
     outputUnigrams="true"/>
  </analyzer>
  <analyzer type="query">
    <charFilter class="solr.MappingCharFilterFactory"
     mapping="mapping-FoldToASCII.txt"/>
    <tokenizer class="solr.ClassicTokenizerFactory"/>
    <filter class="solr.ClassicFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.KeywordMarkerFilterFactory"
     protected="protwords.txt"/>
    <filter class="solr.ShingleFilterFactory" maxShingleSize="5"
     outputUnigrams="true"/>
  </analyzer>
</fieldType>

*Rajesh.*

On Wed, May 6, 2015 at 4:50 PM, O. Olson olson_...@yahoo.it wrote:

 Thank you Rajesh for responding so quickly. I tried it again with a restart
 and a reimport and I still cannot get this to work i.e. I'm seeing no
 difference.

 I'm wondering how you define: 'textSuggest' in your schema? In my case I
 use
 the field 'text' that is defined as:

  <field name="text" type="text_general" indexed="true" stored="false"
  multiValued="true"/>

 I'm wondering if your 'textSuggest' is of type text_general ?

 Thank you again for your help
 O. O.


 Rajesh Hazari wrote
  I just tested your config with my schema and it worked.
 
  my config :
 
  <searchComponent class="solr.SpellCheckComponent" name="suggest1">
    <lst name="spellchecker">
      <str name="name">suggest</str>
      <str name="classname">org.apache.solr.spelling.suggest.Suggester</str>
      <str name="lookupImpl">org.apache.solr.spelling.suggest.fst.AnalyzingInfixLookupFactory</str>
      <str name="field">textSuggest</str>
      <float name="threshold">0.005</float>
      <str name="buildOnCommit">true</str>
      <str name="suggestAnalyzerFieldType">text_general</str>
      <bool name="exactMatchFirst">true</bool>
    </lst>
  </searchComponent>

  <queryConverter name="queryConverter"
    class="org.apache.solr.spelling.SuggestQueryConverter"/>

  <requestHandler class="org.apache.solr.handler.component.SearchHandler"
    name="/suggest1">
    <lst name="defaults">
      <str name="spellcheck">true</str>
      <str name="spellcheck.dictionary">suggest</str>
      <str name="spellcheck.onlyMorePopular">true</str>
      <str name="spellcheck.count">5</str>
      <str name="spellcheck.collate">true</str>
    </lst>
    <arr name="components">
      <str>suggest1</str>
    </arr>
  </requestHandler>

  http://localhost:8585/solr/collection1/suggest1?q=apple&rows=10&wt=json&indent=true

  {
    "responseHeader":{
      "status":0,
      "QTime":2},
    "spellcheck":{
      "suggestions":[
        "apple",{
          "numFound":5,
          "startOffset":0,
          "endOffset":5,
          "suggestion":[
            "apple",
            "apple and",
            "apple and facebook",
            "apple and facebook learn",
            "apple and facebook learn from"]},
        "collation","apple"]}}

  Rajesh.





 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/Trying-to-get-AnalyzingInfixSuggester-to-work-in-Solr-tp4204163p4204208.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Re: Trying to get AnalyzingInfixSuggester to work in Solr?

2015-05-06 Thread O. Olson
Thank you Rajesh. I think I got a bit of help from the answer at:
http://stackoverflow.com/a/29743945

While that example sort of worked for me, I've not had the time to test what
works and what didn't.

So far I have found that I need the field in my searchComponent to be of
type 'string'. In my original example I had this as text_general. Next I
used the suggest_string fieldType as defined in the StackOverflow answer. I
also removed your queryConverter, and it still works, so I think it's not
needed.

Thank you very much,
O. O. 



Rajesh Hazari wrote
 I just tested your config with my schema and it worked.
 
 my config :
   
 searchComponent class=solr.SpellCheckComponent name=suggest1
 
 lst name=spellchecker
   
 str name=name
 suggest
 /str
   
 str name=classname
 org.apache.solr.spelling.suggest.Suggester
 /str
   
 str
 name=lookupImpl
 org.apache.solr.spelling.suggest.fst.AnalyzingInfixLookupFactory
 /str
   
 str name=field
 textSuggest
 /str
   
 float name=threshold
 0.005
 /float
   
 str name=buildOnCommit
 true
 /str
   
 str name=suggestAnalyzerFieldType
 text_general
 /str
   
 bool name=exactMatchFirst
 true
 /bool
 
 /lst
   
 /searchComponent
 
 queryConverter name=queryConverter
 class=org.apache.solr.spelling.SuggestQueryConverter/
   
 requestHandler class=org.apache.solr.handler.component.SearchHandler
 name=/suggest1
 
 lst name=defaults
   
 str name=spellcheck
 true
 /str
   
 str name=spellcheck.dictionary
 suggest
 /str
   
 str name=spellcheck.onlyMorePopular
 true
 /str
   
 str name=spellcheck.count
 5
 /str
   
 str name=spellcheck.collate
 true
 /str
 
 /lst
 
 arr name=components
   
 str
 suggest1
 /str
 
 /arr
   
 /requestHandler
 
 http://localhost:8585/solr/collection1/suggest1?q=applerows=10wt=jsonindent=true
 
 {
   responseHeader:{
 status:0,
 QTime:2},
   spellcheck:{
 suggestions:[
   apple,{
 numFound:5,
 startOffset:0,
 endOffset:5,
 suggestion:[
*
 apple
*
 ,
   
*
 apple
*
  and,
   
*
 apple
*
  and facebook,
   
*
 apple
*
  and facebook learn,
   
*
 apple
*
  and facebook learn from]},
   collation,
*
 apple
*
 ]}}
 
 
 
 *Rajesh**.*





--
View this message in context: 
http://lucene.472066.n3.nabble.com/Trying-to-get-AnalyzingInfixSuggester-to-work-in-Solr-tp4204163p4204222.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Trying to get AnalyzingInfixSuggester to work in Solr?

2015-05-06 Thread Erick Erickson
Have you seen this? I tried to make something end-to-end with assorted
gotchas identified

 Best,
Erick

On Wed, May 6, 2015 at 3:09 PM, O. Olson olson_...@yahoo.it wrote:
 Thank you Rajesh. I think I got a bit of help from the answer at:
 http://stackoverflow.com/a/29743945

 While that example sort of worked for me, I've not had the time to test what
 works and what didn't.

 So far I have found that the field in my searchComponent needs to be of
 type 'string'. In my original example I had this as text_general. Next I
 used the suggest_string fieldType as defined in the StackOverflow answer. I
 also removed your queryConverter, and it still works, so I think it's not
 needed.

 Thank you very much,
 O. O.



 Rajesh Hazari wrote
 I just tested your config with my schema and it worked.

 my config :

 searchComponent class=solr.SpellCheckComponent name=suggest1
   lst name=spellchecker
     str name=namesuggest/str
     str name=classnameorg.apache.solr.spelling.suggest.Suggester/str
     str name=lookupImplorg.apache.solr.spelling.suggest.fst.AnalyzingInfixLookupFactory/str
     str name=fieldtextSuggest/str
     float name=threshold0.005/float
     str name=buildOnCommittrue/str
     str name=suggestAnalyzerFieldTypetext_general/str
     bool name=exactMatchFirsttrue/bool
   /lst
 /searchComponent

 queryConverter name=queryConverter class=org.apache.solr.spelling.SuggestQueryConverter/

 requestHandler class=org.apache.solr.handler.component.SearchHandler name=/suggest1
   lst name=defaults
     str name=spellchecktrue/str
     str name=spellcheck.dictionarysuggest/str
     str name=spellcheck.onlyMorePopulartrue/str
     str name=spellcheck.count5/str
     str name=spellcheck.collatetrue/str
   /lst
   arr name=components
     strsuggest1/str
   /arr
 /requestHandler

 http://localhost:8585/solr/collection1/suggest1?q=apple&rows=10&wt=json&indent=true

 {
   responseHeader:{
 status:0,
 QTime:2},
   spellcheck:{
 suggestions:[
   apple,{
 numFound:5,
 startOffset:0,
 endOffset:5,
 suggestion:[*apple*,
   *apple* and,
   *apple* and facebook,
   *apple* and facebook learn,
   *apple* and facebook learn from]},
 collation,*apple*]}}



 *Rajesh**.*





 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/Trying-to-get-AnalyzingInfixSuggester-to-work-in-Solr-tp4204163p4204222.html
 Sent from the Solr - User mailing list archive at Nabble.com.


Re: solr 3.6.2 under tomcat 8 missing corename in path

2015-05-06 Thread Shawn Heisey
On 5/6/2015 2:29 PM, Tim Dunphy wrote:
 I'm trying to setup an old version of Solr for one of our drupal
 developers. Apparently only versions 1.x or 3.x will work with the current
 version of drupal.
 
 I'm setting up solr 3.6.2 under tomcat.
 
 And I'm getting this error when I start tomcat and surf to the /solr/admin
 URL:
 
  HTTP Status 404 - missing core name in path
 
 type Status report
 
 message missing core name in path
 
 description The requested resource is not available.

The URL must include the core name.  Your defaultCoreName is
collection1, and I'm guessing you don't have a core named collection1.

Try browsing to just /solr instead of /solr/admin ... you should get a
list of links for valid cores, each of which will take you to the admin
page for that core.

Probably what you will find is that when you click on one of those
links, you will end up on /solr/corename/admin.jsp as the URL in your
browser.

Thanks,
Shawn



solr 3.6.2 under tomcat 8 missing corename in path

2015-05-06 Thread Tim Dunphy
I'm trying to setup an old version of Solr for one of our drupal
developers. Apparently only versions 1.x or 3.x will work with the current
version of drupal.

I'm setting up solr 3.6.2 under tomcat.

And I'm getting this error when I start tomcat and surf to the /solr/admin
URL:

 HTTP Status 404 - missing core name in path

type Status report

message missing core name in path

description The requested resource is not available.

I have solr living in /opt:

# ls -ld /opt/solr
lrwxrwxrwx. 1 root root 17 May  6 12:48 /opt/solr -> apache-solr-3.6.2

And I have my cores located here:

# ls -ld /opt/solr/admin/cores
drwxr-xr-x. 3 root root 4096 May  6 14:37 /opt/solr/admin/cores

Just one core so far, until I can get this working.

# ls -l /opt/solr/admin/cores/
total 4
drwxr-xr-x. 5 root root 4096 May  6 14:08 collection1

I have this as my solr.xml file:

solr persistent=false
  cores adminPath=/admin/cores defaultCoreName=collection1
 core name=collection1 instanceDir=collection1 /
   /cores
/solr

Which is located in these two places:

# ls -l /opt/solr/solr.xml /usr/local/tomcat/conf/Catalina/solr.xml
-rw-r--r--. 1 root root 169 May  6 14:38 /opt/solr/solr.xml
-rw-r--r--. 1 root root 169 May  6 14:38
/usr/local/tomcat/conf/Catalina/solr.xml

These are the contents of my /opt/solr directory

# ls -l  /opt/solr/
total 436
drwxr-xr-x.  3 root root   4096 May  6 14:37 admin
-rw-r--r--.  1 root root 176647 Dec 18  2012 CHANGES.txt
drwxr-xr-x.  3 root root   4096 May  6 12:48 client
drwxr-xr-x.  9 root root   4096 Dec 18  2012 contrib
drwxr-xr-x.  3 root root   4096 May  6 12:48 dist
drwxr-xr-x.  3 root root   4096 May  6 12:48 docs
-rw-r--r--.  1 root root   1274 May  6 13:28 elevate.xml
drwxr-xr-x. 11 root root   4096 May  6 12:48 example
-rw-r--r--.  1 root root  81331 Dec 18  2012 LICENSE.txt
-rw-r--r--.  1 root root  20828 Dec 18  2012 NOTICE.txt
-rw-r--r--.  1 root root   5270 Dec 18  2012 README.txt
-rw-r--r--.  1 root root  55644 May  6 13:27 schema.xml
-rw-r--r--.  1 root root  60884 May  6 13:27 solrconfig.xml
-rw-r--r--.  1 root root    169 May  6 14:38 solr.xml


Yet, when I bounce tomcat, this is the result that I get:

HTTP Status 404 - missing core name in path

type Status report

message missing core name in path

description The requested resource is not available.

Can anyone tell me what I'm doing wrong?


Thanks!!
Tim


-- 
GPG me!!

gpg --keyserver pool.sks-keyservers.net --recv-keys F186197B


Re: Trying to get AnalyzingInfixSuggester to work in Solr?

2015-05-06 Thread O. Olson
Thank you Rajesh for responding so quickly. I tried it again with a restart
and a reimport and I still cannot get this to work, i.e. I'm seeing no
difference. 

I'm wondering how you define 'textSuggest' in your schema. In my case I use
the field 'text', which is defined as: 

field name=text type=text_general indexed=true stored=false
multiValued=true/

I'm wondering if your 'textSuggest' is of type text_general ?

Thank you again for your help
O. O.


Rajesh Hazari wrote
 I just tested your config with my schema and it worked.
 
 my config :
   
 searchComponent class=solr.SpellCheckComponent name=suggest1
   lst name=spellchecker
     str name=namesuggest/str
     str name=classnameorg.apache.solr.spelling.suggest.Suggester/str
     str name=lookupImplorg.apache.solr.spelling.suggest.fst.AnalyzingInfixLookupFactory/str
     str name=fieldtextSuggest/str
     float name=threshold0.005/float
     str name=buildOnCommittrue/str
     str name=suggestAnalyzerFieldTypetext_general/str
     bool name=exactMatchFirsttrue/bool
   /lst
 /searchComponent

 queryConverter name=queryConverter class=org.apache.solr.spelling.SuggestQueryConverter/

 requestHandler class=org.apache.solr.handler.component.SearchHandler name=/suggest1
   lst name=defaults
     str name=spellchecktrue/str
     str name=spellcheck.dictionarysuggest/str
     str name=spellcheck.onlyMorePopulartrue/str
     str name=spellcheck.count5/str
     str name=spellcheck.collatetrue/str
   /lst
   arr name=components
     strsuggest1/str
   /arr
 /requestHandler
 
 http://localhost:8585/solr/collection1/suggest1?q=apple&rows=10&wt=json&indent=true
 
 {
   responseHeader:{
 status:0,
 QTime:2},
   spellcheck:{
 suggestions:[
   apple,{
 numFound:5,
 startOffset:0,
 endOffset:5,
 suggestion:[*apple*,
   *apple* and,
   *apple* and facebook,
   *apple* and facebook learn,
   *apple* and facebook learn from]},
 collation,*apple*]}}
 
 
 
 *Rajesh**.*





--
View this message in context: 
http://lucene.472066.n3.nabble.com/Trying-to-get-AnalyzingInfixSuggester-to-work-in-Solr-tp4204163p4204208.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Trying to get AnalyzingInfixSuggester to work in Solr?

2015-05-06 Thread Rajesh Hazari
Just add the queryConverter definition in your solr config and you should
see multiple-term suggestions.
Also make sure you have ShingleFilterFactory as one of the filters in
your schema field definition for your field type text_general, e.g.:

filter class=solr.ShingleFilterFactory maxShingleSize=5
outputUnigrams=true/
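
For reference, a hedged sketch of where that filter sits in an analyzer
chain (the field type name and the rest of the chain are illustrative, not
copied from either of our schemas):

<fieldType name="text_shingle" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- emits word n-grams up to 5 tokens long, plus the single tokens themselves -->
    <filter class="solr.ShingleFilterFactory" maxShingleSize="5" outputUnigrams="true"/>
  </analyzer>
</fieldType>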


*Rajesh**.*

On Wed, May 6, 2015 at 1:47 PM, O. Olson olson_...@yahoo.it wrote:

 Thank you Rajesh. I'm not familiar with the queryConverter. How do you wire
 it up to the rest of the setup? Right now, I just put it between the
 SpellCheckComponent and the RequestHandler i.e. my config is as:

 searchComponent class=solr.SpellCheckComponent name=suggest
 lst name=spellchecker
   str name=namesuggest/str
   str
 name=classnameorg.apache.solr.spelling.suggest.Suggester/str
   str

 name=lookupImplorg.apache.solr.spelling.suggest.fst.AnalyzingInfixLookupFactory/str
   str name=fieldtext/str
   float name=threshold0.005/float
   str name=buildOnCommittrue/str
   str name=suggestAnalyzerFieldTypetext_general/str
   bool name=exactMatchFirsttrue/bool
 /lst
   /searchComponent

   queryConverter name=queryConverter
 class=org.apache.solr.spelling.SuggestQueryConverter/

   requestHandler class=org.apache.solr.handler.component.SearchHandler
 name=/suggest
 lst name=defaults
   str name=spellchecktrue/str
   str name=spellcheck.dictionarysuggest/str
   str name=spellcheck.onlyMorePopulartrue/str
   str name=spellcheck.count5/str
   str name=spellcheck.collatetrue/str
 /lst
 arr name=components
   strsuggest/str
 /arr
   /requestHandler

 Is this correct? I do not see any difference in my results i.e. the
 suggestions are the same as before.
 O. O.





 Rajesh Hazari wrote
  make sure you have this query converter defined in your config
  queryConverter name=queryConverter
  class=org.apache.solr.spelling.SuggestQueryConverter/
  *Thanks,*
  *Rajesh**.*





 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/Trying-to-get-AnalyzingInfixSuggester-to-work-in-Solr-tp4204163p4204173.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Re: Trying to get AnalyzingInfixSuggester to work in Solr?

2015-05-06 Thread Rajesh Hazari
I just tested your config with my schema and it worked.

my config :
  searchComponent class=solr.SpellCheckComponent name=suggest1
lst name=spellchecker
  str name=namesuggest/str
  str name=classnameorg.apache.solr.spelling.suggest.Suggester/str
  str
name=lookupImplorg.apache.solr.spelling.suggest.fst.AnalyzingInfixLookupFactory/str
  str name=fieldtextSuggest/str
  float name=threshold0.005/float
  str name=buildOnCommittrue/str
  str name=suggestAnalyzerFieldTypetext_general/str
  bool name=exactMatchFirsttrue/bool
/lst
  /searchComponent

queryConverter name=queryConverter
class=org.apache.solr.spelling.SuggestQueryConverter/

  requestHandler class=org.apache.solr.handler.component.SearchHandler
name=/suggest1
lst name=defaults
  str name=spellchecktrue/str
  str name=spellcheck.dictionarysuggest/str
  str name=spellcheck.onlyMorePopulartrue/str
  str name=spellcheck.count5/str
  str name=spellcheck.collatetrue/str
/lst
arr name=components
  strsuggest1/str
/arr
  /requestHandler


http://localhost:8585/solr/collection1/suggest1?q=apple&rows=10&wt=json&indent=true

{
  responseHeader:{
status:0,
QTime:2},
  spellcheck:{
suggestions:[
  apple,{
numFound:5,
startOffset:0,
endOffset:5,
suggestion:[bapple/b,
  bapple/b and,
  bapple/b and facebook,
  bapple/b and facebook learn,
  bapple/b and facebook learn from]},
  collation,bapple/b]}}



*Rajesh**.*

On Wed, May 6, 2015 at 2:48 PM, Rajesh Hazari rajeshhaz...@gmail.com
wrote:

 Just add the queryConverter definition in your solr config and you should
 see multiple-term suggestions.
 Also make sure you have ShingleFilterFactory as one of the filters in
 your schema field definition for your field type text_general.

 filter class=solr.ShingleFilterFactory maxShingleSize=5
 outputUnigrams=true/


 *Rajesh**.*

 On Wed, May 6, 2015 at 1:47 PM, O. Olson olson_...@yahoo.it wrote:

 Thank you Rajesh. I'm not familiar with the queryConverter. How do you
 wire
 it up to the rest of the setup? Right now, I just put it between the
 SpellCheckComponent and the RequestHandler i.e. my config is as:

 searchComponent class=solr.SpellCheckComponent name=suggest
 lst name=spellchecker
   str name=namesuggest/str
   str
 name=classnameorg.apache.solr.spelling.suggest.Suggester/str
   str

 name=lookupImplorg.apache.solr.spelling.suggest.fst.AnalyzingInfixLookupFactory/str
   str name=fieldtext/str
   float name=threshold0.005/float
   str name=buildOnCommittrue/str
   str name=suggestAnalyzerFieldTypetext_general/str
   bool name=exactMatchFirsttrue/bool
 /lst
   /searchComponent

   queryConverter name=queryConverter
 class=org.apache.solr.spelling.SuggestQueryConverter/

   requestHandler class=org.apache.solr.handler.component.SearchHandler
 name=/suggest
 lst name=defaults
   str name=spellchecktrue/str
   str name=spellcheck.dictionarysuggest/str
   str name=spellcheck.onlyMorePopulartrue/str
   str name=spellcheck.count5/str
   str name=spellcheck.collatetrue/str
 /lst
 arr name=components
   strsuggest/str
 /arr
   /requestHandler

 Is this correct? I do not see any difference in my results i.e.
 the
 suggestions are the same as before.
 O. O.





 Rajesh Hazari wrote
  make sure you have this query converter defined in your config
  queryConverter name=queryConverter
  class=org.apache.solr.spelling.SuggestQueryConverter/
  *Thanks,*
  *Rajesh**.*





 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/Trying-to-get-AnalyzingInfixSuggester-to-work-in-Solr-tp4204163p4204173.html
 Sent from the Solr - User mailing list archive at Nabble.com.





Re: Trying to get AnalyzingInfixSuggester to work in Solr?

2015-05-06 Thread Rajesh Hazari
make sure you have this query converter defined in your config

queryConverter name=queryConverter
class=org.apache.solr.spelling.SuggestQueryConverter/

*Thanks,*
*Rajesh**.*

On Wed, May 6, 2015 at 12:39 PM, O. Olson olson_...@yahoo.it wrote:

 I'm trying to get the AnalyzingInfixSuggester to work but I'm not
 successful.
 I'd be grateful if someone can point me to a working example.

 Problem:
 My content is product descriptions similar to a BestBuy or NewEgg catalog.
 My problem is that I'm getting only single words in the suggester results.
 E.g. if I type 'len', I get the suggester results like 'Lenovo' but not
 'Lenovo laptop' or something larger/longer than a single word.

 There is a suggestion here:

 http://blog.mikemccandless.com/2013/06/a-new-lucene-suggester-based-on-infix.html
 that the search at:
 http://jirasearch.mikemccandless.com/search.py?index=jira is powered by the
 AnalyzingInfixSuggester. If this is true, then when I use that search I get
 suggestions of more than a few words in the results, but I don't with my
 setup, i.e. on my setup I get only single words. My configuration is:


 searchComponent class=solr.SpellCheckComponent name=suggest
 lst name=spellchecker
   str name=namesuggest/str
   str
 name=classnameorg.apache.solr.spelling.suggest.Suggester/str
   str

 name=lookupImplorg.apache.solr.spelling.suggest.fst.AnalyzingInfixLookupFactory/str
   str name=fieldtext/str
   float name=threshold0.005/float
   str name=buildOnCommittrue/str
   str name=suggestAnalyzerFieldTypetext_general/str
   bool name=exactMatchFirsttrue/bool
 /lst
   /searchComponent

   requestHandler class=org.apache.solr.handler.component.SearchHandler
 name=/suggest
 lst name=defaults
   str name=spellchecktrue/str
   str name=spellcheck.dictionarysuggest/str
   str name=spellcheck.onlyMorePopulartrue/str
   str name=spellcheck.count5/str
   str name=spellcheck.collatetrue/str
 /lst
 arr name=components
   strsuggest/str
 /arr
   /requestHandler

 I copy the contents of all of my fields to a single field called 'text'.
 The
 'text_general' type is exactly as in the solr examples:

 http://svn.apache.org/viewvc/lucene/dev/trunk/solr/example/example-DIH/solr/db/conf/schema.xml?view=markup

 I'd be grateful if anyone can help me. I don't know what to look at. Thank
 you in advance.

 O. O.





 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/Trying-to-get-AnalyzingInfixSuggester-to-work-in-Solr-tp4204163.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Re: Trying to get AnalyzingInfixSuggester to work in Solr?

2015-05-06 Thread O. Olson
Thank you Rajesh. I'm not familiar with the queryConverter. How do you wire
it up to the rest of the setup? Right now, I just put it between the
SpellCheckComponent and the RequestHandler i.e. my config is as: 

searchComponent class=solr.SpellCheckComponent name=suggest
lst name=spellchecker
  str name=namesuggest/str
  str name=classnameorg.apache.solr.spelling.suggest.Suggester/str
  str
name=lookupImplorg.apache.solr.spelling.suggest.fst.AnalyzingInfixLookupFactory/str
  str name=fieldtext/str  
  float name=threshold0.005/float
  str name=buildOnCommittrue/str
  str name=suggestAnalyzerFieldTypetext_general/str
  bool name=exactMatchFirsttrue/bool
/lst
  /searchComponent
  
  queryConverter name=queryConverter
class=org.apache.solr.spelling.SuggestQueryConverter/ 
  
  requestHandler class=org.apache.solr.handler.component.SearchHandler
name=/suggest
lst name=defaults
  str name=spellchecktrue/str
  str name=spellcheck.dictionarysuggest/str
  str name=spellcheck.onlyMorePopulartrue/str
  str name=spellcheck.count5/str
  str name=spellcheck.collatetrue/str
/lst
arr name=components
  strsuggest/str
/arr
  /requestHandler

Is this correct? I do not see any difference in my results i.e. the
suggestions are the same as before.
O. O.





Rajesh Hazari wrote
 make sure you have this query converter defined in your config
 queryConverter name=queryConverter
 class=org.apache.solr.spelling.SuggestQueryConverter/
 *Thanks,*
 *Rajesh**.*





--
View this message in context: 
http://lucene.472066.n3.nabble.com/Trying-to-get-AnalyzingInfixSuggester-to-work-in-Solr-tp4204163p4204173.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: A defect in Schema API with Add a New Copy Field Rule?

2015-05-06 Thread Steve Rowe
Hi Steve,

It’s by design that you can copyField the same source/dest multiple times - 
according to Yonik (not sure where this was discussed), this capability has 
been used in the past to effectively boost terms in the source field.  

The API isn’t symmetric here though: I’m guessing deleting a multiply specified 
copy field rule will delete all of them, but this isn’t tested, so I’m not sure.

There is no replace-copy-field command because copy field rules don’t have 
dependencies (i.e., nothing else in the schema refers to copy field rules), 
unlike fields, dynamic fields and field types, so 
delete-copy-field/add-copy-field works as one would expect.
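
For illustration, a hedged sketch of that delete-then-add sequence via the Schema 
API (the URL, collection name, and the author/text pair below come from the example 
in this thread; adapt them to your own setup):

curl -X POST -H 'Content-type:application/json' \
  'http://localhost:8983/solr/collection1/schema' --data-binary '{
    "delete-copy-field": { "source": "author", "dest": "text" },
    "add-copy-field":    { "source": "author", "dest": "text" }
  }'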

For fields, dynamic fields and field types, a delete followed by an add is not 
the same as a replace, since (dynamic) fields could have dependent copyFields, 
and field types could have dependent (dynamic) fields.  delete-* commands are 
designed to fail if there are any existing dependencies, while the replace-* 
commands will maintain the dependencies if they exist.

Steve

 On May 6, 2015, at 6:44 PM, Steven White swhite4...@gmail.com wrote:
 
 Hi Everyone,
 
 I am using the Schema API to add a new copy field per:
 https://cwiki.apache.org/confluence/display/solr/Schema+API#SchemaAPI-AddaNewCopyFieldRule
 
 Unlike the other Add APIs, this one will not fail if you add an existing
 copy field object.  In fact, when I call the API over and over, the
 item will appear over and over in the schema.xml file like so:
 
  copyField source=author dest=text/
  copyField source=author dest=text/
  copyField source=author dest=text/
  copyField source=author dest=text/
 
 Is this the expected behaviour or a bug?  As a side question, is there any
 harm in having multiple copyField like I ended up with?
 
 A final question, why there is no Replace a Copy Field?  Is this by design
 for some limitation or was the API just never implemented?
 
 Thanks
 
 Steve



Re: Union and intersection methods in solr DocSet

2015-05-06 Thread Gajendra Dadheech
Hey Chris,

Thanks for reply.

The exception is an ArrayIndexOutOfBoundsException. It seems to happen because the
searcher may return a BitDocSet for query1 and a SortedIntDocSet for query2. In
that case, SortedIntDocSet doesn't handle intersection with the other
implementation and causes this exception.


Thanks and regards,
Gajendra Dadheech


On Thu, May 7, 2015 at 6:06 AM, Chris Hostetter hossman_luc...@fucit.org
wrote:


 : DocSet docset1 = Searcher.getDocSet(query1)
 : DocSet docset2 = Searcher.getDocSet(query2);
 :
 : Docset finalDocset = docset1.intersection(docset2);
 :
 : Is this a valid approach ? Give docset could either be a sortedintdocset
 or
 : a bitdocset. I am facing ArrayIndexOutOfBoundException when
 : union/intersected between different kind of docsets.

 as far as i know, that should be a totally valid usage -- since you didn't
 provide the details of the stack trace or the code you wrote that
 produced it, it's hard to guess why/where it's causing the exception.

 FWIW: SolrIndexSearcher has getDocSet methods that take multiple arguments
 which might be more efficient than doing the intersection directly (and
 are cache aware)

 if all you care about is the *size* of the intersection, see the
 SolrIndexSearcher.numDocs methods.

 -Hoss
 http://www.lucidworks.com/



Re: A defect in Schema API with Add a New Copy Field Rule?

2015-05-06 Thread Yonik Seeley
On Wed, May 6, 2015 at 8:10 PM, Steve Rowe sar...@gmail.com wrote:
 It’s by design that you can copyField the same source/dest multiple times - 
 according to Yonik (not sure where this was discussed), this capability has 
 been used in the past to effectively boost terms in the source field.

Yep, used to be relatively common.
Perhaps the API could be cleaner though if we supported that by
passing an optional numTimes or numCopies?  Seems like sane
delete / overwrite options would then be easier?

-Yonik


Re: Union and intersection methods in solr DocSet

2015-05-06 Thread Chris Hostetter

: DocSet docset1 = Searcher.getDocSet(query1)
: DocSet docset2 = Searcher.getDocSet(query2);
: 
: Docset finalDocset = docset1.intersection(docset2);
: 
: Is this a valid approach ? Give docset could either be a sortedintdocset or
: a bitdocset. I am facing ArrayIndexOutOfBoundException when
: union/intersected between different kind of docsets.

as far as i know, that should be a totally valid usage -- since you didn't 
provide the details of the stack trace or the code you wrote that 
produced it, it's hard to guess why/where it's causing the exception.

FWIW: SolrIndexSearcher has getDocSet methods that take multiple arguments 
which might be more efficient than doing the intersection directly (and 
are cache aware)

if all you care about is the *size* of the intersection, see the 
SolrIndexSearcher.numDocs methods.
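
A minimal sketch of those two approaches (assuming Solr 4.x/5.x APIs, an existing 
SolrIndexSearcher, and two already-parsed Query objects; this is illustrative, not 
a drop-in fix for the code in question):

import java.io.IOException;
import java.util.Arrays;

import org.apache.lucene.search.Query;
import org.apache.solr.search.DocSet;
import org.apache.solr.search.SolrIndexSearcher;

public class DocSetIntersectionSketch {

  // Let the searcher compute the intersection itself (filterCache-aware),
  // instead of intersecting two DocSets of possibly different implementations.
  static DocSet intersection(SolrIndexSearcher searcher, Query q1, Query q2)
      throws IOException {
    return searcher.getDocSet(Arrays.asList(q1, q2));
  }

  // If only the size of the intersection matters, skip building the combined DocSet.
  static int intersectionSize(SolrIndexSearcher searcher, Query q1, Query q2)
      throws IOException {
    return searcher.numDocs(q1, searcher.getDocSet(q2));
  }
}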

-Hoss
http://www.lucidworks.com/


Re: severe problems with soft and hard commits in a large index

2015-05-06 Thread Shawn Heisey
On 5/6/2015 8:55 AM, adfel70 wrote:
 Thank you for the detailed answer.
 How can I decrease the impact of opening a searcher in such a large index?
 especially the impact of heap usage that causes OOM.

See the wiki link I sent.  It talks about some of the things that
require a lot of heap and ways you can reduce those requirements.  The
lists are nowhere near complete.

http://wiki.apache.org/solr/SolrPerformanceProblems#Java_Heap

 regarding GC tuning - I am doint that.
 here are the params I use:
 AggresiveOpts
 UseLargePages
 ParallelRefProcEnabled
 CMSParallelRemarkEnabled
 CMSMaxAbortablePrecleanTime=6000
 CMDTriggerPermRatio=80
 CMSInitiatingOccupancyFraction=70
 UseCMSInitiatinOccupancyOnly
 CMSFullGCsBeforeCompaction=1
 PretenureSizeThreshold=64m
 CMSScavengeBeforeRemark
 UseConcMarkSweepGC
 MaxTenuringThreshold=8
 TargetSurvivorRatio=90
 SurviorRatio=4
 NewRatio=2
 Xms16gb
 Xmn28gb

This list seems to have come from re-typing the GC options.  If this is
a cut/paste, I would not expect it to work -- there are typos, and parts
of each option are missing characters.  Assuming that this is not
cut/paste, it is mostly similar to the CMS options that I once used for
my own index:

http://wiki.apache.org/solr/ShawnHeisey#CMS_.28ConcurrentMarkSweep.29_Collector

 How many documents per shard are recommended?
 Note that I use nested documents. total collection size is 3 billion docs,
 number of parent docs is 600 million. the rest are children.

For the G1 collector, you'd want to limit each shard to about 100
million docs.  I have no idea about limitations and capabilities where
very large memory allocations are concerned with the CMS collector. 
Running the latest Java 8 is *strongly* recommended, no matter what
collector you're using, because recent versions have incorporated GC
improvements with large memory allocations.  With Java 8u40 and later,
the limitations for 16MB huge allocations on the G1 collector might not
even apply.
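
As a hedged starting point only (the heap size and pause target below are
placeholders matching the 28GB heap in this thread, not tuned recommendations),
a G1 setup on Java 8u40+ might look something like:

-Xms28g -Xmx28g
-XX:+UseG1GC
-XX:+ParallelRefProcEnabled
-XX:G1HeapRegionSize=32m
-XX:MaxGCPauseMillis=250
-XX:+PerfDisableSharedMem

You would still need to test with GC logging enabled on your own index and
query load before settling on anything.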

Thanks,
Shawn



Re: What is the best practice to Backup and delete a core from SOLR Master-Slave architecture

2015-05-06 Thread Erick Erickson
Well, they're just files on disk. You can freely copy the index files
around wherever you want. I'd do a few practice runs first though. So:
1 unload the core (or otherwise shut it down).
2 copy the data directory and all sub directories.
3 I'd also copy the conf directory to ensure a consistent picture of
the index when you restore it.
4 delete the core however you please.

Of course, before doing 4 I'd try bringing up the copied core on some other
machine a few times, just to be sure you have all the necessary
parts... Once you're confident of the process you don't need to do a test
restore _every_ time.
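
A hedged sketch of that sequence for a core named core0 (the host, core paths,
and backup location are assumptions; adjust to your own layout and verify the
copy before deleting anything):

BACKUP=/backups/core0-$(date +%Y%m%d)
mkdir -p "$BACKUP"

# 1: unload the core (without deleteIndex/deleteDataDir the files stay on disk)
curl "http://localhost:8983/solr/admin/cores?action=UNLOAD&core=core0"

# 2 and 3: copy the data directory (and its subdirectories) plus the conf directory
cp -rp /var/solr/core0/data /var/solr/core0/conf "$BACKUP"/

# 4: only after the backup has been verified by bringing the core up elsewhere
rm -rf /var/solr/core0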

Best,
Erick

On Wed, May 6, 2015 at 3:08 AM, sangeetha.subraman...@gtnexus.com
sangeetha.subraman...@gtnexus.com wrote:
 Hi,

 I am a newbie to SOLR. I have setup Master Slave configuration with SOLR 4.0. 
 I am trying to identify what is the best way to backup an old core and delete 
 the same so as to free up space from the disk.

 I did get the information on how to unload a core and delete the indexes from 
 the core.

 Unloading - http://localhost:8983/solr/admin/cores?action=UNLOAD&core=core0
 Delete Indexes - 
 http://localhost:8983/solr/admin/cores?action=UNLOAD&core=core0&deleteIndex=true

 What is the best approach to remove the old core ?


 *   Approach 1

 o   Unload the core in both Master and Slave server AND delete the index only 
 from Master server (retain the indexes in slave server as a backup). If I am 
 retaining the indexes in Slave server, at later point is there a way to bring 
 those to Master Server ?

 *   Approach 2

 o   Unload and delete the indexes from both Master and Slave server. Before 
 deleting, take a backup of the data dir of old core from File system. I am 
 not sure if this is even possible ?

 Is there any other way better way of doing this ? Please let me know

 Thanks
 Sangeetha


Re: New core on Solr Cloud

2015-05-06 Thread Erick Erickson
That should have put one replica on each machine; if it did, you're fine.

Best,
Erick

On Wed, May 6, 2015 at 3:58 AM, shacky shack...@gmail.com wrote:
 Ok, I found out that the creation of new core/collection on Solr 5.1
 is made with the bin/solr script.
 So I created a new collection with this command:

 ./solr create_collection -c test -replicationFactor 3

 Is this the correct way?

 Thank you very much,
 Bye!

 2015-05-06 10:02 GMT+02:00 shacky shack...@gmail.com:
 Hi.
 This is my first experience with Solr Cloud.
 I installed three Solr nodes with three ZooKeeper instances and they
 seemed to start well.
 Now I have to create a new replicated core and I'm trying to find out
 how I can do it.
 I found many examples about how to create shards and cores, but I have
 to create one core with only one shard replicated on all three nodes
 (so basically I want to have the same data on all three nodes).

 Could you help me to understand what is the correct way to make this, please?

 Thank you very much!
 Bye


Re: ZooKeeperException: Could not find configName for collection

2015-05-06 Thread Erick Erickson
Have you looked around at your directories on disk? I'm _not_ talking
about the admin UI here. The default is core discovery mode, which
recursively looks under solr_home and thinks there's a core wherever
it finds a core.properties file. If you find such a thing, rename it
or remove the directory.

Another alternative would be to push a configset named new_core up
to ZooKeeper; that might allow you to see (and then delete) the
collection new_core belongs to.
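
A hedged sketch of one way to do that with the zkcli script that ships with
Solr 5.x (the ZooKeeper hosts and configset path are assumptions based on this
thread; point -confdir at whatever configset the collection should use):

/opt/solr/server/scripts/cloud-scripts/zkcli.sh \
  -zkhost zk1:2181,zk2:2181,zk3:2181 \
  -cmd upconfig \
  -confdir /opt/solr/server/solr/configsets/data_driven_schema_configs/conf \
  -confname new_core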

It looks like you tried to use the admin UI to create a core and it's
all local or something like that.

Best,
Erick

On Wed, May 6, 2015 at 4:00 AM, shacky shack...@gmail.com wrote:
 Hi list.

 I created a new collection on my new SolrCloud installation, the new
 collection is shown and replicated on all three nodes, but on the
 first node (only on this one) I get this error:

 new_core: 
 org.apache.solr.common.cloud.ZooKeeperException:org.apache.solr.common.cloud.ZooKeeperException:
 Could not find configName for collection new_core found:null

 I cannot see any core named new_core on that node, and I also tried
 to remove it:

 root@index1:/opt/solr# ./bin/solr delete -c new_core
 Connecting to ZooKeeper at zk1,zk2,zk3
 ERROR: Collection new_core not found!

 Could you help me, please?

 Thank you very much!
 Bye


Re: Solr 5.0 - uniqueKey case insensitive ?

2015-05-06 Thread Bruno Mannina

Yes thanks, it's ok now for me too.

Daniel, my pn is always in uppercase and I always index it in uppercase.
The problem (solved now after all your answers, thanks) was the request:
if users request with lowercase then solr replies with no result, and that
was not good.

But now the problem is solved: in my source file I changed the field name
from pn to id, and in my schema I use a copyField named pn, and it works
perfectly.

Thanks a lot !!!
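
For anyone landing on this thread later, a hedged sketch of that final
arrangement (string_ci being the KeywordTokenizer/LowerCaseFilter type quoted
further down in this thread; the exact attributes are illustrative):

<field name="id" type="string" indexed="true" stored="true" required="true" multiValued="false"/>
<field name="pn" type="string_ci" indexed="true" stored="false" multiValued="true"/>

<uniqueKey>id</uniqueKey>
<copyField source="id" dest="pn"/>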

On 06/05/2015 09:44, Daniel Collins wrote:

Ah, I remember seeing this when we first started using Solr (which was 4.0
because we needed Solr Cloud). I never got around to filing an issue for it
(oops!), but we have a note in our schema to leave the key field as a normal
string (like Bruno, we had tried to lowercase it, which failed).
We didn't really know Solr in those days, and hadn't really thought about
it since then, but Hoss' and Erick's explanations make perfect sense now!

Since shard routing is (basically) done on hashes of the unique key, if I
have 2 documents which are the same, but have values HELLO and hello,
they might well hash to completely different shards, so the update
logistics would be horrible.

Bruno, why do you need to lowercase at all then?  You said in your example,
that your client application always supplies pn and it is always
uppercase, so presumably all adds/updates could be done directly on that
field (as a normal string with no lowercasing).  Where does the case
insensitivity come in, is that only for searching?  If so couldn't you add
a search field (called id), and update your app to search using that (or
make that your default search field, I guess it depends if your calling app
explicitly uses the pn field name in its searches).


On 6 May 2015 at 01:55, Erick Erickson erickerick...@gmail.com wrote:


Well, working fine may be a bit of an overstatement. That has never
been officially supported, so it just happened to work in 3.6.

As Chris points out, if you're using SolrCloud then this will _not_
work as routing happens early in the process, i.e. before the analysis
chain gets the token so various copies of the doc will exist on
different shards.

Best,
Erick

On Mon, May 4, 2015 at 4:19 PM, Bruno Mannina bmann...@free.fr wrote:

Hello Chris,

yes I confirm that on my SOLR3.6 it has worked fine for several years, and each
doc added with the same code is updated, not added.

To be more clear, I receive docs with a field named pn and it's the
uniqueKey, and it is always in uppercase,

so I must define in my schema.xml

 field name=id type=string multiValued=false indexed=true required=true stored=true/
 field name=pn type=text_general multiValued=true indexed=true stored=false/
 ...
 uniqueKeyid/uniqueKey
 ...
 copyField source=id dest=pn/

but the application that uses solr already exists, so it requests with the
pn field, not id; I cannot change that.
and in each doc I receive there is no id field, just a pn field, and I
cannot change that either.

so there is a problem, no? I must import an id field and request a pn field,
but I have a pn field only for import...



Le 05/05/2015 01:00, Chris Hostetter a écrit :

: On SOLR3.6, I defined a string_ci field like this:
:
: fieldType name=string_ci class=solr.TextField
: sortMissingLast=true omitNorms=true
: analyzer
:   tokenizer class=solr.KeywordTokenizerFactory/
:   filter class=solr.LowerCaseFilterFactory/
: /analyzer
: /fieldType
:
: field name=pn type=string_ci multiValued=false indexed=true
: required=true stored=true/


I'm really surprised that field would have worked for you (reliably) as a
uniqueKey field even in Solr 3.6.

the best practice for something like what you describe has always (going
back to Solr 1.x) been to use a copyField to create a case insensitive
copy of your uniqueKey for searching.

 if, for some reason, you really want case insensitive *updates* (so a doc
 with id foo overwrites a doc with id FOO) then the only reliable way to
 make something like that work is to do the lowercasing in an
 UpdateProcessor to ensure it happens *before* the docs are distributed to
 the correct shard, and so the correct existing doc is overwritten (even if
 you aren't using solr cloud)



-Hoss
http://www.lucidworks.com/











Re: Solr cloud clusterstate.json update query ?

2015-05-06 Thread Erick Erickson
Gopal:

Did you see my previous answer?

Best,
Erick

On Tue, May 5, 2015 at 9:42 PM, Gopal Jee zgo...@gmail.com wrote:
 About 2: live_nodes entries under zookeeper are ephemeral nodes (please see
 zookeeper ephemeral nodes). So, once the connection from the solr zkClient to
 zookeeper is lost, these nodes will disappear automatically. AFAIK,
 clusterstate.json is updated by the overseer based on messages published to a
 queue in zookeeper by the solr zkClients. In case a solr node dies ungracefully,
 I am not sure how this event is updated in clusterstate.json.
 *Can someone shed some light* on ungraceful solr shutdown and the consequent
 status update in clusterstate. I guess there would be some way, because all
 nodes in a cluster decide clusterstate based on the watched clusterstate.json
 node. They will not be watching live_nodes for updating their state.

 Gopal

 On Wed, May 6, 2015 at 6:33 AM, Erick Erickson erickerick...@gmail.com
 wrote:

 about 1. This shouldn't be happening, so I wouldn't concentrate
 there first. The most common reason is that you have a short Zookeeper
 timeout and the replicas go into a stop-the-world garbage collection
 that exceeds the timeout. So the first thing to do is to see if that's
 happening. Here are a couple of good places to start:

 http://lucidworks.com/blog/garbage-collection-bootcamp-1-0/
 http://wiki.apache.org/solr/ShawnHeisey#GC_Tuning_for_Solr

 2 Partial answer is that ZK does a keep-alive type thing and if the
 solr nodes it knows about don't reply, it marks the nodes as down.

 Best,
 Erick

 On Tue, May 5, 2015 at 5:42 AM, Sai Sreenivas K sa...@myntra.com wrote:
  Could you clarify on the following questions,
  1. Is there a way to avoid all the nodes simultaneously getting into
  recovery state when a bulk indexing happens ? Is there an api to disable
  replication on one node for a while ?
 
  2. We recently changed the host name on nodes in solr.xml. But the old
 host
  entries still exist in the clusterstate.json marked as active state.
 Though
  live_nodes has the correct information. Who updates clusterstate.json if
  the node goes down in an ungraceful fashion without notifying its down
  state ?
 
  Thanks,
  Sai Sreenivas K




 --


Solr port went down on remote server

2015-05-06 Thread Nitin Solanki
Hi,
   I have installed Solr on a remote server and started it on port 8983.
I have bound my local machine's port 8983 to the remote server's Solr port 8983
using *ssh* (Ubuntu OS). When I request suggestions from Solr on the remote
server through the local machine, sometimes it gives a response and sometimes
it doesn't.

I am not able to work out why this is happening.
Is it a remote server binding issue, or did Solr go down?
I cannot find the problem.

To detect the problem, I ran a crontab job using the telnet command to check
the existence of Solr's port (8983). It works fine without throwing any
connection refused error, so I am still not able to detect the problem. Any help please?


Limit the documents for each shard in solr cloud

2015-05-06 Thread Jilani Shaik
Hi,

Is it possible to restrict the number of documents per shard in Solr cloud?

Let's say we have a Solr cloud with 4 nodes, and on each node we have one
leader and one replica, so in total we have 8 shards including replicas.
Now I need to index my documents in such a way that each shard will have
only 5 million documents, so the total in the Solr cloud should be
20 million documents.


Thanks,
Jilani