Re: How To: Debugging the whole indexing process

2015-05-29 Thread Alexandre Rafalovitch
In production or in test? I assume in test.

This level of detail usually implies some sort of Java debugger with Java
instrumentation enabled, e.g. Chronon, which is commercial but can be tried
as a plugin with the IntelliJ IDEA full-version trial.
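If a record-and-replay product is more than you need, two cheaper options may
also get you there. First, plain remote debugging: attach any Java IDE over
JDWP and step from the update handler down into Lucene (the port is an
arbitrary choice; Solr 5's bin/solr passes extra JVM options via -a):

    bin/solr start -f -a "-agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=5005"

Second, Lucene's infoStream, which logs flushes, merges and the index files
being written without a debugger at all (it goes in the indexConfig section
of solrconfig.xml):

    <indexConfig>
      <infoStream>true</infoStream>
    </indexConfig>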

Regards,
Alex
On 29 May 2015 4:38 pm, Aman Tandon amantandon...@gmail.com wrote:

 Hi,

 I want to debug the whole indexing process, i.e. the life cycle of the
 indexing process (each and every function call, going from function to
 function), from the posting of data.xml to the creation of the various
 index files (_fnm, _fdt, etc.). How/what should I set up to get started?
 Please help; I will be thankful to you.



 
 
 <add>
   <doc>
     <field name="title"><![CDATA[Aman Tandon]]></field>
     <field name="job_role"><![CDATA[Search Engineer]]></field>
   </doc>
 </add>


 With Regards
 Aman Tandon



Re: Ability to load solrcore.properties from zookeeper

2015-05-29 Thread Alan Woodward
Yeah, you could do it like that.  But looking at it further, I think 
solrcore.properties is actually being loaded in entirely the wrong place - it 
should be done by whatever is creating the CoreDescriptor, and then passed in 
as a Properties object to the CD constructor.  At the moment, you can't refer 
to a property defined in solrcore.properties within your core.properties file.
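A rough sketch of the idea, for concreteness (the variable names and the
constructor shape here are assumptions, not the actual patch):

    // Whoever discovers the core loads the optional solrcore.properties itself...
    Properties extra = new Properties();
    try (InputStream in = container.getResourceLoader().openResource("solrcore.properties")) {
      extra.load(in);
    } catch (IOException e) {
      // the file is optional
    }
    // ...and hands it to the CoreDescriptor instead of the CD loading it later
    CoreDescriptor cd = new CoreDescriptor(container, coreName, instanceDir, extra);

That would also make the values available at construction time, so
core.properties could refer to them.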

I'll open a JIRA if Steve hasn't already done so.

Alan Woodward
www.flax.co.uk


On 28 May 2015, at 17:57, Chris Hostetter wrote:

 
 : certainly didn't intend to write it like this!).  The problem here will 
 : be that CoreDescriptors are currently built entirely from 
 : core.properties files, and the CoreLocators that construct them don't 
 : have any access to zookeeper.
 
 But they do have access to the CoreContainer which is passed to the 
 CoreDescriptor constructor -- it has all the ZK access you'd need at the 
 time when loadExtraProperties() is called.
 
 correct?
 
 as fleshed out in my last email...
 
 :  patch:  IIUC CoreDescriptor.loadExtraProperties is the relevent method 
 ... 
 :  it would need to build up the path including the core name and get the 
 :  system level resource loader (CoreContainer.getResourceLoader()) to 
 access 
 :  it since the core doesn't exist yet so there is no core level 
 :  ResourceLoader to use.
 
 
 -Hoss
 http://www.lucidworks.com/



Re: SolrCloud 4.8.0 - Snapshots directory take a lot of space

2015-05-29 Thread Vincenzo D'Amore
bump

On Fri, May 8, 2015 at 4:45 PM, Vincenzo D'Amore v.dam...@gmail.com wrote:

 Hi All,

 Looking at the data directory in my SolrCloud cluster I have found a lot
 of old snapshot directories, like these:
 snapshot.20150506003702765
 snapshot.20150506003702760
 snapshot.20150507002849492
 snapshot.20150507002849473
 snapshot.20150507002849459

 or even a month older. These directories take up a lot of space, 2 or
 3 times the whole index.

 May I delete these directories? If yes, is there a best practice?
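These snapshot.* directories are written by replication/backup requests and
are not read by the live index, so they can be removed once any backup they
represent has been copied elsewhere or is no longer wanted. A sketch, with
the data path and the 7-day retention as assumptions (make sure no backup is
currently being written):

    find /var/solr/collection1/data -maxdepth 1 -type d -name 'snapshot.*' \
         -mtime +7 -exec rm -rf {} +

Going forward, the replication handler's maxNumberOfBackups setting caps how
many snapshots are kept around.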


 --
 Vincenzo D'Amore
 email: v.dam...@gmail.com
 skype: free.dev
 mobile: +39 349 8513251




-- 
Vincenzo D'Amore
email: v.dam...@gmail.com
skype: free.dev
mobile: +39 349 8513251


Re: Index optimize runs in background.

2015-05-29 Thread Modassar Ather
I have not added any timeout in the indexer except the ZooKeeper client
timeout, which is 30 seconds. I am simply calling client.close() at the end of
indexing. The same code did not run the optimize in the background with
solr-4.10.3 and org.apache.solr.client.solrj.impl.CloudSolrServer.

On Fri, May 29, 2015 at 11:13 AM, Erick Erickson erickerick...@gmail.com
wrote:

 Are you timing out on the client request? The theory here is that it's
 still a synchronous call, but you're just timing out at the client
 level. At that point the optimize is still running; it's just that the
 connection has been dropped

 Shot in the dark.
 Erick

 On Thu, May 28, 2015 at 10:31 PM, Modassar Ather modather1...@gmail.com
 wrote:
  I could not notice it, but from my past experience the commit which used to
  take around 2 minutes is now taking around 8 seconds. I think this is also
  running in the background.
 
  On Fri, May 29, 2015 at 10:52 AM, Modassar Ather modather1...@gmail.com
 
  wrote:
 
  The indexer takes almost 2 hours to optimize. It has a multi-threaded
 add
  of batches of documents to
  org.apache.solr.client.solrj.impl.CloudSolrClient.
  Once all the documents are indexed it invokes commit and optimize. I
 have
  seen that the optimize goes into background after 10 minutes and indexer
  exits.
   I am not sure why it hangs in the indexer for these 10 minutes. This
   behavior I have seen in multiple iterations of indexing the same data.
 
  There is nothing significant I found in log which I can share. I can see
  following in log.
  org.apache.solr.update.DirectUpdateHandler2; start
 
 commit{,optimize=true,openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=false,prepareCommit=false}
 
  On Wed, May 27, 2015 at 10:59 PM, Erick Erickson 
 erickerick...@gmail.com
  wrote:
 
  All strange of course. What do your Solr logs show when this happens?
  And how reproducible is this?
 
  Best,
  Erick
 
  On Wed, May 27, 2015 at 4:00 AM, Upayavira u...@odoko.co.uk wrote:
   In this case, optimising makes sense, once the index is generated,
 you
   are not updating It.
  
   Upayavira
  
   On Wed, May 27, 2015, at 06:14 AM, Modassar Ather wrote:
   Our index has almost 100M documents running on SolrCloud of 5 shards
  and
   each shard has an index size of about 170+GB (for the record, we are
  not
   using stored fields - our documents are pretty large). We perform a
  full
   indexing every weekend and during the week there are no updates
 made to
   the
   index. Most of the queries that we run are pretty complex with
 hundreds
   of
   terms using PhraseQuery, BooleanQuery, SpanQuery, Wildcards, boosts
  etc.
   and take many minutes to execute. A difference of 10-20% is also a
 big
   advantage for us.
  
   We have been optimizing the index after indexing for years and it
 has
   worked well for us. Every once in a while, we upgrade Solr to the
  latest
   version and try without optimizing so that we can save the many
 hours
  it
   take to optimize such a huge index, but find optimized index work
 well
   for
   us.
  
   Erick I was indexing today the documents and saw the optimize
 happening
   in
   background.
  
   On Tue, May 26, 2015 at 9:12 PM, Erick Erickson 
  erickerick...@gmail.com
   wrote:
  
No results yet. I finished the test harness last night (not
 really a
unit test, a stand-alone program that endlessly adds stuff and
 tests
that every commit returns the correct number of docs).
   
8,000 cycles later there aren't any problems reported.
   
Siiigh.
   
   
On Tue, May 26, 2015 at 1:51 AM, Modassar Ather 
  modather1...@gmail.com
wrote:
 Hi,

 Erick you mentioned about a unit test to test the optimize
 running
  in
 background. Kindly share your findings if any.

 Thanks,
 Modassar

 On Mon, May 25, 2015 at 11:47 AM, Modassar Ather 
  modather1...@gmail.com

 wrote:

 Thanks everybody for your replies.

 I have noticed the optimization running in background every
 time I
 indexed. This is 5 node cluster with solr-5.1.0 and uses the
 CloudSolrClient. Kindly share your findings on this issue.

 Our index has almost 100M documents running on SolrCloud. We
 have
  been
 optimizing the index after indexing for years and it has worked
  well for
 us.

 Thanks,
 Modassar

 On Fri, May 22, 2015 at 11:55 PM, Erick Erickson 
erickerick...@gmail.com
 wrote:

 Actually, I've recently seen very similar behavior in Solr
  4.10.3, but
 involving hard commits openSearcher=true, see:
 https://issues.apache.org/jira/browse/SOLR-7572. Of course I
  can't
 reproduce this at will, sii.

 A unit test should be very simple to write though, maybe I can
  get to
it
 today.

 Erick



 On Fri, May 22, 2015 at 8:27 AM, Upayavira u...@odoko.co.uk
  wrote:
 
 
  On Fri, May 22, 2015, at 03:55 PM, Shawn 

How To: Debugging the whole indexing process

2015-05-29 Thread Aman Tandon
Hi,

I want to debug the whole indexing process, i.e. the life cycle of the
indexing process (each and every function call, going from function to
function), from the posting of data.xml to the creation of the various index
files (_fnm, _fdt, etc.). How/what should I set up to get started? Please
help; I will be thankful to you.





<add>
  <doc>
    <field name="title"><![CDATA[Aman Tandon]]></field>
    <field name="job_role"><![CDATA[Search Engineer]]></field>
  </doc>
</add>


With Regards
Aman Tandon


Help for a field in my schema ?

2015-05-29 Thread Bruno Mannina

Dear Solr-Users,

(SOLR 5.0 Ubuntu)

I have xml files with tags like this:
<claimXXYYY>

where XX is a language code like FR, EN, DE, PT, etc. (I don't know how many
language codes I can have)
and YYY is a number [1..999]

i.e.:
<claimen1>
<claimen2>
<claimen3>
<claimfr1>
<claimfr2>
<claimfr3>

I would like to define fields named:
*claimen*, equal to all <claimenYYY> (EN language, all numbers, indexed=true,
stored=true) (search needed and must be displayed)
*claim*, equal to all <claimXXYYY> (all languages, all numbers,
indexed=true, stored=false) (search not needed but must be displayed)

Is it possible to have these 2 fields ?

Could you help me to declare them in my schema.xml ?

Thanks a lot for your help !

Bruno





Re: How to index 20 000 files with a command line ?

2015-05-29 Thread Sergey Shvets
Hello Bruno,

You can use the find command with its -exec option.

regards
 Sergey

Friday, May 29, 2015, 3:11:37 PM, you wrote:

Dear Solr Users,

Habitually I use this command line to index my files:
 bin/post -c hbl /data/hbl-201522/*.xml

but today I have a big update, so there are 20 000 xml files (each file
1 KB to 150 KB)

I get this error:
Error: bin/post argument too long

How could I index the whole directory ?

Thanks a lot for your help,

Solr 5.0 - Ubuntu

Bruno





-- 
Best regards,
 Sergeymailto:ser...@bintime.com



Re: How to index 20 000 files with a command line ?

2015-05-29 Thread Bruno Mannina

oh yes like this:

 find /data/hbl-201522/ -name '*.xml' -exec bin/post -c hbl {} \;

?
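A variant that batches many files into each bin/post invocation instead of
forking one process per file (quoting the glob so the shell doesn't expand it;
bin/post does accept multiple file arguments):

    find /data/hbl-201522/ -name '*.xml' -print0 | xargs -0 bin/post -c hbl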

On 29/05/2015 14:15, Sergey Shvets wrote:

Hello Bruno,

You can use find command with exec attribute.

regards
  Sergey

Friday, May 29, 2015, 3:11:37 PM, you wrote:

Dear Solr Users,

Habitually I use this command line to index my files:
  bin/post -c hbl /data/hbl-201522/*.xml

but today I have a big update, so there are 20 000 xml files (each file
1 KB to 150 KB)

I get this error:
Error: bin/post argument too long

How could I index the whole directory ?

Thanks a lot for your help,

Solr 5.0 - Ubuntu

Bruno











How to index 20 000 files with a command line ?

2015-05-29 Thread Bruno Mannina

Dear Solr Users,

Habitually I use this command line to index my files:
bin/post -c hbl /data/hbl-201522/*.xml

but today I have a big update, so there are 20 000 xml files (each file
1 KB to 150 KB)

I get this error:
Error: bin/post argument too long

How could I index the whole directory ?

Thanks a lot for your help,

Solr 5.0 - Ubuntu

Bruno




Re: Number of clustering labels to show

2015-05-29 Thread Stanislaw Osinski
Hi,

The number of clusters primarily depends on the parameters of the specific
clustering algorithm. If you're using the default Lingo algorithm, the
number of clusters is governed by
the LingoClusteringAlgorithm.desiredClusterCountBase parameter. Take a look
at the documentation (
https://cwiki.apache.org/confluence/display/solr/Result+Clustering#ResultClustering-TweakingAlgorithmSettings)
for some more details (the Tweaking at Query-Time section shows how to
pass the specific parameters at request time). A complete overview of the
Lingo clustering algorithm parameters is here:
http://doc.carrot2.org/#section.component.lingo.
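For example, an override at request time could look like this (host, core
name and handler path are assumptions; the clustering component must be
enabled for the core):

    http://localhost:8983/solr/mycore/clustering?q=*:*&rows=100&LingoClusteringAlgorithm.desiredClusterCountBase=20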

Stanislaw

--
Stanislaw Osinski, stanislaw.osin...@carrotsearch.com
http://carrotsearch.com

On Fri, May 29, 2015 at 4:29 AM, Zheng Lin Edwin Yeo edwinye...@gmail.com
wrote:

 Hi,

 I'm trying to increase the number of cluster results shown during the
 search. I tried to set carrot.fragSize=20 but only 15 cluster labels are
 shown. Even when I tried to set carrot.fragSize=5, there are also 15 labels
 shown.

 Is this the correct way to do this? I understand that setting it to 20
 might not necessarily mean 20 labels will be shown, as the setting is a
 maximum. But when I set this to 5, shouldn't it reduce the number of
 labels to 5?

 I'm using Solr 5.1.


 Regards,
 Edwin



Re: docValues: Can we apply synonym

2015-05-29 Thread Alessandro Benedetti
Even if a little bit outdated, that query parser is really, really cool for
managing synonyms!
+1 !

2015-05-29 1:01 GMT+01:00 Aman Tandon amantandon...@gmail.com:

 Thanks, Chris.

 Yes, we are using it for handling the multiword synonym problem.

 With Regards
 Aman Tandon

 On Fri, May 29, 2015 at 12:38 AM, Reitzel, Charles 
 charles.reit...@tiaa-cref.org wrote:

  Again, I would recommend using Nolan Lawson's
  SynonymExpandingExtendedDismaxQParserPlugin.
 
  http://nolanlawson.com/2012/10/31/better-synonym-handling-in-solr/
 
  -Original Message-
  From: Aman Tandon [mailto:amantandon...@gmail.com]
  Sent: Wednesday, May 27, 2015 6:42 PM
  To: solr-user@lucene.apache.org
  Subject: Re: docValues: Can we apply synonym
 
  OK, and which synonym processor are you talking about? Maybe it could help?
 
  With Regards
  Aman Tandon
 
  On Thu, May 28, 2015 at 4:01 AM, Reitzel, Charles 
  charles.reit...@tiaa-cref.org wrote:
 
   Sorry, my bad.   The synonym processor I mention works differently.
 It's
   an extension of the EDisMax query processor and doesn't require field
   level synonym configs.
  
   -Original Message-
   From: Reitzel, Charles [mailto:charles.reit...@tiaa-cref.org]
   Sent: Wednesday, May 27, 2015 6:12 PM
   To: solr-user@lucene.apache.org
   Subject: RE: docValues: Can we apply synonym
  
   But the query analysis isn't on a specific field, it is applied to the
   query string.
  
   -Original Message-
   From: Aman Tandon [mailto:amantandon...@gmail.com]
   Sent: Wednesday, May 27, 2015 6:08 PM
   To: solr-user@lucene.apache.org
   Subject: Re: docValues: Can we apply synonym
  
   Hi Charles,
  
   The problem here is that docValues works only with primitive data
   types like String, int, etc. So how could we apply a synonym on a
   primitive data type?
  
   With Regards
   Aman Tandon
  
   On Thu, May 28, 2015 at 3:19 AM, Reitzel, Charles 
   charles.reit...@tiaa-cref.org wrote:
  
Is there any reason you cannot apply the synonyms at query time?
 Applying synonyms at indexing time has problems, e.g. polluting the
term frequency for synonyms added, preventing distance queries, ...
   
Since city names often have multiple terms, e.g. New York, Den
Hague, etc., I would recommend using Nolan Lawson's
SynonymExpandingExtendedDismaxQParserPlugin.   Tastes great, less
   filling.
   
http://nolanlawson.com/2012/10/31/better-synonym-handling-in-solr/
   
We found this to fix synonyms like ny for New York and vice
 versa.
Haven't tried it with docValues, tho.
   
-Original Message-
From: Aman Tandon [mailto:amantandon...@gmail.com]
Sent: Tuesday, May 26, 2015 11:15 PM
To: solr-user@lucene.apache.org
Subject: Re: docValues: Can we apply synonym
   
Yes it could be :)
   
Anyway thanks for helping.
   
With Regards
Aman Tandon
   
On Tue, May 26, 2015 at 10:22 PM, Alessandro Benedetti 
benedetti.ale...@gmail.com wrote:
   
  I should investigate that, as synonyms are usually an analysis stage.
  A simple way is to replace the word with all its synonyms (including
  the original word), but simply using this kind of processor will change
  the token positions and offsets, modifying the actual content of the
  document.

  "I am from Bombay" will become "I am from Bombay Mumbai", which can
  be annoying.
  So a clever approach must be investigated.

 2015-05-26 17:36 GMT+01:00 Aman Tandon amantandon...@gmail.com:

  Okay So how could I do it with UpdateProcessors?
 
  With Regards
  Aman Tandon
 
  On Tue, May 26, 2015 at 10:00 PM, Alessandro Benedetti 
  benedetti.ale...@gmail.com wrote:
 
   mmm this is different !
   Without any customisation, right now you could :
   - use docValues to provide exact value facets.
    - Then you can use a copy field, with the proper analysis, to
   search
  when a
   user click on a filter !
  
   So you will see in your facets :
   Mumbai(3)
   Bombay(2)
  
   And when clicking you see 5 results.
   A little bit misleading for the users …
  
    On the other hand, if you want to apply the synonyms earlier, in
    the indexing pipeline (because docValues fields cannot be analysed), I
 think
   you should play with UpdateProcessors.
  
   Cheers
  
   2015-05-26 17:18 GMT+01:00 Aman Tandon 
 amantandon...@gmail.com
  :
  
We are interested in using docValues for better memory
utilization
 and
speed.
   
Currently we are faceting the search results on *city. *In
city we
 have
also added the synonym for cities like mumbai, bombay (These
are
 Indian
cities). So that result of mumbai is also eligible when
somebody will applying filter of bombay on search results.
   
I need this functionality to 

Re: Index optimize runs in background.

2015-05-29 Thread Erick Erickson
I'm not talking about you setting a timeout, but the underlying
connection timing out...

The "10 minutes and then the indexer exits" comment points in that direction.
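
If that's what's going on, one sketch of a way to test it: raise the SolrJ
client's socket timeout so the blocking optimize() call can outlive the
default. The constant comes from SolrJ's HttpClientUtil; the three-hour value
is an arbitrary assumption:

    ModifiableSolrParams params = new ModifiableSolrParams();
    params.set(HttpClientUtil.PROP_SO_TIMEOUT, 3 * 60 * 60 * 1000); // read timeout, ms
    HttpClient httpClient = HttpClientUtil.createClient(params);
    CloudSolrClient client = new CloudSolrClient(zkHost, httpClient);
    client.setDefaultCollection("collection1");
    client.optimize();   // should now block until the optimize really finishes
    client.close();

If the optimize then runs to completion in the foreground, it was the
connection, not Solr, that gave up at the 10-minute mark.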

Best,
Erick

On Thu, May 28, 2015 at 11:43 PM, Modassar Ather modather1...@gmail.com wrote:
 I have not added any timeout in the indexer except zk client time out which
 is 30 seconds. I am simply calling client.close() at the end of indexing.
 The same code was not running in background for optimize with solr-4.10.3
 and org.apache.solr.client.solrj.impl.CloudSolrServer.

 On Fri, May 29, 2015 at 11:13 AM, Erick Erickson erickerick...@gmail.com
 wrote:

 Are you timing out on the client request? The theory here is that it's
 still a synchronous call, but you're just timing out at the client
 level. At that point, the optimize is still running it's just the
 connection has been dropped

 Shot in the dark.
 Erick

 On Thu, May 28, 2015 at 10:31 PM, Modassar Ather modather1...@gmail.com
 wrote:
  I could not notice it but with my past experience of commit which used to
  take around 2 minutes is now taking around 8 seconds. I think this is
 also
  running as background.
 
  On Fri, May 29, 2015 at 10:52 AM, Modassar Ather modather1...@gmail.com
 
  wrote:
 
  The indexer takes almost 2 hours to optimize. It has a multi-threaded
 add
  of batches of documents to
  org.apache.solr.client.solrj.impl.CloudSolrClient.
  Once all the documents are indexed it invokes commit and optimize. I
 have
  seen that the optimize goes into background after 10 minutes and indexer
  exits.
  I am not sure why this 10 minutes it hangs on indexer. This behavior I
  have seen in multiple iteration of the indexing of same data.
 
  There is nothing significant I found in log which I can share. I can see
  following in log.
  org.apache.solr.update.DirectUpdateHandler2; start
 
 commit{,optimize=true,openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=false,prepareCommit=false}
 
  On Wed, May 27, 2015 at 10:59 PM, Erick Erickson 
 erickerick...@gmail.com
  wrote:
 
  All strange of course. What do your Solr logs show when this happens?
  And how reproducible is this?
 
  Best,
  Erick
 
  On Wed, May 27, 2015 at 4:00 AM, Upayavira u...@odoko.co.uk wrote:
   In this case, optimising makes sense, once the index is generated,
 you
   are not updating It.
  
   Upayavira
  
   On Wed, May 27, 2015, at 06:14 AM, Modassar Ather wrote:
   Our index has almost 100M documents running on SolrCloud of 5 shards
  and
   each shard has an index size of about 170+GB (for the record, we are
  not
   using stored fields - our documents are pretty large). We perform a
  full
   indexing every weekend and during the week there are no updates
 made to
   the
   index. Most of the queries that we run are pretty complex with
 hundreds
   of
   terms using PhraseQuery, BooleanQuery, SpanQuery, Wildcards, boosts
  etc.
   and take many minutes to execute. A difference of 10-20% is also a
 big
   advantage for us.
  
   We have been optimizing the index after indexing for years and it
 has
   worked well for us. Every once in a while, we upgrade Solr to the
  latest
   version and try without optimizing so that we can save the many
 hours
  it
   take to optimize such a huge index, but find optimized index work
 well
   for
   us.
  
   Erick I was indexing today the documents and saw the optimize
 happening
   in
   background.
  
   On Tue, May 26, 2015 at 9:12 PM, Erick Erickson 
  erickerick...@gmail.com
   wrote:
  
No results yet. I finished the test harness last night (not
 really a
unit test, a stand-alone program that endlessly adds stuff and
 tests
that every commit returns the correct number of docs).
   
8,000 cycles later there aren't any problems reported.
   
Siiigh.
   
   
On Tue, May 26, 2015 at 1:51 AM, Modassar Ather 
  modather1...@gmail.com
wrote:
 Hi,

 Erick you mentioned about a unit test to test the optimize
 running
  in
 background. Kindly share your findings if any.

 Thanks,
 Modassar

 On Mon, May 25, 2015 at 11:47 AM, Modassar Ather 
  modather1...@gmail.com

 wrote:

 Thanks everybody for your replies.

 I have noticed the optimization running in background every
 time I
 indexed. This is 5 node cluster with solr-5.1.0 and uses the
 CloudSolrClient. Kindly share your findings on this issue.

 Our index has almost 100M documents running on SolrCloud. We
 have
  been
 optimizing the index after indexing for years and it has worked
  well for
 us.

 Thanks,
 Modassar

 On Fri, May 22, 2015 at 11:55 PM, Erick Erickson 
erickerick...@gmail.com
 wrote:

 Actually, I've recently seen very similar behavior in Solr
  4.10.3, but
 involving hard commits openSearcher=true, see:
 https://issues.apache.org/jira/browse/SOLR-7572. Of course I
  can't
 reproduce this at will, sii.

   

Re: docValues: Can we apply synonym

2015-05-29 Thread Erick Erickson
Do take time for performance testing with that parser. It can be slow
depending on your data, as I remember. That said, it solves the problem
it set out to solve, so if it meets your SLAs it can be a life-saver.

Best,
Erick


On Fri, May 29, 2015 at 2:35 AM, Alessandro Benedetti
benedetti.ale...@gmail.com wrote:
 Even if a little bit outdated, that query parser is really really cool to
 manage synonyms !
 +1 !

 2015-05-29 1:01 GMT+01:00 Aman Tandon amantandon...@gmail.com:

 Thanks chris.

 Yes we are using it for handling multiword synonym problem.

 With Regards
 Aman Tandon

 On Fri, May 29, 2015 at 12:38 AM, Reitzel, Charles 
 charles.reit...@tiaa-cref.org wrote:

  Again, I would recommend using Nolan Lawson's
  SynonymExpandingExtendedDismaxQParserPlugin.
 
  http://nolanlawson.com/2012/10/31/better-synonym-handling-in-solr/
 
  -Original Message-
  From: Aman Tandon [mailto:amantandon...@gmail.com]
  Sent: Wednesday, May 27, 2015 6:42 PM
  To: solr-user@lucene.apache.org
  Subject: Re: docValues: Can we apply synonym
 
  Ok and what synonym processor you is talking about maybe it could help ?
 
  With Regards
  Aman Tandon
 
  On Thu, May 28, 2015 at 4:01 AM, Reitzel, Charles 
  charles.reit...@tiaa-cref.org wrote:
 
   Sorry, my bad.   The synonym processor I mention works differently.
 It's
   an extension of the EDisMax query processor and doesn't require field
   level synonym configs.
  
   -Original Message-
   From: Reitzel, Charles [mailto:charles.reit...@tiaa-cref.org]
   Sent: Wednesday, May 27, 2015 6:12 PM
   To: solr-user@lucene.apache.org
   Subject: RE: docValues: Can we apply synonym
  
   But the query analysis isn't on a specific field, it is applied to the
   query string.
  
   -Original Message-
   From: Aman Tandon [mailto:amantandon...@gmail.com]
   Sent: Wednesday, May 27, 2015 6:08 PM
   To: solr-user@lucene.apache.org
   Subject: Re: docValues: Can we apply synonym
  
   Hi Charles,
  
   The problem here is that the docValues works only with primitives data
   type only like String, int, etc So how could we apply synonym on
   primitive data type.
  
   With Regards
   Aman Tandon
  
   On Thu, May 28, 2015 at 3:19 AM, Reitzel, Charles 
   charles.reit...@tiaa-cref.org wrote:
  
Is there any reason you cannot apply the synonyms at query time?
 Applying synonyms at indexing time has problems, e.g. polluting the
term frequency for synonyms added, preventing distance queries, ...
   
Since city names often have multiple terms, e.g. New York, Den
Hague, etc., I would recommend using Nolan Lawson's
SynonymExpandingExtendedDismaxQParserPlugin.   Tastes great, less
   filling.
   
http://nolanlawson.com/2012/10/31/better-synonym-handling-in-solr/
   
We found this to fix synonyms like ny for New York and vice
 versa.
Haven't tried it with docValues, tho.
   
-Original Message-
From: Aman Tandon [mailto:amantandon...@gmail.com]
Sent: Tuesday, May 26, 2015 11:15 PM
To: solr-user@lucene.apache.org
Subject: Re: docValues: Can we apply synonym
   
Yes it could be :)
   
Anyway thanks for helping.
   
With Regards
Aman Tandon
   
On Tue, May 26, 2015 at 10:22 PM, Alessandro Benedetti 
benedetti.ale...@gmail.com wrote:
   
 I should investigate that, as usually synonyms are analysis stage.
 A simple way is to replace the word with all its synonyms (
 including original word), but simply using this kind of processor
 will change the token position and offsets, modifying the actual
 content of the
document .

  I am from Bombay will become  I am from Bombay Mumbai which
 can be annoying.
 So a clever approach must be investigated.

 2015-05-26 17:36 GMT+01:00 Aman Tandon amantandon...@gmail.com:

  Okay So how could I do it with UpdateProcessors?
 
  With Regards
  Aman Tandon
 
  On Tue, May 26, 2015 at 10:00 PM, Alessandro Benedetti 
  benedetti.ale...@gmail.com wrote:
 
   mmm this is different !
   Without any customisation, right now you could :
   - use docValues to provide exact value facets.
   - Than you can use a copy field, with the proper analysis, to
   search
  when a
   user click on a filter !
  
   So you will see in your facets :
   Mumbai(3)
   Bombay(2)
  
   And when clicking you see 5 results.
   A little bit misleading for the users …
  
   On the other hand if you you want to apply the synonyms
   before, the indexing pipeline ( because docValues field can
   not be analysed), I
 think
   you should play with UpdateProcessors.
  
   Cheers
  
   2015-05-26 17:18 GMT+01:00 Aman Tandon 
 amantandon...@gmail.com
  :
  
We are interested in using docValues for better memory
utilization
 and
speed.
   
Currently we are faceting the search 

Re: Ignoring the Document Cache per query

2015-05-29 Thread Bryan Bende
Thanks, Erick. I realize this really makes no sense, but I was looking to
work around a problem. Here is the scenario...

Using Solr 5.1 we have a service that utilizes the new mlt query parser to
get recommendations. So we start up the application,
ask for recommendations for a document, and everything works.

Another feature is to dislike a document, and once it is disliked it
shouldn't show up as a recommended document. It
does this by looking up the disliked documents for a user and adding a
filter query to the recommendation call which excludes
the disliked documents.
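
For reference, the shape of the request (field names and ids here are made
up):

    q={!mlt qf=title,description}doc-123&fq=-id:(doc-456 OR doc-789)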

So now we dislike a document that was in the original list of
recommendations above, then ask for the recommendations again,
and now we get nothing back. If we restart Solr, or reload the collection,
then we can get it to work, but as soon as we dislike another
document we get back into a weird state.

Through trial and error I narrowed down that if we set the documentCache
size to 0, then this problem doesn't happen. Since we can't
really figure out why this is happening in Solr, we were hoping there was
some way to not use the document cache on the call where
we use the mlt query parser.
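
For anyone following along, the setting that avoids the problem is just the
standard cache element in solrconfig.xml:

    <documentCache class="solr.LRUCache" size="0" initialSize="0" autowarmCount="0"/>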

On Thu, May 28, 2015 at 5:44 PM, Erick Erickson erickerick...@gmail.com
wrote:

 First, there isn't that I know of. But why would you want to do this?

 On the face of it, it makes no sense to ignore the doc cache. One of its
 purposes is to hold the document (read off disk) for successive
 search components _in the same query_. Otherwise, each component
 might have to do a disk seek.

 So I must be missing why you want to do this.

 Best,
 Erick

 On Thu, May 28, 2015 at 1:23 PM, Bryan Bende bbe...@gmail.com wrote:
  Is there a way to bypass the document cache on a per-query basis?

  It looks like there's {!cache=false} for preventing the filter cache from
  being used for a given query; I'm looking for the same thing for the
  document cache.
 
  Thanks,
 
  Bryan



Re: Help for a field in my schema ?

2015-05-29 Thread Erick Erickson
Well yes, but the second doesn't do what you say you want.

bq: *claim*, equal to all <claimXXYYY> (all languages, all numbers,
indexed=true, stored=false) (search not needed but must be displayed)

You can search this field, but specifying it in a field list (fl) will
return nothing; you need indexed=false and stored=true.

But there seems to be a problem here. You say "I don't know how many
language codes there are", so I'm assuming you want
claimen
claimfr
claimde
etc., each of which you want to search separately.
ingestion side or in a custom update processor (personally I'd do it
in a SolrJ program in the ETL pipeline) you need to figure out which
of these fields to populate. A dynamic field would work, something
like:
<dynamicField name="claim*" type="(some text type)" indexed="true" stored="false"/>

Now anything that starts with "claim" will get its own field, and a
copyField from claim* to display_claim (indexed=false, stored=true) will
show the contents.
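
Concretely, a sketch of the three schema.xml entries (the text_general type
is an assumption; use whatever analysis fits your languages):

    <dynamicField name="claim*" type="text_general" indexed="true" stored="false"/>
    <field name="display_claim" type="text_general" indexed="false" stored="true" multiValued="true"/>
    <copyField source="claim*" dest="display_claim"/>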

But the problem here is that all your different languages get the same
analysis applied so you can't do, say, language-specific stemming. If
all your languages are Western, you might be able to use one of the
folding filters to ignore diacritics etc and get good enough
results.

There is no need to store these twice, so the searchable forms should
have stored=false, just always specify display_claim in your fl list.

Best,
Erick

On Fri, May 29, 2015 at 5:27 AM, Bruno Mannina bmann...@free.fr wrote:
 Dear Solr-Users,

 (SOLR 5.0 Ubuntu)

 I have xml files with tags like this:
 <claimXXYYY>

 where XX is a language code like FR, EN, DE, PT, etc. (I don't know how many
 language codes I can have)
 and YYY is a number [1..999]

 i.e.:
 <claimen1>
 <claimen2>
 <claimen3>
 <claimfr1>
 <claimfr2>
 <claimfr3>

 I would like to define fields named:
 *claimen*, equal to all <claimenYYY> (EN language, all numbers, indexed=true,
 stored=true) (search needed and must be displayed)
 *claim*, equal to all <claimXXYYY> (all languages, all numbers, indexed=true,
 stored=false) (search not needed but must be displayed)

 Is it possible to have these 2 fields ?

 Could you help me to declare them in my schema.xml ?

 Thanks a lot for your help !

 Bruno





user interface

2015-05-29 Thread Mustafa KIZILDAĞ
Hi,

My name is Mustafa. I'm a master's student at YTU in Turkey. I am building a
crawler for a VoIP problem for my job and school. I want to configure Solr's
user interface. For example, can I add an image or add a comment to the
user interface?

I searched about it but couldn't find a good result.

Could you help me,

Best Regards.

Mustafa KIZILDAĞ


Re: CLUSTERSTATUS timeout

2015-05-29 Thread Joseph Obernberger

I'm also getting this error with 5.1.0 and a 27 shard setup.

null:org.apache.solr.common.SolrException: CLUSTERSTATUS the collection 
time out:180s
        at org.apache.solr.handler.admin.CollectionsHandler.handleResponse(CollectionsHandler.java:740)
        at org.apache.solr.handler.admin.CollectionsHandler.handleResponse(CollectionsHandler.java:692)
        at org.apache.solr.handler.admin.CollectionsHandler.handleClusterStatus(CollectionsHandler.java:1042)
        at org.apache.solr.handler.admin.CollectionsHandler.handleRequestBody(CollectionsHandler.java:259)
        at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:143)
        at org.apache.solr.servlet.SolrDispatchFilter.handleAdminRequest(SolrDispatchFilter.java:783)
        at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:282)
        at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:220)
        at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)
        at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455)
        at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
        at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557)
        at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
        at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1075)
        at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384)
        at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
        at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1009)
        at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
        at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
        at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)
        at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
        at org.eclipse.jetty.server.Server.handle(Server.java:368)
        at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:489)
        at org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53)
        at org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:942)
        at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:1004)
        at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:640)
        at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235)
        at org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72)
        at org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264)
        at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
        at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
        at java.lang.Thread.run(Thread.java:745)

Just another data point.
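
For reference, the timing-out call is the Collections API CLUSTERSTATUS
action, which can be reproduced directly (host and port are assumptions):

    curl "http://localhost:8983/solr/admin/collections?action=CLUSTERSTATUS&wt=json"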

-Joe

On 12/17/2014 8:44 AM, adfel70 wrote:

Hi Jonathan,
We are having the exact same problem with Solr 4.8.0.
Did you manage to resolve this one?
Thanks.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/CLUSTERSTATUS-timeout-tp4173224p4174741.html
Sent from the Solr - User mailing list archive at Nabble.com.





RE: When is too many fields in qf is too many?

2015-05-29 Thread Reitzel, Charles
Before giving up, I might try a copyField per field group and see how
that works.   Won't that get you down to 10-20 fields per query and be stable
wrt view changes?
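
Something like this per field group, so qf only has to list the rollup
fields (names and type are assumptions):

    <field name="fg1_all" type="text" multiValued="true" indexed="true" stored="false"/>
    <copyField source="FieldGroup-1.*" dest="fg1_all"/>

Then qf=fg1_all fg2_all ... stays the same size no matter how fields move
between views.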

But Solr is column oriented, in that the core query logic is a scatter/gather
over the qf list.   Perhaps there is a reason qf does not support wildcards.
Not sure, but it seems likely.

That said, having thousands of columns is not weird at all in some
applications.   You might be better served with a product oriented to this
type of usage.  Maybe HBase?

-Original Message-
From: Steven White [mailto:swhite4...@gmail.com] 
Sent: Thursday, May 28, 2015 5:59 PM
To: solr-user@lucene.apache.org
Subject: Re: When is too many fields in qf is too many?

Hi Folks,

First, thanks for taking the time to read and reply to this subject, it is much 
appreciated, I have yet to come up with a final solution that optimizes Solr.  
To give you more context, let me give you the big picture of how the 
application and the database is structured for which I'm trying to enable Solr 
search on.

Application: Has the concept of views.  A view contains one or more object 
types.  An object type may exist in any view.  An object type has one or more 
field groups.  A field group has a set of fields.  A field group can be used 
with any object type of any view.  Notice how field groups are free standing, 
that they can be linked to an object type of any view?

Here is a diagram of the above:

FieldGroup-#1 == Field-1, Field-2, Field-5, etc.
FieldGroup-#2 == Field-1, Field-5, Field-6, Field-7, Field-8, etc.
FieldGroup-#3 == Field-2, Field-5, Field-8, etc.

View-#1 == ObjType-#2 (using FieldGroup-#1  #3)  +  ObjType-#4 (using
FieldGroup-#1)  +  ObjType-#5 (using FieldGroup-#1, #2, #3, etc).

View-#2 == ObjType-#1 (using FieldGroup-#3, #15, #16, #19, etc.)  +
 ObjType-#4 (using FieldGroup-#1, #4, #19, etc.)  +  etc.

View-#3 == ObjType-#1 (using FieldGroup-#1,  #8)  +  etc.

Do you see where this is heading?  To make it even a bit more interesting,
ObjType-#4 (which is in View-#1 and #2 per the above) uses FieldGroup-#1 in
both views, but in one view it can be configured to have its own fields off
FieldGroup-#1.

With the above setting, a user is assigned a view and can be moved around views 
but cannot be in multiple views at the same time.  Based on which view that 
user is in, that user will see different fields of ObjType-#1 (the example I 
gave for FieldGroup-#1) or even not see an object type that he was able to see 
in another view.

If I have not lost you with the above, you can see that per view there can be
many fields.  To make it even yet more interesting, a field in
FieldGroup-#1 may have the exact same name as a field in another FieldGroup,
and the two could be of different types (one is a date, the other a string).
Thus when I build my Solr doc object (and create the list of Solr
fields), those fields must be prefixed with the FieldGroup name, otherwise I
could end up overwriting the type of another field.

Are you still with me?  :-)

Now you see how a view can end up with many fields (over 3500 in my case), but 
a doc I post to Solr for indexing will have on average 50 fields, worse case 
maybe 200 fields.  This is fine, and it is not my issue but I want to call it 
out to get it out of our way.

Another thing I need to mention is this (in case it is not clear from the 
above).  Users create and edit records in the DB by an instance of ObjType-#N.  
Those object types that are created do NOT belong to a view, in fact they do 
NOT have any view concept in them.  They simply have the concept of what fields 
the user can see / edit based on which view that user is in.  In effect, in the 
DB, we have instances of object types data.

One last thing I should point out is that views, and field groups are dynamic.  
This month, View-#3 may have ObjType-#1, but next month it may not or a new 
object type may be added to it.

Still with me?  If so, you are my hero!!  :-)

So, I setup my Solr schema.xml to include all fields off each field group that 
exists in the database like so:

<field name="FieldGroup-1.Headline" type="text" multiValued="true" indexed="true" stored="false" required="false"/>
<field name="FieldGroup-1.Summary" type="text" multiValued="true" indexed="true" stored="false" required="false"/>
<field name="FieldGroup-1...." ... />
<field name="FieldGroup-2.Headline" type="text" multiValued="true" indexed="true" stored="false" required="false"/>
<field name="FieldGroup-2.Summary" type="text" multiValued="true" indexed="true" stored="false" required="false"/>
<field name="FieldGroup-2.Date" type="text" multiValued="true" indexed="true" stored="false" required="false"/>
<field name="FieldGroup-2...." ... />
<field name="FieldGroup-3...." ... />
<field name="FieldGroup-4...." ... />

You get the idea.  Each record of an object type I index contains ALL the
fields of that object type REGARDLESS of which view that object type is set
to 

RE: optimal shard assignment with low shard key cardinality using compositeId to enable shard splitting

2015-05-29 Thread Reitzel, Charles
Thanks, Erick.   I appreciate the sanity check.

-Original Message-
From: Erick Erickson [mailto:erickerick...@gmail.com] 
Sent: Thursday, May 28, 2015 5:50 PM
To: solr-user@lucene.apache.org
Subject: Re: optimal shard assignment with low shard key cardinality using 
compositeId to enable shard splitting

Charles:

You raise good points, and I didn't mean to say that co-locating docs due to 
some critera was never a good idea. That said, it does add administrative 
complexity that I'd prefer to avoid unless necessary.

I suppose it largely depends on what the load and response SLAs are.
If there's 1 query/second peak load, the sharding overhead for queries is 
probably not noticeable. If there are 1,000 QPS, then it might be worth it.

Measure, measure, measure..

I think your composite ID understanding is fine.

Best,
Erick

On Thu, May 28, 2015 at 1:40 PM, Reitzel, Charles 
charles.reit...@tiaa-cref.org wrote:
 We have used a similar sharding strategy for exactly the reasons you say.
 But we are fairly certain that the # of documents per user ID is < 5000 and,
 typically, < 500.   Thus, we think the overhead of distributed searches
 clearly outweighs the benefits.   Would you agree?   We have done some load
 testing (with 100's of simultaneous users) and performance has been good with
 data and queries distributed evenly across shards.

 In Matteo's case, this model appears to apply well to user types B and C.
 Not sure about user type A, though.   At > 100,000 docs per user per year,
 on average, that load seems ok for one node.   But, is it enough to benefit
 significantly from a parallel search?

 With a 2 part composite ID, each part will contribute 16 bits to a 32 bit 
 hash value, which is then compared to the set of hash ranges for each active 
 shard.   Since the user ID will contribute the high-order bytes, it will 
 dominate in matching the target shard(s).   But dominance doesn't mean the 
 lower order 16 bits will always be ignored, does it?   I.e. if the original
 shard has been split, perhaps multiple times, isn't it possible that one user
 ID's documents will be spread over multiple shards?

 In Matteo's case, it might make sense to specify fewer bits to the user ID 
 for user category A.   I.e. what I described above is the default for 
 userId!docId.   But if you use userId/8!docId/24 (8 bits for userId and 24
 bits for the document ID), then couldn't one user's docs be split over
 multiple shards, even without splitting?

 I'm just making sure I understand how composite ID sharding works correctly.  
  Have I got it right?  Has any of this logic changed in 5.x?
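
To restate my understanding of the router concretely (default vs. explicit
bit allocation):

    userId!docId     -> the top 16 bits of the route hash come from
                        hash(userId), the remaining 16 from hash(docId)
    userId/8!docId   -> only the top 8 bits come from hash(userId), 24 from
                        hash(docId), so one user's docs can map into more
                        than one shard range once shards are split finely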

 -Original Message-
 From: Erick Erickson [mailto:erickerick...@gmail.com]
 Sent: Thursday, May 21, 2015 11:30 AM
 To: solr-user@lucene.apache.org
 Subject: Re: optimal shard assignment with low shard key cardinality 
 using compositeId to enable shard splitting

 I question your base assumption:

 bq: So shard by document producer seems a good choice

  Because what this _also_ does is force all of the work for a query onto one 
 node and all indexing for a particular producer ditto. And will cause you to 
 manually monitor your shards to see if some of them grow out of proportion to 
 others. And

 I think it would be much less hassle to just let Solr distribute the docs as 
 it may based on the uniqueKey and forget about it. Unless you want, say, to
 do joins etc. There will, of course, be some overhead that you pay here,
 but unless you an measure it and it's a pain I wouldn't add the complexity 
 you're talking about, especially at the volumes you're talking.

 Best,
 Erick

 On Thu, May 21, 2015 at 3:20 AM, Matteo Grolla matteo.gro...@gmail.com 
 wrote:
 Hi
 I'd like some feedback on how I'd like to solve the following 
 sharding problem


 I have a collection that will eventually become big

 Average document size is 1.5kb
 Every year 30 Million documents will be indexed

 Data come from different document producers (a person, owner of his
 documents) and queries are almost always performed by a document 
 producer who can only query his own document. So shard by document 
 producer seems a good choice

 there are 3 types of doc producer:
 type A: cardinality 105 (there are 105 producers of this type), producing
 17M docs/year (the aggregated production of all type A producers)
 type B: cardinality ~10k, producing 4M docs/year
 type C: cardinality ~10M, producing 9M docs/year

 I'm thinking about using compositeId (solrDocId = producerId!docId) to send
 all docs of the same producer to the same shard. When a shard becomes too
 large I can use shard splitting.

 problems:
 - documents from type A producers could be oddly distributed among
 shards, because hashing doesn't work well on small numbers (105); see
 Appendix

 As a solution I could do this when a new typeA producer (producerA1) arrives:

 1) client app: generate a producer code
 2) client app: simulate murmurhashing 

Re: Ignoring the Document Cache per query

2015-05-29 Thread Erick Erickson
This is totally weird. The document cache should really have nothing
to do with whether MLT returns documents or not AFAIK. So either I'm
totally misunderstanding MLT, you're leaving out a step or there's
some bug in Solr. The fact that setting the document cache to 0
changes the behavior, or restarting Solr and submitting the exact same
request gives different behavior is strong evidence it's a problem
with Solr.

Could I ask you to open a JIRA and add all the relevant details you
can? Especially if you could get it to work (well actually fail) with
the techproducts data. But barring that, the (perhaps sanitized)
queries you send to get diff results before and after.

Best,
Erick

On Fri, May 29, 2015 at 7:10 AM, Bryan Bende bbe...@gmail.com wrote:
 Thanks Erik. I realize this really makes no sense, but I was looking to
 work around a problem. Here is the scenario...

 Using Solr 5.1 we have a service that utilizes the new mlt query parser to
 get recommendations. So we start up the application,
 ask for recommendations for a document, and everything works.

 Another feature is to dislike a document, and once it is disliked it
 shouldn't show up as a recommended document. It
 does this by looking up the disliked documents for a user and adding a
 filter query to the recommendation call which excludes
 the disliked documents.

 So now we dislike a document that was in the original list of
 recommendations above, then ask for the recommendations again,
 and now we get nothing back. If we restart Solr, or reload the collection,
 then we can get it to work, but as soon as we dislike another
 document we get back into a weird state.

 Through trial and error I narrowed down that if we set the documentCache
 size to 0, then this problem doesn't happen. Since we can't
 really figure out why this is happening in Solr, we were hoping there was
 some way to not use the document cache on the call where
 we use the mlt query parser.

 On Thu, May 28, 2015 at 5:44 PM, Erick Erickson erickerick...@gmail.com
 wrote:

 First, there isn't that I know of. But why would you want to do this?

 On the face of it, it makes no sense to ignore the doc cache. One of its
 purposes is to hold the document (read off disk) for successive
 search components _in the same query_. Otherwise, each component
 might have to do a disk seek.

 So I must be missing why you want to do this.

 Best,
 Erick

 On Thu, May 28, 2015 at 1:23 PM, Bryan Bende bbe...@gmail.com wrote:
  Is there a way to the document cache on a per-query basis?
 
  It looks like theres {!cache=false} for preventing the filter cache from
  being used for a given query, looking for the same thing for the document
  cache.
 
  Thanks,
 
  Bryan



Re: Deleting Fields

2015-05-29 Thread Shawn Heisey
On 5/29/2015 5:08 PM, Joseph Obernberger wrote:
 Hi All - I have a lot of fields to delete, but noticed that once I
 started deleting them, I quickly ran out of heap space.  Is
 delete-field a memory intensive operation?  Should I delete one field,
 wait a while, then delete the next?

I'm not aware of a way to delete a field.  I may have a different
definition of what a field is than you do, though.

Solr lets you delete entire documents, but deleting a field from the
entire index would involve re-indexing every document in the index,
excluding that field.

Can you be more specific about exactly what you are doing, what you are
seeing, and what you want to see instead?

Also, please be aware of this:

http://people.apache.org/~hossman/#threadhijack

Thanks,
Shawn



Re: How to setup solr in cluster

2015-05-29 Thread Erick Erickson
You really have to tell us more about what you mean. You have two
problems to solve:
1) putting Solr on all the nodes and starting/stopping it. Puppet or
Chef help here, although it's perfectly possible to do this manually.
2) creating collections etc. For this you just need all your Solr
instances communicating with the ZooKeeper you have set up.
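
For example, the minimal manual sequence is something like this (hosts,
ports and collection parameters are placeholders):

    # on each of the 16 nodes, start Solr pointed at the ZooKeeper ensemble
    bin/solr start -c -z zk1:2181,zk2:2181,zk3:2181

    # then, once, from any one node, create the collection
    bin/solr create -c mycollection -shards 4 -replicationFactor 2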

So tell us what you have tried and what you are having problems with
and perhaps we can offer more specific suggestions.

Best,
Erick

On Fri, May 29, 2015 at 4:40 PM, Purohit, Sumit sumit.puro...@pnnl.gov wrote:
 Hi All,

 I am trying to set up Solr on a cluster with 16 nodes.
 The only documentation I could find talks about a local cluster which behaves
 like a real cluster:
 https://cwiki.apache.org/confluence/display/solr/Getting+Started+with+SolrCloud

 I read about using tools like Chef or Puppet to configure Solr on a
 production-level cluster.

 Does this group have any suggestions about the best way to set it up?

 Thanks
 Sumit Purohit


RE: How to setup solr in cluster

2015-05-29 Thread Purohit, Sumit
Thanks for the reply.
I have tried the example cloud setup using the link I mentioned.
I am trying to set up Solr on all 16 nodes + 1 external ZooKeeper on one of
the nodes. That's when I found out about Chef and Puppet.

My problem is that manually setting up and starting/stopping Solr does not
seem that efficient to me, and I wanted to seek the community's suggestion.

Thanks
Sumit

-Original Message-
From: Erick Erickson [mailto:erickerick...@gmail.com] 
Sent: Friday, May 29, 2015 4:43 PM
To: solr-user@lucene.apache.org
Subject: Re: How to setup solr in cluster

You really have to tell us more about what you mean. You have two problems to 
solve
1 putting Solr on all the nodes and starting/stopping it. Puppet or
Chef help here, although it's perfectly possible to do this manually.
2 creating collecitons etc. For this you just need all your Solr
instances communicating with the Zookeeper you have set up.

So tell us what you have tried and what you are having problems with and 
perhaps we can offer more specific suggestions.

Best,
Erick

On Fri, May 29, 2015 at 4:40 PM, Purohit, Sumit sumit.puro...@pnnl.gov wrote:
 Hi All,

 I am trying to setup solr on a cluster with 16 nodes.
 Only documentation I could find, talks about a local cluster which behaves 
 like a real cluster.
 https://cwiki.apache.org/confluence/display/solr/Getting+Started+with+
 SolrCloud

 I read about using tools like Chef or Puppet to configure solr on 
 production level cluster.

 Does this group has any suggestion about what is the best way to set it up?

 Thanks
 Sumit Purohit


Re: docValues: Can we apply synonym

2015-05-29 Thread Aman Tandon
Hi Upayavira,

How will copyField help in my scenario, when I have to add the synonym
to a docValues-enabled field?

With Regards
Aman Tandon

On Sat, May 30, 2015 at 1:18 AM, Upayavira u...@odoko.co.uk wrote:

 Use copyField to clone the field for faceting purposes.

 Upayavira

 On Fri, May 29, 2015, at 08:06 PM, Aman Tandon wrote:
  Hi Erick,
 
  Thanks for the suggestion. We are using this query parser plugin
  (*SynonymExpandingExtendedDismaxQParserPlugin*) to manage multi-word
  synonyms. So it does work slower than edismax, and that's why it is not in
  contrib, right? (I am asking this question because we are using it for
  all our searches to handle 10 multiword synonyms: ice cube, icecube, etc.)
 
  *Moreover I thought of a solution for this docValue problem*

  I need to make the city field *multivalued*, and by this I mean I will add
  the synonym (*mumbai, bombay*) as an extra value to that field if present.
  The search operation will then work fine as before.

  <field name="city">mumbai</field>
  <field name="city">bombay</field>
 
 
  The only problem is that we then have to remove the city alias/synonym
  facets when providing results to the clients.
 
  *mumbai, 1000*
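
A sketch of how that index-time injection could look as an update processor
(the class and the alias map are hypothetical, not something that ships with
Solr):

    // Hypothetical processor: appends known aliases to the multivalued
    // "city" field before the docValues are written.
    class CitySynonymProcessor extends UpdateRequestProcessor {
      private static final Map<String, String> ALIASES = new HashMap<>();
      static {
        ALIASES.put("mumbai", "bombay");
        ALIASES.put("bombay", "mumbai");
      }

      CitySynonymProcessor(UpdateRequestProcessor next) { super(next); }

      @Override
      public void processAdd(AddUpdateCommand cmd) throws IOException {
        SolrInputDocument doc = cmd.getSolrInputDocument();
        Object city = doc.getFieldValue("city");
        if (city != null) {
          String alias = ALIASES.get(city.toString().toLowerCase(Locale.ROOT));
          if (alias != null) doc.addField("city", alias); // add the extra value
        }
        super.processAdd(cmd);
      }
    }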
 
 
  With Regards
  Aman Tandon
 
  On Fri, May 29, 2015 at 7:26 PM, Erick Erickson erickerick...@gmail.com
 
  wrote:
 
   Do take time for performance testing with that parser. It can be slow
   depending on your
   data as I remember. That said it solves the problem it set out to
   solve so if it meets
   your SLAs, it can be a life-saver.
  
   Best,
   Erick
  
  
   On Fri, May 29, 2015 at 2:35 AM, Alessandro Benedetti
   benedetti.ale...@gmail.com wrote:
Even if a little bit outdated, that query parser is really really
 cool to
manage synonyms !
+1 !
   
2015-05-29 1:01 GMT+01:00 Aman Tandon amantandon...@gmail.com:
   
Thanks chris.
   
Yes we are using it for handling multiword synonym problem.
   
With Regards
Aman Tandon
   
On Fri, May 29, 2015 at 12:38 AM, Reitzel, Charles 
charles.reit...@tiaa-cref.org wrote:
   
 Again, I would recommend using Nolan Lawson's
 SynonymExpandingExtendedDismaxQParserPlugin.


 http://nolanlawson.com/2012/10/31/better-synonym-handling-in-solr/

 -Original Message-
 From: Aman Tandon [mailto:amantandon...@gmail.com]
 Sent: Wednesday, May 27, 2015 6:42 PM
 To: solr-user@lucene.apache.org
 Subject: Re: docValues: Can we apply synonym

 Ok and what synonym processor you is talking about maybe it could
   help ?

 With Regards
 Aman Tandon

 On Thu, May 28, 2015 at 4:01 AM, Reitzel, Charles 
 charles.reit...@tiaa-cref.org wrote:

  Sorry, my bad.   The synonym processor I mention works
 differently.
It's
  an extension of the EDisMax query processor and doesn't require
   field
  level synonym configs.
 
  -Original Message-
  From: Reitzel, Charles [mailto:charles.reit...@tiaa-cref.org]
  Sent: Wednesday, May 27, 2015 6:12 PM
  To: solr-user@lucene.apache.org
  Subject: RE: docValues: Can we apply synonym
 
  But the query analysis isn't on a specific field, it is applied
 to
   the
  query string.
 
  -Original Message-
  From: Aman Tandon [mailto:amantandon...@gmail.com]
  Sent: Wednesday, May 27, 2015 6:08 PM
  To: solr-user@lucene.apache.org
  Subject: Re: docValues: Can we apply synonym
 
  Hi Charles,
 
  The problem here is that the docValues works only with
 primitives
   data
  type only like String, int, etc So how could we apply synonym on
  primitive data type.
 
  With Regards
  Aman Tandon
 
  On Thu, May 28, 2015 at 3:19 AM, Reitzel, Charles 
  charles.reit...@tiaa-cref.org wrote:
 
   Is there any reason you cannot apply the synonyms at query
 time?
Applying synonyms at indexing time has problems, e.g.
 polluting
   the
   term frequency for synonyms added, preventing distance
 queries,
   ...
  
   Since city names often have multiple terms, e.g. New York, Den
   Hague, etc., I would recommend using Nolan Lawson's
   SynonymExpandingExtendedDismaxQParserPlugin.   Tastes great,
 less
  filling.
  
  
   http://nolanlawson.com/2012/10/31/better-synonym-handling-in-solr/
  
   We found this to fix synonyms like ny for New York and
 vice
versa.
   Haven't tried it with docValues, tho.
  
   -Original Message-
   From: Aman Tandon [mailto:amantandon...@gmail.com]
   Sent: Tuesday, May 26, 2015 11:15 PM
   To: solr-user@lucene.apache.org
   Subject: Re: docValues: Can we apply synonym
  
   Yes it could be :)
  
   Anyway thanks for helping.
  
   With Regards
   Aman Tandon
  
   On Tue, May 26, 2015 at 10:22 PM, Alessandro Benedetti 
   benedetti.ale...@gmail.com wrote:
  
I should investigate that, as 

RE: How to setup solr in cluster

2015-05-29 Thread Purohit, Sumit
Sorry for this second email, but another problem of mine is:
when I copy the solr folder onto each node and start them, should I run each as a 
1-node cluster and use the same name for the collection, OR do I have to create 
an individual shard on each node?

Thanks for your help.

Thanks
sumit 

-Original Message-
From: Purohit, Sumit 
Sent: Friday, May 29, 2015 5:10 PM
To: solr-user@lucene.apache.org
Subject: RE: How to setup solr in cluster

Thanks for the reply.
I have tried the example cloud setup using the link I mentioned.
I am trying to set up solr on all 16 nodes + 1 external zookeeper on one of the 
nodes.
That's when I found out about Chef and Puppet. 

My problem is that manually setting up and starting/stopping solr does not seem 
that efficient to me, and I wanted to seek the community's suggestions.

Thanks
Sumit

-Original Message-
From: Erick Erickson [mailto:erickerick...@gmail.com] 
Sent: Friday, May 29, 2015 4:43 PM
To: solr-user@lucene.apache.org
Subject: Re: How to setup solr in cluster

You really have to tell us more about what you mean. You have two problems to 
solve:
1) putting Solr on all the nodes and starting/stopping it. Puppet or
Chef help here, although it's perfectly possible to do this manually.
2) creating collections etc. For this you just need all your Solr
instances communicating with the Zookeeper you have set up.

So tell us what you have tried and what you are having problems with and 
perhaps we can offer more specific suggestions.

Best,
Erick

On Fri, May 29, 2015 at 4:40 PM, Purohit, Sumit sumit.puro...@pnnl.gov wrote:
 Hi All,

 I am trying to setup solr on a cluster with 16 nodes.
 The only documentation I could find talks about a local cluster which behaves 
 like a real cluster.
 https://cwiki.apache.org/confluence/display/solr/Getting+Started+with+
 SolrCloud

 I read about using tools like Chef or Puppet to configure solr on a 
 production-level cluster.

 Does this group have any suggestions about what is the best way to set it up?

 Thanks
 Sumit Purohit


How to setup solr in cluster

2015-05-29 Thread Purohit, Sumit
Hi All,

I am trying to setup solr on a cluster with 16 nodes.
The only documentation I could find talks about a local cluster which behaves like 
a real cluster.
https://cwiki.apache.org/confluence/display/solr/Getting+Started+with+SolrCloud

I read about using tools like Chef or Puppet to configure solr on a 
production-level cluster.

Does this group have any suggestions about what is the best way to set it up?

Thanks
Sumit Purohit


Deleting Fields

2015-05-29 Thread Joseph Obernberger
Hi All - I have a lot of fields to delete, but noticed that once I 
started deleting them, I quickly ran out of heap space.  Is delete-field 
a memory intensive operation?  Should I delete one field, wait a while, 
then delete the next?

Thank you!

-Joe


Re: Deleting Fields

2015-05-29 Thread Joseph Obernberger
Thank you Shawn - I'm referring to fields in the schema.  With Solr 5, 
you can delete fields from the schema.

https://cwiki.apache.org/confluence/display/solr/Schema+API#SchemaAPI-DeleteaField
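
For example, something like this should remove a field definition (a sketch: 
the collection name "mycollection" and field name "old_field" are placeholders, 
and it assumes a managed schema):

curl -X POST -H 'Content-type:application/json' \
  http://localhost:8983/solr/mycollection/schema \
  --data-binary '{"delete-field":{"name":"old_field"}}'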

-Joe

On 5/29/2015 7:30 PM, Shawn Heisey wrote:

On 5/29/2015 5:08 PM, Joseph Obernberger wrote:

Hi All - I have a lot of fields to delete, but noticed that once I
started deleting them, I quickly ran out of heap space.  Is
delete-field a memory intensive operation?  Should I delete one field,
wait a while, then delete the next?

I'm not aware of a way to delete a field.  I may have a different
definition of what a field is than you do, though.

Solr lets you delete entire documents, but deleting a field from the
entire index would involve re-indexing every document in the index,
excluding that field.

Can you be more specific about exactly what you are doing, what you are
seeing, and what you want to see instead?

Also, please be aware of this:

http://people.apache.org/~hossman/#threadhijack

Thanks,
Shawn






Re: How to setup solr in cluster

2015-05-29 Thread Erick Erickson
None of the above. You simply start Solr on each node, then use the
Collections API to create your collection. Solr will take care of
creating the individual replicas on each of the nodes with respect to
the parameters you pass to the CREATE command.
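
A minimal sketch of the two steps (assuming the Solr 5.x bin/solr script; the
ZooKeeper host zk1:2181, the collection name, and the configset name "myconf"
are placeholders you would adjust):

# on each of the 16 nodes: start Solr in cloud mode, pointed at ZooKeeper
bin/solr start -c -z zk1:2181

# then, once, from any machine: create the collection across the cluster,
# e.g. 8 shards x 2 replicas = 16 cores, one per node
curl "http://node1:8983/solr/admin/collections?action=CREATE&name=mycollection&numShards=8&replicationFactor=2&maxShardsPerNode=1&collection.configName=myconf"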

Best,
Erick

On Fri, May 29, 2015 at 5:46 PM, Purohit, Sumit sumit.puro...@pnnl.gov wrote:
 Sorry for this second email, but another problem of mine is:
 when I copy the solr folder onto each node and start them, should I run each as a 
 1-node cluster and use the same name for the collection, OR do I have to 
 create an individual shard on each node?

 Thanks for your help.

 Thanks
 sumit

 -Original Message-
 From: Purohit, Sumit
 Sent: Friday, May 29, 2015 5:10 PM
 To: solr-user@lucene.apache.org
 Subject: RE: How to setup solr in cluster

 Thanks for the reply.
 I have tried the example cloud setup using the link I mentioned.
 I am trying to set up solr on all 16 nodes + 1 external zookeeper on one of the 
 nodes.
 That's when I found out about Chef and Puppet.

 My problem is that manually setting up and starting/stopping solr does not seem 
 that efficient to me, and I wanted to seek the community's suggestions.

 Thanks
 Sumit

 -Original Message-
 From: Erick Erickson [mailto:erickerick...@gmail.com]
 Sent: Friday, May 29, 2015 4:43 PM
 To: solr-user@lucene.apache.org
 Subject: Re: How to setup solr in cluster

 You really have to tell us more about what you mean. You have two problems to 
 solve:
 1) putting Solr on all the nodes and starting/stopping it. Puppet or
 Chef help here, although it's perfectly possible to do this manually.
 2) creating collections etc. For this you just need all your Solr
 instances communicating with the Zookeeper you have set up.

 So tell us what you have tried and what you are having problems with and 
 perhaps we can offer more specific suggestions.

 Best,
 Erick

 On Fri, May 29, 2015 at 4:40 PM, Purohit, Sumit sumit.puro...@pnnl.gov 
 wrote:
 Hi All,

 I am trying to setup solr on a cluster with 16 nodes.
 The only documentation I could find talks about a local cluster which behaves 
 like a real cluster.
 https://cwiki.apache.org/confluence/display/solr/Getting+Started+with+
 SolrCloud

 I read about using tools like Chef or Puppet to configure solr on a 
 production-level cluster.

 Does this group have any suggestions about what is the best way to set it up?

 Thanks
 Sumit Purohit


Re: Deleting Fields

2015-05-29 Thread Erick Erickson
Yes, but deleting fields from the schema only means that _future_
documents will throw an undefined field error. All the documents
currently in the index will retain that field.

Why you're hitting an OOM is a mystery, though. But delete-field isn't
removing the contents of indexed documents. Showing us the full stack
trace when you hit the OOM would be helpful.

Best,
Erick

On Fri, May 29, 2015 at 4:58 PM, Joseph Obernberger
j...@lovehorsepower.com wrote:
 Thank you Shawn - I'm referring to fields in the schema.  With Solr 5, you
 can delete fields from the schema.
 https://cwiki.apache.org/confluence/display/solr/Schema+API#SchemaAPI-DeleteaField

 -Joe


 On 5/29/2015 7:30 PM, Shawn Heisey wrote:

 On 5/29/2015 5:08 PM, Joseph Obernberger wrote:

 Hi All - I have a lot of fields to delete, but noticed that once I
 started deleting them, I quickly ran out of heap space.  Is
 delete-field a memory intensive operation?  Should I delete one field,
 wait a while, then delete the next?

 I'm not aware of a way to delete a field.  I may have a different
 definition of what a field is than you do, though.

 Solr lets you delete entire documents, but deleting a field from the
 entire index would involve re-indexing every document in the index,
 excluding that field.

 Can you be more specific about exactly what you are doing, what you are
 seeing, and what you want to see instead?

 Also, please be aware of this:

 http://people.apache.org/~hossman/#threadhijack

 Thanks,
 Shawn





Re: docValues: Can we apply synonym

2015-05-29 Thread Aman Tandon
Hi Erick,

Thanks for the suggestion. We are using this query parser plugin (
*SynonymExpandingExtendedDismaxQParserPlugin*) to manage multi-word
synonyms. So does it work slower than edismax, and is that why it is not in
contrib? (I am asking this question because we are using it for all our
searches, to handle 10 multiword synonyms such as ice cube, icecube etc.)

*Moreover, I thought of a solution for this docValues problem:*

I need to make the city field *multivalued*, and by this I mean I will add
the synonym (*mumbai, bombay*) as an extra value to that field if present.
Now the searching operation will work fine as before.


 *<field name="city">mumbai</field><field name="city">bombay</field>*
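
For illustration, the update message would then carry both values (a sketch;
the id field and its value are only examples):

<add>
  <doc>
    <field name="id">1</field>
    <field name="city">mumbai</field>
    <field name="city">bombay</field>
  </doc>
</add>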


The only problem is that we have to remove the 'city alias/synonym facets' when
we are providing results to the clients.

*mumbai, 1000*


With Regards
Aman Tandon

On Fri, May 29, 2015 at 7:26 PM, Erick Erickson erickerick...@gmail.com
wrote:

 Do take time for performance testing with that parser. It can be slow
 depending on your
 data as I remember. That said it solves the problem it set out to
 solve so if it meets
 your SLAs, it can be a life-saver.

 Best,
 Erick


 On Fri, May 29, 2015 at 2:35 AM, Alessandro Benedetti
 benedetti.ale...@gmail.com wrote:
  Even if a little bit outdated, that query parser is really really cool to
  manage synonyms !
  +1 !
 
  2015-05-29 1:01 GMT+01:00 Aman Tandon amantandon...@gmail.com:
 
  Thanks chris.
 
  Yes we are using it for handling multiword synonym problem.
 
  With Regards
  Aman Tandon
 
  On Fri, May 29, 2015 at 12:38 AM, Reitzel, Charles 
  charles.reit...@tiaa-cref.org wrote:
 
   Again, I would recommend using Nolan Lawson's
   SynonymExpandingExtendedDismaxQParserPlugin.
  
   http://nolanlawson.com/2012/10/31/better-synonym-handling-in-solr/
  
   -Original Message-
   From: Aman Tandon [mailto:amantandon...@gmail.com]
   Sent: Wednesday, May 27, 2015 6:42 PM
   To: solr-user@lucene.apache.org
   Subject: Re: docValues: Can we apply synonym
  
    OK, and what synonym processor are you talking about? Maybe it could
  help?
  
   With Regards
   Aman Tandon
  
   On Thu, May 28, 2015 at 4:01 AM, Reitzel, Charles 
   charles.reit...@tiaa-cref.org wrote:
  
Sorry, my bad.   The synonym processor I mention works differently.
  It's
an extension of the EDisMax query processor and doesn't require
 field
level synonym configs.
   
-Original Message-
From: Reitzel, Charles [mailto:charles.reit...@tiaa-cref.org]
Sent: Wednesday, May 27, 2015 6:12 PM
To: solr-user@lucene.apache.org
Subject: RE: docValues: Can we apply synonym
   
But the query analysis isn't on a specific field, it is applied to
 the
query string.
   
-Original Message-
From: Aman Tandon [mailto:amantandon...@gmail.com]
Sent: Wednesday, May 27, 2015 6:08 PM
To: solr-user@lucene.apache.org
Subject: Re: docValues: Can we apply synonym
   
Hi Charles,
   
 The problem here is that docValues works only with primitive data
 types like String, int, etc. So how could we apply synonyms on a
 primitive data type?
   
With Regards
Aman Tandon
   
On Thu, May 28, 2015 at 3:19 AM, Reitzel, Charles 
charles.reit...@tiaa-cref.org wrote:
   
 Is there any reason you cannot apply the synonyms at query time?
  Applying synonyms at indexing time has problems, e.g. polluting
 the
 term frequency for synonyms added, preventing distance queries,
 ...
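
 (For single-token synonyms, query-time application is just a
 SynonymFilterFactory placed in the query analyzer only -- a sketch, with the
 type name and synonyms file as placeholders; multi-word cases are exactly
 what the plugin mentioned below addresses:

 <fieldType name="text_syn" class="solr.TextField" positionIncrementGap="100">
   <analyzer type="index">
     <tokenizer class="solr.StandardTokenizerFactory"/>
     <filter class="solr.LowerCaseFilterFactory"/>
   </analyzer>
   <analyzer type="query">
     <tokenizer class="solr.StandardTokenizerFactory"/>
     <filter class="solr.LowerCaseFilterFactory"/>
     <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
             ignoreCase="true" expand="true"/>
   </analyzer>
 </fieldType>)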

 Since city names often have multiple terms, e.g. New York, Den
 Hague, etc., I would recommend using Nolan Lawson's
 SynonymExpandingExtendedDismaxQParserPlugin.   Tastes great, less
filling.


 http://nolanlawson.com/2012/10/31/better-synonym-handling-in-solr/

 We found this to fix synonyms like ny for New York and vice
  versa.
 Haven't tried it with docValues, tho.

 -Original Message-
 From: Aman Tandon [mailto:amantandon...@gmail.com]
 Sent: Tuesday, May 26, 2015 11:15 PM
 To: solr-user@lucene.apache.org
 Subject: Re: docValues: Can we apply synonym

 Yes it could be :)

 Anyway thanks for helping.

 With Regards
 Aman Tandon

 On Tue, May 26, 2015 at 10:22 PM, Alessandro Benedetti 
 benedetti.ale...@gmail.com wrote:

   I should investigate that, as synonyms are usually an analysis-stage
  concern.
  A simple way is to replace the word with all its synonyms (
  including original word), but simply using this kind of
 processor
  will change the token position and offsets, modifying the actual
  content of the
 document .
 
   I am from Bombay will become  I am from Bombay Mumbai which
  can be annoying.
  So a clever approach must be investigated.
 
  2015-05-26 17:36 GMT+01:00 Aman Tandon amantandon...@gmail.com
 :
 
   Okay So how could I do it with UpdateProcessors?
  
   With Regards
   Aman Tandon
  
   On Tue, 

Re: user interface

2015-05-29 Thread Erik Hatcher
Which user interface?  Do you mean the admin UI?   Or perhaps /browse?  


—
Erik Hatcher, Senior Solutions Architect
http://www.lucidworks.com




 On May 29, 2015, at 1:34 PM, Mustafa KIZILDAĞ mustafakizilda...@gmail.com 
 wrote:
 
 Hi,
 
 My name is Mustafa. I'm a master's student at YTU in Turkey. I am building a
 crawler for a VoIP problem for my job and school. I want to configure Solr's
 user interface. For example, can I add an image or a comment to the
 user interface?
 
 I searched for it but couldn't find a good result.
 
 Could you help me,
 
 Best Regards.
 
 Mustafa KIZILDAĞ



Re: How To: Debuging the whole indexing process

2015-05-29 Thread Aman Tandon
Thanks Alex, yes, it is for my testing, to understand the code/process flow
actually.

Any other ideas?

With Regards
Aman Tandon
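
(For reference, a free alternative to a recording debugger like Chronon: start
Solr with the standard JDWP agent and attach your IDE's remote debugger, with
breakpoints in the update/indexing code. A sketch, assuming the Solr 5 bin/solr
script; the debug port 18983 is arbitrary:

bin/solr start -a "-Xdebug -Xrunjdwp:transport=dt_socket,server=y,suspend=n,address=18983"

Then create a Remote debug configuration in IntelliJ/Eclipse pointed at that
port.)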

On Fri, May 29, 2015 at 12:48 PM, Alexandre Rafalovitch arafa...@gmail.com
wrote:

 In production or in test? I assume in test.

 This level of detail usually implies some sort of Java debugger and java
 instrumentation enabled. E.g. Chronon, which is commercial but can be tried
 as a plugin with IntelliJ Idea full version trial.

 Regards,
 Alex
 On 29 May 2015 4:38 pm, Aman Tandon amantandon...@gmail.com wrote:

  Hi,
 
  I want to debug the whole indexing process, the life cycle of indexing
  process (each and every function call by going via function to function),
  from the posting of the data.xml to creation of various index files (
 _fnm,
  _fdt, etc ). So how/what should I setup and start, please help. I will be
  thankful to you.
 
 
 
  
  
    <add>
      <doc>
        <field name="title"><![CDATA[Aman Tandon]]></field>
        <field name="job_role"><![CDATA[Search Engineer]]></field>
      </doc>
    </add>
 
 
  With Regards
  Aman Tandon
 



Re: docValues: Can we apply synonym

2015-05-29 Thread Upayavira
Use copyField to clone the field for faceting purposes.
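
Something like this in the schema (a sketch; the field and type names are
placeholders -- the facet copy is an unanalyzed string with docValues, so no
synonyms ever touch it):

<field name="city" type="text_general" indexed="true" stored="true"/>
<field name="city_facet" type="string" indexed="true" stored="false"
       docValues="true"/>
<copyField source="city" dest="city_facet"/>

Then facet on city_facet while searching against city.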

Upayavira

On Fri, May 29, 2015, at 08:06 PM, Aman Tandon wrote:
 Hi Erick,
 
 Thanks for the suggestion. We are using this query parser plugin (
 *SynonymExpandingExtendedDismaxQParserPlugin*) to manage multi-word
 synonyms. So does it work slower than edismax, and is that why it is not in
 contrib? (I am asking this question because we are using it for all
 our
 searches, to handle 10 multiword synonyms such as ice cube, icecube etc.)
 
 *Moreover, I thought of a solution for this docValues problem:*
 
 I need to make the city field *multivalued*, and by this I mean I will add
 the synonym (*mumbai, bombay*) as an extra value to that field if
 present.
 Now the searching operation will work fine as before.
 
 
  *<field name="city">mumbai</field><field name="city">bombay</field>*
 
 
 The only problem is that we have to remove the 'city alias/synonym facets'
 when
 we are providing results to the clients.
 
 *mumbai, 1000*
 
 
 With Regards
 Aman Tandon
 
 On Fri, May 29, 2015 at 7:26 PM, Erick Erickson erickerick...@gmail.com
 wrote:
 
  Do take time for performance testing with that parser. It can be slow
  depending on your
  data as I remember. That said it solves the problem it set out to
  solve so if it meets
  your SLAs, it can be a life-saver.
 
  Best,
  Erick
 
 
  On Fri, May 29, 2015 at 2:35 AM, Alessandro Benedetti
  benedetti.ale...@gmail.com wrote:
   Even if a little bit outdated, that query parser is really really cool to
   manage synonyms !
   +1 !
  
   2015-05-29 1:01 GMT+01:00 Aman Tandon amantandon...@gmail.com:
  
   Thanks chris.
  
   Yes we are using it for handling multiword synonym problem.
  
   With Regards
   Aman Tandon
  
   On Fri, May 29, 2015 at 12:38 AM, Reitzel, Charles 
   charles.reit...@tiaa-cref.org wrote:
  
Again, I would recommend using Nolan Lawson's
SynonymExpandingExtendedDismaxQParserPlugin.
   
http://nolanlawson.com/2012/10/31/better-synonym-handling-in-solr/
   
-Original Message-
From: Aman Tandon [mailto:amantandon...@gmail.com]
Sent: Wednesday, May 27, 2015 6:42 PM
To: solr-user@lucene.apache.org
Subject: Re: docValues: Can we apply synonym
   
 OK, and what synonym processor are you talking about? Maybe it could
 help?
   
With Regards
Aman Tandon
   
On Thu, May 28, 2015 at 4:01 AM, Reitzel, Charles 
charles.reit...@tiaa-cref.org wrote:
   
 Sorry, my bad.   The synonym processor I mention works differently.
   It's
 an extension of the EDisMax query processor and doesn't require
  field
 level synonym configs.

 -Original Message-
 From: Reitzel, Charles [mailto:charles.reit...@tiaa-cref.org]
 Sent: Wednesday, May 27, 2015 6:12 PM
 To: solr-user@lucene.apache.org
 Subject: RE: docValues: Can we apply synonym

 But the query analysis isn't on a specific field, it is applied to
  the
 query string.

 -Original Message-
 From: Aman Tandon [mailto:amantandon...@gmail.com]
 Sent: Wednesday, May 27, 2015 6:08 PM
 To: solr-user@lucene.apache.org
 Subject: Re: docValues: Can we apply synonym

 Hi Charles,

  The problem here is that docValues works only with primitive data
  types like String, int, etc. So how could we apply synonyms on a
  primitive data type?

 With Regards
 Aman Tandon

 On Thu, May 28, 2015 at 3:19 AM, Reitzel, Charles 
 charles.reit...@tiaa-cref.org wrote:

  Is there any reason you cannot apply the synonyms at query time?
   Applying synonyms at indexing time has problems, e.g. polluting
  the
  term frequency for synonyms added, preventing distance queries,
  ...
 
  Since city names often have multiple terms, e.g. New York, Den
  Hague, etc., I would recommend using Nolan Lawson's
  SynonymExpandingExtendedDismaxQParserPlugin.   Tastes great, less
 filling.
 
 
  http://nolanlawson.com/2012/10/31/better-synonym-handling-in-solr/
 
  We found this to fix synonyms like ny for New York and vice
   versa.
  Haven't tried it with docValues, tho.
 
  -Original Message-
  From: Aman Tandon [mailto:amantandon...@gmail.com]
  Sent: Tuesday, May 26, 2015 11:15 PM
  To: solr-user@lucene.apache.org
  Subject: Re: docValues: Can we apply synonym
 
  Yes it could be :)
 
  Anyway thanks for helping.
 
  With Regards
  Aman Tandon
 
  On Tue, May 26, 2015 at 10:22 PM, Alessandro Benedetti 
  benedetti.ale...@gmail.com wrote:
 
    I should investigate that, as synonyms are usually an analysis-stage
   concern.
   A simple way is to replace the word with all its synonyms (
   including original word), but simply using this kind of
  processor
   will change the token position and offsets, modifying the actual
   content of the
  document .
  
I am from Bombay will become  I am from Bombay Mumbai which
 

Re: [solr 5.1] Looking for full text + collation search field

2015-05-29 Thread TK Solr


On 5/21/15, 5:19 AM, Björn Keil wrote:

Thanks for the advice. I have tried the field type and it seems to do what it 
is supposed to in combination with a lower case filter.

However, that raises another slight problem:

German umlauts are supposed to be treated slightly differently for the purpose of searching than for sorting. For sorting, 
a normal ICUCollationField with standard rules should suffice*; for the purpose of searching I cannot just replace an 
ü with a u -- ü is supposed to equal ue, or, in terms of 
RuleBasedCollators, there is a secondary difference.


I haven't used this personally, but GermanNormalizationFilter seems to do the job:
https://lucene.apache.org/core/5_1_0/analyzers-common/org/apache/lucene/analysis/de/GermanNormalizationFilter.html
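
A sketch of a search-side field type using it (untested; the names are
placeholders). Per its javadoc it folds both "ü" and "ue" to the same form,
which is the equality you want at search time:

<fieldType name="text_de_search" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.GermanNormalizationFilterFactory"/>
  </analyzer>
</fieldType>

Sorting can stay on a separate field backed by the collator, e.g.:

<fieldType name="collated_de" class="solr.ICUCollationField" locale="de"/>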