date:20150417

Hi All,

I have solr synonyms stored in multiple files as defined in the schema:

!ENTITY sinonimi_freeling
sinonimi_freeling/sfaa,sinonimi_freeling/sfab,sinonimi_freeling/sfac,sinonimi_freeling/sfad,sinonimi_freeling/sfae,sinonimi_freeling/sfaf,sinonimi_freeling/sfag,sinonimi_freeling/sfah,sinonimi_freeling/sfai,sinonimi_freeling/sfaj,sinonimi_freeling/sfak

so that I can specify synonym resource in this way:

filter class=solr.SynonymFilterFactory synonyms=sinonimi_freeling;
expand=false ignoreCase=true /

I'm quite worried because I tried to update one synonym file adding at the
end the new synonyms. SolrCloud didn't update its synonyms list.
So I reloaded the core and then I started to have floating results querying
solrcloud.

I had to stop and restart all the tomcat instances to stop this strange
behaviour.

Is there a best practice to update synonyms when you are using
SynonymFilterFactory?

How can I updated the synonym resources, why cannnot I simply upload the
new file into zookeeper?

Best regards,
Vincenzo

Re: SolrCloud 4.8.0 upgrade

2015-04-17 Thread Toke Eskildsen

Vincenzo D'Amore v.dam...@gmail.com wrote:
 I have a SolrCloud cluster with 3 server, I would like to use stats.facet,
 but this feature is available only if I upgrade to 4.10.

 May I simply redeploy new solr cloud version in tomcat or should reload all
 the documents?
 There are other drawbacks?

Support for the Disk-format for DocValues was removed after 4.8, so you should 
check if you use that: DocValuesFormat=Disk for the field in the schema, if I 
remember correctly.

- Toke Eskildsen

Re: spellcheck enabled but not getting any suggestions.

2015-04-17 Thread elisabeth benoit

Shouldn't you specify a spellcheck.dictionary in your request handler?

Best regards,
Elisabeth

2015-04-17 11:24 GMT+02:00 Derek Poh d...@globalsources.com:

 Hi

 I have enabled spellcheck but not getting any suggestions withincorrectly
 spelled keywords.
 I added the spellcheck into the/select request handler.

 What steps did I miss out?

 spellcheck list in return result:
 lst name=spellcheck
 lst name=suggestions/
 /lst


 solrconfig.xml:

 requestHandler name=/select class=solr.SearchHandler
 !-- default values for query parameters can be specified, these
  will be overridden by parameters in the request
   --
  lst name=defaults
str name=echoParamsexplicit/str
int name=rows10/int
str name=dftext/str
!-- Spell checking defaults --
str name=spellcheckon/str
str name=spellcheck.extendedResultsfalse/str
str name=spellcheck.count5/str
str name=spellcheck.alternativeTermCount2/str
str name=spellcheck.maxResultsForSuggest5/str
str name=spellcheck.collatetrue/str
str name=spellcheck.collateExtendedResultstrue/str
str name=spellcheck.maxCollationTries5/str
str name=spellcheck.maxCollations3/str
  /lst

  !-- append spellchecking to our list of components --
  arr name=last-components
 strspellcheck/str
  /arr

 /requestHandler

Re: solr 4.8.0 update synonyms in zookeeper splitted files

On 4/17/2015 6:02 AM, Vincenzo D'Amore wrote:
I have solr synonyms stored in multiple files as defined in the schema:

!ENTITY sinonimi_freeling
sinonimi_freeling/sfaa,sinonimi_freeling/sfab,sinonimi_freeling/sfac,sinonimi_freeling/sfad,sinonimi_freeling/sfae,sinonimi_freeling/sfaf,sinonimi_freeling/sfag,sinonimi_freeling/sfah,sinonimi_freeling/sfai,sinonimi_freeling/sfaj,sinonimi_freeling/sfak

so that I can specify synonym resource in this way:

filter class=solr.SynonymFilterFactory synonyms=sinonimi_freeling;
expand=false ignoreCase=true /

I'm quite worried because I tried to update one synonym file adding at the
end the new synonyms. SolrCloud didn't update its synonyms list.
So I reloaded the core and then I started to have floating results querying
solrcloud.

I had to stop and restart all the tomcat instances to stop this strange
behaviour.

Is there a best practice to update synonyms when you are using
SynonymFilterFactory?

How can I updated the synonym resources, why cannnot I simply upload the
new file into zookeeper?

I've not encountered the !ENTITY syntax or used more than one synonym
file. I'll have to take your word for it that this works.

When you update a config resource, you must reload or restart for it to
take effect. If the resource is used in index analysis, you must
reindex after reloading. Resources used in query analysis will take
effect immediately.

With SolrCloud, you should reload the entire collection (with the
Collections API), not just a core (with the CoreAdmin API).

I don't know what you mean by floating results above.

Thanks,
Shawn

Re: search ignoring accents

Hi Pedro,

solr.ASCIIFoldingFilterFactory is one way to remove diacritics.
Confusion comes from EdgeNGram, why do you need it?

Ahmet



On Friday, April 17, 2015 1:38 PM, Pedro Figueiredo 
pjlfigueir...@criticalsoftware.com wrote:



Hello,
 
What is the best way to search in a field ignoring accents?
 
The field has the type:
fieldType name=text_general_edge_ngram 
class=solr.TextField positionIncrementGap=100
   analyzer type=index
   tokenizer 
class=solr.LowerCaseTokenizerFactory/
   filter 
class=solr.EdgeNGramFilterFactory minGramSize=2 maxGramSize=15/
   /analyzer
   analyzer type=query
   tokenizer 
class=solr.LowerCaseTokenizerFactory/
   filter 
class=solr.EdgeNGramFilterFactory minGramSize=2 maxGramSize=15/
   /analyzer
/fieldType
 
I’ve tried adding the filter:  filter class=solr.ASCIIFoldingFilterFactory/ 
but some strange results happened.. like:
 
Search by “Mourao” and the results were:
Mourão - OK
Monteiro - NOTOK
Morais - NOTOK
 
Thanks in advanced,
 
Pedro Figueiredo
Senior Engineer

pjlfigueir...@criticalsoftware.com
M. 934058150 
  
Rua Engº Frederico Ulrich, nº 2650 4470-605 Moreira da Maia, Portugal
T. +351 229 446 927 | F. +351 229 446 929
www.criticalsoftware.com

PORTUGAL | UK | GERMANY | USA | BRAZIL | MOZAMBIQUE | ANGOLA
A CMMI® LEVEL 5 RATED COMPANY CMMI® is registered in the USPTO by CMU

RE: search ignoring accents

2015-04-17 Thread Pedro Figueiredo

Hi Ahmet,

Yes... the EdgeNGram is what produces those results...
I need it to improve the search by name by the applications users.

Thanks.

Pedro Figueiredo
Senior Engineer

pjlfigueir...@criticalsoftware.com
M. 934058150
 

Rua Engº Frederico Ulrich, nº 2650 4470-605 Moreira da Maia, Portugal
T. +351 229 446 927 | F. +351 229 446 929
www.criticalsoftware.com

PORTUGAL | UK | GERMANY | USA | BRAZIL | MOZAMBIQUE | ANGOLA
A CMMI® LEVEL 5 RATED COMPANY CMMI® is registered in the USPTO by CMU
 


-Original Message-
From: Ahmet Arslan [mailto:iori...@yahoo.com.INVALID] 
Sent: 17 April 2015 12:01
To: solr-user@lucene.apache.org
Subject: Re: search ignoring accents

Hi Pedro,

solr.ASCIIFoldingFilterFactory is one way to remove diacritics.
Confusion comes from EdgeNGram, why do you need it?

Ahmet



On Friday, April 17, 2015 1:38 PM, Pedro Figueiredo 
pjlfigueir...@criticalsoftware.com wrote:



Hello,
 
What is the best way to search in a field ignoring accents?
 
The field has the type:
fieldType name=text_general_edge_ngram 
class=solr.TextField positionIncrementGap=100
   analyzer type=index
   tokenizer 
class=solr.LowerCaseTokenizerFactory/
   filter 
class=solr.EdgeNGramFilterFactory minGramSize=2 maxGramSize=15/
   /analyzer
   analyzer type=query
   tokenizer 
class=solr.LowerCaseTokenizerFactory/
   filter 
class=solr.EdgeNGramFilterFactory minGramSize=2 maxGramSize=15/
   /analyzer
/fieldType
 
I’ve tried adding the filter:  filter class=solr.ASCIIFoldingFilterFactory/
but some strange results happened.. like:
 
Search by “Mourao” and the results were:
Mourão - OK
Monteiro - NOTOK
Morais - NOTOK
 
Thanks in advanced,
 
Pedro Figueiredo
Senior Engineer

pjlfigueir...@criticalsoftware.com
M. 934058150 
  
Rua Engº Frederico Ulrich, nº 2650 4470-605 Moreira da Maia, Portugal T. +351 
229 446 927 | F. +351 229 446 929 www.criticalsoftware.com

PORTUGAL | UK | GERMANY | USA | BRAZIL | MOZAMBIQUE | ANGOLA A CMMI® LEVEL 5 
RATED COMPANY CMMI® is registered in the USPTO by CMU

Re: Solr 5.x deployment in production

On 4/16/2015 2:07 PM, Steven White wrote:
 In my case, I have to deploy Solr on Windows, AIX, and Linux (all server
 edition).  We are a WebSphere shop, moving away from it means I have to
 deal with politics and culture.

You *can* run Solr 5.0 (and 5.1) in another container, just like you
could with all previous Solr versions.  There are additional steps that
have to be taken, such as correctly installing the logging jars and the
logging config, but if you've used Solr 4.3 or later, you already know this:

http://wiki.apache.org/solr/SolrLogging

Eventually, hopefully before we reach the 6.0 release, that kind of
deployment won't be possible, because Solr will be a true application
(like Jetty itself), not a webapp contained in a .war file.  It may take
us quite a while to reach that point.  If you are already using the
scripts that come with Solr 5.x, you will have a seamless transition to
the new implementation.

The docs for 5.0 say that we aren't supporting deployment in a
third-party servlet container, even though that still is possible. 
There are several reasons for this:

* Eventually it won't be possible, because Solr's implementation will
change.

* We now have scripts that will start Solr in a consistent manner.
** This means that our instructions won't have to change for a new
implementation.

* There are a LOT of containers available.
** Each one requires different instructions.
** Are problems caused by the container, or Solr?  We may not know.

* Jetty is the only container that gets tested.
** Bugs with other containers have happened.
** User feedback is usually the only way such bugs can be found.

Thanks,
Shawn

Re: Range facets in sharded search

2015-04-17 Thread Will Miller

Thanks for the fast turnaround, you beat me to opening the Jira and fixed it 
too! Much appreciated.

Thanks,
Will

From: Tomás Fernández Löbbe tomasflo...@gmail.com
Sent: Thursday, April 16, 2015 10:26 PM
To: solr-user@lucene.apache.org
Subject: Re: Range facets in sharded search

Should be fixed in 5.2. See https://issues.apache.org/jira/browse/SOLR-7412

On Thu, Apr 16, 2015 at 3:18 PM, Tomás Fernández Löbbe 
tomasflo...@gmail.com wrote:

 This looks like a bug. The logic to merge range facets from shards seems
 to only be merging counts, not the first level elements.
 Could you create a Jira?

 On Thu, Apr 16, 2015 at 2:38 PM, Will Miller wmil...@fbbrands.com wrote:

 I am seeing some some odd behavior with range facets across multiple
 shards. When querying each node directly with distrib=false the facet
 returned matches what is expected. When doing the same query against the
 collection and it spans the two shards, the facet after and between buckets
 are wrong.

 I can re-create a similar problem using the out of the box example
 scripts and data. I am running on Windows and tested both Solr 5.0.0 and
 5.1.0. This is the steps to reproduce:

 c:\solr-5.1.0\solr -e cloud

 These are the selections I made:

 (specify 1-4 nodes) [2]: 2
 Please enter the port for node1 [8983]: 8983
 Please enter the port for node2 [7574]: 7574
 Please provide a name for your new collection: [gettingstarted]
 gettingstarted
 How many shards would you like to split gettingstarted into? [2] 2
 How many replicas per shard would you like to create? [2] 1
 Please choose a configuration ...  [data_driven_schema_configs]
 sample_techproducts_configs

 I then posted some of the sample XMLs:

 C:\solr-5.1.0\example\exampledocs java -Dc=gettingstarted -jar post.jar
 vidcard.xml, hd.xml, ipod_other.xml, ipod_video.xml, mem.xml, monitor.xml,
 monitor2.xml,mp500.xml, sd500.xml

 This first query is against node1 with distrib=false:

 http://localhost:8983/solr/gettingstarted/select/?q=*:*wt=jsonindent=truedistrib=falsefacet=truefacet.range=pricef.price.facet.range.start=0.00f.price.facet.range.end=100.00f.price.facet.range.gap=20f.price.facet.range.other=alldefType=edismaxq.op=AND

 There are 7 Results (results ommited).
 facet_ranges:{
   price:{
 counts:[
   0.0,1,
   20.0,0,
   40.0,0,
   60.0,0,
   80.0,1],
 gap:20.0,
 start:0.0,
 end:100.0,
 before:0,
 after:5,
 between:2}},

 This second query is against node2 with distrib=false:

 http://localhost:7574/solr/gettingstarted/select/?q=*:*wt=jsonindent=truedistrib=falsefacet=truefacet.range=pricef.price.facet.range.start=0.00f.price.facet.range.end=100.00f.price.facet.range.gap=20f.price.facet.range.other=alldefType=edismaxq.op=AND

 7 Results (one product does not have a price):
 facet_ranges:{
   price:{
 counts:[
   0.0,1,
   20.0,0,
   40.0,0,
   60.0,1,
   80.0,0],
 gap:20.0,
 start:0.0,
 end:100.0,
 before:0,
 after:4,
 between:2}},

 Finally querying the entire collection:

 http://localhost:7574/solr/gettingstarted/select/?q=*:*wt=jsonindent=truefacet=truefacet.range=pricef.price.facet.range.start=0.00f.price.facet.range.end=100.00f.price.facet.range.gap=20f.price.facet.range.other=alldefType=edismaxq.op=AND

 14 results (one without a price range):
 facet_ranges:{
   price:{
 counts:[
   0.0,2,
   20.0,0,
   40.0,0,
   60.0,1,
   80.0,1],
 gap:20.0,
 start:0.0,
 end:100.0,
 before:0,
 after:5,
 between:2}},

 Notice that both the after and the between are wrong here. The actual
 buckets do correctly represent the right values but I would expect
 between to be 5 and after to be 13.

 There appears to be a recently fixed issue (
 https://issues.apache.org/jira/browse/SOLR-6154) with range facet in
 distributed queries but it was related to buckets not always appearing with
 mincount=1 for the field. This looks like it is a different problem.

 Anyone have any suggestions or notice anythign wrong with my query
 parameters? I can open a Jira ticket but wanted to run it by the larger
 audience first to see if I am missing anything obvious.

 Thanks,

 Will

Re: 1:M connectivity

On 4/16/2015 2:27 PM, Oded Sofer wrote:
 The issue is the firewall setting needed for the cloud. We do not want to 
 open all nodes to all others nodes. However, we found that add-index to a 
 specific node tries to access all other nodes though we set it to index 
 locally on that node only. 

That is basic SolrCloud operation.

If the nodes cannot communicate with each other, they cannot keep
replicas in sync, they will not know when another node goes down
(required to keep the clusterstate current), multi-shard distributed
search will not work, Solr cannot load balance queries across the cloud,
collection creation will not work, and so on.  There are probably
several other fundamental SolrCloud operations that require inter-node
communication.

If you don't want the nodes to talk to each other, you probably need to
stop using SolrCloud, plus give up distributed search and replication
entirely.

Thanks,
Shawn

Re: Nno servers hosting shard.

2015-04-17 Thread Ugo Matrangolo

Hi,

sounds like you hit a Full GC. Check your GC.log.

Ugo
On 17 Apr 2015 08:24, Modassar Ather modather1...@gmail.com wrote:

 Hi,

 Any suggestion will be really helpful. Kindly provide your inputs.

 Thanks,
 Modassar

 On Thu, Apr 16, 2015 at 4:27 PM, Modassar Ather modather1...@gmail.com
 wrote:

  Hi,
 
  I have a setup of 5 node SolrCloud (Lucene/Solr version 5.1.0) without
  replicas. When I am executing complex and large queries with wild-cards
  after some time I am getting following exceptions.
  The index size on each of the node is around 170GB and the memory is set
  to -Xms20g -Xmx24g on each node.
 
  Empty shard!
  org.apache.solr.common.SolrException log
  SEVERE: org.apache.solr.common.SolrException: no servers hosting shard:
  at
 
 org.apache.solr.handler.component.HttpShardHandler$1.call(HttpShardHandler.java:214)
  at
 
 org.apache.solr.handler.component.HttpShardHandler$1.call(HttpShardHandler.java:184)
  at java.util.concurrent.FutureTask.run(FutureTask.java:262)
  at
  java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
  at java.util.concurrent.FutureTask.run(FutureTask.java:262)
  at
 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
  at
 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
  at java.lang.Thread.run(Thread.java:745)
 
  There is no OutofMemory or any other major lead for me to understand what
  had caused it. May be I am missing something. There are following other
  exceptions:
 
  SEVERE: null:org.apache.solr.common.SolrException:
  org.apache.solr.client.solrj.SolrServerException: Timeout occurred while
  waiting response from server at: http://server:8080/solr/collection
  at
 
 org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:342)
  at
 
 org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:143)
  at org.apache.solr.core.SolrCore.execute(SolrCore.java:1984)
  at
 
 org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:829)
  at
 
 org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:446)
  at
 
 org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:220)
  at
 
 org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:241)
  at
 
 org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)
  at
 
 org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:220)
  at
 
 org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:122)
  at
 
 org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:170)
  at
 
 org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:98)
  at
  org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:950)
  at
 
 org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:116)
  at
 
 org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:408)
  at
  org.apache.coyote.ajp.AjpProcessor.process(AjpProcessor.java:193)
  at
 
 org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:607)
  at
 
 org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:313)
  at
 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
  at
 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
  at java.lang.Thread.run(Thread.java:745)
 
 
  WARNING: listener throws error
  org.apache.solr.common.SolrException:
  org.apache.zookeeper.KeeperException$SessionExpiredException:
  KeeperErrorCode = Session expired for /configs/collection/params.json
  at
 
 org.apache.solr.core.RequestParams.getFreshRequestParams(RequestParams.java:163)
  at
  org.apache.solr.core.SolrConfig.refreshRequestParams(SolrConfig.java:919)
  at org.apache.solr.core.SolrCore$11.run(SolrCore.java:2500)
  at org.apache.solr.cloud.ZkController$4.run(ZkController.java:2366)
  Caused by: org.apache.zookeeper.KeeperException$SessionExpiredException:
  KeeperErrorCode = Session expired for /configs/collection/params.json
  at
  org.apache.zookeeper.KeeperException.create(KeeperException.java:127)
  at
 org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
  at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1045)
  at
 
 org.apache.solr.common.cloud.SolrZkClient$4.execute(SolrZkClient.java:294)
  at
 
 org.apache.solr.common.cloud.SolrZkClient$4.execute(SolrZkClient.java:291)
  at
 
 org.apache.solr.common.cloud.ZkCmdExecutor.retryOperation(ZkCmdExecutor.java:61)
  at

Re: SolrCloud Core Reload

Hi,

this morning I have optimised my SolrCloud cluster (3 instances).
I have many collections, all are in shard and replica for each node.
At the end of optimisation task (about 10 minutes) all cores are optimised
on every node.

How can be sure than also reload affects all the cores?


On Fri, Apr 17, 2015 at 9:31 AM, Anshum Gupta ans...@anshumgupta.net
wrote:

 I don't think there is any Collection level support at this point in the
 Solr admin UI. Whatever you do via the UI would be core level, unless I'm
 forgetting something.

 On Thu, Apr 16, 2015 at 5:15 PM, Vincenzo D'Amore v.dam...@gmail.com
 wrote:

  Hi all,
 
  I have a solrcloud cluster with 3 server and there are many cores.
  Using the SolrCloud UI Admin Core, if I execute core optimize (or
  reload), all the core in the cluster will be optimized or reloaded? or
  only the selected core?.
 
  Best regards,
  Vincenzo
 



 --
 Anshum Gupta




-- 
Vincenzo D'Amore
email: v.dam...@gmail.com
skype: free.dev
mobile: +39 349 8513251

Re: SolrCloud Core Reload

On 4/17/2015 7:21 AM, Vincenzo D'Amore wrote:
 this morning I have optimised my SolrCloud cluster (3 instances).
 I have many collections, all are in shard and replica for each node.
 At the end of optimisation task (about 10 minutes) all cores are optimised
 on every node.
 
 How can be sure than also reload affects all the cores?


The optimize command is sent at the core level, to a specific machine,
but sets in motion an optimize of the entire collection, one core at a
time.  The optimize update command ignores distrib=false -- it always
optimizes the entire collection.

If you send a RELOAD action to a core in a collection, it will only
affect that core.  There is a separate RELOAD action on the Collections
API which will reload every core in the collection on all servers.

Perhaps we should change how optimize works, and provide an OPTIMIZE
action on the Collections API, so it works much the same as RELOAD.  I
remember seeing an issue in Jira about adding distrib=false support to
optimize, but now I can't find it.  Changing optimize to work like
RELOAD would fix that issue.

Thanks,
Shawn

RE: search ignoring accents

2015-04-17 Thread Pedro Figueiredo

And for this example what filter should I use?

Filter by edr should give the result Pedro
The NGram create tokens starting at the beginning or the ending, and in the 
middle?

Thanks!

Pedro Figueiredo
Senior Engineer

pjlfigueir...@criticalsoftware.com
M. 934058150
 

Rua Engº Frederico Ulrich, nº 2650 4470-605 Moreira da Maia, Portugal
T. +351 229 446 927 | F. +351 229 446 929
www.criticalsoftware.com

PORTUGAL | UK | GERMANY | USA | BRAZIL | MOZAMBIQUE | ANGOLA
A CMMI® LEVEL 5 RATED COMPANY CMMI® is registered in the USPTO by CMU
 


-Original Message-
From: Pedro Figueiredo [mailto:pjlfigueir...@criticalsoftware.com] 
Sent: 17 April 2015 12:22
To: solr-user@lucene.apache.org; 'Ahmet Arslan'
Subject: RE: search ignoring accents

Hi Ahmet,

Yes... the EdgeNGram is what produces those results...
I need it to improve the search by name by the applications users.

Thanks.

Pedro Figueiredo
Senior Engineer

pjlfigueir...@criticalsoftware.com
M. 934058150
 

Rua Engº Frederico Ulrich, nº 2650 4470-605 Moreira da Maia, Portugal T. +351 
229 446 927 | F. +351 229 446 929 www.criticalsoftware.com

PORTUGAL | UK | GERMANY | USA | BRAZIL | MOZAMBIQUE | ANGOLA A CMMI® LEVEL 5 
RATED COMPANY CMMI® is registered in the USPTO by CMU
 


-Original Message-
From: Ahmet Arslan [mailto:iori...@yahoo.com.INVALID]
Sent: 17 April 2015 12:01
To: solr-user@lucene.apache.org
Subject: Re: search ignoring accents

Hi Pedro,

solr.ASCIIFoldingFilterFactory is one way to remove diacritics.
Confusion comes from EdgeNGram, why do you need it?

Ahmet



On Friday, April 17, 2015 1:38 PM, Pedro Figueiredo 
pjlfigueir...@criticalsoftware.com wrote:



Hello,
 
What is the best way to search in a field ignoring accents?
 
The field has the type:
fieldType name=text_general_edge_ngram 
class=solr.TextField positionIncrementGap=100
   analyzer type=index
   tokenizer 
class=solr.LowerCaseTokenizerFactory/
   filter 
class=solr.EdgeNGramFilterFactory minGramSize=2 maxGramSize=15/
   /analyzer
   analyzer type=query
   tokenizer 
class=solr.LowerCaseTokenizerFactory/
   filter 
class=solr.EdgeNGramFilterFactory minGramSize=2 maxGramSize=15/
   /analyzer
/fieldType
 
I’ve tried adding the filter:  filter class=solr.ASCIIFoldingFilterFactory/
but some strange results happened.. like:
 
Search by “Mourao” and the results were:
Mourão - OK
Monteiro - NOTOK
Morais - NOTOK
 
Thanks in advanced,
 
Pedro Figueiredo
Senior Engineer

pjlfigueir...@criticalsoftware.com
M. 934058150 
  
Rua Engº Frederico Ulrich, nº 2650 4470-605 Moreira da Maia, Portugal T. +351 
229 446 927 | F. +351 229 446 929 www.criticalsoftware.com

PORTUGAL | UK | GERMANY | USA | BRAZIL | MOZAMBIQUE | ANGOLA A CMMI® LEVEL 5 
RATED COMPANY CMMI® is registered in the USPTO by CMU

Re: facets on external field

Hi Jainam,

One workaround is to use facet.query and frange query parser.


facet.query={!frange l=50 u=100}field(price)

Ahmet


On Thursday, April 16, 2015 1:01 PM, jainam vora jainam.v...@gmail.com wrote:
Hi,

I am using external field for price field since it changes frequently.
generate facets using external field? how?

I understand that faceting requires indexing and external fields fields are
not actually indexed.

Is there any solution for this problem?


-- 
Thanks  Regards,
Jainam Vora

Re: SolrCloud 4.8.0 upgrade

Solr/Lucene are supposed to _always_ read one major version back. Thus
your 4.10 should be able to read indexes produced all the way back to
(and including) 3.x. Sometimes experimental formats are excepted.

In your case you should be fine since you're upgrading from 4.8..

As always, though, I'd recommend copying your indexes someplace just
to be paranoid before upgrading.

Best,
Erick

On Fri, Apr 17, 2015 at 10:28 AM, Vincenzo D'Amore v.dam...@gmail.com wrote:
 Thanks for your answers, I looked at changes and we don't use
 DocValuesFormat.
 The question is, if I upgrade the SolrCloud version to 4.10, should I
 reload entirely all documents?
 Is there a binary compatibility between these two versions reading the
 solar home?

 On Fri, Apr 17, 2015 at 7:04 PM, Erick Erickson erickerick...@gmail.com
 wrote:

 Look at CHANGES.txt for both Lucene and Solr, there's always an
 upgrading section for each release.

 Best,
 Erick

 On Fri, Apr 17, 2015 at 5:31 AM, Toke Eskildsen t...@statsbiblioteket.dk
 wrote:
  Vincenzo D'Amore v.dam...@gmail.com wrote:
  I have a SolrCloud cluster with 3 server, I would like to use
 stats.facet,
  but this feature is available only if I upgrade to 4.10.
 
  May I simply redeploy new solr cloud version in tomcat or should reload
 all
  the documents?
  There are other drawbacks?
 
  Support for the Disk-format for DocValues was removed after 4.8, so you
 should check if you use that: DocValuesFormat=Disk for the field in the
 schema, if I remember correctly.
 
  - Toke Eskildsen




 --
 Vincenzo D'Amore
 email: v.dam...@gmail.com
 skype: free.dev
 mobile: +39 349 8513251

Re: JSON Facet Analytics API in Solr 5.1

2015-04-17 Thread Mike Murphy

I like the first way.  It matches how elasticsearch does it
http://www.elastic.co/guide/en/elasticsearch/reference/1.x/search-aggregations-bucket-range-aggregation.html

Can we specify explicit ranges in Solr now like we can in elasticsearch?

I do like how Solr's version of aggs can be much shorter though!

elasticsearch :

{
aggs : {
min_price : { min : { field : price } }
}
}

solr :

{
facet : { min_price : min(price)  }
}

Great work!

On Fri, Apr 17, 2015 at 12:20 PM, Yonik Seeley ysee...@gmail.com wrote:
 Does anyone have any thoughts on the current general structure of JSON facets?
 The current general form of a facet command is:

 facet_name : { facet_type : facet_args }

 For example:

 top_authors : { terms : {
   field : author,
   limit : 5,
 }}

 One alternative I considered in the past is having the type in the args:

 top_authors : {
   type : terms,
   field : author,
   limit : 5
 }

 It's a flatter structure... probably better in some ways, but worse in
 other ways.
 Thoughts / preferences?

 -Yonik


 On Tue, Apr 14, 2015 at 4:30 PM, Yonik Seeley ysee...@gmail.com wrote:
 Folks, there's a new JSON Facet API in the just released Solr 5.1
 (actually, a new facet module under the covers too).

 It's marked as experimental so we have time to change the API based on
 your feedback.  So let us know what you like, what you would change,
 what's missing, or any other ideas you may have!

 I've just started the documentation for the reference guide (on our
 confluence wiki), so for now the best doc is on my blog:

 http://yonik.com/json-facet-api/
 http://yonik.com/solr-facet-functions/
 http://yonik.com/solr-subfacets/

 I'll also be hanging out more on the #solr-dev IRC channel on freenode
 if you want to hit me up there about any development ideas.

 -Yonik


--Mike

Re: JSON Facet Analytics API in Solr 5.1

2015-04-17 Thread Jean-Sebastien Vachon

I prefer the second way. I find it more readable and shorter.

Thanks for making Solr even better ;)

From: Yonik Seeley ysee...@gmail.com
Sent: Friday, April 17, 2015 12:20 PM
To: solr-user@lucene.apache.org
Subject: Re: JSON Facet  Analytics API in Solr 5.1

Does anyone have any thoughts on the current general structure of JSON facets?
The current general form of a facet command is:

facet_name : { facet_type : facet_args }

For example:

top_authors : { terms : {
  field : author,
  limit : 5,
}}

One alternative I considered in the past is having the type in the args:

top_authors : {
  type : terms,
  field : author,
  limit : 5
}

It's a flatter structure... probably better in some ways, but worse in
other ways.
Thoughts / preferences?

-Yonik

On Tue, Apr 14, 2015 at 4:30 PM, Yonik Seeley ysee...@gmail.com wrote:
 Folks, there's a new JSON Facet API in the just released Solr 5.1
 (actually, a new facet module under the covers too).

 It's marked as experimental so we have time to change the API based on
 your feedback.  So let us know what you like, what you would change,
 what's missing, or any other ideas you may have!

 I've just started the documentation for the reference guide (on our
 confluence wiki), so for now the best doc is on my blog:

 http://yonik.com/json-facet-api/
 http://yonik.com/solr-facet-functions/
 http://yonik.com/solr-subfacets/

 I'll also be hanging out more on the #solr-dev IRC channel on freenode
 if you want to hit me up there about any development ideas.

 -Yonik

Re: JSON Facet Analytics API in Solr 5.1

Personally I find the second form easier to read. The second level of
nesting in the first example confuses me at first glance.

I don't have a really strong preference here, but I vote for the second form.



On Fri, Apr 17, 2015 at 9:20 AM, Yonik Seeley ysee...@gmail.com wrote:
 Does anyone have any thoughts on the current general structure of JSON facets?
 The current general form of a facet command is:

 facet_name : { facet_type : facet_args }

 For example:

 top_authors : { terms : {
   field : author,
   limit : 5,
 }}

 One alternative I considered in the past is having the type in the args:

 top_authors : {
   type : terms,
   field : author,
   limit : 5
 }

 It's a flatter structure... probably better in some ways, but worse in
 other ways.
 Thoughts / preferences?

 -Yonik


 On Tue, Apr 14, 2015 at 4:30 PM, Yonik Seeley ysee...@gmail.com wrote:
 Folks, there's a new JSON Facet API in the just released Solr 5.1
 (actually, a new facet module under the covers too).

 It's marked as experimental so we have time to change the API based on
 your feedback.  So let us know what you like, what you would change,
 what's missing, or any other ideas you may have!

 I've just started the documentation for the reference guide (on our
 confluence wiki), so for now the best doc is on my blog:

 http://yonik.com/json-facet-api/
 http://yonik.com/solr-facet-functions/
 http://yonik.com/solr-subfacets/

 I'll also be hanging out more on the #solr-dev IRC channel on freenode
 if you want to hit me up there about any development ideas.

 -Yonik

Re: Solr 5.0, defaultSearchField, defaultOperator ?

Hi,

df and q.op are the ones you are looking for.
You can define them in defaults section.

Ahmet



On Friday, April 17, 2015 9:18 PM, Bruno Mannina bmann...@free.fr wrote:
Dear Solr users,

Since today I used SOLR 5.0 (I used solr 3.6) so i try to adapt my old
schema for solr 5.0.

I have two questions:
- how can I set the defaultSearchField ?
I don't want to use in the query the df tag  because I have a lot of
modification to do for that on my web project.

- how can I set the defaultOperator (and|or) ?

It seems that these options are now deprecated in SOLR 5.0 schema.

Thanks a lot for your comment,

Regards,
Bruno

---
Ce courrier électronique ne contient aucun virus ou logiciel malveillant parce 
que la protection avast! Antivirus est active.
http://www.avast.com

RE: Spurious _version_ conflict?

2015-04-17 Thread Reitzel, Charles

Thanks for getting back.   Something like that crossed my mind but I checked 
the values on the way into SolrJ SolrInputDocument match the values printed in 
the Admin Query interface and they both match the expected value in the error 
message exactly.

Besides the difference is only in the last few bits ...

Error executing update: version conflict for 553d0f5d320c4321b13f4312ff907218 
expected=1498643112821522400 actual=1498643112821522432

Note, all my _version_ values have zeroes in the last two digits.  But, again, 
there is agreement between the Admin UI and every stage of my client (from 
query in my REST service, to REST client in browser, back to update in my REST 
service).

-Original Message-
From: Chris Hostetter [mailto:hossman_luc...@fucit.org] 
Sent: Thursday, April 16, 2015 5:04 PM
To: solr-user@lucene.apache.org
Subject: Re: Spurious _version_ conflict?


: I notice that the expected value in the error message matches both what
: I pass in and the index contents.  But the actual value in the error
: message is different only in the last (low order) two digits.  
: Consistently.

what does your client code look like?  Are you sure you aren't being bit by a 
JSON parsing library that can't handle long values and winds up truncating them?

https://issues.apache.org/jira/browse/SOLR-6364



-Hoss
http://www.lucidworks.com/

*
This e-mail may contain confidential or privileged information.
If you are not the intended recipient, please notify the sender immediately and 
then delete it.

TIAA-CREF
*

RE: Spurious _version_ conflict?

2015-04-17 Thread Reitzel, Charles

Here's another data point.  To work around this issue, I am converting all 
non-null _version_ values to the constant 1 on the way into Solr.  As a result, 
updates work fine.

Immediately after the update+commit, a /select?q=*:* returns the _version_ 
value of 1498715798795976700 for id == '553d0f5d320c4321b13f4312ff907218'.

Looking in solr.log, however, the LogUpdateProcessor displays the following:
DEBUG - 2015-04-17 16:06:04.918; 
org.apache.solr.update.processor.LogUpdateProcessor; PRE_UPDATE FINISH 
{versions=truewt=javabinversion=2}
INFO  - 2015-04-17 16:06:04.918; org.apache.solr.update.LoggingInfoStream; 
[DW][commitScheduler-12-thread-1]: commitScheduler-12-thread-1 finishFullFlush 
success=true

INFO  - 2015-04-17 16:06:04.918; 
org.apache.solr.update.processor.LogUpdateProcessor; [bb] webapp=/solr 
path=/update params={versions=truewt=javabinversion=2} 
{add=[553d0f5d320c4321b13f4312ff907218 (1498715798795976704), ... ]} 0 15

Note: 1498715798795976700 is returned from the update to SolrJ with 
versions=true.

I.e. the last two digits disagree, with the client showing only zeroes.   So, 
yes, it appears some truncation is taking place.   But it looks to be upstream 
from my client code (which is seeing the same thing as the Admin UI).

I am running 4.10.3 on 64-bit Windows desktop.   Java is jdk1.7.0_67, 64-bit.

-Original Message-
From: Reitzel, Charles [mailto:charles.reit...@tiaa-cref.org] 
Sent: Friday, April 17, 2015 11:37 AM
To: solr-user@lucene.apache.org
Subject: RE: Spurious _version_ conflict?

Thanks for getting back.   Something like that crossed my mind but I checked 
the values on the way into SolrJ SolrInputDocument match the values printed in 
the Admin Query interface and they both match the expected value in the error 
message exactly.

Besides the difference is only in the last few bits ...

Error executing update: version conflict for 553d0f5d320c4321b13f4312ff907218 
expected=1498643112821522400 actual=1498643112821522432

Note, all my _version_ values have zeroes in the last two digits.  But, again, 
there is agreement between the Admin UI and every stage of my client (from 
query in my REST service, to REST client in browser, back to update in my REST 
service).

-Original Message-
From: Chris Hostetter [mailto:hossman_luc...@fucit.org] 
Sent: Thursday, April 16, 2015 5:04 PM
To: solr-user@lucene.apache.org
Subject: Re: Spurious _version_ conflict?


: I notice that the expected value in the error message matches both what
: I pass in and the index contents.  But the actual value in the error
: message is different only in the last (low order) two digits.  
: Consistently.

what does your client code look like?  Are you sure you aren't being bit by a 
JSON parsing library that can't handle long values and winds up truncating them?

https://issues.apache.org/jira/browse/SOLR-6364



-Hoss
http://www.lucidworks.com/

*
This e-mail may contain confidential or privileged information.
If you are not the intended recipient, please notify the sender immediately and 
then delete it.

TIAA-CREF
*


*
This e-mail may contain confidential or privileged information.
If you are not the intended recipient, please notify the sender immediately and 
then delete it.

TIAA-CREF
*

Re: search ignoring accents

Pedro:

For your example, don't use EdgeNgrams, use just NGrams. That'll index
tokens like
(in the 2gram case) pe er dr ro and searching against edr would look
for ed dr. which would match.

However, this isn't in line with your first example where you got
results you didn't expect. You'll have to
be careful to search for these pairwise tokens as _phrases_ to prevent
false matches.

Best,
Erick

On Fri, Apr 17, 2015 at 4:50 AM, Pedro Figueiredo
pjlfigueir...@criticalsoftware.com wrote:
 And for this example what filter should I use?

 Filter by edr should give the result Pedro
 The NGram create tokens starting at the beginning or the ending, and in the 
 middle?

 Thanks!

 Pedro Figueiredo
 Senior Engineer

 pjlfigueir...@criticalsoftware.com
 M. 934058150


 Rua Engº Frederico Ulrich, nº 2650 4470-605 Moreira da Maia, Portugal
 T. +351 229 446 927 | F. +351 229 446 929
 www.criticalsoftware.com

 PORTUGAL | UK | GERMANY | USA | BRAZIL | MOZAMBIQUE | ANGOLA
 A CMMI® LEVEL 5 RATED COMPANY CMMI® is registered in the USPTO by CMU



 -Original Message-
 From: Pedro Figueiredo [mailto:pjlfigueir...@criticalsoftware.com]
 Sent: 17 April 2015 12:22
 To: solr-user@lucene.apache.org; 'Ahmet Arslan'
 Subject: RE: search ignoring accents

 Hi Ahmet,

 Yes... the EdgeNGram is what produces those results...
 I need it to improve the search by name by the applications users.

 Thanks.

 Pedro Figueiredo
 Senior Engineer

 pjlfigueir...@criticalsoftware.com
 M. 934058150


 Rua Engº Frederico Ulrich, nº 2650 4470-605 Moreira da Maia, Portugal T. +351 
 229 446 927 | F. +351 229 446 929 www.criticalsoftware.com

 PORTUGAL | UK | GERMANY | USA | BRAZIL | MOZAMBIQUE | ANGOLA A CMMI® LEVEL 5 
 RATED COMPANY CMMI® is registered in the USPTO by CMU



 -Original Message-
 From: Ahmet Arslan [mailto:iori...@yahoo.com.INVALID]
 Sent: 17 April 2015 12:01
 To: solr-user@lucene.apache.org
 Subject: Re: search ignoring accents

 Hi Pedro,

 solr.ASCIIFoldingFilterFactory is one way to remove diacritics.
 Confusion comes from EdgeNGram, why do you need it?

 Ahmet



 On Friday, April 17, 2015 1:38 PM, Pedro Figueiredo 
 pjlfigueir...@criticalsoftware.com wrote:



 Hello,

 What is the best way to search in a field ignoring accents?

 The field has the type:
 fieldType name=text_general_edge_ngram 
 class=solr.TextField positionIncrementGap=100
analyzer type=index
tokenizer 
 class=solr.LowerCaseTokenizerFactory/
filter 
 class=solr.EdgeNGramFilterFactory minGramSize=2 maxGramSize=15/
/analyzer
analyzer type=query
tokenizer 
 class=solr.LowerCaseTokenizerFactory/
filter 
 class=solr.EdgeNGramFilterFactory minGramSize=2 maxGramSize=15/
/analyzer
 /fieldType

 I’ve tried adding the filter:  filter 
 class=solr.ASCIIFoldingFilterFactory/
 but some strange results happened.. like:

 Search by “Mourao” and the results were:
 Mourão - OK
 Monteiro - NOTOK
 Morais - NOTOK

 Thanks in advanced,

 Pedro Figueiredo
 Senior Engineer

 pjlfigueir...@criticalsoftware.com
 M. 934058150

 Rua Engº Frederico Ulrich, nº 2650 4470-605 Moreira da Maia, Portugal T. +351 
 229 446 927 | F. +351 229 446 929 www.criticalsoftware.com

 PORTUGAL | UK | GERMANY | USA | BRAZIL | MOZAMBIQUE | ANGOLA A CMMI® LEVEL 5 
 RATED COMPANY CMMI® is registered in the USPTO by CMU

Re: SolrCloud 4.8.0 upgrade

Thanks for your answers, I looked at changes and we don't use
DocValuesFormat.
The question is, if I upgrade the SolrCloud version to 4.10, should I
reload entirely all documents?
Is there a binary compatibility between these two versions reading the
solar home?

On Fri, Apr 17, 2015 at 7:04 PM, Erick Erickson erickerick...@gmail.com
wrote:

 Look at CHANGES.txt for both Lucene and Solr, there's always an
 upgrading section for each release.

 Best,
 Erick

 On Fri, Apr 17, 2015 at 5:31 AM, Toke Eskildsen t...@statsbiblioteket.dk
 wrote:
  Vincenzo D'Amore v.dam...@gmail.com wrote:
  I have a SolrCloud cluster with 3 server, I would like to use
 stats.facet,
  but this feature is available only if I upgrade to 4.10.
 
  May I simply redeploy new solr cloud version in tomcat or should reload
 all
  the documents?
  There are other drawbacks?
 
  Support for the Disk-format for DocValues was removed after 4.8, so you
 should check if you use that: DocValuesFormat=Disk for the field in the
 schema, if I remember correctly.
 
  - Toke Eskildsen




-- 
Vincenzo D'Amore
email: v.dam...@gmail.com
skype: free.dev
mobile: +39 349 8513251

Solr 5.0, defaultSearchField, defaultOperator ?

2015-04-17 Thread Bruno Mannina


Dear Solr users,

Since today I used SOLR 5.0 (I used solr 3.6) so i try to adapt my old
schema for solr 5.0.

I have two questions:
- how can I set the defaultSearchField ?
I don't want to use in the query the df tag  because I have a lot of
modification to do for that on my web project.

- how can I set the defaultOperator (and|or) ?

It seems that these options are now deprecated in SOLR 5.0 schema.

Thanks a lot for your comment,

Regards,
Bruno

---
Ce courrier électronique ne contient aucun virus ou logiciel malveillant parce 
que la protection avast! Antivirus est active.
http://www.avast.com

RE: Spurious _version_ conflict?


you still haven't provided any details on what your client code looks like 
-- ie: what code is talking to solr? what response format is it asking 
for? is it JSON? what is parsing that JSON?

as for the admin UI: if you are looking at a JSON response in the Query 
screen of the Admin UI, then the Javascript engine of your webbrowser is 
being use to parse the JSON and prettty print it for you.

what does the _version_ in the *RAW* response from your /get or /select 
request return when you use something like curl that does *NO* processing 
of the response data?



: Date: Fri, 17 Apr 2015 15:37:21 +
: From: Reitzel, Charles charles.reit...@tiaa-cref.org
: Reply-To: solr-user@lucene.apache.org
: To: solr-user@lucene.apache.org solr-user@lucene.apache.org
: Subject: RE: Spurious _version_ conflict?
: 
: Thanks for getting back.   Something like that crossed my mind but I checked 
the values on the way into SolrJ SolrInputDocument match the values printed in 
the Admin Query interface and they both match the expected value in the error 
message exactly.
: 
: Besides the difference is only in the last few bits ...
: 
: Error executing update: version conflict for 553d0f5d320c4321b13f4312ff907218 
expected=1498643112821522400 actual=1498643112821522432
: 
: Note, all my _version_ values have zeroes in the last two digits.  But, 
again, there is agreement between the Admin UI and every stage of my client 
(from query in my REST service, to REST client in browser, back to update in my 
REST service).
: 
: -Original Message-
: From: Chris Hostetter [mailto:hossman_luc...@fucit.org] 
: Sent: Thursday, April 16, 2015 5:04 PM
: To: solr-user@lucene.apache.org
: Subject: Re: Spurious _version_ conflict?
: 
: 
: : I notice that the expected value in the error message matches both what
: : I pass in and the index contents.  But the actual value in the error
: : message is different only in the last (low order) two digits.  
: : Consistently.
: 
: what does your client code look like?  Are you sure you aren't being bit by a 
JSON parsing library that can't handle long values and winds up truncating them?
: 
: https://issues.apache.org/jira/browse/SOLR-6364
: 
: 
: 
: -Hoss
: http://www.lucidworks.com/
: 
: *
: This e-mail may contain confidential or privileged information.
: If you are not the intended recipient, please notify the sender immediately 
and then delete it.
: 
: TIAA-CREF
: *
: 
: 

-Hoss
http://www.lucidworks.com/

Re: Merge indexes in MapReduce

The core admin MERGEINDEXES will work for you I'm pretty sure. You
copy the NRT index over to the all-the-time box. MERGEINDEXES just
takes the path to the index you want to add to the existing core.

Note the warnings in the reference guide about taking care that the
indexes aren't changing and committing at the very end of the
operation.

I suspect this is one of the cases where optimizing is called for, I
don't believe the MERGEINDEXES call triggers any kind of segment
merging, and since your all-the-time index isn't getting incremental
updates (I'm assuming), there's no event to trigger incremental
merges.

Best
Erick

On Fri, Apr 17, 2015 at 2:24 AM, ariya bala ariya...@gmail.com wrote:
 Hi Norgorn,

 I think there is no ready-made tool out of the box, but you have the spare
 parts in the MapreduceIndexerTool :-)
 With little effort you can decouple the index merging component from
 MRIndexerTool and use based on the needs.
 I did the same.

 On Fri, Apr 17, 2015 at 10:40 AM, Norgorn lsunnyd...@mail.ru wrote:

 Thank you for the reply.
 Out schema is:
 1) Index real-time (on separate machine).
 2) NRT index becomes large.
 3) Copy NRT index on other machine.
 3) Merge NRT-made indexes with large (all-the-time) index.
 4) Remove NRT index (until now it was available for searching).

 At the end we have big, optimized index with data of all the time.
 And we'r ready to index more data and indexing will be fast.

 Excuse me, if I'm describing unclearly.

 About optimization - indexing with low merge-factor results in lot of
 segments, which results in slow search, so we have to make it.





 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/Merge-indexes-in-MapReduce-tp4200106p4200346.html
 Sent from the Solr - User mailing list archive at Nabble.com.




 --
 *Ariya *

Re: search ignoring accents

Hi Pedro,

Requirement of Filter by edr should give the result Pedro can be done 
expanding terms at index time only.
You can remove the ngram filter from query analyzer. 
But remember that ngram filter produces a lot of tokens. Try it on analysis 
page.

Regarding starting at the beginning or the ending, there is an 
EdgeNGramTokenFilter where you can specify side, front or back.

Ahmet




On Friday, April 17, 2015 2:50 PM, Pedro Figueiredo 
pjlfigueir...@criticalsoftware.com wrote:
And for this example what filter should I use?

Filter by edr should give the result Pedro
The NGram create tokens starting at the beginning or the ending, and in the 
middle?

Thanks!

Pedro Figueiredo
Senior Engineer

pjlfigueir...@criticalsoftware.com
M. 934058150


Rua Engº Frederico Ulrich, nº 2650 4470-605 Moreira da Maia, Portugal
T. +351 229 446 927 | F. +351 229 446 929
www.criticalsoftware.com

PORTUGAL | UK | GERMANY | USA | BRAZIL | MOZAMBIQUE | ANGOLA
A CMMI® LEVEL 5 RATED COMPANY CMMI® is registered in the USPTO by CMU




-Original Message-
From: Pedro Figueiredo [mailto:pjlfigueir...@criticalsoftware.com] 
Sent: 17 April 2015 12:22
To: solr-user@lucene.apache.org; 'Ahmet Arslan'
Subject: RE: search ignoring accents

Hi Ahmet,

Yes... the EdgeNGram is what produces those results...
I need it to improve the search by name by the applications users.

Thanks.

Pedro Figueiredo
Senior Engineer

pjlfigueir...@criticalsoftware.com
M. 934058150


Rua Engº Frederico Ulrich, nº 2650 4470-605 Moreira da Maia, Portugal T. +351 
229 446 927 | F. +351 229 446 929 www.criticalsoftware.com

PORTUGAL | UK | GERMANY | USA | BRAZIL | MOZAMBIQUE | ANGOLA A CMMI® LEVEL 5 
RATED COMPANY CMMI® is registered in the USPTO by CMU



-Original Message-
From: Ahmet Arslan [mailto:iori...@yahoo.com.INVALID]
Sent: 17 April 2015 12:01
To: solr-user@lucene.apache.org
Subject: Re: search ignoring accents

Hi Pedro,

solr.ASCIIFoldingFilterFactory is one way to remove diacritics.
Confusion comes from EdgeNGram, why do you need it?

Ahmet



On Friday, April 17, 2015 1:38 PM, Pedro Figueiredo 
pjlfigueir...@criticalsoftware.com wrote:



Hello,

What is the best way to search in a field ignoring accents?

The field has the type:
fieldType name=text_general_edge_ngram 
class=solr.TextField positionIncrementGap=100
   analyzer type=index
   tokenizer 
class=solr.LowerCaseTokenizerFactory/
   filter 
class=solr.EdgeNGramFilterFactory minGramSize=2 maxGramSize=15/
   /analyzer
   analyzer type=query
   tokenizer 
class=solr.LowerCaseTokenizerFactory/
   filter 
class=solr.EdgeNGramFilterFactory minGramSize=2 maxGramSize=15/
   /analyzer
/fieldType

I’ve tried adding the filter:  filter class=solr.ASCIIFoldingFilterFactory/
but some strange results happened.. like:

Search by “Mourao” and the results were:
Mourão - OK
Monteiro - NOTOK
Morais - NOTOK

Thanks in advanced,

Pedro Figueiredo
Senior Engineer

pjlfigueir...@criticalsoftware.com
M. 934058150 
  
Rua Engº Frederico Ulrich, nº 2650 4470-605 Moreira da Maia, Portugal T. +351 
229 446 927 | F. +351 229 446 929 www.criticalsoftware.com

PORTUGAL | UK | GERMANY | USA | BRAZIL | MOZAMBIQUE | ANGOLA A CMMI® LEVEL 5 
RATED COMPANY CMMI® is registered in the USPTO by CMU

Re: HttpSolrServer and CloudSolrServer

Additionally when indexing, CloudSolrServer collects up the documents
for each shard and routes them to the leader for that shard, moving
that processing away from whatever node you happen so contact using
HttpSolrServer.

Finally, HttpSolrServer is a single point of failure if the node you
point to goes down, whereas CloudSolrServer will compensate if any
node goes down.

Best,
Erick

On Fri, Apr 17, 2015 at 2:39 AM, Andrea Gazzarini a.gazzar...@gmail.com wrote:
 If you're using SolrCloud then you should use CloudSolrServer as it is able
 to abstract / hide the interaction with the cluster. HttpSolrServer
 communicates directly with a Solr instance.

 Best,
 Andrea


 On 04/17/2015 10:59 AM, Vijay Bhoomireddy wrote:

 Hi All,


 Good Morning!!


 For SolrCloud deployment, for indexing data through SolrJ, which is the
 preferred / correct SolrServer class to use? HttpSolrServer of
 CloudSolrServer? In case both can be used, when to use which? Any help
 please.


 Thanks  Regards

 Vijay

Re: Solr 5.x deployment in production

2015-04-17 Thread Steven White

Thanks Shawn, this makes a lot of sense.

With WAR going away and no mention of Solr deployment strategy (see:
https://cwiki.apache.org/confluence/display/solr/Taking+Solr+to+Production)
isn't good; there is a gab in Solr's release. It feels as if Solr 5.x was
rushed out ignoring Windows Servers deployment.

-- George

On Fri, Apr 17, 2015 at 9:24 AM, Shawn Heisey apa...@elyograg.org wrote:

 On 4/16/2015 2:07 PM, Steven White wrote:
  In my case, I have to deploy Solr on Windows, AIX, and Linux (all server
  edition).  We are a WebSphere shop, moving away from it means I have to
  deal with politics and culture.

 You *can* run Solr 5.0 (and 5.1) in another container, just like you
 could with all previous Solr versions.  There are additional steps that
 have to be taken, such as correctly installing the logging jars and the
 logging config, but if you've used Solr 4.3 or later, you already know
 this:

 http://wiki.apache.org/solr/SolrLogging

 Eventually, hopefully before we reach the 6.0 release, that kind of
 deployment won't be possible, because Solr will be a true application
 (like Jetty itself), not a webapp contained in a .war file.  It may take
 us quite a while to reach that point.  If you are already using the
 scripts that come with Solr 5.x, you will have a seamless transition to
 the new implementation.

 The docs for 5.0 say that we aren't supporting deployment in a
 third-party servlet container, even though that still is possible.
 There are several reasons for this:

 * Eventually it won't be possible, because Solr's implementation will
 change.

 * We now have scripts that will start Solr in a consistent manner.
 ** This means that our instructions won't have to change for a new
 implementation.

 * There are a LOT of containers available.
 ** Each one requires different instructions.
 ** Are problems caused by the container, or Solr?  We may not know.

 * Jetty is the only container that gets tested.
 ** Bugs with other containers have happened.
 ** User feedback is usually the only way such bugs can be found.

 Thanks,
 Shawn

Re: Bad contentType for search handler :text/xml; charset=UTF-8


Off the cuff, it sounds like you are making a POST request to the 
SearchHandler (ie: /search or /query) and the Content-TYpe you are sending 
is text/xml; charset=UTF-8

In the past SearchHandler might have ignored that Content-Type, but now 
that structured queries can be sent as POST data, it's trying to parse the 
POST body and it can't make sense of your XML data.

As erick said: with out more details on what your client code looks like 
it's hard to give you additional advice -- the first big question you wnat 
to ask yourself though, is *why*, in SOlr 5.0, you were POSTing XML data 
to Solr -- what was the purpose of that POSTed XML data?



: Date: Thu, 16 Apr 2015 22:57:30 -0700 (MST)
: From: Pavel Hladik pavel.hla...@profimedia.cz
: Reply-To: solr-user@lucene.apache.org
: To: solr-user@lucene.apache.org
: Subject: Bad contentType for search handler :text/xml; charset=UTF-8
: 
: Hi,
: 
: we have migrated Solr from 5.0 do 5.1 and we can't search now, we have a
: ERROR for SolrCore like in subject. I can't get any info through Google.
: 
: Please, can someone help what is going on?
: 
: Thanks,
: 
: Pavel
: 
: 
: 
: --
: View this message in context: 
http://lucene.472066.n3.nabble.com/Bad-contentType-for-search-handler-text-xml-charset-UTF-8-tp4200314.html
: Sent from the Solr - User mailing list archive at Nabble.com.
: 

-Hoss
http://www.lucidworks.com/

Re: SolrCloud 4.8.0 upgrade

Look at CHANGES.txt for both Lucene and Solr, there's always an
upgrading section for each release.

Best,
Erick

On Fri, Apr 17, 2015 at 5:31 AM, Toke Eskildsen t...@statsbiblioteket.dk 
wrote:
 Vincenzo D'Amore v.dam...@gmail.com wrote:
 I have a SolrCloud cluster with 3 server, I would like to use stats.facet,
 but this feature is available only if I upgrade to 4.10.

 May I simply redeploy new solr cloud version in tomcat or should reload all
 the documents?
 There are other drawbacks?

 Support for the Disk-format for DocValues was removed after 4.8, so you 
 should check if you use that: DocValuesFormat=Disk for the field in the 
 schema, if I remember correctly.

 - Toke Eskildsen

Solr Cloud reclaiming disk space from deleted documents

2015-04-17 Thread Rishi Easwaran

Hi All,

Running into an issue and wanted to see if anyone had some suggestions.
We are seeing this with both solr 4.6 and 4.10.3 code.
We are running an extremely update heavy application, with millions of writes 
and deletes happening to our indexes constantly.  An issue we are seeing is 
that solr cloud reclaiming the disk space that can be used for new inserts, by 
cleanup up deletes. 

We used to run optimize periodically with our old multicore set up, not sure if 
that works for solr cloud.

Num Docs:28762340
Max Doc:48079586
Deleted Docs:19317246

Version 1429299216227
Gen 16525463
Size 109.92 GB

In our solrconfig.xml we use the following configs.

indexConfig
!-- Values here affect all index writers and act as a default unless 
overridden. --
useCompoundFilefalse/useCompoundFile
maxBufferedDocs1000/maxBufferedDocs
maxMergeDocs2147483647/maxMergeDocs
maxFieldLength1/maxFieldLength

mergeFactor10/mergeFactor
mergePolicy class=org.apache.lucene.index.TieredMergePolicy/
mergeScheduler 
class=org.apache.lucene.index.ConcurrentMergeScheduler
int name=maxThreadCount3/int
int name=maxMergeCount15/int
/mergeScheduler
ramBufferSizeMB64/ramBufferSizeMB

/indexConfig


Any suggestions on which which tunable to adjust, mergeFactor, mergeScheduler 
thread counts etc would be great.

Thanks,
Rishi.

Re: Differentiating user search term in Solr

: It looks to me that f with qq is doing phrase search, that's not what I
: want. The data in the field title is Apache Solr Release Notes

if you don't wnat phrase queries then you don't want pharse queries and
that's fine -- but it wasn't clear from any of your original emails
because you never provided (that i saw) any concrete examples of the types
of queries you expected, the types of matches you wanted, and the types of
matches you did *NOT* want. details matter

https://wiki.apache.org/solr/UsingMailingLists

Based on that one concrete example i've now seen of what you *do* want to
match: it seems that maybe a general description of your objective is that
each of the words in your user input should treated as a mandatory
clause in a boolean query -- but the concept of a word is already
something that violates your earlier statement about not wanting the query
parser to treat any reserved characters as special -- in order to
recognize that Apache, Solr and Notes should each be treated as
independent mandatory clauses in a boolean query, then some query parser
needs to recognize that *whitespace* is a syntactically significant
character in your query string: it's what seperates the words in your
input.

the reason the field parser produces phrase queries in the example URLs
you mentioned is because that parser doesn't have *ANY* special reserved
characters -- not even whitespace. it passes the entire input string to
the analyzer of the configured (f) field. if you are using TextField with
a Tokenizer that means it gets split on whitespace, resulting in multiple
*sequential* tokens, which will result in a phrase query (on the other
hand, using something like StrField will cause the entire input string,
spaces an all, to be serached as one single Term)

: I looked over the links you provided and tried out the examples, in each
: case if the user-typed-text contains any reserved characters, it will fail
: with a syntax error (the exception is when I used f and qq but like I
: said, that gave me 0 hit).

As i said: Details matter. which examples did you try? what configs were
you using? what data where you using? which version of solr are you using?
what exactly was the syntax error? etc ?

f and qq are not magic -- saying you used them just means you used
*some* parser that supports an f param ... if you tried it with the
term or field parser then i don't know why you would have gotten a
SyntaxError, but based on your goal it sounds like those parsers aren't
really useful to you. (see below)

: If you can give me a concrete example, please do. My need is to pass to
: Solr the text Apache: Solr Notes (without quotes) and get a hit as if I
: passed Apache\: Solr Notes ?

To re-iterate, saying you want the same bhavior as if you passed Apache\:
Solr Notes is a vague statment -- as if you passed that string to *what*
? to the standard parser? to the dismax parser? using what request
options? (q.op? qf? df?) ... query strings don't exist in a vacume. the
details context matters.

(I'm sorry if it feels like i keep hitting you over the head about this,
i'm just trying to help you realize the breadth and scope of the variables
involved in a question like the one you are asking, so you consider the
full context and understand *how* to think about the problem you are
trying to solve, and what questions to ask yourselve / this list)

My *BEST* guess as to a parser that might help you is the simple
parser...

https://cwiki.apache.org/confluence/display/solr/Other+Parsers#OtherParsers-SimpleQueryParser

...by default it supports several syntactically significant operators
(which can be escaped), but those can be disabled using the q.operators
option. As the documentation notes Any errors in syntax are ignored and
the query parser will interpret as best it can. This can mean, however,
odd results in some cases. so a lot of experimentation with a
large sample of expected good/bad queries is important to make sure you
understand what types of query structures search results you'll get out
of them

A trivial example of using the simple parser, with the Solr 5.1
bin/solr -e techproducts example configs/data would be...

http://localhost:8983/solr/techproducts/select?fl=id,namedebug=querydefType=simpleq.op=ANDq.operators=df=nameq=apple%20-ipod

which matches the name Apple 60 GB iPod with Video Playback Black even
though there is a - in front of ipod, because the q.operators= param
tells the parser to ignore all of it's operators. (at which point the
literal string -ipod is passed to the analyzer for the name field, and
it's striped off by the tokenizer). On the other hand it does not match
the name Belkin Mobile Power Cord for iPod w/ Dock because it doesn't
contain apple.

That was a trivial good example query -- it's important to remeber
however that localparam parsing happens *before* the actual query parser
is given the input string (it

Re: Java.net.socketexception: broken pipe Solr 4.10.2

2015-04-17 Thread vsilgalis

I haven't had time to really take a look at this. But read a couple of
articles regarding the hard commit and it actually makes sense.  We were
seeing tlogs in the multiple GBs during ingest.  I will have some time in a
couple of weeks to come back to testing indexing. Thanks for the help.

Vy



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Java-net-socketexception-broken-pipe-Solr-4-10-2-tp4199484p4200498.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: 5.1 'unique' facet function / calcDistinct

2015-04-17 Thread levanDev

Perfect, thank you for the information -- will have a look through those
classes. 

Thank you,
Levan



--
View this message in context: 
http://lucene.472066.n3.nabble.com/5-1-unique-facet-function-calcDistinct-tp4200110p4200535.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Solr 5.0, defaultSearchField, defaultOperator ?


: df and q.op are the ones you are looking for.
: You can define them in defaults section.

specifically...

https://cwiki.apache.org/confluence/display/solr/InitParams+in+SolrConfig


: 
: Ahmet
: 
: 
: 
: On Friday, April 17, 2015 9:18 PM, Bruno Mannina bmann...@free.fr wrote:
: Dear Solr users,
: 
: Since today I used SOLR 5.0 (I used solr 3.6) so i try to adapt my old
: schema for solr 5.0.
: 
: I have two questions:
: - how can I set the defaultSearchField ?
: I don't want to use in the query the df tag  because I have a lot of
: modification to do for that on my web project.
: 
: - how can I set the defaultOperator (and|or) ?
: 
: It seems that these options are now deprecated in SOLR 5.0 schema.
: 
: Thanks a lot for your comment,
: 
: Regards,
: Bruno
: 
: ---
: Ce courrier électronique ne contient aucun virus ou logiciel malveillant 
parce que la protection avast! Antivirus est active.
: http://www.avast.com
: 

-Hoss
http://www.lucidworks.com/

help with schema containing nested documents

2015-04-17 Thread Nicolae Pandrea

Hi,

I need some documentation/samples on how to create a SOLR schema with nested 
documents.
I have been looking online but could not find anything.

Thank you in advance,
Nick Pandrea

RE: Enrich search results with external data

2015-04-17 Thread ha.pham

Hi  Sujit,



Many thanks for your blog post, responding to my question, and suggesting the 
alternative option ☺



I think I prefer your approach because we can supply our own Comparator. The 
reason is that we need to meet some strict requirements: we can only call the 
external system once to retrieve extra fields (price, inventory, etc.) for 
probably a subset of the search result.  Therefore we need to be able to sort 
and facet on the list of items that some of them may not have external fields. 
I think using the Comparator would help with the sorting but let me know if you 
have different ideas.



Do you have suggestion how we should deal with the facet requirement? I am 
thinking about adding another Facet Component that will be executed after the 
standard FacetComponent. Let me know if you think we should consider other 
options.



Thanks,



-Ha



-Original Message-

From: sujitatgt...@gmail.com [mailto:sujitatgt...@gmail.com] On Behalf Of Sujit 
Pal

Sent: Saturday, April 11, 2015 10:23 AM

To: solr-user@lucene.apache.org; Ahmet Arslan

Subject: Re: Enrich search results with external data



Hi Ha,



I am the author of the blog post you mention. To your question, I don't know if 
the code will work without change (since the Lucene/Solr API has evolved so 
much over the last few years), but a more preferred way using Function 
Queries way may be found in slides for Timothy Potter's talk here:

http://www.slideshare.net/thelabdude/boosting-documents-in-solr-lucene-revolution-2011



Here he speaks of external fields stored in a database and accessed using a 
custom component (rather than from a flat file as in ExternalFieldField), and 
using function queries to influence the ranking based on the external field. 
However, per this document on function queries, you can use the output of a 
function query to sort as well by passing the function to the sort parameter.

https://wiki.apache.org/solr/FunctionQuery#Sort_By_Function



Hope this helps,

Sujit





On Fri, Apr 10, 2015 at 10:38 PM, Ahmet Arslan iori...@yahoo.com.invalid

wrote:



 Hi,



 Who don't you include/add/index those additional fields, at least the

 one used in sorting?



 Also, you may find

 https://stanbol.apache.org/docs/trunk/components/enhancer/ relevant.



 Ahmet







 On Saturday, April 11, 2015 1:04 AM, ha.p...@arvatosystems.com 

 ha.p...@arvatosystems.com wrote:

 This ticket seems to address the problem I have



 https://issues.apache.org/jira/browse/SOLR-1566







 and as the result of that ticket, DocTransformer is added since Solr 4.0.

 I wrote a simple DocTransformer and found that the transformer is

 executed AFTER pagination. In our application, we need the external

 fields added before sorting/pagination. I've looked around for the

 option to change the execution order but haven't had any luck. Does anyone 
 know the solution?







 The ticket also states it is not possible for components to add

 fields to outgoing documents which are not in the stored fields of the 
 document.

 Does anyone know if this is still true?







 Thanks,







 -Ha











 -Original Message-



 From: Pham, Ha



 Sent: Thursday, April 09, 2015 11:41 PM



 To: solr-user@lucene.apache.org





 Subject: Enrich search results with external data







 Hi everyone,







 We have a requirement to append external data (e.g. price/inventory of

 product, retrieved from an ERP via web services) to query result and

 support sorting and pagination based on those external fields. For

 example if Solr returns 100 records and the page size user selects is

 20, the sorting on the external  fields is still on 100 records. This

 limits us from enriching search results outside of Solr. I guess this

 is a common problem so hopefully someone could share their experience.







 I am considering using a PostFilter and enrich documents in collect()

 method as below







 @Override



 public void collect(int docId) throws IOException { DoubleField price

 = new DoubleField (PRICE, 1.23, Field.Store.YES); Document

 currentDoc = context.reader().document(docId); currentDoc.add(price);

 }







 but the result documents don't have PRICE fields. Did I miss anything here?







 I also did some research and it seems the approach mentioned here

 http://sujitpal.blogspot.com/2011/05/custom-sorting-in-solr-using-exte

 rnal.html is close to what we need to achieve but since the document

 is 4 years old, I don't know if there's a better approach for our

 problem (we are using solr 5.0)?







 Thanks,







 -Ha

RE: Spurious _version_ conflict?

2015-04-17 Thread Reitzel, Charles

Ah, starting to see the light ...  thanks for your patience.

First, this is a Java REST service using solrj.   I am using default transport 
(wt=javabin, I think).

But right-clicking the URL at the top of the Admin query  page and selecting 
open in new tab displays the non-truncated _version_ values.

Also, I am getting the non-truncated values from SolrJ QueryResponse.   I think 
I short-circuited my diagnosis when I saw matching truncated values in the 
browser.   

So, my bad.

To be safe, I will transport _version_ values as strings.   Thanks for your 
help!

-Original Message-
From: Chris Hostetter [mailto:hossman_luc...@fucit.org] 
Sent: Friday, April 17, 2015 12:50 PM
To: solr-user@lucene.apache.org
Subject: RE: Spurious _version_ conflict?


you still haven't provided any details on what your client code looks like
-- ie: what code is talking to solr? what response format is it asking for? is 
it JSON? what is parsing that JSON?

as for the admin UI: if you are looking at a JSON response in the Query 
screen of the Admin UI, then the Javascript engine of your webbrowser is being 
use to parse the JSON and prettty print it for you.

what does the _version_ in the *RAW* response from your /get or /select request 
return when you use something like curl that does *NO* processing of the 
response data?



: Date: Fri, 17 Apr 2015 15:37:21 +
: From: Reitzel, Charles charles.reit...@tiaa-cref.org
: Reply-To: solr-user@lucene.apache.org
: To: solr-user@lucene.apache.org solr-user@lucene.apache.org
: Subject: RE: Spurious _version_ conflict?
: 
: Thanks for getting back.   Something like that crossed my mind but I checked 
the values on the way into SolrJ SolrInputDocument match the values printed in 
the Admin Query interface and they both match the expected value in the error 
message exactly.
: 
: Besides the difference is only in the last few bits ...
: 
: Error executing update: version conflict for 553d0f5d320c4321b13f4312ff907218 
expected=1498643112821522400 actual=1498643112821522432
: 
: Note, all my _version_ values have zeroes in the last two digits.  But, 
again, there is agreement between the Admin UI and every stage of my client 
(from query in my REST service, to REST client in browser, back to update in my 
REST service).
: 
: -Original Message-
: From: Chris Hostetter [mailto:hossman_luc...@fucit.org]
: Sent: Thursday, April 16, 2015 5:04 PM
: To: solr-user@lucene.apache.org
: Subject: Re: Spurious _version_ conflict?
: 
: 
: : I notice that the expected value in the error message matches both what
: : I pass in and the index contents.  But the actual value in the error
: : message is different only in the last (low order) two digits.  
: : Consistently.
: 
: what does your client code look like?  Are you sure you aren't being bit by a 
JSON parsing library that can't handle long values and winds up truncating them?
: 
: https://issues.apache.org/jira/browse/SOLR-6364
: 
: 
: 
: -Hoss
: http://www.lucidworks.com/
: 
: *
: This e-mail may contain confidential or privileged information.
: If you are not the intended recipient, please notify the sender immediately 
and then delete it.
: 
: TIAA-CREF
: *
: 
: 

-Hoss
http://www.lucidworks.com/

*
This e-mail may contain confidential or privileged information.
If you are not the intended recipient, please notify the sender immediately and 
then delete it.

TIAA-CREF
*

Re: 5.1 'unique' facet function / calcDistinct

2015-04-17 Thread levanDev

I've posted the issue here, please let me know if any additional information
needs to be provided.

https://issues.apache.org/jira/browse/SOLR-7417

Happy to provide the feedback, using the sub-facets has been a lot of fun,
the nested facet query is especially useful.




--
View this message in context: 
http://lucene.472066.n3.nabble.com/5-1-unique-facet-function-calcDistinct-tp4200110p4200534.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Solr Cloud reclaiming disk space from deleted documents

On 4/17/2015 2:15 PM, Rishi Easwaran wrote:
 Running into an issue and wanted to see if anyone had some suggestions.
 We are seeing this with both solr 4.6 and 4.10.3 code.
 We are running an extremely update heavy application, with millions of writes 
 and deletes happening to our indexes constantly.  An issue we are seeing is 
 that solr cloud reclaiming the disk space that can be used for new inserts, 
 by cleanup up deletes. 

 We used to run optimize periodically with our old multicore set up, not sure 
 if that works for solr cloud.

 Num Docs:28762340
 Max Doc:48079586
 Deleted Docs:19317246

 Version 1429299216227
 Gen 16525463
 Size 109.92 GB

 In our solrconfig.xml we use the following configs.

 indexConfig
 !-- Values here affect all index writers and act as a default unless 
 overridden. --
 useCompoundFilefalse/useCompoundFile
 maxBufferedDocs1000/maxBufferedDocs
 maxMergeDocs2147483647/maxMergeDocs
 maxFieldLength1/maxFieldLength

 mergeFactor10/mergeFactor
 mergePolicy class=org.apache.lucene.index.TieredMergePolicy/
 mergeScheduler 
 class=org.apache.lucene.index.ConcurrentMergeScheduler
 int name=maxThreadCount3/int
 int name=maxMergeCount15/int
 /mergeScheduler
 ramBufferSizeMB64/ramBufferSizeMB
 
 /indexConfig

This part of my response won't help the issue you wrote about, but it
can affect performance, so I'm going to mention it.  If your indexes are
stored on regular spinning disks, reduce mergeScheduler/maxThreadCount
to 1.  If they are stored on SSD, then a value of 3 is OK.  Spinning
disks cannot do seeks (read/write head moves) fast enough to handle
multiple merging threads properly.  All the seek activity required will
really slow down merging, which is a very bad thing when your indexing
load is high.  SSD disks do not have to seek, so multiple threads are OK
there.

An optimize is the only way to reclaim all of the disk space held by
deleted documents.  Over time, as segments are merged automatically,
deleted doc space will be automatically recovered, but it won't be
perfect, especially as segments are merged multiple times into very
large segments.

If you send an optimize command to a core/collection in SolrCloud, the
entire collection will be optimized ... the cloud will do one shard
replica (core) at a time until the entire collection has been
optimized.  There is no way (currently) to ask it to only optimize a
single core, or to do multiple cores simultaneously, even if they are on
different servers.

Thanks,
Shawn

Re: Solr Cloud reclaiming disk space from deleted documents

2015-04-17 Thread Rishi Easwaran

Thanks Shawn for the quick reply.
Our indexes are running on SSD, so 3 should be ok.
Any recommendation on bumping it up?

I guess will have to run optimize for entire solr cloud and see if we can 
reclaim space.

Thanks,
Rishi. 
 

 

 

 

-Original Message-
From: Shawn Heisey apa...@elyograg.org
To: solr-user solr-user@lucene.apache.org
Sent: Fri, Apr 17, 2015 6:22 pm
Subject: Re: Solr Cloud reclaiming disk space from deleted documents


On 4/17/2015 2:15 PM, Rishi Easwaran wrote:
 Running into an issue and wanted
to see if anyone had some suggestions.
 We are seeing this with both solr 4.6
and 4.10.3 code.
 We are running an extremely update heavy application, with
millions of writes and deletes happening to our indexes constantly.  An issue we
are seeing is that solr cloud reclaiming the disk space that can be used for new
inserts, by cleanup up deletes. 

 We used to run optimize periodically with
our old multicore set up, not sure if that works for solr cloud.

 Num
Docs:28762340
 Max Doc:48079586
 Deleted Docs:19317246

 Version
1429299216227
 Gen 16525463
 Size 109.92 GB

 In our solrconfig.xml we
use the following configs.

 indexConfig
 !-- Values here
affect all index writers and act as a default unless overridden. --

useCompoundFilefalse/useCompoundFile

maxBufferedDocs1000/maxBufferedDocs

maxMergeDocs2147483647/maxMergeDocs

maxFieldLength1/maxFieldLength


mergeFactor10/mergeFactor
 mergePolicy
class=org.apache.lucene.index.TieredMergePolicy/
 mergeScheduler
class=org.apache.lucene.index.ConcurrentMergeScheduler
 int
name=maxThreadCount3/int
 int
name=maxMergeCount15/int
 /mergeScheduler

ramBufferSizeMB64/ramBufferSizeMB
 
 /indexConfig

This
part of my response won't help the issue you wrote about, but it
can affect
performance, so I'm going to mention it.  If your indexes are
stored on regular
spinning disks, reduce mergeScheduler/maxThreadCount
to 1.  If they are stored
on SSD, then a value of 3 is OK.  Spinning
disks cannot do seeks (read/write
head moves) fast enough to handle
multiple merging threads properly.  All the
seek activity required will
really slow down merging, which is a very bad thing
when your indexing
load is high.  SSD disks do not have to seek, so multiple
threads are OK
there.

An optimize is the only way to reclaim all of the disk
space held by
deleted documents.  Over time, as segments are merged
automatically,
deleted doc space will be automatically recovered, but it won't
be
perfect, especially as segments are merged multiple times into very
large
segments.

If you send an optimize command to a core/collection in SolrCloud,
the
entire collection will be optimized ... the cloud will do one
shard
replica (core) at a time until the entire collection has been
optimized.
There is no way (currently) to ask it to only optimize a
single core, or to do
multiple cores simultaneously, even if they are on
different
servers.

Thanks,
Shawn

Re: JSON Facet Analytics API in Solr 5.1

2015-04-17 Thread Trey Grainger

Agreed, I also prefer the second way. I find it more readible, less verbose
while communicating the same information, less confusing to mentally parse
(is 'terms' the name of my facet, or the type of my facet?...), and less
prone to syntactlcally valid, but logically invalid inputs.  Let's break
those topics down.

*1) Less verbose while communicating the same information:*
The flatter structure is particularly useful when you have nested facets to
reduce unnecessary verbosity / extra levels. Let's contrast the two
approaches with just 2 levels of subfacets:

** Current Format **
top_genres:{
terms:{
field: genre,
limit: 5,
facet:{
top_authors:{
terms:{
field: author,
limit: 4,
facet: {
top_books:{
terms:{
field: title,
limit: 5
   }
   }
}
}
}
}
}
}

** Flat Format **
top_genres:{
type: terms,
field: genre,
limit: 5,
facet:{
top_authors:{
type: terms
field: author,
limit: 4,
facet: {
top_books:{
type: terms
field: title,
limit: 5
   }
}
}
}
}

The flat format is clearly shorter and more succinct, while communicating
the same information. What value do the extra levels add?


*2) Less confusing to mentally parse*
I also find the flatter structure less confusing, as I'm consistently
having to take a mental pause with the current format to verify whether
terms is the name of my facet or the type of my facet and have to count
the curly braces to figure this out.  Not that I would name my facets like
this, but to give an extreme example of why that extra mental calculation
is necessary due to the name of an attribute in the structure being able to
represent both a facet name and facet type:

terms: {
terms: {
field: genre,
limit: 5,
facet: {
terms: {
terms:{
field: author
limit: 4
}
}
}
}
}

In this example, the first terms is a facet name, the second terms is a
facet type, the third is a facet name, etc. Even if you don't name your
facets like this, it still requires parsing someone else's query mentally
to ensure that's not what was done.

3) *Less prone to syntactically valid, but logically invalid inputs*
Also, given this first format (where the type is indicated by one of
several possible attributes: terms, range, etc.), what happens if I pass in
multiple of the valid JSON attributes... the flatter structure prevents
this from being possible (which is a good thing!):

top_authors : {
terms : {
field : author,
limit : 5
},
range : {
field : price,
start : 0,
end : 100,
gap : 20
}
}

I don't think the response format can currently handle this without adding
in extra levels to make it look like the input side, so this is an
exception case even thought it seems syntactically valid.

So in conclusion, I'd give a strong vote to the flatter structure. Can
someone enumerate the benefits of the current format over the flatter
structure (I'm probably dense and just failing to see them currently)?

Thanks,

-Trey


On Fri, Apr 17, 2015 at 2:28 PM, Jean-Sebastien Vachon 
jean-sebastien.vac...@wantedanalytics.com wrote:

 I prefer the second way. I find it more readable and shorter.

 Thanks for making Solr even better ;)

 
 From: Yonik Seeley ysee...@gmail.com
 Sent: Friday, April 17, 2015 12:20 PM
 To: solr-user@lucene.apache.org
 Subject: Re: JSON Facet  Analytics API in Solr 5.1

 Does anyone have any thoughts on the current general structure of JSON
 facets?
 The current general form of a facet command is:

 facet_name : { facet_type : facet_args }

 For example:

 top_authors : { terms : {
   field : author,
   limit : 5,
 }}

 One alternative I considered in the past is having the type in the args:

 top_authors : {
   type : terms,
   field : author,
   limit : 5
 }

 It's a flatter structure... probably better in some ways, but worse in
 other ways.
 Thoughts / preferences?

 -Yonik


 On Tue, Apr 14, 2015 at 4:30 PM, Yonik Seeley ysee...@gmail.com wrote:
  Folks, there's a new JSON Facet API in the just released Solr 5.1
  (actually, a new facet module under the covers too).
 
  It's marked as experimental so we have time to change the API based on
  your feedback.  So let us know what you like, what you would change,
  what's missing, or any other ideas you may have!
 
  I've just started the documentation for the reference guide (on our
  confluence wiki), so

Re: MoreLikeThis (mlt) in sharded SolrCloud

2015-04-17 Thread Anshum Gupta

Ah, I meant SOLR-7418 https://issues.apache.org/jira/browse/SOLR-7418.

On Fri, Apr 17, 2015 at 4:30 PM, Anshum Gupta ans...@anshumgupta.net
wrote:

 Hi Ere,

 Those seem like valid issues. I've created an issue : SOLR-7275
 https://issues.apache.org/jira/browse/SOLR-7275 and will create more as
 I find more of those.
 I plan to get to them and fix over the weekend.

 On Wed, Apr 15, 2015 at 5:13 AM, Ere Maijala ere.maij...@helsinki.fi
 wrote:

 Hi,

 I'm trying to gather information on how mlt works or is supposed to work
 with SolrCloud and a sharded collection. I've read issues SOLR-6248,
 SOLR-5480 and SOLR-4414, and docs at 
 https://wiki.apache.org/solr/MoreLikeThis, but I'm still struggling
 with multiple issues. I've been testing with Solr 5.1 and the Getting
 Started sample cloud. So, with a freshly extracted Solr, these are the
 steps I've done:

 bin/solr start -e cloud -noprompt
 bin/post -c gettingstarted docs/
 bin/post -c gettingstarted example/exampledocs/books.json

 After this I've tried different variations of queries with limited
 success:

 http://localhost:8983/solr/gettingstarted/select?q={!mlt}non-existing
 causes java.lang.NullPointerException at
 org.apache.solr.search.mlt.CloudMLTQParser.parse(CloudMLTQParser.java:80)

 http://localhost:8983/solr/gettingstarted/select?q={!mlt}978-0641723445
 causes java.lang.NullPointerException at
 org.apache.solr.search.mlt.CloudMLTQParser.parse(CloudMLTQParser.java:84)

 
 http://localhost:8983/solr/gettingstarted/select?q={!mlt%20qf=title}978-0641723445
 http://localhost:8983/solr/gettingstarted/select?q=%7B!mlt%20qf=title%7D978-0641723445
 
 causes java.lang.NullPointerException at
 org.apache.lucene.queries.mlt.MoreLikeThis.retrieveTerms(MoreLikeThis.java:759)

 
 http://localhost:8983/solr/gettingstarted/select?q={!mlt%20qf=cat}978-0641723445
 http://localhost:8983/solr/gettingstarted/select?q=%7B!mlt%20qf=cat%7D978-0641723445
 
 actually gives results

 
 http://localhost:8983/solr/gettingstarted/select?q={!mlt%20qf=author,cat}978-0641723445
 http://localhost:8983/solr/gettingstarted/select?q=%7B!mlt%20qf=author,cat%7D978-0641723445
 
 again causes Java.lang.NullPointerException at
 org.apache.lucene.queries.mlt.MoreLikeThis.retrieveTerms(MoreLikeThis.java:759)


 I guess the actual question is, how am I supposed to use the handler to
 replicate behavior of non-distributed mlt that was formerly used with
 qt=morelikethis and the following configuration in solrconfig.xml:

   requestHandler name=morelikethis class=solr.MoreLikeThisHandler
 lst name=defaults
   str
 name=mlt.fltitle,title_short,callnumber-label,topic,language,author,publishDate/str
   str name=mlt.qf
 title^75
 title_short^100
 callnumber-label^400
 topic^300
 language^30
 author^75
 publishDate
   /str
   int name=mlt.mintf1/int
   int name=mlt.mindf1/int
   str name=mlt.boosttrue/str
   int name=mlt.count5/int
   int name=rows5/int
 /lst
   /requestHandler

 Real-life full schema and config can be found at 
 https://github.com/NatLibFi/NDL-VuFind-Solr/tree/master/vufind/biblio/conf
 .

 --Ere

 --
 Ere Maijala
 Kansalliskirjasto / The National Library of Finland




 --
 Anshum Gupta




-- 
Anshum Gupta

Highlighting

2015-04-17 Thread Misagh Karimi



Hello All,
I am new to solr and trying to configure highlighting. If I look at the 
result in xml, or json format, I can see the highlighting part of the 
data and it looks good. However the velocity page does not show the 
highlighted words on my result page. Do I need to do something extra for 
the highlighting results to show up on the page that is generated by 
Velocity?


 Here is my hl setting in solrconfig.xml:
str name=hlon/str
str name=hl.flseriesTitle/str
str name=f.name.hl.fragsize0/str
str name=f.name.hl.alternateFieldseriesTitle/str

Here is those fields in schema.xml:
field name=seriesTitle type=text indexed=true stored=true/

fieldType name=text class=solr.TextField positionIncrementGap=100 
autoGeneratePhraseQueries=true

  analyzer type=index
tokenizer class=solr.WhitespaceTokenizerFactory/
filter class=solr.StopFilterFactory
ignoreCase=true
words=stopwords.txt
  /
filter class=solr.WordDelimiterFilterFactory 
generateWordParts=1 generateNumberParts=1 catenateWords=1 
catenateNumbers=1 catenateAll=0 splitOnCaseChange=1/

filter class=solr.LowerCaseFilterFactory/
filter class=solr.KeywordMarkerFilterFactory 
protected=protwords.txt/

filter class=solr.PorterStemFilterFactory/
  /analyzer
  analyzer type=query
tokenizer class=solr.WhitespaceTokenizerFactory/
filter class=solr.SynonymFilterFactory 
synonyms=synonyms.txt ignoreCase=true expand=true/

filter class=solr.StopFilterFactory
ignoreCase=true
words=stopwords.txt
/
filter class=solr.WordDelimiterFilterFactory 
generateWordParts=1 generateNumberParts=1 catenateWords=0 
catenateNumbers=0 catenateAll=0 splitOnCaseChange=1/

filter class=solr.LowerCaseFilterFactory/
filter class=solr.KeywordMarkerFilterFactory 
protected=protwords.txt/

filter class=solr.PorterStemFilterFactory/
  /analyzer
/fieldType

Thank you in advance.
--
Misagh Karimi

Multilevel nested level support using Solr

2015-04-17 Thread Steven White

Hi folks,

In my DB, my records are nested in a folder base hierarchy:

Root
Level_1
record_1
record_2
Level_2
record_3
record_4
Level_3
record_5
Level_1
Level_2
Level_3
record_6
record_7
record_8

You got the idea.

Is there anything in Solr that will let me preserve this structer and thus
when I'm searching to tell it in which level to narrow down the search?  I
have four search levels needs:

1) Be able to search inside only level: Root.Level_1.Level_2.* (and
everything under Level_2 from this path).

2) Be able to search inside a level regardless it's path: Level_2.* (no
matter where Level_2 is, i want to search on all records under Level_2
and everything under it's path.

3) Same as #1 but limit the search to within that level (nothing below its
level are searched).

4) Same as #3 but limit the search to within that level (nothing below its
level are searched).

I found this:
https://cwiki.apache.org/confluence/display/solr/Uploading+Data+with+Index+Handlers#UploadingDatawithIndexHandlers-NestedChildDocuments
but it looks like it supports one level only and requires the whole two
levels be updated even if 1 of the doc in the nest is updated.

Thanks

Steve

Re: JSON Facet Analytics API in Solr 5.1

2015-04-17 Thread Yonik Seeley

Does anyone have any thoughts on the current general structure of JSON facets?
The current general form of a facet command is:

facet_name : { facet_type : facet_args }

For example:

top_authors : { terms : {
  field : author,
  limit : 5,
}}

One alternative I considered in the past is having the type in the args:

top_authors : {
  type : terms,
  field : author,
  limit : 5
}

It's a flatter structure... probably better in some ways, but worse in
other ways.
Thoughts / preferences?

-Yonik


On Tue, Apr 14, 2015 at 4:30 PM, Yonik Seeley ysee...@gmail.com wrote:
 Folks, there's a new JSON Facet API in the just released Solr 5.1
 (actually, a new facet module under the covers too).

 It's marked as experimental so we have time to change the API based on
 your feedback.  So let us know what you like, what you would change,
 what's missing, or any other ideas you may have!

 I've just started the documentation for the reference guide (on our
 confluence wiki), so for now the best doc is on my blog:

 http://yonik.com/json-facet-api/
 http://yonik.com/solr-facet-functions/
 http://yonik.com/solr-subfacets/

 I'll also be hanging out more on the #solr-dev IRC channel on freenode
 if you want to hit me up there about any development ideas.

 -Yonik

Re: Bad contentType for search handler :text/xml; charset=UTF-8

Not unless you provide a lot more details. Specifically, anything in your Solr
logs that looks suspicious _and_ in your container logs (Tomcat? Jetty?).

Plus the message you sent.

Please review:
http://wiki.apache.org/solr/UsingMailingLists

Best,
Erick

On Thu, Apr 16, 2015 at 10:57 PM, Pavel Hladik
pavel.hla...@profimedia.cz wrote:
 Hi,

 we have migrated Solr from 5.0 do 5.1 and we can't search now, we have a
 ERROR for SolrCore like in subject. I can't get any info through Google.

 Please, can someone help what is going on?

 Thanks,

 Pavel



 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/Bad-contentType-for-search-handler-text-xml-charset-UTF-8-tp4200314.html
 Sent from the Solr - User mailing list archive at Nabble.com.

Re: solr 4.8.0 update synonyms in zookeeper splitted files

On 4/17/2015 7:45 PM, Vincenzo D'Amore wrote:

Hi Shawn,

thanks for your answer.

I apologise for my english, for floating results I meant random results
in queries.

As far as I know, we should split the synonyms file because of zookeeper,
there is a limit in the size of files (1MB).
All my synonyms are about 10MB.

That's a very large synonyms file. If your synonyms happen at index
time, that might slow down indexing, and as I said before in my previous
reply, a full reindex would be required after updating the synonyms. If
your synonyms are at query time, a reindex wouldn't be required. Such a
large synonym file at query time could add noticeable time to query
parsing, because every term in the query would need to be checked
against every synonym.

Regarding the 1MB limit in zookeeper, you might find it more useful to
increase the limit instead of trying to use multiple files. Adding
-Djute.maxbuffer= to the java commandline on all Solr (Tomcat)
instances and all Zookeeper instances will increase this limit.

http://zookeeper.apache.org/doc/r3.4.6/zookeeperAdmin.html#Experimental+Options%2FFeatures

As a general rule, storing very large stuff in zookeeper is not
recommended, but synonyms will only be read when a core first starts up
or is reloaded, so I do not think it is a big problem in this case.

I have tried again in dev environment these steps:
1. put into zookeeper an updated synonym file sinonimi_freeling/sfak (added
just one new synonym )
2. reload the core using Core Admin UI

Then I started to receive random results executing a simple query like:

http://src-dev-3:8080/solr/0bis/select/?q=smartphonefl=*rows=24

There are random numFound in

result name=response numFound=641 start=0 maxScore=4.653946

and the order of documents vary.

If numFound is changing when you run the same query multiple times,
there is one of two things happening:

1) You have documents with the same uniqueKey value in more than one
shard. This can happen if you are using implicit (manual) document
routing for multiple shards.

2) Different replicas of your index have different settings (such as the
synonyms), or different documents in the index.Different settings can
happen if you update the config and then only reload/restart some of
your cores. Different documents in different replicas is usually an
indication of a bug, or something going very wrong, such as OutOfMemory
errors.

Thanks,
Shawn

Solr Performance with Ram size variation

2015-04-17 Thread Kamal Kishore Aggarwal

Hi,

As per this article, the linux machine is preferred to have 1.5 times RAM
with respect to index size. So, to verify this, I tried testing the solr
performance in different volumes of RAM allocation keeping other
configuration (i.e Solid State Drives, 8 core processor, 64-Bit) to be same
in both the cases. I am using solr 4.8.1 with tomcat server.

https://wiki.apache.org/solr/SolrPerformanceProblems

1) Initially, the linux machine had 32 GB RAM, out of which I allocated
14GB to solr.

export CATALINA_OPTS=-Xms2048m -Xmx14336m -XX:+UseConcMarkSweepGC
-XX:+PrintGCApplicationStoppedTime -XX:+PrintGCDetails
-XX:+PrintGCTimeStamps -Xloggc:./logs/info_error/tomcat_gcdetails.log

The average search time for 1000 queries 300ms.

2) After that, RAM was increased to 68 GB, out of which I allocated 40GB to
Solr. Now, on a strange note, the average search time for the same set of
queries was 3000ms.

Now, after this, I reduced solr allocated RAM to 25GB on 68GB machine. But,
still the search time was higher as compared to first case.

What am I missing. Please suggest.

Re: Solr Performance with Ram size variation

2015-04-17 Thread Puneet Pawaia

Hi,
This may be irrelevant but your machine configuration reminded me of some
reading I had done some time back on memory vs ssd.
Do a search on solr ssd and you should get some meaningful posts.
Like this one https://sbdevel.wordpress.com/2013/06/06/memory-is-overrated/

Regards
Puneet
On 18 Apr 2015 07:45, Kamal Kishore Aggarwal kkroyal@gmail.com
wrote:

 Hi,

 As per this article, the linux machine is preferred to have 1.5 times RAM
 with respect to index size. So, to verify this, I tried testing the solr
 performance in different volumes of RAM allocation keeping other
 configuration (i.e Solid State Drives, 8 core processor, 64-Bit) to be same
 in both the cases. I am using solr 4.8.1 with tomcat server.

 https://wiki.apache.org/solr/SolrPerformanceProblems

 1) Initially, the linux machine had 32 GB RAM, out of which I allocated
 14GB to solr.

 export CATALINA_OPTS=-Xms2048m -Xmx14336m -XX:+UseConcMarkSweepGC
 -XX:+PrintGCApplicationStoppedTime -XX:+PrintGCDetails
 -XX:+PrintGCTimeStamps -Xloggc:./logs/info_error/tomcat_gcdetails.log

 The average search time for 1000 queries 300ms.

 2) After that, RAM was increased to 68 GB, out of which I allocated 40GB to
 Solr. Now, on a strange note, the average search time for the same set of
 queries was 3000ms.

 Now, after this, I reduced solr allocated RAM to 25GB on 68GB machine. But,
 still the search time was higher as compared to first case.

 What am I missing. Please suggest.

Re: Solr Performance with Ram size variation

2015-04-17 Thread Otis Gospodnetic

Hi,

Because you went over 31-32 GB heap you lost the benefit of compressed
pointers and even though you gave the JVM more memory the GC may have had
to work harder.  This is a relatively well educated guess, which you can
confirm if you run tests and look at GC counts, times, JVM heap memory pool
utilization, etc.

Otis
--
Monitoring * Alerting * Anomaly Detection * Centralized Log Management
Solr  Elasticsearch Support * http://sematext.com/


On Fri, Apr 17, 2015 at 10:14 PM, Kamal Kishore Aggarwal 
kkroyal@gmail.com wrote:

 Hi,

 As per this article, the linux machine is preferred to have 1.5 times RAM
 with respect to index size. So, to verify this, I tried testing the solr
 performance in different volumes of RAM allocation keeping other
 configuration (i.e Solid State Drives, 8 core processor, 64-Bit) to be same
 in both the cases. I am using solr 4.8.1 with tomcat server.

 https://wiki.apache.org/solr/SolrPerformanceProblems

 1) Initially, the linux machine had 32 GB RAM, out of which I allocated
 14GB to solr.

 export CATALINA_OPTS=-Xms2048m -Xmx14336m -XX:+UseConcMarkSweepGC
 -XX:+PrintGCApplicationStoppedTime -XX:+PrintGCDetails
 -XX:+PrintGCTimeStamps -Xloggc:./logs/info_error/tomcat_gcdetails.log

 The average search time for 1000 queries 300ms.

 2) After that, RAM was increased to 68 GB, out of which I allocated 40GB to
 Solr. Now, on a strange note, the average search time for the same set of
 queries was 3000ms.

 Now, after this, I reduced solr allocated RAM to 25GB on 68GB machine. But,
 still the search time was higher as compared to first case.

 What am I missing. Please suggest.

Re: Enrich search results with external data

2015-04-17 Thread Sujit Pal

Hi Ha,

Yes, I think if you want to facet on the external field, the custom
component seems to be the best option IMO.

-sujit

On Fri, Apr 17, 2015 at 3:02 PM, ha.p...@arvatosystems.com wrote:

 Hi  Sujit,



 Many thanks for your blog post, responding to my question, and suggesting
 the alternative option ☺



 I think I prefer your approach because we can supply our own Comparator.
 The reason is that we need to meet some strict requirements: we can only
 call the external system once to retrieve extra fields (price, inventory,
 etc.) for probably a subset of the search result.  Therefore we need to be
 able to sort and facet on the list of items that some of them may not have
 external fields. I think using the Comparator would help with the sorting
 but let me know if you have different ideas.



 Do you have suggestion how we should deal with the facet requirement? I am
 thinking about adding another Facet Component that will be executed after
 the standard FacetComponent. Let me know if you think we should consider
 other options.



 Thanks,



 -Ha



 -Original Message-

 From: sujitatgt...@gmail.com [mailto:sujitatgt...@gmail.com] On Behalf Of
 Sujit Pal

 Sent: Saturday, April 11, 2015 10:23 AM

 To: solr-user@lucene.apache.org; Ahmet Arslan

 Subject: Re: Enrich search results with external data



 Hi Ha,



 I am the author of the blog post you mention. To your question, I don't
 know if the code will work without change (since the Lucene/Solr API has
 evolved so much over the last few years), but a more preferred way using
 Function Queries way may be found in slides for Timothy Potter's talk here:


 http://www.slideshare.net/thelabdude/boosting-documents-in-solr-lucene-revolution-2011



 Here he speaks of external fields stored in a database and accessed using
 a custom component (rather than from a flat file as in ExternalFieldField),
 and using function queries to influence the ranking based on the external
 field. However, per this document on function queries, you can use the
 output of a function query to sort as well by passing the function to the
 sort parameter.

 https://wiki.apache.org/solr/FunctionQuery#Sort_By_Function



 Hope this helps,

 Sujit





 On Fri, Apr 10, 2015 at 10:38 PM, Ahmet Arslan iori...@yahoo.com.invalid

 wrote:



  Hi,

 

  Who don't you include/add/index those additional fields, at least the

  one used in sorting?

 

  Also, you may find

  https://stanbol.apache.org/docs/trunk/components/enhancer/ relevant.

 

  Ahmet

 

 

 

  On Saturday, April 11, 2015 1:04 AM, ha.p...@arvatosystems.com 

  ha.p...@arvatosystems.com wrote:

  This ticket seems to address the problem I have

 

  https://issues.apache.org/jira/browse/SOLR-1566

 

 

 

  and as the result of that ticket, DocTransformer is added since Solr 4.0.

  I wrote a simple DocTransformer and found that the transformer is

  executed AFTER pagination. In our application, we need the external

  fields added before sorting/pagination. I've looked around for the

  option to change the execution order but haven't had any luck. Does
 anyone know the solution?

 

 

 

  The ticket also states it is not possible for components to add

  fields to outgoing documents which are not in the stored fields of the
 document.

  Does anyone know if this is still true?

 

 

 

  Thanks,

 

 

 

  -Ha

 

 

 

 

 

  -Original Message-

 

  From: Pham, Ha

 

  Sent: Thursday, April 09, 2015 11:41 PM

 

  To: solr-user@lucene.apache.org

 

 

  Subject: Enrich search results with external data

 

 

 

  Hi everyone,

 

 

 

  We have a requirement to append external data (e.g. price/inventory of

  product, retrieved from an ERP via web services) to query result and

  support sorting and pagination based on those external fields. For

  example if Solr returns 100 records and the page size user selects is

  20, the sorting on the external  fields is still on 100 records. This

  limits us from enriching search results outside of Solr. I guess this

  is a common problem so hopefully someone could share their experience.

 

 

 

  I am considering using a PostFilter and enrich documents in collect()

  method as below

 

 

 

  @Override

 

  public void collect(int docId) throws IOException { DoubleField price

  = new DoubleField (PRICE, 1.23, Field.Store.YES); Document

  currentDoc = context.reader().document(docId); currentDoc.add(price);

  }

 

 

 

  but the result documents don't have PRICE fields. Did I miss anything
 here?

 

 

 

  I also did some research and it seems the approach mentioned here

  http://sujitpal.blogspot.com/2011/05/custom-sorting-in-solr-using-exte

  rnal.html is close to what we need to achieve but since the document

  is 4 years old, I don't know if there's a better approach for our

  problem (we are using solr 5.0)?

 

 

 

  Thanks,

 

 

 

  -Ha

Re: MoreLikeThis (mlt) in sharded SolrCloud

2015-04-17 Thread Anshum Gupta

The other issue that would fix half of your problems is:
https://issues.apache.org/jira/browse/SOLR-7143

On Fri, Apr 17, 2015 at 4:35 PM, Anshum Gupta ans...@anshumgupta.net
wrote:

 Ah, I meant SOLR-7418 https://issues.apache.org/jira/browse/SOLR-7418.

 On Fri, Apr 17, 2015 at 4:30 PM, Anshum Gupta ans...@anshumgupta.net
 wrote:

 Hi Ere,

 Those seem like valid issues. I've created an issue : SOLR-7275
 https://issues.apache.org/jira/browse/SOLR-7275 and will create more
 as I find more of those.
 I plan to get to them and fix over the weekend.

 On Wed, Apr 15, 2015 at 5:13 AM, Ere Maijala ere.maij...@helsinki.fi
 wrote:

 Hi,

 I'm trying to gather information on how mlt works or is supposed to work
 with SolrCloud and a sharded collection. I've read issues SOLR-6248,
 SOLR-5480 and SOLR-4414, and docs at 
 https://wiki.apache.org/solr/MoreLikeThis, but I'm still struggling
 with multiple issues. I've been testing with Solr 5.1 and the Getting
 Started sample cloud. So, with a freshly extracted Solr, these are the
 steps I've done:

 bin/solr start -e cloud -noprompt
 bin/post -c gettingstarted docs/
 bin/post -c gettingstarted example/exampledocs/books.json

 After this I've tried different variations of queries with limited
 success:

 http://localhost:8983/solr/gettingstarted/select?q={!mlt}non-existing
 causes java.lang.NullPointerException at
 org.apache.solr.search.mlt.CloudMLTQParser.parse(CloudMLTQParser.java:80)

 http://localhost:8983/solr/gettingstarted/select?q={!mlt}978-0641723445
 
 causes java.lang.NullPointerException at
 org.apache.solr.search.mlt.CloudMLTQParser.parse(CloudMLTQParser.java:84)

 
 http://localhost:8983/solr/gettingstarted/select?q={!mlt%20qf=title}978-0641723445
 http://localhost:8983/solr/gettingstarted/select?q=%7B!mlt%20qf=title%7D978-0641723445
 
 causes java.lang.NullPointerException at
 org.apache.lucene.queries.mlt.MoreLikeThis.retrieveTerms(MoreLikeThis.java:759)

 
 http://localhost:8983/solr/gettingstarted/select?q={!mlt%20qf=cat}978-0641723445
 http://localhost:8983/solr/gettingstarted/select?q=%7B!mlt%20qf=cat%7D978-0641723445
 
 actually gives results

 
 http://localhost:8983/solr/gettingstarted/select?q={!mlt%20qf=author,cat}978-0641723445
 http://localhost:8983/solr/gettingstarted/select?q=%7B!mlt%20qf=author,cat%7D978-0641723445
 
 again causes Java.lang.NullPointerException at
 org.apache.lucene.queries.mlt.MoreLikeThis.retrieveTerms(MoreLikeThis.java:759)


 I guess the actual question is, how am I supposed to use the handler to
 replicate behavior of non-distributed mlt that was formerly used with
 qt=morelikethis and the following configuration in solrconfig.xml:

   requestHandler name=morelikethis class=solr.MoreLikeThisHandler
 lst name=defaults
   str
 name=mlt.fltitle,title_short,callnumber-label,topic,language,author,publishDate/str
   str name=mlt.qf
 title^75
 title_short^100
 callnumber-label^400
 topic^300
 language^30
 author^75
 publishDate
   /str
   int name=mlt.mintf1/int
   int name=mlt.mindf1/int
   str name=mlt.boosttrue/str
   int name=mlt.count5/int
   int name=rows5/int
 /lst
   /requestHandler

 Real-life full schema and config can be found at 
 https://github.com/NatLibFi/NDL-VuFind-Solr/tree/master/vufind/biblio/conf
 .

 --Ere

 --
 Ere Maijala
 Kansalliskirjasto / The National Library of Finland




 --
 Anshum Gupta




 --
 Anshum Gupta




-- 
Anshum Gupta

Re: SolrCloud 4.8.0 upgrade

Great!! Thank you very much.

On Fri, Apr 17, 2015 at 7:36 PM, Erick Erickson erickerick...@gmail.com
wrote:

 Solr/Lucene are supposed to _always_ read one major version back. Thus
 your 4.10 should be able to read indexes produced all the way back to
 (and including) 3.x. Sometimes experimental formats are excepted.

 In your case you should be fine since you're upgrading from 4.8..

 As always, though, I'd recommend copying your indexes someplace just
 to be paranoid before upgrading.

 Best,
 Erick

 On Fri, Apr 17, 2015 at 10:28 AM, Vincenzo D'Amore v.dam...@gmail.com
 wrote:
  Thanks for your answers, I looked at changes and we don't use
  DocValuesFormat.
  The question is, if I upgrade the SolrCloud version to 4.10, should I
  reload entirely all documents?
  Is there a binary compatibility between these two versions reading the
  solar home?
 
  On Fri, Apr 17, 2015 at 7:04 PM, Erick Erickson erickerick...@gmail.com
 
  wrote:
 
  Look at CHANGES.txt for both Lucene and Solr, there's always an
  upgrading section for each release.
 
  Best,
  Erick
 
  On Fri, Apr 17, 2015 at 5:31 AM, Toke Eskildsen t...@statsbiblioteket.dk
 
  wrote:
   Vincenzo D'Amore v.dam...@gmail.com wrote:
   I have a SolrCloud cluster with 3 server, I would like to use
  stats.facet,
   but this feature is available only if I upgrade to 4.10.
  
   May I simply redeploy new solr cloud version in tomcat or should
 reload
  all
   the documents?
   There are other drawbacks?
  
   Support for the Disk-format for DocValues was removed after 4.8, so
 you
  should check if you use that: DocValuesFormat=Disk for the field in
 the
  schema, if I remember correctly.
  
   - Toke Eskildsen
 
 
 
 
  --
  Vincenzo D'Amore
  email: v.dam...@gmail.com
  skype: free.dev
  mobile: +39 349 8513251




-- 
Vincenzo D'Amore
email: v.dam...@gmail.com
skype: free.dev
mobile: +39 349 8513251

Re: solr 4.8.0 update synonyms in zookeeper splitted files