Re: Test harness cannot load existing index data in Solr 4.2
I think the problem is that EmbeddedSolrServer can't load existing index data. Could any committer help confirm whether it's a bug or not? Thank you. Kane

On Mon, Apr 15, 2013 at 7:28 PM, zhu kane kane...@gmail.com wrote: I'm extending Solr's *AbstractSolrTestCase* for unit testing. I have existing 'schema.xml', 'solrconfig.xml' and index data. I want to start an embedded Solr server to load an existing collection and its data, then test searching for docs in Solr. This approach works well in Solr 3.6; however, it no longer works after adapting to Solr 4.2.1. After some investigation, it looks like the index data is not loaded by the SolrCore created by the test harness. This can also be reproduced using the index built from Solr's example docs; I posted the detailed test class in my Stack Overflow question [1]. Is it a bug in the test harness? Or is there a better way to load existing index data in a unit test? Thanks. [1] http://stackoverflow.com/questions/15947116/solr-4-2-test-harness-can-not-load-existing-index-data Mengxin Zhu
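(Editor's note for readers hitting the same wall: a minimal sketch of loading an existing index with an embedded server outside the test harness, assuming the Solr 4.2-era CoreContainer API; the solr home path and core name are illustrative, not from the original post.)

    import java.io.File;
    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.embedded.EmbeddedSolrServer;
    import org.apache.solr.core.CoreContainer;

    public class ExistingIndexSmokeTest {
        public static void main(String[] args) throws Exception {
            // Solr home containing solr.xml and a core dir with conf/ plus the existing data/ index
            String solrHome = "/path/to/solr/home"; // illustrative path
            CoreContainer container = new CoreContainer();
            container.load(solrHome, new File(solrHome, "solr.xml"));
            EmbeddedSolrServer server = new EmbeddedSolrServer(container, "collection1");
            // If the existing index was picked up, numFound reflects the pre-existing documents
            System.out.println(server.query(new SolrQuery("*:*")).getResults().getNumFound());
            container.shutdown();
        }
    }

If this prints 0 against a non-empty data dir, how the core resolves its dataDir is a good first place to look.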
Re: Solr metrics in Codahale metrics and Graphite?
Hello Walter, Have you had a chance to get something working with Graphite, Codahale and Solr? Has anyone else tried these tools with the Solr 3.x family? How much work is it to set things up? We have tried Zabbix in the past. Even though it required a lot of up-front investment in configuration, it looks like a compelling option. In the meantime, we are looking into something more Solr-tailored yet simple, even without metrics persistence. Tried: jconsole and viewing stats via JMX. The main point for us now is to gather the RAM usage. Dmitry

On Tue, Apr 9, 2013 at 9:43 PM, Walter Underwood wun...@wunderwood.org wrote: If it isn't obvious, I'm glad to help test a patch for this. We can run a simulated production load in dev and report to our metrics server. wunder

On Apr 8, 2013, at 1:07 PM, Walter Underwood wrote: That approach sounds great. --wunder

On Apr 7, 2013, at 9:40 AM, Alan Woodward wrote: I've been thinking about how to improve this reporting, especially now that metrics-3 (which removes all of the funky thread issues we ran into last time I tried to add it to Solr) is close to release. I think we could go about it as follows: * refactor the existing JMX reporting to use metrics-3. This would mean replacing the SolrCore.infoRegistry map with a MetricsRegistry, and adding a JmxReporter, keeping the existing config logic to determine which JMX server to use. PluginInfoHandler and SolrMBeanInfoHandler translate the metrics-3 data back into SolrMBean format to keep the reporting backwards-compatible. This seems like a lot of work for no visible benefit, but... * we can then add the ability to define other metrics reporters in solrconfig.xml. There are already reporters for Ganglia and Graphite - you just add them to the Solr lib/ directory, configure them in solrconfig, and voila - Solr can be monitored using the same devops tools you use to monitor everything else. Does this sound sane? Alan Woodward www.flax.co.uk

On 6 Apr 2013, at 20:49, Walter Underwood wrote: Wow, that really doesn't help at all, since these seem to only be reported on the stats page. I don't need another non-standard, app-specific set of metrics, especially one that needs polling. I need metrics delivered to the common system that we use for all our servers. This is also why SPM is not useful for us, sorry Otis. Also, there is no time period on these stats. How do you graph the 95th percentile? I know there was a lot of work on these, but they seem really useless to me. I'm picky about metrics; working at Netflix does that to you. wunder

On Apr 3, 2013, at 4:01 PM, Walter Underwood wrote: In the Jira, but not in the docs. It would be nice to have VM stats like GC, too, so we can have common monitoring and alerting on all our services. wunder

On Apr 3, 2013, at 3:31 PM, Otis Gospodnetic wrote: It's there! :) http://search-lucene.com/?q=percentile&fc_project=Solr&fc_type=issue Otis -- Solr & ElasticSearch Support http://sematext.com/

On Wed, Apr 3, 2013 at 6:29 PM, Walter Underwood wun...@wunderwood.org wrote: That sounds great. I'll check out the bug; I didn't see anything in the docs about this. And if I can't find it with a search engine, it probably isn't there. --wunder

On Apr 3, 2013, at 6:39 AM, Shawn Heisey wrote: On 3/29/2013 12:07 PM, Walter Underwood wrote: What are folks using for this? I don't know that this really answers your question, but Solr 4.1 and later includes a big chunk of Codahale metrics internally for request handler statistics - see SOLR-1972.
First we tried including the jar and using the API, but that created thread-leak problems, so the source code was added instead. Thanks, Shawn -- Walter Underwood wun...@wunderwood.org
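(Editor's note: to make Alan's metrics-3 + Graphite idea concrete, here is a minimal standalone sketch of the Codahale metrics-3 reporter wiring; the Graphite host and metric names are illustrative, and this is not Solr integration code.)

    import java.net.InetSocketAddress;
    import java.util.concurrent.TimeUnit;
    import com.codahale.metrics.MetricRegistry;
    import com.codahale.metrics.Timer;
    import com.codahale.metrics.graphite.Graphite;
    import com.codahale.metrics.graphite.GraphiteReporter;

    public class GraphiteReportingSketch {
        public static void main(String[] args) {
            MetricRegistry registry = new MetricRegistry();
            Timer requests = registry.timer("requests"); // e.g. per-handler request timings

            // Graphite's plaintext protocol usually listens on port 2003
            Graphite graphite = new Graphite(new InetSocketAddress("graphite.example.com", 2003));
            GraphiteReporter reporter = GraphiteReporter.forRegistry(registry)
                    .prefixedWith("solr.core1") // illustrative metric prefix
                    .convertRatesTo(TimeUnit.SECONDS)
                    .convertDurationsTo(TimeUnit.MILLISECONDS)
                    .build(graphite);
            reporter.start(1, TimeUnit.MINUTES); // push all registered metrics every minute

            Timer.Context ctx = requests.time();
            // ... do the work being measured ...
            ctx.stop();
        }
    }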
Re: Solr Cloud 4.2 - Distributed Requests failing with NPE
Just tried the same queries with the 'example' in solr 4.2 build and getting same issue:

http://localhost:8983/solr/collection1/select?q=*%3A*&wt=json&indent=true&shards=localhost:7574/solr/collection1

trace: java.lang.NullPointerException
	at org.apache.solr.handler.component.HttpShardHandler.checkDistributed(HttpShardHandler.java:340)
	at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:182)
	at ...

These are working fine:

http://localhost:8983/solr/collection1/select?q=*%3A*&wt=json&indent=true&shards=shard1
http://localhost:8983/solr/collection1/select?q=*%3A*&wt=json&indent=true&shards=shard2

Here is the cluster state:

{"collection1":{
    "shards":{
      "shard1":{
        "range":"8000-",
        "state":"active",
        "replicas":{"10.88.160.145:8983_solr_collection1":{
            "shard":"shard1",
            "state":"active",
            "core":"collection1",
            "collection":"collection1",
            "node_name":"10.88.160.145:8983_solr",
            "base_url":"http://10.88.160.145:8983/solr",
            "leader":"true"}}},
      "shard2":{
        "range":"0-7fff",
        "state":"active",
        "replicas":{"10.88.160.145:7574_solr_collection1":{
            "shard":"shard2",
            "state":"active",
            "core":"collection1",
            "collection":"collection1",
            "node_name":"10.88.160.145:7574_solr",
            "base_url":"http://10.88.160.145:7574/solr",
            "leader":"true"}}}},
    "router":"compositeId"}}

Thanks, Sudhakar.

On Mon, Apr 22, 2013 at 12:52 PM, Sudhakar Maddineni maddineni...@gmail.com wrote: Hi, We recently upgraded our solr version from 4.1 to 4.2 and started seeing below exceptions when running distributed queries: Any idea what we are missing here -

http://solr-host-1:8080/solr/core1/select?q=*%3A*&wt=json&indent=true&shards=solr-host-1:8080/solr/core1
http://solr-host-1:8080/solr/core1/select?q=*%3A*&wt=json&indent=true&shards=solr-host-2:8080/solr/core1
http://solr-host-1:8080/solr/core1/select?q=*%3A*&wt=json&indent=true&shards=solr-host-3:8080/solr/core1

error:{ trace:
java.lang.NullPointerException
	at org.apache.solr.handler.component.HttpShardHandler.checkDistributed(HttpShardHandler.java:340)
	at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:182)
	at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
	at org.apache.solr.core.SolrCore.execute(SolrCore.java:1797)
	at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:637)
	at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:343)
	at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:141)
	at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
	at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
	at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
	at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
	at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:470)
	at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
	at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
	at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
	at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:293)
	at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:859)
	at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:602)
	at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:489)
	at java.lang.Thread.run(Unknown Source)
, code:500}}

Thanks, Sudhakar.
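(Editor's note: for reference, the logical-shard form that works above also works from SolrJ; a minimal sketch against the example setup, with the host/core taken from the thread. This only sidesteps the NPE, it is not a fix for it.)

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.impl.HttpSolrServer;
    import org.apache.solr.client.solrj.response.QueryResponse;

    public class ShardsParamSketch {
        public static void main(String[] args) throws Exception {
            HttpSolrServer solr = new HttpSolrServer("http://localhost:8983/solr/collection1");
            SolrQuery q = new SolrQuery("*:*");
            // Logical shard ids work; explicit host:port/solr/collection1 values hit the NPE above
            q.set("shards", "shard1,shard2");
            QueryResponse rsp = solr.query(q);
            System.out.println(rsp.getResults().getNumFound());
            solr.shutdown();
        }
    }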
Re: Average Solr Server Spec.
Walter, Can you share the document count / index size for this shard? Even though these are not decisive parameters, they suit the data-points comparison :)

On Tue, Apr 9, 2013 at 9:00 PM, Walter Underwood wun...@wunderwood.org wrote: We mostly run m1.xlarge with an 8GB heap. --wunder

On Apr 9, 2013, at 10:57 AM, Otis Gospodnetic wrote: Hi, You are right, there is no average. I saw a Solr cluster with a few EC2 micro instances yesterday, and regularly see Solr running on 16 or 32 GB RAM and sometimes well over 100 GB RAM. Sometimes they have just 2 CPU cores, sometimes 32 or more. Some use SSDs, some HDDs, some local storage, some SAN, some EBS on AWS, etc. Otis -- Solr & ElasticSearch Support http://sematext.com/

On Tue, Apr 9, 2013 at 7:04 AM, Furkan KAMACI furkankam...@gmail.com wrote: This question may not have a general answer and may be open-ended, but is there any commodity server spec for a usual machine running Solr? I mean, what is the average server specification for a Solr machine? (e.g. for a system running Hadoop it is not recommended to have computers with very big storage capacity.) I will use Solr for indexing web-crawled data.
Re: Where to use replicationFactor and maxShardsPerNode at SolrCloud?
Thanks for the answers. 2013/4/23 Erick Erickson erickerick...@gmail.com bq: However what will happen to those 10 nodes when I specify a replication factor? I think they just sit around doing nothing. Best, Erick

On Mon, Apr 22, 2013 at 7:24 AM, Furkan KAMACI furkankam...@gmail.com wrote: Sorry, but if I have 10 shards and a collection with a replication factor of 1, and I start up 30 nodes, what happens to the last 10 nodes? I mean: 10 nodes as leaders, 10 nodes as replicas. If I don't specify a replication factor, a round-robin system would assign the other 10 machines as: + 10 nodes as replicas. However, what will happen to those 10 nodes when I specify a replication factor?

2013/4/22 Erick Erickson erickerick...@gmail.com 1) Imagine you have lots and lots and lots of different Solr indexes and a 50-node cluster. Further imagine that one of those indexes has 2 shards, and a leader + one replica is adequate to handle the load. You need some way to limit the number of nodes your index gets distributed to; that's what replicationFactor is for. So in this case replicationFactor=2 will stop assigning nodes to that particular collection after there's a leader + 1 replica. 2) In the system you described, there won't be more than one shard per node. But one strategy for growth is to overshard. That is, in the early days you put (numbers from thin air) 10 shards/node and they are all quite small. As your index grows, you move to two nodes with 5 shards each, and later to 5 nodes with 2 shards, and so on. There are cases where you want some way to make the most of your hardware yet plan for expansion. Best, Erick

On Sun, Apr 21, 2013 at 3:51 PM, Furkan KAMACI furkankam...@gmail.com wrote: I know that when using SolrCloud we define the number of shards in the system. When we start up new Solr instances, each one will be a leader for a shard, and if I continue to start up new Solr instances (exceeding the number of shards) each one will become a replica for a leader in a round-robin process. However, when I read the wiki there are two parameters: replicationFactor and maxShardsPerNode. 1) Can you give details about what they are? If all newly added Solr instances become replicas, what is the replication factor for? 2) If what I wrote is true about that round-robin process, what is maxShardsPerNode? How can there be more than one shard per node in the system I described?
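(Editor's note: for reference, both parameters are normally passed when the collection is created via the Collections API; a sketch with illustrative names and numbers:

http://localhost:8983/solr/admin/collections?action=CREATE&name=mycollection&numShards=10&replicationFactor=2&maxShardsPerNode=2

With 30 nodes, numShards=10 and replicationFactor=2 as in the scenario above, 20 nodes receive a core - 10 leaders plus 10 replicas - and the remaining 10 sit idle for this collection, as Erick describes.)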
Re: Too many close, count -1
Hoss, I use Solr as a SolrCloud cluster; the main features I use are faceting, to do some analytics, and normal queries to do free-text search and retrieve data using filters. I don't use any custom or contrib plugins. At the moment I'm importing my data from MySQL to Solr; I don't use DIH, I use a custom mechanism instead. In this import I don't do hard or soft commits; I leave that responsibility to Solr. I don't know if this info is useful, but I get a lot of:

WARNING: [XXX] PERFORMANCE WARNING: Overlapping onDeckSearchers=2

The cluster is formed by about a thousand collections; I have a collection for each client. My solrconfig:

<config>
  <luceneMatchVersion>LUCENE_40</luceneMatchVersion>
  <directoryFactory name="DirectoryFactory" class="${solr.directoryFactory:solr.StandardDirectoryFactory}"/>
  <indexConfig>
    <ramBufferSizeMB>256</ramBufferSizeMB>
    <mergeFactor>20</mergeFactor>
    <mergeScheduler class="org.apache.lucene.index.ConcurrentMergeScheduler"/>
    <lockType>native</lockType>
    <!-- Commit Deletion Policy: Custom deletion policies can be specified here.
         The class must implement org.apache.lucene.index.IndexDeletionPolicy.
         http://lucene.apache.org/java/3_5_0/api/core/org/apache/lucene/index/IndexDeletionPolicy.html
         The default Solr IndexDeletionPolicy implementation supports deleting
         index commit points on number of commits, age of commit point and
         optimized status. The latest commit point should always be preserved
         regardless of the criteria. -->
    <!-- deletionPolicy class="solr.SolrDeletionPolicy" -->
    <!-- The number of commit points to be kept -->
    <!-- <str name="maxCommitsToKeep">1</str> -->
    <!-- The number of optimized commit points to be kept -->
    <!-- <str name="maxOptimizedCommitsToKeep">0</str> -->
    <!-- Delete all commit points once they have reached the given age.
         Supports DateMathParser syntax e.g. -->
    <!-- <str name="maxCommitAge">30MINUTES</str> -->
    <str name="maxCommitAge">60MINUTES</str>
    <!-- /deletionPolicy -->
    <!-- Lucene Infostream: To aid in advanced debugging, Lucene provides an
         InfoStream of detailed information when indexing. Setting the value to
         true will instruct the underlying Lucene IndexWriter to write its
         debugging info to the specified file -->
    <!-- <infoStream file="INFOSTREAM.txt">false</infoStream> -->
  </indexConfig>
  <query>
    <!-- If true, stored fields that are not requested will be loaded lazily.
         This can result in a significant speed improvement if the usual case
         is to not load all stored fields, especially if the skipped fields are
         large compressed text fields. -->
    <enableLazyFieldLoading>true</enableLazyFieldLoading>
    <queryResultWindowSize>1000</queryResultWindowSize>
    <queryResultMaxDocsCached>3000</queryResultMaxDocsCached>
    <maxWarmingSearchers>2</maxWarmingSearchers>
    <useFilterForSortedQuery>true</useFilterForSortedQuery>
    <filterCache class="solr.FastLRUCache" size="2000" initialSize="1500" autowarmCount="750" cleanupThread="true"/>
    <queryResultCache class="solr.FastLRUCache" size="2000" initialSize="1500" autowarmCount="750" cleanupThread="true"/>
    <documentCache class="solr.FastLRUCache" size="2" initialSize="1" autowarmCount="0" cleanupThread="true"/>
  </query>
  <updateHandler class="solr.DirectUpdateHandler2">
    <updateLog>
      <str name="dir">${solr.data.dir:}</str>
    </updateLog>
    <!-- Commit documents definitions -->
    <autoCommit>
      <maxDocs>5000</maxDocs>
      <maxTime>1</maxTime>
    </autoCommit>
    <autoSoftCommit>
      <maxTime>2500</maxTime>
    </autoSoftCommit>
    <maxPendingDeletes>2</maxPendingDeletes>
  </updateHandler>
  <requestDispatcher handleSelect="false">
    <requestParsers enableRemoteStreaming="true" multipartUploadLimitInKB="10485760"/>
  </requestDispatcher>
  <requestHandler name="/select" class="solr.SearchHandler"/>
  <!-- request handler that returns indented JSON by default -->
  <requestHandler name="/query" class="solr.SearchHandler">
    <lst name="defaults">
      <str name="echoParams">explicit</str>
      <str name="wt">json</str>
      <str name="indent">true</str>
      <str name="df">text</str>
    </lst>
  </requestHandler>
  <!-- realtime get handler, guaranteed to return the latest stored fields of
       any document, without the need to commit or open a new searcher. The
       current implementation relies on the updateLog feature being enabled. -->
  <requestHandler name="/get" class="solr.RealTimeGetHandler">
    <lst name="defaults">
      <str name="omitHeader">true</str>
      <str name="wt">json</str>
      <str name="indent">false</str>
    </lst>
  </requestHandler>
  <requestHandler name="/admin/" class="solr.admin.AdminHandlers"/>
  <requestHandler name="standard" class="solr.StandardRequestHandler" default="true"/>
  <requestHandler name="/update" class="solr.UpdateRequestHandler"/>
  <requestHandler
Re: Error creating collection
The solr version is 4.2.1. Here the stack trace:

SEVERE: org.apache.solr.common.SolrException: Error CREATEing SolrCore 'XXX': Could not get shard_id for core: XXX coreNodeName:192.168.20.47:8983_solr_XXX
	at org.apache.solr.handler.admin.CoreAdminHandler.handleCreateAction(CoreAdminHandler.java:483)
	at org.apache.solr.handler.admin.CoreAdminHandler.handleRequestBody(CoreAdminHandler.java:140)
	at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
	at org.apache.solr.servlet.SolrDispatchFilter.handleAdminRequest(SolrDispatchFilter.java:591)
	at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:192)
	at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:141)
	at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243)
	at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210)
	at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:222)
	at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:123)
	at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:171)
	at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:99)
	at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:947)
	at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:118)
	at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:408)
	at org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1009)
	at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:589)
	at org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:310)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
	at java.lang.Thread.run(Unknown Source)
Caused by: org.apache.solr.common.SolrException: Could not get shard_id for core: XXX coreNodeName:192.168.20.47:8983_solr_XXX
	at org.apache.solr.cloud.ZkController.doGetShardIdProcess(ZkController.java:1221)
	at org.apache.solr.cloud.ZkController.preRegister(ZkController.java:1294)
	at org.apache.solr.core.CoreContainer.registerCore(CoreContainer.java:861)
	at org.apache.solr.core.CoreContainer.register(CoreContainer.java:841)
	at org.apache.solr.handler.admin.CoreAdminHandler.handleCreateAction(CoreAdminHandler.java:479)
	... 20 more

- Best regards -- View this message in context: http://lucene.472066.n3.nabble.com/Error-creating-collection-tp4057859p4058231.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: DocValues with docValuesFormat=Disk
Answering myself - adding this line in solrconfig.xml made it work:

<codecFactory name="CodecFactory" class="solr.SchemaCodecFactory"/>

On 4/23/13 3:42 PM, Abhishek Sanoujam wrote: Hi all, I am trying to experiment with DocValues (http://wiki.apache.org/solr/DocValues) and use the Disk docValuesFormat. Here's what my field type declaration looks like:

<fieldType name="stringDv" class="solr.StrField" sortMissingLast="true" omitNorms="true" docValuesFormat="Disk"/>

I don't even have any fields using that type. Also, I've updated solrconfig.xml with:

<luceneMatchVersion>LUCENE_42</luceneMatchVersion>

I am running solr-4.2.1. My Solr core is totally empty, and there is nothing in the data dir. I am getting this weird error while starting up the Solr core:

org.apache.solr.common.SolrException: FieldType 'stringDv' is configured with a docValues format, but the codec does not support it: class org.apache.solr.core.SolrCore$3
	at org.apache.solr.core.SolrCore.<init>(SolrCore.java:822)
	at org.apache.solr.core.SolrCore.<init>(SolrCore.java:618)
	at org.apache.solr.core.CoreContainer.createFromLocal(CoreContainer.java:1021)
	at org.apache.solr.core.CoreContainer.create(CoreContainer.java:1051)
	at org.apache.solr.core.CoreContainer$3.call(CoreContainer.java:634)
	at org.apache.solr.core.CoreContainer$3.call(CoreContainer.java:629)
	at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
	at java.util.concurrent.FutureTask.run(FutureTask.java:138)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439)
	at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
	at java.util.concurrent.FutureTask.run(FutureTask.java:138)
	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
	at java.lang.Thread.run(Thread.java:680)
Caused by: org.apache.solr.common.SolrException: FieldType 'stringDv' is configured with a docValues format, but the codec does not support it: class org.apache.solr.core.SolrCore$3
	at org.apache.solr.core.SolrCore.initCodec(SolrCore.java:870)
	at org.apache.solr.core.SolrCore.<init>(SolrCore.java:735)
	... 13 more

Apr 23, 2013 3:34:06 PM org.apache.solr.common.SolrException log
SEVERE: null:org.apache.solr.common.SolrException: Unable to create core: p5-upsShard-1
	at org.apache.solr.core.CoreContainer.recordAndThrow(CoreContainer.java:1672)
	at org.apache.solr.core.CoreContainer.create(CoreContainer.java:1057)
	at org.apache.solr.core.CoreContainer$3.call(CoreContainer.java:634)
	at org.apache.solr.core.CoreContainer$3.call(CoreContainer.java:629)
	at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
	at java.util.concurrent.FutureTask.run(FutureTask.java:138)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439)
	at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
	at java.util.concurrent.FutureTask.run(FutureTask.java:138)
	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
	at java.lang.Thread.run(Thread.java:680)
Caused by: org.apache.solr.common.SolrException: FieldType 'stringDv' is configured with a docValues format, but the codec does not support it: class org.apache.solr.core.SolrCore$3
	at org.apache.solr.core.SolrCore.<init>(SolrCore.java:822)
	at org.apache.solr.core.SolrCore.<init>(SolrCore.java:618)
	at org.apache.solr.core.CoreContainer.createFromLocal(CoreContainer.java:1021)
	at org.apache.solr.core.CoreContainer.create(CoreContainer.java:1051)
	... 10 more
Caused by: org.apache.solr.common.SolrException: FieldType 'stringDv' is configured with a docValues format, but the codec does not support it: class org.apache.solr.core.SolrCore$3
	at org.apache.solr.core.SolrCore.initCodec(SolrCore.java:870)
	at org.apache.solr.core.SolrCore.<init>(SolrCore.java:735)
	... 13 more

Is there any other config change that I need to do? I've read http://wiki.apache.org/solr/DocValues multiple times, but am unable to see any light to solve the problem. -- Cheers, Abhishek
DIH Abort does not close input file
Hi All, I'm using DIH with FileListEntityProcessor in order to index from XML files. If I perform a DIH request with command=abort, it seems that the XML file being processed by dataimport is not closed. When I try to delete it, I get an error message saying the file is opened by Apache Tomcat. Is this a known problem? I'm using Solr version 1.4.1.2010. Regards, Jean-Michel -- View this message in context: http://lucene.472066.n3.nabble.com/DIH-Abort-does-not-close-input-file-tp4058214.html Sent from the Solr - User mailing list archive at Nabble.com.
Complex Join Query
Is there any other enterprise search engine besides Solr that supports complex join queries, since Solr does not support them? As per my requirement, I need to run complex join queries that search over the document index or in main memory, as that is much faster than any disk-based database. Any help is appreciated. Regards, Ashim -- View this message in context: http://lucene.472066.n3.nabble.com/Complex-Join-Query-tp4058233.html Sent from the Solr - User mailing list archive at Nabble.com.
Memory Impact of a New Core
Hi all, We've got quite a lot of (mostly small) Solr cores in our Solr instance. They all share the same solrconfig.xml and schema.xml (only the data differs). I'm wondering how far I can go in terms of the number of cores. CPU is not an issue, but memory could be. Any idea/guideline about the impact of a new Solr core on a Solr instance? Thanks! Jerome. -- Jerome Eteve +44(0)7738864546 http://www.eteve.net/
dataimporthandler does not distribute documents on solr cloud
Hi, we run SolrCloud with 4 shards, and when we try to import the data using DataImportHandler, it does not distribute documents across all 4 shards. Thanks & Regards Montu v Boda -- View this message in context: http://lucene.472066.n3.nabble.com/dataimporthandler-does-not-distribute-documents-on-solr-cloud-tp4058248.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Memory Impact of a New Core
I'm not an expert, but to some extent I think it will come down to a few factors: * How much data is being cached per core. * If memory is an issue and you still want performance, I/O with a small cache could be an issue (SSDs?) * Soft commits, which imply open searchers per soft commit (and per core), will depend on caches. I do believe in the end it will be a direct result of your caching and I/O. If all you care about is performance, caching (memory) could be replaced with faster I/O, though soft commits will be fragile with respect to memory due to their nature (they depend on caching/memory and low I/O usage). Hope I made sense; I probably tried too many points of view in a single idea. Guido. On 23/04/13 11:50, Jérôme Étévé wrote: Hi all, We've got quite a lot of (mostly small) Solr cores in our Solr instance. They all share the same solrconfig.xml and schema.xml (only the data differs). I'm wondering how far I can go in terms of the number of cores. CPU is not an issue, but memory could be. Any idea/guideline about the impact of a new Solr core on a Solr instance? Thanks! Jerome.
Re: Memory Impact of a New Core
Thanks! Yeah, I know about the caching/commit things. My question is more about the impact of the pure creation of a Solr core, independently of its usage memory requirements (like caches and stuff). From the experiments I did using JMX, it's not measurable, but I might be wrong. On 23 April 2013 12:25, Guido Medina guido.med...@temetra.com wrote: I'm not an expert, but to some extent I think it will come down to a few factors: * How much data is being cached per core. * If memory is an issue and you still want performance, I/O with a small cache could be an issue (SSDs?) * Soft commits, which imply open searchers per soft commit (and per core), will depend on caches. I do believe in the end it will be a direct result of your caching and I/O. If all you care about is performance, caching (memory) could be replaced with faster I/O, though soft commits will be fragile with respect to memory due to their nature (they depend on caching/memory and low I/O usage). Hope I made sense; I probably tried too many points of view in a single idea. Guido. On 23/04/13 11:50, Jérôme Étévé wrote: Hi all, We've got quite a lot of (mostly small) Solr cores in our Solr instance. They all share the same solrconfig.xml and schema.xml (only the data differs). I'm wondering how far I can go in terms of the number of cores. CPU is not an issue, but memory could be. Any idea/guideline about the impact of a new Solr core on a Solr instance? Thanks! Jerome. -- Jerome Eteve +44(0)7738864546 http://www.eteve.net/
what is the maximum XML file size to import?
Hello, What is the maximum size of an XML document file that can be imported into Solr for indexing via java -Durl? As I am testing importing an XML file of 5 GB, it throws an error like:

SimplePostTool: WARNING: Solr returned an error #400 Bad Request
SimplePostTool: WARNING: IOException while reading response: java.io.IOException: Server returned HTTP response code: 400 for URL: http://10.0.1.140:8080/solr/solr1/update

-- View this message in context: http://lucene.472066.n3.nabble.com/what-is-the-maximum-XML-file-size-to-import-tp4058263.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: terms starting with multilingual character don't list on solr auto-suggestion list
Hi Jack, Sorry for the late response. I have used the following settings for auto-suggestion:

<searchComponent name="terms" class="solr.TermsComponent"/>

<requestHandler name="/terms" class="solr.SearchHandler" startup="lazy">
  <lst name="defaults">
    <bool name="terms">true</bool>
  </lst>
  <arr name="components">
    <str>terms</str>
  </arr>
</requestHandler>

and used the following fieldType:

<fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.PatternReplaceFilterFactory" pattern="^(\p{Punct}*)(.*?)(\p{Punct}*)$" replacement="$1$2"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.PatternReplaceFilterFactory" pattern="^(\p{Punct}*)(.*?)(\p{Punct}*)$" replacement="$1$2"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

-- View this message in context: http://lucene.472066.n3.nabble.com/terms-starting-with-multilingual-character-don-t-list-on-solr-auto-suggestion-list-tp4056288p4058264.html Sent from the Solr - User mailing list archive at Nabble.com.
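(Editor's note: with that handler, suggestions come from TermsComponent parameters; an illustrative request, with an assumed field name, would be:

http://localhost:8983/solr/terms?terms.fl=mytextfield&terms.prefix=%C3%BC&terms.limit=10

Here %C3%BC is the UTF-8 percent-encoding of ü. If the prefix reaches Solr in any other encoding, terms starting with multilingual characters will not be listed, which is one plausible cause of the symptom in this thread.)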
Re: Memory Impact of a New Core
The overhead of just opening a core is insignificant relative to using it, so unless you are worried about hitting the max-open-files limit, it seems unimportant. Otis -- Solr & ElasticSearch Support http://sematext.com/ On Apr 23, 2013 7:46 AM, Jérôme Étévé jerome.et...@gmail.com wrote: Thanks! Yeah, I know about the caching/commit things. My question is more about the impact of the pure creation of a Solr core, independently of its usage memory requirements (like caches and stuff). From the experiments I did using JMX, it's not measurable, but I might be wrong. On 23 April 2013 12:25, Guido Medina guido.med...@temetra.com wrote: I'm not an expert, but to some extent I think it will come down to a few factors: * How much data is being cached per core. * If memory is an issue and you still want performance, I/O with a small cache could be an issue (SSDs?) * Soft commits, which imply open searchers per soft commit (and per core), will depend on caches. I do believe in the end it will be a direct result of your caching and I/O. If all you care about is performance, caching (memory) could be replaced with faster I/O, though soft commits will be fragile with respect to memory due to their nature (they depend on caching/memory and low I/O usage). Hope I made sense; I probably tried too many points of view in a single idea. Guido. On 23/04/13 11:50, Jérôme Étévé wrote: Hi all, We've got quite a lot of (mostly small) Solr cores in our Solr instance. They all share the same solrconfig.xml and schema.xml (only the data differs). I'm wondering how far I can go in terms of the number of cores. CPU is not an issue, but memory could be. Any idea/guideline about the impact of a new Solr core on a Solr instance? Thanks! Jerome. -- Jerome Eteve +44(0)7738864546 http://www.eteve.net/
Re: Complex Join Query
Have a look at ElasticSearch, maybe it's a better fit. Otis Solr ElasticSearch Support http://sematext.com/ On Apr 23, 2013 6:38 AM, ashimbose ashimb...@gmail.com wrote: Is there any other enterprise search other than SOLR which supports Complex Join Query,as Solr does not support the same. As per my requirement I need to search Complex Join Query which will search from document Indexing or in main memory. As it is very faster than any disk based database. Any help is appreciable. Regards, Ashim -- View this message in context: http://lucene.472066.n3.nabble.com/Complex-Join-Query-tp4058233.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: what is the maximum XML file size to import?
This does not seem to be related to the XML size. Check the exact error message on the server side. Looks to me like the URL may not be correct. I think in some cases, post.jar automatically adds /update handler, so maybe you are doubling it up. Regards, Alex. Personal blog: http://blog.outerthoughts.com/ LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch - Time is the quality of nature that keeps events from happening all at once. Lately, it doesn't seem to be working. (Anonymous - via GTD book) On Tue, Apr 23, 2013 at 8:02 AM, Sharmila Thapa shar...@gmail.com wrote: Hello, What is the maximum size limit of the XML document file that is allowed to import into solr to index from java -Durl. As I am testing to import XMLfile of 5 GB and it throws an error like SimplePostTool: WARNING: Solr returned an error #400 Bad Request SimplePostTool: WARNING: IOException while reading response: java.io.IOException: Server returned HTTP response code: 400 for URL: http://10.0.1.140:8080/solr/solr1/update -- View this message in context: http://lucene.472066.n3.nabble.com/what-is-the-maximum-XML-file-size-to-import-tp4058263.html Sent from the Solr - User mailing list archive at Nabble.com.
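(Editor's note: to illustrate the doubling-up point above, with -Durl set, SimplePostTool normally posts to exactly the URL given, so an invocation like the following - host/core from the original post, file name illustrative - should be checked against the handler path Solr actually logs:

java -Durl=http://10.0.1.140:8080/solr/solr1/update -jar post.jar data.xml

If the 400 persists, the server-side log entry for that request will show the real cause.)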
Re: What to test, calculate, measure for a pre-production version of SolrCloud?
Hi, Let me get my crystal ball... OK, now let's try inlining. On Tue, Apr 23, 2013 at 5:48 AM, Furkan KAMACI furkankam...@gmail.com wrote: * I want to measure how much RAM I should define for my Solr instances, * I will try to make some predictions about how much disk space I will need at the production step. This one is easy if your index is static or grows slowly. If not, you'll want to set alert thresholds on disk space free/used for capacity planning/expansion purposes. You probably saw the threads from about a week ago about needing about 3x the disk space (3x the size of your index). * Maybe I will check my answer for the question: which RAID to use (or not use), etc. For those questions I got answers from the mailing list and I have some approximations about them. Also I know that it is not easy to answer such questions and I should test them to get more accurate answers. My question is: what do you suggest at the pre-production and test step? * i.e. give much more heap size to Solr instances to calculate RAM Impossible to tell precisely, but you can launch Solr, hammer it (next bullet), look at your monitoring tool or just JConsole, ask the JVM to run GC (you can do that from JConsole), and observe the heap once everything has been fully loaded (for sorting, faceting, etc.). That will give you an idea of the bare minimum heap. Increase from there. Don't expect to find one magic number that will be good forever, because that won't be the case (this is where keeping an eye on things with monitoring and alerting comes into play) unless your system is completely static (static index; same type, volume, and distribution of queries; etc.). * use solrmeter to test qps for your cluster Sure. JMeter or SolrMeter will do. The latter is written by one of the Solr guys and gives you more Solr-specific data, so +1 for that one. :) * use sematext or anything else for performance monitoring etc. I'm completely unbiased here, of course ;) Yes, you need some sort of monitoring (+ alerting) if you are serious about your search in production. If you already have something, hook that up. If you don't have anything or don't want to bother with maintaining a monitoring system, get some SaaS, like SPM for Solr. I need some advice on what to test, calculate, measure, etc. Also there was a question about Codahale metrics and Graphite; you can advise something about that too. One of the main decision factors is whether you want the responsibility of maintaining something like Graphite in house, or to give that up and focus on your service/product. The tendency seems to be the latter, but there are still organizations who choose the former. PS: I use Solr 4.2.1 for tests but if Solr 4.3 becomes ready (if it is tagged in the repository) I will use it. If you are in pre-production and asking questions about memory and disk, my feeling is you should wait for 4.3. :) HTH Otis -- Solr & ElasticSearch Support http://sematext.com/
Problem with solr, HTTP/Request.php and tomcat.
Hi! I'm using Solr with Tomcat and I need to add a record using HTTP/Request.php (PEAR). So, I created a test file with the following code:

<?php
require_once "HTTP/Request.php";

$req = new HTTP_Request("http://localhost:8080/solr/stats/update");
$req->setMethod(HTTP_REQUEST_METHOD_POST);
$xml = '<add><doc><field name="type">AllFields</field><field name="id">412263fc396ab4.19731404</field><field name="datestamp">2013-02-18T14:25:16Z</field><field name="browser">Firefox</field><field name="browserVersion">18.0</field><field name="ipaddress">192.168.2.22</field><field name="referrer">http://ijsn627.ijsn.es.gov.br/vufind/Biblioteca/</field><field name="url">/vufind/Biblioteca/Search/Results?lookfor=&amp;type=AllFields&amp;submit=Pesquisar</field></doc></add>';
$req->addHeader('Content-Type', 'text/xml; charset=utf-8');
$req->addHeader('Content-Length', strlen($xml));
$req->setBody($xml);
if (!PEAR::isError($req->sendRequest())) {
    $response1 = $req->getResponseBody();
    echo $req->getResponseCode();
} else {
    $response1 = "";
}
$req->clearPostData();
echo $response1;
echo $response2;
?>

That should work, right? But I'm getting the error code 400. The same error appears when I enable the statistics module in VuFind. With curl the update is OK:

curl http://localhost:8080/solr/stats/update/?commit=true -H "Content-Type: text/xml; charset=utf-8" --data-binary '<add><doc><field name="type">AllFields</field><field name="id">412263fc396ab4.19731404</field><field name="datestamp">2013-02-18T14:25:16Z</field><field name="browser">Firefox</field><field name="browserVersion">18.0</field><field name="ipaddress">192.168.2.22</field><field name="referrer">http://ijsn627.ijsn.es.gov.br/vufind/Biblioteca/</field><field name="url">/vufind/Biblioteca/Search/Results?lookfor=&amp;type=AllFields&amp;submit=Pesquisar</field></doc></add>'

I have no idea what's happening... Any ideas? Apache Tomcat: 7.0.27 Solr: 3.5
Querying only for + character causes org.apache.lucene.queryParser.ParseException
Hi! Currently I'm working on a basic search engine; the main problem is that during some tests an issue was detected: in the application, if a user searches for only the '+' or '-' term, or the '++' string, it causes an exception in my application. The problem is caused by an org.apache.lucene.queryParser.ParseException in Solr. I get the same response if, from the Solr admin interface, I search for the '+' term. From what I've seen, the '+' character gets encoded into %2B, which causes the exception. Is there any way of escaping this character so it behaves like any other character? Or at least getting no response for these cases? I'm using Solr 3.6.2, deployed in Tomcat 7. Greetings! http://www.uci.cu
Re: fuzzy search issue with PatternTokenizer Factory
Fuzzy search is supposed to be independent of all the analyzers, but it seems that it's not independent of the tokenizer. If I just change my tokenizer to *Solr.StandardTokenizerFactory*, fuzzy search starts working fine; if it were independent of the tokenizer, this should not occur. Also, I have analyzed my terms in the Admin UI Analysis page, and the terms come out perfectly fine as expected, so this is the only issue I am facing. But I can't analyze the fuzzy term in the Admin UI Analysis page, so I am not able to catch the issue.

Jack Krupansky-2 wrote: Once again, fuzzy search is completely independent of your analyzer or pattern tokenizer. Please use the Solr Admin UI Analysis page to debug whether the terms are what you expect. And realize that fuzzy search has a maximum editing distance of 2, and that includes case changes. -- Jack Krupansky

-Original Message- From: meghana Sent: Monday, April 22, 2013 3:25 AM To: solr-user@.apache Subject: Re: fuzzy search issue with PatternTokenizer Factory

Jack, the regex will split tokens on anything except alphabets, numbers, '&', '-' and ns: (where n is a number, e.g. 4323s:). Let's say for example my text is like below: *this is nice day sun 53s: is risen.* Then the pattern tokenizer should create the tokens *this is nice day sun is risen*. The pattern seems to be working fine with different text. Also, for the fuzzy search *worde~1*, I have checked the results returned with PatternTokenizerFactory, having punctuation marks like '*WORDS,*', '*WORDED*', etc... One more weird thing: all the results are in uppercase letters; no lowercase results come back, although it does not return all uppercase results either. I am not sure why, but after changing to this, fuzzy search is not working properly.

Jack Krupansky-2 wrote: Give us some examples of tokens that you are expecting that pattern to tokenize. And express the pattern in simple English as well. Show some actual input data. I suspect that Solr is working fine - but you may not have precisely specified your pattern. But we don't know what your pattern is supposed to recognize. Maybe some of your previous hits had punctuation adjacent to the terms that your pattern doesn't recognize. And use the Solr Admin UI Analysis page to see how your sample input data is analyzed. One other thing... without a group, the pattern specifies what delimiter sequence will split the rest of the input into tokens. I suspect you didn't mean this.
-- Jack Krupansky

-Original Message- From: meghana Sent: Friday, April 19, 2013 9:01 AM To: solr-user@.apache Subject: fuzzy search issue with PatternTokenizer Factory

I'm using Solr 4.2. I have changed my text field definition to use Solr.PatternTokenizerFactory instead of Solr.StandardTokenizerFactory, and changed my schema definition as below:

<fieldType name="text_token" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.PatternTokenizerFactory" pattern="[^a-zA-Z0-9&amp;\-']|\d{0,4}s: "/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="false"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.PatternTokenizerFactory" pattern="[^a-zA-Z0-9&amp;\-']|\d{0,4}s: "/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords_extra_query.txt" enablePositionIncrements="false"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

After doing so, fuzzy search does not seem to work properly as it did before. I am searching with the search term worde~1; before, it returned around 300 records, but now it returns only 5 records. Not sure what the issue can be. Can anybody help me make it work!

-- View this message in context: http://lucene.472066.n3.nabble.com/fuzzy-search-issue-with-PatternTokenizer-Factory-tp4057275.html Sent from the Solr - User mailing list archive at Nabble.com.

-- View this message in context: http://lucene.472066.n3.nabble.com/fuzzy-search-issue-with-PatternTokenizer-Factory-tp4057275p4057831.html Sent from the Solr - User mailing list archive at Nabble.com.

-- View this message in context: http://lucene.472066.n3.nabble.com/fuzzy-search-issue-with-PatternTokenizer-Factory-tp4057275p4058267.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: what is the maximum XML file size to import?
On 4/23/2013 6:02 AM, Sharmila Thapa wrote: What is the maximum size limit of the XML document file that is allowed to import into solr to index from java -Durl. As I am testing to import XMLfile of 5 GB and it throws an error like SimplePostTool: WARNING: Solr returned an error #400 Bad Request SimplePostTool: WARNING: IOException while reading response: java.io.IOException: Server returned HTTP response code: 400 for URL: http://10.0.1.140:8080/solr/solr1/update Unless the simple post tool is capable of breaking the input XML into many pieces, you'll run into the POST size limit of your servlet container. I don't know if it has this capability, but I would be somewhat surprised if it did. Solr is packaged so the example uses jetty (start.jar), but you may be running under tomcat or one of a few other choices. The history of the POST limit in Solr is a little complex. The example jetty config in Solr 3.x (and possibly earlier) used a 1MiB POST buffer. You could change that value with no problem. If you used another container, you could change it using that container's configuration method. When 4.0 was released, jetty 8.x had a bug and the 1MiB configuration in the example wasn't working, so the limit became 200KB, jetty's default. Just like earlier versions, if you were using another container, you could change the limit using that container's configuration. The bug in jetty has now been fixed. https://bugs.eclipse.org/bugs/show_bug.cgi?id=397130 Solr 4.1 changed things, with SOLR-4265. Now Solr controls the max POST size itself, defaulting formdataUploadLimitInKB in solrconfig.xml to 2048. https://issues.apache.org/jira/browse/SOLR-4265 Thanks, Shawn
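(Editor's note: for reference, a sketch of where that Solr 4.1+ limit lives in solrconfig.xml, with illustrative values:

<requestDispatcher handleSelect="false">
  <requestParsers enableRemoteStreaming="true" multipartUploadLimitInKB="2048000" formdataUploadLimitInKB="2048"/>
</requestDispatcher>

Raising formdataUploadLimitInKB - and multipartUploadLimitInKB for multipart posts - lifts the Solr-side cap; on older setups the container's own POST limit still applies.)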
Re: Export Index and Re-Index XML
Hi, I have done this many times. First use a curl job or something to download the complete index as CSV: q=*:*&rows=999&wt=csv. Then use post.jar to push that CSV into the new node. Alternatively you can query with XML and use the XSLT update request handler with the param tr=updateXml, which is a stylesheet for indexing response XML directly. -- Jan Høydahl, search solution architect Cominvent AS - www.cominvent.com Solr Training - www.solrtraining.com

On 23 Apr 2013, at 02:11, Kalyan Kuram kalyan.ku...@live.com wrote: Thank you all very much for your help. I do have fields configured as stored and indexed. I did read the FAQ from the wiki; I think SolrEntityProcessor is what I need. I am trying to index the data from Adobe CQ; it's push-based indexing and a pain to index data from a very large repository. I think I can manage this with SolrEntityProcessor for now and will think of modelling data for re-indexing purposes. Kalyan

From: j...@basetechnology.com To: solr-user@lucene.apache.org Subject: Re: Export Index and Re-Index XML Date: Mon, 22 Apr 2013 19:54:26 -0400 Any fields which have stored values can be read and output, but indexed-only, non-stored fields cannot be read or exported. Even if they could be, their values are post-analysis, which means that there is a good chance that they cannot be run through term analysis again. It is always best to keep a copy of your raw source data separate from the data you add to Solr. Or, at least make sure any important data is stored. In short, you need to model your data for reindexing, which is a fact of life in Solr land. -- Jack Krupansky

-Original Message- From: Kalyan Kuram Sent: Monday, April 22, 2013 7:07 PM To: solr-user@lucene.apache.org Subject: Export Index and Re-Index XML Hi All, I am new to Solr and I wanted to know if I can export the index as XML and then re-index it back into Solr. The reason I need to do this is that I misconfigured a fieldtype, and to make it work I need to re-index the content. Kalyan
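(Editor's note: a sketch of Jan's round trip, with illustrative host/core names and a rows value assumed large enough to cover the whole index:

curl "http://oldhost:8983/solr/core1/select?q=*:*&rows=1000000&wt=csv" > dump.csv
java -Durl=http://newhost:8983/solr/core1/update -Dtype=text/csv -jar post.jar dump.csv

As Jack notes above, this only round-trips stored fields; indexed-only fields cannot be exported this way.)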
Re: Querying only for + character causes org.apache.lucene.queryParser.ParseException
Hi, you need to escape that char in search terms. Special chars are + - && || ! ( ) { } [ ] ^ " ~ * ? : \ / at the moment. The %2B is just the URL encoding, but it will still be a + for Solr, so just put a \ in front of the chars I mentioned. Cheers, Kai

On 23.04.2013 at 15:41, Jorge Luis Betancourt Gonzalez wrote: Hi! Currently I'm working on a basic search engine; the main problem is that during some tests an issue was detected: in the application, if a user searches for only the '+' or '-' term, or the '++' string, it causes an exception in my application. The problem is caused by an org.apache.lucene.queryParser.ParseException in Solr. I get the same response if, from the Solr admin interface, I search for the '+' term. From what I've seen, the '+' character gets encoded into %2B, which causes the exception. Is there any way of escaping this character so it behaves like any other character? Or at least getting no response for these cases? I'm using Solr 3.6.2, deployed in Tomcat 7. Greetings! http://www.uci.cu
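(Editor's note: if the client is SolrJ, there is a ready-made helper for exactly this escaping; a minimal sketch:

    import org.apache.solr.client.solrj.util.ClientUtils;

    public class EscapeSketch {
        public static void main(String[] args) {
            // Backslash-escapes Lucene/Solr query syntax characters, so "+" becomes "\+"
            String safe = ClientUtils.escapeQueryChars("+");
            System.out.println(safe);
        }
    }

For a PHP or other non-Java client, the equivalent is prefixing each special character with a backslash before building the query string.)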
Re: what is the maximum XML file size to import?
DataImportHandler might be a better way to import very large XML files if it can be loaded from Solr-local file system. Regards, Alex. Personal blog: http://blog.outerthoughts.com/ LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch - Time is the quality of nature that keeps events from happening all at once. Lately, it doesn't seem to be working. (Anonymous - via GTD book) On Tue, Apr 23, 2013 at 9:46 AM, Shawn Heisey s...@elyograg.org wrote: On 4/23/2013 6:02 AM, Sharmila Thapa wrote: What is the maximum size limit of the XML document file that is allowed to import into solr to index from java -Durl. As I am testing to import XMLfile of 5 GB and it throws an error like SimplePostTool: WARNING: Solr returned an error #400 Bad Request SimplePostTool: WARNING: IOException while reading response: java.io.IOException: Server returned HTTP response code: 400 for URL: http://10.0.1.140:8080/solr/solr1/update Unless the simple post tool is capable of breaking the input XML into many pieces, you'll run into the POST size limit of your servlet container. I don't know if it has this capability, but I would be somewhat surprised if it did. Solr is packaged so the example uses jetty (start.jar), but you may be running under tomcat or one of a few other choices. The history of the POST limit in Solr is a little complex. The example jetty config in Solr 3.x (and possibly earlier) used a 1MiB POST buffer. You could change that value with no problem. If you used another container, you could change it using that container's configuration method. When 4.0 was released, jetty 8.x had a bug and the 1MiB configuration in the example wasn't working, so the limit became 200KB, jetty's default. Just like earlier versions, if you were using another container, you could change the limit using that container's configuration. The bug in jetty has now been fixed. https://bugs.eclipse.org/bugs/show_bug.cgi?id=397130 Solr 4.1 changed things, with SOLR-4265. Now Solr controls the max POST size itself, defaulting formdataUploadLimitInKB in solrconfig.xml to 2048. https://issues.apache.org/jira/browse/SOLR-4265 Thanks, Shawn
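(Editor's note: a sketch of such a DIH data-config for a Solr-local file, assuming the big file uses the usual <add><doc> update format; the file path and field names are illustrative, and stream="true" keeps XPathEntityProcessor from buffering the whole file:

<dataConfig>
  <dataSource type="FileDataSource" encoding="UTF-8"/>
  <document>
    <entity name="bigfile" processor="XPathEntityProcessor"
            url="/data/import/big.xml" stream="true" forEach="/add/doc">
      <field column="id" xpath="/add/doc/field[@name='id']"/>
      <field column="title" xpath="/add/doc/field[@name='title']"/>
    </entity>
  </document>
</dataConfig>
)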
Re: What to test, calculate, measure for a pre-production version of SolrCloud?
To be clear, there are no solid and reliable prediction rules for Solr - for the simple reason that there are too many non-linear variables. You need to stand up a proof-of-concept system, load it with representative data, execute representative queries, and then measure that system. You can then use those numbers to size your production system. I don't want to give you the impression that this notion of predicting or calculating the size of a production Solr system is a viable option. Sure, you can try, and maybe you will get lucky and maybe you won't. Flip a coin. But what sane manager would want to plan production based on flipping a coin? -- Jack Krupansky

-Original Message- From: Furkan KAMACI Sent: Tuesday, April 23, 2013 5:48 AM To: solr-user@lucene.apache.org Subject: What to test, calculate, measure for a pre-production version of SolrCloud?

Hi Folks; This week we will make a pre-production version of our system. I've been asking some questions for a while and I got really good responses from the mailing list. At the pre-production and test step: * I want to measure how much RAM I should define for my Solr instances, * I will try to make some predictions about how much disk space I will need at the production step. * Maybe I will check my answer for the question: which RAID to use (or not use), etc. For those questions I got answers from the mailing list and I have some approximations about them. Also I know that it is not easy to answer such questions and I should test them to get more accurate answers. My question is: what do you suggest at the pre-production and test step? * i.e. give much more heap size to Solr instances to calculate RAM * use solrmeter to test qps for your cluster * use sematext or anything else for performance monitoring etc. I need some advice on what to test, calculate, measure, etc. Also there was a question about Codahale metrics and Graphite; you can advise something about that too. PS: I use Solr 4.2.1 for tests but if Solr 4.3 becomes ready (if it is tagged in the repository) I will use it.
Re: Problem with solr, HTTP/Request.php and tomcat.
On 4/23/2013 7:30 AM, Viviane Ventura wrote: I'm using solr with tomcat and i need to add a record using HTTP/Request.php (PEAR). So, i created a test file with the following code: ?php require_once HTTP/Request.php; At a quick glance (and not having much experience with PHP) your code looks like it SHOULD work, but something is obviously wrong. The Solr server's log should have something useful for you, if you are logging at INFO or higher. The exact location of the log will depend on your servlet container. For tomcat, that is generally in the catalina logfile. The log will include the request parameters, but it won't include the body. The error message may give you a clue, though. You would be better off using a PHP programming API specifically made for Solr, rather than using HTTP directly and sending XML. If you are using Solr 4.x, I believe that all of them may have bugs because Solr 4.0 finished removing options that were deprecated a long time ago, and the PHP programming APIs include those options. There are at least three API choices available: http://wiki.apache.org/solr/SolPHP The PECL plugin for Solr has a filed bug, to which I attached a patch. As it says in the bug notes, I probably didn't fix it right, but I have confirmed with a PHP user that it does fix the problem: https://bugs.php.net/bug.php?id=62332 Thanks, Shawn
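(Editor's note: as one concrete alternative to hand-rolled HTTP, a minimal sketch with the PECL Solr extension Shawn mentions; the connection options mirror the original post, and the snippet is untested against Solr 3.5:

<?php
$client = new SolrClient(array(
    'hostname' => 'localhost',
    'port'     => 8080,
    'path'     => '/solr/stats', // core path, not the full /update URL
));

$doc = new SolrInputDocument();
$doc->addField('id', '412263fc396ab4.19731404');
$doc->addField('type', 'AllFields');
$doc->addField('datestamp', '2013-02-18T14:25:16Z');

$client->addDocument($doc);
$client->commit(); // make the document searchable
?>
)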
Re: Querying only for + character causes org.apache.lucene.queryParser.ParseException
Hi Kai: Thanks for your reply. From what I've understood, this logic must be included in my application. Would it be possible, for instance, to use some regular expression at query time in my schema to avoid a query that contains only these characters? For instance '+' and '++' would be good catches to avoid. Thanks in advance!

- Original message - From: Kai Becker m...@kai-becker.com To: solr-user@lucene.apache.org Sent: Tuesday, April 23, 2013 9:48:26 Subject: Re: Querying only for + character causes org.apache.lucene.queryParser.ParseException

Hi, you need to escape that char in search terms. Special chars are + - && || ! ( ) { } [ ] ^ " ~ * ? : \ / at the moment. The %2B is just the URL encoding, but it will still be a + for Solr, so just put a \ in front of the chars I mentioned. Cheers, Kai

On 23.04.2013 at 15:41, Jorge Luis Betancourt Gonzalez wrote: Hi! Currently I'm working on a basic search engine; the main problem is that during some tests an issue was detected: in the application, if a user searches for only the '+' or '-' term, or the '++' string, it causes an exception in my application. The problem is caused by an org.apache.lucene.queryParser.ParseException in Solr. I get the same response if, from the Solr admin interface, I search for the '+' term. From what I've seen, the '+' character gets encoded into %2B, which causes the exception. Is there any way of escaping this character so it behaves like any other character? Or at least getting no response for these cases? I'm using Solr 3.6.2, deployed in Tomcat 7. Greetings!

http://www.uci.cu
Re: What to test, calculate, measure for a pre-production version of SolrCloud?
Another aspect I neglected to mention: think about distinguishing between development, test, and production systems - all separate. Your development system is where you try out ideas and experiment - your proof of concept. Your test or pre-production system is where you verify that your ideas are really ready to go - the test system should parallel the production system and approximate real load. And finally, your production system is where you don't have the liberty to just try stuff out. For real cloud systems, it's all about scaling commodity boxes. Pick a reasonably sized box and then put a reasonable amount of data on that box; then you can calculate how many boxes you will need for scaling (shards). And your HA (High Availability) and query load requirements will drive how many replicas you will need for each shard. -- Jack Krupansky

-Original Message- From: Jack Krupansky Sent: Tuesday, April 23, 2013 9:54 AM To: solr-user@lucene.apache.org Subject: Re: What to test, calculate, measure for a pre-production version of SolrCloud?

To be clear, there are no solid and reliable prediction rules for Solr - for the simple reason that there are too many non-linear variables. You need to stand up a proof-of-concept system, load it with representative data, execute representative queries, and then measure that system. You can then use those numbers to size your production system. I don't want to give you the impression that this notion of predicting or calculating the size of a production Solr system is a viable option. Sure, you can try, and maybe you will get lucky and maybe you won't. Flip a coin. But what sane manager would want to plan production based on flipping a coin? -- Jack Krupansky

-Original Message- From: Furkan KAMACI Sent: Tuesday, April 23, 2013 5:48 AM To: solr-user@lucene.apache.org Subject: What to test, calculate, measure for a pre-production version of SolrCloud?

Hi Folks; This week we will make a pre-production version of our system. I've been asking some questions for a while and I got really good responses from the mailing list. At the pre-production and test step: * I want to measure how much RAM I should define for my Solr instances, * I will try to make some predictions about how much disk space I will need at the production step. * Maybe I will check my answer for the question: which RAID to use (or not use), etc. For those questions I got answers from the mailing list and I have some approximations about them. Also I know that it is not easy to answer such questions and I should test them to get more accurate answers. My question is: what do you suggest at the pre-production and test step? * i.e. give much more heap size to Solr instances to calculate RAM * use solrmeter to test qps for your cluster * use sematext or anything else for performance monitoring etc. I need some advice on what to test, calculate, measure, etc. Also there was a question about Codahale metrics and Graphite; you can advise something about that too. PS: I use Solr 4.2.1 for tests but if Solr 4.3 becomes ready (if it is tagged in the repository) I will use it.
Re: Querying only for + character causes org.apache.lucene.queryParser.ParseException
If you want to allow your users to search for '+', you can also define your '+' as being a regular ALPHA character. In config, delimiter_types.txt:

#
# We let +, # and * be part of normal words.
# This is to let c++, c#, c* and R&D count as words.
#
+ => ALPHA
# => ALPHA
* => ALPHA
& => ALPHA
@ => ALPHA

Then in your solr.WordDelimiterFilterFactory, use types="delimiter_types.txt". You'll then be able to let your users search for + as part of a word. If you want to allow them to search for just '+', a little hacking is necessary in your client code. Personally, I just double-quote the query if it's only one char in length. It can't be harmful, and as it will turn your single + into "+", it will be considered a token (rather than part of the query syntax) by the parser. Provided you're using the edismax parser, it should be just fine for any other queries, like '+ foo', 'foo +', '++' ... J. On 23 April 2013 15:09, Jorge Luis Betancourt Gonzalez jlbetanco...@uci.cu wrote: Hi Kai: Thanks for your reply. From what I've understood, this logic must be included in my application. Would it be possible to, for instance, use some regular expression at query time in my schema to avoid a query that contains only these characters? For instance, + and ++ would be a good catch to avoid. Thanks in advance! - Original Message - From: Kai Becker m...@kai-becker.com To: solr-user@lucene.apache.org Sent: Tuesday, April 23, 2013 9:48:26 Subject: Re: Querying only for + character causes org.apache.lucene.queryParser.ParseException Hi, you need to escape that char in search terms. Special chars are + - ! ( ) { } [ ] ^ ~ * ? : \ / at the moment. The %2B is just the URL encoding, but it will still be a + for Solr, so just put a \ in front of the chars I mentioned. Cheers, Kai On 23.04.2013 at 15:41, Jorge Luis Betancourt Gonzalez wrote: Hi! Currently I'm working on a basic search engine; the main problem is that during some tests a problem was detected: in the application, if a user searches for the + or - term only, or the ++ string, it causes an exception in my application. The problem is caused by an org.apache.lucene.queryParser.ParseException in Solr. I get the same response if, from the Solr admin interface, I search for the + term. From what I've seen, the + character gets encoded into %2B, which causes the exception. Is there any way of escaping these characters so they behave like any other character? Or at least get no response in these cases? I'm using Solr 3.6.2, deployed in Tomcat 7. Greetings! http://www.uci.cu -- Jerome Eteve +44(0)7738864546 http://www.eteve.net/
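For readers who want to wire this up, a minimal sketch of the analyzer side might look like the following in schema.xml (the field type name and filter parameters are invented for illustration, so treat this as a starting point rather than a drop-in config):

<fieldType name="text_plus" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- characters mapped to ALPHA in delimiter_types.txt stay part of the word -->
    <filter class="solr.WordDelimiterFilterFactory" types="delimiter_types.txt"
            generateWordParts="1" generateNumberParts="1"
            catenateWords="0" catenateNumbers="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

The delimiter_types.txt file would sit next to schema.xml in the core's conf/ directory.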
Update on shards
Hi, Is it correct that when inserting or updating a document into Solr you have to talk to a Solr host where at least one shard of that collection is stored? And that for select you can talk to any host within the collection.configName? BR, Arkadi
Re: Querying only for + character causes org.apache.lucene.queryParser.ParseException
Hi Jérôme: Thanks for your suggestion, Jérôme. I'll do as you told me to allow searching for these specific tokens. I've also taken into account the option of adding the quotes if the length is 1 at the application level, but I would like to keep this logic inside Solr (if possible). This is why I was thinking of some kind of replace regular expression at query time, so if this changes in the future it won't also require changing the application level. Can you advise me on this? Greetings! - Original Message - From: Jérôme Étévé jerome.et...@gmail.com To: solr-user@lucene.apache.org Sent: Tuesday, April 23, 2013 10:44:39 Subject: Re: Querying only for + character causes org.apache.lucene.queryParser.ParseException If you want to allow your users to search for '+', you can also define your '+' as being a regular ALPHA character. In config, delimiter_types.txt:

#
# We let +, # and * be part of normal words.
# This is to let c++, c#, c* and R&D count as words.
#
+ => ALPHA
# => ALPHA
* => ALPHA
& => ALPHA
@ => ALPHA

Then in your solr.WordDelimiterFilterFactory, use types="delimiter_types.txt". You'll then be able to let your users search for + as part of a word. If you want to allow them to search for just '+', a little hacking is necessary in your client code. Personally, I just double-quote the query if it's only one char in length. It can't be harmful, and as it will turn your single + into "+", it will be considered a token (rather than part of the query syntax) by the parser. Provided you're using the edismax parser, it should be just fine for any other queries, like '+ foo', 'foo +', '++' ... J. On 23 April 2013 15:09, Jorge Luis Betancourt Gonzalez jlbetanco...@uci.cu wrote: Hi Kai: Thanks for your reply. From what I've understood, this logic must be included in my application. Would it be possible to, for instance, use some regular expression at query time in my schema to avoid a query that contains only these characters? For instance, + and ++ would be a good catch to avoid. Thanks in advance! - Original Message - From: Kai Becker m...@kai-becker.com To: solr-user@lucene.apache.org Sent: Tuesday, April 23, 2013 9:48:26 Subject: Re: Querying only for + character causes org.apache.lucene.queryParser.ParseException Hi, you need to escape that char in search terms. Special chars are + - ! ( ) { } [ ] ^ ~ * ? : \ / at the moment. The %2B is just the URL encoding, but it will still be a + for Solr, so just put a \ in front of the chars I mentioned. Cheers, Kai On 23.04.2013 at 15:41, Jorge Luis Betancourt Gonzalez wrote: Hi! Currently I'm working on a basic search engine; the main problem is that during some tests a problem was detected: in the application, if a user searches for the + or - term only, or the ++ string, it causes an exception in my application. The problem is caused by an org.apache.lucene.queryParser.ParseException in Solr. I get the same response if, from the Solr admin interface, I search for the + term. From what I've seen, the + character gets encoded into %2B, which causes the exception. Is there any way of escaping these characters so they behave like any other character? Or at least get no response in these cases? I'm using Solr 3.6.2, deployed in Tomcat 7. Greetings! -- Jerome Eteve +44(0)7738864546 http://www.eteve.net/ http://www.uci.cu
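If the escaping does end up living at the application level after all, SolrJ ships a small helper for exactly this. A minimal sketch (the class and variable names are just for illustration):

import org.apache.solr.client.solrj.util.ClientUtils;

public class EscapeDemo {
    public static void main(String[] args) {
        // Backslash-escapes the query-syntax characters Kai listed
        // (+ - ! ( ) { } [ ] ^ ~ * ? : \ and friends).
        String safe = ClientUtils.escapeQueryChars("+");
        System.out.println(safe); // prints \+
    }
}

A lone "+" then reaches the parser as the literal term \+ instead of as query syntax.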
Re: Update on shards
I believe as of 4.2 you can talk to any host in the cloud. Michael Della Bitta Appinions 18 East 41st Street, 2nd Floor New York, NY 10017-6271 www.appinions.com Where Influence Isn’t a Game On Tue, Apr 23, 2013 at 10:45 AM, Arkadi Colson ark...@smartbit.be wrote: Hi Is it correct that when inserting or updating document into solr you have to talk to a solr host where at least one shard of that collection is stored? For select you can talk to any host within the collection.configName? BR, Arkadi
Re: What to test, calculate, measure for a pre-production version of SolrCloud?
The other thing to keep in the back of your mind as you go through this process is that search is addictive to most organizations. Meaning your Solr solution may quickly become a victim of its own success. The queries we tested before going to production 5+ months ago and the queries we handle today are very different beasts. We're now dealing with much more complexity because when we started out, the business side didn't have a full appreciation for what was possible. Now that they've seen Solr in action (pun intended), my team can't keep up with all the great ideas our PMs have for how to leverage Solr in many places that were unforeseen during initial planning. We're entering our third phase of adoption and are having to increase node count and RAM significantly. Bottom line is to do all the important things Otis and Jack have suggested, but also realize that what you design today may only be valid for 6 months or so. Of course I can't speak to your business situation but we've just accepted that we need to revisit our infrastructure decisions frequently. Admittedly, this is much easier in a cloud like Amazon than if you're buying your own hardware. Cheers, Tim On Tue, Apr 23, 2013 at 8:10 AM, Jack Krupansky j...@basetechnology.com wrote: Another aspect I neglected to mention: Think about distinguishing between development, test, and production systems - all separately. Your development system is where you try out ideas and experiment - your proof of concept. Your test or pre-production system is where you verify that your ideas are really ready to go - the test system should parallel the production system and approximate real load. And finally your production system is where you don't have the liberty to just try stuff out. For real cloud systems, it's all about scaling of commodity boxes. Pick a reasonable size box and then put a reasonable amount of data on that box, then you can calculate how many boxes you will need for scaling (shards). And your HA (High Availability) and query load requirements will drive how many replicas you will need for each shard. -- Jack Krupansky -Original Message- From: Jack Krupansky Sent: Tuesday, April 23, 2013 9:54 AM To: solr-user@lucene.apache.org Subject: Re: What to test, calculate, measure for a pre-production version of SolrCloud? To be clear, there are no solid and reliable prediction rules for Solr - for the simple reason that there are too many non-linear variables - you need to stand up a proof of concept system, load it with representative data and execute representative queries and then measure that system. You can then use those numbers to size your production system. I don't want to give you the impression that this notion of predicting or calculating the size of a production Solr system is a viable option. Sure, you can try, and maybe you will get lucky and maybe you won't. Flip a coin. But what sane manager would want to plan production based on flipping a coin? -- Jack Krupansky -Original Message- From: Furkan KAMACI Sent: Tuesday, April 23, 2013 5:48 AM To: solr-user@lucene.apache.org Subject: What to test, calculate, measure for a pre-production version of SolrCloud? Hi Folks; This week we will make a pre-production version of our system. I've been asking some questions for a while and I got really good responses from the mailing list.
At the pre-production and test step:
* I want to measure how much RAM I should define for my Solr instances,
* I will try to make some predictions about how much disk space I will need at the production step.
* Maybe I will check my answer to that question: which RAID to use (or not use), etc.
For those questions I got answers from the mailing list and I have some approximations about them. Also I know that it is not easy to answer such questions and I should test them to get more accurate answers. My question is: What do you suggest at the pre-production and test step?
* i.e. give much more heap size to Solr instances to calculate RAM
* use solrmeter to test qps for your cluster
* use sematext or anything else for performance monitoring, etc.
I need some advice on what to test, calculate, measure, etc. Also there was a question about Codahale metrics and Graphite. You can advise something about that too. PS: I use Solr 4.2.1 for tests but if Solr 4.3 becomes ready (if it is tagged in the repository) I will use it.
Solr index searcher to lucene index searcher
Hi, Can anyone please point out where a Solr search originates and how it passes to the Lucene index searcher and back to Solr? I actually want to know which class in Solr directly calls the Lucene IndexSearcher. Thanks. Pom
Re: Solr index searcher to lucene index searcher
org.apache.solr.search.SolrIndexSearcher On Tue, Apr 23, 2013 at 9:51 AM, parnab kumar parnab.2...@gmail.com wrote: Hi , Can anyone please point out from where a solr search originates and how it passes to the lucene index searcher and back to solr . I actually what to know which class in solr directly calls the lucene Index Searcher . Thanks. Pom
EdgeGram filter
Hi, I want to edgeNgram, let's say, this document that has 'difficult contents', so that if I query (using dismax) q=dif it shows me this result. This is working fine. But now if I search for q=con it gives me this document as well. Is there any way to only show this document when I search for 'dif' or 'di'? Basically I want to edge-gram 'difficult contents' as a whole, not 'difficult' and 'contents' separately. Any help? Thanks.
Re: Update on shards
If you use jetty - which you should :) It's what we test with. Tomcat only gets user testing. If you use tomcat, this won't work in 4.2 or 4.2.1, but probably will in 4.3 (we are voting on 4.3 now). No clue on other containers. - Mark On Apr 23, 2013, at 10:59 AM, Michael Della Bitta michael.della.bi...@appinions.com wrote: I believe as of 4.2 you can talk to any host in the cloud. Michael Della Bitta Appinions 18 East 41st Street, 2nd Floor New York, NY 10017-6271 www.appinions.com Where Influence Isn’t a Game On Tue, Apr 23, 2013 at 10:45 AM, Arkadi Colson ark...@smartbit.be wrote: Hi Is it correct that when inserting or updating document into solr you have to talk to a solr host where at least one shard of that collection is stored? For select you can talk to any host within the collection.configName? BR, Arkadi
Re: Solr index searcher to lucene index searcher
Hi Timothy, Thanks for pointing that out. But I have a specific requirement. Any query passes through the search handler, and Solr finally directs it to the Lucene IndexSearcher. As results are matched and collected as TopDocs in Lucene, I want to inspect the top K docs, reorder them by some logic, and pass the final TopDocs back to Solr, which Solr may send as a response. I need to know the point where this interaction between Solr and Lucene actually takes place. Can anyone please help with where to look for this purpose? Thanks. Pom On Tue, Apr 23, 2013 at 9:25 PM, Timothy Potter thelabd...@gmail.com wrote: org.apache.solr.search.SolrIndexSearcher On Tue, Apr 23, 2013 at 9:51 AM, parnab kumar parnab.2...@gmail.com wrote: Hi, Can anyone please point out where a Solr search originates and how it passes to the Lucene index searcher and back to Solr? I actually want to know which class in Solr directly calls the Lucene IndexSearcher. Thanks. Pom
Re: DocValues with docValuesFormat=Disk
Hi, If you use a codec which is not the default, you need to download/build the Lucene codec jars, put them in the solr_home/lib directory, and add the codecFactory in the Solr config file. Look here for detailed instructions: http://wiki.apache.org/solr/SimpleTextCodecExample Best, Mou
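For reference, the two pieces Mou mentions usually look something like this (a sketch only; the field and type names are invented, and the codec jar still has to be on the classpath as described above). In solrconfig.xml:

<codecFactory class="solr.SchemaCodecFactory"/>

and in schema.xml, a per-field docValues format on the field type:

<fieldType name="string_dvdisk" class="solr.StrField" docValuesFormat="Disk"/>
<field name="manu_dv" type="string_dvdisk" indexed="true" stored="true" docValues="true"/>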
Re: Solr index searcher to lucene index searcher
Perhaps http://search-lucene.com/?q=custom+hits+collector ? Otis -- Solr ElasticSearch Support http://sematext.com/ On Tue, Apr 23, 2013 at 12:32 PM, parnab kumar parnab.2...@gmail.com wrote: Hi , Timothy,Thanks for pointing out . But i have a specific requirement . For any query it passes through the search handler and solr finally directs it to lucene Index Searcher. As results are matched and collected as TopDocs in lucene i want to inspect the top K Docs , reorder them by some logic and pass the final TopDocs to solr which solr may send as a response . I need to know the point where actually these interaction between solr and lucene takes place . Can anyone please help where to look into for this purpose . Thanks.. Pom On Tue, Apr 23, 2013 at 9:25 PM, Timothy Potter thelabd...@gmail.comwrote: org.apache.solr.search.SolrIndexSearcher On Tue, Apr 23, 2013 at 9:51 AM, parnab kumar parnab.2...@gmail.com wrote: Hi , Can anyone please point out from where a solr search originates and how it passes to the lucene index searcher and back to solr . I actually what to know which class in solr directly calls the lucene Index Searcher . Thanks. Pom
Re: Solr index searcher to lucene index searcher
Take a look at Solr's DelegatingCollector - this article might be of interest too: http://hokiesuns.blogspot.com/2012/11/using-solrs-postfiltering-to-collect.html On Tue, Apr 23, 2013 at 10:32 AM, parnab kumar parnab.2...@gmail.com wrote: Hi , Timothy,Thanks for pointing out . But i have a specific requirement . For any query it passes through the search handler and solr finally directs it to lucene Index Searcher. As results are matched and collected as TopDocs in lucene i want to inspect the top K Docs , reorder them by some logic and pass the final TopDocs to solr which solr may send as a response . I need to know the point where actually these interaction between solr and lucene takes place . Can anyone please help where to look into for this purpose . Thanks.. Pom On Tue, Apr 23, 2013 at 9:25 PM, Timothy Potter thelabd...@gmail.comwrote: org.apache.solr.search.SolrIndexSearcher On Tue, Apr 23, 2013 at 9:51 AM, parnab kumar parnab.2...@gmail.com wrote: Hi , Can anyone please point out from where a solr search originates and how it passes to the lucene index searcher and back to solr . I actually what to know which class in solr directly calls the lucene Index Searcher . Thanks. Pom
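To make the shape of that hook concrete, here is a skeleton of a post filter against the Solr 4.x API (the class name is invented; note that this mechanism lets you inspect or drop matching documents, not re-rank them, and the wiring through a QParserPlugin is omitted):

import java.io.IOException;

import org.apache.lucene.search.IndexSearcher;
import org.apache.solr.search.DelegatingCollector;
import org.apache.solr.search.ExtendedQueryBase;
import org.apache.solr.search.PostFilter;

public class InspectingPostFilter extends ExtendedQueryBase implements PostFilter {

    @Override
    public boolean getCache() {
        return false; // post filters must not be cached
    }

    @Override
    public int getCost() {
        return Math.max(super.getCost(), 100); // a cost of 100+ marks this as a post filter
    }

    @Override
    public DelegatingCollector getFilterCollector(IndexSearcher searcher) {
        return new DelegatingCollector() {
            @Override
            public void collect(int doc) throws IOException {
                // Inspect the (segment-local) doc id here; forwarding it keeps the
                // document in the result set, skipping the call drops it.
                super.collect(doc);
            }
        };
    }
}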
Re: Is there a way to load multiple schema when using zookeeper?
Yes, you can effectively chroot all the configs for a collection (to support multiple collections in the same ensemble) - see the wiki: http://wiki.apache.org/solr/SolrCloud#Zookeeper_chroot On Tue, Apr 23, 2013 at 11:23 AM, bbarani bbar...@gmail.com wrote: I have used multiple schema files by using multiple cores, but I'm not sure if I will be able to use multiple schema configurations when integrating Solr with ZooKeeper. Can someone please let me know if it's possible and if so, how?
Is there a way to load multiple schema when using zookeeper?
I have used multiple schema files by using multiple cores, but I'm not sure if I will be able to use multiple schema configurations when integrating Solr with ZooKeeper. Can someone please let me know if it's possible and if so, how?
Re: Is there a way to load multiple schema when using zookeeper?
: Yes, you can effectively chroot all the configs for a collection (to : support multiple collections in same ensemble) - see wiki: : http://wiki.apache.org/solr/SolrCloud#Zookeeper_chroot I don't think chroot is suitable for what's being asked about here ... that would completely isolate two cloud clusters from each other. I believe the intent of the question is asking about having a cloud cluster in which multiple collections exist, and some collections use different schema.xml files than other collections. The short answer is: absolutely, each collection can use completely different sets of configs (just like in non-cloud mode each core can use distinct configs), but to help make management easier when you have many collections re-using the same set of configs there is the concept of a config set ... you can push a set of configs into ZK with a specific configName, and then refer to that config set name when creating collections... https://wiki.apache.org/solr/SolrCloud#Getting_your_Configuration_Files_into_ZooKeeper http://wiki.apache.org/solr/SolrCloud#Managing_collections_via_the_Collections_API -Hoss
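For concreteness, uploading a named config set typically looks roughly like this (the host name, path and config name are placeholders; the zkcli script ships under example/cloud-scripts/ in recent 4.x releases):

example/cloud-scripts/zkcli.sh -zkhost zk1:2181 -cmd upconfig \
    -confdir /path/to/myconf -confname myconf

Collections created afterwards can then reference the set by name via the collection.configName parameter of the Collections API.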
Autocommit and replication have been slowing down
Hi, We migrated recently from Solr 1.4 to 3.6.1. In the new version we have noticed that after some hours (around 8) the autocommit is taking more time to be executed. We configured autocommit with maxDocs=50 and maxTime=1ms, but we've seen it take a few (3-5) minutes to index documents (I got this time by watching docsPending on the Update Stats page and refreshing. Is there another way to verify that information?). A similar problem has been happening with the replication. We configured the pollInterval with 60s, but the replication takes some minutes to be executed. You can see the timeElapsed value (around 6 minutes) in the Replication Stats. After a server restart the indexing works as we expect for some hours. Our solrconfig.xml file is almost the default. We just increased some params on filterCache, queryResultCache and queryResultWindowSize. Has anyone ever had the same problem? Could someone give a hint or direction on where to start?

*** Update Handlers
name: updateHandler
class: org.apache.solr.update.DirectUpdateHandler2
version: 1.0
description: Update handler that efficiently directly updates the on-disk main lucene index
stats:
commits : 1085
autocommit maxDocs : 50
autocommit maxTime : 1ms
autocommits : 1085
optimizes : 0
rollbacks : 0
expungeDeletes : 0
docsPending : 18
adds : 18
deletesById : 5
deletesByQuery : 0
errors : 0
cumulative_adds : 6294
cumulative_deletesById : 5397
cumulative_deletesByQuery : 0
cumulative_errors : 0

*** Replication Stats
stats:
handlerStart : 1366654495647
requests : 0
errors : 0
timeouts : 0
totalTime : 0
avgTimePerRequest : NaN
avgRequestsPerSecond : 0.0
indexSize : 2.29 GB
indexVersion : 1354902172888
generation : 121266
indexPath : /opt/solr/data/index.20130418170401
isMaster : false
isSlave : true
masterUrl : http://master:9090/solr/replication
pollInterval : 00:00:60
isPollingDisabled : false
isReplicating : true
timeElapsed : 376
bytesDownloaded : 35835
downloadSpeed : 95
previousCycleTimeInSeconds : 0
indexReplicatedAt : Tue Apr 23 13:44:52 BRT 2013
confFilesReplicatedAt : Mon Mar 18 10:27:00 BRT 2013
replicationFailedAt : Mon Apr 22 08:05:00 BRT 2013
timesFailed : 6
timesIndexReplicated : 45318
lastCycleBytesDownloaded : 35835
timesConfigReplicated : 3
confFilesReplicated : [schema.xml]

Thanks, Gustavo Nasu
SolrEntityProcessor doesn't grok responseHeader tag in Ancient Solr 1.2 source
Hi, I'd like to use the SolrEntityProcessor to partially migrate an old index to Solr 4.1. The source is pretty old (dated 2006-06-10 16:05:12Z)... maybe Solr 1.2? My data-config.xml is based on the SolrEntityProcessor example http://wiki.apache.org/solr/DataImportHandler#SolrEntityProcessor and wt=xml. I'm getting an error from SolrJ complaining about <responseHeader><status>0</status><QTime>1</QTime></responseHeader> in the response. Does anyone know of a work-around? Thanks, Tricia

1734 T12 C0 oasc.SolrException.log SEVERE Exception while processing: sep document : SolrInputDocument[]:org.apache.solr.handler.dataimport.DataImportHandlerException: org.apache.solr.common.SolrException: parsing error
Caused by: org.apache.solr.common.SolrException: parsing error
Caused by: java.lang.RuntimeException: this must be known type! not: responseHeader
at org.apache.solr.client.solrj.impl.XMLResponseParser.readNamedList(XMLResponseParser.java:222)
at org.apache.solr.client.solrj.impl.XMLResponseParser.processResponse(XMLResponseParser.java:128)
... 43 more
Re: SolrEntityProcessor doesn't grok responseHeader tag in Ancient Solr 1.2 source
You might be out of luck with the SolrEntityProcessor. I'd recommend writing a simple little script that pages through /select?q=*:* from the source Solr and writes to the destination Solr. Back in the day there was this fun little beast https://github.com/erikhatcher/solr-ruby-flare/blob/master/solr-ruby/lib/solr/importer/solr_source.rb where you could do something like this: Solr::Indexer.new(SolrSource.new(...), mapping).index Erik On Apr 23, 2013, at 13:41, P Williams wrote: Hi, I'd like to use the SolrEntityProcessor to partially migrate an old index to Solr 4.1. The source is pretty old (dated 2006-06-10 16:05:12Z)... maybe Solr 1.2? My data-config.xml is based on the SolrEntityProcessor example http://wiki.apache.org/solr/DataImportHandler#SolrEntityProcessor and wt=xml. I'm getting an error from SolrJ complaining about <responseHeader><status>0</status><QTime>1</QTime></responseHeader> in the response. Does anyone know of a work-around? Thanks, Tricia 1734 T12 C0 oasc.SolrException.log SEVERE Exception while processing: sep document : SolrInputDocument[]:org.apache.solr.handler.dataimport.DataImportHandlerException: org.apache.solr.common.SolrException: parsing error Caused by: org.apache.solr.common.SolrException: parsing error Caused by: java.lang.RuntimeException: this must be known type! not: responseHeader at org.apache.solr.client.solrj.impl.XMLResponseParser.readNamedList(XMLResponseParser.java:222) at org.apache.solr.client.solrj.impl.XMLResponseParser.processResponse(XMLResponseParser.java:128) ... 43 more
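In the spirit of Erik's suggestion, here is a rough paging sketch in Java. It uses SolrJ only on the destination side, since the ancient XML is what breaks a modern SolrJ; host names are placeholders, only single-valued <str> fields are copied, and error handling is omitted:

import java.net.URL;
import javax.xml.parsers.DocumentBuilderFactory;

import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.common.SolrInputDocument;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class OldIndexMigrator {
    public static void main(String[] args) throws Exception {
        HttpSolrServer dest = new HttpSolrServer("http://newhost:8983/solr/collection1");
        int rows = 100;
        for (int start = 0; ; start += rows) {
            // Page through the old index as plain XML.
            URL page = new URL("http://oldhost:8983/solr/select?q=*:*&wt=xml"
                    + "&start=" + start + "&rows=" + rows);
            Document xml = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().parse(page.openStream());
            NodeList docs = xml.getElementsByTagName("doc");
            if (docs.getLength() == 0) break; // past the last page
            for (int i = 0; i < docs.getLength(); i++) {
                SolrInputDocument out = new SolrInputDocument();
                // Copy each <str name="..."> field of this <doc> verbatim.
                NodeList fields = ((Element) docs.item(i)).getElementsByTagName("str");
                for (int f = 0; f < fields.getLength(); f++) {
                    Element field = (Element) fields.item(f);
                    out.addField(field.getAttribute("name"), field.getTextContent());
                }
                dest.add(out);
            }
        }
        dest.commit();
    }
}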
Re: Update on shards
Oops, Mark, you said: "If you use tomcat, this won't work in 4.2 or 4.2.1". Can you explain more about what won't work on Tomcat and what will change in 4.3? 2013/4/23 Mark Miller markrmil...@gmail.com If you use jetty - which you should :) It's what we test with. Tomcat only gets user testing. If you use tomcat, this won't work in 4.2 or 4.2.1, but probably will in 4.3 (we are voting on 4.3 now). No clue on other containers. - Mark On Apr 23, 2013, at 10:59 AM, Michael Della Bitta michael.della.bi...@appinions.com wrote: I believe as of 4.2 you can talk to any host in the cloud. Michael Della Bitta Appinions 18 East 41st Street, 2nd Floor New York, NY 10017-6271 www.appinions.com Where Influence Isn't a Game On Tue, Apr 23, 2013 at 10:45 AM, Arkadi Colson ark...@smartbit.be wrote: Hi Is it correct that when inserting or updating document into solr you have to talk to a solr host where at least one shard of that collection is stored? For select you can talk to any host within the collection.configName? BR, Arkadi
Re: Update on shards
The request proxying does not work with tomcat without calling an explicit flush in the code - jetty (which the unit tests are written against) worked without this flush. The flush is added to 4.3. - Mark On Apr 23, 2013, at 2:02 PM, Furkan KAMACI furkankam...@gmail.com wrote: Oopps, Mark you said: If you use tomcat, this won't work in 4.2 or 4.2.1 Can you explain more what won't be at Tomcat and what will change at 4.3? 2013/4/23 Mark Miller markrmil...@gmail.com If you use jetty - which you should :) It's what we test with. Tomcat only gets user testing. If you use tomcat, this won't work in 4.2 or 4.2.1, but probably will in 4.3 (we are voting on 4.3 now). No clue on other containers. - Mark On Apr 23, 2013, at 10:59 AM, Michael Della Bitta michael.della.bi...@appinions.com wrote: I believe as of 4.2 you can talk to any host in the cloud. Michael Della Bitta Appinions 18 East 41st Street, 2nd Floor New York, NY 10017-6271 www.appinions.com Where Influence Isn’t a Game On Tue, Apr 23, 2013 at 10:45 AM, Arkadi Colson ark...@smartbit.be wrote: Hi Is it correct that when inserting or updating document into solr you have to talk to a solr host where at least one shard of that collection is stored? For select you can talk to any host within the collection.configName? BR, Arkadi
Re: Update on shards
Sorry, but I want to make things clear in my mind. Is there any documentation that explains Solr proxying? Is it the same thing as this: when I use SolrCloud, if I send a document to any of the nodes in my cluster, the document will be routed to the leader of the appropriate shard. So you mean I cannot do that if I use Tomcat? 2013/4/23 Mark Miller markrmil...@gmail.com The request proxying does not work with tomcat without calling an explicit flush in the code - jetty (which the unit tests are written against) worked without this flush. The flush is added to 4.3. - Mark On Apr 23, 2013, at 2:02 PM, Furkan KAMACI furkankam...@gmail.com wrote: Oops, Mark, you said: "If you use tomcat, this won't work in 4.2 or 4.2.1". Can you explain more about what won't work on Tomcat and what will change in 4.3? 2013/4/23 Mark Miller markrmil...@gmail.com If you use jetty - which you should :) It's what we test with. Tomcat only gets user testing. If you use tomcat, this won't work in 4.2 or 4.2.1, but probably will in 4.3 (we are voting on 4.3 now). No clue on other containers. - Mark On Apr 23, 2013, at 10:59 AM, Michael Della Bitta michael.della.bi...@appinions.com wrote: I believe as of 4.2 you can talk to any host in the cloud. Michael Della Bitta Appinions 18 East 41st Street, 2nd Floor New York, NY 10017-6271 www.appinions.com Where Influence Isn't a Game On Tue, Apr 23, 2013 at 10:45 AM, Arkadi Colson ark...@smartbit.be wrote: Hi Is it correct that when inserting or updating document into solr you have to talk to a solr host where at least one shard of that collection is stored? For select you can talk to any host within the collection.configName? BR, Arkadi
Re: EdgeGram filter
Well, you could copy to another field (using copyField) and then have an analyzer with a LimitTokenCountFilterFactory that accepts only 1 token, and then apply the EdgeNGramFilter to that one token. But you would have to query explicitly against that other field. Since you are using dismax, you should be able to add that second field to the qf parameter. And then remove the EdgeNGramFilter from your main field. -- Jack Krupansky -Original Message- From: hassancrowdc Sent: Tuesday, April 23, 2013 12:09 PM To: solr-user@lucene.apache.org Subject: EdgeGram filter Hi, I want to edgeNgram, let's say, this document that has 'difficult contents', so that if I query (using dismax) q=dif it shows me this result. This is working fine. But now if I search for q=con it gives me this document as well. Is there any way to only show this document when I search for 'dif' or 'di'? Basically I want to edge-gram 'difficult contents' as a whole, not 'difficult' and 'contents' separately. Any help? Thanks.
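A sketch of that setup in schema.xml (the field and type names are invented, and the gram sizes are arbitrary examples, so adjust to taste):

<copyField source="content" dest="content_prefix"/>

<field name="content_prefix" type="text_prefix" indexed="true" stored="false"/>

<fieldType name="text_prefix" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- keep only the first token, then edge-gram it -->
    <filter class="solr.LimitTokenCountFilterFactory" maxTokenCount="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="25" side="front"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

With dismax, content_prefix would then be added to the qf parameter as Jack describes.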
Re: Update on shards
Yeah, I'm confused now too. Do all Solr nodes in a distributed cloud really have to run in the same container type?? Why isn't it just raw HTTP for one cloud node to talk to another? I mean each node could/should be on another machine, right? -- Jack Krupansky -Original Message- From: Furkan KAMACI Sent: Tuesday, April 23, 2013 2:33 PM To: solr-user@lucene.apache.org Subject: Re: Update on shards Sorry, but I want to make things clear in my mind. Is there any documentation that explains Solr proxying? Is it the same thing as this: when I use SolrCloud, if I send a document to any of the nodes in my cluster, the document will be routed to the leader of the appropriate shard. So you mean I cannot do that if I use Tomcat? 2013/4/23 Mark Miller markrmil...@gmail.com The request proxying does not work with tomcat without calling an explicit flush in the code - jetty (which the unit tests are written against) worked without this flush. The flush is added to 4.3. - Mark On Apr 23, 2013, at 2:02 PM, Furkan KAMACI furkankam...@gmail.com wrote: Oops, Mark, you said: "If you use tomcat, this won't work in 4.2 or 4.2.1". Can you explain more about what won't work on Tomcat and what will change in 4.3? 2013/4/23 Mark Miller markrmil...@gmail.com If you use jetty - which you should :) It's what we test with. Tomcat only gets user testing. If you use tomcat, this won't work in 4.2 or 4.2.1, but probably will in 4.3 (we are voting on 4.3 now). No clue on other containers. - Mark On Apr 23, 2013, at 10:59 AM, Michael Della Bitta michael.della.bi...@appinions.com wrote: I believe as of 4.2 you can talk to any host in the cloud. Michael Della Bitta Appinions 18 East 41st Street, 2nd Floor New York, NY 10017-6271 www.appinions.com Where Influence Isn't a Game On Tue, Apr 23, 2013 at 10:45 AM, Arkadi Colson ark...@smartbit.be wrote: Hi Is it correct that when inserting or updating document into solr you have to talk to a solr host where at least one shard of that collection is stored? For select you can talk to any host within the collection.configName? BR, Arkadi
Re: Update on shards
This request proxying only applies to the read side. The write side forwards updates around, it doesn't proxy requests. - Mark On Apr 23, 2013, at 2:33 PM, Furkan KAMACI furkankam...@gmail.com wrote: Sorry but I want to make clears the things in my mind. Is there any documentation that explains Solr proxying? Is it same thing with that: when I use SolrCloud and if I send document any of the nodes at my cluster the document will be routed into the leader of appropriate shard. So you mean I can not do that if I use Tomcat? 2013/4/23 Mark Miller markrmil...@gmail.com The request proxying does not work with tomcat without calling an explicit flush in the code - jetty (which the unit tests are written against) worked without this flush. The flush is added to 4.3. - Mark On Apr 23, 2013, at 2:02 PM, Furkan KAMACI furkankam...@gmail.com wrote: Oopps, Mark you said: If you use tomcat, this won't work in 4.2 or 4.2.1 Can you explain more what won't be at Tomcat and what will change at 4.3? 2013/4/23 Mark Miller markrmil...@gmail.com If you use jetty - which you should :) It's what we test with. Tomcat only gets user testing. If you use tomcat, this won't work in 4.2 or 4.2.1, but probably will in 4.3 (we are voting on 4.3 now). No clue on other containers. - Mark On Apr 23, 2013, at 10:59 AM, Michael Della Bitta michael.della.bi...@appinions.com wrote: I believe as of 4.2 you can talk to any host in the cloud. Michael Della Bitta Appinions 18 East 41st Street, 2nd Floor New York, NY 10017-6271 www.appinions.com Where Influence Isn’t a Game On Tue, Apr 23, 2013 at 10:45 AM, Arkadi Colson ark...@smartbit.be wrote: Hi Is it correct that when inserting or updating document into solr you have to talk to a solr host where at least one shard of that collection is stored? For select you can talk to any host within the collection.configName? BR, Arkadi
Re: Autocommit and replication have been slowing down
On 4/23/2013 11:27 AM, gustavonasu wrote: We migrated recently from Solr 1.4 to 3.6.1. In the new version we have noticed that after some hours (around 8) the autocommit is taking more time to be executed. In the new version we have noticed that after some hours the autocommit is taking more time to be executed. We configured autocommit with maxDocs=50 and maxTime=1ms but we've gotten few (3-5) minutes to index documents (I got this time seeing the docsPending on the Update Stats and refresh page. Is there another way to verify that information?). Your question is a bit jumbled so I don't know exactly what you are saying for all of this, but I'll attempt to answer what I can. Usually if your commits are taking a really long time, it means you're running into one of two problems: 1) It is taking a really long time to autowarm your Solr caches. In most cases, it is the filterCache that takes the time, but not always. You can see how long it takes to warm the entire searcher as well as each individual cache in the Statistics page of the admin UI. To fix this, you have to reduce the autowarmCount on your caches, reduce the complexity of your queries and filters or both. 2) Your Java heap is getting exhausted and Java is spending too much time doing full garbage collections so it can keep working. Eventually this problem will result in OOM (Out of Memory) errors in your Solr log. To fix this, raise your max heap, which is the -Xmx java option when starting your servlet container. Raising the java heap might also require that you add physical RAM to your server. On version 3.6, I believe that an index update/commit that results in segment merging will wait for that merging to complete. If you do a lot of indexing, eventually you will run into a very large merge, and that can take a lot of time. This would not explain why every autoCommit is taking a long time, though - it would only explain one out of dozens or hundreds. A similar problem has been happening with the replication. We configured the pollInterval with 60s but the replication takes some minutes to be executed. You could see the timeElapsed value (around 6 minutes) on the Replication Stats. If you optimize your index, or do enough index updates so that a large merge takes place, then a very large portion of your index will be comprised of brand new files, and if your index is large, that can take a long time to replicate. It is also possible for the java heap problem (mentioned above) to cause this. Thanks, Shawn
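For reference, the autowarmCount knob Shawn mentions lives on the cache definitions in solrconfig.xml; a deliberately conservative sketch (the sizes are placeholders, not recommendations):

<filterCache class="solr.FastLRUCache" size="512" initialSize="512" autowarmCount="32"/>
<queryResultCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="0"/>

Lowering autowarmCount trades slower first queries after a commit for faster commits.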
Re: Update on shards
On 4/23/2013 10:14 AM, Mark Miller wrote: If you use jetty - which you should :) It's what we test with. Tomcat only gets user testing. If you use tomcat, this won't work in 4.2 or 4.2.1, but probably will in 4.3 (we are voting on 4.3 now). No clue on other containers. - Mark On Apr 23, 2013, at 10:59 AM, Michael Della Bitta michael.della.bi...@appinions.com wrote: I believe as of 4.2 you can talk to any host in the cloud. What exactly is the 'request proxying' thing that doesn't work on tomcat? Is this something different from basic SolrCloud operation where you send any kind of request to any server and they get directed where they need to go? I haven't heard of that not working on tomcat before. If there's a Jira issue that explains this in detail, you can just send me there. Thanks, Shawn
Re: dataimporthandler does not distribute documents on solr cloud
What version of Solr are you using? In Solr 4.2+ if you don't specify numShards when creating the collection, the implicit document router will be used. DIH running under the implicit document router most likely would not distribute documents. If this is the case you'll need to recreate the collection specifying numShards. On Tue, Apr 23, 2013 at 7:15 AM, Montu v Boda montu.b...@highqsolutions.com wrote: Hi, we have a Solr cloud with 4 shards, and when we try to import the data using dataimporthandler, it does not distribute documents across all 4 shards. Thanks & Regards Montu v Boda -- Joel Bernstein Professional Services LucidWorks
What is cluster overseer at SolrCloud?
When I read the SolrCloud wiki, it says something about a cluster overseer. What is its role in the read and write processes? How can I see which node is the overseer in my cluster?
Re: EdgeGram filter
Hi, I was unable to find more info about LimitTokenCountFilterFactory in the Solr wiki. Is there any other place to get a thorough description of what it does? Thanks. Alex. -Original Message- From: Jack Krupansky j...@basetechnology.com To: solr-user solr-user@lucene.apache.org Sent: Tue, Apr 23, 2013 11:36 am Subject: Re: EdgeGram filter Well, you could copy to another field (using copyField) and then have an analyzer with a LimitTokenCountFilterFactory that accepts only 1 token, and then apply the EdgeNGramFilter to that one token. But you would have to query explicitly against that other field. Since you are using dismax, you should be able to add that second field to the qf parameter. And then remove the EdgeNGramFilter from your main field. -- Jack Krupansky -Original Message- From: hassancrowdc Sent: Tuesday, April 23, 2013 12:09 PM To: solr-user@lucene.apache.org Subject: EdgeGram filter Hi, I want to edgeNgram, let's say, this document that has 'difficult contents', so that if I query (using dismax) q=dif it shows me this result. This is working fine. But now if I search for q=con it gives me this document as well. Is there any way to only show this document when I search for 'dif' or 'di'? Basically I want to edge-gram 'difficult contents' as a whole, not 'difficult' and 'contents' separately. Any help? Thanks.
Re: What is cluster overseer at SolrCloud?
On Apr 23, 2013, at 2:53 PM, Furkan KAMACI furkankam...@gmail.com wrote: When I read the SolrCloud wiki, it says something about a cluster overseer. What is its role in the read and write processes? How can I see which node is the overseer in my cluster? The Overseer's main responsibility is to write the clusterstate.json file based on what individual nodes publish to ZooKeeper. It also does other things, like assign shard and node names. If the Overseer dies, another Overseer is elected and it starts processing the work queue where the dead Overseer left off. You can see which node is the Overseer by going to the Cloud view in the admin UI. Click the Tree tab. Under /overseer_elect, click on the leader node. Part of its id should tell you which node is acting as the overseer. - Mark
RE: EdgeGram filter
Always check the javadocs. There's a lot of info to be found there: http://lucene.apache.org/core/4_0_0-BETA/analyzers-common/org/apache/lucene/analysis/miscellaneous/LimitTokenCountFilterFactory.html -Original message- From:alx...@aim.com alx...@aim.com Sent: Tue 23-Apr-2013 21:06 To: solr-user@lucene.apache.org Subject: Re: EdgeGram filter Hi, I was unable to find more info about LimitTokenCountFilterFactory in solr wiki. Is there any other place to get thorough description of what it does? Thanks. Alex. -Original Message- From: Jack Krupansky j...@basetechnology.com To: solr-user solr-user@lucene.apache.org Sent: Tue, Apr 23, 2013 11:36 am Subject: Re: EdgeGram filter Well, you could copy to another field (using copyField) and then have an analyzer with a LimitTokenCountFilterFactory that accepts only 1 token, and then apply the EdgeNGramFilter to that one token. But you would have to query explicitly against that other field. Since you are using dismax, you should be able to add that second field to the qf parameter. And then remove the EdgeNGramFilter from your main field. -- Jack Krupansky -Original Message- From: hassancrowdc Sent: Tuesday, April 23, 2013 12:09 PM To: solr-user@lucene.apache.org Subject: EdgeGram filter Hi, I want to edgeNgram let's say this document that has 'difficult contents' so that if i query (using disman) q=dif it shows me this result. This is working fine. But now if i search for q=con it gives me this document as well. is there any way to only show this document when i search for 'dif' or 'di'. basically i want to edgegram 'difficultcontent' not 'difficult' and 'content'. Any help? Thanks. -- View this message in context: http://lucene.472066.n3.nabble.com/EdgeGram-filter-tp4058337.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: What is cluster overseer at SolrCloud?
Thanks for the explanation. 2013/4/23 Mark Miller markrmil...@gmail.com On Apr 23, 2013, at 2:53 PM, Furkan KAMACI furkankam...@gmail.com wrote: When I read about SolrCloud wiki there writes something about cluster overseer. What is the role of that at read and write processes? How can I see which node is overseer at my cluster? The Overseer's main responsibility is to write the clusterstate.json file based on what individual nodes publish to ZooKeeper. It also does other things, like assign shard and node names. If the Overseer dies, another Overseer is elected and it starts processing the work queue where the dead Oveseer left off. You can see which node is the Overseer by going to the Cloud view in the admin UI. Click the Tree tab. Under /overseer_elect, click on the leader node. Part of it's id should tell you which node is acting as the overseer. - Mark
Re: Update on shards
On Apr 23, 2013, at 2:49 PM, Shawn Heisey s...@elyograg.org wrote: What exactly is the 'request proxying' thing that doesn't work on tomcat? Is this something different from basic SolrCloud operation where you send any kind of request to any server and they get directed where they need to go? I haven't heard of that not working on tomcat before. Before 4.2, if you made a read request to a node that didn't contain part of the collection you were searching, it would return 404. Write requests would be forwarded to where they belong no matter what node you sent them to, but read requests required that the node have a part of the collection you were accessing. In 4.2 we added request proxying for this read side case. If a piece of the collection you are querying is not found on the node you hit, a simple proxy of the request is done to a node that does contain a piece of the collection. - Mark
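For SolrJ users, one way to sidestep the container-specific proxying on the read side entirely is the ZooKeeper-aware client, which reads cluster state itself and sends requests straight to nodes hosting the collection. A minimal sketch (the addresses and collection name are placeholders):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.CloudSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class CloudClientDemo {
    public static void main(String[] args) throws Exception {
        // Watches ZooKeeper for cluster state, so no server-side proxying is needed.
        CloudSolrServer client = new CloudSolrServer("zk1:2181,zk2:2181,zk3:2181");
        client.setDefaultCollection("collection1");

        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", "1");
        client.add(doc);
        client.commit();

        System.out.println(client.query(new SolrQuery("*:*")).getResults().getNumFound());
        client.shutdown();
    }
}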
Re: dataimporthandler does not distribute documents on solr cloud
Actually, it is Solr 4.1+ where the implicit router will be used if numShards is not specified. On Tue, Apr 23, 2013 at 2:52 PM, Joel Bernstein joels...@gmail.com wrote: What version of Solr are you using? In Solr 4.2+ if you don't specify numShards when creating the collection, the implicit document router will be used. DIH running under the implicit document router most likely would not distribute documents. If this is the case you'll need to recreate the collection specifying numShards. On Tue, Apr 23, 2013 at 7:15 AM, Montu v Boda montu.b...@highqsolutions.com wrote: Hi, we have a Solr cloud with 4 shards, and when we try to import the data using dataimporthandler, it does not distribute documents across all 4 shards. Thanks & Regards Montu v Boda -- Joel Bernstein Professional Services LucidWorks -- Joel Bernstein Professional Services LucidWorks
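For reference, a collection created with an explicit shard count (so documents get hashed across the shards rather than falling back to the implicit router) might be set up along these lines - the host, names and counts are placeholders:

curl 'http://host:8983/solr/admin/collections?action=CREATE&name=mycollection&numShards=4&replicationFactor=1&collection.configName=myconf'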
Re: Update on shards
Hi Mark; All in all, you are saying that when 4.3 is tagged in the repository (I mean, when it is ready), this feature will work on Tomcat too in a stable version? 2013/4/23 Mark Miller markrmil...@gmail.com On Apr 23, 2013, at 2:49 PM, Shawn Heisey s...@elyograg.org wrote: What exactly is the 'request proxying' thing that doesn't work on tomcat? Is this something different from basic SolrCloud operation where you send any kind of request to any server and they get directed where they need to go? I haven't heard of that not working on tomcat before. Before 4.2, if you made a read request to a node that didn't contain part of the collection you were searching, it would return 404. Write requests would be forwarded to where they belong no matter what node you sent them to, but read requests required that the node have a part of the collection you were accessing. In 4.2 we added request proxying for this read side case. If a piece of the collection you are querying is not found on the node you hit, a simple proxy of the request is done to a node that does contain a piece of the collection. - Mark
Re: SolrEntityProcessor doesn't grok responseHeader tag in Ancient Solr 1.2 source
Thanks Erik. I remember Solr Flare :) On Tue, Apr 23, 2013 at 11:56 AM, Erik Hatcher erik.hatc...@gmail.com wrote: You might be out of luck with the SolrEntityProcessor. I'd recommend writing a simple little script that pages through /select?q=*:* from the source Solr and writes to the destination Solr. Back in the day there was this fun little beast https://github.com/erikhatcher/solr-ruby-flare/blob/master/solr-ruby/lib/solr/importer/solr_source.rb where you could do something like this: Solr::Indexer.new(SolrSource.new(...), mapping).index Erik On Apr 23, 2013, at 13:41, P Williams wrote: Hi, I'd like to use the SolrEntityProcessor to partially migrate an old index to Solr 4.1. The source is pretty old (dated 2006-06-10 16:05:12Z)... maybe Solr 1.2? My data-config.xml is based on the SolrEntityProcessor example http://wiki.apache.org/solr/DataImportHandler#SolrEntityProcessor and wt=xml. I'm getting an error from SolrJ complaining about <responseHeader><status>0</status><QTime>1</QTime></responseHeader> in the response. Does anyone know of a work-around? Thanks, Tricia 1734 T12 C0 oasc.SolrException.log SEVERE Exception while processing: sep document : SolrInputDocument[]:org.apache.solr.handler.dataimport.DataImportHandlerException: org.apache.solr.common.SolrException: parsing error Caused by: org.apache.solr.common.SolrException: parsing error Caused by: java.lang.RuntimeException: this must be known type! not: responseHeader at org.apache.solr.client.solrj.impl.XMLResponseParser.readNamedList(XMLResponseParser.java:222) at org.apache.solr.client.solrj.impl.XMLResponseParser.processResponse(XMLResponseParser.java:128) ... 43 more
Re: Using Solr For a Real Search Engine
At first I will work on 100 Solr nodes and I want to use Tomcat as container and deploy Solr as a war. I just wonder what folks are using for large systems and what kind of problems or benefits they have with their choices. 2013/3/26 Otis Gospodnetic otis.gospodne...@gmail.com Hi, This question is too open-ended for anyone to give you a good answer. Maybe you want to ask more specific questions? As for embedding vs. war, start with a simpler war and think about the alternatives if that doesn't work for you. Otis -- Solr ElasticSearch Support http://sematext.com/ On Fri, Mar 22, 2013 at 8:07 AM, Furkan KAMACI furkankam...@gmail.com wrote: If I want to use Solr in a web search engine what kind of strategies should I follow about how to run Solr. I mean I can run it via embedded jetty or use war and deploy to a container? You should consider that I will have heavy work load on my Solr.
Re: Is there a way to load multiple schema when using zookeeper?
Ah cool, thanks for clarifying Chris - some of that multi-config management stuff gets confusing but much clearer from your description. Cheers, Tim On Tue, Apr 23, 2013 at 11:36 AM, Chris Hostetter hossman_luc...@fucit.org wrote: : Yes, you can effectively chroot all the configs for a collection (to : support multiple collections in same ensemble) - see wiki: : http://wiki.apache.org/solr/SolrCloud#Zookeeper_chroot I don't think chroot is suitable for what's being asked about here ... that would completely isolate two cloud clusters from eachother. i believe the intent of the question is asking about having a cloud cluster in which multiple collections exist, and some collections use differnet schema.xml files then other collections the short answer is: absolutely, each collecion can use completley differnet sets of configs (just like in non-cloud mode each core can use distinct configs) but to help make management easier when you have many collections re-using the same set of configs there is the concept of a config set ... you can push a set of configs into ZK with a specific configName, nad then refer to that config set name when creating collections... https://wiki.apache.org/solr/SolrCloud#Getting_your_Configuration_Files_into_ZooKeeper http://wiki.apache.org/solr/SolrCloud#Managing_collections_via_the_Collections_API -Hoss
Re: spellcheck: change in behavior and QTime
I apologize for the length of the previous message. I do see a problem with spellcheck becoming faster (notice QTime). I also see an increase in the number of cache hits if spellcheck=false is run one time followed by the original spellcheck query. Seems like spellcheck=false alters the behavior of spellcheck. http://host/solr/select?spellcheck=true&spellcheck.q=cucoo's+nest&df=spell http://host/solr/select?spellcheck=false&spellcheck.q=cucoo's+nest&df=spell http://host/solr/select?spellcheck=true&spellcheck.q=cucoo's+nest&df=spell --- see a faster response and an increase in the number of query cache hits. Thanks. -- Sandeep
Re: Update on shards
We have a 3rd release candidate for 4.3 being voted on now. I have never tested this feature with Tomcat - only Jetty. Users have reported it does not work with Tomcat. That leads one to think it may have a problem in other containers as well. A previous contributor donated a patch that explicitly flushes a stream in our proxy code - he says this allows the feature to work with Tomcat. I committed the fix - the flush can't hurt, and given the previous contributions of this individual, I'm fairly confident the fix makes things work in Tomcat. I have no first-hand knowledge that it does work though. You might take the RC for a spin and test it out yourself: http://people.apache.org/~simonw/staging_area/lucene-solr-4.3.0-RC3-rev1470846/ - Mark On Apr 23, 2013, at 3:20 PM, Furkan KAMACI furkankam...@gmail.com wrote: Hi Mark; All in all, you are saying that when 4.3 is tagged in the repository (I mean, when it is ready), this feature will work on Tomcat too in a stable version? 2013/4/23 Mark Miller markrmil...@gmail.com On Apr 23, 2013, at 2:49 PM, Shawn Heisey s...@elyograg.org wrote: What exactly is the 'request proxying' thing that doesn't work on tomcat? Is this something different from basic SolrCloud operation where you send any kind of request to any server and they get directed where they need to go? I haven't heard of that not working on tomcat before. Before 4.2, if you made a read request to a node that didn't contain part of the collection you were searching, it would return 404. Write requests would be forwarded to where they belong no matter what node you sent them to, but read requests required that the node have a part of the collection you were accessing. In 4.2 we added request proxying for this read side case. If a piece of the collection you are querying is not found on the node you hit, a simple proxy of the request is done to a node that does contain a piece of the collection. - Mark
Re: Using Solr For a Real Search Engine
Tomcat should work just fine in most cases. The downside to Tomcat is that all of the devs generally run Jetty since it's the default. Also, all of our unit tests run against Jetty - in fact, a specific version of Jetty. Usually, Solr will run fine in other containers. Many, many users run Solr in other containers. All of our tests run against a specific version of Jetty though. In some (generally rare) cases, that means something might work with Jetty and not another container until/unless the issue is reported by a user and fixed. - Mark On Apr 23, 2013, at 3:25 PM, Furkan KAMACI furkankam...@gmail.com wrote: At first I will work on 100 Solr nodes and I want to use Tomcat as the container and deploy Solr as a war. I just wonder what folks are using for large systems and what kind of problems or benefits they have with their choices. 2013/3/26 Otis Gospodnetic otis.gospodne...@gmail.com Hi, This question is too open-ended for anyone to give you a good answer. Maybe you want to ask more specific questions? As for embedding vs. war, start with a simpler war and think about the alternatives if that doesn't work for you. Otis -- Solr ElasticSearch Support http://sematext.com/ On Fri, Mar 22, 2013 at 8:07 AM, Furkan KAMACI furkankam...@gmail.com wrote: If I want to use Solr in a web search engine what kind of strategies should I follow about how to run Solr. I mean I can run it via embedded jetty or use war and deploy to a container? You should consider that I will have heavy work load on my Solr.
Re: Is there a way to load multiple schema when using zookeeper?
If I already have a ZooKeeper cluster for my HBase cluster, can I use the same ZooKeeper cluster for my SolrCloud too? 2013/4/23 Timothy Potter thelabd...@gmail.com Ah cool, thanks for clarifying Chris - some of that multi-config management stuff gets confusing but much clearer from your description. Cheers, Tim On Tue, Apr 23, 2013 at 11:36 AM, Chris Hostetter hossman_luc...@fucit.org wrote: : Yes, you can effectively chroot all the configs for a collection (to : support multiple collections in same ensemble) - see wiki: : http://wiki.apache.org/solr/SolrCloud#Zookeeper_chroot I don't think chroot is suitable for what's being asked about here ... that would completely isolate two cloud clusters from each other. I believe the intent of the question is asking about having a cloud cluster in which multiple collections exist, and some collections use different schema.xml files than other collections. The short answer is: absolutely, each collection can use completely different sets of configs (just like in non-cloud mode each core can use distinct configs), but to help make management easier when you have many collections re-using the same set of configs there is the concept of a config set ... you can push a set of configs into ZK with a specific configName, and then refer to that config set name when creating collections... https://wiki.apache.org/solr/SolrCloud#Getting_your_Configuration_Files_into_ZooKeeper http://wiki.apache.org/solr/SolrCloud#Managing_collections_via_the_Collections_API -Hoss
Re: Solr index searcher to lucene index searcher
As Timothy mentioned, Solr has the PostFilter mechanism, but it's not really suited for ranking/sorting changes. To affect the ranking you'd need to work with the TopScoreDocCollector, which Solr does not give you access to. If you're doing distributed search, you'd need to account for the ranking algorithm at the aggregation step as well. There is a pluggable-collectors jira that builds under Solr 4.1 (SOLR-4465), but it is a proof of concept at this time. You may want to chime in on that ticket if you find it useful. On Tue, Apr 23, 2013 at 1:21 PM, Timothy Potter thelabd...@gmail.com wrote: Take a look at Solr's DelegatingCollector - this article might be of interest too: http://hokiesuns.blogspot.com/2012/11/using-solrs-postfiltering-to-collect.html On Tue, Apr 23, 2013 at 10:32 AM, parnab kumar parnab.2...@gmail.com wrote: Hi Timothy, thanks for pointing that out, but I have a specific requirement. Any query passes through the search handler, and Solr finally directs it to the Lucene IndexSearcher. As results are matched and collected as TopDocs in Lucene, I want to inspect the top K docs, reorder them by some logic, and pass the final TopDocs to Solr, which Solr may send as a response. I need to know the point where this interaction between Solr and Lucene actually takes place. Can anyone please help with where to look for this purpose? Thanks.. Pom On Tue, Apr 23, 2013 at 9:25 PM, Timothy Potter thelabd...@gmail.com wrote: org.apache.solr.search.SolrIndexSearcher On Tue, Apr 23, 2013 at 9:51 AM, parnab kumar parnab.2...@gmail.com wrote: Hi, Can anyone please point out where a Solr search originates and how it passes to the Lucene index searcher and back to Solr? I actually want to know which class in Solr directly calls the Lucene IndexSearcher. Thanks. Pom -- Joel Bernstein Professional Services LucidWorks
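For the record, the DelegatingCollector hook Timothy mentions looks roughly like the sketch below. It lets a PostFilter see each matching doc before it reaches the next collector in the chain - good for filtering, but, as Joel notes, not for re-sorting, since the scoring collector sits upstream. The filter logic here is invented purely for illustration:

    import java.io.IOException;
    import org.apache.solr.search.DelegatingCollector;

    // A minimal DelegatingCollector: see every hit and decide whether to
    // pass it down the chain. This can drop documents, but it cannot
    // reorder them - ordering is fixed by the upstream collector.
    public class EvenDocsCollector extends DelegatingCollector {
        @Override
        public void collect(int doc) throws IOException {
            if (doc % 2 == 0) {     // stand-in for real business logic
                super.collect(doc); // forward the hit to the delegate
            }
        }
    }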
Reordered DBQ.
Hi, Recently I noticed a lot of "Reordered DBQs detected" messages in the logs. As far as I can tell from the logs it could be related to deleting documents, but I'm not sure. Do you know what causes these messages?
Apr 23, 2013 1:20:14 AM org.apache.solr.search.SolrIndexSearcher init
INFO: Opening Searcher@68c8e122 realtime
Apr 23, 2013 1:20:15 AM org.apache.solr.update.processor.LogUpdateProcessor finish
INFO: [collection] webapp=/solr path=/update params={update.distrib=FROMLEADER&_version_=-1433067860561756160&update.from=http://host:8983/solr/collection/&wt=javabin&version=2} {deleteByQuery=cmpy:1160027 (-1433067860561756160)} 0 1478
Apr 23, 2013 1:20:15 AM org.apache.solr.update.DirectUpdateHandler2 addDoc
INFO: Reordered DBQs detected. Update=add{_version_=1433067860472627200,id=17183780} DBQs=[DBQ{version=1433067860561756160,q=cmpy:1160027}]
Apr 23, 2013 1:20:15 AM org.apache.solr.search.SolrIndexSearcher init
INFO: Opening Searcher@123289e6 realtime
Apr 23, 2013 1:20:16 AM org.apache.solr.update.DirectUpdateHandler2 addDoc
INFO: Reordered DBQs detected. Update=add{_version_=1433067860476821504,id=20102172} DBQs=[DBQ{version=1433067860561756160,q=cmpy:1160027}]
Apr 23, 2013 1:20:16 AM org.apache.solr.update.DirectUpdateHandler2 commit
INFO: start commit{,optimize=false,openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=false,prepareCommit=false}
Re: Using Solr For a Real Search Engine
Thanks for the answer. It would be nice if I could find something that explains using embedded Jetty, standalone Jetty, or Tomcat. 2013/4/23 Mark Miller markrmil...@gmail.com Tomcat should work just fine in most cases. The downside to Tomcat is that all of the devs generally run Jetty, since it's the default. Also, all of our unit tests run against Jetty - in fact, a specific version of Jetty. Usually, Solr will run fine in other webapps; many, many users run Solr in other webapps. All of our tests run against a specific version of Jetty though. In some (generally rare) cases, that means something might work with Jetty and not another container until/unless the issue is reported by a user and fixed. - Mark On Apr 23, 2013, at 3:25 PM, Furkan KAMACI furkankam...@gmail.com wrote: At first I will work on 100 Solr nodes and I want to use Tomcat as the container and deploy Solr as a war. I just wonder what folks are using for large systems and what kind of problems or benefits they have had with their choices. 2013/3/26 Otis Gospodnetic otis.gospodne...@gmail.com Hi, This question is too open-ended for anyone to give you a good answer. Maybe you want to ask more specific questions? As for embedding vs. war, start with the simpler war and think about the alternatives if that doesn't work for you. Otis -- Solr ElasticSearch Support http://sematext.com/ On Fri, Mar 22, 2013 at 8:07 AM, Furkan KAMACI furkankam...@gmail.com wrote: If I want to use Solr in a web search engine, what kind of strategy should I follow for how to run Solr? I mean, I could run it via embedded Jetty, or use the war and deploy it to a container. You should consider that I will have a heavy workload on my Solr.
Too many unique terms
Hi there, Looking at one of my shards (about 1M docs) I see a lot of unique terms - more than 8M, which is a significant part of my total term count. These are very likely useless terms: binaries or other meaningless numbers that come with a few of my docs. I am totally fine with deleting them so that these terms become unsearchable. Thinking about it, I see that: 1. It is impossible to know a priori whether a term is unique, so I cannot add them to my stop words. 2. I get a performance decrease because my cached chunks contain useless data, and I'm short on memory. Assuming a constant index, is there a way of deleting all unique terms from at least the dictionary (tim and tip) files? Would I get a significant query-time performance increase? Does anybody know a class of regexes that identifies meaningless terms, which I could add to my update processor? Thanks Manu
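One way to act on that last idea is a custom update processor that strips junk tokens before they are ever indexed. The sketch below is not an answer from the thread, just an illustration of the UpdateRequestProcessor API - the "content" field name and the deliberately naive "looks like garbage" regex are both invented:

    import java.io.IOException;
    import java.util.ArrayList;
    import java.util.List;
    import org.apache.solr.common.SolrInputDocument;
    import org.apache.solr.update.AddUpdateCommand;
    import org.apache.solr.update.processor.UpdateRequestProcessor;

    public class JunkTermFilterProcessor extends UpdateRequestProcessor {
        // Naive heuristic: long digit runs and base64-ish blobs are noise.
        private static final String JUNK = ".*[0-9]{6,}.*|[A-Za-z0-9+/=]{20,}";

        public JunkTermFilterProcessor(UpdateRequestProcessor next) {
            super(next);
        }

        @Override
        public void processAdd(AddUpdateCommand cmd) throws IOException {
            SolrInputDocument doc = cmd.getSolrInputDocument();
            Object content = doc.getFieldValue("content"); // invented field
            if (content instanceof String) {
                List<String> kept = new ArrayList<String>();
                for (String tok : ((String) content).split("\\s+")) {
                    if (!tok.matches(JUNK)) {
                        kept.add(tok);
                    }
                }
                doc.setField("content", join(kept));
            }
            super.processAdd(cmd); // hand the cleaned doc down the chain
        }

        private static String join(List<String> toks) {
            StringBuilder sb = new StringBuilder();
            for (String t : toks) {
                if (sb.length() > 0) sb.append(' ');
                sb.append(t);
            }
            return sb.toString();
        }
    }

Note this only prevents new junk terms from being indexed; terms already in the .tim/.tip files would only go away on a full reindex.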
RE: DirectSolrSpellChecker : vastly varying spellcheck QTime times.
James, Is there a way to determine how many times the collations were tried? Is there a parameter that can be issued to return this in the debug information? This would be very helpful. Appreciate your help with this. Thanks. -- Sandeep
Re: Is there a way to load multiple schema when using zookeeper?
Yes - better use of existing resources. In this case, the chroot would be helpful to keep the Solr znodes separate from HBase's. Solr in steady state doesn't put a lot of stress on Zookeeper; for the most part, my zk nodes are snoozing. On Tue, Apr 23, 2013 at 1:46 PM, Furkan KAMACI furkankam...@gmail.com wrote: If I have a Zookeeper cluster for my HBase cluster already, can I use the same Zookeeper cluster for my SolrCloud too? 2013/4/23 Timothy Potter thelabd...@gmail.com Ah cool, thanks for clarifying Chris - some of that multi-config management stuff gets confusing, but it's much clearer from your description. Cheers, Tim On Tue, Apr 23, 2013 at 11:36 AM, Chris Hostetter hossman_luc...@fucit.org wrote: : Yes, you can effectively chroot all the configs for a collection (to : support multiple collections in same ensemble) - see wiki: : http://wiki.apache.org/solr/SolrCloud#Zookeeper_chroot I don't think chroot is suitable for what's being asked about here ... that would completely isolate two cloud clusters from each other. I believe the intent of the question is having a cloud cluster in which multiple collections exist, where some collections use different schema.xml files than other collections. The short answer is: absolutely, each collection can use a completely different set of configs (just like in non-cloud mode each core can use distinct configs), but to help make management easier when you have many collections re-using the same set of configs there is the concept of a config set ... you can push a set of configs into ZK with a specific configName, and then refer to that config set name when creating collections... https://wiki.apache.org/solr/SolrCloud#Getting_your_Configuration_Files_into_ZooKeeper http://wiki.apache.org/solr/SolrCloud#Managing_collections_via_the_Collections_API -Hoss
Re: Is there a way to load multiple schema when using zookeeper?
On 4/23/2013 1:46 PM, Furkan KAMACI wrote: If I have a Zookeeper cluster for my HBase cluster already, can I use the same Zookeeper cluster for my SolrCloud too? Yes, you can. It is strongly recommended that you use a chroot with the zkHost parameter if you are sharing zookeeper. It's a really good idea to use a chroot even if you're not sharing. Here's an example zkHost parameter with a chroot of /mysolr: zoo1.example.com:2181,zoo2.example.com:2181,zoo3.example.com:2181/mysolr You only specify the chroot once at the end, not on every host entry. This information is also in the zookeeper documentation. Thanks, Shawn
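On the client side, a chrooted zkHost string is passed straight to SolrJ. A small sketch reusing Shawn's hypothetical hosts and /mysolr chroot; the collection name is assumed:

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.impl.CloudSolrServer;
    import org.apache.solr.client.solrj.response.QueryResponse;

    public class ChrootClient {
        public static void main(String[] args) throws Exception {
            // The /mysolr chroot appears once, after the last host:port pair.
            CloudSolrServer server = new CloudSolrServer(
                "zoo1.example.com:2181,zoo2.example.com:2181,zoo3.example.com:2181/mysolr");
            server.setDefaultCollection("collection1"); // assumed name
            QueryResponse rsp = server.query(new SolrQuery("*:*"));
            System.out.println("numFound: " + rsp.getResults().getNumFound());
            server.shutdown();
        }
    }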
Re: Is there a way to load multiple schema when using zookeeper?
I will use Nutch with MapReduce to crawl huge amounts of data, and SolrCloud to serve many users with fast response times. Actually I wonder about the performance implications of separating the Zookeeper clusters versus using one cluster for both HBase and Solr. 2013/4/23 Shawn Heisey s...@elyograg.org On 4/23/2013 1:46 PM, Furkan KAMACI wrote: If I have a Zookeeper cluster for my HBase cluster already, can I use the same Zookeeper cluster for my SolrCloud too? Yes, you can. It is strongly recommended that you use a chroot with the zkHost parameter if you are sharing zookeeper. It's a really good idea to use a chroot even if you're not sharing. Here's an example zkHost parameter with a chroot of /mysolr: zoo1.example.com:2181,zoo2.example.com:2181,zoo3.example.com:2181/mysolr You only specify the chroot once at the end, not on every host entry. This information is also in the zookeeper documentation. Thanks, Shawn
Re: Using Solr For a Real Search Engine
My 2 cents on this: if you have a choice, just stick with Jetty. This article has some pretty convincing information: http://www.openlogic.com/wazi/bid/257366/Power-Java-based-web-apps-with-Jetty-application-server The folks over at OpenLogic definitely know their stuff when it comes to supporting open source Java app servers. I was impressed by the fact that Google migrated from Tomcat to Jetty for App Engine, which is pretty compelling evidence that Jetty works well in a very large cluster. Lastly, the bulk of the processing in Solr happens in Solr/Lucene code, and Jetty (or whatever engine you choose) is a very small part of any request. On Tue, Apr 23, 2013 at 1:52 PM, Furkan KAMACI furkankam...@gmail.com wrote: Thanks for the answer. It would be nice if I could find something that explains using embedded Jetty, standalone Jetty, or Tomcat. 2013/4/23 Mark Miller markrmil...@gmail.com Tomcat should work just fine in most cases. The downside to Tomcat is that all of the devs generally run Jetty, since it's the default. Also, all of our unit tests run against Jetty - in fact, a specific version of Jetty. Usually, Solr will run fine in other webapps; many, many users run Solr in other webapps. All of our tests run against a specific version of Jetty though. In some (generally rare) cases, that means something might work with Jetty and not another container until/unless the issue is reported by a user and fixed. - Mark On Apr 23, 2013, at 3:25 PM, Furkan KAMACI furkankam...@gmail.com wrote: At first I will work on 100 Solr nodes and I want to use Tomcat as the container and deploy Solr as a war. I just wonder what folks are using for large systems and what kind of problems or benefits they have had with their choices. 2013/3/26 Otis Gospodnetic otis.gospodne...@gmail.com Hi, This question is too open-ended for anyone to give you a good answer. Maybe you want to ask more specific questions? As for embedding vs. war, start with the simpler war and think about the alternatives if that doesn't work for you. Otis -- Solr ElasticSearch Support http://sematext.com/ On Fri, Mar 22, 2013 at 8:07 AM, Furkan KAMACI furkankam...@gmail.com wrote: If I want to use Solr in a web search engine, what kind of strategy should I follow for how to run Solr? I mean, I could run it via embedded Jetty, or use the war and deploy it to a container. You should consider that I will have a heavy workload on my Solr.
Re: Using Solr For a Real Search Engine
Is there any documentation that explains using Jetty embedded or standalone? I currently deploy Solr in Tomcat, but after your message I will consider Jetty. If we think about other issues, e.g. when I want to update my Solr jars/wars etc. (this is just a contrived example), do Tomcat or Jetty have any particular pros and cons? 2013/4/23 Timothy Potter thelabd...@gmail.com My 2 cents on this: if you have a choice, just stick with Jetty. This article has some pretty convincing information: http://www.openlogic.com/wazi/bid/257366/Power-Java-based-web-apps-with-Jetty-application-server The folks over at OpenLogic definitely know their stuff when it comes to supporting open source Java app servers. I was impressed by the fact that Google migrated from Tomcat to Jetty for App Engine, which is pretty compelling evidence that Jetty works well in a very large cluster. Lastly, the bulk of the processing in Solr happens in Solr/Lucene code, and Jetty (or whatever engine you choose) is a very small part of any request. On Tue, Apr 23, 2013 at 1:52 PM, Furkan KAMACI furkankam...@gmail.com wrote: Thanks for the answer. It would be nice if I could find something that explains using embedded Jetty, standalone Jetty, or Tomcat. 2013/4/23 Mark Miller markrmil...@gmail.com Tomcat should work just fine in most cases. The downside to Tomcat is that all of the devs generally run Jetty, since it's the default. Also, all of our unit tests run against Jetty - in fact, a specific version of Jetty. Usually, Solr will run fine in other webapps; many, many users run Solr in other webapps. All of our tests run against a specific version of Jetty though. In some (generally rare) cases, that means something might work with Jetty and not another container until/unless the issue is reported by a user and fixed. - Mark On Apr 23, 2013, at 3:25 PM, Furkan KAMACI furkankam...@gmail.com wrote: At first I will work on 100 Solr nodes and I want to use Tomcat as the container and deploy Solr as a war. I just wonder what folks are using for large systems and what kind of problems or benefits they have had with their choices. 2013/3/26 Otis Gospodnetic otis.gospodne...@gmail.com Hi, This question is too open-ended for anyone to give you a good answer. Maybe you want to ask more specific questions? As for embedding vs. war, start with the simpler war and think about the alternatives if that doesn't work for you. Otis -- Solr ElasticSearch Support http://sematext.com/ On Fri, Mar 22, 2013 at 8:07 AM, Furkan KAMACI furkankam...@gmail.com wrote: If I want to use Solr in a web search engine, what kind of strategy should I follow for how to run Solr? I mean, I could run it via embedded Jetty, or use the war and deploy it to a container. You should consider that I will have a heavy workload on my Solr.
Re: Using Solr For a Real Search Engine
On 4/23/2013 1:52 PM, Furkan KAMACI wrote: Thanks for the answer. It would be nice if I could find something that explains using embedded Jetty, standalone Jetty, or Tomcat. 2013/4/23 Mark Miller markrmil...@gmail.com Tomcat should work just fine in most cases. The downside to Tomcat is that all of the devs generally run Jetty, since it's the default. Also, all of our unit tests run against Jetty - in fact, a specific version of Jetty. Usually, Solr will run fine in other webapps; many, many users run Solr in other webapps. All of our tests run against a specific version of Jetty though. In some (generally rare) cases, that means something might work with Jetty and not another container until/unless the issue is reported by a user and fixed. Mark outlines a really good reason to use Jetty - it's extremely well tested. New tests are being added all the time, and most of those will start Jetty to run. If you don't already have a good reason to use a container other than the Jetty included in Solr, then go and copy the example setup and modify it until it does what you need. The one thing that's really missing is an init script to manage Solr startup and shutdown. I plan to do something about that, but I've got a lot of cleanup to do on it. I've only come across one truly compelling reason to use something else: if your system admins are already familiar with Tomcat, Glassfish, or something else, then you probably want to stick with that. For instance, you may have automation in place for deploying and managing farms of Tomcat servers. Switching would likely be too painful. There could be features useful for Solr in other containers that I don't know about. If there are, and someone has a good reason for needing those features, let us know about them. Update the wiki. Jetty is a low-overhead servlet container without a lot of fancy features. The Jetty instance that is included in the Solr example is a bare-bones setup. It does not include all of the jars or config found in a full Jetty download, because those features are not needed for Solr. Thanks, Shawn
Re: Using Solr For a Real Search Engine
Based on the answers here, I will try Jetty for a SolrCloud system doing huge crawls and serving searches that need fast response times. If anyone has a good reason to use something else, they can explain it here, as you say. By the way, Shawn, when I read your answer I understood that I should choose the embedded Jetty - is that right? 2013/4/23 Shawn Heisey s...@elyograg.org On 4/23/2013 1:52 PM, Furkan KAMACI wrote: Thanks for the answer. It would be nice if I could find something that explains using embedded Jetty, standalone Jetty, or Tomcat. 2013/4/23 Mark Miller markrmil...@gmail.com Tomcat should work just fine in most cases. The downside to Tomcat is that all of the devs generally run Jetty, since it's the default. Also, all of our unit tests run against Jetty - in fact, a specific version of Jetty. Usually, Solr will run fine in other webapps; many, many users run Solr in other webapps. All of our tests run against a specific version of Jetty though. In some (generally rare) cases, that means something might work with Jetty and not another container until/unless the issue is reported by a user and fixed. Mark outlines a really good reason to use Jetty - it's extremely well tested. New tests are being added all the time, and most of those will start Jetty to run. If you don't already have a good reason to use a container other than the Jetty included in Solr, then go and copy the example setup and modify it until it does what you need. The one thing that's really missing is an init script to manage Solr startup and shutdown. I plan to do something about that, but I've got a lot of cleanup to do on it. I've only come across one truly compelling reason to use something else: if your system admins are already familiar with Tomcat, Glassfish, or something else, then you probably want to stick with that. For instance, you may have automation in place for deploying and managing farms of Tomcat servers. Switching would likely be too painful. There could be features useful for Solr in other containers that I don't know about. If there are, and someone has a good reason for needing those features, let us know about them. Update the wiki. Jetty is a low-overhead servlet container without a lot of fancy features. The Jetty instance that is included in the Solr example is a bare-bones setup. It does not include all of the jars or config found in a full Jetty download, because those features are not needed for Solr. Thanks, Shawn
RE: DirectSolrSpellChecker : vastly varying spellcheck QTime times.
If you enable debug-level logging for the class org.apache.solr.spelling.SpellCheckCollator, you should get a log message for every collation it tries, like this: Collation: will return zzz hits. James Dyer Ingram Content Group (615) 213-4311 -Original Message- From: SandeepM [mailto:skmi...@hotmail.com] Sent: Tuesday, April 23, 2013 2:13 PM To: solr-user@lucene.apache.org Subject: RE: DirectSolrSpellChecker : vastly varying spellcheck QTime times. James, Is there a way to determine how many times the collations were tried? Is there a parameter that can be issued to return this in the debug information? This would be very helpful. Appreciate your help with this. Thanks. -- Sandeep
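How you flip that logger to debug depends on which logging backend your Solr install is wired to - an assumption either way. A hedged sketch for java.util.logging (the pre-4.3 example default), with the log4j equivalent noted in a comment:

    import java.util.logging.Level;
    import java.util.logging.Logger;

    public class EnableCollationLogging {
        public static void main(String[] args) {
            // With java.util.logging, FINE corresponds to debug level.
            Logger.getLogger("org.apache.solr.spelling.SpellCheckCollator")
                  .setLevel(Level.FINE);

            // With log4j, the equivalent is a line in log4j.properties:
            // log4j.logger.org.apache.solr.spelling.SpellCheckCollator=DEBUG
        }
    }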
Re: Support of field variants in solr
Ok, thanks for this hint. I have two further questions to understand it completely. Setting up a custom request handler makes it easier to avoid all the mapping parameters in the query, but it would also be possible with one request handler and all the mapping in the request arguments, right? What about indexing - is there also a mechanism like this, or should the application decide which target field to use? Sent: Tuesday, April 23, 2013 at 02:32 From: Alexandre Rafalovitch arafa...@gmail.com To: solr-user@lucene.apache.org Subject: Re: Support of field variants in solr To route different languages, you could use different request handlers and do different alias mapping. There are two alias mappings: On the way in, for eDisMax: https://wiki.apache.org/solr/ExtendedDisMax#Field_aliasing_.2BAC8_renaming On the way out: https://wiki.apache.org/solr/CommonQueryParameters#Field_alias Between the two, you can make sure that all searches to /searchES map the 'content' field to 'content_es', and for /searchDE map 'content' to 'content_de'. Hope this helps, Alex. Personal blog: http://blog.outerthoughts.com/ LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch - Time is the quality of nature that keeps events from happening all at once. Lately, it doesn't seem to be working. (Anonymous - via GTD book) On Mon, Apr 22, 2013 at 2:31 PM, Timo Schmidt timo-schm...@gmx.net wrote: Hi together, I am Timo and work for a Solr implementation company. During recent projects we realized that we need to be able to generate different variants of a document. Example 1 (Language): To handle all documents in one Solr core, we need a field variant for each language:
<!-- content for Spanish -->
<field name="content" type="text_es" indexed="true" stored="true" variant="es" />
<!-- content for German -->
<field name="content" type="text_de" indexed="true" stored="true" variant="de" />
Each of these fields can be configured in the Solr schema to act optimally for the specific target language. Example 2 (Stores): We have customers who want to sell the same product in different stores for different prices:
<!-- price in Frankfurt -->
<field name="price" type="sfloat" indexed="true" stored="true" variant="fr" />
<!-- price in Paris -->
<field name="price" type="sfloat" indexed="true" stored="true" variant="pr" />
To solve this in an optimal way, it would be nice if this worked completely transparently inside Solr by defining a "variantQuery". A select query could look like this: select?variantQuery=fr&qf=price,content Additionally, the following should be possible: if no variant is present, the behaviour should be as before, so the field is relevant for all queries. The setting variant="*" would mean that several wildcard variants can be defined in a committed document. This makes sense when the data type is the same for all variants and you have many variants (like in the price example). The same as at query time should be possible at indexing time. I know that we can do something like this with dynamic fields too, but then we need to resolve the concrete fields during index and query time at the application level. That is possible, but it would be nicer to have a concept like this in Solr; working with facets is also easier with this approach, since the concrete field name does not need to be propagated to the application. So my questions are: What do you think about this approach? Is it better to work with dynamic fields? Is it reasonable when you have 200 or more variants of a document? What needs to be done in Solr to have something like this variant attribute for fields? Do you have other approaches?
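As a concrete illustration of the aliasing Alexandre describes, eDisMax lets a request map a virtual field to a real per-language field via the f.<alias>.qf parameter. A SolrJ sketch; the host, collection, field names, and query term are assumed from the thread rather than taken from a real setup:

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.impl.HttpSolrServer;
    import org.apache.solr.client.solrj.response.QueryResponse;

    public class AliasQueryExample {
        public static void main(String[] args) throws Exception {
            HttpSolrServer server =
                new HttpSolrServer("http://localhost:8983/solr/collection1");

            SolrQuery q = new SolrQuery("mesa");
            q.set("defType", "edismax");
            q.set("qf", "content");              // search the virtual field
            q.set("f.content.qf", "content_es"); // alias it to the Spanish field
            QueryResponse rsp = server.query(q);
            System.out.println(rsp.getResults());
            server.shutdown();
        }
    }

In a dedicated /searchES request handler, those same two parameters would sit in the handler's defaults in solrconfig.xml, so clients never see the mapping.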
Re: Using Solr For a Real Search Engine
On 4/23/2013 2:25 PM, Furkan KAMACI wrote: Is there any documentation that explains using Jetty as embedded or not? I use Solr deployed at Tomcat but after you message I will consider about Jetty. If we think about other issues i.e. when I want to update my Solr jars/wars etc.(this is just an foo example) does any pros and cons Tomcat or Jetty has? The Jetty in the example is only 'embedded' in the sense that you don't have to install it separately. It is not special -- the Jetty components are not changed at all, a subset of them is just included in the Solr download with a tuned configuration file. If you go to www.eclipse.org/jetty and download the latest stable-8 version, you'll see some familiar things - start.jar, an etc directory, a lib directory, and a contexts directory. They have more in them than the example does -- extra functionality Solr doesn't need. If you want to start the downloaded version, you can use 'java -jar start.jar' just like you do with Solr. Thanks, Shawn
Re: Using Solr For a Real Search Engine
Thanks for the answers. I will go with the embedded Jetty for my SolrCloud. If I run into anything important, I will share my experiences with you. 2013/4/23 Shawn Heisey s...@elyograg.org On 4/23/2013 2:25 PM, Furkan KAMACI wrote: Is there any documentation that explains using Jetty embedded or standalone? I currently deploy Solr in Tomcat, but after your message I will consider Jetty. If we think about other issues, e.g. when I want to update my Solr jars/wars etc. (this is just a contrived example), do Tomcat or Jetty have any particular pros and cons? The Jetty in the example is only 'embedded' in the sense that you don't have to install it separately. It is not special -- the Jetty components are not changed at all, a subset of them is just included in the Solr download with a tuned configuration file. If you go to www.eclipse.org/jetty and download the latest stable-8 version, you'll see some familiar things - start.jar, an etc directory, a lib directory, and a contexts directory. They have more in them than the example does -- extra functionality Solr doesn't need. If you want to start the downloaded version, you can use 'java -jar start.jar' just like you do with Solr. Thanks, Shawn
minGramSize
Hi, I want the minGramSize in my ngram filter to be the size of the word passed in the query. How can I do that? Because if I set minGramSize to 2 and type in "abc", it gives me results for "ab" and "bc". I just want "abc" - whatever the length of my word is, I want that to be the minGramSize. How can I do that? Thanks.
Book text with chapter line number
Hello. I'm trying to figure out if Solr is going to work for a new project that I want to build. At its heart it's a book text searching application. Each book is broken into chapters, and each chapter is broken into lines. I want to be able to search these books, return relevant sections, and display the results with chapter and line number. I'm not sure how I should structure my data so that it's efficient and functional. I could simply treat each line of text as a document, which would provide some of the functionality, but what if the search query spanned two lines? Then it seems the passage the user was searching for wouldn't be returned. I could treat each book as a document and use highlighting to find the context, but that seems to limit weighting/results for best matches, as well as making it difficult to find chapter/line numbers. What is the best way to do this with Solr? Is there a better tool to use to solve my problem?
Re: Reordered DBQ.
On Tue, Apr 23, 2013 at 3:51 PM, Marcin Rzewucki mrzewu...@gmail.com wrote: Recently I noticed a lot of "Reordered DBQs detected" messages in the logs. As far as I can tell from the logs it could be related to deleting documents, but I'm not sure. Do you know what causes these messages? For high-throughput indexing, we version updates on the leader and forward them to the other replicas without strict serialization. If an add happened before a DBQ (delete-by-query) on the leader, but the DBQ is serviced before the add on a replica, Solr detects this reordering and fixes it. It's not an error or an indication that anything is wrong (hence the INFO-level log message). -Yonik http://lucidworks.com
Re: Too many close, count -1
: Subject: Re: Too many close, count -1 Thanks for the details, nothing jumps out at me, but we're now tracking this in SOLR-4753... https://issues.apache.org/jira/browse/SOLR-4753 -Hoss
Re: Solr index searcher to lucene index searcher
: For any query, it passes through the search handler and Solr finally : directs it to the Lucene IndexSearcher. As results are matched and collected : as TopDocs in Lucene, I want to inspect the top K docs, reorder them by : some logic, and pass the final TopDocs to Solr, which Solr may send as a : response. Can you elaborate on what exactly your "some logic" involves? Instead of writing a custom collector, using a function query may be the best solution. https://people.apache.org/~hossman/#xyproblem XY Problem Your question appears to be an XY Problem ... that is: you are dealing with X, you are assuming Y will help you, and you are asking about Y without giving more details about X so that we can understand the full issue. Perhaps the best solution doesn't involve Y at all? See Also: http://www.perlmonks.org/index.pl?node_id=542341 -Hoss
Re: Autocommit and replication have been slowing down
Hi Shawn, Thanks for the answer. If I understand correctly, the autoWarmCount is the number of elements from the old cache used to warm new searchers. I guess that this isn't the problem, because after the commit count increases under UPDATE HANDLERS (admin UI) I can see the new docs in the search results. Unfortunately I can't increase the Java heap on the servers right now, so I was thinking of changing some configuration to release some memory. For example, we could decrease the maxBufferedDocs value. Do you know if that would be effective? Best Regards
Re: minGramSize
Why are you bothering to use an Edge/NGram filter if you are setting the minGramSize to the token size?!! I mean, why bother - just skip the Edge/NGram filter and it would give the same result: setting minGramSize to the token size means that there would be only a single gram, and it would be identical to the token text. Now... tell us what you are really trying to accomplish with this diversion. -- Jack Krupansky -Original Message- From: hassancrowdc Sent: Tuesday, April 23, 2013 4:56 PM To: solr-user@lucene.apache.org Subject: minGramSize Hi, I want the minGramSize in my ngram filter to be the size of the word passed in the query. How can I do that? Because if I set minGramSize to 2 and type in "abc", it gives me results for "ab" and "bc". I just want "abc" - whatever the length of my word is, I want that to be the minGramSize. How can I do that? Thanks.
Re: minGramSize
Perhaps he needs different analyzer chains for index and query: create the edge ngrams when indexing, but not when querying. wunder On Apr 23, 2013, at 2:44 PM, Jack Krupansky wrote: Why are you bothering to use an Edge/NGram filter if you are setting the minGramSize to the token size?!! I mean, why bother - just skip the Edge/NGram filter and it would give the same result: setting minGramSize to the token size means that there would be only a single gram, and it would be identical to the token text. Now... tell us what you are really trying to accomplish with this diversion. -- Jack Krupansky -Original Message- From: hassancrowdc Sent: Tuesday, April 23, 2013 4:56 PM To: solr-user@lucene.apache.org Subject: minGramSize Hi, I want the minGramSize in my ngram filter to be the size of the word passed in the query. How can I do that? Because if I set minGramSize to 2 and type in "abc", it gives me results for "ab" and "bc". I just want "abc" - whatever the length of my word is, I want that to be the minGramSize. How can I do that? Thanks. -- Walter Underwood wun...@wunderwood.org
Does SolrCloud support QueryElevationComponent?
When I read Lucidworks' Solr Guide I saw this: "Distributed searching does not support the QueryElevationComponent, which configures the top results for a given query regardless of Lucene's scoring." Is that still true for SolrCloud?
Re: Book text with chapter line number
There is no simple, obvious, and direct approach, right out of the box. Sure, you can highlight passages of raw text out of the box, but that won't give you chapters, pages, and line numbers. To do all of that, you would have to either: 1. Add chapter, page, and line number as part of the payload for each word, and add some custom document transformers to access the information; or 2. Index each line as a separate Solr document, with fields for book, chapter, page, and line number. -- Jack Krupansky -Original Message- From: Jason Funk Sent: Tuesday, April 23, 2013 5:02 PM To: solr-user@lucene.apache.org Subject: Book text with chapter line number Hello. I'm trying to figure out if Solr is going to work for a new project that I want to build. At its heart it's a book text searching application. Each book is broken into chapters, and each chapter is broken into lines. I want to be able to search these books, return relevant sections, and display the results with chapter and line number. I'm not sure how I should structure my data so that it's efficient and functional. I could simply treat each line of text as a document, which would provide some of the functionality, but what if the search query spanned two lines? Then it seems the passage the user was searching for wouldn't be returned. I could treat each book as a document and use highlighting to find the context, but that seems to limit weighting/results for best matches, as well as making it difficult to find chapter/line numbers. What is the best way to do this with Solr? Is there a better tool to use to solve my problem?
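To make option 2 concrete, here is a minimal SolrJ sketch that indexes each line as its own document. The URL, field names, and sample values are invented; the fields would need matching entries in schema.xml:

    import java.util.ArrayList;
    import java.util.List;
    import org.apache.solr.client.solrj.impl.HttpSolrServer;
    import org.apache.solr.common.SolrInputDocument;

    public class BookLineIndexer {
        public static void main(String[] args) throws Exception {
            HttpSolrServer server =
                new HttpSolrServer("http://localhost:8983/solr/books");

            List<SolrInputDocument> docs = new ArrayList<SolrInputDocument>();
            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("id", "moby-dick_ch1_line1"); // unique per line
            doc.addField("book", "Moby-Dick");
            doc.addField("chapter", 1);
            doc.addField("line", 1);
            doc.addField("text", "Call me Ishmael.");
            docs.add(doc);
            // ... one document per line of each book ...

            server.add(docs);
            server.commit();
            server.shutdown();
        }
    }

One hedge against the "query spans two lines" problem raised above is to index overlapping two-line windows instead of single lines, at the cost of some duplicated text and de-duplication at display time.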