Re: spell suggestions help

2013-04-12 Thread Rohan Thakur
hi jack

I am using the whitespace tokenizer only, and before it I am using a pattern
replace char filter to replace & with "and", but it is not working, I guess.

my query analyser:

</analyzer>
<analyzer type="query">
  <charFilter class="solr.PatternReplaceCharFilterFactory"
              pattern="&amp;" replacement="and"/>
  <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
          ignoreCase="true" expand="true"/>
  <filter class="solr.StopFilterFactory"
          ignoreCase="true"
          words="lang/stopwords_en.txt"
          enablePositionIncrements="true"


On Thu, Apr 11, 2013 at 6:03 PM, Jack Krupansky j...@basetechnology.com wrote:

 Try replacing the standard tokenizer with the whitespace tokenizer in your field
 types. And make sure not to use any other token filters that might discard
 special characters (or provide a character map if they support one.)

 Also, be sure to try your test terms on the Solr Admin UI Analyzer page to
 see whether the & is preserved, or at which stage of term analysis it gets
 discarded.

 -- Jack Krupansky

 -Original Message- From: Rohan Thakur
 Sent: Thursday, April 11, 2013 7:39 AM
 To: solr-user@lucene.apache.org
 Subject: Re: spell suggestions help


 urlencode replaces & with a space, thus returning results that contain
 even the single terms: in the case of "mobile & accessories" it replaces
 it with "mobile accessories" and returns documents containing just
 "accessories", which I don't want. How do I tackle this? I tried using a pattern
 replace filter at query time to replace & with "and" (I used &amp; => replace
 with "and"), but it did not work. Any guess or help...

 thanks
 regards
 rohan


 On Thu, Apr 11, 2013 at 4:39 PM, Rohan Thakur rohan.i...@gmail.com
 wrote:

  hi erick

 do we have to do the urlencoding from the PHP side, or does Solr support
 urlencoding itself?


 On Thu, Apr 11, 2013 at 5:57 AM, Erick Erickson erickerick...@gmail.com
 wrote:

  Try URL encoding it and/or escaping the &

 On Tue, Apr 9, 2013 at 2:32 AM, Rohan Thakur rohan.i...@gmail.com
 wrote:
  hi all
 
  one thing I wanted to make clear is that for every other query I get correct
  suggestions, but in these 2 cases I am not getting what the suggestions are
  supposed to be:

  1) I have kettle (doc frequency = 5) and cable (doc frequency = 1) indexed
  in the direct Solr spell checker, but when I query for cattle I get cable as
  the only suggestion and not kettle. Why is this happening? I want to get kettle
  as a suggestion as well. I'm using JaroWinkler distance, according to which the
  score for cattle => cable comes out to be 0.857 and for cattle => kettle it
  comes out to be 0.777, so kettle should also come up in the suggestions, but it
  does not. How can I correct this, anyone?

  2) how do I query for a sentence like "hand blandar & chopper", as & is a
  delimiter for a Solr query and thus this query is returning an error.
 
  thanks in advance
  regards
  Rohan







XInclude in data-config.xml

2013-04-12 Thread stockii
hello.

is it possible to include some entities with XInclude in my data-config.xml?

I tried with this line:
<xi:include href="solr/entity.xml"
            xmlns:xi="http://www.w3.org/2001/XInclude" />

In my entity.xml there is something like:
<entity name="name" query="SELECT * FROM table"></entity>

Any ideas why it does not work?
This blog sounded promising to me =(
http://www.raspberry.nl/2010/10/30/solr-xml-config-includes/



--
View this message in context: 
http://lucene.472066.n3.nabble.com/XInclude-in-data-config-xml-tp4055487.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Slow qTime for distributed search

2013-04-12 Thread Furkan KAMACI
Manuel Le Normand, I am sorry but I want to learn something. You said you
have 40 dedicated servers. What is your total document count, total
document size, and total shard size?

2013/4/11 Manuel Le Normand manuel.lenorm...@gmail.com

 Hi,
 We have different working hours, sorry for the reply delay. Your assumed
 numbers are right, about 25-30Kb per doc. giving a total of 15G per shard,
 there are two shards per server (+2 slaves that should do no work
 normally).
 An average query has about 30 conditions (OR AND mixed), most of them
 textual, a small part on dateTime. They use only simple queries (no facet,
 filters etc.) as it is taken from the actual query set of my entreprise
 that works with an old search engine.

 As we said, if the shards in collection1 and collection2 each have the same
 number of docs (and the same RAM & CPU per shard), it is apparently not a
 slow IO issue, right? So the fact of not having cached all my index doesn't
 seem to be the bottleneck. Moreover, I do store the fields, but my query set
 requests only the ids and rarely snippets, so I'd assume that the plenty of
 RAM I'd give the OS wouldn't make any difference, as these *.fdt files don't
 need to get cached.

 The conclusion I come to is that the merging issue is the problem, and the
 only possibility of outsmarting it is to distribute to far fewer shards,
 meaning that I'll get back to a few million docs per shard, which is
 about linearly slower with the number of docs per shard. Though the latter
 should improve if I give much more RAM per server.

 I'll try tweaking my schema a bit and making better use of the Solr caches
 (filter queries, for example), but something tells me the problem
 might be elsewhere. My main clue is that merging seems a simple CPU
 task, and tests show that even with a small number of responses it takes a
 long time (and clearly the merging task on a few docs is very short).


 On Wed, Apr 10, 2013 at 2:50 AM, Shawn Heisey s...@elyograg.org wrote:

  On 4/9/2013 3:50 PM, Furkan KAMACI wrote:
 
  Hi Shawn;
 
  You say that:
 
  *... your documents are about 50KB each.  That would translate to an
 index
  that's at least 25GB*
 
  I know we can not say an exact size but what is the approximately ratio
 of
  document size / index size according to your experiences?
 
 
  If you store the fields, that is actual size plus a small amount of
  overhead.  Starting with Solr 4.1, stored fields are compressed.  I
 believe
  that it uses LZ4 compression.  Some people store all fields, some people
  store only a few or one - an ID field.  The size of stored fields does
 have
  an impact on how much OS disk cache you need, but not as much as the
 other
  parts of an index.
 
  It's been my experience that termvectors take up almost as much space as
  stored data for the same fields, and sometimes more.  Starting with Solr
  4.2, termvectors are also compressed.
 
  Adding docValues (new in 4.2) to the schema will also make the index
  larger.  The requirements here are similar to stored fields.  I do not
 know
  whether this data gets compressed, but I don't think it does.
 
  As for the indexed data, this is where I am less clear about the storage
  ratios, but I think you can count on it needing almost as much space as
 the
  original data.  If the schema uses types or filters that produce a lot of
  information, the indexed data might be larger than the original input.
   Examples of data explosions in a schema: trie fields with a non-zero
  precisionStep, the edgengram filter, the shingle filter.
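
  As an illustration of that kind of explosion, a field type along these lines
  (a hypothetical sketch, not from this index) emits every word prefix, so a
  single token like kettle is indexed as k, ke, ket, kett, kettl, kettle:

  <fieldType name="text_edge" class="solr.TextField" positionIncrementGap="100">
    <analyzer type="index">
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <!-- each token of length N produces up to N indexed terms -->
      <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="25"/>
    </analyzer>
    <analyzer type="query">
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
  </fieldType>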
 
  Thanks,
  Shawn
 
 



RE: SolrCloud leader to replica

2013-04-12 Thread Zhang, Lisheng
Hi Otis and Timothy,

Thanks very much for your help; sure, I will test to make sure. What I
mentioned before is a mere possibility, and you are likely correct:
the small delay may not matter in reality (yes, we do use the same
way to do pagination and no issue ever happened even once).

Surely Solr is enormously valuable to us and we really appreciate 
your help!

Lisheng


-Original Message-
From: Otis Gospodnetic [mailto:otis.gospodne...@gmail.com]
Sent: Thursday, April 11, 2013 5:27 PM
To: solr-user@lucene.apache.org
Subject: Re: SolrCloud leader to replica


Hi,

I think Timothy is right about what Lisheng is really after, which is
consistency.

I agree with what Timothy is implying here - the chances of search being
inconsistent are very, very small.  I'm guessing Lisheng is trying to
solve a problem he doesn't actually have yet?  Also, think about a
non-SolrCloud solution.  What happens when a user pages through
results?  Typically that just re-runs the same query, but with a
different page offset.  What happens if between page 1 and page 2 the
index changes and a searcher is reopened?  Same sort of problem can
happen, right?  Yet, in a few hundred client engagements involving
Solr or ElasticSearch I don't recall this ever being an issue.

Otis
--
Solr & ElasticSearch Support
http://sematext.com/





On Thu, Apr 11, 2013 at 8:13 PM, Timothy Potter thelabd...@gmail.com wrote:
 Hmmm ... I was following this discussion but then got confused when Lisheng
 said to change Solr to compromise consistency in order to increase
 availability, when your concern is how long a replica is behind the leader.
 Seems you want more consistency vs. less in this case? One of the reasons
 behind Solr's leader election approach is to achieve low-latency eventual
 consistency (Mark's term from the linked-to discussion).

 Un-committed docs are only visible if you use real-time get, in which case
 the request is served by the shard leader (or replica) from its update log.
 I suppose there's a chance of a few millis between the leader having the
 request in its tlog and the replica having the doc in its tlog, but that
 seems like the nature of the beast. Meaning that Solr never promised to be
 100% consistent at millisecond granularity in a distributed model - any
 small time-window between what a leader has and a replica has is probably
 network latency, which you should solve outside of Solr. I suspect you could
 direct all your real-time get requests to leaders only using some smart
 client like CloudSolrServer if it mattered that much.

 Otherwise, all other queries require the document to be committed to be
 visible. I suppose there is a very small window when a new searcher is open
 on the leader and the new searcher is not yet open on the replica. However,
 with soft-commits, that too seems like a milli or two based on network
 latency.

 @Shawn - yes, I've actually seen this work in my cluster. We lose replicas
 from time-to-time and indexing keeps on trucking.





 On Thu, Apr 11, 2013 at 4:51 PM, Zhang, Lisheng 
 lisheng.zh...@broadvision.com wrote:

 Hi Otis,

 Thanks very much for your help; your explanation is very clear.

 My main concern is not the return status for indexing calls (although
 that is also important); my main concern is how long the replica is behind
 the leader (or, putting it your way, how consistent the search picture is
 to client A and B).

 Our application requires clients to see the same result whether they hit the
 leader or a replica, so it seems we do have a problem here. If there is no
 better solution, I may consider changing solr4 a little (I have not read the
 solr4x code fully yet) to compromise consistency (C) in order to increase
 availability (A). On a high level, do you see serious problems in this
 approach (I am familiar with lucene/solr code to some extent)?

 Thanks and best regards, Lisheng

 -Original Message-
 From: Otis Gospodnetic [mailto:otis.gospodne...@gmail.com]
 Sent: Thursday, April 11, 2013 2:50 PM
 To: solr-user@lucene.apache.org
 Subject: Re: SolrCloud leader to replica


 But note that I misspoke, which I realized after re-reading the thread
 I pointed you to.  Mark explains it nicely there:
 * the index call returns only when (and IF!) indexing to all replicas
 succeeds

 BUT, that should not be mixed with what search clients see!
 Just because the indexing client sees the all or nothing situation
 depending on whether indexing was successful on all replicas does NOT
 mean that search clients will always see a 100% consistent picture.
 Client A could hit the leader and see a newly indexed document, while
 client B could query the replica and not see that same document simply
 because the doc hasn't gotten there yet, or because soft commit hasn't
 happened just yet.

 Otis
 --
 Solr & ElasticSearch Support
 http://sematext.com/





 On Thu, Apr 11, 2013 at 4:39 PM, Zhang, Lisheng
 lisheng.zh...@broadvision.com wrote:
  Thanks very much for your helps!
 
  -Original Message-
  From: Otis Gospodnetic 

Re: XInclude in data-config.xml

2013-04-12 Thread Alexandre Rafalovitch
Are you sure your original problem is not fixable with resolvable
properties and variable substitutions (${varname})? Solr has good
support for that.
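
For example, a sketch of that kind of substitution (the property name here is
made up); the value can come from solrcore.properties or a -D system property:

<!-- solrconfig.xml -->
<dataDir>${my.data.dir:/var/solr/data}</dataDir>

<!-- solrcore.properties, in the core's conf directory -->
my.data.dir=/mnt/ssd/solr/data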

Otherwise, check that you have the right relative file path. I am not sure
what the XML processor thinks it is. Use truss/strace on Unix/Mac and
Process Monitor on Windows. It is often faster to check that than to try to
guess.

Regards,
Alex

Personal blog: http://blog.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all at
once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD book)


On Fri, Apr 12, 2013 at 3:31 AM, stockii stock.jo...@googlemail.com wrote:

 hello.

 is it possible to include some entities with XInclude in my
 data-config.xml?

  I tried with this line:
  <xi:include href="solr/entity.xml"
              xmlns:xi="http://www.w3.org/2001/XInclude" />

  In my entity.xml there is something like:
  <entity name="name" query="SELECT * FROM table"></entity>

  Any ideas why it does not work?
  This blog sounded promising to me =(
 http://www.raspberry.nl/2010/10/30/solr-xml-config-includes/



 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/XInclude-in-data-config-xml-tp4055487.html
 Sent from the Solr - User mailing list archive at Nabble.com.



how to migrate solr 1.4 index to solr 4.2 index

2013-04-12 Thread Montu v Boda
Hi

Can anybody help with my question below, please?

How do I migrate a Solr 1.4 index to a Solr 4.2 index?

I have done the following, but it does not work completely.

I migrated the 1.4 index to a 3.5 index and that completed successfully.

But now, when I try to migrate the 3.5 index to a 4.2 index, the migration
does not complete successfully and gives me the error below.

INFO: [] webapp=/solr35 path=/replication
params={file=_zp.nrm&command=filecontent&checksum=true&generation=1190&qt=/replication&wt=filestream}
status=0 QTime=0 
Apr 12, 2013 4:50:03 PM org.apache.solr.core.SolrCore execute
INFO: [] webapp=/solr35 path=/replication
params={file=_yj.frq&command=filecontent&checksum=true&generation=1190&qt=/replication&wt=filestream}
status=0 QTime=0 
Apr 12, 2013 4:50:03 PM org.apache.solr.core.SolrCore execute
INFO: [] webapp=/solr35 path=/replication
params={file=_zp.tis&command=filecontent&checksum=true&generation=1190&qt=/replication&wt=filestream}
status=0 QTime=0 
Apr 12, 2013 4:50:03 PM org.apache.solr.core.SolrCore execute
INFO: [] webapp=/solr35 path=/replication
params={file=_yj_8.del&command=filecontent&checksum=true&generation=1190&qt=/replication&wt=filestream}
status=0 QTime=0 
Apr 12, 2013 4:50:03 PM org.apache.solr.handler.SnapPuller fetchLatestIndex
INFO: Total time taken for download : 3 secs
Apr 12, 2013 4:50:04 PM org.apache.solr.update.DefaultSolrCoreState
newIndexWriter
INFO: Creating new IndexWriter...
Apr 12, 2013 4:50:04 PM org.apache.solr.update.DefaultSolrCoreState
newIndexWriter
INFO: Waiting until IndexWriter is unused... core=collection1
Apr 12, 2013 4:50:04 PM org.apache.solr.core.CachingDirectoryFactory
closeCacheValue
INFO: looking to close D:\solr421\data\index.2013041216531
[CachedDir<<refCount=0;path=D:\solr421\data\index.2013041216531;done=true>>]
Apr 12, 2013 4:50:04 PM org.apache.solr.core.CachingDirectoryFactory close
INFO: Closing directory: D:\solr421\data\index.2013041216531
Apr 12, 2013 4:50:04 PM org.apache.solr.core.CachingDirectoryFactory
closeCacheValue
INFO: Removing directory before core close:
D:\solr421\data\index.2013041216531
Apr 12, 2013 4:50:04 PM org.apache.solr.common.SolrException log
SEVERE: SnapPull failed :org.apache.solr.common.SolrException: Index fetch
failed : 
at 
org.apache.solr.handler.SnapPuller.fetchLatestIndex(SnapPuller.java:459)
at
org.apache.solr.handler.ReplicationHandler.doFetch(ReplicationHandler.java:281)
at org.apache.solr.handler.SnapPuller$1.run(SnapPuller.java:222)
at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
at java.util.concurrent.FutureTask$Sync.innerRunAndReset(Unknown Source)
at java.util.concurrent.FutureTask.runAndReset(Unknown Source)
at
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(Unknown
Source)
at
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(Unknown
Source)
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)
Caused by: org.apache.lucene.index.IndexFormatTooOldException: Format
version is not supported (resource:
SimpleFSIndexInput(path=D:\solr421\data\index\_yj.fdx)): 1 (needs to be
between 2 and 3). This version of Lucene only supports indexes created with
release 3.0 and later.
at
org.apache.lucene.codecs.lucene3x.Lucene3xStoredFieldsReader.checkCodeVersion(Lucene3xStoredFieldsReader.java:119)
at
org.apache.lucene.codecs.lucene3x.Lucene3xSegmentInfoReader.readLegacyInfos(Lucene3xSegmentInfoReader.java:74)
at org.apache.lucene.index.SegmentInfos.read(SegmentInfos.java:312)
at org.apache.lucene.index.SegmentInfos$1.doBody(SegmentInfos.java:347)
at
org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:783)
at
org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:630)
at org.apache.lucene.index.SegmentInfos.read(SegmentInfos.java:343)
at org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:673)
at 
org.apache.solr.update.SolrIndexWriter.<init>(SolrIndexWriter.java:77)
at 
org.apache.solr.update.SolrIndexWriter.create(SolrIndexWriter.java:64)
at
org.apache.solr.update.DefaultSolrCoreState.createMainIndexWriter(DefaultSolrCoreState.java:198)
at
org.apache.solr.update.DefaultSolrCoreState.newIndexWriter(DefaultSolrCoreState.java:180)
at
org.apache.solr.update.DirectUpdateHandler2.newIndexWriter(DirectUpdateHandler2.java:615)
at
org.apache.solr.handler.SnapPuller.openNewWriterAndSearcher(SnapPuller.java:622)
at 
org.apache.solr.handler.SnapPuller.fetchLatestIndex(SnapPuller.java:446)
... 10 more

Apr 12, 2013 4:50:23 PM org.apache.solr.core.SolrCore execute
INFO: [] webapp=/solr35 path=/replication
params={slave=false&command=details&qt=/replication&wt=javabin&version=2}
status=0 QTime=0 
Apr 12, 

Re: XInclude in data-config.xml

2013-04-12 Thread Jack Krupansky

Is your data-config.xml file located in your Solr conf directory?

You have solr/ at the front of your path, so is the included file really 
in conf/solr?


Otherwise, this should work.

Make sure you only have a single XML element in the included file.
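
For example, a sketch of the layout described above, assuming the href is
resolved relative to the conf directory:

<!-- conf/data-config.xml -->
<dataConfig>
  <document>
    <xi:include href="entity.xml"
                xmlns:xi="http://www.w3.org/2001/XInclude" />
  </document>
</dataConfig>

<!-- conf/entity.xml: a single root element -->
<entity name="name" query="SELECT * FROM table"></entity>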

-- Jack Krupansky

-Original Message- 
From: stockii

Sent: Friday, April 12, 2013 3:31 AM
To: solr-user@lucene.apache.org
Subject: XInclude in data-config.xml

hello.

is it possible to include some entities with XInclude in my data-config.xml?

I tried with this line:
<xi:include href="solr/entity.xml"
            xmlns:xi="http://www.w3.org/2001/XInclude" />

In my entity.xml there is something like:
<entity name="name" query="SELECT * FROM table"></entity>

Any ideas why it does not work?
This blog sounded promising to me =(
http://www.raspberry.nl/2010/10/30/solr-xml-config-includes/



--
View this message in context: 
http://lucene.472066.n3.nabble.com/XInclude-in-data-config-xml-tp4055487.html
Sent from the Solr - User mailing list archive at Nabble.com. 



Re: spell suggestions help

2013-04-12 Thread Jack Krupansky
Be sure to use the Solr Admin UI Analysis page to verify what is happening 
at each stage of analysis. For BOTH index and query.


You only showed us your query analyzer... show us the index analyzer as 
well.


Did you make sure to delete the index data and completely reindex after 
changing the index analyzer?


Or maybe your index and query analyzers are not in-sync and compatible.

Do you have anything in your stopwords file? "and" is usually considered a 
stop word - so the stop filter would remove it.


-- Jack Krupansky

-Original Message- 
From: Rohan Thakur

Sent: Friday, April 12, 2013 2:12 AM
To: solr-user@lucene.apache.org
Subject: Re: spell suggestions help

hi jack

I am using the whitespace tokenizer only, and before it I am using a pattern
replace char filter to replace & with "and", but it is not working, I guess.

my query analyser:

</analyzer>
<analyzer type="query">
  <charFilter class="solr.PatternReplaceCharFilterFactory"
              pattern="&amp;" replacement="and"/>
  <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
          ignoreCase="true" expand="true"/>
  <filter class="solr.StopFilterFactory"
          ignoreCase="true"
          words="lang/stopwords_en.txt"
          enablePositionIncrements="true"


On Thu, Apr 11, 2013 at 6:03 PM, Jack Krupansky 
j...@basetechnology.com wrote:



Try replacing the standard tokenizer with the whitespace tokenizer in your field
types. And make sure not to use any other token filters that might discard
special characters (or provide a character map if they support one.)

Also, be sure to try your test terms on the Solr Admin UI Analyzer page to
see whether the & is preserved, or at which stage of term analysis it gets
discarded.

-- Jack Krupansky

-Original Message- From: Rohan Thakur
Sent: Thursday, April 11, 2013 7:39 AM
To: solr-user@lucene.apache.org
Subject: Re: spell suggestions help


urlencode replaces & with a space, thus returning results that contain
even the single terms: in the case of "mobile & accessories" it replaces
it with "mobile accessories" and returns documents containing just
"accessories", which I don't want. How do I tackle this? I tried using a pattern
replace filter at query time to replace & with "and" (I used &amp; => replace
with "and"), but it did not work. Any guess or help...

thanks
regards
rohan


On Thu, Apr 11, 2013 at 4:39 PM, Rohan Thakur rohan.i...@gmail.com
wrote:

 hi erick


do we have to do the urlencoding from the PHP side, or does Solr support
urlencoding itself?


On Thu, Apr 11, 2013 at 5:57 AM, Erick Erickson erickerick...@gmail.com
wrote:

 Try URL encoding it and/or escaping the &


On Tue, Apr 9, 2013 at 2:32 AM, Rohan Thakur rohan.i...@gmail.com
wrote:
 hi all

 one thing I wanted to make clear is that for every other query I get correct
 suggestions, but in these 2 cases I am not getting what the suggestions are
 supposed to be:

 1) I have kettle (doc frequency = 5) and cable (doc frequency = 1) indexed
 in the direct Solr spell checker, but when I query for cattle I get cable as
 the only suggestion and not kettle. Why is this happening? I want to get kettle
 as a suggestion as well. I'm using JaroWinkler distance, according to which the
 score for cattle => cable comes out to be 0.857 and for cattle => kettle it
 comes out to be 0.777, so kettle should also come up in the suggestions, but it
 does not. How can I correct this, anyone?

 2) how do I query for a sentence like "hand blandar & chopper", as & is a
 delimiter for a Solr query and thus this query is returning an error.

 thanks in advance
 regards
 Rohan











Re: XInclude in data-config.xml

2013-04-12 Thread Andre Bois-Crettez

On 04/12/2013 09:31 AM, stockii wrote:

hello.

is it possible to include some entities with XInclude in my data-config.xml?


We first struggled with XInclude, and then switched to using custom
entities, which worked much better for our needs (reusing common parts
in several SearchHandlers).
ex. in solrconfig.xml :

<?xml version="1.0" encoding="UTF-8" ?>
<!DOCTYPE config [
<!ENTITY solrconfigcommon SYSTEM "solrconfig_common.xml">
]>
<config>
...
<requestHandler name="search" class="solr.SearchHandler" default="true">
<lst name="defaults">
&solrconfigcommon;
</lst>
...

</config>

in solrconfig_common.xml :

<!-- XML fragment used as entity in solrconfig.xml -->
<str name="echoParams">explicit</str>
<str name="defType">edismax</str>
<str name="qf">title^4 description^1</str>
<str name="q">*:*</str>
<str name="q.alt">*:*</str>
<str name="rows">20</str>
<str name="q.op">AND</str>
<str name="pf">title~2^2.0</str>

HTH
André


Kelkoo SAS
Société par Actions Simplifiée
Au capital de € 4.168.964,30
Siège social : 8, rue du Sentier 75002 Paris
425 093 069 RCS Paris

This message and its attachments are confidential and intended solely for
their addressees. If you are not the intended recipient of this message,
please delete it and notify the sender.


Re: XInclude in data-config.xml

2013-04-12 Thread Sujatha Arun
Hi Andre,

In version 3.6.1, when we used entities in schema.xml for the language
analyzers, it gave errors on server restart and the core would not load.

Regards,
Sujatha


2013/4/12 Andre Bois-Crettez andre.b...@kelkoo.com

 On 04/12/2013 09:31 AM, stockii wrote:

 hello.

 is it possible to include some entities with XInclude in my
 data-config.xml?


  We first struggled with XInclude, and then switched to using custom
  entities, which worked much better for our needs (reusing common parts
  in several SearchHandlers).
  ex. in solrconfig.xml :

  <?xml version="1.0" encoding="UTF-8" ?>
  <!DOCTYPE config [
  <!ENTITY solrconfigcommon SYSTEM "solrconfig_common.xml">
  ]>
  <config>
  ...
  <requestHandler name="search" class="solr.SearchHandler" default="true">
  <lst name="defaults">
  &solrconfigcommon;
  </lst>
  ...

  </config>

  in solrconfig_common.xml :

  <!-- XML fragment used as entity in solrconfig.xml -->
  <str name="echoParams">explicit</str>
  <str name="defType">edismax</str>
  <str name="qf">title^4 description^1</str>
  <str name="q">*:*</str>
  <str name="q.alt">*:*</str>
  <str name="rows">20</str>
  <str name="q.op">AND</str>
  <str name="pf">title~2^2.0</str>

 HTH
 André


 Kelkoo SAS
 Société par Actions Simplifiée
 Au capital de € 4.168.964,30
 Siège social : 8, rue du Sentier 75002 Paris
 425 093 069 RCS Paris

  This message and its attachments are confidential and intended solely for
  their addressees. If you are not the intended recipient of this message,
  please delete it and notify the sender.



SolrCloud vs Solr master-slave replication

2013-04-12 Thread Victor Ruiz
Hi,

I posted an issue with our Solr index earlier this week:
http://lucene.472066.n3.nabble.com/corrupted-index-in-slave-td4054769.html

Today, that error started to happen constantly, for almost every request, and
I created a JIRA issue because I thought it was a bug:
https://issues.apache.org/jira/browse/SOLR-4707

As you can read there, in the end it was due to a failure in the Solr master-slave
replication, and now I don't know whether we should think about migrating to
SolrCloud, since Solr master-slave replication seems not to fit our
requirements:

* index size:  ~20 million documents, ~9GB
* ~1200 updates/min
* ~1 queries/min (distributed over 2 slaves): MoreLikeThis, RealTimeGet,
TermVectorComponent, SearchHandler

I would be grateful if anyone could help me answer these questions:

* Would it be advisable to migrate to SolrCloud? Would it have an impact on
replication performance?
* In that case, what would perform better: maintaining a full copy of
the index on every server, or using shard servers?
* How many shards and replicas would you advise for ensuring high
availability?

Kind Regards,

Victor



--
View this message in context: 
http://lucene.472066.n3.nabble.com/SolrCloud-vs-Solr-master-slave-replication-tp4055541.html
Sent from the Solr - User mailing list archive at Nabble.com.


solr spell correction help

2013-04-12 Thread Rohan Thakur
hi all

I have configured Solr direct spell correction on a spell field. For most
words Solr is correcting and giving suggestions, but for some words, like
those mentioned below, it is giving absurd results:

1) blender (indexed)
2) kettle (indexed)
3) electric (indexed)

problems:
1) when I search for blandar it gives the correct result blender, but when
I search for blandars it does not give the correction blender.

2) when I search for kettle, the correct spelling, it still reports it
as incorrect but gives no suggestions, even though the result documents are
showing up. And when I search for cettle it gives the correct result
kettle, but when I search for cattle it does not give any suggestions.

3) again, when I search for electric, the correct spelling, it reports it
as incorrect in the suggestions section but gives no suggestions, and
documents are also returned for this spelling, as it is the correct one.

Also, if I want Solr to return samsung as a spell suggestion when I search for
sam, what would the configuration be, and what could be the solution for the
problems above? please help.

thanks in advance

regards
Rohan


Re: how to migrate solr 1.4 index to solr 4.2 index

2013-04-12 Thread Upayavira
Try optimising your index in 3.5 before migrating to 4.2, as this should
upgrade all segments to the 3.x format.

Note however, you are likely to find issues using an index from 1.4 in a
4.x system. You will have to maintain the old field definitions using
the old components, which will likely render some features
non-functioning.

For example, I have an index in that situation. My date fields were of
type DateField not TrieDateField, meaning I could not use them in boost
functions.
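
For instance, something along these lines may be needed: the legacy type kept
so the un-reindexed field still loads, next to the trie type a reindexed field
could use (a sketch; the type names are illustrative):

<!-- legacy definition kept for a field that was never reindexed -->
<fieldType name="date" class="solr.DateField" sortMissingLast="true"/>

<!-- what a reindexed field could use instead (usable in boost functions) -->
<fieldType name="tdate" class="solr.TrieDateField" precisionStep="6"
           positionIncrementGap="0"/>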

If you can, try to think of and plan a way to re-index your content.

Upayavira

On Fri, Apr 12, 2013, at 12:24 PM, Montu v Boda wrote:
 Hi
 
 Can anybody help with my question below, please?
 
 How do I migrate a Solr 1.4 index to a Solr 4.2 index?
 
 I have done the following, but it does not work completely.
 
 I migrated the 1.4 index to a 3.5 index and that completed successfully.
 
 But now, when I try to migrate the 3.5 index to a 4.2 index, the migration
 does not complete successfully and gives me the error below.
 
 INFO: [] webapp=/solr35 path=/replication
 params={file=_zp.nrm&command=filecontent&checksum=true&generation=1190&qt=/replication&wt=filestream}
 status=0 QTime=0 
 Apr 12, 2013 4:50:03 PM org.apache.solr.core.SolrCore execute
 INFO: [] webapp=/solr35 path=/replication
 params={file=_yj.frq&command=filecontent&checksum=true&generation=1190&qt=/replication&wt=filestream}
 status=0 QTime=0 
 Apr 12, 2013 4:50:03 PM org.apache.solr.core.SolrCore execute
 INFO: [] webapp=/solr35 path=/replication
 params={file=_zp.tis&command=filecontent&checksum=true&generation=1190&qt=/replication&wt=filestream}
 status=0 QTime=0 
 Apr 12, 2013 4:50:03 PM org.apache.solr.core.SolrCore execute
 INFO: [] webapp=/solr35 path=/replication
 params={file=_yj_8.del&command=filecontent&checksum=true&generation=1190&qt=/replication&wt=filestream}
 status=0 QTime=0 
 Apr 12, 2013 4:50:03 PM org.apache.solr.handler.SnapPuller
 fetchLatestIndex
 INFO: Total time taken for download : 3 secs
 Apr 12, 2013 4:50:04 PM org.apache.solr.update.DefaultSolrCoreState
 newIndexWriter
 INFO: Creating new IndexWriter...
 Apr 12, 2013 4:50:04 PM org.apache.solr.update.DefaultSolrCoreState
 newIndexWriter
 INFO: Waiting until IndexWriter is unused... core=collection1
 Apr 12, 2013 4:50:04 PM org.apache.solr.core.CachingDirectoryFactory
 closeCacheValue
 INFO: looking to close D:\solr421\data\index.2013041216531
 [CachedDir<<refCount=0;path=D:\solr421\data\index.2013041216531;done=true>>]
 Apr 12, 2013 4:50:04 PM org.apache.solr.core.CachingDirectoryFactory
 close
 INFO: Closing directory: D:\solr421\data\index.2013041216531
 Apr 12, 2013 4:50:04 PM org.apache.solr.core.CachingDirectoryFactory
 closeCacheValue
 INFO: Removing directory before core close:
 D:\solr421\data\index.2013041216531
 Apr 12, 2013 4:50:04 PM org.apache.solr.common.SolrException log
 SEVERE: SnapPull failed :org.apache.solr.common.SolrException: Index
 fetch
 failed : 
   at 
 org.apache.solr.handler.SnapPuller.fetchLatestIndex(SnapPuller.java:459)
   at
 org.apache.solr.handler.ReplicationHandler.doFetch(ReplicationHandler.java:281)
   at org.apache.solr.handler.SnapPuller$1.run(SnapPuller.java:222)
   at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
   at java.util.concurrent.FutureTask$Sync.innerRunAndReset(Unknown Source)
   at java.util.concurrent.FutureTask.runAndReset(Unknown Source)
   at
 java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(Unknown
 Source)
   at
 java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(Unknown
 Source)
   at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
   at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
   at java.lang.Thread.run(Unknown Source)
 Caused by: org.apache.lucene.index.IndexFormatTooOldException: Format
 version is not supported (resource:
 SimpleFSIndexInput(path=D:\solr421\data\index\_yj.fdx)): 1 (needs to be
 between 2 and 3). This version of Lucene only supports indexes created
 with
 release 3.0 and later.
   at
 org.apache.lucene.codecs.lucene3x.Lucene3xStoredFieldsReader.checkCodeVersion(Lucene3xStoredFieldsReader.java:119)
   at
 org.apache.lucene.codecs.lucene3x.Lucene3xSegmentInfoReader.readLegacyInfos(Lucene3xSegmentInfoReader.java:74)
   at org.apache.lucene.index.SegmentInfos.read(SegmentInfos.java:312)
   at org.apache.lucene.index.SegmentInfos$1.doBody(SegmentInfos.java:347)
   at
 org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:783)
   at
 org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:630)
   at org.apache.lucene.index.SegmentInfos.read(SegmentInfos.java:343)
   at org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:673)
   at 
 org.apache.solr.update.SolrIndexWriter.<init>(SolrIndexWriter.java:77)
   at 
 org.apache.solr.update.SolrIndexWriter.create(SolrIndexWriter.java:64)
   at
 

Re: solr spell correction help

2013-04-12 Thread Jack Krupansky

when I search for blandars it does not give the correction blender

They have an edit distance of 3. Direct Spell is limited to a maximum ED of 
2.
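
For reference, a sketch of the relevant knob in a DirectSolrSpellChecker
definition (illustrative, not the poster's actual solrconfig.xml):

<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <lst name="spellchecker">
    <str name="name">default</str>
    <str name="field">spell</str>
    <str name="classname">solr.DirectSolrSpellChecker</str>
    <str name="distanceMeasure">internal</str>
    <!-- maxEdits can only be 1 or 2; blandars -> blender needs 3 edits,
         so this spellchecker can never propose it -->
    <int name="maxEdits">2</int>
    <float name="accuracy">0.5</float>
  </lst>
</searchComponent>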


-- Jack Krupansky

-Original Message- 
From: Rohan Thakur

Sent: Friday, April 12, 2013 8:45 AM
To: solr-user@lucene.apache.org
Subject: solr spell correction help

hi all

I have configured Solr direct spell correction on a spell field. For most
words Solr is correcting and giving suggestions, but for some words, like
those mentioned below, it is giving absurd results:

1) blender (indexed)
2) kettle (indexed)
3) electric (indexed)

problems:
1) when I search for blandar it gives the correct result blender, but when
I search for blandars it does not give the correction blender.

2) when I search for kettle, the correct spelling, it still reports it
as incorrect but gives no suggestions, even though the result documents are
showing up. And when I search for cettle it gives the correct result
kettle, but when I search for cattle it does not give any suggestions.

3) again, when I search for electric, the correct spelling, it reports it
as incorrect in the suggestions section but gives no suggestions, and
documents are also returned for this spelling, as it is the correct one.

Also, if I want Solr to return samsung as a spell suggestion when I search for
sam, what would the configuration be, and what could be the solution for the
problems above? please help.

thanks in advance

regards
Rohan 



updateLog in Solr 4.2

2013-04-12 Thread vicky desai
If I disable the update log in Solr 4.2, I get the following exception:
SEVERE: :java.lang.NullPointerException
at
org.apache.solr.cloud.ShardLeaderElectionContext.runLeaderProcess(ElectionContext.java:190)
at
org.apache.solr.cloud.LeaderElector.runIamLeaderProcess(LeaderElector.java:156)
at
org.apache.solr.cloud.LeaderElector.checkIfIamLeader(LeaderElector.java:100)
at
org.apache.solr.cloud.LeaderElector.joinElection(LeaderElector.java:266)
at
org.apache.solr.cloud.ZkController.joinElection(ZkController.java:935)
at
org.apache.solr.cloud.ZkController.register(ZkController.java:761)
at
org.apache.solr.cloud.ZkController.register(ZkController.java:727)
at
org.apache.solr.core.CoreContainer.registerInZk(CoreContainer.java:908)
at
org.apache.solr.core.CoreContainer.registerCore(CoreContainer.java:892)
at
org.apache.solr.core.CoreContainer.register(CoreContainer.java:841)
at org.apache.solr.core.CoreContainer$3.call(CoreContainer.java:638)
at org.apache.solr.core.CoreContainer$3.call(CoreContainer.java:629)
at
java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
at
java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:619)

Apr 12, 2013 6:39:56 PM org.apache.solr.common.SolrException log
SEVERE: null:org.apache.solr.common.cloud.ZooKeeperException:
at
org.apache.solr.core.CoreContainer.registerInZk(CoreContainer.java:931)
at
org.apache.solr.core.CoreContainer.registerCore(CoreContainer.java:892)
at
org.apache.solr.core.CoreContainer.register(CoreContainer.java:841)
at org.apache.solr.core.CoreContainer$3.call(CoreContainer.java:638)
at org.apache.solr.core.CoreContainer$3.call(CoreContainer.java:629)
at
java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
at
java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:619)
Caused by: java.lang.NullPointerException
at
org.apache.solr.cloud.ShardLeaderElectionContext.runLeaderProcess(ElectionContext.java:190)
at
org.apache.solr.cloud.LeaderElector.runIamLeaderProcess(LeaderElector.java:156)
at
org.apache.solr.cloud.LeaderElector.checkIfIamLeader(LeaderElector.java:100)
at
org.apache.solr.cloud.LeaderElector.joinElection(LeaderElector.java:266)
at
org.apache.solr.cloud.ZkController.joinElection(ZkController.java:935)
at
org.apache.solr.cloud.ZkController.register(ZkController.java:761)
at
org.apache.solr.cloud.ZkController.register(ZkController.java:727)
at
org.apache.solr.core.CoreContainer.registerInZk(CoreContainer.java:908)
... 12 more

and Solr fails to start. However, if I add the updateLog back to my solrconfig.xml,
it starts. Is the updateLog parameter mandatory for Solr 4.2?
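
For reference, a sketch of the usual updateLog snippet that SolrCloud appears
to expect in solrconfig.xml:

<updateHandler class="solr.DirectUpdateHandler2">
  <updateLog>
    <str name="dir">${solr.ulog.dir:}</str>
  </updateLog>
</updateHandler>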



--
View this message in context: 
http://lucene.472066.n3.nabble.com/updateLog-in-Solr-4-2-tp4055548.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: how to migrate solr 1.4 index to solr 4.2 index

2013-04-12 Thread Montu v Boda
hi

thanks, it works for us

Thanks  Regards
Montu v Boda



--
View this message in context: 
http://lucene.472066.n3.nabble.com/how-to-migrate-solr-1-4-index-to-solr-4-2-index-tp4055531p409.html
Sent from the Solr - User mailing list archive at Nabble.com.


Downloaded Solr 4.2.1 Source: Build Failing

2013-04-12 Thread Umesh Prasad
common.compile-core:
[javac] Compiling 337 source files to
/Users/umeshprasad/Downloads/solr-4.2.1/solr/build/solr-core/classes/java
[javac]
/Users/umeshprasad/Downloads/solr-4.2.1/solr/core/src/java/org/apache/solr/handler/component/QueryComponent.java:765: cannot find symbol
[javac] symbol  : class ShardFieldSortedHitQueue
[javac] location: class org.apache.solr.handler.component.QueryComponent
[javac]   ShardFieldSortedHitQueue queue;
[javac]   ^
[javac]
/Users/umeshprasad/Downloads/solr-4.2.1/solr/core/src/java/org/apache/solr/handler/component/QueryComponent.java:766:
cannot find symbol
[javac] symbol  : class ShardFieldSortedHitQueue
[javac] location: class org.apache.solr.handler.component.QueryComponent
[javac]   queue = new ShardFieldSortedHitQueue(sortFields,
ss.getOffset() + ss.getCount());
[javac]   ^
[javac] Note: Some input files use or override a deprecated API.
[javac] Note: Recompile with -Xlint:deprecation for details.
[javac] Note: Some input files use unchecked or unsafe operations.
[javac] Note: Recompile with -Xlint:unchecked for details.
[javac] 2 errors


-- 
---
Thanks  Regards
Umesh Prasad


Need hook to know when replication backup is actually completed.

2013-04-12 Thread Timothy Potter
Hi,

I'd like to use the backup command to create a backup of each shard
leader's index periodically. This is for disaster recovery in case our data
center goes offline.

We use SolrCloud leader/replica for day-to-day fault-tolerance and it works
great.

The backup command (http://master_host:port/solr/replication?command=backup)
works just fine but it returns immediately while the actual backup creation
runs in the background on the shard leader.

Is there any way to know when the actual backup is complete? I need that
hook to then move the backup to another storage device outside of our data
center, e.g. S3.

What are others doing for this type of backup process?

Thanks in advance.
Tim


Which tokenizer or analizer should use and field type

2013-04-12 Thread anurag.jain
my schema file is :

<copyField source="title" dest="keyword"/>
<copyField source="body" dest="keyword"/>
<copyField source="company_name" dest="keyword"/>
<copyField source="company_profile" dest="keyword"/>

<field name="title" type="text_general" indexed="true" stored="true"/>
<field name="body" type="text_general" indexed="true" stored="true"/>
<field name="company_name" type="text_general" indexed="true" stored="true"/>
<field name="company_profile" type="text_general" indexed="true" stored="true"/>

<fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>





values are like,

title: Assistant Coach/ Junior Assistant
body: <p> http://i.imgur.com/buPga.jpg <br /><br />Oil India Ltd. invites
applications for the post of <strong>Sr Medical Officer (Paediatrics)</strong>
<br /> www.freshersworld.com<br /> <strong>Qualification</strong> :
MD (Paediatrics) <br /><br /> <strong>No of Post</strong> : 1UR<br /><br
/><strong> Pay Scale</strong> : Rs 32900 -58000 <br /> <br /> <strong>Age as
on 11.04.2013</strong> : 32 yrs<br /> </p><p><strong>Selection Procedure :
</strong>Selection for the above post will be based on Written Test, Group
Discussion (GD), Viva-Voce and Medical Examination.<br /> </p>

company_profile: <p>The story of <strong>Oil India Limited (OIL)</strong>
traces and symbolises the development and growth of the Indian petroleum
industry. From the discovery of crude oil in the far east of India at
Digboi, Assam in 1889 to its present status as a fully integrated upstream
petroleum company, OIL has come far, crossing many milestones.</p>,

company_name: Oil India Limited,



please give me a suggestion about the field type I should use.

keyword is a copyField I am using for search. I do not want to search on the
HTML content.

How should the search happen?

If I give the words to search

project assistant,manager

it should only give me documents whose keyword has "project assistant" or "manager".

Right now it is giving me results which have "project" or "assistant" or "manager",
which is the wrong case for me.

Please give me a solution for it. I have to complete this task by today, that's
why I am not able to do research on it.

I need field type definitions for each field, and how should I write the search
query?

thanks in advance






--
View this message in context: 
http://lucene.472066.n3.nabble.com/Which-tokenizer-or-analizer-should-use-and-field-type-tp4055591.html
Sent from the Solr - User mailing list archive at Nabble.com.


Spatial search question

2013-04-12 Thread kfdroid
We currently do a radius search from a given Lat/Long point and it works
great. I have a new requirement to do a search on a larger radius from the
same point, but not include the smaller radius.  Kind of a donut (torus)
shaped search. 

How would I do this (Solr 4)?  Search where radius is between 20km and 40km
for example?
Thanks,
Ken



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Spatial-search-question-tp4055597.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: tokenizer of solr

2013-04-12 Thread Mingfeng Yang
Jack,

Thanks so much for this info.  It's awesome.

Ming


On Thu, Apr 11, 2013 at 7:32 PM, Jack Krupansky j...@basetechnology.com wrote:

 In that case, use the types="wdfftypes.txt" attribute of WDF and map @
 and _ to ALPHA as shown in:
 http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.WordDelimiterFilterFactory
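
 As a sketch (the file name and mappings are illustrative):

 <filter class="solr.WordDelimiterFilterFactory" types="wdfftypes.txt"
         generateWordParts="1" generateNumberParts="1"/>

 with a wdfftypes.txt along the lines of:

 # treat @ and _ as letters so @jpc_108 is not split apart
 @ => ALPHA
 _ => ALPHA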


 -- Jack Krupansky

 -Original Message- From: Mingfeng Yang
 Sent: Thursday, April 11, 2013 8:50 PM
 To: solr-user@lucene.apache.org
 Subject: Re: tokenizer of solr


 looks like it's due to the word delimiter filter.  Anyone know if the
 protected words file supports regular expressions or not?

 Ming


 On Thu, Apr 11, 2013 at 4:58 PM, Jack Krupansky j...@basetechnology.com
 wrote:

  Try the whitespace tokenizer.

 -- Jack Krupansky

 -Original Message- From: Mingfeng Yang
 Sent: Thursday, April 11, 2013 7:48 PM
 To: solr-user@lucene.apache.org
 Subject: tokenizer of solr
 Dear Solr users and developers,

 I am trying to index some documents, some of which are twitter messages, and
 we have a problem when indexing retweets.

 Say a twitter user named jpc_108 posts a tweet, and then someone retweets
 his msg, and now @jpc_108 becomes part of the tweet text body.

 It seems that before indexing, the tokenizer factory of Solr turns @jpc_108
 into jpc and 108, and when we search for jpc_108, it's not there
 anymore.


 Is there any way we can keep jpc_108 when it appears as @jpc_108?

 Thanks,
 Ming-





Re: SolrCloud vs Solr master-slave replication

2013-04-12 Thread Shawn Heisey

On 4/12/2013 6:45 AM, Victor Ruiz wrote:

As you can read, at the end it was due to a fail in the Solr master-slave
replication, and now I don't know if we should think about migrating to
SolrCloud, since Solr master-slave replications seems not to fit to our
requirements:

* index size:  ~20 million documents, ~9GB
* ~1200 updates/min
* ~1 queries/min (distributed over 2 slaves)  MoreLikeThis, RealTimeGet,
TermVectorComponent, SearchHandler

I would thank you if anyone could help me to answer these questions:

* Would it be advisable to migrate to SolrCloud? Would it have impact on the
replication performance?
* In that case, what would have better performance? to maintain a copy of
the index in every server, or to use shard servers?
* How many shards and replicas would you advice for ensuring high
availability?


The fact that your replication is producing a corrupt index suggests 
that your network, your server hardware, or your software install is 
unreliable.  The TCP protocol used for all Solr communication (as well 
as the Internet in general) has error detection and retransmissions. 
I'm not saying that replication can't have bugs, but usually those bugs 
result in replication not working, they don't typically cause index 
corruption.


I see a previous message where you say everything is on the same LAN 
with gigabit ethernet.  There are a lot of things that can go wrong with 
gigabit.  At the physical layer: Using cat5 cable instead of cat5e or 
cat6 can lead to problems.  You could have a bad cable, or the RJ45 
connectors could be badly crimped.  If you are using patch panels, they 
may be bad or only rated for cat5.  At layer 2, you can have duplex 
mismatches, common when one side is hard-set to full duplex and the 
other side is left at auto or is a dumb switch that can't be changed. 
Even if you have these problems, it still won't usually cause data 
corruption unless the hardware or OS is also faulty.


One somewhat common example of a problem that can cause data corruption 
in network communication is buggy firmware on the network card, 
especially with Broadcom chips.  Upgrading to the latest firmware will 
usually fix these problems.


Now for your questions: SolrCloud doesn't use replication during normal 
operation.  When you index, the indexing happens on all replicas in 
parallel.


Replication does sometimes get used by SolrCloud, but only if a replica 
goes down and there's not enough information in the transaction log to 
reconstruct recent updates when it comes back up.


As for whether or not to use shards: that's really up to you.  Solr 
should have no trouble with a single-shard 9GB index that has 20 million 
documents, as long as you give enough memory to the java heap and have 
8GB or so left over for the OS to cache the index.  That means you want 
to have 12-16GB of RAM in each server.  If Solr is not the only thing 
running on the hardware, then you'd want more RAM.


For the update and query volume you have described, having plenty of RAM 
and lots of CPU cores will be critical.


Thanks,
Shawn



Re: Downloaded Solr 4.2.1 Source: Build Failing

2013-04-12 Thread Chris Hostetter

: /Users/umeshprasad/Downloads/solr-4.2.1/solr/core/src/java/org/apache/solr/handler/component/QueryComponent.java:765: cannot find symbol
: [javac] symbol  : class ShardFieldSortedHitQueue
: [javac] location: class org.apache.solr.handler.component.QueryComponent
: [javac]   ShardFieldSortedHitQueue queue;

Weird ... can you provide us more details about the java compiler you are 
using?

ShardFieldSortedHitQueue is a package protected class declared in 
ShardDoc.java (in the same package as QueryComponent).  That isn't exactly 
a best practice, but it shouldn't be causing a compilation failure.


-Hoss


Configure compositekey

2013-04-12 Thread gpssolr2020
Hi,

I want to explore the hash-based document routing available in Solr 4.1. Please
share the configuration for generating a composite key.


Thanks.




--
View this message in context: 
http://lucene.472066.n3.nabble.com/Configure-compositekey-tp4055645.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Need hook to know when replication backup is actually completed.

2013-04-12 Thread Timothy Potter
Update to this ... did some code scanning and it looks like the backup
status is available via the details command, e.g.

<lst name="backup">
  <str name="startTime">Fri Apr 12 17:53:17 UTC 2013</str>
  <int name="fileCount">120</int>
  <str name="status">success</str>
  <str name="snapshotCompletedAt">Fri Apr 12 17:58:22 UTC 2013</str>
</lst>

So with a little polling of the details command from my backup script, I'm
good to go. If anyone knows of a more direct way, let me know; otherwise
I'm moving ahead with this approach.

Cheers,
Tim


On Fri, Apr 12, 2013 at 9:31 AM, Timothy Potter thelabd...@gmail.com wrote:

 Hi,

 I'd like to use the backup command to create a backup of each shard
 leader's index periodically. This is for disaster recovery in case our data
 center goes offline.

 We use SolrCloud leader/replica for day-to-day fault-tolerance and it
 works great.

 The backup command (
 http://master_host:port/solr/replication?command=backup) works just fine
 but it returns immediately while the actual backup creation runs in the
 background on the shard leader.

 Is there any way to know when the actual backup is complete? I need that
 hook to then move the backup to another storage device outside of our data
 center, e.g. S3.

 What are others doing for this type of backup process?

 Thanks in advance.
 Tim



Re: Solr 4.2.1 SSLInitializationException

2013-04-12 Thread Chris Hostetter

: Thanks for your response.  As I mentioned in my email, I would prefer 
: the application to not have access to the keystore. Do you know if there 

I'm confused ... it seems that you (or GlassFish) has created a 
Catch-22...

You say you don't want the application to have access to the keystore, but 
aparently you (or glassfish) is explicitly setting javax.net.ssl.keyStore 
to tell the application what keystore to use.  The keystore you specify 
has a password set on it, but you are not telling the application what the 
password is, so it can't use that keystore.

If you don't want the application to have access to the keystore at all, 
have you tried unsetting javax.net.ssl.keyStore?

: is a way of specifying  a different HttpClient implementation (e.g. 
: DefaultHttpClient rather than SystemDefaultHttpClient) ?

In SolrJ client code you can specify whatever HttpClient implementation 
you want.  In Solr (for it's use of talking to other nodes in distributed 
search, which is what is indicated in your stack trace) 
SystemDefaultHttpClient is hard coded.


-Hoss

CSS appearing in Solr 4.2.1 logs

2013-04-12 Thread Tim Vaillancourt
Hey guys,

This sounds crazy, but does anyone see strange CSS/HTML in their Solr 4.2.x
logs?

Often I am finding entire CSS documents (likely from Solr's Admin) in my
jetty's stderrout log.

Example:

2013-04-12 00:23:20.363:WARN:oejh.HttpGenerator:Ignoring extra content /**
 * @license RequireJS order 1.0.5 Copyright (c) 2010-2011, The Dojo
Foundation All Rights Reserved.
 * Available via the MIT or new BSD license.
 * see: http://github.com/jrburke/requirejs for details
 */
/*jslint nomen: false, plusplus: false, strict: false */
/*global require: false, define: false, window: false, document: false,
  setTimeout: false */

//Specify that requirejs optimizer should wrap this code in a closure that
//maps the namespaced requirejs API to non-namespaced local variables.
/*requirejs namespace: true */

(function () {

//Sadly necessary browser inference due to differences in the way
//that browsers load and execute dynamically inserted javascript
//and whether the script/cache method works when ordered execution is
//desired. Currently, Gecko and Opera do not load/fire onload for
scripts with
//type=script/cache but they execute injected scripts in order
//unless the 'async' flag is present.
//However, this is all changing in latest browsers implementing HTML5
//spec. With compliant browsers .async true by default, and
//if false, then it will execute in order. Favor that test first for
forward
//compatibility.
var testScript = typeof document !== "undefined" &&
 typeof window !== "undefined" &&
 document.createElement("script"),

supportsInOrderExecution = testScript && (testScript.async ||
   ((window.opera &&
Object.prototype.toString.call(window.opera) === "[object Opera]") ||
   //If Firefox 2 does not have to be supported, then
   //a better check may be:
   //('mozIsLocallyAvailable' in window.navigator)
   ("MozAppearance" in document.documentElement.style))),



Due to this, my logs are getting really huge, and sometimes it breaks my tail
-F commands on the logs, printing what looks like binary, so there is
possibly some other junk in my logs aside from CSS.

I am running Jetty 8.1.10 and Solr 4.2.1 (stable build).

Cheers!

Tim Vaillancourt


dataimporter.last_index_time SolrCloud

2013-04-12 Thread jimtronic
My data-config files use the dataimporter.last_index_time variable, but it
seems to have stopped working when I upgraded to 4.2.

In previous 4.x versions, I saw that it was being written to zookeeper, but
now there's nothing there.

Did anything change? Or should I be doing something differently?

Thanks!
Jim



--
View this message in context: 
http://lucene.472066.n3.nabble.com/dataimporter-last-index-time-SolrCloud-tp4055679.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: dataimporter.last_index_time SolrCloud

2013-04-12 Thread William Bell
Same issue here. Also, in the file there are multiple last index times, one for
each entity, and we cannot reference the individual ones anymore.

DIH.entity1.last_index_time does not pass through to the query anymore.

On Friday, April 12, 2013, jimtronic wrote:

 My data-config files use the dataimporter.last_index_time variable, but it
 seems to have stopped working when I upgraded to 4.2.

 In previous 4.x versions, I saw that it was being written to zookeeper, but
 now there's nothing there.

 Did anything change? Or should I be doing something differently?

 Thanks!
 Jim



 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/dataimporter-last-index-time-SolrCloud-tp4055679.html
 Sent from the Solr - User mailing list archive at Nabble.com.



-- 
Bill Bell
billnb...@gmail.com
cell 720-256-8076


Re: Need hook to know when replication backup is actually completed.

2013-04-12 Thread Nate Fox
Tim, thank you for this! I had been looking for this a while back (even
posted something on serverfault) and never got a decent answer. This is
exactly what I was looking for.


--
Nate Fox
Sr Systems Engineer

o: 310.658.5775
m: 714.248.5350

Follow us @NEOGOV (http://twitter.com/NEOGOV) and on Facebook
(http://www.facebook.com/neogov)

NEOGOV (http://www.neogov.com/) is among the top fastest growing software
companies in the USA, recognized by Inc 500|5000, Deloitte Fast 500, and
the LA Business Journal. We are hiring! (http://www.neogov.com/#/company/careers)



On Fri, Apr 12, 2013 at 12:04 PM, Timothy Potter thelabd...@gmail.comwrote:

 Update to this ... did some code scanning and it looks like the backup
 status is available via the details command, e.g.

 <lst name="backup">
   <str name="startTime">Fri Apr 12 17:53:17 UTC 2013</str>
   <int name="fileCount">120</int>
   <str name="status">success</str>
   <str name="snapshotCompletedAt">Fri Apr 12 17:58:22 UTC 2013</str>
 </lst>

 So with a little polling of the details command from my backup script, I'm
 good to go. If anyone knows of a more direct way, let me know; otherwise
 I'm moving ahead with this approach.
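
 For anyone scripting this, a rough sketch of such a polling loop (Python 3;
 the core name is hypothetical, and it assumes wt=json with json.nl=map so
 the backup section of the details response parses as a plain JSON object;
 adjust the parsing if your response shape differs):

 import json
 import time
 from urllib.request import urlopen

 DETAILS_URL = ("http://master_host:8983/solr/collection1/replication"
                "?command=details&wt=json&json.nl=map")

 def wait_for_backup(timeout_secs=3600, poll_secs=15):
     # Poll the replication details until the backup section reports success.
     # Note: if an older backup already succeeded, also compare startTime or
     # snapshotCompletedAt against the time you issued command=backup.
     deadline = time.time() + timeout_secs
     while time.time() < deadline:
         with urlopen(DETAILS_URL) as resp:
             payload = json.loads(resp.read().decode("utf-8"))
         backup = payload.get("details", {}).get("backup", {})
         if backup.get("status") == "success":
             return backup  # includes fileCount, snapshotCompletedAt, etc.
         time.sleep(poll_secs)
     raise RuntimeError("backup did not complete within %s seconds" % timeout_secs)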

 Cheers,
 Tim


 On Fri, Apr 12, 2013 at 9:31 AM, Timothy Potter thelabd...@gmail.com
 wrote:

  Hi,
 
  I'd like to use the backup command to create a backup of each shard
  leader's index periodically. This is for disaster recovery in case our
 data
  center goes offline.
 
  We use SolrCloud leader/replica for day-to-day fault-tolerance and it
  works great.
 
  The backup command (
  http://master_host:port/solr/replication?command=backup) works just fine
  but it returns immediately while the actual backup creation runs in the
  background on the shard leader.
 
  Is there any way to know when the actual backup is complete? I need that
  hook to then move the backup to another storage device outside of our
 data
  center, e.g. S3.
 
  What are others doing for this type of backup process?
 
  Thanks in advance.
  Tim
 



Re: how to migrate solr 1.4 index to solr 4.2 index

2013-04-12 Thread gpssolr2020
Hi ,

Have you re-indexed from scratch, or moved the 1.4 index directly to 4.2?

Thanks.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/how-to-migrate-solr-1-4-index-to-solr-4-2-index-tp4055531p4055686.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Which tokenizer or analizer should use and field type

2013-04-12 Thread Jack Krupansky
Unfortunately, Solr doesn't have a query parser that would give the meaning 
you want to:


project assistant,manager

For now, you would need to write that query as:

(project AND assistant) OR manager

Or maybe as:

"project assistant"~5 OR manager

That would require project and assistant to occur within a few words of each
other.


Or, if you have q.op defaulted to OR:

"project assistant"~5 manager
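
Putting that together against your keyword copyField, the request could look
something like this (untested; the core name is hypothetical, and the spaces
need URL-encoding when sent programmatically):

http://localhost:8983/solr/collection1/select?q=keyword:((project AND assistant) OR manager)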

Add the HTML strip char filter to your text field type:

<charFilter class="solr.HTMLStripCharFilterFactory"/>

text_general is a semi-decent place to start.
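
For example, a sketch of your text_general index analyzer with the char filter
added (char filters go before the tokenizer):

<analyzer type="index">
  <charFilter class="solr.HTMLStripCharFilterFactory"/>
  <tokenizer class="solr.StandardTokenizerFactory"/>
  <filter class="solr.StopFilterFactory" ignoreCase="true"
          words="stopwords.txt" enablePositionIncrements="true"/>
  <filter class="solr.LowerCaseFilterFactory"/>
</analyzer>

Do the same in the query analyzer if query strings can contain markup as well.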

-- Jack Krupansky

-Original Message- 
From: anurag.jain

Sent: Friday, April 12, 2013 11:32 AM
To: solr-user@lucene.apache.org
Subject: Which tokenizer or analizer should use and field type

My schema file is:

<copyField source="title" dest="keyword"/>
<copyField source="body" dest="keyword"/>
<copyField source="company_name" dest="keyword"/>
<copyField source="company_profile" dest="keyword"/>

<field name="title" type="text_general" indexed="true" stored="true"/>
<field name="body" type="text_general" indexed="true" stored="true"/>
<field name="company_name" type="text_general" indexed="true" stored="true"/>
<field name="company_profile" type="text_general" indexed="true" stored="true"/>

<fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>





The field values look like this:

title: Assistant Coach/ Junior Assistant
body: p http://i.imgur.com/buPga.jpg br /br /Oil India Ltd. invites
applications for the post of strongSr Medical Officer (Paediatrics)
/strongbr / www.freshersworld.combr / strongQualification/strong :
MD (Paediatrics) br /br / strongNo of Post/strong : 1URbr / br
/strong Pay Scale/strong : Rs 32900 -58000 br / br / strongAge as
on 11.04.2013/strong : 32 yrsbr / /ppstrongSelection Procedure :
/strongSelection for the above post will be based on Written Test, Group
Discussion (GD), Viva-Voce and Medical Examination.br / /p

company_profile: pThe story of strongOil India Limited (OIL)/strong
traces and symbolises the development and growth of the Indian petroleum
industry. From the discovery of crude oil in the far east of India at
Digboi, Assam in 1889 to its present status as a fully integrated upstream
petroleum company, OIL has come far, crossing many milestones./p,

company_name: Oil India Limited,



Please give me a suggestion about which field type I should use.

keyword is the copyField destination I am using for search. I do not want to
search on HTML content.

How should the search work?


If I search for the words

project assistant,manager


it should only give me documents whose keyword field has project assistant
or manager.

Right now it is giving me results which have just project, or assistant, or
manager, which is the wrong behaviour for me.

Please give me a solution for this. I have to complete the task by today,
which is why I am not able to research it further myself.


I need field type definitions for each field, and guidance on how to write
the search query.

thanks in advance






--
View this message in context: 
http://lucene.472066.n3.nabble.com/Which-tokenizer-or-analizer-should-use-and-field-type-tp4055591.html
Sent from the Solr - User mailing list archive at Nabble.com. 



Re: CSS appearing in Solr 4.2.1 logs

2013-04-12 Thread Chris Hostetter

: This sounds crazy, but does anyone see strange CSS/HTML in their Solr 4.2.x
: logs?

are you sure you're running 4.2.1 and not 4.2?

https://issues.apache.org/jira/browse/SOLR-4573


-Hoss


RE: CSS appearing in Solr 4.2.1 logs

2013-04-12 Thread Vaillancourt, Tim
Thanks Chris! Somehow I managed to miss that ticket when searching; thanks
for finding it for me.

I will confirm the version I have and I am glad to hear this was reported and 
resolved!

Cheers,

Tim

-Original Message-
From: Chris Hostetter [mailto:hossman_luc...@fucit.org] 
Sent: Friday, April 12, 2013 2:53 PM
To: solr-user@lucene.apache.org
Subject: Re: CSS appearing in Solr 4.2.1 logs


: This sounds crazy, but does anyone see strange CSS/HTML in their Solr 4.2.x
: logs?

are you sure you're running 4.2.1 and not 4.2?

https://issues.apache.org/jira/browse/SOLR-4573


-Hoss



Re: Spatial search question

2013-04-12 Thread Lance Norskog

Outer distance AND NOT inner distance?

On 04/12/2013 09:02 AM, kfdroid wrote:

We currently do a radius search from a given Lat/Long point and it works
great. I have a new requirement to do a search on a larger radius from the
same point, but not include the smaller radius.  Kind of a donut (torus)
shaped search.

How would I do this (Solr 4)?  Search where radius is between 20km and 40km
for example?
Thanks,
Ken



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Spatial-search-question-tp4055597.html
Sent from the Solr - User mailing list archive at Nabble.com.




Re: Easier way to do this?

2013-04-12 Thread David Smiley (@MITRE.org)
Bill,

I responded to the issue you created about this: 
https://issues.apache.org/jira/browse/SOLR-4704

In summary, use {!geofilt}.
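
For example, a sketch of the first couple of facet queries rewritten with
{!geofilt} (untested; note that d is in kilometers here, so 0.5 miles is
roughly 0.8 km, and pt/sfield only need to appear once as plain request
parameters):

facet.query={!geofilt key=.5 d=0.8}&facet.query={!geofilt key=1 d=1.6}
  &pt=26.012156,-80.311943&sfield=store_geohash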

~ David


Billnbell wrote
 I would love for Solr spatial 4 to support pt so that I can get the number of
 results around a central point easily, like in 3.6. How can I pass
 parameters to a Circle()? I would love to send pt to this query, since the
 pt is the same across multiple areas.
 
 For example:
 
 http://localhost:8983/solr/core/select?rows=0&q=*:*&facet=true
   &facet.query={!key=.5}store_geohash:%22Intersects(Circle(26.012156,-80.311943%20d=.0072369))%22
   &facet.query={!key=1}store_geohash:%22Intersects(Circle(26.012156,-80.311943%20d=.01447))%22
   &facet.query={!key=5}store_geohash:%22Intersects(Circle(26.012156,-80.311943%20d=.0723))%22
   &facet.query={!key=10}store_geohash:%22Intersects(Circle(26.012156,-80.311943%20d=.1447))%22
   &facet.query={!key=25}store_geohash:%22Intersects(Circle(26.012156,-80.311943%20d=.361846))%22
   &facet.query={!key=50}store_geohash:%22Intersects(Circle(26.012156,-80.311943%20d=.72369))%22
   &facet.query={!key=100}store_geohash:%22Intersects(Circle(26.012156,-80.311943%20d=1.447))%22





-
 Author: http://www.packtpub.com/apache-solr-3-enterprise-search-server/book
--
View this message in context: 
http://lucene.472066.n3.nabble.com/Easier-way-to-do-this-tp4055474p4055732.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Support old syntax including geodist

2013-04-12 Thread David Smiley (@MITRE.org)
Hi Bill,

FYI see https://issues.apache.org/jira/browse/SOLR-4242


Billnbell wrote
 Since Spatial Lucene 4 does not seem to support geodist(), even sending
 d, pt, and fq={!geofilt} does not help me, since I need to sort. So I end up
 having to set up the sortsq. Any other ideas on how to support the old syntax
 on the new spatial? Can I create a transform or something?

A couple of times or more I've looked into how geodist() works with the
intention of adding support for the new spatial 4 field type, but I wind up
concluding the result would be a big hack, because geodist() works
fundamentally differently than how it would need to work, yet it would
somehow have to work in two different ways.  Maybe I should just accept that
it's going to be an ugly hack, trading that for making things easier for
users.

Another thing I want to mention is that if you've got a single-valued
spatial field, then I suggest using LatLonType, if for nothing else than
sorting, and hence you can use geodist().
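
A sketch of that setup (single-valued field; the field name is hypothetical
and it assumes a tdouble type as in the example schema):

<fieldType name="location" class="solr.LatLonType" subFieldSuffix="_coordinate"/>
<dynamicField name="*_coordinate" type="tdouble" indexed="true" stored="false"/>
<field name="store" type="location" indexed="true" stored="true"/>

and then sorting is just &sfield=store&pt=26.012156,-80.311943&sort=geodist() asc.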


 Convert
 
 http://localhost:8983/solr/providersearch/select?rows=20&q=*:*&fq={!geofilt}&pt=26.012156,-80.311943&d=50&sfield=store_geohash&sort=geodist() asc
 
 To
  
 
 http://localhost:8983/solr/providersearch/select?rows=20&q=*:*&fq={!%20v=$geoq}&sortsq={!%20score=distance%20v=$geoq}&geoq=store_geohash:%22Intersects(Circle(26.012156,-80.311943%20d=.72369))%22&fl=store_lat_lon,distance:mul(query($sortsq),69.09)&sort=query($sortsq)%20asc

I'm aware things can get ugly, but can't you just use 'q' for the spatial
query that returns the distance as the score, both for sorting and for
returning it?  It'd significantly simplify this query.
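
Something along those lines, reusing your $geoq, might look like this
(untested sketch):

http://localhost:8983/solr/providersearch/select?rows=20&q={!%20score=distance%20v=$geoq}&geoq=store_geohash:%22Intersects(Circle(26.012156,-80.311943%20d=.72369))%22&fl=store_lat_lon,score&sort=score%20asc

The score is the raw distance (in degrees) in that setup, so you'd still
multiply by 69.09 client-side, or keep your mul() pseudo-field, to report
miles.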

~ David




-
 Author: http://www.packtpub.com/apache-solr-3-enterprise-search-server/book
--
View this message in context: 
http://lucene.472066.n3.nabble.com/Support-old-syntax-including-geodist-tp4055476p4055733.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Spatial search question

2013-04-12 Thread David Smiley (@MITRE.org)
Yup, Lance is right.  But it won't always work if you have multi-valued data
since it wouldn't match a document that had a point both in the ring and the
hole.
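
As a concrete (untested) sketch of Lance's suggestion, with a hypothetical
field name and point, and distances in kilometers:

fq=_query_:"{!geofilt sfield=store pt=45.15,-93.85 d=40}" AND NOT
   _query_:"{!geofilt sfield=store pt=45.15,-93.85 d=20}"

i.e. keep everything within 40 km, then subtract everything within 20 km.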

Another approach that internally works faster and addresses the multi-value
case is to implement a custom Spatial4j Shape.  In this case, you could
create a special aggregate Shape that basically accepts one shape and
excludes the other, in its custom relate() method.  It's like a subtracting
shape.  This is generically useful and on my list of things to do but I
haven't had the need.  The other step would be parsing it somehow, so you
might do that by extending the existing spatial 4 field type.

~ David


Lance Norskog-2 wrote
 Outer distance AND NOT inner distance?
 
 On 04/12/2013 09:02 AM, kfdroid wrote:
 We currently do a radius search from a given Lat/Long point and it works
 great. I have a new requirement to do a search on a larger radius from
 the
 same point, but not include the smaller radius.  Kind of a donut (torus)
 shaped search.

 How would I do this (Solr 4)?  Search where radius is between 20km and
 40km
 for example?
 Thanks,
 Ken



 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/Spatial-search-question-tp4055597.html
 Sent from the Solr - User mailing list archive at Nabble.com.





-
 Author: http://www.packtpub.com/apache-solr-3-enterprise-search-server/book
--
View this message in context: 
http://lucene.472066.n3.nabble.com/Spatial-search-question-tp4055597p4055735.html
Sent from the Solr - User mailing list archive at Nabble.com.