Re: Solr Cloud, Commits and Master/Slave configuration

2012-02-28 Thread eks dev
SolrCloud is going to be great; the NRT feature is a really huge step
forward, as well as central configuration, elasticity ...

The only thing I do not yet understand is the treatment of cases that were
traditionally covered by a Master/Slave setup: batch updates.

If I get it right (?), updates to replicas are sent one by one,
meaning when one server receives an update, it gets forwarded to all
replicas. This is great for the reduced-update-latency case, but I do not
know how it is implemented if you hit it with a batch update. This
would cause a huge amount of update commands going to replicas. Not so
good for throughput.

- Master/slave does distribution at the segment level (no need to
replicate analysis, far less network traffic). Good for batch updates.
- SolrCloud does per-update-command distribution (low latency, but chatty, and
the analysis step is done N_Servers times). Good for incremental updates.

Ideally, some sort of batching is going to be available in
SolrCloud, with some control over it, e.g. forward batches of 1000
documents (basically keep the update log slightly longer and forward it as
a batch update command). This would still cause duplicate analysis,
but would reduce network traffic.
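To make the idea concrete, here is a minimal sketch (plain Java, hypothetical names — not actual SolrCloud code) of the kind of buffering meant above: per-document updates are collected and forwarded to replicas one batch at a time.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical sketch: buffer per-document updates and forward them to
// replicas in batches of `batchSize`, instead of one command per document.
class BatchForwarder {
    private final int batchSize;
    private final List<String> buffer = new ArrayList<>();
    private final Consumer<List<String>> replicaSender; // sends one batch command

    BatchForwarder(int batchSize, Consumer<List<String>> replicaSender) {
        this.batchSize = batchSize;
        this.replicaSender = replicaSender;
    }

    void add(String doc) {
        buffer.add(doc);
        if (buffer.size() >= batchSize) {
            flush();
        }
    }

    void flush() {
        if (!buffer.isEmpty()) {
            replicaSender.accept(new ArrayList<>(buffer)); // one network call per batch
            buffer.clear();
        }
    }
}
```

The trade-off stays the same as described: analysis is still duplicated on each replica, but the number of forwarded commands drops by roughly a factor of `batchSize`.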

Please bear in mind, this is more of a question than a statement; I
didn't look at the cloud code. It might be that I am completely wrong here!





On Tue, Feb 28, 2012 at 4:01 AM, Erick Erickson erickerick...@gmail.com wrote:
 As I understand it (and I'm just getting into SolrCloud myself), you can
 essentially forget about master/slave stuff. If you're using NRT,
 the soft commit will make the docs visible; you don't need to do a hard
 commit (unlike the master/slave days). Essentially, the update is sent
 to each shard leader and then fanned out to the replicas for that
 leader. All automatically. Leaders are elected automatically. ZooKeeper
 is used to keep the cluster information.
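For reference, this visibility/durability split is configured in solrconfig.xml roughly like this on NRT-capable versions; the interval values below are purely illustrative:

```xml
<updateHandler class="solr.DirectUpdateHandler2">
  <!-- hard commit: flushes to disk for durability, without opening a new searcher -->
  <autoCommit>
    <maxTime>60000</maxTime>
    <openSearcher>false</openSearcher>
  </autoCommit>
  <!-- soft commit: makes newly indexed documents visible to searchers quickly -->
  <autoSoftCommit>
    <maxTime>1000</maxTime>
  </autoSoftCommit>
</updateHandler>
```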

 Additionally, SolrCloud keeps a transaction log of the updates, and replays
 them if the indexing is interrupted, so you don't risk data loss the way
 you used to.
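A toy illustration of the replay idea — this is not Solr's actual transaction log code, just the general mechanism: updates are appended to a log, a hard commit marks a durable point, and anything after that point is replayed on restart.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only (not Solr's tlog format): updates are appended
// to a log before being applied; after an interrupted run, entries newer
// than the last persisted point are replayed.
class TransactionLog {
    private final List<String> entries = new ArrayList<>();
    private int persistedUpTo = 0; // count of entries covered by the last hard commit

    void append(String update) { entries.add(update); }

    void markPersisted() { persistedUpTo = entries.size(); }

    // Entries that must be replayed after a crash/restart.
    List<String> replay() {
        return new ArrayList<>(entries.subList(persistedUpTo, entries.size()));
    }
}
```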

 There aren't really masters/slaves in the old sense any more, so
 you have to get out of that thought-mode (it's hard, I know).

 The code is under pretty active development, so any feedback is
 valuable.

 Best
 Erick

 On Mon, Feb 27, 2012 at 3:26 AM, roz dev rozde...@gmail.com wrote:
 Hi All,

 I am trying to understand features of Solr Cloud, regarding commits and
 scaling.


   - If I am using SolrCloud, do I need to explicitly call commit
   (hard-commit)? Or is a soft commit okay, with SolrCloud doing the job of
   writing to disk?


   - Do we still need to use a Master/Slave setup to scale searching? If we
   have to use a Master/Slave setup, do I need to issue a hard-commit to make
   my changes visible to slaves?
   - If I were to use NRT with a Master/Slave setup with soft commit, then
   will the slave be able to see changes made on the master with a soft commit?

 Any inputs are welcome.

 Thanks

 -Saroj


Re: Is there a way to implement a IntRangeField in Solr?

2012-02-28 Thread Mikhail Khludnev
Hello,

For me it looks like a typical nested-documents use case: you have an
apartment document with nested lease documents, where each lease has from and to
fields. So, you need to find apartments which have no lease in the provided
range [$need_from, $need_to].
Conflicting leases can be found by:

(from:[$need_from TO $need_to] )
OR
(to:[$need_from TO $need_to])
OR
 ((from:[* TO $need_from]) AND (to:[$need_from TO *]) )

Then you need to get the apartments which have no such leases.
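A hedged sketch of how a client might build that filter string, plus the equivalent plain interval-overlap test. The field names come from the example above; encoding dates as sortable ints is an assumption for illustration.

```java
// Sketch of building the "conflicting leases" filter described above,
// for a requested stay [needFrom, needTo] (dates assumed encoded as
// sortable ints like 20120228; from:/to: fields as in the example).
class LeaseQuery {
    static String conflicts(int needFrom, int needTo) {
        return "(from:[" + needFrom + " TO " + needTo + "])"
             + " OR (to:[" + needFrom + " TO " + needTo + "])"
             + " OR ((from:[* TO " + needFrom + "]) AND (to:[" + needFrom + " TO *]))";
    }

    // Equivalent plain-interval check: a lease [from, to] conflicts with
    // the requested [needFrom, needTo] iff the two intervals overlap.
    static boolean overlaps(int from, int to, int needFrom, int needTo) {
        return from <= needTo && to >= needFrom;
    }
}
```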

AFAIK it can be done by Joins and Grouping in Solr. Some time ago I did it by
SpanQueries (http://blog.griddynamics.com/search/label/Solr), but now I'm
collaborating on https://issues.apache.org/jira/browse/SOLR-3076 around rocket
science
(http://blog.mikemccandless.com/2012/01/searching-relational-content-with.html).

Also, I have an idea about the OOM: if you expand the range in the app when
forming SolrInputDocuments, it can cause the problem. To solve it you can pass two
numbers as a range but provide your own analyzer via the schema which will expand
them into a stream of numbers.
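A sketch of just that expansion step, without the Lucene TokenStream plumbing; the comma-separated input format is an assumption. At index time a stored value like "3,7" would become the tokens 3 4 5 6 7, so a single-point query matches any value inside the range.

```java
import java.util.ArrayList;
import java.util.List;

// The expansion such an analyzer would perform (illustrative only):
// a field value "from,to" expands to one token per integer in the range.
class RangeExpander {
    static List<String> expand(String range) {
        String[] parts = range.split(",");
        int from = Integer.parseInt(parts[0].trim());
        int to = Integer.parseInt(parts[1].trim());
        List<String> tokens = new ArrayList<>();
        for (int i = from; i <= to; i++) {
            tokens.add(Integer.toString(i));
        }
        return tokens;
    }
}
```

In a real schema this logic would live inside a custom TokenFilter so the expansion happens inside Solr rather than in the client, avoiding the large SolrInputDocuments that trigger the OOM.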

Regards

On Tue, Feb 28, 2012 at 2:11 AM, federico.wachs
federico.wa...@2clams.com wrote:

 This is used on an apartment booking system, and what I store as solr
 documents can be seen as apartments. These apartments can be booked for a
 certain number of days with a check-in and a check-out date, hence the
 ranges
 I was speaking of before.

 What I want to do is to filter out the apartments that are booked so my
 users won't have a bad user experience while trying to book an apartment
 that suits their needs.

 Did I make any sense? Please let me know, otherwise I can explain
 further.



 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/Is-there-a-way-to-implement-a-IntRangeField-in-Solr-tp3782083p3782304.html
 Sent from the Solr - User mailing list archive at Nabble.com.




-- 
Sincerely yours
Mikhail Khludnev
Lucid Certified
Apache Lucene/Solr Developer
Grid Dynamics

http://www.griddynamics.com
 mkhlud...@griddynamics.com


Re: Delta-Import adding duplicate entry.

2012-02-28 Thread Ahmet Arslan
 but every time when i am executing delta-import through DIH
 it picked only
 changed data, that is ok, but rather than updating it's adding
 duplicate
 records. 

Do you have <uniqueKey>...</uniqueKey> defined in your schema.xml?
http://wiki.apache.org/solr/UniqueKey
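For reference, the relevant schema.xml pieces look like this (the field name `id` is just an example — use whatever uniquely identifies your records):

```xml
<!-- the field must be indexed and required -->
<field name="id" type="string" indexed="true" stored="true" required="true"/>

<!-- documents with the same uniqueKey value overwrite each other on re-import -->
<uniqueKey>id</uniqueKey>
```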


Re: Delta-Import adding duplicate entry.

2012-02-28 Thread Suneel
I have made a unique key in schema.xml and now it's working for me.

Thanks a lot



Regards,

-
Suneel Pandey
Sr. Software Developer
--
View this message in context: 
http://lucene.472066.n3.nabble.com/Delta-Import-adding-duplicate-entry-tp3783114p3783550.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: SolrCloud on Trunk

2012-02-28 Thread Andre Bois-Crettez

Consistent hashing seems like a solution to reduce the shuffling of keys
when adding/deleting shards:
http://www.tomkleinpeter.com/2008/03/17/programmers-toolbox-part-3-consistent-hashing/

Twitter describes a more flexible sharding scheme in the section Gizzard handles
partitioning through a forwarding table:
https://github.com/twitter/gizzard
An explicit mapping would make it possible to take advantage of heterogeneous
servers, and still allow for reduced shuffling of documents when
expanding/reducing the cluster.
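A toy sketch of the consistent-hashing idea (illustrative only, not SolrCloud code): each shard gets several virtual points on a ring, and a document routes to the first shard point at or after its own hash, so adding a shard only moves the keys falling into the new arcs.

```java
import java.util.SortedMap;
import java.util.TreeMap;

// Minimal consistent-hash ring sketch. VNODES virtual points per shard
// smooth out the key distribution across heterogeneous shards.
class HashRing {
    private static final int VNODES = 64;
    private final TreeMap<Integer, String> ring = new TreeMap<>();

    void addShard(String shard) {
        for (int i = 0; i < VNODES; i++) {
            ring.put(hash(shard + "#" + i), shard);
        }
    }

    String shardFor(String docId) {
        // first ring point at or after the doc's hash, wrapping around
        SortedMap<Integer, String> tail = ring.tailMap(hash(docId));
        return tail.isEmpty() ? ring.firstEntry().getValue() : tail.get(tail.firstKey());
    }

    private static int hash(String s) {
        // any well-mixed integer hash works for the sketch
        int h = s.hashCode();
        h ^= (h >>> 16);
        return h * 0x45d9f3b;
    }
}
```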

Are there any ideas or progress in this direction, be it in a branch or
in JIRA issues?


Andre


Jamie Johnson wrote:

The case is actually anytime you need to add another shard.  With the
current implementation if you need to add a new shard the current
hashing approach breaks down.  Even with many small shards I think you
still have this issue when you're adding/updating/deleting docs.  I'm
definitely interested in hearing other approaches that would work
though if there are any.

On Sat, Jan 28, 2012 at 7:53 PM, Lance Norskog goks...@gmail.com wrote:


If this is to do load balancing, the usual solution is to use many
small shards, so you can just move one or two without doing any
surgery on indexes.

On Sat, Jan 28, 2012 at 2:46 PM, Yonik Seeley
yo...@lucidimagination.com wrote:


On Sat, Jan 28, 2012 at 3:45 PM, Jamie Johnson jej2...@gmail.com wrote:


Second question, I know there are discussion about storing the shard
assignments in ZK (i.e. shard 1 is responsible for hashed values
between 0 and 10, shard 2 is responsible for hashed values between 11
and 20, etc), this isn't done yet right?  So currently the hashing is
based on the number of shards instead of having the assignments being
calculated the first time you start the cluster (i.e. based on
numShards) so it could be adjusted later, right?


Right.  Storing the hash range for each shard/node is something we'll
need in order to dynamically change the number of shards (as opposed to
replicas), so we'll need to start doing it sooner or later.

-Yonik
http://www.lucidimagination.com



--
Lance Norskog
goks...@gmail.com






--
André Bois-Crettez

Search technology, Kelkoo
http://www.kelkoo.com/


Kelkoo SAS
Société par Actions Simplifiée
Au capital de € 4.168.964,30
Siège social : 8, rue du Sentier 75002 Paris
425 093 069 RCS Paris

This message and its attachments are confidential and intended solely for 
their addressees. If you are not the intended recipient, please delete this 
message and notify the sender.


solr returns reduced results for same query after adding a new field to the schema.

2012-02-28 Thread Mark Swinson
Hi,

I'm currently setting up a schema in solr, which is being imported using
the data-import plugin.

The initial config contains the following key information:

...

<fieldType name="standardTextType" class="solr.TextField"
           positionIncrementGap="100" stored="false" multiValued="false">
  <analyzer type="index">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>

<fieldType name="ingredientSuggestionType" class="solr.TextField"
           positionIncrementGap="100" stored="false" multiValued="true">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>

<fieldType name="chefSuggestionType" class="solr.TextField"
           positionIncrementGap="100" stored="false" multiValued="false">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="3" maxGramSize="15" side="front"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>

<fieldType name="programmeSuggestionType" class="solr.TextField"
           positionIncrementGap="100" stored="false" multiValued="false">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="3" maxGramSize="15" side="front"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>
...

with the following fields:

...
<fields>
  <field name="recipeId" type="standardTextType" indexed="true" stored="true" required="true"/>
  <field name="programmeName" type="standardTextType" indexed="true" stored="true" required="true"/>
  <field name="programmeSuggestion" type="programmeSuggestionType" stored="true"/>
  <field name="chefName" type="standardTextType" indexed="true" stored="true" required="true"/>
  <field name="chefSuggestion" type="chefSuggestionType" stored="true"/>
  <field name="ingredientText" type="standardTextType" indexed="true" stored="true" required="true"/>
  <field name="ingredientSuggestion" type="ingredientSuggestionType" stored="true"/>

and
...
<copyField source="programmeName" dest="programmeSuggestion"/>
<copyField source="chefName" dest="chefSuggestion"/>
<copyField source="ingredientText" dest="ingredientSuggestion"/>
<uniqueKey>recipeId</uniqueKey>
 

Re: solr returns reduced results for same query after adding a new field to the schema.

2012-02-28 Thread Dmitry Kan
Hi,

you mean your query is:

?q=ingredientSuggestion:banana

right?

On Tue, Feb 28, 2012 at 2:29 PM, Mark Swinson mark.swin...@bbc.co.uk wrote:

 Hi,

 I'm currently setting up a schema in solr, which is being imported using
 the data-import plugin.

 The initial config contains the following key information:

...


RE: solr returns reduced results for same query after adding a new field to the schema.

2012-02-28 Thread Mark Swinson
yes, sorry.

-Original Message-
From: Dmitry Kan [mailto:dmitry@gmail.com] 
Sent: 28 February 2012 12:33
To: solr-user@lucene.apache.org
Subject: Re: solr returns reduced results for same query after adding a new 
field to the schema.

Hi,

you mean your query is:

?q=ingredientSuggestion:banana

right?

On Tue, Feb 28, 2012 at 2:29 PM, Mark Swinson mark.swin...@bbc.co.uk wrote:

 Hi,

 I'm currently setting up a schema in solr, which is being imported using
 the data-import plugin.

 The initial config contains the following key information:

...


Re: Index empty after restart.

2012-02-28 Thread Erick Erickson
What did you do that makes you expect there to be data? Index stuff? How?

Did you commit after you were done?

Best
Erick

On Mon, Feb 27, 2012 at 11:40 AM, zarni aung zau...@gmail.com wrote:
 Check in the data directory to make sure the index files are present.  If so, you
 just need to load the cores again.

 On Mon, Feb 27, 2012 at 11:30 AM, Wouter de Boer 
 wouter.de.b...@springest.nl wrote:

 Hi,

 I run Solr on Jetty. After a restart of Jetty, the indices are empty.
 Does anyone
 have an idea what the reason could be?

 Regards,
 Wouter.

 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/Index-empty-after-restart-tp3781237p3781237.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Re: maxClauseCount Exception

2012-02-28 Thread Vadim Kisselmann
Set maxBooleanClauses higher in your solrconfig.xml; the default is 1024.
Your query blasts past this limit.
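For reference, the setting lives in solrconfig.xml; the new limit below is just an example, size it to your queries. Note that the stack trace below shows the highlighter rewriting a multi-term query into a BooleanQuery, which is how only 206 documents can still expand past the limit:

```xml
<!-- solrconfig.xml: raise the BooleanQuery clause limit from its 1024 default -->
<maxBooleanClauses>4096</maxBooleanClauses>
```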
Regards
Vadim



2012/2/22 Darren Govoni dar...@ontrenet.com

 Hi,
  I am suddenly getting a maxClauseCount exception for no reason. I am
 using Solr 3.5. I have only 206 documents in my index.

 Any ideas? This is weird.

 QUERY PARAMS: [hl, hl.snippets, hl.simple.pre, hl.simple.post, fl,
 hl.mergeContiguous, hl.usePhraseHighlighter, hl.requireFieldMatch,
 echoParams, hl.fl, q, rows, start]|#]


 [#|2012-02-22T13:40:13.129-0500|INFO|glassfish3.1.1|
 org.apache.solr.core.SolrCore|_ThreadID=22;_ThreadName=Thread-2;|[]
 webapp=/solr3 path=/select
 params={hl=true&hl.snippets=4&hl.simple.pre=<b>&hl.simple.post=</b>&fl=*,score&hl.mergeContiguous=true&hl.usePhraseHighlighter=true&hl.requireFieldMatch=true&echoParams=all&hl.fl=text_t&q={!lucene+q.op%3DOR+df%3Dtext_t}+(+kind_s:doc+OR+kind_s:xml)+AND+(type_s:[*+TO+*])+AND+(usergroup_sm:admin)&rows=20&start=0&wt=javabin&version=2}
 hits=204 status=500 QTime=166 |#]


 [#|2012-02-22T13:40:13.131-0500|SEVERE|glassfish3.1.1|
 org.apache.solr.servlet.SolrDispatchFilter|
 _ThreadID=22;_ThreadName=Thread-2;|org.apache.lucene.search.BooleanQuery
 $TooManyClauses: maxClauseCount is set to 1024
at org.apache.lucene.search.BooleanQuery.add(BooleanQuery.java:136)
at org.apache.lucene.search.BooleanQuery.add(BooleanQuery.java:127)
at org.apache.lucene.search.ScoringRewrite
 $1.addClause(ScoringRewrite.java:51)
at org.apache.lucene.search.ScoringRewrite
 $1.addClause(ScoringRewrite.java:41)
at org.apache.lucene.search.ScoringRewrite
 $3.collect(ScoringRewrite.java:95)
at

 org.apache.lucene.search.TermCollectingRewrite.collectTerms(TermCollectingRewrite.java:38)
at
 org.apache.lucene.search.ScoringRewrite.rewrite(ScoringRewrite.java:93)
at
 org.apache.lucene.search.MultiTermQuery.rewrite(MultiTermQuery.java:304)
at

 org.apache.lucene.search.highlight.WeightedSpanTermExtractor.extract(WeightedSpanTermExtractor.java:158)
at

 org.apache.lucene.search.highlight.WeightedSpanTermExtractor.extract(WeightedSpanTermExtractor.java:98)
at

 org.apache.lucene.search.highlight.WeightedSpanTermExtractor.getWeightedSpanTerms(WeightedSpanTermExtractor.java:385)
at

 org.apache.lucene.search.highlight.QueryScorer.initExtractor(QueryScorer.java:217)
at
 org.apache.lucene.search.highlight.QueryScorer.init(QueryScorer.java:185)
at

 org.apache.lucene.search.highlight.Highlighter.getBestTextFragments(Highlighter.java:205)
at

 org.apache.solr.highlight.DefaultSolrHighlighter.doHighlightingByHighlighter(DefaultSolrHighlighter.java:490)
at

 org.apache.solr.highlight.DefaultSolrHighlighter.doHighlighting(DefaultSolrHighlighter.java:401)
at

 org.apache.solr.handler.component.HighlightComponent.process(HighlightComponent.java:131)
at org.apache.so




Re: SolrCloud on Trunk

2012-02-28 Thread Jamie Johnson
Very interesting Andre.  I believe this is in line with the larger
vision: specifically, you'd use the hashing algorithm to create the
initial splits in the forwarding table, and if you needed to add a
new shard you'd split/merge an existing range.  I think
creating the algorithm is probably the easier part (maybe I'm wrong?);
the harder part to me appears to be splitting the index based on the
new ranges and then moving that split to a new core.  I'm aware of the
index splitter contrib which could be used for this, but I am unaware
of where specifically this is on the roadmap for SolrCloud.  Anyone
else have those details?
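A tiny sketch of the forwarding-table idea under discussion (illustrative only, not SolrCloud code): an explicit range-to-shard table, of the kind that could live in ZooKeeper, consulted at routing time and rewritten when a range is split across two shards.

```java
import java.util.TreeMap;

// Explicit hash-range -> shard table. Each entry maps the inclusive
// lower bound of a range to the shard that owns it; a document routes
// to the entry with the greatest lower bound <= its hash.
class ForwardingTable {
    private final TreeMap<Integer, String> ranges = new TreeMap<>();

    void assign(int lowerBound, String shard) {
        ranges.put(lowerBound, shard);
    }

    String route(int hash) {
        return ranges.floorEntry(hash).getValue();
    }
}
```

Splitting is then a table edit: assigning the upper half of an existing range to a new shard changes routing for only that slice of keys, which is exactly the part that still needs the index to be physically split and moved.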

On Tue, Feb 28, 2012 at 5:40 AM, Andre Bois-Crettez
andre.b...@kelkoo.com wrote:
 Consistent hashing seem like a solution to reduce the shuffling of keys
 when adding/deleting shards :
 http://www.tomkleinpeter.com/2008/03/17/programmers-toolbox-part-3-consistent-hashing/

 Twitter describe a more flexible sharding in section Gizzard handles
 partitioning through a forwarding table
 https://github.com/twitter/gizzard
 An explicit mapping could allow to take advantage of heterogeneous
 servers, and still allow for reduced shuffling of document when
 expanding/reducing the cluster.

 Are there any ideas or progress in this direction, be it in a branch or
 in JIRA issues ?


 Andre



 Jamie Johnson wrote:

 The case is actually anytime you need to add another shard.  With the
 current implementation if you need to add a new shard the current
 hashing approach breaks down.  Even with many small shards I think you
 still have this issue when you're adding/updating/deleting docs.  I'm
 definitely interested in hearing other approaches that would work
 though if there are any.

 On Sat, Jan 28, 2012 at 7:53 PM, Lance Norskog goks...@gmail.com wrote:

 If this is to do load balancing, the usual solution is to use many
 small shards, so you can just move one or two without doing any
 surgery on indexes.

 On Sat, Jan 28, 2012 at 2:46 PM, Yonik Seeley
 yo...@lucidimagination.com wrote:

 On Sat, Jan 28, 2012 at 3:45 PM, Jamie Johnson jej2...@gmail.com
 wrote:

 Second question, I know there are discussion about storing the shard
 assignments in ZK (i.e. shard 1 is responsible for hashed values
 between 0 and 10, shard 2 is responsible for hashed values between 11
 and 20, etc), this isn't done yet right?  So currently the hashing is
 based on the number of shards instead of having the assignments being
 calculated the first time you start the cluster (i.e. based on
 numShards) so it could be adjusted later, right?

 Right.  Storing the hash range for each shard/node is something we'll
 need in order to dynamically change the number of shards (as opposed to
 replicas), so we'll need to start doing it sooner or later.

 -Yonik
 http://www.lucidimagination.com


 --
 Lance Norskog
 goks...@gmail.com




 --
 André Bois-Crettez

 Search technology, Kelkoo
 http://www.kelkoo.com/




Re: Is there a way to implement a IntRangeField in Solr?

2012-02-28 Thread federico.wachs
Hi Mikhail, thanks for your concern and reply.

I've read your reply a few dozen times and I think I get what you mean, but
I'm not exactly sure how to go forward with your approach. You are saying
that I should be able to have nested documents, but I haven't been able to
submit a Document with another Document on it so far.

I'm using SolrJ to integrate with my Solr servers, do you think you could
guide me a bit on how you would accomplish nesting two different kinds of
documents?

Thank you for your time and explanation I really appreciate it!

Regards,
Federico

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Is-there-a-way-to-implement-a-IntRangeField-in-Solr-tp3782083p3784220.html
Sent from the Solr - User mailing list archive at Nabble.com.


RE: sun-java6 alternatives for Solr 3.5

2012-02-28 Thread Chantal Ackermann
You can download Oracle's Java (which was Sun's) from Oracle directly.
You will have to create an account with them. You can use the same
account for reading the java forum and downloading other software like
their famous DB.

Simply download. JDK6 is still a binary as were all Sun packages before.
Do a chmod +x and run it. You have to accept the license, and then it
unpacks itself in that same directory - no root privileges required.

As of JDK 7 you can download tar.gz packages.

http://www.oracle.com/technetwork/java/javase/downloads/index.html

Actually, you're better off downloading and installing it yourself
because you can have several different versions in parallel and the
automatic updates do not override your installed version. That comes in
handy if you are a Java developer, at least...

Cheers,
Chantal


On Mon, 2012-02-27 at 21:38 +0100, Demian Katz wrote:
 For what it's worth, I run Solr 3.5 on Ubuntu using the OpenJDK packages and 
 I haven't run into any problems.  I do realize that sometimes the Sun JDK has 
 features that are missing from other Java implementations, but so far it 
 hasn't affected my use of Solr.
 
 - Demian
 
  -Original Message-
  From: ku3ia [mailto:dem...@gmail.com]
  Sent: Monday, February 27, 2012 2:25 PM
  To: solr-user@lucene.apache.org
  Subject: sun-java6 alternatives for Solr 3.5
  
  Hi all!
  I had installed an Ubuntu 10.04 LTS. I had added a 'partner' repository to
  my sources list and updated it, but I can't find a package sun-java6-*:
  root@ubuntu:~# apt-cache search java6
  default-jdk - Standard Java or Java compatible Development Kit
  default-jre - Standard Java or Java compatible Runtime
  default-jre-headless - Standard Java or Java compatible Runtime (headless)
  openjdk-6-jdk - OpenJDK Development Kit (JDK)
  openjdk-6-jre - OpenJDK Java runtime, using Hotspot JIT
  openjdk-6-jre-headless - OpenJDK Java runtime, using Hotspot JIT (headless)
  
  Then I googled and found an article:
  https://lists.ubuntu.com/archives/ubuntu-security-announce/2011-
  December/001528.html
  
  I'm using Solr 3.5 and Apache Tomcat 6.0.32.
  Please advise me what I must do in this situation, because I always used
  sun-java6-* packages for Tomcat and Solr and they worked fine
  Thanks!
  
  --
  View this message in context: http://lucene.472066.n3.nabble.com/sun-java6-
  alternatives-for-Solr-3-5-tp3781792p3781792.html
  Sent from the Solr - User mailing list archive at Nabble.com.



RE: sun-java6 alternatives for Solr 3.5

2012-02-28 Thread ku3ia
Hi. Thanks for your responses. Yesterday I tried the openjdk-6-jre package from
the Ubuntu 10.04 LTS repos. I'll monitor the situation, but it seems it works! (c)

--
View this message in context: 
http://lucene.472066.n3.nabble.com/sun-java6-alternatives-for-Solr-3-5-tp3781792p3784278.html
Sent from the Solr - User mailing list archive at Nabble.com.


Java6 End of Life, upgrading to 7

2012-02-28 Thread Shawn Heisey
Due to the End of Life announcement for Java6, I am going to need to 
upgrade to Java 7 in the very near future.  I'm running Solr 3.5.0 
modified with a couple of JIRA patches.


https://blogs.oracle.com/henrik/entry/updated_java_6_eol_date

I saw the announcement that Java 7u1 had fixed all the known bugs 
relating to Solr.  Is there anything I need to be aware of when 
upgrading?  These are the commandline switches I am using that apply to 
Java itself:


-Xms8192M
-Xmx8192M
-XX:NewSize=6144M
-XX:SurvivorRatio=4
-XX:+UseParNewGC
-XX:+UseConcMarkSweepGC
-XX:+CMSParallelRemarkEnabled

Thanks,
Shawn



Re: Unique key constraint and optimistic locking (versioning)

2012-02-28 Thread Per Steffensen
Created SOLR-3173 on the part about making insert fail if a document (with 
the same uniqueKey) already exists. SOLR-3173 also includes making update 
not insert the document if it does not already exist - just for consistency with 
normal RDBMS behaviour. So basically the feature allows you to turn on 
this behaviour of having database (RDBMS) semantics, and when you do 
you get both.
Tomorrow I will create another Jira issue on the versioning/optimistic 
locking part.


Per Steffensen wrote:

Hi

Does solr/lucene provide any mechanism for unique key constraint and 
optimistic locking (versioning)?
Unique key constraint: That a client will not succeed creating a new 
document in solr/lucene if a document already exists having the same 
value in some field (e.g. an id field). Of course implemented right, 
so that even though two or more threads are concurrently trying to 
create a new document with the same value in this field, only one of 
them will succeed.
Optimistic locking (versioning): That a client will only succeed 
updating a document if this updated document is based on the version 
of the document currently stored in solr/lucene. Implemented in the 
optimistic way that clients during an update have to tell which 
version of the document they fetched from Solr and that they therefore 
have used as a starting-point for their updated document. So basically 
having a version field on the document that clients increase by one 
before sending to solr for update, and some code in Solr that only 
makes the update succeed if the version number of the updated document 
is exactly one higher than the version number of the document already 
stored. Of course again implemented right, so that even though two or 
more threads are concurrently trying to update a document, and they all 
have their updated document based on the current version in 
solr/lucene, only one of them will succeed.


Or do I have to do stuff like this myself outside solr/lucene - e.g. 
in the client using solr.


Regards, Per Steffensen
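The unique-key and optimistic-locking semantics described above can be sketched with a tiny in-memory model. This is hypothetical illustration code, not Solr code - the class and method names are invented, and it only shows the accept/reject rules the clients and server would have to agree on:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical model of unique-key constraint + optimistic locking.
public class VersionedStore {
    // maps uniqueKey -> current version of the stored document
    private final Map<String, Long> versions = new ConcurrentHashMap<>();

    // Insert fails if a document with the same uniqueKey already exists;
    // putIfAbsent guarantees only one concurrent creator wins.
    public boolean insert(String key) {
        return versions.putIfAbsent(key, 1L) == null;
    }

    // Update succeeds only if the client's version is exactly one higher
    // than the stored version (i.e. the client based its edit on the
    // currently stored document). replace() makes the check-and-set atomic.
    public boolean update(String key, long clientVersion) {
        Long current = versions.get(key);
        if (current == null || clientVersion != current + 1) {
            return false; // missing doc or stale update is rejected
        }
        return versions.replace(key, current, clientVersion);
    }

    public static void main(String[] args) {
        VersionedStore store = new VersionedStore();
        System.out.println(store.insert("doc1"));      // first insert wins
        System.out.println(store.insert("doc1"));      // duplicate key fails
        System.out.println(store.update("doc1", 2L));  // version 1 -> 2 ok
        System.out.println(store.update("doc1", 2L));  // stale version fails
    }
}
```

In Solr itself the check would of course have to happen server-side while indexing; the sketch just illustrates the version arithmetic.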





SOLR-3159 and 3x - Upgrading Jetty

2012-02-28 Thread Shawn Heisey
SOLR-3159 will bring Jetty 8 to Solr trunk.  It mentions that the JDK is 
required for JSP under Jetty 8 (and getting rid of JSP in Solr), which 
probably means that Apache can't put it into branch_3x.  My systems use 
the JDK, so I would not expect that to be a problem for me.  Solr is not 
directly accessible to the outside world, so I am not worried about any 
security implications associated with having the JDK installed.  I'm 
running 3.5.0 with a couple of patches installed.


I downloaded the Jetty 8 distribution and noticed that it has a lot more 
files in etc/ than the Solr example, which just has jetty.xml.  I am 
using a setup based on the Solr example.  Would I need all these new 
config files, or is the jetty.xml that I already have enough?


Thanks,
Shawn



Re: Inconsistent Results with ZooKeeper Ensemble and Four SOLR Cloud Nodes

2012-02-28 Thread Matthew Parker
Mark,

I got the codebase from 2/26/2012, and I got the same inconsistent
results.

I have solr running on four ports 8081-8084

8081 and 8082 are the leaders for shard 1, and shard 2, respectively

8083 - is assigned to shard 1
8084 - is assigned to shard 2

queries come in, and sometimes it seems the console windows for 8081 and 8083
react as if responding to the query, but there are no results.

if the queries run on 8081/8082 or 8081/8084 then results come back ok.

The query is nothing more than: q=*:*

Regards,

Matt


On Mon, Feb 27, 2012 at 9:26 PM, Matthew Parker 
mpar...@apogeeintegration.com wrote:

 I'll have to check on the commit situation. We have been pushing data from
 SharePoint the last week or so. Would that somehow block the documents
 moving between the solr instances?

 I'll try another version tomorrow. Thanks for the suggestions.

 On Mon, Feb 27, 2012 at 5:34 PM, Mark Miller markrmil...@gmail.com wrote:

 Hmmm...all of that looks pretty normal...

 Did a commit somehow fail on the other machine? When you view the stats
 for the update handler, are there a lot of pending adds for one of the
 nodes? Do the commit counts match across nodes?

 You can also query an individual node with distrib=false to check that.

 If your build is a month old, I'd honestly recommend you try upgrading as
 well.

 - Mark

 On Feb 27, 2012, at 3:34 PM, Matthew Parker wrote:

  Here is most of the cluster state:
 
  Connected to Zookeeper
  localhost:2181, localhost: 2182, localhost:2183
 
  /(v=0 children=7) 
/CONFIGS(v=0, children=1)
   /CONFIGURATION(v=0 children=25)
   all the configuration files, velocity info, xslt, etc.
 
   /NODE_STATES(v=0 children=4)
  MACHINE1:8083_SOLR (v=121)[{shard_id:shard1,
  state:active,core:,collection:collection1,node_name:...
  MACHINE1:8082_SOLR (v=101)[{shard_id:shard2,
  state:active,core:,collection:collection1,node_name:...
  MACHINE1:8081_SOLR (v=92)[{shard_id:shard1,
  state:active,core:,collection:collection1,node_name:...
  MACHINE1:8084_SOLR (v=73)[{shard_id:shard2,
  state:active,core:,collection:collection1,node_name:...
   /ZOOKEEPER (v=0 children=1)
  QUOTA(v=0)
 
 
 /CLUSTERSTATE.JSON(V=272){collection1:{shard1:{MACHINE1:8081_solr_:{shard_id:shard1,leader:true,...
   /LIVE_NODES (v=0 children=4)
  MACHINE1:8083_SOLR(ephemeral v=0)
  MACHINE1:8082_SOLR(ephemeral v=0)
  MACHINE1:8081_SOLR(ephemeral v=0)
  MACHINE1:8084_SOLR(ephemeral v=0)
   /COLLECTIONS (v=1 children=1)
  COLLECTION1(v=0 children=2){configName:configuration1}
  LEADER_ELECT(v=0 children=2)
  SHARD1(V=0 children=1)
  ELECTION(v=0 children=2)
 
  87186203314552835-MACHINE1:8081_SOLR_-N_96(ephemeral v=0)
 
  87186203314552836-MACHINE1:8083_SOLR_-N_84(ephemeral v=0)
  SHARD2(v=0 children=1)
  ELECTION(v=0 children=2)
 
  231301391392833539-MACHINE1:8084_SOLR_-N_85(ephemeral v=0)
 
  159243797356740611-MACHINE1:8082_SOLR_-N_84(ephemeral v=0)
  LEADERS (v=0 children=2)
  SHARD1 (ephemeral
  v=0){core:,node_name:MACHINE1:8081_solr,base_url:
  http://MACHINE1:8081/solr};
  SHARD2 (ephemeral
  v=0){core:,node_name:MACHINE1:8082_solr,base_url:
  http://MACHINE1:8082/solr};
   /OVERSEER_ELECT (v=0 children=2)
  ELECTION (v=0 children=4)
  231301391392833539-MACHINE1:8084_SOLR_-N_000251(ephemeral
 v=0)
  87186203314552835-MACHINE1:8081_SOLR_-N_000248(ephemeral
 v=0)
  159243797356740611-MACHINE1:8082_SOLR_-N_000250(ephemeral
 v=0)
  87186203314552836-MACHINE1:8083_SOLR_-N_000249(ephemeral
 v=0)
  LEADER (ephemeral
  v=0){id:87186203314552835-MACHINE1:8081_solr-n_00248}
 
 
 
  On Mon, Feb 27, 2012 at 2:47 PM, Mark Miller markrmil...@gmail.com
 wrote:
 
 
  On Feb 27, 2012, at 2:22 PM, Matthew Parker wrote:
 
  Thanks for your reply Mark.
 
  I believe the build was towards the beginning of the month. The
  solr.spec.version is 4.0.0.2012.01.10.38.09
 
  I cannot access the clusterstate.json contents. I clicked on it a
 couple
  of
  times, but nothing happens. Is that stored on disk somewhere?
 
  Are you using the new admin UI? That has recently been updated to work
  better with cloud - it had some troubles not too long ago. If you are,
 you
 should try using the old admin UI's zookeeper page - that should
 show
  the cluster state.
 
  That being said, there has been a lot of bug fixes over the past month
 -
  so you may just want to update to a recent version.
 
 
  I configured a custom request handler to calculate a unique document
 id
  based on the file's url.
 
  On Mon, Feb 27, 2012 at 1:13 PM, Mark Miller markrmil...@gmail.com
  wrote:
 
  Hey Matt - is your build recent?
 
  Can you visit the cloud/zookeeper page in the admin and send the
  contents
  of the clusterstate.json node?
 
  Are you using a custom index chain or anything out of the ordinary?

Using LocalParams in StatsComponent to create a price slider?

2012-02-28 Thread Ted Strauss
Hi Mark, Just wondering if you ever resolved your issue here and/or if you
submitted a bug/improvement on Lucene
(https://issues.apache.org/jira/browse/SOLR#selectedTab=com.atlassian.jira.plugin.system.project%3Aissues-panel)?
If not, I'll submit it.
Cheers
ts


Re: Is there a way to implement a IntRangeField in Solr?

2012-02-28 Thread Mikhail Khludnev
On Tue, Feb 28, 2012 at 6:44 PM, federico.wachs
federico.wa...@2clams.com wrote:

 Hi Mikhail, thanks for your concern and reply.

 I've read a few dozen times your reply and I think I get what you mean, but
 I'm not exactly sure how to go forward with your approach. You are saying
 that I should be able to have nested documents, but I haven't been able to
 submit a Document with another Document on it so far.

 I'm using SolrJ to integrate with my Solr servers, do you think you could
 guide me a bit on how you would accomplish to nest two different kinds of
 documents?


Ok, start from Lease documents with ApartmentID, and then query leases and
specify group.field=ApartmentID&group=true

Disclaimer: I've never done grouping and I am not really familiar with
SolrJ,
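For reference, the grouped request suggested above might look roughly like this over HTTP. The host, core and field names are assumptions taken from this thread, and the parameters are a sketch, not verified against this setup:

```
http://localhost:8983/solr/select?q=*:*&group=true&group.field=ApartmentID&group.limit=10
```

Result grouping returns one group per distinct ApartmentID, with up to group.limit lease documents inside each group.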



 Thank you for your time and explanation I really appreciate it!

 Regards,
 Federico

 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/Is-there-a-way-to-implement-a-IntRangeField-in-Solr-tp3782083p3784220.html
 Sent from the Solr - User mailing list archive at Nabble.com.




-- 
Sincerely yours
Mikhail Khludnev
Lucid Certified
Apache Lucene/Solr Developer
Grid Dynamics

http://www.griddynamics.com
 mkhlud...@griddynamics.com


Re: indexing but not able to search

2012-02-28 Thread somer81
Hello, did you solve this problem? I am a newbie with Solr. 
I just tried to add a simple XML file to Solr. It works fine - 
it adds and indexes - but when I try to search I cannot see the 
doc fields which I described in the XML. I see them in the schema browser; they 
look like they are indexed. 
I even tried to give the same field names as in the original schema.xml. 
I have the field names: name, description, coord - instead of store. 

But it still doesn't show results. 
Do you have any idea please? 

Omer 
sevinc.o...@gmail.com 

--
View this message in context: 
http://lucene.472066.n3.nabble.com/indexing-but-not-able-to-search-tp3144695p3784974.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: search.highlight.InvalidTokenOffsetsException in Solr 3.5

2012-02-28 Thread sharadgaur
I am also facing same problem do you have any update on it. I am using
Solr 3.5 and getting same error...

Feb 28, 2012 1:40:44 PM org.apache.solr.common.SolrException log
SEVERE: org.apache.solr.common.SolrException:
org.apache.lucene.search.highlight.InvalidTokenOffsetsException: Token to
exceeds length of provided text sized 11503
at
org.apache.solr.highlight.DefaultSolrHighlighter.doHighlightingByHighlighter(DefaultSolrHighlighter.java:497)
at
org.apache.solr.highlight.DefaultSolrHighlighter.doHighlighting(DefaultSolrHighlighter.java:401)
at
org.apache.solr.handler.component.HighlightComponent.process(HighlightComponent.java:131)
at
org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:194)
at
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1372)
at
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:356)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:252)
at
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
at
org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
at
org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
at
org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
at
org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128)
at
org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
at
org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
at
org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:567)
at
org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:293)
at
org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:849)
at
org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:583)
at
org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:454)
at java.lang.Thread.run(Thread.java:619)
Caused by: org.apache.lucene.search.highlight.InvalidTokenOffsetsException:
Token to exceeds length of provided text sized 11503
at
org.apache.lucene.search.highlight.Highlighter.getBestTextFragments(Highlighter.java:233)
at
org.apache.solr.highlight.DefaultSolrHighlighter.doHighlightingByHighlighter(DefaultSolrHighlighter.java:490)
... 20 more


--
View this message in context: 
http://lucene.472066.n3.nabble.com/search-highlight-InvalidTokenOffsetsException-in-Solr-3-5-tp3560997p3785157.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: search.highlight.InvalidTokenOffsetsException in Solr 3.5

2012-02-28 Thread Marian Steinbach
Unfortunately I don't have any news on that. I disabled highlighting on the
text field (sadly).

Have you tracked down which field causes the problem? Can you tell which
filters you are applying to the according field type?

Marian


Re: search.highlight.InvalidTokenOffsetsException in Solr 3.5

2012-02-28 Thread Ahmet Arslan
 Unfortunately I don't have any news
 on that. I disabled highlighting on the
 text field (sadly).
 
 Have you tracked down which field causes the problem? Can
 you tell which
 filters you are applying to the according field type?

Are you using HTMLStripCharFilter ? If yes this could be :
https://issues.apache.org/jira/browse/LUCENE-3690


Re: search.highlight.InvalidTokenOffsetsException in Solr 3.5

2012-02-28 Thread Marian Steinbach
On 28 February 2012 at 21:14, Ahmet Arslan iori...@yahoo.com wrote:


 Are you using HTMLStripCharFilter ? If yes this could be :
 https://issues.apache.org/jira/browse/LUCENE-3690



Not sure whether that question was directed at me, but I am not
using HTMLStripCharFilter but some other pattern replacements which modify
character positions, probably in the same manner as HTMLStripCharFilter
does.


Re: Speeding up indexing

2012-02-28 Thread Erik Hatcher
30 million - that's feasible on a single (beefy) Solr server but whether 
it's advisable to go distributed or not depends on other factors, like query 
speed issues you may have with that many docs in a single server, expected 
collection growth, and so on.

As for your questions further below -

1. Sending multiple docs into Solr definitely can help improve indexing 
throughput, up to the limits of what your environment can handle of course 
(there are many variables here, how many connections can your server handle at 
once, how much effort/memory is involved in indexing your documents parsing and 
analysis-wise, etc)

2. There are Solr configuration tweaks for sure (see Solr's example 
solrconfig.xml for details) that affect indexing performance, but it all 
depends on the bottlenecks in determining whether any of those settings would 
be an improvement or a detriment.

3. If you're going to index in parallel, which of course is architecturally 
possible, then you're basically setting it up for distributed search.  It's 
possible to merge indexes (on the same server) but for your particular case 
that doesn't seem like an architectural recommendation I'd make.  

I'd stick to a single server and see if/where that has issues, parallelize your 
indexing, and consider Solr's distributed search as needed from there.

Erik
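The crude parallelization described in points 1 and 3 - N workers, each pushing its own batch of documents - can be sketched as below. The actual Solr call is stubbed out (a real client, e.g. SolrJ, would replace indexBatch; all names here are illustrative):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class ParallelIndexer {
    static final AtomicInteger indexed = new AtomicInteger();

    // Stub standing in for a real client call such as solrServer.add(batch).
    static void indexBatch(List<String> batch) {
        indexed.addAndGet(batch.size());
    }

    public static void main(String[] args) throws Exception {
        List<String> docs = new ArrayList<>();
        for (int i = 0; i < 10000; i++) docs.add("doc-" + i);

        int threads = 4, batchSize = 500;
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        // Split the corpus into batches and submit each as an independent
        // task; the pool keeps `threads` indexers running concurrently.
        for (int start = 0; start < docs.size(); start += batchSize) {
            final List<String> batch =
                docs.subList(start, Math.min(start + batchSize, docs.size()));
            pool.submit(() -> indexBatch(batch));
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
        System.out.println("indexed " + indexed.get() + " docs");
    }
}
```

Batching also matters: sending 500 documents per add request is usually far cheaper than 500 single-document requests.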



On Feb 27, 2012, at 14:36 , Memory Makers wrote:

 A quick add on to this -- we have over 30 million documents.
 
 I take it that we should be looking @ Distributed Solr?
  as in
 http://www.lucidimagination.com/content/scaling-lucene-and-solr#d0e344
 
 Thanks.
 
 On Mon, Feb 27, 2012 at 2:33 PM, Memory Makers memmakers...@gmail.com wrote:
 
 Many thanks for the response.
 
 Here is the revised questions:
 
 For example if I have N processes that are producing documents to index:
 1. Should I have them simultaneously submit documents to Solr (will this
 improve the indexing throughput)?
 2. Is there anything I can do Solr configuration wise that will allow me
 to speed up indexing
 3. Is there an architecture where I can have two (or more) solr server do
 indexing in parallel
 
 Thanks.
 
 On Mon, Feb 27, 2012 at 1:46 PM, Erik Hatcher erik.hatc...@gmail.com wrote:
 
 Yes, absolutely.  Parallelizing indexing can make a huge difference.  How
 you do so will depend on your indexing environment.  Most crudely, running
 multiple indexing scripts on different subsets of data up to the
 limitations of your operating system and hardware is how many do it.
 SolrJ has some multithreaded facility, as does DataImportHandler.
 Distributing the indexing to multiple machines, but pointing all to the
 same Solr server, is effectively the same as multi-threading it - push
 documents into Solr from wherever, as fast as it can handle them. This is
 definitely how many do this.
 
   Erik
 
 On Feb 27, 2012, at 13:24 , Memory Makers wrote:
 
 Hi,
 
 Is there a way to speed up indexing by increasing the number of threads
 doing the indexing or perhaps by distributing indexing on multiple
 machines?
 
 Thanks.
 
 
 



Need tokenization that finds part of stringvalue

2012-02-28 Thread PeterKerk
I have the following in my schema.xml

<field name="title" type="text_ws" indexed="true" stored="true"/>
<field name="title_search" type="text" indexed="true" stored="true"/>


<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
        words="stopwords_dutch.txt"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
        generateNumberParts="1" catenateWords="1" catenateNumbers="1"
        catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
        ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
        words="stopwords_dutch.txt"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
        generateNumberParts="1" catenateWords="0" catenateNumbers="0"
        catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>


I want to search on the field title.
My field title holds the value great smartphone.
If I search on smartphone the item is found. But when I want the item also to
be found on great or phone, it doesn't work.
I have been playing around with the tokenizer test function, but have failed
to find the definition for the text fieldtype I need.
Help? :)

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Need-tokenization-that-finds-part-of-stringvalue-tp3785366p3785366.html
Sent from the Solr - User mailing list archive at Nabble.com.
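One common approach to making phone match inside smartphone is an n-gram analyzed field. A hedged sketch follows - the gram sizes are illustrative rather than tuned, and such a field grows the index noticeably:

```xml
<fieldType name="text_ngram" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- index all substrings of each token between 3 and 15 characters,
         so "smartphone" also yields "smart", "phone", "art", ... -->
    <filter class="solr.NGramFilterFactory" minGramSize="3" maxGramSize="15"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

If only prefixes like smart need to match, EdgeNGramFilterFactory is the cheaper alternative.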


Re: search.highlight.InvalidTokenOffsetsException in Solr 3.5

2012-02-28 Thread Ahmet Arslan
 Not sure whether that question was directed at me, but I am
 not using HTMLStripCharFilter but some other pattern
 replacements which modify
 character positions, probably in the same manner as
 HTMLStripCharFilter
 does.

I thought that the cause of the problem is 
https://issues.apache.org/jira/browse/LUCENE-2208

What is your field definition? Can you provide your document and query pair 
that causes this exception?


Re: search.highlight.InvalidTokenOffsetsException in Solr 3.5

2012-02-28 Thread sharadgaur
I was using fieldType text_general_rev:

<fieldType name="text_general_rev" class="solr.TextField"
    positionIncrementGap="100">
  <analyzer type="index">
    <charFilter class="solr.HTMLStripCharFilterFactory"/>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
        words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.ReversedWildcardFilterFactory" withOriginal="true"
        maxPosAsterisk="3" maxPosQuestion="2" maxFractionAsterisk="0.33"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
        ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
        words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>




But since I changed to fieldType text_general, everything is running fine and
I am not getting the InvalidTokenOffsetsException any more.

<fieldType name="text_general" class="solr.TextField"
    positionIncrementGap="100">
  <analyzer type="index">
    <charFilter class="solr.HTMLStripCharFilterFactory"/>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
        words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
        words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
        ignoreCase="true" expand="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>





--
View this message in context: 
http://lucene.472066.n3.nabble.com/search-highlight-InvalidTokenOffsetsException-in-Solr-3-5-tp3560997p3785456.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Error during auto-warming of key

2012-02-28 Thread Oana Goilav
Hi,

Did you find the root cause of this issue?


Cheers,
Oana


Oana Goilav, Software Engineer
Elastic Path Software, Inc. - elasticpath.com





Re: solr returns reduced results for same query after adding a new field to the schema.

2012-02-28 Thread Chris Hostetter

: When I query solr with the query ?q=ingredientSuggestion=banana I get
: 160 results.
: Ok, all fine.
: 
: When I add a new field such as
: 
:   <field name="courseName" type="text" indexed="true"
: stored="true" required="true"/>
: 
: to my index it reduces the number of results from my query to 131, even
: though the query 
: hasn't changed and does not (at least explicitly) filter the result set.

presumably you changed your DIH config in some way when you added that 
field? are you certain that you didn't do anything to alter the total 
number of docs being indexed?  or that the data source didn't change 
between the first time you indexed and the second?


-Hoss


Re: Inconsistent Results with ZooKeeper Ensemble and Four SOLR Cloud Nodes

2012-02-28 Thread Mark Miller
Hmm...this is very strange - there is nothing interesting in any of the logs?

In clusterstate.json, all of the shards have an active state?


There are quite a few of us doing exactly this setup recently, so there must be 
something we are missing here...

Any info you can offer might help.

- Mark

On Feb 28, 2012, at 1:00 PM, Matthew Parker wrote:

 Mark,
 
 I got the codebase from 2/26/2012, and I got the same inconsistent
 results.
 
 I have solr running on four ports 8081-8084
 
 8081 and 8082 are the leaders for shard 1, and shard 2, respectively
 
 8083 - is assigned to shard 1
 8084 - is assigned to shard 2
 
 queries come in, and sometimes it seems the console windows for 8081 and 8083
 react as if responding to the query, but there are no results.
 
 if the queries run on 8081/8082 or 8081/8084 then results come back ok.
 
 The query is nothing more than: q=*:*
 
 Regards,
 
 Matt
 
 
 On Mon, Feb 27, 2012 at 9:26 PM, Matthew Parker 
 mpar...@apogeeintegration.com wrote:
 
 I'll have to check on the commit situation. We have been pushing data from
 SharePoint the last week or so. Would that somehow block the documents
 moving between the solr instances?
 
 I'll try another version tomorrow. Thanks for the suggestions.
 
  On Mon, Feb 27, 2012 at 5:34 PM, Mark Miller markrmil...@gmail.com wrote:
 
 Hmmm...all of that looks pretty normal...
 
 Did a commit somehow fail on the other machine? When you view the stats
  for the update handler, are there a lot of pending adds for one of the
 nodes? Do the commit counts match across nodes?
 
 You can also query an individual node with distrib=false to check that.
 
  If your build is a month old, I'd honestly recommend you try upgrading as
 well.
 
 - Mark
 
 On Feb 27, 2012, at 3:34 PM, Matthew Parker wrote:
 
 Here is most of the cluster state:
 
 Connected to Zookeeper
 localhost:2181, localhost: 2182, localhost:2183
 
 /(v=0 children=7) 
  /CONFIGS(v=0, children=1)
 /CONFIGURATION(v=0 children=25)
 all the configuration files, velocity info, xslt, etc.
 
 /NODE_STATES(v=0 children=4)
MACHINE1:8083_SOLR (v=121)[{shard_id:shard1,
 state:active,core:,collection:collection1,node_name:...
MACHINE1:8082_SOLR (v=101)[{shard_id:shard2,
 state:active,core:,collection:collection1,node_name:...
MACHINE1:8081_SOLR (v=92)[{shard_id:shard1,
 state:active,core:,collection:collection1,node_name:...
MACHINE1:8084_SOLR (v=73)[{shard_id:shard2,
 state:active,core:,collection:collection1,node_name:...
  /ZOOKEEPER (v=0 children=1)
QUOTA(v=0)
 
 
 /CLUSTERSTATE.JSON(V=272){collection1:{shard1:{MACHINE1:8081_solr_:{shard_id:shard1,leader:true,...
 /LIVE_NODES (v=0 children=4)
MACHINE1:8083_SOLR(ephemeral v=0)
MACHINE1:8082_SOLR(ephemeral v=0)
MACHINE1:8081_SOLR(ephemeral v=0)
MACHINE1:8084_SOLR(ephemeral v=0)
 /COLLECTIONS (v=1 children=1)
COLLECTION1(v=0 children=2){configName:configuration1}
LEADER_ELECT(v=0 children=2)
SHARD1(V=0 children=1)
ELECTION(v=0 children=2)
 
 87186203314552835-MACHINE1:8081_SOLR_-N_96(ephemeral v=0)
 
 87186203314552836-MACHINE1:8083_SOLR_-N_84(ephemeral v=0)
SHARD2(v=0 children=1)
ELECTION(v=0 children=2)
 
 231301391392833539-MACHINE1:8084_SOLR_-N_85(ephemeral v=0)
 
 159243797356740611-MACHINE1:8082_SOLR_-N_84(ephemeral v=0)
LEADERS (v=0 children=2)
SHARD1 (ephemeral
 v=0){core:,node_name:MACHINE1:8081_solr,base_url:
 http://MACHINE1:8081/solr};
SHARD2 (ephemeral
 v=0){core:,node_name:MACHINE1:8082_solr,base_url:
 http://MACHINE1:8082/solr};
 /OVERSEER_ELECT (v=0 children=2)
ELECTION (v=0 children=4)
231301391392833539-MACHINE1:8084_SOLR_-N_000251(ephemeral
 v=0)
87186203314552835-MACHINE1:8081_SOLR_-N_000248(ephemeral
 v=0)
159243797356740611-MACHINE1:8082_SOLR_-N_000250(ephemeral
 v=0)
87186203314552836-MACHINE1:8083_SOLR_-N_000249(ephemeral
 v=0)
 LEADER (ephemeral
 v=0){id:87186203314552835-MACHINE1:8081_solr-n_00248}
 
 
 
 On Mon, Feb 27, 2012 at 2:47 PM, Mark Miller markrmil...@gmail.com
 wrote:
 
 
 On Feb 27, 2012, at 2:22 PM, Matthew Parker wrote:
 
 Thanks for your reply Mark.
 
 I believe the build was towards the beginning of the month. The
 solr.spec.version is 4.0.0.2012.01.10.38.09
 
 I cannot access the clusterstate.json contents. I clicked on it a
 couple
 of
 times, but nothing happens. Is that stored on disk somewhere?
 
 Are you using the new admin UI? That has recently been updated to work
 better with cloud - it had some troubles not too long ago. If you are,
 you
 should try using the old admin UI's zookeeper page - that should
 show
 the cluster state.
 
 That being said, there has been a lot of bug fixes over the past month
 -
 so you may just want to update to a recent version.
 
 
 I configured a custom request handler to calculate a unique document
 id
 based on the file's url.
 
 On 

Couple issues with edismax in 3.5

2012-02-28 Thread Way Cool
Hi, Guys,

I am having the following issues with edismax:

1. Search for 4X6 generated the following parsed query:
+DisjunctionMaxQuery(((id:4 id:x id:6)^1.2) | ((name:4 name:x
name:6)^1.025))
while the search for 4 X 6 (with space in between)  generated the query
below: (I like this one)
+((DisjunctionMaxQuery((id:4^1.2 | name:4^1.025)
+((DisjunctionMaxQuery((id:x^1.2 | name:x^1.025)
+((DisjunctionMaxQuery((id:6^1.2 | name:6^1.025)

Is that really intentional? The first query is pretty weird because it will
return all of the docs with one of 4, x, 6.

Any easy way we can force 4X6 search to be the same as 4 X 6?

2. There is an issue with multi-word synonyms, because edismax splits the
keywords into multiple clauses via the line below:
clauses = splitIntoClauses(userQuery, false);
and seems like edismax doesn't quite respect fieldType at query time, for
example, handling stopWords differently than what's specified in schema.

For example: I have the following synonym:
AAA BBB, AAABBB, AAA-BBB, CCC DDD

When I search for AAA-BBB it works; however, searching for CCC DDD does not
return results containing AAABBB. What is interesting is that
admin/analysis.jsp returns great results.


Thanks,

YH
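On point 1, one knob worth experimenting with - an assumption on my part, not a verified fix for the parser structure - is WordDelimiterFilterFactory's splitOnNumerics, which controls whether a token like 4X6 is split at letter/number boundaries during analysis:

```xml
<!-- illustrative: splitOnNumerics="1" splits "4X6" into "4", "X", "6";
     catenateAll="1" additionally indexes the joined form "4x6" -->
<filter class="solr.WordDelimiterFilterFactory"
    generateWordParts="1" generateNumberParts="1"
    splitOnNumerics="1" catenateAll="1"/>
```

Whether this yields the same parsed query as 4 X 6 depends on how edismax splits clauses before analysis, so it may only be part of the answer.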


Building a resilient cluster

2012-02-28 Thread Ranjan Bagchi
Hi,

I'm interested in setting up a solr cluster where each machine [at least
initially] hosts a separate shard of a big index [too big to sit on one
machine].  I'm able to put a cloud together by telling it that I have (to
start out with) 4 nodes, and then starting up nodes on 3 machines pointing
at the zkInstance.  I'm able to load my sharded data onto each machine
individually and it seems to work.

My concern is that it's not fault tolerant:  if one of the non-zookeeper
machines falls over, the whole cluster won't work.  Also, I can't create a
shard with more data, and have it work within the existing cloud.

I tried using -DshardId=shard5 [on an existing 4-shard cluster], but it
just started replicating, which doesn't seem right.

Are there ways around this?

Thanks,
Ranjan Bagchi
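For reference, the era's SolrCloud example wiring looks roughly like this (ports, paths, and numShards are illustrative). Note that numShards is fixed at bootstrap time, which is exactly the limitation hit above when trying to add shard5 later - a node started beyond numShards joins as a replica, hence the replication observed:

```
# first node: embeds ZooKeeper (zkRun) and bootstraps the config,
# fixing the cluster at 4 shards
java -Dbootstrap_confdir=./solr/conf -Dcollection.configName=myconf \
     -DzkRun -DnumShards=4 -jar start.jar

# additional nodes: point at the ZooKeeper instance (solr port + 1000)
java -DzkHost=localhost:9983 -jar start.jar
```

Shard splitting (growing numShards on a live collection) was not available at this point, so re-sharding meant reindexing into a new collection.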