Re: Requesting to add into a Contributor Group

2013-05-05 Thread Robert Muir
done. let us know if you have any problems.

On Sat, May 4, 2013 at 10:12 AM, Krunal jariwalakru...@gmail.com wrote:

 Dear Sir,

 Kindly add me to the contributor group to help me contribute to the Solr
 wiki.

 My Email id: jariwalakru...@gmail.com
 Login Name: Krunal

 Specific changes I would like to make to begin with are:

 - Correct the link for AJAX Solr at http://wiki.apache.org/solr/SolrJS, which
 is wrong; the correct link is
 https://github.com/evolvingweb/ajax-solr/wiki

 - Add our company data here http://wiki.apache.org/solr/Support

 We offer Solr integration services on the .NET platform at Xcellence-IT.

 Our business division, nopAccelerate, offers a Solr integration plugin for
 nopCommerce along with other nopCommerce performance optimization services.


 We have been working with Solr for the last year and will be happy to
 contribute back by helping the community maintain and update the wiki. If this
 is not allowed, kindly let us know and I will send you our company details so
 you can make the changes.

 Thanks,

 Awaiting your response.

 Krunal

 *Krunal Jariwala*


 *Cell:* +91-98251-07747

 *Best time to Call:* 9am to 7pm (IST) GMT +5.30



Re: Why is SolrCloud doing a full copy of the index?

2013-05-05 Thread Erick Erickson
Second the thanks

Erick

On Sat, May 4, 2013 at 6:08 PM, Lance Norskog goks...@gmail.com wrote:
 Great! Thank you very much Shawn.


 On 05/04/2013 10:55 AM, Shawn Heisey wrote:

 On 5/4/2013 11:45 AM, Shawn Heisey wrote:

 Advance warning: this is a long reply.

 I have condensed some relevant performance problem information into the
 following wiki page:

 http://wiki.apache.org/solr/SolrPerformanceProblems

 Anyone who has additional information for this page, feel free to add
 it.  I hope I haven't made too many mistakes!

 Thanks,
 Shawn




Re: How to get solr synonyms in result set.

2013-05-05 Thread Erick Erickson
Sure, you can specify a separate synonyms list at query time: just define
index-time and query-time analysis chains, one with the synonym filter
factory and one without.
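
For illustration (a sketch, not from the original thread; the field type name
and the synonyms.txt file are assumptions), a field type like this applies
synonyms only at query time:

    <fieldType name="text_syn" class="solr.TextField" positionIncrementGap="100">
      <!-- index-time chain: no synonym filter -->
      <analyzer type="index">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
      <!-- query-time chain: synonyms expanded from synonyms.txt -->
      <analyzer type="query">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
                ignoreCase="true" expand="true"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
    </fieldType>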

Be aware that index-time and query-time synonyms have somewhat different
characteristics, especially around multi-word synonyms; see:
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.SynonymFilterFactory

Best
Erick

On Sun, May 5, 2013 at 12:23 AM, varun srivastava
varunmail...@gmail.com wrote:
 Hi ,
  The synonyms list is used at index time, so I don't think you can pass a list at
  query time and make it work.


 On Fri, May 3, 2013 at 11:53 PM, Suneel Pandey pandey.sun...@gmail.comwrote:

 Hi,

 I want to get the specific Solr synonym terms, at query time, in the result
 set based on filter criteria.
 I have implemented synonyms in a .txt file.

 Thanks








 -
 Regards,

 Suneel Pandey
 Sr. Software Developer
 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/How-to-get-solr-synonyms-in-result-set-tp4060796.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Re: Why is SolrCloud doing a full copy of the index?

2013-05-05 Thread Kumar Limbu
Thanks for the replies. It is really appreciated.

Based on the replies it seems like upgrading to the latest version of Solr
is something that will probably resolve this issue.

We also update quite frequently, every 5 minutes. We will try setting this to a
higher interval and see if that helps.

We will also try increasing the servlet timeout and see if that resolves the
issue.

Among the other suggestions, we already tried increasing the zkClientTimeout
from 15 seconds to 30 seconds, but that didn't seem to help. What value do you
recommend trying?

A few more details about our system: we are running on servers with 16GB of
RAM, a 64-bit OS, and SSD disks.

Also, since we are already using 4.0 in our production environment with the
aforementioned three-server setup, how should we go about upgrading to the
latest version (4.3)? Do we need to do a full reindex of our data, or is the
index compatible between these versions?

We will try out the suggestions and will post later if any of them help us
resolve the issue.

Again, thanks for the reply.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Why-is-SolrCloud-doing-a-full-copy-of-the-index-tp4060800p4060897.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: custom tokenizer error

2013-05-05 Thread Jack Krupansky

I didn't notice any call to the reset method for your base tokenizer.

Is there any reason you didn't just use char filters to replace colons
and periods with spaces?
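
For instance (a sketch, not part of the original message; the field type name
is made up), a PatternReplaceCharFilterFactory ahead of the standard tokenizer
would do that replacement before tokenization:

    <fieldType name="text_split_punct" class="solr.TextField">
      <analyzer>
        <!-- turn colons and periods into spaces before tokenizing -->
        <charFilter class="solr.PatternReplaceCharFilterFactory"
                    pattern="[:.]" replacement=" "/>
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
    </fieldType>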


-- Jack Krupansky

-Original Message- 
From: Sarita Nair

Sent: Friday, May 03, 2013 2:43 PM
To: solr-user@lucene.apache.org
Subject: custom tokenizer error

I am using a custom Tokenizer as part of the analysis chain for a Solr (4.2.1)
field. On trying to index, Solr throws a NullPointerException.
The unit tests for the custom tokenizer work fine. Any ideas as to what I am
missing or doing incorrectly would be appreciated.


Here is the relevant schema.xml excerpt:

   <fieldType name="negated" class="solr.TextField" omitNorms="true">
     <analyzer type="index">
       <tokenizer class="some.other.solr.analysis.EmbeddedPunctuationTokenizer$Factory"/>
       <filter class="solr.LowerCaseFilterFactory"/>
       <filter class="solr.EnglishPossessiveFilterFactory"/>
     </analyzer>
   </fieldType>

Here are the relevant pieces of the Tokenizer:

   /**
    * Intercepts each token produced by {@link StandardTokenizer#incrementToken()}
    * and checks for the presence of a colon or period. If found, splits the token
    * on the punctuation mark and adjusts the term and offset attributes of the
    * underlying {@link TokenStream} to create additional tokens.
    */
   public class EmbeddedPunctuationTokenizer extends Tokenizer {

     private static final Pattern PUNCTUATION_SYMBOLS = Pattern.compile("[:.]");

     private StandardTokenizer baseTokenizer;

     private CharTermAttribute termAttr;

     private OffsetAttribute offsetAttr;

     private /*@Nullable*/ String tokenAfterPunctuation = null;

     private int currentOffset = 0;

     public EmbeddedPunctuationTokenizer(final Reader reader) {
       super(reader);
       baseTokenizer = new StandardTokenizer(Version.MINIMUM_LUCENE_VERSION, reader);

       // Two TokenStreams are in play here: the one underlying the current
       // instance and the one underlying the StandardTokenizer. The attribute
       // instances must be associated with both.
       termAttr = baseTokenizer.addAttribute(CharTermAttribute.class);
       offsetAttr = baseTokenizer.addAttribute(OffsetAttribute.class);
       this.addAttributeImpl((CharTermAttributeImpl) termAttr);
       this.addAttributeImpl((OffsetAttributeImpl) offsetAttr);
     }

     @Override
     public void end() throws IOException {
       baseTokenizer.end();
       super.end();
     }

     @Override
     public void close() throws IOException {
       baseTokenizer.close();
       super.close();
     }

     @Override
     public void reset() throws IOException {
       super.reset();
       baseTokenizer.reset();
       currentOffset = 0;
       tokenAfterPunctuation = null;
     }

     @Override
     public final boolean incrementToken() throws IOException {
       clearAttributes();
       if (tokenAfterPunctuation != null) {
         // Do not advance the underlying TokenStream if the previous call
         // found an embedded punctuation mark and set aside the substring
         // that follows it. Set the attributes instead from the substring,
         // bearing in mind that the substring could contain more embedded
         // punctuation marks.
         adjustAttributes(tokenAfterPunctuation);
       } else if (baseTokenizer.incrementToken()) {
         // No remaining substring from a token with embedded punctuation: save
         // the starting offset reported by the base tokenizer as the current
         // offset, then proceed with the analysis of the token it returned.
         currentOffset = offsetAttr.startOffset();
         adjustAttributes(termAttr.toString());
       } else {
         // No more tokens in the underlying token stream: return false.
         return false;
       }
       return true;
     }

     private void adjustAttributes(final String token) {
       Matcher m = PUNCTUATION_SYMBOLS.matcher(token);
       if (m.find()) {
         int index = m.start();
         offsetAttr.setOffset(currentOffset, currentOffset + index);
         termAttr.copyBuffer(token.toCharArray(), 0, index);
         tokenAfterPunctuation = token.substring(index + 1);
         // Given that the incoming token had an embedded punctuation mark,
         // the starting offset for the substring following the punctuation
         // mark will be 1 beyond the end of the current token, which is the
         // substring preceding the embedded punctuation mark.
         currentOffset = offsetAttr.endOffset() + 1;
       } else if (tokenAfterPunctuation != null) {
         // Last remaining substring following a previously detected embedded
         // punctuation mark: adjust attributes based on its values.
         int length = tokenAfterPunctuation.length();
         termAttr.copyBuffer(tokenAfterPunctuation.toCharArray(), 0, length);
         offsetAttr.setOffset(currentOffset, currentOffset + length);
         tokenAfterPunctuation = null;
       }
       // Implied else: neither is true, so the attributes from the base tokenizer
       // need no adjustment.
     }
   }

Solr throws the following error, in the 'else if' block of #incrementToken

   2013-04-29 14:19:48,920 [http-thread-pool-8080(3)] ERROR 
org.apache.solr.core.SolrCore - java.lang.NullPointerException
   at 
org.apache.lucene.analysis.standard.StandardTokenizerImpl.zzRefill(StandardTokenizerImpl.java:923)
   at 
org.apache.lucene.analysis.standard.StandardTokenizerImpl.getNextToken(StandardTokenizerImpl.java:1133)
   

zookeeper errors

2013-05-05 Thread Michael Della Bitta
Hi,

I'm running a Solr 4.2.1 cloud with an external three-node Zookeeper 2.4.5
setup.

I'm seeing a lot of these errors in the zookeeper logs:

2013-05-05 15:06:22,863 - WARN  [NIOServerCxn.Factory:
0.0.0.0/0.0.0.0:2181:NIOServerCnxn@349] - caught end of stream exception

Also some of these:

NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@354] - Exception
causing close of session 0x3e22c637f5063f due to java.io.IOException:
Connection reset by peer

or

NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@354] - Exception
causing close of session 0x3e22c637f506a1 due to java.io.IOException:
Connection timed out

We've had problems with nodes dropping out of collections during indexing.
I'm assuming these are related? Is there some sort of socket tuning I need
to do on the Solr side to keep these connections going?

Thanks for any input anybody might be able to provide,

Michael Della Bitta


Appinions
18 East 41st Street, 2nd Floor
New York, NY 10017-6271

www.appinions.com

Where Influence Isn’t a Game


Re: disaster recovery scenarios for solr cloud and zookeeper

2013-05-05 Thread Jack Krupansky
From the wiki: "SolrCloud can continue to serve results without interruption
as long as at least one server hosts every shard. You can demonstrate this
by judiciously shutting down various instances and looking for results. If
you have killed all of the servers for a particular shard, requests to other
servers will result in a 503 error. To return just the documents that are
available in the shards that are still alive (and avoid the error), add the
following query parameter: shards.tolerant=true"
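
For example (illustrative host, port, and collection name), a request like the
following keeps returning whatever documents are still reachable:

    http://localhost:8983/solr/collection1/select?q=*:*&shards.tolerant=true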


That doesn't completely answer your question, but is an important part of 
the puzzle.


-- Jack Krupansky

-Original Message- 
From: Dennis Haller

Sent: Friday, May 03, 2013 3:21 PM
To: solr-user@lucene.apache.org
Subject: disaster recovery scenarios for solr cloud and zookeeper

Hi,

Solr 4.x is architected with a dependency on Zookeeper, and Zookeeper is
expected to have a very high (perfect?) availability. With 3 or 5 zookeeper
nodes, it is possible to manage zookeeper maintenance and online
availability to be close to 100%. But what is the worst case for Solr if
for some unanticipated reason all Zookeeper nodes go offline?

Could someone comment on a couple of possible scenarios for which all ZK
nodes are offline. What would happen to Solr and what would be needed to
recover in each case?
1) brief interruption, say 2 minutes,
2) longer downtime, say 60 min

Thanks
Dennis 



Re: disaster recovery scenarios for solr cloud and zookeeper

2013-05-05 Thread Mark Miller
When Solr loses its connection to ZooKeeper, updates will start being
rejected. Read requests will continue as normal. This is regardless of how long
ZooKeeper is down.

So it's pretty simple: when you lose the ability to talk to ZK, everything
keeps working based on the most recent clusterstate, except that updates are
blocked and you cannot add new nodes to the cluster. You are essentially in
steady state.

The ZK clients will continue trying to reconnect, so that when ZK comes back,
updates will start being accepted again and new nodes may join the cluster.

- Mark

On May 3, 2013, at 3:21 PM, Dennis Haller dhal...@talenttech.com wrote:

 Hi,
 
 Solr 4.x is architected with a dependency on Zookeeper, and Zookeeper is
 expected to have a very high (perfect?) availability. With 3 or 5 zookeeper
 nodes, it is possible to manage zookeeper maintenance and online
 availability to be close to 100%. But what is the worst case for Solr if
 for some unanticipated reason all Zookeeper nodes go offline?
 
 Could someone comment on a couple of possible scenarios for which all ZK
 nodes are offline. What would happen to Solr and what would be needed to
 recover in each case?
 1) brief interruption, say 2 minutes,
 2) longer downtime, say 60 min
 
 Thanks
 Dennis



Re: zookeeper errors

2013-05-05 Thread Mark Miller
It sounds like you probably need to raise the default 15 sec zk client timeout. 
We have it default to a fairly aggressive setting. For high load envs, you may 
have to bring it to 30 or 45 seconds.
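
For reference, a minimal sketch assuming the stock Solr 4.x solr.xml layout
(the attribute values and core name here are illustrative): the timeout is the
zkClientTimeout attribute on the <cores> element, which the default file wires
to a system property, so it can also be raised at startup:

    <!-- solr.xml: raise the ZooKeeper client timeout to 30 seconds -->
    <cores adminPath="/admin/cores" zkClientTimeout="${zkClientTimeout:30000}">
      <core name="collection1" instanceDir="collection1"/>
    </cores>

    <!-- or override the property at startup without editing solr.xml:
         java -DzkClientTimeout=30000 -jar start.jar -->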

Also, be sure you are not using a stop-the-world GC collector; CMS (the
concurrent low-pause collector) is best.

- Mark

On May 5, 2013, at 1:50 PM, Michael Della Bitta 
michael.della.bi...@appinions.com wrote:

 Hi,
 
 I'm running a Solr 4.2.1 cloud with an external three-node Zookeeper 2.4.5
 setup.
 
 I'm seeing a lot of these errors in the zookeeper logs:
 
 2013-05-05 15:06:22,863 - WARN  [NIOServerCxn.Factory:
 0.0.0.0/0.0.0.0:2181:NIOServerCnxn@349] - caught end of stream exception
 
 Also some of these:
 
 NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@354] - Exception
 causing close of session 0x3e22c637f5063f due to java.io.IOException:
 Connection reset by peer
 
 or
 
 NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@354] - Exception
 causing close of session 0x3e22c637f506a1 due to java.io.IOException:
 Connection timed out
 
 We've had problems with nodes dropping out of collections during indexing.
 I'm assuming these are related? Is there some sort of socket tuning I need
 to do on the Solr side to keep these connections going?
 
 Thanks for any input anybody might be able to provide,
 
 Michael Della Bitta
 
 
 Appinions
 18 East 41st Street, 2nd Floor
 New York, NY 10017-6271
 
 www.appinions.com
 
 Where Influence Isn’t a Game



Re: zookeeper errors

2013-05-05 Thread Michael Della Bitta
Mark,

I'm definitely using CMS, so I'll look into the zk client timeout.

Thanks!


Michael Della Bitta


Appinions
18 East 41st Street, 2nd Floor
New York, NY 10017-6271

www.appinions.com

Where Influence Isn’t a Game


On Sun, May 5, 2013 at 2:21 PM, Mark Miller markrmil...@gmail.com wrote:

 It sounds like you probably need to raise the default 15 sec zk client
 timeout. We have it default to a fairly aggressive setting. For high load
 envs, you may have to bring it to 30 or 45 seconds.

 Also, be sure you are not using a stop the world gc collector - CMS (the
 concurrent low pause collector) is best.

 - Mark

 On May 5, 2013, at 1:50 PM, Michael Della Bitta 
 michael.della.bi...@appinions.com wrote:

  Hi,
 
  I'm running a Solr 4.2.1 cloud with an external three-node Zookeeper
 2.4.5
  setup.
 
  I'm seeing a lot of these errors in the zookeeper logs:
 
  2013-05-05 15:06:22,863 - WARN  [NIOServerCxn.Factory:
  0.0.0.0/0.0.0.0:2181:NIOServerCnxn@349] - caught end of stream exception
 
  Also some of these:
 
  NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@354] - Exception
  causing close of session 0x3e22c637f5063f due to java.io.IOException:
  Connection reset by peer
 
  or
 
  NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@354] - Exception
  causing close of session 0x3e22c637f506a1 due to java.io.IOException:
  Connection timed out
 
  We've had problems with nodes dropping out of collections during
 indexing.
  I'm assuming these are related? Is there some sort of socket tuning I
 need
  to do on the Solr side to keep these connections going?
 
  Thanks for any input anybody might be able to provide,
 
  Michael Della Bitta
 
  
  Appinions
  18 East 41st Street, 2nd Floor
  New York, NY 10017-6271
 
  www.appinions.com
 
  Where Influence Isn’t a Game




Re: disaster recovery scenarios for solr cloud and zookeeper

2013-05-05 Thread Jack Krupansky

Is Solr retrieval possible when ZooKeeper is down?

-- Jack Krupansky

-Original Message- 
From: Mark Miller

Sent: Sunday, May 05, 2013 2:19 PM
To: solr-user@lucene.apache.org
Subject: Re: disaster recovery scenarios for solr cloud and zookeeper

When Solr loses its connection to ZooKeeper, updates will start being
rejected. Read requests will continue as normal. This is regardless of how
long ZooKeeper is down.


So it's pretty simple: when you lose the ability to talk to ZK, everything
keeps working based on the most recent clusterstate, except that updates
are blocked and you cannot add new nodes to the cluster. You are essentially
in steady state.


The ZK clients will continue trying to reconnect, so that when ZK comes back,
updates will start being accepted again and new nodes may join the cluster.


- Mark

On May 3, 2013, at 3:21 PM, Dennis Haller dhal...@talenttech.com wrote:


Hi,

Solr 4.x is architected with a dependency on Zookeeper, and Zookeeper is
expected to have a very high (perfect?) availability. With 3 or 5 zookeeper
nodes, it is possible to manage zookeeper maintenance and online
availability to be close to 100%. But what is the worst case for Solr if
for some unanticipated reason all Zookeeper nodes go offline?

Could someone comment on a couple of possible scenarios for which all ZK
nodes are offline. What would happen to Solr and what would be needed to
recover in each case?
1) brief interruption, say 2 minutes,
2) longer downtime, say 60 min

Thanks
Dennis 




Re: Why is SolrCloud doing a full copy of the index?

2013-05-05 Thread Kristopher Kane
 
 Advance warning: this is a long reply.
 

Awesome Shawn.  Thanks!





iterate through each document in Solr

2013-05-05 Thread Mingfeng Yang
Dear Solr Users,

Does anyone know what is the best way to iterate through each document in a
Solr index with billion entries?

I tried to use select?q=*:*&start=xx&rows=500 to get 500 docs each time
and then change the start value, but it got very slow after getting through
about 10 million docs.

Thanks,
Ming-


Re: iterate through each document in Solr

2013-05-05 Thread Michael Sokolov

On 5/5/13 7:48 PM, Mingfeng Yang wrote:

Dear Solr Users,

Does anyone know what is the best way to iterate through each document in a
Solr index with billion entries?

I tried to use select?q=*:*&start=xx&rows=500 to get 500 docs each time
and then change the start value, but it got very slow after getting through
about 10 million docs.

Thanks,
Ming-

You need to use a unique and stable sort key and get documents > sortkey.
For example, if you have a unique key, retrieve documents ordered by the
unique key, and for each batch get documents > max(key) from the previous
batch.
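
A minimal SolrJ sketch of that pattern (not from the original message; the
Solr URL and the id field name are assumptions, and id is assumed to be a
string key that sorts lexicographically):

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.impl.HttpSolrServer;
    import org.apache.solr.client.solrj.util.ClientUtils;
    import org.apache.solr.common.SolrDocument;
    import org.apache.solr.common.SolrDocumentList;

    public class IterateAll {
      public static void main(String[] args) throws Exception {
        HttpSolrServer solr = new HttpSolrServer("http://localhost:8983/solr/collection1");
        String lastKey = null;
        while (true) {
          SolrQuery q = new SolrQuery("*:*");
          q.set("sort", "id asc");   // stable, unique sort key
          q.setStart(0);             // start stays at 0; the range filter does the paging
          q.setRows(500);
          if (lastKey != null) {
            // only fetch documents whose key is greater than the last one seen
            q.addFilterQuery("id:{" + ClientUtils.escapeQueryChars(lastKey) + " TO *]");
          }
          SolrDocumentList docs = solr.query(q).getResults();
          if (docs.isEmpty()) {
            break;
          }
          for (SolrDocument doc : docs) {
            // process the document here
            lastKey = (String) doc.getFieldValue("id");
          }
        }
      }
    }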


-Mike


Re: How to get solr synonyms in result set.

2013-05-05 Thread varun srivastava
Hi Suneel,
 After discovering that only query-time synonyms work with Solr, I found a
good article on the pros and cons of query- and index-time synonyms. It may
help you:
http://nolanlawson.com/2012/10/31/better-synonym-handling-in-solr/

Regards
Varun


On Sun, May 5, 2013 at 9:20 AM, Erick Erickson erickerick...@gmail.comwrote:

 Sure, you can specify a separate synonyms list at query time: just define
 index-time and query-time analysis chains, one with the synonym filter
 factory and one without.

 Be aware that index-time and query-time have some different
 characteristics,
 especially around multi-word synonyms see:

 http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.SynonymFilterFactory

 Best
 Erick

 On Sun, May 5, 2013 at 12:23 AM, varun srivastava
 varunmail...@gmail.com wrote:
  Hi ,
   Synonyms list is used at index time. So I dont think you can pass list
 at
  query time and make it work.
 
 
  On Fri, May 3, 2013 at 11:53 PM, Suneel Pandey pandey.sun...@gmail.com
 wrote:
 
  Hi,
 
  I want to get specific solr synonyms terms list during query time in
 result
  set based on filter criteria.
  I have implemented synonyms in .txt file.
 
  Thanks
 
 
 
 
 
 
 
 
  -
  Regards,
 
  Suneel Pandey
  Sr. Software Developer
  --
  View this message in context:
 
 http://lucene.472066.n3.nabble.com/How-to-get-solr-synonyms-in-result-set-tp4060796.html
  Sent from the Solr - User mailing list archive at Nabble.com.
 



Re: How to get solr synonyms in result set.

2013-05-05 Thread Upayavira
There is no way to identify *which* synonyms were triggered in your search
output.

You could implement a synonyms search component that looks in the stored
values of configured fields for synonyms and adds another block of XML to
the output. That would be a useful component; a rough skeleton follows.
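
As an illustration only (not an existing component; the class name and the
response key are made up, and the actual synonym lookup is left as a stub), a
Solr 4.x SearchComponent along these lines could attach a synonyms block to
the response:

    import java.io.IOException;
    import org.apache.solr.common.util.NamedList;
    import org.apache.solr.handler.component.ResponseBuilder;
    import org.apache.solr.handler.component.SearchComponent;

    public class SynonymReportComponent extends SearchComponent {

      @Override
      public void prepare(ResponseBuilder rb) throws IOException {
        // nothing to prepare in this sketch
      }

      @Override
      public void process(ResponseBuilder rb) throws IOException {
        NamedList<String> synonyms = new NamedList<String>();
        // Inspect the query terms and the configured synonym sources here and
        // record which synonyms applied, e.g. synonyms.add("tv", "television");
        rb.rsp.add("synonyms", synonyms);  // appears as an extra block in the response
      }

      @Override
      public String getDescription() {
        return "Reports synonyms that applied to the query";
      }

      @Override
      public String getSource() {
        return null;
      }
    }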

Upayavira

On Mon, May 6, 2013, at 05:23 AM, varun srivastava wrote:
 Hi Suneel,
  After discovering that only query time synonym work with solr I found a
 good article on pros and cons of query and index time synonyms . It may
 help you
 http://nolanlawson.com/2012/10/31/better-synonym-handling-in-solr/
 
 Regards
 Varun
 
 
 On Sun, May 5, 2013 at 9:20 AM, Erick Erickson
 erickerick...@gmail.comwrote:
 
  Sure, you can specify a separate synonyms list at query time: just define
  index-time and query-time analysis chains, one with the synonym filter
  factory and one without.
 
  Be aware that index-time and query-time have some different
  characteristics,
  especially around multi-word synonyms see:
 
  http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.SynonymFilterFactory
 
  Best
  Erick
 
  On Sun, May 5, 2013 at 12:23 AM, varun srivastava
  varunmail...@gmail.com wrote:
   Hi ,
Synonyms list is used at index time. So I dont think you can pass list
  at
   query time and make it work.
  
  
   On Fri, May 3, 2013 at 11:53 PM, Suneel Pandey pandey.sun...@gmail.com
  wrote:
  
   Hi,
  
   I want to get specific solr synonyms terms list during query time in
  result
   set based on filter criteria.
   I have implemented synonyms in .txt file.
  
   Thanks
  
  
  
  
  
  
  
  
   -
   Regards,
  
   Suneel Pandey
   Sr. Software Developer
   --
   View this message in context:
  
  http://lucene.472066.n3.nabble.com/How-to-get-solr-synonyms-in-result-set-tp4060796.html
   Sent from the Solr - User mailing list archive at Nabble.com.