Solr 4.2 rollback not working

2013-05-02 Thread Dipti Srivastava
Hi All,
We have set up a Solr 4.2 cloud with 4 nodes, and while the add/update/delete 
operations are working, we are not able to perform a rollback. Is there 
something different about this operation vs. the 3.x Solr master/slave config?

Thanks,
Dipti
phone: 408.678.1595  |  cell: 408.806.1970 | email: dipti.srivast...@apollogrp.edu
Solutions Engineering and Integration: 
https://wiki.apollogrp.edu/display/NGP/Solutions+Engineering+and+Integration
Support process: 
https://wiki.apollogrp.edu/display/NGP/Classroom+Services+Support





commit in solr4 takes a longer time

2013-05-02 Thread vicky desai
Hi all,

I have recently migrated from Solr 3.6 to Solr 4.0. The documents in my core
are constantly updated, so I fire a commit from code after every 10
thousand docs. However, moving from 3.6 to 4.0 I have noticed that for the
same core size it takes about twice as long to commit in Solr 4.0 compared
to Solr 3.6.

Is there any workaround by which I can reduce this time? Any help would be
highly appreciated.





Re: commit in solr4 takes a longer time

2013-05-02 Thread Furkan KAMACI
Can you explain more about your document size, shard and replica sizes, and
auto/soft commit time parameters?

2013/5/2 vicky desai vicky.de...@germinait.com

 Hi all,

 I have recently migrated from solr 3.6 to solr 4.0. The documents in my
 core
 are getting constantly updated and so I fire a code commit after every 10
 thousand docs . However moving from 3.6 to 4.0 I have noticed that for the
 same core size it takes about twice the time to commit in solr4.0 compared
 to solr 3.6.

 Is there any workaround by which I can reduce this time. Any help would be
 highly appreciated






SolrJ / Solr Two Phase Commit

2013-05-02 Thread mark12345
I am wondering if it is possible to achieve SolrJ/Solr two-phase commit. 
Any examples?  Any best practices?

What I know:
* Lucene offers two-phase commit via its IndexWriter (prepareCommit()
followed by either commit() or rollback()); see the sketch below.

http://lucene.apache.org/core/4_2_0/core/org/apache/lucene/index/IndexWriter.html

* I know Solr Optimistic Concurrency is available.
http://yonik.com/solr/optimistic-concurrency/
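
For reference, a minimal sketch of Lucene's two-phase commit at the
IndexWriter level. The directory path and analyzer setup are assumptions,
not taken from this thread:

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.index.IndexWriterConfig;
    import org.apache.lucene.store.Directory;
    import org.apache.lucene.store.FSDirectory;
    import org.apache.lucene.util.Version;
    import java.io.File;

    Directory dir = FSDirectory.open(new File("/path/to/index")); // assumed location
    IndexWriter writer = new IndexWriter(dir,
        new IndexWriterConfig(Version.LUCENE_42, new StandardAnalyzer(Version.LUCENE_42)));
    try {
        writer.addDocument(new Document()); // your documents here
        writer.prepareCommit(); // phase 1: flush and sync, nothing visible yet
        writer.commit();        // phase 2: make the changes durable and visible
    } catch (Exception e) {
        writer.rollback();      // discard everything since the last commit (closes the writer)
    }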


I want a transactional behavior that ensures that there is a full commit or
full rollback of multiple documents.  I do not want to be in a situation
where I don't know if the beans have been written or not written to the Solr
instance.

* Code Snippet

 try {
   UpdateResponse updateResponse = server.add(Arrays.asList(docOne, docTwo));
   successForAddingDocuments = (updateResponse.getStatus() == 0);
   if (successForAddingDocuments) {
     UpdateResponse updateResponseForCommit = server.commit();
     successForCommit = (updateResponseForCommit.getStatus() == 0);
   }
 } catch (Exception e) {
 } finally {
   if (!successForCommit) {
     System.err.println("Rolling back transaction.");
     try {
       UpdateResponse updateResponseForRollback = server.rollback();
       if (updateResponseForRollback.getStatus() == 0) {
         successForRollback = true;
       } else {
         successForRollback = false;
         System.err.println("Failed to rollback!  Bad as state is now unknown!");
       }
     } catch (Exception e) {
     }
   }
 }

* Full Test class

 @Test
 public void documentTransactionTest() {

   try {
     // HttpSolrServer server = ...
     server.deleteById(Arrays.asList("1", "2"));
     server.commit();
   } catch (Exception e) {
     e.printStackTrace();
   }

   SolrInputDocument docOne = new SolrInputDocument();
   {
     docOne.addField("id", 1L);
     docOne.addField("type_s", "MyTestDoc");
     docOne.addField("value_s", "docOne");
     docOne.addField("_version_", -1L);
   }

   SolrInputDocument docTwo = new SolrInputDocument();
   {
     docTwo.addField("id", 2L);
     docTwo.addField("type_s", "MyTestDoc");
     docTwo.addField("value_s", "docTwo");
     docTwo.addField("_version_", -1L);
   }

   boolean successForAddingDocuments = false;
   boolean successForCommit = false;
   boolean successForRollback = false;

   // throw new SolrServerException("Connection Broken");

   try {
     UpdateResponse updateResponse = server.add(Arrays.asList(docOne, docTwo));
     successForAddingDocuments = (updateResponse.getStatus() == 0);
     if (successForAddingDocuments) {
       UpdateResponse updateResponseForCommit = server.commit();
       successForCommit = (updateResponseForCommit.getStatus() == 0);
     }
   } catch (Exception e) {
   } finally {
     if (!successForCommit) {
       System.err.println("Rolling back transaction.");
       try {
         UpdateResponse updateResponseForRollback = server.rollback();
         if (updateResponseForRollback.getStatus() == 0) {
           successForRollback = true;
         } else {
           successForRollback = false;
           System.err.println("Failed to rollback!  Bad as state is now unknown!");
         }
       } catch (Exception e) {
       }
     }
   }

   {
     try {
       QueryResponse response = server.query(
           new SolrQuery("*:*").addFilterQuery("type_s:MyTestDoc"));

       if (successForCommit) {
         Assert.assertEquals(2, response.getResults().size());
         Assert.assertEquals("docOne", response.getResults().get(0).get("value_s"));
         Assert.assertEquals("docTwo", response.getResults().get(1).get("value_s"));
       } else if (successForRollback) {
         Assert.assertEquals(0, response.getResults().size());
       } else {
         // UNKNOWN STATE

         if (response.getResults().size() == 0) {
           // rollback must have been successful
         } else if (response.getResults().size() == 2) {
           // commit was successful

Re: Solr 4.2 rollback not working

2013-05-02 Thread mark12345
What version of Solr are you using?  4.2.0 or 4.2.1?

The following might be of interest to you:
* https://issues.apache.org/jira/browse/SOLR-4605
* https://issues.apache.org/jira/browse/SOLR-4733





Re: commit in solr4 takes a longer time

2013-05-02 Thread vicky desai
Hi,

I am using 1 shard and two replicas. The core has around 6 lakh (600,000) documents.


My solrconfig.xml is as follows:

<?xml version="1.0" encoding="UTF-8" ?>
<config>
  <luceneMatchVersion>LUCENE_40</luceneMatchVersion>
  <indexConfig>
    <maxFieldLength>2147483647</maxFieldLength>
    <lockType>simple</lockType>
    <unlockOnStartup>true</unlockOnStartup>
  </indexConfig>
  <updateHandler class="solr.DirectUpdateHandler2">
    <autoSoftCommit>
      <maxDocs>500</maxDocs>
      <maxTime>1000</maxTime>
    </autoSoftCommit>
    <autoCommit>
      <maxDocs>5</maxDocs>
      <maxTime>30</maxTime>
    </autoCommit>
  </updateHandler>

  <requestDispatcher handleSelect="true">
    <requestParsers enableRemoteStreaming="false"
        multipartUploadLimitInKB="204800" />
  </requestDispatcher>

  <requestHandler name="standard" class="solr.StandardRequestHandler"
      default="true" />
  <requestHandler name="/update" class="solr.UpdateRequestHandler" />
  <requestHandler name="/admin/"
      class="org.apache.solr.handler.admin.AdminHandlers" />
  <requestHandler name="/replication" class="solr.ReplicationHandler" />
  <directoryFactory name="DirectoryFactory"
      class="${solr.directoryFactory:solr.NRTCachingDirectoryFactory}" />
  <enableLazyFieldLoading>true</enableLazyFieldLoading>
  <admin>
    <defaultQuery>*:*</defaultQuery>
  </admin>
</config>






Re: any plans to remove int32 limitation on the number of the documents in the index?

2013-05-02 Thread Jack Krupansky
The Integer.MAX_VALUE-1 limit is set by Lucene. As hardware capacity and 
performance continue to advance, I think it's only a matter of time before 
Lucene (and then Solr) relaxes the limit, but I don't imagine it will 
happen real soon. Maybe in Lucene/Solr 6.0?


-- Jack Krupansky

-Original Message- 
From: Valery Giner

Sent: Wednesday, May 01, 2013 1:36 PM
To: solr-user@lucene.apache.org
Subject: any plans to remove int32 limitation on the number of the documents 
in the index?


Dear Solr Developers,

I've been unable to find an answer to the question in the subject line
of this e-mail, except for a vague one.

We need to be able to index over 2bln+ documents.   We were doing well
without sharding until the number of docs hit the limit (2bln+).   The
performance was satisfactory for queries, updates, and indexing of
new documents.

That is, except for the need to get around the int32 limit, we don't
really have a need for setting up distributed Solr.

I wonder whether someone on the Solr team could tell us when / in what
version of Solr we could expect the limit to be removed.

I hope this question may be of interest to someone else :)

--
Thanks,
Val 



Any estimation for solr 4.3?

2013-05-02 Thread adfel70
Hi,
We're planning on upgrading our Solr cluster from 4.0 to 4.2.1.
Is 4.3 coming anytime soon?

thanks.





Re: Any estimation for solr 4.3?

2013-05-02 Thread Jack Krupansky
RC4 of 4.3 is available now. The final release of 4.3 is likely to be within 
days.


-- Jack Krupansky

-Original Message- 
From: adfel70

Sent: Thursday, May 02, 2013 1:32 AM
To: solr-user@lucene.apache.org
Subject: Any estimation for solr 4.3?

Hi,
We're planning on upgrading our solr cluster from 4.0 to 4.2.1
Is 4.3 coming any soon?

thanks.






Pulling Config Folder from Zookeeper at SolrCloud

2013-05-02 Thread Furkan KAMACI
I use the same folder naming convention as the Solr example for my Solr 4.2.1
cloud: I have a collection1 folder and under it a conf folder. When I start up
my first node, I pass:

-Dsolr.solr.home=./solr -Dsolr.data.dir=./solr/data -DnumShards=5
-Dbootstrap_confdir=./solr/collection1/conf -Dcollection.configName=myconf

and I use an external ZooKeeper ensemble.

However, I see that ZooKeeper keeps that config. What should I do when I start
up new nodes in my SolrCloud? Should I remove the conf folder from the sources
and the -Dbootstrap_confdir=./solr/collection1/conf parameter from the startup
parameters, since ZooKeeper keeps it? Also, if I have multiple configs in
ZooKeeper, how can I tell a newly started instance which config to use?
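
A sketch of what this can look like once the config is in ZooKeeper (the
ZooKeeper addresses below are placeholders, not taken from this thread):
additional nodes only need -DzkHost, and a collection can be pointed at a
named config with zkcli's linkconfig command.

    # start an additional node; no bootstrap_confdir needed, it reads config from ZooKeeper
    java -Dsolr.solr.home=./solr -DzkHost=zk1:2181,zk2:2181,zk3:2181 -jar start.jar

    # point a collection at a named config stored in ZooKeeper
    cloud-scripts/zkcli.sh -zkhost zk1:2181 -cmd linkconfig -collection collection1 -confname myconf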


Is Near Real Time search not supported in SolrCloud?

2013-05-02 Thread Furkan KAMACI
Is Near Real Time search not supported in SolrCloud?

I mean, when a soft commit occurs on a leader, I think it doesn't get
distributed to the replicas (because it is not on disk; do the in-RAM index
segments get distributed to the replicas too?), so what happens when a search
query comes in?


socket write error

2013-05-02 Thread Dmitry Kan
Hi guys!

We have a Solr router and shards. I see this in the jetty log on the router:

May 02, 2013 1:30:22 PM org.apache.commons.httpclient.HttpMethodDirector
executeWithRetry
INFO: I/O exception (java.net.SocketException) caught when processing
request: Connection reset by peer: socket write error

and then:

May 02, 2013 1:30:22 PM org.apache.commons.httpclient.HttpMethodDirector
executeWithRetry
INFO: Retrying request

followed by an exception about Internal Server Error.

Any ideas why this happens?

We run 80+ shards distributed across several servers. Router runs on its
own node.

Is there anything in particular I should be looking into wrt Ubuntu socket
settings? Is this a known issue for Solr's distributed search from the past?

Thanks,
Dmitry


Is indexing large documents still an issue?

2013-05-02 Thread adfel70
Hi,
In previous versions of Solr, indexing documents with large fields caused
performance degradation.

Is this still the case in Solr 4.2?

If so, and I'll need to chunk the document and index many document parts,
can anyone give a general idea of what field/document size Solr CAN handle?

thanks.





Re: Is indexing large documents still an issue?

2013-05-02 Thread Bai Shen
The only issue I ran into was returning the content field.  Once I modified
my query to avoid that, I got good performance.

Admittedly, I only have about 15-20k documents in my index ATM, but most of
them are in the multi-MB range, with a current max of 250MB.


On Thu, May 2, 2013 at 7:05 AM, adfel70 adfe...@gmail.com wrote:

 Hi,
 In previous versions of solr, indexing documents with large fields caused
 performance degradation.

 Is this still the case in solr 4.2?

 If so, and I'll need to chunk the document and index many document parts,
 can anyony give a general idea of what field/document size solr CAN handle?

 thanks.






Re: Is indexing large documents still an issue?

2013-05-02 Thread adfel70
Well, returning the content field for highlighting is within my requirements.
Did you solve this in some other way, or did you just not have to?


Bai Shen wrote
 The only issue I ran into was returning the content field.  Once I
 modified
 my query to avoid that, I got good performance.
 
 Admittedly, I only have about 15-20k documents in my index ATM, but most
 of
 them are in the multiMB range with a current max of 250MB.
 
 
 On Thu, May 2, 2013 at 7:05 AM, adfel70 lt;

 adfel70@

 gt; wrote:
 
 Hi,
 In previous versions of solr, indexing documents with large fields caused
 performance degradation.

 Is this still the case in solr 4.2?

 If so, and I'll need to chunk the document and index many document parts,
 can anyony give a general idea of what field/document size solr CAN
 handle?

 thanks.











Security for inter-solr-node requests

2013-05-02 Thread Furkan KAMACI
Here is a part from the wiki:

1) Just forward credentials from the super-request which caused the
inter-solr-node sub-requests
2) Use internal credentials provided to the solr-node by the
administrator at startup

What do you use, and is there any code example for it?


Re: What Happens to Consistency if I kill a Leader and Startup it again?

2013-05-02 Thread Otis Gospodnetic
The leader would not be behind the replica, because the old leader would not
come back and take over the leader role. It would be just a replica, and it
would replicate the index from whichever node is the leader.

Otis
Solr  ElasticSearch Support
http://sematext.com/
On Apr 29, 2013 5:31 PM, Furkan KAMACI furkankam...@gmail.com wrote:

 I think about such situation:

 Let's assume that I am indexing at my SolrCloud. My leader has a version of
 higher than replica as well (I have one leader and one replica for each
 shard). If I kill leader, replica will be leader as well. When I startup
 old leader again it will be a replica for my shard.

 However I think that leader will have fewer documents than the replica and a
 lower version than the replica. Does it cause a problem that the leader is
 behind the replica?



Re: socket write error

2013-05-02 Thread Dmitry Kan
After some searching around, I see this:

http://search-lucene.com/m/ErEZUl7P5f2/%2522socket+write+error%2522subj=Long+list+of+shards+breaks+solrj+query

Seems like this has happened in the past with a large number of shards.

To make it clear: the distributed search works with 20 shards.


On Thu, May 2, 2013 at 1:57 PM, Dmitry Kan solrexp...@gmail.com wrote:

 Hi guys!

 We have solr router and shards. I see this in jetty log on the router:

 May 02, 2013 1:30:22 PM org.apache.commons.httpclient.HttpMethodDirector
 executeWithRetry
 INFO: I/O exception (java.net.SocketException) caught when processing
 request: Connection reset by peer: socket write error

 and then:

 May 02, 2013 1:30:22 PM org.apache.commons.httpclient.HttpMethodDirector
 executeWithRetry
 INFO: Retrying request

 followed by exception about Internal Server Error

 any ideas why this happens?

 We run 80+ shards distributed across several servers. Router runs on its
 own node.

 Is there anything in particular I should be looking into wrt ubuntu socket
 settings? Is this a known issue for solr's distributed search from the past?

 Thanks,
 Dmitry



RE: DF is not updated when a document is marked for deletion note

2013-05-02 Thread Markus Jelsma
DF uses maxDoc, which is updated when segments merge, so DF is almost never 
accurate in a dynamic index.
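
If accurate df matters to you, one workaround (an assumption about your
setup, not something stated in this thread) is to force deleted documents to
be merged away with an expunging commit:

    curl "http://localhost:8983/solr/update?commit=true&expungeDeletes=true"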
 
-Original message-
 From:Furkan KAMACI furkankam...@gmail.com
 Sent: Thu 02-May-2013 14:05
 To: solr-user@lucene.apache.org
 Subject: DF is not updated when a document is marked for deletion note
 
 When I look at here: http://localhost:8983/solr/admin/luke
 
 I see that Note: Document Frequency (df) is not updated when a document is
 marked for deletion. df values include deleted documents.
 
 is it something like I should care?
 


Re: What Happens to Consistency if I kill a Leader and Startup it again?

2013-05-02 Thread Furkan KAMACI
Thanks for the answer. This is what I try to say:

time = t
Node A (Leader):  version is 100
Node B (Replica): version is 90

time = t+1
Node A (Killing):  version is 100 and killed
Node B (Replica): version is 90

time = t+2
Node A (Killed):  version is 100 and killed
Node B (Become Leader): version is 95 (we indexed something)

time = t+3
Node A (Started as Replica):  version is 100 and live
Node B (Leader): version is 95

so I think that the leader will be behind the replica. Is there anything
different for such a scenario?



2013/5/2 Otis Gospodnetic otis.gospodne...@gmail.com

 The leader would not be behind replica because the old leader would not
 come back and take over the leader role. It would ne just a replica and it
 would replicate the index from whichever node is the leader.

 Otis
 Solr  ElasticSearch Support
 http://sematext.com/
 On Apr 29, 2013 5:31 PM, Furkan KAMACI furkankam...@gmail.com wrote:

  I think about such situation:
 
  Let's assume that I am indexing at my SolrCloud. My leader has a version
 of
  higher than replica as well (I have one leader and one replica for each
  shard). If I kill leader, replica will be leader as well. When I startup
  old leader again it will be a replica for my shard.
 
  However I think that leader will have less document than replica and a
 less
  version than replica. Does it cause a problem because of leader is behing
  of replica?
 



Re: What Happens to Consistency if I kill a Leader and Startup it again?

2013-05-02 Thread Raymond Wiker
If you're using zookeeper, this should not be allowed to happen (I think).


On Thu, May 2, 2013 at 2:12 PM, Furkan KAMACI furkankam...@gmail.com wrote:

 Thanks for the answer. This is what I try to say:

 time = t
 Node A (Leader):  version is 100
 Node B (Replica): version is 90

 time = t+1
 Node A (Killing):  version is 100 and killed
 Node B (Replica): version is 90

 time = t+2
 Node A (Killed):  version is 100 and killed
 Node B (Become Leader): version is 95 (we indexed something)

 time = t+3
 Node A (Started as Replica):  version is 100 and live
 Node B (Leader): version is 95

 so I think that leader will behind replica. Is there anything different for
 such scenario?



 2013/5/2 Otis Gospodnetic otis.gospodne...@gmail.com

  The leader would not be behind replica because the old leader would not
  come back and take over the leader role. It would ne just a replica and
 it
  would replicate the index from whichever node is the leader.
 
  Otis
  Solr  ElasticSearch Support
  http://sematext.com/
  On Apr 29, 2013 5:31 PM, Furkan KAMACI furkankam...@gmail.com wrote:
 
   I think about such situation:
  
   Let's assume that I am indexing at my SolrCloud. My leader has a
 version
  of
   higher than replica as well (I have one leader and one replica for each
   shard). If I kill leader, replica will be leader as well. When I
 startup
   old leader again it will be a replica for my shard.
  
   However I think that leader will have less document than replica and a
  less
   version than replica. Does it cause a problem because of leader is
 behing
   of replica?
  
 



Re: Any estimation for solr 4.3?

2013-05-02 Thread Andy Lester

On May 2, 2013, at 3:36 AM, Jack Krupansky j...@basetechnology.com wrote:

 RC4 of 4.3 is available now. The final release of 4.3 is likely to be within 
 days.


How can I see the Changelog of what will be in it?

Thanks,
xoa

--
Andy Lester = a...@petdance.com = www.petdance.com = AIM:petdance



Re: Any estimation for solr 4.3?

2013-05-02 Thread Yago Riveiro
The road map has this release note, but I think that most of it will be moved to 
4.3.1 or 4.4:

https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12310230&version=12324128
 

Regards

-- 
Yago Riveiro
Sent with Sparrow (http://www.sparrowmailapp.com/?sig)


On Thursday, May 2, 2013 at 2:56 PM, Andy Lester wrote:

 
 On May 2, 2013, at 3:36 AM, Jack Krupansky j...@basetechnology.com 
 (mailto:j...@basetechnology.com) wrote:
 
  RC4 of 4.3 is available now. The final release of 4.3 is likely to be 
  within days.
 
 
 How can I see the Changelog of what will be in it?
 
 Thanks,
 xoa
 
 --
 Andy Lester = a...@petdance.com = www.petdance.com 
 (http://www.petdance.com) = AIM:petdance
 
 




Re: Any estimation for solr 4.3?

2013-05-02 Thread Andy Lester

On May 2, 2013, at 9:03 AM, Yago Riveiro yago.rive...@gmail.com wrote:

 The road map has this release note, but I think that most of it will be move 
 to 4.3.1 or 4.4
 
  https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12310230&version=12324128
  

So, is there a way I can see what is currently pending to go in 4.3?

--
Andy Lester = a...@petdance.com = www.petdance.com = AIM:petdance



Re: Any estimation for solr 4.3?

2013-05-02 Thread Yago Riveiro
Attached is the change log of Solr 4.3 RC3.

Regards

-- 
Yago Riveiro
Sent with Sparrow (http://www.sparrowmailapp.com/?sig)


On Thursday, May 2, 2013 at 3:06 PM, Andy Lester wrote:

 
 On May 2, 2013, at 9:03 AM, Yago Riveiro yago.rive...@gmail.com 
 (mailto:yago.rive...@gmail.com) wrote:
 
  The road map has this release note, but I think that most of it will be 
  move to 4.3.1 or 4.4
  
   https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12310230&version=12324128
   
 
 So, is there a way I can see what is currently pending to go in 4.3?
 
 --
 Andy Lester = a...@petdance.com = www.petdance.com 
 (http://www.petdance.com) = AIM:petdance
 
 




Re: Any estimation for solr 4.3?

2013-05-02 Thread Andy Lester

On May 2, 2013, at 9:11 AM, Yago Riveiro yago.rive...@gmail.com wrote:

 In attachment the change log of solr 4.3 RC3 
 


And where would I find that?  I don't see anything to download at 
http://lucene.apache.org/solr/downloads.html.  Do I need to check out the 
Subversion repo?  Is there a page somewhere that describes how the process is 
set up?

--
Andy Lester = a...@petdance.com = www.petdance.com = AIM:petdance



Re: Any estimation for solr 4.3?

2013-05-02 Thread Alexandre Rafalovitch
Hopefully, this is not a secret, but the RCs are built and available
for download and announced on the dev mailing list.

So, the changes for RC4 (not RC3 anymore) are here:
http://people.apache.org/~simonw/staging_area/lucene-solr-4.3.0-RC4-rev1477023/solr/changes/Changes.html

Regards,
   Alex.
Personal blog: http://blog.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all
at once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD
book)


On Thu, May 2, 2013 at 10:14 AM, Andy Lester a...@petdance.com wrote:

 On May 2, 2013, at 9:11 AM, Yago Riveiro yago.rive...@gmail.com wrote:

 In attachment the change log of solr 4.3 RC3



 And where would I find that?  I don't see anything at 
 http://lucene.apache.org/solr/downloads.html to download?  Do I need to check 
 out Subversion repo?  Is there a page somewhere that describes the process 
 set up?

 --
 Andy Lester = a...@petdance.com = www.petdance.com = AIM:petdance



Re: Any estimation for solr 4.3?

2013-05-02 Thread Yago Riveiro
I got the RC3 zip file from the mailing list.

The 4.3 release has not happened yet, so you can't download it from the 
regular channels.

Regards

-- 
Yago Riveiro
Sent with Sparrow (http://www.sparrowmailapp.com/?sig)


On Thursday, May 2, 2013 at 3:14 PM, Andy Lester wrote:

 
 On May 2, 2013, at 9:11 AM, Yago Riveiro yago.rive...@gmail.com 
 (mailto:yago.rive...@gmail.com) wrote:
 
  In attachment the change log of solr 4.3 RC3 
 
 
 And where would I find that? I don't see anything at 
 http://lucene.apache.org/solr/downloads.html to download? Do I need to check 
 out Subversion repo? Is there a page somewhere that describes the process set 
 up?
 
 --
 Andy Lester = a...@petdance.com = www.petdance.com 
 (http://www.petdance.com) = AIM:petdance
 
 




Re: Any estimation for solr 4.3?

2013-05-02 Thread Andy Lester

On May 2, 2013, at 9:20 AM, Alexandre Rafalovitch arafa...@gmail.com wrote:

 Hopefully, this is not a secret, but the RCs are built and available
 for download and announced on the dev mailing list.


Thanks for the link.

I don't think it's a secret, but I sure don't see anything that says "This is 
how the dev process works."

--
Andy Lester = a...@petdance.com = www.petdance.com = AIM:petdance



Re: SolrJ / Solr Two Phase Commit

2013-05-02 Thread Michael Della Bitta
One thing I do know is that commits in Solr are global, so there's no way
to do this with concurrency.

That being said, Solr doesn't tend to accept updates that would generate
errors once committed in my experience.


Michael Della Bitta


Appinions
18 East 41st Street, 2nd Floor
New York, NY 10017-6271

www.appinions.com

Where Influence Isn’t a Game


On Thu, May 2, 2013 at 3:06 AM, mark12345 marks1900-pos...@yahoo.com.au wrote:

 I am wondering if it was possible to achieve SolrJ/Solr Two Phase Commit.
 Any examples?  Any best practices?

 What I know:
 * Lucene offers Two Phase Commit via its index writer
 (prepareCommit()
 followed by either commit() or rollback()).


 http://lucene.apache.org/core/4_2_0/core/org/apache/lucene/index/IndexWriter.html

 * I know Solr Optimistic Concurrency is available.
 http://yonik.com/solr/optimistic-concurrency/
 http://yonik.com/solr/optimistic-concurrency/


 I want a transactional behavior that ensures that there is a full commit or
 full rollback of multiple documents.  I do not want to be in a situation
 where I don't know if the beans have been written or not written to the
 Solr
 instance.

 * Code Snippet
   [snip]

Re: Any estimation for solr 4.3?

2013-05-02 Thread omar jaafor
Hi everyone,

I am working on an internal project in my company that requires Solr, but I
could not manage to link it to Tika. I bought the Apache Solr 4 Cookbook,
yet I couldn't figure out the solution.

1) I copied the required jar files into a lib directory. 2) I added the lib
directory in solrconfig.xml.

When I remove startup="lazy" from the /update/extract requestHandler I get the
following error:

org.apache.solr.common.SolrException: RequestHandler init failure ... ...
Caused by: org.apache.solr.common.SolrException: RequestHandler init
failure ... ... Caused by: org.apache.solr.common.SolrException: Error
loading class 'solr.extraction.ExtractingRequestHandler' ... ... Caused by:
java.lang.ClassNotFoundException: solr.extraction.ExtractingRequestHandler

I would highly appreciate any help.

thank you very much
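
For context, a stock Solr 4.x example solrconfig.xml wires the extracting
handler roughly as below. The lib paths are assumptions that depend on where
the distribution's contrib and dist folders sit relative to the core; the
ClassNotFoundException above usually means these jars are not being found:

    <lib dir="../../contrib/extraction/lib" regex=".*\.jar" />
    <lib dir="../../dist/" regex="solr-cell-\d.*\.jar" />

    <requestHandler name="/update/extract"
                    startup="lazy"
                    class="solr.extraction.ExtractingRequestHandler">
      <lst name="defaults">
        <str name="fmap.content">text</str>
      </lst>
    </requestHandler>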


2013/5/2 Andy Lester a...@petdance.com


 On May 2, 2013, at 9:20 AM, Alexandre Rafalovitch arafa...@gmail.com
 wrote:

  Hopefully, this is not a secret, but the RCs are built and available
  for download and announced on the dev mailing list.


 Thanks for the link.

 I don't think it's a secret, but I sure don't see anything that says This
 is how the dev process works.

 --
 Andy Lester = a...@petdance.com = www.petdance.com = AIM:petdance




Re: Any estimation for solr 4.3?

2013-05-02 Thread Alexandre Rafalovitch
On Thu, May 2, 2013 at 10:23 AM, Andy Lester a...@petdance.com wrote:
 I don't think it's a secret, but I sure don't see anything that says "This is 
 how the dev process works."

I suspect this is somewhere in the Apache operating charter/standard
operating procedures for all projects and we (Solr users/developers)
are just 4-meta-levels downstream of that. But I agree, it could make
for a nice mini-page on the Wiki (if it is ever up).

Regards,
   Alex.


Personal blog: http://blog.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all
at once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD
book)


Re: commit in solr4 takes a longer time

2013-05-02 Thread Walter Underwood
First, I would upgrade to 4.2.1 and remember to change luceneMatchVersion to 
LUCENE_42.

There were a LOT of fixes between 4.0 and 4.2.1.
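
In solrconfig.xml that is a one-line change (shown against the config quoted
below):

    <luceneMatchVersion>LUCENE_42</luceneMatchVersion>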

wunder

On May 2, 2013, at 12:16 AM, vicky desai wrote:

 Hi,
 
 I am using 1 shard and two replicas. Document size is around 6 lakhs 
 
 
 My solrconfig.xml is as follows
 [snip]

--
Walter Underwood
wun...@wunderwood.org





Re: any plans to remove int32 limitation on the number of the documents in the index?

2013-05-02 Thread Glen Newton
$100 for anyone who gets me a working Long.MAX_VALUE branch!  ;-)

I know that for many of the Solr faceting use cases, things will
not scale to a Long number of documents, but there are a number of more
straightforward use cases where Solr/Lucene will scale to Long, like
simple searches, small numbers of terms, etc.

For these, getting Long in place sooner rather than later would be
appreciated by some of us!  :-)

-glen

On Thu, May 2, 2013 at 4:20 AM, Jack Krupansky j...@basetechnology.com wrote:
 The Integer.MAX_VALUE-1 limit is set by Lucene. As hardware capacity and
 performance continues to advance, I think it's only a matter of time before
 Lucene (and then Solr) relaxes the limit, but I don't imagine it will
 happened real soon. Maybe in Lucene/Solr 6.0?

 -- Jack Krupansky

 -Original Message- From: Valery Giner
 Sent: Wednesday, May 01, 2013 1:36 PM
 To: solr-user@lucene.apache.org
 Subject: any plans to remove int32 limitation on the number of the documents
 in the index?


 Dear Solr Developers,

 I've been unable to find an answer to the question in the subject line
 of this e-mail, except of a vague one.

 We need to be able to index over 2bln+ documents.   We were doing well
 without sharding until the number of docs hit the limit ( 2bln+).   The
 performance was satisfactory for the queries, updates and indexing of
 new documents.

 That is, except for the need to go around the int32 limit, we don't
 really have a need for setting up distributed solr.

 I wonder whether some one on the solr team could tell us when/what
 version of solr we could expect the limit to be removed.

 I hope this question may be of interest to some one else :)

 --
 Thanks,
 Val



-- 
-
http://zzzoot.blogspot.com/
-


Re: commit in solr4 takes a longer time

2013-05-02 Thread Gopal Patwa
You might want to add openSearcher=false for the hard commit, so that the
hard commit only makes documents durable on disk and does not open a new
searcher (visibility then comes from the soft commits):

    <autoCommit>
      <maxDocs>5</maxDocs>
      <maxTime>30</maxTime>
      <openSearcher>false</openSearcher>
    </autoCommit>



On Thu, May 2, 2013 at 12:16 AM, vicky desai vicky.de...@germinait.com wrote:

 Hi,

 I am using 1 shard and two replicas. Document size is around 6 lakhs


 My solrconfig.xml is as follows
 [snip]



Re: commit in solr4 takes a longer time

2013-05-02 Thread Furkan KAMACI
What exactly happens when you don't open a searcher at commit?

2013/5/2 Gopal Patwa gopalpa...@gmail.com

 you might want to added openSearcher=false for hard commit, so hard commit
 also act like soft commit

    <autoCommit>
      <maxDocs>5</maxDocs>
      <maxTime>30</maxTime>
      <openSearcher>false</openSearcher>
    </autoCommit>



 On Thu, May 2, 2013 at 12:16 AM, vicky desai vicky.de...@germinait.com
 wrote:

  Hi,
 
  I am using 1 shard and two replicas. Document size is around 6 lakhs
 
 
  My solrconfig.xml is as follows
  [snip]
 



Re: commit in solr4 takes a longer time

2013-05-02 Thread Alexandre Rafalovitch
If you don't re-open the searcher, you will not see new changes. So, if you
only have hard commits (with openSearcher=false), you never see those changes
(until restart). But if you also have soft commit enabled, that will re-open
your searcher for you.
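
A common pattern, sketched here with placeholder intervals rather than values
from this thread, is a frequent soft commit for visibility plus a less
frequent hard commit for durability that does not open a searcher:

    <autoSoftCommit>
      <maxTime>1000</maxTime>     <!-- new docs visible within ~1s -->
    </autoSoftCommit>
    <autoCommit>
      <maxTime>300000</maxTime>   <!-- flush to disk every 5 minutes -->
      <openSearcher>false</openSearcher>
    </autoCommit>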

Regards,
   Alex.
Personal blog: http://blog.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all
at once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD
book)


On Thu, May 2, 2013 at 11:21 AM, Furkan KAMACI furkankam...@gmail.com wrote:
 What happens exactly when you don't open searcher at commit?

 2013/5/2 Gopal Patwa gopalpa...@gmail.com

 you might want to added openSearcher=false for hard commit, so hard commit
 also act like soft commit

    <autoCommit>
      <maxDocs>5</maxDocs>
      <maxTime>30</maxTime>
      <openSearcher>false</openSearcher>
    </autoCommit>



 On Thu, May 2, 2013 at 12:16 AM, vicky desai vicky.de...@germinait.com
 wrote:

  Hi,
 
  I am using 1 shard and two replicas. Document size is around 6 lakhs
 
 
   My solrconfig.xml is as follows
   [snip]
 



Re: commit in solr4 takes a longer time

2013-05-02 Thread Sandeep Mestry
Hi Vicky,

I faced this issue as well, and after some playing around I found the
autowarm count in the cache sizes to be the problem.
I changed it from a fixed count (3072) to a percentage (10%), and all commit
times were stable from then on.

<filterCache class="solr.FastLRUCache" size="8192" initialSize="3072"
    autowarmCount="10%" />
<queryResultCache class="solr.LRUCache" size="16384" initialSize="3072"
    autowarmCount="10%" />
<documentCache class="solr.LRUCache" size="8192" initialSize="4096"
    autowarmCount="10%" />

HTH,
Sandeep


On 2 May 2013 16:31, Alexandre Rafalovitch arafa...@gmail.com wrote:

 If you don't re-open the searcher, you will not see new changes. So,
 if you only have hard commit, you never see those changes (until
 restart). But if you also have soft commit enabled, that will re-open
 your searcher for you.

 Regards,
Alex.
 Personal blog: http://blog.outerthoughts.com/
 LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
 - Time is the quality of nature that keeps events from happening all
 at once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD
 book)


 On Thu, May 2, 2013 at 11:21 AM, Furkan KAMACI furkankam...@gmail.com
 wrote:
  What happens exactly when you don't open searcher at commit?
 
  2013/5/2 Gopal Patwa gopalpa...@gmail.com
 
  you might want to added openSearcher=false for hard commit, so hard
 commit
  also act like soft commit
 
     <autoCommit>
       <maxDocs>5</maxDocs>
       <maxTime>30</maxTime>
       <openSearcher>false</openSearcher>
     </autoCommit>
 
 
 
  On Thu, May 2, 2013 at 12:16 AM, vicky desai vicky.de...@germinait.com
  wrote:
 
   Hi,
  
   I am using 1 shard and two replicas. Document size is around 6 lakhs
  
  
    My solrconfig.xml is as follows
    [snip]
  
 



solr master server

2013-05-02 Thread Torsten Albrecht
Hi,

I want to set up a master / slave configuration for solr 3.6

Is there a best practice for the Raid config and the Linux partitions for the 
master server?

Cheers,

Torsten

Re: How to deal with cache for facet search when index is always increment?

2013-05-02 Thread Daniel Tyreus
On Wed, May 1, 2013 at 7:01 PM, 李威 li...@antvision.cn wrote:


 For facet search, Solr creates a cache based on the whole set of docs.
 If I import a new doc into the index, the cache becomes stale and needs to
 be created again.
 For real-time search, docs are imported into the index all the time. In this
 case, the cache nearly always needs to be created again, which makes facet
 search very slow.
 Do you have any ideas for dealing with such a problem?


We're in a similar situation and have had better performance using
facet.method=fcs.

http://wiki.apache.org/solr/SimpleFacetParameters#facet.method
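
A request sketch (host and field name are placeholders, not from this
thread); fcs does per-segment faceting, so after a commit only the new
segments need their facet data rebuilt, which helps frequently updated
indexes:

    http://localhost:8983/solr/select?q=*:*&facet=true&facet.field=category&facet.method=fcs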

The suggestion to use soft commits is also a good one.

Best regards,
Daniel


Re: string field does not yield exact match result using qf parameter

2013-05-02 Thread kirpakaroji
Hi Jan

My question is why the results change slightly when I tweak the pf and qf
parameters. I do not think you need to implement the solution you mentioned
in your reply for exact match; you can always have a string field
and boost that field in your pf parameter to get the exact-match
results on top.

Thanks





Re: Not In query

2013-05-02 Thread André Maldonado
Hi Jan. Thanks again for your reply.

You're right. It is almost impossible for a user to exclude 200,000 documents.

I'll do some tests with the NOT IN query.

Thank you again.

--
*E conhecereis a verdade, e a verdade vos libertará. (João 8:32)*
*andre.maldonado*@gmail.com



On Tue, Apr 30, 2013 at 6:09 PM, Jan Høydahl jan@cominvent.com wrote:

 Hi,

 How, practically would a user end up with 200.000 documents excluded? Is
 there some way in your application to exclude categories of documents
 with one click? If so, I would index those category IDs on all docs in that
 category, and then do fq=-cat:123 instead of adding all the individual
 docids. Anyway, I'd start with the simple approach and then optimize once
 you (perhaps, perhaps not) bump into problems. Most likely it will work
 like a charm :)

 --
 Jan Høydahl, search solution architect
 Cominvent AS - www.cominvent.com
 Solr Training - www.solrtraining.com

 30. apr. 2013 kl. 16:21 skrev André Maldonado andre.maldon...@gmail.com:

  Thank's Jan for your reply.
 
  My application has thousands of users and I don't know yet how many of
 them
  will use this feature. They can exclude one document from their search
  results or can exclude 200.000 documents. It's much more natural that
 they
  exclude something like 50~300 documents. More than this will be strange.
 
  However, I don't know how cache will work because we have a large number
 of
  users who can use this feature. Even that query for user 1 be cached, it
  won't work for other users.
 
  Do you see another solution for this case?
 
  Thank's
 
 
 
  *E conhecereis a verdade, e a verdade vos libertará. (João 8:32)*
  *andre.maldonado*@gmail.com
 
 
 
  On Fri, Apr 26, 2013 at 6:18 PM, Jan Høydahl jan@cominvent.com
 wrote:
 
  I would start with the way you propose, a negative filter
 
  q=foo bar&fq=-id:(123 729 640 112...)
 
  This will effectively hide those doc ids, and a benefit is that it is
  cached so if the list of ids is long, you'll only take the performance
 hit
  the first time. I don't know your application, but if it is highly
 likely
  that a single user will add excludes for several thousand ids then you
  should perhaps consider other options and benchmark up front.
 
  --
  Jan Høydahl, search solution architect
  Cominvent AS - www.cominvent.com
  Solr Training - www.solrtraining.com
 
  26. apr. 2013 kl. 21:50 skrev André Maldonado 
 andre.maldon...@gmail.com:
 
  Hi all.
 
  We have an index with 300.000 documents and a lot, a lot of fields.
 
  We're planning a module where users will choose some documents to
 exclude
  from their search results. So, these documents will be excluded for
 UserA
  and visible for UserB.
 
  So, we have some options to do this. The simplest way is to do a Not
 In
  query in document id. But we don't know the performance impact this
 will
  have. Is this an option?
 
  There is another reasonable way to accomplish this?
 
  Thank's
 
  *E conhecereis a verdade, e a verdade vos libertará. (João 8:32)*
  *andre.maldonado*@gmail.com
 
 




Re: socket write error

2013-05-02 Thread Patanachai Tangchaisin

Hi,

First, which version of Solr are you using?

I also have 60+ shards on Solr 4.2.1, and it doesn't seem to be a problem
for me.

- Make sure you use POST to send a query to Solr (see the SolrJ sketch below).
- 'Connection reset by peer' on the client can indicate that something is
wrong on the server, e.g. the server closed the connection.
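
A minimal SolrJ sketch of querying with POST, which keeps a long shards list
out of the request URL (the host URL is a placeholder):

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.SolrRequest;
    import org.apache.solr.client.solrj.SolrServer;
    import org.apache.solr.client.solrj.impl.HttpSolrServer;
    import org.apache.solr.client.solrj.response.QueryResponse;

    SolrServer server = new HttpSolrServer("http://localhost:8983/solr");
    SolrQuery query = new SolrQuery("*:*");
    // send the request body as POST instead of packing parameters into the URL
    QueryResponse rsp = server.query(query, SolrRequest.METHOD.POST);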

--
Patanachai

On 05/02/2013 05:05 AM, Dmitry Kan wrote:

After some searching around, I see this:

http://search-lucene.com/m/ErEZUl7P5f2/%2522socket+write+error%2522subj=Long+list+of+shards+breaks+solrj+query

Seems like this has happened in the past with a large number of shards.

To make it clear: the distributed search works with 20 shards.


On Thu, May 2, 2013 at 1:57 PM, Dmitry Kan solrexp...@gmail.com wrote:


Hi guys!

We have solr router and shards. I see this in jetty log on the router:

May 02, 2013 1:30:22 PM org.apache.commons.httpclient.HttpMethodDirector
executeWithRetry
INFO: I/O exception (java.net.SocketException) caught when processing
request: Connection reset by peer: socket write error

and then:

May 02, 2013 1:30:22 PM org.apache.commons.httpclient.HttpMethodDirector
executeWithRetry
INFO: Retrying request

followed by exception about Internal Server Error

any ideas why this happens?

We run 80+ shards distributed across several servers. Router runs on its
own node.

Is there anything in particular I should be looking into wrt ubuntu socket
settings? Is this a known issue for solr's distributed search from the past?

Thanks,
Dmitry







Re: Random IllegalStateExceptions on Solr slave (3.6.1)

2013-05-02 Thread Erick Erickson
My first guess would be that your tomcat container timeouts need to be
lengthened, but that's mostly a guess based on the socket timeout
error message. Not sure where in Tomcat that needs to be configured
though...

Best
Erick

On Tue, Apr 30, 2013 at 12:37 PM, Arun Rangarajan
arunrangara...@gmail.com wrote:
 We have a master-slave Solr set up and run live queries only against the
 slave. Full import (with optimize) happens on master every day at 2 a.m.
 Delta imports happen every 10 min for one entity and every hour for another
 entity.

 The following exceptions occur a few times every day in our app logs:

 ...
 org.apache.solr.client.solrj.SolrServerException: Error executing query
 at
 org.apache.solr.client.solrj.request.QueryRequest.process(QueryRequest.java:95)
 at
 org.apache.solr.client.solrj.SolrServer.query(SolrServer.java:118)
 ...

 and this one:
 ...
 org.apache.solr.client.solrj.SolrServerException: java.net.SocketException:
 Connection reset
 at
 org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:478)
 at
 org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:244)
 at
 org.apache.solr.client.solrj.request.QueryRequest.process(QueryRequest.java:89)
 at
 org.apache.solr.client.solrj.SolrServer.query(SolrServer.java:118)
 ...

 These happen on different queries at different times, so they are not
 query-dependent.

 If I inspect the localhost.log file in tomcat on the solr slave server, the
 following exception occurs at the same time:

 Apr 06, 2013 7:16:33 AM org.apache.catalina.core.StandardWrapperValve invoke
 SEVERE: Servlet.service() for servlet default threw exception
 java.lang.IllegalStateException
 at
 org.apache.catalina.connector.ResponseFacade.sendError(ResponseFacade.java:407)
 at
 org.apache.solr.servlet.SolrDispatchFilter.sendError(SolrDispatchFilter.java:389)
 at
 org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:291)
 at
 org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
 at
 org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
 at
 org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
 at
 org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
 at
 org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
 at
 org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
 at
 org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
 at
 org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:298)
 at
 org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:859)
 at
 org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:588)
 at
 org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:489)
 at java.lang.Thread.run(Thread.java:722)
 ...


 The replication set-up is as follows:

 <requestHandler name="/replication" class="solr.ReplicationHandler">
   <lst name="master">
     <str name="enable">${enable.master:false}</str>
     <str name="replicateAfter">startup</str>
     <str name="replicateAfter">commit</str>
     <str name="replicateAfter">optimize</str>
     <str name="confFiles">solrconfig.xml,data-config.xml,schema.xml,stopwords.txt,synonyms.txt,elevate.xml</str>
   </lst>
   <lst name="slave">
     <str name="enable">${enable.slave:false}</str>
     <str name="masterUrl">http://${master.ip}:${master.port}/solr/${solr.core.name}/replication</str>
     <str name="pollInterval">00:01:00</str>
   </lst>
 </requestHandler>

 Aside from the full and delta imports, we also have external file fields
 which are loaded every 1 hour with reloadCache on both master and slave.

 What is causing these exceptions, and how do I fix them?

 Thanks.


Re: EmbeddedSolrServer

2013-05-02 Thread Erick Erickson
because some of the underlying classes in SolrJ try to communicate
with Zookeeper to intelligently route requests to leaders.

It looks like you don't have your classpath pointed at the
dist/solrj-lib, at least that would be my first guess...

Best
Erick

On Wed, May 1, 2013 at 7:51 AM, Peri Subrahmanya
peri.subrahma...@htcinc.com wrote:
 I'm trying to use the EmbeddedSolrServer and here is my sample code:

 CoreContainer.Initializer initializer = new CoreContainer.Initializer();
 CoreContainer coreContainer = initializer.initialize();
 EmbeddedSolrServer server = new EmbeddedSolrServer(coreContainer, "");

 Upon running I get the following exception - java.lang.NoClassDefFoundError:
 org/apache/solr/common/cloud/ZooKeeperException.

 I'm not sure why it's complaining about ZooKeeper. Any ideas please?

 Thank you,
 Peri Subrahmanya






Re: solr master server

2013-05-02 Thread Otis Gospodnetic
None that I know of. But if you hit issues I know where you can get help!
:)

Otis
Solr  ElasticSearch Support
http://sematext.com/
On May 2, 2013 12:41 PM, Torsten Albrecht tors...@soahc.eu wrote:

 Hi,

 I want to set up a master / slave configuration for solr 3.6

 Is there a best practice for the RAID config and the Linux partitions for
 the master server?

 Cheers,

 Torsten


Exception with Replication in solr 3.6

2013-05-02 Thread gpssolr2020
Hi,

We have one master and 2 slaves with solr3.6. The below messages are logged
in solr log .


ERROR: Master at: http://server:port/solr/pe/replication is not available.
Index fetch failed. Exception: Connection reset

ERROR: Master at: http://server:port/solr/pe/replication is not available.
Index fetch failed. Exception: Read timed out

What does it mean? 

We are not getting these messages frequently.

Thanks.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Exception-with-Replication-in-solr-3-6-tp4060514.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: any plans to remove int32 limitation on the number of the documents in the index?

2013-05-02 Thread Otis Gospodnetic
Val,

Haven't seen this mentioned in a while...

I'm curious...what sort of index, queries, hardware, and latency
requirements do you have?

Otis
Solr  ElasticSearch Support
http://sematext.com/
On May 1, 2013 4:36 PM, Valery Giner valgi...@research.att.com wrote:

 Dear Solr Developers,

 I've been unable to find an answer to the question in the subject line of
 this e-mail, except for a vague one.

 We need to be able to index over 2 billion documents.   We were doing well
 without sharding until the number of docs hit the limit (> 2bln).   The
 performance was satisfactory for the queries, updates and indexing of new
 documents.

 That is, except for the need to go around the int32 limit, we don't really
 have a need for setting up distributed solr.

 I wonder whether someone on the solr team could tell us when, and in what
 version of solr, we could expect the limit to be removed.

 I hope this question may be of interest to someone else :)

 --
 Thanks,
 Val




Re: EmbeddedSolrServer

2013-05-02 Thread Alexandre Rafalovitch
Actually, I found it very hard to figure out the exact jar
requirements for SolrJ. I ended up basically pointing at the expanded
webapp's lib directory, which is total overkill.

Would be nice to have some specific guidance on this issue.

Regards,
   Alex.
Personal blog: http://blog.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all
at once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD
book)


On Thu, May 2, 2013 at 3:17 PM, Erick Erickson erickerick...@gmail.com wrote:
 because some of the underlying classes in SolrJ try to communicate
 with Zookeeper to intelligently route requests to leaders.

 It looks like you don't have your classpath pointed at the
 dist/solrj-lib, at least that would be my first guess...

 Best
 Erick

 On Wed, May 1, 2013 at 7:51 AM, Peri Subrahmanya
 peri.subrahma...@htcinc.com wrote:
 I'm trying to use the EmbeddedSolrServer and here is my sample code:

 CoreContainer.Initializer initializer = new CoreContainer.Initializer();
 CoreContainer coreContainer = initializer.initialize();
 EmbeddedSolrServer server = new EmbeddedSolrServer(coreContainer, "");

 Upon running I get the following exception - java.lang.NoClassDefFoundError:
 org/apache/solr/common/cloud/ZooKeeperException.

 I'm not sure why it's complaining about ZooKeeper. Any ideas please?

 Thank you,
 Peri Subrahmanya






Re: What Happens to Consistency if I kill a Leader and Startup it again?

2013-05-02 Thread Otis Gospodnetic
Hi,

Can you actually make this happen?

Otis
Solr  ElasticSearch Support
http://sematext.com/
On May 2, 2013 8:12 AM, Furkan KAMACI furkankam...@gmail.com wrote:

 Thanks for the answer. This is what I am trying to say:

 time = t
 Node A (Leader):  version is 100
 Node B (Replica): version is 90

 time = t+1
 Node A (Killing):  version is 100 and killed
 Node B (Replica): version is 90

 time = t+2
 Node A (Killed):  version is 100 and killed
 Node B (Become Leader): version is 95 (we indexed something)

 time = t+3
 Node A (Started as Replica):  version is 100 and live
 Node B (Leader): version is 95

 so I think the leader will be behind the replica. Is there anything
 different for such a scenario?



 2013/5/2 Otis Gospodnetic otis.gospodne...@gmail.com

  The leader would not be behind the replica, because the old leader would
  not come back and take over the leader role. It would be just a replica,
  and it would replicate the index from whichever node is the leader.
 
  Otis
  Solr  ElasticSearch Support
  http://sematext.com/
  On Apr 29, 2013 5:31 PM, Furkan KAMACI furkankam...@gmail.com wrote:
 
   I am thinking about this situation:

   Let's assume that I am indexing to my SolrCloud. My leader has a higher
   version than the replica (I have one leader and one replica for each
   shard). If I kill the leader, the replica will become leader. When I
   start up the old leader again, it will be a replica for my shard.

   However, I think the leader will have fewer documents than the replica
   and a lower version than the replica. Does it cause a problem that the
   leader is behind the replica?
  
 



Re: Exception with Replication in solr 3.6

2013-05-02 Thread Otis Gospodnetic
Looks like a network issue, especially if this is not happening
consistently.

Otis
Solr  ElasticSearch Support
http://sematext.com/
On May 2, 2013 3:42 PM, gpssolr2020 psgoms...@gmail.com wrote:

 Hi,

 We have one master and 2 slaves with solr3.6. The below messages are logged
 in solr log .


 ERROR: Master at: http://server:port/solr/pe/replication is not available.
 Index fetch failed. Exception: Connection reset

 ERROR: Master at: http://server:port/solr/pe/replication is not available.
 Index fetch failed. Exception: Read timed out

 What does it mean?

 We are not getting these messages frequently.

 Thanks.



 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/Exception-with-Replication-in-solr-3-6-tp4060514.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Re: DF is not updated when a document is marked for deletion note

2013-05-02 Thread Otis Gospodnetic
Nah, not until and IF you see issues. Most users are not even aware of this.

Otis
Solr  ElasticSearch Support
http://sematext.com/
On May 2, 2013 8:05 AM, Furkan KAMACI furkankam...@gmail.com wrote:

 When I look here: http://localhost:8983/solr/admin/luke

 I see that Note: Document Frequency (df) is not updated when a document is
 marked for deletion. df values include deleted documents.

 Is it something I should care about?
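
If the inflated df ever does become noticeable, one blunt but reliable fix is an optimize, which rewrites the index without the deleted documents. A sketch with SolrJ (the URL is a placeholder, and optimize is expensive on large indexes):

import org.apache.solr.client.solrj.impl.HttpSolrServer;

public class PurgeDeletes {
    public static void main(String[] args) throws Exception {
        // Placeholder URL; run sparingly, since optimize rewrites all segments.
        HttpSolrServer server =
                new HttpSolrServer("http://localhost:8983/solr/collection1");
        server.optimize(); // merged segments drop deleted docs, so df stops counting them
    }
}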



Re: any plans to remove int32 limitation on the number of the documents in the index?

2013-05-02 Thread Valery Giner

Otis,

The documents themselves are relatively small: tens of fields, only a
few of which could be up to a hundred bytes each.

Linux servers with relatively large RAM (256).
Minutes on the searches are fine for our purposes, and adding a few tens of
millions of records in tens of minutes is also fine.
We had to do some simple tricks to keep indexing up to speed, but
nothing too fancy.
Moving to sharding adds a layer of complexity which we don't really
need because of the above, ... and adding complexity may result in lower
reliability :)


Thanks,
Val

On 05/02/2013 03:41 PM, Otis Gospodnetic wrote:

Val,

Haven't seen this mentioned in a while...

I'm curious...what sort of index, queries, hardware, and latency
requirements do you have?

Otis
Solr  ElasticSearch Support
http://sematext.com/
On May 1, 2013 4:36 PM, Valery Giner valgi...@research.att.com wrote:


Dear Solr Developers,

I've been unable to find an answer to the question in the subject line of
this e-mail, except for a vague one.

We need to be able to index over 2 billion documents.   We were doing well
without sharding until the number of docs hit the limit (> 2bln).   The
performance was satisfactory for the queries, updates and indexing of new
documents.

That is, except for the need to go around the int32 limit, we don't really
have a need for setting up distributed solr.

I wonder whether someone on the solr team could tell us when, and in what
version of solr, we could expect the limit to be removed.

I hope this question may be of interest to someone else :)

--
Thanks,
Val






Re: What Happens to Consistency if I kill a Leader and Startup it again?

2013-05-02 Thread Furkan KAMACI
Hi Otis;

I see that at my admin page:

Replication (Slave)   Version         Gen   Size
Master:               1367307652512   82    778.04 MB
Slave:                1367307658862   82    781.05 MB

and I started to wonder about it, so that's why I asked this question.


2013/5/2 Otis Gospodnetic otis.gospodne...@gmail.com

 Hi,

 Can you actually make this happen?

 Otis
 Solr  ElasticSearch Support
 http://sematext.com/
 On May 2, 2013 8:12 AM, Furkan KAMACI furkankam...@gmail.com wrote:

  Thanks for the answer. This is what I am trying to say:
 
  time = t
  Node A (Leader):  version is 100
  Node B (Replica): version is 90
 
  time = t+1
  Node A (Killing):  version is 100 and killed
  Node B (Replica): version is 90
 
  time = t+2
  Node A (Killed):  version is 100 and killed
  Node B (Become Leader): version is 95 (we indexed something)
 
  time = t+3
  Node A (Started as Replica):  version is 100 and live
  Node B (Leader): version is 95
 
  so I think the leader will be behind the replica. Is there anything
  different for such a scenario?
 
 
 
  2013/5/2 Otis Gospodnetic otis.gospodne...@gmail.com
 
   The leader would not be behind the replica, because the old leader would
   not come back and take over the leader role. It would be just a replica,
   and it would replicate the index from whichever node is the leader.
  
   Otis
   Solr  ElasticSearch Support
   http://sematext.com/
   On Apr 29, 2013 5:31 PM, Furkan KAMACI furkankam...@gmail.com
 wrote:
  
    I am thinking about this situation:

    Let's assume that I am indexing to my SolrCloud. My leader has a higher
    version than the replica (I have one leader and one replica for each
    shard). If I kill the leader, the replica will become leader. When I
    start up the old leader again, it will be a replica for my shard.

    However, I think the leader will have fewer documents than the replica
    and a lower version than the replica. Does it cause a problem that the
    leader is behind the replica?
   
  
 



transientCacheSize doesn't seem to have any effect, except on startup

2013-05-02 Thread didier deshommes
Hi,
I've been very interested in the transient core feature of solr to manage a
large number of cores. I'm especially interested in this use case, that the
wiki lists at http://wiki.apache.org/solr/LotsOfCores (looks to be down
now):

loadOnStartup=false transient=true: This is really the use-case. There are
a large number of cores in your system that are short-duration use. You
want Solr to load them as necessary, but unload them when the cache gets
full on an LRU basis.

I'm creating 10 transient cores via core admin like so:

$ curl "http://localhost:8983/solr/admin/cores?wt=json&action=CREATE&name=new_core2&instanceDir=collection1/&dataDir=new_core2&transient=true&loadOnStartup=false"


and have transientCacheSize=2 in my solr.xml file, which I take to mean I
should have at most 2 transient cores loaded at any time. The problem is
that these cores are still loaded when I ask solr to list cores:

$ curl "http://localhost:8983/solr/admin/cores?wt=json&action=status"

From the explanation in the wiki, it looks like solr would manage loading
and unloading transient cores for me without having to worry about them,
but this is not what's happening.

The situation is different when I restart solr; it does the right thing
by loading at most the number of cores set by transientCacheSize. When I add
more cores, the old behavior happens again, where all created transient cores
are loaded in solr.

I'm using the development branch lucene_solr_4_3 to run my example. I can
open a jira if need be.


Re: Security for inter-solr-node requests

2013-05-02 Thread Jan Høydahl
This feature is not yet part of Solr, but a feature under development in 
SOLR-4470. We encourage you to try it out and report back what worked best for 
you.

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com
Solr Training - www.solrtraining.com

2. mai 2013 kl. 13:58 skrev Furkan KAMACI furkankam...@gmail.com:

 Here is a part from wiki:
 
 1) Just forward credentials from the super-request which caused the
 inter-solr-node sub-requests
 2) Use internal credentials provided to the solr-node by the
 administrator at startup
 
 what do you use and is there any code example for it?



Pros and Cons of Using Deduplication of Solr at Huge Data Indexing

2013-05-02 Thread Furkan KAMACI
I use Solr 4.2.1 as SolrCloud. I crawl huge data with Nutch and index them
with SolrCloud. I wonder about Solr's deduplication mechanism. What exactly
it does and does it results with a slow indexing or is it beneficial for my
situation?


RE: Pros and Cons of Using Deduplication of Solr at Huge Data Indexing

2013-05-02 Thread Markus Jelsma
Distributed deduplication does not work right now:
https://issues.apache.org/jira/browse/SOLR-3473

We've chosen not to use update processors for deduplication anymore, and rely on 
several custom mapreduce jobs in Nutch and some custom collectors in Solr to do 
some on-demand online deduplication.

If SOLR-3473 is fixed you can get very decent deduplication.

-Original message-
 From:Furkan KAMACI furkankam...@gmail.com
 Sent: Thu 02-May-2013 22:30
 To: solr-user@lucene.apache.org
 Subject: Pros and Cons of Using Deduplication of Solr at Huge Data Indexing
 
 I use Solr 4.2.1 as SolrCloud. I crawl huge amounts of data with Nutch and
 index them with SolrCloud. I wonder about Solr's deduplication mechanism. What
 exactly does it do, and does it result in slower indexing, or is it beneficial
 in my situation?
 


Rearranging Search Results of a Search?

2013-05-02 Thread Furkan KAMACI
I know that I can use boosting at query time for a field or a search term,
in solrconfig.xml, and via the query elevation component, so I can arrange the
results of a search. However, after I get the top documents, how can I change
the order of the results? Is Lucene's PostFilter meant for that?


Re: EmbeddedSolrServer

2013-05-02 Thread Shawn Heisey
On 5/2/2013 1:43 PM, Alexandre Rafalovitch wrote:
 Actually, I found it very hard to figure out the exact Jar
 requirements for SolrJ. I ended up basically pointing at expanded
 webapp's lib directory, which is a total overkill.
 
 Would be nice to have some specific guidance on this issue.

I have a SolrJ app that uses HttpSolrServer.  Here is the list of jars
in my lib directory relevant to SolrJ.  There are other jars related to
the other functionality in my app that I didn't list here.  I take a
very minimalistic approach for what I add to my lib directory.  I work
out the minimum jars required to get it to compile, then I try the
program out and determine which additional jars it needs one by one.

commons-io-2.4.jar
httpclient-4.2.4.jar
httpcore-4.2.4.jar
httpmime-4.2.4.jar
jcl-over-slf4j-1.7.5.jar
log4j-1.2.17.jar
slf4j-api-1.7.5.jar
slf4j-log4j12-1.7.5.jar
solr-solrj-4.2.1.jar

You might notice that my component versions are newer than what is
included in dist/solrj-lib.  I have tested all of the functionality of
my application, and I do not require the other jars found in
dist/solrj-lib, including zookeeper.  When I add functionality in the
future, if I run into a class not found exception, I will add the
appropriate jar.

If I were using CloudSolrServer, zookeeper would be required.  With
EmbeddedSolrServer, more Lucene and Solr jars are required, because that
starts the Solr server itself within your application.

Thanks,
Shawn



Customizing Solr For a Certain Language

2013-05-02 Thread Furkan KAMACI
Hi folks;

I want to use Solr to index a language other than English. I will
use Turkish documents to index with Solr. I will implement some algorithms
that are more suitable to Turkish than to English. Is there any wiki
page that explains the steps for it? I mean, what are the main parts of a
customized Analyzer, i.e. a suitable stopwords.txt, a stemmer algorithm, a
customized tokenizer for that language, customized token filters, etc.?

Which steps should I follow?
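
To make "the main parts" concrete: a customized analyzer is typically a tokenizer, a language-aware lowercasing filter, a stopword filter, and a stemmer. Below is a sketch of that chain using Lucene's Turkish components in plain Java, assuming the analysis-common and snowball classes are on the classpath; in Solr itself the equivalent chain is declared as factories in schema.xml:

import java.io.Reader;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.core.StopFilter;
import org.apache.lucene.analysis.snowball.SnowballFilter;
import org.apache.lucene.analysis.standard.StandardTokenizer;
import org.apache.lucene.analysis.tr.TurkishAnalyzer;
import org.apache.lucene.analysis.tr.TurkishLowerCaseFilter;
import org.apache.lucene.util.Version;
import org.tartarus.snowball.ext.TurkishStemmer;

public class TurkishChain extends Analyzer {
    @Override
    protected TokenStreamComponents createComponents(String field, Reader reader) {
        Tokenizer source = new StandardTokenizer(Version.LUCENE_42, reader);
        // Turkish needs its own lowercasing (dotted vs. dotless i).
        TokenStream stream = new TurkishLowerCaseFilter(source);
        stream = new StopFilter(Version.LUCENE_42, stream,
                TurkishAnalyzer.getDefaultStopSet());
        stream = new SnowballFilter(stream, new TurkishStemmer());
        return new TokenStreamComponents(source, stream);
    }
}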


Not able to see newly added copyField in the response (indexing is 80% complete)

2013-05-02 Thread Utkarsh Sengar
Hello,

I updated my schema to use a copyField and have triggered a reindex, 80% of
the reindexing is complete. Although when I query the data, I don't see
myNewCopyFieldName being returned with the documents.

Is there something wrong with my schema or I need to wait for the indexing
to complete to see the new copyField?



This is my schema (redacted the actual names):

 <fields>
   <field name="key" type="string" indexed="true" stored="true"/>
   <field name="1" type="string" indexed="true" stored="true"/>
   <field name="2" type="string" indexed="true" stored="true"/>
   <field name="3" type="string" indexed="false" stored="true"/>
   <field name="4" type="string" indexed="true" stored="true"/>
   <field name="5" type="string" indexed="true" stored="true"/>
   <field name="6" type="custom_type" indexed="true" stored="true"/>
   <field name="7" type="text_general" indexed="true" stored="true"/>
   <field name="8" type="string" indexed="true" stored="true"/>
   <field name="9" type="text_general" indexed="true" stored="true"/>
   <field name="10" type="text_general" indexed="true" stored="true"/>
   <field name="11" type="string" indexed="true" stored="true"/>
   <field name="12" type="string" indexed="true" stored="true"/>
   <field name="13" type="string" indexed="true" stored="true"/>
   <field name="myNewCopyFieldName" type="text_general" indexed="true"
          stored="true" multiValued="true"/>
 </fields>
 <defaultSearchField>4</defaultSearchField>
 <uniqueKey>key</uniqueKey>

 <copyField source="1" dest="myNewCopyFieldName"/>
 <copyField source="2" dest="myNewCopyFieldName"/>
 <copyField source="3" dest="myNewCopyFieldName"/>
 <copyField source="4" dest="myNewCopyFieldName"/>
 <copyField source="6" dest="myNewCopyFieldName"/>

Where:
 <fieldType name="custom_type" class="solr.TextField" positionIncrementGap="100">
   <analyzer type="index">
     <tokenizer class="solr.StandardTokenizerFactory"/>
     <filter class="solr.LowerCaseFilterFactory"/>
   </analyzer>
   <analyzer type="query">
     <tokenizer class="solr.StandardTokenizerFactory"/>
     <filter class="solr.LowerCaseFilterFactory"/>
   </analyzer>
 </fieldType>


and


 <fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
   <analyzer type="index">
     <tokenizer class="solr.StandardTokenizerFactory"/>
     <filter class="solr.LowerCaseFilterFactory"/>
   </analyzer>
   <analyzer type="query">
     <tokenizer class="solr.StandardTokenizerFactory"/>
     <filter class="solr.LowerCaseFilterFactory"/>
   </analyzer>
 </fieldType>


-- 
Thanks,
-Utkarsh


How to decide proper cache size at load testing?

2013-05-02 Thread Furkan KAMACI
I read that at wiki:

Sometimes a smaller cache size will help avoid full garbage collections at
the cost of more evictions. Load testing should be used to help determine
proper cache sizes throughout the searching/indexing lifecycle.

Could anybody give me an example scenario of how can I make a test, what
should I do and find a proper cache size at load testing?


Re: Solr 4.2 rollback not working

2013-05-02 Thread Dipti Srivastava
We are using Solr 4.2.1, which claims to have fixed this issue. I have
some logging indicating that the rollback is not broadcast to the other
nodes in Solr. So only one node in the cluster gets the rollback, but not
the others.



Thanks,
Dipti









On 5/2/13 12:09 AM, mark12345 marks1900-pos...@yahoo.com.au wrote:

What version of Solr are you using?  4.2.0 or 4.2.1?

The following might be of interest to you:
*  https://issues.apache.org/jira/browse/SOLR-4605
*  https://issues.apache.org/jira/browse/SOLR-4733



--
View this message in context:
http://lucene.472066.n3.nabble.com/Solr-4-2-rollback-not-working-tp4060393p4060401.html
Sent from the Solr - User mailing list archive at Nabble.com.








Re: Not able to see newly added copyField in the response (indexing is 80% complete)

2013-05-02 Thread Shawn Heisey
On 5/2/2013 3:13 PM, Utkarsh Sengar wrote:
 Hello,
 
 I updated my schema to use a copyField and have triggered a reindex, 80% of
 the reindexing is complete. Although when I query the data, I don't see
 myNewCopyFieldName being returned with the documents.
 
 Is there something wrong with my schema or I need to wait for the indexing
 to complete to see the new copyField?

After making sure that you restarted Solr (or reloaded the core) after
changing your schema, there are two things to mention:

1) Using stored=true with a copyField doesn't make any sense, because
you already have the individual values stored with the source fields.  I
haven't done any testing, but Solr might ignore stored=true on
copyField fields.

2) If I'm wrong about how Solr behaves with stored=true on a
copyField, then a soft commit (4.x and later) or a hard commit with
openSearcher=true would be required to see changes from indexing.  Have
you committed your updates yet?
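
For reference, both commit variants are available from SolrJ; a minimal sketch (the URL is a placeholder):

import org.apache.solr.client.solrj.impl.HttpSolrServer;

public class CommitVisibility {
    public static void main(String[] args) throws Exception {
        // Placeholder URL for the core being indexed.
        HttpSolrServer server =
                new HttpSolrServer("http://localhost:8983/solr/collection1");
        // Hard commit that opens a new searcher (waitFlush, waitSearcher):
        server.commit(true, true);
        // 4.x soft commit: makes changes searchable without flushing segments:
        server.commit(true, true, true); // third flag = softCommit
    }
}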

Thanks,
Shawn



Re: Customizing Solr For a Certain Language

2013-05-02 Thread Alexandre Rafalovitch
Have you looked at the main example that comes with Solr? It contains
a specific configuration for Turkish. Perhaps you could try that and
narrow the question to more precise issues?

I don't remember any Turkish-specific discussions, but perhaps
something can be learned from searching for discussions on supporting
Chinese and German languages.

Regards,
   Alex.
Personal blog: http://blog.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all
at once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD
book)


On Thu, May 2, 2013 at 5:08 PM, Furkan KAMACI furkankam...@gmail.com wrote:
 Hi folks;

 I want to use Solr to index a language other than English. I will
 use Turkish documents to index with Solr. I will implement some algorithms
 that are more suitable to Turkish than to English. Is there any wiki
 page that explains the steps for it? I mean, what are the main parts of a
 customized Analyzer, i.e. a suitable stopwords.txt, a stemmer algorithm, a
 customized tokenizer for that language, customized token filters, etc.?

 Which steps should I follow?


Re: What Happens to Consistency if I kill a Leader and Startup it again?

2013-05-02 Thread Shawn Heisey
On 5/2/2013 2:19 PM, Furkan KAMACI wrote:
 I see that at my admin page:
 
 Replication (Slave)   Version         Gen   Size
 Master:               1367307652512   82    778.04 MB
 Slave:                1367307658862   82    781.05 MB
 
 and I started to wonder about it, so that's why I asked this question.

As we've been trying to tell you, the sizes can (and will) be different
between replicas on SolrCloud.  Also, if you're not running a recent
release candidate of 4.3, then the version numbers on the replication
screen are misleading.  See SOLR-4661 for more details.

Your example of version numbers like 100, 90, and 95 wouldn't actually
happen, because the version number is based on the current time in
milliseconds since 1970-01-01 00:00:00 UTC.  If you index after killing
the leader, the new leader's version number will be higher than the
offline replica.

If you can find actual proof of a problem with index updates related to
killing the leader, then we can take the bug report and work on fixing
it.  Here's how you would go about finding proof.  It would be easiest
to have one shard, but if you want to make sure it's OK with multiple
shards, you would have to kill all the leaders.

* Start with a functional collection with two replicas.
* Index a document with a recognizable ID like A.
* Make sure you can find document A.
* Kill the leader replica, let's say it was replica1.
* Make sure replica2 becomes leader.
* Make sure you can find document A.
* Index document B.
* Start replica1, wait for it to turn green.
* Make sure you can still find document B.
* Kill the leader again, this time it's replica2.
* Make sure you can still find document B.

To my knowledge, nobody has reported a real problem with proof.  I would
imagine that more than one person has done testing like this to make
sure that SolrCloud is reliable.

Thanks,
Shawn



Re: Not able to see newly added copyField in the response (indexing is 80% complete)

2013-05-02 Thread Utkarsh Sengar
Thanks Shawn. Find my answers below.


On Thu, May 2, 2013 at 2:34 PM, Shawn Heisey s...@elyograg.org wrote:

 On 5/2/2013 3:13 PM, Utkarsh Sengar wrote:
  Hello,
 
  I updated my schema to use a copyField and have triggered a reindex, 80%
 of
  the reindexing is complete. Although when I query the data, I don't see
  myNewCopyFieldName being returned with the documents.
 
  Is there something wrong with my schema or I need to wait for the
 indexing
  to complete to see the new copyField?

 After making sure that you restarted Solr (or reloaded the core) after
 changing your schema, there are two things to mention:


Yes, I restarted solr and also did a reload.


 1) Using stored=true with a copyField doesn't make any sense, because
 you already have the individual values stored with the source fields.  I
 haven't done any testing, but Solr might ignore stored=true on
 copyField fields.


Ah I see, didn't know about this. If it's not stored then it makes sense.
Need to verify this though.



 2) If I'm wrong about how Solr behaves with stored=true on a
 copyField, then a soft commit (4.x and later) or a hard commit with
 openSearcher=true would be required to see changes from indexing.  Have
 you committed your updates yet?


I am using Solr 4.x and soft commit is enabled. So I assume commit happened.
I see this in my solr admin:

   - lastModified: less than a minute ago
   - version: 453962
   - numDocs: 26413743
   - maxDoc: 28322675
   - current:
   - indexing: yes

So, lastModified = less than a minute ago means the change was committed, right?


 Thanks,
 Shawn




-- 
Thanks,
-Utkarsh


Re: string field does not yield exact match result using qf parameter

2013-05-02 Thread Jan Høydahl
Hi,

You can try to increase the pf boost for your string field, but I don't think
you'll have success in having it boosted with pf, since it's a string. Check the
explain output with debugQuery=true and see whether you get a phrase boost.

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com
Solr Training - www.solrtraining.com

2. mai 2013 kl. 19:16 skrev kirpakaroji kirpakar...@yahoo.com:

 Hi Jan
 
 my question is: when I tweak the pf and qf parameters, the results change
 only slightly, and I do not think that for exact match you need to implement
 the solution that you mentioned in your reply. You can always have a string
 field, and in your pf parameter you can boost that field to get the
 exact-match results on top.
 
 Thanks
 
 
 
 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/string-field-does-not-yield-exact-match-result-using-qf-parameter-tp4060096p4060492.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Re: Delete from Solr Cloud 4.0 index..

2013-05-02 Thread Shawn Heisey
On 5/2/2013 4:24 AM, Annette Newton wrote:
 Hi Shawn,
 
 Thanks so much for your response.  We basically are very write-intensive,
 and write throughput is pretty essential to our product.  Reads are
 sporadic, and reading is actually functioning really well.
 
 We write on average (at the moment) 8-12 batches of 35 documents per
 minute.  But we really will be looking to write more in the future, so we
 need to work out how to scale solr and cope with more volume.
 
 Schema (I have changed the names) :
 
 http://pastebin.com/x1ry7ieW
 
 Config:
 
 http://pastebin.com/pqjTCa7L

This is very clean.  There's probably more you could remove/comment, but
generally speaking I couldn't find any glaring issues.  In particular,
you have disabled autowarming, which is a major contributor to commit
speed problems.

The first thing I think I'd try is increasing zkClientTimeout to 30 or
60 seconds.  You can use the startup commandline or solr.xml, I would
probably use the latter.  Here's a solr.xml fragment that uses a system
property or a 15 second default:

<?xml version="1.0" encoding="UTF-8" ?>
<solr persistent="true" sharedLib="lib">
  <cores adminPath="/admin/cores"
         zkClientTimeout="${zkClientTimeout:15000}" hostPort="${jetty.port:}"
         hostContext="solr">

General thoughts, these changes might not help this particular issue:
You've got autoCommit with openSearcher=true.  This is a hard commit.
If it were me, I would set that up with openSearcher=false and either do
explicit soft commits from my application or set up autoSoftCommit with
a shorter timeframe than autoCommit.

This might simply be a scaling issue, where you'll need to spread the
load wider than four shards.  I know that there are financial
considerations with that, and they might not be small, so let's leave
that alone for now.

The memory problems might be a symptom/cause of the scaling issue I just
mentioned.  You said you're using facets, which can be a real memory hog
even with only a few of them.  Have you tried facet.method=enum to see
how it performs?  You'd need to switch to it exclusively, never go with
the default of fc.  You could put that in the defaults or invariants
section of your request handler(s).
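
A sketch of forcing enum from the client while you test (the field name and URL are placeholders); the permanent version is the same parameter in the request handler defaults:

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrServer;

public class EnumFacetTest {
    public static void main(String[] args) throws Exception {
        HttpSolrServer server =
                new HttpSolrServer("http://localhost:8983/solr/collection1"); // placeholder
        SolrQuery q = new SolrQuery("*:*");
        q.setFacet(true);
        q.addFacetField("myFacetField");   // placeholder field
        q.set("facet.method", "enum");     // uses the filterCache instead of fc's field cache
        System.out.println(server.query(q).getFacetField("myFacetField").getValues());
    }
}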

Another way to reduce memory usage for facets is to use disk-based
docValues on version 4.2 or later for the facet fields, but this will
increase your index size, and your index is already quite large.
Depending on your index contents, the increase may be small or large.

Something to just mention: It looks like your solrconfig.xml has
hard-coded absolute paths for dataDir and updateLog.  This is fine if
you'll only ever have one core/collection on each server, but it'll be a
disaster if you have multiples.  I could be wrong about how these get
interpreted in SolrCloud -- they might actually be relative despite
starting with a slash.

Thanks,
Shawn



Re: Any estimation for solr 4.3?

2013-05-02 Thread Shawn Heisey
On 5/2/2013 7:56 AM, Andy Lester wrote:
 
 On May 2, 2013, at 3:36 AM, Jack Krupansky j...@basetechnology.com wrote:
 
 RC4 of 4.3 is available now. The final release of 4.3 is likely to be within 
 days.
 
 
 How can I see the Changelog of what will be in it?

Here's the latest CHANGES.txt file straight from the source repository
on the 4.3 branch:

https://svn.apache.org/viewvc/lucene/dev/branches/lucene_solr_4_3/solr/CHANGES.txt?view=markup

I did not see a 4_3_0 tag in svn.  Perhaps that will be created as the
last step before release.

Thanks,
Shawn



Re: Does Near Real Time get not supported at SolrCloud?

2013-05-02 Thread Otis Gospodnetic
NRT works with SolrCloud.

Otis
Solr  ElasticSearch Support
http://sematext.com/

On May 2, 2013 5:34 AM, Furkan KAMACI furkankam...@gmail.com wrote:

 Does Near Real Time get not supported at SolrCloud?

 I mean, when a soft commit occurs at a leader, I think it doesn't
 distribute it to the replicas (because it is not on storage; do indexes in
 RAM get distributed to replicas too?), and then what happens when a search
 query comes in?


Re: How to decide proper cache size at load testing?

2013-05-02 Thread Otis Gospodnetic
You simply need to monitor and adjust. Both during testing and in
production because search patterns change over time. Hook up alerting to it
to get notified of high evictions and low cache hit rate so you don't have
to actively look at stats all day.

Here is the graph of Query Cache metrics for http://search-lucene.com/ for
example:

https://apps.sematext.com/spm-reports/s.do?k=eDcirzHG7i


Otis
Solr  ElasticSearch Support
http://sematext.com/

On May 2, 2013 5:14 PM, Furkan KAMACI furkankam...@gmail.com wrote:

 I read that at wiki:

 Sometimes a smaller cache size will help avoid full garbage collections at
 the cost of more evictions. Load testing should be used to help determine
 proper cache sizes throughout the searching/indexing lifecycle.

 Could anybody give me an example scenario of how can I make a test, what
 should I do and find a proper cache size at load testing?



Re: Rearranging Search Results of a Search?

2013-05-02 Thread Otis Gospodnetic
Hi,

You should use search more often :)
http://search-lucene.com/?q=scriptable+collector&sort=newestOnTop&fc_project=Solr&fc_type=issue

Coincidentally, what you see there happens to be a good example of a
Solr component that does something behind the scenes to deliver those
search results even though my original query was bad.  Kind of
similar to what you are after.

Otis
--
Solr  ElasticSearch Support
http://sematext.com/





On Thu, May 2, 2013 at 4:47 PM, Furkan KAMACI furkankam...@gmail.com wrote:
 I know that I can use boosting at query time for a field or a search term,
 in solrconfig.xml, and via the query elevation component, so I can arrange
 the results of a search. However, after I get the top documents, how can I
 change the order of the results? Is Lucene's PostFilter meant for that?
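
On the PostFilter part of the question: a post filter can veto or admit documents late in collection, but it does not re-sort what it passes through, so true re-ranking needs something like the scriptable collector in that issue. For orientation only, a skeleton of the 4.x PostFilter contract (the class itself is hypothetical):

import java.io.IOException;

import org.apache.lucene.search.IndexSearcher;
import org.apache.solr.search.DelegatingCollector;
import org.apache.solr.search.ExtendedQueryBase;
import org.apache.solr.search.PostFilter;

public class MyPostFilter extends ExtendedQueryBase implements PostFilter {
    @Override
    public boolean getCache() { return false; } // post filters must not be cached

    @Override
    public int getCost() { return 100; }        // cost >= 100 marks a post filter

    @Override
    public DelegatingCollector getFilterCollector(IndexSearcher searcher) {
        return new DelegatingCollector() {
            @Override
            public void collect(int doc) throws IOException {
                // Decide per document; forwarding to super admits it.
                super.collect(doc);
            }
        };
    }
}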


Re: SolrJ / Solr Two Phase Commit

2013-05-02 Thread mark12345
By saying commits in Solr are global, do you mean per Solr deployment, per
HttpSolrServer instance, per thread, or something else?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/SolrJ-Solr-Two-Phase-Commit-tp4060399p4060584.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: EmbeddedSolrServer

2013-05-02 Thread Peri Subrahmanya
I actually have a maven project with a declared solrj dependency (4.2.1); Do I 
need anything extra to get rid of the Zookeeper exception? I didn't see jars 
specific to zookeeper in the list below that I would need. Any more ideas 
please?

Thank you,
Peri Subrahmanya


On May 2, 2013, at 4:48 PM, Shawn Heisey s...@elyograg.org wrote:

 On 5/2/2013 1:43 PM, Alexandre Rafalovitch wrote:
 Actually, I found it very hard to figure out the exact Jar
 requirements for SolrJ. I ended up basically pointing at expanded
 webapp's lib directory, which is a total overkill.
 
 Would be nice to have some specific guidance on this issue.
 
 I have a SolrJ app that uses HttpSolrServer.  Here is the list of jars
 in my lib directory relevant to SolrJ.  There are other jars related to
 the other functionality in my app that I didn't list here.  I take a
 very minimalistic approach for what I add to my lib directory.  I work
 out the minimum jars required to get it to compile, then I try the
 program out and determine which additional jars it needs one by one.
 
 commons-io-2.4.jar
 httpclient-4.2.4.jar
 httpcore-4.2.4.jar
 httpmime-4.2.4.jar
 jcl-over-slf4j-1.7.5.jar
 log4j-1.2.17.jar
 slf4j-api-1.7.5.jar
 slf4j-log4j12-1.7.5.jar
 solr-solrj-4.2.1.jar
 
 You might notice that my component versions are newer than what is
 included in dist/solrj-lib.  I have tested all of the functionality of
 my application, and I do not require the other jars found in
 dist/solrj-lib, including zookeeper.  When I add functionality in the
 future, if I run into a class not found exception, I will add the
 appropriate jar.
 
 If I were using CloudSolrServer, zookeeper would be required.  With
 EmbeddedSolrServer, more Lucene and Solr jars are required, because that
 starts the Solr server itself within your application.
 
 Thanks,
 Shawn
 
 
 
 
 







Re: SolrJ / Solr Two Phase Commit

2013-05-02 Thread Michael Della Bitta
Per core or collection, depending on whether we're talking about Cloud or
not.

Basically, commits in Solr are about controlling visibility more than
anything, although now with Cloud, they have resource consumption and
lifecycle ramifications as well.
On May 2, 2013 10:01 PM, mark12345 marks1900-pos...@yahoo.com.au wrote:

 By saying commits in Solr are global, do you mean per Solr deployment,
 per
 HttpSolrServer instance, per thread, or something else?



 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/SolrJ-Solr-Two-Phase-Commit-tp4060399p4060584.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Re: SolrJ / Solr Two Phase Commit

2013-05-02 Thread mark12345
Question: Just to clarify. Are you saying that if I have multiple threads
using multiple instances of HttpSolrServer, each making calls to add
SolrInputDocuments (for example, httpSolrServer.add(SolrInputDocument
doc)), and one of them calls httpSolrServer.commit(), all documents
added are now committed?


If that is the case it does help me understand the rollback api description
in a new light.

http://lucene.apache.org/solr/4_2_0/solr-solrj/org/apache/solr/client/solrj/SolrServer.html#rollback%28%29

 Performs a rollback of all non-committed documents pending.
 
 Note that this is not a true rollback as in databases. Content you have
 previously added may have been committed due to autoCommit, buffer full,
 other client performing a commit etc.




Michael Della Bitta-2 wrote
 Per core or collection, depending on whether we're talking about Cloud or
 not.
 
 Basically, commits in Solr are about controlling visibility more than
 anything, although now with Cloud, they have resource consumption and
 lifecycle ramifications as well.
 On May 2, 2013 10:01 PM, mark12345 wrote:
 
 By saying commits in Solr are global, do you mean per Solr deployment,
 per
 HttpSolrServer instance, per thread, or something else?



 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/SolrJ-Solr-Two-Phase-Commit-tp4060399p4060584.html
 Sent from the Solr - User mailing list archive at Nabble.com.






--
View this message in context: 
http://lucene.472066.n3.nabble.com/SolrJ-Solr-Two-Phase-Commit-tp4060399p4060589.html
Sent from the Solr - User mailing list archive at Nabble.com.


The HttpSolrServer add(Collection<SolrInputDocument> docs) method is not atomic.

2013-05-02 Thread mark12345
One thing I noticed is that while the HttpSolrServer add(SolrInputDocument
doc) method is atomic (either a bean is added or an exception is thrown),
the HttpSolrServer add(Collection<SolrInputDocument> docs) method is not
atomic.

Question:  Is there a way to commit multiple documents/beans in a
transaction/together, in a way that either succeeds completely or fails
completely?


Quick outline of what I did to show that a call to the HttpSolrServer
add(Collection<SolrInputDocument> docs) method is not atomic:
1.  Created 5 documents, comprising 4 valid documents (documents 1, 2, 4, 5)
and 1 document with an issue, document 3.
2.  Called HttpSolrServer add(Collection<SolrInputDocument> docs), which
threw a SolrException.
3.  Called HttpSolrServer commit().
4.  Discovered that 2 out of 5 documents (documents 1 and 2) were still
committed.
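
A sketch of the closest client-side pattern given that behavior: send the batch, and on failure attempt a best-effort rollback (the URL and document contents are placeholders; anything already committed by autoCommit or another client stays committed, per the rollback javadoc quoted earlier in this thread):

import java.util.ArrayList;
import java.util.List;

import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class BatchAddBestEffort {
    public static void main(String[] args) {
        HttpSolrServer server =
                new HttpSolrServer("http://localhost:8983/solr/collection1");
        List<SolrInputDocument> docs = new ArrayList<SolrInputDocument>();
        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", "1"); // placeholder document
        docs.add(doc);
        try {
            server.add(docs);   // not atomic: docs before a failing one are already buffered
            server.commit();
        } catch (Exception e) {
            try {
                server.rollback(); // best effort only, not a database rollback
            } catch (Exception ignored) {
                // nothing more the client can do
            }
        }
    }
}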




--
View this message in context: 
http://lucene.472066.n3.nabble.com/SolrJ-Solr-Two-Phase-Commit-tp4060399p4060590.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solr 4.2 rollback not working

2013-05-02 Thread mark12345
Sorry, I don't know enough of the system to help you directly.  I was only
going to suggest upgrading to 4.2.1 if you were using 4.2.0.

It might be worthwhile to create a JIRA issue for what you are experiencing.

https://issues.apache.org/jira/browse/SOLR



Dipti Srivastava wrote
 We are using Solr 4.2.1, which claims to have fixed this issue. I have
 some logging indicating that the rollback is not broadcast to the other
 nodes in Solr. So only one node in the cluster gets the rollback, but not
 the others.
 
 
 
 Thanks,
 Dipti
 
 On 5/2/13 12:09 AM, mark12345 wrote:
 
What version of Solr are you using?  4.2.0 or 4.2.1?

The following might be of interest to you:
*  https://issues.apache.org/jira/browse/SOLR-4605
*  https://issues.apache.org/jira/browse/SOLR-4733



--
View this message in context:
http://lucene.472066.n3.nabble.com/Solr-4-2-rollback-not-working-tp4060393p4060401.html
Sent from the Solr - User mailing list archive at Nabble.com.






--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-4-2-rollback-not-working-tp4060393p4060592.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: EmbeddedSolrServer

2013-05-02 Thread Shawn Heisey
On 5/2/2013 8:07 PM, Peri Subrahmanya wrote:
 I actually have a maven project with a declared solrj dependency (4.2.1); Do 
 I need anything extra to get rid of the Zookeeper exception? I didn't see 
 jars specific to zookeeper in the list below that I would need. Any more 
 ideas please?

SolrJ has a dependency on zookeeper.  Not all uses of solrj will require
zookeeper, but maven cannot know how you intend to use it, so I don't
think anything can be done about automatically pulling it in.

Thanks,
Shawn
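
For completeness, a sketch of the embedded setup the earlier snippet was aiming at, against the 4.2.x API (the solr home path and core name are placeholders, and solr home must contain solr.xml plus the core's conf directory):

import org.apache.solr.client.solrj.embedded.EmbeddedSolrServer;
import org.apache.solr.core.CoreContainer;

public class EmbeddedExample {
    public static void main(String[] args) throws Exception {
        // Placeholder path to a valid solr home.
        System.setProperty("solr.solr.home", "/path/to/solr-home");
        CoreContainer.Initializer initializer = new CoreContainer.Initializer();
        CoreContainer coreContainer = initializer.initialize();
        EmbeddedSolrServer server =
                new EmbeddedSolrServer(coreContainer, "collection1"); // core name must exist
        System.out.println(server.ping().getStatus()); // 0 means the core answered
        coreContainer.shutdown();
    }
}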



Re: socket write error

2013-05-02 Thread Dmitry Kan
Hi, thanks.

Solr 3.4.
We use POST requests everywhere: between client and router, and between
router and shards.

Do you do faceting across all shards? Approximately how many documents do you have?
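
For anyone following along, "use POST" here means the method argument on the SolrJ query call rather than anything in Solr's config. A sketch with the 4.x client (the URLs and shard list are placeholders; the 3.x CommonsHttpSolrServer query method takes the same argument):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrRequest;
import org.apache.solr.client.solrj.impl.HttpSolrServer;

public class PostQuery {
    public static void main(String[] args) throws Exception {
        HttpSolrServer router = new HttpSolrServer("http://router:8983/solr/core");
        SolrQuery q = new SolrQuery("*:*");
        q.set("shards", "shard1:8983/solr/core,shard2:8983/solr/core"); // long lists overflow GET URLs
        // POST moves the parameter list into the request body.
        System.out.println(router.query(q, SolrRequest.METHOD.POST)
                .getResults().getNumFound());
    }
}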
On 2 May 2013 22:02, Patanachai Tangchaisin 
patanachai.tangchai...@wizecommerce.com wrote:

 Hi,

 First, which version of Solr are you using?

 I also have 60+ shards on Solr 4.2.1, and it doesn't seem to be a problem
 for me.

 - Make sure you use POST to send a query to Solr.
 - 'connection reset by peer' on the client can indicate that there is
 something wrong with the server, e.g. the server closed the connection.

 --
 Patanachai

 On 05/02/2013 05:05 AM, Dmitry Kan wrote:

 After some searching around, I see this:

 http://search-lucene.com/m/ErEZUl7P5f2/%2522socket+write+error%2522&subj=Long+list+of+shards+breaks+solrj+query

 Seems like this has happened in the past with large amount of shards.

 To make it clear: the distributed search works with 20 shards.


 On Thu, May 2, 2013 at 1:57 PM, Dmitry Kan solrexp...@gmail.com wrote:

  Hi guys!

 We have solr router and shards. I see this in jetty log on the router:

 May 02, 2013 1:30:22 PM org.apache.commons.httpclient.HttpMethodDirector
 executeWithRetry
 INFO: I/O exception (java.net.SocketException) caught when processing
 request: Connection reset by peer: socket write error

 and then:

 May 02, 2013 1:30:22 PM org.apache.commons.httpclient.HttpMethodDirector
 executeWithRetry
 INFO: Retrying request

 followed by exception about Internal Server Error

 any ideas why this happens?

 We run 80+ shards distributed across several servers. Router runs on its
 own node.

 Is there anything in particular I should be looking into wrt ubuntu
 socket
 settings? Is this a known issue for solr's distributed search from the
 past?

 Thanks,
 Dmitry







Re: AutoSuggest+Grouping in one request

2013-05-02 Thread Otis Gospodnetic
Hi,

Hm, I *think* you can't do it in one go with Solr's Suggester, but I'm
no expert there.  I can only point you to something like our
AutoComplete - http://sematext.com/products/autocomplete/index.html -
which, as you can see on that screenshot, has the grouping you seem to
be after.  Maybe somebody else can point out if Solr Suggester can do
the same?

Otis
--
Solr  ElasticSearch Support
http://sematext.com/





On Fri, Apr 26, 2013 at 9:58 AM, Rounak Jain rouna...@gmail.com wrote:
 Hi everyone,

 Search dropdowns on popular sites like Amazon (example image:
 http://i.imgur.com/aQyM8WD.jpg)
 use autosuggested words along with grouping (Field Collapsing in Solr).

 While I can replicate the same functionality in Solr using two requests
 (first to obtain suggestions, second for the actual query using the most
 probable suggestion), I want to know if this can be done in one request
 itself.

 I understand that there are various ways to obtain suggestions (term
 component, facets, Solr's inbuilt
 Suggester: http://wiki.apache.org/solr/Suggester),
 and I'm open to using any one of them, if it means I'll be able to get
 everything (groups + suggestions) in one request.

 Looking forward to some advice with regard to this.

 Thanks,

 Rounak
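
For reference, a sketch of the two-request flow described above with SolrJ (the handler name, field names, and URL are placeholders, and a /suggest handler is assumed to be configured):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class SuggestThenGroup {
    public static void main(String[] args) throws Exception {
        HttpSolrServer server =
                new HttpSolrServer("http://localhost:8983/solr/collection1");

        // Request 1: fetch suggestions for the partial input.
        SolrQuery suggest = new SolrQuery("ipo");
        suggest.setRequestHandler("/suggest");
        QueryResponse suggestions = server.query(suggest);
        // ... pick the best suggestion, e.g. from suggestions.getSpellCheckResponse() ...

        // Request 2: run the chosen suggestion with field collapsing.
        SolrQuery grouped = new SolrQuery("ipod");
        grouped.set("group", "true");
        grouped.set("group.field", "category"); // placeholder grouping field
        QueryResponse results = server.query(grouped);
        System.out.println(results.getGroupResponse().getValues());
    }
}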


Re: SolrJ / Solr Two Phase Commit

2013-05-02 Thread Walter Underwood
Yes, that is correct.  --wunder

On May 2, 2013, at 7:46 PM, mark12345 wrote:

 Question: Just to clarify. Are you saying that if I have multiple threads
 using multiple instances of HttpSolrServer, each making calls to add
 SolrInputDocuments (for example, httpSolrServer.add(SolrInputDocument
 doc)), and one of them calls httpSolrServer.commit(), all documents
 added are now committed?
 
 
 If that is the case it does help me understand the rollback api description
 in a new light.
 
 http://lucene.apache.org/solr/4_2_0/solr-solrj/org/apache/solr/client/solrj/SolrServer.html#rollback%28%29
 
 Performs a rollback of all non-committed documents pending.
 
 Note that this is not a true rollback as in databases. Content you have
 previously added may have been committed due to autoCommit, buffer full,
 other client performing a commit etc.
 
 
 
 
 Michael Della Bitta-2 wrote
 Per core or collection, depending on whether we're talking about Cloud or
 not.
 
 Basically, commits in Solr are about controlling visibility more than
 anything, although now with Cloud, they have resource consumption and
 lifecycle ramifications as well.
 On May 2, 2013 10:01 PM, mark12345 wrote:
 
 By saying commits in Solr are global, do you mean per Solr deployment,
 per
 HttpSolrServer instance, per thread, or something else?
 
 
 
 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/SolrJ-Solr-Two-Phase-Commit-tp4060399p4060584.html
 Sent from the Solr - User mailing list archive at Nabble.com.
 
 
 
 
 
 
 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/SolrJ-Solr-Two-Phase-Commit-tp4060399p4060589.html
 Sent from the Solr - User mailing list archive at Nabble.com.

--
Walter Underwood
wun...@wunderwood.org





Re: More than one sort criteria

2013-05-02 Thread Peter Schütt
Hello,

 Peter, try sorting them only using one sort parameter, separating the
 fields by comma.
 
 sort=zip+asc,street+asc 

This was it, thank you.

Ciao
  Peter Schütt
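
For completeness, the SolrJ equivalent of that comma-separated sort parameter, as a sketch:

import org.apache.solr.client.solrj.SolrQuery;

public class MultiSort {
    public static void main(String[] args) {
        SolrQuery q = new SolrQuery("*:*");
        q.addSortField("zip", SolrQuery.ORDER.asc);    // primary sort
        q.addSortField("street", SolrQuery.ORDER.asc); // tie-breaker
        System.out.println(q); // prints the encoded q and sort parameters
    }
}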