unknown _stream_source_info while indexing rich doc in solr

2013-09-04 Thread Nutan
I am using Solr 4.2 on Windows 7.
My schema is:

<field name="id" type="string" indexed="true" stored="true" required="true"/>
<field name="author" type="string" indexed="true" stored="true" multiValued="true"/>
<field name="comments" type="text" indexed="true" stored="true" multiValued="false"/>
<field name="keywords" type="text" indexed="true" stored="true" multiValued="false"/>
<field name="contents" type="text" indexed="true" stored="true" multiValued="false"/>
<field name="title" type="text" indexed="true" stored="true" multiValued="false"/>
<field name="revision_number" type="string" indexed="true" stored="true" multiValued="false"/>
<dynamicField name="ignored_*" type="ignored" indexed="false" stored="false" multiValued="true"/>

solrconfig.xml:

<requestHandler name="/update/extract" class="org.apache.solr.handler.extraction.ExtractingRequestHandler">
  <lst name="defaults">
    <str name="fmap.content">contents</str>
    <str name="lowernames">true</str>
    <str name="uprefix">ignored_</str>
    <str name="captureAttr">true</str>
  </lst>
</requestHandler>

When I execute:
curl "http://localhost:8080/solr/update/extract?literal.id=1&commit=true" -F "myfile=@abc.txt"

I get the error: unknown field 'ignored_stream_source_info'.

I referred to Solr Cookbook 3.1 and Solr Cookbook 4, but the error is not resolved.
Please help me.
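For the uprefix mapping above to work, the schema must also define the "ignored" field type that the dynamic field references. The stock Solr example schema declares it roughly as follows (fragment shown for reference; adjust to your setup):

<!-- assumed fragment, as in the stock example schema.xml -->
<fieldType name="ignored" stored="false" indexed="false" multiValued="true" class="solr.StrField" />

If that type is missing, or the config/schema was edited without restarting Solr, the extracted stream metadata fields cannot be mapped and the "unknown field" error appears.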




--
View this message in context: 
http://lucene.472066.n3.nabble.com/unknown-stream-source-info-while-indexing-rich-doc-in-solr-tp4088136.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: dataimporter tika doesn't extract certain div

2013-09-04 Thread Andreas Owen
So could I just nest it in an XPathEntityProcessor to filter the HTML, or is
there something like XPath for Tika?

<entity name="htm" processor="XPathEntityProcessor" url="${rec.file}" forEach="/div[@id='content']" dataSource="main">
    <entity name="tika" processor="TikaEntityProcessor" url="${htm}" dataSource="dataUrl" onError="skip" htmlMapper="identity" format="html">
        <field column="text" />
    </entity>
</entity>

But now I don't know how to pass the text to Tika. What do I put in url and
dataSource?


On 3. Sep 2013, at 5:56 PM, Shalin Shekhar Mangar wrote:

 I don't know much about Tika but in the example data-config.xml that
 you posted, the xpath attribute on the field "text" won't work
 because the xpath attribute is used only by an XPathEntityProcessor.
 
 On Thu, Aug 29, 2013 at 10:20 PM, Andreas Owen a...@conx.ch wrote:
 I want Tika to only index the content in <div id="content">...</div> for the 
 field "text". Unfortunately it's indexing the whole page. Can't XPath do this?
 
 data-config.xml:
 
 <dataConfig>
     <dataSource type="BinFileDataSource" name="data"/>
     <dataSource type="BinURLDataSource" name="dataUrl"/>
     <dataSource type="URLDataSource" name="main"/>
     <document>
         <entity name="rec" processor="XPathEntityProcessor"
                 url="http://127.0.0.1/tkb/internet/docImportUrl.xml" forEach="/docs/doc"
                 dataSource="main"> <!-- transformer="script:GenerateId" -->
             <field column="title" xpath="//title" />
             <field column="id" xpath="//id" />
             <field column="file" xpath="//file" />
             <field column="path" xpath="//path" />
             <field column="url" xpath="//url" />
             <field column="Author" xpath="//author" />

             <entity name="tika" processor="TikaEntityProcessor"
                     url="${rec.path}${rec.file}" dataSource="dataUrl" onError="skip"
                     htmlMapper="identity" format="html">
                 <field column="text" xpath="//div[@id='content']" />
             </entity>
         </entity>
     </document>
 </dataConfig>
 
 
 
 -- 
 Regards,
 Shalin Shekhar Mangar.



Re: DIH + Solr Cloud

2013-09-04 Thread Tim Vaillancourt

Hey Alejandro,

I guess it depends on what you call "more than one instance".

The request handlers are at the core level, not the Solr instance/global 
level, and within each of those cores you could have one or more data 
import handlers.


Most setups have 1 DIH per core at the handler location "/dataimport", 
but I believe you could have several, e.g. "/dataimport2", "/dataimport3", 
if you had different DIH configs for each handler.
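In solrconfig.xml that might look something like the following (the handler names and config file names here are illustrative, not from the original message):

<!-- two independent DIH instances in one core; config file names are hypothetical -->
<requestHandler name="/dataimport" class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
    <str name="config">dih-config-1.xml</str>
  </lst>
</requestHandler>

<requestHandler name="/dataimport2" class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
    <str name="config">dih-config-2.xml</str>
  </lst>
</requestHandler>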


Within a single data import handler, you can have several entities, 
which are what tell the DIH how to fetch and index the data. 
What you can do here is have several entities that construct your index, 
and execute those entities with several separate HTTP calls to the DIH, 
thus creating more than one instance of the DIH process within one core 
and one DIH handler.


ie:

curl "http://localhost:8983/solr/core1/dataimport?command=full-import&entity=suppliers"

curl "http://localhost:8983/solr/core1/dataimport?command=full-import&entity=parts"

curl "http://localhost:8983/solr/core1/dataimport?command=full-import&entity=companies"



http://wiki.apache.org/solr/DataImportHandler#Commands

Cheers,

Tim

On 03/09/13 09:25 AM, Alejandro Calbazana wrote:

Hi,

Quick question about data import handlers in Solr cloud.  Does anyone use
more than one instance to support the DIH process?  Or is the typical setup
to have one box set up as only the DIH and keep this responsibility outside
of the Solr cloud environment?  I'm just trying to get a picture of how this
is typically deployed.

Thanks!

Alejandro



Re: Change the score of a document based on the *value* of a multifield using dismax

2013-09-04 Thread danielitos85
Thanks a lot David. 
I will try it ;)





--
View this message in context: 
http://lucene.472066.n3.nabble.com/Change-the-score-of-a-document-based-on-the-value-of-a-multifield-tp4087503p4088145.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: dataimporter tika doesn't extract certain div

2013-09-04 Thread Shalin Shekhar Mangar
No that wouldn't work. It seems that you probably need a custom
Transformer to extract the right div content. I do not know if
TikaEntityProcessor supports such a thing.
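To illustrate what such a custom Transformer would have to do (strip everything outside the target div from the HTML that Tika returns), here is a standalone sketch using only the JDK. It uses a naive regex, so it is illustration only; a real implementation should extend org.apache.solr.handler.dataimport.Transformer and use a proper HTML parser, since regexes break on nested divs:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DivExtractor {
    // Naive pattern: assumes a well-formed, non-nested <div id="content"> block.
    private static final Pattern CONTENT_DIV =
            Pattern.compile("<div id=\"content\">(.*?)</div>", Pattern.DOTALL);

    // Returns the inner text of the content div, or "" if it is absent.
    public static String extract(String html) {
        Matcher m = CONTENT_DIV.matcher(html);
        return m.find() ? m.group(1).trim() : "";
    }

    public static void main(String[] args) {
        String page = "<html><body><div id=\"nav\">menu</div>"
                + "<div id=\"content\">Only this part should be indexed.</div>"
                + "</body></html>";
        System.out.println(extract(page));
    }
}
```

In a real DIH setup this logic would run in the transformRow() method, rewriting the "text" column before the document is indexed.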

On Wed, Sep 4, 2013 at 12:38 PM, Andreas Owen a...@conx.ch wrote:
 So could I just nest it in an XPathEntityProcessor to filter the HTML, or is
 there something like XPath for Tika?

 <entity name="htm" processor="XPathEntityProcessor" url="${rec.file}" forEach="/div[@id='content']" dataSource="main">
     <entity name="tika" processor="TikaEntityProcessor" url="${htm}" dataSource="dataUrl" onError="skip" htmlMapper="identity" format="html">
         <field column="text" />
     </entity>
 </entity>

 But now I don't know how to pass the text to Tika. What do I put in url and
 dataSource?


 On 3. Sep 2013, at 5:56 PM, Shalin Shekhar Mangar wrote:

 I don't know much about Tika but in the example data-config.xml that
 you posted, the xpath attribute on the field "text" won't work
 because the xpath attribute is used only by an XPathEntityProcessor.

 On Thu, Aug 29, 2013 at 10:20 PM, Andreas Owen a...@conx.ch wrote:
 I want Tika to only index the content in <div id="content">...</div> for
 the field "text". Unfortunately it's indexing the whole page. Can't XPath do
 this?

 data-config.xml:

 <dataConfig>
     <dataSource type="BinFileDataSource" name="data"/>
     <dataSource type="BinURLDataSource" name="dataUrl"/>
     <dataSource type="URLDataSource" name="main"/>
     <document>
         <entity name="rec" processor="XPathEntityProcessor"
                 url="http://127.0.0.1/tkb/internet/docImportUrl.xml" forEach="/docs/doc"
                 dataSource="main"> <!-- transformer="script:GenerateId" -->
             <field column="title" xpath="//title" />
             <field column="id" xpath="//id" />
             <field column="file" xpath="//file" />
             <field column="path" xpath="//path" />
             <field column="url" xpath="//url" />
             <field column="Author" xpath="//author" />

             <entity name="tika" processor="TikaEntityProcessor"
                     url="${rec.path}${rec.file}" dataSource="dataUrl" onError="skip"
                     htmlMapper="identity" format="html">
                 <field column="text" xpath="//div[@id='content']" />
             </entity>
         </entity>
     </document>
 </dataConfig>



 --
 Regards,
 Shalin Shekhar Mangar.




-- 
Regards,
Shalin Shekhar Mangar.


Re: Measuring SOLR performance

2013-09-04 Thread Dmitry Kan
Hi Roman,

Ok, I will. Thanks!

Cheers,
Dmitry


On Tue, Sep 3, 2013 at 4:46 PM, Roman Chyla roman.ch...@gmail.com wrote:

 Hi Dmitry,

 Thanks for the feedback. Yes, it is indeed jmeter issue (or rather, the
 issue of the plugin we use to generate charts). You may want to use the
 github for whatever comes next

 https://github.com/romanchyla/solrjmeter/issues

 Cheers,

   roman


 On Tue, Sep 3, 2013 at 7:54 AM, Dmitry Kan solrexp...@gmail.com wrote:

  Hi Roman,
 
  Thanks, the --additionalSolrParams was just what I wanted and works fine.
 
  BTW, if you have some special bug tracking forum for the tool, I'm
 happy
  to submit questions / bug reports there. Otherwise, this email list is ok
  (for me at least).
 
  One other thing I have noticed in the err logs was a series of messages
 of
  this sort upon generating the perf test report. Seems to be jmeter
 related
  (the err messages disappear, if extra lib dir is present under ext
  directory).
 
  java.lang.Throwable: Could not access
  /home/dmitry/projects/lab/solrjmeter7/solrjmeter/jmeter/lib/ext/lib
  at
 
 kg.apc.cmd.UniversalRunner.buildUpdatedClassPath(UniversalRunner.java:109)
  at kg.apc.cmd.UniversalRunner.clinit(UniversalRunner.java:55)
  at
 
 kg.apc.cmd.UniversalRunner.buildUpdatedClassPath(UniversalRunner.java:109)
  at kg.apc.cmd.UniversalRunner.clinit(UniversalRunner.java:55)
 
  at
 
 kg.apc.cmd.UniversalRunner.buildUpdatedClassPath(UniversalRunner.java:109)
  at kg.apc.cmd.UniversalRunner.clinit(UniversalRunner.java:55)
 
 
 
  On Tue, Sep 3, 2013 at 2:50 AM, Roman Chyla roman.ch...@gmail.com
 wrote:
 
   Hi Dmitry,
  
   If it is something you want to pass with every request (which is my use
   case), you can pass it as additional solr params, eg.
  
   python solrjmeter
  
  
 
 --additionalSolrParams=fq=other_field:bar+facet=true+facet.field=facet_field_name
   
  
   the string should be url encoded.
  
   If it is something that changes with every request, you should modify
 the
   jmeter test. If you open/load it with jmeter GUI, in the HTTP request
   processor you can define other additional fields to pass with the
  request.
   These values can come from the CSV file, you'll see an example how to
 use
   that when you open the test definition file.
  
   Cheers,
  
 roman
  
  
  
  
   On Mon, Sep 2, 2013 at 3:12 PM, Dmitry Kan solrexp...@gmail.com
 wrote:
  
Hi Erick,
   
Agree, this is perfectly fine to mix them in solr. But my question is
   about
solrjmeter input query format. Just couldn't find a suitable example
 on
   the
solrjmeter's github.
   
Dmitry
   
   
   
On Mon, Sep 2, 2013 at 5:40 PM, Erick Erickson 
  erickerick...@gmail.com
wrote:
   
 filter and facet queries can be freely intermixed, it's not a
  problem.
 What problem are you seeing when you try this?

 Best,
 Erick


 On Mon, Sep 2, 2013 at 7:46 AM, Dmitry Kan solrexp...@gmail.com
   wrote:

  Hi Roman,
 
  What's the format for running the facet+filter queries?
 
  Would something like this work:
 
  field:foo  =50  fq=other_field:bar facet=true
 facet.field=facet_field_name
 
 
  Thanks,
  Dmitry
 
 
 
  On Fri, Aug 23, 2013 at 2:34 PM, Dmitry Kan 
 solrexp...@gmail.com
 wrote:
 
   Hi Roman,
  
   With adminPath=/admin or adminPath=/admin/cores, no.
Interestingly
   enough, though, I can access
   http://localhost:8983/solr/statements/admin/system
  
   But I can access http://localhost:8983/solr/admin/cores, only
  when
 with
   adminPath=/admin/cores (which suggests that this is the right
   value
 to
  be
   used for cores), and not with adminPath=/admin.
  
    Bottom line, this core configuration is not self-evident.
  
   Dmitry
  
  
  
  
   On Fri, Aug 23, 2013 at 4:18 AM, Roman Chyla 
   roman.ch...@gmail.com
  wrote:
  
   Hi Dmitry,
   So it seems solrjmeter should not assume the adminPath - and
   perhaps
  needs
   to be passed as an argument. When you set the adminPath, are
 you
able
 to
   access localhost:8983/solr/statements/admin/cores ?
  
   roman
  
  
   On Wed, Aug 21, 2013 at 7:36 AM, Dmitry Kan 
  solrexp...@gmail.com
   
  wrote:
  
Hi Roman,
   
I have noticed a difference with different solr.xml config
contents.
  It
   is
probably legit, but thought to let you know (tests run on
  fresh
   checkout as
of today).
   
As mentioned before, I have two cores configured in
 solr.xml.
  If
the
   file
is:
   
[code]
<solr persistent="false">
    
  <!--
  adminPath: RequestHandler path to manage cores.
  If 'null' (or absent), cores will not be manageable via
   

Strange behaviour with single word and phrase

2013-09-04 Thread Alistair Young
I wonder if anyone could point me in the right direction please?

If I search on the phrase "the toolkit" I get hits containing that phrase but 
also hits that have the word 'the' before the word 'toolkit', no matter how far 
apart they are.

Also, if I search on the word 'the' there are no hits at all.

Thanks,

Alistair

-
mov eax,1
mov ebx,0
int 80


Re: Starting Solr in Tomcat with specifying ZK host(s)

2013-09-04 Thread maephisto
Thanks Shawn!

Indeed, setting JAVA_OPTS and restarting Tomcat did the trick.
Currently I'm exploring and experimenting with SolrCloud, thus I used only
one ZK.
For a production environment your suggestion would, of course, be mandatory.
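The setting in question is the zkHost system property; for a Tomcat-hosted Solr it typically goes into Tomcat's bin/setenv.sh (the hostnames and ports below are placeholders, not from the original message):

```shell
# bin/setenv.sh -- hostnames/ports are placeholders; adjust to your ensemble
export JAVA_OPTS="$JAVA_OPTS -DzkHost=zk1.example.com:2181,zk2.example.com:2181,zk3.example.com:2181"
```

For production, list every node of the ZooKeeper ensemble so Solr can survive a single ZK node going down.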



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Starting-Solr-in-Tomcat-with-specifying-ZK-host-s-tp4087916p4088164.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Indexing pdf files - question.

2013-09-04 Thread Nutan Shinde
My solrconfig.xml is:

<requestHandler name="/update/extract" class="solr.extraction.ExtractingRequestHandler">
  <lst name="defaults">
    <str name="fmap.content">desc</str> <!-- to map this field of my table, which is defined as shown below in schema.xml -->
    <str name="lowernames">true</str>
    <str name="uprefix">attr_</str>
    <str name="captureAttr">true</str>
  </lst>
</requestHandler>

<lib dir="../../extract" regex=".*\.jar" />
 

Schema.xml:

<fields>
  <field name="doc_id" type="integer" indexed="true" stored="true" multiValued="false"/>
  <field name="name" type="text" indexed="true" stored="true" multiValued="false"/>
  <field name="path" type="text" indexed="true" stored="true" multiValued="false"/>
  <field name="desc" type="text_split" indexed="true" stored="true" multiValued="false"/>
</fields>

<types>
  <fieldType name="string" class="solr.StrField" />
  <fieldType name="integer" class="solr.IntField" />
  <fieldType name="text" class="solr.TextField" />
</types>

<dynamicField name="*_i" type="integer" indexed="true" stored="true"/>

<uniqueKey>doc_id</uniqueKey>

 

I have created an "extract" directory, copied all required .jar and solr-cell
jar files into it, and given its path in the <lib> tag in solrconfig.xml.

When I try out this (on Windows 7):

curl "http://localhost:8080/solr/update/extract?literal.doc_id=1&commit=true" -F "myfile=@solr-word.pdf"

I get "/solr/update/extract is not available" and sometimes I get an "access
denied" error.

I tried resolving it through the net, but in vain, as all the solutions are
related to Linux; I'm working on Windows.

Please help me and provide solutions related to Windows.

I referred to Apache_solr_4_Cookbook.

Thanks a lot.



solr performance against oracle

2013-09-04 Thread Sergio Stateri
Hi,

I'm trying to change the data access in the company where I work from
Oracle to Solr. So I ran some tests, like this:

In Oracle:

private void go() throws Exception {
    Class.forName("oracle.jdbc.driver.OracleDriver");
    Connection conn = DriverManager.getConnection("XXX");
    PreparedStatement pstmt = conn.prepareStatement("SELECT DS_ROTEIRO FROM cco_roteiro_reg_venda WHERE CD_ROTEIRO=93100689");
    Date initialTime = new Date();
    ResultSet rs = pstmt.executeQuery();
    rs.next();
    String desc = rs.getString(1);
    System.out.println("total time: " + (new Date().getTime() - initialTime.getTime()) + " ms");
    System.out.println(desc);
    rs.close();
    pstmt.close();
    conn.close();
}



And in Solr:

private void go() throws Exception {
    String baseUrl = "http://localhost:8983/solr/";
    this.solrServerUrl = "http://localhost:8983/solr/roteiros/";
    server = new HttpSolrServer(solrUrl);
    String docId = AddOneRoteiroToCollection.docId;
    HttpSolrServer solr = new HttpSolrServer(baseUrl);
    SolrServer solrServer = new HttpSolrServer(solrServerUrl);

    solr.setRequestWriter(new BinaryRequestWriter());
    SolrQuery query = new SolrQuery();
    query.setQuery("(id:" + docId + ")"); // search by id
    query.addField("id");
    query.addField("descricaoRoteiro");

    extrairEApresentarResultados(query);
}

private void extrairEApresentarResultados(SolrQuery query) throws
        SolrServerException {
    Date initialTime = new Date();
    QueryResponse rsp = server.query(query);
    SolrDocumentList docs = rsp.getResults();
    long now = new Date().getTime() - initialTime.getTime(); // HERE I'M CHECKING THE SOLR RESPONSE TIME
    for (SolrDocument solrDocument : docs) {
        System.out.println(solrDocument);
    }
    System.out.println("Total de documentos encontrados: " + docs.size());
    System.out.println("Tempo total: " + now + " ms");
}


descricaoRoteiro is the same data that I'm getting in both, using the PK
CD_ROTEIRO, which is in Solr under the name "id" (it's the same data).
Solr is on the same machine, and Solr and Oracle have the same number of
records (around 800 thousand).

Solr always returns the data in around 150~200 ms (from localhost), but Oracle
returns in around 20 ms (and the Oracle server is in another company; I'm using
a dedicated link to access it).

How can I tell my managers that I'd like to use Solr? I saw that filters
in Solr take around 6~10 ms, but they're a query inside another query
that's returned previously.


Thanks for any help. I'd like so much to use Solr, but I really don't know how
to explain this to my managers.
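One thing worth checking before drawing conclusions: a single timed call measures connection setup, JIT warm-up and cold caches as much as it measures the engine. A fairer comparison warms up first and reports the median over many runs. A minimal harness sketch (the Runnable stands in for the real Solr or Oracle call; it is not from the original code):

```java
import java.util.Arrays;

public class LatencyProbe {
    // Runs the query a few times unmeasured (warm-up), then times `runs`
    // executions and returns the median, which is robust to outliers.
    public static long medianNanos(Runnable query, int warmup, int runs) {
        for (int i = 0; i < warmup; i++) query.run();
        long[] samples = new long[runs];
        for (int i = 0; i < runs; i++) {
            long t0 = System.nanoTime();
            query.run();
            samples[i] = System.nanoTime() - t0;
        }
        Arrays.sort(samples);
        return samples[runs / 2];
    }

    public static void main(String[] args) {
        // Placeholder workload; swap in the Solr query or JDBC call.
        long median = medianNanos(() -> Math.sqrt(42.0), 100, 101);
        System.out.println("median ns: " + median);
    }
}
```

Comparing medians measured this way, on the same machine and over the same network path, removes most of the bias in the single-shot numbers quoted above.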


-- 
Sergio Stateri Jr.
stat...@gmail.com


Re: solr performance against oracle

2013-09-04 Thread Andrea Gazzarini
You said nothing about your environments (e.g. operating systems, what 
kind of Oracle installation you have, what kind of Solr installation, 
how much data in the database, how many documents in the index, RAM for Solr, 
for Oracle, for the OS, and in general hardware... and so on).


Anyway... a migration from Oracle to Solr? That is, you're going to throw 
Oracle out the window and completely replace it with Solr? I would 
consider other aspects before your performance test... unless you 
have one flat table in Oracle, you should explain to your manager that 
there's a lot of work that needs to be done for that kind of migration 
(e.g. collecting all query requirements, denormalization).


Best,
Gazza


On 09/04/2013 02:06 PM, Sergio Stateri wrote:

Hi,

I'm trying to change the data access in the company where I work from
Oracle to Solr. So I ran some tests, like this:

In Oracle:

private void go() throws Exception {
    Class.forName("oracle.jdbc.driver.OracleDriver");
    Connection conn = DriverManager.getConnection("XXX");
    PreparedStatement pstmt = conn.prepareStatement("SELECT DS_ROTEIRO FROM cco_roteiro_reg_venda WHERE CD_ROTEIRO=93100689");
    Date initialTime = new Date();
    ResultSet rs = pstmt.executeQuery();
    rs.next();
    String desc = rs.getString(1);
    System.out.println("total time: " + (new Date().getTime() - initialTime.getTime()) + " ms");
    System.out.println(desc);
    rs.close();
    pstmt.close();
    conn.close();
}



And in Solr:

private void go() throws Exception {
    String baseUrl = "http://localhost:8983/solr/";
    this.solrServerUrl = "http://localhost:8983/solr/roteiros/";
    server = new HttpSolrServer(solrUrl);
    String docId = AddOneRoteiroToCollection.docId;
    HttpSolrServer solr = new HttpSolrServer(baseUrl);
    SolrServer solrServer = new HttpSolrServer(solrServerUrl);

    solr.setRequestWriter(new BinaryRequestWriter());
    SolrQuery query = new SolrQuery();
    query.setQuery("(id:" + docId + ")"); // search by id
    query.addField("id");
    query.addField("descricaoRoteiro");

    extrairEApresentarResultados(query);
}

private void extrairEApresentarResultados(SolrQuery query) throws
        SolrServerException {
    Date initialTime = new Date();
    QueryResponse rsp = server.query(query);
    SolrDocumentList docs = rsp.getResults();
    long now = new Date().getTime() - initialTime.getTime(); // HERE I'M CHECKING THE SOLR RESPONSE TIME
    for (SolrDocument solrDocument : docs) {
        System.out.println(solrDocument);
    }
    System.out.println("Total de documentos encontrados: " + docs.size());
    System.out.println("Tempo total: " + now + " ms");
}


descricaoRoteiro is the same data that I'm getting in both, using the PK
CD_ROTEIRO, which is in Solr under the name "id" (it's the same data).
Solr is on the same machine, and Solr and Oracle have the same number of
records (around 800 thousand).

Solr always returns the data in around 150~200 ms (from localhost), but Oracle
returns in around 20 ms (and the Oracle server is in another company; I'm using
a dedicated link to access it).

How can I tell my managers that I'd like to use Solr? I saw that filters
in Solr take around 6~10 ms, but they're a query inside another query
that's returned previously.


Thanks for any help. I'd like so much to use Solr, but I really don't know how
to explain this to my managers.






Re: Strange behaviour with single word and phrase

2013-09-04 Thread Jack Krupansky
Do you have stop word filtering enabled? What does your field type look 
like?


If stop words are ignored, you will get exactly the behavior you described.
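A field type with stop word filtering typically looks like the following fragment (in the style of the stock example schema; your analyzer chain may differ):

<fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <!-- removes "the", "a", etc. at index and query time, producing the behavior described -->
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

With this in place, "the" is never indexed at all (hence zero hits), and in the phrase query it only contributes a position gap rather than a required term.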

-- Jack Krupansky

-Original Message- 
From: Alistair Young

Sent: Wednesday, September 04, 2013 6:57 AM
To: solr-user@lucene.apache.org
Subject: Strange behaviour with single word and phrase

I wonder if anyone could point me in the right direction please?

If I search on the phrase "the toolkit" I get hits containing that phrase 
but also hits that have the word 'the' before the word 'toolkit', no matter 
how far apart they are.


Also, if I search on the word 'the' there are no hits at all.

Thanks,

Alistair

-
mov eax,1
mov ebx,0
int 80 



Re: unknown _stream_source_info while indexing rich doc in solr

2013-09-04 Thread Jack Krupansky

Did you restart Solr after editing config and schema?

-- Jack Krupansky

-Original Message- 
From: Nutan

Sent: Wednesday, September 04, 2013 3:07 AM
To: solr-user@lucene.apache.org
Subject: unknown _stream_source_info while indexing rich doc in solr

I am using Solr 4.2 on Windows 7.
My schema is:

<field name="id" type="string" indexed="true" stored="true" required="true"/>
<field name="author" type="string" indexed="true" stored="true" multiValued="true"/>
<field name="comments" type="text" indexed="true" stored="true" multiValued="false"/>
<field name="keywords" type="text" indexed="true" stored="true" multiValued="false"/>
<field name="contents" type="text" indexed="true" stored="true" multiValued="false"/>
<field name="title" type="text" indexed="true" stored="true" multiValued="false"/>
<field name="revision_number" type="string" indexed="true" stored="true" multiValued="false"/>
<dynamicField name="ignored_*" type="ignored" indexed="false" stored="false" multiValued="true"/>

solrconfig.xml:

<requestHandler name="/update/extract" class="org.apache.solr.handler.extraction.ExtractingRequestHandler">
  <lst name="defaults">
    <str name="fmap.content">contents</str>
    <str name="lowernames">true</str>
    <str name="uprefix">ignored_</str>
    <str name="captureAttr">true</str>
  </lst>
</requestHandler>

When I execute:
curl "http://localhost:8080/solr/update/extract?literal.id=1&commit=true" -F "myfile=@abc.txt"

I get the error: unknown field 'ignored_stream_source_info'.

I referred to Solr Cookbook 3.1 and Solr Cookbook 4, but the error is not resolved.
Please help me.




--
View this message in context: 
http://lucene.472066.n3.nabble.com/unknown-stream-source-info-while-indexing-rich-doc-in-solr-tp4088136.html
Sent from the Solr - User mailing list archive at Nabble.com. 



RE: Solr Cloud hangs when replicating updates

2013-09-04 Thread Greg Walters
Kevin,

Take a look at 
http://lucene.472066.n3.nabble.com/updating-docs-in-solr-cloud-hangs-td4067388.html
 and https://issues.apache.org/jira/browse/SOLR-4816. I had the same issue that 
you're reporting for a while then I applied the patch from SOLR-4816 to my 
clients and the problems went away. If you don't feel like applying the patch 
it looks like it should be included in the release of version 4.5. Also note 
that the problem happens more frequently when the replication factor is greater 
than 1.
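The hang in these threads shows up as update threads parked in Semaphore.acquire() inside SolrCmdDistributor (see the jstack output below). The failure mode is easy to reproduce in isolation: once every permit is held by a request that cannot complete, because the peer it forwards to is itself blocked, each new acquire() parks forever. A toy sketch with plain java.util.concurrent, nothing Solr-specific; tryAcquire with a timeout is used so the demo itself does not hang:

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

public class PermitExhaustion {
    public static void main(String[] args) throws InterruptedException {
        // Stands in for SolrCmdDistributor's bounded semaphore.
        Semaphore permits = new Semaphore(2);
        // All permits held by in-flight requests that can never finish,
        // because the replicas they forward to are blocked the same way.
        permits.acquire(2);
        // A new update request now needs a permit. With plain acquire()
        // this thread would park indefinitely, matching the stack trace.
        boolean got = permits.tryAcquire(100, TimeUnit.MILLISECONDS);
        System.out.println("new request got a permit: " + got);
    }
}
```

This is why the issue worsens with replicationFactor > 1: forwarding between replicas creates the cross-node dependency that keeps permits from ever being released.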

Thanks,
Greg

-Original Message-
From: kevin.osb...@cbsinteractive.com [mailto:kevin.osb...@cbsinteractive.com] 
On Behalf Of Kevin Osborn
Sent: Tuesday, September 03, 2013 4:16 PM
To: solr-user
Subject: Solr Cloud hangs when replicating updates

I was having problems updating SolrCloud with a large batch of records. The 
records are coming in bursts with lulls between updates.

At first, I just tried large updates of 100,000 records at a time.
Eventually, this caused Solr to hang. When hung, I can still query Solr.
But I cannot do any deletes or other updates to the index.

At first, my updates were going as SolrJ CSV posts. I have also tried local 
file updates and had similar results. I finally slowed things down to just use 
SolrJ's Update feature, which is basically just JavaBin. I am also sending over 
just 100 at a time in 10 threads. Again, it eventually hung.

Sometimes, Solr hangs in the first couple of chunks. Other times, it hangs 
right away.

These are my commit settings:

<autoCommit>
   <maxTime>15000</maxTime>
   <maxDocs>5000</maxDocs>
   <openSearcher>false</openSearcher>
</autoCommit>
<autoSoftCommit>
   <maxTime>3</maxTime>
</autoSoftCommit>

I have tried quite a few variations with the same results. I also tried various 
JVM settings with the same results. The only variable seems to be that reducing 
the cluster size from 2 to 1 is the only thing that helps.

I also did a jstack trace. I did not see any explicit deadlocks, but I did see 
quite a few threads in WAITING or TIMED_WAITING. It is typically something like 
this:

  java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for  0x00074039a450 (a
java.util.concurrent.Semaphore$NonfairSync)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
at
java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:834)
at
java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:994)
at
java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1303)
at java.util.concurrent.Semaphore.acquire(Semaphore.java:317)
at
org.apache.solr.util.AdjustableSemaphore.acquire(AdjustableSemaphore.java:61)
at
org.apache.solr.update.SolrCmdDistributor.submit(SolrCmdDistributor.java:418)
at
org.apache.solr.update.SolrCmdDistributor.submit(SolrCmdDistributor.java:368)
at
org.apache.solr.update.SolrCmdDistributor.flushAdds(SolrCmdDistributor.java:300)
at
org.apache.solr.update.SolrCmdDistributor.distribAdd(SolrCmdDistributor.java:139)
at
org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:474)
at
org.apache.solr.handler.loader.CSVLoaderBase.doAdd(CSVLoaderBase.java:395)
at
org.apache.solr.handler.loader.SingleThreadedCSVLoader.addDoc(CSVLoader.java:44)
at
org.apache.solr.handler.loader.CSVLoaderBase.load(CSVLoaderBase.java:364)
at org.apache.solr.handler.loader.CSVLoader.load(CSVLoader.java:31)
at
org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:92)
at
org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
at
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1904)
at
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:659)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:362)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:158)
at
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)
at
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455)
at
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
at
org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:533)
at
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
at
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1075)
at

RE: SolrCloud 4.x hangs under high update volume

2013-09-04 Thread Greg Walters
Tim,

Take a look at 
http://lucene.472066.n3.nabble.com/updating-docs-in-solr-cloud-hangs-td4067388.html
 and https://issues.apache.org/jira/browse/SOLR-4816. I had the same issue that 
you're reporting for a while then I applied the patch from SOLR-4816 to my 
clients and the problems went away. If you don't feel like applying the patch 
it looks like it should be included in the release of version 4.5. Also note 
that the problem happens more frequently when the replication factor is greater 
than 1.

Thanks,
Greg

-Original Message-
From: Tim Vaillancourt [mailto:t...@elementspace.com] 
Sent: Tuesday, September 03, 2013 6:31 PM
To: solr-user@lucene.apache.org
Subject: SolrCloud 4.x hangs under high update volume

Hey guys,

I am looking into an issue we've been having with SolrCloud since the beginning 
of our testing, all the way from 4.1 to 4.3 (haven't tested 4.4.0 yet). I've 
noticed other users with this same issue, so I'd really like to get to the 
bottom of it.

Under a very, very high rate of updates (2000+/sec), after 1-12 hours we see 
stalled transactions that snowball to consume all Jetty threads in the JVM. 
This eventually causes the JVM to hang with most threads waiting on the 
condition/stack provided at the bottom of this message. At this point SolrCloud 
instances start to see their neighbors (who also have all threads hung) as 
down with "Connection Refused", and the shards go into "down" state. Sometimes 
a node or two survives and just returns 503 "no server hosting shard" errors.

As a workaround/experiment, we have tuned the number of threads sending updates 
to Solr, as well as the batch size (we batch updates from client to Solr), and 
the soft/hard autoCommits, all to no avail. We also turned off client-to-Solr 
batching (1 update = 1 call to Solr), which did not help either. Certain 
combinations of update threads and batch sizes seem to mask/help the problem, 
but not resolve it entirely.

Our current environment is the following:
- 3 x Solr 4.3.1 instances in Jetty 9 w/Java 7.
- 3 x Zookeeper instances, external Java 7 JVM.
- 1 collection, 3 shards, 2 replicas (each node is a leader of 1 shard and a 
replica of 1 shard).
- Log4j 1.2 for Solr logs, set to WARN. This log has no movement on a good day.
- 5000 max jetty threads (well above what we use when we are healthy), 
Linux-user threads ulimit is 6000.
- Occurs under Jetty 8 or 9 (many versions).
- Occurs under Java 1.6 or 1.7 (several minor versions).
- Occurs under several JVM tunings.
- Everything seems to point to Solr itself, and not a Jetty or Java version (I 
hope I'm wrong).

The stack trace that is holding up all my Jetty QTP threads is the following, 
which seems to be waiting on a lock that I would very much like to understand 
further:

java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for  0x0007216e68d8 (a
java.util.concurrent.Semaphore$NonfairSync)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
at
java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:834)
at
java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:994)
at
java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1303)
at java.util.concurrent.Semaphore.acquire(Semaphore.java:317)
at
org.apache.solr.util.AdjustableSemaphore.acquire(AdjustableSemaphore.java:61)
at
org.apache.solr.update.SolrCmdDistributor.submit(SolrCmdDistributor.java:418)
at
org.apache.solr.update.SolrCmdDistributor.submit(SolrCmdDistributor.java:368)
at
org.apache.solr.update.SolrCmdDistributor.flushAdds(SolrCmdDistributor.java:300)
at
org.apache.solr.update.SolrCmdDistributor.finish(SolrCmdDistributor.java:96)
at
org.apache.solr.update.processor.DistributedUpdateProcessor.doFinish(DistributedUpdateProcessor.java:462)
at
org.apache.solr.update.processor.DistributedUpdateProcessor.finish(DistributedUpdateProcessor.java:1178)
at
org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:83)
at
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1820)
at
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:656)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:359)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:155)
at
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1486)
at
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:503)
at
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:138)
at
org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:564)
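The parked frames above sit in Solr's AdjustableSemaphore, which caps the number
of in-flight distributed update requests. As a toy illustration of the failure
mode only (class and method names below are invented; this is not Solr's code),
a bounded semaphore whose permits are never released leaves every later
submitter parked in acquire(), exactly like the QTP threads in the trace:

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

// Toy model of the pattern in the stack trace above.
// A submit() must take a permit before forwarding an update; if permits
// are never returned (e.g. the replica that should respond is itself
// stuck the same way), every further submitter parks in acquire().
public class SemaphoreStallDemo {

    // Attempt `attempts` submits against `permits` permits that are never
    // released; returns how many got through before the rest stalled.
    static int countSuccessfulSubmits(int permits, int attempts, long timeoutMs)
            throws InterruptedException {
        Semaphore gate = new Semaphore(permits);
        int ok = 0;
        for (int i = 0; i < attempts; i++) {
            // Real code calls gate.acquire(), which blocks indefinitely;
            // tryAcquire with a timeout lets this demonstration terminate.
            if (gate.tryAcquire(timeoutMs, TimeUnit.MILLISECONDS)) {
                ok++;
            }
        }
        return ok;
    }

    public static void main(String[] args) throws InterruptedException {
        // With 2 permits and no releases, only the first 2 of 5 submits
        // succeed; the other 3 would park forever under plain acquire().
        System.out.println(countSuccessfulSubmits(2, 5, 10)); // prints 2
    }
}
```

In the real hang the release happens when the peer responds; if the peer's own
threads are parked the same way, no response ever arrives and the permits are
never returned.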
   

Re: Strange behaviour with single word and phrase

2013-09-04 Thread Alistair Young
Yep ignoring stop words. Thanks for the pointer.

Alistair

-
mov eax,1
mov ebx,0
int 80




On 04/09/2013 13:43, Jack Krupansky j...@basetechnology.com wrote:

Do you have stop word filtering enabled? What does your field type look
like?

If stop words are ignored, you will get exactly the behavior you
described.

-- Jack Krupansky

-Original Message-
From: Alistair Young
Sent: Wednesday, September 04, 2013 6:57 AM
To: solr-user@lucene.apache.org
Subject: Strange behaviour with single word and phrase

I wonder if anyone could point me in the right direction please?

If I search on the phrase "the toolkit" I get hits containing that phrase
but also hits that have the word 'the' before the word 'toolkit', no
matter how far apart they are.

Also, if I search on the word 'the' there are no hits at all.

Thanks,

Alistair

-
mov eax,1
mov ebx,0
int 80 
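For the archives: the behaviour Jack describes comes from a StopFilterFactory in
the field type's analyzer chain. A minimal sketch, modeled on the stock
text_general type (the type name and stopword file are assumptions about the
poster's schema):

```xml
<fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <!-- drops 'the' at index and query time: searching it alone finds
         nothing, and "the toolkit" matches any 'toolkit' after a gap -->
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

Removing the stop filter (and reindexing) makes stop words searchable again, at
the cost of a larger index and more common-term matches.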






Re: dataimporter tika doesn't extract certain div

2013-09-04 Thread Andreas Owen
Or could I use a filter in schema.xml where I define a fieldtype and use some
filter that understands xpath?

On 4. Sep 2013, at 11:52 AM, Shalin Shekhar Mangar wrote:

 No that wouldn't work. It seems that you probably need a custom
 Transformer to extract the right div content. I do not know if
 TikaEntityProcessor supports such a thing.
 
 On Wed, Sep 4, 2013 at 12:38 PM, Andreas Owen a...@conx.ch wrote:
 so could i just nest it in a XPathEntityProcessor to filter the html or is 
 there something like xpath for tika?
 
 <entity name="htm" processor="XPathEntityProcessor" url="${rec.file}"
 forEach="/div[@id='content']" dataSource="main">
<entity name="tika" processor="TikaEntityProcessor"
 url="${htm}" dataSource="dataUrl" onError="skip" htmlMapper="identity"
 format="html">
<field column="text" />
</entity>
</entity>
 
 but now i dont know how to pass the text to tika, what do i put in url and 
 datasource?
 
 
 On 3. Sep 2013, at 5:56 PM, Shalin Shekhar Mangar wrote:
 
 I don't know much about Tika but in the example data-config.xml that
 you posted, the xpath attribute on the field text won't work
 because the xpath attribute is used only by a XPathEntityProcessor.
 
 On Thu, Aug 29, 2013 at 10:20 PM, Andreas Owen a...@conx.ch wrote:
 I want tika to only index the content in <div id="content">...</div> for
 the field text. Unfortunately it's indexing the whole page. Can't xpath
 do this?
 
 data-config.xml:
 
 <dataConfig>
   <dataSource type="BinFileDataSource" name="data"/>
   <dataSource type="BinURLDataSource" name="dataUrl"/>
   <dataSource type="URLDataSource" name="main"/>
 <document>
   <entity name="rec" processor="XPathEntityProcessor"
 url="http://127.0.0.1/tkb/internet/docImportUrl.xml" forEach="/docs/doc"
 dataSource="main"> <!-- transformer="script:GenerateId" -->
   <field column="title" xpath="//title" />
   <field column="id" xpath="//id" />
   <field column="file" xpath="//file" />
   <field column="path" xpath="//path" />
   <field column="url" xpath="//url" />
   <field column="Author" xpath="//author" />

   <entity name="tika" processor="TikaEntityProcessor"
 url="${rec.path}${rec.file}" dataSource="dataUrl" onError="skip"
 htmlMapper="identity" format="html">
   <field column="text" xpath="//div[@id='content']" />

   </entity>
   </entity>
 </document>
 </dataConfig>
 
 
 
 --
 Regards,
 Shalin Shekhar Mangar.
 
 
 
 
 -- 
 Regards,
 Shalin Shekhar Mangar.
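A sketch of the custom Transformer route Shalin mentions, for the archives. DIH
accepts any class that exposes a public transformRow(Map) method (declared via
transformer="ContentDivTransformer" on the entity), so the core can be written
without a Solr dependency. Everything here is an assumption about the poster's
setup, and the regex only handles a non-nested div; a real implementation
should use an HTML parser:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical DIH transformer that keeps only the contents of
// <div id="content">...</div> in the "text" column of each row.
public class ContentDivTransformer {
    private static final Pattern CONTENT_DIV =
        Pattern.compile("<div id=\"content\">(.*?)</div>", Pattern.DOTALL);

    // DIH calls this for every row produced by the entity.
    public Object transformRow(Map<String, Object> row) {
        Object text = row.get("text");
        if (text != null) {
            Matcher m = CONTENT_DIV.matcher(text.toString());
            if (m.find()) {
                row.put("text", m.group(1).trim());
            }
        }
        return row;
    }

    // Small demonstration outside DIH.
    public static void main(String[] args) {
        Map<String, Object> row = new HashMap<>();
        row.put("text", "<html><div id=\"menu\">nav</div>"
            + "<div id=\"content\">only this</div></html>");
        new ContentDivTransformer().transformRow(row);
        System.out.println(row.get("text")); // prints: only this
    }
}
```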



Re: SolrCloud 4.x hangs under high update volume

2013-09-04 Thread Mark Miller
I'm going to try and fix the root cause for 4.5 - I've suspected what it is 
since early this year, but it's never personally been an issue, so it's rolled 
along for a long time. 

Mark

Sent from my iPhone

On Sep 3, 2013, at 4:30 PM, Tim Vaillancourt t...@elementspace.com wrote:

 Hey guys,
 
 I am looking into an issue we've been having with SolrCloud since the
 beginning of our testing, all the way from 4.1 to 4.3 (haven't tested 4.4.0
 yet). I've noticed other users with this same issue, so I'd really like to
 get to the bottom of it.
 
 Under a very, very high rate of updates (2000+/sec), after 1-12 hours we
 see stalled transactions that snowball to consume all Jetty threads in the
 JVM. This eventually causes the JVM to hang with most threads waiting on
 the condition/stack provided at the bottom of this message. At this point
 SolrCloud instances then start to see their neighbors (who also have all
 threads hung) as "down" w/Connection Refused, and the shards become "down"
 in state. Sometimes a node or two survives and just returns 503s "no server
 hosting shard" errors.
 
 As a workaround/experiment, we have tuned the number of threads sending
 updates to Solr, as well as the batch size (we batch updates from client -
 solr), and the Soft/Hard autoCommits, all to no avail. Turning off
 Client-to-Solr batching (1 update = 1 call to Solr), which also did not
 help. Certain combinations of update threads and batch sizes seem to
 mask/help the problem, but not resolve it entirely.
 
 Our current environment is the following:
 - 3 x Solr 4.3.1 instances in Jetty 9 w/Java 7.
 - 3 x Zookeeper instances, external Java 7 JVM.
 - 1 collection, 3 shards, 2 replicas (each node is a leader of 1 shard and
 a replica of 1 shard).
 - Log4j 1.2 for Solr logs, set to WARN. This log has no movement on a good
 day.
 - 5000 max jetty threads (well above what we use when we are healthy),
 Linux-user threads ulimit is 6000.
 - Occurs under Jetty 8 or 9 (many versions).
 - Occurs under Java 1.6 or 1.7 (several minor versions).
 - Occurs under several JVM tunings.
 - Everything seems to point to Solr itself, and not a Jetty or Java version
 (I hope I'm wrong).
 
 The stack trace that is holding up all my Jetty QTP threads is the
 following, which seems to be waiting on a lock that I would very much like
 to understand further:
 
 java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for  0x0007216e68d8 (a
 java.util.concurrent.Semaphore$NonfairSync)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
at
 java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:834)
at
 java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:994)
at
 java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1303)
at java.util.concurrent.Semaphore.acquire(Semaphore.java:317)
at
 org.apache.solr.util.AdjustableSemaphore.acquire(AdjustableSemaphore.java:61)
at
 org.apache.solr.update.SolrCmdDistributor.submit(SolrCmdDistributor.java:418)
at
 org.apache.solr.update.SolrCmdDistributor.submit(SolrCmdDistributor.java:368)
at
 org.apache.solr.update.SolrCmdDistributor.flushAdds(SolrCmdDistributor.java:300)
at
 org.apache.solr.update.SolrCmdDistributor.finish(SolrCmdDistributor.java:96)
at
 org.apache.solr.update.processor.DistributedUpdateProcessor.doFinish(DistributedUpdateProcessor.java:462)
at
 org.apache.solr.update.processor.DistributedUpdateProcessor.finish(DistributedUpdateProcessor.java:1178)
at
 org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:83)
at
 org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1820)
at
 org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:656)
at
 org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:359)
at
 org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:155)
at
 org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1486)
at
 org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:503)
at
 org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:138)
at
 org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:564)
at
 org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:213)
at
 org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1096)
at
 org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:432)
at
 org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:175)
at
 

Re: Boost by numFounds

2013-09-04 Thread Flavio Pompermaier
I found that what can do the trick for page-rank-like indexing is
externalFileField! Is there any helper to upload the external files to all
solr servers (in solr 3 and solrCloud)?
Or should I copy it to all solr instances' data folders and then reload their
cache?

On Sat, Aug 24, 2013 at 12:36 AM, Flavio Pompermaier
pomperma...@okkam.it wrote:

 Any help..? Is it possible to add this pagerank-like behaviour?
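For the archives: there is no built-in upload for external file fields; the
file has to be placed in each core's index data directory yourself (rsync/scp
or similar), named external_<fieldname>, and it is re-read when a new searcher
opens, so a commit or core reload after copying picks up new values. A sketch
with assumed field names:

```xml
<!-- schema.xml -->
<fieldType name="rankFile" class="solr.ExternalFileField"
           keyField="id" defVal="0" valType="pfloat"/>
<field name="rank" type="rankFile" indexed="false" stored="false"/>
```

and in <dataDir>/external_rank, one uniqueKey=value line per document:

```
doc1=2.5
doc2=0.7
```

The values are then usable in function queries (e.g. a boost function), which
gives the pagerank-like behaviour without reindexing the documents themselves.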




Re: SolrCloud 4.x hangs under high update volume

2013-09-04 Thread Kevin Osborn
I am having this issue as well. I did apply this patch. Unfortunately, it
did not resolve the issue in my case.


On Wed, Sep 4, 2013 at 7:01 AM, Greg Walters
gwalt...@sherpaanalytics.com wrote:

 Tim,

 Take a look at
 http://lucene.472066.n3.nabble.com/updating-docs-in-solr-cloud-hangs-td4067388.html and
 https://issues.apache.org/jira/browse/SOLR-4816. I had the same issue
 that you're reporting for a while then I applied the patch from SOLR-4816
 to my clients and the problems went away. If you don't feel like applying
 the patch it looks like it should be included in the release of version
 4.5. Also note that the problem happens more frequently when the
 replication factor is greater than 1.

 Thanks,
 Greg

 -Original Message-
 From: Tim Vaillancourt [mailto:t...@elementspace.com]
 Sent: Tuesday, September 03, 2013 6:31 PM
 To: solr-user@lucene.apache.org
 Subject: SolrCloud 4.x hangs under high update volume

 Hey guys,

 I am looking into an issue we've been having with SolrCloud since the
 beginning of our testing, all the way from 4.1 to 4.3 (haven't tested 4.4.0
 yet). I've noticed other users with this same issue, so I'd really like to
 get to the bottom of it.

 Under a very, very high rate of updates (2000+/sec), after 1-12 hours we
 see stalled transactions that snowball to consume all Jetty threads in the
 JVM. This eventually causes the JVM to hang with most threads waiting on
 the condition/stack provided at the bottom of this message. At this point
 SolrCloud instances then start to see their neighbors (who also have all
 threads hung) as down w/Connection Refused, and the shards become down
 in state. Sometimes a node or two survives and just returns 503s no
 server hosting shard errors.

 As a workaround/experiment, we have tuned the number of threads sending
 updates to Solr, as well as the batch size (we batch updates from client -
 solr), and the Soft/Hard autoCommits, all to no avail. Turning off
 Client-to-Solr batching (1 update = 1 call to Solr), which also did not
 help. Certain combinations of update threads and batch sizes seem to
 mask/help the problem, but not resolve it entirely.

 Our current environment is the following:
 - 3 x Solr 4.3.1 instances in Jetty 9 w/Java 7.
 - 3 x Zookeeper instances, external Java 7 JVM.
 - 1 collection, 3 shards, 2 replicas (each node is a leader of 1 shard and
 a replica of 1 shard).
 - Log4j 1.2 for Solr logs, set to WARN. This log has no movement on a good
 day.
 - 5000 max jetty threads (well above what we use when we are healthy),
 Linux-user threads ulimit is 6000.
 - Occurs under Jetty 8 or 9 (many versions).
 - Occurs under Java 1.6 or 1.7 (several minor versions).
 - Occurs under several JVM tunings.
 - Everything seems to point to Solr itself, and not a Jetty or Java
 version (I hope I'm wrong).

 The stack trace that is holding up all my Jetty QTP threads is the
 following, which seems to be waiting on a lock that I would very much like
 to understand further:

 java.lang.Thread.State: WAITING (parking)
 at sun.misc.Unsafe.park(Native Method)
 - parking to wait for  0x0007216e68d8 (a
 java.util.concurrent.Semaphore$NonfairSync)
 at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
 at

 java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:834)
 at

 java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:994)
 at

 java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1303)
 at java.util.concurrent.Semaphore.acquire(Semaphore.java:317)
 at

 org.apache.solr.util.AdjustableSemaphore.acquire(AdjustableSemaphore.java:61)
 at

 org.apache.solr.update.SolrCmdDistributor.submit(SolrCmdDistributor.java:418)
 at

 org.apache.solr.update.SolrCmdDistributor.submit(SolrCmdDistributor.java:368)
 at

 org.apache.solr.update.SolrCmdDistributor.flushAdds(SolrCmdDistributor.java:300)
 at

 org.apache.solr.update.SolrCmdDistributor.finish(SolrCmdDistributor.java:96)
 at

 org.apache.solr.update.processor.DistributedUpdateProcessor.doFinish(DistributedUpdateProcessor.java:462)
 at

 org.apache.solr.update.processor.DistributedUpdateProcessor.finish(DistributedUpdateProcessor.java:1178)
 at

 org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:83)
 at

 org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
 at org.apache.solr.core.SolrCore.execute(SolrCore.java:1820)
 at

 org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:656)
 at

 org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:359)
 at

 org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:155)
 at

 

Re: Solr Cloud hangs when replicating updates

2013-09-04 Thread Kevin Osborn
Thanks. If there is anything I can do to help you resolve this issue, let
me know.

-Kevin


On Wed, Sep 4, 2013 at 7:51 AM, Mark Miller markrmil...@gmail.com wrote:

 I'll look at fixing the root issue for 4.5. I've been putting it off for
 way too long.

 Mark

 Sent from my iPhone

 On Sep 3, 2013, at 2:15 PM, Kevin Osborn kevin.osb...@cbsi.com wrote:

  I was having problems updating SolrCloud with a large batch of records.
 The
  records are coming in bursts with lulls between updates.
 
  At first, I just tried large updates of 100,000 records at a time.
  Eventually, this caused Solr to hang. When hung, I can still query Solr.
  But I cannot do any deletes or other updates to the index.
 
  At first, my updates were going as SolrJ CSV posts. I have also tried
 local
  file updates and had similar results. I finally slowed things down to
 just
  use SolrJ's Update feature, which is basically just JavaBin. I am also
  sending over just 100 at a time in 10 threads. Again, it eventually hung.
 
  Sometimes, Solr hangs in the first couple of chunks. Other times, it
 hangs
  right away.
 
  These are my commit settings:
 
   <autoCommit>
     <maxTime>15000</maxTime>
     <maxDocs>5000</maxDocs>
     <openSearcher>false</openSearcher>
   </autoCommit>
   <autoSoftCommit>
     <maxTime>3</maxTime>
   </autoSoftCommit>
 
  I have tried quite a few variations with the same results. I also tried
  various JVM settings with the same results. The only variable seems to be
  that reducing the cluster size from 2 to 1 is the only thing that helps.
 
  I also did a jstack trace. I did not see any explicit deadlocks, but I
 did
  see quite a few threads in WAITING or TIMED_WAITING. It is typically
  something like this:
 
   java.lang.Thread.State: WAITING (parking)
 at sun.misc.Unsafe.park(Native Method)
 - parking to wait for  0x00074039a450 (a
  java.util.concurrent.Semaphore$NonfairSync)
 at
 java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
 at
 
 java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:834)
 at
 
 java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:994)
 at
 
 java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1303)
 at java.util.concurrent.Semaphore.acquire(Semaphore.java:317)
 at
 
 org.apache.solr.util.AdjustableSemaphore.acquire(AdjustableSemaphore.java:61)
 at
 
 org.apache.solr.update.SolrCmdDistributor.submit(SolrCmdDistributor.java:418)
 at
 
 org.apache.solr.update.SolrCmdDistributor.submit(SolrCmdDistributor.java:368)
 at
 
 org.apache.solr.update.SolrCmdDistributor.flushAdds(SolrCmdDistributor.java:300)
 at
 
 org.apache.solr.update.SolrCmdDistributor.distribAdd(SolrCmdDistributor.java:139)
 at
 
 org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:474)
 at
 
 org.apache.solr.handler.loader.CSVLoaderBase.doAdd(CSVLoaderBase.java:395)
 at
 
 org.apache.solr.handler.loader.SingleThreadedCSVLoader.addDoc(CSVLoader.java:44)
 at
  org.apache.solr.handler.loader.CSVLoaderBase.load(CSVLoaderBase.java:364)
 at
 org.apache.solr.handler.loader.CSVLoader.load(CSVLoader.java:31)
 at
 
 org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:92)
 at
 
 org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
 at
 
 org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
 at org.apache.solr.core.SolrCore.execute(SolrCore.java:1904)
 at
 
 org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:659)
 at
 
 org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:362)
 at
 
 org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:158)
 at
 
 org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)
 at
 
 org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455)
 at
 
 org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
 at
 
 org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:533)
 at
 
 org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
 at
 
 org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1075)
 at
  org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384)
 at
 
 org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
 at
 
 org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1009)
 at
 
 

Re: Solr Cloud hangs when replicating updates

2013-09-04 Thread Mark Miller
I'll look at fixing the root issue for 4.5. I've been putting it off for way
too long.

Mark 

Sent from my iPhone

On Sep 3, 2013, at 2:15 PM, Kevin Osborn kevin.osb...@cbsi.com wrote:

 I was having problems updating SolrCloud with a large batch of records. The
 records are coming in bursts with lulls between updates.
 
 At first, I just tried large updates of 100,000 records at a time.
 Eventually, this caused Solr to hang. When hung, I can still query Solr.
 But I cannot do any deletes or other updates to the index.
 
 At first, my updates were going as SolrJ CSV posts. I have also tried local
 file updates and had similar results. I finally slowed things down to just
 use SolrJ's Update feature, which is basically just JavaBin. I am also
 sending over just 100 at a time in 10 threads. Again, it eventually hung.
 
 Sometimes, Solr hangs in the first couple of chunks. Other times, it hangs
 right away.
 
 These are my commit settings:
 
  <autoCommit>
    <maxTime>15000</maxTime>
    <maxDocs>5000</maxDocs>
    <openSearcher>false</openSearcher>
  </autoCommit>
  <autoSoftCommit>
    <maxTime>3</maxTime>
  </autoSoftCommit>
 
 I have tried quite a few variations with the same results. I also tried
 various JVM settings with the same results. The only variable seems to be
 that reducing the cluster size from 2 to 1 is the only thing that helps.
 
 I also did a jstack trace. I did not see any explicit deadlocks, but I did
 see quite a few threads in WAITING or TIMED_WAITING. It is typically
 something like this:
 
  java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for  0x00074039a450 (a
 java.util.concurrent.Semaphore$NonfairSync)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
at
 java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:834)
at
 java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:994)
at
 java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1303)
at java.util.concurrent.Semaphore.acquire(Semaphore.java:317)
at
 org.apache.solr.util.AdjustableSemaphore.acquire(AdjustableSemaphore.java:61)
at
 org.apache.solr.update.SolrCmdDistributor.submit(SolrCmdDistributor.java:418)
at
 org.apache.solr.update.SolrCmdDistributor.submit(SolrCmdDistributor.java:368)
at
 org.apache.solr.update.SolrCmdDistributor.flushAdds(SolrCmdDistributor.java:300)
at
 org.apache.solr.update.SolrCmdDistributor.distribAdd(SolrCmdDistributor.java:139)
at
 org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:474)
at
 org.apache.solr.handler.loader.CSVLoaderBase.doAdd(CSVLoaderBase.java:395)
at
 org.apache.solr.handler.loader.SingleThreadedCSVLoader.addDoc(CSVLoader.java:44)
at
 org.apache.solr.handler.loader.CSVLoaderBase.load(CSVLoaderBase.java:364)
at org.apache.solr.handler.loader.CSVLoader.load(CSVLoader.java:31)
at
 org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:92)
at
 org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
at
 org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1904)
at
 org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:659)
at
 org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:362)
at
 org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:158)
at
 org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)
at
 org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455)
at
 org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
at
 org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:533)
at
 org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
at
 org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1075)
at
 org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384)
at
 org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
at
 org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1009)
at
 org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
at
 org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
at
 org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)
 
 It basically appears that Solr 

Need help on Joining and sorting syntax and limitations between multiple documents in solr-4.4.0

2013-09-04 Thread Sukanta Dey
Hi Team,

In my project I am going to use Apache Solr 4.4.0 for searching. While doing
that I need to join multiple Solr documents within the same core on a field
common across the documents.
I can join the documents successfully using the Solr 4.4.0 join syntax and it
returns the expected result, but my next requirement is to sort the returned
result by fields from the documents on the join's from side, which I have not
been able to do. Let me explain the problem in detail along with the files I
am using ...


1)  Files being used :

a.   Picklist_1.xml

--

<add><doc>

<field name="describedObjectId">t1324838</field>

<field name="describedObjectType">7</field>

<field name="picklistItemId">956</field>

<field name="siteId">130712901</field>

<field name="en">Draft</field>

<field name="gr">Draoft</field>

</doc></add>



b.  Picklist_2.xml

---

<add><doc>

<field name="describedObjectId">t1324837</field>

<field name="describedObjectType">7</field>

<field name="picklistItemId">87749</field>

<field name="siteId">130712901</field>

<field name="en">New</field>

<field name="gr">Neuo</field>

</doc></add>



c.   AssetID_1.xml

---

<add><doc>

<field name="def14227_picklist">t1324837</field>

<field name="describedObjectId">a180894808</field>

<field name="describedObjectType">1</field>

<field name="isMetadataComplete">true</field>

<field name="lastUpdateDate">2013-09-02T09:28:18Z</field>

<field name="ownerId">130713716</field>

<field name="siteId">130712901</field>

</doc></add>



d.  AssetID_2.xml



<add><doc>

<field name="def14227_picklist">t1324838</field>

<field name="describedObjectId">a171658357</field>

<field name="describedObjectType">1</field>

<field name="ownerId">130713716</field>

<field name="rGroupId">2283961</field>

<field name="rGroupId">2290309</field>

<field name="rGroupPermissionLevel">7</field>

<field name="rGroupPermissionLevel">7</field>

<field name="rRuleId">13503796</field>
<field name="rRuleId">15485964</field>

<field name="rUgpId">38052</field>

<field name="rUgpId">41133</field>

<field name="siteId">130712901</field>

</doc></add>



2)  Requirement:



i. We need a join between the files, using the
def14227_picklist field from AssetID_1.xml and AssetID_2.xml and the
describedObjectId field from Picklist_1.xml and Picklist_2.xml.

ii.   After joining we need all the fields from the
AssetID_*.xml files and the en, gr fields from the Picklist_*.xml files.

iii.  While joining we also need to sort the result based on the en
field value.



3)  I was trying with q={!join from=inner_id to=outer_id}zzz:vvv syntax 
but no luck.

Any help/suggestion would be appreciated.

Thanks,
Sukanta Dey
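For reference, with the documents above the join that returns the asset
documents would be (field names taken from the posted files, everything else
assumed):

```
q={!join from=describedObjectId to=def14227_picklist}en:Draft
```

One caveat that bears on requirements (ii) and (iii): Solr's query-time join
only returns documents from the "to" side, so fields from the picklist
documents (including en) are neither returned nor available for sorting.
Covering all three requirements usually means flattening, i.e. copying the
en/gr values onto the asset documents at index time.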






How to config SOLR server for spell check functionality

2013-09-04 Thread sebastian.manolescu
I want to implement the spell check functionality offered by Solr using a
MySql database, but I don't understand how.
Here is the basic flow of what I want to do.

I have a simple inputText (in jsf) and if I type the word shwo, the response
in the outputLabel should be show.

First of all I'm using the following tools and frameworks:

JBoss application server 6.1.
Eclipse
JPA
JSF(Primefaces)

Steps I've done until now:

Step 1: Download solr server from:
http://lucene.apache.org/solr/downloads.html Extract content.

Step 2: Add an environment variable:

Variable name: solr.solr.home  Variable value:
D:\JBOSS\solr-4.4.0\solr-4.4.0\example\solr --- where you have the solr
server

Step 3:

Open the solr war and add an env-entry to solr.war\WEB-INF\web.xml (the easy
way):

<env-entry>
  <env-entry-name>solr/home</env-entry-name>
  <env-entry-value>D:\JBOSS\solr-4.4.0\solr-4.4.0\example\solr</env-entry-value>
  <env-entry-type>java.lang.String</env-entry-type>
</env-entry>

OR import the project, make the change, and build the war.

Step 4: Browser: localhost:8080/solr/

And the solr console appears.

Until now all works well.

I have found some useful code (in my opinion) that logs the following:

[collection1] webapp=/solr path=/spell
params={spellcheck=on&q=whatever&wt=javabin&qt=/spell&version=2&spellcheck.build=true}
hits=0 status=0 QTime=16

Here is the code that gives the result from above:

SolrServer solr;
try {
    solr = new CommonsHttpSolrServer("http://localhost:8080/solr");

    ModifiableSolrParams params = new ModifiableSolrParams();
    params.set("qt", "/spell");
    params.set("q", "whatever");
    params.set("spellcheck", "on");
    params.set("spellcheck.build", "true");

    QueryResponse response = solr.query(params);
    SpellCheckResponse spellCheckResponse = response.getSpellCheckResponse();
    if (!spellCheckResponse.isCorrectlySpelled()) {
        for (Suggestion suggestion : spellCheckResponse.getSuggestions()) {
            System.out.println("original token: " + suggestion.getToken()
                + " - alternatives: " + suggestion.getAlternatives());
        }
    }
} catch (Exception e) {
    e.printStackTrace();
}

Questions:

1. How do I make the connection with my DB and search the content to see if
there are any words that could match?
2. How do I write the configuration (solrconfig.xml, schema.xml, etc.)?
3. How do I send a string from my view (xhtml) so that the solr server knows
what it should look for?

I read all the information about solr but it's still unclear:

Links:Main Page:
http://lucene.apache.org/solr/

Main Page tutorial: http://lucene.apache.org/solr/4_4_0/tutorial.html

Solr Wiki:
http://wiki.apache.org/solr/Solrj --- official solrj documentation
http://wiki.apache.org/solr/SpellCheckComponent

Solr config: http://wiki.apache.org/solr/SolrConfigXml
http://www.installationpage.com/solr/solr-configuration-tutorial-schema-solrconfig-xml/
http://wiki.apache.org/solr/SchemaXml

StackOverflow proof: Solr Did you mean (Spell check component)

Solr Database Integration:
http://www.slideshare.net/th0masr/integrating-the-solr-search-engine
http://www.cabotsolutions.com/2009/05/using-solr-lucene-for-full-text-search-with-mysql-db/

Solr Spell Check:
http://docs.lucidworks.com/display/solr/Spell+Checking
http://searchhub.org/2010/08/31/getting-started-spell-checking-with-apache-lucene-and-solr/
http://techiesinsight.blogspot.ro/2012/06/using-solr-spellchecker-from-java.html
http://blog.websolr.com/post/2748574298/spellcheck-with-solr-spellcheckcomponent
How to use SpellingResult class in SolrJ

I really need your help. Regards.




--
View this message in context: 
http://lucene.472066.n3.nabble.com/How-to-config-SOLR-server-for-spell-check-functionality-tp4088163.html
Sent from the Solr - User mailing list archive at Nabble.com.
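On question 2, a minimal sketch of the server-side pieces the SolrJ snippet
above expects: a spellcheck component whose dictionary is built from an
indexed field, exposed through a /spell handler. The field name, directory,
and defaults are assumptions, not a drop-in config:

```xml
<!-- solrconfig.xml -->
<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <lst name="spellchecker">
    <str name="name">default</str>
    <!-- dictionary is built from the tokens indexed into this field -->
    <str name="field">spell</str>
    <str name="classname">solr.IndexBasedSpellChecker</str>
    <str name="spellcheckIndexDir">./spellchecker</str>
  </lst>
</searchComponent>

<requestHandler name="/spell" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="spellcheck.dictionary">default</str>
    <str name="spellcheck">on</str>
  </lst>
  <arr name="last-components">
    <str>spellcheck</str>
  </arr>
</requestHandler>
```

Question 1 is a separate, index-time concern: load the MySql content into the
spell field (the DataImportHandler with a JdbcDataSource is the usual route),
and the spellchecker builds its dictionary from that field; Solr never queries
MySql at search time.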


Re: solr performance against oracle

2013-09-04 Thread Toke Eskildsen
On Wed, 2013-09-04 at 14:06 +0200, Sergio Stateri wrote:
 I´m trying to change the data access in the company where I work from
 Oracle to Solr.

They work on different principles and fulfill different needs. Comparing
them with a performance-oriented test is not likely to be a usable basis
for selecting between them. Start by describing your typical use cases
instead.

 Solr aways returns the data arround 150~200 ms (from localhost), but Oracle
 returns arround 20 ms (and Oracle server is in another company, I´m using
 dedicated link to access it).

200ms is suspiciously slow for a trivial lookup in 800,000 values. I am
sure we can bring that down to Oracle-time or better, but I do not think
it shows much.

 How can I tell to my managers that I´d like to use Solr?

Why would you like to use Solr?



Solr highlighting fragment issue

2013-09-04 Thread Sreehareesh Kaipravan Meethaleveetil
Hi,
I'm having some issues with Solr search results (using Solr 1.4). I have
enabled highlighting of searched text (hl=true) and set the fragment size to
500 (hl.fragsize=500) in the search query.
Below is a screen shot of the results shown when I searched for the term
'grandfather' (2 results are displayed).
Now I have a couple of problems with this.

1.   In the search results the keyword is appearing inconsistently towards 
the start/end of the text. I'd like to control the number of characters 
appearing before and after the keyword match (highlighted term). More 
specifically I'd like to get the keyword match somewhere around the middle of 
the resultant text.

2.   The total number of characters appearing in the search result never
equals the fragment size I specified (500 characters). It varies considerably
(for example 408 or 520).
Please share your thoughts on achieving the above 2 results.
[cid:image001.png@01CEA8D2.4FF025E0]
Thanks & Regards,
Sreehareesh KM
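For what it's worth, fragment sizing in Solr's highlighter can be nudged (though not pinned exactly) with the regex fragmenter. The parameters below are a sketch of what to experiment with, not a guaranteed fix; the pattern shown is the commonly cited example value, and the annotations are explanatory, not query syntax:

```
hl=true
hl.fragsize=500
hl.fragmenter=regex                      (break fragments on the given pattern)
hl.regex.slop=0.2                        (allow ~20% deviation from fragsize)
hl.regex.pattern=[-\w ,/\n\"']{20,200}   (prefer breaks at word/phrase boundaries)
```

Even with these, fragsize is a target, not an exact length, because a matching term must land whole inside a fragment.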


Re: SolrCloud 4.x hangs under high update volume

2013-09-04 Thread Mark Miller
There is an issue if I remember right, but I can't find it right now.

If anyone that has the problem could try this patch, that would be very
helpful: http://pastebin.com/raw.php?i=aaRWwSGP

- Mark


On Wed, Sep 4, 2013 at 8:04 AM, Markus Jelsma markus.jel...@openindex.io wrote:

 Hi Mark,

 Got an issue to watch?

 Thanks,
 Markus

 -Original message-
  From:Mark Miller markrmil...@gmail.com
  Sent: Wednesday 4th September 2013 16:55
  To: solr-user@lucene.apache.org
  Subject: Re: SolrCloud 4.x hangs under high update volume
 
  I'm going to try and fix the root cause for 4.5 - I've suspected what it
 is since early this year, but it's never personally been an issue, so it's
 rolled along for a long time.
 
  Mark
 
  Sent from my iPhone
 
  On Sep 3, 2013, at 4:30 PM, Tim Vaillancourt t...@elementspace.com
 wrote:
 
   Hey guys,
  
   I am looking into an issue we've been having with SolrCloud since the
   beginning of our testing, all the way from 4.1 to 4.3 (haven't tested
 4.4.0
   yet). I've noticed other users with this same issue, so I'd really
 like to
   get to the bottom of it.
  
   Under a very, very high rate of updates (2000+/sec), after 1-12 hours
 we
   see stalled transactions that snowball to consume all Jetty threads in
 the
   JVM. This eventually causes the JVM to hang with most threads waiting
 on
   the condition/stack provided at the bottom of this message. At this
 point
   SolrCloud instances then start to see their neighbors (who also have
 all
   threads hung) as down w/Connection Refused, and the shards become
 down
   in state. Sometimes a node or two survives and just returns 503s no
 server
   hosting shard errors.
  
   As a workaround/experiment, we have tuned the number of threads sending
   updates to Solr, as well as the batch size (we batch updates from
 client -
   solr), and the Soft/Hard autoCommits, all to no avail. Turning off
   Client-to-Solr batching (1 update = 1 call to Solr), which also did not
   help. Certain combinations of update threads and batch sizes seem to
   mask/help the problem, but not resolve it entirely.
  
   Our current environment is the following:
   - 3 x Solr 4.3.1 instances in Jetty 9 w/Java 7.
   - 3 x Zookeeper instances, external Java 7 JVM.
   - 1 collection, 3 shards, 2 replicas (each node is a leader of 1 shard
 and
   a replica of 1 shard).
   - Log4j 1.2 for Solr logs, set to WARN. This log has no movement on a
 good
   day.
   - 5000 max jetty threads (well above what we use when we are healthy),
   Linux-user threads ulimit is 6000.
   - Occurs under Jetty 8 or 9 (many versions).
   - Occurs under Java 1.6 or 1.7 (several minor versions).
   - Occurs under several JVM tunings.
   - Everything seems to point to Solr itself, and not a Jetty or Java
 version
   (I hope I'm wrong).
  
   The stack trace that is holding up all my Jetty QTP threads is the
   following, which seems to be waiting on a lock that I would very much
 like
   to understand further:
  
   java.lang.Thread.State: WAITING (parking)
  at sun.misc.Unsafe.park(Native Method)
  - parking to wait for  0x0007216e68d8 (a
   java.util.concurrent.Semaphore$NonfairSync)
  at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
  at
  
 java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:834)
  at
  
 java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:994)
  at
  
 java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1303)
  at java.util.concurrent.Semaphore.acquire(Semaphore.java:317)
  at
  
 org.apache.solr.util.AdjustableSemaphore.acquire(AdjustableSemaphore.java:61)
  at
  
 org.apache.solr.update.SolrCmdDistributor.submit(SolrCmdDistributor.java:418)
  at
  
 org.apache.solr.update.SolrCmdDistributor.submit(SolrCmdDistributor.java:368)
  at
  
 org.apache.solr.update.SolrCmdDistributor.flushAdds(SolrCmdDistributor.java:300)
  at
  
 org.apache.solr.update.SolrCmdDistributor.finish(SolrCmdDistributor.java:96)
  at
  
 org.apache.solr.update.processor.DistributedUpdateProcessor.doFinish(DistributedUpdateProcessor.java:462)
  at
  
 org.apache.solr.update.processor.DistributedUpdateProcessor.finish(DistributedUpdateProcessor.java:1178)
  at
  
 org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:83)
  at
  
 org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
  at org.apache.solr.core.SolrCore.execute(SolrCore.java:1820)
  at
  
 org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:656)
  at
  
 org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:359)
  at
  
 org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:155)
  at
  
 

Re: Solr Cloud hangs when replicating updates

2013-09-04 Thread Mark Miller
It would be great if you could give this patch a try:
http://pastebin.com/raw.php?i=aaRWwSGP

- Mark


On Wed, Sep 4, 2013 at 8:31 AM, Kevin Osborn kevin.osb...@cbsi.com wrote:

 Thanks. If there is anything I can do to help you resolve this issue, let
 me know.

 -Kevin


 On Wed, Sep 4, 2013 at 7:51 AM, Mark Miller markrmil...@gmail.com wrote:

   I'll look at fixing the root issue for 4.5. I've been putting it off for
   way too long.
 
  Mark
 
  Sent from my iPhone
 
  On Sep 3, 2013, at 2:15 PM, Kevin Osborn kevin.osb...@cbsi.com wrote:
 
   I was having problems updating SolrCloud with a large batch of records.
  The
   records are coming in bursts with lulls between updates.
  
   At first, I just tried large updates of 100,000 records at a time.
   Eventually, this caused Solr to hang. When hung, I can still query
 Solr.
   But I cannot do any deletes or other updates to the index.
  
   At first, my updates were going as SolrJ CSV posts. I have also tried
  local
   file updates and had similar results. I finally slowed things down to
  just
   use SolrJ's Update feature, which is basically just JavaBin. I am also
   sending over just 100 at a time in 10 threads. Again, it eventually
 hung.
  
   Sometimes, Solr hangs in the first couple of chunks. Other times, it
  hangs
   right away.
  
   These are my commit settings:
  
    <autoCommit>
      <maxTime>15000</maxTime>
      <maxDocs>5000</maxDocs>
      <openSearcher>false</openSearcher>
    </autoCommit>
    <autoSoftCommit>
      <maxTime>3</maxTime>
    </autoSoftCommit>
  
   I have tried quite a few variations with the same results. I also tried
   various JVM settings with the same results. The only variable seems to
 be
   that reducing the cluster size from 2 to 1 is the only thing that
 helps.
  
   I also did a jstack trace. I did not see any explicit deadlocks, but I
  did
   see quite a few threads in WAITING or TIMED_WAITING. It is typically
   something like this:
  
java.lang.Thread.State: WAITING (parking)
  at sun.misc.Unsafe.park(Native Method)
  - parking to wait for  0x00074039a450 (a
   java.util.concurrent.Semaphore$NonfairSync)
  at
  java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
  at
  
 
 java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:834)
  at
  
 
 java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:994)
  at
  
 
 java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1303)
  at java.util.concurrent.Semaphore.acquire(Semaphore.java:317)
  at
  
 
 org.apache.solr.util.AdjustableSemaphore.acquire(AdjustableSemaphore.java:61)
  at
  
 
 org.apache.solr.update.SolrCmdDistributor.submit(SolrCmdDistributor.java:418)
  at
  
 
 org.apache.solr.update.SolrCmdDistributor.submit(SolrCmdDistributor.java:368)
  at
  
 
 org.apache.solr.update.SolrCmdDistributor.flushAdds(SolrCmdDistributor.java:300)
  at
  
 
 org.apache.solr.update.SolrCmdDistributor.distribAdd(SolrCmdDistributor.java:139)
  at
  
 
 org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:474)
  at
  
 
 org.apache.solr.handler.loader.CSVLoaderBase.doAdd(CSVLoaderBase.java:395)
  at
  
 
 org.apache.solr.handler.loader.SingleThreadedCSVLoader.addDoc(CSVLoader.java:44)
  at
  
 org.apache.solr.handler.loader.CSVLoaderBase.load(CSVLoaderBase.java:364)
  at
  org.apache.solr.handler.loader.CSVLoader.load(CSVLoader.java:31)
  at
  
 
 org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:92)
  at
  
 
 org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
  at
  
 
 org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
  at org.apache.solr.core.SolrCore.execute(SolrCore.java:1904)
  at
  
 
 org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:659)
  at
  
 
 org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:362)
  at
  
 
 org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:158)
  at
  
 
 org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)
  at
  
 
 org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455)
  at
  
 
 org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
  at
  
 
 org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:533)
  at
  
 
 org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
  at
  
 
 

Questions about Replication Factor on solrcloud

2013-09-04 Thread Lisandro Montaño
Hi all,

 

I’m currently working on deploying a solrcloud distribution on CentOS
machines and wanted to have more guidance about Replication Factor
configuration.

 

I have configured two servers with solrcloud over tomcat and a third server
as zookeeper. I have configured successfully and have one server with
collection1 available and the other with collection1_Shard1_Replica1.

 

My questions are:

 

-  Can I have 1 shard and 2 replicas on two machines? What are the
limitations or considerations in defining this?

-  How does a replica work? (there is not too much info about it)

-  When I import data into collection1 it works properly, but when I
do it into collection1_Shard1_Replica1 it fails. Is that expected behavior?
(Maybe with a better definition of replicas I will understand it
better)

 

 

Thanks in advance for your help and guidance.

 

Regards,

Lisandro Montano
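As a side note, on Solr 4.x a one-shard, two-replica collection is normally created through the Collections API rather than by hand-defining cores; a sketch (host name and port are placeholders for one of your Tomcat nodes):

```
curl "http://host1:8080/solr/admin/collections?action=CREATE&name=collection1&numShards=1&replicationFactor=2"
```

With numShards=1 and replicationFactor=2, each machine holds a full copy of the index; updates sent to either replica are forwarded to the shard leader and then distributed to the other replica, which is why indexing directly against an inconsistently created replica core can misbehave.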

 



Re: SolrCloud 4.x hangs under high update volume

2013-09-04 Thread Tim Vaillancourt
Thanks guys! :)

Mark: this patch is much appreciated, I will try to test this shortly,
hopefully today.

For my curiosity/understanding, could someone explain to me quickly what
locks SolrCloud takes on updates? Was I on to something that more shards
decrease the chance for locking?

Secondly, I was wondering if someone could summarize what this patch
'fixes'? I'm not too familiar with Java and the solr codebase (working on
that though :D).

Cheers,

Tim



On 4 September 2013 09:52, Mark Miller markrmil...@gmail.com wrote:

 There is an issue if I remember right, but I can't find it right now.

 If anyone that has the problem could try this patch, that would be very
 helpful: http://pastebin.com/raw.php?i=aaRWwSGP

 - Mark


 On Wed, Sep 4, 2013 at 8:04 AM, Markus Jelsma markus.jel...@openindex.io
 wrote:

  Hi Mark,
 
  Got an issue to watch?
 
  Thanks,
  Markus
 
  -Original message-
   From:Mark Miller markrmil...@gmail.com
   Sent: Wednesday 4th September 2013 16:55
   To: solr-user@lucene.apache.org
   Subject: Re: SolrCloud 4.x hangs under high update volume
  
   I'm going to try and fix the root cause for 4.5 - I've suspected what
 it
  is since early this year, but it's never personally been an issue, so
 it's
  rolled along for a long time.
  
   Mark
  
   Sent from my iPhone
  
   On Sep 3, 2013, at 4:30 PM, Tim Vaillancourt t...@elementspace.com
  wrote:
  
Hey guys,
   
I am looking into an issue we've been having with SolrCloud since the
beginning of our testing, all the way from 4.1 to 4.3 (haven't tested
  4.4.0
yet). I've noticed other users with this same issue, so I'd really
  like to
get to the bottom of it.
   
Under a very, very high rate of updates (2000+/sec), after 1-12 hours
  we
see stalled transactions that snowball to consume all Jetty threads
 in
  the
JVM. This eventually causes the JVM to hang with most threads waiting
  on
the condition/stack provided at the bottom of this message. At this
  point
SolrCloud instances then start to see their neighbors (who also have
  all
threads hung) as down w/Connection Refused, and the shards become
  down
in state. Sometimes a node or two survives and just returns 503s no
  server
hosting shard errors.
   
As a workaround/experiment, we have tuned the number of threads
 sending
updates to Solr, as well as the batch size (we batch updates from
  client -
solr), and the Soft/Hard autoCommits, all to no avail. Turning off
Client-to-Solr batching (1 update = 1 call to Solr), which also did
 not
help. Certain combinations of update threads and batch sizes seem to
mask/help the problem, but not resolve it entirely.
   
Our current environment is the following:
- 3 x Solr 4.3.1 instances in Jetty 9 w/Java 7.
- 3 x Zookeeper instances, external Java 7 JVM.
- 1 collection, 3 shards, 2 replicas (each node is a leader of 1
 shard
  and
a replica of 1 shard).
- Log4j 1.2 for Solr logs, set to WARN. This log has no movement on a
  good
day.
- 5000 max jetty threads (well above what we use when we are
 healthy),
Linux-user threads ulimit is 6000.
- Occurs under Jetty 8 or 9 (many versions).
- Occurs under Java 1.6 or 1.7 (several minor versions).
- Occurs under several JVM tunings.
- Everything seems to point to Solr itself, and not a Jetty or Java
  version
(I hope I'm wrong).
   
The stack trace that is holding up all my Jetty QTP threads is the
following, which seems to be waiting on a lock that I would very much
  like
to understand further:
   
java.lang.Thread.State: WAITING (parking)
   at sun.misc.Unsafe.park(Native Method)
   - parking to wait for  0x0007216e68d8 (a
java.util.concurrent.Semaphore$NonfairSync)
   at
 java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
   at
   
 
 java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:834)
   at
   
 
 java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:994)
   at
   
 
 java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1303)
   at java.util.concurrent.Semaphore.acquire(Semaphore.java:317)
   at
   
 
 org.apache.solr.util.AdjustableSemaphore.acquire(AdjustableSemaphore.java:61)
   at
   
 
 org.apache.solr.update.SolrCmdDistributor.submit(SolrCmdDistributor.java:418)
   at
   
 
 org.apache.solr.update.SolrCmdDistributor.submit(SolrCmdDistributor.java:368)
   at
   
 
 org.apache.solr.update.SolrCmdDistributor.flushAdds(SolrCmdDistributor.java:300)
   at
   
 
 org.apache.solr.update.SolrCmdDistributor.finish(SolrCmdDistributor.java:96)
   at
   
 
 org.apache.solr.update.processor.DistributedUpdateProcessor.doFinish(DistributedUpdateProcessor.java:462)
   at
   
 
 

Re: cleanup after OutOfMemoryError

2013-09-04 Thread Mark Miller
I don't know that there is any 'safe' thing you can do other than restart -
but if I were to try anything, I would use true for rollback.

- Mark


On Wed, Sep 4, 2013 at 9:44 AM, Ryan McKinley ryan...@gmail.com wrote:

 I have an application where I am calling DirectUpdateHandler2 directly
 with:

   update.addDoc(cmd);

 This will sometimes hit:

 java.lang.OutOfMemoryError: Java heap space
 at org.apache.lucene.util.UnicodeUtil.UTF16toUTF8(UnicodeUtil.java:248)
 at org.apache.lucene.store.DataOutput.writeString(DataOutput.java:234)
 at

 org.apache.lucene.codecs.compressing.CompressingStoredFieldsWriter.writeField(CompressingStoredFieldsWriter.java:273)
 at

 org.apache.lucene.index.StoredFieldsProcessor.finishDocument(StoredFieldsProcessor.java:126)
 at

 org.apache.lucene.index.TwoStoredFieldsConsumers.finishDocument(TwoStoredFieldsConsumers.java:65)
 at

 org.apache.lucene.index.DocFieldProcessor.finishDocument(DocFieldProcessor.java:264)
 at

 org.apache.lucene.index.DocumentsWriterPerThread.updateDocument(DocumentsWriterPerThread.java:283)
 at

 org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:432)
 at
 org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1513)
 at

 org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:212)
 at voyager.index.zmq.IndexingRunner.apply(IndexingRunner.java:303)

 and then a little while later:

 auto commit error...:java.lang.IllegalStateException: this writer hit an
 OutOfMemoryError; cannot commit
 at

 org.apache.lucene.index.IndexWriter.prepareCommitInternal(IndexWriter.java:2726)
 at
 org.apache.lucene.index.IndexWriter.commitInternal(IndexWriter.java:2897)
 at org.apache.lucene.index.IndexWriter.commit(IndexWriter.java:2872)
 at

 org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:549)
 at org.apache.solr.update.CommitTracker.run(CommitTracker.java:216)
 at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)


 Is there anything I can/should do to clean up after the OOME?  At a minimum
 I do not want any new requests using the same IndexWriter.  Should I use:


   catch(OutOfMemoryError ex) {

update.getCommitTracker().cancelPendingCommit();
  update.newIndexWriter(false);
  ...

 or perhaps 'true' for rollback?

 Thanks
 Ryan




-- 
- Mark
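Putting Mark's suggestion together with the snippet from the original mail, the handler might look roughly like this. This is an untested sketch using only the method names already quoted in this thread; `true` asks newIndexWriter to roll back the writer that hit the OOME instead of trying to keep its uncommitted state:

```java
try {
    update.addDoc(cmd);
} catch (OutOfMemoryError oom) {
    // The writer that hit OOME refuses further commits, so cancel the
    // pending autocommit and open a fresh writer, rolling back the old one.
    update.getCommitTracker().cancelPendingCommit();
    update.newIndexWriter(true);   // true = rollback uncommitted changes
    throw oom;                     // don't swallow the error
}
```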


Re: SolrCloud 4.x hangs under high update volume

2013-09-04 Thread Mark Miller
The 'lock' or semaphore was added to cap the number of threads that would be 
used. Previously, the number of threads in use could spike to many, many 
thousands on heavy updates. A limit on the number of outstanding requests was 
put in place to keep this from happening. Something like 16 * the number of 
hosts in the cluster.

I assume the deadlock comes from the fact that requests are of two kinds - 
forward to the leader and distrib updates from the leader to replicas. Forward 
to the leader actually waits for the leader to then distrib the updates to 
replicas before returning. I believe this is what can lead to deadlock. 

This is likely why the patch for the CloudSolrServer can help the situation - 
it removes the need to forward to the leader because it sends to the correct 
leader to begin with. Only useful if you are adding docs with CloudSolrServer 
though, and more like a workaround than a fix.

The patch uses a separate 'limiting' semaphore for the two cases.

- Mark
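The capping mechanism Mark describes can be illustrated with a plain JDK semaphore: a fixed number of permits bounds how many update tasks are in flight at once, the way SolrCmdDistributor's AdjustableSemaphore caps outstanding distributed-update requests. The class and method names below are made up for the demo; this is not Solr's actual code:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class BoundedUpdates {

    /** Runs 'tasks' fake updates through a semaphore with 'permits' slots
     *  and reports the highest number that ever ran concurrently. */
    static int run(int permits, int tasks) {
        Semaphore cap = new Semaphore(permits);
        AtomicInteger inFlight = new AtomicInteger();
        AtomicInteger maxInFlight = new AtomicInteger();
        ExecutorService pool = Executors.newFixedThreadPool(tasks);
        for (int i = 0; i < tasks; i++) {
            pool.submit(() -> {
                try {
                    cap.acquire();                 // blocks once the cap is hit
                    try {
                        int now = inFlight.incrementAndGet();
                        maxInFlight.accumulateAndGet(now, Math::max);
                        Thread.sleep(5);           // stand-in for the real update
                    } finally {
                        inFlight.decrementAndGet();
                        cap.release();
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }
        pool.shutdown();
        try {
            pool.awaitTermination(30, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return maxInFlight.get();
    }

    public static void main(String[] args) {
        // With 4 permits and 32 submitted tasks, at most 4 overlap.
        System.out.println("max concurrent updates: " + run(4, 32));
    }
}
```

Note this only shows the cap. The deadlock arises because forward-to-leader requests hold permits while waiting on distrib requests that need permits from the same pool, which is why the patch splits the two request kinds onto separate semaphores.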

On Sep 4, 2013, at 10:22 AM, Tim Vaillancourt t...@elementspace.com wrote:

 Thanks guys! :)
 
 Mark: this patch is much appreciated, I will try to test this shortly, 
 hopefully today.
 
 For my curiosity/understanding, could someone explain to me quickly what 
 locks SolrCloud takes on updates? Was I on to something that more shards 
 decrease the chance for locking?
 
 Secondly, I was wondering if someone could summarize what this patch 'fixes'? 
 I'm not too familiar with Java and the solr codebase (working on that though 
 :D).
 
 Cheers,
 
 Tim
 
 
 
 On 4 September 2013 09:52, Mark Miller markrmil...@gmail.com wrote:
 There is an issue if I remember right, but I can't find it right now.
 
 If anyone that has the problem could try this patch, that would be very
 helpful: http://pastebin.com/raw.php?i=aaRWwSGP
 
 - Mark
 
 
 On Wed, Sep 4, 2013 at 8:04 AM, Markus Jelsma 
  markus.jel...@openindex.io wrote:
 
  Hi Mark,
 
  Got an issue to watch?
 
  Thanks,
  Markus
 
  -Original message-
   From:Mark Miller markrmil...@gmail.com
   Sent: Wednesday 4th September 2013 16:55
   To: solr-user@lucene.apache.org
   Subject: Re: SolrCloud 4.x hangs under high update volume
  
   I'm going to try and fix the root cause for 4.5 - I've suspected what it
  is since early this year, but it's never personally been an issue, so it's
  rolled along for a long time.
  
   Mark
  
   Sent from my iPhone
  
   On Sep 3, 2013, at 4:30 PM, Tim Vaillancourt t...@elementspace.com
  wrote:
  
Hey guys,
   
I am looking into an issue we've been having with SolrCloud since the
beginning of our testing, all the way from 4.1 to 4.3 (haven't tested
  4.4.0
yet). I've noticed other users with this same issue, so I'd really
  like to
get to the bottom of it.
   
Under a very, very high rate of updates (2000+/sec), after 1-12 hours
  we
see stalled transactions that snowball to consume all Jetty threads in
  the
JVM. This eventually causes the JVM to hang with most threads waiting
  on
the condition/stack provided at the bottom of this message. At this
  point
SolrCloud instances then start to see their neighbors (who also have
  all
threads hung) as down w/Connection Refused, and the shards become
  down
in state. Sometimes a node or two survives and just returns 503s no
  server
hosting shard errors.
   
As a workaround/experiment, we have tuned the number of threads sending
updates to Solr, as well as the batch size (we batch updates from
  client -
solr), and the Soft/Hard autoCommits, all to no avail. Turning off
Client-to-Solr batching (1 update = 1 call to Solr), which also did not
help. Certain combinations of update threads and batch sizes seem to
mask/help the problem, but not resolve it entirely.
   
Our current environment is the following:
- 3 x Solr 4.3.1 instances in Jetty 9 w/Java 7.
- 3 x Zookeeper instances, external Java 7 JVM.
- 1 collection, 3 shards, 2 replicas (each node is a leader of 1 shard
  and
a replica of 1 shard).
- Log4j 1.2 for Solr logs, set to WARN. This log has no movement on a
  good
day.
- 5000 max jetty threads (well above what we use when we are healthy),
Linux-user threads ulimit is 6000.
- Occurs under Jetty 8 or 9 (many versions).
- Occurs under Java 1.6 or 1.7 (several minor versions).
- Occurs under several JVM tunings.
- Everything seems to point to Solr itself, and not a Jetty or Java
  version
(I hope I'm wrong).
   
The stack trace that is holding up all my Jetty QTP threads is the
following, which seems to be waiting on a lock that I would very much
  like
to understand further:
   
java.lang.Thread.State: WAITING (parking)
   at sun.misc.Unsafe.park(Native Method)
   - parking to wait for  0x0007216e68d8 (a
java.util.concurrent.Semaphore$NonfairSync)
   at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
   at
  

cleanup after OutOfMemoryError

2013-09-04 Thread Ryan McKinley
I have an application where I am calling DirectUpdateHandler2 directly with:

  update.addDoc(cmd);

This will sometimes hit:

java.lang.OutOfMemoryError: Java heap space
at org.apache.lucene.util.UnicodeUtil.UTF16toUTF8(UnicodeUtil.java:248)
at org.apache.lucene.store.DataOutput.writeString(DataOutput.java:234)
at
org.apache.lucene.codecs.compressing.CompressingStoredFieldsWriter.writeField(CompressingStoredFieldsWriter.java:273)
at
org.apache.lucene.index.StoredFieldsProcessor.finishDocument(StoredFieldsProcessor.java:126)
at
org.apache.lucene.index.TwoStoredFieldsConsumers.finishDocument(TwoStoredFieldsConsumers.java:65)
at
org.apache.lucene.index.DocFieldProcessor.finishDocument(DocFieldProcessor.java:264)
at
org.apache.lucene.index.DocumentsWriterPerThread.updateDocument(DocumentsWriterPerThread.java:283)
at
org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:432)
at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1513)
at
org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:212)
at voyager.index.zmq.IndexingRunner.apply(IndexingRunner.java:303)

and then a little while later:

auto commit error...:java.lang.IllegalStateException: this writer hit an
OutOfMemoryError; cannot commit
at
org.apache.lucene.index.IndexWriter.prepareCommitInternal(IndexWriter.java:2726)
at org.apache.lucene.index.IndexWriter.commitInternal(IndexWriter.java:2897)
at org.apache.lucene.index.IndexWriter.commit(IndexWriter.java:2872)
at
org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:549)
at org.apache.solr.update.CommitTracker.run(CommitTracker.java:216)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)


Is there anything I can/should do to clean up after the OOME?  At a minimum
I do not want any new requests using the same IndexWriter.  Should I use:


  catch(OutOfMemoryError ex) {

   update.getCommitTracker().cancelPendingCommit();
 update.newIndexWriter(false);
 ...

or perhaps 'true' for rollback?

Thanks
Ryan


subindex

2013-09-04 Thread Peyman Faratin
Hi

Is there a way to build a new (smaller) index from an existing (larger) index 
where the smaller index contains a subset of the fields of the larger index? 

thank you
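One possible approach is to stream the stored fields out of the big index and re-add only the ones you want. This is an untested sketch against the Lucene 4.x API; it can only carry over *stored* values (indexed-but-unstored fields would have to be re-analyzed from source data), and the paths and field types are illustrative:

```java
// Copy a subset of stored fields from big-index into a fresh small-index.
Directory src = FSDirectory.open(new File("/path/to/big-index"));
Directory dst = FSDirectory.open(new File("/path/to/small-index"));
IndexReader reader = DirectoryReader.open(src);
IndexWriter writer = new IndexWriter(dst,
    new IndexWriterConfig(Version.LUCENE_42, new StandardAnalyzer(Version.LUCENE_42)));

Set<String> wanted = new HashSet<String>(Arrays.asList("id", "title"));
Bits liveDocs = MultiFields.getLiveDocs(reader);
for (int i = 0; i < reader.maxDoc(); i++) {
    if (liveDocs != null && !liveDocs.get(i)) continue;   // skip deleted docs
    Document stored = reader.document(i, wanted);         // load only those fields
    Document copy = new Document();
    for (IndexableField f : stored.getFields()) {
        copy.add(new TextField(f.name(), f.stringValue(), Field.Store.YES));
    }
    writer.addDocument(copy);
}
writer.close();
reader.close();
```

If the documents are in Solr with all source fields stored, simply re-indexing into a second core with a smaller schema achieves the same thing with less code.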

RE: SolrCloud 4.x hangs under high update volume

2013-09-04 Thread Markus Jelsma
Hi Mark,

Got an issue to watch?

Thanks,
Markus
 
-Original message-
 From:Mark Miller markrmil...@gmail.com
 Sent: Wednesday 4th September 2013 16:55
 To: solr-user@lucene.apache.org
 Subject: Re: SolrCloud 4.x hangs under high update volume
 
 I'm going to try and fix the root cause for 4.5 - I've suspected what it is 
 since early this year, but it's never personally been an issue, so it's 
 rolled along for a long time. 
 
 Mark
 
 Sent from my iPhone
 
 On Sep 3, 2013, at 4:30 PM, Tim Vaillancourt t...@elementspace.com wrote:
 
  Hey guys,
  
  I am looking into an issue we've been having with SolrCloud since the
  beginning of our testing, all the way from 4.1 to 4.3 (haven't tested 4.4.0
  yet). I've noticed other users with this same issue, so I'd really like to
  get to the bottom of it.
  
  Under a very, very high rate of updates (2000+/sec), after 1-12 hours we
  see stalled transactions that snowball to consume all Jetty threads in the
  JVM. This eventually causes the JVM to hang with most threads waiting on
  the condition/stack provided at the bottom of this message. At this point
  SolrCloud instances then start to see their neighbors (who also have all
  threads hung) as down w/Connection Refused, and the shards become down
  in state. Sometimes a node or two survives and just returns 503s no server
  hosting shard errors.
  
  As a workaround/experiment, we have tuned the number of threads sending
  updates to Solr, as well as the batch size (we batch updates from client -
  solr), and the Soft/Hard autoCommits, all to no avail. Turning off
  Client-to-Solr batching (1 update = 1 call to Solr), which also did not
  help. Certain combinations of update threads and batch sizes seem to
  mask/help the problem, but not resolve it entirely.
  
  Our current environment is the following:
  - 3 x Solr 4.3.1 instances in Jetty 9 w/Java 7.
  - 3 x Zookeeper instances, external Java 7 JVM.
  - 1 collection, 3 shards, 2 replicas (each node is a leader of 1 shard and
  a replica of 1 shard).
  - Log4j 1.2 for Solr logs, set to WARN. This log has no movement on a good
  day.
  - 5000 max jetty threads (well above what we use when we are healthy),
  Linux-user threads ulimit is 6000.
  - Occurs under Jetty 8 or 9 (many versions).
  - Occurs under Java 1.6 or 1.7 (several minor versions).
  - Occurs under several JVM tunings.
  - Everything seems to point to Solr itself, and not a Jetty or Java version
  (I hope I'm wrong).
  
  The stack trace that is holding up all my Jetty QTP threads is the
  following, which seems to be waiting on a lock that I would very much like
  to understand further:
  
  java.lang.Thread.State: WAITING (parking)
 at sun.misc.Unsafe.park(Native Method)
 - parking to wait for  0x0007216e68d8 (a
  java.util.concurrent.Semaphore$NonfairSync)
 at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
 at
  java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:834)
 at
  java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:994)
 at
  java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1303)
 at java.util.concurrent.Semaphore.acquire(Semaphore.java:317)
 at
  org.apache.solr.util.AdjustableSemaphore.acquire(AdjustableSemaphore.java:61)
 at
  org.apache.solr.update.SolrCmdDistributor.submit(SolrCmdDistributor.java:418)
 at
  org.apache.solr.update.SolrCmdDistributor.submit(SolrCmdDistributor.java:368)
 at
  org.apache.solr.update.SolrCmdDistributor.flushAdds(SolrCmdDistributor.java:300)
 at
  org.apache.solr.update.SolrCmdDistributor.finish(SolrCmdDistributor.java:96)
 at
  org.apache.solr.update.processor.DistributedUpdateProcessor.doFinish(DistributedUpdateProcessor.java:462)
 at
  org.apache.solr.update.processor.DistributedUpdateProcessor.finish(DistributedUpdateProcessor.java:1178)
 at
  org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:83)
 at
  org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
 at org.apache.solr.core.SolrCore.execute(SolrCore.java:1820)
 at
  org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:656)
 at
  org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:359)
 at
  org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:155)
 at
  org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1486)
 at
  org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:503)
 at
  org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:138)
 at
  org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:564)
 at
  

Re: Numeric fields and payload

2013-09-04 Thread PETER LENAHAN
Chris Hostetter hossman_lucene at fucit.org writes:

 
 
 : is it possible to store (text) payload to numeric fields (class 
 : solr.TrieDoubleField)?  My goal is to store measure units to numeric 
 : features - e.g. '1.5 cm' - and to use faceted search with these fields. 
 : But the field type doesn't allow analyzers to add the payload data. I 
 : want to avoid database access to load the units. I'm using Solr 4.2 .
 
  I'm not sure if it's possible to add payloads to Trie fields, but even if 
  there is, I don't think you really want that for your use case -- I think it 
  would make a lot more sense to normalize your units so you do consistent 
  sorting, range queries, and faceting on the values regardless of whether 
  it's 100cm or 1000mm or 1m.
 
 -Hoss
 
 

Hoss, what you suggest may be fine for specific units, but for monetary 
values with formatting it is not realistic. $10,000.00 would require 
formatting the number to display it. It would be much easier to store the 
formatted string as a payload.


Peter Lenahan
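Hoss's normalize-at-index-time suggestion can be sketched in a few lines: convert "1.5 cm" style values to one canonical unit for a solr.TrieDoubleField, and keep the original text in a separate stored field for display (instead of a payload) — which also covers Peter's "$10,000.00" case, since the formatted string is simply stored as-is. Field names like length_m and length_display are hypothetical:

```java
import java.util.HashMap;
import java.util.Map;

public class UnitNormalizer {
    private static final Map<String, Double> TO_METERS = new HashMap<String, Double>();
    static {
        TO_METERS.put("mm", 0.001);
        TO_METERS.put("cm", 0.01);
        TO_METERS.put("m", 1.0);
    }

    /** Parses a value like "1.5 cm" and returns it in meters. */
    static double toMeters(String raw) {
        String[] parts = raw.trim().split("\\s+");
        return Double.parseDouble(parts[0]) * TO_METERS.get(parts[1]);
    }

    public static void main(String[] args) {
        // Index length_m=0.015 for sorting/faceting/range queries, and store
        // length_display="1.5 cm" (or "$10,000.00" for money) for rendering.
        System.out.println(toMeters("1.5 cm"));
        System.out.println(toMeters("1000 mm"));   // same quantity as "1 m"
    }
}
```

The point is that 100cm, 1000mm, and 1m all index to the same double, so faceting and range queries behave, while the display field carries whatever formatting the source had.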



Little XsltResponseWriter documentation bug (Attn: Wiki Admin)

2013-09-04 Thread Dmitri Popov
Hi,

http://wiki.apache.org/solr/XsltResponseWriter (and reference manual PDF
too) become out of date:

In configuration section

<queryResponseWriter
  name="xslt"
  class="org.apache.solr.request.XSLTResponseWriter">
  <int name="xsltCacheLifetimeSeconds">5</int>
</queryResponseWriter>

class name

org.apache.solr.request.XSLTResponseWriter

should be replaced by

org.apache.solr.response.XSLTResponseWriter

Otherwise ClassNotFoundException happens. Change is result of
https://issues.apache.org/jira/browse/SOLR-1602 as far as I see.

Apparently I can't update that page myself; please could someone else do that?

Thanks!


RE: Solr highlighting fragment issue

2013-09-04 Thread Bryan Loofbourrow
 I’m having some issues with Solr search results (using Solr 1.4). I
have enabled highlighting of searched text (hl=true) and set the fragment
size to 500 (hl.fragsize=500) in the search query.

Below are the results (screen shot) shown when I searched for the term
‘grandfather’ (2 results are displayed).

Now I have a couple of problems with this.

1.   In the search results the keyword appears inconsistently
towards the start/end of the text. I’d like to control the number of
characters appearing before and after the keyword match (highlighted term).
More specifically I’d like to get the keyword match somewhere around the
middle of the resultant text.

2.   The total number of characters appearing in the search result
never equals the fragment size I specified (500 characters). It varies
widely (for example 408 or 520).

Please share your thoughts on achieving the above 2 results. 

I can’t see your screenshot, but it doesn’t really matter.



If I remember correctly how this stuff works, I think you’re going to have
a challenge getting where you want to get. In your position, I would push
back on both of those requirements rather than try to solve the problem.



For (1), the issue is that, IIRC, the highlighter breaks up your documents
into fragments BEFORE it knows where the matches are. I’d think you’d have
to pretty seriously recast the algorithm to get the result you want.



For (2), it may well be that you could tune the fragmenter to get closer to
your desired number of characters, either writing your own, or using the
available regexes and whatnot. But getting an exact number of characters
does not seem reasonable, because I’m pretty sure that there is a
constraint that a matching term must appear in its entirety in one fragment
– and also that sometimes fragments are concatenated. Imagine, for example,
a matched phrase where the start of the phrase is in one fragment, and the
end is in another. Which goes back to the first point.



So if you absolutely must have both of these (and the second one is
strange, since it implies that your fragments will often start and end in
the middles of words), then I guess you would need to rewrite the
fragmenting algorithm to drive fragmenting from the matches.



-- Bryan
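If neither requirement can be dropped, one workaround is to fragment on the client side from the stored field value rather than relying on Solr's fragmenter: locate the match and cut an exact-width window centered on it. A minimal sketch, assuming you fetch the stored text alongside the highlight and only need the first match:

```python
# Sketch: client-side fragmenting from the stored field text, centering
# an exact-width window on the first match. This sidesteps Solr's
# fragmenter rather than configuring it.
def centered_fragment(text, term, width=500):
    pos = text.lower().find(term.lower())
    if pos < 0:
        return text[:width]  # no match: fall back to the leading chars
    # Center the window on the middle of the matched term...
    start = max(0, pos + len(term) // 2 - width // 2)
    # ...but clamp it so it never runs past the end of the text.
    start = min(start, max(0, len(text) - width))
    return text[start:start + width]

frag = centered_fragment("x" * 1000 + " grandfather " + "y" * 1000, "grandfather")
print(len(frag))  # -> 500
```

Note this only handles a single literal term; phrase and multi-term matches would need the analyzed query terms, which is exactly the complexity the built-in highlighter hides.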


Re: Little XsltResponseWriter documentation bug (Attn: Wiki Admin)

2013-09-04 Thread Upayavira
It's a wiki. Can't you correct it?

Upayavira



Re: Little XsltResponseWriter documentation bug (Attn: Wiki Admin)

2013-09-04 Thread Dmitri Popov
Upayavira,

I could edit that page myself, but I need to be confirmed as a human according to
http://wiki.apache.org/solr/FrontPage#How_to_edit_this_Wiki

My wiki account name is 'pin' just in case.

On Wed, Sep 4, 2013 at 5:27 PM, Upayavira u...@odoko.co.uk wrote:

 It's a wiki. Can't you correct it?

 Upayavira




Re: SolrCloud 4.x hangs under high update volume

2013-09-04 Thread Tim Vaillancourt
Thanks so much for the explanation Mark, I owe you one (many)!

We have this on our high-TPS cluster and will run it through its paces
tomorrow. I'll provide any feedback I can; more soon! :D

Cheers,

Tim


Invalid Version when slave node pull replication from master node

2013-09-04 Thread YouPeng Yang
Hi solr users,

   I'm testing replication within SolrCloud.
   I just uncommented the replication section separately on the master and
slave nodes.
   The replication section setting on the master node:
<lst name="master">
  <str name="replicateAfter">commit</str>
  <str name="replicateAfter">startup</str>
  <str name="confFiles">schema.xml,stopwords.txt</str>
</lst>
 and on the slave node:
<lst name="slave">
  <str name="masterUrl">http://10.7.23.124:8080/solr/#/</str>
  <str name="pollInterval">00:00:50</str>
</lst>

   After startup, an error comes out on the slave node:
80110110 [snapPuller-70-thread-1] ERROR org.apache.solr.handler.SnapPuller
?.Master at: http://10.7.23.124:8080/solr/#/ is not available. Index fetch
failed. Exception: Invalid version (expected 2, but 60) or the data in not
in 'javabin' format


 Could anyone help me solve the problem?


regards


Re: Invalid Version when slave node pull replication from master node

2013-09-04 Thread YouPeng Yang
Hi again

  I'm using Solr 4.4.



Re: Invalid Version when slave node pull replication from master node

2013-09-04 Thread YouPeng Yang
Hi all
   I solved the problem by adding the coreName explicitly, according to
http://wiki.apache.org/solr/SolrReplication#Replicating_solrconfig.xml.

   But I want to make sure: is it necessary to set the coreName
explicitly? And is there any SolrJ API to pull the replication on the slave
node from the master node?


regards
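As a possible alternative to SolrJ, the stock /replication handler also accepts command=fetchindex over plain HTTP to trigger a one-off pull on the slave. A sketch, where host, port and core name are placeholders rather than values confirmed in this thread:

```python
# Sketch: build the URL that asks a slave core's /replication handler to
# pull the index now (command=fetchindex). Host/port/core are placeholders.
def fetchindex_url(slave_host, core, master_url=None):
    url = "http://%s/solr/%s/replication?command=fetchindex" % (slave_host, core)
    if master_url:
        # Optional one-off override of the configured masterUrl
        # (a real caller should URL-encode this value).
        url += "&masterUrl=" + master_url
    return url

print(fetchindex_url("10.7.23.124:8080", "collection1"))
# the request itself could then be fired with urllib.request.urlopen(url)
```

Note the master URL in a fetchindex or masterUrl parameter should point at the core's base URL (e.g. .../solr/corename), not the admin UI address ending in /#/.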




Re: unknown _stream_source_info while indexing rich doc in solr

2013-09-04 Thread Nutan
Yes sir, I did restart Tomcat.
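For reference, the ignored_* dynamicField only resolves if the "ignored" fieldType it references is actually defined in schema.xml; the stock example schema declares it roughly as below, so that definition is worth double-checking in addition to reloading the core after edits:

```xml
<!-- Approximate declaration from the stock example schema; the
     dynamicField ignored_* only works if this fieldType exists. -->
<fieldtype name="ignored" stored="false" indexed="false"
           multiValued="true" class="solr.StrField"/>
```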


On Wed, Sep 4, 2013 at 6:27 PM, Jack Krupansky-2 [via Lucene] 
ml-node+s472066n4088181...@n3.nabble.com wrote:

 Did you restart Solr after editing config and schema?

 -- Jack Krupansky









--
View this message in context: 
http://lucene.472066.n3.nabble.com/unknown-stream-source-info-while-indexing-rich-doc-in-solr-tp4088136p4088295.html
Sent from the Solr - User mailing list archive at Nabble.com.