unknown _stream_source_info while indexing rich doc in solr

2013-09-04 Thread Nutan
I am using Solr 4.2 on Windows 7.
My schema is:

<field name="id" type="string" indexed="true" stored="true" required="true"/>
<field name="author" type="string" indexed="true" stored="true" multiValued="true"/>
<field name="comments" type="text" indexed="true" stored="true" multiValued="false"/>
<field name="keywords" type="text" indexed="true" stored="true" multiValued="false"/>
<field name="contents" type="text" indexed="true" stored="true" multiValued="false"/>
<field name="title" type="text" indexed="true" stored="true" multiValued="false"/>
<field name="revision_number" type="string" indexed="true" stored="true" multiValued="false"/>
<dynamicField name="ignored_*" type="ignored" indexed="false" stored="false" multiValued="true"/>

solrconfig.xml:

<requestHandler name="/update/extract" class="org.apache.solr.handler.extraction.ExtractingRequestHandler">
  <lst name="defaults">
    <str name="fmap.content">contents</str>
    <str name="lowernames">true</str>
    <str name="uprefix">ignored_</str>
    <str name="captureAttr">true</str>
  </lst>
</requestHandler>

When I execute:
curl "http://localhost:8080/solr/update/extract?literal.id=1&commit=true" -F "myfile=@abc.txt"

I get the error: unknown field 'ignored_stream_source_info'.

I referred to Solr Cookbook 3.1 and Solr Cookbook 4, but the error is not resolved.
Please help me.
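For the uprefix mapping above to work, the schema must also define the "ignored" field type that the dynamic field references. The stock Solr example schema declares it roughly as follows (fragment shown for reference; adjust to your setup):

<!-- assumed fragment, as in the stock example schema.xml -->
<fieldType name="ignored" stored="false" indexed="false" multiValued="true" class="solr.StrField" />

If that type is missing, or the config/schema was edited without restarting Solr, the extracted stream metadata fields cannot be mapped and the "unknown field" error appears.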




--
View this message in context: 
http://lucene.472066.n3.nabble.com/unknown-stream-source-info-while-indexing-rich-doc-in-solr-tp4088136.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: dataimporter tika doesn't extract certain div

2013-09-04 Thread Andreas Owen
So could I just nest it in an XPathEntityProcessor to filter the HTML, or is
there something like XPath for Tika?

<entity name="htm" processor="XPathEntityProcessor" url="${rec.file}" forEach="/div[@id='content']" dataSource="main">
    <entity name="tika" processor="TikaEntityProcessor" url="${htm}" dataSource="dataUrl" onError="skip" htmlMapper="identity" format="html">
        <field column="text" />
    </entity>
</entity>

But now I don't know how to pass the text to Tika. What do I put in url and
dataSource?


On 3. Sep 2013, at 5:56 PM, Shalin Shekhar Mangar wrote:

 I don't know much about Tika but in the example data-config.xml that
 you posted, the xpath attribute on the field "text" won't work
 because the xpath attribute is used only by an XPathEntityProcessor.
 
 On Thu, Aug 29, 2013 at 10:20 PM, Andreas Owen a...@conx.ch wrote:
 I want Tika to only index the content in <div id="content">...</div> for the 
 field "text". Unfortunately it's indexing the whole page. Can't XPath do this?
 
 data-config.xml:
 
 <dataConfig>
     <dataSource type="BinFileDataSource" name="data"/>
     <dataSource type="BinURLDataSource" name="dataUrl"/>
     <dataSource type="URLDataSource" name="main"/>
     <document>
         <entity name="rec" processor="XPathEntityProcessor"
                 url="http://127.0.0.1/tkb/internet/docImportUrl.xml" forEach="/docs/doc"
                 dataSource="main"> <!-- transformer="script:GenerateId" -->
             <field column="title" xpath="//title" />
             <field column="id" xpath="//id" />
             <field column="file" xpath="//file" />
             <field column="path" xpath="//path" />
             <field column="url" xpath="//url" />
             <field column="Author" xpath="//author" />

             <entity name="tika" processor="TikaEntityProcessor"
                     url="${rec.path}${rec.file}" dataSource="dataUrl" onError="skip"
                     htmlMapper="identity" format="html">
                 <field column="text" xpath="//div[@id='content']" />
             </entity>
         </entity>
     </document>
 </dataConfig>
 
 
 
 -- 
 Regards,
 Shalin Shekhar Mangar.



Re: DIH + Solr Cloud

2013-09-04 Thread Tim Vaillancourt

Hey Alejandro,

I guess it depends on what you call "more than one instance".

The request handlers are at the core level, not the Solr instance/global 
level, and within each of those cores you could have one or more data 
import handlers.


Most setups have 1 DIH per core at the handler location "/dataimport", 
but I believe you could have several, e.g. "/dataimport2", "/dataimport3", 
if you had different DIH configs for each handler.
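In solrconfig.xml that might look something like the following (the handler names and config file names here are illustrative, not from the original message):

<!-- two independent DIH instances in one core; config file names are hypothetical -->
<requestHandler name="/dataimport" class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
    <str name="config">dih-config-1.xml</str>
  </lst>
</requestHandler>

<requestHandler name="/dataimport2" class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
    <str name="config">dih-config-2.xml</str>
  </lst>
</requestHandler>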


Within a single data import handler, you can have several entities, 
which are what tell the DIH how to fetch and index the data. 
What you can do here is have several entities that construct your index, 
and execute those entities with several separate HTTP calls to the DIH, 
thus creating more than one instance of the DIH process within one core 
and one DIH handler.


ie:

curl "http://localhost:8983/solr/core1/dataimport?command=full-import&entity=suppliers"

curl "http://localhost:8983/solr/core1/dataimport?command=full-import&entity=parts"

curl "http://localhost:8983/solr/core1/dataimport?command=full-import&entity=companies"



http://wiki.apache.org/solr/DataImportHandler#Commands

Cheers,

Tim

On 03/09/13 09:25 AM, Alejandro Calbazana wrote:

Hi,

Quick question about data import handlers in Solr cloud.  Does anyone use
more than one instance to support the DIH process?  Or is the typical setup
to have one box set up as only the DIH and keep this responsibility outside
of the Solr cloud environment?  I'm just trying to get a picture of how this
is typically deployed.

Thanks!

Alejandro



Re: Change the score of a document based on the *value* of a multifield using dismax

2013-09-04 Thread danielitos85
Thanks a lot David. 
I will try it ;)





--
View this message in context: 
http://lucene.472066.n3.nabble.com/Change-the-score-of-a-document-based-on-the-value-of-a-multifield-tp4087503p4088145.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: dataimporter tika doesn't extract certain div

2013-09-04 Thread Shalin Shekhar Mangar
No that wouldn't work. It seems that you probably need a custom
Transformer to extract the right div content. I do not know if
TikaEntityProcessor supports such a thing.
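To illustrate what such a custom Transformer would have to do (strip everything outside the target div from the HTML that Tika returns), here is a standalone sketch using only the JDK. It uses a naive regex, so it is illustration only; a real implementation should extend org.apache.solr.handler.dataimport.Transformer and use a proper HTML parser, since regexes break on nested divs:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DivExtractor {
    // Naive pattern: assumes a well-formed, non-nested <div id="content"> block.
    private static final Pattern CONTENT_DIV =
            Pattern.compile("<div id=\"content\">(.*?)</div>", Pattern.DOTALL);

    // Returns the inner text of the content div, or "" if it is absent.
    public static String extract(String html) {
        Matcher m = CONTENT_DIV.matcher(html);
        return m.find() ? m.group(1).trim() : "";
    }

    public static void main(String[] args) {
        String page = "<html><body><div id=\"nav\">menu</div>"
                + "<div id=\"content\">Only this part should be indexed.</div>"
                + "</body></html>";
        System.out.println(extract(page));
    }
}
```

In a real DIH setup this logic would run in the transformRow() method, rewriting the "text" column before the document is indexed.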

On Wed, Sep 4, 2013 at 12:38 PM, Andreas Owen a...@conx.ch wrote:
 So could I just nest it in an XPathEntityProcessor to filter the HTML, or is
 there something like XPath for Tika?

 <entity name="htm" processor="XPathEntityProcessor" url="${rec.file}" forEach="/div[@id='content']" dataSource="main">
     <entity name="tika" processor="TikaEntityProcessor" url="${htm}" dataSource="dataUrl" onError="skip" htmlMapper="identity" format="html">
         <field column="text" />
     </entity>
 </entity>

 But now I don't know how to pass the text to Tika. What do I put in url and
 dataSource?


 On 3. Sep 2013, at 5:56 PM, Shalin Shekhar Mangar wrote:

 I don't know much about Tika but in the example data-config.xml that
 you posted, the xpath attribute on the field "text" won't work
 because the xpath attribute is used only by an XPathEntityProcessor.

 On Thu, Aug 29, 2013 at 10:20 PM, Andreas Owen a...@conx.ch wrote:
 I want Tika to only index the content in <div id="content">...</div> for
 the field "text". Unfortunately it's indexing the whole page. Can't XPath do
 this?

 data-config.xml:

 <dataConfig>
     <dataSource type="BinFileDataSource" name="data"/>
     <dataSource type="BinURLDataSource" name="dataUrl"/>
     <dataSource type="URLDataSource" name="main"/>
     <document>
         <entity name="rec" processor="XPathEntityProcessor"
                 url="http://127.0.0.1/tkb/internet/docImportUrl.xml" forEach="/docs/doc"
                 dataSource="main"> <!-- transformer="script:GenerateId" -->
             <field column="title" xpath="//title" />
             <field column="id" xpath="//id" />
             <field column="file" xpath="//file" />
             <field column="path" xpath="//path" />
             <field column="url" xpath="//url" />
             <field column="Author" xpath="//author" />

             <entity name="tika" processor="TikaEntityProcessor"
                     url="${rec.path}${rec.file}" dataSource="dataUrl" onError="skip"
                     htmlMapper="identity" format="html">
                 <field column="text" xpath="//div[@id='content']" />
             </entity>
         </entity>
     </document>
 </dataConfig>



 --
 Regards,
 Shalin Shekhar Mangar.




-- 
Regards,
Shalin Shekhar Mangar.


Re: Measuring SOLR performance

2013-09-04 Thread Dmitry Kan
Hi Roman,

Ok, I will. Thanks!

Cheers,
Dmitry


On Tue, Sep 3, 2013 at 4:46 PM, Roman Chyla roman.ch...@gmail.com wrote:

 Hi Dmitry,

 Thanks for the feedback. Yes, it is indeed jmeter issue (or rather, the
 issue of the plugin we use to generate charts). You may want to use the
 github for whatever comes next

 https://github.com/romanchyla/solrjmeter/issues

 Cheers,

   roman


 On Tue, Sep 3, 2013 at 7:54 AM, Dmitry Kan solrexp...@gmail.com wrote:

  Hi Roman,
 
  Thanks, the --additionalSolrParams was just what I wanted and works fine.
 
  BTW, if you have some special bug tracking forum for the tool, I'm
 happy
  to submit questions / bug reports there. Otherwise, this email list is ok
  (for me at least).
 
  One other thing I have noticed in the err logs was a series of messages
 of
  this sort upon generating the perf test report. Seems to be jmeter
 related
  (the err messages disappear, if extra lib dir is present under ext
  directory).
 
  java.lang.Throwable: Could not access
  /home/dmitry/projects/lab/solrjmeter7/solrjmeter/jmeter/lib/ext/lib
  at
 
 kg.apc.cmd.UniversalRunner.buildUpdatedClassPath(UniversalRunner.java:109)
  at kg.apc.cmd.UniversalRunner.clinit(UniversalRunner.java:55)
  at
 
 kg.apc.cmd.UniversalRunner.buildUpdatedClassPath(UniversalRunner.java:109)
  at kg.apc.cmd.UniversalRunner.clinit(UniversalRunner.java:55)
 
  at
 
 kg.apc.cmd.UniversalRunner.buildUpdatedClassPath(UniversalRunner.java:109)
  at kg.apc.cmd.UniversalRunner.clinit(UniversalRunner.java:55)
 
 
 
  On Tue, Sep 3, 2013 at 2:50 AM, Roman Chyla roman.ch...@gmail.com
 wrote:
 
   Hi Dmitry,
  
   If it is something you want to pass with every request (which is my use
   case), you can pass it as additional solr params, eg.
  
   python solrjmeter
  
  
 
 --additionalSolrParams=fq=other_field:bar+facet=true+facet.field=facet_field_name
   
  
   the string should be url encoded.
  
   If it is something that changes with every request, you should modify
 the
   jmeter test. If you open/load it with jmeter GUI, in the HTTP request
   processor you can define other additional fields to pass with the
  request.
   These values can come from the CSV file, you'll see an example how to
 use
   that when you open the test definition file.
  
   Cheers,
  
 roman
  
  
  
  
   On Mon, Sep 2, 2013 at 3:12 PM, Dmitry Kan solrexp...@gmail.com
 wrote:
  
Hi Erick,
   
Agree, this is perfectly fine to mix them in solr. But my question is
   about
solrjmeter input query format. Just couldn't find a suitable example
 on
   the
solrjmeter's github.
   
Dmitry
   
   
   
On Mon, Sep 2, 2013 at 5:40 PM, Erick Erickson 
  erickerick...@gmail.com
wrote:
   
 filter and facet queries can be freely intermixed, it's not a
  problem.
 What problem are you seeing when you try this?

 Best,
 Erick


 On Mon, Sep 2, 2013 at 7:46 AM, Dmitry Kan solrexp...@gmail.com
   wrote:

  Hi Roman,
 
  What's the format for running the facet+filter queries?
 
  Would something like this work:
 
  field:foo  =50  fq=other_field:bar facet=true
 facet.field=facet_field_name
 
 
  Thanks,
  Dmitry
 
 
 
  On Fri, Aug 23, 2013 at 2:34 PM, Dmitry Kan 
 solrexp...@gmail.com
 wrote:
 
   Hi Roman,
  
   With adminPath=/admin or adminPath=/admin/cores, no.
Interestingly
   enough, though, I can access
   http://localhost:8983/solr/statements/admin/system
  
   But I can access http://localhost:8983/solr/admin/cores, only
  when
 with
   adminPath=/admin/cores (which suggests that this is the right
   value
 to
  be
   used for cores), and not with adminPath=/admin.
  
    Bottom line, this core configuration is not self-evident.
  
   Dmitry
  
  
  
  
   On Fri, Aug 23, 2013 at 4:18 AM, Roman Chyla 
   roman.ch...@gmail.com
  wrote:
  
   Hi Dmitry,
   So it seems solrjmeter should not assume the adminPath - and
   perhaps
  needs
   to be passed as an argument. When you set the adminPath, are
 you
able
 to
   access localhost:8983/solr/statements/admin/cores ?
  
   roman
  
  
   On Wed, Aug 21, 2013 at 7:36 AM, Dmitry Kan 
  solrexp...@gmail.com
   
  wrote:
  
Hi Roman,
   
I have noticed a difference with different solr.xml config
contents.
  It
   is
probably legit, but thought to let you know (tests run on
  fresh
   checkout as
of today).
   
As mentioned before, I have two cores configured in
 solr.xml.
  If
the
   file
is:
   
[code]
<solr persistent="false">
    
  <!--
  adminPath: RequestHandler path to manage cores.
  If 'null' (or absent), cores will not be manageable via
   

Strange behaviour with single word and phrase

2013-09-04 Thread Alistair Young
I wonder if anyone could point me in the right direction please?

If I search on the phrase "the toolkit" I get hits containing that phrase but 
also hits that have the word 'the' before the word 'toolkit', no matter how far 
apart they are.

Also, if I search on the word 'the' there are no hits at all.

Thanks,

Alistair

-
mov eax,1
mov ebx,0
int 80


Re: Starting Solr in Tomcat with specifying ZK host(s)

2013-09-04 Thread maephisto
Thanks Shawn!

Indeed, setting JAVA_OPTS and restarting Tomcat did the trick.
Currently I'm exploring and experimenting with SolrCloud, thus I used only
one ZK.
For a production environment your suggestion would, of course, be mandatory.
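The setting in question is the zkHost system property; for a Tomcat-hosted Solr it typically goes into Tomcat's bin/setenv.sh (the hostnames and ports below are placeholders, not from the original message):

```shell
# bin/setenv.sh -- hostnames/ports are placeholders; adjust to your ensemble
export JAVA_OPTS="$JAVA_OPTS -DzkHost=zk1.example.com:2181,zk2.example.com:2181,zk3.example.com:2181"
```

For production, list every node of the ZooKeeper ensemble so Solr can survive a single ZK node going down.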



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Starting-Solr-in-Tomcat-with-specifying-ZK-host-s-tp4087916p4088164.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Indexing pdf files - question.

2013-09-04 Thread Nutan Shinde
My solrconfig.xml is:

<requestHandler name="/update/extract" class="solr.extraction.ExtractingRequestHandler">
  <lst name="defaults">
    <str name="fmap.content">desc</str> <!-- to map this field of my table, which is defined as shown below in schema.xml -->
    <str name="lowernames">true</str>
    <str name="uprefix">attr_</str>
    <str name="captureAttr">true</str>
  </lst>
</requestHandler>

<lib dir="../../extract" regex=".*\.jar" />
 

Schema.xml:

<fields>
  <field name="doc_id" type="integer" indexed="true" stored="true" multiValued="false"/>
  <field name="name" type="text" indexed="true" stored="true" multiValued="false"/>
  <field name="path" type="text" indexed="true" stored="true" multiValued="false"/>
  <field name="desc" type="text_split" indexed="true" stored="true" multiValued="false"/>
</fields>

<types>
  <fieldType name="string" class="solr.StrField" />
  <fieldType name="integer" class="solr.IntField" />
  <fieldType name="text" class="solr.TextField" />
</types>

<dynamicField name="*_i" type="integer" indexed="true" stored="true"/>

<uniqueKey>doc_id</uniqueKey>

 

I have created an "extract" directory, copied all required .jar and solr-cell
jar files into it, and given its path in the <lib> tag in solrconfig.xml.

When I try out this (on Windows 7):

curl "http://localhost:8080/solr/update/extract?literal.doc_id=1&commit=true" -F "myfile=@solr-word.pdf"

I get "/solr/update/extract is not available" and sometimes I get an "access
denied" error.

I tried resolving it through the net, but in vain, as all the solutions are
related to Linux; I'm working on Windows.

Please help me and provide solutions related to Windows.

I referred to Apache_solr_4_Cookbook.

Thanks a lot.



solr performance against oracle

2013-09-04 Thread Sergio Stateri
Hi,

I'm trying to change the data access in the company where I work from
Oracle to Solr. So I ran some tests, like this:

In Oracle:

private void go() throws Exception {
    Class.forName("oracle.jdbc.driver.OracleDriver");
    Connection conn = DriverManager.getConnection("XXX");
    PreparedStatement pstmt = conn.prepareStatement("SELECT DS_ROTEIRO FROM cco_roteiro_reg_venda WHERE CD_ROTEIRO=93100689");
    Date initialTime = new Date();
    ResultSet rs = pstmt.executeQuery();
    rs.next();
    String desc = rs.getString(1);
    System.out.println("total time: " + (new Date().getTime() - initialTime.getTime()) + " ms");
    System.out.println(desc);
    rs.close();
    pstmt.close();
    conn.close();
}



And in Solr:

private void go() throws Exception {
    String baseUrl = "http://localhost:8983/solr/";
    this.solrServerUrl = "http://localhost:8983/solr/roteiros/";
    server = new HttpSolrServer(solrUrl);
    String docId = AddOneRoteiroToCollection.docId;
    HttpSolrServer solr = new HttpSolrServer(baseUrl);
    SolrServer solrServer = new HttpSolrServer(solrServerUrl);

    solr.setRequestWriter(new BinaryRequestWriter());
    SolrQuery query = new SolrQuery();
    query.setQuery("(id:" + docId + ")"); // search by id
    query.addField("id");
    query.addField("descricaoRoteiro");

    extrairEApresentarResultados(query);
}

private void extrairEApresentarResultados(SolrQuery query) throws
        SolrServerException {
    Date initialTime = new Date();
    QueryResponse rsp = server.query(query);
    SolrDocumentList docs = rsp.getResults();
    long now = new Date().getTime() - initialTime.getTime(); // HERE I'M CHECKING THE SOLR RESPONSE TIME
    for (SolrDocument solrDocument : docs) {
        System.out.println(solrDocument);
    }
    System.out.println("Total de documentos encontrados: " + docs.size());
    System.out.println("Tempo total: " + now + " ms");
}


descricaoRoteiro is the same data that I'm getting in both, using the PK
CD_ROTEIRO, which is in Solr under the name "id" (it's the same data).
Solr is on the same machine, and Solr and Oracle have the same number of
records (around 800 thousand).

Solr always returns the data in around 150~200 ms (from localhost), but Oracle
returns in around 20 ms (and the Oracle server is in another company; I'm using
a dedicated link to access it).

How can I tell my managers that I'd like to use Solr? I saw that filters
in Solr take around 6~10 ms, but they're a query inside another query
that's returned previously.


Thanks for any help. I'd like so much to use Solr, but I really don't know how
to explain this to my managers.
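One thing worth checking before drawing conclusions: a single timed call measures connection setup, JIT warm-up and cold caches as much as it measures the engine. A fairer comparison warms up first and reports the median over many runs. A minimal harness sketch (the Runnable stands in for the real Solr or Oracle call; it is not from the original code):

```java
import java.util.Arrays;

public class LatencyProbe {
    // Runs the query a few times unmeasured (warm-up), then times `runs`
    // executions and returns the median, which is robust to outliers.
    public static long medianNanos(Runnable query, int warmup, int runs) {
        for (int i = 0; i < warmup; i++) query.run();
        long[] samples = new long[runs];
        for (int i = 0; i < runs; i++) {
            long t0 = System.nanoTime();
            query.run();
            samples[i] = System.nanoTime() - t0;
        }
        Arrays.sort(samples);
        return samples[runs / 2];
    }

    public static void main(String[] args) {
        // Placeholder workload; swap in the Solr query or JDBC call.
        long median = medianNanos(() -> Math.sqrt(42.0), 100, 101);
        System.out.println("median ns: " + median);
    }
}
```

Comparing medians measured this way, on the same machine and over the same network path, removes most of the bias in the single-shot numbers quoted above.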


-- 
Sergio Stateri Jr.
stat...@gmail.com


Re: solr performance against oracle

2013-09-04 Thread Andrea Gazzarini
You said nothing about your environments (e.g. operating systems, what 
kind of Oracle installation you have, what kind of Solr installation, 
how much data in the database, how many documents in the index, RAM for Solr, 
for Oracle, for the OS, and in general hardware... and so on).


Anyway... a migration from Oracle to Solr? That is, you're going to throw 
Oracle out the window and completely replace it with Solr? I would 
consider other aspects before your performance test... unless you 
have one flat table in Oracle, you should explain to your manager that 
there's a lot of work that needs to be done for that kind of migration 
(e.g. collecting all query requirements, denormalization).


Best,
Gazza


On 09/04/2013 02:06 PM, Sergio Stateri wrote:

Hi,

I'm trying to change the data access in the company where I work from
Oracle to Solr. So I ran some tests, like this:

In Oracle:

private void go() throws Exception {
    Class.forName("oracle.jdbc.driver.OracleDriver");
    Connection conn = DriverManager.getConnection("XXX");
    PreparedStatement pstmt = conn.prepareStatement("SELECT DS_ROTEIRO FROM cco_roteiro_reg_venda WHERE CD_ROTEIRO=93100689");
    Date initialTime = new Date();
    ResultSet rs = pstmt.executeQuery();
    rs.next();
    String desc = rs.getString(1);
    System.out.println("total time: " + (new Date().getTime() - initialTime.getTime()) + " ms");
    System.out.println(desc);
    rs.close();
    pstmt.close();
    conn.close();
}



And in Solr:

private void go() throws Exception {
    String baseUrl = "http://localhost:8983/solr/";
    this.solrServerUrl = "http://localhost:8983/solr/roteiros/";
    server = new HttpSolrServer(solrUrl);
    String docId = AddOneRoteiroToCollection.docId;
    HttpSolrServer solr = new HttpSolrServer(baseUrl);
    SolrServer solrServer = new HttpSolrServer(solrServerUrl);

    solr.setRequestWriter(new BinaryRequestWriter());
    SolrQuery query = new SolrQuery();
    query.setQuery("(id:" + docId + ")"); // search by id
    query.addField("id");
    query.addField("descricaoRoteiro");

    extrairEApresentarResultados(query);
}

private void extrairEApresentarResultados(SolrQuery query) throws
        SolrServerException {
    Date initialTime = new Date();
    QueryResponse rsp = server.query(query);
    SolrDocumentList docs = rsp.getResults();
    long now = new Date().getTime() - initialTime.getTime(); // HERE I'M CHECKING THE SOLR RESPONSE TIME
    for (SolrDocument solrDocument : docs) {
        System.out.println(solrDocument);
    }
    System.out.println("Total de documentos encontrados: " + docs.size());
    System.out.println("Tempo total: " + now + " ms");
}


descricaoRoteiro is the same data that I'm getting in both, using the PK
CD_ROTEIRO, which is in Solr under the name "id" (it's the same data).
Solr is on the same machine, and Solr and Oracle have the same number of
records (around 800 thousand).

Solr always returns the data in around 150~200 ms (from localhost), but Oracle
returns in around 20 ms (and the Oracle server is in another company; I'm using
a dedicated link to access it).

How can I tell my managers that I'd like to use Solr? I saw that filters
in Solr take around 6~10 ms, but they're a query inside another query
that's returned previously.


Thanks for any help. I'd like so much to use Solr, but I really don't know how
to explain this to my managers.






Re: Strange behaviour with single word and phrase

2013-09-04 Thread Jack Krupansky
Do you have stop word filtering enabled? What does your field type look 
like?


If stop words are ignored, you will get exactly the behavior you described.
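A field type with stop word filtering typically looks like the following fragment (in the style of the stock example schema; your analyzer chain may differ):

<fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <!-- removes "the", "a", etc. at index and query time, producing the behavior described -->
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

With this in place, "the" is never indexed at all (hence zero hits), and in the phrase query it only contributes a position gap rather than a required term.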

-- Jack Krupansky

-Original Message- 
From: Alistair Young

Sent: Wednesday, September 04, 2013 6:57 AM
To: solr-user@lucene.apache.org
Subject: Strange behaviour with single word and phrase

I wonder if anyone could point me in the right direction please?

If I search on the phrase "the toolkit" I get hits containing that phrase 
but also hits that have the word 'the' before the word 'toolkit', no matter 
how far apart they are.


Also, if I search on the word 'the' there are no hits at all.

Thanks,

Alistair

-
mov eax,1
mov ebx,0
int 80 



Re: unknown _stream_source_info while indexing rich doc in solr

2013-09-04 Thread Jack Krupansky

Did you restart Solr after editing config and schema?

-- Jack Krupansky

-Original Message- 
From: Nutan

Sent: Wednesday, September 04, 2013 3:07 AM
To: solr-user@lucene.apache.org
Subject: unknown _stream_source_info while indexing rich doc in solr

I am using Solr 4.2 on Windows 7.
My schema is:

<field name="id" type="string" indexed="true" stored="true" required="true"/>
<field name="author" type="string" indexed="true" stored="true" multiValued="true"/>
<field name="comments" type="text" indexed="true" stored="true" multiValued="false"/>
<field name="keywords" type="text" indexed="true" stored="true" multiValued="false"/>
<field name="contents" type="text" indexed="true" stored="true" multiValued="false"/>
<field name="title" type="text" indexed="true" stored="true" multiValued="false"/>
<field name="revision_number" type="string" indexed="true" stored="true" multiValued="false"/>
<dynamicField name="ignored_*" type="ignored" indexed="false" stored="false" multiValued="true"/>

solrconfig.xml:

<requestHandler name="/update/extract" class="org.apache.solr.handler.extraction.ExtractingRequestHandler">
  <lst name="defaults">
    <str name="fmap.content">contents</str>
    <str name="lowernames">true</str>
    <str name="uprefix">ignored_</str>
    <str name="captureAttr">true</str>
  </lst>
</requestHandler>

When I execute:
curl "http://localhost:8080/solr/update/extract?literal.id=1&commit=true" -F "myfile=@abc.txt"

I get the error: unknown field 'ignored_stream_source_info'.

I referred to Solr Cookbook 3.1 and Solr Cookbook 4, but the error is not resolved.
Please help me.




--
View this message in context: 
http://lucene.472066.n3.nabble.com/unknown-stream-source-info-while-indexing-rich-doc-in-solr-tp4088136.html
Sent from the Solr - User mailing list archive at Nabble.com. 



RE: Solr Cloud hangs when replicating updates

2013-09-04 Thread Greg Walters
Kevin,

Take a look at 
http://lucene.472066.n3.nabble.com/updating-docs-in-solr-cloud-hangs-td4067388.html
 and https://issues.apache.org/jira/browse/SOLR-4816. I had the same issue that 
you're reporting for a while then I applied the patch from SOLR-4816 to my 
clients and the problems went away. If you don't feel like applying the patch 
it looks like it should be included in the release of version 4.5. Also note 
that the problem happens more frequently when the replication factor is greater 
than 1.
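The hang in these threads shows up as update threads parked in Semaphore.acquire() inside SolrCmdDistributor (see the jstack output below). The failure mode is easy to reproduce in isolation: once every permit is held by a request that cannot complete, because the peer it forwards to is itself blocked, each new acquire() parks forever. A toy sketch with plain java.util.concurrent, nothing Solr-specific; tryAcquire with a timeout is used so the demo itself does not hang:

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

public class PermitExhaustion {
    public static void main(String[] args) throws InterruptedException {
        // Stands in for SolrCmdDistributor's bounded semaphore.
        Semaphore permits = new Semaphore(2);
        // All permits held by in-flight requests that can never finish,
        // because the replicas they forward to are blocked the same way.
        permits.acquire(2);
        // A new update request now needs a permit. With plain acquire()
        // this thread would park indefinitely, matching the stack trace.
        boolean got = permits.tryAcquire(100, TimeUnit.MILLISECONDS);
        System.out.println("new request got a permit: " + got);
    }
}
```

This is why the issue worsens with replicationFactor > 1: forwarding between replicas creates the cross-node dependency that keeps permits from ever being released.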

Thanks,
Greg

-Original Message-
From: kevin.osb...@cbsinteractive.com [mailto:kevin.osb...@cbsinteractive.com] 
On Behalf Of Kevin Osborn
Sent: Tuesday, September 03, 2013 4:16 PM
To: solr-user
Subject: Solr Cloud hangs when replicating updates

I was having problems updating SolrCloud with a large batch of records. The 
records are coming in bursts with lulls between updates.

At first, I just tried large updates of 100,000 records at a time.
Eventually, this caused Solr to hang. When hung, I can still query Solr.
But I cannot do any deletes or other updates to the index.

At first, my updates were going as SolrJ CSV posts. I have also tried local 
file updates and had similar results. I finally slowed things down to just use 
SolrJ's Update feature, which is basically just JavaBin. I am also sending over 
just 100 at a time in 10 threads. Again, it eventually hung.

Sometimes, Solr hangs in the first couple of chunks. Other times, it hangs 
right away.

These are my commit settings:

<autoCommit>
   <maxTime>15000</maxTime>
   <maxDocs>5000</maxDocs>
   <openSearcher>false</openSearcher>
</autoCommit>
<autoSoftCommit>
   <maxTime>3</maxTime>
</autoSoftCommit>

I have tried quite a few variations with the same results. I also tried various 
JVM settings with the same results. The only variable seems to be that reducing 
the cluster size from 2 to 1 is the only thing that helps.

I also did a jstack trace. I did not see any explicit deadlocks, but I did see 
quite a few threads in WAITING or TIMED_WAITING. It is typically something like 
this:

  java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for  0x00074039a450 (a
java.util.concurrent.Semaphore$NonfairSync)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
at
java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:834)
at
java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:994)
at
java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1303)
at java.util.concurrent.Semaphore.acquire(Semaphore.java:317)
at
org.apache.solr.util.AdjustableSemaphore.acquire(AdjustableSemaphore.java:61)
at
org.apache.solr.update.SolrCmdDistributor.submit(SolrCmdDistributor.java:418)
at
org.apache.solr.update.SolrCmdDistributor.submit(SolrCmdDistributor.java:368)
at
org.apache.solr.update.SolrCmdDistributor.flushAdds(SolrCmdDistributor.java:300)
at
org.apache.solr.update.SolrCmdDistributor.distribAdd(SolrCmdDistributor.java:139)
at
org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:474)
at
org.apache.solr.handler.loader.CSVLoaderBase.doAdd(CSVLoaderBase.java:395)
at
org.apache.solr.handler.loader.SingleThreadedCSVLoader.addDoc(CSVLoader.java:44)
at
org.apache.solr.handler.loader.CSVLoaderBase.load(CSVLoaderBase.java:364)
at org.apache.solr.handler.loader.CSVLoader.load(CSVLoader.java:31)
at
org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:92)
at
org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
at
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1904)
at
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:659)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:362)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:158)
at
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)
at
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455)
at
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
at
org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:533)
at
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
at
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1075)
at

RE: SolrCloud 4.x hangs under high update volume

2013-09-04 Thread Greg Walters
Tim,

Take a look at 
http://lucene.472066.n3.nabble.com/updating-docs-in-solr-cloud-hangs-td4067388.html
 and https://issues.apache.org/jira/browse/SOLR-4816. I had the same issue that 
you're reporting for a while then I applied the patch from SOLR-4816 to my 
clients and the problems went away. If you don't feel like applying the patch 
it looks like it should be included in the release of version 4.5. Also note 
that the problem happens more frequently when the replication factor is greater 
than 1.

Thanks,
Greg

-Original Message-
From: Tim Vaillancourt [mailto:t...@elementspace.com] 
Sent: Tuesday, September 03, 2013 6:31 PM
To: solr-user@lucene.apache.org
Subject: SolrCloud 4.x hangs under high update volume

Hey guys,

I am looking into an issue we've been having with SolrCloud since the beginning 
of our testing, all the way from 4.1 to 4.3 (haven't tested 4.4.0 yet). I've 
noticed other users with this same issue, so I'd really like to get to the 
bottom of it.

Under a very, very high rate of updates (2000+/sec), after 1-12 hours we see 
stalled transactions that snowball to consume all Jetty threads in the JVM. 
This eventually causes the JVM to hang with most threads waiting on the 
condition/stack provided at the bottom of this message. At this point SolrCloud 
instances start to see their neighbors (who also have all threads hung) as 
down with "Connection Refused", and the shards go into "down" state. Sometimes 
a node or two survives and just returns 503 "no server hosting shard" errors.

As a workaround/experiment, we have tuned the number of threads sending updates 
to Solr, as well as the batch size (we batch updates from client to Solr), and 
the soft/hard autoCommits, all to no avail. We also turned off client-to-Solr 
batching (1 update = 1 call to Solr), which did not help either. Certain 
combinations of update threads and batch sizes seem to mask/help the problem, 
but not resolve it entirely.

Our current environment is the following:
- 3 x Solr 4.3.1 instances in Jetty 9 w/Java 7.
- 3 x Zookeeper instances, external Java 7 JVM.
- 1 collection, 3 shards, 2 replicas (each node is a leader of 1 shard and a 
replica of 1 shard).
- Log4j 1.2 for Solr logs, set to WARN. This log has no movement on a good day.
- 5000 max jetty threads (well above what we use when we are healthy), 
Linux-user threads ulimit is 6000.
- Occurs under Jetty 8 or 9 (many versions).
- Occurs under Java 1.6 or 1.7 (several minor versions).
- Occurs under several JVM tunings.
- Everything seems to point to Solr itself, and not a Jetty or Java version (I 
hope I'm wrong).

The stack trace that is holding up all my Jetty QTP threads is the following, 
which seems to be waiting on a lock that I would very much like to understand 
further:

java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for  0x0007216e68d8 (a
java.util.concurrent.Semaphore$NonfairSync)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
at
java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:834)
at
java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:994)
at
java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1303)
at java.util.concurrent.Semaphore.acquire(Semaphore.java:317)
at
org.apache.solr.util.AdjustableSemaphore.acquire(AdjustableSemaphore.java:61)
at
org.apache.solr.update.SolrCmdDistributor.submit(SolrCmdDistributor.java:418)
at
org.apache.solr.update.SolrCmdDistributor.submit(SolrCmdDistributor.java:368)
at
org.apache.solr.update.SolrCmdDistributor.flushAdds(SolrCmdDistributor.java:300)
at
org.apache.solr.update.SolrCmdDistributor.finish(SolrCmdDistributor.java:96)
at
org.apache.solr.update.processor.DistributedUpdateProcessor.doFinish(DistributedUpdateProcessor.java:462)
at
org.apache.solr.update.processor.DistributedUpdateProcessor.finish(DistributedUpdateProcessor.java:1178)
at
org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:83)
at
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1820)
at
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:656)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:359)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:155)
at
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1486)
at
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:503)
at
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:138)
at
org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:564)
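The parked frames above sit in Solr's AdjustableSemaphore, which caps the number
of in-flight distributed update requests. As a toy illustration of the failure
mode only (class and method names below are invented; this is not Solr's code),
a bounded semaphore whose permits are never released leaves every later
submitter parked in acquire(), exactly like the QTP threads in the trace:

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

// Toy model of the pattern in the stack trace above.
// A submit() must take a permit before forwarding an update; if permits
// are never returned (e.g. the replica that should respond is itself
// stuck the same way), every further submitter parks in acquire().
public class SemaphoreStallDemo {

    // Attempt `attempts` submits against `permits` permits that are never
    // released; returns how many got through before the rest stalled.
    static int countSuccessfulSubmits(int permits, int attempts, long timeoutMs)
            throws InterruptedException {
        Semaphore gate = new Semaphore(permits);
        int ok = 0;
        for (int i = 0; i < attempts; i++) {
            // Real code calls gate.acquire(), which blocks indefinitely;
            // tryAcquire with a timeout lets this demonstration terminate.
            if (gate.tryAcquire(timeoutMs, TimeUnit.MILLISECONDS)) {
                ok++;
            }
        }
        return ok;
    }

    public static void main(String[] args) throws InterruptedException {
        // With 2 permits and no releases, only the first 2 of 5 submits
        // succeed; the other 3 would park forever under plain acquire().
        System.out.println(countSuccessfulSubmits(2, 5, 10)); // prints 2
    }
}
```

In the real hang the release happens when the peer responds; if the peer's own
threads are parked the same way, no response ever arrives and the permits are
never returned.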
   

Re: Strange behaviour with single word and phrase

2013-09-04 Thread Alistair Young
Yep ignoring stop words. Thanks for the pointer.

Alistair

-
mov eax,1
mov ebx,0
int 80




On 04/09/2013 13:43, Jack Krupansky j...@basetechnology.com wrote:

Do you have stop word filtering enabled? What does your field type look
like?

If stop words are ignored, you will get exactly the behavior you
described.

-- Jack Krupansky

-Original Message-
From: Alistair Young
Sent: Wednesday, September 04, 2013 6:57 AM
To: solr-user@lucene.apache.org
Subject: Strange behaviour with single word and phrase

I wonder if anyone could point me in the right direction please?

If I search on the phrase "the toolkit" I get hits containing that phrase
but also hits that have the word 'the' before the word 'toolkit', no
matter how far apart they are.

Also, if I search on the word 'the' there are no hits at all.

Thanks,

Alistair

-
mov eax,1
mov ebx,0
int 80 
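For the archives: the behaviour Jack describes comes from a StopFilterFactory in
the field type's analyzer chain. A minimal sketch, modeled on the stock
text_general type (the type name and stopword file are assumptions about the
poster's schema):

```xml
<fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <!-- drops 'the' at index and query time: searching it alone finds
         nothing, and "the toolkit" matches any 'toolkit' after a gap -->
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

Removing the stop filter (and reindexing) makes stop words searchable again, at
the cost of a larger index and more common-term matches.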






Re: dataimporter tika doesn't extract certain div

2013-09-04 Thread Andreas Owen
Or could I use a filter in schema.xml where I define a fieldtype and use some
filter that understands xpath?

On 4. Sep 2013, at 11:52 AM, Shalin Shekhar Mangar wrote:

 No that wouldn't work. It seems that you probably need a custom
 Transformer to extract the right div content. I do not know if
 TikaEntityProcessor supports such a thing.
 
 On Wed, Sep 4, 2013 at 12:38 PM, Andreas Owen a...@conx.ch wrote:
 so could i just nest it in a XPathEntityProcessor to filter the html or is 
 there something like xpath for tika?
 
 <entity name="htm" processor="XPathEntityProcessor" url="${rec.file}"
 forEach="/div[@id='content']" dataSource="main">
<entity name="tika" processor="TikaEntityProcessor"
 url="${htm}" dataSource="dataUrl" onError="skip" htmlMapper="identity"
 format="html">
<field column="text" />
</entity>
</entity>
 
 but now i dont know how to pass the text to tika, what do i put in url and 
 datasource?
 
 
 On 3. Sep 2013, at 5:56 PM, Shalin Shekhar Mangar wrote:
 
 I don't know much about Tika but in the example data-config.xml that
 you posted, the xpath attribute on the field text won't work
 because the xpath attribute is used only by a XPathEntityProcessor.
 
 On Thu, Aug 29, 2013 at 10:20 PM, Andreas Owen a...@conx.ch wrote:
 I want tika to only index the content in <div id="content">...</div> for
 the field text. Unfortunately it's indexing the whole page. Can't xpath
 do this?
 
 data-config.xml:
 
 <dataConfig>
   <dataSource type="BinFileDataSource" name="data"/>
   <dataSource type="BinURLDataSource" name="dataUrl"/>
   <dataSource type="URLDataSource" name="main"/>
 <document>
   <entity name="rec" processor="XPathEntityProcessor"
 url="http://127.0.0.1/tkb/internet/docImportUrl.xml" forEach="/docs/doc"
 dataSource="main"> <!-- transformer="script:GenerateId" -->
   <field column="title" xpath="//title" />
   <field column="id" xpath="//id" />
   <field column="file" xpath="//file" />
   <field column="path" xpath="//path" />
   <field column="url" xpath="//url" />
   <field column="Author" xpath="//author" />

   <entity name="tika" processor="TikaEntityProcessor"
 url="${rec.path}${rec.file}" dataSource="dataUrl" onError="skip"
 htmlMapper="identity" format="html">
   <field column="text" xpath="//div[@id='content']" />

   </entity>
   </entity>
 </document>
 </dataConfig>
 
 
 
 --
 Regards,
 Shalin Shekhar Mangar.
 
 
 
 
 -- 
 Regards,
 Shalin Shekhar Mangar.
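A sketch of the custom Transformer route Shalin mentions, for the archives. DIH
accepts any class that exposes a public transformRow(Map) method (declared via
transformer="ContentDivTransformer" on the entity), so the core can be written
without a Solr dependency. Everything here is an assumption about the poster's
setup, and the regex only handles a non-nested div; a real implementation
should use an HTML parser:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical DIH transformer that keeps only the contents of
// <div id="content">...</div> in the "text" column of each row.
public class ContentDivTransformer {
    private static final Pattern CONTENT_DIV =
        Pattern.compile("<div id=\"content\">(.*?)</div>", Pattern.DOTALL);

    // DIH calls this for every row produced by the entity.
    public Object transformRow(Map<String, Object> row) {
        Object text = row.get("text");
        if (text != null) {
            Matcher m = CONTENT_DIV.matcher(text.toString());
            if (m.find()) {
                row.put("text", m.group(1).trim());
            }
        }
        return row;
    }

    // Small demonstration outside DIH.
    public static void main(String[] args) {
        Map<String, Object> row = new HashMap<>();
        row.put("text", "<html><div id=\"menu\">nav</div>"
            + "<div id=\"content\">only this</div></html>");
        new ContentDivTransformer().transformRow(row);
        System.out.println(row.get("text")); // prints: only this
    }
}
```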



Re: SolrCloud 4.x hangs under high update volume

2013-09-04 Thread Mark Miller
I'm going to try and fix the root cause for 4.5 - I've suspected what it is 
since early this year, but it's never personally been an issue, so it's rolled 
along for a long time. 

Mark

Sent from my iPhone

On Sep 3, 2013, at 4:30 PM, Tim Vaillancourt t...@elementspace.com wrote:

 Hey guys,
 
 I am looking into an issue we've been having with SolrCloud since the
 beginning of our testing, all the way from 4.1 to 4.3 (haven't tested 4.4.0
 yet). I've noticed other users with this same issue, so I'd really like to
 get to the bottom of it.
 
 Under a very, very high rate of updates (2000+/sec), after 1-12 hours we
 see stalled transactions that snowball to consume all Jetty threads in the
 JVM. This eventually causes the JVM to hang with most threads waiting on
 the condition/stack provided at the bottom of this message. At this point
 SolrCloud instances then start to see their neighbors (who also have all
 threads hung) as "down" w/Connection Refused, and the shards become "down"
 in state. Sometimes a node or two survives and just returns 503s "no server
 hosting shard" errors.
 
 As a workaround/experiment, we have tuned the number of threads sending
 updates to Solr, as well as the batch size (we batch updates from client -
 solr), and the Soft/Hard autoCommits, all to no avail. Turning off
 Client-to-Solr batching (1 update = 1 call to Solr), which also did not
 help. Certain combinations of update threads and batch sizes seem to
 mask/help the problem, but not resolve it entirely.
 
 Our current environment is the following:
 - 3 x Solr 4.3.1 instances in Jetty 9 w/Java 7.
 - 3 x Zookeeper instances, external Java 7 JVM.
 - 1 collection, 3 shards, 2 replicas (each node is a leader of 1 shard and
 a replica of 1 shard).
 - Log4j 1.2 for Solr logs, set to WARN. This log has no movement on a good
 day.
 - 5000 max jetty threads (well above what we use when we are healthy),
 Linux-user threads ulimit is 6000.
 - Occurs under Jetty 8 or 9 (many versions).
 - Occurs under Java 1.6 or 1.7 (several minor versions).
 - Occurs under several JVM tunings.
 - Everything seems to point to Solr itself, and not a Jetty or Java version
 (I hope I'm wrong).
 
 The stack trace that is holding up all my Jetty QTP threads is the
 following, which seems to be waiting on a lock that I would very much like
 to understand further:
 
 java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for  0x0007216e68d8 (a
 java.util.concurrent.Semaphore$NonfairSync)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
at
 java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:834)
at
 java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:994)
at
 java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1303)
at java.util.concurrent.Semaphore.acquire(Semaphore.java:317)
at
 org.apache.solr.util.AdjustableSemaphore.acquire(AdjustableSemaphore.java:61)
at
 org.apache.solr.update.SolrCmdDistributor.submit(SolrCmdDistributor.java:418)
at
 org.apache.solr.update.SolrCmdDistributor.submit(SolrCmdDistributor.java:368)
at
 org.apache.solr.update.SolrCmdDistributor.flushAdds(SolrCmdDistributor.java:300)
at
 org.apache.solr.update.SolrCmdDistributor.finish(SolrCmdDistributor.java:96)
at
 org.apache.solr.update.processor.DistributedUpdateProcessor.doFinish(DistributedUpdateProcessor.java:462)
at
 org.apache.solr.update.processor.DistributedUpdateProcessor.finish(DistributedUpdateProcessor.java:1178)
at
 org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:83)
at
 org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1820)
at
 org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:656)
at
 org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:359)
at
 org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:155)
at
 org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1486)
at
 org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:503)
at
 org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:138)
at
 org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:564)
at
 org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:213)
at
 org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1096)
at
 org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:432)
at
 org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:175)
at
 

Re: Boost by numFounds

2013-09-04 Thread Flavio Pompermaier
I found that what can do the trick for page-rank-like indexing is
externalFileField! Is there any helper to upload the external files to all
solr servers (in solr 3 and solrCloud)?
Or should I copy it to all solr instances' data folders and then reload their
cache?

On Sat, Aug 24, 2013 at 12:36 AM, Flavio Pompermaier
pomperma...@okkam.it wrote:

 Any help..? Is it possible to add this pagerank-like behaviour?
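For the archives: there is no built-in upload for external file fields; the
file has to be placed in each core's index data directory yourself (rsync/scp
or similar), named external_<fieldname>, and it is re-read when a new searcher
opens, so a commit or core reload after copying picks up new values. A sketch
with assumed field names:

```xml
<!-- schema.xml -->
<fieldType name="rankFile" class="solr.ExternalFileField"
           keyField="id" defVal="0" valType="pfloat"/>
<field name="rank" type="rankFile" indexed="false" stored="false"/>
```

and in <dataDir>/external_rank, one uniqueKey=value line per document:

```
doc1=2.5
doc2=0.7
```

The values are then usable in function queries (e.g. a boost function), which
gives the pagerank-like behaviour without reindexing the documents themselves.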




Re: SolrCloud 4.x hangs under high update volume

2013-09-04 Thread Kevin Osborn
I am having this issue as well. I did apply this patch. Unfortunately, it
did not resolve the issue in my case.


On Wed, Sep 4, 2013 at 7:01 AM, Greg Walters
gwalt...@sherpaanalytics.com wrote:

 Tim,

 Take a look at
 http://lucene.472066.n3.nabble.com/updating-docs-in-solr-cloud-hangs-td4067388.html and
 https://issues.apache.org/jira/browse/SOLR-4816. I had the same issue
 that you're reporting for a while then I applied the patch from SOLR-4816
 to my clients and the problems went away. If you don't feel like applying
 the patch it looks like it should be included in the release of version
 4.5. Also note that the problem happens more frequently when the
 replication factor is greater than 1.

 Thanks,
 Greg

 -Original Message-
 From: Tim Vaillancourt [mailto:t...@elementspace.com]
 Sent: Tuesday, September 03, 2013 6:31 PM
 To: solr-user@lucene.apache.org
 Subject: SolrCloud 4.x hangs under high update volume

 Hey guys,

 I am looking into an issue we've been having with SolrCloud since the
 beginning of our testing, all the way from 4.1 to 4.3 (haven't tested 4.4.0
 yet). I've noticed other users with this same issue, so I'd really like to
 get to the bottom of it.

 Under a very, very high rate of updates (2000+/sec), after 1-12 hours we
 see stalled transactions that snowball to consume all Jetty threads in the
 JVM. This eventually causes the JVM to hang with most threads waiting on
 the condition/stack provided at the bottom of this message. At this point
 SolrCloud instances then start to see their neighbors (who also have all
 threads hung) as down w/Connection Refused, and the shards become down
 in state. Sometimes a node or two survives and just returns 503s no
 server hosting shard errors.

 As a workaround/experiment, we have tuned the number of threads sending
 updates to Solr, as well as the batch size (we batch updates from client -
 solr), and the Soft/Hard autoCommits, all to no avail. Turning off
 Client-to-Solr batching (1 update = 1 call to Solr), which also did not
 help. Certain combinations of update threads and batch sizes seem to
 mask/help the problem, but not resolve it entirely.

 Our current environment is the following:
 - 3 x Solr 4.3.1 instances in Jetty 9 w/Java 7.
 - 3 x Zookeeper instances, external Java 7 JVM.
 - 1 collection, 3 shards, 2 replicas (each node is a leader of 1 shard and
 a replica of 1 shard).
 - Log4j 1.2 for Solr logs, set to WARN. This log has no movement on a good
 day.
 - 5000 max jetty threads (well above what we use when we are healthy),
 Linux-user threads ulimit is 6000.
 - Occurs under Jetty 8 or 9 (many versions).
 - Occurs under Java 1.6 or 1.7 (several minor versions).
 - Occurs under several JVM tunings.
 - Everything seems to point to Solr itself, and not a Jetty or Java
 version (I hope I'm wrong).

 The stack trace that is holding up all my Jetty QTP threads is the
 following, which seems to be waiting on a lock that I would very much like
 to understand further:

 java.lang.Thread.State: WAITING (parking)
 at sun.misc.Unsafe.park(Native Method)
 - parking to wait for  0x0007216e68d8 (a
 java.util.concurrent.Semaphore$NonfairSync)
 at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
 at

 java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:834)
 at

 java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:994)
 at

 java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1303)
 at java.util.concurrent.Semaphore.acquire(Semaphore.java:317)
 at

 org.apache.solr.util.AdjustableSemaphore.acquire(AdjustableSemaphore.java:61)
 at

 org.apache.solr.update.SolrCmdDistributor.submit(SolrCmdDistributor.java:418)
 at

 org.apache.solr.update.SolrCmdDistributor.submit(SolrCmdDistributor.java:368)
 at

 org.apache.solr.update.SolrCmdDistributor.flushAdds(SolrCmdDistributor.java:300)
 at

 org.apache.solr.update.SolrCmdDistributor.finish(SolrCmdDistributor.java:96)
 at

 org.apache.solr.update.processor.DistributedUpdateProcessor.doFinish(DistributedUpdateProcessor.java:462)
 at

 org.apache.solr.update.processor.DistributedUpdateProcessor.finish(DistributedUpdateProcessor.java:1178)
 at

 org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:83)
 at

 org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
 at org.apache.solr.core.SolrCore.execute(SolrCore.java:1820)
 at

 org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:656)
 at

 org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:359)
 at

 org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:155)
 at

 

Re: Solr Cloud hangs when replicating updates

2013-09-04 Thread Kevin Osborn
Thanks. If there is anything I can do to help you resolve this issue, let
me know.

-Kevin


On Wed, Sep 4, 2013 at 7:51 AM, Mark Miller markrmil...@gmail.com wrote:

 I'll look at fixing the root issue for 4.5. I've been putting it off for
 way too long.

 Mark

 Sent from my iPhone

 On Sep 3, 2013, at 2:15 PM, Kevin Osborn kevin.osb...@cbsi.com wrote:

  I was having problems updating SolrCloud with a large batch of records.
 The
  records are coming in bursts with lulls between updates.
 
  At first, I just tried large updates of 100,000 records at a time.
  Eventually, this caused Solr to hang. When hung, I can still query Solr.
  But I cannot do any deletes or other updates to the index.
 
  At first, my updates were going as SolrJ CSV posts. I have also tried
 local
  file updates and had similar results. I finally slowed things down to
 just
  use SolrJ's Update feature, which is basically just JavaBin. I am also
  sending over just 100 at a time in 10 threads. Again, it eventually hung.
 
  Sometimes, Solr hangs in the first couple of chunks. Other times, it
 hangs
  right away.
 
  These are my commit settings:
 
   <autoCommit>
     <maxTime>15000</maxTime>
     <maxDocs>5000</maxDocs>
     <openSearcher>false</openSearcher>
   </autoCommit>
   <autoSoftCommit>
     <maxTime>3</maxTime>
   </autoSoftCommit>
 
  I have tried quite a few variations with the same results. I also tried
  various JVM settings with the same results. The only variable seems to be
  that reducing the cluster size from 2 to 1 is the only thing that helps.
 
  I also did a jstack trace. I did not see any explicit deadlocks, but I
 did
  see quite a few threads in WAITING or TIMED_WAITING. It is typically
  something like this:
 
   java.lang.Thread.State: WAITING (parking)
 at sun.misc.Unsafe.park(Native Method)
 - parking to wait for  0x00074039a450 (a
  java.util.concurrent.Semaphore$NonfairSync)
 at
 java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
 at
 
 java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:834)
 at
 
 java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:994)
 at
 
 java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1303)
 at java.util.concurrent.Semaphore.acquire(Semaphore.java:317)
 at
 
 org.apache.solr.util.AdjustableSemaphore.acquire(AdjustableSemaphore.java:61)
 at
 
 org.apache.solr.update.SolrCmdDistributor.submit(SolrCmdDistributor.java:418)
 at
 
 org.apache.solr.update.SolrCmdDistributor.submit(SolrCmdDistributor.java:368)
 at
 
 org.apache.solr.update.SolrCmdDistributor.flushAdds(SolrCmdDistributor.java:300)
 at
 
 org.apache.solr.update.SolrCmdDistributor.distribAdd(SolrCmdDistributor.java:139)
 at
 
 org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:474)
 at
 
 org.apache.solr.handler.loader.CSVLoaderBase.doAdd(CSVLoaderBase.java:395)
 at
 
 org.apache.solr.handler.loader.SingleThreadedCSVLoader.addDoc(CSVLoader.java:44)
 at
  org.apache.solr.handler.loader.CSVLoaderBase.load(CSVLoaderBase.java:364)
 at
 org.apache.solr.handler.loader.CSVLoader.load(CSVLoader.java:31)
 at
 
 org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:92)
 at
 
 org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
 at
 
 org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
 at org.apache.solr.core.SolrCore.execute(SolrCore.java:1904)
 at
 
 org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:659)
 at
 
 org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:362)
 at
 
 org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:158)
 at
 
 org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)
 at
 
 org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455)
 at
 
 org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
 at
 
 org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:533)
 at
 
 org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
 at
 
 org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1075)
 at
  org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384)
 at
 
 org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
 at
 
 org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1009)
 at
 
 

Re: Solr Cloud hangs when replicating updates

2013-09-04 Thread Mark Miller
I'll look at fixing the root issue for 4.5. I've been putting it off for way
too long.

Mark 

Sent from my iPhone

On Sep 3, 2013, at 2:15 PM, Kevin Osborn kevin.osb...@cbsi.com wrote:

 I was having problems updating SolrCloud with a large batch of records. The
 records are coming in bursts with lulls between updates.
 
 At first, I just tried large updates of 100,000 records at a time.
 Eventually, this caused Solr to hang. When hung, I can still query Solr.
 But I cannot do any deletes or other updates to the index.
 
 At first, my updates were going as SolrJ CSV posts. I have also tried local
 file updates and had similar results. I finally slowed things down to just
 use SolrJ's Update feature, which is basically just JavaBin. I am also
 sending over just 100 at a time in 10 threads. Again, it eventually hung.
 
 Sometimes, Solr hangs in the first couple of chunks. Other times, it hangs
 right away.
 
 These are my commit settings:
 
  <autoCommit>
    <maxTime>15000</maxTime>
    <maxDocs>5000</maxDocs>
    <openSearcher>false</openSearcher>
  </autoCommit>
  <autoSoftCommit>
    <maxTime>3</maxTime>
  </autoSoftCommit>
 
 I have tried quite a few variations with the same results. I also tried
 various JVM settings with the same results. The only variable seems to be
 that reducing the cluster size from 2 to 1 is the only thing that helps.
 
 I also did a jstack trace. I did not see any explicit deadlocks, but I did
 see quite a few threads in WAITING or TIMED_WAITING. It is typically
 something like this:
 
  java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for  0x00074039a450 (a
 java.util.concurrent.Semaphore$NonfairSync)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
at
 java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:834)
at
 java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:994)
at
 java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1303)
at java.util.concurrent.Semaphore.acquire(Semaphore.java:317)
at
 org.apache.solr.util.AdjustableSemaphore.acquire(AdjustableSemaphore.java:61)
at
 org.apache.solr.update.SolrCmdDistributor.submit(SolrCmdDistributor.java:418)
at
 org.apache.solr.update.SolrCmdDistributor.submit(SolrCmdDistributor.java:368)
at
 org.apache.solr.update.SolrCmdDistributor.flushAdds(SolrCmdDistributor.java:300)
at
 org.apache.solr.update.SolrCmdDistributor.distribAdd(SolrCmdDistributor.java:139)
at
 org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:474)
at
 org.apache.solr.handler.loader.CSVLoaderBase.doAdd(CSVLoaderBase.java:395)
at
 org.apache.solr.handler.loader.SingleThreadedCSVLoader.addDoc(CSVLoader.java:44)
at
 org.apache.solr.handler.loader.CSVLoaderBase.load(CSVLoaderBase.java:364)
at org.apache.solr.handler.loader.CSVLoader.load(CSVLoader.java:31)
at
 org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:92)
at
 org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
at
 org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1904)
at
 org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:659)
at
 org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:362)
at
 org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:158)
at
 org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)
at
 org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455)
at
 org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
at
 org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:533)
at
 org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
at
 org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1075)
at
 org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384)
at
 org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
at
 org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1009)
at
 org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
at
 org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
at
 org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)
 
 It basically appears that Solr 

Need help on Joining and sorting syntax and limitations between multiple documents in solr-4.4.0

2013-09-04 Thread Sukanta Dey
Hi Team,

In my project I am going to use Apache Solr 4.4.0 for searching. While doing
that I need to join multiple Solr documents within the same core on a field
common across the documents.
I can join the documents successfully using the Solr 4.4.0 join syntax and it
returns the expected result, but my next requirement is to sort the returned
result by fields from the documents on the join's from side, which I have not
been able to do. Let me explain the problem in detail along with the files I
am using ...


1)  Files being used :

a.   Picklist_1.xml

--

<add><doc>

<field name="describedObjectId">t1324838</field>

<field name="describedObjectType">7</field>

<field name="picklistItemId">956</field>

<field name="siteId">130712901</field>

<field name="en">Draft</field>

<field name="gr">Draoft</field>

</doc></add>



b.  Picklist_2.xml

---

<add><doc>

<field name="describedObjectId">t1324837</field>

<field name="describedObjectType">7</field>

<field name="picklistItemId">87749</field>

<field name="siteId">130712901</field>

<field name="en">New</field>

<field name="gr">Neuo</field>

</doc></add>



c.   AssetID_1.xml

---

<add><doc>

<field name="def14227_picklist">t1324837</field>

<field name="describedObjectId">a180894808</field>

<field name="describedObjectType">1</field>

<field name="isMetadataComplete">true</field>

<field name="lastUpdateDate">2013-09-02T09:28:18Z</field>

<field name="ownerId">130713716</field>

<field name="siteId">130712901</field>

</doc></add>



d.  AssetID_2.xml



<add><doc>

<field name="def14227_picklist">t1324838</field>

<field name="describedObjectId">a171658357</field>

<field name="describedObjectType">1</field>

<field name="ownerId">130713716</field>

<field name="rGroupId">2283961</field>

<field name="rGroupId">2290309</field>

<field name="rGroupPermissionLevel">7</field>

<field name="rGroupPermissionLevel">7</field>

<field name="rRuleId">13503796</field>
<field name="rRuleId">15485964</field>

<field name="rUgpId">38052</field>

<field name="rUgpId">41133</field>

<field name="siteId">130712901</field>

</doc></add>



2)  Requirement:



i. We need a join between the files, using the
def14227_picklist field from AssetID_1.xml and AssetID_2.xml and the
describedObjectId field from Picklist_1.xml and Picklist_2.xml.

ii.   After joining we need all the fields from the
AssetID_*.xml files and the en, gr fields from the Picklist_*.xml files.

iii.  While joining we also need to sort the result based on the en
field value.



3)  I was trying with q={!join from=inner_id to=outer_id}zzz:vvv syntax 
but no luck.

Any help/suggestion would be appreciated.

Thanks,
Sukanta Dey
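For reference, with the documents above the join that returns the asset
documents would be (field names taken from the posted files, everything else
assumed):

```
q={!join from=describedObjectId to=def14227_picklist}en:Draft
```

One caveat that bears on requirements (ii) and (iii): Solr's query-time join
only returns documents from the "to" side, so fields from the picklist
documents (including en) are neither returned nor available for sorting.
Covering all three requirements usually means flattening, i.e. copying the
en/gr values onto the asset documents at index time.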






How to config SOLR server for spell check functionality

2013-09-04 Thread sebastian.manolescu
I want to implement the spell check functionality offered by Solr using a
MySql database, but I don't understand how.
Here is the basic flow of what I want to do.

I have a simple inputText (in jsf) and if I type the word shwo, the response
in the outputLabel should be show.

First of all I'm using the following tools and frameworks:

JBoss application server 6.1.
Eclipse
JPA
JSF(Primefaces)

Steps I've done until now:

Step 1: Download solr server from:
http://lucene.apache.org/solr/downloads.html Extract content.

Step 2: Add an environment variable:

Variable name: solr.solr.home  Variable value:
D:\JBOSS\solr-4.4.0\solr-4.4.0\example\solr --- where you have the solr
server

Step 3:

Open the solr war and add an env-entry to solr.war\WEB-INF\web.xml (the easy
way):

<env-entry>
  <env-entry-name>solr/home</env-entry-name>
  <env-entry-value>D:\JBOSS\solr-4.4.0\solr-4.4.0\example\solr</env-entry-value>
  <env-entry-type>java.lang.String</env-entry-type>
</env-entry>

OR import the project, make the change, and build the war.

Step 4: Browser: localhost:8080/solr/

And the solr console appears.

Until now all works well.

I have found some useful code (in my opinion) that logs the following:

[collection1] webapp=/solr path=/spell
params={spellcheck=on&q=whatever&wt=javabin&qt=/spell&version=2&spellcheck.build=true}
hits=0 status=0 QTime=16

Here is the code that gives the result from above:

SolrServer solr;
try {
    solr = new CommonsHttpSolrServer("http://localhost:8080/solr");

    ModifiableSolrParams params = new ModifiableSolrParams();
    params.set("qt", "/spell");
    params.set("q", "whatever");
    params.set("spellcheck", "on");
    params.set("spellcheck.build", "true");

    QueryResponse response = solr.query(params);
    SpellCheckResponse spellCheckResponse = response.getSpellCheckResponse();
    if (!spellCheckResponse.isCorrectlySpelled()) {
        for (Suggestion suggestion : spellCheckResponse.getSuggestions()) {
            System.out.println("original token: " + suggestion.getToken()
                + " - alternatives: " + suggestion.getAlternatives());
        }
    }
} catch (Exception e) {
    e.printStackTrace();
}

Questions:

1. How do I make the connection with my DB and search the content to see if
there are any words that could match?
2. How do I write the configuration (solrconfig.xml, schema.xml, etc.)?
3. How do I send a string from my view (xhtml) so that the solr server knows
what it should look for?

I read all the information about solr but it's still unclear:

Links:Main Page:
http://lucene.apache.org/solr/

Main Page tutorial: http://lucene.apache.org/solr/4_4_0/tutorial.html

Solr Wiki:
http://wiki.apache.org/solr/Solrj --- official solrj documentation
http://wiki.apache.org/solr/SpellCheckComponent

Solr config: http://wiki.apache.org/solr/SolrConfigXml
http://www.installationpage.com/solr/solr-configuration-tutorial-schema-solrconfig-xml/
http://wiki.apache.org/solr/SchemaXml

StackOverflow proof: Solr Did you mean (Spell check component)

Solr Database Integration:
http://www.slideshare.net/th0masr/integrating-the-solr-search-engine
http://www.cabotsolutions.com/2009/05/using-solr-lucene-for-full-text-search-with-mysql-db/

Solr Spell Check:
http://docs.lucidworks.com/display/solr/Spell+Checking
http://searchhub.org/2010/08/31/getting-started-spell-checking-with-apache-lucene-and-solr/
http://techiesinsight.blogspot.ro/2012/06/using-solr-spellchecker-from-java.html
http://blog.websolr.com/post/2748574298/spellcheck-with-solr-spellcheckcomponent
How to use SpellingResult class in SolrJ

I really need your help. Regards.




--
View this message in context: 
http://lucene.472066.n3.nabble.com/How-to-config-SOLR-server-for-spell-check-functionality-tp4088163.html
Sent from the Solr - User mailing list archive at Nabble.com.
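On question 2, a minimal sketch of the server-side pieces the SolrJ snippet
above expects: a spellcheck component whose dictionary is built from an
indexed field, exposed through a /spell handler. The field name, directory,
and defaults are assumptions, not a drop-in config:

```xml
<!-- solrconfig.xml -->
<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <lst name="spellchecker">
    <str name="name">default</str>
    <!-- dictionary is built from the tokens indexed into this field -->
    <str name="field">spell</str>
    <str name="classname">solr.IndexBasedSpellChecker</str>
    <str name="spellcheckIndexDir">./spellchecker</str>
  </lst>
</searchComponent>

<requestHandler name="/spell" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="spellcheck.dictionary">default</str>
    <str name="spellcheck">on</str>
  </lst>
  <arr name="last-components">
    <str>spellcheck</str>
  </arr>
</requestHandler>
```

Question 1 is a separate, index-time concern: load the MySql content into the
spell field (the DataImportHandler with a JdbcDataSource is the usual route),
and the spellchecker builds its dictionary from that field; Solr never queries
MySql at search time.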


Re: solr performance against oracle

2013-09-04 Thread Toke Eskildsen
On Wed, 2013-09-04 at 14:06 +0200, Sergio Stateri wrote:
 I´m trying to change the data access in the company where I work from
 Oracle to Solr.

They work on different principles and fulfill different needs. Comparing
them with a performance-oriented test is not likely to be a usable basis
for selecting between them. Start by describing your typical use cases
instead.

 Solr aways returns the data arround 150~200 ms (from localhost), but Oracle
 returns arround 20 ms (and Oracle server is in another company, I´m using
 dedicated link to access it).

200ms is suspiciously slow for a trivial lookup in 800,000 values. I am
sure we can bring that down to Oracle-time or better, but I do not think
it shows much.

 How can I tell to my managers that I´d like to use Solr?

Why would you like to use Solr?



Solr highlighting fragment issue

2013-09-04 Thread Sreehareesh Kaipravan Meethaleveetil
Hi,
I'm having some issues with Solr search results (using Solr 1.4). I have
enabled highlighting of searched text (hl=true) and set the fragment size to
500 (hl.fragsize=500) in the search query.
Below is a screen shot of the results shown when I searched for the term
'grandfather' (2 results are displayed).
Now I have a couple of problems with this.

1.   In the search results the keyword is appearing inconsistently towards 
the start/end of the text. I'd like to control the number of characters 
appearing before and after the keyword match (highlighted term). More 
specifically I'd like to get the keyword match somewhere around the middle of 
the resultant text.

2.   The total number of characters appearing in the search result never
equals the fragment size I specified (500 characters). It varies considerably
(for example 408 or 520).
Please share your thoughts on achieving the above 2 results.
[cid:image001.png@01CEA8D2.4FF025E0]
Thanks & Regards,
Sreehareesh KM
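For what it's worth, fragment sizing in Solr's highlighter can be nudged (though not pinned exactly) with the regex fragmenter. The parameters below are a sketch of what to experiment with, not a guaranteed fix; the pattern shown is the commonly cited example value, and the annotations are explanatory, not query syntax:

```
hl=true
hl.fragsize=500
hl.fragmenter=regex                      (break fragments on the given pattern)
hl.regex.slop=0.2                        (allow ~20% deviation from fragsize)
hl.regex.pattern=[-\w ,/\n\"']{20,200}   (prefer breaks at word/phrase boundaries)
```

Even with these, fragsize is a target, not an exact length, because a matching term must land whole inside a fragment.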


Re: SolrCloud 4.x hangs under high update volume

2013-09-04 Thread Mark Miller
There is an issue if I remember right, but I can't find it right now.

If anyone that has the problem could try this patch, that would be very
helpful: http://pastebin.com/raw.php?i=aaRWwSGP

- Mark


On Wed, Sep 4, 2013 at 8:04 AM, Markus Jelsma markus.jel...@openindex.io wrote:

 Hi Mark,

 Got an issue to watch?

 Thanks,
 Markus

 -Original message-
  From:Mark Miller markrmil...@gmail.com
  Sent: Wednesday 4th September 2013 16:55
  To: solr-user@lucene.apache.org
  Subject: Re: SolrCloud 4.x hangs under high update volume
 
  I'm going to try and fix the root cause for 4.5 - I've suspected what it
 is since early this year, but it's never personally been an issue, so it's
 rolled along for a long time.
 
  Mark
 
  Sent from my iPhone
 
  On Sep 3, 2013, at 4:30 PM, Tim Vaillancourt t...@elementspace.com
 wrote:
 
   Hey guys,
  
   I am looking into an issue we've been having with SolrCloud since the
   beginning of our testing, all the way from 4.1 to 4.3 (haven't tested
 4.4.0
   yet). I've noticed other users with this same issue, so I'd really
 like to
   get to the bottom of it.
  
   Under a very, very high rate of updates (2000+/sec), after 1-12 hours
 we
   see stalled transactions that snowball to consume all Jetty threads in
 the
   JVM. This eventually causes the JVM to hang with most threads waiting
 on
   the condition/stack provided at the bottom of this message. At this
 point
   SolrCloud instances then start to see their neighbors (who also have
 all
   threads hung) as down w/Connection Refused, and the shards become
 down
   in state. Sometimes a node or two survives and just returns 503s no
 server
   hosting shard errors.
  
   As a workaround/experiment, we have tuned the number of threads sending
   updates to Solr, as well as the batch size (we batch updates from
 client -
   solr), and the Soft/Hard autoCommits, all to no avail. Turning off
   Client-to-Solr batching (1 update = 1 call to Solr), which also did not
   help. Certain combinations of update threads and batch sizes seem to
   mask/help the problem, but not resolve it entirely.
  
   Our current environment is the following:
   - 3 x Solr 4.3.1 instances in Jetty 9 w/Java 7.
   - 3 x Zookeeper instances, external Java 7 JVM.
   - 1 collection, 3 shards, 2 replicas (each node is a leader of 1 shard
 and
   a replica of 1 shard).
   - Log4j 1.2 for Solr logs, set to WARN. This log has no movement on a
 good
   day.
   - 5000 max jetty threads (well above what we use when we are healthy),
   Linux-user threads ulimit is 6000.
   - Occurs under Jetty 8 or 9 (many versions).
   - Occurs under Java 1.6 or 1.7 (several minor versions).
   - Occurs under several JVM tunings.
   - Everything seems to point to Solr itself, and not a Jetty or Java
 version
   (I hope I'm wrong).
  
   The stack trace that is holding up all my Jetty QTP threads is the
   following, which seems to be waiting on a lock that I would very much
 like
   to understand further:
  
   java.lang.Thread.State: WAITING (parking)
  at sun.misc.Unsafe.park(Native Method)
  - parking to wait for  0x0007216e68d8 (a
   java.util.concurrent.Semaphore$NonfairSync)
  at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
  at
  
 java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:834)
  at
  
 java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:994)
  at
  
 java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1303)
  at java.util.concurrent.Semaphore.acquire(Semaphore.java:317)
  at
  
 org.apache.solr.util.AdjustableSemaphore.acquire(AdjustableSemaphore.java:61)
  at
  
 org.apache.solr.update.SolrCmdDistributor.submit(SolrCmdDistributor.java:418)
  at
  
 org.apache.solr.update.SolrCmdDistributor.submit(SolrCmdDistributor.java:368)
  at
  
 org.apache.solr.update.SolrCmdDistributor.flushAdds(SolrCmdDistributor.java:300)
  at
  
 org.apache.solr.update.SolrCmdDistributor.finish(SolrCmdDistributor.java:96)
  at
  
 org.apache.solr.update.processor.DistributedUpdateProcessor.doFinish(DistributedUpdateProcessor.java:462)
  at
  
 org.apache.solr.update.processor.DistributedUpdateProcessor.finish(DistributedUpdateProcessor.java:1178)
  at
  
 org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:83)
  at
  
 org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
  at org.apache.solr.core.SolrCore.execute(SolrCore.java:1820)
  at
  
 org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:656)
  at
  
 org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:359)
  at
  
 org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:155)
  at
  
 

Re: Solr Cloud hangs when replicating updates

2013-09-04 Thread Mark Miller
It would be great if you could give this patch a try:
http://pastebin.com/raw.php?i=aaRWwSGP

- Mark


On Wed, Sep 4, 2013 at 8:31 AM, Kevin Osborn kevin.osb...@cbsi.com wrote:

 Thanks. If there is anything I can do to help you resolve this issue, let
 me know.

 -Kevin


 On Wed, Sep 4, 2013 at 7:51 AM, Mark Miller markrmil...@gmail.com wrote:

   I'll look at fixing the root issue for 4.5. I've been putting it off for
   way too long.
 
  Mark
 
  Sent from my iPhone
 
  On Sep 3, 2013, at 2:15 PM, Kevin Osborn kevin.osb...@cbsi.com wrote:
 
   I was having problems updating SolrCloud with a large batch of records.
  The
   records are coming in bursts with lulls between updates.
  
   At first, I just tried large updates of 100,000 records at a time.
   Eventually, this caused Solr to hang. When hung, I can still query
 Solr.
   But I cannot do any deletes or other updates to the index.
  
   At first, my updates were going as SolrJ CSV posts. I have also tried
  local
   file updates and had similar results. I finally slowed things down to
  just
   use SolrJ's Update feature, which is basically just JavaBin. I am also
   sending over just 100 at a time in 10 threads. Again, it eventually
 hung.
  
   Sometimes, Solr hangs in the first couple of chunks. Other times, it
  hangs
   right away.
  
   These are my commit settings:
  
    <autoCommit>
      <maxTime>15000</maxTime>
      <maxDocs>5000</maxDocs>
      <openSearcher>false</openSearcher>
    </autoCommit>
    <autoSoftCommit>
      <maxTime>3</maxTime>
    </autoSoftCommit>
  
   I have tried quite a few variations with the same results. I also tried
   various JVM settings with the same results. The only variable seems to
 be
   that reducing the cluster size from 2 to 1 is the only thing that
 helps.
  
   I also did a jstack trace. I did not see any explicit deadlocks, but I
  did
   see quite a few threads in WAITING or TIMED_WAITING. It is typically
   something like this:
  
java.lang.Thread.State: WAITING (parking)
  at sun.misc.Unsafe.park(Native Method)
  - parking to wait for  0x00074039a450 (a
   java.util.concurrent.Semaphore$NonfairSync)
  at
  java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
  at
  
 
 java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:834)
  at
  
 
 java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:994)
  at
  
 
 java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1303)
  at java.util.concurrent.Semaphore.acquire(Semaphore.java:317)
  at
  
 
 org.apache.solr.util.AdjustableSemaphore.acquire(AdjustableSemaphore.java:61)
  at
  
 
 org.apache.solr.update.SolrCmdDistributor.submit(SolrCmdDistributor.java:418)
  at
  
 
 org.apache.solr.update.SolrCmdDistributor.submit(SolrCmdDistributor.java:368)
  at
  
 
 org.apache.solr.update.SolrCmdDistributor.flushAdds(SolrCmdDistributor.java:300)
  at
  
 
 org.apache.solr.update.SolrCmdDistributor.distribAdd(SolrCmdDistributor.java:139)
  at
  
 
 org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:474)
  at
  
 
 org.apache.solr.handler.loader.CSVLoaderBase.doAdd(CSVLoaderBase.java:395)
  at
  
 
 org.apache.solr.handler.loader.SingleThreadedCSVLoader.addDoc(CSVLoader.java:44)
  at
  
 org.apache.solr.handler.loader.CSVLoaderBase.load(CSVLoaderBase.java:364)
  at
  org.apache.solr.handler.loader.CSVLoader.load(CSVLoader.java:31)
  at
  
 
 org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:92)
  at
  
 
 org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
  at
  
 
 org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
  at org.apache.solr.core.SolrCore.execute(SolrCore.java:1904)
  at
  
 
 org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:659)
  at
  
 
 org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:362)
  at
  
 
 org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:158)
  at
  
 
 org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)
  at
  
 
 org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455)
  at
  
 
 org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
  at
  
 
 org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:533)
  at
  
 
 org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
  at
  
 
 

Questions about Replication Factor on solrcloud

2013-09-04 Thread Lisandro Montaño
Hi all,

 

I’m currently working on deploying a solrcloud distribution on CentOS
machines and wanted to have more guidance about Replication Factor
configuration.

 

I have configured two servers with solrcloud over tomcat and a third server
as zookeeper. I have configured successfully and have one server with
collection1 available and the other with collection1_Shard1_Replica1.

 

My questions are:

 

-  Can I have 1 shard and 2 replicas on two machines? What are the
limitations or considerations in defining this?

-  How does a replica work? (there is not too much info about it)

-  When I import data into collection1 it works properly, but when I
do it into collection1_Shard1_Replica1 it fails. Is that expected behavior?
(Maybe with a better definition of replicas I will understand it
better)

 

 

Thanks in advance for your help and guidance.

 

Regards,

Lisandro Montano
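As a side note, on Solr 4.x a one-shard, two-replica collection is normally created through the Collections API rather than by hand-defining cores; a sketch (host name and port are placeholders for one of your Tomcat nodes):

```
curl "http://host1:8080/solr/admin/collections?action=CREATE&name=collection1&numShards=1&replicationFactor=2"
```

With numShards=1 and replicationFactor=2, each machine holds a full copy of the index; updates sent to either replica are forwarded to the shard leader and then distributed to the other replica, which is why indexing directly against an inconsistently created replica core can misbehave.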

 



Re: SolrCloud 4.x hangs under high update volume

2013-09-04 Thread Tim Vaillancourt
Thanks guys! :)

Mark: this patch is much appreciated, I will try to test this shortly,
hopefully today.

For my curiosity/understanding, could someone explain to me quickly what
locks SolrCloud takes on updates? Was I on to something that more shards
decrease the chance for locking?

Secondly, I was wondering if someone could summarize what this patch
'fixes'? I'm not too familiar with Java and the solr codebase (working on
that though :D).

Cheers,

Tim



On 4 September 2013 09:52, Mark Miller markrmil...@gmail.com wrote:

 There is an issue if I remember right, but I can't find it right now.

 If anyone that has the problem could try this patch, that would be very
 helpful: http://pastebin.com/raw.php?i=aaRWwSGP

 - Mark


 On Wed, Sep 4, 2013 at 8:04 AM, Markus Jelsma markus.jel...@openindex.io
 wrote:

  Hi Mark,
 
  Got an issue to watch?
 
  Thanks,
  Markus
 
  -Original message-
   From:Mark Miller markrmil...@gmail.com
   Sent: Wednesday 4th September 2013 16:55
   To: solr-user@lucene.apache.org
   Subject: Re: SolrCloud 4.x hangs under high update volume
  
   I'm going to try and fix the root cause for 4.5 - I've suspected what
 it
  is since early this year, but it's never personally been an issue, so
 it's
  rolled along for a long time.
  
   Mark
  
   Sent from my iPhone
  
   On Sep 3, 2013, at 4:30 PM, Tim Vaillancourt t...@elementspace.com
  wrote:
  
Hey guys,
   
I am looking into an issue we've been having with SolrCloud since the
beginning of our testing, all the way from 4.1 to 4.3 (haven't tested
  4.4.0
yet). I've noticed other users with this same issue, so I'd really
  like to
get to the bottom of it.
   
Under a very, very high rate of updates (2000+/sec), after 1-12 hours
  we
see stalled transactions that snowball to consume all Jetty threads
 in
  the
JVM. This eventually causes the JVM to hang with most threads waiting
  on
the condition/stack provided at the bottom of this message. At this
  point
SolrCloud instances then start to see their neighbors (who also have
  all
threads hung) as down w/Connection Refused, and the shards become
  down
in state. Sometimes a node or two survives and just returns 503s no
  server
hosting shard errors.
   
As a workaround/experiment, we have tuned the number of threads
 sending
updates to Solr, as well as the batch size (we batch updates from
  client -
solr), and the Soft/Hard autoCommits, all to no avail. Turning off
Client-to-Solr batching (1 update = 1 call to Solr), which also did
 not
help. Certain combinations of update threads and batch sizes seem to
mask/help the problem, but not resolve it entirely.
   
Our current environment is the following:
- 3 x Solr 4.3.1 instances in Jetty 9 w/Java 7.
- 3 x Zookeeper instances, external Java 7 JVM.
- 1 collection, 3 shards, 2 replicas (each node is a leader of 1
 shard
  and
a replica of 1 shard).
- Log4j 1.2 for Solr logs, set to WARN. This log has no movement on a
  good
day.
- 5000 max jetty threads (well above what we use when we are
 healthy),
Linux-user threads ulimit is 6000.
- Occurs under Jetty 8 or 9 (many versions).
- Occurs under Java 1.6 or 1.7 (several minor versions).
- Occurs under several JVM tunings.
- Everything seems to point to Solr itself, and not a Jetty or Java
  version
(I hope I'm wrong).
   
The stack trace that is holding up all my Jetty QTP threads is the
following, which seems to be waiting on a lock that I would very much
  like
to understand further:
   
java.lang.Thread.State: WAITING (parking)
   at sun.misc.Unsafe.park(Native Method)
   - parking to wait for  0x0007216e68d8 (a
java.util.concurrent.Semaphore$NonfairSync)
   at
 java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
   at
   
 
 java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:834)
   at
   
 
 java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:994)
   at
   
 
 java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1303)
   at java.util.concurrent.Semaphore.acquire(Semaphore.java:317)
   at
   
 
 org.apache.solr.util.AdjustableSemaphore.acquire(AdjustableSemaphore.java:61)
   at
   
 
 org.apache.solr.update.SolrCmdDistributor.submit(SolrCmdDistributor.java:418)
   at
   
 
 org.apache.solr.update.SolrCmdDistributor.submit(SolrCmdDistributor.java:368)
   at
   
 
 org.apache.solr.update.SolrCmdDistributor.flushAdds(SolrCmdDistributor.java:300)
   at
   
 
 org.apache.solr.update.SolrCmdDistributor.finish(SolrCmdDistributor.java:96)
   at
   
 
 org.apache.solr.update.processor.DistributedUpdateProcessor.doFinish(DistributedUpdateProcessor.java:462)
   at
   
 
 

Re: cleanup after OutOfMemoryError

2013-09-04 Thread Mark Miller
I don't know that there is any 'safe' thing you can do other than restart -
but if I were to try anything, I would use true for rollback.

- Mark


On Wed, Sep 4, 2013 at 9:44 AM, Ryan McKinley ryan...@gmail.com wrote:

 I have an application where I am calling DirectUpdateHandler2 directly
 with:

   update.addDoc(cmd);

 This will sometimes hit:

 java.lang.OutOfMemoryError: Java heap space
 at org.apache.lucene.util.UnicodeUtil.UTF16toUTF8(UnicodeUtil.java:248)
 at org.apache.lucene.store.DataOutput.writeString(DataOutput.java:234)
 at

 org.apache.lucene.codecs.compressing.CompressingStoredFieldsWriter.writeField(CompressingStoredFieldsWriter.java:273)
 at

 org.apache.lucene.index.StoredFieldsProcessor.finishDocument(StoredFieldsProcessor.java:126)
 at

 org.apache.lucene.index.TwoStoredFieldsConsumers.finishDocument(TwoStoredFieldsConsumers.java:65)
 at

 org.apache.lucene.index.DocFieldProcessor.finishDocument(DocFieldProcessor.java:264)
 at

 org.apache.lucene.index.DocumentsWriterPerThread.updateDocument(DocumentsWriterPerThread.java:283)
 at

 org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:432)
 at
 org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1513)
 at

 org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:212)
 at voyager.index.zmq.IndexingRunner.apply(IndexingRunner.java:303)

 and then a little while later:

 auto commit error...:java.lang.IllegalStateException: this writer hit an
 OutOfMemoryError; cannot commit
 at

 org.apache.lucene.index.IndexWriter.prepareCommitInternal(IndexWriter.java:2726)
 at
 org.apache.lucene.index.IndexWriter.commitInternal(IndexWriter.java:2897)
 at org.apache.lucene.index.IndexWriter.commit(IndexWriter.java:2872)
 at

 org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:549)
 at org.apache.solr.update.CommitTracker.run(CommitTracker.java:216)
 at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)


 Is there anything I can/should do to clean up after the OOME?  At a minimum
 I do not want any new requests using the same IndexWriter.  Should I use:


   catch(OutOfMemoryError ex) {

update.getCommitTracker().cancelPendingCommit();
  update.newIndexWriter(false);
  ...

 or perhaps 'true' for rollback?

 Thanks
 Ryan




-- 
- Mark
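Putting Mark's suggestion together with the snippet from the original mail, the handler might look roughly like this. This is an untested sketch using only the method names already quoted in this thread; `true` asks newIndexWriter to roll back the writer that hit the OOME instead of trying to keep its uncommitted state:

```java
try {
    update.addDoc(cmd);
} catch (OutOfMemoryError oom) {
    // The writer that hit OOME refuses further commits, so cancel the
    // pending autocommit and open a fresh writer, rolling back the old one.
    update.getCommitTracker().cancelPendingCommit();
    update.newIndexWriter(true);   // true = rollback uncommitted changes
    throw oom;                     // don't swallow the error
}
```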


Re: SolrCloud 4.x hangs under high update volume

2013-09-04 Thread Mark Miller
The 'lock' or semaphore was added to cap the number of threads that would be 
used. Previously, the number of threads in use could spike to many, many 
thousands on heavy updates. A limit on the number of outstanding requests was 
put in place to keep this from happening. Something like 16 * the number of 
hosts in the cluster.

I assume the deadlock comes from the fact that requests are of two kinds - 
forward to the leader and distrib updates from the leader to replicas. Forward 
to the leader actually waits for the leader to then distrib the updates to 
replicas before returning. I believe this is what can lead to deadlock. 

This is likely why the patch for the CloudSolrServer can help the situation - 
it removes the need to forward to the leader because it sends to the correct 
leader to begin with. Only useful if you are adding docs with CloudSolrServer 
though, and more like a workaround than a fix.

The patch uses a separate 'limiting' semaphore for the two cases.

- Mark
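The capping mechanism Mark describes can be illustrated with a plain JDK semaphore: a fixed number of permits bounds how many update tasks are in flight at once, the way SolrCmdDistributor's AdjustableSemaphore caps outstanding distributed-update requests. The class and method names below are made up for the demo; this is not Solr's actual code:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class BoundedUpdates {

    /** Runs 'tasks' fake updates through a semaphore with 'permits' slots
     *  and reports the highest number that ever ran concurrently. */
    static int run(int permits, int tasks) {
        Semaphore cap = new Semaphore(permits);
        AtomicInteger inFlight = new AtomicInteger();
        AtomicInteger maxInFlight = new AtomicInteger();
        ExecutorService pool = Executors.newFixedThreadPool(tasks);
        for (int i = 0; i < tasks; i++) {
            pool.submit(() -> {
                try {
                    cap.acquire();                 // blocks once the cap is hit
                    try {
                        int now = inFlight.incrementAndGet();
                        maxInFlight.accumulateAndGet(now, Math::max);
                        Thread.sleep(5);           // stand-in for the real update
                    } finally {
                        inFlight.decrementAndGet();
                        cap.release();
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }
        pool.shutdown();
        try {
            pool.awaitTermination(30, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return maxInFlight.get();
    }

    public static void main(String[] args) {
        // With 4 permits and 32 submitted tasks, at most 4 overlap.
        System.out.println("max concurrent updates: " + run(4, 32));
    }
}
```

Note this only shows the cap. The deadlock arises because forward-to-leader requests hold permits while waiting on distrib requests that need permits from the same pool, which is why the patch splits the two request kinds onto separate semaphores.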

On Sep 4, 2013, at 10:22 AM, Tim Vaillancourt t...@elementspace.com wrote:

 Thanks guys! :)
 
 Mark: this patch is much appreciated, I will try to test this shortly, 
 hopefully today.
 
 For my curiosity/understanding, could someone explain to me quickly what 
 locks SolrCloud takes on updates? Was I on to something that more shards 
 decrease the chance for locking?
 
 Secondly, I was wondering if someone could summarize what this patch 'fixes'? 
 I'm not too familiar with Java and the solr codebase (working on that though 
 :D).
 
 Cheers,
 
 Tim
 
 
 
 On 4 September 2013 09:52, Mark Miller markrmil...@gmail.com wrote:
 There is an issue if I remember right, but I can't find it right now.
 
 If anyone that has the problem could try this patch, that would be very
 helpful: http://pastebin.com/raw.php?i=aaRWwSGP
 
 - Mark
 
 
 On Wed, Sep 4, 2013 at 8:04 AM, Markus Jelsma 
  markus.jel...@openindex.io wrote:
 
  Hi Mark,
 
  Got an issue to watch?
 
  Thanks,
  Markus
 
  -Original message-
   From:Mark Miller markrmil...@gmail.com
   Sent: Wednesday 4th September 2013 16:55
   To: solr-user@lucene.apache.org
   Subject: Re: SolrCloud 4.x hangs under high update volume
  
   I'm going to try and fix the root cause for 4.5 - I've suspected what it
  is since early this year, but it's never personally been an issue, so it's
  rolled along for a long time.
  
   Mark
  
   Sent from my iPhone
  
   On Sep 3, 2013, at 4:30 PM, Tim Vaillancourt t...@elementspace.com
  wrote:
  
Hey guys,
   
I am looking into an issue we've been having with SolrCloud since the
beginning of our testing, all the way from 4.1 to 4.3 (haven't tested
  4.4.0
yet). I've noticed other users with this same issue, so I'd really
  like to
get to the bottom of it.
   
Under a very, very high rate of updates (2000+/sec), after 1-12 hours
  we
see stalled transactions that snowball to consume all Jetty threads in
  the
JVM. This eventually causes the JVM to hang with most threads waiting
  on
the condition/stack provided at the bottom of this message. At this
  point
SolrCloud instances then start to see their neighbors (who also have
  all
threads hung) as down w/Connection Refused, and the shards become
  down
in state. Sometimes a node or two survives and just returns 503s no
  server
hosting shard errors.
   
As a workaround/experiment, we have tuned the number of threads sending
updates to Solr, as well as the batch size (we batch updates from
  client -
solr), and the Soft/Hard autoCommits, all to no avail. Turning off
Client-to-Solr batching (1 update = 1 call to Solr), which also did not
help. Certain combinations of update threads and batch sizes seem to
mask/help the problem, but not resolve it entirely.
   
Our current environment is the following:
- 3 x Solr 4.3.1 instances in Jetty 9 w/Java 7.
- 3 x Zookeeper instances, external Java 7 JVM.
- 1 collection, 3 shards, 2 replicas (each node is a leader of 1 shard
  and
a replica of 1 shard).
- Log4j 1.2 for Solr logs, set to WARN. This log has no movement on a
  good
day.
- 5000 max jetty threads (well above what we use when we are healthy),
Linux-user threads ulimit is 6000.
- Occurs under Jetty 8 or 9 (many versions).
- Occurs under Java 1.6 or 1.7 (several minor versions).
- Occurs under several JVM tunings.
- Everything seems to point to Solr itself, and not a Jetty or Java
  version
(I hope I'm wrong).
   
The stack trace that is holding up all my Jetty QTP threads is the
following, which seems to be waiting on a lock that I would very much
  like
to understand further:
   
java.lang.Thread.State: WAITING (parking)
   at sun.misc.Unsafe.park(Native Method)
   - parking to wait for  0x0007216e68d8 (a
java.util.concurrent.Semaphore$NonfairSync)
   at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
   at
  

cleanup after OutOfMemoryError

2013-09-04 Thread Ryan McKinley
I have an application where I am calling DirectUpdateHandler2 directly with:

  update.addDoc(cmd);

This will sometimes hit:

java.lang.OutOfMemoryError: Java heap space
at org.apache.lucene.util.UnicodeUtil.UTF16toUTF8(UnicodeUtil.java:248)
at org.apache.lucene.store.DataOutput.writeString(DataOutput.java:234)
at
org.apache.lucene.codecs.compressing.CompressingStoredFieldsWriter.writeField(CompressingStoredFieldsWriter.java:273)
at
org.apache.lucene.index.StoredFieldsProcessor.finishDocument(StoredFieldsProcessor.java:126)
at
org.apache.lucene.index.TwoStoredFieldsConsumers.finishDocument(TwoStoredFieldsConsumers.java:65)
at
org.apache.lucene.index.DocFieldProcessor.finishDocument(DocFieldProcessor.java:264)
at
org.apache.lucene.index.DocumentsWriterPerThread.updateDocument(DocumentsWriterPerThread.java:283)
at
org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:432)
at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1513)
at
org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:212)
at voyager.index.zmq.IndexingRunner.apply(IndexingRunner.java:303)

and then a little while later:

auto commit error...:java.lang.IllegalStateException: this writer hit an
OutOfMemoryError; cannot commit
at
org.apache.lucene.index.IndexWriter.prepareCommitInternal(IndexWriter.java:2726)
at org.apache.lucene.index.IndexWriter.commitInternal(IndexWriter.java:2897)
at org.apache.lucene.index.IndexWriter.commit(IndexWriter.java:2872)
at
org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:549)
at org.apache.solr.update.CommitTracker.run(CommitTracker.java:216)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)


Is there anything I can/should do to clean up after the OOME?  At a minimum
I do not want any new requests using the same IndexWriter.  Should I use:


  catch(OutOfMemoryError ex) {

   update.getCommitTracker().cancelPendingCommit();
 update.newIndexWriter(false);
 ...

or perhaps 'true' for rollback?

Thanks
Ryan


subindex

2013-09-04 Thread Peyman Faratin
Hi

Is there a way to build a new (smaller) index from an existing (larger) index 
where the smaller index contains a subset of the fields of the larger index? 

thank you
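One possible approach is to stream the stored fields out of the big index and re-add only the ones you want. This is an untested sketch against the Lucene 4.x API; it can only carry over *stored* values (indexed-but-unstored fields would have to be re-analyzed from source data), and the paths and field types are illustrative:

```java
// Copy a subset of stored fields from big-index into a fresh small-index.
Directory src = FSDirectory.open(new File("/path/to/big-index"));
Directory dst = FSDirectory.open(new File("/path/to/small-index"));
IndexReader reader = DirectoryReader.open(src);
IndexWriter writer = new IndexWriter(dst,
    new IndexWriterConfig(Version.LUCENE_42, new StandardAnalyzer(Version.LUCENE_42)));

Set<String> wanted = new HashSet<String>(Arrays.asList("id", "title"));
Bits liveDocs = MultiFields.getLiveDocs(reader);
for (int i = 0; i < reader.maxDoc(); i++) {
    if (liveDocs != null && !liveDocs.get(i)) continue;   // skip deleted docs
    Document stored = reader.document(i, wanted);         // load only those fields
    Document copy = new Document();
    for (IndexableField f : stored.getFields()) {
        copy.add(new TextField(f.name(), f.stringValue(), Field.Store.YES));
    }
    writer.addDocument(copy);
}
writer.close();
reader.close();
```

If the documents are in Solr with all source fields stored, simply re-indexing into a second core with a smaller schema achieves the same thing with less code.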

RE: SolrCloud 4.x hangs under high update volume

2013-09-04 Thread Markus Jelsma
Hi Mark,

Got an issue to watch?

Thanks,
Markus
 
-Original message-
 From:Mark Miller markrmil...@gmail.com
 Sent: Wednesday 4th September 2013 16:55
 To: solr-user@lucene.apache.org
 Subject: Re: SolrCloud 4.x hangs under high update volume
 
 I'm going to try and fix the root cause for 4.5 - I've suspected what it is 
 since early this year, but it's never personally been an issue, so it's 
 rolled along for a long time. 
 
 Mark
 
 Sent from my iPhone
 
 On Sep 3, 2013, at 4:30 PM, Tim Vaillancourt t...@elementspace.com wrote:
 
  Hey guys,
  
  I am looking into an issue we've been having with SolrCloud since the
  beginning of our testing, all the way from 4.1 to 4.3 (haven't tested 4.4.0
  yet). I've noticed other users with this same issue, so I'd really like to
  get to the bottom of it.
  
  Under a very, very high rate of updates (2000+/sec), after 1-12 hours we
  see stalled transactions that snowball to consume all Jetty threads in the
  JVM. This eventually causes the JVM to hang with most threads waiting on
  the condition/stack provided at the bottom of this message. At this point
  SolrCloud instances then start to see their neighbors (who also have all
  threads hung) as down w/Connection Refused, and the shards become down
  in state. Sometimes a node or two survives and just returns 503s no server
  hosting shard errors.
  
  As a workaround/experiment, we have tuned the number of threads sending
  updates to Solr, as well as the batch size (we batch updates from client -
  solr), and the Soft/Hard autoCommits, all to no avail. Turning off
  Client-to-Solr batching (1 update = 1 call to Solr), which also did not
  help. Certain combinations of update threads and batch sizes seem to
  mask/help the problem, but not resolve it entirely.
  
  Our current environment is the following:
  - 3 x Solr 4.3.1 instances in Jetty 9 w/Java 7.
  - 3 x Zookeeper instances, external Java 7 JVM.
  - 1 collection, 3 shards, 2 replicas (each node is a leader of 1 shard and
  a replica of 1 shard).
  - Log4j 1.2 for Solr logs, set to WARN. This log has no movement on a good
  day.
  - 5000 max jetty threads (well above what we use when we are healthy),
  Linux-user threads ulimit is 6000.
  - Occurs under Jetty 8 or 9 (many versions).
  - Occurs under Java 1.6 or 1.7 (several minor versions).
  - Occurs under several JVM tunings.
  - Everything seems to point to Solr itself, and not a Jetty or Java version
  (I hope I'm wrong).
  
  The stack trace that is holding up all my Jetty QTP threads is the
  following, which seems to be waiting on a lock that I would very much like
  to understand further:
  
  java.lang.Thread.State: WAITING (parking)
 at sun.misc.Unsafe.park(Native Method)
 - parking to wait for  0x0007216e68d8 (a
  java.util.concurrent.Semaphore$NonfairSync)
 at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
 at
  java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:834)
 at
  java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:994)
 at
  java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1303)
 at java.util.concurrent.Semaphore.acquire(Semaphore.java:317)
 at
  org.apache.solr.util.AdjustableSemaphore.acquire(AdjustableSemaphore.java:61)
 at
  org.apache.solr.update.SolrCmdDistributor.submit(SolrCmdDistributor.java:418)
 at
  org.apache.solr.update.SolrCmdDistributor.submit(SolrCmdDistributor.java:368)
 at
  org.apache.solr.update.SolrCmdDistributor.flushAdds(SolrCmdDistributor.java:300)
 at
  org.apache.solr.update.SolrCmdDistributor.finish(SolrCmdDistributor.java:96)
 at
  org.apache.solr.update.processor.DistributedUpdateProcessor.doFinish(DistributedUpdateProcessor.java:462)
 at
  org.apache.solr.update.processor.DistributedUpdateProcessor.finish(DistributedUpdateProcessor.java:1178)
 at
  org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:83)
 at
  org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
 at org.apache.solr.core.SolrCore.execute(SolrCore.java:1820)
 at
  org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:656)
 at
  org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:359)
 at
  org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:155)
 at
  org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1486)
 at
  org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:503)
 at
  org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:138)
 at
  org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:564)
 at
  

Re: Numeric fields and payload

2013-09-04 Thread PETER LENAHAN
Chris Hostetter hossman_lucene at fucit.org writes:

 
 
 : is it possible to store (text) payload to numeric fields (class 
 : solr.TrieDoubleField)?  My goal is to store measure units to numeric 
 : features - e.g. '1.5 cm' - and to use faceted search with these fields. 
 : But the field type doesn't allow analyzers to add the payload data. I 
 : want to avoid database access to load the units. I'm using Solr 4.2 .
 
  I'm not sure if it's possible to add payloads to Trie fields, but even if 
  there is, I don't think you really want that for your use case -- I think it 
  would make a lot more sense to normalize your units so you do consistent 
  sorting, range queries, and faceting on the values regardless of whether 
  it's 100cm or 1000mm or 1m.
 
 -Hoss
 
 

Hoss, what you suggest may be fine for specific units, but for monetary 
values with formatting it is not realistic. $10,000.00 would require 
formatting the number to display it. It would be much easier to store the 
formatted string as a payload.


Peter Lenahan
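Hoss's normalize-at-index-time suggestion can be sketched in a few lines: convert "1.5 cm" style values to one canonical unit for a solr.TrieDoubleField, and keep the original text in a separate stored field for display (instead of a payload) — which also covers Peter's "$10,000.00" case, since the formatted string is simply stored as-is. Field names like length_m and length_display are hypothetical:

```java
import java.util.HashMap;
import java.util.Map;

public class UnitNormalizer {
    private static final Map<String, Double> TO_METERS = new HashMap<String, Double>();
    static {
        TO_METERS.put("mm", 0.001);
        TO_METERS.put("cm", 0.01);
        TO_METERS.put("m", 1.0);
    }

    /** Parses a value like "1.5 cm" and returns it in meters. */
    static double toMeters(String raw) {
        String[] parts = raw.trim().split("\\s+");
        return Double.parseDouble(parts[0]) * TO_METERS.get(parts[1]);
    }

    public static void main(String[] args) {
        // Index length_m=0.015 for sorting/faceting/range queries, and store
        // length_display="1.5 cm" (or "$10,000.00" for money) for rendering.
        System.out.println(toMeters("1.5 cm"));
        System.out.println(toMeters("1000 mm"));   // same quantity as "1 m"
    }
}
```

The point is that 100cm, 1000mm, and 1m all index to the same double, so faceting and range queries behave, while the display field carries whatever formatting the source had.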



Little XsltResponseWriter documentation bug (Attn: Wiki Admin)

2013-09-04 Thread Dmitri Popov
Hi,

http://wiki.apache.org/solr/XsltResponseWriter (and reference manual PDF
too) become out of date:

In configuration section

<queryResponseWriter
  name="xslt"
  class="org.apache.solr.request.XSLTResponseWriter">
  <int name="xsltCacheLifetimeSeconds">5</int>
</queryResponseWriter>

class name

org.apache.solr.request.XSLTResponseWriter

should be replaced by

org.apache.solr.response.XSLTResponseWriter

Otherwise ClassNotFoundException happens. Change is result of
https://issues.apache.org/jira/browse/SOLR-1602 as far as I see.

Apparently I can't update that page myself; please could someone else do that?

Thanks!


RE: Solr highlighting fragment issue

2013-09-04 Thread Bryan Loofbourrow
 I’m having some issues with Solr search results (using Solr 1.4). I
have enabled highlighting of searched text (hl=true) and set the fragment
size to 500 (hl.fragsize=500) in the search query.

Below are the results (screen shot) shown when I searched for the term
‘grandfather’ (2 results are displayed).

Now I have a couple of problems with this.

1.   In the search results the keyword appears inconsistently
towards the start/end of the text. I’d like to control the number of
characters appearing before and after the keyword match (highlighted term).
More specifically I’d like to get the keyword match somewhere around the
middle of the resultant text.

2.   The total number of characters appearing in the search result
never equals the fragment size I specified (500 characters). It varies
widely (for example 408 or 520).

Please share your thoughts on achieving the above 2 results. 

I can’t see your screenshot, but it doesn’t really matter.



If I remember correctly how this stuff works, I think you’re going to have
a challenge getting where you want to get. In your position, I would push
back on both of those requirements rather than try to solve the problem.



For (1), the issue is that, IIRC, the highlighter breaks up your documents
into fragments BEFORE it knows where the matches are. I’d think you’d have
to pretty seriously recast the algorithm to get the result you want.



For (2), it may well be that you could tune the fragmenter to get closer to
your desired number of characters, either writing your own, or using the
available regexes and whatnot. But getting an exact number of characters
does not seem reasonable, because I’m pretty sure that there is a
constraint that a matching term must appear in its entirety in one fragment
– and also that sometimes fragments are concatenated. Imagine, for example,
a matched phrase where the start of the phrase is in one fragment, and the
end is in another. Which goes back to the first point.



So if you absolutely must have both of these (and the second one is
strange, since it implies that your fragments will often start and end in
the middles of words), then I guess you would need to rewrite the
fragmenting algorithm to drive fragmenting from the matches.



-- Bryan
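If neither requirement can be dropped, one workaround is to fragment on the client side from the stored field value rather than relying on Solr's fragmenter: locate the match and cut an exact-width window centered on it. A minimal sketch, assuming you fetch the stored text alongside the highlight and only need the first match:

```python
# Sketch: client-side fragmenting from the stored field text, centering
# an exact-width window on the first match. This sidesteps Solr's
# fragmenter rather than configuring it.
def centered_fragment(text, term, width=500):
    pos = text.lower().find(term.lower())
    if pos < 0:
        return text[:width]  # no match: fall back to the leading chars
    # Center the window on the middle of the matched term...
    start = max(0, pos + len(term) // 2 - width // 2)
    # ...but clamp it so it never runs past the end of the text.
    start = min(start, max(0, len(text) - width))
    return text[start:start + width]

frag = centered_fragment("x" * 1000 + " grandfather " + "y" * 1000, "grandfather")
print(len(frag))  # -> 500
```

Note this only handles a single literal term; phrase and multi-term matches would need the analyzed query terms, which is exactly the complexity the built-in highlighter hides.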


Re: Little XsltResponseWriter documentation bug (Attn: Wiki Admin)

2013-09-04 Thread Upayavira
It's a wiki. Can't you correct it?

Upayavira



Re: Little XsltResponseWriter documentation bug (Attn: Wiki Admin)

2013-09-04 Thread Dmitri Popov
Upayavira,

I could edit that page myself, but I need to be confirmed as a human according to
http://wiki.apache.org/solr/FrontPage#How_to_edit_this_Wiki

My wiki account name is 'pin' just in case.

On Wed, Sep 4, 2013 at 5:27 PM, Upayavira u...@odoko.co.uk wrote:

 It's a wiki. Can't you correct it?

 Upayavira




Re: SolrCloud 4.x hangs under high update volume

2013-09-04 Thread Tim Vaillancourt
Thanks so much for the explanation Mark, I owe you one (many)!

We have this on our high-TPS cluster and will run it through its paces
tomorrow. I'll provide any feedback I can; more soon! :D

Cheers,

Tim


Invalid Version when slave node pull replication from master node

2013-09-04 Thread YouPeng Yang
Hi solr users,

   I'm testing replication within SolrCloud.
   I just uncommented the replication section separately on the master and
slave nodes.
   The replication section setting on the master node:
<lst name="master">
  <str name="replicateAfter">commit</str>
  <str name="replicateAfter">startup</str>
  <str name="confFiles">schema.xml,stopwords.txt</str>
</lst>
 and on the slave node:
<lst name="slave">
  <str name="masterUrl">http://10.7.23.124:8080/solr/#/</str>
  <str name="pollInterval">00:00:50</str>
</lst>

   After startup, an error comes out on the slave node:
80110110 [snapPuller-70-thread-1] ERROR org.apache.solr.handler.SnapPuller
?.Master at: http://10.7.23.124:8080/solr/#/ is not available. Index fetch
failed. Exception: Invalid version (expected 2, but 60) or the data in not
in 'javabin' format


 Could anyone help me solve the problem?


regards


Re: Invalid Version when slave node pull replication from master node

2013-09-04 Thread YouPeng Yang
Hi again

  I'm using Solr 4.4.



Re: Invalid Version when slave node pull replication from master node

2013-09-04 Thread YouPeng Yang
Hi all
   I solved the problem by adding the coreName explicitly, according to
http://wiki.apache.org/solr/SolrReplication#Replicating_solrconfig.xml.

   But I want to make sure: is it necessary to set the coreName
explicitly? And is there any SolrJ API to pull the replication on the slave
node from the master node?


regards
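As a possible alternative to SolrJ, the stock /replication handler also accepts command=fetchindex over plain HTTP to trigger a one-off pull on the slave. A sketch, where host, port and core name are placeholders rather than values confirmed in this thread:

```python
# Sketch: build the URL that asks a slave core's /replication handler to
# pull the index now (command=fetchindex). Host/port/core are placeholders.
def fetchindex_url(slave_host, core, master_url=None):
    url = "http://%s/solr/%s/replication?command=fetchindex" % (slave_host, core)
    if master_url:
        # Optional one-off override of the configured masterUrl
        # (a real caller should URL-encode this value).
        url += "&masterUrl=" + master_url
    return url

print(fetchindex_url("10.7.23.124:8080", "collection1"))
# the request itself could then be fired with urllib.request.urlopen(url)
```

Note the master URL in a fetchindex or masterUrl parameter should point at the core's base URL (e.g. .../solr/corename), not the admin UI address ending in /#/.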




Re: unknown _stream_source_info while indexing rich doc in solr

2013-09-04 Thread Nutan
Yes sir, I did restart Tomcat.
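For reference, the ignored_* dynamicField only resolves if the "ignored" fieldType it references is actually defined in schema.xml; the stock example schema declares it roughly as below, so that definition is worth double-checking in addition to reloading the core after edits:

```xml
<!-- Approximate declaration from the stock example schema; the
     dynamicField ignored_* only works if this fieldType exists. -->
<fieldtype name="ignored" stored="false" indexed="false"
           multiValued="true" class="solr.StrField"/>
```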


On Wed, Sep 4, 2013 at 6:27 PM, Jack Krupansky-2 [via Lucene] 
ml-node+s472066n4088181...@n3.nabble.com wrote:

 Did you restart Solr after editing config and schema?

 -- Jack Krupansky









--
View this message in context: 
http://lucene.472066.n3.nabble.com/unknown-stream-source-info-while-indexing-rich-doc-in-solr-tp4088136p4088295.html
Sent from the Solr - User mailing list archive at Nabble.com.