Re: Solr facet search improvements

2015-01-28 Thread Jack Krupansky
It would probably be better to do entity extraction and normalization of
job titles as a front-end process before ingesting the data into Solr, but
you could also do it as a custom or script update processor. The latter can
be easily coded in JavaScript to run within Solr.

Your first step in any case will be to define the specific rules you wish
to use for both normalization of job titles and the actual matching. Yes,
you can do that in Solr, but you have to do it; Solr will not do it
magically for you. Also, post some specific query examples that completely
cover the range of queries you need to be able to handle.
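
As a concrete illustration of the script-update-processor route (not part of Jack's message; the chain name and script file are assumptions), a solrconfig.xml chain could look roughly like this:

  <updateRequestProcessorChain name="normalize-titles">
    <!-- normalize-titles.js is a hypothetical script that implements processAdd(cmd)
         and applies whatever title-normalization rules you define -->
    <processor class="solr.StatelessScriptUpdateProcessorFactory">
      <str name="script">normalize-titles.js</str>
    </processor>
    <processor class="solr.LogUpdateProcessorFactory"/>
    <processor class="solr.RunUpdateProcessorFactory"/>
  </updateRequestProcessorChain>

The chain would then be referenced from the update handler via the update.chain request parameter.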

-- Jack Krupansky

On Wed, Jan 28, 2015 at 5:56 AM, thakkar.aayush thakkar.aay...@gmail.com
wrote:

 I have around 1 million job titles which are indexed on Solr and am looking
 to improve the faceted search results on job title matches.

 For example: a job search for *Research Scientist Computer Architecture* is
 made, and the facet field title, which is tokenized in Solr, gives the
 following results:

 1. Senior Data Scientist
 2. PARALLEL COMPUTING SOFTWARE ENGINEER
 3. Engineer/Scientist 4
 4. Data Scientist
 5. Engineer/Scientist
 6. Senior Research Scientist
 7. Research Scientist-Wireless Networks
 8. Research Scientist-Andriod Development
 9. Quantum Computing Theorist Job
 10.Data Sceintist Smart Analytics

 I want to be able to improve / optimize the job titles and be able to make
 exclusions and some normalizations. Is this possible with Solr? What is the
 best way to have more granular control over the faceted search results?

 For example, *Engineer/Scientist 4* is not useful and too specific, and
 titles like *Quantum Computing theorist* would ideally also be excluded.



 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/Solr-facet-search-improvements-tp4182502.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Re: Running multiple full-import commands via curl in a script

2015-01-28 Thread Mikhail Khludnev
Literally, queueing can be done by submitting as-is (async) and polling the
command status. However, given
https://github.com/apache/lucene-solr/blob/trunk/solr/contrib/dataimporthandler/src/java/org/apache/solr/handler/dataimport/DataImportHandler.java#L200
you can try to add synchronous=true... that should block the request until
it's completed.
The other question is how to run requests in parallel, which is explicitly
prevented by
https://github.com/apache/lucene-solr/blob/trunk/solr/contrib/dataimporthandler/src/java/org/apache/solr/handler/dataimport/DataImportHandler.java#L173
 The only workaround I can suggest is to duplicate the DIH definitions in solr
config
  <requestHandler name="/dataimport" class="solr.DataImportHandler" ...>
  <requestHandler name="/dataimport2" class="solr.DataImportHandler" ...>
  <requestHandler name="/dataimport3" class="solr.DataImportHandler" ...>
 ...
then those handlers should be able to handle their own requests in parallel. Nasty
stuff..
have a good hack
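
As a rough shell sketch of the submit-and-poll option described above (entity names come from the original post; the check for the word "busy" in the status response is an assumption to verify against your DIH output):

  #!/bin/sh
  # Run each entity import in turn, waiting until DIH reports it is no longer busy.
  for year in $(seq 2002 2015); do
    curl -s "http://127.0.0.1:8983/solr/nvd-rss/dataimport?command=full-import&clean=false&entity=cve-$year"
    # poll the handler status before firing the next entity
    while curl -s "http://127.0.0.1:8983/solr/nvd-rss/dataimport?command=status" | grep -q busy; do
      sleep 5
    done
  done
  curl -s "http://127.0.0.1:8983/solr/nvd-rss/dataimport?command=delta-import&clean=false&entity=cve-last"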

On Wed, Jan 28, 2015 at 3:47 AM, Carl Roberts carl.roberts.zap...@gmail.com
 wrote:

 Hi,

 I am attempting to run all these curl commands from a script so that I can
 put them in a crontab job; however, it seems that only the first one
 executes and the others return an error (below):

 curl "http://127.0.0.1:8983/solr/nvd-rss/dataimport?command=full-import&clean=false&entity=cve-2002"
 curl "http://127.0.0.1:8983/solr/nvd-rss/dataimport?command=full-import&clean=false&entity=cve-2003"
 curl "http://127.0.0.1:8983/solr/nvd-rss/dataimport?command=full-import&clean=false&entity=cve-2004"
 curl "http://127.0.0.1:8983/solr/nvd-rss/dataimport?command=full-import&clean=false&entity=cve-2005"
 curl "http://127.0.0.1:8983/solr/nvd-rss/dataimport?command=full-import&clean=false&entity=cve-2006"
 curl "http://127.0.0.1:8983/solr/nvd-rss/dataimport?command=full-import&clean=false&entity=cve-2007"
 curl "http://127.0.0.1:8983/solr/nvd-rss/dataimport?command=full-import&clean=false&entity=cve-2008"
 curl "http://127.0.0.1:8983/solr/nvd-rss/dataimport?command=full-import&clean=false&entity=cve-2009"
 curl "http://127.0.0.1:8983/solr/nvd-rss/dataimport?command=full-import&clean=false&entity=cve-2010"
 curl "http://127.0.0.1:8983/solr/nvd-rss/dataimport?command=full-import&clean=false&entity=cve-2011"
 curl "http://127.0.0.1:8983/solr/nvd-rss/dataimport?command=full-import&clean=false&entity=cve-2012"
 curl "http://127.0.0.1:8983/solr/nvd-rss/dataimport?command=full-import&clean=false&entity=cve-2013"
 curl "http://127.0.0.1:8983/solr/nvd-rss/dataimport?command=full-import&clean=false&entity=cve-2014"
 curl "http://127.0.0.1:8983/solr/nvd-rss/dataimport?command=full-import&clean=false&entity=cve-2015"
 curl "http://127.0.0.1:8983/solr/nvd-rss/dataimport?command=delta-import&clean=false&entity=cve-last"

 error:

 *A command is still running...*

 Question:  Is there a way to queue the other requests in Solr so that they
 run as soon as the previous one is done?  If not, how would you recommend I
 do this?

 Many thanks in advance,

 Joe





-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics

http://www.griddynamics.com
mkhlud...@griddynamics.com


Re: Reindex data without creating new index.

2015-01-28 Thread Shawn Heisey
On 1/27/2015 11:54 PM, SolrUser1543 wrote:
 I want to reindex my data in order to change the value of some field according
 to the value of another (both fields already exist).
 
 For this purpose I ran the clue utility to get a list of IDs.
 Then I created an update processor which can set the value of field A
 according to the value of field B.
 I added a new request handler, like the classic update handler, but with a new
 update chain containing the new update processor.
 
 I want to run an HTTP POST request for each ID to the new handler, with the
 item id only.
 This will trigger my update processor, which will get the existing doc from
 the index and do the logic.
 
 So in this way I can do some enrichment, without a full data import and
 without creating a new index.
 
 What do you think about it?
 Could it cause performance degradation? Can Solr handle it,
 or will it rebalance the index?
 Does Solr have some built-in feature which can do this?

This is likely possible, with some caveats.  You'll need to write all
the code yourself, extending the UpdateRequestProcessorFactory and
UpdateRequestProcessor classes.

This will be similar to the atomic update feature, so you'll likely need
to find that source code and model yours on its operation.  It will have
the same requirements -- all fields must be 'stored=true' except those
which are copyField destinations, which must be 'stored=false'.  With
Atomic Updates, this requirement is not *enforced*, but it must be met,
or there will be data loss.

https://wiki.apache.org/solr/Atomic_Updates#Caveats_and_Limitations

What do you mean by "rebalance the index"?  This could mean almost
anything, but most of the meanings I can come up with would not apply to
this situation at all.

The effect on Solr for each document you process will be the sum of: a
query for that document, a tiny bit of work for the update processor itself,
and a reindex of that document.

Thanks,
Shawn
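
For reference, the driving loop the original poster describes (one HTTP POST per ID to a handler wired to the custom chain) could look roughly like the sketch below; the chain name, collection name, and ids.txt file are hypothetical:

  # Each request carries only the id; the custom processor is expected to load the
  # stored document, derive field A from field B, and pass the full doc on for reindexing.
  while read id; do
    curl -s "http://localhost:8983/solr/collection1/update?update.chain=enrich&commitWithin=60000" \
         -H 'Content-Type: application/json' \
         -d "[{\"id\": \"$id\"}]"
  done < ids.txt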



Re: Solr facet search improvements

2015-01-28 Thread Shawn Heisey
On 1/28/2015 3:56 AM, thakkar.aayush wrote:
 I have around 1 million job titles which are indexed on Solr and am looking
 to improve the faceted search results on job title matches.
 
 For example: a job search for *Research Scientist Computer Architecture* is
 made, and the facet field title, which is tokenized in Solr, gives the
 following results:
 
 1. Senior Data Scientist 
 2. PARALLEL COMPUTING SOFTWARE ENGINEER 
 3. Engineer/Scientist 4 
 4. Data Scientist 
 5. Engineer/Scientist 
 6. Senior Research Scientist 
 7. Research Scientist-Wireless Networks 
 8. Research Scientist-Andriod Development 
 9. Quantum Computing Theorist Job 
 10.Data Sceintist Smart Analytics
 
 I want to be able to improve / optimize the job titles and be able to make
 exclusions and some normalizations. Is this possible with Solr? What is the
 best way to have more granular control over the faceted search results?
 
 For example, *Engineer/Scientist 4* is not useful and too specific, and
 titles like *Quantum Computing theorist* would ideally also be excluded.

Normally, if the field is tokenized, you will not get the original
values in the facet.  You will get values like "senior" instead of
"Senior Data Scientist".  If DocValues are enabled on the field, then
you may well indeed get the original values.  I've never tried facets on
a tokenized field with DocValues, but everything I understand about the
feature says it would result in the original (not tokenized) values.

If you want different values in the facets, then you'll need to change
those values before they get indexed in Solr.  That can be done with
custom UpdateProcessor code embedded in the update chain, or you can
simply do the changes in your program that indexes the data in Solr.

Thanks,
Shawn
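
A common schema-side arrangement for this (not something Shawn prescribes here, just a frequently used pattern; the field and type names are illustrative) is to search on the tokenized field and facet on an untokenized string copy:

  <field name="title" type="text_general" indexed="true" stored="true"/>
  <field name="title_facet" type="string" indexed="true" stored="false"/>
  <copyField source="title" dest="title_facet"/>

Facet requests then use facet.field=title_facet, so the facet values are whole job titles rather than individual tokens.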



What is the best way to update an index?

2015-01-28 Thread Carl Roberts

Hi,

What is the best way to update an index with new data or records? Via 
this command:


curl "http://127.0.0.1:8983/solr/nvd-rss/dataimport?command=full-import&clean=false&synchronous=true&entity=cve-2002"


or this command:

curl "http://127.0.0.1:8983/solr/nvd-rss/dataimport?command=delta-import&synchronous=true&entity=cve-2002"



Thanks,

Joe


Re: Running multiple full-import commands via curl in a script

2015-01-28 Thread Carl Roberts

Thanks Mikhail - synchronous=true works like a charm...:)

On 1/28/15, 5:16 AM, Mikhail Khludnev wrote:

Literally, queueing can be done by submitting as-is (async) and polling the
command status. However, given
https://github.com/apache/lucene-solr/blob/trunk/solr/contrib/dataimporthandler/src/java/org/apache/solr/handler/dataimport/DataImportHandler.java#L200
you can try to add synchronous=true... that should block the request until
it's completed.
The other question is how to run requests in parallel, which is explicitly
prevented by
https://github.com/apache/lucene-solr/blob/trunk/solr/contrib/dataimporthandler/src/java/org/apache/solr/handler/dataimport/DataImportHandler.java#L173
  The only workaround I can suggest is to duplicate the DIH definitions in solr
config
   <requestHandler name="/dataimport" class="solr.DataImportHandler" ...>
   <requestHandler name="/dataimport2" class="solr.DataImportHandler" ...>
   <requestHandler name="/dataimport3" class="solr.DataImportHandler" ...>
  ...
then those handlers should be able to handle their own requests in parallel. Nasty
stuff..
have a good hack

On Wed, Jan 28, 2015 at 3:47 AM, Carl Roberts carl.roberts.zap...@gmail.com

wrote:
Hi,

I am attempting to run all these curl commands from a script so that I can
put them in a crontab job; however, it seems that only the first one
executes and the others return an error (below):

curl "http://127.0.0.1:8983/solr/nvd-rss/dataimport?command=full-import&clean=false&entity=cve-2002"
curl "http://127.0.0.1:8983/solr/nvd-rss/dataimport?command=full-import&clean=false&entity=cve-2003"
curl "http://127.0.0.1:8983/solr/nvd-rss/dataimport?command=full-import&clean=false&entity=cve-2004"
curl "http://127.0.0.1:8983/solr/nvd-rss/dataimport?command=full-import&clean=false&entity=cve-2005"
curl "http://127.0.0.1:8983/solr/nvd-rss/dataimport?command=full-import&clean=false&entity=cve-2006"
curl "http://127.0.0.1:8983/solr/nvd-rss/dataimport?command=full-import&clean=false&entity=cve-2007"
curl "http://127.0.0.1:8983/solr/nvd-rss/dataimport?command=full-import&clean=false&entity=cve-2008"
curl "http://127.0.0.1:8983/solr/nvd-rss/dataimport?command=full-import&clean=false&entity=cve-2009"
curl "http://127.0.0.1:8983/solr/nvd-rss/dataimport?command=full-import&clean=false&entity=cve-2010"
curl "http://127.0.0.1:8983/solr/nvd-rss/dataimport?command=full-import&clean=false&entity=cve-2011"
curl "http://127.0.0.1:8983/solr/nvd-rss/dataimport?command=full-import&clean=false&entity=cve-2012"
curl "http://127.0.0.1:8983/solr/nvd-rss/dataimport?command=full-import&clean=false&entity=cve-2013"
curl "http://127.0.0.1:8983/solr/nvd-rss/dataimport?command=full-import&clean=false&entity=cve-2014"
curl "http://127.0.0.1:8983/solr/nvd-rss/dataimport?command=full-import&clean=false&entity=cve-2015"
curl "http://127.0.0.1:8983/solr/nvd-rss/dataimport?command=delta-import&clean=false&entity=cve-last"

error:

*A command is still running...*

Question:  Is there a way to queue the other requests in Solr so that they
run as soon as the previous one is done?  If not, how would you recommend I
do this?

Many thanks in advance,

Joe









Re: Morphology of synonyms

2015-01-28 Thread Shawn Heisey
On 1/28/2015 5:11 AM, Reinforcer wrote:
 Is Solr capable of using morphology for synonyms?
 
 For example. Request: inanely.
 Indexed text in Solr: "Searching keywords without morphology is fatuously."
 "inane" and "fatuous" are synonyms.
 
 So, inanely ---morphology---> inane ---synonyms---> fatuous
 ---morphology---> fatuously. Is this possible (double morphology)?

Synonyms are handled via exact match.  The feature you are describing is
called stemming or lemmatization.

https://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#Stemming

It is possible to combine stemming and synonyms in the same analysis
chain, but you must figure out what the root word is to put into your
synonym list.  It may not be what you expect.  For example, the English
stemmer will probably change "achieve" to "achiev" ... which sounds
wrong, until you remember that stemming must be applied both at index
and query time, and the user will never see that form of the word.

Synonyms are usually only applied at either index or query time.  Which
one to choose depends on your requirements, but I believe it is
typically on the query side.

The analysis tab in the admin UI is invaluable for seeing the results of
changes in the analysis chain.

Thanks,
Shawn
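
A minimal sketch of combining the two, assuming query-time synonyms and a Porter stemmer (the field type name is illustrative); with the synonym filter placed after the stemmer, the entries in synonyms.txt have to be the post-stemming forms, which the Analysis screen can show you:

  <fieldType name="text_syn" class="solr.TextField" positionIncrementGap="100">
    <analyzer type="index">
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.PorterStemFilterFactory"/>
    </analyzer>
    <analyzer type="query">
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.PorterStemFilterFactory"/>
      <!-- synonyms.txt must list the stemmed/root forms produced by the filters above -->
      <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    </analyzer>
  </fieldType>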



Re: Reading data from another solr core

2015-01-28 Thread Alvaro Cabrerizo
Hi,

I usually use the SolrEntityProcessor for moving/transforming data between
cores; it's a piece of cake!

Regards.
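
For reference, a minimal data-config sketch for pulling everything from another core with SolrEntityProcessor (the source-core URL is reused from the example quoted below):

  <dataConfig>
    <document>
      <!-- pulls every document from core2 into the core that owns this DIH config -->
      <entity name="sep" processor="SolrEntityProcessor"
              url="http://127.0.0.1:8081/solr/core2" query="*:*"/>
    </document>
  </dataConfig>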

On Wed, Jan 28, 2015 at 8:13 AM, solrk koushikga...@gmail.com wrote:

 Hi Guys,

 I have multiple cores set up in my Solr server. I would like to read/import
 data
 from one core (source) into another core (target) and index it. Is there an
 easy way in Solr to do so?

 I was thinking of using SolrEntityProcessor for this purpose. Any other
 suggestions are appreciated.

 http://blog.trifork.com/2011/11/08/importing-data-from-another-solr/

 For example:

 <dataConfig>
   <document>
     <entity name="user" pk="id"
             url="..."
             processor="XPathEntityProcessor">

       <field column="id" xpath="/user/id" />
       <entity name="sep" processor="SolrEntityProcessor" query="*:*"
               url="http://127.0.0.1:8081/solr/core2">
       </entity>

     </entity>
   </document>
 </dataConfig>

 Please suggest if there is a better solution, or should I write a new
 processor which reads the index of another core?




 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/Reading-data-from-another-solr-core-tp4182466.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Morphology of synonyms

2015-01-28 Thread Reinforcer
Hi,

Is Solr capable of using morphology for synonyms?

For example. Request: inanely.
Indexed text in Solr: "Searching keywords without morphology is fatuously."
"inane" and "fatuous" are synonyms.

So, inanely ---morphology---> inane ---synonyms---> fatuous
---morphology---> fatuously. Is this possible (double morphology)?


Best regards



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Morphology-of-synonims-tp4182517.html
Sent from the Solr - User mailing list archive at Nabble.com.


Solr facet search improvements

2015-01-28 Thread thakkar.aayush
I have around 1 million job titles which are indexed on Solr and am looking
to improve the faceted search results on job title matches.

For example: a job search for *Research Scientist Computer Architecture* is
made, and the facet field title, which is tokenized in Solr, gives the
following results:

1. Senior Data Scientist 
2. PARALLEL COMPUTING SOFTWARE ENGINEER 
3. Engineer/Scientist 4 
4. Data Scientist 
5. Engineer/Scientist 
6. Senior Research Scientist 
7. Research Scientist-Wireless Networks 
8. Research Scientist-Andriod Development 
9. Quantum Computing Theorist Job 
10.Data Sceintist Smart Analytics

I want to be able to improve / optimize the job titles and be able to make
exclusions and some normalizations. Is this possible with Solr? What is the
best way to have more granular control over the faceted search results?

For example, *Engineer/Scientist 4* is not useful and too specific, and
titles like *Quantum Computing theorist* would ideally also be excluded.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-facet-search-improvements-tp4182502.html
Sent from the Solr - User mailing list archive at Nabble.com.


CoreContainer#createAndLoad, existing cores not loaded

2015-01-28 Thread Clemens Wyss DEV
My problem:
I create cores dynamically using container#create( CoreDescriptor ) and then 
add documents to the very core(s). So far so good.
When I restart my app I do
container = CoreContainer#createAndLoad(...)
but when I then call container.getAllCoreNames() an empty list is returned.

What cores should be loaded by the container if I call
CoreContainer#createAndLoad(...)
? Where does the container lookup the existing cores?


Re: extract and add fields on the fly

2015-01-28 Thread Mark
"Create the SID from the existing doc" implies that a document already
exists that you wish to add fields to.

However, if the document is a binary, are you suggesting:

1) curl to upload/extract passing docID
2) obtain a SID based off docID
3) add additional fields to SID & commit

I know I'm possibly wandering into the schemaless territory here as well


On 28 January 2015 at 17:11, Andrew Pawloski apawlo...@gmail.com wrote:

 I would switch the order of those. Add the new fields and *then* index to
 solr.

 We do something similar when we create SolrInputDocuments that are pushed
 to solr. Create the SID from the existing doc, add any additional fields,
 then add to solr.

 On Wed, Jan 28, 2015 at 11:56 AM, Mark javam...@gmail.com wrote:

  Is it possible to use curl to upload a document (for extract & indexing)
  and specify some fields on the fly?
 
  sort of:
  1) index this document
  2) by the way here are some important facets whilst you're at it
 
  Regards
 
  Mark
 



Re: extract and add fields on the fly

2015-01-28 Thread Andrew Pawloski
Sorry, I may have misunderstood:

Are you talking about adding additional fields at indexing time? (Here I
would add the fields first *then* send to solr.)

Are you talking about updating a field within an existing document in a
solr index? (In that case I would direct you here [1].)

Am I still misunderstanding?

[1]
https://cwiki.apache.org/confluence/display/solr/Updating+Parts+of+Documents

On Wed, Jan 28, 2015 at 12:30 PM, Mark javam...@gmail.com wrote:

 Create the SID from the existing doc implies that a document already
 exists that you wish to add fields to.

 However if the document is a binary are you suggesting

 1) curl to upload/extract passing docID
 2) obtain a SID based off docID
 3) add addtinal fields to SID  commit

 I know I'm possibly wandering into the schemaless teritory here as well


 On 28 January 2015 at 17:11, Andrew Pawloski apawlo...@gmail.com wrote:

  I would switch the order of those. Add the new fields and *then* index to
  solr.
 
  We do something similar when we create SolrInputDocuments that are pushed
  to solr. Create the SID from the existing doc, add any additional fields,
  then add to solr.
 
  On Wed, Jan 28, 2015 at 11:56 AM, Mark javam...@gmail.com wrote:
 
   Is it possible to use curl to upload a document (for extract 
 indexing)
   and specify some fields on the fly?
  
   sort of:
   1) index this document
   2) by the way here are some important facets whilst your at it
  
   Regards
  
   Mark
  
 



Re: extract and add fields on the fly

2015-01-28 Thread Mark
Second thoughts SID is purely i/p as its name suggests :)

I think a better approach would be

1) curl to upload/extract passing docID
2) curl to update additional fields for that docID



On 28 January 2015 at 17:30, Mark javam...@gmail.com wrote:


 Create the SID from the existing doc implies that a document already
 exists that you wish to add fields to.

 However if the document is a binary are you suggesting

 1) curl to upload/extract passing docID
 2) obtain a SID based off docID
 3) add addtinal fields to SID  commit

 I know I'm possibly wandering into the schemaless teritory here as well


 On 28 January 2015 at 17:11, Andrew Pawloski apawlo...@gmail.com wrote:

 I would switch the order of those. Add the new fields and *then* index to
 solr.

 We do something similar when we create SolrInputDocuments that are pushed
 to solr. Create the SID from the existing doc, add any additional fields,
 then add to solr.

 On Wed, Jan 28, 2015 at 11:56 AM, Mark javam...@gmail.com wrote:

  Is it possible to use curl to upload a document (for extract  indexing)
  and specify some fields on the fly?
 
  sort of:
  1) index this document
  2) by the way here are some important facets whilst your at it
 
  Regards
 
  Mark
 





Re: extract and add fields on the fly

2015-01-28 Thread Mark
I'm looking to

1) upload a binary document using curl
2) add some additional facets

Specifically my question is can this be achieved in 1 curl operation or
does it need 2?

On 28 January 2015 at 17:43, Mark javam...@gmail.com wrote:


 Second thoughts SID is purely i/p as its name suggests :)

 I think a better approach would be

 1) curl to upload/extract passing docID
 2) curl to update additional fields for that docID



 On 28 January 2015 at 17:30, Mark javam...@gmail.com wrote:


 Create the SID from the existing doc implies that a document already
 exists that you wish to add fields to.

 However if the document is a binary are you suggesting

 1) curl to upload/extract passing docID
 2) obtain a SID based off docID
 3) add addtinal fields to SID  commit

 I know I'm possibly wandering into the schemaless teritory here as well


 On 28 January 2015 at 17:11, Andrew Pawloski apawlo...@gmail.com wrote:

 I would switch the order of those. Add the new fields and *then* index to
 solr.

 We do something similar when we create SolrInputDocuments that are pushed
 to solr. Create the SID from the existing doc, add any additional fields,
 then add to solr.

 On Wed, Jan 28, 2015 at 11:56 AM, Mark javam...@gmail.com wrote:

  Is it possible to use curl to upload a document (for extract 
 indexing)
  and specify some fields on the fly?
 
  sort of:
  1) index this document
  2) by the way here are some important facets whilst your at it
 
  Regards
 
  Mark
 






Re: extract and add fields on the fly

2015-01-28 Thread Andrew Pawloski
I would switch the order of those. Add the new fields and *then* index to
solr.

We do something similar when we create SolrInputDocuments that are pushed
to solr. Create the SID from the existing doc, add any additional fields,
then add to solr.

On Wed, Jan 28, 2015 at 11:56 AM, Mark javam...@gmail.com wrote:

 Is it possible to use curl to upload a document (for extract & indexing)
 and specify some fields on the fly?

 sort of:
 1) index this document
 2) by the way here are some important facets whilst you're at it

 Regards

 Mark



Re: replicas goes in recovery mode right after update

2015-01-28 Thread Vijay Sekhri
Hi Shawn,
Thank you so much for the assistance. Building is not a problem; back in
the day I worked with linking, compiling and building C and C++
software, so Java is a piece of cake.
We have built the new war from the source version 4.10.3, and our
preliminary tests have shown that our issue (replicas going into recovery under
high load) *is resolved*. We will continue to do more testing and confirm.
Please note that the *patch is BUGGY*.

It removed the break statement within the while loop, because of which, whenever
we sent a list of docs the CloudSolrServer.add API would hang, but it
would work if we sent one doc at a time.

It took a while to figure out why that is happening. Once we put the break
statement back it worked like a charm.
Furthermore the patch has
solr/solrj/src/java/org/apache/solr/client/solrj/impl/ConcurrentUpdateSolrClient.java
which should be
solr/solrj/src/java/org/apache/solr/client/solrj/impl/ConcurrentUpdateSolrServer.java

Finally, checking if(!offer) is sufficient, rather than using if(offer == false).
Last but not least, having a configurable queue size and timeouts
(managed via solrconfig) would be quite helpful.
Thank you once again for your help.

Vijay

On Tue, Jan 27, 2015 at 6:20 PM, Shawn Heisey apa...@elyograg.org wrote:

 On 1/27/2015 2:52 PM, Vijay Sekhri wrote:
  Hi Shawn,
  Here is some update. We found the main issue
  We have configured our cluster to run under jetty and when we tried full
  indexing, we did not see the original Invalid Chunk error. However the
  replicas still went into recovery
  All this time we been trying to look into replicas logs to diagnose the
  issue. The problem seem to be at the leader side. When we looked into
  leader logs, we found the following on all the leaders
 
  3439873 [qtp1314570047-92] WARN
   org.apache.solr.update.processor.DistributedUpdateProcessor  – Error
  sending update
  *java.lang.IllegalStateException: Queue full*

 snip

  There is a similar bug reported around this
  https://issues.apache.org/jira/browse/SOLR-5850
 
  and it seem to be in OPEN status. Is there a way we can configure the
 queue
  size and increase it ? or is there a version of solr that has this issue
  resolved already?
  Can you suggest where we go from here to resolve this ? We can repatch
 the
  war file if that is what you would recommend .
  In the end our initial speculation about solr unable to handle so many
  update is correct. We do not see this issue when the update load is less.

 Are you in a position where you can try the patch attached to
 SOLR-5850?  You would need to get the source code for the version you're
 on (or perhaps a newer 4.x version), patch it, and build Solr yourself.
 If you have no experience building java packages from source, this might
 prove to be difficult.

 Thanks,
 Shawn




-- 
*
Vijay Sekhri
*


IndexFormatTooNewException

2015-01-28 Thread Joshi, Shital
Hi,

We upgraded our cluster to Solr 4.10.0 for a couple of days and then reverted back 
to 4.8.0. However, the dashboard still shows Solr 4.10.0. Do you know why?
*   solr-spec 4.10.0
*   solr-impl 4.10.0 1620776
*   lucene-spec 4.10.0
*   lucene-impl 4.10.0 1620776

We recently added new shards to our cluster and the dashboard shows the correct Solr 
version (4.8.0) for these new shards. We copied the index from one of the old shards 
(where it is showing 4.10.0 on the dashboard) to this new shard, and we see this 
error upon startup. How do we get rid of this error?

Caused by: org.apache.lucene.index.IndexFormatTooNewException: Format version 
is not supported (resource: 
BufferedChecksumIndexInput(MMapIndexInput(path=/local/data/solr13/index.20140919180209018/segments_1tzz))):
 3 (needs to be between 0 and 2)
at 
org.apache.lucene.codecs.CodecUtil.checkHeaderNoMagic(CodecUtil.java:156)
at org.apache.lucene.index.SegmentInfos.read(SegmentInfos.java:335)
at org.apache.lucene.index.SegmentInfos$1.doBody(SegmentInfos.java:416)
at 
org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:864)
at 
org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:710)
at org.apache.lucene.index.SegmentInfos.read(SegmentInfos.java:412)
at org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:749)




Re: replica never takes leader role

2015-01-28 Thread Mark Miller
Yes, after 45 seconds a replica should take over as leader. It should
likely explain in the logs of the replica that should be taking over why
this is not happening.

- Mark

On Wed Jan 28 2015 at 2:52:32 PM Joshi, Shital shital.jo...@gs.com wrote:

 When the leader reaches 99% physical memory on the box and starts swapping
 (and stops replicating), we forcefully bring down the leader (first kill -15 and
 then kill -9 if kill -15 doesn't work). This is when we expect the
 replica to assume the leader role, and it never happens.

 Zookeeper timeout is 45 seconds. We can increase it up to 2 minutes and
 test.

 <cores adminPath="/admin/cores" defaultCoreName="collection1"
        host="${host:}" hostPort="${jetty.port:8983}"
        hostContext="${hostContext:solr}"
        zkClientTimeout="${zkClientTimeout:45000}">

 As per the definition of zkClientTimeout, after the leader is brought down and
 doesn't talk to ZooKeeper for 45 seconds, shouldn't ZK promote the replica
 to leader? I am not sure how increasing the ZK timeout will help.


 -Original Message-
 From: Erick Erickson [mailto:erickerick...@gmail.com]
 Sent: Wednesday, January 28, 2015 11:42 AM
 To: solr-user@lucene.apache.org
 Subject: Re: replica never takes leader role

 This is not the desired behavior at all. I know there have been
 improvements in this area since 4.8, but can't seem to locate the JIRAs.

 I'm curious _why_ the nodes are going down though, is it happening at
 random or are you taking it down? One problem has been that the Zookeeper
 timeout used to default to 15 seconds, and occasionally a node would be
 unresponsive (sometimes due to GC pauses) and exceed the timeout. So upping
 the ZK timeout has helped some people avoid this...

 FWIW,
 Erick

 On Wed, Jan 28, 2015 at 7:11 AM, Joshi, Shital shital.jo...@gs.com
 wrote:

  We're using Solr 4.8.0
 
 
  -Original Message-
  From: Erick Erickson [mailto:erickerick...@gmail.com]
  Sent: Tuesday, January 27, 2015 7:47 PM
  To: solr-user@lucene.apache.org
  Subject: Re: replica never takes leader role
 
  What version of Solr? This is an ongoing area of improvements and several
  are very recent.
 
  Try searching the JIRA for Solr for details.
 
  Best,
  Erick
 
  On Tue, Jan 27, 2015 at 1:51 PM, Joshi, Shital shital.jo...@gs.com
  wrote:
 
   Hello,
  
   We have SolrCloud cluster (5 shards and 2 replicas) on 10 boxes and
 three
   zookeeper instances. We have noticed that when a leader node goes down
  the
   replica never takes over as a leader, cloud becomes unusable and we
 have
  to
   bounce entire cloud for replica to assume leader role. Is this default
   behavior? How can we change this?
  
   Thanks.
  
  
  
 



RE: IndexFormatTooNewException

2015-01-28 Thread Joshi, Shital
Thank you for replying. 

We added a new shard to the same cluster where some shards are showing Solr version 
4.10.0, and this new shard is showing Solr version 4.8.0. All shards source the Solr 
software from the same location and use the same startup script. I am surprised that 
the older shards are still running Solr 4.10.0.

How do we do a real downgrade of the index to 4.8? Do you mean replay all the data? 

-Original Message-
From: Chris Hostetter [mailto:hossman_luc...@fucit.org] 
Sent: Wednesday, January 28, 2015 4:10 PM
To: solr-user@lucene.apache.org
Subject: Re: IndexFormatTooNewException


: We upgraded our cluster to Solr 4.10.0 for couple days and again 
: reverted back to 4.8.0. However the dashboard still shows Solr 4.10.0. 
: Do you know why?

because you didn't fully revert - you are still running Solr 4.10.0 - the 
details of what steps you took to try and switch back make a huge 
difference in understanding why you are still running 4.10.0 even though you 
don't want to.


: We recently added new shards to our cluster and dashboard shows correct 
: Solr version (4.8.0) for these new shards. We copied index from one of 
: old shards (where it is showing 4.10.0 on dashboard) to this new shard 
: and we see this error upon start up. How do we get rid of this error?

IndexFormatTooNewException means exactly what it sounds like -- you are 
asking Solr/Lucene to open an index that it can tell was created by a 
newer version of the software and it is incapable of doing so.

You either need to upgrade all of the nodes to 4.10, or you need to scrap 
this index, do a *real* downgrade to 4.8, and then rebuild your index (or 
restore a backup index from before you attempted to upgrade).

: Caused by: org.apache.lucene.index.IndexFormatTooNewException: Format version 
is not supported (resource: 
BufferedChecksumIndexInput(MMapIndexInput(path=/local/data/solr13/index.20140919180209018/segments_1tzz))):
 3 (needs to be between 0 and 2)
: at 
org.apache.lucene.codecs.CodecUtil.checkHeaderNoMagic(CodecUtil.java:156)
: at org.apache.lucene.index.SegmentInfos.read(SegmentInfos.java:335)
: at 
org.apache.lucene.index.SegmentInfos$1.doBody(SegmentInfos.java:416)
: at 
org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:864)
: at 
org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:710)
: at org.apache.lucene.index.SegmentInfos.read(SegmentInfos.java:412)
: at org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:749)
: 
: 
: 

-Hoss
http://www.lucidworks.com/


Re: IndexFormatTooNewException

2015-01-28 Thread Chris Hostetter

: We upgraded our cluster to Solr 4.10.0 for couple days and again 
: reverted back to 4.8.0. However the dashboard still shows Solr 4.10.0. 
: Do you know why?

because you didn't fully revert - you are still running Solr 4.10.0 - the 
details of what steps you took to try and switch back make a huge 
difference in understanding why you are still running 4.10.0 even though you 
don't want to.


: We recently added new shards to our cluster and dashboard shows correct 
: Solr version (4.8.0) for these new shards. We copied index from one of 
: old shards (where it is showing 4.10.0 on dashboard) to this new shard 
: and we see this error upon start up. How do we get rid of this error?

IndexFormatTooNewException means exactly what it sounds like -- you are 
asking Solr/Lucene to open an index that it can tell was created by a 
newer version of the software and it is incapable of doing so.

You either need to upgrade all of the nodes to 4.10, or you need to scrap 
this index, do a *real* downgrade to 4.8, and then rebuild your index (or 
restore a backup index from before you attempted to upgrade).

: Caused by: org.apache.lucene.index.IndexFormatTooNewException: Format version 
is not supported (resource: 
BufferedChecksumIndexInput(MMapIndexInput(path=/local/data/solr13/index.20140919180209018/segments_1tzz))):
 3 (needs to be between 0 and 2)
: at 
org.apache.lucene.codecs.CodecUtil.checkHeaderNoMagic(CodecUtil.java:156)
: at org.apache.lucene.index.SegmentInfos.read(SegmentInfos.java:335)
: at 
org.apache.lucene.index.SegmentInfos$1.doBody(SegmentInfos.java:416)
: at 
org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:864)
: at 
org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:710)
: at org.apache.lucene.index.SegmentInfos.read(SegmentInfos.java:412)
: at org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:749)
: 
: 
: 

-Hoss
http://www.lucidworks.com/


Re: Reindex data without creating new index.

2015-01-28 Thread SolrUser1543
By rebalancing I mean that such a big amount of updates will create a
situation which will require running an optimization of the index, because each
document will be added again in place of the original one.

But according to what you say it should not be a problem, am I correct?





--
View this message in context: 
http://lucene.472066.n3.nabble.com/Reindex-data-without-creating-new-index-tp4182464p4182726.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solrcloud open new searcher not happening in slave for deletebyID

2015-01-28 Thread Shawn Heisey
On 1/27/2015 5:50 PM, vsriram30 wrote:
 I am using SolrCloud 4.6.1. In that, if I use CloudSolrServer to add a record
 to Solr, then I see the following commit update command in both the master and
 the slave node:

One of the first things to find out is whether it's still a problem in
the latest version of Solr, which is currently 4.10.3.  Solr 4.6.1 is a
year old, and there have been seven new versions released since then. 
Solr, especially SolrCloud, changes at a VERY rapid pace ... in each
version, many bugs are fixed, and each x.y.0 version adds new
features/functionality.

I'm not in a position to set up a minimal SolrCloud testbed to try this
out, or I would try it myself.

 2015-01-27 15:20:23,625 INFO org.apache.solr.update.UpdateHandler: start
 commit{,optimize=false,openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=true,prepareCommit=false}

 I am also setting the updateRequest.setCommitWithin(5000);

 Here as noticed, the openSearcher=true and hence after 5 seconds, I am able
 to see the record in index in both slave and in master.

 Now if I trigger another UpdateRequest with only deleteById set and no add
 documents to Solr, with the same commit within time, then 

 in the master log I see,

 2015-01-27 15:21:46,389 INFO org.apache.solr.update.UpdateHandler: start
 commit{,optimize=false,openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=true,prepareCommit=false}

 and in the slave log I see,
 2015-01-27 15:21:56,393 INFO org.apache.solr.update.UpdateHandler: start
 commit{,optimize=false,openSearcher=false,waitSearcher=true,expungeDeletes=false,softCommit=false,prepareCommit=false}

 Here as noticed, the master is having openSearcher=true and slave is having
 openSearcher=false. This causes inconsistency in the results as master shows
 that the record is deleted and slave still has the record.

 After digging through the code a bit, I think this is probably happening in
 CommitTracker where the openSearcher might be false while creating the
 CommitUpdateCommand.

 Can you advise if there is any ticket created to address this issue or can I
 create one? Also, is there any workaround for this until the bug is fixed, other
 than setting the commitWithin duration in the server to a lower value?

It does sound like a bug.  Some possible workarounds, no idea how
effective they will be:

*) Try deleteByQuery to see whether it is affected the same way.
*) Use autoSoftCommit in solrconfig.xml instead of commitWithin on the
update request (see the sketch below).
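
For the second workaround, the server-side setting lives inside the updateHandler section of solrconfig.xml; a minimal sketch (the 5000 ms value simply mirrors the commitWithin used in the report above):

  <updateHandler class="solr.DirectUpdateHandler2">
    <!-- open a new searcher via a soft commit at most every 5 seconds -->
    <autoSoftCommit>
      <maxTime>5000</maxTime>
    </autoSoftCommit>
  </updateHandler>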

I do see a report of an identical problem on this mailing list, two days
after 4.0-ALPHA was announced, which was the first public release that
included SolrCloud.  Both of the following URLs open the same message:

http://osdir.com/ml/solr-user.lucene.apache.org/2012-07/msg00214.html
http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201207.mbox/%3ccal3vrcdsiqyajuy6eqvpak0ftg-oy7n5g7cql4x4_8sz5jm...@mail.gmail.com%3E

I did not find an existing issue in Jira for this problem, so if the
same problem exists in 4.10.3, filing one sounds like a good idea.

Thanks,
Shawn



Re: Stop word suggestions are coming when I indexed sentence using ShingleFilterFactory

2015-01-28 Thread Nitin Solanki
OK, I got the solution.
I changed the value of maxQueryFrequency from 0.01 (1%) to 0.9 (90%). It is
working. Thanks a lot.
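
For context, the two knobs discussed in this thread live in different places; a rough sketch (the handler name is illustrative, and the field name gram comes from the schema quoted below):

  <searchComponent name="spellcheck" class="solr.SpellCheckComponent">
    <lst name="spellchecker">
      <str name="name">default</str>
      <str name="field">gram</str>
      <str name="classname">solr.DirectSolrSpellChecker</str>
      <!-- still suggest for terms that appear in up to 90% of documents -->
      <float name="maxQueryFrequency">0.9</float>
    </lst>
  </searchComponent>

  <requestHandler name="/spell" class="solr.SearchHandler">
    <lst name="defaults">
      <str name="spellcheck">true</str>
      <!-- also return suggestions for terms that do exist in the index -->
      <str name="spellcheck.alternativeTermCount">5</str>
    </lst>
    <arr name="last-components">
      <str>spellcheck</str>
    </arr>
  </requestHandler>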

On Tue, Jan 27, 2015 at 8:55 PM, Dyer, James james.d...@ingramcontent.com
wrote:

 Can you give a little more information as to how you have the spellchecker
 configured in solrconfig.xml?  Also, it would help if you showed a query
 and the spell check response and then explain what you wanted it to return
 vs what it actually returned.

 My guess is that the stop words you mention exist in your spelling index
 and you're not using the alternativeTermCount parameter, which tells it
 to suggest for terms that exist in the index.

 I take it also you're using shingles to get word-break suggestions?  You
 might have better luck with this using WordBreakSolrSpellchecker instead of
 shingles.

 James Dyer
 Ingram Content Group


 -Original Message-
 From: Nitin Solanki [mailto:nitinml...@gmail.com]
 Sent: Tuesday, January 27, 2015 5:06 AM
 To: solr-user@lucene.apache.org
 Subject: Stop word suggestions are coming when I indexed sentence using
 ShingleFilterFactory

 Hi,
   I am getting suggestions for both correct words and misspelled
 words, but not for stop words. Why? I am not even using
 solr.StopFilterFactory.


 Schema.xml :

 <field name="gram" type="textSpell" indexed="true" stored="true"
        required="true" multiValued="false"/>

 <fieldType name="textSpell" class="solr.TextField"
            positionIncrementGap="100">
   <analyzer type="index">
     <tokenizer class="solr.StandardTokenizerFactory"/>
     <filter class="solr.ShingleFilterFactory" maxShingleSize="5"
             minShingleSize="2" outputUnigrams="true"/>
   </analyzer>
   <analyzer type="query">
     <tokenizer class="solr.StandardTokenizerFactory"/>
     <filter class="solr.ShingleFilterFactory" maxShingleSize="5"
             minShingleSize="2" outputUnigrams="true"/>
   </analyzer>
 </fieldType>



RE: replica never takes leader role

2015-01-28 Thread Joshi, Shital
We're using Solr 4.8.0


-Original Message-
From: Erick Erickson [mailto:erickerick...@gmail.com] 
Sent: Tuesday, January 27, 2015 7:47 PM
To: solr-user@lucene.apache.org
Subject: Re: replica never takes leader role

What version of Solr? This is an ongoing area of improvements and several
are very recent.

Try searching the JIRA for Solr for details.

Best,
Erick

On Tue, Jan 27, 2015 at 1:51 PM, Joshi, Shital shital.jo...@gs.com wrote:

 Hello,

 We have SolrCloud cluster (5 shards and 2 replicas) on 10 boxes and three
 zookeeper instances. We have noticed that when a leader node goes down the
 replica never takes over as a leader, cloud becomes unusable and we have to
 bounce entire cloud for replica to assume leader role. Is this default
 behavior? How can we change this?

 Thanks.





RE: Suggesting broken words with solr.WordBreakSolrSpellChecker

2015-01-28 Thread fabio.bozzo
I tried increasing my alternativeTermCount to 5 and enabling extended results.
I also added a filter (fq) parameter to clarify what I mean:

*Querying for "go pro" is good:*

{
  "responseHeader": {
    "status": 0,
    "QTime": 2,
    "params": {
      "q": "go pro",
      "indent": "true",
      "fq": "marchio:\"GO PRO\"",
      "rows": "1",
      "wt": "json",
      "spellcheck.extendedResults": "true",
      "_": "1422485581792"
    }
  },
  "response": {
    "numFound": 27,
    "start": 0,
    "docs": [
      {
        "codice_produttore_s": "DK00150020",
        "codice_s": "5.BAT.27407",
        "id": "27407",
        "marchio": "GO PRO",
        "barcode_interno_s": "185323000958",
        "prezzo_acquisto_d": 16.12,
        "data_aggiornamento_dt": "2012-06-21T00:00:00Z",
        "descrizione": "BATTERIA GO PRO HERO ",
        "prezzo_vendita_d": 39.9,
        "categoria": "Batterie",
        "_version_": 1491583424191791000
      }
    ]
  },
  "spellcheck": {
    "suggestions": [
      "go pro",
      {
        "numFound": 1,
        "startOffset": 0,
        "endOffset": 6,
        "origFreq": 433,
        "suggestion": [
          {
            "word": "gopro",
            "freq": 2
          }
        ]
      },
      "correctlySpelled",
      false,
      "collation",
      [
        "collationQuery",
        "gopro",
        "hits",
        3,
        "misspellingsAndCorrections",
        [
          "go pro",
          "gopro"
        ]
      ]
    ]
  }
}

While querying for "gopro" is not:

{
  "responseHeader": {
    "status": 0,
    "QTime": 6,
    "params": {
      "q": "gopro",
      "indent": "true",
      "fq": "marchio:\"GO PRO\"",
      "rows": "1",
      "wt": "json",
      "spellcheck.extendedResults": "true",
      "_": "1422485629480"
    }
  },
  "response": {
    "numFound": 3,
    "start": 0,
    "docs": [
      {
        "codice_produttore_s": "DK0030010",
        "codice_s": "5.VID.39163",
        "id": "38814",
        "marchio": "GO PRO",
        "barcode_interno_s": "818279012477",
        "prezzo_acquisto_d": 150.84,
        "data_aggiornamento_dt": "2014-12-24T00:00:00Z",
        "descrizione": "VIDEOCAMERA GO-PRO HERO 3 WHITE NUOVO SLIM",
        "prezzo_vendita_d": 219,
        "categoria": "Fotografia",
        "_version_": 1491583425479442400
      }
    ]
  },
  "spellcheck": {
    "suggestions": [
      "gopro",
      {
        "numFound": 1,
        "startOffset": 0,
        "endOffset": 5,
        "origFreq": 2,
        "suggestion": [
          {
            "word": "giro",
            "freq": 6
          }
        ]
      },
      "correctlySpelled",
      false
    ]
  }
}

---

I'd like "go pro" as a suggestion for "gopro" too.
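
One thing to check in the spellcheck configuration (a sketch only; the field name descrizione is a guess from the sample document, and this is not an established fix for the case above): the word-break checker needs breakWords enabled for "gopro" -> "go pro" and combineWords enabled for the reverse, and the request has to name both dictionaries, e.g. spellcheck.dictionary=default and spellcheck.dictionary=wordbreak.

  <lst name="spellchecker">
    <str name="name">wordbreak</str>
    <str name="classname">solr.WordBreakSolrSpellChecker</str>
    <str name="field">descrizione</str>
    <str name="combineWords">true</str>
    <str name="breakWords">true</str>
    <int name="maxChanges">2</int>
  </lst>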



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Suggesting-broken-words-with-solr-WordBreakSolrSpellChecker-tp4182172p4182735.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: IndexFormatTooNewException

2015-01-28 Thread Shawn Heisey
On 1/28/2015 2:51 PM, Joshi, Shital wrote:
 Thank you for replying. 

 We added new shard to same cluster where some shards are showing Solr version 
 4.10.0 and this new shard is showing Solr version 4.8.0. All shards source 
 Solr software from same location and use same start up script. I am surprised 
 how older shards are still running Solr 4.10.0.

 How we do real downgrade index to 4.8? You mean replay all data? 

It is often not enough to simply replace the solr war.  You may also
need to wipe out the extracted war before restarting, or jars from the
previous version may still exist and some of them might be loaded
instead of the new version.

If you're using the jetty included in the example, the war is in the
webapps directory and the extracted files are under solr-webapp.  If
you're using another container, then I have no idea where the war gets
extracted.

If any index segments were written by the 4.10 version, they will not be
readable after downgrading to the 4.8 version.  Wiping out the index and
rebuilding it from scratch is usually the only way to fix that situation.

Thanks,
Shawn
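
For the stock jetty example layout only, the cleanup Shawn describes amounts to something like the sketch below; the paths are hypothetical, and other servlet containers extract the war elsewhere:

  cd /opt/solr-4.8.0/example                     # hypothetical install location
  rm -rf solr-webapp/webapp                      # wipe the war extracted by the 4.10.0 run
  cp /path/to/solr-4.8.0.war webapps/solr.war    # put the 4.8.0 war back in place

The index itself still has to be rebuilt or restored from a pre-upgrade backup, as noted above.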



Re: Solrcloud open new searcher not happening in slave for deletebyID

2015-01-28 Thread vsriram30
Thanks Shawn.  Not sure whether I will be able to test it out with 4.10.3.  I
will try the workarounds and update.

Thanks,
V.Sriram



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solrcloud-open-new-searcher-not-happening-in-slave-for-deletebyID-tp4182439p4182757.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Reading data from another solr core

2015-01-28 Thread solrk
Thank you Alvaro Cabrerizo! I am going to give it a shot.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Reading-data-from-another-solr-core-tp4182466p4182758.html
Sent from the Solr - User mailing list archive at Nabble.com.


Define Id when using db dih

2015-01-28 Thread SolrUser1543
Hi,  

I am using the data import handler to import data from an Oracle DB.
I have a problem: the table I am importing from has no single column which
is defined as a key.
How should I define the key in the data config file?

Thanks 




--
View this message in context: 
http://lucene.472066.n3.nabble.com/Define-Id-when-using-db-dih-tp4182797.html
Sent from the Solr - User mailing list archive at Nabble.com.


RE: CoreContainer#createAndLoad, existing cores not loaded

2015-01-28 Thread Clemens Wyss DEV
Thx Shawn. I am running latest-greatest Solr (4.10.3)
Solr home is e.g.
/opt/webs/siteX/WebContent/WEB-INF/solr
the core(s) reside in
/opt/webs/siteX/WebContent/WEB-INF/solr/cores
Should these be found by core discovery? 
If not, how can I configure coreRootDirectory in solr.xml to be the cores folder 
below solrHome?

<str name="coreRootDirectory">${coreRootDirectory:solrHome/cores}</str>

Note:
the solr.xml is to be used for any of our 150 sites we host. Therefore I'd like it 
to be generic - solrHome/cores

-Original Message-
From: Shawn Heisey [mailto:apa...@elyograg.org] 
Sent: Wednesday, January 28, 2015 17:08
To: solr-user@lucene.apache.org
Subject: Re: CoreContainer#createAndLoad, existing cores not loaded

On 1/28/2015 8:52 AM, Clemens Wyss DEV wrote:
 My problem:
 I create cores dynamically using container#create( CoreDescriptor ) and then 
 add documents to the very core(s). So far so good.
 When I restart my app I do
 container = CoreContainer#createAndLoad(...) but when I then call 
 container.getAllCoreNames() an empty list is returned.

 What cores should be loaded by the container if I call
 CoreContainer#createAndLoad(...)
 ? Where does the container lookup the existing cores?

If the solr.xml is the old format, then cores are defined in solr.xml, in the 
cores section of that config file.

There is a new format for solr.xml that is supported in version 4.4 and later 
and will be mandatory in 5.0.  If that format is present, then Solr will use 
core discovery -- starting from either the solr home or a defined 
coreRootDirectory, solr will search for core.properties files and treat each 
directory where one is found as a core's instanceDir.

http://wiki.apache.org/solr/Solr.xml%204.4%20and%20beyond

Thanks,
Shawn



Re: PostingsFormat block size

2015-01-28 Thread Trym Møller

Hi

Thanks for your input.

I do not do updates to the existing docs, so that is not relevant in my 
case, and I have just skipped that test case :-)
I have not been able to measure any significant changes to the 
distributed searches or just doing a direct search for an id.


Did I miss something with your comment "Here it is"?

Best regards Trym

On 27-01-2015 17:22, Mikhail Khludnev wrote:

Hm.. It's not blocks which I'm familiar with. Regarding performance impact
from bigger ID blocks: if you have <uniqueKey>ID</uniqueKey> and send
updates for existing docs. And IDs are also used for some of the distributed
search stages, I suppose. Here it is.

On Tue, Jan 27, 2015 at 4:33 PM, Trym Møller t...@sigmat.dk wrote:


Hi

Thanks for your clarifying questions.

In the constructor of the Lucene41PostingsFormat class the minimum and
maximum block size is provided. These sizes are used when creating the
BlockTreeTermsWriter (responsible for writing the .tim and .tip files of
the lucene index). It is the blocksizes of the BlockTreeTermsWriter I refer
to.

I'm not quite sure I understand your second question - sorry.
I can tell that I have not tried if the PulsingPostingsFormat is of any
help in regards to lowering the Solr JVM Memory usage, but I can see the
same BlockTreeTermsWriter with its block sizes are used by the
PulsingPostingsFormat.
Should I expect something else from the PulsingPostingsFormat in regards
to memory usage or in regards to searching (if have have changed to block
sizes of the BlockTreeTermsWriter)?

Best regards Trym


On 27-01-2015 14:00, Mikhail Khludnev wrote:


Hello Trym,

Can you clarify, which blockSize do you mean? And the second q, just to
avoid unnecessary explanation, do you know what's Pulsing?

On Tue, Jan 27, 2015 at 2:28 PM, Trym Møller t...@sigmat.dk wrote:

  Hi

 I have successfully created a really cool Lucene41x8PostingsFormat class
(a
copy of the Lucene41PostingsFormat class modified to use 8 times the
default block size), registered the format as required. In the
schema.xml I
have created a field type string with this postingsformat and lastly I'm
using this field type for my id field. This all works great and as a
consequence the .tip files of the Lucene index (segments) are
considerably
smaller and the same goes for the Solr JVM Memory usage (which was the
end
goal).

Now I need to find the consequences (besides the disk and memory usage)
of
this change to the id-field. I would expect that id-searches are slower.
But when will Solr/Lucene do id-searches? I have myself no user scenarios
where my documents are searched by the id value.

Thanks for any comments.

Best regards Trym









RE: CoreContainer#createAndLoad, existing cores not loaded

2015-01-28 Thread Clemens Wyss DEV
BTW:
None of my core folders contains a core.properties file...? Could it be due 
to the fact that I am (so far) running only EmbeddedSolrServer, hence no real 
Solr server?

-Original Message-
From: Clemens Wyss DEV [mailto:clemens...@mysign.ch] 
Sent: Thursday, January 29, 2015 08:08
To: solr-user@lucene.apache.org
Subject: RE: CoreContainer#createAndLoad, existing cores not loaded

Thx Shawn. I am running latest-greatest Solr (4.10.3) Solr home is e.g.
/opt/webs/siteX/WebContent/WEB-INF/solr
the core(s) reside in
/opt/webs/siteX/WebContent/WEB-INF/solr/cores
Should these be found by core discovery? 
If not, how can I configure coreRootDirectory in solr.xml to be the cores folder 
below solrHome?

<str name="coreRootDirectory">${coreRootDirectory:solrHome/cores}</str>

Note:
the solr.xml is to be used for any of our 150 sites we host. Therefore I'd like it 
to be generic - solrHome/cores

-Original Message-
From: Shawn Heisey [mailto:apa...@elyograg.org]
Sent: Wednesday, January 28, 2015 17:08
To: solr-user@lucene.apache.org
Subject: Re: CoreContainer#createAndLoad, existing cores not loaded

On 1/28/2015 8:52 AM, Clemens Wyss DEV wrote:
 My problem:
 I create cores dynamically using container#create( CoreDescriptor ) and then 
 add documents to the very core(s). So far so good.
 When I restart my app I do
 container = CoreContainer#createAndLoad(...) but when I then call
 container.getAllCoreNames() an empty list is returned.

 What cores should be loaded by the container if I call
 CoreContainer#createAndLoad(...)
 ? Where does the container lookup the existing cores?

If the solr.xml is the old format, then cores are defined in solr.xml, in the 
cores section of that config file.

There is a new format for solr.xml that is supported in version 4.4 and later 
and will be mandatory in 5.0.  If that format is present, then Solr will use 
core discovery -- starting from either the solr home or a defined 
coreRootDirectory, solr will search for core.properties files and treat each 
directory where one is found as a core's instanceDir.

http://wiki.apache.org/solr/Solr.xml%204.4%20and%20beyond

Thanks,
Shawn



Re: replica never takes leader role

2015-01-28 Thread Erick Erickson
This is not the desired behavior at all. I know there have been
improvements in this area since 4.8, but can't seem to locate the JIRAs.

I'm curious _why_ the nodes are going down though, is it happening at
random or are you taking it down? One problem has been that the Zookeeper
timeout used to default to 15 seconds, and occasionally a node would be
unresponsive (sometimes due to GC pauses) and exceed the timeout. So upping
the ZK timeout has helped some people avoid this...

FWIW,
Erick

On Wed, Jan 28, 2015 at 7:11 AM, Joshi, Shital shital.jo...@gs.com wrote:

 We're using Solr 4.8.0


 -Original Message-
 From: Erick Erickson [mailto:erickerick...@gmail.com]
 Sent: Tuesday, January 27, 2015 7:47 PM
 To: solr-user@lucene.apache.org
 Subject: Re: replica never takes leader role

 What version of Solr? This is an ongoing area of improvements and several
 are very recent.

 Try searching the JIRA for Solr for details.

 Best,
 Erick

 On Tue, Jan 27, 2015 at 1:51 PM, Joshi, Shital shital.jo...@gs.com
 wrote:

  Hello,
 
  We have SolrCloud cluster (5 shards and 2 replicas) on 10 boxes and three
  zookeeper instances. We have noticed that when a leader node goes down
 the
  replica never takes over as a leader, cloud becomes unusable and we have
 to
  bounce entire cloud for replica to assume leader role. Is this default
  behavior? How can we change this?
 
  Thanks.
 
 
 



Re: CoreContainer#createAndLoad, existing cores not loaded

2015-01-28 Thread Shawn Heisey
On 1/28/2015 8:52 AM, Clemens Wyss DEV wrote:
 My problem:
 I create cores dynamically using container#create( CoreDescriptor ) and then 
 add documents to the very core(s). So far so good.
 When I restart my app I do
 container = CoreContainer#createAndLoad(...)
 but when I then call container.getAllCoreNames() an empty list is returned.

 What cores should be loaded by the container if I call
 CoreContainer#createAndLoad(...)
 ? Where does the container lookup the existing cores?

If the solr.xml is the old format, then cores are defined in solr.xml,
in the <cores> section of that config file.

There is a new format for solr.xml that is supported in version 4.4 and
later and will be mandatory in 5.0.  If that format is present, then
Solr will use core discovery -- starting from either the solr home or a
defined coreRootDirectory, solr will search for core.properties files
and treat each directory where one is found as a core's instanceDir.

http://wiki.apache.org/solr/Solr.xml%204.4%20and%20beyond

Thanks,
Shawn
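
To make the core-discovery variant concrete, a minimal new-style solr.xml sketch (values are illustrative, and the exact resolution of a relative coreRootDirectory should be verified on your version):

  <solr>
    <!-- A relative coreRootDirectory is typically resolved against the solr home.
         Each subdirectory containing a core.properties file (even an empty one,
         in which case the core name defaults to the directory name) becomes a core. -->
    <str name="coreRootDirectory">${coreRootDirectory:cores}</str>
  </solr>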



Re: extract and add fields on the fly

2015-01-28 Thread Mark
Use case is

use curl to upload/extract/index a document, passing in additional facets not
present in the document, e.g. literal.source=old system

In this way some fields come from the uploaded extracted content, and some
fields are specified in the curl URL

Hope that's clearer?

Regards

Mark
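
This is what the Solr Cell literal.* parameters that Alexandre points to below are for; a single-call sketch (the URL, id, field name and file name are examples):

  # Extract report.pdf with Tika and attach extra fields in the same request;
  # each literal.<field> must exist in the schema or match a dynamic field.
  curl "http://localhost:8983/solr/update/extract?literal.id=doc42&literal.source=old+system&commit=true" \
       -F "myfile=@report.pdf"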


On 28 January 2015 at 17:54, Alexandre Rafalovitch arafa...@gmail.com
wrote:

 Sounds like 'literal.X' syntax from

 https://cwiki.apache.org/confluence/display/solr/Uploading+Data+with+Solr+Cell+using+Apache+Tika

 Can you explain your use case as different from what's already
 documented? May be easier to understand.

 Regards,
Alex.
 
 Sign up for my Solr resources newsletter at http://www.solr-start.com/


 On 28 January 2015 at 12:45, Mark javam...@gmail.com wrote:
  I'm looking to
 
  1) upload a binary document using curl
  2) add some additional facets
 
  Specifically my question is can this be achieved in 1 curl operation or
  does it need 2?
 
  On 28 January 2015 at 17:43, Mark javam...@gmail.com wrote:
 
 
  Second thoughts SID is purely i/p as its name suggests :)
 
  I think a better approach would be
 
  1) curl to upload/extract passing docID
  2) curl to update additional fields for that docID
 
 
 
  On 28 January 2015 at 17:30, Mark javam...@gmail.com wrote:
 
 
  Create the SID from the existing doc implies that a document already
  exists that you wish to add fields to.
 
  However if the document is a binary are you suggesting
 
  1) curl to upload/extract passing docID
  2) obtain a SID based off docID
  3) add addtinal fields to SID  commit
 
  I know I'm possibly wandering into the schemaless teritory here as well
 
 
  On 28 January 2015 at 17:11, Andrew Pawloski apawlo...@gmail.com
 wrote:
 
  I would switch the order of those. Add the new fields and *then*
 index to
  solr.
 
  We do something similar when we create SolrInputDocuments that are
 pushed
  to solr. Create the SID from the existing doc, add any additional
 fields,
  then add to solr.
 
  On Wed, Jan 28, 2015 at 11:56 AM, Mark javam...@gmail.com wrote:
 
   Is it possible to use curl to upload a document (for extract 
  indexing)
   and specify some fields on the fly?
  
   sort of:
   1) index this document
   2) by the way here are some important facets whilst your at it
  
   Regards
  
   Mark
  
 
 
 
 



Re: extract and add fields on the fly

2015-01-28 Thread Mark
That approach works, although as suspected the schema has to recognise the
additional facet (stuff in this case):

{"responseHeader":{"status":400,"QTime":1},"error":{"msg":"ERROR:
[doc=6252671B765A1748992DF1A6403BDF81A4A15E00] unknown field
'stuff'","code":400}}

..getting closer..

On 28 January 2015 at 18:03, Mark javam...@gmail.com wrote:


 Use case is

 use curl to upload/extract/index document passing in additional facets not
 present in the document e.g. literal.source=old system

 In this way some fields come from the uploaded extracted content and some
 fields as specified in the curl URL

 Hope that's clearer?

 Regards

 Mark


 On 28 January 2015 at 17:54, Alexandre Rafalovitch arafa...@gmail.com
 wrote:

 Sounds like 'literal.X' syntax from

 https://cwiki.apache.org/confluence/display/solr/Uploading+Data+with+Solr+Cell+using+Apache+Tika

 Can you explain your use case as different from what's already
 documented? May be easier to understand.

 Regards,
Alex.
 
 Sign up for my Solr resources newsletter at http://www.solr-start.com/


 On 28 January 2015 at 12:45, Mark javam...@gmail.com wrote:
  I'm looking to
 
  1) upload a binary document using curl
  2) add some additional facets
 
  Specifically my question is can this be achieved in 1 curl operation or
  does it need 2?
 
  On 28 January 2015 at 17:43, Mark javam...@gmail.com wrote:
 
 
  Second thoughts SID is purely i/p as its name suggests :)
 
  I think a better approach would be
 
  1) curl to upload/extract passing docID
  2) curl to update additional fields for that docID
 
 
 
  On 28 January 2015 at 17:30, Mark javam...@gmail.com wrote:
 
 
  Create the SID from the existing doc implies that a document already
  exists that you wish to add fields to.
 
  However if the document is a binary are you suggesting
 
  1) curl to upload/extract passing docID
  2) obtain a SID based off docID
  3) add addtinal fields to SID  commit
 
  I know I'm possibly wandering into the schemaless teritory here as
 well
 
 
  On 28 January 2015 at 17:11, Andrew Pawloski apawlo...@gmail.com
 wrote:
 
  I would switch the order of those. Add the new fields and *then*
 index to
  solr.
 
  We do something similar when we create SolrInputDocuments that are
 pushed
  to solr. Create the SID from the existing doc, add any additional
 fields,
  then add to solr.
 
  On Wed, Jan 28, 2015 at 11:56 AM, Mark javam...@gmail.com wrote:
 
   Is it possible to use curl to upload a document (for extract 
  indexing)
   and specify some fields on the fly?
  
   sort of:
   1) index this document
   2) by the way here are some important facets whilst your at it
  
   Regards
  
   Mark
  
 
 
 
 





Re: extract and add fields on the fly

2015-01-28 Thread Alexandre Rafalovitch
Well, the schema does need to know what type your field is. If you
can't add it to the schema, use dynamicFields with prefixes/suffixes or
the dynamic schema (less recommended).
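
A sketch of that route (*_s is the stock string dynamic field in the example
schema; stuff_s, doc1 and help.pdf are made up):

    <dynamicField name="*_s" type="string" indexed="true" stored="true"/>

    curl "http://localhost:8983/solr/update/extract?literal.id=doc1&literal.stuff_s=some+value&commit=true" -F "file=@help.pdf"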

Regards,
   Alex.

Sign up for my Solr resources newsletter at http://www.solr-start.com/


On 28 January 2015 at 13:32, Mark javam...@gmail.com wrote:
 That approach works although as suspected the schma has to recognise the
 additinal facet (stuff in this case):

 responseHeader:{status:400,QTime:1},error:{msg:ERROR:
 [doc=6252671B765A1748992DF1A6403BDF81A4A15E00] unknown field
 'stuff',code:400}}

 ..getting closer..

 On 28 January 2015 at 18:03, Mark javam...@gmail.com wrote:


 Use case is

 use curl to upload/extract/index document passing in additional facets not
 present in the document e.g. literal.source=old system

 In this way some fields come from the uploaded extracted content and some
 fields as specified in the curl URL

 Hope that's clearer?

 Regards

 Mark


 On 28 January 2015 at 17:54, Alexandre Rafalovitch arafa...@gmail.com
 wrote:

 Sounds like 'literal.X' syntax from

 https://cwiki.apache.org/confluence/display/solr/Uploading+Data+with+Solr+Cell+using+Apache+Tika

 Can you explain your use case as different from what's already
 documented? May be easier to understand.

 Regards,
Alex.
 
 Sign up for my Solr resources newsletter at http://www.solr-start.com/


 On 28 January 2015 at 12:45, Mark javam...@gmail.com wrote:
  I'm looking to
 
  1) upload a binary document using curl
  2) add some additional facets
 
  Specifically my question is can this be achieved in 1 curl operation or
  does it need 2?
 
  On 28 January 2015 at 17:43, Mark javam...@gmail.com wrote:
 
 
  Second thoughts SID is purely i/p as its name suggests :)
 
  I think a better approach would be
 
  1) curl to upload/extract passing docID
  2) curl to update additional fields for that docID
 
 
 
  On 28 January 2015 at 17:30, Mark javam...@gmail.com wrote:
 
 
  Create the SID from the existing doc implies that a document already
  exists that you wish to add fields to.
 
  However if the document is a binary are you suggesting
 
  1) curl to upload/extract passing docID
  2) obtain a SID based off docID
  3) add addtinal fields to SID  commit
 
  I know I'm possibly wandering into the schemaless teritory here as
 well
 
 
  On 28 January 2015 at 17:11, Andrew Pawloski apawlo...@gmail.com
 wrote:
 
  I would switch the order of those. Add the new fields and *then*
 index to
  solr.
 
  We do something similar when we create SolrInputDocuments that are
 pushed
  to solr. Create the SID from the existing doc, add any additional
 fields,
  then add to solr.
 
  On Wed, Jan 28, 2015 at 11:56 AM, Mark javam...@gmail.com wrote:
 
   Is it possible to use curl to upload a document (for extract 
  indexing)
   and specify some fields on the fly?
  
   sort of:
   1) index this document
   2) by the way here are some important facets whilst your at it
  
   Regards
  
   Mark
  
 
 
 
 





Re: extract and add fields on the fly

2015-01-28 Thread Alexandre Rafalovitch
Sounds like 'literal.X' syntax from
https://cwiki.apache.org/confluence/display/solr/Uploading+Data+with+Solr+Cell+using+Apache+Tika

Can you explain your use case as different from what's already
documented? May be easier to understand.

Regards,
   Alex.

Sign up for my Solr resources newsletter at http://www.solr-start.com/


On 28 January 2015 at 12:45, Mark javam...@gmail.com wrote:
 I'm looking to

 1) upload a binary document using curl
 2) add some additional facets

 Specifically my question is can this be achieved in 1 curl operation or
 does it need 2?

 On 28 January 2015 at 17:43, Mark javam...@gmail.com wrote:


 Second thoughts SID is purely i/p as its name suggests :)

 I think a better approach would be

 1) curl to upload/extract passing docID
 2) curl to update additional fields for that docID



 On 28 January 2015 at 17:30, Mark javam...@gmail.com wrote:


 Create the SID from the existing doc implies that a document already
 exists that you wish to add fields to.

 However if the document is a binary are you suggesting

 1) curl to upload/extract passing docID
 2) obtain a SID based off docID
 3) add addtinal fields to SID  commit

 I know I'm possibly wandering into the schemaless teritory here as well


 On 28 January 2015 at 17:11, Andrew Pawloski apawlo...@gmail.com wrote:

 I would switch the order of those. Add the new fields and *then* index to
 solr.

 We do something similar when we create SolrInputDocuments that are pushed
 to solr. Create the SID from the existing doc, add any additional fields,
 then add to solr.

 On Wed, Jan 28, 2015 at 11:56 AM, Mark javam...@gmail.com wrote:

  Is it possible to use curl to upload a document (for extract 
 indexing)
  and specify some fields on the fly?
 
  sort of:
  1) index this document
  2) by the way here are some important facets whilst your at it
 
  Regards
 
  Mark
 






RE: Suggesting broken words with solr.WordBreakSolrSpellChecker

2015-01-28 Thread Dyer, James
Try using something larger than 2 for alternativeTermCount.  5 is probably ok 
here.  If that doesn't work, then post the exact query you are using and the 
full extended spellcheck results.
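
In the /select defaults quoted below, that would mean changing just these two
lines (extendedResults turned on so the full details show up):

    <str name="spellcheck.alternativeTermCount">5</str>
    <str name="spellcheck.extendedResults">true</str>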

James Dyer
Ingram Content Group


-Original Message-
From: fabio.bozzo [mailto:f.bo...@3-w.it] 
Sent: Tuesday, January 27, 2015 3:59 PM
To: solr-user@lucene.apache.org
Subject: RE: Suggesting broken words with solr.WordBreakSolrSpellChecker

I have this in my solrconfig:

<requestHandler name="/select" class="solr.SearchHandler">

  <lst name="defaults">
    <str name="echoParams">explicit</str>
    <int name="rows">10</int>
    <str name="df">catch_all</str>

    <str name="spellcheck">on</str>
    <str name="spellcheck.dictionary">default</str>
    <str name="spellcheck.dictionary">wordbreak</str>
    <str name="spellcheck.extendedResults">false</str>
    <str name="spellcheck.count">5</str>
    <str name="spellcheck.alternativeTermCount">2</str>
    <str name="spellcheck.maxResultsForSuggest">100</str>
    <str name="spellcheck.collate">true</str>
    <str name="spellcheck.collateExtendedResults">true</str>
    <str name="spellcheck.maxCollationTries">5</str>
    <str name="spellcheck.maxCollations">3</str>
  </lst>

  <arr name="last-components">
    <str>spellcheck</str>
  </arr>

</requestHandler>

Although my spellchecker does work and suggests corrections for misspelled
terms, it doesn't work for the example above:
I mean terms which are both valid (gopro = 100 docs; go pro = 150 other
docs).
I want it to suggest gopro for the go pro search term and vice versa, even
though they're both perfectly valid terms in the index. Thank you



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Suggesting-broken-words-with-solr-WordBreakSolrSpellChecker-tp4182172p4182398.html
Sent from the Solr - User mailing list archive at Nabble.com.




Re: [MASSMAIL]Re: Contextual sponsored results with Solr

2015-01-28 Thread Jorge Luis Betancourt González
We are trying to avoid firing 2 queries per request. I've started to play with
a PostFilter to see how it goes; perhaps something along the lines of the
ReRank query parser could be used to avoid the two queries and instead rerank
the results?
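
For what it's worth, a rough sketch of that route, assuming Solr 4.9+ where
the rerank query parser is available (note it re-scores the top reRankDocs
rather than capping the host at N, so on its own it still doesn't give an
exact limit; spaces need URL-encoding when sent with curl):

    q=search&rq={!rerank reRankQuery=$rqq reRankDocs=200 reRankWeight=5}&rqq=host:wikipedia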

- Original Message -
From: Ahmet Arslan iori...@yahoo.com.INVALID
To: solr-user@lucene.apache.org
Sent: Tuesday, January 27, 2015 11:06:29 PM
Subject: [MASSMAIL]Re: Contextual sponsored results with Solr

Hi Jorge,

We have done a similar thing with N=3. We issue two separate queries/requests
and display the 'special N' above the results.
We exclude the 'special N' from the second query with a -id:(1 2 3 ... N) type
query; it is all done on the client side.
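
Roughly, per user query (id1, id2, id3 are placeholders for whatever the first
request returned):

    # request 1: the N 'special' results
    http://localhost:8983/solr/select?q=user+query&fq=host:wikipedia&rows=3&fl=id,url
    # request 2: the normal results, excluding the ids from request 1
    http://localhost:8983/solr/select?q=user+query&fq=-id:(id1+id2+id3)&rows=10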

Ahmet



On Tuesday, January 27, 2015 8:28 PM, Jorge Luis Betancourt González 
jlbetanco...@uci.cu wrote:
Hi all,

Recently I got an interesting use case that I'm not sure how to implement: the
client wants a fixed number of documents, let's call it N, to appear at the
top of the results. Let me explain a little. We're working with web documents,
so the idea is to promote the documents that match the user's query and come
from a given domain (wikipedia, for example) to the top of the list. So if I
apply a boost using the boost parameter:

http://localhost:8983/solr/select?q=search&fl=url&boost=map(query($type1query),0,0,1,50)&type1query=host:wikipedia

I get *all* the documents from the desired host at the top, but there is no way
of limiting the number of documents from that host that are boosted to the top
of the result list (which could lead to several pages of content from the same
host; that is not desired, the idea is to show only N). I was thinking of
something like field collapsing/grouping, but only for the documents that match
my $type1query parameter (host:wikipedia); I don't see any way of doing
grouping/collapsing on only one group and leaving the other results untouched.

I also thought about using 2 groups, with group.query=host:wikipedia and
group.query=-host:wikipedia, but in this case there is no way of controlling
how many documents each group will have independently.

In this particular case the QueryElevationComponent isn't helping, because I
don't want to map all the possible queries; I just want to put some of the
results from a certain host at the top of the list, without boosting all the
documents from that host.

Any thoughts or recommendations on this? 

Thank you,

Regards,


---
XII Aniversario de la creación de la Universidad de las Ciencias Informáticas. 
12 años de historia junto a Fidel. 12 de diciembre de 2014.





Re: PostingsHighlighter highlighted snippet size (fragsize)

2015-01-28 Thread Zisis Tachtsidis
It seems that a solution has been found.

PostingsHighlighter uses by default Java's SENTENCE BreakIterator so it
breaks the snippets into fragments per sentence.
In my text_en analysis chain though I was using a filter that lowercases
input and this seems to mess with the logic of SENTENCE BreakIterator.
Removing the filter did the trick.
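
For reference, the break iterator can also be chosen per request instead of
relying on the SENTENCE default -- something like this (content is just a
placeholder field name; hl.bs.type accepts SENTENCE, WORD, LINE, CHARACTER
and WHOLE, with WHOLE returning the field as a single fragment):

    hl=true&hl.fl=content&hl.bs.type=WHOLE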

Apart from that, there is a new issue now. I'm trying to search on one field
and highlight another, and this does not seem to work even if I use the exact
same analyzers for both fields. I get the correct results in the highlighting
section but there is no highlight. Digging deeper, I've found inside
PostingsHighlighter.highlightFieldsAsObjects() (line 393 in version 4.10.3)
that the fields to be highlighted are (I guess) the intersection of the set of
query fields (fields used in the search query) and the set of fields to be
highlighted (defined by the hl.fl param). So, unless I also use the field to
be highlighted in the search query, I get no highlight.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/PostingsHighlighter-highlighted-snippet-size-fragsize-tp4180634p4182596.html
Sent from the Solr - User mailing list archive at Nabble.com.


extract and add fields on the fly

2015-01-28 Thread Mark
Is it possible to use curl to upload a document (for extract & indexing)
and specify some fields on the fly?

sort of:
1) index this document
2) by the way, here are some important facets whilst you're at it

Regards

Mark


Re: How to implement Auto complete, suggestion client side

2015-01-28 Thread Olivier Austina
Hi,

Thank you Dan Davis and Alexandre Rafalovitch. This is very helpful for me.

Regards
Olivier


2015-01-27 0:51 GMT+01:00 Alexandre Rafalovitch arafa...@gmail.com:

 You've got a lot of options depending on what you want. But since you
 seem to just want _an_ example, you can use mine from
 http://www.solr-start.com/javadoc/solr-lucene/index.html (gray search
 box there).

 You can see the source for the test screen (using Spring Boot and
 Spring Data Solr as a middle-layer) and Select2 for the UI at:
 https://github.com/arafalov/Solr-Javadoc/tree/master/SearchServer.
 The Solr definition is at:

 https://github.com/arafalov/Solr-Javadoc/tree/master/JavadocIndex/JavadocCollection/conf

 Other implementation pieces are in that (and another) public
 repository as well, but it's all in Java. You'll probably want to do
 something similar in PHP.

 Regards,
Alex.
 
 Sign up for my Solr resources newsletter at http://www.solr-start.com/


 On 26 January 2015 at 17:11, Olivier Austina olivier.aust...@gmail.com
 wrote:
  Hi All,
 
  I would say I am new to web technology.
 
  I would like to implement auto complete/suggestion in the user search box
  as the user type in the search box (like Google for example). I am using
  Solr as database. Basically I am  familiar with Solr and I can formulate
  suggestion queries.
 
  But now I don't know how to implement suggestion in the User Interface.
  Which technologies should I need. The website is in PHP. Any suggestions,
  examples, basic tutorial is welcome. Thank you.
 
 
 
  Regards
  Olivier



Re: replicas goes in recovery mode right after update

2015-01-28 Thread Erick Erickson
Vijay:

Thanks for reporting this back!  Could I ask you to post a new patch with
your correction? Please use the same patch name
(SOLR-5850.patch), and include a note about what you found (I've already
added a comment).

Thanks!
Erick

On Wed, Jan 28, 2015 at 9:18 AM, Vijay Sekhri sekhrivi...@gmail.com wrote:

 Hi Shawn,
 Thank you so much for the assistance. Building is not a problem . Back in
 the days I have worked with linking, compiling and  building C , C++
 software . Java is a piece of cake.
 We have built the new war from the source version 4.10.3 and our
 preliminary tests have shown that our issue (replicas in recovery on high
 load)* is resolved *. We will continue to do more testing and confirm .
 Please note that the *patch is BUGGY*.

 It removed the break statement within the while loop, because of which,
 whenever we send a list of docs it would hang (API CloudSolrServer.add),
 but it would work if we sent one doc at a time.

 It took a while to figure out why that is happening. Once we put the break
 statement back it worked like a charm.
 Furthermore the patch has

 solr/solrj/src/java/org/apache/solr/client/solrj/impl/ConcurrentUpdateSolrClient.java
 which should be

 solr/solrj/src/java/org/apache/solr/client/solrj/impl/ConcurrentUpdateSolrServer.java

 Finally, checking if (!offer) suffices, rather than using if (offer == false).
 Last but not least, having a configurable queue size and timeouts
 (managed via solrconfig) would be quite helpful.
 Thank you once again for your help.

 Vijay

 On Tue, Jan 27, 2015 at 6:20 PM, Shawn Heisey apa...@elyograg.org wrote:

  On 1/27/2015 2:52 PM, Vijay Sekhri wrote:
   Hi Shawn,
   Here is some update. We found the main issue
   We have configured our cluster to run under jetty and when we tried
 full
   indexing, we did not see the original Invalid Chunk error. However the
   replicas still went into recovery
   All this time we been trying to look into replicas logs to diagnose the
   issue. The problem seem to be at the leader side. When we looked into
   leader logs, we found the following on all the leaders
  
   3439873 [qtp1314570047-92] WARN
org.apache.solr.update.processor.DistributedUpdateProcessor  – Error
   sending update
   *java.lang.IllegalStateException: Queue full*
 
  snip
 
   There is a similar bug reported around this
   https://issues.apache.org/jira/browse/SOLR-5850
  
   and it seem to be in OPEN status. Is there a way we can configure the
  queue
   size and increase it ? or is there a version of solr that has this
 issue
   resolved already?
   Can you suggest where we go from here to resolve this ? We can repatch
  the
   war file if that is what you would recommend .
   In the end our initial speculation about solr unable to handle so many
   update is correct. We do not see this issue when the update load is
 less.
 
  Are you in a position where you can try the patch attached to
  SOLR-5850?  You would need to get the source code for the version you're
  on (or perhaps a newer 4.x version), patch it, and build Solr yourself.
  If you have no experience building java packages from source, this might
  prove to be difficult.
 
  Thanks,
  Shawn
 
 


 --
 *
 Vijay Sekhri
 *



Re: extract and add fields on the fly

2015-01-28 Thread Mark
Thanks Alexandre,

I figured it out with this example,

https://wiki.apache.org/solr/ExtractingRequestHandler

whereby you can add additional fields at upload/extract time

curl "http://localhost:8983/solr/update/extract?literal.id=doc4&captureAttr=true&defaultField=text&capture=div&fmap.div=foo_txt&boost.foo_txt=3&literal.blah_s=Bah"
-F "tutorial=@help.pdf"

and therefore I learned that you can't add a field that isn't in the original
schema, which is what I was trying to do before.

Regards

Mark



On 28 January 2015 at 18:38, Alexandre Rafalovitch arafa...@gmail.com
wrote:

 Well, the schema does need to know what type your field is. If you
 can't add it to schema, use dynamicFields with prefixe/suffixes or
 dynamic schema (less recommended).

 Regards,
Alex.
 
 Sign up for my Solr resources newsletter at http://www.solr-start.com/


 On 28 January 2015 at 13:32, Mark javam...@gmail.com wrote:
  That approach works although as suspected the schma has to recognise the
  additinal facet (stuff in this case):
 
  responseHeader:{status:400,QTime:1},error:{msg:ERROR:
  [doc=6252671B765A1748992DF1A6403BDF81A4A15E00] unknown field
  'stuff',code:400}}
 
  ..getting closer..
 
  On 28 January 2015 at 18:03, Mark javam...@gmail.com wrote:
 
 
  Use case is
 
  use curl to upload/extract/index document passing in additional facets
 not
  present in the document e.g. literal.source=old system
 
  In this way some fields come from the uploaded extracted content and
 some
  fields as specified in the curl URL
 
  Hope that's clearer?
 
  Regards
 
  Mark
 
 
  On 28 January 2015 at 17:54, Alexandre Rafalovitch arafa...@gmail.com
  wrote:
 
  Sounds like 'literal.X' syntax from
 
 
 https://cwiki.apache.org/confluence/display/solr/Uploading+Data+with+Solr+Cell+using+Apache+Tika
 
  Can you explain your use case as different from what's already
  documented? May be easier to understand.
 
  Regards,
 Alex.
  
  Sign up for my Solr resources newsletter at http://www.solr-start.com/
 
 
  On 28 January 2015 at 12:45, Mark javam...@gmail.com wrote:
   I'm looking to
  
   1) upload a binary document using curl
   2) add some additional facets
  
   Specifically my question is can this be achieved in 1 curl operation
 or
   does it need 2?
  
   On 28 January 2015 at 17:43, Mark javam...@gmail.com wrote:
  
  
   Second thoughts SID is purely i/p as its name suggests :)
  
   I think a better approach would be
  
   1) curl to upload/extract passing docID
   2) curl to update additional fields for that docID
  
  
  
   On 28 January 2015 at 17:30, Mark javam...@gmail.com wrote:
  
  
   Create the SID from the existing doc implies that a document
 already
   exists that you wish to add fields to.
  
   However if the document is a binary are you suggesting
  
   1) curl to upload/extract passing docID
   2) obtain a SID based off docID
   3) add addtinal fields to SID  commit
  
   I know I'm possibly wandering into the schemaless teritory here as
  well
  
  
   On 28 January 2015 at 17:11, Andrew Pawloski apawlo...@gmail.com
  wrote:
  
   I would switch the order of those. Add the new fields and *then*
  index to
   solr.
  
   We do something similar when we create SolrInputDocuments that are
  pushed
   to solr. Create the SID from the existing doc, add any additional
  fields,
   then add to solr.
  
   On Wed, Jan 28, 2015 at 11:56 AM, Mark javam...@gmail.com
 wrote:
  
Is it possible to use curl to upload a document (for extract 
   indexing)
and specify some fields on the fly?
   
sort of:
1) index this document
2) by the way here are some important facets whilst your at it
   
Regards
   
Mark
   
  
  
  
  
 
 
 



Issue on server restarts with Solr 4.6.0 Cloud

2015-01-28 Thread andrew jenner
Using Solr 4.6.0 on linux with Java 6 (Oracle JRockit 1.6.0_75
R28.3.2-14-160877-1.6.0_75)


We are seeing these issues when doing a restart on a SolrCloud
configuration. After restarting each server in sequence, none of them
will come up. The servers start up after a long time, but the cloud
status shows Solr as being down.


java.nio.channels.ClosedChannelException
at sun.nio.ch.FileChannelImpl.ensureOpen(FileChannelImpl.java:87)
at sun.nio.ch.FileChannelImpl.read(FileChannelImpl.java:603)
at 
org.apache.solr.update.ChannelFastInputStream.readWrappedStream(TransactionLog.java:778)
at 
org.apache.solr.common.util.FastInputStream.refill(FastInputStream.java:89)
at 
org.apache.solr.common.util.FastInputStream.readUnsignedByte(FastInputStream.java:71)
at 
org.apache.solr.common.util.FastInputStream.readInt(FastInputStream.java:216)
at 
org.apache.solr.update.TransactionLog$FSReverseReader.init(TransactionLog.java:696)
at 
org.apache.solr.update.TransactionLog.getReverseReader(TransactionLog.java:575)
at 
org.apache.solr.update.UpdateLog$RecentUpdates.update(UpdateLog.java:942)
at 
org.apache.solr.update.UpdateLog$RecentUpdates.access$000(UpdateLog.java:885)
at 
org.apache.solr.update.UpdateLog.getRecentUpdates(UpdateLog.java:1043)
at 
org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:280)
at org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:244)


SnapPull failed :org.apache.lucene.store.AlreadyClosedException: Already closed
at 
org.apache.solr.core.CachingDirectoryFactory.get(CachingDirectoryFactory.java:340)
at 
org.apache.solr.handler.ReplicationHandler.loadReplicationProperties(ReplicationHandler.java:811)
at 
org.apache.solr.handler.SnapPuller.logReplicationTimeAndConfFiles(SnapPuller.java:564)
at org.apache.solr.handler.SnapPuller.fetchLatestIndex(SnapPuller.java:506)
at 
org.apache.solr.handler.ReplicationHandler.doFetch(ReplicationHandler.java:322)
at org.apache.solr.cloud.RecoveryStrategy.replicate(RecoveryStrategy.java:156)
at org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:433)
at org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:244)


Error while trying to recover.
core=[REDACTED]:org.apache.solr.common.SolrException: No registered
leader was found, collection:[REDACTED] slice:shard1
at 
org.apache.solr.common.cloud.ZkStateReader.getLeaderRetry(ZkStateReader.java:484)
at 
org.apache.solr.common.cloud.ZkStateReader.getLeaderRetry(ZkStateReader.java:467)
at 
org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:342)
at org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:244)


RE: replica never takes leader role

2015-01-28 Thread Joshi, Shital
When the leader reaches 99% of physical memory on the box and starts swapping
(and stops replicating), we forcefully bring down the leader (first kill -15,
then kill -9 if kill -15 doesn't work). This is when we expect the replica to
assume the leader's role, and it never happens.

Zookeeper timeout is 45 seconds. We can increase it up to 2 minutes and test. 

<cores adminPath="/admin/cores" defaultCoreName="collection1" host="${host:}"
hostPort="${jetty.port:8983}" hostContext="${hostContext:solr}"
zkClientTimeout="${zkClientTimeout:45000}">

As per the definition of zkClientTimeout, after the leader is brought down and
doesn't talk to ZooKeeper for 45 seconds, shouldn't ZK promote the replica to
leader? I am not sure how increasing the ZK timeout will help.

 
-Original Message-
From: Erick Erickson [mailto:erickerick...@gmail.com] 
Sent: Wednesday, January 28, 2015 11:42 AM
To: solr-user@lucene.apache.org
Subject: Re: replica never takes leader role

This is not the desired behavior at all. I know there have been
improvements in this area since 4.8, but can't seem to locate the JIRAs.

I'm curious _why_ the nodes are going down though, is it happening at
random or are you taking it down? One problem has been that the Zookeeper
timeout used to default to 15 seconds, and occasionally a node would be
unresponsive (sometimes due to GC pauses) and exceed the timeout. So upping
the ZK timeout has helped some people avoid this...

FWIW,
Erick

On Wed, Jan 28, 2015 at 7:11 AM, Joshi, Shital shital.jo...@gs.com wrote:

 We're using Solr 4.8.0


 -Original Message-
 From: Erick Erickson [mailto:erickerick...@gmail.com]
 Sent: Tuesday, January 27, 2015 7:47 PM
 To: solr-user@lucene.apache.org
 Subject: Re: replica never takes leader role

 What version of Solr? This is an ongoing area of improvements and several
 are very recent.

 Try searching the JIRA for Solr for details.

 Best,
 Erick

 On Tue, Jan 27, 2015 at 1:51 PM, Joshi, Shital shital.jo...@gs.com
 wrote:

  Hello,
 
  We have SolrCloud cluster (5 shards and 2 replicas) on 10 boxes and three
  zookeeper instances. We have noticed that when a leader node goes down
 the
  replica never takes over as a leader, cloud becomes unusable and we have
 to
  bounce entire cloud for replica to assume leader role. Is this default
  behavior? How can we change this?
 
  Thanks.