Occasionally hit ArrayIndexOutOfBoundsException when searching

2014-11-08 Thread Mohmed Hussain
Hey All,
We are using Solr for an enterprise product. Recently we did an upgrade
from 4.7.0 to 4.9.1 and are seeing this exception.
It's an EmbeddedSolrServer (I know it's a bad choice and we are moving to Solr
Cloud very soon :)). I used Maven to upgrade; the following is the snippet
from pom.xml:
<dependency>
  <groupId>org.apache.solr</groupId>
  <artifactId>solr-clustering</artifactId>
  <version>4.9.1</version>
</dependency>


*Stack trace*
Caused by: org.apache.solr.client.solrj.SolrServerException:
java.lang.ArrayIndexOutOfBoundsException: 31
at
com.sabax.datastore.impl.DatastoreImpl$EmbeddedSolrServer.request(DatastoreImpl.java:607)
~[sabasearch.jar:na]
... 196 common frames omitted
Caused by: java.lang.ArrayIndexOutOfBoundsException: 31
at org.apache.lucene.util.FixedBitSet.nextSetBit(FixedBitSet.java:294)
~[sabasearch.jar:na]
at org.apache.solr.search.DocSetBase$1$1$1.advance(DocSetBase.java:202)
~[sabasearch.jar:na]
at
org.apache.lucene.search.ConstantScoreQuery$ConstantScorer.advance(ConstantScoreQuery.java:278)
~[sabasearch.jar:na]
at
org.apache.lucene.search.ConjunctionScorer.doNext(ConjunctionScorer.java:69)
~[sabasearch.jar:na]
at
org.apache.lucene.search.ConjunctionScorer.nextDoc(ConjunctionScorer.java:100)
~[sabasearch.jar:na]
at
org.apache.lucene.search.Weight$DefaultBulkScorer.scoreAll(Weight.java:192)
~[sabasearch.jar:na]
at org.apache.lucene.search.Weight$DefaultBulkScorer.score(Weight.java:163)
~[sabasearch.jar:na]
at org.apache.lucene.search.BulkScorer.score(BulkScorer.java:35)
~[sabasearch.jar:na]
at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:621)
~[sabasearch.jar:na]
at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:297)
~[sabasearch.jar:na]
at
org.apache.solr.search.SolrIndexSearcher.numDocs(SolrIndexSearcher.java:2040)
~[sabasearch.jar:na]
at org.apache.solr.request.SimpleFacets.rangeCount(SimpleFacets.java:1338)
~[sabasearch.jar:na]
at
org.apache.solr.request.SimpleFacets.getFacetRangeCounts(SimpleFacets.java:1262)
~[sabasearch.jar:na]
at
org.apache.solr.request.SimpleFacets.getFacetRangeCounts(SimpleFacets.java:1197)
~[sabasearch.jar:na]
at
org.apache.solr.request.SimpleFacets.getFacetRangeCounts(SimpleFacets.java:1141)
~[sabasearch.jar:na]
at
org.apache.solr.request.SimpleFacets.getFacetCounts(SimpleFacets.java:262)
~[sabasearch.jar:na]
at
org.apache.solr.handler.component.FacetComponent.process(FacetComponent.java:84)
~[sabasearch.jar:na]
at
org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:218)
~[sabasearch.jar:na]
at
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
~[sabasearch.jar:na]
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1962)
~[sabasearch.jar:na]
at
com.sabax.datastore.impl.DatastoreImpl$EmbeddedSolrServer.request(DatastoreImpl.java:602)
~[sabasearch.jar:na]
... 196 common frames omitted

Please guide me: what could have gone wrong?


Thanks
-Hussain


Re: Solr exceptions during batch indexing

2014-11-08 Thread Anurag Sharma
Just trying to understand: what's the challenge in returning the bad doc
id(s)?
Solr already knows which doc(s) failed on update and can return their id(s)
in the response or a callback. Can we have a JIRA ticket for it if one
doesn't exist?

This looks like a common use case, and every Solr consumer might be writing
their own version to handle this issue.

On Sat, Nov 8, 2014 at 1:17 AM, Walter Underwood wun...@wunderwood.org
wrote:

 Right, that is why we batch.

 When a batch of 1000 fails, drop to a batch size of 1 and start the batch
 over. Then it can report the exact document with problems.

 If you want to continue, go back to the bigger batch size. I usually fail
 the whole batch on one error.

 wunder
 Walter Underwood
 wun...@wunderwood.org
 http://observer.wunderwood.org/


 On Nov 7, 2014, at 11:44 AM, Peter Keegan peterlkee...@gmail.com wrote:

  I'm seeing 9X throughput with 1000 docs/batch vs 1 doc/batch, with a
 single
  thread, so it's certainly worth it.
 
  Thanks,
  Peter
 
 
  On Fri, Nov 7, 2014 at 2:18 PM, Erick Erickson erickerick...@gmail.com
  wrote:
 
  And Walter has also been around for a _long_ time ;)
 
  (sorry, couldn't resist)
 
  Erick
 
  On Fri, Nov 7, 2014 at 11:12 AM, Walter Underwood 
 wun...@wunderwood.org
  wrote:
  Yes, I implemented exactly that fallback for Solr 1.2 at Netflix.
 
  It isn't too hard if the code is structured for it; retry with a batch
  size of 1.
 
  wunder
 
  On Nov 7, 2014, at 11:01 AM, Erick Erickson erickerick...@gmail.com
  wrote:
 
  Yeah, this has been an ongoing issue for a _long_ time. Basically,
  you can't. So far, people have essentially written fallback logic to
  index the docs of a failing packet one at a time and report it.
 
  I'd really like better reporting back, but we haven't gotten there
 yet.
 
  Best,
  Erick
 
  On Fri, Nov 7, 2014 at 8:25 AM, Peter Keegan peterlkee...@gmail.com
  wrote:
  How are folks handling Solr exceptions that occur during batch
  indexing?
  Solr stops parsing the docs stream when an error occurs (e.g. a doc
  with a
  missing mandatory field), and stops indexing the batch. The bad
  document is
  not identified, so it would be hard for the client to recover by
  skipping
  over it.
 
  Peter
 
 




Re: Synonym for Numbers

2014-11-08 Thread Anurag Sharma
If you are searching for a single document, can a real-time get on the doc id
as shown below serve your use case?
http://localhost:8983/solr/get?id=mydoc

Real-time get for multiple docs:
http://localhost:8983/solr/get?id=mydoc&id=mydoc
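
The /get handler also accepts a comma-separated list of ids (a sketch; the
handler path assumes the default /get registration and the ids are
illustrative):
http://localhost:8983/solr/get?ids=mydoc,mydoc2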

On Sat, Nov 8, 2014 at 12:52 AM, EXTERNAL Taminidi Ravi (ETI,
Automotive-Service-Solutions) external.ravi.tamin...@us.bosch.com wrote:

 Hi Group,

 I am working on implementing synonyms for numbers, like:
 10,2010
 14,2014

 to map two-digit numbers to documents with the four-digit form. I added the
 above lines to synonyms.txt and everything works. But now I need it to work
 in one direction only.

 I tried 10 => 2010 but it still gets the records belonging to 10 if I
 search 2010. I want to get only the 2010 documents if I search 2010, not
 10. I have expand=true in the synonym filter.

 <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
 ignoreCase="true" expand="true"/>


 Any help is really appreciated.

 Thanks

 Ravi



Re: Delete data from stored documents

2014-11-08 Thread Anurag Sharma
Since the data already exists and the need is to remove unwanted fields, a
custom update processor looks less useful here. Erick's recommendation of
re-indexing into a new collection, if at all possible, looks simple and safe.



On Sat, Nov 8, 2014 at 12:44 AM, Erick Erickson erickerick...@gmail.com
wrote:

 bq: My question is if I can delete the field definition from the
 schema.xml and do an optimize and the fields “magically” disappears

 no. schema.xml is really just about regularizing how Lucene indexes
 things. Lucene (where this would have to take place) doesn't have any
 understanding of schema.xml, so changing it then optimizing (and
 optimizing is also a Lucene function) won't have any effect.

 If you
 1) change the schema
 and
 2) update documents
 the data will be purged as background merges happen.

 But really, I'd recommend re-indexing into a new collection if at all
 possible.


 Best,
 Erick

 On Fri, Nov 7, 2014 at 4:26 AM, Yago Riveiro yago.rive...@gmail.com
 wrote:
  Jack,
 
 
 
 
  I have some data indexed that I don’t need any more. My question is if I
 can delete the field definition from the schema.xml and do an optimize and
 the fields “magically” disappears (and free space from disk).
 
 
 
 
  Re-indexing data to delete fields is too expensive in collections with
 hundreds of millions of documents.
 
 
 
 
  The optimize operation seems to be a good place to shrink the documents ...
 
 
 
  —
  /Yago Riveiro
 
  On Fri, Nov 7, 2014 at 12:19 PM, Jack Krupansky j...@basetechnology.com
 
  wrote:
 
  Could you clarify exactly what you are trying to do, like with an
 example? I
  mean, how exactly are you determining what fields are unwanted? Are
 you
  simply asking whether fields can be deleted from the index (and schema)?
  -- Jack Krupansky
  -Original Message-
  From: yriveiro
  Sent: Thursday, November 6, 2014 9:19 AM
  To: solr-user@lucene.apache.org
  Subject: Delete data from stored documents
  Hi,
  It's possible remove store data of an index deleting the unwanted fields
  from schema.xml and after do an optimize over the index?
  Thanks,
  /yago
  -
  Best regards
  --
  View this message in context:
 
 http://lucene.472066.n3.nabble.com/Delete-data-from-stored-documents-tp4167990.html
  Sent from the Solr - User mailing list archive at Nabble.com.



Re: How to dynamically create Solr cores with schema

2014-11-08 Thread Anurag Sharma
For more advanced dynamic fields, refer to the dynamicField element naming
conventions in the schema.xml below:
http://svn.apache.org/repos/asf/lucene/dev/trunk/solr/server/solr/configsets/basic_configs/conf/schema.xml

The Solr core admin API can be used to create a core dynamically, e.g.:
curl 
http://localhost:8080/solr/admin/cores?action=CREATE&name=$name&instanceDir=/etc/solr/conf/$name



On Fri, Nov 7, 2014 at 10:29 PM, Alexandre Rafalovitch arafa...@gmail.com
wrote:

 The usual solution to that is to have dynamic fields with suffixes
 indicating the types. So, your int fields are mapped to *_i, your date
 fields to *_d.

 Solr has schemaless support, but it is auto-detect for now. Creating
 fields of particular types via API I think is in JIRA on the trunk for
 5.0.

 Regards,
Alex.
 Personal: http://www.outerthoughts.com/ and @arafalov
 Solr resources and newsletter: http://www.solr-start.com/ and @solrstart
 Solr popularizers community: https://www.linkedin.com/groups?gid=6713853
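
 A minimal sketch of such dynamicField declarations in schema.xml (the type
 names follow the stock 4.x example schema; the suffix convention itself is
 up to you):

 <dynamicField name="*_i"  type="int"    indexed="true" stored="true"/>
 <dynamicField name="*_s"  type="string" indexed="true" stored="true"/>
 <dynamicField name="*_dt" type="tdate"  indexed="true" stored="true"/>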


 On 6 November 2014 10:04, Andreas Hubold andreas.hub...@coremedia.com
 wrote:
  Hi,
 
  I have a use-case where Java applications need to create Solr indexes
  dynamically. Schema fields of these indexes differ and should be defined
 by
  the Java application upon creation.
 
  So I'm trying to use the Core Admin API [1] to create new cores and the
  Schema API [2] to define fields. When creating a core, I have to specify
  solrconfig.xml (with enabled ManagedIndexSchemaFactory) and the schema to
  start with. I thought it would be a good idea to use a named config sets
 [3]
  for this purpose:
 
  curl
  'http://localhost:8082/solr/admin/cores?action=CREATE&name=m1&instanceDir=cores/m1&configSet=myconfig&dataDir=data'
 
  But when I add a field to the core m1, the field actually gets added to
  the config set. Is this a bug or a feature?
 
  curl http://localhost:8082/solr/m1/schema/fields -X POST -H
  'Content-type:application/json' --data-binary '[{
    "name":"foo",
    "type":"tdate",
    "stored":true
  }]'
 
  All cores created from the config set myconfig will get the new field
  foo in their schema. So this obviously does not work to create cores
 with
  different schema.
 
  I also tried to use the config/schema parameters of the CREATE core
 command
  (instead of config sets) to specify some existing
 solrconfig.xml/schema.xml.
  I tried relative paths here (e.g. some level upwards) but I could not
 get it
  to work. The documentation [1] tells me that relative paths are allowed.
  Should this work?
 
  Next thing that would come to my mind is to use dynamic fields instead
 of a
  correct managed schema, but that does not sound as nice.
  Or maybe I should implement a custom CoreAdminHandler which takes list of
  field definitions, if that's possible somehow...?
 
  I don't know. What's your recommended approach?
 
  We're using Solr 4.10.1 non-SolrCloud. Would this be simpler or different
  with SolrCloud?
 
  Thank you,
  Andreas
 
  [1]
 
 https://cwiki.apache.org/confluence/display/solr/CoreAdmin+API#CoreAdminAPI-CREATE
  [2]
 
 https://cwiki.apache.org/confluence/display/solr/Schema+API#SchemaAPI-Modifytheschema
  [3] https://cwiki.apache.org/confluence/display/solr/Config+Sets



Re: Sort documents by exist(multivalued field)

2014-11-08 Thread Anurag Sharma
Is it possible to describe the exact use case here?

On Fri, Nov 7, 2014 at 10:26 PM, Alexandre Rafalovitch arafa...@gmail.com
wrote:

 You encode that knowledge by using UpdateRequestProcessor. Clone the
 field, replace it with true, map it to boolean. That way, you will pay
 the price once per document indexed not (documentCount*) times per
 request.

 Regards,
   Alex.
 Personal: http://www.outerthoughts.com/ and @arafalov
 Solr resources and newsletter: http://www.solr-start.com/ and @solrstart
 Solr popularizers community: https://www.linkedin.com/groups?gid=6713853


 On 7 November 2014 06:43, Nickolay41189 klin892...@yandex.ru wrote:
  I want to sort by a multivalued field like boolean values.
  Something like this:
  *sort=exist(multivalued field name) desc*
 
  Is it possible?
 
  P.S. I know that sorting doesn't work for multivalued fields, but it works
  for a single boolean field...
 
 
 
  --
  View this message in context:
 http://lucene.472066.n3.nabble.com/Sort-documents-by-exist-multivalued-field-tp4168141.html
  Sent from the Solr - User mailing list archive at Nabble.com.
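
A minimal sketch of the processor chain Alexandre describes, assuming the
stock CloneField and CountFieldValues update processor factories in Solr 4.x
(field names are illustrative; myMultiField_count would need to be an int
field in the schema):

<updateRequestProcessorChain name="multivalue-exists">
  <!-- copy the multivalued field's values into a scratch field -->
  <processor class="solr.CloneFieldUpdateProcessorFactory">
    <str name="source">myMultiField</str>
    <str name="dest">myMultiField_count</str>
  </processor>
  <!-- replace the copied values with how many there were (0 = absent) -->
  <processor class="solr.CountFieldValuesUpdateProcessorFactory">
    <str name="fieldName">myMultiField_count</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>

Queries can then use sort=myMultiField_count desc, which orders documents by
whether values exist without evaluating the multivalued field per request.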



Re: Sort documents by exist(multivalued field)

2014-11-08 Thread Yago Riveiro
Re-indexing is bad for me; it's 5TB of data and the time to re-index it is
too much, but it seems to be the only option I have.


On Sat 8 Nov 2014 at 13:10 Anurag Sharma anura...@gmail.com wrote:

 Is it possible to describe the exact use case here?

 On Fri, Nov 7, 2014 at 10:26 PM, Alexandre Rafalovitch arafa...@gmail.com
 
 wrote:

  You encode that knowledge by using UpdateRequestProcessor. Clone the
  field, replace it with true, map it to boolean. That way, you will pay
  the price once per document indexed not (documentCount*) times per
  request.
 
  Regards,
Alex.
  Personal: http://www.outerthoughts.com/ and @arafalov
  Solr resources and newsletter: http://www.solr-start.com/ and @solrstart
  Solr popularizers community: https://www.linkedin.com/groups?gid=6713853
 
 
  On 7 November 2014 06:43, Nickolay41189 klin892...@yandex.ru wrote:
   I want to sort by multivalued field like boolean values.
   Something like that:
   *sort exist(multivalued field name) desc*
  
   Is it possible?
  
   P.S. I know that sorting doesn't work for multivalued fields, but it
 work
   for single boolean field...
  
  
  
   --
   View this message in context:
  http://lucene.472066.n3.nabble.com/Sort-documents-by-
 exist-multivalued-field-tp4168141.html
   Sent from the Solr - User mailing list archive at Nabble.com.
 



Re: Minimum Term Matching in More Like This Queries

2014-11-08 Thread Anurag Sharma
There is no direct way of retrieving docs based on a minimum term match in
Solr. The mlt params 'mlt.mintf' and 'mlt.match.offset' can be explored to
see if they meet the criteria. Refer to the links below for more details:
http://wiki.apache.org/solr/MoreLikeThisHandler
https://wiki.apache.org/solr/MoreLikeThis

In case you are using the Lucene library directly, the
setPercentTermsToMatch() method of the MoreLikeThisQuery class can be used.
Refer code:
https://svn.apache.org/repos/asf/lucene/dev/trunk/lucene/queries/src/java/org/apache/lucene/queries/mlt/MoreLikeThisQuery.java
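
For illustration, a MoreLikeThisHandler request exercising those knobs might
look like this (the handler path and field name are assumptions):

http://localhost:8983/solr/mlt?q=id:doc1&mlt.fl=text&mlt.mintf=2&mlt.mindf=5&rows=10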

On Fri, Nov 7, 2014 at 9:45 PM, Tim Hearn timseman...@gmail.com wrote:

 Hi!

 I'm fairly new to Solr.  Is there a feature which enforces minimum term
 matching for MLT Queries?  More precisely, that is, a document will match
 the MLT query if and only if at least x terms in the query are found in the
 document, with x defined by the user.  I could not find such a feature in
 the documentation, and switching to the edismax query parser and using the
 'mm' parameter does not work for me.

 Thanks!



Re: Term count in multivalue fields

2014-11-08 Thread Anurag Sharma
Since 'omitTermFreqAndPositions' is enabled, what does the function query
'totaltermfreq(field,term)' return?

Another way (not sure this is the correct approach): while indexing, add a
field containing the number. Filter and sum (function query) on that field
while querying. A range query can also be done on this field.
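
A sketch of the function-query side (field and term are illustrative; note
that with term frequencies omitted, termfreq() can only report 0 or 1):

q=*:*&fl=id,cnt:termfreq(features,'usb')&sort=termfreq(features,'usb') desc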

On Fri, Nov 7, 2014 at 7:23 PM, Nickolay41189 klin892...@yandex.ru wrote:

 Andrey, thank you for the reply. Can you explain what you mean by a
 faceting query with a prefix? I'm new to the world of Solr; can you give me
 an example of this query?



 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/Term-count-in-multivalue-fields-tp4168138p4168167.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Re: Sort documents by first value in multivalued field

2014-11-08 Thread Anurag Sharma
What is 'first value' here? Any example?

On Fri, Nov 7, 2014 at 5:04 PM, Nickolay41189 klin892...@yandex.ru wrote:

 How can I sort documents by the first value in a multivalued field (without
 adding a copyField and without changes to schema.xml)?



 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/Sort-documents-by-first-value-in-multivalued-field-tp4168140.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Re: Solr exceptions during batch indexing

2014-11-08 Thread Erick Erickson
bq: Just trying to understand what's the challenge in returning the bad doc

Mostly, nobody has done it yet. There's some complication about
async updates, ConcurrentUpdateSolrServer for instance. I also suspect
that one has to write error-handling logic in the client anyway,
so the motivation is reduced.

And now it would need to handle SolrCloud mode.

All that said, this has bugged me for a long time, but I haven't gotten around
to it. Which says something about the priority I suspect.

FWIW,
Erick
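
A minimal sketch of the one-at-a-time fallback discussed in this thread,
assuming SolrJ 4.x (HttpSolrServer; the URL and id field are illustrative):

import java.util.List;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class FallbackIndexer {
    private final SolrServer solr =
        new HttpSolrServer("http://localhost:8983/solr/collection1");

    // Try the whole batch first; on failure, replay one doc at a time
    // so the offending document(s) can be reported precisely.
    public void indexBatch(List<SolrInputDocument> batch) throws Exception {
        try {
            solr.add(batch);
        } catch (Exception batchFailure) {
            for (SolrInputDocument doc : batch) {
                try {
                    solr.add(doc);
                } catch (Exception docFailure) {
                    System.err.println("Bad doc " + doc.getFieldValue("id")
                            + ": " + docFailure.getMessage());
                }
            }
        }
    }
}

After the replay, the caller can decide whether to fail the whole batch (as
Walter does) or skip the bad documents and continue.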

On Sat, Nov 8, 2014 at 2:51 AM, Anurag Sharma anura...@gmail.com wrote:
 Just trying to understand: what's the challenge in returning the bad doc
 id(s)?
 Solr already knows which doc(s) failed on update and can return their id(s)
 in the response or a callback. Can we have a JIRA ticket for it if one
 doesn't exist?

 This looks like a common use case, and every Solr consumer might be writing
 their own version to handle this issue.

 On Sat, Nov 8, 2014 at 1:17 AM, Walter Underwood wun...@wunderwood.org
 wrote:

 Right, that is why we batch.

 When a batch of 1000 fails, drop to a batch size of 1 and start the batch
 over. Then it can report the exact document with problems.

 If you want to continue, go back to the bigger batch size. I usually fail
 the whole batch on one error.

 wunder
 Walter Underwood
 wun...@wunderwood.org
 http://observer.wunderwood.org/


 On Nov 7, 2014, at 11:44 AM, Peter Keegan peterlkee...@gmail.com wrote:

  I'm seeing 9X throughput with 1000 docs/batch vs 1 doc/batch, with a
 single
  thread, so it's certainly worth it.
 
  Thanks,
  Peter
 
 
  On Fri, Nov 7, 2014 at 2:18 PM, Erick Erickson erickerick...@gmail.com
  wrote:
 
  And Walter has also been around for a _long_ time ;)
 
  (sorry, couldn't resist)
 
  Erick
 
  On Fri, Nov 7, 2014 at 11:12 AM, Walter Underwood 
 wun...@wunderwood.org
  wrote:
  Yes, I implemented exactly that fallback for Solr 1.2 at Netflix.
 
  It isn't too hard if the code is structured for it; retry with a batch
  size of 1.
 
  wunder
 
  On Nov 7, 2014, at 11:01 AM, Erick Erickson erickerick...@gmail.com
  wrote:
 
  Yeah, this has been an ongoing issue for a _long_ time. Basically,
  you can't. So far, people have essentially written fallback logic to
  index the docs of a failing packet one at a time and report it.
 
  I'd really like better reporting back, but we haven't gotten there
 yet.
 
  Best,
  Erick
 
  On Fri, Nov 7, 2014 at 8:25 AM, Peter Keegan peterlkee...@gmail.com
  wrote:
  How are folks handling Solr exceptions that occur during batch
  indexing?
  Solr stops parsing the docs stream when an error occurs (e.g. a doc
  with a
  missing mandatory field), and stops indexing the batch. The bad
  document is
  not identified, so it would be hard for the client to recover by
  skipping
  over it.
 
  Peter
 
 




Re: Sort documents by exist(multivalued field)

2014-11-08 Thread Erick Erickson
Well, if you can write a custom function that does the right thing
with multiValued fields, you could sort by that.

You still haven't defined the exact use case. The problem here is
that sorting by a multiValued field is meaningless. Consider a
field with aardvark and zebra. Where should it sort? Of course you
can define rules like the minimum value of the field, which will at
least give consistent results... until the next person wants to sort
by the average of all the numbers in a field.


On Sat, Nov 8, 2014 at 5:32 AM, Yago Riveiro yago.rive...@gmail.com wrote:
 Re-indexing is bad for me; it's 5TB of data and the time to re-index it is
 too much, but it seems to be the only option I have.


 On Sat 8 Nov 2014 at 13:10 Anurag Sharma anura...@gmail.com wrote:

 Is it possible to describe the exact use case here?

 On Fri, Nov 7, 2014 at 10:26 PM, Alexandre Rafalovitch arafa...@gmail.com
 
 wrote:

  You encode that knowledge by using UpdateRequestProcessor. Clone the
  field, replace it with true, map it to boolean. That way, you will pay
  the price once per document indexed not (documentCount*) times per
  request.
 
  Regards,
Alex.
  Personal: http://www.outerthoughts.com/ and @arafalov
  Solr resources and newsletter: http://www.solr-start.com/ and @solrstart
  Solr popularizers community: https://www.linkedin.com/groups?gid=6713853
 
 
  On 7 November 2014 06:43, Nickolay41189 klin892...@yandex.ru wrote:
   I want to sort by multivalued field like boolean values.
   Something like that:
   *sort exist(multivalued field name) desc*
  
   Is it possible?
  
   P.S. I know that sorting doesn't work for multivalued fields, but it
 work
   for single boolean field...
  
  
  
   --
   View this message in context:
  http://lucene.472066.n3.nabble.com/Sort-documents-by-
 exist-multivalued-field-tp4168141.html
   Sent from the Solr - User mailing list archive at Nabble.com.
 



Re: Delete data from stored documents

2014-11-08 Thread Jack Krupansky
Agreed, but I think it would be great if Lucene and Solr provided an API to 
delete a single field for the entire index. We could file a Jira, but can 
Lucene accommodate it? Maybe we'll just have to wait for Elasticsearch to 
implement this feature!


-- Jack Krupansky

-Original Message- 
From: Anurag Sharma

Sent: Saturday, November 8, 2014 6:46 AM
To: solr-user@lucene.apache.org
Subject: Re: Delete data from stored documents

Since the data already exists and the need is to remove unwanted fields, a
custom update processor looks less useful here. Erick's recommendation of
re-indexing into a new collection, if at all possible, looks simple and safe.



On Sat, Nov 8, 2014 at 12:44 AM, Erick Erickson erickerick...@gmail.com
wrote:


bq: My question is if I can delete the field definition from the
schema.xml and do an optimize and the fields “magically” disappears

no. schema.xml is really just about regularizing how Lucene indexes
things. Lucene (where this would have to take place) doesn't have any
understanding of schema.xml, so changing it then optimizing (and
optimizing is also a Lucene function) won't have any effect.

If you
1) change the schema
and
2) update documents
the data will be purged as background merges happen.

But really, I'd recommend re-indexing into a new collection if at all
possible.


Best,
Erick

On Fri, Nov 7, 2014 at 4:26 AM, Yago Riveiro yago.rive...@gmail.com
wrote:
 Jack,




 I have some data indexed that I don’t need any more. My question is if I
can delete the field definition from the schema.xml and do an optimize and
the fields “magically” disappears (and free space from disk).




 Re-indexing data to delete fields is too expensive in collections with
hundreds of millions of documents.




 The optimize operation seems to be a good place to shrink the documents ...



 —
 /Yago Riveiro

 On Fri, Nov 7, 2014 at 12:19 PM, Jack Krupansky j...@basetechnology.com

 wrote:

 Could you clarify exactly what you are trying to do, like with an
example? I
 mean, how exactly are you determining what fields are unwanted? Are
you
 simply asking whether fields can be deleted from the index (and 
 schema)?

 -- Jack Krupansky
 -Original Message-
 From: yriveiro
 Sent: Thursday, November 6, 2014 9:19 AM
 To: solr-user@lucene.apache.org
 Subject: Delete data from stored documents
 Hi,
 It's possible remove store data of an index deleting the unwanted 
 fields

 from schema.xml and after do an optimize over the index?
 Thanks,
 /yago
 -
 Best regards
 --
 View this message in context:

http://lucene.472066.n3.nabble.com/Delete-data-from-stored-documents-tp4167990.html
 Sent from the Solr - User mailing list archive at Nabble.com.





Re: Synonym for Numbers

2014-11-08 Thread Jack Krupansky
Are you using the synonyms for both indexing and query? It sounds like you 
want to use these synonyms only at query time. Otherwise, 10 in the index 
becomes 2010 in the index.


-- Jack Krupansky

-Original Message- 
From: EXTERNAL Taminidi Ravi (ETI, Automotive-Service-Solutions)

Sent: Friday, November 7, 2014 2:22 PM
To: solr-user@lucene.apache.org
Subject: Synonym for Numbers

Hi Group,

I am working on implementing synonyms for numbers, like:
10,2010
14,2014

to map two-digit numbers to documents with the four-digit form. I added the
above lines to synonyms.txt and everything works. But now I need it to work
in one direction only.

I tried 10 => 2010 but it still gets the records belonging to 10 if I search
2010. I want to get only the 2010 documents if I search 2010, not 10. I have
expand=true in the synonym filter.

<filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
ignoreCase="true" expand="true"/>



Any help is really appreciated.

Thanks

Ravi 
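
A minimal sketch of the query-time-only, one-directional mapping Jack
suggests (the field type name is illustrative):

# synonyms.txt -- one-way: a query term 10 is rewritten to 2010
10 => 2010

<fieldType name="text_syn" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true"/>
  </analyzer>
</fieldType>

With this setup, a search for 10 matches documents containing 2010, while a
search for 2010 matches only documents that literally contain 2010.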



Help with SolrCloud exceptions while recovering

2014-11-08 Thread Bruno Osiek
Hi,

I am a newbie SolrCloud enthusiast. My goal is to implement an
infrastructure to enable text analysis (clustering, classification,
information extraction, sentiment analysis, etc).

My development environment consists of one machine, quad-core processor,
16GB RAM and 1TB HD.

Have started implementing Apache Flume, Twitter as source and SolrCloud
(within JBoss AS 7) as sink. Using Zookeeper (5 servers) to upload
configuration and managing cluster.

The pseudo-distributed cluster consists of one collection, three shards
each with three replicas.

Everything runs smoothly for a while. After 50,000 tweets are committed
(actually CloudSolrServer commits every batch of 500 documents), SolrCloud
randomly starts logging exceptions: Lucene file not found, IndexWriter
cannot be opened, replication unsuccessful, and the like. Recovery starts,
with no success, until the replica goes down.

Have tried different Solr versions (4.10.2, 4.9.1 and lastly 4.8.1) with
same results.

I have looked everywhere for help before writing this email. My guess right
now is that the problem lies with the connection between SolrCloud and
Zookeeper, although I haven't seen any such exception.

Any reference or help will be welcomed.

Cheers,
B.


Re: Solrcloud replicas do not match

2014-11-08 Thread Michal Krajňanský
Hi Erick,

I found the issue to be related to my other question (about shared
solrconfig.xml) which you also answered.

Turns out I had set the data.dir variable in solrconfig.xml to an absolute
path that coincided with a different index. So the replica tried to be
created there and something nasty probably happened. When I removed the
variable value, the replica started to be created where expected (and
appropriately grows in size).

During this recovery process (copying 60GB of data), the Solr Admin console
is unusable, however. Is there anything I can do about this?

Thank you a lot,


Michal

2014-11-07 20:16 GMT+01:00 Erick Erickson erickerick...@gmail.com:

 How did you create the replica? Does the admin screen show it
 attached to the proper shard?

 What I'd do is set up my SolrCloud instance with (presumably)
 a single node (leader) and insure my searches were working.
 Then (and only then) use the Collection API ADDREPLICA
 command. You should see your replica be updated and
 be good-to-go

 Best,
 Erick

 On Fri, Nov 7, 2014 at 9:13 AM, Michal Krajňanský
 michal.krajnan...@gmail.com wrote:
  Hi all,
 
 
  I have a Solrcloud setup with a manually created collection with the
 index
  obtained via other means than Solr (data come from Lucene).
 
  I created a replica for the index and expected to see the data being
 copied
  to the replica, which does not happen. In the Admin interface I see
  something like:
 
 
                         Version        Gen      Size
    Master (Searching)   1415379668601  5853288  60.13 GB
    Master (Replicable)  1415379668601  5853288  -
    Slave (Searching)    1415379668601  3        1.84 KB
 
  The versions seem to match. But obviously the replica only contains a
  handful of documents I indexed AFTER the replica was created.
 
  How do I replicate the documents that were already in the index? Or am I
  missing something?
 
  Best,
 
 
  Michal Krajnansky



Re: Solrcloud solrconfig.xml

2014-11-08 Thread Michal Krajňanský
Hi Erick,

Thank you for making this clearer (it helped me solve the replication issue
I asked about in a different thread). However, I suspect I am still doing
something wrong.

I am running a single Tomcat instance with two instances of Solr.

The shared solrconfig.xml contains:
<dataDir>${solr.data.dir:data}</dataDir>

And the Tomcat contexts set solr/home as follows:
<Environment name="solr/home" type="java.lang.String"
value=".../solrcloud/solr1" override="true" />
<Environment name="solr/home" type="java.lang.String"
value=".../solrcloud/solr2" override="true" />

The directory structure is as follows:

.../solrcloud/solr1/solr.xml
.../solrcloud/solr1/core1
.../solrcloud/solr1/core1/core.properties
.../solrcloud/solr1/core1/data

.../solrcloud/solr2/solr.xml

After having issued ADDREPLICA on the collection managed by core1, I would
expect to see the new data dir under .../solrcloud/solr2/core2/data.
However, I saw something like this (the core names were a little
different):

...

.../solrcloud/solr2/solr.xml
.../solrcloud/solr2/core2
.../solrcloud/solr2/core2/core.properties
.../solrcloud/data(!)

That is, the new core data dir was created relative to the parent solrcloud
folder, which confuses me...

Best,

Michal Krajnansky
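
One way to pin the second core's data directory explicitly is a per-core
core.properties entry rather than the shared solrconfig.xml (a sketch; the
path and core name are illustrative):

# .../solrcloud/solr2/core2/core.properties
name=core2
dataDir=data   # resolved relative to this core's instanceDir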


2014-11-07 19:59 GMT+01:00 Erick Erickson erickerick...@gmail.com:

 Each of those data dirs is relative to the instance in question.

 So if you're running on different machines, they're physically
 separate even though named identically.

 If you're running multiple nodes on a single machine a la the
 getting-started docs, then each one is in its own directory
 (e.g. solr/node1, solr/node2) and since the dirs are relative
 to that directory, you get things like
 ..solr/node1/solr/gettingstarted_shard1_replica1/data
 ..solr/node2/solr/gettingstarted_shard1_replica1/data

 etc.

 Best,
 Erick

 On Fri, Nov 7, 2014 at 5:26 AM, Michal Krajňanský
 michal.krajnan...@gmail.com wrote:
  Hi Everyone,
 
 
  I am quite a bit confused about managing configuration files with
 Zookeeper
  for running Solr in cloud mode.
 
  To be precise, I was able to upload the config files (schema.xml,
  solrconfig.xml) into the Zookeeper and run Solrcloud.
 
  What confuses me are properties like data.dir, or replication request
  handlers. It seems like these should be different for each of the servers
  in the cloud. So how does it work?
 
  (I did google to understand the matter unsuccessfully.)
 
 
  Best,
 
  Michal



Solr 4.10 very slow on build()

2014-11-08 Thread Mohsen Saboorian
I have a ~4GB index which takes a minute (or more) to build() when starting
the server. I noticed that this happens when I upgrade from Solr 4.0 to
4.10. The index was fully rebuilt with Solr 4.10 (using DIH). How can I
speed up startup time? Here is the slow part of the startup log:

INFO  141101-23:48:18.239  Loading spell index for spellchecker: wordbreak
INFO  141101-23:48:18.239  Loading suggester index for: mySuggester
INFO  141101-23:48:18.239  reload()
INFO  *141101-23:48:18.239*  build()
INFO  *141101-23:49:15.270*  [admin] webapp=null path=/admin/cores
params={_=1414873135659&wt=json} status=0 QTime=11
INFO  141101-23:49:22.503  [news] Registered new searcher Searcher@28195344[news]
main{StandardDirectoryReader(segments_1b6:65731:nrt _fgm(4.10.1):C244111
_1pw(4.10.1):C191483/156:delGen=140 _1wg(4.10.1):C174054/11:delGen=11
_236(4.10.1):C1920/1:delGen=1 _23h(4.10.1):C1756
_67x(4.10.1):C2120/144:delGen=126 _23l(4.10.1):C2185/2:delGen=2
_4ch(4.10.1):C784/145:delGen=126 _3b5(4.10.1):C758/80:delGen=79
_23q(4.10.1):C3391 _97s(4.10.1):C1218/136:delGen=127
_buo(4.10.1):C1096/86:delGen=84 _eh8(4.10.1):C819/73:delGen=69
_fg8(4.10.1):C413/94:delGen=81 _geb(4.10.1):C229/5:delGen=5
_g4b(4.10.1):C130/24:delGen=23 _g6c(4.10.1):C144/15:delGen=14
_ghj(4.10.1):C21/2:delGen=2 _gj6(4.10.1):C25/3:delGen=3
_gfz(4.10.1):C10/1:delGen=1 _ghe(4.10.1):C1 _gir(4.10.1):C3/2:delGen=1
_gis(4.10.1):C2/1:delGen=1 _gja(4.10.1):C1 _gjb(4.10.1):C2/1:delGen=1
_gjd(4.10.1):C1 _gjj(4.10.1):C1 _gjo(4.10.1):C1 _gjp(4.10.1):C1
_gjq(4.10.1):C1 _gjs(4.10.1):C1)}
INFO  141101-23:49:22.505  Creating new IndexWriter...
INFO  141101-23:49:22.506  Waiting until IndexWriter is unused... core=news
INFO  141101-23:49:22.506  Closing old IndexWriter... core=news
INFO  141101-23:49:22.650  SolrDeletionPolicy.onInit: commits: num=1
commit{dir=NRTCachingDirectory(MMapDirectory@/app/solr/solrhome/news/data/index
lockFactory=NativeFSLockFactory@/app/solr/solrhome/news/data/index;
maxCacheMB=48.0 maxMergeSizeMB=4.0),segFN=segments_bpm,generation=15178}
INFO  141101-23:49:22.650  newest commit generation = 15178



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-4-10-very-slow-on-build-tp4168368.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: How to dynamically create Solr cores with schema

2014-11-08 Thread Jorge Luis Betancourt González
I remember a talk by CareerBuilder where they wrote an API using the
approach explained by Alexandre, and they got really good results.

- Original Message -
From: Anurag Sharma anura...@gmail.com
To: solr-user@lucene.apache.org
Sent: Saturday, November 8, 2014 7:58:48 AM
Subject: Re: How to dynamically create Solr cores with schema

For more advanced dynamic fields, refer to the dynamicField element naming
conventions in the schema.xml below:
http://svn.apache.org/repos/asf/lucene/dev/trunk/solr/server/solr/configsets/basic_configs/conf/schema.xml

The Solr core admin API can be used to create a core dynamically, e.g.:
curl 
http://localhost:8080/solr/admin/cores?action=CREATE&name=$name&instanceDir=/etc/solr/conf/$name



On Fri, Nov 7, 2014 at 10:29 PM, Alexandre Rafalovitch arafa...@gmail.com
wrote:

 The usual solution to that is to have dynamic fields with suffixes
 indicating the types. So, your int fields are mapped to *_i, your date
 fields to *_d.

 Solr has schemaless support, but it is auto-detect for now. Creating
 fields of particular types via API I think is in JIRA on the trunk for
 5.0.

 Regards,
Alex.
 Personal: http://www.outerthoughts.com/ and @arafalov
 Solr resources and newsletter: http://www.solr-start.com/ and @solrstart
 Solr popularizers community: https://www.linkedin.com/groups?gid=6713853


 On 6 November 2014 10:04, Andreas Hubold andreas.hub...@coremedia.com
 wrote:
  Hi,
 
  I have a use-case where Java applications need to create Solr indexes
  dynamically. Schema fields of these indexes differ and should be defined
 by
  the Java application upon creation.
 
  So I'm trying to use the Core Admin API [1] to create new cores and the
  Schema API [2] to define fields. When creating a core, I have to specify
  solrconfig.xml (with enabled ManagedIndexSchemaFactory) and the schema to
  start with. I thought it would be a good idea to use a named config sets
 [3]
  for this purpose:
 
  curl
  'http://localhost:8082/solr/admin/cores?action=CREATE&name=m1&instanceDir=cores/m1&configSet=myconfig&dataDir=data'
 
  But when I add a field to the core m1, the field actually gets added to
  the config set. Is this a bug or a feature?
 
  curl http://localhost:8082/solr/m1/schema/fields -X POST -H
  'Content-type:application/json' --data-binary '[{
    "name":"foo",
    "type":"tdate",
    "stored":true
  }]'
 
  All cores created from the config set myconfig will get the new field
  foo in their schema. So this obviously does not work to create cores
 with
  different schema.
 
  I also tried to use the config/schema parameters of the CREATE core
 command
  (instead of config sets) to specify some existing
 solrconfig.xml/schema.xml.
  I tried relative paths here (e.g. some level upwards) but I could not
 get it
  to work. The documentation [1] tells me that relative paths are allowed.
  Should this work?
 
  Next thing that would come to my mind is to use dynamic fields instead
 of a
  correct managed schema, but that does not sound as nice.
  Or maybe I should implement a custom CoreAdminHandler which takes list of
  field definitions, if that's possible somehow...?
 
  I don't know. What's your recommended approach?
 
  We're using Solr 4.10.1 non-SolrCloud. Would this be simpler or different
  with SolrCloud?
 
  Thank you,
  Andreas
 
  [1]
 
 https://cwiki.apache.org/confluence/display/solr/CoreAdmin+API#CoreAdminAPI-CREATE
  [2]
 
 https://cwiki.apache.org/confluence/display/solr/Schema+API#SchemaAPI-Modifytheschema
  [3] https://cwiki.apache.org/confluence/display/solr/Config+Sets



Re: Solr 4.10 very slow on build()

2014-11-08 Thread Yonik Seeley
Try commenting out the suggester component & handler in solrconfig.xml:
https://issues.apache.org/jira/browse/SOLR-6679

-Yonik
http://heliosearch.org - native code faceting, facet functions,
sub-facets, off-heap data
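
The stanzas to look for are typically shaped like this (abridged from the
stock 4.10 example solrconfig.xml; the names in your config may differ):

<searchComponent name="suggest" class="solr.SuggestComponent">
  <lst name="suggester">
    <str name="name">mySuggester</str>
    <str name="lookupImpl">FuzzyLookupFactory</str>
    <str name="dictionaryImpl">DocumentDictionaryFactory</str>
    <str name="field">cat</str>
  </lst>
</searchComponent>

<requestHandler name="/suggest" class="solr.SearchHandler" startup="lazy">
  <lst name="defaults">
    <str name="suggest">true</str>
    <str name="suggest.dictionary">mySuggester</str>
  </lst>
  <arr name="components">
    <str>suggest</str>
  </arr>
</requestHandler>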


On Sat, Nov 8, 2014 at 2:03 PM, Mohsen Saboorian mohs...@gmail.com wrote:
 I have a ~4GB index which takes a minute (or more) to build() when starting
 the server. I noticed that this happens when I upgrade from Solr 4.0 to
 4.10. The index was fully rebuilt with Solr 4.10 (using DIH). How can I
 speed up startup time? Here is the slow part of the startup log:

 INFO  141101-23:48:18.239  Loading spell index for spellchecker: wordbreak
 INFO  141101-23:48:18.239  Loading suggester index for: mySuggester
 INFO  141101-23:48:18.239  reload()
 INFO  *141101-23:48:18.239*  build()
 INFO  *141101-23:49:15.270*  [admin] webapp=null path=/admin/cores
 params={_=1414873135659&wt=json} status=0 QTime=11
 INFO  141101-23:49:22.503  [news] Registered new searcher Searcher@28195344[news]
 main{StandardDirectoryReader(segments_1b6:65731:nrt _fgm(4.10.1):C244111
 _1pw(4.10.1):C191483/156:delGen=140 _1wg(4.10.1):C174054/11:delGen=11
 _236(4.10.1):C1920/1:delGen=1 _23h(4.10.1):C1756
 _67x(4.10.1):C2120/144:delGen=126 _23l(4.10.1):C2185/2:delGen=2
 _4ch(4.10.1):C784/145:delGen=126 _3b5(4.10.1):C758/80:delGen=79
 _23q(4.10.1):C3391 _97s(4.10.1):C1218/136:delGen=127
 _buo(4.10.1):C1096/86:delGen=84 _eh8(4.10.1):C819/73:delGen=69
 _fg8(4.10.1):C413/94:delGen=81 _geb(4.10.1):C229/5:delGen=5
 _g4b(4.10.1):C130/24:delGen=23 _g6c(4.10.1):C144/15:delGen=14
 _ghj(4.10.1):C21/2:delGen=2 _gj6(4.10.1):C25/3:delGen=3
 _gfz(4.10.1):C10/1:delGen=1 _ghe(4.10.1):C1 _gir(4.10.1):C3/2:delGen=1
 _gis(4.10.1):C2/1:delGen=1 _gja(4.10.1):C1 _gjb(4.10.1):C2/1:delGen=1
 _gjd(4.10.1):C1 _gjj(4.10.1):C1 _gjo(4.10.1):C1 _gjp(4.10.1):C1
 _gjq(4.10.1):C1 _gjs(4.10.1):C1)}
 INFO  141101-23:49:22.505  Creating new IndexWriter...
 INFO  141101-23:49:22.506  Waiting until IndexWriter is unused... core=news
 INFO  141101-23:49:22.506  Closing old IndexWriter... core=news
 INFO  141101-23:49:22.650  SolrDeletionPolicy.onInit: commits: num=1
 commit{dir=NRTCachingDirectory(MMapDirectory@/app/solr/solrhome/news/data/index
 lockFactory=NativeFSLockFactory@/app/solr/solrhome/news/data/index;
 maxCacheMB=48.0 maxMergeSizeMB=4.0),segFN=segments_bpm,generation=15178}
 INFO  141101-23:49:22.650  newest commit generation = 15178



 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/Solr-4-10-very-slow-on-build-tp4168368.html
 Sent from the Solr - User mailing list archive at Nabble.com.


Re: Occasionally hit ArrayIndexOutOfBoundsException when searching

2014-11-08 Thread anil raju
Can anyone here provide help on this? Any further logs or environment
details I can provide to help the analysis?
On Nov 8, 2014 12:31 AM, Mohmed Hussain mohd.huss...@gmail.com wrote:

 Hey All,
 We are using Solr for an enterprise product. Recently we did an upgrade
 from 4.7.0 to 4.9.1 and are seeing this exception.
 It's an EmbeddedSolrServer (I know it's a bad choice and we are moving to
 Solr Cloud very soon :)). I used Maven to upgrade; the following is the
 snippet from pom.xml:
 <dependency>
   <groupId>org.apache.solr</groupId>
   <artifactId>solr-clustering</artifactId>
   <version>4.9.1</version>
 </dependency>


 *Stack trace*
 Caused by: org.apache.solr.client.solrj.SolrServerException:
 java.lang.ArrayIndexOutOfBoundsException: 31
 at

 com.sabax.datastore.impl.DatastoreImpl$EmbeddedSolrServer.request(DatastoreImpl.java:607)
 ~[sabasearch.jar:na]
 ... 196 common frames omitted
 Caused by: java.lang.ArrayIndexOutOfBoundsException: 31
 at org.apache.lucene.util.FixedBitSet.nextSetBit(FixedBitSet.java:294)
 ~[sabasearch.jar:na]
 at org.apache.solr.search.DocSetBase$1$1$1.advance(DocSetBase.java:202)
 ~[sabasearch.jar:na]
 at

 org.apache.lucene.search.ConstantScoreQuery$ConstantScorer.advance(ConstantScoreQuery.java:278)
 ~[sabasearch.jar:na]
 at

 org.apache.lucene.search.ConjunctionScorer.doNext(ConjunctionScorer.java:69)
 ~[sabasearch.jar:na]
 at

 org.apache.lucene.search.ConjunctionScorer.nextDoc(ConjunctionScorer.java:100)
 ~[sabasearch.jar:na]
 at
 org.apache.lucene.search.Weight$DefaultBulkScorer.scoreAll(Weight.java:192)
 ~[sabasearch.jar:na]
 at org.apache.lucene.search.Weight$DefaultBulkScorer.score(Weight.java:163)
 ~[sabasearch.jar:na]
 at org.apache.lucene.search.BulkScorer.score(BulkScorer.java:35)
 ~[sabasearch.jar:na]
 at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:621)
 ~[sabasearch.jar:na]
 at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:297)
 ~[sabasearch.jar:na]
 at

 org.apache.solr.search.SolrIndexSearcher.numDocs(SolrIndexSearcher.java:2040)
 ~[sabasearch.jar:na]
 at org.apache.solr.request.SimpleFacets.rangeCount(SimpleFacets.java:1338)
 ~[sabasearch.jar:na]
 at

 org.apache.solr.request.SimpleFacets.getFacetRangeCounts(SimpleFacets.java:1262)
 ~[sabasearch.jar:na]
 at

 org.apache.solr.request.SimpleFacets.getFacetRangeCounts(SimpleFacets.java:1197)
 ~[sabasearch.jar:na]
 at

 org.apache.solr.request.SimpleFacets.getFacetRangeCounts(SimpleFacets.java:1141)
 ~[sabasearch.jar:na]
 at
 org.apache.solr.request.SimpleFacets.getFacetCounts(SimpleFacets.java:262)
 ~[sabasearch.jar:na]
 at

 org.apache.solr.handler.component.FacetComponent.process(FacetComponent.java:84)
 ~[sabasearch.jar:na]
 at

 org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:218)
 ~[sabasearch.jar:na]
 at

 org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
 ~[sabasearch.jar:na]
 at org.apache.solr.core.SolrCore.execute(SolrCore.java:1962)
 ~[sabasearch.jar:na]
 at

 com.sabax.datastore.impl.DatastoreImpl$EmbeddedSolrServer.request(DatastoreImpl.java:602)
 ~[sabasearch.jar:na]
 ... 196 common frames omitted

 Please guide me: what could have gone wrong?


 Thanks
 -Hussain



Re: Help with SolrCloud exceptions while recovering

2014-11-08 Thread Erick Erickson
First: for tweets, committing every 500 docs is much too frequent,
especially from the client, and super-especially if you have multiple
clients running. I'd recommend you just configure solrconfig this way
as a place to start, and do NOT commit from any clients:
1) a hard commit (openSearcher=false) every minute (or maybe 5 minutes)
2) a soft commit every minute

The latter governs how long it'll be between when a doc is indexed and when
it can be searched.

Here's a long post about how all this works:
https://lucidworks.com/blog/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/
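
A sketch of the corresponding solrconfig.xml settings (60-second intervals,
as a starting point):

<autoCommit>
  <maxTime>60000</maxTime>
  <openSearcher>false</openSearcher>
</autoCommit>

<autoSoftCommit>
  <maxTime>60000</maxTime>
</autoSoftCommit>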


As far as the rest, it's a puzzle definitely. If it continues, a complete stack
trace would be a good thing to start with.

Best,
Erick

On Sat, Nov 8, 2014 at 9:47 AM, Bruno Osiek baos...@gmail.com wrote:
 Hi,

 I am a newbie SolrCloud enthusiast. My goal is to implement an
 infrastructure to enable text analysis (clustering, classification,
 information extraction, sentiment analysis, etc).

 My development environment consists of one machine, quad-core processor,
 16GB RAM and 1TB HD.

 Have started implementing Apache Flume, Twitter as source and SolrCloud
 (within JBoss AS 7) as sink. Using Zookeeper (5 servers) to upload
 configuration and managing cluster.

 The pseudo-distributed cluster consists of one collection, three shards
 each with three replicas.

  Everything runs smoothly for a while. After 50,000 tweets are committed
  (actually CloudSolrServer commits every batch of 500 documents), SolrCloud
  randomly starts logging exceptions: Lucene file not found, IndexWriter
  cannot be opened, replication unsuccessful, and the like. Recovery starts,
  with no success, until the replica goes down.

 Have tried different Solr versions (4.10.2, 4.9.1 and lastly 4.8.1) with
 same results.

  I have looked everywhere for help before writing this email. My guess
  right now is that the problem lies with the connection between SolrCloud
  and Zookeeper, although I haven't seen any such exception.

 Any reference or help will be welcomed.

 Cheers,
 B.


Re: Solrcloud replicas do not match

2014-11-08 Thread Erick Erickson
re: Solr admin console.

Hmmm, switch it to a different node? It gets you the same info
no matter which node you're pointing at in your SolrCloud

Not sure why this happens though.

Best,
Erick

On Sat, Nov 8, 2014 at 10:12 AM, Michal Krajňanský
michal.krajnan...@gmail.com wrote:
 Hi Erick,

 I found the issue to be related to my other question (about shared
 solrconfig.xml) which you also answered.

  Turns out I had set the data.dir variable in solrconfig.xml to an absolute
  path that coincided with a different index. So the replica tried to be
  created there and something nasty probably happened. When I removed the
  variable value, the replica started to be created where expected (and
  appropriately grows in size).

  During this recovery process (copying 60GB of data), the Solr Admin
  console is unusable, however. Is there anything I can do about this?

 Thank you a lot,


 Michal

 2014-11-07 20:16 GMT+01:00 Erick Erickson erickerick...@gmail.com:

 How did you create the replica? Does the admin screen show it
 attached to the proper shard?

 What I'd do is set up my SolrCloud instance with (presumably)
 a single node (leader) and insure my searches were working.
 Then (and only then) use the Collection API ADDREPLICA
 command. You should see your replica be updated and
 be good-to-go

 Best,
 Erick

 On Fri, Nov 7, 2014 at 9:13 AM, Michal Krajňanský
 michal.krajnan...@gmail.com wrote:
  Hi all,
 
 
  I have a Solrcloud setup with a manually created collection with the
 index
  obtained via other means than Solr (data come from Lucene).
 
  I created a replica for the index and expected to see the data being
 copied
  to the replica, which does not happen. In the Admin interface I see
  something like:
 
 
                          Version        Gen      Size
     Master (Searching)   1415379668601  5853288  60.13 GB
     Master (Replicable)  1415379668601  5853288  -
     Slave (Searching)    1415379668601  3        1.84 KB
 
  The versions seem to match. But obviously the replica only contains a
  handful of documents I indexed AFTER the replica was created.
 
  How do I replicate the documents that were already in the index? Or am I
  missing something?
 
  Best,
 
 
  Michal Krajnansky



Re: Occasionally hit ArrayIndexOutOfBoundsException when searching

2014-11-08 Thread Mohmed Hussain
Hi All,
More analysis revealed that it fails when we have indexed documents
containing many Japanese characters using the Tika parser. The search
succeeds when we turn OFF facets on the single date param used.
Following is the Solr param query:

q=(+(+((+resource_type:CURRICULUM^0.001+(+(audtype_id:audiexxx^0.001)+(+disc_from:[2014-11-09T14:27:27.137Z
TO *]^0.001 +status:200^0.001 +disp_web:true^0.001 +avail_from:[* TO
2014-11-08T14:27:27.137Z]^0.001)))(+resource_type:OFFERING^1.0+(+is_private:false^0.001
+course_avail_from:[* TO 2014-11-08T14:27:27.137Z]^0.001
+course_disp_web:true^0.001 +course_disc_from:[2014-11-08T14:27:27.137Z TO
*]^0.001 +is_recurring_course:0^0.001 +course_ispublished:true^0.001
+disp_web:true^0.001+((+offering_enroll_close:[2014-11-08T22:27:27.137Z TO
*]^0.001 +base_delivery_type:100^0.001 +endDate:[2014-11-07T22:27:27.137Z
TO *]^0.001 +offering_open_enroll:[* TO 2014-11-08T22:27:27.137Z]^0.001
+status:100^0.001)(+base_delivery_type:200^0.001
+disc_from:[2014-11-08T14:27:27.137Z TO *]^0.001 +avail_from:[* TO
2014-11-08T14:27:27.137Z]^0.001))+(audtype_id:audiexxx^0.001)))(+resource_type:CERTIFICATION^0.001+(+(+disp_web:true^0.001
+status:200^0.001 +avail_from:[* TO 2014-11-08T14:27:27.053Z]^0.001
+disc_from:[2014-11-09T14:27:27.053Z TO
*]^0.001)+(audtype_id:audiexxx^0.001+(+(+(is_alumni:0^1.0))+(description_lower:ras_course_with_content_02*^100.0
name_tokenized:ras_course_with_content_02*^1.0
name_lower:ras_course_with_content_02^1.0
course_description_lower:ras_course_with_content_02*^100.0
offering_template_no:ras_course_with_content_02*^1000.0
part_no:ras_course_with_content_02*^1000.0
tag_name:ras_course_with_content_02*^1000.0
name_tokenized:ras_course_with_content_02*^1.0
tag_name:ras_course_with_content_02*^1000.0
name_tokenized:ras_course_with_content_02*^1.0
description_lower:ras_course_with_content_02*^100.0
part_no:ras_course_with_content_02*^1000.0
tag_name:ras_course_with_content_02*^1000.0
part_no:ras_course_with_content_02*^1000.0
name_lower:ras_course_with_content_02^1.0
offering_template_no:ras_course_with_content_02*^1000.0
name_tokenized:ras_course_with_content_02*^1.0
description_lower:ras_course_with_content_02*^100.0
part_no:ras_course_with_content_02*^1000.0
name_lower:ras_course_with_content_02^1.0
tag_name:ras_course_with_content_02*^1000.0
part_no:ras_course_with_content_02*^1000.0
name_lower:ras_course_with_content_02^1.0
description_lower:ras_course_with_content_02*^100.0
name_tokenized:ras_course_with_content_02*^1.0
name_lower:ras_course_with_content_02^1.0
part_no:ras_course_with_content_02*^1000.0
keywords:ras_course_with_content_02*^1000.0
description_lower:ras_course_with_content_02*^100.0
name_lower:ras_course_with_content_02^1.0
description_lower:ras_course_with_content_02*^100.0
course_description_lower:ras_course_with_content_02*^100.0
tag_name:ras_course_with_content_02*^1000.0
keywords:ras_course_with_content_02*^1000.0
keywords:ras_course_with_content_02*^1000.0
tag_name:ras_course_with_content_02*^1000.0
name_tokenized:ras_course_with_content_02*^1.0
keywords:ras_course_with_content_02*^1000.0
&facet=true&hl.start=1&group.limit=101&facet.method=enum&hl=false
&com.saba.datastore.paging.start=1
&f.startDate.facet.date.start=2014-11-01T00:00:00.000Z&debugQuery=true
&fl=*&fl=score&f.startDate.facet.date.gap=+1MONTH
&com.saba.datastore.security.userId=emplo0169692&group.field=groupbyid
&facet.field=lrnEventType&facet.field=category_id_facet
&facet.field=location_id_facet&facet.field=resource_type
&facet.field=delivery_id&facet.field=offering_language_id
&hl.requireFieldMatch=true&group.format=grouped&group.ngroups=true
&com.saba.datastore.paging.rows=60&facet.mincount=1
&f.startDate.facet.date.end=2015-11-30T00:00:00.000Z&facet.date=startDate
&hl.end=61&com.saba.datastore.security.tenantId=SomeSite&facet.sort=index
&group=true&indexKey=SomeSite:SocialSearchIndex&datastoreId=/127.0.0.1:8098
&shards=


Thanks
-Hussain

On Sat, Nov 8, 2014 at 12:11 PM, anil raju anillr...@gmail.com wrote:

 Can anyone here provide help on this? Any further logs or environment
 details I can provide to help the analysis?
 On Nov 8, 2014 12:31 AM, Mohmed Hussain mohd.huss...@gmail.com wrote:

  Hey All,
  We are using Solr for an enterprise product. Recently we did an
 upgrade
  from 4.7.0 to 4.9.1 and are seeing this exception.
  It's an EmbeddedSolrServer (I know it's a bad choice and we are moving to
  Solr Cloud very soon :)). I used Maven to upgrade; the following is the
  snippet from pom.xml:
  <dependency>
    <groupId>org.apache.solr</groupId>
    <artifactId>solr-clustering</artifactId>
    <version>4.9.1</version>
  </dependency>
 
 
  *Stack trace*
  Caused by: org.apache.solr.client.solrj.SolrServerException:
  java.lang.ArrayIndexOutOfBoundsException: 31
  at
 
 
 com.sabax.datastore.impl.DatastoreImpl$EmbeddedSolrServer.request(DatastoreImpl.java:607)
  ~[sabasearch.jar:na]
  ... 196 common frames 

Re: on regards to Solr and NoSQL storages integration

2014-11-08 Thread Jack Krupansky
There is no double storage of data: the Solr index for DataStax 
Enterprise ignores the stored attribute and only stores the primary key 
data to allow the Solr document to reference the Cassandra row, which is 
where the data is stored. The exception would be doc values, where the data 
does need to be kept in the index for efficient operation of Lucene and 
Solr, but that would only be done for fields such as facet fields and is 
under the complete control of the developer.


DataStax Enterprise also utilizes an indexing queue so that Cassandra 
inserts and updates can occur at full speed, with indexing in a background 
thread, maximizing ingestion performance.


-- Jack Krupansky

-Original Message- 
From: andrey prokopenko

Sent: Friday, November 7, 2014 5:00 AM
To: solr-user@lucene.apache.org
Subject: Re: on regards to Solr and NoSQL storages integration

Thanks for the reply. I've considered DataStax, but dropped it, first due to
the commercial model they're using and second due to the integration model
they have chosen for Cassandra. In their docs (which can be found
here:
http://www.datastax.com/docs/datastax_enterprise3.1/solutions/dse_search_load_data),
they do not disclose the architecture and details of their integration
solution, yet examination of the Solr configuration and handlers from
their distribution package has revealed that they essentially let the docs
reside both in the Solr index and in Cassandra storage. To safely propagate
documents to Cassandra on each Solr index update, they use their own
update handler + custom update log.
In my opinion, this is not very efficient, because it doubles document
storage and leaves the Solr index as heavy as it is currently. My approach
completely delegates stored-field storage to the NoSQL database, using a
user-defined unique key. This lets users quickly do partial updates of
stored but non-indexed fields and greatly reduces the time required for
replication under heavy write load.

On Wed, Nov 5, 2014 at 4:04 PM, Alexandre Rafalovitch arafa...@gmail.com
wrote:


On 5 November 2014 08:52, andrey prokopenko andrey4...@gmail.com wrote:
 I assume, there might be other developers, trying to solve similar
 problems, so I'd be interested to hear about similar attempts  issues
 encountered while trying to implement such an integration between Solr
and
 other NoSQL databases.

I think DataStax does Solr+Cassandra and Cloudera does Solr+Hadoop
with underlying content stored in the databases. Also Neo4J has
graph+search integration, but I think it's directly using Lucene
engine, not Solr.

Disclaimer: this is very high level understanding, hopefully the other
people can confirm.

Regards,
   Alex.

Personal: http://www.outerthoughts.com/ and @arafalov
Solr resources and newsletter: http://www.solr-start.com/ and @solrstart
Solr popularizers community: https://www.linkedin.com/groups?gid=6713853





Re: Term count in multivalue fields

2014-11-08 Thread Nickolay41189
"While indexing, add a field containing the number" isn't suitable for my
case. I can't add a new field and re-index.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Term-count-in-multivalue-fields-tp4168138p4168400.html
Sent from the Solr - User mailing list archive at Nabble.com.