Re: Solr Cloud: Duplicate documents in multiple shards

2015-07-20 Thread mesenthil1
Thanks Erick for clarifying.
We are not explicitly setting the compositeId. We are using numShards=5
alone as part of the server start-up, and we are using a UUID as the unique field.

One sample id is :

possting.mongo-v2.services.com-intl-staging-c2d2a376-5e4a-11e2-8963-0026b9414f30


Not sure how it would have ended up in multiple shards. Do you have any
suggestions for fixing this, or do we need to completely rebuild the index?
When the router is compositeId, should we explicitly prefix the id with a shard
key and "!"?




--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-Cloud-Duplicate-documents-in-multiple-shards-tp4218162p4218296.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: java.lang.IllegalStateException: Too many values for UnInvertedField faceting on field content

2015-07-20 Thread Ali Nazemian
Dear Erick,

Actually, faceting on this field is not something end users will do. I did
it to test the customized normalizer and charfilter that I use, so it is
purely for testing purposes. Anyway, I did some googling on this error and
it seems that changing the facet method to enum works in other similar
cases too. I don't know how the fcs and enum methods differ in the way they
compute facets behind the scenes, but enum seems to work better in my case.
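
For reference, a rough SolrJ sketch of this kind of enum-facet request (the
collection URL, rows and facet limit below are placeholders, not taken from
my actual setup):

  import org.apache.solr.client.solrj.SolrQuery;
  import org.apache.solr.client.solrj.impl.HttpSolrClient;
  import org.apache.solr.client.solrj.response.QueryResponse;

  public class EnumFacetExample {
    public static void main(String[] args) throws Exception {
      // Hypothetical core/collection URL.
      HttpSolrClient client = new HttpSolrClient("http://localhost:8983/solr/mycollection");
      SolrQuery query = new SolrQuery("*:*");
      query.setRows(0);
      query.setFacet(true);
      query.addFacetField("content");       // the analyzed text_fa field under test
      query.set("facet.method", "enum");    // enum instead of the default fc/fcs
      query.setFacetLimit(20);
      QueryResponse rsp = client.query(query);
      System.out.println(rsp.getFacetField("content").getValues());
      client.close();
    }
  }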

Best regards.

On Tue, Jul 21, 2015 at 9:08 AM, Erick Erickson 
wrote:

> This really seems like an XY problem. _Why_ are you faceting on a
> tokenized field?
> What are you really trying to accomplish? Because faceting on a generalized
> content field that's an analyzed field is often A Bad Thing. Try going
> into the
> admin UI>> Schema Browser for that field, and you'll see how many unique
> terms
> you have in that field. Faceting on that many unique terms is rarely
> useful to the
> end user, so my suspicion is that you're not doing what you think you
> are. Or you
> have an unusual use-case. Either way, we need to understand what use-case
> you're trying to support in order to respond helpfully.
>
> You say that using facet.method=enum works; this is very surprising. That method
> uses
> the filterCache to create a bitset for each unique term. Which is totally
> incompatible with the uninverted field error you're reporting, so I
> clearly don't
> understand something about your setup. Are you _sure_?
>
> Best,
> Erick
>
> On Mon, Jul 20, 2015 at 9:32 PM, Ali Nazemian 
> wrote:
> > Dear Toke and Davidphilip,
> > Hi,
> > The fieldtype text_fa has some custom language specific normalizer and
> > charfilter, here is the schema.xml value related for this field:
> > <fieldType name="text_fa" class="solr.TextField" positionIncrementGap="100">
> >   <analyzer type="index">
> >     <charFilter class="com.ictcert.lucene.analysis.fa.FarsiCharFilterFactory"/>
> >     <tokenizer class="..."/>
> >     <filter class="..."/>
> >     <filter class="com.ictcert.lucene.analysis.fa.FarsiNormalizationFilterFactory"/>
> >     <filter class="..." words="lang/stopwords_fa.txt" />
> >   </analyzer>
> >   <analyzer type="query">
> >     <charFilter class="com.ictcert.lucene.analysis.fa.FarsiCharFilterFactory"/>
> >     <tokenizer class="..."/>
> >     <filter class="..."/>
> >     <filter class="com.ictcert.lucene.analysis.fa.FarsiNormalizationFilterFactory"/>
> >     <filter class="..." words="lang/stopwords_fa.txt" />
> >   </analyzer>
> > </fieldType>
> >
> > I did try the facet.method=enum and it works fine. Did you mean that
> > actually applying facet on analyzed field is wrong?
> >
> > Best regards.
> >
> > On Mon, Jul 20, 2015 at 8:07 PM, Toke Eskildsen 
> > wrote:
> >
> >> Ali Nazemian  wrote:
> >> > I have a collection of 1.6m documents in Solr 5.2.1.
> >> > [...]
> >> > Caused by: java.lang.IllegalStateException: Too many values for
> >> > UnInvertedField faceting on field content
> >> > [...]
> >> >  >> > default="noval" termVectors="true" termPositions="true"
> >> > termOffsets="true"/>
> >>
> >> You are hitting an internal limit in Solr. As davidphilip tells you, the
> >> solution is docValues, but they cannot be enabled for text fields. You
> need
> >> String fields, but the name of your field suggests that you need
> >> analyzation & tokenization, which cannot be done on String fields.
> >>
> >> > Would you please help me to solve this problem?
> >>
> >> With the information we have, it does not seem to be easy to solve: It
> >> seems like you want to facet on all terms in your index. As they need
> to be
> >> String (to use docValues), you would have to do all the splitting on
> white
> >> space, normalization etc. outside of Solr.
> >>
> >> - Toke Eskildsen
> >>
> >
> >
> >
> > --
> > A.Nazemian
>



-- 
A.Nazemian


Re: SOLR nrt read writes

2015-07-20 Thread Bhawna Asnani
Thanks, I tried turning off auto softCommits but that didn't help much. Still
seeing stale results every now and then. Also, the load on the server is very
light. We are running this just on a test server with one or two users. I don't
see any warning in the logs while doing softCommits, and they say a new searcher
was successfully opened and registered as the main searcher. Could this be due
to caching? I have tried to disable all caches in my solrconfig.

Sent from my iPhone

> On Jul 20, 2015, at 12:16 PM, Shawn Heisey  wrote:
> 
>> On 7/20/2015 9:29 AM, Bhawna Asnani wrote:
>> Thanks for your suggestions. The requirement is still the same , to be
>> able to make a change to some solr documents and be able to see it on
>> subsequent search/facet calls.
>> I am using softCommit with waitSearcher=true.
>> 
>> Also I am sending reads/writes to a single solr node only.
>> I have tried disabling caches and warmup time in logs is '0' but every
>> once in a while I do get the document just updated with stale data.
>> 
>> I went through lucene documentation and it seems opening the
>> IndexReader with the IndexWriter should make the changes visible to
>> the reader.
>> 
>> I checked solr logs no errors. I see this in logs each time
>> 'Registered new searcher Searcher@x' even before searches that had
>> the stale document. 
>> 
>> I have attached my solrconfig.xml for reference.
> 
> Your attachment made it through the mailing list processing.  Most
> don't, I'm surprised.  Some thoughts:
> 
> maxBooleanClauses has been set to 40.  This is a lot.  If you
> actually need a setting that high, then you are sending some MASSIVE
> queries, which probably means that your Solr install is exceptionally
> busy running those queries.
> 
> If the server is fairly busy, then you should increase maxTime on
> autoCommit.  I use a value of five minutes (30) ... and my server is
> NOT very busy most of the time.  A commit with openSearcher set to false
> is relatively fast, but it still has somewhat heavy CPU, memory, and
> disk I/O resource requirements.
> 
> You have autoSoftCommit set to happen after five seconds.  If updates
> happen frequently or run for very long, this is potentially a LOT of
> committing and opening new searchers.  I guess it's better than trying
> for one second, but anything more frequent than once a minute is likely
> to get you into trouble unless the system load is extremely light ...
> but as already discussed, your system load is probably not light.
> 
> For the kind of Near Real Time setup you have mentioned, where you want
> to do one or more updates, commit, and then query for the changes, you
> probably should completely remove autoSoftCommit from the config and
> *only* open new searchers with explicit soft commits.  Let autoCommit
> (with a maxTime of 1 to 5 minutes) handle durability concerns.
> 
> A lot of pieces in your config file are set to depend on java system
> properties just like the example does, but since we do not know what
> system properties have been set, we can't tell for sure what those parts
> of the config are doing.
> 
> Thanks,
> Shawn
> 


Re: Issue with using createNodeSet in Solr Cloud

2015-07-20 Thread Erick Erickson
Glad you found a solution

Best,
Erick

On Mon, Jul 20, 2015 at 3:21 AM, Savvas Andreas Moysidis
 wrote:
> Erick, spot on!
>
> The nodes had been registered in zookeeper under my network interface's IP
> address...after specifying those the command worked just fine.
>
> It was indeed the thing I thought was true that wasn't... :)
>
> Many thanks,
> Savvas
>
> On 18 July 2015 at 20:47, Erick Erickson  wrote:
>
>> P.S.
>>
>> "It ain't the things ya don't know that'll kill ya, it's the things ya
>> _do_ know that ain't so"...
>>
>> On Sat, Jul 18, 2015 at 12:46 PM, Erick Erickson
>>  wrote:
>> > Could you post your clusterstate.json? Or at least the "live nodes"
>> > section of your ZK config? (adminUI>>cloud>>tree>>live_nodes. The
>> > addresses of my nodes are things like 192.168.1.201:8983_solr. I'm
>> > wondering if you're taking your node names from the information ZK
>> > records or assuming it's 127.0.0.1
>> >
>> > On Sat, Jul 18, 2015 at 8:56 AM, Savvas Andreas Moysidis
>> >  wrote:
>> >> Thanks Eric,
>> >>
>> >> The strange thing is that although I have set the log level to "ALL" I
>> see
>> >> no error messages in the logs (apart from the line saying that the
>> response
>> >> is a 400 one).
>> >>
>> >> I'm quite confident the configset does exist as the collection gets
>> created
>> >> fine if I don't specify the createNodeSet param.
>> >>
>> >> Complete mystery..! I'll keep on troubleshooting and report back with my
>> >> findings.
>> >>
>> >> Cheers,
>> >> Savvas
>> >>
>> >> On 17 July 2015 at 02:14, Erick Erickson 
>> wrote:
>> >>
>> >>> There were a couple of cases where the "no live servers" was being
>> >>> returned when the error was something completely different. Does the
>> >>> Solr log show something more useful? And are you sure you have a
>> >>> configset named collection_A?
>> >>>
>> >>> 'cause this works (admittedly on 5.x) fine for me, and I'm quite sure
>> >>> there are bunches of automated tests that would be failing so I
>> >>> suspect it's just a misleading error being returned.
>> >>>
>> >>> Best,
>> >>> Erick
>> >>>
>> >>> On Thu, Jul 16, 2015 at 2:22 AM, Savvas Andreas Moysidis
>> >>>  wrote:
>> >>> > Hello There,
>> >>> >
>> >>> > I am trying to use the createNodeSet parameter when creating a new
>> >>> > collection but I'm getting an error when doing so.
>> >>> >
>> >>> > More specifically, I have four Solr instances running locally in
>> separate
>> >>> > JVMs (127.0.0.1:8983, 127.0.0.1:8984, 127.0.0.1:8985, 127.0.0.1:8986
>> )
>> >>> and a
>> >>> > standalone Zookeeper instance which all Solr instances point to. The
>> four
>> >>> > Solr instances have no collections added to them and are all up and
>> >>> running
>> >>> > (I can access the admin page in all of them).
>> >>> >
>> >>> > Now, I want to create a collections in only two of these four
>> instances (
>> >>> > 127.0.0.1:8983, 127.0.0.1:8984) but when I hit one instance with the
>> >>> > following URL:
>> >>> >
>> >>> >
>> >>>
>> http://localhost:8983/solr/admin/collections?action=CREATE&name=collection_A&numShards=1&replicationFactor=2&maxShardsPerNode=1&createNodeSet=127.0.0.1:8983_solr,127.0.0.1:8984_solr&collection.configName=collection_A
>> >>> >
>> >>> > I am getting the following response:
>> >>> >
>> >>> > 
>> >>> > 
>> >>> > 400
>> >>> > 3503
>> >>> > 
>> >>> > 
>> >>> >
>> >>>
>> org.apache.solr.common.SolrException:org.apache.solr.common.SolrException:
>> >>> > Cannot create collection collection_A. No live Solr-instances among
>> >>> > Solr-instances specified in createNodeSet:127.0.0.1:8983_solr,
>> >>> 127.0.0.1:8984
>> >>> > _solr
>> >>> > 
>> >>> > 
>> >>> > 
>> >>> > Cannot create collection collection_A. No live Solr-instances among
>> >>> > Solr-instances specified in createNodeSet:127.0.0.1:8983_solr,
>> >>> 127.0.0.1:8984
>> >>> > _solr
>> >>> > 
>> >>> > 400
>> >>> > 
>> >>> > 
>> >>> > 
>> >>> > Cannot create collection collection_A. No live Solr-instances among
>> >>> > Solr-instances specified in createNodeSet:127.0.0.1:8983_solr,
>> >>> 127.0.0.1:8984
>> >>> > _solr
>> >>> > 
>> >>> > 400
>> >>> > 
>> >>> > 
>> >>> >
>> >>> >
>> >>> > The instances are definitely up and running (at least the admin
>> console
>> >>> can
>> >>> > be accessed as mentioned) and if I remove the createNodeSet
>> parameter the
>> >>> > collection is created as expected.
>> >>> >
>> >>> > Am I missing something obvious or is this a bug?
>> >>> >
>> >>> > The exact Solr version I'm using is 4.9.1.
>> >>> >
>> >>> > Any pointers would be much appreciated.
>> >>> >
>> >>> > Thanks,
>> >>> > Savvas
>> >>>
>>


Re: Solr Cloud: Duplicate documents in multiple shards

2015-07-20 Thread Erick Erickson
bq: We have 130 million documents in our set up and the routing key is set as
"compositeId".

The most likely explanation is that somehow you've sent the same document out
with different routing keys. So what is the ID field (or, more generally, your
<uniqueKey> field) for a pair of duplicated documents? My bet is that whatever
is in front of the ! symbol is different.
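
For illustration, a rough SolrJ sketch of the failure mode described above (the
ZK host, collection name and the "storeA"/"storeB" prefixes are made up; with
the compositeId router, only the part before the "!" is hashed to pick the
shard):

  import org.apache.solr.client.solrj.impl.CloudSolrServer;
  import org.apache.solr.common.SolrInputDocument;

  public class RoutingExample {
    public static void main(String[] args) throws Exception {
      CloudSolrServer server = new CloudSolrServer("zkhost:2181");
      server.setDefaultCollection("mycollection");

      SolrInputDocument a = new SolrInputDocument();
      // Routed by the hash of "storeA".
      a.addField("id", "storeA!c2d2a376-5e4a-11e2-8963-0026b9414f30");

      SolrInputDocument b = new SolrInputDocument();
      // Same UUID, different prefix: routed independently, so it very likely
      // lands on a different shard and the collection ends up with two copies.
      b.addField("id", "storeB!c2d2a376-5e4a-11e2-8963-0026b9414f30");

      server.add(a);
      server.add(b);
      server.commit();
      server.shutdown();
    }
  }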

As far as indexing goes when all replicas for a shard are down: it should
completely fail, and the document should be nowhere in the collection.

Best,
Erick

On Mon, Jul 20, 2015 at 4:41 AM, mesenthil1
 wrote:
> Hi All,
>
> We are using solr 4.2.1 cloud with 5 shards  set up ( 1 leader & 1 replica
> for each shard). We are seeing the following issue in our set up.
> Few of the documents are getting returned from more than one shard for
> queries. When we try to update such a document, it is not updated on both
> shards; it only gets updated on a single shard. We are also unable to delete
> the document. Can you please clarify the following?
>
> 1. What happens if a shard(both leader and replica) goes down. If the
> document on the "died shard" is updated, will it forward the document to the
> new shard. If so, when the "died shard" comes up again, will this not be
> considered for the same hash key range?
> 2. Is there a way to fix this[removing duplicates across shards]?
>
> We have 130 million documents in our set up and the routing key is set as
> "compositeId".
>
> Senthil
>
>
>
>
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Solr-Cloud-Duplicate-documents-in-multiple-shards-tp4218162.html
> Sent from the Solr - User mailing list archive at Nabble.com.


Re: java.lang.IllegalStateException: Too many values for UnInvertedField faceting on field content

2015-07-20 Thread Erick Erickson
This really seems like an XY problem. _Why_ are you faceting on a
tokenized field?
What are you really trying to accomplish? Because faceting on a generalized
content field that's an analyzed field is often A Bad Thing. Try going into the
admin UI>> Schema Browser for that field, and you'll see how many unique terms
you have in that field. Faceting on that many unique terms is rarely
useful to the
end user, so my suspicion is that you're not doing what you think you
are. Or you
have an unusual use-case. Either way, we need to understand what use-case
you're trying to support in order to respond helpfully.

You say that using facet.method=enum works; this is very surprising. That method uses
the filterCache to create a bitset for each unique term. Which is totally
incompatible with the uninverted field error you're reporting, so I
clearly don't
understand something about your setup. Are you _sure_?

Best,
Erick

On Mon, Jul 20, 2015 at 9:32 PM, Ali Nazemian  wrote:
> Dear Toke and Davidphilip,
> Hi,
> The fieldtype text_fa has some custom language specific normalizer and
> charfilter, here is the schema.xml value related for this field:
> <fieldType name="text_fa" class="solr.TextField" positionIncrementGap="100">
>   <analyzer type="index">
>     <charFilter class="com.ictcert.lucene.analysis.fa.FarsiCharFilterFactory"/>
>     <tokenizer class="..."/>
>     <filter class="..."/>
>     <filter class="com.ictcert.lucene.analysis.fa.FarsiNormalizationFilterFactory"/>
>     <filter class="..." words="lang/stopwords_fa.txt" />
>   </analyzer>
>   <analyzer type="query">
>     <charFilter class="com.ictcert.lucene.analysis.fa.FarsiCharFilterFactory"/>
>     <tokenizer class="..."/>
>     <filter class="..."/>
>     <filter class="com.ictcert.lucene.analysis.fa.FarsiNormalizationFilterFactory"/>
>     <filter class="..." words="lang/stopwords_fa.txt" />
>   </analyzer>
> </fieldType>
>
> I did try the facet.method=enum and it works fine. Did you mean that
> actually applying facet on analyzed field is wrong?
>
> Best regards.
>
> On Mon, Jul 20, 2015 at 8:07 PM, Toke Eskildsen 
> wrote:
>
>> Ali Nazemian  wrote:
>> > I have a collection of 1.6m documents in Solr 5.2.1.
>> > [...]
>> > Caused by: java.lang.IllegalStateException: Too many values for
>> > UnInvertedField faceting on field content
>> > [...]
>> > > > default="noval" termVectors="true" termPositions="true"
>> > termOffsets="true"/>
>>
>> You are hitting an internal limit in Solr. As davidphilip tells you, the
>> solution is docValues, but they cannot be enabled for text fields. You need
>> String fields, but the name of your field suggests that you need
>> analyzation & tokenization, which cannot be done on String fields.
>>
>> > Would you please help me to solve this problem?
>>
>> With the information we have, it does not seem to be easy to solve: It
>> seems like you want to facet on all terms in your index. As they need to be
>> String (to use docValues), you would have to do all the splitting on white
>> space, normalization etc. outside of Solr.
>>
>> - Toke Eskildsen
>>
>
>
>
> --
> A.Nazemian


Re: java.lang.IllegalStateException: Too many values for UnInvertedField faceting on field content

2015-07-20 Thread Ali Nazemian
Dear Toke and Davidphilip,
Hi,
The fieldtype text_fa has some custom language specific normalizer and
charfilter, here is the schema.xml value related for this field:

<fieldType name="text_fa" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <charFilter class="com.ictcert.lucene.analysis.fa.FarsiCharFilterFactory"/>
    <tokenizer class="..."/>
    <filter class="..."/>
    <filter class="com.ictcert.lucene.analysis.fa.FarsiNormalizationFilterFactory"/>
    <filter class="..." words="lang/stopwords_fa.txt" />
  </analyzer>
  <analyzer type="query">
    <charFilter class="com.ictcert.lucene.analysis.fa.FarsiCharFilterFactory"/>
    <tokenizer class="..."/>
    <filter class="..."/>
    <filter class="com.ictcert.lucene.analysis.fa.FarsiNormalizationFilterFactory"/>
    <filter class="..." words="lang/stopwords_fa.txt" />
  </analyzer>
</fieldType>

I did try the facet.method=enum and it works fine. Did you mean that
actually applying facet on analyzed field is wrong?

Best regards.

On Mon, Jul 20, 2015 at 8:07 PM, Toke Eskildsen 
wrote:

> Ali Nazemian  wrote:
> > I have a collection of 1.6m documents in Solr 5.2.1.
> > [...]
> > Caused by: java.lang.IllegalStateException: Too many values for
> > UnInvertedField faceting on field content
> > [...]
> > <field name="content" type="text_fa" ... default="noval" termVectors="true"
> > termPositions="true" termOffsets="true"/>
>
> You are hitting an internal limit in Solr. As davidphilip tells you, the
> solution is docValues, but they cannot be enabled for text fields. You need
> String fields, but the name of your field suggests that you need
> analyzation & tokenization, which cannot be done on String fields.
>
> > Would you please help me to solve this problem?
>
> With the information we have, it does not seem to be easy to solve: It
> seems like you want to facet on all terms in your index. As they need to be
> String (to use docValues), you would have to do all the splitting on white
> space, normalization etc. outside of Solr.
>
> - Toke Eskildsen
>



-- 
A.Nazemian


Re: solr blocking and client timeout issue

2015-07-20 Thread Erick Erickson
bq: the config is set up per the NRT suggestions in the docs.
autoSoftCommit every 2 seconds and autoCommit every 10 minutes.

2 second soft commit is very aggressive, no matter what the NRT
suggestions are. My first question is whether that's really needed.
The soft commits should be as long as you can stand. And don't listen
to your product manager who says "2 seconds is required"; push back
and ask whether that's really necessary. Most people won't notice
the difference.

bq: ...we are noticing a lot higher number of hard commits than usual.

Is a client somewhere issuing a hard commit? This is rarely
recommended... And is openSearcher true or false? False is a
relatively cheap operation, true is quite expensive.

More than you want to know about hard and soft commits:

https://lucidworks.com/blog/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/

Best,
Erick


On Mon, Jul 20, 2015 at 12:48 PM, Jeremy Ashcraft  wrote:
> heap is already at 5GB
>
> On 07/20/2015 12:29 PM, Jeremy Ashcraft wrote:
>>
>> no swapping that I'm seeing, although we are noticing a lot higher number
>> of hard commits than usual.
>>
>> the config is set up per the NRT suggestions in the docs.  autoSoftCommit
>> every 2 seconds and autoCommit every 10 minutes.
>>
>> there have been 463 updates in the past 2 hours, all followed by hard
>> commits
>>
>> INFO  - 2015-07-20 12:26:20.979;
>> org.apache.solr.update.DirectUpdateHandler2; start
>> commit{,optimize=false,openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=false,prepareCommit=false}
>> INFO  - 2015-07-20 12:26:21.021; org.apache.solr.core.SolrDeletionPolicy;
>> SolrDeletionPolicy.onCommit: commits: num=2
>>
>> commit{dir=NRTCachingDirectory(org.apache.lucene.store.MMapDirectory@/opt/solr/solr/collection1/data/index
>> lockFactory=org.apache.lucene.store.NativeFSLockFactory@524b89bd;
>> maxCacheMB=48.0 maxMergeSizeMB=4.0),segFN=segments_e9nk,generation=665696}
>>
>> commit{dir=NRTCachingDirectory(org.apache.lucene.store.MMapDirectory@/opt/solr/solr/collection1/data/index
>> lockFactory=org.apache.lucene.store.NativeFSLockFactory@524b89bd;
>> maxCacheMB=48.0 maxMergeSizeMB=4.0),segFN=segments_e9nl,generation=665697}
>> INFO  - 2015-07-20 12:26:21.022; org.apache.solr.core.SolrDeletionPolicy;
>> newest commit generation = 665697
>> INFO  - 2015-07-20 12:26:21.026;
>> org.apache.solr.update.DirectUpdateHandler2; end_commit_flush
>> INFO  - 2015-07-20 12:26:21.026;
>> org.apache.solr.update.processor.LogUpdateProcessor; [collection1]
>> webapp=/solr path=/update params={omitHeader=false&wt=json}
>> {add=[8653ea29-a327-4a54-9b00-8468241f2d7c (1507244513403338752),
>> 5cf034a9-d93a-4307-a367-02cb21fa8e35 (1507244513404387328),
>> 816e3a04-9d0e-4587-a3ee-9f9e7b0c7d74 (1507244513405435904)],commit=} 0 50
>>
>> could that be causing some of the problems?
>>
>> 
>> From: Shawn Heisey 
>> Sent: Monday, July 20, 2015 11:44 AM
>> To: solr-user@lucene.apache.org
>> Subject: Re: solr blocking and client timeout issue
>>
>> On 7/20/2015 11:54 AM, Jeremy Ashcraft wrote:
>>>
>>> I'm upgrading to the 1.8 JDK on our dev VM now and testing. Hopefully I
>>> can get production upgraded tonight.
>>>
>>> still getting the big GC pauses this morning, even after applying the
>>> GC tuning options.  Everything was fine throughout the weekend.
>>>
>>> My biggest concern is that this instance had been running with no
>>> issues for almost 2 years, but these GC issues started just last week.
>>
>> It's very possible that you're simply going to need a larger heap than
>> you have needed in the past, either because your index has grown, or
>> because your query patterns have changed and now your queries need more
>> memory.  It could even be both of these.
>>
>> At your current index size, assuming that there's nothing else on this
>> machine, you should have enough memory to raise your heap to 5GB.
>>
>> If there ARE other software pieces on this machine, then the long GC
>> pauses (along with other performance issues) could be explained by too
>> much memory allocation out of the 8GB total memory, resulting in
>> swapping at the OS level.
>>
>> Thanks,
>> Shawn
>>
>
> --
> *jeremy ashcraft*
> development manager
> EdGate Correlation Services 
> /253.853.7133 x228/


Re: Use REST API URL to update field

2015-07-20 Thread Zheng Lin Edwin Yeo
Hi Shawn,

So it means that if the following is in a text file called update.txt,

{"id":"testing_0001",
"popularity":{"inc":1}}

does this text file still have to exist if I use the URL? Or can this
information in the text file be put directly onto the URL?

Regards,
Edwin


On 20 July 2015 at 22:04, Shawn Heisey  wrote:

> On 7/20/2015 2:06 AM, Zheng Lin Edwin Yeo wrote:
> > I'm using Solr 5.2.1, and I would like to check, is there a way to update
> > certain field by using REST API URL directly instead of using curl?
> >
> > For example, I would like to increase the "popularity" field in my index
> > each time a user click on the record.
> >
> > Currently, it can work with the curl command by having this in my text
> file
> > to be read by curl (the "id" is hard-coded here for example purpose)
> >
> > {"id":"testing_0001",
> >
> > "popularity":{"inc":1}
> >
> >
> > Is there a REST API URL that I can call to achieve the same purpose?
>
> The URL that you would use with curl *IS* the URL that you would use for
> a REST-like call.
>
> Thanks,
> Shawn
>
>
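
For illustration, the same atomic increment can also be sent programmatically,
with no separate text file, e.g. via SolrJ (a sketch; the collection URL is a
placeholder):

  import java.util.Collections;
  import org.apache.solr.client.solrj.impl.HttpSolrClient;
  import org.apache.solr.common.SolrInputDocument;

  public class PopularityInc {
    public static void main(String[] args) throws Exception {
      HttpSolrClient client = new HttpSolrClient("http://localhost:8983/solr/mycollection");
      SolrInputDocument doc = new SolrInputDocument();
      doc.addField("id", "testing_0001");
      // Atomic update: the {"inc": 1} map bumps the stored popularity value by 1.
      doc.addField("popularity", Collections.singletonMap("inc", 1));
      client.add(doc);
      client.commit();
      client.close();
    }
  }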


Re: Installing Banana on Solr 5.2.1

2015-07-20 Thread Shawn Heisey
On 7/20/2015 5:45 PM, Vineeth Dasaraju wrote:
> I am trying to install Banana on top of solr but haven't been able to do
> so. All the procedures that I get are for an earlier version of solr. Since
> the directory structure has changed in the new version, inspite of me
> placing the banana folder under the server/solr-webapp/webapp folder, I am
> not able to access it using the url
> localhost:8983/banana/src/index.html#/dashboard. I would appreciate it if
> someone can throw some more light into how I can do it.

I think you would also need an xml file in server/contexts that tells
Jetty how to load the application.

I cloned the git repository for banana, and I see
jetty-contexts/banana-context.xml there.  I would imagine that copying
this xml file into server/contexts and copying the banana.war generated
by "ant build-war" into server/webapps would be enough to install it.

If what I have said here is not enough to help you, then your best bet
for help with this is to talk to Lucidworks.  They know Solr REALLY well.

Thanks,
Shawn



WordDelimiterFilter Leading & Trailing Special Character

2015-07-20 Thread Sathiya N Sundararajan
Question about WordDelimiterFilter. The search behavior that we get with
WordDelimiterFilter works well for us, except for the case where there is
a special character at either the leading or trailing end of the term.
For instance:

*‘d&b’ *  —>  Works as expected. Finds all docs with ‘d&b’.
*‘p!nk’*  —>  Works fine as above.

But in cases where there is a special character at the trailing end of
the term, like ‘Yahoo!’:

*‘yahoo!’* —> Turns out to be a search for just *‘yahoo’* with the special
character *‘!’* stripped out.  This WordDelimiterFilter behavior is
documented
http://lucene.apache.org/core/4_6_0/analyzers-common/index.html?org/apache/lucene/analysis/miscellaneous/WordDelimiterFilter.html

What I would like is for the search to be performed without stripping out
the leading & trailing special character. Is there a way to achieve this
behavior with WordDelimiterFilter?

This is current config that we have for the field:















thanks


Re: shareSchema property unknown in new solr.xml format

2015-07-20 Thread Yago Riveiro
Thanks Hoss, this was the example that I saw to configure my solr.xml.




I have more than 250 cores and any resource that I can optimize is always
welcome, but if this feature is deprecated in 5.x I will try to upgrade as
soon as possible.

Installing Banana on Solr 5.2.1

2015-07-20 Thread Vineeth Dasaraju
Hi,

I am trying to install Banana on top of Solr but haven't been able to do
so. All the procedures that I can find are for an earlier version of Solr.
Since the directory structure has changed in the new version, in spite of
placing the banana folder under the server/solr-webapp/webapp folder, I am
not able to access it using the URL
localhost:8983/banana/src/index.html#/dashboard. I would appreciate it if
someone could shed some more light on how I can do it.

P.S.: My solr node runs on the port 8983 only.

Regards,
Vineeth


Re: Data Import Handler Stays Idle

2015-07-20 Thread Raja Pothuganti
>Yes the number of unimported matches (with IOExceptions)

What is the IOException about?

On 7/20/15, 5:10 PM, "Paden"  wrote:

>Yes the number of unimported matches. No I did not specify "false" to
>commit
>on any of my dataimporthandler. Since it defaults to true I really didn't
>take it into account though.
>
>
>
>--
>View this message in context:
>http://lucene.472066.n3.nabble.com/Data-Import-Handler-Stays-Idle-tp421825
>0p4218262.html
>Sent from the Solr - User mailing list archive at Nabble.com.



Re: Multiple boost queries on a specific field

2015-07-20 Thread Chris Hostetter

: /?q=&wt=json&defType=dismax&q.alt=*:*&bq=provider:A^2.0/
: My first results have provider A.

: ?q=&wt=json&defType=dismax&q.alt=*:*&bq=provider:B^1.5
: My first results have provider B. Good!


: /?q=&wt=json&defType=dismax&q.alt=*:*&bq=provider:(A^2.0 B^1.5)/
: Then my first results have provider B. It's not logical.

Why is that not logical?

If you provide us with the details from your schema about the 
provider field, and the debug=true output from your query showing the 
score explanations for the top doc of that query (and for the first "provider 
A" 
doc so we can compare) then we might be able to help explain why a "B" doc 
shows up before an "A" doc -- but you haven't provided nearly enough info for
anything other than a wild guess...

https://wiki.apache.org/solr/UsingMailingLists


...my best wild guess is that it has to do with either the IDF of those 
two terms, or the lengthNorm of the "provider" field for the various docs.


Most likely "bq" isn't even remotely what you want however, since it's an 
*additive* boost, and will be affected by the overall queryNorm of the 
query it's a part of -- so even if you get things dialled in just like you 
want them with a "*:*" query, you might find yourself with totally
different results once you start using a "real" query.

Assuming every document has at most 1 "provider" then what would probably 
work best for you is to use (edismax with) something like this...

boost=max(prod(2.0, termfreq(provider,'A')),
  prod(1.5, termfreq(provider,'B')),
  prod(..., termfreq(provider,...)),
  ...)

...or if you want to use edismax, then instead wrap the "boost" QParser
around your dismax query...

  q={!boost b=$boost v=$qq defType=dismax}
  qq=...whatever your normal dismax query is...
  ...

https://cwiki.apache.org/confluence/display/solr/Other+Parsers#OtherParsers-BoostQueryParser
https://cwiki.apache.org/confluence/display/solr/Function+Queries

What that will give you (in either case) is a *multiplicative* boost by
each of those values depending on which of those terms exists in the
provider field -- the "prod" function multiplies each value by "1" if the
corresponding provider string is in the field once, or "0" if that provider
isn't in the field (hence the assumption of "at most 1 provider"), and then
the max function just picks one.
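
For example, for a document with provider:A, termfreq(provider,'A') is 1 and
termfreq(provider,'B') is 0, so the boost evaluates to
max(prod(2.0,1), prod(1.5,0)) = max(2.0, 0) = 2.0; a provider:B document gets
1.5 by the same arithmetic.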

Depending on the specifics of your use case, you could alternatively
use sum(...) instead of max if some docs are from multiple providers, 
etc...


But the details of *why* you are currently getting the results you are 
getting, and what you consider illogical about them, are a huge factor in 
giving you good advice to move forward.



-Hoss
http://www.lucidworks.com/

Re: Data Import Handler Stays Idle

2015-07-20 Thread Paden
Yes the number of unimported matches. No I did not specify "false" to commit
on any of my dataimporthandler. Since it defaults to true I really didn't
take it into account though. 



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Data-Import-Handler-Stays-Idle-tp4218250p4218262.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Data Import Handler Stays Idle

2015-07-20 Thread Raja Pothuganti
Number of Ioexceptions , are they equal to un-imported/un processed
documents?

By any chance commit set to false in import request
example:
http://localhost:8983/solr/db/dataimport?command=full-import&commit=false


Thanks
Raja

On 7/20/15, 4:51 PM, "Paden"  wrote:

>I was consistently checking the logs to see if there were any errors that
>would give me any idling. There were no errors except for a few skipped
>documents due to some Illegal IOexceptions from Tika but none of those
>occurred around the time that solr began idling. A lot of font warnings.
>But
>again. Nothing but font warnings around time of idling.
>
>
>
>--
>View this message in context:
>http://lucene.472066.n3.nabble.com/Data-Import-Handler-Stays-Idle-tp421825
>0p4218260.html
>Sent from the Solr - User mailing list archive at Nabble.com.



Re: Data Import Handler Stays Idle

2015-07-20 Thread Paden
I was consistently checking the logs to see if there were any errors that
would give me any idling. There were no errors except for a few skipped
documents due to some Illegal IOexceptions from Tika but none of those
occurred around the time that solr began idling. A lot of font warnings. But
again. Nothing but font warnings around time of idling. 



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Data-Import-Handler-Stays-Idle-tp4218250p4218260.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: set the param [facet.offset] for EVERY [facet.pivot]

2015-07-20 Thread Chris Hostetter

: HI All: I need pagination with facet offset.

: There are two or more fields in [facet.pivot], but only one value 
: for [facet.offset], eg: facet.offset=10&facet.pivot=field_1,field_2. 
: In this condition, both field_2 and field_1 get an offset of 10. But what
: I want is an offset of 1 for field_2 and an offset of 10 for field_1. How
: can I fix this problem, or is there another way to accomplish this?

As noted in the ref guide...

https://cwiki.apache.org/confluence/display/solr/Faceting#Faceting-Thefacet.offsetParameter

...facet.offset supports per-field overriding, just like most (all?)
facet options...

   facet.pivot=field_1,field_2
   f.field_2.facet.offset=10

...or using localparams (in case you are using field_2 in another 
facet.pivot param...

   facet.pivot={!key=pivot2}field_0,field_2
   facet.pivot={!key=pivot1 f.field_2.facet.offset=10}field_1,field_2
   

-Hoss
http://www.lucidworks.com/


Re: Data Import Handler Stays Idle

2015-07-20 Thread Shawn Heisey
On 7/20/2015 3:03 PM, Paden wrote:
> I'm currently trying to index about 54,000 files with the Solr Data Import
> Handler and I've got a small problem. It fetches about half (28,289) of the
> 54,000 files and it process about 14,146 documents before it stops and just
> stands idle. Here's the status output
>
> {
>   "responseHeader": {
> "status": 0,
> "QTime": 0
>   },
>   "initArgs": [
> "defaults",
> [
>   "config",
>   "db-data-config.xml",
>   "update.chain",
>   "skip-empty"
> ]
>   ],
>   "command": "status",
>   "status": "idle",
>   "importResponse": "",
>   "statusMessages": {
> "Time Elapsed": "2:39:53.191",
> "Total Requests made to DataSource": "1",
> "Total Rows Fetched": "28289",
> "Total Documents Processed": "14146",
> "Total Documents Skipped": "0",
> "Full Dump Started": "2015-07-20 18:19:17"
>   }
> }
>
> it has a green arrow next to the header where it says number or documents
> fetched/process but it doesn't say that it's done indexing. It also doesn't
> have the commit line that I've seen on my other core that I indexed about
> 290 documents on. This is the second time that I have tried to index these
> files. I swung by the office this last weekend to see how the index was
> going and (I didn't write the numbers down but I guess I should have) I seem
> to remember it being pretty much at this EXACT spot when the dataimport
> handler starting being idle the last time too. Is there some line in the
> solr config that I have to change to actually commit some of the documents.
> That way so it isn't all at once? Is there some doc limit I have reached
> that I don't know exists? Are the PDF's too large and killing tika (and solr
> with it). I'm really kind of stuck here. 

What Solr version are you using, and if you look for the Solr logfile on
the disk, do you see any errors in it?  There may be a few more
questions to ask, but they will depend on the answers to those two.

You may be on to something with the idea of a PDF document that's
killing Tika.

Thanks,
Shawn



Data Import Handler Stays Idle

2015-07-20 Thread Paden
Hello,

I'm currently trying to index about 54,000 files with the Solr Data Import
Handler and I've got a small problem. It fetches about half (28,289) of the
54,000 files and it process about 14,146 documents before it stops and just
stands idle. Here's the status output

{
  "responseHeader": {
"status": 0,
"QTime": 0
  },
  "initArgs": [
"defaults",
[
  "config",
  "db-data-config.xml",
  "update.chain",
  "skip-empty"
]
  ],
  "command": "status",
  "status": "idle",
  "importResponse": "",
  "statusMessages": {
"Time Elapsed": "2:39:53.191",
"Total Requests made to DataSource": "1",
"Total Rows Fetched": "28289",
"Total Documents Processed": "14146",
"Total Documents Skipped": "0",
"Full Dump Started": "2015-07-20 18:19:17"
  }
}

It has a green arrow next to the header where it says number of documents
fetched/processed, but it doesn't say that it's done indexing. It also doesn't
have the commit line that I've seen on my other core, on which I indexed about
290 documents. This is the second time that I have tried to index these
files. I swung by the office this last weekend to see how the index was
going and (I didn't write the numbers down, but I guess I should have) I seem
to remember it being pretty much at this EXACT spot when the dataimport
handler started being idle the last time too. Is there some line in the
solr config that I have to change to actually commit some of the documents as
it goes, so it isn't all at once? Is there some doc limit I have reached
that I don't know exists? Are the PDFs too large, killing Tika (and Solr
with it)? I'm really kind of stuck here.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Data-Import-Handler-Stays-Idle-tp4218250.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: solr blocking and client timeout issue

2015-07-20 Thread Jeremy Ashcraft

heap is already at 5GB

On 07/20/2015 12:29 PM, Jeremy Ashcraft wrote:

no swapping that I'm seeing, although we are noticing a lot higher number of 
hard commits than usual.

the config is set up per the NRT suggestions in the docs.  autoSoftCommit every 
2 seconds and autoCommit every 10 minutes.

there have been 463 updates in the past 2 hours, all followed by hard commits

INFO  - 2015-07-20 12:26:20.979; org.apache.solr.update.DirectUpdateHandler2; 
start 
commit{,optimize=false,openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=false,prepareCommit=false}
INFO  - 2015-07-20 12:26:21.021; org.apache.solr.core.SolrDeletionPolicy; 
SolrDeletionPolicy.onCommit: commits: num=2

commit{dir=NRTCachingDirectory(org.apache.lucene.store.MMapDirectory@/opt/solr/solr/collection1/data/index
 lockFactory=org.apache.lucene.store.NativeFSLockFactory@524b89bd; 
maxCacheMB=48.0 maxMergeSizeMB=4.0),segFN=segments_e9nk,generation=665696}

commit{dir=NRTCachingDirectory(org.apache.lucene.store.MMapDirectory@/opt/solr/solr/collection1/data/index
 lockFactory=org.apache.lucene.store.NativeFSLockFactory@524b89bd; 
maxCacheMB=48.0 maxMergeSizeMB=4.0),segFN=segments_e9nl,generation=665697}
INFO  - 2015-07-20 12:26:21.022; org.apache.solr.core.SolrDeletionPolicy; 
newest commit generation = 665697
INFO  - 2015-07-20 12:26:21.026; org.apache.solr.update.DirectUpdateHandler2; 
end_commit_flush
INFO  - 2015-07-20 12:26:21.026; 
org.apache.solr.update.processor.LogUpdateProcessor; [collection1] webapp=/solr 
path=/update params={omitHeader=false&wt=json} 
{add=[8653ea29-a327-4a54-9b00-8468241f2d7c (1507244513403338752), 
5cf034a9-d93a-4307-a367-02cb21fa8e35 (1507244513404387328), 
816e3a04-9d0e-4587-a3ee-9f9e7b0c7d74 (1507244513405435904)],commit=} 0 50

could that be causing some of the problems?


From: Shawn Heisey 
Sent: Monday, July 20, 2015 11:44 AM
To: solr-user@lucene.apache.org
Subject: Re: solr blocking and client timeout issue

On 7/20/2015 11:54 AM, Jeremy Ashcraft wrote:

I'm upgrading to the 1.8 JDK on our dev VM now and testing. Hopefully I
can get production upgraded tonight.

still getting the big GC pauses this morning, even after applying the
GC tuning options.  Everything was fine throughout the weekend.

My biggest concern is that this instance had been running with no
issues for almost 2 years, but these GC issues started just last week.

It's very possible that you're simply going to need a larger heap than
you have needed in the past, either because your index has grown, or
because your query patterns have changed and now your queries need more
memory.  It could even be both of these.

At your current index size, assuming that there's nothing else on this
machine, you should have enough memory to raise your heap to 5GB.

If there ARE other software pieces on this machine, then the long GC
pauses (along with other performance issues) could be explained by too
much memory allocation out of the 8GB total memory, resulting in
swapping at the OS level.

Thanks,
Shawn



--
*jeremy ashcraft*
development manager
EdGate Correlation Services 
/253.853.7133 x228/


Re: solr blocking and client timeout issue

2015-07-20 Thread Jeremy Ashcraft
no swapping that I'm seeing, although we are noticing a lot higher number of 
hard commits than usual. 

the config is set up per the NRT suggestions in the docs.  autoSoftCommit every 
2 seconds and autoCommit every 10 minutes.  

there have been 463 updates in the past 2 hours, all followed by hard commits

INFO  - 2015-07-20 12:26:20.979; org.apache.solr.update.DirectUpdateHandler2; 
start 
commit{,optimize=false,openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=false,prepareCommit=false}
INFO  - 2015-07-20 12:26:21.021; org.apache.solr.core.SolrDeletionPolicy; 
SolrDeletionPolicy.onCommit: commits: num=2

commit{dir=NRTCachingDirectory(org.apache.lucene.store.MMapDirectory@/opt/solr/solr/collection1/data/index
 lockFactory=org.apache.lucene.store.NativeFSLockFactory@524b89bd; 
maxCacheMB=48.0 maxMergeSizeMB=4.0),segFN=segments_e9nk,generation=665696}

commit{dir=NRTCachingDirectory(org.apache.lucene.store.MMapDirectory@/opt/solr/solr/collection1/data/index
 lockFactory=org.apache.lucene.store.NativeFSLockFactory@524b89bd; 
maxCacheMB=48.0 maxMergeSizeMB=4.0),segFN=segments_e9nl,generation=665697}
INFO  - 2015-07-20 12:26:21.022; org.apache.solr.core.SolrDeletionPolicy; 
newest commit generation = 665697
INFO  - 2015-07-20 12:26:21.026; org.apache.solr.update.DirectUpdateHandler2; 
end_commit_flush
INFO  - 2015-07-20 12:26:21.026; 
org.apache.solr.update.processor.LogUpdateProcessor; [collection1] webapp=/solr 
path=/update params={omitHeader=false&wt=json} 
{add=[8653ea29-a327-4a54-9b00-8468241f2d7c (1507244513403338752), 
5cf034a9-d93a-4307-a367-02cb21fa8e35 (1507244513404387328), 
816e3a04-9d0e-4587-a3ee-9f9e7b0c7d74 (1507244513405435904)],commit=} 0 50

could that be causing some of the problems?


From: Shawn Heisey 
Sent: Monday, July 20, 2015 11:44 AM
To: solr-user@lucene.apache.org
Subject: Re: solr blocking and client timeout issue

On 7/20/2015 11:54 AM, Jeremy Ashcraft wrote:
> I'm upgrading to the 1.8 JDK on our dev VM now and testing. Hopefully I
> can get production upgraded tonight.
>
> still getting the big GC pauses this morning, even after applying the
> GC tuning options.  Everything was fine throughout the weekend.
>
> My biggest concern is that this instance had been running with no
> issues for almost 2 years, but these GC issues started just last week.

It's very possible that you're simply going to need a larger heap than
you have needed in the past, either because your index has grown, or
because your query patterns have changed and now your queries need more
memory.  It could even be both of these.

At your current index size, assuming that there's nothing else on this
machine, you should have enough memory to raise your heap to 5GB.

If there ARE other software pieces on this machine, then the long GC
pauses (along with other performance issues) could be explained by too
much memory allocation out of the 8GB total memory, resulting in
swapping at the OS level.

Thanks,
Shawn



Re: solr blocking and client timeout issue

2015-07-20 Thread Shawn Heisey
On 7/20/2015 11:54 AM, Jeremy Ashcraft wrote:
> I'm upgrading to the 1.8 JDK on our dev VM now and testing. Hopefully I
> can get production upgraded tonight.
>
> still getting the big GC pauses this morning, even after applying the
> GC tuning options.  Everything was fine throughout the weekend.
>
> My biggest concern is that this instance had been running with no
> issues for almost 2 years, but these GC issues started just last week.

It's very possible that you're simply going to need a larger heap than
you have needed in the past, either because your index has grown, or
because your query patterns have changed and now your queries need more
memory.  It could even be both of these.

At your current index size, assuming that there's nothing else on this
machine, you should have enough memory to raise your heap to 5GB.

If there ARE other software pieces on this machine, then the long GC
pauses (along with other performance issues) could be explained by too
much memory allocation out of the 8GB total memory, resulting in
swapping at the OS level.

Thanks,
Shawn



SOLR 5.1.0 FileListEntity exclude

2015-07-20 Thread Joe Fidanza

Very new to SOLR and haven't been able to find an answer to this issue:

Using SOLR 5.1.0 we have a successful FileListEntityProcessor setup but 
would like to exclude several directories that live below the baseDir.


The data-config.xml file looks like:

<dataConfig>
  <dataSource ... />
  <document>
    <entity name="..." processor="FileListEntityProcessor" ...
            baseDir="/abc/def/" recursive="true" rootEntity="false" >
      <entity name="..." processor="TikaEntityProcessor"
              url="${f.fileAbsolutePath}" format="text" onError="skip" >
        ...
      </entity>
    </entity>
  </document>
</dataConfig>

So we'd like to index /abc/def/123 and /abc/def/456 but not index 
/abc/def/789 and /abc/def/xyz etc.


We can currently index all the files under /abc/def. That works fine but 
I can't figure out how to exclude entire subdirectories that have the 
same file types in them as the directories that we do want to index.


Any help  would be appreciated.

Joe

--
Joe Fidanza
609 279 6211
Systems Administrator
Center for Communications Research
805 Bunn Drive
Princeton, NJ 08540



Re: shareSchema property unknown in new solr.xml format

2015-07-20 Thread Chris Hostetter

: > I’m getting this error on startup:
: > 
: >  section of solr.xml contains 1 unknown config parameter(s): 
[shareSchema]

Pretty sure that's because it was never a supported property of the
<solrcloud> section -- even in the old format of solr.xml.

it's just a top level property -- ie: create a child node for it
directly under <solr>, outside of <solrcloud>.


Ah ... i see, this page is giving an incorrect example...

https://cwiki.apache.org/confluence/display/solr/Moving+to+the+New+solr.xml+Format

...I'll fix that.




-Hoss
http://www.lucidworks.com/

Re: solr blocking and client timeout issue

2015-07-20 Thread Jeremy Ashcraft
I'm upgrading to the 1.8 JDK on our dev VM now and testing. Hopefully I
can get production upgraded tonight.


still getting the big GC pauses this morning, even after applying the GC 
tuning options.  Everything was fine throughout the weekend.


My biggest concern is that this instance had been running with no issues 
for almost 2 years, but these GC issues started just last week.


On 07/19/2015 02:48 AM, Shawn Heisey wrote:

On 7/19/2015 12:46 AM, Jeremy Ashcraft wrote:

That did the trick.  The GC tuning options also seems to be working, but
I guess we'll see when traffic ramps back up on monday.  Thanks for all
your help!

On 7/18/2015 8:16 AM, Shawn Heisey wrote:

The first thing I'd try is removing the UseLargePages option and see if
it goes away.

Glad you got the warning out of there.

Noticing that the message said "OpenJDK" I am betting you still have
OpenJDK 7u25 on the system.  I really was serious when I recommended
getting the latest Oracle Java that you could on the system.  Memory
management has seen a lot of improvement since the early Java 7 days.

There's nothing wrong with OpenJDK, as long as it's not OpenJDK 6, but
overall we do see the best results with the Oracle JVM.  If you want to
stick with OpenJDK, it would be a very good idea to get the latest v7 or
v8 version instead of the old version you've got.  7u25 is over two
years old.  Java development moves very quickly, which makes that
version a lot like ancient history.

Part of the reason that my general recommendation is Java 8 is that both
Java 6 and Java 7 have reached end of support at Oracle.  There will be
no more development on those versions.

Having said that, note that it's just a recommendation, whether you
follow that recommendation is up to you.  I found 7u25 to be a very
solid release ... but release 60 and later have better memory
management.  Don't use 7u40, 7u45, 7u51, or 7u55 -- those versions have
known bugs that DO affect Lucene/Solr.

Thanks,
Shawn



--
*jeremy ashcraft*
development manager
EdGate Correlation Services 
/253.853.7133 x228/


Re: SOLR nrt read writes

2015-07-20 Thread Shawn Heisey
On 7/20/2015 9:29 AM, Bhawna Asnani wrote:
> Thanks for your suggestions. The requirement is still the same , to be
> able to make a change to some solr documents and be able to see it on
> subsequent search/facet calls.
> I am using softCommit with waitSearcher=true.
>
> Also I am sending reads/writes to a single solr node only.
> I have tried disabling caches and warmup time in logs is '0' but every
> once in a while I do get the document just updated with stale data.
>
> I went through lucene documentation and it seems opening the
> IndexReader with the IndexWriter should make the changes visible to
> the reader.
>
> I checked solr logs no errors. I see this in logs each time
> 'Registered new searcher Searcher@x' even before searches that had
> the stale document. 
>
> I have attached my solrconfig.xml for reference.

Your attachment made it through the mailing list processing.  Most
don't, I'm surprised.  Some thoughts:

maxBooleanClauses has been set to 40.  This is a lot.  If you
actually need a setting that high, then you are sending some MASSIVE
queries, which probably means that your Solr install is exceptionally
busy running those queries.

If the server is fairly busy, then you should increase maxTime on
autoCommit.  I use a value of five minutes (300000 ms) ... and my server is
NOT very busy most of the time.  A commit with openSearcher set to false
is relatively fast, but it still has somewhat heavy CPU, memory, and
disk I/O resource requirements.

You have autoSoftCommit set to happen after five seconds.  If updates
happen frequently or run for very long, this is potentially a LOT of
committing and opening new searchers.  I guess it's better than trying
for one second, but anything more frequent than once a minute is likely
to get you into trouble unless the system load is extremely light ...
but as already discussed, your system load is probably not light.

For the kind of Near Real Time setup you have mentioned, where you want
to do one or more updates, commit, and then query for the changes, you
probably should completely remove autoSoftCommit from the config and
*only* open new searchers with explicit soft commits.  Let autoCommit
(with a maxTime of 1 to 5 minutes) handle durability concerns.
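
A rough SolrJ sketch of that pattern (hypothetical URL and field values; error
handling omitted): send the updates, then issue one explicit soft commit with
waitSearcher=true before querying.

  import org.apache.solr.client.solrj.impl.HttpSolrClient;
  import org.apache.solr.common.SolrInputDocument;

  public class ExplicitSoftCommit {
    public static void main(String[] args) throws Exception {
      HttpSolrClient client = new HttpSolrClient("http://localhost:8983/solr/mycollection");
      SolrInputDocument doc = new SolrInputDocument();
      doc.addField("id", "doc-1");
      doc.addField("title", "updated title");
      client.add(doc);
      // waitFlush=true, waitSearcher=true, softCommit=true: the call does not
      // return until the new searcher is registered, so a follow-up query on
      // this node should see the change.  Durability is left to autoCommit.
      client.commit(true, true, true);
      client.close();
    }
  }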

A lot of pieces in your config file are set to depend on java system
properties just like the example does, but since we do not know what
system properties have been set, we can't tell for sure what those parts
of the config are doing.

Thanks,
Shawn



Re: java.lang.IllegalStateException: Too many values for UnInvertedField faceting on field content

2015-07-20 Thread Toke Eskildsen
Ali Nazemian  wrote:
> I have a collection of 1.6m documents in Solr 5.2.1.
> [...]
> Caused by: java.lang.IllegalStateException: Too many values for
> UnInvertedField faceting on field content
> [...]
> <field name="content" type="text_fa" ... default="noval" termVectors="true"
> termPositions="true" termOffsets="true"/>

You are hitting an internal limit in Solr. As davidphilip tells you, the 
solution is docValues, but they cannot be enabled for text fields. You need 
String fields, but the name of your field suggests that you need analyzation & 
tokenization, which cannot be done on String fields.

> Would you please help me to solve this problem?

With the information we have, it does not seem to be easy to solve: It seems 
like you want to facet on all terms in your index. As they need to be String 
(to use docValues), you would have to do all the splitting on white space, 
normalization etc. outside of Solr.
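
A rough sketch of what that client-side preparation could look like, assuming
a hypothetical extra field (say content_terms) declared in the schema as a
multiValued string field with docValues="true"; the whitespace split and
lowercasing below only stand in for the real Farsi normalization:

  import java.util.Locale;
  import org.apache.solr.client.solrj.impl.HttpSolrClient;
  import org.apache.solr.common.SolrInputDocument;

  public class ClientSideTerms {
    public static void main(String[] args) throws Exception {
      HttpSolrClient client = new HttpSolrClient("http://localhost:8983/solr/mycollection");
      String content = "some raw document text";
      SolrInputDocument doc = new SolrInputDocument();
      doc.addField("id", "doc-1");
      doc.addField("content", content);   // the existing analyzed text_fa field
      // Split/normalize outside Solr and index the terms into a string field
      // that can carry docValues, so it can be faceted without UnInvertedField.
      for (String term : content.split("\\s+")) {
        doc.addField("content_terms", term.toLowerCase(Locale.ROOT));
      }
      client.add(doc);
      client.commit();
      client.close();
    }
  }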

- Toke Eskildsen


Re: SOLR nrt read writes

2015-07-20 Thread Bhawna Asnani
Hi,

Thanks for your suggestions. The requirement is still the same: to be able
to make a change to some solr documents and be able to see it on subsequent
search/facet calls.
I am using softCommit with waitSearcher=true.

Also I am sending reads/writes to a single solr node only.
I have tried disabling caches, and warm-up time in the logs is '0', but every
once in a while I still get back the just-updated document with stale data.

I went through lucene documentation and it seems opening the IndexReader
with the IndexWriter should make the changes visible to the reader.

I checked solr logs no errors. I see this in logs each time 'Registered new
searcher Searcher@x' even before searches that had the stale document.

I have attached my solrconfig.xml for reference.
Thanks.

On Wed, Jul 15, 2015 at 11:18 AM, Erick Erickson 
wrote:

> bq: The admin can also do some updates on the items and they need to see
> the
> updates almost real time.
>
> Why not give the admin control over commits and default the other commits
> to
> something reasonable? So make your defaults, say, 15 seconds (or 30 seconds
> or longer). If the admin really needs the search to be absolutely up to
> date, they can hit the "commit" button. With perhaps a little tool tip that
> "the index is up to date as of  seconds ago,
> press this button
> to see absolutely all changes in real time".
>
> That will quickly train the admins to use that button as necessary
> when they really
> _do_ need absolutely up-to-date data. My prediction: they'll issues these
> quite
> rarely. 9 times out of 10, this kind of requirement is based on faulty
> assumptions
> and/or not understanding the work flow. That said, it may be totally a
> requirement.
> But at least ask the question.
>
> Best,
> Erick
>
> On Wed, Jul 15, 2015 at 7:57 AM, Bhawna Asnani 
> wrote:
> > We are building an admin for our inventory. Using solr's faceting,
> > searching and stats functionality it provides different ways an admin can
> > look at the inventory.
> > The admin can also do some updates on the items and they need to see the
> > updates almost real time.
> >
> > Our public facing website is already built using solr so we already have
> > the api in place to work with solr.
> > We were hoping we can put a solr instance just for admin (low traffic and
> > low latency) and build the functionality.
> >
> > Thanks for your suggesstions.
> >
> > On Wed, Jul 15, 2015 at 9:37 AM, Daniel Collins 
> > wrote:
> >
> >> Just to re-iterate Charles' response with an example, we have a system
> >> which needs to be as Near RT as we can make it.  So we have application
> >> level commitWithin set to 250ms.  Yes, we have to turn off a lot of
> caching,
> >> auto-warming, etc, but it was necessary to make the index as real time
> as
> >> we needed it to be.  Now we have the benefit of being able to throw a
> lot
> >> of hardware, RAM and SSDs at this in order to get any kind of sane
> search
> >> latency.
> >>
> >> We have the luxury of being able to afford that, but it comes with other
> >> problems because we have an index that is changing so fast (replicating
> to
> >> other nodes in the cloud becomes tricky, peer sync fails most of the
> time,
> >> etc.)
> >>
> >> What is your use case that requires this level of real-time access?
> >>
> >> On 15 July 2015 at 13:59, Reitzel, Charles <
> charles.reit...@tiaa-cref.org>
> >> wrote:
> >>
> >> > And, to answer your other question, yes, you can turn off
> auto-warming.
> >> > If your instance is dedicated to this client task, it may serve no
> >> purpose
> >> > or be actually counter-productive.
> >> >
> >> > In the past, I worked on a Solr-based application that committed
> >> > frequently under application control (vs. auto commit) and we turned
> off
> >> > all auto-warming and most of the caching.
> >> >
> >> > There is scant documentation in the new Solr reference (
> cwiki.apache.org
> >> ),
> >> > but the old docs cover this well and appear current enough:
> >> > https://wiki.apache.org/solr/SolrCaching
> >> >
> >> > Just a thought: would true be
> helpful
> >> > here?
> >> >
> >> > Also, since you have just inserted the documents, it sounds like you
> >> > probably could search by ID ...
> >> >
> >> > -Original Message-
> >> > From: Shawn Heisey [mailto:apa...@elyograg.org]
> >> > Sent: Tuesday, July 14, 2015 6:04 PM
> >> > To: solr-user@lucene.apache.org
> >> > Subject: Re: SOLR nrt read writes
> >> >
> >> > On 7/14/2015 12:19 PM, Bhawna Asnani wrote:
> >> > > I have a use case where we have to write data into solr and
> >> > > immediately read it back.
> >> > > The read is not get by Id but a search call.
> >> > >
> >> > > I am doing a softCommit after every such write which needs to be
> >> > > visible immediately.
> >> > > However sometimes the changes are not visible immediately.
> >> > >
> >> > > We have a solr cloud but I have also tried sending reads, writes and
> >> > > commits to cloud leader only and still there is some latency.
>

Re: java.lang.IllegalStateException: Too many values for UnInvertedField faceting on field content

2015-07-20 Thread davidphilip cherian
I think you should just set docValues=true and reindex. But be warned that
faceting is generally not performed on fields that are of type text and
tokenized. They should be string if they are not numeric. What is the analysis
chain of 'text_fa'?
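
A minimal sketch of the kind of field described above, i.e. an untokenized
string field with docValues enabled (the field name is illustrative, not taken
from this thread):

  <field name="category" type="string" indexed="true" stored="true" docValues="true"/>

Faceting on such a field reads docValues directly instead of un-inverting a
large tokenized field at query time.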


On Mon, Jul 20, 2015 at 8:16 PM, Ali Nazemian  wrote:

> Dears,
> Hi,
> I have a collection of 1.6m documents in Solr 5.2.1. When I use facet on
> field of content this error will appear after around 30s of trying to
> return the results:
>
> null:org.apache.solr.common.SolrException: Exception during facet.field:
> content
> at
> org.apache.solr.request.SimpleFacets$3.call(SimpleFacets.java:632)
> at
> org.apache.solr.request.SimpleFacets$3.call(SimpleFacets.java:617)
> at java.util.concurrent.FutureTask.run(FutureTask.java:262)
> at
> org.apache.solr.request.SimpleFacets$2.execute(SimpleFacets.java:571)
> at
> org.apache.solr.request.SimpleFacets.getFacetFieldCounts(SimpleFacets.java:642)
> at
> org.apache.solr.request.SimpleFacets.getFacetCounts(SimpleFacets.java:285)
> at
> org.apache.solr.handler.component.FacetComponent.process(FacetComponent.java:102)
> at
> org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:255)
> at
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:143)
> at org.apache.solr.core.SolrCore.execute(SolrCore.java:2064)
> at
> org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:654)
> at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:450)
> at
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:227)
> at
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:196)
> at
> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652)
> at
> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:585)
> at
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
> at
> org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:577)
> at
> org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:223)
> at
> org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1127)
> at
> org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:515)
> at
> org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
> at
> org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1061)
> at
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
> at
> org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:215)
> at
> org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:110)
> at
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97)
> at org.eclipse.jetty.server.Server.handle(Server.java:497)
> at
> org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:310)
> at
> org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:257)
> at
> org.eclipse.jetty.io.AbstractConnection$2.run(AbstractConnection.java:540)
> at
> org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:635)
> at
> org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:555)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.IllegalStateException: Too many values for
> UnInvertedField faceting on field content
> at
> org.apache.lucene.uninverting.DocTermOrds.uninvert(DocTermOrds.java:509)
> at
> org.apache.lucene.uninverting.DocTermOrds.(DocTermOrds.java:215)
> at
> org.apache.lucene.uninverting.DocTermOrds.(DocTermOrds.java:206)
> at
> org.apache.lucene.uninverting.DocTermOrds.(DocTermOrds.java:199)
> at
> org.apache.lucene.uninverting.FieldCacheImpl$DocTermOrdsCache.createValue(FieldCacheImpl.java:946)
> at
> org.apache.lucene.uninverting.FieldCacheImpl$Cache.get(FieldCacheImpl.java:190)
> at
> org.apache.lucene.uninverting.FieldCacheImpl.getDocTermOrds(FieldCacheImpl.java:933)
> at
> org.apache.lucene.uninverting.UninvertingReader.getSortedSetDocValues(UninvertingReader.java:275)
> at
> org.apache.lucene.index.FilterLeafReader.getSortedSetDocValues(FilterLeafReader.java:454)
> at
> org.apache.lucene.index.MultiDocValues.getSortedSetValues(MultiDocValues.java:356)
> at
> org.apache.lucene.index.SlowCompositeReaderWrapper.getSortedSetDocValues(SlowCompositeReaderWrapper.java:165)
> at
> org.apache.solr.request.DocValuesFacets.getCounts(DocValuesFacets.java:72)
> at
> org.apache.solr.request.SimpleFacets.getTermCounts(SimpleFacets.java:490)

java.lang.IllegalStateException: Too many values for UnInvertedField faceting on field content

2015-07-20 Thread Ali Nazemian
Dears,
Hi,
I have a collection of 1.6m documents in Solr 5.2.1. When I facet on the
content field, this error appears after around 30 seconds of trying to
return the results:

null:org.apache.solr.common.SolrException: Exception during facet.field: content
at org.apache.solr.request.SimpleFacets$3.call(SimpleFacets.java:632)
at org.apache.solr.request.SimpleFacets$3.call(SimpleFacets.java:617)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at org.apache.solr.request.SimpleFacets$2.execute(SimpleFacets.java:571)
at 
org.apache.solr.request.SimpleFacets.getFacetFieldCounts(SimpleFacets.java:642)
at 
org.apache.solr.request.SimpleFacets.getFacetCounts(SimpleFacets.java:285)
at 
org.apache.solr.handler.component.FacetComponent.process(FacetComponent.java:102)
at 
org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:255)
at 
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:143)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:2064)
at org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:654)
at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:450)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:227)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:196)
at 
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652)
at 
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:585)
at 
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
at 
org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:577)
at 
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:223)
at 
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1127)
at 
org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:515)
at 
org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
at 
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1061)
at 
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
at 
org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:215)
at 
org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:110)
at 
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97)
at org.eclipse.jetty.server.Server.handle(Server.java:497)
at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:310)
at 
org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:257)
at 
org.eclipse.jetty.io.AbstractConnection$2.run(AbstractConnection.java:540)
at 
org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:635)
at 
org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:555)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.IllegalStateException: Too many values for
UnInvertedField faceting on field content
at 
org.apache.lucene.uninverting.DocTermOrds.uninvert(DocTermOrds.java:509)
at 
org.apache.lucene.uninverting.DocTermOrds.<init>(DocTermOrds.java:215)
at 
org.apache.lucene.uninverting.DocTermOrds.<init>(DocTermOrds.java:206)
at 
org.apache.lucene.uninverting.DocTermOrds.<init>(DocTermOrds.java:199)
at 
org.apache.lucene.uninverting.FieldCacheImpl$DocTermOrdsCache.createValue(FieldCacheImpl.java:946)
at 
org.apache.lucene.uninverting.FieldCacheImpl$Cache.get(FieldCacheImpl.java:190)
at 
org.apache.lucene.uninverting.FieldCacheImpl.getDocTermOrds(FieldCacheImpl.java:933)
at 
org.apache.lucene.uninverting.UninvertingReader.getSortedSetDocValues(UninvertingReader.java:275)
at 
org.apache.lucene.index.FilterLeafReader.getSortedSetDocValues(FilterLeafReader.java:454)
at 
org.apache.lucene.index.MultiDocValues.getSortedSetValues(MultiDocValues.java:356)
at 
org.apache.lucene.index.SlowCompositeReaderWrapper.getSortedSetDocValues(SlowCompositeReaderWrapper.java:165)
at 
org.apache.solr.request.DocValuesFacets.getCounts(DocValuesFacets.java:72)
at 
org.apache.solr.request.SimpleFacets.getTermCounts(SimpleFacets.java:490)
at 
org.apache.solr.request.SimpleFacets.getTermCounts(SimpleFacets.java:386)
at org.apache.solr.request.SimpleFacets$3.call(SimpleFacets.java:626)
... 33 more


Here is the schema.xml related to content field:




Would you please help me to solve this problem?

Best regards.


-- 
A.Nazemian


RE: Rule for Score

2015-07-20 Thread EXTERNAL Taminidi Ravi (ETI, AA-AS/PAS-PTS)
Thanks Shawn, I will check that.


Ravi

-Original Message-
From: Shawn Heisey [mailto:apa...@elyograg.org] 
Sent: Monday, July 20, 2015 10:23 AM
To: solr-user@lucene.apache.org
Subject: Re: Rule for Score

On 7/20/2015 8:16 AM, EXTERNAL Taminidi Ravi (ETI, AA-AS/PAS-PTS) wrote:
> Hi All, Can you someone explain or refer me right place to know more on how 
> the Score is calculated, I am seeing it has few attribute like, 
> termfrequency, document frequeuncy, weight, boost also it is say sum of , 
> product of..
> 
> Is there any example of understanding the basics of these attributes and the 
> logic for calculating the score.

If you add "debugQuery=true" to your URL request, then you will get a full 
accounting of how the score was calculated for each document in the results.  I 
think "debug=query" also works.

Looking at the source code for DefaultSimilarity is probably also helpful.  
Here's the javadoc for that class:

http://lucene.apache.org/core/5_2_0/core/org/apache/lucene/search/similarities/DefaultSimilarity.html

Much more detailed info is available by clicking on the parent class link 
(TFIDFSimilarity) on this page.

Thanks,
Shawn



Re: Basic auth

2015-07-20 Thread Steven White
Thanks for updating the wiki page.  However, my issue remains: I cannot get
basic auth working.  Has anyone got it working on Windows?

Steve

On Mon, Jul 20, 2015 at 9:09 AM, Shawn Heisey  wrote:

> On 7/20/2015 6:06 AM, Steven White wrote:
> > Just to be clear, the example at
> > https://wiki.apache.org/solr/SolrSecurity#Jetty_realm_example states to
> > modify the file in /example/etc/webdefault.xml and in
> > /example/etc/jetty.xml, but with Solr 5.2.1, those two files are in
> > C:\solr-5.2.1\server\etc\webdefault.xml and
> > C:\solr-5.2.1\server\etc\jetty.xml
>
> I have updated the wiki page so it has information relevant for 5.x,
> with notes for older versions.  Thanks for letting me know it was outdated!
>
> Thanks,
> Shawn
>
>


Re: Rule for Score

2015-07-20 Thread Shawn Heisey
On 7/20/2015 8:16 AM, EXTERNAL Taminidi Ravi (ETI, AA-AS/PAS-PTS) wrote:
> Hi All, Can you someone explain or refer me right place to know more on how 
> the Score is calculated, I am seeing it has few attribute like, 
> termfrequency, document frequeuncy, weight, boost also it is say sum of , 
> product of..
> 
> Is there any example of understanding the basics of these attributes and the 
> logic for calculating the score.

If you add "debugQuery=true" to your URL request, then you will get a
full accounting of how the score was calculated for each document in the
results.  I think "debug=query" also works.

Looking at the source code for DefaultSimilarity is probably also
helpful.  Here's the javadoc for that class:

http://lucene.apache.org/core/5_2_0/core/org/apache/lucene/search/similarities/DefaultSimilarity.html

Much more detailed info is available by clicking on the parent class
link (TFIDFSimilarity) on this page.

Thanks,
Shawn



Rule for Score

2015-07-20 Thread EXTERNAL Taminidi Ravi (ETI, AA-AS/PAS-PTS)
Hi All, Can someone explain or refer me to the right place to learn more about how the
score is calculated? I see it has a few attributes like term frequency,
document frequency, weight and boost; it also says sum of, product of..

Is there any example explaining the basics of these attributes and the
logic for calculating the score?

Thanks

Ravi


Re: Use REST API URL to update field

2015-07-20 Thread Shawn Heisey
On 7/20/2015 2:06 AM, Zheng Lin Edwin Yeo wrote:
> I'm using Solr 5.2.1, and I would like to check, is there a way to update
> certain field by using REST API URL directly instead of using curl?
> 
> For example, I would like to increase the "popularity" field in my index
> each time a user click on the record.
> 
> Currently, it can work with the curl command by having this in my text file
> to be read by curl (the "id" is hard-coded here for example purpose)
> 
> {"id":"testing_0001",
> 
> "popularity":{"inc":1}
> 
> 
> Is there a REST API URL that I can call to achieve the same purpose?

The URL that you would use with curl *IS* the URL that you would use for
a REST-like call.
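
A minimal sketch of such a call, assuming a collection named "mycollection"
(any HTTP client can send the same request; curl is just the usual example):

curl 'http://localhost:8983/solr/mycollection/update?commit=true' \
  -H 'Content-Type: application/json' \
  -d '[{"id":"testing_0001","popularity":{"inc":1}}]'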

Thanks,
Shawn



Re: shareSchema property unknown in new solr.xml format

2015-07-20 Thread Shawn Heisey
On 7/20/2015 6:40 AM, Yago Riveiro wrote:
> I’m trying to move from legacy solr.xml to new solr.xml format (version 
> 4.10.4)
> 
> I’m getting this error on startup:
> 
>  section of solr.xml contains 1 unknown config parameter(s): 
> [shareSchema]

There's somewhat confusing information in Jira on this.  One issue seems
to deprecate shareSchema in 4.4.  Deprecation in one major release
implies that the feature will be completely removed in the next major
release, but should still work until then:

https://issues.apache.org/jira/browse/SOLR-4779

Another issue talks about fixing str/bool/int detection on parameters
like shareSchema in 4.10 and 5.x (trunk at the time):

https://issues.apache.org/jira/browse/SOLR-5746

You said you're still running 4.x, specifically 4.10.4, which means that
I would expect shareSchema to still work.  Although this is technically
a bug, I don't think it's a big enough problem to warrant fixing in the
4.10 branch.

The first thing I would advise trying is to change the parameter from
str to bool.  Hopefully that will fix it.  There may have been a misstep
in the patch for SOLR-5746.
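
A sketch of that change, leaving the property where it currently sits in
solr.xml (whether this is enough on 4.10.4 is exactly what needs testing):

<bool name="shareSchema">${shareSchema:true}</bool>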

Do you actually NEED this functionality?  It is no longer there in 5.x,
so it may be a good idea to prepare for its removal now.  The amount of
memory required for an IndexSchema object for each core should not be
prohibitively large, unless you've got thousands of cores ... but in
that case, you're already dealing with fairly significant scaling
challenges, so this one may not matter much.

If you do need the functionality, then a jira is in order to work on it.
 I can't guarantee that there will be a 4.10.5 release, even if a fix is
committed to the 4.10 branch.

I have not examined the code to see what's actually happening.  Your
response will determine whether I take a look.

Thanks,
Shawn



Re: Basic auth

2015-07-20 Thread Shawn Heisey
On 7/20/2015 6:06 AM, Steven White wrote:
> Just to be clear, the example at
> https://wiki.apache.org/solr/SolrSecurity#Jetty_realm_example states to
> modify the file in /example/etc/webdefault.xml and in
> /example/etc/jetty.xml, but with Solr 5.2.1, those two files are in
> C:\solr-5.2.1\server\etc\webdefault.xml and
> C:\solr-5.2.1\server\etc\jetty.xml

I have updated the wiki page so it has information relevant for 5.x,
with notes for older versions.  Thanks for letting me know it was outdated!

Thanks,
Shawn



shareSchema property unknown in new solr.xml format

2015-07-20 Thread Yago Riveiro
Hi,


I’m trying to move from legacy solr.xml to new solr.xml format (version 4.10.4)


I’m getting this error on startup:


<solrcloud> section of solr.xml contains 1 unknown config parameter(s): 
[shareSchema]


The latest documentation has an entry for this property…


I have the property configured as:


<str name="shareSchema">${shareSchema:true}</str>





The stack trace:



ERROR - localhost - 2015-07-20 12:32:05.132; 
org.apache.solr.common.SolrException; 
null:org.apache.solr.common.SolrException: <solrcloud> section of solr.xml 
contains 1 unknown config parameter(s): [shareSchema]
        at 
org.apache.solr.core.ConfigSolrXml.errorOnLeftOvers(ConfigSolrXml.java:242)
        at 
org.apache.solr.core.ConfigSolrXml.fillSolrCloudSection(ConfigSolrXml.java:167)
        at 
org.apache.solr.core.ConfigSolrXml.fillPropMap(ConfigSolrXml.java:120)
        at org.apache.solr.core.ConfigSolrXml.<init>(ConfigSolrXml.java:53)
        at org.apache.solr.core.ConfigSolr.fromConfig(ConfigSolr.java:108)
        at org.apache.solr.core.ConfigSolr.fromInputStream(ConfigSolr.java:93)
        at org.apache.solr.core.ConfigSolr.fromFile(ConfigSolr.java:70)
        at org.apache.solr.core.ConfigSolr.fromSolrHome(ConfigSolr.java:103)
        at 
org.apache.solr.servlet.SolrDispatchFilter.loadConfigSolr(SolrDispatchFilter.java:156)
        at 
org.apache.solr.servlet.SolrDispatchFilter.createCoreContainer(SolrDispatchFilter.java:187)
        at 
org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:136)
        at org.eclipse.jetty.servlet.FilterHolder.doStart(FilterHolder.java:119)
        at 
org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:64)
        at 
org.eclipse.jetty.servlet.ServletHandler.initialize(ServletHandler.java:719)
        at 
org.eclipse.jetty.servlet.ServletContextHandler.startContext(ServletContextHandler.java:265)
        at 
org.eclipse.jetty.webapp.WebAppContext.startContext(WebAppContext.java:1252)
        at 
org.eclipse.jetty.server.handler.ContextHandler.doStart(ContextHandler.java:710)
        at 
org.eclipse.jetty.webapp.WebAppContext.doStart(WebAppContext.java:494)
        at 
org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:64)
        at 
org.eclipse.jetty.deploy.bindings.StandardStarter.processBinding(StandardStarter.java:39)
        at 
org.eclipse.jetty.deploy.AppLifeCycle.runBindings(AppLifeCycle.java:186)
        at 
org.eclipse.jetty.deploy.DeploymentManager.requestAppGoal(DeploymentManager.java:494)
        at 
org.eclipse.jetty.deploy.DeploymentManager.addApp(DeploymentManager.java:141)
        at 
org.eclipse.jetty.deploy.providers.ScanningAppProvider.fileAdded(ScanningAppProvider.java:145)
        at 
org.eclipse.jetty.deploy.providers.ScanningAppProvider$1.fileAdded(ScanningAppProvider.java:56)
        at org.eclipse.jetty.util.Scanner.reportAddition(Scanner.java:609)
        at org.eclipse.jetty.util.Scanner.reportDifferences(Scanner.java:540)
        at org.eclipse.jetty.util.Scanner.scan(Scanner.java:403)
        at org.eclipse.jetty.util.Scanner.doStart(Scanner.java:337)
        at 
org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:64)
        at 
org.eclipse.jetty.deploy.providers.ScanningAppProvider.doStart(ScanningAppProvider.java:121)
        at 
org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:64)
        at 
org.eclipse.jetty.deploy.DeploymentManager.startAppProvider(DeploymentManager.java:555)
        at 
org.eclipse.jetty.deploy.DeploymentManager.doStart(DeploymentManager.java:230)
        at 
org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:64)
        at 
org.eclipse.jetty.util.component.AggregateLifeCycle.doStart(AggregateLifeCycle.java:81)
        at 
org.eclipse.jetty.server.handler.AbstractHandler.doStart(AbstractHandler.java:58)
        at 
org.eclipse.jetty.server.handler.HandlerWrapper.doStart(HandlerWrapper.java:96)
        at org.eclipse.jetty.server.Server.doStart(Server.java:280)
        at 
org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:64)
        at 
org.eclipse.jetty.xml.XmlConfiguration$1.run(XmlConfiguration.java:1259)
        at java.security.AccessController.doPrivileged(Native Method)
        at 
org.eclipse.jetty.xml.XmlConfiguration.main(XmlConfiguration.java:1182)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.eclipse.jetty.start.Main.invokeMain(Main.java:473)
        at org.eclipse.jetty.start.Main.start(Main.java:615)
        at org.eclipse.jetty.start.Main.main(Main.java:96)

—/Yago Riveiro

Re: Basic auth

2015-07-20 Thread Steven White
Hi Everyone,

I don't mean to hijack this thread, but I have auth issues that may be
related to this topic.

I'm on 5.2.1 and trying to setup basic auth using jetty realm per
https://wiki.apache.org/solr/SolrSecurity#Jetty_realm_example  And I found
other examples on the web which are very similar to the above link.  In
addition, I found Jetty's own basic auth setting at:
http://wiki.eclipse.org/Jetty/Tutorial/Realms

Problem is, no matter what I do I cannot get it to work, I get HTTP Error
404 when I try to access Solr's URL and when I look in
C:\Solr\solr-5.2.1\server\logs\solr.log this is all that I see:

INFO  - 2015-07-20 02:16:12.065; [   ] org.eclipse.jetty.util.log.Log;
Logging initialized @286ms
INFO  - 2015-07-20 02:16:12.231; [   ] org.eclipse.jetty.server.Server;
jetty-9.2.10.v20150310
WARN  - 2015-07-20 02:16:12.240; [   ]
org.eclipse.jetty.server.handler.RequestLogHandler; !RequestLog
INFO  - 2015-07-20 02:16:12.255; [   ]
org.eclipse.jetty.server.AbstractConnector; Started ServerConnector@5a5fae16
{HTTP/1.1}{0.0.0.0:8983}
INFO  - 2015-07-20 02:16:12.256; [   ] org.eclipse.jetty.server.Server;
Started @478ms

Just to be clear, the example at
https://wiki.apache.org/solr/SolrSecurity#Jetty_realm_example states to
modify the file in /example/etc/webdefault.xml and in
/example/etc/jetty.xml, but with Solr 5.2.1, those two files are in
C:\solr-5.2.1\server\etc\webdefault.xml and
C:\solr-5.2.1\server\etc\jetty.xml

Lastly, I'm doing the above on Windows.

Thanks

Steve

On Sun, Jul 19, 2015 at 2:20 PM, Erick Erickson 
wrote:

> You're mixing up a couple of things. The Drupal is specific to, well,
> Drupal. You'd probably be best off asking about that on the Drupal
> lists.
>
> SOLR-4470 has not been committed yet, so you can't really use it. This
> may have been superceded by SOLR-7274 and there's a link to the Wiki
> that points to:
> https://cwiki.apache.org/confluence/display/solr/Security
>
> This is all quite new, not sure how much is written in the way of docs.
>
> Best,
> Erick
>
> On Sun, Jul 19, 2015 at 9:35 AM,   wrote:
> > I followed this guide:
> >
> http://learnsubjects.drupalgardens.com/content/how-place-http-authentication-solr
> >
> > But there is some something wrong, can anyone help or refer to a guide
> on how to setup http basic auth?
> >
> > Regards
> >
> >> On 19 Jul 2015, at 01:10, solr.user.1...@gmail.com wrote:
> >>
> >> SOLR-4470 is about:
> >> Support for basic auth in internal Solr  requests.
> >>
> >> What is wrong with the internal requests?
> >> Can someone help simplify, would it ever be possible to run with basic
> auth? What work arounds?
> >>
> >> Regards
>


Solr Cloud: Duplicate documents in multiple shards

2015-07-20 Thread mesenthil1
Hi All,

We are using Solr 4.2.1 cloud with a 5-shard set up (1 leader & 1 replica
for each shard). We are seeing the following issue in our set up.
A few of the documents are getting returned from more than one shard for
queries. When we try to update such a document, it is not updated on both
shards but only on a single shard. We are also unable to delete the document.
Can you please clarify the following?

1. What happens if a shard (both leader and replica) goes down? If a
document on the "dead shard" is updated, will it be forwarded to a
new shard? If so, when the "dead shard" comes up again, will it not still be
considered responsible for the same hash key range?
2. Is there a way to fix this [removing duplicates across shards]?

We have 130 million documents in our set up and the routing key is set as
"compositeId".

Senthil





--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-Cloud-Duplicate-documents-in-multiple-shards-tp4218162.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Issue with using createNodeSet in Solr Cloud

2015-07-20 Thread Savvas Andreas Moysidis
Erick, spot on!

The nodes had been registered in zookeeper under my network interface's IP
address...after specifying those the command worked just fine.

It was indeed the thing I thought was true that wasn't... :)

Many thanks,
Savvas
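
For anyone hitting the same thing: the node names to pass in createNodeSet are
the ones ZooKeeper lists under live_nodes, not necessarily 127.0.0.1. A sketch
of the same CREATE call with illustrative addresses:

http://localhost:8983/solr/admin/collections?action=CREATE&name=collection_A&numShards=1&replicationFactor=2&maxShardsPerNode=1&createNodeSet=192.168.1.201:8983_solr,192.168.1.201:8984_solr&collection.configName=collection_A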

On 18 July 2015 at 20:47, Erick Erickson  wrote:

> P.S.
>
> "It ain't the things ya don't know that'll kill ya, it's the things ya
> _do_ know that ain't so"...
>
> On Sat, Jul 18, 2015 at 12:46 PM, Erick Erickson
>  wrote:
> > Could you post your clusterstate.json? Or at least the "live nodes"
> > section of your ZK config? (adminUI>>cloud>>tree>>live_nodes. The
> > addresses of my nodes are things like 192.168.1.201:8983_solr. I'm
> > wondering if you're taking your node names from the information ZK
> > records or assuming it's 127.0.0.1
> >
> > On Sat, Jul 18, 2015 at 8:56 AM, Savvas Andreas Moysidis
> >  wrote:
> >> Thanks Eric,
> >>
> >> The strange thing is that although I have set the log level to "ALL" I
> see
> >> no error messages in the logs (apart from the line saying that the
> response
> >> is a 400 one).
> >>
> >> I'm quite confident the configset does exist as the collection gets
> created
> >> fine if I don't specify the createNodeSet param.
> >>
> >> Complete mystery..! I'll keep on troubleshooting and report back with my
> >> findings.
> >>
> >> Cheers,
> >> Savvas
> >>
> >> On 17 July 2015 at 02:14, Erick Erickson 
> wrote:
> >>
> >>> There were a couple of cases where the "no live servers" was being
> >>> returned when the error was something completely different. Does the
> >>> Solr log show something more useful? And are you sure you have a
> >>> configset named collection_A?
> >>>
> >>> 'cause this works (admittedly on 5.x) fine for me, and I'm quite sure
> >>> there are bunches of automated tests that would be failing so I
> >>> suspect it's just a misleading error being returned.
> >>>
> >>> Best,
> >>> Erick
> >>>
> >>> On Thu, Jul 16, 2015 at 2:22 AM, Savvas Andreas Moysidis
> >>>  wrote:
> >>> > Hello There,
> >>> >
> >>> > I am trying to use the createNodeSet parameter when creating a new
> >>> > collection but I'm getting an error when doing so.
> >>> >
> >>> > More specifically, I have four Solr instances running locally in
> separate
> >>> > JVMs (127.0.0.1:8983, 127.0.0.1:8984, 127.0.0.1:8985, 127.0.0.1:8986
> )
> >>> and a
> >>> > standalone Zookeeper instance which all Solr instances point to. The
> four
> >>> > Solr instances have no collections added to them and are all up and
> >>> running
> >>> > (I can access the admin page in all of them).
> >>> >
> >>> > Now, I want to create a collections in only two of these four
> instances (
> >>> > 127.0.0.1:8983, 127.0.0.1:8984) but when I hit one instance with the
> >>> > following URL:
> >>> >
> >>> >
> >>>
> http://localhost:8983/solr/admin/collections?action=CREATE&name=collection_A&numShards=1&replicationFactor=2&maxShardsPerNode=1&createNodeSet=127.0.0.1:8983_solr,127.0.0.1:8984_solr&collection.configName=collection_A
> >>> >
> >>> > I am getting the following response:
> >>> >
> >>> > 
> >>> > 
> >>> > 400
> >>> > 3503
> >>> > 
> >>> > 
> >>> >
> >>>
> org.apache.solr.common.SolrException:org.apache.solr.common.SolrException:
> >>> > Cannot create collection collection_A. No live Solr-instances among
> >>> > Solr-instances specified in createNodeSet:127.0.0.1:8983_solr,
> >>> 127.0.0.1:8984
> >>> > _solr
> >>> > 
> >>> > 
> >>> > 
> >>> > Cannot create collection collection_A. No live Solr-instances among
> >>> > Solr-instances specified in createNodeSet:127.0.0.1:8983_solr,
> >>> 127.0.0.1:8984
> >>> > _solr
> >>> > 
> >>> > 400
> >>> > 
> >>> > 
> >>> > 
> >>> > Cannot create collection collection_A. No live Solr-instances among
> >>> > Solr-instances specified in createNodeSet:127.0.0.1:8983_solr,
> >>> 127.0.0.1:8984
> >>> > _solr
> >>> > 
> >>> > 400
> >>> > 
> >>> > 
> >>> >
> >>> >
> >>> > The instances are definitely up and running (at least the admin
> console
> >>> can
> >>> > be accessed as mentioned) and if I remove the createNodeSet
> parameter the
> >>> > collection is created as expected.
> >>> >
> >>> > Am I missing something obvious or is this a bug?
> >>> >
> >>> > The exact Solr version I'm using is 4.9.1.
> >>> >
> >>> > Any pointers would be much appreciated.
> >>> >
> >>> > Thanks,
> >>> > Savvas
> >>>
>


Use REST API URL to update field

2015-07-20 Thread Zheng Lin Edwin Yeo
Hi,

I'm using Solr 5.2.1, and I would like to check, is there a way to update
certain field by using REST API URL directly instead of using curl?

For example, I would like to increase the "popularity" field in my index
each time a user click on the record.

Currently, it can work with the curl command by having this in my text file
to be read by curl (the "id" is hard-coded here for example purpose)

{"id":"testing_0001",

"popularity":{"inc":1}


Is there a REST API URL that I can call to achieve the same purpose?


Regards,
Edwin


Reload cause core to become not functional

2015-07-20 Thread Nir Barel
Hi,

We are using the RELOAD command with the parameters described in this patch (BTW, why
isn't it part of a release yet?)
https://issues.apache.org/jira/browse/SOLR-6063

http://localhost:port/solr/admin/cores?action=RELOAD&core=CORE_NAME&transient=true&loadOnStartup=false
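
For reference, the same two flags can also be set statically per core instead
of being passed on the RELOAD call; a minimal core.properties sketch:

name=CORE_NAME
transient=true
loadOnStartup=false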

We run this every midnight to make cores older than X days transient.
In some cases those cores are already transient and Solr never loaded them before;
in that case we get this exception:
2015-07-20 00:00:09,391 ERROR [qtp860955776-12068] (SolrException.java:109) - 
org.apache.solr.common.SolrException: Core with core name [CORE NAME] does not 
exist. ( * full exception below )

Then if we try to query those cores or view them in the Solr web UI, we get a
NullPointerException and a POSSIBLE RESOURCE LEAK warning:

2015-07-20 09:06:53,139  WARN [qtp860955776-21404] (ManagedResource.java:183) - 
No stored data found for /rest/managed
2015-07-20 09:06:53,139  WARN [qtp860955776-21404] (ManagedResource.java:109) - 
No registered observers for /rest/managed
2015-07-20 09:07:06,826 ERROR [Finalizer thread] (SolrIndexWriter.java:187) - 
SolrIndexWriter was not closed prior to finalize(), indicates a bug -- POSSIBLE 
RESOURCE LEAK!!!
2015-07-20 09:07:06,827 ERROR [Finalizer thread] (SolrIndexWriter.java:140) - 
Error closing IndexWriter, trying rollback
java.lang.NullPointerException
at 
org.apache.lucene.index.IndexWriter.closeInternal(IndexWriter.java:985)
at 
org.apache.lucene.index.IndexWriter.close(IndexWriter.java:935)
at 
org.apache.lucene.index.IndexWriter.close(IndexWriter.java:897)
at 
org.apache.solr.update.SolrIndexWriter.close(SolrIndexWriter.java:132)
at 
org.apache.solr.update.SolrIndexWriter.finalize(SolrIndexWriter.java:188)
at java.lang.J9VMInternals.runFinalize(J9VMInternals.java:489)
2015-07-20 09:07:08,555  WARN [qtp860955776-21147] (ManagedResource.java:183) - 
No stored data found for /rest/managed
2015-07-20 09:07:08,555  WARN [qtp860955776-21147] (ManagedResource.java:109) - 
No registered observers for /rest/managed


We also see this exception sometimes:

2015-07-10 02:36:22,134 ERROR [Finalizer thread] (SolrCore.java:1142) - 
REFCOUNT ERROR: unreferenced org.apache.solr.core.SolrCore@b4acaf07 
(other_2015-07-08) has a reference count of 1
2015-07-10 02:36:22,134 ERROR [Finalizer thread] (ConcurrentLRUCache.java:627) 
- ConcurrentLRUCache was not destroyed prior to finalize(), indicates a bug -- 
POSSIBLE RESOURCE LEAK!!!
2015-07-10 02:36:22,134 ERROR [Finalizer thread] (ConcurrentLRUCache.java:627) 
- ConcurrentLRUCache was not destroyed prior to finalize(), indicates a bug -- 
POSSIBLE RESOURCE LEAK!!!


I am trying to understand how to track down the cause. I think it is probably a
bug in the patch we applied, but I haven't found it yet.

---

*Full exception:
2015-07-20 00:00:09,391 ERROR [qtp860955776-12068] (SolrException.java:109) - 
org.apache.solr.common.SolrException: Core with core name [CORE NAME] does not 
exist.
at 
org.apache.solr.handler.admin.CoreAdminHandler.handleReloadAction(CoreAdminHandler.java:755)
at 
org.apache.solr.handler.admin.CoreAdminHandler.handleRequestInternal(CoreAdminHandler.java:224)
at 
org.apache.solr.handler.admin.CoreAdminHandler.handleRequestBody(CoreAdminHandler.java:184)
at 
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
at 
org.apache.solr.servlet.SolrDispatchFilter.handleAdminRequest(SolrDispatchFilter.java:726)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:258)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:207)
at 
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1474)
at 
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:499)
at 
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
at 
org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557)
at 
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
at 
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1086)
at 
org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:428)
at 
org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
at 
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1020)
at 
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
at 
org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
at 
org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollec