Trim trailing whitespaces

2016-04-12 Thread Srinivas Kashyap
Hi,

When I index the data, it comes back with trailing whitespace.

How should I remove it? In schema.xml, the fieldType for the fields below is
"string". Please suggest.

"response": {
"numFound": 40327,
"start": 0,
"docs": [
  {
"TECHSPEC.REQUEST_NO": "HQ22   ",
"TECH_SPEC_ID": "HQ22   ",
"DOCUMENT_TYPE": "TECHSPEC",
"TECHSPEC.OWNER": "SHOP ",
"timestamp": "2016-04-13T05:01:58.408Z"
  },
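
One common fix is to trim at index time: either server-side, with a
TrimFieldUpdateProcessorFactory in an update request processor chain, or in
the indexing client before the document is sent. A minimal client-side sketch
in SolrJ (field name taken from the response above; the raw value is
illustrative):

    import org.apache.solr.common.SolrInputDocument;

    public class TrimmedDocBuilder {
        // trim String values before they reach the index; rawValue may be null
        static void addTrimmed(SolrInputDocument doc, String field, String rawValue) {
            doc.addField(field, rawValue == null ? null : rawValue.trim());
        }

        public static void main(String[] args) {
            SolrInputDocument doc = new SolrInputDocument();
            addTrimmed(doc, "TECH_SPEC_ID", "HQ22   ");   // stored as "HQ22"
            System.out.println(doc.getFieldValue("TECH_SPEC_ID"));
        }
    }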


Thanks and Regards,
Srinivas Kashyap

Re: SOLR Upgrade 3.x to 4.10

2016-04-12 Thread Shawn Heisey
On 4/12/2016 6:10 AM, abhi Abhishek wrote:
> I have SOLR 3.6 running currently, i am planning to upgrade this to
> SOLR 4.10. Below were the thoughts we could come up with.
>
> 1. in place upgrade
>I would be making the SOLR 4.10 slave of 3.6 and copy the indexes,
> and optimize this index.
>
>   will optimizing the Lucene 3.3 index on SOLR 4 instance(with Lucene
> 4.10) change the index structure to Lucene 4.10? if not what would be the
> version?

Yes, the optimize will change the index structure, but the contents of
the index will not change, even if changes in Solr's analysis components
would have resulted in different info going into the index based on your
schema.  Because the *query* analysis may also change with the upgrade,
this might cause queries to no longer work the same, unless you reindex
and verify that your analysis still does what you require.  A few
changes to analysis components in later versions can be changed back to
earlier behavior with luceneMatchVersion, but this typically only
happens with big changes -- such as the major bugfix for
WordDelimiterFilterFactory in version 4.8.

Reindexing for all upgrades is recommended when possible.

>   if i enable docvalues on certain fields before issuing optimize, will
> it be able to incorporate ( create .dvd & .dvm files ) that in the newly
> created index?

No.  You must entirely reindex to add docValues.  Optimize just rewrites
what's already present in the Lucene index.

> 2. Re-Index the data
>
> Seeking advice for minimum time to upgrade this with most features of SOLR
> 4.10

This is impossible to answer.  It will depend on how long it takes to
index your data.  That is very difficult to predict even if a lot of
information is available.

Thanks,
Shawn



Re: Cache problem

2016-04-12 Thread Shawn Heisey
On 4/12/2016 3:35 AM, Bastien Latard - MDPI AG wrote:
> Thank you both, Bill and Reth!
>
> Here is my current options from my command to launch java:
> */usr/bin/java  -Xms20480m -Xmx40960m -XX:PermSize=10240m
> -XX:MaxPermSize=20480m [...]*
>
> So should I do *-Xms20480m -Xmx20480m*?
> Why? What would it change?

You do *NOT* need a 10GB permsize.  That's a definite waste of memory --
most of it will never get used.  It's probably best to let Java handle
the permgen.  This generation is entirely eliminated in Java 8.  In Java
7, the permsize usually doesn't need adjusting ... but if it does, Solr
probably wouldn't even start without an adjustment.

Regarding something said in another reply on this thread:  The
documentCache *does* live in the Java heap, not the OS memory.  The OS
caches the actual index files, and documentCache is maintained by Solr
itself, separately from that.

It is highly unlikely that you will ever need a 40GB heap.  You might
not even need a 20GB heap.  As I said earlier:  Based on what I saw in
your screenshots, I think you can run with an 8g heap (-Xms8g -Xmx8g),
but you might need to try 12g instead.

Thanks,
Shawn



Re: Arguments for and against putting solr.xml into Zookeeper?

2016-04-12 Thread Shawn Heisey
On 4/12/2016 2:20 PM, John Bickerstaff wrote:
> I'm wondering if anyone can comment on arguments for and against putting
> solr.xml into Zookeeper?
>
> I assume one argument for doing so is that I would then have all
> configuration in one place.
>
> I also assume that if it doesn't get included as part of the upconfig
> command, there is likely a reason?

If you want the *exact* same file to be used by all SolrCloud nodes,
especially if your cluster is fairly dynamic, then having solr.xml in
zookeeper makes this easier.  Because SolrCloud does not function
without zookeeper, having a critical server-level configuration file
stored there doesn't require an extra dependency.

If each node has a different solr.xml file, then you wouldn't want it to
be stored in zookeeper.

The solr.xml file configures a Solr server at a global level -- the
'upconfig' command is for uploading configurations for collections,
which (with the notable exception of maxBooleanClauses) does not include
any global config.
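
For completeness: the file can be pushed up with zkcli's putfile command, or
programmatically with SolrJ's SolrZkClient. A sketch of the latter -- the
ZooKeeper address and local path are placeholders for your environment:

    import java.nio.file.Files;
    import java.nio.file.Paths;
    import org.apache.solr.common.cloud.SolrZkClient;

    public class UploadSolrXml {
        public static void main(String[] args) throws Exception {
            SolrZkClient zk = new SolrZkClient("zk1:2181,zk2:2181,zk3:2181", 10000);
            try {
                byte[] data = Files.readAllBytes(Paths.get("/path/to/solr.xml"));
                // creates the /solr.xml znode; nodes pick it up on restart
                zk.makePath("/solr.xml", data, true);
            } finally {
                zk.close();
            }
        }
    }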

Thanks,
Shawn



Re: Arguments for and against putting solr.xml into Zookeeper?

2016-04-12 Thread Alexandre Rafalovitch
The relevant JIRA is SOLR-7735 and its references. Maybe that would be
useful as the background.

Regards,
   Alex.

Newsletter and resources for Solr beginners and intermediates:
http://www.solr-start.com/


On 13 April 2016 at 06:20, John Bickerstaff  wrote:
> Hello all,
>
> I'm wondering if anyone can comment on arguments for and against putting
> solr.xml into Zookeeper?
>
> I assume one argument for doing so is that I would then have all
> configuration in one place.
>
> I also assume that if it doesn't get included as part of the upconfig
> command, there is likely a reason?
>
> Thanks...


Re: SOLR Upgrade 3.x to 4.10

2016-04-12 Thread Erick Erickson
I would always re-index if possible; it's more certain than upgrading
the indexes in place. It's only "not possible" when re-indexing takes
too long to be practical.

And why go for 4.10 rather than 5.5? (Note: 5.5.1 will be out Real Soon
Now.) If you can re-index, I'd really think about upgrading to 5.5.1
and going from there.

Best,
Erick

On Tue, Apr 12, 2016 at 5:10 AM, abhi Abhishek  wrote:
> Hi All,
> I have SOLR 3.6 running currently, i am planning to upgrade this to
> SOLR 4.10. Below were the thoughts we could come up with.
>
> 1. in place upgrade
>I would be making the SOLR 4.10 slave of 3.6 and copy the indexes,
> and optimize this index.
>
>   will optimizing the Lucene 3.3 index on SOLR 4 instance(with Lucene
> 4.10) change the index structure to Lucene 4.10? if not what would be the
> version?
>   if i enable docvalues on certain fields before issuing optimize, will
> it be able to incorporate ( create .dvd & .dvm files ) that in the newly
> created index?
>
>
> 2. Re-Index the data
>
> Seeking advice for minimum time to upgrade this with most features of SOLR
> 4.10
>
> Thanks in Advance
>
> Best Regards,
> Abhishek


Re: Bad Request

2016-04-12 Thread Erick Erickson
The Solr logs themselves may give you a better error message.
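
One way to surface the underlying error yourself: ConcurrentUpdateSolrClient
streams updates asynchronously and reports failures as a generic "Bad
Request", so replaying one suspect document through HttpSolrClient will throw
the server's real exception directly. A sketch -- the URL and fields are
placeholders:

    import org.apache.solr.client.solrj.impl.HttpSolrClient;
    import org.apache.solr.common.SolrInputDocument;

    public class ReplayFailedDoc {
        public static void main(String[] args) throws Exception {
            try (HttpSolrClient client = new HttpSolrClient("http://hostname:8983/solr/de")) {
                SolrInputDocument doc = new SolrInputDocument();
                doc.addField("id", "be-de-109513-307573357");  // the id from the log
                // ... add the remaining fields exactly as your indexer does ...
                client.add(doc);  // throws with the server's real message on failure
            }
        }
    }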

Best,
Erick

On Tue, Apr 12, 2016 at 6:37 AM, Robert Brown  wrote:
> Hi,
>
> My collection had issues earlier, 1 shard showed as Down, the other only
> replica was Gone.
>
> Both were actually still up and running, no disk or CPU issues.
>
> This occurred during updates.
>
> The server since recovered after a reboot.
>
> Upon trying to update the index again, I'm now getting constant Bad
> Requests.
>
> Does anyone know what the issue could be, and/or how to resolve it?
>
> org.apache.solr.common.SolrException: Bad Request
>
> request:
> http://hostname:8983/solr/de_shard1_replica1/update?update.distrib=TOLEADER&distrib.from=http%3A%2F%2Fhostname%3A8983%2Fsolr%2Fde_shard2_replica2%2F&wt=javabin&version=2
> at
> org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrClient$Runner.sendUpdateStream(ConcurrentUpdateSolrClient.java:287)
> at
> org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrClient$Runner.run(ConcurrentUpdateSolrClient.java:160)
> at
> org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor$1.run(ExecutorUtil.java:232)
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
>
> I also occasionally get "Exception writing document id be-de-109513-307573357
> to the index; possible analysis error." which was the first bunch of errors
> I saw.
>
> Thanks,
> Rob
>
>


Re: Indexing date data for facet search

2016-04-12 Thread Erick Erickson
It may not have made it into the example schemas, so just try adding this to
your schema file:

<fieldType name="dateRange" class="solr.DateRangeField"/>

As far as adding the 00Z goes, I would, to be safe.
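
A sketch of converting the "2016-03-29 15:54:35.461" style of DB timestamp
from the original question into Solr's form, assuming the DB times are
already UTC (adjust the zone offset if they are not):

    import java.time.LocalDateTime;
    import java.time.ZoneOffset;
    import java.time.format.DateTimeFormatter;

    public class DbDateToSolr {
        private static final DateTimeFormatter DB_FORMAT =
                DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss.SSS");

        public static void main(String[] args) {
            LocalDateTime ldt = LocalDateTime.parse("2016-03-29 15:54:35.461", DB_FORMAT);
            // Instant.toString() yields 2016-03-29T15:54:35.461Z, which Solr accepts
            System.out.println(ldt.toInstant(ZoneOffset.UTC).toString());
        }
    }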

Best,
Erick

On Tue, Apr 12, 2016 at 6:57 AM, Steven White  wrote:
> Hi Erick,
>
> In Solr's schema.xml, I cannot find a <fieldType> for "dateRange", not even
> in the Apache Solr Reference Guide [1].  What am I missing?  I'm on Solr 5.2.1.
>
> Also, since my date data doesn't have seconds, can I leave ".ssZ" out or
> must I supply it with "00"?
>
> Thanks
>
> Steve
>
> [1] https://cwiki.apache.org/confluence/display/solr/Working+with+Dates
>
> On Mon, Apr 11, 2016 at 9:19 PM, Erick Erickson 
> wrote:
>
>> You have two options for dates in this scenario, "tdate" or "dateRange".
>> Probably in this case use dateRange, it should be more time and
>> space efficient. Here's some background:
>>
>> https://lucidworks.com/blog/2016/02/13/solrs-daterangefield-perform/
>>
>> Date types should be indexed as fully specified strings, as
>>
> >> YYYY-MM-DDThh:mm:ssZ
>>
>> Best,
>> Erick
>>
>> On Mon, Apr 11, 2016 at 3:03 PM, Steven White 
>> wrote:
>> > Hi everyone,
>> >
> >> > I need to index date data into Solr and then use this field for facet
>> > search.  My question is this, the date data in my DB is stored in the
>> > following format "2016-03-29 15:54:35.461":
>> >
>> > 1) What format I should be indexing this date + time stamp into Solr?
>> > 2) What Solr field type I should be using?  Is it "date"?
>> > 3) How do I handle various time zones and locales?
> >> > 4) Can I insert multi-value date data into the single "date" facet field
>> > and still use this field for facet search?
>> > 5) Based on my need, will all the Date Math per [1] on date facet still
>> > work? I'm confused here because of my need for (3).
>> >
>> > To elaborate on (4) some more.  The need here is this.  In my DB, there
>> are
>> > more than one column with date data.  I will be indexing them all into
>> this
>> > single multi-value Solr field of type Date that I will then use for
>> facet.
>> > Is this possible?
>> >
>> > I guess, this is a two part question, for date facet: a) how to properly
>> > index, and b) how do I properly search.
>> >
>> > As always, any insight is greatly appreciated.
>> >
>> > Steve
>> >
>> > [1] https://cwiki.apache.org/confluence/display/solr/Working+with+Dates
>>


Re: Which line is solr following in terms of a BI Tool?

2016-04-12 Thread Erick Erickson
The unsatisfactory answer is that they have different characteristics.

The analytics contrib does not work in distributed mode. It's not
receiving a lot of love at this point.

The JSON facets are estimations. Generally very close but are not
guaranteed to be 100% accurate. The variance, as I understand it,
is something on the order of < 1% in most cases.

The pivot facets are accurate, but more expensive than the JSON
facets.

And, to make matters worse, the ParallelSQL way of doing some
aggregations is going to give yet another approach.
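
For comparison, a stat-ordered JSON facet issued through SolrJ -- the
collection and field names are illustrative:

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.impl.HttpSolrClient;
    import org.apache.solr.client.solrj.response.QueryResponse;

    public class JsonFacetExample {
        public static void main(String[] args) throws Exception {
            try (HttpSolrClient client = new HttpSolrClient("http://localhost:8983/solr/mycollection")) {
                SolrQuery q = new SolrQuery("*:*");
                q.setRows(0);
                // top categories ordered by a computed stat -- the kind of
                // request that is awkward to express with facet.pivot
                q.add("json.facet", "{cats:{type:terms, field:cat, limit:10,"
                        + " sort:\"avg_price desc\", facet:{avg_price:\"avg(price)\"}}}");
                QueryResponse rsp = client.query(q);
                System.out.println(rsp.getResponse().get("facets"));
            }
        }
    }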

Best,
Erick

On Tue, Apr 12, 2016 at 7:15 AM, Pablo  wrote:
> Hello,
> I think this topic is important for solr users that are planning to use solr
> as a BI Tool.
> Speaking about facets, nowadays there are three major ways of doing (more or
> less) the same thing in Solr.
> First, you have the pivot facets; on the other hand you have the Analytics
> component; and finally you have the JSON Facet API.
> So, which line is Solr following? Which of these components is going to be in
> constant development and which one is going to be deprecated sooner?
> On Yonik's page, there are some tests that show how the JSON Facet API performs
> better than legacy facets; the API was also way simpler than the pivot
> facets, so in my case that was enough to base my solution around the JSON
> API. But I would like to know what the thoughts of the Solr developers are.
>
> Thanks!
>
>
>


Re: Arguments for and against putting solr.xml into Zookeeper?

2016-04-12 Thread Erick Erickson
upconfig is for _configurations_. Each collection
can use one of the configurations.

Solr.xml is configuration for the entire Solr
instance so it doesn't make sense for it to be
part of upconfig.

There's certainly room for something explicit to
upload it separate from configsets though...

Best,
Erick

On Tue, Apr 12, 2016 at 1:20 PM, John Bickerstaff
 wrote:
> Hello all,
>
> I'm wondering if anyone can comment on arguments for and against putting
> solr.xml into Zookeeper?
>
> I assume one argument for doing so is that I would then have all
> configuration in one place.
>
> I also assume that if it doesn't get included as part of the upconfig
> command, there is likely a reason?
>
> Thanks...


Question regarding empty UUID field

2016-04-12 Thread Susmit Shukla
Hi,

I have configured the Solr schema to generate a unique id for a collection
using UUIDUpdateProcessorFactory.

I am seeing peculiar behavior - if the unique 'id' field is explicitly set
to an empty string in the SolrInputDocument, the document gets indexed. I
can see in the Solr query console that a good UUID value was generated by
Solr and assigned to id.
However, sorting does not work if the UUID was generated in this way. Cursor
functionality, which depends on a unique-id sort, does not work either.
I guess the correct behavior would be to fail the indexing if the user
provides an empty string for a UUID field.

The issues do not happen if I omit the id field from the SolrInputDocument.

SolrInputDocument

solrDoc.addField("id", "");

...

I am using a schema similar to the one below:

<fieldType name="uuid" class="solr.UUIDField" indexed="true"/>

<field name="id" type="uuid" indexed="true" stored="true" required="true"/>

<uniqueKey>id</uniqueKey>

<updateRequestProcessorChain name="uuid">
  <processor class="solr.UUIDUpdateProcessorFactory">
    <str name="fieldName">id</str>
  </processor>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>

<initParams path="/update/**">
  <lst name="defaults">
    <str name="update.chain">uuid</str>
  </lst>
</initParams>
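A minimal client-side guard for the behavior described above -- omit the
field entirely so UUIDUpdateProcessorFactory generates the id, instead of
sending an empty string:

    import org.apache.solr.common.SolrInputDocument;

    public class UuidSafeDoc {
        static SolrInputDocument build(String incomingId) {
            SolrInputDocument doc = new SolrInputDocument();
            if (incomingId != null && !incomingId.trim().isEmpty()) {
                doc.addField("id", incomingId);
            }
            // otherwise leave "id" out so UUIDUpdateProcessorFactory fills it in
            return doc;
        }
    }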

Thanks,
Susmit


Arguments for and against putting solr.xml into Zookeeper?

2016-04-12 Thread John Bickerstaff
Hello all,

I'm wondering if anyone can comment on arguments for and against putting
solr.xml into Zookeeper?

I assume one argument for doing so is that I would then have all
configuration in one place.

I also assume that if it doesn't get included as part of the upconfig
command, there is likely a reason?

Thanks...


Re: boost parent fields BlockJoinQuery

2016-04-12 Thread Mikhail Khludnev
Judging from the error message, you under-copy-pasted the search query and
omitted the closing bracket.
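
For the archives, the corrected request with the child clause extracted to a
parameter, per the advice quoted below (field names taken from this thread):

    q=city:"walla walla"^10 +{!parent which="is_parent:true" score=max v=$childq}
    childq=normal_text:walla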

On Tue, Apr 12, 2016 at 3:30 PM, michael solomon 
wrote:

> Thanks,
> when I'm trying:
> city:"walla walla"^10 {!parent which="is_parent:true"
> score=max}(normal_text:walla)
> I get:
>
> > "msg": "org.apache.solr.search.SyntaxError: Cannot parse
> > '(normal_text:walla': Encountered \"\" at line 1, column 18.\nWas
> > expecting one of:\n ...\n ...\n ...\n\"+\"
> > ...\n\"-\" ...\n ...\n\"(\" ...\n\")\" ...\n
> > \"*\" ...\n\"^\" ...\n ...\n ...\n
> >  ...\n ...\n ...\n
> >  ...\n\"[\" ...\n\"{\" ...\n ...\n
> > \"filter(\" ...\n ...\n"
>
>
> On Tue, Apr 12, 2016 at 1:30 PM, Mikhail Khludnev <
> mkhlud...@griddynamics.com> wrote:
>
> > Hello,
> >
> > It's usually
> > parent_field:"bla bla"^10 {!parent which="is_parent:true"
> > score=max}(child_field:bla)
> > or
> > parent_field:"bla bla"^10 +{!parent which="is_parent:true"
> > score=max}(child_field:bla)
> >
> > there should be no spaces in the child clause; otherwise extract it to a
> > param and refer to it via v=$param
> >
> >
> > On Tue, Apr 12, 2016 at 9:56 AM, michael solomon 
> > wrote:
> >
> > > Hi,
> > > I'm using in BlockJoin Parser Query for return the parent of the
> relevant
> > > child i.e:
> > > {!parent which="is_parent:true" score=max}(child_field:bla)
> > >
> > > It's possible to boost the parent? something like:
> > >
> > > {!parent which="is_parent:true" score=max}(child_field:bla)
> > > parent_field:"bla bla"^10
> > > Thanks,
> > > Michael
> > >
> >
> >
> >
> > --
> > Sincerely yours
> > Mikhail Khludnev
> > Principal Engineer,
> > Grid Dynamics
> >
> > 
> > 
> >
>



-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics





Re: [More Like This] Query building

2016-04-12 Thread Scott Stults
Hi Alessandro,

It's not uncommon for Solr patches to remain uncommitted for months, even
years. In fact some never get merged. Don't let that discourage you!


k/r,
Scott

On Fri, Mar 11, 2016 at 11:49 AM, Alessandro Benedetti <
abenede...@apache.org> wrote:

> I am starting to feel that it is not that easy to contribute improvements or
> small fixes to Solr (if they are not super interesting to the masses).
> I think this one could be a good improvement to the MLT, but I would love to
> discuss this with some committer.
> The patch is attached; it has been there for months...
> Any feedback would be appreciated, I want to contribute, but I need some
> second opinions ...
>
> Cheers
>
> On 11 February 2016 at 13:48, Alessandro Benedetti 
> wrote:
>
> > Hi Guys,
> > is it possible to have any feedback ?
> > Is there any process to speed up bug resolution / discussions ?
> > just want to understand if the patch is not good enough, if I need to
> > improve it or simply no-one took a look ...
> >
> > https://issues.apache.org/jira/browse/LUCENE-6954
> >
> > Cheers
> >
> > On 11 January 2016 at 15:25, Alessandro Benedetti  >
> > wrote:
> >
> >> Hi guys,
> >> the patch seems fine to me.
> >> I didn't spend much more time on the code but I checked the tests and
> the
> >> pre-commit checks.
> >> It seems fine to me.
> >> Let me know ,
> >>
> >> Cheers
> >>
> >> On 31 December 2015 at 18:40, Alessandro Benedetti <
> abenede...@apache.org
> >> > wrote:
> >>
> >>> https://issues.apache.org/jira/browse/LUCENE-6954
> >>>
> >>> First draft patch available, I will check better the tests new year !
> >>>
> >>> On 29 December 2015 at 13:43, Alessandro Benedetti <
> >>> abenede...@apache.org> wrote:
> >>>
>  Sure, I will proceed tomorrow with the Jira and the simple patch +
>  tests.
> 
>  In the meantime let's try to collect some additional feedback.
> 
>  Cheers
> 
>  On 29 December 2015 at 12:43, Anshum Gupta 
>  wrote:
> 
> > Feel free to create a JIRA and put up a patch if you can.
> >
> > On Tue, Dec 29, 2015 at 4:26 PM, Alessandro Benedetti <
> > abenede...@apache.org
> > > wrote:
> >
> > > Hi guys,
> > > While I was exploring the way we build the More Like This query, I
> > > discovered a part I am not convinced of :
> > >
> > >
> > >
> > > Let's see how we build the query :
> > > org.apache.lucene.queries.mlt.MoreLikeThis#retrieveTerms(int)
> > >
> > > 1) we extract the terms from the interesting fields, adding them to
> > a map :
> > >
> > > Map termFreqMap = new HashMap<>();
> > >
> > > *(we lose the field->term relation; we no longer know where the term
> > > came from!)*
> > >
> > > org.apache.lucene.queries.mlt.MoreLikeThis#createQueue
> > >
> > > 2) we build the queue that will contain the query terms, at this
> > point we
> > > connect again there terms to some field, but :
> > >
> > > ...
> > >> // go through all the fields and find the largest document
> frequency
> > >> String topField = fieldNames[0];
> > >> int docFreq = 0;
> > >> for (String fieldName : fieldNames) {
> > >>   int freq = ir.docFreq(new Term(fieldName, word));
> > >>   topField = (freq > docFreq) ? fieldName : topField;
> > >>   docFreq = (freq > docFreq) ? freq : docFreq;
> > >> }
> > >> ...
> > >
> > >
> > > We identify the topField as the field with the highest document
> > frequency
> > > for the term t .
> > > Then we build the termQuery :
> > >
> > > queue.add(new ScoreTerm(word, *topField*, score, idf, docFreq,
> tf));
> > >
> > > In this way we lose a lot of precision.
> > > Not sure why we do that.
> > > I would prefer to keep the relation between terms and fields.
> > > The MLT query can improve a lot the quality.
> > > If i run the MLT on 2 fields : *description* and *facilities* for
> > example.
> > > It is likely I want to find documents with similar terms in the
> > > description and similar terms in the facilities, without mixing
> > > things up and losing the semantics of the terms.
> > >
> > > Let me know your opinion,
> > >
> > > Cheers
> > >
> > >
> > > --
> > > --
> > >
> > > Benedetti Alessandro
> > > Visiting card : http://about.me/alessandro_benedetti
> > >
> > > "Tyger, tyger burning bright
> > > In the forests of the night,
> > > What immortal hand or eye
> > > Could frame thy fearful symmetry?"
> > >
> > > William Blake - Songs of Experience -1794 England
> > >
> >
> >
> >
> > --
> > Anshum Gupta
> >
> 
> 
> 
>  --
>  --
> 
>  Benedetti Alessandro
>  Visiting card : 

Re: Solr slave is doing full replication (entire index) of index after master restart

2016-04-12 Thread Lior Sapir
So what do you say:
Is it a problem in my environment and configs, OR is that simply how
replication works (i.e. if a slave fails to locate the master when polling,
then the next time the master is available it will replicate the entire
index, even if no document was added to the master and no optimization was
performed)?



On Sat, Apr 9, 2016 at 9:24 PM, Lior Sapir  wrote:

> Thanks for the reply.
>
> 00:00:60 - Is valid
> But I tried 00:01:00 anyway.
> I also checked the clocks and they are synced:
> ntpdate -q solr01-isrl01
>
> server 192.168.103.112, stratum 11, offset 0.003648, delay 0.02589
>  9 Apr 18:09:20 ntpdate[23921]: adjust time server 192.168.103.112 offset
> 0.003648 sec
>
> So these are not the reasons for the full replication. In addition the
> replication is working perfectly until I restart the master
> Regarding the issue of 60 seconds being too fast, I can consider raising
> it to 5 minutes even though my configuration is based on the data-driven
> example contained in the solr package.
>
> But still, this will just make the probability of full replication lower.
> I don't want to rely on that in production. if I have any network issue or
> the master server will restart from any reason. All of his slaves will
> start replicating when the master will be available again and the service
> will be harmed dramatically or even be down.
>
> Anyway,
>
> Can anyone with solr version 5.3.1 or above test this scenario? I want to
> understand if its something specific in my environment or that's just how
> the replication is behaving.
>
> I added another step to be more clear:
>
> 1. Setup a master
> 2. Setup a slave in a different server
> 3. The slave replicated the master index
> 4. From now on not even a single document is added. No optimization or
> what so ever is done on the master or slave
> 5. I stop the master
> 6. wait for the slave to replicate or initiate a replication via the UI or
> script
> 7. I start the master
> 8. I see the slave is replicating/copying the entire index
>
>
> Lior.
>
>
>
>
> On Sat, Apr 9, 2016 at 6:15 PM, Walter Underwood 
> wrote:
>
>> I’m not sure this is a legal polling interval:
>>
>> 00:00:60
>>
>> Try:
>>
>> 00:01:00
>>
>> Also, polling every minute is very fast. Try a longer period.
>>
>> Check the clocks on the two systems. If the clocks are not synchronized,
>> that could cause problem.
>>
>> wunder
>> Walter Underwood
>> wun...@wunderwood.org
>> http://observer.wunderwood.org/  (my blog)
>>
>>
>> > On Apr 9, 2016, at 8:10 AM, Lior Sapir  wrote:
>> >
>> > Anyone can tell me what was I doing wrong ?
>> > Is that the expected behavior (slave replicate entire index if on
>> previous replication attempt the master was not available ) ?
>> >
>> >
>> >
>> >
>> > On Thu, Apr 7, 2016 at 9:12 PM, Lior Sapir > > wrote:
>> > Thanks for the reply.
>> >
>> > I easily re produced it in my "sandbox" env.  Steps to re produce
>> > 1. Setup a master
>> > 2. Setup a slave in a different server
>> > 3. The slave replicated the master index
>> > 4. From now on not even a single document is added. No optimization or
>> what so ever is done on the master or slave
>> > 5. I stop the master
>> > 6. I start the master
>> > 7. I see the slave is replicating/copying the entire index
>> >
>> > This is exactly what happened  in production when I restarted the
>> master.
>> >
>> > I attached the configurations files.
>> >
>> > Replication section:
>> >
>> > Master:
>> >
>> > <requestHandler name="/replication" class="solr.ReplicationHandler">
>> >   <lst name="master">
>> >     <str name="replicateAfter">commit</str>
>> >   </lst>
>> > </requestHandler>
>> >
>> > Slave:
>> >
>> > <requestHandler name="/replication" class="solr.ReplicationHandler">
>> >   <lst name="slave">
>> >     <str name="masterUrl">http://solr01-isrl01.flr.local:8983/solr/replication-master/replication</str>
>> >     <str name="pollInterval">00:00:60</str>
>> >   </lst>
>> > </requestHandler>
>> >
>> >
>> >
>> > Best,
>> > Lior
>> >
>> > On Thu, Apr 7, 2016 at 6:56 PM, Erick Erickson > > wrote:
>> > What does your configuration file look like for the replication
>> > handler? Does this happen whenever you restart a slave even if
>> > _nothing_ has changed on the master?
>> >
>> > And this will certainly happen if you're optimizing the master before
>> > you restart, although that doesn't sound likely.
>> >
>> > Best,
>> > Erick
>> >
>> > On Thu, Apr 7, 2016 at 6:54 AM, Lior Sapir > > wrote:
>> > > Solr slave is doing full replication (entire index) of index after
>> master
>> > > restart
>> > > Using solr 5.3.1 not cloud (using maser slave architecture ) I see
>> that
>> > > slave replicates entire index after master restart even though the
>> index
>> > > version is the same
>> > >
>> > > This is bad for me since the slave which is doing serving replicates
>> 80gb
>> > > if I restart the server and our service is down
>> > >
>> > > I attached a file with 

Re: SolrCloud Config file

2016-04-12 Thread Sam Xia
Thank you, Shawn and Erick. It turns out there was a get-pip.py file in the
configuration folder (the config file was copied from somewhere), which
caused the misbehavior. After get-pip.py was removed, everything worked as
expected. Thanks again.







On 4/11/16, 8:40 PM, "Erick Erickson"  wrote:

>Do note by the way that as of Solr 5.5, the bin/solr script has an
>option for uploading and downloading configsets. Try typing
>
>bin/solr zk -help
>
>Best,
>Erick
>
>On Mon, Apr 11, 2016 at 6:30 PM, Shawn Heisey  wrote:
>> On 4/11/2016 6:40 PM, Sam Xia wrote:
>>> Where is the path of the topic collection zookeeper config file? Here is
>>> from the wiki (see below). But I was not able to find configs/topic
>>> anywhere in the installation folder.
>>
>> The /configs/topic path is *inside the zookeeper database*.  It is not a
>> path on the filesystem at all.  Zookeeper is a separate Apache project
>> that Solr happens to use when running in cloud mode.
>>
>> http://zookeeper.apache.org/
>>
>>> "The create command will upload a copy of the 
>>>data_driven_schema_configs
>>> configuration directory to ZooKeeper under /configs/mycollection. 
>>>Refer to
>>> the Solr Start Script Reference
>>> 
>>>>>ren
>>> ce> page for more details about the create command for creating
>>> collections.”
>>>
>>> Here is the command that I run and verify zookeeper is on port 8983. BTW,
>>> I did not modify anything and the Solr is a clean install, so I do not
>>> know why Python is used in the script. The error looks to me like the
>>> config folder was not created at the first command. So when you try to
>>> update it, it gets an IO error.
>>>
>>> ./solr status
>>>
>>> Found 2 Solr nodes:
>>>
>>> Solr process 30976 running on port 7574
>>> {
>>>   "solr_home":"/locm/solr-6.0.0/example/cloud/node2/solr",
>>>   "version":"6.0.0 48c80f91b8e5cd9b3a9b48e6184bd53e7619e7e3 - nknize -
>>> 2016-04-01 14:41:49",
>>>   "startTime":"2016-04-11T23:42:59.513Z",
>>>   "uptime":"0 days, 0 hours, 51 minutes, 43 seconds",
>>>   "memory":"93.2 MB (%19) of 490.7 MB",
>>>   "cloud":{
>>> "ZooKeeper":"localhost:9983",
>>> "liveNodes":"2",
>>> "collections":"2"}}
>>>
>>>
>>> Solr process 30791 running on port 8983
>>> {
>>>   "solr_home":"/locm/solr-6.0.0/example/cloud/node1/solr",
>>>   "version":"6.0.0 48c80f91b8e5cd9b3a9b48e6184bd53e7619e7e3 - nknize -
>>> 2016-04-01 14:41:49",
>>>   "startTime":"2016-04-11T23:42:54.041Z",
>>>   "uptime":"0 days, 0 hours, 51 minutes, 49 seconds",
>>>   "memory":"78.9 MB (%16.1) of 490.7 MB",
>>>   "cloud":{
>>> "ZooKeeper":"localhost:9983",
>>> "liveNodes":"2",
>>> "collections":"2"}}
>>
>> 8983 is a *Solr* port.  The default embedded zookeeper port is the first
>> Solr port in the cloud example plus 1000, so it usually ends up being 9983.
>>
>>> If you run the following steps, you would be able to reproduce the issue
>>> every time.
>>>
>>> Step 1) bin/solr start -e cloud -noprompt
>>> Step 2) bin/solr create -c topic -d sample_techproducts_configs
>>> Step 3) ./zkcli.sh -cmd upconfig -zkhost localhost:9983 -confname topic
>>> -solrhome /locm/solr-5.5.0/ -confdir
>>> /locm/solr-5.5.0/server/solr/configsets/sample_techproducts_configs/conf
>>
>> The "-solrhome" option is not something you need.  I have no idea what
>> it will do, but it is not one of the options for upconfig.
>>
>> I tried this (on Windows) and I'm getting a different problem on the
>> upconfig command trying to connect to zookeeper:
>>
>> https://www.dropbox.com/s/c65zmkhd0le6mzv/upconfig-error.png?dl=0
>>
>> Trying again on Linux, I had zero problems with the commands you used,
>> changing only minor details for the upconfig command (things are in a
>> different place, and I didn't use the unnecessary -solrhome option):
>>
>> https://www.dropbox.com/s/edoa07anmkkep0l/xia-recreate1.png?dl=0
>> https://www.dropbox.com/s/ad5ukuvfvlgwq0z/xia-recreate2.png?dl=0
>> https://www.dropbox.com/s/ay1u3jjuwy5t52s/xia-recreate3.png?dl=0
>>
>> Your stated commands indicate 5.5.0, but the JSON status information
>> above and the paths they contain indicate that it is 6.0.0 that is
>> responding.  I will have to try 6.0.0 later.
>>
>> If nothing has changed, then "get-pip.py" would not be there.  There
>> isn't a configset named "topic_configs_ori" included with Solr, not even
>> in the 6.0.0 version.  This came from somewhere besides the Solr website.
>>
>> Thanks,
>> Shawn
>>


Re: Solrj API for Managed Resources

2016-04-12 Thread iambest
Thanks for your reply, sorry if I wasn't clear. But, I am looking for a solrj
client API to make my life easier when dealing with Managed Resources. solrj
has a client for schema API (SchemaRequest), but, it doesn't handle Managed
Resources.
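
Until such a client exists, a sketch of at least reading a managed resource
through SolrJ's GenericSolrRequest -- the core name and resource path are
illustrative; writes would still need a hand-built HTTP PUT/POST:

    import org.apache.solr.client.solrj.SolrRequest;
    import org.apache.solr.client.solrj.impl.HttpSolrClient;
    import org.apache.solr.client.solrj.request.GenericSolrRequest;
    import org.apache.solr.common.util.NamedList;

    public class ReadManagedResource {
        public static void main(String[] args) throws Exception {
            try (HttpSolrClient client = new HttpSolrClient("http://localhost:8983/solr/mycore")) {
                GenericSolrRequest req = new GenericSolrRequest(
                        SolrRequest.METHOD.GET, "/schema/analysis/synonyms/english", null);
                NamedList<Object> rsp = client.request(req);
                System.out.println(rsp);
            }
        }
    }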





Solr cloud newSearcher warmup

2016-04-12 Thread Simone Sabba
Hi, 

I just configured cache autowarming in solrconfig.xml using a newSearcher
listener (as described here:
https://wiki.apache.org/solr/SolrCaching#newSearcher_and_firstSearcher_Event_Listeners).
It works fine with a Solr 5.4.1 single-node instance, but with SolrCloud
5.4.1, every time I run a delta-import I get these two errors:

ERROR 1:

Logger: SolrRequestInfo
Message: Previous SolrRequestInfo was not closed!
req=waitSearcher=true&distrib.from=http://172.16.180.73:8983/solr/prices_shard1_replica2/&update.distrib=FROMLEADER&openSearcher=true&commit=true&wt=javabin&expungeDeletes=false&commit_end_point=true&version=2&softCommit=false

ERROR 2:

Logger: SolrRequestInfo
Message: prev == info : false

Can anybody help me figure out what is wrong with my configuration?

My solrconfig.xml is attached.

Thank you
Simone





Re: Solr 6 - AbstractSolrTestCase Error Unable to build KeyStore from file: null

2016-04-12 Thread Joe Lawson
Adding @SolrTestCaseJ4.SuppressSSL to my abstract class extended the
AbstractSolrTestCase worked. Thanks!

https://github.com/healthonnet/hon-lucene-synonyms/blob/cedb3cbb56b01cd6480c257c04999cdce433f53e/src/test/java/org/apache/solr/search/HonLuceneSynonymTestCase.java#L21-L21
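
Sketched out, the pattern from that commit (class name illustrative):

    import org.apache.solr.SolrTestCaseJ4;
    import org.apache.solr.util.AbstractSolrTestCase;

    // suppress the randomized SSL/clientAuth so no keystore is ever needed
    @SolrTestCaseJ4.SuppressSSL
    public abstract class MyPluginTestCase extends AbstractSolrTestCase {
        // shared setup for the plugin tests goes here
    }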

On Mon, Apr 11, 2016 at 8:45 PM, Chris Hostetter 
wrote:

>
> https://issues.apache.org/jira/browse/SOLR-8970
> https://issues.apache.org/jira/browse/SOLR-8971
>
> : Date: Mon, 11 Apr 2016 20:35:22 -0400
> : From: Joe Lawson 
> : Reply-To: solr-user@lucene.apache.org
> : To: solr-user@lucene.apache.org
> : Subject: Re: Solr 6 - AbstractSolrTestCase Error Unable to build
> KeyStore from
> :  file: null
> :
> : Thanks for the insight. I figured that it was something like that and
> : perhaps I has thread contention on a resource that wasn't really thread
> : safe.
> :
> : I'll give your suggestions a shot tomorrow.
> :
> : Regards,
> :
> : Joe Lawson
> : On Apr 11, 2016 8:24 PM, "Chris Hostetter" 
> wrote:
> :
> : >
> : > : I'm upgrading a plugin and use the AbstractSolrTestCase for tests. My
> : > tests
> : > : work fine in 5.X but when I upgraded to 6.X the tests sometimes
> throw an
> : > : error during initialization. Basically it says,
> : > : "org.apache.solr.common.SolrException: Error instantiating
> : > : shardHandlerFactory class
> : > : [org.apache.solr.handler.component.HttpShardHandlerFactory]: Unable
> to
> : > : build KeyStore from file: null"
> : >
> : > Ugh.  and of course there are no other details to troubleshoot that
> : > because the stupid error handling doesn't wrap the original exception
> --
> : > it just throws it away.
> : >
> : > I'm pretty sure the problem you are seeing (unfortunately manifested in
> : > a really confusing way) is that SolrTestCaseJ4 (and
> AbstractSolrTestCase
> : > which subclasses it) has randomized the use of SSL for a while, but at
> : > some point it also started randomizing the use of client auth -- but
> this
> : > randomization happens very infrequently.
> : >
> : > (for details, check out the SSLTestConfig and it's usage in
> : > SolrTestCaseJ4)
> : >
> : > The bottom line is, in order for the (randomized) clientAuth stuff to
> : > work, SolrTestCaseJ4 assumes it can find an
> : > "../etc/test/solrtest.keystore" realtive to ExternalPaths.SERVER_HOME.
> : >
> : > If you don't have that in your test setup, bad things happen.
> : >
> : > I believe the quickest way for you to resolve this failure in your own
> : > usage of AbstractSolrTestCase is to just add the @SuppressSSL
> annotation to
> : > your tests -- assuming you don't care about randomly testing your
> plugin
> : > with SSL authentication (for 99.999% of solr plugins, wether solr is
> being
> : > used over http or https shouldn't matter for test purposes)
> : >
> : > If you do want to include randomized SSL testing, then you need to make
> : > sure your that when/how you run your tests, ExternalPaths.SERVER_HOME
> : > resolves to the correct place, and "../etc/test/solrtest.keystore"
> : > resolves to a real file solr can use as the keystore.
> : >
> : > I'll file some Jiras to try and improve the error handling in these
> : > situations.
> : >
> : >
> : >
> : > -Hoss
> : > http://www.lucidworks.com/
> : >
> :
>
> -Hoss
> http://www.lucidworks.com/
>


Re: Shard ranges seem incorrect

2016-04-12 Thread Shawn Heisey
On 4/12/2016 5:49 AM, Markus Jelsma wrote:
> Hi - I've just created a 3 shard, 3 replica collection on Solr 6.0.0 and we 
> noticed something odd: the hashing ranges don't make sense (full state.json 
> below):
> shard1 Range: 80000000-d554ffff
> shard2 Range: d5550000-2aa9ffff
> shard3 Range: 2aaa0000-7fffffff
>
> We've also noticed ranges not going from 0 to ffffffff for a 5.5-created 
> single-shard collection. Another collection created on an older (unknown) 
> release has correct shard ranges. Any idea what's going on?

The hex value 80000000 in a Java integer is equivalent to -2147483648
(Integer.MIN_VALUE) -- numeric types in Java are signed.  The hex value
7fffffff is 2147483647 (Integer.MAX_VALUE).

Thus the hash range goes from 80000000 to 7fffffff, with zero in the
middle.  This is not a new development in 6.0.0 -- it's been this way
for the entire life of SolrCloud (since 4.0-ALPHA).
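
A one-line check of the arithmetic, in Java:

    public class HashRangeDemo {
        public static void main(String[] args) {
            int lo = 0x80000000;  // -2147483648, i.e. Integer.MIN_VALUE
            int hi = 0x7fffffff;  //  2147483647, i.e. Integer.MAX_VALUE
            System.out.println(lo + " .. " + hi);
        }
    }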

Thanks,
Shawn



Re: Solr Sharding Strategy

2016-04-12 Thread Shawn Heisey
On 4/11/2016 6:31 AM, Bhaumik Joshi wrote:
> We are using solr 5.2.0 and we have Index-heavy (100 index updates per
> sec) and Query-heavy (100 queries per sec) scenario.
>
> *Index stats: *10 million documents and 16 GB index size
>
>  
>
> Which sharding strategy is best suited in above scenario?
>
> Please share reference resources which states detailed comparison of
> single shard over multi shard if any.
>
>  
>
> Meanwhile we did some tests with SolrMeter (Standalone java tool for
> stress tests with Solr) for single shard and two shards.
>
> *Index stats of test solr cloud: *0.7 million documents and 1 GB index
> size.
>
> As observed in test average query time with 2 shards is much higher
> than single shard.
>

On the same hardware, multiple shards will usually be slower than one
shard, especially under a high load.  Sharding can give good results
with *more* hardware, providing more CPU and memory resources.  When the
query load is high, there should be only one core (shard replica)
per server, and Solr works best when it is running on bare metal, not
virtualized.

Handling 100 queries per second will require multiple copies of your
index on separate hardware.  This is a fairly high query load.  There
are installations handling much higher loads, of course.  Those
installations have a LOT of replicas and some way to balance load across
them.

For 10 million documents and 16GB of index, I'm not sure that I would
shard at all, just make sure that each machine has plenty of memory --
probably somewhere in the neighborhood of 24GB to 32GB.  That assumes
that Solr is the only thing running on that server, and that if it's
virtualized, making sure that the physical server's memory is not
oversubscribed.

Regarding your specific numbers:

The low queries per second may be caused by one or more of these
problems, or perhaps something I haven't thought of:  1) your queries
are particularly heavy.  2) updates are interfering by tying up scarce
resources.  3) you don't have enough memory in the machine.

How many documents are in each update request that you are sending?  In
another thread on the list, you have stated that you have a 1 second
maxTime on autoSoftCommit.  This is *way* too low, and a *major* source
of performance issues.  Very few people actually need that level of
latency -- a maxTime measured in minutes may be fast enough, and is much
friendlier for performance.
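
For illustration, a much gentler soft commit policy along those lines in
solrconfig.xml (the value is an example, not a universal recommendation):

    <autoSoftCommit>
      <maxTime>300000</maxTime>  <!-- 5 minutes instead of 1 second -->
    </autoSoftCommit>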

Thanks,
Shawn



Which line is solr following in terms of a BI Tool?

2016-04-12 Thread Pablo
Hello, 
I think this topic is important for solr users that are planning to use solr
as a BI Tool.
Speaking about facets, nowadays there are three major ways of doing (more or
less) the same thing in Solr.
First, you have the pivot facets; on the other hand you have the Analytics
component; and finally you have the JSON Facet API.
So, which line is Solr following? Which of these components is going to be in
constant development and which one is going to be deprecated sooner?
On Yonik's page, there are some tests that show how the JSON Facet API performs
better than legacy facets; the API was also way simpler than the pivot
facets, so in my case that was enough to base my solution around the JSON
API. But I would like to know what the thoughts of the Solr developers are.

Thanks! 





Re: Indexing date data for facet search

2016-04-12 Thread Steven White
Hi Erick,

In Solr's schema.xml, I cannot find a <fieldType> for "dateRange", not even
in the Apache Solr Reference Guide [1].  What am I missing?  I'm on Solr 5.2.1.

Also, since my date data doesn't have seconds, can I leave ".ssZ" out or
must I supply it with "00"?

Thanks

Steve

[1] https://cwiki.apache.org/confluence/display/solr/Working+with+Dates

On Mon, Apr 11, 2016 at 9:19 PM, Erick Erickson 
wrote:

> You have two options for dates in this scenario, "tdate" or "dateRange".
> Probably in this case use dateRange, it should be more time and
> space efficient. Here's some background:
>
> https://lucidworks.com/blog/2016/02/13/solrs-daterangefield-perform/
>
> Date types should be indexed as fully specified strings, as
>
> YYYY-MM-DDThh:mm:ssZ
>
> Best,
> Erick
>
> On Mon, Apr 11, 2016 at 3:03 PM, Steven White 
> wrote:
> > Hi everyone,
> >
> > I need to index date data into Solr and then use this field for facet
> > search.  My question is this, the date data in my DB is stored in the
> > following format "2016-03-29 15:54:35.461":
> >
> > 1) What format I should be indexing this date + time stamp into Solr?
> > 2) What Solr field type I should be using?  Is it "date"?
> > 3) How do I handle various time zones and locales?
> > 4) Can I insert multi-value date data into the single "date" facet field
> > and still use this field for facet search?
> > 5) Based on my need, will all the Date Math per [1] on date facet still
> > work? I'm confused here because of my need for (3).
> >
> > To elaborate on (4) some more.  The need here is this.  In my DB, there
> are
> > more than one column with date data.  I will be indexing them all into
> this
> > single multi-value Solr field of type Date that I will then use for
> facet.
> > Is this possible?
> >
> > I guess, this is a two part question, for date facet: a) how to properly
> > index, and b) how do I properly search.
> >
> > As always, any insight is greatly appreciated.
> >
> > Steve
> >
> > [1] https://cwiki.apache.org/confluence/display/solr/Working+with+Dates
>


JSON facet raw HLL as result

2016-04-12 Thread sudsport s
Is it possible to get the raw HLL object as the result of a JSON facet,
instead of getting the cardinality?

I tried to build a custom JSON facet to return the raw value from an external
jar, but the attempt was unsuccessful: JSON faceting has some classes with
default scope, and I get an IllegalAccessException (RuntimeException) if I
try to use those classes from an external jar.


Re: Pivot facets - distributed search - request

2016-04-12 Thread Yonik Seeley
On Tue, Apr 12, 2016 at 8:47 AM, Pablo  wrote:
> Hi,
> Is there any way of requesting limit 10 order by a stat within facet pivot?

No.

> I know that the "json facet" component can do this and it has a very
> comphrehensive api, but it has a problem of consistency (refinement) when
> querying across multiple shards.

I know a lot of people have been looking for refinements, I'll try to
get to this relatively soon!
In the meantime, one can probably minimize the error somewhat by
requesting a larger limit.

-Yonik


Bad Request

2016-04-12 Thread Robert Brown

Hi,

My collection had issues earlier, 1 shard showed as Down, the other only 
replica was Gone.


Both were actually still up and running, no disk or CPU issues.

This occurred during updates.

The server since recovered after a reboot.

Upon trying to update the index again, I'm now getting constant Bad 
Requests.


Does anyone know what the issue could be, and/or how to resolve it?

org.apache.solr.common.SolrException: Bad Request

request: 
http://hostname:8983/solr/de_shard1_replica1/update?update.distrib=TOLEADER&distrib.from=http%3A%2F%2Fhostname%3A8983%2Fsolr%2Fde_shard2_replica2%2F&wt=javabin&version=2
at 
org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrClient$Runner.sendUpdateStream(ConcurrentUpdateSolrClient.java:287)
at 
org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrClient$Runner.run(ConcurrentUpdateSolrClient.java:160)
at 
org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor$1.run(ExecutorUtil.java:232)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)

at java.lang.Thread.run(Thread.java:745)

I also occasionally get "Exception writing document id 
be-de-109513-307573357 to the index; possible analysis error." which was 
the first bunch of errors I saw.


Thanks,
Rob




Re: Cache problem

2016-04-12 Thread Reth RM
This has answers about why giving enough memory to the OS is important:
https://wiki.apache.org/solr/SolrPerformanceProblems#OS_Disk_Cache
And as per the Solr admin dashboard, physical memory is almost fully
utilized whereas the memory allocated to the JVM is not used, so it's best
to lower the JVM memory.
Why set xms=xmx? This link pretty much answers it:
http://stackoverflow.com/questions/16087153/what-happens-when-we-set-xmx-and-xms-equal-size



On Tue, Apr 12, 2016 at 3:05 PM, Bastien Latard - MDPI AG <
lat...@mdpi.com.invalid> wrote:

> Thank you both, Bill and Reth!
>
> Here is my current options from my command to launch java:
> */usr/bin/java  -Xms20480m -Xmx40960m -XX:PermSize=10240m
> -XX:MaxPermSize=20480m [...]*
>
> So should I do *-Xms20480m -Xmx20480m* ?
> Why? What would it change?
>
> Reminder: the size of my main index is 46Gb... (80Gb all together)
>
>
>
> BTW: what's the difference between dark and light grey in the JVM
> representation? (real/virtual memory?)
>
>
> NOTE: I have only tomcat running on this server (and this is my live
> website - *i.e.: quite critical*).
>
> So if document cache is using the OS cache, this might be the problem,
> right?
> (because it seems to cache every field ==> so all the data returned by the
> query)
>
> kr,
> Bast
>
>
> On 12/04/2016 08:19, Reth RM wrote:
>
> As per the Solr admin dashboard's memory report, the Solr JVM is not using
> more than 20 GB of memory, whereas physical memory is almost full.  I'd set
> xms=xmx=16 GB and let the operating system use the rest. And regarding
> caches: the filter cache hit ratio looks good, so it should not be a
> concern. And afaik, the document cache actually uses the OS cache. Overall,
> I'd reduce the memory allocated to the JVM as said above and try.
>
>
>
>
> On Mon, Apr 11, 2016 at 7:40 PM,   
> wrote:
>
>
> You do need to optimize to get rid of the deleted docs probably...
>
> That is a lot of deleted docs
>
> Bill Bell
> Sent from mobile
>
>
>
> On Apr 11, 2016, at 7:39 AM, Bastien Latard - MDPI AG
>
>   wrote:
>
> Dear Solr experts :),
>
> I read this very interesting post 'Understanding and tuning your Solr
>
> caches' !
>
> This is the only good document that I was able to find after searching
>
> for 1 day!
>
> I was using Solr for 2 years without knowing in details what it was
>
> caching...(because I did not need to understand it before).
>
> I had to take a look since I needed to restart (regularly) my tomcat in
>
> order to improve performances...
>
> But I now have 2 questions:
> 1) How can I know how much RAM is my solr using in real (especially for
>
> caching)?
>
> 2) Could you have a quick look into the following images and tell me if
>
> I'm doing something wrong?
>
> Note: my index contains 66 millions of articles with several text fields
>
> stored.
>
> 
>
> My solr contains several cores (all together are ~80Gb big), but almost
>
> only the one below is used.
>
> I have the feeling that a lot of data is always stored in RAM...and
>
> getting bigger and bigger all the time...
>
> 
> 
>
> (after restart)
> $ sudo tail -f /var/log/tomcat7/catalina.out | grep GC
> 
> [...] after a few minutes
> 
>
> Here are some images, that can show you some stats about my Solr
>
> performances...
>
> 
> 
> 
>
> 
>
> Kind regards,
> Bastien Latard
>
>
>
>
> Kind regards,
> Bastien Latard
> Web engineer
> --
> MDPI AG
> Postfach, CH-4005 Basel, Switzerland
> Office: Klybeckstrasse 64, CH-4057
> Tel. +41 61 683 77 35
> Fax: +41 61 302 89 18
> E-mail: latard@mdpi.com
> http://www.mdpi.com/
>
>


Pivot facets - distributed search - request

2016-04-12 Thread Pablo
Hi,
Is there any way of requesting limit 10 order by a stat within facet pivot?
I know that the "json facet" component can do this and it has a very
comphrehensive api, but it has a problem of consistency (refinement) when
querying across multiple shards. 
And given that pivot facets supports distributed searching, I tried to make
a similar request, but couldn't find how to do it.
Thanks in advance!





Re: boost parent fields BlockJoinQuery

2016-04-12 Thread michael solomon
Thanks,
when I'm trying:
city:"walla walla"^10 {!parent which="is_parent:true"
score=max}(normal_text:walla)
I get:

> "msg": "org.apache.solr.search.SyntaxError: Cannot parse
> '(normal_text:walla': Encountered \"\" at line 1, column 18.\nWas
> expecting one of:\n ...\n ...\n ...\n\"+\"
> ...\n\"-\" ...\n ...\n\"(\" ...\n\")\" ...\n
> \"*\" ...\n\"^\" ...\n ...\n ...\n
>  ...\n ...\n ...\n
>  ...\n\"[\" ...\n\"{\" ...\n ...\n
> \"filter(\" ...\n ...\n"


On Tue, Apr 12, 2016 at 1:30 PM, Mikhail Khludnev <
mkhlud...@griddynamics.com> wrote:

> Hello,
>
> It's usually
> parent_field:"bla bla"^10 {!parent which="is_parent:true"
> score=max}(child_field:bla)
> or
> parent_field:"bla bla"^10 +{!parent which="is_parent:true"
> score=max}(child_field:bla)
>
> there should be no spaces in child clause, otherwise extract it to param
> and refrer via v=$param
>
>
> On Tue, Apr 12, 2016 at 9:56 AM, michael solomon 
> wrote:
>
> > Hi,
> > I'm using in BlockJoin Parser Query for return the parent of the relevant
> > child i.e:
> > {!parent which="is_parent:true" score=max}(child_field:bla)
> >
> > It's possible to boost the parent? something like:
> >
> > {!parent which="is_parent:true" score=max}(child_field:bla)
> > parent_field:"bla bla"^10
> > Thanks,
> > Michael
> >
>
>
>
> --
> Sincerely yours
> Mikhail Khludnev
> Principal Engineer,
> Grid Dynamics
>
> 
> 
>


Re: Curious case of DataSource.getConnection()

2016-04-12 Thread Shalin Shekhar Mangar
What is this Solr scheduler class? Is that your own custom code? None of
the information or code snippets in your email relates to a Solr problem. I
guess you are looking to troubleshoot a DB connectivity problem, and it
would be better to ask this on Stack Overflow.

On Tue, Apr 12, 2016 at 4:01 PM, Srinivas Kashyap <
srini...@tradestonesoftware.com> wrote:

> Hi,
>
> In a Solr scheduler class which runs every 'n' seconds, I'm
> polling a database table to do some custom job.
>
> I'm getting the connection to database, through context file as below:
>
> try {
>     Context initContext = new InitialContext();
>     DataSource ds = null;
>     if ("tomcat".equals(p.getProperty("server.type")))
>     {
>         Context webContext = (Context) initContext.lookup("java:/comp/env");
>         ds = (DataSource) webContext.lookup("");
>     }
>     else if ("ws".equals(p.getProperty("server.type")))  // websphere
>     {
>         ds = (DataSource) initContext.lookup("");
>     }
> }
>
> ds.getConnection();
>
>
> But then, the connection is not being established. No exception or error is
> being thrown in the console.
>
> Context xml has been double-checked to see that all the datasource properties
> and attributes are set properly.
>
> Any reason, i'm not able to establish database connection?
>
> P.S: Normal IMPORT process is running unaffected i.e Data is being indexed
> into solr with the same datasource configuration in context xml.
>
>
> Thanks and Regards,
> Srinivas Kashyap




-- 
Regards,
Shalin Shekhar Mangar.


Re: EmbeddedSolr for unit tests in Solr 6

2016-04-12 Thread Shalin Shekhar Mangar
Rohana, as I said earlier, MiniSolrCloudCluster is specifically made
for your use case, i.e. where you want to quickly set up a SolrCloud cluster
in your own application for testing. It is available in
the solr-test-framework artifact.
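
A minimal sketch of standing one up -- paths and the config name are
placeholders:

    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.Paths;
    import org.apache.solr.client.solrj.embedded.JettyConfig;
    import org.apache.solr.client.solrj.impl.CloudSolrClient;
    import org.apache.solr.cloud.MiniSolrCloudCluster;

    public class MiniClusterExample {
        public static void main(String[] args) throws Exception {
            Path baseDir = Files.createTempDirectory("mini-solr");
            MiniSolrCloudCluster cluster =
                    new MiniSolrCloudCluster(1, baseDir, JettyConfig.builder().build());
            try {
                cluster.uploadConfigSet(Paths.get("/path/to/conf"), "myconfig");
                CloudSolrClient client = cluster.getSolrClient();
                // create a collection against "myconfig", then index and query here
            } finally {
                cluster.shutdown();
            }
        }
    }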

On Tue, Apr 12, 2016 at 4:31 PM, Rohana Rajapakse <
rohana.rajapa...@gossinteractive.com> wrote:

> Please note that I am not writing unit tests for testing classes in Solr.
> I need a temporary Solr index to test classes in my own application that
> needs a Solr index. I would like to use classes that are available in
> solr-core and solr-solrj jars. I could do this easily in solr-4.x versions
> using EmbeddedSolrServer. I prefer not to extend SolrTestCaseJ4 class. Also
> MiniSolrCloudCluster is not available in solr-core or solr-solrj jar.
>
> What is the best way of doing this in Solr-6.x / Solr-7.0  ?
>
> -Original Message-
> From: Joe Lawson [mailto:jlaw...@opensourceconnections.com]
> Sent: 11 April 2016 17:31
> To: solr-user@lucene.apache.org
> Subject: Re: EmbeddedSolr for unit tests in Solr 6
>
> Check for example tests here too:
>
> https://github.com/apache/lucene-solr/tree/master/solr/core/src/test/org/apache/solr
>
> On Mon, Apr 11, 2016 at 12:24 PM, Shalin Shekhar Mangar <
> shalinman...@gmail.com> wrote:
>
> > Please use MiniSolrCloudCluster instead of EmbeddedSolrServer for
> > unit/integration tests.
> >
> > On Mon, Apr 11, 2016 at 2:26 PM, Rohana Rajapakse <
> > rohana.rajapa...@gossinteractive.com> wrote:
> >
> > > Thanks Shawn,
> > >
> > > I am now pointing solrHomeFolder to
> > > lucene-solr-master\solr\server\solr
> > > which contains the correct solr.xml file.
> > > Tried the following two ways to create an EmbeddedSolrServer:
> > >
> > >
> > > 1. CoreContainer corecon =
> > > CoreContainer.createAndLoad(Paths.get(solrHomeFolder));
> > >corecon.load();
> > >SolrClient svr = new EmbeddedSolrServer(corecon,
> > > corename);
> > >
> > >
> > > 2.   SolrClient svr = new EmbeddedSolrServer(Paths.get(solrHomeFolder),
> > > corename);
> > >
> > >
> > > They both throw the same exception (java.lang.NoClassDefFoundError:
> > > Could not initialize class org.apache.solr.servlet.SolrRequestParsers).
> > > org.apache.solr.servlet.SolrRequestParsers class is present in the
> > > solr-core-7.0.0-SNAPSHOT.jar and this jar is present in the
> > > WEB-INF\lib folder (in solr server) and also included as a
> > > dependency jar in the pom.xml of the test project.
> > >
> > > Here is the full stack trace of the exception:
> > >
> > > java.lang.NoClassDefFoundError: Could not initialize class
> > > org.apache.solr.servlet.SolrRequestParsers
> > >   at org.apache.solr.client.solrj.embedded.EmbeddedSolrServer.<init>(EmbeddedSolrServer.java:112)
> > >   at org.apache.solr.client.solrj.embedded.EmbeddedSolrServer.<init>(EmbeddedSolrServer.java:70)
> > >   at com.gossinteractive.solr.DocPhraseUpdateProcessorTest.createEmbeddedSolrServer(DocPhraseUpdateProcessorTest.java:141)
> > >   at com.gossinteractive.solr.DocPhraseUpdateProcessorTest.setUp(DocPhraseUpdateProcessorTest.java:99)
> > >   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> > >   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> > >   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> > >   at java.lang.reflect.Method.invoke(Method.java:497)
> > >   at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:44)
> > >   at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15)
> > >   at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:41)
> > >   at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:27)
> > >   at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:31)
> > >   at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:73)
> > >   at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:46)
> > >   at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:180)
> > >   at org.junit.runners.ParentRunner.access$000(ParentRunner.java:41)
> > >   at org.junit.runners.ParentRunner$1.evaluate(ParentRunner.java:173)
> > >   at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:28)
> > >   at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:31)
> > >   at org.junit.runners.ParentRunner.run(ParentRunner.java:220)
> > >   at org.eclipse.jdt.internal.junit4.runner.JUnit4TestReference.run(JUnit4T

SOLR Upgrade 3.x to 4.10

2016-04-12 Thread abhi Abhishek
Hi All,
I have SOLR 3.6 running currently, and I am planning to upgrade this to
SOLR 4.10. Below are the thoughts we could come up with.

1. In-place upgrade
   I would make the SOLR 4.10 instance a slave of 3.6, copy the indexes,
and optimize this index.

  Will optimizing the Lucene 3.3 index on a SOLR 4 instance (with Lucene
4.10) change the index structure to Lucene 4.10? If not, what would the
version be?
  If I enable docValues on certain fields before issuing the optimize, will
it be able to incorporate that (create .dvd & .dvm files) in the newly
created index?


2. Re-index the data

Seeking advice on the minimum time to upgrade this with most features of SOLR
4.10.

Thanks in Advance

Best Regards,
Abhishek


Shard ranges seem incorrect

2016-04-12 Thread Markus Jelsma
Hi - I've just created a 3-shard, 3-replica collection on Solr 6.0.0 and we 
noticed something odd: the hashing ranges don't make sense (full state.json 
below):
shard1 Range: 80000000-d554ffff
shard2 Range: d5550000-2aa9ffff
shard3 Range: 2aaa0000-7fffffff

We've also noticed ranges not going from 0 to ffffffff for a single-shard 
collection created on 5.5. Another collection created on an older (unknown) release has 
correct shard ranges. Any idea what's going on?
Thanks,
Markus

{"logs":{
"replicationFactor":"3",
"router":{"name":"compositeId"},
"maxShardsPerNode":"9",
"autoAddReplicas":"false",
"shards":{
  "shard1":{
"range":"8000-d554",
"state":"active",
"replicas":{
  "core_node3":{
"core":"logs_shard1_replica3",
"base_url":"http://127.0.1.1:8983/solr;,
"node_name":"127.0.1.1:8983_solr",
"state":"active"},
  "core_node4":{
"core":"logs_shard1_replica1",
"base_url":"http://127.0.1.1:8983/solr;,
"node_name":"127.0.1.1:8983_solr",
"state":"active",
"leader":"true"},
  "core_node8":{
"core":"logs_shard1_replica2",
"base_url":"http://127.0.1.1:8983/solr;,
"node_name":"127.0.1.1:8983_solr",
"state":"active"}}},
  "shard2":{
"range":"d555-2aa9",
"state":"active",
"replicas":{
  "core_node1":{
"core":"logs_shard2_replica1",
"base_url":"http://127.0.1.1:8983/solr;,
"node_name":"127.0.1.1:8983_solr",
"state":"active",
"leader":"true"},
  "core_node2":{
"core":"logs_shard2_replica2",
"base_url":"http://127.0.1.1:8983/solr;,
"node_name":"127.0.1.1:8983_solr",
"state":"active"},
  "core_node9":{
"core":"logs_shard2_replica3",
"base_url":"http://127.0.1.1:8983/solr;,
"node_name":"127.0.1.1:8983_solr",
"state":"active"}}},
  "shard3":{
"range":"2aaa-7fff",
"state":"active",
"replicas":{
  "core_node5":{
"core":"logs_shard3_replica1",
"base_url":"http://127.0.1.1:8983/solr;,
"node_name":"127.0.1.1:8983_solr",
"state":"active",
"leader":"true"},
  "core_node6":{
"core":"logs_shard3_replica2",
"base_url":"http://127.0.1.1:8983/solr;,
"node_name":"127.0.1.1:8983_solr",
"state":"active"},
  "core_node7":{
"core":"logs_shard3_replica3",
"base_url":"http://127.0.1.1:8983/solr;,
"node_name":"127.0.1.1:8983_solr",
"state":"active"}}






RE: EmbeddedSolr for unit tests in Solr 6

2016-04-12 Thread Rohana Rajapakse
Please note that I am not writing unit tests for testing classes in Solr. I 
need a temporary Solr index to test classes in my own application that needs a 
Solr index. I would like to use classes that are available in solr-core and 
solr-solrj jars. I could do this easily in solr-4.x versions using 
EmbeddedSolrServer. I prefer not to extend SolrTestCaseJ4 class. Also 
MiniSolrCloudCluster is not available in solr-core or solr-solrj jar.

What is the best way of doing this in Solr-6.x / Solr-7.0  ?
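
For reference, MiniSolrCloudCluster does not live in solr-core or solr-solrj; it ships in the separate solr-test-framework artifact. A minimal sketch of standing one up outside SolrTestCaseJ4 (the directory paths and the configset name below are examples, not from this thread):

// Requires org.apache.solr:solr-test-framework on the test classpath.
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import org.apache.solr.client.solrj.embedded.JettyConfig;
import org.apache.solr.cloud.MiniSolrCloudCluster;

Path baseDir = Files.createTempDirectory("minicluster");
MiniSolrCloudCluster cluster =
        new MiniSolrCloudCluster(1, baseDir, JettyConfig.builder().build());
try {
    // Upload a configset, then create a collection and index/query
    // through cluster.getSolrClient(), as with any CloudSolrClient.
    cluster.uploadConfigSet(Paths.get("src/test/resources/conf"), "testconf");
} finally {
    cluster.shutdown();
}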

-Original Message-
From: Joe Lawson [mailto:jlaw...@opensourceconnections.com] 
Sent: 11 April 2016 17:31
To: solr-user@lucene.apache.org
Subject: Re: EmbeddedSolr for unit tests in Solr 6

Check for example tests here too:
https://github.com/apache/lucene-solr/tree/master/solr/core/src/test/org/apache/solr

On Mon, Apr 11, 2016 at 12:24 PM, Shalin Shekhar Mangar < 
shalinman...@gmail.com> wrote:

> Please use MiniSolrCloudCluster instead of EmbeddedSolrServer for 
> unit/integration tests.
>
> On Mon, Apr 11, 2016 at 2:26 PM, Rohana Rajapakse < 
> rohana.rajapa...@gossinteractive.com> wrote:
>
> > Thanks Shawn,
> >
> > I am now pointing solrHomeFolder to  
> > lucene-solr-master\solr\server\solr
> > which contains the correct solr.xml file.
> > Tried the following two ways to create an EmbeddedSolrServer:
> >
> >
> > 1. CoreContainer corecon =
> > CoreContainer.createAndLoad(Paths.get(solrHomeFolder));
> >corecon.load();
> >SolrClient svr = new EmbeddedSolrServer(corecon, 
> > corename);
> >
> >
> > 2.   SolrClient svr = new EmbeddedSolrServer(Paths.get(solrHomeFolder),
> > corename);
> >
> >
> > They both throw the same exception (java.lang.NoClassDefFoundError:
> > Could not initialize class org.apache.solr.servlet.SolrRequestParsers).
> > org.apache.solr.servlet.SolrRequestParsers class is present in the 
> > solr-core-7.0.0-SNAPSHOT.jar and this jar is present in the 
> > WEB-INF\lib folder (in solr server) and also included as a 
> > dependency jar in the pom.xml of the test project.
> >
> > Here is the full stack trace of the exception:
> >
> > java.lang.NoClassDefFoundError: Could not initialize class org.apache.solr.servlet.SolrRequestParsers
> > at org.apache.solr.client.solrj.embedded.EmbeddedSolrServer.<init>(EmbeddedSolrServer.java:112)
> > at org.apache.solr.client.solrj.embedded.EmbeddedSolrServer.<init>(EmbeddedSolrServer.java:70)
> > at com.gossinteractive.solr.DocPhraseUpdateProcessorTest.createEmbeddedSolrServer(DocPhraseUpdateProcessorTest.java:141)
> > at com.gossinteractive.solr.DocPhraseUpdateProcessorTest.setUp(DocPhraseUpdateProcessorTest.java:99)
> > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> > at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> > at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> > at java.lang.reflect.Method.invoke(Method.java:497)
> > at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:44)
> > at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15)
> > at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:41)
> > at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:27)
> > at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:31)
> > at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:73)
> > at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:46)
> > at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:180)
> > at org.junit.runners.ParentRunner.access$000(ParentRunner.java:41)
> > at org.junit.runners.ParentRunner$1.evaluate(ParentRunner.java:173)
> > at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:28)
> > at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:31)
> > at org.junit.runners.ParentRunner.run(ParentRunner.java:220)
> > at org.eclipse.jdt.internal.junit4.runner.JUnit4TestReference.run(JUnit4TestReference.java:50)
> > at org.eclipse.jdt.internal.junit.runner.TestExecution.run(TestExecution.java:38)
> > at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:467)
> > at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:683)
> > at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.run(RemoteTestRunner.java:390)
> > at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.main(RemoteTest

Curious case of DataSource.getConnection()

2016-04-12 Thread Srinivas Kashyap
Hi,

In a Solr scheduler class which runs every 'n' seconds, I'm polling 
a database table to do a custom job.

I'm getting the connection to the database through a context file, as below:

try {
    Context initContext = new InitialContext();
    DataSource ds = null;
    if ("tomcat".equals(p.getProperty("server.type"))) {
        Context webContext = (Context) initContext.lookup("java:/comp/env");
        ds = (DataSource) webContext.lookup("");
    } else if ("ws".equals(p.getProperty("server.type"))) { // websphere
        ds = (DataSource) initContext.lookup("");
    }
}

ds.getConnection();


But the connection is not being established. No exception or error is being 
thrown in the console.

The context XML has been double-checked to confirm that all the datasource properties and 
attributes are set properly.

Any reason why I'm not able to establish a database connection?

P.S.: The normal IMPORT process is running unaffected, i.e. data is being indexed into 
Solr with the same datasource configuration in the context XML.
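
If it helps with debugging, here is a defensive sketch of that lookup which fails loudly instead of silently (the JNDI name jdbc/myDataSource is hypothetical - substitute the real one):

import java.sql.Connection;
import java.sql.SQLException;
import javax.naming.Context;
import javax.naming.InitialContext;
import javax.naming.NamingException;
import javax.sql.DataSource;

try {
    Context initContext = new InitialContext();
    Context webContext = (Context) initContext.lookup("java:/comp/env");
    // Hypothetical JNDI name; a wrong name throws NamingException here.
    DataSource ds = (DataSource) webContext.lookup("jdbc/myDataSource");
    try (Connection conn = ds.getConnection()) {
        // connection established; run the custom job here
    }
} catch (NamingException | SQLException e) {
    // Log rather than swallow, so the scheduler thread reports the real cause.
    e.printStackTrace();
}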


Thanks and Regards,
Srinivas Kashyap
Senior Software Engineer
"GURUDAS HERITAGE"
'Block A' , No 59/2, 2nd Floor, 100 Feet Ring Road,
Kadirenahalli, Padmanabhanagar
Banashankari 2nd Stage,
Bangalore-560070
P:  973-986-6105
Bamboo Rose
The only B2B marketplace powered by proven trade engines.
www.BambooRose.com

Make Retail. Fun. Connected. Easier. Smarter. Together. Better.


DISCLAIMER: 
E-mails and attachments from TradeStone Software, Inc. are confidential.
If you are not the intended recipient, please notify the sender immediately by
replying to the e-mail, and then delete it without making copies or using it
in any way. No representation is made that this email or any attachments are
free of viruses. Virus scanning is recommended and is the responsibility of
the recipient.

Re: boost parent fields BlockJoinQuery

2016-04-12 Thread Mikhail Khludnev
Hello,

It's usually

parent_field:"bla bla"^10 {!parent which="is_parent:true" score=max}(child_field:bla)

or

parent_field:"bla bla"^10 +{!parent which="is_parent:true" score=max}(child_field:bla)

There should be no spaces in the child clause; otherwise, extract it to a
parameter and refer to it via v=$param.
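
For instance, following the same pattern (childq is just an example parameter name):

q=parent_field:"bla bla"^10 +{!parent which="is_parent:true" score=max v=$childq}
childq=child_field:(bla bla)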


On Tue, Apr 12, 2016 at 9:56 AM, michael solomon 
wrote:

> Hi,
> I'm using the BlockJoin parent query parser to return the parent of the relevant
> child, i.e.:
> {!parent which="is_parent:true" score=max}(child_field:bla)
>
> Is it possible to boost the parent? Something like:
>
> {!parent which="is_parent:true" score=max}(child_field:bla)
> parent_field:"bla bla"^10
> Thanks,
> Michael
>



-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics





Re: Cache problem

2016-04-12 Thread Bastien Latard - MDPI AG

Thank you both, Bill and Reth!

Here are my current options from the command I use to launch Java:
*/usr/bin/java  -Xms20480m -Xmx40960m -XX:PermSize=10240m 
-XX:MaxPermSize=20480m [...]*


So should I do *-Xms20480m -Xmx20480m* ?
Why? What would it change?

Reminder: the size of my main index is 46Gb... (80Gb all together)



BTW: what's the difference between dark and light grey in the JVM 
representation? (real/virtual memory?)



NOTE: I have only tomcat running on this server (and this is my live 
website - /i.e.: quite critical/).


So if the document cache is using the OS cache, this might be the problem, 
right?
(because it seems to cache every field ==> so all the data returned by 
the query)


kr,
Bast

On 12/04/2016 08:19, Reth RM wrote:

As per solr admin dashboard's memory report, solr jvm is not using memory
more than 20 GB, whereas physical memory is almost full. I'd set
Xms=Xmx=16 GB and let the operating system use the rest. And regarding caches:
the filter cache hit ratio looks good so it should not be a concern. And afaik,
document cache actually uses OS cache. Overall, I'd reduce memory allocated
to jvm as said above and try.




On Mon, Apr 11, 2016 at 7:40 PM,  wrote:


You do need to optimize to get rid of the deleted docs probably...

That is a lot of deleted docs

Bill Bell
Sent from mobile



On Apr 11, 2016, at 7:39 AM, Bastien Latard - MDPI AG

 wrote:

Dear Solr experts :),

I read this very interesting post 'Understanding and tuning your Solr caches'!
This is the only good document that I was able to find after searching for 1 day!

I was using Solr for 2 years without knowing in detail what it was caching (because I did not need to understand it before).
I had to take a look since I needed to restart my tomcat regularly in order to improve performance...

But I now have 2 questions:
1) How can I know how much RAM my solr is really using (especially for caching)?
2) Could you have a quick look at the following images and tell me if I'm doing something wrong?

Note: my index contains 66 million articles with several text fields stored.

My solr contains several cores (all together ~80Gb big), but almost only the one below is used.

I have the feeling that a lot of data is always stored in RAM... and it keeps getting bigger and bigger all the time...

(after restart)
$ sudo tail -f /var/log/tomcat7/catalina.out | grep GC

[...] after a few minutes

Here are some images that can show you some stats about my Solr performance...

Kind regards,
Bastien Latard




Kind regards,
Bastien Latard
Web engineer
--
MDPI AG
Postfach, CH-4005 Basel, Switzerland
Office: Klybeckstrasse 64, CH-4057
Tel. +41 61 683 77 35
Fax: +41 61 302 89 18
E-mail:
lat...@mdpi.com
http://www.mdpi.com/



Re: Limiting regex queries

2016-04-12 Thread Vincenzo D'Amore
Hi Michael,

I suggest wrapping the query parser you're using now with a custom one.
That should help to handle the case where the query has a range with a
large number.

I did something like that with Edismax.

https://github.com/freedev/solr-synonyms-query-parser-plugin

Take a look at the createParser method.

When createParser is executed you can choose whether to rewrite the query
parameters or use a custom list.
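
As a concrete illustration, a minimal pre-parse guard along those lines (a sketch, not Solr API; the threshold of 1000 is arbitrary) could reject bounded quantifiers with large numbers before the query ever reaches the parser:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public final class RegexQueryGuard {
    // Matches bounded quantifiers such as {1234567} or {1234, 123445678}.
    private static final Pattern BOUNDED_REPEAT =
            Pattern.compile("\\{(\\d+)(?:\\s*,\\s*(\\d+))?\\}");
    private static final long MAX_REPEAT = 1000;

    public static boolean isTooExpensive(String qstr) {
        Matcher m = BOUNDED_REPEAT.matcher(qstr);
        while (m.find()) {
            String bound = m.group(2) != null ? m.group(2) : m.group(1);
            // Long digit runs exceed the limit by definition; checking the
            // length first also avoids numeric overflow when parsing.
            if (bound.length() > 4 || Long.parseLong(bound) > MAX_REPEAT) {
                return true;
            }
        }
        return false;
    }
}

The custom parser's createParser could call such a check on the raw query string and throw an exception for offending queries instead of parsing them.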

Hope this helps,
Vincenzo


On Sun, Apr 10, 2016 at 11:38 PM, Michael Harkins  wrote:

> Well, the original architecture is out of my hands, but when someone
> sends in a query like that, if the range is a large number, my system
> basically shuts down and the CPU spikes with a large increase in
> memory usage. The queries are for strings. The query itself was an
> accident but I want to be able to prevent an accident from bringing
> down the index.
>
>
> > On Apr 10, 2016, at 12:34 PM, Erick Erickson 
> wrote:
> >
> > OK, why is this a problem? This smells like an XY problem,
> > you want to take some specific action, but it's not at all
> > clear what the problem is. There might be other ways
> > of doing this.
> >
> > If you're allowing regexes on numeric fields, using real
> > number fields (trie) and using range queries is a much
> > better way to go.
> >
> > Best,
> > Erick
> >
> >> On Sun, Apr 10, 2016 at 9:28 AM, Michael Harkins 
> wrote:
> >> Hey all,
> >>
> >> I am using lucene and solr version 4.2, and was wondering what would
> >> be the best way to not allow regex queries with very large numbers.
> >> Something like blah{1234567} or blah{1234, 123445678}
>



-- 
Vincenzo D'Amore
email: v.dam...@gmail.com
skype: free.dev
mobile: +39 349 8513251


Re: Solr Sharding Strategy

2016-04-12 Thread Bhaumik Joshi
OK, I will try pausing the indexing fully and will check the impact.

In the performance test, queries are issued sequentially.

Thanks & Regards,
Bhaumik Joshi

From: Toke Eskildsen 
Sent: Monday, April 11, 2016 11:13 PM
To: Bhaumik Joshi
Cc: solr-user@lucene.apache.org
Subject: Re: Solr Sharding Strategy

On Tue, 2016-04-12 at 05:57 +0000, Bhaumik Joshi wrote:

> //Insert Document
> UpdateResponse resp = cloudServer.add(doc, 1000);
>
Don't insert documents one at a time, if it can be avoided:
https://lucidworks.com/blog/2015/10/05/really-batch-updates-solr-2/


Try pausing the indexing fully when you do your query test, to check how
big the impact of indexing is.

When you run your query performance test, are the queries issued
sequentially or in parallel?


- Toke Eskildsen, State and University Library, Denmark



Re: Facet heatmaps: cluster coordinates based on average position of docs

2016-04-12 Thread Reth RM
Can you please be a bit more specific about what type of query you are making
and what other values you are expecting, with an example?

If you know of a specific JIRA for the use case, you can write comments
there.


On Mon, Apr 11, 2016 at 5:54 PM, Anton K.  wrote:

> Anyone?
>
> Or how can i contact with facet heatmaps creator?
>
> 2016-04-07 18:42 GMT+03:00 Anton K. :
>
> > I am working with the new Solr feature: facet heatmaps. It works great; I
> > create clusters on my map with counts. When a user clicks on a cluster I zoom in
> > that area and I might show him more clusters or documents (based on the current
> > zoom level).
> >
> > But all my cluster icons (I use a round one, see the screenshot below) are placed
> > straight in the center of the clusters' rectangles:
> >
> > https://dl.dropboxusercontent.com/u/1999619/images/map_grid3.png
> >
> > Some clusters can be in the sea, and so on. It also feels unnatural in my
> > case to have icons placed so orderly on the world map.
> >
> > I want to place cluster icons at average coordinates based on the coordinates of
> > all my docs inside the cluster. Is there any way to achieve this? I am trying
> > to use the stats component for the facet heatmap but it isn't implemented yet.
> >
>


Re: Cache problem

2016-04-12 Thread Reth RM
As per solr admin dashboard's memory report, solr jvm is not using memory
more than 20 GB, whereas physical memory is almost full. I'd set
Xms=Xmx=16 GB and let the operating system use the rest. And regarding caches:
the filter cache hit ratio looks good so it should not be a concern. And afaik,
document cache actually uses OS cache. Overall, I'd reduce memory allocated
to jvm as said above and try.
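
Concretely, that advice translates to launching with something like (a sketch; adjust the size to your machine):

/usr/bin/java -Xms16g -Xmx16g [...]

Setting Xms equal to Xmx avoids heap-resize pauses, and the RAM left over goes to the OS page cache, which is what actually caches the index files on disk.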




On Mon, Apr 11, 2016 at 7:40 PM,  wrote:

> You do need to optimize to get rid of the deleted docs probably...
>
> That is a lot of deleted docs
>
> Bill Bell
> Sent from mobile
>
>
> > On Apr 11, 2016, at 7:39 AM, Bastien Latard - MDPI AG
>  wrote:
> >
> > Dear Solr experts :),
> >
> > I read this very interesting post 'Understanding and tuning your Solr
> caches' !
> > This is the only good document that I was able to find after searching
> for 1 day!
> >
> > I was using Solr for 2 years without knowing in details what it was
> caching...(because I did not need to understand it before).
> > I had to take a look since I needed to restart (regularly) my tomcat in
> order to improve performances...
> >
> > But I now have 2 questions:
> > 1) How can I know how much RAM is my solr using in real (especially for
> caching)?
> > 2) Could you have a quick look into the following images and tell me if
> I'm doing something wrong?
> >
> > Note: my index contains 66 millions of articles with several text fields
> stored.
> > 
> >
> > My solr contains several cores (all together are ~80Gb big), but almost
> only the one below is used.
> >
> > I have the feeling that a lot of data is always stored in RAM...and
> getting bigger and bigger all the time...
> >
> > 
> > 
> >
> > (after restart)
> > $ sudo tail -f /var/log/tomcat7/catalina.out | grep GC
> > 
> > [...] after a few minutes
> > 
> >
> > Here are some images, that can show you some stats about my Solr
> performances...
> > 
> > 
> > 
> >
> > 
> >
> > Kind regards,
> > Bastien Latard
> >
> >
>


Re: Solr Sharding Strategy

2016-04-12 Thread Toke Eskildsen
On Tue, 2016-04-12 at 05:57 +0000, Bhaumik Joshi wrote:

> //Insert Document
> UpdateResponse resp = cloudServer.add(doc, 1000);
> 
Don't insert documents one at a time, if it can be avoided:
https://lucidworks.com/blog/2015/10/05/really-batch-updates-solr-2/
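
A sketch of what batching looks like with SolrJ (cloudServer and doc are the names from the quoted snippet; docs as the document source and the batch size of 1000 are just example assumptions):

import java.util.ArrayList;
import java.util.List;
import org.apache.solr.common.SolrInputDocument;

List<SolrInputDocument> batch = new ArrayList<>();
for (SolrInputDocument doc : docs) {
    batch.add(doc);
    if (batch.size() >= 1000) {
        cloudServer.add(batch, 1000);  // one round trip, same commitWithin as before
        batch.clear();
    }
}
if (!batch.isEmpty()) {
    cloudServer.add(batch, 1000);      // flush the remainder
}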


Try pausing the indexing fully when you do your query test, to check how
big the impact of indexing is.

When you run your query performance test, are the queries issued
sequentially or in parallel?


- Toke Eskildsen, State and University Library, Denmark




Re: Solrj API for Managed Resources

2016-04-12 Thread Reth RM
I think it's best to use the available APIs. Here is the list of APIs for
managing synonyms and stop words:

https://cwiki.apache.org/confluence/display/solr/Managed+Resources

And this blog post with details
https://lucidworks.com/blog/2014/03/31/introducing-solrs-restmanager-and-managed-stop-words-and-synonyms/
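
There is no dedicated SolrJ client for managed resources, so one option is to call the REST endpoint directly from Java; a sketch with plain java.net (the host, collection and resource names are examples):

import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

URL url = new URL("http://localhost:8983/solr/mycollection/schema/analysis/synonyms/english");
HttpURLConnection con = (HttpURLConnection) url.openConnection();
con.setRequestMethod("PUT");                      // PUT adds/updates managed entries
con.setRequestProperty("Content-Type", "application/json");
con.setDoOutput(true);
String body = "{\"mad\":[\"angry\",\"upset\"]}";  // the synonym mapping to register
try (OutputStream os = con.getOutputStream()) {
    os.write(body.getBytes(StandardCharsets.UTF_8));
}
System.out.println("HTTP " + con.getResponseCode());

Remember that the core has to be reloaded before newly added entries take effect.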



On Tue, Apr 12, 2016 at 4:39 AM, iambest  wrote:

> Is there a solrj API to add synonyms or stop words using the Managed
> Resources API? I have to programmatically add them, what is the best way?
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Solrj-API-for-Managed-Resources-tp4269454.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: Specify relative path to current core conf folder when it's originally relative to solr home

2016-04-12 Thread Reth RM
 I think there are some root paths defined in the solr.sh file, which is in the
bin directory. You can pick the root-directory variable from there and use it.
For example, in solrconfig.xml there is a value such as
"${solr.install.dir:../../../..}". I think solr.install.dir is the root
path and its definition is set in solr.sh. I'm not sure, but it's worth a
try.
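
Another possibility, if your Solr version defines the implicit per-core properties for variable substitution (solr.core.instanceDir is the assumption here - verify it exists in your release), is an untested sketch like:

fileDir="${solr.core.instanceDir}/conf/myfiledir"

If the property value already ends with a separator, the doubled slash is usually harmless.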




On Tue, Apr 12, 2016 at 9:34 AM, scott.chu  wrote:

> I got a custom tokenizer. When configuring it, there's an attribute
> 'fileDir', whose value is a path relative to solr home. But I wish it could
> be relative to the current core. Is there an out-of-the-box system variable, say
> {current_core}, that I can use in the value? For example,
>
> solr home = /solr5/server/solr
> In the current core's solrconfig.xml, I can specify
> 
> 
> myfiledir
> 
> 
>
> so it will refer to /solr5/server/solr/myfiledir.
>
> But I want to put myfiledir under the current core's conf folder. I wish there were
> something such as:
> ...
> {current_core}/conf/myfiledir
> ...
>
> Is it possible?
>