How to achieve exact string match query which includes spaces and quotes

2016-01-13 Thread Alok Bhandari
Hello ,

I am using Solr 5.2.

I have a field defined as "string" field type. It has values like

DOC-1 => abc ".. I am " not ? test
DOC-2 => abc ".. 
Each value is a single string. I want to query for documents that exactly
match a given string, i.e. it should return only DOC-1 when I query for
'abc ".. I am " not ? test' and only DOC-2 when I query for 'abc "..'.

Please let me know how I can achieve this, and which defType I should use.

Thanks.





Re: realtime get requirements

2016-01-13 Thread Alessandro Benedetti
Hi Matteo,
Which Solr version are you using?
Prior to 5.1, the suggester was built by default on startup, causing long
waiting times (https://issues.apache.org/jira/browse/SOLR-6845).

If you are on Solr >= 5.1, I highly discourage the use of
buildOnStartup=true unless it is a specific requirement.
As Erick was saying:


>- *The “buildOnStartup” parameter should be set to “false”*. Really.
>This can lead to *very* long startup times, many minutes on very large
>indexes. Do you really want to re-read, decompress and add the field
>from *every* document to the suggester *every time you start Solr!* Likely
>not, but you can if you want to.
>- *The “buildOnCommit” parameter should be set to “false”*. Really. Do
>you really want to re-read, decompress and add the field from
>*every* document to the suggester *every time you commit!* Likely
>not, but you can if you want to.
>
> In detail, for the *DocumentDictionary* during the building process, for *ALL
> the documents* in the index:
>
>- *the stored content* of the configured field is read from the disk (
>* stored="true" *is required for the field to have the Suggester
>working)
>
>
>- the compressed content is decompressed ( remember that Solr stores
>the plain content of a field applying a compression algorithm [3] )
>
>
>- the suggester data structure is built
>
> We must pay close attention to this sentence:
> "for ALL the documents" -> no delta dictionary building happens


So take extra care every time you decide to build the Suggester!
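For reference, a minimal solrconfig.xml sketch of the setup being discussed,
with both build flags off (the component name, field, and lookup implementation
are hypothetical, not from this thread):

<searchComponent name="suggest" class="solr.SuggestComponent">
  <lst name="suggester">
    <str name="name">mySuggester</str>
    <str name="lookupImpl">FuzzyLookupFactory</str>
    <str name="dictionaryImpl">DocumentDictionaryFactory</str>
    <str name="field">title</str>
    <str name="buildOnStartup">false</str>
    <str name="buildOnCommit">false</str>
  </lst>
</searchComponent>

With both flags false, the structure is only built when you explicitly pass
suggest.build=true to the suggest handler.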


Cheers

On 12 January 2016 at 18:18, Erick Erickson  wrote:

> right, suggester had some bad behavior where it rebuilt on startup despite
> setting the flag to _not_ do that. See:
>
> Some details here:
>
> https://lucidworks.com/blog/2015/03/04/solr-suggester/
>
> Best,
> Erick
>
> On Tue, Jan 12, 2016 at 8:12 AM, Matteo Grolla 
> wrote:
> > ok,
> >   suggester was responsible for the long time to load.
> > Thanks
> >
> > 2016-01-12 15:47 GMT+01:00 Matteo Grolla :
> >
> >> Thanks Shawn,
> >>  On a production solr instance some cores take a long time to load
> >> while other of similar size take much less. One of the differences
> between
> >> these cores is the directoryFactory.
> >>
> >> 2016-01-12 15:34 GMT+01:00 Shawn Heisey :
> >>
> >>> On 1/12/2016 2:50 AM, Matteo Grolla wrote:
> >>> > and that it works with any directory factory? (Not just
> >>> > NRTCachingDirectoryFactory)
> >>>
> >>> Realtime Get relies on the updateLog to return uncommitted documents,
> >>> and standard Lucene mechanisms to return documents that have already
> >>> been committed.  It should work with any directory.
> >>>
> >>> I would like to know why you're changing the directory.  The only time
> >>> the directory should be changed is if you want to work with something
> >>> exotic like HDFS.  With a typical installation using a typical
> >>> filesystem, NRTCachingDirectoryFactory is absolutely the best option
> and
> >>> should not be replaced with anything else.  The NRT factory uses MMap,
> >>> so there is no need to switch to MMapDirectoryFactory.
> >>>
> >>> Thanks,
> >>> Shawn
> >>>
> >>>
> >>
>



-- 
--

Benedetti Alessandro
Visiting card : http://about.me/alessandro_benedetti

"Tyger, tyger burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry?"

William Blake - Songs of Experience -1794 England
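For reference on the Realtime Get point Shawn makes in the quote above: it
depends on the updateLog and the /get handler in solrconfig.xml. A minimal
sketch along the lines of the default configs:

<updateHandler class="solr.DirectUpdateHandler2">
  <updateLog>
    <str name="dir">${solr.ulog.dir:}</str>
  </updateLog>
</updateHandler>

<requestHandler name="/get" class="solr.RealTimeGetHandler">
  <lst name="defaults">
    <str name="omitHeader">true</str>
  </lst>
</requestHandler>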


Re: How to achieve exact string match query which includes spaces and quotes

2016-01-13 Thread Binoy Dalal
Just query the string field and nothing else.
String fields only match on exact values.
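A hedged illustration, assuming a string field named exact_s (the field name is
hypothetical): the term query parser, a close cousin of the raw parser
suggested later in this digest, takes the remainder of the parameter verbatim,
so the spaces and quotes need no escaping:

q={!term f=exact_s}abc ".. I am " not ? test

The equivalent with the default lucene parser requires quoting and escaping:

q=exact_s:"abc \".. I am \" not ? test"

In either case the value still needs URL-encoding when sent over HTTP.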

On Wed, 13 Jan 2016, 16:52 Alok Bhandari 
wrote:

> Hello ,
>
> I am using Solr 5.2.
>
> I have a field defined as "string" field type. It has values like
>
> DOC-1 => abc ".. I am " not ? test
> DOC-2 => abc "..
> Each value is a single string. I want to query for documents that exactly
> match a given string, i.e. it should return only DOC-1 when I query for
> 'abc ".. I am " not ? test' and only DOC-2 when I query for 'abc "..'.
>
> Please let me know how I can achieve this, and which defType I should use.
>
> Thanks.
>
>
>
>
-- 
Regards,
Binoy Dalal


Re: How to achieve exact string match query which includes spaces and quotes

2016-01-13 Thread Binoy Dalal
No.

On Wed, 13 Jan 2016, 16:58 Alok Bhandari 
wrote:

> Hi Binoy thanks.
>
> But does it matter which query parser I use? Shall I use the "lucene"
> parser or the "edismax" parser?
>
>
>
>
-- 
Regards,
Binoy Dalal


Re: How to achieve exact string match query which includes spaces and quotes

2016-01-13 Thread Alok Bhandari
Hi Binoy thanks.

But does it matter which query parser I use? Shall I use the "lucene" parser or
the "edismax" parser?





solr BooleanClauses issue with space

2016-01-13 Thread sara hajili
hi all,
what exactly is the difference between a space and OR in a Solr query?
I mean, what is the difference between
q = solr OR lucene OR search
and this
q = solr lucene search?

Solr's default boolean occurrence is OR, isn't it?
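For illustration, the default operator can be overridden per request with
q.op; under q.op=OR the two forms above are equivalent (terms are assumptions,
as in the question):

q=solr lucene search&q.op=OR    <- same as q=solr OR lucene OR search
q=solr lucene search&q.op=AND   <- all three terms required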


Re: Kerberos ticket not renewing when storing index on Kerberized HDFS

2016-01-13 Thread Andrew Bumstead
Thanks Ishan, I've raised a JIRA for it.

On 11 January 2016 at 20:17, Ishan Chattopadhyaya  wrote:

> Not sure how reliably renewals are taken care of in the context of
> kerberized HDFS, but here's my 10-15 minute analysis.
> Seems to me that the auto renewal thread is not spawned [0]. This relies on
> kinit.
> Not sure if having a login configuration with renewTGT is sufficient (which
> seems to be passed in by default, unless there's a jaas config being
> explicitly passed in with renewTGT=false). As per the last comments from
> Devraj & Owen [1], kinit-based logins have worked more reliably.
>
> If you can rule out any setup issues, I suggest you file a JIRA and someone
> who has worked on the HdfsDirectoryFactory would be able to suggest better.
> Thanks,
> Ishan
>
> [0] -
>
> http://grepcode.com/file/repo1.maven.org/maven2/org.apache.hadoop/hadoop-common/2.7.1/org/apache/hadoop/security/UserGroupInformation.java#UserGroupInformation.spawnAutoRenewalThreadForUserCreds%28%29
>
> [1] - https://issues.apache.org/jira/browse/HADOOP-6656
>
> On Fri, Jan 8, 2016 at 10:21 PM, Andrew Bumstead <
> andrew.bumst...@bigdatapartnership.com> wrote:
>
> > Hello,
> >
> > I have Solr Cloud configured to stores its index files on a Kerberized
> HDFS
> > (I followed documentation at
> > https://cwiki.apache.org/confluence/display/solr/Running+Solr+on+HDFS),
> > and
> > have been able to index some documents with the files being written to
> the
> > HDFS as expected. However, it appears that some time after starting, Solr
> > is unable to connect to HDFS as it no longer has a valid Kerberos TGT.
> The
> > time-frame of this occurring is consistent with my default Kerberos
> ticket
> > lifetime of 24 hours, so it appears as though Solr is not renewing its
> > Kerberos ticket upon expiry. A restart of Solr resolves the issue again
> for
> > 24 hours.
> >
> > Is there any configuration I can add to make Solr automatically renew its
> > ticket or is this an issue with Solr?
> >
> > The following is the stack trace I am getting in Solr.
> >
> > java.io.IOException: Failed on local exception: java.io.IOException:
> > Couldn't setup connection for solr/
> sandbox.hortonworks@hortonworks.com
> > to sandbox.hortonworks.com/10.0.2.15:8020; Host Details : local host
> is: "
> > sandbox.hortonworks.com/10.0.2.15"; destination host is: "
> > sandbox.hortonworks.com":8020;
> > at
> org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:772)
> > at org.apache.hadoop.ipc.Client.call(Client.java:1472)
> > at org.apache.hadoop.ipc.Client.call(Client.java:1399)
> > at
> >
> >
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232)
> > at com.sun.proxy.$Proxy10.renewLease(Unknown Source)
> > at
> >
> >
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.renewLease(ClientNamenodeProtocolTranslatorPB.java:571)
> > at sun.reflect.GeneratedMethodAccessor7.invoke(Unknown Source)
> > at
> >
> >
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> > at java.lang.reflect.Method.invoke(Method.java:606)
> > at
> >
> >
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
> > at
> >
> >
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
> > at com.sun.proxy.$Proxy11.renewLease(Unknown Source)
> > at
> org.apache.hadoop.hdfs.DFSClient.renewLease(DFSClient.java:879)
> > at
> org.apache.hadoop.hdfs.LeaseRenewer.renew(LeaseRenewer.java:417)
> > at org.apache.hadoop.hdfs.LeaseRenewer.run(LeaseRenewer.java:442)
> > at
> > org.apache.hadoop.hdfs.LeaseRenewer.access$700(LeaseRenewer.java:71)
> > at
> org.apache.hadoop.hdfs.LeaseRenewer$1.run(LeaseRenewer.java:298)
> > at java.lang.Thread.run(Thread.java:745)
> > Caused by: java.io.IOException: Couldn't setup connection for solr/
> > sandbox.hortonworks@hortonworks.com to
> > sandbox.hortonworks.com/10.0.2.15:8020
> > at org.apache.hadoop.ipc.Client$Connection$1.run(Client.java:672)
> > at java.security.AccessController.doPrivileged(Native Method)
> > at javax.security.auth.Subject.doAs(Subject.java:415)
> > at
> >
> >
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
> > at
> >
> >
> org.apache.hadoop.ipc.Client$Connection.handleSaslConnectionFailure(Client.java:643)
> > at
> > org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:730)
> > at
> > org.apache.hadoop.ipc.Client$Connection.access$2800(Client.java:368)
> > at org.apache.hadoop.ipc.Client.getConnection(Client.java:1521)
> > at org.apache.hadoop.ipc.Client.call(Client.java:1438)
> > ... 16 more
> > Caused by: javax.security.sasl.SaslException: GSS initiate failed [Caused
> 

RE: Pro and cons of using Solr Cloud vs standard Master Slave Replica

2016-01-13 Thread Gian Maria Ricci - aka Alkampfer
Thanks.

--
Gian Maria Ricci
Cell: +39 320 0136949



-Original Message-
From: Shawn Heisey [mailto:apa...@elyograg.org] 
Sent: lunedì 11 gennaio 2016 18:28
To: solr-user@lucene.apache.org
Subject: Re: Pro and cons of using Solr Cloud vs standard Master Slave Replica

On 1/11/2016 4:28 AM, Gian Maria Ricci - aka Alkampfer wrote:
> a customer need a comprehensive list of all pro and cons of using 
> standard Master Slave replica VS using Solr Cloud. I’m interested 
> especially in query performance consideration, because in this 
> specific situation the rate of new documents is really slow, but the 
> amount of data is about 50 millions of document, and the index size on 
> disk for single core is about 30 GB.

The primary advantage to SolrCloud is that SolrCloud handles most of the 
administrative and operational details for you automatically.

SolrCloud is a little more complicated to set up initially, because you must 
worry about Zookeeper as well as Solr, but once it's properly set up, there is 
no single point of failure.

> Such amount of data should be easily handled by a Master Slave replica 
> with a  single core replicated on a certain number of slaves, but we 
> need to evaluate also the option of SolrCloud, especially for fault 
> tolerance.
>

Once you're beyond initial setup, fault tolerance with SolrCloud is much easier 
than master/slave replication.  Switching a slave to a master is possible, but 
the procedure is somewhat complicated.  SolrCloud does not
*have* masters, it is a true cluster.

With master/slave replication, the master handles all indexing, and the 
finished index segments are copied to the slaves via HTTP, and the slaves 
simply need to open them.  SolrCloud does indexing on all shard replicas, 
nearly simultaneously.  Usually this is an advantage, not a disadvantage, but 
in heavy indexing situations master/slave replication
*might* show better performance on the slaves.

Thanks,
Shawn
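A minimal sketch of the master/slave replication setup Shawn describes (host,
core name, files, and poll interval are hypothetical). On the master's
solrconfig.xml:

<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="master">
    <str name="replicateAfter">commit</str>
    <str name="confFiles">schema.xml,stopwords.txt</str>
  </lst>
</requestHandler>

and on each slave:

<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="slave">
    <str name="masterUrl">http://master-host:8983/solr/core1</str>
    <str name="pollInterval">00:00:60</str>
  </lst>
</requestHandler>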



Re: Pro and cons of using Solr Cloud vs standard Master Slave Replica

2016-01-13 Thread Shivaji Dutta
- SolrCloud uses zookeeper to manage HA
- Zookeeper is a standard for all HA in Apache Hadoop
- You have collections which will manage your shards across nodes
- SolrJ Client is now fault tolerant with CloudSolrClient

This is the way future direction of the product will go.



On 1/13/16, 5:58 AM, "Gian Maria Ricci - aka Alkampfer"
 wrote:

>Thanks.
>
>--
>Gian Maria Ricci
>Cell: +39 320 0136949
>
>
>
>-Original Message-
>From: Shawn Heisey [mailto:apa...@elyograg.org]
>Sent: lunedì 11 gennaio 2016 18:28
>To: solr-user@lucene.apache.org
>Subject: Re: Pro and cons of using Solr Cloud vs standard Master Slave
>Replica
>
>On 1/11/2016 4:28 AM, Gian Maria Ricci - aka Alkampfer wrote:
>> a customer need a comprehensive list of all pro and cons of using
>> standard Master Slave replica VS using Solr Cloud. I'm interested
>> especially in query performance consideration, because in this
>> specific situation the rate of new documents is really slow, but the
>> amount of data is about 50 millions of document, and the index size on
>> disk for single core is about 30 GB.
>
>The primary advantage to SolrCloud is that SolrCloud handles most of the
>administrative and operational details for you automatically.
>
>SolrCloud is a little more complicated to set up initially, because you
>must worry about Zookeeper as well as Solr, but once it's properly set
>up, there is no single point of failure.
>
>> Such amount of data should be easily handled by a Master Slave replica
>> with a  single core replicated on a certain number of slaves, but we
>> need to evaluate also the option of SolrCloud, especially for fault
>> tolerance.
>>
>
>Once you're beyond initial setup, fault tolerance with SolrCloud is much
>easier than master/slave replication.  Switching a slave to a master is
>possible, but the procedure is somewhat complicated.  SolrCloud does not
>*have* masters, it is a true cluster.
>
>With master/slave replication, the master handles all indexing, and the
>finished index segments are copied to the slaves via HTTP, and the slaves
>simply need to open them.  SolrCloud does indexing on all shard replicas,
>nearly simultaneously.  Usually this is an advantage, not a disadvantage,
>but in heavy indexing situations master/slave replication
>*might* show better performance on the slaves.
>
>Thanks,
>Shawn
>
>



RE: Change leader in SolrCloud

2016-01-13 Thread Gian Maria Ricci - aka Alkampfer
Thanks.

--
Gian Maria Ricci
Cell: +39 320 0136949



-Original Message-
From: Alessandro Benedetti [mailto:abenede...@apache.org] 
Sent: martedì 12 gennaio 2016 10:52
To: solr-user@lucene.apache.org
Subject: Re: Change leader in SolrCloud

I would like to make a special mention of the update request processor chain
mechanism in SolrCloud. [1] Quoting the documentation:

In a distributed SolrCloud setup, all processors in the chain
> *before* the DistributedUpdateProcessor are run on the first node that
> receives an update from the client, regardless of this node's status as
> a leader or replica.  The DistributedUpdateProcessor then forwards the
> update to the appropriate shard leader for the update (or to multiple
> leaders in the event of an update that affects multiple documents,
> such as a delete by query, or commit). The shard leader uses a
> transaction log to apply Atomic Updates & Optimistic Concurrency
> and then forwards the update to all of the shard replicas.
> The leader and each replica run all of the processors in the chain 
> that are listed *after* the DistributedUpdateProcessor.
>

This means you need to be careful when you have a heavy update
processor chain running before the DistributedUpdateProcessor.
In that case the first node that receives the document to be indexed is going to
have much more work (running all the update request processors before the
distribution).
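A minimal sketch of such a chain in solrconfig.xml; the heavy custom processor
is a hypothetical placeholder, the other factories are standard:

<updateRequestProcessorChain name="mychain">
  <!-- runs once, only on the node that first receives the update -->
  <processor class="com.example.HeavyEnrichmentProcessorFactory"/>
  <processor class="solr.DistributedUpdateProcessorFactory"/>
  <!-- runs on the leader and on each replica -->
  <processor class="solr.LogUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>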

All the considerations already mentioned are of course still valid.

Cheers


[1]
https://cwiki.apache.org/confluence/display/solr/Update+Request+Processors

On 12 January 2016 at 08:19, Gian Maria Ricci - aka Alkampfer < 
alkamp...@nablasoft.com> wrote:

> Understood, thanks. I thought that the leader sent data to the other
> shards after indexing and autocommit took place, but I know that this
> is not the optimal situation. By sending all documents to all shards, Solr
> can guarantee consistency of the data.
>
> Now everything is more clear. Thanks for the explanation.
>
> --
> Gian Maria Ricci
> Cell: +39 320 0136949
>
>
> -Original Message-
> From: Erick Erickson [mailto:erickerick...@gmail.com]
> Sent: martedì 12 gennaio 2016 02:27
> To: solr-user 
> Subject: Re: Change leader in SolrCloud
>
> bq:  It seems to me a huge
> wasting of resources.
>
> How else would you guarantee consistency? Especially taking in to 
> account Lucene's write-once segments? Master/Slave sidesteps the 
> problem by moving entire, closed segments to the slave, but as Shawn 
> says if the master goes down the slaves don't have _any_ docs from the 
> not-closed segments.
>
> Best,
> Erick
>
> On Mon, Jan 11, 2016 at 1:42 PM, Shawn Heisey  wrote:
> > On 1/11/2016 1:23 PM, Gian Maria Ricci - aka Alkampfer wrote:
> >> Ok, this implies that if I have X replicas of a shard, the document is
> >> indexed X+1 times? One for each replica plus the leader shard? It
> >> seems to me a huge wasting of resources.
> >>
> >> In a Master/slave scenario indexing takes places only on master 
> >> node,
> then slave replicates analyzed data.
> >
> > The leader *is* a replica.  So if you have a replicationFactor of 
> > three, you have three replicas for each shard.  For each shard, one 
> > of those replicas gets elected to be the leader.  You do not have a 
> > leader and two replicas.
> >
> > The above is perhaps extremely pedantic, but understanding how SolrCloud
> > works requires understanding that being temporarily assigned the 
> > leader role does not change how the replica works, it just adds some 
> > additional coordination responsibilities.
> >
> > To answer your question, let's assume you build an index with 
> > replicationFactor=3.  No new replicas are added, and all machines 
> > are up.  In that situation, each document gets indexed a total of 
> > three
> times.
> >
> > In return for this additional complexity and resource usage, you 
> > don't have a single point of failure for indexing.  With 
> > master/slave replication, if your master goes down for any length of 
> > time, you must reconfigure all of your remaining Solr nodes to change the 
> > master.
> > Chances are very good that you will experience downtime.
> >
> > Thanks,
> > Shawn
> >
>



--
--

Benedetti Alessandro
Visiting card : http://about.me/alessandro_benedetti

"Tyger, tyger burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry?"

William Blake - Songs of Experience -1794 England


Re: ConcurrentUpdateSolrClient vs CloudSolrClient for bulk update to SolrCloud

2016-01-13 Thread Shivaji Dutta
Erik and Shawn

Thanks for the input. In the process below we are posting the documents to
Solr over an HTTP connection in batches.

Trying to solve the same problem but in a different way:

I have used Lucene back in the day, where I would index the documents
locally on disk and run search queries on them. Big fan of Lucene.

I was wondering if there is any possibility like that.

If I have a repository of millions of documents, would it not make sense
to just index them locally and then copy the index files over to Solr and
have it read from them?

Any thoughts or blogs that could help me, or maybe I am overthinking
this?

Thanks
Shivaji


On 1/13/16, 9:12 AM, "Erick Erickson"  wrote:

>It's usually not all that difficult to write a multi-threaded
>client that uses CloudSolrClient, or even fire up multiple
>instances of the SolrJ client (assuming they can work
>on discrete sections of the documents you need to index).
>
>That avoids the problem Shawn alludes to. Plus other
>issues. If you do _not_ use CloudSolrClient, then all the
>docs go to some node in the system that then sub-divides
>the list (and you really should update in batches, see:
>https://lucidworks.com/blog/2015/10/05/really-batch-updates-solr-2/)
>then the node that receives the packet sub-divides it
>into groups based on what shard they should be part of
>and forwards them to the leaders for that shard, very
>significantly increasing the numbers of conversations
>being carried on between Solr nodes. Times the number
>of threads you're specifying with CUSC (I really regret
>the renaming from ConcurrentUpdateSolrServer, I liked
>writing CUSS).
>
>With CloudSolrClient, you can scale nearly linearly with
>the number of shards. Not so with CUSC.
>
>FWIW,
>Erick
>
>On Tue, Jan 12, 2016 at 8:06 PM, Shawn Heisey  wrote:
>> On 1/12/2016 7:42 PM, Shivaji Dutta wrote:
>>> Now since with ConcurrentUpdateSolrClient I am able to use a queue and
>>>a pool of threads, which makes it more attractive to use over
>>>CloudSolrClient which will use a HTTPSolrClient once it gets a set of
>>>nodes to do the updates.
>>>
>>> What is the recommended API for updating large amounts of documents
>>>with higher throughput rate.
>>
>> ConcurrentUpdateSolrClient has one flaw -- it swallows all exceptions
>> that happen during indexing.  Your application will never know about any
>> problems that occur during indexing.  The entire cluster could be down,
>> and your application would never know about it until you tried an
>> explicit commit operation.  Commit is an operation that is not handled
>> in the background by CUSC, so I would expect any exception to be
>> returned for that operation.
>>
>> This flaw is inherent to its design, the behavior would be very
>> difficult to change.
>>
>> If you don't care about your application getting error messages when
>> indexing requests fail, then CUSC is perfect.  This might be the case if
>> you are doing initial bulk loading.  For normal index updates after
>> initial loading, you would not want to use CUSC.
>>
>> If you do care about getting error messages when bulk indexing requests
>> fail, then you'll want to build a program with CloudSolrClient where you
>> create multiple indexing threads that all use the same client
>>object.
>>
>> Thanks,
>> Shawn
>>
>



Re: ConcurrentUpdateSolrClient vs CloudSolrClient for bulk update to SolrCloud

2016-01-13 Thread Erick Erickson
My first thought is "yes, you're overthinking it" ;)

Here's something to get you started for indexing
through a Java program:
https://cwiki.apache.org/confluence/display/solr/Using+SolrJ

Of course you _could_ use Lucene to build your indexes
and just copy them "to the right place", but there are
a number of ways that can go wrong, here are a couple:
1> if you have shards, you'd have to mimic the automatic
routing.
2> you have to mimic the analysis chain you've defined for
each field in Solr.
3> you have to copy the built Lucene indexes to the right shard
(assuming you got <1> right).

Depending on the docs in question, if they need Tika parsing
you can do that simply in SolrJ too, see:
https://lucidworks.com/blog/2012/02/14/indexing-with-solrj/
(this is a bit outdated, a couple of class names have changed
in particular).

SolrJ uses an efficient binary format to move the docs. I regularly
get 20K docs/second on my local setup, see:
https://lucidworks.com/blog/2015/10/05/really-batch-updates-solr-2/
I was indexing 11M Wiki articles in about 10 minutes in some tests
recently. Solr can scale that close to linearly with more shards and
enough indexing clients. Is it really worth the effort of using Lucene?

FWIW,
Erick
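A minimal sketch of the multi-threaded approach described above, with one
CloudSolrClient shared by several indexing threads and documents sent in
batches. The ZooKeeper address, collection name, thread count, batch size, and
field names are all hypothetical, not from this thread:

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

import org.apache.solr.client.solrj.impl.CloudSolrClient;
import org.apache.solr.common.SolrInputDocument;

public class BulkIndexer {
  public static void main(String[] args) throws Exception {
    // A single CloudSolrClient is thread-safe and can be shared by all threads.
    CloudSolrClient client = new CloudSolrClient("zk1:2181,zk2:2181,zk3:2181");
    client.setDefaultCollection("mycollection");

    ExecutorService pool = Executors.newFixedThreadPool(4);
    for (int t = 0; t < 4; t++) {
      final int threadId = t;
      pool.submit(() -> {
        List<SolrInputDocument> batch = new ArrayList<>();
        // Each thread works on a discrete slice of the documents.
        for (int i = 0; i < 100000; i++) {
          SolrInputDocument doc = new SolrInputDocument();
          doc.addField("id", threadId + "-" + i);
          doc.addField("title_s", "document " + i);
          batch.add(doc);
          if (batch.size() == 1000) {   // send in batches, not one doc at a time
            client.add(batch);
            batch.clear();
          }
        }
        if (!batch.isEmpty()) {
          client.add(batch);            // flush the final partial batch
        }
        return null;                    // Callable, so checked exceptions propagate
      });
    }
    pool.shutdown();
    pool.awaitTermination(1, TimeUnit.HOURS);
    client.commit();                    // an explicit commit also surfaces errors
    client.close();
  }
}

In production you would also keep the Futures returned by submit() and call
get() on each, so that indexing exceptions are not silently dropped.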



On Wed, Jan 13, 2016 at 10:19 AM, Shivaji Dutta  wrote:
> Erik and Shawn
>
> Thanks for the input. In the process below we are posting the documents to
> Solr over an HTTP connection in batches.
>
> Trying to solve the same problem but in a different way:
>
> I have used Lucene back in the day, where I would index the documents
> locally on disk and run search queries on them. Big fan of Lucene.
>
> I was wondering if there is any possibility like that.
>
> If I have a repository of millions of documents, would it not make sense
> to just index them locally and then copy the index files over to Solr and
> have it read from them?
>
> Any thoughts or blogs that could help me, or maybe I am overthinking
> this?
>
> Thanks
> Shivaji
>
>
> On 1/13/16, 9:12 AM, "Erick Erickson"  wrote:
>
>>It's usually not all that difficult to write a multi-threaded
>>client that uses CloudSolrClient, or even fire up multiple
>>instances of the SolrJ client (assuming they can work
>>on discrete sections of the documents you need to index).
>>
>>That avoids the problem Shawn alludes to. Plus other
>>issues. If you do _not_ use CloudSolrClient, then all the
>>docs go to some node in the system that then sub-divides
>>the list (and you really should update in batches, see:
>>https://lucidworks.com/blog/2015/10/05/really-batch-updates-solr-2/)
>>then the node that receives the packet sub-divides it
>>into groups based on what shard they should be part of
>>and forwards them to the leaders for that shard, very
>>significantly increasing the numbers of conversations
>>being carried on between Solr nodes. Times the number
>>of threads you're specifying with CUSC (I really regret
>>the renaming from ConcurrentUpdateSolrServer, I liked
>>writing CUSS).
>>
>>With CloudSolrClient, you can scale nearly linearly with
>>the number of shards. Not so with CUSC.
>>
>>FWIW,
>>Erick
>>
>>On Tue, Jan 12, 2016 at 8:06 PM, Shawn Heisey  wrote:
>>> On 1/12/2016 7:42 PM, Shivaji Dutta wrote:
 Now since with ConcurrentUpdateSolrClient I am able to use a queue and
a pool of threads, which makes it more attractive to use over
CloudSolrClient which will use a HTTPSolrClient once it gets a set of
nodes to do the updates.

 What is the recommended API for updating large amounts of documents
with higher throughput rate.
>>>
>>> ConcurrentUpdateSolrClient has one flaw -- it swallows all exceptions
>>> that happen during indexing.  Your application will never know about any
>>> problems that occur during indexing.  The entire cluster could be down,
>>> and your application would never know about it until you tried an
>>> explicit commit operation.  Commit is an operation that is not handled
>>> in the background by CUSC, so I would expect any exception to be
>>> returned for that operation.
>>>
>>> This flaw is inherent to its design, the behavior would be very
>>> difficult to change.
>>>
>>> If you don't care about your application getting error messages when
>>> indexing requests fail, then CUSC is perfect.  This might be the case if
>>> you are doing initial bulk loading.  For normal index updates after
>>> initial loading, you would not want to use CUSC.
>>>
>>> If you do care about getting error messages when bulk indexing requests
>>> fail, then you'll want to build a program with CloudSolrClient where you
>>> create multiple indexing threads that all use the same client
>>>object.
>>>
>>> Thanks,
>>> Shawn
>>>
>>
>


Re: ConcurrentUpdateSolrClient vs CloudSolrClient for bulk update to SolrCloud

2016-01-13 Thread Toke Eskildsen
Shivaji Dutta  wrote:
> If I have a repository of millions of documents, would it not make sense
> to just index them locally and then copy the index files over to Solr and
> have it read from them?

It is certainly possible and for some scenarios it will work well.

We do it locally: Create a shard, optimize it and add it to our SolrCloud, 
where it is never updated again. This works for us as we have immutable data 
and one of the benefits is drastically lowered hardware requirements for the 
search machines. There is a small write-up at 
https://sbdevel.wordpress.com/net-archive-search/

There was a talk at Lucene/Solr Revolution 2015 about using a similar workflow 
for indexing logfiles. I think it was this one: 
https://www.youtube.com/watch?v=u5_vzcYYWfc

Bear in mind that both Brett Hoerner (from the talk) and we are working with 
billions of documents and terabytes of index data. As you need to build your 
own logistics system, there is the usual trade-off of development & maintenance 
cost vs. just buying beefier hardware.

- Toke Eskildsen
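If you do go the locally-built-index route on a single core (not a sharded
collection), one supported mechanism is the CoreAdmin MERGEINDEXES action;
the core name and path below are hypothetical:

http://localhost:8983/solr/admin/cores?action=mergeindexes&core=core0&indexDir=/path/to/locally/built/index

The schema and Lucene version of the local index must match the target core,
and for sharded SolrCloud collections the routing caveats Erick listed above
still apply.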


Re: solr BooleanClauses issue with space

2016-01-13 Thread Doug Turnbull
Paste your Solr query into http://splainer.io and it will help you
debug your scoring/matching (shameless plug: I made this thing).

Also, I suspect you may be using edismax. In that case the
inclusion/exclusion of explicit ORs might turn a big OR query
into a dismax query, which could change the structure of the query.

On Wed, Jan 13, 2016 at 10:05 AM, Emir Arnautovic <
emir.arnauto...@sematext.com> wrote:

> Hi Sara,
> You can run your query (or smaller one) with debugQuery=true and see how
> it is rewritten.
>
> Thanks,
> Emir
>
>
> On 13.01.2016 16:01, sara hajili wrote:
>
>> Thanks.
>> My main question is about maxBooleanClauses in solrconfig.
>> It is 1024 by default,
>> and I have an edismax query with about 500 words in this way:
>> q1 = str1 OR str2 OR str3 ... OR strn
>> It throws an exception that it can't parse the query: too many boolean
>> clauses.
>> If I change maxBooleanClauses to 1500 it works,
>> but something is ambiguous for me: when I don't change
>> maxBooleanClauses
>> and it remains 1024,
>> but change the query in this way:
>> q2 = str1 str2 str3 ... strn // I eliminated the ORs and used spaces instead,
>> I don't get the exception!
>> Why? What is the difference between q1 and q2?
>>
>> On Wed, Jan 13, 2016 at 6:28 AM, Shawn Heisey 
>> wrote:
>>
>> On 1/13/2016 5:40 AM, sara hajili wrote:
>>>
 what exactly is the difference between a space and OR in a Solr query?
 I mean, what is the difference between
 q = solr OR lucene OR search
 and this
 q = solr lucene search?

 Solr's default boolean occurrence is OR, isn't it?

>>> This depends on what the default operator is.  The default for the
>>> default operator is OR, and that would produce exactly the same results
>>> for both of the queries you have mentioned.  If the default operator is
>>> AND, then those two queries would be different.
>>>
>>> The default operator applies to the lucene and edismax parsers.  The
>>> lucene parser is Solr's default.  In older versions, the default
>>> operator could be set by a defaultOperator parameter.  I do not remember
>>> whether that was in solrconfig or schema.  That parameter is deprecated
>>> and the q.op parameter should be used now.
>>>
>>> Thanks,
>>> Shawn
>>>
>>>
>>>
> --
> Monitoring * Alerting * Anomaly Detection * Centralized Log Management
> Solr & Elasticsearch Support * http://sematext.com/
>
>


-- 
*Doug Turnbull **| *Search Relevance Consultant | OpenSource Connections
, LLC | 240.476.9983
Author: Relevant Search 
This e-mail and all contents, including attachments, is considered to be
Company Confidential unless explicitly stated otherwise, regardless
of whether attachments are marked as such.
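For reference on the maxBooleanClauses discussion above, the limit lives in
solrconfig.xml under the <query> section:

<maxBooleanClauses>1024</maxBooleanClauses>

Raising it (e.g. to 1500, as tried above) is generally safe, but note that the
underlying Lucene setting is a JVM-wide static, so with multiple cores in one
JVM the value from the last core loaded wins.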


Re: Pro and cons of using Solr Cloud vs standard Master Slave Replica

2016-01-13 Thread Jack Krupansky
The "Legacy Scaling and Distribution" section of the Solr Reference Guide
also gives info related to the so-called master-slave mode:
https://cwiki.apache.org/confluence/display/solr/Legacy+Scaling+and+Distribution

Also, although the old master-slave mode is still technically supported in
the sense that the code and doc are still there, you won't be able to get
the same level of community support here on the mailing list as you can for
SolrCloud.

Unless you're simply trying to decide whether to leave an old legacy system
as-is with the old distributed mode, nobody should be considering a fresh
new distributed Solr deployment with anything other than SolrCloud.

(Hmmm... have any of the committers considered deprecating the old
non-SolrCloud distributed mode features?)

-- Jack Krupansky

On Wed, Jan 13, 2016 at 9:02 AM, Shivaji Dutta 
wrote:

> - SolrCloud uses zookeeper to manage HA
> - Zookeeper is a standard for all HA in Apache Hadoop
> - You have collections which will manage your shards across nodes
> - SolrJ Client is now fault tolerant with CloudSolrClient
>
> This is the way future direction of the product will go.
>
>
>
> On 1/13/16, 5:58 AM, "Gian Maria Ricci - aka Alkampfer"
>  wrote:
>
> >Thanks.
> >
> >--
> >Gian Maria Ricci
> >Cell: +39 320 0136949
> >
> >
> >
> >-Original Message-
> >From: Shawn Heisey [mailto:apa...@elyograg.org]
> >Sent: lunedì 11 gennaio 2016 18:28
> >To: solr-user@lucene.apache.org
> >Subject: Re: Pro and cons of using Solr Cloud vs standard Master Slave
> >Replica
> >
> >On 1/11/2016 4:28 AM, Gian Maria Ricci - aka Alkampfer wrote:
> >> a customer need a comprehensive list of all pro and cons of using
> >> standard Master Slave replica VS using Solr Cloud. I'm interested
> >> especially in query performance consideration, because in this
> >> specific situation the rate of new documents is really slow, but the
> >> amount of data is about 50 millions of document, and the index size on
> >> disk for single core is about 30 GB.
> >
> >The primary advantage to SolrCloud is that SolrCloud handles most of the
> >administrative and operational details for you automatically.
> >
> >SolrCloud is a little more complicated to set up initially, because you
> >must worry about Zookeeper as well as Solr, but once it's properly set
> >up, there is no single point of failure.
> >
> >> Such amount of data should be easily handled by a Master Slave replica
> >> with a  single core replicated on a certain number of slaves, but we
> >> need to evaluate also the option of SolrCloud, especially for fault
> >> tolerance.
> >>
> >
> >Once you're beyond initial setup, fault tolerance with SolrCloud is much
> >easier than master/slave replication.  Switching a slave to a master is
> >possible, but the procedure is somewhat complicated.  SolrCloud does not
> >*have* masters, it is a true cluster.
> >
> >With master/slave replication, the master handles all indexing, and the
> >finished index segments are copied to the slaves via HTTP, and the slaves
> >simply need to open them.  SolrCloud does indexing on all shard replicas,
> >nearly simultaneously.  Usually this is an advantage, not a disadvantage,
> >but in heavy indexing situations master/slave replication
> >*might* show better performance on the slaves.
> >
> >Thanks,
> >Shawn
> >
> >
>
>


Re: Pro and cons of using Solr Cloud vs standard Master Slave Replica

2016-01-13 Thread Bernd Fehling
SolrCloud has some disadvantages and can't beat the ease and simplicity of
Master Slave replication. So I can only encourage keeping Master Slave
replication in future versions.

Bernd

Am 13.01.2016 um 21:57 schrieb Jack Krupansky:
> The "Legacy Scaling and Distribution" section of the Solr Reference Guide
> also gives info related to the so-called master-slave mode:
> https://cwiki.apache.org/confluence/display/solr/Legacy+Scaling+and+Distribution
> 
> Also, although the old master-slave mode is still technically supported in
> the sense that the code and doc are still there, you won't be able to get
> the same level of community support here on the mailing list as you can for
> SolrCloud.
> 
> Unless you're simply trying to decide whether to leave an old legacy system
> as-is with the old distributed mode, nobody should be considering a fresh
> new distributed Solr deployment with anything other than SolrCloud.
> 
> (Hmmm... have any of the committers considered deprecating the old
> non-SolrCloud distributed mode features?)

-1

> 
> -- Jack Krupansky
> 
> On Wed, Jan 13, 2016 at 9:02 AM, Shivaji Dutta 
> wrote:
> 
>> - SolrCloud uses zookeeper to manage HA
>> - Zookeeper is a standard for all HA in Apache Hadoop
>> - You have collections which will manage your shards across nodes
>> - SolrJ Client is now fault tolerant with CloudSolrClient
>>
>> This is the way future direction of the product will go.
>>
>>
>>
>> On 1/13/16, 5:58 AM, "Gian Maria Ricci - aka Alkampfer"
>>  wrote:
>>
>>> Thanks.
>>>
>>> --
>>> Gian Maria Ricci
>>> Cell: +39 320 0136949
>>>
>>>
>>>
>>> -Original Message-
>>> From: Shawn Heisey [mailto:apa...@elyograg.org]
>>> Sent: lunedì 11 gennaio 2016 18:28
>>> To: solr-user@lucene.apache.org
>>> Subject: Re: Pro and cons of using Solr Cloud vs standard Master Slave
>>> Replica
>>>
>>> On 1/11/2016 4:28 AM, Gian Maria Ricci - aka Alkampfer wrote:
 a customer need a comprehensive list of all pro and cons of using
 standard Master Slave replica VS using Solr Cloud. I'm interested
 especially in query performance consideration, because in this
 specific situation the rate of new documents is really slow, but the
 amount of data is about 50 millions of document, and the index size on
 disk for single core is about 30 GB.
>>>
>>> The primary advantage to SolrCloud is that SolrCloud handles most of the
>>> administrative and operational details for you automatically.
>>>
>>> SolrCloud is a little more complicated to set up initially, because you
>>> must worry about Zookeeper as well as Solr, but once it's properly set
>>> up, there is no single point of failure.
>>>
 Such amount of data should be easily handled by a Master Slave replica
 with a  single core replicated on a certain number of slaves, but we
 need to evaluate also the option of SolrCloud, especially for fault
 tolerance.

>>>
>>> Once you're beyond initial setup, fault tolerance with SolrCloud is much
>>> easier than master/slave replication.  Switching a slave to a master is
>>> possible, but the procedure is somewhat complicated.  SolrCloud does not
>>> *have* masters, it is a true cluster.
>>>
>>> With master/slave replication, the master handles all indexing, and the
>>> finished index segments are copied to the slaves via HTTP, and the slaves
>>> simply need to open them.  SolrCloud does indexing on all shard replicas,
>>> nearly simultaneously.  Usually this is an advantage, not a disadvantage,
>>> but in heavy indexing situations master/slave replication
>>> *might* show better performance on the slaves.
>>>
>>> Thanks,
>>> Shawn
>>>
>>>
>>
>>
> 



Re: solr error

2016-01-13 Thread Binoy Dalal
OK. Post the entire stack trace. That way we can get an idea of what is
actually throwing this exception.

On Thu, 14 Jan 2016, 12:49 Midas A  wrote:

> We get it only while we are using Solr.
>
> On Thu, Jan 14, 2016 at 12:41 PM, Binoy Dalal 
> wrote:
>
> > Can you post the entire stack trace?
> >
> > Do you get this error at startup or while you're using solr?
> >
> > On Thu, 14 Jan 2016, 12:38 Midas A  wrote:
> >
> > > we are continuously getting the error
> > > "null:org.eclipse.jetty.io.EofException"
> > > on slave .
> > >
> > > what could be the reason ?
> > >
> > --
> > Regards,
> > Binoy Dalal
> >
>
-- 
Regards,
Binoy Dalal


how to add new node in solr cloud cluster

2016-01-13 Thread Zap Org
In a running Solr Cloud cluster, how can I add a new node without disturbing
the running cluster?
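A common approach, with hypothetical host names: start Solr on the new machine
pointed at the same ZooKeeper ensemble, then place replicas on it via the
Collections API. The new node joins the cluster without touching the existing
ones:

bin/solr start -cloud -z zk1:2181,zk2:2181,zk3:2181 -p 8983

http://anyhost:8983/solr/admin/collections?action=ADDREPLICA&collection=mycollection&shard=shard1&node=newhost:8983_solr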


Re: Error while reloading collection

2016-01-13 Thread Binoy Dalal
1) Ensure that the class file is actually present at the path you've given.
2) Post the entire stack trace of the exception. You can get that from the
solr log.

On Thu, 14 Jan 2016, 12:35 davidphilip cherian 
wrote:

> You should probably ask this question here
>
>
> http://www.cloudera.com/content/cloudera-content/cloudera-docs/CM4Ent/4.7.0/Cloudera-Manager-Introduction/cmi_getting_help_and_support.html
>
> On Thu, Jan 14, 2016 at 12:11 PM, vidya  wrote:
>
> > Hi
> >  I am using solrcloud on cloudera cluster. I have created collections
> using
> > solrctl command which is supported by cloudera search tool. I included
> one
> > class of java in schema.xml for creating a field type which is dependent
> on
> > a jar that i have included in solrconfig.xml. But when i reload that
> > collection, I am getting an error that ERROR LOADING THAT CLASS what i
> > included in schema.xml. What else do i need to include ?
> >
> > In solrconfig.xml: <lib dir="..." regex=".*\.jar" />
> >
> > Error while reloading in the command line interface:
> >
> > <response>
> >   <lst name="responseHeader">
> >     <int name="status">0</int>
> >     <int name="QTime">197</int>
> >   </lst>
> >   <lst name="failure">
> >     <str>
> > org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException:Error
> > handling 'reload' action
> >     </str>
> >   </lst>
> > </response>
> >
> >
> >
> > Please help me on this.Thanks in advance
> >
> >
> >
> >
>
-- 
Regards,
Binoy Dalal


Re: Error while reloading collection

2016-01-13 Thread davidphilip cherian
You should probably ask this question here

http://www.cloudera.com/content/cloudera-content/cloudera-docs/CM4Ent/4.7.0/Cloudera-Manager-Introduction/cmi_getting_help_and_support.html

On Thu, Jan 14, 2016 at 12:11 PM, vidya  wrote:

> Hi
>  I am using solrcloud on cloudera cluster. I have created collections using
> solrctl command which is supported by cloudera search tool. I included one
> class of java in schema.xml for creating a field type which is dependent on
> a jar that i have included in solrconfig.xml. But when i reload that
> collection, I am getting an error that ERROR LOADING THAT CLASS what i
> included in schema.xml. What else do i need to include ?
>
> In solrconfig.xml: <lib dir="..." regex=".*\.jar" />
>
> Error while reloading in the command line interface:
>
> <response>
>   <lst name="responseHeader">
>     <int name="status">0</int>
>     <int name="QTime">197</int>
>   </lst>
>   <lst name="failure">
>     <str>
> org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException:Error
> handling 'reload' action
>     </str>
>   </lst>
> </response>
>
>
>
> Please help me on this.Thanks in advance
>
>
>
>


Re: solr error

2016-01-13 Thread Midas A
when we are using solr only

On Thu, Jan 14, 2016 at 12:41 PM, Binoy Dalal 
wrote:

> Can you post the entire stack trace?
>
> Do you get this error at startup or while you're using solr?
>
> On Thu, 14 Jan 2016, 12:38 Midas A  wrote:
>
> > we are continuously getting the error
> > "null:org.eclipse.jetty.io.EofException"
> > on slave .
> >
> > what could be the reason ?
> >
> --
> Regards,
> Binoy Dalal
>


Error while reloading collection

2016-01-13 Thread vidya
Hi
 I am using SolrCloud on a Cloudera cluster. I have created collections using
the solrctl command, which is supported by the Cloudera Search tool. I included
a Java class in schema.xml for creating a field type, which depends on a jar
that I have included in solrconfig.xml. But when I reload that collection, I
get an ERROR LOADING THAT CLASS error for the class I included in schema.xml.
What else do I need to include?

In solrconfig.xml: <lib dir="..." regex=".*\.jar" />

Error while reloading in the command line interface:

<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">197</int>
  </lst>
  <lst name="failure">
    <str>
org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException:Error
handling 'reload' action
    </str>
  </lst>
</response>

Please help me on this. Thanks in advance.





solr error

2016-01-13 Thread Midas A
We are continuously getting the error
"null:org.eclipse.jetty.io.EofException"
on a slave.

What could be the reason?


Re: how to add new node in sole cloud cluster

2016-01-13 Thread Zap Org
I have 2 nodes; one went down, and after restarting the server it shows an
error initializing solrconfig.xml.

On Thu, Jan 14, 2016 at 12:45 PM, Zap Org  wrote:

> In a running Solr Cloud cluster, how can I add a new node without disturbing
> the running cluster?
>


error in initializing solrconfig.xml

2016-01-13 Thread Zap Org
I have 2 running Solr nodes in my cluster; one node went down. I restarted the
Tomcat server and it is throwing an exception while initializing solrconfig.xml,
and it did not recognize the collection.


Re: fq degrades qtime in a 20million doc collection

2016-01-13 Thread Jack Krupansky
I recall a couple of previous discussions regarding some sort of
filter/field cache change in Lucene where they removed what had been an
optimization for Solr.

-- Jack Krupansky

On Wed, Jan 13, 2016 at 8:10 PM, Erick Erickson 
wrote:

> It's quite surprising that you're getting this kind of query
> degradation by adding an "fq" clause
> unless something's really out of whack on the setup. How much memory
> are you giving
> the JVM? Are you autowarming? Are you indexing while this is going on,
> and if what are
> your commit parameters? If you add debug=true to your query, one of
> the returned sections
> will be "timings" for the various components of a query measured in
> milliseconds. Occasionally
> there will be surprises in there.
>
> What are you measuring when you say it takes seconds? The time to
> render the result page or
> are you looking at the QTime parameter of the return packet?
>
> Best,
> Erick
>
> On Wed, Jan 13, 2016 at 4:27 PM, Anria B.  wrote:
> > hi Shawn
> >
> > Thanks for the quick answer.  As for the q=*,  we also saw similar
> results
> > in our testing when doing things like
> >
> > q=somefield:qval
> > fq=otherfield:fqval
> >
> > Which makes a pure Lucene query.  I simplified things somewhat since our
> > results were always that as numFound got large, the query time degraded
> as
> > soon as we added any fq in the mix.
> >
> > We also saw similar results for queries like
> >
> > q=query stuff
> > defType=edismax
> > df=afield
> > qf=afield bfield cfield
> >
> >
> > So the query structure was not what created the 3-7 second query time, it
> > was always a correlation between whether fq is in the query, and what the
> > numFound is.  We've run numerous load tests for bringing in good queries with
> fq
> > values in the "newSearcher",  caches on, caches off  ... this same
> > phenomenon persisted.
> >
> > As for Tomcat, it's an easy enough test to run it in Jetty.  We will sure
> > try that!  GC we've had default and G1 setups.
> >
> > Thanks for giving us something to think about
> >
> > Anria
> >
> >
> >
>
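For reference, the debug output Erick mentions can be requested like this
(collection and field names are hypothetical, matching the examples above):

http://localhost:8983/solr/collection1/select?q=somefield:qval&fq=otherfield:fqval&debug=true

The "timing" part of the debug section then breaks down the time spent in each
search component, in milliseconds.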


Re: solr error

2016-01-13 Thread Binoy Dalal
Can you post the entire stack trace?

Do you get this error at startup or while you're using solr?

On Thu, 14 Jan 2016, 12:38 Midas A  wrote:

> we are continuously getting the error
> "null:org.eclipse.jetty.io.EofException"
> on slave .
>
> what could be the reason ?
>
-- 
Regards,
Binoy Dalal


How to configure authentication in windows start script?

2016-01-13 Thread Kristine Jetzke
Hi,

I am using an authentication plugin in my Solr 5.4 standalone installation
running on Windows. How do I pass authentication options to the start
script? In the Linux script there is an option called AUTHC_CLIENT_CONFIGURER_ARG,
but I don't find anything similar for Windows...

Thanks,

tine


Re: Solr Heap memory vs. OS memory

2016-01-13 Thread Shawn Heisey
On 1/13/2016 2:25 PM, Oakley, Craig (NIH/NLM/NCBI) [C] wrote:
> Followup question:
>
> If one has multiple instances on the same host (a host running basically 
> nothing except multiple instances of Solr), then the values specified as -Xmx 
> in the various instances should add up to 25% of the RAM of the host...
>
> Is that correct?

Here comes the standard answer of anyone who works in technology:  It
depends.

When setting the max heap, you should set it as large as it needs to be,
and no larger.  Many factors will affect how much heap is required.

It is advisable to have much more memory installed in your server than
you need for your Java heap.  If you don't, Solr performance might be
unacceptable ... but this will depend on several factors, including the
size of your indexes.

This might be useful information:

https://wiki.apache.org/solr/SolrPerformanceProblems

Thanks,
Shawn
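As a concrete illustration of setting the heap (the 2g figure is only an
example, not a recommendation): with the bin/solr script the heap is given at
startup, which translates to -Xms/-Xmx under the hood:

bin/solr start -m 2g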



FieldCache

2016-01-13 Thread Lewin Joy (TMS)
Hi,

I have been facing a weird issue in solr.

I am working on Solr 4.10.3 on Cloudera CDH 5.4.4 and am trying to group 
results on a multivalued field, let's say "interests".
This is giving me an error message below:

  "error": {
"msg": "can not use FieldCache on multivalued field: interests",
"code": 400
  }

I thought this could be a version issue. 
But after I just re-indexed the data, it started working.

I wanted to understand this error message and why it could be failing sometimes 
on multivalued fields.

Thanks,
-Lewin
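For context, a hedged sketch of the failing request (the field name is from the
mail, the rest is assumed): result grouping in Solr 4.x goes through the
FieldCache, which only supports single-valued fields, so a request like this
returns the 400 above whenever interests is declared multiValued="true":

http://localhost:8983/solr/collection1/select?q=*:*&group=true&group.field=interests

If it started working after a reindex, the likely explanation is that the
field's schema definition (in particular its multiValued flag) changed between
the two index builds.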



Re: fq degrades qtime in a 20million doc collection

2016-01-13 Thread Anria B.
hi Shawn

Thanks for the quick answer.  As for the q=*,  we also saw similar results
in our testing when doing things like 

q=somefield:qval
fq=otherfield:fqval

Which makes a pure Lucene query.  I simplified things somewhat since our
results were always that as numFound got large, the query time degraded as
soon as we added any fq in the mix.

We also saw similar results for queries like 

q=query stuff
defType=edismax
df=afield
qf=afield bfield cfield


So the query structure was not what created the 3-7 second query time, it
was always a correlation between whether fq is in the query, and what the
numFound is.  We've run numerous load tests for bringing in good queries with fq
values in the "newSearcher",  caches on, caches off  ... this same
phenomenon persisted.  

As for Tomcat, it's an easy enough test to run it in Jetty.  We will sure
try that!  GC we've had default and G1 setups.  

Thanks for giving us something to think about

Anria 





Error: FieldCache on multivalued field

2016-01-13 Thread Lewin Joy (TMS)
*updated subject line

Hi,

I have been facing a weird issue in solr.

I am working on Solr 4.10.3 on Cloudera CDH 5.4.4 and am trying to group 
results on a multivalued field, let's say "interests".
This is giving me an error message below:

  "error": {
"msg": "can not use FieldCache on multivalued field: interests",
"code": 400
  }

I thought this could be a version issue. 
But after I just re-indexed the data, it started working.

I wanted to understand this error message and why it could be failing sometimes 
on multivalued fields.

Thanks,
-Lewin



solrcould is killed,restart,colletion don't work

2016-01-13 Thread 李铁峰

I'm a Solr user. I use SolrCloud 4.9.1 running in Tomcat. When Tomcat was
killed, two collections showed errors in the Solr admin UI:

[Solr admin screenshot omitted]

Each collection has only one shard. How can I repair it and get the collections
working again? I do not want to restart Tomcat, and I can accept some data loss.



error log:

2016-01-14 04:43:21 GMT+8  WARN   RecoveryStrategy  Stopping recovery for
zkNodeName=core_node1 core=qdelcomment_shard1_replica1

2016-01-14 04:43:25 GMT+8  ERROR  RecoveryStrategy  Error while trying to recover.
core=qdelcomment_shard1_replica1:org.apache.solr.common.SolrException: No
registered leader was found after waiting for 4000ms, collection: qdelcomment
slice: shard1
Error while trying to recover. 
core=qdelcomment_shard1_replica1:org.apache.solr.common.SolrException: No 
registered leader was found after waiting for 4000ms , collection: qdelcomment 
slice: shard1
at 
org.apache.solr.common.cloud.ZkStateReader.getLeaderRetry(ZkStateReader.java:545)
at 
org.apache.solr.common.cloud.ZkStateReader.getLeaderRetry(ZkStateReader.java:528)
at 
org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:332)
at org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:235)

solr admin query error
{
  "responseHeader": {
"status": 503,
"QTime": 2,
"params": {
  "indent": "true",
  "q": "*:*",
  "_": "1452720352906",
  "wt": "json"
}
  },
  "error": {
"msg": "no servers hosting shard: ",
"code": 503
  }
}

solr:
4.9.1 1625909 - mike - 2014-09-18 04:09:04
-Dsolr.hdfs.home=hdfs://cdh530cluster/solre 


thank you!
jasperli

Re: solr-5.3.1 admin console not show properly

2016-01-13 Thread Jan Høydahl
Which brand and version of Java have you installed?
Looks like you run Solr as root? Should work, but not recommended. Try 
installing and running as an ordinary user.

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com

> 13. jan. 2016 kl. 17.01 skrev David Cao :
> 
> I installed and started solr following instructions from solr wiki as this
> ... (on a Redhat server)
> 
> cd ~/
> tar zxf /tmp/solr-5.3.1.tgz
> cd solr-5.3.1/bin
> ./solr start -f
> 
> 
> Solr starts fine. But when opening the console in a browser
> ("http://server-ip:8983/solr/admin.html"), it shows a partially rendered page
> with highlighted messages "*SolrCore Initialization Failures*", and a whole
> bunch of WARN messages in this nature,
> 
> 55724 WARN  (qtp1018134259-20) [   ] o.e.j.s.ServletHandler Error for
> /solr/css/styles/common.css
> java.lang.NoSuchMethodError:
> javax/servlet/http/HttpServletRequest.isAsyncSupported()Z
>at
> org.eclipse.jetty.servlet.DefaultServlet.sendData(DefaultServlet.java:922)
>at
> org.eclipse.jetty.servlet.DefaultServlet.doGet(DefaultServlet.java:533)
>at javax.servlet.http.HttpServlet.service(HttpServlet.java:617)
>at javax.servlet.http.HttpServlet.service(HttpServlet.java:723)
>at
> org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:808)
>at
> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1669)
>at
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:206)
>at
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:179)
>at
> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652)
>at
> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:585)
>at
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
>at
> org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:577)
>at
> org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:223)
>at
> org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1127)
>at
> org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:515)
>at
> org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
>at
> org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1061)
>at
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
>at
> org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:215)
>at
> org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:110)
>at
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97)
>at org.eclipse.jetty.server.Server.handle(Server.java:499)
>at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:310)
>at
> org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:257)
>at
> org.eclipse.jetty.io.AbstractConnection$2.run(AbstractConnection.java:540)
>at
> org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:635)
>at
> org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:555)
>at java.lang.Thread.run(Thread.java:801)
> 
> 
> There was also a line at the start of the console log,
> 
> 1784 WARN  (main) [   ] o.e.j.s.SecurityHandler
> ServletContext@o.e.j.w.WebAppContext@1c662fe5{/solr,file:/root/solr-5.3.1/server/solr-webapp/webapp/,STARTING}{/root/solr-5.3.1/server/solr-webapp/webapp}
> has uncovered http methods for path: /
> 
> 
> Any ideas? is there any work I need to do to config the classpath?
> 
> thanks a lot!
> david



Re: How to achieve exact string match query which includes spaces and quotes

2016-01-13 Thread Scott Stults
This might be a good case for the Raw query parser (I haven't used it
myself).

https://cwiki.apache.org/confluence/display/solr/Other+Parsers#OtherParsers-RawQueryParser
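
For the string in the original question it would look something like this
(assuming the field is named stringField; the raw parser takes everything
after the closing brace verbatim as a single term):

q={!raw f=stringField}abc ".. I am " not ? test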


k/r,
Scott

On Wed, Jan 13, 2016 at 12:05 PM, Erick Erickson 
wrote:

> what _does_ matter is getting all that through the parser which means
> you have to enclose things in quotes and escape them.
>
> For instance, consider this query:  stringField:abc "i am not"
>
> this will get parsed as
> stringField:abc defaultTextField:"i am not".
>
> To get around this you need to make sure the entire search gets
> through the parser as a _single_ token by enclosing in quotes. But
> then of course you have confusion because you have quotes in your
> search term so you need to escape those, something like
> stringField:"abc \"i am not\""
>
> Here's a list for Lucene 5
>
> https://lucene.apache.org/core/5_1_0/queryparser/org/apache/lucene/queryparser/classic/package-summary.html#Escaping_Special_Characters
>
> Best,
> Erick
>
> On Wed, Jan 13, 2016 at 3:39 AM, Binoy Dalal 
> wrote:
> > No.
> >
> > On Wed, 13 Jan 2016, 16:58 Alok Bhandari <
> alokomprakashbhand...@gmail.com>
> > wrote:
> >
> >> Hi Binoy thanks.
> >>
> >> But does it matter which query-parser I use , shall I use "lucene"
> parser
> >> or
> >> "edismax" parser.
> >>
> >>
> >>
> >> --
> >> View this message in context:
> >>
> http://lucene.472066.n3.nabble.com/How-to-achieve-exact-string-match-query-which-includes-spaces-and-quotes-tp4250402p4250405.html
> >> Sent from the Solr - User mailing list archive at Nabble.com.
> >>
> > --
> > Regards,
> > Binoy Dalal
>



-- 
Scott Stults | Founder & Solutions Architect | OpenSource Connections, LLC
| 434.409.2780
http://www.opensourceconnections.com


RE: Solr Heap memory vs. OS memory

2016-01-13 Thread Oakley, Craig (NIH/NLM/NCBI) [C]
Followup question:

If one has multiple instances on the same host (a host running basically 
nothing except multiple instances of Solr), then the values specified as -Xmx 
in the various instances should add up to no more than 25% of the RAM of the host...

Is that correct?
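
For example, under that rule of thumb a host with 64GB of RAM running two
Solr instances would keep the two -Xmx values at or below 16GB combined
(say 8GB each), leaving the remaining ~48GB to the operating system's page
cache.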

-Original Message-
From: Markus Jelsma [mailto:markus.jel...@openindex.io] 
Sent: Wednesday, December 09, 2015 10:28 AM
To: solr-user@lucene.apache.org
Subject: RE: Solr Heap memory vs. OS memory

Yes. This is still accurate, Lucene still relies on memory mapped files. And 
Solr usually doesn't require that much RAM, except if you have lots of massive 
cache entries.
Markus
 
-Original message-
> From:Kelly, Frank 
> Sent: Wednesday 9th December 2015 16:19
> To: solr-user@lucene.apache.org
> Subject: Solr Heap memory vs. OS memory
> 
> Hi Folks,
> 
>  I was wondering if this link I found recommended by Erick is still accurate 
> (for Solr 5.3.1)
> 
> "For configuring your Java VM, you should rethink your memory requirements: 
> Give only the really needed amount of heap space and leave as much as 
> possible to the O/S. As a rule of thumb: Don't use more than 1/4 of your 
> physical memory as heap space for Java running Lucene/Solr, keep the 
> remaining memory free for the operating system cache."
> 
> http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html
> 
> So I am using several CentOS VMs (on AWS) with 8GB RAM, so I should plan for < 
> 2GB for -Xms and -Xmx?
> Our scaling plan - being on AWS - is to scale out (adding more VMs - not 
> adding more memory).
> 
> Thanks!
> 
> -Frank
> 


fq degrades qtime in a 20million doc collection

2016-01-13 Thread Anria B.
hi all, 

I have a Really fun question to ask.  I'm sitting here looking at what is by
far the beefiest box I've ever seen in my life.  256GB of RAM, terabytes
of disk space, the works.  Linux server properly partitioned.

Yet, what we are seeing goes against all intuition I've built up in the Solr
world

1.   Collection has 20-30 million docs.
2.   q=*&fq=someField:SomeVal   ---> takes 2.5 seconds
3.q=someField:SomeVal -->  300ms
4.   as numFound -> infinity, qtime -> infinity.

have any of you encountered such a thing?

that FQ degrades query time by so much?   

it's pure Solr 5.3.1.   ZK + Tomcat 8 + 1 shard in solr.  JDK 8u60.  All
running on this same box.

We have already tested different autoCommit strategies, and different values
for heap size, starting at 16GB, 32GB, 64GB, 128GB ... The only place we
saw a 100ms improvement was between -Xmx=32GB and -Xmx=64GB.

Thanks 
Anria 



--
View this message in context: 
http://lucene.472066.n3.nabble.com/fq-degrades-qtime-in-a-20million-doc-collection-tp4250567.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: fq degrades qtime in a 20million doc collection

2016-01-13 Thread Shawn Heisey
On 1/13/2016 3:01 PM, Anria B. wrote:
> I have a Really fun question to ask.  I'm sitting here looking at what is by
> far the beefiest box I've ever seen in my life.  256GB of RAM, terabytes
> of disk space, the works.  Linux server properly partitioned.
>
> Yet, what we are seeing goes against all intuition I've built up in the Solr
> world
>
> 1.   Collection has 20-30 million docs.
> 2.   q=*&fq=someField:SomeVal   ---> takes 2.5 seconds
> 3.q=someField:SomeVal -->  300ms
> 4.   as numFound -> infinity, qtime -> infinity.
>
> have any of you encountered such a thing?
>
> that FQ degrades query time by so much?   
>
> it's pure Solr 5.3.1.   ZK + Tomcat 8 + 1shard in solr.  JDK_8u60  All
> running on this same box.

A value of * for your query will be slow.  This is a wildcard query. 
Under the covers, what happens is that Lucene looks up every possible
value in your default field and then does a query for every single one
of those terms.  In an index with 20-30 million documents, this could be
billions of terms.  If you want to query for all documents, do a query
for *:* (star colon star) -- this is a special query string that
literally means "all documents."  It *looks* like it might mean "all
values in all fields" but it's far more specific than that.
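
To see the difference side by side (field and value taken from the example
above; only the q parameter changes):

  q=*&fq=someField:SomeVal      wildcard query, expanded against every term in the default field
  q=*:*&fq=someField:SomeVal    MatchAllDocsQuery, no term expansion at all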

Are you saying that you are running Solr 5.3.1 under Tomcat 8?  If so,
this is likely going to be an issue.  The Jetty that's included with
this version is properly tuned for Solr, and the bin/solr start script
will set up good garbage collection tuning.  Running in another
container is almost always a mistake.

Thanks,
Shawn



Re: fq degrades qtime in a 20million doc collection

2016-01-13 Thread Erick Erickson
It's quite surprising that you're getting this kind of query
degradation by adding an "fq" clause
unless something's really out of whack on the setup. How much memory
are you giving
the JVM? Are you autowarming? Are you indexing while this is going on,
and if so, what are
your commit parameters? If you add debug=true to your query, one of
the returned sections
will be "timings" for the various components of a query measured in
milliseconds. Occasionally
there will be surprises in there.

What are you measuring when you say it takes seconds? The time to
render the result page or
are you looking at the QTime parameter of the return packet?

Best,
Erick

On Wed, Jan 13, 2016 at 4:27 PM, Anria B.  wrote:
> hi Shawn
>
> Thanks for the quick answer.  As for the q=*,  we also saw similar results
> in our testing when doing things like
>
> q=somefield:qval
> fq=otherfield:fqval
>
> Which makes a pure Lucene query.  I simplified things somewhat since our
> results were always that as numFound got large, the query time degraded as
> soon as we added any fq in the mix.
>
> We also saw similar results for queries like
>
> q=query stuff
> defType=edismax
> df=afield
> qf=afield bfield cfield
>
>
> So the query structure was not what created the 3-7 second query time; it
> was always a correlation between whether an fq is in the query and how large
> the numFound is.  We've run numerous load tests warming good queries with fq
> values in the "newSearcher" listener,  caches on, caches off  ... this same
> phenomenon persisted.
>
> As for Tomcat, it's an easy enough test to run it in Jetty.  We will sure
> try that!  GC we've had default and G1 setups.
>
> Thanks for giving us something to think about
>
> Anria
>
>
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/fq-degrades-qtime-in-a-20million-doc-collection-tp4250567p4250600.html
> Sent from the Solr - User mailing list archive at Nabble.com.


Re: solrcould is killed,restart,colletion don't work

2016-01-13 Thread Erick Erickson
Please outline all the steps you've done.
Did you stop tomcat then restart it? On one or more machines?

You have to have all the Solrs running that you did when you
created the collections.

Best,
Erick

On Wed, Jan 13, 2016 at 1:28 PM, 李铁峰  wrote:

>
> i'm a solr user, i use solrcloud 4.9.1, running in tomcat. when tomcat
> was killed, two collections got errors (solr admin screenshot not preserved
> in the archive).
>
> each collection only has one shard. how can i repair it and let the
> collections work? i do not want to restart tomcat, and I can accept some
> data loss.
>
>
>
> error log:
>
> 2016-01-14 GMT+8 04:43:21 WARN  RecoveryStrategy  Stopping recovery for
> zkNodeName=core_node1 core=qdelcomment_shard1_replica1
> 2016-01-14 GMT+8 04:43:25 ERROR RecoveryStrategy  Error while trying to recover.
> core=qdelcomment_shard1_replica1:org.apache.solr.common.SolrException: No
> registered leader was found after waiting for 4000ms, collection:
> qdelcomment slice: shard1
>
> Error while trying to recover. 
> core=qdelcomment_shard1_replica1:org.apache.solr.common.SolrException: No 
> registered leader was found after waiting for 4000ms , collection: 
> qdelcomment slice: shard1
>   at 
> org.apache.solr.common.cloud.ZkStateReader.getLeaderRetry(ZkStateReader.java:545)
>   at 
> org.apache.solr.common.cloud.ZkStateReader.getLeaderRetry(ZkStateReader.java:528)
>   at 
> org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:332)
>   at org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:235)
>
>
> solr admin query error
> { "responseHeader": { "status": 503, "QTime": 2, "params": { "indent":
> "true", "q": "*:*", "_": "1452720352906", "wt": "json" } }, "error": {
> "msg": "no servers hosting shard: ", "code": 503 } }
>
> solr:
> 4.9.1 1625909 - mike - 2014-09-18 04:09:04
> -Dsolr.hdfs.home=hdfs://cdh530cluster/solre
>
> thank you! jasperli
>


Re: Setting of ramBufferSizeMB

2016-01-13 Thread Erick Erickson
ramBufferSizeMB is a _limit_ that flushes the buffer when
it is reached (actually, I think, it indexes a doc _then_
checks the size and if it's > the setting, flushes the
buffer. So technically you can exceed the buffer size by
your biggest doc's addition to the index).

But I digress. This is a _limit_. If a commit happens (either
an autocommit or client-initiated commit or a commitWithin)
then the segment is flushed without regard to ramBufferSizeMB.
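
For example, an autoCommit section like the following in solrconfig.xml
(values illustrative, not Edwin's actual settings) will flush a new segment
at most 60 seconds after the first uncommitted document arrives, no matter
how full the 320MB buffer is:

<autoCommit>
  <maxTime>60000</maxTime>
  <openSearcher>false</openSearcher>
</autoCommit>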

Best,
Erick

On Wed, Jan 13, 2016 at 5:44 PM, Zheng Lin Edwin Yeo
 wrote:
> Hi,
>
> I would like to check, if I have made the following settings for
> ramBufferSizeMB, and I am using TieredMergePolicy, am I supposed to get
> each segment size of at least 320MB?
>
>
> 
> <ramBufferSizeMB>320</ramBufferSizeMB>
> 
>
>
> 
> <mergePolicy class="org.apache.lucene.index.TieredMergePolicy">
>   <int name="maxMergeAtOnce">10</int>
>   <int name="segmentsPerTier">10</int>
>   <int name="maxMergedSegmentMB">10240</int>
> </mergePolicy>
> 
>
>
> I have this setting in my solrconfig.xml, but when I checked my segments
> size under the Segments info screen on the Admin UI, I see quite a number
> of segments at the bottom whose sizes are much smaller than 320MB.
> Is that the correct behaviour, or is my ramBufferSizeMB not working
> correctly?
>
> I am using Solr 5.4.0.
>
>
> Regards,
> Edwin


Searching for Chinese characters is much slower

2016-01-13 Thread Zheng Lin Edwin Yeo
Hi,

I'm using Solr 5.4.0, and the HMMChineseTokenizerFactory for my content
indexed from rich-text documents.

I found that searches for Chinese characters take much longer than searches
for English characters. English queries usually return in less than 200ms,
but Chinese queries usually need at least 2 or 3 seconds to return.

I have about 3 million documents in my index, with an index size of 230GB.

Below is my setting in schema.xml:

[fieldType definition using solr.HMMChineseTokenizerFactory; the XML did not
survive the list archive]
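For reference, the analyzer chain documented for this tokenizer in the
lucene-analyzers-smartcn package looks like the following (illustrative only;
Edwin's exact chain was lost with the stripped XML):

<fieldType name="text_chinese" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.HMMChineseTokenizerFactory"/>
    <filter class="solr.StopFilterFactory"
            words="org/apache/lucene/analysis/cn/smart/stopwords.txt"/>
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
</fieldType>
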
Regards,
Edwin


Setting of ramBufferSizeMB

2016-01-13 Thread Zheng Lin Edwin Yeo
Hi,

I would like to check, if I have made the following settings for
ramBufferSizeMB, and I am using TieredMergePolicy, am I supposed to get
each segment size of at least 320MB?



<ramBufferSizeMB>320</ramBufferSizeMB>




<mergePolicy class="org.apache.lucene.index.TieredMergePolicy">
  <int name="maxMergeAtOnce">10</int>
  <int name="segmentsPerTier">10</int>
  <int name="maxMergedSegmentMB">10240</int>
</mergePolicy>



I have this setting in my solrconfig.xml, but when I checked my segments
size under the Segments info screen on the Admin UI, I see quite a number
of segments at the bottom whose sizes are much smaller than 320MB.
Is that the correct behaviour, or is my ramBufferSizeMB not working
correctly?

I am using Solr 5.4.0.


Regards,
Edwin


Re: Setting of ramBufferSizeMB

2016-01-13 Thread Zheng Lin Edwin Yeo
Hi Erick,

Thanks for your reply.

So those small segments that I found are probably due to a commit happening
during that time?

I also found that those small segments are created during the last
indexing. If I start another batch of indexing, those small segments will
probably get merged together to form a 10GB segment, as I have defined
maxMergedSegmentMB to be 10240MB. Then there will be other new small
segments that are formed from the latest batch of indexing. Is that the way
it works?

Regards,
Edwin


On 14 January 2016 at 10:38, Erick Erickson  wrote:

> ramBufferSizeMB is a _limit_ that flushes the buffer when
> it is reached (actually, I think, it indexes a doc _then_
> checks the size and if it's > the setting, flushes the
> buffer. So technically you can exceed the buffer size by
> your biggest doc's addition to the index).
>
> But I digress. This is a _limit_. If a commit happens (either
> an autocommit or client-initiated commit or a commitWithin)
> then the segment is flushed without regard to ramBufferSizeMB.
>
> Best,
> Erick
>
> On Wed, Jan 13, 2016 at 5:44 PM, Zheng Lin Edwin Yeo
>  wrote:
> > Hi,
> >
> > I would like to check, if I have made the following settings for
> > ramBufferSizeMB, and I am using TieredMergePolicy, am I supposed to get
> > each segment size of at least 320MB?
> >
> >
> > 
> > <ramBufferSizeMB>320</ramBufferSizeMB>
> > 
> >
> >
> > 
> > <mergePolicy class="org.apache.lucene.index.TieredMergePolicy">
> >   <int name="maxMergeAtOnce">10</int>
> >   <int name="segmentsPerTier">10</int>
> >   <int name="maxMergedSegmentMB">10240</int>
> > </mergePolicy>
> > 
> >
> >
> > I have this setting in my solrconfig.xml, but when I checked my segments
> > size under the Segments info screen on the Admin UI, I see quite a number
> > of segments at the bottom whose sizes are much smaller than
> > 320MB.
> > Is that the correct behaviour, or is my ramBufferSizeMB not working
> > correctly?
> >
> > I am using Solr 5.4.0.
> >
> >
> > Regards,
> > Edwin
>


solr-5.3.1 admin console not show properly

2016-01-13 Thread David Cao
I installed and started solr following instructions from solr wiki as this
... (on a Redhat server)

cd ~/
tar zxf /tmp/solr-5.3.1.tgz
cd solr-5.3.1/bin
./solr start -f


Solr starts fine. But when opening console in a browser ("
http://server-ip:8983/solr/admin.html"), it shows a partially rendered page
with highlighted messages "*SolrCore Initialization Failures*"; and a whole
bunch of WARN messages in this nature,

55724 WARN  (qtp1018134259-20) [   ] o.e.j.s.ServletHandler Error for
/solr/css/styles/common.css
java.lang.NoSuchMethodError:
javax/servlet/http/HttpServletRequest.isAsyncSupported()Z
at
org.eclipse.jetty.servlet.DefaultServlet.sendData(DefaultServlet.java:922)
at
org.eclipse.jetty.servlet.DefaultServlet.doGet(DefaultServlet.java:533)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:617)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:723)
at
org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:808)
at
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1669)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:206)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:179)
at
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652)
at
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:585)
at
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
at
org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:577)
at
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:223)
at
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1127)
at
org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:515)
at
org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
at
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1061)
at
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
at
org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:215)
at
org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:110)
at
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97)
at org.eclipse.jetty.server.Server.handle(Server.java:499)
at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:310)
at
org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:257)
at
org.eclipse.jetty.io.AbstractConnection$2.run(AbstractConnection.java:540)
at
org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:635)
at
org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:555)
at java.lang.Thread.run(Thread.java:801)


There was also a line at the start of the console log,

1784 WARN  (main) [   ] o.e.j.s.SecurityHandler
ServletContext@o.e.j.w.WebAppContext@1c662fe5{/solr,file:/root/solr-5.3.1/server/solr-webapp/webapp/,STARTING}{/root/solr-5.3.1/server/solr-webapp/webapp}
has uncovered http methods for path: /


Any ideas? is there any work I need to do to config the classpath?

thanks a lot!
david


Re: solr BooleanClauses issue with space

2016-01-13 Thread Shawn Heisey
On 1/13/2016 5:40 AM, sara hajili wrote:
> what exactly is the difference between space and OR in a solr query?
> i mean what is the difference between
> q = solr OR lucene OR search
> and this
> q = solr lucene search?
>
> solr's default boolean occurrence is OR, isn't it?

This depends on what the default operator is.  The default for the
default operator is OR, and that would produce exactly the same results
for both of the queries you have mentioned.  If the default operator is
AND, then those two queries would be different.

The default operator applies to the lucene and edismax parsers.  The
lucene parser is Solr's default.  In older versions, the default
operator could be set by a defaultOperator parameter.  I do not remember
whether that was in solrconfig or schema.  That parameter is deprecated
and the q.op parameter should be used now.
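
Concretely, with the terms from the question (q.op is the request-parameter
form):

  q=solr lucene search&q.op=OR    behaves like  solr OR lucene OR search
  q=solr lucene search&q.op=AND   behaves like  solr AND lucene AND search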

Thanks,
Shawn



Re: solr BooleanClauses issue with space

2016-01-13 Thread Emir Arnautovic

Hi Sara,
You can run your query (or smaller one) with debugQuery=true and see how 
it is rewritten.
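
For example (the collection name here is just a placeholder):

http://localhost:8983/solr/mycollection/select?q=str1 OR str2 OR str3&defType=edismax&debugQuery=true

The parsedquery entry in the debug section of the response shows the
rewritten query.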


Thanks,
Emir

On 13.01.2016 16:01, sara hajili wrote:

tnx.
and my main question is about maxBooleanClauses in solr config.
it is 1024 by default.
and i have an edismax query with about 500 words in this way:
q1 = str1 OR str2 OR str3 ... OR strn
it throws an exception that it can't parse the query: too many boolean clauses.
so if i change maxBooleanClauses to 1500 it works.
but something is ambiguous for me: when i don't change maxBooleanClauses
and it remains 1024,
but i change the query in this way
q2 = str1 str2 str3 ... strn // i eliminated the ORs, leaving only spaces
i don't get the exception !!!
why?!
what is the difference between q1 and q2??

On Wed, Jan 13, 2016 at 6:28 AM, Shawn Heisey  wrote:


On 1/13/2016 5:40 AM, sara hajili wrote:

what exactly is the difference between space and OR in a solr query?
i mean what is the difference between
q = solr OR lucene OR search
and this
q = solr lucene search?

solr's default boolean occurrence is OR, isn't it?

This depends on what the default operator is.  The default for the
default operator is OR, and that would produce exactly the same results
for both of the queries you have mentioned.  If the default operator is
AND, then those two queries would be different.

The default operator applies to the lucene and edismax parsers.  The
lucene parser is Solr's default.  In older versions, the default
operator could be set by a defaultOperator parameter.  I do not remember
whether that was in solrconfig or schema.  That parameter is deprecated
and the q.op parameter should be used now.

Thanks,
Shawn




--
Monitoring * Alerting * Anomaly Detection * Centralized Log Management
Solr & Elasticsearch Support * http://sematext.com/



Re: collection reflection in resource manager node

2016-01-13 Thread Shawn Heisey
On 1/13/2016 12:38 AM, vidya wrote:
> I have created a collection on one datanode on which a solr server is deployed,
> say DN1. I have another datanode on which a solr server is deployed, which
> has the resource manager service also running on it, say DN2. When i created a
> collection using the solrctl command on DN1, it got reflected in DN2 but not
> DN1. Why is it so?
> Please help me on this.
> If i need to put some jars for indexing in my collection, where do i need to
> put them? On DN1 or DN2?

What is "resource manager"?  This is not part of Solr.  I am also not
familiar with solrctl ... it must be a third-party product.

If you have two Solr nodes and you create a collection that is properly
configured for redundancy (replicationFactor is at least two), it should
create cores for the collection on BOTH nodes, regardless of which one
you send the Collections API request to.  The Collections API is aware
of the entire cluster and in the interests of balance, may not create
new collections on the node where you send the request.

You may need to talk to whoever created this resource manager and ask
them how to get good results with solrctl.

Thanks,
Shawn



Re: solr BooleanClauses issue with space

2016-01-13 Thread sara hajili
tnx.
and my main question is about maxBooleanClauses in solr config.
it is 1024 by default.
and i have an edismax query with about 500 words in this way:
q1 = str1 OR str2 OR str3 ... OR strn
it throws an exception that it can't parse the query: too many boolean clauses.
so if i change maxBooleanClauses to 1500 it works.
but something is ambiguous for me: when i don't change maxBooleanClauses
and it remains 1024,
but i change the query in this way
q2 = str1 str2 str3 ... strn // i eliminated the ORs, leaving only spaces
i don't get the exception !!!
why?!
what is the difference between q1 and q2??
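
For reference, in solrconfig.xml the setting is the maxBooleanClauses
element; raising it to 1500 looks like this:

<maxBooleanClauses>1500</maxBooleanClauses>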

On Wed, Jan 13, 2016 at 6:28 AM, Shawn Heisey  wrote:

> On 1/13/2016 5:40 AM, sara hajili wrote:
> > what exactly is the difference between space and OR in a solr query?
> > i mean what is the difference between
> > q = solr OR lucene OR search
> > and this
> > q = solr lucene search?
> >
> > solr's default boolean occurrence is OR, isn't it?
>
> This depends on what the default operator is.  The default for the
> default operator is OR, and that would produce exactly the same results
> for both of the queries you have mentioned.  If the default operator is
> AND, then those two queries would be different.
>
> The default operator applies to the lucene and edismax parsers.  The
> lucene parser is Solr's default.  In older versions, the default
> operator could be set by a defaultOperator parameter.  I do not remember
> whether that was in solrconfig or schema.  That parameter is deprecated
> and the q.op parameter should be used now.
>
> Thanks,
> Shawn
>
>


Re: ConcurrentUpdateSolrClient vs CloudSolrClient for bulk update to SolrCloud

2016-01-13 Thread Erick Erickson
It's usually not all that difficult to write a multi-threaded
client that uses CloudSolrClient, or even fire up multiple
instances of the SolrJ client (assuming they can work
on discrete sections of the documents you need to index).

That avoids the problem Shawn alludes to. Plus other
issues. If you do _not_ use CloudSolrClient, then all the
docs go to some node in the system (and you really should
update in batches, see:
https://lucidworks.com/blog/2015/10/05/really-batch-updates-solr-2/).
The node that receives the packet then sub-divides it
into groups based on what shard they should be part of
and forwards them to the leaders for those shards, very
significantly increasing the number of conversations
being carried on between Solr nodes. Times the number
of threads you're specifying with CUSC (I really regret
the renaming from ConcurrentUpdateSolrServer, I liked
writing CUSS).

With CloudSolrClient, you can scale nearly linearly with
the number of shards. Not so with CUSC.
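
A minimal sketch of that multi-threaded CloudSolrClient approach (the zkHost
string, collection name, thread count, id scheme and batch size are all
made-up illustrations, not recommendations):

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import org.apache.solr.client.solrj.impl.CloudSolrClient;
import org.apache.solr.common.SolrInputDocument;

public class ParallelIndexer {
    public static void main(String[] args) throws Exception {
        CloudSolrClient client = new CloudSolrClient("zk1:2181,zk2:2181/solr");
        client.setDefaultCollection("test");
        int threads = 4;
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            final int slice = t;
            pool.submit(() -> {
                List<SolrInputDocument> batch = new ArrayList<>();
                try {
                    // each thread works on its own discrete slice of ids
                    for (int i = slice; i < 100000; i += threads) {
                        SolrInputDocument doc = new SolrInputDocument();
                        doc.addField("id", Integer.toString(i));
                        batch.add(doc);
                        if (batch.size() == 1000) {
                            client.add(batch);  // failures throw here, unlike CUSC
                            batch.clear();
                        }
                    }
                    if (!batch.isEmpty()) {
                        client.add(batch);
                    }
                } catch (Exception e) {
                    e.printStackTrace();  // real code would handle/retry
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.HOURS);
        client.commit();
        client.close();
    }
}

Because each thread calls client.add() directly, indexing failures surface as
exceptions in the calling thread, which is exactly the error reporting that
CUSC swallows.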

FWIW,
Erick

On Tue, Jan 12, 2016 at 8:06 PM, Shawn Heisey  wrote:
> On 1/12/2016 7:42 PM, Shivaji Dutta wrote:
>> Now since with ConcurrentUdateSolrClient I am able to use a queue and a pool 
>> of threads, which makes it more attractive to use over CloudSolrClient which 
>> will use a HTTPSolrClient once it gets a set of nodes to do the updates.
>>
>> What is the recommended API for updating large amounts of documents with 
>> higher throughput rate.
>
> ConcurrentUpdateSolrClient has one flaw -- it swallows all exceptions
> that happen during indexing.  Your application will never know about any
> problems that occur during indexing.  The entire cluster could be down,
> and your application would never know about it until you tried an
> explicit commit operation.  Commit is an operation that is not handled
> in the background by CUSC, so I would expect any exception to be
> returned for that operation.
>
> This flaw is inherent to its design, the behavior would be very
> difficult to change.
>
> If you don't care about your application getting error messages when
> indexing requests fail, then CUSC is perfect.  This might be the case if
> you are doing initial bulk loading.  For normal index updates after
> initial loading, you would not want to use CUSC.
>
> If you do care about getting error messages when bulk indexing requests
> fail, then you'll want to build a program with CloudSolrClient where you
> create multiple indexing threads that all use the same the client object.
>
> Thanks,
> Shawn
>


Re: collection reflection in resource manager node

2016-01-13 Thread Erick Erickson
It looks like you're using Cloudera's CDH, is that true? In that case
Cloudera support might be able to provide you with more info.

Best,
Erick

On Wed, Jan 13, 2016 at 6:37 AM, Shawn Heisey  wrote:
> On 1/13/2016 12:38 AM, vidya wrote:
>> I have created a collection on one datanode on which a solr server is deployed,
>> say DN1. I have another datanode on which a solr server is deployed, which
>> has the resource manager service also running on it, say DN2. When i created a
>> collection using the solrctl command on DN1, it got reflected in DN2 but not
>> DN1. Why is it so?
>> Please help me on this.
>> If i need to put some jars for indexing in my collection, where do i need to
>> put them? On DN1 or DN2?
>
> What is "resource manager"?  This is not part of Solr.  I am also not
> familiar with solrctl ... it must be a third-party product.
>
> If you have two Solr nodes and you create a collection that is properly
> configured for redundancy (replicationFactor is at least two), it should
> create cores for the collection on BOTH nodes, regardless of which one
> you send the Collections API request to.  The Collections API is aware
> of the entire cluster and in the interests of balance, may not create
> new collections on the node where you send the request.
>
> You may need to talk to whoever created this resource manager and ask
> them how to get good results with solrctl.
>
> Thanks,
> Shawn
>


Re: How to achieve exact string match query which includes spaces and quotes

2016-01-13 Thread Erick Erickson
what _does_ matter is getting all that through the parser which means
you have to enclose things in quotes and escape them.

For instance, consider this query:  stringField:abc "i am not"

this will get parsed as
stringField:abc defaultTextField:"i am not".

To get around this you need to make sure the entire search gets
through the parser as a _single_ token by enclosing in quotes. But
then of course you have confusion because you have quotes in your
search term so you need to escape those, something like
stringField:"abc \"i am not\""

Here's a list for Lucene 5
https://lucene.apache.org/core/5_1_0/queryparser/org/apache/lucene/queryparser/classic/package-summary.html#Escaping_Special_Characters
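
If you're building the query in SolrJ, ClientUtils.escapeQueryChars does this
escaping for you. A small sketch (the core URL and field name are assumptions
for illustration):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.util.ClientUtils;

public class ExactMatch {
    public static void main(String[] args) throws Exception {
        HttpSolrClient client = new HttpSolrClient("http://localhost:8983/solr/mycore");
        // escapes quotes, spaces, ? etc. so the whole value reaches
        // the parser as a single token against the string field
        String raw = "abc \".. I am \" not ? test";
        SolrQuery q = new SolrQuery("stringField:" + ClientUtils.escapeQueryChars(raw));
        System.out.println(client.query(q).getResults().getNumFound());
        client.close();
    }
}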

Best,
Erick

On Wed, Jan 13, 2016 at 3:39 AM, Binoy Dalal  wrote:
> No.
>
> On Wed, 13 Jan 2016, 16:58 Alok Bhandari 
> wrote:
>
>> Hi Binoy thanks.
>>
>> But does it matter which query-parser I use , shall I use "lucene" parser
>> or
>> "edismax" parser.
>>
>>
>>
>> --
>> View this message in context:
>> http://lucene.472066.n3.nabble.com/How-to-achieve-exact-string-match-query-which-includes-spaces-and-quotes-tp4250402p4250405.html
>> Sent from the Solr - User mailing list archive at Nabble.com.
>>
> --
> Regards,
> Binoy Dalal