Re: solr /export handler - behavior during close()

2017-05-12 Thread Susmit Shukla
Hi Joel,

Thanks for the insight. How can this exception be thrown/forced from the
client side? The client can't do a System.exit() as it is running as a webapp.

Thanks,
Susmit

On Fri, May 12, 2017 at 4:44 PM, Joel Bernstein  wrote:

> In this scenario the /export handler continues to export results until it
> encounters a "Broken Pipe" exception. This exception is trapped and ignored
> rather than logged, as it's not considered an exception if the client
> disconnects early.
>
> Joel Bernstein
> http://joelsolr.blogspot.com/
>
> On Fri, May 12, 2017 at 2:10 PM, Susmit Shukla 
> wrote:
>
> > Hi,
> >
> > I have a question regarding solr /export handler. Here is the scenario -
> > I want to use the /export handler - I only need sorted data and this is the
> > fastest way to get it. I am doing multiple level joins using streams using
> > the /export handler. I know the number of top level records to be retrieved
> > but not for each individual stream rolling up to the final result.
> > I observed that calling close() on a /export stream is too expensive. It
> > reads the stream to the very end of the hits. Assuming there are 100 million
> > hits for each stream and the first 1k records were found after the joins, a
> > close() call after that would take many minutes/hours to finish.
> > Currently I have put the close() call in a different thread - basically fire
> > and forget. But the cluster is very strained because of the unnecessary
> > reads.
> >
> > Internally, streaming uses HttpClient's ChunkedInputStream, and it has to be
> > drained in the close() call. But from the server's point of view, it should
> > stop sending more data once close() has been issued.
> > There is a read() call in the close() method of ChunkedInputStream that is
> > indistinguishable from a real read(). If the /export handler stopped sending
> > more data after close(), it would be very useful.
> >
> > Another option would be to use the /select handler and get into the business
> > of managing a custom cursor mark that is based on the stream sort and is
> > reset until it fetches the required records at the topmost level.
> >
> > Any thoughts.
> >
> > Thanks,
> > Susmit
> >
>
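
A rough sketch of one way to force that broken pipe from the client side, going
straight to Apache HttpClient 4.x against the /export endpoint rather than
through the SolrJ stream wrappers (the URL, collection, and query below are
hypothetical): instead of draining the chunked body, abort the request so the
server's next write fails. Whether the higher-level joined streams expose the
underlying request this way is a separate question.

    import java.io.InputStream;
    import org.apache.http.client.methods.CloseableHttpResponse;
    import org.apache.http.client.methods.HttpGet;
    import org.apache.http.impl.client.CloseableHttpClient;
    import org.apache.http.impl.client.HttpClients;

    public class ExportAbortSketch {
      public static void main(String[] args) throws Exception {
        // Hypothetical collection and query; adjust to the real setup.
        String url = "http://localhost:8983/solr/collection1/export"
            + "?q=*:*&sort=id+asc&fl=id&wt=json";
        CloseableHttpClient client = HttpClients.createDefault();
        HttpGet get = new HttpGet(url);
        CloseableHttpResponse rsp = client.execute(get);
        InputStream body = rsp.getEntity().getContent();

        // ... read and parse tuples from 'body' until the join logic has the
        // rows it needs (parsing omitted in this sketch) ...
        body.read();

        // abort() shuts the connection down without draining the chunked body,
        // so the server hits a "Broken Pipe" on its next write instead of
        // exporting the remaining hits. A plain body.close() would drain the
        // stream for connection re-use, which is the expensive path.
        get.abort();
        client.close();
      }
    }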


Re: solr /export handler - behavior during close()

2017-05-12 Thread Joel Bernstein
In this scenario the /export handler continues to export results until it
encounters a "Broken Pipe" exception. This exception is trapped and ignored
rather than logged, as it's not considered an exception if the client
disconnects early.

Joel Bernstein
http://joelsolr.blogspot.com/

On Fri, May 12, 2017 at 2:10 PM, Susmit Shukla 
wrote:

> Hi,
>
> I have a question regarding solr /export handler. Here is the scenario -
> I want to use the /export handler - I only need sorted data and this is the
> fastest way to get it. I am doing multiple level joins using streams using
> /export handler. I know the number of top level records to be retrieved but
> not for each individual stream rolling up to the final result.
> I observed that calling close() on a /export stream is too expensive. It
> reads the stream to the very end of the hits. Assuming there are 100 million
> hits for each stream and the first 1k records were found after the joins, a
> close() call after that would take many minutes/hours to finish.
> Currently I have put the close() call in a different thread - basically fire
> and forget. But the cluster is very strained because of the unnecessary
> reads.
>
> Internally, streaming uses HttpClient's ChunkedInputStream, and it has to be
> drained in the close() call. But from the server's point of view, it should
> stop sending more data once close() has been issued.
> There is a read() call in the close() method of ChunkedInputStream that is
> indistinguishable from a real read(). If the /export handler stopped sending
> more data after close(), it would be very useful.
>
> Another option would be to use the /select handler and get into the business
> of managing a custom cursor mark that is based on the stream sort and is
> reset until it fetches the required records at the topmost level.
>
> Any thoughts.
>
> Thanks,
> Susmit
>


Re: file descriptors and threads differing

2017-05-12 Thread Satya Marivada
We have the same ulimits in both cases.

/proc/2/fd:
lr-x-- 1 Dgisse pg014921_gisse 64 May 12 09:52 124 ->
/sanfs/mnt/vol01/solr/solr-6.3.0/contrib/extraction/lib/apache-mime4j-core-0.7.2.jar
lr-x-- 1 Dgisse pg014921_gisse 64 May 12 09:52 125 ->
/sanfs/mnt/vol01/solr/solr-6.3.0/contrib/extraction/lib/apache-mime4j-core-0.7.2.jar
lr-x-- 1 Dgisse pg014921_gisse 64 May 12 09:52 126 ->
/sanfs/mnt/vol01/solr/solr-6.3.0/contrib/extraction/lib/apache-mime4j-core-0.7.2.jar
lr-x-- 1 Dgisse pg014921_gisse 64 May 12 09:52 127 ->
/sanfs/mnt/vol01/solr/solr-6.3.0/contrib/extraction/lib/apache-mime4j-core-0.7.2.jar
lr-x-- 1 Dgisse pg014921_gisse 64 May 12 09:52 128 ->
/sanfs/mnt/vol01/solr/solr-6.3.0/contrib/extraction/lib/apache-mime4j-dom-0.7.2.jar
lr-x-- 1 Dgisse pg014921_gisse 64 May 12 09:52 129 ->
/sanfs/mnt/vol01/solr/solr-6.3.0/contrib/extraction/lib/apache-mime4j-dom-0.7.2.jar


The same process is opening many files. In Linux, shouldn't only one fd be
opened per file and referenced by all threads in the process?

In the other environment, where the JVM differs in minor version, it opens
fewer files. We are trying to get both environments onto the same JVM level
to see how it goes.


Thanks,
Satya


On Fri, May 12, 2017 at 3:41 PM Erick Erickson 
wrote:

> Check the system settings with ulimit. Differing numbers of user processes
> or open files can cause things like this to be different on different
> boxes.
>
> Can't speak to the Java version.
>
> Best,
> Erick
>
> On Fri, May 12, 2017 at 11:56 AM, Satya Marivada <
> satya.chaita...@gmail.com>
> wrote:
>
> > Hi All,
> >
> > We have a weird problem, with many threads being opened and crashing
> > the app in PP7 (one of our environments). It is the same index and same
> > version of solr (6.3.0) and zookeeper (3.4.9) in both environments.
> > The Java minor version is different (1.8.0_102 in PP8 (one of our
> > environments), shown below, vs 1.8.0_121 in PP7 (the other environment)).
> >
> > Any ideas why the numbers of open files and threads are higher in one
> > environment than in the other?
> >
> > Open files are counted by looking at the fd entries under the /proc/
> > directory, and threads by ps -elfT | wc -l.
> >
> > [image: pasted1]
> >
> > Thanks,
> > Satya
> >
>


Re: is there a way to escape in # in solr

2017-05-12 Thread ravi432
Hi Roopesh,


Try using PatternReplaceFilterFactory.

Let me know if it helped.






Re: is there a way to escape in # in solr

2017-05-12 Thread Roopesh Uniyal
Hello Ravi,

I am new to Solr and I am facing an issue for which I want to confirm whether I
need to fix anything. Maybe you have an answer to this -

I have to use data import so that I can index some info from MS SQL
database in Solr. I am done with it. I can query/see records in solr also.

But for a column like location that has "\" in it (e.g., "\solr\solr-6.5\bin"),
it is showing "\\solr\\solr-6.5\\bin".

My SQL returns only one escape when I run the query in SQL, but when I
search in Solr locally (http://localhost:8983/solr/), it shows a double
escape.

Is it a problem? Will it be returned as a double escape if another
application (a Java program) tries to search this Solr index? How can I
remove these extra escapes?

Thanks,
Roopesh

On Fri, May 12, 2017 at 10:49 AM, ravi432  wrote:

>
>
>
>
>


Re: EXT: Re: Query regarding Solr Caches

2017-05-12 Thread Erick Erickson
"The reason I asked about the Cache sizes is I had read that
configuring the Cache sizes of Solr does not provide you enough
benefits"

Where is "somewhere"? Because this is simply wrong as a blanket statement.

filterCache can have tremendous impact on query performance, depending
on how many fq clauses you typically use and their re-use rate,
i.e. hit ratio. Not to mention its possible effects on faceting.

documentCache is usually not as dramatic once you get past it being
(max rows expected) * (num simultaneous users). However I have seen
situations where having a very large documentCache results in
impressive performance gains.

queryResultCache will short-circuit search entirely when paging.

Etc. Your results may vary, of course.

Perhaps you're confusing the docValues discussion (where the OS's
memory is used rather than Java heap) with the general cache
discussion.

Best,
Erick
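
A small SolrJ sketch of the fq re-use point above (collection, URL, and field
names are hypothetical): each fq string is cached as its own filterCache entry,
so keeping independent constraints in separate fq clauses lets later queries
hit the cache even when the main query changes.

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.impl.HttpSolrClient;
    import org.apache.solr.client.solrj.response.QueryResponse;

    public class FilterCacheReuseSketch {
      public static void main(String[] args) throws Exception {
        HttpSolrClient solr = new HttpSolrClient.Builder(
            "http://localhost:8983/solr/collection1").build();

        SolrQuery q = new SolrQuery("ipod");
        // Two independent filters; each is cached separately, so any later
        // query that repeats either string gets a filterCache hit.
        q.addFilterQuery("inStock:true");
        q.addFilterQuery("category:electronics");
        // A single combined clause ("inStock:true AND category:electronics")
        // would only be reusable by queries with exactly that combination.

        QueryResponse rsp = solr.query(q);
        System.out.println("hits: " + rsp.getResults().getNumFound());
        solr.close();
      }
    }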

On Fri, May 12, 2017 at 10:58 AM, Suresh Pendap  wrote:
> Hi Shawn,
> Thanks for the reply, it is useful. The reason I asked about the cache
> sizes is that I had read that configuring the cache sizes of Solr does not
> provide enough benefit; instead, it is better to provide a lot of
> memory to Solr outside the JVM heap.
>
> Is it true that in general the strategy of using the OS buffer cache works
> better than using a large cache inside the JVM heap?
>
> I was looking for some numbers that people have used for configuring these
> caches in the past and the rationale for choosing these values.
>
> Thanks
> Suresh
>
>
> On 5/11/17, 11:13 PM, "Shawn Heisey"  wrote:
>
>>On 5/11/2017 4:58 PM, Suresh Pendap wrote:
>>> This question might have been asked on the solr user mailing list
>>>earlier. Solr has four different types of Cache DocumentCache,
>>>QueryResultCache, FieldValueCache and FilterQueryCache
>>> I would like to know which of these Caches are off heap cache?
>>
>>None of them are off-heap.  For a while, there was a forked project
>>called Heliosearch that did have a bunch of off-heap memory structures,
>>but it is my understanding that Heliosearch is effectively dead now.
>>
>>There are at least three issues for bringing this capability to Solr,
>>but none of them have been added yet.
>>
>>https://issues.apache.org/jira/browse/SOLR-6638
>>https://issues.apache.org/jira/browse/SOLR-7211
>>https://issues.apache.org/jira/browse/SOLR-7210
>>
>>> Which Caches have the maximum impact on the query latency and it is
>>>recommended to configure that Cache?
>>
>>This is so dependent on your exact index and your exact queries that
>>nobody can give you a reliable answer to that question.
>>
>>The filterCache is pretty good at speeding things up when you have
>>filter queries that get reused frequently, but depending on how your
>>setup works, it might not provide the most impact.
>>
>>> I also would like to know if there is any document which provides
>>>guidelines on performing Capacity planning for a Solr cluster.
>>
>>No.  It's not possible to provide general information.  See this:
>>
>>https://lucidworks.com/sizing-hardware-in-the-abstract-why-we-dont-have-a-
>>definitive-answer/
>>
>>Thanks,
>>Shawn
>>
>


Re: file descriptors and threads differing

2017-05-12 Thread Erick Erickson
Check the system settings with ulimit. Differing numbers of user processes
or open files can cause things like this to be different on different boxes.

Can't speak to the Java version.

Best,
Erick

On Fri, May 12, 2017 at 11:56 AM, Satya Marivada 
wrote:

> Hi All,
>
> We have a weird problem, with many threads being opened and crashing
> the app in PP7 (one of our environments). It is the same index and same version
> of solr (6.3.0) and zookeeper (3.4.9) in both environments.
> The Java minor version is different (1.8.0_102 in PP8 (one of our environments),
> shown below, vs 1.8.0_121 in PP7 (the other environment)).
>
> Any ideas why the numbers of open files and threads are higher in one
> environment than in the other?
>
> Open files are counted by looking at the fd entries under the /proc/
> directory, and threads by ps -elfT | wc -l.
>
> [image: pasted1]
>
> Thanks,
> Satya
>


file descriptors and threads differing

2017-05-12 Thread Satya Marivada
Hi All,

We have a weird problem, with many threads being opened and crashing
the app in PP7 (one of our environments). It is the same index and same version
of solr (6.3.0) and zookeeper (3.4.9) in both environments.
The Java minor version is different (1.8.0_102 in PP8 (one of our environments),
shown below, vs 1.8.0_121 in PP7 (the other environment)).

Any ideas why the numbers of open files and threads are higher in one
environment than in the other?

Open files are counted by looking at the fd entries under the /proc/ directory,
and threads by ps -elfT | wc -l.

[image: pasted1]

Thanks,
Satya


Re: CDCR Alias support?

2017-05-12 Thread Webster Homer
The CDCR request handler doesn't support aliases.
The source and target collections listed in the replica configuration must be
real collections; nothing happens if they are aliases for a collection. No
errors anywhere, just nothing.

If a non-existent collection is listed I see errors; with an alias it just
doesn't do anything. It should either work or throw an error.

On Tue, May 9, 2017 at 12:21 PM, Webster Homer 
wrote:

> Still no answer to this. I've been investigating using the collections API
> for backup and restore. If CDCR supports collection aliases this would make
> things much smoother as we would restore to a new collection and then
> switch the alias to reference the new collection.
>
> On Tue, Jan 10, 2017 at 10:53 AM, Webster Homer 
> wrote:
>
>> Looking at the cdcr API and documentation I wondered if the source and
>> target collection names could be aliases. This is not discussed in the cdcr
>> documentation. When I have time I was going to test this, but if someone
>> knows for certain it might save some time.
>>
>>
>>
>



solr /export handler - behavior during close()

2017-05-12 Thread Susmit Shukla
Hi,

I have a question regarding solr /export handler. Here is the scenario -
I want to use the /export handler - I only need sorted data and this is the
fastest way to get it. I am doing multiple level joins using streams using
/export handler. I know the number of top level records to be retrieved but
not for each individual stream rolling up to the final result.
I observed that calling close() on a /export stream is too expensive. It
reads the stream to the very end of the hits. Assuming there are 100 million
hits for each stream and the first 1k records were found after the joins, a
close() call after that would take many minutes/hours to finish.
Currently I have put the close() call in a different thread - basically fire
and forget. But the cluster is very strained because of the unnecessary
reads.

Internally, streaming uses HttpClient's ChunkedInputStream, and it has to be
drained in the close() call. But from the server's point of view, it should
stop sending more data once close() has been issued.
There is a read() call in the close() method of ChunkedInputStream that is
indistinguishable from a real read(). If the /export handler stopped sending
more data after close(), it would be very useful.

Another option would be to use the /select handler and get into the business of
managing a custom cursor mark that is based on the stream sort and is reset
until it fetches the required records at the topmost level.

Any thoughts.

Thanks,
Susmit
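
For the /select alternative mentioned above, a minimal SolrJ cursorMark loop
might look like the sketch below (collection, sort field, page size, and the
1000-row cutoff are hypothetical); the loop simply stops once the caller has
enough rows, so there is nothing left to drain.

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.impl.HttpSolrClient;
    import org.apache.solr.client.solrj.response.QueryResponse;
    import org.apache.solr.common.params.CursorMarkParams;

    public class CursorMarkSketch {
      public static void main(String[] args) throws Exception {
        HttpSolrClient solr = new HttpSolrClient.Builder(
            "http://localhost:8983/solr/collection1").build();

        SolrQuery q = new SolrQuery("*:*");
        q.setRows(100);                        // page size
        q.setSort("id", SolrQuery.ORDER.asc);  // sort must include the uniqueKey
        q.setFields("id");

        String cursor = CursorMarkParams.CURSOR_MARK_START;
        long fetched = 0;
        while (fetched < 1000) {               // stop once enough rows are in hand
          q.set(CursorMarkParams.CURSOR_MARK_PARAM, cursor);
          QueryResponse rsp = solr.query(q);
          fetched += rsp.getResults().size();
          String next = rsp.getNextCursorMark();
          if (next.equals(cursor)) break;      // no more results
          cursor = next;
        }
        solr.close();
      }
    }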


Re: EXT: Re: Query regarding Solr Caches

2017-05-12 Thread Suresh Pendap
Hi Shawn,
Thanks for the reply, it is useful. The reason I asked about the cache
sizes is that I had read that configuring the cache sizes of Solr does not
provide enough benefit; instead, it is better to provide a lot of
memory to Solr outside the JVM heap.

Is it true that in general the strategy of using the OS buffer cache works
better than using a large cache inside the JVM heap?

I was looking for some numbers that people have used for configuring these
caches in the past and the rationale for choosing these values.

Thanks
Suresh


On 5/11/17, 11:13 PM, "Shawn Heisey"  wrote:

>On 5/11/2017 4:58 PM, Suresh Pendap wrote:
>> This question might have been asked on the solr user mailing list
>>earlier. Solr has four different types of Cache DocumentCache,
>>QueryResultCache, FieldValueCache and FilterQueryCache
>> I would like to know which of these Caches are off heap cache?
>
>None of them are off-heap.  For a while, there was a forked project
>called Heliosearch that did have a bunch of off-heap memory structures,
>but it is my understanding that Heliosearch is effectively dead now.
>
>There are at least three issues for bringing this capability to Solr,
>but none of them have been added yet.
>
>https://issues.apache.org/jira/browse/SOLR-6638
>https://issues.apache.org/jira/browse/SOLR-7211
>https://issues.apache.org/jira/browse/SOLR-7210
>
>> Which Caches have the maximum impact on the query latency and it is
>>recommended to configure that Cache?
>
>This is so dependent on your exact index and your exact queries that
>nobody can give you a reliable answer to that question.
>
>The filterCache is pretty good at speeding things up when you have
>filter queries that get reused frequently, but depending on how your
>setup works, it might not provide the most impact.
>
>> I also would like to know if there is any document which provides
>>guidelines on performing Capacity planning for a Solr cluster.
>
>No.  It's not possible to provide general information.  See this:
>
>https://lucidworks.com/sizing-hardware-in-the-abstract-why-we-dont-have-a-
>definitive-answer/
>
>Thanks,
>Shawn
>



How to partition the collection

2017-05-12 Thread Mikhail Ibraheem
Hi,

I have a denormalized dataset and hence it has duplicate records. When I do any
aggregation the result is wrong because it includes the duplicate data.

So I want to partition the dataset by the unique attribute and then do the
aggregation or grouping against the partitioned results.

 

1-  Can I run a json facet against the unique results? Something
like:

tempResult = getUniqueResults(attributeA)

finalResults=aggregate(tempResult)

 

2-  Can I join both json faceting and streaming? Something like

uniqueStream ustream = getUniqueStream()

jsonFacet(ustream)

 

Please advise.

 

Thanks

Mikhail
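
For the streaming half of the question, a sketch of what de-duplicating with
unique() and then aggregating with rollup() might look like through the /stream
endpoint, assuming Solr 6.x streaming expressions (collection, field, and
metric names are hypothetical). Both decorators need the underlying stream
sorted on the field they operate over, which /export provides via the sort
parameter.

    import org.apache.solr.client.solrj.io.Tuple;
    import org.apache.solr.client.solrj.io.stream.SolrStream;
    import org.apache.solr.client.solrj.io.stream.StreamContext;
    import org.apache.solr.client.solrj.io.stream.TupleStream;
    import org.apache.solr.common.params.ModifiableSolrParams;

    public class UniqueRollupSketch {
      public static void main(String[] args) throws Exception {
        // De-duplicate on attributeA, then aggregate; in practice the rollup
        // "over" field would be whatever you group by, with the stream sorted
        // on it.
        String expr =
            "rollup("
          + "  unique("
          + "    search(products, q=\"*:*\", fl=\"attributeA,price\","
          + "           sort=\"attributeA asc\", qt=\"/export\"),"
          + "    over=\"attributeA\"),"
          + "  over=\"attributeA\", sum(price))";

        ModifiableSolrParams params = new ModifiableSolrParams();
        params.set("expr", expr);
        params.set("qt", "/stream");

        TupleStream stream =
            new SolrStream("http://localhost:8983/solr/products", params);
        stream.setStreamContext(new StreamContext());
        try {
          stream.open();
          for (Tuple t = stream.read(); !t.EOF; t = stream.read()) {
            System.out.println(t.getString("attributeA") + " " + t.get("sum(price)"));
          }
        } finally {
          stream.close();
        }
      }
    }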


Re: cursormark pagination inconsistency

2017-05-12 Thread Shawn Heisey
On 5/9/2017 10:59 AM, moscovig wrote:
> We are running on solrj 6.2.0, server 6.2.1 
> and trying to fetch 100 records at a time, with nextCursorMark, 
> while sorting on: score desc, key asc
>
> The collection is made of 2 shards, with 3 replicas each.
>
> We get inconsistent results when not specifying specific replica for each
> shard.
>
> Sometimes the 3rd, and sometimes the 10th, fetch will contain results that we
> expected to see in the 15th batch.
> Something went wrong with the score sorting. 
>
> When we specify a replica for each shard to query from with
> shards=solr1:8983/solr/tweets_shard1_replica2/,solr26:8983/solr/tweets_shard2_replica3
>
> It is working as expected.
>
> It seems as if the cursor doesn't keep the sort between different replicas
> of each shard.

The way that SolrCloud accomplishes its data replication can result in
replicas that contain different numbers of deleted documents, even when
each replica contains the exact same documents that *aren't* deleted. 
Deleted documents are still part of the index, so they can affect TF and
IDF, which are the primary components in the score.  This means that the
score can be slightly different depending on which replica answers the
query.

If you want to be absolutely certain that everything is identical across
all replicas, you could optimize the collection, but this could take a
very long time if the collection is large.  You would also need to be
sure that you do not make any changes to the index until your cursorMark
pagination is complete.  Any changes to the index will likely affect
scores from one query to the next, which can affect the order of
documents in your cursormark.  You could miss documents, or find that
you've retrieved the same document more than once.

Thanks,
Shawn



Re: setup solrcloud from scratch via web-ui

2017-05-12 Thread Thomas Porschberg
Hi,

I think I made a mistake when, in step 3, I started solr without the
zookeeper option.
I did:
 bin/solr start -c
but I think it should:
bin/solr start -c  -z localhost:2181

The problem is that now when I repeat step 4 (creating a collection) I get the
following error:

//I uploaded my cat-config again to zookeeper with
// bin/solr zk upconfig -n cat -d $HOME/solr-6.5.1/server/solr/tommy/conf -z localhost:2181


bin/solr create -c cat -shards 2

Connecting to ZooKeeper at localhost:2181 ...
INFO  - 2017-05-12 16:38:06.593; 
org.apache.solr.client.solrj.impl.ZkClientClusterStateProvider; Cluster at 
localhost:2181 ready
Re-using existing configuration directory cat

Creating new collection 'cat' using command:
http://localhost:8983/solr/admin/collections?action=CREATE&name=cat&numShards=2&replicationFactor=1&maxShardsPerNode=2&collection.configName=cat


ERROR: Failed to create collection 'cat' due to: 
{127.0.1.1:8983_solr=org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException:Error
 from server at http://127.0.1.1:8983/solr: Error CREATEing SolrCore 
'cat_shard1_replica1': Unable to create core [cat_shard1_replica1] Caused by: 
Lock held by this virtual machine: 
/home/pberg/solr_new2/solr-6.5.1/server/data/bestand/index/write.lock}

This "data/bestand" is configured in solrconfig.xml (from tommy standalone) with
data/bestand

I tried to create the directory 
/home/pberg/solr_new2/solr-6.5.1/server/data/bestand/index/ manually, but
nothing changed.

What is the reason for this CREATE-error?

Thomas




> ANNAMANENI RAVEENDRA wrote on 12 May 2017 at 15:54:
> 
> 
> Hi ,
> 
> If there is a request handler configured in solrconfig.xml and you update the
> conf in zookeeper, it should show up.
> 
> If you already did that, try reloading the configuration.
> 
> Thanks
> Ravi
> 
> 
> On Fri, 12 May 2017 at 9:46 AM, Thomas Porschberg 
> wrote:
> 
> > > > This is another problem I see: With my non-cloud core I have a
> > conf-directory where I have dataimport.xml, schema.xml and solrconfig.xml.
> > > > I think these 3 files are enough to import my data from my relational
> > database.
> > > > Under example/cloud I could not find one of them. How to setup DIH for
> > the solrcould?
> > >
> > > The entire configuration (what would normally be in the conf directory)
> > > is in zookeeper when you're in cloud mode, not in the core directories.
> > > You must upload a directory containing the same files that would
> > > normally be in a conf directory as a named configset to zookeeper before
> > > you try to create your collection.  This is something that the "bin/solr
> > > create" command does for you in cloud mode, typically using one of the
> > > configsets included on the disk as a source.
> > >
> > >
> > https://cwiki.apache.org/confluence/display/solr/Using+ZooKeeper+to+Manage+Configuration+Files
> > >
> > Ok, thank you. I did the following steps.
> >
> > 1. Started an external zookeeper
> > 2. Copied a conf-directory to zookeeper:
> > bin/solr zk upconfig -n books -d $HOME/solr-6.5.1/server/solr/tommy/conf
> > -z localhost:2181
> > // This is a conf-directory from a standalone solr when dataimport was
> > working!
> > --> Connecting to ZooKeeper at localhost:2181 ...
> > Uploading <> for config books to ZooKeeper at localhost:2181
> > // I think no errors, but how can I check it in zookeeper? I found no
> > files solrconfig.xml ...
> > in the zookeeper directories (installation dir and data dir)
> > 3. Started solr:
> > bin/solr start -c
> > 4. Created a books collection with 2 shards
> > bin/solr create -c books -shards 2
> >
> > Result: I see in the web-ui my books collection with the 2 shards. No
> > errors so far.
> > However, the Dataimport-entry says:
> > "Sorry, no dataimport-handler defined!"
> >
> > What could be the reason?
> >
> > Thomas
> >


is there a way to escape in # in solr

2017-05-12 Thread ravi432






Re: Solr Features 6.5.1 v/s 6.1

2017-05-12 Thread Erik Hatcher
Sweta -

There’s been an enormous number of changes between 6.1 and 6.5.1.  See CHANGES: 

 
https://github.com/apache/lucene-solr/blob/master/solr/CHANGES.txt#L439-L1796 


wow, huh?

And yes, there have been dramatic improvements (Solr 6.5+) in multi-word 
synonym handling, see Steve’s blog here for details: 

   
https://lucidworks.com/2017/04/18/multi-word-synonyms-solr-adds-query-time-support/
 


As for your other questions, not quite sure exactly what you mean on those.  
What features/improvements are you looking for specifically here?

Erik


> On May 12, 2017, at 8:39 AM, Sweta Parekh  wrote:
> 
> Hi Team,
> Can you please help me with new features, enhancements and improvements on 
> Solr 6.5.1 v/s 6.1 as we are planning to upgrade the version.
> * Has there been major improvement in multi-term / phrase synonyms 
> and match mode
> 
> * Can we perform secondary search using different mm to find better 
> results like auto relax mm
> 
> * Any new update in results exclusion, elevation etc..
> 
> 
> Regards,
> Sweta Parekh
> Search / CRO - Associate Program Manager
> Digital Marketing Services
> sweta.par...@clerx.com
> Extn: 284887 | Mobile: +(91) 9004667625
> eClerx Services Limited [www.eClerx.com]
> 



Re: Solr Features 6.5.1 v/s 6.1

2017-05-12 Thread Steve Rowe
Hi,

See 6.5.1 CHANGES: 

--
Steve
www.lucidworks.com

> On May 12, 2017, at 8:39 AM, Sweta Parekh  wrote:
> 
> Hi Team,
> Can you please help me with new features, enhancements and improvements on 
> Solr 6.5.1 v/s 6.1 as we are planning to upgrade the version.
> * Has there been major improvement in multi-term / phrase synonyms 
> and match mode
> 
> * Can we perform secondary search using different mm to find better 
> results like auto relax mm
> 
> * Any new update in results exclusion, elevation etc..
> 
> 
> Regards,
> Sweta Parekh
> Search / CRO - Associate Program Manager
> Digital Marketing Services
> sweta.par...@clerx.com
> Extn: 284887 | Mobile: +(91) 9004667625
> eClerx Services Limited [www.eClerx.com]
> 



Re: setup solrcloud from scratch via web-ui

2017-05-12 Thread ANNAMANENI RAVEENDRA
Hi ,

If there is a request handler configured in solrconfig.xml and you update the
conf in zookeeper, it should show up.

If you already did that, try reloading the configuration.

Thanks
Ravi


On Fri, 12 May 2017 at 9:46 AM, Thomas Porschberg 
wrote:

> > > This is another problem I see: With my non-cloud core I have a
> conf-directory where I have dataimport.xml, schema.xml and solrconfig.xml.
> > > I think these 3 files are enough to import my data from my relational
> database.
> > > Under example/cloud I could not find one of them. How to setup DIH for
> the solrcould?
> >
> > The entire configuration (what would normally be in the conf directory)
> > is in zookeeper when you're in cloud mode, not in the core directories.
> > You must upload a directory containing the same files that would
> > normally be in a conf directory as a named configset to zookeeper before
> > you try to create your collection.  This is something that the "bin/solr
> > create" command does for you in cloud mode, typically using one of the
> > configsets included on the disk as a source.
> >
> >
> https://cwiki.apache.org/confluence/display/solr/Using+ZooKeeper+to+Manage+Configuration+Files
> >
> Ok, thank you. I did the following steps.
>
> 1. Started an external zookeeper
> 2. Copied a conf-directory to zookeeper:
> bin/solr zk upconfig -n books -d $HOME/solr-6.5.1/server/solr/tommy/conf
> -z localhost:2181
> // This is a conf-directory from a standalone solr when dataimport was
> working!
> --> Connecting to ZooKeeper at localhost:2181 ...
> Uploading <> for config books to ZooKeeper at localhost:2181
> // I think no errors, but how can I check it in zookeeper? I found no
> files solrconfig.xml ...
> in the zookeeper directories (installation dir and data dir)
> 3. Started solr:
> bin/solr start -c
> 4. Created a books collection with 2 shards
> bin/solr create -c books -shards 2
>
> Result: I see in the web-ui my books collection with the 2 shards. No
> errors so far.
> However, the Dataimport-entry says:
> "Sorry, no dataimport-handler defined!"
>
> What could be the reason?
>
> Thomas
>


Solr Features 6.5.1 v/s 6.1

2017-05-12 Thread Sweta Parekh
Hi Team,
Can you please help me with new features, enhancements and improvements on Solr 
6.5.1 v/s 6.1 as we are planning to upgrade the version.
 * Has there been major improvement in multi-term / phrase synonyms and 
match mode

* Can we perform secondary search using different mm to find better 
results like auto relax mm

* Any new update in results exclusion, elevation etc..


Regards,
Sweta Parekh
Search / CRO - Associate Program Manager
Digital Marketing Services
sweta.par...@clerx.com
Extn: 284887 | Mobile: +(91) 9004667625
eClerx Services Limited [www.eClerx.com]



Re: setup solrcloud from scratch via web-ui

2017-05-12 Thread Thomas Porschberg
> > This is another problem I see: With my non-cloud core I have a 
> > conf-directory where I have dataimport.xml, schema.xml and solrconfig.xml. 
> > I think these 3 files are enough to import my data from my relational 
> > database.
> > Under example/cloud I could not find one of them. How to setup DIH for the 
> > solrcould?
> 
> The entire configuration (what would normally be in the conf directory)
> is in zookeeper when you're in cloud mode, not in the core directories. 
> You must upload a directory containing the same files that would
> normally be in a conf directory as a named configset to zookeeper before
> you try to create your collection.  This is something that the "bin/solr
> create" command does for you in cloud mode, typically using one of the
> configsets included on the disk as a source.
> 
> https://cwiki.apache.org/confluence/display/solr/Using+ZooKeeper+to+Manage+Configuration+Files
> 
Ok, thank you. I did the following steps.

1. Started an external zookeeper
2. Copied a conf-directory to zookeeper: 
bin/solr zk upconfig -n books -d $HOME/solr-6.5.1/server/solr/tommy/conf -z 
localhost:2181
// This is a conf-directory from a standalone solr when dataimport was working!
--> Connecting to ZooKeeper at localhost:2181 ...
Uploading <> for config books to ZooKeeper at localhost:2181
// I think no errors, but how can I check it in zookeeper? I found no files 
solrconfig.xml ...
in the zookeeper directories (installation dir and data dir)
3. Started solr:
bin/solr start -c
4. Created a books collection with 2 shards
bin/solr create -c books -shards 2

Result: I see in the web-ui my books collection with the 2 shards. No errors so 
far.
However, the Dataimport-entry says:
"Sorry, no dataimport-handler defined!"

What could be the reason?

Thomas


Re: Explicit OR in edismax query with mm=100%

2017-05-12 Thread Nguyen Manh Tien
Hi,

In our case, mm=100% is fixed. It works well for many other queries.
I just need an option in edismax so that for a query like "Solr OR Lucene" with
an explicit OR, mm will be ignored.

Thanks,
Tien

On Thu, Apr 20, 2017 at 9:56 AM, Yasufumi Mizoguchi 
wrote:

> Hi,
>
> It looks like edismax respects the mm parameter in your case.
> You should set "mm=1", if you want to obtain the results of OR search.
> "mm=100%" means that all terms in your query should match.
>
> Regards,
> Yasufumi
>
>
>
> On 2017/04/20 10:40, Nguyen Manh Tien wrote:
>
>> Hi,
>>
>> I run a query "Solr OR Lucene" with defType=edismax and mm=100%.
>> The search result show that query works similar to "Solr AND Lucene" (all
>> terms required)
>>
>> Does edismax ignore mm parameter because i already use OR explicitly here?
>>
>> Thanks,
>> Tien
>>
>>
>
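
For reference, a minimal SolrJ sketch of the workaround in the quoted reply,
overriding mm per request (URL and collection are hypothetical); detecting the
explicit operators and deciding when to relax mm is still up to the client.

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.impl.HttpSolrClient;
    import org.apache.solr.common.params.DisMaxParams;

    public class MmPerRequestSketch {
      public static void main(String[] args) throws Exception {
        HttpSolrClient solr = new HttpSolrClient.Builder(
            "http://localhost:8983/solr/collection1").build();

        SolrQuery q = new SolrQuery("Solr OR Lucene");
        q.set("defType", "edismax");
        // Per the quoted reply: mm=1 gives OR-style matching for this request,
        // leaving the configured mm=100% in place for everything else.
        q.set(DisMaxParams.MM, "1");

        System.out.println(solr.query(q).getResults().getNumFound());
        solr.close();
      }
    }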


Re: setup solrcloud from scratch via web-ui

2017-05-12 Thread Shawn Heisey
On 5/12/2017 3:12 AM, Thomas Porschberg wrote:
> I want to set up a solrcloud. I want to test sharding with one node, no
> replication.
> I have some experience with the non-cloud solr and I also run the cloud 
> examples.
> I also have to use the DIH for importing. I think I can live with the 
> internal zookeeper.
>
> I did my first steps with solr-6.5.1.
>
> My first question is: Is it possible to set up a new solrcloud with the web-ui
> only?

From what I can tell, something you cannot do with the admin UI is
upload a configuration to zookeeper.  If you don't already have one
uploaded, then that must be accomplished separately, which you should be
able to do at the commandline with the "bin/solr zk" commands.  Because
the DIH setup is in the configuration that you will be uploading, you
also cannot set up DIH from within the UI.

> When I start solr with: 'bin/solr start -c'
>
> I get a menu on the left side where I can create new collections and cores.
> I think when I have only one node with no replication a collection maps to 
> one core, right?

What you are describing is one replica, or replicationFactor=1.  The
leader is also a replica.  A one-shard one-replica collection will
consist of a single core, but you do not need to worry about that core.

> Should I create first the core or the collection? 

You are only concerned with collections when running in cloud mode.  Let
Solr worry about the cores.

> What should I fill in as instanceDir?

Nothing.  You won't be using the part of the UI that asks about
instanceDir and dataDir.  That is only for non-cloud mode, and the
configuration must already exist in the core directory that you wish to
create.

> This is another problem I see: With my non-cloud core I have a conf-directory 
> where I have dataimport.xml, schema.xml and solrconfig.xml. 
> I think these 3 files are enough to import my data from my relational 
> database.
> Under example/cloud I could not find one of them. How to setup DIH for the 
> solrcould?

The entire configuration (what would normally be in the conf directory)
is in zookeeper when you're in cloud mode, not in the core directories. 
You must upload a directory containing the same files that would
normally be in a conf directory as a named configset to zookeeper before
you try to create your collection.  This is something that the "bin/solr
create" command does for you in cloud mode, typically using one of the
configsets included on the disk as a source.

https://cwiki.apache.org/confluence/display/solr/Using+ZooKeeper+to+Manage+Configuration+Files

Thanks,
Shawn



Re: Dynamic facets during runtime

2017-05-12 Thread Erik Hatcher
Use "appends" instead of "defaults". 

> On May 11, 2017, at 23:23, Jeyaprakash Singarayar  
> wrote:
> 
> Hi,
> 
> Our application has a facet select admin screen UI that would allow the
> users to add/update/delete the facets that have to be returned from Solr.
> 
> Right now we have the facet fields defined in the defaults of
> requestHandler.
> 
> So if a user wanted a new facet, I know sending that newly selected facet
> with the query would override the list in the solrconfig.xml
> 
> Is there any better way, rather than sending all the facets at
> query time?
> 
> Thanks,
> Jeyaprakash


setup solrcloud from scratch via web-ui

2017-05-12 Thread Thomas Porschberg
Hi,

I want to set up a solrcloud. I want to test sharding with one node, no
replication.
I have some experience with the non-cloud solr and I also run the cloud 
examples.
I also have to use the DIH for importing. I think I can live with the internal 
zookeeper.

I did my first steps with solr-6.5.1.

My first question is: Is it possible to set up a new solrcloud with the web-ui
only?

When I start solr with: 'bin/solr start -c'

I get a menu on the left side where I can create new collections and cores.
I think when I have only one node with no replication a collection maps to one 
core, right?

Should I create first the core or the collection? 
What should I fill in as instanceDir? 

For example: When I create at the command line 
a 'books/data' directory under '$HOME/solr-6.5.1/server/solr'
and then fill in 'books' as instanceDir and 'data' as data-Directory 
I get a 
'SolrCore Initialization Failures'

books: 
org.apache.solr.common.cloud.ZooKeeperException:org.apache.solr.common.cloud.ZooKeeperException:
 Could not find configName for collection books found:null


Is something like a step by step manual available? 
Next step would be to setup DIH again. 

This is another problem I see: With my non-cloud core I have a conf-directory 
where I have dataimport.xml, schema.xml and solrconfig.xml. 
I think these 3 files are enough to import my data from my relational database.
Under example/cloud I could not find one of them. How to setup DIH for the 
solrcould?

Best regards
Thomas


Re: Query regarding Solr Caches

2017-05-12 Thread Shawn Heisey
On 5/11/2017 4:58 PM, Suresh Pendap wrote:
> This question might have been asked on the solr user mailing list earlier. 
> Solr has four different types of Cache DocumentCache, QueryResultCache, 
> FieldValueCache and FilterQueryCache
> I would like to know which of these Caches are off heap cache?

None of them are off-heap.  For a while, there was a forked project
called Heliosearch that did have a bunch of off-heap memory structures,
but it is my understanding that Heliosearch is effectively dead now.

There are at least three issues for bringing this capability to Solr,
but none of them have been added yet.

https://issues.apache.org/jira/browse/SOLR-6638
https://issues.apache.org/jira/browse/SOLR-7211
https://issues.apache.org/jira/browse/SOLR-7210

> Which Caches have the maximum impact on the query latency and it is 
> recommended to configure that Cache?

This is so dependent on your exact index and your exact queries that
nobody can give you a reliable answer to that question.

The filterCache is pretty good at speeding things up when you have
filter queries that get reused frequently, but depending on how your
setup works, it might not provide the most impact.

> I also would like to know if there is any document which provides guidelines 
> on performing Capacity planning for a Solr cluster.

No.  It's not possible to provide general information.  See this:

https://lucidworks.com/sizing-hardware-in-the-abstract-why-we-dont-have-a-definitive-answer/

Thanks,
Shawn