Re: can we use Streaming Expressions for different collection

2016-01-04 Thread Joel Bernstein
Can you describe your use case in more detail?

In general Streaming Expressions can be used to combine data streams
(searches) from different collections. There is a limited set of Streaming
Expressions available in Solr 5, described at
https://cwiki.apache.org/confluence/display/solr/Streaming+Expressions.

Trunk has a much larger set of expressions, but they have not yet been
documented. They will be documented before the 6.0 release.

If you can describe your use case in detail I can let you know if there is
an expression in Solr 5 or trunk (Solr 6) that would fit.
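
For example, a merge of searches over two collections can be expressed along
these lines (collection and field names are placeholders; both streams need to
be sorted on the field named in the "on" clause):

merge(
  search(collectionA, q="*:*", fl="id,score", sort="id asc"),
  search(collectionB, q="*:*", fl="id,score", sort="id asc"),
  on="id asc")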

Joel Bernstein
http://joelsolr.blogspot.com/

On Mon, Jan 4, 2016 at 12:24 PM, Mugeesh Husain  wrote:

> I am reading this article:
> https://cwiki.apache.org/confluence/display/solr/Streaming+Expressions
>
> Can we implement a merge operation across different collections or different nodes
> in SolrCloud?
>
>
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/can-we-use-Streaming-Expressions-for-different-collection-tp4248461.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: Running Lucene/SOLR on Hadoop

2016-01-04 Thread Tim Williams
Apache Blur (Incubating) has several approaches (hive, spark, m/r)
that could probably help with this ranging from very experimental to
stable.  If you're interested, you can ask over on
blur-u...@incubator.apache.org ...

Thanks,
--tim

On Fri, Dec 25, 2015 at 4:28 AM, Dino Chopins  wrote:
> Hi Erick,
>
> Thank you for your response and pointer. What I mean by running Lucene/SOLR
> on Hadoop is to have the Lucene/SOLR index available to be queried from
> mapreduce, or via whatever best practice is recommended.
>
> I need to have this mechanism to do large scale row deduplication. Let me
> elaborate why I need this:
>
>1. I have two data sources with 35 and 40 million records of customer
>profile - the data come from two systems (SAP and MS CRM)
>2. Need to index and compare row by row of the two data sources using
>name, address, birth date, phone and email field. For birth date and email
>it will use exact comparison, but for the other fields will use
>probabilistic comparison. Btw, the data has been normalized before they are
>being indexed.
>3. Each finding will be categorized under same person, and will be
>deduplicated automatically or under user intervention depending on the
>score.
>
> I usually do this with a Lucene index on the local filesystem and term
> vectors, but since this will be a repeated task, and I have been challenged by
> management to do it on top of the Hadoop cluster, I need to have a framework
> or best practice for this.
>
> I understand that putting a Lucene index on HDFS is not very appropriate,
> since HDFS is designed for large block operations. With that understanding,
> I use SOLR and hope to query it using http calls from the mapreduce job.  The
> code snippet is below.
>
> // one HTTP GET per lookup against the Solr select URL
> URL url = new URL(SOLR-Query-URL);
>
> HttpURLConnection connection = (HttpURLConnection) url.openConnection();
> connection.setRequestMethod("GET");
>
> The latter method turns out to perform very badly. The simple mapreduce job
> that only reads the data sources and writes to hdfs takes 15 minutes, but
> once I add the http requests it has been running for three hours and is still going.
>
> What went wrong? And what would be the solution to my problem?
>
> Thanks,
>
> Dino
>
> On Mon, Dec 14, 2015 at 12:30 AM, Erick Erickson 
> wrote:
>
>> First, what do you mean "run Lucene/Solr on Hadoop"?
>>
>> You can use the HdfsDirectoryFactory to store Solr/Lucene
>> indexes on Hadoop, at that point the actual filesystem
>> that holds the index is transparent to the end user, you just
>> use Solr as you would if it was using indexes on the local
>> file system. See:
>> https://cwiki.apache.org/confluence/display/solr/Running+Solr+on+HDFS
>>
>> If you want to use Map-Reduce to _build_ indexes, see the
>> MapReduceIndexerTool in the Solr contrib area.
>>
>> Best,
>> Erick
>>
>
>
>
>
> --
> Regards,
>
> Dino


Re: shard lost - solr5.3

2016-01-04 Thread GOURAUD Emmanuel
hi there 

replying to myself 

i have set the replica property "preferredLeader" on this shard, shut down all 
replicas for this shard and started only the "preferred" one; this forced an 
election and saved my "ops" night and my new year party!! 
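
For reference, the property can be set through the Collections API with
something along these lines (collection/shard/replica names are placeholders):

http://localhost:8983/solr/admin/collections?action=ADDREPLICAPROP&collection=mycollection&shard=shard3&replica=core_node5&property=preferredLeader&property.value=true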

cheers, 

Emmanuel 



De: "GOURAUD Emmanuel"  
À: solr-user@lucene.apache.org 
Envoyé: Jeudi 31 Décembre 2015 15:30:42 
Objet: shard lost - solr5.3 

Hi there, 

I have a collection that is composed of 8 shards with a replicationFactor of 2 

i found 2 cores of the same shard in "recovery failed" status, so i decided to 
restart both, 

after having done that, i do not have any leader on that shard... and both 
cores are down 

is there a way to force a leader at startup or with the API? can i force an election? 

thanks for your help 

Emmanuel 



how to search millions of records in solr query

2016-01-04 Thread Mugeesh Husain
hi,

I have a requirement to search ID field values like Id:(2,3,6,7 up to
millions of IDs).
With which query parser should I write this so that the result is displayed
within 50 ms?


Please suggest which query parser I should use for the above search.




--
View this message in context: 
http://lucene.472066.n3.nabble.com/how-to-search-miilions-of-record-in-solr-query-tp4248360.html
Sent from the Solr - User mailing list archive at Nabble.com.


Query behavior difference.

2016-01-04 Thread Modassar Ather
Hi,

Kindly help me understand how relevance ranking will differ in the following
searches.

query : fl:network
query : fl:networ*

What I am observing is that the results returned are different, in that
the top documents returned for q=fl:network are not present in
the top results of q=fl:networ*.
For example, for q=fl:network I am getting top documents having around 20
occurrences of "network", whereas the top results of q=fl:networ* have only a
couple of occurrences of "network".
I am aware that the underlying normalization process participates in the
relevance ranking of documents, but I am not able to understand such a
difference in the ranking of results for these queries.

Thanks,
Modassar


Does soft commit re-open searchers on disk?

2016-01-04 Thread Gili Nachum
Hello,

When a new document is added, it becomes visible after a soft commit,
during which it is written to a Lucene RAMDirectory (in heap). Then after a
hard commit, the RAMDirectory is removed from memory and the docs are
written to the index on disk.
What happens if I hard commit (write to disk) with openSearcher=false?
Would I lose document visibility, since it's no longer in memory AND the
hard commit didn't open a new searcher on disk?

Does soft commit also re-open searchers over the index on disk?

Here's my commit configuration:

<autoCommit>
  <maxTime>60</maxTime>
  <openSearcher>false</openSearcher>
</autoCommit>

<autoSoftCommit>
  <maxTime>${solr.autoSoftCommit.maxTime:3}</maxTime>
</autoSoftCommit>

Thanks.


Re: how to search millions of records in solr query

2016-01-04 Thread Upayavira
This is not a use-case to which Lucene lends itself. However, if you
must, I would try the terms query parser, which I believe is used like
this:

{!terms f=id}2,3,6,7
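
As a complete query that would look something like this (assuming the unique
key field is called id):

q={!terms f=id}2,3,6,7&fl=id

For very long ID lists it is usually better to send the request as a POST so
the list doesn't run into URL length limits.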

Upayavira

On Mon, Jan 4, 2016, at 10:41 AM, Mugeesh Husain wrote:
> hi,
> 
> I have a requirement to search ID field values like Id:(2,3,6,7 upto
> millions),
> in which query parser i should to write the result should be display
> within
> a 50 ms.
> 
> 
> Please suggest me which query parser i should use for above search.
> 
> 
> 
> 
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/how-to-search-miilions-of-record-in-solr-query-tp4248360.html
> Sent from the Solr - User mailing list archive at Nabble.com.


Re: Query behavior difference.

2016-01-04 Thread Ahmet Arslan
Hi,

I think wildcard queries such as fl:networ* are re-written into a Constant Score Query.
fl=*,score should return the same score for all documents that are retrieved.

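
For example, something like this (the collection name is just a placeholder)
should show the rewritten query and the constant scores:

http://localhost:8983/solr/collection1/select?q=fl:networ*&fl=*,score&debugQuery=true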
Ahmet



On Monday, January 4, 2016 12:22 PM, Modassar Ather  
wrote:
Hi,

Kindly help me understand how will relevance ranking differ int following
searches.

query : fl:network
query : fl:networ*

What I am observing that the results returned are different in both of them
in a way that the top documents returned for q=fl:network is not present in
the top results of q=fl:networ*.
For example for q=fl:network I am getting top documents having around 20
occurrence of network whereas the top result of q=fl:networ* has only
couple of occurrence of network.
I am aware of the underlying normalization process participation in
relevance ranking of documents but not able to understand such a difference
in the ranking of result for the queries.

Thanks,
Modassar


Re: Custom auth plugin not loaded in SolrCloud

2016-01-04 Thread tine-2
Hi,

is there any news on this? Was anyone able to get it to work?

Cheers,

tine



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Custom-auth-plugin-not-loaded-in-SolrCloud-tp4245670p4248340.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Memory Usage increases by a lot during and after optimization .

2016-01-04 Thread Toke Eskildsen
On Mon, 2016-01-04 at 10:05 +0800, Zheng Lin Edwin Yeo wrote:
> A) Before I start the optimization, the server's memory usage
> is consistent at around 16GB, when Solr starts up and we did some searching.

How do you read this number?

> However, when I click on the optimization button, the memory usage
> increases gradually, until it reaches the maximum of 64GB which the server
> has.

There are multiple ways of looking at memory. The most relevant ones in
this context are

- Total memory on the system
  This appears to be 64GB.

- Free memory on the system
  Usually determined by 'top' under Linux or Task Manager under Windows.

- Memory used for caching on the system
  Usually determined by 'top' under Linux or Task Manager under Windows.

- JVM memory usage
  Usually determined by 'top' under Linux or Task Manager under Windows.
  Look for "Res" (resident) for the task in Linux. It might be called 
  "physical" under Windows.


- Maximum JVM heap (Xmx)
  Lightest grey in "JVM-Memory" in the Solr Admin interface Dashboard.

- Allocated JVM heap
  Medium grey in "JVM-Memory" in the Solr Admin interface Dashboard.

- Active JVM heap
  Dark grey in "JVM-Memory" in the Solr Admin interface Dashboard.


I am guessing that the number you are talking about is "Free memory on
the system" and as Shawn and Erick points out, a full allocation there
is expected behaviour.

What we are interested in are the JVM heap numbers.

- Toke Eskildsen, State and University Library, Denmark




Hard commits, soft commits and transaction logs

2016-01-04 Thread Clemens Wyss DEV
[Happy New Year to all]

Is all herein
https://lucidworks.com/blog/2013/08/23/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/
mentioned/recommended still valid for Solr 5.x?

- Clemens


Re: Memory Usage increases by a lot during and after optimization .

2016-01-04 Thread Shawn Heisey
On 1/3/2016 7:05 PM, Zheng Lin Edwin Yeo wrote:
> A) Before I start the optimization, the server's memory usage
> is consistent at around 16GB, when Solr starts up and we did some searching.
> However, when I click on the optimization button, the memory usage
> increases gradually, until it reaches the maximum of 64GB which the server
> has. But this only happens to the collection with index of 200GB, and not
> other collections which has smaller index size (they are at most 1GB at the
> moment).



> A) I am quite curious at this also, because in the Task Manager of the
> server, the amount of memory usage stated does not tally with the
> percentage of memory usage. When I start optimization, the memory usage
> states the JVM is only using 14GB, but the percentage of memory usage is
> almost 100%, when I have 64GB RAM. I have checked the other processes running
> in the server, and did not find any other processes that take up a large
> amount of memory, and the total amount of memory usage for the whole server
> is only around 16GB.

Toke's reply is spot on.

In your first answer above, you didn't really answer my question, which
was "What *exactly* are you looking at that says Solr is using all your
memory?"  You've said "the server's memory usage" but haven't described
how you got that number.

Here's a screenshot of "top" on one of my Solr servers, with the list
sorted by memory usage:

https://www.dropbox.com/s/i49s2uyfetwo3xq/solr-mem-prod-8g-heap.png?dl=0

This machine has 165GB (base 2 number) of index data on it, and 64GB of
memory.  Solr has been assigned an 8GB heap.  Here's more specific info
about the size of the index data:

root@idxb3:/index/solr5/data# du -hs data
165G    data
root@idxb3:/index/solr5/data# du -s data
172926520   data

You can see that the VIRT memory size of the Solr process is
approximately the same as the total index size (165GB) plus the max heap
(8GB), which adds up to 173GB.  The RES memory size of the java process
is 8.3GB -- just a little bit larger than the max heap.

At the OS level, my server shows 46GB used out of 64GB total ... which
probably seems excessive, until you consider the 36 million kilobytes in
the "cached" statistic.  This is the amount of memory being used for the
page cache.   If you subtract that memory, then you can see that this
server has only allocated about 10GB of RAM total -- exactly what I
would expect for a Linux machine dedicated to Solr with the max heap at 8GB.

Although my server is indicating about 18GB of memory free, I have seen
perfectly functioning servers with that number very close to zero.  It
is completely normal for the "free" memory statistic on Linux and
Windows to show a few megabytes or less, especially when you optimize a
Solr index, which reads (and writes) all of the index data, and will
fill up the page cache.

So, I will ask something very similar to my initial question.  Where
*exactly* are you looking to see the memory usage that you believe is a
problem?  A screenshot would be very helpful.

Here's a screenshot from my Windows client.  This machine is NOT running
Solr, but the situation with free and cached memory is similar.

https://www.dropbox.com/s/wex1gbj7e45g8ed/windows7-mem-usage.png?dl=0

I am not doing anything particularly unusual with this machine, but it
says there is *zero* free memory, out of 16GB total.  There is 9GB of
memory in the page cache, though -- memory that the OS will instantly
give up if any program requests it, which you can see because the
"available" stat is also about 9GB.  This Windows machine is doing
perfectly fine as far as memory.

Thanks,
Shawn



Solr suggest, auto complete & spellcheck

2016-01-04 Thread Steven White
Hi,

I'm trying to understand what the differences are between Solr suggest,
auto complete & spellcheck.  Isn't each a function of the UI?  If not, can
you provide me with links that show end-to-end examples of setting up Solr to
get all of the 3 features?

I'm on Solr 5.2.

Thanks

Steve


Re: Facet shows deleted values...

2016-01-04 Thread Don Bosco Durai
Tomás, thanks for the suggestion. facet.mincount will solve my issue.


Erick, I am using SolrCloud with solrconfig.xml configured with autoCommit. And 
I also read somewhere that an explicit commit is not recommended in SolrCloud 
mode. Regarding autowarming, my server has been running for a while.

Lost my env during the holidays. I will rebuild it and monitor this further. I 
will also try an explicit commit() to see if that helps.

Thanks

Bosco





On 12/29/15, 5:48 PM, "Tomás Fernández Löbbe"  wrote:

>I believe the problem here is that terms from the deleted docs still appear
>in the facets, even with a doc count of 0, is that it? Can you use
>facet.mincount=1 or would that not be a good fit for your use case?
>
>https://cwiki.apache.org/confluence/display/solr/Faceting#Faceting-Thefacet.mincountParameter
>
>Tomás
>
>On Tue, Dec 29, 2015 at 5:23 PM, Erick Erickson 
>wrote:
>
>> Let's be sure we're using terms similarly
>>
>> That article is from 2010, so is unreliable in the 5.2 world, I'd ignore
>> that.
>>
>> First, facets should always reflect the latest commit, regardless of
>> expungeDeletes or optimizes/forcemerges.
>>
>> _commits_ are definitely recommended. Optimize/forcemerge (or
>> expungedeletes) are rarely necessary and
>> should _not_ be necessary for facets to not count omitted documents.
>>
>> Is it possible that your autowarm period is long and you're still
>> getting an old searcher when you run your tests?
>>
>> Assuming that you commit(), then wait a few minutes, do you see
>> inaccurate facets? If so, what are the
>> exact steps you follow?
>>
>> Best,
>> Erick
>>
>> On Tue, Dec 29, 2015 at 12:54 PM, Don Bosco Durai 
>> wrote:
>> > I am purging some of my data on regular basis, but when I run a facet
>> query, the deleted values are still shown in the facet list.
>> >
>> > Seems, commit with expunge resolves this issue (
>> http://grokbase.com/t/lucene/solr-user/106313v302/deleted-documents-appearing-in-facet-fields
>> ). But it seems, commit is no more recommended. Also, I am running Solr 5.2
>> in SolrCloud mode.
>> >
>> > What is the recommendation here?
>> >
>> > Thanks
>> >
>> > Bosco
>> >
>> >
>>



Re: how to search millions of records in solr query

2016-01-04 Thread Upayavira
Yes, because only a small portion of that 250ms is spent in the query
parser. Most of it, I would suggest, is spent retrieving and merging
postings lists.

In an inverted index (which Lucene is), you store the list of documents
matching a term against that term - that is your postings list.

When you search against multiple terms, Lucene needs to merge those into
a definitive list of matching documents, and for large numbers of terms,
that can be costly.

Upayavira

On Mon, Jan 4, 2016, at 04:29 PM, Erick Erickson wrote:
> Best of luck with that ;). 250ms isn't bad at all for "searching
> millions of IDs".
> Frankly, I'm not at all sure where I'd even start. With millions of
> search
> terms, I'd have to profile the application to see where it was spending
> the
> time before even starting.
> 
> Best,
> Erick
> 
> On Mon, Jan 4, 2016 at 5:03 AM, Mugeesh Husain  wrote:
> >>>This is not a use-case to which Lucene lends itself. However, if you
> >>>must, I would try the terms query parser, which I believe is used like
> >>>this: {!terms f=id}2,3,6,7
> >
> > I did try terms query parser like above, but the problem is performance, i
> > am getting result 250ms but i am looking for a parser which give result
> > within 50ms.
> >
> > I am also looking for custom query parser but i dont know which way i should
> > used that.
> >
> >
> >
> > --
> > View this message in context: 
> > http://lucene.472066.n3.nabble.com/how-to-search-miilions-of-record-in-solr-query-tp4248360p4248388.html
> > Sent from the Solr - User mailing list archive at Nabble.com.


Field Size per document in Solr

2016-01-04 Thread KNitin
Hi,

 I want to get the size of individual fields per document (or per index) in
solrcloud. Is there a way to do this using the existing solr or lucene APIs?

*Use case*: I have a few dynamic fields which may or may not be populated
everyday depending on certain conditions. I also do faceting and some
custom processing on these fields (using custom solr components). I want to
be able to plot the per field size of an index in realtime so that I can
try to identify the trend between fields & latencies.

Thanks a lot in advance!
Nitin


Re: Field Size per document in Solr

2016-01-04 Thread Upayavira

Solr does store the term positions, but you won't find it easy to
extract them, as they are stored against terms not fields.

Your best bet is to index field lengths into Solr alongside the field
values. You could use an UpdateProcessor to do this if you want to do it
in Solr.
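
As a rough sketch of that idea (the class and field names here are invented
for illustration), an update processor could store a length field next to the
original value:

import java.io.IOException;
import org.apache.solr.common.SolrInputDocument;
import org.apache.solr.update.AddUpdateCommand;
import org.apache.solr.update.processor.UpdateRequestProcessor;

public class FieldLengthUpdateProcessor extends UpdateRequestProcessor {
  public FieldLengthUpdateProcessor(UpdateRequestProcessor next) {
    super(next);
  }

  @Override
  public void processAdd(AddUpdateCommand cmd) throws IOException {
    SolrInputDocument doc = cmd.getSolrInputDocument();
    Object value = doc.getFieldValue("description");
    if (value != null) {
      // record the character length of the field alongside the value
      doc.setField("description_length", value.toString().length());
    }
    super.processAdd(cmd);
  }
}

You would also need a matching UpdateRequestProcessorFactory and register it in
an updateRequestProcessorChain in solrconfig.xml.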

Upayavira 

On Tue, Jan 5, 2016, at 12:39 AM, KNitin wrote:
> Hi,
> 
>  I want to get the size of individual fields per document (or per index)
>  in
> solrcloud. Is there a way to do this using exiting solr or lucene api?
> 
> *Use case*: I have a few dynamic fields which may or may not be populated
> everyday depending on certain conditions. I also do faceting and some
> custom processing on these fields (using custom solr components). I want
> to
> be able to plot the per field size of an index in realtime so that I can
> try to identify the trend between fields & latencies.
> 
> Thanks a lot in advance!
> Nitin


Re: Solr suggest, auto complete & spellcheck

2016-01-04 Thread Erick Erickson
Here's a writeup on suggester:
https://lucidworks.com/blog/2015/03/04/solr-suggester/

The biggest difference is that spellcheck returns individual _terms_
whereas suggesters can return entire fields.

Neither are "a function of the UI" any more than searching is a
function of the UI. In both cases you have to do something
user-friendly with the return.
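
If it helps, a minimal suggester setup in solrconfig.xml looks roughly like
this (the field and analyzer type names are only illustrative; the blog post
above walks through it properly):

<searchComponent name="suggest" class="solr.SuggestComponent">
  <lst name="suggester">
    <str name="name">mySuggester</str>
    <str name="lookupImpl">FuzzyLookupFactory</str>
    <str name="dictionaryImpl">DocumentDictionaryFactory</str>
    <str name="field">title</str>
    <str name="suggestAnalyzerFieldType">text_general</str>
  </lst>
</searchComponent>

<requestHandler name="/suggest" class="solr.SearchHandler" startup="lazy">
  <lst name="defaults">
    <str name="suggest">true</str>
    <str name="suggest.count">10</str>
    <str name="suggest.dictionary">mySuggester</str>
  </lst>
  <arr name="components">
    <str>suggest</str>
  </arr>
</requestHandler>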

Best,
Erick

On Mon, Jan 4, 2016 at 2:06 PM, Steven White  wrote:
> Hi,
>
> I'm trying to understand what are the differences between Solr suggest,
> auto complete & spellcheck?  Isn't each a function of the UI?  If not, can
> you provide me with links that show end-to-end example setting up Solr to
> get all of the 3 features?
>
> I'm on Solr 5.2.
>
> Thanks
>
> Steve


Re: Facet shows deleted values...

2016-01-04 Thread Shawn Heisey
On 1/4/2016 4:11 PM, Don Bosco Durai wrote:
> Erick, I am using SolrCloud with solrconfig.xml configured with autoCommit. 
> And I also read somewhere that explicit commit is not recommended in 
> SolrCloud mode. Regarding auto warm, my server has/was been running for a 
> while.

Since 4.0, autoCommit with openSearcher set to false is highly
recommended, no matter what your needs are regarding visibility, and
whether or not you're running in cloud mode.  The exact interval to use
is a subject for vigorous debate.  A common maxTime value that you will
see for autoCommit is 15 seconds (15000).  I personally feel this is too
frequent, but many people use that value with no problems.  I use five
minutes (300000) in my own config, but over the course of those five
minutes, there's not much in the way of updates, so the log replay will
take very little time.  Using autoCommit with openSearcher set to false
takes care of transaction log rotation, it doesn't do ANYTHING for
document visibility.

The issue of how to handle document visibility will depend on exactly
how you use your index.  Do not worry about whether the index is
SolrCloud or not for this topic.

One way of handling document visibility is to use autoSoftCommit
(available since 4.0) in your config ... with maxTime set to the longest
possible interval you can stand.  My personal recommendation is to never
set that interval shorter than one minute (60000).  Push back if you are
told that documents must be visible faster than that.  If you use
autoSoftCommit, you won't need explicit commits from your indexing
application.
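
As a concrete sketch, solrconfig.xml with the intervals I describe above would
look something like this (tune the numbers to your own needs):

<autoCommit>
  <maxTime>300000</maxTime>        <!-- hard commit every five minutes -->
  <openSearcher>false</openSearcher>
</autoCommit>

<autoSoftCommit>
  <maxTime>60000</maxTime>         <!-- new documents visible within one minute -->
</autoSoftCommit>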

Another way to handle document visibility is the commitWithin parameter
on each update request.  This is similar to autoSoftCommit, but gets set
on the update request.  Just like autoSoftCommit, I would not recommend
a value less than one minute, and if this parameter is used on all
updates, you will never need an explicit commit.
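
With SolrJ the commitWithin value can be passed on the add call itself; a
minimal sketch (the URL, field names and 60-second window are only
illustrative, and exception handling is omitted):

import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.common.SolrInputDocument;

HttpSolrClient client = new HttpSolrClient("http://localhost:8983/solr/mycollection");
SolrInputDocument doc = new SolrInputDocument();
doc.addField("id", "42");
client.add(doc, 60000);   // commitWithin: changes become visible within ~60 seconds
client.close();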

Using autoSoftCommit or commitWithin is a good option if there are many
clients/threads sending changes to the same index or the indexing
happens in bursts where the update size is wildly different and
completely unpredictable.

The final way to handle document visibility is explicit commits.  When
you want changes to be visible, you send a commit, hard or soft, with
openSearcher set to true (this is the default for this parameter), and a
short time later, all changes sent before that commit will become
visible.  This is how I handle my own index.  This is a good option if
all indexing is coming from a single source and that source has complete
control over all indexing operations.

One of the strong goals with commits is to avoid them happening too
frequently, so they don't overlap, and so the machine is spending less
time handling commits than it spends either idle or handling queries.

Here's a blog post with more detail.  The blog post says "SolrCloud" but
almost all of it is equally applicable to Solr 4.x and 5.x indexes that
are not running in cloud mode:

http://lucidworks.com/blog/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/

Thanks,
Shawn



Re: Facet shows deleted values...

2016-01-04 Thread Erick Erickson
bq:  And I also read somewhere that explicit commit is not recommended
in SolrCloud mode

Not quite, it's just easy to have too many commits happen too frequently
from multiple indexing clients. It's also rare that the benefits of the
clients issuing commits outweigh the chance of getting it wrong. It's not
so much that it's not recommended as that it's usually not necessary at
all and easy to get wrong.

Best,
Erick



On Mon, Jan 4, 2016 at 5:15 PM, Shawn Heisey  wrote:
> On 1/4/2016 4:11 PM, Don Bosco Durai wrote:
>> Erick, I am using SolrCloud with solrconfig.xml configured with autoCommit. 
>> And I also read somewhere that explicit commit is not recommended in 
>> SolrCloud mode. Regarding auto warm, my server has/was been running for a 
>> while.
>
> Since 4.0, autoCommit with openSearcher set to false is highly
> recommended, no matter what your needs are regarding visibility, and
> whether or not you're running in cloud mode.  The exact interval to use
> is a subject for vigorous debate.  A common maxTime value that you will
> see for autoCommit is 15 seconds (15000).  I personally feel this is too
> frequent, but many people use that value with no problems.  I use five
> minutes (30) in my own config, but over the course of those five
> minutes, there's not much in the way of updates, so the log replay will
> take very little time.  Using autoCommit with openSearcher set to false
> takes care of transaction log rotation, it doesn't do ANYTHING for
> document visibility.
>
> The issue of how to handle document visibility will depend on exactly
> how you use your index.  Do not worry about whether the index is
> SolrCloud or not for this topic.
>
> One way of handling document visibility is to use autoSoftCommit
> (available since 4.0) in your config ... with maxTime set to the longest
> possible interval you can stand.  My personal recommendation is to never
> set that interval shorter than one minute (60000).  Push back if you are
> told that documents must be visible faster than that.  If you use
> autoSoftCommit, you won't need explicit commits from your indexing
> application.
>
> Another way to handle document visibility is the commitWithin parameter
> on each update request.  This is similar to autoSoftCommit, but gets set
> on the update request.  Just like autoSoftCommit, I would not recommend
> a value less than one minute, and if this parameter is used on all
> updates, you will never need an explicit commit.
>
> Using autoSoftCommit or commitWithin is a good option if there are many
> clients/threads sending changes to the same index or the indexing
> happens in bursts where the update size is wildly different and
> completely unpredictable.
>
> The final way to handle document visibility is explicit commits.  When
> you want changes to be visible, you send a commit, hard or soft, with
> openSearcher set to true (this is the default for this parameter), and a
> short time later, all changes sent before that commit will become
> visible.  This is how I handle my own index.  This is a good option if
> all indexing is coming from a single source and that source has complete
> control over all indexing operations.
>
> One of the strong goals with commits is to avoid them happening too
> frequently, so they don't overlap, and so the machine is spending less
> time handling commits than it spends either idle or handling queries.
>
> Here's a blog post with more detail.  The blog post says "SolrCloud" but
> almost all of it is equally applicable to Solr 4.x and 5.x indexes that
> are not running in cloud mode:
>
> http://lucidworks.com/blog/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/
>
> Thanks,
> Shawn
>


Re: apply document filter to solr index

2016-01-04 Thread Alexandre Rafalovitch
Well, you have a crawling and extraction pipeline. You can probably inject
a classification algorithm somewhere in there, possibly NLP trained on a
manual seed, or just a list of typical words as a start.

This is kind of a pre-Solr stage though.

Regards,
Alex
On 4 Jan 2016 7:37 pm,  wrote:

> Hi everyone, I'm working on a search engine based on solr which indexes
> documents from a large variety of websites.
> The engine is focused on cook recipes. However, one problem is that these
> websites provide not only content related to cooking recipes but also
> content related to: fashion, travel, politics, liberty rights etc etc which
> are not what the user expects to find on a cooking recipes dedicated search
> engine.
> Is there any way to filter out content which is not related to the core
> business of the search engine?
> Something like parental control software maybe?
> Kind regards,Christian Christian Fotache Tel: 0728.297.207 Fax:
> 0351.411.570


[Manual Sharding] Solr distrib search cause thread exhaustion

2016-01-04 Thread Alessandro Benedetti
Hi guys,
this is the scenario we are studying :

Solr 4.10.2
16 shards, a solr instance aggregating the results running a distrib query
with shards=... (all the shards).

Currently we are not using shards.tolerant=true, so we throw an exception
on error.

We are in a situation where a shard is too slow to respond (empty filter
cache, big load).
According to the timeout the shard handler is expecting, that shard is
not fast enough, and for this reason the whole request fails.

So far, everything is clear.
We need to improve the speed of the shards, managing properly the auto
warming , load balancing etc .
We can play with the tolerant factor, and possibly be tolerant of errors.

But what happens is that the solr aggregator which runs the queries against
the shards is exhausting its threads...
Looking into the code, in the case we are not tolerant we get this :

// Was there an exception?
if (srsp.getException() != null) {
  // If things are not tolerant, abort everything and rethrow
  if (!tolerant) {
    shardHandler1.cancelAll();
    if (srsp.getException() instanceof SolrException) {
      throw (SolrException) srsp.getException();
    } else {
      throw new SolrException(SolrException.ErrorCode.SERVER_ERROR,
          srsp.getException());
    }


I would assume that is responsible for the thread cleanup.
Any idea why the thread cleanup might not happen properly?
Could it be some jetty misconfiguration?

Cheers
--

Benedetti Alessandro
Visiting card : http://about.me/alessandro_benedetti

"Tyger, tyger burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry?"

William Blake - Songs of Experience -1794 England


Re: apply document filter to solr index

2016-01-04 Thread Binoy Dalal
There is no way that you can do that in solr.

You'll have to write something at the app level,  where you're crawling
your docs or write a custom update handler that will preprocess the crawled
docs and throw out the irrelevant ones.

One way you can do that is look at the doc title and the url for certain
keywords that might tell you that the particular article belongs to the
fashion domain etc.
If the content is well structured then you might also have certain fields
in the raw crawled doc that tell you the doc category.
To look at the raw crawled doc you can use the
DocumentAnalysisRequestHandler.
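
As a very rough sketch of that idea (class, field and keyword names are made
up for illustration), an update request processor could drop documents that
don't look recipe-related before they are indexed:

import java.io.IOException;
import org.apache.solr.common.SolrInputDocument;
import org.apache.solr.update.AddUpdateCommand;
import org.apache.solr.update.processor.UpdateRequestProcessor;

public class RecipeFilterUpdateProcessor extends UpdateRequestProcessor {
  private static final String[] KEYWORDS = {"recipe", "ingredient", "cook", "bake"};

  public RecipeFilterUpdateProcessor(UpdateRequestProcessor next) {
    super(next);
  }

  @Override
  public void processAdd(AddUpdateCommand cmd) throws IOException {
    SolrInputDocument doc = cmd.getSolrInputDocument();
    String text = (doc.getFieldValue("title") + " " + doc.getFieldValue("url")).toLowerCase();
    for (String keyword : KEYWORDS) {
      if (text.contains(keyword)) {
        super.processAdd(cmd);   // looks like a recipe page: index it
        return;
      }
    }
    // no keyword matched: silently skip the document
  }
}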

On Mon, 4 Jan 2016, 18:07   wrote:

> Hi everyone, I'm working on a search engine based on solr which indexes
> documents from a large variety of websites.
> The engine is focused on cook recipes. However, one problem is that these
> websites provide not only content related to cooking recipes but also
> content related to: fashion, travel, politics, liberty rights etc etc which
> are not what the user expects to find on a cooking recipes dedicated search
> engine.
> Is there any way to filter out content which is not related to the core
> business of the search engine?
> Something like parental control software maybe?
> Kind regards,Christian Christian Fotache Tel: 0728.297.207 Fax:
> 0351.411.570

-- 
Regards,
Binoy Dalal


Re: Does soft commit re-open searchers on disk?

2016-01-04 Thread Daniel Collins
If you have already done a soft commit and that opened a new searcher, then
the document will be visible from that point on.  The results returned by
that searcher cannot be changed by the hard commit (whatever that is doing
under the hood, the segment that contains that document must still be visible
to the searcher).  I don't know exactly how the soft commit stores its
segment, but there must be some kind of reference counting like there is
for disk segments since the searcher has that "segment" open (regardless of
whether that segment is in RAM or on disk).

On 4 January 2016 at 14:05, Emir Arnautovic 
wrote:

> Hi Gili,
> Visibility is related to searcher - if you reopen searcher it will be
> visible. If hard commit happens without reopening searcher, documents will
> not be visible till next soft commit happens.
> You can find more details about commits on
> https://lucidworks.com/blog/2013/08/23/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/
>
> HTH,
> Emir
>
>
> On 04.01.2016 11:14, Gili Nachum wrote:
>
>> Hello,
>>
>> When a new document is added, it becomes visible after a soft commit,
>> during which it is written to a Lucene RAMDirectory (in heap). Then after
>> a
>> hard commit, the RAMDirectory is removed from memory and the docs are
>> written to the index on disk.
>> What happens if I hard commit (write to disk) with openSearcher=false.
>> Would I lose document visibility? since it's no longer in memory AND the
>> hard commit didn't open a new searcher on disk?
>>
>> Does soft commit also re-opens Searchers over the index on disk?
>>
>> Here's my commit configuration:
>>
>> <autoCommit>
>>   <maxTime>60</maxTime>
>>   <openSearcher>false</openSearcher>
>> </autoCommit>
>>
>> <autoSoftCommit>
>>   <maxTime>${solr.autoSoftCommit.maxTime:3}</maxTime>
>> </autoSoftCommit>
>>
>> Thanks.
>>
>>
> --
> Monitoring * Alerting * Anomaly Detection * Centralized Log Management
> Solr & Elasticsearch Support * http://sematext.com/
>
>


MapReduceIndexerTool Indexing

2016-01-04 Thread vidya
Hi

I have used MapReduceIndexerTool to index data in my HDFS to Solr in order to
search it. I want to know whether it re-indexes the entire data set when some
new data is added to that path and the tool is run on it again.

Thanks in advance



--
View this message in context: 
http://lucene.472066.n3.nabble.com/MapReduceIndexerTool-Indexing-tp4248387.html
Sent from the Solr - User mailing list archive at Nabble.com.


apply document filter to solr index

2016-01-04 Thread liviuchristian
Hi everyone, I'm working on a search engine based on solr which indexes 
documents from a large variety of websites. 
The engine is focused on cooking recipes. However, one problem is that these 
websites provide not only content related to cooking recipes but also content 
related to: fashion, travel, politics, liberty rights etc etc which are not 
what the user expects to find on a cooking recipes dedicated search engine. 
Is there any way to filter out content which is not related to the core 
business of the search engine?
Something like parental control software maybe?
Kind regards, Christian

Christian Fotache Tel: 0728.297.207 Fax: 0351.411.570

Re: how to search millions of records in solr query

2016-01-04 Thread Mugeesh Husain
>>This is not a use-case to which Lucene lends itself. However, if you 
>>must, I would try the terms query parser, which I believe is used like 
>>this: {!terms f=id}2,3,6,7 

I did try the terms query parser like above, but the problem is performance; I
am getting results in 250ms, but I am looking for a parser which gives results
within 50ms.

I am also considering a custom query parser, but I don't know how I should
go about that.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/how-to-search-miilions-of-record-in-solr-query-tp4248360p4248388.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Does soft commit re-open searchers on disk?

2016-01-04 Thread Emir Arnautovic

Hi Gili,
Visibility is related to the searcher - if you reopen the searcher it will be 
visible. If a hard commit happens without reopening the searcher, documents 
will not be visible until the next soft commit happens.
You can find more details about commits on 
https://lucidworks.com/blog/2013/08/23/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/


HTH,
Emir

On 04.01.2016 11:14, Gili Nachum wrote:

Hello,

When a new document is added, it becomes visible after a soft commit,
during which it is written to a Lucene RAMDirectory (in heap). Then after a
hard commit, the RAMDirectory is removed from memory and the docs are
written to the index on disk.
What happens if I hard commit (write to disk) with openSearcher=false.
Would I lose document visibility? since it's no longer in memory AND the
hard commit didn't open a new searcher on disk?

Does soft commit also re-opens Searchers over the index on disk?

Here's my commit configuration:

<autoCommit>
  <maxTime>60</maxTime>
  <openSearcher>false</openSearcher>
</autoCommit>

<autoSoftCommit>
  <maxTime>${solr.autoSoftCommit.maxTime:3}</maxTime>
</autoSoftCommit>

Thanks.



--
Monitoring * Alerting * Anomaly Detection * Centralized Log Management
Solr & Elasticsearch Support * http://sematext.com/



Querying with action parameter included in URL

2016-01-04 Thread vidya
Hi
 
I am pretty new to solr and when I was going through the tutorials, I came
across URLs for querying like
"http://localhost:8983/solr/admin/configs?action=CREATE&name=booksConfig&baseConfigSet=genericTemplate".
I wanted to know how to implement the same by making changes in schema.xml or
solrconfig.xml. Where should I make changes when an "action=" is specified?

Thanks in advance



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Querying-with-action-parameter-included-in-URL-tp4248576.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: MapReduceIndexerTool Indexing

2016-01-04 Thread vidya
Hi

I would like to index only new data, not already-indexed data (delta
indexing). How can I achieve that using MRIT?

Thanks in advance



--
View this message in context: 
http://lucene.472066.n3.nabble.com/MapReduceIndexerTool-Indexing-tp4248387p4248573.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Querying with action parameter included in URL

2016-01-04 Thread Binoy Dalal
I think that all this will do is create a config file with the name
booksConfig based on a template. This and other calls like these are solr's
core admin api calls that you make through http requests.
You don't need to make any changes to your schema or solrconfig files in
order to execute such calls.

On Tue, 5 Jan 2016, 11:57 vidya  wrote:

> Hi
>
> I am pretty new to solr and when i am going through the tutorials , I came
> across urls for querying like
> "
> http://localhost:8983/solr/admin/configs?action=CREATE&name=booksConfig&baseConfigSet=genericTemplate
> "
> .
> I wanted to know how to implement the same by doing changes in schema.xml
> or
> solrconfig.xml. Where should i make changes when an "action=" is specified.
>
> Thanks in advance
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Querying-with-action-parameter-included-in-URL-tp4248576.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
-- 
Regards,
Binoy Dalal


Re: Querying with action parameter included in URL

2016-01-04 Thread davidphilip cherian
Hi Vidya,

I think you are confusing solr search queries/requests with solr's other
RESTful APIs that perform CRUD operations on collections.

Samples of search queries are listed here, using the standard query parser:
https://cwiki.apache.org/confluence/display/solr/The+Standard+Query+Parser

Solr collection APIs to perform CRUD operations on collections:
https://cwiki.apache.org/confluence/display/solr/Collections+API



On Tue, Jan 5, 2016 at 12:37 PM, Binoy Dalal  wrote:

> I think that all this will do is create a config file with the name
> booksConfig based on a template. This and other calls like these are solr's
> core admin api calls that you make through http requests.
> You don't need to make any changes to your schema or solrconfig files in
> order to execute such calls.
>
> On Tue, 5 Jan 2016, 11:57 vidya  wrote:
>
> > Hi
> >
> > I am pretty new to solr and when i am going through the tutorials , I
> came
> > across urls for querying like
> > "
> >
> http://localhost:8983/solr/admin/configs?action=CREATE&name=booksConfig&baseConfigSet=genericTemplate
> > "
> > .
> > I wanted to know how to implement the same by doing changes in schema.xml
> > or
> > solrconfig.xml. Where should i make changes when an "action=" is
> specified.
> >
> > Thanks in advance
> >
> >
> >
> > --
> > View this message in context:
> >
> http://lucene.472066.n3.nabble.com/Querying-with-action-parameter-included-in-URL-tp4248576.html
> > Sent from the Solr - User mailing list archive at Nabble.com.
> >
> --
> Regards,
> Binoy Dalal
>


Re: Multiple solr instances on one server

2016-01-04 Thread Jack Krupansky
See the Solr Reference Guide:

"
-s <dir>

Sets the solr.solr.home system property; Solr will create core directories
under this directory. This allows you to run multiple Solr instances on the
same host while reusing the same server directory set using the -d
parameter. If set, the specified directory should contain a solr.xml file,
unless solr.xml exists in ZooKeeper. The default value is server/solr.
"
https://cwiki.apache.org/confluence/display/solr/Solr+Start+Script+Reference
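
So, for example, two instances sharing the same install but with separate Solr
home directories could be started along these lines (ports and paths are
illustrative):

bin/solr start -p 8983 -s /var/solr/node1
bin/solr start -p 8984 -s /var/solr/node2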



-- Jack Krupansky

On Mon, Jan 4, 2016 at 10:28 AM, Mugeesh Husain  wrote:

> you could start solr with multiple port like below
>
>
> bin/solr start -p 8983 one instance
> bin/solr start -p 8984 second instance and so its depend on you
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Multiple-solr-instances-on-one-server-tp4248411p4248413.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Multiple solr instances on one server

2016-01-04 Thread philippa griggs
Hello,


(Solr 5.2.1)


I'm wanting to run multiple solr instances on one server. Does anyone know which 
is better - allowing each solr instance to use its own internal jetty, or 
installing jetty on the server?


Many thanks


Philippa


Re: Multiple solr instances on one server

2016-01-04 Thread philippa griggs
Hello,

Thanks for your reply.  Do you know if there are many disadvantages to running 
multiple solr instances, each running its own internal jetty? I'm trying to 
work out if this would work or if I would need to install jetty myself on the 
machine and use that instead. I'm not sure how many solr instances I would need 
to run yet; it could be as high as 10.

From: Mugeesh Husain 
Sent: 04 January 2016 15:28
To: solr-user@lucene.apache.org
Subject: Re: Multiple solr instances on one server

you could start solr with multiple port like below


bin/solr start -p 8983 one instance
bin/solr start -p 8984 second instance and so its depend on you



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Multiple-solr-instances-on-one-server-tp4248411p4248413.html
Sent from the Solr - User mailing list archive at Nabble.com.


Query regarding AnalyzingInfixLookupFactory

2016-01-04 Thread radhika tayal
Hi,
  I am trying to use AnalyzingInfixLookupFactory for auto-suggest, but I am
facing one issue related to duplicate results. Below is the exact problem I am
facing.

A lot of the fields I am capturing (multivalued) contain data that is
repeated (e.g. "new york" exists in the title fields of many articles). So
when I search for 'New Y', I get multiple results with the same value of
"New york". Is there a way to prevent these duplicates from appearing in the
suggestions?

Thanks
Radhika


Re: Multiple solr instances on one server

2016-01-04 Thread Mugeesh Husain
you could start solr on multiple ports like below


bin/solr start -p 8983   (one instance)
bin/solr start -p 8984   (second instance, and so on, depending on what you need)



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Multiple-solr-instances-on-one-server-tp4248411p4248413.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Multiple solr instances on one server

2016-01-04 Thread Mugeesh Husain
you could use the inbuilt (internal) jetty in production; it depends on your
requirements.

if you want to use another container, tomcat would be the best.

Please elaborate on your requirement - why do you want to use multiple instances
on a single server?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Multiple-solr-instances-on-one-server-tp4248411p4248429.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: [Manual Sharding] Solr distrib search cause thread exhaustion

2016-01-04 Thread Alessandro Benedetti
Yes Erick, our jetty is configured with 10,000 threads.

Actually the puzzle got more complicated as we realised the connTimeout by
default is set to 0.
But we definitely get an error from one of the shards, and the aggregator
throws the exception because it is not tolerant.

The weird thing is that the shard reports an error which is a typical clue
of a client closing the http connection.

*Jan 03 16:55:55 solr-a00.bug.example.com 
java[10661]: 37661057 [qtp15642-279052] ERROR
org.apache.solr.servlet.SolrDispatchFilter  –
null:org.apache.solr.common.SolrException:
org.apache.solr.client.solrj.SolrServerException: IOException occured when
talking to server at: http://solr10.bug *
Jan 03 16:55:55 solr-a00.bug.example.com java[10661]: at
org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:311)
Jan 03 16:55:55 solr-a00.bug.example.com java[10661]: at
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
Jan 03 16:55:55 solr-a00.bug.example.com java[10661]: at
org.apache.solr.core.SolrCore.execute(SolrCore.java:1967)
Jan 03 16:55:55 solr-a00.bug.example.com java[10661]: at
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:777)
Jan 03 16:55:55 solr-a00.bug.example.com java[10661]: at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:418)
Jan 03 16:55:55 solr-a00.bug.example.com java[10661]: at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:207)
...
*Jan 03 16:55:55 solr-a00.bug.example.com 
java[10661]: Caused by: org.apache.solr.client.solrj.SolrServerException:
IOException occured when talking to server at: http://solr10.bug
*
Jan 03 16:55:55 solr-a00.bug.example.com java[10661]: at
org.apache.solr.client.solrj.impl.HttpSolrServer.executeMethod(HttpSolrServer.java:566)
Jan 03 16:55:55 solr-a00.bug.example.com java[10661]: at
org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:210)
Jan 03 16:55:55 solr-a00.bug.example.com java[10661]: at
org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:206)
Jan 03 16:55:55 solr-a00.bug.example.com java[10661]: at
org.apache.solr.handler.component.HttpShardHandler$1.call(HttpShardHandler.java:157)
Jan 03 16:55:55 solr-a00.bug.example.com java[10661]: at
org.apache.solr.handler.component.HttpShardHandler$1.call(HttpShardHandler.java:119)
Jan 03 16:55:55 solr-a00.bug.example.com java[10661]: at
java.util.concurrent.FutureTask.run(FutureTask.java:262)
Jan 03 16:55:55 solr-a00.bug.example.com java[10661]: at
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
Jan 03 16:55:55 solr-a00.bug.example.com java[10661]: at
java.util.concurrent.FutureTask.run(FutureTask.java:262)
Jan 03 16:55:55 solr-a00.bug.example.com java[10661]: at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
Jan 03 16:55:55 solr-a00.bug.example.com java[10661]: at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
Jan 03 16:55:55 solr-a00.bug.example.com java[10661]: ... 1 more
*Jan 03 16:55:55 solr-a00.bug.example.com 
java[10661]: Caused by: java.net.SocketException: Connection reset*
...

*Shard Log*

*Jan 03 16:55:10 solr.bug.example.com 
java[21200]: 1214068 [qtp1018590076-595] ERROR
org.apache.solr.servlet.SolrDispatchFilter  –
null:org.eclipse.jetty.io.EofException*
Jan 03 16:55:10 solr.bug.example.com java[21200]: at
org.eclipse.jetty.server.HttpOutput.write(HttpOutput.java:142)
Jan 03 16:55:10 solr.bug.example.com java[21200]: at
org.eclipse.jetty.server.HttpOutput.write(HttpOutput.java:107)
Jan 03 16:55:10 solr.bug.example.com java[21200]: at
org.apache.solr.common.util.FastOutputStream.flush(FastOutputStream.java:214)
Jan 03 16:55:10 solr.bug.example.com java[21200]: at
org.apache.solr.common.util.FastOutputStream.flushBuffer(FastOutputStream.java:207)
...
*Jan 03 16:55:10 solr.bug.example.com 
java[21200]: 1214073 [qtp1018590076-595] ERROR
org.apache.solr.servlet.SolrDispatchFilter  –
null:org.eclipse.jetty.io.EofException*
Jan 03 16:55:10 solr.bug.example.com java[21200]: at
org.eclipse.jetty.server.HttpOutput.write(HttpOutput.java:142)
Jan 03 16:55:10 solr.bug.example.com java[21200]: at
org.eclipse.jetty.server.HttpOutput.write(HttpOutput.java:107)
Jan 03 16:55:10 solr.bug.example.com java[21200]: at
org.apache.solr.common.util.FastOutputStream.flush(FastOutputStream.java:214)
...
*Jan 03 16:55:10 solr.bug.example.com 
java[21200]: 1214074 [qtp1018590076-595] WARN
 org.eclipse.jetty.server.Response  – Committed before 500
{trace=org.eclipse.jetty.io.EofException*
Jan 03 16:55:10 solr.bug.example.com java[21200]: at
org.eclipse.jetty.server.HttpOutput.write(HttpOutput.java:142)
Jan 03 16:55:10 

Re: how to search millions of records in solr query

2016-01-04 Thread Erick Erickson
Best of luck with that ;). 250ms isn't bad at all for "searching
millions of IDs".
Frankly, I'm not at all sure where I'd even start. With millions of search
terms, I'd have to profile the application to see where it was spending the
time before even starting.

Best,
Erick

On Mon, Jan 4, 2016 at 5:03 AM, Mugeesh Husain  wrote:
>>>This is not a use-case to which Lucene lends itself. However, if you
>>>must, I would try the terms query parser, which I believe is used like
>>>this: {!terms f=id}2,3,6,7
>
> I did try terms query parser like above, but the problem is performance, i
> am getting result 250ms but i am looking for a parser which give result
> within 50ms.
>
> I am also looking for custom query parser but i dont know which way i should
> used that.
>
>
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/how-to-search-miilions-of-record-in-solr-query-tp4248360p4248388.html
> Sent from the Solr - User mailing list archive at Nabble.com.


Re: MapReduceIndexerTool Indexing

2016-01-04 Thread Erick Erickson
Yes it does. MRIT is intended for initial bulk loads. It takes whatever
it's pointed at and indexes it.

Additionally, it does not update documents. If the same document (by
ID) is indexed twice, you'll wind up with two copies in your results.

Best,
Erick

On Mon, Jan 4, 2016 at 5:00 AM, vidya  wrote:
> Hi
>
> I have used MapReduceIndexerTool to index data in my hdfs to solr inorder to
> search it. I want to know whether it indexes entire data when some new data
> is added to that path, again when tool is run on it.
>
> Thanks in advance
>
>
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/MapReduceIndexerTool-Indexing-tp4248387.html
> Sent from the Solr - User mailing list archive at Nabble.com.


can we use Streaming Expressions for different collection

2016-01-04 Thread Mugeesh Husain
I am reading this article:
https://cwiki.apache.org/confluence/display/solr/Streaming+Expressions

Can we implement a merge operation across different collections or different nodes
in SolrCloud?





--
View this message in context: 
http://lucene.472066.n3.nabble.com/can-we-use-Streaming-Expressions-for-different-collection-tp4248461.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: [Manual Sharding] Solr distrib search cause thread exhaustion

2016-01-04 Thread Erick Erickson
How many threads are you allocating for the servlet container? 10,000
is the "usual" number.

Best,
Erick

On Mon, Jan 4, 2016 at 5:21 AM, Alessandro Benedetti
 wrote:
> Hi guys,
> this is the scenario we are studying :
>
> Solr 4.10.2
> 16 shards, a solr instance aggregating the results running a distrib query
> with shards=. ( all the shards) .
>
> Currently we are not using shards.tolerant=true, so we throw an exception
> on error.
>
> We are in a situation when a shard is too slow to respond ( empty filter
> cache, big load).
> According to the timeout that the shard handler is expecting that shard is
> not fast enough, and for this reason we whole request fails.
>
> So far, everything is clear.
> We need to improve the speed of the shards, managing properly the auto
> warming , load balancing etc .
> We can play with the tolerant factor, and possibly be tolerant of errors.
>
> But what happens is that the solr aggregator which runs the queries against
> the shards is exhausting his threads...
> Looking into the code, in the case we are not tolerant we get this :
>
> // Was there an exception?
>> if (srsp.getException() != null) {
>>   // If things are not tolerant, abort everything and rethrow
>>   if(!tolerant) {
>>* shardHandler1.cancelAll();*
>> if (srsp.getException() instanceof SolrException) {
>>   throw (SolrException)srsp.getException();
>> } else {
>>   throw new SolrException(SolrException.ErrorCode.SERVER_ERROR,
>> srsp.getException());
>> }
>
>
> I would assume that is the responsible of the thread cleaning.
> Any idea why the thread cleaning should not happen properly?
> Can be some jetty misconfiguration ?
>
> Cheers
> --
>
> Benedetti Alessandro
> Visiting card : http://about.me/alessandro_benedetti
>
> "Tyger, tyger burning bright
> In the forests of the night,
> What immortal hand or eye
> Could frame thy fearful symmetry?"
>
> William Blake - Songs of Experience -1794 England


Re: shard lost - solr5.3

2016-01-04 Thread Erick Erickson
There's no reason to shut down your node. You should be able
to issue a REBALANCELEADERS command, see:
https://cwiki.apache.org/confluence/display/solr/Collections+API#CollectionsAPI-RebalanceLeaders

on a currently-running cluster and all your preferred leaders
(assuming the nodes are up) should become the leader of their respective shards.
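
For example (the collection name is a placeholder):

http://localhost:8983/solr/admin/collections?action=REBALANCELEADERS&collection=mycollection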

I should emphasize, though, that this is rarely necessary unless you
have lots and lots
and lots of shards. The use-case that code was written for was,
literally, hundreds
of shards that all had their leaders on a single node.

FWIW,
Erick

On Mon, Jan 4, 2016 at 3:26 AM, GOURAUD Emmanuel  wrote:
> hi there
>
> replying to myself
>
> i have set the replica property "preferredLeader" on this shard, shut down 
> all replica for this shard and started only the "preferred" one, this forced 
> an election and save my "ops" night and my new year party!!
>
> cheers,
>
> Emmanuel
>
>
>
> De: "GOURAUD Emmanuel" 
> À: solr-user@lucene.apache.org
> Envoyé: Jeudi 31 Décembre 2015 15:30:42
> Objet: shard lost - solr5.3
>
> Hi there,
>
> I have a collection that is composed of 8 shards with a replicationFactor of 2
>
> i found 2 cores of the same shard in recoveryfailed status so i decided to 
> restart both,
>
> after having doing that , i do not have any leader on that shard... and both 
> cores are down
>
> is there a way to force a leader at startup or with the API? can fore 
> election?
>
> thanks for your help
>
> Emmanuel
>


Re: Hard commits, soft commits and transaction logs

2016-01-04 Thread Erick Erickson
As far as I know. If you see anything different, let me know and
we'll see if we can update it.

Best,
Erick

On Mon, Jan 4, 2016 at 1:34 AM, Clemens Wyss DEV  wrote:
> [Happy New Year to all]
>
> Is all herein
> https://lucidworks.com/blog/2013/08/23/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/
> mentioned/recommended still valid for Solr 5.x?
>
> - Clemens


Re: Multiple solr instances on one server

2016-01-04 Thread philippa griggs
We store a huge amount of data across 10 shards and are getting to a point 
where we keep having to up the heap to stop solr from crashing.  We are trying 
to keep the heap size down, and plan to host multiple solr instances on each 
server, each of which will have a much smaller heap size.

From: Mugeesh Husain 
Sent: 04 January 2016 16:01
To: solr-user@lucene.apache.org
Subject: Re: Multiple solr instances on one server

you could use inbuilt(internal) jetty in the production, its depend on
requirement.

if you want to use other container, tomcat would be the best.

Elaborate your requirement Please why you want to use multiple instance in a
single server ?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Multiple-solr-instances-on-one-server-tp4248411p4248429.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Multiple solr instances on one server

2016-01-04 Thread Erick Erickson
Right, that's the most common reason to run multiple JVMs. You must
be running multiple replicas on each box though to make that viable. By
running say 2 JVMs, you're essentially going from hosting, say, 4 replicas
in one JVM to 2 replicas in each of 2 JVMs.

You'll incur some overhead due to the second instance of Java running,
but that's usually negligible.

There's no reason at all to run an independent Jetty, just use the startup
scripts to specify a second port as outlined above. If you use the startup
script and specify the -e cloud example (on your local box, say), go ahead
and specify two instances of Solr. The script will echo out the exact command
used to start them up and you can use that as an example.

Best,
Erick

On Mon, Jan 4, 2016 at 8:16 AM, philippa griggs
 wrote:
> We store a huge amount of data across 10 shards and are getting to a point 
> where we keep having to up the heap to stop solr from crashing.  We are 
> trying to keep the heap size down, and plan to to host multiple solr 
> instances on each server which will have a much smaller heap size.
> 
> From: Mugeesh Husain 
> Sent: 04 January 2016 16:01
> To: solr-user@lucene.apache.org
> Subject: Re: Multiple solr instances on one server
>
> you could use inbuilt(internal) jetty in the production, its depend on
> requirement.
>
> if you want to use other container, tomcat would be the best.
>
> Elaborate your requirement Please why you want to use multiple instance in a
> single server ?
>
>
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Multiple-solr-instances-on-one-server-tp4248411p4248429.html
> Sent from the Solr - User mailing list archive at Nabble.com.