Re: Slow indexing speed when index size is large?

2016-10-13 Thread Zheng Lin Edwin Yeo
Thanks for the reply Shawn.

Currently, my heap allocation to each Solr instance is 22GB.
Is that big enough?

Regards,
Edwin


On 13 October 2016 at 23:56, Shawn Heisey  wrote:

> On 10/13/2016 9:20 AM, Zheng Lin Edwin Yeo wrote:
> > I would like to find out: will the indexing speed in a collection with a
> > very large index size be much slower than in one which is still empty or
> > has a very small index size? This is assuming that the configurations,
> > indexing code and the files to be indexed are the same. Currently, I
> > have a setup in which the collection is still empty, and I managed to
> > achieve an indexing speed of more than 7GB/hr. I also have another
> > setup in which the collection has an index size of 1.6TB, and when I
> > tried to index new documents to it, the indexing speed is less than
> > 0.7GB/hr.
>
> I have noticed this phenomenon myself.  As the amount of index data
> already present increases, indexing slows down.  Best guess as to the
> cause: more frequent and longer-lasting garbage collections.
>
> Indexing involves a LOT of memory allocation.  Most of the memory chunks
> that get allocated are quickly discarded because they do not need to be
> retained.
>
> If you understand how the Java memory model works, then you know that
> this means there will be a lot of garbage collection.  Each GC will tend
> to take longer if there are a large number of objects allocated that are
> NOT garbage.
>
> When the index is large, Lucene/Solr must allocate and retain a larger
> amount of memory just to ensure that everything works properly.  This
> leaves less free memory, so indexing will cause more frequent garbage
> collections ... and because the amount of retained memory is
> correspondingly larger, each garbage collection will take longer than it
> would with a smaller index.  A ten to one difference in speed does seem
> extreme, though.
>
> You might want to increase the heap allocated to each Solr instance, so
> GC is less frequent.  This can take memory away from the OS disk cache,
> though.  If the amount of OS disk cache drops too low, general
> performance may suffer.
>
> Thanks,
> Shawn
>
>


Re: Rename shards ?

2016-10-13 Thread Erick Erickson
That doesn't make a great deal of sense on the surface. If you're
using the default router, document routing is a hash of the uniqueKey modulo
(# shards), so there's no good way for you to put all docs from GB on
the same shard.

If you're using the "implicit" router, you can simply use the
Collections API CREATESHARD command and give it whatever name you want...
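For illustration only (the collection and field names below are invented, not from this thread): with the implicit router you name the shards yourself at creation time, and CREATESHARD can add more later. Something like:

```shell
# Create a collection with the implicit router and named shards;
# router.field sends each doc to the shard matching its "country" value
curl "http://localhost:8983/solr/admin/collections?action=CREATE&name=bycountry&router.name=implicit&router.field=country&shards=GB,US,AU"

# Later, add another named shard to the same collection
curl "http://localhost:8983/solr/admin/collections?action=CREATESHARD&collection=bycountry&shard=NZ"
```

Note that with the implicit router, routing documents to the right shard is your responsibility (via router.field or by targeting the shard directly).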

I suspect this is an "XY" problem. Why do you want to name your shards?
If you tell us the use-case, perhaps we can have more cogent
responses.

Best,
Erick

On Mon, Oct 10, 2016 at 7:49 AM, Customer  wrote:
> Hey,
>
> is there a way to rename shards or ideally give them names when creating
> collection? For example instead of names like Shard1, Shard2 on admin page I
> would like to see countries instead, for example GB, US, AU etc ...
>
> Is there any way to do that ?
>
> Thanks


Re: Solr & ALfresco: infinite write on solr

2016-10-13 Thread Erick Erickson
You should ask for help from Alfresco.

Plus, what Solr version are you using? There is no 1.6 version.

Best,
Erick

On Thu, Oct 13, 2016 at 5:56 AM, Salvatore Pulvirenti
 wrote:
> Hi,
> a little tip.
>
> If we look into the index directory we see this file
>
> lucene-808e5ff8ac506400677fbfa9db547bd8-write.lock
>
> We think that the process/thread performing the check writes this file
> to prevent other processes/threads from interfering.
> What if we delete this lock file?
>
> Cheers
> Salvo
>
>
> On 13/10/2016 at 11:28, Salvatore Pulvirenti wrote:
>>
>> Hi,
>> we use Solr 1.6 to index files inside an Alfresco 4.2 application, which is
>> no longer working well, and we don't understand why.
>>
>> When Solr stops indexing, we read this row in the logs:
>>  INFO  [solr.tracker.CoreTracker] [SolrTrackingPool-alfresco4] .. checking
>> aux doc exists in index before we update it
>>
>> We tried restarting Tomcat many times and we observed this strange
>> behavior:
>> - Solr starts indexing, everything runs well and we can see 24 threads
>> working; the log says:
>>
>>  INFO  [solr.tracker.CoreTracker] [SolrTrackingPool-alfresco20] ..
>> checking for path change
>>  INFO  [solr.tracker.CoreTracker] [SolrTrackingPool-alfresco9] .. checking
>> for path change
>>  INFO  [solr.tracker.CoreTracker] [SolrTrackingPool-alfresco2] .. checking
>> for path change
>>  INFO  [solr.tracker.CoreTracker] [SolrTrackingPool-alfresco12] ..
>> updating
>>  INFO  [solr.tracker.CoreTracker] [SolrTrackingPool-alfresco3] .. updating
>>  INFO  [solr.tracker.CoreTracker] [SolrTrackingPool-alfresco1] .. updating
>>
>> - after some minutes one thread starts writing "checking aux doc exists in
>> index before we update it" while the other threads keep working normally
>>
>>  INFO  [solr.tracker.CoreTracker] [SolrTrackingPool-alfresco3] .. updating
>>  INFO  [solr.tracker.CoreTracker] [SolrTrackingPool-alfresco1] .. updating
>>  INFO  [solr.tracker.CoreTracker] [SolrTrackingPool-alfresco15] ..
>> checking for path change
>>  INFO  [solr.tracker.CoreTracker] [SolrTrackingPool-alfresco4] .. checking
>> aux doc exists in index before we update it
>>  INFO  [solr.tracker.CoreTracker] [SolrTrackingPool-alfresco21] ..
>> updating
>>  INFO  [solr.tracker.CoreTracker] [SolrTrackingPool-alfresco24] ..
>> updating
>>  INFO  [solr.tracker.CoreTracker] [SolrTrackingPool-alfresco4] .. checking
>> aux doc exists in index before we update it
>>  INFO  [solr.tracker.CoreTracker] [SolrTrackingPool-alfresco14] ..
>> updating
>>  INFO  [solr.tracker.CoreTracker] [SolrTrackingPool-alfresco9] .. updating
>>
>> - after some minutes no thread is working and in the log I can find
>> only this row:
>>
>>  INFO  [solr.tracker.CoreTracker] [SolrTrackingPool-alfresco4] .. checking
>> aux doc exists in index before we update it
>>  INFO  [solr.tracker.CoreTracker] [SolrTrackingPool-alfresco4] .. checking
>> aux doc exists in index before we update it
>>  INFO  [solr.tracker.CoreTracker] [SolrTrackingPool-alfresco4] .. checking
>> aux doc exists in index before we update it
>>  INFO  [solr.tracker.CoreTracker] [SolrTrackingPool-alfresco4] .. checking
>> aux doc exists in index before we update it
>>  INFO  [solr.tracker.CoreTracker] [SolrTrackingPool-alfresco4] .. checking
>> aux doc exists in index before we update it
>>  INFO  [solr.tracker.CoreTracker] [SolrTrackingPool-alfresco4] .. checking
>> aux doc exists in index before we update it
>>  INFO  [solr.tracker.CoreTracker] [SolrTrackingPool-alfresco4] .. checking
>> aux doc exists in index before we update it
>>  INFO  [solr.tracker.CoreTracker] [SolrTrackingPool-alfresco4] .. checking
>> aux doc exists in index before we update it
>>  INFO  [solr.tracker.CoreTracker] [SolrTrackingPool-alfresco4] .. checking
>> aux doc exists in index before we update it
>>  INFO  [solr.tracker.CoreTracker] [SolrTrackingPool-alfresco4] .. checking
>> aux doc exists in index before we update it
>>
>> After this, no index was created.
>>
>> Anyone have any idea what we can do to solve this issue?
>>
>> Thanks in advance.
>>
>>
>> Salvo


Re: [Solr 5.1.0] - Ignoring Whitespaces as delimiters

2016-10-13 Thread Jan Høydahl
Have you tried PatternTokenizer?
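For example (a sketch only; the field type name is invented and this is untested): a PatternTokenizerFactory configured to split solely on hyphens keeps whitespace inside tokens:

```xml
<!-- schema.xml: tokenize only on "-", so "abc cde-rfg" -> "abc cde", "rfg" -->
<fieldType name="text_hyphen" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.PatternTokenizerFactory" pattern="-"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```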

Sent from my iPhone

> On 13 Oct 2016, at 04:03, deniz wrote:
> 
> Hello,
> 
> Are there any built-in tokenizers which will do something like StandardTokenizer,
> but will not tokenize on whitespace?
> 
> e.g. field:abc cde-rfg will be tokenized as "abc cde" and "rfg", not "abc",
> "cde", "rfg"
> 
> 
> I have checked the existing tokenizers/analyzers and it seems like there is
> no other way but writing a custom tokenizer... 
> 
> 
> 
> 
> 
> -
> Zeki ama calismiyor... Calissa yapar...
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Solr-5-1-0-Ignoring-Whitespaces-as-delimiters-tp4300939.html
> Sent from the Solr - User mailing list archive at Nabble.com.


Multi-core query performance tuning/monitoring

2016-10-13 Thread Oleg Ievtushok
Hi

I have a few filter queries that use cross-core joins to filter
documents. After I inverted those joins they became slower. It looks
something like this:

I used to query the "product" core with a query that contains
fq={!join to=tags from=preferred_tags fromIndex=user}(country:US AND
...)&fq=product_category:0&...
Now I query the "user" core with a query that contains
fq={!join to=preferred_tags from=tags fromIndex=product}(product_category:0
AND ...)&fq=country:US&...

Both tags and preferred_tags might contain multiple values, and the "product"
core is used more often (so it could be that the cache is warmer for that
core). The "user" index is smaller than "product". After a few queries Solr
seems to warm up and serves the query ~50x faster, but the initial queries
are extremely slow. I tried turning off caching for the filter and making
its cost higher than 150, but it did not help much. I was thinking about
adding autowarming queries, but first I want to check what makes the join so
slow. What would be the right way to debug it, to see which part of it is
the slowest one...

Also, if I go with autowarming: since there are 2 cores involved, I
wonder which warmup query should be used... "fq={!join to=preferred_tags
from=tags fromIndex=product}(product_category:0 AND ...)" on "user" core or
"fq=(product_category:0 AND ...)" on "product"...

Solr version is 4.3.0


Regards, Oleg


Re: Slow indexing speed when index size is large?

2016-10-13 Thread Shawn Heisey
On 10/13/2016 9:20 AM, Zheng Lin Edwin Yeo wrote:
> I would like to find out: will the indexing speed in a collection with a
> very large index size be much slower than in one which is still empty or
> has a very small index size? This is assuming that the configurations,
> indexing code and the files to be indexed are the same. Currently, I
> have a setup in which the collection is still empty, and I managed to
> achieve an indexing speed of more than 7GB/hr. I also have another
> setup in which the collection has an index size of 1.6TB, and when I
> tried to index new documents to it, the indexing speed is less than
> 0.7GB/hr. 

I have noticed this phenomenon myself.  As the amount of index data
already present increases, indexing slows down.  Best guess as to the
cause: more frequent and longer-lasting garbage collections.

Indexing involves a LOT of memory allocation.  Most of the memory chunks
that get allocated are quickly discarded because they do not need to be
retained.

If you understand how the Java memory model works, then you know that
this means there will be a lot of garbage collection.  Each GC will tend
to take longer if there are a large number of objects allocated that are
NOT garbage.

When the index is large, Lucene/Solr must allocate and retain a larger
amount of memory just to ensure that everything works properly.  This
leaves less free memory, so indexing will cause more frequent garbage
collections ... and because the amount of retained memory is
correspondingly larger, each garbage collection will take longer than it
would with a smaller index.  A ten to one difference in speed does seem
extreme, though.

You might want to increase the heap allocated to each Solr instance, so
GC is less frequent.  This can take memory away from the OS disk cache,
though.  If the amount of OS disk cache drops too low, general
performance may suffer.
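As a rough illustration (the value is an example, not a recommendation): with Solr 5.x the heap can be raised either on the command line or in the include script:

```shell
# One-off: start Solr with a 24 GB heap
bin/solr start -m 24g

# Or persistently, in bin/solr.in.sh:
SOLR_HEAP="24g"
```

Whatever is not given to the heap stays available for the OS disk cache, which is why raising the heap is a trade-off.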

Thanks,
Shawn



Slow indexing speed when index size is large?

2016-10-13 Thread Zheng Lin Edwin Yeo
Hi,

I would like to find out: will the indexing speed in a collection with a very
large index size be much slower than in one which is still empty or has a very
small index size? This is assuming that the configurations, indexing code
and the files to be indexed are the same.

Currently, I have a setup in which the collection is still empty, and I
managed to achieve an indexing speed of more than 7GB/hr. I also have
another setup in which the collection has an index size of 1.6TB, and when
I tried to index new documents to it, the indexing speed is less than
0.7GB/hr.

This setup was done with Solr 5.4.0

Regards,
Edwin


Re: Solr stop index after few minutes from restart

2016-10-13 Thread Shawn Heisey
On 10/13/2016 2:50 AM, Angelo Gaverini wrote:
> We use Solr to index files inside an Alfresco application, which is not
> working well anymore and we don't understand why.
> When Solr stops indexing, we read this row in the logs:
>  INFO  [solr.tracker.CoreTracker] [SolrTrackingPool-alfresco4] .. checking
> aux doc exists in index before we update it
>
> We tried restarting Tomcat many times and we observed this strange
> behavior:
> - Solr starts indexing, everything runs well and we can see 24 threads
> working; the log says:
>
>  INFO  [solr.tracker.CoreTracker] [SolrTrackingPool-alfresco20] .. checking
> for path change
>  INFO  [solr.tracker.CoreTracker] [SolrTrackingPool-alfresco9] .. checking
> for path change
>  INFO  [solr.tracker.CoreTracker] [SolrTrackingPool-alfresco2] .. checking
> for path change
>  INFO  [solr.tracker.CoreTracker] [SolrTrackingPool-alfresco12] .. updating
>  INFO  [solr.tracker.CoreTracker] [SolrTrackingPool-alfresco3] .. updating
>  INFO  [solr.tracker.CoreTracker] [SolrTrackingPool-alfresco1] .. updating
>
> - after some minutes one thread starts writing "checking aux doc exists in
> index before we update it" while the other threads keep working normally
>
>  INFO  [solr.tracker.CoreTracker] [SolrTrackingPool-alfresco3] .. updating
>  INFO  [solr.tracker.CoreTracker] [SolrTrackingPool-alfresco1] .. updating
>  INFO  [solr.tracker.CoreTracker] [SolrTrackingPool-alfresco15] .. checking
> for path change
>  INFO  [solr.tracker.CoreTracker] [SolrTrackingPool-alfresco4] .. checking
> aux doc exists in index before we update it
>  INFO  [solr.tracker.CoreTracker] [SolrTrackingPool-alfresco21] .. updating
>  INFO  [solr.tracker.CoreTracker] [SolrTrackingPool-alfresco24] .. updating
>  INFO  [solr.tracker.CoreTracker] [SolrTrackingPool-alfresco4] .. checking
> aux doc exists in index before we update it
>  INFO  [solr.tracker.CoreTracker] [SolrTrackingPool-alfresco14] .. updating
>  INFO  [solr.tracker.CoreTracker] [SolrTrackingPool-alfresco9] .. updating
>
> - after some minutes no thread is working and in the log I can find
> only this row:
>
>  INFO  [solr.tracker.CoreTracker] [SolrTrackingPool-alfresco4] .. checking
> aux doc exists in index before we update it
>  INFO  [solr.tracker.CoreTracker] [SolrTrackingPool-alfresco4] .. checking
> aux doc exists in index before we update it
>  INFO  [solr.tracker.CoreTracker] [SolrTrackingPool-alfresco4] .. checking
> aux doc exists in index before we update it
>  INFO  [solr.tracker.CoreTracker] [SolrTrackingPool-alfresco4] .. checking
> aux doc exists in index before we update it
>  INFO  [solr.tracker.CoreTracker] [SolrTrackingPool-alfresco4] .. checking
> aux doc exists in index before we update it
>  INFO  [solr.tracker.CoreTracker] [SolrTrackingPool-alfresco4] .. checking
> aux doc exists in index before we update it
>  INFO  [solr.tracker.CoreTracker] [SolrTrackingPool-alfresco4] .. checking
> aux doc exists in index before we update it
>  INFO  [solr.tracker.CoreTracker] [SolrTrackingPool-alfresco4] .. checking
> aux doc exists in index before we update it
>  INFO  [solr.tracker.CoreTracker] [SolrTrackingPool-alfresco4] .. checking
> aux doc exists in index before we update it
>  INFO  [solr.tracker.CoreTracker] [SolrTrackingPool-alfresco4] .. checking
> aux doc exists in index before we update it

These logs did not come from Solr.  If you need help with what this
means, you're going to need to talk to whoever created the software,
which is presumably whoever created Alfresco.  We have no idea what this
software is doing, or what those log messages mean.

If you have proof that the problem is in *Solr*, then show us evidence
from Solr.  Otherwise, you should be talking to somebody who knows about
the software that created the log above.  If they investigate the
problem and believe that it is Solr, then they may be able to help you
find the evidence you need to ask about on this list.

> After this, no index was created.
>
> I found then in the index directory the file
> "lucene-85839584395834953-write.lock" and the creation date is the same as
> the first row with the message "checking aux doc exists in index before we
> update it" in the logs.

The write lock is created by Lucene, by the part of the code that
changes the index, to ensure that only one process (in this case, Solr,
which is a Lucene application) can update the on-disk index.  It sounds
like Solr opens the index writer object once you request a change to the
index, and probably keeps it open indefinitely after that unless you
restart Solr.  A core reload might also remove the lock until the next
change, but without testing, I can't say for sure.  That is an EXTREMELY
low-level detail that you really don't need to be worried about.  Solr
and Lucene will handle it, and I am not aware of any bugs in that part
of the system.

It looks like somebody else is asking the list separately about this
same problem, probably from your organization.  They asked about
deleting the write lock file.

Re: Unsubscribe from this mailing-list

2016-10-13 Thread Pritam Kute
Try sending mail to: solr-user-unsubscr...@lucene.apache.org from your
registered email.

Thanks & Regards,
--
*Pritam Kute*

On Wed, Oct 12, 2016 at 3:27 AM, prakash reddy 
wrote:

> Please remove me from the mailing list.
>


Re: How to unload solr collections?

2016-10-13 Thread Shawn Heisey
On 10/13/2016 2:52 AM, Girish Chafle wrote:
> We are using Solr 5.2.1 with SolrJ API. To improve/minimize the Solr heap
> utilization we would like to explicitly unload Solr collections after
> completing the search queries. Is there an API to unload Solr Collections
> for SolrCloud?
>
> The real issue we are trying to solve is Solr running into out of memory
> errors on searching large amount of data for a given heap setting. Keeping
> fixed heap size, we plan to load/unload collections so that we do not let
> Solr run out of memory. Any suggestions/help on this is highly appreciated.

What you are describing sounds like the LotsOfCores functionality ...
but this functionality does NOT work with SolrCloud.  SolrCloud
currently assumes that all collections in the zookeeper database are
available if the servers that host them are available.

https://wiki.apache.org/solr/LotsOfCores

If you were to unload the *cores* that make up a collection, then the
collection would be down until you manually re-loaded them, because
SolrCloud doesn't have the functionality you're after.  Using CoreAdmin
(which can unload and create cores) with SolrCloud is not recommended,
because it's very easy to do it incorrectly.

SolrCloud could in theory be adjusted to make what you want possible,
but it would not be a trivial development effort.  Writing, testing, and
fixing the code could take weeks or months, particularly because Solr
does not have paid developers.  We are all volunteers.

Assuming that such a feature IS developed, or you figure out how to
unload/reload cores manually without problems, if a collection is large
enough that it is causing serious issues with your heap, then reloading
it after it has been unloaded would be a very time-consuming process. 
I'm not sure that users would appreciate the long search delays while
Solr spins the collection back up.

Your best bet right now is to add hardware, either additional servers or
more memory in the servers you have.

Thanks,
Shawn



Re: [Solr-5-4-1] Why SolrCloud leader is putting all replicas in recovery at the same time ?

2016-10-13 Thread Gerald Reinhart


Hi Pushkar Raste,

  Thanks for your hints.
  We will try the 3rd solution and keep you posted.

Gérald Reinhart

On 10/07/2016 02:23 AM, Pushkar Raste wrote:
A couple of questions/suggestions:
- This normally happens after a leader election: when a new leader gets elected,
it will force all the nodes to sync with itself.
Check the logs to see whether a leader change took place when this happened. If
so, you will have to investigate why the leader change occurred.
I suspect the leader goes into a GC pause long enough that ZooKeeper considers
it no longer available and initiates a leader election.

- What version of Solr are you using?
SOLR-8586 introduced an IndexFingerprint check; unfortunately it was broken, and
hence a replica would always do a full index replication. The issue is now fixed
in SOLR-9310, which should help replicas recover faster.

- You should also increase the ulog size (the default threshold is 100 docs or
10 tlogs, whichever is hit first). This will again help replicas recover faster
from tlogs (of course, there is a threshold beyond which recovering from the
tlog would in fact take longer than copying over all the index files from the
leader)
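As an illustration of that last point (the numbers are arbitrary examples, not tuned values), the transaction-log retention thresholds can be raised in solrconfig.xml:

```xml
<!-- solrconfig.xml: keep more records/tlogs so peers can sync from the
     tlog instead of falling back to full index replication -->
<updateLog>
  <str name="dir">${solr.ulog.dir:}</str>
  <int name="numRecordsToKeep">500</int>   <!-- default 100 -->
  <int name="maxNumLogsToKeep">20</int>    <!-- default 10 -->
</updateLog>
```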


On Thu, Oct 6, 2016 at 5:23 AM, Gerald Reinhart wrote:

Hello everyone,

   Our SolrCloud has worked very well for several months without any significant
changes: the traffic to serve is stable, no major release deployed...

   But randomly, the Solr Cloud leader puts all the replicas in recovery at the 
same time for no obvious reason.

   Hence, we can not serve the queries any more and the leader is overloaded 
while replicating all the indexes on the replicas at the same time which 
eventually implies a downtime of approximately 30 minutes.

   Is there a way to prevent it ? Ideally, a configuration saying a percentage 
of replicas to be put in recovery at the same time?

Thanks,

Gérald, Elodie and Ludovic


--

Gérald Reinhart Software Engineer

E gerald.reinh...@kelkoo.com    Y!Messenger gerald.reinhart
T +33 (0)4 56 09 07 41
A Parc Sud Galaxie - Le Calypso, 6 rue des Méridiens, 38130 Echirolles






Kelkoo SAS
Société par Actions Simplifiée
Au capital de € 4.168.964,30
Siège social : 158 Ter Rue du Temple 75003 Paris
425 093 069 RCS Paris

This message and its attachments are confidential and intended exclusively for
their recipients. If you are not the intended recipient of this message, please
delete it and notify the sender.





Re: INVALID in email address

2016-10-13 Thread hairymcclarey
Thanks, never saw that before from yahoo. 

On Thursday, October 13, 2016 12:18 PM, Hasan Diwan  
wrote:
 

 The "From" field in your email client is set to that. However, your
"Reply-To" is correct. As to the deeper reason "why", it's a fairly
easy-to-defeat workaround for spam. -- H

On 13 October 2016 at 01:39,  wrote:

> Anyone know why this appears after my email address when I reply to a
> thread in the user group?




-- 
OpenPGP:
https://sks-keyservers.net/pks/lookup?op=get&search=0xFEBAD7FFD041BBA1
Sent from my mobile device


   

Re: Query by distance

2016-10-13 Thread Emir Arnautovic

Hi,

Did you try a simple phrase query: PositionNSD:"Chief Executive Officer"?

Did you apply the synonym filter at query time or index time?
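As a sketch (the field name is taken from the thread; the values are illustrative): a phrase query, optionally with slop, constrains term positions rather than matching each term independently:

```
# exact phrase: terms must appear adjacent and in order
q=PositionNSD:"Chief Executive Officer"

# allow the terms to be up to 2 positions apart
q=PositionNSD:"Chief Executive Officer"~2
```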

Emir


On 11.10.2016 17:49, marotosg wrote:

Hi,

I have a field which contains Job Positions for people. This field uses a
SynonymFilterFactory


The field contains the data "Chief Sales Officer", and my synonyms file has an
entry like "Chief Sales Officer, Chief of Sales, Chief Sales Executive".

My analyzer returns these tokens for "Chief Sales Officer": "chief chief
chief sales of sales officer sales executive"

I have a query like the one below, which returns a match for "Chief Executive
Officer", which is not good:
  ((((PositionNSD:(Chief))^3 OR ((PositionNSD:Chief*))^1.5)
AND
  ((PositionNSD:(Executive))^3 OR ((PositionNSD:Executive*))^1.5)
AND
  ((PositionNSD:(Officer))^3 OR ((PositionNSD:Officer*))^1.5)))

Can anyone suggest a solution to keep the distance between the terms, or do
something to avoid matching on any token regardless of its position?

Thanks a lot.






--
View this message in context: 
http://lucene.472066.n3.nabble.com/Query-by-distance-tp4300660.html
Sent from the Solr - User mailing list archive at Nabble.com.


--
Monitoring * Alerting * Anomaly Detection * Centralized Log Management
Solr & Elasticsearch Support * http://sematext.com/



Re: INVALID in email address

2016-10-13 Thread Hasan Diwan
The "From" field in your email client is set to that. However, your
"Reply-To" is correct. As to the deeper reason "why", it's a fairly
easy-to-defeat workaround for spam. -- H

On 13 October 2016 at 01:39,  wrote:

> Anyone know why this appears after my email address when I reply to a
> thread in the user group?




-- 
OpenPGP:
https://sks-keyservers.net/pks/lookup?op=get&search=0xFEBAD7FFD041BBA1
Sent from my mobile device


Re: multivalued coordinate for geospatial search

2016-10-13 Thread Emir Arnautovic

Hi Chris,

In order to make it work you have to concatenate lat/lon before it
reaches indexing. You can do that by using an update processor chain and adding
ConcatFieldUpdateProcessorFactory.
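A rough sketch of such a chain (the field names latitude/longitude/latlon are placeholders; this covers the single-point case and is untested):

```xml
<!-- solrconfig.xml: copy latitude and longitude into one field,
     then join the two values with "," to get "lat,lon" -->
<updateRequestProcessorChain name="concat-latlon">
  <processor class="solr.CloneFieldUpdateProcessorFactory">
    <str name="source">latitude</str>
    <str name="source">longitude</str>
    <str name="dest">latlon</str>
  </processor>
  <processor class="solr.ConcatFieldUpdateProcessorFactory">
    <str name="fieldName">latlon</str>
    <str name="delimiter">,</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>
```

The chain would then be selected at index time with update.chain=concat-latlon on the update request.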


Emir


On 12.10.2016 11:26, Chris Chris wrote:

Hello solr users!

I am trying to use geospatial search to do some basic distance search in Solr 4.10.

At the moment, I have it working if I have just one set of coordinates
(latitude,longitude) per document.

However, I need to get it to work when I have an unknown number of sets of
coordinates per document: the document should be returned if any of its
coordinates is within the distance threshold of a given coordinate.


Below is how it is working when I have just one set of coordinates per
document.









The reason why I am using the copyField is because the latitude and
longitude are provided in separate fields, not in the "lat,lon" format.
So far, all my attempts to use multivalued failed, and I would greatly
appreciate some help.

Thanks!

Chris



--
Monitoring * Alerting * Anomaly Detection * Centralized Log Management
Solr & Elasticsearch Support * http://sematext.com/



Re: Solr & ALfresco: infinite write on solr

2016-10-13 Thread Salvatore Pulvirenti

Hi,
a little tip.

If we look into the index directory we see this file

lucene-808e5ff8ac506400677fbfa9db547bd8-write.lock

We think that the process/thread performing the check writes this
file to prevent other processes/threads from interfering.

What if we delete this lock file?

Cheers
Salvo

On 13/10/2016 at 11:28, Salvatore Pulvirenti wrote:

Hi,
we use Solr 1.6 to index files inside an Alfresco 4.2 application, which
is no longer working well, and we don't understand why.


When Solr stops indexing, we read this row in the logs:
 INFO  [solr.tracker.CoreTracker] [SolrTrackingPool-alfresco4] .. 
checking aux doc exists in index before we update it


We tried restarting Tomcat many times and we observed this strange
behavior:
- Solr starts indexing, everything runs well and we can see 24
threads working; the log says:


 INFO  [solr.tracker.CoreTracker] [SolrTrackingPool-alfresco20] .. 
checking for path change
 INFO  [solr.tracker.CoreTracker] [SolrTrackingPool-alfresco9] .. 
checking for path change
 INFO  [solr.tracker.CoreTracker] [SolrTrackingPool-alfresco2] .. 
checking for path change
 INFO  [solr.tracker.CoreTracker] [SolrTrackingPool-alfresco12] .. 
updating
 INFO  [solr.tracker.CoreTracker] [SolrTrackingPool-alfresco3] .. 
updating
 INFO  [solr.tracker.CoreTracker] [SolrTrackingPool-alfresco1] .. 
updating


- after some minutes one thread starts writing "checking aux doc exists
in index before we update it" while the other threads keep working normally


 INFO  [solr.tracker.CoreTracker] [SolrTrackingPool-alfresco3] .. 
updating
 INFO  [solr.tracker.CoreTracker] [SolrTrackingPool-alfresco1] .. 
updating
 INFO  [solr.tracker.CoreTracker] [SolrTrackingPool-alfresco15] .. 
checking for path change
 INFO  [solr.tracker.CoreTracker] [SolrTrackingPool-alfresco4] .. 
checking aux doc exists in index before we update it
 INFO  [solr.tracker.CoreTracker] [SolrTrackingPool-alfresco21] .. 
updating
 INFO  [solr.tracker.CoreTracker] [SolrTrackingPool-alfresco24] .. 
updating
 INFO  [solr.tracker.CoreTracker] [SolrTrackingPool-alfresco4] .. 
checking aux doc exists in index before we update it
 INFO  [solr.tracker.CoreTracker] [SolrTrackingPool-alfresco14] .. 
updating
 INFO  [solr.tracker.CoreTracker] [SolrTrackingPool-alfresco9] .. 
updating


- after some minutes no thread is working and in the log I can
find only this row:


 INFO  [solr.tracker.CoreTracker] [SolrTrackingPool-alfresco4] .. 
checking aux doc exists in index before we update it
 INFO  [solr.tracker.CoreTracker] [SolrTrackingPool-alfresco4] .. 
checking aux doc exists in index before we update it
 INFO  [solr.tracker.CoreTracker] [SolrTrackingPool-alfresco4] .. 
checking aux doc exists in index before we update it
 INFO  [solr.tracker.CoreTracker] [SolrTrackingPool-alfresco4] .. 
checking aux doc exists in index before we update it
 INFO  [solr.tracker.CoreTracker] [SolrTrackingPool-alfresco4] .. 
checking aux doc exists in index before we update it
 INFO  [solr.tracker.CoreTracker] [SolrTrackingPool-alfresco4] .. 
checking aux doc exists in index before we update it
 INFO  [solr.tracker.CoreTracker] [SolrTrackingPool-alfresco4] .. 
checking aux doc exists in index before we update it
 INFO  [solr.tracker.CoreTracker] [SolrTrackingPool-alfresco4] .. 
checking aux doc exists in index before we update it
 INFO  [solr.tracker.CoreTracker] [SolrTrackingPool-alfresco4] .. 
checking aux doc exists in index before we update it
 INFO  [solr.tracker.CoreTracker] [SolrTrackingPool-alfresco4] .. 
checking aux doc exists in index before we update it


After this, no index was created.

Anyone have any idea what we can do to solve this issue?

Thanks in advance.


Salvo


Re: How to retrieve 200K documents from Solr 4.10.2

2016-10-13 Thread Emir Arnautovic

Hi Obaid,

You may also want to check out 
https://cwiki.apache.org/confluence/display/solr/Exporting+Result+Sets
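For reference, the /export handler streams full, sorted result sets. A sketch (host, collection, and field names are placeholders; /export requires a sort and fl fields backed by docValues):

```
curl "http://localhost:8983/solr/abc/export?q=*:*&sort=id+asc&fl=url"
```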


Emir

On 13.10.2016 00:33, Nick Vasilyev wrote:

Check out cursorMark, it should be available in your release. There is some
good information on this page:

https://cwiki.apache.org/confluence/display/solr/Pagination+of+Results


On Wed, Oct 12, 2016 at 5:46 PM, Salikeen, Obaid <
obaid.salik...@iacpublishinglabs.com> wrote:


Hi,

I am using Solr 4.10.2. I have 200K documents sitting on a Solr cluster (it
has 3 nodes), and let me first state that I am new to Solr. I want to retrieve
all documents from Solr (essentially just one field from each document).

What is the best way of fetching this much data without overloading Solr
cluster?


Approach I tried:
I tried using the following API (running every minute) to fetch a batch of
1000 documents each time. On each run, I initialize start with the new
offset, i.e. adding 1000.
http://SOLR_HOST/solr/abc/select?q=*:*&start=1&rows=1000&fl=url&wt=csv&...

However, with the above approach, I have two issues:

1.   The Solr cluster gets overloaded, i.e. it slows down

2.   I am not sure if start=X&rows=1000 would give me the correct
results (changing rows=2 or rows=4 gives me totally different results,
which is why I am not confident that I will get the correct results).


Thanks
Obaid




--
Monitoring * Alerting * Anomaly Detection * Centralized Log Management
Solr & Elasticsearch Support * http://sematext.com/



Solr & ALfresco: infinite write on solr

2016-10-13 Thread Salvatore Pulvirenti

Hi,
we use Solr 1.6 to index files inside an Alfresco 4.2 application; it is
not working well anymore and we don't understand why.


When Solr stops indexing, we see this row in the logs:
 INFO  [solr.tracker.CoreTracker] [SolrTrackingPool-alfresco4] .. checking aux doc exists in index before we update it


We tried restarting Tomcat many times and observed this strange
behavior:
- Solr starts indexing, everything runs fine and we can see 24 threads
working; the log says:


 INFO  [solr.tracker.CoreTracker] [SolrTrackingPool-alfresco20] .. checking for path change
 INFO  [solr.tracker.CoreTracker] [SolrTrackingPool-alfresco9] .. checking for path change
 INFO  [solr.tracker.CoreTracker] [SolrTrackingPool-alfresco2] .. checking for path change

 INFO  [solr.tracker.CoreTracker] [SolrTrackingPool-alfresco12] .. updating
 INFO  [solr.tracker.CoreTracker] [SolrTrackingPool-alfresco3] .. updating
 INFO  [solr.tracker.CoreTracker] [SolrTrackingPool-alfresco1] .. updating

- after a few minutes one thread starts writing "checking aux doc exists
in index before we update it" while the other threads keep working


 INFO  [solr.tracker.CoreTracker] [SolrTrackingPool-alfresco3] .. updating
 INFO  [solr.tracker.CoreTracker] [SolrTrackingPool-alfresco1] .. updating
 INFO  [solr.tracker.CoreTracker] [SolrTrackingPool-alfresco15] .. checking for path change
 INFO  [solr.tracker.CoreTracker] [SolrTrackingPool-alfresco4] .. checking aux doc exists in index before we update it

 INFO  [solr.tracker.CoreTracker] [SolrTrackingPool-alfresco21] .. updating
 INFO  [solr.tracker.CoreTracker] [SolrTrackingPool-alfresco24] .. updating
 INFO  [solr.tracker.CoreTracker] [SolrTrackingPool-alfresco4] .. checking aux doc exists in index before we update it

 INFO  [solr.tracker.CoreTracker] [SolrTrackingPool-alfresco14] .. updating
 INFO  [solr.tracker.CoreTracker] [SolrTrackingPool-alfresco9] .. updating

- after a few minutes no thread is working and in the log I can find
only this row:


 INFO  [solr.tracker.CoreTracker] [SolrTrackingPool-alfresco4] .. checking aux doc exists in index before we update it
 INFO  [solr.tracker.CoreTracker] [SolrTrackingPool-alfresco4] .. checking aux doc exists in index before we update it
 INFO  [solr.tracker.CoreTracker] [SolrTrackingPool-alfresco4] .. checking aux doc exists in index before we update it
 INFO  [solr.tracker.CoreTracker] [SolrTrackingPool-alfresco4] .. checking aux doc exists in index before we update it
 INFO  [solr.tracker.CoreTracker] [SolrTrackingPool-alfresco4] .. checking aux doc exists in index before we update it
 INFO  [solr.tracker.CoreTracker] [SolrTrackingPool-alfresco4] .. checking aux doc exists in index before we update it
 INFO  [solr.tracker.CoreTracker] [SolrTrackingPool-alfresco4] .. checking aux doc exists in index before we update it
 INFO  [solr.tracker.CoreTracker] [SolrTrackingPool-alfresco4] .. checking aux doc exists in index before we update it
 INFO  [solr.tracker.CoreTracker] [SolrTrackingPool-alfresco4] .. checking aux doc exists in index before we update it
 INFO  [solr.tracker.CoreTracker] [SolrTrackingPool-alfresco4] .. checking aux doc exists in index before we update it


After this, no index was created.

Does anyone have an idea what we can do to solve this issue?

Thanks in advance.


Salvo


How to unload solr collections?

2016-10-13 Thread Girish Chafle
We are using Solr 5.2.1 with the SolrJ API. To minimize Solr heap
utilization we would like to explicitly unload Solr collections after
completing our search queries. Is there an API to unload Solr collections
in SolrCloud?

The real issue we are trying to solve is Solr running into out-of-memory
errors when searching a large amount of data with a given heap setting. Keeping
a fixed heap size, we plan to load/unload collections so that we do not let
Solr run out of memory. Any suggestions/help on this are highly appreciated.

Thanks


Solr stop index after few minutes from restart

2016-10-13 Thread Angelo Gaverini
We use Solr to index files inside an Alfresco application; it is not
working well anymore and we don't understand why.
When Solr stops indexing, we see this row in the logs:
 INFO  [solr.tracker.CoreTracker] [SolrTrackingPool-alfresco4] .. checking aux doc exists in index before we update it

We tried restarting Tomcat many times and observed this strange
behavior:
- Solr starts indexing, everything runs fine and we can see 24 threads
working; the log says:

 INFO  [solr.tracker.CoreTracker] [SolrTrackingPool-alfresco20] .. checking for path change
 INFO  [solr.tracker.CoreTracker] [SolrTrackingPool-alfresco9] .. checking for path change
 INFO  [solr.tracker.CoreTracker] [SolrTrackingPool-alfresco2] .. checking for path change
 INFO  [solr.tracker.CoreTracker] [SolrTrackingPool-alfresco12] .. updating
 INFO  [solr.tracker.CoreTracker] [SolrTrackingPool-alfresco3] .. updating
 INFO  [solr.tracker.CoreTracker] [SolrTrackingPool-alfresco1] .. updating

- after a few minutes one thread starts writing "checking aux doc exists in
index before we update it" while the other threads keep working

 INFO  [solr.tracker.CoreTracker] [SolrTrackingPool-alfresco3] .. updating
 INFO  [solr.tracker.CoreTracker] [SolrTrackingPool-alfresco1] .. updating
 INFO  [solr.tracker.CoreTracker] [SolrTrackingPool-alfresco15] .. checking for path change
 INFO  [solr.tracker.CoreTracker] [SolrTrackingPool-alfresco4] .. checking aux doc exists in index before we update it
 INFO  [solr.tracker.CoreTracker] [SolrTrackingPool-alfresco21] .. updating
 INFO  [solr.tracker.CoreTracker] [SolrTrackingPool-alfresco24] .. updating
 INFO  [solr.tracker.CoreTracker] [SolrTrackingPool-alfresco4] .. checking aux doc exists in index before we update it
 INFO  [solr.tracker.CoreTracker] [SolrTrackingPool-alfresco14] .. updating
 INFO  [solr.tracker.CoreTracker] [SolrTrackingPool-alfresco9] .. updating

- after a few minutes no thread is working and in the log I can find
only this row:

 INFO  [solr.tracker.CoreTracker] [SolrTrackingPool-alfresco4] .. checking aux doc exists in index before we update it
 INFO  [solr.tracker.CoreTracker] [SolrTrackingPool-alfresco4] .. checking aux doc exists in index before we update it
 INFO  [solr.tracker.CoreTracker] [SolrTrackingPool-alfresco4] .. checking aux doc exists in index before we update it
 INFO  [solr.tracker.CoreTracker] [SolrTrackingPool-alfresco4] .. checking aux doc exists in index before we update it
 INFO  [solr.tracker.CoreTracker] [SolrTrackingPool-alfresco4] .. checking aux doc exists in index before we update it
 INFO  [solr.tracker.CoreTracker] [SolrTrackingPool-alfresco4] .. checking aux doc exists in index before we update it
 INFO  [solr.tracker.CoreTracker] [SolrTrackingPool-alfresco4] .. checking aux doc exists in index before we update it
 INFO  [solr.tracker.CoreTracker] [SolrTrackingPool-alfresco4] .. checking aux doc exists in index before we update it
 INFO  [solr.tracker.CoreTracker] [SolrTrackingPool-alfresco4] .. checking aux doc exists in index before we update it
 INFO  [solr.tracker.CoreTracker] [SolrTrackingPool-alfresco4] .. checking aux doc exists in index before we update it

After this, no index was created.

I then found the file "lucene-85839584395834953-write.lock" in the index
directory; its creation date matches the first log row with the message
"checking aux doc exists in index before we update it".

Does anyone have an idea what I can do to solve this issue?

Thanks in advance.


INVALID in email address

2016-10-13 Thread hairymcclarey
Anyone know why this appears after my email address when I reply to a thread in 
the user group?

Re: questions about shard key

2016-10-13 Thread hairymcclarey
See here:
https://lucidworks.com/blog/2013/06/13/solr-cloud-document-routing/

The default is to take 16 bits from the prefix and 16 from the ID.
Not sure about the second part of your question, maybe someone else can answer 
that. 

On Wednesday, October 12, 2016 9:26 PM, "Huang, Daniel" 
 wrote:
 

 Hi,

I was reading about document routing with compositeId 
(https://cwiki.apache.org/confluence/display/solr/Shards+and+Indexing+Data+in+SolrCloud).
 The document says that I can prefix a shard key to a document ID, like 
“IBM!12345”. It further mentions that I can specify the number of bits for 
the shard key, like “IBM/3!12345”. My question is: what is the default number of 
bits for the shard key when it is not specified? That is, what is n such that 
“IBM/n!12345” is equivalent to “IBM!12345”?

Another question is regarding a 2-level routing prefix. Does it support a bit 
count as well? For example, does something like “USA/2!IBM/3!12345” work?

Thanks
Daniel

The information contained in this e-mail, and any attachment, is confidential 
and is intended solely for the use of the intended recipient. Access, copying 
or re-use of the e-mail or any attachment, or any information contained 
therein, by any other person is not authorized. If you are not the intended 
recipient please return the e-mail to the sender and delete it from your 
computer. Although we attempt to sweep e-mail and attachments for viruses, we 
do not guarantee that either are virus-free and accept no liability for any 
damage sustained as a result of viruses. 

Please refer to http://disclaimer.bnymellon.com/eu.htm for certain disclosures 
relating to European legal entities.
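

The 16/16 split described in the reply above can be sketched as follows. Note
this is only an illustration of the bit arithmetic: Solr's CompositeIdRouter
actually uses MurmurHash3, so the crc32 stand-in here does not reproduce real
Solr routing values, and the helper names are invented for the example.

```python
# Illustration of compositeId-style routing: the route hash takes the
# top bits from the shard key's hash and the remaining bits from the
# document ID's hash. zlib.crc32 is a stand-in for Solr's MurmurHash3,
# used only to show the bit composition.
import zlib

def _h(s):
    # 32-bit stand-in hash (Solr really uses MurmurHash3)
    return zlib.crc32(s.encode()) & 0xFFFFFFFF

def route_hash(doc_id):
    """Compute a 32-bit routing hash for IDs like 'IBM!12345' or 'IBM/3!12345'."""
    if "!" not in doc_id:
        return _h(doc_id)
    prefix, rest = doc_id.split("!", 1)
    bits = 16                        # default split: 16 prefix bits, 16 id bits
    if "/" in prefix:
        prefix, n = prefix.rsplit("/", 1)
        bits = int(n)
    mask = (1 << (32 - bits)) - 1
    high = _h(prefix) & ~mask        # top `bits` bits come from the shard key
    low = _h(rest) & mask            # the rest come from the document id
    return high | low
```

Because documents sharing a prefix share the same top bits of the hash, they
land in the same slice of the hash ring and therefore on the same shard; a
larger bit count for the prefix spreads that prefix's documents across fewer
possible slices.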