ping query frequency

2013-03-03 Thread adm1n
Hi,


I'm wondering how frequently this query should be made. Currently it is done
before each select request (some very old legacy). I googled a little and
found out that it is bad practice and has a performance impact. So the
question is: should I completely remove it, or just do it periodically?

What is the best practice?


thanks





Re: create cores dynamically

2013-03-03 Thread Jilal Oussama
For me, it always creates the data dir next to the conf dir I specified
(this should depend on your core configuration) and loads the core into Solr
(and I think this is what it is supposed to do).
On Mar 3, 2013 3:18 AM, adeelmahmood adeelmahm...@gmail.com wrote:

 I am not sure I understand how the create-cores-dynamically
 functionality is supposed to work. From what I have sort of figured out,
 I need to specify the instanceDir as the path to a directory which
 contains the conf files. So I have a directory as a template for the
 configuration files, but when I use this path, Solr adds the data directory
 next to this template conf directory, which defeats the purpose. I was hoping
 that it would copy the template files into a new directory created for the
 core. Is that not how it's supposed to work?

 Any help is appreciated.

 Thanks
 Adeel






Re: Returning to Solr 4.0 from 4.1

2013-03-03 Thread Dotan Cohen
On Sat, Mar 2, 2013 at 9:32 PM, Upayavira u...@odoko.co.uk wrote:
 What I'm questioning is whether the issue you see in 4.1 has been
 resolved in Subversion. While I would not expect 4.0 to read a 4.1
 index, the SVN branch/4.2 should be able to do so effortlessly.

 Upayavira


I see, thanks. Actually, running a clean 4.1 with no previous index
does not have the issues.

-- 
Dotan Cohen

http://gibberish.co.il
http://what-is-what.com


advice on a backup service of solr collection

2013-03-03 Thread adfel70
Hi,
I want to write a component that will protect a Solr cluster from indexing
damage.
The component will take down one replica from each shard (the backup replicas)
before indexing.
Then the indexing will start. When the indexing is finished, some tests will
be performed, and only if they pass will the backup replicas be brought back up.
If some test fails, I will have to check what went wrong in the indexing
process, and in any case I'll have the backup replicas with which I can
bring the cluster back without losing any data.


Any insights on this component?
Any advice on the issue?

Can I randomly take one of the replicas in each shard?

Thanks.





Re: can't execute query in solr 4.1 admin page

2013-03-03 Thread Shawn Heisey

On 3/3/2013 12:15 AM, adfel70 wrote:

I used to execute queries in the solr admin page for testing purposes.
I updated to solr 4.1 and it seems this feature is not working.
I can see that the query is always sent without any query parameter.


This sounds like this bug:

https://issues.apache.org/jira/browse/SOLR-4393

It looks like the patch for another bug fixes it:

https://issues.apache.org/jira/browse/SOLR-4349

I use 4.2-SNAPSHOT and have not seen this problem on the latest Firefox. 
 I was tempted by new features and never did use the released version.


Thanks,
Shawn



Re: can't execute query in solr 4.1 admin page

2013-03-03 Thread Stefan Matheis
Hey

Already documented and fixed for 4.2:
https://issues.apache.org/jira/browse/SOLR-4349

Stefan 


On Sunday, March 3, 2013 at 8:15 AM, adfel70 wrote:

 Hi
 I used to execute queries in the solr admin page for testing purposes.
 I updated to solr 4.1 and it seems this feature is not working.
 I can see that the query is always sent without any query parameter.
 
 Is this a known bug?
 
 thanks.
 
 
 





Re: ping query frequency

2013-03-03 Thread Shawn Heisey

On 3/3/2013 2:15 AM, adm1n wrote:

I'm wondering how frequently this query should be made. Currently it is done
before each select request (some very old legacy). I googled a little and
found out that it is bad practice and has a performance impact. So the
question is: should I completely remove it, or just do it periodically?


Can you point me at the place where it says that it's bad practice to do 
frequent pings?  I use the ping functionality in my haproxy load 
balancer that sits in front of Solr.  It executes a ping request against 
all my Solr instances every five seconds.  Most of the time, the ping 
request (which is distributed) finishes in single-digit milliseconds. If 
that is considered bad practice, I want to figure out why and submit 
issues to get the problem fixed.


I can imagine that sending a ping before every query would be a bad 
idea, but I am hoping that the way I'm using it is OK.


The only problem with ping requests that I have ever noticed was caused 
by long garbage collection pauses on my 8GB Solr heap.  Those pauses 
caused the load balancer to incorrectly mark the active Solr instance(s) 
as down and send requests to a backup.


Through experimentation with -XX memory tuning options, I have now 
eliminated the GC pause problem.  For machines running Solr 
4.2-SNAPSHOT, I have reduced the heap to 6GB; the 3.5.0 machines are 
still running with 8GB.


Thanks,
Shawn



Re: create cores dynamically

2013-03-03 Thread Shawn Heisey

On 3/2/2013 8:18 PM, adeelmahmood wrote:

I am not sure I understand how the create-cores-dynamically
functionality is supposed to work. From what I have sort of figured out,
I need to specify the instanceDir as the path to a directory which
contains the conf files. So I have a directory as a template for the
configuration files, but when I use this path, Solr adds the data directory
next to this template conf directory, which defeats the purpose. I was hoping
that it would copy the template files into a new directory created for the
core. Is that not how it's supposed to work?


In solr.xml, instanceDir is relative to solr.solr.home, which defaults 
to "solr", relative to the current working directory.  The solr.solr.home 
directory is where solr.xml lives.  Inside instanceDir, Solr looks in 
"conf" to find the config.  The value for dataDir defaults to "data", 
relative to instanceDir.  I don't think instanceDir has a default value, 
but I could be wrong about that.
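
For orientation, a bare-bones legacy-style solr.xml looks roughly like this 
(a hedged sketch with placeholder names, not taken from any of my installs):

<solr persistent="true">
  <cores adminPath="/admin/cores">
    <core name="core0" instanceDir="core0"/>
  </cores>
</solr>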


Here's what I use, with solr.solr.home set to /index/solr4:

<core loadOnStartup="true" instanceDir="cores/ncmain/" transient="false"
      name="ncmain" dataDir="../../data/ncmain/"/>


This means that all config directories are under /index/solr4/cores and 
all index directories are under /index/solr4/data ... I can easily 
delete all the data without touching the configs.
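
On disk that works out to roughly this layout (illustrative, only one core 
shown):

/index/solr4/
  solr.xml
  cores/ncmain/conf/
  data/ncmain/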


For another installation where I'm using SolrCloud, it is set up almost 
the same, but it's still separated because the active configs are 
in zookeeper, with a backup copy on the disk.


Thanks,
Shawn



Re: create cores dynamically

2013-03-03 Thread adeelmahmood
Well, this is all useful information, but I am not sure it really answers my
question. Let me rephrase what exactly I am trying to do. Let's say I start
with core0, so this is what the directory structure looks like:

solr.home
- solr.xml
- core0
  - conf
  - data

Now when I dynamically add core1, I want to end up with a structure
like this:

solr.home
- solr.xml
- core0
  - conf
  - data
- core1
  - conf
  - data

Is this possible with dynamic core creation: to have a separate directory,
with conf and data directories inside it, for each core separately?

Thanks for the help





Re: create cores dynamically

2013-03-03 Thread Shawn Heisey

On 3/3/2013 12:08 PM, adeelmahmood wrote:

Well, this is all useful information, but I am not sure it really answers my
question. Let me rephrase what exactly I am trying to do. Let's say I start
with core0, so this is what the directory structure looks like:

solr.home
- solr.xml
- core0
   - conf
   - data

Now when I dynamically add core1, I want to end up with a structure
like this:

solr.home
- solr.xml
- core0
   - conf
   - data
- core1
   - conf
   - data

Is this possible with dynamic core creation: to have a separate directory,
with conf and data directories inside it, for each core separately?


Yes, you can do this.  You'll need to create the new core directory and 
its conf directory before creating the core within Solr; Solr doesn't do 
that part for you.  Solr will automatically create the data directory, 
though.
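
In other words, the rough sequence (core name, paths and port here are just 
placeholders) is: create solr.home/core1/conf, copy your template configs 
into it, and then call the CoreAdmin handler, something like:

http://localhost:8983/solr/admin/cores?action=CREATE&name=core1&instanceDir=core1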


On my setup, I don't create cores dynamically.  I have a central config 
directory and I create symlinks within the individual core config 
directories back to the central one.


Thanks,
Shawn



Making tika process mail attachments eludes me

2013-03-03 Thread Leif Hetlesæther

I have been trying for a while to create an index of a mailbox.
I have downloaded solr-4.1.0.tgz and configured 
example/example-DIH/solr/mail/conf/data-config.xml, and emails are 
indexed, but the attachments elude me. The config says: "Note - In order 
to index attachments, set processAttachement=true and drop Tika and 
its dependencies to example-DIH/solr/mail/lib directory".
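
For reference, the mail entity in that data-config.xml looks roughly like 
this (a sketch with placeholder connection values and only the attributes 
relevant here; the odd processAttachement spelling is copied from that note):

<dataConfig>
  <document>
    <entity processor="MailEntityProcessor"
            user="someone@example.com"
            password="..."
            host="imap.example.com"
            protocol="imaps"
            processAttachement="true"/>
  </document>
</dataConfig>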


I have tried dropping the files from contrib/extract/lib, but no luck. My 
friend Google seems to be unable to help me.


Do I need to modify schema.xml or solrconfig.xml? I cannot see any trace 
of Tika, or any errors, in my logfile.


Is there a working example for indexing mails and attachments available 
somewhere to download?


--
Regards
Leif Hetlesæther



Re: ping query frequency

2013-03-03 Thread Amit Nithian
We too run a ping every 5 seconds, and I think the concurrent mark/sweep
collector helps keep the LB from taking a box out of rotation due to long
pauses. Either that, or I don't see large enough pauses for my LB to take it
out (it'd have to fail 3 times in a row, or 15 seconds total, before it's gone).

The ping query does execute an actual query, so of course you want to make
it as simple as possible (i.e. q=primary_key:value) so that there's
little to no scanning of the index. I think our query does an id:0, which
would always return 0 docs, but any stupid-simple query is fine so long
as it hits the caches on subsequent hits. The goal, to me at least, is not
that the ping query yields actual docs, but that it's a mechanism to take
a Solr server out of rotation without having to log in to an ops-controlled
device directly.
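
For reference, the shape of this in solrconfig.xml is roughly the following 
(a sketch, not our exact config; the query value is whatever cheap query you 
settle on):

<requestHandler name="/admin/ping" class="solr.PingRequestHandler">
  <lst name="invariants">
    <!-- stupid-simple query: matches nothing but still exercises the core -->
    <str name="q">id:0</str>
  </lst>
</requestHandler>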

I'd definitely remove the ping per request (wouldn't the fact that you are
doing /select serve as the ping, and hence defeat the purpose of the ping
query?), and definitely do the frequent ping as we are describing if you want
to have your Solr boxes behind some load balancer.


On Sun, Mar 3, 2013 at 8:21 AM, Shawn Heisey s...@elyograg.org wrote:

 On 3/3/2013 2:15 AM, adm1n wrote:

 I'm wondering how frequently this query should be made. Currently it is
 done before each select request (some very old legacy). I googled a little
 and found out that it is bad practice and has a performance impact. So the
 question is: should I completely remove it, or just do it periodically?


 Can you point me at the place where it says that it's bad practice to do
 frequent pings?  I use the ping functionality in my haproxy load balancer
 that sits in front of Solr.  It executes a ping request against all my Solr
 instances every five seconds.  Most of the time, the ping request (which is
 distributed) finishes in single-digit milliseconds. If that is considered
 bad practice, I want to figure out why and submit issues to get the problem
 fixed.

 I can imagine that sending a ping before every query would be a bad idea,
 but I am hoping that the way I'm using it is OK.

 The only problem with ping requests that I have ever noticed was caused by
 long garbage collection pauses on my 8GB Solr heap.  Those pauses caused
 the load balancer to incorrectly mark the active Solr instance(s) as down
 and send requests to a backup.

 Through experimentation with -XX memory tuning options, I have now
 eliminated the GC pause problem.  For machines running Solr 4.2-SNAPSHOT, I
 have reduced the heap to 6GB; the 3.5.0 machines are still running with 8GB.

 Thanks,
 Shawn




Re: From a high level query call, tell Solr / Lucene to automatically apply a leaf operator?

2013-03-03 Thread Mark Bennett
Hi Mikhail,

Thanks for the links, looks like interesting stuff.

Sadly this project is stuck in 3.x for some very thorny reasons...

Googling around, looks like this might be strictly 4.x...

On Mon, Feb 25, 2013 at 12:21 PM, Mikhail Khludnev 
mkhlud...@griddynamics.com wrote:

 Mark,

 AFAIK

 http://lucene.apache.org/core/4_0_0-ALPHA/queryparser/org/apache/lucene/queryparser/flexible/core/package-summary.htmlis
 a convenient framework for such juggling.
 Please also be aware of the good starting point

 http://lucene.apache.org/core/4_0_0-ALPHA/queryparser/org/apache/lucene/queryparser/flexible/standard/package-summary.html



 On Sun, Feb 24, 2013 at 11:33 AM, Mark Bennett mbenn...@ideaeng.com
 wrote:

  Scenario:
 
  You're submitting a block of text as a query.
 
  You're content to let Solr / Lucene handle query parsing and tokenization,
  etc.
 
  But you'd like ALL of the eventually produced leaf nodes in the parse tree
  to have:
  * Boolean MUST (effectively a + prefix)
  * Fuzzy match of ~1 or ~2
 
  In a simple application, and if there were no punctuation, you could
  preprocess the query, effectively:
  * split on whitespace
  * for t in tokens: t = "+" + t + "~2"
 
  But this is ugly, and even then I think things like stop words would be
  messed up:
  * OK in Solr:   "the chair"   (it can properly remove "the")
  * But if this:  "+the~2 +chair~2"   (I'm not sure this would work)
 
  Sure, at the application level you could also remove the stop words in the
  "for t in tokens" loop, but then some other weird case would come up.
  Maybe one of the field's analyzers has some other token filter you forgot
  about, so you'd have to bring that logic forward as well.
 
  (Long story of why I'd want to do all this... and I know people think
  adding ~2 to all tokens will give bad results anyway, trying to fix
  inherited code that can't be scrapped, etc)
 
  --
  Mark Bennett / New Idea Engineering, Inc. / mbenn...@ideaeng.com
  Direct: 408-733-0387 / Main: 866-IDEA-ENG / Cell: 408-829-6513
 



 --
 Sincerely yours
 Mikhail Khludnev
 Principal Engineer,
 Grid Dynamics

 http://www.griddynamics.com
  mkhlud...@griddynamics.com



Re: What makes an Analyzer/Tokenizer/CharFilter/etc suitable for Solr?

2013-03-03 Thread Alexandre Rafalovitch
Thanks Jack.

On Thu, Feb 28, 2013 at 11:04 PM, Jack Krupansky j...@basetechnology.com wrote:

 The package Javadoc for Solr analysis is a good start:

 http://lucene.apache.org/solr/4_1_0/solr-core/org/apache/solr/analysis/package-tree.html


Actually, this is representative of why I am writing my own utility. That
package tree does not actually make it easy to see all the derivative
classes, as they are hiding behind multiple levels of abstraction. I am
not saying it is terribly hard. Still, for a non-Java programmer who is
just stepping beyond using Solr as a black box and trying to understand what
can be plugged in, in various configurations, to improve their results, it is
non-trivial the first couple of times. Especially since it is not just the
class name that is important but also which jar needs to be added to the
library statement.
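
By "library statement" I mean the lib directive in solrconfig.xml, for 
example (the path here is just an illustration):

<lib dir="../../contrib/analysis-extras/lib" regex=".*\.jar" />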

My (preliminary) output for the CharFilters looks like this:
 -CharFilterFactory
(example/solr-webapp/webapp/WEB-INF/lib/lucene-analyzers-common-4.1.0.jar/org.apache.lucene.analysis.util)
 HTMLStripCharFilterFactory
(example/solr-webapp/webapp/WEB-INF/lib/lucene-analyzers-common-4.1.0.jar/org.apache.lucene.analysis.charfilter)
 MappingCharFilterFactory
(example/solr-webapp/webapp/WEB-INF/lib/lucene-analyzers-common-4.1.0.jar/org.apache.lucene.analysis.charfilter)
 PersianCharFilterFactory
(example/solr-webapp/webapp/WEB-INF/lib/lucene-analyzers-common-4.1.0.jar/org.apache.lucene.analysis.fa)
 JapaneseIterationMarkCharFilterFactory
(example/solr-webapp/webapp/WEB-INF/lib/lucene-analyzers-kuromoji-4.1.0.jar/org.apache.lucene.analysis.ja)
 PatternReplaceCharFilterFactory
(example/solr-webapp/webapp/WEB-INF/lib/lucene-analyzers-common-4.1.0.jar/org.apache.lucene.analysis.pattern)
 LegacyHTMLStripCharFilterFactory
(dist/solr-core-4.1.0.jar/org.apache.solr.analysis)
 MockCharFilterFactory
(dist/solr-test-framework-4.1.0.jar/org.apache.solr.analysis)

And (part of) URP tree:
 -UpdateRequestProcessorFactory
(dist/solr-core-4.1.0.jar/org.apache.solr.update.processor)
 UIMAUpdateRequestProcessorFactory
(dist/solr-uima-4.1.0.jar/org.apache.solr.uima.processor)
 -AbstractDefaultValueUpdateProcessorFactory
(dist/solr-core-4.1.0.jar/org.apache.solr.update.processor)
 DefaultValueUpdateProcessorFactory
(dist/solr-core-4.1.0.jar/org.apache.solr.update.processor)
 TimestampUpdateProcessorFactory
(dist/solr-core-4.1.0.jar/org.apache.solr.update.processor)
 UUIDUpdateProcessorFactory
(dist/solr-core-4.1.0.jar/org.apache.solr.update.processor)
 CloneFieldUpdateProcessorFactory
(dist/solr-core-4.1.0.jar/org.apache.solr.update.processor)
 DistributedUpdateProcessorFactory
(dist/solr-core-4.1.0.jar/org.apache.solr.update.processor)
 -FieldMutatingUpdateProcessorFactory
(dist/solr-core-4.1.0.jar/org.apache.solr.update.processor)
 ConcatFieldUpdateProcessorFactory
(dist/solr-core-4.1.0.jar/org.apache.solr.update.processor)
 CountFieldValuesUpdateProcessorFactory
(dist/solr-core-4.1.0.jar/org.apache.solr.update.processor)
 FieldLengthUpdateProcessorFactory
(dist/solr-core-4.1.0.jar/org.apache.solr.update.processor)
 -FieldValueSubsetUpdateProcessorFactory
(dist/solr-core-4.1.0.jar/org.apache.solr.update.processor)
 FirstFieldValueUpdateProcessorFactory
(dist/solr-core-4.1.0.jar/org.apache.solr.update.processor)
 LastFieldValueUpdateProcessorFactory
(dist/solr-core-4.1.0.jar/org.apache.solr.update.processor)


A "-" at the start marks an abstract class; I also use "*" (not shown here)
for classes without an empty constructor (hence my original question).



 Especially the AbstractAnalysisFactory:

 http://lucene.apache.org/core/4_1_0/analyzers-common/org/apache/lucene/analysis/util/AbstractAnalysisFactory.html

This is useful and confirms my 'empty-constructor' assumption.


 Also, look at the various factories in solrconfig.xml for other Solr
 extension points. Including search components, spellcheckers, etc.

Will do. I was just wondering if there was a semi-comprehensive list. But I
can build it iteratively.

 Regards,
   Alex.


 -- Jack Krupansky

 -Original Message- From: Alexandre Rafalovitch
 Sent: Thursday, February 28, 2013 10:32 PM
 To: solr-user@lucene.apache.org
 Subject: What makes an Analyzer/Tokenizer/CharFilter/etc suitable for
 Solr?


 Hello,

 I want to have a unified reference of all different processors one could
 use in Solr in various extension points.

 I have written a small tool to extract all implementations
 of UpdateRequestProcessorFactory, Analyzer, CharFilterFactory, etc
 (actually of any root class).

 However, I assume not all Lucene Analyzer derivatives can be just plugged
 into Solr.

 Is it fair to say that the class must:
 *) 

Re: Backtick character in field values, and results

2013-03-03 Thread Neelesh
Thank you! This saved me a lot of trouble!


On Thu, Feb 28, 2013 at 4:39 AM, Erick Erickson erickerick...@gmail.com wrote:

 ICUFoldingFilterFactory is folding the backtick (grave accent).

 See admin/analysis page, it's a lifesaver in these situations!

 Best
 Erick


 On Fri, Feb 22, 2013 at 3:46 PM, Neelesh neele...@gmail.com wrote:

  With a text_unbroken field:
  <fieldType name="text_unbroken" class="solr.TextField" omitNorms="true"
             omitTermFreqAndPositions="true">
    <analyzer>
      <tokenizer class="solr.KeywordTokenizerFactory"/>
      <filter class="solr.ICUFoldingFilterFactory"/>
    </analyzer>
  </fieldType>
  A query like
  field:Hello` matches both "Hello" and "Hello`". This does not happen with
  something like +. That is,
  field:Hello+ does not match "Hello", but only matches "Hello+".
  Is there something special about backticks? Are there more such really
  special characters?
 
  Thanks!
  -neelesh
 



Re: Backtick character in field values, and results

2013-03-03 Thread Shawn Heisey

On 2/28/2013 5:39 AM, Erick Erickson wrote:

ICUFoldingFilterFactory is folding the backtick (grave accent).

See admin/analysis page, it's a lifesaver in these situations!


Is this the way it's supposed to behave?  From what I could tell in my 
look at the analysis page, it is folding the backtick into nothing.  It 
happens on both 3.5.0 and 4.2-SNAPSHOT.


The filter doesn't seem to have this behavior with any of the regular 
punctuation that I have tested.  I use the ICUFoldingFilterFactory in my 
schema for most of my fields.


Thanks,
Shawn



atomic updates fail with SolrCloud, and real time get throwing NPE

2013-03-03 Thread mike st. john
Atomic updates are failing in SolrCloud unless the update is sent to 
the shard where the doc resides.  Real-time get is throwing an NPE when run 
without distrib=false.


tried with 4.1 and 4.2 snapshot.


Any ideas?


Thanks.

msj