In-memory collections?
Hi, Is there a way I can configure Solr so that it handles its shards completely in memory? If yes, how? No writing to disk - neither transaction log nor Lucene indices. Of course I accept that data is lost if Solr crashes or is shut down. Regards, Per Steffensen
Re: java.lang.OutOfMemoryError: Requested array size exceeds VM limit
On Thu, 2013-08-01 at 15:24 +0200, Grzegorz Sobczyk wrote:
> Today I found in solr logs exception: java.lang.OutOfMemoryError: Requested array size exceeds VM limit. At that time memory usage was ~200MB / Xmx3g [...]
> Caused by: java.lang.OutOfMemoryError: Requested array size exceeds VM limit
>   at org.apache.lucene.util.PriorityQueue.<init>(PriorityQueue.java:64)
>   at org.apache.lucene.util.PriorityQueue.<init>(PriorityQueue.java:37)
>   at org.apache.solr.handler.component.ShardFieldSortedHitQueue.<init>(ShardDoc.java:113)

Are you requesting a very large number of results? Integer.MAX_VALUE perhaps? If so, you need to change that to a more manageable number.

- Toke Eskildsen, State and University Library, Denmark
Re: In-memory collections?
On 8/7/2013 12:13 AM, Per Steffensen wrote:
> Is there a way I can configure Solr so that it handles its shards completely in memory? If yes, how? No writing to disk - neither transaction log nor Lucene indices. Of course I accept that data is lost if Solr crashes or is shut down.

The Lucene index part can be done using RAMDirectoryFactory. It's generally not a good idea, though. If you have enough RAM for that, then you have enough RAM to fit your entire index into the OS disk cache. I don't think you can do anything about the transaction log being on disk, but I could be incorrect about that.

Relying on the OS disk cache and the default directory implementation will usually give you equivalent or better query performance compared to putting your index into JVM memory. You won't need a massive Java heap and the garbage collection problems that it creates. A side bonus: you don't lose your index when Solr shuts down.

If you have extremely heavy indexing, then RAMDirectoryFactory might work better -- assuming you've got your GC heavily tuned. A potentially critical problem with RAMDirectoryFactory is that merging/optimizing will require at least twice as much RAM as your total index size. Here's a complete discussion about this: http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html

NB: That article was written for 3.x, when NRTCachingDirectoryFactory (the default in 4.x) wasn't available. The NRT factory *uses* MMapDirectory.

Thanks, Shawn
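For readers who want to try it anyway, the RAM-only index is a one-line change in solrconfig.xml. A minimal sketch against a 4.x-era config; all of Shawn's caveats apply:

```xml
<!-- solrconfig.xml: hold the Lucene index entirely in JVM heap.
     The index is lost on shutdown, merges need roughly 2x the index
     size in heap, and MMap/NRTCaching is usually faster anyway. -->
<directoryFactory name="DirectoryFactory"
                  class="solr.RAMDirectoryFactory"/>
```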
RE: entity classification solr
Yes, you can copyField the source's contents to another field and use the KeepWordTokenFilter to keep only those words you really care about. Using (e)dismax you can then apply a heavy boost on that field. All special words in that field will show up higher if queried for.

-Original message- From: smanad sma...@gmail.com Sent: Wednesday 7th August 2013 3:23 To: solr-user@lucene.apache.org Subject: entity classification solr

I have the following situation when using Solr 4.3. My document contains entities, for example "peanut butter". I have a list of such entities. These are items that go together and are not to be treated as two individual words. During indexing, I want Solr to realize this and treat "peanut butter" as an entity. For example, if someone searches for "peanut" then documents that have the word "peanut" should rank higher than documents that have "peanut butter". However, if someone searches for "peanut butter" then documents that have "peanut butter" should show up higher than ones that have just "peanut". Is there a config setting somewhere such that the entity list can be specified in a file and Solr would do the needful? Should I be using KeepWordFilterFactory for this? Any pointers will be much appreciated. Thanks, -Manasi

-- View this message in context: http://lucene.472066.n3.nabble.com/entity-classification-solr-tp4082923.html Sent from the Solr - User mailing list archive at Nabble.com.
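One way to wire up the copyField + KeepWordFilterFactory suggestion. Field names and the entities file are illustrative, and note that KeepWordFilterFactory operates on single tokens, so two-word entities like "peanut butter" would additionally need something like a ShingleFilterFactory in front of it:

```xml
<!-- schema.xml sketch: copy the body text into a keywords-only field,
     then boost that field at query time, e.g. defType=edismax with
     qf="content entities^10" -->
<fieldType name="text_entities" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.KeepWordFilterFactory"
            words="entities.txt" ignoreCase="true"/>
  </analyzer>
</fieldType>

<field name="entities" type="text_entities" indexed="true" stored="false"/>
<copyField source="content" dest="entities"/>
```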
RE: Large config files in SolrCloud
With SOLR-5115 there's support for forcing ZkResourceLoader to fall back to SolrResourceLoader by using a file:/// prefix in your schema. This forces Solr to load the files from the FS as usual. https://issues.apache.org/jira/browse/SOLR-5115

-Original message- From: Markus Jelsma markus.jel...@openindex.io Sent: Friday 2nd August 2013 15:27 To: solr-user@lucene.apache.org Subject: RE: Large config files in SolrCloud

Ok, I managed to load a config file from the node's sharedLib directory by pointing it to `../../lib/file`. This works fine in normal mode; in cloud mode all it does is attempt to find it in ZooKeeper by trying /configs/COLLECTION_NAME/path. Properties such as ${solr.home} are not recognized in the schema, it seems. Any idea on how to force Solr in cloud mode to grab the file from the local FS? Thanks, Markus

-Original message- From: Markus Jelsma markus.jel...@openindex.io Sent: Friday 2nd August 2013 14:34 To: solr-user@lucene.apache.org Subject: RE: Large config files in SolrCloud

Yes, all the usual config files are well under 1MB and work as expected. This file is under 2MB and the limit I set is 5MB. Setting jute.maxbuffer (all lowercase) did work during a test a long time ago, but we'd like to put the new features in production and we're stuck at this trivial issue :) Thanks

-Original message- From: Erick Erickson erickerick...@gmail.com Sent: Friday 2nd August 2013 14:28 To: solr-user@lucene.apache.org Subject: Re: Large config files in SolrCloud

Hmmm, does it work with smaller config files? There's been a limit of 1M for ZK files, and I'm wondering if your setup would work with, say, 2M configs as a check that it's something else rather than just the 1M limit. FWIW, Erick

On Fri, Aug 2, 2013 at 8:18 AM, Markus Jelsma markus.jel...@openindex.io wrote:

Hi, I have a few very large configuration files, but it doesn't work in cloud mode due to the KeeperException$ConnectionLossException. All 10 Solr nodes run trunk and have jute.maxbuffer set to 5242880 (5MB). I can confirm it is set properly by looking at the args in the Solr GUI. All ZooKeepers have exactly the same key/value set; I can confirm this by looking at the process with ps - it is really there, the first parameter. But it doesn't work! I have had it working once, but it doesn't like me anymore. Putting the config files in the node's sharedLib or the core's lib directory doesn't work either: the files are clearly loaded according to the logs, but the TokenFilters cannot access them and complain about a config file not being found. I'm out of ideas, any to share? Thanks, Markus
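For reference, jute.maxbuffer is a plain Java system property and, as the thread notes, has to carry the same value on every ZooKeeper server and every Solr JVM. A sketch of both invocations (paths and the zkHost value are placeholders):

```
# ZooKeeper side, e.g. via the JVMFLAGS environment variable read by zkServer.sh
export JVMFLAGS="-Djute.maxbuffer=5242880"

# Solr side, as a JVM argument (5242880 bytes = 5MB, matching the thread)
java -Djute.maxbuffer=5242880 -DzkHost=zk1:2181,zk2:2181,zk3:2181 -jar start.jar
```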
Re: In-memory collections?
On 8/7/13 9:04 AM, Shawn Heisey wrote:
> The lucene index part can be done using RAMDirectoryFactory. It's generally not a good idea, though. [... rest of Shawn's reply, quoted in full above ...]

Thanks, Shawn.

The thing is that this will be used for a small, ever-changing collection. In our system we load a lot of documents into a SolrCloud cluster. A lot of processes across numerous machines work in parallel on loading those documents. Those processes need to coordinate (hold each other back) from time to time, and they do so by taking distributed locks. Until now we have used the ZooKeeper cluster at hand for taking those distributed locks, but the need for locks is so heavy that it causes congestion in ZooKeeper, and ZooKeeper really cannot scale in that area. We could use several ZooKeeper clusters, but we have decided to use a "locking collection" in Solr instead - that will scale. You can implement locking in Solr using versioning and optimistic locking.

So this collection will at any time contain just the few locks (at most a few hundred) that are current right now. Lots of locks will be taken, but each of them will only exist for a few ms before being deleted again. Therefore it will not take up a lot of memory, I guess? I guess we will try RAMDirectoryFactory, and I will look into how we can avoid the Solr transaction log being written (to disk at least). Regards, Per Steffensen
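The versioning-based locking Per mentions maps onto Solr's optimistic concurrency: an add that carries _version_=-1 succeeds only if the document does not already exist, so the first writer wins and everyone else gets a version conflict and backs off. A minimal in-memory sketch of that pattern (the store below only simulates the semantics - no Solr involved, and all names are illustrative):

```python
# Simulation of Solr-style optimistic locking for a "locking collection".
# Solr semantics mirrored here: _version_ == -1 means "document must not exist";
# a positive _version_ means "must match the stored version exactly".

class VersionConflict(Exception):
    pass

class LockStore:
    def __init__(self):
        self._docs = {}          # lock_id -> (version, owner)
        self._next_version = 1   # monotonically increasing, like _version_

    def put(self, lock_id, owner, expected_version):
        current = self._docs.get(lock_id)
        if expected_version == -1 and current is not None:
            raise VersionConflict(lock_id)      # someone already holds it
        if expected_version > 0 and (current is None or current[0] != expected_version):
            raise VersionConflict(lock_id)      # stale version: lost the race
        self._next_version += 1
        self._docs[lock_id] = (self._next_version, owner)
        return self._next_version

    def delete(self, lock_id):
        self._docs.pop(lock_id, None)           # releasing the lock

def try_acquire(store, lock_id, owner):
    """Attempt to create the lock document; only the first caller succeeds."""
    try:
        store.put(lock_id, owner, expected_version=-1)
        return True
    except VersionConflict:
        return False

store = LockStore()
assert try_acquire(store, "lock-42", "worker-a")      # first taker wins
assert not try_acquire(store, "lock-42", "worker-b")  # second taker is refused
store.delete("lock-42")                               # release
assert try_acquire(store, "lock-42", "worker-b")      # free again
```

In real Solr the put is an add with the _version_ field set and the conflict shows up as an HTTP 409; the retry/back-off loop around try_acquire is up to the client.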
Solr 4.4. creating an index that 4.3 can't read (but in LUCENE_43 mode)
I had been running a Solr 4.3.0 index, which I upgraded to 4.4.0 (but hadn't changed LuceneVersion, so it was still using the LUCENE_43 codec). I then had to back out and return to a 4.3 system, and got an error when it tried to read the index. Now, it was only a dev system, so not a problem, and normally I would restore a backup anyway, but shouldn't this work? If I haven't changed the codec, then Solr 4.4 should be using the same code as 4.3, so the data should be compatible, no? I noticed it's in DocValues, but I thought they were supposed to be compatible using the default format, which we do use?

Caused by: org.apache.lucene.index.IndexFormatTooNewException: Format version is not supported (resource: NIOFSIndexInput(path=/bb/news/search/solr/main/data/index/_3bs_Lucene42_0.dvm)): 1 (needs to be between 0 and 0)
  at org.apache.lucene.codecs.CodecUtil.checkHeaderNoMagic(CodecUtil.java:148)
  at org.apache.lucene.codecs.CodecUtil.checkHeader(CodecUtil.java:130)
  at org.apache.lucene.codecs.lucene42.Lucene42DocValuesProducer.<init>(Lucene42DocValuesProducer.java:84)
  at org.apache.lucene.codecs.lucene42.Lucene42DocValuesFormat.fieldsProducer(Lucene42DocValuesFormat.java:133)
  at org.apache.lucene.codecs.perfield.PerFieldDocValuesFormat$FieldsReader.<init>(PerFieldDocValuesFormat.java:213)
  at org.apache.lucene.codecs.perfield.PerFieldDocValuesFormat.fieldsProducer(PerFieldDocValuesFormat.java:282)
  at org.apache.lucene.index.SegmentCoreReaders.<init>(SegmentCoreReaders.java:134)
  at org.apache.lucene.index.SegmentReader.<init>(SegmentReader.java:56)
  at org.apache.lucene.index.StandardDirectoryReader$1.doBody(StandardDirectoryReader.java:62)
  at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:783)
  at org.apache.lucene.index.StandardDirectoryReader.open(StandardDirectoryReader.java:52)
  at org.apache.lucene.index.DirectoryReader.open(DirectoryReader.java:88)
  at org.apache.solr.core.StandardIndexReaderFactory.newReader(StandardIndexReaderFactory.java:34)
  at org.apache.solr.search.SolrIndexSearcher.getReader(SolrIndexSearcher.java:169)
  ... 18 more

Cheers, Daniel
Re: Solr Split Shard - Document loss and down time
Hi Erick, I have a question. Suppose an error occurs during a shard split - is there any way to revert the split? This is seriously breaking my head. For me, documents are getting lost when any of the nodes for that shard dies while the shard split is in progress. Thanks, Ranjith -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-Split-Shard-Document-loss-and-down-time-tp4082002p4082973.html Sent from the Solr - User mailing list archive at Nabble.com.
Data Import from MYSQL and POSTGRESQL
For the data import handler I have moved the mysql and postgresql jar files to the solr lib directory (/opt/solr/lib). My issue is in the data-config.xml: I have put two datasources, but I am stuck on what to put for the driver values and the urls.

<dataSource name="mysql" driver="?" url="url" user="user" password="pass"/>
<dataSource name="postgresql" driver="?" url="url" user="user" password="pass"/>

Is anyone able to tell me what I should be putting for these values, please? -- View this message in context: http://lucene.472066.n3.nabble.com/Data-Import-from-MYSQL-and-POSTGRESQL-tp4082974.html Sent from the Solr - User mailing list archive at Nabble.com.
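For what it's worth, the standard JDBC driver classes of that era are com.mysql.jdbc.Driver (MySQL Connector/J) and org.postgresql.Driver (the PostgreSQL JDBC driver); the jars in /opt/solr/lib must provide those classes. A data-config.xml sketch with placeholder host, database, and credentials:

```xml
<dataConfig>
  <dataSource name="mysql"
              driver="com.mysql.jdbc.Driver"
              url="jdbc:mysql://localhost:3306/mydb"
              user="user" password="pass"/>
  <dataSource name="postgresql"
              driver="org.postgresql.Driver"
              url="jdbc:postgresql://localhost:5432/mydb"
              user="user" password="pass"/>
  <!-- each <entity> then selects its source via dataSource="mysql" etc. -->
</dataConfig>
```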
Re: Solr Split Shard - Document loss and down time
Hi Ranjith, Here are a few things to note about shard split:

1. The command auto-retries. Also, if something went wrong during a split, you should wait for it to complete.
2. In case of a failure, the parent shard is supposed to stay intact, and the new sub-shards wouldn't replace it.
3. If you tried using 4.3.*, commit isn't called, so the documents wouldn't be visible on the sub-shards unless you issue an explicit commit.

Having said that, I'd highly recommend not using 4.3 for shard splitting. Can you explain further what you mean by documents getting lost? AFAIR, the code is supposed to handle failure midway through the shard split call, including a dead leader/overseer.

On Wed, Aug 7, 2013 at 3:07 PM, Ranjith Venkatesan ranjit...@zohocorp.com wrote: [original question quoted in full above]

-- Anshum Gupta http://www.anshumgupta.net
Data Import Handler Help
Hi, I'm looking for a bit of guidance in implementing a data import handler for MongoDB. I am using https://github.com/sucode/solrMongoDBImporter/blob/master/README.md as a starting point, and I can get full imports working properly with a few adjustments to the source. The problem comes when I try delta imports. After adding code to support delta queries and looking at how the SQL import handler works, I get delta reads, but the counts grow out of control. It's as if DocBuilder does not know when to stop processing. Example: I have one doc to be read but I get 2 docs added/updated. Has anyone seen this before? Using 4.2.0. Thanks
Solr doesn't make indexes for all the entries
Hello, I am a newbie to Solr. I have installed and configured it with my Django project. I am using the following versions: django-haystack 2.0.0, ApacheSolr 3.5.0, Django 1.4, mysql 5.5.32-0.

Here is the model whose data I want to index: http://tny.cz/422c5fb7
Here is search_indexes.py: http://tny.cz/8de95043

I have created the file templates/search/indexes/myapp/userprofile_text.txt and a template to show the results after querying the database. I have built the schema using the command $ ./manage.py build_solr_schema and replaced the contents of example/solr/conf/schema.xml with the output. Here it is: http://tny.cz/49fe8e1d

When I use the command $ ./manage.py rebuild_index to create indexes, it shows:

WARNING: This will irreparably remove EVERYTHING from your search index in connection 'default'. Your choices after this are to restore from backups or rebuild via the `rebuild_index` command. Are you sure you wish to continue? [y/N] y
Removing all documents from your index because you said so.
All documents removed.
Indexing 18 user profiles

But indexes are shown only for 10 user profiles. I am seeing the indexes here: http://localhost:8983/solr/select/?q=*%3A*&version=2.2&start=0&rows=10&indent=on Only those 10 user profiles are found when searched for. Why does this happen, and how do I solve this issue? -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-doesn-t-make-indexes-for-all-the-enteries-tp4082977.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: poor facet search performance
On Tue, 2013-07-30 at 21:48 +0200, Robert Stewart wrote:
> [Custom facet structure] Then we sorted those sets of facet fields by total document frequency so we enumerate the more frequent facet fields first, and we stop looking when we find a facet field which has less total document matches than the top N facet counts we are looking for.

So the structure was Facet->docIDs? A bit like enum in Solr? Your top-N cut-off is an interesting optimization for that.

> [...] The slaves just pre-load that binary structure directly into ram in one shot in the background when opening a new snapshot for search.

We used a similar pre-calculation some years ago but abandoned it, as the cost of pre-generate_structure + #duplicates * (distribute_structure + open_structure) was just as high as, and less flexible than, #duplicates * generate_structure for us.

> We have 200 million docs, 10 shards, about 20 facet fields, some of which contain about 20,000 unique values. We show top 10 facets for about 10 different fields in the results page. We provide search results with lots of facets and date counts in around 200-300ms using this technique. Currently, we are porting this entire system to SOLR. For a single core index of 8 million docs, using similar documents and facet fields from our production indexes, I can't get faceted search to perform anywhere close to 300ms for general searches. More like 1.5-3 seconds.

Solr fc faceting treats each facet independently and in a docID->facet manner, so what happens is

foreach facet {
  foreach docIDinResultSet {
    foreach tagIDinDocument {
      facet.counter[tagID]++
    }
  }
}

With 10 facets, 8M documents and 1 tag/doc/facet, the total loop count is 80M. That does not normally take 1.5-3 seconds, so something seems off. Do you have a lot of facet tags (aka terms) for each document?

> Is there anything else that I should look into for getting better facet performance?

Could you list the part of the Solr log with the facet structures? Just grep for UnInverted. They look something like this:

UnInverted multi-valued field {field=lma_long,memSize=42711405,tindexSize=42,time=979,phase1=964,nTerms=23,bigTerms=6,termInstances=1958468,uses=0}

> Given these metrics (200m docs, 20 facet fields, some fields with 20,000 unique values), what kind of facet search performance should I expect?

Due to the independent faceting handling in Solr, the facet time will scale a bit worse than linearly with the number of documents, relative to your test setup. With a loop count of 200M*10 (or 20? I am a bit confused about how many facets you show at a time) = 2G, this will take multiple seconds. Unless you go experimental (SOLR-2412, to bang my own drum), your facet count needs to go down or you need to shard with Solr.

> Also we need to issue frequent commits since we are constantly streaming new content into the system.

You could use a setup with a smaller live shard and multiple stale ones, but depending on corpus your ranking might suffer.

- Toke Eskildsen, State and University Library, Denmark
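The per-facet counting loop described above can be rendered literally; each facet is counted independently over the whole result set, which is where the 80M loop count comes from. A sketch with synthetic data (all names illustrative):

```python
# Literal rendering of the fc-faceting triple loop: for every facet,
# walk every doc in the result set and bump a counter per tag.
from collections import Counter

def facet_counts(result_docids, doc_tags_per_facet):
    """doc_tags_per_facet: facet_name -> {docid: [tagIDs]}"""
    counters = {}
    for facet, doc_tags in doc_tags_per_facet.items():      # foreach facet
        counter = Counter()
        for docid in result_docids:                         # foreach docID in result set
            for tag in doc_tags.get(docid, ()):             # foreach tagID in document
                counter[tag] += 1
        counters[facet] = counter
    return counters

docs = [1, 2, 3]
facets = {
    "colour": {1: ["red"], 2: ["red"], 3: ["blue"]},
    "size":   {1: ["L"],   2: ["M"],   3: ["M"]},
}
counts = facet_counts(docs, facets)
assert counts["colour"]["red"] == 2
assert counts["size"]["M"] == 2
# With 10 facets, 8M docs and 1 tag/doc/facet, the inner body runs 80M times,
# which is the loop count the message refers to.
```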
Solr 4.4 ShingleFilterFactory exception
Hi, I have set up solr 4.4 with cloud. When I start solr, I get an exception as below:

*ERROR [CoreContainer] Unable to create core: mycore_sh1: org.apache.solr.common.SolrException: Plugin init failure for [schema.xml] fieldType text_shingle: Plugin init failure for [schema.xml] analyzer/filter: Error instantiating class: 'org.apache.lucene.analysis.shingle.ShingleFilterFactory'*
  at org.apache.solr.util.plugin.AbstractPluginLoader.load(AbstractPluginLoader.java:177) [:4.4.0 1504776 - sarowe - 2013-07-19 02:58:35]
  at org.apache.solr.schema.IndexSchema.readSchema(IndexSchema.java:467) [:4.4.0 1504776 - sarowe - 2013-07-19 02:58:35]
  at org.apache.solr.schema.IndexSchema.<init>(IndexSchema.java:164) [:4.4.0 1504776 - sarowe - 2013-07-19 02:58:35]
  at org.apache.solr.schema.IndexSchemaFactory.create(IndexSchemaFactory.java:55) [:4.4.0 1504776 - sarowe - 2013-07-19 02:58:35]
  at org.apache.solr.schema.IndexSchemaFactory.buildIndexSchema(IndexSchemaFactory.java:69) [:4.4.0 1504776 - sarowe - 2013-07-19 02:58:35]
  at org.apache.solr.core.ZkContainer.createFromZk(ZkContainer.java:268) [:4.4.0 1504776 - sarowe - 2013-07-19 02:58:35]
  at org.apache.solr.core.CoreContainer.create(CoreContainer.java:655) [:4.4.0 1504776 - sarowe - 2013-07-19 02:58:35]
  at org.apache.solr.core.CoreContainer$1.call(CoreContainer.java:364) [:4.4.0 1504776 - sarowe - 2013-07-19 02:58:35]
  at org.apache.solr.core.CoreContainer$1.call(CoreContainer.java:356) [:4.4.0 1504776 - sarowe - 2013-07-19 02:58:35]
  at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) [:1.6.0_43]
  at java.util.concurrent.FutureTask.run(FutureTask.java:138) [:1.6.0_43]
  at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439) [:1.6.0_43]
  at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) [:1.6.0_43]
  at java.util.concurrent.FutureTask.run(FutureTask.java:138) [:1.6.0_43]
  at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895) [:1.6.0_43]
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918) [:1.6.0_43]
  at java.lang.Thread.run(Thread.java:662) [:1.6.0_43]
Caused by: org.apache.solr.common.SolrException: Plugin init failure for [schema.xml] analyzer/filter: Error instantiating class: 'org.apache.lucene.analysis.shingle.ShingleFilterFactory'

The same file works well with solr 4.2. Pls help. Thanks, Prasi
Re: Multiple sorting does not work as expected
Well, at least it's not throwing an error <G>. Sorting on a tokenized field is not supported, or rather the behavior is undefined. Your Name field is tokenized if it's the stock text_en field. Best, Erick

On Tue, Aug 6, 2013 at 11:03 AM, Mysurf Mail stammail...@gmail.com wrote:

I don't see how it is sorted. This is the order as displayed above: 1. "BOM Total test2", 2. "BOM Total test - Copy", 3. "BOM Total test2", all with the same 2.2388418 score.

On Tue, Aug 6, 2013 at 5:28 PM, Jack Krupansky j...@basetechnology.com wrote:

The Name field is sorted as you have requested - desc. I suspect that you wanted name to be sorted asc (natural order). -- Jack Krupansky

-Original Message- From: Mysurf Mail Sent: Tuesday, August 06, 2013 10:22 AM To: solr-user@lucene.apache.org Subject: Re: Multiple sorting does not work as expected

My schema:

<field name="Name" type="text_en" indexed="true" stored="true" required="true"/>
<field name="Version" type="int" indexed="true" stored="true" required="true"/>

On Tue, Aug 6, 2013 at 5:06 PM, Mysurf Mail stammail...@gmail.com wrote:

My documents have 2 indexed attributes - name (string) and version (number). I want documents with the same score to be displayed in the following order: score (desc), name (desc), version (desc). Therefore I query using:

http://localhost:8983/solr/vault/select?q=BOM&fl=*,score&sort=score+desc,Name+desc,Version+desc

And I get the following inside the result:

<doc>
  <str name="Name">BOM Total test2</str> ...
  <int name="Version">2</int> ...
  <float name="score">2.2388418</float>
</doc>
<doc>
  <str name="Name">BOM Total test - Copy</str> ...
  <int name="Version">2</int> ...
  <float name="score">2.2388418</float>
</doc>
<doc>
  <str name="Name">BOM Total test2</str> ...
  <int name="Version">1</int> ...
  <float name="score">2.2388418</float>
</doc>

The scoring is equal, but the name is not sorted. What am I doing wrong here?
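The usual fix for sorting on a tokenized field is to sort on an untokenized copy of it. A schema sketch (the Name_sort field name is illustrative):

```xml
<!-- schema.xml: keep Name searchable, add an untokenized copy for sorting -->
<field name="Name"      type="text_en" indexed="true" stored="true" required="true"/>
<field name="Name_sort" type="string"  indexed="true" stored="false"/>
<copyField source="Name" dest="Name_sort"/>
```

The query would then use sort=score+desc,Name_sort+desc,Version+desc while still searching against Name.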
Re: Solr 4.4 ShingleFilterFactory exception
Any suggestions pls? On Wed, Aug 7, 2013 at 5:17 PM, Prasi S prasi1...@gmail.com wrote: [full message with stack trace quoted above] Thanks, Prasi
Re: 'Optimizing' Solr Index Size
The general advice is to not merge (optimize) unless your index is relatively static. You're quite correct: optimizing simply recovers the space from deleted documents; otherwise it won't change much (except leaving fewer segments). Here's a _great_ video that Mike McCandless put together: http://blog.mikemccandless.com/2011/02/visualizing-lucenes-segment-merges.html

But in general, _whenever_ segments are merged, the resulting segment will have all the data from deleted docs removed, and segments are merged continually while data is being added to the index. Quick-n-dirty way to estimate the space savings optimize will give you: look at the admin page for the core; the ratio of deleted docs to numDocs is about the unused space that would be regained by an optimize. From there it's your call <G>... Best, Erick

On Tue, Aug 6, 2013 at 12:02 PM, Brendan Grainger brendan.grain...@gmail.com wrote:

To maybe answer another one of my questions, about the 50Gb recovered when running:

curl 'http://localhost:8983/solr/update?optimize=true&maxSegments=10&waitFlush=false'

It looks to me like it was from deleted docs being completely removed from the index. Thanks

On Tue, Aug 6, 2013 at 11:45 AM, Brendan Grainger brendan.grain...@gmail.com wrote:

Well, I guess I can answer one of my questions, which I didn't exactly explicitly state: how do I force solr to merge segments down to a given maximum? I forgot about doing this:

curl 'http://localhost:8983/solr/update?optimize=true&maxSegments=10&waitFlush=false'

which reduced the number of segments in my index from 12 to 10. Amazingly, it also reduced the space used by almost 50Gb. Is that even possible? Thanks again, Brendan

On Tue, Aug 6, 2013 at 10:55 AM, Brendan Grainger brendan.grain...@gmail.com wrote:

Hi All, First of all, what I was actually trying to do is get a little space back. So if there is a better way to do this by adjusting the MergePolicy or something else, please let me know. My index is currently 200Gb. In the past (Solr 1.4) we've found that optimizing the index will double the size of the index temporarily, then usually when it's done we end up with a smaller index and slightly faster search query times. Should I even bother optimizing? My impression was that with the TieredMergePolicy this would be less necessary. Would merging segments into larger ones save any space, and if so, is there a way to tell solr to do that? Thanks, Brendan -- Brendan Grainger www.kuripai.com
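Erick's quick-n-dirty estimate above can be turned into arithmetic: the fraction of the index's doc slots occupied by deletions approximates the fraction of bytes an optimize would reclaim. A sketch with made-up numbers (in the Solr admin UI, deleted docs = maxDoc - numDocs):

```python
# Rough estimate of the space an optimize would reclaim, per the
# deleted-docs-ratio rule of thumb in the message above.
def estimated_reclaim_bytes(index_size_bytes, num_docs, deleted_docs):
    # deleted_docs / maxDoc, where maxDoc = numDocs + deletedDocs
    ratio = deleted_docs / (num_docs + deleted_docs)
    return int(index_size_bytes * ratio)

# e.g. a 200 GB index where a third of the maxDoc count is deletions
size = 200 * 1024**3
assert estimated_reclaim_bytes(size, 20_000_000, 10_000_000) == size // 3
```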
Re: external zookeeper with SolrCloud
Hmmm, shouldn't be happening. How sure are you that the upgrade to 4.4 was carried out on all machines? Erick

On Tue, Aug 6, 2013 at 5:23 PM, Joshi, Shital shital.jo...@gs.com wrote:

Machines are definitely up. Each Solr4 node and zookeeper instance share a machine. We're using -DzkHost=zk1,zk2,zk3,zk4,zk5 to let solr nodes know about the zk instances.

-Original Message- From: Erick Erickson [mailto:erickerick...@gmail.com] Sent: Tuesday, August 06, 2013 5:03 PM To: solr-user@lucene.apache.org Subject: Re: external zookeeper with SolrCloud

First off, even 6 ZK instances are overkill, vast overkill. 3 should be more than enough. That aside, however, how are you letting your Solr nodes know about the zk machines? Is it possible you've pointed some of your Solr nodes at specific ZK machines that aren't up when you have this problem? I.e. -zkHost=zk1,zk2,zk3 Best, Erick

On Tue, Aug 6, 2013 at 4:56 PM, Joshi, Shital shital.jo...@gs.com wrote:

Hi, We have a SolrCloud (4.4.0) cluster (5 shards and 2 replicas) on 10 boxes. We have 6 zookeeper instances. We are planning to change to an odd number of zookeeper instances. With Solr 4.3.0, if all zookeeper instances are not up, a solr4 node never connects to zookeeper (we can't see the admin page) until all zookeeper instances are up and we restart all solr nodes. It was suggested that it could be due to this bug https://issues.apache.org/jira/browse/SOLR-4899 and that this bug is solved in Solr 4.4. We upgraded to Solr 4.4 but still see this issue. We brought up 4 out of 6 zookeeper instances and then brought up all ten Solr4 nodes. We kept seeing this exception in Solr logs: 751395 [main-SendThread] WARN org.apache.zookeeper.ClientCnxn -
Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect java.net.ConnectException: Connection refused
  at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
  at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:567)
  at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:350)
  at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1068)

And after a while saw this exception:

INFO - 2013-08-05 22:24:07.582; org.apache.solr.common.cloud.ConnectionManager; Watcher org.apache.solr.common.cloud.ConnectionManager@5140709 name:ZooKeeperConnection Watcher: qa-zk1.services.gs.com,qa-zk2.services.gs.com,qa-zk3.services.gs.com,qa-zk4.services.gs.com,qa-zk5.services.gs.com,qa-zk6.services.gs.com got event WatchedEvent state:SyncConnected type:None path:null path:null type:None
INFO - 2013-08-05 22:24:07.662; org.apache.solr.common.cloud.ConnectionManager; Client->ZooKeeper status change trigger but we are already closed
754311 [main-EventThread] INFO org.apache.solr.common.cloud.ConnectionManager - Client->ZooKeeper status change trigger but we are already closed

We brought up all zookeeper instances, but the cloud never came up until all solr nodes were restarted. Do we need to change any settings? After the weekend reboot, all zookeeper instances come up one by one. While zookeeper instances are coming up, solr nodes are also getting started. With this issue, we have to put checks in place to make sure all zookeeper instances are up before we bring up any solr node. Thanks!!

-Original Message- From: Mark Miller [mailto:markrmil...@gmail.com] Sent: Tuesday, June 11, 2013 10:42 AM To: solr-user@lucene.apache.org Subject: Re: external zookeeper with SolrCloud

On Jun 11, 2013, at 10:15 AM, Joshi, Shital shital.jo...@gs.com wrote:

Thanks Mark. Looks like this bug is fixed in Solr 4.4. Do you have any date for the official release of 4.4?

Looks like it might come out in a couple of weeks.
Is there any instruction available on how to build Solr 4.4 from SVN repository? It's java, so it's pretty easy - you might find some help here: http://wiki.apache.org/solr/HowToContribute - Mark -Original Message- From: Mark Miller [mailto:markrmil...@gmail.com] Sent: Monday, June 10, 2013 8:05 PM To: solr-user@lucene.apache.org Subject: Re: external zookeeper with SolrCloud This might be https://issues.apache.org/jira/browse/SOLR-4899 - Mark On Jun 10, 2013, at 5:59 PM, Joshi, Shital shital.jo...@gs.com wrote: Hi, We're setting up 5 shard SolrCloud with external zoo keeper. When we bring up Solr nodes while the zookeeper instance is not up and running, we see this error in Solr logs. java.net.ConnectException: Connection refused at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:567) at
Re: Transform data at index time: country - continent
Walter: Oooh, nice! One could even use a copyField if one wanted to keep them separate... Erick On Tue, Aug 6, 2013 at 12:38 PM, Walter Underwood wun...@wunderwood.org wrote: Would synonyms help? If you generate the query terms for the continents, you could do something like this: usa => continent-na canada => continent-na germany => continent-europe and so on. wunder On Aug 6, 2013, at 2:18 AM, Christian Köhler - ZFMK wrote: On 05.08.2013 15:52, Jack Krupansky wrote: You can write a brute force JavaScript script using the StatelessScript update processor that hard-codes the mapping. I'll probably do something like this. Unfortunately I have no influence on the original db itself, so I have to fix this in Solr. Cheers Chris -- Zoologisches Forschungsmuseum Alexander Koenig - Leibniz-Institut für Biodiversität der Tiere - Adenauerallee 160, 53113 Bonn, Germany www.zfmk.de
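Walter's mapping can be wired up as a query-time synonym filter. A minimal sketch of what that could look like; the file name (country_continent.txt) and field type name (text_continent) are made up for illustration:

```xml
<!-- Sketch only: names are illustrative, not from the thread.
     country_continent.txt would hold lines such as:
       usa => continent-na
       canada => continent-na
       germany => continent-europe
-->
<fieldType name="text_continent" class="solr.TextField">
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="country_continent.txt"
            ignoreCase="true" expand="false"/>
  </analyzer>
</fieldType>
```

With expand="false", a query for "usa" is rewritten to the continent token, which is what makes Erick's copyField idea work for keeping country and continent searchable separately.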
Re: Solr design. Choose Cores or Shards?
Shards are special cores, usually hosted on separate machines, that comprise one single large (logical) index. Shards usually need to have the same schema, config, etc. So unless you have a corpus that's too large to fit on a single piece of your hardware, you'll always be using cores. And since your cores have different types of data (and presumably use different schemas), you're talking cores. Best Erick On Tue, Aug 6, 2013 at 10:49 PM, manju16832003 manju16832...@gmail.com wrote: Hi, I have a confusion over choosing Cores or Shards for the project scenario. My scenario is as follows: I have three entities 1. Customers 2. Product Info 3. Listings [Contains all the listings posted by customer based on product] I'm planning to design the Solr structure for the above scenario like this 1. Customers Core 2. Product Info Core 3. Listings Core 4. Searchable Listing Core [Indexing searchable parameters selected from Listings, Product Info and Customer entities]. Having in mind that there wouldn't be many updates to Customers and Product Info. There will be regular updates to Listings, which in turn means I need to update Searchable Listings, which I could manage. My confusion: is it feasible to choose many cores, or to use shards? I do not have much experience with how shards work and what they are used for. I would like to know the suggestions :-) for a design like this. What are the implications if I were to choose to use many cores and handle stuff at the application level calling different cores. Thanks -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-design-Choose-Cores-or-Shards-tp4082930.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Solr Split Shard - Document loss and down time
I have explained in the above post with screenshots. Indexing fails when any node is down while shard splitting is in progress. -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-Split-Shard-Document-loss-and-down-time-tp4082002p4082994.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: external zookeeper with SolrCloud
You said earlier that you had 6 zookeeper instances, but the zkHost param only shows 5 instances... is that correct? On Tue, Aug 6, 2013 at 11:23 PM, Joshi, Shital shital.jo...@gs.com wrote: Machines are definitely up. Solr4 node and zookeeper instance share the machine. We're using -DzkHost=zk1,zk2,zk3,zk4,zk5 to let solr nodes know about the zk instances.
Re: Solr doesn't make indexes for all the enteries
You're explicitly asking for only 10 search results - that's what the rows=10 parameter does. If you want to see all results, you can either increase rows, or run multiple queries, increasing the offset each time. On Wed, Aug 7, 2013 at 12:21 PM, Kamaljeet Kaur kamal.kaur...@gmail.com wrote: Hello, I am a newbie to solr. I have installed and configured it with my django project. I am using the following versions: django-haystack - 2.0.0 ApacheSolr - 3.5.0 Django - 1.4 mysql - 5.5.32-0 Here is the model, whose data I want to index: http://tny.cz/422c5fb7 Here is search_indexes.py: http://tny.cz/8de95043 Have created the file templates/search/indexes/myapp/userprofile_text.txt Have created a template to show the results after querying from database. I have built the schema using the command $ ./manage.py build_solr_schema and replaced the contents of example/solr/conf/schema.xml with the output. Here it is: http://tny.cz/49fe8e1d When I use the command $ ./manage.py rebuild_index to create indexes, it shows: WARNING: This will irreparably remove EVERYTHING from your search index in connection 'default'. Your choices after this are to restore from backups or rebuild via the `rebuild_index` command. Are you sure you wish to continue? [y/N] y Removing all documents from your index because you said so. All documents removed. Indexing 18 user profiles But indexes are shown only for 10 user profiles. I am seeing the indexes here: http://localhost:8983/solr/select/?q=*%3A*&version=2.2&start=0&rows=10&indent=on Only those userprofiles are shown, when searched for. Why does it happen? And how to solve this issue? -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-doesn-t-make-indexes-for-all-the-enteries-tp4082977.html Sent from the Solr - User mailing list archive at Nabble.com.
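The advice above (increase rows, or page with start) can be sketched in a few lines. The fetch function below is a stand-in for a real HTTP call to /solr/select?q=...&start=...&rows=... - it is not part of any Solr client API:

```python
# Sketch: collect every result by paging, instead of relying on the
# default rows=10 that made only 10 of the 18 profiles visible.
def page_params(total, rows=10):
    """Yield (start, rows) pairs covering `total` documents."""
    for start in range(0, total, rows):
        yield start, rows

def fetch_all(fetch_page, total, rows=10):
    """Run one query per page; `fetch_page(start, rows)` stands in
    for a real Solr request returning that window of documents."""
    docs = []
    for start, page_rows in page_params(total, rows):
        docs.extend(fetch_page(start, page_rows))
    return docs

# Fake index of 18 "user profiles", mirroring the thread:
index = ["profile-%d" % i for i in range(18)]
result = fetch_all(lambda s, r: index[s:s + r], total=18, rows=10)
print(len(result))  # 18 - all profiles, not just the first page of 10
```

In practice you would read numFound from the first response to learn `total` before paging through the rest.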
RE: external zookeeper with SolrCloud
I went through Admin page - Dashboard of all 10 nodes and verified that each one is using solr-spec 4.4.0. solr-spec 4.4.0 solr-impl 4.4.0 1504776 - sarowe - 2013-07-19 02:58:35 lucene-spec 4.4.0 lucene-impl 4.4.0 1504776 - sarowe - 2013-07-19 02:53:42 Is there anything else I can check to verify that we upgraded to Solr 4.4.0? -Original Message- From: Erick Erickson [mailto:erickerick...@gmail.com] Sent: Wednesday, August 07, 2013 8:10 AM To: solr-user@lucene.apache.org Subject: Re: external zookeeper with SolrCloud Hmmm, shouldn't be happening. How sure are you that the upgrade to 4.4 was carried out on all machines? Erick
[solr 4.3.1 admin ui] bug in Plugins / Stats Refresh Values option?
On the first click the values are refreshed. On the second click the page gets redirected: from: http://localhost:8983/solr/#/statements/plugins/cache to: http://localhost:8983/solr/#/ Is this intentional? Regards, Dmitry
RE: external zookeeper with SolrCloud
We have all 6 instances in the zkHost parameter. -Original Message- From: Raymond Wiker [mailto:rwi...@gmail.com] Sent: Wednesday, August 07, 2013 8:29 AM To: solr-user@lucene.apache.org Subject: Re: external zookeeper with SolrCloud You said earlier that you had 6 zookeeper instances, but the zkHost param only shows 5 instances... is that correct?
Re: Measuring SOLR performance
Hi Roman, Finally, this has worked! Thanks for quick support. The graphs look awesome. At least on the index sample :) It is quite easy to setup and run + possible to run directly on the shard server in background mode. my test run was: python solrjmeter.py -a -x ./jmx/SolrQueryTest.jmx -q ./queries/demo/demo.queries -s localhost -p 8983 -a --durationInSecs 60 -R foo -t /solr/statements -e statements Thanks! Dmitry On Wed, Aug 7, 2013 at 6:54 AM, Roman Chyla roman.ch...@gmail.com wrote: Hi Dmitry, I've modified the solrjmeter to retrieve data from under the core (the -t parameter) and the rest from the /solr/admin - I could test it only against 4.0, but it is there the same as 4.3 - it seems...so you can try the fresh checkout my test was: python solrjmeter.py -a -x ./jmx/SolrQueryTest.jmx -t /solr/collection1 -R foo -q ./queries/demo/* -p 9002 -s adsate Thanks! roman On Tue, Aug 6, 2013 at 9:46 AM, Dmitry Kan solrexp...@gmail.com wrote: Hi, Thanks for the clarification, Shawn! So with this in mind, the following work: http://localhost:8983/solr/statements/admin/system?wt=json http://localhost:8983/solr/statements/admin/mbeans?wt=json not copying their output to save space. Roman: is this something that should be set via -t parameter as well? Dmitry On Tue, Aug 6, 2013 at 4:34 PM, Shawn Heisey s...@elyograg.org wrote: On 8/6/2013 6:17 AM, Dmitry Kan wrote: Of three URLs you asked for, only the 3rd one gave response: snip The rest report 404. On Mon, Aug 5, 2013 at 8:38 PM, Roman Chyla roman.ch...@gmail.com wrote: Hi Dmitry, So I think the admin pages are different on your version of solr, what do you see when you request... ? http://localhost:8983/solr/admin/system?wt=json http://localhost:8983/solr/admin/mbeans?wt=json http://localhost:8983/solr/admin/cores?wt=json Unless you have a valid defaultCoreName set in your (old-style) solr.xml, the first two URLs won't work, as you've discovered. 
Without that valid defaultCoreName (or if you wanted info from a different core), you'd need to add a core name to the URL for them to work. The third one, which works for you, is a global handler for manipulating cores, so naturally it doesn't need a core name to function. The URL path for this handler is defined by solr.xml. Thanks, Shawn
Re: softCommit doesn't work - ?
(a bit late, I know) On 07/23/2013 02:09 PM, Erick Erickson wrote: First a minor nit. The server.add(doc, time) is a hard commit, not a soft one. By default, no, commitWithin is indeed a soft commit. As per http://lucene.472066.n3.nabble.com/near-realtime-search-and-dih-td494.html#a4000133 commitWithin is a soft commit on Solr 4. I just verified in the 4.4 code: SolrConfig has getBool("updateHandler/commitWithin/softCommit", true) -- André Bois-Crettez Software Architect Search Developer http://www.kelkoo.com/ Kelkoo SAS
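The getBool xpath André quotes corresponds to a solrconfig.xml setting. A sketch of what flipping the default would look like, based on that xpath (the surrounding updateHandler element is the standard 4.x layout):

```xml
<!-- Solr 4.x: commitWithin defaults to a soft commit (true).
     Set softCommit to false if you want commitWithin to issue
     hard commits instead. -->
<updateHandler class="solr.DirectUpdateHandler2">
  <commitWithin>
    <softCommit>false</softCommit>
  </commitWithin>
</updateHandler>
```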
Re: [solr 4.3.1 admin ui] bug in Plugins / Stats Refresh Values option?
It shouldn't .. but from your description it sounds as if the JavaScript onclick handler doesn't work on the second click (which would then do a full page reload). if you use chrome, firefox or safari .. can you open the developer tools and check if they report any javascript error? which would explain why .. BTW: You don't have to use that button in the meantime .. just refresh the page (that is exactly what the button does). sure, it should work, but that shouldn't stop you from refreshing the page :) - Stefan On Wednesday, August 7, 2013 at 3:00 PM, Dmitry Kan wrote: On the first click the values are refreshed. On the second click the page gets redirected: from: http://localhost:8983/solr/#/statements/plugins/cache to: http://localhost:8983/solr/#/ Is this intentional? Regards, Dmitry
Error loading class 'solr.ISOLatin1AccentFilterFactory'
Hi, I am trying to use solr.ISOLatin1AccentFilterFactory in Solr 4.3.1, but it's giving the error Error loading class 'solr.ISOLatin1AccentFilterFactory'. However, it works fine in Solr 3.6... Can anybody suggest how to fix this error? Or is there a new FilterFactory I have to use? -- View this message in context: http://lucene.472066.n3.nabble.com/Error-loading-class-solr-ISOLatin1AccentFilterFactory-tp4083012.html Sent from the Solr - User mailing list archive at Nabble.com.
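ISOLatin1AccentFilterFactory was deprecated in the 3.x line and removed in 4.x; its replacement is ASCIIFoldingFilterFactory, which folds a superset of the same accented characters. A sketch of the substitution in schema.xml (the field type name and analyzer chain here are illustrative, not from the thread):

```xml
<!-- Replace <filter class="solr.ISOLatin1AccentFilterFactory"/>
     with ASCIIFoldingFilterFactory when moving from 3.x to 4.x. -->
<fieldType name="text_folded" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.ASCIIFoldingFilterFactory"/>
  </analyzer>
</fieldType>
```

Fields using the type need to be reindexed after the change, since folding happens at index time as well as query time.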
Re: Measuring SOLR performance
Hi Roman, One more question. I tried to compare different runs (g1 vs cms) using the command below, but get an error. Should I attach some other param(s)? python solrjmeter.py -C g1,foo -c hour -x ./jmx/SolrQueryTest.jmx **ERROR** File "solrjmeter.py", line 1427, in <module> main(sys.argv) File "solrjmeter.py", line 1303, in main check_options(options, args) File "solrjmeter.py", line 185, in check_options error("The folder '%s' does not exist" % rf) File "solrjmeter.py", line 66, in error traceback.print_stack() The folder '0' does not exist Dmitry
RE: poor facet search performance
A data structure like: fieldId -> BitArray (for fields with docFreq > 1/9 of total docs) fieldId -> VIntList (variable byte encoded array of ints, for fields with docFreq < 1/9 of total docs) And the list is sorted top to bottom with the most frequent fields at the top (highest doc freqs at the top). We enumerate top to bottom getting the intersection against the set of docs matching the search: * If BitArray, then do special intersection using bit counting of the internal uint[] structure of the BitArray vs. the BitArray of the doc result set * If VIntList, then do enumeration of the VIntList against the BitArray of the doc result set We push the counts into a priority queue with size of facet.limit and when we fill that up, if the next fieldId in the structure has docFreq < the min count in the priority queue, it breaks out of the loop. Since a lot of our facet fields have a power curve distribution (a long tail of less frequent values), breaking out early helps a lot. Also, we do facet counts in parallel using Parallel.ForEach (in C# not Java), so each field in the list of facet.field is done on its own thread (I believe SOLR is doing something similar now). We have a lot of cores on our servers so it works well. From: Toke Eskildsen [t...@statsbiblioteket.dk] Sent: Wednesday, August 07, 2013 7:45 AM To: solr-user@lucene.apache.org Subject: Re: poor facet search performance On Tue, 2013-07-30 at 21:48 +0200, Robert Stewart wrote: [Custom facet structure] Then we sorted those sets of facet fields by total document frequency so we enumerate the more frequent facet fields first, and we stop looking when we find a facet field which has less total document matches than the top N facet counts we are looking for. So the structure was Facet->docIDs? A bit like enum in Solr? Your top-N cut-off is an interesting optimization for that. [...] The slaves just pre-load that binary structure directly into ram in one shot in the background when opening a new snapshot for search.
We used a similar pre-calculation some years ago but abandoned it as the cost of Pre-generate_structure + #duplicates * (distribute_structure + open_structure) was just as costly and less flexible than #duplicates * generate_structure for us. We have 200 million docs, 10 shards, about 20 facet fields, some of which contain about 20,000 unique values. We show the top 10 facets for about 10 different fields in the results page. We provide search results with lots of facets and date counts in around 200-300ms using this technique. Currently, we are porting this entire system to SOLR. For a single core index of 8 million docs, using similar documents and facet fields from our production indexes, I can't get faceted search to perform anywhere close to 300ms for general searches. More like 1.5-3 seconds. Solr fc faceting treats each facet independently and in a docID->facet manner, so what happens is foreach facet { foreach docIDinResultSet { foreach tagIDinDocument { facet.counter[tagID]++ } } } With 10 facets, 8M documents and 1 tag/doc/facet, the total loop count is 80M. That does not normally take 1.5-3 seconds, so something seems off. Do you have a lot of facet tags (aka terms) for each document? Is there anything else that I should look into for getting better facet performance? Could you list the part of the Solr log with the facet structures? Just grep for UnInverted. They look something like this: UnInverted multi-valued field {field=lma_long,memSize=42711405,tindexSize=42,time=979,phase1=964,nTerms=23,bigTerms=6,termInstances=1958468,uses=0}
Unless you go experimental (SOLR-2412 to bang my own drum), your facet count needs to go down or you need to shard with Solr. Also we need to issue frequent commits since we are constantly streaming new content into the system. You could use a setup with a smaller live shard and multiple stale ones, but depending on corpus your ranking might suffer. - Toke Eskildsen, State and University Library, Denmark
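The early-exit trick Robert describes (values pre-sorted by descending docFreq, stop once no remaining value can beat the smallest count in the top-N heap) can be sketched as follows. This is a simplified Python sketch, not the production C# code: plain sets stand in for the BitArray/VIntList structures, and the break condition uses <= as a conservative reading of the description:

```python
import heapq

def top_facets(values_by_docfreq, matching_docs, limit):
    """values_by_docfreq: [(value, doc_freq, set_of_doc_ids)] sorted by
    doc_freq descending. Returns the top `limit` as [(count, value)]."""
    heap = []  # min-heap of (count, value); heap[0][0] is the current minimum
    for value, doc_freq, doc_ids in values_by_docfreq:
        # A value's count is at most its doc_freq, so once the heap is
        # full, no later (less frequent) value can enter the top N.
        if len(heap) == limit and doc_freq <= heap[0][0]:
            break
        count = len(doc_ids & matching_docs)  # intersect with result set
        if len(heap) < limit:
            heapq.heappush(heap, (count, value))
        elif count > heap[0][0]:
            heapq.heapreplace(heap, (count, value))
    return sorted(heap, reverse=True)

facets = [  # sorted by doc_freq descending (long-tail distribution)
    ("news", 5, {1, 2, 3, 4, 5}),
    ("sports", 3, {2, 3, 6}),
    ("opera", 1, {9}),  # never counted: doc_freq 1 can't beat the heap min
]
print(top_facets(facets, matching_docs={2, 3, 4}, limit=2))
# [(3, 'news'), (2, 'sports')]
```

With a long tail of rare values, most of the loop is skipped, which is exactly why the power-curve distribution Robert mentions makes the cut-off pay off.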
RE: poor facet search performance
FYI, I am now using docValues for facet fields, with somewhat better performance, or at least more consistent performance (especially with frequent commits). Also, I see my main bottleneck seems to be the EC2 servers; I am now running on m3.xlarge with provisioned EBS (4000 IOPS) and it is looking much better.

From: Toke Eskildsen [t...@statsbiblioteket.dk]
Sent: Wednesday, August 07, 2013 7:45 AM
To: solr-user@lucene.apache.org
Subject: Re: poor facet search performance

On Tue, 2013-07-30 at 21:48 +0200, Robert Stewart wrote:

[Custom facet structure] Then we sorted those sets of facet fields by total document frequency, so we enumerate the more frequent facet fields first, and we stop looking when we find a facet field which has fewer total document matches than the top-N facet counts we are looking for.

So the structure was Facet->docIDs? A bit like enum in Solr? Your top-N cut-off is an interesting optimization for that.

[...] The slaves just pre-load that binary structure directly into RAM in one shot, in the background, when opening a new snapshot for search.

We used a similar pre-calculation some years ago but abandoned it, as the cost of pre-generate_structure + #duplicates * (distribute_structure + open_structure) was just as high as, and less flexible than, #duplicates * generate_structure for us.

We have 200 million docs, 10 shards, about 20 facet fields, some of which contain about 20,000 unique values. We show the top 10 facets for about 10 different fields on the results page. We provide search results with lots of facets and date counts in around 200-300ms using this technique. Currently, we are porting this entire system to Solr. For a single-core index of 8 million docs, using similar documents and facet fields from our production indexes, I can't get faceted search to perform anywhere close to 300ms for general searches. More like 1.5-3 seconds.
Solr fc faceting treats each facet independently, and in a docID->facet manner, so what happens is

foreach facet {
  foreach docIDinResultSet {
    foreach tagIDinDocument {
      facet.counter[tagID]++
    }
  }
}

With 10 facets, 8M documents and 1 tag/doc/facet, the total loop count is 80M. That does not normally take 1.5-3 seconds, so something seems off. Do you have a lot of facet tags (aka terms) for each document?

Is there anything else that I should look into for getting better facet performance?

Could you list the part of the Solr log with the facet structures? Just grep for UnInverted. They look something like this:

UnInverted multi-valued field {field=lma_long,memSize=42711405,tindexSize=42,time=979,phase1=964,nTerms=23,bigTerms=6,termInstances=1958468,uses=0}

Given these metrics (200M docs, 20 facet fields, some fields with 20,000 unique values), what kind of facet search performance should I expect?

Due to the independent faceting handling in Solr, the facet time will scale a bit worse than linearly with the number of documents, relative to your test setup. With a loop count of 200M*10 (or 20? I am a bit confused about how many facets you show at a time) = 2G, this will take multiple seconds.

Unless you go experimental (SOLR-2412, to bang my own drum), your facet count needs to go down or you need to shard with Solr.

Also we need to issue frequent commits since we are constantly streaming new content into the system.

You could use a setup with a smaller live shard and multiple stale ones, but depending on your corpus your ranking might suffer.

- Toke Eskildsen, State and University Library, Denmark
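A minimal sketch of the per-facet counting loop Toke describes above (fc-style faceting: one pass over the result set per facet field, incrementing one counter per term of each matching document). The names and data here are illustrative, not Solr's actual implementation:

```python
def fc_facet_counts(facet_fields, result_set, doc_terms):
    """doc_terms[field][doc_id] -> term ids for that doc in that field."""
    counts = {field: {} for field in facet_fields}
    iterations = 0
    for field in facet_fields:                # foreach facet
        counters = counts[field]
        for doc_id in result_set:             # foreach docID in result set
            for term_id in doc_terms[field].get(doc_id, ()):  # foreach tagID
                counters[term_id] = counters.get(term_id, 0) + 1
                iterations += 1
    return counts, iterations

# Scaled-down version of the thread's estimate: with 1 term per doc per
# field, iterations == #fields * #docs (80M for 10 fields and 8M docs).
fields = ["f%d" % i for i in range(10)]
docs = range(1000)
terms = {f: {d: [d % 5] for d in docs} for f in fields}
counts, iters = fc_facet_counts(fields, docs, terms)
print(iters)  # 10 fields * 1000 docs * 1 term = 10000
```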
RE: SolrCloud Indexing question
Thank you so much for the suggestion. Is the same recommended for querying too? I found it very slow when I query using CloudSolrServer.

Kalyan

Date: Tue, 6 Aug 2013 13:25:37 -0600
From: s...@elyograg.org
To: solr-user@lucene.apache.org
Subject: Re: SolrCloud Indexing question

On 8/6/2013 12:55 PM, Kalyan Kuram wrote:

Hi All, I need a suggestion on how to send indexing commands to 2 different Solr servers. Basically I want to mirror my index. Here is the scenario: I have 2 clusters; each cluster has one master and 2 slaves, with an external ZooKeeper in front. I need a suggestion on which Solr API class I should use to send indexing commands to the 2 masters. Will LBHttpSolrServer do the indexing, or is it only used for querying? If there is a better approach, please suggest. Kalyan

If you're using ZooKeeper, then your index is SolrCloud, and you don't have masters and slaves. The traditional master/slave replication model does not apply to SolrCloud. With SolrCloud, there is no need to have two independent clusters. If a server dies, the other servers in the cloud will keep the cluster operational. When you bring the dead server back with the proper config, it will automatically be synchronized with the cluster.

For a Java program with SolrJ, use a CloudSolrServer object for each cluster. The constructor for CloudSolrServer accepts the same zkHost parameter that you give to each Solr server when starting in SolrCloud mode. You cannot index to independent clusters at the same time through one object - if they truly are independent SolrCloud installs, you have to manage updates to both of them independently.

Thanks,
Shawn
Re: [solr 4.3.1 admin ui] bug in Plugins / Stats Refresh Values option?
Hi Stefan,

I was able to debug the second-click scenario (it was tricky to catch, since on click a redirect happens and the log statements of the previous page are gone; it worked via setting break-points in plugins.js) and got these errors (Firefox 23.0, Ubuntu):

[17:20:00.731] TypeError: anonymous function does not always return a value @ http://localhost:8983/solr/js/scripts/logging.js?_=4.3.1:294
[17:20:00.743] TypeError: anonymous function does not always return a value @ http://localhost:8983/solr/js/scripts/plugins.js?_=4.3.1:371
[17:20:00.769] TypeError: anonymous function does not always return a value @ http://localhost:8983/solr/js/scripts/replication.js?_=4.3.1:35
[17:20:00.771] TypeError: anonymous function does not always return a value @ http://localhost:8983/solr/js/scripts/schema-browser.js?_=4.3.1:68
[17:20:00.772] TypeError: anonymous function does not always return a value @ http://localhost:8983/solr/js/scripts/schema-browser.js?_=4.3.1:1185

Dmitry

On Wed, Aug 7, 2013 at 4:35 PM, Stefan Matheis matheis.ste...@gmail.com wrote:

It shouldn't .. but from your description it sounds as if the javascript onclick handler doesn't work on the second click (which would do a page reload). If you use Chrome, Firefox or Safari .. can you open the developer tools and check if they report any javascript error? Which would explain why ..

BTW: You don't have to use that button in the meantime .. just refresh the page (that is exactly what the button does). Sure, it should work, but that shouldn't stop you from refreshing the page :)

- Stefan

On Wednesday, August 7, 2013 at 3:00 PM, Dmitry Kan wrote:

On the first click the values are refreshed. On the second click the page gets redirected: from http://localhost:8983/solr/#/statements/plugins/cache to http://localhost:8983/solr/#/. Is this intentional?

Regards,
Dmitry
RE: problems running solr 4.4 with HDFS HA
Hi Mark,

Setting <str name="solr.hdfs.confdir"> properly in my solrconfig.xml did it. Thanks!

Greg Walters | Operations Team
530 Maryville Center Drive, Suite 250
St. Louis, Missouri 63141
t. 314.225.2745 | c. 314.225.2797
gwalt...@sherpaanalytics.com
www.sherpaanalytics.com
Re: 'Optimizing' Solr Index Size
Thanks Erick, our index is relatively static. I think the deletes must be coming from 'reindexing' the same documents, so it is definitely handy to recover the space. I've seen that video before. Definitely very interesting.

Brendan

On Wed, Aug 7, 2013 at 8:04 AM, Erick Erickson erickerick...@gmail.com wrote:

The general advice is to not merge (optimize) unless your index is relatively static. You're quite correct: optimizing simply recovers the space from deleted documents; otherwise it won't change much (except having fewer segments). Here's a _great_ video that Mike McCandless put together:
http://blog.mikemccandless.com/2011/02/visualizing-lucenes-segment-merges.html

But in general, _whenever_ segments are merged, the resulting segment will have all the data from deleted docs removed, and segments are merged continually when data is being added to the index.

Quick-n-dirty way to estimate the space savings an optimize will give you: look at the admin page for the core; the ratio of deleted docs to numDocs is roughly the fraction of unused space that would be regained by an optimize. From there it's your call G...

Best
Erick

On Tue, Aug 6, 2013 at 12:02 PM, Brendan Grainger brendan.grain...@gmail.com wrote:

To maybe answer another one of my questions, about the 50Gb recovered when running:

curl 'http://localhost:8983/solr/update?optimize=true&maxSegments=10&waitFlush=false'

It looks to me that it was from deleted docs being completely removed from the index.

Thanks

On Tue, Aug 6, 2013 at 11:45 AM, Brendan Grainger brendan.grain...@gmail.com wrote:

Well, I guess I can answer one of my questions, which I didn't exactly explicitly state: how do I force Solr to merge segments down to a given maximum? I forgot about doing this:

curl 'http://localhost:8983/solr/update?optimize=true&maxSegments=10&waitFlush=false'

which reduced the number of segments in my index from 12 to 10. Amazingly, it also reduced the space used by almost 50Gb. Is that even possible?
Thanks again
Brendan

On Tue, Aug 6, 2013 at 10:55 AM, Brendan Grainger brendan.grain...@gmail.com wrote:

Hi All,

First of all, what I was actually trying to do is get a little space back. So if there is a better way to do this by adjusting the MergePolicy or something else, please let me know. My index is currently 200Gb. In the past (Solr 1.4) we've found that optimizing the index will double the size of the index temporarily, then usually when it's done we end up with a smaller index and slightly faster search query times.

Should I even bother optimizing? My impression was that with the TieredMergePolicy this would be less necessary. Would merging segments into larger ones save any space, and if so, is there a way to tell Solr to do that?

Thanks
Brendan

--
Brendan Grainger
www.kuripai.com
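One way to read Erick's rule of thumb in the thread above as a quick back-of-the-envelope calculation (the function is illustrative; numDocs and maxDoc come from the core's admin statistics, where maxDoc counts live plus deleted documents):

```python
def optimize_savings_estimate(num_docs, max_doc, index_size_bytes):
    """Estimate bytes an optimize might reclaim from deleted docs."""
    deleted = max_doc - num_docs
    fraction_deleted = deleted / float(max_doc)
    return deleted, fraction_deleted, index_size_bytes * fraction_deleted

# Hypothetical numbers: a 200 GB index where a quarter of maxDoc is
# deleted documents, roughly matching the ~50 GB reclaimed in the thread.
deleted, frac, reclaim = optimize_savings_estimate(
    num_docs=150_000_000, max_doc=200_000_000,
    index_size_bytes=200 * 1024**3)
print(deleted, frac, reclaim / 1024**3)  # 50000000 0.25 50.0
```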
Re: Transform data at index time: country - continent
Good point. Copying to a separate field that applied synonyms could help. Filtering out the original countries could be tricky. The Javadoc mentions a keepOrig flag, but the Solr docs do not. If you could set keepOrig=false, that would do the trick.

wunder

On Aug 7, 2013, at 5:13 AM, Erick Erickson wrote:

Walter: Oooh, nice! One could even use a copyField if one wanted to keep them separate...

Erick

On Tue, Aug 6, 2013 at 12:38 PM, Walter Underwood wun...@wunderwood.org wrote:

Would synonyms help? If you generate the query terms for the continents, you could do something like this:

usa => continent-na
canada => continent-na
germany => continent-europe

and so on.

wunder

On Aug 6, 2013, at 2:18 AM, Christian Köhler - ZFMK wrote:

On 05.08.2013 15:52, Jack Krupansky wrote:

You can write a brute force JavaScript script using the StatelessScript update processor that hard-codes the mapping.

I'll probably do something like this. Unfortunately I have no influence on the original db itself, so I have to fix this in Solr.

Cheers
Chris

--
Zoologisches Forschungsmuseum Alexander Koenig
- Leibniz-Institut für Biodiversität der Tiere -
Adenauerallee 160, 53113 Bonn, Germany
www.zfmk.de
Stiftung des öffentlichen Rechts; Direktor: Prof. J. Wolfgang Wägele
Sitz: Bonn

--
Walter Underwood
wun...@wunderwood.org
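For reference, Walter's mapping can be expressed as an explicit synonym file; with the "=>" form the original country token is replaced by the continent token, which gives the effect of keepOrig=false without needing a flag. A sketch (the file name, field setup, and country lists are made up for illustration):

```
# synonyms-continents.txt, referenced from the field type's index analyzer,
# e.g. <filter class="solr.SynonymFilterFactory"
#              synonyms="synonyms-continents.txt" ignoreCase="true"/>
usa, united states => continent-na
canada => continent-na
germany, france => continent-europe
```

Combined with a copyField as Erick suggests, the original country field can stay searchable while this field carries only the continent terms.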
Re: [solr 4.3.1 admin ui] bug in Plugins / Stats Refresh Values option?
Hey Dmitry

That sounds a bit odd .. those are more like notices than real errors .. are you sure those are stopping the UI from working? If so .. we should see more reports like those. Can you verify the problem by using another browser? I mean .. that is a really basic javascript handler .. written directly in the DOM, no chance that it doesn't get loaded. And that normally only stops working if something really bad happens ;o

- Stefan

On Wednesday, August 7, 2013 at 4:23 PM, Dmitry Kan wrote:

Hi Stefan,

I was able to debug the second-click scenario (it was tricky to catch, since on click a redirect happens and the log statements of the previous page are gone; it worked via setting break-points in plugins.js) and got these errors (Firefox 23.0, Ubuntu):

[17:20:00.731] TypeError: anonymous function does not always return a value @ http://localhost:8983/solr/js/scripts/logging.js?_=4.3.1:294
[17:20:00.743] TypeError: anonymous function does not always return a value @ http://localhost:8983/solr/js/scripts/plugins.js?_=4.3.1:371
[17:20:00.769] TypeError: anonymous function does not always return a value @ http://localhost:8983/solr/js/scripts/replication.js?_=4.3.1:35
[17:20:00.771] TypeError: anonymous function does not always return a value @ http://localhost:8983/solr/js/scripts/schema-browser.js?_=4.3.1:68
[17:20:00.772] TypeError: anonymous function does not always return a value @ http://localhost:8983/solr/js/scripts/schema-browser.js?_=4.3.1:1185

Dmitry

On Wed, Aug 7, 2013 at 4:35 PM, Stefan Matheis matheis.ste...@gmail.com (mailto:matheis.ste...@gmail.com) wrote:

It shouldn't .. but from your description it sounds as if the javascript onclick handler doesn't work on the second click (which would do a page reload). If you use Chrome, Firefox or Safari .. can you open the developer tools and check if they report any javascript error? Which would explain why ..

BTW: You don't have to use that button in the meantime .. just refresh the page (that is exactly what the button does).
sure, it should work, but shouldn't stop you from refreshing the page :) - Stefan On Wednesday, August 7, 2013 at 3:00 PM, Dmitry Kan wrote: On the first click the values are refreshed. On the second click the page gets redirected: from: http://localhost:8983/solr/#/statements/plugins/cache to: http://localhost:8983/solr/#/ Is this intentional? Regards, Dmitry
DIH Problem: create multiple docs from a single entity
Hi

I have 2 tables with the following data:

table 1:
id   treatment_list
1    a,b
2    b,c

table 2:
treatment_id   name
a    name1
b    name2
c    name3

Using DIH, can I create an index of the form:

id   treatment_id   name
1    a    name1
1    b    name2
2    b    name2
2    c    name3

In short, can I split the comma-separated field and process each value as an entity? From the docs and the wiki I can't see anything obvious. I feel I'm missing something easy here. (Note: it's not my data, so I can't do anything about the dodgy CSV field.)
Re: Solr 4.4 ShingleFilterFactory exception
The answer will be further down in the stack trace. It will relate to an error that occurred when initializing the filter. One possibility is that you have a garbage attribute name in your token filter XML - 4.4 checks for that kind of thing now.

-- Jack Krupansky

-----Original Message-----
From: Prasi S
Sent: Wednesday, August 07, 2013 7:47 AM
To: solr-user@lucene.apache.org
Subject: Solr 4.4 ShingleFilterFactory exception

Hi,
I have set up Solr 4.4 with cloud. When I start Solr, I get an exception as below:

ERROR [CoreContainer] Unable to create core: mycore_sh1: org.apache.solr.common.SolrException: Plugin init failure for [schema.xml] fieldType text_shingle: Plugin init failure for [schema.xml] analyzer/filter: Error instantiating class: 'org.apache.lucene.analysis.shingle.ShingleFilterFactory'
  at org.apache.solr.util.plugin.AbstractPluginLoader.load(AbstractPluginLoader.java:177) [:4.4.0 1504776 - sarowe - 2013-07-19 02:58:35]
  at org.apache.solr.schema.IndexSchema.readSchema(IndexSchema.java:467) [:4.4.0 1504776 - sarowe - 2013-07-19 02:58:35]
  at org.apache.solr.schema.IndexSchema.init(IndexSchema.java:164) [:4.4.0 1504776 - sarowe - 2013-07-19 02:58:35]
  at org.apache.solr.schema.IndexSchemaFactory.create(IndexSchemaFactory.java:55) [:4.4.0 1504776 - sarowe - 2013-07-19 02:58:35]
  at org.apache.solr.schema.IndexSchemaFactory.buildIndexSchema(IndexSchemaFactory.java:69) [:4.4.0 1504776 - sarowe - 2013-07-19 02:58:35]
  at org.apache.solr.core.ZkContainer.createFromZk(ZkContainer.java:268) [:4.4.0 1504776 - sarowe - 2013-07-19 02:58:35]
  at org.apache.solr.core.CoreContainer.create(CoreContainer.java:655) [:4.4.0 1504776 - sarowe - 2013-07-19 02:58:35]
  at org.apache.solr.core.CoreContainer$1.call(CoreContainer.java:364) [:4.4.0 1504776 - sarowe - 2013-07-19 02:58:35]
  at org.apache.solr.core.CoreContainer$1.call(CoreContainer.java:356) [:4.4.0 1504776 - sarowe - 2013-07-19 02:58:35]
  at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) [:1.6.0_43]
  at java.util.concurrent.FutureTask.run(FutureTask.java:138) [:1.6.0_43]
  at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439) [:1.6.0_43]
  at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) [:1.6.0_43]
  at java.util.concurrent.FutureTask.run(FutureTask.java:138) [:1.6.0_43]
  at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895) [:1.6.0_43]
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918) [:1.6.0_43]
  at java.lang.Thread.run(Thread.java:662) [:1.6.0_43]
Caused by: org.apache.solr.common.SolrException: Plugin init failure for [schema.xml] analyzer/filter: Error instantiating class: 'org.apache.lucene.analysis.shingle.ShingleFilterFactory'

The same file works well with Solr 4.2. Please help.

Thanks,
Prasi
Re: Solr 4.4. creating an index that 4.3 can't read (but in LUCENE_43 mode)
On 8/7/2013 3:33 AM, Daniel Collins wrote: I had been running a Solr 4.3.0 index, which I upgraded to 4.4.0 (but hadn't changed LuceneVersion, so it was still using the LUCENE_43 codec). I then had to back-out and return to a 4.3 system, and got an error when it tried to read the index. Now, it was only a dev system, so not a problem, and normally I would use restore a backup anyway, but shouldn't this work? If I haven't changed the codec, then Solr 4.4 should be using the same code as 4.3, so the data should be compatible, no? Using an index from a newer version is never guaranteed, and usually will NOT work. The luceneMatchVersion setting doesn't typically affect index format, it usually affects how analysis and query parser components work - so you can tell Solr to use buggy behavior from an earlier release. Unless you actually change aspects of the codec (postings format, docvalues format, etc), Solr uses the Lucene codec defaults, which can (and usually does) change from release to release. Looking through the Lucene 4.4 CHANGES.txt file (not the Solr file), LUCENE-4936 looks like a change to the DocValues format. I can't tell from the description whether LUCENE-5035 is a format change or a change in how Lucene handles sorting in memory. The evidence I can find suggests that the format is still called Lucene42DocValuesFormat, but apparently it doesn't work the same. Thanks, Shawn
Re: Data Import from MYSQL and POSTGRESQL
On 8/7/2013 3:50 AM, Spadez wrote:

My issue is in the data-config.xml. I have put in two datasources; however, I am stuck on what to put for the driver values and the urls.

dataSource name=quot;mysqlquot; driver=quot;lt;driver url=url user=user password=pass /
dataSource name=quot;postgresqlquot; driver=quot;lt;driver url=url user=user password=pass/

Is anyone able to tell me what I should be putting for these values, please?

Here's my datasource, username and password redacted. Note that you should not be using quot; ... you should use actual quotes.

<dataSource type="JdbcDataSource" driver="com.mysql.jdbc.Driver" encoding="UTF-8"
            url="jdbc:mysql://${dih.request.dbHost}:3306/${dih.request.dbSchema}?zeroDateTimeBehavior=convertToNull"
            batchSize="-1" user="REDACTED" password="REDACTED"/>

I pass the hostname and DB name (schema) in as parameters on the URL when I call for a full-import.

Thanks,
Shawn
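As a sketch of what the two entries in the question might look like with the real JDBC driver class names (host, port, database name, and credentials below are placeholders, not values from the thread):

```xml
<dataSource name="mysql" type="JdbcDataSource"
            driver="com.mysql.jdbc.Driver"
            url="jdbc:mysql://dbhost:3306/mydb"
            user="user" password="pass"/>

<dataSource name="postgresql" type="JdbcDataSource"
            driver="org.postgresql.Driver"
            url="jdbc:postgresql://dbhost:5432/mydb"
            user="user" password="pass"/>
```

Each DIH entity then selects its source with dataSource="mysql" or dataSource="postgresql".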
Re: Field append
Thanks for the inquiry about "append two fields"; as a result I have added it as an example in Early Access Release #5 of my Solr 4.x Deep Dive book, in the chapter on update processors. Actually, there are several examples:

- Append One Field to Another with Comma and Space as Delimiter:

<updateRequestProcessorChain name="append-a-onto-b-delim">
  <processor class="solr.CloneFieldUpdateProcessorFactory">
    <str name="source">alpha_s</str>
    <str name="dest">beta_s</str>
  </processor>
  <processor class="solr.ConcatFieldUpdateProcessorFactory">
    <str name="fieldName">beta_s</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory" />
  <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>

- Append One Field to Another with Space as Delimiter:

<updateRequestProcessorChain name="append-a-onto-b-space">
  <processor class="solr.CloneFieldUpdateProcessorFactory">
    <str name="source">alpha_s</str>
    <str name="dest">beta_s</str>
  </processor>
  <processor class="solr.ConcatFieldUpdateProcessorFactory">
    <str name="fieldName">beta_s</str>
    <str name="delimiter"> </str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory" />
  <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>

- Append One Field to Another with No Delimiter:

<updateRequestProcessorChain name="append-a-onto-b">
  <processor class="solr.CloneFieldUpdateProcessorFactory">
    <str name="source">alpha_s</str>
    <str name="dest">beta_s</str>
  </processor>
  <processor class="solr.ConcatFieldUpdateProcessorFactory">
    <str name="fieldName">beta_s</str>
    <str name="delimiter"></str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory" />
  <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>

- Append One Field to Another with Space as Delimiter and Remove the Source Field:

<updateRequestProcessorChain name="append-a-onto-b-space-delete">
  <processor class="solr.CloneFieldUpdateProcessorFactory">
    <str name="source">alpha_s</str>
    <str name="dest">beta_s</str>
  </processor>
  <processor class="solr.ConcatFieldUpdateProcessorFactory">
    <str name="fieldName">beta_s</str>
    <str name="delimiter"> </str>
  </processor>
  <processor class="solr.IgnoreFieldUpdateProcessorFactory">
    <str name="fieldName">alpha_s</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory" />
  <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>

Let me know if even those examples do not cover your use case.

-- Jack Krupansky

From: Luís Portela Afonso
Sent: Monday, August 05, 2013 7:39 AM
To: solr-user@lucene.apache.org
Subject: Field append

Hi there,

Is it possible to append two fields in Solr? I would like to append two fields with a custom delimiter. Is that possible?

I saw something like a CloneFieldUpdateProcessor, but when I try to use it, Solr says that it cannot find the class. I saw that on the following site: https://issues.apache.org/jira/browse/SOLR-2599

In the comments I saw:

<processor class="solr.FieldCopyProcessorFactory">
  <str name="source">category</str>
  <str name="dest">category_s</str>
</processor>

But I'm not able to use that either. Once again, Solr says that it cannot find the class. Hope you can help in any way. Thanks
Re: DIH Problem: create multiple docs from a single entity
On Aug 7, 2013, at 18:10, Lee Carroll lee.a.carr...@googlemail.com wrote:

Hi

I have 2 tables with the following data:

table 1:
id   treatment_list
1    a,b
2    b,c

table 2:
treatment_id   name
a    name1
b    name2
c    name3

Using DIH, can I create an index of the form:

id   treatment_id   name
1    a    name1
1    b    name2
2    b    name2
2    c    name3

In short, can I split the comma-separated field and process each value as an entity? From the docs and the wiki I can't see anything obvious. I feel I'm missing something easy here. (Note: it's not my data, so I can't do anything about the dodgy CSV field.)

I think this is an SQL problem, rather than a DIH one. A quick Google shows several hits for splitting a string in SQL; I expect that it should be possible to come up with something that fits your purpose.
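If the database happens to be MySQL, one way to do that split in SQL, as the reply above suggests, is a join on FIND_IN_SET, so each (id, treatment) pair comes back as its own row and thus its own document. A hedged sketch of a DIH entity (table and column names follow the example in the question; the entity and field names are made up):

```xml
<entity name="treatment"
        query="SELECT t1.id, t2.id AS treatment_id, t2.name
               FROM table1 t1
               JOIN table2 t2 ON FIND_IN_SET(t2.id, t1.treatment_list) &gt; 0">
  <field column="id" name="id"/>
  <field column="treatment_id" name="treatment_id"/>
  <field column="name" name="name"/>
</entity>
```

Note that id alone is no longer unique per document here; if the schema's uniqueKey is id, you would need something like CONCAT(t1.id, t2.id) as the key instead.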
Re: Solr round ratings to nearest integer value
Thanks for this inquiry; as a result I have added a "round" JavaScript script for the StatelessScriptUpdate processor to Early Access Release #5 of my Solr 4.x Deep Dive book, in the chapter on update processors. The script takes a field name, a number of decimal digits to round to (default is 0), an output field (defaults to replacing the input field), and an option to convert the type of the rounded number to integer.

One thing I just noticed: your message indicates that 0.5 should round to 0, but that is not the standard definition of rounding. What is your true intention there?

The script can replace the original value with the rounded value, or preserve the original value and place a rounded copy in another field. What is your preference? (Well, the script supports both, anyway.) And I did give the script the integer option to convert the type, so that 1.0 would become 1 for Solr int fields.

-- Jack Krupansky

-----Original Message-----
From: Thyagaraj
Sent: Thursday, August 01, 2013 2:37 AM
To: solr-user@lucene.apache.org
Subject: Solr round ratings to nearest integer value

I'm using Solr 4.0 with the DIH JDBC connector, and I use the Solr Admin web interface for testing. I have a field called *ratings* which varies like 0, 0.3, 0.5, 0.75, 1, 1.5, 1.6... and so on, as per user input.

I found the link http://lucene.472066.n3.nabble.com/How-to-round-solr-score-td495198.html which is beyond my understanding, and I am unable to make use of it in my case. I just want to round these rating values to the nearest integer value through Solr, like:

0.3, 0.5 to 0
0.75, 1.5 to 1
1.6 to 2

Can anybody help by guiding me, please? Thank you!

--
View this message in context: http://lucene.472066.n3.nabble.com/Solr-round-ratings-to-nearest-integer-value-tp4081833.html
Sent from the Solr - User mailing list archive at Nabble.com.
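Jack's question about 0.5 is worth making concrete, since "round to nearest" has several boundary conventions. A small illustration, independent of Solr: the mapping requested in the original message (0.5 -> 0, 1.5 -> 1, 1.6 -> 2) corresponds to round-half-down, while Python's built-in round() sends halves to the nearest even integer:

```python
import math

def round_half_up(x):
    # 0.5 -> 1, 1.5 -> 2
    return int(math.floor(x + 0.5))

def round_half_down(x):
    # 0.5 -> 0, 1.5 -> 1, but 1.6 still -> 2
    return int(math.ceil(x - 0.5))

for v in (0.3, 0.5, 0.75, 1.5, 1.6):
    print(v, round_half_up(v), round_half_down(v), round(v))
```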
Re: Error loading class 'solr.ISOLatin1AccentFilterFactory'
On 8/7/2013 7:44 AM, Parul Gupta(Knimbus) wrote: I am trying to use solr.ISOLatin1AccentFilterFactory in solr4.3.1,But its giving error Error loading class 'solr.ISOLatin1AccentFilterFactory'. However its working fine in Solr3.6 This filter is deprecated. Here's the actual javadoc for this class in the latest 3.x version, which mentions what to use instead. http://lucene.apache.org/solr/3_6_2/org/apache/solr/analysis/ISOLatin1AccentFilterFactory.html You've now moved to a new major release, where all APIs that were deprecated in the previous major release are completely eliminated. The Solr wiki also contains this information. http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.ISOLatin1AccentFilterFactory Thanks, Shawn
Question about soft commit and updateRequestProcessorChain
If one allows for a soft commit (rather than a hard commit on each request), when does the updateRequestProcessorChain fire? Does it fire after the commit? Many thanks Jack
Internal shard communication - performance?
Hi, I use a system with solr 3 and 20 shards (3 million docs per shard). At a testsystem with one shard (60 million docs) I get 750 requests per second. At my live system (20 shards) I get 200 requests per second. Is the internal communication between the 20 shards a performance killer? Another question. Is a solr 4 system with solrcloud and Zookeeper a high availability system? Regards, Torsten
Data duplication using Cloud+HDFS+Mirroring
While testing Solr's new ability to store data and transaction directories in HDFS I added an additional core to one of my testing servers that was configured as a backup (active but not leader) core for a shard elsewhere. It looks like this extra core copies the data into its own directory rather than just using the existing directory with the data that's already available to it. Since HDFS likely already has redundancy of the data covered via the replicationFactor is there a reason for non-leader cores to create their own data directory rather than doing reads on the existing master copy? I searched Jira for anything that suggests this behavior might change and didn't find any issues; is there any intent to address this? Thanks, Greg
Some highlighted snippets aren't being returned
Hi Everyone,

I'm facing an issue in which my Solr query is returning highlighted snippets for some, but not all, results. For reference, I'm searching through an index that contains web crawls of human-rights-related websites. I'm running Solr as a webapp under Tomcat, and I've included the query's Solr params from the Tomcat log:

... webapp=/solr-4.2 path=/select params={facet=true&sort=score+desc&group.limit=10&spellcheck.q=Unangan&f.mimetype_code.facet.limit=7&hl.simple.pre=<code>&q.alt=*:*&f.organization_type__facet.facet.limit=6&f.language__facet.facet.limit=6&hl=true&f.date_of_capture_.facet.limit=6&group.field=original_url&hl.simple.post=</code>&facet.field=domain&facet.field=date_of_capture_&facet.field=mimetype_code&facet.field=geographic_focus__facet&facet.field=organization_based_in__facet&facet.field=organization_type__facet&facet.field=language__facet&facet.field=creator_name__facet&hl.fragsize=600&f.creator_name__facet.facet.limit=6&facet.mincount=1&qf=text^1&hl.fl=contents&hl.fl=title&hl.fl=original_url&wt=ruby&f.geographic_focus__facet.facet.limit=6&defType=edismax&rows=10&f.domain.facet.limit=6&q=Unangan&f.organization_based_in__facet.facet.limit=6&q.op=AND&group=true&hl.usePhraseHighlighter=true} hits=8 status=0 QTime=108 ...

For the query above (which can be simplified to say: find all documents that contain the word "unangan" and return facets, highlights, etc.), I get five search results. Only three of these are returning highlighted snippets.
Here's the highlighting portion of the Solr response (note: printed in Ruby notation because I'm receiving this response in a Rails app):

highlighting = {
  "20100602195444/http://www.kontras.org/uu_ri_ham/UU%20Nomor%2023%20Tahun%202002%20tentang%20Perlindungan%20Anak.pdf" => {},
  "20100902203939/http://www.kontras.org/uu_ri_ham/UU%20Nomor%2023%20Tahun%202002%20tentang%20Perlindungan%20Anak.pdf" => {},
  "20111202233029/http://www.kontras.org/uu_ri_ham/UU%20Nomor%2023%20Tahun%202002%20tentang%20Perlindungan%20Anak.pdf" => {},
  "20100618201646/http://www.komnasham.go.id/portal/files/39-99.pdf" => {contents => [...actual snippet is returned here...]},
  "20100902235358/http://www.komnasham.go.id/portal/files/39-99.pdf" => {contents => [...actual snippet is returned here...]},
  "20110302213056/http://www.komnasham.go.id/publikasi/doc_download/2-uu-no-39-tahun-1999" => {contents => [...actual snippet is returned here...]},
  "20110302213102/http://www.komnasham.go.id/publikasi/doc_view/2-uu-no-39-tahun-1999?tmpl=component&format=raw" => {contents => [...actual snippet is returned here...]},
  "20120303113654/http://www.iwgia.org/iwgia_files_publications_files/0028_Utimut_heritage.pdf" => {}
}

I have eight (as opposed to five) results above because I'm also doing a grouped query, grouping by a field called original_url, and this leads to five grouped results.

I've confirmed that my highlight-lacking results DO contain the word "unangan", as expected, and this term appears in a text field that's indexed and stored, and is searched for all text searches. For example, one of the search results is for a crawl of this document: http://www.iwgia.org/iwgia_files_publications_files/0028_Utimut_heritage.pdf And if you view that document on the web, you'll see that it does contain "unangan".

Has anyone seen this before? And does anyone have any good suggestions for troubleshooting/fixing the problem?

Thanks!
- Eric
Re: Internal shard communication - performance?
Three zookeepers give you bare minimum high availability - one can go down. But... I would personally assert that running embedded zookeeper is inherently not high availability, just by definition (okay, by MY definition.) You didn't say whether you were running embedded zookeeper or not. But if you were, to be HA, your cluster should be able to have all but one node per shard go down and your cluster should still service both queries and updates. But with embedded zookeeper on a four-node cluster, taking down two of the nodes running embedded zookeeper would make zookeeper no longer usable, and hence your cluster would not be HA. -- Jack Krupansky -Original Message- From: Torsten Albrecht Sent: Wednesday, August 07, 2013 1:15 PM To: solr-user Subject: Internal shard communication - performance? Hi, I use a system with solr 3 and 20 shards (3 million docs per shard). At a testsystem with one shard (60 million docs) I get 750 requests per second. At my live system (20 shards) I get 200 requests per second. Is the internal communication between the 20 shards a performance killer? Another question. Is a solr 4 system with solrcloud and Zookeeper a high availability system? Regards, Torsten
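The availability arithmetic behind Jack's point: ZooKeeper needs a strict majority of the ensemble to be up, so an ensemble of n nodes tolerates floor((n - 1) / 2) failures. A quick sketch:

```python
def zk_failure_tolerance(ensemble_size):
    """ZooKeeper stays available while a strict majority of nodes is up."""
    return (ensemble_size - 1) // 2

for n in (1, 2, 3, 4, 5, 6):
    print(n, zk_failure_tolerance(n))
# A 3-node ensemble tolerates 1 failure; note that 4 nodes also tolerate
# only 1, which is why odd-sized ensembles are the usual recommendation.
```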
RE: external zookeeper with SolrCloud
I started looking into what I might have missed while upgrading to Solr 4.4, and I noticed that solr.xml in Solr 4.4 has this:

<solr>
  <solrcloud>
    <str name="host">${host:}</str>
    <int name="hostPort">${jetty.port:8983}</int>
    <str name="hostContext">${hostContext:solr}</str>
    <int name="zkClientTimeout">${zkClientTimeout:15000}</int>
    <bool name="genericCoreNodeNames">${genericCoreNodeNames:true}</bool>
  </solrcloud>
  <shardHandlerFactory name="shardHandlerFactory" class="HttpShardHandlerFactory">
    <int name="socketTimeout">${socketTimeout:0}</int>
    <int name="connTimeout">${connTimeout:0}</int>
  </shardHandlerFactory>
</solr>

While our solr.xml has this:

<solr persistent="true">
  <cores adminPath="/admin/cores" defaultCoreName="collection1" host="${host:}" hostPort="${jetty.port:8983}" hostContext="${hostContext:solr}" zkClientTimeout="${zkClientTimeout:15000}">
    <core name="collection1" instanceDir="collection1" shard="${shard:}" dataDir="${solr.data.dir}"/>
  </cores>
</solr>

Do you think not having shardHandlerFactory is causing this bug to appear on our end? -Original Message- From: Raymond Wiker [mailto:rwi...@gmail.com] Sent: Wednesday, August 07, 2013 8:29 AM To: solr-user@lucene.apache.org Subject: Re: external zookeeper with SolrCloud You said earlier that you had 6 zookeeper instances, but the zkHost param only shows 5 instances... is that correct? On Tue, Aug 6, 2013 at 11:23 PM, Joshi, Shital shital.jo...@gs.com wrote: Machines are definitely up. Solr4 node and zookeeper instance share the machine. We're using -DzkHost=zk1,zk2,zk3,zk4,zk5 to let solr nodes know about the zk instances. -Original Message- From: Erick Erickson [mailto:erickerick...@gmail.com] Sent: Tuesday, August 06, 2013 5:03 PM To: solr-user@lucene.apache.org Subject: Re: external zookeeper with SolrCloud First off, even 6 ZK instances are overkill, vast overkill. 3 should be more than enough. That aside, however, how are you letting your Solr nodes know about the zk machines? 
Is it possible you've pointed some of your Solr nodes at specific ZK machines that aren't up when you have this problem? I.e. -zkHost=zk1,zk2,zk3 Best Erick On Tue, Aug 6, 2013 at 4:56 PM, Joshi, Shital shital.jo...@gs.com wrote: Hi, We have SolrCloud (4.4.0) cluster (5 shards and 2 replicas) on 10 boxes. We have 6 zookeeper instances. We are planning to change to odd number of zookeeper instances. With Solr 4.3.0, if all zookeeper instances are not up, the solr4 node never connects to zookeeper (can't see the admin page) until all zookeeper instances are up and we restart all solr nodes. It was suggested that it could be due to this bug https://issues.apache.org/jira/browse/SOLR-4899 and this bug is solved in Solr 4.4. We upgraded to Solr 4.4 but still see this issue. We brought up 4 out of 6 zookeeper instances and then brought up all ten Solr4 nodes. We kept seeing this exception in Solr logs:

751395 [main-SendThread] WARN org.apache.zookeeper.ClientCnxn ? Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
  at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
  at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:567)
  at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:350)
  at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1068)

And after a while saw this exception. 
INFO - 2013-08-05 22:24:07.582; org.apache.solr.common.cloud.ConnectionManager; Watcher org.apache.solr.common.cloud.ConnectionManager@5140709name:ZooKeeperConnection Watcher: qa-zk1.services.gs.com,qa-zk2.services.gs.com,qa-zk3.services.gs.com, qa-zk4.services.gs.com,qa-zk5.services.gs.com,qa-zk6.services.gs.com got event WatchedEvent state:SyncConnected type:None path:null path:null type:None INFO - 2013-08-05 22:24:07.662; org.apache.solr.common.cloud.ConnectionManager; Client-ZooKeeper status change trigger but we are already closed 754311 [main-EventThread] INFO org.apache.solr.common.cloud.ConnectionManager ? Client-ZooKeeper status change trigger but we are already closed We brought up all zookeeper instances but the cloud never came up until all solr nodes were restarted. Do we need to change any settings? After weekend reboot, all zookeeper instances come up one by one. While zookeeper instances are coming up solr nodes are also getting started. With this issue, we have to put checks to make sure all zookeeper instances are up before we bring up any solr node. Thanks!! -Original Message- From: Mark Miller [mailto:markrmil...@gmail.com] Sent: Tuesday, June 11, 2013 10:42 AM To: solr-user@lucene.apache.org Subject: Re: external zookeeper with SolrCloud On Jun 11, 2013, at 10:15 AM, Joshi, Shital shital.jo...@gs.com wrote: Thanks Mark.
Re: DIH Problem: create multiple docs from a single entity
I suppose you can use Substring and Charindex to perform your task at the SQL level, then use the value in another entity in DIH. -- View this message in context: http://lucene.472066.n3.nabble.com/DIH-Problem-create-multiple-docs-from-a-single-entity-tp4083050p4083106.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Solr 4.4. creating an index that 4.3 can't read (but in LUCENE_43 mode)
It does seem that the Lucene42DocValuesProducer has changed its internal version, and that is what it's complaining about. Cheers Shawn. Ok, my misunderstanding on the codec stuff then; as I said, probably not a common occurrence, but good to know. On 7 August 2013 17:32, Shawn Heisey s...@elyograg.org wrote: On 8/7/2013 3:33 AM, Daniel Collins wrote: I had been running a Solr 4.3.0 index, which I upgraded to 4.4.0 (but hadn't changed LuceneVersion, so it was still using the LUCENE_43 codec). I then had to back-out and return to a 4.3 system, and got an error when it tried to read the index. Now, it was only a dev system, so not a problem, and normally I would restore a backup anyway, but shouldn't this work? If I haven't changed the codec, then Solr 4.4 should be using the same code as 4.3, so the data should be compatible, no? Using an index from a newer version is never guaranteed, and usually will NOT work. The luceneMatchVersion setting doesn't typically affect index format, it usually affects how analysis and query parser components work - so you can tell Solr to use buggy behavior from an earlier release. Unless you actually change aspects of the codec (postings format, docvalues format, etc), Solr uses the Lucene codec defaults, which can (and usually does) change from release to release. Looking through the Lucene 4.4 CHANGES.txt file (not the Solr file), LUCENE-4936 looks like a change to the DocValues format. I can't tell from the description whether LUCENE-5035 is a format change or a change in how Lucene handles sorting in memory. The evidence I can find suggests that the format is still called Lucene42DocValuesFormat, but apparently it doesn't work the same. Thanks, Shawn
Re: DIH Problem: create multiple docs from a single entity
Hello Lee, Unfortunately no. It's possible to read a CSV field via http://wiki.apache.org/solr/DataImportHandler#FieldReaderDataSource but there is no CSV-like EntityProcessor that can break a line into entities. Transformers cannot emit new entities. On Wed, Aug 7, 2013 at 8:10 PM, Lee Carroll lee.a.carr...@googlemail.com wrote: Hi I've 2 tables with the following data

table 1
id  treatment_list
1   a,b
2   b,c

table 2
treatment_id  name
a             name1
b             name2
c             name3

Using DIH can you create an index of the form

id  treatment_id  name
1   a             name1
1   b             name2
2   b             name2
2   c             name3

In short, can I split the comma-separated field and process each value as an entity? From the docs and the wiki I can't see anything obvious. I feel I'm missing something easy here. (Note it's not my data, so I can't do anything with the dodgy CSV field.) -- Sincerely yours Mikhail Khludnev Principal Engineer, Grid Dynamics http://www.griddynamics.com mkhlud...@griddynamics.com
How to parse multivalued data into single valued fields?
Hi, I'm currently using solr 4.0 final with Manifoldcf v1.3 dev. I have multivalued titles (the names are all the same so far) that must go into a single valued field. Can a transformer do this? Can anyone show me how to do it? And this has to fire off before an update chain takes place. Thanks, -- View this message in context: http://lucene.472066.n3.nabble.com/How-to-parse-multivalued-data-into-single-valued-fields-tp4083108.html Sent from the Solr - User mailing list archive at Nabble.com.
TermFrequency in a multi-valued field
This might end up being more of a Lucene question, but anyway... For a multivalued field, it appears that term frequency is calculated as something a little like: sum(tf(value1), ..., tf(valueN)) I'd rather my score not give preference based on how *many* of the values in the multivalued field matched, I want it to give preference based on the value that matched *best*. In other words, something more like: max(tf(value1), ..., tf(valueN)) Put another way, I want a search like q=mvf:foo against a document with a multivalued field: mvf: [ foo ] to get scored the exact same as a document with a multivalued field: mvf: [ foo, foo ] but worse than a document with a multivalued field: mvf: [ foo foo ] I'm guessing this'd require a custom Similarity implementation, but I'm beginning to wonder if even that is low enough level. Other thoughts? This seems like a pretty obvious desire. Thanks.
Re: How to parse multivalued data into single valued fields?
before an update chain Really? Why? And if so, then you will definitely have to deal with it before handing the data to Solr since the update chain is where preprocessing of input data normally happens for updates in Solr. Be specific as to what processing you want to occur. Provide an example if you can. -- Jack Krupansky -Original Message- From: eShard Sent: Wednesday, August 07, 2013 3:10 PM To: solr-user@lucene.apache.org Subject: How to parse multivalued data into single valued fields? Hi, I'm currently using solr 4.0 final with Manifoldcf v1.3 dev. I have multivalued titles (the names are all the same so far) that must go into a single valued field. Can a transformer do this? Can anyone show me how to do it? And this has to fire off before an update chain takes place. Thanks, -- View this message in context: http://lucene.472066.n3.nabble.com/How-to-parse-multivalued-data-into-single-valued-fields-tp4083108.html Sent from the Solr - User mailing list archive at Nabble.com.
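As a sketch of the kind of preprocessing Jack describes — collapsing a multivalued input down to a single value inside the update chain — a solrconfig.xml chain along these lines could work. The chain and field names here are illustrative assumptions, and the chain still has to be selected via the update.chain request parameter (or made the handler's default) to take effect:

```xml
<!-- sketch: keep only the first of several incoming title values so the
     document fits a single-valued schema field -->
<updateRequestProcessorChain name="single-title">
  <processor class="solr.FirstFieldValueUpdateProcessorFactory">
    <str name="fieldName">title</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>
```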
Re: TermFrequency in a multi-valued field
A multivalued text field is directly equivalent to concatenating the values, with a possible position gap between the last and first terms of adjacent values. Term frequency is driven by the terms from the query, not the terms from the field (tf(query-term), not tf(field-term)). Your max formula doesn't quite make sense in that sense. Why do you have two foo in the same field if you don't mean them to be... two foo?? You can use the Uniq update processor to eliminate duplicate values in multivalued fields (where the whole value matches, not individual terms within values.) You need to clarify your use case. -- Jack Krupansky -Original Message- From: Jeff Wartes Sent: Wednesday, August 07, 2013 4:05 PM To: solr-user@lucene.apache.org Subject: TermFrequency in a multi-valued field This might end up being more of a Lucene question, but anyway... For a multivalued field, it appears that term frequency is calculated as something a little like: sum(tf(value1), ..., tf(valueN)) I'd rather my score not give preference based on how *many* of the values in the multivalued field matched, I want it to give preference based on the value that matched *best*. In other words, something more like: max(tf(value1), ..., tf(valueN)) Put another way, I want a search like q=mvf:foo against a document with a multivalued field: mvf: [ foo ] to get scored the exact same as a document with a multivalued field: mvf: [ foo, foo ] but worse than a document with a multivalued field: mvf: [ foo foo ] I'm guessing this'd require a custom Similarity implementation, but I'm beginning to wonder if even that is low enough level. Other thoughts? This seems like a pretty obvious desire. Thanks.
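For reference, the Uniq processor mentioned above is configured roughly like this in a 4.x-era solrconfig.xml. The mvf field name follows the example in the question, and the exact parameter form has varied between releases, so check the javadoc for your version:

```xml
<updateRequestProcessorChain name="uniq-values">
  <!-- drops duplicate whole values within each listed multivalued field -->
  <processor class="solr.UniqFieldsUpdateProcessorFactory">
    <lst name="fields">
      <str>mvf</str>
    </lst>
  </processor>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>
```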
Re: Internal shard communication - performance?
Hi Jack, I would like to run zookeeper external at my old master server. So I have two zookeeper to control my cloud. The third and fourth zookeeper will be a virtual machine. Torsten Von: Jack Krupansky Gesendet: ?Mittwoch?, ?7?. ?August? ?2013 ?20?:?05 An: solr-user@lucene.apache.org Three zookeepers give you bare minimum high availability - one can go down. But... I would personally assert that running embedded zookeeper is inherently not high availability, just by definition (okay, by MY definition.) You didn't say whether you were running embedded zookeeper or not. But if you were, to be HA, your cluster should be able to have all but one node per shard go down and your cluster should still service both queries and updates. But with embedded zookeeper on a four-node cluster, taking down two of the nodes running embedded zookeeper would make zookeeper no longer usable, and hence your cluster would not be HA. -- Jack Krupansky -Original Message- From: Torsten Albrecht Sent: Wednesday, August 07, 2013 1:15 PM To: solr-user Subject: Internal shard communication - performance? Hi, I use a system with solr 3 and 20 shards (3 million docs per shard). At a testsystem with one shard (60 million docs) I get 750 requests per second. At my live system (20 shards) I get 200 requests per second. Is the internal communication between the 20 shards a performance killer? Another question. Is a solr 4 system with solrcloud and Zookeeper a high availability system? Regards, Torsten
Filtering suggestion results
Hi, I have a question regarding the suggester component. Can we filter suggestion results depending on a particular value of a field, e.g. fq=column1:value1? -- View this message in context: http://lucene.472066.n3.nabble.com/Filtering-suggestion-results-tp4083121.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: TermFrequency in a multi-valued field
A multivalued text field is directly equivalent to concatenating the values, with a possible position gap between the last and first terms of adjacent values. That, in a nutshell, would be the problem. Maybe the discussion is over at this point. It could be I dumbed down the problem a bit too much for illustration purposes. I'm actually doing phrase query matches with slop. As such, the search phrase I'm interested in could easily be in more than one of the (unique) values, and the score for each value-match could be very different when considered alone. For document scoring purposes, I don't care that (for example) I got a sloppy match on one value if I got a nice phrase out of another value in the same document. In fact, I explicitly want to ignore the fact that there was also a sloppy match. I also don't care if the exact phrase occurred in more than one value, and I don't want the case where it does match more than one influencing that document's score.
logging UI stops working when additional handlers defined
I run Solr on Tomcat with JUL configured to log Solr to a separate file:

org.apache.solr.level = INFO
org.apache.solr.handlers = 4solrerr.org.apache.juli.FileHandler

I've noticed that the logging UI stops working. Is this normal behavior or is it a bug? (When cores are initialized, JulWatcher is registered only for the root logger.) -- Grzegorz Sobczyk
Is it possible to use phrase query in range queries?
I am trying to use range queries to take advantage of having constant scores in a multivalued field, but I am not sure if range queries support phrase queries. Ex: The below range query works fine:

<str name="q">_query_:address:([Charlotte TO Charlotte])^5.5</str>

The below query doesn't work:

<str name="q">_query_:address:([Charlotte NC TO Charlotte NC])^5.5</str>

-- View this message in context: http://lucene.472066.n3.nabble.com/Is-it-possible-to-use-phrase-query-in-range-queries-tp4083132.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: [POLL] Who how does use admin-extra ?
Hmmm .. Didn't get at least one answer (except from Shawn in #solr, telling me he's using a 0 byte file to avoid errors :p) - does that mean, that really no one is using it? Don't be afraid .. tell me, one way or another :) - Stefan On Wednesday, July 17, 2013 at 8:50 AM, Stefan Matheis wrote: Hey List I would be interested to hear who is using admin-extra Functionality in the 4.x UI and especially _how_ that is used: for displaying graphs, providing links for $other_tool, adding other menu items … ? The main reason i'm asking is .. i don't use it myself and i'm always curious while i have to touch it. I can test the example we provide, but that is very basic and doesn't necessarily reflect real-world scenarios. So .. tell me - I'm happy to hear everything .. reports on usage, suggestions for improvements, … :) - Stefan
Obtain shard routing key during document insert
Is it possible to obtain the shard routing key from within an UpdateRequestProcessor when a document is being inserted? Many thanks, Terry
Re: [POLL] Who how does use admin-extra ?
Didn't somebody once say this is used for customization of admin pages? Otis -- SOLR Performance Monitoring -- http://sematext.com/spm Solr ElasticSearch Support -- http://sematext.com/ On Thu, Aug 8, 2013 at 12:24 AM, Stefan Matheis matheis.ste...@gmail.com wrote: Hmmm .. Didn't get at least one answer (except from Shawn in #solr, telling me he's using a 0 byte file to avoid errors :p) - does that mean, that really no one is using it? Don't be afraid .. tell me, one way or another :) - Stefan On Wednesday, July 17, 2013 at 8:50 AM, Stefan Matheis wrote: Hey List I would be interested to hear who is using admin-extra Functionality in the 4.x UI and especially _how_ that is used: for displaying graphs, providing links for $other_tool, adding other menu items … ? The main reason i'm asking is .. i don't use it myself and i'm always curious while i have to touch it. I can test the example we provide, but that is very basic and doesn't necessarily reflect real-world scenarios. So .. tell me - I'm happy to hear everything .. reports on usage, suggestions for improvements, … :) - Stefan
Re: Is it possible to use phrase query in range queries?
The ends of a range query are indeed single terms - they are not queries or any term that would analyze into multiple terms. In some cases you might want composite values as strings so that you can do a range on terms. For example, city + , + state as a string. -- Jack Krupansky -Original Message- From: SolrLover Sent: Wednesday, August 07, 2013 5:53 PM To: solr-user@lucene.apache.org Subject: Is it possible to use phrase query in range queries? I am trying to use range queries to take advantage of having constant scores in a multivalued field, but I am not sure if range queries support phrase queries. Ex: The below range query works fine:

<str name="q">_query_:address:([Charlotte TO Charlotte])^5.5</str>

The below query doesn't work:

<str name="q">_query_:address:([Charlotte NC TO Charlotte NC])^5.5</str>

-- View this message in context: http://lucene.472066.n3.nabble.com/Is-it-possible-to-use-phrase-query-in-range-queries-tp4083132.html Sent from the Solr - User mailing list archive at Nabble.com.
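Jack's composite-value idea could be sketched like this in schema.xml — the citystate field name is an assumption, and the "city, state" concatenation would have to be done by the indexing client or an update processor:

```xml
<!-- sketch: a single-token string field holding composite values such as
     "Charlotte, NC", so a range query compares whole composite terms -->
<field name="citystate" type="string" indexed="true" stored="true" multiValued="true"/>
```

A query such as citystate:["Charlotte, NC" TO "Charlotte, NC"] then ranges over whole composite terms instead of individual words.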
How to pass ranges / values to form query in search component?
Hi, I am currently passing the query by passing values to my search component. For ex: http://localhost:8983/solr/select?firstname=charles&lastname=dawson&qt=person The person search component is configured to accept the values and form the query:

<str name="q">( _query_:{!wp_dismax qf=fname^8.3 v=$firstname} OR _query_:{!wp_dismax qf=lname^8.6 v=$lastname} )</str>

Now I am trying to figure out a way to pass values / ranges like below, but I am getting syntax errors. Ex:

<str name="q">( _query_:{!v=fname:$firstname} OR _query_:{!v=fname:([$firstname to $firstname])^8.3} )</str>

Can someone let me know if there's a way to overcome this issue? -- View this message in context: http://lucene.472066.n3.nabble.com/How-to-pass-ranges-values-to-form-query-in-search-component-tp4083141.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Internal shard communication - performance?
On 8/7/2013 2:45 PM, Torsten Albrecht wrote: I would like to run zookeeper external at my old master server. So I have two zookeeper to control my cloud. The third and fourth zookeeper will be a virtual machine. For true HA with ZooKeeper, you need at least three instances on separate physical hardware. If you want to use VMs, that would be fine, but you must ensure that you aren't running more than one instance on the same physical server. For best results, use an odd number of ZK instances. With three ZK instances, one can go down and everything still works. With five, two can go down and everything still works. If you've got a fully switched network that's at least gigabit speed, then the network latency involved in internal communication shouldn't really matter. Thanks, Shawn
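A minimal three-instance ensemble along the lines Shawn describes might use a zoo.cfg like the following on each machine (hostnames and paths are placeholders; each server additionally needs a myid file whose number matches its server.N line):

```properties
# sketch of zoo.cfg for a three-node external ZooKeeper ensemble
tickTime=2000
initLimit=10
syncLimit=5
dataDir=/var/lib/zookeeper
clientPort=2181
server.1=zk1.example.com:2888:3888
server.2=zk2.example.com:2888:3888
server.3=zk3.example.com:2888:3888
```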
Re: How to pass ranges / values to form query in search component?
Something smells fishy here... why do you think you need to do this using nested queries and parameter names? Sounds like you're engaging in premature complication. Try simpler approaches first. -- Jack Krupansky -Original Message- From: Noob Sent: Wednesday, August 07, 2013 6:45 PM To: solr-user@lucene.apache.org Subject: How to pass ranges / values to form query in search component? Hi, I am currently passing the query by passing values to my search component. For ex: http://localhost:8983/solr/select?firstname=charles&lastname=dawson&qt=person The person search component is configured to accept the values and form the query:

<str name="q">( _query_:{!wp_dismax qf=fname^8.3 v=$firstname} OR _query_:{!wp_dismax qf=lname^8.6 v=$lastname} )</str>

Now I am trying to figure out a way to pass values / ranges like below, but I am getting syntax errors. Ex:

<str name="q">( _query_:{!v=fname:$firstname} OR _query_:{!v=fname:([$firstname to $firstname])^8.3} )</str>

Can someone let me know if there's a way to overcome this issue? -- View this message in context: http://lucene.472066.n3.nabble.com/How-to-pass-ranges-values-to-form-query-in-search-component-tp4083141.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Question about soft commit and updateRequestProcessorChain
How are you allowing for a soft commit? IOW how are you triggering it? And what do you speculate the updateRequestProcessorChain has to do with soft commit? Best Erick On Wed, Aug 7, 2013 at 1:04 PM, Jack Park jackp...@topicquests.org wrote: If one allows for a soft commit (rather than a hard commit on each request), when does the updateRequestProcessorChain fire? Does it fire after the commit? Many thanks Jack
Re: Question about soft commit and updateRequestProcessorChain
Most update processor chains will be configured with the Run Update processor as the last processor of the chain. That's where the Lucene index update and optional commit would be done. -- Jack Krupansky -Original Message- From: Jack Park Sent: Wednesday, August 07, 2013 1:04 PM To: solr-user@lucene.apache.org Subject: Question about soft commit and updateRequestProcessorChain If one allows for a soft commit (rather than a hard commit on each request), when does the updateRequestProcessorChain fire? Does it fire after the commit? Many thanks Jack
Re: Solr design. Choose Cores or Shards?
On 8/6/2013 8:49 PM, manju16832003 wrote: My confusion: is it feasible to choose many cores, or to use shards? I do not have much experience with how shards work and what they are used for. I would like to know the suggestions :-) for a design like this. What are the implications if I were to choose to use many cores and handle stuff at the application level, calling different cores?

Although shards and cores refer to slightly different things, when it comes right down to it, it's difficult to separate the two concepts. Short version: Shards are implemented using cores. The long version follows below.

A core is a functionally complete Solr index. You can have more than one core per Solr instance. Multiple cores are discussed in the CoreAdmin wiki page: http://wiki.apache.org/solr/CoreAdmin

Shards refer to a concept in distributed search. The index is divided into pieces. The request comes in to Solr. Solr forwards the request to each shard. It then combines each shard's results into a single result, pulls the requested fields out of each shard, and sends the response to the requester.

If you are planning a new deployment of a sharded index, you probably will want to use SolrCloud. It's possible to use shards without SolrCloud, but SolrCloud automates everything and makes it MUCH easier.

In SolrCloud, a collection is a logical index. A collection is composed of one or more shards. It is perfectly acceptable to have only one shard in a collection, in which case it won't be using distributed search, but the following still applies: Each shard is composed of replicas. If your replicationFactor is 2, then when your cloud is operating normally, you'll have two replicas of each shard. If the replicationFactor is 5, then you'll have five replicas. One of those replicas will be elected as leader for that shard. You can have a replicationFactor of 1, in which case there will only be one copy, but it will not be a fault-tolerant setup. 
Now for the relationship between shards and cores: Each replica of a shard *IS* a core. All of the cores in a single collection will typically have the same configuration and schema. More info about SolrCloud: http://wiki.apache.org/solr/SolrCloud Thanks, Shawn
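To make the collection/shard/replica terminology above concrete: in SolrCloud a collection is normally created through the Collections API, for example (the host, collection name, and config name below are placeholders):

```text
http://localhost:8983/solr/admin/collections?action=CREATE&name=mycollection&numShards=3&replicationFactor=2&collection.configName=myconf
```

That call results in 3 shards times 2 replicas, i.e. 6 cores distributed across the nodes of the cluster.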
SOLR Copy field if no value on destination
Hi, Is it possible to copy the value of a field to another field if the destination doesn't have a value? An example: indexing an RSS feed. The feed has the fields link and guid, but sometimes guid is not present in the feed. I have a field named finalLink that I will copy values into. Now I want to copy guid to finalLink, but if guid has no value I want to copy link. My question is: is this possible just with the schema, processors, solrconfig.xml, and the data-config? Thanks a lot
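One possible approach, sketched here purely as an assumption about how this could be wired together (the chain name and processor ordering are illustrative): clone both guid and link into finalLink, then keep only the first value. Because guid is cloned first, it wins when present; otherwise link is the first (and only) value:

```xml
<updateRequestProcessorChain name="final-link">
  <processor class="solr.CloneFieldUpdateProcessorFactory">
    <str name="source">guid</str>
    <str name="dest">finalLink</str>
  </processor>
  <processor class="solr.CloneFieldUpdateProcessorFactory">
    <str name="source">link</str>
    <str name="dest">finalLink</str>
  </processor>
  <!-- finalLink now holds [guid?, link]; keep the first value only -->
  <processor class="solr.FirstFieldValueUpdateProcessorFactory">
    <str name="fieldName">finalLink</str>
  </processor>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>
```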
Re: Measuring SOLR performance
Hi Dmitry, The command seems good. Are you sure your shell is not doing something funny with the params? You could try: python solrjmeter.py -C g1,foo -c hour -x ./jmx/SolrQueryTest.jmx -a where g1 and foo are results of the individual runs, ie. something that was started and saved with '-R g1' and '-R foo' respectively so, for example, i have these comparisons inside '/var/lib/montysolr/different-java-settings/solrjmeter', so I am generating the comparison by: export SOLRJMETER_HOME=/var/lib/montysolr/different-java-settings/solrjmeter python solrjmeter.py -C g1,foo -c hour -x ./jmx/SolrQueryTest.jmx -a roman On Wed, Aug 7, 2013 at 10:03 AM, Dmitry Kan solrexp...@gmail.com wrote: Hi Roman, One more question. I tried to compare different runs (g1 vs cms) using the command below, but get an error. Should I attach some other param(s)? python solrjmeter.py -C g1,foo -c hour -x ./jmx/SolrQueryTest.jmx **ERROR** File solrjmeter.py, line 1427, in module main(sys.argv) File solrjmeter.py, line 1303, in main check_options(options, args) File solrjmeter.py, line 185, in check_options error(The folder '%s' does not exist % rf) File solrjmeter.py, line 66, in error traceback.print_stack() The folder '0' does not exist Dmitry On Wed, Aug 7, 2013 at 4:13 PM, Dmitry Kan solrexp...@gmail.com wrote: Hi Roman, Finally, this has worked! Thanks for quick support. The graphs look awesome. At least on the index sample :) It is quite easy to setup and run + possible to run directly on the shard server in background mode. my test run was: python solrjmeter.py -a -x ./jmx/SolrQueryTest.jmx -q ./queries/demo/demo.queries -s localhost -p 8983 -a --durationInSecs 60 -R foo -t /solr/statements -e statements Thanks! 
Dmitry On Wed, Aug 7, 2013 at 6:54 AM, Roman Chyla roman.ch...@gmail.com wrote: Hi Dmitry, I've modified the solrjmeter to retrieve data from under the core (the -t parameter) and the rest from the /solr/admin - I could test it only against 4.0, but it is there the same as 4.3 - it seems...so you can try the fresh checkout my test was: python solrjmeter.py -a -x ./jmx/SolrQueryTest.jmx -t /solr/collection1 -R foo -q ./queries/demo/* -p 9002 -s adsate Thanks! roman On Tue, Aug 6, 2013 at 9:46 AM, Dmitry Kan solrexp...@gmail.com wrote: Hi, Thanks for the clarification, Shawn! So with this in mind, the following work: http://localhost:8983/solr/statements/admin/system?wt=json http://localhost:8983/solr/statements/admin/mbeans?wt=json not copying their output to save space. Roman: is this something that should be set via -t parameter as well? Dmitry On Tue, Aug 6, 2013 at 4:34 PM, Shawn Heisey s...@elyograg.org wrote: On 8/6/2013 6:17 AM, Dmitry Kan wrote: Of three URLs you asked for, only the 3rd one gave response: snip The rest report 404. On Mon, Aug 5, 2013 at 8:38 PM, Roman Chyla roman.ch...@gmail.com wrote: Hi Dmitry, So I think the admin pages are different on your version of solr, what do you see when you request... ? http://localhost:8983/solr/admin/system?wt=json http://localhost:8983/solr/admin/mbeans?wt=json http://localhost:8983/solr/admin/cores?wt=json Unless you have a valid defaultCoreName set in your (old-style) solr.xml, the first two URLs won't work, as you've discovered. Without that valid defaultCoreName (or if you wanted info from a different core), you'd need to add a core name to the URL for them to work. The third one, which works for you, is a global handler for manipulating cores, so naturally it doesn't need a core name to function. The URL path for this handler is defined by solr.xml. Thanks, Shawn
Re: Document Similarity Algorithm at Solr/Lucene
Block-quoting and plagiarism are two different questions. Block-quoting is simple: break the text apart into sentences or even paragraphs and make them separate documents. Make facets of the post-analysis text. Now just pull counts of facets and block quotes will be clear. Mahout has a scalable implementation of n-gram based document similarity. It calculates distances between all documents and identifies clusters of similar documents. This is a much more general technique and may help you find obfuscated plagiarism. Lance On 07/23/2013 02:33 AM, Furkan KAMACI wrote: Hi; Sometimes a huge part of a document may exist in another document. As like in student plagiarism or quotation of a blog post at another blog post. Does Solr/Lucene or its libraries (UIMA, OpenNLP, etc.) has any class to detect it?
Re: Question about soft commit and updateRequestProcessorChain
Ok. So, running the update processor chain *is* the commit process? In answer to Erick's question: my habit, an old and apparently bad one, has been to call a hard commit at the end of each update. My question had to do with allowing soft commits to be controlled by settings in solrconfig.xml, say every 30 seconds or something like that (I really haven't studied such options yet). I ask this question because I add an additional call to the update processor, which, after running Lucene, the document is then sent outside to an agent network for further processing. I needed to know if the document was already committed by that time. I am inferring from here that the document has been committed after the first step in the update processor chain, even if that's based on a soft commit. Thanks! JackP On Wed, Aug 7, 2013 at 4:20 PM, Jack Krupansky j...@basetechnology.com wrote: Most update processor chains will be configured with the Run Update processor as the last processor of the chain. That's were the Lucene index update and optional commit would be done. -- Jack Krupansky -Original Message- From: Jack Park Sent: Wednesday, August 07, 2013 1:04 PM To: solr-user@lucene.apache.org Subject: Question about soft commit and updateRequestProcessorChain If one allows for a soft commit (rather than a hard commit on each request), when does the updateRequestProcessorChain fire? Does it fire after the commit? Many thanks Jack
Re: Question about soft commit and updateRequestProcessorChain
No and No... Commit has a life of its own. Autocommit can occur based on time and number of documents, independent of the update processor chain. For example, you can send a few updates with commitWithin and sit there idle issuing no commands, and then suddenly, after the commitWithin interval, the commit magically happens. CommitWithin is a recommended approach - just pick the desired time interval. Unless you have an explicit commit in your update command, there is no guarantee of Run Update doing a commit. No, the document is not committed after the first step in the update processor chain - Run Update is usually the last or next-to-last processor in the chain (next to last if you use the Log Update processor). IFF you requested commit, soft or hard, on your update command, the commit will occur at the Run Update processor step of the chain. -- Jack Krupansky -Original Message- From: Jack Park Sent: Wednesday, August 07, 2013 7:41 PM To: solr-user@lucene.apache.org Subject: Re: Question about soft commit and updateRequestProcessorChain Ok. So, running the update processor chain *is* the commit process? In answer to Erick's question: my habit, an old and apparently bad one, has been to call a hard commit at the end of each update. My question had to do with allowing soft commits to be controlled by settings in solrconfig.xml, say every 30 seconds or something like that (I really haven't studied such options yet). I ask this question because I add an additional step to the update processor chain which, after Lucene runs, sends the document outside to an agent network for further processing. I needed to know if the document was already committed by that time. I am inferring from here that the document has been committed after the first step in the update processor chain, even if that's based on a soft commit. Thanks!
JackP On Wed, Aug 7, 2013 at 4:20 PM, Jack Krupansky j...@basetechnology.com wrote: Most update processor chains will be configured with the Run Update processor as the last processor of the chain. That's where the Lucene index update and optional commit would be done. -- Jack Krupansky -Original Message- From: Jack Park Sent: Wednesday, August 07, 2013 1:04 PM To: solr-user@lucene.apache.org Subject: Question about soft commit and updateRequestProcessorChain If one allows for a soft commit (rather than a hard commit on each request), when does the updateRequestProcessorChain fire? Does it fire after the commit? Many thanks Jack
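The commitWithin behavior Jack describes is requested directly on the update message itself. A minimal sketch; the document shown is made up for illustration:

```xml
<!-- Sketch: an XML update asking Solr to commit within 30 seconds.
     The id/field values are illustrative. -->
<add commitWithin="30000">
  <doc>
    <field name="id">doc-1</field>
    <field name="title_s">example title</field>
  </doc>
</add>
```

Nothing in the update processor chain fires again when the deferred commit later happens; the chain runs once, when the add is processed.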
Re: SOLR Copy field if no value on destination
Yes, it is possible to copy from a field to another field that has no value. In fact, that is the only kind of copy you should be doing unless the field is multivalued. IOW, copy field is not “replace field”. -- Jack Krupansky From: Luís Portela Afonso Sent: Wednesday, August 07, 2013 7:22 PM To: solr-user@lucene.apache.org Subject: SOLR Copy field if no value on destination Hi, Is it possible to copy the value of a field to another if the destination doesn't have a value? An example:
- Indexing an RSS feed
- The feed has the fields link and guid, but sometimes guid may not be present in the feed
- I have a field named finalLink that I will copy values into
Now I want to copy guid to finalLink, but if guid has no value I want to copy link. My question is: is that possible just with the schema, processors, solrconfig.xml, and the data-config? Thanks a lot
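The copyField mechanics Jack is describing look like this in schema.xml. A hedged sketch using the field names from the question; note that because copy field appends rather than replaces, copying both sources would leave two values in finalLink (so it would need to be multivalued, or filtered by an update processor as discussed later in the thread):

```xml
<!-- Sketch: copyField appends to the destination, it never replaces it. -->
<copyField source="guid" dest="finalLink"/>
<copyField source="link" dest="finalLink"/>
```

On its own this does not implement "use link only when guid is missing"; it just shows why plain copyField is not sufficient for the poster's fallback logic.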
Re: SOLR Copy field if no value on destination
Sorry, I am unable to untangle the logic you are expressing, but I can assure you that JavaScript and the StatelessScriptUpdate processor have full support for implementing spaghetti-code logic as tangled as desired! Simpler forms of logic can be implemented directly using non-script update processor sequences, but once you start adding conditionals, there is a 50% chance that you will need a script. There is a Default Value update processor, but it takes a literal value. Hmmm... maybe I’ll come up with a “default-value” script that takes a field name for the default value. IOW, it would copy a specified field to the destination IFF the destination had no value. Ahhh... wait... maybe... you could do this with the First Value Update processor: 1. Copy guid to finalLink (Clone Field Update processor). 2. Copy link to finalLink (Clone Field Update processor). 3. First Value Update processor. So, step 3 would leave link if guid was not there, or keep guid if it is there and discard link. Yes, that should do it. This is worth an example in the book! Thanks for the inspiration! -- Jack Krupansky From: Luís Portela Afonso Sent: Wednesday, August 07, 2013 7:22 PM To: solr-user@lucene.apache.org Subject: SOLR Copy field if no value on destination Hi, Is it possible to copy the value of a field to another if the destination doesn't have a value? An example:
- Indexing an RSS feed
- The feed has the fields link and guid, but sometimes guid may not be present in the feed
- I have a field named finalLink that I will copy values into
Now I want to copy guid to finalLink, but if guid has no value I want to copy link. My question is: is that possible just with the schema, processors, solrconfig.xml, and the data-config? Thanks a lot
Re: SOLR Copy field if no value on destination
Here's the actual update processor chain I used (and tested):

<updateRequestProcessorChain name="first-default-field">
  <processor class="solr.CloneFieldUpdateProcessorFactory">
    <str name="source">main_s</str>
    <str name="dest">final_s</str>
  </processor>
  <processor class="solr.CloneFieldUpdateProcessorFactory">
    <str name="source">backup_s</str>
    <str name="dest">final_s</str>
  </processor>
  <processor class="solr.FirstFieldValueUpdateProcessorFactory">
    <str name="fieldName">final_s</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory" />
  <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>

-- Jack Krupansky -Original Message- From: Jack Krupansky Sent: Wednesday, August 07, 2013 8:20 PM To: solr-user@lucene.apache.org Subject: Re: SOLR Copy field if no value on destination Sorry, I am unable to untangle the logic you are expressing, but I can assure you that JavaScript and the StatelessScriptUpdate processor have full support for implementing spaghetti-code logic as tangled as desired! Simpler forms of logic can be implemented directly using non-script update processor sequences, but once you start adding conditionals, there is a 50% chance that you will need a script. There is a Default Value update processor, but it takes a literal value. Hmmm... maybe I’ll come up with a “default-value” script that takes a field name for the default value. IOW, it would copy a specified field to the destination IFF the destination had no value. Ahhh... wait... maybe... you could do this with the First Value Update processor: 1. Copy guid to finalLink (Clone Field Update processor). 2. Copy link to finalLink (Clone Field Update processor). 3. First Value Update processor. So, step 3 would leave link if guid was not there, or keep guid if it is there and discard link. Yes, that should do it. This is worth an example in the book! Thanks for the inspiration!
-- Jack Krupansky From: Luís Portela Afonso Sent: Wednesday, August 07, 2013 7:22 PM To: solr-user@lucene.apache.org Subject: SOLR Copy field if no value on destination Hi, Is it possible to copy the value of a field to another if the destination doesn't have a value? An example:
- Indexing an RSS feed
- The feed has the fields link and guid, but sometimes guid may not be present in the feed
- I have a field named finalLink that I will copy values into
Now I want to copy guid to finalLink, but if guid has no value I want to copy link. My question is: is that possible just with the schema, processors, solrconfig.xml, and the data-config? Thanks a lot
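One detail the thread leaves implicit: a custom chain only runs when an update handler actually references it. A hedged sketch of the wiring in solrconfig.xml; the handler configuration is illustrative, not from the thread:

```xml
<!-- Sketch: selecting the chain by default for the /update handler. -->
<requestHandler name="/update" class="solr.UpdateRequestHandler">
  <lst name="defaults">
    <str name="update.chain">first-default-field</str>
  </lst>
</requestHandler>
```

Alternatively, a client can pick the chain per request with the `update.chain` request parameter instead of baking it into the handler defaults.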
Re: SOLR Copy field if no value on destination
Oh yeah. I have seen that processor in the book but was not able to remember it. Thanks a lot. And thanks a lot for your solution. It works :) On Aug 8, 2013, at 1:52 AM, Jack Krupansky j...@basetechnology.com wrote: Here's the actual update processor chain I used (and tested):

<updateRequestProcessorChain name="first-default-field">
  <processor class="solr.CloneFieldUpdateProcessorFactory">
    <str name="source">main_s</str>
    <str name="dest">final_s</str>
  </processor>
  <processor class="solr.CloneFieldUpdateProcessorFactory">
    <str name="source">backup_s</str>
    <str name="dest">final_s</str>
  </processor>
  <processor class="solr.FirstFieldValueUpdateProcessorFactory">
    <str name="fieldName">final_s</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory" />
  <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>

-- Jack Krupansky -Original Message- From: Jack Krupansky Sent: Wednesday, August 07, 2013 8:20 PM To: solr-user@lucene.apache.org Subject: Re: SOLR Copy field if no value on destination Sorry, I am unable to untangle the logic you are expressing, but I can assure you that JavaScript and the StatelessScriptUpdate processor have full support for implementing spaghetti-code logic as tangled as desired! Simpler forms of logic can be implemented directly using non-script update processor sequences, but once you start adding conditionals, there is a 50% chance that you will need a script. There is a Default Value update processor, but it takes a literal value. Hmmm... maybe I’ll come up with a “default-value” script that takes a field name for the default value. IOW, it would copy a specified field to the destination IFF the destination had no value. Ahhh... wait... maybe... you could do this with the First Value Update processor: 1. Copy guid to finalLink (Clone Field Update processor). 2. Copy link to finalLink (Clone Field Update processor). 3. First Value Update processor. So, step 3 would leave link if guid was not there, or keep guid if it is there and discard link. Yes, that should do it.
This is worth an example in the book! Thanks for the inspiration! -- Jack Krupansky From: Luís Portela Afonso Sent: Wednesday, August 07, 2013 7:22 PM To: solr-user@lucene.apache.org Subject: SOLR Copy field if no value on destination Hi, Is it possible to copy the value of a field to another if the destination doesn't have a value? An example:
- Indexing an RSS feed
- The feed has the fields link and guid, but sometimes guid may not be present in the feed
- I have a field named finalLink that I will copy values into
Now I want to copy guid to finalLink, but if guid has no value I want to copy link. My question is: is that possible just with the schema, processors, solrconfig.xml, and the data-config? Thanks a lot
Re: [POLL] Who how does use admin-extra ?
: Didn't somebody once say this is used for customization of admin pages? It can be, yes - that's why it originally existed. Stefan's question was whether anyone was actually using it for that. I used it quite a bit back in the day at CNET as a way to self-document what an instance was for, how to find internal documentation about the instance, what queries to use it for, etc., but I haven't personally taken advantage of it in the new UI (where there are more customization options instead of just a single header file). -Hoss
Re: [POLL] Who how does use admin-extra ?
I was thinking of using it to provide example queries in collections I give as examples. I also tested using it to inject a pop-out page that pulled bootstrap/angular from a CDN to provide a fancy interface to the local instance. It could have been useful if - say - the example distribution also had a couple of example queries in there, not just empty files (as of the last time I looked). On the other hand, I am not sure how to use the pre-/post- menu entries at all. I thought maybe they were injecting something into the main pane, but they seem to be straight links to somewhere else. Regards, Alex. Personal website: http://www.outerthoughts.com/ LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch - Time is the quality of nature that keeps events from happening all at once. Lately, it doesn't seem to be working. (Anonymous - via GTD book) On Wed, Aug 7, 2013 at 6:24 PM, Stefan Matheis matheis.ste...@gmail.com wrote: Hmmm .. Didn't get at least one answer (except from Shawn in #solr, telling me he's using a 0 byte file to avoid errors :p) - does that mean that really no one is using it? Don't be afraid .. tell me, one way or another :) - Stefan On Wednesday, July 17, 2013 at 8:50 AM, Stefan Matheis wrote: Hey List I would be interested to hear who is using the admin-extra functionality in the 4.x UI and especially _how_ it is used: for displaying graphs, providing links for $other_tool, adding other menu items … ? The main reason I'm asking is .. I don't use it myself and I'm always curious when I have to touch it. I can test the example we provide, but that is very basic and doesn't necessarily reflect real-world scenarios. So .. tell me - I'm happy to hear everything .. reports on usage, suggestions for improvements, … :) - Stefan
Re: Solr design. Choose Cores or Shards?
Hi Eric, Thanks for your reply. -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-design-Choose-Cores-or-Shards-tp4082930p4083178.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: [POLL] Who how does use admin-extra ?
The problem I saw was that the styles were all reset in a funny way, so it was hard to just use H1/H2/div/ul and have reasonable content show up. It was all small undifferentiated text. So, one had to inject a whole bootstrap/CSS reset to do something useful. And, of course, even that was non-trivial. Regards, Alex. Personal website: http://www.outerthoughts.com/ LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch - Time is the quality of nature that keeps events from happening all at once. Lately, it doesn't seem to be working. (Anonymous - via GTD book) On Wed, Aug 7, 2013 at 10:56 PM, Chris Hostetter hossman_luc...@fucit.org wrote: : Didn't somebody once say this is used for customization of admin pages? It can be, yes - that's why it originally existed. Stefan's question was whether anyone was actually using it for that. I used it quite a bit back in the day at CNET as a way to self-document what an instance was for, how to find internal documentation about the instance, what queries to use it for, etc., but I haven't personally taken advantage of it in the new UI (where there are more customization options instead of just a single header file). -Hoss
Re: Question about soft commit and updateRequestProcessorChain
I noticed the example solrconfig.xml has event listeners for commit. I wonder if they could be useful here:

<listener event="postCommit" class="solr.RunExecutableListener">

I am not sure how they work with hard/soft commits though. Regards, Alex. P.s. Just to make things complicated, UpdateRequestProcessors have a processCommit() method. But these seem to handle a commit 'request', not commit 'execution'. Personal website: http://www.outerthoughts.com/ LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch - Time is the quality of nature that keeps events from happening all at once. Lately, it doesn't seem to be working. (Anonymous - via GTD book) On Wed, Aug 7, 2013 at 7:51 PM, Jack Krupansky j...@basetechnology.com wrote: No and No... Commit has a life of its own. Autocommit can occur based on time and number of documents, independent of the update processor chain. For example, you can send a few updates with commitWithin and sit there idle issuing no commands, and then suddenly, after the commitWithin interval, the commit magically happens. CommitWithin is a recommended approach - just pick the desired time interval. Unless you have an explicit commit in your update command, there is no guarantee of Run Update doing a commit. No, the document is not committed after the first step in the update processor chain - Run Update is usually the last or next-to-last processor in the chain (next to last if you use the Log Update processor). IFF you requested commit, soft or hard, on your update command, the commit will occur at the Run Update processor step of the chain. -- Jack Krupansky -Original Message- From: Jack Park Sent: Wednesday, August 07, 2013 7:41 PM To: solr-user@lucene.apache.org Subject: Re: Question about soft commit and updateRequestProcessorChain Ok. So, running the update processor chain *is* the commit process? In answer to Erick's question: my habit, an old and apparently bad one, has been to call a hard commit at the end of each update.
My question had to do with allowing soft commits to be controlled by settings in solrconfig.xml, say every 30 seconds or something like that (I really haven't studied such options yet). I ask this question because I add an additional step to the update processor chain which, after Lucene runs, sends the document outside to an agent network for further processing. I needed to know if the document was already committed by that time. I am inferring from here that the document has been committed after the first step in the update processor chain, even if that's based on a soft commit. Thanks! JackP On Wed, Aug 7, 2013 at 4:20 PM, Jack Krupansky j...@basetechnology.com wrote: Most update processor chains will be configured with the Run Update processor as the last processor of the chain. That's where the Lucene index update and optional commit would be done. -- Jack Krupansky -Original Message- From: Jack Park Sent: Wednesday, August 07, 2013 1:04 PM To: solr-user@lucene.apache.org Subject: Question about soft commit and updateRequestProcessorChain If one allows for a soft commit (rather than a hard commit on each request), when does the updateRequestProcessorChain fire? Does it fire after the commit? Many thanks Jack
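The postCommit listener Alex mentions is configured in solrconfig.xml roughly as in the classic stock examples. A hedged sketch; the executable, directory, and arguments here are placeholders, not from the thread:

```xml
<!-- Sketch of a postCommit event listener; exe/dir values are placeholders. -->
<listener event="postCommit" class="solr.RunExecutableListener">
  <str name="exe">solr/bin/snapshooter</str>  <!-- external program to run -->
  <str name="dir">.</str>                     <!-- working directory -->
  <bool name="wait">true</bool>               <!-- block until it finishes -->
</listener>
```

For Jack Park's use case (notifying an agent network after commit), this fires on hard commits; whether it also suits soft commits would need testing, as Alex notes.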
Category and Subcategory handling in 4.4 version
Hi All, Our web application (e-commerce) requires primary and secondary categories on items. Based on this requirement I have the following queries: 1) How are category and subcategory handled in Solr version 4.4? I have used apache-solr-1.3.0 previously, but facets have undergone many big changes since then, so I just wanted to know how this can be achieved efficiently now. 2) Should category and subcategories be saved just in the database and referred to as fields in documents only for navigation? We will require categories for inventory counts and as search criteria. Let me know about this. Thanks in advance. -- View this message in context: http://lucene.472066.n3.nabble.com/Category-and-Subcategory-handling-in-4-4-version-tp4083188.html Sent from the Solr - User mailing list archive at Nabble.com.
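Since Solr 4.0, the usual answer for category/subcategory counts is pivot faceting. A hedged sketch of a search handler configured for it in solrconfig.xml; the handler name and field names are illustrative, and they assume indexed `category` and `subcategory` fields:

```xml
<!-- Sketch: facet defaults for category navigation; names are illustrative.
     facet.pivot returns subcategory counts nested under each category. -->
<requestHandler name="/browse" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="facet">true</str>
    <str name="facet.field">category</str>
    <str name="facet.pivot">category,subcategory</str>
    <str name="facet.mincount">1</str>
  </lst>
</requestHandler>
```

The same parameters can also be passed per request instead of being set as handler defaults, which keeps the category fields in the index usable for both navigation counts and search criteria.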