Re: solr spell correction help

2013-04-15 Thread Rohan Thakur
OK, thanks Jack, but then why does "cattle" not give "kettle" as a suggestion?


On Fri, Apr 12, 2013 at 6:46 PM, Jack Krupansky j...@basetechnology.comwrote:

 when I search for blandars it does not suggest blender

 They have an edit distance of 3. Direct Spell is limited to a maximum ED
 of 2.

 -- Jack Krupansky
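Jack's numbers are easy to check with a standalone sketch (plain dynamic-programming Levenshtein, not Solr's internal code):

```java
// Verify the edit distances discussed in this thread.
public class EditDistance {
    public static int levenshtein(String a, String b) {
        int[][] d = new int[a.length() + 1][b.length() + 1];
        for (int i = 0; i <= a.length(); i++) d[i][0] = i;
        for (int j = 0; j <= b.length(); j++) d[0][j] = j;
        for (int i = 1; i <= a.length(); i++) {
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                d[i][j] = Math.min(Math.min(d[i - 1][j] + 1,   // deletion
                                            d[i][j - 1] + 1),  // insertion
                                   d[i - 1][j - 1] + cost);    // substitution
            }
        }
        return d[a.length()][b.length()];
    }

    public static void main(String[] args) {
        // "blandar" -> "blender" is 2 edits (within DirectSolrSpellChecker's cap);
        // "blandars" -> "blender" is 3 (over the maximum of 2).
        System.out.println(levenshtein("blandar", "blender"));   // 2
        System.out.println(levenshtein("blandars", "blender"));  // 3
        System.out.println(levenshtein("cattle", "kettle"));     // 2
    }
}
```

Note that cattle -> kettle comes out at 2, within the cap, so the missing suggestion in that case is presumably down to something else (for example the configured string distance measure or frequency/threshold settings), not the edit-distance limit.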

 -Original Message- From: Rohan Thakur
 Sent: Friday, April 12, 2013 8:45 AM
 To: solr-user@lucene.apache.org
 Subject: solr spell correction help


 hi all

 I have configured Solr direct spell correction on the spell field. Solr
 corrects most words and gives suggestions, but for some words, like those
 mentioned below, it gives absurd results:

 1) blender(indexed)
 2) kettle(indexed)
 3) electric(indexed)

 problems:
 1) when I search for blandar it suggests the correction blender, but when
 I search for blandars it does not suggest blender

 2) when I search for kettle, which is spelled correctly, it still flags the
 spelling as incorrect but gives no suggestions, even though matching
 documents show up. When I search for cettle it correctly suggests kettle,
 but when I search for cattle it gives no suggestions at all.

 3) again, when I search for electric, which is the correct spelling, the
 suggestions section flags it as incorrect but offers no suggestions, and
 documents are returned since the spelling is correct.

 Also, if I want Solr to return samsung as a spelling suggestion when I
 search for sam, what would the configuration be? And what could be the
 solution for the above problems? Please help.

 thanks in advance

 regards
 Rohan



Re: solr spell correction help

2013-04-15 Thread Rohan Thakur
But Jack, I'm not using the Levenshtein distance measure; I'm using Jaro-Winkler
distance.


On Mon, Apr 15, 2013 at 11:50 AM, Rohan Thakur rohan.i...@gmail.com wrote:

 OK, thanks Jack, but then why does "cattle" not give "kettle" as a suggestion?


 On Fri, Apr 12, 2013 at 6:46 PM, Jack Krupansky 
 j...@basetechnology.comwrote:

  when I search for blandars it does not suggest blender

 They have an edit distance of 3. Direct Spell is limited to a maximum ED
 of 2.

 -- Jack Krupansky






Re: Does solr cloud support rename or swap function for collection?

2013-04-15 Thread Tim Vaillancourt

I added a brief description on CREATEALIAS here, feel free to tweak:

http://wiki.apache.org/solr/SolrCloud#Managing_collections_via_the_Collections_API

Tim

On 07/04/13 05:29 PM, Mark Miller wrote:

It's pretty simple - just as Brad said, it's just

http://localhost:8983/solr/admin/collections?action=CREATEALIAS&name=alias&collections=collection1,collection2,…

You also have action=DELETEALIAS

CREATEALIAS will create and update.

For update requests, you only want a 1to1 alias. For read requests, you can map 
1to1 or 1toN.
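A sketch of how the alias URLs from Mark's message are assembled (host, alias and collection names here are illustrative, not from the thread):

```java
// Build Collections API alias requests against a SolrCloud node.
public class AliasUrls {
    public static String createAlias(String base, String alias, String... collections) {
        // CREATEALIAS both creates a new alias and updates an existing one.
        return base + "/admin/collections?action=CREATEALIAS&name=" + alias
             + "&collections=" + String.join(",", collections);
    }

    public static String deleteAlias(String base, String alias) {
        return base + "/admin/collections?action=DELETEALIAS&name=" + alias;
    }

    public static void main(String[] args) {
        // A read alias may map to several collections (1-to-N);
        // an alias used for updates should map to exactly one (1-to-1).
        System.out.println(createAlias("http://localhost:8983/solr", "all",
                                       "collection1", "collection2"));
        System.out.println(deleteAlias("http://localhost:8983/solr", "all"));
    }
}
```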

I've also started work on shard level aliases, but I've yet to get back to 
finishing it.

- Mark

On Apr 7, 2013, at 5:10 PM, Tim Vaillancourtt...@elementspace.com  wrote:


I aim to use this feature more in testing soon. I'll be sure to document what I
can.

Cheers,

Tim

On 07/04/13 12:28 PM, Mark Miller wrote:

On Apr 7, 2013, at 9:44 AM, bradhill99bradhil...@yahoo.com   wrote:


Thanks Mark for this great feature but I suggest you can update the wiki
too.

Yeah, looking back, I've stopped updating the wiki for a while now - paralysis 
on how to handle versions (I didn't want to do the std 'this applies to 4.1', 
'this applies to 4.0' all over the page) and the current likely move to a new 
Confluence wiki with docs based on documentation LucidWorks recently donated to 
the project.

That's all a lot of work away still, I guess.

I'll try and add some basic doc for this to the SolrCloud wiki page soon.

- Mark


how to update document with DIH (FileDataSource)

2013-04-15 Thread Jeong-dae Ha
Hi, all

I am trying to index from both a database and files, where information from
the DB and a file together make up one document. So I decided to update the
documents which I have already indexed from the DB. Because there are
millions of files, I would like to use DIH, if I can find out how to update
a document with DIH. I need your help.

Thanks in advance.

regards


Re: Solr using a ridiculous amount of memory

2013-04-15 Thread Toke Eskildsen
On Sun, 2013-03-24 at 09:19 +0100, John Nielsen wrote:
 Our memory requirements are running amok. We have less than a quarter of
 our customers running now and even though we have allocated 25GB to the JVM
 already, we are still seeing daily OOM crashes.

Out of curiosity: Did you manage to pinpoint the memory eater in your
setup?

- Toke Eskildsen



Re: Solr using a ridiculous amount of memory

2013-04-15 Thread John Nielsen
Yes and no,

The FieldCache is the big culprit. We do a huge amount of faceting so it
seems right. Unfortunately I am super swamped at work so I have precious
little time to work on this, which is what explains my silence.

Out of desperation, I added another 32G of memory to each server and
increased the JVM size to 64G from 25G. The servers are running with 96G
memory right now (this is the max amount supported by the hardware) which
leaves solr somewhat starved for memory. I am aware of the performance
implications of doing this but I have little choice.

The extra memory helped a lot, but it still OOMs with about 180 clients
using it. Unfortunately I need to support at least double that. After
upgrading the RAM, I ran for almost two weeks with the same workload that
used to OOM a couple of times a day, so it doesn't look like a leak.

Today I finally managed to set up a test core so I can begin to play around
with docValues.

I actually have a couple of questions regarding docValues:
1) If I facet on multiple fields and only some of those fields are using
docValues, will I still get the memory-saving benefit of docValues? (One of
the facet fields uses null values and will require a lot of work in our
product to fix.)
2) If i just use docValues on one small core with very limited traffic at
first for testing purposes, how can I test that it is actually using the
disk for caching?

I really appreciate all the help I have received on this list so far. I do
feel confident that I will be able to solve this issue eventually.



On Mon, Apr 15, 2013 at 9:00 AM, Toke Eskildsen t...@statsbiblioteket.dkwrote:

 On Sun, 2013-03-24 at 09:19 +0100, John Nielsen wrote:
  Our memory requirements are running amok. We have less than a quarter of
  our customers running now and even though we have allocated 25GB to the
 JVM
  already, we are still seeing daily OOM crashes.

 Out of curiosity: Did you manage to pinpoint the memory eater in your
 setup?

 - Toke Eskildsen




-- 
Med venlig hilsen / Best regards

*John Nielsen*
Programmer



*MCB A/S*
Enghaven 15
DK-7500 Holstebro

Kundeservice: +45 9610 2824
p...@mcb.dk
www.mcb.dk


Re: SolrCloud vs Solr master-slave replication

2013-04-15 Thread Victor Ruiz
Hi Shawn,

thank you for your reply. 

I'll check if the network card drivers are OK. About the RAM: the JVM max heap
size is currently 6GB, but it never reaches the maximum; typically the used
RAM is not more than 5GB. Should I assign more RAM? I've read that an excess of
assigned RAM could also have a bad effect on performance. Apart from the
RAM used by the JVM, the server has more than 10GB of unused RAM, which should
be enough to cache the index.

About SolrCloud, I know it doesn't use master-slave replication but
incremental updates, item by item. That's why I thought it could work for
us, since our bottleneck appears to be the replication cycles. But another
point: if indexing occurs on all servers, could 1200 updates/min also
overload the servers, and therefore give worse performance than with
master-slave replication?

Regards,
Victor





--
View this message in context: 
http://lucene.472066.n3.nabble.com/SolrCloud-vs-Solr-master-slave-replication-tp4055541p4055995.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: how to migrate solr 1.4 index to solr 4.2 index

2013-04-15 Thread Montu v Boda
hi

right now we have just moved the 1.4 indexes to 4.2.1 and are running our tests on that

Thanks & Regards
Montu v Boda



--
View this message in context: 
http://lucene.472066.n3.nabble.com/how-to-migrate-solr-1-4-index-to-solr-4-2-index-tp4055531p4055997.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Which tokenizer or analizer should use and field type

2013-04-15 Thread Erick Erickson
try executing these with debug=all and examine the resulting parsed query,
that'll show you exactly how the query is parsed.

Also, the query language is not strictly boolean, see:
http://searchhub.org/2011/12/28/why-not-and-or-and-not/

The first thing I would try would be to parenthesize explicitly as

keyword:((assistant AND coach) OR (iit AND kanpur))

Best
Erick
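Erick's two suggestions (debug output plus explicit parentheses) can be combined into a single request; a minimal sketch, assuming a local node and a core named collection1:

```java
import java.net.URLEncoder;

// Assemble a Solr select URL that runs the explicitly parenthesized
// query with debug output enabled, URL-encoding the query string.
public class DebugQuery {
    public static String url(String base, String q) throws Exception {
        return base + "/select?debug=all&q=" + URLEncoder.encode(q, "UTF-8");
    }

    public static void main(String[] args) throws Exception {
        String q = "keyword:((assistant AND coach) OR (iit AND kanpur))";
        // The parsedquery entry in the debug section of the response shows
        // exactly how the query parser interpreted the AND/OR combination.
        System.out.println(url("http://localhost:8983/solr/collection1", q));
    }
}
```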

On Sat, Apr 13, 2013 at 7:06 PM, anurag.jain anurag.k...@gmail.com wrote:
 Hi, if you can help me, it will solve my problem.

 keyword:(assistant AND coach) gives me 1 result.

 keyword:(iit AND kanpur) gives me 2 results.

 But the query

 keyword:(assistant AND coach OR (iit AND kanpur)) gives me only 1
 result.

 I also tried keyword:(assistant AND coach OR (*:* iit AND kanpur)),
 which also gives me only 1 result. I don't know why.

 What should the query look like? Please help me find a solution.

 Thanks in advance.





 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/Which-tokenizer-or-analizer-should-use-and-field-type-tp4055591p4055837.html
 Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solr Indexing My SQL Timestamp or Date Time field

2013-04-15 Thread Erick Erickson
Solr requires precise date formats, see:
http://lucene.apache.org/solr/api-4_0_0-BETA/org/apache/solr/schema/DateField.html

Best
Erick

On Sun, Apr 14, 2013 at 11:43 AM, ursswak...@gmail.com
ursswak...@gmail.com wrote:
 Hi,

 To index a Date in Solr, the Date should be in ISO format.
 Can we index a MySQL Timestamp or DateTime field without modifying the SQL
 Select statement?

 I have used

 <fieldType name="tdate" class="solr.TrieDateField" omitNorms="true"
            precisionStep="6" positionIncrementGap="0"/>

 <field name="CreatedDate" type="tdate" indexed="true" stored="true"/>

 CreatedDate is of Type Date Time in MySQL

 I am getting following exception

 11:23:39,117 WARN
 [org.apache.solr.handler.dataimport.DateFormatTransformer] (Thread-72) Could
 not parse a Date field : java.text.ParseException: Unparseable date:
 2013-04-14 11:22:48.0
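One way around the exception above is to convert the MySQL DATETIME string into the ISO-8601 "Zulu" form Solr's date fields expect. A sketch (in DIH itself this is what DateFormatTransformer's dateTimeFormat attribute is for; the pattern below is the assumption to verify against your data):

```java
import java.text.SimpleDateFormat;
import java.util.TimeZone;

// Convert a MySQL DATETIME string like "2013-04-14 11:22:48.0"
// into Solr's ISO-8601 date format "2013-04-14T11:22:48Z".
public class MySqlToSolrDate {
    public static String toSolr(String mysql) throws Exception {
        SimpleDateFormat in = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss.S");
        in.setTimeZone(TimeZone.getTimeZone("UTC"));
        SimpleDateFormat out = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss'Z'");
        out.setTimeZone(TimeZone.getTimeZone("UTC"));
        return out.format(in.parse(mysql));
    }

    public static void main(String[] args) throws Exception {
        // The exact value from the exception in this thread:
        System.out.println(toSolr("2013-04-14 11:22:48.0")); // 2013-04-14T11:22:48Z
    }
}
```

In a DIH config the equivalent would be setting dateTimeFormat="yyyy-MM-dd HH:mm:ss.S" on the DateFormatTransformer, avoiding any change to the SQL select statement.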


 Any help in fixing this Issue is really appreciated



 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/Solr-Indexing-My-SQL-Timestamp-or-Date-Time-field-tp4055894.html
 Sent from the Solr - User mailing list archive at Nabble.com.


Test harness can not load existing index data in Solr 4.2

2013-04-15 Thread zhu kane
I'm extending Solr's AbstractSolrTestCase for unit testing.

I have existing 'schema.xml', 'solrconfig.xml' and index data. I want to
start an embedded solr server to load existing collection and its data.
Then test searching doc in solr.

This worked well in Solr 3.6. However, it no longer works after
adapting to Solr 4.2.1. After some investigation, it looks like the
index data is not loaded by the SolrCore created by the test harness.

This can also be reproduced using the index of the Solr example docs; I
posted the detailed test class in my Stack Overflow question [1].

Is this a bug in the test harness? Or is there a better way to load existing
index data in a unit test?

Thanks.
[1]
http://stackoverflow.com/questions/15947116/solr-4-2-test-harness-can-not-load-existing-index-data

Mengxin Zhu


Re: Solr using a ridiculous amount of memory

2013-04-15 Thread Toke Eskildsen
On Mon, 2013-04-15 at 10:25 +0200, John Nielsen wrote:

 The FieldCache is the big culprit. We do a huge amount of faceting so
 it seems right.

Yes, you wrote that earlier. The mystery is that the math does not check
out with the description you have given us.

 Unfortunately I am super swamped at work so I have precious little
 time to work on this, which is what explains my silence.

No problem, we've all been there.
 
[Band aid: More memory]

 The extra memory helped a lot, but it still OOM with about 180 clients
 using it.

You stated earlier that you have a "solr cluster" and your total(?) index
size was 35GB, with each "register" being between 15k and 30k. I am
using the quotes to signify that it is unclear what you mean. Is your
cluster multiple machines (I'm guessing no), multiple Solrs, cores,
shards or maybe just a single instance prepared for later distribution?
Is a "register" a core, a shard or simply a logical part (one client's data)
of the index?

If each client has their own core or shard, that would mean that each
client uses more than 25GB/180 ~= 142MB of heap to access 35GB/180
~= 200MB of index. That sounds quite high and you would need a very
heavy facet to reach that.

If you could grep for "UnInverted" in the Solr log file and paste the
entries here, that would help to clarify things.


Another explanation for the large amount of memory presents itself if
you use a single index: If each of your clients facet on at least one
fields specific to the client (client123_persons or something like
that), then your memory usage goes through the roof.

Assuming an index with 10M documents, each with 5 references to a modest
10K unique values in a facet field, the simplified formula
  #documents*log2(#references) + #references*log2(#unique_values) bit
tells us that this takes at least 110MB with field cache based faceting.

180 clients @ 110MB ~= 20GB. As that is a theoretical low, we can at
least double that. This fits neatly with your new heap of 64GB.
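Toke's arithmetic above can be reproduced mechanically; a sketch of the simplified formula (a back-of-the-envelope estimate only, not the real field cache layout):

```java
// Estimate field-cache faceting memory per Toke's simplified formula:
//   #documents*log2(#references) + #references*log2(#unique_values)  bits
public class FacetMemoryEstimate {
    static double log2(double x) { return Math.log(x) / Math.log(2); }

    // Returns the estimate in megabytes.
    public static double estimateMB(long docs, long refsPerDoc, long uniqueValues) {
        long refs = docs * refsPerDoc;
        double bits = docs * log2(refs) + refs * log2(uniqueValues);
        return bits / 8 / 1024 / 1024;
    }

    public static void main(String[] args) {
        // 10M documents, 5 references each, 10K unique values per facet field:
        double mb = estimateMB(10_000_000L, 5, 10_000);
        System.out.printf("~%.0f MB per client%n", mb); // roughly 110 MB
    }
}
```

With 180 clients each paying this cost for their own facet fields, the theoretical floor is around 20GB, which matches the thread's observation.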


If my guessing is correct, you can solve your memory problems very
easily by sharing _all_ the facet fields between your clients.
This should bring your memory usage down to a few GB.

You are probably already restricting their searches to their own data by
filtering, so this should not influence the returned facet values and
counts, as compared to separate fields.

This is very similar to the thread "Facets with 5000 facet fields", BTW.

 Today I finally managed to set up a test core so I can begin to play
 around with docValues.

If you are using a single index with the individual-facet-fields for
each client approach, the DocValues will also have scaling issues, as
the amount of values (of which the majority will be null) will be
  #clients*#documents*#facet_fields
This means that adding a new client will be progressively more
expensive.

On the other hand, if you use a lot of small shards, DocValues should
work for you.

Regards,
Toke Eskildsen




Re: Some Questions About Using Solr as Cloud

2013-04-15 Thread Furkan KAMACI
Hi Jack;

I see that SolrCloud automates everything. When I use SolrCloud, is it
true that more than one computer may be responsible for indexing at
any time?

2013/4/15 Jack Krupansky j...@basetechnology.com

 There are no masters or slaves in SolrCloud - it's fully distributed. Some
 cluster nodes will be leaders (of the shard on that node) at a given
 point in time, but different nodes may be leaders at different points in
 time as they become elected.

 In a distributed cluster you would never want to store documents only on
 one node. Sure, you can do that by setting the replication factor to 1, but
 that defeats half the purpose for SolrCloud.

 Index transfer is automatic - SolrCloud supports fully distributed update.

 You might be getting confused with the old Master-Slave-Replication
 model that Solr had (and still has) which is distinct from SolrCloud.

 -- Jack Krupansky

 -Original Message- From: Furkan KAMACI
 Sent: Sunday, April 14, 2013 7:45 PM
 To: solr-user@lucene.apache.org
 Subject: Some Questions About Using Solr as Cloud


 I read wiki and reading SolrGuide of Lucidworks. However I want to clear
 something in my mind. Here are my questions:

 1) Does SolrCloud allow a multi-master design (is there any document that I
 can read about it)?
 2) Let's assume that I use multiple cores, i.e. core A and core B, and that
 a document has just been indexed at core B. If I send a search request to
 core A, can I get it as a result?
 3) When I use a multi-master design (if it exists), can I transfer one master's
 index data into another (with its slaves or not)?
 4) When I use a multi-core design, can I transfer index data into another
 core or anywhere else?

 By the way thanks for the quick responses and kindness at mail list.



Re: SolR InvalidTokenOffsetsException with Highlighter and Synonyms

2013-04-15 Thread Dmitry Kan
Hi,

Does it work well if you remove the synonyms with spaces in them, like "eighty
six"?

Dmitry


On Fri, Apr 5, 2013 at 3:43 AM, juancesarvillalba 
juancesarvilla...@gmail.com wrote:

 Hi, I saw some similar problems in other threads, but I think this is a
 little different and I couldn't find any solution. I get the exception:

 org.apache.lucene.search.highlight.InvalidTokenOffsetsException: Token
 eightysix exceeds length of provided text sized 80

 This happens, for example, when I run a query with highlighting for a word
 that has synonyms. For example, I queried for 86; I have an eightysix
 synonym for it, and with highlighting I got the previous exception. The
 relevant configuration is this synonyms.txt entry:

 Brand 86, 86, eightysix, eight six, eighty six, eighty-six

 together with the default highlighting component (regex fragmenter with
 pattern [-\w ,/\n\"']{20,200}). I also saw that when I removed some words
 from the synonyms list, it works right. Does anyone have any idea what is
 wrong? Best regards.



 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/SolR-InvalidTokenOffsetsException-with-Highlighter-and-Synonyms-tp4053988.html
 Sent from the Solr - User mailing list archive at Nabble.com.


Re: How do I recover the position and offset a highlight for solr (4.1/4.2)?

2013-04-15 Thread Dmitry Kan
Hi,

They are available in the HighlighterComponent. You will need to read the
source code.

Dmitry


On Wed, Mar 27, 2013 at 4:28 PM, Skealler Nametic bchaillou...@gmail.comwrote:

 Hi,

 I would like to retrieve the position and offset of each highlight
 found.
 I searched on the internet, but I have not found the exact solution to my
 problem...



Re: Solr using a ridiculous amount of memory

2013-04-15 Thread John Nielsen
I did a search; there is no occurrence of "UnInverted" in the Solr logs.

 Another explanation for the large amount of memory presents itself if
 you use a single index: If each of your clients facet on at least one
 fields specific to the client (client123_persons or something like
 that), then your memory usage goes through the roof.

This is exactly how we facet right now! I will definitely rewrite the
relevant parts of our product to test this out before moving further down
the docValues path.

I will let you know as soon as I know one way or the other.








SolrCloud Leaders

2013-04-15 Thread Furkan KAMACI
Is the number of leaders in a SolrCloud equal to the number of shards?


Re: Solr using a ridiculous amount of memory

2013-04-15 Thread Upayavira
Might be obvious, but just in case - remember that you'll need to
re-index your content once you've added docValues to your schema, in
order to get the on-disk files to be created.

Upayavira

On Mon, Mar 25, 2013, at 03:16 PM, John Nielsen wrote:
 I apologize for the slow reply. Today has been killer. I will reply to
 everyone as soon as I get the time.
 
 I am having difficulties understanding how docValues work.
 
 Should I only add docValues to the fields that I actually use for sorting
 and faceting or on all fields?
 
 Will the docValues magic apply only to the fields I activate docValues on,
 or to the entire document, when sorting/faceting on a field that has
 docValues activated?
 
 I'm not even sure which question to ask. I am struggling to understand
 this
 on a conceptual level.
 
 
 On Sun, Mar 24, 2013 at 7:11 PM, Robert Muir rcm...@gmail.com wrote:
 
  On Sun, Mar 24, 2013 at 4:19 AM, John Nielsen j...@mcb.dk wrote:
 
   Schema with DocValues attempt at solving problem:
   http://pastebin.com/Ne23NnW4
   Config: http://pastebin.com/x1qykyXW
  
 
  This schema isn't using docvalues, due to a typo in your config:
  it should not be "DocValues=true" but "docValues=true".
 
  Are you not getting an error? Solr needs to throw an exception if you
  provide invalid attributes to a field. Nothing is more frustrating
  than having a typo or something in your configuration and solr just
  ignoring it, reporting no error, and not working the way you want.
  I'll look into this (I already intend to add these checks to analysis
  factories for the same reason).
 
  Separately, if you really want the terms data and so on to remain on
  disk, it is not enough to just enable docvalues for the field. The
  default implementation uses the heap. So if you want that, you need to
  set docValuesFormat="Disk" on the fieldtype. This will keep the
  majority of the data on disk, and only some key datastructures in heap
  memory. This might have significant performance impact depending upon
  what you are doing so you need to test that.
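Putting Robert's two corrections together, the relevant schema fragments would look roughly like this (field and type names here are made up for illustration; verify the attributes against the Solr 4.x version in use):

```xml
<!-- lowercase "docValues", not "DocValues"; docValuesFormat="Disk"
     keeps most of the docValues data on disk instead of the heap -->
<fieldType name="string_dv_disk" class="solr.StrField"
           docValuesFormat="Disk"/>

<field name="item_group" type="string_dv_disk"
       indexed="true" stored="false" docValues="true"/>
```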
 
 
 
 
 -- 
 Med venlig hilsen / Best regards
 
 *John Nielsen*
 Programmer
 
 
 
 *MCB A/S*
 Enghaven 15
 DK-7500 Holstebro
 
 Kundeservice: +45 9610 2824
 p...@mcb.dk
 www.mcb.dk


Re: SolrCloud Leaders

2013-04-15 Thread Upayavira
It is supposed to be one leader per shard, yes.

Upayavira

On Mon, Apr 15, 2013, at 01:21 PM, Furkan KAMACI wrote:
 Is the number of leaders in a SolrCloud equal to the number of shards?


Re: SolrCloud Leaders

2013-04-15 Thread Jack Krupansky
When the cluster is fully operational, yes. But if part of the cluster is 
down or split and unable to communicate, or leader election is in progress, 
the actual count of leaders will not be indicative of the number of shards.


Leaders and shards are apples and oranges. If you take down a cluster, by 
definition it would have no leaders (because leaders are running code), but 
shards are the files in the index on disk that continue to exist even if the 
code is not running. So, in the extreme, the number of leaders can be zero 
while the number of shards is non-zero on disk.


-- Jack Krupansky

-Original Message- 
From: Furkan KAMACI

Sent: Monday, April 15, 2013 8:21 AM
To: solr-user@lucene.apache.org
Subject: SolrCloud Leaders

Is the number of leaders in a SolrCloud equal to the number of shards? 



Re: SolrCloud Leaders

2013-04-15 Thread Furkan KAMACI
Do leaders respond to search requests (I mean, do they store indexes) both
when I first run SolrCloud and later on?




Number of unique terms in a field

2013-04-15 Thread Andreas Hubold

Hi,

in previous versions of Solr (at least with 1.4.1) the admin page 
displayed the number of unique terms in the index / in a field.

I cannot find this on the new admin page anymore (Solr 4.0.0).

Can somebody please give me a pointer or is this info not available anymore?

Thank you,
Andreas


Re: SolrCloud Leaders

2013-04-15 Thread Furkan KAMACI
This page:
https://support.lucidworks.com/entries/22180608-Solr-HA-DR-overview-3-x-and-4-0-SolrCloud-and
says:

"Both leaders and replicas index items and perform searches."

How do replicas index items?


2013/4/15 Furkan KAMACI furkankam...@gmail.com

 Do leaders respond to search requests (I mean, do they store indexes) both
 when I first run SolrCloud and later on?







Re: Number of unique terms in a field

2013-04-15 Thread Stefan Matheis
Andreas

It's still there :)

Open the UI, select a core, go to the Schema Browser, select the field from 
the drop-down and click on the "Load Term Info" button (right side, below 
properties & analyzer).

Then there's a "[10] / 20315 Top-Terms" row - right hand of the button you've 
actually clicked on :)

HTH?
Stefan


On Monday, April 15, 2013 at 3:33 PM, Andreas Hubold wrote:

 Hi,
 
 in previous versions of Solr (at least with 1.4.1) the admin page 
 displayed the number of unique terms in the index / in a field.
 I cannot find this on the new admin page anymore (Solr 4.0.0).
 
 Can somebody please give me a pointer or is this info not available anymore?
 
 Thank you,
 Andreas
 
 




RE: Getting page number of result with tika

2013-04-15 Thread Gian Maria Ricci
Thanks a lot. I'm curious whether anyone with this kind of need has tried that
old patch against Solr 4+ and got it working.

Gian Maria.

-Original Message-
From: Erick Erickson [mailto:erickerick...@gmail.com] 
Sent: Saturday, April 13, 2013 3:40 PM
To: solr-user@lucene.apache.org; Gian Maria Ricci
Subject: Re: Getting page number of result with tika

You can't assume that Fix Version/s 4.3 means anybody is actively working on
it, and the age of the patches suggests nobody is. The Fix Version/s gets
updated when releases are made, otherwise you'd have open JIRAs for, say,
Solr 1.4.1.

Near as I can tell, that JIRA is dead, don't look for it unless someone
picks it up again.

Best
Erick

On Thu, Apr 11, 2013 at 11:55 AM, Gian Maria Ricci alkamp...@nablasoft.com
wrote:
 As far as I know, SOLR-380 
 (https://issues.apache.org/jira/browse/SOLR-380)
 deals with the problem of knowing the page number when indexing with Tika.
 The issue contains a patch, but it is really old, and I'm curious about the 
 status of this issue (since I see "Fix Version/s: 4.3", it seems 
 that it will be implemented in the next version).



 Does anyone have a good workaround/patch/solution to search in Tika-indexed 
 documents and get the list of pages where the match was found?



 Thanks in advance.



 Gian Maria.






Re: Number of unique terms in a field

2013-04-15 Thread Andreas Hubold

Hi Stefan,

with Solr 4.0.0 I just get 10 / -1.

I just tried it with Solr 4.2.1 and the example application and it seems 
to work there. Maybe this has been fixed/improved since 4.0.0.


Thanks,
Andreas

Stefan Matheis wrote on 15.04.2013 15:49:

Andreas

It's still there :)

Open the UI, select a core, go to the Schema Browser, select the field from the drop down 
and click on the Load Term Info Button (right side, below properties & analyzer).

Then there's a [10] / 20315 Top-Terms row - right hand of the button you've 
actually clicked on :)

HTH?
Stefan


On Monday, April 15, 2013 at 3:33 PM, Andreas Hubold wrote:


Hi,

in previous versions of Solr (at least with 1.4.1) the admin page
displayed the number of unique terms in the index / in a field.
I cannot find this on the new admin page anymore (Solr 4.0.0).

Can somebody please give me a pointer or is this info not available anymore?

Thank you,
Andreas
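For scripted access to the same numbers without the admin UI, the Luke request handler should return per-field term statistics; a sketch, where the host, core name (collection1) and field name (text) are all placeholders for your own setup:

```shell
# Ask the Luke handler for the top terms and distinct-term count of one field.
# Host, core name ("collection1") and field name ("text") are placeholders.
curl "http://localhost:8983/solr/collection1/admin/luke?fl=text&numTerms=10&wt=json"
```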










Usage of CloudSolrServer?

2013-04-15 Thread Furkan KAMACI
I am reading the LucidWorks Solr Guide; it says in the SolrCloud section:

*Read Side Fault Tolerance*
With earlier versions of Solr, you had to set up your own load balancer.
Now each individual node
load balances requests across the replicas in a cluster. You still need a
load balancer on the
'outside' that talks to the cluster, or you need a smart client. (Solr
provides a smart Java Solrj
client called CloudSolrServer.)

My system is as follows: I crawl data with Nutch and send them into
SolrCloud. Users will search at Solr.

What is that CloudSolrServer, should I use it for load balancing or is it
something else different?


Re: Tokenize on paragraphs and sentences

2013-04-15 Thread Jack Krupansky
Technically, yes, but you would have to do a lot of work yourself. Like, a 
sentence/paragraph recognizer that inserted sentence and paragraph markers, 
and a query parser that allows you to do SpanNear and SpanNot (to 
selectively exclude sentence or paragraph marks based on your granularity of 
search.)


The LucidWorks Search query parser has SpanNot support (or at least did at 
one point in time), but no sentence/paragraph marking.


You could come up with some heuristic regular expressions for sentence and 
paragraph marks, like consecutive newlines for a paragraph and dot followed 
by white space for sentence (with some more heuristics for abbreviations.)


Or you could have an update processor do the marking.

-- Jack Krupansky

-Original Message- 
From: Alex Cougarman

Sent: Monday, April 15, 2013 9:48 AM
To: solr-user@lucene.apache.org
Subject: Tokenize on paragraphs and sentences

Hi. Is it possible to search within paragraphs or sentences in Solr? The 
PatternTokenizerFactory uses regular expressions, but how can this be done 
with plain ASCII docs that don't have <p> tags (HTML), yet they're broken 
into paragraphs? Thanks.


Warm regards,
Alex
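A minimal sketch of the regex heuristics Jack describes, using only java.util.regex. The patterns are assumptions, not anything Solr ships: one or more blank lines end a paragraph, and `.`, `!` or `?` followed by whitespace ends a sentence; real text would need abbreviation handling on top.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Heuristic paragraph/sentence splitting for plain-text documents.
public class PlainTextSplitter {
    // A paragraph break is one or more blank lines.
    private static final Pattern PARAGRAPH = Pattern.compile("\\n\\s*\\n");
    // A sentence break is ., ! or ? followed by whitespace.
    private static final Pattern SENTENCE = Pattern.compile("(?<=[.!?])\\s+");

    public static List<String> paragraphs(String text) {
        return Arrays.asList(PARAGRAPH.split(text.trim()));
    }

    public static List<String> sentences(String paragraph) {
        return Arrays.asList(SENTENCE.split(paragraph.trim()));
    }

    public static void main(String[] args) {
        String doc = "First sentence. Second sentence!\n\nNew paragraph here.";
        List<String> paras = paragraphs(doc);
        System.out.println(paras.size());                   // 2 paragraphs
        System.out.println(sentences(paras.get(0)).size()); // 2 sentences
    }
}
```

The split offsets could then feed an update processor that inserts marker tokens, as suggested above.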




Re: SolrCloud Leaders

2013-04-15 Thread Jack Krupansky
All nodes are replicas in SolrCloud since there are no masters. It's a fully 
distributed model. A leader is also a replica. A leader is simply a replica 
which was elected to be a leader, for now. An hour from now some other 
replica may be the leader.


It is indeed misleading and inaccurate to suggest that "leader" and 
"replicas" are disjoint.


Once again, I think you are confusing SolrCloud with the older Solr 
master/slave/replication.


Every node in SolrCloud can do indexing. That's the same as saying that 
every replica in SolrCloud can do indexing.


Although we do need to be clear that a given replica will only index 
documents for the shard(s) to which it belongs.


-- Jack Krupansky

-Original Message- 
From: Furkan KAMACI

Sent: Monday, April 15, 2013 9:38 AM
To: solr-user@lucene.apache.org
Subject: Re: SolrCloud Leaders

This page:
https://support.lucidworks.com/entries/22180608-Solr-HA-DR-overview-3-x-and-4-0-SolrCloud-and
says:

"Both leaders and replicas index items and perform searches."

How do replicas index items?


2013/4/15 Furkan KAMACI furkankam...@gmail.com


Do leaders respond to search requests (I mean, do they store indexes)
both when I first run SolrCloud and some time later?


2013/4/15 Jack Krupansky j...@basetechnology.com


When the cluster is fully operational, yes. But if part of the cluster is
down or split and unable to communicate, or leader election is in 
progress,
the actual count of leaders will not be indicative of the number of 
shards.


Leaders and shards are apples and oranges. If you take down a cluster, by
definition it would have no leaders (because leaders are running code), 
but

shards are the files in the index on disk that continue to exist even if
the code is not running. So, in the extreme, the number of leaders can be
zero while the number of shards is non-zero on disk.

-- Jack Krupansky

-Original Message- From: Furkan KAMACI
Sent: Monday, April 15, 2013 8:21 AM
To: solr-user@lucene.apache.org
Subject: SolrCloud Leaders


Is the number of leaders in a SolrCloud equal to the number of shards?








Dynamic data model design questions

2013-04-15 Thread Marko Asplund
I'm implementing a backend service that stores data in JSON format and I'd
like to provide a search operation in the service.
The data model is dynamic and will contain arbitrarily complex object
graphs.

How do I index object graphs with Solr?
Does the data need to be flattened before indexing?

Apparently the service needs to deliver new data and updates to Solr,
but which one should be responsible for converting the data model to adhere
to Solr schema? The service or Solr?
Should the service deliver data to Solr in a form that adheres to Solr
schema or should Solr be extended to digest data provided by the service?

How does Solr handle dynamic data models?
Solr seems to support dynamic data models with the dynamic fields feature
in schemas.
How are data types inferred when using dynamic fields?

An alternative to using dynamic fields seems to be to change the schema
when the data model changes.
How easy is it to modify an existing schema?
Do I need to reindex all the data?
Can you do it online using an API?

I'm planning on using Solr 4.2.


marko


solr tdate field

2013-04-15 Thread hassancrowdc
Hi,

I have a date field being indexed into Solr. In my schema I have the following
definition for it:

<field name="createdDate" type="date" indexed="true" stored="true" required="true" />

but in java, i get the following error when i search using solr:

 java.lang.ClassCastException: java.lang.String cannot be cast to
java.util.Date

Why is Solr returning a String when I have type="date" in schema.xml?

Thanks. 



--
View this message in context: 
http://lucene.472066.n3.nabble.com/solr-tdate-field-tp4056069.html
Sent from the Solr - User mailing list archive at Nabble.com.
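One likely cause, offered as an assumption since the client code isn't shown: with a non-binary response format such as wt=json, the stored date arrives as an ISO-8601 string (e.g. 2013-04-15T00:00:00Z), and a String cannot be cast to java.util.Date; it has to be parsed. A sketch:

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;

// Parse a Solr ISO-8601 date string into a java.util.Date.
// Note: dates stored with millisecond precision use ".SSS" before the Z,
// which would need a second pattern.
public class SolrDateParse {
    public static Date parse(String solrDate) throws ParseException {
        SimpleDateFormat fmt = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss'Z'");
        fmt.setTimeZone(TimeZone.getTimeZone("UTC")); // Solr dates are UTC
        return fmt.parse(solrDate);
    }

    public static void main(String[] args) throws ParseException {
        Date d = parse("2013-04-15T00:00:00Z");
        System.out.println(d.getTime()); // epoch millis, UTC
    }
}
```

With SolrJ's default binary (javabin) responses, by contrast, date fields do come back as java.util.Date already.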


Re: solr tdate field

2013-04-15 Thread Jack Krupansky
Check your date field type to make sure it really is solr.DateField or 
solr.TrieDateField


Then check whether you have a function query with an ms function that 
references a non-TrieDateField.


-- Jack Krupansky

-Original Message- 
From: hassancrowdc

Sent: Monday, April 15, 2013 10:54 AM
To: solr-user@lucene.apache.org
Subject: solr tdate field

Hi,

I have a date field being indexed into Solr. In my schema I have the following
definition for it:

<field name="createdDate" type="date" indexed="true" stored="true" required="true" />

but in java, i get the following error when i search using solr:

java.lang.ClassCastException: java.lang.String cannot be cast to
java.util.Date

Why is solr returning me String back where i have type=date in schema.xml?

Thanks.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/solr-tdate-field-tp4056069.html
Sent from the Solr - User mailing list archive at Nabble.com. 



SolrException parsing error

2013-04-15 Thread Luis Lebolo
Hi All,

I'm using Solr 4.1 and am receiving an org.apache.solr.common.SolrException
parsing error with root cause java.io.EOFException (see below for stack
trace). The query I'm performing is long/complex and I wonder if its size
is causing the issue?

I am querying via POST through SolrJ. The query (fq) itself is ~20,000
characters long in the form of:

fq=(mutation_prot_mt_1_1:2374 + OR + mutation_prot_mt_2_1:2374 + OR +
mutation_prot_mt_3_1:2374 + ...) + OR + (mutation_prot_mt_1_2:2374 + OR +
mutation_prot_mt_2_2:2374 + OR + mutation_prot_mt_3_2:2374+...) + OR + ...

In short, I am querying for an ID throughout multiple dynamically created
fields (mutation_prot_mt_#_#).

Any thoughts on how to further debug?

Thanks in advance,
Luis

--

SEVERE: Servlet.service() for servlet [X] in context with path [/x] threw
exception [Request processing failed; nested exception is
org.apache.solr.common.SolrException: parsing error] with root cause
java.io.EOFException
at
org.apache.solr.common.util.FastInputStream.readByte(FastInputStream.java:193)
at org.apache.solr.common.util.JavaBinCodec.unmarshal(JavaBinCodec.java:107)
 at
org.apache.solr.client.solrj.impl.BinaryResponseParser.processResponse(BinaryResponseParser.java:41)
at
org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:387)
 at
org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:181)
at
org.apache.solr.client.solrj.request.QueryRequest.process(QueryRequest.java:90)
 at org.apache.solr.client.solrj.SolrServer.query(SolrServer.java:301)
at x.x.x.x.x.x.someMethod(x.java:111)
 at x.x.x.x.x.x.otherMethod(x.java:222)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
 at java.lang.reflect.Method.invoke(Unknown Source)
at
org.springframework.web.method.support.InvocableHandlerMethod.invoke(InvocableHandlerMethod.java:213)
 at
org.springframework.web.method.support.InvocableHandlerMethod.invokeForRequest(InvocableHandlerMethod.java:126)
at
org.springframework.web.servlet.mvc.method.annotation.ServletInvocableHandlerMethod.invokeAndHandle(ServletInvocableHandlerMethod.java:96)
 at
org.springframework.web.servlet.mvc.method.annotation.RequestMappingHandlerAdapter.invokeHandlerMethod(RequestMappingHandlerAdapter.java:617)
at
org.springframework.web.servlet.mvc.method.annotation.RequestMappingHandlerAdapter.handleInternal(RequestMappingHandlerAdapter.java:578)
 at
org.springframework.web.servlet.mvc.method.AbstractHandlerMethodAdapter.handle(AbstractHandlerMethodAdapter.java:80)
at
org.springframework.web.servlet.DispatcherServlet.doDispatch(DispatcherServlet.java:923)
 at
org.springframework.web.servlet.DispatcherServlet.doService(DispatcherServlet.java:852)
at
org.springframework.web.servlet.FrameworkServlet.processRequest(FrameworkServlet.java:882)
 at
org.springframework.web.servlet.FrameworkServlet.doGet(FrameworkServlet.java:778)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:621)
 at javax.servlet.http.HttpServlet.service(HttpServlet.java:722)
at
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:305)
 at
org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210)
at
org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:330)
 at x.x.x.x.x.yetAnotherMethod(x.java:333)
at
org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:342)
 at
org.springframework.security.web.access.intercept.FilterSecurityInterceptor.invoke(FilterSecurityInterceptor.java:118)
at
org.springframework.security.web.access.intercept.FilterSecurityInterceptor.doFilter(FilterSecurityInterceptor.java:84)
 at
org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:342)
at
org.springframework.security.web.access.ExceptionTranslationFilter.doFilter(ExceptionTranslationFilter.java:113)
 at
org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:342)
at
org.springframework.security.web.authentication.AnonymousAuthenticationFilter.doFilter(AnonymousAuthenticationFilter.java:113)
 at
org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:342)
at
org.springframework.security.web.authentication.rememberme.RememberMeAuthenticationFilter.doFilter(RememberMeAuthenticationFilter.java:146)
 at
org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:342)
at
org.springframework.security.web.servletapi.SecurityContextHolderAwareRequestFilter.doFilter(SecurityContextHolderAwareRequestFilter.java:54)
 at
org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:342)
at

3 general questions about SolrCloud

2013-04-15 Thread SuoNayi
Dear list,
Sorry for these general questions; I'm really in a mess now.
1. What's the model between the master and replicas in one shard?
If the replicas are able to catch up with the master, then when the master 
receives an update request it will scatter the request to all the active 
replicas and expect responses before the request gets executed 
by itself. This is called the push model, right?
When a new replica is present it will download the whole index from 
the master; can this be called the pull model? But when the master pushes 
updates to it, how does the replica behave: does it continue to download the whole
index while keeping track of the updates in a log (tlog)?


2. What's the main use of the transaction log?
Is it only used to serve the NRT get requests, and not related to data sync 
between the master and replica? 


3. Is the leader election related to the index version?
If a shard has 3 replicas and the master goes down, how is the new master 
chosen:
do they compare the latest index version of each one, or only consider the 
earliest presence time?




Regards,
Thanks





Query Parser OR AND and NOT

2013-04-15 Thread Peter Schütt
Hallo,
I do not really understand the query language of the SOLR-Queryparser.

I use SOLR 4.2 and I have nearly 20 sample address records in the 
SOLR-Database.

I only use the q field in the SOLR Admin Web GUI, and every other 
control on this website is at its default.


First category: 

zip:30* numFound=2896 

city:H* OR zip:30*  numFound=12519

city:H* AND zip:30* numFound=376

These results seem correct to me.

Now I tried with negations:

!city:H*                 numFound: 194577   (seems to be correct)

!city:H* AND zip:30*     numFound: 2520     (seems to be correct)


!city:H* OR zip:30*      numFound: 2520     (!! this is wrong !!)

Or do I not understand something?


(!city:H*) OR zip:30*    numFound: 2896  

This is also wrong.

Thanks for any hint to understand the negation handling of the query 
language.

Ciao
  Peter Schütt






Re: Query Parser OR AND and NOT

2013-04-15 Thread Roman Chyla
should be: -city:H* OR zip:30*




On Mon, Apr 15, 2013 at 12:03 PM, Peter Schütt newsgro...@pstt.de wrote:

 Hallo,
 I do not really understand the query language of the SOLR-Queryparser.

 I use SOLR 4.2 und I have nearly 20 sample address records in the
 SOLR-Database.

 I only use the q field in the SOLR Admin Web GUI and every other
 controls  on this website is on default.


 First category:

 zip:30* numFound=2896

 city:H* OR zip:30*  numFound=12519

 city:H* AND zip:30* numFound=376

 These results seems to me correct.

 Now I tried with negations:

 !city:H*numFound:194577(seems to be correct)

 !city:H* AND zip:30*numFound:2520(seems to be correct)


 !city:H* OR zip:30* numFound:2520(!! this is wrong !!)

 Or do I do not understand something?


 (!city:H*) OR zip:30*numFound: 2896

 This is also wrong.

 Thanks for any hint to understand the negation handling of the query
 language.

 Ciao
   Peter Schütt







Re: solr tdate field

2013-04-15 Thread hassancrowdc
<fieldType name="date" class="solr.TrieDateField" precisionStep="0"
positionIncrementGap="0"/>

this is the date field in my schema.xml

and I do not get the second point: what does it mean to reference a non-TrieDateField?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/solr-tdate-field-tp4056069p4056088.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: solr tdate field

2013-04-15 Thread Jack Krupansky
Show us the full query URL (at least all the parameters) and the defaults 
from the request handler in solrconfig.


-- Jack Krupansky

-Original Message- 
From: hassancrowdc

Sent: Monday, April 15, 2013 12:17 PM
To: solr-user@lucene.apache.org
Subject: Re: solr tdate field

<fieldType name="date" class="solr.TrieDateField" precisionStep="0"
positionIncrementGap="0"/>

this is the date field in my schema.xml

and i do not get the second point; how reference a non-TrieDateField.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/solr-tdate-field-tp4056069p4056088.html
Sent from the Solr - User mailing list archive at Nabble.com. 



Re: Query Parser OR AND and NOT

2013-04-15 Thread Peter Schütt
Hallo,


Roman Chyla roman.ch...@gmail.com wrote in
news:caen8dywjrl+e3b0hpc9ntlmjtrkasrqlvkzhkqxopmlhhfn...@mail.gmail.com: 

 should be: -city:H* OR zip:30*
 
-city:H* OR zip:30*   numFound:2520 

gives the same wrong result.


Another Idea?

Ciao
  Peter Schütt




Re: solr tdate field

2013-04-15 Thread hassancrowdc
The query is as follows:
localhost:8080/solr/collection1/select?wt=json&omitHeader=true&defType=dismax&rows=11&qf=manufacturer%20model%20displayName&fl=id&q=samsung

and 

requesthandler:

<requestHandler name="standard" class="solr.StandardRequestHandler" default="true" />
  <requestHandler name="/update" class="solr.XmlUpdateRequestHandler" />
  <requestHandler name="/admin/" class="org.apache.solr.handler.admin.AdminHandlers" />



--
View this message in context: 
http://lucene.472066.n3.nabble.com/solr-tdate-field-tp4056069p4056100.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Query Parser OR AND and NOT

2013-04-15 Thread Chris Hostetter
: Hallo,
: I do not really understand the query language of the SOLR-Queryparser.

http://www.lucidimagination.com/blog/2011/12/28/why-not-and-or-and-not/

The one comment i would add regarding your specific examples...

: (!city:H*) OR zip:30*numFound: 2896  

...you can't have a boolean query -- the parens -- containing purely 
negative clauses like that.  That boolean query doesn't match anything, it 
just excludes things.  If the *entire* query is negative, then solr helps 
you out by implicitly making the negation relative to a query that matches 
all documents, but if you are creating boolean sub-queries with parens, 
then you need something positive in that sub-query to match some 
criteria X, and then your negations provide exclusions from that criteria.


-Hoss
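Applying that to the example above, the purely negative sub-query needs an explicit match-all query to subtract from:

```
(*:* -city:H*) OR zip:30*
```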


Storing Solr Index on NFS

2013-04-15 Thread Ali, Saqib
Greetings,

Are there any issues with storing Solr Indexes on a NFS share? Also any
recommendations for using NFS for Solr indexes?

Thanks,
Saqib


Re: Query Parser OR AND and NOT

2013-04-15 Thread Luis Lebolo
What if you try

city:(*:* -H*) OR zip:30*

Sometimes Solr requires a list of documents to subtract from (think of *:*
-someQuery as "all documents except those matching someQuery").

You can also try looking at your query with debugQuery = true.

-Luis


On Mon, Apr 15, 2013 at 12:25 PM, Peter Schütt newsgro...@pstt.de wrote:

 Hallo,


 Roman Chyla roman.ch...@gmail.com wrote in
 news:caen8dywjrl+e3b0hpc9ntlmjtrkasrqlvkzhkqxopmlhhfn...@mail.gmail.com:

  should be: -city:H* OR zip:30*
 
 -city:H* OR zip:30*   numFound:2520

 gives the same wrong result.


 Another Idea?

 Ciao
   Peter Schütt





Re: updateLog in Solr 4.2

2013-04-15 Thread Shawn Heisey

On 4/12/2013 7:17 AM, vicky desai wrote:

and Solr fails to start. However, if I add updateLog in my solrconfig.xml it
starts. Is the updateLog parameter mandatory for Solr 4.2?


You are using SolrCloud.  SolrCloud requires both updateLog and 
replication to be enabled.  As you probably know, updateLog requires the 
presence of a _version_ field, see the example schema for the full 
definition of that field.


If you are using Solr without SolrCloud, these features are not 
required.  The updateLog is recommended for all 4.x installs because 
NRTCachingDirectoryFactory is now the default.  The way I understand it, 
with that Directory implementation, part of an update may not be 
persisted to disk even with a hard commit, so the updateLog is needed to 
ensure the data won't be lost.


Thanks,
Shawn
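For reference, the relevant fragments from the stock 4.x example configs look like this (values as shipped in the examples; adjust the path property to taste):

```xml
<!-- solrconfig.xml, inside <updateHandler> -->
<updateLog>
  <str name="dir">${solr.ulog.dir:}</str>
</updateLog>

<!-- schema.xml: required whenever the updateLog is enabled -->
<field name="_version_" type="long" indexed="true" stored="true"/>
```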



Re: Storing Solr Index on NFS

2013-04-15 Thread Walter Underwood
On Apr 15, 2013, at 9:40 AM, Ali, Saqib wrote:

 Greetings,
 
 Are there any issues with storing Solr Indexes on a NFS share? Also any
 recommendations for using NFS for Solr indexes?

I recommend that you do not put Solr indexes on NFS.

It can be very slow, I measured indexing as 100X slower on NFS a few years ago.

It is not safe to share Solr index files between two Solr servers, so there is 
no benefit to NFS.

wunder
--
Walter Underwood
wun...@wunderwood.org





Re: Storing Solr Index on NFS

2013-04-15 Thread Ali, Saqib
Hello Walter,

Thanks for the response. That has been my experience in the past as well.
But I was wondering if there are new things in Solr 4 and NFS 4.1 that make
storing indexes on an NFS mount feasible.

Thanks,
Saqib


On Mon, Apr 15, 2013 at 9:47 AM, Walter Underwood wun...@wunderwood.orgwrote:

 On Apr 15, 2013, at 9:40 AM, Ali, Saqib wrote:

  Greetings,
 
  Are there any issues with storing Solr Indexes on a NFS share? Also any
  recommendations for using NFS for Solr indexes?

 I recommend that you do not put Solr indexes on NFS.

 It can be very slow, I measured indexing as 100X slower on NFS a few years
 ago.

 It is not safe to share Solr index files between two Solr servers, so
 there is no benefit to NFS.

 wunder
 --
 Walter Underwood
 wun...@wunderwood.org






Re: Query Parser OR AND and NOT

2013-04-15 Thread Roman Chyla
Oh, sorry, I had assumed the Lucene query parser. I think the Solr QP must be
different then, because for me it works as expected (our query parser is
identical to Lucene's in the way it treats the modifiers +/- and the operators
AND/OR/NOT: NOT must join two clauses, a NOT b, and the first cannot be
negative, as Chris points out; the modifier however can come first, but it
cannot stand alone, there must be at least one positive clause). Otherwise,
-field:x is changed into field:x

http://labs.adsabs.harvard.edu/adsabs/search/?q=%28*+-abstract%3Ablack%29+AND+abstract%3Ahole*&db_key=ASTRONOMY&sort_type=DATE
http://labs.adsabs.harvard.edu/adsabs/search/?q=%28-abstract%3Ablack%29+AND+abstract%3Ahole*&db_key=ASTRONOMY&sort_type=DATE

roman


On Mon, Apr 15, 2013 at 12:25 PM, Peter Schütt newsgro...@pstt.de wrote:

 Hallo,


 Roman Chyla roman.ch...@gmail.com wrote in
 news:caen8dywjrl+e3b0hpc9ntlmjtrkasrqlvkzhkqxopmlhhfn...@mail.gmail.com:

  should be: -city:H* OR zip:30*
 
 -city:H* OR zip:30*   numFound:2520

 gives the same wrong result.


 Another Idea?

 Ciao
   Peter Schütt





Re: SolrCloud vs Solr master-slave replication

2013-04-15 Thread Shawn Heisey

On 4/15/2013 3:38 AM, Victor Ruiz wrote:

About SolrCloud, I know it doesn't use master-slave replication, but
incremental updates, item by item. That's why I thought it could work for
us, since our bottleneck appears to be the replication cycles. But another
point is: if the indexing occurs on all servers, could 1200 updates/min also
overload the servers, and therefore give worse performance than with
master-slave replication?


One version (4.1, I think) has a problem that results in the entire 
index being replicated every time.  The I/O required for that makes 
everything slow down on both master and slave.


There are reports of new master/slave replication problems with 4.2 and 
4.2.1, but I'm not entirely clear on whether those are just cosmetic 
problems with index version reporting or whether some people are having 
actual real problems.


In 3.x and older, replication was generally the best option for multiple 
copies of your index, because there was no NRT indexing capability. 
Updating the index was a resource-intensive process with a high impact 
on searching; loading a replicated index was better.


Version 4.x adds NRT capabilities, so indexing impacts searches far less 
than it used to.  SolrCloud with NRT features (frequent soft commits, 
less frequent hard commits) is the recommended configuration path now.


Thanks,
Shawn



Re: Storing Solr Index on NFS

2013-04-15 Thread Walter Underwood
Solr 4.2 does have field compression which makes smaller indexes. That will 
reduce the amount of network traffic. That probably does not help much, because 
I think the latency of NFS is what causes problems.

wunder

On Apr 15, 2013, at 9:52 AM, Ali, Saqib wrote:

 Hello Walter,
 
 Thanks for the response. That has been my experience in the past as well.
 But I was wondering if there are new things in Solr 4 and NFS 4.1 that make
 storing indexes on an NFS mount feasible.
 
 Thanks,
 Saqib
 
 
 On Mon, Apr 15, 2013 at 9:47 AM, Walter Underwood 
 wun...@wunderwood.orgwrote:
 
 On Apr 15, 2013, at 9:40 AM, Ali, Saqib wrote:
 
 Greetings,
 
 Are there any issues with storing Solr Indexes on a NFS share? Also any
 recommendations for using NFS for Solr indexes?
 
 I recommend that you do not put Solr indexes on NFS.
 
 It can be very slow, I measured indexing as 100X slower on NFS a few years
 ago.
 
 It is not safe to share Solr index files between two Solr servers, so
 there is no benefit to NFS.
 
 wunder
 --
 Walter Underwood
 wun...@wunderwood.org
 
 
 
 

--
Walter Underwood
wun...@wunderwood.org





Re: Grouping performance problem

2013-04-15 Thread davidduffett
Agnieszka,

Did you find a good solution to your performance problem with grouping?  I
have an index with 45m records and am using grouping and the performance is
atrocious.

Any advice would be very welcome!

Thanks in advance,
David



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Grouping-performance-problem-tp3995245p4056113.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Usage of CloudSolrServer?

2013-04-15 Thread Shawn Heisey

On 4/15/2013 8:05 AM, Furkan KAMACI wrote:

My system is as follows: I crawl data with Nutch and send them into
SolrCloud. Users will search at Solr.

What is that CloudSolrServer, should I use it for load balancing or is it
something else different?


It appears that the Solr integration in Nutch currently does not use 
CloudSolrServer.  There is an issue to add it.  The mutual dependency on 
HttpClient is holding it up - Nutch uses HttpClient 3, SolrJ 4.x uses 
HttpClient 4.


https://issues.apache.org/jira/browse/NUTCH-1377

Until that is fixed, a load balancer would be required for full 
redundancy for updates with SolrCloud.  You don't have to use a load 
balancer for it to work, but if the Solr server that Nutch is using goes 
down, then indexing will stop unless you reconfigure Nutch or bring the 
Solr server back up.


Thanks,
Shawn
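To illustrate the smart client itself, independent of Nutch: CloudSolrServer takes the ZooKeeper ensemble address, watches cluster state, and load balances across live replicas on its own, so clients that use it need no external load balancer. A sketch against SolrJ 4.x; the ZooKeeper addresses and collection name are placeholders:

```java
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.CloudSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

// CloudSolrServer reads cluster state from ZooKeeper and routes requests
// to live replicas itself. "zk1:2181,zk2:2181,zk3:2181" and "collection1"
// are placeholders for your own setup.
public class CloudClientSketch {
    public static void main(String[] args) throws Exception {
        CloudSolrServer server = new CloudSolrServer("zk1:2181,zk2:2181,zk3:2181");
        server.setDefaultCollection("collection1");
        QueryResponse rsp = server.query(new SolrQuery("*:*"));
        System.out.println(rsp.getResults().getNumFound());
        server.shutdown();
    }
}
```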



Re: SolrException parsing error [Solved]

2013-04-15 Thread Luis Lebolo
Sorry, spoke too soon. Turns out I was not sending the query via POST.
Changing the method to POST solved the issue. Apologies for the spam!

-Luis
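For anyone else hitting this: SolrJ's query method accepts an HTTP method argument, so the parameters travel in the request body instead of the URL, avoiding container URL-length limits on very long filter queries. A sketch with a placeholder server URL and a shortened filter query:

```java
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrRequest;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

// Sending the parameters via POST keeps a ~20,000-character fq out of the
// request URL. The server URL below is a placeholder.
public class PostQuerySketch {
    public static void main(String[] args) throws Exception {
        SolrServer server = new HttpSolrServer("http://localhost:8983/solr/collection1");
        SolrQuery query = new SolrQuery("*:*");
        query.addFilterQuery("mutation_prot_mt_1_1:2374 OR mutation_prot_mt_2_1:2374");
        QueryResponse rsp = server.query(query, SolrRequest.METHOD.POST);
        System.out.println(rsp.getResults().getNumFound());
    }
}
```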


On Mon, Apr 15, 2013 at 11:47 AM, Luis Lebolo luis.leb...@gmail.com wrote:

 Hi All,

 I'm using Solr 4.1 and am receiving an
 org.apache.solr.common.SolrException parsing error with root cause
 java.io.EOFException (see below for stack trace). The query I'm performing
 is long/complex and I wonder if its size is causing the issue?

 I am querying via POST through SolrJ. The query (fq) itself is ~20,000
 characters long in the form of:

 fq=(mutation_prot_mt_1_1:2374 + OR + mutation_prot_mt_2_1:2374 + OR +
 mutation_prot_mt_3_1:2374 + ...) + OR + (mutation_prot_mt_1_2:2374 + OR +
 mutation_prot_mt_2_2:2374 + OR + mutation_prot_mt_3_2:2374+...) + OR + ...

 In short, I am querying for an ID throughout multiple dynamically created
 fields (mutation_prot_mt_#_#).

 Any thoughts on how to further debug?

 Thanks in advance,
 Luis

 --

 SEVERE: Servlet.service() for servlet [X] in context with path [/x] threw
 exception [Request processing failed; nested exception is
 org.apache.solr.common.SolrException: parsing error] with root cause
 java.io.EOFException
 at
 org.apache.solr.common.util.FastInputStream.readByte(FastInputStream.java:193)
 at
 org.apache.solr.common.util.JavaBinCodec.unmarshal(JavaBinCodec.java:107)
  at
 org.apache.solr.client.solrj.impl.BinaryResponseParser.processResponse(BinaryResponseParser.java:41)
 at
 org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:387)
  at
 org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:181)
 at
 org.apache.solr.client.solrj.request.QueryRequest.process(QueryRequest.java:90)
  at org.apache.solr.client.solrj.SolrServer.query(SolrServer.java:301)
 at x.x.x.x.x.x.someMethod(x.java:111)
  at x.x.x.x.x.x.otherMethod(x.java:222)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
  at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
 at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
  at java.lang.reflect.Method.invoke(Unknown Source)
 at
 org.springframework.web.method.support.InvocableHandlerMethod.invoke(InvocableHandlerMethod.java:213)
  at
 org.springframework.web.method.support.InvocableHandlerMethod.invokeForRequest(InvocableHandlerMethod.java:126)
 at
 org.springframework.web.servlet.mvc.method.annotation.ServletInvocableHandlerMethod.invokeAndHandle(ServletInvocableHandlerMethod.java:96)
  at
 org.springframework.web.servlet.mvc.method.annotation.RequestMappingHandlerAdapter.invokeHandlerMethod(RequestMappingHandlerAdapter.java:617)
 at
 org.springframework.web.servlet.mvc.method.annotation.RequestMappingHandlerAdapter.handleInternal(RequestMappingHandlerAdapter.java:578)
  at
	org.springframework.web.servlet.mvc.method.AbstractHandlerMethodAdapter.handle(AbstractHandlerMethodAdapter.java:80)
	at org.springframework.web.servlet.DispatcherServlet.doDispatch(DispatcherServlet.java:923)
	at org.springframework.web.servlet.DispatcherServlet.doService(DispatcherServlet.java:852)
	at org.springframework.web.servlet.FrameworkServlet.processRequest(FrameworkServlet.java:882)
	at org.springframework.web.servlet.FrameworkServlet.doGet(FrameworkServlet.java:778)
	at javax.servlet.http.HttpServlet.service(HttpServlet.java:621)
	at javax.servlet.http.HttpServlet.service(HttpServlet.java:722)
	at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:305)
	at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210)
	at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:330)
	at x.x.x.x.x.yetAnotherMethod(x.java:333)
	at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:342)
	at org.springframework.security.web.access.intercept.FilterSecurityInterceptor.invoke(FilterSecurityInterceptor.java:118)
	at org.springframework.security.web.access.intercept.FilterSecurityInterceptor.doFilter(FilterSecurityInterceptor.java:84)
	at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:342)
	at org.springframework.security.web.access.ExceptionTranslationFilter.doFilter(ExceptionTranslationFilter.java:113)
	at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:342)
	at org.springframework.security.web.authentication.AnonymousAuthenticationFilter.doFilter(AnonymousAuthenticationFilter.java:113)
	at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:342)
	at org.springframework.security.web.authentication.rememberme.RememberMeAuthenticationFilter.doFilter(RememberMeAuthenticationFilter.java:146)
	at

Re: Dynamic data model design questions

2013-04-15 Thread Shawn Heisey

On 4/15/2013 8:40 AM, Marko Asplund wrote:

I'm implementing a backend service that stores data in JSON format and I'd
like to provide a search operation in the service.
The data model is dynamic and will contain arbitrarily complex object
graphs.

How do I index object graphs with Solr?
Does the data need to be flattened before indexing?


Solr does have some *very* limited capability for doing joins between 
indexes, but generally speaking, you need to flatten the data.
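Flattening, as suggested above, usually means collapsing a nested object graph into a flat set of field/value pairs before the document is sent to Solr. A minimal sketch in Java — the `_`-separated path naming and the helper class name are illustrative assumptions, not anything Solr mandates:

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class Flattener {
    /** Recursively flattens nested maps into "parent_child" keys. */
    public static Map<String, Object> flatten(Map<String, ?> nested) {
        Map<String, Object> flat = new LinkedHashMap<>();
        flatten("", nested, flat);
        return flat;
    }

    private static void flatten(String prefix, Map<String, ?> nested,
                                Map<String, Object> flat) {
        for (Map.Entry<String, ?> e : nested.entrySet()) {
            String key = prefix.isEmpty() ? e.getKey() : prefix + "_" + e.getKey();
            Object value = e.getValue();
            if (value instanceof Map) {
                @SuppressWarnings("unchecked")
                Map<String, ?> child = (Map<String, ?>) value;
                flatten(key, child, flat);   // descend into the nested object
            } else {
                flat.put(key, value);        // leaf value becomes one Solr field
            }
        }
    }
}
```

Each flat key can then be mapped to a concrete or dynamic field in the schema.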



Apparently the service needs to deliver new data and updates to Solr,
but which one should be responsible for converting the data model to adhere
to Solr schema? The service or Solr?
Should the service deliver data to Solr in a form that adheres to Solr
schema or should Solr be extended to digest data provided by the service?


Solr's ability to change your data after receiving it is fairly limited. 
 The schema has some ability in this regard for indexed values, but the 
stored data is 100% verbatim as Solr receives it.  If you will be using 
the dataimport handler, it does have some transform capability before 
sending to Solr.  Most of the time, the rule of thumb is that changing 
the data on the Solr side will require contrib/custom plugins, so it may 
be easier to do it before Solr receives it.



How does Solr handle dynamic data models?
Solr seems to support dynamic data models with the dynamic fields feature
in schemas.
How are data types inferred when using dynamic fields?


A wildcard field name is used, like i_* or *_int and that definition 
includes the data type.
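For illustration, dynamic field declarations in schema.xml might look like this; the names and suffixes are examples, and the type names must match <fieldType> definitions in your own schema:

```xml
<!-- Any field ending in _i is indexed and stored as an integer. -->
<dynamicField name="*_i" type="int"    indexed="true" stored="true"/>
<!-- Any field ending in _s is treated as a plain string. -->
<dynamicField name="*_s" type="string" indexed="true" stored="true"/>
```

A document can then carry fields such as price_i or author_s without each one being declared individually.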



An alternative to using dynamic fields seems to be to change the schema
when the data model changes.
How easy is it to modify an existing schema?
Do I need to reindex all the data?
Can you do it online using an API?


Changing the schema is as simple as modifying schema.xml and reloading 
the core or restarting Solr.  An API for online schema changes is 
coming, I don't know if it will be ready in time for 4.3 or if it will 
get pushed back to 4.4.  No matter how you make the change, the 
following applies:


If you add fields, reindexing is not necessary, but existing documents 
will not have the new fields until you do.  If you change the query 
analyzer chain, no reindex is required.  If you change the index 
analyzer chain or options that affect indexing, reindexing IS required.


Thanks,
Shawn



Re: SolrException parsing error

2013-04-15 Thread Shawn Heisey

On 4/15/2013 9:47 AM, Luis Lebolo wrote:

Hi All,

I'm using Solr 4.1 and am receiving an org.apache.solr.common.SolrException
parsing error with root cause java.io.EOFException (see below for stack
trace). The query I'm performing is long/complex and I wonder if its size
is causing the issue?

I am querying via POST through SolrJ. The query (fq) itself is ~20,000
characters long in the form of:

fq=(mutation_prot_mt_1_1:2374 + OR + mutation_prot_mt_2_1:2374 + OR +
mutation_prot_mt_3_1:2374 + ...) + OR + (mutation_prot_mt_1_2:2374 + OR +
mutation_prot_mt_2_2:2374 + OR + mutation_prot_mt_3_2:2374+...) + OR + ...

In short, I am querying for an ID throughout multiple dynamically created
fields (mutation_prot_mt_#_#).

Any thoughts on how to further debug?

Thanks in advance,
Luis

--

SEVERE: Servlet.service() for servlet [X] in context with path [/x] threw
exception [Request processing failed; nested exception is
org.apache.solr.common.SolrException: parsing error] with root cause
java.io.EOFException
	at org.apache.solr.common.util.FastInputStream.readByte(FastInputStream.java:193)
	at org.apache.solr.common.util.JavaBinCodec.unmarshal(JavaBinCodec.java:107)
	at org.apache.solr.client.solrj.impl.BinaryResponseParser.processResponse(BinaryResponseParser.java:41)
	at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:387)
	at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:181)
	at org.apache.solr.client.solrj.request.QueryRequest.process(QueryRequest.java:90)
	at org.apache.solr.client.solrj.SolrServer.query(SolrServer.java:301)


I am guessing that this log is coming from your SolrJ client, but that 
is not completely clear, so: is it SolrJ or Solr that is logging this 
error?  If it's SolrJ, do you see anything in the Solr log, and vice versa?


This looks to me like a network problem, where something is dropping the 
connection before transfer is complete.  It could be an unusual 
server-side config, OS problems, timeout settings in the SolrJ code, NIC 
drivers/firmware, bad cables, bad network hardware, etc.


Thanks,
Shawn
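As an aside, a filter query of the shape described in the question can be assembled programmatically before being sent (via POST, so it is not subject to URL-length limits). A sketch using only string handling; the field-name pattern mutation_prot_mt_#_# and the ID 2374 are taken from the question above, while the loop bounds are made up:

```java
import java.util.ArrayList;
import java.util.List;

public class MutationQueryBuilder {
    /** Builds (f_1_1:id OR f_2_1:id ...) OR (f_1_2:id OR ...) over a grid of dynamic fields. */
    public static String buildFq(String id, int groups, int fieldsPerGroup) {
        List<String> groupClauses = new ArrayList<>();
        for (int g = 1; g <= groups; g++) {
            List<String> terms = new ArrayList<>();
            for (int f = 1; f <= fieldsPerGroup; f++) {
                // dynamic field name: mutation_prot_mt_<field>_<group>
                terms.add("mutation_prot_mt_" + f + "_" + g + ":" + id);
            }
            groupClauses.add("(" + String.join(" OR ", terms) + ")");
        }
        return String.join(" OR ", groupClauses);
    }
}
```

In SolrJ, a string like this can be set as an fq parameter and the query submitted with the POST method variant of query() so the large parameter travels in the request body.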



Re: 3 general questions about SolrCloud

2013-04-15 Thread Shawn Heisey

On 4/15/2013 9:58 AM, SuoNayi wrote:

1. What's the model between the master and replicas in one shard?
If the replicas are able to keep up with the master, then when the master
receives an update request it will scatter the request to all the active
replicas and expect responses before executing the request
itself. Is this called a push model?
When a new replica comes up it will download the whole index from
the master; can this be called a pull model? And when the master pushes
updates to it, how does the replica behave: does it continue downloading the
whole index while keeping track of the updates in a log (tlog)?


There is no master.  SolrCloud is fully distributed.  One replica on 
each shard is elected leader, but that is not a permanent designation.



2. What's the main use of the transaction log?
Is it only used to serve NRT get requests, and not related to data sync
between the master and replica?


The transaction log is used to replay transactions when a node starts 
up.  If the differences between the leader replica and a replica that 
just started are small enough, the transaction log will be used to bring 
them back into sync.  If they are too different, the one that just 
started will replicate the full index from the leader.  I am pretty sure 
that the _version_ field present on every document is used to determine 
whether replicas are in sync, not the index version.



3. Will the leader election be related to the index version?
If a shard has 3 replicas and the master goes down, how is the new master
chosen: do the replicas compare their latest index versions, or is only the
earliest presence time considered?


Here is my understanding about leader elections, I hope it's right! 
Leader elections only take place when the leader goes down.  Once a 
replica is elected leader, it will remain leader unless it goes down. 
The other replicas can go up and down and the leader will retain that 
role.  I do not think the index version is used at all, the _version_ 
field in the index is probably used instead.


Thanks,
Shawn



Re: Spellchecker not working for Solr 4.1

2013-04-15 Thread davers
I am using spellcheck=true when I post the search, e.g.
solr/productindex/productQuery?q=fuacet&spellcheck=true



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Spellchecker-not-working-for-Solr-4-1-tp4055450p4056131.html
Sent from the Solr - User mailing list archive at Nabble.com.


Trigger documents update in a collection

2013-04-15 Thread Francois Perron
Hi all,

I want to use Solr4 as a NoSQL store.

My 'ideal' workflow is to add/update documents in a collection (NoSQL) and 
automatically propagate the changes to another collection with more specific 
search capabilities. The NoSQL collection will contain all my documents (750M docs).  
The 'searchable' collection will only contain a subset of this collection 
(active documents, based on a field).

Is it possible ?

Thank you

Document adds, deletes, and commits ... a question about visibility.

2013-04-15 Thread Shawn Heisey
Simple question first: Is there anything in SolrJ that prevents indexing 
more than 500 documents in one request? I'm not aware of anything 
myself, but a co-worker remembers running into something, so his code is 
restricting them to 490 docs.  The only related limit I'm aware of is 
the POST buffer size limit, which defaults in recent Solr versions to 2MiB.
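For reference, that limit is governed by the <requestParsers> element in solrconfig.xml in Solr 4.x; a sketch, with attribute values as shipped in the 4.x example config (sizes are in KB):

```xml
<requestDispatcher handleSelect="false">
  <!-- formdataUploadLimitInKB caps POSTed form data; 2048 KB = the 2 MiB default -->
  <requestParsers enableRemoteStreaming="true"
                  multipartUploadLimitInKB="2048000"
                  formdataUploadLimitInKB="2048"/>
</requestDispatcher>
```

Raising formdataUploadLimitInKB would allow larger POSTed update or query requests.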


A more complex question: If I am doing both deletes and adds in separate 
update requests, and I want to ensure that a delete in the next request 
can delete a document that I am adding in the current one, do I need to 
commit between the two requests?  This is probably more of a Lucene 
question than Solr, but Solr is what I'm using.


To simplify:  Let's say I start with an empty index.  I add documents 
"a" and "b" in one request ... then I send a deleteByQuery request for 
"a", "c" and "e".  If I don't do a commit between these two requests, 
will "a" still be in the index when I commit after the second request? 
If so, would there be an easy fix?


Thanks,
Shawn


Re: Document adds, deletes, and commits ... a question about visibility.

2013-04-15 Thread Michael McCandless
At the Lucene level, you don't have to commit before doing the
deleteByQuery, i.e. 'a' will be correctly deleted without any
intervening commit.

Mike McCandless

http://blog.mikemccandless.com

On Mon, Apr 15, 2013 at 3:57 PM, Shawn Heisey s...@elyograg.org wrote:
 Simple question first: Is there anything in SolrJ that prevents indexing
 more than 500 documents in one request? I'm not aware of anything myself,
 but a co-worker remembers running into something, so his code is restricting
 them to 490 docs.  The only related limit I'm aware of is the POST buffer
 size limit, which defaults in recent Solr versions to 2MiB.

 A more complex question: If I am doing both deletes and adds in separate
 update requests, and I want to ensure that a delete in the next request can
 delete a document that I am adding in the current one, do I need to commit
 between the two requests?  This is probably more of a Lucene question than
 Solr, but Solr is what I'm using.

 To simplify:  Let's say I start with an empty index.  I add documents a
 and b in one request ... then I send a deleteByQuery request for a c
 and e.  If I don't do a commit between these two requests, will a still
 be in the index when I commit after the second request? If so, would there
 be an easy fix?

 Thanks,
 Shawn


Re: Trigger documents update in a collection

2013-04-15 Thread Otis Gospodnetic
Hi,

Doable with a custom Update Request Processor, yes.

Otis
Solr & ElasticSearch Support
http://sematext.com/
On Apr 15, 2013 3:14 PM, Francois Perron 
francois.per...@wantedanalytics.com wrote:

 Hi all,

 I want to use Solr4 as a NoSQL.

 My 'ideal' workflow is to add/update documents in a collection (NoSQL) and
 automatically update changes in another collection with more specific
 search capabilities. The nosql collection will contain all my documents
 (750M docs).  The 'searchable' collection will only contain a subset of
 this collection (active documents based on a field).

 Is it possible ?

 Thank you


Re: SolR InvalidTokenOffsetsException with Highlighter and Synonyms

2013-04-15 Thread juancesarvillalba

Hi,

Before, I had a different configuration that was working, but with synonyms at
query time.

Now I have a requirement to add multi-word synonyms; that is why I am
checking this configuration.

It doesn't work with this configuration even without multi-word synonyms.
The problem happens only with highlighting ON.

 



--
View this message in context: 
http://lucene.472066.n3.nabble.com/SolR-InvalidTokenOffsetsException-with-Highlighter-and-Synonyms-tp4053988p4056186.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Storing Solr Index on NFS

2013-04-15 Thread Tim Vaillancourt
If centralization of storage is your goal by choosing NFS, iSCSI works 
reasonably well with SOLR indexes, although good local-storage will 
always be the overall winner.


I noticed a nearly 5% degradation in overall search performance (casual 
testing, nothing scientific) when moving 40-50GB indexes to iSCSI 
(10GbE network) from a 4x7200rpm RAID 10 local SATA disk setup.


Tim

On 15/04/13 09:59 AM, Walter Underwood wrote:

Solr 4.2 does have field compression which makes smaller indexes. That will 
reduce the amount of network traffic. That probably does not help much, because 
I think the latency of NFS is what causes problems.

wunder

On Apr 15, 2013, at 9:52 AM, Ali, Saqib wrote:


Hello Walter,

Thanks for the response. That has been my experience in the past as well.
But I was wondering if there new are things in Solr 4 and NFS 4.1 that make
the storing of indexes on a NFS mount feasible.

Thanks,
Saqib


On Mon, Apr 15, 2013 at 9:47 AM, Walter Underwood wun...@wunderwood.org wrote:


On Apr 15, 2013, at 9:40 AM, Ali, Saqib wrote:


Greetings,

Are there any issues with storing Solr Indexes on a NFS share? Also any
recommendations for using NFS for Solr indexes?

I recommend that you do not put Solr indexes on NFS.

It can be very slow, I measured indexing as 100X slower on NFS a few years
ago.

It is not safe to share Solr index files between two Solr servers, so
there is no benefit to NFS.

wunder
--
Walter Underwood
wun...@wunderwood.org





--
Walter Underwood
wun...@wunderwood.org






Re: using maven to deploy solr on tomcat

2013-04-15 Thread Shawn Heisey
On 4/15/2013 2:33 PM, Adeel Qureshi wrote:
 <Environment name="solr/home" override="true" type="java.lang.String"
 value="src/main/resources/solr-dev"/>
 
 but this leads to absolute path of
 
 INFO: Using JNDI solr.home: src/main/resources/solr-dev
 INFO: looking for solr.xml:
 C:\springsource\sts-2.8.1.RELEASE\src\main\resources\solr-dev\solr.xml

If you use a relative path for the solr home as you have done, it will
be relative to the current working directory.  The CWD can vary
depending on how tomcat gets started.  In your case, the CWD seems to be
C:\springsource\sts-2.8.1.RELEASE.  If you change the CWD in the
tomcat startup, you will probably have to set the TOMCAT_HOME
environment variable for tomcat to start correctly, so I don't recommend
doing that.

It is usually best to choose an absolute path for the solr home.  Solr
will find solr.xml there, which it will use to find the rest of your
config(s).  All paths in solr.xml and other solr config files can be
relative.

What you are seeing as an absolute path is likely the current working
directory plus your solr home setting.
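Following the advice above, a context fragment with an absolute solr home might look like this; the path itself is an example, not something taken from your setup:

```xml
<!-- e.g. in Tomcat's conf/Catalina/localhost/solr.xml context file -->
<Environment name="solr/home" override="true"
             type="java.lang.String" value="C:/solr/home"/>
```

With an absolute value, the resolved solr home no longer depends on Tomcat's working directory.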

Thanks,
Shawn



Re:Re: 3 general questions about SolrCloud

2013-04-15 Thread SuoNayi
Thanks for the clarification; I think that makes it clear now.



At 2013-04-16 01:59:59,Shawn Heisey s...@elyograg.org wrote:
On 4/15/2013 9:58 AM, SuoNayi wrote:
 1. What's the model between the master and replicas in one shard?
 If the replicas are able to catch up with the master when the master
 receives a update request it will scatter the request to all the active
 replicas and expect responses before the request get executed
 by itself.This is called push model,right?
 When a new replica is present it will download the whole index from
 the master can this be called pull model?but when the master pushes
 updates to it how the replica behaves,continuing to download the whole
 index while keeping a track of the updates in a log(tlog) ?

There is no master.  SolrCloud is fully distributed.  One replica on 
each shard is elected leader, but that is not a permanent designation.

 2.What's the main use of the transaction log?
 Is it only used to serve the NRT get requests and not related with data sync
 between the master and replica?

The transaction log is used to replay transactions when a node starts 
up.  If the differences between the leader replica and a replica that 
just started are small enough, the transaction log will be used to bring 
them back into sync.  If they are too different, the one that just 
started will replicate the full index from the leader.  I am pretty sure 
that the _version_ field present on every document is used to determine 
whether replicas are in sync, not the index version.

 3.Will the leader election be related with index version?
 If a shard has 3 replicas and when the master goes down how to choose the 
 master,
 do they compare the lastest index version for each one or only consider the 
 earliest presence time?

Here is my understanding about leader elections, I hope it's right! 
Leader elections only take place when the leader goes down.  Once a 
replica is elected leader, it will remain leader unless it goes down. 
The other replicas can go up and down and the leader will retain that 
role.  I do not think the index version is used at all, the _version_ 
field in the index is probably used instead.

Thanks,
Shawn



Push/pull model between leader and replica in one shard

2013-04-15 Thread SuoNayi
Hi, can someone explain in more detail which model is used to sync docs 
between the leader and replica in a shard?
The model could be push or pull. Supposing I have only one shard with 1 leader 
and 2 replicas: when the leader receives an update request, does it scatter 
the request to each available and active replica first, and then process the 
request locally? In that case, if the replicas are able to keep up with the 
leader, can I consider this a push model, in which the leader pushes updates 
to its replicas?


What happens if a replica falls behind the leader? Will the replica pull docs 
from the leader while keeping track of the incoming updates from the leader 
in a log (tlog)? If so, once it has finished pulling docs, will it then 
replay the updates in the tlog?




regards









Re: SolR InvalidTokenOffsetsException with Highlighter and Synonyms

2013-04-15 Thread Dmitry Kan
Do you use the standard highlighter or FastVectorHighlighter /
PhraseHighlighter?
Do you use the hl.highlightMultiTerm option
(http://wiki.apache.org/solr/HighlightingParameters#hl.highlightMultiTerm)?


On Tue, Apr 16, 2013 at 2:51 AM, juancesarvillalba 
juancesarvilla...@gmail.com wrote:


 Hi,

 Before I had a different configuration that was working but with Synonyms
 in
 Query time.

 Now I have a requirement to add multi-word synonyms is for that I am
 checking this configuration.

 It doesn't work with this configuration still without multi-words synonyms.
 The problem happens only with Highlighting ON.





 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/SolR-InvalidTokenOffsetsException-with-Highlighter-and-Synonyms-tp4053988p4056186.html
 Sent from the Solr - User mailing list archive at Nabble.com.