building custom cache - using lucene docids

2013-11-22 Thread Roman Chyla
Hi,
Docids are 'ephemeral', but I'd still like to build a search cache with
them (they allow for the fastest joins).

I'm seeing that docids keep changing with updates (especially in the last index
segment), as per
https://issues.apache.org/jira/browse/LUCENE-2897

That would be fine, because I could build the cache from a diff (of index
state) plus reading the latest index segment in its entirety. But can I assume
that docids in segments other than the last one will be relatively
stable? (i.e. when an old doc is deleted, its docid is just marked as removed;
updating a doc = deleting the old doc and creating a new docid)?
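
A rough sketch of the per-segment approach (Lucene 4.x API; buildSegmentCache
is a hypothetical helper that reads one segment into whatever cache structure
is needed):

import java.io.IOException;
import java.util.Map;
import org.apache.lucene.index.AtomicReader;
import org.apache.lucene.index.AtomicReaderContext;
import org.apache.lucene.index.IndexReader;

// only (re)read segments we haven't seen before; docids inside an unchanged
// segment stay stable, and deletes only flip bits in that segment's liveDocs
void refreshCaches(IndexReader top, Map<Object, Object> segmentCaches) throws IOException {
  for (AtomicReaderContext leaf : top.leaves()) {
    AtomicReader segReader = leaf.reader();
    Object coreKey = segReader.getCoreCacheKey();
    if (!segmentCaches.containsKey(coreKey)) {
      segmentCaches.put(coreKey, buildSegmentCache(segReader));  // hypothetical helper
    }
  }
}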

thanks

roman


Re: Solrcloud: external fields and frequent commits

2013-11-22 Thread Erick Erickson
about <1>. Well, at a high level you're right, of course.
Having the EFF stuff in a single place seems more elegant. But
then ugly details crop up. I.e. "one place" implies that you'd have
to fetch them over the network, potentially a very expensive
operation every time there was a commit. Is this really a good
tradeoff? With high network latency, this could be a performance
killer. But I suspect that the real reason is that nobody has found
a compelling use-case for this kind of thing. Until and unless
someone does, and is willing to make a patch, it'll be theory :).

bq:  modifications also sent to replicas
with this kind of commits

brief review:

Update process:
1> Update goes to a node.
2> node forwards to all leaders
3> leader forward to replicas
4> replicas respond to their leader.
5> leader responds to originating node.
6> originating node responds to caller.

At this point all the replicas for your entire cluster have the
update. This is entirely independent of commits. Whenever a
commit is issued the documents currently pending on a node
are committed and made visible to a searcher.

If one is relying on solrconfig settings, then the commit happens
a little bit out of synch. Let's say that the commit (hard with
opensearcher=true or soft) is set to 60 seconds. Each node may
have a different commit time, depending upon when it was started.
So there may be a slight difference in when documents are visible.
You'll probably never notice.
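
For reference, the solrconfig.xml settings being referred to look roughly like
this (the 60-second value is purely illustrative):

<autoCommit>
  <maxTime>60000</maxTime>          <!-- hard commit every 60s -->
  <openSearcher>false</openSearcher>
</autoCommit>
<autoSoftCommit>
  <maxTime>60000</maxTime>          <!-- soft commit (visibility) every 60s -->
</autoSoftCommit>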

If you issue commits from a client, then the commit is propagated
to all nodes in the cluster.

HTH,
Erick


On Fri, Nov 22, 2013 at 7:23 PM, Flavio Pompermaier wrote:

> On Fri, Nov 22, 2013 at 2:21 PM, Erick Erickson  >wrote:
>
> > 1> I'm not quite sure I understand. External File Fields are keyed
> > by the unique id of the doc. So every shard _must_ have the
> > eff available for at least the documents in that shard. At first glance
> > this doesn't look simple. Perhaps a bit more explanation of what
> > you're using EFF for?
> >
> Thanks Erick for the reply, I use EFF for boosting results by popularity.
> So I was right, I should put popularity in every shard data dir..right? But
> why not keeping that file in just one place (obviously the file should be
> reachable by all solrcloud nodes...) and allow external fields to be
> outside data dir?
>
> >
> > 2> Let's be sure we're talking about the same thing here. In Solr,
> > a "commit" is the command that makes documents visible, often
> > controlled by the autoCommit and autoSoftCommit settings in
> > solrconfig.xml. You will not be able to issue 100 commits/second.
> >
> > If you're using "commit" to mean adding a document to the index,
> > then 100/s should be no problem. I regularly see many times that
> > ingestion rate. The documents won't be visible to search until
> > you do a commit however.
> >
> Yeah, now it is more clear. Still a question: for my client is not a
> problem to soft commit but, are the modifications also sent to replicas
> with this kind of commits?
>
> >
> > Best
> > Erick
> >
> >
> > On Fri, Nov 22, 2013 at 4:44 AM, Flavio Pompermaier <
> pomperma...@okkam.it
> > >wrote:
> >
> > > Hi to all,
> > > we're migrating from solr 3.x to solr 4.x to use Solrcloud and I have
> two
> > > big doubts:
> > >
> > > 1) External fields. When I compute such a file do I have to copy it in
> > the
> > >  data directory of shards..? The external fields boosts the results of
> > the
> > > query to a specific collection, for me it doesn't make sense to put it
> in
> > > all shard's data dir, it should be something related to the collection
> > > itself.
> > > Am I wrong or missing something? Is there a simple way to upload the
> > > popularity file (for the external field) at once in all shards?
> > >
> > > 2) My index requires frequent commits (i.e. sometimes up to 100/s).
> How
> > > do I have to manage this? Do I have to use soft commits..? Any simple
> > > configuration/code snippet to use them? Is it true that external fields
> > > affect performance on commit?
> > >
> > > Best,
> > > Flavio
> > >
> >
>


Re: Solrcloud: external fields and frequent commits

2013-11-22 Thread Flavio Pompermaier
On Fri, Nov 22, 2013 at 2:21 PM, Erick Erickson wrote:

> 1> I'm not quite sure I understand. External File Fields are keyed
> by the unique id of the doc. So every shard _must_ have the
> eff available for at least the documents in that shard. At first glance
> this doesn't look simple. Perhaps a bit more explanation of what
> you're using EFF for?
>
Thanks Erick for the reply. I use EFF for boosting results by popularity.
So I was right, I should put the popularity file in every shard's data dir, right? But
why not keep that file in just one place (obviously the file should be
reachable by all SolrCloud nodes...) and allow external field files to be
outside the data dir?
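
(For reference, this is roughly how an EFF is wired up in schema.xml -- names
are illustrative, and valType must point at a float field type defined in the
schema. The values themselves live in a file named external_<fieldname> inside
each core's data directory, one "id=value" line per document, which is exactly
what makes a single shared location awkward:)

<fieldType name="popularityFile" class="solr.ExternalFileField"
           keyField="id" defVal="0" valType="float"/>
<field name="popularity" type="popularityFile" indexed="false" stored="false"/>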

>
> 2> Let's be sure we're talking about the same thing here. In Solr,
> a "commit" is the command that makes documents visible, often
> controlled by the autoCommit and autoSoftCommit settings in
> solrconfig.xml. You will not be able to issue 100 commits/second.
>
> If you're using "commit" to mean adding a document to the index,
> then 100/s should be no problem. I regularly see many times that
> ingestion rate. The documents won't be visible to search until
> you do a commit however.
>
Yeah, now it is more clear. Still a question: for my client it is not a
problem to soft commit, but are the modifications also sent to replicas
with this kind of commit?

>
> Best
> Erick
>
>
> On Fri, Nov 22, 2013 at 4:44 AM, Flavio Pompermaier  >wrote:
>
> > Hi to all,
> > we're migrating from solr 3.x to solr 4.x to use Solrcloud and I have two
> > big doubts:
> >
> > 1) External fields. When I compute such a file do I have to copy it in
> the
> >  data directory of shards..? The external fields boosts the results of
> the
> > query to a specific collection, for me it doesn't make sense to put it in
> > all shard's data dir, it should be something related to the collection
> > itself.
> > Am I wrong or missing something? Is there a simple way to upload the
> > popularity file (for the external field) at once in all shards?
> >
> > 2) My index requires frequent commits (i.e. sometimes up to 100/s). How
> > do I have to manage this? Do I have to use soft commits..? Any simple
> > configuration/code snippet to use them? Is it true that external fields
> > affect performance on commit?
> >
> > Best,
> > Flavio
> >
>


Re: Periodic Slowness on Solr Cloud

2013-11-22 Thread Shawn Heisey

On 11/22/2013 2:17 PM, Dave Seltzer wrote:

So I made a few changes, but I still seem to be dealing with this pesky
periodic slowness.

Changes:
1) I'm now only forcing commits every 5 minutes. This was done by
specifying commitWithin=30 when doing document adds.
2) I'm specifying an -Xmx12g to force the java heap to take more memory
3) I'm using the GC configuration parameters from the wiki (
http://wiki.apache.org/solr/ShawnHeisey#GC_Tuning)





I'm still seeing the same periodic slowness about every 3.5 minutes. This
slowness occurs whether or not I'm indexing content, so it appears to be
unrelated to my commit schedule.


It sounds like your heap isn't too small.  Try reducing it to 5GB, then 
to 4GB after some testing, so more memory gets used by the OS disk 
cache.  I would also recommend trying perhaps 100 threads on your test 
app rather than 200.  Work your way up until you find the point where it 
just can't handle the load.



See the most recent graph here:
http://farm4.staticflickr.com/3819/10999523464_328814e358_o.png

To keep things consistent I'm still testing with 200 threads. When I test
with 10 threads everything is much faster, but I still get the same
periodic slowness.

One thing I've noticed is that while Java is aware of the 12 gig heap, Solr
doesn't seem to be using much of it. The system panel of the Web UI shows
11.5GB of JVM-Memory available, but only 2.11GB in use.


The memory usage in the admin UI is an instantaneous snapshot.  If you 
use jvisualvm or jconsole (included in the Java JDK) to get a graph of 
memory usage, you'll see it change over time.  As Java allocates 
objects, memory usage increases until it's using all the heap.  Some 
amount of that allocation will be objects that are no longer in use -- 
garbage.  Then garbage collection will kick in and memory usage will 
drop down to however much is actually in use in the particular memory 
pool that's being collected.  This is what people often refer to as the 
sawtooth pattern.


Here's a couple of screenshots.  The jconsole program is running on 
Windows 7, Solr is running on Linux.  One screenshot is the graph, the 
other is the VM summary where you can see that Solr has been running for 
nearly 8 days.  This is one of my production Solr servers, so some of 
the parameters are slightly different than what's on my wiki:


https://dl.dropboxusercontent.com/u/97770508/solr-jconsole.png
https://dl.dropboxusercontent.com/u/97770508/solr-jconsole-summary.png

If you do not have a GUI installed on the actual Solr machine, you'll 
need to use remote JMX to connect jconsole.  In the init script on my 
wiki page, you can see JMX options.  With those, you can tell a remote 
jconsole to use server.example.com:8686 instead of a local PID.  You can 
use any port you want that's not already in use instead of 8686.  
Running jconsole with -interval=1 will make the graph update once a 
second, I think it's every 5 seconds by default.
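
The JMX setup amounts to something like this (illustrative values, and note
that disabling auth/SSL like this is only sensible on a trusted network):

-Dcom.sun.management.jmxremote
-Dcom.sun.management.jmxremote.port=8686
-Dcom.sun.management.jmxremote.authenticate=false
-Dcom.sun.management.jmxremote.ssl=false

Then from the machine with the GUI:

jconsole -interval=1 server.example.com:8686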


You can also hit reload on the dashboard page to see how memory usage 
changes over time, but it's not as useful as a graph.  Memory usage will 
not change by much if you are not actively querying or indexing.


Thanks,
Shawn



Re: Periodic Slowness on Solr Cloud

2013-11-22 Thread Dave Seltzer
So I made a few changes, but I still seem to be dealing with this pesky
periodic slowness.

Changes:
1) I'm now only forcing commits every 5 minutes. This was done by
specifying commitWithin=30 when doing document adds.
2) I'm specifying an -Xmx12g to force the java heap to take more memory
3) I'm using the GC configuration parameters from the wiki (
http://wiki.apache.org/solr/ShawnHeisey#GC_Tuning)

The new startup args are:
-DzkRun
-Xmx12g
-XX:+AggressiveOpts
-XX:+UseLargePages
-XX:+ParallelRefProcEnabled
-XX:+CMSParallelRemarkEnabled
-XX:CMSMaxAbortablePrecleanTime=6000
-XX:CMSTriggerPermRatio=80
-XX:CMSInitiatingOccupancyFraction=70
-XX:+UseCMSInitiatingOccupancyOnly
-XX:CMSFullGCsBeforeCompaction=1
-XX:PretenureSizeThreshold=64m
-XX:+CMSScavengeBeforeRemark
-XX:+UseConcMarkSweepGC
-XX:MaxTenuringThreshold=8
-XX:TargetSurvivorRatio=90
-XX:SurvivorRatio=4
-XX:NewRatio=3

I'm still seeing the same periodic slowness about every 3.5 minutes. This
slowness occurs whether or not I'm indexing content, so it appears to be
unrelated to my commit schedule.

See the most recent graph here:
http://farm4.staticflickr.com/3819/10999523464_328814e358_o.png

To keep things consistent I'm still testing with 200 threads. When I test
with 10 threads everything is much faster, but I still get the same
periodic slowness.

One thing I've noticed is that while Java is aware of the 12 gig heap, Solr
doesn't seem to be using much of it. The system panel of the Web UI shows
11.5GB of JVM-Memory available, but only 2.11GB in use.

Screenshot: http://farm4.staticflickr.com/3822/10999509515_72a9013ec7_o.jpg

So I've told Java to use more memory. Do I need to tell Solr to use more as
well?

Thanks everyone!

-Dave



On Fri, Nov 22, 2013 at 12:09 PM, Shawn Heisey  wrote:

> On 11/22/2013 10:01 AM, Shawn Heisey wrote:
>
>> You can see how much the max heap is in the Solr admin UI dashboard -
>> it'll be the right-most number on the JVM-Memory graph.  On my 64-bit linux
>> development machine with 16GB of RAM, it looks like Java defaults to a 4GB
>> max heap.  I have the heap size manually set to 7GB for Solr on that
>> machine.  The 6GB heap you have mentioned might not be enough, or it might
>> be more than you need.  It all depends on the kind of queries you are doing
>> and exactly how Solr is configured.
>>
>
> Followup: I would also recommend starting with my garbage collection
> settings.  This wiki page is linked on the wiki page I've already given you.
>
> http://wiki.apache.org/solr/ShawnHeisey#GC_Tuning
>
> You might need a script to start Solr.  There is also a redhat-specific
> init script on that wiki page.  I haven't included any instructions for
> installing it.  Someone who already knows about init scripts won't have
> much trouble getting it working on a redhat-derived OS, and someone who
> doesn't will need extensive instructions or an install script, neither of
> which has been written.
>
> Thanks,
> Shawn
>
>


RE: Reverse mm(min-should-match)

2013-11-22 Thread Doug Turnbull
If I could get at the number of tokens in a query or query norms I
might be able to use that in conjunction with field norms to measure
how close the query is to the field in terms of number of tokens. Then
regular mm could do the trick.

Sent from my Windows Phone From: Doug Turnbull
Sent: 11/22/2013 4:05 PM
To: Erik Hatcher; solr-user@lucene.apache.org
Subject: RE: Reverse mm(min-should-match)
Hmm... Not necessarily. I'd be happy with any ordering for now. Though
some notion of order and slop would be nice in the future

Sent from my Windows Phone From: Erik Hatcher
Sent: 11/22/2013 3:32 PM
To: solr-user@lucene.apache.org
Subject: Re: Reverse mm(min-should-match)
Does order matter?By "exact" you mean the same tokens in the same positions?

Erik

On Nov 22, 2013, at 2:54 PM, Doug Turnbull
 wrote:

> Instead of specifying a percentage or number of query terms must match
> tokens in a field, I'd like to do the opposite -- specify how much of a
> field must match a query.
>
> The problem I'm trying to solve is to boost document titles that closely
> match the query string. If a title looks something like
>
> *Title: *[solr] [the] [worlds] [greatest] [search] [engine]
>
> I want to be able to specify how much of the field must match the query
> string. This differs from normal mm. Normal mm specifies a how much of the
> query must match a field.
>
> As an example, with this title, if I use normal mm=100% and perform the
> following query:
>
> mm=100%
> q=solr
>
> This will match the title above, as 100% of [solr] matches the field
>
> What I really want to get at is a reverse mm:
>
> Rmm=100%
> q=solr
>
> The title above will not match in this case. Only 1/6 of the tokens in the
> field match the query.
>
> However an exact search would match:
>
> Rmm=100%
> q=solr the worlds greatest search engine
>
> Here 100% of the query matches the title, so I'm good.
>
> Is there any way to achieve this in Solr?
>
> --
> Doug Turnbull
> Search & Big Data Architect
> OpenSource Connections 


RE: Reverse mm(min-should-match)

2013-11-22 Thread Doug Turnbull
Hmm... Not necessarily. I'd be happy with any ordering for now. Though
some notion of order and slop would be nice in the future

Sent from my Windows Phone From: Erik Hatcher
Sent: 11/22/2013 3:32 PM
To: solr-user@lucene.apache.org
Subject: Re: Reverse mm(min-should-match)
Does order matter?By "exact" you mean the same tokens in the same positions?

Erik

On Nov 22, 2013, at 2:54 PM, Doug Turnbull
 wrote:

> Instead of specifying a percentage or number of query terms must match
> tokens in a field, I'd like to do the opposite -- specify how much of a
> field must match a query.
>
> The problem I'm trying to solve is to boost document titles that closely
> match the query string. If a title looks something like
>
> *Title: *[solr] [the] [worlds] [greatest] [search] [engine]
>
> I want to be able to specify how much of the field must match the query
> string. This differs from normal mm. Normal mm specifies a how much of the
> query must match a field.
>
> As an example, with this title, if I use normal mm=100% and perform the
> following query:
>
> mm=100%
> q=solr
>
> This will match the title above, as 100% of [solr] matches the field
>
> What I really want to get at is a reverse mm:
>
> Rmm=100%
> q=solr
>
> The title above will not match in this case. Only 1/6 of the tokens in the
> field match the query.
>
> However an exact search would match:
>
> Rmm=100%
> q=solr the worlds greatest search engine
>
> Here 100% of the query matches the title, so I'm good.
>
> Is there any way to achieve this in Solr?
>
> --
> Doug Turnbull
> Search & Big Data Architect
> OpenSource Connections 


Re: Reverse mm(min-should-match)

2013-11-22 Thread Erik Hatcher
Does order matter? By "exact" do you mean the same tokens in the same positions?

Erik

On Nov 22, 2013, at 2:54 PM, Doug Turnbull 
 wrote:

> Instead of specifying a percentage or number of query terms must match
> tokens in a field, I'd like to do the opposite -- specify how much of a
> field must match a query.
> 
> The problem I'm trying to solve is to boost document titles that closely
> match the query string. If a title looks something like
> 
> *Title: *[solr] [the] [worlds] [greatest] [search] [engine]
> 
> I want to be able to specify how much of the field must match the query
> string. This differs from normal mm. Normal mm specifies a how much of the
> query must match a field.
> 
> As an example, with this title, if I use normal mm=100% and perform the
> following query:
> 
> mm=100%
> q=solr
> 
> This will match the title above, as 100% of [solr] matches the field
> 
> What I really want to get at is a reverse mm:
> 
> Rmm=100%
> q=solr
> 
> The title above will not match in this case. Only 1/6 of the tokens in the
> field match the query.
> 
> However an exact search would match:
> 
> Rmm=100%
> q=solr the worlds greatest search engine
> 
> Here 100% of the query matches the title, so I'm good.
> 
> Is there any way to achieve this in Solr?
> 
> -- 
> Doug Turnbull
> Search & Big Data Architect
> OpenSource Connections 



Re: Reverse mm(min-should-match)

2013-11-22 Thread Bill Bell
This is an awesome idea!

Sent from my iPad

> On Nov 22, 2013, at 12:54 PM, Doug Turnbull 
>  wrote:
> 
> Instead of specifying a percentage or number of query terms must match
> tokens in a field, I'd like to do the opposite -- specify how much of a
> field must match a query.
> 
> The problem I'm trying to solve is to boost document titles that closely
> match the query string. If a title looks something like
> 
> *Title: *[solr] [the] [worlds] [greatest] [search] [engine]
> 
> I want to be able to specify how much of the field must match the query
> string. This differs from normal mm. Normal mm specifies a how much of the
> query must match a field.
> 
> As an example, with this title, if I use normal mm=100% and perform the
> following query:
> 
> mm=100%
> q=solr
> 
> This will match the title above, as 100% of [solr] matches the field
> 
> What I really want to get at is a reverse mm:
> 
> Rmm=100%
> q=solr
> 
> The title above will not match in this case. Only 1/6 of the tokens in the
> field match the query.
> 
> However an exact search would match:
> 
> Rmm=100%
> q=solr the worlds greatest search engine
> 
> Here 100% of the query matches the title, so I'm good.
> 
> Is there any way to achieve this in Solr?
> 
> -- 
> Doug Turnbull
> Search & Big Data Architect
> OpenSource Connections 


Re: csv does not return custom fields (distance)

2013-11-22 Thread Gopal Patwa
If you are using Solr 4.0, there was an issue related to field aliases that
was fixed in Solr 4.3:

https://issues.apache.org/jira/browse/SOLR-4671

You should try to reproduce this issue using the latest Solr version, 4.5.1.
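
(For reference, the kind of request involved -- a pseudo-field alias such as
dist for a distance function in fl -- looks roughly like this; field names and
point are made up:)

/select?q=*:*&sfield=store&pt=45.15,-93.85&fl=id,name,dist:geodist()&wt=csv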



On Fri, Nov 22, 2013 at 11:28 AM, GaneshSe  wrote:

> Any help on this is greatly appreciated.
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/csv-does-not-return-custom-fields-distance-tp4102313p4102656.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Reverse mm(min-should-match)

2013-11-22 Thread Doug Turnbull
Instead of specifying a percentage or number of query terms that must match
tokens in a field, I'd like to do the opposite -- specify how much of a
field must match a query.

The problem I'm trying to solve is to boost document titles that closely
match the query string. If a title looks something like

*Title: *[solr] [the] [worlds] [greatest] [search] [engine]

I want to be able to specify how much of the field must match the query
string. This differs from normal mm. Normal mm specifies how much of the
query must match a field.

As an example, with this title, if I use normal mm=100% and perform the
following query:

mm=100%
q=solr

This will match the title above, as 100% of [solr] matches the field

What I really want to get at is a reverse mm:

Rmm=100%
q=solr

The title above will not match in this case. Only 1/6 of the tokens in the
field match the query.

However an exact search would match:

Rmm=100%
q=solr the worlds greatest search engine

Here 100% of the query matches the title, so I'm good.

Is there any way to achieve this in Solr?

-- 
Doug Turnbull
Search & Big Data Architect
OpenSource Connections 


Re: removing dead replicas in solrcloud 4.4

2013-11-22 Thread Timothy Potter
Yes, I've done this ... but I had to build my own utility to update
clusterstate.json (for reasons I can't recall now). So make your
changes to clusterstate.json manually and then do something like the
following with SolrJ:

public static void updateClusterstateJsonInZk(CloudSolrServer cloudSolrServer,
    CommandLine cli) throws Exception {
  String updateClusterstateJson = cli.getOptionValue("updateClusterstateJson");

  ZkStateReader zkStateReader = cloudSolrServer.getZkStateReader();
  SolrZkClient zkClient = zkStateReader.getZkClient();

  File jsonFile = new File(updateClusterstateJson);
  if (!jsonFile.isFile()) {
    System.err.println(jsonFile.getAbsolutePath() + " not found.");
    return;
  }

  byte[] clusterstateJson = readFile(jsonFile);

  // validate that what the user is passing in is valid JSON
  InputStreamReader bytesReader =
      new InputStreamReader(new ByteArrayInputStream(clusterstateJson), "UTF-8");
  JSONParser parser = new JSONParser(bytesReader);
  parser.toString();

  zkClient.setData("/clusterstate.json", clusterstateJson, true);
  System.out.println("Updated /clusterstate.json with data from "
      + jsonFile.getAbsolutePath());
}

On Fri, Nov 22, 2013 at 12:33 PM, Eric Parish
 wrote:
> My 4.4 sorlcloud cluster has several down replicas that need to be removed. I 
> am looking for a solution to clean them up like the deletereplica api 
> available in 4.6.
>
> Will manually removing the replicas from the clusterstate.json file in 
> zookeeper accomplish my needs?
>
> Thanks,
> Eric


Re: Document Security Model Question

2013-11-22 Thread kchellappa
Thanks Rajinimaski for the response.

Agreed that if the changes are frequent, then the first option wouldn't work
efficiently. The other challenge is that in our case, for each resource it is
easy/efficient to get a list of changes since the last checkpoint (because of
our model of deploying customer databases), rather than getting a snapshot of
allowed/disallowed across all customers for each resource.


In your PostFilter implementation, do you cache the acls in memory, then
they get updated periodically externally to solr and the post filter just
uses the cache or something along these lines?
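
(For context, a PostFilter of that shape looks roughly like the sketch below.
AclCache is a hypothetical in-memory structure that gets refreshed outside of
Solr; note that the doc ids passed to collect() are segment-relative, so a real
implementation would map them to the unique key before checking the ACLs.)

import java.io.IOException;
import org.apache.lucene.search.IndexSearcher;
import org.apache.solr.search.DelegatingCollector;
import org.apache.solr.search.ExtendedQueryBase;
import org.apache.solr.search.PostFilter;

interface AclCache {
  boolean isAllowed(int doc);   // hypothetical lookup against the cached ACLs
}

public class AclPostFilter extends ExtendedQueryBase implements PostFilter {
  private final AclCache aclCache;

  public AclPostFilter(AclCache aclCache) {
    this.aclCache = aclCache;
  }

  @Override
  public boolean getCache() { return false; }   // post filters are not cached

  @Override
  public int getCost() { return 100; }          // cost >= 100 => run as a post filter

  @Override
  public DelegatingCollector getFilterCollector(IndexSearcher searcher) {
    return new DelegatingCollector() {
      @Override
      public void collect(int doc) throws IOException {
        if (aclCache.isAllowed(doc)) {
          super.collect(doc);                   // let allowed docs through
        }
      }
    };
  }
}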





--
View this message in context: 
http://lucene.472066.n3.nabble.com/Document-Security-Model-Question-tp4101078p4102664.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: NullPointerException

2013-11-22 Thread Bill Bell
It seems to be a modified row and referenced in EvaluatorBag.

I am not familiar with either.

Sent from my iPad

> On Nov 22, 2013, at 3:05 AM, Adrien RUFFIE  wrote:
> 
> Hello all,
> 
> I have performed a full indexing with Solr, but when I try to perform an
> incremental indexing I get the following exception (cf. attachment).
> 
> Anyone have an idea of the problem?
> 
> Great thanks
> 


Re: Boosting documents by categorical preferences

2013-11-22 Thread Chris Hostetter

: I thought about that but my concern/question was how. If I used the pow
: function then I'm still boosting the bad categories by a small
: amount..alternatively I could multiply by a negative number but does that
: work as expected?

I'm not sure i understand your concern: negative powers would give you 
values less than 1, positive powers would give you values greater than 1, 
and then you'd use those values as multiplicative boosts -- so the values 
less than 1 would penalize the scores of existing matching docs in the 
categories the user dislikes.

Oh wait ... i see, in your original email (and in my subsequent suggested 
tweak to use pow()) you were talking about sum()ing up these 3 category 
boosts (and i cut/pasted sum() in my example as well) ... yeah, 
using multiplication there would make more sense if you wanted to do the 
"negative preferences" as well, because then the score of any matching doc 
will be reduced if it matches on an "undesired" category -- and the 
amount it will be reduced will be determined by how strongly it 
matches on that category (i.e.: the base score returned by the nested 
query() func) and "how negative" the undesired preference value (i.e.: 
the pow() exponent) is.


qq=...
q={!boost b=$b v=$qq}
b=prod(pow(query($cat1),$cat1z),pow(query($cat2),$cat2z),pow(query($cat3),$cat3z))
cat1=...action...
cat1z=1.48
cat2=...comedy...
cat2z=1.33
cat3=...kids...
cat3z=-1.7


-Hoss


Re: useColdSearcher in SolrCloud config

2013-11-22 Thread Bill Bell
Wouldn't true mean "use the cold searcher"? It seems backwards to me...
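
For reference, the setting lives in the <query> section of solrconfig.xml,
e.g.:

<useColdSearcher>false</useColdSearcher>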

Sent from my iPad

> On Nov 22, 2013, at 2:44 AM, ade-b  wrote:
> 
> Hi
> 
> The definition of useColdSearcher config element in solrconfig.xml is
> 
> "If a search request comes in and there is no current registered searcher,
> then immediately register the still warming searcher and use it.  If "false"
> then all requests will block until the first searcher is done warming".
> 
> By the term 'block', I assume SOLR returns a non 200 response to requests.
> Does anybody know the exact response code returned when the server is
> blocking requests?
> 
> If a new SOLR server is introduced into an existing array of SOLR servers
> (in a SOLR Cloud setup), it will sync its index from the leader. To save you
> having to specify warm-up queries in the solrconfig.xml file for first
> searchers, would/could the new server not auto-warm its caches from the
> caches of an existing server?
> 
> Thanks
> Ade 
> 
> 
> 
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/useColdSearcher-in-SolrCloud-config-tp4102569.html
> Sent from the Solr - User mailing list archive at Nabble.com.


removing dead replicas in solrcloud 4.4

2013-11-22 Thread Eric Parish
My 4.4 SolrCloud cluster has several down replicas that need to be removed. I 
am looking for a solution to clean them up like the deletereplica api available 
in 4.6.

Will manually removing the replicas from the clusterstate.json file in 
zookeeper accomplish my needs?

Thanks,
Eric


Re: can't overwrite and can't delete by id

2013-11-22 Thread Mingfeng Yang
BTW: it's a 4-shard SolrCloud cluster using ZooKeeper 3.3.5.


On Fri, Nov 22, 2013 at 11:07 AM, Mingfeng Yang wrote:

> Recently, I found out that  I can't delete doc by id or overwrite a doc
>  from/in my SOLR index which is based on SOLR 4.4.0 version.
>
> Say, I have a doc  http://pastebin.com/GqPP4Uw4  (to make it easier to
> view, I use pastebin here).  And I tried to add a dynamic field "rank_ti"
> to it, want to make it like http://pastebin.com/dGnRRwux
>
> Funny thing is that after I inserted the new version of doc, if I do query
> "curl 'localhost:8995/solr/select?wt=json&indent=true&q=id:28583776' " ,
>  the two versions above will appear randomly. And after half a minute,
>  version 2 will disappear, which means the update is not get write into the
> disk.
>
> I tried to delete by id with rsolr, and the doc just can't be removed.
>
> Insert new doc into the index is fine though.
>
> Anyone ran into this strange behavior before?
>
> Thanks
> Ming
>


Re: How to work with remote solr savely?

2013-11-22 Thread Bill Bell
Do you have a sample jetty XML to setup basic auth for updates in Solr?

Sent from my iPad

> On Nov 22, 2013, at 7:34 AM, "michael.boom"  wrote:
> 
> Use HTTP basic authentication, setup in your servlet container
> (jetty/tomcat).
> 
> That should work fine if you are *not* using SolrCloud.
> 
> 
> 
> -
> Thanks,
> Michael
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/How-to-work-with-remote-solr-savely-tp4102612p4102613.html
> Sent from the Solr - User mailing list archive at Nabble.com.


Re: csv does not return custom fields (distance)

2013-11-22 Thread GaneshSe
Any help on this is greatly appreciated.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/csv-does-not-return-custom-fields-distance-tp4102313p4102656.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Split shard and stream sub-shards to remote nodes?

2013-11-22 Thread Otis Gospodnetic
Ouch :(
I guess it's as efficient as it can be, but too bad, because writing to
a remote node sounds awesomely cool to me at least. :)

Thanks for explaining the key bits, Shalin.

Otis
--
Performance Monitoring * Log Analytics * Search Analytics
Solr & Elasticsearch Support * http://sematext.com/


On Fri, Nov 22, 2013 at 7:54 AM, Shalin Shekhar Mangar <
shalinman...@gmail.com> wrote:

> The splitting process is nothing but the creation of a bitset with
> which a LiveDocsReader is created. These readers are then added to a new
> index via the IW.addIndexes(IndexReader[] readers) method. All this
> is performed below the IR/IW API and no documents are actually ever
> read or written directly by Solr. This is why it isn't feasible to
> stream docs to a remote node.
>
> On Fri, Nov 22, 2013 at 5:59 AM, Otis Gospodnetic
>  wrote:
> > Hi,
> >
> > On Wed, Nov 20, 2013 at 12:53 PM, Shalin Shekhar Mangar <
> > shalinman...@gmail.com> wrote:
> >
> >> At the Lucene level, I think it would require a directory
> >> implementation which writes to a remote node directly. Otherwise, on
> >> the solr side, we must move the leader itself to another node which
> >> has enough disk space and then split the index.
> >>
> >
> > Hm what about taking the source shard, splitting it, and sending docs
> > that come out of each sub-shards to a remote node at Solr level, as if
> > these documents are just being added (i.e. nothing at Lucene level)?
> >
> > Otis
> > --
> > Performance Monitoring * Log Analytics * Search Analytics
> > Solr & Elasticsearch Support * http://sematext.com/
> >
> >
> >
> >
> >>
> >> On Wed, Nov 20, 2013 at 8:37 PM, Otis Gospodnetic
> >>  wrote:
> >> > Do you think this is something that is actually implementable?  If so,
> >> > I'll open an issue.
> >> >
> >> > One use-case where this may come in handy is when the disk space is
> >> > tight.  If a shard is using > 50% of the disk space on some node X,
> >> > you can't really split that shard because the 2 new sub-shards will
> >> > not fit on the local disk.  Or is there some trick one could use in
> >> > this situation?
> >> >
> >> > Thanks,
> >> > Otis
> >> > --
> >> > Performance Monitoring * Log Analytics * Search Analytics
> >> > Solr & Elasticsearch Support * http://sematext.com/
> >> >
> >> >
> >> > On Wed, Nov 20, 2013 at 6:48 AM, Shalin Shekhar Mangar
> >> >  wrote:
> >> >> No, it is not supported yet. We can't split to a remote node
> directly.
> >> >> The best bet is trigger a new leader election by unloading the leader
> >> >> node once all replicas are active.
> >> >>
> >> >> On Wed, Nov 20, 2013 at 1:32 AM, Otis Gospodnetic
> >> >>  wrote:
> >> >>> Hi,
> >> >>>
> >> >>> Is it possible to perform a shard split and stream data for the
> >> >>> new/sub-shards to remote nodes, avoiding persistence of
> new/sub-shards
> >> >>> on the local/source node first?
> >> >>>
> >> >>> Thanks,
> >> >>> Otis
> >> >>> --
> >> >>> Performance Monitoring * Log Analytics * Search Analytics
> >> >>> Solr & Elasticsearch Support * http://sematext.com/
> >> >>
> >> >>
> >> >>
> >> >> --
> >> >> Regards,
> >> >> Shalin Shekhar Mangar.
> >>
> >>
> >>
> >> --
> >> Regards,
> >> Shalin Shekhar Mangar.
> >>
>
>
>
> --
> Regards,
> Shalin Shekhar Mangar.
>


can't overwrite and can't delete by id

2013-11-22 Thread Mingfeng Yang
Recently, I found out that I can't delete a doc by id or overwrite a doc
in my Solr index, which is based on Solr 4.4.0.

Say I have a doc: http://pastebin.com/GqPP4Uw4 (to make it easier to
view, I use pastebin here). And I tried to add a dynamic field "rank_ti"
to it, wanting to make it look like http://pastebin.com/dGnRRwux

The funny thing is that after I inserted the new version of the doc, if I run
"curl 'localhost:8995/solr/select?wt=json&indent=true&q=id:28583776' ",
the two versions above will appear randomly. And after half a minute,
version 2 will disappear, which means the update did not get written to
disk.

I tried to delete by id with rsolr, and the doc just can't be removed.
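
(For reference, the equivalent delete-by-id over HTTP -- handler path assumed --
would be:

curl 'http://localhost:8995/solr/update?commit=true' -H 'Content-Type: text/xml' \
     --data-binary '<delete><id>28583776</id></delete>'
)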

Insert new doc into the index is fine though.

Anyone ran into this strange behavior before?

Thanks
Ming


Re: Periodic Slowness on Solr Cloud

2013-11-22 Thread Dave Seltzer
Wow. That is one noisy command!

Full output is below. The grepped output looks like:

[solr@searchtest07 ~]$ java -XX:+PrintFlagsFinal -version | grep -i -E
'heapsize|permsize|version'
uintx AdaptivePermSizeWeight      = 20           {product}
uintx ErgoHeapSizeLimit           = 0            {product}
uintx HeapSizePerGCThread         = 87241520     {product}
uintx InitialHeapSize            := 447247104    {product}
uintx LargePageHeapSizeThreshold  = 134217728    {product}
uintx MaxHeapSize                := 7157579776   {product}
uintx MaxPermSize                 = 85983232     {pd product}
uintx PermSize                    = 21757952     {pd product}
java version "1.7.0_45"
Java(TM) SE Runtime Environment (build 1.7.0_45-b18)
Java HotSpot(TM) 64-Bit Server VM (build 24.45-b08, mixed mode)

It looks like Java is correctly determining that this is in fact a
"server." It seems to start with an Xmx of 25% of the RAM, or around 7GB.

So, in addition to tweaking GC I'm going to increase Xmx. Any advice as to
how much memory should go to the heap and how much should go to the OS disk
cache? Should I split it 50/50?

Again. Many Thanks.

-Dave


 Full output from printflags
--
[solr@searchtest07 ~]$ java -XX:+PrintFlagsFinal -version
[Global flags]
uintx AdaptivePermSizeWeight= 20
 {product}
uintx AdaptiveSizeDecrementScaleFactor  = 4
{product}
uintx AdaptiveSizeMajorGCDecayTimeScale = 10
 {product}
uintx AdaptiveSizePausePolicy   = 0
{product}
uintx AdaptiveSizePolicyCollectionCostMargin= 50
 {product}
uintx AdaptiveSizePolicyInitializingSteps   = 20
 {product}
uintx AdaptiveSizePolicyOutputInterval  = 0
{product}
uintx AdaptiveSizePolicyWeight  = 10
 {product}
uintx AdaptiveSizeThroughPutPolicy  = 0
{product}
uintx AdaptiveTimeWeight= 25
 {product}
 bool AdjustConcurrency = false
{product}
 bool AggressiveOpts= false
{product}
 intx AliasLevel= 3   {C2
product}
 bool AlignVector   = false   {C2
product}
 intx AllocateInstancePrefetchLines = 1
{product}
 intx AllocatePrefetchDistance  = 192
{product}
 intx AllocatePrefetchInstr = 0
{product}
 intx AllocatePrefetchLines = 4
{product}
 intx AllocatePrefetchStepSize  = 64
 {product}
 intx AllocatePrefetchStyle = 1
{product}
 bool AllowJNIEnvProxy  = false
{product}
 bool AllowNonVirtualCalls  = false
{product}
 bool AllowParallelDefineClass  = false
{product}
 bool AllowUserSignalHandlers   = false
{product}
 bool AlwaysActAsServerClassMachine = false
{product}
 bool AlwaysCompileLoopMethods  = false
{product}
 bool AlwaysLockClassLoader = false
{product}
 bool AlwaysPreTouch= false
{product}
 bool AlwaysRestoreFPU  = false
{product}
 bool AlwaysTenure  = false
{product}
 bool AssertOnSuspendWaitFailure= false
{product}
 intx Atomics   = 0
{product}
 intx AutoBoxCacheMax   = 128 {C2
product}
uintx AutoGCSelectPauseMillis   = 5000
 {product}
 intx BCEATraceLevel= 0
{product}
 intx BackEdgeThreshold = 10  {pd
product}
 bool BackgroundCompilation = true{pd
product}
uintx BaseFootPrintEstimate = 268435456
{product}
 intx BiasedLockingBulkRebiasThreshold  = 20
 {product}
 intx BiasedLockingBulkRevokeThreshold  = 40
 {product}
 intx BiasedLockingDecayTime= 25000
{product}
 intx BiasedLockingStartupDelay = 4000
 {product}
 bool BindGCTaskThreadsToCPUs   = false
{product}
 bool BlockLayoutByFrequency= true{C2
product}
 intx BlockLayoutMinDiamondPercentage   = 20  {C2
product}
 bool BlockLayoutRotateLoops= true{C2
product}
 bool BranchOnRegister  = false   {C2
product}
 bool BytecodeVerificationLocal = false
{product}
 bool BytecodeVerificationRemote= true
 {product}
 bool C1OptimizeVirtualCallProfiling   

Re: Periodic Slowness on Solr Cloud

2013-11-22 Thread Raymond Wiker
You mentioned earlier that you are not setting -Xms/-Xmx; the values actually 
in use would then depend on the Java version, whether you're running 32- or 
64-bit Java, whether Java thinks your machines are "servers", and whether you 
have specified the "-server" flag – and possibly a few other things.

What do you get if you run the command below?

java -XX:+PrintFlagsFinal -version

(Ref: 
http://stackoverflow.com/questions/3428251/is-there-a-default-xmx-setting-for-java-1-5
 for details; I "stole" the incantation above from that location, but there are 
more complete examples of how it could be used there.)

Note: you need to adjust the command line so that it uses the same java version 
as the one you're using, and also add whatever JRE-modifying parameters that 
you use when starting Solr.

On 22 Nov 2013, at 18:12 , Dave Seltzer  wrote:

> Thanks so much Shawn,
> 
> I think you (and others) are completely right about this being heap and GC
> related. I just did a test while not indexing data and the same periodic
> slowness was observable.
> 
> On to GC/Memory Tuning!



Re: Periodic Slowness on Solr Cloud

2013-11-22 Thread Dave Seltzer
Thanks so much Shawn,

I think you (and others) are completely right about this being heap and GC
related. I just did a test while not indexing data and the same periodic
slowness was observable.

On to GC/Memory Tuning!

Many Thanks!

-Dave


On Fri, Nov 22, 2013 at 12:09 PM, Shawn Heisey  wrote:

> On 11/22/2013 10:01 AM, Shawn Heisey wrote:
>
>> You can see how much the max heap is in the Solr admin UI dashboard -
>> it'll be the right-most number on the JVM-Memory graph.  On my 64-bit linux
>> development machine with 16GB of RAM, it looks like Java defaults to a 4GB
>> max heap.  I have the heap size manually set to 7GB for Solr on that
>> machine.  The 6GB heap you have mentioned might not be enough, or it might
>> be more than you need.  It all depends on the kind of queries you are doing
>> and exactly how Solr is configured.
>>
>
> Followup: I would also recommend starting with my garbage collection
> settings.  This wiki page is linked on the wiki page I've already given you.
>
> http://wiki.apache.org/solr/ShawnHeisey#GC_Tuning
>
> You might need a script to start Solr.  There is also a redhat-specific
> init script on that wiki page.  I haven't included any instructions for
> installing it.  Someone who already knows about init scripts won't have
> much trouble getting it working on a redhat-derived OS, and someone who
> doesn't will need extensive instructions or an install script, neither of
> which has been written.
>
> Thanks,
> Shawn
>


Re: Periodic Slowness on Solr Cloud

2013-11-22 Thread Shawn Heisey

On 11/22/2013 10:01 AM, Shawn Heisey wrote:
You can see how much the max heap is in the Solr admin UI dashboard - 
it'll be the right-most number on the JVM-Memory graph.  On my 64-bit 
linux development machine with 16GB of RAM, it looks like Java 
defaults to a 4GB max heap.  I have the heap size manually set to 7GB 
for Solr on that machine.  The 6GB heap you have mentioned might not 
be enough, or it might be more than you need.  It all depends on the 
kind of queries you are doing and exactly how Solr is configured.


Followup: I would also recommend starting with my garbage collection 
settings.  This wiki page is linked on the wiki page I've already given you.


http://wiki.apache.org/solr/ShawnHeisey#GC_Tuning

You might need a script to start Solr.  There is also a redhat-specific 
init script on that wiki page.  I haven't included any instructions for 
installing it.  Someone who already knows about init scripts won't have 
much trouble getting it working on a redhat-derived OS, and someone who 
doesn't will need extensive instructions or an install script, neither 
of which has been written.


Thanks,
Shawn



Re: Periodic Slowness on Solr Cloud

2013-11-22 Thread Shawn Heisey

On 11/22/2013 8:13 AM, Dave Seltzer wrote:

Regarding memory: Including duplicate data in shard replicas the entire
index is 350GB. Each server hosts a total of 44GB of data. Each server has
28GB of memory. I haven't been setting -Xmx or -Xms, in the hopes that Java
would take the memory it needs and leave the rest to the OS for cache.


That's not how Java works.  Java has a min heap and max heap setting.  
If you (or the auto-detected settings) tell it that the max heap is 4GB, 
it will only ever use slightly more than 4GB of RAM.  If the app needs 
more than that, this will lead to terrible performance and/or out of 
memory errors.


You can see how much the max heap is in the Solr admin UI dashboard - 
it'll be the right-most number on the JVM-Memory graph.  On my 64-bit 
linux development machine with 16GB of RAM, it looks like Java defaults 
to a 4GB max heap.  I have the heap size manually set to 7GB for Solr on 
that machine.  The 6GB heap you have mentioned might not be enough, or 
it might be more than you need.  It all depends on the kind of queries 
you are doing and exactly how Solr is configured.


If it were me, I'd want a memory size between 48 and 64GB for a total 
index size of 44GB.  Whether you really need that much is very dependent 
on your exact requirements, index makeup, and queries.  To support the 
high query load you're sending, it probably is a requirement.  More 
memory is likely to help performance, but I can't guarantee it without 
looking a lot deeper into your setup, and that's difficult to do via email.


One thing I can tell you about checking performance - see how much of 
your 70% CPU usage is going to I/O wait.  If it's more than a few 
percent, more memory might help.  First try increasing the max heap by 1 
or 2GB.



Given that I'll never need to serve 200 concurrent connections in
production, do you think my servers need more memory?
Should I be tinkering with -Xmx and -Xms?


If you'll never need to serve that many, test with a lower number.  Make 
it higher than you'll need, but not a lot higher. The test with 200 
connections isn't a bad idea -- you do want to stress test things way 
beyond your actual requirements, but you'll also want to see how it does 
with a more realistic load.


Those are the min/max heap settings I just mentioned.  IMHO you should 
set at least the max heap.  If you want to handle a high load, it's a 
good idea to set the min heap to the same value as the max heap, so that 
it doesn't need to worry about hitting limits in order to allocate 
additional memory.  It'll eventually allocate the max heap anyway.



Regarding commits: My end-users want new data to be made available quickly.
Thankfully I'm only inserting between 1 and 3 documents per second so the
change-rate isn't crazy.

Should I just slow down my commit frequency, and depend on soft-commits? If
I do this, will the commits take even longer?
Given 1000 documents, is it generally faster to do 10 commits of 100, or 1
commit of 1000?


Fewer commits is always better.  The amount of time they take isn't 
strongly affected by the number of new documents, unless there are a LOT 
of them.  Figure out the timeframe that's the maximum amount of time (in 
milliseconds) that you think people are willing to wait for new data to 
become visible.  Use that as your autoSoftCommit interval, or as the 
commitWithin parameter on your indexing requests.  Set your autoCommit 
interval to around five minutes, as described on the wiki page I 
linked.  If you are using auto settings and/or commitWithin, then you 
will never need to send an explicit commit command.  Reducing commit 
frequency is one of the first things you'll want to try.  Frequent 
commits use a *lot* of I/O and CPU resources.
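
As a rough illustration (URL, document, and the 60-second window are all made
up), commitWithin rides along on the update request itself:

curl 'http://localhost:8983/solr/collection1/update?commitWithin=60000' \
     -H 'Content-Type: application/json' \
     --data-binary '[{"id":"doc1","title":"example"}]'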


Although there are exceptions, most installs rarely NEED commits to 
happen more often than about once a minute, and longer intervals are 
often perfectly acceptable.  Even in situations where a higher frequency 
is required, 10-15 seconds is often good enough.  Getting sub-second 
commit times is *possible*, but usually requires significant hardware 
investment or changing the config in a way that is detrimental to query 
performance.


Thanks,
Shawn



Re: Solr XSLT Problems

2013-11-22 Thread Furkan KAMACI
OK, I investigated the reason. There was instability in some folders of my
test system.


2013/11/22 Furkan KAMACI 

> I use Solr 4.5.1 and run xslt examples on it. "*Whenever*" I make a
> request to
>
> q=*:*&wt=xslt&tr=example.xsl
>
> Sometimes it says me there are 0 records sometimes 30 (actually there is
> 30).
>
> I tried some other  xsls and it is same. I t does not work every time.
>
> Are there any body who had same issue?
>
> Thanks;
> Furkan KAMACI
>


Solr XSLT Problems

2013-11-22 Thread Furkan KAMACI
I use Solr 4.5.1 and run xslt examples on it. "*Whenever*" I make a request
to

q=*:*&wt=xslt&tr=example.xsl

Sometimes it tells me there are 0 records, sometimes 30 (there are actually
30).

I tried some other XSLs and it is the same; it does not work every time.

Has anybody else had the same issue?

Thanks;
Furkan KAMACI


Re: Periodic Slowness on Solr Cloud

2013-11-22 Thread Dave Seltzer
Hi Shawn,

Wow! Thank you for your considered reply!

I'm going to dig into these issues, but I have a few questions:

Regarding memory: Including duplicate data in shard replicas the entire
index is 350GB. Each server hosts a total of 44GB of data. Each server has
28GB of memory. I haven't been setting -Xmx or -Xms, in the hopes that Java
would take the memory it needs and leave the rest to the OS for cache.

Given that I'll never need to serve 200 concurrent connections in
production, do you think my servers need more memory?
Should I be tinkering with -Xmx and -Xms?

Regarding commits: My end-users want new data to be made available quickly.
Thankfully I'm only inserting between 1 and 3 documents per second so the
change-rate isn't crazy.

Should I just slow down my commit frequency, and depend on soft-commits? If
I do this, will the commits take even longer?
Given 1000 documents, is it generally faster to do 10 commits of 100, or 1
commit of 1000?

Thanks so much!

-D



On Fri, Nov 22, 2013 at 2:27 AM, Shawn Heisey  wrote:

> On 11/21/2013 6:41 PM, Dave Seltzer wrote:
> > In digging a little deeper and looking at the config I see that
> > <nrtMode>true</nrtMode> is commented out.  I believe this is the default
> > setting. So I don't know if NRT is enabled or not. Maybe just a red
> herring.
>
> I had never seen this setting before.  The default is true.  SolrCloud
> requires that it be set to true.  Looks like it's a new parameter in
> 4.5, added by SOLR-4909.  From what I can tell reading the issue,
> turning it off effectively disables soft commits.
>
> https://issues.apache.org/jira/browse/SOLR-4909
>
> You've said that you are adding about 3 documents per second, but you
> haven't said anything about how often you are doing commits.  Erick's
> question basically boils down to this:  How quickly after indexing do
> you expect the changes to be visible on a search, and how often are you
> doing commits?
>
> Generally speaking (and ignoring the fact that nrtMode now exists), NRT
> is not something you enable, it's something you try to achieve, by using
> soft commits quickly and often, and by adjusting the configuration to
> make the commits go faster.
>
> If you are trying to keep the interval between indexing and document
> visibility down to less than a few seconds (especially if it's less than
> one second), then you are trying to achieve NRT.
>
> There's a lot of information on the following wiki page about
> performance problems.  This specific link is to the last part of that
> page, which deals with slow commits:
>
> http://wiki.apache.org/solr/SolrPerformanceProblems#Slow_commits
>
> > I don't know what Garbage Collector we're using. In this test I'm running
> > Solr 4.5.1 using Jetty from the example directory.
>
> If you aren't using any tuning parameters beyond setting the max heap,
> then you are using the default parallel collector.  It's a poor choice
> for Solr unless your heap is very small.  At 6GB, yours isn't very
> small.  It's not particularly huge either, but not small.
>
> > The CPU on the 8 nodes all stay around 70% use during the test. The nodes
> > have 28GB of RAM. Java is using about 6GB and the rest is being used by
> OS
> > cache.
>
> How big is your index?  If it's larger than about 30 GB, you probably
> need more memory.  If it's much larger than about 40 GB, you definitely
> need more memory.
>
> > To perform the test we're running 200 concurrent threads in JMeter. The
> > threads hit HAProxy which loadbalances the requests among the nodes. Each
> > query is for a random word out of a list of about 10,000 words. Some of
> the
> > queries have faceting turned on.
>
> That's a pretty high query load.  If you want to get anywhere near top
> performance out of it, you'll want to have enough memory to fit your
> entire index into RAM.  You'll also need to reduce the load introduced
> by indexing.  A large part of the load from indexing comes from commits.
>
> > Because we're heavily loading the system the queries are returning quite
> > slowly. For a simple search, the average response time was 300ms. The
> peak
> > response time was 11,000ms. The spikes in latency seem to occur about
> every
> > 2.5 minutes.
>
> I would bet that you're having one or both of the following issues:
>
> 1) Garbage collection issues from one or more of the following:
>  a) Heap too small.
>  b) Using the default GC instead of CMS with tuning.
> 2) General performance issues from one or more of the following:
>  a) Not enough cache memory for your index size.
>  b) Too-frequent commits.
>  c) Commits taking a lot of time and resources due to cache warming.
>
> With a high query and index load, any problems become magnified.
>
> > I haven't spent that much time messing with SolrConfig, so most of the
> > settings are the out-of-the-box defaults.
>
> The defaults are very good for small to medium indexes and low to medium
> query load.  If you have a big index and/or high query load, you'll
> generally need to tune.

Re: How to work with remote solr savely?

2013-11-22 Thread Stavros Delisavas
Thanks for the suggestions. I will have a look at them and
try them out.



Am 22.11.2013 16:01, schrieb Hoggarth, Gil:
> You could also use one of the proxy scripts, such as
> http://code.google.com/p/solr-php-client/, which is coincidentally
> linked (eventually) from Michael's suggested SolrSecurity URL.
>
> -Original Message-
> From: michael.boom [mailto:my_sky...@yahoo.com] 
> Sent: 22 November 2013 14:53
> To: solr-user@lucene.apache.org
> Subject: Re: How to work with remote solr savely?
>
> http://wiki.apache.org/solr/SolrSecurity#Path_Based_Authentication
>
> Maybe you could achieve write/read access limitation by setting path
> based
> authentication:
> The update handler "/solr/core/update"  should be protected by
> authentication, with credentials only known to you. But then of course,
> your indexing client will need to authenticate in order to add docs to
> solr.
> Your select handler "/solr/core/select" could then be open or protected
> by http auth with credentials open to developers.
>
> That's the first idea that comes to mind - haven't tested it. 
> If you do, feedback and let us know how it went.
>
>
>
> -
> Thanks,
> Michael
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/How-to-work-with-remote-solr-savely-t
> p4102612p4102618.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>



RE: How to work with remote solr savely?

2013-11-22 Thread Hoggarth, Gil
You could also use one of the proxy scripts, such as
http://code.google.com/p/solr-php-client/, which is coincidentally
linked (eventually) from Michael's suggested SolrSecurity URL.

-Original Message-
From: michael.boom [mailto:my_sky...@yahoo.com] 
Sent: 22 November 2013 14:53
To: solr-user@lucene.apache.org
Subject: Re: How to work with remote solr savely?

http://wiki.apache.org/solr/SolrSecurity#Path_Based_Authentication

Maybe you could achieve write/read access limitation by setting path
based
authentication:
The update handler "/solr/core/update"  should be protected by
authentication, with credentials only known to you. But then of course,
your indexing client will need to authenticate in order to add docs to
solr.
Your select handler "/solr/core/select" could then be open or protected
by http auth with credentials open to developers.

That's the first idea that comes to mind - haven't tested it. 
If you do, feedback and let us know how it went.



-
Thanks,
Michael
--
View this message in context:
http://lucene.472066.n3.nabble.com/How-to-work-with-remote-solr-savely-t
p4102612p4102618.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: How to work with remote solr savely?

2013-11-22 Thread michael.boom
http://wiki.apache.org/solr/SolrSecurity#Path_Based_Authentication

Maybe you could achieve write/read access limitation by setting path based
authentication:
The update handler "/solr/core/update"  should be protected by
authentication, with credentials only known to you. But then of course, your
indexing client will need to authenticate in order to add docs to solr.
Your select handler "/solr/core/select" could then be open or protected by
http auth with credentials open to developers.

That's the first idea that comes to mind - haven't tested it. 
If you do, feedback and let us know how it went.
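
A rough sketch of what that can look like with the Jetty that ships with Solr
4.x (paths, realm name, role and core name are all made up, and the details
vary by Jetty version):

In the webapp's web.xml (or etc/webdefault.xml):

  <security-constraint>
    <web-resource-collection>
      <web-resource-name>Solr updates</web-resource-name>
      <url-pattern>/collection1/update/*</url-pattern>
    </web-resource-collection>
    <auth-constraint>
      <role-name>updater</role-name>
    </auth-constraint>
  </security-constraint>
  <login-config>
    <auth-method>BASIC</auth-method>
    <realm-name>solr-realm</realm-name>
  </login-config>

In etc/jetty.xml:

  <Call name="addBean">
    <Arg>
      <New class="org.eclipse.jetty.security.HashLoginService">
        <Set name="name">solr-realm</Set>
        <Set name="config"><SystemProperty name="jetty.home" default="."/>/etc/realm.properties</Set>
      </New>
    </Arg>
  </Call>

In etc/realm.properties:

  indexer: indexerpassword, updater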



-
Thanks,
Michael
--
View this message in context: 
http://lucene.472066.n3.nabble.com/How-to-work-with-remote-solr-savely-tp4102612p4102618.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: How to work with remote solr savely?

2013-11-22 Thread Stavros Delisavas
Thanks for your fast reply.
First of all, HTTP basic authentication unfortunately is not secure. Also,
this would give every developer full admin privileges. Anyway, can you
tell me where I can do those configurations?

Are there any alternative or more secure ways to restrict solr-access?
In general extern developers need search-query-access only. They should
not be able to write/change the documents or access solr-admin-pages.

Thank you


Am 22.11.2013 15:34, schrieb michael.boom:
> Use HTTP basic authentication, setup in your servlet container
> (jetty/tomcat).
>
> That should work fine if you are *not* using SolrCloud.
>
>
>
> -
> Thanks,
> Michael
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/How-to-work-with-remote-solr-savely-tp4102612p4102613.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>



RE: How to work with remote solr savely?

2013-11-22 Thread Hoggarth, Gil
We solved this issue outside of Solr. As you've done, restrict the
server to localhost access to Solr, add firewall rules to allow your
developers on port 80, and ProxyPass the allowed port 80 traffic to Solr.
Remember to include the ProxyPassReverse too.
(This runs on linux and apache httpd btw.)
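
Roughly, the relevant httpd bit (mod_proxy/mod_proxy_http enabled; path and
core name are illustrative):

  ProxyPass        /search http://localhost:8080/solr/collection1/select
  ProxyPassReverse /search http://localhost:8080/solr/collection1/select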

-Original Message-
From: Stavros Delisavas [mailto:stav...@delisavas.de] 
Sent: 22 November 2013 14:24
To: solr-user@lucene.apache.org
Subject: How to work with remote solr safely?

Hello Solr-Friends,
I have a question about working with Solr, which is installed on a remote
server.
I have a PHP project with a very big MySQL database of about 10 GB, and I
am also using Solr, with about 10,000,000 entries indexed for fast search
and access to the MySQL data.
I have a local copy myself so I can continue to work on the PHP project
itself, but I want to make it available to more developers too. How can
I make Solr accessible ONLY to those specific developers? For MySQL
it's no problem to add an additional MySQL user with limited access.

But for Solr it seems difficult to me. I have had my administrator
restrict the Java port 8080 to localhost only. That way no one outside
can access Solr or the Solr admin interface.
How can I allow access to the other developers without making the whole
Solr interface (port 8080) available to the public?

Thanks,

Stavros


Re: How to work with remote solr safely?

2013-11-22 Thread michael.boom
Use HTTP basic authentication, setup in your servlet container
(jetty/tomcat).

That should work fine if you are *not* using SolrCloud.



-
Thanks,
Michael
--
View this message in context: 
http://lucene.472066.n3.nabble.com/How-to-work-with-remote-solr-savely-tp4102612p4102613.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Possible parent/child query bug

2013-11-22 Thread Mikhail Khludnev
On Fri, Nov 22, 2013 at 4:28 PM, Neil Ireson wrote:

> returns all the child docs, as expected, however
>
> http://localhost:8090/solr/select?q={!child+of=doc_type:parent}
>
> returns all the parent docs.
>

Aha, I remember it. I implemented this special case for reusing the segmented
parent filter in Solr's q/fq:
https://github.com/apache/lucene-solr/blob/trunk/solr/core/src/java/org/apache/solr/search/join/BlockJoinParentQParser.java?source=cc#L55
It's a lack of documentation.
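
For anyone reading this later, the usual pattern is to pair the parent filter
with a nested query that matches only the other document type, e.g. (field
names are illustrative):

q={!parent which="doc_type:parent"}comment_text:foo   (parents whose children match)
q={!child of="doc_type:parent"}title:bar              (children of matching parents)

Passing no nested query at all, as above, hits the special case described here.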


-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics


 


How to work with remote solr safely?

2013-11-22 Thread Stavros Delisavas
Hello Solr-Friends,
I have a question about working with Solr, which is installed on a remote
server.
I have a PHP project with a very big MySQL database of about 10 GB, and I
am also using Solr, with about 10,000,000 entries indexed for fast search
and access to the MySQL data.
I have a local copy myself so I can continue to work on the PHP project
itself, but I want to make it available to more developers too. How can
I make Solr accessible ONLY to those specific developers? For MySQL
it's no problem to add an additional MySQL user with limited access.

But for Solr it seems difficult to me. I have had my administrator
restrict the Java port 8080 to localhost only. That way no one outside
can access Solr or the Solr admin interface.
How can I allow access to the other developers without making the whole
Solr interface (port 8080) available to the public?

Thanks,

Stavros


Re: Possible parent/child query bug

2013-11-22 Thread Neil Ireson
Hi Mikhail,

You are right.

If the “child of” query matches both parent and child docs, it returns the child
documents but a spurious numFound.

For the “parent which” query, if it matches both parent and child docs, it
returns a handy error message: “child query must only match non-parent docs...”




On 22 Nov 2013, at 14:03, Mikhail Khludnev  wrote:

> Neil,
> Quick hint: can you run Solr (Jetty) with -ea? My feeling is that the nested
> query (the *:* you supplied) should be orthogonal to the children; that's
> confirmed by an assert. That's true for {!parent} at least.
> 
> 
> On Fri, Nov 22, 2013 at 5:40 PM, Neil Ireson wrote:
> 
>> Some further odd behaviour. For my index
>> 
>> http://localhost:8090/solr/select?q={!child+of=doc_type:parent}*:*
>> 
>> Returns a numFound=“22984”, when there are only 2910 documents in the
>> index (748 parents, 2162 children).
>> 
>> 
>> 
>> 
>> On 22 Nov 2013, at 12:28, Neil Ireson  wrote:
>> 
>>> 
>>> Not sure if this is a bug but, for me, it was unexpected behaviour.
>>> 
>>> http://localhost:8090/solr/select?q={!child+of=doc_type:parent}*:*
>>> 
>>> returns all the child docs, as expected, however
>>> 
>>> http://localhost:8090/solr/select?q={!child+of=doc_type:parent}
>>> 
>>> returns all the parent docs.
>>> 
>>> This seems wrong to me, especially as the following query also returns
>> all the parent docs, which would make the two queries equivalent:
>>> 
>>> http://localhost:8090/solr/select?q={!parent+which=doc_type:parent}
>>> 
>>> 
>>> 
>> 
>> 
> 
> 
> -- 
> Sincerely yours
> Mikhail Khludnev
> Principal Engineer,
> Grid Dynamics
> 
> 
> 



Re: Possible parent/child query bug

2013-11-22 Thread Mikhail Khludnev
Neil,
Quick hint: can you run Solr (Jetty) with -ea? My feeling is that the nested
query (the *:* you supplied) should be orthogonal to the children; that's
confirmed by an assert. That's true for {!parent} at least.


On Fri, Nov 22, 2013 at 5:40 PM, Neil Ireson wrote:

> Some further odd behaviour. For my index
>
> http://localhost:8090/solr/select?q={!child+of=doc_type:parent}*:*
>
> Returns a numFound=“22984”, when there are only 2910 documents in the
> index (748 parents, 2162 children).
>
>
>
>
> On 22 Nov 2013, at 12:28, Neil Ireson  wrote:
>
> >
> > Note sure if this is a bug but, for me, it was unexpected behaviour.
> >
> > http://localhost:8090/solr/select?q={!child+of=doc_type:parent}*:*
> >
> > returns all the child docs, as expected, however
> >
> > http://localhost:8090/solr/select?q={!child+of=doc_type:parent}
> >
> > returns all the parent docs.
> >
> > This seems wrong to me, especially as the following query also returns
> all the parent docs, which would make the two queries equivalent:
> >
> > http://localhost:8090/solr/select?q={!parent+which=doc_type:parent}
> >
> >
> >
>
>


-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics


 


Re: Leading and trailing wildcard with phrase query and positional ordering

2013-11-22 Thread Dmitry Kan
Hi Ankur,

For the leading wildcard you may want to try the
ReversedWildcardFilterFactory:

https://cwiki.apache.org/confluence/display/solr/Filter+Descriptions#FilterDescriptions-ReversedWildcardFilter

In the code of the ComplexPhrase query parser (CPQ) there is a loop over the
filters of your text field and a specific check:

  if (factory instanceof ReversedWildcardFilterFactory) {
    allow = true;
    leadingWildcards.put(e.getKey(),
        (ReversedWildcardFilterFactory) factory);
  }
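
For reference, a rough sketch of an index-time analyzer that enables this,
roughly as in the stock example schema (field type name and attribute values
are illustrative):

<fieldType name="text_rev" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.ReversedWildcardFilterFactory" withOriginal="true"
            maxPosAsterisk="3" maxPosQuestion="2" maxFractionAsterisk="0.33"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>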

HTH,

Dmitry




On Tue, Nov 19, 2013 at 5:27 PM, GOYAL, ANKUR  wrote:

> Hi,
>
> I am using Solr 4.2.1. I have a couple of questions regarding using
> leading and trailing wildcards with phrase queries and doing positional
> ordering.
>
> *   I have a field called text which is defined as the text_general
> field. I downloaded the ComplexPhraseQuery plugin (
> https://issues.apache.org/jira/browse/SOLR-1604) and it works perfectly
> for trailing wildcards and wildcards within the phrase. However, if we use
> a leading wildcard, then it leads to an error saying that a wildcard query
> does not permit usage of a leading wildcard. So, is there any other way that
> we can use leading and trailing wildcards along with a phrase?
> *   I am using boosting (qf parameter in requestHandler in
> solrConfig.xml) to do ordering of results that are returned from Solr.
> However, the order is not correct. The fields that I am doing boosting on
> are "text_general" fields. So, is it possible that boosting does not occur
> when wildcards are used?
>
> -Ankur
>
>


-- 
Dmitry
Blog: http://dmitrykan.blogspot.com
Twitter: twitter.com/dmitrykan


Re: Possible parent/child query bug

2013-11-22 Thread Neil Ireson
Some further odd behaviour. For my index

http://localhost:8090/solr/select?q={!child+of=doc_type:parent}*:* 

Returns a numFound=“22984”, when there are only 2910 documents in the index 
(748 parents, 2162 children).




On 22 Nov 2013, at 12:28, Neil Ireson  wrote:

> 
> Not sure if this is a bug but, for me, it was unexpected behaviour.
> 
> http://localhost:8090/solr/select?q={!child+of=doc_type:parent}*:* 
> 
> returns all the child docs, as expected, however
> 
> http://localhost:8090/solr/select?q={!child+of=doc_type:parent}
> 
> returns all the parent docs. 
> 
> This seems wrong to me, especially as the following query also returns all 
> the parent docs, which would make the two queries equivalent:
> 
> http://localhost:8090/solr/select?q={!parent+which=doc_type:parent}
> 
> 
> 



Re: Solr logs encoding to UTF8

2013-11-22 Thread Erick Erickson
What are you using to view the file? It looks like whatever it is isn't
configured to handle UTF-8 properly.
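
If it isn't the viewer, one thing worth double-checking on Tomcat is that the
connector decodes request parameters as UTF-8; a typical (illustrative,
untested) server.xml connector would be:

<Connector port="8080" protocol="HTTP/1.1"
           connectionTimeout="20000"
           URIEncoding="UTF-8"/>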

Best,
Erick


On Fri, Nov 22, 2013 at 8:28 AM, Ing. Jorge Luis Betancourt Gonzalez <
jlbetanco...@uci.cu> wrote:

> Hi everybody:
>
> Is there any way of forcing an UTF-8 conversion on the queries that are
> logged into the log? I've deployed solr in tomcat7. The file appears to be
> an UTF-8 file but I'm seeing this in the logs:
>
> INFO: [] webapp=/solr path=/select
> params={fl=*,score&start=0&q=disñemos+el+mundo&hl.simple.pre=&hl.simple.post=&hl.fl=title,content,url,description,keywords&wt=json&hl=true&rows=20}
> hits=48865 status=0 QTime=155.
>
> 
> III International Winter School at UCI, 17-28 February 2014. See www.uci.cu
>


Solr logs encoding to UTF8

2013-11-22 Thread Ing. Jorge Luis Betancourt Gonzalez
Hi everybody:

Is there any way of forcing a UTF-8 conversion on the queries that are logged?
I've deployed Solr in Tomcat 7. The log file appears to be a UTF-8 file, but
I'm seeing this in the logs:

INFO: [] webapp=/solr path=/select 
params={fl=*,score&start=0&q=disñemos+el+mundo&hl.simple.pre=&hl.simple.post=&hl.fl=title,content,url,description,keywords&wt=json&hl=true&rows=20}
 hits=48865 status=0 QTime=155.

III International Winter School at UCI, 17-28 February 2014. See www.uci.cu


Re: useColdSearcher in SolrCloud config

2013-11-22 Thread Erick Erickson
bq: By the term 'block', I assume SOLR returns a non-200

Pretty sure not. The query just waits around in the queue
on the server until the searcher is done warming, then the
search is executed and the results are returned.

bq: If a new SOLR server

No. Apart from any ugly details about caches and internal
doc IDs, you'd have to pull the caches over the wire to the
new machine, and the caches could well be gigabytes in size.
This is almost certainly much slower than just firing
the warming queries locally. It seems like far too complex a piece of
functionality to put in place just to save the effort of specifying
warmup queries, which you have to do anyway since you have to be
ready to restart your cluster.
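
For reference, a minimal warming-listener sketch for solrconfig.xml (the
queries themselves are illustrative and should mirror your real traffic):

<listener event="firstSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <lst><str name="q">*:*</str><str name="sort">popularity desc</str></lst>
  </arr>
</listener>
<listener event="newSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <lst><str name="q">*:*</str></lst>
  </arr>
</listener>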

Best,
Erick



On Fri, Nov 22, 2013 at 4:44 AM, ade-b  wrote:

> Hi
>
> The definition of useColdSearcher config element in solrconfig.xml is
>
> "If a search request comes in and there is no current registered searcher,
> then immediately register the still warming searcher and use it.  If
> "false"
> then all requests will block until the first searcher is done warming".
>
> By the term 'block', I assume SOLR returns a non 200 response to requests.
> Does anybody know the exact response code returned when the server is
> blocking requests?
>
> If a new SOLR server is introduced into an existing array of SOLR servers
> (in SOLR Cloud setup), it will sync it's index from the leader. To save you
> having to specify warm-up queries in the solrconfig.xml file for first
> searchers, would/could the new server not auto warm it's caches from the
> caches of an existing server?
>
> Thanks
> Ade
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/useColdSearcher-in-SolrCloud-config-tp4102569.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: SolrCloud unstable

2013-11-22 Thread Martin de Vries
 

We did some more monitoring and have some new information:

Before the issue happens the garbage collector's "collection count" increases
a lot. The increase seems to start about an hour before the real problem
occurs:

http://www.analyticsforapplications.com/GC.png [1]

We tried both the G1 garbage collector and the regular one; the problem
happens with both of them.

We use Java 1.6 on some servers. Will Java 1.7 be better?

Martin

Martin de Vries wrote on 12.11.2013 10:45:

> Hi,
>
> We have:
>
> Solr 4.5.1 - 5 servers
> 36 cores, 2 shards each, 2 servers per shard (every core is on 4 servers)
> about 4.5 GB total data on disk per server
> 4 GB JVM memory per server, 3 GB average in use
> Zookeeper 3.3.5 - 3 servers (one shared with Solr)
> haproxy load balancing
>
> Our SolrCloud is very unstable. About once a week some cores go into
> recovery state or down state. Many timeouts occur and we have to restart
> servers to get them back to work. The failover doesn't work in many
> cases, because one server has the core in down state, the other in
> recovering state. Other cores work fine. When the cloud is stable I
> sometimes see log messages like:
> - shard update error StdNode:
> http://033.downnotifier.com:8983/solr/dntest_shard2_replica1/:org.apache.solr.client.solrj.SolrServerException:
> IOException occured when talking to server at:
> http://033.downnotifier.com:8983/solr/dntest_shard2_replica1
> - forwarding update to
> http://033.downnotifier.com:8983/solr/dn_shard2_replica2/ failed -
> retrying ...
> - null:ClientAbortException: java.io.IOException: Broken pipe
>
> Before the cloud problems start there are many large QTimes in the
> log (sometimes over 50 seconds), but there are no other errors until the
> recovery problems start.
>
> Any clue about what can be wrong?
>
> Kind regards,
>
> Martin

Links:
--
[1] http://www.analyticsforapplications.com/GC.png


Re: Solrcloud: external fields and frequent commits

2013-11-22 Thread Erick Erickson
1> I'm not quite sure I understand. External File Fields are keyed
by the unique id of the doc. So every shard _must_ have the
EFF available for at least the documents in that shard. At first glance
this doesn't look simple. Perhaps a bit more explanation of what
you're using EFF for?
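
For completeness, the usual EFF setup looks roughly like the sketch below, with
the file placed in each core's data directory; attribute values are from
memory, so double-check them against your Solr version:

<fieldType name="popularityFile" keyField="id" defVal="0" stored="false"
           indexed="false" class="solr.ExternalFileField" valType="pfloat"/>
<field name="popularity" type="popularityFile"/>

The values live in a file named external_popularity (or external_popularity.*)
in the data directory of every core hosting a shard of the collection. If I
remember right, since 4.1 you can also register
org.apache.solr.schema.ExternalFileFieldReloader as a firstSearcher/newSearcher
listener so the file is reloaded when a new searcher opens.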

2> Let's be sure we're talking about the same thing here. In Solr,
a "commit" is the command that makes documents visible, often
controlled by the autoCommit and autoSoftCommit settings in
solrconfig.xml. You will not be able to issue 100 commits/second.

If you're using "commit" to mean adding a document to the index,
then 100/s should be no problem. I regularly see many times that
ingestion rate. The documents won't be visible to search until
you do a commit however.
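
A common pattern for high update rates is frequent soft commits for visibility
plus infrequent hard commits for durability, e.g. in solrconfig.xml (the values
are illustrative; tune them to your needs):

<autoCommit>
  <maxTime>60000</maxTime>
  <openSearcher>false</openSearcher>
</autoCommit>
<autoSoftCommit>
  <maxTime>1000</maxTime>
</autoSoftCommit>

With this in place the client just adds documents and never issues explicit
commits.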

Best
Erick


On Fri, Nov 22, 2013 at 4:44 AM, Flavio Pompermaier wrote:

> Hi to all,
> we're migrating from solr 3.x to solr 4.x to use Solrcloud and I have two
> big doubts:
>
> 1) External fields. When I compute such a file do I have to copy it in the
>  data directory of shards..? The external fields boosts the results of the
> query to a specific collection, for me it doesn't make sense to put it in
> all shard's data dir, it should be something related to the collection
> itself.
> Am I wrong or missing something? Is there a simple way to upload the
> popularity file (for the external field) at one in all shards?
>
> 2) My index requires frequently commits (i.e. sometimes up to 100/s). How
> do I have to manage this? Do I have to use soft commits..? Any simple
> configuration/code snippet to use them? Is it true that external fields
> affect performance on commit?
>
> Best,
> Flavio
>


Re: Few Clarification on Apache Solr front

2013-11-22 Thread Erick Erickson
1> Indexing only some of the content from a node: well, you build
the ingestion pipeline, so it's up to the code you write.

2> It's all about analysis. When you build your schema,
you determine how you need to treat your data and how
you're searching on it, and build the analysis chain
for each field accordingly. See:
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters
as a place to start.

3> Stop words are built into Solr; you have to provide the
list yourself, however. I have no idea what you mean by "list of nouns" -
what do you want to do with that list?
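
For 3>, a stripped-down analysis chain that wires in a stop word list might
look like this in schema.xml (sketch only; the file name and type name are
illustrative):

<fieldType name="text_stop" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>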

Best,
Erick


On Fri, Nov 22, 2013 at 4:09 AM, topgun wrote:

> We are planning to migrate a website from its proprietary CMS to Drupal. As
> they have been using a 3rd party enterprise search service(Endeca), we have
> proposed Apache-solr as replacement. We are in the process of proof of
> concept with respect to Apache-Solr. We would like to understand certain
> aspects with respect to Apache-solr,
>
> * In Apache-Solr, just want to understand whether it it possible to index
> only few content from a node.
> * Do we have phonetic mismatch and typographical error and misplaced
> wordbreaks or punctuation detection ?
> * Possibility of having stop word configuration and list of nouns.
>
> Thanks so much in Advance.
>
> Warm Regards,
> Saravanan
>
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Few-Clarification-on-Apache-Solr-front-tp4102566.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: a function query of time, frequency and score.

2013-11-22 Thread Erick Erickson
Not quite sure what you're asking. The field() function query brings the
value of a field into the score, something like:
http://localhost:8983/solr/select?wt=json&fl=id%20score&q={!boost%20b=field(popularity)}ipod
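
If the goal is to fold both recency and the frequency field into the score, one
hedged sketch (field names taken from the original post, constants illustrative)
would be:

q={!boost b=product(recip(ms(NOW,ptime),3.16e-11,1,1),field(frequency))}title:foo

Note that a frequency of 0 would zero out the boost, so wrapping it in something
like sum(1,field(frequency)) may be safer.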

Best,
Erick


On Thu, Nov 21, 2013 at 10:43 PM, sling  wrote:

> Hi, guys.
>
> I indexed 1000 documents, which have fields like title, ptime and
> frequency.
>
> The title is a text field, the ptime is a date field, and the frequency is an
> int field.
> The frequency field goes up and down; sometimes its value is 0, and
> sometimes its value is 999.
>
> Now, in my app, the query works well with the function query. The function
> query is implemented as the score multiplied by a decreasing date-weight
> array.
>
> However, I have no idea how to add the frequency to this formula...
>
> so could someone give me a clue?
>
> Thanks again!
>
> sling
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/a-function-query-of-time-frequency-and-score-tp4102531.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: Split shard and stream sub-shards to remote nodes?

2013-11-22 Thread Shalin Shekhar Mangar
The splitting process is nothing but the creation of a bitset with
which a LiveDocsReader is created. These readers are then added to a
new index via the IW.addIndexes(IndexReader[] readers) method. All this
is performed below the IR/IW API and no documents are actually ever
read or written directly by Solr. This is why it isn't feasible to
stream docs to a remote node.

On Fri, Nov 22, 2013 at 5:59 AM, Otis Gospodnetic
 wrote:
> Hi,
>
> On Wed, Nov 20, 2013 at 12:53 PM, Shalin Shekhar Mangar <
> shalinman...@gmail.com> wrote:
>
>> At the Lucene level, I think it would require a directory
>> implementation which writes to a remote node directly. Otherwise, on
>> the solr side, we must move the leader itself to another node which
>> has enough disk space and then split the index.
>>
>
> Hm what about taking the source shard, splitting it, and sending docs
> that come out of each sub-shard to a remote node at Solr level, as if
> these documents are just being added (i.e. nothing at Lucene level)?
>
> Otis
> --
> Performance Monitoring * Log Analytics * Search Analytics
> Solr & Elasticsearch Support * http://sematext.com/
>
>
>
>
>>
>> On Wed, Nov 20, 2013 at 8:37 PM, Otis Gospodnetic
>>  wrote:
>> > Do you think this is something that is actually implementable?  If so,
>> > I'll open an issue.
>> >
>> > One use-case where this may come in handy is when the disk space is
>> > tight.  If a shard is using > 50% of the disk space on some node X,
>> > you can't really split that shard because the 2 new sub-shards will
>> > not fit on the local disk.  Or is there some trick one could use in
>> > this situation?
>> >
>> > Thanks,
>> > Otis
>> > --
>> > Performance Monitoring * Log Analytics * Search Analytics
>> > Solr & Elasticsearch Support * http://sematext.com/
>> >
>> >
>> > On Wed, Nov 20, 2013 at 6:48 AM, Shalin Shekhar Mangar
>> >  wrote:
>> >> No, it is not supported yet. We can't split to a remote node directly.
>> >> The best bet is trigger a new leader election by unloading the leader
>> >> node once all replicas are active.
>> >>
>> >> On Wed, Nov 20, 2013 at 1:32 AM, Otis Gospodnetic
>> >>  wrote:
>> >>> Hi,
>> >>>
>> >>> Is it possible to perform a shard split and stream data for the
>> >>> new/sub-shards to remote nodes, avoiding persistence of new/sub-shards
>> >>> on the local/source node first?
>> >>>
>> >>> Thanks,
>> >>> Otis
>> >>> --
>> >>> Performance Monitoring * Log Analytics * Search Analytics
>> >>> Solr & Elasticsearch Support * http://sematext.com/
>> >>
>> >>
>> >>
>> >> --
>> >> Regards,
>> >> Shalin Shekhar Mangar.
>>
>>
>>
>> --
>> Regards,
>> Shalin Shekhar Mangar.
>>



-- 
Regards,
Shalin Shekhar Mangar.


Possible parent/child query bug

2013-11-22 Thread Neil Ireson

Not sure if this is a bug but, for me, it was unexpected behaviour.

http://localhost:8090/solr/select?q={!child+of=doc_type:parent}*:* 

returns all the child docs, as expected, however

http://localhost:8090/solr/select?q={!child+of=doc_type:parent}

returns all the parent docs. 

This seems wrong to me, especially as the following query also returns all the 
parent docs, which would make the two queries equivalent:

http://localhost:8090/solr/select?q={!parent+which=doc_type:parent}





Saravanan Chinnadurai/Actionimages is out of the office.

2013-11-22 Thread Saravanan . Chinnadurai
I will be out of the office starting  17/11/2013 and will not return until
01/12/2013.

Please email to itsta...@actionimages.com  for any urgent issues.




NullPointerException

2013-11-22 Thread Adrien RUFFIE
Hello all,

I have performed a full indexing run with Solr, but when I try to perform an
incremental (delta) import I get the following exception (see attachment).

Does anyone have an idea of the problem?

Many thanks
23 oct. 2013 08:34:40 org.apache.solr.handler.dataimport.DataImporter 
doDeltaImport
INFO: Starting Delta Import
23 oct. 2013 08:34:40 org.apache.solr.core.SolrCore execute
INFO: [knowledgebase] webapp=null path=/dataimport 
params={qt=%2Fdataimport&command=delta-import&core=knowledgebase} status=0 
QTime=0 
23 oct. 2013 08:34:40 org.apache.solr.handler.dataimport.SolrWriter 
readIndexerProperties
INFO: Read dataimport.properties
23 oct. 2013 08:34:40 org.apache.solr.handler.dataimport.DocBuilder doDelta
INFO: Starting delta collection.
23 oct. 2013 08:34:40 org.apache.solr.handler.dataimport.DocBuilder collectDelta
INFO: Running ModifiedRowKey() for Entity: Tag
23 oct. 2013 08:34:40 org.apache.solr.handler.dataimport.DocBuilder collectDelta
INFO: Completed ModifiedRowKey for Entity: Tag rows obtained : 0
23 oct. 2013 08:34:40 org.apache.solr.handler.dataimport.DocBuilder collectDelta
INFO: Completed DeletedRowKey for Entity: Tag rows obtained : 0
23 oct. 2013 08:34:40 org.apache.solr.handler.dataimport.DocBuilder collectDelta
INFO: Completed parentDeltaQuery for Entity: Tag
23 oct. 2013 08:34:40 org.apache.solr.handler.dataimport.DocBuilder collectDelta
INFO: Running ModifiedRowKey() for Entity: MvaFAHTagCode
23 oct. 2013 08:34:40 org.apache.solr.handler.dataimport.DocBuilder collectDelta
INFO: Completed ModifiedRowKey for Entity: MvaFAHTagCode rows obtained : 0
23 oct. 2013 08:34:40 org.apache.solr.handler.dataimport.DocBuilder collectDelta
INFO: Completed DeletedRowKey for Entity: MvaFAHTagCode rows obtained : 0
23 oct. 2013 08:34:40 org.apache.solr.handler.dataimport.DocBuilder collectDelta
INFO: Completed parentDeltaQuery for Entity: MvaFAHTagCode
23 oct. 2013 08:34:40 org.apache.solr.handler.dataimport.DocBuilder collectDelta
INFO: Running ModifiedRowKey() for Entity: Attachment
23 oct. 2013 08:34:40 org.apache.solr.handler.dataimport.DocBuilder collectDelta
INFO: Completed ModifiedRowKey for Entity: Attachment rows obtained : 0
23 oct. 2013 08:34:40 org.apache.solr.handler.dataimport.DocBuilder collectDelta
INFO: Completed DeletedRowKey for Entity: Attachment rows obtained : 0
23 oct. 2013 08:34:40 org.apache.solr.handler.dataimport.DocBuilder collectDelta
INFO: Completed parentDeltaQuery for Entity: Attachment
23 oct. 2013 08:34:40 org.apache.solr.handler.dataimport.DocBuilder collectDelta
INFO: Running ModifiedRowKey() for Entity: MvaFahAttID
23 oct. 2013 08:34:40 org.apache.solr.handler.dataimport.DocBuilder collectDelta
INFO: Completed ModifiedRowKey for Entity: MvaFahAttID rows obtained : 0
23 oct. 2013 08:34:40 org.apache.solr.handler.dataimport.DocBuilder collectDelta
INFO: Completed DeletedRowKey for Entity: MvaFahAttID rows obtained : 0
23 oct. 2013 08:34:40 org.apache.solr.handler.dataimport.DocBuilder collectDelta
INFO: Completed parentDeltaQuery for Entity: MvaFahAttID
23 oct. 2013 08:34:40 org.apache.solr.handler.dataimport.DocBuilder collectDelta
INFO: Running ModifiedRowKey() for Entity: RefFaqLngID
23 oct. 2013 08:34:40 org.apache.solr.handler.dataimport.DocBuilder collectDelta
INFO: Completed ModifiedRowKey for Entity: RefFaqLngID rows obtained : 0
23 oct. 2013 08:34:40 org.apache.solr.handler.dataimport.DocBuilder collectDelta
INFO: Completed DeletedRowKey for Entity: RefFaqLngID rows obtained : 0
23 oct. 2013 08:34:40 org.apache.solr.handler.dataimport.DocBuilder collectDelta
INFO: Completed parentDeltaQuery for Entity: RefFaqLngID
23 oct. 2013 08:34:40 org.apache.solr.handler.dataimport.DocBuilder collectDelta
INFO: Running ModifiedRowKey() for Entity: FAQ
23 oct. 2013 08:34:40 org.apache.solr.handler.dataimport.DataImporter 
doDeltaImport
GRAVE: Delta Import Failed
java.lang.NullPointerException
at 
org.apache.solr.handler.dataimport.EvaluatorBag$4.evaluate(EvaluatorBag.java:146)
at 
org.apache.solr.handler.dataimport.EvaluatorBag$5.get(EvaluatorBag.java:222)
at 
org.apache.solr.handler.dataimport.EvaluatorBag$5.get(EvaluatorBag.java:209)
at 
org.apache.solr.handler.dataimport.VariableResolverImpl.resolve(VariableResolverImpl.java:118)
at 
org.apache.solr.handler.dataimport.TemplateString.fillTokens(TemplateString.java:81)
at 
org.apache.solr.handler.dataimport.TemplateString.replaceTokens(TemplateString.java:75)
at 
org.apache.solr.handler.dataimport.VariableResolverImpl.replaceTokens(VariableResolverImpl.java:96)
at 
org.apache.solr.handler.dataimport.ContextImpl.replaceTokens(ContextImpl.java:256)
at 
org.apache.solr.handler.dataimport.SqlEntityProcessor.nextModifiedRowKey(SqlEntityProcessor.java:84)
at 
org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextModifiedRowKey(EntityProcessorWrapper.java:262)
at 
org.apache.solr.handler.dataimport.DocBuilder.collectDelta(DocBuild

Solrcloud: external fields and frequent commits

2013-11-22 Thread Flavio Pompermaier
Hi to all,
we're migrating from solr 3.x to solr 4.x to use Solrcloud and I have two
big doubts:

1) External fields. When I compute such a file, do I have to copy it into the
data directory of every shard? The external field boosts the results of
queries to a specific collection; for me it doesn't make sense to put it in
every shard's data dir, it should be something related to the collection
itself.
Am I wrong or missing something? Is there a simple way to upload the
popularity file (for the external field) at once to all shards?

2) My index requires frequent commits (sometimes up to 100/s). How
do I have to manage this? Do I have to use soft commits? Any simple
configuration/code snippet to use them? Is it true that external fields
affect performance on commit?

Best,
Flavio


useColdSearcher in SolrCloud config

2013-11-22 Thread ade-b
Hi

The definition of useColdSearcher config element in solrconfig.xml is

"If a search request comes in and there is no current registered searcher,
then immediately register the still warming searcher and use it.  If "false"
then all requests will block until the first searcher is done warming".

By the term 'block', I assume SOLR returns a non-200 response to requests.
Does anybody know the exact response code returned when the server is
blocking requests?

If a new SOLR server is introduced into an existing array of SOLR servers
(in a SolrCloud setup), it will sync its index from the leader. To save you
having to specify warm-up queries in the solrconfig.xml file for first
searchers, would/could the new server not auto-warm its caches from the
caches of an existing server?

Thanks
Ade 



--
View this message in context: 
http://lucene.472066.n3.nabble.com/useColdSearcher-in-SolrCloud-config-tp4102569.html
Sent from the Solr - User mailing list archive at Nabble.com.


Few Clarification on Apache Solr front

2013-11-22 Thread topgun
We are planning to migrate a website from its proprietary CMS to Drupal. As
they have been using a 3rd-party enterprise search service (Endeca), we have
proposed Apache Solr as a replacement. We are in the process of a proof of
concept with Apache Solr and would like to understand certain aspects of it:

* In Apache Solr, is it possible to index only some of the content from a node?
* Does it handle phonetic mismatches, typographical errors, and misplaced
word breaks or punctuation?
* Is it possible to have a stop word configuration and a list of nouns?

Thanks so much in Advance.

Warm Regards,
Saravanan




--
View this message in context: 
http://lucene.472066.n3.nabble.com/Few-Clarification-on-Apache-Solr-front-tp4102566.html
Sent from the Solr - User mailing list archive at Nabble.com.


RE: Need help on Joining and sorting syntax and limitations between multiple documents in solr-4.4.0

2013-11-22 Thread Sukanta Dey
Hi Team,

I am attaching all the required files we are using to get the VJOIN 
functionality along with the actual requirement statement.
Hope this will help you better understand the requirement for the VJOIN
functionality.

Thanks,
Sukanta

From: Sukanta Dey
Sent: Wednesday, September 04, 2013 1:50 PM
To: 'solr-user@lucene.apache.org'
Cc: Sukanta Dey
Subject: Need help on Joining and sorting syntax and limitations between 
multiple documents in solr-4.4.0

Hi Team,

In my project I am going to use Apache Solr 4.4.0 for searching. While
doing that I need to join multiple Solr documents within the same core
on one of the fields common across the documents.
I can successfully join the documents using the Solr 4.4.0 join syntax and it
returns the expected result, but my next requirement is to sort the returned
result on the basis of fields from the documents involved in the join
condition's "from" clause, which I was not able to do. Let me explain the
problem in detail along with the files I am using ...


1)  Files being used :

a.   Picklist_1.xml

--



t1324838

7

956

130712901

Draft

Draoft





b.  Picklist_2.xml

---



t1324837

7

87749

130712901

New

Neuo





c.   AssetID_1.xml

---



t1324837

a180894808

1

true

2013-09-02T09:28:18Z

130713716

130712901





d.  AssetID_2.xml





 t1324838

 a171658357

1

130713716

2283961

2290309

7

7

13503796
15485964

38052

41133

130712901





2)  Requirement:

i.   We need a join between the files, using the "def14227_picklist" field
     from AssetID_1.xml and AssetID_2.xml and the "describedObjectId" field
     from Picklist_1.xml and Picklist_2.xml.

ii.  After joining we need all the fields from the AssetID_*.xml files and
     the "en" and "gr" fields from the Picklist_*.xml files.

iii. While joining we also need to sort the result based on the "en" field
     value.

3)  I was trying the "q={!join from=inner_id to=outer_id}zzz:vvv" syntax,
but with no luck.

Any help/suggestion would be appreciated.

Thanks,
Sukanta Dey