Delete Documents

2015-07-17 Thread naga sharathrayapati
Hey,

I understand that "DocExpirationUpdateProcessorFactory" can be specified in
solrconfig.xml to delete documents once they expire.

I would like to understand whether there is any chance of these deleted
documents getting re-indexed?

Solr 5.2

Thanks
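For context, the factory is wired into an update request processor chain in solrconfig.xml. A minimal sketch, with illustrative chain and field names (not taken from this thread):

```xml
<!-- Sketch only: the chain name and field name are illustrative. -->
<updateRequestProcessorChain name="expire-docs" default="true">
  <processor class="solr.processor.DocExpirationUpdateProcessorFactory">
    <!-- documents whose value in this field is in the past get deleted -->
    <str name="expirationFieldName">expire_at_dt</str>
    <!-- how often the background deletion thread runs -->
    <long name="autoDeletePeriodSeconds">300</long>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>
```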


Re: solr blocking and client timeout issue

2015-07-17 Thread Jeremy Ashcraft
I turned on GC logging and verified that it's definitely being caused by 
a GC pause.  I tried the tuning options from the article and got this 
warning:


 OpenJDK 64-Bit Server VM warning: Failed to reserve shared memory 
(errno = 1).


any recommendations on how to get rid of that warning, and should I be 
worried about it?






Re: solr blocking and client timeout issue

2015-07-17 Thread Jeremy Ashcraft
Thanks! I will definitely try some of those and let you know how it 
turns out.







Re: solr blocking and client timeout issue

2015-07-17 Thread Shawn Heisey
On 7/17/2015 4:39 PM, Jeremy Ashcraft wrote:
> Solr 4.4.0
> 
> /usr/bin/java -version
> java version "1.7.0_25"
> OpenJDK Runtime Environment (rhel-2.3.10.4.el6_4-x86_64)
> OpenJDK 64-Bit Server VM (build 23.7-b01, mixed mode)
> 
> CentOS6
> 
> /usr/bin/java -Xms1G -Xmx4G -Dsolr.solr.home=/opt/solr/solr 
> -Djetty.logs=/opt/solr/logs -Djetty.home=/opt/solr -Djava.io.tmpdir=/tmp -jar 
> /opt/solr/start.jar --daemon
> 
> 8GB total system memory
> approx 7.3M docs, 26 segments, 2.34GB in size
> one instance, no replication

You do have enough available memory compared to your index size that you
could bump your heap to 5GB,  which might alleviate what appears to be
*extreme* memory pressure.

With a 4GB or 5GB heap, I would *definitely* do some GC tuning.  I would
also strongly recommend upgrading your Java version to at least the
latest Java 7.  This is what I use for getting the latest Java 7 version
onto CentOS 6.  It is an excellent package that puts relevant versions
of all the important commandline tools into your path:

http://www.city-fan.org/tips/OracleJava7OnFedora

There aren't really any good Java 8 packages for CentOS 6, so you may be
stuck with Java 7, but if you *can* go to the latest Java 8, there are
rumored to be some big performance gains due to better memory management.

For GC tuning, I have a page in the Solr wiki with tuning options that
have been working very well for me:

https://wiki.apache.org/solr/ShawnHeisey

If you want to be conservative, skip the G1 tuning and go for the CMS
tuning options.

We also have a page dedicated to performance problems with Solr, which
can, as I've described, be related to Java heap issues.  One of the
things on that page is a section dedicated to reducing your heap
requirements, which you can find in the Table Of Contents near the top:

https://wiki.apache.org/solr/SolrPerformanceProblems

If you have some time, reading that entire page is a good idea.

Thanks,
Shawn
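To make the shape of this concrete, here is a sketch of CMS-style GC flags of the kind being discussed, applied to the start command quoted above. The specific flag values are illustrative examples, not necessarily the exact settings on the wiki page:

```shell
# Illustrative CMS GC tuning flags; values are examples only, not the
# exact settings from Shawn's wiki page.
GC_TUNE="-XX:+UseConcMarkSweepGC \
 -XX:CMSInitiatingOccupancyFraction=70 \
 -XX:+UseCMSInitiatingOccupancyOnly \
 -XX:+CMSParallelRemarkEnabled \
 -XX:+ParallelRefProcEnabled"

# The existing start command would then look something like this
# (echoed here rather than executed, so the sketch is self-contained):
echo /usr/bin/java -Xms5G -Xmx5G $GC_TUNE \
  -Dsolr.solr.home=/opt/solr/solr -jar /opt/solr/start.jar --daemon
```

Note that with explicit GC tuning it is common to set -Xms equal to -Xmx so the heap does not resize under load.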



Re: Programmatically find out if node is overseer

2015-07-17 Thread Chris Hostetter

: Hello - i need to run a thread on a single instance of a cloud so need 
: to find out if current node is the overseer. I know we can already 
: programmatically find out if this replica is the leader of a shard via 
: isLeader(). I have looked everywhere but i cannot find an isOverseer. I 

At one point, I worked up a utility method to give internal plugins 
access to an "isOverseer()" type utility method...

   https://issues.apache.org/jira/browse/SOLR-5823

...but ultimately I abandoned this because I was completely forgetting 
(until much, much too late) that there's really no reason to assume that 
any/all collections will have a single shard on the same node as the 
overseer -- so having a plugin that only does stuff if it's running on the 
overseer node is a really bad idea, because it might not run at all (even 
if it's configured in every collection).


What I ultimately wound up doing (see SOLR-5795) is implementing a 
solution where every core (of each collection configured to want this 
functionality) has a thread running (a TimedExecutor) which would do 
nothing unless...
 * my slice is active? (ie: not in the process of being shut down)
 * my slice is 'first' in a sorted list of slices?
 * I am currently the leader of my slice?

...that way when the timer goes off every X minutes, at *most* one thread 
fires (we might sporadically get no events triggered if/when there is a 
leader election in progress for the slice that matters)

The choice of the "first" slice name alphabetically is purely because it's 
something cheap to compute and guaranteed to be unique.


If you truly want exactly one thread for the entire cluster, regardless of 
collection, you could apply the same basic idea by also adding a "my 
collection is 'first' in a sorted list of collection names?" check.
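Those three checks can be sketched in plain Java. The class name, method signature, and inputs below are illustrative; in a real plugin they would come from ClusterState rather than being passed in:

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;

// Sketch of the gate described above: the timed task does work only on the
// core that is in an active slice, is the leader of that slice, and whose
// slice name sorts first. All names and inputs here are illustrative.
public class ClusterSingletonGate {

    static boolean shouldRun(Collection<String> sliceNames, String mySlice,
                             boolean sliceActive, boolean amLeader) {
        // Alphabetically-first slice name: cheap to compute, guaranteed unique.
        String first = Collections.min(sliceNames);
        return sliceActive && amLeader && mySlice.equals(first);
    }

    public static void main(String[] args) {
        Collection<String> slices = Arrays.asList("shard2", "shard1", "shard3");
        // Only the leader of the alphabetically-first active slice fires.
        System.out.println(shouldRun(slices, "shard1", true, true));  // true
        System.out.println(shouldRun(slices, "shard2", true, true));  // false
    }
}
```

Because every live core evaluates the same deterministic predicate independently, at most one core's timer task ever does work, with no extra coordination through ZooKeeper.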



-Hoss
http://www.lucidworks.com/


Re: Programmatically find out if node is overseer

2015-07-17 Thread Vincenzo D'Amore
Hi, in SolrJ I did this to find the leader URLs:

CloudSolrServer solr = (CloudSolrServer) Configurazione.getSolrServer();
String collection = "collection1";

Map<String, String> collectionAliases =
    solr.getZkStateReader().getAliases().getCollectionAliasMap();
if (collectionAliases.containsKey(collection)) {
    corename = collectionAliases.get(collection);
    for (Slice slice :
            solr.getZkStateReader().getClusterState().getSlices(corename)) {
        Replica sliceLeader = slice.getLeader();
        log.info("backing up on {} {}", sliceLeader.getNodeName(),
                sliceLeader.getProperties().get("core"));
        log.info(String.format("%s/%s/", sliceLeader.get("base_url"),
                sliceLeader.getProperties().get("core")));
    }
}


On Fri, Jul 17, 2015 at 9:37 PM, Shalin Shekhar Mangar <
shalinman...@gmail.com> wrote:

> On Sat, Jul 18, 2015 at 1:00 AM, Shai Erera  wrote:
> >>
> >> Also, ideally, there shouldn't be a point where you have multiple active
> >> Overseers in a single cluster.
> >>
> >
> > In the reference guide, CLUSTERSTATUS shows as if the overseer role can
> > return more than one node. Does it mean that these nodes were designated
> > potential 'overseers', but OVERSEERSTATUS' will return the actual one?
>
> Yes, the OVERSEERSTATUS will return the current (actual) leader always.
>
> >
> > Shai
> >
> > On Fri, Jul 17, 2015 at 10:11 PM, Shalin Shekhar Mangar <
> > shalinman...@gmail.com> wrote:
> >
> >> Just for the record, SOLR-5859 only affects people who used the
> >> overseer roles feature released in 4.7 and no one else. This was fixed
> >> in 4.8
> >>
> >> On Sat, Jul 18, 2015 at 12:18 AM, Anshum Gupta 
> >> wrote:
> >> > It shouldn't happen unless you're using an older version of Solr (<
> 4.8)
> >> in
> >> > which case, you might end up hitting SOLR-5859
> >> > .
> >> >
> >> > On Fri, Jul 17, 2015 at 11:29 AM,  wrote:
> >> >
> >> >> Hi Anshum what do you mean by:
> >> >> >ideally, there shouldn't be a point where you have multiple active
> >> >> Overseers in a single cluster
> >> >>
> >> >> How can multiple Overseers happen? And what are the consequences?
> >> >>
> >> >> Regards
> >> >>
> >> >> > On 17 Jul 2015, at 19:37, Anshum Gupta 
> >> wrote:
> >> >> >
> >> >> > ideally, there shouldn't be a point where you have multiple active
> >> >> > Overseers in a single cluster
> >> >>
> >> >
> >> >
> >> >
> >> > --
> >> > Anshum Gupta
> >>
> >>
> >>
> >> --
> >> Regards,
> >> Shalin Shekhar Mangar.
> >>
>
>
>
> --
> Regards,
> Shalin Shekhar Mangar.
>



-- 
Vincenzo D'Amore
email: v.dam...@gmail.com
skype: free.dev
mobile: +39 349 8513251


Re: solr blocking and client timeout issue

2015-07-17 Thread Jeremy Ashcraft
Solr 4.4.0

/usr/bin/java -version
java version "1.7.0_25"
OpenJDK Runtime Environment (rhel-2.3.10.4.el6_4-x86_64)
OpenJDK 64-Bit Server VM (build 23.7-b01, mixed mode)

CentOS6

/usr/bin/java -Xms1G -Xmx4G -Dsolr.solr.home=/opt/solr/solr 
-Djetty.logs=/opt/solr/logs -Djetty.home=/opt/solr -Djava.io.tmpdir=/tmp -jar 
/opt/solr/start.jar --daemon

8GB total system memory
approx 7.3M docs, 26 segments, 2.34GB in size
one instance, no replication

no GC options that I'm aware of


From: Shawn Heisey 
Sent: Friday, July 17, 2015 3:24 PM
To: solr-user@lucene.apache.org
Subject: Re: solr blocking and client timeout issue



Since there's basically no information about your setup here, my best
guess is that you're seeing huge garbage collection pauses.

There are two problems that can lead to long GC pauses.  One is a super
large heap with no GC tuning, the other is a heap that's just a little
bit too small, a problem that would be made a lot worse if there's no GC
tuning.

Seeing *minutes* of no apparent activity suggests the latter problem.

Some generic info that we'll need:  What versions of Solr and Java are
you running?  What operating system is it on, and is that OS 64-bit?  If
the OS is 64-bit, is Java 64-bit?  How much total memory is in the
machine that's running Solr?  If it's a virtual machine, I'm after the
total VM memory size.

Some very specific information that we'll need:  What is the max heap
you've allocated to Java?  How many documents are in all your indexes on
that one machine, and how much disk space do they take up?  If you have
multiple index replicas on the same machine, please count all of them,
don't summarize.

If you happen to know, what GC tuning options are being provided to the
container that's running Solr?

Thanks,
Shawn



Re: solr blocking and client timeout issue

2015-07-17 Thread Shawn Heisey
On 7/17/2015 4:02 PM, Jeremy Ashcraft wrote:
> We're having an issue where Solr becomes unresponsive for unknown reasons. 
> Client requests time out for minutes at a time (sometimes only some
> requests time out while others work fine).  The logs don't reveal any
> clues, other than just a big gap



> you'll notice there is a 7-minute gap between the 3rd and 4th lines
> there.  The only exceptions that show up are a few EofExceptions/broken
> pipes, but my assumption is that they are end users prematurely stopping
> their requests.  Updates happen periodically throughout the day with
> soft commits.  Hard commits are configured to run every 15sec, and we
> only optimize once at night.  Disk IO and memory usage are normal during
> these hiccups; the only thing abnormal is a load avg of about 3 (where
> 1.5 is the normal load)

Since there's basically no information about your setup here, my best
guess is that you're seeing huge garbage collection pauses.

There are two problems that can lead to long GC pauses.  One is a super
large heap with no GC tuning, the other is a heap that's just a little
bit too small, a problem that would be made a lot worse if there's no GC
tuning.

Seeing *minutes* of no apparent activity suggests the latter problem.

Some generic info that we'll need:  What versions of Solr and Java are
you running?  What operating system is it on, and is that OS 64-bit?  If
the OS is 64-bit, is Java 64-bit?  How much total memory is in the
machine that's running Solr?  If it's a virtual machine, I'm after the
total VM memory size.

Some very specific information that we'll need:  What is the max heap
you've allocated to Java?  How many documents are in all your indexes on
that one machine, and how much disk space do they take up?  If you have
multiple index replicas on the same machine, please count all of them,
don't summarize.

If you happen to know, what GC tuning options are being provided to the
container that's running Solr?

Thanks,
Shawn



solr blocking and client timeout issue

2015-07-17 Thread Jeremy Ashcraft
We're having an issue where Solr becomes unresponsive for unknown reasons.  
Client requests time out for minutes at a time (sometimes only some 
requests time out while others work fine).  The logs don't reveal any 
clues, other than just a big gap


example:

INFO  - 2015-07-17 14:39:57.195; 
org.apache.solr.update.processor.LogUpdateProcessor; [collection1] 
webapp=/solr path=/update params={omitHeader=false&wt=json} 
{add=[16ce1c27-558c-4307-bdb6-e88ea93c9b2a 
(1506981128084389888)],commit=} 0 32


INFO  - 2015-07-17 14:40:02.716; org.apache.solr.core.SolrCore; 
[collection1] webapp=/solr path=/select 
params={omitHeader=true&sort=objectid+asc&fl=objectid&start=0&q=*:*&wt=json&fq=grades:("39")&fq=keywords:("1193"+OR+"1198"+OR+"11532206"+OR+"11532216"+OR+"17787406"+OR+"147664140"+OR+"147664142"+OR+"147664180"+OR+"325388273"+OR+"342808011")&fq=orgobjectid:("514119")&fq=subjects:("Language+Arts")&rows=1000} 
hits=4 status=0 QTime=54


INFO  - 2015-07-17 14:40:07.756; org.apache.solr.core.SolrCore; 
[collection1] webapp=/solr path=/select 
params={omitHeader=true&sort=objectid+asc&fl=objectid&start=0&q=*:*&wt=json&fq=grades:("43")&fq=keywords:("1134"+OR+"1158"+OR+"1160"+OR+"1175"+OR+"1193"+OR+"1208"+OR+"1209"+OR+"1213"+OR+"1215"+OR+"7838251"+OR+"7838265"+OR+"8877368"+OR+"11532189"+OR+"14736433"+OR+"14736436"+OR+"15392964"+OR+"15392969"+OR+"17787380"+OR+"17787385"+OR+"17787388"+OR+"17787389"+OR+"17787396"+OR+"17787397"+OR+"17787400"+OR+"17787405"+OR+"17787406"+OR+"26538072"+OR+"27982226"+OR+"28551934"+OR+"28551953"+OR+"466877542"+OR+"466877543"+OR+"476555246")&fq=orgobjectid:("392236052")&fq=subjects:("Language+Arts")&rows=1000} 
hits=17 status=0 QTime=381


INFO  - 2015-07-17 14:47:27.223; org.apache.solr.core.SolrCore; 
[collection1] webapp=/solr path=/select 
params={omitHeader=true&fl=*,score&start=0&q=*:*&wt=json&fq=orgobjectid:(672130365)&rows=1000} 
hits=0 status=0 QTime=0



you'll notice there is a 7-minute gap between the 3rd and 4th lines 
there.  The only exceptions that show up are a few EofExceptions/broken 
pipes, but my assumption is that they are end users prematurely stopping 
their requests.  Updates happen periodically throughout the day with 
soft commits.  Hard commits are configured to run every 15sec, and we 
only optimize once at night.  Disk IO and memory usage are normal during 
these hiccups; the only thing abnormal is a load avg of about 3 (where 
1.5 is the normal load)


Any ideas as to what's going on?

--
*jeremy ashcraft*


Re: SolrCloud 5.2.1 - collection creation error

2015-07-17 Thread Aaron Gibbons
I started from scratch with fresh Ubuntu machines and just wiped them and
tried again. I ran my Ansible playbook (below) to install Java 8 (tried
Oracle this time, and even tried installing it manually) and SolrCloud 5.2.1
as described previously.  SolrCloud appears to be working fine, but I still
get the same error when creating a collection. There is nothing else on these
machines, so I'm not sure where the conflict would come from. I'm using all
the standard locations and settings, just adding our external zookeepers.  I
can't see where Ansible would be causing a conflict here either; it's just
running the commands from the tutorial across the 4 machines.

...

roles:
  - role: 'williamyeh.oracle-java'

tasks:
  - name: Download Solr.
    get_url:
      url: "http://archive.apache.org/dist/lucene/solr/{{ solr_version }}/{{ solr_filename }}.tgz"
      dest: "{{ solr_workspace }}/{{ solr_filename }}.tgz"
      force: no

  - name: Extract the installation script.
    command: >
      tar xzf {{ solr_workspace }}/{{ solr_filename }}.tgz
      {{ solr_filename }}/bin/install_solr_service.sh --strip-components=2

  - name: Run installation script.
    command: "sudo bash ./install_solr_service.sh {{ solr_filename }}.tgz"

  - name: Stop solr.
    service: name=solr state=stopped

  - name: Copy template init config file into bin and restart.
    template:
      src: "solr-init-5.x.j2"
      dest: /var/solr/solr.in.sh

  - name: Replace log4j.properties file for production logging settings.
    copy:
      src: "log4j.properties"
      dest: "/var/solr/log4j.properties"
  ...

  - name: Add sqljdbc.jar file to solr dist files.
    copy:
      src: "sqljdbc4.jar"
      dest: "/opt/solr/dist/sqljdbc4.jar"

  - name: Start solr with new config.
    service: name=solr state=restarted


On Thu, Jul 16, 2015 at 8:48 PM, Erick Erickson 
wrote:

> It looks at a glance like you're in "Jar hell" and have one or more jar
> files from "somewhere else" in your classpath, possibly a jar file from
> an older Solr or one of the libraries.
>
> Best,
> Erick
>
> On Thu, Jul 16, 2015 at 6:17 AM, Aaron Gibbons
>  wrote:
> > I'm installing SolrCloud 5.2.1 on 4 Ubuntu 14.04 machines with 3 external
> > zookeepers.  I've installed the solr machines using Ansible following the
> > "Taking Solr to Production" steps.
> >
> >1. Download 5.2.1
> >2. Extract installation script
> >3. Run installation script
> >
> > Then I stop solr and make my configuration changes to the solr.in.sh
> file
> > (adding zookeepers) and log4j.properties (recommended changes).  Restart
> > solr and everything looks good.
> >
> > The problem I have is that I can't create a collection.  I create the
> > collection folder in /var/solr/data and tried both the bin script and API
> > but get the error below. I've tried 5.2.0 also and both Java 7 and 8 with
> > the same result.
> >
> > 50047java.io.InvalidClassException:
> > org.apache.solr.client.solrj.SolrResponse; local class incompatible:
> stream
> > classdesc serialVersionUID = 3123208377723774018, local class
> > serialVersionUID =
> 3945300637328478755org.apache.solr.common.SolrException:
> > java.io.InvalidClassException: org.apache.solr.client.solrj.SolrResponse;
> > local class incompatible: stream classdesc serialVersionUID =
> > 3123208377723774018, local class serialVersionUID = 3945300637328478755
> at
> >
> org.apache.solr.client.solrj.SolrResponse.deserialize(SolrResponse.java:62)
> > at
> >
> org.apache.solr.handler.admin.CollectionsHandler.handleResponse(CollectionsHandler.java:228)
> > at
> >
> org.apache.solr.handler.admin.CollectionsHandler.handleRequestBody(CollectionsHandler.java:168)
> > at
> >
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:143)
> > at
> >
> org.apache.solr.servlet.HttpSolrCall.handleAdminRequest(HttpSolrCall.java:646)
> > at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:417) at
> >
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:227)
> > at
> >
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:196)
> > at
> >
> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652)
> > at
> >
> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:585)
> > at
> >
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
> > at
> >
> org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:577)
> > at
> >
> org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:223)
> > at
> >
> org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1127)
> > at
> > org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:515)
> > at
> >
> org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
> > at
> >
> org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1061)
> > at
> >
> org.eclipse.jetty.server.h

Re: Programmatically find out if node is overseer

2015-07-17 Thread Shalin Shekhar Mangar
On Sat, Jul 18, 2015 at 1:00 AM, Shai Erera  wrote:
>>
>> Also, ideally, there shouldn't be a point where you have multiple active
>> Overseers in a single cluster.
>>
>
> In the reference guide, CLUSTERSTATUS shows as if the overseer role can
> return more than one node. Does it mean that these nodes were designated
> potential 'overseers', but OVERSEERSTATUS' will return the actual one?

Yes, the OVERSEERSTATUS will return the current (actual) leader always.

>
> Shai
>
> On Fri, Jul 17, 2015 at 10:11 PM, Shalin Shekhar Mangar <
> shalinman...@gmail.com> wrote:
>
>> Just for the record, SOLR-5859 only affects people who used the
>> overseer roles feature released in 4.7 and no one else. This was fixed
>> in 4.8
>>
>> On Sat, Jul 18, 2015 at 12:18 AM, Anshum Gupta 
>> wrote:
>> > It shouldn't happen unless you're using an older version of Solr (< 4.8)
>> in
>> > which case, you might end up hitting SOLR-5859
>> > .
>> >
>> > On Fri, Jul 17, 2015 at 11:29 AM,  wrote:
>> >
>> >> Hi Anshum what do you mean by:
>> >> >ideally, there shouldn't be a point where you have multiple active
>> >> Overseers in a single cluster
>> >>
>> >> How can multiple Overseers happen? And what are the consequences?
>> >>
>> >> Regards
>> >>
>> >> > On 17 Jul 2015, at 19:37, Anshum Gupta 
>> wrote:
>> >> >
>> >> > ideally, there shouldn't be a point where you have multiple active
>> >> > Overseers in a single cluster
>> >>
>> >
>> >
>> >
>> > --
>> > Anshum Gupta
>>
>>
>>
>> --
>> Regards,
>> Shalin Shekhar Mangar.
>>



-- 
Regards,
Shalin Shekhar Mangar.


Re: Programmatically find out if node is overseer

2015-07-17 Thread Shai Erera
>
> Also, ideally, there shouldn't be a point where you have multiple active
> Overseers in a single cluster.
>

In the reference guide, CLUSTERSTATUS shows as if the overseer role can
return more than one node. Does it mean that these nodes were designated
potential 'overseers', but OVERSEERSTATUS' will return the actual one?

Shai

On Fri, Jul 17, 2015 at 10:11 PM, Shalin Shekhar Mangar <
shalinman...@gmail.com> wrote:

> Just for the record, SOLR-5859 only affects people who used the
> overseer roles feature released in 4.7 and no one else. This was fixed
> in 4.8
>
> On Sat, Jul 18, 2015 at 12:18 AM, Anshum Gupta 
> wrote:
> > It shouldn't happen unless you're using an older version of Solr (< 4.8)
> in
> > which case, you might end up hitting SOLR-5859
> > .
> >
> > On Fri, Jul 17, 2015 at 11:29 AM,  wrote:
> >
> >> Hi Anshum what do you mean by:
> >> >ideally, there shouldn't be a point where you have multiple active
> >> Overseers in a single cluster
> >>
> >> How can multiple Overseers happen? And what are the consequences?
> >>
> >> Regards
> >>
> >> > On 17 Jul 2015, at 19:37, Anshum Gupta 
> wrote:
> >> >
> >> > ideally, there shouldn't be a point where you have multiple active
> >> > Overseers in a single cluster
> >>
> >
> >
> >
> > --
> > Anshum Gupta
>
>
>
> --
> Regards,
> Shalin Shekhar Mangar.
>


Re: Programmatically find out if node is overseer

2015-07-17 Thread Shalin Shekhar Mangar
Just for the record, SOLR-5859 only affects people who used the
overseer roles feature released in 4.7 and no one else. This was fixed
in 4.8

On Sat, Jul 18, 2015 at 12:18 AM, Anshum Gupta  wrote:
> It shouldn't happen unless you're using an older version of Solr (< 4.8) in
> which case, you might end up hitting SOLR-5859
> .
>
> On Fri, Jul 17, 2015 at 11:29 AM,  wrote:
>
>> Hi Anshum what do you mean by:
>> >ideally, there shouldn't be a point where you have multiple active
>> Overseers in a single cluster
>>
>> How can multiple Overseers happen? And what are the consequences?
>>
>> Regards
>>
>> > On 17 Jul 2015, at 19:37, Anshum Gupta  wrote:
>> >
>> > ideally, there shouldn't be a point where you have multiple active
>> > Overseers in a single cluster
>>
>
>
>
> --
> Anshum Gupta



-- 
Regards,
Shalin Shekhar Mangar.


Re: Programmatically find out if node is overseer

2015-07-17 Thread Anshum Gupta
It shouldn't happen unless you're using an older version of Solr (< 4.8) in
which case, you might end up hitting SOLR-5859
.

On Fri, Jul 17, 2015 at 11:29 AM,  wrote:

> Hi Anshum what do you mean by:
> >ideally, there shouldn't be a point where you have multiple active
> Overseers in a single cluster
>
> How can multiple Overseers happen? And what are the consequences?
>
> Regards
>
> > On 17 Jul 2015, at 19:37, Anshum Gupta  wrote:
> >
> > ideally, there shouldn't be a point where you have multiple active
> > Overseers in a single cluster
>



-- 
Anshum Gupta


Re: Programmatically find out if node is overseer

2015-07-17 Thread solr . user . 1507
Hi Anshum what do you mean by:
>ideally, there shouldn't be a point where you have multiple active
Overseers in a single cluster

How can multiple Overseers happen? And what are the consequences?

Regards

> On 17 Jul 2015, at 19:37, Anshum Gupta  wrote:
> 
> ideally, there shouldn't be a point where you have multiple active
> Overseers in a single cluster


Re: Programmatically find out if node is overseer

2015-07-17 Thread Anshum Gupta
As Shai mentioned, OVERSEERSTATUS is the most straightforward and
recommended way to go. It basically does what Erick suggested, i.e. gets the
first entry from '/overseer_elect/leader' in ZK.

Also, ideally, there shouldn't be a point where you have multiple active
Overseers in a single cluster.
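As a concrete sketch, the two Collections API calls being compared look roughly like this; the host and port are illustrative defaults, not taken from this thread, and the commands are echoed rather than executed so the sketch stands alone without a running cluster:

```shell
# Host and port are illustrative defaults, not taken from this thread.
SOLR="http://localhost:8983/solr"

# OVERSEERSTATUS: the 'leader' item in the response names the current overseer.
echo "curl '$SOLR/admin/collections?action=OVERSEERSTATUS&wt=json'"

# CLUSTERSTATUS: includes an 'overseer' list of nodes with the overseer role.
echo "curl '$SOLR/admin/collections?action=CLUSTERSTATUS&wt=json'"
```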

On Thu, Jul 16, 2015 at 9:36 PM, Shai Erera  wrote:

> An easier way (IMO) and more 'official' is to use the CLUSTERSTATUS (
>
> https://cwiki.apache.org/confluence/display/solr/Collections+API#CollectionsAPI-api18
> )
> or OVERSEERSTATUS (
>
> https://cwiki.apache.org/confluence/display/solr/Collections+API#CollectionsAPI-api17
> )
> API.
>
> The OVERSEERSTATUS returns a 'leader' item which says who is the overseer,
> at least as far as I understand. Not sure what is returned in case there
> are multiple nodes with the overseer role.
>
> The CLUSTERSTATUS returns an 'overseer' item with all nodes that have the
> overseer role assigned. I'm usually using that API to query for the status
> of my Solr cluster.
>
> Shai
>
> On Fri, Jul 17, 2015 at 3:55 AM, Erick Erickson 
> wrote:
>
> > look at the overseer election ephemeral node in ZK, the first one in
> > line is the current overseer.
> >
> > Best,
> > Erick
> >
> > On Thu, Jul 16, 2015 at 3:42 AM, Markus Jelsma
> >  wrote:
> > > Hello - i need to run a thread on a single instance of a cloud so need
> > to find out if current node is the overseer. I know we can already
> > programmatically find out if this replica is the leader of a shard via
> > isLeader(). I have looked everywhere but i cannot find an isOverseer. I
> did
> > find the election stuff but i am unsure if that is what i need to use.
> > >
> > > Any thoughts?
> > >
> > > Thanks!
> > > Markus
> >
>



-- 
Anshum Gupta


Re: Programmatically find out if node is overseer

2015-07-17 Thread Erick Erickson
good point Shai!

On Thu, Jul 16, 2015 at 9:36 PM, Shai Erera  wrote:
> An easier way (IMO) and more 'official' is to use the CLUSTERSTATUS (
> https://cwiki.apache.org/confluence/display/solr/Collections+API#CollectionsAPI-api18)
> or OVERSEERSTATUS (
> https://cwiki.apache.org/confluence/display/solr/Collections+API#CollectionsAPI-api17)
> API.
>
> The OVERSEERSTATUS returns a 'leader' item which says who is the overseer,
> at least as far as I understand. Not sure what is returned in case there
> are multiple nodes with the overseer role.
>
> The CLUSTERSTATUS returns an 'overseer' item with all nodes that have the
> overseer role assigned. I'm usually using that API to query for the status
> of my Solr cluster.
>
> Shai
>
> On Fri, Jul 17, 2015 at 3:55 AM, Erick Erickson 
> wrote:
>
>> look at the overseer election ephemeral node in ZK, the first one in
>> line is the current overseer.
>>
>> Best,
>> Erick
>>
>> On Thu, Jul 16, 2015 at 3:42 AM, Markus Jelsma
>>  wrote:
>> > Hello - i need to run a thread on a single instance of a cloud so need
>> to find out if current node is the overseer. I know we can already
>> programmatically find out if this replica is the leader of a shard via
>> isLeader(). I have looked everywhere but i cannot find an isOverseer. I did
>> find the election stuff but i am unsure if that is what i need to use.
>> >
>> > Any thoughts?
>> >
>> > Thanks!
>> > Markus
>>


Question about indexing html file

2015-07-17 Thread Huiying Ma
Hi everyone,

I'm a new user of Solr, and I need to index some HTML files based on
their tags and classes, then build a web interface on top to provide
document search. I have some questions about how to index those HTML
files using my own rules. I have checked the documents online but didn't
find anything useful, and I'm a little lost. Could anyone give me some
help or hints? Thank you very much!

For example:
[image: Inline image 1]
I want to index "Raleigh" as a location, and similarly I have name and
organization etc.

I appreciate anyone that could give me any idea for this.

Sincerely,
Cathy


Re: DIH question: importing string containing comma-delimited list into a multiValued field

2015-07-17 Thread Shawn Heisey
On 7/17/2015 8:23 AM, Bill Au wrote:
> One of my database columns is a varchar containing a comma-delimited list of
> values.  I would like to import these values into a multiValued field.  I
> figure that I will need to write a ScriptTransformer to do that.  Is there
> a better way?

DIH provides the RegexTransformer.

https://wiki.apache.org/solr/DataImportHandler#RegexTransformer

With this transformer, you can do "splitBy" on  config elements. 
The "mailId" field in the RegexTransformer example on the wiki shows
splitting on commas.
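As a sketch (the entity name, SQL, and column names below are hypothetical, not from this thread), splitting a comma-delimited varchar column into a multiValued field looks like:

```xml
<entity name="item" transformer="RegexTransformer"
        query="SELECT id, tags FROM item">
  <field column="id" />
  <!-- splitBy breaks "a,b,c" from the tags column into one value per
       element of the multiValued "tags" field -->
  <field column="tags" splitBy="," />
</entity>
```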

This splitBy functionality is good if you need individual values
returned in search results.  If your goal is to make it possible to
search on those individual values without regard to how they are
returned in search results, you can break it apart at index analysis
time in your schema.  I have a fieldType in my schema that tokenizes on
semicolons, with optional whitespace.  Here's the tokenizer:

  <tokenizer class="solr.PatternTokenizerFactory" pattern="; *"/>

Thanks,
Shawn



DIH question: importing string containing comma-delimited list into a multiValued field

2015-07-17 Thread Bill Au
One of my database column is a varchar containing a comma-delimited list of
values.  I would like to import these values into a multiValued field.  I
figure that I will need to write a ScriptTransformer to do that.  Is there
a better way?

Bill


Re: Multiple boost queries on a specific field

2015-07-17 Thread bengates
Hello,

I'm using q.alt because q=*:* returns 0 results, since it is not compatible
with the Dismax parser.
The "real terms" are irrelevant here, since I want to boost some documents,
either on the whole collection or after applying some filter queries.
My queries have nothing to do with a fulltext search in this case.

For the 2nd query, "close enough" is not... enough :)
I already read the docs for this, but the relevancy there works on a
second basis, and I want a month basis.
I chose Solr for the ease of boosting, otherwise I would have chosen a basic
MySql database to store and query my documents. Sad to realize things are
not that easy... :(
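For reference, a month-basis date boost can be sketched with the bf parameter (an illustrative sketch, not from this thread; `created_at` is a hypothetical date field). NOW/MONTH rounds the reference time, so a given document's boost stays constant within a month, and 3.86e-10 ≈ 1/(2.592e9 ms in a 30-day month), so the boost roughly halves for each month of age:

```
bf=recip(ms(NOW/MONTH,created_at),3.86e-10,1,1)
```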

Thanks,
Ben



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Multiple-boost-queries-on-a-specific-field-tp4217678p4217829.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Analytics on Solr logs

2015-07-17 Thread Erik Hatcher
Yes, that’s how it is generally done - adding extra Solr-ignoring parameters to 
the requests.  


—
Erik Hatcher, Senior Solutions Architect
http://www.lucidworks.com 
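As a sketch of the pattern Erik describes (the collection name `persons` and parameter name `userId` are illustrative, not from this thread), the extra parameter simply rides along in the request URL; Solr does not use it for matching, but it appears in the request log for later analysis:

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class LoggedParamDemo {
    // Build a /select URL carrying an extra, non-search parameter that Solr
    // ignores for matching but records in its request log.
    static String buildUrl(String query, String userId) {
        String q = URLEncoder.encode(query, StandardCharsets.UTF_8);
        return "http://localhost:8983/solr/persons/select?q=" + q
                + "&userId=" + userId;
    }

    public static void main(String[] args) {
        System.out.println(buildUrl("PersonName:Peter", "55"));
    }
}
```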




> On Jul 17, 2015, at 5:47 AM, marotosg  wrote:
> 
> Hi,
> 
> I have a use case where we would like to know what users are searching
> for: the most commonly used criteria, etc.
> One requirement relates to the user who is searching. We need to know who
> is making each search, but this is not a search criterion itself; it is
> just analysis information.
> 
> I was wondering how to add analytical information to the logged query
> without making it searchable, and what the best approach is to do
> analytics on Solr.
> 
> For instance, I have a person collection and a user searches for a person
> name. I need to inject the user who is searching, but the query is not
> going to filter by that user. I am thinking of adding this as an extra
> param:
> Person collection: q=PersonName:Peter&UserID=55
> 
> Thanks for your help
> 
> 
> 
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Analytics-on-Solr-logs-tp4217807.html



Analytics on Solr logs

2015-07-17 Thread marotosg
Hi,

I have a use case where we would like to know what users are searching
for: the most commonly used criteria, etc.
One requirement relates to the user who is searching. We need to know who
is making each search, but this is not a search criterion itself; it is
just analysis information.

I was wondering how to add analytical information to the logged query
without making it searchable, and what the best approach is to do
analytics on Solr.

For instance, I have a person collection and a user searches for a person
name. I need to inject the user who is searching, but the query is not
going to filter by that user. I am thinking of adding this as an extra
param:
Person collection: q=PersonName:Peter&UserID=55

Thanks for your help



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Analytics-on-Solr-logs-tp4217807.html


Extracting article keywords using tf-idf algorithm

2015-07-17 Thread Ali Nazemian
Dear Lucene/Solr developers,
Hi,
I decided to develop a plugin for Solr in order to extract the main keywords
from an article. Since Solr has already done the hard work of calculating
tf-idf scores, I decided to use those for the sake of better performance. I
know that UpdateRequestProcessor is the best-suited extension point for
adding a keyword value to documents. I also found out that I do not have
access to tf-idf scores inside the UpdateRequestProcessor, because the
UpdateRequestProcessor chain is applied before tf-idf scores are
calculated. Hence, after consulting with Solr/Lucene developers, I decided
to go for a SearchComponent in order to calculate keywords based on tf-idf
(Lucene interesting terms) on commit/optimize.
Unfortunately, with this approach, strange core behavior was observed. For
example, sometimes faceting won't work on the keyword field, or the index
becomes unstable in search results.
I really appreciate it if someone can help me make it stable.


NamedList<Object> response = new SimpleOrderedMap<>();
keyword.init(searcher, params);
// Select docs whose keyword field still holds the "noval" placeholder
// but whose source fields do not.
BooleanQuery query = new BooleanQuery();
for (String fieldName : keywordSourceFields) {
  TermQuery termQuery = new TermQuery(new Term(fieldName, "noval"));
  query.add(termQuery, Occur.MUST_NOT);
}
TermQuery termQuery = new TermQuery(new Term(keywordField, "noval"));
query.add(termQuery, Occur.MUST);
RefCounted<IndexWriter> iw = null;
IndexWriter writer = null;
try {
  TopDocs results = searcher.search(query, maxNumDocs);
  ScoreDoc[] hits = results.scoreDocs;
  iw = solrCoreState.getIndexWriter(core);
  writer = iw.get();
  FieldType type = new FieldType(StringField.TYPE_STORED);
  for (int i = 0; i < hits.length; i++) {
    Document document = searcher.doc(hits[i].doc);
    List<String> keywords = keyword.getKeywords(hits[i].doc);
    if (keywords.size() > 0) document.removeFields(keywordField);
    for (String word : keywords) {
      document.add(new Field(keywordField, word, type));
    }
    String uniqueKey =
        searcher.getSchema().getUniqueKeyField().getName();
    writer.updateDocument(new Term(uniqueKey, document.get(uniqueKey)),
        document);
  }
  response.add("Number of Selected Docs", results.totalHits);
  writer.commit();
} catch (IOException | SyntaxError e) {
  throw new RuntimeException(e);  // preserve the cause instead of swallowing it
} finally {
  if (iw != null) {
    iw.decref();
  }
}


public List<String> getKeywords(int docId) throws SyntaxError {
  String[] fields = new String[keywordSourceFields.size()];
  List<String> terms = new ArrayList<>();
  fields = keywordSourceFields.toArray(fields);
  mlt.setFieldNames(fields);
  mlt.setAnalyzer(indexSearcher.getSchema().getIndexAnalyzer());
  mlt.setMinTermFreq(minTermFreq);
  mlt.setMinDocFreq(minDocFreq);
  mlt.setMinWordLen(minWordLen);
  mlt.setMaxQueryTerms(maxNumKeywords);
  mlt.setMaxNumTokensParsed(maxTokensParsed);
  try {
    terms = Arrays.asList(mlt.retrieveInterestingTerms(docId));
  } catch (IOException e) {
    LOGGER.error(e.getMessage());
    throw new RuntimeException(e);  // keep the cause for debugging
  }
  return terms;
}

Best regards.
-- 
A.Nazemian