Re: DataImportHandler scheduling

2015-09-01 Thread Troy Edwards
My initial thought was to use scheduling built with DIH:
http://wiki.apache.org/solr/DataImportHandler#Scheduling

But I think just a cron job should do the same for me.

Thanks

On Tue, Sep 1, 2015 at 8:51 AM, Davis, Daniel (NIH/NLM) [C] <
daniel.da...@nih.gov> wrote:

> On 8/31/2015 11:26 AM, Troy Edwards wrote:
> > I am having a hard time finding documentation on DataImportHandler
> > scheduling in SolrCloud. Can someone please post a link to that? I
> > have a requirement that the DIH should be initiated at a specific time
> > Monday through Friday.
>
> Troy, is your question how to use scheduled tasks?  Shawn pointed you in
> the right direction.  I thought it more likely that you want to schedule a
> cron task to run on any of your servers running SolrCloud, and you want the
> job to run even if the cluster is degraded.
>
> Here's an idea - schedule your job Monday on node 1, Tuesday on node 2,
> etc.  That way, if the cluster is degraded (a node is down),
> re-indexing/delta indexing still happens, it just happens slower.  You
> can certainly write a ZooKeeper client to make each cron job compete to see
> who does the job - questions on how to do this should be directed to a
> ZooKeeper users' mailing list.
>
> -Original Message-
> From: Shawn Heisey [mailto:apa...@elyograg.org]
> Sent: Monday, August 31, 2015 7:50 PM
> To: solr-user@lucene.apache.org
> Subject: Re: DataImportHandler scheduling
>
> On 8/31/2015 11:26 AM, Troy Edwards wrote:
> > I am having a hard time finding documentation on DataImportHandler
> > scheduling in SolrCloud. Can someone please post a link to that? I
> > have a requirement that the DIH should be initiated at a specific time
> > Monday through Friday.
>
> Every modern operating system (and most of the previous versions of every
> modern OS) has a built-in task scheduling system.  For Windows, it's
> literally called Task Scheduler.  For most other operating systems, it's
> called cron.
>
> Including dataimport scheduling capability in Solr has been discussed, and
> I think someone even wrote a working version ... but since every OS already
> has scheduling capability that has had years of time to mature, why should
> Solr reinvent the wheel and take the risk that the implementation will have
> bugs?
>
> Currently virtually all updates to Solr's index must be initiated outside
> of Solr, and there is good reason to make sure that Solr doesn't ever
> modify the index without outside input.  The only thing I know of right now
> that can update the index automatically is Document Expiration, but the
> expiration time is decided when the document is indexed, and the original
> indexing action is external to Solr.
>
> https://lucidworks.com/blog/document-expiration/
>
> Thanks,
> Shawn
>
>


Re: DataImportHandler scheduling

2015-09-01 Thread Shawn Heisey
On 9/1/2015 11:45 AM, Troy Edwards wrote:
> My initial thought was to use scheduling built with DIH:
> http://wiki.apache.org/solr/DataImportHandler#Scheduling
>
> But I think just a cron job should do the same for me.

The dataimport scheduler does not exist in any Solr version.  This is a
proposed feature, with the enhancement issue open for more than four years:

https://issues.apache.org/jira/browse/SOLR-2305

I have updated the wiki page to state the fact that the scheduler is a
proposed improvement, not a usable feature.

Thanks,
Shawn



Re: DataImportHandler scheduling

2015-09-01 Thread William Bell
We should add a simple scheduler in the UI. It would be very useful for
scheduling various actions:

- Full index
- Delta Index
- Replicate




On Tue, Sep 1, 2015 at 12:41 PM, Shawn Heisey <apa...@elyograg.org> wrote:

> On 9/1/2015 11:45 AM, Troy Edwards wrote:
> > My initial thought was to use scheduling built with DIH:
> > http://wiki.apache.org/solr/DataImportHandler#Scheduling
> >
> > But I think just a cron job should do the same for me.
>
> The dataimport scheduler does not exist in any Solr version.  This is a
> proposed feature, with the enhancement issue open for more than four years:
>
> https://issues.apache.org/jira/browse/SOLR-2305
>
> I have updated the wiki page to state the fact that the scheduler is a
> proposed improvement, not a usable feature.
>
> Thanks,
> Shawn
>
>


-- 
Bill Bell
billnb...@gmail.com
cell 720-256-8076


Re: DataImportHandler scheduling

2015-09-01 Thread Kevin Lee
While it may be useful to have a scheduler for simple cases, I think there are
too many variables to make it useful for everyone's case.  For example, I
recently wrote a script that uses the DataImportHandler API to get the
status, kick off the import, etc.  However, before allowing it to just kick
off, I needed to query the database where the data was coming from to make sure
it had finished its daily load, and if it hadn't finished, wait for a while
to see if it would; only then could the script do the load.  After the load is
finished, it does another check to ensure the expected number of docs was
actually loaded by Solr, based on the data from the database.

If a scheduler were built into Solr, it would probably cover only the simple
case, and for production you'd probably need to write your own scripts and use
your own scheduler anyway to ensure the loads are starting/completing as
expected.
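
A minimal sketch of the kind of wrapper script described in this message,
assuming Python 3. The function names and the five callables passed to
guarded_import() are hypothetical hooks, not part of Solr or DIH; the
caller would implement them against its own database and Solr instance.

```python
import time


def wait_until(check, attempts, delay_seconds, sleep=time.sleep):
    """Call check() up to `attempts` times, sleeping between tries.

    Returns True as soon as check() passes, False if it never does.
    """
    for attempt in range(attempts):
        if check():
            return True
        if attempt < attempts - 1:
            sleep(delay_seconds)
    return False


def guarded_import(source_ready, dih_idle, start_import,
                   expected_docs, indexed_docs,
                   attempts=6, delay_seconds=600, sleep=time.sleep):
    """Run one import with a pre-check and a post-check.

    Hypothetical hooks supplied by the caller:
      source_ready()  -> bool, the source database finished its daily load
      dih_idle()      -> bool, DIH reports no import in progress
      start_import()  -> starts the import
      expected_docs() -> int, row count taken from the database
      indexed_docs()  -> int, document count reported by Solr
    Returns a short status string.
    """
    # Pre-check: give the database some time to finish loading.
    if not wait_until(source_ready, attempts, delay_seconds, sleep):
        return "source-not-ready"
    if not dih_idle():
        return "import-already-running"
    start_import()
    # The import runs asynchronously, so poll until DIH is idle again.
    if not wait_until(dih_idle, attempts, delay_seconds, sleep):
        return "import-timeout"
    # Post-check: compare what Solr holds with what the database says.
    expected, actual = expected_docs(), indexed_docs()
    if expected == actual:
        return "ok"
    return "count-mismatch: expected {} got {}".format(expected, actual)
```

The checks are passed in as callables so the retry logic can be exercised
without a database or a running Solr. A real script would probably also
want a short pause after start_import() so the first status poll does not
race the import starting up.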

> On Sep 1, 2015, at 1:09 PM, William Bell <billnb...@gmail.com> wrote:
> 
> We should add a simple scheduler in the UI. It is very useful. To schedule
> various actions:
> 
> - Full index
> - Delta Index
> - Replicate
> 
> 
> 
> 
>> On Tue, Sep 1, 2015 at 12:41 PM, Shawn Heisey <apa...@elyograg.org> wrote:
>> 
>>> On 9/1/2015 11:45 AM, Troy Edwards wrote:
>>> My initial thought was to use scheduling built with DIH:
>>> http://wiki.apache.org/solr/DataImportHandler#Scheduling
>>> 
>>> But I think just a cron job should do the same for me.
>> 
>> The dataimport scheduler does not exist in any Solr version.  This is a
>> proposed feature, with the enhancement issue open for more than four years:
>> 
>> https://issues.apache.org/jira/browse/SOLR-2305
>> 
>> I have updated the wiki page to state the fact that the scheduler is a
>> proposed improvement, not a usable feature.
>> 
>> Thanks,
>> Shawn
> 
> 
> -- 
> Bill Bell
> billnb...@gmail.com
> cell 720-256-8076


RE: DataImportHandler scheduling

2015-09-01 Thread Davis, Daniel (NIH/NLM) [C]
On 8/31/2015 11:26 AM, Troy Edwards wrote:
> I am having a hard time finding documentation on DataImportHandler 
> scheduling in SolrCloud. Can someone please post a link to that? I 
> have a requirement that the DIH should be initiated at a specific time 
> Monday through Friday.

Troy, is your question how to use scheduled tasks?  Shawn pointed you in the
right direction.  I thought it more likely that you want to schedule a cron
task to run on any of your servers running SolrCloud, and you want the job to
run even if the cluster is degraded.

Here's an idea - schedule your job Monday on node 1, Tuesday on node 2, etc.
That way, if the cluster is degraded (a node is down), re-indexing/delta
indexing still happens, it just happens slower.  You can certainly write a
ZooKeeper client to make each cron job compete to see who does the job -
questions on how to do this should be directed to a ZooKeeper users' mailing
list.
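
The per-day rotation suggested above could be sketched as follows,
assuming Python 3 and that the same script is installed in cron on every
node. The node names and function names are made up for illustration.

```python
import datetime


def node_for_day(nodes, day):
    """Pick the node responsible for `day`; Monday maps to the first node."""
    return nodes[day.weekday() % len(nodes)]


def should_run_here(this_node, nodes, day=None):
    """True if this node owns today's run and today is Monday to Friday."""
    day = day or datetime.date.today()
    return day.weekday() < 5 and node_for_day(nodes, day) == this_node
```

Each node's cron job would call should_run_here() with its own name and
exit early when the answer is False. If the node that owns a given day is
down, that day's run is simply skipped and the next day's node picks up
the backlog.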

-Original Message-
From: Shawn Heisey [mailto:apa...@elyograg.org] 
Sent: Monday, August 31, 2015 7:50 PM
To: solr-user@lucene.apache.org
Subject: Re: DataImportHandler scheduling

On 8/31/2015 11:26 AM, Troy Edwards wrote:
> I am having a hard time finding documentation on DataImportHandler 
> scheduling in SolrCloud. Can someone please post a link to that? I 
> have a requirement that the DIH should be initiated at a specific time 
> Monday through Friday.

Every modern operating system (and most of the previous versions of every 
modern OS) has a built-in task scheduling system.  For Windows, it's literally 
called Task Scheduler.  For most other operating systems, it's called cron.

Including dataimport scheduling capability in Solr has been discussed, and I 
think someone even wrote a working version ... but since every OS already has 
scheduling capability that has had years of time to mature, why should Solr 
reinvent the wheel and take the risk that the implementation will have bugs?

Currently virtually all updates to Solr's index must be initiated outside of 
Solr, and there is good reason to make sure that Solr doesn't ever modify the 
index without outside input.  The only thing I know of right now that can 
update the index automatically is Document Expiration, but the expiration time 
is decided when the document is indexed, and the original indexing action is 
external to Solr.

https://lucidworks.com/blog/document-expiration/

Thanks,
Shawn



DataImportHandler scheduling

2015-08-31 Thread Troy Edwards
I am having a hard time finding documentation on DataImportHandler
scheduling in SolrCloud. Can someone please post a link to that? I have a
requirement that the DIH should be initiated at a specific time Monday
through Friday.

Thanks!


Re: DataImportHandler scheduling

2015-08-31 Thread Ahmet Arslan
Hi Troy,

I think folks use corncobs (with curl utility) provided by the Operating System.

Ahmet



On Monday, August 31, 2015 8:26 PM, Troy Edwards <tedwards415...@gmail.com> 
wrote:
I am having a hard time finding documentation on DataImportHandler
scheduling in SolrCloud. Can someone please post a link to that? I have a
requirement that the DIH should be initiated at a specific time Monday
through Friday.

Thanks!


RE: DataImportHandler scheduling

2015-08-31 Thread Davis, Daniel (NIH/NLM) [C]
So, I think corncobs is not a utility, but a pattern - you have cron run curl
to invoke something on your web application on the localhost (and elsewhere),
and it runs the job if the job needs running, thus the webapp keeps the state.

There's a utility cronlock (https://github.com/kvz/cronlock) that runs on top
of Redis.  I was thinking that a common pattern would be something similar
written in Python using the kazoo module to talk to ZooKeeper.  No point
writing much Java for a cron job, but Python should be OK.  What I don't like
about cronlock is that it isn't "run once", but instead avoids overlap, so
there's good reason to write something specific to that case.
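
One way the "run once" idea could look with kazoo, as a rough sketch only.
It assumes Python 3 with the third-party kazoo package installed, and it
uses one marker znode per calendar day: whichever cron job creates the
znode first runs the job, and the others see that it already exists and
skip. The znode layout and the function names are assumptions, not an
established convention.

```python
import datetime
import socket


def claim_path(base_path, day):
    """Znode path that marks the run for one calendar day as claimed."""
    return "{}/{}".format(base_path.rstrip("/"), day.isoformat())


def try_claim(zk, base_path, day):
    """Return True if this host created the marker, False if it existed."""
    from kazoo.exceptions import NodeExistsError  # third-party package
    try:
        # The znode is persistent, so a later crash does not release the claim.
        zk.create(claim_path(base_path, day),
                  socket.gethostname().encode("utf-8"), makepath=True)
    except NodeExistsError:
        return False
    return True


def run_once_today(zk_hosts, base_path, job):
    """Connect to ZooKeeper and run job() only if today's claim succeeds."""
    from kazoo.client import KazooClient  # third-party package
    zk = KazooClient(hosts=zk_hosts)
    zk.start()
    try:
        if try_claim(zk, base_path, datetime.date.today()):
            job()
            return True
        return False
    finally:
        zk.stop()
```

Unlike a lock that is released when the job ends, the marker stays in
place, so a second node whose cron entry fires later the same day does not
repeat the work. Old markers would need to be cleaned up separately, and a
job that fails after claiming is not retried by this sketch.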

-Original Message-
From: Ahmet Arslan [mailto:iori...@yahoo.com.INVALID] 
Sent: Monday, August 31, 2015 1:35 PM
To: solr-user@lucene.apache.org
Subject: Re: DataImportHandler scheduling

Hi Troy,

I think folks use corncobs (with curl utility) provided by the Operating System.

Ahmet



On Monday, August 31, 2015 8:26 PM, Troy Edwards <tedwards415...@gmail.com> 
wrote:
I am having a hard time finding documentation on DataImportHandler scheduling 
in SolrCloud. Can someone please post a link to that? I have a requirement that 
the DIH should be initiated at a specific time Monday through Friday.

Thanks!


Re: DataImportHandler scheduling

2015-08-31 Thread Shawn Heisey
On 8/31/2015 11:26 AM, Troy Edwards wrote:
> I am having a hard time finding documentation on DataImportHandler
> scheduling in SolrCloud. Can someone please post a link to that? I have a
> requirement that the DIH should be initiated at a specific time Monday
> through Friday.

Every modern operating system (and most of the previous versions of
every modern OS) has a built-in task scheduling system.  For Windows,
it's literally called Task Scheduler.  For most other operating systems,
it's called cron.

Including dataimport scheduling capability in Solr has been discussed,
and I think someone even wrote a working version ... but since every OS
already has scheduling capability that has had years of time to mature,
why should Solr reinvent the wheel and take the risk that the
implementation will have bugs?

Currently virtually all updates to Solr's index must be initiated
outside of Solr, and there is good reason to make sure that Solr doesn't
ever modify the index without outside input.  The only thing I know of
right now that can update the index automatically is Document
Expiration, but the expiration time is decided when the document is
indexed, and the original indexing action is external to Solr.

https://lucidworks.com/blog/document-expiration/

Thanks,
Shawn
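
For the cron approach described in this message, a small trigger script
might look like the sketch below, assuming Python 3. The host, port, core
name and the /dataimport handler path are assumptions and depend on how
the handler is registered in solrconfig.xml.

```python
#!/usr/bin/env python3
"""Trigger a DataImportHandler run; meant to be started by cron.

Example crontab entry (02:30, Monday through Friday):
    30 2 * * 1-5 /usr/bin/python3 /opt/scripts/dih_trigger.py delta-import
"""
import json
import sys
import urllib.parse
import urllib.request

# Assumed values for illustration only.
SOLR_BASE = "http://localhost:8983/solr"
CORE = "mycore"


def dih_url(base, core, command, **params):
    """Build the URL for a DIH command such as full-import or delta-import."""
    query = {"command": command, "wt": "json"}
    query.update(params)
    return "{}/{}/dataimport?{}".format(
        base.rstrip("/"), core, urllib.parse.urlencode(query))


def trigger(command):
    """Send the request and return the decoded JSON response."""
    url = dih_url(SOLR_BASE, CORE, command)
    with urllib.request.urlopen(url, timeout=30) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Only act when a command is given on the command line, as in the crontab line.
if __name__ == "__main__" and sys.argv[1:]:
    print(trigger(sys.argv[1]))
```

The "1-5" in the fifth crontab field restricts the job to Monday through
Friday, which matches the requirement in the original question. On Windows
the same script could be started from Task Scheduler instead.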



Weird memory leak problem with dataimporthandler scheduling

2012-04-03 Thread janne mattila
I have implemented dataimporthandler scheduling based on
http://wiki.apache.org/solr/DataImportHandler#Scheduling. It
periodically triggers full and delta updates. I'm unpacking the
original solr.war, adding a few scheduling-related classes such as
ApplicationListener etc. (I have modified the example a lot), and
repacking the web application.

The scheduling works fine, but when I undeploy the Solr web application,
Tomcat gives errors about ThreadLocals that were not cleared:

SEVERE: The web application [/my-solr] created a ThreadLocal with key of type [org.apache.solr.handler.dataimport.DataImporter$2] (value [org.apache.solr.handler.dataimport.DataImporter$2@b0e2096]) and a value of type [java.util.concurrent.atomic.AtomicLong] (value [2]) but failed to remove it when the web application was stopped. Threads are going to be renewed over time to try and avoid a probable memory leak.
3.4.2012 18:36:49 org.apache.catalina.loader.WebappClassLoader checkThreadLocalMapForLeaks
SEVERE: The web application [/my-solr] created a ThreadLocal with key of type [org.apache.solr.handler.dataimport.DataImporter$3] (value [org.apache.solr.handler.dataimport.DataImporter$3@4c7d5d85]) and a value of type [java.text.SimpleDateFormat] (value [java.text.SimpleDateFormat@4f76f1a0]) but failed to remove it when the web application was stopped. Threads are going to be renewed over time to try and avoid a probable memory leak.
3.4.2012 18:36:49 org.apache.catalina.loader.WebappClassLoader checkThreadLocalMapForLeaks
SEVERE: The web application [/my-solr] created a ThreadLocal with key of type [java.lang.ThreadLocal] (value [java.lang.ThreadLocal@3a86edfe]) and a value of type [org.apache.solr.handler.dataimport.ContextImpl] (value [org.apache.solr.handler.dataimport.ContextImpl@7072dcb6]) but failed to remove it when the web application was stopped. Threads are going to be renewed over time to try and avoid a probable memory leak.
3.4.2012 18:36:49 org.apache.catalina.loader.WebappClassLoader checkThreadLocalMapForLeaks
SEVERE: The web application [/my-solr] created a ThreadLocal with key of type [org.apache.solr.schema.DateField.ThreadLocalDateFormat] (value [org.apache.solr.schema.DateField$ThreadLocalDateFormat@4f86a67]) and a value of type [org.apache.solr.schema.DateField.ISO8601CanonicalDateFormat] (value [org.apache.solr.schema.DateField$ISO8601CanonicalDateFormat@6b2ed43a]) but failed to remove it when the web application was stopped. Threads are going to be renewed over time to try and avoid a probable memory leak.
3.4.2012 18:36:49 org.apache.catalina.loader.WebappClassLoader checkThreadLocalMapForLeaks
SEVERE: The web application [/my-solr] created a ThreadLocal with key of type [org.apache.solr.handler.dataimport.DataImporter$2] (value [org.apache.solr.handler.dataimport.DataImporter$2@b0e2096]) and a value of type [java.util.concurrent.atomic.AtomicLong] (value [2]) but failed to remove it when the web application was stopped. Threads are going to be renewed over time to try and avoid a probable memory leak.
3.4.2012 18:36:49 org.apache.catalina.loader.WebappClassLoader checkThreadLocalMapForLeaks
SEVERE: The web application [/my-solr] created a ThreadLocal with key of type [java.lang.ThreadLocal] (value [java.lang.ThreadLocal@3a86edfe]) and a value of type [org.apache.solr.handler.dataimport.ContextImpl] (value [org.apache.solr.handler.dataimport.ContextImpl@511192bd]) but failed to remove it when the web application was stopped. Threads are going to be renewed over time to try and avoid a probable memory leak.
3.4.2012 18:36:49 org.apache.catalina.loader.WebappClassLoader checkThreadLocalMapForLeaks
SEVERE: The web application [/my-solr] created a ThreadLocal with key of type [org.apache.solr.handler.dataimport.DataImporter$3] (value [org.apache.solr.handler.dataimport.DataImporter$3@4c7d5d85]) and a value of type [java.text.SimpleDateFormat] (value [java.text.SimpleDateFormat@4f76f1a0]) but failed to remove it when the web application was stopped. Threads are going to be renewed over time to try and avoid a probable memory leak.
3.4.2012 18:36:49 org.apache.catalina.loader.WebappClassLoader checkThreadLocalMapForLeaks
SEVERE: The web application [/my-solr] created a ThreadLocal with key of type [org.apache.solr.schema.DateField.ThreadLocalDateFormat] (value [org.apache.solr.schema.DateField$ThreadLocalDateFormat@4f86a67]) and a value of type [org.apache.solr.schema.DateField.ISO8601CanonicalDateFormat] (value [org.apache.solr.schema.DateField$ISO8601CanonicalDateFormat@6b2ed43a]) but failed to remove it when the web application was stopped. Threads are going to be renewed over time to try and avoid a probable memory leak.

I have rechecked my code to make sure it should not have any memory
leaks. I have traced the cause to this method:

private void sendHttpPost(String completeUrl, String coreName) {
HttpClient client = new HttpClient();
PostMethod method = new PostMethod

Re: Weird memory leak problem with dataimporthandler scheduling

2012-04-03 Thread janne mattila
OK. Just typing out the question fixed it.

Changing from POST to GET:

GetMethod method = new GetMethod(completeUrl);

removed the errors. The reason, I cannot explain...
