Automating Solr

2014-10-30 Thread Craig Hoffman
Simple question:
What is the best way to automate re-indexing Solr? Set up a cron job / curl script?

Thanks,
Craig
--
Craig Hoffman
w: http://www.craighoffmanphotography.com
FB: www.facebook.com/CraigHoffmanPhotography
TW: https://twitter.com/craiglhoffman

Re: Automating Solr

2014-10-30 Thread Alexandre Rafalovitch
You don't reindex Solr. You reindex data into Solr. So, this depends on
where your data is coming from and how often it changes. If the data
does not change, there is no point re-indexing it. And how do you get the
data into Solr in the first place?

Regards,
   Alex.
Personal: http://www.outerthoughts.com/ and @arafalov
Solr resources and newsletter: http://www.solr-start.com/ and @solrstart
Solr popularizers community: https://www.linkedin.com/groups?gid=6713853


On 30 October 2014 13:58, Craig Hoffman mountain@gmail.com wrote:
 Simple question:
 What is the best way to automate re-indexing Solr? Set up a cron job / curl script?

 Thanks,
 Craig
 --
 Craig Hoffman
 w: http://www.craighoffmanphotography.com
 FB: www.facebook.com/CraigHoffmanPhotography
 TW: https://twitter.com/craiglhoffman

Re: Automating Solr

2014-10-30 Thread Craig Hoffman
Right, of course. The data changes every few days. According to this
article, you can run a cron job to create a new index.
http://www.finalconcept.com.au/article/view/apache-solr-hints-and-tips

On Thu, Oct 30, 2014 at 12:04 PM, Alexandre Rafalovitch arafa...@gmail.com
wrote:

 You don't reindex Solr. You reindex data into Solr. So, this depends on
 where your data is coming from and how often it changes. If the data
 does not change, there is no point re-indexing it. And how do you get the
 data into Solr in the first place?

 Regards,
Alex.
 Personal: http://www.outerthoughts.com/ and @arafalov
 Solr resources and newsletter: http://www.solr-start.com/ and @solrstart
 Solr popularizers community: https://www.linkedin.com/groups?gid=6713853


 On 30 October 2014 13:58, Craig Hoffman mountain@gmail.com wrote:
  Simple question:
  What is the best way to automate re-indexing Solr? Set up a cron job / curl script?
 
  Thanks,
  Craig
  --
  Craig Hoffman
  w: http://www.craighoffmanphotography.com
  FB: www.facebook.com/CraigHoffmanPhotography
  TW: https://twitter.com/craiglhoffman
 
-- 
__
Craig Hoffman
iChat / AIM:mountain.do
__


Re: Automating Solr

2014-10-30 Thread Craig Hoffman
The data gets into Solr via a MySQL script.
--
Craig Hoffman
w: http://www.craighoffmanphotography.com
FB: www.facebook.com/CraigHoffmanPhotography
TW: https://twitter.com/craiglhoffman

 On Oct 30, 2014, at 12:11 PM, Craig Hoffman mountain@gmail.com wrote:
 
 Right, of course. The data changes every few days. According to this article,
 you can run a cron job to create a new index.
 http://www.finalconcept.com.au/article/view/apache-solr-hints-and-tips



Re: Automating Solr

2014-10-30 Thread Håvard Wahl Kongsgård
Then you have to run it again and again.
On 30 Oct 2014 19:18, Craig Hoffman mountain@gmail.com wrote:

 The data gets into Solr via a MySQL script.




Re: Automating Solr

2014-10-30 Thread Alexandre Rafalovitch
Do you mean the DataImportHandler? If so, you can create full and
incremental queries and trigger them, from cron, as often as you
would like. E.g. 1am nightly.
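
For example, a nightly 1am delta import in crontab might look like this (a
rough sketch, untested; adjust the host and core name to your setup):

0 1 * * * /usr/bin/wget -q -O /dev/null "http://localhost:8983/solr/core_name/dataimport?command=delta-import"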

Regards,
   Alex.
On 30 October 2014 14:17, Craig Hoffman mountain@gmail.com wrote:
 The data gets into Solr via a MySQL script.


Re: Automating Solr

2014-10-30 Thread Ramzi Alqrainy
Simply add this line to your crontab (open it with the crontab -e command):

0,30 * * * * /usr/bin/wget http://solr_host:8983/solr/core_name/dataimport?command=full-import

This will run a full import every 30 minutes. Replace solr_host and core_name
to match your configuration.
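
If you prefer curl (as the original question suggested), an equivalent
crontab line might be (a sketch along the same lines, untested):

0,30 * * * * /usr/bin/curl -s "http://solr_host:8983/solr/core_name/dataimport?command=full-import" > /dev/null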

*Using delta-import command*

The delta-import operation can be started by hitting the URL
http://localhost:8983/solr/dataimport?command=delta-import. This operation
starts in a new thread, and the status attribute in the response will show
busy. Depending on the size of your data set, this operation may take some
time. At any time, you can hit http://localhost:8983/solr/dataimport to see
the status flag.

When the delta-import command is executed, it reads the start time stored in
conf/dataimport.properties. It uses that timestamp to run delta queries and,
after completion, updates the timestamp in conf/dataimport.properties.
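
To check that status from a script, something like this should work (a
sketch, assuming curl is available):

curl -s "http://localhost:8983/solr/dataimport"

and look for the status field in the response.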

Note: there is an alternative approach for updating documents in Solr which
is in many cases more efficient and also requires less configuration; it is
explained on the DataImportHandlerDeltaQueryViaFullImport wiki page.

*Delta-Import Example*

We will use the same example database used in the full import example. Note
that the database schema has been updated and each table contains an
additional column last_modified of timestamp type. You may want to download
the database again since it has been updated recently. We use this timestamp
field to determine what rows in each table have changed since the last
indexed time.

Take a look at the following data-config.xml:


<dataConfig>
    <dataSource driver="org.hsqldb.jdbcDriver"
                url="jdbc:hsqldb:/temp/example/ex" user="sa" />
    <document name="products">
        <entity name="item" pk="ID"
                query="select * from item"
                deltaImportQuery="select * from item where ID='${dih.delta.id}'"
                deltaQuery="select id from item where last_modified &gt; '${dih.last_index_time}'">
            <entity name="feature" pk="ITEM_ID"
                    query="select description as features from feature where item_id='${item.ID}'">
            </entity>
            <entity name="item_category" pk="ITEM_ID, CATEGORY_ID"
                    query="select CATEGORY_ID from item_category where ITEM_ID='${item.ID}'">
                <entity name="category" pk="ID"
                        query="select description as cat from category where id='${item_category.CATEGORY_ID}'">
                </entity>
            </entity>
        </entity>
    </document>
</dataConfig>
Pay attention to the deltaQuery attribute, which has an SQL statement capable
of detecting changes in the item table. Note the variable
${dataimporter.last_index_time}. The DataImportHandler exposes a variable
called last_index_time, which is a timestamp value denoting the last time
full-import or delta-import was run. You can use this variable anywhere in
the SQL you write in data-config.xml and it will be replaced by the value
during processing.



Re: Automating Solr

2014-10-30 Thread Craig Hoffman
Thanks! One more question. wget seems to be choking on my URL, in particular
the # and the & characters. What's the best method of escaping them?

http://<My Host>:8983/solr/#/articles/dataimport//dataimport?command=full-import&clean=true&optimize=true
--
Craig Hoffman
w: http://www.craighoffmanphotography.com
FB: www.facebook.com/CraigHoffmanPhotography
TW: https://twitter.com/craiglhoffman

 On Oct 30, 2014, at 12:30 PM, Ramzi Alqrainy ramzi.alqra...@gmail.com wrote:
 
 Simply add this line to your crontab (open it with the crontab -e command):
 
 0,30 * * * * /usr/bin/wget http://solr_host:8983/solr/core_name/dataimport?command=full-import
 
 This will run a full import every 30 minutes. Replace solr_host and core_name
 to match your configuration.



Re: Automating Solr

2014-10-30 Thread Michael Della Bitta

You probably just need to put double quotes around the URL.
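
For example, quoting the whole URL so the shell does not interpret the &
characters (a sketch using your URL as-is):

wget "http://<My Host>:8983/solr/#/articles/dataimport//dataimport?command=full-import&clean=true&optimize=true"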


On 10/30/14 15:27, Craig Hoffman wrote:

Thanks! One more question. wget seems to be choking on my URL, in particular
the # and the & characters. What's the best method of escaping them?

http://<My Host>:8983/solr/#/articles/dataimport//dataimport?command=full-import&clean=true&optimize=true




Re: Automating Solr

2014-10-30 Thread Shawn Heisey
On 10/30/2014 1:27 PM, Craig Hoffman wrote:
 Thanks! One more question. wget seems to be choking on my URL, in particular
 the # and the & characters. What's the best method of escaping them?

 http://<My Host>:8983/solr/#/articles/dataimport//dataimport?command=full-import&clean=true&optimize=true

Putting the URL in quotes would work ... but if you are calling a Solr
URL with /#/ in it, you're doing it wrong.

URLs with /#/ in them are specifically for the admin UI.  They only work
properly in a browser, where JavaScript and AJAX are available.  They
will NOT work like you expect with wget, even if you get the URL escaped
properly.

See the cron example that Ramzi Alqrainy gave you for the proper way of
requesting a full-import.
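
The non-admin-UI form of that request (a sketch, assuming your core is named
articles) would look like:

http://<My Host>:8983/solr/articles/dataimport?command=full-import&clean=true&optimize=true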

Thanks,
Shawn



Re: Automating Solr

2014-10-30 Thread Craig Hoffman
Thanks everyone. I got it working.
--
Craig Hoffman
w: http://www.craighoffmanphotography.com
FB: www.facebook.com/CraigHoffmanPhotography
TW: https://twitter.com/craiglhoffman