Automating Solr
Simple question: what is the best way to automate re-indexing Solr? Set up a cron job / curl script?

Thanks,
Craig

--
Craig Hoffman
w: http://www.craighoffmanphotography.com
FB: www.facebook.com/CraigHoffmanPhotography
TW: https://twitter.com/craiglhoffman
Re: Automating Solr
You don't reindex Solr. You reindex data into Solr. So this depends on where your data is coming from and how often it changes. If the data does not change, there is no point re-indexing it. And how do you get the data into Solr in the first place?

Regards,
Alex.

Personal: http://www.outerthoughts.com/ and @arafalov
Solr resources and newsletter: http://www.solr-start.com/ and @solrstart
Solr popularizers community: https://www.linkedin.com/groups?gid=6713853
Re: Automating Solr
Right, of course. The data changes every few days. According to this article, you can run a cron job to create a new index: http://www.finalconcept.com.au/article/view/apache-solr-hints-and-tips
Re: Automating Solr
The data gets into Solr via a MySQL script.
Re: Automating Solr
Then you have to run it again and again.
Re: Automating Solr
Do you mean the DataImportHandler? If so, you can create full and incremental import queries and trigger them from cron as often as you would like, e.g. nightly at 1 a.m.

Regards,
Alex.
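As a sketch of that schedule, the crontab could look like the following (solr_host and core_name are placeholders, and the hourly delta-import cadence is an assumption; the path assumes a core-level /dataimport handler):

```
# Hypothetical crontab entries (edit with `crontab -e`).
# Replace solr_host and core_name with your own values.

# Full import nightly at 1 a.m.
0 1 * * * /usr/bin/wget -q -O /dev/null "http://solr_host:8983/solr/core_name/dataimport?command=full-import"

# Incremental (delta) import every hour on the half hour
30 * * * * /usr/bin/wget -q -O /dev/null "http://solr_host:8983/solr/core_name/dataimport?command=delta-import"
```

Quoting the URL keeps cron and the shell from interpreting any special characters in the query string.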
Re: Automating Solr
Simply add this line to your crontab (edit it with the crontab -e command):

0,30 * * * * /usr/bin/wget http://solr_host:8983/solr/core_name/dataimport?command=full-import

This will run a full import every 30 minutes. Replace solr_host and core_name with your configuration.

*Using the delta-import command*

A delta import can be started by hitting the URL http://localhost:8983/solr/dataimport?command=delta-import. The operation starts in a new thread, and the status attribute in the response will show "busy". Depending on the size of your data set, the operation may take some time. At any time, you can hit http://localhost:8983/solr/dataimport to see the status flag. When the delta-import command is executed, it reads the start time stored in conf/dataimport.properties, uses that timestamp to run the delta queries, and after completion updates the timestamp in conf/dataimport.properties. Note: there is an alternative approach for updating documents in Solr, which is in many cases more efficient and also requires less configuration; it is explained on the DataImportHandlerDeltaQueryViaFullImport wiki page.

*Delta-Import Example*

We will use the same example database used in the full-import example. Note that the database schema has been updated: each table now contains an additional last_modified column of timestamp type. You may want to download the database again, since it has been updated recently. We use this timestamp field to determine which rows in each table have changed since the last indexed time.
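As a small sketch, the request URL above can be composed in a shell function so the same cron script can trigger either command (host, port, and core name here are placeholders, not a definitive setup):

```shell
#!/bin/sh
# Compose a DataImportHandler request URL for a given host, core, and command.
# Quoting the result when passing it to wget keeps the shell from
# interpreting any '&' in the query string.
build_dih_url() {
  host="$1"; core="$2"; cmd="$3"
  printf 'http://%s:8983/solr/%s/dataimport?command=%s' "$host" "$core" "$cmd"
}

# Example: print the URL a cron job would fetch for a delta import,
# e.g. with: wget -q -O /dev/null "$(build_dih_url solr_host core_name delta-import)"
build_dih_url solr_host core_name delta-import
echo
```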
Take a look at the following data-config.xml:

<dataConfig>
  <dataSource driver="org.hsqldb.jdbcDriver" url="jdbc:hsqldb:/temp/example/ex" user="sa"/>
  <document name="products">
    <entity name="item" pk="ID"
            query="select * from item"
            deltaImportQuery="select * from item where ID='${dih.delta.id}'"
            deltaQuery="select id from item where last_modified &gt; '${dih.last_index_time}'">
      <entity name="feature" pk="ITEM_ID"
              query="select description as features from feature where item_id='${item.ID}'"></entity>
      <entity name="item_category" pk="ITEM_ID, CATEGORY_ID"
              query="select CATEGORY_ID from item_category where ITEM_ID='${item.ID}'">
        <entity name="category" pk="ID"
                query="select description as cat from category where id = '${item_category.CATEGORY_ID}'"></entity>
      </entity>
    </entity>
  </document>
</dataConfig>

Pay attention to the deltaQuery attribute, which has an SQL statement capable of detecting changes in the item table. Note the variable ${dataimporter.last_index_time}: the DataImportHandler exposes a variable called last_index_time, a timestamp value denoting the last time full-import or delta-import was run. You can use this variable anywhere in the SQL you write in data-config.xml, and it will be replaced by the value during processing.

--
View this message in context: http://lucene.472066.n3.nabble.com/Automating-Solr-tp4166696p4166707.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: Automating Solr
Thanks! One more question. wget seems to be choking on my URL, in particular the # and & characters. What's the best method of escaping them?

http://My Host:8983/solr/#/articles/dataimport//dataimport?command=full-import&clean=true&optimize=true
Re: Automating Solr
You probably just need to put double quotes around the URL.
Re: Automating Solr
Putting the URL in quotes would work ... but if you are calling a Solr URL with /#/ in it, you're doing it wrong. URLs with /#/ in them are specifically for the admin UI. They only work properly in a browser, where JavaScript and AJAX are available. They will NOT work as you expect with wget, even if you get the URL escaped properly. See the cron example that Ramzi Alqrainy gave you for the proper way of requesting a full-import.

Thanks,
Shawn
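To illustrate, here is a sketch of the quoted, core-level form of the request (the core name articles is taken from the URL in this thread; the host is a placeholder). Single quotes keep the shell from treating & as a background operator and # as the start of a comment:

```shell
#!/bin/sh
# Core-level DIH endpoint, without the /#/ admin-UI fragment.
# Quoting is required because '&' and '#' are special to the shell.
URL='http://solr_host:8983/solr/articles/dataimport?command=full-import&clean=true&optimize=true'

# A cron job would fetch it with: wget -q -O /dev/null "$URL"
# Here we just print the URL to show it survives intact.
echo "$URL"
```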
Re: Automating Solr
Thanks everyone. I got it working.