Re: Batch Update Fields
OK, so the way I understand this is that if there is a synonym on a specific field at index time, that value will be stored rather than the one in the CSV that I am indexing? I will give it a whirl and report back...

Thanks!
Adam

On Sat, Dec 4, 2010 at 2:27 PM, Erick Erickson erickerick...@gmail.com wrote:
> When you define your fieldType, at index time. My idea was that you substitute these on the way in to your index. You may need a specific field type just for your country conversion, perhaps in a copyField if you need both the code and the full name.
>
> Best
> Erick
Re: Batch Update Fields
Synonyms, eh? I have a synonym list like the following, so how do I identify the synonyms on a specific field? The only place the field is used is as a facet.

original field = country name
AF = AFGHANISTAN
AX = ÅLAND ISLANDS
AL = ALBANIA
DZ = ALGERIA
AS = AMERICAN SAMOA
AD = ANDORRA
AO = ANGOLA
AI = ANGUILLA
AQ = ANTARCTICA
AG = ANTIGUA AND BARBUDA
AR = ARGENTINA
AM = ARMENIA
AW = ARUBA
AU = AUSTRALIA
AT = AUSTRIA
etc...

Any advice on that would be great and very much appreciated!
Adam

On Fri, Dec 3, 2010 at 3:55 PM, Erick Erickson erickerick...@gmail.com wrote:
> That will certainly work. Another option, assuming the country codes are in their own field, would be to put the transformations into a synonym file that is only used on that field. That way you'd get this without having to do the pre-processing step on the raw data...
>
> That said, if your pre-processing is working for you, it may not be worth your while to worry about doing it differently.
>
> Best
> Erick
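A list in the "AF = AFGHANISTAN" form above is close to, but not quite, what a Solr synonyms file expects. The following is only a rough sketch of one way to convert it into the explicit-mapping syntax ("AF => AFGHANISTAN"). The function name `codes_to_synonyms` and the input file are made up for illustration, and the sketch assumes every useful line has a two-letter uppercase code on the left of a single "=".

```shell
# Hypothetical helper: convert "CODE = NAME" lines into Solr's explicit
# synonym mapping syntax, "CODE => NAME".
# Usage: codes_to_synonyms country_list.txt > country_synonyms.txt
codes_to_synonyms() {
    awk -F'=' '
        NF == 2 {
            code = $1
            name = $2
            # Trim leading and trailing blanks from both sides of the "=".
            gsub(/^[ \t]+|[ \t]+$/, "", code)
            gsub(/^[ \t]+|[ \t]+$/, "", name)
            # Keep only lines whose left side is a two-letter uppercase code;
            # this also skips a header such as "original field = country name".
            if (code ~ /^[A-Z][A-Z]$/ && name != "")
                print code " => " name
        }
    ' "$1"
}
```

Note that a comma separates alternatives in that file format, so country names that themselves contain a comma would need extra care; this sketch does not handle them.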
Re: Batch Update Fields
When you define your fieldType, at index time. My idea was that you substitute these on the way in to your index. You may need a specific field type just for your country conversion, perhaps in a copyField if you need both the code and the full name.

Best
Erick

On Sat, Dec 4, 2010 at 12:16 PM, Adam Estrada estrada.adam.gro...@gmail.com wrote:
> Synonyms, eh? I have a synonym list like the following, so how do I identify the synonyms on a specific field? The only place the field is used is as a facet.
>
> original field = country name
> AF = AFGHANISTAN
> AX = ÅLAND ISLANDS
> AL = ALBANIA
> DZ = ALGERIA
> AS = AMERICAN SAMOA
> AD = ANDORRA
> AO = ANGOLA
> AI = ANGUILLA
> AQ = ANTARCTICA
> AG = ANTIGUA AND BARBUDA
> AR = ARGENTINA
> AM = ARMENIA
> AW = ARUBA
> AU = AUSTRALIA
> AT = AUSTRIA
> etc...
>
> Any advice on that would be great and very much appreciated!
> Adam
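To make the fieldType and copyField suggestion a bit more concrete, here is a rough sketch of what the schema.xml pieces might look like, wrapped in a shell function that simply prints the fragment. The field names (`country_code`, `country`), the type name (`country_name`) and the file name (`country_synonyms.txt`) are invented for illustration. The `tokenizerFactory` attribute is meant to keep multi-word names such as "AMERICAN SAMOA" as a single term; whether it is available depends on your Solr version, so treat this as an assumption to check rather than a recipe.

```shell
# Hypothetical helper: print a schema.xml fragment for an index-time
# code-to-name substitution. All names in the fragment are illustrative.
print_country_schema() {
    cat <<'EOF'
<!-- Whole field value is one token; synonyms are applied at index time only. -->
<fieldType name="country_name" class="solr.TextField" omitNorms="true">
  <analyzer type="index">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory"
            synonyms="country_synonyms.txt"
            ignoreCase="false" expand="false"
            tokenizerFactory="solr.KeywordTokenizerFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
  </analyzer>
</fieldType>

<!-- Raw code as delivered by the data, plus a copy that is indexed as the full name. -->
<field name="country_code" type="string" indexed="true" stored="true"/>
<field name="country" type="country_name" indexed="true" stored="false"/>
<copyField source="country_code" dest="country"/>
EOF
}
```

Running `print_country_schema > fragment.xml` gives something to paste into schema.xml. Analysis only changes the indexed terms, not the stored value, which is why `country` is not stored here; faceting works off the indexed terms, so a facet on `country` should return the full names while `country_code` still holds the original code.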
Re: Batch Update Fields
You must reindex the complete document, even if you just want to update a single field.

On Friday 03 December 2010 04:52:04 Adam Estrada wrote:
> OK part 2 of my previous question... Is there a way to batch update field values based on certain criteria? For example, if thousands of documents have a field value of 'US', can I update all of them to 'United States' programmatically?
>
> Adam

--
Markus Jelsma - CTO - Openindex
http://www.linkedin.com/in/markus17
050-8536620 / 06-50258350
Re: Batch Update Fields
No, there's no equivalent to SQL update for all values in a column. You'll have to reindex all the documents.

On Thu, Dec 2, 2010 at 10:52 PM, Adam Estrada estrada.adam.gro...@gmail.com wrote:
> OK part 2 of my previous question... Is there a way to batch update field values based on certain criteria? For example, if thousands of documents have a field value of 'US', can I update all of them to 'United States' programmatically?
>
> Adam
Re: Batch Update Fields
I wonder... I know that sed would work to find and replace the terms in all of the CSV files that I am indexing, but would it work to find and replace key terms in the index?

    find C:\\tmp\\index\\data -type f -exec sed -i 's/AF/AFGHANISTAN/g' {} \;

That command would iterate through all the files in the data directory and replace the country code with the full country name. I may just back up the directory and try it. I have it running on CSV files right now and it's working wonderfully.

For those of you interested, I am indexing the entire Geonames dataset, http://download.geonames.org/export/dump/ (allCountries.zip), which gives me a pretty comprehensive world gazetteer. My next step is gonna be to display the results as KML to view over a Google globe.

Thoughts?
Adam

On Fri, Dec 3, 2010 at 7:57 AM, Erick Erickson erickerick...@gmail.com wrote:
> No, there's no equivalent to SQL update for all values in a column. You'll have to reindex all the documents.
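One caveat with a global `s/AF/AFGHANISTAN/g`, even on the source files: it rewrites every "AF" on every line, including ones inside place names. A rough sketch of a column-restricted alternative follows. It assumes tab-separated input (which is how the geonames dump ships, as far as I recall) and a separate tab-separated map file of "CODE<TAB>NAME" lines; the function name and its arguments are made up for illustration, and the column number is something you would need to check against your own files.

```shell
# Hypothetical helper: replace country codes with names in ONE column only.
# Usage: replace_country_codes MAP_FILE DATA_FILE COLUMN_NUMBER > new_file
#   MAP_FILE  : tab-separated lines, "CODE<TAB>NAME" (must not be empty)
#   DATA_FILE : tab-separated records
replace_country_codes() {
    awk -F'\t' -v OFS='\t' -v col="$3" '
        # While reading the first file, load the code -> name map.
        NR == FNR { name[$1] = $2; next }
        # In the data file, rewrite the chosen column when its value is a known code.
        ($col in name) { $col = name[$col] }
        { print }
    ' "$1" "$2"
}
```

With a map containing only AF, a record such as "1<TAB>AFTON<TAB>US" passes through unchanged, whereas the global sed would have turned "AFTON" into "AFGHANISTANTON". The files still have to be reindexed afterwards.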
Re: Batch Update Fields
On Friday 03 December 2010 18:20:44 Adam Estrada wrote:
> I wonder... I know that sed would work to find and replace the terms in all of the CSV files that I am indexing, but would it work to find and replace key terms in the index?

It'll most likely corrupt your index. Offsets, positions, etc. won't have the proper meaning anymore.

> find C:\\tmp\\index\\data -type f -exec sed -i 's/AF/AFGHANISTAN/g' {} \;
>
> That command would iterate through all the files in the data directory and replace the country code with the full country name. I may just back up the directory and try it. I have it running on CSV files right now and it's working wonderfully.
>
> For those of you interested, I am indexing the entire Geonames dataset, http://download.geonames.org/export/dump/ (allCountries.zip), which gives me a pretty comprehensive world gazetteer. My next step is gonna be to display the results as KML to view over a Google globe.
>
> Thoughts?
> Adam

--
Markus Jelsma - CTO - Openindex
http://www.linkedin.com/in/markus17
050-8536620 / 06-50258350
Re: Batch Update Fields
Have you considered defining synonyms for your code-to-country conversion at index time (or query time, for that matter)?

We may have an XY problem here. Could you state the high-level problem you're trying to solve? Maybe there's a better solution...

Best
Erick

On Fri, Dec 3, 2010 at 12:20 PM, Adam Estrada estrada.adam.gro...@gmail.com wrote:
> I wonder... I know that sed would work to find and replace the terms in all of the CSV files that I am indexing, but would it work to find and replace key terms in the index?
>
> find C:\\tmp\\index\\data -type f -exec sed -i 's/AF/AFGHANISTAN/g' {} \;
>
> That command would iterate through all the files in the data directory and replace the country code with the full country name. I may just back up the directory and try it. I have it running on CSV files right now and it's working wonderfully.
>
> For those of you interested, I am indexing the entire Geonames dataset, http://download.geonames.org/export/dump/ (allCountries.zip), which gives me a pretty comprehensive world gazetteer. My next step is gonna be to display the results as KML to view over a Google globe.
>
> Thoughts?
> Adam
Re: Batch Update Fields
First off... I know enough about Solr to be VERY dangerous, so please bear with me ;-)

I am indexing the geonames database, which only provides country codes. I can facet on the codes, but to the end user who may not know all 249 codes, that isn't really all that helpful. Therefore, I want to map the full country names to the country codes provided in the geonames db.

http://download.geonames.org/export/dump/

I used a simple split function to chop the 850 meg txt file into manageable CSVs that I can import into Solr. Now that all 7 million+ documents are in there, I want to change the country codes to the actual country names. I would have liked to have done it in the index, but finding and replacing the strings in the CSV seems to be working fine. After that I can just reindex the entire thing.

Adam

On Fri, Dec 3, 2010 at 12:42 PM, Erick Erickson erickerick...@gmail.com wrote:
> Have you considered defining synonyms for your code-to-country conversion at index time (or query time, for that matter)?
>
> We may have an XY problem here. Could you state the high-level problem you're trying to solve? Maybe there's a better solution...
>
> Best
> Erick
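The message does not say which split function was used. As a rough sketch only, the standard `split` utility can chop a large text file into pieces by line count, which keeps every record whole (a size-based split could cut a line in half). The wrapper name and the chunk size below are arbitrary, illustrative choices.

```shell
# Hypothetical helper: split a large text file into chunks of N lines each.
# Usage: split_by_lines allCountries.txt 500000 geonames_part_
# Produces geonames_part_aa, geonames_part_ab, ... next to the given prefix.
split_by_lines() {
    # $1 = input file, $2 = lines per chunk, $3 = output file prefix
    split -l "$2" "$1" "$3"
}
```

Each chunk can then be transformed and imported on its own, and a failed import only means redoing that one piece.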
Re: Batch Update Fields
That will certainly work. Another option, assuming the country codes are in their own field, would be to put the transformations into a synonym file that is only used on that field. That way you'd get this without having to do the pre-processing step on the raw data...

That said, if your pre-processing is working for you, it may not be worth your while to worry about doing it differently.

Best
Erick

On Fri, Dec 3, 2010 at 12:51 PM, Adam Estrada estrada.adam.gro...@gmail.com wrote:
> First off... I know enough about Solr to be VERY dangerous, so please bear with me ;-)
>
> I am indexing the geonames database, which only provides country codes. I can facet on the codes, but to the end user who may not know all 249 codes, that isn't really all that helpful. Therefore, I want to map the full country names to the country codes provided in the geonames db.
>
> http://download.geonames.org/export/dump/
>
> I used a simple split function to chop the 850 meg txt file into manageable CSVs that I can import into Solr. Now that all 7 million+ documents are in there, I want to change the country codes to the actual country names. I would have liked to have done it in the index, but finding and replacing the strings in the CSV seems to be working fine. After that I can just reindex the entire thing.
>
> Adam