Re: Batch Update Fields

2010-12-05 Thread Adam Estrada
OK, so the way I understand this is that if there is a synonym on a specific
field at index time, that value will be stored rather than the one in the
CSV that I am indexing? I will give it a whirl and report back...

Thanks!
Adam



Re: Batch Update Fields

2010-12-04 Thread Adam Estrada
Synonyms, eh? I have a synonym list like the following, so how do I apply
the synonyms to a specific field? The only place the field is used is as a
facet.

original field = country name

AF = AFGHANISTAN
AX = ÅLAND ISLANDS
AL = ALBANIA
DZ = ALGERIA
AS = AMERICAN SAMOA
AD = ANDORRA
AO = ANGOLA
AI = ANGUILLA
AQ = ANTARCTICA
AG = ANTIGUA AND BARBUDA
AR = ARGENTINA
AM = ARMENIA
AW = ARUBA
AU = AUSTRALIA
AT = AUSTRIA
etc...

Any advice on that would be great and very much appreciated!

Adam



Re: Batch Update Fields

2010-12-04 Thread Erick Erickson
When you define your fieldType, in its index-time analyzer. My idea
was that you substitute these on the way into your
index. You may need a specific field type just for your
country conversion, perhaps used in a copyField if
you need both the code and the full name.

Best
Erick
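
The CODE = NAME list from this thread is not yet in the form a Solr synonyms file expects, where `=>` marks an explicit mapping and commas separate alternatives. A minimal sketch of the conversion follows. The function name `to_solr_synonyms` and the sample rows are illustrative, and escaping commas with a backslash is an assumption worth checking against the Solr version in use. The field type that references the generated file, and how multi-word names such as ANTIGUA AND BARBUDA are tokenized, are outside this sketch.

```python
def to_solr_synonyms(lines):
    """Convert 'CODE = NAME' lines into 'CODE => NAME' explicit mappings."""
    rules = []
    for raw in lines:
        line = raw.strip()
        if not line or "=" not in line:
            continue  # skip blanks and non-mapping lines such as "etc..."
        code, name = (part.strip() for part in line.split("=", 1))
        if not code or not name:
            continue
        # Commas separate alternatives in the synonyms format, so escape them
        # (assumption: the parser honors backslash escapes).
        name = name.replace(",", "\\,")
        rules.append(f"{code} => {name}")
    return rules


# Sample rows copied from the list posted in this thread
sample = [
    "AF = AFGHANISTAN",
    "AX = ÅLAND ISLANDS",
    "AG = ANTIGUA AND BARBUDA",
    "etc...",
]
synonym_rules = to_solr_synonyms(sample)
print("\n".join(synonym_rules))
```

Writing `synonym_rules` to a text file, one rule per line, gives a file that a synonym filter on the country field could load at index time.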



Re: Batch Update Fields

2010-12-03 Thread Markus Jelsma
You must reindex the complete document, even if you just want to update a 
single field.

On Friday 03 December 2010 04:52:04 Adam Estrada wrote:
 OK part 2 of my previous question...
 
 Is there a way to batch update field values based on a certain criteria?
 For example, if thousands of documents have a field value of 'US' can I
 update all of them to 'United States' programmatically?
 
 Adam

-- 
Markus Jelsma - CTO - Openindex
http://www.linkedin.com/in/markus17
050-8536620 / 06-50258350


Re: Batch Update Fields

2010-12-03 Thread Erick Erickson
No, there's no equivalent to SQL update for all values in a column. You'll
have to reindex all the documents.

On Thu, Dec 2, 2010 at 10:52 PM, Adam Estrada estrada.adam.gro...@gmail.com
 wrote:

 OK part 2 of my previous question...

 Is there a way to batch update field values based on a certain criteria?
 For example, if thousands of documents have a field value of 'US' can I
 update all of them to 'United States' programmatically?

 Adam


Re: Batch Update Fields

2010-12-03 Thread Adam Estrada
I wonder...I know that sed would work to find and replace the terms in all
of the CSV files that I am indexing, but would it work to find and replace
key terms in the index itself?

find C:\\tmp\\index\\data -type f -exec sed -i 's/AF/AFGHANISTAN/g' {} \;

That command would iterate through all the files in the data directory and
replace the country code with the full country name. I may just back up the
directory and try it. I have it running on CSV files right now and it's
working wonderfully. For those of you interested, I am indexing the entire
Geonames dataset http://download.geonames.org/export/dump/ (allCountries.zip)
which gives me a pretty comprehensive world gazetteer. My next step is gonna
be to display the results as KML to view over a Google globe.

Thoughts?

Adam
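
One caveat with the sed pattern even on the CSV files: `s/AF/AFGHANISTAN/g` rewrites every occurrence of the letters AF anywhere in a line, not just the country-code field. A minimal sketch of a column-restricted replacement is below. The column index, the delimiter, the three-entry lookup table, and the sample rows are all made up for illustration and would need to match the real files.

```python
import csv
import io

# Hypothetical subset of the code -> name table from this thread
COUNTRY_NAMES = {"AF": "AFGHANISTAN", "AL": "ALBANIA", "DZ": "ALGERIA"}


def expand_country_codes(src, dst, column, names=COUNTRY_NAMES, delimiter=","):
    """Copy CSV rows from src to dst, rewriting only the given column index."""
    reader = csv.reader(src, delimiter=delimiter)
    writer = csv.writer(dst, delimiter=delimiter, lineterminator="\n")
    for row in reader:
        if column < len(row):
            # Exact match on the whole cell; unknown codes pass through unchanged
            row[column] = names.get(row[column], row[column])
        writer.writerow(row)


# In-memory demo with invented rows; real use would pass open file handles
source = io.StringIO("1,KANDAHAR AF BASE,AF\n2,Tirana,AL\n3,Paris,FR\n")
result = io.StringIO()
expand_country_codes(source, result, column=2)
print(result.getvalue())
```

Only the third column changes here; the AF inside the second column of the first row is left alone, which the global sed substitution would not guarantee.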




Re: Batch Update Fields

2010-12-03 Thread Markus Jelsma


On Friday 03 December 2010 18:20:44 Adam Estrada wrote:
 I wonder...I know that sed would work to find and replace the terms in all
 of the csv files that I am indexing but would it work to find and replace
 key terms in the index?

It'll most likely corrupt your index. Offsets, positions, etc. will no longer
be valid.




Re: Batch Update Fields

2010-12-03 Thread Erick Erickson
Have you considered defining synonyms for your code-to-country
conversion at index time (or query time, for that matter)?

We may have an XY problem here. Could you state the high-level
problem you're trying to solve? Maybe there's a better solution...

Best
Erick




Re: Batch Update Fields

2010-12-03 Thread Adam Estrada
First off...I know enough about Solr to be VERY dangerous, so please bear
with me ;-) I am indexing the geonames database, which only provides country
codes. I can facet on the codes, but to the end user who may not know all 249
codes, that isn't really all that helpful. Therefore, I want to map the
country codes provided in the geonames db to the full country names.
http://download.geonames.org/export/dump/

I used a simple split function to
chop the 850 meg txt file into manageable CSV files that I can import into
Solr. Now that all 7 million+ documents are in there, I want to change the
country codes to the actual country names. I would have liked to have done it
in the index, but finding and replacing the strings in the CSV files seems to be
working fine. After that I can just reindex the entire thing.

Adam
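
The thread does not show the split function itself, so the following is only a sketch of one way to chop a large line-oriented file into smaller pieces. The names `split_lines` and `write_chunks`, the chunk size, and the output naming scheme are all assumptions. The pieces keep whatever delimiter the source file uses; no format conversion happens here.

```python
import itertools


def split_lines(lines, chunk_size):
    """Yield successive lists of at most chunk_size lines."""
    iterator = iter(lines)
    while True:
        chunk = list(itertools.islice(iterator, chunk_size))
        if not chunk:
            return
        yield chunk


def write_chunks(src_path, prefix, chunk_size=500_000):
    """Write each chunk of src_path to prefix_0000.csv, prefix_0001.csv, ..."""
    paths = []
    with open(src_path, encoding="utf-8") as src:
        for index, chunk in enumerate(split_lines(src, chunk_size)):
            path = f"{prefix}_{index:04d}.csv"
            with open(path, "w", encoding="utf-8") as out:
                out.writelines(chunk)
            paths.append(path)
    return paths
```

Reading through the file handle lazily means only one chunk is held in memory at a time, which matters for a file of this size.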



Re: Batch Update Fields

2010-12-03 Thread Erick Erickson
That will certainly work. Another option, assuming the country codes are
in their own field, would be to put the transformations into a synonym file
that is only used on that field. That way you'd get this without having
to do the pre-processing step on the raw data...

That said, if your pre-processing is working for you, it may not be worth
your while to worry about doing it differently.

Best
Erick
