Re: bulk commits
What is it that you are trying to commit?

a

On Thu, Dec 16, 2010 at 1:03 PM, Dennis Gearon gear...@sbcglobal.net wrote:

What have people found as the best way to do bulk commits, either from the web or from a file on the system?

Dennis Gearon

Signature Warning
It is always a good idea to learn from your own mistakes. It is usually a better idea to learn from others’ mistakes, so you do not have to make them yourself. from 'http://blogs.techrepublic.com.com/security/?p=4501tag=nl.e036'

EARTH has a Right To Life, otherwise we all die.
Re: bulk commits
,admin1code,admin2code,admin3code,admin4code,population,elevation,gtopo30,timezone,modificationdate&stream.file=C:\tmp\xam.csv&overwrite=true&stream.contentType=text/plain;charset=utf-8

curl "http://localhost:8983/solr/update/csv?commit=true&separator=%2C&fieldnames=id,name,asciiname,latitude,longitude,featureclass,featurecode,countrycode,admin1code,admin2code,admin3code,admin4code,population,elevation,gtopo30,timezone,modificationdate&stream.file=C:\tmp\xan.csv&overwrite=true&stream.contentType=text/plain;charset=utf-8"

curl "http://localhost:8983/solr/update/csv?commit=true&separator=%2C&fieldnames=id,name,asciiname,latitude,longitude,featureclass,featurecode,countrycode,admin1code,admin2code,admin3code,admin4code,population,elevation,gtopo30,timezone,modificationdate&stream.file=C:\tmp\xao.csv&overwrite=true&stream.contentType=text/plain;charset=utf-8"

curl "http://localhost:8983/solr/update/csv?commit=true&separator=%2C&fieldnames=id,name,asciiname,latitude,longitude,featureclass,featurecode,countrycode,admin1code,admin2code,admin3code,admin4code,population,elevation,gtopo30,timezone,modificationdate&stream.file=C:\tmp\xap.csv&overwrite=true&stream.contentType=text/plain;charset=utf-8"

curl http://localhost:8983/solr/update -H "Content-Type: text/xml" --data-binary '<optimize/>'

Adam

On Thu, Dec 16, 2010 at 1:44 PM, Dennis Gearon gear...@sbcglobal.net wrote:

Might be CSV or tab-delimited text.
Re: bulk commits
That easy, huh? Heck, this gets better and better.

BTW, how about escaping?

Dennis Gearon

----- Original Message -----
From: Adam Estrada estrada.adam.gro...@gmail.com
To: Dennis Gearon gear...@sbcglobal.net; solr-user@lucene.apache.org
Sent: Thu, December 16, 2010 10:58:47 AM
Subject: Re: bulk commits

This is how I import a lot of data from a CSV file. There are close to 100k records in there. Note that you can either pre-define the column names using the fieldnames param like I did here *or* include header=true, which will automatically pick up the column headers if your file has them.

curl "http://localhost:8983/solr/update/csv?commit=true&separator=%2C&fieldnames=id,name,asciiname,lat,lng,countrycode,population,elevation,gtopo30,timezone,modificationdate,cat&stream.file=C:\tmp\cities1000.csv&overwrite=true&stream.contentType=text/plain;charset=utf-8"

This seems to load everything into some kind of temporary location before it's actually committed. If something goes wrong, there is a rollback feature that will undo anything that happened before the commit.

As far as batching a bunch of files, I copied and pasted the following into Cygwin and it worked just fine.
curl "http://localhost:8983/solr/update/csv?commit=true&separator=%2C&fieldnames=id,name,asciiname,lat,lng,countrycode,population,elevation,gtopo30,timezone,modificationdate,cat&stream.file=C:\tmp\cities1000.csv&overwrite=true&stream.contentType=text/plain;charset=utf-8"

curl "http://localhost:8983/solr/update/csv?commit=true&separator=%2C&fieldnames=id,name,asciiname,latitude,longitude,featureclass,featurecode,countrycode,admin1code,admin2code,admin3code,admin4code,population,elevation,gtopo30,timezone,modificationdate&stream.file=C:\tmp\xab.csv&overwrite=true&stream.contentType=text/plain;charset=utf-8"

curl "http://localhost:8983/solr/update/csv?commit=true&separator=%2C&fieldnames=id,name,asciiname,latitude,longitude,featureclass,featurecode,countrycode,admin1code,admin2code,admin3code,admin4code,population,elevation,gtopo30,timezone,modificationdate&stream.file=C:\tmp\xac.csv&overwrite=true&stream.contentType=text/plain;charset=utf-8"

curl "http://localhost:8983/solr/update/csv?commit=true&separator=%2C&fieldnames=id,name,asciiname,latitude,longitude,featureclass,featurecode,countrycode,admin1code,admin2code,admin3code,admin4code,population,elevation,gtopo30,timezone,modificationdate&stream.file=C:\tmp\xad.csv&overwrite=true&stream.contentType=text/plain;charset=utf-8"

curl "http://localhost:8983/solr/update/csv?commit=true&separator=%2C&fieldnames=id,name,asciiname,latitude,longitude,featureclass,featurecode,countrycode,admin1code,admin2code,admin3code,admin4code,population,elevation,gtopo30,timezone,modificationdate&stream.file=C:\tmp\xae.csv&overwrite=true&stream.contentType=text/plain;charset=utf-8"

curl "http://localhost:8983/solr/update/csv?commit=true&separator=%2C&fieldnames=id,name,asciiname,latitude,longitude,featureclass,featurecode,countrycode,admin1code,admin2code,admin3code,admin4code,population,elevation,gtopo30,timezone,modificationdate&stream.file=C:\tmp\xaf.csv&overwrite=true&stream.contentType=text/plain;charset=utf-8"

curl "http://localhost:8983/solr/update/csv?commit=true&separator=%2C&fieldnames=id,name,asciiname,latitude,longitude,featureclass,featurecode,countrycode,admin1code,admin2code,admin3code,admin4code,population,elevation,gtopo30,timezone,modificationdate&stream.file=C:\tmp\xag.csv&overwrite=true&stream.contentType=text/plain;charset=utf-8"

curl "http://localhost:8983/solr/update/csv?commit=true&separator=%2C&fieldnames=id,name,asciiname,latitude,longitude,featureclass,featurecode,countrycode,admin1code,admin2code,admin3code,admin4code,population,elevation,gtopo30,timezone,modificationdate&stream.file=C:\tmp\xah.csv&overwrite=true&stream.contentType=text/plain;charset=utf-8"

curl "http://localhost:8983/solr/update/csv?commit=true&separator=%2C&fieldnames=id,name,asciiname,latitude,longitude,featureclass,featurecode,countrycode,admin1code,admin2code,admin3code,admin4code,population,elevation,gtopo30,timezone,modificationdate&stream.file=C:\tmp\xai.csv&overwrite=true&stream.contentType=text/plain;charset=utf-8"

curl "http://localhost:8983/solr/update/csv?commit=true&separator=%2C&fieldnames=id,name,asciiname,latitude,longitude,featureclass,featurecode,countrycode,admin1code,admin2code,admin3code,admin4code,population,elevation,gtopo30,timezone,modificationdate&stream.file=C:\tmp\xaj.csv&overwrite=true&stream.contentType=text/plain;charset=utf-8"

curl http://localhost:8983/solr/update/csv?commit=true&separator=%2C&fieldnames=id,name,asciiname,latitude,longitude,featureclass,featurecode,countrycode,admin1code,admin2code,admin3code
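A pasted list of near-identical curl commands like the one in Adam's message could also be generated by a small loop. The following is only a sketch under a few assumptions: the handler URL, field list, and parameters are copied from the commands in this thread, while the helper names `build_csv_url`, `send_xml`, and `load_all` are hypothetical. Unlike the original commands, which pass commit=true for every file, this variant leaves each file uncommitted and commits once at the end, so that the rollback Adam mentions can undo a partly loaded batch. It assumes no autocommit is configured on the server.

```shell
#!/usr/bin/env bash
# Sketch only: helper names are hypothetical; the URL and parameters are
# taken from the curl commands quoted in this thread.
SOLR_URL="http://localhost:8983/solr"
FIELDS="id,name,asciiname,latitude,longitude,featureclass,featurecode,countrycode,admin1code,admin2code,admin3code,admin4code,population,elevation,gtopo30,timezone,modificationdate"

# Print the CSV update URL for one file path (as seen by the Solr server).
# The second argument sets commit and defaults to false, so that a failed
# batch can still be rolled back.
build_csv_url() {
  local file="$1" commit="${2:-false}"
  printf '%s/update/csv?commit=%s&separator=%%2C&fieldnames=%s&stream.file=%s&overwrite=true&stream.contentType=text/plain;charset=utf-8\n' \
    "$SOLR_URL" "$commit" "$FIELDS" "$file"
}

# Send an XML update message such as <commit/> or <rollback/>.
send_xml() {
  curl --fail --silent --show-error "$SOLR_URL/update" \
    -H "Content-Type: text/xml" --data-binary "$1"
}

# Load every file given as an argument. Commit once at the end, or roll
# back the uncommitted documents if any single file fails.
load_all() {
  local file
  for file in "$@"; do
    echo "loading $file"
    if ! curl --fail --silent --show-error "$(build_csv_url "$file")"; then
      send_xml '<rollback/>'
      return 1
    fi
  done
  send_xml '<commit/>'
}

# Example (not run here): load_all 'C:\tmp\xab.csv' 'C:\tmp\xac.csv'
```

Quoting the URL matters in Cygwin or any other Bourne-style shell, because an unquoted `&` would put curl in the background and drop the remaining parameters.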
Re: bulk commits
On Thu, Dec 16, 2010 at 3:06 PM, Dennis Gearon gear...@sbcglobal.net wrote:

That easy, huh? Heck, this gets better and better. BTW, how about escaping?

The CSV escaping? It's configurable to allow for loading different CSV dialects.

http://wiki.apache.org/solr/UpdateCSV

By default it uses double-quote encapsulation, like Excel would. The bottom of the wiki page shows how to configure tab separators and backslash escaping like MySQL produces by default.

-Yonik
http://www.lucidimagination.com
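For the tab-separated, backslash-escaped dialect Yonik mentions, the request would look roughly like the sketch below. It assumes the same handler URL used elsewhere in this thread, passes a URL-encoded tab (%09) as the separator and a URL-encoded backslash (%5C) as the escape character, and uses a hypothetical helper name `build_tsv_url` and a hypothetical file path.

```shell
#!/usr/bin/env bash
# Sketch only: build_tsv_url and the example path are hypothetical.

# Print an update URL for a tab-separated file with backslash escaping.
# %09 is a URL-encoded tab, %5C a URL-encoded backslash.
build_tsv_url() {
  printf '%s?commit=true&separator=%%09&escape=%%5C&stream.file=%s\n' \
    "http://localhost:8983/solr/update/csv" "$1"
}

# Example (not run here): curl "$(build_tsv_url /tmp/result.txt)"
```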
Re: bulk commits
One very important thing I forgot to mention is that you will have to increase the Java heap size for larger data sets. Set JAVA_OPT to something acceptable.

Adam
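As a rough illustration of the heap advice, the snippet below sets the standard JVM heap flags. The sizes are placeholders, and the variable name JAVA_OPTS is an assumption: which variable is actually read depends on how Solr is started.

```shell
# Assumed sizes: 512 MB initial and 1 GB maximum heap; tune for your data set.
JAVA_OPTS="-Xms512m -Xmx1024m"
export JAVA_OPTS

# Servlet containers such as Tomcat pick JAVA_OPTS up from the environment.
# With the Jetty example bundled with Solr, pass the flags directly instead:
#   java $JAVA_OPTS -jar start.jar
```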
Re: bulk commits
Thanks Adam!

Dennis Gearon