One very important thing I forgot to mention: you will have to increase the Java heap size for larger data sets.
Set JAVA_OPT to something acceptable.

Adam

On Thu, Dec 16, 2010 at 3:27 PM, Yonik Seeley <yo...@lucidimagination.com> wrote:

> On Thu, Dec 16, 2010 at 3:06 PM, Dennis Gearon <gear...@sbcglobal.net> wrote:
> > That easy, huh? Heck, this gets better and better.
> >
> > BTW, how about escaping?
>
> The CSV escaping? It's configurable to allow for loading different
> CSV dialects.
>
> http://wiki.apache.org/solr/UpdateCSV
>
> By default it uses double quote encapsulation, like Excel would.
> The bottom of the wiki page shows how to configure tab separators and
> backslash escaping like MySQL produces by default.
>
> -Yonik
> http://www.lucidimagination.com
>
> > Dennis Gearon
> >
> > Signature Warning
> > ----------------
> > It is always a good idea to learn from your own mistakes. It is usually a better
> > idea to learn from others’ mistakes, so you do not have to make them yourself.
> > from 'http://blogs.techrepublic.com.com/security/?p=4501&tag=nl.e036'
> >
> > EARTH has a Right To Life,
> > otherwise we all die.
> >
> > ----- Original Message ----
> > From: Adam Estrada <estrada.adam.gro...@gmail.com>
> > To: Dennis Gearon <gear...@sbcglobal.net>; solr-user@lucene.apache.org
> > Sent: Thu, December 16, 2010 10:58:47 AM
> > Subject: Re: bulk commits
> >
> > This is how I import a lot of data from a CSV file. There are close to 100k
> > records in there. Note that you can either pre-define the column names using
> > the fieldnames param like I did here *or* include header=true which will
> > automatically pick up the column header if your file has it.
> >
> > curl "http://localhost:8983/solr/update/csv?commit=true&separator=%2C&fieldnames=id,name,asciiname,lat,lng,countrycode,population,elevation,gtopo30,timezone,modificationdate,cat&stream.file=C:\tmp\cities1000.csv&overwrite=true&stream.contentType=text/plain;charset=utf-8"
> >
> > This seems to load everything into some kind of temporary location before
> > it's actually committed. If something goes wrong there is a rollback feature
> > that will undo anything that happened before the commit.
> >
> > As far as batching a bunch of files, I copied and pasted the following into
> > Cygwin and it worked just fine.
> >
> > curl "http://localhost:8983/solr/update/csv?commit=true&separator=%2C&fieldnames=id,name,asciiname,lat,lng,countrycode,population,elevation,gtopo30,timezone,modificationdate,cat&stream.file=C:\tmp\cities1000.csv&overwrite=true&stream.contentType=text/plain;charset=utf-8"
> > curl "http://localhost:8983/solr/update/csv?commit=true&separator=%2C&fieldnames=id,name,asciiname,latitude,longitude,featureclass,featurecode,countrycode,admin1code,admin2code,admin3code,admin4code,population,elevation,gtopo30,timezone,modificationdate&stream.file=C:\tmp\xab.csv&overwrite=true&stream.contentType=text/plain;charset=utf-8"
> > curl "http://localhost:8983/solr/update/csv?commit=true&separator=%2C&fieldnames=id,name,asciiname,latitude,longitude,featureclass,featurecode,countrycode,admin1code,admin2code,admin3code,admin4code,population,elevation,gtopo30,timezone,modificationdate&stream.file=C:\tmp\xac.csv&overwrite=true&stream.contentType=text/plain;charset=utf-8"
> > curl "http://localhost:8983/solr/update/csv?commit=true&separator=%2C&fieldnames=id,name,asciiname,latitude,longitude,featureclass,featurecode,countrycode,admin1code,admin2code,admin3code,admin4code,population,elevation,gtopo30,timezone,modificationdate&stream.file=C:\tmp\xad.csv&overwrite=true&stream.contentType=text/plain;charset=utf-8"
> > curl "http://localhost:8983/solr/update/csv?commit=true&separator=%2C&fieldnames=id,name,asciiname,latitude,longitude,featureclass,featurecode,countrycode,admin1code,admin2code,admin3code,admin4code,population,elevation,gtopo30,timezone,modificationdate&stream.file=C:\tmp\xae.csv&overwrite=true&stream.contentType=text/plain;charset=utf-8"
> > curl "http://localhost:8983/solr/update/csv?commit=true&separator=%2C&fieldnames=id,name,asciiname,latitude,longitude,featureclass,featurecode,countrycode,admin1code,admin2code,admin3code,admin4code,population,elevation,gtopo30,timezone,modificationdate&stream.file=C:\tmp\xaf.csv&overwrite=true&stream.contentType=text/plain;charset=utf-8"
> > curl "http://localhost:8983/solr/update/csv?commit=true&separator=%2C&fieldnames=id,name,asciiname,latitude,longitude,featureclass,featurecode,countrycode,admin1code,admin2code,admin3code,admin4code,population,elevation,gtopo30,timezone,modificationdate&stream.file=C:\tmp\xag.csv&overwrite=true&stream.contentType=text/plain;charset=utf-8"
> > curl "http://localhost:8983/solr/update/csv?commit=true&separator=%2C&fieldnames=id,name,asciiname,latitude,longitude,featureclass,featurecode,countrycode,admin1code,admin2code,admin3code,admin4code,population,elevation,gtopo30,timezone,modificationdate&stream.file=C:\tmp\xah.csv&overwrite=true&stream.contentType=text/plain;charset=utf-8"
> > curl "http://localhost:8983/solr/update/csv?commit=true&separator=%2C&fieldnames=id,name,asciiname,latitude,longitude,featureclass,featurecode,countrycode,admin1code,admin2code,admin3code,admin4code,population,elevation,gtopo30,timezone,modificationdate&stream.file=C:\tmp\xai.csv&overwrite=true&stream.contentType=text/plain;charset=utf-8"
> > curl "http://localhost:8983/solr/update/csv?commit=true&separator=%2C&fieldnames=id,name,asciiname,latitude,longitude,featureclass,featurecode,countrycode,admin1code,admin2code,admin3code,admin4code,population,elevation,gtopo30,timezone,modificationdate&stream.file=C:\tmp\xaj.csv&overwrite=true&stream.contentType=text/plain;charset=utf-8"
> > curl "http://localhost:8983/solr/update/csv?commit=true&separator=%2C&fieldnames=id,name,asciiname,latitude,longitude,featureclass,featurecode,countrycode,admin1code,admin2code,admin3code,admin4code,population,elevation,gtopo30,timezone,modificationdate&stream.file=C:\tmp\xak.csv&overwrite=true&stream.contentType=text/plain;charset=utf-8"
> > curl "http://localhost:8983/solr/update/csv?commit=true&separator=%2C&fieldnames=id,name,asciiname,latitude,longitude,featureclass,featurecode,countrycode,admin1code,admin2code,admin3code,admin4code,population,elevation,gtopo30,timezone,modificationdate&stream.file=C:\tmp\xal.csv&overwrite=true&stream.contentType=text/plain;charset=utf-8"
> > curl "http://localhost:8983/solr/update/csv?commit=true&separator=%2C&fieldnames=id,name,asciiname,latitude,longitude,featureclass,featurecode,countrycode,admin1code,admin2code,admin3code,admin4code,population,elevation,gtopo30,timezone,modificationdate&stream.file=C:\tmp\xam.csv&overwrite=true&stream.contentType=text/plain;charset=utf-8"
> > curl "http://localhost:8983/solr/update/csv?commit=true&separator=%2C&fieldnames=id,name,asciiname,latitude,longitude,featureclass,featurecode,countrycode,admin1code,admin2code,admin3code,admin4code,population,elevation,gtopo30,timezone,modificationdate&stream.file=C:\tmp\xan.csv&overwrite=true&stream.contentType=text/plain;charset=utf-8"
> > curl "http://localhost:8983/solr/update/csv?commit=true&separator=%2C&fieldnames=id,name,asciiname,latitude,longitude,featureclass,featurecode,countrycode,admin1code,admin2code,admin3code,admin4code,population,elevation,gtopo30,timezone,modificationdate&stream.file=C:\tmp\xao.csv&overwrite=true&stream.contentType=text/plain;charset=utf-8"
> > curl "http://localhost:8983/solr/update/csv?commit=true&separator=%2C&fieldnames=id,name,asciiname,latitude,longitude,featureclass,featurecode,countrycode,admin1code,admin2code,admin3code,admin4code,population,elevation,gtopo30,timezone,modificationdate&stream.file=C:\tmp\xap.csv&overwrite=true&stream.contentType=text/plain;charset=utf-8"
> > curl http://localhost:8983/solr/update -H "Content-Type: text/xml" --data-binary '<optimize/>'
> >
> > Adam
> >
> > On Thu, Dec 16, 2010 at 1:44 PM, Dennis Gearon <gear...@sbcglobal.net> wrote:
> >
> >> Might be CSV or tab delimited text.
> >>
> >> ------------------------------
> >> * From: * Adam Estrada <estrada.adam.gro...@gmail.com>;
> >> * To: * <solr-user@lucene.apache.org>;
> >> * Subject: * Re: bulk commits
> >> * Sent: * Thu, Dec 16, 2010 6:35:17 PM
> >>
> >> what is it that you are trying to commit?
> >>
> >> a
> >>
> >> On Thu, Dec 16, 2010 at 1:03 PM, Dennis Gearon <gear...@sbcglobal.net> wrote:
> >>
> >> > What have people found as the best way to do bulk commits either from the
> >> > web or from a file on the system?
> >> >
> >> > Dennis Gearon
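
For what it's worth, the repeated curl calls in the quoted message could probably be generated with a small shell loop instead of being pasted one by one. The following is only a minimal sketch, assuming the same endpoint, parameters and field list as those commands. The names build_url, load_all and the DRY_RUN switch are hypothetical and exist only for illustration. By default the script just prints the curl commands; set DRY_RUN=0 to actually send them.

```shell
#!/bin/sh
# Sketch only: SOLR_URL and FIELDS are copied from the commands quoted in this
# thread; adjust them for your own install and schema.
SOLR_URL="http://localhost:8983/solr/update/csv"
FIELDS="id,name,asciiname,latitude,longitude,featureclass,featurecode,countrycode,admin1code,admin2code,admin3code,admin4code,population,elevation,gtopo30,timezone,modificationdate"

# Print the update URL for one CSV file (hypothetical helper).
# printf with %s is used so backslashes in Windows paths are left untouched.
build_url() {
  printf '%s?commit=true&separator=%%2C&fieldnames=%s&stream.file=%s&overwrite=true&stream.contentType=text/plain;charset=utf-8\n' \
    "$SOLR_URL" "$FIELDS" "$1"
}

# Load every file given as an argument, stopping at the first failure.
# With DRY_RUN=1 (the default here) the commands are printed, not executed.
load_all() {
  for f in "$@"; do
    url=$(build_url "$f")
    if [ "${DRY_RUN:-1}" = "1" ]; then
      printf 'curl "%s"\n' "$url"
    else
      curl --fail "$url" || return 1
    fi
  done
}

# Example file list taken from the quoted commands.
load_all 'C:\tmp\xab.csv' 'C:\tmp\xac.csv' 'C:\tmp\xad.csv'
```

The cities1000.csv file in the quoted commands uses a different fieldnames list, so it would need its own FIELDS value before being passed through the same loop.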