Uhm, inline:

On 18 September 2018 at 17:05, Dan Brown <d...@likethecolor.com> wrote:
> 1. Thank you.
>
> 2. I think this is what you're looking for. You'd be able to be more
> specific than with bin/post. For instance:
> a. specify the CSV delimiter, CSV quote character, and multivalued field
> delimiter

http://lucene.apache.org/solr/guide/7_4/uploading-data-with-index-handlers.html
separator - (global and field local for multivalued)
encapsulator - for CSV quote characters

> b. the dynamic-fields feature lets you write plugins in Java to define
> values (very simple example: combine field values f_name, m_name, l_name to
> populate a full_name field)

UpdateRequestProcessors. Your example specifically:

> c. specify field order for mapping onto SOLR fields, data types, date
> formats of source data; perhaps your CSV headers/JSON keys don't cleanly
> map to SOLR field names
> d. flag whether the first row of a CSV is the header and should not be
> indexed
> e. use literal values - e.g., instead of having to alter the source data to
> have a column whose value is "foo" you can configure a field to always have
> the same literal value for all documents
> f. set the number of times to retry when there is an error and the amount
> of time between retries (e.g., sometimes zk was not consistently responsive)
> g. skip fields - e.g., your data have 10 columns but you only want to index
> columns 1, 3, 5, and 9
> h. send soft commits after a specified number of batches
> i. combine fields to generate the uniqueKey value
>
> 3. Yes, atomic updates. For instance, index data using DIH then use this
> index to provide additional values to fields in those documents (e.g.,
> maybe the extra data come from a different data source like BigQuery).
>
> I hope this brings more clarity to this tool's features and answers all
> your questions. Please ask questions if anyone has more.
>
> Dan
>
>
> On Tue, Sep 18, 2018 at 3:21 PM Christopher Schultz <
> ch...@christopherschultz.net> wrote:
>
>> -----BEGIN PGP SIGNED MESSAGE-----
>> Hash: SHA256
>>
>> Dan,
>>
>> On 9/18/18 2:51 PM, Dan Brown wrote:
>> > I've been working on this for a while and it's finally in a state
>> > where it's ready for public consumption.
>> >
>> > This is a command line indexer that will index CSV or JSON
>> > documents: https://github.com/likethecolor/solr-indexer
>> >
>> > There are quite a few parameters/options that can be set.
>> >
>> > One thing to note is that it will update individual fields. That
>> > is, unlike the Data Import Handler, it does not replace entire
>> > documents.
>> >
>> > Please check it out and let me know what you think.
>>
>> How is this different from the bin/post tool that ships with Solr?
>>
>> Or is that what you meant when you said "this is unlike the Data Import
>> Handler".
>>
>> AIUI, Solr doesn't support updating a single field in a document. The
>> document is replaced no matter how hard you try to be surgical about
>> updating a single field.
>>
>> - -chris
>> -----BEGIN PGP SIGNATURE-----
>> Comment: GPGTools - http://gpgtools.org
>> Comment: Using GnuPG with Thunderbird - https://www.enigmail.net/
>>
>> iQIzBAEBCAAdFiEEMmKgYcQvxMe7tcJcHPApP6U8pFgFAluhXlYACgkQHPApP6U8
>> pFjIeQ/+PRIx+I+IDW9XTqGNV5TIWYf+yQKC/4JpTV4Ndj7MZLsEEw+cfMvFTvQt
>> 44dK7CnDKEDgQHZlMccWKd9/Th1k/5g40VMugBMsayRwUc83Onawdi4HQfnig4et
>> VN0/RaZ/IBo2AThsgEvUNplXYyY3BtyrUt6miiBsVkhKstI/BnmKqZvsRgvVjH0P
>> K1Xc5F2LNyXswvoIZqd3YmEa9p7CYMy7COsFV9KOeSymKlB7UoHulZqpJ9MRYkmn
>> YWjc9dHIRjpz5TUrJqWhZUG03uGXGtTnaXEku1Hb98WyIUZcHxkwN8W7qm6/B0CG
>> inPxfGRFH9EbUdcK4qeXmbQqty2sbKMQ6hogpRd/NEzgSWjDapiEUT1xz+p5V6wG
>> XM0ILaiLJ8zHJA6oUY0w5SNNyhdnd76CDpCK7T7YBm+aIxUDv9zoj6TLNceEaLi0
>> SjfI83LvaR1gM/ZeVO77d+1IY9maU1+5m0EZFjAETfMGj5dwYRvBub0Oo6QQuLUm
>> roF5R5b/bg/WjjPF1n4CJ7gTr/WBMzahKFnnQvoYD3OQqZpoasoEUifPpSd9OgvO
>> yEok0VqwxPeXdHgE+Vy+BlXn6QqshB3BYnUSNbpFXlNsOIQojfJXkjcCa+dP1nyF
>> JCElvmEgBG8K1WzGo4WAtVqJs7WDzQlmY2RDrETGsVbnqkTojXA=
>> =AmkJ
>> -----END PGP SIGNATURE-----
>>
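
To make the CSV options and the atomic-update point above a bit more concrete, here is a rough sketch in Python (standard library only). The request parameter names (separator, encapsulator, header, skip, literal.*, f.*.split, f.*.separator, commit) are the ones I know from the CSV section of the reference guide page linked above. Everything else is an assumption for illustration: the host, the core name "mycore", the field names "tags" and "source", and the delimiter values. It only builds the request pieces and does not talk to a server.

```python
import json
from urllib.parse import urlencode

# Hypothetical host and core name; adjust for your own install.
SOLR_UPDATE_URL = "http://localhost:8983/solr/mycore/update"


def csv_update_url(base_url=SOLR_UPDATE_URL):
    """Build a URL for Solr's CSV update handler with explicit parsing options."""
    params = [
        ("separator", ";"),          # CSV delimiter (global)
        ("encapsulator", "'"),       # CSV quote character
        ("header", "true"),          # first row is a header, not a document
        ("skip", "m_name"),          # column(s) to leave out of the index
        ("literal.source", "foo"),   # same literal value on every document
        ("f.tags.split", "true"),    # treat the (made-up) tags column as multivalued
        ("f.tags.separator", "|"),   # field-local delimiter for that column
        ("commit", "true"),
    ]
    return base_url + "?" + urlencode(params)


def atomic_set(doc_id, field, value):
    """JSON body for an atomic update that sets one field on one document."""
    return json.dumps([{"id": doc_id, field: {"set": value}}])


if __name__ == "__main__":
    print(csv_update_url())
    print(atomic_set("doc1", "full_name", "Jane Q Public"))
```

You would POST the CSV file to the first URL with a CSV content type, and POST the JSON body to the same /update handler as application/json. As far as I understand it, an atomic update still causes Solr to rebuild the whole document internally from its stored fields, so Chris's point holds at the index level even though the client only sends the one field.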