Hi,

btw: there seems to be somewhat of a mismatch between the efforts to enhance DIH
for the CSV format (James Dyer) and the effort to maintain the
CSVLoader (Ken Krugler). How about merging your efforts and migrating the
CSVLoader to a CSVEntityProcessor (cp. my initial email)? :-)
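
To make the fieldname mapping idea from the thread below a bit more concrete,
here is a rough sketch in plain Python. It is not Solr, CSVLoader or DIH code.
The alias table, the field names and the map_rows helper are all hypothetical
and only illustrate several input column names mapping to one schema field,
independent of column order and separator:

```python
import csv
import io

# Hypothetical alias table: each schema field lists the column names
# that different shops might use for it. None of these names come from Solr.
FIELD_ALIASES = {
    "name": {"name", "title", "product_name"},
    "price": {"price", "preis", "amount"},
    "url": {"url", "link", "deeplink"},
}

# Inverted lookup: input column name -> schema field name.
ALIAS_TO_FIELD = {
    alias: field
    for field, aliases in FIELD_ALIASES.items()
    for alias in aliases
}


def map_rows(text, delimiter=","):
    """Turn CSV text with a header line into a list of dicts keyed by schema field.

    Columns are matched by header name (case-insensitive), so their order
    does not matter. Columns with an unknown header are dropped.
    """
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    header = next(reader, None)
    if header is None:
        return []
    # For every column position, the schema field it feeds (or None).
    targets = [ALIAS_TO_FIELD.get(col.strip().lower()) for col in header]
    docs = []
    for row in reader:
        if not row:
            continue  # skip blank lines
        doc = {}
        for target, value in zip(targets, row):
            if target is not None:
                doc[target] = value
        docs.append(doc)
    return docs
```

With something like this, a new shop with a different column layout would only
need new aliases added to the one shared table, not a separate mapping per
import.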

Best Regards

On Thu, Jun 9, 2011 at 11:17 PM, Helmut Hoffer von Ankershoffen <
helmut...@googlemail.com> wrote:

>
>
> On Thu, Jun 9, 2011 at 11:05 PM, Ken Krugler
> <kkrugler_li...@transpac.com> wrote:
>
>>
>> On Jun 9, 2011, at 1:27pm, Helmut Hoffer von Ankershoffen wrote:
>>
>> > Hi,
>> >
>> > ... that would be an option if there is a defined set of field names and a
>> > single column/CSV layout. The scenario, however, is different CSV files (from
>> > different shops) with individual column layouts (separators, encodings
>> > etc.). The idea is to map known field names to defined field names in the
>> > Solr schema. If I understand the capabilities of the CSVLoader correctly
>> > (sorry, I am completely new to Solr, started work on it today) this is not
>> > possible - is it?
>>
>> As per the documentation on
>> http://wiki.apache.org/solr/UpdateCSV#fieldnames, you can specify the
>> names/positions of fields in the CSV file, and ignore fieldnames.
>>
>> So this seems like it would solve your requirement, as each different
>> layout could specify its own such mapping during import.
>>
> Sure, but the requirement (to keep the process of integrating new shops
> efficient) is not to have one mapping per import (cp. the email regarding
> "more or less schema free") but to enhance one mapping that maps common
> field names to defined fields regardless of the order of known fields/columns. As
> far as I understand, that is not a problem at all with DIH; however, DIH and
> CSV are not a perfect match ,-)
>
>
>> It could be handy to provide a fieldname map (versus the value map that
>> UpdateCSV supports).
>
> Definitely. Either a fieldname map in CSVLoader or a robust CSVLoader in
> DIH ...
>
>
>> Then you could use the header, and just provide a mapping from header
>> fieldnames to schema fieldnames.
>>
> That's the idea :-)
>
> => What's the best way to progress? Either someone enhances the CSVLoader
> with a field mapper (with multiple input field names mapping to one field name
> in the Solr schema) or someone enhances the DIH with a robust CSV loader
> ,-). As I am completely new to this community, please give me the direction
> to go (or wait :-).
>
> best regards
>
>
>> -- Ken
>>
>> > On Thu, Jun 9, 2011 at 10:12 PM, Yonik Seeley
>> > <yo...@lucidimagination.com> wrote:
>> >
>> >> On Thu, Jun 9, 2011 at 4:07 PM, Helmut Hoffer von Ankershoffen
>> >> <helmut...@googlemail.com> wrote:
>> >>> Hi,
>> >>> yes, it's about CSV files loaded via HTTP from shops to be fed into a
>> >>> shopping search engine.
>> >>> The CSV Loader cannot map fields (only field values) etc.
>> >>
>> >> You can provide your own list of fieldnames and optionally ignore the
>> >> first line of the CSV file (assuming it contains the field names).
>> >> http://wiki.apache.org/solr/UpdateCSV#fieldnames
>> >>
>> >> -Yonik
>> >> http://www.lucidimagination.com
>> >>
>>
>> --------------------------
>> Ken Krugler
>> +1 530-210-6378
>> http://bixolabs.com
>> custom data mining solutions
>>
>
