Re: Processing/Indexing CSV

2011-06-10 Thread Erick Erickson
Well, here's a place to start if you want to patch the code: http://wiki.apache.org/solr/HowToContribute If you do want to take this on, hop on over to the dev list and start a discussion. I'd start with some posts on that list before entering or working on a JIRA issue, just ask for some

Re: Processing/Indexing CSV

2011-06-10 Thread Helmut Hoffer von Ankershoffen
Hi, thanks for the Intro, will do next week :-) greetings from berlin On Fri, Jun 10, 2011 at 2:49 PM, Erick Erickson erickerick...@gmail.com wrote: Well, here's a place to start if you want to patch the code: http://wiki.apache.org/solr/HowToContribute If you do want to take this on, hop

Processing/Indexing CSV

2011-06-09 Thread Helmut Hoffer von Ankershoffen
Hi, there seems to be no way to index CSV using the DataImportHandler. Using a combination of LineEntityProcessor http://wiki.apache.org/solr/DataImportHandler#LineEntityProcessor and RegexTransformer http://wiki.apache.org/solr/DataImportHandler#RegexTransformer as proposed in

Re: Processing/Indexing CSV

2011-06-09 Thread Helmut Hoffer von Ankershoffen
Hi, to make my point more clear: if the CSV has a fixed schema / column layout, using the RegexTransformer is of course a possibility (however awkward). But if you want to implement a (more or less) schema free shopping search engine ... regards On Thu, Jun 9, 2011 at 9:31 PM, Helmut Hoffer von

RE: Processing/Indexing CSV

2011-06-09 Thread Dyer, James
, 2011 2:32 PM To: solr-user@lucene.apache.org Subject: Processing/Indexing CSV Hi, there seems to be no way to index CSV using the DataImportHandler. Using a combination of LineEntityProcessor http://wiki.apache.org/solr/DataImportHandler#LineEntityProcessor and RegexTransformer http

Re: Processing/Indexing CSV

2011-06-09 Thread Yonik Seeley
On Thu, Jun 9, 2011 at 3:31 PM, Helmut Hoffer von Ankershoffen helmut...@googlemail.com wrote: Hi, there seems to be no way to index CSV using the DataImportHandler. Looking over the features you want, it looks like you're starting from a CSV file (as opposed to CSV stored in a database). Is

Re: Processing/Indexing CSV

2011-06-09 Thread Helmut Hoffer von Ankershoffen
Content Group (615) 213-4311 -Original Message- From: Helmut Hoffer von Ankershoffen [mailto:helmut...@googlemail.com] Sent: Thursday, June 09, 2011 2:32 PM To: solr-user@lucene.apache.org Subject: Processing/Indexing CSV Hi, there seems to be no way to index CSV using

Re: Processing/Indexing CSV

2011-06-09 Thread Helmut Hoffer von Ankershoffen
09, 2011 2:32 PM To: solr-user@lucene.apache.org Subject: Processing/Indexing CSV Hi, there seems to be no way to index CSV using the DataImportHandler. Using a combination of LineEntityProcessor http://wiki.apache.org/solr/DataImportHandler#LineEntityProcessor and RegexTransformer

Re: Processing/Indexing CSV

2011-06-09 Thread Helmut Hoffer von Ankershoffen
Hi, yes, it's about CSV files loaded via HTTP from shops to be fed into a shopping search engine. The CSV Loader cannot map fields (only field values) etc. DIH is flexible enough for building the importing part of such a thing but misses elegant handling of CSV data ... Regards On Thu, Jun 9,

Re: Processing/Indexing CSV

2011-06-09 Thread Yonik Seeley
On Thu, Jun 9, 2011 at 4:07 PM, Helmut Hoffer von Ankershoffen helmut...@googlemail.com wrote: Hi, yes, it's about CSV files loaded via HTTP from shops to be fed into a shopping search engine. The CSV Loader cannot map fields (only field values) etc. You can provide your own list of
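A minimal sketch of what supplying field names to the CSV loader could look like, assuming the truncated reply above refers to the CSV update handler's `fieldnames` request parameter. The base URL, the helper name `csv_update_url`, and the example field names are hypothetical; the parameter names (`fieldnames`, `separator`, `header`, `commit`) are those documented for Solr's CSV update handler, but check them against the Solr version in use.

```python
from urllib.parse import urlencode


def csv_update_url(base_url, fieldnames, separator=",", has_header=True):
    """Build a URL for Solr's CSV update handler (hypothetical helper).

    fieldnames: Solr field names, in CSV column order; these are meant to
                take the place of whatever names the CSV header line carries.
    has_header: whether the first CSV line is a header row.
    """
    params = {
        "fieldnames": ",".join(fieldnames),
        "separator": separator,
        "header": "true" if has_header else "false",
        "commit": "true",
    }
    # urlencode percent-escapes the commas and the separator character.
    return f"{base_url}/update/csv?{urlencode(params)}"


# Assumed local Solr instance and a semicolon-separated shop feed.
url = csv_update_url("http://localhost:8983/solr", ["id", "name", "price"], separator=";")
print(url)
```

This only works when the column order of a feed is known in advance, which is the limitation raised in the reply that follows.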

Re: Processing/Indexing CSV

2011-06-09 Thread Helmut Hoffer von Ankershoffen
Hi, ... that would be an option if there is a defined set of field names and a single column/CSV layout. The scenario however is different csv files (from different shops) with individual column layouts (separators, encodings etc.). The idea is to map known field names to defined field names in
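The per-shop mapping idea described above can be sketched outside of Solr as a small preprocessing step. This is an illustration under assumptions, not the poster's implementation or a DIH feature: `ShopLayout`, `normalize_feed`, the German column names, and the target field names are all hypothetical.

```python
import csv
import io
from dataclasses import dataclass, field


@dataclass
class ShopLayout:
    """Hypothetical per-shop description of a CSV feed."""
    separator: str = ","
    encoding: str = "utf-8"
    # Maps a shop's own column name to the index's field name.
    column_map: dict = field(default_factory=dict)


def normalize_feed(raw: bytes, layout: ShopLayout) -> list:
    """Decode one shop's CSV and rename known columns; drop unknown ones."""
    text = raw.decode(layout.encoding)
    reader = csv.DictReader(io.StringIO(text, newline=""), delimiter=layout.separator)
    docs = []
    for row in reader:
        doc = {}
        for column, value in row.items():
            target = layout.column_map.get(column)
            # Skip unmapped columns and empty or missing values.
            if target is not None and value not in (None, ""):
                doc[target] = value
        docs.append(doc)
    return docs


# Example: a semicolon-separated, Latin-1 encoded feed with German headers.
shop_a = ShopLayout(
    separator=";",
    encoding="latin-1",
    column_map={"Artikelnr": "id", "Bezeichnung": "name", "Preis": "price"},
)
raw = "Artikelnr;Bezeichnung;Preis;Lager\n1;Stühle;9.99;5\n".encode("latin-1")
docs = normalize_feed(raw, shop_a)
print(docs)
```

Keeping separator, encoding, and column mapping together in one layout object per shop means a new shop needs only a new configuration entry, not new parsing code.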

Re: Processing/Indexing CSV

2011-06-09 Thread Ken Krugler
On Jun 9, 2011, at 1:27pm, Helmut Hoffer von Ankershoffen wrote: Hi, ... that would be an option if there is a defined set of field names and a single column/CSV layout. The scenario however is different csv files (from different shops) with individual column layouts (separators, encodings

Re: Processing/Indexing CSV

2011-06-09 Thread Helmut Hoffer von Ankershoffen
On Thu, Jun 9, 2011 at 11:05 PM, Ken Krugler kkrugler_li...@transpac.com wrote: On Jun 9, 2011, at 1:27pm, Helmut Hoffer von Ankershoffen wrote: Hi, ... that would be an option if there is a defined set of field names and a single column/CSV layout. The scenario however is different

Re: Processing/Indexing CSV

2011-06-09 Thread Helmut Hoffer von Ankershoffen
Hi, btw: there seems to be somewhat of a non-match regarding efforts to enhance DIH regarding the CSV format (James Dyer) and the effort to maintain the CSVLoader (Ken Krugler). How about merging your efforts and migrating the CSVLoader to a CSVEntityProcessor (cp. my initial email)? :-) Best

Re: Processing/Indexing CSV

2011-06-09 Thread Ken Krugler
On Jun 9, 2011, at 2:21pm, Helmut Hoffer von Ankershoffen wrote: Hi, btw: there seems to be somewhat of a non-match regarding efforts to enhance DIH regarding the CSV format (James Dyer) and the effort to maintain the CSVLoader (Ken Krugler). How about merging your efforts and migrating the