Re: CsvRecordFactory usage recommendation

Ted Dunning Sun, 12 Jun 2011 01:06:48 -0700

I am not very happy with the CsvRecordFactory design (my fault, I
know).  Some of the ideas are useful, but the final outcome was not
general enough.


My own tendency is to build a custom vector encoding, if only for
performance.  Of course, if you are reading from a database,
performance is clearly not a priority.
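
For illustration only, here is a minimal sketch of what a hand-rolled
vector encoding might look like, assuming a simple hashed-feature scheme.
It is not Mahout code: the class HashedRowEncoder, its method names, the
bias slot at index 0, and the use of String.hashCode() are all
hypothetical choices made for this example.

```java
import java.util.Map;

/** Hypothetical sketch of a hand-rolled hashed feature encoder; not Mahout code. */
final class HashedRowEncoder {
    private HashedRowEncoder() {}

    /**
     * Encodes one row into a dense vector of length dim.
     * Slot 0 holds a constant bias term; the remaining slots are hash buckets.
     */
    static double[] encode(Map<String, String> categorical,
                           Map<String, Double> numeric,
                           int dim) {
        if (dim < 2) {
            throw new IllegalArgumentException("dim must be at least 2");
        }
        double[] v = new double[dim];
        v[0] = 1.0; // bias (intercept) term

        // Categorical fields: hash "name=value" and add 1 to that bucket.
        for (Map.Entry<String, String> e : categorical.entrySet()) {
            v[bucket(e.getKey() + "=" + e.getValue(), dim)] += 1.0;
        }
        // Numeric fields: hash the name only and add the value to that bucket.
        for (Map.Entry<String, Double> e : numeric.entrySet()) {
            v[bucket(e.getKey(), dim)] += e.getValue();
        }
        return v;
    }

    /** Maps a key to a bucket in [1, dim - 1], leaving slot 0 for the bias. */
    private static int bucket(String key, int dim) {
        return 1 + Math.floorMod(key.hashCode(), dim - 1);
    }
}
```

The same encode() call could be fed from a parsed CSV line or from a
database result row, so the encoding step does not depend on where the
data comes from. Colliding features simply add into the same bucket,
which is the usual trade-off of hashed encodings.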

On Sat, Jun 11, 2011 at 11:13 PM, Svetlomir Kasabov
<[email protected]> wrote:
> Hello,
>
> I have a question:
>
> I have seen that some of the Mahout examples use the class
> CsvRecordFactory.java for parsing training and test examples. Would you
> recommend this class also for actual usage in production? This would mean
> that I should create a CSV file from my real data (in my case, it is in a
> relational database), and then use the CSV file in order to train my (online
> logistic regression) model. This approach would have the advantage of having
> the 'extracted' data as CSV which can be used for quick re-training, without
> DB access...
>
> Or should I omit the intermediate step with the CSV file and train my
> (online logistic regression) model directly with the data from the
> relational database? Which of the two approaches would be better?
>
> Thank you!
>
> Svetlomir.
