Seems like a bug in the MR version of seqdirectory (I am assuming you are working off of trunk or Mahout 0.8).
Could you try running this again with the '-xm sequential' option and check whether the behavior is correct?

________________________________
From: Liz Merkhofer <[email protected]>
To: [email protected]
Sent: Monday, August 26, 2013 1:19 PM
Subject: seqdirectory -filter arg: not found, default used, no exception

Hello list,

I'm trying to inject my own filter into "seqdirectory" so I can use a .json file in the format {"docid": "text", } as input. I understand that a custom filter can be specified with -filter, replacing the default PrefixAdditionFilter. However, when I put what I thought was a JSON-reading filter in the dependencies as MahoutFilter.JsonFilter, seqdirectory read in the whole JSON file with the file's path as the key and the entire file as the value. In other words, it behaved exactly as if the default filter were in use. Command for that:

mahout seqdirectory -o test_json -i json_stems.json -filter MahoutFilter.JsonFilter -ow

(MahoutFilter.JsonFilter is the fully qualified class name.)

Then I tried passing a filter name that definitely doesn't exist:

mahout seqdirectory -o test_json -i json_stems.json -filter NoSuchFilter -ow

Once again, no exception was thrown, and the default filter seems to have been used. Still, it does recognize that it was given the argument:

Command line arguments: {--charset=[UTF-8], --chunkSize=[64], --endPhase=[2147483647], --fileFilterClass=[NoSuchFilter], --input=[json_stems.json], --keyPrefix=[], --method=[mapreduce], --output=[test_json], --overwrite=null, --startPhase=[0], --tempDir=[temp]}

My takeaways from this:

1. When Mahout does not find the specified filter, it silently uses the default. At a minimum, a user should be warned when their argument is ignored. Perhaps I should document this in the JIRA.
2. Any ideas on helping Mahout find my filter?
3. There was a CSV filter up to 0.5 that would also have done the trick here. Is there any reason it's no longer included?

Thanks,
Liz
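For point 1, here is a rough sketch of how a "-filter" style option could be resolved so that a bad class name is reported instead of silently replaced by the default. This is not Mahout's code: FilterClassResolver and resolve() are hypothetical names, and the demo uses JDK types (CharSequence, String, StringBuilder) as stand-ins for the filter base class, the default filter and a custom filter. Note that Class.forName only sees classes on the classpath of the JVM doing the lookup, so a custom filter also has to be on that classpath.

```java
/**
 * Hypothetical helper (not part of Mahout): resolves the class named by a
 * "-filter" style option and fails loudly instead of silently falling back.
 */
public final class FilterClassResolver {

    private FilterClassResolver() {}

    /**
     * Returns defaultClass only when no name was given at all. A name that
     * cannot be loaded, or that is not a subtype of expectedType, is an error.
     */
    public static <T> Class<? extends T> resolve(String requestedName,
                                                 Class<T> expectedType,
                                                 Class<? extends T> defaultClass) {
        if (requestedName == null || requestedName.trim().isEmpty()) {
            return defaultClass;
        }
        String name = requestedName.trim();
        try {
            // Throws ClassNotFoundException if the class is not on the classpath,
            // and ClassCastException if it does not extend/implement expectedType.
            return Class.forName(name).asSubclass(expectedType);
        } catch (ClassNotFoundException e) {
            throw new IllegalArgumentException(
                "Filter class not found on the classpath: " + name, e);
        } catch (ClassCastException e) {
            throw new IllegalArgumentException(
                name + " is not a subtype of " + expectedType.getName(), e);
        }
    }

    public static void main(String[] args) {
        // Stand-ins: CharSequence plays the filter base type, String the default filter.
        Class<? extends CharSequence> found =
            resolve("java.lang.StringBuilder", CharSequence.class, String.class);
        System.out.println(found.getName());
        try {
            resolve("NoSuchFilter", CharSequence.class, String.class);
        } catch (IllegalArgumentException e) {
            // Reported to the user rather than swallowed.
            System.out.println(e.getMessage());
        }
    }
}
```

Throwing is a deliberate choice here; logging a warning and then falling back would also address the complaint, but the job would still run with a filter the user did not ask for.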
