Sure Erik,
Or since we already default to full path name as id, perhaps we could change
literal.resourcename to be the filename only. I guess that one is mostly there
to give Tika more hints for guessing the file type, so it doesn't need to be
absolute, especially when you already have the full path in the ID.
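
To illustrate the idea of keeping the full path as the id while passing only the
bare filename as the resource name hint, here is a minimal Java sketch. It is not
code from post.jar or Solr; the class name ResourceNameSketch, the method
fileNameOnly and the example path are made up for illustration.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper for illustration only; not part of post.jar or Solr.
class ResourceNameSketch {

    // Returns only the last element of a path, i.e. the bare file name.
    static String fileNameOnly(String fullPath) {
        Path name = Paths.get(fullPath).getFileName();
        // getFileName() returns null for a path with no elements (e.g. "/").
        return name == null ? "" : name.toString();
    }

    public static void main(String[] args) {
        String fullPath = "path/to/folder/report.pdf"; // made-up example path
        // The id would keep the full path, the resource name only the file name.
        System.out.println("id           = " + fullPath);
        System.out.println("resourcename = " + fileNameOnly(fullPath));
    }
}
```

The file extension survives in the bare filename, so a type detector that relies
on the name as a hint would still see it.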
Since this is a POC you could simply run this command with the default example
schema:
cd solr/example/exampledocs
java -Dauto -Drecursive=0 -jar post.jar path/to/folder
You will get the full file name, including the path, in the resourcename field.
If you need to search just the filename, you can achieve that
Thanks, Jan, for making the post tool do this type of thing. Great stuff.
The filename would be a good one to add for out of the box goodness. We can
easily add just the filename to the index with something like the patch below.
And on that note, what else would folks want in an easy to use
Hi,
I am new to Apache Solr.
I am doing a POC where there is a folder (on the system or in some repository)
that contains files with different extensions (pdf, doc, xls, ...).
I want to search by file name and retrieve all the files whose names
match.
How do I proceed with this?
Please help me on
You could use DataImportHandler with FileListEntityProcessor to get the
file names in:
http://wiki.apache.org/solr/DataImportHandler#FileListEntityProcessor
Then, if it is recursive enumeration and not just one level, you probably
want a tokenizer that splits on path separator characters (e.g.