Thanks for your suggestions, Ahmet. We are using the Typo3 CMS (with custom extensions and DB schemas), and we use Solarium to connect to the Solr instance.
The data-config.xml is pretty simple:

<script><![CDATA[
    function PrependPath(row) {
        // di_file holds a comma-separated list of relative file paths
        var files = row.get('di_file').split(',');
        var cleaned = new java.util.ArrayList();
        // TODO
        var path = "/Users/b/Sites/";
        for (var i = 0; i < files.length; i++) {
            var fullPath = path + files[i];
            cleaned.add(fullPath);
        }
        row.put("di_file", cleaned);
        return row;
    }
]]></script>

<dataSource type="JdbcDataSource" name="ds1"
            driver="com.mysql.jdbc.Driver"
            url="jdbc:xxxx"
            user="xxx"
            password="xxx"/>

<!-- Apache Tika Datasource -->
<dataSource type="BinFileDataSource" name="bin"/>

... SOME PARENT ENTITIES ...

<entity name="projectfiles" query="SELECT ...." transformer="script:PrependPath" >
    <field name="download_title" column="di_title"/>
    <field name="download_longtitle" column="di_longtitle"/>
    <field name="download_filenames" column="di_file"/>
    <field name="download_notes" column="di_notes"/>
    <field name="download_desc" column="di_description"/>
    <field name="download_cdate" column="di_date"/>

    <entity name="binaryImport" processor="TikaEntityProcessor"
            dataSource="bin" format="text" url="${projectfiles.di_file}">
        <field column="text"/>
    </entity>
</entity>

Thanks in advance,
Sam

On Wed, Jan 29, 2014 at 12:43 PM, Ahmet Arslan <iori...@yahoo.com> wrote:

> Hi Bustaa,
>
> Can you paste your data-config.xml?
>
> Also, did you consider using ManifoldCF [1] to crawl/index your CMS? What CMS
> are you using?
>
> [1]
> http://manifoldcf.apache.org/release/trunk/en_US/end-user-documentation.html#repositoryconnectiontypes
>
>
> On Wednesday, January 29, 2014 1:03 PM, Bustaa <bus...@gmail.com> wrote:
> Hello Solr Users,
>
> I'm trying to get Tika's "BinFileDataSource" to take the filenames
> from a multivalued field (array), but I'm getting the following
> exception:
>
> Debug output from running dataimport (shortened):
>
>
>             "query",
>             "<<< LONG SQL-QUERY >>>",
>             "time-taken",
>             "0:0:0.11",
>             null,
>             "----------- row #1-------------",
>             "di_description",
>             "asdad",
>             "di_longtitle",
>             "",
>             "di_file",
>
> "fileadmin/user_upload/dateien/abc/file1.pdf,fileadmin/user_upload/dateien/abc/file2.pdf",
>             "di_title",
>             "test",
>             "di_date",
>             "2014-01-30T00:00:00Z",
>             "di_notes",
>             "",
>             null,
>             "---------------------------------------------",
>             "transformer:script:PrependPath",
>             [
>               null,
>               "---------------------------------------------",
>               "di_description",
>               "asdad",
>               "di_longtitle",
>               "",
>               "di_file",
>               [
>                 "/Users/b/Sites/fileadmin/user_upload/dateien/abc/file1.pdf",
>                 "/Users/b/Sites/fileadmin/user_upload/dateien/abc/file2.pdf"
>               ],
>               "di_title",
>               "test",
>               "di_date",
>               "2014-01-30T00:00:00Z",
>               "di_notes",
>               "",
>               null,
>               "---------------------------------------------",
>               "entity:binaryImport",
>               [
>                 "query",
>                 "[/Users/b/Sites/fileadmin/user_upload/dateien/abc/file1.pdf,
> /Users/b/Sites/fileadmin/user_upload/dateien/abc/file2.pdf]",
>                 "EXCEPTION",
>                 "java.lang.RuntimeException:
> java.io.FileNotFoundException: Could not find file:
> [/Users/b/Sites/fileadmin/user_upload/dateien/abc/file1.pdf,
> /Users/b/Sites/fileadmin/user_upload/dateien/abc/file1.pdf] <<< MORE
> STACKTRACE >>>",
>                 "time-taken",
>                 "0:0:0.1"
>               ]
>             ]
>           ]
>         ]
>
> Is there a way to get Tika's "BinFileDataSource" to accept the
> multiple values, or is there a workaround? (The CMS we are using saves
> the files comma-separated in one big text field.)
>
> Thanks in advance,
>
> Sam
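
A note on the debug output quoted above: the "query" of entity:binaryImport is the whole list rendered as one string ("[/Users/.../file1.pdf, /Users/.../file2.pdf]"), and that string is then looked up as a single file name, which would explain the FileNotFoundException. One possible direction is to hand the child entity one path at a time. The following is only a plain-JavaScript sketch of that splitting step, runnable outside Solr. The helper names splitFilePaths and expandRow are hypothetical, rows are modelled as plain objects rather than the java.util.Map that DIH passes in, and whether the ScriptTransformer accepts several rows as a return value is an assumption that would need to be checked against the DIH documentation for the Solr version in use.

```javascript
// Hypothetical helper (not part of Solr/DIH): turns one comma-separated
// di_file value into one absolute path per file.
function splitFilePaths(diFile, basePath) {
  var paths = [];
  if (!diFile) {
    return paths; // empty or missing column: nothing to import
  }
  var parts = String(diFile).split(',');
  for (var i = 0; i < parts.length; i++) {
    var name = parts[i].trim();
    if (name.length > 0) {
      paths.push(basePath + name);
    }
  }
  return paths;
}

// Hypothetical helper: produces one copy of the row per file, so that
// di_file is a single path in each copy instead of a list.
function expandRow(row, basePath) {
  var paths = splitFilePaths(row.di_file, basePath);
  var rows = [];
  for (var i = 0; i < paths.length; i++) {
    var copy = {};
    for (var key in row) {
      if (Object.prototype.hasOwnProperty.call(row, key)) {
        copy[key] = row[key];
      }
    }
    copy.di_file = paths[i]; // single value, usable as one file URL
    rows.push(copy);
  }
  return rows;
}
```

Calling expandRow with the row from the debug output and "/Users/b/Sites/" as the base path would yield two rows, each carrying the same di_title, di_date and so on, but exactly one absolute path in di_file.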