Thanks for your suggestions, Ahmet.

We are using the Typo3 CMS (with custom extensions / db-schemas) and
Solarium to connect to the Solr instance.

The data-config.xml is pretty simple:

<script><![CDATA[
                // Split the comma-separated di_file column and prepend the
                // base path to every entry.
                function PrependPath(row) {
                    var files = row.get('di_file').split(',');
                    var cleaned = new java.util.ArrayList();

                    // TODO
                    var path = "/Users/b/Sites/";

                    for (var i = 0; i < files.length; i++) {
                        var fullPath = path + files[i];
                        cleaned.add(fullPath);
                    }

                    // di_file now holds a list of absolute paths
                    row.put("di_file", cleaned);
                    return row;
                }
        ]]></script>

    <dataSource type="JdbcDataSource" name="ds1"
                driver="com.mysql.jdbc.Driver" url="jdbc:xxxx"
                user="xxx" password="xxx"/>
    <!-- Apache Tika Datasource -->
    <dataSource type="BinFileDataSource" name="bin"/>
... SOME PARENT ENTITIES ...

            <entity name="projectfiles" query="SELECT ...."
                transformer="script:PrependPath">
                <field name="download_title" column="di_title"/>
                <field name="download_longtitle" column="di_longtitle"/>

                <field name="download_filenames" column="di_file"/>

                <field name="download_notes" column="di_notes"/>
                <field name="download_desc" column="di_description"/>
                <field name="download_cdate" column="di_date"/>

                <entity name="binaryImport" processor="TikaEntityProcessor"
                    dataSource="bin" format="text"
                    url="${projectfiles.di_file}">
                    <field column="text"/>
                </entity>
            </entity>
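
One direction I am considering, as a rough sketch only: let the
transformer hand back one row per file instead of one row holding a
list, so that ${projectfiles.di_file} would resolve to a single path.
The snippet below is standalone JavaScript (runnable in Node.js) that
uses a plain Map as a stand-in for the DIH row. The name
splitRowPerFile is hypothetical, and whether the script transformer
accepts a returned list of rows is an assumption I have not verified.

```javascript
// Standalone sketch: a plain Map stands in for the row object that
// DIH passes to the transformer. Nothing here talks to Solr.
const BASE_PATH = "/Users/b/Sites/"; // same hard-coded prefix as in the config

// Hypothetical variant of PrependPath: returns an array with one row
// per file name found in the comma-separated di_file column.
function splitRowPerFile(row) {
    const files = String(row.get("di_file")).split(",");
    const rows = [];
    for (const file of files) {
        const name = file.trim();
        if (name.length === 0) {
            continue; // skip empty entries such as a trailing comma
        }
        const copy = new Map(row); // keep all other columns unchanged
        copy.set("di_file", BASE_PATH + name);
        rows.push(copy);
    }
    return rows;
}

// Example row modelled on the debug output quoted below.
const exampleRow = new Map([
    ["di_title", "test"],
    ["di_file", "fileadmin/user_upload/dateien/abc/file1.pdf,fileadmin/user_upload/dateien/abc/file2.pdf"],
]);

for (const r of splitRowPerFile(exampleRow)) {
    console.log(r.get("di_title"), r.get("di_file"));
}
```

Inside the real config this would have to build java.util collections
rather than a Map and an array, and each file would then be indexed
as its own row, which may or may not be what we want.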

Thanks in advance,
Sam


On Wed, Jan 29, 2014 at 12:43 PM, Ahmet Arslan <iori...@yahoo.com> wrote:
> Hi Bustaa,
>
> Can you paste your data-config.xml?
>
> Also, did you consider using ManifoldCF [1] to crawl/index your CMS? What CMS 
> are you using?
>
> [1] http://manifoldcf.apache.org/release/trunk/en_US/end-user-documentation.html#repositoryconnectiontypes
>
> On Wednesday, January 29, 2014 1:03 PM, Bustaa <bus...@gmail.com> wrote:
> Hello Solr Users,
>
> I'm trying to get Tika's "BinFileDataSource" to take the filenames
> from a multivalued field (array), but I'm getting the following
> exception:
>
> Debug output from running dataimport (shortened):
>
>
>           "query",
>           "<<< LONG SQL-QUERY >>>",
>           "time-taken",
>           "0:0:0.11",
>           null,
>           "----------- row #1-------------",
>           "di_description",
>           "asdad",
>           "di_longtitle",
>           "",
>           "di_file",
>           "fileadmin/user_upload/dateien/abc/file1.pdf,fileadmin/user_upload/dateien/abc/file2.pdf",
>           "di_title",
>           "test",
>           "di_date",
>           "2014-01-30T00:00:00Z",
>           "di_notes",
>           "",
>           null,
>           "---------------------------------------------",
>           "transformer:script:PrependPath",
>           [
>             null,
>             "---------------------------------------------",
>             "di_description",
>             "asdad",
>             "di_longtitle",
>             "",
>             "di_file",
>             [
>               "/Users/b/Sites/fileadmin/user_upload/dateien/abc/file1.pdf",
>               "/Users/b/Sites/fileadmin/user_upload/dateien/abc/file2.pdf"
>             ],
>             "di_title",
>             "test",
>             "di_date",
>             "2014-01-30T00:00:00Z",
>             "di_notes",
>             "",
>             null,
>             "---------------------------------------------",
>             "entity:binaryImport",
>             [
>               "query",
>               "[/Users/b/Sites/fileadmin/user_upload/dateien/abc/file1.pdf, /Users/b/Sites/fileadmin/user_upload/dateien/abc/file2.pdf]",
>               "EXCEPTION",
>               "java.lang.RuntimeException: java.io.FileNotFoundException: Could not find file: [/Users/b/Sites/fileadmin/user_upload/dateien/abc/file1.pdf, /Users/b/Sites/fileadmin/user_upload/dateien/abc/file1.pdf] <<< MORE STACKTRACE >>>",
>               "time-taken",
>               "0:0:0.1"
>             ]
>           ]
>         ]
>       ]
>
> Is there a way to get Tika's "BinFileDataSource" to accept multiple
> values, or is there a workaround? (The CMS we are using saves the
> file names comma-separated in one big text field.)
>
> Thanks in advance,
>
> Sam
>
