I have a column in my database of type long text that holds XML content.
When I define the entity record, is there a way to provide a custom
extractor that takes in the XML and returns rows with the appropriate
fields to be indexed?

Thank you,
Peri Subrahmanya





On 4/26/13 12:24 PM, "Shawn Heisey" <s...@elyograg.org> wrote:

>On 4/25/2013 9:00 AM, xiaoqi wrote:
>> I am using DIH to build the index and it is slow: when it fetches
>> 2 million rows, it takes 20 minutes, which is very slow.
>
>If it takes 20 minutes for two million records, I'd say it's working
>very well.  I do six simultaneous MySQL imports of 13 million records
>each.  It takes a little over 3 hours on Solr 3.5.0, a little over four
>hours on Solr 4.2.1 (due to compression and the transaction log).  If I
>do them one at a time instead of all at once, it will go *slightly*
>faster for each one, but the overall process would take a whole day.
>For comparison purposes, that's about 20 minutes each time it does 1
>million rows.  Yours is going twice as fast as mine.
>
>Looking at your config file, I don't see a batchSize parameter.  This is
>a change that is specific to MySQL.  You can greatly reduce the memory
>usage by including this attribute in the dataSource tag along with the
>user and password:
>
>batchSize="-1"
>
>With two million records and no batchSize parameter, I'm surprised you
>aren't hitting an Out Of Memory error.  By default the MySQL JDBC driver
>will pull down all the results and store them in memory, and only then
>will DIH begin indexing.  A batchSize of -1 makes DIH tell the MySQL
>JDBC driver to stream the results instead of storing them.  Reducing
>the memory usage in this way might make it go faster.
>
>Thanks,
>Shawn
>
>
>
>
>
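
For reference, a minimal sketch of where the batchSize attribute described
in the quoted message would sit in a DIH data-config.xml. The driver class,
URL, credentials, table, and field names are hypothetical placeholders; only
batchSize="-1" on the dataSource tag comes from the advice above.

```xml
<dataConfig>
  <!-- Hypothetical connection details. The relevant part is batchSize="-1",
       which asks DIH to have the MySQL JDBC driver stream rows. -->
  <dataSource type="JdbcDataSource"
              driver="com.mysql.jdbc.Driver"
              url="jdbc:mysql://localhost:3306/exampledb"
              user="solr_user"
              password="secret"
              batchSize="-1"/>
  <document>
    <!-- Hypothetical entity; replace the query and fields with your own. -->
    <entity name="item" query="SELECT id, title FROM item">
      <field column="id" name="id"/>
      <field column="title" name="title"/>
    </entity>
  </document>
</dataConfig>
```

Whether this speeds up a given import depends on how much memory the
un-streamed result set was consuming in the first place.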





