Re: Data Import from RDBMS+File

Raheel Hasan Mon, 08 Jul 2013 08:20:14 -0700

OK, great.

Can I use this EntityProcessor together with a JdbcDataSource?

Like this:

<dataConfig>
  <dataSource type="JdbcDataSource"
              driver="com.mysql.jdbc.Driver"
              url="jdbc:mysql://localhost/db_1"
              user="root"
              password=""
              autoCommit="true"/>

  <document>
    <entity name="table_1_fetch"
            query="SELECT field_1 FROM table_1 WHERE ('${dataimporter.request.clean}' != 'false' OR added_on > '${dataimporter.last_index_time}')">

      <entity name="genesis_case_documents"
              query="SELECT original_document FROM case_documents WHERE case_md5 = '${genesis_case_info.case_md5}'">
      </entity>

      <entity processor="PlainTextEntityProcessor"
              name="table_2_from_file_fetch"
              url="http://localhost/project_1/files/a.txt"
              dataSource="data-source-name">
        <field column="plainText" name="text"/>
      </entity>

    </entity>
  </document>
</dataConfig>



By the way, I currently load the field into "text_en_splitting" as defined
in schema.xml...
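
A minimal sketch of one way this could be wired up, going by the DIH wiki's note that PlainTextEntityProcessor needs a data source that returns a Reader (such as URLDataSource or FileDataSource) rather than JdbcDataSource. The data source names "db" and "files", the file_name column, and the URL prefix are assumptions for illustration only:

```xml
<dataConfig>
  <!-- Assumption: "db" and "files" are illustrative data source names -->
  <dataSource name="db" type="JdbcDataSource"
              driver="com.mysql.jdbc.Driver"
              url="jdbc:mysql://localhost/db_1"
              user="root" password=""/>

  <!-- Second data source: returns a Reader, which PlainTextEntityProcessor reads from -->
  <dataSource name="files" type="URLDataSource" encoding="UTF-8"/>

  <document>
    <!-- Assumption: table_1 has a file_name column naming the file to index -->
    <entity name="table_1_fetch" dataSource="db"
            query="SELECT field_1, file_name FROM table_1">

      <!-- Inner entity runs once per outer row; the outer row's column is
           available as ${table_1_fetch.file_name} -->
      <entity name="table_2_from_file_fetch"
              processor="PlainTextEntityProcessor"
              dataSource="files"
              url="http://localhost/project_1/files/${table_1_fetch.file_name}">
        <!-- The whole file content arrives in the implicit plainText column -->
        <field column="plainText" name="text"/>
      </entity>

    </entity>
  </document>
</dataConfig>
```

With a layout like this, each row from table_1_fetch would trigger one fetch of the file named in that row, and the file's contents would land in the "text" field.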




On Mon, Jul 8, 2013 at 7:59 PM, Alexandre Rafalovitch <arafa...@gmail.com> wrote:

> http://wiki.apache.org/solr/DataImportHandler#PlainTextEntityProcessor or
> http://wiki.apache.org/solr/DataImportHandler#LineEntityProcessor ?
>
> The file name gets exposed as a ${entityname.fieldname} variable. You can
> probably copy/manipulate it with a transformer on the external entity
> before it hits an inner one.
>
> Regards,
>   Alex.
>
> Personal website: http://www.outerthoughts.com/
> LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
> - Time is the quality of nature that keeps events from happening all at
> once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD book)
>
>
> On Mon, Jul 8, 2013 at 10:42 AM, Raheel Hasan <raheelhasan....@gmail.com> wrote:
>
> > On this page (http://wiki.apache.org/solr/DataImportHandler), I can't see
> > how it's possible. Perhaps there is another guide.
> >
> > Basically, this is what I am doing:
> > I index data from multiple tables into Solr (see
> > http://wiki.apache.org/solr/DIHQuickStart). I need to skip one very big,
> > heavy table, as it has only one field, which holds a complete file. So I
> > want to skip the step of loading that file per record into my RDB and then
> > indexing it. Instead, I want to index that file directly, together with the
> > rest of the records coming from the database.
> >
> >
> >
> >
> > On Mon, Jul 8, 2013 at 7:30 PM, Alexandre Rafalovitch <arafa...@gmail.com> wrote:
> >
> > > Did you have a chance to look at DIH with nested entities yet? That's
> > > probably the way to go to start out.
> > >
> > > Or a custom client, of course. Or ETL solutions that support Solr
> > > (e.g. Apache Flume - not personally tested yet).
> > >
> > > Regards,
> > >    Alex.
> > >
> > >
> > >
> > > On Mon, Jul 8, 2013 at 10:08 AM, Raheel Hasan <raheelhasan....@gmail.com> wrote:
> > >
> > > > Hi everyone,
> > > >
> > > > I am looking for a way to import/index data such that I load data
> > > > from table_1 and, instead of joining with table_2, import the rest
> > > > of the "joined" data from a file. The name of the file comes from a
> > > > field in table_1.
> > > >
> > > > Is it possible? And is it easy to do?
> > > >
> > > > --
> > > > Regards,
> > > > Raheel Hasan
> > > >
> > >
> >
> >
> >
> > --
> > Regards,
> > Raheel Hasan
> >
>



-- 
Regards,
Raheel Hasan
