Re: SOLR DIH importing MySQL text column as a BLOB

2011-03-16 Thread Stefan Matheis
Kaushik,

i just remembered an ML-Post few weeks ago .. same problem while
importing geo-data
(http://lucene.472066.n3.nabble.com/Solr-4-0-Spatial-Search-How-to-tp2245592p2254395.html)
- the solution was:

 CAST( CONCAT( lat, ',', lng ) AS CHAR )

at that time i search a little bit for the reason and afaik there was
a bug in mysql/jdbc which produces that binary output under certain
conditions

Regards
Stefan

On Wed, Mar 16, 2011 at 4:57 AM, Kaushik Chakraborty kaych...@gmail.com wrote:
 I've a column for posts in MySQL of type `text`, I've tried corresponding
 `field-type` for it in Solr `schema.xml` e.g. `string, text, text-ws`. But
 whenever I'm importing it using the DIH, it's getting imported as a BLOB
 object. I checked, this thing is happening only for columns of type `text`
 and not for `varchar`(they are getting indexed as string). Hence, the posts
 field is not becoming searchable.

 I found about this issue, after repeated search failures, when I did a `*:*`
 query search on Solr. A sample response:

        result name=response numFound=223 start=0 maxScore=1.0
        doc
        float name=score1.0/float
        str name=solr_post_bio[B@10a33ce2/str
        date name=solr_post_created_at2011-02-21T07:02:55Z/date
        str name=solr_post_emailtest.acco...@gmail.com/str
        str name=solr_post_first_nameTest/str
        str name=solr_post_last_nameAccount/str
        str name=solr_post_message[B@2c93c4f1/str
        str name=solr_post_status_message_id1/str
        /doc

 The `data-config.xml` :

        document
     entity name=posts dataSource=jdbc  query=select
     p.person_id as solr_post_person_id,
     pr.first_name as solr_post_first_name,
     pr.last_name as solr_post_last_name,
     u.email as solr_post_email,
     p.message as solr_post_message,
     p.id as solr_post_status_message_id,
     p.created_at as solr_post_created_at,
     pr.bio as solr_post_bio
     from posts p,users u,profiles pr where p.person_id = u.id and
 p.person_id = pr.person_id and p.type='StatusMessage'
             field column=solr_post_person_id /
     field column=solr_post_first_name/
     field column=solr_post_last_name /
     field column=solr_post_email /
     field column=solr_post_message /
     field column=solr_post_status_message_id /
     field column=solr_post_created_at /
     field column=solr_post_bio/
           /entity
      /document

 The `schema.xml` :

    fields
        field name=solr_post_status_message_id type=string
 indexed=true stored=true required=true /
     field name=solr_post_message type=text_ws indexed=true
 stored=true required=true /
     field name=solr_post_bio type=text indexed=false stored=true
 /
     field name=solr_post_first_name type=string indexed=false
 stored=true /
     field name=solr_post_last_name type=string indexed=false
 stored=true /
     field name=solr_post_email type=string indexed=false
 stored=true /
     field name=solr_post_created_at type=date indexed=false
 stored=true /
    /fields
    uniqueKeysolr_post_status_message_id/uniqueKey
    defaultSearchFieldsolr_post_message/defaultSearchField


 Thanks,
 Kaushik



Re: SOLR DIH importing MySQL text column as a BLOB

2011-03-16 Thread Gora Mohanty
On Wed, Mar 16, 2011 at 2:29 PM, Stefan Matheis
matheis.ste...@googlemail.com wrote:
 Kaushik,

 i just remembered an ML-Post few weeks ago .. same problem while
 importing geo-data
 (http://lucene.472066.n3.nabble.com/Solr-4-0-Spatial-Search-How-to-tp2245592p2254395.html)
 - the solution was:

 CAST( CONCAT( lat, ',', lng ) AS CHAR )

 at that time i search a little bit for the reason and afaik there was
 a bug in mysql/jdbc which produces that binary output under certain
 conditions
[...]

As Stefan mentions, there might be a way to solve this.

Could you show us the query in DIH that you are using
when you get this BLOB, i.e., the SELECT statement
that goes to the database?

It might also be instructive for you to try that same
SELECT directly in a mysql interface.

Regards,
Gora


Re: SOLR DIH importing MySQL text column as a BLOB

2011-03-16 Thread Kaushik Chakraborty
The query's there in the data-config.xml. And the query's fetching as
expected from the database.

Thanks,
Kaushik


On Wed, Mar 16, 2011 at 9:21 PM, Gora Mohanty g...@mimirtech.com wrote:

 On Wed, Mar 16, 2011 at 2:29 PM, Stefan Matheis
 matheis.ste...@googlemail.com wrote:
  Kaushik,
 
  i just remembered an ML-Post few weeks ago .. same problem while
  importing geo-data
  (
 http://lucene.472066.n3.nabble.com/Solr-4-0-Spatial-Search-How-to-tp2245592p2254395.html
 )
  - the solution was:
 
  CAST( CONCAT( lat, ',', lng ) AS CHAR )
 
  at that time i search a little bit for the reason and afaik there was
  a bug in mysql/jdbc which produces that binary output under certain
  conditions
 [...]

 As Stefan mentions, there might be a way to solve this.

 Could you show us the query in DIH that you are using
 when you get this BLOB, i.e., the SELECT statement
 that goes to the database?

 It might also be instructive for you to try that same
 SELECT directly in a mysql interface.

 Regards,
 Gora



Re: SOLR DIH importing MySQL text column as a BLOB

2011-03-16 Thread Gora Mohanty
On Wed, Mar 16, 2011 at 9:50 PM, Kaushik Chakraborty kaych...@gmail.com wrote:
 The query's there in the data-config.xml. And the query's fetching as
 expected from the database.
[...]

Doh! Sorry, had missed that somehow.

So, the relevant part is:
SELECT ... p.message as solr_post_message,

What is the field type for p.message in mysql?
Cannot remember off the top of my head for
mysql, but if it is a TextField, you might want
to look into a ClobTransformer:
http://wiki.apache.org/solr/DataImportHandler#ClobTransformer

Regards,
Gora


Re: SOLR DIH importing MySQL text column as a BLOB

2011-03-16 Thread Jayendra Patil
Hi Kaushik,

If the field is being treated as blobs, you can try using the
FieldStreamDataSource mapping.
This handles the blob objects to extract contents from it.

This feature is available only after Solr 3.1, I suppose.
http://lucene.apache.org/solr/api/org/apache/solr/handler/dataimport/FieldStreamDataSource.html

Regards,
Jayendra

On Tue, Mar 15, 2011 at 11:57 PM, Kaushik Chakraborty
kaych...@gmail.com wrote:
 I've a column for posts in MySQL of type `text`, I've tried corresponding
 `field-type` for it in Solr `schema.xml` e.g. `string, text, text-ws`. But
 whenever I'm importing it using the DIH, it's getting imported as a BLOB
 object. I checked, this thing is happening only for columns of type `text`
 and not for `varchar`(they are getting indexed as string). Hence, the posts
 field is not becoming searchable.

 I found about this issue, after repeated search failures, when I did a `*:*`
 query search on Solr. A sample response:

        result name=response numFound=223 start=0 maxScore=1.0
        doc
        float name=score1.0/float
        str name=solr_post_bio[B@10a33ce2/str
        date name=solr_post_created_at2011-02-21T07:02:55Z/date
        str name=solr_post_emailtest.acco...@gmail.com/str
        str name=solr_post_first_nameTest/str
        str name=solr_post_last_nameAccount/str
        str name=solr_post_message[B@2c93c4f1/str
        str name=solr_post_status_message_id1/str
        /doc

 The `data-config.xml` :

        document
     entity name=posts dataSource=jdbc  query=select
     p.person_id as solr_post_person_id,
     pr.first_name as solr_post_first_name,
     pr.last_name as solr_post_last_name,
     u.email as solr_post_email,
     p.message as solr_post_message,
     p.id as solr_post_status_message_id,
     p.created_at as solr_post_created_at,
     pr.bio as solr_post_bio
     from posts p,users u,profiles pr where p.person_id = u.id and
 p.person_id = pr.person_id and p.type='StatusMessage'
             field column=solr_post_person_id /
     field column=solr_post_first_name/
     field column=solr_post_last_name /
     field column=solr_post_email /
     field column=solr_post_message /
     field column=solr_post_status_message_id /
     field column=solr_post_created_at /
     field column=solr_post_bio/
           /entity
      /document

 The `schema.xml` :

    fields
        field name=solr_post_status_message_id type=string
 indexed=true stored=true required=true /
     field name=solr_post_message type=text_ws indexed=true
 stored=true required=true /
     field name=solr_post_bio type=text indexed=false stored=true
 /
     field name=solr_post_first_name type=string indexed=false
 stored=true /
     field name=solr_post_last_name type=string indexed=false
 stored=true /
     field name=solr_post_email type=string indexed=false
 stored=true /
     field name=solr_post_created_at type=date indexed=false
 stored=true /
    /fields
    uniqueKeysolr_post_status_message_id/uniqueKey
    defaultSearchFieldsolr_post_message/defaultSearchField


 Thanks,
 Kaushik



SOLR DIH importing MySQL text column as a BLOB

2011-03-15 Thread Kaushik Chakraborty
I've a column for posts in MySQL of type `text`, I've tried corresponding
`field-type` for it in Solr `schema.xml` e.g. `string, text, text-ws`. But
whenever I'm importing it using the DIH, it's getting imported as a BLOB
object. I checked, this thing is happening only for columns of type `text`
and not for `varchar`(they are getting indexed as string). Hence, the posts
field is not becoming searchable.

I found about this issue, after repeated search failures, when I did a `*:*`
query search on Solr. A sample response:

result name=response numFound=223 start=0 maxScore=1.0
doc
float name=score1.0/float
str name=solr_post_bio[B@10a33ce2/str
date name=solr_post_created_at2011-02-21T07:02:55Z/date
str name=solr_post_emailtest.acco...@gmail.com/str
str name=solr_post_first_nameTest/str
str name=solr_post_last_nameAccount/str
str name=solr_post_message[B@2c93c4f1/str
str name=solr_post_status_message_id1/str
/doc

The `data-config.xml` :

document
 entity name=posts dataSource=jdbc  query=select
 p.person_id as solr_post_person_id,
 pr.first_name as solr_post_first_name,
 pr.last_name as solr_post_last_name,
 u.email as solr_post_email,
 p.message as solr_post_message,
 p.id as solr_post_status_message_id,
 p.created_at as solr_post_created_at,
 pr.bio as solr_post_bio
 from posts p,users u,profiles pr where p.person_id = u.id and
p.person_id = pr.person_id and p.type='StatusMessage'
 field column=solr_post_person_id /
 field column=solr_post_first_name/
 field column=solr_post_last_name /
 field column=solr_post_email /
 field column=solr_post_message /
 field column=solr_post_status_message_id /
 field column=solr_post_created_at /
 field column=solr_post_bio/
   /entity
  /document

The `schema.xml` :

fields
field name=solr_post_status_message_id type=string
indexed=true stored=true required=true /
 field name=solr_post_message type=text_ws indexed=true
stored=true required=true /
 field name=solr_post_bio type=text indexed=false stored=true
/
 field name=solr_post_first_name type=string indexed=false
stored=true /
 field name=solr_post_last_name type=string indexed=false
stored=true /
 field name=solr_post_email type=string indexed=false
stored=true /
 field name=solr_post_created_at type=date indexed=false
stored=true /
/fields
uniqueKeysolr_post_status_message_id/uniqueKey
defaultSearchFieldsolr_post_message/defaultSearchField


Thanks,
Kaushik