Re: SOLR DIH importing MySQL text column as a BLOB
Kaushik, i just remembered an ML-Post few weeks ago .. same problem while importing geo-data (http://lucene.472066.n3.nabble.com/Solr-4-0-Spatial-Search-How-to-tp2245592p2254395.html) - the solution was: CAST( CONCAT( lat, ',', lng ) AS CHAR ) at that time i search a little bit for the reason and afaik there was a bug in mysql/jdbc which produces that binary output under certain conditions Regards Stefan On Wed, Mar 16, 2011 at 4:57 AM, Kaushik Chakraborty kaych...@gmail.com wrote: I've a column for posts in MySQL of type `text`, I've tried corresponding `field-type` for it in Solr `schema.xml` e.g. `string, text, text-ws`. But whenever I'm importing it using the DIH, it's getting imported as a BLOB object. I checked, this thing is happening only for columns of type `text` and not for `varchar`(they are getting indexed as string). Hence, the posts field is not becoming searchable. I found about this issue, after repeated search failures, when I did a `*:*` query search on Solr. A sample response: result name=response numFound=223 start=0 maxScore=1.0 doc float name=score1.0/float str name=solr_post_bio[B@10a33ce2/str date name=solr_post_created_at2011-02-21T07:02:55Z/date str name=solr_post_emailtest.acco...@gmail.com/str str name=solr_post_first_nameTest/str str name=solr_post_last_nameAccount/str str name=solr_post_message[B@2c93c4f1/str str name=solr_post_status_message_id1/str /doc The `data-config.xml` : document entity name=posts dataSource=jdbc query=select p.person_id as solr_post_person_id, pr.first_name as solr_post_first_name, pr.last_name as solr_post_last_name, u.email as solr_post_email, p.message as solr_post_message, p.id as solr_post_status_message_id, p.created_at as solr_post_created_at, pr.bio as solr_post_bio from posts p,users u,profiles pr where p.person_id = u.id and p.person_id = pr.person_id and p.type='StatusMessage' field column=solr_post_person_id / field column=solr_post_first_name/ field column=solr_post_last_name / field column=solr_post_email / field column=solr_post_message / field column=solr_post_status_message_id / field column=solr_post_created_at / field column=solr_post_bio/ /entity /document The `schema.xml` : fields field name=solr_post_status_message_id type=string indexed=true stored=true required=true / field name=solr_post_message type=text_ws indexed=true stored=true required=true / field name=solr_post_bio type=text indexed=false stored=true / field name=solr_post_first_name type=string indexed=false stored=true / field name=solr_post_last_name type=string indexed=false stored=true / field name=solr_post_email type=string indexed=false stored=true / field name=solr_post_created_at type=date indexed=false stored=true / /fields uniqueKeysolr_post_status_message_id/uniqueKey defaultSearchFieldsolr_post_message/defaultSearchField Thanks, Kaushik
Re: SOLR DIH importing MySQL text column as a BLOB
On Wed, Mar 16, 2011 at 2:29 PM, Stefan Matheis matheis.ste...@googlemail.com wrote: Kaushik, i just remembered an ML-Post few weeks ago .. same problem while importing geo-data (http://lucene.472066.n3.nabble.com/Solr-4-0-Spatial-Search-How-to-tp2245592p2254395.html) - the solution was: CAST( CONCAT( lat, ',', lng ) AS CHAR ) at that time i search a little bit for the reason and afaik there was a bug in mysql/jdbc which produces that binary output under certain conditions [...] As Stefan mentions, there might be a way to solve this. Could you show us the query in DIH that you are using when you get this BLOB, i.e., the SELECT statement that goes to the database? It might also be instructive for you to try that same SELECT directly in a mysql interface. Regards, Gora
Re: SOLR DIH importing MySQL text column as a BLOB
The query's there in the data-config.xml. And the query's fetching as expected from the database. Thanks, Kaushik On Wed, Mar 16, 2011 at 9:21 PM, Gora Mohanty g...@mimirtech.com wrote: On Wed, Mar 16, 2011 at 2:29 PM, Stefan Matheis matheis.ste...@googlemail.com wrote: Kaushik, i just remembered an ML-Post few weeks ago .. same problem while importing geo-data ( http://lucene.472066.n3.nabble.com/Solr-4-0-Spatial-Search-How-to-tp2245592p2254395.html ) - the solution was: CAST( CONCAT( lat, ',', lng ) AS CHAR ) at that time i search a little bit for the reason and afaik there was a bug in mysql/jdbc which produces that binary output under certain conditions [...] As Stefan mentions, there might be a way to solve this. Could you show us the query in DIH that you are using when you get this BLOB, i.e., the SELECT statement that goes to the database? It might also be instructive for you to try that same SELECT directly in a mysql interface. Regards, Gora
Re: SOLR DIH importing MySQL text column as a BLOB
On Wed, Mar 16, 2011 at 9:50 PM, Kaushik Chakraborty kaych...@gmail.com wrote: The query's there in the data-config.xml. And the query's fetching as expected from the database. [...] Doh! Sorry, had missed that somehow. So, the relevant part is: SELECT ... p.message as solr_post_message, What is the field type for p.message in mysql? Cannot remember off the top of my head for mysql, but if it is a TextField, you might want to look into a ClobTransformer: http://wiki.apache.org/solr/DataImportHandler#ClobTransformer Regards, Gora
Re: SOLR DIH importing MySQL text column as a BLOB
Hi Kaushik, If the field is being treated as blobs, you can try using the FieldStreamDataSource mapping. This handles the blob objects to extract contents from it. This feature is available only after Solr 3.1, I suppose. http://lucene.apache.org/solr/api/org/apache/solr/handler/dataimport/FieldStreamDataSource.html Regards, Jayendra On Tue, Mar 15, 2011 at 11:57 PM, Kaushik Chakraborty kaych...@gmail.com wrote: I've a column for posts in MySQL of type `text`, I've tried corresponding `field-type` for it in Solr `schema.xml` e.g. `string, text, text-ws`. But whenever I'm importing it using the DIH, it's getting imported as a BLOB object. I checked, this thing is happening only for columns of type `text` and not for `varchar`(they are getting indexed as string). Hence, the posts field is not becoming searchable. I found about this issue, after repeated search failures, when I did a `*:*` query search on Solr. A sample response: result name=response numFound=223 start=0 maxScore=1.0 doc float name=score1.0/float str name=solr_post_bio[B@10a33ce2/str date name=solr_post_created_at2011-02-21T07:02:55Z/date str name=solr_post_emailtest.acco...@gmail.com/str str name=solr_post_first_nameTest/str str name=solr_post_last_nameAccount/str str name=solr_post_message[B@2c93c4f1/str str name=solr_post_status_message_id1/str /doc The `data-config.xml` : document entity name=posts dataSource=jdbc query=select p.person_id as solr_post_person_id, pr.first_name as solr_post_first_name, pr.last_name as solr_post_last_name, u.email as solr_post_email, p.message as solr_post_message, p.id as solr_post_status_message_id, p.created_at as solr_post_created_at, pr.bio as solr_post_bio from posts p,users u,profiles pr where p.person_id = u.id and p.person_id = pr.person_id and p.type='StatusMessage' field column=solr_post_person_id / field column=solr_post_first_name/ field column=solr_post_last_name / field column=solr_post_email / field column=solr_post_message / field column=solr_post_status_message_id / field column=solr_post_created_at / field column=solr_post_bio/ /entity /document The `schema.xml` : fields field name=solr_post_status_message_id type=string indexed=true stored=true required=true / field name=solr_post_message type=text_ws indexed=true stored=true required=true / field name=solr_post_bio type=text indexed=false stored=true / field name=solr_post_first_name type=string indexed=false stored=true / field name=solr_post_last_name type=string indexed=false stored=true / field name=solr_post_email type=string indexed=false stored=true / field name=solr_post_created_at type=date indexed=false stored=true / /fields uniqueKeysolr_post_status_message_id/uniqueKey defaultSearchFieldsolr_post_message/defaultSearchField Thanks, Kaushik
SOLR DIH importing MySQL text column as a BLOB
I've a column for posts in MySQL of type `text`, I've tried corresponding `field-type` for it in Solr `schema.xml` e.g. `string, text, text-ws`. But whenever I'm importing it using the DIH, it's getting imported as a BLOB object. I checked, this thing is happening only for columns of type `text` and not for `varchar`(they are getting indexed as string). Hence, the posts field is not becoming searchable. I found about this issue, after repeated search failures, when I did a `*:*` query search on Solr. A sample response: result name=response numFound=223 start=0 maxScore=1.0 doc float name=score1.0/float str name=solr_post_bio[B@10a33ce2/str date name=solr_post_created_at2011-02-21T07:02:55Z/date str name=solr_post_emailtest.acco...@gmail.com/str str name=solr_post_first_nameTest/str str name=solr_post_last_nameAccount/str str name=solr_post_message[B@2c93c4f1/str str name=solr_post_status_message_id1/str /doc The `data-config.xml` : document entity name=posts dataSource=jdbc query=select p.person_id as solr_post_person_id, pr.first_name as solr_post_first_name, pr.last_name as solr_post_last_name, u.email as solr_post_email, p.message as solr_post_message, p.id as solr_post_status_message_id, p.created_at as solr_post_created_at, pr.bio as solr_post_bio from posts p,users u,profiles pr where p.person_id = u.id and p.person_id = pr.person_id and p.type='StatusMessage' field column=solr_post_person_id / field column=solr_post_first_name/ field column=solr_post_last_name / field column=solr_post_email / field column=solr_post_message / field column=solr_post_status_message_id / field column=solr_post_created_at / field column=solr_post_bio/ /entity /document The `schema.xml` : fields field name=solr_post_status_message_id type=string indexed=true stored=true required=true / field name=solr_post_message type=text_ws indexed=true stored=true required=true / field name=solr_post_bio type=text indexed=false stored=true / field name=solr_post_first_name type=string indexed=false stored=true / field name=solr_post_last_name type=string indexed=false stored=true / field name=solr_post_email type=string indexed=false stored=true / field name=solr_post_created_at type=date indexed=false stored=true / /fields uniqueKeysolr_post_status_message_id/uniqueKey defaultSearchFieldsolr_post_message/defaultSearchField Thanks, Kaushik