Re: Is there a way to capture div tag by id?
Hi. I ran into this issue a while ago. In my case, the div I was trying to extract was the main content of the page. If that is your case, boilerpipe may help. There is a patch at https://issues.apache.org/jira/browse/SOLR-3808 that worked for me.

Arcadius.

On 25 June 2013 18:17, eShard zim...@yahoo.com wrote:
> Let's say I have a div with id=myDiv. Is there a way to set up the Solr update/extract handler to capture just that particular div?
> -- View this message in context: http://lucene.472066.n3.nabble.com/Is-there-a-way-to-capture-div-tag-by-id-tp4073120.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Is there a way to capture div tag by id?
On 06/25/2013 01:17 PM, eShard wrote:
> Let's say I have a div with id=myDiv. Is there a way to set up the Solr update/extract handler to capture just that particular div?

You might be interested in Lux (see http://luxdb.org), which provides XML-aware indexing for Solr. It indexes text in the context of every element, and also allows you to explicitly define indexes using any XPath 2.0 expression, including //div[@id='myDiv'], for example.

--
Michael Sokolov
Senior Architect
Safari Books Online
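If the page is well-formed XHTML, you can try out what an expression like //div[@id='myDiv'] selects locally before setting up any indexing. Below is a minimal sketch using Python's standard `xml.etree.ElementTree`, which supports only a small XPath subset. This is not Lux and not XPath 2.0, and the sample markup and the function name `div_text_by_id` are made up for illustration.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Hypothetical sample page. ElementTree requires well-formed XML (XHTML), not tag-soup HTML.
XHTML = """<html>
  <body>
    <div id="nav">Home | About</div>
    <div id="myDiv"><p>Main <b>article</b> text.</p></div>
  </body>
</html>"""


def div_text_by_id(xhtml: str, div_id: str) -> Optional[str]:
    """Return the whitespace-normalized text of the first <div> with the given id, or None."""
    root = ET.fromstring(xhtml)
    # ElementTree's limited XPath subset; roughly equivalent to //div[@id='myDiv'].
    node = root.find(f".//div[@id='{div_id}']")
    if node is None:
        return None
    # itertext() yields the text of the div and all of its descendants (<p>, <b>, ...).
    return " ".join("".join(node.itertext()).split())


print(div_text_by_id(XHTML, "myDiv"))  # Main article text.
```

Real XHTML often declares xmlns="http://www.w3.org/1999/xhtml". In that case ElementTree needs the namespace in the path, e.g. `.//{http://www.w3.org/1999/xhtml}div[@id='myDiv']`.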
Is there a way to capture div tag by id?
Let's say I have a div with id=myDiv. Is there a way to set up the Solr update/extract handler to capture just that particular div?
Re: Is there a way to capture div tag by id?
Sorry, but not only can you not capture that specific div, but you cannot capture ANY div. Really. For some mysterious reason, Tika silently eats div HTML parsing events. Plenty of other HTML tags can be captured, but not div. Both the Solr Wiki page for Solr Cell and the new/Lucid Apache Solr Reference Guide mislead people with examples that clearly can never run as expected with real data.

-- Jack Krupansky

-----Original Message-----
From: eShard
Sent: Tuesday, June 25, 2013 1:17 PM
To: solr-user@lucene.apache.org
Subject: Is there a way to capture div tag by id?

Let's say I have a div with id=myDiv. Is there a way to set up the Solr update/extract handler to capture just that particular div?
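If Solr Cell never sees div events, as Jack describes, one workaround is to extract the div yourself before sending anything to Solr. Below is a minimal sketch using only Python's standard library. `html.parser.HTMLParser` tolerates HTML that is not well-formed, and the extracted text is wrapped in a JSON array of documents, the shape Solr's /update handler accepts for JSON. The class and function names are hypothetical. So are the field names `id` and `mydiv_txt`, which assume your schema has a `*_txt` dynamic text field. None of this is part of Solr or Tika.

```python
import json
from html.parser import HTMLParser
from typing import Optional


class DivTextExtractor(HTMLParser):
    """Collects the text inside the first <div> whose id attribute equals target_id."""

    def __init__(self, target_id: str) -> None:
        super().__init__(convert_charrefs=True)
        self.target_id = target_id
        self.depth = 0      # <div> nesting depth inside the target; 0 means outside it
        self.found = False  # True once the target div has been opened
        self.done = False   # True once the target div has been closed
        self.chunks = []    # raw text fragments collected inside the target

    def handle_starttag(self, tag, attrs):
        if tag != "div" or self.done:
            return
        if self.depth > 0:
            self.depth += 1  # a div nested inside the target
        elif dict(attrs).get("id") == self.target_id:
            self.depth = 1   # entering the target div
            self.found = True

    def handle_endtag(self, tag):
        if tag == "div" and self.depth > 0:
            self.depth -= 1
            if self.depth == 0:
                self.done = True  # ignore any later div with the same id

    def handle_data(self, data):
        if self.depth > 0:
            self.chunks.append(data)


def extract_div_text(html: str, div_id: str) -> Optional[str]:
    """Return the whitespace-normalized text of the div, or None if it is absent."""
    parser = DivTextExtractor(div_id)
    parser.feed(html)
    parser.close()
    if not parser.found:
        return None
    return " ".join("".join(parser.chunks).split())


def solr_update_payload(doc_id: str, html: str, div_id: str) -> str:
    """Build a JSON array holding one document; the field names are hypothetical."""
    text = extract_div_text(html, div_id)
    return json.dumps([{"id": doc_id, "mydiv_txt": text or ""}])


if __name__ == "__main__":
    page = '<div id="nav">Menu</div><div id="myDiv">Intro <div>nested <b>bold</b></div> tail</div>'
    print(solr_update_payload("doc1", page, "myDiv"))
```

Counting nested divs is what keeps the capture from ending at the first inner `</div>`. Actually posting the payload is left out, because the URL and core name depend on your setup.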