Re: Is there a way to capture div tag by id?

2013-06-26 Thread Arcadius Ahouansou
Hi.

I ran into this issue a while ago.
In my case, the div I was trying to extract was the main content of the
page.
If that is your case, boilerpipe may help.
There is a patch at https://issues.apache.org/jira/browse/SOLR-3808 that
worked for me.

Arcadius.


On 25 June 2013 18:17, eShard zim...@yahoo.com wrote:

 let's say I have a div with id=myDiv
 Is there a way to set up the solr update/extract handler to capture just that
 particular div?



 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/Is-there-a-way-to-capture-div-tag-by-id-tp4073120.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Re: Is there a way to capture div tag by id?

2013-06-26 Thread Michael Sokolov

On 06/25/2013 01:17 PM, eShard wrote:

let's say I have a div with id=myDiv
Is there a way to set up the solr update/extract handler to capture just that
particular div?



   
You might be interested in Lux (see http://luxdb.org), which provides
XML-aware indexing for Solr.  It indexes text in the context of every 
element, and also allows you to explicitly define indexes using any 
XPath 2.0 expression, including //div[@id='myDiv'], for example.
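
To illustrate what an expression like //div[@id='myDiv'] selects, here is a
minimal sketch in Python. It is not Lux or Solr code: it uses the standard
library's ElementTree, which supports only a small XPath subset and needs
well-formed XHTML. The sample markup and the helper name extract_div_text are
made up for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample page; real HTML would first need to be well-formed XHTML.
XHTML = """<html>
  <body>
    <div id="header">Site navigation</div>
    <div id="myDiv">
      <p>Main <b>content</b> to index.</p>
    </div>
  </body>
</html>"""


def extract_div_text(markup, div_id):
    """Return the whitespace-normalized text of the div with the given id,
    or None if no such div exists."""
    root = ET.fromstring(markup)
    # ElementTree's XPath subset supports the [@attr='value'] predicate.
    div = root.find(f".//div[@id='{div_id}']")
    if div is None:
        return None
    # itertext() walks all nested text; split/join collapses whitespace.
    return " ".join("".join(div.itertext()).split())


print(extract_div_text(XHTML, "myDiv"))
```

Text pulled out this way could then be sent to Solr as an ordinary field
value, which sidesteps the extract handler for this one div.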


--
Michael Sokolov
Senior Architect
Safari Books Online



Is there a way to capture div tag by id?

2013-06-25 Thread eShard
let's say I have a div with id=myDiv
Is there a way to set up the solr update/extract handler to capture just that
particular div?





Re: Is there a way to capture div tag by id?

2013-06-25 Thread Jack Krupansky
Sorry, but not only can you not capture that specific div, but you cannot
capture ANY div. Really. For some mysterious reason, Tika silently eats
div HTML parsing events. Plenty of other HTML tags can be captured, but
not div.


Both the Solr Wiki for Solr Cell and the new/Lucid Apache Solr Reference 
Guide mislead people with examples that clearly can never run as expected 
with real data.
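
For reference, a capture request of the kind in question can be sketched as
below. This only builds the URL, it does not contact Solr. The parameter names
(literal.id, capture, fmap.*) are Solr Cell request parameters; the base URL,
document id, target field foo_t, and the helper name build_extract_url are
hypothetical. Per the explanation above, do not expect div content to show up
in the mapped field.

```python
from urllib.parse import urlencode


def build_extract_url(base_url, doc_id, tag, target_field):
    """Build a Solr Cell /update/extract URL that asks for one tag to be
    captured and mapped to a target field."""
    params = {
        "literal.id": doc_id,         # unique key for the indexed document
        "capture": tag,               # XHTML element name to capture
        f"fmap.{tag}": target_field,  # map the captured element to a field
        "commit": "true",
    }
    return f"{base_url}/update/extract?{urlencode(params)}"


# Hypothetical local Solr instance and field name.
url = build_extract_url("http://localhost:8983/solr", "doc1", "div", "foo_t")
print(url)
```

Swapping "div" for a tag that Tika does pass through, such as "p", is a quick
way to check that the rest of the request is set up correctly.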


-- Jack Krupansky
-----Original Message-----
From: eShard
Sent: Tuesday, June 25, 2013 1:17 PM
To: solr-user@lucene.apache.org
Subject: Is there a way to capture div tag by id?

let's say I have a div with id=myDiv
Is there a way to set up the solr update/extract handler to capture just that
particular div?


