On Sun, Aug 16, 2009 at 9:45 AM, swamynathan <mesw...@gmail.com> wrote:

> Hi,
> I'm Swamynathan, a computer science engineering student at Jaya Engineering
> College, which is under Anna University, Chennai, India.
> As part of my final-year curriculum I need to do a project.
> I spoke with some Solr users and programmers and found out that all content
> indexed into it is stored as plain text and the structure is not
> preserved (headings, bold, and underlined text all get the same preference).
>

Welcome to Solr!

First, I think you do not have the right information about Solr. Solr stores
documents made up of field/value pairs, and some fields may be multi-valued.
If you store HTML in a single field, it is stored as text. If you parse your
HTML and store headings, bold text, and italics as separate fields, Solr will
store them separately.
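A minimal sketch of that idea in Python, assuming hypothetical fields named
`heading`, `bold`, and `body` that you would have to declare yourself in
schema.xml (they are not built in). It uses the standard library's
`html.parser` to route text into fields by its enclosing tag, then builds a
`dismax` query whose `qf` boosts (values chosen arbitrarily here) weight
heading matches above body matches:

```python
from html.parser import HTMLParser
from urllib.parse import urlencode

# Hypothetical field names; they must be declared in your own schema.xml.
HEADING_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6"}
BOLD_TAGS = {"b", "strong"}


class FieldExtractor(HTMLParser):
    """Collects text into separate fields based on the enclosing tags."""

    def __init__(self):
        super().__init__()
        self.stack = []  # tags currently open
        self.fields = {"heading": [], "bold": [], "body": []}

    def handle_starttag(self, tag, attrs):
        self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag; tolerates unclosed tags like <br>.
        if tag in self.stack:
            while self.stack and self.stack.pop() != tag:
                pass

    def handle_data(self, data):
        text = data.strip()
        if not text:
            return
        if any(t in HEADING_TAGS for t in self.stack):
            self.fields["heading"].append(text)
        elif any(t in BOLD_TAGS for t in self.stack):
            self.fields["bold"].append(text)
        else:
            self.fields["body"].append(text)


def html_to_solr_doc(doc_id, html):
    """Returns a dict of (multi-valued) fields ready to be sent to Solr."""
    parser = FieldExtractor()
    parser.feed(html)
    parser.close()
    doc = {"id": doc_id}
    doc.update({name: vals for name, vals in parser.fields.items() if vals})
    return doc


def weighted_query(user_query):
    """Builds a dismax request path that boosts heading matches (assumed weights)."""
    params = {
        "q": user_query,
        "defType": "dismax",
        "qf": "heading^4 bold^2 body^1",
    }
    return "/select?" + urlencode(params)


if __name__ == "__main__":
    print(html_to_solr_doc("1", "<h1>ABC</h1><p>Some <b>XYZ</b> text</p>"))
    print(weighted_query("ABC"))
```

Sending the resulting dict to Solr (for example through the XML update
handler) is left out, and the boost values are only illustrative; they would
need tuning against real data.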


>
>
> 1) ABC:
>  <para>.........
> ..........
> .........
>
> 2) XYZ:
> <para>........
> ABC.........
> ......
>
>
> Now ABC gets the same preference in both, though the first one should rank
> higher because it is in a heading.
>

You may be interested in Apache Nutch, which is a crawler/indexer. AFAIK, it
already does this kind of thing.

Apart from this particular feature, there is plenty of work to be done in
Solr. We're close to releasing 1.4, and lots of interesting things are
planned for 1.5:

https://issues.apache.org/jira/secure/IssueNavigator.jspa?reset=true&mode=hide&sorter/order=DESC&sorter/field=priority&resolution=-1&pid=12310230&fixfor=12313566

http://wiki.apache.org/solr/HowToContribute



-- 
Regards,
Shalin Shekhar Mangar.
