Re: [ol-tech] Full text search and some code clarification

Anand Chitipothu Thu, 26 Jun 2014 00:11:26 -0700

Hi Ankush,

I've recently updated the repository with the current schema of search inside 
solr.


https://github.com/internetarchive/openlibrary/blob/master/conf/solr-biblio/inside/conf/schema.xml

Yes, we are still running Solr 1.4. 

Anand
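
One possible cause of the highlighting cutoff described in the quoted message
below: Solr's highlighter analyzes only the first hl.maxAnalyzedChars
characters of a field, and the default is 51200. Here is a minimal sketch of a
query that raises that limit and asks for more snippets. The core URL and the
field name "body" are hypothetical placeholders that this thread does not
confirm.

```python
from urllib.parse import urlencode

# Hypothetical values: adjust to your own Solr core and schema.
SOLR_SELECT_URL = "http://localhost:8983/solr/inside/select"
TEXT_FIELD = "body"


def build_highlight_url(query, max_analyzed_chars=2147483647, snippets=1000):
    """Build a Solr select URL that highlights matches across the whole field."""
    params = [
        ("q", query),
        ("wt", "json"),
        ("hl", "true"),
        ("hl.fl", TEXT_FIELD),
        # Default is 51200; matches beyond that offset are not highlighted.
        ("hl.maxAnalyzedChars", str(max_analyzed_chars)),
        # Default is 1 snippet per field; raise it to see more occurrences.
        ("hl.snippets", str(snippets)),
    ]
    return SOLR_SELECT_URL + "?" + urlencode(params)


if __name__ == "__main__":
    print(build_highlight_url("whale"))
```

Whether this resolves the issue on a given 1.4 setup would need to be checked
against the actual schema and highlighter configuration.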
 
On 25-Jun-2014, at 8:45 PM, Ankush wrote:

> Hey Guys,
> 
> I am able to use the schema in solr-biblio/inside/ for the one called by 
> BookReader. However, I feel the schema is a bit outdated. Are you using solr 
> 1.4.1 here? 
> 
> I have made my setup with 1.4.1; however, I am running into some problems 
> here. For any search query, Solr isn't highlighting all the matched 
> occurrences. Only matches within approximately the first 54K characters of 
> the field are being highlighted. 
> 
> This is the stackoverflow question I have raised for the query - 
> http://stackoverflow.com/questions/24364900/show-all-occurrences-of-query-while-highlighting-in-solr-1-4
> 
> I know this is a Solr question, but if I could get any clarification on 
> this, it would be immensely helpful.
> For example, if I am using an incorrect schema, please direct me to the 
> correct one.
>  
> Ankush Chadda
> about.me/iamkhush
> 
>  
> 
> 
> On Mon, Jun 16, 2014 at 9:40 AM, Anand Chitipothu <[email protected]> wrote:
> On 16-Jun-2014, at 12:49 AM, Ankush wrote:
> 
>> Hey,
>> 
>> I am trying to implement fulltext search on my website, which uses the 
>> openlibrary framework. I don't have prior experience with Solr. Can you 
>> help me clear my doubts?
>> 
>> The schema that you currently use for fulltext search is the inside core of 
>> solr-biblio 
>> (https://github.com/internetarchive/openlibrary/tree/master/conf/solr-biblio).
>>  
> 
> No, it is used for searching work records in openlibrary, not fulltext search.
> 
>  http://openlibrary.org/search
>> Is solr-biblio used for all the searches on website?
> 
> No, fulltext search uses a completely different Solr instance with a 
> different schema.
>> Now in order to index the books, I saw the script index_all.py
>> (https://github.com/internetarchive/openlibrary/tree/master/openlibrary/solr/inside/index_all.py).
>> This script makes a request to fulltext/abbyy_to_text.php, gets page_count 
>> and body, and uses them to index. Now abbyy_to_text.php is in the 
>> BookReaderIA dir, and it uses extract_paragraphs.py to return the data. 
>> What I cannot understand is that extract_paragraphs.py prints page_count 
>> in 'meta:...' 
>> (https://github.com/openlibrary/bookreader/blob/master/BookReaderIA/fulltext/extract_paragraphs.py#L155),
>> but abbyy_to_text.php is trying to fetch a string 'page count' from the 
>> data 
>> (https://github.com/internetarchive/openlibrary/blob/master/openlibrary/solr/inside/index_all.py#L130).
>> How is this working on your end?
> 
> It is not in my head right now (I'm not the one who implemented it). I'll 
> look at how it works and let you know.
> 
> Anand
> 
> _______________________________________________
> Ol-tech mailing list
> [email protected]
> http://mail.archive.org/cgi-bin/mailman/listinfo/ol-tech
> Archives: http://www.mail-archive.com/[email protected]/
> To unsubscribe from this mailing list, send email to 
> [email protected]
