Re: [docbook-apps] WebHelp search - anybody working on improving?

2016-11-13 Thread Bob Stayton
I would be willing to help with this effort, but not lead it.  If 
someone were willing to evaluate better alternatives and integrate the 
code into DocBook's webhelp, I could write the XSL templates that 
generate the index files.


Bob Stayton
Sagehill Enterprises
b...@sagehill.net

On 11/11/2016 1:45 AM, Fekete, Róbert wrote:

Hi,

A quick Google search shows lunr.js
(https://github.com/olivernn/lunr.js ), and a possibly improved form of
it called elasticlunr.js (http://elasticlunr.com/). Does anyone have an
idea about how to integrate one of these into WebHelp? (Specifically,
how to generate the index file from the WebHelp HTML files? This post has
some pointers, but I'm not a developer, so I'm unsure how to get it
going: https://29a.ch/2014/12/03/full-text-search-example-lunrjs )

HTH,

Robert


On Thu, Nov 10, 2016 at 9:21 PM, Jan Tosovsky wrote:

On 2016-11-10 Janice Manwiller wrote:
>
> The WebHelp search is a source of frustration, mostly because it does
> not support phrase searches...
>
> Has there been any effort to improve the search? Has anyone else
> implemented a custom search that supports phrase searches?
>

In the current implementation there is no way to do it. When the search
index is built, the original content is split into words, from which a
kind of look-up table is created (recording which word is present in
which file).

When a search phrase is entered, it is again split into separate words,
and each of them is looked up in that table.

The result is the number of occurrences of the given word in the
particular file, which is used for ordering the search results.

To support phrase searches, the search index would have to store the
full content. When performing the search, all those content snippets
would have to be processed using more complex algorithms.

But instead of reinventing the wheel, I believe there are some
lightweight JavaScript ports of the Lucene engine which could
somehow be integrated. However, I am not an expert in this field.

Jan


-
To unsubscribe, e-mail:
docbook-apps-unsubscr...@lists.oasis-open.org

For additional commands, e-mail:
docbook-apps-h...@lists.oasis-open.org








Re: [docbook-apps] WebHelp search - anybody working on improving?

2016-11-11 Thread Fekete, Róbert
Hi,

A quick Google search shows lunr.js (https://github.com/olivernn/lunr.js ),
and a possibly improved form of it called elasticlunr.js (
http://elasticlunr.com/). Does anyone have an idea about how to integrate
one of these into WebHelp? (Specifically, how to generate the index file
from the WebHelp HTML files? This post has some pointers, but I'm not a
developer, so I'm unsure how to get it going:
https://29a.ch/2014/12/03/full-text-search-example-lunrjs )
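
As a rough illustration of the "generate the index file from the HTML files" step, here is a minimal Python sketch. It only prepares a JSON list of documents (id, title, body text) that a client-side search library could then be fed; it does not build a lunr.js or elasticlunr.js index itself. The field names, the function names and the idea of using one document per HTML page are assumptions made for the example, not part of WebHelp.

```python
import json
from html.parser import HTMLParser
from pathlib import Path


class TextExtractor(HTMLParser):
    """Collect the <title> text and the visible text of one HTML page."""

    SKIPPED = {"script", "style"}  # content of these elements is ignored

    def __init__(self):
        super().__init__()
        self.title_parts = []
        self.body_parts = []
        self._stack = []  # currently open <title>/<script>/<style> elements

    def handle_starttag(self, tag, attrs):
        if tag == "title" or tag in self.SKIPPED:
            self._stack.append(tag)

    def handle_endtag(self, tag):
        if self._stack and self._stack[-1] == tag:
            self._stack.pop()

    def handle_data(self, data):
        current = self._stack[-1] if self._stack else None
        if current == "title":
            self.title_parts.append(data)
        elif current is None:
            self.body_parts.append(data)


def page_to_document(doc_id, html):
    """Turn one HTML page into a plain dict with normalized whitespace."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return {
        "id": doc_id,
        "title": " ".join("".join(parser.title_parts).split()),
        "body": " ".join(" ".join(parser.body_parts).split()),
    }


def build_documents(root):
    """Build one document per *.html file found below the output folder."""
    root = Path(root)
    return [
        page_to_document(path.relative_to(root).as_posix(),
                         path.read_text(encoding="utf-8"))
        for path in sorted(root.rglob("*.html"))
    ]


def write_documents(root, out_path):
    """Write the document list as JSON (hypothetical output file)."""
    with open(out_path, "w", encoding="utf-8") as out:
        json.dump(build_documents(root), out, ensure_ascii=False, indent=2)
```

Calling write_documents() with the generated WebHelp folder and an output file name of your choice would produce the JSON file; how that file is then loaded by the search library depends on the library chosen.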

HTH,

Robert


On Thu, Nov 10, 2016 at 9:21 PM, Jan Tosovsky  wrote:

> On 2016-11-10 Janice Manwiller wrote:
> >
> > The WebHelp search is a source of frustration, mostly because it does
> > not support phrase searches...
> >
> > Has there been any effort to improve the search? Has anyone else
> > implemented a custom search that supports phrase searches?
> >
>
> In the current implementation there is no way to do it. When the search
> index is built, the original content is split into words, from which a
> kind of look-up table is created (recording which word is present in
> which file).
>
> When a search phrase is entered, it is again split into separate words,
> and each of them is looked up in that table.
>
> The result is the number of occurrences of the given word in the
> particular file, which is used for ordering the search results.
>
> To support phrase searches, the search index would have to store the
> full content. When performing the search, all those content snippets
> would have to be processed using more complex algorithms.
>
> But instead of reinventing the wheel, I believe there are some
> lightweight JavaScript ports of the Lucene engine which could somehow
> be integrated. However, I am not an expert in this field.
>
> Jan


RE: [docbook-apps] WebHelp search - anybody working on improving?

2016-11-10 Thread Jan Tosovsky
On 2016-11-10 Janice Manwiller wrote:
> 
> The WebHelp search is a source of frustration, mostly because it does
> not support phrase searches...
> 
> Has there been any effort to improve the search? Has anyone else
> implemented a custom search that supports phrase searches?
> 

In the current implementation there is no way to do it. When the search index
is built, the original content is split into words, from which a kind of
look-up table is created (recording which word is present in which file).

When a search phrase is entered, it is again split into separate words, and
each of them is looked up in that table.

The result is the number of occurrences of the given word in the particular
file, which is used for ordering the search results.
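
The look-up table described above can be sketched in a few lines of Python. This is an illustration only, not the actual WebHelp code: the file names, the sample texts and the functions tokenize(), build_index() and search() are made up for the example.

```python
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lower-case the text and split it into words (punctuation is dropped)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(files):
    """Map each word to {file name: number of occurrences in that file}."""
    index = defaultdict(Counter)
    for name, text in files.items():
        for word in tokenize(text):
            index[word][name] += 1
    return index


def search(index, query):
    """Split the query into words and rank files by total occurrences."""
    scores = Counter()
    for word in tokenize(query):
        for name, count in index.get(word, {}).items():
            scores[name] += count
    return [name for name, _ in scores.most_common()]


# Hypothetical sample content standing in for the generated HTML pages.
files = {
    "a.html": "Source connectors read data. Source connectors are configured per source.",
    "b.html": "The source of the data determines which connectors apply.",
    "c.html": "Connectors are described elsewhere.",
}
index = build_index(files)

# The quotes are lost when the query is split into words, so both queries
# return the same list, including files that never contain the phrase.
print(search(index, "source connectors"))
print(search(index, '"source connectors"'))
```

Because only words and counts are stored, the table cannot tell whether "source" and "connectors" were adjacent in a file.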

To support phrase searches, the search index would have to store the full
content. When performing the search, all those content snippets would have to
be processed using more complex algorithms.
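
A minimal sketch of that idea in Python, assuming the full word sequence of each file is kept and scanned for the phrase. The function names and sample files are hypothetical, and a real engine would use something more efficient than a linear scan.

```python
import re


def tokenize(text):
    """Lower-case the text and split it into words, keeping their order."""
    return re.findall(r"[a-z0-9]+", text.lower())


def contains_phrase(words, phrase_words):
    """True if phrase_words occurs in words as one consecutive run."""
    n = len(phrase_words)
    return n > 0 and any(
        words[i:i + n] == phrase_words for i in range(len(words) - n + 1)
    )


def phrase_search(files, phrase):
    """Return the files whose content contains the exact phrase."""
    phrase_words = tokenize(phrase)
    return [
        name for name, text in files.items()
        if contains_phrase(tokenize(text), phrase_words)
    ]


# Hypothetical sample content standing in for the generated HTML pages.
files = {
    "a.html": "Source connectors read data. Source connectors are configured per source.",
    "b.html": "The source of the data determines which connectors apply.",
    "c.html": "Connectors are described elsewhere.",
}
print(phrase_search(files, "source connectors"))
```

The price is the one mentioned above: the stored data grows from a word table to the whole content, and every search has to walk through it.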

But instead of reinventing the wheel, I believe there are some lightweight
JavaScript ports of the Lucene engine which could somehow be integrated.
However, I am not an expert in this field.

Jan





Re: [docbook-apps] WebHelp search - anybody working on improving?

2016-11-10 Thread Peter Lavin
Current versions of Lucene are capable of finding single words or
phrases and can use various operators such as 'AND', 'NOT' etc. (See
https://lucene.apache.org/core/2_9_4/queryparsersyntax.html.) So the
search packaged with WebHelp is either an old version of Lucene or not
Lucene at all. The search capabilities packaged with other documentation
tools seem to suffer from the same limitation, for example the search
used by Sphinx (http://www.sphinx-doc.org/).

I have used Apache Solr--a web wrapper for Lucene--with DocBook HTML
and PHP output and been very pleased with the result. But you're
right, people expect Google-like behaviour and results and it's
difficult to get them to use the syntax required for sophisticated
Solr searches.

My 2 cents worth,

Peter

On 10 November 2016 at 09:07, Janice Manwiller  wrote:
> Actually, it looks like I get the same results with or without the quotes.
> I'm not sure the quotes are enforcing phrase searching.
>
> The results list for either source connectors or "source connectors" is
> divided into the following sections:
>
> Results for: connectors, source
> Results for: source
> Results for: connectors
>
> Janice
>
> On Thu, Nov 10, 2016 at 8:41 AM, Barton Wright  wrote:
>>
>> Of course, using quotes around “source connectors” does enforce phrase
>> searching. But you’re right that having that become more automatic as in
>> Google searches would be wonderful.
>>
>> As I remember DocBook WebHelp ends up with Apache Lucene as its search
>> engine, and Lucene is quite good. But, alas, the world has gotten very used
>> to Dr. Google.
>>
>> On Nov 10, 2016, at 7:09 AM, Camille Bégnis  wrote:
>>
>> We've been facing the issue too, and any tip will be appreciated.
>>
>> The issue is that people are used to Google Search, and doing that on the
>> client side is not easy ;-)
>>
>> Cheers,
>>
>> NeoDoc
>> Camille Bégnis
>> Gérant
>> cami...@neodoc.fr
>> Tél: 04.42.52.24.20
>> http://www.neodoc.fr/
>> 789, rue de la gare
>> F-13770 Venelles
>>
>> On 10/11/2016 at 13:02, Janice Manwiller wrote:
>>
>> I use DocBook source with the docbkx Maven plugin to generate PDFs,
>> WebHelp, and some HTML.
>>
>> The WebHelp search is a source of frustration, mostly because it does not
>> support phrase searches. So if you search for "source connectors", it looks
>> for topics that have either the word source or the word connectors. While
>> topics that contain both words do bubble to the top, the search does not
>> specifically look for the phrase "source connectors", where both words are
>> together in that order.
>>
>> Has there been any effort to improve the search? Has anyone else
>> implemented a custom search that supports phrase searches?
>>
>> Thanks,
>>
>> Janice
>>
>> --
>> Janice Manwiller
>> Principal Technical Writer
>> Sqrrl Data, Inc.
>> www.sqrrl.com | @SqrrlData
>>
>>
>>
>
>
>
> --
> Janice Manwiller
> Principal Technical Writer
> Sqrrl Data, Inc.
> www.sqrrl.com | @SqrrlData



-- 

Peter Lavin
Telephone:  1 416 461 4991
Mobile:1 416 882 9194
Skype: peter.lavin
(GMT -05:00 Canada/US Eastern)




Re: [docbook-apps] WebHelp search - anybody working on improving?

2016-11-10 Thread Janice Manwiller
Actually, it looks like I get the same results with or without the quotes.
I'm not sure the quotes are enforcing phrase searching.

The results list for either source connectors or "source connectors" is
divided into the following sections:

Results for: connectors, source
Results for: source
Results for: connectors

Janice

On Thu, Nov 10, 2016 at 8:41 AM, Barton Wright  wrote:

> Of course, using quotes around “source connectors” does enforce phrase
> searching. But you’re right that having that become more automatic as in
> Google searches would be wonderful.
>
> As I remember DocBook WebHelp ends up with Apache Lucene as its search
> engine, and Lucene is quite good. But, alas, the world has gotten very used
> to Dr. Google.
>
> On Nov 10, 2016, at 7:09 AM, Camille Bégnis  wrote:
>
> We've been facing the issue too, and any tip will be appreciated.
>
> The issue is that people are used to Google Search, and doing that on the
> client side is not easy ;-)
>
> Cheers,
> NeoDoc
> Camille Bégnis
> Gérant
> cami...@neodoc.fr
> Tél: 04.42.52.24.20
> http://www.neodoc.fr/
> 789, rue de la gare
> F-13770 Venelles
>
> On 10/11/2016 at 13:02, Janice Manwiller wrote:
>
> I use DocBook source with the docbkx Maven plugin to generate PDFs,
> WebHelp, and some HTML.
>
> The WebHelp search is a source of frustration, mostly because it does not
> support phrase searches. So if you search for "source connectors", it looks
> for topics that have either the word source or the word connectors. While
> topics that contain both words do bubble to the top, the search does not
> specifically look for the phrase "source connectors", where both words are
> together in that order.
>
> Has there been any effort to improve the search? Has anyone else
> implemented a custom search that supports phrase searches?
>
> Thanks,
>
> Janice
>
> --
> Janice Manwiller
> Principal Technical Writer
> Sqrrl Data, Inc.
> www.sqrrl.com | @SqrrlData
>
>
>
>


-- 
Janice Manwiller
Principal Technical Writer
Sqrrl Data, Inc.
www.sqrrl.com | @SqrrlData


Re: [docbook-apps] WebHelp search - anybody working on improving?

2016-11-10 Thread Barton Wright
Of course, using quotes around “source connectors” does enforce phrase 
searching. But you’re right that having that become more automatic as in Google 
searches would be wonderful. 

As I remember, DocBook WebHelp ends up with Apache Lucene as its search engine,
and Lucene is quite good. But, alas, the world has gotten very used to Dr. 
Google.

> On Nov 10, 2016, at 7:09 AM, Camille Bégnis  wrote:
> 
> We've been facing the issue too, and any tip will be appreciated.
> 
> The issue is that people are used to Google Search, and doing that on the 
> client side is not easy ;-)
> 
> Cheers,
> 
> NeoDoc
> Camille Bégnis
> Gérant
> cami...@neodoc.fr 
> Tél: 04.42.52.24.20
> http://www.neodoc.fr/ 
> 789, rue de la gare
> F-13770 Venelles
> 
> On 10/11/2016 at 13:02, Janice Manwiller wrote:
>> I use DocBook source with the docbkx Maven plugin to generate PDFs, WebHelp, 
>> and some HTML.
>> 
>> The WebHelp search is a source of frustration, mostly because it does not 
>> support phrase searches. So if you search for "source connectors", it looks 
>> for topics that have either the word source or the word connectors. While 
>> topics that contain both words do bubble to the top, the search does not 
>> specifically look for the phrase "source connectors", where both words are 
>> together in that order.
>> 
>> Has there been any effort to improve the search? Has anyone else implemented 
>> a custom search that supports phrase searches?
>> 
>> Thanks,
>> 
>> Janice
>> 
>> -- 
>> Janice Manwiller
>> Principal Technical Writer
>> Sqrrl Data, Inc.
>> www.sqrrl.com  | @SqrrlData
> 



Re: [docbook-apps] WebHelp search - anybody working on improving?

2016-11-10 Thread Camille Bégnis
We've been facing the issue too, and any tip will be appreciated.

The issue is that people are used to Google Search, and doing that on
the client side is not easy ;-)

Cheers,

NeoDoc
Camille Bégnis
Gérant
cami...@neodoc.fr
Tél: 04.42.52.24.20
http://www.neodoc.fr/
789, rue de la gare
F-13770 Venelles

On 10/11/2016 at 13:02, Janice Manwiller wrote:
> I use DocBook source with the docbkx Maven plugin to generate PDFs,
> WebHelp, and some HTML.
>
> The WebHelp search is a source of frustration, mostly because it does
> not support phrase searches. So if you search for "source connectors",
> it looks for topics that have either the word source or the word
> connectors. While topics that contain both words do bubble to the top,
> the search does not specifically look for the phrase "source
> connectors", where both words are together in that order.
>
> Has there been any effort to improve the search? Has anyone else
> implemented a custom search that supports phrase searches?
>
> Thanks,
>
> Janice
>
> -- 
> Janice Manwiller
> Principal Technical Writer
> Sqrrl Data, Inc.
> www.sqrrl.com  | @SqrrlData