Extracting top level URL when indexing document

2018-06-12 Thread Hanjan, Harinder
Hello! I am indexing web documents and have a need to extract their top-level URL to be stored in a different field. I have had some success with the PatternTokenizerFactory (relevant schema bits at the bottom) but the behavior appears to be inconsistent. Most of the time, the top level URL
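
For readers skimming this thread, the extraction described above can also be done client-side before the document is sent to Solr. The following is only a minimal sketch, not the schema-based PatternTokenizerFactory approach from the message: it assumes that "top-level URL" means the scheme plus host (and port, if present), and the helper name `top_level_url` is hypothetical.

```python
from urllib.parse import urlsplit


def top_level_url(url: str) -> str:
    """Return scheme://host[:port] for an absolute URL, or "" if it has none.

    Assumption: "top-level URL" means scheme plus network location; the
    original thread does not define the term precisely.
    """
    parts = urlsplit(url.strip())
    # A relative or malformed value has no scheme or no network location.
    if not parts.scheme or not parts.netloc:
        return ""
    return f"{parts.scheme}://{parts.netloc}"
```

The returned value could then be supplied as its own field at index time, which avoids relying on tokenizer behavior for the extraction.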

RE: [EXT] Re: Extracting top level URL when indexing document

2018-06-13 Thread Hanjan, Harinder
. Regards, Alex On Wed, Jun 13, 2018, 01:02 Hanjan, Harinder, wrote: > Hello! > > I am indexing web documents and have a need to extract their top-level > URL to be stored in a different field. I have had some success with > the PatternTokenizerFactory (relevant schema bit

Type ahead functionality using complex phrase query parser

2018-08-15 Thread Hanjan, Harinder
Hello! I can't get Solr to give the results I would expect, and would appreciate it if someone could point me in the right direction here. /select?q={!complexphrase}"gar*" shows me the following terms -garages -garburator -gardening -gardens -garage -

RE: Type ahead functionality using complex phrase query parser

2018-08-15 Thread Hanjan, Harinder
Keeping the field as string so that no analysis is done on it has yielded promising results. I will test more tomorrow and report back. -Original Message- From: Hanjan, Harinder [mailto:harinder.han...@calgary.ca] Sent: Wednesday, August 15, 2018 5:01 PM To: solr-user

RE: [EXT] Re: field was indexed without position data; cannot run SpanTermQuery

2018-08-22 Thread Hanjan, Harinder
at 9:58 AM, Hanjan, Harinder wrote: > Hello! > > I am doing wildcard queries to satisfy our search type-ahead requirement for > both single and multi word (phrases) queries. > I just noticed this error in the logs. > > 2018-08-22 16:36:48.433 INFO (qtp1654589030-

field was indexed without position data; cannot run SpanTermQuery

2018-08-22 Thread Hanjan, Harinder
Hello! I am doing wildcard queries to satisfy our search type-ahead requirement for both single and multi word (phrases) queries. I just noticed this error in the logs. 2018-08-22 16:36:48.433 INFO (qtp1654589030-18) [ x:suggestions] o.a.s.c.S.Request [suggestions] webapp=/solr

Getting "zip bomb" exception while sending HTML document to solr

2018-04-05 Thread Hanjan, Harinder
Hello! I'm sending an HTML document to Solr and Tika is throwing the "Zip bomb detected!" exception back. Looks like Tika has an arbitrary limit of 100 levels of XML element nesting

How to use Tika (Solr Cell) to extract content from HTML document instead of Solr's MostlyPassthroughHtmlMapper ?

2018-04-09 Thread Hanjan, Harinder
Hello! Solr (i.e. Tika) throws a "zip bomb" exception with certain documents we have in our Sharepoint system. I have used the tika-app.jar directly to extract the document in question and it does _not_ throw an exception and extracts the contents just fine. So it would seem Solr is doing

RE: How to use Tika (Solr Cell) to extract content from HTML document instead of Solr's MostlyPassthroughHtmlMapper ?

2018-04-09 Thread Hanjan, Harinder
and prevent it bringing down your Solr installation. Cheers Charlie On 9 April 2018 at 16:59, Hanjan, Harinder <harinder.han...@calgary.ca> wrote: > Hello! > > Solr (i.e. Tika) throws a "zip bomb" exception with certain documents > we have in our Sharepoint

RE: [EXT] Re: How to use Tika (Solr Cell) to extract content from HTML document instead of Solr's MostlyPassthroughHtmlMapper ?

2018-04-09 Thread Hanjan, Harinder
written by a colleague of mine at Flax. Hope this is useful. Cheers Charlie On 9 April 2018 at 19:26, Hanjan, Harinder <harinder.han...@calgary.ca> wrote: > Thank you Charlie,

RE: Search Analytics Help

2018-04-26 Thread Hanjan, Harinder
This seems promising https://github.com/lucidworks/banana -Original Message- From: Ennio Bozzetti [mailto:ebozze...@thorlabs.com] Sent: Thursday, April 26, 2018 1:39 PM To: solr-user@lucene.apache.org Subject: [EXT] Search Analytics Help Hello, I'm setting up SOLR on an internal

RE: [EXT] Re: Faceting with a multi valued field

2018-09-27 Thread Hanjan, Harinder
query to be a facet query, this will apply the query to the resulting facet set instead of the Communities field itself. -- John Blythe On Tue, Sep 25, 2018 at 4:15 PM Hanjan, Harinder wrote: > Hello! > > I am doing faceting on a field which has multiple values and it's > yielding e

RE: [EXT] Re: Faceting with a multi valued field

2018-09-27 Thread Hanjan, Harinder
2018 at 16:50, John Blythe wrote: > you can update your filter query to be a facet query, this will apply > the query to the resulting facet set instead of the Communities field itself. > > -- > John Blythe > > > On Tue, Sep 25, 2018 at 4:15 PM Hanjan, Harinder >
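
The suggestion quoted above (turning the filter query into a facet query) might translate into request parameters along these lines. This is a rough sketch under stated assumptions: the field name `Communities` and the sample value come from the original question, while the helper name `facet_query_params`, the `q=*:*` match-all query, and `rows=0` are illustrative choices not taken from the thread.

```python
from urllib.parse import urlencode


def facet_query_params(field: str, value: str) -> str:
    """Build a Solr query string that counts one field value via facet.query.

    Assumption: the value is matched as a quoted phrase; backslashes and
    double quotes inside it are escaped so the phrase stays well-formed.
    """
    phrase = value.replace("\\", "\\\\").replace('"', '\\"')
    params = [
        ("q", "*:*"),            # match everything (illustrative choice)
        ("rows", "0"),           # only the facet counts are of interest here
        ("facet", "true"),
        ("facet.field", field),  # regular field facet, kept for comparison
        ("facet.query", f'{field}:"{phrase}"'),
    ]
    return urlencode(params)
```

For example, `facet_query_params("Communities", "BANFF TRAIL - BNF")` yields a query string that can be appended to a `/select?` request; whether this gives the behaviour the original poster wanted is not confirmed by the truncated thread.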

Faceting with a multi valued field

2018-09-25 Thread Hanjan, Harinder
Hello! I am doing faceting on a field which has multiple values and it's yielding expected but undesirable results. I need different behaviour but am not sure how to formulate a query for it. Here is my current setup. = Data Set = { "Communities":["BANFF TRAIL - BNF", "PARKDALE -