date:20201110

exclude a solr node to not to take select requests

2020-11-10 Thread yaswanth kumar

Is there a way to configure one Solr node so that it does not take select
requests in a SolrCloud cluster?

-- 
Thanks & Regards,
Yaswanth Kumar Konathala.
yaswanth...@gmail.com

RE: Using Multiple collections with streaming expressions

2020-11-10 Thread ufuk yılmaz

Thanks again Erick, that’s a good idea!

Alternatively, I use an alias covering multiple collections in these 
situations, but there may be too many combinations of collections, so it’s not 
always suitable.
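For reference, such an alias is created with the Collections API `CREATEALIAS` action, which takes the alias `name` and a comma-separated `collections` list. The sketch below only builds the request URL. The base URL `http://localhost:8983/solr` and the alias name are hypothetical.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.List;

public class AliasUrl {
    // Builds a Collections API CREATEALIAS request URL that points one alias
    // at several collections. The base URL and names are illustrative assumptions.
    static String createAliasUrl(String baseUrl, String aliasName, List<String> collections) {
        String joined = String.join(",", collections);
        return baseUrl + "/admin/collections?action=CREATEALIAS"
                + "&name=" + URLEncoder.encode(aliasName, StandardCharsets.UTF_8)
                + "&collections=" + URLEncoder.encode(joined, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Hypothetical alias "collection1and2" over the two collections from this thread.
        System.out.println(createAliasUrl("http://localhost:8983/solr", "collection1and2",
                List.of("collection1", "collection2")));
    }
}
```

The comma comes out percent-encoded as `%2C`, which is harmless because Solr decodes request parameters. The alias name can then go where a single collection name would, e.g. `significantTerms(collection1and2, ...)`. This is the approach described above, and it has the same drawback: one alias is needed per combination of collections.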

Merged significantTerms streams will have meaningless scores in the tuples, I
think; it would be comparing apples and oranges. In this case, though, I'm only
interested in getting foreground counts, so that's a problem for another day.
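If only the foreground counts matter, tuples from the different collections can be combined per term and the scores ignored. The sketch below assumes each tuple exposes `term` and `foreground` fields, as significantTerms output is expected to. The tuples in `main` are made up.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class ForegroundTotals {
    // Sums the "foreground" count per "term" across tuples that came from
    // several collections. The "score" field is ignored because scores computed
    // against different collections are not comparable.
    static Map<String, Long> sumForeground(List<Map<String, Object>> tuples) {
        Map<String, Long> totals = new TreeMap<>();
        for (Map<String, Object> tuple : tuples) {
            String term = (String) tuple.get("term");
            long foreground = ((Number) tuple.get("foreground")).longValue();
            totals.merge(term, foreground, Long::sum);
        }
        return totals;
    }

    public static void main(String[] args) {
        // Made-up tuples standing in for the output of two significantTerms streams.
        List<Map<String, Object>> tuples = List.of(
                Map.<String, Object>of("term", "solr", "foreground", 3, "score", 0.5),
                Map.<String, Object>of("term", "solr", "foreground", 4, "score", 0.9),
                Map.<String, Object>of("term", "lucene", "foreground", 2, "score", 0.7));
        System.out.println(sumForeground(tuples)); // {lucene=2, solr=7}
    }
}
```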

What seemed strange to me was that the source code for streams appeared to
handle this case.


Sent from Mail for Windows 10

From: Erick Erickson
Sent: 10 November 2020 16:48
To: solr-user@lucene.apache.org
Subject: Re: Using Multiple collections with streaming expressions

Y

Re: Phrase query no hits when stopwords and FlattenGraphFilterFactory used

2020-11-10 Thread Walter Underwood

By far the simplest solution is to leave stopwords in the index. That also
improves relevance, because it becomes possible to search for “vitamin a” or
“to be or not to be”.

Stopword removal was a performance and disk space hack from the 1960s. It is no
longer needed. We were keeping stopwords in the index at Infoseek, back in 1996.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)


Re: s3 or other cloud hosted storage options?

2020-11-10 Thread Edward Ribeiro

Not yet. People at Salesforce are working on shared blob storage for Solr,
but AFAIK they are redesigning their approach, so it is still under
active development and not production-ready. See the talks below:

https://www.youtube.com/watch?v=6fE5KvOfb6A

https://www.youtube.com/watch?v=UeTFpNeJ1Fo=19s

Best,
Edward

On Mon, Oct 19, 2020 at 5:43 PM Michael Conrad  wrote:

> Hi all,
>
> Hopefully someone can provide insight.
>
> We are looking to see if there are any viable options for S3 or similar
> for index/data storage.
>
> Preferably (if possible) shared between nodes for dynamic scalability
> needs.
>
> -Mike/NewsRx
>

Re: Using Multiple collections with streaming expressions

2020-11-10 Thread Erick Erickson

You need to open multiple streams, one to each collection, then combine them.
For instance, open a significantTerms stream to collection1, another to
collection2, and wrap both in a merge stream.
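The sketch below builds that approach as a Java string, reusing the significantTerms parameters from the question. It assumes each significantTerms tuple carries a `term` field. Each stream is wrapped in the `sort` decorator because `merge` expects its inputs to be sorted on the field given in `on`, and significantTerms output is not guaranteed to be ordered by term.

```java
import java.util.List;
import java.util.stream.Collectors;

public class MergeExpression {
    // significantTerms parameters copied from the example in this thread.
    static final String PARAMS = "q=\"body:Solr\", field=\"author\", limit=\"50\", "
            + "minDocFreq=\"10\", maxDocFreq=\".20\", minTermLength=\"5\"";

    // Opens one significantTerms stream per collection, re-sorts each by term
    // (merge requires inputs sorted on its "on" field), then merges them.
    static String mergedSignificantTerms(List<String> collections) {
        String streams = collections.stream()
                .map(c -> "sort(significantTerms(" + c + ", " + PARAMS + "), by=\"term asc\")")
                .collect(Collectors.joining(", "));
        return "merge(" + streams + ", on=\"term asc\")";
    }

    public static void main(String[] args) {
        System.out.println(mergedSignificantTerms(List.of("collection1", "collection2")));
    }
}
```

The merged stream can contain the same term twice, once per collection, and its scores are not comparable across collections.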

Best,
Erick

> On Nov 9, 2020, at 1:58 PM, ufuk yılmaz  wrote:
> 
> For example the streaming expression significantTerms:
> 
> https://lucene.apache.org/solr/guide/8_4/stream-source-reference.html#significantterms
> 
> 
> significantTerms(collection1,
> q="body:Solr",
> field="author",
> limit="50",
> minDocFreq="10",
> maxDocFreq=".20",
> minTermLength="5")
> 
> Solr supports querying multiple collections at once, but I can’t figure out
> how to do that with streaming expressions.
> When I try enclosing them in quotes like:
> 
> significantTerms(“collection1, collection2”,
> q="body:Solr",
> field="author",
> limit="50",
> minDocFreq="10",
> maxDocFreq=".20",
> minTermLength="5")
> 
> It gives the error: "EXCEPTION":"java.io.IOException: Slices not found for \" 
> collection1, collection2\""
> I think Solr treats the quotes as part of the collection names, hence it
> can’t find slices for them.
> 
> When I just use it without quotes:
> significantTerms(collection1, collection2,…
> It gives the error: "EXCEPTION":"invalid expression 
> significantTerms(collection1, collection2, …
> 
> I tried single quotes and escaping the quotation mark, but nothing works…
> 
> Any ideas?
> 
> Best, ufuk
> 
> Sent from Mail for Windows 10
>

get 'columns' name via JDBC ?

2020-11-10 Thread Vincent Bossuet

Hi all,

I'm trying to use the DatabaseMetaData Java object to get "column" names
dynamically.

I can get some information with this object, but not much. Is there any
documentation showing what is possible via the JDBC driver (metadata,
SQL...)?
Thanks!

Vincent
---
For example, I get :

databaseMetaData.getSchemas() :
  => TABLE_SCHEM : localhost:9983, TABLE_CATALOG : null
  => TABLE_SCHEM : metadata, TABLE_CATALOG : null

databaseMetaData.getCatalogs() :
  => TABLE_CAT  : null

databaseMetaData.getTables(null, null, null, null) :
  => TABLE_CAT : null, TABLE_SCHEM : localhost:9983, TABLE_NAME : test, TABLE_TYPE : TABLE, REMARKS : null
  => TABLE_CAT : null, TABLE_SCHEM : metadata, TABLE_NAME : COLUMNS, TABLE_TYPE : SYSTEM_TABLE, REMARKS : null
  => TABLE_CAT : null, TABLE_SCHEM : metadata, TABLE_NAME : TABLES, TABLE_TYPE : SYSTEM_TABLE, REMARKS : null
(columns with index >6 not available)

databaseMetaData.getColumns(null, null, null, null) :
  => null

databaseMetaData.getColumns(null, "metadata", "COLUMNS", null) :
  => null

Re: Phrase query no hits when stopwords and FlattenGraphFilterFactory used

2020-11-10 Thread Edward Turner

Hi all,

Okay, I've been doing more research about this problem and from what I
understand, phrase queries + stopwords are known to have some difficulties
working together in some circumstances.

E.g.,
https://stackoverflow.com/questions/56802656/stopwords-and-phrase-queries-solr?rq=1
https://issues.apache.org/jira/browse/SOLR-6468

I was thinking about workarounds, but each solution I've attempted doesn't
quite work.

Therefore, maybe one possible solution is to take a step back and
preprocess index/query data going to Solr, something like:

String wordsForSolr = removeStopWordsFrom("This is pretend index or query data");
// wordsForSolr = "pretend index query data"

Off the top of my head, this will bypass the position issues.
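A possible `removeStopWordsFrom`, sketched under two assumptions: it uses the stopword list quoted later in this message, and it splits on whitespace only. That is a simplification of the `[- /()]+` pattern tokenizer in the schema, and the method is a hypothetical helper, not a Solr API.

```java
import java.util.Arrays;
import java.util.Locale;
import java.util.Set;
import java.util.stream.Collectors;

public class StopwordPreprocessor {
    // The stopword list quoted in this thread.
    static final Set<String> STOPWORDS = Set.of(
            "a", "an", "and", "are", "as", "at", "be", "but", "by", "for", "if", "in",
            "into", "is", "it", "no", "not", "of", "on", "or", "such", "that", "the",
            "their", "then", "there", "these", "they", "this", "to", "was", "will",
            "with", "which");

    // Splits on whitespace and drops stopwords case-insensitively; the remaining
    // words keep their original casing and order.
    static String removeStopWordsFrom(String text) {
        return Arrays.stream(text.trim().split("\\s+"))
                .filter(word -> !STOPWORDS.contains(word.toLowerCase(Locale.ROOT)))
                .collect(Collectors.joining(" "));
    }

    public static void main(String[] args) {
        String wordsForSolr = removeStopWordsFrom("This is pretend index or query data");
        System.out.println(wordsForSolr); // pretend index query data
    }
}
```

Walter's caveat applies here too: `removeStopWordsFrom("to be or not to be")` returns an empty string, so such a query would have nothing left to match.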

I will give this a go, but was wondering whether this is something others
have done?

Best wishes,
Edd


Edward Turner


On Fri, 6 Nov 2020 at 13:58, Edward Turner  wrote:

> Hi all,
>
> We are experiencing some unexpected behaviour for phrase queries which we
> believe might be related to the FlattenGraphFilterFactory and stopwords.
>
> Brief description: when performing a phrase query
> "Molecular cloning and evolution of the" => we get expected hits
> "Molecular cloning and evolution of the genes" => we get no hits
> (unexpected behaviour)
>
> I think it's worthwhile adding the analyzers we use to help you see what
> we're doing:
>  Analyzers 
> sortMissingLast="true" omitNorms="true" positionIncrementGap="100">
>
> pattern="[- /()]+" />
> ignoreCase="true" />
> preserveOriginal="false" />
>   
> generateNumberParts="1" splitOnCaseChange="0" preserveOriginal="0"
>  splitOnNumerics="0" stemEnglishPossessive="1"
> generateWordParts="1"
>  catenateNumbers="0" catenateWords="1" catenateAll="1" />
>   
>
>
> pattern="[- /()]+" />
> ignoreCase="true" />
> preserveOriginal="false" />
>   
> generateNumberParts="1" splitOnCaseChange="0" preserveOriginal="0"
>  splitOnNumerics="0" stemEnglishPossessive="1"
> generateWordParts="1"
>  catenateNumbers="0" catenateWords="0" catenateAll="0" />
>
> 
>  End of Analyzers 
>
>  Stopwords 
> We use the following stopwords:
> a, an, and, are, as, at, be, but, by, for, if, in, into, is, it, no, not,
> of, on, or, such, that, the, their, then, there, these, they, this, to,
> was, will, with, which
>  End of Stopwords 
>
>  Analysis Admin page output ---
> ... And to see what's going on when we're indexing/querying, I created a
> gist with an image of the (non-verbose) output of the Analysis admin page
> for the index data/query "Molecular cloning and evolution of the genes":
>
> https://gist.github.com/eddturner/81dbf409703aad402e9009b13d42e43c#file-analysis-admin-png
>
> Hopefully this link works, and you can see that the resulting terms and
> positions are identical until the FlattenGraphFilterFactory step in the
> "index" phase.
>
> Final stage of index analysis:
> (1)molecular (2)cloning (3) (4)evolution (5) (6)genes
>
> Final stage of query analysis:
> (1)molecular (2)cloning (3) (4)evolution (5) (6) (7)genes
>
> The empty positions are because of stopwords (presumably)
>  End of Analysis Admin page output ---
>
> Main question:
> Could someone explain why the FlattenGraphFilterFactory changes the
> position of the "genes" token? From what we see, this happens after a
> "the" (but we've not checked exhaustively, and continue to test).
>
> Perhaps we are doing something wrong in our analysis setup?
>
> Any help would be much appreciated -- getting phrase queries to work is an
> important use-case of ours.
>
> Kind regards and thank you in advance,
> Edd
> 
> Edward Turner
>
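The reported positions account for the missing hit. An exact phrase query (slop 0) matches only when each query term sits at the same offset from the first term in the document as it does in the query. In the index, "genes" is 5 positions after "molecular"; in the query it is 6. The sketch below checks this with the final-stage positions quoted above. It assumes the shorter query analyzes to the same first three positions, since its trailing stopwords produce no terms. It is a simplified model, not Lucene's matching code.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class PhrasePositions {
    // Builds an insertion-ordered term -> position map (one position per term).
    static Map<String, Integer> positions(String[] terms, int[] pos) {
        Map<String, Integer> map = new LinkedHashMap<>();
        for (int i = 0; i < terms.length; i++) {
            map.put(terms[i], pos[i]);
        }
        return map;
    }

    // Simplified exact-phrase check (slop 0): every query term must appear in the
    // index at the same offset from the first query term as it has in the query.
    static boolean phraseMatches(Map<String, Integer> index, Map<String, Integer> query) {
        String first = query.keySet().iterator().next();
        Integer anchor = index.get(first);
        if (anchor == null) {
            return false;
        }
        int shift = anchor - query.get(first);
        for (Map.Entry<String, Integer> e : query.entrySet()) {
            Integer indexed = index.get(e.getKey());
            if (indexed == null || indexed != e.getValue() + shift) {
                return false;
            }
        }
        return true;
    }

    // Final-stage positions from the Analysis admin page output quoted above.
    static final Map<String, Integer> INDEX = positions(
            new String[] {"molecular", "cloning", "evolution", "genes"}, new int[] {1, 2, 4, 6});
    static final Map<String, Integer> QUERY_WITH_GENES = positions(
            new String[] {"molecular", "cloning", "evolution", "genes"}, new int[] {1, 2, 4, 7});
    static final Map<String, Integer> QUERY_WITHOUT_GENES = positions(
            new String[] {"molecular", "cloning", "evolution"}, new int[] {1, 2, 4});

    public static void main(String[] args) {
        System.out.println(phraseMatches(INDEX, QUERY_WITH_GENES));    // false
        System.out.println(phraseMatches(INDEX, QUERY_WITHOUT_GENES)); // true
    }
}
```

Lucene's real phrase matching also handles repeated terms and non-zero slop. This sketch models only the single-occurrence, exact-phrase case relevant to the two queries in the report.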
