Re: $deleteDocByQuery and $deleteDocByID

2019-11-11 Thread Paresh
Hi Erik,

I am also looking for an example of $deleteDocByQuery. Here is my
requirement -

I want to run a database query, get a list of values, and delete the
matching documents from Solr.

I want to delete the docs that match the following query:
SolrColumnName:<value>

The <value> will come from a query executed on the RDBMS -
select columnName from Table where state = 'deleted'

This columnName is the value populated in Solr for SolrColumnName.
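
Putting it together, I imagine the DIH entity would look roughly like this
(only a sketch - the entity, table and column names are placeholders for my
real ones):

  <entity name="deletedDocs"
          query="SELECT CONCAT('SolrColumnName:', columnName)
                 AS '$deleteDocByQuery'
                 FROM Table WHERE state = 'deleted'"/>

Is that the correct usage?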

Regards,
Paresh



--
Sent from: https://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: Solr 7.7.0: Log file not getting generated

2019-11-11 Thread Paresh
Hi Paras,

I have a db-data-config.xml file with entities defined for data population
and updates.

As part of a process, I want to delete documents from Solr for data deleted
in the RDBMS. For this purpose I am writing SQL to fetch the deleted keys
from the RDBMS and remove them from Solr using $deleteDocByQuery.

I want to debug whether the query being formed is correct or not -

Here is the snippet of the entity -

query="SELECT CONCAT( 'ColName:', dbCol ) AS '$deleteDocByQuery' FROM TABLE1
t1 WHERE t1.state = 1 AND t1.lmd &gt; TO_DATE
('${dih.last_index_time}','YYYY-MM-DD HH24:MI:SS')"

With this entity I am not able to delete the docs from Solr using the
delta-import operation, so I wanted to debug it.

Regards,
Paresh



--
Sent from: https://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: Question about memory usage and file handling

2019-11-11 Thread Erick Erickson
(1) No. The internal RAM buffer will pretty much limit the amount of heap
used, however.

(2) You actually have several segments. ".cfs" stands for "Compound File", see:

https://lucene.apache.org/core/7_1_0/core/org/apache/lucene/codecs/lucene70/package-summary.html
"An optional 'virtual' file consisting of all the other index files for systems
that frequently run out of file handles."

IOW, _0.cfs is a complete segment, _1.cfs is a different, complete segment, etc.
The merge policy (TieredMergePolicy) controls when these are used vs. the
segment being kept in separate files.

New segments are created whenever the ram buffer is flushed or whenever you do 
a commit (closing the IW also creates a segment IIUC). However, under control 
of the merge policy, segments are merged. See: 
http://blog.mikemccandless.com/2011/02/visualizing-lucenes-segment-merges.html

You’re confusing closing a writer with merging segments. Essentially, every 
time a commit happens, the merge policy is called to determine if segments 
should be merged, see Mike’s blog above.

Additionally, you say "I was hoping there would be only a _0.cfs file". This'll
pretty much never happen. Segment names always increase; at best you'd have
something like _ab.cfs, if not 10-15 _ab* files.

Lucene likes file handles: when searching, a file handle will be open for
_every_ file in your index all the time.

All that said, counting the number of files seems like a waste of time. If
you're running on a *nix box, the usual advice (for Solr I'll admit, but I
think it applies to Lucene as well) is to set the open-file limit to 65K or so.
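
For example (a sketch - the exact mechanism varies by OS; a permanent
setting belongs in /etc/security/limits.conf):

  ulimit -n 65536   # raise the open-file limit for the current shell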

And if you're truly concerned, and since you say this is an immutable index,
you can do a forceMerge. Prior to Lucene 7.5, that would by default form
exactly one segment. For Lucene 7.5 and later, it'll respect the max segment
size (a parameter in TMP, defaulting to 5g) unless you specify a segment count
of 1.
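
Something like this (a sketch; "writer" is your IndexWriter):

  // merge down to a single segment -- expensive, but fine for an
  // immutable index that is written once and only searched afterwards
  writer.forceMerge(1);
  writer.commit();
  writer.close();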

Best,
Erick

> On Nov 11, 2019, at 5:47 PM, Shawn Heisey  wrote:
> 
> On 11/11/2019 1:40 PM, siddharth teotia wrote:
>> I have a few questions about Lucene indexing and file handling. It would be
>> great if someone can help with these. I had earlier asked these questions
>> on gene...@lucene.apache.org but was asked to seek help here.
> 
> This mailing list (solr-user) is for Solr.  Questions about Lucene do not 
> belong on this list.
> 
> You should ask on the java-user mailing list, which is for questions related 
> to the core (Java) version of Lucene.
> 
> http://lucene.apache.org/core/discussion.html#java-user-list-java-userluceneapacheorg
> 
> I have put the original sender address in the BCC field just in case you are 
> not subscribed here.
> 
> Thanks,
> Shawn



Re: different results in numFound vs using the cursor

2019-11-11 Thread Chris Hostetter


Based on the info provided, it's hard to be certain, but reading between
the lines here are the assumptions I'm making...

1) your core name is "dbtr"
2) the uniqueId field for the "dbtr" core is "debtor_id"

..are those assumptions correct?

Two key pieces of information that can't be gleaned from the info you've
provided:

a) What is the fieldType of the uniqueKey field in use?
b) How are you determining that "The numFound" is 35008?

...

You show the code that prints out "size of solrResults: 22006" but nothing
in your code ever prints $numFound.  There is a snippet of code at the top
of your perl logic that seems disconnected from the rest of the code, which
makes me think that before you do anything with a cursor you are already
parsing some *other* query response to get $numFound that way...

: i am using this logic in perl:
: 
: my $decoded = decode_json( $solrResponse->{_content} );
: my $numFound = $decoded->{response}{numFound};
: 
: $cursor = "*";
: $prevCursor = '';
: 
: while ( $prevCursor ne $cursor )
: {
:   my $solrURI = "\"http://[SOLR URL]:8983/solr/";
:   $solrURI .= $fdat{core};
...

...what exactly does all the code *before* this look like? What is the
request that you are using to get that initial '$solrResponse' that you
are parsing to extract '$numFound'? Are you sure it's exactly the same as
the query whose cursor you are iterating over?

It looks like you are (also) extracting 'my $numFound =
$decoded->{response}{numFound};' on every (cursor) request ... what do you
get if you add this to your cursor loop...

   print STDERR "numFound = $numFound at '$cursor'\n";


...because unless documents are being added/deleted as you iterate over
the cursor, the numFound value should be consistent on each request.


-Hoss
http://www.lucidworks.com/


Re: Question about memory usage and file handling

2019-11-11 Thread Shawn Heisey

On 11/11/2019 1:40 PM, siddharth teotia wrote:

I have a few questions about Lucene indexing and file handling. It would be
great if someone can help with these. I had earlier asked these questions
on gene...@lucene.apache.org but was asked to seek help here.


This mailing list (solr-user) is for Solr.  Questions about Lucene do 
not belong on this list.


You should ask on the java-user mailing list, which is for questions 
related to the core (Java) version of Lucene.


http://lucene.apache.org/core/discussion.html#java-user-list-java-userluceneapacheorg

I have put the original sender address in the BCC field just in case you 
are not subscribed here.


Thanks,
Shawn


Multiple versions of same documents with different effective dates

2019-11-11 Thread Susheel Kumar
Hello,

I am trying to keep multiple versions of the same document (empId,
empName, deptID, effectiveDt, empTitle, ...) with different effective dates
(composite key: deptID, empID, effectiveDt), but mark / soft-delete the
older ones (deleted=Y) and keep deleted=N for the latest one.

This way I can query the latest one (AND deleted=N) and, if required,
show all of them.

I am thinking of doing this in processAdd / ScriptUpdateProcessor: query
Solr first to see if there is any existing record with the same deptID and
empID, update those with deleted=Y, and then process the new one with
deleted=N (see the sketch below).
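
In SolrJ terms, roughly (only a sketch - the collection and field values are
illustrative, this runs outside the URP for clarity, and "id" is assumed to
hold the composite key):

import java.util.Collections;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.common.SolrDocument;
import org.apache.solr.common.SolrInputDocument;

public class SoftDeleteSketch {
  public static void main(String[] args) throws Exception {
    HttpSolrClient client =
        new HttpSolrClient.Builder("http://localhost:8983/solr").build();
    // 1) find the current latest version(s) for this deptID/empID
    SolrQuery q = new SolrQuery("deptID:D100 AND empID:E42 AND deleted:N");
    for (SolrDocument old : client.query("emp", q).getResults()) {
      // 2) atomic update: flip the older version to deleted=Y
      SolrInputDocument upd = new SolrInputDocument();
      upd.addField("id", old.getFieldValue("id"));
      upd.addField("deleted", Collections.singletonMap("set", "Y"));
      client.add("emp", upd);
    }
    // 3) index the new version as the latest
    SolrInputDocument doc = new SolrInputDocument();
    doc.addField("id", "D100_E42_2019-11-11");  // composite key
    doc.addField("deleted", "N");
    client.add("emp", doc);
    client.commit("emp");
    client.close();
  }
}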

Any suggestions or issues you see with this approach?

Thanks,
Susheel

P.S. I need to figure out how to update another document at the same time
in processAdd.


Question about memory usage and file handling

2019-11-11 Thread siddharth teotia
Hi All,

I have a few questions about Lucene indexing and file handling. It would be
great if someone can help with these. I had earlier asked these questions
on gene...@lucene.apache.org but was asked to seek help here.


(1) During indexing, is there any knob to tell the writer to use off-heap
memory for buffering? I didn't find anything in the docs, so probably the
answer is no. Just confirming.

(2) I did some experiments with the buffering threshold using
setRAMBufferSizeMB() on IndexWriterConfig. I varied it from 16MB (the
default) to 128MB, 256MB and 512MB. The experiment ingested 5 million
documents. It turns out that the buffering threshold also controls the
number of files created in the index directory. In all cases, I see
only 1 segment (since there was just one segments_1 file) but there were
multiple .cfs files -- _0.cfs, _1.cfs, _2.cfs, _3.cfs.
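
For reference, my writer setup looks roughly like this (a sketch - the
analyzer and index path stand in for my real ones):

import java.nio.file.Paths;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.FSDirectory;

IndexWriterConfig cfg = new IndexWriterConfig(new StandardAnalyzer());
cfg.setRAMBufferSizeMB(256);  // varied per run: 16 (default), 128, 256, 512
try (IndexWriter writer =
         new IndexWriter(FSDirectory.open(Paths.get("/data/index")), cfg)) {
  // ... addDocument() for ~5 million docs ...
  writer.commit();
}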

How can there be multiple .cfs files when there is just one segment? My
understanding from the documentation was that all files for a segment
have the same name but different extensions. In this case, even though
there is only 1 segment, there are still multiple .cfs files. Does each
flush result in a new file?

The reason for this experiment is to understand the number of open files,
both while building the index and while querying. I am not quite sure why I
am seeing multiple .cfs files when there is only 1 segment. I was hoping
there would be only a _0.cfs file. This is true when the buffer threshold is
512MB, but there are 2 .cfs files when the threshold is set to 256MB, 5 .cfs
files when set to 128MB, and I didn't see any .cfs file for the default 16MB
threshold -- there were individual files (.fdx, .fdt, .tip, etc.). I thought
by default Lucene creates a compound file at least after the writer closes.
Is that not true?

I can see that during querying, only the .cfs file is kept open. But I
would like to understand a little more about the number of .cfs files, and
based on that we can set the buffering threshold to control the heap
overhead while building the index.

(3) In my experiments, the writer commits and is closed after ingesting all
5 million documents, and after that there is no need for us to index
more. So essentially it is an immutable index. However, I want to
understand the threshold for creating a new segment. Is that pretty high?
Or if the writer is reopened, will the next set of documents go into
the next segment, and so on?

I would really appreciate some help with the above questions.

Thanks,
Siddharth


Re: Full-text search for Solr manual

2019-11-11 Thread Alexandre Rafalovitch
Grep the source of the manual (which ships with the Solr source).

Or use a Google search with domain or keyword limitations.

The online copy's search is not powered by Solr yet. Yes, we are aware of the
irony and are discussing it.

Regards,
Alex

On Tue, Nov 12, 2019, 1:25 AM Luke Miller,  wrote:

> Hi,
>
>
>
> I just noticed that since Solr 8.2 the Apache Solr Reference Guide is not
> available anymore as PDF.
>
>
>
> Is there a way to perform a full-text search using the HTML manual? E.g.
> I'd
> like to find every hit for "luceneMatchVersion".
>
>
>
> *   Using the integrated "Page title lookup." does not find anything (
> -
> sure, it only looks up page titles. )
> *   Google does not return anything either searching for:
> site:https://lucene.apache.org/solr/guide/8_3/ luceneMatchVersion
>
>
>
> Is there another search method I missed?
>
>
>
> Thanks.
>
>


different results in numFound vs using the cursor

2019-11-11 Thread rhys J
i am using this logic in perl:

my $decoded = decode_json( $solrResponse->{_content} );
my $numFound = $decoded->{response}{numFound};

$cursor = "*";
$prevCursor = '';

while ( $prevCursor ne $cursor )
{
  my $solrURI = "\"http://[SOLR URL]:8983/solr/";
  $solrURI .= $fdat{core};

  $solrSort = ( $fdat{core} eq 'dbtr' ) ? "debtor_id+asc" : "id+asc";
  $solrOptions = "/select?indent=on&rows=$getrows&sort=$solrSort&q=";
  $solrURI .= $solrOptions;
  $solrURI .= $query;

  $solrURI .= ( $prevCursor eq '' ) ? "&cursorMark=*\""
                                    : "&cursorMark=$cursor\"";

  print STDERR "solrURI '$solrURI'\n";
  my $solrResponse = $ua->post( $solrURI );
  my $decoded = decode_json( $solrResponse->{_content} );
  my $numFound = $decoded->{response}{numFound};

  foreach my $d ( $decoded->{response}{docs} )
  {
    my @docs = @$d;
    print STDERR "size of docs '" . scalar( @docs ) . "'\n";
    foreach my $r ( @docs )
    {
      if ( $fdat{cust_num} and $fdat{core} eq 'dbtr' )
      {
        push ( @solrResults, $r->{debtor_id} );
      }
      elsif ( $fdat{cust_num} and $fdat{core} eq 'debt' )
      {
        push ( @solrResults, $r->{debt_id} );
      }
    }
  }

  $prevCursor = ( $prevCursor eq '' ) ? "*" : $cursor;
  $cursor = $decoded->{nextCursorMark};
  print STDERR "cursor '$cursor'\n";
  print STDERR "prevCursor '$prevCursor'\n";
  print STDERR "size of solrResults '" . scalar( @solrResults ) . "'\n";
}

print out:

http://[SOLR
URL]:8983/solr/debt/select?indent=on&rows=1000&sort=id+asc&q=debt_id:
608384 OR debt_id: 393291&cursorMark=AoEmMzkzMjkx

The numFound: 35008
final size of solrResults: 22006

Am I missing something I should be using with cursorMark? Or is this
expected?

I've checked my logic, and I'm using the cursors the way this page is using
them in examples:

https://lucene.apache.org/solr/guide/6_6/pagination-of-results.html

Thanks

Rhys


Re: sort by score in join with geodist()

2019-11-11 Thread Mikhail Khludnev
Is it something like  https://issues.apache.org/jira/browse/SOLR-10673 ?

On Mon, Nov 11, 2019 at 3:47 PM Vasily Ogar  wrote:

> it shows nothing because I got an error
> "metadata":[ "error-class","org.apache.solr.common.SolrException",
> "root-error-class","org.apache.solr.search.SyntaxError"],
> "msg":"org.apache.solr.search.SyntaxError:
> geodist - not enough parameters:[]",
>
> If I set parameters then I got another error
> "metadata":[ "error-class","org.apache.solr.common.SolrException",
> "root-error-class","org.apache.solr.common.SolrException"], "msg":"A
> ValueSource isn't directly available from this field. Instead try a query
> using the distance as the score.",
>
> On Mon, Nov 11, 2019 at 1:36 PM Mikhail Khludnev  wrote:
>
> > Hello, Vasily.
> > Why not? What have you got in debugQuery=true?
> >
> > On Mon, Nov 11, 2019 at 1:19 PM Vasily Ogar 
> wrote:
> >
> > > Hello,
> > > Is it possible to sort by score in join by geodist()? For instance,
> > > something like this
> > > q={!join from=site_id to=site_id fromIndex=stores score=max}
> > > +{!func}geodist() +{!geofilt sfield=coordinates
> > > pt=54.6973867999,25.22481530046 d=10}
> > > sort=score desc
> > > Thank you
> > >
> >
> >
> > --
> > Sincerely yours
> > Mikhail Khludnev
> >
>


-- 
Sincerely yours
Mikhail Khludnev


Re: Solr 7.2.1 - unexpected docvalues type

2019-11-11 Thread Antony Alphonse
Thank you both. I will look into the options.

-AA

On Mon, Nov 11, 2019 at 6:05 AM Emir Arnautović <
emir.arnauto...@sematext.com> wrote:

> Hi Antony,
> Like Erick explained, you still have to preprocess your field in order to
> be able to use doc values. What you can do is use update request processor
> chain and have all the logic in Solr. Here is blog post explaining how it
> could work:
> https://www.od-bits.com/2018/02/solr-docvalues-on-analysed-field.html <
> https://www.od-bits.com/2018/02/solr-docvalues-on-analysed-field.html>
>
> HTH,
> Emir
> --
> Monitoring - Log Management - Alerting - Anomaly Detection
> Solr & Elasticsearch Consulting Support Training - http://sematext.com/
>
>
>
> > On 10 Nov 2019, at 15:54, Erick Erickson 
> wrote:
> >
> > So “lowercase” is, indeed, a solr.TextField, which is ineligible for
> docValues. Given that definition, the difference will be that a “string”
> type is totally un-analyzed, so the values that go into the index and the
> query itself will be case-sensitive. You’ll have to pre-process both to do
> the right thing.
> >
> >> On Nov 9, 2019, at 6:15 PM, Antony Alphonse 
> wrote:
> >>
> >> Hi Shawn,
> >>
> >> Thank you. I switched the fieldType=string and it worked. I might have
> to
> >> check on the use-case to see if "string" will work for us.
> >>
> >> I have noted the "lowercase" field type which I believe is similar to
> the
> >> one in schema ver 1.6.
> >>
> >>
> >> <fieldType name="lowercase" class="solr.TextField"
> >>   positionIncrementGap="100">
> >>   <analyzer>
> >> <tokenizer class="solr.KeywordTokenizerFactory" />
> >> <filter class="solr.LowerCaseFilterFactory" />
> >>   </analyzer>
> >> </fieldType>
> >>
> >> Thanks,
> >> Antony
> >>
> >> On Sat, Nov 9, 2019 at 7:52 AM Erick Erickson 
> >> wrote:
> >>
> >>> We can’t answer whether you should change the field type for two
> reasons:
> >>>
> >>> 1> It depends on your use case.
> >>> 2> we don’t know what the field type “lowercase” does. It’s composed
> of an
> >>> analysis chain that you may have changed. And whatever config you are
> using
> >>> may have changed with different releases of Solr.
> >>>
> >>> Grouping is generally done on a docValues-eligible field type. AFAIK,
> >>> “lowercase” is a solr-text based field so is ineligible for docValues.
> I’ve
> >>> got to guess here, but I’d suggest you start with a fieldType of
> “string”,
> >>> and enable docValues on it.
> >>>
> >>> Best,
> >>> Erick
> >>>
> >>>
> >>>
>  On Nov 9, 2019, at 12:54 AM, Antony Alphonse <
> antonyaugus...@gmail.com>
> >>> wrote:
> 
> >
> > Hi Shawn,
> >
> 
>  I will try that solution. Also I had to mention that the queries that
> >>> fail
>  with this error has the "group.field":"lowercase". Should I change the
>  field type?
> 
>  Thanks,
>  Antony
> >>>
> >>>
> >
>
>


Re: Printing NULL character in log files.

2019-11-11 Thread Chris Hostetter


: Some of the log files that Solr generated contain <0x00> (null characters)
: in log files (like below)

I don't know of any reason why Solr would write any null bytes to the
logs, and certainly not in either of the places mentioned in your examples
(where it would be at the end of an otherwise "complete" log message).  If
those null bytes are in fact being written by the Solr JVM, they would have
to have come from log4j.  (The Logger abstraction would ensure that if
they came from Solr they would still have the date/time/level prefix, etc...)

A cursory bit of googling doesn't suggest any reason why log4j would write
null bytes spuriously to the log files -- but it does suggest that some
log rotation tools can cause this behavior.

Are you using the default Solr log4j log rotation, or some external tool?


: Does anyone have the same issue before?
: If anyone knows a way to fix this issue or a cause of this issue, could you
: please let me know?
: 
: Any clue will be very appreciated.
: 
: 
: [Example Log 1]
: 
: 2019-10-20 06:02:03.643 INFO  (coreCloseExecutor-140-thread-4) [
:  x:corename1] o.a.s.m.SolrMetricManager Closing metric reporters for
: registry=solr.core.corename,
: tag=4c16<0x00><0x00><0x00><0x00>...<0x00><0x00>00ff
: 2019-10-20 06:02:03.643 INFO  (coreCloseExecutor-140-thread-4) [
:   x:corename1] o.a.s.m.r.SolrJmxReporter Closing reporter
: [org.apache.solr.metrics.reporters.SolrJmxReporter@17281659: rootName =
: null, domain = solr.core.corename, service url = null, agent id = null] for
: registry solr.core.corename1/
: 
com.codahale.metrics.MetricRegistry@6c9f45cc<0x00><0x00><0x00><0x00>..(continue
: printing <0x00> untill the end of file.)
: 
: [Example Log 2]
: 
: 2019-10-27 06:02:02.891 INFO  (coreCloseExecutor-140-thread-17) [
: x:core2] o.a.s.m.r.SolrJmxReporter Closing reporter
: [org.apache.solr.metrics.reporters.SolrJmxReporter@35e76d2e: rootName =
: null, domain = solr.core.core2, service url = null, agent id = null] for
: registry solr.core.core2 / com.codahale.metrics.MetricRegistry@76be90f4
: 2019-10-27 06:02:02.891 INFO  (coreCloseExecutor-140-thread-26) [
: x:core3]<0x00><0x00><0x00><0x00><0x00><0x00><0x00><0x00>...<0x00><0x00>
: o.a.s.m.SolrMetricManager Closing metric reporters for
: registry=solr.core.TUN000, tag=34f04984
: 2019-10-27 06:02:02.891 INFO  (coreCloseExecutor-140-thread-26) [
: x:TUN000] o.a.s.m.r.SolrJmxReporter Closing reporter
: [org.apache.solr.metrics.reporters.SolrJmxReporter@378cecb: rootName =
: null, domain = solr.core.TUN000, service url = null, agent id = null] for
: registry solr.core.TUN000 / com.codahale.metrics.MetricRegistry@9c3410c
: 2019-10-27 06:02:05.063 INFO  (Thread-1) [   ] o.e.j.s.h.ContextHandler
: Stopped o.e.j.w.WebAppContext@5fbe4146
: 
{/solr,null,UNAVAILABLE}{file:///E:/apatchSolr/RCSS-basic-4.0.1/LUSOLR/solr/server//solr-webapp/webapp}
: <0x00><0x00><0x00><0x00><0x00><0x00>...(printing <0x00> until the end of
: the file)..<0x00><0x00>
: 
: 
: Sincerely,
: Kaya Ota
: 

-Hoss
http://www.lucidworks.com/


Full-text search for Solr manual

2019-11-11 Thread Luke Miller
Hi,

I just noticed that since Solr 8.2 the Apache Solr Reference Guide is not
available anymore as PDF.

Is there a way to perform a full-text search using the HTML manual? E.g. I'd
like to find every hit for "luceneMatchVersion".

*   Using the integrated "Page title lookup." does not find anything ( -
sure, it only looks up page titles. )
*   Google does not return anything either searching for:
site:https://lucene.apache.org/solr/guide/8_3/ luceneMatchVersion

Is there another search method I missed?

Thanks.



Re: SimplePostTool: WARNING: IOException while reading response: java.io.FileNotFoundException

2019-11-11 Thread jatinvyas
Thanks, it worked for me.



--
Sent from: https://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: Solr 7.2.1 - unexpected docvalues type

2019-11-11 Thread Emir Arnautović
Hi Antony,
Like Erick explained, you still have to preprocess your field in order to be 
able to use doc values. What you can do is use update request processor chain 
and have all the logic in Solr. Here is blog post explaining how it could work: 
https://www.od-bits.com/2018/02/solr-docvalues-on-analysed-field.html 


HTH,
Emir
--
Monitoring - Log Management - Alerting - Anomaly Detection
Solr & Elasticsearch Consulting Support Training - http://sematext.com/



> On 10 Nov 2019, at 15:54, Erick Erickson  wrote:
> 
> So “lowercase” is, indeed, a solr.TextField, which is ineligible for 
> docValues. Given that definition, the difference will be that a “string” type 
> is totally un-analyzed, so the values that go into the index and the query 
> itself will be case-sensitive. You’ll have to pre-process both to do the 
> right thing.
> 
>> On Nov 9, 2019, at 6:15 PM, Antony Alphonse  wrote:
>> 
>> Hi Shawn,
>> 
>> Thank you. I switched the fieldType=string and it worked. I might have to
>> check on the use-case to see if "string" will work for us.
>> 
>> I have noted the "lowercase" field type which I believe is similar to the
>> one in schema ver 1.6.
>> 
>> 
>> <fieldType name="lowercase" class="solr.TextField"
>>   positionIncrementGap="100">
>>   <analyzer>
>> <tokenizer class="solr.KeywordTokenizerFactory" />
>> <filter class="solr.LowerCaseFilterFactory" />
>>   </analyzer>
>> </fieldType>
>> 
>> Thanks,
>> Antony
>> 
>> On Sat, Nov 9, 2019 at 7:52 AM Erick Erickson 
>> wrote:
>> 
>>> We can’t answer whether you should change the field type for two reasons:
>>> 
>>> 1> It depends on your use case.
>>> 2> we don’t know what the field type “lowercase” does. It’s composed of an
>>> analysis chain that you may have changed. And whatever config you are using
>>> may have changed with different releases of Solr.
>>> 
>>> Grouping is generally done on a docValues-eligible field type. AFAIK,
>>> “lowercase” is a solr-text based field so is ineligible for docValues. I’ve
>>> got to guess here, but I’d suggest you start with a fieldType of “string”,
>>> and enable docValues on it.
>>> 
>>> Best,
>>> Erick
>>> 
>>> 
>>> 
 On Nov 9, 2019, at 12:54 AM, Antony Alphonse 
>>> wrote:
 
> 
> Hi Shawn,
> 
 
 I will try that solution. Also I had to mention that the queries that
>>> fail
 with this error has the "group.field":"lowercase". Should I change the
 field type?
 
 Thanks,
 Antony
>>> 
>>> 
> 



Re: sort by score in join with geodist()

2019-11-11 Thread Vasily Ogar
it shows nothing because I got an error
"metadata":[ "error-class","org.apache.solr.common.SolrException",
"root-error-class","org.apache.solr.search.SyntaxError"],
"msg":"org.apache.solr.search.SyntaxError:
geodist - not enough parameters:[]",

If I set parameters then I got another error
"metadata":[ "error-class","org.apache.solr.common.SolrException",
"root-error-class","org.apache.solr.common.SolrException"], "msg":"A
ValueSource isn't directly available from this field. Instead try a query
using the distance as the score.",
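
Presumably that message is pointing at the geofilt "score" local param,
i.e. something like the following (sfield/pt/d as in my original query) -
though I don't know how to combine it with the join:

q={!geofilt sfield=coordinates pt=54.6973867999,25.22481530046 d=10
score=kilometers}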

On Mon, Nov 11, 2019 at 1:36 PM Mikhail Khludnev  wrote:

> Hello, Vasily.
> Why not? What have you got in debugQuery=true?
>
> On Mon, Nov 11, 2019 at 1:19 PM Vasily Ogar  wrote:
>
> > Hello,
> > Is it possible to sort by score in join by geodist()? For instance,
> > something like this
> > q={!join from=site_id to=site_id fromIndex=stores score=max}
> > +{!func}geodist() +{!geofilt sfield=coordinates
> > pt=54.6973867999,25.22481530046 d=10}
> > sort=score desc
> > Thank you
> >
>
>
> --
> Sincerely yours
> Mikhail Khludnev
>


Re: When search term has two stopwords ('and' and 'a') together, it doesn't work

2019-11-11 Thread Guilherme Viteri
Thanks
> Removing stopwords is another story. I'm curious to find the reason
> assuming that you keep on using stopwords. In some cases, stopwords are
> really necessary.
Yes. It has always made sense the way we've been using them.

> If q.alt is giving you responses, it's confirmed that your stopwords filter
> is working as expected. The problem definitely lies in the configuration of
> edismax.
I see.

> *Let me explain again:* In your solrconfig.xml, look at your /search
Ok, using q now, I removed all qf, performed the search, and got 23 results,
with the one I really want at the top.
As soon as I add dbId or stId (regardless of the boost, 1.0 or 100.0), I
don't get anything (which makes sense). However, if I query name_exact, I get
the 23 results again, and unfortunately if I query stId^1.0 name_exact^10.0 I
still don't get any results.

In summary:
- without qf: 23 results
- dbId: 0 results
- name_exact: 16 results
- name: 23 results
- dbId^1.0 name_exact^10.0: 0 results
- 0 results whenever a key field (stId, dbId) is added on top of the
  name fields (name_exact, etc.)
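
For reference, the kind of request I am testing looks like this (a sketch,
not my exact handler config; boosts as in the summary above):

/select?defType=edismax&q=Lamin A&qf=name^1.0 name_exact^10.0&debugQuery=true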

Definitely lost here! :-/


> On 11 Nov 2019, at 07:59, Paras Lehana  wrote:
> 
> Hi
> 
> So I don't think removing it completely is the way to go from the scenario
>> we have
> 
> 
> Removing stopwords is another story. I'm curious to find the reason
> assuming that you keep on using stopwords. In some cases, stopwords are
> really necessary.
> 
> 
> Quite a considerable increase
> 
> 
> If q.alt is giving you responses, it's confirmed that your stopwords filter
> is working as expected. The problem definitely lies in the configuration of
> edismax.
> 
> 
> 
>> I am sorry but I didn't understand what do you want me to do exactly with
>> the lst (??) and qf and bf.
> 
> 
> What combinations did you try? I was referring to the field-level boosting
> you have applied in edismax config.
> 
> *Let me explain again:* In your solrconfig.xml, look at your /search
> request handler. There are many qf and some bq boosts. I want you to remove
> all of these, check response again (with q now) and keep on adding them
> again (one by one) while looking for when the numFound drastically changes.
> 
> On Fri, 8 Nov 2019 at 23:47, David Hastings 
> wrote:
> 
>> I use 3 word shingles with stopwords for my MLT ML trainer that worked
>> pretty well for such a solution, but for a full index the size became
>> prohibitive
>> 
>> On Fri, Nov 8, 2019 at 12:13 PM Walter Underwood 
>> wrote:
>> 
>>> If we had IDF for phrases, they would be super effective. The 2X weight
>> is
>>> a hack that mostly works.
>>> 
>>> Infoseek had phrase IDF and it was a killer algorithm for relevance.
>>> 
>>> wunder
>>> Walter Underwood
>>> wun...@wunderwood.org
>>> http://observer.wunderwood.org/  (my blog)
>>> 
 On Nov 8, 2019, at 11:08 AM, David Hastings <
>>> hastings.recurs...@gmail.com> wrote:
 
 the pf and qf fields are REALLY nice for this
 
 On Fri, Nov 8, 2019 at 12:02 PM Walter Underwood <
>> wun...@wunderwood.org>
 wrote:
 
> I always enable phrase searching in edismax for exactly this reason.
> 
> Something like:
> 
> pf=title^16 keywords^8 text^2
> 
> To deal with concepts in queries, a classifier and/or named entity
> extractor can be helpful. If you have a list of concepts (“controlled
> vocabulary”) that includes “Lamin A”, and that shows up in a query,
>> that
> term can be queried against the field matching that vocabulary.
> 
> This is how LinkedIn separates people, companies, and places, for
>>> example.
> 
> wunder
> Walter Underwood
> wun...@wunderwood.org
> http://observer.wunderwood.org/  (my blog)
> 
>> On Nov 8, 2019, at 10:48 AM, Erick Erickson >> 
> wrote:
>> 
>> Look at the “mm” parameter, try setting it to 100%. Although that’t
>> not
> entirely likely to do what you want either since virtually every doc
>>> will
> have “a” in it. But at least you’d get docs that have both terms.
>> 
>> you may also be able to search for things like “Lamin A” _only as a
> phrase_ and have some luck. But this is a gnarly problem in general.
>>> Some
> people have been able to substitute synonyms and/or shingles to make
>>> this
> work at the expense of a larger index.
>> 
>> This is a generic problem with context. “Lamin A” is really a
>>> “concept”,
> not just two words that happen to be near each other. Searching as a
>>> phrase
> is an OOB-but-naive way to try to make it more likely that the ranked
> results refer to the _concept_ of “Lamin A”. The assumption here is
>> “if
> these two words appear next to each other, they’re more likely to be
>>> what I
> want”. I say “naive” because “Lamins: A new approach to...” would
>>> _also_ be
> found for a naive phrase search. (I have no idea whether such a title
>>> makes
> sense or not, but you figured that out already)...
>> 
>> To do this 

Re: Solr 7.7.0: Log file not getting generated

2019-11-11 Thread Paras Lehana
Hey Paresh,

I have never worked with "SQL". Did you mean DIH logging? A simple Google
search yields this:
https://grokbase.com/t/lucene/solr-user/12618pmah7/how-to-show-dih-query-sql-in-log-file

> Turn the Solr logging level to "FINE" for the DIH packages/classes and
> they will show up in the log.
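
For Solr 7.x that maps to something like this in server/resources/log4j2.xml
(a sketch - adjust the logger name if your DIH classes differ):

  <Logger name="org.apache.solr.handler.dataimport" level="DEBUG"/>

Or set the level temporarily from the Admin UI under Logging > Level.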


On Mon, 11 Nov 2019 at 16:39, Paresh  wrote:

> How can we see SQL in log file?
>
>
>
> --
> Sent from: https://lucene.472066.n3.nabble.com/Solr-User-f472068.html
>


-- 
-- 
Regards,

*Paras Lehana* [65871]
Development Engineer, Auto-Suggest,
IndiaMART Intermesh Ltd.

8th Floor, Tower A, Advant-Navis Business Park, Sector 142,
Noida, UP, IN - 201303

Mob.: +91-9560911996
Work: 01203916600 | Extn:  *8173*

-- 
IMPORTANT: 
NEVER share your IndiaMART OTP/ Password with anyone.


Re: sort by score in join with geodist()

2019-11-11 Thread Mikhail Khludnev
Hello, Vasily.
Why not? What have you got in debugQuery=true?

On Mon, Nov 11, 2019 at 1:19 PM Vasily Ogar  wrote:

> Hello,
> Is it possible to sort by score in join by geodist()? For instance,
> something like this
> q={!join from=site_id to=site_id fromIndex=stores score=max}
> > +{!func}geodist() +{!geofilt sfield=coordinates
> pt=54.6973867999,25.22481530046 d=10}
> sort=score desc
> Thank you
>


-- 
Sincerely yours
Mikhail Khludnev


Question about Luke

2019-11-11 Thread Kayak28
Hello, Community:

I am using Solr 7.4.0 currently, and I was testing how Solr actually behaves
when it has a corrupted index.
I used Luke to fix the broken index from the GUI.
I just came up with the following questions.
Is it possible to use the repair-index tool from the CLI? (in case Solr is
on AWS, for example.)
Is it different from the checkIndex -exorcise option?
(As far as I recently learned, checkIndex -exorcise will delete unreadable
indices.)
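
For reference, the command-line invocation I know of is CheckIndex itself,
along these lines (the jar version and index path are just from my
environment):

  java -cp lucene-core-7.4.0.jar org.apache.lucene.index.CheckIndex \
       /var/solr/data/mycore/data/index -exorcise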

If anyone gives me a reply, I would be very thankful.

Sincerely,
Kaya Ota


Re: Cursor mark page duplicates

2019-11-11 Thread Dwane Hall
Thanks Erick/Hossman,

I appreciate your input - it's always an interesting read seeing Solr legends
like yourselves work through a problem!  I certainly learn a lot from
following your responses in this user group.

As you recommended, I ran the distrib=false query on each shard and the
results were identical in both instances.  Below is a snapshot from the admin
UI showing the details of each replica, which all looks in order to me (other
than our large number of deletes in the corpus ... we have quite a dynamic
environment when the index is live).


Replica 1:

Last Modified: 23 days ago
Num Docs: 47247895
Max Doc: 68108804
Heap Memory Usage: -1
Deleted Docs: 20860909
Version: 528038
Segment Count: 41
Master (Searching) Version: 1571148411550  Gen: 25528  Size: 42.56 GB
Master (Replicable) Version: 1571153302013  Gen: 25529

Replica 2:

Last Modified: 23 days ago
Num Docs: 47247895
Max Doc: 68223647
Heap Memory Usage: -1
Deleted Docs: 20975752
Version: 526613
Segment Count: 43
Master (Searching) Version: 1571148411615  Gen: 25527  Size: 42.63 GB
Master (Replicable) Version: 1571153302076  Gen: 25528

I was, however, able to replicate the issue, but under unusual circumstances,
with some crude in-browser testing.  If I use a cursorMark other than "*" and
repeatedly re-run the query (just resubmitting the URL in a browser with the
same cursor and query), the first result on the page toggles between the
expected value and the last item from the previous page.  So if rows=50, page
2 toggles between result 51 (expected) and result 50 (the last item from the
previous page).  It doesn't happen every time, but about one in five
refreshes I'm able to replicate it consistently (and on every subsequent
cursor).

I failed to mention in my original email that we use the HdfsDirectoryFactory
to store our indexes in HDFS.  This configuration uses an off-heap block
cache to cache HDFS blocks in memory, as it is unable to take advantage of
the OS disk cache.  I mention this as we're currently in the process of
switching to local disk, and I've been unable to replicate the issue when
using the local-storage configuration of the same index.  This may be
completely unrelated; additionally, the local-storage index is freshly
loaded, so it has not experienced the same number of deletes or updates that
our HDFS indexes have.

I think my best bet is to monitor our new index configuration, and if I
notice any similar behaviour I'll make the community aware of my findings.

Once again,

Thanks for your input

Dwane


From: Chris Hostetter 
Sent: Friday, 8 November 2019 9:58 AM
To: solr-user@lucene.apache.org 
Subject: Re: Cursor mark page duplicates


: I'm using Solr's cursor mark feature and noticing duplicates when paging
: through results.  The duplicate records happen intermittently and appear
: at the end of one page, and the beginning of the next (but not on all
: pages through the results). So if rows=20 the duplicate records would be
: document 20 on page1, and document 21 on page 2.  The document's id come

Can you try to reproduce and show us the specifics of this, including:

1) The sort param you're using
2) An 'fl' list that includes every field in the sort param
3) The returned values of every 'fl' field for the "duplicate" document
you are seeing as it appears in *BOTH* pages of results -- along with the
cursorMark value in use on both of those pages.


: (-MM-DD HH:MM.SS)), score. In this Solr community post
: 
(https://lucene.472066.n3.nabble.com/Solr-document-duplicated-during-pagination-td4269176.html)
: Shawn Heisey suggests:

...that post was *NOT* about using cursorMark -- it was plain old regular
pagination, where even on a single core/replica you can see a document
X get "pushed" from page #1 to page #2 by updates/additions of some other
document Z that causes Z to sort "before" X.

With cursors this kind of "pushing other docs back" or "pushing other docs
forward" doesn't exist because of the cursorMark.  The only way a doc
*should* move is if its OWN sort values are updated, causing it to
reposition itself.

But, if you have a static index, then it's *possible* that the last time
your document X was updated, there was a "glitch" somewhere in the
distributed update process, and the update didn't succeed in some
replicas -- so the same document may have different sort values
on different replicas.

: In the Solr query below for one of the example duplicates in question I
: can see a search by the id returns only a single document. The
: replication factor for the collection is 2 so the id will also appear in
: this shards replica.  Taking into consideration Shawn's advice above, my

If you've already identified a particular document where this has
happened, then you can also verify/disprove my hypothesis by hitting each
of the replicas that hosts this document with a request that looks like...

/solr/MyCollection_shard4_replica_n12/select?q=id:FOO&distrib=false

Re: Solr 7.7.0: Log file not getting generated

2019-11-11 Thread Paresh
How can we see SQL in log file?



--
Sent from: https://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Anyway to encrypt admin user plain text password in Solr

2019-11-11 Thread Kommu, Vinodh K.
Hi,

After creating an admin user in Solr when security is enabled, we have to
store the admin user's credentials in plain-text format. Is there any option
or a way to encrypt the plain-text password?

Thanks,
Vinodh
DTCC DISCLAIMER: This email and any files transmitted with it are confidential 
and intended solely for the use of the individual or entity to whom they are 
addressed. If you have received this email in error, please notify us 
immediately and delete the email and any attachments from your system. The 
recipient should check this email and any attachments for the presence of 
viruses. The company accepts no liability for any damage caused by any virus 
transmitted by this email.


sort by score in join with geodist()

2019-11-11 Thread Vasily Ogar
Hello,
Is it possible to sort by score in join by geodist()? For instance,
something like this
q={!join from=site_id to=site_id fromIndex=stores score=max}
+{!func}geodist() +{!geofilt sfield=coordinates
pt=54.6973867999,25.22481530046 d=10}
sort=score desc
Thank you


Re: Solr 7.7.0: Log file not getting generated

2019-11-11 Thread Paresh
In my environment I was creating a Windows service for Solr, and I
forgot to specify -Dsolr.log.dir on the command line for the Windows service.
After setting it to the correct path and reinstalling Solr as a service, the
issue was solved.



--
Sent from: https://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: Solr 7.7.0: Log file not getting generated

2019-11-11 Thread Paras Lehana
Hi Paresh,

Glad that it worked. For the sake of future viewers, please try to explain
what you did and how it worked. It will help users having similar issues in
the future.

On Mon, 11 Nov 2019 at 14:32, Paresh  wrote:

> Thanks Paras. It solved my problem. I am now able to see the logs.
>
>
>
> --
> Sent from: https://lucene.472066.n3.nabble.com/Solr-User-f472068.html
>


-- 
-- 
Regards,

*Paras Lehana* [65871]
Development Engineer, Auto-Suggest,
IndiaMART Intermesh Ltd.

8th Floor, Tower A, Advant-Navis Business Park, Sector 142,
Noida, UP, IN - 201303

Mob.: +91-9560911996
Work: 01203916600 | Extn:  *8173*

-- 
IMPORTANT: 
NEVER share your IndiaMART OTP/ Password with anyone.


$deleteDocByQuery is not working for me

2019-11-11 Thread Paresh
Hi,

I am trying to write an entity to delete documents for the records marked as
deleted in my RDBMS database, using the db-data-config.xml file with the
following entry in an entity -

query="SELECT CONCAT( 'ColName:', dbCol ) AS '$deleteDocByQuery' FROM TABLE1
t1 WHERE t1.state = 1 AND t1.lmd &gt; TO_DATE
('${dih.last_index_time}','YYYY-MM-DD HH24:MI:SS')"

Through the Solr Admin UI I am selecting "delta-import", the entity name, and
giving the proper JDBC string in the custom parameters.
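
The equivalent direct request would be something like this (core and entity
names here are placeholders for mine):

http://localhost:8983/solr/corename/dataimport?command=delta-import&entity=myDeleteEntity&commit=true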

I have marked the rows as deleted in RDBMS TABLE1 by setting the state
column to 1.

ColName is the Solr column name for the collection; dbCol is the database
column name in TABLE1.

I am trying to use the $deleteDocByQuery clause to remove the documents.

Any help is appreciated.

Regards,
Paresh



--
Sent from: https://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: subquery highlight

2019-11-11 Thread Vasily Ogar
Ok, thank you

On Mon, Nov 11, 2019 at 9:56 AM Mikhail Khludnev  wrote:

> Oh.. gosh. Sure. Subquery yields doc results only, neither of facets,
> highlighting is attached to response.
>
> On Mon, Nov 11, 2019 at 10:07 AM Vasily Ogar 
> wrote:
>
> > My subquery is products. I tried
> > product.hl=on&product.hl.fl=products.title products.description, and
> > like this: product.hl=on&product.hl.fl=title description, and like this:
> > hl=on&hl.fl=title description, and
> > hl.products=on&hl.fl=title description.
> > I don't know what else to try.
> >
> >
> > On Mon, Nov 11, 2019 at 8:25 AM Mikhail Khludnev 
> wrote:
> >
> > > Hello,
> > > Have you tried to pefix hl.* params with particular subquery name?
> > >
> > > On Sun, Nov 10, 2019 at 11:46 PM Vasily Ogar 
> > > wrote:
> > >
> > > > Hello,
> > > > I am using Solr 8.2 and can't find out how to use highlight in the
> > > > subquery. Is it possible at all?
> > > > Thank you
> > > >
> > >
> > >
> > > --
> > > Sincerely yours
> > > Mikhail Khludnev
> > >
> >
>
>
> --
> Sincerely yours
> Mikhail Khludnev
>


Re: Solr missing mandatory uniqueKey field: id or Unknown field

2019-11-11 Thread Sthitaprajna
Thanks Paras - after your comment I realized/found it is using the managed
schema.

After reading the Solr documentation at
https://lucene.apache.org/solr/guide/6_6/schema-factory-definition-in-solrconfig.html
I added <schemaFactory class="ClassicIndexSchemaFactory"/> to solrconfig.xml,
reloaded, and it works. Thanks.



Thanks Erick Erickson & Alexandre Rafalovitch - you had also mentioned the
same.

On Mon, Nov 11, 2019 at 2:28 PM Paras Lehana 
wrote:

> Hi Sthitaprajna,
>
> In Admin UI, select core and go to Schema. Select "title" and post the
> screenshot (try to host it). Do the same for "id".
>
> On Mon, 11 Nov 2019 at 09:14, Alexandre Rafalovitch 
> wrote:
>
> > You still have a mismatch between what you think the schema is
> > (uniqueKey=title) and message of uniqueKey being id. Focus on that. Try
> to
> > get schema FROM Solr instead og looking at one you are providing. Or look
> > in Admin UI what it shows for field title and for field id.
> >
> > Regards,
> > Alex
> >
> > On Mon, Nov 11, 2019, 2:30 PM Sthitaprajna, <
> iamonlyforu.frie...@gmail.com
> > >
> > wrote:
> >
> > >
> > >
> >
> https://stackoverflow.com/questions/58763657/solr-missing-mandatory-uniquekey-field-id-or-unknown-field?noredirect=1#comment103816164_58763657
> > >
> > > May be this will help ? I added screenshots.
> > >
> > > On Fri, 8 Nov 2019, 22:57 Alexandre Rafalovitch, 
> > > wrote:
> > >
> > > > Something does not make sense, because your schema defines "title" as
> > > > the uniqueKey field, but your message talks about "id". Are you
> > > > absolutely sure that the Solr/collection you get an error for is the
> > > > same Solr where you are checking the schema?
> > > >
> > > > Also, do you have a bit more of the error and stack trace. I find
> > > > "...or Unknown field" to be very puzzling. What are you trying to do
> > > > when you get this error?
> > > >
> > > > Regards,
> > > >   Alex.
> > > >
> > > > On Sat, 9 Nov 2019 at 01:05, Sthitaprajna <
> > iamonlyforu.frie...@gmail.com
> > > >
> > > > wrote:
> > > > >
> > > > > Thanks,
> > > > >
> > > > > I did reload after the solr configuration upload to ZK
> > > > > Yes, I push the config set to ZK and I can see all my changes are
> > > > > on cloud
> > > > > I turned off the managed schema
> > > > > Yes it has; you could have seen it if the attachments were
> > > > > available. I have attached again, maybe it will be available.
> > > > >
> > > > > On Fri, 8 Nov 2019, 21:13 Erick Erickson,  >
> > > > wrote:
> > > > >>
> > > > >> Attachments are aggressively stripped by the mail server, so I
> can’t
> > > > see them.
> > > > >>
> > > > >> Possibilities
> > > > >> - you didn’t reload your core/collection
> > > > >> - you didn’t push the configset to Zookeeper if using SolrCloud
> > > > >> - you are using the managed schema, which uses a file called
> > > > “managed-schema” rather than classic, which uses schema.xml
> > > > >> - your input doesn’t really have a field “title”.
> > > > >> - the doc just doesn’t have a field called “title” in it when it’s
> > > sent
> > > > to Solr.
> > > > >>
> > > > >>
> > > > >> Best,
> > > > >> Erick
> > > > >>
> > > > >> > On Nov 8, 2019, at 4:41 AM, Sthitaprajna <
> > > > iamonlyforu.frie...@gmail.com> wrote:
> > > > >> >
> > > > >> > title
> > > > >>
> > > >
> > >
> >
>
>
> --
> --
> Regards,
>
> *Paras Lehana* [65871]
> Development Engineer, Auto-Suggest,
> IndiaMART Intermesh Ltd.
>
> 8th Floor, Tower A, Advant-Navis Business Park, Sector 142,
> Noida, UP, IN - 201303
>
> Mob.: +91-9560911996
> Work: 01203916600 | Extn:  *8173*
>
> --
> IMPORTANT:
> NEVER share your IndiaMART OTP/ Password with anyone.
>


-- 

*Regards$th!t@*


Re: Solr 7.7.0: Log file not getting generated

2019-11-11 Thread Paresh
Thanks Paras. It solved my problem. I am now able to see the logs.



--
Sent from: https://lucene.472066.n3.nabble.com/Solr-User-f472068.html