Re: Facet count incorrect

2019-05-23 Thread Erick Erickson
You’ll have subtle, or not so subtle, problems. String types are a single token,
so a document with “my dog has fleas” will not be returned when searching for
any of those 4 words. By definition there’s no position information stored
with the string type, so no phrases will work against docs indexed before you
made the change.
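
To make the single-token point concrete, here is a toy sketch in Python. It is not Solr's analysis chain; the two helper functions are hypothetical stand-ins, and the text variant assumes a plain whitespace tokenizer with lowercasing.

```python
# Toy model of what ends up as indexed terms; not Solr's real analysis chain.

def index_as_string(value):
    """StrField-like behaviour: the whole value is one indexed term."""
    return [value]

def index_as_text(value):
    """TextField-like behaviour, assuming whitespace tokenizing plus lowercasing."""
    return value.lower().split()

doc = "my dog has fleas"
string_terms = index_as_string(doc)
text_terms = index_as_text(doc)

# A search for the word "dog" only matches if "dog" is itself an indexed term.
print("dog" in string_terms)  # False
print("dog" in text_terms)    # True
```

Segments written before the type change still hold the single-term form, which is why old and new documents behave differently until everything is reindexed.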

I do not know what other things will creep out of the woodwork. Point is you’re 
playing with fire here….

Depending on how you’re configured and the like, and assuming you’re using
SolrCloud, you could delete one replica from each shard, add a new collection
with exactly one replica (leader-only) per shard and index to that. From there,
and assuming quiescent indexing, add one replica to each shard of the new
collection for failover.

Once that’s complete, switch the alias and delete the old collection. Then
add replicas to the new collection to build it out.
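
A rough sketch of that sequence as Collections API calls, built in Python without sending anything. The host, collection names, alias, shard count and replica name below are all made-up placeholders; only the action names (DELETEREPLICA, CREATE, ADDREPLICA, CREATEALIAS, DELETE) are standard Collections API actions.

```python
from urllib.parse import urlencode

SOLR = "http://localhost:8983/solr"  # hypothetical host


def collections_call(action, **params):
    """Build a Collections API URL; nothing is sent to Solr here."""
    query = urlencode({"action": action, **params})
    return f"{SOLR}/admin/collections?{query}"


steps = [
    # 1. Free up resources: drop one replica per shard of the old collection.
    collections_call("DELETEREPLICA", collection="products_v1",
                     shard="shard1", replica="core_node2"),
    # 2. Create the new collection with a single (leader-only) replica per shard.
    collections_call("CREATE", name="products_v2",
                     numShards=2, replicationFactor=1),
    # 3. After indexing has finished, add a replica per shard for failover.
    collections_call("ADDREPLICA", collection="products_v2", shard="shard1"),
    # 4. Point the alias that clients query at the new collection.
    collections_call("CREATEALIAS", name="products", collections="products_v2"),
    # 5. Remove the old collection.
    collections_call("DELETE", name="products_v1"),
]

for url in steps:
    print(url)
```

The DELETEREPLICA and ADDREPLICA calls would be repeated once per shard; only shard1 is shown.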

Best,
Erick

> On May 23, 2019, at 2:11 PM, John Davis  wrote:
> 
> Reindexing to alias is not always easy if it requires 2x resources. Just to
> be clear the issues you mentioned are mostly around faceting because we
> haven't seen any other search/retrieval issues. Or is that not accurate?
> 
> On Wed, May 22, 2019 at 5:12 PM Erick Erickson 
> wrote:
> 
>> 1> I strongly recommend you re-index into a new collection and switch to
>> it with a collection alias rather than try to re-index all the docs.
>> Segment merging with the same field with dissimilar definitions is not
>> guaranteed to do the right thing.
>> 
>> 2> No. There are a few (very few) things that don’t require starting fresh.
>> You can do some things like add a lowercasefilter, add or remove a field
>> totally and the like. Even then you’ll go through a period of mixed-up
>> results until the reindex is complete. But changing the type, changing from
>> multiValued to singleValued or vice versa (particularly with docValues)
>> etc. are all “fraught”.
>> 
>> My usual reply is “if you’re going to reindex everything anyway, why not
>> just do it to a new collection and alias when you’re done?” It’s much safer.
>> 
>> Best,
>> Erick
>> 
>>> On May 22, 2019, at 3:06 PM, John Davis 
>> wrote:
>>> 
>>> Hi there -
>>> Our facet counts are incorrect for a particular field and I suspect it is
>>> because we changed the type of the field from StrField to TextField. Two
>>> questions:
>>> 
>>> 1. If we do re-index all the documents in the index, would these counts
>>> get fixed?
>>> 2. Is there a "safe" way of changing field types that generally works?
>>> 
>>> *Old type:*
>>> >> docValues="true" multiValued="true"/>
>>> 
>>> *New type:*
>>> >> omitNorms="true" omitTermFreqAndPositions="true" indexed="true"
>>> stored="true" positionIncrementGap="100" sortMissingLast="true"
>>> multiValued="true">
>>> 
>>> 
>>> 
>>>   
>>> 
>> 
>> 



Re: Facet count incorrect

2019-05-23 Thread John Davis
Reindexing to alias is not always easy if it requires 2x resources. Just to
be clear the issues you mentioned are mostly around faceting because we
haven't seen any other search/retrieval issues. Or is that not accurate?

On Wed, May 22, 2019 at 5:12 PM Erick Erickson 
wrote:

> 1> I strongly recommend you re-index into a new collection and switch to
> it with a collection alias rather than try to re-index all the docs.
> Segment merging with the same field with dissimilar definitions is not
> guaranteed to do the right thing.
>
> 2> No. There are a few (very few) things that don’t require starting fresh.
> You can do some things like add a lowercasefilter, add or remove a field
> totally and the like. Even then you’ll go through a period of mixed-up
> results until the reindex is complete. But changing the type, changing from
> multiValued to singleValued or vice versa (particularly with docValues)
> etc. are all “fraught”.
>
> My usual reply is “if you’re going to reindex everything anyway, why not
> just do it to a new collection and alias when you’re done?” It’s much safer.
>
> Best,
> Erick
>
> > On May 22, 2019, at 3:06 PM, John Davis 
> wrote:
> >
> > Hi there -
> > Our facet counts are incorrect for a particular field and I suspect it is
> > because we changed the type of the field from StrField to TextField. Two
> > questions:
> >
> > 1. If we do re-index all the documents in the index, would these counts
> > get fixed?
> > 2. Is there a "safe" way of changing field types that generally works?
> >
> > *Old type:*
> >   > docValues="true" multiValued="true"/>
> >
> > *New type:*
> >   > omitNorms="true" omitTermFreqAndPositions="true" indexed="true"
> > stored="true" positionIncrementGap="100" sortMissingLast="true"
> > multiValued="true">
> > 
> >  
> >  
> >
> >  
>
>


Re: Enabling SSL on SOLR breaks my SQL Server connection

2019-05-23 Thread Jörn Franke
And this is only a solution for testing. For production you need to import the
certificate chain into your truststore.

> On 23.05.2019 at 18:06, Shawn Heisey wrote:
> 
>> On 5/23/2019 9:56 AM, Paul wrote:
>> Thanks for the reply Shawn.
>> What I was asking is whether there is an option to exclude the comms to SQL
>> from SOLR managed encryption as the JDBC driver manages the connection and
>> SOLR is acting as the Client in this instance and is already using encrypted
>> comms via the connection string parameters.
> 
> Enabling SSL should have no *direct* effect on JDBC.
> 
> But it might have an indirect effect by changing some of Java's SSL settings 
> that in turn could filter down to the JDBC driver.
> 
> I would think that explicitly telling the JDBC driver to not validate the
> cert with Microsoft's "trustServerCertificate=true" option would fix this.
> Other than trying it, you would have to verify that with Microsoft.
> 
> Thanks,
> Shawn


Re: Unable to run solr | SolrCore Initialization Failures {{Core}}: {{error}}

2019-05-23 Thread Natarajan, Rajeswari
Please check that ZooKeeper is installed before installing SolrCloud, in case
you are not running the embedded ZooKeeper.
Hope it helps.

Regards,
Rajeswari

From: Karthic Viswanathan 
Reply-To: "solr-user@lucene.apache.org" 
Date: Wednesday, May 22, 2019 at 10:37 PM
To: "solr-user@lucene.apache.org" 
Subject: Unable to run solr | SolrCore Initialization Failures {{Core}}: 
{{error}}


Hi,
I am trying to install Solr on Windows Server 2016 Standard edition.
While the installation of Solr itself succeeds, I am not able to get it running.
Every time, after installing and starting the service, I see:
 “SolrCore Initialization Failures {{Core}}: {{error}}”

I am not sure what the error is since it is not very clear. Also, the log files 
are all empty. It has just a few warnings.  I have attached them for reference. 
Solr is a requirement for installing Sitecore CMS and I am not able to proceed 
any further.  Any help on this would be greatly appreciated.


I get this same error with Solr 7.2.1 and 6.6.2.
I tried running this with both nssm 2.4 and nssm 2.24 pre.
I have jre 1.8.0_211 installed.

--
Regards,
Karthic Viswanathan


[solr.png]



[log.png]




Re: Enabling SSL on SOLR breaks my SQL Server connection

2019-05-23 Thread Shawn Heisey

On 5/23/2019 9:56 AM, Paul wrote:
> Thanks for the reply Shawn.
>
> What I was asking is whether there is an option to exclude the comms to SQL
> from SOLR managed encryption as the JDBC driver manages the connection and
> SOLR is acting as the Client in this instance and is already using encrypted
> comms via the connection string parameters.


Enabling SSL should have no *direct* effect on JDBC.

But it might have an indirect effect by changing some of Java's SSL 
settings that in turn could filter down to the JDBC driver.


I would think that explicitly telling the JDBC driver to not validate
the cert with Microsoft's "trustServerCertificate=true" option would fix
this. Other than trying it, you would have to verify that with Microsoft.


Thanks,
Shawn


Re: Enabling SSL on SOLR breaks my SQL Server connection

2019-05-23 Thread Paul
Thanks for the reply Shawn.

What I was asking is whether there is an option to exclude the comms to SQL
from SOLR managed encryption as the JDBC driver manages the connection and
SOLR is acting as the Client in this instance and is already using encrypted
comms via the connection string parameters.

Cheers
Paul


On 5/23/2019 5:45 AM, Paul wrote: 
> unable to find 
> valid certification path to requested target 

This seems to be the root of your problem with the connection to SQL server. 

If I have all the context right, Java is saying it can't validate the 
certificate returned by the SQL server. 

This page: 

https://docs.microsoft.com/en-us/sql/connect/jdbc/connecting-with-ssl-encryption?view=sql-server-2017

Talks about a "trustServerCertificate" property you can set to "true" in the
JDBC URL that will cause Microsoft's JDBC driver to NOT validate the
server certificate.

Alternatively, if the SQL server is sending all the necessary chain 
certificates, you could place the root cert for the CA that issued the 
SQL Server certificate in the Java keystore that you're using for SSL on 
Solr, that would probably also fix it -- because then the SQL cert would 
validate. 

Thanks, 
Shawn 





Re: Unable to run solr | SolrCore Initialization Failures {{Core}}: {{error}}

2019-05-23 Thread Erick Erickson
We don’t see your attachments, the mail server pretty aggressively strips them. 
You’ll have to put them somewhere shareable and post a link.

What exactly are you trying? SolrCloud? Stand-alone? What commands do you run 
when you start Solr?

You might review:
https://wiki.apache.org/solr/UsingMailingLists

Best,
Erick

> On May 22, 2019, at 11:04 PM, Mohomed Rimash  wrote:
> 
> Hi
> 
> Do you know which cores (the names of the cores used) are in the Solr
> instance you are trying to use? If so, try creating those cores manually.
> 
> Regards
> Rimash
> 
> On Thu, 23 May 2019 at 11:07, Karthic Viswanathan <
> karthic.viswan...@gmail.com> wrote:
> 
>> 
>> Hi,
>> 
>> I am trying to install Solr for my Windows Server 2016 Standard edition. .
>> While the installation of Solr itself succeeds, I am not able to get it
>> running.
>> 
>> Everytime after installation and starting the service
>> 
>> “SolrCore Initialization Failures {{Core}}: {{error}}”
>> 
>> 
>> 
>> I am not sure what the error is since it is not very clear. Also, the log
>> files are all empty. It has just a few warnings.  I have attached them for
>> reference. Solr is a requirement for installing Sitecore CMS and I am not
>> able to proceed any further.  Any help on this would be greatly
>> appreciated.
>> 
>> 
>> I have this same error with solr 7.2.1, 6.6.2.
>> I tried running this with both nssm 2.4 and nssm 2.24 pre.
>> I have jre 1.8.0_211 installed.
>> 
>> --
>> Regards,
>> Karthic Viswanathan
>> 
>> 
>> [image: solr.png]
>> 
>> 
>> 
>> [image: log.png]



RE: CloudSolrClient (any version). Find the node your query has connected to.

2019-05-23 Thread Russell Taylor
Thanks Erick,
Pretty stuck with the delete-by-query as it can be deleting a million docs.

I'll work through what you have said and also try to find the root cause of the 
recovery.




Regards

Russell Taylor



-----Original Message-----
From: Erick Erickson [mailto:erickerick...@gmail.com]
Sent: 22 May 2019 20:17
To: solr-user@lucene.apache.org
Subject: Re: CloudSolrClient (any version). Find the node your query has 
connected to.

You have to be a little careful here, one thing I learned relatively recently 
is that there are in-memory structures that hold pointers to _all_ 
un-searchable docs (i.e. no new searchers have been opened since the doc was 
added/updated) to support real-time get. So if you’re indexing a _lot_ of docs 
that internal structure can grow quite large….

FWIW, delete-by-query is painful. Each one has to lock all indexing on all 
replicas while it completes. If you can use delete-by-id it’d be better.
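
As a sketch of what switching to delete-by-id could look like, the Python below builds the JSON update commands for both styles. The field name `batch_id`, the id format and the batch size are invented for illustration; the payloads follow Solr's JSON update syntax and would be POSTed to the collection's `/update` handler.

```python
import json


def delete_by_query_payload(query):
    """One command; every replica has to resolve the query itself."""
    return {"delete": {"query": query}}


def delete_by_id_payloads(ids, batch_size=1000):
    """Yield delete-by-id commands, one per batch of ids."""
    for start in range(0, len(ids), batch_size):
        yield {"delete": ids[start:start + batch_size]}


# Hypothetical ids collected by the loading program.
ids = [f"doc-{n}" for n in range(2500)]
batches = list(delete_by_id_payloads(ids))

print(len(batches))                 # 3
print(len(batches[-1]["delete"]))   # 500
print(json.dumps(delete_by_query_payload("batch_id:42")))
```

This assumes the loader can work out the ids to remove (for example by querying for them first); whether that is practical for a million documents is something only you can judge.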

Let’s back up a bit and look at _why_ your nodes go into recovery…. Leave the 
replicas on if you can and look for “Leader Initiated Recovery” (not sure 
that’s the exact phrase, but you’ll see something very like that). If that’s 
the case, then one situation we’ve seen is that a request takes too long to 
return from a follower. So the sequence looks like this:

- leader gets update
- leader indexes locally _and_ forwards to follower
- follower is busy (and the delete-by-query could be why) and takes too long to 
respond so the request times out
- leader says “hmmm, I don’t know what happened so I’ll tell the follower to 
recover”.

Given your heavy update rate, there’ll be no chance for “peer sync” to fully 
recover so it’ll go into full recovery. That can sometimes be fixed by simply 
lengthening the timeout.

Otherwise also take a look at the logs and see if you can find a root cause for 
the replica going into recovery and we should see if we can fix that.

I didn’t ask what versions of Solr you’re using, but in the 7x code line (7.3 
IIRC) significant work was done to make recovery less likely.

Best,
Erick

> On May 22, 2019, at 10:27 AM, Shawn Heisey  wrote:
>
> On 5/22/2019 10:47 AM, Russell Taylor wrote:
>> I will add that we have set commits to be only called by the loading 
>> program. We have turned off soft and autoCommits in the solrconfig.xml.
>
> Don't turn off autoCommit.  Regular hard commits, typically with openSearcher 
> set to false so they don't interfere with change visibility, are extremely 
> important for good Solr operation.  Without it, the transaction logs will 
> grow out of control.  In addition to taking a lot of disk space, that will 
> cause a Solr restart to happen VERY slowly.  Note that a hard commit with 
> openSearcher set to false will be VERY fast -- doing them frequently is 
> usually not a problem for performance.  Sample configs in recent Solr 
> versions ship with autoCommit set to 15 seconds and openSearcher set to false.
>
> Not using autoSoftCommit is a reasonable thing to do if you do not need that 
> functionality ... but don't disable autoCommit.
>
> Thanks,
> Shawn
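
For reference, a minimal sketch of the autoCommit settings Shawn describes, expressed as a Config API `set-property` command built in Python. This assumes the Config API is used instead of editing solrconfig.xml by hand; the 15 second value and openSearcher=false come from his message, and the function name is made up.

```python
import json


def autocommit_command(max_time_ms=15000, open_searcher=False):
    """Config API command: hard commit every max_time_ms, no new searcher."""
    return {
        "set-property": {
            "updateHandler.autoCommit.maxTime": max_time_ms,
            "updateHandler.autoCommit.openSearcher": open_searcher,
        }
    }


# This JSON would be POSTed to /solr/<collection>/config.
print(json.dumps(autocommit_command(), indent=2))
```

Document visibility is still controlled by the explicit commits from the loading program, since these hard commits do not open a new searcher.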






Re: Enabling SSL on SOLR breaks my SQL Server connection

2019-05-23 Thread Shawn Heisey

On 5/23/2019 5:45 AM, Paul wrote:
> unable to find
> valid certification path to requested target


This seems to be the root of your problem with the connection to SQL server.

If I have all the context right, Java is saying it can't validate the 
certificate returned by the SQL server.


This page:

https://docs.microsoft.com/en-us/sql/connect/jdbc/connecting-with-ssl-encryption?view=sql-server-2017

Talks about a "trustServerCertificate" property you can set to "true" in the
JDBC URL that will cause Microsoft's JDBC driver to NOT validate the
server certificate.


Alternatively, if the SQL server is sending all the necessary chain 
certificates, you could place the root cert for the CA that issued the 
SQL Server certificate in the Java keystore that you're using for SSL on 
Solr, that would probably also fix it -- because then the SQL cert would 
validate.
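
As an illustration of the first option, this Python sketch assembles a SQL Server JDBC URL with certificate validation switched off. The host, port and database name are placeholders; the property name in Microsoft's documentation is `trustServerCertificate`, and skipping validation is really only suitable for testing.

```python
def sqlserver_jdbc_url(host, port, database, trust_server_certificate=False):
    """Build a SQL Server JDBC URL with encryption enabled."""
    props = {
        "databaseName": database,
        "encrypt": "true",
        # "true" tells the driver to skip validation of the server certificate.
        "trustServerCertificate": "true" if trust_server_certificate else "false",
    }
    prop_string = ";".join(f"{key}={value}" for key, value in props.items())
    return f"jdbc:sqlserver://{host}:{port};{prop_string}"


print(sqlserver_jdbc_url("sqlhost.example.com", 1433, "mydb",
                         trust_server_certificate=True))
# jdbc:sqlserver://sqlhost.example.com:1433;databaseName=mydb;encrypt=true;trustServerCertificate=true
```

The resulting string would go into the `url` attribute of the data source definition used by the DataImportHandler.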


Thanks,
Shawn


Enabling SSL on SOLR breaks my SQL Server connection

2019-05-23 Thread Paul
Hi,

I have enabled HTTPS on my SOLR server and it works fine over HTTPS for
interaction with SOLR via the browser such as for data queries and
management actions.

However, I now get an error when attempting to retrieve data from the SQL
server for indexing. The JDBC connection string has the parameters for
encrypted SQL connections; this has been set up and works fine when SSL is
not specified for SOLR. When enabling SSL for SOLR client connections, how
do I enable it just for clients making requests into SOLR, without changing
any of the outgoing traffic that already uses encrypted comms, i.e. to SQL
Server?

The error message I get is below:

...
org.apache.solr.handler.dataimport.DataImportHandlerException: Unable to
execute query: SELECT * from MYVIEW Processing Document # 1
at
org.apache.solr.handler.dataimport.DataImportHandlerException.wrapAndThrow(DataImportHandlerException.java:69)
at
org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.<init>(JdbcDataSource.java:327)
at
org.apache.solr.handler.dataimport.JdbcDataSource.createResultSetIterator(JdbcDataSource.java:288)
at
org.apache.solr.handler.dataimport.JdbcDataSource.getData(JdbcDataSource.java:283)
at
org.apache.solr.handler.dataimport.JdbcDataSource.getData(JdbcDataSource.java:52)
at
org.apache.solr.handler.dataimport.SqlEntityProcessor.initQuery(SqlEntityProcessor.java:59)
at
org.apache.solr.handler.dataimport.SqlEntityProcessor.nextRow(SqlEntityProcessor.java:73)
at
org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:267)
at
org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:476)
at
org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:415)
at
org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:330)
at
org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:233)
at
org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:424)
at
org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:483)
at
org.apache.solr.handler.dataimport.DataImporter.lambda$runAsync$0(DataImporter.java:466)
at java.lang.Thread.run(Unknown Source)
Caused by: com.microsoft.sqlserver.jdbc.SQLServerException: The driver could
not establish a secure connection to SQL Server by using Secure Sockets
Layer (SSL) encryption. Error: "sun.security.validator.ValidatorException:
PKIX path building failed:
sun.security.provider.certpath.SunCertPathBuilderException: unable to find
valid certification path to requested target".
ClientConnectionId:bb3e9ce0-8d93-4514-98ed-f19938b91e96
at
com.microsoft.sqlserver.jdbc.SQLServerConnection.terminate(SQLServerConnection.java:2826)
at com.microsoft.sqlserver.jdbc.TDSChannel.enableSSL(IOBuffer.java:1829)
at
com.microsoft.sqlserver.jdbc.SQLServerConnection.connectHelper(SQLServerConnection.java:2391)
at
com.microsoft.sqlserver.jdbc.SQLServerConnection.login(SQLServerConnection.java:2042)
at
com.microsoft.sqlserver.jdbc.SQLServerConnection.connectInternal(SQLServerConnection.java:1889)
at
com.microsoft.sqlserver.jdbc.SQLServerConnection.connect(SQLServerConnection.java:1120)
at
com.microsoft.sqlserver.jdbc.SQLServerDriver.connect(SQLServerDriver.java:700)
at
org.apache.solr.handler.dataimport.JdbcDataSource$1.call(JdbcDataSource.java:192)
at
org.apache.solr.handler.dataimport.JdbcDataSource$1.call(JdbcDataSource.java:172)
at
org.apache.solr.handler.dataimport.JdbcDataSource.getConnection(JdbcDataSource.java:528)
at
org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.<init>(JdbcDataSource.java:317)
... 14 more
Caused by: javax.net.ssl.SSLHandshakeException:
sun.security.validator.ValidatorException: PKIX path building failed:
sun.security.provider.certpath.SunCertPathBuilderException: unable to find
valid certification path to requested target
at sun.security.ssl.Alerts.getSSLException(Unknown Source)
at sun.security.ssl.SSLSocketImpl.fatal(Unknown Source)
at sun.security.ssl.Handshaker.fatalSE(Unknown Source)
at sun.security.ssl.Handshaker.fatalSE(Unknown Source)
at sun.security.ssl.ClientHandshaker.serverCertificate(Unknown Source)
at sun.security.ssl.ClientHandshaker.processMessage(Unknown Source)
at sun.security.ssl.Handshaker.processLoop(Unknown Source)
at sun.security.ssl.Handshaker.process_record(Unknown Source)
at sun.security.ssl.SSLSocketImpl.readRecord(Unknown Source)
at sun.security.ssl.SSLSocketImpl.performInitialHandshake(Unknown Source)
at sun.security.ssl.SSLSocketImpl.startHandshake(Unknown Source)
at sun.security.ssl.SSLSocketImpl.startHandshake(Unknown Source)
at com.microsoft.sqlse

Re: Negative boost query (bq) with edismax for lower scores with Solr 8

2019-05-23 Thread CO
I didn't see any response, so I wanted to check whether my observation is simply
not relevant for other people or whether I failed to provide some required details.

Thanks!


-----Original Message-----
Date: 05/09/2019 08:28 AM
Subject: Negative boost query (bq) with edismax for lower scores with Solr 8

I find that the edismax boost query implementation is not quite logical. It 
does not allow selectively decreasing the relevancy score anymore.

E.g. bq=color:red^2 can be added to increase the score of matching documents.

But how can I decrease the score for documents with color:red? Before Solr 8 it 
could be done with bq=color:red^-1. But negative boosts are not supported by 
Lucene anymore (LUCENE-7996).

Note that bq=color:red^0.1 does not lead to a lower score since other values 
will be treated like boost 0. If all potential values are known this would 
work: bq=color:red^0.1 color:green^1 color:blue^1. But what if not?

I tried bq=color:*^1 -color:red^0.1. This sort of works but the boost seems to 
be ignored. When I write bq=color:*^1 -color:red^0.1 -color:green^0.5 the score 
is the same for documents with color:red and color:green.

Any suggestion how I can decrease the score for documents with color:red a bit 
and a bit more for those with color:green after upgrading to Solr 8?
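
One possible direction, offered as an untested sketch rather than a confirmed answer: edismax also has a multiplicative `boost` parameter that takes a function, so a factor below 1 lowers the score without a negative boost. The Python below only builds such a function string from `if()` and `termfreq()`; the factors 0.5 and 0.25 and the query term are arbitrary, and `termfreq` matches indexed terms, so this assumes `color` is an untokenized field.

```python
def penalty_boost(field, penalties):
    """Nest if(termfreq(...)) calls; documents matching no value keep multiplier 1."""
    expr = "1"
    for value, factor in reversed(list(penalties.items())):
        expr = f"if(termfreq({field},'{value}'),{factor},{expr})"
    return expr


params = {
    "defType": "edismax",
    "q": "shirt",  # hypothetical user query
    # Red is demoted a bit, green a bit more.
    "boost": penalty_boost("color", {"red": 0.5, "green": 0.25}),
}

print(params["boost"])
# if(termfreq(color,'red'),0.5,if(termfreq(color,'green'),0.25,1))
```

Because the multiplier defaults to 1, values of `color` that are not known in advance are left untouched, which is the case the additive `bq` approach above could not cover.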





Re: Does Solr support retrieve a string text and get its filename accordingly?

2019-05-23 Thread luckydog xf
Thank you all. I learned a lot from you guys.

On Thu, May 23, 2019 at 3:54 PM Nicolas Franck 
wrote:

> In that case you'll have to duplicate that field:
>
> id: $name_of_file
> id_t: $name_of_file
>
> The first field should be marked as "string", and set to be the key field.
> Id-fields cannot be tokenized.
>
> The second field is a derivative (you can just copy the contents, or use
> copyField),
> and should be set to a type of field, that does tokenization. In this case
> you'll
> need a field type that uses n-grams:
>
>
> https://lucene.apache.org/solr/guide/6_6/tokenizers.html#Tokenizers-N-GramTokenizer
>
> otherwise you'll end up using wildcard queries ( _id_s:my* ) that do not
> perform very well.
>
> On 23 May 2019, at 09:39, Mohomed Rimash <rim...@yaalalabs.com> wrote:
>
> yes in that case your file name should be key field of each document you
> added to the solr
>
> On Thu, 23 May 2019 at 12:32, luckydog xf <luckydo...@gmail.com> wrote:
>
> Thanks  guys.
>
> *Don't mean to be a bother*, just want to confirm, I know it's doable to
> search keywords, but what I want  is * FileName(s) * that contains the
> string. The answer is still a yes?
>
> Thanks again.
>
> On Thu, May 23, 2019 at 2:20 PM Jörn Franke <jornfra...@gmail.com> wrote:
>
> You can do much more than grep. I recommend getting a book on Solr and
> reading through it. Then you get the full context and you can see if it is
> useful for you.
>
> On 23.05.2019 at 07:44, luckydog xf <luckydo...@gmail.com> wrote:
>
> Hi, list,
>
>   A quick question: we have tons of Microsoft docx/PDF files (some PDFs
> are scanned copies), and we want to load them into Apache Solr, search
> for a few keywords contained in the files, and return the filenames
> accordingly.
>
>  # it's the same thing as `grep -r KEYWORD /PATH/XXX` in Linux system.
>
>  Is it doable ?
>
>  Thanks,
>
>
>
>


Re: Does Solr support retrieve a string text and get its filename accordingly?

2019-05-23 Thread Nicolas Franck
In that case you'll have to duplicate that field:

id: $name_of_file
id_t: $name_of_file

The first field should be marked as "string", and set to be the key field.
Id-fields cannot be tokenized.

The second field is a derivative (you can just copy the contents, or use
copyField) and should be set to a field type that does tokenization. In this
case you'll need a field type that uses n-grams:

https://lucene.apache.org/solr/guide/6_6/tokenizers.html#Tokenizers-N-GramTokenizer

Otherwise you'll end up using wildcard queries ( _id_s:my* ), which do not
perform very well.
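
To show why n-grams remove the need for wildcards, here is a small Python sketch that generates character n-grams roughly the way an n-gram tokenizer would. The gram sizes (2 to 3) and the file name are arbitrary examples, and the order of the grams is not meant to mirror Lucene's output.

```python
def ngrams(text, min_size=2, max_size=3):
    """Return all character n-grams of text with sizes min_size..max_size."""
    grams = []
    for size in range(min_size, max_size + 1):
        for start in range(len(text) - size + 1):
            grams.append(text[start:start + size])
    return grams


# Each gram becomes an indexed term of the tokenized copy (id_t).
indexed = set(ngrams("report.pdf".lower()))

# A fragment of the file name is now an exact term lookup, not a wildcard scan.
print("epo" in indexed)  # True
print("xyz" in indexed)  # False
```

The trade-off is index size: every file name is expanded into many short terms, so the gram sizes should be kept as tight as the search use case allows.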

On 23 May 2019, at 09:39, Mohomed Rimash <rim...@yaalalabs.com> wrote:

yes in that case your file name should be key field of each document you
added to the solr

On Thu, 23 May 2019 at 12:32, luckydog xf <luckydo...@gmail.com> wrote:

Thanks  guys.

*Don't mean to be a bother*, just want to confirm, I know it's doable to
search keywords, but what I want  is * FileName(s) * that contains the
string. The answer is still a yes?

Thanks again.

On Thu, May 23, 2019 at 2:20 PM Jörn Franke <jornfra...@gmail.com> wrote:

You can do much more than grep. I recommend getting a book on Solr and
reading through it. Then you get the full context and you can see if it is
useful for you.

On 23.05.2019 at 07:44, luckydog xf <luckydo...@gmail.com> wrote:

Hi, list,

  A quick question: we have tons of Microsoft docx/PDF files (some PDFs
are scanned copies), and we want to load them into Apache Solr, search
for a few keywords contained in the files, and return the filenames
accordingly.

 # it's the same thing as `grep -r KEYWORD /PATH/XXX` in Linux system.

 Is it doable ?

 Thanks,





Re: Does Solr support retrieve a string text and get its filename accordingly?

2019-05-23 Thread Mohomed Rimash
Yes, in that case your file name should be the key field of each document you
add to Solr.

On Thu, 23 May 2019 at 12:32, luckydog xf  wrote:

> Thanks  guys.
>
>  *Don't mean to be a bother*, just want to confirm, I know it's doable to
> search keywords, but what I want  is * FileName(s) * that contains the
> string. The answer is still a yes?
>
>  Thanks again.
>
> On Thu, May 23, 2019 at 2:20 PM Jörn Franke  wrote:
>
> > You can do much more than grep. I recommend getting a book on Solr and
> > reading through it. Then you get the full context and you can see if it
> > is useful for you.
> >
> > > On 23.05.2019 at 07:44, luckydog xf wrote:
> > >
> > > Hi, list,
> > >
> > > >   A quick question: we have tons of Microsoft docx/PDF files (some PDFs
> > > > are scanned copies), and we want to load them into Apache Solr, search
> > > > for a few keywords contained in the files, and return the filenames
> > > > accordingly.
> > >
> > >   # it's the same thing as `grep -r KEYWORD /PATH/XXX` in Linux system.
> > >
> > >   Is it doable ?
> > >
> > >   Thanks,
> >
>


Re: Does Solr support retrieve a string text and get its filename accordingly?

2019-05-23 Thread luckydog xf
Thanks, guys.

*Don't mean to be a bother*, I just want to confirm: I know it's doable to
search for keywords, but what I want is the *FileName(s)* of the files that
contain the string. Is the answer still yes?

Thanks again.

On Thu, May 23, 2019 at 2:20 PM Jörn Franke  wrote:

> You can do much more than grep. I recommend getting a book on Solr and
> reading through it. Then you get the full context and you can see if it is
> useful for you.
>
> > On 23.05.2019 at 07:44, luckydog xf wrote:
> >
> > Hi, list,
> >
> >   A quick question: we have tons of Microsoft docx/PDF files (some PDFs
> > are scanned copies), and we want to load them into Apache Solr, search
> > for a few keywords contained in the files, and return the filenames
> > accordingly.
> >
> >   # it's the same thing as `grep -r KEYWORD /PATH/XXX` in Linux system.
> >
> >   Is it doable ?
> >
> >   Thanks,
>