Does Solr support searching for a text string and returning the matching filenames?

2019-05-22 Thread luckydog xf
Hi list, a quick question: we have tons of Microsoft .docx and PDF files (some PDFs are scanned copies), and we want to load them into Apache Solr, search for a few keywords contained in the files, and return the matching filenames. It's the same thing as `grep -r KEYWORD /PATH/XXX` in
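
One possible approach: index each file as a document with its path in the uniqueKey field and its extracted text in a searchable field. The keyword search then becomes an ordinary /select query that returns only the path via `fl`. The minimal Python sketch below only builds the request URL. The host, the collection name `docs`, and the field names `id` and `content` are hypothetical, and the keyword is assumed to be a single term that needs no query-syntax escaping. Scanned PDFs without a text layer would need OCR before any keyword could match them.

```python
from urllib.parse import urlencode


def build_filename_search_url(base_url, collection, keyword,
                              text_field="content", file_field="id", rows=100):
    """Build a Solr /select URL that matches `keyword` in `text_field`
    and returns only `file_field` (hypothetical schema: path stored in `id`)."""
    params = {
        "q": f"{text_field}:{keyword}",  # search the extracted-text field
        "fl": file_field,                # return only the filename/path field
        "rows": rows,                    # cap the number of hits returned
        "wt": "json",                    # ask for a JSON response
    }
    return f"{base_url}/{collection}/select?{urlencode(params)}"


url = build_filename_search_url("http://localhost:8983/solr", "docs", "invoice")
print(url)
```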

Unable to run solr | SolrCore Initialization Failures {{Core}}: {{error}}

2019-05-22 Thread Karthic Viswanathan
Hi, I am trying to install Solr on my Windows Server 2016 Standard Edition. While the installation of Solr itself succeeds, I am not able to get it running. Every time after installing and starting the service, I see “SolrCore Initialization Failures {{Core}}: {{error}}”. I am not sure what the

Re: Facet count incorrect

2019-05-22 Thread Erick Erickson
1> I strongly recommend you re-index into a new collection and switch to it with a collection alias rather than trying to re-index all the docs in place. Segment merging with the same field having dissimilar definitions is not guaranteed to do the right thing. 2> No. There are a few (very few) things that don’t
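
The re-index-and-alias approach in point 1 can be sketched with the Collections API. First CREATE a new collection with the corrected field type, then re-index into it, then use CREATEALIAS so the alias that clients query points at the new collection. The minimal Python sketch below only builds the two request URLs. The host, collection, alias, and config set names are hypothetical. The sketch also assumes clients already query through the alias rather than through the old collection name.

```python
from urllib.parse import urlencode

ADMIN = "http://localhost:8983/solr/admin/collections"  # hypothetical host


def create_collection_url(name, config_name, num_shards):
    # CREATE a fresh collection that uses the corrected schema/config set
    return ADMIN + "?" + urlencode({
        "action": "CREATE",
        "name": name,
        "numShards": num_shards,
        "collection.configName": config_name,
    })


def point_alias_url(alias, collection):
    # CREATEALIAS (re)points `alias` at `collection`; clients keep querying the alias
    return ADMIN + "?" + urlencode({
        "action": "CREATEALIAS",
        "name": alias,
        "collections": collection,
    })


# 1) create products_v2, 2) re-index everything into it (not shown), 3) swap the alias
steps = [
    create_collection_url("products_v2", "products_conf_v2", 2),
    point_alias_url("products", "products_v2"),
]
for step in steps:
    print(step)
```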

Re: Ignore faceting for particular fields in solr using Solrconfig.xml

2019-05-22 Thread Erick Erickson
Just don’t ask for them. Or are you saying that users can specify arbitrary fields to facet on and you want to prevent certain fields from being used? No, there’s no good way to do that in solrconfig.xml. You could write a query component that strips out certain fields from the facet.field
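
The query-component idea comes down to removing disallowed values from the `facet.field` parameter before faceting runs. A real implementation would be a custom Java `SearchComponent` registered in solrconfig.xml. The Python sketch below only illustrates the filtering logic on a raw query string, and the blocked field names are hypothetical. It also strips a leading local-params block such as `{!ex=t}` before comparing names, so a tagged facet cannot slip past the check.

```python
from urllib.parse import parse_qsl, urlencode

BLOCKED_FACET_FIELDS = {"internal_id", "cost_price"}  # hypothetical field names


def facet_field_name(value):
    """Return the bare field name, ignoring a leading local-params block like {!ex=t}."""
    if value.startswith("{!") and "}" in value:
        return value[value.index("}") + 1:]
    return value


def strip_blocked_facets(query_string, blocked=BLOCKED_FACET_FIELDS):
    """Drop facet.field params whose field is blocked; keep every other param as-is."""
    kept = [
        (key, val)
        for key, val in parse_qsl(query_string, keep_blank_values=True)
        if not (key == "facet.field" and facet_field_name(val) in blocked)
    ]
    return urlencode(kept)


qs = ("q=*:*&facet=true&facet.field=category"
      "&facet.field=cost_price&facet.field={!ex=t}internal_id")
print(strip_blocked_facets(qs))
```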

Re: Is it possible to reconstruct non-stored fields and turn those into stored fields

2019-05-22 Thread Erick Erickson
You might get some pointers from the Luke code…. All in all, I’d focus on re-indexing somehow. Unless the original documents are just totally impossible to find again, it’s probably easier. Best, Erick > On May 22, 2019, at 3:30 PM, Shawn Heisey wrote: > > On 5/22/2019 3:51 PM, Pushkar Raste

Re: Is it possible to reconstruct non-stored fields and turn those into stored fields

2019-05-22 Thread Shawn Heisey
On 5/22/2019 3:51 PM, Pushkar Raste wrote: Looks like giving Luke a shot is the answer. Can you point me to an example to extract the fields from inverted Index using Luke. Luke is a GUI application that can view the Lucene index in considerable detail. To use Luke directly, you'd have to

Ignore faceting for particular fields in solr using Solrconfig.xml

2019-05-22 Thread RaviTeja
Hello Solr experts, how are you? I am trying to ignore faceting for some of the fields. Can you please help me ignore faceting using solrconfig.xml? I tried, but I could only ignore faceting for all the fields, which is useless. I'm trying to ignore some specific fields. Really appreciate your help for the

Facet count incorrect

2019-05-22 Thread John Davis
Hi there, our facet counts are incorrect for a particular field, and I suspect it is because we changed the field's type from StrField to TextField. Two questions: 1. If we re-index all the documents in the index, would these counts get fixed? 2. Is there a "safe" way of changing field

Re: Is it possible to reconstruct non-stored fields and turn those into stored fields

2019-05-22 Thread Pushkar Raste
We have only a handful of fields that are stored, and many (non-text) fields that are neither stored nor have docValues :-( Looks like giving Luke a shot is the answer. Can you point me to an example of extracting the fields from the inverted index using Luke? On Wed, May 22, 2019 at 11:52 AM Erick

Slow ReadProcessor read fields Warnings - Ideas to investigate?

2019-05-22 Thread David Winter
Hello User Group, we run Solr with HDFS and are getting a lot of the following warnings: Slow ReadProcessor read fields took 15093ms (threshold=1ms); ack: seqno: 3 reply: SUCCESS reply: SUCCESS reply: SUCCESS downstreamAckTimeNanos: 798309 flag: 0 flag: 0 flag: 0, targets:

Re: CloudSolrClient (any version). Find the node your query has connected to.

2019-05-22 Thread Erick Erickson
You have to be a little careful here. One thing I learned relatively recently is that there are in-memory structures that hold pointers to _all_ unsearchable docs (i.e., no new searchers have been opened since the doc was added/updated) to support real-time get. So if you’re indexing a _lot_ of

Re: CloudSolrClient (any version). Find the node your query has connected to.

2019-05-22 Thread Shawn Heisey
On 5/22/2019 10:47 AM, Russell Taylor wrote: I will add that we have set commits to be only called by the loading program. We have turned off soft and autoCommits in the solrconfig.xml. Don't turn off autoCommit. Regular hard commits, typically with openSearcher set to false so they don't

RE: CloudSolrClient (any version). Find the node your query has connected to.

2019-05-22 Thread Russell Taylor
Thanks Erick, I will add that we have set commits to be called only by the loading program. We have turned off soft commits and autoCommit in solrconfig.xml. This is so that when we upload, we move from one list of docs to the new list in one atomic operation (delete, add, and then commit). I'll also

Stats and facets not working as expected after upgrade from 5.3 to 6.0

2019-05-22 Thread ilango dhandapani
I upgraded my AD SolrCloud environment from Solr 5.3 to 6.0 and everything worked fine. But when I did QA after the upgrade, stats and facet queries were not working as expected. When I run q=APPL_TOKN_ID_s:testApplication2=100=CTNT_FILE_PATH_NM_s=CTNT_FILE_PATH_NM_s=0=1=true, I am not getting any

Schema API Version 2 - 7.6.0

2019-05-22 Thread Joe Obernberger
Hi, according to the documentation here: https://lucene.apache.org/solr/guide/7_6/schema-api.html the V2 API is located at api/cores/collection/schema. However, the documentation here: https://lucene.apache.org/solr/guide/7_6/v2-api.html has it at api/c/collection/schema. I believe the latter is

Re: Cluster with no overseer?

2019-05-22 Thread Erick Erickson
110 isn’t all that many, well within the normal range _assuming_ that they are being processed…. When you restart Solr, every state change writes an operation to the work queue, and these can mount up. Perhaps you’re hitting https://issues.apache.org/jira/browse/SOLR-13416? In which case

Re: Cluster with no overseer?

2019-05-22 Thread Walter Underwood
The ZK ensemble appears to be OK. It is the Solr-related stuff that is borked. There are 110 items in /overseer/collection-queue-work/, which doesn’t seem healthy. If it is really hosed, I’ll shut down all the nodes, clean out the files in Zookeeper and start over. wunder Walter Underwood

Slow soft-commit

2019-05-22 Thread André Widhani
Hi everyone, I need some advice on how to debug slow soft commits. We use Solr for searches in a DAM system, and in similar setups soft commits take about one to two seconds; in this case, nearly ten seconds. Solr runs on a dedicated VM with eight cores and 64 GB RAM (16G heap), which is common

Re: CloudSolrClient (any version). Find the node your query has connected to.

2019-05-22 Thread Erick Erickson
OK, now we’re cooking with oil. First, nodes in recovery shouldn’t make any difference to a query. They should not serve any part of a query so I think/hope that’s a red herring. At worst a node in recovery should pass the query on to another replica that is _not_ recovering. When you’re

Re: Cluster with no overseer?

2019-05-22 Thread Erick Erickson
Good luck. This kind of assumes that your ZK ensemble is healthy, of course... > On May 22, 2019, at 8:23 AM, Walter Underwood wrote: > > Thanks, we’ll try that. Bouncing one Solr node doesn’t fix it, because we did > a rolling restart yesterday. > > wunder > Walter Underwood >

Re: Is it possible to reconstruct non-stored fields and turn those into stored fields

2019-05-22 Thread Erick Erickson
Well, if they’re all docValues or stored=true, sure. It’d be kind of slow. The short form is “if you can specify fl=f1,f2,f3…. for all your fields and see all your values, then it’s easy if slow”. If that works _and_ you are on Solr 4.7+, cursorMark will help with the “deep paging” issue. If
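
Reading every document back out with `fl` and cursorMark follows a fixed loop. Start with `cursorMark=*`, sort on a clause that includes the uniqueKey field, and stop when the returned `nextCursorMark` equals the one you sent. The minimal Python sketch below shows that loop. `fetch_page` is a hypothetical callable standing in for the HTTP request. The in-memory fake uses plain offsets as cursors, whereas real cursor marks are opaque strings.

```python
def iterate_all_docs(fetch_page):
    """Yield every doc using Solr-style cursorMark paging.

    `fetch_page(cursor_mark)` is a hypothetical callable that issues a /select
    request with cursorMark=<cursor_mark> (and a sort including the uniqueKey)
    and returns the parsed JSON response as a dict.
    """
    cursor = "*"  # the initial cursorMark value
    while True:
        resp = fetch_page(cursor)
        yield from resp["response"]["docs"]
        next_cursor = resp["nextCursorMark"]
        if next_cursor == cursor:  # unchanged cursor means no more results
            break
        cursor = next_cursor


# Fake in-memory "Solr" for demonstration only: 5 docs, 2 per page.
# Real cursor marks are opaque encoded sort values, not offsets.
DOCS = [{"id": str(i)} for i in range(5)]


def fake_fetch(cursor_mark, rows=2):
    start = 0 if cursor_mark == "*" else int(cursor_mark)
    page = DOCS[start:start + rows]
    next_mark = str(start + len(page)) if page else cursor_mark
    return {"response": {"docs": page}, "nextCursorMark": next_mark}


print([doc["id"] for doc in iterate_all_docs(fake_fetch)])
```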

Re: Cluster with no overseer?

2019-05-22 Thread Walter Underwood
Thanks, we’ll try that. Bouncing one Solr node doesn’t fix it, because we did a rolling restart yesterday. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On May 22, 2019, at 8:21 AM, Erick Erickson wrote: > > Walter: > > I have no idea what the

Re: Cluster with no overseer?

2019-05-22 Thread Erick Erickson
Walter: I have no idea what the root cause is here; this really shouldn’t happen. But the Overseer role (and I’m assuming you’re talking about Solr’s Overseer) is assigned similarly to a shard leader: the same election process happens. All the election nodes are ephemeral ZK nodes. Solr’s Overseer

RE: CloudSolrClient (any version). Find the node your query has connected to.

2019-05-22 Thread Russell Taylor
Hi Erick, every time any of the replica nodes goes into recovery mode, we start seeing queries that don't return the correct count. I'm being told ZooKeeper will give me the correct node (not one in recovery), but I want to prove it, as the query issue only comes up when any of the nodes are

Is it possible to reconstruct non-stored fields and turn those into stored fields

2019-05-22 Thread Pushkar Raste
I know this is a long shot. I am trying to move from Solr 4 to Solr 7. Reindexing all the data from the source is difficult to do in a reasonable time. All the fields are of basic types like int, long, float, double, Boolean, date, string. Since these fields don’t have analyzers, I was wondering if

Re: Usage of docValuesFormat

2019-05-22 Thread Erick Erickson
> On May 22, 2019, at 12:51 AM, vishal patel > wrote: > > We enabled the DocValues on some schema fields for sorting and faceting query > result. > Is it necessary to add docValuesFormat for faster query process? Only if you sort/facet or group. And queries won’t necessarily be faster

Re: Unable to upgrade Lucene 6.x index using IndexUpgrader

2019-05-22 Thread Erick Erickson
Anticipating your next question, “why can’t you use an index created two or more major versions ago?”, these two quotes were helpful for me to get my head around it. It’s _always_ been the case that going from X to X+2 has been unsupported; it’s just that the failures wouldn’t necessarily be obvious.
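
The X to X+2 rule can be expressed as a simple check: a Lucene release reads indexes from its own major version and from the one before it. Lucene 7 began recording the major version that originally created an index, and Lucene 8 refuses indexes whose recorded creation version is older than 7. As a result, an index first built with 6.x stays unreadable by 8.x even after IndexUpgrader rewrites it as 7.x. The sketch below models only that major-version logic, not every format detail, and the function name is hypothetical.

```python
def can_open_index(lucene_major, created_major):
    """Model of the major-version rule (hypothetical helper).

    Lucene N opens indexes whose *original creation* major version is N or N-1.
    Upgrading segments (e.g. with IndexUpgrader) does not change created_major.
    """
    return lucene_major - 1 <= created_major <= lucene_major


for lucene, created in [(7, 6), (8, 6), (8, 7)]:
    print(f"Lucene {lucene} opening an index created by {created}.x: "
          f"{can_open_index(lucene, created)}")
```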

Re: CloudSolrClient (any version). Find the node your query has connected to.

2019-05-22 Thread Erick Erickson
Why do you want to know? You’ve asked how to do X without telling us what problem Y you’re trying to solve (the XY problem), and frequently that leads to a lot of wasted time… Under the covers CloudSolrClient uses a pretty simple round-robin load balancer to pick a Solr node to send the query
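
The round-robin selection described here can be illustrated with a toy selector that cycles through a list of node URLs. This is not SolrJ's actual load-balancer code. The class name and node URLs are hypothetical, and the real client also tracks failed servers and uses cluster state read from ZooKeeper to decide which nodes are eligible.

```python
from itertools import cycle


class RoundRobinSelector:
    """Toy round-robin picker over a fixed list of node URLs (hypothetical)."""

    def __init__(self, nodes):
        if not nodes:
            raise ValueError("need at least one node")
        self._nodes = cycle(nodes)

    def next_node(self):
        # Each call returns the next node in order, wrapping around at the end
        return next(self._nodes)


nodes = ["http://solr1:8983/solr", "http://solr2:8983/solr", "http://solr3:8983/solr"]
lb = RoundRobinSelector(nodes)
picks = [lb.next_node() for _ in range(4)]
print(picks)
```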

Re: CloudSolrClient (any version). Find the node your query has connected to.

2019-05-22 Thread Jörn Franke
You have to provide the addresses of the ZooKeeper ensemble; it will figure it out on its own based on information in ZooKeeper. > On 22.05.2019 at 14:38, Russell Taylor wrote: > > Hi, > Using CloudSolrClient, how do I find the node (I have 3 nodes for this > collection on our 6 node

CloudSolrClient (any version). Find the node your query has connected to.

2019-05-22 Thread Russell Taylor
Hi, using CloudSolrClient, how do I find the node (I have 3 nodes for this collection on our 6-node cluster) that the query has connected to? I'm hoping to get the full URL if possible. Regards Russell Taylor

Re: Graph query extremely slow

2019-05-22 Thread Toke Eskildsen
On Wed, 2019-05-15 at 21:37 -0400, Rahul Goswami wrote: > fq={!graph from=from_field to=to_field returnRoot=false} > > Executing _only_ the graph filter query takes about 64.5 seconds. The > total number of documents from this filter query is a little over 1 > million. I tried building an index

Re: Solr8.0.0 Performance Test

2019-05-22 Thread Kayak28
Hello Shawn, Toke Eskildsen, and Solr community: It might be too late to share, but the URL below is what I was trying to share as the attachment. Again, Solr 8.0.0 is somehow better, but I wonder whether it is really that much better.

Re: Unable to upgrade Lucene 6.x index using IndexUpgrader

2019-05-22 Thread Jan Høydahl
Same as I answered in SOLR-13487: Note that you'll probably need to re-index from scratch due to changes in 8.0, see https://builds.apache.org/view/L/view/Lucene/job/Solr-reference-guide-8.0/javadoc/indexupgrader-tool.html

alias read access impossible for anyone other than admin?

2019-05-22 Thread Sotiris Fragkiskos
Hi everyone! I've been trying unsuccessfully to read an alias to a collection with a curl command. The command only works when I put in the admin credentials, even though the user I want to grant access to also has the role required for access. Is this perhaps built-in, or should anyone be able to access

Unable to upgrade Lucene 6.x index using IndexUpgrader

2019-05-22 Thread Henrik B A
I'm trying to upgrade an index from Lucene 6.x to 7.x, and then to 8.x, using IndexUpgrader [1]. But it never successfully upgrades to 7, and I cannot figure out why. I've also tried using CheckIndex [2] with the -exorcise option to fix the index first, but that doesn't help. Any ideas? I've

Usage of docValuesFormat

2019-05-22 Thread vishal patel
We enabled docValues on some schema fields for sorting and faceting query results. Is it necessary to add docValuesFormat for faster query processing? Which one is better: docValuesFormat="Memory" or docValuesFormat="Disk"? Note: our indexed data size is large in one collection and different

CloudSolrClient Session

2019-05-22 Thread Rainman Sián
Hello all, I'm writing a Solr solution for quite a big project. To do so, I wrote an OSGi service that provides add/delete/query functionality for Solr. It works just fine, but we have some kind of session issue, along with growing log sizes on the server, because the application servers throw this