Hi, list,
A quick question: we have tons of Microsoft docx and PDF files (some PDFs
are scanned copies), and we want to load them into Apache Solr, search for a
few keywords contained in the files, and get the matching filenames back.
# it's the same thing as `grep -r KEYWORD /PATH/XXX` in
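(A rough sketch of how one might push such files through Solr's Extracting Request Handler, i.e. Solr Cell/Tika. The host, collection name "docs", and the idea of using the filename as the id are all assumptions for illustration; note that scanned PDFs have no text layer, so they would need OCR before or during extraction.)

```python
from urllib.parse import urlencode

def extract_update_url(base_url, collection, doc_id):
    # Solr Cell (ExtractingRequestHandler) endpoint; the file bytes go in
    # the POST body and Tika extracts the text server-side.
    params = urlencode({"literal.id": doc_id, "commit": "true"})
    return f"{base_url}/{collection}/update/extract?{params}"

print(extract_update_url("http://localhost:8983/solr", "docs", "report1.docx"))
```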
Hi,
I am trying to install Solr on my Windows Server 2016 Standard edition.
While the installation of Solr itself succeeds, I am not able to get it
running.
Every time after installation and starting the service I get
“SolrCore Initialization Failures {{Core}}: {{error}}”
I am not sure what the
1> I strongly recommend you re-index into a new collection and switch to it
with a collection alias rather than try to re-index all the docs. Segment
merging with the same field with dissimilar definitions is not guaranteed to do
the right thing.
2> No. There are a few (very few) things that don’t
Just don’t ask for them. Or are you saying that users can specify arbitrary fields
to facet on and you want to prevent certain fields from being possible?
No, there’s no good way to do that in solrconfig.xml. You could write a query
component that stripped out certain fields from the facet.field
You might get some pointers from the Luke code….
All in all I’d focus on re-indexing somehow. Unless the original documents are
just totally impossible to find again it’s probably easier.
Best,
Erick
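(For reference, the alias switch Erick describes is the Collections API CREATEALIAS call; a small sketch that just builds the URL. The host and the "products"/"products_v2" names are hypothetical.)

```python
from urllib.parse import urlencode

def create_alias_url(base_url, alias, *collections):
    # Collections API CREATEALIAS: atomically (re)points `alias`
    # at one or more collections, so clients querying the alias
    # switch to the freshly re-indexed collection in one step.
    params = urlencode({"action": "CREATEALIAS", "name": alias,
                        "collections": ",".join(collections)})
    return f"{base_url}/admin/collections?{params}"

print(create_alias_url("http://localhost:8983/solr", "products", "products_v2"))
```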
> On May 22, 2019, at 3:30 PM, Shawn Heisey wrote:
>
On 5/22/2019 3:51 PM, Pushkar Raste wrote:
Looks like giving Luke a shot is the answer. Can you point me to an example
of extracting the fields from the inverted index using Luke?
Luke is a GUI application that can view the Lucene index in considerable
detail. To use Luke directly, you'd have to
Hello Solr Expert,
How are you?
I am trying to disable faceting for some of the fields. Can you please help me
disable faceting via solrconfig.xml?
I tried, but I can only disable faceting for all the fields, which is useless. I'm
trying to disable it for some specific fields only.
Really appreciate your help for the
Hi there -
Our facet counts are incorrect for a particular field and I suspect it is
because we changed the type of the field from StrField to TextField. Two
questions:
1. If we do re-index all the documents in the index, would these counts get
fixed?
2. Is there a "safe" way of changing field
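(To question 1: faceting on a tokenized TextField counts indexed terms rather than whole values, which is one common way the counts go wrong after such a change, so re-indexing everything into a consistently typed field should fix them. A toy illustration of the tokenization effect, not Solr code; the values are made up:)

```python
from collections import Counter

values = ["New York", "New Jersey", "New York"]

# StrField-style faceting: one bucket per whole value
str_counts = Counter(values)

# TextField with a whitespace tokenizer: one bucket per token
text_counts = Counter(tok for v in values for tok in v.split())

print(str_counts["New York"], text_counts["New"])  # 2 3
```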
We have only a handful of fields that are stored and many (non Text) fields
which are neither stored nor have docValues :-(
Looks like giving Luke a shot is the answer. Can you point me to an example
to extract the fields from inverted Index using Luke.
On Wed, May 22, 2019 at 11:52 AM Erick
Hello User Group,
we run Solr with HDFS and got a lot of the following warning:
Slow ReadProcessor read fields took 15093ms (threshold=1ms); ack:
seqno: 3 reply: SUCCESS reply: SUCCESS reply: SUCCESS
downstreamAckTimeNanos: 798309 flag: 0 flag: 0 flag: 0, targets:
You have to be a little careful here, one thing I learned relatively recently
is that there are in-memory structures that hold pointers to _all_
un-searchable docs (i.e. no new searchers have been opened since the doc was
added/updated) to support real-time get. So if you’re indexing a _lot_ of
On 5/22/2019 10:47 AM, Russell Taylor wrote:
I will add that we have set commits to be only called by the loading program.
We have turned off soft and autoCommits in the solrconfig.xml.
Don't turn off autoCommit. Regular hard commits, typically with
openSearcher set to false so they don't
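(The usual shape of that advice in solrconfig.xml looks roughly like this; the 60-second interval is illustrative, not a recommendation:)

```xml
<updateHandler class="solr.DirectUpdateHandler2">
  <autoCommit>
    <maxTime>60000</maxTime>           <!-- hard commit at most every 60s -->
    <openSearcher>false</openSearcher> <!-- truncate the tlog, but don't open a new searcher -->
  </autoCommit>
</updateHandler>
```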
Thanks Eric,
I will add that we have set commits to be only called by the loading program.
We have turned off soft and autoCommits in the solrconfig.xml.
This is so when we upload, we move from one list of docs to the new list in one
atomic operation (delete, add and then commit).
I'll also
I upgraded my AD SolrCloud environment from Solr 5.3 to 6.0 and everything
worked fine.
But when I did QA after the upgrade, stats and facet queries were not working
as expected.
When I run
q=APPL_TOKN_ID_s:testApplication2=100=CTNT_FILE_PATH_NM_s=CTNT_FILE_PATH_NM_s=0=1=true
I am not getting any
Hi - according to the documentation here:
https://lucene.apache.org/solr/guide/7_6/schema-api.html
The V2 API is located at api/cores/collection/schema
However the documentation here:
https://lucene.apache.org/solr/guide/7_6/v2-api.html
has it at api/c/collection/schema
I believe the latter is
110 isn’t all that many, well within the normal range _assuming_ that they are
being processed…. When you restart Solr, every state change operation writes an
operation to the work queue which can mount up.
Perhaps you’re hitting: https://issues.apache.org/jira/browse/SOLR-13416?
In which case
The ZK ensemble appears to be OK. It is the Solr-related stuff that is borked.
There are 110 items in /overseer/collection-queue-work/, which doesn’t seem
healthy.
If it is really hosed, I’ll shut down all the nodes, clean out the files in
Zookeeper and start over.
wunder
Walter Underwood
Hi everyone,
I need some advice on how to debug slow soft commits.
We use Solr for searches in a DAM system; in similar setups, soft
commits take about one to two seconds, but in this case they take nearly ten seconds.
Solr runs on a dedicated VM with eight cores and 64 GB RAM (16G heap),
which is common
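(For anyone following along: the soft-commit interval lives in solrconfig.xml, and slow searcher warm-up, e.g. large cache autowarmCount values or heavy static warming queries, is a common cause of slow soft commits. A sketch; the 2-second value is illustrative:)

```xml
<autoSoftCommit>
  <maxTime>2000</maxTime> <!-- open a new searcher at most every 2s -->
</autoSoftCommit>
```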
OK, now we’re cooking with oil.
First, nodes in recovery shouldn’t make any difference to a query. They should
not serve any part of a query so I think/hope that’s a red herring. At worst a
node in recovery should pass the query on to another replica that is _not_
recovering.
When you’re
Good luck, this kind of assumes that your ZK ensemble is healthy of course...
> On May 22, 2019, at 8:23 AM, Walter Underwood wrote:
>
> Thanks, we’ll try that. Bouncing one Solr node doesn’t fix it, because we did
> a rolling restart yesterday.
>
> wunder
> Walter Underwood
>
Well, if they’re all docValues or stored=true, sure. It’d be kind of slow. The
short form is “if you can specify fl=f1,f2,f3…. for all your fields and see all
your values, then it’s easy if slow”.
If that works _and_ you are on Solr 4.7+ cursorMark will help the “deep paging”
issue.
If
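(A rough sketch of the cursorMark deep-paging loop Erick mentions; `fetch_page` here is a canned stand-in for the real HTTP request, something like `/select?q=*:*&fl=f1,f2,f3&sort=id asc&rows=500&cursorMark=...`, and the cursor strings are invented:)

```python
def fetch_page(cursor):
    # Stand-in for the actual Solr request. Returns (docs, nextCursorMark).
    # Here: a canned two-page result set.
    pages = {"*": (["doc1", "doc2"], "AoEx"), "AoEx": (["doc3"], "AoEx")}
    return pages[cursor]

def export_all():
    docs, cursor = [], "*"   # "*" is the initial cursorMark
    while True:
        page, next_cursor = fetch_page(cursor)
        docs.extend(page)
        if next_cursor == cursor:  # cursor stopped advancing: we're done
            break
        cursor = next_cursor
    return docs

print(export_all())  # ['doc1', 'doc2', 'doc3']
```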
Thanks, we’ll try that. Bouncing one Solr node doesn’t fix it, because we did a
rolling restart yesterday.
wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/ (my blog)
> On May 22, 2019, at 8:21 AM, Erick Erickson wrote:
>
> Walter:
>
> I have no idea what the
Walter:
I have no idea what the root cause is here, this really shouldn’t happen. But
the Overseer role (and I’m assuming you’re talking Solr’s Overseer) is assigned
similarly to a shard leader, the same election process happens. All the
election nodes are ephemeral ZK nodes.
Solr’s Overseer
Hi Erick,
Every time any of the replication nodes goes into recovery mode we start
seeing queries which don't return the correct count. I'm being told zookeeper
will give me the correct node (not one in recovery), but I want to prove it, as
the query issue only comes up when any of the nodes are
I know this is a long shot. I am trying to move from Solr4 to Solr7.
Reindexing all the data from the source is difficult to do in a reasonable
time. All the fields are of basic types like int, long, float, double,
Boolean, date, string.
Since these fields don’t have analyzers, I was wondering if
> On May 22, 2019, at 12:51 AM, vishal patel
> wrote:
>
> We enabled the DocValues on some schema fields for sorting and faceting query
> result.
> Is it necessary to add docValuesFormat for faster query process?
Only if you sort/facet or group. And queries won’t necessarily be faster
Anticipating your next question “why can’t you use an index created 2 or more
versions ago”, these two quotes were helpful for me to get my head around it.
It’s _always_ been the case that going from X to X+2 has been unsupported, it’s
just that the failures wouldn’t necessarily be obvious.
Why do you want to know? You’ve asked how do to X without telling us what
problem Y you’re trying to solve (the XY problem) and frequently that leads to
a lot of wasted time…..
Under the covers CloudSolrClient uses a pretty simple round-robin load balancer
to pick a Solr node to send the query
You only have to provide the addresses of the ZooKeeper ensemble; it will figure it
out on its own based on information in ZooKeeper.
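(A toy illustration of the round-robin selection over live nodes, not actual CloudSolrClient code; the node URLs are hypothetical:)

```python
import itertools

class RoundRobin:
    # Minimal round-robin picker over the live nodes ZooKeeper reports.
    def __init__(self, nodes):
        self._cycle = itertools.cycle(nodes)

    def pick(self):
        return next(self._cycle)

lb = RoundRobin(["http://node1:8983/solr", "http://node2:8983/solr",
                 "http://node3:8983/solr"])
picks = [lb.pick() for _ in range(4)]
print(picks)  # node1, node2, node3, then back to node1
```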
> Am 22.05.2019 um 14:38 schrieb Russell Taylor :
>
> Hi,
> Using CloudSolrClient, how do I find the node (I have 3 nodes for this
> collection on our 6 node
Hi,
Using CloudSolrClient, how do I find which node (I have 3 nodes for this
collection on our 6-node cluster) the query has connected to?
I'm hoping to get the full URL if possible.
Regards
Russell Taylor
This message may contain confidential information and
On Wed, 2019-05-15 at 21:37 -0400, Rahul Goswami wrote:
> fq={!graph from=from_field to=to_field returnRoot=false}
>
> Executing _only_ the graph filter query takes about 64.5 seconds. The
> total number of documents from this filter query is a little over 1
> million.
I tried building an index
Hello, Shawn, Toke Eskildsen and Solr Community:
It might be too late to share, but the URL below is what I would have shared
as an attachment.
Again, Solr 8.0.0 is somewhat better, but I am suspicious that it might be too
much better?
Same as I answered in SOLR-13487:
Note that you'll probably need to re-index from scratch due to changes in 8.0,
see
https://builds.apache.org/view/L/view/Lucene/job/Solr-reference-guide-8.0/javadoc/indexupgrader-tool.html
Hi everyone!
I've been trying unsuccessfully to read an alias to a collection with a
curl command.
The command only works when I put in the admin credentials, although the
user I want to grant access to also has the required role.
Is this perhaps built-in, or should anyone be able to access
I'm trying to upgrade a index from Lucene 6.x to 7.x, and then to 8.x,
using IndexUpgrader [1]. But it never successfully upgrades to 7, and I
cannot figure out why.
I've also tried using CheckIndex [2] with the -exorcise option to fix the
index first, but that doesn't help.
Any ideas? I've
We enabled the DocValues on some schema fields for sorting and faceting query
result.
Is it necessary to add docValuesFormat for faster query process?
Which one is better: docValuesFormat="Memory" or docValuesFormat="Disk"?
Note: Our indexed data size is high in one collection and different
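(For context: docValues is enabled per field or fieldType in the schema, while docValuesFormat is an expert-level attribute on the fieldType that also requires the schema-aware codec factory in solrconfig.xml. A sketch; the field and type names are hypothetical, and whether a given named format is available depends on your Solr/Lucene version:)

```xml
<!-- solrconfig.xml: let the schema choose per-field codecs -->
<codecFactory class="solr.SchemaCodecFactory"/>

<!-- managed-schema: hypothetical string field used for sorting/faceting -->
<fieldType name="string_dv" class="solr.StrField" docValues="true"/>
<field name="category" type="string_dv" indexed="true" stored="false"/>
```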
Hello all,
I'm writing a Solr solution for quite a big project. To do so, I wrote an
OSGi service that provides the add/delete/query functionality against Solr.
It works just fine, but we have a kind of session issue, along with logs
growing in size on the server, because the application server throws this