int on this function, can this boost be done in a different way?
Any pointers will be appreciated.
Thanks,
Shamik
to
our organization (not sure if the kstem filter can do that).
Any pointers will be appreciated.
Regards,
Shamik
er can do that).
Any pointers will be appreciated.
Regards,
Shamik
ppreciated.
Thanks,
Shamik
72 is ignored and what'll be the best way to address this scenario?
Any pointers will be appreciated.
Thanks,
Shamik
ppreciated.
Thanks,
Shamik
Thanks Jan, I was not aware of this, appreciate your help.
--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
Ahemad, I don't think it's related to the field definition; rather, it looks
like an inherent bug. For the time being, I created a copyfield which uses a
custom regex to remove whitespace and special characters and used that in the
function. I'll debug the source code and confirm whether it's a bug, and will raise a
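For context, such a copyfield setup might look roughly like this (a sketch only; the type name, field names, and regex are illustrative, not the actual config):

```xml
<fieldType name="string_exact_clean" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <!-- strip whitespace and non-word characters; pattern is illustrative -->
    <filter class="solr.PatternReplaceFilterFactory"
            pattern="[\s\W]+" replacement="" replace="all"/>
  </analyzer>
</fieldType>

<field name="ADSKFeatureClean" type="string_exact_clean" indexed="true" stored="false"/>
<copyField source="ADSKFeature" dest="ADSKFeatureClean"/>
```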
I'm using Solr 7.5, here's the query:
q=line=language:"english"=Source2:("topicarticles"+OR+"sfdcarticles")=url,title=ADSKFeature:"CUI+(Command)"^7=recip(ms(NOW/DAY,PublishDate),3.16e-11,1,1)^2+if(termfreq(ADSKFeature,'CUI
(Command)'),log(CaseCount),sqrt(CaseCount))=10
Edwin,
The field is a string type, here's the field definition.
-Shamik
iated.
Thanks,
Shamik
I'm still pretty clueless trying to find the root cause of this behavior. One
thing is pretty consistent: whenever a node restarts and sends a
recovery command, the recipient shard/replica goes down due to a sudden surge
in old gen heap space. Within minutes, it hits the ceiling and stalls the
Thanks Eric. I guess I was not clear when I mentioned that I had stopped the
indexing process. It was just a temporary step to make sure that we are not
adding any new data when the nodes are in a recovery mode. The 10 minute
hard commit is carried over from our 6.5 configuration which actually
Erick,
Thanks for your input. All our fields (for facet, group & sort) have
docvalues enabled since 6.5. That includes the id field. Here's the field
cache entry:
CACHE.core.fieldCache.entries_count:0
CACHE.core.fieldCache.total_size: 0 bytes
Based on whatever I've seen so far,
t cache has gone up (0.61). It used
to be 0.9 and 0.3 in Solr 6.5.
Not sure what we are missing here in terms of the Solr upgrade to 7.5. I can
provide other relevant information.
Thanks,
Shamik
Thanks Erick, appreciate your help
Thanks Erick, that's extremely insightful. I'm not using batching and that's
the reason I was exploring ConcurrentUpdateSolrClient. Currently, N threads
are reusing the same CloudSolrClient to send data to Solr. Of course, the
single point of failure was my biggest concern with
Hi,
I'm looking into the possibility of using ConcurrentUpdateSolrClient for
indexing a large volume of data instead of CloudSolrClient. Having an
async,batch API seems to be a better fit for us where we tend to index a
lot of data periodically. As I'm looking into the API, I'm wondering if
Solr uses REST-based calls over HTTP or HTTPS, which cannot handle multiple
requests in one shot. However, what you can do is return all the necessary
data in one shot and group it according to your needs.
Thanks and regards,
Shamik
On 02-Oct-2018 8:11 PM, "Greenhorn T
You may try the tesseract tool to check data extraction from PDFs or
images and then go forward accordingly. As far as I understand, the PDF is
an image and not data. A searchable PDF actually overlays the selectable
text as hidden text over the PDF image. These PDFs can be indexed and
Hi,
I'm having issues using multiple terms in Solr function queries. For e.g.
I'm trying to use the following bf function using termfreq
bf=if(termfreq(ProductLine,'Test Product'),5,0)
This throws org.apache.solr.search.SyntaxError: Missing end to unquoted
value starting at 28
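If it helps anyone hitting the same error: I believe the bf parameter is split on whitespace, so a space inside the function literal breaks parsing. One workaround sometimes suggested (an unverified sketch; the $tp parameter name is made up) is parameter dereferencing:

```
bf=if(termfreq(ProductLine,$tp),5,0)&tp='Test Product'
```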
To index text in images, the image needs to be searchable, i.e. text needs
to be overlaid on the image like a searchable PDF. You can do this using
OCR, but it is a bit unreliable if the images are scanned copies of written
text.
On 10-Apr-2018 4:12 PM, "Rahul Singh"
tools. Check the url for the same.
Then, based on your requirement, decide whether to use DIH or out-of-the-box indexing.
Thanks and regards,
Shamik
On Mon 19 Mar, 2018, 1:02 PM Khalid Moustapha Askia, <
m.askiakha...@gmail.com> wrote:
> Hi. I am trying to index some data with Solr by using SolrJ. B
xceed more than 500k documents.
Any pointers will be appreciated.
Thanks,
Shamik
Thanks Emir and Zisis.
I added maxRamMB for the filterCache and reduced the size. I could see the
benefit immediately; the hit ratio went to 0.97. Here's the configuration:
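(For reference, a filterCache entry with maxRamMB looks roughly like this; the numbers here are illustrative, not the actual values from the config:)

```xml
<filterCache class="solr.FastLRUCache"
             size="512"
             autowarmCount="128"
             maxRamMB="512"/>
```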
It seemed to be stable for few days, the cache hits and jvm pool utilization
seemed to be well within expected range. But
Zisis, thanks for chiming in. This is really interesting information and
probably in line with what I'm trying to fix. In my case, the facet fields are
certainly not high-cardinality ones. Most of them have a finite set of values,
the max being 200 (though it has a low usage percentage). Earlier I had
Thanks Eric. In my case, each replica is running on its own JVM, so even if
we consider 8gb of filter cache, it still has 27gb to play with. Isn't that
a decent amount of memory to handle the rest of the JVM operations?
Here's an example of implicit filters that get applied to almost all the
Thanks Emir. The index is equally split between the two shards, each having
approx 35gb. The total number of documents is around 11 million which should
be distributed equally among the two shards. So, each core should take 3gb
of the heap for a full cache. Not sure I get the "multiply it by
rt the instance,
it goes into recovery mode and updates its index with the delta, which is
understandable. But at the same time, the other replica in the same shard
stalls and goes offline. This starts a cascading effect and I've to end up
restarting all the nodes.
Any pointers will be appreciated.
Thanks,
Shamik
Hi,
I'm seeing this random Authentication failure in our Solr Cloud cluster
which is eventually rendering the nodes in "down" state. This doesn't seem
to have a pattern, just starts to happen out of the blue. I've 2 shards,
each having two replicas. They are using Solr basic authentication
Susheel, my inference was based on the QTime value from the Solr log and not
based on the application log. Before the CPU spike, the query times didn't give
any indication that queries were in the process of slowing down. As the GC
suddenly triggers high CPU usage, query execution slows down or chokes,
I usually log queries that take more than 1 sec. Based on the logs, I haven't
seen anything alarming or a surge in slow queries, especially around
the time when the CPU spike happened.
I don't necessarily have the data for deep paging, but the usage of sort
parameter (date in our case) has
All the tuning and scaling down of memory seemed to be stable for a couple of
days but then came down due to a huge spike in CPU usage, contributed by G1
Old Generation GC. I'm really puzzled why the instances are suddenly
behaving like this. It's not that a sudden surge of load contributed to
Emir, after digging deeper into the logs (using new relic/solr admin) during
the outage, it looks like a combination of query load and indexing process
triggered it. Based on the earlier pattern, memory would tend to increase at
a steady pace, but then surge all of a sudden, triggering OOM. After
Thanks, the change seemed to have addressed the memory issue (so far), but on
the contrary, the GC choked the CPUs, stalling everything. The CPU
utilization across the cluster clocked close to 400%, literally stalling
everything. On first look, the G1 Old Generation looks to be the culprit
that
I agree, I should have made that clear in my initial post. The reason I thought
it's a little trivial is that the newly introduced collection has only a few
hundred documents and is not being used in search yet. Nor is it being
indexed at a regular interval. The cache parameters are kept to a minimum as
Walter, thanks again. Here's some information on the index and search
feature.
The index size is close to 25gb, with 20 million documents. It has two
collections, one being introduced with 6.6 upgrade. The primary collection
carries the bulk of the index, newly formed one being aimed at getting
Thanks for your suggestion, I'm going to tune it and bring it down. It just
happened to carry over from 5.5 settings. Based on Walter's suggestion, I'm
going to reduce the heap size and see if it addresses the problem.
Apologies, 290gb was a typo on my end, it should read 29gb instead. I started
with my 5.5 configurations of limiting the RAM to 15gb. But it started going
down once it reached the 15gb ceiling. I tried bumping it up to 29gb since
memory seemed to stabilize at 22gb after running for a few hours, of
. Does 6.6 command more memory
than what is currently available on our servers (30gb)? What might be the
probable cause for this sort of scenario? What are the best practices to
troubleshoot such issues?
Any pointers will be appreciated.
Thanks,
Shamik
for reference to show what I'm trying to achieve.
Any pointers will be helpful.
Thanks,
Shamik
Any suggestion?
--
View this message in context:
http://lucene.472066.n3.nabble.com/Problem-trying-to-boost-phrase-containing-stop-word-tp4346860p4347068.html
Sent from the Solr - User mailing list archive at Nabble.com.
Hi Koji,
I'm using a copy field to preserve the original term with stopword. It's
mapped to titleExact.
textExact definition:
Thanks Koji, I've tried KeywordRepeatFilterFactory, which keeps the original
term, but the stopword filter in the analysis chain will remove it
nonetheless. That's why I thought of creating a separate field devoid of
stopwords/stemmers. Let me know if I'm missing something here.
reciate it if someone can provide pointers. If there's a different
approach to solving this issue, please let me know.
Thanks,
Shamik
Charlie, this looks very close to what I'm looking for. Just wondering if
you've made this available as a jar, or whether it can be built from source?
Our Solr distribution is not built from source, so I can only use an
external jar. I'd appreciate it if you can let me know.
Charlie, thanks for sharing the information. I'm going to take a look and get
back to you.
Thanks, John.
The title is not unique, so I can't really rely on it. Also, keeping an
external mapping of url and id might not be feasible as we are talking about
possibly millions of documents.
URLs are unique in our case; unfortunately, they can't be used as part of the
Query Elevation Component since
a way to combine step 2 and 3
in a single query or a different approach altogether?
Any pointers will be appreciated.
-Thanks,
Shamik
Anyone ?
to be limited to only providing taxonomy data, which needs to be provided
as flat text. A few people suggested using classifiers like the Naive Bayes
classifier or other machine learning tools.
I'd appreciate it if anyone can provide some direction in this regard.
Thanks,
Shamik
Thanks for the pointer Alex . I'll go through all four articles, thanksgiving
will be fun :-)
xt, title, keyword, etc),
it's not returning any data.
We have a large set of facet fields, and I would ideally like to avoid adding
them to the searchable list. Just wondering if there's a better
way to handle this situation. Any pointers will be appreciated.
Thanks,
Shamik
You can try something like :
query.add("json.facet", your_json_facet_query);
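where your_json_facet_query is just the JSON facet string, e.g. (facet and field names here are hypothetical):

```json
{"top_sources": {"type": "terms", "field": "Source", "limit": 10}}
```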
Did you take a look at the Collapsing Query Parser?
https://cwiki.apache.org/confluence/display/solr/Collapse+and+Expand+Results
Thanks again Alex.
I should have clarified the use of the browse request handler. The reason is
that I'm simulating the request handler parameters of my production system
using browse. I used a separate request handler and stripped down all properties to
match "select". I finally narrowed down the issue to
Sorry to bump this up, but can someone please explain the parsing behaviour
of a join query (shown above) with respect to different request handlers?
Thanks Alex, this has been extremely helpful. There's one doubt though.
The query returns expected result if I use "select" or "query" request
handler, but fails for others. Here's the debug output from "/select" using
edismax.
Thanks for getting back on this. I was trying to formulate a query along similar
lines but haven't been able to construct it (multiple clauses) correctly so far. That
can be attributed to my inexperience with Solr queries as well. Can you
please point to any documentation / example for my reference ?
Thanks Alex. With the conventional join query I'm able to return the parent
document based on a query match on the child. But, it filters out any other
documents which are outside the scope of the join condition. For example, in my
case, I would expect the query to return:
1
Parent title
appreciated.
Thanks,
Shamik
Anyone ?
icles^9.0 Source2:downloads^5.0
1.0/(3.16E-11*float(ms(const(147216960),date(PublishDate)))+1.0)
The part I'm confused about is why the two queries are being interpreted
differently?
Thanks,
Shamik
642158 text:inventor
0.098686054 Source2:CloudHelp
0.009136423
1.0/(3.16E-11*float(ms(const(147208320),date(PublishDate)))+1.0)
I'm using edismax.
Just wondering what I'm missing here. Any help will be appreciated.
Regards,
Shamik
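As a side note, the recip function in the explain output above computes a/(m*x + b); with m = 3.16e-11 (roughly 1 divided by the milliseconds in a year) the boost starts at 1.0 for a brand-new document and decays to about 0.5 after a year. A quick sketch of the arithmetic (Python used purely for illustration):

```python
# recip(x, m, a, b) = a / (m*x + b), Solr's usual date-decay function.
# x is ms(NOW/DAY, PublishDate), i.e. the document's age in milliseconds.
def recip(x, m=3.16e-11, a=1.0, b=1.0):
    return a / (m * x + b)

year_ms = 1000 * 60 * 60 * 24 * 365  # one year in milliseconds

print(recip(0))        # brand-new document -> 1.0
print(recip(year_ms))  # roughly 0.5 after one year
```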
Thanks for all the pointers. With the 50% discount, picking up a copy is a
no-brainer.
Hi Doug,
Congratulations on the release; I guess a lot of us have been eagerly
waiting for this. Just one quick clarification: you mentioned that the
examples in your book are executed against Elasticsearch. For someone
familiar with Solr, will it be an issue to run those examples in a Solr
Anyone ?
Any pointers will be appreciated.
Thanks,
Shamik
anyone ?
Hi,
I'm facing an issue where SolrJ calls are randomly failing on basic
authentication. Here's the exception:
ERROR923629[qtp466002798-20] -
org.apache.solr.security.PKIAuthenticationPlugin.doAuthenticate(PKIAuthenticationPlugin.java:125)
- Invalid key
INFO923630[qtp466002798-20] -
Hi,
I'm trying to update the set-property option in security.json
authentication section. As per the documentation,
"Set arbitrary properties for authentication plugin. The only supported
property is 'blockUnknown'"
https://cwiki.apache.org/confluence/display/solr/Basic+Authentication+Plugin
Ok, I found another way of doing it which will preserve the QueryResponse
object. I've used DefaultHttpClient, set the credentials, and finally passed
it as a constructor argument to CloudSolrClient.
*DefaultHttpClient httpclient = new DefaultHttpClient();
UsernamePasswordCredentials defaultcreds = new
UsernamePasswordCredentials("solr", "SolrRocks");
httpclient.getCredentialsProvider().setCredentials(AuthScope.ANY, defaultcreds);*
onse or UpdateResponse objects instead.
Any pointers will be appreciated.
-Thanks,
Shamik
Brian,
Thanks for your reply. My first post was a bit convoluted; I tried to explain
the issue in the subsequent post. Here's the security JSON. I've got solr and
beehive assigned the admin role which allows them to have access to "update"
and "read". This works as expected. I add a new role "browseRole"
Anyone ?
"v": 2
}
}
}
And authorization:
{
  "responseHeader": {
    "status": 0,
    "QTime": 0
  },
  "authorization.enabled": true,
  "authorization": {
    "class": "solr.RuleBasedAuthorizationPlugin",
    "user-role": {
      "solr": "admin",
      "superuser": ["browseRole", "selectRole"],
      "beehive": ["browseRole", "selectRole"]
    },
    "permissions": [
      {"name": "security-edit", "role": "admin"},
      {"name": "select", "collection": "gettingstarted",
       "path": "/select/*", "role": "selectRole"},
      {"name": "browse", "collection": "gettingstarted",
       "path": "/browse", "role": "browseRole"}
    ],
    "": {"v": 7}
  }
}
I was under the impression that these roles are independent of each other;
based on the assignment, individual users should be able to access their
respective areas. On a related note, I was not able to make roles like
"all" and "read" work.
Not sure what I'm doing wrong here. Any feedback will be appreciated.
Thanks,
Shamik
mode "solr start -e cloud -noprompt"
2. zkcli.bat -zkhost localhost:9983 -cmd putfile /security.json
security.json
3. tried http://localhost:8983/solr/gettingstarted/browse , provided
dev/password but I'm getting the following exception:
[c:gettingstarted s:shard2 r:core_node3 x:gettingstarted_shard2_replica2]
org.apache.solr.servlet.HttpSolrCall; USER_REQUIRED auth header Basic
c29scjpTb2xyUm9ja3M= context : userPrincipal: [[principal: solr]] type:
[UNKNOWN], collections: [gettingstarted,], Path: [/browse] path : /browse
params :
Looks like I'm using the wrong way of generating the password.
solr/SolrRocks works as expected.
Also, not sure what's wrong with the "readRole". It doesn't seem to work when
I try with user "solr".
Any pointers will be appreciated.
-Thanks,
Shamik
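One debugging note on the log line above: the token after "Basic" is just base64 of user:password, so it can be decoded to see exactly which credentials the client sent (Python purely for illustration):

```python
import base64

# Decode the Basic auth token from the Solr log line above.
token = "c29scjpTb2xyUm9ja3M="
creds = base64.b64decode(token).decode()
print(creds)  # -> solr:SolrRocks
```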
the reverse
support through ChildDocTransformerFactory
Just wondering if there's a way to address the query differently. Any
pointers will be appreciated.
-Thanks,
Shamik
Thanks Shawn and Alessandro. I get the part about why id is needed. I was trying to
compare with the "mlt" request handler, which doesn't enforce such a
constraint. My previous example of title/keyword is not the right one, but I
do have fields which are unique to each document and can be used as a key to
Thanks Alessandro, that answers my doubt. In a nutshell, to make the MLT query
parser work, you need to know the document id. I'm just curious as to why this
constraint has been added. It will not work for a bulk of use cases. For
e.g., if we are trying to generate MLT based on a text or a keyword, how
mlt documents based on a "keyword"
field. With the new query parser, I'm not able to see a way to use another
field except for id. Is this a constraint? Or is there a different syntax?
Any pointers will be appreciated.
Thanks,
Shamik
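For anyone searching later, the MLT query parser keys off the uniqueKey value of a seed document, along these lines (a sketch; the qf fields and DOC_ID placeholder are hypothetical):

```
q={!mlt qf=keyword,title mintf=1 mindf=1}DOC_ID
```

where DOC_ID is the uniqueKey of the document to find similar documents for.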
Any pointers will be appreciated.
-Thanks,
Shamik
preferIPv4Stack=true
-Dlog4j.configuration=file:/mnt/ebs2/solrhome/log4j.properties
-Dsolr.autoCommit.maxTime=6 -Dsolr.clustering.enabled=true
Not sure what's going wrong. Any pointers will be appreciated.
-Thanks,
Shamik
Thanks Eric and Walter, this is extremely insightful. One last followup
question on composite routing. I'm trying to have a better understanding of
index distribution. If I use language as a prefix, SolrCloud guarantees that
same language content will be routed to the same shard. What I'm curious
Thanks a lot, Erick. You are right, it's a tad small with around 20 million
documents, but the growth projection is around 50 million in the next 6-8 months.
It'll continue to grow, but maybe not at the same rate. From an index size
point of view, the size can grow up to half a TB from its current
collection for English and one for the rest of the
languages.
Any pointers on this will be highly appreciated.
Regards,
Shamik
Doug, do we have a date for the hard copy launch?
David, this is a tad weird. I've seen this error when you turn on docValues for
an existing field. You can try running an "optimize" on your index and see if it
helps.
I tried the function query route, but I'm getting a weird exception.
*bf=if(termfreq(ContentGroup,'Developer Doc'),-20,0)* throws an exception
*org.apache.solr.search.SyntaxError: Missing end quote for string at pos 29
str='if(termfreq(ContentGroup,'Developer'*. Does it only accept a single word
or
Thanks Walter, I've tried this earlier and it works. But the problem in my
case is that I have boosts on a few Source parameters as well. My ideal "bq"
should look like this:
*bq=Source:simplecontent^10 Source:Help^20 (*:*
-ContentGroup-local:("Developer"))^99*
But this is not going to work.
I'm
Emir, I don't think Solr supports a negative boost *^-99* syntax like this. I
can certainly do something like:
bq=(*:* -ContetGroup:"Developer's Documentation")^99 , but then I can't have
my other bq parameters.
This doesn't work --> bq=Source:simplecontent^10 Source:Help^20 (*:*
Binoy, 0.1 is still a positive boost. With title getting the highest weight,
this won't make any difference. I've tried this as well.
Hi Emir,
I've a bunch of contentgroup values, so boosting them individually is
cumbersome. I have boosts on the query fields
qf=text^6 title^15 IndexTerm^8
and
bq=Source:simplecontent^10 Source:Help^20
(-ContentGroup-local:("Developer"))^99
I was hoping
es.graphics". The boost on title pushes these documents to the
top.
What I'm looking for is a way to deboost all documents that are
tagged with ContentGroup:"Developer", irrespective of whether the term occurs
in text or title.
Any pointers will be appreciated.
Thanks,
Shamik
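For the record, the workaround usually suggested for this (a sketch, using the field names from the thread): edismax accepts the bq parameter multiple times, so the negative clause can be sent as its own parameter instead of being combined with the others:

```
bq=Source:simplecontent^10
bq=Source:Help^20
bq=(*:* -ContentGroup-local:("Developer"))^99
```

The last bq boosts everything that is not tagged Developer, which effectively de-boosts the Developer documents.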
That's what I observed as well. Perhaps there's a way to customize
SignatureUpdateProcessorFactory to support my use case. I'll look into the
source code and figure out if there's a way to do it.
Thanks Markus. I've been using field collapsing till now but the performance
constraint is forcing me to think about index time de-duplication. I've been
using a composite router to make sure that duplicate documents are routed to
the same shard. Won't that work for SignatureUpdateProcessorFactory
Thanks Scott. I could directly use field collapsing on the adskdedup field
without the signature field. The problem with field collapsing is the
performance overhead; it slows the query down 10-fold.
CollapsingQParserPlugin is a better option; unfortunately, it doesn't
support an ngroups equivalent,
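For reference, the collapsing filter in question looks like this (using the adskdedup field from the thread):

```
fq={!collapse field=adskdedup}
```

Unlike group=true, I don't believe it reports an ngroups-style total group count, which is the limitation being discussed here.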
Thanks for your reply. Have you customized SignatureUpdateProcessorFactory, or
are you using the configuration out of the box? I know it works for simple
dedup, but my requirement is a tad different as I need to tag an identifier to
the latest document. My goal is to understand if that's possible
wondering if this is achievable by perhaps extending
UpdateRequestProcessorFactory or
customizing SignatureUpdateProcessorFactory?
Any pointers will be appreciated.
Regards,
Shamik
Hi Kevin,
Were you able to get a workaround/fix for your problem? I'm also
looking to secure the Collection and Update APIs by upgrading to 5.3. Just
wondering if it's worth the upgrade or if I should wait for the next version,
which will probably address this.
Regards,
Shamik