On 2/25/2015 5:50 AM, Benson Margulies wrote:
So, found the following line in the guide:
java -DzkRun -DnumShards=2
-Dbootstrap_confdir=./solr/collection1/conf
-Dcollection.configName=myconf -jar start.jar
using a completely clean, new, solr_home.
In my own bootstrap dir, I have my
As a general proposition, your first stop with any query interpretation
question should be to add the debugQuery=true parameter and look at the
parsed_query in the query response, which shows how the query is really
interpreted.
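A minimal sketch of such a request; the host, port, core name ("collection1") and the query itself are assumptions, not from this thread:

```shell
# Build a select URL with debugQuery=true; the "debug" section of the
# JSON response then contains parsedquery, showing the real interpretation.
URL="http://localhost:8983/solr/collection1/select?q=title:solr&debugQuery=true&wt=json"
# curl "$URL"   # run against a live Solr to see the parsed query
echo "$URL"
```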
-- Jack Krupansky
On Wed, Feb 25, 2015 at 8:21 AM,
Moshe,
if you take a thread dump while a particular query is stuck (via jstack or in
the SolrAdmin tab), it may explain where exactly it's stalled; just check the
longest stack trace.
FWIW, in 4.x timeAllowed is checked only while documents are collected, and
in 5 it's also checked during query
Hi there,
I'm looking for a library to connect Solr through ODBC to Excel, in order
to do some reporting on my Solr data.
Does anybody know a library for that?
Thanks.
--
Cordialement,
Best regards,
Hakim Benoudjit
On Wed, Feb 25, 2015 at 8:04 AM, Shawn Heisey apa...@elyograg.org wrote:
On 2/25/2015 5:50 AM, Benson Margulies wrote:
So, found the following line in the guide:
java -DzkRun -DnumShards=2
-Dbootstrap_confdir=./solr/collection1/conf
-Dcollection.configName=myconf -jar start.jar
using a
On 2/25/2015 5:21 AM, Moshe Recanati wrote:
We checked this option and it didn't solve our problem.
We're using https://github.com/healthonnet/hon-lucene-synonyms for query
based synonyms.
While running a query with a high number of words that each have a high
number of synonyms, the query got stuck
Hi,
The facet component works with the whole result set, so you can't get the
facets for your topN documents. A naive way to fulfill your
requirement is to implement it in two steps:
- Request your data and recover the doc ids.
- Create a new query with the selected ids (id:id1 OR
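The two steps might look like this as raw select URLs (core name, field names and the ids are illustrative, not from the thread):

```shell
# Step 1: fetch only the top-N doc ids (fl=id keeps the payload small).
STEP1="http://localhost:8983/solr/collection1/select?q=text:solr&rows=100&fl=id"
# Step 2: facet over just those ids via an OR-joined id query, rows=0
# since only the facet counts are needed.
STEP2="http://localhost:8983/solr/collection1/select?q=id:(id1 OR id2 OR id3)&rows=0&facet=true&facet.field=category"
echo "$STEP1"
echo "$STEP2"
```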
Hi Shawn,
thank you for your quick response. I will read your links and make some
tests.
Regards,
David Dávila
DIT - 915828763
From: Shawn Heisey apa...@elyograg.org
To: solr-user@lucene.apache.org,
Date: 25/02/2015 13:23
Subject: Re: Problem with queries that includes NOT
On
Hi, folks.
Currently KEYS file is present in:
- www.apache.org/dist/lucene/solr/version/KEYS
- www.apache.org/dist/lucene/solr/KEYS
- www.apache.org/dist/lucene/KEYS
The last two KEYS files are obsolete (both last modified in Feb 2014).
Some keys actually used for the release process aren't present in them.
I
Hi Shawn,
We checked this option and it didn't solve our problem.
We're using https://github.com/healthonnet/hon-lucene-synonyms for query based
synonyms.
While running a query with a high number of words that each have a high number
of synonyms, the query got stuck and Solr's memory was exhausted.
We tried to
Hi all,
We are trying to deboost some documents while indexing, depending on the
text available, something like this:
<doc boost="0.03">
<field name="pns"><![CDATA[Testing product
- Water Bottle. Testing product - Water Bottle. Testing product - Water
Bottle.
On 2/25/2015 4:04 AM, david.dav...@correo.aeat.es wrote:
We have problems with some queries. All of them include the NOT operator, and
in my opinion, the results don't make any sense.
First problem:
The query NOT Proc:ID01 returns 95806 results, however this one,
NOT Proc:ID01 OR
So, found the following line in the guide:
java -DzkRun -DnumShards=2
-Dbootstrap_confdir=./solr/collection1/conf
-Dcollection.configName=myconf -jar start.jar
using a completely clean, new, solr_home.
In my own bootstrap dir, I have my own solrconfig.xml and schema.xml,
and I modified to
A little more data. Note that the cloud status shows the black bubble
for a leader. See http://i.imgur.com/k2MhGPM.png.
org.apache.solr.common.SolrException: No registered leader was found
after waiting for 4000ms , collection: rni slice: shard4
at
It's the zkcli options on my mind. zkcli's usage shows me 'bootstrap',
'upconfig', and uploading a solr.xml.
When I use upconfig, it might work, but it sure is noise:
benson@ip-10-111-1-103:/data/solr+rni$ 554331
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:9983] WARN
Hi Alex,
I get 1 error on start-up.
Is the error below serious?
2/25/2015, 11:32:30 PM ERROR SolrCore
org.apache.solr.common.SolrException: undefined field text
org.apache.solr.common.SolrException: undefined field text
at
Hello David,
thanks for your answer. In the meantime I found the memory hint too in
http://wiki.apache.org/solr/SolrAdaptersForLuceneSpatial4#Sorting_and_Relevancy
So maybe we switch to LatLonType for this kind of search. But RPT is also
needed, as we want to support search by arbitrary
Reading https://lucidworks.com/blog/document-expiration/
It seems that your Delete check interval granularity is 30 seconds,
but your TTL is 10 seconds. Have you tried setting
autoDeletePeriodSeconds to something like 2 seconds and seeing if the
problem goes away due to more frequent checking of
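The setup from that blog post can be sketched like this in solrconfig.xml; the expiration field name follows this thread, and treat the exact chain as an assumption:

```xml
<!-- Sketch of a DocExpirationUpdateProcessorFactory chain; adjust to your config. -->
<updateRequestProcessorChain name="expire" default="true">
  <processor class="solr.processor.DocExpirationUpdateProcessorFactory">
    <!-- check for expired docs every 2 seconds instead of the 30s in question -->
    <int name="autoDeletePeriodSeconds">2</int>
    <str name="ttlFieldName">_ttl_</str>
    <str name="expirationFieldName">expire_at_dt</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>
```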
bq: And the data sync between leader/replica is always a problem
Not quite sure what you mean by this. There shouldn't need to be
any synching in the sense that the index gets replicated, the
incoming documents should be sent to each node (and indexed
to HDFS) as they come in.
bq: There is
I can't get the FileListEntityProcessor and TikaEntityProcessor to
correctly add a Solr document for each epub file in my local directory.
I've just downloaded Solr 5.0.0, on a Windows 7 PC. I ran solr start
and then solr create -c hn2 to create a new core.
I want to index a load of epub
Hi Trey,
Thanks for the detailed response and the link to the talk, it was very
informative.
Yes looking at the current system requirements ICUTokenizer might be the best
bet for our use case.
MultiTextField mentioned in the jira SOLR-6492 has some cool features and
definitely looking
Do I need a zkcli bootstrap or do I start with upconfig? What port does
zkRun put zookeeper on?
On Feb 25, 2015 10:15 AM, Shawn Heisey apa...@elyograg.org wrote:
On 2/25/2015 7:44 AM, Benson Margulies wrote:
Shawn, I _am_ starting from clean. However, I didn't find a recipe for
what you
Hi Mikhail,
We're using 4.7.1. This means I can't stop the search.
I think this is a mandatory feature.
Regards,
Moshe Recanati
SVP Engineering
Office + 972-73-2617564
Mobile + 972-52-6194481
Skype : recanati
More at: www.kmslh.com | LinkedIn | FB
-Original Message-
From: Mikhail
On 2/25/2015 7:44 AM, Benson Margulies wrote:
Shawn, I _am_ starting from clean. However, I didn't find a recipe for
what you suggest as a process, and (following Hoss' suggestion) I
found the recipe above with the bootstrap_confdir scheme.
I am mostly confused as to how I supply my
Hi,
The edismax parser should be able to manage the query you want to ask. I've
made a test, and both of the following queries give me the right result (see the
parentheses):
- {!edismax}(NOT id:7 AND NOT id:8 AND id:9) (gives 1 hit,
the id:9)
- {!edismax}((NOT id:7 AND NOT
Hi Alex,
Below shows that Solr is not getting anything from the text search.
I will try to search from / to and see how the performance is.
select BAD Error in IMAP command INBOX: Unknown command.
. select inbox
* FLAGS (\Answered \Flagged \Deleted \Seen \Draft $Forwarded)
* OK [PERMANENTFLAGS
Which direction? You want import data from Solr into Excel? One off or
repeatedly?
For one off Solr - Excel, you could probably use Excel's Open from
Web and load data directly from Solr using CSV output format.
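A sketch of such a one-off URL to paste into Excel's "From Web" dialog; the core and field names are assumptions:

```shell
# wt=csv makes Solr return the result set as comma-separated values,
# which Excel can load directly.
CSV_URL="http://localhost:8983/solr/collection1/select?q=*:*&rows=1000&fl=id,name,price&wt=csv"
echo "$CSV_URL"
```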
Regards,
Alex.
Solr Analyzers, Tokenizers, Filters, URPs and even a
Thanks for your answer.
For a one-off it seems like a nice way to import my data.
For an ODBC connection, the only solution I found is to replicate my Solr
data in Apache Hive (or Cassandra...), and then connect to that database
through ODBC.
2015-02-25 15:49 GMT+01:00 Alexandre Rafalovitch
Hi Alex,
Thanks for the suggestions. These steps will definitely help out with our use
case.
Thanks for the idea about the lengthFilter to protect our system.
Thanks,
Rishi.
-Original Message-
From: Alexandre Rafalovitch arafa...@gmail.com
To: solr-user
On 2/25/2015 8:35 AM, Benson Margulies wrote:
Do I need a zkcli bootstrap or do I start with upconfig? What port does
zkRun put zookeeper on?
I personally would not use bootstrap options. They are only meant to be
used once, when converting from non-cloud, but many people who use them
do NOT
Okay. Just to re-emphasize something I said but which may not have been
clear: it isn't an either-or between filter and sort. Filter with the spatial
field type that makes sense for filtering, sort (or boost) with the spatial
field type that makes sense for sorting. RPT sucks for distance sorting,
Bingo!
Here's the recipe for the record:
$gcopts holds the set of GC options.
First, set up shop:
DIR=$PWD
cd ../solr-4.10.3/example
java -Xmx200g $gcopts -DSTOP.PORT=7983 -DSTOP.KEY=solrrocks
-Djetty.port=8983 -Dsolr.solr.home=/data/solr+rni/cloud_solr_home
-Dsolr.install.dir=/data/solr-4.10.3
This is very serious. You are missing a field called text. You have
a field _type_ called text; maybe that's where the confusion came
from. Is that something you configured in Dovecot? Was it supposed to
be body or a catch-all field with copyFields into it?
I don't know Dovecot, but it is a
Alex,
Thanks for the suggestions. It always indexes just 1 doc, regardless:
the first epub file it sees. Debug / verbose don't show anything
obvious to me. I can include the output here if you think it would help.
I tried using the SimplePostTool first ( *java
Hello,
I'm trying to get Facet By Distance working on an index with LatLonType
fields. The schema is as follows:
<fields>
...
<field name="trip_duration" type="int" indexed="true" stored="true"/>
<field name="start_station" type="location" indexed="true" stored="true"/>
<field name="end_station" type="location"
On 2/25/2015 9:31 AM, Nitin Solanki wrote:
I want to search lakhs of queries/terms concurrently.
Is there any technique to do multiprocessing with Solr?
Is Solr capable of handling this situation?
I wrote code in Python that does multiprocessing and searches lakhs of
queries and
: Following query posts a document and sets expire_at_dt explicitly. That
: is working perfectly OK and the document expires at the defined time.
so the delete trigger logic is working correctly...
: But when trying to post with TTL (following query), the document does not
: expire after the given time.
Nitin Solanki [nitinml...@gmail.com] wrote:
I want to search lakhs of queries/terms concurrently.
Is there any technique to do multiprocessing on Solr?
Each concurrent search in Solr runs in its own thread, so the answer is yes, it
does so out of the box with concurrent
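Since each request gets its own server-side thread, concurrency can be driven entirely from the client. A sketch from the shell; FETCH is 'echo' here so the sketch runs without a live Solr, and you would replace it with 'curl -s' against your instance (core and terms are illustrative):

```shell
# Fire one request per term as background jobs, then wait for all of them.
FETCH=echo
for term in solr lucene facet; do
  "$FETCH" "http://localhost:8983/solr/collection1/select?q=text:${term}&rows=0" &
done
wait
```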
Hello,
I want to search lakhs of queries/terms concurrently.
Is there any technique to do multiprocessing with Solr?
Is Solr capable of handling this situation?
I wrote code in Python that does multiprocessing and searches lakhs of
queries, hitting Solr simultaneously/in parallel
What about recursive=true? Do you have subdirectories that could
make a difference? Your SimplePostTool would not look at
subdirectories (great comparison, BTW).
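A DIH config with recursion enabled might look like the sketch below; baseDir, the field mapping and the "content" target field are assumptions for the epub use case, not from the thread:

```xml
<!-- Sketch of a data-import config for indexing epub files via Tika. -->
<dataConfig>
  <dataSource type="BinFileDataSource" name="bin"/>
  <document>
    <entity name="files" processor="FileListEntityProcessor"
            baseDir="C:/epubs" fileName=".*\.epub"
            recursive="true" rootEntity="false">
      <entity name="tika" processor="TikaEntityProcessor" dataSource="bin"
              url="${files.fileAbsolutePath}" format="text">
        <field column="text" name="content"/>
      </entity>
    </entity>
  </document>
</dataConfig>
```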
However, you do have lots of mapping options as well with
/update/extract handler, look at the example and documentations. There
is
Some time ago I encountered https://github.com/kawasima/solr-jdbc but never tried
it. Anyway, it doesn't help to connect from ODBC.
Off the top of my head, there is
https://cwiki.apache.org/confluence/display/solr/Exporting+Result+Sets but
it returns only JSON, not CSV; I wonder why.
Seems like a dead end
On 2/25/2015 9:03 AM, Benson Margulies wrote:
It's the zkcli options on my mind. zkcli's usage shows me 'bootstrap',
'upconfig', and uploading a solr.xml.
When I use upconfig, it might work, but it sure is noise:
benson@ip-10-111-1-103:/data/solr+rni$ 554331
Try removing that first epub from the directory and rerunning. If you
now index 0 documents, then there is something unexpected about them
and DIH skips. If it indexes 1 document again but a different one,
then it is definitely something about the repeat logic.
Also, try running with debug and
rebecca,
you probably need to dig into your queries, but if you want to force/preload
the index into memory you could try doing something like
cat `find /path/to/solr/index -type f` > /dev/null
if you haven't already reviewed the following, you might take a look here
Unfortunately (or luckily, depending on your view), attachments do not work with
this mailing list. You'll have to upload it somewhere and provide a URL. It is
quite hard _not_ to get your whole index into the disk cache, so my guess is that
it will get there eventually. Just to check: if you
I am also confused on this. Is adding replicas going to increase search
performance? I'm not sure I see the point of any replicas when using
HDFS. Is there one?
Thank you!
-Joe
On 2/25/2015 10:57 AM, Erick Erickson wrote:
bq: And the data sync between leader/replica is always a problem
Sorry, I should have been more specific.
I was referring to the Solr admin UI page. Today we started up an AWS
instance with 240 GB of memory to see if fitting all of our index (183 GB) in
memory, with enough left over for the JVM, could improve performance.
I attached the admin UI screenshot
Thanks a lot Alex...
I thought about dynamic fields and will also explore the suggested
options...
On Wed, Feb 25, 2015 at 1:40 PM, Alexandre Rafalovitch arafa...@gmail.com
wrote:
Several ways. Reading through tutorials should help to get the
details. But in short:
1) Map them to dynamic
Several ways. Reading through tutorials should help to get the
details. But in short:
1) Map them to dynamic fields using prefixes and/or suffixes.
2) Use a dynamic schema, which will guess the types and create the
fields based on first use
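Option 1 can be sketched with a few rules in schema.xml; the names and types here are examples, not from the thread:

```xml
<!-- Prefix/suffix dynamic field rules: any incoming field matching the
     pattern is created with the corresponding type automatically. -->
<dynamicField name="*_s"   type="string"       indexed="true" stored="true"/>
<dynamicField name="*_i"   type="int"          indexed="true" stored="true"/>
<dynamicField name="*_txt" type="text_general" indexed="true" stored="true"/>
```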
Something like SIREn might also be of interest:
Hi,
Thank you for your reply. I added a filter query to the query in two ways
as follows:
fq={!geofilt}&sfield=start_station&pt=40.71754834,-74.01322069&facet.query={!frange
l=0.0 u=0.1}geodist()&facet.query={!frange l=0.10001 u=0.2}geodist()&d=0.2
-- returns 0 docs
No. You can, but it only limits the search (collecting results), not query expansion.
As I said, debugQuery=true and the stacktrace or sampling can help to
understand the reason.
On Wed, Feb 25, 2015 at 5:45 PM, Moshe Recanati mos...@kmslh.com wrote:
Hi Mikhail,
We're using 4.7.1. This means I can't
Hi,
This will “return all the documents in the index” because you did nothing
to filter them out. Your query is *:* (everything) and there are no filter
queries.
~ David Smiley
Freelance Apache Lucene/Solr Search Consultant/Developer
http://www.linkedin.com/in/davidwsmiley
On Wed, Feb 25, 2015
Hi,
Just wondering if there is a way to handle this use-case in SOLR without
manually editing Schema.xml.
Scenario :
We have xml data with some elements/ attributes which we plan to index.
As we move forward there can be addition of xml elements.
Is there a way to handle this with out manually
Thanks for the two links.
The first one could be helpful if it works.
Regarding the second one, I think it's quite similar to using /select to
return json format.
2015-02-25 19:10 GMT+01:00 Mikhail Khludnev mkhlud...@griddynamics.com:
Some time ago I encounter
I’m working with term vectors via solr.
Is there a way to configure the RealTimeGetHandler to return tv info?
Here is my environment info:
Scotts-MacBook-Air-2:solr_jetty scottccote$ java -version
java version 1.8.0_31
Java(TM) SE Runtime Environment (build 1.8.0_31-b13)
Java HotSpot(TM)
If ‘q’ is absent, then you always match nothing (there may be exceptions?);
so it’s sort of required, in effect. I wish it defaulted to *:*.
~ David Smiley
Freelance Apache Lucene/Solr Search Consultant/Developer
http://www.linkedin.com/in/davidwsmiley
On Wed, Feb 25, 2015 at 2:28 PM, Ahmed
Solr also now has a schema API to dynamically edit the schema without the
need to manually edit the schema file:
https://cwiki.apache.org/confluence/display/solr/Schema+API#SchemaAPI-AddaDynamicFieldRule
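A hedged sketch of such a Schema API call; the field name and type are illustrative, and the curl line is commented out so this runs without a live Solr:

```shell
# JSON body adding a dynamic field rule via the Schema API.
BODY='{"add-dynamic-field":{"name":"*_txt","type":"text_general","stored":true}}'
# curl -X POST -H 'Content-type:application/json' \
#      "http://localhost:8983/solr/collection1/schema" -d "$BODY"
echo "$BODY"
```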
-- Jack Krupansky
On Wed, Feb 25, 2015 at 3:15 PM, Vishal Swaroop vishal@gmail.com
On Wed, Feb 25, 2015 at 10:31 PM, Hakim Benoudjit h.benoud...@gmail.com
wrote:
Thanks for the two links.
The first one could be helpful if it works.
Regarding the second one,
I think it's quite similar to using /select to
return json format.
not really. /export yields much more data
Hi Rajesh,
That was very helpful. Based on your experience, I dug deeper into it and
figured out that it does attempt to return collations for single term queries
in my configuration as well. However, in the test cases I have been using,
the suggested correction never gets any hits.
Before diving in too deeply, try attaching debug=timing to the query.
Near the bottom of the response there'll be a list of the time taken
by each _component_. So there'll be separate entries for query,
highlighting, etc.
This may not show any surprises, you might be spending all your time
In the examples it used to default to *:* with default params, which
caused even more confusion.
Regards,
Alex.
Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter:
http://www.solr-start.com/
On 25 February 2015 at 15:21, david.w.smi...@gmail.com
david.w.smi...@gmail.com
Hi Rishi,
As others have indicated Multilingual search is very difficult to do well.
At HathiTrust we've been using the ICUTokenizer and ICUFilterFactory to
deal with having materials in 400 languages. We also added the
CJKBigramFilter to get better precision on CJK queries. We don't use stop
bq: Is adding replicas going to increase search performance?
Absolutely, assuming you've maxed out Solr. You can scale the SOLR
query/second rate nearly linearly by adding replicas regardless of
whether it's over HDFS or not.
Having multiple replicas per shard _also_ increases fault tolerance,
Thank you! I'm mainly concerned about facet performance. When we have
indexing turned on, our facet performance suffers significantly.
I will add replicas and measure the performance change.
-Joe Obernberger
On 2/25/2015 4:31 PM, Erick Erickson wrote:
bq: Is adding replicas going to
Lots of suggestions here already. +1 for those JVM params from Boogie and
for looking at JMX.
Rebecca, try SPM http://sematext.com/spm (will look at JMX for you, among
other things), it may save you time figuring out
JVM/heap/memory/performance issues. If you can't tell what's slow via SPM,
we
Hi Tomoko,
Thanks for the link. Do you have build instructions somewhere? When I
executed ant with no params, I get:
BUILD FAILED
/home/dmitry/projects/svn/luke/build.xml:40:
/home/dmitry/projects/svn/luke/lib-ivy does not exist.
On Thu, Feb 26, 2015 at 2:27 AM, Tomoko Uchida
We have a pair of customized search components which we used
successfully with SolrCloud some releases back (4.x). In 4.10.3, I am
trying to find the point of departure in debugging why we get no
results back when querying to them with a sharded index.
If I query the regular /select, all is
I'll need to use /export since I retrieve large amount of data.
And I don't really need facets, so it won't be an issue.
Thanks again for your help.
2015-02-25 21:26 GMT+01:00 Mikhail Khludnev mkhlud...@griddynamics.com:
On Wed, Feb 25, 2015 at 10:31 PM, Hakim Benoudjit h.benoud...@gmail.com
Thanks!
Would you announce at LUCENE-2562 to me and all watchers interested in this
issue, when the branch is ready? :)
As you know, the current pivot's version (that supports Lucene 4.10.3) is here.
http://svn.apache.org/repos/asf/lucene/sandbox/luke/
Regards,
Tomoko
2015-02-25 18:37 GMT+09:00
Hello,
Why is Solr taking so much time to start all nodes/ports?
Hi,
We have a single Solr instance serving queries to the client throughout
the day and being indexed twice a day using scheduled jobs. During the
scheduled jobs, which actually sync databases from data collection
machines to the master database, it can make many indexing calls. It is
usually
Use DocValues.
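Enabling docValues is a schema change (and requires a reindex); a sketch, with the field name assumed:

```xml
<!-- docValues stores the field column-oriented on disk, so faceting no
     longer has to uninvert the field into heap memory. -->
<field name="category" type="string" indexed="true" stored="true" docValues="true"/>
```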
On Wed, Feb 25, 2015 at 3:14 PM, Joseph Obernberger j...@lovehorsepower.com
wrote:
Thank you! I'm mainly concerned about facet performance. When we have
indexing turned on, our facet performance suffers significantly.
I will add replicas and measure the performance change.
Hi Rajesh,
What configuration have you set in your schema.xml?
On Sat, Feb 14, 2015 at 2:18 AM, Rajesh Hazari rajeshhaz...@gmail.com
wrote:
Hi Nitin,
Can you try with the below config? This config seems to be working
for us.
<searchComponent name="spellcheck">
How are you indexing? SolrJ? DIH? some other process?
And what, if anything, comes out in the Solr logs when this happens?
'cause this is pretty odd so I'm grasping at straws.
Best,
Erick
On Wed, Feb 25, 2015 at 9:10 PM, Vikas Agarwal vi...@infoobjects.com wrote:
Hi,
We have a single solr
We are trying to limit the number of facets returned to only the top 100 docs
and not the complete result set.
Is there a way of accessing topDocs in the custom Faceting component?
or
Can the scores of the docID's in the resultset be accessed in the Facet
Component?
Ok, sure. The plan is to make the pivot branch in the current github repo
and update its structure accordingly.
Once it is there, I'll let you know.
Thank you,
Dmitry
On Tue, Feb 24, 2015 at 5:26 PM, Tomoko Uchida tomoko.uchida.1...@gmail.com
wrote:
Hi Dmitry,
Thank you for the detailed
Hello,
I'm trying to create a collection on HDFS with Solr 5.0.0.
I have my solrconfig.xml with the HDFS parameters, following the
confluence guidelines.
When creating with the bin/solr script (bin/solr create -c
collectionHDFS -d /my/conf/) I get this error:
Erick, Eric and Mike,
Thanks for your help and ideas.
It sounds like we'd need to do a bit of revamping in the highlighter.
Perhaps even PostingsHighlighter should be taken as the baseline, since it
is faster. It uses the same extractTerms() method that Erik has shown.
The user story here is
Hello,
We have problems with some queries. All of them include the NOT operator, and
in my opinion, the results don't make any sense.
First problem:
The query NOT Proc:ID01 returns 95806 results, however this one,
NOT Proc:ID01 OR FileType:PDF_TEXT, returns 11484 results. But it's