Hi Andreas,
You should be able to say:
(-organisations:[* TO *] -roles:[* TO *]) OR (+organisations:(150 42)
+roles:(174 72))
Study your queries with the debugQuery=true HTTP parameter; at times this is
invaluable.
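For example (the core name and query are placeholders):
http://localhost:8983/solr/collection1/select?q=organisations:150&debugQuery=true
The parsedquery and explain sections of the debug output show exactly how
each clause was interpreted.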
Dmitry
On Wed, Mar 5, 2014 at 2:54 AM, Andreas Owen a...@conx.ch wrote:
I want to
On 3/4/2014 5:54 PM, Andreas Owen wrote:
I want to use the following in fq and I need to set the operator to OR. My
q.op is AND but I need OR in fq. I have read about ofq but that is for
putting OR between multiple fq. Can I set the operator for fq?
(-organisations:[* TO *] -roles:[* TO
And if you need to cache OR legs separately, here is the workaround
http://blog.griddynamics.com/2014/01/segmented-filter-cache-in-solr.html
On Wed, Mar 5, 2014 at 12:31 PM, Shawn Heisey s...@elyograg.org wrote:
On 3/4/2014 5:54 PM, Andreas Owen wrote:
i want to use the following in fq and i
Hi Toke,
thank you for the mail.
On 04.03.2014 11:20, Toke Eskildsen wrote:
Angel Tchorbadjiiski [angel.tchorbadjii...@antibodies-online.com] wrote:
[Single shard / 2 cores Solr 4.6.1, 65M docs / 50GB, 20 facet fields]
The OS in use is a 64-bit Linux with OpenJDK 1.7 Java and 48G RAM.
Hi Shawn,
It may be your facets that are killing you here. As Toke mentioned, you
have not indicated what your max heap is. 20 separate facet fields with
millions of documents will use a lot of fieldcache memory if you use the
standard facet.method, fc.
Try adding facet.method=enum to all your
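For example, something like this (the field names are placeholders):
&facet=true&facet.field=category&facet.field=brand&facet.method=enum
facet.method can also be set per field via f.<fieldname>.facet.method. Note
that enum trades fieldcache memory for one filterCache entry per facet term,
so it works best for fields with relatively few distinct values.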
Hi;
As I said: the first link you provided:
http://en.wikipedia.org/wiki/NoSQL
includes ElasticSearch as a Document Store, plus a note that it is a search
engine. What are the main differences between ElasticSearch and Solr that
make ElasticSearch a NoSQL store but not Solr? I think that
On Wed, 2014-03-05 at 09:59 +0100, Angel Tchorbadjiiski wrote:
On 04.03.2014 11:20, Toke Eskildsen wrote:
Angel Tchorbadjiiski [angel.tchorbadjii...@antibodies-online.com] wrote:
[Single shard / 2 cores Solr 4.6.1, 65M docs / 50GB, 20 facet fields]
The OS in use is a 64-bit Linux with an
I have managed to understand how to properly implement and change the words
with a CharFilter and a Filter, but I fail to understand how the Tokenizer
works...
I also fail to find any tutorials on the topic...
Could you provide some example implementation of incrementToken and how to
manipulate the
On 05.03.2014 11:51, Toke Eskildsen wrote:
On Wed, 2014-03-05 at 09:59 +0100, Angel Tchorbadjiiski wrote:
On 04.03.2014 11:20, Toke Eskildsen wrote:
Angel Tchorbadjiiski [angel.tchorbadjii...@antibodies-online.com] wrote:
[Single shard / 2 cores Solr 4.6.1, 65M docs / 50GB, 20 facet fields]
Hi Shawn,
On 05.03.2014 10:05, Angel Tchorbadjiiski wrote:
Hi Shawn,
It may be your facets that are killing you here. As Toke mentioned, you
have not indicated what your max heap is.20 separate facet fields with
millions of documents will use a lot of fieldcache memory if you use the
Unless Solr is your system of record, aren't you already replicating your
source data across the WAN? If so, could you load Solr in colo B from your
colo B data source? You may be duplicating some indexing work, but at least
your colo B Solr would be more closely in sync with your colo B
Hi Rui,
I think the ClassicTokenizerImpl.jflex file is a good start for understanding
tokenizers.
http://svn.apache.org/repos/asf/lucene/dev/trunk/lucene/analysis/common/src/java/org/apache/lucene/analysis/standard/ClassicTokenizerImpl.jflex
Please see the other *.jflex files in the source tree.
But
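For a hand-written starting point (without JFlex), here is a minimal sketch
of incrementToken against the Lucene 4.x attribute API; the class name and
the whitespace-splitting rule are made up for illustration:

import java.io.IOException;
import java.io.Reader;
import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.analysis.tokenattributes.OffsetAttribute;

// Emits whitespace-separated chunks as tokens, one per incrementToken() call.
public final class SimpleWhitespaceTokenizer extends Tokenizer {
  private final CharTermAttribute termAtt = addAttribute(CharTermAttribute.class);
  private final OffsetAttribute offsetAtt = addAttribute(OffsetAttribute.class);
  private int pos = 0; // absolute character offset into the stream

  public SimpleWhitespaceTokenizer(Reader input) {
    super(input);
  }

  @Override
  public boolean incrementToken() throws IOException {
    clearAttributes(); // required at the start of every call
    int c = input.read();
    while (c != -1 && Character.isWhitespace(c)) { pos++; c = input.read(); }
    if (c == -1) return false; // stream exhausted, no more tokens
    final int start = pos;
    final StringBuilder sb = new StringBuilder();
    while (c != -1 && !Character.isWhitespace(c)) {
      sb.append((char) c);
      pos++;
      c = input.read();
    }
    if (c != -1) pos++; // consume the terminating whitespace character
    termAtt.setEmpty().append(sb);
    offsetAtt.setOffset(correctOffset(start), correctOffset(start + sb.length()));
    return true;
  }

  @Override
  public void reset() throws IOException {
    super.reset();
    pos = 0;
  }
}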
Before indexing, this was the memory layout:
System Memory: 63.2%, 2.21 GB
JVM Memory: 8.3%, 81.60 MB of 981.38 MB
I have indexed 700 documents of total size 12 MB.
Following are the results I get:
QTime: 8122, System time: 00:00:12.7318648
System Memory: 65.4%, 2.29 GB
JVM Memory: 15.3%,
Hi,
Batch/bulk indexing is the way to go for speed.
* Disable autoSoftCommit feature for the bulk indexing.
* Disable transaction log for the bulk indexing.
After you finish bulk indexing, you can re-enable the above (see the sketch
below). Again, you are too generous with a 1-second refresh rate
(autoSoftCommit maxTime).
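In solrconfig.xml the relevant pieces look roughly like this (a sketch based
on the 4.x example config); comment them out, or raise maxTime a lot, for
the duration of the bulk load:

<updateHandler class="solr.DirectUpdateHandler2">
  <!-- transaction log: disable for bulk indexing, restore afterwards -->
  <updateLog>
    <str name="dir">${solr.ulog.dir:}</str>
  </updateLog>
  <!-- soft commits: the 1-second maxTime mentioned above -->
  <autoSoftCommit>
    <maxTime>1000</maxTime>
  </autoSoftCommit>
</updateHandler>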
You know, I didn't even notice that. It did go up to 30M.
I've made a note to look into that before we release the 4.8 version to see
if it can be reduced at all. I suspect the screenshots are causing it to
balloon - we made some changes to the way they appear in the PDF for 4.7
which may be the
Not sure if it’s relevant anymore, but a few years ago Atlassian resolved as
“won’t fix” a request to configure exported PDF compression ratio:
https://jira.atlassian.com/browse/CONF-21329. Their suggestion: zip the PDF.
I tried that - the resulting zip size is roughly 9MB, so it’s definitely
I am trying to pass a string of Japanese characters to an Apache Solr
query. The string in question is '製品'.
When a search is passed without any arguments, it brings up all of the
indexed information, including all of the documents that have this
particular string in them; however, when this
On 3/5/2014 4:40 AM, Angel Tchorbadjiiski wrote:
Hi Shawn,
On 05.03.2014 10:05, Angel Tchorbadjiiski wrote:
Hi Shawn,
It may be your facets that are killing you here. As Toke mentioned, you
have not indicated what your max heap is. 20 separate facet fields with
millions of documents will
On 3/5/2014 7:47 AM, sweety wrote:
Before indexing, this was the memory layout:
System Memory: 63.2%, 2.21 GB
JVM Memory: 8.3%, 81.60 MB of 981.38 MB
I have indexed 700 documents of total size 12 MB.
Following are the results I get:
QTime: 8122, System time: 00:00:12.7318648
System
Hi guys,
So, I keep facing this problem which I can't solve. I thought it was due to
HTML anchors containing the name of the hashtag, and thus repeating it, but
it's not.
So the use case is:
1 - I need to consider hashtags as tokens.
2 - The hashtag has to show up in the facets.
Right now if I
Now I have batch indexed, with batches of 250 documents. These were the
results:
After 7,000 documents,
QTime: 46894, System time: 00:00:55.9384892
JVM Memory: 249.02 MB, 24.8%
This shows quite a reduction in timing.
After 70,000 documents,
QTime: 480435, System time: 00:09:29.5206727
System
Hi there,
my schema file is this:
<?xml version="1.0" encoding="UTF-8" ?>
<schema name="example" version="1.2">
  <types>
    <fieldType name="string" class="solr.StrField" sortMissingLast="true"
      omitNorms="true" />
    <fieldType name="int" class="solr.TrieIntField" precisionStep="0"
It doesn't sound like you have much of an understanding of Java's garbage
collection. You might read
http://www.oracle.com/webfolder/technetwork/tutorials/obe/java/gc01/index.html
to get a better understanding of how it works and why you're seeing different
levels of memory utilization at any
Hi,
I suspect q=State:tamil nadu is parsed as State:tamil text:nadu. You can
confirm this by adding debugQuery=on.
Either use quotes: q=State:"tamil nadu"
or use the term query parser: q={!term f=State}tamil nadu
Ahmet
On Wednesday, March 5, 2014 8:29 PM, Kishan Parmar kishan@gmail.com wrote:
Hi,
Thanks, but still no change in output --- q=State:"tamil nadu" is parsed as
q: "State:\"tamil nadu\""
Regards,
Kishan Parmar
Software Developer
+91 95 100 77394
Jay Shree Krishnaa !!
2014-03-06 0:17 GMT+05:30 Ahmet Arslan iori...@yahoo.com:
Hi,
I suspect q=State:tamil nadu is parsed as
It's worth mentioning that scores should not be considered comparable
across queries, so equating "confidence" and "score" is a tricky
proposition.
That is, the maxScore for the search field1:foo may be 10.0, and the
maxScore for "field1:bar" may be 1.0, but that doesn't mean the top result
for
Hi
I am trying to understand the flow between zk and SolrCloud nodes during
writes and restarts.
*Writes*:
When an indexing job runs, it looks like the leader for every shard is
identified from ZK, the write requests go to the leader, and then
eventually data flows to the replicas.
All,
Wondering about best practices/common practices to index/re-index huge
amounts of data in Solr. The data is about 6 million entries in the DB
and other sources (the data is not located in one resource). Trying a
SolrJ-based solution to collect data from different resources to index
into
Hi,
6M is really not huge these days. 6B is big, though also still not huge
any more. What seems to be the bottleneck? Solr or DB or network or
something else?
Otis
--
Performance Monitoring * Log Analytics * Search Analytics
Solr Elasticsearch Support * http://sematext.com/
On Wed, Mar 5,
I should also mention that the watch count is on the order of 400-500 but
the maxClientConnections is 100. Not sure if this has to do with the issue,
but just putting it out there.
On Wed, Mar 5, 2014 at 11:37 AM, KNitin nitin.t...@gmail.com wrote:
Hi
I am trying to understand the flow
I debugged the PDF a little. FWIW, the following code (using iText)
takes it to 9MB:
public static void main(String args[]) throws Exception {
    Document document = new Document();
    PdfSmartCopy copy = new PdfSmartCopy(document, new
        FileOutputStream("/home/rmuir/Downloads/test.pdf"));
It seems the latency is introduced by collecting the data from different
sources and putting them together, rather than the actual Solr indexing. I
would say all these activities are contributing equally. So, is it normal
to expect indexing to run for long? Wondering what to
expect
Hi, please help me with this.
I will surely read about JVM garbage collection. Thanks a lot, all of you.
But is the time required for my indexing good enough? I don't know about
the ideal timings.
I think that my indexing is taking too much time.
Hi,
One thing to consider: I think SolrNet uses XML updates, and there is XML
parsing overhead with that.
Switching to SolrJ or CSV could yield additional gains.
http://wiki.apache.org/lucene-java/ImproveIndexingSpeed
Ahmet
On Wednesday, March 5, 2014 10:13 PM, sweety sweetyshind...@yahoo.com wrote:
Hi,
It depends. Are docs huge or small? Server single core or 32 core? Heap
big or small? etc. etc.
Otis
--
Performance Monitoring * Log Analytics * Search Analytics
Solr Elasticsearch Support * http://sematext.com/
On Wed, Mar 5, 2014 at 3:02 PM, Rallavagu rallav...@gmail.com wrote:
It
Hi Kashish,
This is confusing. You gave the following example:
query 1999/99* should return ARABIAN NIGHTS #01 (1999/99)
However, you said "I cannot ignore parenthesis or other special characters..."
The above two contradict each other.
Since you are after autocomplete you might be interested in
Right, that patch is really about fixing the distribution solrconfig file...
What you need to do (and I'm assuming you're running SolrCloud)
is change the solrconfig.xml file, push it up to ZK with the client tools,
and restart all the nodes in your collection, or reload all the cores. I
don't
This, BTW, is an ENORMOUS number of cached queries.
Here's a rough guide:
Each entry will be (length of query) + maxDoc/8 bytes long.
Think of the filterCache as a map where the key is the query
and the value is a bitmap large enough to hold maxDoc bits.
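For example, with maxDoc = 10 million and a 50-byte query (numbers chosen
purely for illustration), a single entry is about 50 + 10,000,000/8 =
1,250,050 bytes, roughly 1.2 MB - so a few thousand cached entries already
costs gigabytes of heap.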
BTW, I'd kick this back to the default
Well, if you're going to go that route, how about developing
a patch for QEV? Of course there may be a very good reason
it wasn't done there, I haven't looked at the code
Best,
Erick
On Mon, Mar 3, 2014 at 1:07 PM, David Stuart d...@axistwelve.com wrote:
HI Erick,
Thanks for the response.
Otis,
Good points. I guess you are suggesting that it depends on the
resources. The documents are 100k each; the pre-processing server is a
2-CPU VM running with 4G RAM. So, could that be a relatively small machine
to process such an amount of data?
On 3/5/14, 12:27 PM, Otis Gospodnetic wrote:
Hi Erick,
Let me make sure I understand you:
I'm NOT running SolrCloud, so I just have to put the default field in ALL of
my solrconfig.xml files and then restart and that should be it?
Thanks for your reply,
You can just use OR
GQ clauses can be most any legal query.
On Mar 3, 2014 4:31 PM, Andreas Owen a...@conx.ch wrote:
OK, I like the logic; you can do much more. I think this should do it for
me:
(-organisations:[* TO *] -roles:[* TO *]) (+organisations:(150
42) +roles:(174 72))
I
Hi Kishan,
can you please give us an example document/query pair, i.e. a query and the
document it should retrieve?
e.g. the query q=State:tamil nadu should return what document text?
Ahmet
On Wednesday, March 5, 2014 9:04 PM, Kishan Parmar kishan@gmail.com wrote:
Thanks, but
still no change in output
OK, I updated all of my solrconfig.xml files and I restarted the Tomcat
server,
AND the errors are still there on 2 out of 10 cores.
Am I not reloading correctly?
Here's my /browse handler:
<requestHandler name="/browse" class="solr.SearchHandler">
  <lst name="defaults">
    <str
Hi Erick
The patch is in progress. Looking at the code, I can't figure out why this
restriction was added. I'll create a JIRA issue and post.
Thanks for your help
Regards
Sent from my iPhone
On 5 Mar 2014, at 20:36, Erick Erickson erickerick...@gmail.com wrote:
Well, if you're going to go
Hi,
Each doc is 100K? That's on the big side, yes, and the server seems on the
small side, yes. Hence the speed. :)
Otis
--
Performance Monitoring * Log Analytics * Search Analytics
Solr Elasticsearch Support * http://sematext.com/
On Wed, Mar 5, 2014 at 3:37 PM, Rallavagu
Hi Ahmet,
Let me explain with another scenario.
There is a title - ARABIAN NIGHTS - 1999/99
Now in autocomplete, if I give 1999/99, in the backend I append an asterisk
to it and form the Solr URL this way:
q=titleName:1999/99*
I get the above-mentioned title - so it works perfectly.
Now let's add
Make sure you're not doing a commit on each individual document add.
Committing every few minutes or every few hundred to few thousand documents
is sufficient. You can set up auto commit in solrconfig.xml (see the sketch
below).
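For example, something along these lines in the updateHandler section of
solrconfig.xml (the numbers are just illustrative):

<autoCommit>
  <maxDocs>10000</maxDocs>
  <maxTime>300000</maxTime> <!-- 5 minutes -->
  <openSearcher>false</openSearcher>
</autoCommit>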
-- Jack Krupansky
-Original Message-
From: Rallavagu
Sent: Wednesday, March 5,
I believe SolrJ uses XML under the covers. If so, I don't think you would
improve performance by switching to SolrJ, since the client would convert
it to XML before sending it on the wire.
Toby
***
Toby Lazar
Capital Technology Group
Email: tla...@capitaltg.com
Hi,
Forget about patternReplaceCharFilter for a moment. Your example is clearer
this time.
q=titleName:1999/99*
should return following two docs:
d1) JULIUS CAESER (1999/99)
d2) ARABIAN NIGHTS - 1999/99
This is achievable with the following type.
1) MappingCharFilterFactory with
Hi Toby,
SolrJ uses javabin by default.
Ahmet
On Wednesday, March 5, 2014 11:31 PM, Toby Lazar tla...@capitaltg.com wrote:
I believe SolrJ uses XML under the covers. If so, I don't think you would
improve performance by switching to SolrJ, since the client would convert
it to XML before
Thanks to Alexandre for pointing this out
Let's use SOLR-5819 for any followup investigation/discussion so it
doesn't get lost in the ANNOUNCE thread...
https://issues.apache.org/jira/browse/SOLR-5819
: Date: Wed, 5 Mar 2014 14:49:41 -0500
: From: Robert Muir rcm...@gmail.com
: Reply-To:
Thanks Ahmet for the correction. I used wireshark to capture an
UpdateRequest to solr and saw this XML:
<add><doc boost="1.0"><field name="caseID">123</field><field
name="caseName">blah</field></doc></add>
and figured that javabin was only for the responses. Does wt apply to how
SolrJ sends requests to Solr?
On 3/5/2014 2:31 PM, Toby Lazar wrote:
I believe SolrJ uses XML under the covers. If so, I don't think you would
improve performance by switching to SolrJ, since the client would convert
it to XML before sending it on the wire.
Until recently, SolrJ always used XML by default for requests and
OK, I was using HttpSolrServer since I haven't yet migrated to
CloudSolrServer. I added the line:
solrServer.setRequestWriter(new BinaryRequestWriter())
after creating the server object and now see the difference through
wireshark. Is it fair to assume that this usage is multi-thread safe?
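For reference, the whole setup is just this (the URL is a placeholder):

import org.apache.solr.client.solrj.impl.BinaryRequestWriter;
import org.apache.solr.client.solrj.impl.HttpSolrServer;

HttpSolrServer solrServer =
    new HttpSolrServer("http://localhost:8983/solr/collection1");
// send updates as javabin instead of XML
solrServer.setRequestWriter(new BinaryRequestWriter());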
On 3/5/2014 2:58 PM, Toby Lazar wrote:
OK, I was using HttpSolrServer since I haven't yet migrated to
CloudSolrServer. I added the line:
solrServer.setRequestWriter(new BinaryRequestWriter())
after creating the server object and now see the difference through
wireshark. Is it fair to
Hi,
I am a newbie to Solr and I am trying to index some XML documents using DIH
and XPath, but I am unable to do it. I get a response message of successful
indexing, but no document is added to the index. I do not know what I am
doing wrong.
This is my data config XML file:
<dataConfig>
Bah. meant FQ clauses can be most any legal query.
Erick
On Wed, Mar 5, 2014 at 3:49 PM, Erick Erickson erickerick...@gmail.com wrote:
You can just use OR
GQ clauses can be most any legal query.
On Mar 3, 2014 4:31 PM, Andreas Owen a...@conx.ch wrote:
ok i like the logic, you can do much
Right, that's perfectly appropriate. Feel free to attach unfinished
versions of the patch! Just comment that it's not finished and
people may have time to take a look at what you've done so far
and make comments. Sometimes this saves you from a whole
bunch of work :)...
Best,
Erick
On Wed, Mar
Here's the easiest thing to try to figure out where to
concentrate your energies. Just comment out the
server.add call in your SolrJ program. Well, and any
commits you're doing from SolrJ.
My bet: Your program will run at about the same speed
it does when you actually index the docs,
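In other words, something like this, keeping all the data acquisition in
place and skipping only the Solr calls (the loop and names are whatever
your program already uses):

for (SolrInputDocument doc : docs) {
    // server.add(doc);  // commented out: measures data acquisition alone
}
// server.commit();      // comment out any commits too

If the run time barely changes, the bottleneck is on the data-gathering
side, not in Solr.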
One more suggestion is to collect/prepare the data in CSV format (a 1-2
million sample, depending on size) and then import the data directly into
Solr using the CSV handler via curl. This will give you the pure indexing
time for comparison.
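Something like this, assuming the stock example config where the CSV
handler is mapped at /update/csv (the URL and file name are placeholders):

curl 'http://localhost:8983/solr/collection1/update/csv?commit=true' \
     -H 'Content-type: text/csv' --data-binary @sample.csv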
Thanks,
Susheel
-Original Message-
From: Erick Erickson
Sorry, figured out my problem. It was a stupid mistake on my part. Once
again, sorry for that.
Thanks
Farhan
On Wed, Mar 5, 2014 at 7:14 PM, Farhan Ali farhan@gmail.com wrote:
Hi,
I am a newbie to Solr and I am trying to index some xml documents using
DIH and XPath but I am unable to do it.
NP. Been there, done that, got the t-shirt :)...
On Wed, Mar 5, 2014 at 9:51 PM, Farhan Ali farhan@gmail.com wrote:
Sorry, figured out my problem. It was a stupid mistake on my part. Once
again, sorry for that.
Thanks
Farhan
On Wed, Mar 5, 2014 at 7:14 PM, Farhan Ali farhan@gmail.com
On 3/5/2014 1:36 AM, Shawn Heisey wrote:
On 3/4/2014 8:15 PM, Michael Sokolov wrote:
Thanks, Tim, it's great to hear you say that! I tried to make that
point myself with various patches, but they never really got taken up by
committers, so I kind of gave up. But I agree with you 100% that this is a
Thanks,
my documents are XML files; I am attaching one such document here. In my
project I have to search on each field defined in schema.xml,
and my output from Solr should be like:
{
  "responseHeader": {
    "status": 0,
    "QTime": 1,
    "params": {
      "indent": "true",
      "q": "State:Delhi",
On 6 March 2014 11:23, Kishan Parmar kishan@gmail.com wrote:
Thanks,
my documents are xml files i am attaching that document in this and in my
project i have to search from each field defined in schema.xml
[...]
The type for State in your schema is "string", which is a non-analysed
field
Hi,
I am planning a system for searching TBs of structured data in SolrCloud.
I need suggestions for handling such a huge amount of data in SolrCloud
(e.g., number of shards per collection, number of nodes, etc.).
Here are some specs of the system:
1. Raw data is 35,000 CSV files per day.