On 5/14/2013 11:00 PM, pankaj.pand...@wipro.com wrote:
We have to set up a billion-document index using Apache Solr (about 2 billion
docs). I need some assistance in choosing the right configuration for the
environment setup. I have been going through the Solr documentation, but
couldn't
On 5/15/2013 12:31 AM, Shawn Heisey wrote:
If we assume that you've taken every possible step to reduce Solr's Java
heap requirements, you might be able to do a heap of 8 to 16GB per
server, but the actual heap requirement could be significantly higher.
Adding this up, you get a bare minimum
Thank you both for your answers.
I really like the idea of explaining the changes for luceneMatchVersion
in more detail. Maybe this could even go into the release notes?
Thanks,
Andreas
Shawn Heisey wrote on 10.05.2013 15:27:
On 5/10/2013 5:11 AM, Jan Høydahl wrote:
Hi,
The fastest way to
Hi list,
Since I can't get Solr 4.3 with run-jetty-run up and running under Eclipse
for debugging, I tried to switch back to slf4j and followed
the steps of http://wiki.apache.org/solr/SolrLogging
Unfortunately Eclipse bothers me with an error:
The import org.apache.log4j.AppenderSkeleton cannot
On Wed, 2013-05-15 at 08:31 +0200, Shawn Heisey wrote:
http://wiki.apache.org/solr/SolrPerformanceProblems
I really was serious about reading that page, and not just because I
wrote it.
That page makes a clear recommendation of RAM over SSDs.
Have you done any performance testing on this?
-
Just going on our experience: we have a large collection (350M documents, but
1.2TB in size, spread across 4 shards/machines and multiple replicas; we may
well need more), and the first thing we needed to do for size estimation was
to work out how big a set number of documents would be on disk. So we
Hi Everyone,
I am working on hierarchical faceting. I am indexing the location of each
document with its state and district.
I would like to find counts for every country, with state counts and
district counts. I found facet pivot working well to give me counts if I
use single-valued fields like
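For reference, a pivot-facet request over single-valued fields of this kind (host and core names are placeholders) looks like:

```
http://localhost:8983/solr/collection1/select?q=*:*&rows=0&facet=true&facet.pivot=country,state,district
```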
I am installing Solr on Tomcat 7 in AWS using the Bitnami Tomcat stack. My Solr
server is not starting; below is the error:
INFO: Starting service Catalina
May 15, 2013 7:01:51 AM org.apache.catalina.core.StandardEngine startInternal
INFO: Starting Servlet Engine: Apache Tomcat/7.0.39
May 15, 2013
Thanks Shawn for explaining everything in such detail, it was really helpful.
I have a few more queries on the same. Can you please explain the purpose of the
3rd box in the minimal configuration, with the standalone ZooKeeper?
On a separate note, if we go ahead with 4 boxes (8 shards with replication
On 15 May 2013 15:20, amit amit.mal...@gmail.com wrote:
I am installing Solr on Tomcat 7 in AWS using the Bitnami Tomcat stack.
[...]
If you are using the Bitnami stack, I would direct questions
to their forums. Having used Bitnami for AWS deployments,
we have gone back to rolling our own starting
http://wiki.apache.org/solr/HierarchicalFaceting
On Wed, May 15, 2013, at 09:44 AM, varsha.yadav wrote:
Hi Everyone,
I am working on hierarchical faceting. I am indexing the location of each
document with its state and district.
I would like to find counts for every country, with state counts and
Hi,
I filed an issue at https://issues.apache.org/jira/browse/SOLR-4734
I also tried this with 4.3, but the same error occurs.
Should I post on the dev list ?
Kind regards
Alexander
On 2013-04-16 23:47, Chris Hostetter wrote:
: sorry for pushing, but I just replayed the steps with solr 4.0
Hi,
I just wanted to ask, if anyone is using the collections API to create
collections,
or if not how they use the coreAPI to create a collection with
replication ?
Because I run into errors when creating a collection on an empty solr.
Kind regards
Alexander
hi all
I want to index 2 separate, unrelated tables from a database into a single Solr
core and search each type of document separately. How can I do it?
please help
thanks in advance
regards
Rohan
Hello again!
Of course the part that parses the URL requests must be on the servlet
side. However, if the project is well modularized, those Java classes,
dependencies, whatever, could be re-used inside the SolrJ project, so I don't
think it would be an enormous amount of extra work. That's the magic
Hi, I would like to get all documents when searching for a keyword.
http://localhost:8080/solr/select?q=caram&rows=_val_:docfreq(SEARCH_TERM,'caram')
Searching for 'caram', there are 200 documents, but I am getting only the first 10
documents.
I thought of adding a function to the rows.
Can we pass result
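For what it's worth, rows is a plain integer page size (it defaults to 10), so a function can't go there; a request returning all 200 hits would simply be:

```
http://localhost:8080/solr/select?q=caram&rows=200
```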
Hi
I went through that, but I want to index multiple locations in a single
document, and a single location has multiple features/attributes like
country, state, district etc. I want to index and get hierarchical facet
results from a facet pivot query. One more thing: my documents vary; they may
have single, two
Yeah, I use both on an empty Solr - what is the error?
- Mark
On May 15, 2013, at 6:53 AM, A.Eibner a_eib...@yahoo.de wrote:
Hi,
I just wanted to ask, if anyone is using the collections API to create
collections,
or if not how they use the coreAPI to create a collection with replication
Although technically it may be possible to put 1 billion documents in a
single Solr/Lucene index (2 billion hard limit), I would recommend simply:
Don't do it! Don't try to put more than 250 million documents on a single
Solr node. In fact, 100 million is a better, more realistic limit.
To be
Hi Anshum,
What if you have more nodes than shards * replicationFactor?
In the example below, I originally created the collection to use 6
shards * 2 replicationFactor = 12 nodes total.
Now I have added 6 more nodes, 18 nodes total. I just want to add 1 extra
replica per shard.
How will it get evenly
On Wed, May 15, 2013 at 7:25 AM, sathish_ix skandhasw...@inautix.co.in wrote:
Hi, I would like to get all documents when searching for a keyword.
http://localhost:8080/solr/select?q=caram&rows=_val_:docfreq(SEARCH_TERM,'caram')
Searching for 'caram', there are 200 documents, but I am getting
We have a simple SolrCloud setup (4.1) running with a single shard and
multiple replicas across 3 servers, and it's working fine except once in a
while,
the leader logs this error. We fine-tuned GC among other things and everything
is lightning fast. However, we still receive this SEVERE error a
On 5/15/2013 1:57 AM, Toke Eskildsen wrote:
On Wed, 2013-05-15 at 08:31 +0200, Shawn Heisey wrote:
http://wiki.apache.org/solr/SolrPerformanceProblems
I really was serious about reading that page, and not just because I
wrote it.
That page makes a clear recommendation of RAM over SSDs.
On 5/15/2013 12:52 AM, Bernd Fehling wrote:
Since I can't get Solr 4.3 with run-jetty-run up and running under Eclipse
for debugging, I tried to switch back to slf4j and followed
the steps of http://wiki.apache.org/solr/SolrLogging
Unfortunately Eclipse bothers me with an error:
The import
I'd use Jetty for SolrCloud - much, much, much better tested.
Here is a note on something similar around tomcat:
http://stackoverflow.com/questions/10570672/get-nohttpresponseexception-for-load-testing
Perhaps that helps, perhaps not.
The root cause is: org.apache.http.NoHttpResponseException:
You cannot currently adjust the number of replicas with the collections api -
you have to use the core admin api. Which means you determine the replica
placement based on what server you hit with the core admin api.
http://wiki.apache.org/solr/SolrCloud#Creating_cores_via_CoreAdmin
Create two
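A sketch of such a CoreAdmin call (collection, shard, and core names are placeholders), sent to the node that should host the new replica:

```
http://newnode:8983/solr/admin/cores?action=CREATE&name=mycollection_shard1_replica3&collection=mycollection&shard=shard1
```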
Hello All,
I have two different Solr servers. The servers have different schemas.
Is it possible to shard across these two Solr servers?
Or is there any other way to combine/merge the results of two different Solr
servers?
On 5/15/2013 8:49 AM, vrparekh wrote:
I have two different Solr servers. The servers have different schemas.
Is it possible to shard across these two Solr servers?
Or is there any other way to combine/merge the results of two different Solr
servers?
In general, this won't work. If your two schemas use
1. Create a schema that accommodates both types of fields, either using
optional fields or dynamic fields.
2. Create some sort of differentiator key (e.g. schema), separate
from id (which needs to be globally unique, so possibly schema+id).
3. Use that schema in filter queries (fq) to look only at
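A minimal sketch of that approach (the field name source_schema and the value serverA are made up for illustration):

```xml
<!-- schema.xml: one schema shared by both document sets -->
<field name="source_schema" type="string" indexed="true" stored="true" required="true"/>
<!-- dynamic fields absorb fields that only one of the old schemas had -->
<dynamicField name="*_s" type="string" indexed="true" stored="true"/>
```

Queries would then restrict to one source with e.g. fq=source_schema:serverA, and ids would be indexed as serverA-1234 to stay globally unique.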
-Original Message-
From: Keith Naas [mailto:keithn...@dswinc.com]
Sent: Tuesday, May 14, 2013 3:31 PM
Stepping through the code on a live instance we can see the cache being
disabled by the destroy calls after each root doc. This destruction causes
EntityProcessorBase to change
On 5/15/2013 4:53 AM, A.Eibner wrote:
I just wanted to ask, if anyone is using the collections API to create
collections,
or if not how they use the coreAPI to create a collection with
replication ?
For my little SolrCloud install using 4.2.1, I have used the collections
API exclusively. It
On 5/15/2013 3:56 AM, pankaj.pand...@wipro.com wrote:
Thanks Shawn for explaining everything in such detail, it was really helpful.
I have a few more queries on the same. Can you please explain the purpose of the
3rd box in the minimal configuration, with the standalone ZooKeeper?
A zookeeper
Hi,
I use Pentaho Kettle to upload data into Solr. The average speed is
3500-5000 records per second.
This is a very slow speed. Is there a quicker tool that would give
higher speed, or does it depend on Solr?
On 15 May 2013 15:36, horot roman.she...@gmail.com wrote:
Hi,
I use Pentaho Kettle to upload data into Solr. The average speed is
3500-5000 records per second.
This is a very slow speed. Is there a quicker tool that would give
higher speed, or does it depend on Solr?
First, you
Hi, Gora!
The data is pulled from the MSSQL database.
I think the bottleneck for indexing is in Solr.
Is it possible to boost it further via Kettle?
I have used both and they seem to work well for basic operations - create,
delete, etc. Although newer operations like reload do not function as they
should - the cores in the collection stay offline even if there are no
material changes.
On Wed, May 15, 2013 at 6:53 AM, A.Eibner
3500-5000 records per second. This is a very slow speed.
That's hardly a slow rate for ingestion of data!
Who is telling you that it is?
That is not to say that the speed can't be improved, but let's keep things
in perspective.
And of course the speed does depend on your schema and actual
On 15 May 2013 21:44, horot roman.she...@gmail.com wrote:
Hi, Gora!
The data is pulled from the MSSQL database.
I think the bottleneck for indexing is in Solr.
Why do you think so? Have you checked the CPU/memory
usage on the Solr server? Likewise for the database
server?
Also, I had somehow
Seems fast to me, too. We get about 600/second pulling data from MySQL with a
pretty complicated query.
Check the CPU usage on the Solr machine. If that is not reaching 100% for
periods of time, then Solr is not the bottleneck. Indexing is very
CPU-intensive.
On a multi-CPU Solr machine, you
The data is pulled from the MSSQL database.
I think the bottleneck for indexing is in Solr.
Is it possible to boost it further via Kettle?
I don't know what kettle is or what its capabilities are.
Can you run more than one instance of kettle at the same time, each one
retrieving part of the
Thank you very much for your answers!!
As I said earlier, I am new to Solr and I would like to know if there is any
way to get the list of terms in the document with that ID.
I found that I can add the parameters facet=true and facet.field=TITLE
to return all terms ordered by frequency.
On May 15, 2013, at 12:26 PM, Jared Rodriguez jrodrig...@kitedesk.com wrote:
the cores in the collection stay offline even if there are no
material changes.
I've used reload - if you are having trouble with it, please post more details
or file a JIRA issue.
- Mark
Hi there,
I am trying to figure out what Solr means by a "compatible collection" in
order to be able to run the following query:
"Query all shards of multiple compatible collections, explicitly specified:"
Can't you use the PathHierarchyTokenizerFactory mentioned on that page?
I think it is called descendent-path in the default schema. Won't that
get you what you want?
UK/London/Covent Garden
becomes
UK
UK/London
UK/London/Covent Garden
and
India/Maharastra/Pune/Dapodi
becomes
India
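For reference, the descendent_path type in the default 4.x example schema looks roughly like this; it tokenizes the path only at index time, so a query for UK/London matches everything underneath that prefix:

```xml
<fieldType name="descendent_path" class="solr.TextField">
  <analyzer type="index">
    <tokenizer class="solr.PathHierarchyTokenizerFactory" delimiter="/"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
  </analyzer>
</fieldType>
```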
They need to be similar enough to satisfy the particular queries.
- Mark
On May 15, 2013, at 12:23 PM, Marcin mar...@workdigital.co.uk wrote:
Hi there,
I am trying to figure out what Solr means by a "compatible collection" in order
to be able to run the following query:
"Query all shards
: After some research the following syntax worked
: start_time_utc_epoch:[1970-01-01T00:00:00Z TO
: _val_:merchant_end_of_day_in_utc_epoch])
that syntax definitely does not work ... i don't know if there is a typo
in your mail, or if you are just getting strange results that happen to
look
Hi Mark,
Yes, I am using reload. Here is the jira that I filed.
https://issues.apache.org/jira/browse/SOLR-4805
Please let me know if there is any additional data that you need.
On Wed, May 15, 2013 at 12:53 PM, Mark Miller markrmil...@gmail.com wrote:
On May 15, 2013, at 12:26 PM, Jared
: Subject: Hierarchical Faceting
: References:
: 15062_1368600769_zzi0n0aykpk6h.00_519330be.7000...@uni-bielefeld.de
: In-Reply-To:
: 15062_1368600769_zzi0n0aykpk6h.00_519330be.7000...@uni-bielefeld.de
https://people.apache.org/~hossman/#threadhijack
Thread Hijacking on Mailing Lists
: +Java +mysql +php TCL Perl Selenium -ethernet -switching -routing
that's missing one of the stated requirements...
: 2. At least one keyword out of TCL Perl Selenium should be present
...should be...
+Java +mysql +php +(TCL Perl Selenium) -ethernet -switching -routing
-Hoss
Good afternoon,
I'm using Solr 4.0 final with ManifoldCF v1.2dev on Tomcat 7.0.34.
Today, a user asked a great question. What if I only know the name of the
folder that the documents are in?
Can I just search on the folder name?
Currently, I'm only indexing documents; how do I capture the folder
Good afternoon,
Does anyone know of a good tutorial on how to perform SQL like aggregation
in solr queries?
Thanks,
Shawn Heisey [s...@elyograg.org]:
Performance testing would be required in order to make a proper
determination on whether SSD makes financial sense.
I fully agree.
[Lack of TRIM with RAID]
then performance eventually suffers, and can become even worse than
a spinning hard disk.
Do you
The Solr stats search component does some basic aggregates: min, max,
count, sum, average, mean, sum of squares, standard deviation:
http://wiki.apache.org/solr/StatsComponent
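A sketch of a stats request, assuming a numeric field named price:

```
http://localhost:8983/solr/select?q=*:*&rows=0&stats=true&stats.field=price
```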
-- Jack Krupansky
-Original Message-
From: eShard
Sent: Wednesday, May 15, 2013 2:45 PM
To:
You'll have to consult with the ManifoldCF project on exactly what
parameters they send SolrCell, but here's a raw SolrCell example:
curl "http://localhost:8983/solr/update/extract?literal.id=doc-1&commit=true&uprefix=attr_" \
  -F myfile=@HelloWorld.docx
Query response:
...
result
I want to set up Solr replication between a master and slave, where no
automatic polling every X minutes happens, instead the slave only
replicates on command. [1]
So the basic question is: What's the best way to do that? But I'll
provide what I've been doing etc., for anyone interested.
Hi Shawn;
You said If I were doing this with the dataimport handler, I would define
more than one handler in solrconfig.xml, each with its own config file.
What is the benefit of using more than one handler?
2013/5/15 Shawn Heisey s...@elyograg.org
The data is pulled from the MSSQL database.
We have a system in which a client is sending 1 record at a time (via REST)
followed by a commit. This has produced ~65k tlog files and the JVM has run
out of file descriptors... I grabbed a heap dump from the JVM and I can see
~52k unreachable FileDescriptors... This leads me to believe that the
Hmmm, we keep open a number of tlog files based on the number of
records in each file (so we always have a certain amount of history),
but IIRC, the number of tlog files is also capped. Perhaps there is a
bug when the limit to tlog files is reached (as opposed to the number
of documents in the
Shawn,
Sorry I did not acknowledge the additional information you provided.
I'd have to go back and re-examine all of the 3.5 settings again, as we had to
muck with them somewhat to get 4.2.1 to work. q.alt was a bit tricky; I'll have
to review our notes on that.
I solved the problem of the
I most definitely understand the "don't commit after each record" advice...
unfortunately the data is being fed by another team which I cannot
control...
Limiting the number of potential tlog files is good, but I think there is
also an issue in that when the TransactionLog objects are dereferenced
their
On Wed, May 15, 2013 at 5:20 PM, Steven Bower sbo...@alcyon.net wrote:
I'm hunting through the UpdateHandler code to try and find where this
happens now..
UpdateLog.addOldLog()
-Yonik
http://lucidworks.com
Maybe we need a flag in the update handler to ignore commit requests.
I just enabled a similar thing for our JVM, because something, somewhere was
calling System.gc(). You can completely ignore explicit GC calls or you can
turn them into requests for a concurrent GC.
A similar setting for Solr
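For reference, the HotSpot flags being described are (the first ignores System.gc() entirely, the second turns it into a request for a concurrent collection):

```
-XX:+DisableExplicitGC
-XX:+ExplicitGCInvokesConcurrent
```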
Hi,
we are using faceted search for our queries. However neither sorting by
count nor sorting by index as described in [1] is suitable for our business
case. Instead, we would like to have the facets (or at least the beginning
of them) sorted by the score of the top document possessing the
I am running 2 separate 4.3 SolrCloud clusters. On one of them I noticed
the file data/index.properties on the replica nodes, where the index
directory has a suffixed name (index.<suffix>) given by the value of the
index property in index.properties.
On the other cluster, the index directory is just named index.
Under what condition
On Wed, May 15, 2013 at 5:20 PM, Steven Bower sbo...@alcyon.net wrote:
when the TransactionLog objects are dereferenced
their RandomAccessFile object is not closed..
Have the files been deleted (unlinked from the directory), or are they
still visible via ls?
-Yonik
http://lucidworks.com
You can disable polling so that the slave never polls the Master(In Solr
4.3 you can disable it from the Admin interface). . And you can trigger a
replication using the HTTP API
http://wiki.apache.org/solr/SolrReplication#HTTP_API or again, use the
Admin interface to trigger a manual replication.
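Putting those two pieces together, a sketch (host and core names are placeholders): a slave section with no pollInterval configured never polls, and replication is then triggered on demand through the HTTP API:

```xml
<!-- solrconfig.xml on the slave: omitting pollInterval disables automatic polling -->
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="slave">
    <str name="masterUrl">http://master:8983/solr/core1/replication</str>
  </lst>
</requestHandler>
```

A GET of http://slave:8983/solr/core1/replication?command=fetchindex then replicates once, on command.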
There seem to be quite a few places where the RecentUpdates class is used
but is not properly created/closed throughout the code...
For example in RecoveryStrategy it does this correctly:
UpdateLog.RecentUpdates recentUpdates = null;
try {
recentUpdates = ulog.getRecentUpdates();
They are visible to ls...
On Wed, May 15, 2013 at 5:49 PM, Yonik Seeley yo...@lucidworks.com wrote:
On Wed, May 15, 2013 at 5:20 PM, Steven Bower sbo...@alcyon.net wrote:
when the TransactionLog objects are dereferenced
their RandomAccessFile object is not closed..
Have the files been
On Wed, May 15, 2013 at 5:06 PM, Steven Bower sbo...@alcyon.net wrote:
This leads me to believe that the
TransactionLog is not properly closing all of its files before getting rid
of the object...
I tried some ad hoc tests, and I can't reproduce this behavior yet.
There must be some other
Yeah, I keep forgetting that min should match for a BooleanQuery defaults
to 1 only if there are no required terms, but if there are required terms it
defaults to 0.
-- Jack Krupansky
-Original Message-
From: Chris Hostetter
Sent: Wednesday, May 15, 2013 1:28 PM
To:
I'm trying to use EmbeddedSolrServer, but when trying to initialize the
CoreContainer, I get the following error:
java.lang.NoClassDefFoundError: org/apache/solr/common/ResourceLoader
Any ideas please?
-Peri.S
It's fairly meaningless from a user perspective, but it happens when an index
is replicated that cannot be simply merged with the existing index files and
needs a new directory.
- Mark
On May 15, 2013, at 5:38 PM, Bill Au bill.w...@gmail.com wrote:
I am running 2 separate 4.3 SolrCloud
On 5/15/2013 5:02 PM, PeriS wrote:
I'm trying to use EmbeddedSolrServer, but when trying to initialize the
CoreContainer, I get the following error:
java.lang.NoClassDefFoundError: org/apache/solr/common/ResourceLoader
Did you only use the solrj jar and the jars in solrj-libs? This is
enough
On 5/15/2013 2:52 PM, Furkan KAMACI wrote:
You said If I were doing this with the dataimport handler, I would define
more than one handler in solrconfig.xml, each with its own config file.
What is the benefit of using more than one handler?
DIH is single-threaded. By using more than one
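What Shawn describes looks roughly like this in solrconfig.xml (handler and config file names are placeholders); each handler is single-threaded on its own, but the two can run imports concurrently:

```xml
<requestHandler name="/dataimport-a" class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
    <str name="config">data-config-a.xml</str>
  </lst>
</requestHandler>
<requestHandler name="/dataimport-b" class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
    <str name="config">data-config-b.xml</str>
  </lst>
</requestHandler>
```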
Actually, I fixed it. By accident I was using the solr-core 3.x version. Once I
upgraded solr-core to 4.x it got resolved.
Thanks
-Peri.S
On May 15, 2013, at 7:14 PM, Shawn Heisey s...@elyograg.org wrote:
On 5/15/2013 5:02 PM, PeriS wrote:
I m trying to use EmbeddedSolrServer
Although now it's complaining even though I have provided the correct core
name.
org.apache.solr.common.SolrException: No such core
On May 15, 2013, at 7:21 PM, PeriS peri.subrahma...@htcinc.com wrote:
Actually, I fixed it. By accident I was using the solr-core 3.x version. Once I
Can't you just hit the same handler twice? What about two different
handlers passing the same config file via a URL parameter?
What makes it single-threaded?
Regards,
Alex.
On 15 May 2013 19:18, Shawn Heisey s...@elyograg.org wrote:
On 5/15/2013 2:52 PM, Furkan KAMACI wrote:
You
Thanks for that info. So besides the two that I have already seen, are
there any more ways that the index directory can be named? I am working on
some home-grown administration scripts which need to know the name of the
index directory.
Bill
On Wed, May 15, 2013 at 7:13 PM, Mark Miller
I am using Solr 4.0.0+, and when I try to sort the results within one
group, it does not work.
The wiki I referenced: http://wiki.apache.org/solr/FieldCollapsing
The XML returned looks like:
..
<lst name="grouped">
  <lst name="title">
    <int name="matches">2</int>
    <arr
group.order is not a valid parameter.
You're probably looking for group.sort
-Yonik
http://lucidworks.com
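For reference (field names are placeholders): sort orders the groups themselves, while group.sort orders the documents inside each group:

```
http://localhost:8983/solr/select?q=*:*&group=true&group.field=title&sort=score+desc&group.sort=price+asc
```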
On Wed, May 15, 2013 at 9:30 PM, alexzhang zhangming1...@gmail.com wrote:
I am using Solr 4.0.0+, and when I try to sort the results within one
group, it does not work.
The wiki I
: In Solr 1.4, on the slave, I supplied a masterUrl, but did NOT supply any
: pollInterval at all on the slave. I did NOT supply enable=false on the
: slave, because I think that would have prevented even manual
: replication.
that exact same config should still work with solr 4.3
: This seemed to
I'm trying to find out which routing algorithm (implicit/compositeId) is being
used in my cluster. We are running Solr 4.1. I was expecting to see it in my
clusterstate (based on a previous thread that someone else posted) but I don't
see it there. Could someone please help?
Thanks!
Santoash
After I turned on the logging, I found the following stack trace:
Error loading class 'org.apache.solr.handler.dataimport.DataImportHandler'
Not sure why the embeddedsolrserver is looking for it...
On May 15, 2013, at 7:26 PM, PeriS pvsub...@indiana.edu wrote:
Although now its complaining
There is Lucene faceting module, which doesn't do anything in common with
Solr, but it looks like it has something what you are looking for.
http://shaierera.blogspot.com/2012/11/lucene-facets-part-1.html
On Thu, May 16, 2013 at 1:33 AM, Jan Morlock jan.morl...@googlemail.com wrote:
Hi,
we
Figured it out; Added a dependency for the dataimporthandler in my pom file.
On May 15, 2013, at 11:59 PM, PeriS pvsub...@indiana.edu wrote:
After I turned on the logging, I found the following stack trace:
Error loading class 'org.apache.solr.handler.dataimport.DataImportHandler'
Not