Hi all,
I am testing indexing with 2000 text documents of size 2 MB
each. These documents contain words created with random characters. I
observed that the Tomcat memory usage keeps increasing slowly. I tried
removing all the cache configuration, but memory usage still
increases. Once
What is the content of your text file?
Solr does not directly index files
--Noble
On Tue, Apr 14, 2009 at 3:54 AM, Alex Vu alex.v...@gmail.com wrote:
Hi all,
Currently I have written an XML file and a schema.xml file. What is the next step to
index a txt file? Where should I put my txt file I want to
Hi Shalin:
Yes, I tried with the batchSize=-1 parameter as well.
Here is the config I tried:
<dataConfig>
  <dataSource type="JdbcDataSource" batchSize="-1" name="sp"
              driver="com.mysql.jdbc.Driver" url="jdbc:mysql://localhost/mydb_development"
              user="root" password="**"/>
  <document name="items">
On Tue, Apr 14, 2009 at 11:30 AM, Gargate, Siddharth sgarg...@ptc.comwrote:
Hi all,
I am testing indexing with 2000 text documents of size 2 MB
each. These documents contain words created with random characters. I
observed that the Tomcat memory usage keeps increasing slowly. I tried
On Tue, Apr 14, 2009 at 11:36 AM, Mani Kumar manikumarchau...@gmail.comwrote:
Hi Shalin:
Yes, I tried with the batchSize=-1 parameter as well.
Here is the config I tried:
<dataConfig>
  <dataSource type="JdbcDataSource" batchSize="-1" name="sp"
              driver="com.mysql.jdbc.Driver"
Yes, it's throwing the same OOM error, and from the same place...
Yes, I will try increasing the size... just curious: how does this dataimport
work?
Does it load the whole table into memory?
Is there any estimate of how much memory it needs to create an index for 1 GB
of data?
thx
mani
On Tue, Apr 14,
Wow, that was pretty straightforward. Sorry I didn't catch that on the wiki
on my first few go-rounds; I'll navigate harder next time.
Thanks.
Isaac
On Sun, Apr 12, 2009 at 11:40 PM, Shalin Shekhar Mangar
shalinman...@gmail.com wrote:
On Mon, Apr 13, 2009 at 5:35 AM, Isaac Foster
DIH streams 1 row at a time.
DIH is just a component in Solr. Solr indexing also takes a lot of memory
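For MySQL in particular, streaming only kicks in with batchSize="-1", which DIH translates into fetchSize=Integer.MIN_VALUE on the JDBC statement. A sketch of such a config (database, table, and column names are made up for illustration):

```xml
<dataConfig>
  <!-- batchSize="-1" asks the MySQL driver to stream rows one at a time -->
  <dataSource type="JdbcDataSource" driver="com.mysql.jdbc.Driver"
              url="jdbc:mysql://localhost/mydb" user="root" password=""
              batchSize="-1"/>
  <document>
    <entity name="item" query="select id, name from item">
      <field column="id" name="id"/>
      <field column="name" name="name"/>
    </entity>
  </document>
</dataConfig>
```

Even with streaming on the JDBC side, heap can still be consumed by the Lucene indexing buffers themselves, so an OOM is not necessarily DIH's fault.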
On Tue, Apr 14, 2009 at 12:02 PM, Mani Kumar manikumarchau...@gmail.com wrote:
Yes, it's throwing the same OOM error, and from the same place...
Yes, I will try increasing the size... just curious:
The machine's ulimit is set to 9000 and the OS has upper limit of
12000 on files. What would explain this? Has anyone tried Solr with 25
cores on the same Solr instance?
Thanks,
-vivek
2009/4/13 Noble Paul നോബിള് नोब्ळ् noble.p...@gmail.com:
On Tue, Apr 14, 2009 at 7:14 AM, vivek sar
You should construct XML containing the fields defined in your
schema.xml and give them the values from the text files. For example, if you
have a schema defining two fields, title and text, you should construct
XML with a field title and its value and another called text
containing the body.
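A minimal sketch of such an add message, assuming a schema with just those two fields (title and text):

```xml
<add>
  <doc>
    <field name="title">My first document</field>
    <field name="text">the body of the text file goes here</field>
  </doc>
</add>
```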
do you have an idea?
sunnyfr wrote:
Hi Noble,
Yes exactly that,
I would like to know what people do during a replication.
Do they turn off servers and set a high autowarmCount, which takes the
slave offline for a while (in my case, 10 minutes to bring back the new index) and
then
Hi Hossman,
I would love to know how you manage this.
thanks,
Shalin Shekhar Mangar wrote:
On Fri, Mar 6, 2009 at 8:47 AM, Steve Conover scono...@gmail.com wrote:
That's exactly what I'm doing, but I'm explicitly replicating, and
committing. Even under these circumstances,
Hi,
I am using SolrJ and firing queries at Solr indexes. The index contains
three fields, viz.
1. Document_id (type=integer, required=true)
2. Ticket Id (type=integer)
3. Content (type=text)
Here the query formulation is such that I have a query with an "AND" clause. So
We do not have such high update frequency. So we never encountered
this problem. If it is possible to take the slave offline during auto
warming that is a good solution.
--Noble
On Thu, Apr 9, 2009 at 2:02 PM, sunnyfr johanna...@gmail.com wrote:
Hi Noble,
Yes exactly that,
I would like to
Or in schema.xml you can set the defaultOperator to AND:
<solrQueryParser defaultOperator="AND"/>, which applies only to the
Lucene/SolrQueryParser, not dismax.
Erik
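For what it's worth, the same effect can also be requested per query with the q.op parameter rather than changing schema.xml (a sketch against the standard request handler; host, port, and field names are just the usual example values):

```
http://localhost:8983/solr/select?q=fieldA:value1+fieldB:value2&q.op=AND
```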
On Apr 13, 2009, at 10:49 PM, Ryan McKinley wrote:
what about:
fieldA:value1 AND fieldB:value2
this can also be written
Nope,
but it is possible to have multiple root entities within a document,
and you can execute one at a time.
--Noble
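A sketch of what Noble describes (entity and table names are invented):

```xml
<document>
  <entity name="first" query="select * from table_a"/>
  <entity name="second" query="select * from table_b"/>
</document>
```

You can then run one root entity at a time, e.g. /dataimport?command=full-import&entity=second.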
On Tue, Apr 14, 2009 at 4:15 PM, gateway0 reiterwo...@yahoo.de wrote:
Hi,
is it possible to use more than one document tag within my data-config.xml
file?
Like:
Cheers guys, got it working!
Erik Hatcher wrote:
Or in schema.xml you can set the defaultOperator to AND:
<solrQueryParser defaultOperator="AND"/>, which applies only to the
Lucene/SolrQueryParser, not dismax.
Erik
On Apr 13, 2009, at 10:49 PM, Ryan McKinley wrote:
what
On Apr 14, 2009, at 2:01 AM, Noble Paul നോബിള്
नोब्ळ् wrote:
What is the content of your text file?
Solr does not directly index files.
Solr's ExtractingRequestHandler (aka Solr Cell) does index text (and
Word, PDF, etc) files directly. This is a Solr 1.4/trunk feature.
Erik
On Apr 6, 2009, at 10:16 AM, Fergus McMenemie wrote:
Hmmm,
Not sure how this all hangs together, but editing my solrconfig.xml
as follows
sorted the problem:
<requestParsers enableRemoteStreaming="false"
multipartUploadLimitInKB="2048" />
to
<requestParsers
hey,
I am trying to modify the Lucene code by adding payload functionality
to it.
Now, if I want to use this Lucene build with Solr, what should I do?
I have added it to the lib folder of solr.war, replacing the old Lucene. Is
this enough?
Plus, I am also using a different schema than the
The <pre> tag fixed it instantly!
Thanks!
Shalin Shekhar Mangar wrote:
On Tue, Apr 14, 2009 at 4:56 PM, Johnny X jonathanwel...@gmail.com
wrote:
Hey,
One of the fields returned from my queries (Content) is essentially the
body
of an e-mail. However, it's returned as one long stream
Hey,
One of the fields returned from my queries (Content) is essentially the body
of an e-mail. However, it's returned as one long stream of text (or at
least, that's how it appears on the web page). Viewing the source of the
page it appears with the right layout characteristics (paragraphs,
On Tue, Apr 14, 2009 at 4:56 PM, Johnny X jonathanwel...@gmail.com wrote:
Hey,
One of the fields returned from my queries (Content) is essentially the
body
of an e-mail. However, it's returned as one long stream of text (or at
least, that's how it appears on the web page). Viewing the
Hi,
is it possible to use more than one document tag within my data-config.xml
file?
Like:
<dataConfig>
  <dataSource type="JdbcDataSource" name="abc" driver="com.mysql.jdbc.Driver"
              url="jdbc:mysql://localhost:3306/my_zend_appz" user="root" password=""/>
  <document name="first">
    ...entities
  </document>
Hi Oleg
Did you find a way to get past this issue?
thanks a lot,
oleg_gnatovskiy wrote:
Can you expand on this? Mirroring delay on what?
zayhen wrote:
Use multiple boxes, with a mirroring delay from one to another, like a
pipeline.
2009/1/22 oleg_gnatovskiy
On Apr 14, 2009, at 5:38 AM, Sagar Khetkade wrote:
Hi,
I am using SolrJ and firing queries at Solr indexes. The index
contains three fields, viz.
1. Document_id (type=integer, required=true)
2. Ticket Id (type=integer)
3. Content (type=text)
Here the query
What is the query parsed to? Add debugQuery=true to your Solr
request and let us know what the query parses to.
As for whether upgrading a Lucene library is sufficient... depends on
what Solr version you're starting with (payload support is already in
all recent versions of Solr's Lucene
Hi,
I would like to know where you are with your script that takes the slave
out of the load balancer.
I have no choice but to do that during updates on the slave server.
Thanks,
Yu-Hui Jin wrote:
Thanks, guys.
Glad to know the scripts work very well in your experience. (well, indeed
they
Hi,
Is there a way to disable all logging output in Solr?
I mean output text like:
INFO: [core_de] webapp=/solr path=/update params={wt=json} status=0
QTime=3736
greets -Ralf-
Grant,
This works:
String url = "http://localhost:8983/solr";
SolrServer server = new CommonsHttpSolrServer(url);
SolrQuery query = new SolrQuery();
query.setQueryType("/autoSuggest");
query.setParam("terms", "true");
query.setParam("terms.fl", "CONTENTS");
query.setParam("terms.lower", "london");
Hello,
I am using both Solr server and Solr embedded versions in the same context.
I am using the Solr Server for indexing data which can be accessed at
enterprise level, and the embedded version in a desktop application.
The idea is that both index the same data, have the same schema.xml
Dang, had another server do this.
Syncing and committing a new index does not fix it. The two servers
show the same bad results.
wunder
On 4/11/09 9:12 AM, Walter Underwood wunderw...@netflix.com wrote:
Restarting Solr fixes it. If I remember correctly, a sync and commit
does not fix it. I
Could you give us a dump of http://localhost:port/solr/admin/luke ?
A huge max field length and random terms in 2000 2 MB files is going to
be a bit of a resource hog :)
Can you explain why you are doing that? You will have *so* many unique
terms...
I can't remember if you can set it in
It just occurred to me that a query cache issue could potentially
cause this... if it's caching it would most likely be a query.equals()
implementation incorrectly returning true.
Perhaps check the JaroWinkler.equals() first?
Also, when one server starts to return bad results, have you tried
Hi all,
I'm trying to use Solr 1.3 to index a text file. I wrote a
schema.xsd and an XML file.
*The content of my text file is *
#src dst proto ok sport dport pkts bytes flows first
latest
Hi,
I have separate JDBC datasources (DS1 & DS2) that I want to index with DIH
in a single Solr instance. The unique key fields for the two sources are
different. Do I have to synthesize a uniqueKey that spans both
datasources? Something like this? That is, the uniqueKey values will be like
(+
The JaroWinkler equals was broken, but I fixed that a month ago.
Query cache sounds possible, but those are cleared on a commit,
right?
I could run with a cache size of 0, since our middle tier HTTP
cache is leaving almost nothing for the caches to do.
I'll try that explain. The stored fields
Now you should post (HTTP POST) your XML file to the URL where Solr is
deployed (the schema must be in the conf folder). Don't forget
to post a commit command after that or you won't see the results.
The commit command is just XML, like this:
<commit></commit>
On Tue, Apr 14,
what about the text file?
On Tue, Apr 14, 2009 at 9:23 AM, Alejandro Gonzalez
alejandrogonzalezd...@gmail.com wrote:
Now you should post (HTTP POST) your XML file to the URL where Solr is
deployed (the schema must be in the conf folder). Don't forget
to post a commit command
On Tue, Apr 14, 2009 at 9:44 PM, Alex Vu alex.v...@gmail.com wrote:
*schema file is *
<?xml version="1.0" encoding="UTF-8"?>
<!--W3C Schema generated by XMLSpy v2009 sp1 (http://www.altova.com)-->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
<xs:element name="networkTraffic">
I'm not sure I understand what you are trying to do, but maybe you
should define a text field and fill it with the text of each file to
index their contents, or maybe a path to the file, if that's what you
want.
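A sketch of how such a field might look in schema.xml (the field name is an assumption; "text" is the analyzed type from the example schema):

```xml
<field name="content" type="text" indexed="true" stored="true"/>
```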
On Tue, Apr 14, 2009 at 6:28 PM, Shalin Shekhar Mangar
I also wrote another schema file that is supplied by Solr, I do have some
questions.
*The content of my text file is *
#src dst proto ok sport dport pkts bytes flows first
latest
192.168.220.135
It was actually our use of the field collapse patch. Once we disabled this
the random slow queries went away.
We also added *:* as a warmup query in order to speed up performance after
indexing.
sunnyfr wrote:
Hi Oleg
Did you find a way to get past this issue?
thanks a lot,
use TemplateTransformer to create a key
On Tue, Apr 14, 2009 at 9:49 PM, ashokc ash...@qualcomm.com wrote:
Hi,
I have separate JDBC datasources (DS1 DS2) that I want to index with DIH
in a single SOLR instance. The unique record for the two sources are
different. Do I have to synthesize a
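A sketch of what Noble suggests, with DS1's entity prefixing its ids so they cannot collide with DS2's (entity, table, and column names are invented):

```xml
<entity name="ds1_item" dataSource="DS1" transformer="TemplateTransformer"
        query="select id, title from items">
  <!-- synthesize a uniqueKey value of the form ds1-{id} -->
  <field column="uid" template="ds1-${ds1_item.id}"/>
</entity>
```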
I just want to be able to index my text file, and other files that carry
the same format but with different IP addresses, ports, etc.
I will have the traffic flow running in real time. Do you think Solr will
be able to index a bunch of my text files in real time?
On Tue, Apr 14, 2009 at 9:35
I was wondering if those more up on SolrJ internals could take a look
if there were any serious gotchas with the AppEngine's Java urlfetch
with respect to SolrJ.
http://code.google.com/appengine/docs/java/urlfetch/overview.html
The URL must use the standard ports for HTTP (80) and HTTPS (443).
On Tue, Apr 14, 2009 at 12:19 PM, Walter Underwood
wunderw...@netflix.com wrote:
The JaroWinkler equals was broken, but I fixed that a month ago.
Query cache sounds possible, but those are cleared on a commit,
right?
Yes, but if you use autowarming, those items are regenerated and if
there is
But why would it work for a few days, then go bad and stay bad?
It fails for every multi-term query, even those not in cache.
I ran a test with more queries than the cache size.
We do use autowarming.
wunder
On 4/14/09 10:55 AM, Yonik Seeley yo...@lucidimagination.com wrote:
On Tue, Apr 14,
Hi, I have a problem setting up Solr + Tomcat.
Tomcat 5.5 + Apache Solr 1.3.0 + CentOS 5.3.
I'm not familiar with Java at all, so sorry if it's a dumb question.
Here is what I did:
placed solr.war in the webapps folder
changed solr home to /etc/solr
copied the contents of the Solr distribution's example folder to
I would say a language is supported if there is a Tokenizer available
for it. Everything else after that is generally seen as an improvement.
On Apr 9, 2009, at 5:26 AM, revas wrote:
Hi ,
With respect to language support in solr ,we have analyzers for some
languages and stemmers for
On Apr 9, 2009, at 7:09 AM, revas wrote:
Hi,
To reframe my earlier question:
Some languages have only analyzers but no stemmer from Snowball/Porter;
does the analyzer take care of stemming as well?
Some languages have only the Snowball stemmer but no analyzer.
Some have both.
Are there changes occurring when it goes bad that maybe aren't committed?
On Apr 14, 2009, at 1:59 PM, Walter Underwood wrote:
But why would it work for a few days, then go bad and stay bad?
It fails for every multi-term query, even those not in cache.
I ran a test with more queries than the
Nope. This is a slave, so no indexing happens, just a sync. The
sync happens once per day. It went bad at a different time.
wunder
On 4/14/09 11:42 AM, Grant Ingersoll gsing...@apache.org wrote:
Are there changes occurring when it goes bad that maybe aren't committed?
On Apr 14, 2009, at
SolrJ would require some modification. SolrJ internally uses Jakarta HTTP
Client via Solr's CommonsHttpSolrServer class. It would need to be ported to
a different implementation of SolrServer (the base class), one that uses
java.net.URL. I suggest JavaNetUrlHttpSolrServer.
~ David Smiley
Hi everybody,
I have a relatively large index (it will eventually contain ~4M
documents and be about 3G in size, I think) that indexes user data,
settings, and the like. The documents represent a community of users
whereupon a subset of them may be online at any time. Also, we want to
score
I see. So this is a show stopper for those wanting to use SolrJ with AppEngine.
Any chance this could be added as a Solr issue?
-glen
2009/4/14 Smiley, David W. dsmi...@mitre.org:
SolrJ would require some modification. SolrJ internally uses Jakarta HTTP
Client via Solr’s
On Wed, Apr 15, 2009 at 12:47 AM, Glen Newton glen.new...@gmail.com wrote:
I see. So this is a show stopper for those wanting to use SolrJ with
AppEngine.
Any chance this could be added as a Solr issue?
Yes, commons-httpclient tries to use Socket directly. So it may not work.
It was
How could I get a count of distinct terms for a given query? For example:
The Wiki page
http://wiki.apache.org/solr/SimpleFacetParameters
has a section "Facet Fields with No Zeros"
which shows the query:
Background:
Set up a system for hierarchical categories using the following scheme:
level one#
level one#level two#
level one#level two#level three#
Trying to find the right combination of field type and query to get
the desired results. Saw some previous posts about hierarchical facets
which
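One common approach to this scheme is to index the full path in a string field and drill down with facet.prefix (a sketch; the field name category is an assumption, and the # would need URL-escaping as %23 in a real request):

```
q=*:*&facet=true&facet.field=category&facet.prefix=level one#level two#
```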
: With this structure i think (correct me if i am wrong) i cant search for all
: attachBody_* and know where the match was (attachBody_1, _2, _3, etc).
correct
: I really don't know if this is the best approach so any help would be
: appreciated.
one option is to index each attachment as its
: reference some large in-memory lookup tables. After the search components
: get done processing the original query, the query may contain SpanNearQueries
: and DisjunctionMaxQueries. I'd like to send that query to the shards, not
: the original query.
:
: I've come up with the following
: custom order that is fairly simple: there is a list of venues and some of
: them are more relevant than others (there is no logic, it's arbitrary, it's
: not an alphabetic order), it'd be something like this:
:
: Orange venue = 1
: Red venue = 2
: Blue venue = 3
:
: So results where venue is
Is bad memory a possibility? i.e. is it the same machine all the
time? Is there any recognizable pattern for when it happens?
-Grant (grasping at straws)
On Apr 14, 2009, at 2:51 PM, Walter Underwood wrote:
Nope. This is a slave, so no indexing happens, just a sync. The
sync happens once
: I want it to match lor lorem and lorem i. However I am finding it
: matches the first two but not the third - the white space is causing
: problems. Here are the relevant parts of my config:
:
: <fieldType name="text_substring" class="solr.TextField"
: positionIncrementGap="100">
:
Hi everybody,
My index has latitude/longitude values for locations. I am required to
do a search based on a set of criteria and order the results based on how
far the lat/long location is from the current user's location. Currently we
are emulating such a search by adding criteria of
I already ruled out cosmic rays. It has happened on different
hardware and at different times of day, including low load.
The only thing associated with it is load from a new faceted
browse thing we turned on.
wunder
On 4/14/09 2:23 PM, Grant Ingersoll gsing...@apache.org wrote:
Is bad memory
Have you tried LocalSolr?
http://www.gissearch.com/localsolr
(I haven't but looks cool)
On 4/14/09 5:31 PM, Development Team dev.and...@gmail.com wrote:
Hi everybody,
My index has latitude/longitude values for locations. I am required to
do a search based on a set of criteria, and order
Ah, good question: Yes, we've tried it... and it was slower. To give some
avg times:
Regular non-distance Searches: 100ms
Our expanding-criteria solution: 600ms
LocalSolr: 800ms
(We also had problems with LocalSolr in that the results didn't seem to be
cached in Solr upon doing a search. So
Have you tried setting logging level to OFF from Solr's admin GUI:
http://wiki.apache.org/solr/SolrAdminGUI
Bill
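If the admin GUI setting needs to survive restarts, the same can be done in a JDK logging.properties file, since Solr logs through java.util.logging (a sketch; where the file lives depends on your servlet container):

```
# raise the global threshold
.level = WARNING
# or silence Solr's loggers entirely
org.apache.solr.level = OFF
```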
On Tue, Apr 14, 2009 at 9:56 AM, Kraus, Ralf | pixelhouse GmbH
r...@pixelhouse.de wrote:
Hi,
is there a way to disable all logging output in SOLR ?
I mean the output text like :
: I see this interesting line in the wiki page LargeIndexes
: http://wiki.apache.org/solr/LargeIndexes (sorting section towards the
: bottom)
:
: Using _val_:ord(field) as a search term will sort the results without
: incurring the memory cost.
:
: I'd like to know what this means, but I'm
: A related question. What does 'copyField' actually do? Does it 'append'
: content from the source field to the 'target' field? Or does it
: replace/overwrite it? Thank you.
:
:
: It appends the content of the source field to the target.
strictly speaking, it adds the content to the
Nasseam Elkarra wrote:
Background:
Set up a system for hierarchical categories using the following scheme:
level one#
level one#level two#
level one#level two#level three#
Trying to find the right combination of field type and query to get
the desired results. Saw some previous posts about
: As for the second part, I was thinking of trying to replace the standard
: SolrIndexSearcher with one that employs a MultiSearcher. But I'm not very
: familiar with the workings of Solr, especially with respect to the caching
: that goes on. I thought that maybe people who are more familiar
: Solr cannot assume that the request would always come from HTTP (think
: of EmbeddedSolrServer). So it assumes that there are only parameters
exactly.
: Your best bet is to modify SolrDispatchFilter and read the params and
: set them in the SolrRequest object
SolrDispatchFilter is designed to
Hi,
Can someone provide practical advice on how large a Solr search index can
be for good performance on a consumer-facing media website?
Is it good or bad to think about distributed search and dividing the index
at an early stage of development?
Thanks
Ram
--
View this message in
OK, I guess details on the new faceting stuff would be in order.
Which faceting are you using? Are you sure that it never occurred before
(i.e. it slipped under the radar)?
Obviously, the key is reproducibility here, but this has all the
earmarks of some weird threading issue, it seems, at
On Wed, Apr 15, 2009 at 12:39 AM, Development Team dev.and...@gmail.com wrote:
Hi everybody,
I have a relatively large index (it will eventually contain ~4M
documents and be about 3G in size, I think) that indexes user data,
settings, and the like. The documents represent a community of
I guess SOLR-599 can be easily fixed if we do not implement
multipart support (which is non-essential).
--Noble
On Wed, Apr 15, 2009 at 1:12 AM, Shalin Shekhar Mangar
shalinman...@gmail.com wrote:
On Wed, Apr 15, 2009 at 12:47 AM, Glen Newton glen.new...@gmail.com wrote:
I see. So this is a