Shalin,
I understand that :-)
My problem is: if one Solr instance processes (saves) 100 documents one by one, it
would not be very efficient. I want to create 10 clones
(processes/threads/cores) of the same Solr instance, so that 10 documents get
processed (saved to Solr) simultaneously.
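A single Solr instance already accepts concurrent update requests, so the usual approach is to keep one instance and post from several client threads rather than cloning Solr. A minimal sketch, assuming a hypothetical post_document stand-in for the real HTTP call to Solr's /update handler:

```python
from concurrent.futures import ThreadPoolExecutor

def post_document(doc):
    # Placeholder for an HTTP POST to Solr's /update handler;
    # replace with a real client call in practice.
    return doc["id"]

docs = [{"id": i} for i in range(100)]

# Send documents to the single Solr instance from 10 worker threads,
# so up to 10 adds are in flight at once.
with ThreadPoolExecutor(max_workers=10) as pool:
    results = list(pool.map(post_document, docs))

print(len(results))  # 100
```

Overlapping requests hide network and parsing latency even though Solr serializes the actual index writes internally.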
Geoff,
Perhaps you can find out the list of features/functionalities that your project
requires, and we can give you a quick yes/no.
Or perhaps you can get those others to list those Autonomy features that they
think they really need, and we can tell you how Solr compares.
Otis
--
Sematext --
A quick work-around is, I think, to tell Solr to use the non-binary response,
e.g. wt=xml (I think that's the syntax).
Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
- Original Message
From: syoung [EMAIL PROTECTED]
To: solr-user@lucene.apache.org
Sent:
Mohit,
I think you are thinking too hard - trying to optimize something that doesn't
sound like it needs optimizing at this point in your project. I suggest you
start with 1 Solr instance and then see if anything needs to be faster after
you've pushed that to its limits.
Otis
--
Sematext --
Hi,
I don't understand all the details, but I'll inline a few comments.
- Original Message
From: Geoff Hopson [EMAIL PROTECTED]
To: solr-user@lucene.apache.org
Sent: Thursday, September 18, 2008 1:44:33 AM
Subject: Field level security
Hi,
First post/question, so please be
On Thu, 18 Sep 2008 10:53:39 +0530
Sanjay Suri [EMAIL PROTECTED] wrote:
One of my field values is the name R__ikk__nen, which contains special
characters.
Strangely, as I see it anyway, it matches the search query 'x'?
Can someone explain or point me to the solution/documentation?
Hi Otis,
Thanks for the response. I'll try and inline some clarity...
2008/9/18 Otis Gospodnetic [EMAIL PROTECTED]:
I am trying to put together a security model around fields in my
index. My requirement is that a user may not have permission to view
certain fields in the index when he does a
As per other thread
1) security down to field level
Otherwise I am mostly happy that Solr gives me everything that Autonomy does.
2008/9/18 Otis Gospodnetic [EMAIL PROTECTED]:
Geoff,
Perhaps you can find out the list of features/functionalities that your
project requires and we can give
Hi Chris,
it was a long night for our Solr server today because we rebuilt the complete
index using well-formed date strings. And the date field is stored now, so that
we can see if something went wrong :-)
But our problems are solved completely. Now I can give you a very exact
OK, thanks, that's very clear.
Do you know why my cron job doesn't work:
# m h dom mon dow command
*/5 * * * * /usr/bin/wget
http://solr-test.books.com:8080/solr/books/dataimport?command=delta-import
When I go to check the date in conf/dataimport.properties, the date and hour
doesn't
I guess the post is not sending the correct 'wt' parameter. Try
setting wt=javabin explicitly.
wt=xml may not work because the parser is still binary.
Check this: http://wiki.apache.org/solr/Solrj#xmlparser
On Thu, Sep 18, 2008 at 11:49 AM, Otis Gospodnetic
[EMAIL PROTECTED] wrote:
A quick
I don't think it can work at index time, because when somebody looks
for a book I want to boost the search in relation to the user's language
... so I don't think it can work, unless I've misunderstood.
Thanks for your answer,
hossman wrote:
: Is there a way to convert to integer to
Thanks Akshay and Norberto,
I am still trying to make it work. I know the solution is what you pointed
me to but is just taking me some time to make it work.
thanks,
-Sanjay
On Thu, Sep 18, 2008 at 12:34 PM, Norberto Meijome [EMAIL PROTECTED]wrote:
On Thu, 18 Sep 2008 10:53:39 +0530
Sanjay
Hi,
I have a fairly simple Solr setup with several predefined fields that are
indexed and stored. Depending on the type of product, I also add various
dynamic fields of type string to a record. I should mention that I am using
the
solr.DisMaxRequestHandler request handler called
Hi Yonik,
One approach I have been working on that I will integrate into SOLR is
the ability to use serialized objects for the analyzers so that the
schema can be defined on the client side if need be. The analyzer
classes will be dynamically loaded. Or there is no need for a schema
and plain
This should be done. Great idea.
On Wed, Sep 17, 2008 at 3:41 PM, Lance Norskog [EMAIL PROTECTED] wrote:
My vote is for dynamically scanning a directory of configuration files. When
a new one appears, or an existing file is touched, load it. When a
configuration disappears, unload it. This
That would allow a single request to see a stable view of the
schema, while preventing having to make every aspect of the schema
thread-safe.
Yes that is the best approach.
Nothing will stop one from using java serialization for config
persistence,
Persistence should not be serialized.
Servlets is one thing. For SOLR the situation is different. There
are always small changes people want to make, a new stop word, a small
tweak to an analyzer. Rebooting the server for these should not be
necessary. Ideally this is handled via a centralized console and
deployed over the network
Dynamic changes are not what I'm against... I'm against dynamic changes
that are triggered by the app noticing that the config has changed.
Jason Rutherglen wrote:
Servlets is one thing. For SOLR the situation is different. There
are always small changes people want to make, a new stop word,
<response>
<lst name="responseHeader">
<int name="status">0</int>
<int name="QTime">0</int>
</lst>
<lst name="initArgs">
<lst name="defaults">
<str name="config">data-config.xml</str>
</lst>
</lst>
Yes, so it's probably best to make the changes through a remote
interface so that the app will be able to make the appropriate
internal changes. File based system changes are less than ideal,
agreed, however I suppose with an open source project such as SOLR the
kitchen-sink effect happens and it
From the XML 1.0 spec.: Legal characters are tab, carriage return,
line feed, and the legal graphic characters of Unicode and ISO/IEC
10646. So, \005 is not a legal XML character. It appears the old StAX
implementation was more lenient than it should have been and Woodstox is
doing the
It was taking too long, so I finally restarted Tomcat... then 5 minutes later my cron job
started:
but it looks like nothing is happening via the cron job:
This is my OUTPUT file : tot.txt
<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader"><int name="status">0</int><int
name="QTime">0</int></lst><lst
It depends entirely on the needs of the project. For some things,
Solr is superior to Autonomy, for other things, not.
I used to work at Autonomy (and Verity and Inktomi and Infoseek),
and I chose Solr for Netflix. It is working great for us.
wunder
==
Walter Underwood
Former Ultraseek Architect
My project is looking to index 10s of millions of documents, providing
search across a live-live environment (hence index
distribution/replication is important). Most searches have to be done
(ie to end user) in 5 seconds or less. The index has about 30 fields,
and I reckon that the security
Otis Gospodnetic wrote:
Perhaps the container logs explain what happened?
How about just throttling to the point where the failure rate is 0%?
Too slow?
Otis's questions regarding dropped inserts sent me back to the drawing
board. The system had been tuned to a slower database to
Hit /dataimport again from a browser and refresh periodically to see the
progress (number of documents indexed).
On Thu, Sep 18, 2008 at 7:55 PM, sunnyfr [EMAIL PROTECTED] wrote:
It was too long so I finally restart tomcat .. then 5mn later my cron job
started :
but it looks like nothing
That's exactly what I've done, but it doesn't work like that ...
- what would that mean ... that the cron job can't hit it properly?
- I browsed to /dataimport but it looked like nothing was running, so I
finally went back to /dataimport?command=delta-import and then to
/dataimport and refreshed it
On Sep 18, 2008, at 3:23 AM, Geoff Hopson wrote:
As per other thread
1) security down to field level
how complex a security model do you need?
Is each user's field visibility totally distinct? Are there a few
basic groups?
If you are willing to write (or hire someone to write) a
Well, it shows the number of documents that have changed; you can't expect
1603970 documents to be indexed instantly.
On Thu, Sep 18, 2008 at 8:24 PM, sunnyfr [EMAIL PROTECTED] wrote:
That's exactly what I've done, but it doesn't work like that ...
- what would that mean ... the cron job can't
I agree with that, but last time, 4 hours later, the number wasn't different:
and if I check now, nothing has changed. Does it have to go across all the data
like a full import? I thought it would bring back just the ids which need to be
modified ...?
<lst name="statusMessages">
<str name="Time"
On Thu, Sep 18, 2008 at 8:45 PM, sunnyfr [EMAIL PROTECTED] wrote:
I agree about that but the last time 4hours later the number wasn't
different
:
Do you mean that the number doesn't change at all on refreshing the page?
Can you check the solr log file for exceptions?
I suspect that you may
I would do the field visibility one layer up from the search engine.
That layer already knows about the user and can request the appropriate
fields. Or request them all (better HTTP caching) and only show the
appropriate ones.
As I understand your application, putting access control in Solr
this is my log file :
[EMAIL PROTECTED]:/home# tail -f /var/log/tomcat5.5/catalina.$(date
+%Y-%m-%d).log
Sep 18, 2008 5:25:02 PM org.apache.solr.handler.dataimport.JdbcDataSource$1
call
INFO: Creating a connection for entity books with URL:
jdbc:mysql://master-spare.vip.books.com/books
Sep 18,
I can't speak to a lot of this - but regarding the servers I'd go with
the more powerful ones, if only for the amount of ram. Your index will
likely be larger than 1 gig, and with only two you'll have a lot of
your index not stored in ram, which will slow down your QPS.
Thanks for your
Hi Geoff,
I cannot vouch for Autonomy; however, earlier this year we did evaluate
Endeca and Solr, and we went with Solr. Some of the reasons were:
1. Freedom of open source with Solr
2. Very good, active Solr open source community
3. Features pretty much overlap between Solr and Endeca
4. Endeca
Barry, does this return the correct hits:
http://127.0.0.1:8080/apache-solr-1.3.0/IvolutionSearch?q=Output-Type-facet:Monochrome
Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
- Original Message
From: Barry Harding [EMAIL PROTECTED]
To:
Hi Christian,
While I can't tell you whether the problem with - will be solved when you try
it on 1.3, I can tell you that you should probably trim your dates so they are
not as fine as you currently have them, unless you need such precision. We
need to add this to the FAQ. :)
Otis
--
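The date-trimming advice above can be applied client-side before indexing; a minimal sketch, assuming ISO-8601 UTC strings as Solr expects (in queries, Solr's DateMath rounding such as NOW/DAY gives a similar effect):

```python
def round_to_day(solr_date):
    """Truncate a full-precision Solr date (ISO-8601 UTC string) to day
    precision, so repeated timestamps share index terms and date-range
    queries have far fewer unique terms to expand."""
    return solr_date[:10] + "T00:00:00Z"

print(round_to_day("2008-09-18T13:44:33Z"))  # 2008-09-18T00:00:00Z
```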
Hi,
If all you have to do is hide certain fields from search results for some
users, then your application -- the application that sends search requests to
Solr -- can just use different fl=XXX parameters based on the user's permissions. I
think that's all you need, and the custom fieldType should
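That per-user fl= idea can be sketched in a few lines; the field names and roles below are hypothetical, not from the thread:

```python
# Fields every user may see, plus extras per role (hypothetical names).
PUBLIC_FIELDS = ["id", "title", "summary"]
ROLE_FIELDS = {
    "analyst": ["revenue", "margin"],
    "admin": ["revenue", "margin", "owner_email"],
}

def fl_for_role(role):
    """Return the value to pass as Solr's fl= parameter for a role;
    unknown roles fall back to the public field list."""
    return ",".join(PUBLIC_FIELDS + ROLE_FIELDS.get(role, []))

print(fl_for_role("guest"))    # id,title,summary
print(fl_for_role("analyst"))  # id,title,summary,revenue,margin
```

The index stays unchanged; only the layer that builds the request decides which stored fields come back.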
Geoff,
In short: all items that you listed are not a problem for Solr. Indices can be
sharded, distributed search is possible, custom ranking is possible, 30 fields
is possible, etc. etc.
Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
- Original Message
From:
Hi
sorry, I think I've now started rsyncd properly:
[EMAIL PROTECTED]:/# ./data/solr/books/bin/rsyncd-enable
[EMAIL PROTECTED]:/# ./data/books/video/bin/rsyncd-start
but then I can't find the snapshot.current file??
How can I check I did it properly?
my rsyncd.log :
2008/09/18
I tried setting the 'wt' parameter to both 'xml' and 'javabin'. Neither
worked. However, setting the parser on the server to XMLResponseParser did
fix the problem. Thanks for the help.
Susan
Noble Paul നോബിള് नोब्ळ् wrote:
I guess the post is not sending the correct 'wt' parameter. try
Hi Otis,
no, that does not seem to bring back the correct results either; in fact it's
still zero results.
It's also not bringing back results if I use the standard handler
http://127.0.0.1:8080/apache-solr-1.3.0/select?q=Output-Type-facet:Monochrome
but the field is visible in the documents
Daniel Papasian wrote:
Norberto Meijome wrote:
Thanks Yonik. OK, that matches what I've seen - if I know the actual
name of the field I'm after, I can use it in a query, but I can't
use the dynamic_field_name_* (with wildcard) in the config.
Is adding support for this something that is
Barry,
You are seeing the value of the field as it was saved (as the original), but
perhaps something is funky with how it was analyzed/tokenized at search time
and how it is being analyzed now at query time. Double-check your
fieldType/analysis settings for this field and make sure you are
Here is what I was able to get working with your help.
(productId:(102685804)) AND liveDate:[* TO NOW] AND ((endDate:[NOW TO *]) OR
((*:* -endDate:[* TO *])))
the *:* is what I was missing.
Thanks for your help.
hossman wrote:
: If the query starts with a negative clause Lucene returns
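The working query above can also be built programmatically; a hedged sketch (the function name is mine, the query shape is from the message). The `*:*` clause matters because a purely negative clause on its own matches nothing in Lucene; `*:*` supplies the full document set to subtract from:

```python
def live_window_query(product_id):
    """Build a query for documents that are currently live: liveDate has
    passed, and endDate is either in the future or absent.
    (*:* -endDate:[* TO *]) matches docs with no endDate at all."""
    return (
        f"(productId:({product_id})) AND liveDate:[* TO NOW] AND "
        f"((endDate:[NOW TO *]) OR ((*:* -endDate:[* TO *])))"
    )

print(live_window_query("102685804"))
```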
Otis,
Would it be reasonable to run a query like this
http://localhost:8280/solr/select/?q=terms_x&version=2.2&start=0&rows=0&indent=on
10 times, one for each result from an initial category query on a
different index.
So, it's still 1+10, but I'm not returning values.
This would give me the number
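Building those count-only requests can look like the sketch below, using the base URL and query term from the message; rows=0 makes Solr return only the response header and numFound, no stored documents:

```python
from urllib.parse import urlencode

def count_url(base, query):
    # rows=0 asks Solr for just the hit count (numFound),
    # so one request per category stays cheap.
    return base + "?" + urlencode({"q": query, "rows": 0})

url = count_url("http://localhost:8280/solr/select/", "terms_x")
print(url)
```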
Matthew,
Thanks, a very good point.
Andrey.
-Original Message-
From: Matthew Runo [mailto:[EMAIL PROTECTED]
Sent: Thursday, September 18, 2008 11:38 AM
To: solr-user@lucene.apache.org
Subject: Re: Hardware config for SOLR
I can't speak to a lot of this - but regarding the
Hello. I am using the spellcheck component
(https://issues.apache.org/jira/browse/SOLR-572). Since the spell checker
index is kept in RAM, it gets erased every time the Solr server gets
restarted. I was thinking of using either the firstSearcher or the
newSearcher to reload the index every time
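A firstSearcher/newSearcher event listener in solrconfig.xml can warm the spell-check index when a searcher opens; a hedged sketch (the exact spellcheck.* parameter names depend on the version of the SOLR-572 patch applied):

```xml
<listener event="firstSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <lst>
      <str name="q">anything</str>
      <str name="spellcheck">true</str>
      <str name="spellcheck.build">true</str>
    </lst>
  </arr>
</listener>
```

A matching newSearcher listener would rebuild the RAM-based index after each commit as well.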
On Fri, Sep 19, 2008 at 5:55 AM, oleg_gnatovskiy
[EMAIL PROTECTED] wrote:
Hello. I am using the spellcheck component
(https://issues.apache.org/jira/browse/SOLR-572). Since the spell checker
index is kept in RAM, it gets erased every time the Solr server gets
restarted. I was thinking of
Gene,
I haven't looked at Field Collapsing for a while, but if you have a single
index and collapse hits on your category field, then won't the first 10 hits be
the items you are looking for - the top 1 item for each category x 10, using a
single query?
Otis
--
Sematext -- http://sematext.com/ -- Lucene -
hi, all
when I post an XML file to Solr, some errors happen, as shown below:
==
com.ctc.wstx.exc.WstxEOFException: Unexpected EOF in prolog
at [row,col {unknown-source}]: [1,0]
at com.ctc.wstx.sr.StreamScanner.throwUnexpectedEOF(StreamScanner.java:686)
at
Thanks Otis for reply! Always appreciated!
That is indeed what we are looking for implementing. But, I'm running
out of time to prototype or experiment for this release.
I'm going to run the two-index thing for now, unless I find something
saying it's really easy and sensible to run one and
Hi guys.
Is the XML format for inputting data a standard one? Or can I change it?
That is, instead of:
<add><doc>
<field name="id">3007WFP</field>
<field name="name">Dell Widescreen UltraSharp 3007WFP</field>
<field name="manu">Dell, Inc.</field>
</doc></add>
can I enter something like,
<custList><clients>
<field
It is surprising as to why this happens;
the javabin format offers significant perf improvements over the XML one.
Probably you can also try this:
<requestHandler name="/search"
class="org.apache.solr.handler.component.SearchHandler">
<lst name="defaults">
<str name="wt">javabin</str>
</lst>
</requestHandler>